[Wikitech-l] Unbreaking statistics

2009-06-05 Thread Peter Gervai
Hello,

I see I've created quite a stir, but so far nothing really
useful has popped up. :-(

But I did see this one from Neil:
> Yes, modifying the http://stats.grok.se/ systems looks like the way to go.

For me it doesn't really seem to be, since it uses an extremely
dumbed-down version of the input, which only contains page views
and [unreliable] byte counters. Most probably it would require large
rewrites, and a magical new data source.

> What do people actually want to see from the traffic data? Do they want
> referrers, anonymized user trails, or what?

Are you old enough to remember stats.wikipedia.org? As far as I
remember it originally ran webalizer, then something else, then
nothing. If you check a webalizer report you'll see what's in it. We
are using (or we used, until our nice fellow editors broke it)
awstats, which basically provides the same, with more caching.

The most used and useful stats are page views (daily and hourly
breakdowns are pretty useful too), referrers, visitor domain and
provider stats, OS and browser stats, screen resolution stats, bot
activity stats, and visitor duration and depth, among probably others.

At a brief glance I could replicate the grok.se stats easily, since
they seem to be built from http://dammit.lt/wikistats/, but that data
is completely useless for anything beyond page hit counts.
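
For the record, here is roughly all that data can answer. A minimal
sketch in Python, assuming the hourly files use the one-line-per-page
layout I've seen described (project, title, view count, byte count);
the exact file naming and layout should be checked against the real
dumps:

import gzip
import sys
from collections import Counter

def count_views(filenames, project="en"):
    """Sum per-page view counts across hourly pagecount files."""
    totals = Counter()
    for name in filenames:
        with gzip.open(name, "rt", encoding="utf-8", errors="replace") as f:
            for line in f:
                fields = line.split()
                if len(fields) != 4:
                    continue  # skip malformed lines
                project_code, title, views, _size = fields
                if project_code == project:
                    totals[title] += int(views)
    return totals

if __name__ == "__main__":
    for title, views in count_views(sys.argv[1:]).most_common(10):
        print(views, title)

Run it over a day's worth of files and you get a top list of pages,
and that's it: hit counts and nothing more.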

Is there a possibility to write code which processes the raw squid
data? Who do I have to bribe? :-/
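
To make the question concrete, this is the sort of processing I mean.
A sketch assuming squid's native access-log layout (timestamp,
elapsed, client, action/code, size, method, URL, and so on); whatever
custom format the Wikimedia squids actually use is unknown to me, so
the field positions are an assumption:

import sys
from collections import Counter
from urllib.parse import urlsplit

def page_hits(lines):
    """Tally article hits from squid access-log lines."""
    hits = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7 or fields[5] != "GET":
            continue  # need at least the method and URL fields
        path = urlsplit(fields[6]).path
        if path.startswith("/wiki/"):
            hits[path[len("/wiki/"):]] += 1
    return hits

if __name__ == "__main__":
    for page, n in page_hits(sys.stdin).most_common(20):
        print(n, page)

Given a log format that also carries referrer and user-agent fields,
the same loop could tally those too, which is exactly what the
dumbed-down pagecount files can never give us.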

-- 
 byte-byte,
grin



[Wikitech-l] wikimedia, wikipedia and ipv6

2009-06-12 Thread Peter Gervai
Ehlo,

I see that this topic has popped up from time to time since 2004, and
that most of the misc servers have already been IPv6 enabled. I have
checked around whether Google has any info on it, and found a few
(really few) mails, from the original 2004 test to a comment from 2008
that squid and MediaWiki are the problem, apart from some smaller
issues. (As a sidenote, Google doesn't seem to find anything on
site:lists.wikimedia.org about "ipv6", which is interesting.)

Now, squid fully supports IPv6 (since 3.1), so I guess that's checked
off. (I didn't try it myself, but others seem to have.)

MediaWiki, well, http://www.mediawiki.org/wiki/IPv6_support doesn't
mention any outstanding problem and the linked bug is closed, so as
far as I can observe (without actually testing it) it looks okay.

The database structure may require some tuning, as far as I can see. Right?
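
(What I mean by tuning: addresses stored as text want a canonical,
fixed-width form, so that comparisons and range scans keep working. A
minimal sketch of the idea in Python, not of what MediaWiki actually
does internally, which I haven't checked:)

import ipaddress

def normalize_ip(addr: str) -> str:
    """Expand an address to a canonical fixed-width form so it
    compares and sorts predictably in a database text column."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6:
        return ip.exploded.upper()
    return str(ip)

# "2001:db8::1" -> "2001:0DB8:0000:0000:0000:0000:0000:0001"
print(normalize_ip("2001:db8::1"))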

Apache has handled it since forever; PHP does too, I guess.

Are there any further non-v6-compatible components involved in running
a Wikipedia? If not, is there any outstanding problem which would make
it impossible to fire up a test interface on IPv6?

I'd say use a separate host, like en.ipv6.wikipedia.org, and not worry
about cache efficiency, because I doubt the IPv6 traffic would really
measure up to the IPv4 traffic. At least that way it could be properly
measured, and the decision on how to go on could be based on facts.

Maybe there's a test host up already, but I wasn't able to find it, so
I guess nobody else can either. ;-)
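
(Checking for one is only a couple of lines; a sketch, with
en.ipv6.wikipedia.org as a hypothetical name:)

import socket

def has_aaaa(host: str) -> bool:
    """True if the name resolves to at least one IPv6 address."""
    try:
        return bool(socket.getaddrinfo(host, 80, socket.AF_INET6))
    except socket.gaierror:
        return False

print(has_aaaa("en.ipv6.wikipedia.org"))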

Are there any further problems in this area that require solutions, or
has it just not occurred to anyone lately?

-- 
 byte-byte,
grin



Re: [Wikitech-l] wikimedia, wikipedia and ipv6

2009-06-12 Thread Peter Gervai
On Fri, Jun 12, 2009 at 13:55, Aryeh Gregor wrote:
> This might be useful, although most of the info is probably outdated:
> http://wikitech.wikimedia.org/view/Special:Search?search=ipv6&go=Go

Yep, including the dead labs link.

But it mentioned LVS [ipvs]; I don't know whether we use it or not,
but it supports IPv6 too. ;-)

g



Re: [Wikitech-l] wikimedia, wikipedia and ipv6

2009-06-15 Thread Peter Gervai
On Fri, Jun 12, 2009 at 23:55, Platonides wrote:

> List archives are not searchable by google.

Is it on purpose? Why?

> I don't think so.

You quoted back 3 questions and answered one of them, dunno which. :-)

> MediaWiki code may need some assumptions about it, though.

> I think this comment in the config summarises it:
> "no IPv6 support - 20051207"

I don't think so. If you look more carefully you'll see references to
code cleanups regarding IPv6, actually lots of them, and generally it
seems that people are quietly working on it, so maybe it just needs
checking again, since 2005 wasn't yesterday.

Any core devs with opinions? Tim? Brion?

-- 
 byte-byte,
grin



Re: [Wikitech-l] [WikiEN-l] MediaWiki is getting a new programming language

2009-07-08 Thread Peter Gervai
On Wed, Jul 8, 2009 at 10:16, Gerard Meijssen wrote:
> The argument that a language should be readable and easy to learn is REALLY
> relevant and powerful. A language that is only good for geeks is detrimental
> to the use of MediaWiki. Our current templates and template syntax are
> horrible. Wikipedia is as a consequence hardly editable by everyone.

Mortals _use_ the templates, they don't _create_ them. Geeks create
templates for mortals.

The current syntax is indeed horrible, but complete readability is not
the main issue, I'd say. Security, speed and flexibility should be,
along with ease of implementation.

Peter



Re: [Wikitech-l] [WikiEN-l] MediaWiki is getting a new programming language

2009-07-08 Thread Peter Gervai
On Wed, Jul 8, 2009 at 12:23, Neil Harris wrote:

> {{#switch:
> {{#iferror: {{#expr: {{{1}}} + {{{2}}} }} | error | correct }}
> | error = that's an error
> | correct = {{{1}}} + {{{2}}} = {{#expr: {{{1}}} + {{{2}}} }}
> }}

{{#perl
use warnings FATAL => "numeric";
my $sum = eval { $arg1 + $arg2 };
if ($@) {
  return "that's an error";
} else {
  return "$arg1 + $arg2 = $sum";
}
}}

:-)

But really, any of the languages cited above should work; the easier
it is to parse (and to write an interpreter for), the better. I guess
a Lua-like language would be easy and readable.

-- 
 byte-byte,
grin



Re: [Wikitech-l] [WikiEN-l] MediaWiki is getting a new programming language

2009-07-17 Thread Peter Gervai
On Fri, Jul 17, 2009 at 12:06, Gerard Meijssen wrote:

> Well this strength is not that great when people like myself, who have commit
> rights on SVN, do not want to touch templates with a barge pole if we can
> help it. Wikipedia is supposed to be this thing everybody can edit.

I think you misprioritise the whole thing.

Consider it a feature, not base functionality. Most installations
don't use even 10% of the available features, for lack of knowledge,
time, bravery or whatever else. Writing templates with code is a rare
art by my observation; most of the larger MW installations have never
used it in the first place.

Would you like to remove TeX (math) input because that language is complex?

And since it'd be an extension, I guess, that would imply you want to
forbid(?) creating complex extensions. That's unrealistic. Geeks need
this functionality; ungeeks may or may not care about it, and it
doesn't really matter. If it's easier to understand than not, that's a
plus.

By the way, I wouldn't touch PHP with a ten-foot pole and rubber
gloves, but I'm fine with the current template syntax. We're
individuals with our own preferences. ;-)

grin



Re: [Wikitech-l] Wikitext vs. WYSIWYG (was: Proposal for editing template calls within pages)

2009-09-24 Thread Peter Gervai
On Thu, Sep 24, 2009 at 14:36, David Gerard  wrote:

> However, impenetrable wikitext is one of *the* greatest barriers to
> new users on Wikimedia projects. And this impenetrability is not, in
> any way whatsoever or by any twists of logic, a feature.

Adding a GUI layer on top of wikitext is always okay, as long as it's
possible to get rid of it, since the majority of edits don't come from
"new users", and losing flexibility for power users in order to gain
more newbies doesn't sound like a good deal to me.

At least, all of the GUIs I've seen were slow and hard to use, and
produced unwanted (side) effects whenever anything even slightly
complex was entered. And this isn't just Wikipedia's problem: Google
Docs, which I guess is one of the most advanced web-based GUI systems,
has plenty of usability problems which can only be fixed by messing
with the source. And many core people want to mess with the source.

So, adding a newbie layer is okay as long as you don't mess up the
work of the non-newbies.

g



Re: [Wikitech-l] Bugzilla Vs other trackers.

2010-01-06 Thread Peter Gervai
On Thu, Jan 7, 2010 at 08:17, Robert Rohde  wrote:
> On Wed, Jan 6, 2010 at 9:14 PM, Steve Bennett  wrote:

>> Anyone tried FogBugz, Joel Spolsky's baby? I'm so curious... although it's
>> commercial software, who knows, you might get a discount or even a freebie.
>
> The historical position has been that absolutely nothing goes into the
> WMF software pool unless it is open source.

I deeply agree with that.

> However, my recollection is based on discussions years ago.  On
> searching, I couldn't find any policy forbidding closed source
> software (is there one?).  So, it is possible that closed source might
> be looked on as a more acceptable possibility for some functions now
> (though I wouldn't bet on it).

That wouldn't be nice. First, it's an attitude thing: we want to (and
have to) promote open stuff.
Second, it isn't nice to show users something they cannot use
themselves. It's kind of against our basic principle of "you can do
what we do, you're free to do it, we just do it better". :-)

-- 
 byte-byte,
grin



Re: [Wikitech-l] "Google phases out support for IE6"

2010-01-31 Thread Peter Gervai
What about creating a "monobo'oldies" theme for them? I mean, move the
current stuff to the oldies theme, and drop support for the elders
from monobook.

-- 
 byte-byte,
grin



Re: [Wikitech-l] "Google phases out support for IE6"

2010-02-02 Thread Peter Gervai
On Tue, Feb 2, 2010 at 00:44, Gregory Maxwell  wrote:

> People are really bad at complaining, especially web users.  We've had
> prolonged obvious glitches which must have affected hundreds of
> thousands of people, and maybe we get a couple of reports.

For Average Joe and Jane it usually isn't obvious what to do when
something's broken. I've observed people using really broken websites
(fallen-apart layout, broken menus) who never "report" it, only
complain to their colleagues. I second that people are bad at
reporting problems, and I must add that computer people are usually
bad at receiving the complaints and fixing them. ;-) I guess if you
have a problem and you know someone who can do something about it,
then it'll get fixed; otherwise it _may_ get fixed, one day or
another. [I've experienced this latter problem with email config bugs
and [not] having them fixed.]

Nevertheless, I wouldn't miss any IE features, but then again I'm an
anti-M$ fascist by genetics. ;-)

g



Re: [Wikitech-l] Version control

2010-02-07 Thread Peter Gervai
On Sun, Feb 7, 2010 at 00:38, Ævar Arnfjörð Bjarmason  wrote:

> It's interesting that the #1 con against Git in that document is "Lots
> of annoying Git/Linux fanboys".

No, it's the "screaming 'hell yeah!' while having no idea what they're
talking about" part. :-)

g



Re: [Wikitech-l] What is wrong with Wikia's WYSIWYG?

2011-05-02 Thread Peter Gervai
On Mon, May 2, 2011 at 10:02, Tim Starling  wrote:

> I don't think using wikitext is the best way to make things easier for
> new users.

It's always been a dilemma for me how many computer-illiterate users
we should wish for (or can actually tolerate). I don't feel that web
whatever-point-whatever (whichever version we call it) is fast,
reliable or useful enough to be the main way to input encyclopedia
text. These editors are usually very slow and quite unreliable
(including the Google Docs stuff, which I believe is the most advanced
tech out there in this area).

And... people habitually get completely lost in DTP software (be that
[open/whatever]office or anything else); they can't comprehend
formatting, fonts, text annotation and other advanced features. I do
not see that WYSIWYG would make them any more able to use those
techniques. There are some guys who actually learned enough markup to
completely screw up Wikibooks (putting flashing 80pt fonts in
scrolling frames, using all kinds of CSS features which aren't
horrible on purpose), and I just fear what they could do with WYSIWYG.
Such texts are sometimes easier to completely reformat (reset ALL
formatting to default and start over).

The Foundation's purpose is to make it easier for everyone and to
invite and involve everyone, I know. I just have my doubts and
worries, which I have now shared.

Peter



Re: [Wikitech-l] XKCD: Extended Mind

2011-05-25 Thread Peter Gervai
On Wed, May 25, 2011 at 09:24, Tim Starling  wrote:
> On 25/05/11 17:05, Domas Mituzas wrote:
>> On May 25, 2011, at 9:35 AM, K. Peachey wrote:
>>
>>> http://xkcd.org/903/ -Peachey
>>
>> that error is fake! 10.0.0.242 is internal services DNS server and
>> is not used to serve en.wikipedia.org - dberror log does not have a
>> single instance of it! 10.0.6.42 on the other hand
>
> I would have thought the fact that it was hand drawn would have given
> it away.

But in this particular case, hand-drawn doesn't mean the facts can be
sloppy; these drawings are usually extremely precise. (You can see
which pulldowns he usually keeps open. :-))

I second Domas's call to check, because there may be a super secret
conspiracy and the drawing may be correct. ;-)
-- 
 byte-byte,
    grin



Re: [Wikitech-l] XKCD: Extended Mind

2011-05-25 Thread Peter Gervai
On Wed, May 25, 2011 at 17:16, Domas Mituzas wrote:

Thanks for clearing that up. Nice work.

g



Re: [Wikitech-l] XKCD: Extended Mind

2011-05-26 Thread Peter Gervai
On Thu, May 26, 2011 at 17:38, Leo Koppelkamm  wrote:
> http://ryanelmquist.com/cgi-bin/xkcdwiki

A nice way to see how first sentences eventually lead to some general
quantity or property, which links to [[property (philosophy)]], which
links to Philosophy itself. So far I haven't seen a path that didn't
go through 'property'.

g



Re: [Wikitech-l] Errors in Wikimedia Commons old files

2012-02-29 Thread Peter Gervai
On Thu, Mar 1, 2012 at 00:56, emijrp  wrote:
> I'm trying to download Wikimedia Commons, but I have found some errors. For

There are still occasional errors around; it would be nice to run a
script against the files database... but they can usually be fixed by
downloading the file (sometimes from the history) and uploading it
again.
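
(A sketch of the kind of check I mean, comparing a local copy's SHA-1
against what the wiki reports through the standard imageinfo API;
paging, batching and error handling omitted:)

import hashlib
import json
import urllib.parse
import urllib.request

API = "https://commons.wikimedia.org/w/api.php"

def db_sha1(filename: str) -> str:
    """SHA-1 of a file as recorded on the wiki side."""
    params = urllib.parse.urlencode({
        "action": "query", "format": "json",
        "prop": "imageinfo", "iiprop": "sha1",
        "titles": "File:" + filename,
    })
    with urllib.request.urlopen(API + "?" + params) as resp:
        pages = json.load(resp)["query"]["pages"]
    return next(iter(pages.values()))["imageinfo"][0]["sha1"]

def local_sha1(path: str) -> str:
    """SHA-1 of the downloaded copy, read in 1 MB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

A mismatch between the two would flag exactly the kind of broken file
emijrp is hitting.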

g



[Wikitech-l] forking media files

2011-08-15 Thread Peter Gervai
Let me retitle one of the topics nobody seems to touch.

On Fri, Aug 12, 2011 at 13:44, Brion Vibber  wrote:

> * media files -- these are freely copiable but I'm not sure of the state of
> easily obtaining them in bulk. As the data set moved into TB it became
> impractical to just build .tar dumps. There are batch downloader tools
> available, and the metadata's all in dumps and api.

Right now it is basically locked up: there is no way to bulk copy the
media files, not even to simply make a backup of one Wikipedia, or of
Commons. I've tried, I've asked, and the answer was basically to
contact a dev and arrange it, which obviously could be done (I know
many of the folks), but that isn't the point.

Some explanations were offered, mostly that media and its metadata are
quite detached, and thus it's hard to enforce licensing quirks like
attribution, special licenses and such. I can see this is a relevant
point, since the text corpus is uniformly licensed under CC/GFDL while
the media files are at best non-homogeneous (like Commons, where
everything is free in some way) and complete chaos at worst
(individual Wikipedias, where there may be anything from leftover fair
use to material copyrighted by various entities to images to be
deleted "soon").

Still, I do not believe it's a good approach to make it close to
impossible to bulk copy the data. I am not sure which technical means
is best, as there are many competing ones.

We could, for example, open up an API which would serve a media file
together with its metadata, possibly supporting mass operations.
Still, that's pretty inefficient.

Or we could support zsync, rsync and the like (and I again recommend
examining zsync's several interesting abilities to offload the work to
the client), but there ought to be some pointers to the image
metadata, at least a one-liner file for every image linking to its
license page.
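
(A sketch of that one-liner idea, pulling the pointer from the
standard imageinfo API; a real bulk job would read this straight from
the database rather than make one API call per file:)

import json
import urllib.parse
import urllib.request

API = "https://commons.wikimedia.org/w/api.php"

def manifest_line(filename: str) -> str:
    """One line per image: name, SHA-1, and its description (license) page."""
    params = urllib.parse.urlencode({
        "action": "query", "format": "json",
        "prop": "imageinfo", "iiprop": "url|sha1",
        "titles": "File:" + filename,
    })
    with urllib.request.urlopen(API + "?" + params) as resp:
        pages = json.load(resp)["query"]["pages"]
    info = next(iter(pages.values()))["imageinfo"][0]
    return "\t".join([filename, info["sha1"], info["descriptionurl"]])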

Or we could tie bulk access to established editor accounts, so we'd
have at least some assurance that they know what they're doing.

-- 
 byte-byte,
    grin



Re: [Wikitech-l] forking media files

2011-08-15 Thread Peter Gervai
On Mon, Aug 15, 2011 at 18:40, Russell N. Nelson - rnnelson wrote:
> The problem is that 1) the files are bulky,

That's expected. :-)

> 2) there are many of them, 3) they are in constant flux,

That is not really a problem: precisely because there are so many of
them, statistically they are not in flux.

> and 4) it's likely that your connection would close for whatever reason
> part-way through the download.

I don't seem to have forgotten to mention zsync/rsync. ;-)

> Even taking a snapshot of the filenames is dicey. By the time you finish, 
> it's likely that there will be new ones, and possible that some will be 
> deleted. Probably the best way to make this work is to 1) make a snapshot of 
> files periodically,

Since I've been told they're backed up, such a snapshot should naturally already exist.

> 2) create an API which returns a tarball using the snapshot of files that 
> also implements Range requests.

I would very much prefer a ready-to-use format instead of a tarball,
not to mention that it's pretty resource-consuming to create a tarball
just for that.

> Of course, this would result in a 12-terabyte file on the recipient's host. 
> That wouldn't work very well. I'm pretty sure that the recipient would need 
> an http client which would 1) keep track of the place in the bytestream and 
> 2) split out files and write them to disk as separate files. It's possible 
> that a program like getbot already implements this.

I'd make the snapshot without tar, especially because with a tarball
partial transfers aren't possible.
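
(For what it's worth, the resuming client described in the quote isn't
hard either; a sketch, assuming the server honors Range requests on
the snapshot URL:)

import os
import urllib.request

def resume_download(url: str, path: str, chunk: int = 1 << 20) -> None:
    """Continue a partial download from the current file size,
    appending the fetched bytes; relies on HTTP Range support."""
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    req = urllib.request.Request(url, headers={"Range": "bytes=%d-" % offset})
    with urllib.request.urlopen(req) as resp, open(path, "ab") as out:
        while True:
            data = resp.read(chunk)
            if not data:
                break
            out.write(data)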

-- 
 byte-byte,
    grin
