Re: [Wikitech-l] Acceptable use of API

2010-09-24 Thread Robert Ullmann
Hi, You don't need the full dumps. Look at (for example) the tr.wp dump that is running at the moment (http://download.wikimedia.org/trwiki/20100924/); you'll see the text dumps and also dumps of various SQL tables. Look at the one that is labelled "Wiki interlanguage link records." You ought to
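[As a hedged illustration of the advice above, the sketch below pulls interlanguage links out of the langlinks SQL table dump; the filename follows the usual <wiki>-<date>-langlinks.sql.gz convention and is an assumption, not taken from the post.]

```python
# Minimal sketch of reading interlanguage links from the langlinks SQL
# table dump rather than the full text dump. The exact filename is an
# assumption based on the usual <wiki>-<date>-langlinks.sql.gz naming.
import gzip
import re
import urllib.request

URL = ("http://download.wikimedia.org/trwiki/20100924/"
       "trwiki-20100924-langlinks.sql.gz")
urllib.request.urlretrieve(URL, "langlinks.sql.gz")

# Rows are (ll_from, 'll_lang', 'll_title'); a rough regex over the
# INSERT statements is enough for a quick extraction.
row = re.compile(r"\((\d+),'([^']*)','((?:[^'\\]|\\.)*)'\)")
with gzip.open("langlinks.sql.gz", "rt", encoding="utf-8",
               errors="replace") as f:
    for line in f:
        if line.startswith("INSERT INTO"):
            for page_id, lang, title in row.findall(line):
                print(page_id, lang, title)
```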

Re: [Wikitech-l] Showing bytes added/removed in each edit in "View history" and "User contributions"

2010-08-03 Thread Robert Ullmann
Ahem. The revision size (and page size, meaning that of the last revision) in bytes is available in the API. If you change the definition, there is no telling what you will break. Essentially, you can't. A character count would have to be another field. best, Robert On Tue, Aug 3, 2010 at 9:53 AM, C
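[A minimal sketch of what "available in the API" looks like in practice; prop=revisions with rvprop=size is the standard query API, but treat the exact request shape as illustrative.]

```python
# Hedged sketch: fetching per-revision byte sizes via the API, which is
# what the parent post refers to.
import json
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({
    "action": "query",
    "prop": "revisions",
    "titles": "Main Page",
    "rvprop": "ids|size",   # 'size' is the byte count of the revision
    "rvlimit": "5",
    "format": "json",
})
url = "http://en.wikipedia.org/w/api.php?" + params
data = json.load(urllib.request.urlopen(url))
for page in data["query"]["pages"].values():
    for rev in page.get("revisions", []):
        print(rev["revid"], rev["size"], "bytes")
```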

Re: [Wikitech-l] Unicode equivalence

2010-05-29 Thread Robert Ullmann
y 29, 2010 at 5:31 PM, Robert Ullmann wrote: > In December, Praveen Prakash wrote > "We are currently using Unicode 5.1 Redirect to unicode 5.0 titled > articles in some cases. After converting, both these titles become the same. > Is that a problem?" > > Yes, it is ... >

Re: [Wikitech-l] Unicode equivalence

2010-05-29 Thread Robert Ullmann
9, 2010 at 5:26 PM, Platonides wrote: > Robert Ullmann wrote: >> I've looked at this a bit more. There are more serious problems. >> >> Apparently, no-one converted the 5.0 titles in the wiki to 5.1 when >> "normalization" was turned on; there are pages

Re: [Wikitech-l] Unicode equivalence

2010-05-29 Thread Robert Ullmann
I've looked at this a bit more. There are more serious problems. Apparently, no-one converted the 5.0 titles in the wiki to 5.1 when "normalization" was turned on; there are pages that can't be accessed. (!) for example, try this (Malayalam for "fish"): http://ml.wiktionary.org/wiki/%E0%B4%AE%E0%
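[To make the failure mode concrete, a small illustration (the code points below are examples): the 5.0 sequence and the 5.1 code point stay distinct even under standard NFC, so once incoming titles are mapped to the new form, a page still stored under the old spelling can no longer be named in a URL.]

```python
# Sketch of why the pages became unreachable: the 5.0 sequence and the
# 5.1 code point are distinct strings even under standard NFC, so once
# the wiki started mapping incoming titles 5.0 -> 5.1, any page still
# stored under the old 5.0 spelling could no longer be reached by name.
import unicodedata

old = "\u0D28\u0D4D\u200D"   # NA + VIRAMA + ZWJ (Unicode 5.0 chillu n)
new = "\u0D7B"               # MALAYALAM LETTER CHILLU N (Unicode 5.1)

print(old == new)                                # False
print(unicodedata.normalize("NFC", old) == new)  # still False: NFC does
# not unify them, which is why MediaWiki needed its own mapping.
```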

Re: [Wikitech-l] Unicode equivalence

2010-05-22 Thread Robert Ullmann
Hi, If you don't still have this thread, the background is that the Malayalam projects want to, and are, using Unicode 5.1 for five characters that have composed code points in 5.1, and decomposed in 5.0. The equivalences are: CHILLU NN: 0D23, 0D4D, 200D → 0D7A; CHILLU N: 0D28, 0D4D, 20
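[Written out as code, the wanted conversion is a plain string mapping. The CHILLU NN pair appears in full above and CHILLU N is cut off mid-listing; the remaining pairs are filled in from the standard Unicode 5.1 chillu assignments to complete the five characters the post mentions.]

```python
# Sketch of a 5.0 -> 5.1 conversion table for the chillu letters the
# post describes. The pairs are the standard Unicode 5.1 atomic chillu
# assignments (only the first appears in full in the snippet above).
CHILLU_MAP = {
    "\u0D23\u0D4D\u200D": "\u0D7A",  # CHILLU NN
    "\u0D28\u0D4D\u200D": "\u0D7B",  # CHILLU N
    "\u0D30\u0D4D\u200D": "\u0D7C",  # CHILLU RR
    "\u0D32\u0D4D\u200D": "\u0D7D",  # CHILLU L
    "\u0D33\u0D4D\u200D": "\u0D7E",  # CHILLU LL
}

def to_unicode_51(title):
    """Rewrite decomposed (5.0) chillu sequences to 5.1 code points."""
    for old, new in CHILLU_MAP.items():
        title = title.replace(old, new)
    return title
```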

Re: [Wikitech-l] Wiki links beginning with slash character

2010-03-28 Thread Robert Ullmann
Yes, just leave out the extraneous ":" you have included for some reason. {{/nav}} will work nicely, if the namespace allows subpages. If you'd like to see an amusing example of / in a namespace without subpages, see: http://en.wiktionary.org/wiki// Robert On Mon, Mar 29, 2010 at 3:05 AM, Jim Tit

Re: [Wikitech-l] importing enwiki into local database

2010-02-18 Thread Robert Ullmann
Hi, There is a note here: http://www.mediawiki.org/wiki/Extension:ParserFunctions saying you should use a different version of ParserFunctions for 1.15.1. I'm not at all sure what that actually means, but the problem definitely seems to be within ParserFunctions ... Robert

Re: [Wikitech-l] importing enwiki into local database

2010-02-14 Thread Robert Ullmann
Hi, On Sun, Feb 14, 2010 at 11:03 AM, Eric Sun wrote: > I'm using MediaWiki 1.15.1 and I imported the dump using xml2sql. > Most enwiki pages render correctly, but a bunch of pages (e.g. > Jennifer_Garner) show spurious tags (inspecting the page > source shows a bunch of ). Are you using

Re: [Wikitech-l] Datamining infoboxes

2009-10-23 Thread Robert Ullmann
> I've been spending hours on the parsing now and don't find it simple > at all due to the fact that templates can be nested. Just extracting > the Infobox as one big lump is hard due to the need to match nested {{ > and }} > > Andrew Dunbar (hippietrail) Hi, Come now, you are over-thinking it. F
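[In the spirit of "don't over-think it", a minimal brace-depth matcher is enough to pull the infobox out as one lump. A hedged sketch; the function name and interface are invented for illustration:]

```python
# Minimal brace-matching sketch for extracting one template (e.g. an
# infobox) from wikitext as "one big lump". It tracks {{ }} nesting
# depth rather than using a regex, so nested templates survive.
def extract_template(text, name):
    """Return the full {{name ...}} block, or None if not found."""
    start = text.find("{{" + name)
    if start < 0:
        return None
    depth = 0
    i = start
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                return text[start:i]
        else:
            i += 1
    return None  # unbalanced braces

# e.g. extract_template(page_text, "Infobox Language")
```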

Re: [Wikitech-l] Datamining infoboxes

2009-10-23 Thread Robert Ullmann
Hi Hippietrail! What do you mean by "intractably slow"? Just how fast must it be? If I do http://en.wikipedia.org/w/api.php?action=query&list=embeddedin&eititle=Template:Infobox_Language&eilimit=100&einamespace=0 it says (on one given try) that it was served in 0.047 seconds. How long can it take
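[For reference, the quoted request rendered as a hedged client sketch, with the query-continue paging of that era so the whole transclusion list comes back in batches:]

```python
# Sketch of the embeddedin query quoted above, with standard
# query-continue paging so the full list of transcluding pages comes
# back in batches rather than one huge request.
import json
import urllib.parse
import urllib.request

def pages_embedding(template):
    params = {
        "action": "query",
        "list": "embeddedin",
        "eititle": template,
        "eilimit": "500",
        "einamespace": "0",
        "format": "json",
    }
    while True:
        url = ("http://en.wikipedia.org/w/api.php?"
               + urllib.parse.urlencode(params))
        data = json.load(urllib.request.urlopen(url))
        for page in data.get("query", {}).get("embeddedin", []):
            yield page["title"]
        cont = data.get("query-continue", {}).get("embeddedin")
        if not cont:
            break
        params.update(cont)  # carries eicontinue into the next request

for title in pages_embedding("Template:Infobox Language"):
    print(title)
```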

Re: [Wikitech-l] Software updates Wednesday morning

2009-09-28 Thread Robert Ullmann
Hi, On Wed, Sep 23, 2009 at 8:28 PM, Platonides wrote: > Aryeh Gregor wrote: > >> I've been meaning to investigate this, but haven't found the time yet. >>  Have you come up with a minimal test case, or filed a bug with >> Mozilla?  I'd be willing to look at this if I get the time, but I >> don't

Re: [Wikitech-l] Software updates Wednesday morning

2009-09-20 Thread Robert Ullmann
almost always on Windoze ;-) Best, Robert On Thu, Sep 17, 2009 at 11:20 PM, Platonides wrote: > Robert Ullmann wrote: >> Hi, >> >> Something bad happened, having to do with the "legend" junk added to RC >> and similar pages. Firefox will go compute bound

Re: [Wikitech-l] Software updates Wednesday morning

2009-09-17 Thread Robert Ullmann
Hi, Something bad happened, having to do with the "legend" junk added to RC and similar pages. Firefox will go compute bound (or very nearly) as long as the page is open, even for hours. It isn't java/javascript (first suspect ;-); turning them off has no effect. It doesn't quite happen with 50 chan

Re: [Wikitech-l] Speed of parsing messages (was: how to chang {{SITENAME}})

2009-09-16 Thread Robert Ullmann
Like maybe: $1 thingy? does that do it? Robert On Wed, Sep 16, 2009 at 4:32 PM, Roan Kattouw wrote: > 2009/9/16 Tisza Gergő : >> Some of them may be rephrased, and some localizations do not really need >> them at >> all. For example, in Hungarian " " constructs the noun is >> always >> in

Re: [Wikitech-l] Wiktionary API acceptable use policy

2009-09-03 Thread Robert Ullmann
Hi, In general a small number of requests is fine, but large numbers (using the wikts as a live back-end database) are not so good. (Note that "live mirrors", re-presenting WM data as part of another site, are explicitly prohibited. ;-) What you should probably do is use the XML dumps from http://d

Re: [Wikitech-l] HTML not Rendered correctly after Import of Wikipedia

2009-03-15 Thread Robert Ullmann
Hi, On Fri, Mar 13, 2009 at 8:37 PM, O. O. wrote: > Hi, >        I attempted to import the English Wikipedia into MediaWiki by first ... > The problem that I am now facing is that the HTML Rendered is wrong in > places. Mostly this happens at the beginning of the text on the Page. > For example i

Re: [Wikitech-l] Server lag at 45K seconds

2009-03-03 Thread Robert Ullmann
now going down slowly, as you would expect. Perhaps "normal" ;-) On Wed, Mar 4, 2009 at 2:07 AM, Robert Ullmann wrote: > now we have server lag at 870 seconds, slowly, but inexorably, increasing (?) > Robert > > On Tue, Mar 3, 2009 at 11:30 AM, Robert Ullmann wrote:

Re: [Wikitech-l] Server lag at 45K seconds

2009-03-03 Thread Robert Ullmann
now we have server lag at 870 seconds, slowly, but inexorably, increasing (?) Robert On Tue, Mar 3, 2009 at 11:30 AM, Robert Ullmann wrote: > Ah, that makes sense. Also explains why the lag dropped back from > 42000 to <5 a few minutes ago ;-) > Thanks, Robert > >> A se

Re: [Wikitech-l] Server lag at 45K seconds

2009-03-03 Thread Robert Ullmann
Ah, that makes sense. Also explains why the lag dropped back from 42000 to <5 a few minutes ago ;-) Thanks, Robert > A server was taken out of rotation and its slave process stopped to > produce a dump for the toolserver. It was put back into rotation > before it caught up, and so it was suddenly

[Wikitech-l] Server lag at 45K seconds

2009-03-03 Thread Robert Ullmann
I have a bot called Interwicket which keeps itself busy adding and updating the language links for the wiktionaries (namespace 0); it is much more efficient for that than the "standard" pedia bot. It uses the API, and the maxlag parameter, and is a good citizen, backing off sharply as maxlag exceed
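[Not Interwicket's actual code, but a sketch of the maxlag etiquette described: send maxlag on every request and back off sharply when the servers report lag. The maxlag=5 value and the doubling delays are assumptions.]

```python
# Hedged sketch of good-citizen API use with the maxlag parameter.
import json
import time
import urllib.error
import urllib.parse
import urllib.request

def api_get(params, maxlag=5, max_tries=8):
    """Query the API with maxlag set, retrying with backoff when lagged."""
    params = dict(params, maxlag=str(maxlag), format="json")
    delay = 5
    for _ in range(max_tries):
        url = ("http://en.wiktionary.org/w/api.php?"
               + urllib.parse.urlencode(params))
        try:
            data = json.load(urllib.request.urlopen(url))
        except urllib.error.HTTPError as e:
            if e.code != 503:        # 503 is how some layers report lag
                raise
            data = {"error": {"code": "maxlag"}}
        if data.get("error", {}).get("code") != "maxlag":
            return data
        time.sleep(delay)            # servers lagged: wait it out
        delay *= 2                   # and back off sharply each time
    raise RuntimeError("replication lag persisted; giving up")
```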

Re: [Wikitech-l] Dump processes seem to be dead

2009-02-25 Thread Robert Ullmann
Hi, On Thu, Feb 26, 2009 at 2:29 AM, Andrew Garrett wrote: > On Thu, Feb 26, 2009 at 5:08 AM, John Doe wrote: >> But server space saved by compression would be compensated by the >> stability and flexibility provided by this method. This would allow >> whatever server is controlling t

Re: [Wikitech-l] Dump processes seem to be dead

2009-02-24 Thread Robert Ullmann
> The worrying bit is that it seems srv136 will now work as apache. > So, where will dumps be done? I'm not sure where (or if it has changed), but they are running now (:-) To Ariel Glenn: On getting them to work better in the future, this is what I would suggest: First, note that everything

Re: [Wikitech-l] Dump processes seem to be dead

2009-02-23 Thread Robert Ullmann
Let me ask a separate question (Ariel may be interested in this): What if we took the regular permanent media backups, and WMF filtered them in house just to remove the classified stuff (;-), and then put them somewhere where others could convert them to the desired format(s)? (Build all-history f

Re: [Wikitech-l] Dump processes seem to be dead

2009-02-23 Thread Robert Ullmann
On Tue, Feb 24, 2009 at 6:49 AM, Andrew Garrett wrote: > On Tue, Feb 24, 2009 at 1:07 PM, Robert Ullmann wrote: >> Really? I mean is this for real? >> >> The sequence ought to be something like: breaker trips, monitor shows >> within a minute or two that 4 servers are

Re: [Wikitech-l] Dump processes seem to be dead

2009-02-23 Thread Robert Ullmann
Hmm: On Mon, Feb 23, 2009 at 9:04 PM, Russell Blau wrote: > 2) Within the last hour, the server log at > http://wikitech.wikimedia.org/wiki/Server_admin_log indicates that Rob found > and fixed the cause of srv31 (and srv32-34) being down -- a circuit breaker > was tripped in the data center.

Re: [Wikitech-l] Dump processes seem to be dead

2009-02-22 Thread Robert Ullmann
Hi, Maybe I should offer a constructive suggestion? Clearly, trying to do these dumps (particularly "history" dumps) as it is being done from the servers is proving hard to manage. I also realize that you can't just put the set of daily permanent-media backups on line, as they contain lots of use

Re: [Wikitech-l] Dump processes seem to be dead

2009-02-22 Thread Robert Ullmann
What is with this? Why are the XML dumps (the primary product of the projects: re-usable content) the absolute effing lowest possible effing priority? Why? I just finished (I thought) putting together some new software to update iwikis on the wiktionaries. It is set up to read the "langlinks" and

[Wikitech-l] API breakage a few hours ago

2009-02-18 Thread Robert Ullmann
The API's XML return for queries with no result elements was broken a few hours ago. I believe the culprit is r46845 http://www.mediawiki.org/wiki/Special:Code/MediaWiki/46845 which does say it is a "breaking change", but the break intended is that applications may see query-continue earlier than
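[A defensive client pattern for the change described: never assume the result element is present, and honour query-continue whenever it appears, even on the first response. The allpages parameters below are purely illustrative.]

```python
# Hedged sketch of client code that tolerates empty result batches and
# early query-continue elements, per the breaking change above.
import json
import urllib.parse
import urllib.request

params = {
    "action": "query",
    "list": "allpages",
    "aplimit": "500",
    "format": "json",
}
while True:
    url = ("http://en.wiktionary.org/w/api.php?"
           + urllib.parse.urlencode(params))
    data = json.load(urllib.request.urlopen(url))
    # Do not assume the result list is present: a batch may arrive with
    # only a query-continue and no pages at all.
    for page in data.get("query", {}).get("allpages", []):
        print(page["title"])
    cont = data.get("query-continue", {}).get("allpages")
    if not cont:
        break
    params.update(cont)  # carry apcontinue/apfrom into the next request
```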