Re: [Wikitech-l] Proposal: slight change to the XML dump format

2014-10-29 Thread Andrew Dunbar
I noticed that the dump format version number went from 0.9 to 0.10. I wonder if this format is documented somewhere or if some code might expect 1.0? Andrew Dunbar (hippietrail) On 28 October 2014 20:45, Daniel Kinzler dan...@brightbyte.de wrote: On 27.10.2014 21:58, Ariel T. Glenn wrote

Re: [Wikitech-l] Distinguishing disambiguation pages

2012-12-26 Thread Andrew Dunbar
It would also be great if these pages were marked in the dump files too. They should be marked in exactly the same way as redirect pages are. On 27 December 2012 01:41, Brad Jorsch bjor...@wikimedia.org wrote: On Tue, Dec 25, 2012 at 6:00 AM, Liangent liang...@gmail.com wrote: Is this

Re: [Wikitech-l] HTML wikipedia dumps: Could you please provide them, or make public the code for interpreting templates?

2012-09-09 Thread Andrew Dunbar
this will clean up the code to the point that making your own parser becomes a lot easier. Good luck and sympathy (-: Andrew Dunbar (hippietrail) ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Order of execution JavaScript extensions

2012-06-06 Thread Andrew Dunbar
setting a breakpoint on the node in Google Chrome's dev tools and reloading the page, but it is never triggered. Apologies if this is not the right mailing list. None of the lists seemed fit according to http://www.mediawiki.org/wiki/Mailing_lists Andrew Dunbar (hippietrail

Re: [Wikitech-l] Order of execution JavaScript extensions

2012-06-06 Thread Andrew Dunbar
On 6 June 2012 13:57, Bergi a.d.be...@web.de wrote: Andrew Dunbar wrote: I'm having trouble getting a simple one-line User JS working on Wiktionary. Apologies if this is not the right mailing list. None of the lists seemed fit according to http://www.mediawiki.org/wiki/Mailing_lists I

Re: [Wikitech-l] Visual watchlist

2012-05-18 Thread Andrew Dunbar
interested in. 3) The history page give me a way to get a diff of all changes since my last edit, rather than just the most recent change. Andrew Dunbar (hippietrail) 4. 0 is colored grey making it disappear from the list. But that does not mean the article never changed, it could be +400

Re: [Wikitech-l] search=steven+tyler gets Steven_tyler

2011-05-14 Thread Andrew Dunbar
the (Redirected from X) bar that accompanies the redirects The JavaScript we use on the English Wiktionary also makes a slightly different (Automatically redirected from X) bar, or something very similar. Andrew Dunbar (hippietrail)

Re: [Wikitech-l] search=steven+tyler gets Steven_tyler

2011-05-13 Thread Andrew Dunbar
of seconds. With the different nature of Wikipedia titles you would probably want to check sentence case and title case but would still miss quite a few where only proper nouns within the title are capitalized. And some people would probably hate such a feature too (-: Andrew Dunbar (hippietrail

Re: [Wikitech-l] search=steven+tyler gets Steven_tyler

2011-05-13 Thread Andrew Dunbar
never been enough interest and it's never been important enough and no developer has ever stepped up. It would take a bit of work to implement. Andrew Dunbar (hippietrail) 2011/5/12 Carl (CBM) cbm.wikipe...@gmail.com On Fri, May 13, 2011 at 12:25 AM, Jay Ashworth j...@baylink.com wrote: They're

Re: [Wikitech-l] search=steven+tyler gets Steven_tyler

2011-05-13 Thread Andrew Dunbar
of the canonicalization. Andrew Dunbar (hippietrail) Some projects, like probably all Wiktionaries, would doubtless not want case-folding at all, so we should support different canonicalization algorithms.  Even the ones that don't want case-folding could still benefit from allowing underscores

Re: [Wikitech-l] Licensing (Was: WYSIWYG and parser plans)

2011-05-03 Thread Andrew Dunbar
license this hypothetical code would be released under. - Trevor I'm pretty sure the offline wikitext parsing community would care about the licensing as a separate issue to what kind of parser technology it uses internally. Andrew Dunbar (hippietrail) On Tue, May 3, 2011 at 1:25 PM, David

Re: [Wikitech-l] Licensing (Was: WYSIWYG and parser plans)

2011-05-03 Thread Andrew Dunbar
readers want as close results to the official sites as possible so will want to implement the same hooks. Other non-wikitext or non-page data from the database would also go into the same interface/abstraction, or a separate one. Andrew Dunbar (hippietrail) By having this available as a parser

Re: [Wikitech-l] WYSIWYG and parser plans (was What is wrong with Wikia's WYSIWYG?)

2011-05-03 Thread Andrew Dunbar
This is the single most exciting news on the MediaWiki front since I started contributing to Wiktionary nine years ago (-: Andrew Dunbar (hippietrail) -- Tim Starling

Re: [Wikitech-l] Moving the Dump Process to another language

2011-03-25 Thread Andrew Dunbar
provides the ordering info for the people that require it. Andrew Dunbar (hippietrail) Ariel -- James Linden kodekr...@gmail.com

Re: [Wikitech-l] [Foundation-l] Data Summit Streaming

2011-02-11 Thread Andrew Dunbar
It doesn't work for me )-: Your input can't be opened: VLC is unable to open the MRL 'http://transcode1.wikimedia.org:8080'. Check the log for details. Andrew Dunbar (hippietrail)

Re: [Wikitech-l] [Foundation-l] Data Summit Streaming

2011-02-11 Thread Andrew Dunbar
On 11 February 2011 22:18, Chad innocentkil...@gmail.com wrote: On Fri, Feb 11, 2011 at 5:57 AM, Andrew Dunbar hippytr...@gmail.com wrote: It doesn't work for me )-: Your input can't be opened: VLC is unable to open the MRL 'http://transcode1.wikimedia.org:8080'. Check the log for details

Re: [Wikitech-l] Matching main namespace articles with associated talk page

2011-01-09 Thread Andrew Dunbar
don't have one. This would allow you read access to a live replica of Wikipedia's database, which of course has all these indexes. You don't even have to use a B-Tree if that's beyond you. I just sort the titles and then use a binary search on them. Plenty fast even in Perl and JavaScript. Andrew
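A minimal sketch of the sort-then-binary-search idea described above, in Python rather than the Perl or JavaScript mentioned. The function names and the "Talk:" prefix convention are assumptions made for illustration, not taken from the original message.

```python
from bisect import bisect_left

def build_index(titles):
    """Sort the titles once so they can be binary-searched afterwards."""
    return sorted(titles)

def has_title(index, title):
    """Binary search: True if `title` occurs in the sorted list `index`."""
    i = bisect_left(index, title)
    return i < len(index) and index[i] == title

def has_talk_page(index, title):
    # Assumption: a talk page is titled "Talk:" + the main-namespace title.
    return has_title(index, "Talk:" + title)
```

Sorting costs O(n log n) once; each lookup afterwards is O(log n), which is why no B-Tree is needed for a static list of titles.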

Re: [Wikitech-l] Big problem to solve: good WYSIWYG on WMF wikis

2011-01-03 Thread Andrew Dunbar
On 3 January 2011 21:54, Andreas Jonsson andreas.jons...@kreablo.se wrote: 2010-12-29 08:33, Andrew Dunbar wrote: I've thought a lot about this too. It certainly is not any type of standard grammar. But on the other hand it is a pretty common kind of nonstandard grammar. I call it a recursive

Re: [Wikitech-l] Big problem to solve: good WYSIWYG on WMF wikis

2010-12-28 Thread Andrew Dunbar
cases would be easier to locate. Andrew Dunbar (hippietrail) Those are all standard gripes, and nothing new or exciting.  There are also, to quote a much-abused former world leader, some known unknowns: 1) we don't know how to explain What You See when you parse wikitext except by prodding

[Wikitech-l] Offline wiki tools

2010-12-15 Thread Andrew Dunbar
programming experience negligible. (I'm also interested in hearing from other people working on offline tools for dump files, wikitext parsing, or Wiktionary) Andrew Dunbar (hippietrail)

Re: [Wikitech-l] Offline wiki tools

2010-12-15 Thread Andrew Dunbar
2010/12/16 Ángel González keis...@gmail.com: On 15/12/10 16:21, Andrew Dunbar wrote: I've long been interested in offline tools that make use of Wikimedia information, particularly the English Wiktionary. I've recently come across a tool which can provide random access to a bzip2 archive

Re: [Wikitech-l] Offline wiki tools

2010-12-15 Thread Andrew Dunbar
On 15 December 2010 20:41, Anthony wikim...@inbox.org wrote: On Wed, Dec 15, 2010 at 12:01 PM, Andrew Dunbar hippytr...@gmail.com wrote: By the way I'm keen to find something similar for .7z I've written something similar for .xz, which uses LZMA2 same as .7z. It creates a virtual read-only

Re: [Wikitech-l] Offline wiki tools

2010-12-15 Thread Andrew Dunbar
be watching it. Where do the HTML dumps come from? I'm pretty sure I've only seen static for Wikipedia and not for Wiktionary for example. I am also looking at adapting the parser for offline use to generate HTML from the dump file wikitext. Andrew Dunbar (hippietrail) http://openzim.org

Re: [Wikitech-l] require language dump for developing words and corresponding frequency

2010-12-14 Thread Andrew Dunbar
://ur.wikipedia.org If you'd like to use it I have a tool that downloads random samples of wiki pages and strips the HTML for purposes such as this. Good luck! Andrew Dunbar (hippietrail) On 14 December 2010 18:36, pravin@gmail.com pravin@gmail.com wrote: Hi All,  I am Pravin Satpute, I am

Re: [Wikitech-l] How to find the version of a dump

2010-12-14 Thread Andrew Dunbar
should be between Feb to June. A Google search hints that enwiki-20100312-pages-articles.xml.bz2 might be the one with size 6117881141. Andrew Dunbar (hippietrail) Does anybody remember the version between this period, or happened to download the same version with me? Thanks very much

Re: [Wikitech-l] How to find the version of a dump

2010-12-14 Thread Andrew Dunbar
On 14 December 2010 20:04, Andrew Dunbar hippytr...@gmail.com wrote: On 14 December 2010 01:57, Monica shu monicashu...@gmail.com wrote: Thanks Diederik and Waksman, It seems that I need to parse the dump for article data to get this piece of information... Yes, this will be the last

[Wikitech-l] Looking for a mediawiki.org dump

2010-12-05 Thread Andrew Dunbar
Could anybody help me locate a dump of mediawiki.org while the dump server is broken please? I only need current revisions. Thanks in advance. Andrew Dunbar (hippietrail)

Re: [Wikitech-l] alternative way to get wikipedia dump while server is down

2010-11-27 Thread Andrew Dunbar
I don't suppose anybody has a copy of any Romanian or Georgian Wiktionary from any time? (-: Andrew Dunbar (hippietrail

[Wikitech-l] Invoking maintenance scripts return nothing at all

2010-11-16 Thread Andrew Dunbar
, wrapped etc. Am I missing something obvious or do these scripts return no errors by design? Andrew Dunbar (hippietrail) -- http://wiktionarydev.leuksman.com http://linguaphile.sf.net

Re: [Wikitech-l] Invoking maintenance scripts return nothing at all

2010-11-16 Thread Andrew Dunbar
On 17 November 2010 02:37, Dmitriy Sintsov ques...@rambler.ru wrote: * Andrew Dunbar hippytr...@gmail.com [Tue, 16 Nov 2010 23:01:33 +1100]: I wish to do some MediaWiki hacking which uses the codebase, specifically the parser, but not the database or web server. I'm running on Windows XP

Re: [Wikitech-l] API vs data dumps

2010-11-07 Thread Andrew Dunbar
to happen. If you'd like to collaborate or anyone else for that matter it would be pretty cool. You'll find my stuff on the Toolserver: https://fisheye.toolserver.org/browse/enwikt Andrew Dunbar (hippietrail) -- http://wiktionarydev.leuksman.com http://linguaphile.sf.net

Re: [Wikitech-l] Datamining infoboxes

2009-10-25 Thread Andrew Dunbar
2009/10/23 Aryeh Gregor simetrical+wikil...@gmail.com: On Fri, Oct 23, 2009 at 12:20 PM, Andrew Dunbar hippytr...@gmail.com wrote: Yes I didn't specify tl_namespace In MySQL that will usually make it impossible to effectively use an index on (tl_namespace, tl_title), so it's essential

Re: [Wikitech-l] Datamining infoboxes

2009-10-23 Thread Andrew Dunbar
2009/10/23 Robert Ullmann rlullm...@gmail.com: I've been spending hours on the parsing now and don't find it simple at all due to the fact that templates can be nested. Just extracting the Infobox as one big lump is hard due to the need to match nested {{ and }} Andrew Dunbar (hippietrail
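One way to handle the nested {{ and }} problem described above is to count brace depth while scanning. The following Python is only a sketch under simplifying assumptions: the function name is hypothetical, and it ignores triple-brace parameters, <nowiki> sections, and comments, all of which real wikitext can contain.

```python
def extract_template(wikitext, name):
    """Return the first {{name ...}} call including nested templates, or None."""
    start = wikitext.find("{{" + name)
    if start == -1:
        return None
    depth = 0
    i = start
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1          # entering a (possibly nested) template
            i += 2
        elif pair == "}}":
            depth -= 1          # leaving a template
            i += 2
            if depth == 0:      # the outermost call has just closed
                return wikitext[start:i]
        else:
            i += 1
    return None                 # braces never balanced
```

A plain regular expression cannot do this reliably because the nesting depth is unbounded, which is why the sketch keeps an explicit counter.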

[Wikitech-l] Datamining infoboxes

2009-10-22 Thread Andrew Dunbar
using either the Toolserver's Wikipedia database or the MediaWiki API have not been fruitful. In particular, SQL queries on the templatelinks table are intractably slow. Why are there no keys on tl_from or tl_title? Andrew Dunbar (hippietrail) -- http://wiktionarydev.leuksman.com http

Re: [Wikitech-l] sharing an article on Facebook

2009-10-17 Thread Andrew Dunbar
non-Latin URLs just as the modern browsers do. Andrew Dunbar (hippietrail) -- אמיר אלישע אהרוני Amir Elisha Aharoni http://aharoni.wordpress.com We're living in pieces, I want to live in peace. - T. Moore

Re: [Wikitech-l] Wiktionary API acceptable use policy

2009-09-01 Thread Andrew Dunbar
quite a large number of requests, so we thought we should check with you first. Is that acceptable use? Another approach is to download the Wiktionary dump archive to parse offline: http://download.wikipedia.org/enwiktionary/latest/enwiktionary-latest-pages-articles.xml.bz2 Andrew Dunbar
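For the offline approach suggested above, a rough Python sketch of streaming pages out of a downloaded pages-articles .xml.bz2 file might look like this. The function names are hypothetical, and the sketch assumes each <page> holds one <title> and one current-revision <text>, as in pages-articles dumps.

```python
import bz2
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]

def iter_pages(stream):
    """Yield (title, text) for each <page> element read from an XML stream."""
    title, text = None, None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # release the page's subtree to limit memory use

def iter_dump(path):
    """Decompress the .bz2 dump at `path` on the fly and yield its pages."""
    with bz2.open(path, "rb") as stream:
        yield from iter_pages(stream)
```

Because the file is decompressed and parsed as a stream, the whole dump never has to fit in memory, and no API requests are made at all.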

Re: [Wikitech-l] Extensions in Bugzilla

2009-07-31 Thread Andrew Dunbar
DidYouMean is mine Andrew Dunbar (hippietrail) -- http://wiktionarydev.leuksman.com http://linguaphile.sf.net

Re: [Wikitech-l] URLs that aren't cool...

2009-07-29 Thread Andrew Dunbar
titles which it consulted when displaying a page. The code for display was designed for compatibility with the then-current Wiktionary templates and would need to be implemented in a more general way. A core version would probably just add a field to the existing table. Andrew Dunbar (hippietrail

Re: [Wikitech-l] Bugzilla Weekly Report

2009-07-27 Thread Andrew Dunbar
than created. Congratulations to everyone responsible! :) Could it be due to the new known to fail logic? Andrew Dunbar (hippietrail)

Re: [Wikitech-l] Minify

2009-06-26 Thread Andrew Dunbar
the pluses and minuses of that be? Andrew Dunbar (hippietrail) -Robert Rohde -- http://wiktionarydev.leuksman.com http

Re: [Wikitech-l] Different apostrophe signs and MediaWiki internal search

2009-06-23 Thread Andrew Dunbar
have definitely seen edits on Wikipedia where people were correcting various kinds of hyphens and dashes. And of course while the English Wikipedia forbids curved quotes each other wiki may well have its own policy. Andrew Dunbar (hippietrail) -- brion

Re: [Wikitech-l] Extending wikilinks syntax

2009-06-20 Thread Andrew Dunbar
syntax. Andrew Dunbar (hippietrail) — Kalan ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l -- http://wiktionarydev.leuksman.com http://linguaphile.sf.net

Re: [Wikitech-l] Different apostrophe signs and MediaWiki internal search

2009-06-20 Thread Andrew Dunbar
2009/6/20 Neil Harris use...@tonal.clara.co.uk: Neil Harris wrote: Andrew Dunbar wrote: 2009/6/20 Jaska Zedlik jz5...@gmail.com: Hello, On Fri, Jun 19, 2009 at 20:31, Rolf Lampa rolf.la...@rilnet.com wrote: Jaska Zedlik wrote: ... The code of the override function is the following