Re: [Wikitech-l] Proposal: slight change to the XML dump format

2014-10-29 Thread Andrew Dunbar
I noticed that the dump format version number went from "0.9" to "0.10". I wonder if this format is documented somewhere or if some code might expect "1.0"? Andrew Dunbar (hippietrail) On 28 October 2014 20:45, Daniel Kinzler wrote: > Am 27.10.2014 21:58, schr
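The jump from "0.9" to "0.10" can trip up code that compares the version as a decimal number or as a plain string, since both orderings put "0.10" before "0.9". A minimal sketch, assuming the version is a dotted string of integers; `parse_version` is a hypothetical helper for illustration, not part of any dump tooling:

```python
def parse_version(version: str) -> tuple:
    """Split a dotted version string such as "0.10" into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


# Comparing as floats or as strings orders the two versions wrongly...
print(float("0.10") > float("0.9"))  # False
print("0.10" > "0.9")                # False

# ...while comparing component by component orders them correctly.
print(parse_version("0.10") > parse_version("0.9"))  # True
```

A consumer that only checks for an exact, known version string would instead need "0.10" added to its list.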

Re: [Wikitech-l] Distinguishing disambiguation pages

2012-12-26 Thread Andrew Dunbar
It would also be great if these pages were marked in the dump files too. It should be exactly the same way as how redirect pages are marked. On 27 December 2012 01:41, Brad Jorsch wrote: > On Tue, Dec 25, 2012 at 6:00 AM, Liangent wrote: > > Is this enough? > > > > api.php?action=query&prop=p

Re: [Wikitech-l] HTML wikipedia dumps: Could you please provide them, or make public the code for interpreting templates?

2012-09-09 Thread Andrew Dunbar
you are making an offline app you either need to parse the wiki pages into HTML pages offline yourself, or include parsing code in your app. You are not the first to want this, but due to the nature and complexity of the markup, which includes "parser functions", and the parser, this is not trivial. The only parser that is guaranteed to parse MediaWiki markup is MediaWiki, but the parser is tied to other code. There is an open feature request to separate this code so apps like yours can take just the part of the rendering code you need, or translate that part of the code into another programming language. Bug 25984 - Isolate parser from database dependencies https://bugzilla.wikimedia.org/show_bug.cgi?id=25984 Nobody at Wikimedia is working on this, but there are some patches from other people that will certainly get you on your way. But the developers at Wikimedia are very busy making a whole new parser and WYSIWYG editor to go with it. Hopefully this will clean up the code to the point that making your own parser becomes a lot easier. Good luck and sympathy (-: Andrew Dunbar (hippietrail) ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Order of execution JavaScript "extensions"

2012-06-06 Thread Andrew Dunbar
On 6 June 2012 13:57, Bergi wrote: > Andrew Dunbar schrieb: > >> I'm having trouble getting a simple one-line User JS working on >> Wiktionary. > > >> Apologies if this is not the right mailing list. None of the lists >> seemed fit according to http://www

[Wikitech-l] Order of execution JavaScript "extensions"

2012-06-06 Thread Andrew Dunbar
I'm not sure how to check that. I have tried setting a breakpoint on the node in Google Chrome's dev tools and reloading the page, but it is never triggered. Apologies if this is not the right mailing list. None of the lists seemed fit according to http://www.media

Re: [Wikitech-l] Visual watchlist

2012-05-17 Thread Andrew Dunbar
ones were in languages I am interested in. 3) The history page gives me a way to get a diff of all changes since my last edit, rather than just the most recent change. Andrew Dunbar (hippietrail) >   4. 0 is colored grey making it disappear from the list. But that does >   not mean the articl

Re: [Wikitech-l] List of ISO 639-1/3 language codes and names in MW core?

2011-09-06 Thread Andrew Dunbar
language code data from official sources, and de facto data from MediaWiki sites and APIs and provides various ways to query it via JSON. It got a bit messy with just me hacking it. It's used behind the scenes in some Wiktionary stuff but I don't think anybody else took to it. It or

Re: [Wikitech-l] search=steven+tyler gets Steven_tyler

2011-05-14 Thread Andrew Dunbar
show the "(Redirected from X)" bar that > accompanies the redirects The JavaScript we use on the English Wiktionary also makes a slightly different "(Automatically redirected from X)" bar, or something very similar. Andrew Dunbar (hippietrail) >

Re: [Wikitech-l] search=steven+tyler gets Steven_tyler

2011-05-13 Thread Andrew Dunbar
anguages of Central Asia. One solution is to do accent/diacritic normalization too as part of the canonicalization. Andrew Dunbar (hippietrail) > Some projects, like probably all Wiktionaries, would doubtless not > want case-folding at all, so we should support different > canonic
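Accent/diacritic normalization as part of canonicalization could look roughly like the following. This is a sketch under stated assumptions, not MediaWiki's behaviour: `fold_title` and its `fold_case` switch are hypothetical names, with the switch reflecting the quoted remark that some projects would not want case-folding at all.

```python
import unicodedata


def fold_title(title: str, fold_case: bool = True) -> str:
    """Build a lookup key: decompose, drop combining marks, optionally case-fold."""
    # NFD splits a precomposed letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", title)
    # Keep only characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    key = unicodedata.normalize("NFC", stripped)
    return key.casefold() if fold_case else key


print(fold_title("Éire"))                   # eire
print(fold_title("Éire", fold_case=False))  # Eire
```

Letters that have no canonical decomposition, such as "ø" or "ł", pass through unchanged, so a real implementation would need an extra mapping table for them.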

Re: [Wikitech-l] search=steven+tyler gets Steven_tyler

2011-05-13 Thread Andrew Dunbar
ve" and "first-letter". But there's never been enough interest and it's never been important enough and no developer has ever stepped up. It would take a bit of work to implement. Andrew Dunbar (hippietrail) > 2011/5/12 Carl (CBM) > >> On Fri, May 13, 2011 at

Re: [Wikitech-l] search=steven+tyler gets Steven_tyler

2011-05-13 Thread Andrew Dunbar
the rest lowercase. If one of those exists it automatically redirects after a couple of seconds. With the different nature of Wikipedia titles you would probably want to check sentence case and title case but would still miss quite a few where only proper nouns within the title are capitalized. An
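The case-variant check described here can be sketched as a function that produces the candidate titles to look up. This is an illustration only, not the Wiktionary JavaScript itself; `case_variants` is a hypothetical name, and it assumes titles use spaces between words.

```python
def case_variants(title: str) -> list:
    """Return all-lowercase, sentence-case and title-case variants that differ from the input."""
    words = title.split()
    if not words:
        return []
    lower = " ".join(word.lower() for word in words)
    sentence = lower[:1].upper() + lower[1:]
    title_case = " ".join(word[:1].upper() + word[1:].lower() for word in words)
    variants = []
    for candidate in (lower, sentence, title_case):
        # Skip the title we already have and any duplicates.
        if candidate != title and candidate not in variants:
            variants.append(candidate)
    return variants


print(case_variants("steven tyler"))  # ['Steven tyler', 'Steven Tyler']
```

As the message notes, this still misses titles where only some words (proper nouns) are capitalized; covering those would mean trying many more combinations or consulting a normalized-title index.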

Re: [Wikitech-l] WYSIWYG and parser plans (was What is wrong with Wikia's WYSIWYG?)

2011-05-03 Thread Andrew Dunbar
Doing this would leverage the MediaWiki development community and the > existing PHP codebase to provide a well-maintained, reusable reference > parser for MediaWiki wikitext. +1 This is the single most exciting news on the MediaWiki front since I started contributing to Wiktionary nine ye

Re: [Wikitech-l] Licensing (Was: WYSIWYG and parser plans)

2011-05-03 Thread Andrew Dunbar
d format or direct from an XML dump file. Some datamining tools might just stub this interface and deal with the bare minimum. Extension hooks are more interesting. I might assume offline readers want as close results to the official sites as possible so will want to implement the same hooks. Othe

Re: [Wikitech-l] Licensing (Was: WYSIWYG and parser plans)

2011-05-03 Thread Andrew Dunbar
at can convert such AST to HTML? Because of the semantic soup nobody has even brought this up yet. > So, it's probably not an issue what license this hypothetical code would be > released under. > > - Trevor I'm pretty sure the offline wikitext parsing community would care abo

Re: [Wikitech-l] Moving the Dump Process to another language

2011-03-25 Thread Andrew Dunbar
* a way to "fold previous content into the current dumps" that consists > of making a straight copy of what's on disk with no processing.  (What > do we do if something has been deleted or moved, or is corrupt?  The > existing format isn't friendly to those cases.) >

Re: [Wikitech-l] [Foundation-l] Data Summit Streaming

2011-02-11 Thread Andrew Dunbar
On 11 February 2011 22:18, Chad wrote: > On Fri, Feb 11, 2011 at 5:57 AM, Andrew Dunbar wrote: >> It doesn't work for me )-: >> >> Your input can't be opened: >> VLC is unable to open the MRL 'http://transcode1.wikimedia.org:8080'. >> Check

Re: [Wikitech-l] [Foundation-l] Data Summit Streaming

2011-02-11 Thread Andrew Dunbar
It doesn't work for me )-: Your input can't be opened: VLC is unable to open the MRL 'http://transcode1.wikimedia.org:8080'. Check the log for details. Andrew Dunbar (hippietrail)

Re: [Wikitech-l] Matching main namespace articles with associated talk page

2011-01-09 Thread Andrew Dunbar
e yourself, you might want to > look into getting a toolserver account, if you don't have one.  This > would allow you read access to a live replica of Wikipedia's database, > which of course has all these indexes. You don't even have to use a B-Tree if that's beyond you.

Re: [Wikitech-l] Big problem to solve: good WYSIWYG on WMF wikis

2011-01-03 Thread Andrew Dunbar
On 3 January 2011 21:54, Andreas Jonsson wrote: > 2010-12-29 08:33, Andrew Dunbar skrev: >> I've thought a lot about this too. It certainly is not any type of >> standard grammar. But on the other hand it is a pretty common kind of >> nonstandard grammar. I call it a

Re: [Wikitech-l] Big problem to solve: good WYSIWYG on WMF wikis

2010-12-28 Thread Andrew Dunbar
ch a grammar deterministically into an LALR grammar... But even if not I'm certain it would demystify what happens in the parser so that problems and edge cases would be easier to locate. Andrew Dunbar (hippietrail) > Those are all standard gripes, and nothing new or exciting.  There are

Re: [Wikitech-l] Offline wiki tools

2010-12-15 Thread Andrew Dunbar
interesting and I'll be watching it. Where do the HTML dumps come from? I'm pretty sure I've only seen "static" for Wikipedia and not for Wiktionary for example. I am also looking at adapting the parser for offline use to generate HTML from the dump file wikitext. Andrew Du

Re: [Wikitech-l] Offline wiki tools

2010-12-15 Thread Andrew Dunbar
On 15 December 2010 20:41, Anthony wrote: > On Wed, Dec 15, 2010 at 12:01 PM, Andrew Dunbar wrote: >> By the way I'm keen to find something similar for .7z > > I've written something similar for .xz, which uses LZMA2 same as .7z. > It creates a virtual read-only fil

Re: [Wikitech-l] Offline wiki tools

2010-12-15 Thread Andrew Dunbar
2010/12/16 Ángel González : > On 15/12/10 16:21, Andrew Dunbar wrote: >> I've long been interested in offline tools that make use of WikiMedia >> information, particularly the English Wiktionary. >> >> I've recently come across a tool which can provide random

[Wikitech-l] Offline wiki tools

2010-12-15 Thread Andrew Dunbar
erience is now quite stale and my 64-bit programming experience negligible. (I'm also interested in hearing from other people working on offline tools for dump files, wikitext parsing, or Wiktionary) Andrew Dunbar (hippietrail)

Re: [Wikitech-l] How to find the version of a dump

2010-12-14 Thread Andrew Dunbar
On 14 December 2010 20:04, Andrew Dunbar wrote: > On 14 December 2010 01:57, Monica shu wrote: >> Thanks Diederik and Waksman, >> >> It seems that I need to parse the dump for article data to get this piece >> of information... >> Yes, this will be the las

Re: [Wikitech-l] How to find the version of a dump

2010-12-14 Thread Andrew Dunbar
on 2010-01-30 as Waksman said, my version should be between Feb to > June. A Google search hints that enwiki-20100312-pages-articles.xml.bz2 might be the one with size 6117881141. Andrew Dunbar (hippietrail) > Does anybody remember the version between this period, or happened to > downlo

Re: [Wikitech-l] require language dump for developing words and corresponding frequency

2010-12-14 Thread Andrew Dunbar
://ur.wikipedia.org If you'd like to use it I have a tool that downloads random samples of wiki pages and strips the HTML for purposes such as this. Good luck! Andrew Dunbar (hippietrail) On 14 December 2010 18:36, pravin@gmail.com wrote: > Hi All, > >  I am Pravin Satpute, I am worki

[Wikitech-l] Looking for a mediawiki.org dump

2010-12-05 Thread Andrew Dunbar
Could anybody help me locate a dump of mediawiki.org while the dump server is broken please? I only need current revisions. Thanks in advance. Andrew Dunbar (hippietrail)

Re: [Wikitech-l] alternative way to get wikipedia dump while server is down

2010-11-27 Thread Andrew Dunbar
ter.ac.uk<mailto:schmidt...@email.ulster.ac.uk>

Re: [Wikitech-l] Invoking maintenance scripts return nothing at all

2010-11-16 Thread Andrew Dunbar
On 17 November 2010 02:37, Dmitriy Sintsov wrote: > * Andrew Dunbar [Tue, 16 Nov 2010 23:01:33 > +1100]: >> I wish to do some MediaWiki hacking which uses the codebase, >> specifically the parser, but not the database or web server. >> I'm running on Windows XP o

[Wikitech-l] Invoking maintenance scripts return nothing at all

2010-11-16 Thread Andrew Dunbar
needs to be stubbed, wrapped etc. Am I missing something obvious or do these scripts return no errors by design? Andrew Dunbar (hippietrail) -- http://wiktionarydev.leuksman.com http://linguaphile.sf.net

Re: [Wikitech-l] API vs data dumps

2010-11-07 Thread Andrew Dunbar
best way to store all the index files I create, especially in code that I would like to share with other people. If you, or anyone else for that matter, would like to collaborate, it would be pretty cool. You'll find my stuff on the Toolserver: http

Re: [Wikitech-l] Unicode equivalence

2009-12-01 Thread Andrew Dunbar
h had fonts designed for a different sequence of letters and modifiers. The code would logically belong in the same place I suspect. See: Unicode normalization "sorts" Hebrew/Arabic/Myanmar vowels wrongly https://bugzilla.wikimedia.org/show_bug.cgi?id=2399 http://www.mediawiki.or

Re: [Wikitech-l] Datamining infoboxes

2009-10-25 Thread Andrew Dunbar
2009/10/23 Aryeh Gregor : > On Fri, Oct 23, 2009 at 12:20 PM, Andrew Dunbar wrote: >> Yes I didn't specify tl_namespace > > In MySQL that will usually make it impossible to effectively use an > index on (tl_namespace, tl_title), so it's essential that you specify
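The point about the composite index can be illustrated with a small experiment. This is a sketch under stated assumptions: it uses SQLite from Python's standard library rather than MySQL, a simplified `templatelinks` table with only three columns, a made-up index name `tl_ns_title`, and a hypothetical template title; namespace 10 is MediaWiki's Template namespace. Query planners differ, but the leftmost-prefix behaviour shown is the same idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE templatelinks (tl_from INTEGER, tl_namespace INTEGER, tl_title TEXT)"
)
# Composite index: tl_namespace is the leading column.
conn.execute("CREATE INDEX tl_ns_title ON templatelinks (tl_namespace, tl_title)")


def plan(sql: str, params: tuple) -> str:
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Both index columns constrained: the planner can search the index.
with_ns = plan(
    "SELECT tl_from FROM templatelinks WHERE tl_namespace = ? AND tl_title = ?",
    (10, "Infobox_person"),
)
# Leading column missing: the index cannot be searched by tl_title alone.
without_ns = plan(
    "SELECT tl_from FROM templatelinks WHERE tl_title = ?",
    ("Infobox_person",),
)
print(with_ns)
print(without_ns)
```

The exact wording of the plans varies between SQLite versions, so the output is not reproduced here; the first plan names the index and the second does not.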

Re: [Wikitech-l] Datamining infoboxes

2009-10-25 Thread Andrew Dunbar
lly amazing and I think I'm going to be learning the query language and possibly spending some time with dbpedia. Have you thought about doing the same for Wiktionary? Andrew Dunbar (hippietrail) > On Fri, Oct 23, 2009 at 18:20, Andrew Dunbar wrote: >> 2009/10/23 Aryeh Gregor : >

Re: [Wikitech-l] Datamining infoboxes

2009-10-23 Thread Andrew Dunbar
2009/10/23 Aryeh Gregor : > On Fri, Oct 23, 2009 at 8:27 AM, Andrew Dunbar wrote: >> Yes I found how to get it through the API now. It was actually just >> the Toolserver database that was intractably slow. > > There's nothing slow about the TS database here: > >

Re: [Wikitech-l] Datamining infoboxes

2009-10-23 Thread Andrew Dunbar
2009/10/23 Robert Ullmann : >> I've been spending hours on the parsing now and don't find it simple >> at all due to the fact that templates can be nested. Just extracting >> the Infobox as one big lump is hard due to the need to match nested {{ >> and }} >

Re: [Wikitech-l] Datamining infoboxes

2009-10-23 Thread Andrew Dunbar
s on the parsing now and don't find it simple at all due to the fact that templates can be nested. Just extracting the Infobox as one big lump is hard due to the need to match nested {{ and }} Andrew Dunbar (hippietrail) > Oh, and do remember to look for "
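Matching nested {{ and }} needs a depth counter rather than a regular expression. A minimal sketch, assuming plain wikitext input; `extract_template` is a hypothetical helper, the name match is case-sensitive, and triple-brace parameters, `<nowiki>` sections and HTML comments are deliberately ignored:

```python
def extract_template(wikitext: str, name: str):
    """Return the first {{name ...}} call, including nested templates, or None."""
    start = wikitext.find("{{" + name)
    if start == -1:
        return None
    depth = 0
    i = start
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                # The closing braces of the outermost template were just consumed.
                return wikitext[start:i]
        else:
            i += 1
    return None  # braces never balanced


sample = "Intro {{Infobox person|name=X|born={{birth date|1950|1|1}}|x=y}} rest {{other}}"
print(extract_template(sample, "Infobox"))
```

Once the whole template is isolated, splitting it into parameters needs the same depth tracking so that a `|` inside a nested template is not treated as a separator.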

Re: [Wikitech-l] Datamining infoboxes

2009-10-22 Thread Andrew Dunbar
't generate any such external links and probably couldn't very easily... But I have just discovered the rvgeneratexml parameter to action=query&prop=revisions. This includes a field for each template parameter with a and a for each... Andrew Dunbar (hippietrail) > [[User:Dschwen]] >
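A request using this parameter could be built along these lines. This is a sketch with assumptions: `parse_tree_url` is a hypothetical helper, the endpoint and page title are examples, and `rvprop=content` is included on the understanding that the parse tree was returned alongside the revision content. The parameter dates from the MediaWiki of the time and may be deprecated or unavailable on current versions.

```python
from urllib.parse import urlencode


def parse_tree_url(api_base: str, title: str) -> str:
    """Build an action=query URL asking for revisions with an XML parse tree."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvgeneratexml": "1",
        "titles": title,
        "format": "json",
    }
    return api_base + "?" + urlencode(params)


# Example endpoint and title, chosen only for illustration.
print(parse_tree_url("https://en.wikipedia.org/w/api.php", "Douglas Adams"))
```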

[Wikitech-l] Datamining infoboxes

2009-10-22 Thread Andrew Dunbar
using either the Toolserver's Wikipedia database or the MediaWiki API have not been fruitful. In particular, SQL queries on the templatelinks table are intractably slow. Why are there no keys on tl_from or tl_title? Andrew Dunbar (hippietrail) -- http://wiktionarydev.leuksman.com

Re: [Wikitech-l] sharing an article on Facebook

2009-10-16 Thread Andrew Dunbar
a user-friendly improvement for Facebook to interpret non-Latin URLs just as the modern browsers do. Andrew Dunbar (hippietrail) > -- > אמיר אלישע אהרוני > Amir Elisha Aharoni > > http://aharoni.wordpress.com > > "We're living in pieces, > I want to liv

Re: [Wikitech-l] Wiktionary API acceptable use policy

2009-09-01 Thread Andrew Dunbar
ive proper attribution, etc., but it is > possible that we will eventually be making quite a large number of requests, > so we thought we should check with you first. > > Is that acceptable use? Another approach is to download the Wiktionary dump archive to parse offline: http://down

Re: [Wikitech-l] Extensions in Bugzilla

2009-07-31 Thread Andrew Dunbar
DidYouMean is mine. Andrew Dunbar (hippietrail) -- http://wiktionarydev.leuksman.com http://linguaphile.sf.net

Re: [Wikitech-l] URLs that aren't cool...

2009-07-28 Thread Andrew Dunbar
YouMean http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/DidYouMean/ https://bugzilla.wikimedia.org/show_bug.cgi?id=8648 It hooked all the ways to create, delete, or move a page in order to maintain a separate table of normalized page titles, which it consulted when displaying a page. The code for d

Re: [Wikitech-l] Bugzilla Weekly Report

2009-07-27 Thread Andrew Dunbar
re bugs have > been resolved than created. Congratulations to everyone responsible! :) Could it be due to the new "known to fail" logic? Andrew Dunbar (hippietrail)

Re: [Wikitech-l] Simple way to convert XML to HTML

2009-07-26 Thread Andrew Dunbar
it... I always thought it would be much more useful to generate the HTML of action=render for every page rather than the action=view with the HTML for one specific skin a million or so times, which is then a pain to parse out if you want to do anything other than open the HTML in a browser. (-: Andr

Re: [Wikitech-l] Minify

2009-06-26 Thread Andrew Dunbar
bandwidth constrained > sites. This sounds great but I have a problem with making action=raw return something that is not raw. For MediaWiki I think it would be better to add a new action=minify. What would the pluses and minuses of that be? Andrew Dunbar (hippietrail) > -Robert R

Re: [Wikitech-l] Different apostrophe signs and MediaWiki internal search

2009-06-23 Thread Andrew Dunbar
can insert characters not directly on the keyboard. And cutting and pasting from web pages where the author tried to choose specific characters with HTML entities and such. I have definitely seen edits on Wikipedia where people were "correcting" various kinds of hyphen

Re: [Wikitech-l] Different apostrophe signs and MediaWiki internal search

2009-06-20 Thread Andrew Dunbar
2009/6/20 Neil Harris : > Neil Harris wrote: >> Andrew Dunbar wrote: >> >>> 2009/6/20 Jaska Zedlik : >>> >>> >>>> Hello, >>>> On Fri, Jun 19, 2009 at 20:31, Rolf Lampa wrote: >>>> >>>> >>>> >&

Re: [Wikitech-l] Extending wikilinks syntax

2009-06-20 Thread Andrew Dunbar
upport for them it shouldn't be any extra work to add support for the other attributes as long as everyone can agree on a decent syntax. Andrew Dunbar (hippietrail) > — Kalan

Re: [Wikitech-l] Different apostrophe signs and MediaWiki internal search

2009-06-19 Thread Andrew Dunbar
all these redundant assignments > should be stripped for the productivity purposes, I just used a framework > from the Japanese language class which does some Japanese-specific > reduction, but I agree with your notice. The username anti-spoofing code already knows about a lot of

Re: [Wikitech-l] feature request: hide left navbar

2009-02-07 Thread Andrew Dunbar
I have long missed this feature at Wikipedia, where I am used to reading > texts heavily, and I always regret that there is so little space left on > non-wide-screens, netbooks, PDAs, or when you need huge fonts (working from a > distance). > > What do you think about it ? https: