I noticed that the dump format version number went from 0.9 to 0.10.
I wonder if this format is documented somewhere or if some code might
expect 1.0?
Andrew Dunbar (hippietrail)
On 28 October 2014 20:45, Daniel Kinzler dan...@brightbyte.de wrote:
On 27.10.2014 21:58, Ariel T. Glenn wrote:
It would also be great if these pages were marked in the dump files too.
It should be done in exactly the same way that redirect pages are marked.
On 27 December 2012 01:41, Brad Jorsch bjor...@wikimedia.org wrote:
On Tue, Dec 25, 2012 at 6:00 AM, Liangent liang...@gmail.com wrote:
Is this
this will clean up the code to the point that making your
own parser becomes a lot easier.
Good luck and sympathy (-:
Andrew Dunbar (hippietrail)
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
setting a breakpoint on the node in Google Chrome's dev
tools and reloading the page, but it is never triggered.
Apologies if this is not the right mailing list. None of the lists
seemed fit according to http://www.mediawiki.org/wiki/Mailing_lists
Andrew Dunbar (hippietrail)
On 6 June 2012 13:57, Bergi a.d.be...@web.de wrote:
Andrew Dunbar wrote:
I'm having trouble getting a simple one-line User JS working on
Wiktionary.
Apologies if this is not the right mailing list. None of the lists
seemed fit according to http://www.mediawiki.org/wiki/Mailing_lists
I
interested in.
3) The history page gives me a way to get a diff of all changes since
my last edit, rather than just the most recent change.
Andrew Dunbar (hippietrail)
4. A 0 is colored grey, making it disappear from the list. But that does
not mean the article never changed; it could be +400
the (Redirected from X) bar that
accompanies the redirects
The JavaScript we use on the English Wiktionary also makes a slightly
different (Automatically redirected from X) bar, or something very
similar.
Andrew Dunbar (hippietrail)
of seconds.
With the different nature of Wikipedia titles you would probably want
to check sentence case and title case but would still miss quite a few
where only proper nouns within the title are capitalized.
And some people would probably hate such a feature too (-:
Andrew Dunbar (hippietrail)
never been enough interest and it's never been important enough and no
developer has ever stepped up. It would take a bit of work to
implement.
Andrew Dunbar (hippietrail)
2011/5/12 Carl (CBM) cbm.wikipe...@gmail.com
On Fri, May 13, 2011 at 12:25 AM, Jay Ashworth j...@baylink.com wrote:
They're
of the canonicalization.
Andrew Dunbar (hippietrail)
Some projects, like probably all Wiktionaries, would doubtless not
want case-folding at all, so we should support different
canonicalization algorithms. Even the ones that don't want
case-folding could still benefit from allowing underscores
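The per-wiki canonicalization idea could be sketched like this. This is a minimal illustration only, assuming MediaWiki-style normalization (underscores to spaces, collapsed whitespace, and first-letter uppercasing on case-folding wikis); the function name and flag are hypothetical:

```python
import re

def canonicalize_title(title, fold_first_letter=True):
    """Normalize a page title roughly the way MediaWiki does:
    underscores become spaces, runs of whitespace collapse, and
    (on case-folding wikis) the first letter is uppercased.

    A Wiktionary-style wiki would pass fold_first_letter=False,
    since e.g. 'polish' and 'Polish' are distinct entries there.
    """
    title = re.sub(r"[_\s]+", " ", title).strip()
    if fold_first_letter and title:
        title = title[0].upper() + title[1:]
    return title

print(canonicalize_title("foo_bar  baz"))                      # Wikipedia-style
print(canonicalize_title("polish", fold_first_letter=False))   # Wiktionary-style
```

Even the no-case-folding variant still benefits from the underscore and whitespace normalization, which is the point above.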
license this hypothetical code would be
released under.
- Trevor
I'm pretty sure the offline wikitext parsing community would care
about the licensing as a separate issue to what kind of parser
technology it uses internally.
Andrew Dunbar (hippietrail)
On Tue, May 3, 2011 at 1:25 PM, David
readers
want as close results to the official sites as possible so will want
to implement the same hooks.
Other non-wikitext or non-page data from the database would also go
into the same interface/abstraction, or a separate one.
Andrew Dunbar (hippietrail)
By having this available as a parser
This is the single most exciting news on the MediaWiki front since I started
contributing to Wiktionary nine years ago (-:
Andrew Dunbar (hippietrail)
-- Tim Starling
provides the ordering info for the
people that require it.
Andrew Dunbar (hippietrail)
Ariel
--
James Linden
kodekr...@gmail.com
It doesn't work for me )-:
Your input can't be opened:
VLC is unable to open the MRL 'http://transcode1.wikimedia.org:8080'.
Check the log for details.
Andrew Dunbar (hippietrail)
On 11 February 2011 22:18, Chad innocentkil...@gmail.com wrote:
On Fri, Feb 11, 2011 at 5:57 AM, Andrew Dunbar hippytr...@gmail.com wrote:
It doesn't work for me )-:
Your input can't be opened:
VLC is unable to open the MRL 'http://transcode1.wikimedia.org:8080'.
Check the log for details.
don't have one. This
would allow you read access to a live replica of Wikipedia's database,
which of course has all these indexes.
You don't even have to use a B-Tree if that's beyond you. I just sort
the titles and then use a binary search on them. Plenty fast even in
Perl and Javascript.
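That sorted-titles approach is only a few lines in most languages; here is a sketch in Python (names are illustrative, not from any actual tool):

```python
import bisect

def find_title(sorted_titles, target):
    """Binary search over a pre-sorted list of page titles.
    Returns the index of the title, or -1 if it is absent.
    O(log n) per lookup, no B-tree or database needed."""
    i = bisect.bisect_left(sorted_titles, target)
    if i < len(sorted_titles) and sorted_titles[i] == target:
        return i
    return -1

titles = sorted(["Apple", "Banana", "Cherry"])
print(find_title(titles, "Banana"))  # found at index 1
print(find_title(titles, "Durian"))  # not present: -1
```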
Andrew
On 3 January 2011 21:54, Andreas Jonsson andreas.jons...@kreablo.se wrote:
On 2010-12-29 08:33, Andrew Dunbar wrote:
I've thought a lot about this too. It certainly is not any type of
standard grammar. But on the other hand it is a pretty common kind of
nonstandard grammar. I call it a recursive
cases would be easier to locate.
Andrew Dunbar (hippietrail)
Those are all standard gripes, and nothing new or exciting. There are also,
to quote a much-abused former world leader, some known unknowns:
1) we don't know how to explain What You See when you parse wikitext except
by prodding
programming
experience negligible.
(I'm also interested in hearing from other people working on offline
tools for dump files, wikitext parsing, or Wiktionary)
Andrew Dunbar (hippietrail)
2010/12/16 Ángel González keis...@gmail.com:
On 15/12/10 16:21, Andrew Dunbar wrote:
I've long been interested in offline tools that make use of WikiMedia
information, particularly the English Wiktionary.
I've recently come across a tool which can provide random access to a
bzip2 archive
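For what it's worth, the trick such tools rely on can be shown with Python's standard bz2 module: in a multistream archive each stream is its own independently compressed bz2 unit, so given a byte offset (which would come from a separate index, like the index file accompanying Wikimedia's multistream dumps) you can decompress just that stream. A rough sketch with hypothetical names:

```python
import bz2
import tempfile

def read_stream_at(dump_path, offset, chunk=256 * 1024):
    """Decompress the bz2 stream beginning at byte `offset` of a
    multistream archive, without reading anything before it."""
    with open(dump_path, "rb") as f:
        f.seek(offset)
        # A fresh decompressor stops at the end of this one stream;
        # bytes belonging to any following stream go to .unused_data.
        return bz2.BZ2Decompressor().decompress(f.read(chunk))

# Demo: fake a two-stream archive, then jump straight to the second stream.
s1 = bz2.compress(b"<page>first</page>")
s2 = bz2.compress(b"<page>second</page>")
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(s1 + s2)
assert read_stream_at(f.name, len(s1)) == b"<page>second</page>"
```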
On 15 December 2010 20:41, Anthony wikim...@inbox.org wrote:
On Wed, Dec 15, 2010 at 12:01 PM, Andrew Dunbar hippytr...@gmail.com wrote:
By the way I'm keen to find something similar for .7z
I've written something similar for .xz, which uses LZMA2 same as .7z.
It creates a virtual read-only
be watching it. Where do the HTML
dumps come from? I'm pretty sure I've only seen static HTML dumps for
Wikipedia and not for Wiktionary, for example. I am also looking at adapting the
parser for offline use to generate HTML from the dump file wikitext.
Andrew Dunbar (hippietrail)
http://openzim.org
http://ur.wikipedia.org
If you'd like to use it I have a tool that downloads random samples of
wiki pages and strips the HTML for purposes such as this.
Good luck!
Andrew Dunbar (hippietrail)
On 14 December 2010 18:36, pravin@gmail.com wrote:
Hi All,
I am Pravin Satpute, I am
should be between Feb and June.
A Google search hints that enwiki-20100312-pages-articles.xml.bz2
might be the one with size 6117881141.
Andrew Dunbar (hippietrail)
Does anybody remember the version between this period, or happened to
download the same version with me?
Thanks very much
On 14 December 2010 20:04, Andrew Dunbar hippytr...@gmail.com wrote:
On 14 December 2010 01:57, Monica shu monicashu...@gmail.com wrote:
Thanks Diederik and Waksman,
It seems that I need to parse the dump for article data to get this piece
of information...
Yes, this will be the last
Could anybody help me locate a dump of mediawiki.org while the dump
server is broken please? I only need current revisions.
Thanks in advance.
Andrew Dunbar (hippietrail)
I don't suppose anybody has a copy of any Romanian or Georgian
Wiktionary from any time? (-:
Andrew Dunbar (hippietrail)
, wrapped etc.
Am I missing something obvious or do these scripts return no errors by design?
Andrew Dunbar (hippietrail)
--
http://wiktionarydev.leuksman.com http://linguaphile.sf.net
On 17 November 2010 02:37, Dmitriy Sintsov ques...@rambler.ru wrote:
* Andrew Dunbar hippytr...@gmail.com [Tue, 16 Nov 2010 23:01:33
+1100]:
I wish to do some MediaWiki hacking which uses the codebase,
specifically the parser, but not the database or web server.
I'm running on Windows XP
to
happen. If you'd
like to collaborate or anyone else for that matter it would be pretty cool.
You'll find my stuff on the Toolserver:
https://fisheye.toolserver.org/browse/enwikt
Andrew Dunbar (hippietrail)
2009/10/23 Aryeh Gregor simetrical+wikil...@gmail.com:
On Fri, Oct 23, 2009 at 12:20 PM, Andrew Dunbar hippytr...@gmail.com wrote:
Yes, I didn't specify tl_namespace.
In MySQL that will usually make it impossible to effectively use an
index on (tl_namespace, tl_title), so it's essential
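The leftmost-prefix rule described here is easy to demonstrate. In this sketch SQLite stands in for MySQL, as an assumption: both planners can generally only use a composite (tl_namespace, tl_title) index when the leading column is constrained in the WHERE clause:

```python
import sqlite3

# SQLite as a stand-in for MySQL's index rules: a composite index on
# (tl_namespace, tl_title) is unusable when tl_namespace is omitted.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE templatelinks (tl_from INT, tl_namespace INT, tl_title TEXT)")
con.execute("CREATE INDEX tl_ns_title ON templatelinks (tl_namespace, tl_title)")

def plan(sql):
    """Return SQLite's query-plan detail text for a statement."""
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

# Leading column present in WHERE: the index is used.
assert "tl_ns_title" in plan(
    "SELECT tl_from FROM templatelinks WHERE tl_namespace = 10 AND tl_title = 'X'")
# Leading column omitted: the planner falls back to a full table scan.
assert "SCAN" in plan("SELECT tl_from FROM templatelinks WHERE tl_title = 'X'")
```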
2009/10/23 Robert Ullmann rlullm...@gmail.com:
I've been spending hours on the parsing now and don't find it simple
at all due to the fact that templates can be nested. Just extracting
the Infobox as one big lump is hard due to the need to match nested {{
and }}
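For reference, the depth-counting approach to those nested braces looks roughly like this. It is a sketch only, with hypothetical names; real wikitext also has nowiki sections, HTML comments, and {{{parameters}}} that complicate matching:

```python
def extract_template(wikitext, name):
    """Pull one {{Name ...}} template out of wikitext as a single lump,
    honouring nested {{...}} by tracking brace depth."""
    start = wikitext.find("{{" + name)
    if start == -1:
        return None
    depth, i = 0, start
    while i < len(wikitext) - 1:
        if wikitext[i:i + 2] == "{{":
            depth += 1
            i += 2
        elif wikitext[i:i + 2] == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                return wikitext[start:i]
        else:
            i += 1
    return None  # unbalanced braces

text = "{{Infobox person|name={{nowrap|Ada}}|born=1815}} more text"
assert extract_template(text, "Infobox") == "{{Infobox person|name={{nowrap|Ada}}|born=1815}}"
```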
Andrew Dunbar (hippietrail)
using either the Toolserver's
Wikipedia database or the Mediawiki API have not been fruitful. In
particular, SQL queries on the templatelinks table are intractably
slow. Why are there no keys on tl_from or tl_title?
Andrew Dunbar (hippietrail)
non-Latin URLs just as
the modern browsers do.
Andrew Dunbar (hippietrail)
--
אמיר אלישע אהרוני
Amir Elisha Aharoni
http://aharoni.wordpress.com
We're living in pieces,
I want to live in peace. - T. Moore
quite a large number of requests,
so we thought we should check with you first.
Is that acceptable use?
Another approach is to download the Wiktionary dump archive to parse offline:
http://download.wikipedia.org/enwiktionary/latest/enwiktionary-latest-pages-articles.xml.bz2
Andrew Dunbar
DidYouMean is mine
Andrew Dunbar (hippietrail)
titles which it consulted when
displaying a page.
The code for display was designed for compatibility with the
then-current Wiktionary templates and would need to be implemented in
a more general way.
A core version would probably just add a field to the existing table.
Andrew Dunbar (hippietrail)
than created. Congratulations to everyone responsible! :)
Could it be due to the new known-to-fail logic?
Andrew Dunbar (hippietrail)
the pluses and minuses of that be?
Andrew Dunbar (hippietrail)
-Robert Rohde
have definitely seen edits on Wikipedia where people were
correcting various kinds of hyphens and dashes. And of course while
the English Wikipedia forbids curved quotes, each other wiki may well
have its own policy.
Andrew Dunbar (hippietrail)
-- brion
syntax.
Andrew Dunbar (hippietrail)
— Kalan
2009/6/20 Neil Harris use...@tonal.clara.co.uk:
Neil Harris wrote:
Andrew Dunbar wrote:
2009/6/20 Jaska Zedlik jz5...@gmail.com:
Hello,
On Fri, Jun 19, 2009 at 20:31, Rolf Lampa rolf.la...@rilnet.com wrote:
Jaska Zedlik wrote:
...
The code of the override function is the following