I noticed that the dump format version number went from "0.9" to "0.10".
I wonder if this format is documented somewhere or if some code might
expect "1.0"?
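Rather than hardcoding a version string, code can read the `version` attribute off the dump's root `<mediawiki>` element. A minimal sketch (the sample document and function name are mine; the namespace URI matches the 0.10 export schema):

```python
import io
import xml.etree.ElementTree as ET

def dump_schema_version(xml_bytes):
    """Return the schema version attribute from a MediaWiki XML dump's
    root <mediawiki> element, without reading the whole file."""
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start",)):
        # The first "start" event is the root element, which carries
        # version="0.10" alongside the export namespace.
        return elem.get("version")

sample = (b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/" '
          b'version="0.10"></mediawiki>')
print(dump_schema_version(sample))  # 0.10
```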
Andrew Dunbar (hippietrail)
On 28 October 2014 20:45, Daniel Kinzler wrote:
> Am 27.10.2014 21:58, schr
It would also be great if these pages were marked in the dump files too,
in exactly the same way that redirect pages are marked.
On 27 December 2012 01:41, Brad Jorsch wrote:
> On Tue, Dec 25, 2012 at 6:00 AM, Liangent wrote:
> > Is this enough?
> >
> > api.php?action=query&prop=p
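The Disambiguator extension exposes a `disambiguation` page prop through `prop=pageprops`, which is one way to get this over the API. A hedged sketch (the endpoint constant and helper names are mine):

```python
import urllib.parse

API = "https://en.wikipedia.org/w/api.php"  # assumed endpoint

def disambig_query_url(title):
    """Build a prop=pageprops query asking only for the 'disambiguation'
    page prop set by the Disambiguator extension."""
    params = urllib.parse.urlencode({
        "action": "query", "prop": "pageprops", "ppprop": "disambiguation",
        "titles": title, "format": "json",
    })
    return f"{API}?{params}"

def is_disambiguation(response):
    """Interpret the decoded JSON response: the prop is present only on
    disambiguation pages."""
    pages = response["query"]["pages"]
    return any("pageprops" in p for p in pages.values())

# Shape of a response for a page that has the prop set:
resp = {"query": {"pages": {"123": {"pageprops": {"disambiguation": ""}}}}}
print(is_disambiguation(resp))  # True
```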
If you are making an offline app you either need to parse
the wikipages into HTML pages offline yourself, or include parsing
code in your app.
You are not the first to want this, but due to the nature and
complexity of the markup, which includes "parser functions", and of
the parser itself, this is not trivial.
The only parser guaranteed to parse MediaWiki markup is MediaWiki
itself, but that parser is tied to the rest of the code.
There is an open feature request to separate this code so apps like
yours can take just the part of the rendering code you need, or
translate that part of the code into another programming language.
Bug 25984 - Isolate parser from database dependencies
https://bugzilla.wikimedia.org/show_bug.cgi?id=25984
Nobody at Wikimedia is working on this, but there are some patches from
other people that will certainly get you on your way.
But the developers at Wikimedia are very busy making a whole new
parser and a WYSIWYG editor to go with it.
Hopefully this will clean up the code to the point that making your
own parser becomes a lot easier.
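For apps that are only intermittently offline, or that can pre-render pages ahead of time, one workaround is to let a live MediaWiki do the parsing via the API's action=parse, which returns rendered HTML. A sketch that just builds the request (the endpoint constant and function name are mine):

```python
import urllib.parse

API = "https://en.wikipedia.org/w/api.php"  # any MediaWiki's api.php works

def parse_url(wikitext):
    """Build an action=parse request that asks the wiki itself to render
    arbitrary wikitext to HTML, templates and parser functions included."""
    params = urllib.parse.urlencode({
        "action": "parse", "text": wikitext,
        "contentmodel": "wikitext", "format": "json",
    })
    return f"{API}?{params}"

url = parse_url("'''Hello''' [[world]]")
print(url)
```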
Good luck and sympathy (-:
Andrew Dunbar (hippietrail)
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 6 June 2012 13:57, Bergi wrote:
> Andrew Dunbar wrote:
>
>> I'm having trouble getting a simple one-line User JS working on
>> Wiktionary.
>
>
>> Apologies if this is not the right mailing list. None of the lists
>> seemed fit according to http://www
I'm not sure how to check that.
I have tried setting a breakpoint on the node in Google Chrome's dev
tools and reloading the page, but it is never triggered.
Apologies if this is not the right mailing list. None of the lists
seemed fit according to http://www.media
ones were in languages I
am interested in.
3) The history page gives me a way to get a diff of all changes since
my last edit, rather than just the most recent change.
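A cumulative diff between two arbitrary revisions is already possible via the API's action=compare (or via index.php with diff=cur&oldid=N). A sketch that just builds the request, with the endpoint and function name assumed:

```python
import urllib.parse

API = "https://en.wikipedia.org/w/api.php"  # assumed endpoint

def cumulative_diff_url(my_last_revid, current_revid):
    """Build an action=compare request for one combined diff between two
    revision IDs, rather than stepping through each intermediate edit."""
    params = urllib.parse.urlencode({
        "action": "compare",
        "fromrev": my_last_revid,
        "torev": current_revid,
        "format": "json",
    })
    return f"{API}?{params}"

print(cumulative_diff_url(1000, 2000))
```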
Andrew Dunbar (hippietrail)
> 4. 0 is colored grey making it disappear from the list. But that does
> not mean the articl
language code data from official
sources, and de facto data from MediaWiki sites and APIs, and provides
various ways to query it via JSON.
It got a bit messy with just me hacking it. It's used behind the
scenes in some Wiktionary stuff but I don't think anybody else took to
it. It or
show the "(Redirected from X)" bar that
> accompanies the redirects
The JavaScript we use on the English Wiktionary also produces a slightly
different "(Automatically redirected from X)" bar, or something very
similar.
Andrew Dunbar (hippietrail)
>
anguages of Central Asia. One solution is to
do accent/diacritic normalization too as part of the canonicalization.
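Accent/diacritic folding can be done with Unicode decomposition: normalize to NFD, then drop the combining marks. A minimal sketch (the function name is mine; a real Wiktionary deployment would need per-language exceptions, e.g. Turkish dotless ı):

```python
import unicodedata

def fold_diacritics(title):
    """Canonicalize a title by decomposing to NFD and dropping combining
    marks, so accented and unaccented spellings collide on one key."""
    decomposed = unicodedata.normalize("NFD", title)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

print(fold_diacritics("Ázerbaiyán"))  # Azerbaiyan
```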
Andrew Dunbar (hippietrail)
> Some projects, like probably all Wiktionaries, would doubtless not
> want case-folding at all, so we should support different
> canonic
ve" and "first-letter". But there's
never been enough interest and it's never been important enough and no
developer has ever stepped up. It would take a bit of work to
implement.
Andrew Dunbar (hippietrail)
> 2011/5/12 Carl (CBM)
>
>> On Fri, May 13, 2011 at
the rest lowercase. If one of those exists it automatically
redirects after a couple of seconds.
With the different nature of Wikipedia titles you would probably want
to check sentence case and title case but would still miss quite a few
where only proper nouns within the title are capitalized.
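The variant-probing idea above can be sketched as generating the obvious casings of a title and checking each for an existing page (the helper name is mine; as noted, titles where only an embedded proper noun is capitalized still slip through):

```python
def case_variants(title):
    """Generate the obvious case variants of a title to probe for an
    existing page: as-typed, lowercase, sentence case, and title case."""
    lower = title.lower()
    variants = [
        title,
        lower,
        lower[:1].upper() + lower[1:],   # sentence case
        lower.title(),                   # title case
    ]
    # Preserve order, drop duplicates.
    return list(dict.fromkeys(variants))

print(case_variants("war of the worlds"))
```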
An
> Doing this would leverage the MediaWiki development community and the
> existing PHP codebase to provide a well-maintained, reusable reference
> parser for MediaWiki wikitext.
+1
This is the single most exciting news on the MediaWiki front since I started
contributing to Wiktionary nine ye
d format or direct from an
XML dump file.
Some datamining tools might just stub this interface and deal with the
bare minimum.
Extension hooks are more interesting. I might assume offline readers
want results as close to the official sites as possible, so they will
want to implement the same hooks.
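Stubbing the interface for datamining can be as small as streaming (title, wikitext) pairs out of a pages-articles dump. A sketch using incremental XML parsing (the function name is mine; the namespace assumes the 0.10 export schema):

```python
import io
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.10/}"  # assumed schema version

def iter_pages(xml_file):
    """Stream (title, wikitext) pairs out of a dump without loading it
    into memory; a datamining tool can ignore everything else."""
    for event, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag == NS + "page":
            title = elem.findtext(NS + "title")
            text = elem.findtext(f"{NS}revision/{NS}text") or ""
            yield title, text
            elem.clear()  # free memory as we go

sample = io.BytesIO(
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    b"<page><title>Foo</title><revision><text>bar</text></revision></page>"
    b"</mediawiki>"
)
print(list(iter_pages(sample)))  # [('Foo', 'bar')]
```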
Othe
at can convert such AST to
HTML? Because of the semantic soup, nobody has even brought this up yet.
> So, it's probably not an issue what license this hypothetical code would be
> released under.
>
> - Trevor
I'm pretty sure the offline wikitext parsing community would care
abo
* a way to "fold previous content into the current dumps" that consists
> of making a straight copy of what's on disk with no processing. (What
> do we do if something has been deleted or moved, or is corrupt? The
> existing format isn't friendly to those cases.)
>
On 11 February 2011 22:18, Chad wrote:
> On Fri, Feb 11, 2011 at 5:57 AM, Andrew Dunbar wrote:
>> It doesn't work for me )-:
>>
>> Your input can't be opened:
>> VLC is unable to open the MRL 'http://transcode1.wikimedia.org:8080'.
>> Check
It doesn't work for me )-:
Your input can't be opened:
VLC is unable to open the MRL 'http://transcode1.wikimedia.org:8080'.
Check the log for details.
Andrew Dunbar (hippietrail)
e yourself, you might want to
> look into getting a toolserver account, if you don't have one. This
> would allow you read access to a live replica of Wikipedia's database,
> which of course has all these indexes.
You don't even have to use a B-Tree if that's beyond you.
On 3 January 2011 21:54, Andreas Jonsson wrote:
> On 2010-12-29 08:33, Andrew Dunbar wrote:
>> I've thought a lot about this too. It certainly is not any type of
>> standard grammar. But on the other hand it is a pretty common kind of
>> nonstandard grammar. I call it a &
ch a
grammar deterministically into an LALR grammar...
But even if not, I'm certain it would demystify what happens in the
parser so that problems and edge cases would be easier to locate.
Andrew Dunbar (hippietrail)
> Those are all standard gripes, and nothing new or exciting. There are
interesting and I'll be watching it. Where do the HTML
dumps come from? I'm pretty sure I've only seen "static" for Wikipedia
and not for Wiktionary for example. I am also looking at adapting the
parser for offline use to generate HTML from the dump file wikitext.
Andrew Du
On 15 December 2010 20:41, Anthony wrote:
> On Wed, Dec 15, 2010 at 12:01 PM, Andrew Dunbar wrote:
>> By the way I'm keen to find something similar for .7z
>
> I've written something similar for .xz, which uses LZMA2 same as .7z.
> It creates a virtual read-only fil
2010/12/16 Ángel González :
> On 15/12/10 16:21, Andrew Dunbar wrote:
>> I've long been interested in offline tools that make use of WikiMedia
>> information, particularly the English Wiktionary.
>>
>> I've recently come across a tool which can provide random
erience is now quite stale and my 64-bit programming
experience negligible.
(I'm also interested in hearing from other people working on offline
tools for dump files, wikitext parsing, or Wiktionary)
Andrew Dunbar (hippietrail)
On 14 December 2010 20:04, Andrew Dunbar wrote:
> On 14 December 2010 01:57, Monica shu wrote:
>> Thanks Diederik and Waksman,
>>
>> It seems that I need to parse the dump for article data to get this piece
>> of information...
>> Yes, this will be the las
on 2010-01-30 as Waksman said, my version should be between Feb to
> June.
A Google search hints that enwiki-20100312-pages-articles.xml.bz2
might be the one with size 6117881141.
Andrew Dunbar (hippietrail)
> Does anybody remember the version between this period, or happened to
> downlo
://ur.wikipedia.org
If you'd like to use it I have a tool that downloads random samples of
wiki pages and strips the HTML for purposes such as this.
Good luck!
Andrew Dunbar (hippietrail)
On 14 December 2010 18:36, pravin@gmail.com wrote:
> Hi All,
>
> I am Pravin Satpute, I am worki
Could anybody help me locate a dump of mediawiki.org while the dump
server is broken please? I only need current revisions.
Thanks in advance.
Andrew Dunbar (hippietrail)
On 17 November 2010 02:37, Dmitriy Sintsov wrote:
> * Andrew Dunbar [Tue, 16 Nov 2010 23:01:33
> +1100]:
>> I wish to do some MediaWiki hacking which uses the codebase,
>> specifically the parser, but not the database or web server.
>> I'm running on Windows XP o
needs to be stubbed, wrapped, etc.
Am I missing something obvious or do these scripts return no errors by design?
Andrew Dunbar (hippietrail)
--
http://wiktionarydev.leuksman.com http://linguaphile.sf.net
best way to store all the
index files I create
especially in code shared with other people, which I would like to see
happen. If you, or anyone else for that matter, would like to
collaborate, that would be pretty cool.
You'll find my stuff on the Toolserver:
http
h had fonts designed for
a different sequence of letters and modifiers.
The code would logically belong in the same place I suspect.
See:
Unicode normalization "sorts" Hebrew/Arabic/Myanmar vowels wrongly
https://bugzilla.wikimedia.org/show_bug.cgi?id=2399
http://www.mediawiki.or
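The reordering in that bug comes from canonical combining classes: NFC/NFD sort consecutive combining marks by class, which for Hebrew points differs from the order some fonts were designed for. A small demonstration (mark choices are mine; patah has combining class 17, hiriq 14, so normalization swaps them):

```python
import unicodedata

patah, hiriq = "\u05B7", "\u05B4"      # HEBREW POINT PATAH, HIRIQ
s = "\u05D0" + patah + hiriq           # alef + patah + hiriq, as typed

# Normalization reorders the marks by canonical combining class (14 < 17),
# regardless of what order the author or the font expected.
normalized = unicodedata.normalize("NFC", s)
print([hex(ord(c)) for c in normalized])
print(unicodedata.combining(patah), unicodedata.combining(hiriq))  # 17 14
```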
2009/10/23 Aryeh Gregor :
> On Fri, Oct 23, 2009 at 12:20 PM, Andrew Dunbar wrote:
>> Yes I didn't specify tl_namespace
>
> In MySQL that will usually make it impossible to effectively use an
> index on (tl_namespace, tl_title), so it's essential that you specify
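The leftmost-prefix rule behind that advice can be demonstrated with SQLite's query planner (used here only because it ships with Python; MySQL behaves the same way for this case — the table and index names mirror the real schema but the data is invented):

```python
import sqlite3

# A composite index on (tl_namespace, tl_title): filtering only on
# tl_title cannot use the index's leftmost column, so the query
# degenerates to a full scan unless tl_namespace is constrained too.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE templatelinks (tl_from INT, tl_namespace INT, tl_title TEXT)")
con.execute("CREATE INDEX tl_idx ON templatelinks (tl_namespace, tl_title)")

def plan(sql):
    """Return the query plan detail text for a statement."""
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

print(plan("SELECT * FROM templatelinks WHERE tl_title = 'Infobox'"))
print(plan("SELECT * FROM templatelinks "
           "WHERE tl_namespace = 10 AND tl_title = 'Infobox'"))
```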
lly amazing and I think I'm going to be learning the query
language and possibly spending some time with dbpedia. Have you
thought about doing the same for Wiktionary?
Andrew Dunbar (hippietrail)
> On Fri, Oct 23, 2009 at 18:20, Andrew Dunbar wrote:
>> 2009/10/23 Aryeh Gregor :
>
2009/10/23 Aryeh Gregor :
> On Fri, Oct 23, 2009 at 8:27 AM, Andrew Dunbar wrote:
>> Yes I found how to get it through the API now. It was actually just
>> the Toolserver database that was intractably slow.
>
> There's nothing slow about the TS database here:
>
>
2009/10/23 Robert Ullmann :
>> I've been spending hours on the parsing now and don't find it simple
>> at all due to the fact that templates can be nested. Just extracting
>> the Infobox as one big lump is hard due to the need to match nested {{
>> and }}
>&g
s on the parsing now and don't find it simple
at all due to the fact that templates can be nested. Just extracting
the Infobox as one big lump is hard due to the need to match nested {{
and }}
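Matching the nested braces needs a depth counter rather than a plain regex. A rough sketch of the idea (the function name is mine; it ignores `{{{parameter}}}` markup and braces inside nowiki/comments, which a real extractor must handle):

```python
def extract_templates(wikitext):
    """Pull out top-level {{...}} chunks by tracking brace depth,
    so nested templates stay inside their parent."""
    templates, depth, start = [], 0, None
    i = 0
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i
            depth += 1
            i += 2
        elif pair == "}}" and depth:
            depth -= 1
            if depth == 0:
                templates.append(wikitext[start:i + 2])
            i += 2
        else:
            i += 1
    return templates

text = "{{Infobox|name={{nowrap|X}}|other=1}} and {{stub}}"
print(extract_templates(text))
```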
Andrew Dunbar (hippietrail)
> Oh, and do remember to look for "
't generate any such
external links and probably couldn't very easily...
But I have just discovered the rvgeneratexml parameter to
action=query&prop=revisions
This includes a field for each template parameter with a
and a for each...
Andrew Dunbar (hippietrail)
> [[User:Dschwen]]
>
&
using either the Toolserver's
Wikipedia database or the Mediawiki API have not been fruitful. In
particular, SQL queries on the templatelinks table are intractably
slow. Why are there no keys on tl_from or tl_title?
Andrew Dunbar (hippietrail)
--
http://wiktionarydev.leuksman.com
a user-friendly improvement for Facebook to interpret non-Latin URLs just as
modern browsers do.
Andrew Dunbar (hippietrail)
> --
> אמיר אלישע אהרוני
> Amir Elisha Aharoni
>
> http://aharoni.wordpress.com
>
> "We're living in pieces,
> I want to liv
ive proper attribution, etc., but it is
> possible that we will eventually be making quite a large number of requests,
> so we thought we should check with you first.
>
> Is that acceptable use?
Another approach is to download the Wiktionary dump archive to parse offline:
http://down
DidYouMean is mine
Andrew Dunbar (hippietrail)
--
http://wiktionarydev.leuksman.com http://linguaphile.sf.net
YouMean
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/DidYouMean/
https://bugzilla.wikimedia.org/show_bug.cgi?id=8648
It hooked all ways to create, delete, or move a page, to maintain a
separate table of normalized page titles which it consulted when
displaying a page.
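That separate table can be sketched as a map from a normalized form of each title to the real titles, kept in sync by the create/delete hooks (class, method names, and the normalization recipe here are all mine, not DidYouMean's actual code):

```python
import unicodedata

class TitleIndex:
    """Sketch of a normalized-title lookup table maintained by page hooks."""
    def __init__(self):
        self.by_norm = {}

    @staticmethod
    def normalize(title):
        # Fold case and strip diacritics; a stand-in for whatever
        # canonicalization the wiki chooses.
        nfd = unicodedata.normalize("NFD", title)
        stripped = "".join(c for c in nfd if not unicodedata.combining(c))
        return stripped.casefold()

    def on_create(self, title):
        self.by_norm.setdefault(self.normalize(title), set()).add(title)

    def on_delete(self, title):
        self.by_norm.get(self.normalize(title), set()).discard(title)

    def did_you_mean(self, query):
        return sorted(self.by_norm.get(self.normalize(query), ()))

idx = TitleIndex()
idx.on_create("Café")
print(idx.did_you_mean("cafe"))  # ['Café']
```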
The code for d
re bugs have
> been resolved than created. Congratulations to everyone responsible! :)
Could it be due to the new "known to fail" logic?
Andrew Dunbar (hippietrail)
it...
I always thought it would be much more useful to generate the HTML of
action=render for every page, rather than the action=view HTML for one
specific skin a million or so times, which is then a pain to parse if
you want to do anything other than open it in a browser.
(-:
Andr
bandwidth constrained
> sites.
This sounds great, but I have a problem with making action=raw return
something that is not raw. For MediaWiki I think it would be better to
add a new action=minify.
What would the pluses and minuses of that be?
Andrew Dunbar (hippietrail)
> -Robert R
can insert characters not directly on the
keyboard. And cutting and pasting from web pages where the author
tried to choose specific characters with HTML entities and such.
I have definitely seen edits on Wikipedia where people were
"correcting" various kinds of hyphen
2009/6/20 Neil Harris :
> Neil Harris wrote:
>> Andrew Dunbar wrote:
>>
>>> 2009/6/20 Jaska Zedlik :
>>>
>>>
>>>> Hello,
>>>> On Fri, Jun 19, 2009 at 20:31, Rolf Lampa wrote:
>>>>
>>>>
>>>>
>&
upport for them it shouldn't be any extra work to
add support for the other attributes, as long as everyone can agree
on a decent syntax.
Andrew Dunbar (hippietrail)
> — Kalan
>
all these redundant assignments
> should be stripped for productivity purposes; I just used a framework
> from the Japanese language class which does some Japanese-specific
> reduction, but I agree with your notice.
The username anti-spoofing code already knows about a lot of
I have missed this feature for a long time on Wikipedia, where I read
> long texts heavily, and I always regret there's so little space left on
> non-wide-screens, netbooks, PDAs, or when you need huge fonts (working from a
> distance).
>
> What do you think about it ?
https: