[Xmldatadumps-l] ANN: A Go package providing utilities for processing Wikipedia and Wikidata dumps

2022-01-09 Thread Mitar
://gitlab.com/tozd/go/mediawiki Any feedback is welcome. Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m ___ Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org

[Xmldatadumps-l] Access imageinfo data in a dump

2022-02-02 Thread Mitar
s there a dump which contains that information? And what is "media/file descriptions" then? Wiki pages of files? [1] https://www.mediawiki.org/wiki/API:Imageinfo Mitar

[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-03 Thread Mitar
Hi! I see. Thanks. Mitar On Thu, Feb 3, 2022 at 7:17 AM Ariel Glenn WMF wrote: > > The media/file descriptions contained in the dump are the wikitext of the > revisions of pages with the File: prefix, plus the metadata about those pages > and revisions (user that made the edi

[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-03 Thread Mitar
/mediawiki Mitar On Thu, Feb 3, 2022 at 9:13 AM Mitar wrote: > > Hi! > > I see. Thanks. > > > Mitar > > On Thu, Feb 3, 2022 at 7:17 AM Ariel Glenn WMF wrote: > > > > The media/file descriptions contained in the dump are the wikitext of the > > r

[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-04 Thread Mitar
":"tt:609531648","text":"tt:609531649"}} But that table itself does not seem to be available as a dump? Or am I missing something or misunderstanding something? [1] https://www.mediawiki.org/wiki/Manual:Text_table Mitar On Fri, Feb 4, 2022 at 6:54 AM Ariel Glenn
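The "tt:..." values quoted in this message look like addresses pointing into the text table linked as [1]. Below is a minimal Go sketch that splits such an address into its scheme and numeric id. It assumes that "tt:<n>" refers to the text table row whose old_id is n. That assumption is consistent with Manual:Text_table but is not confirmed in this thread. The helper parseTextAddress is hypothetical, and the code uses strings.Cut, so it needs Go 1.18 or later.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseTextAddress (hypothetical helper) splits an address such as
// "tt:609531649" into its scheme ("tt") and numeric id.
// Assumption: for the "tt" scheme the id is the text table's old_id.
func parseTextAddress(addr string) (string, int64, error) {
	scheme, rest, ok := strings.Cut(addr, ":")
	if !ok || scheme == "" {
		return "", 0, fmt.Errorf("malformed address %q", addr)
	}
	id, err := strconv.ParseInt(rest, 10, 64)
	if err != nil {
		return "", 0, fmt.Errorf("address %q: %w", addr, err)
	}
	return scheme, id, nil
}

func main() {
	// The two address values visible in the quoted snippet.
	for _, addr := range []string{"tt:609531648", "tt:609531649"} {
		scheme, id, err := parseTextAddress(addr)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s -> scheme %q, text.old_id %d\n", addr, scheme, id)
	}
}
```

Parsing the address is the easy part. Resolving it still requires the contents of the text table itself, which is exactly what the message points out is not published as a dump.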

[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-09 Thread Mitar
Hi! I made this ticket [1] to track regaining access to metadata as a dump. [1] https://phabricator.wikimedia.org/T301039 Mitar On Tue, Feb 8, 2022 at 2:32 AM Platonides wrote: > > The metadata used to be included in the image table, but it was changed 6 > months ago out to Externa

[Xmldatadumps-l] Re: Missing pages/stale data in HTML dumps

2022-04-05 Thread Mitar
Hi! Thanks for noticing and sharing. Another known issue with HTML dumps is that it seems that categories and templates are not always extracted: https://phabricator.wikimedia.org/T300124 Mitar On Tue, Apr 5, 2022 at 12:59 PM Jan Berkel wrote: > > Hello, > > just a heads-up for