[Xmldatadumps-l] Re: Access imageinfo data in a dump
This looks great! If you like, you might add the link and a brief description to this page: https://meta.wikimedia.org/wiki/Data_dumps/Other_tools so that more people can find and use the library :-)

(Anyone else have tools they wrote and use, that aren't on this list? Please add them!)

Ariel

On Fri, Feb 4, 2022 at 2:31 AM Mitar wrote:
> Hi!
>
> If it is useful to anyone else, I have added to my library [1] in Go
> for processing dumps support for processing SQL dumps directly,
> without having to load them into a database. So one can process them
> directly to extract data, like dumps in other formats.
>
> [1] https://gitlab.com/tozd/go/mediawiki
>
> Mitar
>
> On Thu, Feb 3, 2022 at 9:13 AM Mitar wrote:
> >
> > Hi!
> >
> > I see. Thanks.
> >
> > Mitar
> >
> > On Thu, Feb 3, 2022 at 7:17 AM Ariel Glenn WMF wrote:
> > >
> > > The media/file descriptions contained in the dump are the wikitext
> > > of the revisions of pages with the File: prefix, plus the metadata
> > > about those pages and revisions (user that made the edit, timestamp
> > > of edit, edit comment, and so on).
> > >
> > > Width and height of the image, the media type, the sha1 of the image
> > > and a few other details can be obtained by looking at the
> > > image.sql.gz file available for download for the dumps for each
> > > wiki. Have a look at
> > > https://www.mediawiki.org/wiki/Manual:Image_table for more info.
> > >
> > > Hope that helps!
> > >
> > > Ariel Glenn
> > >
> > > On Wed, Feb 2, 2022 at 10:45 PM Mitar wrote:
> > >>
> > >> Hi!
> > >>
> > >> I am trying to find a dump of all imageinfo data [1] for all files on
> > >> Commons. I thought that "Articles, templates, media/file descriptions,
> > >> and primary meta-pages" XML dump would contain that, given the
> > >> "media/file descriptions" part, but it seems this is not the case. Is
> > >> there a dump which contains that information? And what is "media/file
> > >> descriptions" then? Wiki pages of files?
> > >>
> > >> [1] https://www.mediawiki.org/wiki/API:Imageinfo
> > >>
> > >> Mitar
> > >>
> > >> --
> > >> http://mitar.tnode.com/
> > >> https://twitter.com/mitar_m

___
Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org
[Xmldatadumps-l] Re: Access imageinfo data in a dump
Hi!

If it is useful to anyone else, I have added to my library [1] in Go for processing dumps support for processing SQL dumps directly, without having to load them into a database. So one can process them directly to extract data, like dumps in other formats.

[1] https://gitlab.com/tozd/go/mediawiki

Mitar
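To illustrate what processing an SQL dump directly can look like for image.sql.gz, here is a minimal Python sketch. It is not the API of the Go library linked at [1]; the function names (parse_rows, read_table, read_dump) are hypothetical. It assumes the dump is in the usual mysqldump style: a CREATE TABLE statement with one backtick-quoted column per line, followed by single-line INSERT INTO ... VALUES (...),(...); statements that use backslash escapes inside single-quoted strings. Column names are read from the CREATE TABLE statement rather than hard-coded, since the column order of the image table may differ between MediaWiki versions.

```python
import gzip
import re
from typing import Dict, Iterable, Iterator, List, Optional

# Matches a column definition line inside CREATE TABLE,
# e.g. "  `img_width` int(11) NOT NULL DEFAULT 0,".
COLUMN_RE = re.compile(r"^\s+`(\w+)`\s")

# Backslash escapes assumed to appear inside quoted strings;
# any other escaped character (\\, \', \") stands for itself.
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "0": "\0", "Z": "\x1a"}


def parse_rows(statement: str) -> Iterator[List[Optional[str]]]:
    """Yield one list of raw field values per (...) tuple of an INSERT."""
    i = statement.find("VALUES")
    if i < 0:
        return
    i += len("VALUES")
    n = len(statement)
    while i < n:
        if statement[i] != "(":
            i += 1  # skip whitespace, commas and the final semicolon
            continue
        i += 1  # step past the opening parenthesis
        row: List[Optional[str]] = []
        while True:
            if statement[i] == "'":
                # Quoted string: read up to the closing quote, honoring escapes.
                i += 1
                buf = []
                while statement[i] != "'":
                    if statement[i] == "\\":
                        i += 1
                        buf.append(ESCAPES.get(statement[i], statement[i]))
                    else:
                        buf.append(statement[i])
                    i += 1
                i += 1  # step past the closing quote
                row.append("".join(buf))
            else:
                # Unquoted token: a number or NULL.
                j = i
                while statement[j] not in ",)":
                    j += 1
                token = statement[i:j]
                row.append(None if token == "NULL" else token)
                i = j
            if statement[i] == ")":
                i += 1
                break
            i += 1  # step past the comma between fields
        yield row


def read_table(lines: Iterable[str]) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per row, keyed by the column names from CREATE TABLE."""
    columns: List[str] = []
    for line in lines:
        match = COLUMN_RE.match(line)
        if match:
            columns.append(match.group(1))
        elif line.startswith("INSERT INTO"):
            for row in parse_rows(line):
                yield dict(zip(columns, row))


def read_dump(path: str) -> Iterator[Dict[str, Optional[str]]]:
    """Stream rows from a gzipped dump without loading it into a database."""
    # Blob columns may hold bytes that are not valid UTF-8; they are
    # replaced here instead of raising an error.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        yield from read_table(handle)
```

With a downloaded dump, iterating over read_dump("image.sql.gz") (the path is a placeholder) would give one dict per file. All values come back as strings or None, so fields such as img_width and img_height need an int() conversion by the caller.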
[Xmldatadumps-l] Re: Access imageinfo data in a dump
Hi!

I see. Thanks.

Mitar