[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-03 Thread Ariel Glenn WMF
This looks great! If you like, you might add the link and a brief
description to this page:
https://meta.wikimedia.org/wiki/Data_dumps/Other_tools so that more people
can find and use the library :-)

(Does anyone else have tools they wrote and use that aren't on this list?
Please add them!)

Ariel

On Fri, Feb 4, 2022 at 2:31 AM Mitar  wrote:

> Hi!
>
> In case it is useful to anyone else: I have added support for processing
> SQL dumps directly, without having to load them into a database, to my
> Go library for processing dumps [1]. So one can extract data from them
> directly, just like from dumps in other formats.
>
> [1] https://gitlab.com/tozd/go/mediawiki
>
>
> Mitar
___
Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org


[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-03 Thread Mitar
Hi!

In case it is useful to anyone else: I have added support for processing
SQL dumps directly, without having to load them into a database, to my
Go library for processing dumps [1]. So one can extract data from them
directly, just like from dumps in other formats.

[1] https://gitlab.com/tozd/go/mediawiki
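To give a rough idea of what processing an SQL dump "directly" involves, here is a minimal sketch in Go. It is not the API of the library linked above; `parseValues` is a hypothetical helper that splits the `VALUES (...),(...);` part of one `INSERT` line into rows of fields, assuming mysqldump-style quoting (single-quoted strings with backslash escapes, unquoted numbers and NULL).

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// unescape maps the character after a backslash inside a MySQL string literal
// to the byte it stands for. Anything not listed (\\, \', \") maps to itself.
func unescape(c byte) byte {
	switch c {
	case '0':
		return 0
	case 'n':
		return '\n'
	case 'r':
		return '\r'
	case 't':
		return '\t'
	case 'Z':
		return 0x1a
	default:
		return c
	}
}

// parseValues is a hypothetical helper, not part of gitlab.com/tozd/go/mediawiki.
// It splits one mysqldump-style line such as
//
//	INSERT INTO `image` VALUES ('A.jpg',123,NULL),('B.png',456,'x');
//
// into rows of field values. Quoted strings are unescaped; unquoted tokens
// (numbers, NULL) are returned as raw text, so NULL and the string 'NULL'
// are not distinguished in this simplified version.
func parseValues(line string) ([][]string, error) {
	const marker = " VALUES "
	i := strings.Index(line, marker)
	if i < 0 {
		return nil, errors.New("not an INSERT ... VALUES line")
	}
	s := line[i+len(marker):]
	var rows [][]string
	var row []string
	var field strings.Builder
	inRow, inStr := false, false
	for j := 0; j < len(s); j++ {
		c := s[j]
		switch {
		case inStr && c == '\\' && j+1 < len(s): // backslash escape
			j++
			field.WriteByte(unescape(s[j]))
		case inStr && c == '\'' && j+1 < len(s) && s[j+1] == '\'': // doubled quote
			j++
			field.WriteByte('\'')
		case inStr && c == '\'': // end of string literal
			inStr = false
		case inStr:
			field.WriteByte(c)
		case c == '\'': // start of string literal
			inStr = true
		case c == '(' && !inRow: // start of a row
			inRow, row = true, nil
		case c == ',' && inRow: // end of a field
			row = append(row, field.String())
			field.Reset()
		case c == ')' && inRow: // end of the last field and of the row
			row = append(row, field.String())
			field.Reset()
			rows = append(rows, row)
			inRow = false
		case inRow: // unquoted token such as a number or NULL
			field.WriteByte(c)
		}
	}
	if inRow || inStr {
		return nil, errors.New("truncated INSERT statement")
	}
	return rows, nil
}

func main() {
	line := "INSERT INTO `image` VALUES ('Foo.jpg',1024,640,480,'it\\'s',NULL),('Bar.png',2048,10,20,'',NULL);"
	rows, err := parseValues(line)
	if err != nil {
		panic(err)
	}
	for _, r := range rows {
		fmt.Println(strings.Join(r, " | "))
	}
}
```

On a real file such as image.sql.gz you would wrap it in `gzip.NewReader` and read it line by line. The `INSERT` lines can be several megabytes long, so a `bufio.Scanner` needs its limit raised with `Scanner.Buffer` (or use `bufio.Reader.ReadString('\n')`). The field order follows the `CREATE TABLE` statement at the top of the same file, which is worth checking rather than assuming.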


Mitar

On Thu, Feb 3, 2022 at 9:13 AM Mitar  wrote:
>
> Hi!
>
> I see. Thanks.
>
>
> Mitar



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


[Xmldatadumps-l] Re: Access imageinfo data in a dump

2022-02-03 Thread Mitar
Hi!

I see. Thanks.


Mitar

On Thu, Feb 3, 2022 at 7:17 AM Ariel Glenn WMF  wrote:
>
> The media/file descriptions contained in the dump are the wikitext of the 
> revisions of pages with the File: prefix, plus the metadata about those pages 
> and revisions (user that made the edit, timestamp of edit, edit comment, and 
> so on).
>
> Width and height of the image, the media type, the sha1 of the image and a
> few other details can be obtained by looking at the image.sql.gz file 
> available for download for the dumps for each wiki. Have a look at 
> https://www.mediawiki.org/wiki/Manual:Image_table for more info.
>
> Hope that helps!
>
> Ariel Glenn
>
>
>
> On Wed, Feb 2, 2022 at 10:45 PM Mitar  wrote:
>>
>> Hi!
>>
>> I am trying to find a dump of all imageinfo data [1] for all files on
>> Commons. I thought that "Articles, templates, media/file descriptions,
>> and primary meta-pages" XML dump would contain that, given the
>> "media/file descriptions" part, but it seems this is not the case. Is
>> there a dump which contains that information? And what is "media/file
>> descriptions" then? Wiki pages of files?
>>
>> [1] https://www.mediawiki.org/wiki/API:Imageinfo
>>
>>
>> Mitar
>>
>> --
>> http://mitar.tnode.com/
>> https://twitter.com/mitar_m



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m