Hello Wolfgang,

I am trying to reproduce your issue. The file is ~140 GB, so running `bzcat`
on it takes a long while. I will get back to you with the result.
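
In case it helps in the meantime, here is a minimal sketch of how the check could be scripted so that one decompression pass answers both questions: did bzip2 exit cleanly, and does the output end with `</mediawiki>` (the expected last line, taken from your dewiki cross-check)? It assumes bash and bzip2 on PATH and a local copy of the file; `check_dump_tail` is a hypothetical helper name, not an existing tool.

```shell
#!/usr/bin/env bash
# Sketch: decompress a local .bz2 dump once and report whether it is complete.
# Assumes bash and bzip2 on PATH; check_dump_tail is a hypothetical helper.
# Return codes: 0 = OK, 1 = bzip2 reported an error, 2 = unexpected last line.

check_dump_tail() {
  local file=$1 last rc
  # bzip2 -dc decompresses all concatenated streams of a multistream file.
  # The subshell exits with bzip2's status (PIPESTATUS[0]) instead of
  # tail's, which is normally 0 even when the input was cut short.
  last=$(bzip2 -dc -- "$file" | tail -n 1; exit "${PIPESTATUS[0]}")
  rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "bzip2 exited with $rc: $file is truncated or corrupt" >&2
    return 1
  fi
  # A complete dump is expected to end with the closing </mediawiki> tag.
  if [ "$last" != "</mediawiki>" ]; then
    echo "bzip2 succeeded, but the last line is: $last" >&2
    return 2
  fi
  echo "OK: $file"
}
```

Usage: `check_dump_tail wikidatawiki-20240101-pages-articles-multistream.xml.bz2`. This still decompresses the whole file; it only avoids a second pass by checking the exit status and the last line together.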

For now, here is the SHA-1 hash of that file so that you can compare it
against your local copy and see whether it was corrupted in transit:

$ sha1sum wikidatawiki-20240101-pages-articles-multistream.xml.bz2

1be753ba90e0390c8b65f9b80b08015922da12f1
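
A minimal sketch of that comparison, assuming GNU coreutils' `sha1sum` is available; `verify_sha1` is a hypothetical helper name. It reads the file on stdin so that `sha1sum` prints only the digest followed by `-`, which keeps the parsing simple.

```shell
#!/usr/bin/env bash
# Sketch: compare a local file against a published SHA-1 digest.
# verify_sha1 is a hypothetical helper; assumes GNU coreutils' sha1sum.

verify_sha1() {
  local file=$1 expected=$2 actual
  # On stdin, sha1sum prints "<digest>  -"; strip everything after the digest.
  actual=$(sha1sum < "$file") || return 1
  actual=${actual%% *}
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (expected $expected, got $actual)" >&2
    return 1
  fi
}
```

Usage: `verify_sha1 wikidatawiki-20240101-pages-articles-multistream.xml.bz2 1be753ba90e0390c8b65f9b80b08015922da12f1`. Alternatively, `sha1sum -c` can do the comparison directly: `echo "1be753ba90e0390c8b65f9b80b08015922da12f1  wikidatawiki-20240101-pages-articles-multistream.xml.bz2" | sha1sum -c -` (note the two spaces between digest and file name).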


On Fri, Jan 5, 2024 at 12:03 PM Wurgl <heisewu...@gmail.com> wrote:

> Hello!
>
> I am getting some unexpected messages, so I tried the following:
>
> curl -s \
>   https://dumps.wikimedia.org/wikidatawiki/latest/wikidatawiki-latest-pages-articles-multistream.xml.bz2 \
>   | bzip2 -d | tail
>
> and got this:
>
> bzip2: Compressed file ends unexpectedly;
>         perhaps it is corrupted?  *Possible* reason follows.
> bzip2: Inappropriate ioctl for device
>         Input file = (stdin), output file = (stdout)
>
> It is possible that the compressed file(s) have become corrupted.
> You can use the -tvv option to test integrity of such files.
>
> You can use the `bzip2recover' program to attempt to recover
> data from undamaged sections of corrupted files.
>
>       <parentid>1227967782</parentid>
>       <timestamp>2023-12-07T00:22:05Z</timestamp>
>       <contributor>
>         <username>Renamerr</username>
>         <id>2883061</id>
>       </contributor>
>       <comment>/* wbsetdescription-add:1|uk */ бактеріальний білок,
> наявний у Listeria monocytogenes EGD-e,
> [[:toollabs:quickstatements/#/batch/218434|batch #218434]]</comment>
>       <model>wikibase-item</model>
>       <format>application/json</format>
>
> The first part is an error message that I also see when running my
> PHP script from within the toolserver-cloud (PHP 7.4, because the XMLReader
> class simply core dumps with the installed PHP 8.2; see T352886). The
> second part is the output of the "tail" command.
>
> Just as a cross-check: I have no such problem with
> curl -s \
>   https://dumps.wikimedia.org/dewiki/latest/dewiki-latest-pages-meta-current.xml.bz2 \
>   | bzip2 -d | tail
>
> There is no error, and the last line is "</mediawiki>".
>
> Cheers,
> Wolfgang
>
> _______________________________________________
> Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
> To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org
>


-- 
Xabriel J. Collazo Mojica (he/him, pronunciation:
<https://commons.wikimedia.org/wiki/File:Xabriel_Collazo_Mojica_-_pronunciation.ogg>)
Sr Software Engineer
Wikimedia Foundation