Charles-Francois Natali <neolo...@free.fr> added the comment:

> Stupid questions are always worth asking. I did check the MD5 sum earlier
> and just checked it again (since I copied the file from one machine to
> another):
>
> ebwolf@ubuntu:/opt$ md5sum /host/full-planet-110115-1800.osm.bz2
> 0e3f81ef0dd415d8f90f1378666a400c /host/full-planet-110115-1800.osm.bz2
> ebwolf@ubuntu:/opt$ cat full-planet-110115-1800.osm.bz2.md5
> 0e3f81ef0dd415d8f90f1378666a400c full-planet-110115-1800.osm.bz2
>
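For reference, the md5sum comparison quoted above can be reproduced in Python with hashlib, reading the archive in chunks so that a multi-gigabyte file never has to fit in memory. This is a minimal sketch: md5_of_file, expected_md5 and verify are hypothetical helper names, and expected_md5 assumes the usual md5sum line format "<hexdigest> <filename>". Like md5sum itself, it only shows that the local copy matches the published checksum.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of *path*, read in fixed-size chunks
    so that huge archives are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(md5_path):
    """Return the digest from an md5sum-style file (assumed format:
    '<hexdigest> <filename>'); only the first field is used."""
    with open(md5_path) as f:
        return f.read().split()[0].lower()


def verify(path, md5_path):
    """True if the file's digest matches the one recorded in md5_path."""
    return md5_of_file(path) == expected_md5(md5_path)
```

Usage would be something like verify("/host/full-planet-110115-1800.osm.bz2", "full-planet-110115-1800.osm.bz2.md5").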
Well, that only proves that the file wasn't corrupted during the download. It doesn't prove that the file on the remote server isn't corrupt (see for example the link I gave you: the guy used rsync and had a correct checksum, but was still unable to extract the file).

> There you have it. I was able to convert the bz2 to gzip with no errors:
>
> bzcat full-planet-110115-1800.osm.bz2 | gzip > full-planet.osm.gz
>

How big is full-planet.osm.gz? Since bzip2 uses bzlib, it could very well return after having uncompressed only half the file. A more interesting test would be:

$ bzip2 -cd full-planet-110115-1800.osm.bz2 | bzip2 -c > full-planet.new.osm.bz2
$ md5sum full-planet*.bz2

> FYI: This problem came up last year with no resolution:
>
> http://mail.python.org/pipermail/tutor/2010-February/074610.html
>

Yeah, and it was also on an OSM file. Now, I know that OSM is probably one of the biggest providers of huge archives, but it's surprising that every time there's a problem with bz2, it's with an OSM file, no?

Look at what I just found, a message from an OSM admin dating from late 2010:

"""
On 26 October 2010 13:47, Anthony <osm <at> inbox.org> wrote:
> a@A-PC:/media/usbdrive$ cat full-planet-101022.osm.bz2.md5
> 0a90fec8ce66bdd82984c2ee8c6bb6ac full-planet-101022.osm.bz2
> a@A-PC:/media/usbdrive$ md5sum full-planet-101022.osm.bz2
> c652430b00668c30bb04816ff16cbfbe full-planet-101022.osm.bz2
>
> Just me?
>

We had problems with the network card in that machine last night causing some corruption, try rsync://planet.openstreetmap.org/planet/full-experimental/ to get the file into a good state. Although best to wait a few hours, currently packet loss issues on server's upstream network.

Regards
Grant
"""

> In general, is it best to always read the same number of bytes?

In that case, it doesn't matter.

> And what is the best value to pass for buffering in BZ2File? I just made up
> something hoping it would work.
The default one ;-) (don't provide any).

> Colin was using an OSM planet file from some time last year and it quit at
> exactly 900000 bytes.

OSM again :-)
900,000 is exactly the default bz2 block size...

----------
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10900>
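A hedged sketch of the kind of check discussed in this thread (how much data actually comes out of the archive, and whether decompression stops early), assuming Python 3.3+ because it relies on BZ2Decompressor.eof. Rather than trusting a pipeline that exits without errors, it decompresses the first bzip2 stream incrementally, counts the output bytes without storing them, and reports whether the end-of-stream marker was reached and how many compressed bytes follow it. first_stream_report is a hypothetical helper name, not part of the bz2 module. A reached_end of False means the data ran out before the end-of-stream marker (for example a truncated download); a non-zero leftover means compressed data remains after the first stream, so a reader that stops there would not see the whole file.

```python
import bz2


def first_stream_report(path, chunk_size=1 << 20):
    """Decompress the first bzip2 stream of *path* chunk by chunk.

    Returns (decompressed_bytes, reached_end, leftover_compressed_bytes).
    Hypothetical helper; requires Python 3.3+ for BZ2Decompressor.eof.
    """
    decomp = bz2.BZ2Decompressor()
    total_out = 0
    leftover = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            if decomp.eof:
                # The first stream already ended: just count what follows it.
                leftover += len(chunk)
                continue
            # Count decompressed bytes without keeping them in memory.
            total_out += len(decomp.decompress(chunk))
    if decomp.eof:
        # Bytes of the chunk that contained the end-of-stream marker
        # which lie past the end of the stream.
        leftover += len(decomp.unused_data)
    return total_out, decomp.eof, leftover
```

For example, print(first_stream_report("full-planet-110115-1800.osm.bz2")) reports the decompressed size of the first stream, which can be compared with the 900,000 bytes mentioned above. The chunk size does not affect the result, only memory use and speed.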