Re: [Tutor] Reading large bz2 Files
Norman Rieß, 19.02.2010 13:42:
> I am trying to read a large bz2 file with this code:
>
> source_file = bz2.BZ2File(file, "r")
> for line in source_file:
>     print line.strip()
>
> But after 4311 lines, it stops without an error message. The bz2 file is
> much bigger though.

Could you send in a copy of the unpacked bytes around the position where it stops? I.e. a couple of lines before and after that position? Note that bzip2 is a block compressor, so, depending on your data, you may have to send enough lines to fill the block size.

Does it also stop if you parse only those lines from a bzip2 file, or is it required that the file has at least the current amount of data before those lines?

Based on this, could you please do a bit of poking around yourself to figure out whether it is a) the byte position, b) the data content or c) the length of the file that induces this behaviour? I assume it's rather impractical to share the entire file, so you will have to share hints and information instead if you want this resolved.

Stefan

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Reading large bz2 Files
On 19.02.2010 22:24, Lie Ryan wrote:
> On 02/20/10 07:49, Norman Rieß wrote:
>> On 19.02.2010 21:42, Lie Ryan wrote:
>>> On 02/19/10 23:42, Norman Rieß wrote:
>>>> Hello,
>>>>
>>>> I am trying to read a large bz2 file with this code:
>>>>
>>>> source_file = bz2.BZ2File(file, "r")
>>>> for line in source_file:
>>>>     print line.strip()
>>>>
>>>> But after 4311 lines, it stops without an error message. The bz2 file is
>>>> much bigger though.
>>>> How can I read the whole file line by line?
>>>
>>> Is the bz2 file an archive[1]?
>>>
>>> [1] archive: contains more than one file
>>
>> No, it is a single file. But how could I check for sure? It extracts to
>> a single file...
>
> use "bzip2 -dc" or "bunzip2" instead of "bzcat" since bzcat concatenates
> its output file to a single file.

Yes, it is a single file.
Re: [Tutor] Reading large bz2 Files
On 19.02.2010 22:03, Kent Johnson wrote:
> On Fri, Feb 19, 2010 at 7:42 AM, Norman Rieß wrote:
>> Hello,
>>
>> I am trying to read a large bz2 file with this code:
>>
>> source_file = bz2.BZ2File(file, "r")
>> for line in source_file:
>>     print line.strip()
>>
>> But after 4311 lines, it stops without an error message. The bz2 file is
>> much bigger though.
>> How can I read the whole file line by line?
>
> I wonder if it is dying after reading 2^31 or 2^32 bytes? It sounds a
> bit like this (fixed) bug:
> http://bugs.python.org/issue1215928
>
> Kent

./osmcut.py ../planet-100210.osm.bz2 > test.txt
sm...@loki ~/osm/osmcut $ ls -lh test.txt
-rw-r--r-- 1 871K 19. Feb 22:41 test.txt

The output is only 871K, so it stops far short of 2^31 bytes.

Norman
Re: [Tutor] Reading large bz2 Files
On 02/20/10 07:49, Norman Rieß wrote:
> On 19.02.2010 21:42, Lie Ryan wrote:
>> On 02/19/10 23:42, Norman Rieß wrote:
>>> Hello,
>>>
>>> I am trying to read a large bz2 file with this code:
>>>
>>> source_file = bz2.BZ2File(file, "r")
>>> for line in source_file:
>>>     print line.strip()
>>>
>>> But after 4311 lines, it stops without an error message. The bz2 file is
>>> much bigger though.
>>> How can I read the whole file line by line?
>>
>> Is the bz2 file an archive[1]?
>>
>> [1] archive: contains more than one file
>
> No, it is a single file. But how could I check for sure? It extracts to
> a single file...

use "bzip2 -dc" or "bunzip2" instead of "bzcat" since bzcat concatenates its output file to a single file.
Re: [Tutor] Reading large bz2 Files
On 02/20/10 07:42, Lie Ryan wrote:
> On 02/19/10 23:42, Norman Rieß wrote:
>> Hello,
>>
>> I am trying to read a large bz2 file with this code:
>>
>> source_file = bz2.BZ2File(file, "r")
>> for line in source_file:
>>     print line.strip()
>>
>> But after 4311 lines, it stops without an error message. The bz2 file is
>> much bigger though.
>> How can I read the whole file line by line?
>
> Is the bz2 file an archive[1]?
>
> [1] archive: contains more than one file

Or, more clearly: does the bz2 file contain multiple files compressed using the -c flag? The -c flag does a simple concatenation of multiple compressed streams to stdout; the result is only decompressible with bzip2 0.9.0 or later[1]. You cannot use bz2.BZ2File to open this; instead, use the stream decompressor bz2.BZ2Decompressor.

A better approach is to use a real archiving format (e.g. tar).

[1] http://www.bzip.org/1.0.3/html/description.html
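The stream-decompressor approach mentioned above could look roughly like the following. This is only a sketch, and it assumes Python 3.3 or later, where `bz2.BZ2Decompressor` exposes the `eof` and `unused_data` attributes; the thread itself concerns Python 2.6, where the details differ. The function name `iter_lines_multistream` and the `chunk_size` parameter are invented for illustration.

```python
import bz2


def iter_lines_multistream(path, chunk_size=1024 * 1024):
    """Yield lines (as bytes) from a bz2 file that may hold several
    concatenated compressed streams. Hypothetical helper, not a library API."""
    decompressor = bz2.BZ2Decompressor()
    pending = b""  # decompressed bytes that do not yet end in a newline
    with open(path, "rb") as raw:
        while True:
            chunk = raw.read(chunk_size)
            if not chunk:
                break
            while chunk:
                pending += decompressor.decompress(chunk)
                if decompressor.eof:
                    # One stream has ended: whatever follows belongs to the
                    # next stream, so start a fresh decompressor for it.
                    chunk = decompressor.unused_data
                    decompressor = bz2.BZ2Decompressor()
                else:
                    chunk = b""
                lines = pending.split(b"\n")
                pending = lines.pop()  # keep the incomplete last line
                for line in lines:
                    yield line + b"\n"
    if pending:
        yield pending
```

On Python 3.3 and later, `bz2.open()` and `bz2.BZ2File` read multi-stream files themselves, so a helper like this is mainly useful for seeing what happens at a stream boundary.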
Re: [Tutor] Reading large bz2 Files
On Fri, Feb 19, 2010 at 7:42 AM, Norman Rieß wrote:
> Hello,
>
> I am trying to read a large bz2 file with this code:
>
> source_file = bz2.BZ2File(file, "r")
> for line in source_file:
>     print line.strip()
>
> But after 4311 lines, it stops without an error message. The bz2 file is
> much bigger though.
> How can I read the whole file line by line?

I wonder if it is dying after reading 2^31 or 2^32 bytes? It sounds a bit like this (fixed) bug:
http://bugs.python.org/issue1215928

Kent
Re: [Tutor] Reading large bz2 Files
On 19.02.2010 21:42, Lie Ryan wrote:
> On 02/19/10 23:42, Norman Rieß wrote:
>> Hello,
>>
>> I am trying to read a large bz2 file with this code:
>>
>> source_file = bz2.BZ2File(file, "r")
>> for line in source_file:
>>     print line.strip()
>>
>> But after 4311 lines, it stops without an error message. The bz2 file is
>> much bigger though.
>> How can I read the whole file line by line?
>
> Is the bz2 file an archive[1]?
>
> [1] archive: contains more than one file

No, it is a single file. But how could I check for sure? It extracts to a single file...
Re: [Tutor] Reading large bz2 Files
On 02/19/10 23:42, Norman Rieß wrote:
> Hello,
>
> I am trying to read a large bz2 file with this code:
>
> source_file = bz2.BZ2File(file, "r")
> for line in source_file:
>     print line.strip()
>
> But after 4311 lines, it stops without an error message. The bz2 file is
> much bigger though.
> How can I read the whole file line by line?

Is the bz2 file an archive[1]?

[1] archive: contains more than one file
Re: [Tutor] Reading large bz2 Files
On 19.02.2010 17:04, Steven D'Aprano wrote:
> My guess is one of two things:
> (1) You are mistaken that the file is bigger than 4311 lines.
>
> (2) You are using Windows, and somehow there is a Ctrl-Z (0x1A)
> character in the file, which Windows interprets as End Of File when
> reading files in text mode. Try changing the mode to "rb" and see if
> the behaviour goes away.

On 19.02.2010 17:15, Stefan Behnel wrote:
> What does "stops" mean here? Does it crash? Does it exit from the loop? Is
> the above code exactly what you used for testing? Are you passing a
> filename? What platform is this on?
>
> How many lines does it have? How did you count them? Did you make sure that
> you are reading from the right file?

Hello,

I took the liberty of combining your mails, so I do not have to repeat things.

How big is the file, and how did I count that:

sm...@loki ~/osm $ bzcat planet-100210.osm.bz2 | wc -l
1717362770

(this took a looong time ;-))

sm...@loki ~/osm $ du -h planet-100210.osm.bz2
8,0G    planet-100210.osm.bz2

So as you can see, the file really is bigger. I am not using Windows, and the next character would be a period.

sm...@loki ~/osm/osmcut $ ./osmcut.py ../planet-100210.osm.bz2
[...]

I did set the mode to "rb", with the same result. I also edited the code to see whether the loop was exited or the program crashed. As you can see, there is no error; the loop just exits. This is the _exact_ code I use:

source_file = bz2.BZ2File(osm_file, "r")
for line in source_file:
    print line.strip()

print "Exiting"
print "I used file: " + osm_file

As you can see above, the loop exits, the prints are executed and the right file is used. The content of the file is really distinctive, so there is no doubt that it is the right file.

Here is my platform information:

Python 2.6.4
Linux 2.6.32.8 #1 SMP Fri Feb 12 13:29:10 CET 2010 x86_64 Intel(R) Core(TM)2 Duo CPU U9400 @ 1.40GHz GenuineIntel GNU/Linux

Note: This symptom shows up on another platform (SuSE 11.1) with different software versions as well.

Is there a possibility that the bz2 module reads only into a limited buffer and no further? If so, that would explain both the identical behaviour on the two independent systems and why it works in Steven's smaller example. How could I avoid that?

Oh, and the content of the file is free, so I do not get into legal issues exposing it.

Thanks.

Regards,
Norman
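One way to probe the "limited buffer" question is to check whether the first compressed stream ends well before the end of the file, which would mean the file consists of several concatenated streams. The sketch below assumes Python 3.3 or later (for the `eof` and `unused_data` attributes of `bz2.BZ2Decompressor`), not the Python 2.6 used in this thread; `first_stream_end` and `chunk_size` are hypothetical names chosen for illustration.

```python
import bz2
import os


def first_stream_end(path, chunk_size=1024 * 1024):
    """Return the byte offset in the compressed file at which the first
    bz2 stream ends, or None if the file ends before that stream does.
    Hypothetical diagnostic helper, not part of the bz2 module."""
    decompressor = bz2.BZ2Decompressor()
    consumed = 0  # compressed bytes handed to the decompressor so far
    with open(path, "rb") as raw:
        while True:
            chunk = raw.read(chunk_size)
            if not chunk:
                return None
            decompressor.decompress(chunk)  # output is discarded
            consumed += len(chunk)
            if decompressor.eof:
                # Bytes after the end of the stream were not consumed.
                return consumed - len(decompressor.unused_data)
```

Comparing the returned offset with `os.path.getsize(path)` shows whether data remains after the first stream; if it does, a reader that handles only a single stream would stop there without raising an error.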
Re: [Tutor] Reading large bz2 Files
Norman Rieß, 19.02.2010 13:42:
> I am trying to read a large bz2 file with this code:
>
> source_file = bz2.BZ2File(file, "r")
> for line in source_file:
>     print line.strip()
>
> But after 4311 lines, it stops without an error message.

What does "stops" mean here? Does it crash? Does it exit from the loop? Is the above code exactly what you used for testing? Are you passing a filename? What platform is this on?

> The bz2 file is much bigger though.

How many lines does it have? How did you count them? Did you make sure that you are reading from the right file?

> How can I read the whole file line by line?

Just as you do above, and it works for me. So the problem is likely elsewhere.

Stefan
Re: [Tutor] Reading large bz2 Files
On Fri, 19 Feb 2010 11:42:07 pm Norman Rieß wrote:
> Hello,
>
> I am trying to read a large bz2 file with this code:
>
> source_file = bz2.BZ2File(file, "r")
> for line in source_file:
>     print line.strip()
>
> But after 4311 lines, it stops without an error message. The bz2 file
> is much bigger though.
>
> How can I read the whole file line by line?

"for line in file" works for me:

>>> import bz2
>>>
>>> writer = bz2.BZ2File('file.bz2', 'w')
>>> for i in xrange(2):
...     # write some variable text to a line
...     writer.write('abc'*(i % 5) + '\n')
...
>>> writer.close()
>>> reader = bz2.BZ2File('file.bz2', 'r')
>>> i = 0
>>> for line in reader:
...     i += 1
...
>>> reader.close()
>>> i
2

My guess is one of two things:

(1) You are mistaken that the file is bigger than 4311 lines.

(2) You are using Windows, and somehow there is a Ctrl-Z (0x1A) character in the file, which Windows interprets as End Of File when reading files in text mode. Try changing the mode to "rb" and see if the behaviour goes away.

--
Steven D'Aprano
[Tutor] Reading large bz2 Files
Hello,

I am trying to read a large bz2 file with this code:

source_file = bz2.BZ2File(file, "r")
for line in source_file:
    print line.strip()

But after 4311 lines, it stops without an error message. The bz2 file is much bigger though.
How can I read the whole file line by line?

Thank you.

Regards,
Norman