 ID:               42117
 User updated by:  phofstetter at sensational dot ch
 Reported By:      phofstetter at sensational dot ch
-Status:           Feedback
+Status:           Open
 Bug Type:         Bzip2 Related
 Operating System: Linux and Mac OSX
 PHP Version:      5.2.3
 New Comment:

Hello,

even with the latest snapshot, the bug is still there. Data in bzip's
internal buffer is lost on stream close due to the problem I discovered
in bz2_filter.c.

The patch I proposed below seems to do the right thing and in fact is now
creating more than 1000 correct bzip2 streams per day, so I think it's
safe to say that it really does its job :-)

It's incorrect to compare the return code of

    BZ2_bzCompress(&(data->strm), BZ_FINISH);

with BZ_OUTBUFF_FULL, as BZ2_bzCompress *never* returns BZ_OUTBUFF_FULL
(which is a return value of one of the higher-level convenience functions
in bzlib).

Philip


Previous Comments:
------------------------------------------------------------------------

[2007-08-04 20:56:33] [EMAIL PROTECTED]

Please try using this CVS snapshot:

  http://snaps.php.net/php5.2-latest.tar.gz

For Windows (zip):

  http://snaps.php.net/win32/php5.2-win32-latest.zip

For Windows (installer):

  http://snaps.php.net/win32/php5.2-win32-installer-latest.msi

------------------------------------------------------------------------

[2007-07-27 10:06:28] phofstetter at sensational dot ch

Looking at the documentation wasn't enough. When I looked at the source
of bzlib, I found this:

BZ2_bzCompress called with BZ_FINISH keeps returning BZ_FINISH_OK
(instead of BZ_RUN_OK, which I assumed after reading the docs) until it's
really done. Then it returns BZ_STREAM_END.

So the following patch fixes this bug:

--- bz2_filter.c.orig   2007-07-27 11:24:44.000000000 +0200
+++ bz2_filter.c        2007-07-27 11:54:35.000000000 +0200
@@ -228,8 +228,8 @@
        if (flags & PSFS_FLAG_FLUSH_CLOSE) {
                /* Spit it out! */
-               status = BZ_OUTBUFF_FULL;
-               while (status == BZ_OUTBUFF_FULL) {
+               status = BZ_FINISH_OK;
+               while (status == BZ_FINISH_OK) {
                        status = BZ2_bzCompress(&(data->strm), BZ_FINISH);
                        if (data->strm.avail_out < data->outbuf_len) {
                                size_t bucketlen = data->outbuf_len - data->strm.avail_out;

With this modification, the complete data gets written out to the stream.
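
To illustrate why the loop condition matters, here is a toy model in C. It is only a sketch and does not use bzlib: toy_stream, toy_finish and drain are hypothetical names, and the TOY_* constants are local stand-ins for the return codes named above. The only assumption it encodes is the behaviour described in this report, namely that a finish call returns "finish OK" while data is still buffered and "stream end" once everything has been emitted.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the bzlib return codes named in the report.
   They are defined here so the sketch needs no bzlib headers. */
enum { TOY_FINISH_OK = 3, TOY_STREAM_END = 4, TOY_OUTBUFF_FULL = -8 };

/* Hypothetical toy compressor: only tracks how much data is buffered. */
typedef struct {
    size_t pending; /* bytes still held in the internal buffer */
} toy_stream;

/* One "finish" call: emits at most out_cap bytes, stores the emitted
   count in *written, and reports whether more calls are needed. */
static int toy_finish(toy_stream *s, size_t out_cap, size_t *written)
{
    size_t n = s->pending < out_cap ? s->pending : out_cap;
    s->pending -= n;
    *written = n;
    return s->pending == 0 ? TOY_STREAM_END : TOY_FINISH_OK;
}

/* Drain the stream, repeating while the status equals loop_while.
   Returns the total number of bytes that made it out. */
static size_t drain(toy_stream *s, size_t out_cap, int loop_while)
{
    size_t total = 0, written = 0;
    int status = loop_while;

    assert(out_cap > 0);
    while (status == loop_while) {
        status = toy_finish(s, out_cap, &written);
        total += written;
    }
    return total;
}
```

With 10000 bytes buffered and a 4096-byte output buffer, drain() with TOY_OUTBUFF_FULL as the loop condition stops after a single call, while drain() with TOY_FINISH_OK keeps calling until the stream reports its end. That matches the truncated output seen in the original report.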

Please consider applying this patch, as without it the bzip2.compress
filter will sometimes (often, if the data is bigger than the internal
buffer) create corrupted data.

Philip

PS: The patch is against 5.2.2 as I'm unable to compile 5.2.3 on OSX with
GD enabled, due to gcc being called with an empty -L flag somewhere in
configure.

------------------------------------------------------------------------

[2007-07-27 09:21:38] phofstetter at sensational dot ch

After finally getting some sleep, today I looked at the problem in
bz2_filter.c and I may have found something.

On line 229+ we have

    if (flags & PSFS_FLAG_FLUSH_CLOSE) {
        /* Spit it out! */
        status = BZ_OUTBUFF_FULL;
        while (status == BZ_OUTBUFF_FULL) {
            status = BZ2_bzCompress(&(data->strm), BZ_FINISH);
            if (data->strm.avail_out < data->outbuf_len) {
                size_t bucketlen = data->outbuf_len - data->strm.avail_out;

                bucket = php_stream_bucket_new(stream, estrndup(data->outbuf, bucketlen), bucketlen, 1, 0 TSRMLS_CC);
                php_stream_bucket_append(buckets_out, bucket TSRMLS_CC);
                data->strm.avail_out = data->outbuf_len;
                data->strm.next_out = data->outbuf;
                exit_status = PSFS_PASS_ON;
            }
        }
    }

Now the problem is IMHO that BZ2_bzCompress with BZ_FINISH will never
return BZ_OUTBUFF_FULL. Looking at the documentation, it will return
BZ_RUN_OK until all data has been processed, when it will return
BZ_FINISH_OK.

So with the code as it is currently in PHP, it will only do one run of
BZ2_bzCompress and then stop working, even though more calls could be
needed. This is consistent with how the bug manifests itself.

I will try to correct the return code handling, but keep in mind that my
C skills are subpar, so the patch I'm going to post afterwards is
probably not as good as it could be. Please have a look at it.

Philip

------------------------------------------------------------------------

[2007-07-26 22:53:50] phofstetter at sensational dot ch

Description:
------------
When bzip2.compress is attached to a stream and enough data is created
so the output will be larger than some internal buffer, all data in the
last, not completely full buffer seems to get lost on the way out.

The sample code contains quite a lot of filler text, which is needed to
fill the internal buffer far enough to trigger the problem.

I have had this problem ever since stream filters were introduced into
PHP, but now I could finally create a much reduced test case
demonstrating the problem.

Reproduce code:
---------------
<?
$str = "BEGIN (%d)\n Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. \nEND (%d)\n";

$h = fopen($_SERVER['argv'][1], 'w');
$f = stream_filter_append($h, "bzip2.compress", STREAM_FILTER_WRITE);
for($x=0; $x < 10000; $x++){
   fprintf($h, $str, $x, $x);
}
fclose($h);
echo "Written\n";
?>

Expected result:
----------------
If I call the script with

  ./script.php blah

and then use

  bzcat blah

I expect the complete data to be output to the console, from BEGIN (0)
to END (9999).

Actual result:
--------------
bzcat outputs until somewhere around "END (9207)" and then bails out
with

bzcat: Compressed file ends unexpectedly;
        perhaps it is corrupted?  *Possible* reason follows.
bzcat: Unknown error: 0
        Input file = blah, output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

And true enough - the file is not completely written.

------------------------------------------------------------------------