ID:               42117
User updated by:  phofstetter at sensational dot ch
Reported By:      phofstetter at sensational dot ch
Status:           Open
Bug Type:         Bzip2 Related
Operating System: Linux and Mac OSX
PHP Version:      5.2.3

New Comment:
After finally getting some sleep, I looked at the problem in bz2_filter.c today and I may have found something. Starting at line 229 we have:

    if (flags & PSFS_FLAG_FLUSH_CLOSE) {
        /* Spit it out! */
        status = BZ_OUTBUFF_FULL;
        while (status == BZ_OUTBUFF_FULL) {
            status = BZ2_bzCompress(&(data->strm), BZ_FINISH);
            if (data->strm.avail_out < data->outbuf_len) {
                size_t bucketlen = data->outbuf_len - data->strm.avail_out;

                bucket = php_stream_bucket_new(stream, estrndup(data->outbuf, bucketlen), bucketlen, 1, 0 TSRMLS_CC);
                php_stream_bucket_append(buckets_out, bucket TSRMLS_CC);

                data->strm.avail_out = data->outbuf_len;
                data->strm.next_out = data->outbuf;
                exit_status = PSFS_PASS_ON;
            }
        }
    }

The problem, IMHO, is that BZ2_bzCompress() called with BZ_FINISH never returns BZ_OUTBUFF_FULL. According to the libbzip2 documentation, it returns BZ_FINISH_OK as long as compressed output is still pending, and BZ_STREAM_END once the stream has been completely finished. So with the code as it currently is in PHP, BZ2_bzCompress() is called only once and the loop exits, even though more calls may be needed to write out the remaining output. This is consistent with how the bug manifests itself.

I will try to correct the return code handling, but keep in mind that my C skills are subpar, so the patch I'm going to post afterwards is probably not as good as it could be. Please have a careful look at it.

Philip

Previous Comments:
------------------------------------------------------------------------

[2007-07-26 22:53:50] phofstetter at sensational dot ch

Description:
------------
When bzip2.compress is attached to a stream and enough data is written that the compressed output is larger than some internal buffer, all data in the last, not completely full buffer seems to get lost on the way out.

The sample code contains quite a lot of filler text, which is needed to fill the internal buffer far enough to trigger the problem.
I have had this problem ever since stream filters were introduced into PHP, but now I could finally create a much reduced test case that demonstrates it.

Reproduce code:
---------------
<?php
$str = "BEGIN (%d)\n Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. \nEND (%d)\n";

$h = fopen($_SERVER['argv'][1], 'w');
$f = stream_filter_append($h, "bzip2.compress", STREAM_FILTER_WRITE);
for ($x = 0; $x < 10000; $x++) {
    fprintf($h, $str, $x, $x);
}
fclose($h);
echo "Written\n";
?>

Expected result:
----------------
If I call the script with ./script.php blah and then run bzcat blah, I expect the complete data to be written to the console, from BEGIN (0) to END (9999).

Actual result:
--------------
bzcat outputs everything up to somewhere around "END (9207)" and then bails out with:

    bzcat: Compressed file ends unexpectedly;
            perhaps it is corrupted? *Possible* reason follows.
    bzcat: Unknown error: 0
            Input file = blah, output file = (stdout)

    It is possible that the compressed file(s) have become corrupted.
    You can use the -tvv option to test integrity of such files.

    You can use the `bzip2recover' program to attempt to recover
    data from undamaged sections of corrupted files.

And true enough, the file is not completely written.

------------------------------------------------------------------------