ID:               42117
 User updated by:  phofstetter at sensational dot ch
 Reported By:      phofstetter at sensational dot ch
 Status:           Open
 Bug Type:         Bzip2 Related
 Operating System: Linux and Mac OSX
 PHP Version:      5.2.3
 New Comment:

Looking at the documentation wasn't enough. When I looked at the source
of bzlib, I found this:

BZ2_bzCompress called with BZ_FINISH keeps returning

BZ_FINISH_OK

(instead of BZ_RUN_OK which I assumed after reading the docs) until
it's really done. Then it will return 

BZ_STREAM_END

So the following patch fixes this bug:

--- bz2_filter.c.orig   2007-07-27 11:24:44.000000000 +0200
+++ bz2_filter.c        2007-07-27 11:54:35.000000000 +0200
@@ -228,8 +228,8 @@
 
        if (flags & PSFS_FLAG_FLUSH_CLOSE) {
                /* Spit it out! */
-               status = BZ_OUTBUFF_FULL;
-               while (status == BZ_OUTBUFF_FULL) {
+               status = BZ_FINISH_OK;
+               while (status == BZ_FINISH_OK) {
                        status = BZ2_bzCompress(&(data->strm), BZ_FINISH);
                        if (data->strm.avail_out < data->outbuf_len) {
                                size_t bucketlen = data->outbuf_len - data->strm.avail_out;

With this modification, the complete data gets written out to the
stream.

Please consider applying this patch. Without it, the bzip2.compress
filter will create corrupted data whenever the output is large enough to
exceed the internal buffer, which happens often.

Philip
PS: The patch is against 5.2.2 because I'm unable to compile 5.2.3 on
OSX with GD enabled: gcc gets called with an empty -L flag somewhere in
configure.


Previous Comments:
------------------------------------------------------------------------

[2007-07-27 09:21:38] phofstetter at sensational dot ch

After finally getting some sleep, today I looked at the problem in
bz2_filter.c and I may have found something.

On line 229+ we have

        if (flags & PSFS_FLAG_FLUSH_CLOSE) {
                /* Spit it out! */
                status = BZ_OUTBUFF_FULL;
                while (status == BZ_OUTBUFF_FULL) {
                        status = BZ2_bzCompress(&(data->strm), BZ_FINISH);
                        if (data->strm.avail_out < data->outbuf_len) {
                                size_t bucketlen = data->outbuf_len - data->strm.avail_out;

                                bucket = php_stream_bucket_new(stream, estrndup(data->outbuf, bucketlen), bucketlen, 1, 0 TSRMLS_CC);
                                php_stream_bucket_append(buckets_out, bucket TSRMLS_CC);
                                data->strm.avail_out = data->outbuf_len;
                                data->strm.next_out = data->outbuf;
                                exit_status = PSFS_PASS_ON;
                        }
                }
        }

Now the problem is, IMHO, that BZ2_bzCompress called with BZ_FINISH will
never return BZ_OUTBUFF_FULL. Looking at the documentation, it will
return BZ_RUN_OK until all data has been processed, at which point it
will return BZ_FINISH_OK.

So with the code as it currently is in PHP, the loop will only do one run
of BZ2_bzCompress and then stop, even though more calls could be needed.

This is consistent with how the bug manifests itself.
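
To make the one-pass behaviour concrete, here is a toy model of the
flush loop. It does not call bzlib at all: every toy_* name and TOY_*
code is invented for this sketch and only mimics the role of the status
codes discussed above. The model assumes a finish call moves at most one
output buffer of data per call and never reports "output buffer full".

```c
#include <assert.h>
#include <stddef.h>

/* Invented status codes for this toy model only. */
enum toy_status { TOY_FINISH_OK, TOY_STREAM_END, TOY_OUTBUFF_FULL };

/* Toy compressor state: bytes still waiting inside the compressor. */
struct toy_stream { size_t pending; };

/* Emit at most `cap` bytes per call. Reports TOY_STREAM_END once
 * nothing is left, TOY_FINISH_OK otherwise. Never TOY_OUTBUFF_FULL. */
static enum toy_status toy_finish(struct toy_stream *s, size_t cap,
                                  size_t *written)
{
    size_t n = s->pending < cap ? s->pending : cap;

    s->pending -= n;
    *written = n;
    return s->pending == 0 ? TOY_STREAM_END : TOY_FINISH_OK;
}

/* Flush loop shaped like the one in the filter, parameterised on the
 * status value that keeps it running. Returns the bytes written out. */
static size_t toy_drain(size_t pending, size_t cap,
                        enum toy_status keep_going)
{
    struct toy_stream s = { pending };
    enum toy_status status = keep_going;
    size_t total = 0;
    size_t written = 0;

    assert(cap > 0);            /* a zero-sized buffer would never drain */

    while (status == keep_going) {
        status = toy_finish(&s, cap, &written);
        total += written;
    }
    return total;
}
```

With 10000 pending bytes and a 4096-byte buffer,
toy_drain(10000, 4096, TOY_OUTBUFF_FULL) returns 4096, because the loop
stops after its first pass, while toy_drain(10000, 4096, TOY_FINISH_OK)
returns the full 10000.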

I will try to correct the return code handling, but keep in mind that
my C-skills are subpar, so the patch I'm going to post afterwards is
probably not as good as it could be, so please have a look at the
thing.

Philip

------------------------------------------------------------------------

[2007-07-26 22:53:50] phofstetter at sensational dot ch

Description:
------------
When bzip2.compress is attached to a stream and enough data is written
that the output exceeds some internal buffer, all data in the last, not
completely full buffer seems to get lost on the way out.

The sample code contains quite a lot of filler text, which is needed to
fill the internal buffer far enough to trigger the problem.

I have had this problem ever since stream filters were introduced into
PHP, but now I have finally managed to create a much reduced test case
that shows the problem.

Reproduce code:
---------------
<?php
$str = "BEGIN (%d)\n
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip
ex ea commodo consequat. Duis aute irure dolor in reprehenderit in
voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt
mollit anim id est laborum.    
\nEND (%d)\n";

$h = fopen($_SERVER['argv'][1], 'w');
$f = stream_filter_append($h, "bzip2.compress", STREAM_FILTER_WRITE);
for($x=0; $x < 10000; $x++){
    fprintf($h, $str, $x, $x);

}
fclose($h);
echo "Written\n";
?>

Expected result:
----------------
If I call the script with

./script.php blah

and then use

bzcat blah

I expect the complete data output to the console.

from 

BEGIN (0)

to 

END (9999)


Actual result:
--------------
bzcat outputs until somewhere around "END (9207)" and then bails out
with

bzcat: Compressed file ends unexpectedly;
        perhaps it is corrupted?  *Possible* reason follows.
bzcat: Unknown error: 0
        Input file = blah, output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.


And true enough, the file is not completely written.


------------------------------------------------------------------------

