On Tue, Oct 9, 2012 at 5:10 AM, Tjerk Anne Meesters <datib...@php.net> wrote:
> Hi,
>
> I've managed to pinpoint the issue inside the code itself and attached a
> patch for 5.4.4 (I can make one for trunk as well, but at the time of
> writing I worked with what I had).
>
> The bug manifests itself when the delimiter is longer than one byte AND the
> file pointer falls in the middle of a delimiter after the read buffer is
> filled by php_stream_fill_read_buffer().
>
> When this happens, the part of the delimiter that falls on the left side of
> the file pointer is skipped at the next iteration because it was examined
> before; however, that only makes sense for single character delimiters.
>
> My patch will decrement the skip length (if non-zero) by at most <delimiter
> length - 1> bytes before performing the search. This will make sure any
> buffered characters are taken into consideration (again).
>
>


Yup, that makes perfect sense. I had narrowed it down to somewhere
within _php_stream_search_delim, but I couldn't actually think of a
reasonable fix without potentially breaking something else. This looks
quite reasonable and I think it should work fine.
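
For anyone following along, the skip adjustment described in the quoted
patch note can be modelled in userland PHP. This is only a sketch under
stated assumptions: search_delim() and its $rewind flag are hypothetical
names made up for illustration, the buffer contents are invented, and none
of this is the actual code in main/streams/streams.c.

```php
<?php
// Toy model of a delimiter search over a read buffer.
// Hypothetical helper, NOT the real _php_stream_search_delim().

/**
 * Search $buf for $delim, starting $skip bytes in.
 *
 * $skip is the number of bytes examined on a previous pass. When $rewind
 * is true, back up by at most strlen($delim) - 1 bytes first, so that a
 * delimiter straddling the old end of the buffer is examined again.
 */
function search_delim($buf, $delim, $skip, $rewind)
{
    if ($rewind && $skip > 0) {
        $skip -= min($skip, strlen($delim) - 1);
    }
    return strpos($buf, $delim, $skip);
}

// The previous pass saw only "....M" (5 bytes); a later read appended "M...".
$buf  = '....' . 'M' . 'M' . '...';
$skip = 5;

var_dump(search_delim($buf, 'MM', $skip, false)); // bool(false): delimiter missed
var_dump(search_delim($buf, 'MM', $skip, true));  // int(4): delimiter found
```

For a single-byte delimiter the adjustment is zero, which matches the
remark that skipping already-examined bytes is only safe in that case.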


> On Tue, Oct 9, 2012 at 4:33 PM, Sherif Ramadan <theanomaly...@gmail.com>
> wrote:
>>
>> On Tue, Oct 9, 2012 at 12:59 AM, Tjerk Anne Meesters <datib...@php.net>
>> wrote:
>> > On Tue, Oct 9, 2012 at 12:14 AM, Nicolai Scheer
>> > <sc...@planetavent.de> wrote:
>> >
>> >> Hi!
>> >>
>> >> We switched from php 5.3.10 to 5.3.17 this weekend and stumbled upon a
>> >> behaviour of stream_get_line that is most likely a bug and breaks a
>> >> lot of our file processing code.
>> >>
>> >> The issue seems to have been introduced from 5.3.10 to 5.3.11.
>> >>
>> >> I opened a bug report: #63240.
>> >>
>> >
>> > I've managed to reduce the code to this; it's very specific:
>> >
>> > $file = __DIR__ . '/input_dummy.txt';
>> > $delimiter = 'MM';
>> > file_put_contents($file, str_repeat('.', 8189) . $delimiter .
>> > $delimiter);
>> >
>> > $fh = fopen($file, "rb");
>> >
>> > stream_get_line($fh, 8192, $delimiter);
>> > var_dump($delimiter === stream_get_line($fh, 8192, $delimiter));
>> >
>> > fclose($fh);
>> > unlink($file);
>> >
>> > If the internal buffer length is 8192, after the first call to
>> > stream_get_line() the read position (x) and physical file pointer (y)
>> > should be positioned like so:
>> >
>> > .......MM(x)M(y)M
>> >
>> > The fact that (y) falls inside the second delimiter seems to cause an issue.
>> >
>> >
>>
>>
>> I'm not sure why this bug exists, and I haven't exactly been able to
>> pinpoint where the bug manifests itself, but something I find
>> incredibly unusual here is the fact that the size of the stream being
>> exactly 8193 bytes long is the reason the bug exists.
>>
>> It has nothing to do with the file pointer's position since all we have
>> to do here is increase or decrease the size of the file by exactly 1
>> byte and the bug will never show its face.
>>
>> Test case 1: (we decrease the file size from 8193 bytes to 8192 bytes)
>>
>> $file = __DIR__ . '/input_dummy.txt';
>> $delimiter = 'MM';
>> file_put_contents($file, str_repeat('.', 8188) . $delimiter . $delimiter);
>>
>> $fh = fopen($file, "rb");
>>
>> stream_get_line($fh, 8192, $delimiter);
>> var_dump($delimiter === stream_get_line($fh, 8192, $delimiter));
>>
>> fclose($fh);
>> unlink($file);
>>
>> /* bool(false) */
>>
>> ---------------------------------------
>>
>> Test 2: (we increase the file size from 8193 bytes to 8194 bytes)
>>
>> $file = __DIR__ . '/input_dummy.txt';
>> $delimiter = 'MM';
>> file_put_contents($file, str_repeat('.', 8190) . $delimiter . $delimiter);
>>
>> $fh = fopen($file, "rb");
>>
>> stream_get_line($fh, 8192, $delimiter);
>> var_dump($delimiter === stream_get_line($fh, 8192, $delimiter));
>>
>> fclose($fh);
>> unlink($file);
>>
>> /* bool(false) */
>>
>>
>> ----------------------
>>
>>
>> With these test cases, the issue only appears when the file size is
>> exactly 8193 bytes. In fact, any file size of the form (n * 8192) + 1
>> triggers the same issue. However, the bigger anomaly is that it
>> also requires the length of the delimiter to be larger than 1 before
>> the bug manifests itself.
>>
>> I suspect this has something to do with the way PHP streams are
>> buffered internally. The internal stream is read up to a certain
>> length and buffered in memory using the internal API functions, while
>> your calls to PHP-facing functions like stream_get_line() read
>> directly from the buffer instead. So it's possible somewhere in this
>> function (line 1026 of main/streams/streams.c
>> http://lxr.php.net/xref/PHP_5_4/main/streams/streams.c#1026) lies the
>> bug.
>>
>>
>>
>> >> The issue seems to be related to #44607, but that one got fixed years
>> >> ago.
>> >>
>> >> Is anybody able to confirm this behaviour or has stumbled upon this?
>> >>
>> >> Furthermore the behaviour of stream_get_line on an empty file seems to
>> >> have changed between php 5.3.10 and php 5.3.11:
>> >>
>> >> <?php
>> >>
>> >> $file = __DIR__ . '/empty.txt';
>> >> file_put_contents( $file, '' );
>> >> $fh = fopen( $file, 'rb' );
>> >> $data = stream_get_line( $fh, 4096 );
>> >> var_dump( $data );
>> >>
>> >> results in
>> >>
>> >> string(0) ""
>> >>
>> >> for php 5.3.10
>> >>
>> >> and in
>> >>
>> >> bool(false)
>> >>
>> >> for php > 5.3.10.
>> >>
>> >> I don't know if this should be considered a bug, but as far as I know
>> >> such a behaviour should not change during minor releases...
>> >>
>> >> Any insight is appreciated!
>> >>
>> >> Greetings
>> >>
>> >> Nico
>> >>
>> >> --
>> >> PHP Internals - PHP Runtime Development Mailing List
>> >> To unsubscribe, visit: http://www.php.net/unsub.php
>> >>
>> >>
>> >
>> >
>> > --
>> > --
>> > Tjerk
>
>
>
>
> --
> --
> Tjerk
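
The file sizes used in the quoted test cases can be checked with a little
arithmetic, without touching the stream layer at all. The sketch below
assumes, as the thread does, that the stream reads in 8192-byte chunks
aligned to the start of the file; boundary_splits_delimiter() is a
hypothetical helper written for this illustration, and it models only the
explanation given above, not PHP's actual buffering.

```php
<?php
/**
 * Return true when a chunk boundary falls strictly inside one of the two
 * back-to-back delimiters that follow $padding filler bytes.
 * Assumption: reads happen in $chunk-byte blocks aligned to offset 0.
 */
function boundary_splits_delimiter($padding, $delimiter, $chunk = 8192)
{
    $len = strlen($delimiter);
    foreach (array($padding, $padding + $len) as $start) {
        // First chunk boundary strictly after the delimiter's first byte.
        $boundary = $start - ($start % $chunk) + $chunk;
        if ($boundary < $start + $len) {
            return true;
        }
    }
    return false;
}

// The three layouts from the thread: 8188, 8189 and 8190 filler bytes.
foreach (array(8188, 8189, 8190) as $padding) {
    printf(
        "%d filler bytes, file size %d: %s\n",
        $padding,
        $padding + 4,
        boundary_splits_delimiter($padding, 'MM') ? 'boundary inside a delimiter' : 'boundary clear'
    );
}
```

Under these assumptions only the 8189-byte padding (file size 8193) puts a
boundary inside a delimiter, and a single-byte delimiter can never be split,
which is consistent with both observations in the thread.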
