On Tue, Oct 9, 2012 at 5:10 AM, Tjerk Anne Meesters <datib...@php.net> wrote:
> Hi,
>
> I've managed to pinpoint the issue inside the code itself and attached a
> patch for 5.4.4 (I can make one for trunk as well, but at the time of
> writing I worked with what I had).
>
> The bug manifests itself when the delimiter size is > 1 AND the file
> pointer falls in the middle of a delimiter after the read buffer has been
> filled with php_stream_fill_read_buffer().
>
> When this happens, the part of the delimiter that lies to the left of the
> file pointer is skipped on the next iteration because it was already
> examined; however, skipping it only makes sense for single-character
> delimiters.
>
> My patch decrements the skip length (if non-zero) by at most
> <delimiter length - 1> bytes before performing the search. This makes sure
> any buffered characters are taken into consideration (again).
>
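To make the skip-length idea concrete, here is a minimal sketch in PHP (7+) that models a buffered delimiter search in userland. It is not the C code in main/streams/streams.c: find_delim(), its $chunk fill size and the $backOff switch are hypothetical names chosen for illustration. With $backOff off, the search resumes exactly where the previous scan stopped; with it on, it first steps back by at most strlen($delim) - 1 bytes, as the patch description says.

```php
<?php
// Hypothetical model of a buffered delimiter search (not PHP's C code).
// $data stands in for the whole stream; each "fill" appends $chunk bytes
// to $buffer, loosely mimicking how the read buffer gets topped up.
// Returns the offset of the first $delim, or -1 if the stream ends first.
function find_delim(string $data, string $delim, int $chunk, bool $backOff): int
{
    $buffer  = '';
    $scanned = 0;   // bytes of $buffer already examined (the "skip length")
    $offset  = 0;   // bytes of $data already copied into $buffer
    while (true) {
        $start = $scanned;
        if ($backOff && $start > 0) {
            // Step back at most strlen($delim) - 1 bytes so that a delimiter
            // split across two fills is examined as a whole.
            $start = max(0, $start - (strlen($delim) - 1));
        }
        $pos = strpos($buffer, $delim, $start);
        if ($pos !== false) {
            return $pos;
        }
        $scanned = strlen($buffer);
        if ($offset >= strlen($data)) {
            return -1;  // end of stream, no delimiter found
        }
        $buffer .= substr($data, $offset, $chunk);
        $offset += $chunk;
    }
}

// "MM" straddles the 7-byte fill boundary: "......M" | "M".
var_dump(find_delim(str_repeat('.', 6) . 'MM', 'MM', 7, false)); // int(-1)
var_dump(find_delim(str_repeat('.', 6) . 'MM', 'MM', 7, true));  // int(6)
```

Without the back-off, the first "M" is never looked at again after the second fill, so the delimiter is missed entirely; with it, the search restarts one byte earlier and finds the delimiter at offset 6.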
Yup, that makes perfect sense. I had narrowed it down to somewhere within
_php_stream_search_delim, but I couldn't think of a reasonable fix without
potentially breaking something else. This looks quite reasonable and I think
it should work fine.

> On Tue, Oct 9, 2012 at 4:33 PM, Sherif Ramadan <theanomaly...@gmail.com>
> wrote:
>>
>> On Tue, Oct 9, 2012 at 12:59 AM, Tjerk Anne Meesters <datib...@php.net>
>> wrote:
>> > On Tue, Oct 9, 2012 at 12:14 AM, Nicolai Scheer
>> > <sc...@planetavent.de> wrote:
>> >
>> >> Hi!
>> >>
>> >> We switched from PHP 5.3.10 to 5.3.17 this weekend and stumbled upon a
>> >> behaviour of stream_get_line that is most likely a bug and breaks a
>> >> lot of our file-processing code.
>> >>
>> >> The issue seems to have been introduced between 5.3.10 and 5.3.11.
>> >>
>> >> I opened a bug report: #63240.
>> >>
>> >
>> > I've managed to reduce the code to this; it's very specific:
>> >
>> > $file = __DIR__ . '/input_dummy.txt';
>> > $delimiter = 'MM';
>> > file_put_contents($file, str_repeat('.', 8189) . $delimiter . $delimiter);
>> >
>> > $fh = fopen($file, "rb");
>> >
>> > stream_get_line($fh, 8192, $delimiter);
>> > var_dump($delimiter === stream_get_line($fh, 8192, $delimiter));
>> >
>> > fclose($fh);
>> > unlink($file);
>> >
>> > If the internal buffer length is 8192, after the first call to
>> > stream_get_line() the read position (x) and the physical file pointer (y)
>> > should be positioned like so:
>> >
>> > .......MM(x)M(y)M
>> >
>> > The fact that (y) sits in the middle of the delimiter seems to cause
>> > the issue.
>> >
>>
>> I'm not sure why this bug exists, and I haven't been able to pinpoint
>> where it manifests itself, but what I find incredibly unusual is that
>> the bug only appears when the stream is exactly 8193 bytes long.
>>
>> It has nothing to do with the file pointer's position, since all we have
>> to do here is increase or decrease the size of the file by exactly 1
>> byte and the bug never shows up.
>>
>> Test case 1: (we decrease the file size from 8193 bytes to 8192 bytes)
>>
>> $file = __DIR__ . '/input_dummy.txt';
>> $delimiter = 'MM';
>> file_put_contents($file, str_repeat('.', 8188) . $delimiter . $delimiter);
>>
>> $fh = fopen($file, "rb");
>>
>> stream_get_line($fh, 8192, $delimiter);
>> var_dump($delimiter === stream_get_line($fh, 8192, $delimiter));
>>
>> fclose($fh);
>> unlink($file);
>>
>> /* bool(false) */
>>
>> ---------------------------------------
>>
>> Test case 2: (we increase the file size from 8193 bytes to 8194 bytes)
>>
>> $file = __DIR__ . '/input_dummy.txt';
>> $delimiter = 'MM';
>> file_put_contents($file, str_repeat('.', 8190) . $delimiter . $delimiter);
>>
>> $fh = fopen($file, "rb");
>>
>> stream_get_line($fh, 8192, $delimiter);
>> var_dump($delimiter === stream_get_line($fh, 8192, $delimiter));
>>
>> fclose($fh);
>> unlink($file);
>>
>> /* bool(false) */
>>
>> ---------------------------------------
>>
>> As long as the file size is not exactly 8193 bytes, you don't get this
>> issue. In fact, you can test it with any multiple of 8192, plus 1, and
>> the same issue appears. The bigger anomaly, however, is that the bug
>> also only manifests itself when the delimiter is longer than 1 byte.
>>
>> I suspect this has something to do with the way PHP streams are
>> buffered internally. The underlying stream is read up to a certain
>> length and buffered in memory by the internal API functions, while
>> calls to PHP-facing functions like stream_get_line() read from that
>> buffer instead. So it's possible that the bug lies somewhere in this
>> function (line 1026 of main/streams/streams.c,
>> http://lxr.php.net/xref/PHP_5_4/main/streams/streams.c#1026).
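The file sizes in these test cases can be lined up with Tjerk's explanation by pure arithmetic. Assuming the read buffer is filled in 8192-byte chunks (the buffer length Tjerk assumes above), the sketch below lists which occurrences of the delimiter cross a multiple of 8192. straddling() is a hypothetical helper written for PHP 7+; it never opens a stream and says nothing about what stream_get_line() actually does, it only locates the delimiters.

```php
<?php
// Hypothetical helper: for a file of $dots dots followed by the delimiter
// twice (as in the test cases), return the offsets of delimiter occurrences
// that cross a multiple of $chunk (assumed read-buffer size of 8192).
function straddling(int $dots, string $delim, int $chunk = 8192): array
{
    $data = str_repeat('.', $dots) . $delim . $delim;
    $hits = [];
    $pos = 0;
    while (($pos = strpos($data, $delim, $pos)) !== false) {
        $end = $pos + strlen($delim);   // one past the last delimiter byte
        // First and last byte in different chunks => the delimiter is split.
        if (intdiv($pos, $chunk) !== intdiv($end - 1, $chunk)) {
            $hits[] = $pos;
        }
        $pos = $end;                    // non-overlapping matches
    }
    return $hits;
}

foreach ([8188, 8189, 8190] as $dots) {
    printf("%d bytes: %s\n", $dots + 4, json_encode(straddling($dots, 'MM')));
}
// 8192 bytes: []
// 8193 bytes: [8191]
// 8194 bytes: []
```

Only the 8193-byte file has a delimiter (the second "MM", at offsets 8191 and 8192) split across the boundary, and a one-byte delimiter can never be split this way. This is consistent with both the "exactly 8193 bytes" observation and the "delimiter longer than 1" condition. Running the same check for 16381 dots (16385 bytes) also reports a split delimiter.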
>>
>> >>
>> >> The issue seems to be related to #44607, but that one got fixed years
>> >> ago.
>> >>
>> >> Is anybody able to confirm this behaviour, or has anyone else stumbled
>> >> upon it?
>> >>
>> >> Furthermore, the behaviour of stream_get_line on an empty file seems to
>> >> have changed between PHP 5.3.10 and PHP 5.3.11:
>> >>
>> >> <?php
>> >>
>> >> $file = __DIR__ . '/empty.txt';
>> >> file_put_contents( $file, '' );
>> >> $fh = fopen( $file, 'rb' );
>> >> $data = stream_get_line( $fh, 4096 );
>> >> var_dump( $data );
>> >>
>> >> results in
>> >>
>> >> string(0) ""
>> >>
>> >> for PHP 5.3.10
>> >>
>> >> and in
>> >>
>> >> bool(false)
>> >>
>> >> for PHP > 5.3.10.
>> >>
>> >> I don't know if this should be considered a bug, but as far as I know
>> >> such behaviour should not change during minor releases...
>> >>
>> >> Any insight is appreciated!
>> >>
>> >> Greetings
>> >>
>> >> Nico
>> >>
>> >> --
>> >> PHP Internals - PHP Runtime Development Mailing List
>> >> To unsubscribe, visit: http://www.php.net/unsub.php
>> >>
>> >
>> >
>> > --
>> > --
>> > Tjerk
>
>
> --
> --
> Tjerk
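On the empty-file change: a read loop can avoid depending on the string(0) "" result by treating false as the only end-of-data signal, which is what the newer releases return for an empty file according to Nico's report. This is a minimal sketch for PHP 7+, not a recommendation from the thread. read_records() is a hypothetical helper, php://memory streams stand in for files on disk, and the sketch makes no attempt to reproduce the 5.3.10 behaviour.

```php
<?php
// Hypothetical helper (not part of PHP): collect every record from an open
// stream, stopping only when stream_get_line() returns false.
function read_records($fh, int $maxlen, string $delim): array
{
    $records = [];
    while (($record = stream_get_line($fh, $maxlen, $delim)) !== false) {
        $records[] = $record;
    }
    return $records;
}

// An empty stream: the first call already returns false, so no records.
$empty = fopen('php://memory', 'w+b');
var_dump(read_records($empty, 4096, "\n"));

// Two newline-terminated records.
$two = fopen('php://memory', 'w+b');
fwrite($two, "a\nb\n");
rewind($two);
var_dump(read_records($two, 4096, "\n"));
```

Comparing strictly against false (rather than using a loose truthiness check) keeps a legitimately empty record such as the "" between two adjacent delimiters from ending the loop early.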