ID: 46817
Updated by: lu...@php.net
Reported By: master dot jexus at gmail dot com
Status: Verified
Bug Type: Scripting Engine problem
Operating System: *
PHP Version: 5.3.0alpha3
New Comment:

I'm seeing what could be a related problem, if not the same one, when
trying to detect trailing Windows CR+LF in T_WHITESPACE:

Reproduce code:
---------------
<?php
// this comment and trailing blank contain windows CR+LF^M
^M

Expected result:
----------------
array(3) {
  [0]=>
  array(3) {
    [0]=>
    int(367)
    [1]=>
    string(6) "<?php
"
    [2]=>
    int(1)
  }
  [1]=>
  array(3) {
    [0]=>
    int(365)
    [1]=>
    string(57) "// this comment and trailing blank contain windows CR+LF
"
    [2]=>
    int(2)
  }
  [2]=>
  array(3) {
    [0]=>
    int(370)
    [1]=>
    string(3) "
"
    [2]=>
    int(2)
  }
}

Actual result:
--------------
array(2) {
  [0]=>
  array(3) {
    [0]=>
    int(368)
    [1]=>
    string(6) "<?php
"
    [2]=>
    int(1)
  }
  [1]=>
  array(3) {
    [0]=>
    int(366)
    [1]=>
    string(57) "// this comment and trailing blank contain windows CR+LF
"
    [2]=>
    int(2)
  }
}

Previous Comments:
------------------------------------------------------------------------

[2008-12-10 10:25:25] nlop...@php.net

This is a problem in the new lexer. The problem is reproducible if the
comment is immediately followed by EOF (with no \n after the comment).
This, again, is triggered by the difference in EOF handling between
flex and re2c. A simple hack would be to detect the ST_ONE_LINE_COMMENT
state on EOF and return the correct value, but I would prefer a more
general solution.

------------------------------------------------------------------------

[2008-12-09 22:35:46] master dot jexus at gmail dot com

Description:
------------
When using the tokenizer to lex a given text, the output seems to miss
the last token if it is a single-line comment. It only seems to occur
if there isn't a newline after the comment lexeme. Note the last entries
in the arrays.

Reproduce code:
---------------
<?php
print_r(token_get_all(file_get_contents(__FILE__)));

// test
$var = 5;
// test

Expected result:
----------------
Array
(
    [0] => Array ( [0] => 367 [1] => <?php [2] => 1 )
    [1] => Array ( [0] => 307 [1] => print_r [2] => 2 )
    [2] => (
    [3] => Array ( [0] => 307 [1] => token_get_all [2] => 2 )
    [4] => (
    [5] => Array ( [0] => 307 [1] => file_get_contents [2] => 2 )
    [6] => (
    [7] => Array ( [0] => 364 [1] => __FILE__ [2] => 2 )
    [8] => )
    [9] => )
    [10] => )
    [11] => ;
    [12] => Array ( [0] => 370 [1] =>  [2] => 2 )
    [13] => Array ( [0] => 365 [1] => // test [2] => 4 )
    [14] => Array ( [0] => 309 [1] => $var [2] => 5 )
    [15] => Array ( [0] => 370 [1] =>  [2] => 5 )
    [16] => =
    [17] => Array ( [0] => 370 [1] =>  [2] => 5 )
    [18] => Array ( [0] => 305 [1] => 5 [2] => 5 )
    [19] => ;
    [20] => Array ( [0] => 370 [1] =>  [2] => 5 )
    [21] => Array ( [0] => 365 [1] => // test [2] => 6 )
)

Actual result:
--------------
Array
(
    [0] => Array ( [0] => 368 [1] => <?php [2] => 1 )
    [1] => Array ( [0] => 307 [1] => print_r [2] => 2 )
    [2] => (
    [3] => Array ( [0] => 307 [1] => token_get_all [2] => 2 )
    [4] => (
    [5] => Array ( [0] => 307 [1] => file_get_contents [2] => 2 )
    [6] => (
    [7] => Array ( [0] => 365 [1] => __FILE__ [2] => 2 )
    [8] => )
    [9] => )
    [10] => )
    [11] => ;
    [12] => Array ( [0] => 371 [1] =>  [2] => 2 )
    [13] => Array ( [0] => 366 [1] => // test [2] => 4 )
    [14] => Array ( [0] => 309 [1] => $var [2] => 5 )
    [15] => Array ( [0] => 371 [1] =>  [2] => 5 )
    [16] => =
    [17] => Array ( [0] => 371 [1] =>  [2] => 5 )
    [18] => Array ( [0] => 305 [1] => 5 [2] => 5 )
    [19] => ;
    [20] => Array ( [0] => 371 [1] =>  [2] => 5 )
)

------------------------------------------------------------------------

-- 
Edit this bug report at http://bugs.php.net/?id=46817&edit=1
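
Since this report is about a token going missing at EOF, one way to check a given build without comparing token ids by hand is a round-trip test: concatenate the text of every token and compare the result with the input. The following is only a minimal sketch; the helper names rebuild_source_from_tokens() and tokens_round_trip() and the two sample strings are assumptions made for illustration and are not part of the tokenizer extension. Only token_get_all() itself is taken from the report.

```php
<?php
// Hypothetical helper (not part of ext/tokenizer): rebuilds the source
// text from the token stream returned by token_get_all().
function rebuild_source_from_tokens($source)
{
    $rebuilt = '';
    foreach (token_get_all($source) as $token) {
        // Array tokens carry their text at index 1; single-character
        // tokens such as ";" or "(" are returned as plain strings.
        $rebuilt .= is_array($token) ? $token[1] : $token;
    }
    return $rebuilt;
}

// Hypothetical helper: true when no text was lost during lexing.
function tokens_round_trip($source)
{
    return rebuild_source_from_tokens($source) === $source;
}

// Case from the original report: a one-line comment directly before EOF,
// with no newline after it.
$comment_at_eof = "<?php\n\$var = 5; // test";

// Case from the follow-up comment: CR+LF line endings and a trailing
// blank line (sample text assumed for illustration).
$crlf_source = "<?php\r\n// comment with windows line endings\r\n\r\n";

var_dump(tokens_round_trip($comment_at_eof));
var_dump(tokens_round_trip($crlf_source));
```

On a build without this bug, both calls should report true. On a build that shows the behaviour described in this report, the dropped trailing token would be expected to make the first check fail, and possibly the second as well.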