#33093 [Bgs]: token_get_all() inconsistent results?

[EMAIL PROTECTED] Sun, 22 May 2005 07:05:09 -0700

 ID:               33093
 User updated by:  [EMAIL PROTECTED]
 Reported By:      [EMAIL PROTECTED]
 Status:           Bogus
 Bug Type:         Unknown/Other Function
 Operating System: Mac OS X 10.4.1
 PHP Version:      5.0.4
 New Comment:


The second command-line test should have pairs of \n newlines, not
singles.

A corollary issue is that the results on the same code are
inconsistent.  Sometimes my token_get_all() returns the expected result
(T_OPEN_TAG) and sometimes an unexpected result ("<", "?", T_STRING of
"php").  Could there be a reason for the engine being "finicky"?


Previous Comments:
------------------------------------------------------------------------

[2005-05-22 13:16:04] [EMAIL PROTECTED]

Indeed, there is no bug here.

------------------------------------------------------------------------

[2005-05-22 05:55:48] [EMAIL PROTECTED]

Actually, the tokenizer just plugs into the internal tokenizer code used
by the engine. The engine doesn't need some of this information, and
that code is written to run as quickly and efficiently as possible
rather than to be 100% exact about what it reports.

It's unlikely to be fixed just for token_get_all(), as introducing
changes can have quite radical effects sometimes when touching that bit
of code.

The text values returned with the tokens should still let you get the
CR/LF count right.
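
As a rough sketch of that suggestion: the text of each token can be
scanned for newlines to keep a running line number. The helper name
tokens_with_lines() is made up for illustration and is not part of the
tokenizer extension. It only counts "\n", which covers \n and \r\n line
endings but not a bare \r.

```php
<?php
// Hypothetical helper: pair each token with the line it starts on by
// counting "\n" in the text of every token seen so far.
function tokens_with_lines($source)
{
    $line = 1;
    $out = array();
    foreach (token_get_all($source) as $token) {
        // Single-character tokens come back as plain strings.
        $text = is_array($token) ? $token[1] : $token;
        $name = is_array($token) ? token_name($token[0]) : $text;
        $out[] = array($name, $line, $text);
        // Advance the counter past any newlines inside this token.
        $line += substr_count($text, "\n");
    }
    return $out;
}

foreach (tokens_with_lines("<?php\n\necho \$var\n\n?>") as $row) {
    echo $row[1], "\t", $row[0], "\n";
}
```

Because the count is taken from the token text itself, it does not matter
whether a newline ends up inside the open tag or in a separate whitespace
token.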

------------------------------------------------------------------------

[2005-05-22 05:51:20] [EMAIL PROTECTED]

Where's the missing data?

php -r 'var_dump(token_get_all("<?php echo \$var ?>"));'
array(6) {
  [0]=>
  array(2) {
    [0]=>
    int(366)
    [1]=>
    string(6) "<?php "
  }
  [1]=>
  array(2) {
    [0]=>
    int(316)
    [1]=>
    string(4) "echo"
  }
  [2]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) " "
  }
  [3]=>
  array(2) {
    [0]=>
    int(309)
    [1]=>
    string(4) "$var"
  }
  [4]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) " "
  }
  [5]=>
  array(2) {
    [0]=>
    int(368)
    [1]=>
    string(2) "?>"
  }
}




php -r 'var_dump(token_get_all("<?php \necho \$var\n?>"));'
array(7) {
  [0]=>
  array(2) {
    [0]=>
    int(366)
    [1]=>
    string(6) "<?php "
  }
  [1]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) "
"
  }
  [2]=>
  array(2) {
    [0]=>
    int(316)
    [1]=>
    string(4) "echo"
  }
  [3]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) " "
  }
  [4]=>
  array(2) {
    [0]=>
    int(309)
    [1]=>
    string(4) "$var"
  }
  [5]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) "
"
  }
  [6]=>
  array(2) {
    [0]=>
    int(368)
    [1]=>
    string(2) "?>"
  }
}


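The integer IDs in the dumps above are hard to read and can differ
between PHP versions. A minimal sketch of a more readable dump, assuming
token_name() and json_encode() are available, is given here. The helper
names describe_tokens() and token_text() are invented for this example.
The last lines also check that gluing the token texts back together
reproduces the input, which is one way to confirm that no data is
missing.

```php
<?php
// Hypothetical helper: one line per token, with the symbolic name
// instead of the raw integer ID.
function describe_tokens($source)
{
    $lines = array();
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // json_encode() shows newlines as \n so they stay visible.
            $lines[] = token_name($token[0]) . ' ' . json_encode($token[1]);
        } else {
            // Single-character tokens are returned as plain strings.
            $lines[] = 'char ' . json_encode($token);
        }
    }
    return $lines;
}

// Hypothetical helper: the source text carried by one token.
function token_text($token)
{
    return is_array($token) ? $token[1] : $token;
}

echo implode("\n", describe_tokens('<?php echo $var ?>')), "\n";

// Round trip: concatenating every token's text should give the input back.
$source  = "<?php \necho \$var\n?>";
$rebuilt = implode('', array_map('token_text', token_get_all($source)));
var_dump($rebuilt === $source);
```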
------------------------------------------------------------------------

[2005-05-21 18:40:38] [EMAIL PROTECTED]

Description:
------------
It appears that token_get_all() does not report T_OPEN_TAG and
T_WHITESPACE properly, depending on the whitespace following the
opening tag.  For example, when parsing ...

<?php echo $var ?>

... you get T_OPEN_TAG, T_ECHO, T_WHITESPACE, T_VARIABLE, T_WHITESPACE,
and T_CLOSE_TAG.  This is not entirely the expected result (I would
expect T_WHITESPACE between the open tag and the echo).

However, when parsing the functional equivalent...

<?php

echo $var

?>

you get "<", "?", T_STRING ("php"), T_WHITESPACE, T_ECHO, T_WHITESPACE,
T_VARIABLE, T_WHITESPACE, and T_CLOSE_TAG.  In addition, the first
whitespace value reported does not include all the newlines (it drops
one).
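
A likely reading of the "dropped" newline, going by the "<?php " values
(six characters, trailing space included) in the var_dump output
elsewhere in this report: the text of T_OPEN_TAG appears to include the
single whitespace character that follows <?php. The sketch below prints
the first two tokens of the multi-line example so this can be checked on
a given PHP build. It makes no claim about why "<", "?", "php" would be
returned instead of T_OPEN_TAG.

```php
<?php
// The multi-line example from the description, with pairs of "\n".
$source = "<?php\n\necho \$var\n\n?>";
$tokens = token_get_all($source);

// First token: the open tag, printed with its text so any newline
// absorbed into it is visible as \n.
$first = $tokens[0];
echo token_name($first[0]), ' ', json_encode($first[1]), "\n";

// Second token: whatever whitespace is left after the open tag.
$second = $tokens[1];
echo token_name($second[0]), ' ', json_encode($second[1]), "\n";
```

If the first line shows the newline inside the open tag, the following
T_WHITESPACE token holds one newline fewer than the source has at that
point, and nothing has actually been lost.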

Although Macs use \r for their newlines natively, the test code uses
the Unix-standard \n, so I don't think it's Mac-related.

If this is in fact a bug, the current behavior makes it difficult to
write a reliable userland code auditor and report proper line numbers.

Am I missing some assumptions behind the behavior of the tokenizer
function?



------------------------------------------------------------------------


