Edit report at https://bugs.php.net/bug.php?id=54089&edit=1

 ID:                 54089
 Patch added by:     ni...@php.net
 Reported by:        nicolas dot grekas+php at gmail dot com
 Summary:            token_get_all with regards to __halt_compiler is not
                     binary safe
 Status:             Assigned
 Type:               Bug
 Package:            Unknown/Other Function
 Operating System:   Any
 PHP Version:        5.3.5
 Assigned To:        iliaa
 Block user comment: N
 Private report:     N

 New Comment:

The following patch has been added/updated:

Patch Name: tokenizer_patch_full.txt
Revision:   1316097166
URL:        
https://bugs.php.net/patch-display.php?bug=54089&patch=tokenizer_patch_full.txt&revision=1316097166


Previous Comments:
------------------------------------------------------------------------
[2011-09-15 14:27:27] ni...@php.net

The following patch has been added/updated:

Patch Name: tokenizer_patch.txt
Revision:   1316096847
URL:        
https://bugs.php.net/patch-display.php?bug=54089&patch=tokenizer_patch.txt&revision=1316096847

------------------------------------------------------------------------
[2011-09-13 17:11:54] ni...@php.net

The following patch has been added/updated:

Patch Name: tokenizer_patch_full.txt
Revision:   1315933914
URL:        
https://bugs.php.net/patch-display.php?bug=54089&patch=tokenizer_patch_full.txt&revision=1315933914

------------------------------------------------------------------------
[2011-09-13 15:50:49] ni...@php.net

The following patch has been added/updated:

Patch Name: tokenizer_patch.txt
Revision:   1315929049
URL:        
https://bugs.php.net/patch-display.php?bug=54089&patch=tokenizer_patch.txt&revision=1315929049

------------------------------------------------------------------------
[2011-09-13 07:49:54] nicolas dot grekas+php at gmail dot com

Excerpt from internals:

Nikita Popov    Fri, Sep 9, 2011 at 09:15

[...]
The problem with the patch is, that there are some tokens after
T_HALT_COMPILER that are of interest, namely the '(' ')' ';'. After
the patch it is impossible to get those tokens, without either
relexing the code after T_HALT_COMPILER (that way you get the binary
data problem back, just with much more complex code) or writing a
regular expression to match it (which is really hard, as there may be
any token dropped by the PHP parser in there, i.e. whitespace,
comments, PHP tags).

[...]
I would like this patch to be reverted on the 5.4 and trunk branches.
I assume it's too late to revert it on 5.3, as it has been there for
some time already. It is just counterproductive. (Alternatively one
could fix token_get_all to return the (); tokens after
__halt_compiler, too, but that would be hard, probably.)

--

Ferenc Kovacs   Fri, Sep 9, 2011 at 10:01

I think that it wouldn't be too hard.
>From a quick glance on the code, we should introduce a new local
variable, set that to true where we break now (
http://svn.php.net/viewvc/php/php-src/trunk/ext/tokenizer/tokenizer.c?view=markup#l155
) and don't break there but for the next ';'. another maybe less
confusing solution would be to explicitly add '(', ')' and ';' to the
result in the T_HALT_COMPILER condition before breking out of the
loop.
I will create a patch for this afternoon.

or could there be other important tokens after the __halt_compiler()
which should be present in the token_get_all() result?

--

Nicolas Grekas  Fri, Sep 9, 2011 at 10:46

> don't break there but for the next ';'.

You can also just count the number of semantic token after T_HALT_COMPILER (ie 
excluding whitespace and comments) and once you hit 3, halt.

> less confusing solution would be to explicitly add '(', ')' and ';' to the
> result in the T_HALT_COMPILER condition before breking out of the loop.

If you mean verifying that '(', ')' and (';' or T_CLOSE_TAG) are effectively 
following T_HALT_COMPILER, I think that's part of the syntax analyser's job, 
not tokenizer's.
If you're ok with this argument, then just couting 3 tokens is really the most 
basic "syntax analysis" we have to do to fix the pb, don't you think?

> could there be other important tokens after the __halt_compiler()
> which should be present in the token_get_all() result?

Maybe the binary data itself, as a big T_INLINE_HTML for example ?

Also, if token_get_all you be made binary safe, that would be very cool ! (no 
more eating of \x00-\x1F inside regular code) :)

--

Nikita Popov    Fri, Sep 9, 2011 at 19:39

> You can also just count the number of semantic token after T_HALT_COMPILER
> (ie excluding whitespace and comments) and once you hit 3, halt.
> [...]
> Maybe the binary data itself, as a big T_INLINE_HTML for example ?
In favor of both proposals!
Returning the next 3 tokens should be quite easy [1].
Returning the rest as an T_INLINE_HTML makes sense too, as extracting
the data is probably what you want. Though I have no idea how to
implement that ^^

[1]: 
https://github.com/nikic/php-src/commit/2d4cfa05947f04de447635ca5748b3b58defbfaf
(Not tested, only guessing)

--

Nikita Popov    Tue, Sep 13, 2011 at 09:16

I just set up an PHP environment and wrote a proper patch (including
test changes) to make it collect the next three tokens. It's a git
patch and I'm not sure whether it's compatible with SVN patches. I
would love it if this would go into 5.4 before beta. I didn't know how
one could fetch the rest into T_INLINE_HTML, so I'm hoping on help
here from someone who actually knowns C there :)

------------------------------------------------------------------------
[2011-03-17 23:51:25] nicolas dot grekas+php at gmail dot com

Latest 5.3.6 has been released with the patch...
So now token_get_all() stops on T_HALT_COMPILER for ever :)

------------------------------------------------------------------------


The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

    https://bugs.php.net/bug.php?id=54089


-- 
Edit this bug report at https://bugs.php.net/bug.php?id=54089&edit=1

Reply via email to