The patch looks generally OK. However, I'll need a few more days to review it carefully and thoroughly (you can merge it in the meantime if you want). I'm just slightly concerned about the amount of parsing we are now doing by hand, and about the possible (local) security bugs we might be introducing.

Nuno

----- Original Message -----
Hi Dmitry, Brian, all,

Here's the scanner patch that I mentioned a while ago, with a possible way to work around the re2c EOF handling issues.

The primary change is to do a "manual scan", like I talked about, in areas that match large amounts of text and can contain NULL bytes (strings and comments, which are now scanned faster too), as is already done for inline HTML. I called it a "diet" :-) because it removes my complicated string regex patterns from a couple of years ago. That doesn't make the .l file much smaller once the manual scan code is added (easier to understand...?), but it does result in a ~34k reduction of 5.3's generated .c file...

This fixes Bug #46817, and provides a better, more proper fix for the older Bug #42767, both related to ending comments.

Inline HTML chunks are no longer broken up when a tag starting with "s" is encountered (<script> for JS, <span>, etc.), since such a tag is unlikely to be a long PHP <script> tag.

If an opening PHP <SCRIPT> tag was used with a capital "S", it was missed if it wasn't the first thing scanned:

var_dump(token_get_all("HTML... <SCRIPT language=php>"));

Single-line comments with a Windows newline didn't include the full \r\n:

var_dump(token_get_all("<?php // Comment\r\n?>"));
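For anyone following along, the single-line comment case can be sketched by hand roughly as below. This is only an illustration in C, not code from the patch: the name scan_line_comment and its buffer/length interface are assumptions made for the example. The scan is bounded by an explicit length, so NULL bytes inside the comment are treated as ordinary data, and the full \r\n is consumed when both characters are present. It also stops before a closing ?> tag, which ends a single-line comment in PHP.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not taken from the patch): return the length of a
 * single-line comment starting at buf[0], looking at no more than len bytes.
 * The loop is bounded by len rather than by a NUL terminator, so embedded
 * NUL bytes do not end the scan.
 */
static size_t scan_line_comment(const char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL);

    while (i < len) {
        char c = buf[i];

        if (c == '\n') {
            return i + 1;               /* include the "\n" */
        }
        if (c == '\r') {
            /* treat "\r\n" as one newline and include both bytes */
            if (i + 1 < len && buf[i + 1] == '\n') {
                return i + 2;
            }
            return i + 1;               /* lone "\r" */
        }
        if (c == '?' && i + 1 < len && buf[i + 1] == '>') {
            return i;                   /* stop before the closing tag */
        }
        i++;
    }
    return len;                         /* comment runs to end of input */
}
```

With the input from the example above, the comment token would cover "// Comment\r\n" in full rather than stopping after the \r.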

Finally, part of the optimized scanning is that, for double-quoted strings, when the first variable is encountered (making the string non-constant), the amount that has been scanned up to that point is remembered, so it can be skipped over (up to the variable) after returning the quote token. Previously that initial part of the string was rescanned, with the cost depending on how far "into" the string the first variable is.
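The idea of remembering the scanned prefix can be sketched roughly as follows. This is a simplified illustration in C under stated assumptions, not the patch's code: the names starts_variable and scan_dq_prefix are hypothetical, and only "$name", "${" and "{$" are treated as the start of a variable. The returned offset is what a scanner could store and later skip over instead of rescanning the constant part.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified check (an assumption for this sketch): does a variable
 * start at s[i]? Only "$name", "${" and "{$" are recognized. */
static int starts_variable(const char *s, size_t i, size_t len)
{
    if (i + 1 >= len) {
        return 0;
    }
    if (s[i] == '$') {
        char n = s[i + 1];
        return n == '_' || n == '{'
            || (n >= 'a' && n <= 'z') || (n >= 'A' && n <= 'Z');
    }
    return s[i] == '{' && s[i + 1] == '$';
}

/*
 * Hypothetical helper: scan the body of a double-quoted string (the bytes
 * after the opening quote). Returns the number of bytes before the first
 * variable, or before the closing quote if the string is constant.
 * *has_var reports which of the two cases was hit.
 */
static size_t scan_dq_prefix(const char *s, size_t len, int *has_var)
{
    size_t i = 0;

    assert(s != NULL && has_var != NULL);
    *has_var = 0;

    while (i < len) {
        if (s[i] == '\\' && i + 1 < len) {
            i += 2;                     /* skip the escaped character */
            continue;
        }
        if (s[i] == '"') {
            break;                      /* constant string ends here */
        }
        if (starts_variable(s, i, len)) {
            *has_var = 1;               /* remember i as the skip offset */
            break;
        }
        i++;
    }
    return i;
}
```

When has_var is set, the caller would return the quote token first and then resume at the remembered offset, so the cost no longer depends on how far into the string the first variable appears.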


I think that's about all -- I'll send another message if I forgot to mention anything... I just wanted to send this along quickly for you guys to look at or whatever. It was basically done last week; I just had to do a couple of finishing touches and verify that everything was OK.

http://realplain.com/php/scanner_diet.diff (Merged changes, but didn't test yet.)
http://realplain.com/php/scanner_diet_5_3.diff


Thanks,
Matt


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
