Re: [PHP-DEV] [PATCH] Scanner "diet" with fixes, etc.
Hi Brian, - Original Message - From: "shire" Sent: Monday, May 04, 2009 Matt Wilmas wrote: [...] How about this? #define SET_DOUBLE_QUOTES_SCANNED_LENGTH(len) CG(doc_comment_len) = (len) #define GET_DOUBLE_QUOTES_SCANNED_LENGTH() CG(doc_comment_len) Sure, works for me ;-) Cool. :-) [...] Have you considered using the lexer STATES and regex's instead of the manual C code for scanning the rest. It seems like if we have a one-char regex match for what the C code is doing we could handle this in the lexer without a lot of manual intervention (need to look at it more, just a thought I had earlier, the expressions are clearer now with your patch applied) ;-) It seems that matching one-char-at-a-time with re2c would be more complicated than the manual way, not to mention slower than the old (current) way. Do you have any objection (well, you've kinda mentioned some :-)) if I'd commit the changes in a little while like Dmitry thought could be done? Well I'm wondering if something more along these lines (just did this on-top of your patch as you cleaned up a lot) might be more appealing. (I'm not sure how much slower this would be than the current implementation, obviously it'll be somewhat slower, I'm basically just doing what you did in C but in the scanner instead of course). "#"|"//" { BEGIN(ST_EOL_COMMENT); yymore(); } ({NEWLINE}|"%>"|"?>") { char tmp = *(YYCURSOR-1); if ((tmp == '%' && CG(asp_tags)) | tmp == '?') { YYCURSOR -= 2; } CG(zend_lineno)++; BEGIN(ST_IN_SCRIPTING); return T_COMMENT; } {ANY_CHAR} { if (YYCURSOR >= YYLIMIT) { BEGIN(ST_IN_SCRIPTING); return T_COMMENT; } yymore(); } Let me know what the thoughts are on the above, if we don't want that then I say yeah, commit away! Wouldn't it be a little more complicated for strings/heredocs than comments? Or not, haven't thought about it much! :-) And you still need the "manual use" of YYCURSOR, etc. In other words, to me, the scanner rules are doing what the manual switch ()'s case statements do, but in a slower, "roundabout" way. Well, I'm gonna be away for a bit now, but I guess I can commit away when I get back. -shire - Matt -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [PATCH] Scanner "diet" with fixes, etc.
Matt Wilmas wrote: Gotcha. If something changes, YYFILL -- or something to handle what needs to be done -- could just be added to the manual parts as necessary, right? Sorry forget to reply on this one, but yeah we'd have to do a manual call to YYFILL or a check or whatever we come up with wherever we're scanning ahead manually. (there's probably some other parts that might need this as well). -shire -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [PATCH] Scanner "diet" with fixes, etc.
Matt Wilmas wrote: Hi Brian, - Original Message - From: "shire" Sent: Monday, May 04, 2009 Hey Matt, Matt Wilmas wrote: +/* To save initial string length after scanning to first variable, CG(doc_comment_len) can be reused */ +#define double_quotes_scanned_len CG(doc_comment_len) + (minor) Maybe we should rename this var if we're going to use it for other purposes, this doesn't really save any typing. Also if we do want the define maybe we should upper case it so it's more obvious? Yeah, I tried to think of other ways to do it, but just left it trying to look like another variable (not to save typing). Well, it can easily be changed later if a "cleaner" way is decided... Yeah I would just prefer if it was more obvious that it is *not* a variable ;-) How about this? #define SET_DOUBLE_QUOTES_SCANNED_LENGTH(len) CG(doc_comment_len) = (len) #define GET_DOUBLE_QUOTES_SCANNED_LENGTH() CG(doc_comment_len) Sure, works for me ;-) + while (YYCURSOR < YYLIMIT) { + switch (*YYCURSOR++) { In the example above, which we have a couple examples of here, we don't obey the YYFILL macro to detect if we have exceeded our EOF. This *might* be a problem, but only really depends on if we intend to use the YYFILL as a solution for exceeding our mmap bounds. I don't understand what the problem might be? The YYCURSOR < YYLIMIT check is what the YYFILL has been doing. If you mean after changes later, as long as the the whole thing is mmap()'d (which I'm assuming would be the case?), it just "looks" like a standard string, with terminating '\0', right? And there's no reading past YYLIMIT. Sorry yeah this wouldn't be a problem currently, but only if we try to fix the mmap issue by using YYFILL to realloc more space into the buffer. Then that macro would change to something more complicated. (per my previous replies with Arnaud) Gotcha. If something changes, YYFILL -- or something to handle what needs to be done -- could just be added to the manual parts as necessary, right? Have you considered using the lexer STATES and regex's instead of the manual C code for scanning the rest. It seems like if we have a one-char regex match for what the C code is doing we could handle this in the lexer without a lot of manual intervention (need to look at it more, just a thought I had earlier, the expressions are clearer now with your patch applied) ;-) It seems that matching one-char-at-a-time with re2c would be more complicated than the manual way, not to mention slower than the old (current) way. Do you have any objection (well, you've kinda mentioned some :-)) if I'd commit the changes in a little while like Dmitry thought could be done? Well I'm wondering if something more along these lines (just did this on-top of your patch as you cleaned up a lot) might be more appealing. (I'm not sure how much slower this would be than the current implementation, obviously it'll be somewhat slower, I'm basically just doing what you did in C but in the scanner instead of course). "#"|"//" { BEGIN(ST_EOL_COMMENT); yymore(); } ({NEWLINE}|"%>"|"?>") { char tmp = *(YYCURSOR-1); if ((tmp == '%' && CG(asp_tags)) | tmp == '?') { YYCURSOR -= 2; } CG(zend_lineno)++; BEGIN(ST_IN_SCRIPTING); return T_COMMENT; } {ANY_CHAR} { if (YYCURSOR >= YYLIMIT) { BEGIN(ST_IN_SCRIPTING); return T_COMMENT; } yymore(); } Let me know what the thoughts are on the above, if we don't want that then I say yeah, commit away! -shire -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [PATCH] Scanner "diet" with fixes, etc.
Hi Brian, - Original Message - From: "shire" Sent: Monday, May 04, 2009 Hey Matt, Matt Wilmas wrote: +/* To save initial string length after scanning to first variable, CG(doc_comment_len) can be reused */ +#define double_quotes_scanned_len CG(doc_comment_len) + (minor) Maybe we should rename this var if we're going to use it for other purposes, this doesn't really save any typing. Also if we do want the define maybe we should upper case it so it's more obvious? Yeah, I tried to think of other ways to do it, but just left it trying to look like another variable (not to save typing). Well, it can easily be changed later if a "cleaner" way is decided... Yeah I would just prefer if it was more obvious that it is *not* a variable ;-) How about this? #define SET_DOUBLE_QUOTES_SCANNED_LENGTH(len) CG(doc_comment_len) = (len) #define GET_DOUBLE_QUOTES_SCANNED_LENGTH()CG(doc_comment_len) + while (YYCURSOR < YYLIMIT) { + switch (*YYCURSOR++) { In the example above, which we have a couple examples of here, we don't obey the YYFILL macro to detect if we have exceeded our EOF. This *might* be a problem, but only really depends on if we intend to use the YYFILL as a solution for exceeding our mmap bounds. I don't understand what the problem might be? The YYCURSOR < YYLIMIT check is what the YYFILL has been doing. If you mean after changes later, as long as the the whole thing is mmap()'d (which I'm assuming would be the case?), it just "looks" like a standard string, with terminating '\0', right? And there's no reading past YYLIMIT. Sorry yeah this wouldn't be a problem currently, but only if we try to fix the mmap issue by using YYFILL to realloc more space into the buffer. Then that macro would change to something more complicated. (per my previous replies with Arnaud) Gotcha. If something changes, YYFILL -- or something to handle what needs to be done -- could just be added to the manual parts as necessary, right? Have you considered using the lexer STATES and regex's instead of the manual C code for scanning the rest. It seems like if we have a one-char regex match for what the C code is doing we could handle this in the lexer without a lot of manual intervention (need to look at it more, just a thought I had earlier, the expressions are clearer now with your patch applied) ;-) It seems that matching one-char-at-a-time with re2c would be more complicated than the manual way, not to mention slower than the old (current) way. Do you have any objection (well, you've kinda mentioned some :-)) if I'd commit the changes in a little while like Dmitry thought could be done? -shire - Matt -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [PATCH] Scanner "diet" with fixes, etc.
Hey Matt, Matt Wilmas wrote: +/* To save initial string length after scanning to first variable, CG(doc_comment_len) can be reused */ +#define double_quotes_scanned_len CG(doc_comment_len) + (minor) Maybe we should rename this var if we're going to use it for other purposes, this doesn't really save any typing. Also if we do want the define maybe we should upper case it so it's more obvious? Yeah, I tried to think of other ways to do it, but just left it trying to look like another variable (not to save typing). Well, it can easily be changed later if a "cleaner" way is decided... Yeah I would just prefer if it was more obvious that it is *not* a variable ;-) + while (YYCURSOR < YYLIMIT) { + switch (*YYCURSOR++) { In the example above, which we have a couple examples of here, we don't obey the YYFILL macro to detect if we have exceeded our EOF. This *might* be a problem, but only really depends on if we intend to use the YYFILL as a solution for exceeding our mmap bounds. I don't understand what the problem might be? The YYCURSOR < YYLIMIT check is what the YYFILL has been doing. If you mean after changes later, as long as the the whole thing is mmap()'d (which I'm assuming would be the case?), it just "looks" like a standard string, with terminating '\0', right? And there's no reading past YYLIMIT. Sorry yeah this wouldn't be a problem currently, but only if we try to fix the mmap issue by using YYFILL to realloc more space into the buffer. Then that macro would change to something more complicated. (per my previous replies with Arnaud) Have you considered using the lexer STATES and regex's instead of the manual C code for scanning the rest. It seems like if we have a one-char regex match for what the C code is doing we could handle this in the lexer without a lot of manual intervention (need to look at it more, just a thought I had earlier, the expressions are clearer now with your patch applied) ;-) -shire -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [PATCH] Scanner "diet" with fixes, etc.
Hi Dmitry, - Original Message - From: "Dmitry Stogov" Sent: Monday, May 04, 2009 Hi Matt, I wasn't able to look into all details of the patch, but in general I like it, as it fixes bugs and makes scanner smaller. I think you can commit it. OK, you mean before the freeze for RC2...? I'll go ahead and do that later unless someone says not to. Although this patch doesn't fix the EOF handling related to mmap(). :-/ Thanks. Dmitry. Thanks, Matt -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [PATCH] Scanner "diet" with fixes, etc.
Hi Brian, - Original Message - From: "shire" Sent: Monday, May 04, 2009 Hey Matt, Thanks for posting, sorry for not having a chance to reply to this sooner. Maybe couple things from the patch, +/* To save initial string length after scanning to first variable, CG(doc_comment_len) can be reused */ +#define double_quotes_scanned_len CG(doc_comment_len) + (minor) Maybe we should rename this var if we're going to use it for other purposes, this doesn't really save any typing. Also if we do want the define maybe we should upper case it so it's more obvious? Yeah, I tried to think of other ways to do it, but just left it trying to look like another variable (not to save typing). Well, it can easily be changed later if a "cleaner" way is decided... + while (YYCURSOR < YYLIMIT) { + switch (*YYCURSOR++) { In the example above, which we have a couple examples of here, we don't obey the YYFILL macro to detect if we have exceeded our EOF. This *might* be a problem, but only really depends on if we intend to use the YYFILL as a solution for exceeding our mmap bounds. I don't understand what the problem might be? The YYCURSOR < YYLIMIT check is what the YYFILL has been doing. If you mean after changes later, as long as the the whole thing is mmap()'d (which I'm assuming would be the case?), it just "looks" like a standard string, with terminating '\0', right? And there's no reading past YYLIMIT. [...] -shire - Matt -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [PATCH] Scanner "diet" with fixes, etc.
On Mon, May 4, 2009 at 5:51 PM, shire wrote: > Arnaud Le Blanc wrote: >> >> Hi, >> On Mon, May 4, 2009 at 9:36 AM, shire wrote: >>> >>> Regarding the ZEND_MMAP_AHEAD issue and the temp. fix that Dmitry put in >>> we >>> need to find a solution to that, perhaps I can play with that this week >>> too >>> as I think I'm seeing some related issues in my testing of 5.3. >>> Essentially >>> we abuse ZEND_MMAP_AHEAD by adding it to the file size and passing it to >>> the >>> mmap call which isn't at all valid and only really works up to PAGESIZE. >>> We >>> could possibly use YYFILL to re-allocate more space as necessary past the >>> end of file to fix this. >> >> I was thinking of doing something like that with YYFILL too. However >> there is a bunch of pointers to take in to account and to update (e.g. >> yy_marker, yy_text, etc). >> > > Yeah, I'm pretty sure that's how most of the example re2c code is setup: > > #define YYFILL(n) {cursor = fill(s, cursor);} > > uchar *fill(Scanner *s, uchar *cursor){ > if(!s->eof){ > unint cnt = s->lim - s->tok; > uchar *buf = malloc((cnt + 1)*sizeof(uchar)); > memcpy(buf, s->tok, cnt); > cursor = &buf[cursor - s->tok]; > s->pos = &buf[s->pos - s->tok]; > s->ptr = &buf[s->ptr - s->tok]; > s->lim = &buf[cnt]; > s->eof = s->lim; *(s->eof)++ = '\n'; > s->tok = buf; > } > return cursor; > } > > > > -shire > This is what I seen too, but this is not always applicable. The scanner have code that refers to yy_text, yy_start, yy_cursor, yy_marker, etc. All those pointers point to the original buffer and must be updated by fill(). At each point in time the scanner may rollback to yy_marker or a rule may want to fetch yy_text or yy_start at any time. So the buffer must be large enough to contain all data from min(all_of_them) to max(all_of_them). That makes things a little complicated and potentially less efficient than a big buffer for the whole file. Regards, Arnaud -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [PATCH] Scanner "diet" with fixes, etc.
Arnaud Le Blanc wrote: Hi, On Mon, May 4, 2009 at 9:36 AM, shire wrote: Regarding the ZEND_MMAP_AHEAD issue and the temp. fix that Dmitry put in we need to find a solution to that, perhaps I can play with that this week too as I think I'm seeing some related issues in my testing of 5.3. Essentially we abuse ZEND_MMAP_AHEAD by adding it to the file size and passing it to the mmap call which isn't at all valid and only really works up to PAGESIZE. We could possibly use YYFILL to re-allocate more space as necessary past the end of file to fix this. I was thinking of doing something like that with YYFILL too. However there is a bunch of pointers to take in to account and to update (e.g. yy_marker, yy_text, etc). Yeah, I'm pretty sure that's how most of the example re2c code is setup: #define YYFILL(n) {cursor = fill(s, cursor);} uchar *fill(Scanner *s, uchar *cursor){ if(!s->eof){ unint cnt = s->lim - s->tok; uchar *buf = malloc((cnt + 1)*sizeof(uchar)); memcpy(buf, s->tok, cnt); cursor = &buf[cursor - s->tok]; s->pos = &buf[s->pos - s->tok]; s->ptr = &buf[s->ptr - s->tok]; s->lim = &buf[cnt]; s->eof = s->lim; *(s->eof)++ = '\n'; s->tok = buf; } return cursor; } -shire -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [PATCH] Scanner "diet" with fixes, etc.
Hi, On Mon, May 4, 2009 at 9:36 AM, shire wrote: > Regarding the ZEND_MMAP_AHEAD issue and the temp. fix that Dmitry put in we > need to find a solution to that, perhaps I can play with that this week too > as I think I'm seeing some related issues in my testing of 5.3. Essentially > we abuse ZEND_MMAP_AHEAD by adding it to the file size and passing it to the > mmap call which isn't at all valid and only really works up to PAGESIZE. We > could possibly use YYFILL to re-allocate more space as necessary past the > end of file to fix this. I was thinking of doing something like that with YYFILL too. However there is a bunch of pointers to take in to account and to update (e.g. yy_marker, yy_text, etc). There is mmap(MAP_FIXED) as an other temporary solution. The idea is to map the entire file rounded up to a whole page size, and to map the last (already mapped) page to anonymous memory using MAP_FIXED. In the end we get a readable contiguous memory region with the file data and ZEND_MMAP_AHEAD. This should work on many systems as long as the requested address is already part of the process's address space (which is always the case here). When this fails we can fallback to malloc/read. Regards, Arnaud -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [PATCH] Scanner "diet" with fixes, etc.
Hi Matt, I wasn't able to look into all details of the patch, but in general I like it, as it fixes bugs and makes scanner smaller. I think you can commit it. Although this patch doesn't fix the EOF handling related to mmap(). Thanks. Dmitry. Matt Wilmas wrote: Hi guys, - Original Message - From: "Nuno Lopes" Sent: Thursday, April 30, 2009 The patch looks generally ok. However I'll need a few more days to review it carefully and throughly. (you can merge it in the meantime if you want). I'm just slighty concern with the amount of parsing we are now doing by hand, and with the possible (local) security bugs we might be introducing.. Am I understanding this properly, that this addresses the re2c EOF bug? So we have an RC planned for next week (freeze Monday evening). Can you get this fixed and released by then as Marcus is unable to do this himself? So this addresses some of the re2c EOF problems, but I don't know if it addresses all of them or not. I haven't had the time yet for a full review. Anyway, Matt can surelly comment on this. Yes, it addresses the re2c EOF issues for strings and comments, as they were the problem ones that allowed NULL bytes, and scanned past the EOF NULL. As I said to Dmitry, I'm not sure if it's now possible to remove the temporary mmap() fixes that he wanted removed before the next RC (??), or if there would still be problems with re2c scanning other tokens, even though they can't contain NULLs. I didn't attempt to make any changes there, since I'm not familiar with what's been done. I just wanted to finally send the patch for others to review, and decide what to do, so I won't commit any changes yet in the meantime. :-) Nuno - Matt -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [PATCH] Scanner "diet" with fixes, etc.
Hey Matt, Thanks for posting, sorry for not having a chance to reply to this sooner. Maybe couple things from the patch, +/* To save initial string length after scanning to first variable, CG(doc_comment_len) can be reused */ +#define double_quotes_scanned_len CG(doc_comment_len) + (minor) Maybe we should rename this var if we're going to use it for other purposes, this doesn't really save any typing. Also if we do want the define maybe we should upper case it so it's more obvious? + while (YYCURSOR < YYLIMIT) { + switch (*YYCURSOR++) { In the example above, which we have a couple examples of here, we don't obey the YYFILL macro to detect if we have exceeded our EOF. This *might* be a problem, but only really depends on if we intend to use the YYFILL as a solution for exceeding our mmap bounds. Regarding the ZEND_MMAP_AHEAD issue and the temp. fix that Dmitry put in we need to find a solution to that, perhaps I can play with that this week too as I think I'm seeing some related issues in my testing of 5.3. Essentially we abuse ZEND_MMAP_AHEAD by adding it to the file size and passing it to the mmap call which isn't at all valid and only really works up to PAGESIZE. We could possibly use YYFILL to re-allocate more space as necessary past the end of file to fix this. I don't see anything glaring in the patch that's a major issue, I can probably test more on a larger code base in the next 2-3 days. As I've said before this seems to be crossing the line of us writing a scanner by hand rather than letting re2c do the heavy lifting, but without a modification to re2c to handle EOF I don't have an alternative solution currently. (If we had some way to detect which regex we where matching against in the YYFILL that would likely be able to handle these bugs, but I didn't see a way to do that easily). -shire -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [PATCH] Scanner "diet" with fixes, etc.
Hi guys, - Original Message - From: "Nuno Lopes" Sent: Thursday, April 30, 2009 The patch looks generally ok. However I'll need a few more days to review it carefully and throughly. (you can merge it in the meantime if you want). I'm just slighty concern with the amount of parsing we are now doing by hand, and with the possible (local) security bugs we might be introducing.. Am I understanding this properly, that this addresses the re2c EOF bug? So we have an RC planned for next week (freeze Monday evening). Can you get this fixed and released by then as Marcus is unable to do this himself? So this addresses some of the re2c EOF problems, but I don't know if it addresses all of them or not. I haven't had the time yet for a full review. Anyway, Matt can surelly comment on this. Yes, it addresses the re2c EOF issues for strings and comments, as they were the problem ones that allowed NULL bytes, and scanned past the EOF NULL. As I said to Dmitry, I'm not sure if it's now possible to remove the temporary mmap() fixes that he wanted removed before the next RC (??), or if there would still be problems with re2c scanning other tokens, even though they can't contain NULLs. I didn't attempt to make any changes there, since I'm not familiar with what's been done. I just wanted to finally send the patch for others to review, and decide what to do, so I won't commit any changes yet in the meantime. :-) Nuno - Matt -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [PATCH] Scanner "diet" with fixes, etc.
The patch looks generally ok. However I'll need a few more days to review it carefully and throughly. (you can merge it in the meantime if you want). I'm just slighty concern with the amount of parsing we are now doing by hand, and with the possible (local) security bugs we might be introducing.. Am I understanding this properly, that this addresses the re2c EOF bug? So we have an RC planned for next week (freeze Monday evening). Can you get this fixed and released by then as Marcus is unable to do this himself? So this addresses some of the re2c EOF problems, but I don't know if it addresses all of them or not. I haven't had the time yet for a full review. Anyway, Matt can surelly comment on this. Nuno -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [PATCH] Scanner "diet" with fixes, etc.
On 30.04.2009, at 19:18, Nuno Lopes wrote: The patch looks generally ok. However I'll need a few more days to review it carefully and throughly. (you can merge it in the meantime if you want). I'm just slighty concern with the amount of parsing we are now doing by hand, and with the possible (local) security bugs we might be introducing.. Am I understanding this properly, that this addresses the re2c EOF bug? So we have an RC planned for next week (freeze Monday evening). Can you get this fixed and released by then as Marcus is unable to do this himself? regards, Lukas Kahwe Smith m...@pooteeweet.org -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] [PATCH] Scanner "diet" with fixes, etc.
The patch looks generally ok. However I'll need a few more days to review it carefully and throughly. (you can merge it in the meantime if you want). I'm just slighty concern with the amount of parsing we are now doing by hand, and with the possible (local) security bugs we might be introducing.. Nuno - Original Message - Hi Dmitry, Brian, all, Here's a scanner patch that I mentioned awhile ago, with a possible way to work around the re2c EOF handling issues. The primary change is to do a "manual scan" like I talked about in areas that match large amounts and can contain NULL bytes (strings/comments, which are now scanned faster too), as is done for inline HTML. I called it a "diet" :-) because it removes my complicated string regex patterns from a couple years ago, which doesn't make the .l file much smaller after adding the manual scan code (easier to understand...?), but it does result in a ~34k reduction of 5.3's generated .c file... This fixes Bug #46817, as well as a better, more proper fix for the older Bug #42767, both related to ending comments. Now inline HTML chunks aren't broken up when a tag starting with "s" is encountered (
[PHP-DEV] [PATCH] Scanner "diet" with fixes, etc.
Hi Dmitry, Brian, all, Here's a scanner patch that I mentioned awhile ago, with a possible way to work around the re2c EOF handling issues. The primary change is to do a "manual scan" like I talked about in areas that match large amounts and can contain NULL bytes (strings/comments, which are now scanned faster too), as is done for inline HTML. I called it a "diet" :-) because it removes my complicated string regex patterns from a couple years ago, which doesn't make the .l file much smaller after adding the manual scan code (easier to understand...?), but it does result in a ~34k reduction of 5.3's generated .c file... This fixes Bug #46817, as well as a better, more proper fix for the older Bug #42767, both related to ending comments. Now inline HTML chunks aren't broken up when a tag starting with "s" is encountered (