On 10.07.2008, at 01:21, Nuno Lopes wrote:
I didn't test it, but yeah that should fix the # problem. :-) BTW, I also
had other ideas about checking for <?, <%, <script>, etc. tags in the inline
HTML scanning part, so the largest chunk of HTML is always grabbed (I'll
send the patch in the future; didn't modify anything yet, and it's not
related to the subject anyway :-)).
My code doesn't always find the largest possible chunk of inline HTML, but
it comes close. It just gives up when it finds a potential tag. It can be
made optimal easily, at some expense. I don't know if it's
beneficial or not.
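
To make the "largest chunk" idea concrete, here is a rough sketch of a scan that does not give up at a potential tag. It is not taken from the PHP scanner: the helper names is_open_tag() and inline_html_length() are made up for illustration, and the tag check is deliberately simplified (the real rules for <script> are stricter).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the PHP source): does p point at "<?",
 * "<%" or "<script" (case-insensitive)? end is one past the last byte. */
static int is_open_tag(const char *p, const char *end)
{
    static const char script[] = "script";
    size_t i;

    if (end - p < 2) return 0;               /* need '<' plus one more byte */
    if (p[1] == '?' || p[1] == '%') return 1;
    if ((size_t)(end - p) < sizeof(script)) return 0;   /* '<' + 6 letters */
    for (i = 0; i < sizeof(script) - 1; i++) {
        char c = p[1 + i];
        if (c >= 'A' && c <= 'Z') c = (char)(c - 'A' + 'a');
        if (c != script[i]) return 0;
    }
    return 1;
}

/* Hypothetical helper: length of the leading inline HTML in buf[0..len).
 * A '<' that does not start an opening tag is skipped, so the chunk is
 * not cut short at a merely potential tag. */
static size_t inline_html_length(const char *buf, size_t len)
{
    const char *p = buf, *end = buf + len;

    while (p < end && (p = memchr(p, '<', (size_t)(end - p))) != NULL) {
        if (is_open_tag(p, end)) return (size_t)(p - buf);
        p++;                                 /* ordinary '<', keep scanning */
    }
    return len;                              /* no opening tag found */
}
```

The extra expense is the re-scan after every ordinary '<'; whether that pays off depends on how many such characters typical templates contain.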
Still wondering about the behavior of re2c at EOF being different than
Flex -- can't re2c have an addition/enhancement that simply keeps track of
the rule that *would have* matched before hitting EOF (e.g. YYCURSOR >=
YYLIMIT) and then jump to it when doing the YYFILL check?
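
A rough sketch of what "keep track of the rule that would have matched" could look like. This is a hand-written stand-in for a generated matcher, not re2c output and not an existing re2c feature; the scanner struct, the token ids, and the single rule [a-z]+ are all assumptions made for the example.

```c
#include <stddef.h>

enum { T_EOF = 0, T_WORD = 1, T_OTHER = 2 };   /* hypothetical token ids */

typedef struct {
    const char *cursor;   /* plays the role of YYCURSOR */
    const char *limit;    /* plays the role of YYLIMIT */
    const char *text;     /* start of the token just returned */
    size_t      leng;     /* length of the token just returned */
} scanner;

static int is_lower(char c) { return c >= 'a' && c <= 'z'; }

/* One variable-length rule, [a-z]+, plus a one-byte catch-all. */
static int scan(scanner *s)
{
    int accepted = T_EOF;             /* rule that would match so far */
    const char *accept_end = s->cursor;

    s->text = s->cursor;
    s->leng = 0;
    if (s->cursor >= s->limit) return T_EOF;    /* truly nothing left */

    if (!is_lower(*s->cursor)) {
        s->cursor++;
        s->leng = 1;
        return T_OTHER;
    }
    for (;;) {
        s->cursor++;
        accepted = T_WORD;            /* [a-z]+ matches up to here */
        accept_end = s->cursor;
        /* This is where the YYFILL check would "return 0". Falling back
         * to the remembered rule keeps the last token instead. */
        if (s->cursor >= s->limit) break;
        if (!is_lower(*s->cursor)) break;
    }
    s->cursor = accept_end;
    s->leng = (size_t)(accept_end - s->text);
    return accepted;
}
```

With the input "ab cd" this returns T_WORD, T_OTHER, T_WORD and only then T_EOF, so the trailing "cd" is not dropped.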
Yes, this is horrible. I'm also afraid there might be some other
corner cases where we are returning EOF when we shouldn't. This
behaviour can be worked around with the state feature, though.
Another thing that isn't working is the warning about /* Unterminated
comments... (never seen). The optimization for comment parsing I was going
to do (along with the above HTML stuff) would also work around that -- not
using re2c rules, but a manual scan or zend_memnstr() for the closing */.
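
For illustration, a minimal sketch of the manual scan. The email suggests zend_memnstr(); to keep the example standalone it uses plain memchr() instead, and skip_block_comment() is a made-up name, not a function in the PHP source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: p points just past the opening comment delimiter,
 * end is one past the last byte of input. Returns a pointer just past the
 * closing star-slash, or NULL if the comment is unterminated, in which
 * case the caller can raise the warning and treat the rest of the input
 * as the comment. */
static const char *skip_block_comment(const char *p, const char *end)
{
    while (p < end && (p = memchr(p, '*', (size_t)(end - p))) != NULL) {
        if (p + 1 < end && p[1] == '/') return p + 2;
        p++;                          /* lone star, keep looking */
    }
    return NULL;
}
```

Because the NULL case is explicit, the unterminated-comment warning no longer depends on what the generated scanner does at end of input.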
Ok, please file a bug report and assign it to me, or go ahead and
fix it yourself :-)
Like I said in the comments for Bug #45372, if the last thing at the end of
a file is matched by a variable length rule, it will not be returned.
Because of
#define YYFILL(n) { if (YYCURSOR >= YYLIMIT) return 0; }
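
To show the effect in isolation, here is a hand-written imitation of the kind of loop a generated scanner runs, using the macro above verbatim. It is not real re2c output; lex(), the token ids, and the rule [a-z]+ are assumptions for the example, and YYCURSOR/YYLIMIT are plain variables here.

```c
#include <stddef.h>

enum { T_EOF = 0, T_WORD = 1, T_OTHER = 2 };   /* hypothetical token ids */

static const char *YYCURSOR, *YYLIMIT;

#define YYFILL(n) { if (YYCURSOR >= YYLIMIT) return 0; }

/* One variable-length rule, [a-z]+, plus a one-byte catch-all. */
static int lex(void)
{
    if (YYCURSOR >= YYLIMIT) return T_EOF;
    if (!(*YYCURSOR >= 'a' && *YYCURSOR <= 'z')) {
        YYCURSOR++;
        return T_OTHER;
    }
    for (;;) {
        YYCURSOR++;
        YYFILL(1);   /* at end of input this returns 0 in mid-token */
        if (!(*YYCURSOR >= 'a' && *YYCURSOR <= 'z')) return T_WORD;
    }
}
```

For the input "ab cd" the calls return T_WORD, T_OTHER and then 0: the trailing "cd" is consumed but never reported. Appending a newline to the same input makes the last word come back, which is why the problem mostly shows up with files that end in the middle of a token.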
I put the ? in the subject line because I'm not sure how important this
really is, but it just seems broken to me (though it's usually with invalid
code), and I couldn't think of a workaround with my limited knowledge of
re2c (though I think it would need to be changed internally). Some things
this affects are 1) the tokenizer extension -- the last token won't be
returned (if variable length, of course); 2) highlighting (if someone is
trying to "see" an unclosed string error, for example? PHP highlighting on
forums...), and parse errors can be different than previously if the parser
gets one less token, for example:
I'm not too worried about input errors, although I agree the
current approach isn't the best one. As I said, this can be
worked around with the states feature (IIRC).
It's been a while since I checked out the details, so I can't recall at the
moment if there are more serious examples. Also not sure if some of this is
affecting the ini scanner (see Bug #45384), as I haven't really looked at
its code.
The ini scanner is a bit broken, yes. :/
Maybe Marcus can help us here? Maybe add some new feature to re2c or
help in implementing some workaround?
What's the status here?
regards,
Lukas Kahwe Smith