Matt Wilmas wrote:
Hi Brian,

----- Original Message -----
From: "shire"
Sent: Monday, May 04, 2009


Hey Matt,

Matt Wilmas wrote:

+/* To save initial string length after scanning to first variable,
CG(doc_comment_len) can be reused */
+#define double_quotes_scanned_len CG(doc_comment_len)
+

(minor) Maybe we should rename this var if we're going to use it for
other
purposes, this doesn't really save any typing. Also if we do want the
define maybe we should upper case it so it's more obvious?

Yeah, I tried to think of other ways to do it, but just left it trying
to look like another variable (not to save typing). Well, it can easily
be changed later if a "cleaner" way is decided...


Yeah I would just prefer if it was more obvious that it is *not* a
variable ;-)

How about this?

#define SET_DOUBLE_QUOTES_SCANNED_LENGTH(len) CG(doc_comment_len) = (len)
#define GET_DOUBLE_QUOTES_SCANNED_LENGTH() CG(doc_comment_len)


Sure, works for me ;-)


+ while (YYCURSOR < YYLIMIT) {
+ switch (*YYCURSOR++) {

In the example above, which we have a couple examples of here, we don't
obey the YYFILL macro to detect if we have exceeded our EOF. This
*might* be a problem, but only really depends on if we intend to use
the
YYFILL as a solution for exceeding our mmap bounds.

I don't understand what the problem might be? The YYCURSOR < YYLIMIT
check is what the YYFILL has been doing. If you mean after changes
later, as long as the the whole thing is mmap()'d (which I'm assuming
would be the case?), it just "looks" like a standard string, with
terminating '\0', right? And there's no reading past YYLIMIT.

Sorry yeah this wouldn't be a problem currently, but only if we try to
fix the mmap issue by using YYFILL to realloc more space into the buffer.
Then that macro would change to something more complicated. (per my
previous replies with Arnaud)

Gotcha. If something changes, YYFILL -- or something to handle what
needs to be done -- could just be added to the manual parts as
necessary, right?

Have you considered using the lexer STATES and regex's instead of the
manual C code for scanning the rest. It seems like if we have a one-char
regex match for what the C code is doing we could handle this in the
lexer without a lot of manual intervention (need to look at it more, just
a thought I had earlier, the expressions are clearer now with your patch
applied) ;-)

It seems that matching one-char-at-a-time with re2c would be more
complicated than the manual way, not to mention slower than the old
(current) way.

Do you have any objection (well, you've kinda mentioned some :-)) if I'd
commit the changes in a little while like Dmitry thought could be done?

Well I'm wondering if something more along these lines (just did this on-top of 
your patch as you cleaned up a lot) might be more appealing.  (I'm not sure how 
much slower this would be than the current implementation, obviously it'll be 
somewhat slower, I'm basically just doing what you did in C but in the scanner 
instead of course).

<ST_IN_SCRIPTING>"#"|"//" {
    BEGIN(ST_EOL_COMMENT);
    yymore();
}

<ST_EOL_COMMENT>({NEWLINE}|"%>"|"?>") {
    char tmp = *(YYCURSOR-1);
    if ((tmp == '%' && CG(asp_tags)) | tmp == '?') {
      YYCURSOR -= 2;
    }
    CG(zend_lineno)++;
    BEGIN(ST_IN_SCRIPTING);
    return T_COMMENT;
}

<ST_EOL_COMMENT>{ANY_CHAR} {
    if (YYCURSOR >= YYLIMIT) {
      BEGIN(ST_IN_SCRIPTING);
      return T_COMMENT;
    }
    yymore();
}



Let me know what the thoughts are on the above, if we don't want that then I 
say yeah, commit away!

-shire





--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to