Hi Brian,

----- Original Message -----
From: "shire"
Sent: Monday, May 04, 2009


Hey Matt,

Matt Wilmas wrote:

+/* To save initial string length after scanning to first variable,
CG(doc_comment_len) can be reused */
+#define double_quotes_scanned_len CG(doc_comment_len)
+

(minor) Maybe we should rename this var if we're going to use it for
other
purposes, this doesn't really save any typing. Also if we do want the
define maybe we should upper case it so it's more obvious?

Yeah, I tried to think of other ways to do it, but just left it trying
to look like another variable (not to save typing). Well, it can easily
be changed later if a "cleaner" way is decided...


Yeah I would just prefer if it was more obvious that it is *not* a variable ;-)

How about this?

#define SET_DOUBLE_QUOTES_SCANNED_LENGTH(len) CG(doc_comment_len) = (len)
#define GET_DOUBLE_QUOTES_SCANNED_LENGTH()    CG(doc_comment_len)

+ while (YYCURSOR < YYLIMIT) {
+ switch (*YYCURSOR++) {

In the example above, which we have a couple examples of here, we don't
obey the YYFILL macro to detect if we have exceeded our EOF. This
*might* be a problem, but only really depends on if we intend to use the
YYFILL as a solution for exceeding our mmap bounds.

I don't understand what the problem might be? The YYCURSOR < YYLIMIT
check is what the YYFILL has been doing. If you mean after changes
later, as long as the the whole thing is mmap()'d (which I'm assuming
would be the case?), it just "looks" like a standard string, with
terminating '\0', right? And there's no reading past YYLIMIT.

Sorry yeah this wouldn't be a problem currently, but only if we try to
fix the mmap issue by using YYFILL to realloc more space into the buffer.
Then that macro would change to something more complicated. (per my
previous replies with Arnaud)

Gotcha. If something changes, YYFILL -- or something to handle what needs to be done -- could just be added to the manual parts as necessary, right?

Have you considered using the lexer STATES and regex's instead of the
manual C code for scanning the rest.  It seems like if we have a one-char
regex match for what the C code is doing we could handle this in the
lexer without a lot of manual intervention (need to look at it more, just
a thought I had earlier, the expressions are clearer now with your patch
applied) ;-)

It seems that matching one-char-at-a-time with re2c would be more complicated than the manual way, not to mention slower than the old (current) way.

Do you have any objection (well, you've kinda mentioned some :-)) if I'd commit the changes in a little while like Dmitry thought could be done?

-shire

- Matt

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to