According to Geoff Hutchison:
> At 5:16 PM -0500 4/10/00, Gilles Detillieux wrote:
> >I'm still not satisfied with the config parser. Unless I'm misreading
> >the grammar, it looks like there are a lot of potential problems lurking
> >in it. It seems that for the value of any parameter, it will only allow
> >a string, a number or a list, where a list is defined as:
>
> I would agree with this sentiment in general. However, please see my
> notes below.
>
> >list: T_STRING T_STRING
> > | list T_STRING
> > | list T_NUMBER
> >
> >But in reality, it should be possible to set an attribute's value to
> >just about anything. The grammar defines these as valid tokens:
> >
> >%token NUM T_DELIMITER T_NEWLINE T_RIGHT_BR T_LEFT_BR T_SLASH
> >%token <str> T_STRING T_KEYWORD T_NUMBER
>
> Right, but it doesn't make it clear what they are. It's pretty easy
> to work out that T_DELIMITER is ':'. T_NEWLINE, T_RIGHT_BR, T_LEFT_BR,
> and T_SLASH are also, IMHO, self-explanatory. This leaves NUM, <str>
> (I'm guessing this is for the blocks), T_STRING, T_KEYWORD and
> T_NUMBER.
>
> As you say, it seems that T_KEYWORDs are essentially strings on the
> left-hand side of delimiters. If so, that makes them
> *context-dependent* and the interpretation depends on where it is.
>
> I still don't know what NUM is.
Nor do I know what NUM and <str> are, and I don't see them returned
by any of the Lex code, though I vaguely recall that <str> just indicated
that a particular token had a string value associated with it, as opposed
to just the token type. I have to admit I have fairly minimal experience
with YACC/Bison, and a lot less with Lex, so to some extent I'm fumbling
along here. I hadn't looked very closely at the Lex code before today,
so I didn't realise that tokens were context-dependent. In my University
courses, when we learned about lexical analysers, we learned that they
commonly stick to a strict type 3 (regular) grammar, which means they have no
"memory" of anything but current state and current input character,
so there was no such thing as context-dependent tokens.
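Roughly speaking, context-dependent tokens don't need anything beyond a finite-state lexer: a start condition is just extra state carried from one token to the next. The following is a hand-written illustration only, not the actual Lex code. The State enum is a hypothetical stand-in for INITIAL and t_right, and the word/number classification rules are assumptions. A word read before the ':' comes back as T_KEYWORD, while the same kind of word after it comes back as T_STRING or T_NUMBER.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Token names borrowed from the config grammar; the values are arbitrary here.
enum Token { T_KEYWORD, T_STRING, T_NUMBER, T_DELIMITER, T_NEWLINE };

// Hypothetical stand-ins for the lexer's INITIAL and t_right start conditions.
enum class State { Initial, Right };

static bool allDigits(const std::string& word) {
    for (char d : word)
        if (!std::isdigit(static_cast<unsigned char>(d))) return false;
    return !word.empty();
}

// Tokenize one config line. The only "memory" is the current State, carried
// from one token to the next: a word is a T_KEYWORD while we are still left
// of the first ':', and a T_STRING or T_NUMBER after it.
std::vector<std::pair<Token, std::string>> tokenize(const std::string& line) {
    std::vector<std::pair<Token, std::string>> out;
    State state = State::Initial;
    std::size_t i = 0;
    while (i < line.size()) {
        const char c = line[i];
        if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }
        if (state == State::Initial && c == ':') {
            out.emplace_back(T_DELIMITER, ":");
            state = State::Right;  // switch "start condition"
            ++i;
            continue;
        }
        // A word runs to the next space (or, left of the delimiter, to ':').
        std::size_t j = i;
        while (j < line.size() && !std::isspace(static_cast<unsigned char>(line[j])) &&
               !(state == State::Initial && line[j] == ':'))
            ++j;
        std::string word = line.substr(i, j - i);
        if (state == State::Initial)
            out.emplace_back(T_KEYWORD, word);
        else
            out.emplace_back(allDigits(word) ? T_NUMBER : T_STRING, word);
        i = j;
    }
    out.emplace_back(T_NEWLINE, "\n");
    return out;
}
```

With this sketch, tokenize("build_select_lists: method_names matchesperpage") yields T_KEYWORD, T_DELIMITER, T_STRING, T_STRING, T_NEWLINE, so an attribute name used as a value is just a string as far as the parser is concerned.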
I still see a potential problem with the Lex code's handling of "<",
in that it's not restricted to the INITIAL context, which, if I'm not
mistaken, means that even in the t_right context, a "<" will begin a
bracket context, so an attribute definition that begins with an HTML tag
could trigger a syntax error. Mind you, I haven't heard any complaints
about it rejecting next_page_text and prev_page_text, so I must be
missing something here. Can anyone shed some light on this?
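One flex detail that may be relevant: a rule with no <state> prefix is active in INITIAL and in every inclusive (%s) start condition, but not in exclusive (%x) ones. So whether an unprefixed "<" rule fires inside t_right depends on how t_right is declared, which isn't shown here. The sketch below models only the symptom, not the real Lex code. tokenizeValue and acceptedAsValue are hypothetical, and the flag bracketsInRight stands for "the '<' rule is also active on the right-hand side of ':'".

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

enum Token { T_STRING, T_NUMBER, T_LEFT_BR, T_RIGHT_BR };

// Tokenize the text to the right of ':'. When bracketsInRight is true, '<'
// and '>' become bracket tokens here too, as they would if the "<" rule were
// also active in the right-hand-side context.
std::vector<Token> tokenizeValue(const std::string& value, bool bracketsInRight) {
    std::vector<Token> out;
    std::string word;
    // Emit the pending word (if any) as a number or a string.
    auto flush = [&] {
        if (word.empty()) return;
        bool numeric = std::all_of(word.begin(), word.end(), [](char d) {
            return std::isdigit(static_cast<unsigned char>(d)) != 0;
        });
        out.push_back(numeric ? T_NUMBER : T_STRING);
        word.clear();
    };
    for (char c : value) {
        if (std::isspace(static_cast<unsigned char>(c))) {
            flush();
        } else if (bracketsInRight && (c == '<' || c == '>')) {
            flush();
            out.push_back(c == '<' ? T_LEFT_BR : T_RIGHT_BR);
        } else {
            word += c;
        }
    }
    flush();
    return out;
}

// The value side of an attribute definition only takes strings and numbers,
// so any bracket token here would mean a syntax error.
bool acceptedAsValue(const std::vector<Token>& tokens) {
    return std::all_of(tokens.begin(), tokens.end(),
                       [](Token t) { return t == T_STRING || t == T_NUMBER; });
}
```

For a value such as <b>Next</b>, the sketch produces a single T_STRING when brackets are confined to INITIAL, but seven tokens including T_LEFT_BR and T_RIGHT_BR when they are not, which the value grammar would reject.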
> >so that leaves a lot of tokens that it will not allow within
> >a list. Aren't keywords allowed as attribute values? After all,
>
> Nope. Since a keyword is only a keyword on the left side, it's a
> string here. It seems an odd distinction, but that's what I see.
Yes, you're right on this point. I hadn't looked closely enough.
> >build_select_lists allows (even requires) that some of the list items
> >be attribute names, which the grammar seems to treat as T_KEYWORD.
>
> Well, but the *parser* itself doesn't validate. Otherwise, it would
> have to have a list of T_KEYWORDS. We talked about this a few weeks
> ago, but this isn't a validating parser. Since this code makes a
> contextual difference between a T_STRING and a T_KEYWORD, in a list,
> it's just a T_STRING. That it happens to be the same string as a
> keyword is meaningless *to the parser.* (It may have meaning to the
> code, of course.)
>
> >What about a list that begins with a number? The list syntax seems to
> >require lists to begin with at least two strings.
>
> OK, here I'd agree that makes sense. However, I'm not even sure my
> code for adding a number onto a list is correct. It seems to work,
> but I also just copied the code from the list T_STRING case. If you
> feel I should do the same for T_NUMBER T_NUMBER, I'll do so, but it
> "felt wrong" to me.
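The constraint on the existing list rule can be checked with a tiny recognizer. This is only a hand-written mirror of the three productions quoted above, not generated parser code, and matchesCurrentList is a hypothetical name. On its own, the rule rejects any list that starts with a number, and also a single string followed by a number; whether such values still get through depends on the other simple_expression alternatives, which aren't shown here.

```cpp
#include <cstddef>
#include <vector>

enum Token { T_STRING, T_NUMBER, T_KEYWORD };

// Hand-written mirror of:
//   list: T_STRING T_STRING
//       | list T_STRING
//       | list T_NUMBER
// i.e. two strings first, then any mix of strings and numbers.
bool matchesCurrentList(const std::vector<Token>& tokens) {
    if (tokens.size() < 2 || tokens[0] != T_STRING || tokens[1] != T_STRING)
        return false;
    for (std::size_t k = 2; k < tokens.size(); ++k)
        if (tokens[k] != T_STRING && tokens[k] != T_NUMBER) return false;
    return true;
}
```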
It just seems that all these "special cases" are making the code quite
convoluted, not to mention repetitive. Do we really need 6 sections that
do essentially the same concatenation, and do we need to add more if we
find that other token types need to be added? What I had in mind was
something more like this:
simple_expression: T_KEYWORD T_DELIMITER list T_NEWLINE {
                //
                // We can't insert into config here because
                // we don't know if it's in a complex
                // expression or not.
                $$=new ConfigDefaults;
                $$->name = $1; $$->value=$3;
            }
        | T_NEWLINE { /* Ignore empty lines */ }
        ;

list:   item
        | list item {
                // Reallocate memory for 2 components and concatenate.
                char *old=$$;
                if (($$=new char [strlen(old)+strlen($2)+1+1])==NULL) {
                    fprintf(stderr,"Can't reallocate memory\n");
                    exit(1);
                }
                strcpy($$,old);
                strcat($$," ");     // Delimiter in list
                strcat($$,$2);
                delete [] old;
                delete [] $2;
            }
        ;

item:   T_STRING
        | T_NUMBER
        | /* nothing */ {
                if (($$=new char [1])==NULL) {
                    fprintf(stderr,"Can't allocate memory\n");
                    exit(1);
                }
                strcpy($$,"");
            }
        ;
If we find that we need to add other token types to lists, we just
need to add one entry to the "item" definition. Am I oversimplifying,
or introducing an ambiguity in the grammar by taking this approach?
I guess we'd need to add a "%type <str> item" definition above that,
but all other definitions would be as-is. Am I missing something?
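Whichever form the grammar takes, the "list item" action is a plain string concatenation. Here is a standalone C++ version of it for experimenting outside Bison; appendItem and dupItem are hypothetical helpers and, like the action, they assume every value was allocated with new[]. One difference: standard C++ new throws std::bad_alloc rather than returning NULL, so the NULL checks in the grammar only take effect with older compilers or new (std::nothrow), and the sketch leaves them out. The action also appears to rely on Bison's generated parser setting $$ to $1 before the action runs when it reads old from $$; using $1 would say the same thing more directly.

```cpp
#include <cstring>

// Hypothetical helper mirroring the "list: list item" action: builds
// "<old> <item>" in a fresh buffer and frees both inputs.
char* appendItem(char* old, char* item) {
    // +1 for the separating space, +1 for the terminating '\0'.
    char* joined = new char[std::strlen(old) + std::strlen(item) + 1 + 1];
    std::strcpy(joined, old);
    std::strcat(joined, " ");  // delimiter in list
    std::strcat(joined, item);
    delete[] old;
    delete[] item;
    return joined;
}

// Copy a literal into a new[] buffer, as the lexer is assumed to do for
// T_STRING and T_NUMBER values.
char* dupItem(const char* s) {
    char* p = new char[std::strlen(s) + 1];
    std::strcpy(p, s);
    return p;
}
```

Reducing "foo", then "bar", then "3" this way gives "foo bar 3". Note that appending an empty item with the same action would leave a trailing space in the value.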
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930