Re: [htdig3-dev] Re: htdig-3.2.0b2 status

Gilles Detillieux Tue, 11 Apr 2000 08:39:49 -0700
According to Geoff Hutchison:
> At 5:16 PM -0500 4/10/00, Gilles Detillieux wrote:
> >I'm still not satisfied with the config parser.  Unless I'm misreading
> >the grammar, it looks like there are a lot of potential problems lurking
> >in it.  It seems that for the value of any parameter, it will only allow
> >a string, a number or a list, where a list is defined as:
> 
> I would agree with this sentiment in general. However, please see my 
> notes below.
> 
> >list:        T_STRING T_STRING
> >     | list T_STRING
> >     | list T_NUMBER
> >
> >But in reality, it should be possible to set an attribute's value to
> >just about anything.  The grammar defines these as valid tokens:
> >
> >%token NUM T_DELIMITER T_NEWLINE T_RIGHT_BR T_LEFT_BR T_SLASH
> >%token <str> T_STRING T_KEYWORD T_NUMBER
> 
> Right, but it doesn't make it clear what they are. It's pretty easy 
> to work out that T_DELIMITER is ':'. T_NEWLINE, T_RIGHT_BR, T_LEFT_BR, 
> and T_SLASH are also IMHO self-explanatory. This leaves NUM, <str> 
> (I'm guessing this is for the blocks), T_STRING, T_KEYWORD and 
> T_NUMBER.
> 
> As you say, it seems that T_KEYWORD are essentially strings on the 
> left-hand side of delimiters. If so, that makes them 
> *context-dependent* and the interpretation depends on where it is.
> 
> I still don't know what NUM is.

No, I don't know what NUM and <str> are, and I don't see them returned
by any of the Lex code, though I vaguely recall that <str> just indicated
that a particular token had a string value associated with it, as opposed
to just the token type.  I have to admit I have fairly minimal experience
with YACC/Bison, and a lot less with Lex, so to some extent I'm fumbling
along here.  I hadn't looked very closely at the Lex code before today,
so I didn't realise that tokens were context-dependent.  In my University
courses, when we learned about lexical analysers, we learned that they
commonly stick to a strict type 3 (regular) grammar, which means they
have no "memory" of anything but current state and current input
character, so there was no such thing as context-dependent tokens.
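
To make the idea concrete, here is a rough C++ sketch of how a scanner
can hand back different token types for the same text depending on which
side of the delimiter it is on.  This is not the ht://Dig Lex code:
scan_line, Token and the right_side flag are names made up for
illustration, and the flag only stands in for what I take a Lex start
condition such as t_right to be doing.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Token types loosely modelled on the names in the grammar; the
// numeric values are arbitrary, not the ones Bison would assign.
enum TokenType { T_KEYWORD, T_DELIMITER, T_STRING, T_NUMBER, T_NEWLINE };

struct Token {
    TokenType type;
    std::string text;
};

// Hypothetical hand-written scanner for one "name: value ..." line.
// Its only "memory" is the flag right_side, which flips when the
// first ':' is seen (false ~ INITIAL, true ~ t_right).
std::vector<Token> scan_line(const std::string &line)
{
    std::vector<Token> tokens;
    bool right_side = false;
    std::string::size_type i = 0;

    while (i < line.size()) {
        char c = line[i];
        if (c == ' ' || c == '\t') { ++i; continue; }
        if (c == ':' && !right_side) {
            tokens.push_back({T_DELIMITER, ":"});
            right_side = true;
            ++i;
            continue;
        }
        // Collect one word; ':' only ends a word on the left side.
        std::string::size_type start = i;
        while (i < line.size() && line[i] != ' ' && line[i] != '\t'
               && !(line[i] == ':' && !right_side))
            ++i;
        std::string word = line.substr(start, i - start);
        assert(!word.empty());

        if (!right_side) {
            // Same characters, different token: left side is a keyword.
            tokens.push_back({T_KEYWORD, word});
        } else {
            bool all_digits = true;
            for (unsigned char ch : word)
                if (!std::isdigit(ch)) all_digits = false;
            tokens.push_back({all_digits ? T_NUMBER : T_STRING, word});
        }
    }
    tokens.push_back({T_NEWLINE, "\n"});
    return tokens;
}
```

With something like this, an attribute name that shows up in a value is
just a T_STRING, because by then the scanner is already on the right
side of the delimiter.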

I still see a potential problem with the Lex code's handling of "<",
in that it's not restricted to the INITIAL context, which, if I'm not
mistaken, means that even in the t_right context, a "<" will begin a
bracket context, so an attribute definition that begins with an HTML tag
could trigger a syntax error.  Mind you, I haven't heard any complaints
about it rejecting next_page_text and prev_page_text, so I must be
missing something here.  Can anyone shed some light on this?

> >so that leaves a lot of tokens that it will not allow within
> >a list.  Aren't keywords allowed as attribute values?  After all,
> 
> Nope. Since a keyword is only a keyword on the left side, it's a 
> string here. It seems an odd distinction, but that's what I see.

Yes, you're right on this point.  I hadn't looked closely enough.

> >build_select_lists allows (even requires) that some of the list items
> >be attribute names, which the grammar seems to treat as T_KEYWORD.
> 
> Well, but the *parser* itself doesn't validate. Otherwise, it would 
> have to have a list of T_KEYWORDS. We talked about this a few weeks 
> ago, but this isn't a validating parser. Since this code makes a 
> contextual difference between a T_STRING and a T_KEYWORD, in a list, 
> it's just a T_STRING. That it happens to be the same string as a 
> keyword is meaningless *to the parser.* (It may have meaning to the 
> code, of course.)
> 
> >What about a list that begins with a number?  The list syntax seems to
> >require lists to begin with at least two strings.
> 
> OK, here I'd agree that makes sense. However, I'm not even sure my 
> code for adding a number onto a list is correct. It seems to work, 
> but I also just copied the code from the list T_STRING case. If you 
> feel I should do the same for T_NUMBER T_NUMBER, I'll do so, but it 
> "felt wrong" to me.

It just seems that all these "special cases" are making the code quite
convoluted, not to mention repetitive.  Do we really need 6 sections that
do essentially the same concatenation, and do we need to add more if we
find that other token types need to be added?  What I had in mind was
something more like this:

simple_expression:      T_KEYWORD T_DELIMITER list T_NEWLINE    {
                //
                // We can't do inserting into config
                // here because we don't know if it's
                // in complex expression or not.
                $$=new ConfigDefaults;
                $$->name = $1; $$->value=$3;
        }
        | T_NEWLINE     { /* Ignore empty lines */ }
        ;

list:   item
        | list item             {
                // Reallocate memory for 2 components and concatenate.
                char *old=$1;           // the list built so far
                if (($$=new char [strlen(old)+strlen($2)+1+1])==NULL) {
                    fprintf(stderr,"Can't reallocate memory\n");
                    exit(1);
                }
                strcpy($$,old);
                strcat($$," ");         // Delimiter in list
                strcat($$,$2);
                delete [] old;
                delete [] $2;
        }
        ;

item:   T_STRING
        | T_NUMBER
        | /* nothing */         {
                if (($$=new char [1])==NULL) {
                    fprintf(stderr,"Can't allocate memory\n");
                    exit(1);
                }
                strcpy($$,"");
        }
        ;

If we find that we need to add other token types to lists, we just
need to add one entry to the "item" definition.  Am I oversimplifying,
or introducing an ambiguity in the grammar by taking this approach?
I guess we'd need to add a "%type <str> item" definition above that,
but all other definitions would be as-is.  Am I missing something?
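
For what it's worth, the concatenation in the "list item" action could
also be pulled out into a small helper, so the grammar file carries it
only once.  The sketch below is just that, a sketch: append_item and
dup_string are names I made up, not functions in the ht://Dig source,
and it assumes every token value was allocated with new char[].

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: copy a C string into new char[] storage, the
// way the lexer is assumed to allocate token values.
char *dup_string(const char *s)
{
    assert(s != nullptr);
    char *copy = new char[std::strlen(s) + 1];
    std::strcpy(copy, s);
    return copy;
}

// Hypothetical helper: join two heap-allocated strings with a single
// space, free both inputs, and return the newly allocated result.
char *append_item(char *list, char *item)
{
    assert(list != nullptr && item != nullptr);
    // +1 for the delimiter, +1 for the terminating NUL.
    std::size_t len = std::strlen(list) + std::strlen(item) + 1 + 1;
    char *joined = new char[len];
    std::strcpy(joined, list);
    std::strcat(joined, " ");           // Delimiter in list
    std::strcat(joined, item);
    delete [] list;
    delete [] item;
    return joined;
}
```

The action for "list item" would then shrink to something like
$$ = append_item($1, $2);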

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] 
You will receive a message to confirm this.