Re: [Grammatica-users] Regex token order

Per Cederberg Sun, 27 Feb 2011 22:50:31 -0800

It works like this (same as flex):

1. The longest matching token wins.
2. On equal length, the token defined first in the grammar wins.

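So, in your case, don't make your TEXT token a repeating regexp. With
<<.+>> it is always the longest match, so rule 1 picks it before the
definition order is ever considered. Match a single char only.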
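
To make the two rules concrete, here is a minimal sketch in Python. It is
not Grammatica's tokenizer, only a simulation of the rules as stated above,
and the names tokenize, GREEDY and SINGLE are made up for the illustration.
It assumes Python's re.match returns the longest match for each pattern,
which holds for these simple patterns but not for regexps in general.

```python
import re

def tokenize(text, token_defs):
    """Simulate: longest match wins; on a tie, the earliest definition wins."""
    compiled = [(name, re.compile(pattern)) for name, pattern in token_defs]
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in compiled:
            m = regex.match(text, pos)
            # Strict '>' means a later definition only wins if it is longer,
            # so ties go to the token defined first.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no token matches at offset {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Token definitions from the quoted grammar, in the same order.
COMMON = [
    ("RCARET", r">"),
    ("LCARET", r"<"),
    ("ITEM_NAME", r"[a-zA-Z][a-zA-Z0-9]+"),
]
GREEDY = COMMON + [("TEXT", r".+")]  # TEXT as in the original grammar
SINGLE = COMMON + [("TEXT", r".")]   # TEXT matching a single char only

print(tokenize(">email<", GREEDY))
print(tokenize(">email<", SINGLE))
```

With GREEDY, TEXT matches all of ">email<" and beats the one-char RCARET
on length. With SINGLE, TEXT ties with RCARET and LCARET at one char and
loses on definition order, and ITEM_NAME wins "email" on length. Under the
same rules a word in the trailing text would presumably still come out as
ITEM_NAME, so the Item production may need to allow for that.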

Cheers,

/Per

On Monday, February 28, 2011, Drew Vogel <drewpvo...@gmail.com> wrote:
> I would expect the token definition order to matter, based on my experience 
> with similar tools like flex. I must be doing something wrong.
> This is the test file I am trying to parse:
> --------------------------------------------------
> >email<  Enter your email address:
>
> This is my test grammar:
> --------------------------------------------------
> %header%
> GRAMMARTYPE = "LL"
> %tokens%
> RCARET = ">"
> LCARET = "<"
> ITEM_NAME = <<[a-zA-Z][a-zA-Z0-9]+>>
> TEXT = <<.+>>
> %productions%
> Item = ItemDecl TEXT ;
> ItemDecl = RCARET ITEM_NAME LCARET ;
>
> This is the error I get from grammatica:
> --------------------------------------------------
> java -jar grammatica-1.5.jar Q.grammar --parse test.q
> Parse tree from test.q:
> Error: in test.q: line 1:    unexpected token ">email<" <TEXT>, expected ">"
>
> If I remove the TEXT token definition and the reference in the Item 
> production, the remaining grammar does properly match the first line and I 
> get a parse error at the new line character (as expected). Why does the 
> introduction of my TEXT token override those previously-matching tokens, even 
> though it is listed last in the %tokens% section?
>
>
>
> On Sun, Feb 27, 2011 at 11:49 PM, Oliver Bock <oli...@g7.org> wrote:
>
>     I had to do a similar thing, but putting the more specific tokens
>     first in %tokens% worked for me.  From my grammar:
>
>     ON = "ON"
>     VARNAME = <<[A-Z@#]([A-Z0-9._$#@]*[A-Z0-9_$#@])?>>
>
>     The text "ON" could match both these tokens, but for me ON matches,
>     not VARNAME.  I suggest you cut your example down into a very simple
>     grammar (like the above).
>
>
>       Oliver
>
>     On 28/02/2011 4:37 PM, Drew Vogel wrote:
>     If I have two regex tokens A and B and A is a subset
>     of B, how do I disambiguate them such that A will always be tried
>     before B? The order they appear in the %tokens% section does not
>     seem to affect this and I did not see an example of this in the
>     documentation.
>
>     The parser I am trying to construct is for a template-like
>     language with commands embedded in text. Thus I have a "text"
>     token regex <<.+>> to match everything not otherwise
>     matched as a command, but I only want to match it after all
>     other token regex patterns have been tried.
>
>
>
>           Drew Vogel
>
>
>
> _______________________________________________
> Grammatica-users mailing list
> Grammatica-users@nongnu.org
> http://lists.nongnu.org/mailman/listinfo/grammatica-users
>

