Re: Re: Lex token problem

David Beazley Mon, 05 Jan 2009 04:22:13 -0800

Instead of trying to solve this with magic regexes, another option is
to resolve it in the parser. For example, you could have a grammar rule
such as this:


def p_VECTOR(p):
    "VECTOR : NUM"
    # NUM arrives as a string unless its token rule converts it,
    # so compare as an integer.
    if int(p[1]) not in (0, 1):
        raise SyntaxError
    p[0] = p[1]

The catch is that, depending on how the grammar uses NUM and VECTOR,
you might get shift-reduce or reduce-reduce conflicts (if not, all is
well).
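
To try the idea without writing a PLY grammar, here is a rough
stand-alone sketch that uses only the standard re module. It is not PLY
code. Every digit run is lexed as NUM, and the 0/1 check happens in the
parsing step. The names tokenize, parse_statement and PIN are made up
for this sketch, and the statement layout (optional keyword and count,
then vector fields, then ';') is guessed from the sample data in the
original post.

```python
import re

# Every digit run is lexed as NUM; its role is decided later by context.
TOKEN_RE = re.compile(
    r'\s*(?:(?P<KEYWORD>TSET|repeat)|(?P<NUM>\d+)|(?P<PIN>[HLX])|(?P<SEMI>;))'
)

def tokenize(text):
    """Split one statement into (type, value) pairs."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError("unexpected character at %d" % pos)
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

def parse_statement(tokens):
    """statement : [KEYWORD NUM] vector_field+ SEMI (assumed layout)."""
    i = 0
    result = {"keyword": None, "count": None, "vector": []}
    if tokens and tokens[0][0] == "KEYWORD":
        # A NUM right after TSET/repeat is a count, whatever its value.
        if len(tokens) < 2 or tokens[1][0] != "NUM":
            raise SyntaxError("expected NUM after keyword")
        result["keyword"] = tokens[0][1]
        result["count"] = int(tokens[1][1])
        i = 2
    while i < len(tokens) and tokens[i][0] in ("NUM", "PIN"):
        kind, value = tokens[i]
        # Range check done in the parser, not the lexer.
        if kind == "NUM" and set(value) - {"0", "1"}:
            raise SyntaxError("vector field %r is not made of 0/1" % value)
        result["vector"].append(value)
        i += 1
    if i != len(tokens) - 1 or tokens[i][0] != "SEMI":
        raise SyntaxError("expected ';' at end of statement")
    return result
```

Calling parse_statement(tokenize("TSET 1        001 X 0 00;")) treats
the "1" as a count because of where it appears, while a field such as
"002" in vector position is rejected.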

Cheers,
Dave



On Mon 05/01/09  1:13 AM , zt [email protected] sent:
> 
> Hi Dennis,
> 
> Thanks a lot.
> 
> I had completely overlooked the priority section of the lex part of
> the PLY documentation.
> The only difference between NUM and VECTOR tokens is that a NUM always
> comes after a special word such as TSET or repeat.
> I found a way to do this: t_NUM=r'(?<=TSET\s)\d+|(?<=repeat\s)\d+'
> The only problem is that the (?<=) syntax supports fixed-length
> patterns only, so I cannot use \s* to match the spaces in between. So
> another pass to collapse all repeated whitespace is needed first.
> 
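
On the fixed-length look-behind limit mentioned just above, a small
sketch with the standard re module may help. It shows that a
variable-width look-behind is rejected outright, and one possible
workaround: consume the keyword and the whitespace as part of the match
and capture only the digits. KEYWORD_NUM and keyword_numbers are
hypothetical names used for illustration, and this is plain re rather
than a PLY token rule.

```python
import re

# Python's re module refuses a look-behind whose width can vary.
try:
    re.compile(r'(?<=TSET\s*)\d+')
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Workaround: match keyword + any amount of whitespace, capture the number.
KEYWORD_NUM = re.compile(r'(TSET|repeat)\s+(\d+)')

def keyword_numbers(text):
    """Return (keyword, number) pairs found in text."""
    return [(m.group(1), int(m.group(2))) for m in KEYWORD_NUM.finditer(text)]
```

With this pattern no extra whitespace-collapsing pass is needed, since
\s+ absorbs however many spaces separate the keyword from the number.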
> What is the usual way to solve this kind of problem, where the same
> characters mean different tokens at different locations?
> 
> Best Regards,
> Adun
> 
> On Dec 31 2008, 5:32 pm, "Hendriks, D." <D.Hendr...@tue.nl> wrote:
> > Hello zt,
> >
> > both r'\d+' and r'0|1|...' match the numbers 0 and 1. Since the
> > r'0|1|...' regular expression has a longer length, it is given
> > priority (see Ply documentation). Is there any way to differentiate
> > the NUM and VECTOR tokens? For instance, can NUM tokens start with a
> > 0 at all? You will need to have two regular expressions that only
> > match the given input for that token (that is, no overlap). Well, you
> > can have overlap, as long as you know it's there and the one that is
> > given priority is the one you want to have priority, but still, I
> > think it is better to avoid the overlap altogether...
> >
> > Dennis
> >
> > ________________________________
> >
> > From: ply [email protected] on behalf of zt
> > Sent: Wed 31-12-2008 9:49
> > To: ply-hack
> > Subject: Lex token problem
> >
> > Hi all,
> >
> > I am still learning how to write a parser with PLY. I need to parse
> > data in the following format:
> >  TSET 1        001 X 0 00;
> >                001 X 0 00;
> >                001 X 0 00;
> >  TSET 7        001 X 0 00;
> > repeat 12      001 X 0 00;
> >
> > The tokens are defined as:
> > t_TSET=r'TSET'
> > t_NUM=r'\d+'
> > t_MCODE=r'repeat'
> > t_VECTOR=r'0|1|H|L|X'
> >
> > but it kept treating the first "1" at line 1 as VECTOR instead of NUM
> > and the "1" after "repeat" as VECTOR.
> > Is there a good way to fix this?
> >
> > Thanks a lot!
> >



--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"ply-hack" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/ply-hack?hl=en
-~----------~----~----~----~------~----~------~--~---
