Pedro Lopes wrote:
> Hi, I have a difficult(?) problem to solve in PLY. I'm trying to parse a
> little language that allows statements to be broken across several lines
> by "escaping" the newline with \. This wouldn't be unusual, except in
> this case the break is allowed anywhere, even in the middle of a token.

In my view, the best solution would be to drop the latter part (breaks in
the middle of a token), and only allow a subset of specifications as input.

> Here is a real example, note how the "chrU42" identifier is split across
> 2 lines:
>
> test4 = chrt34||chrh35||chre36||chrF38||chrO39||chrR40||chrM41||chr\
> U42||chrL43||chrA44||chrP46||chrA47||chrR48||chrS49||chrE50||chrR51\
> &&y>0.03&&y<0.07

I see, but I am not convinced of the need.

> Now, this would be easy to do by preprocessing the input to PLY with a
> regex, but I would rather do it in the lexer. Problem is, I can't figure
> out how. Ignored characters in the lexer aren't really ignored because
> they still act as token delimiters, so that doesn't work. Ideas?

You'd have to allow for them to exist anywhere between two characters in
the token, i.e. a number token

    [0-9]+

would become

    (\\\n)*[0-9]((\\\n)*[0-9])*

where "(\\\n)" is the escape sequence. (Note the trailing "*" rather than
"+", so that a single-digit number still matches.) I suspect you have to
think carefully about white space too, to prevent ambiguity problems.

While this will probably work, it is ugly to say the least.

Aside from doing pre-processing as you already suggested, you could also
write a custom scanner yourself. It is not terribly difficult, and it gives
you the freedom to make it work correctly (e.g. the position information of
tokens is going to be a challenge with disappearing \n characters).

Albert
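
To make the regex idea above concrete, here is a minimal sketch in plain
Python (no PLY needed to run it). The names SPLICE, spliced() and
strip_splices() are made up for illustration; only the number token is
shown, and other tokens would need the same treatment.

```python
import re

# The escape sequence: a backslash immediately followed by a newline.
SPLICE = r"(?:\\\n)"


def spliced(char_class):
    """Build a pattern for one or more characters of char_class,
    allowing any number of escape sequences before each character."""
    return rf"{SPLICE}*{char_class}(?:{SPLICE}*{char_class})*"


# [0-9]+ with escaped newlines allowed before and between the digits.
NUMBER = spliced("[0-9]")


def strip_splices(text):
    """Remove the escape sequences from a matched token's text."""
    return text.replace("\\\n", "")


if __name__ == "__main__":
    match = re.fullmatch(NUMBER, "12\\\n34")
    print(strip_splices(match.group(0)))
```

In PLY, a pattern built this way could probably be attached to a token rule
with the TOKEN decorator from ply.lex, with the rule body setting t.value
to the stripped text. A trailing escape after the last character is not
part of the token here, so it would still need a rule of its own.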

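On the position problem: one way to keep line and column information when
the escapes are removed before lexing is to record, for every character
that survives, where it came from in the original text. This is only a
sketch of that idea under my own assumptions; remove_splices() and the
"origin" list are hypothetical names, not anything PLY provides.

```python
def remove_splices(source):
    """Drop every backslash-newline pair from source.

    Returns (clean_text, origin), where origin[i] is the 1-based
    (line, column) in source of the character clean_text[i].
    """
    clean = []
    origin = []
    line, col = 1, 1
    i = 0
    while i < len(source):
        if source.startswith("\\\n", i):
            # Skip the escape, but keep counting lines of the original.
            i += 2
            line += 1
            col = 1
            continue
        ch = source[i]
        clean.append(ch)
        origin.append((line, col))
        if ch == "\n":
            line += 1
            col = 1
        else:
            col += 1
        i += 1
    return "".join(clean), origin


if __name__ == "__main__":
    text, origin = remove_splices("a = chr\\\nU42\n")
    print(repr(text))
    print(origin[text.index("U")])
```

Since a PLY token's lexpos is an offset into the string handed to the
lexer, looking up origin[tok.lexpos] after lexing the cleaned text should
give the token's location in the original input.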