Re: [sqlite] Lemon: Conflicts with repeated TERMINALS

2007-11-29 Thread Ralf Junker
Joe Wilson <[EMAIL PROTECTED]> wrote:

>The following grammar may be clearer to you:

Yes, it is many thanks! I believe I am making progress! At least I can see the 
picture much clearer now and was able to come up with the following grammar 
with just one conflict unsolved:

  %left NEWLINE.   /* Do these matter here at all? */
  %nonassoc TEXT LINK.
  %left HEADING_START.
  %left HEADING_END.

  article ::= blocks.

  blocks ::= block. /* EOF */
  blocks ::= blocks NEWLINE./* EOF */
  blocks ::= blocks NEWLINE NEWLINE block.

  block ::= .   /* EOF */
  block ::= paragraph.
  block ::= heading.

  heading ::= HEADING_START text HEADING_END.

  paragraph ::= line.
  paragraph ::= paragraph NEWLINE line.

  line ::= text.

  text ::= textpiece.
  text ::= text textpiece.

  textpiece ::= TEXT.
  textpiece ::= LINK.

I of course appreciate any comments ;-) My idea is that

* A block can be either a paragraph or a heading. Multiple blocks are separated 
by two NEWLINEs.

* A paragraph is made up of n >= 1 lines. Each line within a paragraph ends 
with a single NEWLINE. Two NEWLINEs start a new block (see above).

* A line consists of text, which can be TEXT or LINK.

Not all works well with the grammer, and unfortunately I do not understand why. 
Given this input, for example:

  TEXT, NEWLINE

the parser gets stuck at

  paragraph ::= paragraph NEWLINE line.

instead of falling back to the line above
  
  paragraph ::= line.

to find the conditions of a paragraph fulfilled. Why does it not try the other 
alternatives? Or are there none in the grammar?

>Try reading some papers on parsing or search for the book
>"Compilers: Principles, Techniques, and Tools" (a.k.a. 
>the dragon book).

I certainly will.

>Also try writing on paper random sequences of tokens and 
>manually parse your grammar to see the conflicts firsthand.

As I throw different token sequences to my experimental parser I am slowly 
starting to make sense of the debugger output.

Ralf 


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Lemon: Conflicts with repeated TERMINALS

2007-11-29 Thread Joe Wilson
--- Ralf Junker <[EMAIL PROTECTED]> wrote:
> >  paragraph ::= PARA text.
> 
> I observed the new PARA terminal token (the clear separator!?). Unfortunately 
> the lexer does not
> generate such a token. Paragraph repeats are also removed.

It was just an HTML-like example. I just wanted to demonstrate one
possible way to remove the conflicts by adding a special tag. 
I'm not suggesting that you alter your grammar in this way.

> >Here's another:
> >
> >  article ::= blocks.
> >
> >  blocks ::= block.
> >  blocks ::= blocks block.
> >
> >  block ::= heading NEWLINE.
> >  block ::= paragraph NEWLINE.
> >
> >  heading ::= HEADING_START text HEADING_END.
> >  heading ::= HEADING_START text.
> >  heading ::= HEADING_START.
> >
> >  paragraph ::= text.
> >
> >  text ::= textpiece.
> >  text ::= text textpiece.
> >
> >  textpiece ::= TEXT.
> >  textpiece ::= LINK.
> 
> This one also removes paragraph repeats, doesn't it? Unfortunately paragraphs 
> need to repeat for
> my grammar. Is there a way to achieve this without conflicts?

In your original grammar, you could have random sequences of TEXT 
and LINK and NEWLINE tokens without any way of differentiating whether 
they were part of a "text" or "paragraph" or "heading", hence the conflict.
So I figured that a paragraph may as well be any combination of
TEXT and LINK tokens ending with NEWLINE. The headings in my 2nd grammar
also have to end with NEWLINE. "paragraph" will not repeat, per se, but you 
can repeat "block" (see the "blocks" rules), where you can have several 
consecutive "block"s that happen to be of type paragraph so you can achieve 
the same effect. 

The following grammar may be clearer to you:

  article ::= blocks.

  blocks ::= block.
  blocks ::= blocks block.

  block ::= heading.
  block ::= paragraph.

  heading ::= HEADING_START text HEADING_END.
  heading ::= HEADING_START text NEWLINE.
  heading ::= HEADING_START NEWLINE.

  paragraph ::= NEWLINE.
  paragraph ::= text NEWLINE.

  text ::= textpiece.
  text ::= text textpiece.

  textpiece ::= TEXT.
  textpiece ::= LINK.

It's much the same as the previous grammar, but puts the NEWLINEs as part 
of the paragraph and heading rules instead of in the block rule.
The difference being that heading no longer needs to end with NEWLINE 
in one case if HEADING_END is encountered, as it is not ambiguous:

  heading ::= HEADING_START text HEADING_END.

and a paragraph in this grammar may also be empty so you can parse 
consecutive NEWLINE tokens with this rule:

  paragraph ::= NEWLINE.

Again, this was just an example of how to disambiguate the grammar. 
You can find other ways.

> >Lemon generates an .out file for the .y file processed.
> >You can examine it for errors.
> 
> I have tried to make sense of the .out file before. It tells me where to look 
> for the problem,
> but not how to fix it ...

Try reading some papers on parsing or search for the book
"Compilers: Principles, Techniques, and Tools" (a.k.a. 
the dragon book).

Also try writing on paper random sequences of tokens and 
manually parse your grammar to see the conflicts firsthand.



  

Never miss a thing.  Make Yahoo your home page. 
http://www.yahoo.com/r/hs

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Lemon: Conflicts with repeated TERMINALS

2007-11-29 Thread Ralf Junker
Many thanks, Joe,

>Your grammar is ambiguous. The text tokens run together for 
>various rules because the grammar lacks clear separators between 
>them. 

OK, I begin to understand. The "clear separators" need to be TERMINALs, right? 
I believed that these were imlicit because there are TEXT and LINK after all 
text tokens are fully expanded. Therefore I thought that the grammar would not 
be ambiguous.

>You can fix it a million ways by altering your grammar.

Thanks for the suggestions - I can see that they do not generate conflicts, but 
they certainly alter the grammar.

>Here is one way:
>
>  article ::= blocks.
>
>  blocks ::= block.
>  blocks ::= blocks block.
>
>  block ::= heading.
>  block ::= paragraph.
>
>  heading ::= HEADING_START text HEADING_END.
>  heading ::= HEADING_START text.
>  heading ::= HEADING_START.
>
>  paragraph ::= PARA text.
>
>  text ::= textpiece.
>  text ::= text textpiece.
>
>  textpiece ::= TEXT.
>  textpiece ::= LINK.

I observed the new PARA terminal token (the clear separator!?). Unfortunately 
the lexer does not generate such a token. Paragraph repeats are also removed.

>Here's another:
>
>  article ::= blocks.
>
>  blocks ::= block.
>  blocks ::= blocks block.
>
>  block ::= heading NEWLINE.
>  block ::= paragraph NEWLINE.
>
>  heading ::= HEADING_START text HEADING_END.
>  heading ::= HEADING_START text.
>  heading ::= HEADING_START.
>
>  paragraph ::= text.
>
>  text ::= textpiece.
>  text ::= text textpiece.
>
>  textpiece ::= TEXT.
>  textpiece ::= LINK.

This one also removes paragraph repeats, doesn't it? Unfortunately paragraphs 
need to repeat for my grammar. Is there a way to achieve this without conflicts?

>Lemon generates an .out file for the .y file processed.
>You can examine it for errors.

I have tried to make sense of the .out file before. It tells me where to look 
for the problem, but not how to fix it ...

I am sorry to appear stupid, but I still can not make sense of it all. Can 
someone still help, please?

Ralf 


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Lemon: Conflicts with repeated TERMINALS

2007-11-28 Thread Joe Wilson
--- Ralf Junker <[EMAIL PROTECTED]> wrote:
> article ::= blocks.
> 
> blocks ::= block.
> blocks ::= blocks block.
> 
> block ::= heading.
> block ::= paragraph.
> 
> heading ::= HEADING_START text HEADING_END.
> heading ::= HEADING_START text.
> heading ::= HEADING_START.
> 
> paragraph ::= text NEWLINE.
> paragraph ::= paragraph text NEWLINE.
> paragraph ::= text.
> paragraph ::= paragraph text.
> 
> text ::= textpiece.
> text ::= text textpiece.
> 
> textpiece ::= TEXT.
> textpiece ::= LINK.

Your grammar is ambiguous. The text tokens run together for 
various rules because the grammar lacks clear separators between 
them. You can fix it a million ways by altering your grammar.

Here is one way:

  article ::= blocks.

  blocks ::= block.
  blocks ::= blocks block.

  block ::= heading.
  block ::= paragraph.

  heading ::= HEADING_START text HEADING_END.
  heading ::= HEADING_START text.
  heading ::= HEADING_START.

  paragraph ::= PARA text.

  text ::= textpiece.
  text ::= text textpiece.

  textpiece ::= TEXT.
  textpiece ::= LINK.

Here's another:

  article ::= blocks.

  blocks ::= block.
  blocks ::= blocks block.

  block ::= heading NEWLINE.
  block ::= paragraph NEWLINE.

  heading ::= HEADING_START text HEADING_END.
  heading ::= HEADING_START text.
  heading ::= HEADING_START.

  paragraph ::= text.

  text ::= textpiece.
  text ::= text textpiece.

  textpiece ::= TEXT.
  textpiece ::= LINK.

Lemon generates an .out file for the .y file processed.
You can examine it for errors.



  

Be a better sports nut!  Let your teams follow you 
with Yahoo Mobile. Try it now.  
http://mobile.yahoo.com/sports;_ylt=At9_qDKvtAbMuh1G1SQtBI7ntAcJ

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] Lemon: Conflicts with repeated TERMINALS

2007-11-28 Thread Ralf Junker
I am trying to write a Wiki parser with Lemon. The Lemon features suite my 
needs perfectly, but I am unfortunately stuck with the problem of parsing 
conflicts.

All conflicts seem caused by repeat constructs like this:

  text ::= textpiece.
  text ::= text textpiece.

The complete grammar follows below and results in 10 conflicts.

I have read the manual, looked at tutorials, and searched the mailing list, but 
nothing helped me to reduce the number of conflicts. Changing token order even 
tends cause more of them.

Reading similar grammars for Bison makes me wonder why Bison apparently has no 
problems with them but Lemon does. Am I doing something wrong or is this simply 
not possible with Lemon?

Ralf

---

article ::= blocks.

blocks ::= block.
blocks ::= blocks block.

block ::= heading.
block ::= paragraph.

heading ::= HEADING_START text HEADING_END.
heading ::= HEADING_START text.
heading ::= HEADING_START.

paragraph ::= text NEWLINE.
paragraph ::= paragraph text NEWLINE.
paragraph ::= text.
paragraph ::= paragraph text.

text ::= textpiece.
text ::= text textpiece.

textpiece ::= TEXT.
textpiece ::= LINK.


-
To unsubscribe, send email to [EMAIL PROTECTED]
-