Re: [sqlite] Lemon: broken lookahead propagation?

2012-01-08 Thread Vincent Zweije
On Sat, Jan 07, 2012 at 10:18:12AM -0500, Richard Hipp wrote:

||  On Wed, Jan 4, 2012 at 11:00 AM, Vincent Zweije  wrote:
||
||  > I may have hit a bug in the lemon parser generator.
||  >
||
||  Please see if the following fix clears your problem:
||
||  http://www.sqlite.org/src/info/ce32775b23

Looks good: it produces a few more 's and other terminals in
the starters list, but I'm sure you checked that too. Haven't checked
the generated parser yet on actual input yet.

Thanks.  Vincent.
-- 
Vincent Zweije| "If you're flamed in a group you
  | don't read, does anybody get burnt?"
[Xhost should be taken out and shot] |-- Paul Tomblin on a.s.r.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Lemon: broken lookahead propagation?

2012-01-07 Thread Richard Hipp
On Wed, Jan 4, 2012 at 11:00 AM, Vincent Zweije  wrote:

> I may have hit a bug in the lemon parser generator.
>

Please see if the following fix clears your problem:

http://www.sqlite.org/src/info/ce32775b23



>
> It looks like lookahead symbols aren't propagated properly, in some cases.
>
> The consequence is that in some states, valid symbols are rejected with
> a syntax error.
>
> The complete grammar can be found at
>
>http://www.zweije.nl/~vzweije/parser.y
>
> I'm processing the grammar with lemon version 3.7.7-2ubuntu2. I've
> compared that version to the current head in sqlite, and it only differs
> in non-essential areas (calls to fclose recently added in trunk, calls
> to snprintf added in ubuntu). I'm pretty sure the current head version
> of lemon will exhibit the problem as well.
>
> For completeness' sake, this was found on an armv7l system.
>
> I have just reproduced the problem with the head lemon.c on an i386
> system. Valgrind reports no stray memory accesses (though quite a few
> memory leaks, but those are probably harmless).
>
> The command to process the parser is:
>
> lemon -c -s parser.y
>
> The generated parser has, among others, the following state:
>
> State 244:
>  constructor_def ::= existental_quant_variables_opt upper_name *
> fix_opt type_clos
>(200) fix_opt ::= *
>  fix_opt ::= * Fix
>
>  Open reduce 200
> CurlyOpen reduce 200
>  Hash reduce 200
>  RectOpen reduce 200
> A reduce 200
>   Dot reduce 200
>   Exclaim reduce 200
>  Star reduce 200
>   Int reduce 200
>  Real reduce 200
>  Char reduce 200
>  Bool reduce 200
>  RectOpenExclaimRectClose reduce 200
>   Dynamic reduce 200
>  File reduce 200
>String reduce 200
> World reduce 200
>   Fix shift  486
>   LowerCaseId reduce 200
>   UpperCaseId reduce 200
>   FunnyId reduce 200
>   fix_opt shift  130
>
> As can be seen, fix_opt is nullable - it may derive the empty string.
> Therefore, any symbol that can start what follows it (type_clos),
> or any lookahead symbol (because type_clos is nullable too), must
> be acceptable in this state. However, some tokens (Bar, Semicolon,
> LowerCaseId, UpperCaseId, FunnyId) are not.
>
> The next state lists the acceptable symbols there:
>
> State 130:
>  type_clos ::= * open_type_clos
>  type_clos ::= * closed_type_clos
>  open_type_clos ::= * closed_type_clos type_prefix_seq
>  closed_type_seq ::= * closed_type_clos type_prefix_clos
> type_expression
>  closed_type_clos ::= * closed_type_seq
>(144) closed_type_clos ::= *
>  constructor_def ::= existental_quant_variables_opt upper_name
> fix_opt * type_clos
>
>  Open reduce 144
> CurlyOpen reduce 144
>   Bar reduce 144
>  Hash reduce 144
>  RectOpen reduce 144
> A reduce 144
>   Dot reduce 144
>   Exclaim reduce 144
>  Star reduce 144
>   Int reduce 144
>  Real reduce 144
>  Char reduce 144
>  Bool reduce 144
>  RectOpenExclaimRectClose reduce 144
>   Dynamic reduce 144
>  File reduce 144
>String reduce 144
> World reduce 144
>   LowerCaseId reduce 144
>   UpperCaseId reduce 144
>   FunnyId reduce 144
> Semicolon reduce 144
> type_clos shift  563
>   closed_type_seq shift  375
>  closed_type_clos shift  179
>open_type_clos shift  491
>
> Some of the missing symbols (LowerCaseId, UpperCaseId, FunnyId) are
> starters of type_clos:
>
>  164: type_clos: Open CurlyOpen Hash RectOpen A Dot Exclaim Star Int Real
> Char Bool RectOpenExclaimRectClose Dynamic File String World LowerCaseId
> UpperCaseId FunnyId
>
> The other two (Bar, Semicolon) are not; these are followers of the
> constructor_def symbol.
>
> Looking at it now,  is not in the starters of type_clos,
> though according to the grammar rules, it is obviously nullable
> (type_clos => closed_type_close => ). Curious.
>
> The problem disappears when fix_opt is deleted from the RHS of the
> rule. (Of course, this means that a present Fix token is no longer
> accepted.)
>
> Ciao.  

[sqlite] Lemon: broken lookahead propagation?

2012-01-04 Thread Vincent Zweije
I may have hit a bug in the lemon parser generator.

It looks like lookahead symbols aren't propagated properly, in some cases.

The consequence is that in some states, valid symbols are rejected with
a syntax error.

The complete grammar can be found at

http://www.zweije.nl/~vzweije/parser.y

I'm processing the grammar with lemon version 3.7.7-2ubuntu2. I've
compared that version to the current head in sqlite, and it only differs
in non-essential areas (calls to fclose recently added in trunk, calls
to snprintf added in ubuntu). I'm pretty sure the current head version
of lemon will exhibit the problem as well.

For completeness' sake, this was found on an armv7l system.

I have just reproduced the problem with the head lemon.c on an i386
system. Valgrind reports no stray memory accesses (though quite a few
memory leaks, but those are probably harmless).

The command to process the parser is:

lemon -c -s parser.y

The generated parser has, among others, the following state:

State 244:
  constructor_def ::= existental_quant_variables_opt upper_name * 
fix_opt type_clos
(200) fix_opt ::= *
  fix_opt ::= * Fix

  Open reduce 200
 CurlyOpen reduce 200
  Hash reduce 200
  RectOpen reduce 200
 A reduce 200
   Dot reduce 200
   Exclaim reduce 200
  Star reduce 200
   Int reduce 200
  Real reduce 200
  Char reduce 200
  Bool reduce 200
  RectOpenExclaimRectClose reduce 200
   Dynamic reduce 200
  File reduce 200
String reduce 200
 World reduce 200
   Fix shift  486
   LowerCaseId reduce 200
   UpperCaseId reduce 200
   FunnyId reduce 200
   fix_opt shift  130

As can be seen, fix_opt is nullable - it may derive the empty string.
Therefore, any symbol that can start what follows it (type_clos),
or any lookahead symbol (because type_clos is nullable too), must
be acceptable in this state. However, some tokens (Bar, Semicolon,
LowerCaseId, UpperCaseId, FunnyId) are not.

The next state lists the acceptable symbols there:

State 130:
  type_clos ::= * open_type_clos
  type_clos ::= * closed_type_clos
  open_type_clos ::= * closed_type_clos type_prefix_seq
  closed_type_seq ::= * closed_type_clos type_prefix_clos 
type_expression
  closed_type_clos ::= * closed_type_seq
(144) closed_type_clos ::= *
  constructor_def ::= existental_quant_variables_opt upper_name fix_opt 
* type_clos

  Open reduce 144
 CurlyOpen reduce 144
   Bar reduce 144
  Hash reduce 144
  RectOpen reduce 144
 A reduce 144
   Dot reduce 144
   Exclaim reduce 144
  Star reduce 144
   Int reduce 144
  Real reduce 144
  Char reduce 144
  Bool reduce 144
  RectOpenExclaimRectClose reduce 144
   Dynamic reduce 144
  File reduce 144
String reduce 144
 World reduce 144
   LowerCaseId reduce 144
   UpperCaseId reduce 144
   FunnyId reduce 144
 Semicolon reduce 144
 type_clos shift  563
   closed_type_seq shift  375
  closed_type_clos shift  179
open_type_clos shift  491

Some of the missing symbols (LowerCaseId, UpperCaseId, FunnyId) are
starters of type_clos:

  164: type_clos: Open CurlyOpen Hash RectOpen A Dot Exclaim Star Int Real Char 
Bool RectOpenExclaimRectClose Dynamic File String World LowerCaseId UpperCaseId 
FunnyId

The other two (Bar, Semicolon) are not; these are followers of the
constructor_def symbol.

Looking at it now,  is not in the starters of type_clos,
though according to the grammar rules, it is obviously nullable
(type_clos => closed_type_close => ). Curious.

The problem disappears when fix_opt is deleted from the RHS of the
rule. (Of course, this means that a present Fix token is no longer
accepted.)

Ciao.  Vincent.
-- 
Vincent Zweije| "If you're flamed in a group you
  | don't read, does anybody get burnt?"
[Xhost should be taken out and shot] |-- Paul Tomblin on a.s.r.
___
sqlite-users mailing list
sqlite-users@sqlit