Re: Am I misunderstanding precedence?

2021-06-21 Thread Justin Ng
I see now that I really misunderstood how Bison works.

Thanks for breaking it down, and pointing me in the right direction!


From: Akim Demaille 
Sent: Saturday, June 19, 2021 02:01 AM
To: Justin Ng 
Cc: Bison Help 
Subject: Re: Am I misunderstanding precedence?

Hi Justin,

> Le 14 juin 2021 à 07:02, Justin Ng  a écrit :
>
> I've encountered something which confuses me, but I'm not sure if it's a bug 
> or just something I don't understand.
>
> I'm looking at this file,
> https://github.com/mysql/mysql-server/blob/5c8c085ba96d30d697d0baa54d67b102c232116b/sql/sql_yacc.yy#L1169
>
> https://github.com/mysql/mysql-server/blob/5c8c085ba96d30d697d0baa54d67b102c232116b/sql/sql_yacc.yy#L9300
>
> https://github.com/mysql/mysql-server/blob/5c8c085ba96d30d697d0baa54d67b102c232116b/sql/sql_yacc.yy#L9582
>
> Of note are,
>
> ```
> %left   EQ EQUAL_SYM GE GT_SYM LE LT NE IS LIKE REGEXP IN_SYM
> ...
> %left  INTERVAL_SYM
>
> ...
>
> bool_pri:
>  bool_pri IS NULL_SYM %prec IS
> ...
>
> simple_expr:
>| INTERVAL_SYM expr interval '+' expr %prec INTERVAL_SYM
> ```
>
> From the above snippet, it looks like the intention is for the `INTERVAL_SYM` 
> rule to have a higher precedence than the `IS NULL` rule.

That sentence makes no sense in Bison ((*) by default, see below).
Bison solves statically (at generation time) shift/reduce
conflicts (competition between a token and a rule) by taking
precedence/associativity into account when it applies.

In the case of conflicts between two rules (reduce/reduce conflicts)
no precedence/associativity applies.  There is no concept of
"a rule has higher precedence than another".  It only applies
to token vs. rule, not rule vs. rule.

In the case of reduce/reduce conflicts, it's always the first rule
(in order in the file) that wins (at generation time too).  Usually
when you have r/r conflicts both reduction are valid, and your grammar
is actually ambiguous, so the recommendation for r/r conflicts is to...
fix the grammar.


(*) In GLR parsing, both parses are continued, and conflicts between
rules are solved *at parse time*.  But I don't believe you are using
a GLR parser, so I don't believe that applies to your case.  And if
it did, it would have been %dprec, not %prec.


> Given the input string `INTERVAL 0 DAY + NULL IS NULL`, there are two valid 
> parse trees, ignoring precedence,
>
> + (INTERVAL 0 DAY + NULL) IS NULL
> + INTERVAL 0 DAY + (NULL IS NULL)

That's not the way shift/reduce parsing works.

In this case the problem is here:

> INTERVAL 0 DAY + NULL * IS NULL


At that point there is a shift/reduce conflict between IS (the lookahead)
that wants to be shifted, and the rule of + (%prec INTERVAL_SYM) that wants
to be reduced.

*That* is the conflict you need to fix: token IS vs. rule INTERVAL_SYM.
And then I agree that the precedences you quoted indicate that this
should be parse as

> (INTERVAL 0 DAY + NULL) IS NULL

given that INTERVAL_SYM has higher precedence than IS.

However you don't quote the grammar enough to see if the conflict
is indeed the way it looks like it is.  You need to generate the
*.output file to see the automaton, and follow by hand what happens
with "INTERVAL 0 DAY + NULL" when "IS" arrives.  That will give
you control over that conflict.


> I find that IS NULL has higher precedence. Which is unexpected.
>
> Is this a bug with bison? Or am I misunderstanding something?

I very much doubt it.  This part of the code is decades-old.  I
rather think it's a flaw in the grammar.

> I'm not sure what specific version of bison is being used but I found it 
> should be 2.1+
> https://dev.mysql.com/doc/refman/5.7/en/source-installation-prerequisites.html

OMG.

I strongly recommend 3.7 which includes new and powerful tools to
understand conflicts such as yours (look for "counterexamples" in the
documentation).

Cheers!


Re: Am I misunderstanding precedence?

2021-06-19 Thread Akim Demaille
Hi Justin,

> Le 14 juin 2021 à 07:02, Justin Ng  a écrit :
> 
> I've encountered something which confuses me, but I'm not sure if it's a bug 
> or just something I don't understand.
> 
> I'm looking at this file,
> https://github.com/mysql/mysql-server/blob/5c8c085ba96d30d697d0baa54d67b102c232116b/sql/sql_yacc.yy#L1169
> 
> https://github.com/mysql/mysql-server/blob/5c8c085ba96d30d697d0baa54d67b102c232116b/sql/sql_yacc.yy#L9300
> 
> https://github.com/mysql/mysql-server/blob/5c8c085ba96d30d697d0baa54d67b102c232116b/sql/sql_yacc.yy#L9582
> 
> Of note are,
> 
> ```
> %left   EQ EQUAL_SYM GE GT_SYM LE LT NE IS LIKE REGEXP IN_SYM
> ...
> %left  INTERVAL_SYM
> 
> ...
> 
> bool_pri:
>  bool_pri IS NULL_SYM %prec IS
> ...
> 
> simple_expr:
>| INTERVAL_SYM expr interval '+' expr %prec INTERVAL_SYM
> ```
> 
> From the above snippet, it looks like the intention is for the `INTERVAL_SYM` 
> rule to have a higher precedence than the `IS NULL` rule.

That sentence makes no sense in Bison ((*) by default, see below).
Bison solves statically (at generation time) shift/reduce
conflicts (competition between a token and a rule) by taking
precedence/associativity into account when it applies.

In the case of conflicts between two rules (reduce/reduce conflicts)
no precedence/associativity applies.  There is no concept of
"a rule has higher precedence than another".  It only applies
to token vs. rule, not rule vs. rule.

In the case of reduce/reduce conflicts, it's always the first rule
(in order in the file) that wins (at generation time too).  Usually
when you have r/r conflicts both reduction are valid, and your grammar
is actually ambiguous, so the recommendation for r/r conflicts is to...
fix the grammar.


(*) In GLR parsing, both parses are continued, and conflicts between
rules are solved *at parse time*.  But I don't believe you are using
a GLR parser, so I don't believe that applies to your case.  And if
it did, it would have been %dprec, not %prec.


> Given the input string `INTERVAL 0 DAY + NULL IS NULL`, there are two valid 
> parse trees, ignoring precedence,
> 
> + (INTERVAL 0 DAY + NULL) IS NULL
> + INTERVAL 0 DAY + (NULL IS NULL)

That's not the way shift/reduce parsing works.

In this case the problem is here:

> INTERVAL 0 DAY + NULL * IS NULL


At that point there is a shift/reduce conflict between IS (the lookahead)
that wants to be shifted, and the rule of + (%prec INTERVAL_SYM) that wants
to be reduced.

*That* is the conflict you need to fix: token IS vs. rule INTERVAL_SYM.
And then I agree that the precedences you quoted indicate that this
should be parse as

> (INTERVAL 0 DAY + NULL) IS NULL

given that INTERVAL_SYM has higher precedence than IS.

However you don't quote the grammar enough to see if the conflict
is indeed the way it looks like it is.  You need to generate the
*.output file to see the automaton, and follow by hand what happens
with "INTERVAL 0 DAY + NULL" when "IS" arrives.  That will give
you control over that conflict.


> I find that IS NULL has higher precedence. Which is unexpected.
> 
> Is this a bug with bison? Or am I misunderstanding something?

I very much doubt it.  This part of the code is decades-old.  I
rather think it's a flaw in the grammar.

> I'm not sure what specific version of bison is being used but I found it 
> should be 2.1+
> https://dev.mysql.com/doc/refman/5.7/en/source-installation-prerequisites.html

OMG.

I strongly recommend 3.7 which includes new and powerful tools to
understand conflicts such as yours (look for "counterexamples" in the
documentation).

Cheers!


Am I misunderstanding precedence?

2021-06-14 Thread Justin Ng
I've encountered something which confuses me, but I'm not sure if it's a bug or 
just something I don't understand.

I'm looking at this file,
https://github.com/mysql/mysql-server/blob/5c8c085ba96d30d697d0baa54d67b102c232116b/sql/sql_yacc.yy#L1169

https://github.com/mysql/mysql-server/blob/5c8c085ba96d30d697d0baa54d67b102c232116b/sql/sql_yacc.yy#L9300

https://github.com/mysql/mysql-server/blob/5c8c085ba96d30d697d0baa54d67b102c232116b/sql/sql_yacc.yy#L9582

Of note are,

```
%left   EQ EQUAL_SYM GE GT_SYM LE LT NE IS LIKE REGEXP IN_SYM
...
%left  INTERVAL_SYM

...

bool_pri:
  bool_pri IS NULL_SYM %prec IS
...

simple_expr:
| INTERVAL_SYM expr interval '+' expr %prec INTERVAL_SYM
```

>From the above snippet, it looks like the intention is for the `INTERVAL_SYM` 
>rule to have a higher precedence than the `IS NULL` rule.

Given the input string `INTERVAL 0 DAY + NULL IS NULL`, there are two valid 
parse trees, ignoring precedence,

+ (INTERVAL 0 DAY + NULL) IS NULL
+ INTERVAL 0 DAY + (NULL IS NULL)

If we include precedence, it seems we would want,

+ (INTERVAL 0 DAY + NULL) IS NULL

The MySQL documentation seems to agree,
https://dev.mysql.com/doc/refman/5.7/en/operator-precedence.html

It shows that INTERVAL should have a higher precedence than IS.

However, when I actually run this input against MySQL,
```
SELECT
INTERVAL 0 DAY + NULL IS NULL,
(INTERVAL 0 DAY + NULL) IS NULL,
INTERVAL 0 DAY + (NULL IS NULL)
```
https://www.db-fiddle.com/f/cMPJfx5kZKoMeX4PKDrTWC/0


I find that IS NULL has higher precedence. Which is unexpected.

Is this a bug with bison? Or am I misunderstanding something?

I'm not sure what specific version of bison is being used but I found it should 
be 2.1+
https://dev.mysql.com/doc/refman/5.7/en/source-installation-prerequisites.html