Re: Multiple, different, %merge of a token sequence

Joel E. Denny Tue, 02 May 2006 15:27:50 -0700

On Mon, 1 May 2006, Derek M Jones wrote:

> I'm not sure if the following is a feature or a bug.
> 
> There are four parses of the C input (x)*sizeof(y);
> 
> (x) can be parsed as:
> 
>    1) a cast of the expression *sizeof(y) to the type x (not
>       possible semantically, but this is syntax)
>    2) the left operand of the multiplication operator *
> 
> sizeof(y) can be parsed as:
> 
>    1) the sizeof the expression (y)
>    2) the sizeof the type y
> 
> I put %merge on the production for sizeof (see below) and
> was expecting the sizeof ambiguity to  be 'fully' resolved.
> 
> What happens is that the bison generated parser builds
> all four parse stacks, resolves one pair of sizeof
> ambiguities by calling sizeof_merge (leaving three stacks)
> and then calls multiplicative_merge twice (leaving 1 stack).
> 
> I think there should be two calls to sizeof_merge and one to
> multiplicative_merge.


I've confirmed this behavior with both Bison 2.1 and the latest CVS 
sources.

I've now given your code a much closer look.  This behavior is another 
manifestation of a complex Bison GLR problem that I've known about for a 
while and am still working on.  Unfortunately, I don't believe I've 
previously posted about it.  It touches on many areas of the GLR 
implementation, and I just haven't found the time yet to properly describe 
what I know.

One characteristic of the underlying problem is that Bison-generated 
parsers are not careful about how they merge parse trees.  Fortunately, I 
haven't ever observed them produce an invalid parse tree or miss a 
possible parse tree (and I don't think those are potential manifestations 
of the underlying problem here).  However, I have observed merges 
happening at unexpected points in the trees.  In the worst cases, I've 
seen them merge higher up when they should have reported an ambiguity 
lower down in the trees.  I've also see them merge higher up when they 
should have *merged* lower down.  This last case is what you observed.

Unfortunately, it seems like the most obvious solution will worsen the 
parser's time complexity.  I'm exploring a more sophisticated solution, 
but I'm afraid it'll be a while before I return to it.

> This is a problem because the xxx_merge code free's up the
> parse tree generated by the production that is discarded.
> The three parse trees passed to multiplicative_merge
> some share common subtrees, which means it is possible for
> the same nodes to be free'ed more than once (not good).
> The nodes of the parse trees passed to sizeof_merge are all
> unique and will be free'ed once each.

If your %merge functions eliminate subtrees, it appears that you are 
right: this bug actually does increase subtree sharing for your example 
input because it delays elimination until higher up.  However, node 
sharing is a general problem in GLR parsers that can't always be solved by 
writing such %merge functions.  Specifically, if any subtrees (such as 
tokens) exist on a stack before a split and appear on the RHS of 
reductions during the split, those subtrees will be shared by both stacks 
and you will have to be careful not to doubly free them when the stacks 
are merged.

My usual solution is node reference counting.  Until this bug is fixed, it 
looks like you will need that too.  I suspect however that you'll 
eventually need reference counting regardless of this bug.  You may even 
need it now for tokens.

> Is this behavior a bug in the parser generated by bison (I
> hope so) or a feature of the way things have to work?

I see it as a bug.

Joel


_______________________________________________
help-bison@gnu.org http://lists.gnu.org/mailman/listinfo/help-bison

Re: Multiple, different, %merge of a token sequence

Reply via email to