On Mon, 1 May 2006, Derek M Jones wrote: > I'm not sure if the following is a feature or a bug. > > There are four parses of the C input (x)*sizeof(y); > > (x) can be parsed as: > > 1) a cast of the expression *sizeof(y) to the type x (not > possible semantically, but this is syntax) > 2) the left operand of the multiplication operator * > > sizeof(y) can be parsed as: > > 1) the sizeof the expression (y) > 2) the sizeof the type y > > I put %merge on the production for sizeof (see below) and > was expecting the sizeof ambiguity to be 'fully' resolved. > > What happens is that the bison generated parser builds > all four parse stacks, resolves one pair of sizeof > ambiguities by calling sizeof_merge (leaving three stacks) > and then calls multiplicative_merge twice (leaving 1 stack). > > I think there should be two calls to sizeof_merge and one to > multiplicative_merge.
I've confirmed this behavior with both Bison 2.1 and the latest CVS sources. I've now given your code a much closer look. This behavior is another manifestation of a complex Bison GLR problem that I've known about for a while and am still working on. Unfortunately, I don't believe I've previously posted about it. It touches on many areas of the GLR implementation, and I just haven't found the time yet to properly describe what I know. One characteristic of the underlying problem is that Bison-generated parsers are not careful about how they merge parse trees. Fortunately, I haven't ever observed them produce an invalid parse tree or miss a possible parse tree (and I don't think those are potential manifestations of the underlying problem here). However, I have observed merges happening at unexpected points in the trees. In the worst cases, I've seen them merge higher up when they should have reported an ambiguity lower down in the trees. I've also see them merge higher up when they should have *merged* lower down. This last case is what you observed. Unfortunately, it seems like the most obvious solution will worsen the parser's time complexity. I'm exploring a more sophisticated solution, but I'm afraid it'll be a while before I return to it. > This is a problem because the xxx_merge code free's up the > parse tree generated by the production that is discarded. > The three parse trees passed to multiplicative_merge > some share common subtrees, which means it is possible for > the same nodes to be free'ed more than once (not good). > The nodes of the parse trees passed to sizeof_merge are all > unique and will be free'ed once each. If your %merge functions eliminate subtrees, it appears that you are right: this bug actually does increase subtree sharing for your example input because it delays elimination until higher up. However, node sharing is a general problem in GLR parsers that can't always be solved by writing such %merge functions. Specifically, if any subtrees (such as tokens) exist on a stack before a split and appear on the RHS of reductions during the split, those subtrees will be shared by both stacks and you will have to be careful not to doubly free them when the stacks are merged. My usual solution is node reference counting. Until this bug is fixed, it looks like you will need that too. I suspect however that you'll eventually need reference counting regardless of this bug. You may even need it now for tokens. > Is this behavior a bug in the parser generated by bison (I > hope so) or a feature of the way things have to work? I see it as a bug. Joel _______________________________________________ help-bison@gnu.org http://lists.gnu.org/mailman/listinfo/help-bison