Re: several messages

Akim Demaille Mon, 07 Sep 2009 11:20:49 -0700

Hi Joel!

Le 3 sept. 09 à 07:35, Joel E. Denny a écrit :

Hi Akim.

On Wed, 2 Sep 2009, Akim Demaille wrote:
I played with canonical-lr, but it is way too expensive for me,both in termsof size, and duration of compilation (maybe in a release mode,but...).
By compilation, you mean gcc?  Or is bison slow?


I meant bison.

But
toying with lr.default-reductions (still in lalr) gives very goodresults.
I have 382 LR(0) states, and 12941 is canonical LR(1).
That's quite a jump. I usually see only one more digit. There mustbemany similar contexts within your grammar. What does IELR(1) giveyou?
How long does bison take to run for IELR(1) versus LALR(1)?


My grammar has:
- 185 rules
- 104 terminals
- 58 nonterminals

Here are the figures:

bison=./_build/debug/bison/tests/bison
src=$HOME/src/kernel
build=$src/_build/debug
includes="-I$build/src -I$src/src -I$src/include -I$src/sdk-remote/include -I$build/sdk-remote/libport/include -I$src/sdk-remote/libport/include -I$build/include -I$src/kernel -I/opt/local/include"
for lr in lalr ielr canonical-lr
do
 for red in all consistent accepting
 do
   echo
   echo $lr-$red
   TIMEFMT="Bison %E real  %U user  %S system"
time $bison --trace=time -Dlr.type=$lr -Dlr.default-reductions=$red src/parser/ugrammar.y -o /tmp/ugrammar.cc
   TIMEFMT="G++   %E real  %U user  %S system"
time g++ -DHAVE_CONFIG_H $=includes -c /tmp/ugrammar.cc -o /tmp/ugrammar.o$bison -Dlr.type=$lr -Dlr.default-reductions=$red src/parser/ugrammar.y -o /tmp/ugrammar.cc -rall
   sed -n '/^state [0-9]*$/h;${x;p;};' /tmp/ugrammar.output
   gls -sh /tmp/ugrammar.*
 done
done |& tee log.txt

lalr-all

Execution times (seconds)
 LALR(1)               :   0.01 ( 2%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 outputing parser      :   0.01 ( 2%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 running m4            :   0.47 (96%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 TOTAL                 :   0.49             0.00             0.00
Bison 0.55s real  0.50s user  0.02s system
G++   5.99s real  5.34s user  0.60s system
state 382
164K /tmp/ugrammar.cc
 52K /tmp/ugrammar.hh
836K /tmp/ugrammar.o
672K /tmp/ugrammar.output

lalr-consistent

Execution times (seconds)
 LALR(1)               :   0.01 ( 2%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 parser action tables  :   0.01 ( 2%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 outputing parser      :   0.01 ( 2%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 running m4            :   0.47 (94%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 TOTAL                 :   0.50             0.00             0.00
Bison 0.55s real  0.51s user  0.01s system
G++   6.01s real  5.38s user  0.60s system
state 382
256K /tmp/ugrammar.cc
 52K /tmp/ugrammar.hh
860K /tmp/ugrammar.o
764K /tmp/ugrammar.output

lalr-accepting

Execution times (seconds)
 LALR(1)               :   0.01 ( 2%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 parser action tables  :   0.02 ( 4%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 outputing parser      :   0.02 ( 4%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 running m4            :   0.47 (90%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 TOTAL                 :   0.52             0.00             0.00
Bison 0.55s real  0.53s user  0.01s system
G++   6.19s real  5.42s user  0.60s system
state 382
312K /tmp/ugrammar.cc
 52K /tmp/ugrammar.hh
876K /tmp/ugrammar.o
896K /tmp/ugrammar.output

ielr-all

Execution times (seconds)
 LALR(1)               :   0.01 ( 2%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 IELR(1) Phase 2       :   0.03 ( 6%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 IELR(1) Phase 3       :   0.02 ( 4%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 outputing parser      :   0.02 ( 4%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 running m4            :   0.46 (85%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 TOTAL                 :   0.54             0.00             0.00
Bison 0.56s real  0.55s user  0.01s system
G++   5.98s real  5.34s user  0.59s system
state 382
164K /tmp/ugrammar.cc
 52K /tmp/ugrammar.hh
836K /tmp/ugrammar.o
672K /tmp/ugrammar.output

ielr-consistent

Execution times (seconds)
 LALR(1)               :   0.01 ( 2%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 IELR(1) Phase 2       :   0.03 ( 5%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 IELR(1) Phase 3       :   0.02 ( 4%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 parser action tables  :   0.01 ( 2%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 outputing parser      :   0.02 ( 4%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 running m4            :   0.46 (84%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 TOTAL                 :   0.55             0.00             0.00
Bison 0.58s real  0.56s user  0.01s system
G++   6.35s real  5.42s user  0.61s system
state 382
256K /tmp/ugrammar.cc
 52K /tmp/ugrammar.hh
860K /tmp/ugrammar.o
764K /tmp/ugrammar.output

ielr-accepting

Execution times (seconds)
 LALR(1)               :   0.01 ( 2%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 IELR(1) Phase 2       :   0.03 ( 5%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 IELR(1) Phase 3       :   0.02 ( 4%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 parser action tables  :   0.03 ( 5%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 outputing parser      :   0.02 ( 4%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 running m4            :   0.46 (81%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 TOTAL                 :   0.57             0.00             0.00
Bison 0.60s real  0.59s user  0.01s system
G++   6.12s real  5.41s user  0.60s system
state 382
312K /tmp/ugrammar.cc
 52K /tmp/ugrammar.hh
876K /tmp/ugrammar.o
896K /tmp/ugrammar.output

canonical-lr-all

Execution times (seconds)
 IELR(1) Phase 3       :   0.33 (10%) usr   0.01 (10%) sys   0.00 ( 0%) wall
 IELR(1) Phase 4       :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 conflicts             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 parser action tables  :   2.19 (65%) usr   0.01 (10%) sys 128.00 (100%) wall
 outputing parser      :   0.16 ( 5%) usr   0.02 (20%) sys   0.00 ( 0%) wall
 running m4            :   0.65 (19%) usr   0.05 (50%) sys   0.00 ( 0%) wall
 freeing               :   0.01 ( 0%) usr   0.01 (10%) sys   0.00 ( 0%) wall
 TOTAL                 :   3.36             0.10           128.00
Bison 3.49s real  3.37s user  0.11s system
G++   7.85s real  6.76s user  0.76s system
state 12941
2.7M /tmp/ugrammar.cc
 52K /tmp/ugrammar.hh
1.6M /tmp/ugrammar.o
 25M /tmp/ugrammar.output

canonical-lr-consistent

Execution times (seconds)
 IELR(1) Phase 3       :   0.36 ( 2%) usr   0.01 ( 4%) sys   0.00 ( 0%) wall
 IELR(1) Phase 4       :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 conflicts             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 parser action tables  :  16.38 (91%) usr   0.05 (18%) sys   0.00 ( 0%) wall
 outputing parser      :   0.34 ( 2%) usr   0.08 (29%) sys   0.00 ( 0%) wall
 running m4            :   0.89 ( 5%) usr   0.13 (46%) sys   0.00 ( 0%) wall
 freeing               :   0.02 ( 0%) usr   0.01 ( 4%) sys   0.00 ( 0%) wall
 TOTAL                 :  18.01             0.28             0.00
Bison 18.64s real  18.02s user  0.30s system
G++   10.63s real  9.15s user  0.96s system
state 12941
5.6M /tmp/ugrammar.cc
 52K /tmp/ugrammar.hh
2.3M /tmp/ugrammar.o
 27M /tmp/ugrammar.output

canonical-lr-accepting

Execution times (seconds)
 reader                :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 IELR(1) Phase 3       :   0.37 ( 1%) usr   0.01 ( 2%) sys   0.00 ( 0%) wall
 IELR(1) Phase 4       :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 conflicts             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 parser action tables  :  46.08 (95%) usr   0.21 (34%) sys   0.00 ( 0%) wall
 outputing parser      :   0.61 ( 1%) usr   0.12 (20%) sys   0.00 ( 0%) wall
 running m4            :   1.28 ( 3%) usr   0.26 (43%) sys   0.00 ( 0%) wall
 freeing               :   0.01 ( 0%) usr   0.01 ( 2%) sys   0.00 ( 0%) wall
 TOTAL                 :  48.38             0.61             0.00
Bison 49.65s real  48.39s user  0.63s system
G++   13.04s real  11.08s user  1.03s system
state 12941
8.5M /tmp/ugrammar.cc
 52K /tmp/ugrammar.hh
3.2M /tmp/ugrammar.o
 31M /tmp/ugrammar.output

It would be nice to extend the *.output file with various metricsabout the grammar.

Here are the size of the parser files.

a...@montero ~/src/gnet/kernel $ ls -l _build/*/ugrammar.cc
12:40:05
-rw-r--r-- 1 akim akim 321287 Sep 2 12:10 _build/accepting/ugrammar.cc
-rw-r--r--  1 akim  akim   170357 Sep  2 12:13 _build/all/ugrammar.cc
-rw-r--r-- 1 akim akim 8913314 Sep 2 11:52 _build/canonical-lr/ugrammar.cc-rw-r--r-- 1 akim akim 263165 Sep 2 12:06 _build/consistent/ugrammar.cc
It's interesting that each value for lr.default-reductions has a
significant impact on size, but at least it's not orders ofmagnitude as
for canonical LR.


Yes, agreed.

The syntax error on "1 1" behaves as expected with both accepting and
consistent (i.e., mute since there are too many possibilities ;).


If you could unmute it (by adding a few more possibilities in the

skeleton), it would be interesting to see if canonical LR gives thesame

expected tokens list.

One way out of the fixed arity would be to use "note: " lines, as Alexsuggested it (and I'm slowly changing my mind on this regard).

So I will proceed with default-reductions=all on space-starvingtargets (onwhich anyway I use simple error messages to get rid of the tables),and
"accepting" otherwise.
Why not "consistent"? For error messages, I don't know of anyreason to
expect either to typically be better than the other.  But "consistent"
produces the traditional Yacc behavior (actually defined in POSIX)of notrequesting a token from yylex until necessary. This can affectsemantic
actions.

I don't see value in this feature in my case. I guess that this isprecious in languages like C where the token kind of identifiersdepends on the context.

The only reason I made canonical LR choose "accepting" by default is
because I think of that as being the canonical form.


OK, I'll use consistent, thanks!  It is significantly smaller.

The documentation for error-verbose should probably
promote the use of accepting/consistent.
They can sometimes worsen the error messages.  That is, in some cases,
lookahead merging might produce worse effects on the expected listin thestate containing the default reduction than is produced byperforming the
default reduction.  For example, consider the input "aaz" for this
grammar:

%token 'z';
S: 'a' A 'a'
 | 'b' A B
 ;
A: 'a' ;
B: 'a' | 'b' | 'c' | 'd' ;


Wow.  You do have a collection of interesting grammars at hand :)

With lr.type=canonical-lr or lr.type=lalr:

syntax error, unexpected 'z', expecting 'a'

But with lr.type=lalr and lr.default-reductions=accepting:

syntax error, unexpected 'z', expecting 'a' or 'b' or 'c' or 'd'

If we add:

%left 'a';
A: 'a' 'a';

Then lr.default-reductions=consistent has the same problem.

My suspicion would have been that disabling default reductions almost
always does more good than harm, but I think we need more exploration.

So if we can't trust static tables, should we actually try eachpossible token one after the other, and see if one can be shifted atsome point? Of course without touching the stack, that can be quite apain to write :( But it should not be really different from our error-recovery code.

Do I also understand that we neither have a superset nor a subset,just some fuzzyset?

Maybe that would also explain my incorrect
messages in the nearby thread about semantic error messages.
I skimmed that, and I would guess that you need canonical LR. Ihaven't
explored your grammar yet though.
I don't think you have the grammar I was referring too, or maybe anold
version of it.
For some reason, I thought you meant the calc++ example grammar,


Maybe it features the same phenomenon indeed, I didn't check.

so that's
what I looked at.


I can send ugrammar.y privately, if you wish.

Re: several messages

Reply via email to