Hello bison bug fixers -
I built C++ software (xmlInstanceParserGenerator) that reads XML schema
files and writes C++ classes and a YACC-Lex parser for XML data files
conforming to the schema files.
The Quality Information Framework (QIF) ANSI standard is defined using
22 (or 23) XML schema files. The schema files are valid in commercial
tools such as XMLSpy and oXygen. Data files conforming to the standard
are XML files. I am one of the developers of QIF.
For the QIF model, the QIFDocument.y file that is generated by the
xmlInstanceParserGenerator is 6460833 bytes. When I process that file
with bison 3.4.2 there are no errors or warnings. The QIFDocumentYACC.cc
file that is generated by bison is 20550463 bytes. In that file
YYNSTATES is 68691 and YYNRULES is 25278. The QIFDocumentParser built
from that file (and a couple dozen other files) compiles without error
or warning.
I got nearly identical results when I used bison 3.4.1.
The problem appears to be that the parser that is built does not
handle more than 32768 (2^15) states. I have an XML test file (16 pages)
that is valid against the schema in XMLSpy and exhibits the problem. I
have a radically reduced version of that file (1 page) that exhibits the
same problem. The problem manifests itself as negative values (the
state number minus 2^16) on the state stack for state numbers of 32768
or more (I do not know what happens with state numbers of 2^16 or
more). This causes a segmentation fault when the states are referenced.
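To make the arithmetic concrete, here is a minimal sketch of what I
believe is happening. It assumes the state stack slots are 16-bit signed
integers, which is what the dump suggests but which I have not confirmed
in the bison sources. The helper names (stored_in_int16, recover_state,
check_against_dump) are made up for illustration and are not part of
bison or of the generated parser.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the value a 16-bit signed slot would hold after
// storing `state`, assuming two's-complement wraparound.
int stored_in_int16(int state) {
    // Conversion to an unsigned 16-bit type is well defined (modulo 2^16).
    std::uint16_t low = static_cast<std::uint16_t>(state);
    return low >= 32768 ? static_cast<int>(low) - 65536
                        : static_cast<int>(low);
}

// Hypothetical inverse for state numbers below 2^16: a negative stack
// entry maps back to the state that was actually entered.
int recover_state(int stored) {
    return stored < 0 ? stored + 65536 : stored;
}

// Checks the sketch against values that appear in the yydebug dump.
void check_against_dump() {
    assert(stored_in_int16(32204) == 32204);   // last state that fits
    assert(stored_in_int16(34474) == -31062);  // first negative entry
    assert(stored_in_int16(36931) == -28605);
    assert(recover_state(-18462) == 47074);    // last state entered
}
```

Under that assumption, every negative entry on the stack in the dump is
exactly the state that was entered minus 65536.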
I ran the parser with yydebug set to 1. A portion of the end of the
output is shown below. The segmentation fault occurs at the same place
in the data file when yydebug is not set. I have studied the dump data,
and all of the steps before negative state numbers arise seem to be correct.
My request is: please enable bison to handle much larger state numbers.
I am guessing that all that is required is to change some numerical type
from 16 bits to 32 bits, which should be very easy. But that's a guess
and I realize "should be very easy" is often a dream.
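If the fix really is a matter of widening a type, one possible shape is
to choose the stack element type from the number of states, so that
small grammars keep the compact 16-bit type. This is only a sketch of
the idea under my guess above. The names state_type, kQifStates and
qif_state_t are hypothetical and do not come from bison.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical selector: use int16_t when every state number
// (0 .. NStates-1) fits in 16 signed bits, otherwise use int32_t.
template <long NStates>
struct state_type {
    using type = typename std::conditional<(NStates <= 32768),
                                           std::int16_t,
                                           std::int32_t>::type;
};

// YYNSTATES reported for the QIF grammar.
constexpr long kQifStates = 68691;

// Element type the state stack would need for this grammar.
using qif_state_t = state_type<kQifStates>::type;

static_assert(sizeof(qif_state_t) == 4,
              "68691 states do not fit in a 16-bit signed type");
```

With a selection like this, a grammar with at most 32768 states would
still use 16-bit stack entries, and only very large grammars such as the
QIF one would pay for 32-bit entries.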
I will be happy to send you more information, but I think a quick glance
at the dump data shown below is probably all that is required to
understand the problem. Since the problem occurs when yydebug is not
set, I do not think the problem is limited to printing the dump data.
Thanks.
Tom Kramer
kra...@nist.gov
Entering state 32204
Next token is token BodyIdsSTART ()
Reducing stack by rule 19683 (line 122101):
-> $$ = nterm y_SecurityClassification_SecurityClassificationType_0 ()
Stack now 0 3 6 8 13 105 113 120 125 132 143 181 234 280 318 367 471 684
1001 1369 1696 1990 2200 2362 2555 2880 3361 3913 4560 5316 5996 6702
7343 7954 8604 9377 10347 11442 12613 13847 15191 16648 18216 19638
20929 22252 23632 25096 26659 28411 30203 32204
Entering state 34474
Next token is token BodyIdsSTART ()
Reducing stack by rule 6259 (line 55426):
-> $$ = nterm y_ExportControlClassification_XmlString_0 ()
Stack now 0 3 6 8 13 105 113 120 125 132 143 181 234 280 318 367 471 684
1001 1369 1696 1990 2200 2362 2555 2880 3361 3913 4560 5316 5996 6702
7343 7954 8604 9377 10347 11442 12613 13847 15191 16648 18216 19638
20929 22252 23632 25096 26659 28411 30203 32204 -31062
Entering state 36931
Next token is token BodyIdsSTART ()
Reducing stack by rule 6784 (line 58254):
-> $$ = nterm y_FeatureNominalIds_ArrayReferenceType_0 ()
Stack now 0 3 6 8 13 105 113 120 125 132 143 181 234 280 318 367 471 684
1001 1369 1696 1990 2200 2362 2555 2880 3361 3913 4560 5316 5996 6702
7343 7954 8604 9377 10347 11442 12613 13847 15191 16648 18216 19638
20929 22252 23632 25096 26659 28411 30203 32204 -31062 -28605
...
Entering state 47074
Next token is token BodyIdsSTART ()
Reducing stack by rule 16131 (line 103986):
-> $$ = nterm y_PartNoteIds_ArrayReferenceType_0 ()
Stack now 0 3 6 8 13 105 113 120 125 132 143 181 234 280 318 367 471 684
1001 1369 1696 1990 2200 2362 2555 2880 3361 3913 4560 5316 5996 6702
7343 7954 8604 9377 10347 11442 12613 13847 15191 16648 18216 19638
20929 22252 23632 25096 26659 28411 30203 32204 -31062 -28605 -25982
-23359 -20886 -18462
Entering state 18172
Reducing stack by rule 16138 (line 104024):
$1 = nterm (null) ()
$2 = nterm (null) ()
$3 = token closedATTR ()
$4 = nterm (null) ()
$5 = nterm y_PartNoteIds_ArrayReferenceType_0 ()
Segmentation fault (core dumped)