Hi Joel (et al):
Ok. I decided to not use yytoknum[], as it is undocumented and also requires a
funky "#define YYPRINT". Here is the code I am using inside yylex():
    // get map of keywords from strings in the grammar
    static map<string,int> keyword;
    if(keyword.size() == 0) {
        for(int i=0; i<YYNTOKENS; ++i) {
            // literal string tokens are stored with their quotes
            if(yytname[i][0] != '"') continue;
            string name(yytname[i]+1);
            name.erase(name.size()-1,1);
            // find the external token number that maps to symbol i
            for(int j=YYMAXUTOK; j>0; --j) {
                if(yytranslate[j] == i) {
                    keyword[name] = j;
                    break;
                }
            }
        }
    }
    // when an identifier is found that is in keyword[],
    // return keyword[name]
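
For anyone who wants to try the idea without a generated parser at hand, here is a minimal self-contained sketch of the same lookup. It uses tiny made-up tables in place of the ones bison emits: mock_names, mock_translate, the token numbers, and the helpers build_keywords() and classify() are all hypothetical, and I am assuming the translate table holds unsigned char entries.

```cpp
#include <map>
#include <string>

// Hypothetical stand-ins for yytname[] and yytranslate[]; real tables come
// from the generated .tab.c and use much larger token numbers.
static const int MOCK_NTOKENS = 6;   // plays the role of YYNTOKENS
static const int MOCK_MAXUTOK = 5;   // plays the role of YYMAXUTOK
static const int MOCK_IDENT   = 4;   // made-up external number for IDENT

static const char *const mock_names[] = {
    "$end", "error", "$undefined", "IDENT", "\"begin\"", "\"end\""
};
// Indexed by external token number, yields the internal symbol number.
static const unsigned char mock_translate[] = { 0, 2, 2, 5, 3, 4 };

// Build a map from keyword text to the value yylex() should return.
std::map<std::string,int> build_keywords(const char *const *names, int ntokens,
                                         const unsigned char *translate,
                                         int maxutok)
{
    std::map<std::string,int> keyword;
    for (int i = 0; i < ntokens; ++i) {
        if (names[i][0] != '"') continue;      // not a literal string token
        std::string name(names[i] + 1);        // drop the opening quote
        name.erase(name.size() - 1, 1);        // drop the closing quote
        for (int j = maxutok; j > 0; --j) {    // reverse lookup in translate
            if (translate[j] == i) {
                keyword[name] = j;
                break;
            }
        }
    }
    return keyword;
}

// Return the keyword's token number, or identToken for any other word.
int classify(const std::map<std::string,int> &keyword,
             const std::string &word, int identToken)
{
    std::map<std::string,int>::const_iterator it = keyword.find(word);
    return it == keyword.end() ? identToken : it->second;
}
```

In a real yylex() the call would look something like classify(keyword, text, IDENT), with the map built once on first entry as in the code above.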
This is a C++ file that does #include "sxfread.tab.c" inside a namespace
(generated by bison from sxfread.y). That seemed to me to be a much easier way
to interface to C++ than via the C++ support in bison [#].
With this approach, this bug becomes just a documentation issue. Though it does
depend on retaining %token-table (I believe yytranslate[] is essential).
BTW my program is working, and the grammar matched a large and complicated input
file, as desired. So despite the documentation issue, I was able to figure it
out. The biggest hassle was having to use pointers to C++ string and map (I
considered defining YYSTYPE as a polymorphic class, but decided against it).
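
To illustrate the pointer hassle: a class with a non-trivial constructor such as std::string cannot sit directly in a plain C-style union, so the semantic value has to carry a pointer that is allocated and freed by hand. A rough sketch, in which the type SemanticValue, its members, and the helpers make_text() and take_text() are made up for illustration and are not my actual code:

```cpp
#include <string>

// Hypothetical semantic value type; the member names are invented.
union SemanticValue {
    double number;       // plain data can live in the union directly
    std::string *text;   // a std::string cannot, so store a pointer
};

// Allocate the string when the scanner recognizes a token.
inline SemanticValue make_text(const char *s)
{
    SemanticValue v;
    v.text = new std::string(s);
    return v;
}

// Copy the string out and free it once a parser action has consumed it.
inline std::string take_text(SemanticValue &v)
{
    std::string result(*v.text);
    delete v.text;
    v.text = 0;
    return result;
}
```

Every action that consumes such a value has to remember the delete, which is where most of the hassle comes from.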
Tom Roberts
[#] The bison C++ interface seems overly complex -- I am linking into a >2
million-line scientific program with a custom build system, and the many files
generated by bison in C++ mode are difficult to deal with; the single .tab.c
file is better for me. The build system does not know about sxfread.y, only
sxfread.tab.c, so bison is used as a "text editor" rather than a build tool;
fortunately the grammar won't change often, if at all (it has only 8 terminals).
As it is #include-d inside a namespace, I can still add additional parsers to
the program in the future.
Joel E. Denny wrote:
> Hi Tom,
> On Sat, 25 Dec 2010, Tom Roberts wrote:
> > I want to have the grammar define the keywords as literal strings, so on first
> > call my yylex() will build up its list of keywords by scanning yytname[] for
> > entries beginning with '"'.
> As you no doubt know, yytname is requested using %token-table. However,
> since 2001 (according to our vc log), Bison's TODO has described
> %token-table as a broken feature that might not be worth keeping.
> Unfortunately, %token-table originated before my time, and I have no
> practical experience with it, so it's hard for me to determine the best
> way forward.
> > In both bison-2.4.3/doc/bison.info and on page 84 of
> > http://www.gnu.org/software/bison/manual/bison.pdf , the example code to map a
> > string terminal to the return value from yylex() is incomplete -- it only
> > gives a loop over yytname[], without telling the user what value to return.
> I agree that the documentation is unclear here.
> > The loop variable is i, and the value that must be returned is yytoknum[i].
> > But yytoknum[] is inside that #ifdef and is not available.
> The manual does not document yytoknum, and that's usually a sign it wasn't
> intended for users.
> Does anyone remember exactly how yytname was originally intended to be
> used?