Hi Akim,

I managed to take the first step and get it running, but it wasn't as easy as I thought.

First, I wanted to take the approach that the 'Simple C++ Example' in the bison manual demonstrates. However, I could not figure out what my yylex() should return when api.value.type is defined as variant. In the C parser I used to return an int, so to avoid turning everything upside down in one big step, I stuck to that concept and tried what happens if I define my tokens as

    %token <int> mytoken number

where 'mytoken' is just a token name and 'number' is the number assigned to it. As a real example:

    %token <int> t_Con 1

The question is: with as many tokens as I have, how do I decide which one to return, given that bison now generates a symbol_type make_TOKEN() for each token, which I am supposed to return from yylex()? I'd rather not put a huge switch() in yylex(). Is there any other solution, like defining a "dummy" token such as

    %token <int> INT;

whose constructor make_INT(const int&) would simply return the int passed to it? Or shall I simply try to cast the integers of my tokens to symbol_type?

The other problem I ran into was related to the non-terminals: wherever I wanted to read the value of a symbol in an action via e.g. $1, I got a type conversion error, as the value could no longer be converted to an integer as in the C parser. I have only one guess for this, namely that each non-terminal needs a %type declaration like

    %type <int> ENG_Con;

and even the = operator needs to be defined for it, right? So here I got stuck, at least with regard to api.value.type variant.

Then I decided to take a step back and use split symbols instead of complete symbols for a first try. I managed to figure this out and make it work, but with a small hack, as I declared yylex as:

    int yylex(int* yylval);

If I declared it as

    int yylex(semantic_type* yylval);

the compiler kept complaining about not knowing semantic_type (nor parser::semantic_type, nor yy::parser::semantic_type). So I took clang's hint when it said semantic_type* is aka int*, and it worked. In the end, to make my hack a bit nicer, I added

    %define api.value.type {int}

Still, the question in this case is: how should yylex() be correctly declared?

I also attached the source I ended up with. It was originally generated for a C parser, but I modified it manually and played around with it until I got it working.

Thanks for any hints!
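
The yylex() declaration question can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the yy::parser class is a hand-written stand-in for the class bison generates (reduced to the semantic_type alias and one token), and 42 is an arbitrary value. One plausible cause of the "unknown semantic_type" errors, which the thread does not confirm, is that the declaration was seen before the generated class definition; in a real grammar the declaration would presumably go somewhere emitted after that class, such as a %code provides block.

```cpp
#include <cassert>

// Hand-written stand-in for the generated parser class (hypothetical).
// With "%define api.value.type {int}" the nested semantic_type is int.
namespace yy {
class parser {
public:
    typedef int semantic_type;
    struct token {
        enum yytokentype { t_Con = 1 };  // token number taken from the mail
    };
};
}  // namespace yy

// The declaration comes after the class definition and uses the fully
// qualified type name; a bare "semantic_type" is not visible here.
int yylex(yy::parser::semantic_type* yylval);

// Split-symbol scanner: the token number is the return value and the
// semantic value travels through the pointer.
int yylex(yy::parser::semantic_type* yylval) {
    assert(yylval != nullptr);
    *yylval = 42;  // arbitrary semantic value for illustration
    return yy::parser::token::t_Con;
}
```

Written this way the signature no longer depends on knowing that semantic_type happens to be int, so a later change of api.value.type would not require touching the declaration.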
Best regards,
r0ller

-------- Original message --------
From: r0ller <r0l...@freemail.hu>
Date: 12 November 2018 11:43:47
Subject: Re: bison for nlp
To: Akim Demaille <a...@lrde.epita.fr>

Hi Akim,

Sorry for the delay, I had to go through my own code to be able to answer your question about the tokens :) But to begin with your first observation, you're right: I should wrap that conditional ternary operator used for logging.

After going through the code, I concluded that the token numbering is currently only used to make sure that a constant (or unknown word) is mapped to the same symbol in each language, namely the one with token value 1. But this could be solved in a different way, e.g. by letting each language define its own symbol for constants in a newly introduced (customizing) db table. As you can guess, currently if the program bumps into a token with value 1, it assumes that it's an unknown word/morpheme/constant. It seemed OK 8 years ago, but now it has its price to turn back the wheels. However, I think I'll do it, even though individual numbering does not seem to cause any problem, as there are only two numbers to avoid in order to prevent conflicts (0 and 256).

The other place where it would be used is the symbol prediction (where I need to remap a token to a symbol) in case of an error, but that method is currently not called at all as it does not yet work well, and for now I just return the bison error message about the expected symbols. Mine would have the additional functionality that it would not only tell what's syntactically expected but would also return the subset of those symbols that is semantically expected.

Concerning the C++ bison wrapper, what I mentioned is simply that I read an article somewhere in 2010, when I started the project, which made that statement, and I didn't even validate it.
But now I'm pretty curious about its C++ features, so I'll definitely go through the documentation you sent and try to turn mine into a C++ parser :)

Best regards,
r0ller

-------- Original message --------
From: Akim Demaille <a...@lrde.epita.fr>
Date: 9 November 2018 06:13:29
Subject: Re: bison for nlp
To: r0ller <r0l...@freemail.hu>

Hi!

> On 7 Nov 2018, at 10:09, r0ller <r0l...@freemail.hu> wrote:
>
> Hi Akim,
>
> The file hi_nongen.y is just left there as the last version that I wrote
> manually :) If you check out any other hi.y files in the platform specific
> directories (e.g. the one for the online demo is
> https://github.com/r0ller/alice/blob/master/hi_js/hi.y but you can have a
> look in hi_android or hi_desktop as well) you’ll see what they look like
> nowadays.

You have tons of

    logger::singleton()==NULL?(void)0:logger::singleton()->log(2,"vm is NULL!");

You could introduce logger::log, or whatever free function, that does that for you instead of having to deal with that at every call site.

> Numbering tokens was introduced in the very beginning and I have questioned
> quite a few times myself whether it's still needed. I didn’t try hard
> to get rid of it, mainly for one reason: I want to have an error handling
> that tells in case of an error which symbols could be accepted instead of the
> erroneous one, just as bison itself does, but in a structured way (as bison
> returns that info in an error message string).

Where are these numbers used?

> Though, I could not come up with any better idea when it comes to remapping a
> token to a symbol. As far as I know bison internally uses the tokens and not
> the symbols for the terminals, and it's not possible to get back the symbol
> belonging to a certain token. That's it roughly, but I'd be glad to get rid of
> it. However, if it's not possible and poses no problems then I can live with
> it.
> By the way, are there any number ranges or specific numbers that are
> reserved?

Some numbers are reserved, yes: 0 for EOF and 256 for error (per POSIX). For error, Bison can accommodate it if you use 256 yourself. EOF must be 0.

> Not using the C++ features of bison has historical reasons: I started writing
> the project in C, and even back then I used yacc, which I later replaced with
> bison. When I started to shift the project to C++ I was glad that it still
> worked with the generated C parser, and since then I never had time to make
> such an excursion, but it'd be great. I also must admit that I wasn't really
> aware of it. The only thing I read somewhere was that bison has a C++ wrapper,
> but I have never taken any steps in that direction.

I don’t know what you mean here: this is bison itself, there’s no need for a wrapper, and the deterministic parser itself is genuine C++, not C++ wrapping C. The GLR parser in C++, though, _is_ a wrapper for the C GLR parser.

> Now I think I'll find some time for it, at least to check it out :) Could you
> give me any links pointing to any tutorial or something like that? It’d be
> very kind if you could help me in taking the first steps, thanks!

I would very much like to have your opinion on the open section of the documentation about C++. It’s recent, and it probably needs polishing.

https://www.gnu.org/software/bison/manual/bison.html#A-Simple-C_002b_002b-Example
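
The suggestion above to hide the repeated null check behind a free function could look roughly like the following sketch. The logger class here is a hypothetical stand-in (the real one in the alice sources may differ), and the names install(), entries() and log_message() are made up for illustration; only singleton() and log(level, message) are taken from the call quoted in the thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the project's logger class.
class logger {
public:
    // Returns the active logger, or nullptr when logging is disabled.
    static logger* singleton() { return instance_; }
    // Illustrative helper to switch logging on or off.
    static void install(logger* l) { instance_ = l; }

    void log(int level, const std::string& message) {
        assert(level >= 0);
        entries_.push_back(std::to_string(level) + ": " + message);
    }
    const std::vector<std::string>& entries() const { return entries_; }

private:
    static logger* instance_;
    std::vector<std::string> entries_;
};

logger* logger::instance_ = nullptr;

// Free function that performs the null check once, so call sites no
// longer need the "singleton()==NULL ? (void)0 : ..." ternary.
inline void log_message(int level, const std::string& message) {
    if (logger* l = logger::singleton()) {
        l->log(level, message);
    }
}
```

A call site then shrinks to log_message(2, "vm is NULL!"); and silently does nothing when no logger is installed.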
hi.y
Description: Binary data
_______________________________________________ help-bison@gnu.org https://lists.gnu.org/mailman/listinfo/help-bison