Bison lexer

Hans Åberg Wed, 29 Aug 2018 06:57:13 -0700


> On 29 Aug 2018, at 00:31, Frank Heckenbach <[email protected]> wrote:
> 
> Hans Åberg wrote:
> 
>>> On 27 Aug 2018, at 22:10, Akim Demaille <[email protected]> wrote:
>>> 
>>>> Most of my porting work, apart from writing the new skeletons, was
>>>> general grammar cleanup and conversion of semantic types from raw
>>>> pointers and containers to smart pointers and other RAII classes
>>>> (which was my main goal of the port, of course), and changes in the
>>>> lexer (dropping flex, but that's another story).
>>> 
>>> I fought a lot with Flex, but it works ok in C++ too with lalr1.cc.
>>> I have one parser here, 
>>> https://gitlab.lrde.epita.fr/vcsn/vcsn/tree/master/lib/vcsn/dot,
>>> and another there 
>>> https://gitlab.lrde.epita.fr/vcsn/vcsn/tree/master/lib/vcsn/rat
>>> for instance, using Flex.
>> 
>> That is probably with Flex versions before 2.6; yyin and yyout have
>> been changed in the C++ header so that they are no longer pointers,
>> so it is incompatible not only with the header of older versions, but
>> also with the code Flex generates, resulting in the issue [1].
>> 
>> 1. 
>> https://stackoverflow.com/questions/34438023/openfoam-flex-yyin-rdbufstdcin-rdbuf-error
> 
> Though this wasn't actually my problem, I'll reply to this mail
> rather than the main thread to keep it separate from the actual
> Bison discussion.


One can change the subject. :-)

> For a start, I didn't have very good experience communicating with
> Flex maintainer(s?) who seemed rather nonchalant WRT gcc warnings
> etc. in the generated code, so over the years I'd been adjusting
> various warning-suppression gcc options or doing dirty #define
> tricks to avoid warnings, or sometimes even post-processing the
> generated lexer with sed.

GCC 8.2 uses C17 (-std=gnu17) as the default C dialect.

> But the final straw was when, after changing to C++ Bison, I wanted
> to switch to C++ Flex too and found this beautiful comment:
> 
>    /* The c++ scanner is a mess. The FlexLexer.h header file relies on the
>     * following macro. This is required in order to pass the
>     * c++-multiple-scanners test in the regression suite. We get reports
>     * that it breaks inheritance. We will address this in a future release
>     * of flex, or omit the C++ scanner altogether. */

It has been like that since the 1990s, I believe.

> I know there are no guarantees in the future of free software
> (neither of non-free software, of course), but such an
> announcement/threat seemed too risky to me.

Indeed, it seems broken now.

> Meanwhile I'd often thought that all Flex actually does is matching
> alternative regular expressions. Plain RE can do that as well, and
> by capturing subexpressions I can find out which alternative was
> matched.
> 
> Of course, it would (indeed turn out to be) somewhat slower (RE
> built at runtime vs. compile time), but like parsing, lexing speed
> is not a big issue to me. So I was ready to trade that in for
> convenience of programming and one less dependence on a problematic
> tool.
> 
> (Side note: Many years ago, on a different project, I dropped gperf
> to recognize predefined identifiers for similar reasons, and put
> them in a look-up table instead. Except for a tiny slowdown, that
> had worked out well, so I was confident I could drop Flex, too. --
> Now apparently the next one in line after dropping gperf and Flex
> should be Bison, but don't worry, I don't see an easy way to replace
> it, since Bison actually does some nontrivial stuff. :)
> 
> So I wrote a small library that builds that massive RE out of single
> rules and maps subexpressions back to rules (even in the case that
> rules contain subexpressions of their own), and that works for me.

I did that, too: I wrote some DFA/NFA code, and incidentally found that
the most efficient method to make action matches is via a reverse NFA
lookup, cf. [1-3]. Also, I have made UTF-8/32 to octet character class
translations.

1. https://gcc.gnu.org/ml/libstdc++/2018-04/msg00032.html
2. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85472
3. https://gcc.gnu.org/ml/libstdc++/2018-05/msg00015.html
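
For illustration, the single-RE approach from the quoted text (one big
alternation, with capture groups mapped back to rules) might look roughly
like the sketch below. It is not the library described above, which was
not posted. The names Rule and RegexLexer are hypothetical, and it assumes
std::regex with the default ECMAScript grammar. Note two differences from
Flex: ECMAScript alternation picks the first alternative that matches
rather than the longest one, so rule order matters, and rules must not
use backreferences, because group numbers shift inside the combined RE.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical rule type: one regular expression plus a caller-defined token code.
struct Rule {
    std::string pattern;  // ECMAScript regular expression for one token
    int token;            // token code returned when this rule matches
};

class RegexLexer {
public:
    explicit RegexLexer(const std::vector<Rule>& rules) {
        assert(!rules.empty());
        std::string combined;
        std::size_t group = 1;  // group 0 is the whole match
        for (const Rule& r : rules) {
            if (!combined.empty()) combined += '|';
            combined += '(' + r.pattern + ')';
            groups_.push_back({group, r.token});
            // Skip this rule's wrapper group and any groups the rule
            // itself contains, so the next rule gets the right index.
            group += 1 + std::regex(r.pattern).mark_count();
        }
        re_ = std::regex(combined);
    }

    // Tries to match a token at `first`. On success, stores the lexeme,
    // advances `first` past it, and returns the rule's token code.
    // Returns -1 if no rule matches or the match is empty.
    int next(std::string::const_iterator& first,
             std::string::const_iterator last,
             std::string& lexeme) const {
        std::smatch m;
        if (!std::regex_search(first, last, m, re_,
                               std::regex_constants::match_continuous))
            return -1;
        if (m.length(0) == 0) return -1;  // avoid looping on empty matches
        for (const auto& g : groups_) {
            if (m[g.first].matched) {
                lexeme = m[g.first].str();
                first = m[0].second;
                return g.second;
            }
        }
        return -1;
    }

private:
    std::regex re_;
    std::vector<std::pair<std::size_t, int>> groups_;  // (group index, token)
};
```

A caller would construct RegexLexer once from its rule list and call
next() in a loop from the parser's yylex replacement. The RE is compiled
at run time, which is the speed trade-off mentioned in the quoted text.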
