Re: Bison lexer

2018-08-31 Thread Hans Åberg


> On 1 Sep 2018, at 00:12, Frank Heckenbach wrote:
> 
> Hans Åberg wrote:
> 
>>> I haven't used gcc-8 yet, but how is this relevant? If anything, I
>>> expect newer gcc versions to produce more warnings (usually useful)
>>> which flex might also suffer from.
>> 
>> Maybe the Flex lexer errors are due to compiling it as C89 or something.
> 
> No, the warnings seemed legit.

It uses "register", which was deprecated in C++11 and removed in C++17.

>>> Interesting, thanks. Fortunately, my REs are not so complex, so the
>>> bug you reported won't affect me and lexing speed is not so
>>> important for me, so (at least for now) I can just use the library
>>> as is. But if I ever need something more sophisticated, I'll keep
>>> this in mind.
>> 
>> If that is what you are using, note that it is recursive, so the function 
>> stack might overflow. But perhaps they will rewrite it someday.
> 
> I don't think my lexing REs should cause much recursion. No nested
> repetitions or such.

The recursion is in the backtracking, which the GCC regex library does instead
of a DFA iteration. Some of the examples in the links I gave illustrate that.
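
To illustrate the contrast with backtracking, here is a minimal sketch of
iterative DFA matching. It is not taken from the GCC regex library or from the
code in the links; the token kinds, state names and the function longest_match
are hypothetical and exist only for this example. The DFA is written by hand
for the two patterns [0-9]+ and [A-Za-z_][A-Za-z0-9_]*. The point is that the
match is driven by a plain loop, so stack usage stays constant however long
the input is.

```cpp
#include <cstddef>
#include <string>

// Hypothetical token kinds, for this sketch only.
enum class Token { None, Integer, Identifier };

// States of a small hand-written DFA for [0-9]+ and [A-Za-z_][A-Za-z0-9_]*.
enum class State { Start, InInteger, InIdentifier, Dead };

static bool is_digit(char c) { return c >= '0' && c <= '9'; }

static bool is_alpha(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

// One DFA transition: depends only on the current state and the next octet.
static State step(State s, char c) {
    switch (s) {
    case State::Start:
        if (is_digit(c)) return State::InInteger;
        if (is_alpha(c)) return State::InIdentifier;
        return State::Dead;
    case State::InInteger:
        return is_digit(c) ? State::InInteger : State::Dead;
    case State::InIdentifier:
        return (is_alpha(c) || is_digit(c)) ? State::InIdentifier : State::Dead;
    case State::Dead:
        break;
    }
    return State::Dead;
}

struct Match {
    Token token;         // Token::None if nothing matched
    std::size_t length;  // number of octets consumed by the longest match
};

// Longest match starting at pos, found with a loop and no recursion.
Match longest_match(const std::string& input, std::size_t pos) {
    State s = State::Start;
    Match best{Token::None, 0};
    for (std::size_t i = pos; i < input.size(); ++i) {
        s = step(s, input[i]);
        if (s == State::Dead) break;
        // Both live non-start states are accepting, so record the match.
        if (s == State::InInteger) best = {Token::Integer, i - pos + 1};
        else if (s == State::InIdentifier) best = {Token::Identifier, i - pos + 1};
    }
    return best;
}
```

Calling longest_match on a string of a million digits runs through the same
loop as on a two-character input, whereas a recursive backtracking matcher may
use stack space that grows with the input length.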





Re: Bison lexer

2018-08-31 Thread Frank Heckenbach
Hans Åberg wrote:

> > I haven't used gcc-8 yet, but how is this relevant? If anything, I
> > expect newer gcc versions to produce more warnings (usually useful)
> > which flex might also suffer from.
> 
> Maybe the Flex lexer errors are due to compiling it as C89 or something.

No, the warnings seemed legit.

> > Interesting, thanks. Fortunately, my REs are not so complex, so the
> > bug you reported won't affect me and lexing speed is not so
> > important for me, so (at least for now) I can just use the library
> > as is. But if I ever need something more sophisticated, I'll keep
> > this in mind.
> 
> If that is what you are using, note that it is recursive, so the function 
> stack might overflow. But perhaps they will rewrite it someday.

I don't think my lexing REs should cause much recursion. No nested
repetitions or such.

Regards,
Frank



Re: Bison lexer

2018-08-31 Thread Hans Åberg


> On 31 Aug 2018, at 22:26, Frank Heckenbach wrote:
> 
> Hans Åberg wrote:
> 
>>> For a start, I didn't have very good experience communicating with
>>> Flex maintainer(s?) who seemed rather nonchalant WRT gcc warnings
>>> etc. in the generated code, so over the years I'd been adjusting
>>> various warning-suppression gcc options or doing dirty #define
>>> tricks to avoid warnings, or sometimes even post-processing the
>>> generated lexer with sed.
>> 
>> GCC 8.2 uses C17 as default.
> 
> I haven't used gcc-8 yet, but how is this relevant? If anything, I
> expect newer gcc versions to produce more warnings (usually useful)
> which flex might also suffer from.

Maybe the Flex lexer errors are due to compiling it as C89 or something.

>>> But the final straw was when, after changing to C++ Bison, I wanted
>>> to switch to C++ Flex too and found this beautiful comment:
>>> 
>>>   /* The c++ scanner is a mess. The FlexLexer.h header file relies on the
>>>    * following macro. This is required in order to pass the c++-multiple-scanners
>>>    * test in the regression suite. We get reports that it breaks inheritance.
>>>    * We will address this in a future release of flex, or omit the C++ scanner
>>>    * altogether. */
>> 
>> It has been like that since the 1990s, I believe.
> 
> Even better! :(
> 
> Especially since C++ in the 1990s was totally different from modern
> C++, so I have no idea if anything of this comment is still
> relevant, or maybe even more relevant, today compared to then.

Indeed, very old.

> Lesson (as if anyone was listening): Always put a date on such
> messages.

Probably just a hack, never actually developed.

>>> So I wrote a small library that builds that massive RE out of single
>>> rules and maps subexpressions back to rules (even in the case that
>>> rules contain subexpressions of their own), and that works for me.
>> 
>> I did that, too: I wrote some DFA/NFA code, and incidentally found
>> the most efficient method to make action matches via a reverse NFA
>> lookup, cf. [1-3]. Also, I have made UTF-8/32 to octet character
>> class translations.
>> 
>> 1. https://gcc.gnu.org/ml/libstdc++/2018-04/msg00032.html
>> 2. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85472
>> 3. https://gcc.gnu.org/ml/libstdc++/2018-05/msg00015.html
> 
> Interesting, thanks. Fortunately, my REs are not so complex, so the
> bug you reported won't affect me and lexing speed is not so
> important for me, so (at least for now) I can just use the library
> as is. But if I ever need something more sophisticated, I'll keep
> this in mind.

If that is what you are using, note that it is recursive, so the function stack
might overflow. But perhaps they will rewrite it someday.





Re: Bison lexer

2018-08-31 Thread Frank Heckenbach
Hans Åberg wrote:

> > For a start, I didn't have very good experience communicating with
> > Flex maintainer(s?) who seemed rather nonchalant WRT gcc warnings
> > etc. in the generated code, so over the years I'd been adjusting
> > various warning-suppression gcc options or doing dirty #define
> > tricks to avoid warnings, or sometimes even post-processing the
> > generated lexer with sed.
> 
> GCC 8.2 uses C17 as default.

I haven't used gcc-8 yet, but how is this relevant? If anything, I
expect newer gcc versions to produce more warnings (usually useful)
which flex might also suffer from.

> > But the final straw was when, after changing to C++ Bison, I wanted
> > to switch to C++ Flex too and found this beautiful comment:
> > 
> >   /* The c++ scanner is a mess. The FlexLexer.h header file relies on the
> >    * following macro. This is required in order to pass the c++-multiple-scanners
> >    * test in the regression suite. We get reports that it breaks inheritance.
> >    * We will address this in a future release of flex, or omit the C++ scanner
> >    * altogether. */
> 
> It has been like that since the 1990s, I believe.

Even better! :(

Especially since C++ in the 1990s was totally different from modern
C++, so I have no idea if anything of this comment is still
relevant, or maybe even more relevant, today compared to then.

Lesson (as if anyone was listening): Always put a date on such
messages.

> > So I wrote a small library that builds that massive RE out of single
> > rules and maps subexpressions back to rules (even in the case that
> > rules contain subexpressions of their own), and that works for me.
> 
> I did that, too: I wrote some DFA/NFA code, and incidentally found
> the most efficient method to make action matches via a reverse NFA
> lookup, cf. [1-3]. Also, I have made UTF-8/32 to octet character
> class translations.
> 
> 1. https://gcc.gnu.org/ml/libstdc++/2018-04/msg00032.html
> 2. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85472
> 3. https://gcc.gnu.org/ml/libstdc++/2018-05/msg00015.html

Interesting, thanks. Fortunately, my REs are not so complex, so the
bug you reported won't affect me and lexing speed is not so
important for me, so (at least for now) I can just use the library
as is. But if I ever need something more sophisticated, I'll keep
this in mind.
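
For readers following along, the combined-RE approach quoted above (one big RE
built out of single rules, with subexpressions mapped back to rules) can be
sketched roughly as below. This is only an illustration under assumptions, not
the library discussed in this thread: it assumes std::regex with the default
ECMAScript syntax, and the names Rule and CombinedLexer are made up. Note that
ECMAScript alternation is ordered, so an earlier rule wins over a later one
even if the later one would match more text, unlike Flex's longest-match rule.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical rule type: a name (standing in for an action) and its RE.
struct Rule {
    std::string name;
    std::string pattern;
};

class CombinedLexer {
public:
    explicit CombinedLexer(const std::vector<Rule>& rules) : rules_(rules) {
        std::string combined;
        std::size_t next_group = 1;  // group 0 is the whole match
        for (const Rule& r : rules_) {
            // Remember which capture group wraps this rule.
            group_of_rule_.push_back(next_group);
            if (!combined.empty()) combined += '|';
            combined += '(' + r.pattern + ')';
            // A rule's own subexpressions shift the numbering of later
            // rules; mark_count() reports how many groups it contains.
            next_group += 1 + std::regex(r.pattern).mark_count();
        }
        regex_ = std::regex(combined);
    }

    // Returns the index of the rule that matches at the start of
    // [first, last), or -1 if none does. On success, length receives
    // the number of characters matched.
    int match(std::string::const_iterator first,
              std::string::const_iterator last,
              std::size_t& length) const {
        std::smatch m;
        if (!std::regex_search(first, last, m, regex_,
                               std::regex_constants::match_continuous))
            return -1;
        for (std::size_t i = 0; i < rules_.size(); ++i) {
            if (m[group_of_rule_[i]].matched) {
                length = static_cast<std::size_t>(m.length(0));
                return static_cast<int>(i);
            }
        }
        return -1;
    }

private:
    std::vector<Rule> rules_;
    std::vector<std::size_t> group_of_rule_;
    std::regex regex_;
};
```

With the rules number = [0-9]+(\.[0-9]+)? , ident = [A-Za-z_][A-Za-z0-9_]* and
space = \s+ , the wrapping groups are 1, 3 and 4, because the number rule
contains one subexpression of its own. A caller would advance through the
input by the returned length after each successful match.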

Regards,
Frank