Re: Is there a Bison IRC channel?

2023-12-05 Thread Hans Åberg


> On Dec 5, 2023, at 03:16, Steve Litt  wrote:
> 
> Is there a Bison IRC channel? Is there a Flex IRC channel?

You might try the Usenet newsgroup comp.compilers.





Re: yytname woes

2023-11-13 Thread Hans Åberg


> On Nov 12, 2023, at 18:18, James K. Lowden  wrote:
> 
> On Mon, 13 Nov 2023 10:52:02 +0100
> Hans Åberg  wrote:
> 
>>> Let's start, shall we, with "is it a bug"?  
>> 
>> The parser knows how to translate token numbers into names for error
>> messages, but there is no API for that as far as I know.
> 
> I disagree.  Bison might know how to do that, but "the parser" does
> not.

Read the generated parser code: error message strings are produced by 
translating token numbers into symbol names, first mapping the token number to 
the internal symbol number mentioned earlier.

>  Bison is a parser-generator.  I'm writing "the parser" using
> Bison. I implemented yyerror(). I'm producing the error message using
> the only vehicle I've found, yytname.  It's insufficient to the
> purpose.

It generates an error message string, which is what one is expected to use, not 
the internal tables directly.
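For completeness: since Bison 3.6 there is a supported way to reach token names 
during error reporting, via %define parse.error custom and yysymbol_name. 
Roughly as follows; see the bistromathic example shipped with Bison for the 
exact API:

```yacc
%define parse.error custom

%%
/* ... grammar ... */
%%

/* Called by the generated parser instead of it building the message itself. */
int
yyreport_syntax_error (const yypcontext_t *ctx)
{
  yysymbol_kind_t lookahead = yypcontext_token (ctx);
  if (lookahead != YYSYMBOL_YYEMPTY)
    fprintf (stderr, "unexpected %s\n", yysymbol_name (lookahead));
  return 0;
}
```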





Re: yytname woes

2023-11-13 Thread Hans Åberg


> On Nov 12, 2023, at 02:23, James K. Lowden  wrote:
> 
> My yytname array is obviously messed up.  Whether or not it's needed by
> the parser, whether or not ASCII characters can be used as tokens, is
> immaterial.  
> 
> Let's start, shall we, with "is it a bug"?  

The parser knows how to translate token numbers into names for error messages, 
but there is no API for that as far as I know.





Re: yytname woes

2023-11-12 Thread Hans Åberg


> On Nov 12, 2023, at 00:06, James K. Lowden  wrote:
> 
> I think the purpose of the yytname array is simple: for each token
> (that is not a character), it holds a string with the token's name.

Yes, and it is not needed in the actual parser. It is used to print error 
messages, but one can also use it in the grammar:
%token this_key "this"
…
%%
…
this_rule:
  "this" …
;




Re: yytname woes

2023-11-11 Thread Hans Åberg


On Nov 10, 2023, at 01:57, James K. Lowden  wrote:
> 
> I can't seem to look up token names in yytname correctly using enum
> yytokentype.
…
> When I look up #899, NOT, I get "NOT".  But when I look up #900, NE, I
> get "'<'" because that's the next element in yytname (900 - 255).

In Bison, which is compatible with POSIX Yacc, characters are allowed as token 
numbers, so those numbers are reserved for that purpose.





Re: Online yacc/lex grammar editor/tester

2023-09-07 Thread Hans Åberg


> On 7 Sep 2023, at 19:01, Domingo Alvarez Duarte  wrote:
> 
> I already know about that grammar and it's already there as "Cxx parser (not 
> working)" because it uses "error" and auxiliary code to parse "C++" and the 
> parser I'm using right now doesn't support "error" and at first it's mostly 
> declarative only, anyway as the time goes by probably other ways to achieve 
> the same can arrive.

So your Cxx is not this one:
https://github.com/dtolnay/cxx




Re: Online yacc/lex grammar editor/tester

2023-09-07 Thread Hans Åberg


> On 7 Sep 2023, at 15:56, Domingo Alvarez Duarte  wrote:
> 
> I'm trying to build an online yacc/lex (LALR(1)) grammar editor/tester to 
> help develop/debug/document grammars the main repository is here 
> https://github.com/mingodad/parsertl-playground and the online playground 
> with several non trivial examples is here 
> https://mingodad.github.io/parsertl-playground/playground/ .
> 
> Select a grammar/example from "Examples" select box and then click "Parse" to 
> see a parser tree for the source in "Input source" editor.
> 
> It's based on https://github.com/BenHanson/gram_grep and 
> https://github.com/BenHanson/lexertl14 .
> 
> Any feedback is welcome !

There is a C++ LALR(1) grammar, see:
https://isocpp.org/wiki/faq/compiler-dependencies#yaccable-grammar





Re: Pattern matches one type of paren but not another

2023-05-28 Thread Hans Åberg


> On 28 May 2023, at 20:12, Maury Markowitz  wrote:
> 
> Following, sort of, some advice I received earlier I modified my bison by 
> adding this:
> 
> open_bracket:
> '(' | '[';
> close_bracket:
> ')' | ']’;

You probably do not want to exchange delimiter types, as in "(…]" and "[…)".

> And then modified the original code thus:
> 
> variable:
> VARIABLE_NAME
> {
>   variable_t *new = malloc(sizeof(*new));
>   new->name = $1;
>   new->subscripts = NULL;
>   new->slicing = NULL;
>   $$ = new;
> }
> |
> VARIABLE_NAME open_bracket exprlist close_bracket
> {
>   variable_t *new = malloc(sizeof(*new));
>   new->name = $1;
>   new->subscripts = $3;
>   new->slicing = NULL;
>   $$ = new;
> }


To exclude that, use:

operator_value:
   VARIABLE_NAME '(' exprlist ')'
 | VARIABLE_NAME '[' exprlist ']'
;

Or more structured (in a C++-like syntax):
operator_value:
   function_value
 | index_value
;

function_value:
  VARIABLE_NAME '(' exprlist ')'
;

index_value:
  VARIABLE_NAME '[' exprlist ']'
;





Re: What is a Parser Skeleton?

2023-04-12 Thread Hans Åberg
[Please cc the Bison list, as others can follow the issue and tune in if 
needed.]

> On 12 Apr 2023, at 11:18, Johannes Veit  wrote:
> 
> Hello and sorry for the long pause and tanks for your explanation!It makes 
> sense, but I don’t see a proper connection to the Bison Exception.
>  In my understanding of the text, the exception comes in place, when I (as a 
> user of bison) going to build an own parser generator using bison.
…
> My interpretation is, that it is more than protecting the rights of the 
> skeletons, it is rather a protection against a forked* parser generator under 
> e.g. proprietary terms.
> *forked=>I know, that it is not forking in the sense of forking a repo. What 
> I mean is: using bison to generate it, instead of writing it with own hands

In my interpretation, there are two parts:

The skeleton file contains handwritten parts which are subject to copyright, 
and in normal use it is copied into the generated parser. So one wants to avoid 
the full GPL applying to it, which would restrict the licensing of the program 
it becomes a part of. The same approach is used for C/C++ libraries: you can 
write and compile, if you so wish, proprietary programs using GCC and 
distribute them without the GPL applying to them.

In addition, for special uses, one may copy the whole skeleton, modify it, and 
include it in a program, much as though it were LGPL, but not if that program 
is itself a parser generator that generates parsers; then the full GPL applies 
to that program.

This latter option, making one's own edited skeleton file, I do not recommend 
unless it is really needed, because the skeleton may change between Bison 
versions and it is hard to keep in sync. So it is better to try to get the 
special feature into the Bison project.





Re: What is a Parser Skeleton?

2023-02-23 Thread Hans Åberg


> On 23 Feb 2023, at 11:25, Johannes Veit  wrote:
> 
> Hello Mr. Åberg and thanks a lot for the explanation.
> 
> I attached a pdf where I (try to) explain the Bison exception. 
> Could you please verify if it is correct?

It is the Bison license exception that is referred to, I believe: copyright 
applies essentially to creatively unique parts, but not to machine-processed 
parts. For example, in an editor, what you write is copyrightable, but the 
writer of the editor cannot claim copyright on that material merely for the 
electronic processing.

Now, Bison processes the grammar one writes, using an algorithm like LALR, and 
generates output which is not in itself copyrightable by the Bison copyright 
holder. However, that output is combined with the material in the skeleton 
file, which Bison now processes with M4 (in the past, a simpler kind of 
processor). That skeleton material is copyrightable and is forwarded to the 
output, and therefore the Bison copyright holders must decide what copyright 
should apply.

For such situations, GNU has developed the LGPL [1], which does not impose the 
full GPL license on material that is by nature like machine-processed output, 
even though formally it is not.

https://en.wikipedia.org/wiki/GNU_Lesser_General_Public_License





Re: What is a Parser Skeleton?

2023-02-10 Thread Hans Åberg


> On 10 Feb 2023, at 13:52, Johannes Veit  wrote:
> 
> "Parser Skeleton" is crucial for understanding the Bison Exception
> , but I didn't find a
> proper explanation of that term.
> 
> So either:
> Where can I find a proper explanation of "Parser Skeleton"?
> Or:
> What is a "Parser Skeleton"?

It is M4 code used as a template, filled in with details from Bison, to produce 
the generated parser.





Re: how to solve this reduce/reduce conflict?

2022-09-22 Thread Hans Åberg


> On 22 Sep 2022, at 21:02, Lukas Arsalan  wrote:
> 
> On 2022-09-22T15:54:31UTC Hans Åberg  wrote:
>> Context switches are best avoided unless absolutely necessary, in my 
>> experience.
>> So if one designs ones own language, it might be good to try to avoid them
>> by a change in the grammar.
>> 
> OK... I know that there are no signed numbers usually... But I wanted to try 
> to change that...
> So for _me_ in "-2" the minus is a sign... And in "- 2" the minus is a unary 
> inversion operator... And in "1-2"  the minus is a subtraction operator (or 
> an abbreviation for "1+-2" respectively (where the minus is a sign again))...
> This can all be done quite elegantly with this context trick in the ll-file...

I think the C/C++ interpretation, with a unary operator and no signed integer 
literals, is the best one for arithmetic expressions. Having the sign as part 
of the number may be suitable in other contexts.

>> It might be confusing with -2^4 meaning (-2)^4, because in 1 - 2^4, it 
>> should be 1 - (2^4),
>> and 1 -2^4 would be an error if two number cannot follow each other.
>> 
> "1 -2^4" is no error in my program... it results in "-15".
> It even says, that "- 2^4" is "-16", while "-2^4" is "16".
> 
> Do u think there will be any unwanted side effects?

In the minds of those interpreting it. :-)





Re: how to solve this reduce/reduce conflict?

2022-09-22 Thread Hans Åberg


> On 22 Sep 2022, at 16:52, Lukas Arsalan  wrote:
> 
> On 2022-09-22T07:57:45UTC Hans Åberg  wrote:
>> On 22 Sep 2022, at 08:30, Lukas Arsalan  wrote:
>>> [1] -1 --> "num"
>>> [2] 1-2 --> "num" "-" "num"
>>> [3] (-1^-2) --> "(" "num" "^" "num" ")"
>>> [4] 1--2 --> "num" "-" "num"
>>> [5] 1---3 --> "num" "-" "-" "num"
>>> [6] 1-2^3 --> "num" "-" "num" "^" "num"
>>> I do not think that it is possible, to do that with regular expressions...
>>> 
>> I think it is not possible, so therefore one expects -2⁴ to be parsed as 
>> -(2⁴).
>> 
> I found that `%s nosinum` for the ll-file...
> Now I can do things like this:
> "+" BEGIN(INITIAL); return yy::parser::make_ADD(loc);
> "(" BEGIN(INITIAL); return yy::parser::make_BROP(loc);
> ")" BEGIN(nosinum); return yy::parser::make_BRCL(loc);
> {bint}  BEGIN(nosinum); return make_INT(yytext,loc);
> {float} BEGIN(nosinum); return make_FLOAT(yytext,loc);
> [+-]?{bint}BEGIN(nosinum); return make_INT(yytext,loc);
> [+-]?{float}   BEGIN(nosinum); return make_FLOAT(yytext,loc);
> 
> and i removed the SNUM token...
> 
> now it seems to work just right..
> 
> it even handles the whitespaces to my liking... 
> 
> but i do not know what kind of formal language that is now...

Context switches are best avoided unless absolutely necessary, in my 
experience. So if one designs one's own language, it might be good to try to 
avoid them by a change in the grammar.

It might be confusing for -2^4 to mean (-2)^4, because in 1 - 2^4 it should be 
1 - (2^4), and 1 -2^4 would be an error if two numbers cannot follow each 
other.





Re: how to solve this reduce/reduce conflict?

2022-09-22 Thread Hans Åberg


> On 22 Sep 2022, at 08:30, Lukas Arsalan  wrote:
> 
> Hi,
> 
> At 2022-09-22T07:08:55CEST Akim Demaille  wrote:
>> This snippet is clearly ambiguous, since it allows two different parses of 
>> -1, which -Wcex nicely showed.
>> 
> yes. right.
> 
>> If I were you, I would handle this in the scanner.  IOW, the scanner should 
>> be extended to support signed literals, and > process that initial `-`.
>> 
> uhm... is that possible?
> e. g.:
> [1] -1 --> "num"
> [2] 1-2 --> "num" "-" "num"
> [3] (-1^-2) --> "(" "num" "^" "num" ")"
> [4] 1--2 --> "num" "-" "num"
> [5] 1---3 --> "num" "-" "-" "num"
> [6] 1-2^3 --> "num" "-" "num" "^" "num"
> I do not think that it is possible, to do that with regular expressions...

I think it is not possible, so one expects -2⁴ to be parsed as -(2⁴).





Re: Is there a way to have two "cases" with one method?

2022-08-22 Thread Hans Åberg


> On 22 Aug 2022, at 18:12, Maury Markowitz  wrote:
> 
> In my BASIC interpreter’s bison code I have one of these:
> 
>  CHANGE variable TO variable
…
> CHANGE is from the original Dartmouth BASIC. It turns out that HP included an 
> identical feature in their dialect, CONVERT. So I did:
…
>  CONVERT variable TO variable
> Works great.
> 
> My question is whether there is a simple way to combine the two to eliminate 
> the duplicated code?

For example:

CHANGE_or_CONVERT variable TO variable;

CHANGE_or_CONVERT: CHANGE | CONVERT;





Re: [Suspected Spam] Weird Rule Matching

2022-04-07 Thread Hans Åberg


> On 7 Apr 2022, at 16:35, Tom Flux  wrote:
> 
> Thanks a lot for the example, that's pretty close to how I modeled my 
> instructions, I think I'm am just storing them slightly weirdly.
> 
> Think I am finally getting the hang of it... :)

Make sure to post replies to the list so that others can follow the issue and 
tune in as needed.





Re: Weird Rule Matching

2022-04-07 Thread Hans Åberg


> On 7 Apr 2022, at 12:00, tom2  wrote:
> 
> I have heard of an AST, and, against my better judgement, thought they were 
> to complex for my needs and decided to represent the instructions as one long 
> list, that gets edited by loops/conditionals.
> 
> I see the error of my ways now, but I am too close to the deadline of this 
> project for me to go back and change it now...

Here is a functional style outline in terms of C++:

First, a base class with a virtual function evaluate(), say:
class unit {
  virtual ref evaluate(ref) { return {}; }
};
where ref is, say, an alias for the C++ reference-counting class std::shared_ptr.

Then
class function_application : public virtual unit {
  object f_, a_;   // Function and arguments.

  ref evaluate(ref x) { return f_(a_(x)); }
};

In addition, classes for functions and tuples.

When constructed, this delays any action until one applies 'evaluate' to the 
object.

For an if-then-else conditional, it might have a boolean argument b_, and 
arguments t_ and f_ to evaluate in the true and false cases:
class if_then_else : public virtual unit {
  object b_, t_, f_;

  ref evaluate(ref x) { if (b_(x)) return t_(x); else return f_(x); }
};

For loops, it is possible in C++ to implement break and continue objects that 
throw exceptions to achieve the appropriate effect. This is not particularly 
efficient, but easy to write.

There are more expanded and efficient ways to do this, but the underlying idea 
would be the same.





Re: Weird Rule Matching

2022-04-07 Thread Hans Åberg


> On 7 Apr 2022, at 10:14, tom2  wrote:
> 
> It does actually cause an issue because I am rely on the idea that the rule 
> will be matched before the next if token is found, in order to have nested 
> conditionals.

Typically, one builds an AST (abstract syntax/semantic tree) that can be 
executed after the parse. This is necessary for loops. Sometimes it is 
necessary to avoid lookahead when the lexer has context switches that are set 
in the parser; this can be done by having a distinct lookahead token for such 
contexts (there is more info about it in the Bison manual somewhere).





Re: Typescript grammar in Bison

2022-03-22 Thread Hans Åberg


> On 22 Mar 2022, at 21:24, Ricard Gascons  wrote:
> 
> I've been a Bison user for some time now, I've been writing some toy
> projects here and there. I was wondering if there are online resources I
> could find existing grammar from well-known programming languages? Just the
> definitions, not the implementations of course.
> 
> More specifically, I'm looking for an existing Typescript grammar I could
> use for a personal project. I guess Javascript would work too. I've looked
> online and haven't had any luck so far.

You might ask in the Usenet newsgroup comp.compilers.





Re: glr2.cc compile errors under Windows

2021-11-21 Thread Hans Åberg


> On 21 Nov 2021, at 13:02, Jot Dot  wrote:
> 
>>   53 #if defined __cplusplus
>>   54 # define YY_CPLUSPLUS __cplusplus
>>   55 #else
>>   56 # define YY_CPLUSPLUS 199711L
>>   57 #endif
>> 
>> Please check why your compiler does not define __cplusplus.  Compliant 
>> compilers
>> must define it properly so that we can know what version of C++17 we're in.
>> See https://en.cppreference.com/w/cpp/preprocessor/replace#Predefined_macros.
> 
> 
> It is defined. Just not what we think it should be. It is 199711L

Some compilers do this; Apple clang, for example, defaults to C++98. So one 
must pass an option such as -std=c++20. The default on GCC 11 is C++17, and on 
Clang 12 it is C++14.




Re: Syntax error messages

2021-10-02 Thread Hans Åberg



> On 1 Oct 2021, at 23:30, Christian Schoenebeck  
> wrote:
> 
>> For the purpose of writing out the line in the error messages, this method
>> (using C++) did not work out well, because I have two parsers, one for the
>> language and one for directives, and it turns out to be difficult to pass
>> the location information back to the top parser.
>> 
>> So instead, in addition to the input stream stack, I added two, for the
>> current stream position, and the current stream line position. Because of
>> the lexer buffering, they are computed in the lexer. These are properties
>> attached to the input streams then, not the parser locations.
>> 
>> In the Bison type, I use line number and for columns the number of UTF-8
>> characters. An ASCII caret marking the error is surprisingly accurate even
>> in the presence of non-ASCII characters. But perhaps one should have a
>> method to mark it on the line itself, not underneath.
> 
> Hmm, those two parsers run independently from each other, or do you rather 
> mean you have coupled them in a way that they cross-influence their behaviour 
> *while* they are still running?

Currently the main language parser calls the directive parser so that one can 
set switches for the behavior. The directive parser just reads the current file 
that the language parser provides. But the latter would be desirable, say using 
coroutines, because then the directive parser could do things that are above 
the main language, such as loading input files (as in the C preprocessor).

> So far I have not encountered any restriction with my location approach. I'm 
> using it for all kinds of things like, of course warnings/errors on the CLI, 
> highlighting of the same in code editors, but also for code refactoring 
> stuff. 
> The latter only works well with a full language aware parser, unlike those 
> typical RegEx hacks.

When calling the directive parser from the language parser, I found it 
difficult to pass the location information back to the language parser without 
getting out of sync, as its current location is an independent copy of the 
location passed to it. Therefore, I switched to using separate stream current 
and line positions, computed in the lexer because of lexer buffering, and 
stacked for recursive streams.

One spinoff is that I can have a program startup option --directive="…" which 
calls the directive parser via a C++ string stream, saving the work of having 
separate options for all that it can do. And error messages work there as well.




Re: Getting a Counter

2021-10-01 Thread Hans Åberg



> On 1 Oct 2021, at 07:16, Guenther Sohler  wrote:
> 
> Hi,
> In my parser i don't want flex to return comments in my code as flex tokens.
> Instead i want to store them in a different list together with their
> position.
> 
> Lateron i am using their position to included them in the code again.
> For that reason i am interested to know the Token position in the parsed
> stream.
> Ideally i need the number in the exact same scope as its displayed during
> ambiguity resolution.
> After some googling i could not find such a feature.
> Does anybody know such a variable next to yylval or yytext ?

As the lexer must buffer the input, it is necessary to compute the stream 
position in the lexer. As Akim pointed out, in Flex, this can be done with a 
user action (C++ here):

std::istream::pos_type current_position = 0;

%{
#define YY_USER_ACTION  yylloc.columns(length_utf8(yytext)); current_position += yyleng;
%}

This example also shows how to compute the UTF-8 column number for the Bison 
parser, if one wants that.

If files are opened recursively, one needs a stack for the current position as 
well.





Re: Syntax error messages

2021-10-01 Thread Hans Åberg


> On 28 Sep 2021, at 14:10, Christian Schoenebeck  
> wrote:
> 
> On Montag, 27. September 2021 22:07:33 CEST Hans Åberg wrote:
>>>> In order to generate better syntax error messages writing out the input
>>>> line with the error and a line with a marker underneath, I thought of
>>>> checking how Bison does it, but I could not find the place in its
>>>> sources. —Specifically, a suggestion is to tweak YY_INPUT in the lexer
>>>> to buffer one input line at a time, but Bison does not seem to do that.> 
>>> No, I keep track of the byte offset in the file, and print from the file,
>>> which I reopen to quote the source.
>> OK. I thought of this method, but then it does not work with streams.
> 
> In the past at least, builtin location support did not work well for me. So 
> I'm usually overriding location data type and behaviour with custom type 
> declaration, plus implementation on lexer side.
> 
> I also prefer this data type presentation:
> 
> // custom Bison location type to support raw byte positions
> struct _YYLTYPE {
>int first_line;
>int first_column;
>int last_line;
>int last_column;
>int first_byte;
>int length_bytes;
> };
> #define YYLTYPE _YYLTYPE
> #define YYLTYPE_IS_DECLARED 1
> 
> // override Bison's default location passing to support raw byte positions
> #define YYLLOC_DEFAULT(Cur, Rhs, N) \
> do  \
>  if (N)\
>{   \
>  (Cur).first_line   = YYRHSLOC(Rhs, 1).first_line; \
>  (Cur).first_column = YYRHSLOC(Rhs, 1).first_column;   \
>  (Cur).last_line= YYRHSLOC(Rhs, N).last_line;  \
>  (Cur).last_column  = YYRHSLOC(Rhs, N).last_column;\
>  (Cur).first_byte   = YYRHSLOC(Rhs, 1).first_byte; \
>  (Cur).length_bytes = (YYRHSLOC(Rhs, N).first_byte  -  \
>YYRHSLOC(Rhs, 1).first_byte) +  \
>YYRHSLOC(Rhs, N).length_bytes;  \
>}   \
>  else  \
>{   \
>  (Cur).first_line   = (Cur).last_line   =  \
>YYRHSLOC(Rhs, 0).last_line; \
>  (Cur).first_column = (Cur).last_column =  \
>YYRHSLOC(Rhs, 0).last_column;   \
>  (Cur).first_byte   = YYRHSLOC(Rhs, 0).first_byte; \
>  (Cur).length_bytes = YYRHSLOC(Rhs, 0).length_bytes;   \
>}   \
> while (0)
> 
> Because sometimes you need high level column & line span, and sometimes you 
> rather need low level raw byte position & byte length in the input data 
> stream.

For the purpose of writing out the line in the error messages, this method 
(using C++) did not work out well, because I have two parsers, one for the 
language and one for directives, and it turns out to be difficult to pass the 
location information back to the top parser.

So instead, in addition to the input stream stack, I added two, for the current 
stream position, and the current stream line position. Because of the lexer 
buffering, they are computed in the lexer. These are properties attached to the 
input streams then, not the parser locations.

In the Bison type, I use line number and for columns the number of UTF-8 
characters. An ASCII caret marking the error is surprisingly accurate even in 
the presence of non-ASCII characters. But perhaps one should have a method to 
mark it on the line itself, not underneath.





Re: Syntax error messages

2021-09-28 Thread Hans Åberg


> On 27 Sep 2021, at 22:02, Akim Demaille  wrote:
> 
> Hi Hans,
> 
>> Le 27 sept. 2021 à 20:54, Hans Åberg  a écrit :
>> 
>> In order to generate better syntax error messages writing out the input line 
>> with the error and a line with a marker underneath, I thought of checking 
>> how Bison does it, but I could not find the place in its sources. 
>> —Specifically, a suggestion is to tweak YY_INPUT in the lexer to buffer one 
>> input line at a time, but Bison does not seem to do that.
> 
> No, I keep track of the byte offset in the file, and print from the file, 
> which I reopen to quote the source.

In C++, I can do that using tellg and seekg without reopening the file, and 
then search for the line, which is an easy hack.




Re: Syntax error messages

2021-09-27 Thread Hans Åberg


> On 27 Sep 2021, at 22:02, Akim Demaille  wrote:
> 
> Hi Hans,
> 
>> Le 27 sept. 2021 à 20:54, Hans Åberg  a écrit :
>> 
>> In order to generate better syntax error messages writing out the input line 
>> with the error and a line with a marker underneath, I thought of checking 
>> how Bison does it, but I could not find the place in its sources. 
>> —Specifically, a suggestion is to tweak YY_INPUT in the lexer to buffer one 
>> input line at a time, but Bison does not seem to do that.
> 
> No, I keep track of the byte offset in the file, and print from the file, 
> which I reopen to quote the source.

OK. I thought of this method, but then it does not work with streams.

> Almost everything is in src/location.[ch].  It is location_caret that quotes 
> the input file and underline the error's location.

Thanks.





Syntax error messages

2021-09-27 Thread Hans Åberg
In order to generate better syntax error messages writing out the input line 
with the error and a line with a marker underneath, I thought of checking how 
Bison does it, but I could not find the place in its sources. —Specifically, a 
suggestion is to tweak YY_INPUT in the lexer to buffer one input line at a 
time, but Bison does not seem to do that.





Re: text is not parsed correctly due to shift/reduce conflict

2021-07-25 Thread Hans Åberg
Indeed, there is an example in the Bison manual, sec. 1.5.2, on how to use GLR 
to resolve C++-style ambiguities. C++ is not LALR, so if using that, one has to 
write a grammar for a larger language and cut it down in the actions.


> On 25 Jul 2021, at 09:45, Alex Shkotin  wrote:
> 
> or try %glr-parser command - it helps in my case:-)
> 
> сб, 24 июл. 2021 г. в 21:16, Hans Åberg :
> 
> > On 24 Jul 2021, at 16:34, Guenther Sohler  wrote:
> > 
> > When trying to code a c language parser I got a issue with shift/reduce
> > conflict in bison, which actually hurts me.
> 
> You might check the LALR(1) grammars for C and C++ others have done. Two 
> examples:
> https://isocpp.org/wiki/faq/compiler-dependencies#yaccable-grammar
> http://www.quut.com/c/ANSI-C-grammar-y.html
> 
> 
> 




Re: text is not parsed correctly due to shift/reduce conflict

2021-07-24 Thread Hans Åberg


> On 24 Jul 2021, at 16:34, Guenther Sohler  wrote:
> 
> When trying to code a c language parser I got a issue with shift/reduce
> conflict in bison, which actually hurts me.

You might check the LALR(1) grammars for C and C++ others have done. Two 
examples:
https://isocpp.org/wiki/faq/compiler-dependencies#yaccable-grammar
http://www.quut.com/c/ANSI-C-grammar-y.html





Re: Collecting statistics after parsing

2021-05-04 Thread Hans Åberg


> On 4 May 2021, at 20:09, Maury Markowitz  wrote:
> 
> Before I reinvent the wheel: I'm wondering if anyone has some boilerplate 
> code for printing out the frequency of tokens found in the source?

The Bison parser calls yylex when getting new tokens, so you might count them there.




Re: Resolving shift/reduce conflicts?

2021-02-02 Thread Hans Åberg


> On 2 Feb 2021, at 07:50, Christoph Grüninger  wrote:
> 
> Dear Bisons,
> 
> I have another issue within the CMake parser code. When using the
> attached cmDependsJavaParser.y with Bison 3.7.5, i get the following
> warning: 4 shift/reduce conflicts [-Wconflicts-sr]. When adding
> -Wcounterexamples I get the output below. Ok, I understand the issue and
> Bison is right.
> But what should I do to get rid of the problem?

One way is to add precedences (%left, %right, %nonassoc) to the tokens 
immediately before and after the parsing dot '.' in the conflicting rules. If 
there are no such tokens, the grammar must be rewritten, or GLR used.





Re: Debugging "$4"

2020-10-09 Thread Hans Åberg


> On 9 Oct 2020, at 06:13, Akim Demaille  wrote:
> 
> Hi Hans,

Hello,

>> Le 8 oct. 2020 à 18:06, Hans Åberg  a écrit :
>> 
>> When you compile, did you get a shift/reduce conflict? —I recall Bison 
>> chooses the reduce over shift.
> 
> Nope, in unresolved S/R conflicts, shifts win.  That's what you want for 
> if-then-else.

Bummer!





Re: Debugging "$4"

2020-10-08 Thread Hans Åberg


> On 7 Oct 2020, at 22:52, Maury Markowitz  wrote:
> 
> I have these rules in my BASIC.y:
> 
> statements:
>   statement
>   {
> $$ = g_list_prepend(NULL, $1);
>   }
>   |
>statement ':' statements
>{
> $$ = g_list_prepend($3, $1);
>}
>   ;

When you compile, did you get a shift/reduce conflict? —I recall Bison chooses 
the reduce over shift.

One can see these in the .output file, generated when invoking the verbose 
option, see the Bison manual, 8.1 Understanding Your Parser.

Also, left recursion is preferred, see 3.3.3 Recursive Rules.

And one can use $ variable names, more readable and avoid mistakes:
  IF expression[x] THEN statements[y] { … $x … $y … }

> statement:
> (among others...)
>   IF expression THEN statements
>   {
> statement_t *new = make_statement(IF);
> new->parms._if.condition = $2;
> new->parms._if.then_expression = $4;
> new->parms._if.then_linenumber = 0;
> $$ = new;
>   }
> 
> There is a single routine that runs the statements. When I feed it this:
> 
> 100 PRINT "1":PRINT"2"
> 
> I get 1\n2 as expected. However, when I tell it to run then_expression with 
> statements
> 
> 200 IF 1=1 THEN PRINT"3":PRINT"4"
> 
> I get only 3. I *believe* that $4 is a correctly formatted g_list of 
> statements, but how do I test that assumption? How does one print out the 
> value of $4 in the debugger? 

Also perhaps try using a terminator on the rule, just to see if something 
changes.
  IF expression THEN statements "."

In BASIC, there is probably an expected implicit line terminator. It might help 
simplify the grammar to implement it in the lexer.





Re: C preprocessor

2020-08-14 Thread Hans Åberg


> On 13 Aug 2020, at 07:49, Giacinto Cifelli  wrote:
> 
> I am wondering if it is possible to interpret a c-preprocessor (the second
> preprocessor, not the one expanding trigrams and removing "\\\n") or an m4
> grammar through bison, and in case if it has already been done.
> I think  this kind of tool does not produce a type-2 Chomsky grammar,
> rather a type-1 or even type-0.
> Any idea how to build something like an AST from it?

There is a Yaccable C-grammar:
  http://www.quut.com/c/ANSI-C-grammar-y.html





Re: Which lexer do people use?

2020-07-04 Thread Hans Åberg


> On 3 Jul 2020, at 23:15, Daniele Nicolodi  wrote:
> 
> Which other scanners do people use?

You might ask this question in the Usenet newsgroup comp.compilers.





Re: Token value in custom error reporting

2020-06-18 Thread Hans Åberg


> On 18 Jun 2020, at 19:21, Akim Demaille  wrote:
> 
>> Le 18 juin 2020 à 19:11, Hans Åberg  a écrit :
>> 
>>> On 18 Jun 2020, at 18:56, Akim Demaille  wrote:
>>> 
>>> I have already explained what I don't think this is a good idea.
>>> 
>>> https://lists.gnu.org/r/help-bison/2020-06/msg00017.html
>>> 
>>> I also have explained that scanner errors should be handled
>>> by the scanner.  For instance, in the bistro, you can read:
>>> 
>>> int
>>> yylex (const char **line, YYSTYPE *yylval, YYLTYPE *yylloc)
>>> {
>>> int c;
>>> 
>>> [...]
>>> 
>>> switch (c)
>>>  {
>>> [...]
>>>// Stray characters.
>>>  default:
>>>yyerror (yylloc, "syntax error: invalid character: %c", c);
>>>return TOK_YYerror;
>>>  }
>>> }
>>> 
>>> Cheers!
>> 
>> Is that not the case, which I responded to, where you get double error 
>> messages, both from the lexer and parser?
> 
> No, that's the whole point of YYerror.
> 
> In the news of 3.6:
> 
> *** Returning the error token
> 
>  When the scanner returns an invalid token or the undefined token
>  (YYUNDEF), the parser generates an error message and enters error
>  recovery.  Because of that error message, most scanners that find lexical
>  errors generate an error message, and then ignore the invalid input
>  without entering the error-recovery.
> 
>  The scanners may now return YYerror, the error token, to enter the
>  error-recovery mode without triggering an additional error message.  See
>  the bistromathic for an example.

Ah, I thought one should have something like that.

Otherwise, in your link above you suggest not using the semantic value in error 
messages, but when using locations, the location contains the token delimitations. So 
there seems to be no advantage in letting the lexer generate the error.





Re: Token value in custom error reporting

2020-06-18 Thread Hans Åberg


> On 18 Jun 2020, at 18:56, Akim Demaille  wrote:
> 
>> Le 18 juin 2020 à 14:54, Hans Åberg  a écrit :
>> 
>> In my C++ parser, the lexer has rule
>> .  { return my_parser::token::token_error; }
>> 
>> When it triggers, I get the error:
>> :21.1: error: syntax error, unexpected token error
>> 
>> It might be nicer to actually write out this token, though.
> 
> I have already explained why I don't think this is a good idea.
> 
> https://lists.gnu.org/r/help-bison/2020-06/msg00017.html
> 
> I also have explained that scanner errors should be handled
> by the scanner.  For instance, in the bistro, you can read:
> 
> int
> yylex (const char **line, YYSTYPE *yylval, YYLTYPE *yylloc)
> {
>  int c;
> 
> [...]
> 
>  switch (c)
>{
> [...]
>  // Stray characters.
>default:
>  yyerror (yylloc, "syntax error: invalid character: %c", c);
>  return TOK_YYerror;
>}
> }
> 
> Cheers!

Is that not the case, which I responded to, where you get double error 
messages, both from the lexer and parser?







Re: Token value in custom error reporting

2020-06-18 Thread Hans Åberg


> On 18 Jun 2020, at 10:24, Daniele Nicolodi  wrote:
> 
> On 18/06/2020 00:39, Akim Demaille wrote:
>> 
>> Would you have an example of what you mean?
> …
> In the existing code, on error the lexer emits a LEX_ERROR token. This
> results in a grammar error that triggers error recovery (good) but also
> in an extra error emitted by Bison (bad). Right now the code checks the
> error messages in yyerror() and suppresses the unwanted error reporting
> if it contains the string "LEX_ERROR”.

In my C++ parser, the lexer has rule
.  { return my_parser::token::token_error; }

When it triggers, I get the error:
  :21.1: error: syntax error, unexpected token error

It might be nicer to actually write out this token, though.





Re: Dynamic tokens

2020-02-03 Thread Hans Åberg


> On 3 Feb 2020, at 16:33, Ervin Hegedüs  wrote:
…
>>> Example from the language:
>>> @eq 1
>>> @lt 2
>>> @streq foo
>>> 
>>> The problem is that the LANG_OP_ARGUMENT could be anything - for example,
>>> that could be also starts with "@". So, the next expression is valid:
>>> 
>>> @streq @streq
>> 
>> So here you might have a context switch that is set when the operator token 
>> comes, that says that the next token, even if it is a valid operator name, 
>> should be treated as an argument. Then, when the argument is finished, set the 
>> switch back.
>> 
>>> Now I'm using this rules:
>>> @[a-z][a-zA-Z0-9]+ { BEGIN(ST_LANG_OP); return LANG_OP; }
>>> 
>>> 
>>> but now the operator isn't optional.
>> 
>> Something must follow in the grammar, so the switch may be set back in the 
>> grammar. Check in the .output file for clues.
> 
> so, you think (if I understand correctly) something like this:
> 
> @[a-z][a-zA-Z0-9]+  { BEGIN(ST_LANG_OP); if(op_valid(yytext); { return 
> LANG_OP; } else { ... } }
> ….

Something like that.





Re: Dynamic tokens

2020-02-03 Thread Hans Åberg


> On 2 Feb 2020, at 20:29, Ervin Hegedüs  wrote:
> 
> is there any way to make a parser with "dynamic" tokens?
> 
> I mean in compiling time I don't know the available tokens.

It is not possible to have dynamically created token values, …

> Now I describe
> the necessary token with regex, but I bumped into a problem.

… but it might be possible to use augmented methods.

> The language syntax is some like this:
> 
> EXPRESSION: LANG_OP LANG_OP_ARGUMENT | LANG_OP_ARGUMENT
> 
> where (as you can see) the LANG_OP is optional. If there isn't LANG_OP,
> that means that is the most usable operator (namely "@rx" in my case). The
> syntax of the operator (with regex): "@[a-z][a-zA-Z0-9]+".
> 
> Example from the language:
> @eq 1
> @lt 2
> @streq foo
> 
> The problem is that the LANG_OP_ARGUMENT could be anything - for example,
> that could be also starts with "@". So, the next expression is valid:
> 
> @streq @streq

So here you might have a context switch that is set when the operator token 
comes, that says that the next token, even if it is a valid operator name, 
should be treated as an argument. Then, when the argument is finished, set the 
switch back.

> Now I'm using this rules:
> @[a-z][a-zA-Z0-9]+ { BEGIN(ST_LANG_OP); return LANG_OP; }
> 
> 
> but now the operator isn't optional.

Something must follow in the grammar, so the switch may be set back in the 
grammar. Check in the .output file for clues.

> If I write in the language:
> 
> "@rx" that means that's an operator argument, without operator.

One can have a symbol table that stores all operator names. If a name is not there, 
return it as an identifier. This way, one can dynamically define new operators.

If, further, the table stores the token value, it can be used for other objects, 
like variables that may have different syntax depending on type.
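A sketch of such a table, with invented token numbers (the real ones come from the Bison-generated header): the lexer action looks the matched name up and falls back to a plain identifier token when it is not a registered operator.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical token numbers standing in for the Bison-generated ones.
enum Token { TOK_IDENTIFIER = 258, TOK_LANG_OP = 259 };

// Known operator names; entries can be added at runtime, which is what
// makes the operators "dynamic".
std::map<std::string, Token> op_table = {
    {"@eq", TOK_LANG_OP}, {"@lt", TOK_LANG_OP}, {"@streq", TOK_LANG_OP},
};

// What the lexer rule's action would do: look the name up, and if it is
// not a registered operator, return it as a plain identifier.
Token classify(const std::string& name) {
    auto it = op_table.find(name);
    return it != op_table.end() ? it->second : TOK_IDENTIFIER;
}
```

A definition rule in the grammar would then insert into the table, so later occurrences of the new name lex as an operator.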





Re: Further C++ operators for position

2019-11-05 Thread Hans Åberg


> On 5 Nov 2019, at 07:51, Akim Demaille  wrote:
> 
>> Le 4 nov. 2019 à 21:16, Hans Åberg  a écrit :
>> 
>> 
>>> On 4 Nov 2019, at 18:12, Akim Demaille  wrote:
>>> 
>>>> Le 4 nov. 2019 à 17:03, Matthew Fernandez  a 
>>>> écrit :
>>>> 
>>>> The std::less implementation you suggest is to also lexicographically 
>>>> compare the filenames themselves? I’m not sure this makes sense, because 
>>>> source positions from two different files aren’t really orderable at all.
>>> 
>>> The point of defining std::less is to have an easy means to insert 
>>> positions in a sorted container, say std::map.  Now, the order in itself is 
>>> well defined, but may not reflect the order the user would like to see.
>>> 
>>> To be clear: I don't have a problem with std::less which I see as an 
>>> implementation detail, but operators such as <= and the like are different: 
>>> they express a total
> 
> (I meant "natural" here).
> 
>>> order that we can't implement easily.  
>> 
>> The total order is expressed via std::less in containers such as std::map, 
>> with undefined results if not fulfilling the specs for that.
> 
> Yes, but that's not my point.  I mean: it is not important std::less "means" 
> something natural, what matters is only that it's total and well-defined 
> (unless, of course, you make this order visible to the user).  So I wouldn't 
> mind defining std::less for position and locations.

The point of implementing it would be for use in containers like std::map, which 
assume that std::less defines a total order.

> But operator<= is expected to mean something natural (in addition to well 
> defined and total).  So I would not define such an operator (except with a 
> global offset/counter).

It is only in C++ that they are expected to relate to a total order; they are in 
common use with partially ordered sets [1]. In fact, I defined a type “order” with 
values unordered, less, equal, and greater, and it can be used to define a partial order. 

1. https://en.wikipedia.org/wiki/Partially_ordered_set
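A minimal sketch of that four-valued comparison, using a simplified stand-in for Bison's position class: positions in different files come out unordered, and everything else compares by line and column.

```cpp
#include <cassert>
#include <string>

// Four-valued comparison result; "unordered" covers positions in
// different files, which have no natural relative order.
enum class order { unordered, less, equal, greater };

struct position {                 // simplified stand-in for Bison's position
    std::string file;
    int line = 1;
    int column = 1;
};

// Partial order: positions compare only within the same file.
order compare(const position& a, const position& b) {
    if (a.file != b.file) return order::unordered;
    if (a.line != b.line) return a.line < b.line ? order::less : order::greater;
    if (a.column != b.column) return a.column < b.column ? order::less : order::greater;
    return order::equal;
}
```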


>>> In addition, think of C where you also have main.c that #include "foo.h" 
>>> somewhere, which results in main.c:1 (i.e., line 1) < foo.h:1 < ... < 
>>> foo.h:42 < ... < main.c:3.
>> 
>> Here the files are stacked, and if the nested files are closed after being 
>> read, the location pointers are dead.
> 
> W00t?  Typical parsers generate ASTs and typical ASTs are decorated with 
> locations.

Only that when opening an included file, one may use yyin = new 
std::ifstream(str), where all data, buffers, and locations are stacked. Then, 
after the file has been read, it is closed and the yyin pointer is deallocated.





Re: Further C++ operators for position

2019-11-04 Thread Hans Åberg


> On 4 Nov 2019, at 18:12, Akim Demaille  wrote:
> 
>> Le 4 nov. 2019 à 17:03, Matthew Fernandez  a 
>> écrit :
>> 
>> The std::less implementation you suggest is to also lexicographically 
>> compare the filenames themselves? I’m not sure this makes sense, because 
>> source positions from two different files aren’t really orderable at all.
> 
> The point of defining std::less is to have an easy means to insert positions 
> in a sorted container, say std::map.  Now, the order in itself is well 
> defined, but may not reflect the order the user would like to see.
> 
> To be clear: I don't have a problem with std::less which I see as an 
> implementation detail, but operators such as <= and the like are different: 
> they express a total order that we can't implement easily.  

The total order is expressed via std::less in containers such as std::map, with 
undefined results if not fulfilling the specs for that.

> In addition, think of C where you also have main.c that #include "foo.h" 
> somewhere, which results in main.c:1 (i.e., line 1) < foo.h:1 < ... < 
> foo.h:42 < ... < main.c:3.

Here the files are stacked, and if the nested files are closed after being 
read, the location pointers are dead.

> If we want a total order here, it's actually easy: positions should have a 
> counter somewhere which is the *total* "offset" since the first byte of the 
> first file.  Or something like that.





Re: Further C++ operators for position

2019-11-04 Thread Hans Åberg


> On 4 Nov 2019, at 07:52, Akim Demaille  wrote:
> 
>> Le 4 nov. 2019 à 05:27, Matthew Fernandez  a 
>> écrit :
>> 
>> I recently had a use case for comparing source positions coming out of a C++ 
>> Bison-produced parser. While operator== and operator!= are implemented on 
>> the position class [0], the ordering operators (<, <=, >, >=) are not. It 
>> was relatively straightforward to implement these myself, but I was 
>> wondering if these were of wider use and should live upstream in Bison’s 
>> position implementation. Perhaps there is some history behind this or some 
>> deliberate omission of these operators? Just wanted to ask if there’s a 
>> reason these don’t already exist before thinking about posting a patch. I’m 
>> not subscribed to the list, so please CC me in replies.
> 
> The semantics for line and columns are quite clear, so comparing Positions in 
> the same file is quite well defined.
> 
> But what should you do when the files are different?  (And Locations are 
> intervals, so there's no way to compare them totally in a natural order.)
> 
> What we can do, though, is offer implementations for std::less, that would 
> blindly apply the lexicographic order in both cases.
> 
> But the case of file names remains troublesome: should we compare the pointer 
> addresses (super fast, but non deterministic) or the pointees (super slow, 
> but deterministic)?

As it is not semantically well defined, but one might want a total order 
for use in types like std::map, a pointer comparison might be used. Also 
containers like std::unordered_set have a total order through the iterators, so 
it fits with C++ paradigms, I would think.
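A sketch of the pointer-comparison idea, again with a simplified stand-in for the position type: the comparator is total and fast, so it satisfies std::map's requirements, but the cross-file order is arbitrary (pointer addresses) and not stable across runs.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

struct position {                 // simplified stand-in for Bison's position
    const std::string* filename;  // Bison's C++ position also holds a filename pointer
    int line;
    int column;
};

// Total order suitable for std::map: compare the filename *pointers*
// first (fast, but arbitrary across files), then line and column.
struct position_less {
    bool operator()(const position& a, const position& b) const {
        return std::tie(a.filename, a.line, a.column)
             < std::tie(b.filename, b.line, b.column);
    }
};
```

It can then be used as std::map<position, int, position_less>.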





Re: [Mesa-dev] Mesa (master): glsl: do not use deprecated bison-keyword

2019-05-22 Thread Hans Åberg

> On 22 May 2019, at 15:41, Hans Åberg  wrote:
> 
>> Otherwise we'd need to
>> commit the generated files, which is tricky because the location
>> depends on where the build-dir is located, and would probably not play
>> well with different developers having different versions, leading to
>> variance in the generated result.
>> 
>> I don't really think this would work for us in Mesa. We no longer
>> generate the distribution using autotools since we switched to Meson,
>> we just do git archive these days.
> 
> Bison itself does not have any of that in the archive, I think.

Actually, it does, and also the Flex general lexer. But perhaps it is a special 
case, as it self-compiles.



___
help-bison@gnu.org https://lists.gnu.org/mailman/listinfo/help-bison

Re: [Mesa-dev] Mesa (master): glsl: do not use deprecated bison-keyword

2019-05-22 Thread Hans Åberg

> On 22 May 2019, at 15:31, Erik Faye-Lund  wrote:
> 
> On Wed, 2019-05-22 at 15:21 +0200, Hans Åberg wrote:
>> 
>> One can set the distribution so that the Bison sources are only re-
>> compiled if modified.
> 
> This would only work for tarballs, though, no?

That is what I have in mind.

> Otherwise we'd need to
> commit the generated files, which is tricky because the location
> depends on where the build-dir is located, and would probably not play
> well with different developers having different versions, leading to
> variance in the generated result.
> 
> I don't really think this would work for us in Mesa. We no longer
> generate the distribution using autotools since we switched to Meson,
> we just do git archive these days.

Bison itself does not have any of that in the archive, I think. One does:

# Getting branch 'maint' from the archive:
git clone -b maint https://git.savannah.gnu.org/git/bison.git

# Then, cf. file README-hacking:
cd bison/

git submodule update --init

./bootstrap
./configure
make

make check
git diff




Re: [Mesa-dev] Mesa (master): glsl: do not use deprecated bison-keyword

2019-05-22 Thread Hans Åberg


> On 22 May 2019, at 08:54, Erik Faye-Lund  wrote:
> 
> The problem is that Bison doesn't seem to have any mechanism for doing
> statements like these conditionally. The only way around that that I
> can see is to pre-process the source files somehow. But especially with
> three different build systems, that's not really a venue I find
> particularly attractive. If someone can think of a neat solution, I
> would certainly love to hear about it :)

One can set the distribution so that the Bison sources are only re-compiled if 
modified.





Re: how to get left hand side symbol in action

2019-05-10 Thread Hans Åberg

> On 10 May 2019, at 07:24, Akim Demaille  wrote:
> 
> 1. there is a real and valid need for the feature, which I still need
>   to be convinced of, especially because symbol names are technical
>   details!

One can also write better error messages by using these internal yytname_ table 
names:

If one checks in a lookup table whether the name has already been defined, and 
it has, then one can give information about that already-present name. For 
example:
  “name” {
	std::optional<std::pair<int, my::semantic_type>> x0 = 
my::symbol_table.find($x.text);

  if (x0) {
throw syntax_error(@x, "Name " + $x.text + " already defined in this 
scope as "
  + yytnamerr_(yytname_[x0->first - 255]));
  }
…
  }

Right now, all parts are internal and may change: the token translation 
x0->first - 255, yytname_ lookup, and error message cleanup yytnamerr_.




Re: how to get left hand side symbol in action

2019-05-10 Thread Hans Åberg


> On 10 May 2019, at 07:24, Akim Demaille  wrote:
> 
>> In practice you just need the symbol name as is. Nobody needs the 
>> translation,
> 
> I beg to disagree.  Nobody should translate the keyword "break",
> but
> 
>> # bison /tmp/foo.y
>> /tmp/foo.y:1.7: erreur: erreur de syntaxe, : inattendu, attendait char ou 
>> identifier ou <tag>
>>1 | %token: FOO
>>  |   ^
> 
> looks stupid; "char", "identifier" and "<tag>" should be translated.

I think it should only output whatever is in the yytname_ table. Does not the 
translation of that take place dynamically in the error message?





Re: how to get left hand side symbol in action

2019-05-09 Thread Hans Åberg

> On 9 May 2019, at 08:50, Akim Demaille  wrote:
> 
>> Le 6 mai 2019 à 22:45, Hans Åberg  a écrit :
>> 
>>> On 6 May 2019, at 18:09, Akim Demaille  wrote:
>>> 
>>>> Le 6 mai 2019 à 14:50, Hans Åberg  a écrit :
>>>> 
>>>>> On 6 May 2019, at 11:28, r0ller  wrote:
>>>>> 
>>>>> Hi All,
>>>>> 
>>>>> Is it possible in *any* way to get the left hand side symbol in an action 
>>>>> of a rule? Say, I have:
>>>>> 
>>>>> A : B C
>>>>> {
>>>>> std:cout<<"left hand side symbol is:"<>>>> };
>>>>> 
>>>>> I tried to find it out myself and googled a lot but didn't find anything:(
>>>> 
>>>> In the C++ parser, one can write:
>>>> std::cout << "LHS: " << yytname_[yylhs.type_get()] << std::endl;
>>> 
>>> But it's an internal detail, there is no guarantee it won't change.
>> 
>> Right, so it might be a feature request for the longer term.
> 
> I'm trying to see what would make sense.
> 
>> Perhaps a variation of $ and @ that gives access to the name,
> 
> I am very uncomfortable with this.  Symbol names are technical details,
> most of the time they are irrelevant to the end user, just like the
> the user of a piece of software does not care about the names of the
> functions: that's a implementation detail.
> 
> In addition, tokens have several names: the identifier, and the
> string name, like
> 
> %token  ID "identifier"
> 
> Not to mention that I also want to provide support for
> internationalization.  So what name should that be? ID?
> identifier? or identifiant in French?
> 
> Of course when you debug a grammar, the names of the symbols
> are very important, and that's why the debug traces need the
> symbol names.  Again, like when you debug a program: then
> function names matter.
> 
> In the present case, I believe that the names that r0ller want
> should really be part of *his* specification, they should
> not come from internal details such as the symbol name.  So
> I do think it is saner that the names are explicitly put in the
> action.
> 
>> or the raw stack value in case there are more stuff to access.
> 
> Which only exists in lalr1.cc.  And I think r0ller is using
> glr.cc.  Maybe once Valentin is done there will be symbols.

Perhaps it is best to see what he wants, which looks complicated, and perhaps 
provide something more stable. I can’t recall any other request for the grammar 
variable names.




Re: Parsing user-defined types

2019-05-08 Thread Hans Åberg

> On 8 May 2019, at 22:30, EML  wrote:
> 
> Sometimes, to make the grammar manageable, the lexer has to *dynamically* 
> return 'typename' instead of 'identifier'. Only semantic analysis can 
> determine what is a user-defined type (say 'foo'), so the lexer must be told 
> at runtime that 'foo' is a 'typename' and not an 'identifier'.

That is done by the method I indicated. In flex, have a rule:

identifier  [[:alpha:]][[:alnum:]]*

%%

{identifier} {
  std::optional<std::pair<int, YYSTYPE>> x = 
lookup_table.find(yylval.text);

  if (!x)
return my::yyparser::token::identifier;

  // Set semantic value return to x->second.

  return x->first;
}

The Bison parser will then get the token of whatever the identifier has been 
defined to. It will have rules like:

%token int_definition
%token int_variable
%token identifier

%%

definition:
  int_definition identifier[x] value[v] {
lookup_table.push($x, {my::yyparser::token::int_variable, $v});
  }

use_value:
  … int_variable[x] … { … $x … }
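The two halves of that scheme can be sketched together (token numbers and helper names invented for illustration): the definition action registers the name with its token and semantic value, and the lexer's identifier rule consults the table on every match.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical token numbers; the real ones come from the Bison-generated header.
enum Token { TOK_IDENTIFIER = 258, TOK_INT_VARIABLE = 259 };

struct entry { Token token; int value; };        // token plus semantic value
std::map<std::string, entry> lookup_table;

// Lexer side: an identifier that has been defined comes back with the
// token (and value) stored at definition time; otherwise it is a name.
Token lex_identifier(const std::string& name, int* semantic_out) {
    auto it = lookup_table.find(name);
    if (it == lookup_table.end())
        return TOK_IDENTIFIER;
    *semantic_out = it->second.value;
    return it->second.token;
}

// Parser side: the action of "int_definition identifier value" registers
// the name, so later uses of it lex as TOK_INT_VARIABLE.
void define_int(const std::string& name, int value) {
    lookup_table[name] = {TOK_INT_VARIABLE, value};
}
```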




Re: Parsing user-defined types

2019-05-08 Thread Hans Åberg


> On 8 May 2019, at 19:48, EML  wrote:
> 
> I'm having trouble seeing how to handle user-defined types without lots of 
> feedback from the parser to the lexer. For example, consider a C-like 
> language with a struct declaration:
> 
> foo() {
>  struct a {...};  // type defn
>  struct a b;  // declare object 'b' of user-defined type 'a'
> }
> 
> this is easy to parse, but if you add a typedef, or go to C++, you can have 
> code that looks like this:
> 
> foo() {
>  struct a {...};
>  a b;
> }
> 
> With a simple flex/bison setup this is likely to lead to a lot of conflicts. 
> So how do you handle this? Do you just work through the conflicts, if 
> possible, or is this a job for a hand-coded lexer, which can be told about 
> new types at runtime?

One can store the Bison token value on the lookup table that the lexer uses. So 
the lexer matches the identifier, then checks if it has been defined, and 
if so, returns its token and semantic values. Otherwise it is just a name, like 
in a definition, whose Bison rule action puts it on the lookup table.




Re: how to get left hand side symbol in action

2019-05-06 Thread Hans Åberg

> On 6 May 2019, at 18:09, Akim Demaille  wrote:
> 
>> Le 6 mai 2019 à 14:50, Hans Åberg  a écrit :
>> 
>> 
>>> On 6 May 2019, at 11:28, r0ller  wrote:
>>> 
>>> Hi All,
>>> 
>>> Is it possible in *any* way to get the left hand side symbol in an action 
>>> of a rule? Say, I have:
>>> 
>>> A : B C
>>> {
>>>   std:cout<<"left hand side symbol is:"<>> };
>>> 
>>> I tried to find it out myself and googled a lot but didn't find anything:(
>> 
>> In the C++ parser, one can write:
>> std::cout << "LHS: " << yytname_[yylhs.type_get()] << std::endl;
> 
> But it's an internal detail, there is no guarantee it won't change.

Right, so it might be a feature request for the longer term. Perhaps a 
variation of $ and @ that gives access to the name, or the raw stack value in 
case there are more stuff to access.




Re: how to get left hand side symbol in action

2019-05-06 Thread Hans Åberg

> On 6 May 2019, at 15:21, uxio prego  wrote:
> 
>> On 6 May 2019, at 14:50, Hans Åberg  wrote:
>> 
>>> On 6 May 2019, at 11:28, r0ller  wrote:
>>> 
>>> Is it possible in *any* way to get the left hand side symbol in an action 
>>> of a rule? Say, I have:
>>> 
>>> A : B C
>>> {
>>>   std:cout<<"left hand side symbol is:"<>> };
>> 
>> In the C++ parser, one can write:
>> std::cout << "LHS: " << yytname_[yylhs.type_get()] << std::endl;
>> 
>> Used for debugging, perhaps there is a more reliable macro.
> 
> Thanks for the hint.
> Do you mean you know that it won’t work in a C parser or yacc compatibility?

It probably has a similar thing there. Check in the generated parser.




Re: how to get left hand side symbol in action

2019-05-06 Thread Hans Åberg

> On 6 May 2019, at 11:28, r0ller  wrote:
> 
> Hi All,
> 
> Is it possible in *any* way to get the left hand side symbol in an action of 
> a rule? Say, I have:
> 
> A : B C
> {
> std:cout<<"left hand side symbol is:"< };
> 
> I tried to find it out myself and googled a lot but didn't find anything:(

In the C++ parser, one can write:
	std::cout << "LHS: " << yytname_[yylhs.type_get()] << std::endl;

Used for debugging, perhaps there is a more reliable macro.




Re: Compilation error in ancient YACC code

2019-04-16 Thread Hans Åberg
[Please keep the cc to the list so that others can follow the issue.]

> On 16 Apr 2019, at 01:23, Chris Bragg  wrote:
> 
> Hans,
> Thanks for the suggestion. It worked and I now have a YACC output file.
> Very grateful for your response.

Good.





Re: Compilation error in ancient YACC code

2019-04-15 Thread Hans Åberg

> On 15 Apr 2019, at 22:37, Chris Bragg  wrote:
> 
> The YACC compiler generates only two errors. I suspect that the problem is a
> minor one and that a small fix to the syntax will fix the problem. I am not
> a YACC programmer - it is a complete mystery to me - and I was hoping a
> posting of the yacc source code up to the point the errors occur and the
> error report itself would allow someone to spot the problem. 
> 
> The YACC errors reported are follows:
> 
> calc.y:121.14: error: syntax error, unexpected =
> 
>   expr = { Expression = $1;};
> 
>  ^
> 
> calc.y:123.23: error: syntax error, unexpected =
> 
>   '('  expr ')' = { $$

Just take away that = before all actions {…}. Somebody else might enlighten us on 
the history of that.




Re: simple problem

2019-02-24 Thread Hans Åberg


> On 23 Feb 2019, at 23:19, workbe...@gmx.at  wrote:
> 
> Hi again, i have the following code:

Here is slightly different code, for example with input "x1 y2 34 ? exit". 
The lexer will rescan unless there is an action return, which happens for 
"exit". Typically in a lexer, whitespace is eaten, dividing up the tokens.

Otherwise, Flex has a mailing list, cf.
  https://en.wikipedia.org/wiki/Flex_(lexical_analyser_generator)
The manual is good, too.

Also, there is the Usenet newsgroup comp.compilers for general questions about 
parsing, mostly advanced.

--
%{
#include <stdio.h>
#include <stdlib.h>
%}

%option noyywrap

identifier  [[:alpha:]][[:alnum:]]*
integer [[:digit:]]+

%%

[ \f\r\t\v\n]+  { /* eat whitespace */ }
"exit"  { printf("Exit!\n"); return 0; }
{identifier}{ printf("%s is an identifer\n", yytext); }
{integer}   { printf("%s is an integer\n", yytext); }
.   { printf("/* %s: do nothing*/\n", yytext); }

%%

int main(int argc, char **argv)
{

printf("Start\n");
yylex();
printf("End\n");

return(0);
}
--





Re: Not able to use %union?

2019-02-18 Thread Hans Åberg


> On 18 Feb 2019, at 04:37, Peng Yu  wrote:
> 
> I use rapidstring to make the string operations use and use the Boehm
> garbage collector so that I don't have to always remember to release
> the memory.
> 
> https://github.com/boyerjohn/rapidstring
> 
> Because I want to use the previously allocated memory, I don't want to
> call "rs_init(>str)" in any action.

That is probably not needed (Akim may clarify): Without a GC, just hand over 
the lexer allocated pointer to the parser marked with a type, and set 
%destructor for this type to handle error cleanup when the parser stack 
unwinds; otherwise deallocate in the rules.





Re: Not able to use %union?

2019-02-18 Thread Hans Åberg


> On 18 Feb 2019, at 04:37, Peng Yu  wrote:
> 
> I use rapidstring to make the string operations use and use the Boehm
> garbage collector so that I don't have to always remember to release
> the memory.
> 
> https://github.com/boyerjohn/rapidstring
> 
> Because I want to use the previously allocated memory, I don't want to
> call "rs_init(>str)" in any action. So YYSTYPE must be a
> struct instead of a union.

The Boehm GC can probably find pointers in unions; a link suggested there might 
be a problem with guessing correctly for packed structs.





Re: bison info doc - precedence in recursive parsing

2019-02-18 Thread Hans Åberg

> On 18 Feb 2019, at 08:28, Akim Demaille  wrote:
> 
>> Le 17 févr. 2019 à 23:00, Hans Åberg  a écrit :
>> 
>>> On 17 Feb 2019, at 16:19, Akim Demaille  wrote:
>>> 
>>>> Le 10 févr. 2019 à 15:20, Hans Åberg  a écrit :
>>>> 
>>>>> On 10 Feb 2019, at 11:07, Akim Demaille  wrote:
>>>>> 
>>>>> [*.dot vs. *.gv]
>>>>> But it's too late to change the default behavior.
>>>> 
>>>> You might change it, as it is not usable on real life grammars.
>>> 
>>> You have a point :)
>> 
>> Or a dot! :-)
> 
> Yes, I'll gv you that :)

:-)



Re: How to decide what to put in the lexer and the grammar respectively?

2019-02-18 Thread Hans Åberg

> On 18 Feb 2019, at 06:44, Akim Demaille  wrote:
> 
>> Le 18 févr. 2019 à 00:10, Hans Åberg  a écrit :
>> 
>>> On 17 Feb 2019, at 23:10, Peng Yu  wrote:
>>> 
>>> This lexical tie-in creates feedback from the parser to the lexer. So
>>> the lexer cannot be tested standalone.
>>> 
>>> But the principle of separating lexer and parser is to make parser
>>> builtin on top of the parser. Is there something that can avoid the
>>> feedback yet still allow context-dependent parsing? Alternatively, how
>>> to just testing the lexer without having to get the parser involved?
>> 
>> The LALR(1) parsing that Bison uses is for context-free grammars only, so contexts 
>> must involve switches somehow.
> 
> I don't think Peng was referring to context-sensitivity here in the
> technical sense.  He just means "lexical context depending on the
> current state of the parser", which is still CFG.  Let's not confuse
> people by referring to something unrelated.

Subsequent discussions might clarify that.




Re: How to decide what to put in the lexer and the grammar respectively?

2019-02-17 Thread Hans Åberg


> On 17 Feb 2019, at 23:10, Peng Yu  wrote:
> 
> This lexical tie-in creates feedback from the parser to the lexer. So
> the lexer cannot be tested standalone.
> 
> But the principle of separating lexer and parser is to make parser
> builtin on top of the parser. Is there something that can avoid the
> feedback yet still allow context-dependent parsing? Alternatively, how
> to just testing the lexer without having to get the parser involved?

The LALR(1) parsing that Bison uses is for context-free grammars only, so contexts must 
involve switches somehow. Think of a definition which changes a name into an 
identifier, implemented by putting it on a (typically stacked) lookup table, 
which the lexer checks, and returns its token value to the parser. A grammar 
definition alone would require an attribute grammar.
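A sketch of the typically stacked lookup table (token numbers invented for illustration): each block pushes a scope, definitions go into the innermost one, and the lexer searches outward, so the same name can lex differently in different contexts.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical token numbers; the real ones come from the Bison-generated header.
enum Token { TOK_NAME = 258, TOK_TYPENAME = 259 };

// A stack of scopes: entering a block pushes a table, leaving pops it.
std::vector<std::map<std::string, Token>> scopes(1);

void enter_scope() { scopes.push_back({}); }
void leave_scope() { scopes.pop_back(); }
void define_type(const std::string& n) { scopes.back()[n] = TOK_TYPENAME; }

// The lexer searches from the innermost scope outward; an undefined
// name comes back as a plain TOK_NAME.
Token classify(const std::string& n) {
    for (auto it = scopes.rbegin(); it != scopes.rend(); ++it) {
        auto f = it->find(n);
        if (f != it->end()) return f->second;
    }
    return TOK_NAME;
}
```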





Re: bison info doc - precedence in recursive parsing

2019-02-17 Thread Hans Åberg

> On 17 Feb 2019, at 16:19, Akim Demaille  wrote:
> 
>> Le 10 févr. 2019 à 15:20, Hans Åberg  a écrit :
>> 
>>> On 10 Feb 2019, at 11:07, Akim Demaille  wrote:
>>> 
>>> [*.dot vs. *.gv]
>>> But it's too late to change the default behavior.
>> 
>> You might change it, as it is not usable on real life grammars.
> 
> You have a point :)

Or a dot! :-)

> But it does not mean it will not break something for someone.
> Maybe bound to %require "3.4", why not.

Since .dot does not work in some editors, it might be good to use .gv.




Re: How to decide what to put in the lexer and the grammar respectively?

2019-02-17 Thread Hans Åberg


> On 17 Feb 2019, at 17:36, Peng Yu  wrote:
> 
> But how to recognize the nested parameter expansion assignment in the
> first place? The lexer should have builtin states to capture paired
> `{` `}`, and use states to remember whether it is in substring
> extraction or pattern replacement in order to make sure to capture any
> errors at the level of the lexer.

Such matched pairs can be recognized in the lexer by using an integer starting 
at 0, adding 1 for each '{' and subtracting 1 for each '}'. If one gets non-zero 
at the end of the expression, there is a mismatch. The problem is how to 
recognize the end of the expression. The Bison parser does that by a lookahead 
token if needed. So that might suggest to put it on the parser.
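The counting scheme can be sketched like this (a hypothetical helper, not part of any generated scanner; a real shell lexer would also have to skip braces inside strings):

```cpp
#include <cassert>
#include <string>

// +1 per '{', -1 per '}'. The count must never go negative (a '}' with
// no open '{') and must be zero at the end, otherwise the braces are
// mismatched.
bool braces_balanced(const std::string& text) {
    int depth = 0;
    for (char c : text) {
        if (c == '{') ++depth;
        else if (c == '}' && --depth < 0) return false;
    }
    return depth == 0;
}
```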





Re: bison info doc - precedence in recursive parsing

2019-02-17 Thread Hans Åberg

> On 17 Feb 2019, at 16:18, Akim Demaille  wrote:
> 
>> Le 10 févr. 2019 à 15:10, Hans Åberg  a écrit :
>> 
>> 
>>> On 5 Feb 2019, at 07:18, Akim Demaille  wrote:
>>> 
>>> This feature is very handy for small grammars, but when it gets too big, 
>>> you'd better look at the HTML report (or text).
>> 
>> I made a graph for the grammar itself, using the shape=record feature, for 
>> the calc++ example. Might be of help when designing a grammar.
> 
> Can you send the result?  So that we understand better what you mean.

Attached is a version with the rule expansions dotted. It illustrates the 
language: the set of all expansions from the start symbol that end with 
terminals in all branches.



grammar.gv
Description: Binary data

Re: Is it always possible to make a non-reentrant parser reentrant?

2019-02-12 Thread Hans Åberg


> On 9 Feb 2019, at 00:21, Peng Yu  wrote:
> 
>> %x INITIAL HEREDOC
> 
> I see %x is from flex. Bash can support nested heredoc. How can it be
> implemented in flex?

For nested environments one uses a stack, if there are local variables to keep 
track of. Such switches will then appropriately have to be turned on and off.
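A minimal sketch of such a stack of lexer states, with hypothetical names; Flex offers the same idea natively through its start-condition stack (yy_push_state()/yy_pop_state() with %option stack):

```cpp
#include <cassert>
#include <stack>

// Each nested environment, e.g. a heredoc, pushes its state on entry and
// pops it on exit, restoring the enclosing one.
enum LexState { NORMAL, HEREDOC };

struct LexerStates {
    std::stack<LexState> states;
    LexerStates() { states.push(NORMAL); }        // outermost environment
    void enter(LexState s) { states.push(s); }    // e.g. start of a heredoc
    void leave() { states.pop(); }                // back to enclosing state
    LexState current() const { return states.top(); }
};
```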





Re: bison info doc - precedence in recursive parsing

2019-02-10 Thread Hans Åberg

> On 10 Feb 2019, at 11:07, Akim Demaille  wrote:
> 
>> Le 10 févr. 2019 à 00:12, Hans Åberg  a écrit :
>> 
>>> On 4 Feb 2019, at 07:32, Akim Demaille  wrote:
>>> 
>>> Make a full example, feed it to bison with --graph, and look at the 
>>> resulting graph.  You should understand what is going on (provided you 
>>> understand how LR parsers work).
>> 
>> According to [1], .gv is preferred as .dot can be confused with another 
>> format (for example, Xcode does).
>> 
>> 1. https://en.wikipedia.org/wiki/DOT_(graph_description_language)
> 
> Yes, I know, and that's what I do, as you have noticed in the Makefile.
> 
> %.c %.h %.xml %.gv: %.y
>$(BISON) $(BISONFLAGS) --defines --xml --graph=$*.gv -o $*.c $<
> 
> But it's too late to change the default behavior.

You might change it, as it is not usable on real life grammars.




Re: bison info doc - precedence in recursive parsing

2019-02-10 Thread Hans Åberg


> On 5 Feb 2019, at 07:18, Akim Demaille  wrote:
> 
> This feature is very handy for small grammars, but when it gets too big, 
> you'd better look at the HTML report (or text).

I made a graph for the grammar itself, using the shape=record feature, for the 
calc++ example. Might be of help when designing a grammar.





Re: bison info doc - precedence in recursive parsing

2019-02-09 Thread Hans Åberg


> On 4 Feb 2019, at 07:32, Akim Demaille  wrote:
> 
> Make a full example, feed it to bison with --graph, and look at the resulting 
> graph.  You should understand what is going on (provided you understand how 
> LR parsers work).

According to [1], .gv is preferred as .dot can be confused with another format 
(for example, Xcode does).

1. https://en.wikipedia.org/wiki/DOT_(graph_description_language)





Re: bison info doc - precedence in recursive parsing

2019-02-05 Thread Hans Åberg

> On 5 Feb 2019, at 18:56, Akim Demaille  wrote:
> 
>> Le 5 févr. 2019 à 10:28, Hans Åberg  a écrit :
>> 
>>> On 5 Feb 2019, at 07:18, Akim Demaille  wrote:
>>> 
>>> Yes, on "real life grammars", Dot fails to render anything.  And the result 
>>> would probably be useless anyway.  This feature is very handy for small 
>>> grammars, but when it gets too big, you'd better look at the HTML report 
>>> (or text).
>> 
>> It only generates XML, it seems: for HTML, using xsltproc, a style sheet is 
>> required, and Bison does not seem to come with that.
> 
> Yes it does.  Have a look at the Makefiles of the examples.  For instance, 
> that of lexcalc.

Ah, I only looked at calc++ and the manual. It worked with my rather large 
grammar.




Re: bison info doc - precedence in recursive parsing

2019-02-05 Thread Hans Åberg

> On 5 Feb 2019, at 07:18, Akim Demaille  wrote:
> 
>> Le 4 févr. 2019 à 23:50, Hans Åberg  a écrit :
>> 
>>> On 4 Feb 2019, at 22:59, Uxio Prego  wrote:
>>> 
>>> can’t remember any such graphviz failure, even with graphs
>>> so large, their output isn't actually useful, unless for navigating
>>> with e.g. xdot.
>>> 
>>> I however have only used -Tpng, never -Tpdf. Also no -O, but I
>>> guess that’s simple and works the same for all cases.
>> 
>> It didn't help with PNG, despite running for more than half an hour. 
>> Probably too big.
> 
> Yes, on "real life grammars", Dot fails to render anything.  And the result 
> would probably be useless anyway.  This feature is very handy for small 
> grammars, but when it gets too big, you'd better look at the HTML report (or 
> text).

It only generates XML, it seems: for HTML, using xsltproc, a style sheet is 
required, and Bison does not seem to come with that.




Re: bison info doc - precedence in recursive parsing

2019-02-04 Thread Hans Åberg

> On 4 Feb 2019, at 22:59, Uxio Prego  wrote:
> 
> can’t remember any such graphviz failure, even with graphs
> so large, their output isn't actually useful, unless for navigating
> with e.g. xdot.
> 
> I however have only used -Tpng, never -Tpdf. Also no -O, but I
> guess that’s simple and works the same for all cases.

It didn't help with PNG, despite running for more than half an hour. Probably 
too big.




Re: bison info doc - precedence in recursive parsing

2019-02-04 Thread Hans Åberg

> On 4 Feb 2019, at 07:32, Akim Demaille  wrote:
> 
> Make a full example, feed it to bison with --graph, and look at the resulting 
> graph.

I could not get any output using 'dot -Tpdf parser.dot -O'. Perhaps the grammar 
is too large; small .dot examples work.




Re: bison info doc - precedence in recursive parsing

2019-02-04 Thread Hans Åberg


> On 3 Feb 2019, at 07:50, an...@aakhare.in wrote:
> 
> The first effect of the precedence declarations is to assign precedence 
> levels to the terminal symbols declared. The second effect is to assign 
> precedence levels to certain rules: each rule gets its precedence from the 
> last terminal symbol mentioned in the components.

If you write the .output file using --verbose or %verbose, the precedence makes 
a choice at the parsing dot "." in the case of a shift/reduce conflict, see the 
Bison manual, sec. 5.2. I thought it was between the lookahead token and the 
token before the dot, but perhaps Akim can clarify.





Re: [Question]:Question about bison adding GPL copyright to the parser files generated by bison and yacc.c

2019-01-26 Thread Hans Åberg

> On 26 Jan 2019, at 23:54, Christian Schoenebeck  
> wrote:
> 
> On Samstag, 26. Januar 2019 22:29:08 CET Hans Åberg wrote:
>>> No, that's not what the exception says. The exception applies (and hence
>>> the freedom to distribute a Bison generated parser under any arbitrary,
>>> different license than GPL) only if the generated parser is not itself a
>>> parser generator. This is not as obvious as you might think. It really
>>> depends on what his generated parser is capable to do.
>> 
>> From a legal point of view, copyright applies to the code in the skeleton
>> file, as the other part is considered machine generated, like in an editor,
>> and not copyrightable.
> 
> Yes, the copyright applies to the skeleton. But if the exception does not 
> apply, and since your Bison generated parser contains the skeleton, the 
> result 
> would be that your entire application would be subject to the GPL.

And that was the case in some earlier versions.

> So the point of whether or not the exception applies to your Bison generated 
> parser, is crucial if you intend to use Bison for developing a proprietary 
> application.

Indeed.

>>> I give you a simple example: let's say you used Bison to develop a tool
>>> which converts source code from one programming language A to B. Now you
>>> might think this is not a parser generator. Well, it was obviously not
>>> your intention. But now consider somebody uses that conversion tool for
>>> converting a parser originally written in programming language A to
>>> language B.
>>> 
>>> Right, your Bison generated conversion tool just generated a parser.
>> 
>> A parser generator is not merely a program that generates a parser, but does
>> so from a grammar [1]. So the intent of the exception, I think, is that you
>> cannot use the skeleton as a part of a program like Bison, but perhaps
>> there is the need for some clarification.
>> 
>> 1. https://en.wikipedia.org/wiki/Compiler-compiler
> 
> That Wikipedia article says "The input *may* be a text file containing the 
> grammar written in BNF ... , although *other* definitions exist.", 
> immediately 
> followed by another type that is analogous to my example: meta compilers.
> 
> It is clear what the intention of the exception was: a) allowing people to 
> use 
> Bison for generating parsers also for propriety projects, but preventing b) 
> that somebody simply takes Bison's skeleton source code, adds the missing 
> pieces and distributes an entire unGPLed version of Bison.

That is how I parse it, too.

> But if you intend to use Bison for proprietary purposes, you should be aware 
> that the current wordings of the exception go far beyond of what was probably 
> intended for case b) and might thus indeed lead to a potential legal issue 
> for 
> your company if your Bison generated parser only has the smallest chance of 
> being capable to generate another parser.

The term "compiler-compiler" fell out of use, replaced by "parser generator", 
which I also prefer; other renderings of the latter may be what causes the 
confusion.

> Many big IT companies out there are using Bison extensively for generating 
> their arsenal of meta compilers, and if you take the exception text 
> literally, 
> and if they even used a GPLv3 Bison version ... well you get the idea.

The idea of GPL is to block proprietary use.

> And this issue is not just limited to meta compilers. There are many other 
> non 
> obvious use cases where you might theoretically get into the same situation.

Maybe Akim can clarify.




Re: [Question]:Question about bison adding GPL copyright to the parser files generated by bison and yacc.c

2019-01-26 Thread Hans Åberg

> On 26 Jan 2019, at 23:36, John P. Hartmann  wrote:
> 
> I'm glad we two at least agree here.  Then it follows that the copyright 
> notice should not be copied into the output file.  It is, if memory serves, a 
> trivial change to the skeleton files to make that happen.

It can be copied to the output file, as multiple copyrights can be applicable, 
though it is only valid for the part that comes from the skeleton file. Perhaps 
that should be clarified, too.

> On 1/26/19 22:28, Hans Åberg wrote:
>> From a legal point of view, copyright applies to the code in the skeleton 
>> file, as the other part is considered machine generated, like in an editor, 
>> and not copyrightable.




Re: [Question]:Question about bison adding GPL copyright to the parser files generated by bison and yacc.c

2019-01-26 Thread Hans Åberg

> On 26 Jan 2019, at 21:48, Christian Schoenebeck  
> wrote:
> 
> On Samstag, 26. Januar 2019 14:31:06 CET Hans Åberg wrote:
>>> On 21 Jan 2019, at 17:24, bird bravo  wrote:
>>>  I noticed that when I use bison and the parser skeleton(yacc.c) to
>>> 
>>> generate a parser file... there will be a copyright notice to claim the
>>> file is a GPLV3 and an exception declaration... I wanna know is that OK to
>>> use and distribute the parser file as I wish..
>> 
>> Yes, see [1]. The copyright applies to the skeleton code which is copied
>> over to the generated parser, but there is an exception added for that.
>> 
>> 1. https://www.gnu.org/software/bison/manual/html_node/Conditions.html
> 
> No, that's not what the exception says. The exception applies (and hence the 
> freedom to distribute a Bison generated parser under any arbitrary, different 
> license than GPL) only if the generated parser is not itself a parser 
> generator. This is not as obvious as you might think. It really depends on 
> what his generated parser is capable to do.

From a legal point of view, copyright applies to the code in the skeleton file, 
as the other part is considered machine generated, like in an editor, and not 
copyrightable.

> I give you a simple example: let's say you used Bison to develop a tool which 
> converts source code from one programming language A to B. Now you might 
> think 
> this is not a parser generator. Well, it was obviously not your intention. 
> But 
> now consider somebody uses that conversion tool for converting a parser  
> originally written in programming language A to language B.
> 
> Right, your Bison generated conversion tool just generated a parser.

A parser generator is not merely a program that generates a parser, but does so 
from a grammar [1]. So the intent of the exception, I think, is that you cannot 
use the skeleton as a part of a program like Bison, but perhaps there is the 
need for some clarification.

1. https://en.wikipedia.org/wiki/Compiler-compiler




Re: [Question]:Question about bison adding GPL copyright to the parser files generated by bison and yacc.c

2019-01-26 Thread Hans Åberg


> On 21 Jan 2019, at 17:24, bird bravo  wrote:
> 
>I noticed that when I use bison and the parser skeleton(yacc.c) to
> generate a parser file... there will be a copyright notice to claim the
> file is a GPLV3 and an exception declaration... I wanna know is that OK to
> use and distribute the parser file as I wish..

Yes, see [1]. The copyright applies to the skeleton code which is copied over 
to the generated parser, but there is an exception added for that.

1. https://www.gnu.org/software/bison/manual/html_node/Conditions.html





Re: No Bison 3 in untainted macOS installs

2018-12-21 Thread Hans Åberg


> On 21 Dec 2018, at 11:32, Uxio Prego  wrote:
> 
> 0. Consider a Yacc parser as a way to have a well
>   documented common ground for Bison 2 and potential
>   future releases of Bison departing from Bison 2 but not
>   departing from Yacc compatibility - simply my own
>   guess on the Bison roadmap, which I don't know, so I
>   might be much wrong. I'm thinking that one can depart
>   from Bison 2 much easier than from Yacc, as there are
>   a lot more printed documentation (and non printed too)
>   about Yacc than about Bison 2.

If the generated parser is distributed with the other sources, which is fine as 
it is platform independent, then only those that want to rewrite the parser 
would need Bison, and it is possible to use the latest version.

> 1. Once hit a multithreading problem (e.g. two threads are
>   potentially wanting to parse some doc or string at a same
>   time using the non reentrant but Yacc compatible Bison
>   parser) proxy all parsing needs in a single thread, in
>   order to avoid invalid access to the Yacc compatibility
>   mode static globals.

Simplest, if the parse is unique, is to do a single parse into an AST and then 
thread the evaluation of that. If the parse is not unique, there is the GLR 
parser, but not for C++ yet, I think. Otherwise, the Bison C++ parser is pure. 
(I don't recall whether you want to switch to C++, though.)

> 2. After applying (1.) and once hit a performance issue (i.e.
>   at some point or under certain circumstances one thread
>   is not enough to handle all the parsing needs),
>   convert the Yacc compatible parser in a Bison 2.3 pure
>   parser, if `master` Bison is still compatible with Bison 2.
>   Else convert the Yacc compatible parser in a `master`
>   Bison pure parser, and then demand installs of the
>   software on macOS to upgrade to contemporary Bison,
>   ideally leaving the user the choice between a standalone
>   Bison install, a Macports install, a Homebrew install,
>   docker strategies, etc., unless in order to handle the rest
>   of dependencies of the software using Bison, it strongly
>   encourages one or several of these choices over the
>   other choices.

So if the generated parser is distributed with the other sources, this would 
not be an issue for users, only developers.

> Fortunately I can't think of any case where I could want to
> spawn or join threads inside inner C blocks of grammar
> rules. Anyway I suppose one can do this, specially if one
> such thread is joined in the same inner C block which
> spawned it.

C++11 and later versions have thread support.

https://en.cppreference.com/w/cpp/thread





Re: No Bison 3 in untainted macOS installs

2018-12-21 Thread Hans Åberg


> On 21 Dec 2018, at 10:00, Uxio Prego  wrote:
> 
>> A threaded parser aside, I found threading difficult to debug in general, so 
>> good with a way to turn it off.
> 
> Not sure what do you mean. Threaded parser to me sounds like a *.y
> source code doc where threads are spawned and joined in the inner
> code blocks of grammar rules.

You snipped the part I replied to:

> - On performance or multithreading problems while in
> Yacc mode:
> 1. Proxy all parsing needs in a single thread and
>queue parsing requests.

I am not sure what you mean here. It looks like you want to make a threaded 
parser, which I haven't tried.





Re: Error with grammar arguments

2018-12-18 Thread Hans Åberg


> On 18 Dec 2018, at 20:06, Rob Casey  wrote:
> 
> Yes - That is correct Chris. I am allocating memory within Flex for tokens.
> This issue first manifest itself when I started to add code for the proper
> freeing on this memory (where no longer required) within grammar actions.

It looks like a lexer issue, with the same pointer being passed to the Bison 
parser twice. You might write out the strings to see whether the buffer is 
overwritten, or simply not written, the second time.

> On Wed, 19 Dec. 2018, 04:40 Chris verBurg  
>> Rob,
>> 
>> To ask a sanity question, you do strdup (or otherwise allocate fresh
>> memory for) yytext on the Flex side when returning tokens, yes?
>> 
>> -Chris
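The buffer-reuse pitfall being probed here can be sketched with a stand-in for the scanner (names hypothetical; in Flex the reused buffer is yytext):

```cpp
#include <cassert>
#include <cstring>
#include <string>

// A scanner that reuses one buffer for every token, the way Flex reuses
// yytext. Saving the raw pointer makes earlier tokens alias whatever was
// scanned last; copying the text (strdup in C, std::string here) gives
// each token its own storage.
struct FakeLexer {
    char buffer[32];                       // reused for every token
    const char* next(const char* text) {   // "scan" the next token
        std::strcpy(buffer, text);
        return buffer;                     // points into the shared buffer
    }
};
```

Saving the raw pointer in a semantic value and reading it after the next token has been scanned gives exactly the symptom described: the same storage, now holding the later token's text.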





Re: No Bison 3 in untainted macOS installs

2018-12-15 Thread Hans Åberg


> On 15 Dec 2018, at 19:31, Uxio Prego  wrote:
> 
>> In fact not so difficult. But it might be good to complement with a package 
>> manager.
> 
> Yes. This is maybe not going to be popular here, but
> in the light of the other replies I think the best for me
> would be to:
> 
> - Develop a 2 to 3 migration guide on an example,
>  smaller than the actual parser but staying relevant
>  to it; as much simple, or as much featured.
> - Stay in Bison 2.3 in early development, for
>  straightforward macOS use.
> - Move to contemporary Bison, if Bison ever moves
>  away from Yacc; for straightforward GNU/Linux use.
>  Then in macOS recommend Macports as per your
>  advice, although supporting Homebrew too, for users
>  convenience, as it is very popular.

The Bison generated parser is platform independent, so it can be distributed, 
and if you do that, you can move to the latest Bison as quickly as possible. 
Only those that want to rewrite the parser would then need Bison installed.

> - On performance or multithreading problems while in
>  Yacc mode:
>  1. Proxy all parsing needs in a single thread and
> queue parsing requests.
>  2. If that isn't enough throughput, move to
> contemporary Bison finally.

A threaded parser aside, I found threading difficult to debug in general, so it 
is good to have a way to turn it off.





Re: No Bison 3 in untainted macOS installs

2018-12-15 Thread Hans Åberg


> On 15 Dec 2018, at 17:58, Uxio Prego  wrote:
> 
> 
>> On 15 Dec 2018, at 17:02, Uxio Prego  wrote:
>> 
>>> Also, MacPorts [3] packages tend to be more up-to-date. It is installed in 
>>> /opt/, so /usr/local doesn't get cluttered. So it is possible to choose 
>>> what one wants, say GCC from MacPorts, and the original Clang from their 
>>> site in different builds, especially easy if you use Automake, which admits 
>>> out of source tree compilation.
>> 
>> Actually Homebrew clutters a specific `Cellar/` dir under
>> `/usr/local/`, so I see no effective benefit. And Homebrew
>> already seems really up to date.
> 
> You are right, it clutters /usr/local/ all over the place.
> I stand corrected. Thanks.

I think that they may make links into the directory you mentioned. They also 
used to require changing the permissions of /usr/local to a single user.

> In my opinion that leaves the discriminant to be which one
> of these is less vulnerable to package hijacking a la NPM.
> Ouch. No idea which one.

On MacPorts, one has to sudo. But normally, for security reasons, one should 
compile the package as an ordinary user and only do the install as root.





Re: No Bison 3 in untainted macOS installs

2018-12-15 Thread Hans Åberg


> On 15 Dec 2018, at 11:05, Uxio Prego  wrote:
> 
> Does anybody know when Bison 3 is going to be added to
> macOS?
> 
> Fortunately other projects such as Homebrew allow a
> straightforward selection of a cutting edge alternative Bison
> under `/usr/local/`, however isn't it a bit sad to add Homebrew
> as a dependency to some other software if it would be added
> **only** for de facto upgrading the ancient Bison 2.3 ~2006 to
> the contemporary Bison 3?

Bison installs in /usr/local/ directly from the sources in [1], and works fine 
with Apple's in-house version of Clang. Bison also needs m4, which can be 
installed the same way from [2]. Just compile with ./configure && make, and 
'make pdf' if you want the PDF manual, then sudo only for the install: 'sudo 
make install' and 'sudo make install-pdf'.

Also, MacPorts [3] packages tend to be more up-to-date. It is installed in 
/opt/, so /usr/local doesn't get cluttered. So it is possible to choose what 
one wants, say GCC from MacPorts, and the original Clang from their site in 
different builds, especially easy if you use Automake, which admits out of 
source tree compilation.

1. https://ftp.gnu.org/gnu/bison/
2. https://ftp.gnu.org/gnu/m4/
3. https://www.macports.org





Re: are there user defined infix operators?

2018-11-10 Thread Hans Åberg

> On 10 Nov 2018, at 13:51, Uxio Prego  wrote:
> 
> Alright, but

OK, let's hear!

>> On 8 Nov 2018, at 23:37, Hans Åberg  wrote:
>> 
>>> On 8 Nov 2018, at 22:34, Uxio Prego  wrote:
>>> 
>>>> [...]
>>> 
>>> The example and explanation are worth a thousand words,
>>> thank you very much. So I use a simple grammar like that, and
>>> the stack data structures, and if necessary feed the lexer back
>>> with data from the parser once the user requests some infix
>>> operators.
>> 
>> It is only if you want to have a prefix and an infix or postfix operator 
>> with the same name, like operator- or operator++ in C++, that there is a 
>> need for handshake between the lexer and the parser, and it suffices with a 
>> boolean value that tells whether the token last seen is a prefix operator. 
>> Initially set to false, the prefix operators set it to true in the parser, 
>> and all other expression tokens set it to false. Then, when the lexer sees 
>> an operator that can be both a prefix and an infix or postfix, it uses this 
>> value to disambiguate. I leave it to you to figure out the cases, it is not 
>> that hard, just a bit fiddly. :-)
>> 
> 
> Yeah, but e.g. I don't plan to define ++ as operator at all, even
> though I would want any users wanting it to be able to configure
> so.

An implementation detail to be aware of is that if negative numbers are allowed 
as tokens, then 3-2 will parse as 3 followed by -2, not as a subtraction. It 
may therefore be better to have only positive number tokens, not negative ones, 
and implement unary operator- and operator+, which is why C++ has them.

So you may not be able to escape having some name overloading.
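A toy tokenizer (hypothetical, not Bison or Flex code) makes the pitfall concrete: with an optional sign folded into the number rule, the input "3-2" lexes as the two numbers 3 and -2, and no subtraction operator is ever delivered to the parser:

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Greedily matches [-]?[0-9]+ as one number token, skipping everything
// else, so a '-' directly before digits is absorbed into the number.
std::vector<long> lex_numbers_with_sign(const std::string& s) {
    std::vector<long> tokens;
    std::string::size_type i = 0;
    while (i < s.size()) {
        std::string::size_type j = i;
        if (s[j] == '-') ++j;           // sign folded into the number token
        std::string::size_type d = j;
        while (d < s.size() && std::isdigit(static_cast<unsigned char>(s[d])))
            ++d;
        if (d > j) {                    // matched a (possibly signed) number
            tokens.push_back(std::stol(s.substr(i, d - i)));
            i = d;
        } else {
            ++i;                        // anything else is skipped here
        }
    }
    return tokens;
}
```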

> I guess this would require, either predefining it even with no
> actual core semantic; or providing the parser-to-lexer feedback,
> and eventually to replace a current vanilla and clean flex lexer
> for something else, and/or writing a lot of ugly hack in it.

Have a look at the C++ operator precedence table [1]. You might try to squeeze 
in the user-defined operators at some point in the middle.

1. https://en.cppreference.com/w/cpp/language/operator_precedence

> Now think that the ++ operator has completely different meaning
> from a C++ perspective than from a Haskell perspective. Repeat
> for the ** operator, which exists in Python or Haskell but not (or
> if it does exist, for sure they are not very popular) in languages
> like C++ or Java. Some languages provide a // operator, etc. So
> predefining is not a good solution I would say.

In Haskell, it is a Monad operator; C++ does not have that. :-) The Haskell 
interpreter Hugs has a file Prelude.hs which defines a lot of prelude functions 
in Haskell code.

But Haskell has only 10 precedence levels, which is a bit too little.

> Anyway this is just thinking about the ultimate possibilities that in
> my opinion some abstract extensible spec should try to provide,
> or at least foresee, but I don't prioritize to fully implement.

It is good to think it through before implementing it. Bison makes it easy to 
define a compile-time grammar, and so to test the design out.




Re: improving error message

2018-11-10 Thread Hans Åberg

> On 10 Nov 2018, at 12:50, Akim Demaille  wrote:
> 
>> Le 10 nov. 2018 à 10:38, Hans Åberg  a écrit :
>> 
>>> Also, see if using %param does not already
>>> give you what you need to pass information from the scanner to the
>>> parser’s yyerror.
>> 
>> How would that get into the yyerror function?
> 
> In C, arguments of %parse-param are passed to yyerror.  That’s why I mentioned
> %param, not %lex-param.  And in the C++ case, these are members.

Actually, I was thinking about the token error. But for the yyerror function, I 
use C++, and compute the string from data in the semantic value; the prototype 
is:
  void yyparser::error(const location_type& loc, const std::string& errstr)

Then I use it for both errors and warnings, the latter of which we discussed 
long ago. 
For errors:
  throw syntax_error(@x, str); // Suitably computed string

For warnings:
  parser::error(@y, "warning: " + str);  // Suitably computed string

Then the error function above has:
  std::string s = "error: ";
  if (errstr.substr(0, 7) == "warning")
s.clear();

This way, the string beginning with "error: " is not shown in the case of a 
warning.
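The scheme above can be put together as a self-contained sketch (the function name is hypothetical):

```cpp
#include <cassert>
#include <string>

// Messages that begin with "warning" suppress the "error: " prefix, so a
// single reporting channel (yyerror/parser::error) serves both kinds of
// diagnostics.
std::string format_diagnostic(const std::string& errstr) {
    std::string s = "error: ";
    if (errstr.substr(0, 7) == "warning")
        s.clear();                     // leave the message as a plain warning
    return s + errstr;
}
```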

>>>>> I believe that the right approach is rather the one we have in compilers
>>>>> and in bison: caret errors.
>>>>> 
>>>>> $ cat /tmp/foo.y
>>>>> %token FOO 0xff 0xff
>>>>> %%
>>>>> exp:;
>>>>> $ LC_ALL=C bison /tmp/foo.y
>>>>> /tmp/foo.y:1.17-20: error: syntax error, unexpected integer
>>>>> %token FOO 0xff 0xff
>>>>>  
>>>>> I would have been bothered by « unexpected 255 ».
>>>> 
>>>> Currently, that’s for those still using only ASCII.
>>> 
>>> No, it’s not, it works with UTF-8.  Bison’s count of characters is mostly
>>> correct.  I’m talking about Bison’s own location, used to parse grammars,
>>> which is improved compared to what we ship in generated parsers.
>> 
>> Ah. I thought of errors for the generated parser only. There I only report 
>> the byte count, but using a character count will probably not help much for 
>> caret errors, as characters vary in width. The problem is that caret errors 
>> use two lines, which are hard to synchronize in Unicode. So perhaps some 
>> kind of one-line markup might do the trick instead.
> 
> Two things:
> 
> One is that the semantics of Bison’s location’s column is not specified:
> it is up the user to track characters or bytes.  As a matter of fact, Bison
> is hardly concerned by this choice; rather it’s the scanner that has to
> deal with that.
> 
> The other one is: once you have the location, you can decide how to display
> it.  In the case of Bison, I think the caret errors are fine, but you
> could decide to do something different, say use colors or delimiters, to
> be robust to varying width.

Yes, actually I was thinking about the token errors. But it is interesting to 
see what you say about it.

>>>> I am using Unicode characters and LC_CTYPE=UTF-8, so it will not display 
>>>> properly. In fact, I am using special code to even write out Unicode 
>>>> characters in the error strings, since Bison assumes all strings are 
>>>> ASCII, the bytes with the high bit set being translated into escape 
>>>> sequences.
>>> 
>>> Yes, I’m aware of this issue, and we have to address it.
>> 
>> For what I could see, the function that converts it to escapes is sometimes 
>> applied once and sometimes twice, relying on that it is an idempotent.
> 
> It’s a bit more tricky than this.  I’m looking into it, and I’d like
> to address this in 3.3.

I realized one needs to know a lot about Bison's innards to fix this. A thing 
that made me curious is why the function it uses zeroes out the high bit: it 
looks like it has something to do with the POSIX C locale, but I could not find 
anything requiring it to be set to zero in that locale.

Right now, I use a function that translates the escape sequences back to bytes.

>>> We also have to provide support for internationalization of
>>> the token names.
>> 
>> Personally, I don't have any need for that. I use strings, like
>> %token logical_not_key "¬"
>> %token logical_and_key "∧"
>> %token logical_or_key "∨"
>> and in the case there are names, they typically match what the lexer 
>> identifies.
> 
> Yes, not all the strings should be translated.  I was thinking of
> something like
> 
> %token NUM _("number")
> %token ID _("identifier")
> %token PLUS "+"
> 
> This way, we can even point xgettext to looking at the grammar file
> rather than the generated parser.

It might be good if one wants error messages in another language.



___
help-bison@gnu.org https://lists.gnu.org/mailman/listinfo/help-bison

Re: improving error message (was: bison for nlp)

2018-11-10 Thread Hans Åberg

> On 10 Nov 2018, at 09:02, Akim Demaille  wrote:
> 
> Hi Hans,

Hello Akim,

>>>>> Yes.  Some day we will work on improving error message generation,
>>>>> there is much demand.
>>>> 
>>>> One thing I’d like to have is if there is an error with say a identifier, 
>>>> also writing the out the name of it.
>>> 
>>> Yes, that’s a common desire.  However, I don’t think it’s really
>>> what people need, because the way you print the semantic value
>>> might differ from what you actually wrote.  For instance, if I have
>>> a syntax error involving an integer literal written in binary,
>>> say 0b101010, then I will be surprised to read that I have an error
>>> involving 42.
>>> 
>>> So you would need to cary the exact string from the scanner to the
>>> parser, and I think that’s too much to ask for.
>> 
>> That is what I do. So I merely want an extra argument in the error reporting 
>> function where it can be put.
> 
> Please, be clearer: what extra argument, and show how the parser
> can provide it.  

Yes, I need to analyze it and get back.

> Also, see if using %param does not already
> give you what you need to pass information from the scanner to the
> parser’s yyerror.

How would that get into the yyerror function?

>>> I believe that the right approach is rather the one we have in compilers
>>> and in bison: caret errors.
>>> 
>>> $ cat /tmp/foo.y
>>> %token FOO 0xff 0xff
>>> %%
>>> exp:;
>>> $ LC_ALL=C bison /tmp/foo.y
>>> /tmp/foo.y:1.17-20: error: syntax error, unexpected integer
>>> %token FOO 0xff 0xff
>>>
>>> I would have been bothered by « unexpected 255 ».
>> 
>> Currently, that’s for those still using only ASCII.
> 
> No, it’s not, it works with UTF-8.  Bison’s count of characters is mostly
> correct.  I’m talking about Bison’s own location, used to parse grammars,
> which is improved compared to what we ship in generated parsers.

Ah. I thought of errors for the generated parser only. Then I only report byte 
count, but using character count will probably not help much for caret errors, 
as they vary in width. The problem is that caret errors use two lines which 
are hard to synchronize in Unicode. So perhaps some kind of one-line markup 
instead might do the trick.

>> I am using Unicode characters and LC_CTYPE=UTF-8, so it will not display 
>> properly. In fact, I am using special code to even write out Unicode 
>> characters in the error strings, since Bison assumes all strings are ASCII, 
>> the bytes with the high bit set being translated into escape sequences.
> 
> Yes, I’m aware of this issue, and we have to address it.

From what I could see, the function that converts it to escapes is sometimes 
applied once and sometimes twice, relying on it being idempotent.

> We also have to provide support for internationalization of
> the token names.

Personally, I don't have any need for that. I use strings, like
  %token logical_not_key "¬"
  %token logical_and_key "∧"
  %token logical_or_key "∨"
and in the case there are names, they typically match what the lexer identifies.




Re: bison for nlp

2018-11-09 Thread Hans Åberg

> On 9 Nov 2018, at 12:11, Akim Demaille  wrote:
> 
>> Le 9 nov. 2018 à 09:58, Hans Åberg  a écrit :
>> 
>> 
>>> On 9 Nov 2018, at 05:59, Akim Demaille  wrote:
>>> 
>>>> By the way, I’ll still get the error message as a string I guess, right?
>>> 
>>> Yes.  Some day we will work on improving error message generation,
>>> there is much demand.
>> 
>> One thing I’d like to have is if there is an error with say a identifier, 
>> also writing the out the name of it.
> 
> Yes, that’s a common desire.  However, I don’t think it’s really
> what people need, because the way you print the semantic value
> might differ from what you actually wrote.  For instance, if I have
> a syntax error involving an integer literal written in binary,
> say 0b101010, then I will be surprised to read that I have an error
> involving 42.
> 
> So you would need to cary the exact string from the scanner to the
> parser, and I think that’s too much to ask for.

That is what I do. So I merely want an extra argument in the error reporting 
function where it can be put.

> Not to mention the
> case of super-long tokens, say a large string, or an ugly regex,
> cluttering the error message.

Have you ever seen a C++ error message? :-)

> I believe that the right approach is rather the one we have in compilers
> and in bison: caret errors.
> 
> $ cat /tmp/foo.y
> %token FOO 0xff 0xff
> %%
> exp:;
> $ LC_ALL=C bison /tmp/foo.y
> /tmp/foo.y:1.17-20: error: syntax error, unexpected integer
> %token FOO 0xff 0xff
> 
> I would have been bothered by « unexpected 255 ».

Currently, that's for those still using only ASCII. I am using Unicode 
characters and LC_CTYPE=UTF-8, so it will not display properly. In fact, I am 
using special code to even write out Unicode characters in the error strings, 
since Bison assumes all strings are ASCII, the bytes with the high bit set 
being translated into escape sequences.

Maybe the byte counts could be usable if there is some tool to display them.




Re: bison for nlp

2018-11-09 Thread Hans Åberg

> On 9 Nov 2018, at 05:59, Akim Demaille  wrote:
> 
>> By the way, I’ll still get the error message as a string I guess, right?
> 
> Yes.  Some day we will work on improving error message generation,
> there is much demand.

One thing I'd like to have is that if there is an error with, say, an 
identifier, its name is also written out.




Re: are there user defined infix operators?

2018-11-08 Thread Hans Åberg


> On 8 Nov 2018, at 22:34, Uxio Prego  wrote:
> 
>> Take a simple example, a + b*c #, where # is the end marker. First put the a 
>> on the value stack, and the + on the operator stack, and then the b on the 
>> value stack. When the * comes by, it has higher precedence than the + on top 
>> of the operator stack, so it must be stacked. Then the c comes by, so put it 
>> on the value stack. Finally the end marker #, which has lower precedence 
>> than *, so let * operate on the value stack, and put back its value, b*c. 
>> Next is the +, and # has lower precedence, so + operates on the value stack, 
>> computing a + (b*c), which is put back onto the value stack. Then the 
>> operator stack empty, so the process is finished, and the value stack has 
>> the value.
>> [...]
> 
> The example and explanation are worth a thousand words,
> thank you very much. So I use a simple grammar like that, and
> the stack data structures, and if necessary feed the lexer back
> with data from the parser once the user requests some infix
> operators.

It is only if you want to have a prefix and an infix or postfix operator with 
the same name, like operator- or operator++ in C++, that there is a need for a 
handshake between the lexer and the parser; a boolean value that tells whether 
the token last seen is a prefix operator suffices. Initially set to false, the 
prefix operators set it to true in the parser, and all other expression tokens 
set it to false. Then, when the lexer sees an operator that can be both a 
prefix and an infix or postfix, it uses this value to disambiguate. I leave it 
to you to figure out the cases, it is not that hard, just a bit fiddly. :-)





Re: are there user defined infix operators?

2018-11-08 Thread Hans Åberg


> On 8 Nov 2018, at 21:19, Akim Demaille  wrote:
> 
> Hi Uxio, hi Hans,

Hi Akim,

> You cannot use Bison to resolve dynamically your precedence if
> you have a free set of levels.  But if you have a fixed number
> of level, say 10, then you could define ten tokens for each level,
> and give them the precedence you want.  Then, in the scanner,
> map each operator to the corresponding level, storing the actual
> operator as a semantic value.  The scanner could use a map for
> instance to decide to which token you map each operator.

That is also a possibility, but make it at least 20 to cover C/C++ [1], as the 
10 or so that Haskell admits is too limited. But it becomes problematic if the 
number of levels is large, like 1200 as in SWI-Prolog.

> That wouldn’t be of much help if you also want to play with
> associativity.  Maybe using even more tokens to denote the different
> possibilities.

I recall that the Haskell interpreter Hugs [2] used something like that.


1. https://en.cppreference.com/w/cpp/language/operator_precedence
2. https://wiki.haskell.org/Hugs




Re: are there user defined infix operators?

2018-11-08 Thread Hans Åberg

> On 8 Nov 2018, at 20:21, Uxio Prego  wrote:
> 
>> You can write general rules say for prefix, infix, and postfix operators, 
>> [...]
> 
> For simplicity I would be happy to consider only infix operators.

For fixity overloads (same name but different fixity), one can have the 
overloads used in C/C++: prefix and postfix (as in ++), or prefix and infix (as 
in -), but not infix and postfix. This requires extra processing, keeping track 
of the token that was before; Bison cannot do that, so the lexer must do it. A 
grammar might look like, with the lexer figuring out what to return:

%left binary_operator
%left prefix_operator
%left binary_or_postfix_operator
%left postfix_operator

%%

expression:
value
  | prefix_operator expression
  | expression postfix_operator
  | expression binary_or_postfix_operator // Postfix operator
  | expression binary_or_postfix_operator expression // Binary operator
  | expression binary_operator expression
;

>> [...] the actions put them onto to a stack with precedences and another for
>> values. Then, when a new operator comes by, let the operators on the
>> stack with higher precedences act on the value stack until something with
>> a lower precedence appears, [...]
> 
> I read this twice and didn't understand anything. I read it once again and now
> I understand you are proposing that when operators are used, I don’t really
> use the syntax tree I'm generating with Bison _straightly_, but a more complex
> syntax tree I'd be generating combining the natural tree that arises from the
> grammar and other information in those data structures you propose. Did I
> understand that right?

Take a simple example, a + b*c #, where # is the end marker. First put the a on 
the value stack, and the + on the operator stack, and then the b on the value 
stack. When the * comes by, it has higher precedence than the + on top of the 
operator stack, so it must be stacked. Then the c comes by, so put it on the 
value stack. Finally the end marker #, which has lower precedence than *, so 
let * operate on the value stack, and put back its value, b*c. Next is the +, 
and # has lower precedence, so + operates on the value stack, computing a + 
(b*c), which is put back onto the value stack. Then the operator stack is 
empty, so the process is finished, and the value stack has the value.

One can also use a single stack, and an operator precedence grammar [1]. It 
might give better error reporting, but then you need to figure out how to 
integrate it into the Bison grammar.

1. https://en.wikipedia.org/wiki/Operator-precedence_grammar




Re: bison for nlp

2018-11-08 Thread Hans Åberg


> On 8 Nov 2018, at 20:48, r0ller  wrote:
> 
> Sorry, I don't really get it:( What do you mean by replacing tokens by 
> strings? How can that be done?

Write
  %token t_ENG_Adv "English adverb"

Then, in error messages, the Bison parser will write "English adverb", and you 
can also use it in the grammar instead of t_ENG_Adv.

> ---- Original message 
> From: Hans Åberg < haber...@telia.com (Link -> mailto:haber...@telia.com) >
> Date: 8 November 2018 14:28:03
> Subject: Re: bison for nlp
> To: r0ller < r0l...@freemail.hu (Link -> mailto:r0l...@freemail.hu) >
>  
> > On 7 Nov 2018, at 10:09, r0ller  wrote:
> >
> > Numbering tokens was introduced in the very beginning and has been 
> > questioned by myself quite a many times if it's still needed. I didn't give 
> > a hard try to get rid of it mainly due to one reason: I want to have an 
> > error handling that tells in case of an error which symbols could be 
> > accepted instead of the erroneous one just as bison itself does it but in a 
> > structured way (as bison returns that info in an error message string). 
> > Though, I could not come up with any better idea when it comes to remapping 
> > a token to a symbol.
> 
> If the token numbers are replaced by strings "…", the Bison parser will print 
> those, and they can also be used in the grammar. Would that suffice?
>  



Re: are there user defined infix operators?

2018-11-08 Thread Hans Åberg


> On 2 Nov 2018, at 17:53, Uxio Prego  wrote:
> 
> More specifically, I'm curious to know if Bison can modify precedences
> at parsing time according user sentences, now referring as user not the
> programmer who wrote the *.y doc but the programmer writing a program
> parsed by the parser generated from the *.y doc.

You can't, but:

You can write general rules, say for prefix, infix, and postfix operators, and 
then the actions put them onto a stack with precedences and another for 
values. Then, when a new operator comes by, let the operators on the stack with 
higher precedences act on the value stack until something with a lower 
precedence appears, and put the new operator onto the stack. Continue until the 
end symbol comes by that has the lowest precedence. Operator associativity is 
handled by viewing left and right hand side precedences as different.





Re: bison for nlp

2018-11-08 Thread Hans Åberg

> On 7 Nov 2018, at 10:09, r0ller  wrote:
> 
> Numbering tokens was introduced in the very beginning and has been questioned 
> by myself quite a many times if it's still needed. I didn't give a hard try 
> to get rid of it mainly due to one reason: I want to have an error handling 
> that tells in case of an error which symbols could be accepted instead of the 
> erroneous one just as bison itself does it but in a structured way (as bison 
> returns that info in an error message string). Though, I could not come up 
> with any better idea when it comes to remapping a token to a symbol.

If the token numbers are replaced by strings "…", the Bison parser will print 
those, and they can also be used in the grammar. Would that suffice?




Re: smoothing Xcode integration

2018-10-06 Thread Hans Åberg


> On 6 Oct 2018, at 14:42, Uxio Prego  wrote:
> 
> On StackOverflow I've been given some directions on getting Xcode
> to correctly use a cross platform Makefile, however haven't been
> completely successful so far.
> 
> I can add a target wrapping the Makefile, but haven't passed the
> environment variables to the `make` execution successfully yet,
> for it to remove the _line_ preprocessor directives.
> 
> Also I haven't figured out yet how to configure that Make target as
> a build step before the program target itself.

This is set up with configure for just using Xcode as a debugger, building the 
project in the directory project/ using 'make' from Terminal:
1. Name a sibling directory to the directory project/, say 'xcode', and move to 
it ('cd xcode').
2. In the xcode directory, create a Makefile using
 ../project/configure CXX=clang++ CXXFLAGS=-g
or whatever path to the compiler, and run make. This also creates a subdirectory 
called 'src'.
3. From within Xcode, create in the directory 'xcode' an "External Build" 
project named 'src', which puts a file src.xcodeproj in the directory xcode/src.
4. Add 'project/src' to the project by dragging it into the project window in 
the left sidebar.
5. Open the scheme from the pop-up at the top left of the project window, or 
with keyboard ⌘-'<' if already selected.
6. If the project is run from say the directory 'test', set its full path in 
Options.
7. If the program is run from this directory 'test' using say
  ../xcode/src/project 
set the arguments in Arguments.

>>> I would want to run `sed 's/#line/\/\/#line/'` on the generated parser
>>> in order for it not to show as assembly during debugging. I think
>>> Xcode has some kind of non honoring to the way `#line` works,
>> 
>> You can disable using %no-lines, see the Bison manual, but it works fine in 
>> Xcode 10.
> 
> Nice to know, however maybe the `sed` use is more convenient.
> The default flow must be not to manipulate _line_ preprocessor
> directives.
> 
> If I added `%no-lines`, would GDB debugging break?
> I'd try once I have a Linux box at hand again...
> 
> So unless nothing breaks, injecting a `%no-lines` line in the Bison
> doc, I see unnecessarily complicated compared to optionally run
> a simple `sed` substitution on the generated parser.

The #line directives just cause the debugger to show the .yy source instead of 
the .cc sources when encountering an error. For example, a thrown exception 
will show the correct line in the .yy file. Breakpoints must still be put in 
the .cc sources.





Re: smoothing Xcode integration

2018-10-06 Thread Hans Åberg


> On 5 Oct 2018, at 21:29, Uxio Prego  wrote:
> 
> How would you detect Xcode from GNU Make?

Is this for an external Makefile project?

> I would want to run `sed 's/#line/\/\/#line/'` on the generated parser
> in order for it not to show as assembly during debugging. I think
> Xcode has some kind of non honoring to the way `#line` works,

You can disable using %no-lines, see the Bison manual, but it works fine in 
Xcode 10.





Re: Security warnings when using LLVM 9

2018-09-27 Thread Hans Åberg


> On 27 Sep 2018, at 15:30, uxio prego  wrote:
> 
> I suspect builtin [macOS High Sierra] Clang maybe defaults to C89, or needs 
> some
> kind of flag in order to use C99?
> By contrast, recent Clang claims to be trying GNU C11 first:
> https://clang.llvm.org/compatibility.html

They both have __STDC_VERSION__ == 201112L. By contrast, gcc8 has 
__STDC_VERSION__ == 201710L.




Re: Security warnings when using LLVM 9

2018-09-01 Thread Hans Åberg


> On 1 Sep 2018, at 19:22, Uxio Prego  wrote:
> 
> For a couple miscellaneous reasons I have seldom used custom
> installs of (then) newer GCC versions to `/usr/local/`.
> 
> If I ever need a newer or more canonical Clang for some reason,
> I'll be sure to remember your words.

One reason to use GCC and/or the real Clang is C++17. Apart from that, if you 
develop, they are slightly different in warnings, so for that reason it can be 
good to try both.





Re: Security warnings when using LLVM 9

2018-09-01 Thread Hans Åberg


> On 1 Sep 2018, at 15:52, Uxio Prego  wrote:
> 
> Oh, sorry. Code written by me in the action of a production rule
> was causing my problem.

So issue resolved! Note however that you are using Apple's in-house Clang. If 
you like, you can download the real one at [1]. Copy it into say 
/usr/local/clang/ so it is easy to remove, and set the PATH, or configure, or 
make, to use it. It then works with the Xcode debugger.

1. https://releases.llvm.org/download.html





  1   2   >