Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-26 Thread Hans Åberg


> On 26 Oct 2019, at 09:05, Akim Demaille  wrote:
> 
>> Le 25 oct. 2019 à 18:13, Paul Eggert  a écrit :
>> 
>> On 10/25/19 7:15 AM, Théophile Ranquet wrote:
>>> 
>>> This sounds interesting and I would love reading what people have to
>>> say about this. However, I have failed at finding any such discussion
>>> or source. Could you perhaps share a few pointers?
>> 
>> 
>> I don't know of a good central email thread about this, but here's a style 
>> guideline:
>> 
>> https://www.gnu.org/software/emacs/manual/html_node/elisp/C-Integer-Types.html
> 
> About C++, this page has a good summary of the trend, which is "run away from 
> unsigned".
> 
> https://stackoverflow.com/questions/18795453/why-prefer-signed-over-unsigned-in-c
> 
> This talk is about undefined behavior, and why it's good to have some (in 
> particular because, as Paul already reported, this allows tools such as 
> sanitizers to catch these errors).
> 
> https://youtu.be/yG1OZ69H_-o

Undefined behavior also allows modern optimization, so it is important to stick 
to the lang specs, see:

http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_21.html





Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-26 Thread Akim Demaille



> Le 25 oct. 2019 à 18:13, Paul Eggert  a écrit :
> 
> On 10/25/19 7:15 AM, Théophile Ranquet wrote:
>> 
>> This sounds interesting and I would love reading what people have to
>> say about this. However, I have failed at finding any such discussion
>> or source. Could you perhaps share a few pointers?
> 
> 
> I don't know of a good central email thread about this, but here's a style 
> guideline:
> 
> https://www.gnu.org/software/emacs/manual/html_node/elisp/C-Integer-Types.html

About C++, this page has a good summary of the trend, which is "run away from 
unsigned".

https://stackoverflow.com/questions/18795453/why-prefer-signed-over-unsigned-in-c

This talk is about undefined behavior, and why it's good to have some (in 
particular because, as Paul already reported, this allows tools such as 
sanitizers to catch these errors).

https://youtu.be/yG1OZ69H_-o

In particular from here, Chandler Carruth treats the case of unsigned/signed 
(IIRC, I did not re-watch this talk).

https://youtu.be/yG1OZ69H_-o?t=1475

Eric Niebler, of BOOST_FOREACH and Range v3 fame, started a "thread" about 
eliminating unsigneds from the C++ standard library V2.

https://github.com/ericniebler/stl2/issues/182


Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-25 Thread Paul Eggert

On 10/25/19 7:15 AM, Théophile Ranquet wrote:


This sounds interesting and I would love reading what people have to
say about this. However, I have failed at finding any such discussion
or source. Could you perhaps share a few pointers?



I don't know of a good central email thread about this, but here's a 
style guideline:


https://www.gnu.org/software/emacs/manual/html_node/elisp/C-Integer-Types.html




Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-25 Thread Théophile Ranquet
Hi Paul,

On Tue, Oct 1, 2019 at 10:35 AM Paul Eggert  wrote:
>
> In other GNU applications, we've been moving away from using unsigned types 
> due
> to their confusing behavior (particularly when comparing signed vs unsigned),
> and because modern tools such as 'gcc -fsanitize=undefined' can check for 
> signed
> integer overflow but (obviously) not for unsigned integer overflow.

This sounds interesting and I would love reading what people have to
say about this. However, I have failed at finding any such discussion
or source. Could you perhaps share a few pointers?

Best regards,



Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-07 Thread Hans Åberg


> On 7 Oct 2019, at 07:15, Akim Demaille  wrote:
> 
>> Le 2 oct. 2019 à 15:58, Paul Eggert  a écrit :
>> 
>> On 10/1/19 10:27 PM, Akim Demaille wrote:
>>> I don't understand why you shy away from -128 and -32768 though.
>> 
>> The C Standard doesn't guarantee support for -128 and -32768. This 
>> originally was for ones' complement machines such as the Unisys ClearPath 
>> Dorado enterprise servers (still in use via firmware translation to Intel 
>> Xeon, and they have a C compiler),
> 
> Wow!  I was unaware of this.  Ain't it this kind of machines that is still in 
> use to run ancient COBOL programs?  Are they still used in production for 
> programs written in C?

The C/C++ compiler is still allowed to use it for optimizations, even though 
all current hardware I think use modulo 2^k representations; there is an 
example here:
http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html#signed_overflow

Then this links gives an example how the optimizer can remove an overflow 
check, causing a security issue:
http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html





Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-06 Thread Paul Eggert

The C Standard doesn't guarantee support for -128 and -32768. This originally 
was for ones' complement machines such as the Unisys ClearPath Dorado 
enterprise servers (still in use via firmware translation to Intel Xeon, and 
they have a C compiler),


Are they still used in production for programs written in C?


Yes, it does appear C is still active there; Unisys has a C reference manual 
dated 2015 
 
and they're still selling those mainframes with new hardware released this year 
.


PS. They also support Cobol, Fortran, and some other languages. (The world is 
still waiting patiently for Bison to support Cobol. :-)




Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-06 Thread Akim Demaille
Hi Paul,

> Le 2 oct. 2019 à 15:58, Paul Eggert  a écrit :
> 
> On 10/1/19 10:27 PM, Akim Demaille wrote:
>> I don't understand why you shy away from -128 and -32768 though.
> 
> The C Standard doesn't guarantee support for -128 and -32768. This originally 
> was for ones' complement machines such as the Unisys ClearPath Dorado 
> enterprise servers (still in use via firmware translation to Intel Xeon, and 
> they have a C compiler),

Wow!  I was unaware of this.  Ain't it this kind of machines that is still in 
use to run ancient COBOL programs?  Are they still used in production for 
programs written in C?

> as well as for signed-magnitude machines that are no longer in use.



> Nowadays at least in theory this can be useful for debugging implementations 
> on two's-complement machines, which can use -32768 as a trap representation, 
> though it appears nobody is currently doing that and the next C standard may 
> change in this area. See:
> 
> Seacord RC. Uninitialized reads: understanding the proposed revisions to the 
> C language. ACM Queue. 2017;14(6). https://queue.acm.org/detail.cfm?id=3041020

it states:

The C standard also states that for sign and magnitude and two's 
complement, the value with sign bit 1 and all value bits zero can be a trap 
representation or a normal value.

Wow.  I was completely unaware of the existence of "trap values"...  Thanks!

> Bison should support the current C standard (who knows? maybe some 
> Bison-generated code will run on those Unisys mainframes :-)

Too bad programs don't have stickers to put on their backpack when they go to 
exotic places!

> and so I stuck with the standard values.

Sure.  Thanks for explaining!


>>> I like to name my types *_t, so I'd be happy with yy_state_t (scalars) and 
>>> yy_state_least_t/yy_small_state_t (arrays) instead of int and yy_state_num 
>>> currently. But we've had disagreements on this regard, yet I don't know 
>>> where you stand today (given that here, we're kind of protected by yy_*: 
>>> POSIX/ISO etc. would be really bad people to start using them.)
> 
> In the past I was reasonably strict about avoiding user-defined type names 
> ending in "_t", mostly because that suffix is reserved by POSIX, but partly 
> because I simply dislike the tradition of the Hungarian notation 
> . But you're right that 
> "yy_*_t" should be safe enough.

I'll do that, when we drop the pens for the rest of these changes.

And yes, Kaz, I am perfectly aware that POSIX reserves yy_* for Yacc, but I was 
referring to the rest of their world :)


Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-06 Thread Paul Eggert

On 10/6/19 4:57 AM, Akim Demaille wrote:


With the appended commits, the CI is green again, and I was able to push my 
other changes.


Thanks. I reproduced the GCC 4.8 problem and installed a patch that works around 
it without casts (see first attached patch).




 "\"".c:1102:41: error: implicit conversion loses integer precision: 
'long' to 'int' [-Werror,-Wshorten-64-to-32]
   YYPTRDIFF_T yysize = yyssp - yyss + 1;
   ~~   ~^~~


This is a more serious problem, as it says that YYPTRDIFF_T is 'int' on a 
platform where ptrdiff_t is really 'long', which means Bison-generated parsers 
could mishandle vary large inputs. That is, the clang diagnostic is pointing out 
an (unlikely) bug, and is not a false alarm.


Although I installed the second attached patch which I hope works around the 
immediate problem, a better fix is needed here. With a C++ compiler, the 
definition of YYPTRDIFF_T should not be a guess at 'int' or 'long'; it should be 
the proper C++ type for pointer subtraction results. When you find the time I'd 
be interested to see the proper portable C++ magic to do that.
>From beceb2fa9382fd44816cae382dd12ffc6a210d5e Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sun, 6 Oct 2019 11:49:56 -0700
Subject: [PATCH 1/2] Work around GCC 4.8 false alarms without casts

* data/skeletons/yacc.c (yyparse):
Initialize yyes_capacity with a signed expression.
* tests/local.at (AT_YYLEX_DEFINE(c)):
Use enum to avoid cast.
---
 TODO  | 8 
 data/skeletons/yacc.c | 2 +-
 tests/local.at| 2 +-
 3 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/TODO b/TODO
index 0a02ff03..a5935029 100644
--- a/TODO
+++ b/TODO
@@ -117,14 +117,6 @@ compiling yacc.c code:
 
 YYPTRDIFF_T yysize = (YYPTRDIFF_T) (yyesp - *yyes + 1);
 
-Or G++ 4.8
-
-yyes_capacity = (YYPTRDIFF_T) (sizeof yyesa / sizeof *yyes);
-
-Or GCC 4.8
-
-int input_elts = (int) (sizeof input / sizeof input[0]);
-
 * Completion
 Several features are not available in all the backends.
 
diff --git a/data/skeletons/yacc.c b/data/skeletons/yacc.c
index abe20a5f..5ef4289e 100644
--- a/data/skeletons/yacc.c
+++ b/data/skeletons/yacc.c
@@ -1490,7 +1490,7 @@ b4_function_define([[yyparse]], [[int]], b4_parse_param)[
   yystacksize = YYINITDEPTH;]b4_lac_if([[
 
   yyes = yyesa;
-  yyes_capacity = (YYPTRDIFF_T) (sizeof yyesa / sizeof *yyes);
+  yyes_capacity = ]b4_percent_define_get([[parse.lac.es-capacity-initial]])[;
   if (YYMAXDEPTH < yyes_capacity)
 yyes_capacity = YYMAXDEPTH;]])[
 
diff --git a/tests/local.at b/tests/local.at
index 579b85c3..3acb2791 100644
--- a/tests/local.at
+++ b/tests/local.at
@@ -558,7 +558,7 @@ static
   static int toknum = 0;
   int res;
   ]AT_USE_LEX_ARGS[
-  int input_elts = (int) (sizeof input / sizeof input[0]);
+  enum { input_elts = sizeof input / sizeof input[0] };
   (void) input_elts;
   assert (0 <= toknum && toknum < input_elts);
   res = input[toknum++];
-- 
2.17.1

>From 6373b90fc8f9ab2bfa19a107d44cfad3404faf20 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sun, 6 Oct 2019 11:58:10 -0700
Subject: [PATCH 2/2] Port better to C++ platforms

* data/skeletons/yacc.c (YYPTRDIFF_T, YYPTRDIFF_MAXIMUM):
Default to long, not int.
(yy_lac_stack_realloc, yy_lac, yytnamerr, yyparse):
Avoid casts to YYPTRDIFF_T that were masking the problem.
---
 TODO  | 13 -
 data/skeletons/yacc.c | 12 ++--
 2 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/TODO b/TODO
index a5935029..f3f08ce1 100644
--- a/TODO
+++ b/TODO
@@ -107,15 +107,10 @@ name they have in gcc, clang, etc.  Likewise for the complain_* series of
 functions.
 
 * Modernization
-Remove some casts made for old compilers, such as Clang++ 3.3 and 3.4 when
-compiling yacc.c code:
-
-YYPTRDIFF_T yysize = (YYPTRDIFF_T) (yyssp - yyss + 1);
-
-YYPTRDIFF_T yysize_old =
-  *yytop == yytop_empty ? 0 : (YYPTRDIFF_T) (*yytop - *yybottom + 1);
-
-YYPTRDIFF_T yysize = (YYPTRDIFF_T) (yyesp - *yyes + 1);
+Fix data/skeletons/yacc.c so that it defines YYPTRDIFF_T properly for modern
+and older C++ compilers.  Currently the code defaults to defining it to
+'long' for non-GCC compilers, but it should use the proper C++ magic to
+define it to the same type as the C ptrdiff_t type.
 
 * Completion
 Several features are not available in all the backends.
diff --git a/data/skeletons/yacc.c b/data/skeletons/yacc.c
index 5ef4289e..10708627 100644
--- a/data/skeletons/yacc.c
+++ b/data/skeletons/yacc.c
@@ -425,9 +425,9 @@ typedef short yytype_int16;
 #  include  /* INFRINGES ON USER NAME SPACE */
 #  define YYPTRDIFF_MAXIMUM PTRDIFF_MAX
 # else
-#  define YYPTRDIFF_T int
+#  define YYPTRDIFF_T long
 #  include  /* INFRINGES ON USER NAME SPACE */
-#  define YYPTRDIFF_MAXIMUM INT_MAX
+#  define YYPTRDIFF_MAXIMUM LONG_MAX
 # endif
 #endif
 
@@ -857,7 +857,7 @@ yy_lac_stack_realloc (YYPTRDIFF_T *yycapacity, YYPTRDIFF_T yyadd,
 

Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-06 Thread Akim Demaille
Hi Kaz,

Thanks for your input in this thread!

> Le 1 oct. 2019 à 20:40, Kaz Kylheku  a écrit :
> 
> On 2019-10-01 01:35, Paul Eggert wrote:
>> On 9/29/19 11:34 AM, Akim Demaille wrote:
>>> As a matter of fact, we used two types:
>>> - most arrays (such as state stack, and its correspondance in the LAC
>>>  infrastructure) are using int16_t.  A few "temporary" variables also
>>>  have this type.
>>> - yystate, which is an intermediate variable, was an int.
>> Actually those arrays use int_fast16_t not int16_t, as C99/C11/C18
>> does not require support for int16_t. It could well be more efficient
>> for them to use int_least16_t instead, for better caching; see below.
> 
> I would make two typedefs: one for the storage type of Yacc
> state values, and one for the "register" type for manipulating
> them in local variables.
> 
> The latter of course, being capable of representing all the values of
> the former. E.g. example definitions:
> 
>   typedef short yy_small_state_t;
>   typedef int yy_state_t;

I like these names, I think eventually we'll move to that.


> The special value -1 should be given a manifest constant,

AFAIR, we don't need such a special value for states.  And in other
cases, we can't use -1 which is a valid value for some of our tables.
For instance:

#define YYPACT_NINF (-130)

#define yypact_value_is_default(Yyn) \
  ((Yyn) == YYPACT_NINF)

  /* YYPACT[STATE-NUM] -- Index in YYTABLE of the portion describing
 STATE-NUM.  */
static const yytype_int16 yypact[] =
{
-130,36,   110,  -130,   -22,  -130,  -130, 2,  -130,  -130,
-130,  -130,  -130,  -130,   -19,  -130,-9,40,  -130,   -17,
  -2,  -130,57,  -130,21,66,77,  -130,  -130,  -130,
  78,  -130,87,92,44,  -130,  -130,  -130,   165,  -130,

But we agree special values should be named and care must be taken
during conversions.




Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-06 Thread Akim Demaille
Hi Paul!

> Le 6 oct. 2019 à 03:58, Paul Eggert  a écrit :
> 
> On 10/5/19 7:41 AM, Akim Demaille wrote:
> 
>> Maybe you want to extend the notes you added to NEWS.  Including
>> the bits about limits.h and stdint.h.  Maybe that should also make
>> its way into the doc, yet I wouldn't know where to write that.
> 
> OK, I did that in the revised patch (first attachment).

Great, thanks!

>>> +typedef int yytype_uint8;
>>>  #endif
>> Wow!  Why do we fallback to int? Is this part where unsigned int == unsigned 
>> short == unsigned char on a little number of architecture?
> 
> Yes, it's for those odd platforms. Though I now see that the above was too 
> extreme: it should have been 'short', not 'int'. Fixed in the first attached 
> patch.
>> The medicine seems worse than the disease to me.
> 
> Well, let's use stronger medicine then :-). I did it in a different way in 
> the first attached patch, so that yytype_uint8 should be 'unsigned char' 
> except for the odd but valid platforms where unsigned char and/or unsigned 
> short do not promote to int.

Ok.  I'm only worried that this might happen a little too often with some 
compilers that we don't have on our radar.

> On compilers not compatible with GCC, the revised patch includes  
> and (if available)  which infringes on the user namespace, but 
> yacc.c already infringes elsewhere for other reasons so that should be OK.

I would think so too.

>> Eventually, all this should move up into c.m4, and be applied to
>> glr.c too.
> 
> Done in the second attached patch. (I haven't installed either patch yet.)

Great!  Feel free to install this.

>> And see what to do about C++ parsers.
> 
> I was hoping our C++ expert could look into that

:) :) :)

Actually, "Eventually, all this should move up..." really did not mean "Paul, 
please do...", but a note for the future, and I do plan to propagate what we 
did to C++ once the dust has settled down.  It requires some expertise to M-w, 
C-x b lalr1.cc , C-y :)


Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-06 Thread Akim Demaille



> Le 5 oct. 2019 à 09:28, Akim Demaille  a écrit :
> 
> I can't wait: these errors are blocking from me from pushing other changes 
> that I was about to send.

With the appended commits, the CI is green again, and I was able to push my 
other changes.

commit 4246cd81df3101b1abef3f5914f6b3fa7d1b44de
Author: Akim Demaille 
Date:   Sat Oct 5 17:28:51 2019 +0200

tests: avoid a GCC 4.8 warning

GCC 4.8 reports:

input.y:57:33: error: conversion to 'int' from 'long unsigned int'
  may alter its value [-Werror=conversion]
   int input_elts = sizeof input / sizeof input[0];
 ^

* tests/local.at (AT_YYLEX_DEFINE(c)): Add a cast (sorry, Paul!).

diff --git a/tests/local.at b/tests/local.at
index 17759b2f..579b85c3 100644
--- a/tests/local.at
+++ b/tests/local.at
@@ -558,7 +558,7 @@ static
   static int toknum = 0;
   int res;
   ]AT_USE_LEX_ARGS[
-  int input_elts = sizeof input / sizeof input[0];
+  int input_elts = (int) (sizeof input / sizeof input[0]);
   (void) input_elts;
   assert (0 <= toknum && toknum < input_elts);
   res = input[toknum++];







commit 5973d763c0867924830fa90f6560078f745d45a7
Author: Akim Demaille 
Date:   Sat Oct 5 17:35:40 2019 +0200

yacc.c: work around warnings from Clang++ 3.3 and 3.4

When we run the test suite with these C++ compilers to compile C code,
we get:

239. synclines.at:440: testing syncline escapes: yacc.c ...
../../tests/synclines.at:440: $CC $CFLAGS $CPPFLAGS \"\\\"\".c -o 
\"\\\"\" ||
  exit 77
stderr:
stdout:
../../tests/synclines.at:440: COLUMNS=1000; export COLUMNS;  bison 
--color=no -fno-caret  -o \"\\\"\".c \"\\\"\".y
../../tests/synclines.at:440: $CC $CFLAGS $CPPFLAGS  $LDFLAGS -o 
\"\\\"\" \"\\\"\".c $LIBS
stderr:
"\"".c:1102:41: error: implicit conversion loses integer precision: 
'long' to 'int' [-Werror,-Wshorten-64-to-32]
  YYPTRDIFF_T yysize = yyssp - yyss + 1;
  ~~   ~^~~
1 error generated.

193. conflicts.at:545: testing parse.error=verbose and consistent 
errors: lr.type=canonical-lr parse.lac=full ...
input.c:737:75: error: implicit conversion loses integer precision: 
'long' to 'int'
   [-Werror,-Wshorten-64-to-32]
  YYPTRDIFF_T yysize_old = *yytop == yytop_empty ? 0 : *yytop - 
*yybottom + 1;
  ~~   
~~~^~~
input.c:901:48: error: implicit conversion loses integer precision: 
'long' to 'int'
   [-Werror,-Wshorten-64-to-32]
YYPTRDIFF_T yysize = yyesp - *yyes + 1;
~~   ~~^~~

* data/skeletons/yacc.c: Add more casts.

diff --git a/data/skeletons/yacc.c b/data/skeletons/yacc.c
index fdf4ba6d..e4705735 100644
--- a/data/skeletons/yacc.c
+++ b/data/skeletons/yacc.c
@@ -856,7 +856,8 @@ yy_lac_stack_realloc (YYPTRDIFF_T *yycapacity, YYPTRDIFF_T 
yyadd,
   yy_state_num *yybottom_no_free,
   yy_state_num **yytop, yy_state_num *yytop_empty)
 {
-  YYPTRDIFF_T yysize_old = *yytop == yytop_empty ? 0 : *yytop - *yybottom + 1;
+  YYPTRDIFF_T yysize_old =
+*yytop == yytop_empty ? 0 : (YYPTRDIFF_T) (*yytop - *yybottom + 1);
   YYPTRDIFF_T yysize_new = yysize_old + yyadd;
   if (*yycapacity < yysize_new)
 {
@@ -1023,7 +1024,7 @@ yy_lac (yy_state_num *yyesa, yy_state_num **yyes,
 YYDPRINTF ((stderr, " R%d", yyrule - 1));
 if (yyesp != yyes_prev)
   {
-YYPTRDIFF_T yysize = yyesp - *yyes + 1;
+YYPTRDIFF_T yysize = (YYPTRDIFF_T) (yyesp - *yyes + 1);
 if (yylen < yysize)
   {
 yyesp -= yylen;
@@ -1534,7 +1535,7 @@ yysetstate:
 #else
 {
   /* Get the current used size of the three stacks, in elements.  */
-  YYPTRDIFF_T yysize = yyssp - yyss + 1;
+  YYPTRDIFF_T yysize = (YYPTRDIFF_T) (yyssp - yyss + 1);
 
 # if defined yyoverflow
   {












commit 32e5a91a91bc034a3f596c056569c0eaa09ca7e1
Author: Akim Demaille 
Date:   Sat Oct 5 22:51:16 2019 +0200

yacc.c: work around warnings from G++ 4.8

input.c: In function 'int yyparse()':
input.c: error: conversion to 'long int' from 'long unsigned int'
may change the sign of the result [-Werror=sign-conversion]
   yyes_capacity = sizeof yyesa / sizeof *yyes;
^
cc1plus: all warnings being treated as errors

* data/skeletons/yacc.c: here.

diff --git a/TODO b/TODO
index f0ec27da..d99954e0 100644
--- a/TODO
+++ b/TODO
@@ -128,6 +128,25 @@ Rename these guys as "diagnostics.*" (or "diagnose.*"), 
since that's the
 name they have in gcc, clang, etc.  Likewise for the complain_* series of
 f

Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-05 Thread Paul Eggert

On 10/5/19 7:41 AM, Akim Demaille wrote:


Why have they chosen this?  I guess the point is speed.


Yes, typically these machines are not byte-addressable and it's faster to make 
every integer the same size. This is sort of the "BCPL variant" of C, which the 
C standard allows.



Maybe you want to extend the notes you added to NEWS.  Including
the bits about limits.h and stdint.h.  Maybe that should also make
its way into the doc, yet I wouldn't know where to write that.


OK, I did that in the revised patch (first attachment).


+typedef int yytype_uint8;
  #endif


Wow!  Why do we fallback to int? Is this part where unsigned int == unsigned 
short == unsigned char on a little number of architecture?


Yes, it's for those odd platforms. Though I now see that the above was too 
extreme: it should have been 'short', not 'int'. Fixed in the first attached patch.

The medicine seems worse than the disease to me.


Well, let's use stronger medicine then :-). I did it in a different way in the 
first attached patch, so that yytype_uint8 should be 'unsigned char' except for 
the odd but valid platforms where unsigned char and/or unsigned short do not 
promote to int. On compilers not compatible with GCC, the revised patch includes 
 and (if available)  which infringes on the user namespace, 
but yacc.c already infringes elsewhere for other reasons so that should be OK.



Eventually, all this should move up into c.m4, and be applied to
glr.c too.


Done in the second attached patch. (I haven't installed either patch yet.)


And see what to do about C++ parsers.


I was hoping our C++ expert could look into that
>From 7f222b11bbb3c57220b47cb0a59072e90cca2ae5 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sat, 5 Oct 2019 13:06:40 -0700
Subject: [PATCH 1/2] =?UTF-8?q?Use=20=E2=80=9Cleast=E2=80=9D=20types=20for?=
 =?UTF-8?q?=20integers=20in=20Yacc=20tables?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This changes the Yacc skeleton to use “least” integer types to
keep tables smaller on some platforms, which should lessen cache
pressure.  Since Bison uses the Yacc skeleton, it follows suit.
* data/skeletons/yacc.c: Include limits.h and stdint.h if this
seems to be needed.
(yytype_uint8, yytype_int8, yytype_uint16, yytype_int16):
If available, use GCC predefined macros __INT_MAX__ etc. to select
a “least” type, as this avoids namespace hassles.  Otherwise, if
available fall back on selecting a “least” type via the C99 macros
INT_MAX, INT_LEAST8_MAX, etc.  Otherwise, fall further back on one of
the builtin C99 types signed char, short, and int.  Make sure that
any selected type promotes to int.  Ignore any macros YYTYPE_INT16,
YYTYPE_INT8, YYTYPE_UINT16, YYTYPE_UINT8 defined by the user.
(ptrdiff_t, PTRDIFF_MAX): Simplify in the light of the above.
(yytype_uint8, yytype_uint16): Do not assume that unsigned char
and unsigned short promote to int, as this isn’t true on some
platforms (e.g., TI TMS320C55x).
* src/parse-gram.y (YYTYPE_INT16, YYTYPE_INT8, YYTYPE_UINT16)
(YYTYPE_UINT8): Remove, as these are no longer effective.
---
 NEWS  |  6 -
 data/skeletons/yacc.c | 61 ++-
 doc/bison.texi| 10 +++
 src/parse-gram.y  |  5 
 4 files changed, 53 insertions(+), 29 deletions(-)

diff --git a/NEWS b/NEWS
index a64e3492..91d95775 100644
--- a/NEWS
+++ b/NEWS
@@ -48,7 +48,11 @@ GNU Bison NEWS
 
   Bison templates now prefer signed to unsigned integer types when
   either will do, as the signed types are less error-prone and allow
-  for better checking with 'gcc -fsanitize=undefined'.
+  for better checking with 'gcc -fsanitize=undefined'.  Also, the
+  types chosen are now portable to unusual machines where char, short and
+  int are all the same width.  On non-GNU platforms this may entail
+  including  and (if available)  to define integer types
+  and constants.
 
 * Noteworthy changes in release 3.4.2 (2019-09-12) [stable]
 
diff --git a/data/skeletons/yacc.c b/data/skeletons/yacc.c
index fdf4ba6d..a0f7 100644
--- a/data/skeletons/yacc.c
+++ b/data/skeletons/yacc.c
@@ -114,7 +114,7 @@ m4_ifset([b4_parse_param], [b4_args(b4_parse_param), ])])
 # -
 # Return a narrow int type able to handle numbers ranging from
 # MIN to MAX (included).  Overwrite the version from c.m4,
-# so that the user can override the shorter types.
+# so that the code can use C99 types if available.
 m4_define([b4_int_type],
 [m4_if(b4_ints_in($@,   [-127],   [127]), [1], [yytype_int8],
b4_ints_in($@,  [0],   [255]), [1], [yytype_uint8],
@@ -388,26 +388,54 @@ m4_if(b4_api_prefix, [yy], [],
 # undef short
 #endif
 
-#ifdef YYTYPE_UINT8
-typedef YYTYPE_UINT8 yytype_uint8;
-#else
+/* On compilers that do not define __PTRDIFF_MAX__ etc., include
+and (if available)  so that the code can
+   choose integer types of a good width.  */
+
+#ifndef __PTRDIFF_MAX__
+# include 

Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-05 Thread Akim Demaille



> Le 5 oct. 2019 à 11:21, Paul Eggert  a écrit :
> 
> On 10/1/19 1:35 AM, Paul Eggert wrote:
>> Actually those arrays use int_fast16_t not int16_t, as C99/C11/C18 does not 
>> require support for int16_t. It could well be more efficient for them to use 
>> int_least16_t instead, for better caching
> 
> Attached is a proposed patch to implement this for Bison. This patch also 
> fixes some portability problems for odd machines like the TI TMS320C55x where 
> 'unsigned char', 'unsigned short', and 'unsigned' are all the same width 
> (which C allows),

Wow...  Why have they chosen this?  I guess the point is speed.

> and where picky -Wconversion compilers warn about assigning an unsigned char 
> value to an int variable.
> 
> Comments welcome; I haven't installed this.


> From 242504a103da5fc5cb868614a603710bad6a446a Mon Sep 17 00:00:00 2001
> From: Paul Eggert 
> Date: Sat, 5 Oct 2019 02:18:45 -0700
> Subject: [PATCH] =?UTF-8?q?Use=20=E2=80=9Cleast=E2=80=9D=20types=20for=20i?=
>  =?UTF-8?q?ntegers=20in=20Yacc=20tables?=
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> This changes the Yacc skeleton to use “least” integer types to
> keep tables smaller on some platforms, which should lessen cache
> pressure.  Since Bison uses the Yacc skeleton, it follows suit.

Maybe you want to extend the notes you added to NEWS.  Including
the bits about limits.h and stdint.h.  Maybe that should also make
its way into the doc, yet I wouldn't know where to write that.

> * data/skeletons/yacc.c (yytype_uint8, yytype_int8)
> (yytype_uint16, yytype_int16): If available, use GCC predefined
> macros __INT_MAX__ etc. to select a “least” type, as this avoids
> namespace hassles.  Otherwise, if available (because the relevant
> headers have been included) fall back on selecting a “least” type
> via the C99 macros INT_MAX, INT_LEAST8_MAX, etc.  Otherwise, fall
> further back on one of the builtin C99 types signed char, short,
> and int.  Make sure that any selected type promotes to int.
> Ignore any macros YYTYPE_INT16, YYTYPE_INT8, YYTYPE_UINT16,
> YYTYPE_UINT8 defined by the user.
> (yytype_uint8, yytype_uint16): Do not assume that unsigned char
> and unsigned short promote to int, as this isn’t true on some
> platforms (e.g., TI TMS320C55x).

Eventually, all this should move up into c.m4, and be applied to
glr.c too.  And see what to do about C++ parsers.

> * src/parse-gram.y: Include stdint.h, to help the yacc skeleton.
> (YYTYPE_INT16, YYTYPE_INT8, YYTYPE_UINT16, YYTYPE_UINT8):
> Remove, as these are no longer effective.

Good.


> +/* Narrow types that promote to a signed type and that can represent a
> +   signed or unsigned integer of at least N bits.  In tables they can
> +   save space and decrease cache pressure.  For best results, either
> +   use a compiler like GCC that predefines macros like __INT_MAX__, or
> +   include limits.h and stdint.h in your grammar prolog.  */
> +
> +#if defined __UINT_LEAST8_MAX__ && __UINT_LEAST8_MAX__ <= __INT_MAX__
> +typedef __UINT_LEAST8_TYPE__ yytype_uint8;
> +#elif defined UINT_LEAST8_MAX && defined INT_MAX && UINT_LEAST8_MAX <= 
> INT_MAX
> +typedef uint_least8_t yytype_uint8;
> +#elif defined UCHAR_MAX && UCHAR_MAX <= INT_MAX
>  typedef unsigned char yytype_uint8;
> +#else
> +typedef int yytype_uint8;
>  #endif

Wow!  Why do we fallback to int?  Is this part where unsigned int == unsigned 
short == unsigned char on a little number of architecture?  IIUC, you prefer to 
fall back to many bytes on all the architectures that don't define UCHAR_MAX 
rather than having warnings about storing an unsigned in an unsigned type?  I 
guess that will be frequent: users of other compilers than GCC who don't 
include limits.h.

I don't understand that.  I don't remember receiving bug reports for 
portability issues with "typedef unsigned char yytype_uint8".  The medicine 
seems worse than the disease to me.


Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-05 Thread Akim Demaille



> Le 5 oct. 2019 à 12:30, Paul Eggert  a écrit :
> 
> On 10/5/19 3:07 AM, Akim Demaille wrote:
> 
>>> This is a bug in Clang, as both operands of ? are ptrdiff_t. You might try 
>>> filing a bug report with the Clang folks.
>> Maybe the error in actually in the division by the unsigned and the 
>> diagnostic is wrong. 
> 
> Although that's possible, it'd still be a Clang bug.

Sure it is.  I just wanted to characterize it, and indeed, with a cast, the CI 
was happy.  Your fix is fine too.

> There should be no diagnostic for arithmetic on constants where the result is 
> mathematically correct. GCC generally gets this right.
> 
> My experience in other projects is that we don't have time to maintain 
> workarounds for Clang warning bugs (or for warning bugs in old GCCs, for that 
> matter). But if you want to take on the task,

Bison is in a different place and cannot be compared to most projects: we embed 
code in project of others.  I don't care that much about bison-the-executable 
(which is in C99 anyway), but the problem at hand is/was in the output.




Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-05 Thread Paul Eggert
I installed the attached patch, which should work around the clang 8 bug. At 
least, it fixes things on Fedora 30, which has clang version 8.0.0 (Fedora 
8.0.0-3.fc30).
>From e69b47cd189d7eb45668acce4cf17815401bf2a5 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sat, 5 Oct 2019 03:41:53 -0700
Subject: [PATCH] * data/skeletons/glr.c (yysplitStack): Pacify Clang 8.

---
 data/skeletons/glr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/data/skeletons/glr.c b/data/skeletons/glr.c
index 0e8f60e9..c3c1a9a4 100644
--- a/data/skeletons/glr.c
+++ b/data/skeletons/glr.c
@@ -1506,7 +1506,8 @@ yysplitStack (yyGLRStack* yystackp, ptrdiff_t yyk)
 {
   yyGLRState** yynewStates = YY_NULLPTR;
   yybool* yynewLookaheadNeeds;
-  ptrdiff_t half_max_capacity = YYSIZEMAX / (2 * sizeof yynewStates[0]);
+  ptrdiff_t state_size = sizeof yynewStates[0];
+  ptrdiff_t half_max_capacity = YYSIZEMAX / 2 / state_size;
 
   if (half_max_capacity < yystackp->yytops.yycapacity)
 yyMemoryExhausted (yystackp);
-- 
2.17.1



Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-05 Thread Paul Eggert

On 10/5/19 3:07 AM, Akim Demaille wrote:


This is a bug in Clang, as both operands of ? are ptrdiff_t. You might try 
filing a bug report with the Clang folks.


Maybe the error in actually in the division by the unsigned and the diagnostic is wrong. 


Although that's possible, it'd still be a Clang bug. There should be no 
diagnostic for arithmetic on constants where the result is mathematically 
correct. GCC generally gets this right.


My experience in other projects is that we don't have time to maintain 
workarounds for Clang warning bugs (or for warning bugs in old GCCs, for that 
matter). But if you want to take on the task,




Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-05 Thread Paul Eggert

On 10/5/19 3:13 AM, Akim Demaille wrote:

-ARGMATCH_DEFINE_GROUP(report, enum report);
+ARGMATCH_DEFINE_GROUP(report, enum report)
It looks ugly to me eyes.  How about I change ARGMATCH_DEFINE_GROUP so that it finishes with a function prototype, for instance, and we restore that semicolon? 


Feel free, though to be honest I'd rather that ARGMATCH_DEFINE_GROUP were 
replaced by a set of functions and types somehow. Huge macros give me the willies.




Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-05 Thread Akim Demaille



> Le 5 oct. 2019 à 10:48, Paul Eggert  a écrit :
> 

> -ARGMATCH_DEFINE_GROUP(report, enum report);
> +ARGMATCH_DEFINE_GROUP(report, enum report)

It looks ugly to me eyes.  How about I change ARGMATCH_DEFINE_GROUP so that it 
finishes with a function prototype, for instance, and we restore that semicolon?




Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-05 Thread Akim Demaille



> Le 5 oct. 2019 à 10:50, Paul Eggert  a écrit :
> 
> On 10/5/19 12:28 AM, Akim Demaille wrote:
>> I can't wait: these errors are blocking from me from pushing other changes 
>> that I was about to send.
> 
> Thanks for fixing that. I don't want to block you, and please feel free to 
> revert or amend any of my changes.

It's also that, given your recent sourcequakes, I'm very afraid that some of my 
branches become int-rotten :)  So I'd like to push them before additional large 
changes.

> I don't know C++ and don't particularly want to learn (just as I don't want 
> to learn Fortran :-) so I particularly rely on your expertise there.

I love when you call me an expert :)

Well, technically you didn't actually wrote that, but I'll keep on reading it 
that way :)


Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-05 Thread Akim Demaille
Hi Paul,

Thanks for the quick response!

> Le 5 oct. 2019 à 10:48, Paul Eggert  a écrit :
> 
> On 10/3/19 10:02 PM, Akim Demaille wrote:
> 
>> I'm going back to Clang 3.3 and GCC 4.6.  That's all I managed to set up on 
>> Travis.  I would also like to try tcc, but only to run the tests, not build 
>> bison itself.  Travis also supports MSVC; when I feel brave enough, I'll see 
>> if I can use it for Bison.
> 
> For what it's worth, I can't build Bison master on Fedora 30 (the current 
> stable version of Fedora). Something to do with gettext version numbers.

I currently don't have Gettext 0.20 available on my distro, and bootstrap pulls 
gnulib-po which installs files from Gettext 0.20.  So I have to `cp 
po/Makefile.in.in gnulib-po`.  That's also what the CI does.


> At some point I suppose I should file a bug report. I work around the bug 
> with --disable-nls, but that's not compatible with --enable-gcc-warnings (so 
> I suppose I should file *two* bug reports :-).

I had not noticed this :(

> Also, I have been trying to build Bison master on Solaris 10 (the oldest 
> Solaris still supported) with Oracle Developer Studio 12.6 (the current 
> stable version). I found a few glitches and just now installed the attached. 
> I think there will be a few glitches more.

It's great that you tried that!  Nelson, at some point, had sent me logs about 
this platform, but that was a long time ago.

Maybe some of these fixes should make it into `maint`, in case we have to 
release 3.4.3.


>> We can introduce a %define variable to enable the new type for 
>> columns/locations, and hook it to %require 3.5 to promote the conversion.
> 
> Should we let the user specify the type (any integer type, I suppose), or 
> limit the user to just 'unsigned' and 'intmax_t'?

I'd rather play it safe and offer only a binary.  I am tired to writing tests 
for convoluted cases :)


> Perhaps call the type 'yy_pos_num' in yacc.c? (Is there a reason it's 
> yy_state_num in yacc.c but yyStateNum in glr.c?) Just thinking out loud.

Well, the reason is that yacc.c and the rest followed "our" style, but when 
Paul Hilfinger contributed glr.c, no-one told him to follow the GCS, and the 
code remained as is.  I would not object to migrate it to the "official" style, 
but I did not felt obligated to do so.  Even if that does mean that on 
occasions, for consistency between glr.c and yacc.c, I write 
snake_case_functions in glr.c, which ends up being self-inconsistent.


>>>   YY_CONVERT_INT_BEGIN
>>>   *yyssp = yystate;
>>>   YY_CONVERT_INT_END
>> Why would a warning here be bogus?  It looks to me like a perfect place to 
>> use a good old cast, as we're narrowing the type.
> 
> I don't like casts, because they mean the compiler won't catch serious errors 
> such as mistakenly assigning a pointer to an integer.

I fully agree.  Actually, C is too crude on casts: there's just one almighty 
cast.  C++ has split this in many casts (originally four, but there are more 
now, some of them being plain templated functions).  We could use macros to 
write clearer casts, say

*yyssp = NARROWING_CAST (yy_state_num, yystate);

which could also, in debug mode, include bounds checking and magical pragmas to 
silence some compilers.  But one has 

But in this precise case, I don't see much of a difference between a cast and 
dark pragma magic: in both cases it means "dear compiler, shut up", yet the 
cast is truly portable.


> Other GNU programs typically do not enable -Wconversion or 
> -Wimplicit-int-conversion because they generate too many false alarms. I 
> figured I could remove the need for casts by disabling the misguided warnings.
> 
> However, after seeing your diagnostics I suppose that the pragmas are more 
> trouble than they're worth. Bison skeletons should be robust even in the 
> presence of misguided compiler options like -Wconversion (because some apps 
> are foolish enough to use such options :-) and it's such a pain to turn off 
> these options portably that casts are probably the way to go in skeletons, 
> despite their flaws.

Right.


>> GCC 8 with ASAN (https://api.travis-ci.org/v3/job/592926526/log.txt):
> 
> That URL uses Clang 8, not GCC 8. Of course Bison should work with Clang too, 
> but see below.
>>> 113. output.at:293: testing Output file name: `~!@#$%^&*()-=_+{}[]|\:;<>, 
>>> .' ...
>>> ../../tests/output.at:293: touch "\`~!@#\$%^&*()-=_+{}[]|\\:;<>, .'.tmp" || 
>>> exit 77
>>> ../../tests/output.at:293: COLUMNS=1000; export COLUMNS;  bison --color=no 
>>> -fno-caret -o "\`~!@#\$%^&*()-=_+{}[]|\\:;<>, .'.c" 
>>> --defines="\`~!@#\$%^&*()-=_+{}[]|\\:;<>, .'.h" glr.y
>>> ../../tests/output.at:293: ls "\`~!@#\$%^&*()-=_+{}[]|\\:;<>, .'.c" 
>>> "\`~!@#\$%^&*()-=_+{}[]|\\:;<>, .'.h"
>>> stdout:
>>> `~!@#$%^&*()-=_+{}[]|\:;<>, .'.c
>>> `~!@#$%^&*()-=_+{}[]|\:;<>, .'.h
>>> ../../tests/output.at:293: $CC $CFLAGS $CPPFLAGS  -c -o glr.o -c 
>>> "\`~!@#\$%^&*()-=_+{}[]|\\:;<>, .'.c"
>>> stderr:
>>> `~!@#$%^&*()-=_+{}[]|\

Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-05 Thread Paul Eggert

On 10/1/19 1:35 AM, Paul Eggert wrote:
Actually those arrays use int_fast16_t not int16_t, as C99/C11/C18 does not 
require support for int16_t. It could well be more efficient for them to use 
int_least16_t instead, for better caching


Attached is a proposed patch to implement this for Bison. This patch also fixes 
some portability problems for odd machines like the TI TMS320C55x where 
'unsigned char', 'unsigned short', and 'unsigned' are all the same width (which 
C allows), and where picky -Wconversion compilers warn about assigning an 
unsigned char value to an int variable.


Comments welcome; I haven't installed this.
From 242504a103da5fc5cb868614a603710bad6a446a Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sat, 5 Oct 2019 02:18:45 -0700
Subject: [PATCH] =?UTF-8?q?Use=20=E2=80=9Cleast=E2=80=9D=20types=20for=20i?=
 =?UTF-8?q?ntegers=20in=20Yacc=20tables?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This changes the Yacc skeleton to use “least” integer types to
keep tables smaller on some platforms, which should lessen cache
pressure.  Since Bison uses the Yacc skeleton, it follows suit.
* data/skeletons/yacc.c (yytype_uint8, yytype_int8)
(yytype_uint16, yytype_int16): If available, use GCC predefined
macros __INT_MAX__ etc. to select a “least” type, as this avoids
namespace hassles.  Otherwise, if available (because the relevant
headers have been included) fall back on selecting a “least” type
via the C99 macros INT_MAX, INT_LEAST8_MAX, etc.  Otherwise, fall
further back on one of the builtin C99 types signed char, short,
and int.  Make sure that any selected type promotes to int.
Ignore any macros YYTYPE_INT16, YYTYPE_INT8, YYTYPE_UINT16,
YYTYPE_UINT8 defined by the user.
(yytype_uint8, yytype_uint16): Do not assume that unsigned char
and unsigned short promote to int, as this isn’t true on some
platforms (e.g., TI TMS320C55x).
* src/parse-gram.y: Include stdint.h, to help the yacc skeleton.
(YYTYPE_INT16, YYTYPE_INT8, YYTYPE_UINT16, YYTYPE_UINT8):
Remove, as these are no longer effective.
---
 data/skeletons/yacc.c | 39 +--
 src/parse-gram.y  |  6 +-
 2 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/data/skeletons/yacc.c b/data/skeletons/yacc.c
index fdf4ba6d..41f22aa4 100644
--- a/data/skeletons/yacc.c
+++ b/data/skeletons/yacc.c
@@ -388,30 +388,49 @@ m4_if(b4_api_prefix, [yy], [],
 # undef short
 #endif
 
-#ifdef YYTYPE_UINT8
-typedef YYTYPE_UINT8 yytype_uint8;
-#else
+/* Narrow types that promote to a signed type and that can represent a
+   signed or unsigned integer of at least N bits.  In tables they can
+   save space and decrease cache pressure.  For best results, either
+   use a compiler like GCC that predefines macros like __INT_MAX__, or
+   include limits.h and stdint.h in your grammar prolog.  */
+
+#if defined __UINT_LEAST8_MAX__ && __UINT_LEAST8_MAX__ <= __INT_MAX__
+typedef __UINT_LEAST8_TYPE__ yytype_uint8;
+#elif defined UINT_LEAST8_MAX && defined INT_MAX && UINT_LEAST8_MAX <= INT_MAX
+typedef uint_least8_t yytype_uint8;
+#elif defined UCHAR_MAX && UCHAR_MAX <= INT_MAX
 typedef unsigned char yytype_uint8;
+#else
+typedef int yytype_uint8;
 #endif
 
-#ifdef YYTYPE_INT8
-typedef YYTYPE_INT8 yytype_int8;
+#if defined __INT_LEAST8_MAX__ && __INT_LEAST8_MAX__ <= __INT_MAX__
+typedef __INT_LEAST8_TYPE__ yytype_int8;
+#elif defined INT_LEAST8_MAX && defined INT_MAX && INT_LEAST8_MAX <= INT_MAX
+typedef int_least8_t yytype_int8;
 #else
 typedef signed char yytype_int8;
 #endif
 
-#ifdef YYTYPE_UINT16
-typedef YYTYPE_UINT16 yytype_uint16;
-#else
+#if defined __UINT_LEAST16_MAX__ && __UINT_LEAST16_MAX__ <= __INT_MAX__
+typedef __UINT_LEAST16_TYPE__ yytype_uint16;
+#elif defined UINT_LEAST16_MAX && defined INT_MAX && UINT_LEAST16_MAX <= 
INT_MAX
+typedef uint_least16_t yytype_uint16;
+#elif defined USHRT_MAX && USHRT_MAX <= INT_MAX
 typedef unsigned short yytype_uint16;
+#else
+typedef int yytype_uint16;
 #endif
 
-#ifdef YYTYPE_INT16
-typedef YYTYPE_INT16 yytype_int16;
+#if defined __INT_LEAST16_MAX__ && __INT_LEAST16_MAX__ <= __INT_MAX__
+typedef __INT_LEAST16_TYPE__ yytype_int16;
+#elif defined INT_LEAST16_MAX && defined INT_MAX && INT_LEAST16_MAX <= INT_MAX
+typedef int_least16_t yytype_int16;
 #else
 typedef short yytype_int16;
 #endif
 
+
 #ifndef YYPTRDIFF_T
 # if defined __PTRDIFF_TYPE__ && defined __PTRDIFF_MAX__
 #  define YYPTRDIFF_T __PTRDIFF_TYPE__
diff --git a/src/parse-gram.y b/src/parse-gram.y
index b4132841..ab3be761 100644
--- a/src/parse-gram.y
+++ b/src/parse-gram.y
@@ -33,6 +33,7 @@
 {
   #include "system.h"
   #include 
+  #include 
 
   #include "c-ctype.h"
   #include "complain.h"
@@ -112,11 +113,6 @@
   /* A string that describes a char (e.g., 'a' -> "'a'").  */
   static char const *char_name (char);
 
-  #define YYTYPE_INT16 int_fast16_t
-  #define YYTYPE_INT8 int_fast8_t
-  #define YYTYPE_UINT16 uint_fast16_t
-  #define YYTYPE_UINT8 uint_fast8_t
-
   

Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-05 Thread Paul Eggert

On 10/5/19 12:28 AM, Akim Demaille wrote:

I can't wait: these errors are blocking from me from pushing other changes that 
I was about to send.


Thanks for fixing that. I don't want to block you, and please feel free to 
revert or amend any of my changes. I don't know C++ and don't particularly want 
to learn (just as I don't want to learn Fortran :-) so I particularly rely on 
your expertise there.




Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-05 Thread Paul Eggert

On 10/3/19 10:02 PM, Akim Demaille wrote:


I'm going back to Clang 3.3 and GCC 4.6.  That's all I managed to set up on 
Travis.  I would also like to try tcc, but only to run the tests, not build 
bison itself.  Travis also supports MSVC; when I feel brave enough, I'll see if 
I can use it for Bison.


For what it's worth, I can't build Bison master on Fedora 30 (the current stable 
version of Fedora). Something to do with gettext version numbers. At some point 
I suppose I should file a bug report. I work around the bug with --disable-nls, 
but that's not compatible with --enable-gcc-warnings (so I suppose I should file 
*two* bug reports :-).


Also, I have been trying to build Bison master on Solaris 10 (the oldest Solaris 
still supported) with Oracle Developer Studio 12.6 (the current stable version). 
I found a few glitches and just now installed the attached. I think there will 
be a few glitches more.



We can introduce a %define variable to enable the new type for 
columns/locations, and hook it to %require 3.5 to promote the conversion.


Should we let the user specify the type (any integer type, I suppose), or limit 
the user to just 'unsigned' and 'intmax_t'? Perhaps call the type 'yy_pos_num' 
in yacc.c? (Is there a reason it's yy_state_num in yacc.c but yyStateNum in 
glr.c?) Just thinking out loud.




   YY_CONVERT_INT_BEGIN
   *yyssp = yystate;
   YY_CONVERT_INT_END


Why would a warning here be bogus?  It looks to me like a perfect place to use 
a good old cast, as we're narrowing the type.


I don't like casts, because they mean the compiler won't catch serious errors 
such as mistakenly assigning a pointer to an integer. Other GNU programs 
typically do not enable -Wconversion or -Wimplicit-int-conversion because they 
generate too many false alarms. I figured I could remove the need for casts by 
disabling the misguided warnings.


However, after seeing your diagnostics I suppose that the pragmas are more 
trouble than they're worth. Bison skeletons should be robust even in the 
presence of misguided compiler options like -Wconversion (because some apps are 
foolish enough to use such options :-) and it's such a pain to turn off these 
options portably that casts are probably the way to go in skeletons, despite 
their flaws.

GCC 8 with ASAN (https://api.travis-ci.org/v3/job/592926526/log.txt):


That URL uses Clang 8, not GCC 8. Of course Bison should work with Clang too, 
but see below.

113. output.at:293: testing Output file name: `~!@#$%^&*()-=_+{}[]|\:;<>, .' ...
../../tests/output.at:293: touch "\`~!@#\$%^&*()-=_+{}[]|\\:;<>, .'.tmp" || 
exit 77
../../tests/output.at:293: COLUMNS=1000; export COLUMNS;  bison --color=no -fno-caret -o 
"\`~!@#\$%^&*()-=_+{}[]|\\:;<>, .'.c" --defines="\`~!@#\$%^&*()-=_+{}[]|\\:;<>, 
.'.h" glr.y
../../tests/output.at:293: ls "\`~!@#\$%^&*()-=_+{}[]|\\:;<>, .'.c" 
"\`~!@#\$%^&*()-=_+{}[]|\\:;<>, .'.h"
stdout:
`~!@#$%^&*()-=_+{}[]|\:;<>, .'.c
`~!@#$%^&*()-=_+{}[]|\:;<>, .'.h
../../tests/output.at:293: $CC $CFLAGS $CPPFLAGS  -c -o glr.o -c 
"\`~!@#\$%^&*()-=_+{}[]|\\:;<>, .'.c"
stderr:
`~!@#$%^&*()-=_+{}[]|\:;<>, .'.c:1467:37: error: operand of ? changes 
signedness: 'ptrdiff_t' (aka 'long') to 'unsigned long' [-Werror,-Wsign-conversion]
   ptrdiff_t half_max_capacity = YYSIZEMAX / (2 * sizeof yynewStates[0]);
 ^ ~
`~!@#$%^&*()-=_+{}[]|\:;<>, .'.c:132:59: note: expanded from macro 'YYSIZEMAX'
#define YYSIZEMAX (PTRDIFF_MAX < SIZE_MAX ? PTRDIFF_MAX : (ptrdiff_t) SIZE_MAX)


This is a bug in Clang, as both operands of ? are ptrdiff_t. You might try 
filing a bug report with the Clang folks.


My experience with Clang warnings has not been great. Clang issues multiple 
bogus warnings when compiling Bison itself, and there's little point to enabling 
warnings, particularly in the test cases since Clang issues a bunch of false 
alarms, so many that it's not worth looking for wheat among the chaff. I suggest 
using --enable-gcc-warnings only with recent versions of GCC, which is my 
practice in other projects. It's too much work to pacify Clang or older GCC 
versions, and there's little benefit to doing so.


There's also the problem of using __STDC_VERSION__ which might be undefined. 


Sorry about that, and thanks for fixing it.
>From 41e84cddc778d2a8902b1cdff6a85f3ea3422512 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sat, 5 Oct 2019 01:07:52 -0700
Subject: [PATCH 1/3] Port lexcalc scan.l to Solaris 10

* examples/c/lexcalc/scan.l: Include errno.h.
---
 examples/c/lexcalc/scan.l | 1 +
 1 file changed, 1 insertion(+)

diff --git a/examples/c/lexcalc/scan.l b/examples/c/lexcalc/scan.l
index fc73f3b0..815f7782 100644
--- a/examples/c/lexcalc/scan.l
+++ b/examples/c/lexcalc/scan.l
@@ -4,6 +4,7 @@
 %option nodefault noinput nounput noyywrap
 
 %{
+#include  /* errno, ERANGE */
 #include  /* INT_MIN */
 #include  /* strtol */
 
-- 
2.17.1

>From b75b05528803de1582d7bcfbd3480ee70a0e987b Mon Sep 17 00:00:00 

Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-05 Thread Akim Demaille
Hi Paul,

> Le 4 oct. 2019 à 07:02, Akim Demaille  a écrit :
> 
>>> Old compilers choke on the current code in master, we have to stop and 
>>> focus on this first.
>> 
>> I can help with that if you'll give me advice about how old to worry about 
>> (as long as it's not before GCC 3.4.3 :-).
> 
> There are several types of errors.  Some are due to your removing some casts, 
> and replacing them with pragmas.  Actually, I don't understand what you meant:
> 
>> /* Suppress bogus -Wconversion warnings from GCC.  */
>> #if 4 < __GNUC__ + (7 <= __GNUC_MINOR__)
>> # define YY_CONVERT_INT_BEGIN \
>>_Pragma ("GCC diagnostic push") \
>>_Pragma ("GCC diagnostic ignored \"-Wconversion\"")
>> # define YY_CONVERT_INT_END \
>>_Pragma ("GCC diagnostic pop")
>> #else
>> # define YY_CONVERT_INT_BEGIN
>> # define YY_CONVERT_INT_END
>> #endif
>> 
>>typedef yytype_int8 yy_state_num;
>>int yystate;
>>yy_state_num *yyssp;
>> 
>>  YY_CONVERT_INT_BEGIN
>>  *yyssp = yystate;
>>  YY_CONVERT_INT_END

I can't wait: these errors are blocking from me from pushing other changes that 
I was about to send.

This patch is trivial.  I still don't understand what the Pragma approach was 
buying us, but we can restore it afterwards.

commit 5709f94a91155149fd69479b3dd179de674935b7
Author: Akim Demaille 
Date:   Sat Oct 5 08:55:51 2019 +0200

yacc.c: use casts instead of pragmas when losing integer width

For instance with Clang 4, 8, etc.:

input.c:1166:12: error: implicit conversion loses integer precision: 
'int' to 'yy_state_num' (aka 'signed char') [-Werror,-Wconversion]
  *yyssp = yystate;
 ~ ^~~

And GCC 8:

input.c:1166:12: error: implicit conversion loses integer precision: 
'int' to 'yy_state_num' (aka 'signed char') [-Werror,-Wimplicit-int-conversion]
  *yyssp = yystate;
 ~ ^~~

* data/skeletons/yacc.c (YY_CONVERT_INT_BEGIN): Remove.
Adjust callers.

diff --git a/data/skeletons/yacc.c b/data/skeletons/yacc.c
index f3ef63fe..fdf4ba6d 100644
--- a/data/skeletons/yacc.c
+++ b/data/skeletons/yacc.c
@@ -468,18 +468,6 @@ typedef ]b4_int_type(0, m4_eval(b4_states_number - 1))[ 
yy_state_num;
 
 ]b4_attribute_define[
 
-/* Suppress bogus -Wconversion warnings from GCC.  */
-#if 4 < __GNUC__ + (7 <= __GNUC_MINOR__)
-# define YY_CONVERT_INT_BEGIN \
-_Pragma ("GCC diagnostic push") \
-_Pragma ("GCC diagnostic ignored \"-Wconversion\"")
-# define YY_CONVERT_INT_END \
-_Pragma ("GCC diagnostic pop")
-#else
-# define YY_CONVERT_INT_BEGIN
-# define YY_CONVERT_INT_END
-#endif
-
 ]b4_parse_assert_if([[#ifdef NDEBUG
 # define YY_ASSERT(E) ((void) (0 && (E)))
 #else
@@ -1062,9 +1050,7 @@ yy_lac (yy_state_num *yyesa, yy_state_num **yyes,
 if (yyesp == yyes_prev)
   {
 yyesp = *yyes;
-YY_CONVERT_INT_BEGIN
-*yyesp = yystate;
-YY_CONVERT_INT_END
+*yyesp = (yy_state_num) yystate;
   }
 else
   {
@@ -1077,9 +1063,7 @@ yy_lac (yy_state_num *yyesa, yy_state_num **yyes,
 YYDPRINTF ((stderr, "\n"));
 return 2;
   }
-YY_CONVERT_INT_BEGIN
-*++yyesp = yystate;
-YY_CONVERT_INT_END
+*++yyesp = (yy_state_num) yystate;
   }
 YYDPRINTF ((stderr, " G%d", yystate));
   }
@@ -1542,9 +1526,7 @@ yynewstate:
 yysetstate:
   YYDPRINTF ((stderr, "Entering state %d\n", yystate));
   YY_ASSERT (0 <= yystate && yystate < YYNSTATES);
-  YY_CONVERT_INT_BEGIN
-  *yyssp = yystate;
-  YY_CONVERT_INT_END
+   *yyssp = (yy_state_num) yystate;
 
   if (yyss + yystacksize - 1 <= yyssp)
 #if !defined yyoverflow && !defined YYSTACK_RELOCATE




Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-03 Thread Akim Demaille
Hi!

> Le 3 oct. 2019 à 20:20, Paul Eggert  a écrit :
> 
> On 10/2/19 11:28 PM, Akim Demaille wrote:
> 
>> Please, in the future, make smaller commits, per topic.
> 
> Yes, I'll try to split things up better.

Thanks.


>>  And run the test suite with an old compiler too.
> How old? Are we backward compatible to (say) GCC 3.4.3? That's the oldest I 
> have convenient access to, though it's on reaally slow hardware.

Not that old.  I wish I could easily reach older compilers but so far, I'm 
going back to Clang 3.3 and GCC 4.6.  That's all I managed to set up on Travis. 
 I would also like to try tcc, but only to run the tests, not build bison 
itself.  Travis also supports MSVC; when I feel brave enough, I'll see if I can 
use it for Bison.

Also, don't forget about ./configure --enable-gcc-warnings.


> The Bison manual says that Bison itself needs only C99, and that 
> Bison-generated C parsers need only C89 (or C99 for glr) or C++98, but I 
> don't know how that relates to compiler (and presumably library) version 
> numbers.

Can't help you here.

> At some point I suppose we should drop support for C89 as it is now more 
> trouble than it's worth. But that can wait.

Agreed.


>> I don't want to break the API for the generated locations in C++.  At least 
>> we need to talk about this with users.
> 
> Yes, that makes sense. Also, if we're going to change the API for line and 
> column numbers, I suggest we go to a wider type instead of stopping at 'int'. 
> These days, 'int' and 'unsigned' are both too narrow for some possible 
> applications of Bison, and the only portable-to-all-modern-C solutions that 
> are signed are 'intmax_t' and 'long long'. I could go with either of those.

Sounds good.

> How can we change the API to switch from 'unsigned' to 'intmax_t' (or to 
> 'long long') without breaking existing apps?

We can introduce a %define variable to enable the new type for 
columns/locations, and hook it to %require 3.5 to promote the conversion.

> I don't have enough C++ expertise to know how to do this in the best way.

We could play with overloading to support both types, but it will probably be 
nasty to disambiguate, to document.  I think we should rather address that at 
generation time.


>> [...] clutters the documentation.
> 
> OK, I changed it back, and put in a little note saying "this code assumes 
> that malloc and realloc calls always succeed" or something like that. See the 
> first attached patch. The second attachment fixes a lack of clarity with the 
> documentation of malloc elsewhere, which I noticed with the first patch.

Both are fine, thanks!

>> Old compilers choke on the current code in master, we have to stop and focus 
>> on this first.
> 
> I can help with that if you'll give me advice about how old to worry about 
> (as long as it's not before GCC 3.4.3 :-).


There are several types of errors.  Some are due to your removing some casts, 
and replacing them with pragmas.  Actually, I don't understand what you meant:

> /* Suppress bogus -Wconversion warnings from GCC.  */
> #if 4 < __GNUC__ + (7 <= __GNUC_MINOR__)
> # define YY_CONVERT_INT_BEGIN \
> _Pragma ("GCC diagnostic push") \
> _Pragma ("GCC diagnostic ignored \"-Wconversion\"")
> # define YY_CONVERT_INT_END \
> _Pragma ("GCC diagnostic pop")
> #else
> # define YY_CONVERT_INT_BEGIN
> # define YY_CONVERT_INT_END
> #endif
> 
> typedef yytype_int8 yy_state_num;
> int yystate;
> yy_state_num *yyssp;
> 
>   YY_CONVERT_INT_BEGIN
>   *yyssp = yystate;
>   YY_CONVERT_INT_END

Why would a warning here be bogus?  It looks to me like a perfect place to use 
a good old cast, as we're narrowing the type.  And that would help fixing the 
warnings we get from clang.  For instance:

Clang 3.3 (https://api.travis-ci.org/v3/job/592926546/log.txt)

> 25. input.at:1201: testing Torturing the Scanner ...
> ../../tests/input.at:1205: COLUMNS=1000; export COLUMNS; 
> VALGRIND_OPTS="$VALGRIND_OPTS --leak-check=summary --show-reachable=no"; 
> export VALGRIND_OPTS;  bison --color=no -fno-caret input.y
> ../../tests/input.at:1213: COLUMNS=1000; export COLUMNS; 
> VALGRIND_OPTS="$VALGRIND_OPTS --leak-check=summary --show-reachable=no"; 
> export VALGRIND_OPTS;  bison --color=no -fno-caret -fcaret  input.y
> ../../tests/input.at:1343: COLUMNS=1000; export COLUMNS;  bison --color=no 
> -fno-caret -d -v -o input.c input.y
> ../../tests/input.at:1344: $CC $CFLAGS $CPPFLAGS  -c -o input.o input.c 
> stderr:
> input.c:1166:12: error: implicit conversion loses integer precision: 'int' to 
> 'yy_state_num' (aka 'signed char') [-Werror,-Wconversion]
>   *yyssp = yystate;
>  ~ ^~~
> 1 error generated.
> stdout:
> ../../tests/input.at:1344: exit code was 1, expected 0
> 25. input.at:1201: 25. Torturing the Scanner (input.at:1201): FAILED 
> (input.at:1344)


Clang 7 (https://api.travis-ci.org/v3/job/592926536/log.txt)

> 25. input.at:1201: testing Torturing the Scanner ...
> [...]
> ../../tests/input.at:1344: $

Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-03 Thread Akim Demaille
Hi Paul,

> Le 3 oct. 2019 à 20:20, Paul Eggert  a écrit :
> 
> On 10/2/19 11:28 PM, Akim Demaille wrote:
> 
>> But I see you already pushed it, although there are things I would have 
>> liked to discuss first.  Also, we now have failures in the history of master 
>> and maint, which I strive to avoid.  And the CI is red currently.
> 
> Sorry, I'm so rusty at Bison development I did not know the current rules. 
> What CI is that? I don't see it mentioned in 
>  or in 
> ; should it be?

Well...  It's not very compliant with the FSF policies, so I run it on my own:

- I keep a public repo on GitHub where I push my experiments and where push -f 
is authorized.  That's https://github.com/akimd/bison.
- In the repo, there is a .travis.yml: this is the CI I use, and it's publicly 
visible here: https://travis-ci.org/akimd/bison/builds
- In particular, our current failures: 
https://travis-ci.org/akimd/bison/builds/592926523
- Unfortunately, because I use "secrets" only my own jobs are run on the CI 
(otherwise one could easily change .travis.yml to display the secrets and just 
read them in the logs).

I don't have much time right now, I'll continue tomorrow in the morning.

Cheers!


Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-03 Thread Paul Eggert

On 10/2/19 11:28 PM, Akim Demaille wrote:


But I see you already pushed it, although there are things I would have liked 
to discuss first.  Also, we now have failures in the history of master and 
maint, which I strive to avoid.  And the CI is red currently.


Sorry, I'm so rusty at Bison development I did not know the current 
rules. What CI is that? I don't see it mentioned in 
 or in 
; should it be?



Please, in the future, make smaller commits, per topic.


Yes, I'll try to split things up better.


  And run the test suite with an old compiler too.
How old? Are we backward compatible to (say) GCC 3.4.3? That's the 
oldest I have convenient access to, though it's on reaally slow hardware.


The Bison manual says that Bison itself needs only C99, and that 
Bison-generated C parsers need only C89 (or C99 for glr) or C++98, but I 
don't know how that relates to compiler (and presumably library) version 
numbers.


At some point I suppose we should drop support for C89 as it is now more 
trouble than it's worth. But that can wait.



I don't want to break the API for the generated locations in C++.  At least we 
need to talk about this with users.


Yes, that makes sense. Also, if we're going to change the API for line 
and column numbers, I suggest we go to a wider type instead of stopping 
at 'int'. These days, 'int' and 'unsigned' are both too narrow for some 
possible applications of Bison, and the only portable-to-all-modern-C 
solutions that are signed are 'intmax_t' and 'long long'. I could go 
with either of those.


How can we change the API to switch from 'unsigned' to 'intmax_t' (or to 
'long long') without breaking existing apps? I don't have enough C++ 
expertise to know how to do this in the best way.


Also, I dislike the fact that the examples in the documentation become 
more complex for being more robust.  I fully agree that examples should 
not be too naive, but IMHO going from



if (i == length)
   {
 length *= 2;
 symbuf = realloc (symbuf, length + 1);
   }


to


if (bufsize <= i)
   {
 ptrdiff_t maxsize
   = (PTRDIFF_MAX < SIZE_MAX
  ? PTRDIFF_MAX : SIZE_MAX);
 if ((maxsize - 40) / 2 < bufsize)
   abort ();
 bufsize = 2 * bufsize + 40;
 symbuf = realloc (symbuf, bufsize);
 if (!symbuf)
   abort ();
   }


clutters the documentation.


OK, I changed it back, and put in a little note saying "this code 
assumes that malloc and realloc calls always succeed" or something like 
that. See the first attached patch. The second attachment fixes a lack 
of clarity with the documentation of malloc elsewhere, which I noticed 
with the first patch.



Old compilers choke on the current code in master, we have to stop and focus on 
this first.


I can help with that if you'll give me advice about how old to worry 
about (as long as it's not before GCC 3.4.3 :-).
>From e356c1ca5759e5f4336842483b550cbd6a566b5d Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Thu, 3 Oct 2019 11:15:02 -0700
Subject: [PATCH 1/2] Simplify mfcalc error handling
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* doc/bison.texi (Mfcalc Symbol Table, Mfcalc Lexer):
Don’t abort on memory allocation failure or integer overflow.
Instead, comment that these things aren’t checked for.
---
 doc/bison.texi | 20 +++-
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/doc/bison.texi b/doc/bison.texi
index 5545313d..a89142df 100644
--- a/doc/bison.texi
+++ b/doc/bison.texi
@@ -2662,19 +2662,21 @@ found, a pointer to that symbol is returned; otherwise zero is returned.
 
 @comment file: mfcalc.y: 3
 @example
-#include  /* malloc, abort. */
+@group
+/* The mfcalc code assumes that malloc and realloc
+   always succeed, and that integer calculations
+   never overflow.  Production-quality code should
+   not make these assumptions.  */
+#include  /* malloc, realloc. */
 #include  /* strlen. */
+@end group
 
 @group
 symrec *
 putsym (char const *name, int sym_type)
 @{
   symrec *res = (symrec *) malloc (sizeof (symrec));
-  if (!res)
-abort ();
   res->name = strdup (name);
-  if (!res->name)
-abort ();
   res->type = sym_type;
   res->value.var = 0; /* Set value to 0 even if fun. */
   res->next = sym_table;
@@ -2717,7 +2719,6 @@ operators in @code{yylex}.
 @example
 #include 
 #include 
-#include 
 
 @group
 int
@@ -2764,15 +2765,8 @@ Bison generated a definition of @code{YYSTYPE} with a member named
   /* If buffer is full, make it bigger. */
   if (bufsize <= i)
 @{
-  ptrdiff_t maxsize
-= (PTRDIFF_MAX < SIZE_MAX
-   ? PTRDIFF_MAX : SIZE_MAX);
-  if ((maxsize - 40) / 2 < bufsize)
-abort ();
   bufsize = 2 * bufsize + 40;
   symbuf = realloc (symbuf, bufsize);
-  if (!symbuf)
-  

Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-02 Thread Akim Demaille
Hi Paul,

> Le 3 oct. 2019 à 02:15, Paul Eggert  a écrit :
> 
> On 10/1/19 10:27 PM, Akim Demaille wrote:
>> Let's start from this to finish the changes wrt state types
> 
> I went through the skeletons and got rid of unsigned types as best I could, 
> and installed the attached patch.

This is massive!  Thanks!

But I see you already pushed it, although there are things I would have liked 
to discuss first.  Also, we now have failures in the history of master and 
maint, which I strive to avoid.  And the CI is red currently.

Please, in the future, make smaller commits, per topic.  And run the test suite 
with an old compiler too.


> This also removes some casts that I think are no longer needed. (It also adds 
> casts in some places, alas.)

Yep, I saw this.  This is cast within C itself :)


> Not being a C++ expert, I may have messed up the C++ skeletons but I figure 
> we can fix them as needed.

I'll do that, but I first have to re-rebase my changes from yesterday, which 
were waiting the green light from the CI.

About your changes.  I don't want to break the API for the generated locations 
in C++.  At least we need to talk about this with users.  Also, I dislike the 
fact that the examples in the documentation become more complex for being more 
robust.  I fully agree that examples should not be too naive, but IMHO going 
from

> if (i == length)
>   {
> length *= 2;
> symbuf = realloc (symbuf, length + 1);
>   }

to

> if (bufsize <= i)
>   {
> ptrdiff_t maxsize
>   = (PTRDIFF_MAX < SIZE_MAX
>  ? PTRDIFF_MAX : SIZE_MAX);
> if ((maxsize - 40) / 2 < bufsize)
>   abort ();
> bufsize = 2 * bufsize + 40;
> symbuf = realloc (symbuf, bufsize);
> if (!symbuf)
>   abort ();
>   }

clutters the documentation.

Old compilers choke on the current code in master, we have to stop and focus on 
this first.

Cheers!


Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-02 Thread Paul Eggert

On 10/1/19 10:27 PM, Akim Demaille wrote:

Let's start from this to finish the changes wrt state types


I went through the skeletons and got rid of unsigned types as best I 
could, and installed the attached patch. This also removes some casts 
that I think are no longer needed. (It also adds casts in some places, 
alas.) Not being a C++ expert, I may have messed up the C++ skeletons 
but I figure we can fix them as needed.
>From 9095910c7624194e819a6596700a3dd44995392b Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Wed, 2 Oct 2019 16:56:32 -0700
Subject: [PATCH] Prefer signed to unsigned integers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This patch contains more fixes to prefer signed to unsigned
integer types, as modern tools like 'gcc -fsanitize=undefined'
can check for signed integer overflow but not unsigned overflow.
* NEWS: Document the API change.
* boostrap.conf (gnulib_modules): Add intprops.
* data/skeletons/glr.c: Include stddef.h and stdint.h,
since this skeleton can assume C99 or later.
(YYSIZEMAX): Now signed, and the minimum of SIZE_MAX and PTRDIFF_MAX.
(yybool) [!__cplusplus]: Now signed (which is how bool behaves).
(YYTRANSLATE): Avoid use of unsigned, and make the macro
safe even for values greater than UINT_MAX.
(yytnamerr, struct yyGLRState, struct yyGLRStateSet, struct yyGLRStack)
(yyaddDeferredAction, yyinitStateSet, yyinitGLRStack)
(yyexpandGLRStack, yymarkStackDeleted, yyremoveDeletes)
(yyglrShift, yyglrShiftDefer, yy_reduce_print, yydoAction)
(yyglrReduce, yysplitStack, yyreportTree, yycompressStack)
(yyprocessOneStack, yyreportSyntaxError, yyrecoverSyntaxError)
(yyparse, yy_yypstack, yypstack, yypdumpstack):
* tests/input.at (Torturing the Scanner):
Prefer ptrdiff_t to size_t.
* data/skeletons/c++.m4 (b4_yytranslate_define):
* src/AnnotationList.c (AnnotationList__computePredecessorAnnotations):
* src/AnnotationList.h (AnnotationIndex):
* src/InadequacyList.h (InadequacyListNodeCount):
* src/closure.c (closure_new):
* src/complain.c (error_message, complains, complain_indent)
(complain_args, duplicate_directive, duplicate_rule_directive):
* src/gram.c (nritems, ritem_print, grammar_dump):
* src/ielr.c (ielr_compute_ritem_sees_lookahead_set)
(ielr_item_has_lookahead, ielr_compute_annotation_lists)
(ielr_compute_lookaheads):
* src/location.c (columns, boundary_print, location_print):
* src/muscle-tab.c (muscle_percent_define_insert)
(muscle_percent_define_check_values):
* src/output.c (prepare_rules, prepare_actions):
* src/parse-gram.y (id, handle_require):
* src/reader.c (record_merge_function_type, packgram):
* src/reduce.c (nuseless_productions, nuseless_nonterminals)
(inaccessable_symbols):
* src/relation.c (relation_print):
* src/scan-code.l (variant, variant_table_size, variant_count)
(variant_add, get_at_spec, show_sub_message, show_sub_messages)
(parse_ref):
* src/scan-gram.l ()
(scan_integer, convert_ucn_to_byte, handle_syncline):
* src/scan-skel.l (at_complain):
* src/symtab.c (complain_symbol_redeclared)
(complain_semantic_type_redeclared, complain_class_redeclared)
(symbol_class_set, complain_user_token_number_redeclared):
* src/tables.c (conflict_tos, conflrow, conflict_table)
(conflict_list, save_row, pack_vector):
* tests/local.at (AT_YYLEX_DEFINE(c)):
Prefer signed to unsigned integer.
* data/skeletons/lalr1.cc (yy_lac_check_):
* tests/actions.at (_AT_CHECK_PRINTER_AND_DESTRUCTOR):
* tests/local.at (AT_YYLEX_DEFINE(c)):
Omit now-unnecessary casts.
* data/skeletons/location.cc (b4_location_define):
* doc/bison.texi (Mfcalc Lexer, C++ position, C++ location):
Prefer int to unsigned for line and column numbers.
Change example to abort explicitly on memory exhaustion,
and fix an off-by-one bug that led to undefined behavior.
* data/skeletons/stack.hh (stack::operator[]):
Also allow ptrdiff_t indexes.
(stack::pop, slice::slice, slice::operator[]):
Index arg is now ptrdiff_t, not int.
(stack::ssize): New method.
(slice::range_): Now ptrdiff_t, not int.
* data/skeletons/yacc.c (b4_state_num_type): Remove.
All uses replaced by b4_int_type.
(YY_CONVERT_INT_BEGIN, YY_CONVERT_INT_END): New macros.
(yylac, yyparse): Use them around conversions that -Wconversion
would give false alarms about. 	Omit unnecessary casts.
(yy_stack_print): Use int rather than unsigned, and omit
a cast that doesn’t seem to be needed here any more.
* examples/c++/variant.yy (yylex):
* examples/c++/variant-11.yy (yylex):
Omit no-longer-needed conversions to unsigned.
* src/InadequacyList.c (InadequacyList__new_conflict):
Don’t assume *node_count is unsigned.
* src/output.c (muscle_insert_unsigned_table):
Remove; no longer used.
---
 NEWS   |   4 +
 bootstrap.conf |   2 +-
 data/skeletons/c++.m4  |   4 +-
 data/skeletons/c.m4|   2 +-
 data/skeletons/glr.c   | 208 ++---
 data/skeletons/lalr1.cc|   4 +-
 data/skeletons/location.cc |  27 +++--
 data/skeletons/stack.hh   

Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-02 Thread Paul Eggert

On 10/2/19 10:55 AM, Kaz Kylheku wrote:

Over time, POSIX has invented new identifiers that didn't follow
any namespace pattern. Or they suddenly invented new namespace
prefixes, unpredictably. We've survived.


To be fair, they've largely done this only when you include a POSIX header.


If some new_fangled_t
is introduced into  in 2050, there will be some
-DPOSIX_SOURCE=20500112 or whatever to enable its visibility,
and that of any function prototypes which use it.


There's no need for an implementation to do that since _t is reserved, 
and in practice many implementations don't do it.


"Conforming applications shall not use names beginning in yy or YY since 
the yacc parser uses such names."


Sorry, I didn't know that.



Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-02 Thread Kaz Kylheku

On 2019-10-01 22:27, Akim Demaille wrote:

Apparently like Kaz, I like to name my types *_t, so I'd be happy with
yy_state_t (scalars) and yy_state_least_t/yy_small_state_t (arrays)


The claiming of _t by POSIX is largely bunk.

Here is one counterargument:

1. Every finite string has the empty string as a suffix.

2. All current and foreseeable future POSIX and ISO C identifiers
   are finite; thus they have the empty string as a suffix.

3. Therefore, to avoid clashes, don't use identifiers that have an
   empty string suffix claimed by POSIX! Use only infinitely long
   identifiers.

Over time, POSIX has invented new identifiers that didn't follow
any namespace pattern. Or they suddenly invented new namespace
prefixes, unpredictably. We've survived.

We also have feature selection macros. If some new_fangled_t
is introduced into  in 2050, there will be some
-DPOSIX_SOURCE=20500112 or whatever to enable its visibility,
and that of any function prototypes which use it.


(given that here, we're kind of protected by yy_*: POSIX/ISO etc.
would be really bad people to start using them.)


POSIX people *are* using them; yacc is specified in POSIX!

"Conforming applications shall not use names beginning in yy or YY since 
the yacc parser uses such names."


https://pubs.opengroup.org/onlinepubs/9699919799/utilities/yacc.html

You better know this! :)



Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-02 Thread Paul Eggert

On 10/1/19 10:27 PM, Akim Demaille wrote:

I don't understand why you shy away from -128 and -32768 though.


The C Standard doesn't guarantee support for -128 and -32768. This originally 
was for ones' complement machines such as the Unisys ClearPath Dorado enterprise 
servers (still in use via firmware translation to Intel Xeon, and they have a C 
compiler), as well as for signed-magnitude machines that are no longer in use. 
Nowadays at least in theory this can be useful for debugging implementations on 
two's-complement machines, which can use -32768 as a trap representation, though 
it appears nobody is currently doing that and the next C standard may change in 
this area. See:


Seacord RC. Uninitialized reads: understanding the proposed revisions to the C 
language. ACM Queue. 2017;14(6). https://queue.acm.org/detail.cfm?id=3041020


Bison should support the current C standard (who knows? maybe some 
Bison-generated code will run on those Unisys mainframes :-) and so I stuck with 
the standard values.


I like to name my types *_t, so I'd be happy with yy_state_t (scalars) and 
yy_state_least_t/yy_small_state_t (arrays) instead of int and yy_state_num 
currently. But we've had disagreements on this regard, yet I don't know where 
you stand today (given that here, we're kind of protected by yy_*: POSIX/ISO etc. would be really bad people to start using them.)


In the past I was reasonably strict about avoiding user-defined type names 
ending in "_t", mostly because that suffix is reserved by POSIX, but partly 
because I simply dislike the tradition of the Hungarian notation 
. But you're right that 
"yy_*_t" should be safe enough.




Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-02 Thread Hans Åberg


> On 2 Oct 2019, at 00:43, Paul Eggert  wrote:
> 
> However, we should avoid unsigned types that are 'unsigned' or wider, as they 
> have too many issues. I doubt whether there are practical uses of Bison with 
> more than INT_MAX states; but if there are, we should use ptrdiff_t to count 
> states, not int, because any application likely to exceed INT_MAX is also 
> pretty likely to exceed UINT_MAX.
> 
> There is an area where Bison uses 'int' when it should use a wider type, 
> presumably intmax_t. This is for line and column numbers, which these come 
> from input files and can exceed INT_MAX in some practical cases. Fixing that 
> could be a subject of a different patch.
> 
> There are no doubt other uses of 'int' in Bison for indexes should be changed 
> to ptrdiff_t, for things like stack depth where Bison should not impose 
> arbitrary limits. I think I should take a look at that next; this will 
> probably entail improvements to the patch I proposed.

For C++, I use
  typedef std::ptrdiff_t int_type;
  typedef std::size_t size_type;

In C++ one uses fpos_t for file positions [1], in functions like std::fgetpos 
[2]. 

1. https://en.cppreference.com/w/cpp/io/c
2. https://en.cppreference.com/w/cpp/io/c/fgetpos





Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-01 Thread Akim Demaille
Hi Paul,

Thanks for the answer!  (And thanks to Kaz too)

> Le 1 oct. 2019 à 10:35, Paul Eggert  a écrit :
> 
> On 9/29/19 11:34 AM, Akim Demaille wrote:

> 
>> In my proposal below, I fuse both cases to use a single type,
>> yy_state_num, which is the smallest type that contains the set of
>> states: an unsigned type.

For the record, eventually I had changed my mind to use (i) int for the scalar 
variables, and (ii) for arrays of states, the smallest unsigned types of at 
most 16 bits, then an int.  Which Kaz named (i) yy_state_t and (ii) 
yy_small_state_t.  That's this patch:

https://lists.gnu.org/archive/html/bug-bison/2019-09/msg00026.html


> In other GNU applications, we've been moving away from using unsigned types 
> due to their confusing behavior

I fully subscribe to this view.  I think the trend started in C++ first, and 
followed in C.  I was unaware though that GNU projects followed this path.


> (particularly when comparing signed vs unsigned), and because modern tools 
> such as 'gcc -fsanitize=undefined' can check for signed integer overflow but 
> (obviously) not for unsigned integer overflow. In this newer style, it's OK 
> to use unsigned types for bit vectors and hash values since these typically 
> don't involve integer comparisons and integer overflow is typically harmless; 
> for indexes, though, unsigned types are so error-prone that they should be 
> avoided.

On this last sentence, though, I was a bit concerned, since I read this as 
"don't use unsigned type".  Actually, you still agree to use unsigned type when 
we need their last bit, otherwise prefer signed:

>  # b4_int_type(MIN, MAX)
>  # -
> -# Return the smallest int type able to handle numbers ranging from
> -# MIN to MAX (included).
> +# Return a narrow int type able to handle integers ranging from MIN
> +# to MAX (included) in portable C code.  Assume MIN and MAX fall in
> +# 'int' range.
>  m4_define([b4_int_type],
> -[m4_if(b4_ints_in($@,  [0],   [255]), [1], [unsigned char],
> -   b4_ints_in($@,   [-128],   [127]), [1], [signed char],
> +[m4_if(b4_ints_in($@,  [0],   [127]), [1], [char],
> +   b4_ints_in($@,   [-127],   [127]), [1], [signed char],
> +   b4_ints_in($@,  [0],   [255]), [1], [unsigned char],
>  
> +   b4_ints_in($@, [-32767], [32767]), [1], [short],
> b4_ints_in($@,  [0], [65535]), [1], [unsigned short],
> -   b4_ints_in($@, [-32768], [32767]), [1], [short],
> -
> -   m4_eval([0 <= $1]),[1], [unsigned],
>  
> [int])])

and I'm fine with this.  I don't understand why you shy away from -128 and 
-32768 though.  I would have understood 0..254 as a means to "reserve" -1, but 
I don't understand -127..127.

For the record, Bison's own parser has currently 166 states: its tables can fit 
in 8 bits, and it's a good thing that we do it.

> In the past, Bison has gotten away with using uint8_fast_t and uint16_fast_t 
> for storing indexes, because on machines with 32-bit int (which is the 
> minimum supported by GNU) these types typically promote to 'int' and so avoid 
> most of the signed-vs-unsigned confusion. But if Bison starts using 32-bit 
> unsigned types it won't have this luxury since these types won't promote to 
> 'int'.

I don't want to use 'unsigned int', I fully agree.  That was one of the changes 
in https://lists.gnu.org/archive/html/bug-bison/2019-09/msg00027.html.

> And anyway, Bison shouldn't assume that uint8_fast_t etc. promote to 'int', 
> because the C standard doesn't guarantee that.
> 
> So, I suggest using signed types for indexes in Bison templates; please see 
> the attached patch. I haven't checked for compatibility between this and your 
> proposed patches, but I assume that's straightforward.

I have installed my commits as published, and then yours.  For some reason git 
am failed, I had to use patch, but after a few conflict resolutions, I think I 
had your patch properly adjusted.

Let's start from this to finish the changes wrt state types.

Apparently like Kaz, I like to name my types *_t, so I'd be happy with 
yy_state_t (scalars) and yy_state_least_t/yy_small_state_t (arrays) instead of 
int and yy_state_num currently.  But we've had disagreements on this regard, 
yet I don't know where you stand today (given that here, we're kind of 
protected by yy_*: POSIX/ISO etc. would be really bad people to start using 
them.)

Cheers!


Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-01 Thread Paul Eggert

On 10/1/19 11:40 AM, Kaz Kylheku wrote:

That said, parser states are more of an enumeration. They are identifiers.
We shouldn't be doing any math on them of this sort.


That may be true in theory, but not in practice. For example, the GLR 
skeleton's yyLRgotoState has this line:


  int yyr = yypgoto[yysym - YYNTOKENS] + yystate;

which is doing arithmetic on state numbers. If we use signed arithmetic 
for this sort of thing, we can catch state-number arithmetic overflow 
automatically at runtime with GCC. Currently we can't so easily catch 
such errors.



The unsigned types provide a greater range for the possible states,
without having to use negative values, so they are understandably an
attractive tool for combating the problem of running out of states. 


That is a practical issue for narrower-than-int types, so we may want to 
continue to use unsigned types in large arrays of narrower-than-int 
integers. This is reasonably safe since these integers promote to int 
and so avoid most of the unsigned-arithmetic problems. It could be the 
subject of another patch (a relatively-small one, I think).


However, we should avoid unsigned types that are 'unsigned' or wider, as 
they have too many issues. I doubt whether there are practical uses of 
Bison with more than INT_MAX states; but if there are, we should use 
ptrdiff_t to count states, not int, because any application likely to 
exceed INT_MAX is also pretty likely to exceed UINT_MAX.


There is an area where Bison uses 'int' when it should use a wider type, 
presumably intmax_t. This is for line and column numbers, which these 
come from input files and can exceed INT_MAX in some practical cases. 
Fixing that could be a subject of a different patch.


There are no doubt other uses of 'int' in Bison for indexes should be 
changed to ptrdiff_t, for things like stack depth where Bison should not 
impose arbitrary limits. I think I should take a look at that next; this 
will probably entail improvements to the patch I proposed.




Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-01 Thread Hans Åberg


> On 1 Oct 2019, at 21:21, Kaz Kylheku  wrote:
> 
> On 2019-10-01 06:53, Hans Åberg wrote:
>> One should note that the unsigned types are required to be 2’s
>> complement C/C++, unlike the signed ones, cf.
> 
> That is untrue by definition, since two's complement is strictly
> a mechanism for representing negative values, which the unsigned
> types do not do. They follow a "pure binary enumeration"
> (I think that's the wording).

Sorry, it should be integers modulo 2^n, where n is the number of bits, which 
2’s complement is equivalent to with a certain choice of integer 
representatives.





Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-01 Thread Kaz Kylheku

On 2019-10-01 06:53, Hans Åberg wrote:

One should note that the unsigned types are required to be 2’s
complement C/C++, unlike the signed ones, cf.


That is untrue by definition, since two's complement is strictly
a mechanism for representing negative values, which the unsigned
types do not do. They follow a "pure binary enumeration"
(I think that's the wording).

The unsigned types can *simulate* some aspects of two's complement
quite well. Unsigned and two's complement addition basically look
the same at the bit level; just the interpretation of the upper
two bits is different for determining sign and overflow. If you're
writing a software emulator for a two's complement CPU, you can use
unsigned for the basic ALU operations.

Code which uses unsigned numbers in a thoroughly disciplined way
to simulate two's complement math does not represent the typical
uses of the unsigned types in C programs.

When unsigned types are used to simulate signed, comparisons must
be done very carefully. The value UINT_MAX is representing -1
and so it has to compare less than zero. You need two write yourself
a a twocomp_less(x, y) function and consistently use it to get this
right. Most code that you will see in the wild that works with
unsigned numbers still uses the regular < and > operators, though.
(That's one way the bugs creep in.)

Unsigned multiplication and division also require careful handling
in order to correctly simulate two's complement.




Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-01 Thread Kaz Kylheku

On 2019-10-01 01:35, Paul Eggert wrote:

On 9/29/19 11:34 AM, Akim Demaille wrote:


As a matter of fact, we used two types:

- most arrays (such as state stack, and its correspondance in the LAC
  infrastructure) are using int16_t.  A few "temporary" variables also
  have this type.

- yystate, which is an intermediate variable, was an int.


Actually those arrays use int_fast16_t not int16_t, as C99/C11/C18
does not require support for int16_t. It could well be more efficient
for them to use int_least16_t instead, for better caching; see below.


I would make two typedefs: one for the storage type of Yacc
state values, and one for the "register" type for manipulating
them in local variables.

The latter of course, being capable of representing all the values of
the former. E.g. example definitions:

   typedef short yy_small_state_t;
   typedef int yy_state_t;

The special value -1 should be given a manifest constant, like
YY_BAD_STATE or whatever. This constant should have a value
which is representable in the small type yy_small_state_t,
not a value which changes when converted to yy_small_state_t.

In other words:

   YY_BAD_STATE == (yy_state_t) YY_BAD_STATE

This is currently not true of the value -1 for various combinations
of the two types. Obviously if both are signed, there isn't
any issue.

(Needless to say, states should be compared to this value exactly,
never using "state < 0"; the constant is not even required to
to be negative.)


In my proposal below, I fuse both cases to use a single type,
yy_state_num, which is the smallest type that contains the set of
states: an unsigned type.


In other GNU applications, we've been moving away from using unsigned
types due to their confusing behavior (particularly when comparing
signed vs unsigned), and because modern tools such as 'gcc


The unsigned type are indeed silly when arithmetic is involved
because they break basic arithmetic identities like. For instance,
given:

  a > b - c

we'd like to be free to move the c to the other side:

  a + c > b

if all three are signed types, with small-ish values clustered
around zero, then there is no overflow and the derivation holds.
If they are unsigned types, even if the values are small, the
derivation is not justified, because b - c may produce a huge value.
That's the crux; unsigned has an overflow cliff right at zero,
creating a pitfall for calculations involving trivial numbers.

That said, parser states are more of an enumeration. They are 
identifiers.
We shouldn't be doing any math on them of this sort. Given a parser 
state
S, there is nothing interesting about S+1; they are not related. When 
would

we ever multiply two yacc states, or find their difference and such, or
care whether one is greater than the other?

The unsigned types provide a greater range for the possible states,
without having to use negative values, so they are understandably an
attractive tool for combating the problem of running out of states.

I don't see a big problem with using them. This is all internal to
a skeleton; it's not an API being perpetrated on the user. That's why
there is so much latitude to fiddle with these types at parser 
generation

time in the first place. Just the skeleton code and any generated
bits have to get it right; the parser generator user isn't burdened
with anything.

Rearding indexing, it's difficult to keep unsigned types entirely
out of indexing calculations because an unsigned type is injected
into the mix whenever we invoke the sizeof operator.



Re: [PATCH 0/3] yacc: compute the best type for the state number

2019-10-01 Thread Hans Åberg


> On 1 Oct 2019, at 10:35, Paul Eggert  wrote:
> 
> In other GNU applications, we've been moving away from using unsigned types 
> due to their confusing behavior (particularly when comparing signed vs 
> unsigned), and because modern tools such as 'gcc -fsanitize=undefined' can 
> check for signed integer overflow but (obviously) not for unsigned integer 
> overflow. In this newer style, it's OK to use unsigned types for bit vectors 
> and hash values since these typically don't involve integer comparisons and 
> integer overflow is typically harmless; for indexes, though, unsigned types 
> are so error-prone that they should be avoided.

One should note that the unsigned types are required to be 2’s complement 
C/C++, unlike the signed ones, cf. [1]. Also, in C++, indices are required to 
be large enough to hold all values, so on 64-bit machines they are that also 
for strings that usually are quite small.

1. 
https://en.wikipedia.org/wiki/Integer_overflow#Methods_to_mitigate_integer_overflow_problems





[PATCH 0/3] yacc: compute the best type for the state number

2019-10-01 Thread Paul Eggert

On 9/29/19 11:34 AM, Akim Demaille wrote:


As a matter of fact, we used two types:

- most arrays (such as state stack, and its correspondance in the LAC
  infrastructure) are using int16_t.  A few "temporary" variables also
  have this type.

- yystate, which is an intermediate variable, was an int.


Actually those arrays use int_fast16_t not int16_t, as C99/C11/C18 does not 
require support for int16_t. It could well be more efficient for them to use 
int_least16_t instead, for better caching; see below.



In my proposal below, I fuse both cases to use a single type,
yy_state_num, which is the smallest type that contains the set of
states: an unsigned type.


In other GNU applications, we've been moving away from using unsigned types due 
to their confusing behavior (particularly when comparing signed vs unsigned), 
and because modern tools such as 'gcc -fsanitize=undefined' can check for signed 
integer overflow but (obviously) not for unsigned integer overflow. In this 
newer style, it's OK to use unsigned types for bit vectors and hash values since 
these typically don't involve integer comparisons and integer overflow is 
typically harmless; for indexes, though, unsigned types are so error-prone that 
they should be avoided.


In the past, Bison has gotten away with using uint8_fast_t and uint16_fast_t for 
storing indexes, because on machines with 32-bit int (which is the minimum 
supported by GNU) these types typically promote to 'int' and so avoid most of 
the signed-vs-unsigned confusion. But if Bison starts using 32-bit unsigned 
types it won't have this luxury since these types won't promote to 'int'. And 
anyway, Bison shouldn't assume that uint8_fast_t etc. promote to 'int', because 
the C standard doesn't guarantee that.


So, I suggest using signed types for indexes in Bison templates; please see the 
attached patch. I haven't checked for compatibility between this and your 
proposed patches, but I assume that's straightforward.


This patch does not address the int_fast16_t versus int_least16_t issue, but 
that should be easy enough to fix in a followup patch.



That's where I need you Paul: do you think that, for sake of
performances, I should keep the scalar variable as an 'int', and move
to the unsigned type only for arrays?  Is it ok to move to stacks of
uint8_t instead of uint16_t, or should we stick to larger types?  I
can see how using small types is cache friendly, but I can also
imagine that cpus don't like small types.


For local scalar variables it doesn't matter much these days; 'int' is fine and 
is well-understood, but if the index might not fit into 'int' then 'ptrdiff_t' 
is needed instead.


Typically, if you have a large number of integers that all fit into 8 bits, 
you're better off using int_least8_t than wider types since that decreases cache 
pressure. Cache lines are typically wider than 64 bits anyway, so you're not 
saving CPU cycles by using 64-bit rather than 8-bit integers.
>From 5e167c50d77518d693f6165886788bdf323d3c13 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Tue, 1 Oct 2019 01:28:45 -0700
Subject: [PATCH] Prefer signed types for indexes in skeletons

* NEWS: Mention this.
* data/skeletons/c.m4 (b4_int_type):
Prefer char if it will do, and prefer signed types to unsigned if
either will do.
* data/skeletons/glr.c (yy_reduce_print): No need to
convert rule line to unsigned long.
(yyrecoverSyntaxError): Put action into an int to
avoid GCC warning of using a char subscript.
* data/skeletons/lalr1.cc (yy_lac_check_, yysyntax_error_):
Prefer ptrdiff_t to size_t.
* data/skeletons/yacc.c (b4_int_type):
Prefer signed types to unsigned if either will do.
* data/skeletons/yacc.c (b4_declare_parser_state_variables):
(YYSTACK_RELOCATE, YYCOPY, yy_lac_stack_realloc, yy_lac)
(yytnamerr, yysyntax_error, yyparse): Prefer ptrdiff_t to size_t.
(YYPTRDIFF_T, YYPTRDIFF_MAXIMUM): New macros.
(YYSIZE_T): Fix "! defined YYSIZE_T" typo.
(YYSIZE_MAXIMUM): Take the minimum of PTRDIFF_MAX and SIZE_MAX.
(YYSIZEOF): New macro.
(YYSTACK_GAP_MAXIMUM, YYSTACK_BYTES, YYSTACK_RELOCATE)
(yy_lac_stack_realloc, yyparse): Use it.
(YYCOPY, yy_lac_stack_realloc): Cast to YYSIZE_T to pacify GCC.
(yy_reduce_print): Use int instead of unsigned long when int
will do.
(yy_lac_stack_realloc): Prefer long to unsigned long when
either will do.
* tests/regression.at: Adjust to these changes.
---
 NEWS|   6 ++
 data/skeletons/c.m4 |  14 ++---
 data/skeletons/glr.c|  11 ++--
 data/skeletons/lalr1.cc |  20 +++
 data/skeletons/yacc.c   | 127 
 tests/regression.at |  16 ++---
 6 files changed, 112 insertions(+), 82 deletions(-)

diff --git a/NEWS b/NEWS
index 44929e35..e204a4b0 100644
--- a/NEWS
+++ b/NEWS
@@ -40,6 +40,12 @@ GNU Bison NEWS
   The Java backend no longer emits code and data for parser tracing if the
   %define variable parse.trace is not defined.
 
+*** Templates prefer signed integer types
+
+  Bison templates

[PATCH 0/3] yacc: compute the best type for the state number

2019-09-29 Thread Akim Demaille
Hi Tom,

This should address your concern.

Hi Paul,

I need your expertise here, if you have some time.

Currently we use the "best" integral type for tables, including those
storing state numbers.  However the variables used in yyparse (and its
dependencies such as yy_stack_print) use a mixture of int and int16_t
invariably.  As a consequence, very large models overflow these
variables, and that's Tom's report
(https://lists.gnu.org/archive/html/bug-bison/2019-09/msg00018.html).

As a matter of fact, we used two types:

- most arrays (such as state stack, and its correspondance in the LAC
  infrastructure) are using int16_t.  A few "temporary" variables also
  have this type.

- yystate, which is an intermediate variable, was an int.

In my proposal below, I fuse both cases to use a single type,
yy_state_num, which is the smallest type that contains the set of
states: an unsigned type.

That's where I need you Paul: do you think that, for sake of
performances, I should keep the scalar variable as an 'int', and move
to the unsigned type only for arrays?  Is it ok to move to stacks of
uint8_t instead of uint16_t, or should we stick to larger types?  I
can see how using small types is cache friendly, but I can also
imagine that cpus don't like small types.



This is currently in the 'fix-state-type' branch, which, amusingly, is
about not having a fix type for states.  It's also available here:

https://www.lrde.epita.fr/~akim/private/bison/bison-3.4.2.117-7e3dc.tar.gz
https://www.lrde.epita.fr/~akim/private/bison/bison-3.4.2.117-7e3dc.tar.xz

Cheers!


Akim Demaille (3):
  style: change misleading macro argument name
  yacc: introduce a type for states
  yacc: use the most appropriate integral type for state numbers

 data/skeletons/glr.c  |  4 +-
 data/skeletons/yacc.c | 71 +---
 tests/linear  | 94 +++
 tests/local.mk|  2 +-
 tests/torture.at  | 43 +---
 5 files changed, 172 insertions(+), 42 deletions(-)
 create mode 100755 tests/linear

-- 
2.23.0