Re: VLA representation in GCC internals

2024-11-09 Thread Alejandro Colomar via Gcc
Hi Martin,

On Sat, Nov 09, 2024 at 11:34:58AM GMT, Martin Uecker wrote:
> > > You can just eliminate the code for the star as it would now
> > > automatically end up as variable.
> > 
> > Thanks!
> > 
> > Have a lovely day!
> > Alex
> 
> Just committed, so you could rebase on trunk now.

Thanks!  I read your mail just a few seconds before starting the
hours-long regression testing.  Just in time.  :-)

> I also wanted to change zero representation to be the same as
> in the C++ FE, but I need to rethink this.  Maybe this change
> already solves most of our problems.

It should.  Thankfully, I already wrote the tests for when this got
fixed, so that I should only need to comment/uncomment them, so that I
wouldn't need to remember what I needed to test.  So, hopefully this
will work out of the box.  :)

> BTW: My main practical issue with zero-sized arrays is that the
> UB sanitizers triggers for zero-sized variable arrays.

Hmmm.

> 
> Martin

Cheers,
Alex

-- 



signature.asc
Description: PGP signature


Re: VLA representation in GCC internals

2024-11-09 Thread Alejandro Colomar via Gcc
On Sat, Nov 09, 2024 at 09:38:45AM GMT, Martin Uecker wrote:
> Am Samstag, dem 09.11.2024 um 00:54 +0100 schrieb Alejandro Colomar via Gcc:
> > Hi Martin,
> > 
> > I'm in the process of rebasing my __countof__ changes after your patch
> > that fixes support for [*] and [0].
> > 
> > I should update the implementation of the following function:
> > 
> > static bool
> > is_top_array_vla (tree type)
> > {
> >   bool zero, star, var;
> >   tree d;
> > 
> >   if (TREE_CODE (type) != ARRAY_TYPE)
> > return false;
> >   if (!COMPLETE_TYPE_P (type))
> > return false;
> > 
> >   d = TYPE_DOMAIN (type);
> >   zero = !TYPE_MAX_VALUE (d);
> >   star = (zero && C_TYPE_VARIABLE_SIZE (type));
> >   if (star)
> > return true;
> >   if (zero)
> > return false;
> > 
> >   var = (TREE_CODE (TYPE_MIN_VALUE (d)) != INTEGER_CST
> > || TREE_CODE (TYPE_MAX_VALUE (d)) != INTEGER_CST);
> >   return var;
> > }
> > 
> > The 'star' calculation should be updated.  Would you mind proposing an
> > implementation of this function that works with your changes?  Thanks!
> > 
> You can just eliminate the code for the star as it would now
> automatically end up as variable.

Thanks!

Have a lovely day!
Alex

> 
> Martin

-- 
<https://www.alejandro-colomar.es/>


signature.asc
Description: PGP signature


VLA representation in GCC internals

2024-11-08 Thread Alejandro Colomar via Gcc
Hi Martin,

I'm in the process of rebasing my __countof__ changes after your patch
that fixes support for [*] and [0].

I should update the implementation of the following function:

static bool
is_top_array_vla (tree type)
{
  bool zero, star, var;
  tree d;

  if (TREE_CODE (type) != ARRAY_TYPE)
return false;
  if (!COMPLETE_TYPE_P (type))
return false;

  d = TYPE_DOMAIN (type);
  zero = !TYPE_MAX_VALUE (d);
  star = (zero && C_TYPE_VARIABLE_SIZE (type));
  if (star)
return true;
  if (zero)
return false;

  var = (TREE_CODE (TYPE_MIN_VALUE (d)) != INTEGER_CST
|| TREE_CODE (TYPE_MAX_VALUE (d)) != INTEGER_CST);
  return var;
}

The 'star' calculation should be updated.  Would you mind proposing an
implementation of this function that works with your changes?  Thanks!

Have a lovely night!
Alex


-- 



signature.asc
Description: PGP signature


Re: [PATCH v17 2/2] c: Add __countof__ operator

2024-11-08 Thread Alejandro Colomar via Gcc
Hi Joseph,

On Fri, Nov 08, 2024 at 03:54:51PM GMT, Joseph Myers wrote:
> On Fri, 8 Nov 2024, Alejandro Colomar wrote:
> 
> > Hi Thorsten, JeanHeyd,
> > 
> > I've just checked that JeanHeyd opened a survey about the name.
> > It's here: .
> > Thanks for the survey, JeanHeyd; It's a fair one.  (The only thing I
> > miss is anouncing it to some relevant publics, but I guess I'm doing it
> > now, so that's ok.)  Would you mind sharing provisional results before
> > Nov 18?  That's the cut-off date for having this feature in GCC 15.
> > If you don't want to publish provisional results for fear of influencing
> > other voters, you could share them in private.
> 
> I don't consider such survey results to be of any relevance to reviewing a 
> GCC patch.  The basis for review now is what's in C2Y now (which the 
> proposed patch does not follow).  If a name change is accepted in Graz 
> (possibly taking into account survey results at that point), that's the 
> point at which it's relevant for GCC.

Fair enough.  For now, I keep it for GCC 16.  Adding _Lengthof now to
GCC 15 may jeopardize my attempt to rename it in Graz, so I'll play my
cards.  If I see survey results that don't seem good for my renaming
attempts, I may reconsider and send _Lengthof next week; but for now I
have some hope.  :-)

BTW, I'm nevertheless going to rebase my patch, to test with the changes
that Martin did to arrays of 0 elements.  I may send another revision of
the _Countof patch set, just for having it in the list, but you can just
ignore it.

Have a lovely night!
Alex

-- 



signature.asc
Description: PGP signature


'defer' (n3199) concerns

2024-11-08 Thread Alejandro Colomar via Gcc
Hi JeanHeyd,

I was involved this week in a fix for a bug I wrote some months ago
about a call to free(3) with a bad pointer.

The simplest reproducer is:

$ cat strsep_bad.c
#include 
#include 
#include 

int
main(void)
{
char  *p;

p = strdup("123;");
strsep(&p, ";");
free(p);

puts("done");
}

$ cc -Wall -Wextra strsep.c 
$ ./a.out 
free(): invalid pointer
Aborted

A fix for that program is to store a copy of the original pointer:

$ cat strsep_good.c
#include 
#include 
#include 

int
main(void)
{
char  *p, *dup;

p = dup = strdup("123;");
strsep(&p, ";");
free(dup);

puts("done");
}

While I could sympathize with 'defer', I'm wondering if it may simplify
some code avoiding goto's, at the expense of being obscurely dangerous
in other cases like this one.  I suspect one could blindly attempt to do
the following:

$ cat strsep_bad_defer.c
#include 
#include 
#include 

int
main(void)
{
char  *p;

p = strdup("123;");
defer free(p);

strsep(&p, ";");

printf("done");
}

Since 'p' in free(p) is evaluated after strsep(3), it is equivalent to
the first program, which is bogus.  I think goto better codifies the
evaluation order, and allows the programmer to think about what it's
doing.

So defer would only be good for the common case, which is usually
simple-enough that goto should be enough.  And it's bad in those corner
cases where you really to be careful.  I think I've changed my mind
about defer to not want it.

I wanted to have this thought written in a mailing list to be able to
reference it.


Have a lovely day!
Alex

-- 



signature.asc
Description: PGP signature


Re: [PATCH v17 2/2] c: Add __countof__ operator

2024-11-08 Thread Alejandro Colomar via Gcc
Hi Thorsten, JeanHeyd,

I've just checked that JeanHeyd opened a survey about the name.
It's here: .
Thanks for the survey, JeanHeyd; It's a fair one.  (The only thing I
miss is anouncing it to some relevant publics, but I guess I'm doing it
now, so that's ok.)  Would you mind sharing provisional results before
Nov 18?  That's the cut-off date for having this feature in GCC 15.
If you don't want to publish provisional results for fear of influencing
other voters, you could share them in private.

Have a lovely day!
Alex

On Mon, Oct 28, 2024 at 01:34:04AM GMT, Thorsten Glaser wrote:
> Alejandro Colomar dixit:
> 
> >Yes; there are four n papers.  See below, plus the history of why
> >four papers.
> 
> Thanks, will look at them but probably on Tuesday.
> 
> >> >-  _Nelementsof
> >> >-  _Nelemsof
> >>
> >> If you want to shorten elements, elts is probably seen more
> >> often than elems, at least in my experience. But this is very
> >> low-grade bikeshedding, so just me pointing it out but putting
> >> not much weight behind it.
> >
> >That was my original proposal, but everybody I've talked so far told me
> >they preferred _Nelemsof.  I don't care about it at all, so whatever.
> 
> Ah, okay.
> 
> >> PS: Got any contacts in the OpenBSD and NetBSD worlds that can
> >> add input? They invest a lot into good C code as well, in
> >> the OpenBSD case rather heavily (if opinionated).
> >
> >I don't.  If you do, please let them know.  Theo doesn't seem to have a
> >good opinion of me, but maybe he could help.
> 
> I fear we share this property but I can try to reach out via
> Miod, he might have an idea, and via bsiegert to NetBSD.
> 
> bye,
> //mirabilos

-- 



signature.asc
Description: PGP signature


Re: [PATCH v17 2/2] c: Add __countof__ operator

2024-10-27 Thread Alejandro Colomar via Gcc
On Sun, Oct 27, 2024 at 02:00:01PM GMT, Alejandro Colomar wrote:
> Hi Thorsten,
> 
> On Sun, Oct 27, 2024 at 04:30:43AM GMT, Thorsten Glaser wrote:
> > Hi Alejandro,
> > 
> > >On Sun, Oct 27, 2024 at 01:40:23AM GMT, mirabilos wrote:
> > >> Not too sure what the root context of this thread is, but in BSD land
> > >
> > >The root context is that
> > >
> > >-  _Lengthof was added to C2y in the Minnesota meeting.  It was proposed
> > 
> > where/how would this be used for? Also, do you have the n.pdf link
> > handy? I don’t actively follow C other than what thephd posts.
> 
> Yes; there are four n papers.  See below, plus the history of why
> four papers.
> 
> 
> 
> 
> That first paper wasn't mine.  I didn't know about the existence of that
> paper until a few months after that paper I sent a patch to glibc adding
> the usual macro, and I think Joseph told me about the existence of the
> paper.  I tried to contact the author of the paper, to ask him to change
> the name, since I thought it would be dangerous as I said here.  I
> couldn't reach him in many years, so at some point I decided I would
> author a revision of the paper.
> 
> I believe stuff should go into the standard after it exists in an
> implementation, so I first prepared a patch for GCC:
> 
> Joseph Myers asked me to present a paper to WG14 to see what they think
> before he would finish the review of my patch set, and so I prepared the
> paper, which was submitted as n3313.  By the time I wrote this GCC
> patch, I had relaxed my feelings about the dangers of the name
> _Lengthof, so I didn't think too much about it and followed the original
> proposal, also since Martin Uecker --who helped me develop the GCC
> patch set-- had a preference for that name.
> 
> 
> 
> While iterating on the patches, at some point I reminded myself of
> actual bugs I had fixed in the recent past in which precisely this
> naming confusion was involved.  I decided to rename the operator to
> _Nelementsof, to avoid having the term length, and because the standard
> uses quite often the terms "number of elements in an array" and "one
> past the last element of an array object", so it just felt the natural
> term to use.  Since the name of the operator was too long, and some
> people complained about that, I proposed an alternative name _Neltsof.
> I then submitted n3325.
> 
> 
> 
> The paper was discussed in the Minnesota meeting of WG14, where I was
> invited to defend my paper.  The semantics were agreed upon, but there
> wasn't strong consensus on the name.  In the end, _Nelementsof was
> rejected in a poll with majority but no consensus (only slightly more
> YESs than NOs).  Then we voted to see what name should be used, and the

Oops; only slightly _less_ YESs than NOs.

> name _Lengthof was voted.  I had to present n3369.
> 
> I wasn't the only voice concerned about _Lengthof.  The name _Countof
> seemed to have also some traction in the people that didn't want
> _Lengthof, so it's the name I'm considering now, but either something
> derived from _Nelementsof or _Countof or anything not "length"y would be
> good IMO.
> 
> BTW, during the discussions, _Nelemsof seemed to be strongly preferred
> over _Neltsof.  I don't care at all about those trivial details, though.
> 
> 
> 
> And this paper was voted in for C2y, so here we are.  I'm still trying
> to overrule that paper by implementing it with a different name.  Since
> I have the advantage that I already have the patches working, I have
> months before anybody implements _Lengthof in another compiler and sets
> the name in stone.
> 
> It would be great if authors of other compilers would agree on the
> dangerousness of this name and pushed for a different name.
> 
> > >-  I'm sending a patch to GCC proposing __countof__, since overloading
> > >   length for both string length and array number of elements is (IMO)
> > >   going to promote off-by-one bugs in string-handling code (and I've
> > 
> > Definitely! The string length is one less than the amount of array
> > elements it has (works for char and wide strings).
> 
> Thanks!
> 
> > >-  _Nelementsof
> > >-  _Nelemsof
> > 
> > If you want to shorten elements, elts is probably seen more
> > often than elems, at least in my experience. But this is very
> > low-grade bikeshedding, so just me pointing it out but putting
> > not much weight behind it.
> 
> That was my original proposal, but everybody I've talked so far told me
> they preferred _Nelemsof.  I don't care about it at all, so whatever.
> 
> > bye,
> > //mirabilos
> > PS: Got any contacts in the OpenBSD and NetBSD worlds that can
> > add input? They invest a lot into good C 

Re: [PATCH v17 2/2] c: Add __countof__ operator

2024-10-27 Thread Alejandro Colomar via Gcc
Hi Thorsten,

On Sun, Oct 27, 2024 at 04:30:43AM GMT, Thorsten Glaser wrote:
> Hi Alejandro,
> 
> >On Sun, Oct 27, 2024 at 01:40:23AM GMT, mirabilos wrote:
> >> Not too sure what the root context of this thread is, but in BSD land
> >
> >The root context is that
> >
> >-  _Lengthof was added to C2y in the Minnesota meeting.  It was proposed
> 
> where/how would this be used for? Also, do you have the n.pdf link
> handy? I don’t actively follow C other than what thephd posts.

Yes; there are four n papers.  See below, plus the history of why
four papers.




That first paper wasn't mine.  I didn't know about the existence of that
paper until a few months after that paper I sent a patch to glibc adding
the usual macro, and I think Joseph told me about the existence of the
paper.  I tried to contact the author of the paper, to ask him to change
the name, since I thought it would be dangerous as I said here.  I
couldn't reach him in many years, so at some point I decided I would
author a revision of the paper.

I believe stuff should go into the standard after it exists in an
implementation, so I first prepared a patch for GCC:

Joseph Myers asked me to present a paper to WG14 to see what they think
before he would finish the review of my patch set, and so I prepared the
paper, which was submitted as n3313.  By the time I wrote this GCC
patch, I had relaxed my feelings about the dangers of the name
_Lengthof, so I didn't think too much about it and followed the original
proposal, also since Martin Uecker --who helped me develop the GCC
patch set-- had a preference for that name.



While iterating on the patches, at some point I reminded myself of
actual bugs I had fixed in the recent past in which precisely this
naming confusion was involved.  I decided to rename the operator to
_Nelementsof, to avoid having the term length, and because the standard
uses quite often the terms "number of elements in an array" and "one
past the last element of an array object", so it just felt the natural
term to use.  Since the name of the operator was too long, and some
people complained about that, I proposed an alternative name _Neltsof.
I then submitted n3325.



The paper was discussed in the Minnesota meeting of WG14, where I was
invited to defend my paper.  The semantics were agreed upon, but there
wasn't strong consensus on the name.  In the end, _Nelementsof was
rejected in a poll with majority but no consensus (only slightly more
YESs than NOs).  Then we voted to see what name should be used, and the
name _Lengthof was voted.  I had to present n3369.

I wasn't the only voice concerned about _Lengthof.  The name _Countof
seemed to have also some traction in the people that didn't want
_Lengthof, so it's the name I'm considering now, but either something
derived from _Nelementsof or _Countof or anything not "length"y would be
good IMO.

BTW, during the discussions, _Nelemsof seemed to be strongly preferred
over _Neltsof.  I don't care at all about those trivial details, though.



And this paper was voted in for C2y, so here we are.  I'm still trying
to overrule that paper by implementing it with a different name.  Since
I have the advantage that I already have the patches working, I have
months before anybody implements _Lengthof in another compiler and sets
the name in stone.

It would be great if authors of other compilers would agree on the
dangerousness of this name and pushed for a different name.

> >-  I'm sending a patch to GCC proposing __countof__, since overloading
> >   length for both string length and array number of elements is (IMO)
> >   going to promote off-by-one bugs in string-handling code (and I've
> 
> Definitely! The string length is one less than the amount of array
> elements it has (works for char and wide strings).

Thanks!

> >-  _Nelementsof
> >-  _Nelemsof
> 
> If you want to shorten elements, elts is probably seen more
> often than elems, at least in my experience. But this is very
> low-grade bikeshedding, so just me pointing it out but putting
> not much weight behind it.

That was my original proposal, but everybody I've talked so far told me
they preferred _Nelemsof.  I don't care about it at all, so whatever.

> bye,
> //mirabilos
> PS: Got any contacts in the OpenBSD and NetBSD worlds that can
> add input? They invest a lot into good C code as well, in
> the OpenBSD case rather heavily (if opinionated).

I don't.  If you do, please let them know.  Theo doesn't seem to have a
good opinion of me, but maybe he could help.

Have a lovely day!
Alex

-- 



signature.asc
Description: PGP signature


Re: [PATCH v17 2/2] c: Add __countof__ operator

2024-10-26 Thread Alejandro Colomar via Gcc
Hi mirabilos,

On Sun, Oct 27, 2024 at 01:40:23AM GMT, mirabilos wrote:
> Not too sure what the root context of this thread is, but in BSD land

The root context is that

-  _Lengthof was added to C2y in the Minnesota meeting.  It was proposed
   by me, although they pushed me to rename the operator to _Lengthof,
   which I was strongly against.  The semantics of the operator are
   agreed by everyone, but the name is under fight.

-  I'm sending a patch to GCC proposing __countof__, since overloading
   length for both string length and array number of elements is (IMO)
   going to promote off-by-one bugs in string-handling code (and I've
   shown two bug fixes for bugs that I found and fixed in shadow utils
   in which this precise ambiguous naming issue was present, so it's not
   just theoretical).  That is, I'm trying GCC to force WG14 to change
   the name of the operator (which BTW was influenced by a survey
   conducted on programmers some of which declared hating C and not
   wanting to use it ever, so I refuse to acknowledge validity of the
   results of that survey; that survey also didn't inform of possible
   problems of each name).

So basically, we're on a fight for the standard name of the operator.
It seems that the names that have more serious support are

-  _Lengthof
-  _Countof
-  _Nelementsof
-  _Nelemsof

Anything else can be just discarded, as they didn't receive many votes.
Although I presonally would like anything that is not _Lengthof (or a
derivative), whatever it is.

> we use NELEM or NENTS for (sizeof(a) / sizeof((a)[0])) generally.

Yup, I know.  I think I prefer to keep a *of suffix for the operator for
consistency with the other operators, which is why _Nelemsof was
considered.

> 
> >> and Unix/Plan9.
> 
> yay!
> 
> >Another thing I think would be interesting is to allow to choose a
> >matrix of options:
> 
> I pointed out in Fedi (also mostly contextless) to thephd that another
> option is devotee-style preferential voting (Condorcet etc. and “none of
> the above” and “further discussion” can be voted atop or in the middle
> or below the options) for this kind of thing.

+1


Have a lovely night!
Alex

-- 



signature.asc
Description: PGP signature


Re: [PATCH v17 2/2] c: Add __countof__ operator

2024-10-26 Thread Alejandro Colomar via Gcc
On Sat, Oct 26, 2024 at 12:12:39PM GMT, Alejandro Colomar wrote:
> On Sat, Oct 26, 2024 at 12:07:04PM GMT, Alejandro Colomar wrote:
> > [Moved from gcc-patches@ to gcc@]
> > 
> > Hi JeanHeyd, наб,
> > 
> > I see you (JeanHeyd) are developing yet another survey about the names
> > for this new operator.  Let me ask you to be clear about my fear of
> > ambiguity with null-terminated strings which are stored within arrays
> > and the length of both entities differ, usually by exactly one, being a
> > potential cause of a confusion that might result in buffer overflows, or
> > other kinds of errors (and it would be interesting to mention that I've
> > presented a link to an actual bug of that class in shadow-utils, which I
> > fixed recently).  I would also like the survey to be presented to
> > programmers that like programming in C; I would refuse to acknowledge
> > the results of any survey that is presented to C++ or rust programmers
> > who acknowledge not wanting to program in C ever again.  They have a
> > clear conflict of interest.  If this survey is conducted, it should
> > include the gcc and glibc communities, and the resons why each name is
> 
> It would also be interesting to hear the opinion of people from the BSDs
> and Unix/Plan9.
> 
> > considered good and bad should be clearly stated alongside the options,
> > in a detailed rationale.
> > 
> > For extentof(), my only caveat is that one might want to add a 2-args
> > operator (or macro) that has the C++ semantics, that is, allowing you to
> > specify the dimension of the array that you want.  I don't see why it
> > would be useful, but didn't want to preclude it either.  But other than
> > that, I'm okay with that name.
> > 
> > Another thing is that I'd prefer it to be based on email, or something
> > that doesn't impose a barrier to those who don't have an account in a
> > given platform, and don't want to create it.

Another thing I think would be interesting is to allow to choose a
matrix of options:

   | love | LGTM | could live with it | dislike | hate | No opinion
lenof
lengthof
countof
nelemsof
nelementsof
extentof

Where multiple ones can be voted by the same person at the same level of
likeness.

> > 
> > Thanks for your interest in this operator!
> > 
> > Have a lovely day!
> > Alex
> > 
> > 
> > P.S.:  наб, you asked why not array_size().  I agree with the defenders
> > of lengthof that size should not be overloaded to mean both the number
> > of bytes and the number of elements of an object (although I'm closer to
> > accepting overloading the term "size" than overloading the term
> > "length", since size at least doesn't promote off-by-one errors).  Also,
> > I have a macro sizeof_array() which I define as
> > 
> > #define sizeof_array(a)  (countof(a) * sizeof((a)[0]))
> > 
> > It would be very weird and confusing to have
> > 
> > #define sizeof_array(a)  (array_size(a) * sizeof((a)[0]))
> > 
> > 
> > 
> > On Sat, Oct 26, 2024 at 12:10:56AM GMT, Alejandro Colomar wrote:
> > > Hi Joseph,
> > > 
> > > On Fri, Oct 25, 2024 at 08:44:15PM GMT, Joseph Myers wrote:
> > > > I don't see the use of pedwarn_c23 and associated tests (error with 
> > > > -std=c23 -pedantic-errors, warning with -std=c23 -pedantic, no 
> > > > diagnostic 
> > > > with -std=c23 -pedantic-errors -Wno-c23-c2y-compat, no diagnostic with 
> > > > -std=c2y -pedantic-errors, warning with -std=c2y -pedantic-errors 
> > > > -Wc23-c2y-compat), previously discussed in comments on v13, that would 
> > > > be 
> > > > appropriate before considering this for inclusion with an appropriate 
> > > > substitution of names.
> > > 
> > > I removed it because I renamed it to __countof__, which is a GNU
> > > extension, and thus should not be warned by -Wpedantic.  As part of my
> > > opposition to _Lengthof, I will not provide you with that part, which
> > > would amount to basically giving you _Lengthof but not.  As part of the
> > > editorialising process, you'll also have to add pedantic warnings, if
> > > that's what you want to do.  Again, I will earnestly ask to once more to
> > > consider __countof__, but it's up to you.
> > > 
> > > Have a lovely night!
> > > Alex
> > > 
> > > -- 
> > > 
> > 
> > 
> > 
> > -- 
> > 
> 
> 
> 
> -- 
> 



-- 



signature.asc
Description: PGP signature


Re: [PATCH v17 2/2] c: Add __countof__ operator

2024-10-26 Thread Alejandro Colomar via Gcc
On Sat, Oct 26, 2024 at 12:27:03PM GMT, Alejandro Colomar wrote:
> On Sat, Oct 26, 2024 at 12:12:39PM GMT, Alejandro Colomar wrote:
> > On Sat, Oct 26, 2024 at 12:07:04PM GMT, Alejandro Colomar wrote:
> > > [Moved from gcc-patches@ to gcc@]
> > > 
> > > Hi JeanHeyd, наб,
> > > 
> > > I see you (JeanHeyd) are developing yet another survey about the names
> > > for this new operator.  Let me ask you to be clear about my fear of
> > > ambiguity with null-terminated strings which are stored within arrays
> > > and the length of both entities differ, usually by exactly one, being a
> > > potential cause of a confusion that might result in buffer overflows, or
> > > other kinds of errors (and it would be interesting to mention that I've
> > > presented a link to an actual bug of that class in shadow-utils, which I
> > > fixed recently).  I would also like the survey to be presented to
> > > programmers that like programming in C; I would refuse to acknowledge
> > > the results of any survey that is presented to C++ or rust programmers
> > > who acknowledge not wanting to program in C ever again.  They have a
> > > clear conflict of interest.  If this survey is conducted, it should
> > > include the gcc and glibc communities, and the resons why each name is
> > 
> > It would also be interesting to hear the opinion of people from the BSDs
> > and Unix/Plan9.
> > 
> > > considered good and bad should be clearly stated alongside the options,
> > > in a detailed rationale.
> > > 
> > > For extentof(), my only caveat is that one might want to add a 2-args
> > > operator (or macro) that has the C++ semantics, that is, allowing you to
> > > specify the dimension of the array that you want.  I don't see why it
> > > would be useful, but didn't want to preclude it either.  But other than
> > > that, I'm okay with that name.
> > > 
> > > Another thing is that I'd prefer it to be based on email, or something
> > > that doesn't impose a barrier to those who don't have an account in a
> > > given platform, and don't want to create it.
> 
> Another thing I think would be interesting is to allow to choose a
> matrix of options:
> 
>| love | LGTM | could live with it | dislike | hate | No opinion
> lenof
> lengthof
> countof
> nelemsof
> nelementsof
> extentof
> 
> Where multiple ones can be voted by the same person at the same level of
> likeness.

(And I would rather require to give some text justifying the options
 chosen.)

> 
> > > 
> > > Thanks for your interest in this operator!
> > > 
> > > Have a lovely day!
> > > Alex
> > > 
> > > 
> > > P.S.:  наб, you asked why not array_size().  I agree with the defenders
> > > of lengthof that size should not be overloaded to mean both the number
> > > of bytes and the number of elements of an object (although I'm closer to
> > > accepting overloading the term "size" than overloading the term
> > > "length", since size at least doesn't promote off-by-one errors).  Also,
> > > I have a macro sizeof_array() which I define as
> > > 
> > >   #define sizeof_array(a)  (countof(a) * sizeof((a)[0]))
> > > 
> > > It would be very weird and confusing to have
> > > 
> > >   #define sizeof_array(a)  (array_size(a) * sizeof((a)[0]))
> > > 
> > > 
> > > 
> > > On Sat, Oct 26, 2024 at 12:10:56AM GMT, Alejandro Colomar wrote:
> > > > Hi Joseph,
> > > > 
> > > > On Fri, Oct 25, 2024 at 08:44:15PM GMT, Joseph Myers wrote:
> > > > > I don't see the use of pedwarn_c23 and associated tests (error with 
> > > > > -std=c23 -pedantic-errors, warning with -std=c23 -pedantic, no 
> > > > > diagnostic 
> > > > > with -std=c23 -pedantic-errors -Wno-c23-c2y-compat, no diagnostic 
> > > > > with 
> > > > > -std=c2y -pedantic-errors, warning with -std=c2y -pedantic-errors 
> > > > > -Wc23-c2y-compat), previously discussed in comments on v13, that 
> > > > > would be 
> > > > > appropriate before considering this for inclusion with an appropriate 
> > > > > substitution of names.
> > > > 
> > > > I removed it because I renamed it to __countof__, which is a GNU
> > > > extension, and thus should not be warned by -Wpedantic.  As part of my
> > > > opposition to _Lengthof, I will not provide you with that part, which
> > > > would amount to basically giving you _Lengthof but not.  As part of the
> > > > editorialising process, you'll also have to add pedantic warnings, if
> > > > that's what you want to do.  Again, I will earnestly ask to once more to
> > > > consider __countof__, but it's up to you.
> > > > 
> > > > Have a lovely night!
> > > > Alex
> > > > 
> > > > -- 
> > > > 
> > > 
> > > 
> > > 
> > > -- 
> > > 
> > 
> > 
> > 
> > -- 
> > 
> 
> 
> 
> -- 
> 



-- 



signature.asc
Description: PGP signature


Re: [PATCH v17 2/2] c: Add __countof__ operator

2024-10-26 Thread Alejandro Colomar via Gcc
On Sat, Oct 26, 2024 at 12:07:04PM GMT, Alejandro Colomar wrote:
> [Moved from gcc-patches@ to gcc@]
> 
> Hi JeanHeyd, наб,
> 
> I see you (JeanHeyd) are developing yet another survey about the names
> for this new operator.  Let me ask you to be clear about my fear of
> ambiguity with null-terminated strings which are stored within arrays
> and the length of both entities differ, usually by exactly one, being a
> potential cause of a confusion that might result in buffer overflows, or
> other kinds of errors (and it would be interesting to mention that I've
> presented a link to an actual bug of that class in shadow-utils, which I
> fixed recently).  I would also like the survey to be presented to
> programmers that like programming in C; I would refuse to acknowledge
> the results of any survey that is presented to C++ or rust programmers
> who acknowledge not wanting to program in C ever again.  They have a
> clear conflict of interest.  If this survey is conducted, it should
> include the gcc and glibc communities, and the resons why each name is

It would also be interesting to hear the opinion of people from the BSDs
and Unix/Plan9.

> considered good and bad should be clearly stated alongside the options,
> in a detailed rationale.
> 
> For extentof(), my only caveat is that one might want to add a 2-args
> operator (or macro) that has the C++ semantics, that is, allowing you to
> specify the dimension of the array that you want.  I don't see why it
> would be useful, but didn't want to preclude it either.  But other than
> that, I'm okay with that name.
> 
> Another thing is that I'd prefer it to be based on email, or something
> that doesn't impose a barrier to those who don't have an account in a
> given platform, and don't want to create it.
> 
> Thanks for your interest in this operator!
> 
> Have a lovely day!
> Alex
> 
> 
> P.S.:  наб, you asked why not array_size().  I agree with the defenders
> of lengthof that size should not be overloaded to mean both the number
> of bytes and the number of elements of an object (although I'm closer to
> accepting overloading the term "size" than overloading the term
> "length", since size at least doesn't promote off-by-one errors).  Also,
> I have a macro sizeof_array() which I define as
> 
>   #define sizeof_array(a)  (countof(a) * sizeof((a)[0]))
> 
> It would be very weird and confusing to have
> 
>   #define sizeof_array(a)  (array_size(a) * sizeof((a)[0]))
> 
> 
> 
> On Sat, Oct 26, 2024 at 12:10:56AM GMT, Alejandro Colomar wrote:
> > Hi Joseph,
> > 
> > On Fri, Oct 25, 2024 at 08:44:15PM GMT, Joseph Myers wrote:
> > > I don't see the use of pedwarn_c23 and associated tests (error with 
> > > -std=c23 -pedantic-errors, warning with -std=c23 -pedantic, no diagnostic 
> > > with -std=c23 -pedantic-errors -Wno-c23-c2y-compat, no diagnostic with 
> > > -std=c2y -pedantic-errors, warning with -std=c2y -pedantic-errors 
> > > -Wc23-c2y-compat), previously discussed in comments on v13, that would be 
> > > appropriate before considering this for inclusion with an appropriate 
> > > substitution of names.
> > 
> > I removed it because I renamed it to __countof__, which is a GNU
> > extension, and thus should not be warned by -Wpedantic.  As part of my
> > opposition to _Lengthof, I will not provide you with that part, which
> > would amount to basically giving you _Lengthof but not.  As part of the
> > editorialising process, you'll also have to add pedantic warnings, if
> > that's what you want to do.  Again, I will earnestly ask to once more to
> > consider __countof__, but it's up to you.
> > 
> > Have a lovely night!
> > Alex
> > 
> > -- 
> > 
> 
> 
> 
> -- 
> 



-- 



signature.asc
Description: PGP signature


Re: [PATCH v17 2/2] c: Add __countof__ operator

2024-10-26 Thread Alejandro Colomar via Gcc
[Moved from gcc-patches@ to gcc@]

Hi JeanHeyd, наб,

I see you (JeanHeyd) are developing yet another survey about the names
for this new operator.  Let me ask you to be clear about my fear of
ambiguity with null-terminated strings which are stored within arrays
and the length of both entities differ, usually by exactly one, being a
potential cause of a confusion that might result in buffer overflows, or
other kinds of errors (and it would be interesting to mention that I've
presented a link to an actual bug of that class in shadow-utils, which I
fixed recently).  I would also like the survey to be presented to
programmers that like programming in C; I would refuse to acknowledge
the results of any survey that is presented to C++ or rust programmers
who acknowledge not wanting to program in C ever again.  They have a
clear conflict of interest.  If this survey is conducted, it should
include the gcc and glibc communities, and the resons why each name is
considered good and bad should be clearly stated alongside the options,
in a detailed rationale.

For extentof(), my only caveat is that one might want to add a 2-args
operator (or macro) that has the C++ semantics, that is, allowing you to
specify the dimension of the array that you want.  I don't see why it
would be useful, but didn't want to preclude it either.  But other than
that, I'm okay with that name.

Another thing is that I'd prefer it to be based on email, or something
that doesn't impose a barrier to those who don't have an account in a
given platform, and don't want to create it.

Thanks for your interest in this operator!

Have a lovely day!
Alex


P.S.:  наб, you asked why not array_size().  I agree with the defenders
of lengthof that size should not be overloaded to mean both the number
of bytes and the number of elements of an object (although I'm closer to
accepting overloading the term "size" than overloading the term
"length", since size at least doesn't promote off-by-one errors).  Also,
I have a macro sizeof_array() which I define as

#define sizeof_array(a)  (countof(a) * sizeof((a)[0]))

It would be very weird and confusing to have

#define sizeof_array(a)  (array_size(a) * sizeof((a)[0]))



On Sat, Oct 26, 2024 at 12:10:56AM GMT, Alejandro Colomar wrote:
> Hi Joseph,
> 
> On Fri, Oct 25, 2024 at 08:44:15PM GMT, Joseph Myers wrote:
> > I don't see the use of pedwarn_c23 and associated tests (error with 
> > -std=c23 -pedantic-errors, warning with -std=c23 -pedantic, no diagnostic 
> > with -std=c23 -pedantic-errors -Wno-c23-c2y-compat, no diagnostic with 
> > -std=c2y -pedantic-errors, warning with -std=c2y -pedantic-errors 
> > -Wc23-c2y-compat), previously discussed in comments on v13, that would be 
> > appropriate before considering this for inclusion with an appropriate 
> > substitution of names.
> 
> I removed it because I renamed it to __countof__, which is a GNU
> extension, and thus should not be warned by -Wpedantic.  As part of my
> opposition to _Lengthof, I will not provide you with that part, which
> would amount to basically giving you _Lengthof but not.  As part of the
> editorialising process, you'll also have to add pedantic warnings, if
> that's what you want to do.  Again, I will earnestly ask to once more to
> consider __countof__, but it's up to you.
> 
> Have a lovely night!
> Alex
> 
> -- 
> 



-- 



signature.asc
Description: PGP signature


Re: BUG: realloc(p,0) is not conforming to C99/C11/C17/POSIX.1-2008

2024-10-17 Thread Alejandro Colomar via Gcc
CC += JeanHeyd

On Thu, Oct 17, 2024 at 05:25:55PM GMT, Alejandro Colomar wrote:
> On Thu, Oct 17, 2024 at 05:19:07PM GMT, Alejandro Colomar wrote:
> > CC += Robert, Joseph, gcc@
> > 
> > On Thu, Oct 17, 2024 at 04:30:27PM GMT, Alejandro Colomar wrote:
> > > On Thu, Oct 17, 2024 at 04:23:29PM GMT, Alejandro Colomar wrote:
> > > > CC += Doug, as the author of the original malloc(3).
> > > > 
> > > > On Thu, Oct 17, 2024 at 04:21:22PM GMT, Alejandro Colomar wrote:
> > > > > Hi!
> > > > > 
> > > > > наб and I have been researching about malloc(0) and realloc(p,0), and
> > > > > have written our findings here:
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > -  Every default malloc(0) had always returned a unique pointer.
> > > > > -  Every realloc(,0) had always behaved congruently with malloc(0).
> > > > > -  The weird malloc(0)=NULL was a bug in SysV r2's optimized 
> > > > > alternative
> > > > >-lmalloc library.  The documentation still didn't allow returning
> > > > >NULL.
> > > > > -  SVID surprisingly documented that bug in -lmalloc as if it were the
> > > > >only good behavior, making every default malloc(0) non-conforming.
> > > > >This was a huge mistake/crime.  So far so good for realloc(,0).
> > > > > 
> > > > > -  X3J11 (ANSI C) decided that realloc(p,0) shall be equivalent to
> > > > >free(p) (allegedly, in line with their self-inflicted policy of not
> > > > >allowing zero-size objects, which BTW was never true, since 
> > > > > malloc(0)
> > > > >and realloc(NULL,0) continued to support zero-size objects).  This
> > > > >was a hallucination of ANSI C.  No implementations had done this.
> > > > >Ever.  No documentation or standards document had specified this.
> > > > >Ever.
> > > > > -  POSIX.1-2001 (Issue 6) blindly followed C89, making every POSIX
> > > > >system non-conforming.
> > > > > 
> > > > > -  C99 and POSIX.1-2008 fixed the issue, by reverting to the 
> > > > > historical
> > > > >realloc(p,0) behavior of being congruent to malloc(0).
> > > > > 
> > > > > -  glibc, in 2006/2007, made a so-called bugfix change to conform to
> > > > >C89, which effectively made realloc(p,0) non-conforming to C99.  
> > > > > This
> > > > >happened in the following commit:
> > > > > 
> > > > >   commit 11bf311edc76f5ddc469a8c396e313e82d76be15
> > > > >   Author: Ulrich Drepper 
> > > > >   Date:   Thu Jan 11 21:51:07 2007 +
> > > > > 
> > > > >   [BZ #2510, BZ #2830, BZ #3137, BZ #3313, BZ #3426, BZ 
> > > > > #3465, BZ #3480, BZ #3483, BZ #
> > > > >   3493, BZ #3514, BZ #3515, BZ #3664, BZ #3673, BZ #3674]
> > > > > 
> > > > >   [...]
> > > > > 
> > > > >   2006-12-08  Ulrich Drepper  
> > > > >   * malloc/memusage.c: Handle realloc with new size 
> > > > > of zero and
> > > > >   non-NULL pointer correctly.
> > > > > 
> > > > >   [...]
> > > > > 
> > > > > Please revert that commit, and make realloc(p,0) conform to C99, C11,
> > > > > C17, and POSIX.1-2008.  This is the historical right behavior of
> > > > > realloc(p,0), and what the BSDs do.
> > > 
> > > I realized that I didn't clarify what the bug is.  realloc(p,0) should
> > > return a unique pointer, just like malloc(0) does.  Returning NULL is
> 
> Pedantically, I should say valid, not unique.  Returning the old pointer
> would be a valid implementation.
> 
> > > brain-damaged.
> > > 
> > > > > 
> > > > > That so-called bugfix probably silently broke algorithms for which 0 
> > > > > is
> > > > > not a special case.
> > > > > 
> > > > > C23 will break realloc(p,0) even further, but let's ignore that 
> > > > > mistake.
> > > > > 
> > > > > 
> > > > > Have a lovely day!
> > > > > Alex
> > > > > 
> > > > > 
> > > > > -- 
> > > > > 
> > > > 
> > > > 
> > > > 
> > > > -- 
> > > > 
> > > 
> > > 
> > > 
> > > -- 
> > > 
> > 
> > 
> > 
> > -- 
> > 
> 
> 
> 
> -- 
> 



-- 



signature.asc
Description: PGP signature


Re: BUG: realloc(p,0) is not conforming to C99/C11/C17/POSIX.1-2008

2024-10-17 Thread Alejandro Colomar via Gcc
On Thu, Oct 17, 2024 at 05:19:07PM GMT, Alejandro Colomar wrote:
> CC += Robert, Joseph, gcc@
> 
> On Thu, Oct 17, 2024 at 04:30:27PM GMT, Alejandro Colomar wrote:
> > On Thu, Oct 17, 2024 at 04:23:29PM GMT, Alejandro Colomar wrote:
> > > CC += Doug, as the author of the original malloc(3).
> > > 
> > > On Thu, Oct 17, 2024 at 04:21:22PM GMT, Alejandro Colomar wrote:
> > > > Hi!
> > > > 
> > > > наб and I have been researching about malloc(0) and realloc(p,0), and
> > > > have written our findings here:
> > > > 
> > > > 
> > > > 
> > > > 
> > > > -  Every default malloc(0) had always returned a unique pointer.
> > > > -  Every realloc(,0) had always behaved congruently with malloc(0).
> > > > -  The weird malloc(0)=NULL was a bug in SysV r2's optimized alternative
> > > >-lmalloc library.  The documentation still didn't allow returning
> > > >NULL.
> > > > -  SVID surprisingly documented that bug in -lmalloc as if it were the
> > > >only good behavior, making every default malloc(0) non-conforming.
> > > >This was a huge mistake/crime.  So far so good for realloc(,0).
> > > > 
> > > > -  X3J11 (ANSI C) decided that realloc(p,0) shall be equivalent to
> > > >free(p) (allegedly, in line with their self-inflicted policy of not
> > > >allowing zero-size objects, which BTW was never true, since malloc(0)
> > > >and realloc(NULL,0) continued to support zero-size objects).  This
> > > >was a hallucination of ANSI C.  No implementations had done this.
> > > >Ever.  No documentation or standards document had specified this.
> > > >Ever.
> > > > -  POSIX.1-2001 (Issue 6) blindly followed C89, making every POSIX
> > > >system non-conforming.
> > > > 
> > > > -  C99 and POSIX.1-2008 fixed the issue, by reverting to the historical
> > > >realloc(p,0) behavior of being congruent to malloc(0).
> > > > 
> > > > -  glibc, in 2006/2007, made a so-called bugfix change to conform to
> > > >C89, which effectively made realloc(p,0) non-conforming to C99.  This
> > > >happened in the following commit:
> > > > 
> > > > commit 11bf311edc76f5ddc469a8c396e313e82d76be15
> > > > Author: Ulrich Drepper 
> > > > Date:   Thu Jan 11 21:51:07 2007 +
> > > > 
> > > > [BZ #2510, BZ #2830, BZ #3137, BZ #3313, BZ #3426, BZ 
> > > > #3465, BZ #3480, BZ #3483, BZ #
> > > > 3493, BZ #3514, BZ #3515, BZ #3664, BZ #3673, BZ #3674]
> > > > 
> > > > [...]
> > > > 
> > > > 2006-12-08  Ulrich Drepper  
> > > > * malloc/memusage.c: Handle realloc with new size 
> > > > of zero and
> > > > non-NULL pointer correctly.
> > > > 
> > > > [...]
> > > > 
> > > > Please revert that commit, and make realloc(p,0) conform to C99, C11,
> > > > C17, and POSIX.1-2008.  This is the historical right behavior of
> > > > realloc(p,0), and what the BSDs do.
> > 
> > I realized that I didn't clarify what the bug is.  realloc(p,0) should
> > return a unique pointer, just like malloc(0) does.  Returning NULL is

Pedantically, I should say valid, not unique.  Returning the old pointer
would be a valid implementation.

> > brain-damaged.
> > 
> > > > 
> > > > That so-called bugfix probably silently broke algorithms for which 0 is
> > > > not a special case.
> > > > 
> > > > C23 will break realloc(p,0) even further, but let's ignore that mistake.
> > > > 
> > > > 
> > > > Have a lovely day!
> > > > Alex
> > > > 
> > > > 
> > > > -- 
> > > > 
> > > 
> > > 
> > > 
> > > -- 
> > > 
> > 
> > 
> > 
> > -- 
> > 
> 
> 
> 
> -- 
> 



-- 



signature.asc
Description: PGP signature


Re: BUG: realloc(p,0) is not conforming to C99/C11/C17/POSIX.1-2008

2024-10-17 Thread Alejandro Colomar via Gcc
CC += Robert, Joseph, gcc@

On Thu, Oct 17, 2024 at 04:30:27PM GMT, Alejandro Colomar wrote:
> On Thu, Oct 17, 2024 at 04:23:29PM GMT, Alejandro Colomar wrote:
> > CC += Doug, as the author of the original malloc(3).
> > 
> > On Thu, Oct 17, 2024 at 04:21:22PM GMT, Alejandro Colomar wrote:
> > > Hi!
> > > 
> > > наб and I have been researching about malloc(0) and realloc(p,0), and
> > > have written our findings here:
> > > 
> > > 
> > > 
> > > 
> > > -  Every default malloc(0) had always returned a unique pointer.
> > > -  Every realloc(,0) had always behaved congruently with malloc(0).
> > > -  The weird malloc(0)=NULL was a bug in SysV r2's optimized alternative
> > >-lmalloc library.  The documentation still didn't allow returning
> > >NULL.
> > > -  SVID surprisingly documented that bug in -lmalloc as if it were the
> > >only good behavior, making every default malloc(0) non-conforming.
> > >This was a huge mistake/crime.  So far so good for realloc(,0).
> > > 
> > > -  X3J11 (ANSI C) decided that realloc(p,0) shall be equivalent to
> > >free(p) (allegedly, in line with their self-inflicted policy of not
> > >allowing zero-size objects, which BTW was never true, since malloc(0)
> > >and realloc(NULL,0) continued to support zero-size objects).  This
> > >was a hallucination of ANSI C.  No implementations had done this.
> > >Ever.  No documentation or standards document had specified this.
> > >Ever.
> > > -  POSIX.1-2001 (Issue 6) blindly followed C89, making every POSIX
> > >system non-conforming.
> > > 
> > > -  C99 and POSIX.1-2008 fixed the issue, by reverting to the historical
> > >realloc(p,0) behavior of being congruent to malloc(0).
> > > 
> > > -  glibc, in 2006/2007, made a so-called bugfix change to conform to
> > >C89, which effectively made realloc(p,0) non-conforming to C99.  This
> > >happened in the following commit:
> > > 
> > >   commit 11bf311edc76f5ddc469a8c396e313e82d76be15
> > >   Author: Ulrich Drepper 
> > >   Date:   Thu Jan 11 21:51:07 2007 +
> > > 
> > >   [BZ #2510, BZ #2830, BZ #3137, BZ #3313, BZ #3426, BZ #3465, BZ 
> > > #3480, BZ #3483, BZ #
> > >   3493, BZ #3514, BZ #3515, BZ #3664, BZ #3673, BZ #3674]
> > > 
> > >   [...]
> > > 
> > >   2006-12-08  Ulrich Drepper  
> > >   * malloc/memusage.c: Handle realloc with new size of zero 
> > > and
> > >   non-NULL pointer correctly.
> > > 
> > >   [...]
> > > 
> > > Please revert that commit, and make realloc(p,0) conform to C99, C11,
> > > C17, and POSIX.1-2008.  This is the historical right behavior of
> > > realloc(p,0), and what the BSDs do.
> 
> I realized that I didn't clarify what the bug is.  realloc(p,0) should
> return a unique pointer, just like malloc(0) does.  Returning NULL is
> brain-damaged.
> 
> > > 
> > > That so-called bugfix probably silently broke algorithms for which 0 is
> > > not a special case.
> > > 
> > > C23 will break realloc(p,0) even further, but let's ignore that mistake.
> > > 
> > > 
> > > Have a lovely day!
> > > Alex
> > > 
> > > 
> > > -- 
> > > 
> > 
> > 
> > 
> > -- 
> > 
> 
> 
> 
> -- 
> 



-- 



signature.asc
Description: PGP signature


Re: n3369 New _Lengthof operator (v4)

2024-10-03 Thread Alejandro Colomar via Gcc
Hi all,

On Wed, Oct 02, 2024 at 08:29:26PM GMT, Alejandro Colomar wrote:
> > On Wed, Oct 02, 2024 at 11:41:20AM +0200, Alejandro Colomar wrote:
> > > Hi!
> > > 
> > > This operator is as voted in a WG14 meeting yesterday, with the only
> > > difference that we name it __lengthof__ instead of _Lengthof, to be able
> > > to add it without being bound by ISO bureaucracy.
> > > 
> > > No semantic changes since v12; only the rename, according to what WG14
> > > preferred.  WG14 agreed on the semantic changes of the operator as I
> > > implemented them in v12.
> > > 
> > > Changes since v12:
> > > 
> > > -  Rename s/__nelementsof__/__lengthof__/
> > > -  Fix typo in documentation.

FYI, this has been merged into C2y today, as _Lengthof.
The paper that was merged was
 with some
trivial/editorial wording cosmetic changes amended on top of it.

Have a lovely day!
Alex

> > > 
> > > Below is a range diff against v12.
> > > 
> > > Have a lovely day!
> > > Alex
> > > 
> > > 
> > > Alejandro Colomar (4):
> > >   contrib/: Add support for Cc: and Link: tags
> > >   gcc/: Rename array_type_nelts() => array_type_nelts_minus_one()
> > >   Merge definitions of array_type_nelts_top()
> > >   c: Add __lengthof__ operator
> > > 
> > >  contrib/gcc-changelog/git_commit.py|   5 +-
> > >  gcc/c-family/c-common.cc   |  26 
> > >  gcc/c-family/c-common.def  |   3 +
> > >  gcc/c-family/c-common.h|   2 +
> > >  gcc/c/c-decl.cc|  32 +++--
> > >  gcc/c/c-fold.cc|   7 +-
> > >  gcc/c/c-parser.cc  |  62 +++--
> > >  gcc/c/c-tree.h |   4 +
> > >  gcc/c/c-typeck.cc  | 118 +++-
> > >  gcc/config/aarch64/aarch64.cc  |   2 +-
> > >  gcc/config/i386/i386.cc|   2 +-
> > >  gcc/cp/cp-tree.h   |   1 -
> > >  gcc/cp/decl.cc |   2 +-
> > >  gcc/cp/init.cc |   8 +-
> > >  gcc/cp/lambda.cc   |   3 +-
> > >  gcc/cp/operators.def   |   1 +
> > >  gcc/cp/tree.cc |  13 --
> > >  gcc/doc/extend.texi|  30 +
> > >  gcc/expr.cc|   8 +-
> > >  gcc/fortran/trans-array.cc |   2 +-
> > >  gcc/fortran/trans-openmp.cc|   4 +-
> > >  gcc/rust/backend/rust-tree.cc  |  13 --
> > >  gcc/rust/backend/rust-tree.h   |   2 -
> > >  gcc/target.h   |   3 +
> > >  gcc/testsuite/gcc.dg/nelementsof-compile.c | 115 
> > >  gcc/testsuite/gcc.dg/nelementsof-vla.c |  46 +++
> > >  gcc/testsuite/gcc.dg/nelementsof.c | 150 +
> > >  gcc/tree.cc|  17 ++-
> > >  gcc/tree.h |   3 +-
> > >  29 files changed, 604 insertions(+), 80 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/nelementsof-compile.c
> > >  create mode 100644 gcc/testsuite/gcc.dg/nelementsof-vla.c
> > >  create mode 100644 gcc/testsuite/gcc.dg/nelementsof.c
> > > 
> > > Range-diff against v12:
> > > 1:  d7fca49888a = 1:  d7fca49888a contrib/: Add support for Cc: and Link: 
> > > tags
> > > 2:  e65245ac294 = 2:  e65245ac294 gcc/: Rename array_type_nelts() => 
> > > array_type_nelts_minus_one()
> > > 3:  03de2d67bb1 = 3:  03de2d67bb1 Merge definitions of 
> > > array_type_nelts_top()
> > > 4:  4373c48205d ! 4:  f635871da1f c: Add __nelementsof__ operator
> > > @@ Metadata
> > >  Author: Alejandro Colomar 
> > >  
> > >   ## Commit message ##
> > > -c: Add __nelementsof__ operator
> > > +c: Add __lengthof__ operator
> > >  
> > >  This operator is similar to sizeof but can only be applied to an 
> > > array,
> > >  and returns its number of elements.
> > > @@ Commit message
> > >  
> > >  gcc/ChangeLog:
> > >  
> > > -* doc/extend.texi: Document __nelementsof__ operator.
> > > -* target.h (enum type_context_kind): Add __nelementsof__ 
> > > operator.
> > > +* doc/extend.texi: Document __lengthof__ operator.
> > > +* target.h (enum type_context_kind): Add __lengthof__ 
> > > operator.
> > >  
> > >  gcc/c-family/ChangeLog:
> > >  
> > >  * c-common.h
> > >  * c-common.def:
> > > -* c-common.cc (c_nelementsof_type): Add __nelementsof__ 
> > > operator.
> > > +* c-common.cc (c_lengthof_type): Add __lengthof__ 
> > > operator.
> > >  
> > >  gcc/c/ChangeLog:
> > >  
> > >  * c-tree.h
> > > -(c_expr_nelementsof_expr, c_expr_nelementsof_type)
> > > +

n3369 New _Lengthof operator (v4)

2024-10-02 Thread Alejandro Colomar via Gcc
On Wed, Oct 02, 2024 at 08:21:54PM +0200, Alejandro Colomar wrote:
> 
> Hi Daniel,
> 
> Here's a revised version of the WG14 paper, with all issues addressed.

I had a topy in the paper number.  Here's the paper with the number
fixed.

Cheers,
Alex

> 
> Have a lovely day!
> Alex
> 
> On Wed, Oct 02, 2024 at 11:41:20AM +0200, Alejandro Colomar wrote:
> > Hi!
> > 
> > This operator is as voted in a WG14 meeting yesterday, with the only
> > difference that we name it __lengthof__ instead of _Lengthof, to be able
> > to add it without being bound by ISO bureaucracy.
> > 
> > No semantic changes since v12; only the rename, according to what WG14
> > preferred.  WG14 agreed on the semantic changes of the operator as I
> > implemented them in v12.
> > 
> > Changes since v12:
> > 
> > -  Rename s/__nelementsof__/__lengthof__/
> > -  Fix typo in documentation.
> > 
> > Below is a range diff against v12.
> > 
> > Have a lovely day!
> > Alex
> > 
> > 
> > Alejandro Colomar (4):
> >   contrib/: Add support for Cc: and Link: tags
> >   gcc/: Rename array_type_nelts() => array_type_nelts_minus_one()
> >   Merge definitions of array_type_nelts_top()
> >   c: Add __lengthof__ operator
> > 
> >  contrib/gcc-changelog/git_commit.py|   5 +-
> >  gcc/c-family/c-common.cc   |  26 
> >  gcc/c-family/c-common.def  |   3 +
> >  gcc/c-family/c-common.h|   2 +
> >  gcc/c/c-decl.cc|  32 +++--
> >  gcc/c/c-fold.cc|   7 +-
> >  gcc/c/c-parser.cc  |  62 +++--
> >  gcc/c/c-tree.h |   4 +
> >  gcc/c/c-typeck.cc  | 118 +++-
> >  gcc/config/aarch64/aarch64.cc  |   2 +-
> >  gcc/config/i386/i386.cc|   2 +-
> >  gcc/cp/cp-tree.h   |   1 -
> >  gcc/cp/decl.cc |   2 +-
> >  gcc/cp/init.cc |   8 +-
> >  gcc/cp/lambda.cc   |   3 +-
> >  gcc/cp/operators.def   |   1 +
> >  gcc/cp/tree.cc |  13 --
> >  gcc/doc/extend.texi|  30 +
> >  gcc/expr.cc|   8 +-
> >  gcc/fortran/trans-array.cc |   2 +-
> >  gcc/fortran/trans-openmp.cc|   4 +-
> >  gcc/rust/backend/rust-tree.cc  |  13 --
> >  gcc/rust/backend/rust-tree.h   |   2 -
> >  gcc/target.h   |   3 +
> >  gcc/testsuite/gcc.dg/nelementsof-compile.c | 115 
> >  gcc/testsuite/gcc.dg/nelementsof-vla.c |  46 +++
> >  gcc/testsuite/gcc.dg/nelementsof.c | 150 +
> >  gcc/tree.cc|  17 ++-
> >  gcc/tree.h |   3 +-
> >  29 files changed, 604 insertions(+), 80 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/nelementsof-compile.c
> >  create mode 100644 gcc/testsuite/gcc.dg/nelementsof-vla.c
> >  create mode 100644 gcc/testsuite/gcc.dg/nelementsof.c
> > 
> > Range-diff against v12:
> > 1:  d7fca49888a = 1:  d7fca49888a contrib/: Add support for Cc: and Link: 
> > tags
> > 2:  e65245ac294 = 2:  e65245ac294 gcc/: Rename array_type_nelts() => 
> > array_type_nelts_minus_one()
> > 3:  03de2d67bb1 = 3:  03de2d67bb1 Merge definitions of 
> > array_type_nelts_top()
> > 4:  4373c48205d ! 4:  f635871da1f c: Add __nelementsof__ operator
> > @@ Metadata
> >  Author: Alejandro Colomar 
> >  
> >   ## Commit message ##
> > -c: Add __nelementsof__ operator
> > +c: Add __lengthof__ operator
> >  
> >  This operator is similar to sizeof but can only be applied to an 
> > array,
> >  and returns its number of elements.
> > @@ Commit message
> >  
> >  gcc/ChangeLog:
> >  
> > -* doc/extend.texi: Document __nelementsof__ operator.
> > -* target.h (enum type_context_kind): Add __nelementsof__ 
> > operator.
> > +* doc/extend.texi: Document __lengthof__ operator.
> > +* target.h (enum type_context_kind): Add __lengthof__ 
> > operator.
> >  
> >  gcc/c-family/ChangeLog:
> >  
> >  * c-common.h
> >  * c-common.def:
> > -* c-common.cc (c_nelementsof_type): Add __nelementsof__ 
> > operator.
> > +* c-common.cc (c_lengthof_type): Add __lengthof__ operator.
> >  
> >  gcc/c/ChangeLog:
> >  
> >  * c-tree.h
> > -(c_expr_nelementsof_expr, c_expr_nelementsof_type)
> > +(c_expr_lengthof_expr, c_expr_lengthof_type)
> >  * c-decl.cc
> >  (start_struct, finish_struct)
> >  (start_enum, finish_enum)
> >  * c-parser.cc
> >  (c_parser_size

n3365 New _Lengthof operator (v4)

2024-10-02 Thread Alejandro Colomar via Gcc

Hi Daniel,

Here's a revised version of the WG14 paper, with all issues addressed.

Have a lovely day!
Alex

On Wed, Oct 02, 2024 at 11:41:20AM +0200, Alejandro Colomar wrote:
> Hi!
> 
> This operator is as voted in a WG14 meeting yesterday, with the only
> difference that we name it __lengthof__ instead of _Lengthof, to be able
> to add it without being bound by ISO bureaucracy.
> 
> No semantic changes since v12; only the rename, according to what WG14
> preferred.  WG14 agreed on the semantic changes of the operator as I
> implemented them in v12.
> 
> Changes since v12:
> 
> -  Rename s/__nelementsof__/__lengthof__/
> -  Fix typo in documentation.
> 
> Below is a range diff against v12.
> 
> Have a lovely day!
> Alex
> 
> 
> Alejandro Colomar (4):
>   contrib/: Add support for Cc: and Link: tags
>   gcc/: Rename array_type_nelts() => array_type_nelts_minus_one()
>   Merge definitions of array_type_nelts_top()
>   c: Add __lengthof__ operator
> 
>  contrib/gcc-changelog/git_commit.py|   5 +-
>  gcc/c-family/c-common.cc   |  26 
>  gcc/c-family/c-common.def  |   3 +
>  gcc/c-family/c-common.h|   2 +
>  gcc/c/c-decl.cc|  32 +++--
>  gcc/c/c-fold.cc|   7 +-
>  gcc/c/c-parser.cc  |  62 +++--
>  gcc/c/c-tree.h |   4 +
>  gcc/c/c-typeck.cc  | 118 +++-
>  gcc/config/aarch64/aarch64.cc  |   2 +-
>  gcc/config/i386/i386.cc|   2 +-
>  gcc/cp/cp-tree.h   |   1 -
>  gcc/cp/decl.cc |   2 +-
>  gcc/cp/init.cc |   8 +-
>  gcc/cp/lambda.cc   |   3 +-
>  gcc/cp/operators.def   |   1 +
>  gcc/cp/tree.cc |  13 --
>  gcc/doc/extend.texi|  30 +
>  gcc/expr.cc|   8 +-
>  gcc/fortran/trans-array.cc |   2 +-
>  gcc/fortran/trans-openmp.cc|   4 +-
>  gcc/rust/backend/rust-tree.cc  |  13 --
>  gcc/rust/backend/rust-tree.h   |   2 -
>  gcc/target.h   |   3 +
>  gcc/testsuite/gcc.dg/nelementsof-compile.c | 115 
>  gcc/testsuite/gcc.dg/nelementsof-vla.c |  46 +++
>  gcc/testsuite/gcc.dg/nelementsof.c | 150 +
>  gcc/tree.cc|  17 ++-
>  gcc/tree.h |   3 +-
>  29 files changed, 604 insertions(+), 80 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/nelementsof-compile.c
>  create mode 100644 gcc/testsuite/gcc.dg/nelementsof-vla.c
>  create mode 100644 gcc/testsuite/gcc.dg/nelementsof.c
> 
> Range-diff against v12:
> 1:  d7fca49888a = 1:  d7fca49888a contrib/: Add support for Cc: and Link: tags
> 2:  e65245ac294 = 2:  e65245ac294 gcc/: Rename array_type_nelts() => 
> array_type_nelts_minus_one()
> 3:  03de2d67bb1 = 3:  03de2d67bb1 Merge definitions of array_type_nelts_top()
> 4:  4373c48205d ! 4:  f635871da1f c: Add __nelementsof__ operator
> @@ Metadata
>  Author: Alejandro Colomar 
>  
>   ## Commit message ##
> -c: Add __nelementsof__ operator
> +c: Add __lengthof__ operator
>  
>  This operator is similar to sizeof but can only be applied to an 
> array,
>  and returns its number of elements.
> @@ Commit message
>  
>  gcc/ChangeLog:
>  
> -* doc/extend.texi: Document __nelementsof__ operator.
> -* target.h (enum type_context_kind): Add __nelementsof__ 
> operator.
> +* doc/extend.texi: Document __lengthof__ operator.
> +* target.h (enum type_context_kind): Add __lengthof__ 
> operator.
>  
>  gcc/c-family/ChangeLog:
>  
>  * c-common.h
>  * c-common.def:
> -* c-common.cc (c_nelementsof_type): Add __nelementsof__ 
> operator.
> +* c-common.cc (c_lengthof_type): Add __lengthof__ operator.
>  
>  gcc/c/ChangeLog:
>  
>  * c-tree.h
> -(c_expr_nelementsof_expr, c_expr_nelementsof_type)
> +(c_expr_lengthof_expr, c_expr_lengthof_type)
>  * c-decl.cc
>  (start_struct, finish_struct)
>  (start_enum, finish_enum)
>  * c-parser.cc
>  (c_parser_sizeof_expression)
> -(c_parser_nelementsof_expression)
> -(c_parser_sizeof_or_nelementsof_expression)
> +(c_parser_lengthof_expression)
> +(c_parser_sizeof_or_lengthof_expression)
>  (c_parser_unary_expression)
>  * c-typeck.cc
>  (build_external_ref)
>  (record_maybe_used

Re: [C2y] n3313.2 draft

2024-08-31 Thread Alejandro Colomar via Gcc
Hi,

On Mon, Aug 19, 2024 at 01:04:28PM GMT, Alejandro Colomar wrote:
> On Mon, Aug 19, 2024 at 12:58:30PM GMT, Alejandro Colomar wrote:
> > I have a draft for an updated proposal to WG14 (more recent than n3313).
> > It has some changes to the documentation of prior art, some typo fixes,
> > etc., and renames elementsof()=>nelementsof(), but the meat of the
> > proposal is the same.  I've also rebased it on top of the latest draft
> > for C2y, which is n3301.  I'll send it as a reply to this subthread.
> 
> I've attached the PDF document of the draft n3313.1, the informal name
> for an eventual revision of the n3313 proposal.  I've also attached the
> man(7) source.  Below is a diff since n3313.

I found some issues in n3313.1 and reverted some of those.
Both the PDF paper and the man(7) source of n3313.2 are attached.

Below is a diff against 3313.1.

Cheers,
Alex

---

diff --git a/elementsof.man b/elementsof.man
index 7002553..76c882b 100644
--- a/elementsof.man
+++ b/elementsof.man
@@ -1,11 +1,11 @@
 .
 .
-.TH n3313.1\~ WG14 202?-??-?? ISO/IEC\~9899 "Proposal for C2y"
+.TH n3313.2\~ WG14 202?-??-?? ISO/IEC\~9899 "Proposal for C2y"
 .
 .
 .SH Name
 .
-n3313.1
+n3313.2
 \-
 New nelementsof() operator (v2)
 .
@@ -162,47 +162,36 @@ Document prior art.
 .IP \[bu]
 Document backwards compatibility.
 .IP \[bu]
-Document reasons for having this operator beyond the pointer safety
+Document reasons for having this operator beyond pointer safety
 (which is already solved with complex macros and/or diagnostics).
 .IP \[bu]
 Add specific proposed changes to the draft document (based on n3220).
 .PD
 .RE
 .TP
-n3313.1
+n3313.2
 v3;
 202?-??-??.
 .IP
 New nelementsof() operator (v3)
 .IP
-n3313.1 is an informal name for a future proposal that supersedes n3313.
+n3313.2 is an informal name for a future proposal that supersedes n3313.
 .RS
 .IP \[bu] 3
 .PD 0
 Rename elementsof => nelementsof.
 .IP \[bu]
+Propose an alternative shorter name: neltsof.
+.IP \[bu]
 Rebase on n3301.
 .IP \[bu]
 Document performance problem of sizeof division.
 .IP \[bu]
-Document exponential macro growth problem of magic macros.
-.IP \[bu]
 Fix support for VLAs in example of NITEMS().
 This needs GNU C's
 .MR __builtin_types_compatible_p "" .
 .IP \[bu]
-Remove concerns about double evaluation,
-since it's possible to evaluate at most once,
-by using
-.I typeof
-and GNU C's statement expression.
-That improvement also adds support for type names,
-so also remove that concern.
-.IP \[bu]
-Add support for compound literals in example of NITEMS(),
-by using a variadic macro that will accept commas in the operand.
-.IP \[bu]
-Fix typo in proposed wording.
+Fix typos, and improve wording.
 .PD
 .RE
 .
@@ -236,25 +225,26 @@ Here's an implementation using GNU C:
 )   \[rs]
 )
 #define sizeof_array(a)(sizeof(a) + must_be(is_array(a)))
-#define sizeof_element(a)  sizeof((a)[0])
-#define NITEMS(...) \[rs]
-({  \[rs]
-typeof(__VA_ARGS__)  *ap_;  \[rs]
-\[rs]
-sizeof_array(*ap_) / sizeof_element(*ap_);  \[rs]
-})
+#define NITEMS(a)  (sizeof_array(a) / sizeof((a)[0]))
 .EE
 .P
 While diagnostics could be better,
 with good helper-macro names,
 they are decent.
 .
+.SS Type names
+.
+This
+.IR \%NITEMS ()
+macro is not ideal,
+since it only works with expressions but not with type names.
+However, for most use cases that's enough.
+.
 .SS constexpr
 .
-A
-.I \%sizeof\c
--based
-implementation
+The usual
+.I \%sizeof
+division
 evaluates the operand and
 results in a run-time value
 in cases where it wouldn't be necessary.
@@ -277,6 +267,18 @@ With a
 operator,
 this would result in an integer constant expression of value 7.
 .
+.SS Double evaluation
+.
+With the
+.IR \%sizeof -based
+implementation from above,
+the example above causes double evaluation of
+.IR *p++ .
+It's possible to write a macro that is free of double-evaluation problems
+using a GNU statement expression and
+.IR typeof (),
+but then the macro cannot be used at file scope.
+.
 .SS Diagnostics
 .
 Having more constant expressions would allow for increased diagnostics,
@@ -383,8 +385,6 @@ and wrap it in a macro.
 Common names include:
 .IP \[bu] 3
 .PD 0
-\%ARRAY_SIZE()
-.IP \[bu]
 \%NITEMS()
 .IP \[bu]
 \%NELEM()
@@ -396,8 +396,32 @@ Common names include:
 \%elementsof()
 .IP \[bu]
 \%lengthof()
+.IP \[bu]
+\%ARRAY_SIZE()
 .PD
 .RE
+.P
+We can extract some patterns from these macros:
+.IP \[bu] 3
+The name derives from one of
+.RS
+.TP
+number of elements
+.TP
+size
+But there's a proposal to remove that term from the standard
+due to ambiguity with the number of bytes of an array
+.RI ( sizeof(a) ).
+.TP
+length
+This is also ambiguous in the context of strings,
+where length means the number of non-zero characters.
+.RE
+.IP \[bu]
+The n

[C2y] n3313.1 draft

2024-08-19 Thread Alejandro Colomar via Gcc
On Mon, Aug 19, 2024 at 12:58:30PM GMT, Alejandro Colomar wrote:
> I have a draft for an updated proposal to WG14 (more recent than n3313).
> It has some changes to the documentation of prior art, some typo fixes,
> etc., and renames elementsof()=>nelementsof(), but the meat of the
> proposal is the same.  I've also rebased it on top of the latest draft
> for C2y, which is n3301.  I'll send it as a reply to this subthread.

I've attached the PDF document of the draft n3313.1, the informal name
for an eventual revision of the n3313 proposal.  I've also attached the
man(7) source.  Below is a diff since n3313.

Have a lovely day!
Alex


diff --git i/elementsof.man w/elementsof.man
index 7c5a006..7002553 100644
--- i/elementsof.man
+++ w/elementsof.man
@@ -1,13 +1,13 @@
 .
 .
-.TH n3313\~ WG14 2024-08-15 ISO/IEC\~9899 "Proposal for C2y"
+.TH n3313.1\~ WG14 202?-??-?? ISO/IEC\~9899 "Proposal for C2y"
 .
 .
 .SH Name
 .
-n3313
+n3313.1
 \-
-New elementsof() operator (v2)
+New nelementsof() operator (v2)
 .
 .
 .SH Category
@@ -35,10 +35,6 @@ GNU Compiler Collection
 Martin Uecker
 .ME
 .br
-.MT xry...@xry111.site
-Xi Ruoyao
-.ME
-.br
 .MT xavi@tutanota.com
 Xavier Del Campo Romero
 .ME
@@ -98,6 +94,42 @@ Aaron Ballman
 .MT paulkon...@comcast.net
 Paul Koning
 .ME
+.br
+.MT daniel.lundin.m...@gmail.com
+Daniel Lundin
+.ME
+.br
+.MT strni...@protonmail.com
+Nikolaos Strimpas
+.ME
+.br
+.MT ferna...@borretti.me
+Fernando Borretti
+.ME
+.br
+.MT phdoftheho...@gmail.com
+JeanHeyd Meneide
+.ME
+.br
+.MT jonathan.protze...@ens-lyon.org
+Jonathan Protzenko
+.ME
+.br
+.MT chris.baz...@arm.com
+Chris Bazley
+.ME
+.br
+.MT ville.voutilai...@gmail.com
+Ville Voutilainen
+.ME
+.br
+.MT alexg.n...@gmail.com
+Alex Celeste
+.ME
+.br
+.MT jakublukasiew...@outlook.com
+Jakub Łukasiewicz
+.ME
 .
 .
 .SH History
@@ -115,6 +147,64 @@ v2;
 2024-08-15.
 .IP
 New elementsof() operator (v2)
+.RS
+.IP \[bu] 3
+.PD 0
+Provide an implementation for GCC.
+.IP \[bu]
+Rename _Lengthof => elementsof.
+.IP \[bu]
+Clarify when it should result in an integer constant expression.
+.IP \[bu]
+Require parentheses.
+.IP \[bu]
+Document prior art.
+.IP \[bu]
+Document backwards compatibility.
+.IP \[bu]
+Document reasons for having this operator beyond the pointer safety
+(which is already solved with complex macros and/or diagnostics).
+.IP \[bu]
+Add specific proposed changes to the draft document (based on n3220).
+.PD
+.RE
+.TP
+n3313.1
+v3;
+202?-??-??.
+.IP
+New nelementsof() operator (v3)
+.IP
+n3313.1 is an informal name for a future proposal that supersedes n3313.
+.RS
+.IP \[bu] 3
+.PD 0
+Rename elementsof => nelementsof.
+.IP \[bu]
+Rebase on n3301.
+.IP \[bu]
+Document performance problem of sizeof division.
+.IP \[bu]
+Document exponential macro growth problem of magic macros.
+.IP \[bu]
+Fix support for VLAs in example of NITEMS().
+This needs GNU C's
+.MR __builtin_types_compatible_p "" .
+.IP \[bu]
+Remove concerns about double evaluation,
+since it's possible to evaluate at most once,
+by using
+.I typeof
+and GNU C's statement expression.
+That improvement also adds support for type names,
+so also remove that concern.
+.IP \[bu]
+Add support for compound literals in example of NITEMS(),
+by using a variadic macro that will accept commas in the operand.
+.IP \[bu]
+Fix typo in proposed wording.
+.PD
+.RE
 .
 .
 .SH Synopsis
@@ -126,14 +216,16 @@ This operator yields the number of elements of an array.
 .
 .SS Portability
 .
-Prior to C23 it was impossible to do this portably,
-but since C23 it is possible to
-portably write a macro that
-determines the number of elements of an array,
-that is,
-the number of elements in the array.
+It is possible to write a macro that
+yields the number of elements of an array.
+However, it is impossible to reject pointer arguments portably.
+Here's an implementation using GNU C:
 .IP
 .EX
+#define is_same_type(a, b)__builtin_types_compatible_p(a, b)
+#define is_same_typeof(a, b)  is_same_type(typeof(a), typeof(b))
+#define decay(a)  (&*(a))
+#define is_array(a)   (!is_same_typeof(a, decay(a)))
 #define must_be(e)  \[rs]
 (   \[rs]
 0 * (int) sizeof(   \[rs]
@@ -143,38 +235,30 @@ the number of elements in the array.
 }   \[rs]
 )   \[rs]
 )
-#define is_array(a) \[rs]
-(   \[rs]
-_Generic(&(a),  \[rs]
-typeof((a)[0]) **:  0,  \[rs]
-default:1   \[rs]
-)   \[rs]
-)
-#define sizeof_array(a)  (sizeof(a) + must_be(is_array(a)))
-#define nitems(a)(sizeof_array(a) / sizeof((a)[0]))
+#define sizeo

[PATCH v10 0/3] c: Add __nelementsof__ operator

2024-08-19 Thread Alejandro Colomar via Gcc
Hi!

This is v10 of this patch set; hopefully, we're close to an end.

I've already submitted a proposal for C2y to WG14, as n3313:


For those who haven't been following since the start, the entire thread
of patch set revisions and their discussions can be found here:

It also contains drafts of the n3313 proposal.

Changes since v9:

-  Rename s/__elementsof__/__nelementsof__/

   elementsof() doesn't mean in English something compatible with its
   programming semantics.  While there's existing uses of it in
   programming, that's an abuse of English, which could cause confusion,
   or maybe preclude a more appropriate use of that name in the future.

   nelementsof() is just one more byte, and is much more appropriate, by
   being a contraction of "_n_umber (of) elements of".

   Also, there's only one use of nelementsof() in the wild, which makes
   backwards compatibility much less of a concern.  That use is
   semantically compatible with this proposed operator.  And we could
   just notify that project the existence of a new standard operator
   with that name, if we finally settle on this name.

-  Rebase on top of git HEAD.

-  CC +=
Daniel Lundin, Nikolaos, JeanHeyd, Fernando, Jonathan, Chris,
Ville, Alex Celeste, Jakub Łukasiewicz

I have a draft for an updated proposal to WG14 (more recent than n3313).
It has some changes to the documentation of prior art, some typo fixes,
etc., and renames elementsof()=>nelementsof(), but the meat of the
proposal is the same.  I've also rebased it on top of the latest draft
for C2y, which is n3301.  I'll send it as a reply to this subthread.

This cover letter is sent to both gcc-patches@ and gcc@.  The patches
are only sent to gcc-patches@, and the draft for WG14 is only sent to
gcc@.

If anyone wants to add an `Acked-by:` (or `Reviewed-by:`), please do it
explicitly.  There've been some informal ones, but I prefer to pick them
from explicit ones.


Have a lovely day!

Alex


Alejandro Colomar (3):
  gcc/: Rename array_type_nelts() => array_type_nelts_minus_one()
  Merge definitions of array_type_nelts_top()
  c: Add __nelementsof__ operator

 gcc/c-family/c-common.cc   |  26 
 gcc/c-family/c-common.def  |   3 +
 gcc/c-family/c-common.h|   2 +
 gcc/c/c-decl.cc|  32 +++--
 gcc/c/c-fold.cc|   7 +-
 gcc/c/c-parser.cc  |  62 +++--
 gcc/c/c-tree.h |   4 +
 gcc/c/c-typeck.cc  | 118 +++-
 gcc/config/aarch64/aarch64.cc  |   2 +-
 gcc/config/i386/i386.cc|   2 +-
 gcc/cp/cp-tree.h   |   1 -
 gcc/cp/decl.cc |   2 +-
 gcc/cp/init.cc |   8 +-
 gcc/cp/lambda.cc   |   3 +-
 gcc/cp/operators.def   |   1 +
 gcc/cp/tree.cc |  13 --
 gcc/doc/extend.texi|  30 +
 gcc/expr.cc|   8 +-
 gcc/fortran/trans-array.cc |   2 +-
 gcc/fortran/trans-openmp.cc|   4 +-
 gcc/rust/backend/rust-tree.cc  |  13 --
 gcc/rust/backend/rust-tree.h   |   2 -
 gcc/target.h   |   3 +
 gcc/testsuite/gcc.dg/nelementsof-compile.c | 115 
 gcc/testsuite/gcc.dg/nelementsof-vla.c |  46 +++
 gcc/testsuite/gcc.dg/nelementsof.c | 150 +
 gcc/tree.cc|  17 ++-
 gcc/tree.h |   3 +-
 28 files changed, 600 insertions(+), 79 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/nelementsof-compile.c
 create mode 100644 gcc/testsuite/gcc.dg/nelementsof-vla.c
 create mode 100644 gcc/testsuite/gcc.dg/nelementsof.c

Range-diff against v9:
1:  a6aa38c9013 = 1:  ab72c4cee8f gcc/: Rename array_type_nelts() => 
array_type_nelts_minus_one()
2:  4ce16ee4dfe ! 2:  27852be4ac0 Merge definitions of array_type_nelts_top()
@@ Commit message
 Merge definitions of array_type_nelts_top()
 
 There were two identical definitions, and none of them are available
-where they are needed for implementing __elementsof__.  Merge them, and
+where they are needed for implementing __nelementsof__.  Merge them, 
and
 provide the single definition in gcc/tree.{h,cc}, where it's available
-for __elementsof__, which will be added in the following commit.
+for __nelementsof__, which will be added in the following commit.
 
 gcc/ChangeLog:
 
3:  caae5dbecb3 ! 3:  9c78ce1f66d c: Add __elementsof__ operator
@@ Metadata
 Author: Alejandro Colomar 
 
  ## Commit message #

v2.2 Draft for an elementsof paper

2024-08-14 Thread Alejandro Colomar via Gcc
[To: s/gcc-patches/gcc/] 

Hi,

Attached is a new revision of the proposal for WG14 for adding an
elementsof() operator (both the man(7) source, and the generated PDF).

v2.2:

-  Rename lengthof => elementsof.  Aaron found incompatible existing
   functions called lengthof() in the wild.

-  Require parens in ISO C.  Leave sizeof-like syntax as a (recommended)
   QoI detail.  It should give more freedom to implementations to start
   with something simple, if they need it.  [Jens]

-  Document prior art in C and C++.  [Jens, Aaron]

-  Document a rationale for the naming choice.  [Jens, Aaron]

-  Document backwards compatiblity.  [Jens, Aaron]

-  Document why we don't propose an uglified name.  [Jens]

-  Add a few specific questions for WG14.  [Jens]

Have a lovely day!
Alex

-- 



elementsof.man
Description: Unix manual page


elementsof.pdf
Description: Adobe PDF document


signature.asc
Description: PGP signature


Re: v2.1 Draft for a lengthof paper

2024-08-14 Thread Alejandro Colomar via Gcc
[CC: s/gcc-patches@/gcc@/]

Hi Jens,

On Wed, Aug 14, 2024 at 05:44:57PM GMT, Jens Gustedt wrote:
> > However, strlen(3) came first, and we must respect it.
> 
> Sure,  string length, a dynamic feature, and array length are two features.

(Except that I've seen --and also written myself-- also string lengths
 that are constant expressions).

#define STRLEN(s)  (NITEMS(s) - 1)

But that's unimportant.  I'm more worried about:

ssize_t
strtcpy(char *restrict dst, const char *restrict src, size_t ndst)
{
booltrunc;
size_t  dlen, slen;

if (ndst == 0)
return -1;

slen = strnlen(src, ndst);
trunc = (slen == ndst);
dlen = slen - trunc;

stpcpy(mempcpy(dst, src, dlen), "");

if (trunc)
return -1;

return slen;
}

There should be a distinction between array length and string length.
If we call them both len, the names will be ambiguous.

> But we also have VLA and not VNEA in the standard, So we should respect this 
> ;-)

Okay.  (Although, when in doubt, I would favour strlen(3) because it's
older, and more stable.)  :-)

> > Since you haven't proposed eliminating "number of elements" from the
> > standard, and it would still be used alongside length, I think
> > elementsof() would be consistent with your view (consistent with "number
> > of elements").
> 
> didn't we ? Then this is actually a good idea to do so, thanks for the idea !

Number of elements also makes it consistent with the language used for
referring to the pointer to one after the last element of an array.
I think length and number of elements can coexist.  Even you had to use
it in your paper:

... We propose to consequently talk about

-  size when talking about size of a type or object in numbers of bytes
-  length when talking about the number of elements in an array.

I would say it's fine to use length if there's no ambiguity, but
number of elements in contexts where it could be ambiguous with strings.
That's already an improvement over the status quo (which is always
ambiguous, since an array always has both a length and a size).

Would you mind at least keeping "number of elements" for now, replacing
uses of size, and reconsidering the idea of removing "number of
elements" separately and only after that?  That would allow separate
discussion.  I'm in favour of the first but not the second.

> "elements of" is a stretch, linguistically, because you don't mean  the
> elements themselves, you are referring to their number. "elementsof" for
> me would refer to a list of these elements.

We could call it nelementsof(), or neltsof().  neltsof() is shorter,
so I like it better.  (alignof() is already a contraction of
alignmentof(), for prior art in contracting names in operators.)

> > Alternatively, you could use a new term, for example extent, for
> > referring to the number of elements of an array.  That would be more
> > respectful to strlen(3), keeping a strong distinction between string
> > length and array **.
> 
> Only that this separation doesn't exist, even now, as said, it is called
> "variable length array"

I guess VLA is already a common enough term that I guess it's blessed.
How do you feel about using it when it's unambiguous, but using number
of elements when it could be ambiguous?

But so is the expression "one past the last element of an array".  Both
notations should be able to coexist.

It's not like the problem with size, which is never unambiguous.

Have a lovely day!
Alex

-- 



signature.asc
Description: PGP signature


Re: Promote -Wno-sizeof-array-argument to an error

2024-08-09 Thread Alejandro Colomar via Gcc
On Fri, Aug 09, 2024 at 04:03:53PM GMT, Jakub Jelinek wrote:
> E.g. consider GCC
> #define iterative_hash_object(OB,INIT) iterative_hash (&OB, sizeof (OB), INIT)
> macro.
> If one uses it on the function parameters declared as arrays but turned into
> pointers and you know what you are doing, why not.
> If using sizeof in that case is an error, you'd need to know, I can use this
> macro on everything except this special case.

Hmmm, I guess the array updates of
 and
__lengthof__ supporting array parameters don't need to come with a
change to sizeof.  If one wants to learn the size of an array parameter,
it's easy to define

#define sizeof_array(a)  (__lengthof__(a) * sizeof(*(a)))

So, please ignore my proposal.

Cheers,
Alex

-- 



signature.asc
Description: PGP signature


Re: Promote -Wno-sizeof-array-argument to an error

2024-08-09 Thread Alejandro Colomar via Gcc
Hi Jakub,

On Fri, Aug 09, 2024 at 02:24:48PM GMT, Jakub Jelinek wrote:
> You can compile your sources with -Werror=sizeof-array-argument if you want
> it to be an error,

The problem with that is that the error will still be enforced _after_
GCC would change the behavior of sizeof(aparam).

Admittedly, I could write configure checks to determine if I have a
version of GCC where sizeof(aparam) is bad and only enable the error
there.

> but there is nothing wrong about using sizeof on function
> argument with array type if you know what you're doing,

I have doubts here.  Does anyone have an idea of what kind of program
would be interested in that value?

> and sizeof should be
> usable on anything that has a size, not on everything except this special
> case, so please don't enforce it on others.

I don't want to enforce it if there's a single line of code that's using
sizeof(aparam) for a legitimate purpose.  But is there?  Or can it be?


Have a lovely day!
Alex

-- 



signature.asc
Description: PGP signature


Re: Promote -Wno-sizeof-array-argument to an error

2024-08-09 Thread Alejandro Colomar via Gcc
Hi Martin,

On Thu, Aug 08, 2024 at 09:31:37AM GMT, Martin Uecker wrote:
> Am Donnerstag, dem 08.08.2024 um 02:36 +0200 schrieb Alejandro Colomar:
> > Hi Martin,
> > 
> > Can we promote -Wno-sizeof-array-argument to a hard error?  I don't
> > think there's any legitimate use sizeof() on such a parameter.
> 
> I am a bit worried that it might prevent people from adding size information
> to arguments, by transforming later use of sizeof on such a pointer argument
> into a hard error.

I've been thinking about it, and I'm not convinced.


If we don't have an error, at some point, sizeof(array_param) would
change meaning without prior notice.  If a program uses that new
feature, it will silently compile in older compiler versions, producing
bogus results.  To prevent that, the configure scripts would need to
test the compiler and reject compilers that have the old behavior.

However, if we introduce a mandatory compiler error, programs won't need
to be careful about the feature.  If the compiler supports it, it will
compile, and if it doesn't, it will fail.


Since I don't see any legitimate uses of sizeof(aparam) as of today, I
don't expect having any consequences on existing code.  (But please
point me wrong if there are any, maybe in generic macros.)


What do you think?

> 
> So I would not do this at this time, until we have fully evolved the
> size checking and the benefits are clear.
> 
> > 
> > It would be an incompatible extension to ISO C, which would make sure
> > that there are no remaining uses of sizeof(array_param), which would
> > itself allow in the future --if n2906 or something similar is accepted--
> > repurposing sizeof() to return the size in bytes of the array (instead
> > of the size of a pointer) without worrying about breaking existing code.
> 
> I agree with this goal, but this is a long term goal.  
> 
> 
> Martin

Cheers,
Alex

-- 



signature.asc
Description: PGP signature


Re: Spurious -Wsequence-point with auto

2024-08-09 Thread Alejandro Colomar via Gcc
Hi Martin,

On Fri, Aug 09, 2024 at 10:31:55AM GMT, Martin Uecker wrote:
> Can you file a bug if there isn't one already?  



> BTW: Double evaluation is not only a problem for semantics, but also because
> it can cause long compile times, cf. the 45 MB macro expansion example
> in Linux...

Huh, what's that one?  I didn't know it.  :)

Have a lovely day!
Alex

-- 



signature.asc
Description: PGP signature


Spurious -Wsequence-point with auto

2024-08-08 Thread Alejandro Colomar via Gcc
Hi,

I've seen some bogus warning in GCC that suggests that some use of auto
may cause undefined behavior (due to double evaluation).

$ cat auto.c 
#include 

int
main(void)
{
int  i = 3;
int  a[2 * i];
int  (*p)[2 * i];

i = 1;
p = &a;
auto q = p + i++; 
}
$ gcc -Wsequence-point -std=c23 auto.c 
auto.c: In function ‘main’:
auto.c:12:23: warning: operation on ‘i’ may be undefined [-Wsequence-point]
   12 | auto q = p + i++;
  |  ~^~


I originally found this problem here:


And I suspect the warning is confusing auto with typeof().  If typeof()
was used, this would indeed cause double evaluation, once before the =
and once after the =, causing UB.

Maybe it's due to how auto is implemented?  I suspect it's doing
internally:

typeof(p + i++) q = p + i++;

and that would be problematic.  I can reproduce it with both gcc-13 and
gcc-14.

Have a lovely night!
Alex


-- 



signature.asc
Description: PGP signature


Re: Promote -Wno-sizeof-array-argument to an error

2024-08-08 Thread Alejandro Colomar via Gcc
On Thu, Aug 08, 2024 at 09:31:37AM GMT, Martin Uecker wrote:
> Am Donnerstag, dem 08.08.2024 um 02:36 +0200 schrieb Alejandro Colomar:
> > Hi Martin,
> > 
> > Can we promote -Wno-sizeof-array-argument to a hard error?  I don't
> > think there's any legitimate use sizeof() on such a parameter.
> 
> I am a bit worried that it might prevent people from adding size information
> to arguments, by transforming later use of sizeof on such a pointer argument
> into a hard error.
> 
> So I would not do this at this time, until we have fully evolved the
> size checking and the benefits are clear.

Hmmm, I see; makes sense.  Thanks!

Alex

> 
> > 
> > It would be an incompatible extension to ISO C, which would make sure
> > that there are no remaining uses of sizeof(array_param), which would
> > itself allow in the future --if n2906 or something similar is accepted--
> > repurposing sizeof() to return the size in bytes of the array (instead
> > of the size of a pointer) without worrying about breaking existing code.
> 
> I agree with this goal, but this is a long term goal.  
> 
> 
> Martin
> 
> > 
> 
> 
> 

-- 



signature.asc
Description: PGP signature


Re: Promote -Wno-sizeof-array-argument to an error

2024-08-07 Thread Alejandro Colomar via Gcc
On Thu, Aug 08, 2024 at 02:36:36AM GMT, Alejandro Colomar wrote:
> Hi Martin,
> 
> Can we promote -Wno-sizeof-array-argument to a hard error?  I don't

D'oh!  I obviously meant -Wsizeof-array-argument.

> think there's any legitimate use sizeof() on such a parameter.
> 
> It would be an incompatible extension to ISO C, which would make sure
> that there are no remaining uses of sizeof(array_param), which would
> itself allow in the future --if n2906 or something similar is accepted--
> repurposing sizeof() to return the size in bytes of the array (instead
> of the size of a pointer) without worrying about breaking existing code.
> 
> Any opinions?
> 
> Have a lovely night!
> Alex
> 
> -- 
> 



-- 



signature.asc
Description: PGP signature


Promote -Wno-sizeof-array-argument to an error

2024-08-07 Thread Alejandro Colomar via Gcc
Hi Martin,

Can we promote -Wno-sizeof-array-argument to a hard error?  I don't
think there's any legitimate use sizeof() on such a parameter.

It would be an incompatible extension to ISO C, which would make sure
that there are no remaining uses of sizeof(array_param), which would
itself allow in the future --if n2906 or something similar is accepted--
repurposing sizeof() to return the size in bytes of the array (instead
of the size of a pointer) without worrying about breaking existing code.

Any opinions?

Have a lovely night!
Alex

-- 



signature.asc
Description: PGP signature


Re: n3294 - The restrict function attribute as a replacement of the restrict qualifier

2024-07-27 Thread Alejandro Colomar via Gcc
Hi Martin,

On Sat, Jul 27, 2024 at 12:59:34AM GMT, Martin Uecker wrote:
> Am Samstag, dem 27.07.2024 um 00:26 +0200 schrieb Alejandro Colomar:
> > On Sat, Jul 27, 2024 at 12:03:20AM GMT, Martin Uecker wrote:
> > > > Maybe if GNU C compilers (GCC and Clang) add it first as an extension,
> > > > adding diagnostics, it would help.
> > > 
> > > Both GCC and Clang already have such diagnostics and/or run-time checks:
> > > 
> > > https://godbolt.org/z/MPnxqb9h7
> > 
> > Hi Martin,
> > 
> > I guess that's prior art enough to make this UB in ISO C.  Is there any
> > paper for this already?  Does any of your paper cover that?  Should I
> > prepare one?
> > 
> 
> What do you mean by "this"?

Adding UB.

>  Adding UB would likely see a lot
> of opposition,

But UB allows for safer code.  It's the lack of UB what reduces the
quality of diagnostics, which results in worse code.  I understand it
will see opposition, so we better wait for the path to be prepared
(i.e., n2906 already merged before presenting a paper), but once that's
done, I'd try to add UB.

> even where this could enable run-time checks.  

(And build-time too.)

> N2906 would make 
> 
> int foo(char f[4]);
> int foo(char f[5]);
> 
> a constraint violation (although having those types be incompatible
> could also cause UB indirectly, this would not be its main effect).
> 
> So I think brining a new version of this paper forward would be
> a possible next step, addressing the issues raised in the past.

Yeah, that would be a good next step.  And when the array type is part
of the function type, it'll be easier to convince that [n] can only mean
[n].

Have a lovely day!
Alex

> Martin
> 

-- 



signature.asc
Description: PGP signature


Re: n3294 - The restrict function attribute as a replacement of the restrict qualifier

2024-07-26 Thread Alejandro Colomar via Gcc
On Sat, Jul 27, 2024 at 12:03:20AM GMT, Martin Uecker wrote:
> > Maybe if GNU C compilers (GCC and Clang) add it first as an extension,
> > adding diagnostics, it would help.
> 
> Both GCC and Clang already have such diagnostics and/or run-time checks:
> 
> https://godbolt.org/z/MPnxqb9h7

Hi Martin,

I guess that's prior art enough to make this UB in ISO C.  Is there any
paper for this already?  Does any of your paper cover that?  Should I
prepare one?

Have a lovely night!
Alex

-- 



signature.asc
Description: PGP signature


Re: n3294 - The restrict function attribute as a replacement of the restrict qualifier

2024-07-26 Thread Alejandro Colomar via Gcc
On Fri, Jul 26, 2024 at 09:22:42PM GMT, Joseph Myers wrote:
> On Fri, 26 Jul 2024, Alejandro Colomar via Gcc wrote:
> 
> > > See reflector message SC22WG14.18575, 17 Nov 2020 (the former convenor 
> > > replying when I asked about just that paper).
> > 
> > Where can I find reflector messages?
> 
> https://www.open-std.org/jtc1/sc22/wg14/18575

Thanks!

> 
> > And another one to propose that [n] means the same as [static n] except
> > for the nonnull property of static.
> 
> I'm not convinced that introducing extra undefined behavior for things 
> that have been valid since C89 (which would be the effect of such a change 
> for any code that passes a smaller array) is a good idea - the general 
> mood is to *reduce* undefined behavior.

While [n] has always _officially_ meant the same as [], it has never
made any sense to write code like that.  Unofficially, it has always
meant the obvious thing.

Maybe if GNU C compilers (GCC and Clang) add it first as an extension,
adding diagnostics, it would help.

Does anyone know of any existing code that uses [n] for meaning anything
other than "n elements are available to the function"?

Functions that specify [n] most likely (definitely?) already mean that
n elements are accessed, and thus passing something different than n
elements results in UB one way or another.  Having the compiler enforce
that via diagnostics and UB is probably an improvement.

Cheers,
Alex

> 
> -- 
> Joseph S. Myers
> josmy...@redhat.com
> 

-- 
<https://www.alejandro-colomar.es/>


signature.asc
Description: PGP signature


Re: n3294 - The restrict function attribute as a replacement of the restrict qualifier

2024-07-26 Thread Alejandro Colomar via Gcc
Hi Joseph,

On Fri, Jul 26, 2024 at 08:30:33PM GMT, Joseph Myers wrote:
> On Fri, 26 Jul 2024, Alejandro Colomar via Gcc wrote:
> 
> > I don't see why it should not apply to void*.  memcpy(3) should get this
> > attribute:
> > 
> > [[alx::restrict(1)]]
> > [[alx::restrict(2)]]
> > void *memcpy(void *dst, const void *src, size_t n);
> 
> That would disallow copying between disjoint subarrays within the same 
> toplevel object (and there's no way to specify an array size for void *), 
> which hardly seems right.

Hmmm, I sometimes forget that ISO C is so painful about void.

Has WG14 discussed in the past about the GNU extension that defines
sizeof(void) == 1?

Maybe wording that also considers compiler-specific attributes and
extensions would allow for the following:

[[gnu::access(write_only, 1, 3)]]
[[gnu::access(read_only, 2, 3)]]
[[alx::restrict(1)]]
[[alx::restrict(2)]]
void *memcpy(void *dst, const void *src, size_t n);

The GNU attribute specifies the number of elements of the subarrays, and
the GNU extension sizeof(void)==1 specifies the size of each element.
That gives us the size of the subarrays to be considered for the
restrictness.

So, ISO C wouldn't be allowed to mark malloc(3) as [[alx::restrict]]
(unless they add these GNU extensions), but GNU C could.

> > BTW, the author of n2529 didn't follow up, right?  I'd like that in, so
> > I'll prepare something after n2906 is merged.  Martin, would you mind
> > pinging me about it?
> 
> See reflector message SC22WG14.18575, 17 Nov 2020 (the former convenor 
> replying when I asked about just that paper).

Where can I find reflector messages?

>  As far as I know the author 
> has not yet provided an updated version / asked for it to be added to a 
> meeting agenda.

I think you mentioned that to me some time ago.  I guess I'll take over
then.  I'll ask for a number to propose _Nitems().

And another one to propose that [n] means the same as [static n] except
for the nonnull property of static.

Have a lovely night!
Alex

> 
> -- 
> Joseph S. Myers
> josmy...@redhat.com
> 
> 

-- 
<https://www.alejandro-colomar.es/>


signature.asc
Description: PGP signature


Re: n3294 - The restrict function attribute as a replacement of the restrict qualifier

2024-07-26 Thread Alejandro Colomar via Gcc
Hi Joseph,

On Fri, Jul 26, 2024 at 04:24:14PM GMT, Joseph Myers wrote:
> On Wed, 10 Jul 2024, Alejandro Colomar via Gcc wrote:
> 
> >6.7.13.x The restrict function attribute
> >  Constraints
> > The restrict attribute shall be applied to a function.
> > 
> > A 1‐based index can be specified in an  attribute  argument
> > clause,  to  associate the attribute with the corresponding
> > parameter of the function, which must be of a pointer type.
> 
> It's more appropriate to say "shall", and you need a requirement for the 
> pointer to be a pointer to a complete object type (it makes no sense with 
> function pointers, or void).

I don't see why it should not apply to void*.  memcpy(3) should get this
attribute:

[[alx::restrict(1)]]
[[alx::restrict(2)]]
void *memcpy(void *dst, const void *src, size_t n);

The index to which the text above refers is that '(1)' and '(2)'.

>  That is, something like "If an attribute 
> argument clause is present, it shall have the form:
> 
>   ( constant-expression )
> 
> The constant expression shall be an integer constant expression with 
> positive value.  It shall be the index, counting starting from 1, of a 
> function parameter whose type is a pointer to a complete object type.".
> 
> (That might not quite be sufficient - there are the usual questions of 
> exactly *when* the type needs to be complete, if it's completed part way 
> through the function definition, but the standard already doesn't tend to 
> specify such things very precisely.)
> 
> > (Optional.)   The argument attribute clause may be omitted,
> > which is equivalent to specifying the  attribute  once  for
> > each parameter that is a pointer.
> 
> For each parameter that is a pointer to a complete object type, or should 
> there be a constraint violation in this case if some are pointers to such 
> types and some are pointers to other types?
> 
> > If the number of elements is specified with array  notation
> > (or  a compiler‐specific attribute), the array object to be
> > considered for aliasing is a sub‐object of the original ar‐
> > ray object, limited by the number  of  elements  specifiedr
> > [1].
> 
> This is semantically problematic in the absence of something like N2906 
> (different declarations could use different numbers of elements),

Agree.  I think arrays should be fixed in C.  n2906 is a good step
towards that.  Thanks Martin!  :)

BTW, the author of n2529 didn't follow up, right?  I'd like that in, so
I'll prepare something after n2906 is merged.  Martin, would you mind
pinging me about it?

For what this [[alx::restrict]] proposal is concerned, I'd wait after
n2906 is merged for proposing that extension.

> and even 
> N2906 wouldn't help for the VLA case.

I'd basically propose that [3] or [n] means the same as [static 3] and
[static n], except for the nonnull implications of static.  Is there any
such paper?  I'm interested in presenting one for that.

Maybe it would also be interesting to wait after n2906 for that too.

> >  [1]  For the following prototype:
> > 
> >  [[restrict(1)]] [[restrict(2)]]
> >  void f(size_t n, int a[n], const int b[n]);
> 
> That declaration currently means
> 
>   void f(size_t n, int a[*], const int b[*]);

Yeah, that should be fixed in the standard.

I'll keep that extension of restrict out of a proposal until array
parameters are fixed in that regard.

> (that is, the expression giving a VLA size is ignored).  It's equivalent 
> to e.g.
> 
>   void f(size_t n, int a[n + foo()], const int b[n + bar()]);
> 
> where because the size expressions are never evaluated and there's no time 
> defined for evaluation, it's far from clear what anything talking about 
> them giving an array size would even mean.

Yup.

> I know that "noalias" was included in some C89 drafts but removed from the 
> final standard after objections.  Maybe someone who was around then could 
> explain what "noalias" was, what the problems with it were and how it 
> differs from "restrict", so we can make sure that any new proposals in 
> this area don't suffer from whatever the perceived deficiencies of 
> "noalias" were?

As I said in reply to Branden's response, it seems Dennis's concern was
that the noalias proposal was a qualifier, which admittedly makes little
sense (very much like the problems restrict has, but applied to the
pointee, which makes them much worse).

That in fact led me recently to think that an _Optional qualifier
(similar to Clang's _Nullable) as is being proposed at the moment in
n3222 is similarly DOA.  Those qualities of pointers are attributes,
which cannot be specified in the type system.

Have a lovely night!
Alex


> 
> -- 
> Joseph S. Myers
> josmy...@redhat.com


-- 
<https://www.alejandro-colomar.es/>


signature.asc
Description: PGP signature


Re: n3294 - The restrict function attribute as a replacement of the restrict qualifier

2024-07-26 Thread Alejandro Colomar via Gcc
Hi Branden!

On Fri, Jul 26, 2024 at 11:35:51AM GMT, G. Branden Robinson wrote:
> At 2024-07-26T16:24:14+, Joseph Myers wrote:
> > I know that "noalias" was included in some C89 drafts but removed from
> > the final standard after objections.  Maybe someone who was around
> > then could explain what "noalias" was, what the problems with it were
> 
> For this part, I think the source most often cited is Dennis Ritchie's
> thunderbolt aimed directly at "noalias".
> 
> https://www.lysator.liu.se/c/dmr-on-noalias.html

Thanks!  It seems Dennis's concern was that it was a qualifier.
Probably the reason why restrict ended up being a qualifier on the
pointer (and thus easily ignored), instead of the pointee (it would have
caused the problems that Dennis mentioned and which anyone can guess).

Since I'm suggesting an attribute, we are pretty much safe from type
rules, and thus safe from Dennis's concerns, I think.

Have a lovely night!
Alex

> 
> > and how it differs from "restrict",
> 
> I can only disqualify myself as an authority here.
> 
> > To comprehensively address this demands so we can make sure that any
> > new proposals in this area don't suffer from whatever the perceived
> > deficiencies of "noalias" were?
> 
> I think it would be valuable to get such a discussion into the rationale
> of the next C standard.
> 
> Regards,
> Branden

-- 



signature.asc
Description: PGP signature


Re: Apply function attributes (e.g., [[gnu::access()]]) to pointees too

2024-07-12 Thread Alejandro Colomar via Gcc
Hi Martin,

On Thu, Jul 11, 2024 at 06:34:04PM GMT, Alejandro Colomar wrote:
> Hi Martin, David,
> 
> On Thu, Jul 11, 2024 at 06:08:38PM GMT, David Brown wrote:
> > On 11/07/2024 11:58, Martin Uecker via Gcc wrote:
> > > > [[gnu::access(read_write, 1)]]
> > > > [[gnu::access(read_only, 2)]]
> > > > [[gnu::nonnull(1, 2)]]
> > > > [[gnu::null_terminated_string_arg(2)]]
> > > > char *
> > > > strsep(char **restrict sp, const char *delim);
> > > 
> > > The main problem from a user perspective is that
> > > these are attributes on the function declaration
> > > and not on the argument (type).
> > > 
> > > > 
> > > > I was thinking that with floating numbers, one could specify the number
> > > > of dereferences with a number after the decimal point.  It's a bit
> > > > weird, since the floating point is interpreted as two separate integer
> > > > numbers separated by a '.', but could work.  In this case:
> > > > 
> > > > [[gnu::access(read_write, 1)]]
> > > > [[gnu::access(read_write, 1.1)]]
> > > > [[gnu::access(read_only, 2)]]
> > > > [[gnu::nonnull(1, 2)]]
> > > > [[gnu::null_terminated_string_arg(1.1)]]
> > > > [[gnu::null_terminated_string_arg(2)]]
> > > > char *
> > > > strsep(char **restrict sp, const char *delim);
> > > > 
> > > > Which would mark the pointer *sp as read_write and a string.  What do
> > > > you think about it?
> > > 
> > > If the attributes could be applied to the type, then
> > > one could attach them directly at an intermediate
> > > pointer level, which would be more intuitive and
> > > less fragile.
> > > 

On the other hand, I was thinking of this not for [[gnu::access()]], but
for [[alx::restrict()]].

The idea was to be able to mark strtol(3) with it:

[[alx::restrict(1, 2.1)]]
[[alx::restrict(2)]]
[[gnu::access(read_only, 1)]]
[[gnu::access(write_only, 2)]]
[[gnu::access(none, 2.1)]]
[[gnu::nonnull(1)]]
[[gnu::null_terminated_string_arg(1)]]
[[gnu::leaf]]
[[gnu::nothrow]]
long
strtol(const char *nptr, char **endp, int base);

For marking parameters 1 and 2.1 as possibly aliasing each other
requires doing so at the function, and not at the parameter.

Have a lovely day!
Alex

> > 
> > That would be a huge improvement (IMHO).  Then you could write :
> > 
> > #define RW [[gnu::access(read_write)]]
> > #define RO [[gnu::access(read_only)]]
> > #define NONNULL [[gnu::nonnull]]
> > #define CSTRING [[gnu::null_terminated_string_arg]]
> > 
> > char * strsep(char * RW * RW NONNULL CSTRING restrict sp,
> > const char * RO NUNNULL CSTRING delim);
> 
> Yup; if that could be done, it would be interesting.  Martin, can it be
> done?  I'm worried that it might get ambiguous in some cases.  Is there
> any summary of positions where C23 attributes can go and their meanings?
> I always have a hard time finding all the possible combinations.
> 
> Should such a new attribute go to the left of the '*', or to the right?
> 
> > It would be even better if the characteristics could be tied into a typedef.
> > 
> > typedef const char * [[gnu::access(read_only)]] [[gnu::nonnull]]
> > [[gnu::null_terminated_string_arg]] const_cstring;
> 
> H.
> 
> > David
> 
> Cheers,
> Alex
> 
> -- 
> 



-- 



signature.asc
Description: PGP signature


Re: Apply function attributes (e.g., [[gnu::access()]]) to pointees too

2024-07-11 Thread Alejandro Colomar via Gcc
Hi Martin, David,

On Thu, Jul 11, 2024 at 06:08:38PM GMT, David Brown wrote:
> On 11/07/2024 11:58, Martin Uecker via Gcc wrote:
> > >   [[gnu::access(read_write, 1)]]
> > >   [[gnu::access(read_only, 2)]]
> > >   [[gnu::nonnull(1, 2)]]
> > >   [[gnu::null_terminated_string_arg(2)]]
> > >   char *
> > >   strsep(char **restrict sp, const char *delim);
> > 
> > The main problem from a user perspective is that
> > these are attributes on the function declaration
> > and not on the argument (type).
> > 
> > > 
> > > I was thinking that with floating numbers, one could specify the number
> > > of dereferences with a number after the decimal point.  It's a bit
> > > weird, since the floating point is interpreted as two separate integer
> > > numbers separated by a '.', but could work.  In this case:
> > > 
> > >   [[gnu::access(read_write, 1)]]
> > >   [[gnu::access(read_write, 1.1)]]
> > >   [[gnu::access(read_only, 2)]]
> > >   [[gnu::nonnull(1, 2)]]
> > >   [[gnu::null_terminated_string_arg(1.1)]]
> > >   [[gnu::null_terminated_string_arg(2)]]
> > >   char *
> > >   strsep(char **restrict sp, const char *delim);
> > > 
> > > Which would mark the pointer *sp as read_write and a string.  What do
> > > you think about it?
> > 
> > If the attributes could be applied to the type, then
> > one could attach them directly at an intermediate
> > pointer level, which would be more intuitive and
> > less fragile.
> > 
> 
> That would be a huge improvement (IMHO).  Then you could write :
> 
> #define RW [[gnu::access(read_write)]]
> #define RO [[gnu::access(read_only)]]
> #define NONNULL [[gnu::nonnull]]
> #define CSTRING [[gnu::null_terminated_string_arg]]
> 
> char * strsep(char * RW * RW NONNULL CSTRING restrict sp,
>   const char * RO NUNNULL CSTRING delim);

Yup; if that could be done, it would be interesting.  Martin, can it be
done?  I'm worried that it might get ambiguous in some cases.  Is there
any summary of positions where C23 attributes can go and their meanings?
I always have a hard time finding all the possible combinations.

Should such a new attribute go to the left of the '*', or to the right?

> It would be even better if the characteristics could be tied into a typedef.
> 
> typedef const char * [[gnu::access(read_only)]] [[gnu::nonnull]]
> [[gnu::null_terminated_string_arg]] const_cstring;

H.

> David

Cheers,
Alex

-- 



signature.asc
Description: PGP signature


Re: Bad interaction between gcc and glibc's handling of GNU extension [GCC PR 115724]

2024-07-09 Thread Alejandro Colomar via Gcc
Hi Mark,

David Malcolm  wrote:
> On Tue, 2024-07-02 at 22:39 +0200, Mark Wielaard wrote:
> > Is there an "optimal" optimization level for -fanalyzer (like having
> > -Og for debugging)?
>
> There isn't, sorry.

What I do is compile several times in a loop, with all optimization
levels, to maximize diagnostics.

$(_REALNAME): %.so.$(DISTVERSION): $(TU_h) $(TU_c) $(MK) $(LIB_pc) | $$(@D)/
$(info  $(INFO_)LD  $@)
for opt in g 0 1 2 s 3 fast; do \
$(LD) $(LDFLAGS) -O$$opt -o $*.O$$opt.so.$(DISTVERSION) $(TU_c) 
$(LDLIBS); \
done
$(LD) $(LDFLAGS) -o $@  $(TU_c) 
$(LDLIBS)



The shell is

SHELL   := bash
.SHELLFLAGS := -Eeuo pipefail -c

so it aborts at the first error, and I don't see repeated errors.  It
slows down compilation (since I compile x8 times), but I don't care too
much about it.

Cheers,
Alex

-- 



signature.asc
Description: PGP signature


n3294 - The restrict function attribute as a replacement of the restrict qualifier

2024-07-09 Thread Alejandro Colomar via Gcc
Here's a proposal for adding a function attribute for replacing
the restrict restrict qualifier.  It's v0.3 of n3294 (now we have a
document number).

I was going to name it [[noalias()]], but I thought that it would be
possible to mark several pointers as possibly referencing the same
object, and then the name [[restrict()]] made more sense.

It's based on a proposal I sent to Martin recently in this discussion.

Do you have any feedback for this?

I've attached the man(7) source and the resulting PDF, and below goes
a plain text rendering (formatting is lost).


Have a lovely night!
Alex

---

N3294 (WG14)Proposal for C2y   N3294 (WG14)

Name
 n3294  - The [[restrict()]] function attribute as a replacement of
 the restrict qualifier

Category
 Feature and deprecation.

Author
 Alejandro  Colomar  Andres;  maintainer  of  the  Linux  man-pages
 project.

   Cc
 GNU C library
 GNU Compiler Collection
 Linux man‐pages
 Paul Eggert
 Xi Ruoyao
 Jakub Jelinek
 Martin Uecker
 LIU Hao
 Jonathan Wakely
 Richard Earnshaw
 Sam James
 Emanuele Torre
 Ben Boeckel
 "Eissfeldt, Heiko"
 David Malcolm

Description
   restrict qualifier
 The  restrict  qualifier is not useful for diagnostics.  Being de‐
 fined in terms of accesses, the API is not enough for a caller  to
 know what the function will do with the objects it receives.

 That is, a caller cannot know if the following call is correct:

void f(const int *restrict a, int **restrict b);

f(a, &a);

 Having  no way to determine if a call will result in Undefined Be‐
 havior makes it a dangerous qualifier.

 The reader might notice that this prototype and call is very simi‐
 lar to the prototype of strtol(3), and the use reminds of a  rela‐
 tively common use of that function.

   Diagnostics
 A good replacement of the restrict qualifier should allow to spec‐
 ify  in  the  API of the following function that it doesn’t accept
 pointers that alias.

void
replace(const T *restrict new, T **restrict ls, size_t pos)
{
 memcpy(ls[pos], new, sizeof(T));
}

 This proposal suggests the following:

[[restrict(1)]] [[restrict(2)]]
void
replace(const T *restrict new, T **restrict ls, size_t pos);

replace(arr[3], arr, 2);  // UB; can be diagnosed

   Qualifiers
 It is also unfortunate that restrict  is  a  qualifier,  since  it
 doesn’t  follow  the  rules  that  apply  to all other qualifiers.
 While it is discarded easily, its  semantics  make  it  as  if  it
 couldn’t be discarded.

   Function attribute
 The purpose of restrict is to

 •  Allow functions to optimize based on the knowledge that certain
objects are not accessed by any other object in the same scope;
usually a function boundary, which is the most opaque boundary,
and where this information is not otherwise available.

 •  Diagnose  calls  that  would result in Undefined Behavior under
this memory model.

 Qualifiers don’t seem to be good for  carrying  this  information,
 but  function attributes are precisely for adding information that
 cannot be expressed by just using the type system.

 An attribute would need to be more strict than the restrict quali‐
 fier to allow diagnosing non‐trivial cases, such as the call shown
 above.

 A caller only knows what the callee receives,  not  what  it  does
 with  it.  Thus, for diagnostics to work, the semantics of a func‐
 tion attribute should be specified in terms of what a function  is
 allowed to receive.

   [[restrict]]
 The  [[restrict]] function attribute specifies that the pointer to
 which it applies is the only reference  to  the  array  object  to
 which  it  points (except that a pointer to one past the last ele‐
 ment may overlap another object).

 If the number of elements is specified with array  notation  or  a
 compiler‐specific  attribute, the array object to be considered is
 a subobject of the original array object, which is limited by  the
 number of elementsspecified in the function prototype.

 For the following prototype:

[[restrict(1)]] [[restrict(2)]]
void add_inplace(size_t n, int a[n], const int b[n]);

 In  the following calls, the caller is able to determine with cer‐
 tainty if the behavior is defined or undefined:

char  a[100] = ...;
char  b[50] = ...;

add_inplace(50, a, a + 50);  // Ok
add_inplace(50, a, b);   // Ok
add_inplace(50, a, a);   // UB

 In the first of the three calls, the parameters don’t alias inside
 the function, since the subobjects of 50 elements do  not  overlap
 each  other,  

Re: [WG14] Request for document number; strtol restrictness

2024-07-09 Thread Alejandro Colomar via Gcc
Hi Daniel,

On Sun, Jul 07, 2024 at 03:46:48PM GMT, Daniel Plakosh wrote:
> Alex,
> 
> Your document number is below:
> 
> n3294 - strtol(3) et al. shouldn't have a restricted first parameter
> 
> Please return the updated document with this number

Am I allowed to retitle the paper?

  n3294 - [[noalias()]] function attribute as a replacement of restrict

Sorry for any inconveniences.

Thanks,
Alex

> 
> Best regards,
> 
> Dan
> 
> Technical Director - Enabling Mission Capability at Scale
> Principal Member of the Technical Staff
> Software Engineering Institute
> Carnegie Mellon University
> 4500 Fifth Avenue
> Pittsburgh, PA 15213
> WORK: 412-268-7197
> CELL: 412-427-4606
> 
> -Original Message-
> From: Alejandro Colomar  
> Sent: Friday, July 5, 2024 3:42 PM
> To: dplak...@cert.org
> Cc: Martin Uecker ; Jonathan Wakely ; 
> Xi Ruoyao ; Jakub Jelinek ; 
> libc-al...@sourceware.org; gcc@gcc.gnu.org; Paul Eggert ; 
> linux-...@vger.kernel.org; LIU Hao ; Richard Earnshaw 
> ; Sam James 
> Subject: [WG14] Request for document number; strtol restrictness
> 
> Hi,
> 
> I have a paper for removing restrict from the first parameter of
> strtol(3) et al.  The title is
> 
>   strtol(3) et al. should’t have a restricted first parameter.
> 
> If it helps, I already have a draft of the paper, which I attach (both the 
> PDF, and the man(7) source).
> 
> Cheers,
> Alex
> 
> --
> 

-- 



signature.asc
Description: PGP signature


Re: WG14 paper for removing restrict from nptr in strtol(3)

2024-07-09 Thread Alejandro Colomar via Gcc
Hi Paul,

On Tue, Jul 09, 2024 at 02:09:24PM GMT, Paul Eggert wrote:
> On 7/8/24 00:52, Alejandro Colomar wrote:
> > a small set of functions
> > accept pointers that alias each other, but one of them is never
> > accessed; in those few cases, restrict was added to the parameters in
> > ISO C, but I claim it would be better removed.
> 
> Are these aliasing pointers the nptr and initial *endptr of strtol?

Yes.

> That is,
> are you saying that last line in the following example, which is currently
> invalid, should become valid and should be implementable as ‘end = s; long l
> = 0;’?

No.  I don't think this is a consequence of the previous statement.

> 
>char *end;
>char *s = (char *) &end;
>*s = '\0';
>long l = strtol (s, &end, 0);
> 
> If so, I fail to see the motivation for the proposed change, as nobody
> writes (or should write) code like that. And if not, evidently I
> misunderstand the proposal.

My proposal is:

 long int
-strtol(const char *restrict nptr, char **restrict endptr, int base);
+strtol(const char *nptr, char **restrict endptr, int base);

My proposal doesn't make valid the example above.  To make that example
valid, you'd need:

long int
strtol(const char *nptr, char **endptr, int base);

Because in the example above, you're aliasing nptr with endptr, not with
*endptr.  Thus, endptr cannot be a restricted pointer for that example
to be valid.

[... snip ...]

I'm not sure I understood that part, but it's probably a consequence of
the misuderstanding from above.  Let's ignore it for now, and please
resend if you think it's still a concern.

> 
> > Maybe I should use abstract names for the objects, to avoid confusing
> > them with the pointer variables that are used to pass them?
> 
> That might help, yes, since v0.2 is unclear on this point.

Ok; will do.

> > this formal
> > definition is quite unreadable, though.  The more I read it, the less
> > sure I am about it.
> 
> Yes, it’s lovely isn’t it? One must understand what the C committee
> intended in order to read and understand that part of the standard.

:-)

> > If L is used to access the value of the object X that it
> > designates, and X is also modified (by any means), then the
> > following requirements apply: T shall not be const-qualified
> >
> > This reads to me as "const variables are not writable when they are
> > accessed via a restricted pointer; casting away is not enough".  Am I
> > reading this correctly?
> 
> In that quoted statement, the restricted pointer is not allowed to be
> pointer-to-const. However, I’m not quite sure what your question means, as
> the phrase “const variables” does not appear in the standard. Perhaps give
> an example to clarify the question?

I should have said

"An object pointed to by a pointer-to-const cannot be written if the
pointer is a restricted one; casting const away is not enough."

Is this interpretation of restrict correct?

> >> an implementation is allowed to set errno = EINVAL first thing, and then
> set
> >> errno to some other nonzero value if it determines that the arguments are
> >> valid. I wouldn't implement strtol that way, but I can see where someone
> >> else might do that.
> >
> > In any case an implementation is not obliged to pessimize strtol(3).  It
> > is only allowed to.  Should we not allow them to do so?
> 
> Of course the standard should allow suboptimal implementations. However, I’m
> not sure what the point of the question is. The “errno = EINVAL first thing”
> comment says that removing ‘restrict’ obliges the implementation to support
> obviously-bogus calls like strtol(&errno, ...), which might make the
> implementation less efficient.

See for example how musl implements strtol(3):

$ grepc strtox src/stdlib/strtol.c
src/stdlib/strtol.c:static unsigned long long strtox(const char *s, char **p, 
int base, unsigned long long lim)
{
FILE f;
sh_fromstring(&f, s);
shlim(&f, 0);
unsigned long long y = __intscan(&f, base, 1, lim);
if (p) {
size_t cnt = shcnt(&f);
*p = (char *)s + cnt;
}
return y;
}

The work is done within __intscan(), which could be prototyped as

hidden unsigned long long
__intscan(FILE *restrict, unsigned, int, unsigned long long);

And now you're able to optimize internally, since thanks to that helper
function you know it doesn't alias errno, regardless of the external
API.


BTW, now I remember that strtol(3) says:

ERRORS
 This function does not modify errno on success.

Which means that setting errno at function start wouldn't make much
sense.  Although there's probably a contrived way of doing it and still
be conformant (plus, I think ISO C doesn't say that about errno).

> I don’t see how the question is relevant to
> that comment.
> 
> 
> > Let's take a simpler one: rename(2).  Is it allowed to receive &errno?
> > Hopefully not.
> 
> I agree with that hope, 

Re: WG14 paper for removing restrict from nptr in strtol(3)

2024-07-09 Thread Alejandro Colomar via Gcc
On Tue, Jul 09, 2024 at 12:28:18PM GMT, Alejandro Colomar wrote:
> Hi Jakub,
> 
> On Tue, Jul 09, 2024 at 11:18:11AM GMT, Jakub Jelinek wrote:
> > On Tue, Jul 09, 2024 at 11:07:59AM +0200, Alejandro Colomar wrote:
> > > Yup, I was thinking that maybe noalias is a better name.
> > 
> > Name is one thing, but you'd also need to clearly define what it means.
> > When restrict is access based, it is clear what it means.
> > 
> > If you want something else which is not based on accesses and which should
> > allow warnings in the callers, I suppose you need to specify not just the
> > pointer but the extent as well (and maybe stride) or that it is an '\0'
> 
> Agree.  Here's how I'd define it as an attribute:
> 
> noalias
> 
>   The noalias function attribute specifies that the pointer to
>   which it applies is the only reference to the array object that
>   it points to (except that a pointer to one past the last
>   element may overlap another object).
> 
>   If the number of elements is specified with array notation, the
>   array object to be considered is a subobject of the original
>   array object, which is limited to the number of elements
>   specified in the function prototype.
> 
>   Example:
> 
>   [[alx::noalias(1)]] [[alx::noalias(2)]]
>   [[gnu::access(read_write, 1)]] [[gnu::access(read_only, 2)]]
>   void add_inplace(int a[n], const int b[n], size_t n);

Ooops, I meant 'n' to be the first parameter.

> 
>   char arr[100] = ...;
> 
>   add_inplace(arr, arr + 50, 50);
> 
>   In the example above, the parameters a and b don't alias inside
>   the function, since the subobjects of 50 elements do not overlap
>   eachother, even though they are one single array object to the
>   outer function.
> 
> It may need some adjustment, to avoid conflicts with other parts of
> ISO C, but this is the idea I have in mind.
> 
> > terminated string, because if you want to say that for
> > void foo (char *, const char *, int);
> > the 2 pointers don't really alias, the size information is missing.  So,
> > shall the new warning warn on
> > struct S { char a[1024]; char b[1024]; } s;
> > foo (s.a, s.b, 512);
> 
> This does not need clarification of bounds.  You're passing separate
> objects, and thus cannot alias (except that maybe you're able to cast
> to the struct type, and then access s.b from a pointer derived from
> s.a; I never know that rule too well).
> 
> > or not?  Or foo (s.a, s.a + 512, 512);
> 
> According to the definition I provide in this email, the above is just
> fine.
> 
> Thanks!
> 
> Have a lovely day!
> Alex
> 
> > 
> > Jakub
> > 
> > 
> 
> -- 
> 



-- 



signature.asc
Description: PGP signature


Re: WG14 paper for removing restrict from nptr in strtol(3)

2024-07-09 Thread Alejandro Colomar via Gcc
Hi Jakub,

On Tue, Jul 09, 2024 at 11:18:11AM GMT, Jakub Jelinek wrote:
> On Tue, Jul 09, 2024 at 11:07:59AM +0200, Alejandro Colomar wrote:
> > Yup, I was thinking that maybe noalias is a better name.
> 
> Name is one thing, but you'd also need to clearly define what it means.
> When restrict is access based, it is clear what it means.
> 
> If you want something else which is not based on accesses and which should
> allow warnings in the callers, I suppose you need to specify not just the
> pointer but the extent as well (and maybe stride) or that it is an '\0'

Agree.  Here's how I'd define it as an attribute:

noalias

The noalias function attribute specifies that the pointer to
which it applies is the only reference to the array object that
it points to (except that a pointer to one past the last
element may overlap another object).

If the number of elements is specified with array notation, the
array object to be considered is a subobject of the original
array object, which is limited to the number of elements
specified in the function prototype.

Example:

[[alx::noalias(1)]] [[alx::noalias(2)]]
[[gnu::access(read_write, 1)]] [[gnu::access(read_only, 2)]]
void add_inplace(int a[n], const int b[n], size_t n);

char arr[100] = ...;

add_inplace(arr, arr + 50, 50);

In the example above, the parameters a and b don't alias inside
the function, since the subobjects of 50 elements do not overlap
eachother, even though they are one single array object to the
outer function.

It may need some adjustment, to avoid conflicts with other parts of
ISO C, but this is the idea I have in mind.

> terminated string, because if you want to say that for
> void foo (char *, const char *, int);
> the 2 pointers don't really alias, the size information is missing.  So,
> shall the new warning warn on
> struct S { char a[1024]; char b[1024]; } s;
> foo (s.a, s.b, 512);

This does not need clarification of bounds.  You're passing separate
objects, and thus cannot alias (except that maybe you're able to cast
to the struct type, and then access s.b from a pointer derived from
s.a; I never know that rule too well).

> or not?  Or foo (s.a, s.a + 512, 512);

According to the definition I provide in this email, the above is just
fine.

Thanks!

Have a lovely day!
Alex

> 
>   Jakub
> 
> 

-- 



signature.asc
Description: PGP signature


Re: WG14 paper for removing restrict from nptr in strtol(3)

2024-07-09 Thread Alejandro Colomar via Gcc
Hi Martin,

On Tue, Jul 09, 2024 at 07:58:40AM GMT, Martin Uecker wrote:
> Am Montag, dem 08.07.2024 um 22:17 +0200 schrieb Alejandro Colomar:
> > Hi Martin,
> > 
> > On Mon, Jul 08, 2024 at 06:05:08PM GMT, Martin Uecker wrote:
> > > Am Montag, dem 08.07.2024 um 17:01 +0200 schrieb Alejandro Colomar:
> > > > On Mon, Jul 08, 2024 at 10:30:48AM GMT, David Malcolm wrote:
> > > 
> > > ...
> > > > And then have it mean something strict, such as: The object pointed to
> > > > by the pointer is not pointed to by any other pointer; period.
> > > > 
> > > > This definition is already what -Wrestrict seems to understand.
> > > 
> > > One of the main uses of restrict is scientific computing. In this
> > > context such a definition of "restrict" would not work for many 
> > > important use cases. But I agree that for warning purposes the
> > > definition of "restrict" in ISO C is not helpful.
> > 
> > Do you have some examples of functions where this matters and is
> > important?  I'm curious to see them.  Maybe we find some alternative.
> 
> In many numerical algorithms you want to operate on
> different parts of the same array object.  E.g. for matrix
> decompositions you want to take a row / column and add it 
> to another. Other examples are algorithms that decompose
> some input (.e.g. high and low band in a wavelet transform)
> and store it into the same output array, etc.
> 
> Without new notation for strided array slicing, one

I'll have to remove dust from my old proposal of [.nmemb]?  :)

> fundamentally needs the flexibility of restrict that
> only guarantuees that actual accesses do not conflict.

I guess a combination of [.nmemb] (or the third argument of
[[gnu::access()]] and [[alx::noalias]] could be good enough for such a
use case?

[[alx::noalias(1)]] [[alx::noalias(2)]]
void add(int a[.n], const int b[.n], n);

//or

[[alx::noalias(1)]] [[alx::noalias(2)]]
[[gnu::access(read_write, 1, 3)]] [[gnu::access(read_only, 2, 3)]]
void add(int a[], const int b[], n);

mad(&arr[0], &arr[50], 50);

The caller should be able to know that while the pointers can alias
within the caller, they don't alias within the callee, since the callee
has no right to access past the specified bound*.

*  The standard would have to tighten bounds in function interfaces,
   since right now, the value is just ignored if it doesn't come with
   'static' (which I never understood), and if specified with 'static',
   it's just a lower bound not a strict bound.  I would propose changing
   the meaning of [N] in a function prototype to mean a strict bound.

Does that make sense?

Have a lovely day!
Alex

> But this then implies that one can not use restrict as a
> contract specification on function prototypes, but has
> to analyze the implementation of a function to see if
> it is used correctly.  But I would not see it as a design 
> problem of restrict. It was simply not the intended use 
> case when originally designed. 

-- 



signature.asc
Description: PGP signature


Re: WG14 paper for removing restrict from nptr in strtol(3)

2024-07-09 Thread Alejandro Colomar via Gcc
Hi Dave,

On Mon, Jul 08, 2024 at 06:48:51PM GMT, David Malcolm wrote:
> > restrict, as of the formal definition of ISO C is useless crap.  The
> > more I read it, the more I agree.
> 
> Please note that "useless crap" was your wording, not mine.

Yup.  :)

> 
> > 
> > restrict, as of what -Wrestrict warns about, seems a reasonable
> > thing.
> > 
> > How about a [[gnu::restrict()]] attribute, similar to
> > [[gnu::access()]],
> > which is simpler than the qualifier?  Since restrict is only
> > meaningful
> > in function boundaries, it would make sense to have a function
> > attribute.  We don't want a qualifier that must follow discarding
> > rules.
> 
> If it doesn't have the same meaning as "restrict" then perhaps call the
> proposed attribute something other than "restrict"?

Yup, I was thinking that maybe noalias is a better name.

> 
> That said, I don't have strong opinions on any of this, except to note
> that I have more than enough *other* work on improvements to GCC's
> static analyzer and usability to keep me busy, so getting sucked into
> discussion/implementation on 'restrict' is something I want to avoid,
> and -Wanalyzer-overlapping-buffers is getting the job done for me at
> the moment.
> 
> [...snip...]
> 
> Hope this is constructive; sorry again if I missed anything due to only
> skimming the thread

It is.  I don't want you to work on this if you don't have time or
interest.  Just having the idea floating aroud, and if somebody finds
time to have a look at it in the next decade, maybe try it.  :-)

Does that make sense?

Cheers,
Alex

> 
> Dave
> 
> 

-- 



signature.asc
Description: PGP signature


Re: WG14 paper for removing restrict from nptr in strtol(3)

2024-07-08 Thread Alejandro Colomar via Gcc
Hi Martin,

On Mon, Jul 08, 2024 at 06:05:08PM GMT, Martin Uecker wrote:
> Am Montag, dem 08.07.2024 um 17:01 +0200 schrieb Alejandro Colomar:
> > On Mon, Jul 08, 2024 at 10:30:48AM GMT, David Malcolm wrote:
> 
> ...
> > And then have it mean something strict, such as: The object pointed to
> > by the pointer is not pointed to by any other pointer; period.
> > 
> > This definition is already what -Wrestrict seems to understand.
> 
> One of the main uses of restrict is scientific computing. In this
> context such a definition of "restrict" would not work for many 
> important use cases. But I agree that for warning purposes the
> definition of "restrict" in ISO C is not helpful.

Do you have some examples of functions where this matters and is
important?  I'm curious to see them.  Maybe we find some alternative.

> > > Has the C standard clarified the meaning of 'restrict' since that
> > > discussion?  Without that, I wasn't planning to touch 'restrict' in
> > > GCC's -fanalyzer.
> > 
> > Meh; no they didn't.  
> 
> There were examples added in C23 and there are now several papers
> being under discussion.

Hmm, yeah, the examples help with the formal definition.  I was thinking
of the definition itself, which I still find quite confusing.  :-)

Have a lovely night!
Alex

-- 



signature.asc
Description: PGP signature


Re: WG14 paper for removing restrict from nptr in strtol(3)

2024-07-08 Thread Alejandro Colomar via Gcc
On Mon, Jul 08, 2024 at 10:30:48AM GMT, David Malcolm wrote:
> > > Why is this change worth
> > > making? Real-world programs do not make calls like that.
> > 
> > Because it makes analysis of 'restrict' more consistent.  The obvious
> > improvement of GCC's analyzer to catch restrict violations will
> > trigger
> > false positives in normal uses of strtol(3).
> 
> Hi Alejandro

Hi Dave,

> I'm author/maintainer of GCC's -fanalyzer option, which is presumably
> why you CCed me on this.

Yup.

> One of my GSoC 2022 students (Tim Lange)
> looked at making use of 'restrict' in -fanalyzer, see e.g. 
> https://lists.gnu.org/archive/html/bug-gnulib/2022-07/msg00062.html
> 
> Based on Paul's comment here:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99860#c2 (and its
> references) I came to the conclusion at the time that we should work on
> something else, as the meaning of 'restrict' is too ambiguous.

restrict, as of the formal definition of ISO C is useless crap.  The
more I read it, the more I agree.

restrict, as of what -Wrestrict warns about, seems a reasonable thing.

How about a [[gnu::restrict()]] attribute, similar to [[gnu::access()]],
which is simpler than the qualifier?  Since restrict is only meaningful
in function boundaries, it would make sense to have a function
attribute.  We don't want a qualifier that must follow discarding rules.

And then have it mean something strict, such as: The object pointed to
by the pointer is not pointed to by any other pointer; period.

This definition is already what -Wrestrict seems to understand.

> Later, I added a new -Wanalyzer-overlapping-buffers warning in GCC 14,
> which simply has a hardcoded set of standard library functions that it
> "knows" to warn about.

Hmmm, so it doesn't help at all for anything other than libc.  Ok.

> Has the C standard clarified the meaning of 'restrict' since that
> discussion?  Without that, I wasn't planning to touch 'restrict' in
> GCC's -fanalyzer.

Meh; no they didn't.  I understand.  That's why I don't like innovations
in ISO C, and prefer that implementations innovate with real stuff.

> Sorry if I'm missing anything here; I confess I've skimmed through
> parts of this thread.

Nope; all's good.

> 
> Dave

-- 



signature.asc
Description: PGP signature


Re: WG14 paper for removing restrict from nptr in strtol(3)

2024-07-07 Thread Alejandro Colomar via Gcc
Hi Paul,

On Sun, Jul 07, 2024 at 07:30:43PM GMT, Paul Eggert wrote:
> On 7/7/24 14:42, Alejandro Colomar wrote:
> > On Sun, Jul 07, 2024 at 12:42:51PM GMT, Paul Eggert wrote:
> > > Also, “global variables” is not
> > > right here. The C standard allows strtol, for example, to read and write 
> > > an
> > > internal static cache. (Yes, that would be weird, but it’s allowed.)
> > 
> > That's not part of the API.  A user must not access internal static
> > cache
> 
> Although true in the normal (sane) case, as an extension the implementation
> can make such a static cache visible to the user, and in this case the
> caller must not pass cache addresses as arguments to strtol.
> 
> For other functions this point is not purely academic. For example, the C
> standard specifies the signature "FILE *fopen(const char *restrict, const
> char *restrict);". If I understand your argument correctly, it says that the
> "restrict"s can be omitted there without changing the set of valid programs.

No, I didn't say that restrict can be removed from fopen(3).

What I say is that in functions that accept pointers that alias each
other, those aliasing pointers should not be restrict.  Usually,
pointers that alias are accessed, and thus they are not specified as
restrict, such as in memmove(3).  However, a small set of functions
accept pointers that alias each other, but one of them is never
accessed; in those few cases, restrict was added to the parameters in
ISO C, but I claim it would be better removed.  We're lucky, and the
small set of functions where this happens don't seem to use any state,
so we don't need to care about implementations using internal buffers
that are passed somehow to the user..

From ISO C, IIRC, the only examples are strtol(3) et al.  Another
example is Plan9's seprint(3) family of functions.  However, Plan9
doesn't use restrict, so it doesn't have it.

> But that can't be right, as omitting the "restrict"s would make the
> following code be valid in any platform where sizeof(int)>1:
> 
>char *p = (char *) &errno;
>p[0] = 'r';
>p[1] = 0;
>FILE *f = fopen (p, p);
> 
> even though the current standard says this code is invalid.

No, I wouldn't remove any of the restrict qualifiers in fopen(3).

Only from pointers that alias an access(none) pointer.

> > > “endptr   access(write_only) ... *endptr access(none)”
> > > 
> > > This is true for glibc, but it’s not necessarily true for all conforming
> > > strtol implementations. If endptr is non-null, a conforming strtol
> > > implementation can both read and write *endptr;
> > 
> > It can't, I think.  It's perfectly valid to pass an uninitialized
> > endptr, which means the callee must not read the original value.
> 
> Sure, but the callee can do something silly like "*endptr = p + 1; *endptr =
> *endptr - 1;". That is, it can read *endptr after writing it, without any
> undefined behavior. (And if the callee is written in assembly language it
> can read *endptr even before writing it - but I digress.)

But once you modify the pointer provenance, you don't care anymore about
it.  We need to consider the pointers that a function receives, which
are the ones the callee needs to know their provenance.  Of course, a
callee knows what it does, and so doesn't need restrict in local
variables.

C23/N3220::6.7.4.1p9 says:

An object that is accessed through a restrict-qualified pointer
has a special association with that pointer.  This association,
defined in 6.7.4.2, requires that all accesses to that object
use, directly or indirectly, the value of that pointer.

When you set *endptr = nptr + x, and use the lvalue **endptr, you're
still accessing the object indirectly using the value of nptr.

So, strtol(3) gets 4 objects, let's call them A, B, C, and D.

A is gotten via its pointer nptr.
B is gotten via its pointer endptr.
C is gotten via its pointer *endptr.
D is gotten via the global variable errno.

Object A may be the same as object C.
Object B is unique inside the callee.  Its pointer endptr must be
restrict-qualified to denote its uniqueness.
Object D is unique, but there's no way to specify that.

Object C must NOT be read or written.  The function is of course allowed
to set *endptr to whatever it likes, and then access it however it
likes, but object C must still NOT be accessed, since its pointer may be
uninitialized, and thus point to no object at all.


Maybe I should use abstract names for the objects, to avoid confusing
them with the pointer variables that are used to pass them?

The formal definition of restrict refers to the "object into which it
formerly [in the list of parameter declarations of a function
definition] pointed".  I'm not 100% certain, because this formal
definition is quite unreadable, though.  The more I read it, the less
sure I am about it.

BTW, I noticed something I didn't know:

If L is used to access the value of the object X that it
designates, and X is also mo

Re: WG14 paper for removing restrict from nptr in strtol(3)

2024-07-07 Thread Alejandro Colomar via Gcc
Hi Martin,

On Sun, Jul 07, 2024 at 02:21:17PM GMT, Martin Uecker wrote:
> Am Sonntag, dem 07.07.2024 um 13:07 +0200 schrieb Alejandro Colomar via Gcc:
> > Which is actually having perfect information, regardless of 'restrict'
> > on nptr.  :-)
> 
> Yes, but my point is that even with "restrict" a smarter
> compiler could then also be smart enough not to warn even
> when *endptr aliases nptr.

Hmmm, this is a valid argument.  I feel less strongly about this
proposal now.

I'll document this in the proposal.

Your analyzer would need to be more complex to be able to not trigger
false positives here, but it's possible, so I guess I'm happy with
either case.

Still, removing restrict from strtol(3) would allow to change the
semantics of restrict to be more restrictive (and easier to understand),
so that passing aliasing pointers as restrict pointers would already be
Undefined Behavior, regardless of the accesses by the callee.

But yeah, either way it's good, as far as strtol(3) and gcc-20 are
concerned.  :)

Have a lovely day!
Alex

> > > You also need to discuss backwards compatibility.  Changing
> > > the type of those functions can break valid programs.
> > 
> > I might be forgetting about other possibilities, but the only one I had
> > in mind that could break API would be function pointers.  However, a
> > small experiment seems to say it doesn't:
> 
> Right, the outermost qualifiers are ignored, so this is not a
> compatibility problem.  So I think this is not an issue, but
> it is worth pointing it out.

Yup.

> 
> Martin

-- 
<https://www.alejandro-colomar.es/>


signature.asc
Description: PGP signature


Re: WG14 paper for removing restrict from nptr in strtol(3)

2024-07-07 Thread Alejandro Colomar via Gcc
Hi Paul,

On Sun, Jul 07, 2024 at 12:42:51PM GMT, Paul Eggert wrote:
> On 7/7/24 03:58, Alejandro Colomar wrote:
> 
> > I've incorporated feedback, and here's a new revision, let's call it
> > v0.2, of the draft for a WG14 paper.
> Although I have not followed the email discussion closely, I read v0.2 and
> think that as stated there is little chance that its proposal will be
> accepted.

Thanks for reading thoroughly, and the feedback!

> Fundamentally the proposal is trying to say that there are two styles X and
> Y for declaring strtol and similar functions, and that although both styles
> are correct in some sense, style Y is better than style X. However, the
> advantages of Y are not clearly stated and the advantages of style X over Y
> are not admitted, so the proposal is not making the case clearly and fairly.
> 
> One other thing: to maximize the chance of a proposal being accepted, please
> tailor it for its expected readership. The C committee is expert on
> ‘restrict’, so don’t try to define ‘restrict’ in your own way. Unless merely
> repeating the language of the standard, any definition given for ‘restrict’
> is likely to cause the committee to quibble with the restatement of the
> standard wording. (It is OK to mention some corollaries of the standard
> definition, so long as the corollaries are not immediately obvious.)
> 
> Here are some comments about the proposal. At the start these comments are
> detailed; towards the end, as I could see the direction the proposal was
> headed and was convinced it wouldn’t be accepted as stated, the comments are
> less detailed.
> 
> 
> "The API may copy"
> 
> One normally doesn’t think of the application programming interface as
> copying. Please replace the phrase “the API” with “the caller” or “the
> callee” as appropriate. (Although ‘restrict’ can be used in places other
> than function parameters, I don’t think the proposal is concerned about
> those cases and so it doesn’t need to go into that.)

Ok.

> "To avoid violations of for example C11::6.5.16.1p3,"
> 
> Code that violates C11::6.5.16.1p3 will do so regardless of whether
> ‘restrict’ is present. I would not mention C11::6.5.16.1p3 as it’s a red
> herring. Fundamentally, ‘restrict’ is not about the consequences of caching
> when one does overlapping moves; it’s about caching in a more general sense.

The violations are UB regardless of restrict, but consistent use of
restrict allows the caller to have a rough model of what the callee will
do with the objects, and prevent those violations via compiler
diagnostics.  I've reworded that part to make it more clear why I'm
mentioning that.

> “As long as an object is only accessed via one restricted pointer, other
> restricted pointers are allowed to point to the same object.”
> 
> “only accessed” → “accessed only”

Ok.

> “This is less strict than I think it should be, but this proposal doesn’t
> attempt to change that definition.”
> 
> I would omit this sentence and all similar sentences. Don’t distract the
> reader with other potential proposals. The proposal as it stands is
> complicated enough.

Ok.

> “return ca > a;”
> “return ca > *ap;”
> 
> I fail to understand why these examples are present. It’s not simply that
> nobody writes code like that: the examples are not on point. I would remove
> the entire programs containing them, along with the sections that discuss
> them. When writing to the C committee one can assume the reader is expert in
> ‘restrict’, there is no need for examples such as these.

Those are examples of how consistent use of restrict can --or could, in
the case of g()-- detect, via compiler diagnostics, (likely) violations
of seemingly unrelated parts of the standard, such as the referenced
C11::6.5.16.1p3, or in this case, C11::6.5.8p5.  

> “strtol(3) accepts 4 objects via pointer parameters and global variables.”
> 
> Omit the “(3)”, here and elsewhere, as the audience is the C standard
> committee.

The C standard committee doesn't know about historic use of (3)?  That
predates the standard, and they built on top of that (C originated in
Unix).  While they probably don't care about it anymore, I expect my
paper to be read by other audience, including GCC and glibc, and I
prefer to keep it readable for that audience.  I expect the standard
committee to at least have a rough idea of the existence of this syntax,
and respect it, even if they don't use it or like it.

> “accepts” is a strange word to use here: normally one says “accepts” to talk
> about parameters, not global variables.

The thing is, strtol(3) does not actually access *endptr.  I thought
that might cause more confusion than using "accepts".

> Also, “global variables” is not
> right here. The C standard allows strtol, for example, to read and write an
> internal static cache. (Yes, that would be weird, but it’s allowed.)

That's not part of the API.  A user must not access internal static
cache, and so the implementation is free to assume that it doesn'

Re: WG14 paper for removing restrict from nptr in strtol(3)

2024-07-07 Thread Alejandro Colomar via Gcc
Hi Martin,

On Sun, Jul 07, 2024 at 09:15:23AM GMT, Martin Uecker wrote:
> 
> Hi Alejandro,
> 
> if in caller it is known that endptr has access mode "write_only"
> then it can conclude that the content of *endptr has access mode
> "none", couldn't it?

H.  I think you're correct.  I'll incorporate that and see how it
affects the caller.

At first glance, I think it would result in

nptraccess(read_only)   alias *endptr
endptr  access(write_only)  unique
errno   access(read_write)  unique
*endptr access(none)alias nptr

Which is actually having perfect information, regardless of 'restrict'
on nptr.  :-)

> You also need to discuss backwards compatibility.  Changing
> the type of those functions can break valid programs.

I might be forgetting about other possibilities, but the only one I had
in mind that could break API would be function pointers.  However, a
small experiment seems to say it doesn't:

$ cat strtolp.c 
#include 

long
alx_strtol(const char *nptr, char **restrict endptr, int base)
{
return strtol(nptr, endptr, base);
}

typedef long (*strtolp_t)(const char *restrict nptr,
  char **restrict endptr, int base);
typedef long (*strtolpnr_t)(const char *nptr,
   char **restrict endptr, int base);

int
main(void)
{
[[maybe_unused]] strtolp_ta = &strtol;
[[maybe_unused]] strtolpnr_t  b = &strtol;
[[maybe_unused]] strtolp_tc = &alx_strtol;
[[maybe_unused]] strtolpnr_t  d = &alx_strtol;
}

$ cc -Wall -Wextra strtolp.c 
$

Anyway, I'll say that it doesn't seem to break API.

>  You would
> need to make a case that this is unlikely to affect any real
> world program.

If you have something else in mind that could break API, please let me
know, and I'll add it to the experiments.

Thanks!

Have a lovely day!
Alex

-- 



signature.asc
Description: PGP signature


WG14 paper for removing restrict from nptr in strtol(3)

2024-07-06 Thread Alejandro Colomar via Gcc
Hi,

I've incorporated feedback, and here's a new revision, let's call it
v0.2, of the draft for a WG14 paper.  I've attached the man(7) source,
and the generated PDF.

Cheers,
Alex


-- 



strtol.man
Description: Unix manual page


strtol.pdf
Description: Adobe PDF document


signature.asc
Description: PGP signature


Re: [PATCH v1] Remove 'restrict' from 'nptr' in strtol(3)-like functions

2024-07-05 Thread Alejandro Colomar via Gcc
Hi Xi,

On Sat, Jul 06, 2024 at 10:24:16AM GMT, Xi Ruoyao wrote:
> On Sat, 2024-07-06 at 00:02 +0200, Alejandro Colomar wrote:
> > That's precisely the case with strtol(3): it doesn't access any objects
> > through *endptr, and so that pointer need not be restrict.
> > 
> > Then, nptr is a read-only pointer, so is doesn't matter either if it's
> > accessed or not.
> 
> Restrict allows to reorder any writes to other objects with an read from
> nptr then. In strtol at least errno can be written, and depending on the
> implementation of locale things there may be more.

This does not apply here, I think.  Let's include errno in the list of
objects that strtol(3) takes, and list their access modes:

-  nptr access(read_only)
-  *endptr  access(none)
-  endptr   access(read_write) [it checks for NULL; I had forgotten]
-  errnoaccess(read_write)

In the callee:
~~

The access modes are known by the callee, because of course it knows
what it does, so even without the attributes, it knows that.

strtol(3) cannot write to errno until it has parsed the string.  And
once it knows it has failed (so wants to set errno), it has no reasons
to read nptr again.  Thus, even without knowing if 'errno' and 'nptr'
are the same thing, there's nothing that could be optimized.

*endptr is access(none), so it is implicitly restricted even without
specifying restrict on it; the callee couldn't care less about it.

endptr is 'restrict', so it can be treated separately from the rest.

In the caller:
~~

We can't specify the access mode of *endptr nor errno, so the caller
must assume they are read_write.

endptr is 'restrict', but this is only useful for having warnings
against dumb things such as strtol(x, x, 0).  Other than that, the
caller knows what it has passed in endptr, so it knows it's a different
thing.

The caller also knows that it hasn't passed errno as nptr.

Then we must make a distinction in what the caller passes in *endptr:

*endptr is uninitialized:
~

The caller knows that nptr is restricted even without the qualifier,
since all other objects are either restricted, or known to be different.

*endptr points to the same thing as nptr:
~

Regardless of the 'restrict' qualifier being specified or not, the
caller has no way to determine if the callee accesses the object via
nptr or via *endptr, so it must assume the worst case: *endptr; and so
it must assume it could have written to it (since *endptr is non-const
--and even if it were const, as you said, it means nothing--).


So, even considering errno in the game, I don't see any difference if we
specify nptr to be restrict or not.

Thanks for the feedback!  I'll incorporate the discussion about errno in
the paper for WG14.

Have a lovely day!
Alex

> 
> TBAA does not help here because char aliases with anything.

-- 



signature.asc
Description: PGP signature


Re: [PATCH v1] Remove 'restrict' from 'nptr' in strtol(3)-like functions

2024-07-05 Thread Alejandro Colomar via Gcc
On Sat, Jul 06, 2024 at 08:10:28AM GMT, Alejandro Colomar wrote:
> Hi Xi,
> 
> On Sat, Jul 06, 2024 at 10:24:16AM GMT, Xi Ruoyao wrote:
> > On Sat, 2024-07-06 at 00:02 +0200, Alejandro Colomar wrote:
> > > That's precisely the case with strtol(3): it doesn't access any objects
> > > through *endptr, and so that pointer need not be restrict.
> > > 
> > > Then, nptr is a read-only pointer, so is doesn't matter either if it's
> > > accessed or not.
> > 
> > Restrict allows to reorder any writes to other objects with an read from
> > nptr then. In strtol at least errno can be written, and depending on the
> > implementation of locale things there may be more.
> 
> This does not apply here, I think.  Let's include errno in the list of
> objects that strtol(3) takes, and list their access modes:
> 
> -  nptr   access(read_only)
> -  *endptraccess(none)
> -  endptr access(read_write) [it checks for NULL; I had forgotten]

Sorry, I was right the first time; it's write_only.  The NULL check is
on the pointer; not the pointee.

> -  errno  access(read_write)
> 
> In the callee:
> ~~
> 
> The access modes are known by the callee, because of course it knows
> what it does, so even without the attributes, it knows that.
> 
> strtol(3) cannot write to errno until it has parsed the string.  And
> once it knows it has failed (so wants to set errno), it has no reasons
> to read nptr again.  Thus, even without knowing if 'errno' and 'nptr'
> are the same thing, there's nothing that could be optimized.
> 
> *endptr is access(none), so it is implicitly restricted even without
> specifying restrict on it; the callee couldn't care less about it.
> 
> endptr is 'restrict', so it can be treated separately from the rest.
> 
> In the caller:
> ~~
> 
> We can't specify the access mode of *endptr nor errno, so the caller
> must assume they are read_write.
> 
> endptr is 'restrict', but this is only useful for having warnings
> against dumb things such as strtol(x, x, 0).  Other than that, the
> caller knows what it has passed in endptr, so it knows it's a different
> thing.
> 
> The caller also knows that it hasn't passed errno as nptr.
> 
> Then we must make a distinction in what the caller passes in *endptr:
> 
> *endptr is uninitialized:
> ~
> 
> The caller knows that nptr is restricted even without the qualifier,
> since all other objects are either restricted, or known to be different.
> 
> *endptr points to the same thing as nptr:
> ~
> 
> Regardless of the 'restrict' qualifier being specified or not, the
> caller has no way to determine if the callee accesses the object via
> nptr or via *endptr, so it must assume the worst case: *endptr; and so
> it must assume it could have written to it (since *endptr is non-const
> --and even if it were const, as you said, it means nothing--).
> 
> 
> So, even considering errno in the game, I don't see any difference if we
> specify nptr to be restrict or not.
> 
> Thanks for the feedback!  I'll incorporate the discussion about errno in
> the paper for WG14.
> 
> Have a lovely day!
> Alex
> 
> > 
> > TBAA does not help here because char aliases with anything.
> 
> -- 
> 



-- 



signature.asc
Description: PGP signature


[[gnu::null_terminated_string_arg(1)]] on strtol(1) (was: [PATCH v1] Remove 'restrict' from 'nptr' in strtol(3)-like) functions

2024-07-05 Thread Alejandro Colomar via Gcc
Hi Xi,

On Sat, Jul 06, 2024 at 10:39:41AM GMT, Xi Ruoyao wrote:
> BTW among your list:
> 
> > > [[gnu::access(read_only, 1)]]
> > > [[gnu::access(write_only, 2)]]
> > > [[gnu::leaf]]
> > > [[gnu::nothrow]]
> > > [[gnu::null_terminated_string_arg(1)]]
> 
> IMO we should add these access attributes, they'll definitely help the
> optimization (like, optimize away the initialization of a pointer).
> 
> We already have __THROW which expands to nothrow and leaf.
> 
> I'm not sure if null_terminated_string_arg is correct: is the following
> invalid or not?
> 
> char p[] = {'1', ')'};
> char *q;
> strtol(p, &q, 10);
> assert(q == &p[1]);
> 
> If this is invalid we should have null_terminated_string_arg so at least
> we'll get a warning against this.

ISO C says:

"""
The strtol, strtoll, strtoul, and strtoull functions convert the initial
portion of the string pointed to by nptr to long int, long long int,
unsigned long int, and unsigned long long int representation,
respectively.  First, they decompose the input string into three parts:
an initial, possibly empty, sequence of white-space characters (as
specified by the isspace function), a subject sequence resembling an
integer represented in some radix determined by the value of base, and a
final string of one or more unrecognized characters, including the
terminating null character of the input string.  Then, they attempt to
convert the subject sequence to an integer, and return the result.
"""


I'd say it's a string.

Have a lovely day!
Alex

-- 



signature.asc
Description: PGP signature


Re: [PATCH v1] Remove 'restrict' from 'nptr' in strtol(3)-like functions

2024-07-05 Thread Alejandro Colomar via Gcc
On Sat, Jul 06, 2024 at 12:02:06AM GMT, Alejandro Colomar wrote:
> Hi Jonathan,
> 
> On Fri, Jul 05, 2024 at 10:39:52PM GMT, Jonathan Wakely wrote:
> > On Fri, 5 Jul 2024 at 21:55, Alejandro Colomar  wrote:
> > >
> > > On Fri, Jul 05, 2024 at 09:28:46PM GMT, Jonathan Wakely wrote:
> > > > > If we marked endptr as "write_only" (which it might already
> > > > > be) then a future warning mechanism for -Wrestrict could
> > > > > ignore the content of *endptr.
> > > >
> > > >
> > > > That seems more useful. Add semantic information instead of taking it
> > > > away. If the concern is a hypothetical future compiler warning that
> > > > would give false positives for perfectly valid uses of strtol, then
> > > > the problem is the compiler warning, not strtol. If additional
> > > > information can be added to avoid the false positives (and also
> > > > possibly optimize the code better), then that wouldn't require a
> > > > change to the standard motivated by a hypothetical compiler warning.
> > >
> > > Let me be a little bit sarcastic.
> > >
> > > If so, let's take down -Wrestrict at all, because it triggers false
> > > positives at the same rate.  How is it even in -Wall and not just
> > > -Wextra?
> > >
> > > Here's a false positive:
> > >
> > > $ cat d.c
> > > int is_same_pointer(const char *restrict ca, char *restrict a)
> > > {
> > > return ca == a;
> > 
> > This is a strawman argument, because all your example functions have
> > been pretty useless and/or not good uses of restrict.
> > 
> > Yes, if you put restrict on functions where you don't ever access any
> > objects through the pointers, the restrict qualifiers are misleading
> 
> That's precisely the case with strtol(3): it doesn't access any objects
> through *endptr, and so that pointer need not be restrict.
> 
> Then, nptr is a read-only pointer, so is doesn't matter either if it's
> accessed or not.
> 
> Let's say we add as many attributes as possible to strtol(3):
> 
> [[gnu::access(read_only, 1)]]
> [[gnu::access(write_only, 1)]]

s/1/2/

> [[gnu::leaf]]
> [[gnu::nothrow]]
> [[gnu::null_terminated_string_arg(1)]]
>  // [[gnu::access(none, *1)]]

s/*1/*2/

> long
> alx_strtol(const char *nptr, char **_Nullable restrict endp, int base);
> 
> Let's say we could mark *endptr as a 'access(none)' pointer, since it's
> not accessed.  Let's say we do that with [[gnu::access(none, *1)]].

s/*1/*2/

> Then, do you think the information of that prototype is any different
> than a prototype with restrict on the remaining pointers?
> 
> [[gnu::access(read_only, 1)]]
> [[gnu::access(write_only, 1)]]

s/1/2/

> [[gnu::leaf]]
> [[gnu::nothrow]]
> [[gnu::null_terminated_string_arg(1)]]
>  // [[gnu::access(none, *1)]]

s/*1/*2/

> long
> alx_strtol(const char *restrict nptr,
>char *restrict *_Nullable restrict endp, int base);
> 
> I don't think so.  Since *endptr is access(none), it certainly cannot
> access nptr, and thus the qualifier on nptr is superfluous.
> 
> And even without the hypothetical [[gnu::access(none, *1)]]:

s/*1/*2/

> 
> -  The callee doesn't care about restrict, because it doesn't access
>any objects via *endptr, so it certainly knows that nptr can be read
>without any concerns about optimization.
> 
> -  The caller can't know if strtol(3) accesses *endptr, or nptr, and so
>it can't optimize.  Unless it passes an uninitialized value in
>*endptr, which means the caller knows for sure that nptr won't be
>written, regardless of restrict on it or not.
> 
> Please, describe what's the information you think is being added by
> having restrict on nptr, on how it would be lost if we remove it.
> 
> Cheers,
> Alex
> 
> > and so compilers might give bad warnings for your bad API.
> 
> -- 
> 



-- 



signature.asc
Description: PGP signature


Re: [PATCH v1] Remove 'restrict' from 'nptr' in strtol(3)-like functions

2024-07-05 Thread Alejandro Colomar via Gcc
Hi Jonathan,

On Fri, Jul 05, 2024 at 10:39:52PM GMT, Jonathan Wakely wrote:
> On Fri, 5 Jul 2024 at 21:55, Alejandro Colomar  wrote:
> >
> > On Fri, Jul 05, 2024 at 09:28:46PM GMT, Jonathan Wakely wrote:
> > > > If we marked endptr as "write_only" (which it might already
> > > > be) then a future warning mechanism for -Wrestrict could
> > > > ignore the content of *endptr.
> > >
> > >
> > > That seems more useful. Add semantic information instead of taking it
> > > away. If the concern is a hypothetical future compiler warning that
> > > would give false positives for perfectly valid uses of strtol, then
> > > the problem is the compiler warning, not strtol. If additional
> > > information can be added to avoid the false positives (and also
> > > possibly optimize the code better), then that wouldn't require a
> > > change to the standard motivated by a hypothetical compiler warning.
> >
> > Let me be a little bit sarcastic.
> >
> > If so, let's take down -Wrestrict at all, because it triggers false
> > positives at the same rate.  How is it even in -Wall and not just
> > -Wextra?
> >
> > Here's a false positive:
> >
> > $ cat d.c
> > int is_same_pointer(const char *restrict ca, char *restrict a)
> > {
> > return ca == a;
> 
> This is a strawman argument, because all your example functions have
> been pretty useless and/or not good uses of restrict.
> 
> Yes, if you put restrict on functions where you don't ever access any
> objects through the pointers, the restrict qualifiers are misleading

That's precisely the case with strtol(3): it doesn't access any objects
through *endptr, and so that pointer need not be restrict.

Then, nptr is a read-only pointer, so is doesn't matter either if it's
accessed or not.

Let's say we add as many attributes as possible to strtol(3):

[[gnu::access(read_only, 1)]]
[[gnu::access(write_only, 1)]]
[[gnu::leaf]]
[[gnu::nothrow]]
[[gnu::null_terminated_string_arg(1)]]
 // [[gnu::access(none, *1)]]
long
alx_strtol(const char *nptr, char **_Nullable restrict endp, int base);

Let's say we could mark *endptr as a 'access(none)' pointer, since it's
not accessed.  Let's say we do that with [[gnu::access(none, *1)]].

Then, do you think the information of that prototype is any different
than a prototype with restrict on the remaining pointers?

[[gnu::access(read_only, 1)]]
[[gnu::access(write_only, 1)]]
[[gnu::leaf]]
[[gnu::nothrow]]
[[gnu::null_terminated_string_arg(1)]]
 // [[gnu::access(none, *1)]]
long
alx_strtol(const char *restrict nptr,
   char *restrict *_Nullable restrict endp, int base);

I don't think so.  Since *endptr is access(none), it certainly cannot
access nptr, and thus the qualifier on nptr is superfluous.

And even without the hypothetical [[gnu::access(none, *1)]]:

-  The callee doesn't care about restrict, because it doesn't access
   any objects via *endptr, so it certainly knows that nptr can be read
   without any concerns about optimization.

-  The caller can't know if strtol(3) accesses *endptr, or nptr, and so
   it can't optimize.  Unless it passes an uninitialized value in
   *endptr, which means the caller knows for sure that nptr won't be
   written, regardless of restrict on it or not.

Please, describe what's the information you think is being added by
having restrict on nptr, on how it would be lost if we remove it.

Cheers,
Alex

> and so compilers might give bad warnings for your bad API.

-- 



signature.asc
Description: PGP signature


Re: [PATCH v1] Remove 'restrict' from 'nptr' in strtol(3)-like functions

2024-07-05 Thread Alejandro Colomar via Gcc
On Fri, Jul 05, 2024 at 09:28:46PM GMT, Jonathan Wakely wrote:
> > If we marked endptr as "write_only" (which it might already
> > be) then a future warning mechanism for -Wrestrict could
> > ignore the content of *endptr.
> 
> 
> That seems more useful. Add semantic information instead of taking it
> away. If the concern is a hypothetical future compiler warning that
> would give false positives for perfectly valid uses of strtol, then
> the problem is the compiler warning, not strtol. If additional
> information can be added to avoid the false positives (and also
> possibly optimize the code better), then that wouldn't require a
> change to the standard motivated by a hypothetical compiler warning.

Let me be a little bit sarcastic.

If so, let's take down -Wrestrict at all, because it triggers false
positives at the same rate.  How is it even in -Wall and not just
-Wextra?

Here's a false positive:

$ cat d.c
int is_same_pointer(const char *restrict ca, char *restrict a)
{
return ca == a;
}

int main(void)
{
char x = 3;
char *xp = &x;
is_same_pointer(xp, xp);
}
$ cc -Wall d.c
d.c: In function ‘main’:
d.c:10:9: warning: passing argument 2 to ‘restrict’-qualified parameter 
aliases with argument 1 [-Wrestrict]
   10 | d(xp, xp);
  | ^~~

It's impossible to know if a use of restrict causes UB without reading
both the source code of the caller and the callee, so except for
-fanalyzer, it's impossible to diagnose something with certainty.

So, it's certainly not something we want in -Wall.

Or should I remove the 'restrict' qualifier from that function, loosing
"precious" semantic information, just because the compiler doesn't like
it?

Cheers,
Alex


-- 



signature.asc
Description: PGP signature


Re: [PATCH v1] Remove 'restrict' from 'nptr' in strtol(3)-like functions

2024-07-05 Thread Alejandro Colomar via Gcc
On Fri, Jul 05, 2024 at 09:28:46PM GMT, Jonathan Wakely wrote:
> On Fri, 5 Jul 2024 at 21:26, Martin Uecker  wrote:
> >
> > Am Freitag, dem 05.07.2024 um 21:28 +0200 schrieb Alejandro Colomar:
> >
> > ...
> > >
> > > > > Showing that you can contrive a case where a const char* restrict and
> > > > > char** restrict can alias doesn't mean there's a problem with strtol.
> > > >
> > > > I think his point is that a const char* restrict and something which
> > > > is stored in a char* whose address is then passed can alias and there
> > > > a warning would make sense in other situations.
> > >
> > > Indeed.
> > >
> > > > But I am also not convinced removing restrict would be an improvement.
> > > > It would make more sense to have an annotation that indicates that
> > > > endptr is only used as output.
> > >
> > > What is the benefit of keeping restrict there?  It doesn't provide any
> > > benefits, AFAICS.
> >
> > Not really I think.  I am generally not a fan of restrict.
> > IMHO it is misdesigned and I would like to see it replaced
> > with something better.  But I also not convinced it really
> > helps to remove it here.
> >
> > If we marked endptr as "write_only" (which it might already
> > be) then a future warning mechanism for -Wrestrict could
> > ignore the content of *endptr.
> 
> 
> That seems more useful. Add semantic information instead of taking it
> away.

How does restrict on nptr (or conversely on *endptr) add any semantic
information?  Can you please phrase the semantic information provided by
it?  How is it useful to the caller?  How is it useful to the callee?

Cheers,
Alex

> If the concern is a hypothetical future compiler warning that
> would give false positives for perfectly valid uses of strtol, then
> the problem is the compiler warning, not strtol. If additional
> information can be added to avoid the false positives (and also
> possibly optimize the code better), then that wouldn't require a
> change to the standard motivated by a hypothetical compiler warning.

-- 



signature.asc
Description: PGP signature


Re: [PATCH v1] Remove 'restrict' from 'nptr' in strtol(3)-like functions

2024-07-05 Thread Alejandro Colomar via Gcc
Hi Jonathan,

On Fri, Jul 05, 2024 at 08:52:30PM GMT, Jonathan Wakely wrote:
> > > > > > > It **shouldn't**.  strtol will only violate restrict if it's 
> > > > > > > wrongly
> > > > > > > implemented, or something dumb is done like "strtol((const char*) 
> > > > > > > &p,
> > > > > > > &p, 0)".
> > > > > > >
> > > > > > > See my previous reply.
> > > >
> > > > That's not right.  See my reply to yours, Xi.  The restrict in
> > > >
> > > > char **endptr
> > > >
> > > > already prevents calls such as strtol(x, x, 0).
> > >
> > > That seems to contradict footnote 153 in C23.
> >
> > Did you mean a different footnote number?
> 
> No.
> 
> >
> > Here's 153 in N3047:
> 
> That draft is nearly two years old.
> 
> >
> > 153) An implementation can delay the choice of which integer type until
> >  all enumeration constants have been seen.
> >
> > which seems completely unrelated.
> 
> Because you're looking at a draft from nearly two years ago. Try N3220.

Ahhh, sorry!  Indeed.

Let's quote it here, for others to not need to find it:

153) In other words, E depends on the value of P itself rather than on
 the value of an object referenced indirectly through P.  For
 example, if identifier p has type (int **restrict), then the
 pointer expressions p and p+1 are based on the restricted pointer
 object designated by p, but the pointer expressions *p and p[1] are
 not.

I don't think footnote 153 is problematic here.

Let's have this prototype:

long int
alx_strtol(const char *nptr, char **restrict endptr, int base);

and let's discuss some example of bad usage:

char str[] = "1";

s = str;
alx_strtol(s, (char **)s, 0);

According to 153, the pointer expression endptr is based on the
restricted pointer object, but *endptr and nptr are not.

The user has passed s as endptr, and also s as nptr.  Thus, the object
s is being accessed via a restricted pointer, endptr, and a
non-restricted one, nptr.  That's UB.

Let's see a different example of bad usage:

char str[] = "1";

s = str;
alx_strtol((char *)&s, &s, 0);

For similar reasons, it's also UB.

The compiler diagnoses both:

$ cat r.c
long alx_strtol(const char *s, char **restrict endp, int base);

int main(void)
{
char x = 3;
char *xp = &x;

alx_strtol(xp, &xp, 0);  // Fine.
alx_strtol(xp, (char **) xp, 0);  // Bug.
alx_strtol((char *) &xp, &xp, 0);  // Bug.
}
$ cc -Wall -Wextra -S r.c
r.c: In function ‘main’:
r.c:9:24: warning: passing argument 2 to ‘restrict’-qualified parameter 
aliases with argument 1 [-Wrestrict]
9 | alx_strtol(xp, (char **) xp, 0);  // Bug.
  |^~~~
r.c:10:34: warning: passing argument 2 to ‘restrict’-qualified 
parameter aliases with argument 1 [-Wrestrict]
   10 | alx_strtol((char *) &xp, &xp, 0);  // Bug.
  |  ^~~

Cheers,
Alex

-- 



signature.asc
Description: PGP signature


Re: [PATCH v1] Remove 'restrict' from 'nptr' in strtol(3)-like functions

2024-07-05 Thread Alejandro Colomar via Gcc
Hi Jonathan,

On Fri, Jul 05, 2024 at 08:38:15PM GMT, Jonathan Wakely wrote:
> On Fri, 5 Jul 2024 at 20:28, Alejandro Colomar  wrote:
> >
> > Hi,
> >
> > On Fri, Jul 05, 2024 at 06:30:50PM GMT, Martin Uecker wrote:
> > > Am Freitag, dem 05.07.2024 um 17:24 +0100 schrieb Jonathan Wakely:
> > > > On Fri, 5 Jul 2024 at 17:02, Xi Ruoyao via Gcc  wrote:
> > > > >
> > > > > On Fri, 2024-07-05 at 17:53 +0200, Alejandro Colomar wrote:
> > > > > > At least, I hope there's consensus that while current GCC doesn't 
> > > > > > warn
> > > > > > about this, ideally it should, which means it should warn for valid 
> > > > > > uses
> > > > > > of strtol(3), which means strtol(3) should be fixed, in all of ISO,
> > > > > > POSIX, and glibc.
> > > > >
> > > > > It **shouldn't**.  strtol will only violate restrict if it's wrongly
> > > > > implemented, or something dumb is done like "strtol((const char*) &p,
> > > > > &p, 0)".
> > > > >
> > > > > See my previous reply.
> >
> > That's not right.  See my reply to yours, Xi.  The restrict in
> >
> > char **endptr
> >
> > already prevents calls such as strtol(x, x, 0).
> 
> That seems to contradict footnote 153 in C23.

Did you mean a different footnote number?

Here's 153 in N3047:

153) An implementation can delay the choice of which integer type until
 all enumeration constants have been seen.

which seems completely unrelated.

Cheers,
Alex

-- 



signature.asc
Description: PGP signature


[WG14] Request for document number; strtol restrictness

2024-07-05 Thread Alejandro Colomar via Gcc
Hi,

I have a paper for removing restrict from the first parameter of
strtol(3) et al.  The title is

strtol(3) et al. should’t have a restricted first parameter.

If it helps, I already have a draft of the paper, which I attach (both
the PDF, and the man(7) source).

Cheers,
Alex

-- 



strtol.man
Description: Unix manual page


strtol.pdf
Description: Adobe PDF document


signature.asc
Description: PGP signature


Re: [PATCH v1] Remove 'restrict' from 'nptr' in strtol(3)-like functions

2024-07-05 Thread Alejandro Colomar via Gcc
Hi,

On Fri, Jul 05, 2024 at 06:30:50PM GMT, Martin Uecker wrote:
> Am Freitag, dem 05.07.2024 um 17:24 +0100 schrieb Jonathan Wakely:
> > On Fri, 5 Jul 2024 at 17:02, Xi Ruoyao via Gcc  wrote:
> > > 
> > > On Fri, 2024-07-05 at 17:53 +0200, Alejandro Colomar wrote:
> > > > At least, I hope there's consensus that while current GCC doesn't warn
> > > > about this, ideally it should, which means it should warn for valid uses
> > > > of strtol(3), which means strtol(3) should be fixed, in all of ISO,
> > > > POSIX, and glibc.
> > > 
> > > It **shouldn't**.  strtol will only violate restrict if it's wrongly
> > > implemented, or something dumb is done like "strtol((const char*) &p,
> > > &p, 0)".
> > > 
> > > See my previous reply.

That's not right.  See my reply to yours, Xi.  The restrict in

char **endptr

already prevents calls such as strtol(x, x, 0).

The restrict in

const char *nptr

provides nothing.

> > 
> > Right, is there a valid use of strtol where a warning would be justified?

Is there any valid reason to have restrict in the _first_ parameter of
strtol(3)?  Other than "ISO C says so"?  I'll take my beef with ISO C to
WG14, and hopefully get that fixed.  Can we please discuss this
technically, ignoring the existence of ISO C, for the time being?

> > Showing that you can contrive a case where a const char* restrict and
> > char** restrict can alias doesn't mean there's a problem with strtol.
> 
> I think his point is that a const char* restrict and something which
> is stored in a char* whose address is then passed can alias and there
> a warning would make sense in other situations.   

Indeed.

> But I am also not convinced removing restrict would be an improvement.
> It would make more sense to have an annotation that indicates that
> endptr is only used as output.

What is the benefit of keeping restrict there?  It doesn't provide any
benefits, AFAICS.

I've prepared a paper for wg14.  I'll ask for a number, but will attach
it here already.  I also attach the man(7) source code for it.

Cheers,
Alex

-- 



strtol.man
Description: Unix manual page


strtol.pdf
Description: Adobe PDF document


signature.asc
Description: PGP signature


Re: [PATCH v1] Remove 'restrict' from 'nptr' in strtol(3)-like functions

2024-07-05 Thread Alejandro Colomar via Gcc
On Fri, Jul 05, 2024 at 06:32:26PM GMT, Alejandro Colomar wrote:
> Hi Xi,
> 
> On Fri, Jul 05, 2024 at 11:55:05PM GMT, Xi Ruoyao wrote:
> > On Fri, 2024-07-05 at 17:23 +0200, Alejandro Colomar wrote:
> > > > strtol does have  a "char * restrict * restrict" though, so the
> > > > situation is different.   A "char **" and a "const char *"
> > > > shouldn't alias anyway. 
> > > 
> > > Pedantically, it is actually declared as 'char **restrict' (the inner
> > > one is not declared as restrict, even though it will be restricted,
> > > since there are no other unrestricted pointers).
> > 
> > So how's the following implementation of strtol (10-based, no negative
> > number handling, no overflow handling, ASCII-only) wrong?
> > 
> > long int my_strtol(const char *restrict nptr, char **restrict endptr)
> > {
> >   long ret = 0;
> > 
> >   while (isdigit(*nptr))
> > ret = ret * 10 + (*nptr++ - '0');
> > 
> >   *endptr = (char *)nptr;
> >   return ret;
> > }
> > 
> > There's no dumb thing, there's no restrict violation (unless it's called
> > in a stupid way, see below), and there **shouldn't** be a -Wrestrict
> > warning.
> > 
> > If you do
> > 
> > char *x = NULL;
> > strtol((char *)&x, &x, 10);
> 
> The restrict in `char **restrict endptr` already protects you from this.
> You don't need to make the first parameter also restricted.  See:
> 
>   $ cat r.c 
>   long alx_strtol(const char *nptr, char **restrict endp);
> 
>   int main(void)
>   {
>   char x = 3;
>   char *xp = &x;
> 
>   alx_strtol(xp, &xp);  // Fine.
>   alx_strtol(xp, (char **) xp);  // Bug.
>   alx_strtol((char *) &xp, &xp);  // Bug.
>   }
>   $ cc -Wall -Wextra -S r.c 
>   r.c: In function ‘main’:
>   r.c:9:24: warning: passing argument 2 to ‘restrict’-qualified parameter 
> aliases with argument 1 [-Wrestrict]
>   9 | alx_strtol(xp, (char **) xp);  // Bug.
> |^~~~
>   r.c:10:34: warning: passing argument 2 to ‘restrict’-qualified 
> parameter aliases with argument 1 [-Wrestrict]
>  10 | alx_strtol((char *) &xp, &xp);  // Bug.
> |  ^~~
> 
> Using my proposed prototype wouldn't case any warnings with a powerful
> -fanalyzer that would be able to emit diagnostics with the current
> prototype.
> 
> In strtol(3), there are 3 pointers:
> 
> -  nptr
> -  *endptr
> -  endptr
> 
> The first two should be allowed to alias each other (and at the end of
> the call, they definitely alias each other).  It is only the third one
> which must not alias any of the other, which is why my patch (v2) keeps

(Whoops; forget about that v2; that was about a similar patch to
strsep(3).  In this case we're in patch v1; which already had that into
consideration.  Please read it as s/v2/v1/.)

> that restrict, but removes the other one.
> 
> Does that make sense?
> 
> Cheers,
> Alex
> 
> > 
> > it'll violate restrict.  Nobody sane should write this, and it's warned
> > anyway:
> > 
> > t.c: In function 'main':
> > t.c:6:28: warning: passing argument 2 to 'restrict'-qualified parameter
> > aliases with argument 1 [-Wrestrict]
> > 6 | strtol((char *)&x, &x, 10);
> >   |~~  ^~
> > 
> > -- 
> > Xi Ruoyao 
> > School of Aerospace Science and Technology, Xidian University
> 
> -- 
> 



-- 



signature.asc
Description: PGP signature


Re: [PATCH v1] Remove 'restrict' from 'nptr' in strtol(3)-like functions

2024-07-05 Thread Alejandro Colomar via Gcc
Hi Xi,

On Fri, Jul 05, 2024 at 11:55:05PM GMT, Xi Ruoyao wrote:
> On Fri, 2024-07-05 at 17:23 +0200, Alejandro Colomar wrote:
> > > strtol does have  a "char * restrict * restrict" though, so the
> > > situation is different.   A "char **" and a "const char *"
> > > shouldn't alias anyway. 
> > 
> > Pedantically, it is actually declared as 'char **restrict' (the inner
> > one is not declared as restrict, even though it will be restricted,
> > since there are no other unrestricted pointers).
> 
> So how's the following implementation of strtol (10-based, no negative
> number handling, no overflow handling, ASCII-only) wrong?
> 
> long int my_strtol(const char *restrict nptr, char **restrict endptr)
> {
>   long ret = 0;
> 
>   while (isdigit(*nptr))
> ret = ret * 10 + (*nptr++ - '0');
> 
>   *endptr = (char *)nptr;
>   return ret;
> }
> 
> There's no dumb thing, there's no restrict violation (unless it's called
> in a stupid way, see below), and there **shouldn't** be a -Wrestrict
> warning.
> 
> If you do
> 
> char *x = NULL;
> strtol((char *)&x, &x, 10);

The restrict in `char **restrict endptr` already protects you from this.
You don't need to make the first parameter also restricted.  See:

$ cat r.c 
long alx_strtol(const char *nptr, char **restrict endp);

int main(void)
{
char x = 3;
char *xp = &x;

alx_strtol(xp, &xp);  // Fine.
alx_strtol(xp, (char **) xp);  // Bug.
alx_strtol((char *) &xp, &xp);  // Bug.
}
$ cc -Wall -Wextra -S r.c 
r.c: In function ‘main’:
r.c:9:24: warning: passing argument 2 to ‘restrict’-qualified parameter 
aliases with argument 1 [-Wrestrict]
9 | alx_strtol(xp, (char **) xp);  // Bug.
  |^~~~
r.c:10:34: warning: passing argument 2 to ‘restrict’-qualified 
parameter aliases with argument 1 [-Wrestrict]
   10 | alx_strtol((char *) &xp, &xp);  // Bug.
  |  ^~~

Using my proposed prototype wouldn't case any warnings with a powerful
-fanalyzer that would be able to emit diagnostics with the current
prototype.

In strtol(3), there are 3 pointers:

-  nptr
-  *endptr
-  endptr

The first two should be allowed to alias each other (and at the end of
the call, they definitely alias each other).  It is only the third one
which must not alias any of the other, which is why my patch (v2) keeps
that restrict, but removes the other one.

Does that make sense?

Cheers,
Alex

> 
> it'll violate restrict.  Nobody sane should write this, and it's warned
> anyway:
> 
> t.c: In function 'main':
> t.c:6:28: warning: passing argument 2 to 'restrict'-qualified parameter
> aliases with argument 1 [-Wrestrict]
> 6 | strtol((char *)&x, &x, 10);
>   |~~  ^~
> 
> -- 
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University

-- 



signature.asc
Description: PGP signature


Re: [PATCH v1] Remove 'restrict' from 'nptr' in strtol(3)-like functions

2024-07-05 Thread Alejandro Colomar via Gcc
Hi Martin,

On Fri, Jul 05, 2024 at 05:34:55PM GMT, Martin Uecker wrote:
> > I've written functions that more closely resemble strtol(3), to show
> > that in the end they all share the same issue regarding const-ness:

(Above I meant s/const/restrict/)

> > 
> > $ cat d.c 
> > int d(const char *restrict ca, char *restrict a)
> > {
> > return ca > a;
> > }
> > 
> > int main(void)
> > {
> > char x = 3;
> > char *xp = &x;
> > d(xp, xp);
> > }
> > $ cc -Wall -Wextra d.c 
> > d.c: In function ‘main’:
> > d.c:10:9: warning: passing argument 2 to ‘restrict’-qualified parameter 
> > aliases with argument 1 [-Wrestrict]
> >10 | d(xp, xp);
> >   | ^
> > 
> > This trivial program causes a diagnostic.  (Although I think the '>'
> > should also cause a diagnostic!!)
> > 
> > Let's add a reference, to resemble strtol(3):
> > 
> > $ cat d2.c 
> > int d2(const char *restrict ca, char *restrict *restrict ap)
> > {
> > return ca > *ap;
> > }
> > 
> > int main(void)
> > {
> > char x = 3;
> > char *xp = &x;
> > d2(xp, &xp);
> > }
> > $ cc -Wall -Wextra d2.c 
> > $ 
> > 
> > Why does this not cause a -Wrestrict diagnostic, while d.c does?  How
> > are these programs any different regarding pointer restrict-ness?
> 
> It would require data flow anaylsis to produce the diagnostic while
> the first can simply be diagnosed by comparing arguments.

Agree.  It seems like a task for -fanalyzer.

$ cc -Wall -Wextra -fanalyzer -fuse-linker-plugin -flto d2.c
$

I'm unable to trigger that at all.  It's probably not implemented, I
guess.  I've updated the bug report
 to change the
component to 'analyzer'.

At least, I hope there's consensus that while current GCC doesn't warn
about this, ideally it should, which means it should warn for valid uses
of strtol(3), which means strtol(3) should be fixed, in all of ISO,
POSIX, and glibc.

Cheers,
Alex

> > > > Well, I don't know how to report that defect to WG14.  If you help me,
> > > > I'll be pleased to do so.  Do they have a public mailing list or
> > > > anything like that?
> > > 
> > > One can submit clarification or change requests:
> > > 
> > > https://www.open-std.org/jtc1/sc22/wg14/www/contributing.html

P.S.:

I've sent a mail to UNE (the Spanish National Body for ISO), and
asked them about joining WG14.  Let's see what they say.

P.S. 2:

I'm also preparing a paper; would you mind championing it if I'm not yet
able to do it when it's ready?

P.S. 3:

Do you know of any Spanish member of WG14?  Maybe I can talk with them
to have more information about how they work.

-- 



signature.asc
Description: PGP signature


Re: [PATCH v1] Remove 'restrict' from 'nptr' in strtol(3)-like functions

2024-07-05 Thread Alejandro Colomar via Gcc
Hi Martin,

On Fri, Jul 05, 2024 at 05:02:15PM GMT, Martin Uecker wrote:
> > But when the thing gets non-trivial, as in strtol(3), GCC misses the
> > -Wrestrict diagnostic, as reported in
> > .
> > 
> > Let's write a reproducer by altering the dumb.c program from above, with
> > just another reference:
> > 
> > int
> > dumb2(int *restrict a, int *restrict *restrict ap)
> > {
> > // We don't access the objects
> > return a == *ap;
> > }
> > 
> > int
> > main(void)
> > {
> > int x = 3;
> > int *xp = &x;
> > 
> > return dumb2(&x, &xp);
> > }
> > 
> > GCC doesn't report anything bad here, even though it's basically the
> > same as the program from above:
> > 
> > $ cc -Wall -Wextra dumb2.c
> > $
> 
> strtol does have  a "char * restrict * restrict" though, so the
> situation is different.   A "char **" and a "const char *"
> shouldn't alias anyway. 

Pedantically, it is actually declared as 'char **restrict' (the inner
one is not declared as restrict, even though it will be restricted,
since there are no other unrestricted pointers).

I've written functions that more closely resemble strtol(3), to show
that in the end they all share the same issue regarding const-ness:

$ cat d.c 
int d(const char *restrict ca, char *restrict a)
{
return ca > a;
}

int main(void)
{
char x = 3;
char *xp = &x;
d(xp, xp);
}
$ cc -Wall -Wextra d.c 
d.c: In function ‘main’:
d.c:10:9: warning: passing argument 2 to ‘restrict’-qualified parameter 
aliases with argument 1 [-Wrestrict]
   10 | d(xp, xp);
  | ^

This trivial program causes a diagnostic.  (Although I think the '>'
should also cause a diagnostic!!)

Let's add a reference, to resemble strtol(3):

$ cat d2.c 
int d2(const char *restrict ca, char *restrict *restrict ap)
{
return ca > *ap;
}

int main(void)
{
char x = 3;
char *xp = &x;
d2(xp, &xp);
}
$ cc -Wall -Wextra d2.c 
$ 

Why does this not cause a -Wrestrict diagnostic, while d.c does?  How
are these programs any different regarding pointer restrict-ness?

> > Well, I don't know how to report that defect to WG14.  If you help me,
> > I'll be pleased to do so.  Do they have a public mailing list or
> > anything like that?
> 
> One can submit clarification or change requests:
> 
> https://www.open-std.org/jtc1/sc22/wg14/www/contributing.html

Thanks!  Will do.  Anyway, I think this should be discussed in glibc/gcc
in parallel, since it's clearly a missed diagnostic, and possibly a
dangerous use of restrict if the compiler does any assumptions that
shouldn't be done.

Have a lovely day!
Alex

-- 



signature.asc
Description: PGP signature


Re: [PATCH v1] Remove 'restrict' from 'nptr' in strtol(3)-like functions

2024-07-05 Thread Alejandro Colomar via Gcc
[CC += linux-man@, since we're discussing an API documented there, and
 the manual page would also need to be updated]

Hi Xi,  Jakub,

On Fri, Jul 05, 2024 at 09:38:21PM GMT, Xi Ruoyao wrote:
> On Fri, 2024-07-05 at 15:03 +0200, Alejandro Colomar wrote:
> > ISO C specifies these APIs as accepting a restricted pointer in their
> > first parameter:
> > 
> > $ stdc c99 strtol
> > long int strtol(const char *restrict nptr, char **restrict endptr, int 
> > base);
> > $ stdc c11 strtol
> > long int strtol(const char *restrict nptr, char **restrict endptr, int 
> > base);
> > 
> > However, it should be considered a defect in ISO C.  It's common to see
> > code that aliases it:
> > 
> > char str[] = "10 20";
> > 
> > p = str;
> > a = strtol(p, &p, 0);  // Let's ignore error handling for
> > b = strtol(p, &p, 0);  // simplicity.
> 
> Why this is wrong?
> 
> During the execution of strtol() the only expression accessing the
> object "p" is *endptr.  When the body of strtol() refers "nptr" it
> accesses a different object, not "p".



Theoretically, 'restrict' is defined in terms of accesses, not just
references, so it's fine for strtol(3) to hold two references of p in
restrict pointers.  That is, the following code is valid:

int
dumb(int *restrict a, int *restrict also_a)
{
// We don't access the objects
return a == also_a;
}

int
main(void)
{
int x = 3;

return dumb(&x, &x);
}

However, in practice that's dumb.  The caller cannot know that the
function doesn't access the object, so it must be cautious and enable
-Wrestrict, which should be paranoid and do not allow passing references
to the same object in different arguments, just in case the function
decides to access to objects.  Of course, GCC reports a diagnostic for
the previous code:

$ cc -Wall -Wextra dumb.c 
dumb.c: In function ‘main’:
dumb.c:13:21: warning: passing argument 1 to ‘restrict’-qualified 
parameter aliases with argument 2 [-Wrestrict]
   13 | return dumb(&x, &x);
  | ^~  ~~

... even when there's no UB, since the object is not being accessed.

But when the thing gets non-trivial, as in strtol(3), GCC misses the
-Wrestrict diagnostic, as reported in
.

Let's write a reproducer by altering the dumb.c program from above, with
just another reference:

int
dumb2(int *restrict a, int *restrict *restrict ap)
{
// We don't access the objects
return a == *ap;
}

int
main(void)
{
int x = 3;
int *xp = &x;

return dumb2(&x, &xp);
}

GCC doesn't report anything bad here, even though it's basically the
same as the program from above:

$ cc -Wall -Wextra dumb2.c
$

Again, there's no UB, but we really want to be cautious and get a
diagnostic as callers, just in case the callee decides to access the
object; we never know.

So, GCC should be patched to report a warning in the program above.
That will also cause strtol(3) to start issuing warnings in use cases
like the one I showed.

Even further, let's try something really weird: inequality comparison,
which is only defined for pointers to the same array object:

int
dumb3(int *restrict a, int *restrict *restrict ap)
{
// We don't access the objects
return a > *ap;
}

int
main(void)
{
int x = 3;
int *xp = &x;

return dumb3(&x, &xp);
}

The behavior is still defined, since the obnjects are not accessed, but
the compiler should really warn, on both sides:

-  The caller is passing references to the same object in restricted
   parameters, which is a red flag.

-  The callee is comparing for inequality pointers that should, under
   normal circumstances, cause Undefined Behavior.


> And if this is really wrong you should report it to WG14 before changing
> glibc.

Well, I don't know how to report that defect to WG14.  If you help me,
I'll be pleased to do so.  Do they have a public mailing list or
anything like that?


Cheers,
Alex

-- 



signature.asc
Description: PGP signature


[PATCH v1] Remove 'restrict' from 'nptr' in strtol(3)-like functions

2024-07-05 Thread Alejandro Colomar via Gcc
ISO C specifies these APIs as accepting a restricted pointer in their
first parameter:

$ stdc c99 strtol
long int strtol(const char *restrict nptr, char **restrict endptr, int base);
$ stdc c11 strtol
long int strtol(const char *restrict nptr, char **restrict endptr, int base);

However, it should be considered a defect in ISO C.  It's common to see
code that aliases it:

char str[] = "10 20";

p = str;
a = strtol(p, &p, 0);  // Let's ignore error handling for
b = strtol(p, &p, 0);  // simplicity.

strtol(3) doesn't write to the string at all, so it shouldn't care at
all if there's any aliasing.  Requiring that the user uses a distinct
pointer for the second argument is an artificial imposition that has no
reason to be, and is often violated by real code, so let's lift that
restriction.

For example, in the shadow project, there were two cases (as of
shadow-4.14.8; they probably still are there in more recent versions,
but they now use some wrapper functions that make it more complex to
show) of violation of this restriction:

$ grep -rn strto.*pos.*pos lib* src/ | sed 's/\t\t*/\t/'
src/usermod.c:322:  last = strtoll(pos + 1, &pos, 10);
$ grep -rn strto.*end.*end lib* src/ | sed 's/\t\t*/\t/'
lib/getrange.c:83:  n = strtoul (endptr, &endptr, 10);

Link: 
Cc: 
Cc: Paul Eggert 
Signed-off-by: Alejandro Colomar 
---
 include/stdlib.h   |  35 +++---
 include/wchar.h|  17 ++-
 manual/arith.texi  |  36 +++
 stdlib/inttypes.h  |  24 ++---
 stdlib/stdlib.h| 119 +
 sysdeps/ieee754/ldbl-opt/nldbl-strtold_l.c |   2 +-
 wcsmbs/wchar.h |  96 -
 7 files changed, 154 insertions(+), 175 deletions(-)

diff --git a/include/stdlib.h b/include/stdlib.h
index 0cab3f5b56..c3f61f6891 100644
--- a/include/stdlib.h
+++ b/include/stdlib.h
@@ -189,32 +189,31 @@ libc_hidden_proto (__arc4random_uniform);
 extern void __arc4random_buf_internal (void *buffer, size_t len)
  attribute_hidden;
 
-extern double __strtod_internal (const char *__restrict __nptr,
+extern double __strtod_internal (const char *__nptr,
 char **__restrict __endptr, int __group)
  __THROW __nonnull ((1)) __wur;
-extern float __strtof_internal (const char *__restrict __nptr,
+extern float __strtof_internal (const char *__nptr,
char **__restrict __endptr, int __group)
  __THROW __nonnull ((1)) __wur;
-extern long double __strtold_internal (const char *__restrict __nptr,
+extern long double __strtold_internal (const char *__nptr,
   char **__restrict __endptr,
   int __group)
  __THROW __nonnull ((1)) __wur;
-extern long int __strtol_internal (const char *__restrict __nptr,
+extern long int __strtol_internal (const char *__nptr,
   char **__restrict __endptr,
   int __base, int __group)
  __THROW __nonnull ((1)) __wur;
-extern unsigned long int __strtoul_internal (const char *__restrict __nptr,
+extern unsigned long int __strtoul_internal (const char *__nptr,
 char **__restrict __endptr,
 int __base, int __group)
  __THROW __nonnull ((1)) __wur;
 __extension__
-extern long long int __strtoll_internal (const char *__restrict __nptr,
+extern long long int __strtoll_internal (const char *__nptr,
 char **__restrict __endptr,
 int __base, int __group)
  __THROW __nonnull ((1)) __wur;
 __extension__
-extern unsigned long long int __strtoull_internal (const char *
-  __restrict __nptr,
+extern unsigned long long int __strtoull_internal (const char *__nptr,
   char **__restrict __endptr,
   int __base, int __group)
  __THROW __nonnull ((1)) __wur;
@@ -226,33 +225,31 @@ libc_hidden_proto (__strtoll_internal)
 libc_hidden_proto (__strtoul_internal)
 libc_hidden_proto (__strtoull_internal)
 
-extern double strtod_l_internal (const char *__restrict __nptr,
+extern double strtod_l_internal (const char *__nptr,
 char **__restrict __endptr, int __group,
 locale_t __loc);
-extern float strtof_l_internal (const char *__restrict __nptr,
+extern float strtof_l_internal (const char *__nptr,
char **__restrict __endptr, int __group,
locale_t __loc);
-extern long double strtold_l_

Re: strtol(3) with QChar arguments

2024-05-05 Thread Alejandro Colomar via Gcc
Hi Martin,

On Sun, May 05, 2024 at 04:47:21PM +0200, Martin Uecker wrote:
> Am Sonntag, dem 05.05.2024 um 15:13 +0200 schrieb Alejandro Colomar:
> > Hi Martin,
> > 
> > I was wondering why C23 didn't use QChar for strtol(3).  It has the same
> > problems that string functions have: a const input string and a
> > non-const output string (the endptr).
> 
> I am not sure whether strtol was discussed.
> 
> > 
> > I think endptr should have the same constness of the string passed to
> > strtol(3), no?
> > 
> > Should this be addressed for C3x?  For liba2i.git, I'm working on
> > const-generic versions of strtol(3) wrappers, which have helped simplify
> > the const/non-const mix of pointers in shadow.git.
> 
> One potential issue is that for strtol such a change would break
> all callers that pass a const-qualified pointer as first argument
> and provide an argument for enptr second, which now has to be
> a pointer to a non-const pointer.

Yep, it is more aggressive with existing code than e.g. strchr(3).

> 
> For the functions we changed this breaks only cases where
> a const qualified pointer is passed and then the result
> is assigned to a non-const pointer, which could already be
> considered questionable in existing code.

Hmmm, okay.  I'll then just provide my library to replace strtol(3)
calls, and hope that programmers switch to it voluntarily.

Thanks!

Have a lovely day!
Alex

-- 

A client is hiring kernel driver, mm, and/or crypto developers;
contact me if interested.


signature.asc
Description: PGP signature


strtol(3) with QChar arguments

2024-05-05 Thread Alejandro Colomar via Gcc
Hi Martin,

I was wondering why C23 didn't use QChar for strtol(3).  It has the same
problems that string functions have: a const input string and a
non-const output string (the endptr).

I think endptr should have the same constness of the string passed to
strtol(3), no?

Should this be addressed for C3x?  For liba2i.git, I'm working on
const-generic versions of strtol(3) wrappers, which have helped simplify
the const/non-const mix of pointers in shadow.git.

Have a lovely day!
Alex

-- 

A client is hiring kernel driver, mm, and/or crypto developers;
contact me if interested.


signature.asc
Description: PGP signature


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-30 Thread Alejandro Colomar via Gcc
Hi Mark,

On Sun, Apr 21, 2024 at 10:40:14PM +0200, Alejandro Colomar wrote:

[...]

> Let's generate a v2 patch set, showing the range-diff against v1.  We
> need to check the commit IDs of the first set, which can be found in the
> mailing list archives, thanks to the trick we used.  The v1 range was
> 7ec952012^..892a12470.  So we just pass that range:
> 
>   $ git format-patch -o ./patches/ master..HEAD \
>   --range-diff=7ec952012^..892a12470 -v2 --cover-letter;
>   ./patches/v2--cover-letter.patch
>   
> ./patches/v2-0001-share-mk-build-fonts-unifont-Build-UnifontR-from-.patch
>   ./patches/v2-0002-share-mk-build-pdf-book-Use-Unifont.patch
>   
> ./patches/v2-0003-share-mk-build-fonts-unifont-Specify-space-width-.patch
> 
> The v2 cover letter shows the changes introduced since v1:
> 
>   $ tail -n20 ./patches/v2--cover-letter.patch 
>create mode 100644 share/mk/build/fonts/unifont/dit.mk
>create mode 100644 share/mk/build/fonts/unifont/pfa.mk
>create mode 100644 
> share/mk/configure/build-depends/fonts-unifont/unifont.otf.mk
> 
>   Range-diff against v1:
>   1:  7ec952012 = 1:  7ec952012 share/mk/: build-fonts-unifont: Build 
> UnifontR from unifont.otf
>   2:  d80376b08 = 2:  d80376b08 share/mk/: build-pdf-book: Use Unifont
>   3:  892a12470 ! 3:  bc7fa7d92 share/mk/: build-fonts-unifont: Specify 
> spacewidth in afmtodit(1)
>   @@ Metadata
>Author: Alejandro Colomar 
>
> ## Commit message ##
>   -share/mk/: build-fonts-unifont: Specify spacewidth in 
> afmtodit(1)
>   +share/mk/: build-fonts-unifont: Specify space width in 
> afmtodit(1)
>
>Link: 
> 
>Suggested-by: "G. Branden Robinson" 
>   -- 
>   2.43.0

I've added a recommendation in the Linux man-pages contributing
documentation that patches be sent with a range diff, and also that
patches be sent in PGP-signed mail (if the user has a PGP key).  It has
specific instructions like the above (but simplified).



Feel free to copy any of that documentation.

I also recommended specific mutt(1) settings:

set crypt_autosign = yes
set crypt_protected_headers_write = yes

And git-send-email(1) configuration for using with neomutt(1):

   [sendemail]
   sendmailcmd = neomutt -C -H - && true

For all the documentation for mail and patches, see these two files:



Have a lovely day!
Alex

-- 



signature.asc
Description: PGP signature


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-21 Thread Alejandro Colomar via Gcc
On Sun, Apr 21, 2024 at 10:40:14PM +0200, Alejandro Colomar wrote:
> On Sun, Apr 21, 2024 at 05:30:52PM +0200, Mark Wielaard wrote:
> > Hi Alejandro,
> 
> Hi Mark,
> 
> I've reordered your message, to organize my response.
> 
> > On Wed, Apr 10, 2024 at 06:30:42PM +0200, Alejandro Colomar wrote:
> > > It would also be interesting to require showing range-diffs between
> > > patch revisions.  They make it much more difficult to introduce a
> > > vulnerability after a reviewer has turned its mins into approving the
> > > patch.  Of course, the patch could go in if the submitter lies in the
> > > range-diff and the vuln is undetected, but then it can be verified a
> > > posteriory to prove that there was a lie.
> > 
> > Could you give an example of using git range-diff?
> 
> Sure!
> 
> > Normally when asked for changes to a
> > patch (series) I do an git rebase -i (on the local branch I used to
> > develop the feature/bug fix) and split/commit all requested changes
> > and then sent the new patches with git send-email again.
> 
> I do the same thing.
> 
> > But I guess
> > to use/combine that with git range-diffs I should start creating new
> > local branches for each patch (series) in development?
> 
> No; it's compatible with that workflow.
> 
> > How do you go from
> > v1 of a patch (series) to a v2?
> 
> I'll start first with how you go from nothing to v1, which will help
> explain how you go from v1 to v2.
> 
> Let's say I have a branch 'unifont' for adding the Unifont font to the
> Linux man-pages' PDF book.  I have it in a branch that applies on top of
> 'master'.
> 
>   $ git log --oneline --graph master..unifont;
>   * 892a12470 (HEAD -> unifont, alx/contrib, contrib) share/mk/: 
> build-fonts-unifont: Specify spacewidth in afmtodit(1)
>   * d80376b08 share/mk/: build-pdf-book: Use Unifont
>   * 7ec952012 share/mk/: build-fonts-unifont: Build UnifontR from 
> unifont.otf
> 
> I want to send that as a patch set (v1, of course, since it's the first
> send).  I would already use range-diff when generating the patches:
> 
>   $ git format-patch -o ./patches master..HEAD \
>   --range-diff=master -v1 --cover-letter;
>   ./patches/v1--cover-letter.patch
>   
> ./patches/v1-0001-share-mk-build-fonts-unifont-Build-UnifontR-from-.patch
>   ./patches/v1-0002-share-mk-build-pdf-book-Use-Unifont.patch
>   
> ./patches/v1-0003-share-mk-build-fonts-unifont-Specify-spacewidth-i.patch
> 
> Why would you do that in v1?  It notes the commit IDs of the patches, so
> you can later check them when preparing v2; we'll come back to that.
> For now, see what this adds to the patch set:
> 
>   $ tail ./patches/v1--cover-letter.patch ;
>create mode 100644 share/mk/build/fonts/unifont/pfa.mk
>create mode 100644 
> share/mk/configure/build-depends/fonts-unifont/unifont.otf.mk
> 
>   Range-diff against v0:
>   -:  - > 1:  7ec952012 share/mk/: build-fonts-unifont: Build 
> UnifontR from unifont.otf
>   -:  - > 2:  d80376b08 share/mk/: build-pdf-book: Use Unifont
>   -:  - > 3:  892a12470 share/mk/: build-fonts-unifont: Specify 
> spacewidth in afmtodit(1)
>   -- 
>   2.43.0
> 
> You can see the IDs of the commits that form this patch set, which match
> the git-log(1) that I showed above.  Now someone suggests a change.  For
> this example, I wasn't happy with a commit message, and added a space to
> the third commit message subject line.  The git-log(1) is now:
> 
>   $ git log --oneline --graph master..unifont
>   * bc7fa7d92 (HEAD -> unifont) share/mk/: build-fonts-unifont: Specify 
> space width in afmtodit(1)
>   * d80376b08 share/mk/: build-pdf-book: Use Unifont
>   * 7ec952012 share/mk/: build-fonts-unifont: Build UnifontR from 
> unifont.otf
> 
> Let's generate a v2 patch set, showing the range-diff against v1.  We
> need to check the commit IDs of the first set, which can be found in the
> mailing list archives, thanks to the trick we used.  The v1 range was
> 7ec952012^..892a12470.  So we just pass that range:
> 
>   $ git format-patch -o ./patches/ master..HEAD \
>   --range-diff=7ec952012^..892a12470 -v2 --cover-letter;
>   ./patches/v2--cover-letter.patch
>   
> ./patches/v2-0001-share-mk-build-fonts-unifont-Build-UnifontR-from-.patch
>   ./patches/v2-0002-share-mk-build-pdf-book-Use-Unifont.patch
>   
> ./patches/v2-0003-share-mk-build-fonts-unifont-Specify-space-width-.patch
> 
> The v2 cover letter shows the changes introduced since v1:
> 
>   $ tail -n20 ./patches/v2--cover-letter.patch 
>create mode 100644 share/mk/build/fonts/unifont/dit.mk
>create mode 100644 share/mk/build/fonts/unifont/pfa.mk
>create mode 100644 
> share/mk/configure/build-depends/fonts-unifont/unifont.otf.mk
> 
>   Range-diff against v1:
>   1:  7ec952012 = 1:  7ec952012 share/mk/: build-fonts-unifont: Build 
> UnifontR from unif

Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-21 Thread Alejandro Colomar via Gcc
On Sun, Apr 21, 2024 at 05:30:52PM +0200, Mark Wielaard wrote:
> Hi Alejandro,

Hi Mark,

I've reordered your message, to organize my response.

> On Wed, Apr 10, 2024 at 06:30:42PM +0200, Alejandro Colomar wrote:
> > It would also be interesting to require showing range-diffs between
> > patch revisions.  They make it much more difficult to introduce a
> > vulnerability after a reviewer has turned its mins into approving the
> > patch.  Of course, the patch could go in if the submitter lies in the
> > range-diff and the vuln is undetected, but then it can be verified a
> > posteriory to prove that there was a lie.
> 
> Could you give an example of using git range-diff?

Sure!

> Normally when asked for changes to a
> patch (series) I do an git rebase -i (on the local branch I used to
> develop the feature/bug fix) and split/commit all requested changes
> and then sent the new patches with git send-email again.

I do the same thing.

> But I guess
> to use/combine that with git range-diffs I should start creating new
> local branches for each patch (series) in development?

No; it's compatible with that workflow.

> How do you go from
> v1 of a patch (series) to a v2?

I'll start first with how you go from nothing to v1, which will help
explain how you go from v1 to v2.

Let's say I have a branch 'unifont' for adding the Unifont font to the
Linux man-pages' PDF book.  I have it in a branch that applies on top of
'master'.

$ git log --oneline --graph master..unifont;
* 892a12470 (HEAD -> unifont, alx/contrib, contrib) share/mk/: 
build-fonts-unifont: Specify spacewidth in afmtodit(1)
* d80376b08 share/mk/: build-pdf-book: Use Unifont
* 7ec952012 share/mk/: build-fonts-unifont: Build UnifontR from 
unifont.otf

I want to send that as a patch set (v1, of course, since it's the first
send).  I would already use range-diff when generating the patches:

$ git format-patch -o ./patches master..HEAD \
--range-diff=master -v1 --cover-letter;
./patches/v1--cover-letter.patch

./patches/v1-0001-share-mk-build-fonts-unifont-Build-UnifontR-from-.patch
./patches/v1-0002-share-mk-build-pdf-book-Use-Unifont.patch

./patches/v1-0003-share-mk-build-fonts-unifont-Specify-spacewidth-i.patch

Why would you do that in v1?  It notes the commit IDs of the patches, so
you can later check them when preparing v2; we'll come back to that.
For now, see what this adds to the patch set:

$ tail ./patches/v1--cover-letter.patch ;
 create mode 100644 share/mk/build/fonts/unifont/pfa.mk
 create mode 100644 
share/mk/configure/build-depends/fonts-unifont/unifont.otf.mk

Range-diff against v0:
-:  - > 1:  7ec952012 share/mk/: build-fonts-unifont: Build 
UnifontR from unifont.otf
-:  - > 2:  d80376b08 share/mk/: build-pdf-book: Use Unifont
-:  - > 3:  892a12470 share/mk/: build-fonts-unifont: Specify 
spacewidth in afmtodit(1)
-- 
2.43.0

You can see the IDs of the commits that form this patch set, which match
the git-log(1) that I showed above.  Now someone suggests a change.  For
this example, I wasn't happy with a commit message, and added a space to
the third commit message subject line.  The git-log(1) is now:

$ git log --oneline --graph master..unifont
* bc7fa7d92 (HEAD -> unifont) share/mk/: build-fonts-unifont: Specify 
space width in afmtodit(1)
* d80376b08 share/mk/: build-pdf-book: Use Unifont
* 7ec952012 share/mk/: build-fonts-unifont: Build UnifontR from 
unifont.otf

Let's generate a v2 patch set, showing the range-diff against v1.  We
need to check the commit IDs of the first set, which can be found in the
mailing list archives, thanks to the trick we used.  The v1 range was
7ec952012^..892a12470.  So we just pass that range:

$ git format-patch -o ./patches/ master..HEAD \
--range-diff=7ec952012^..892a12470 -v2 --cover-letter;
./patches/v2--cover-letter.patch

./patches/v2-0001-share-mk-build-fonts-unifont-Build-UnifontR-from-.patch
./patches/v2-0002-share-mk-build-pdf-book-Use-Unifont.patch

./patches/v2-0003-share-mk-build-fonts-unifont-Specify-space-width-.patch

The v2 cover letter shows the changes introduced since v1:

$ tail -n20 ./patches/v2--cover-letter.patch 
 create mode 100644 share/mk/build/fonts/unifont/dit.mk
 create mode 100644 share/mk/build/fonts/unifont/pfa.mk
 create mode 100644 
share/mk/configure/build-depends/fonts-unifont/unifont.otf.mk

Range-diff against v1:
1:  7ec952012 = 1:  7ec952012 share/mk/: build-fonts-unifont: Build 
UnifontR from unifont.otf
2:  d80376b08 = 2:  d80376b08 share/mk/: build-pdf-book: Use Unifont
3:  892a12470 ! 3:  bc7fa7d92 share/mk/: build-fonts-unifont: Specify 
spacewidth in afmtodit(1)
@@ Metadata
 Au

Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-10 Thread Alejandro Colomar via Gcc
Hi Joel,

On Wed, Apr 03, 2024 at 08:53:21AM -0500, Joel Sherrill wrote:
> On Wed, Apr 3, 2024, 3:09 AM Florian Weimer via Gdb 
> wrote:
> 
> > * Guinevere Larsen via Overseers:
> >
> > > Beyond that, we (GDB) are already experimenting with approved-by, and
> > > I think glibc was doing the same.
> >
> > The glibc project uses Reviewed-by:, but it's completely unrelated to
> > this.  Everyone still pushes their own patches, and there are no
> > technical countermeasures in place to ensure that the pushed version is
> > the reviewed version.
> >
> 
> Or that there isn't "collusion" between a malicious author and reviewer.
> Just tagging it approved or reviewed by just gives you two people to blame.
> It is not a perfect solution either.

If those tags are given in a mailing list _and_ mails to the mailing
list are PGP-signed, then you can verify that the tags were valid, and
not just invented.

And with signed commits you have a guarantee that one doesn't overwrite
history attributing commits to other committers (it could happen with
authors, and indeed it seems to have happened in this case, but if
patches are sent via signed mail, then it's also verifyiable).

In the email side, there are a few things to improve:

For sending signed emails, there's patatt(5) (used by b4(5)), but it
might not be comfortable to use for everyone.  For those preferring
normal MUAs, neomutt(1) is an alternative:



And I have a few patches for improving the security of protected
messages:




And the corresponding security-vulnerability reports:







(I find it funny that I didn't know about this xz issue until yesterday,
 so not when I reported those issues, but they are interestingly
 related.)

It would also be interesting to require showing range-diffs between
patch revisions.  They make it much more difficult to introduce a
vulnerability after a reviewer has turned its mins into approving the
patch.  Of course, the patch could go in if the submitter lies in the
range-diff and the vuln is undetected, but then it can be verified a
posteriory to prove that there was a lie.

I recently started applying all of these (signing all email, including
patches, sign all commits and tags, and provide range-diffs), and it's
not too uncomfrotable; I'd say it's even more comfortable than not doing
it, as it allows me to easily roll back a patch to an older revision if
I find I introduced a mistake, and find where I introduced it.


Have a lovely day!
Alex

-- 



signature.asc
Description: PGP signature


Re: Missed optimization of strcpy(3) (or stpcpy(3))

2023-11-13 Thread Alejandro Colomar via Gcc
Hi Richard,

On Mon, Nov 13, 2023 at 07:56:00AM +0100, Richard Biener wrote:
> Can you file a bugreport please?  It looks like a transform for the
> strlen pass.

Done; thanks.


-- 



signature.asc
Description: PGP signature


Missed optimization of strcpy(3) (or stpcpy(3))

2023-11-12 Thread Alejandro Colomar via Gcc
Hi,

The following code can be optimized to use memcpy(3), since the length
of the copy is known (we've just called strnlen(3), and discarded the
possibility of truncated lengths).

$ cat strxcpy.c
#include 
#include 
#include 

ssize_t
strxcpy(char *restrict dst, const char *restrict src, size_t dsize)
{
if (strnlen(src, dsize) == dsize)
return -1;

return stpcpy(dst, src) - dst;
}


This compiles to stpcpy(3).  Similar code written with strcpy(3) also
keeps strcpy(3).  But they could be optimized with mempcpy(3) or
memcpy(3).


Thanks,
Alex

-- 



signature.asc
Description: PGP signature


Re: Confusing location of error in source code

2023-09-01 Thread Alejandro Colomar via Gcc
Hi Jonathan,

On 2023-09-01 08:49, Jonathan Wakely wrote:
> On Thu, 31 Aug 2023, 17:05 Alejandro Colomar via Gcc, 
> wrote:
> 
>> Hi!
>>
>> I've been confused for some time with a compilation error that
>> pointed to a slightly-off location.  I wasn't seeing that I used
>> a temporary variable in a constant expression.  The code could be
>> reduced to:
>>
>> $ cat const.c
>> int
>> main(void)
>> {
>> int x = 42;
>>
>> _Static_assert(0 || 7 > x, "");
>> }
>> $ cc -Wall -Wextra const.c
>> const.c: In function ‘main’:
>> const.c:6:26: error: expression in static assertion is not constant
>> 6 | _Static_assert(0 || 7 > x, "");
>>   |~~^~~~
>>
>>
>> I think the appropriate error report should point to this other point:
>>
>>
>> 6 | _Static_assert(0 || 7 > x, "");
>>   |~^
>>
> 
> Please use bugzilla for bug reports.

Sure, will do.  Thanks!

Alex

> 

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5



OpenPGP_signature
Description: OpenPGP digital signature


Confusing location of error in source code

2023-08-31 Thread Alejandro Colomar via Gcc
Hi!

I've been confused for some time with a compilation error that
pointed to a slightly-off location.  I wasn't seeing that I used
a temporary variable in a constant expression.  The code could be
reduced to:

$ cat const.c 
int
main(void)
{
int x = 42;

_Static_assert(0 || 7 > x, "");
}
$ cc -Wall -Wextra const.c 
const.c: In function ‘main’:
const.c:6:26: error: expression in static assertion is not constant
6 | _Static_assert(0 || 7 > x, "");
  |~~^~~~


I think the appropriate error report should point to this other point:


6 | _Static_assert(0 || 7 > x, "");
  |~^

Cheers,
Alex

-- 

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5


OpenPGP_signature
Description: OpenPGP digital signature


Re: Should GCC warn about sizeof(flexible_struct)?

2023-08-14 Thread Alejandro Colomar via Gcc
On 2023-08-14 19:11, Alejandro Colomar wrote:
> Hi Martin,
> 
> On 2023-08-14 18:49, Martin Uecker wrote:
>> Am Montag, dem 14.08.2023 um 12:21 +0200 schrieb Alejandro Colomar:
> [...]
> 
>>> Would you mind chiming in to this question?:
>>> 
>>
>> Unclear. It is probably UB by omission.
> 
> Agree.
> 
>>  But this is possibly
>> different from the FAM case.
> 
> I don't think I agree on this.  To me it really looks like the same thing.
> BTW, there was a mention there to the FAM case in a comment:
> 
> 
>>
>> I any case, I am not so concerned about the whether this UB, 
>> but that a programmer might do:
>>
>> struct s = { .. }
>> struct s* p = malloc(...)
>> memcpy(p, &s, sizeof s); // copy header. 
> 
> That's (or could be) a bug, and just another manifestation of why
> sizeof(s) is wrong.  Let's see a couple of ways how this can go wrong:
> 
> 
> $ cat memcpy.c 
> #include 
> #include 
> #include 
> 
> struct s {
>   int   i;
>   char  c;
>   char  fam[];
> };
> 
> struct h {
>   int   i;
>   char  c;
> };
> 
> int
> main(void)
> {
>   char  *f;
>   struct h  h = { .i = 42, .c = 3 };
>   struct s  *p;
> 
>   p = malloc(sizeof(struct s) + sizeof("foobar"));
> 
>   strcpy(p->fam, "foobar");
>   /*
>* since we're copying the header, it shouldn't matter if we
>* copy it after copying the fam itself, no?  They're at
>* different locations... or are they?
>*/
>   memcpy(p, &h, sizeof(struct s));

But if you did here

memcpy(p, &h, offsetof(struct s, fam));

>   puts(p->fam);
>   free(p);
> 
>   p = malloc(sizeof(struct s) + sizeof("foobar"));
> 
>   f = mempcpy(p, &h, sizeof(struct s));

and here

f = mempcpy(p, &h, offsetof(struct s, fam));

Then it magically works as expected:

$ ./a.out 
foobar
foobar


>   /*
>* We could reuse the pointer from mempcpy(3) to get the location
>* of just after the header, right?  Heh.
>*/
>   strcpy(f, "foobar");
>   puts(p->fam);
>   free(p);
> }
> $ cc -Wall -Wextra memcpy.c -D_GNU_SOURCE
> $ ./a.out 
> 
> 
> $
> 
> 
> Cheers,
> Alex
> 

-- 

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5



OpenPGP_signature
Description: OpenPGP digital signature


Re: Should GCC warn about sizeof(flexible_struct)?

2023-08-14 Thread Alejandro Colomar via Gcc
Hi Martin,

On 2023-08-14 18:49, Martin Uecker wrote:
> Am Montag, dem 14.08.2023 um 12:21 +0200 schrieb Alejandro Colomar:
[...]

>> Would you mind chiming in to this question?:
>> 
> 
> Unclear. It is probably UB by omission.

Agree.

>  But this is possibly
> different from the FAM case.

I don't think I agree on this.  To me it really looks like the same thing.
BTW, there was a mention there to the FAM case in a comment:


> 
> I any case, I am not so concerned about the whether this UB, 
> but that a programmer might do:
> 
> struct s = { .. }
> struct s* p = malloc(...)
> memcpy(p, &s, sizeof s); // copy header. 

That's (or could be) a bug, and just another manifestation of why
sizeof(s) is wrong.  Let's see a couple of ways how this can go wrong:


$ cat memcpy.c 
#include 
#include 
#include 

struct s {
int   i;
char  c;
char  fam[];
};

struct h {
int   i;
char  c;
};

int
main(void)
{
char  *f;
struct h  h = { .i = 42, .c = 3 };
struct s  *p;

p = malloc(sizeof(struct s) + sizeof("foobar"));

strcpy(p->fam, "foobar");
/*
 * since we're copying the header, it shouldn't matter if we
 * copy it after copying the fam itself, no?  They're at
 * different locations... or are they?
 */
memcpy(p, &h, sizeof(struct s));
puts(p->fam);
free(p);

p = malloc(sizeof(struct s) + sizeof("foobar"));

f = mempcpy(p, &h, sizeof(struct s));
/*
 * We could reuse the pointer from mempcpy(3) to get the location
 * of just after the header, right?  Heh.
 */
strcpy(f, "foobar");
puts(p->fam);
free(p);
}
$ cc -Wall -Wextra memcpy.c -D_GNU_SOURCE
$ ./a.out 


$


Cheers,
Alex

-- 

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5



OpenPGP_signature
Description: OpenPGP digital signature


Re: Should GCC warn about sizeof(flexible_struct)?

2023-08-14 Thread Alejandro Colomar via Gcc
Hi Richard,

On 2023-08-14 08:41, Richard Biener wrote:
> On Fri, Aug 11, 2023 at 8:30 PM Alejandro Colomar via Gcc
>  wrote:

[...]

>> How about some -Wfam-sizeof-arithmetic that would not warn about taking
>> sizeof(s) but would warn if that sizeof is used in any arithmetic?
> 
> There are probably many ways sizeof() plus arithmetic can yield a correct
> size for allocation.  After all _all_ uses of FAM requires allocation
> and there's
> no convenient standard way of calculating the required size (sizeof
> (fam-type[n])?).

You may be confusing sizeof(struct contains_fam) with sizeof(fam[n]).

Yes, the second is necessary for allocation, but the first is not.  Well,
it is valid for allocation but only because allocating extra bytes is not
a problem.

The size of a flexible structure is calculated as the sum of the offset
of the fam, and the size of the fam.  The size of the structure has nothing
to do.

struct s {
int   i;
char  c;
char  fam[];
} s;

size = offsetof(struct s, fam) + sizeof("foobar"));  // OK: 12 B

size = sizeof(struct s) + sizeof("foobar"));  // NOK: 15 B; wastes bytes
   ^~~~ problem here.

> 
> Iff we want to diagnose anything then possibly a computation that looks like
> a size computation but that's actually smaller than required,

It's actually the other way around.  The problem is that the computation may
give a value larger than the expected one.  This can be usually a benign bug
and nothing bad will happen.  But when a programmer relies on that size for
reading or writing, which I've seen happen, you better make sure that the
structure has no padding, or you'll be reading/writing at `fam + padding`.

Example, using the allocation above:

strcpy((char *) s + sizeof(struct s), "foobar");  // NOK, writes after 
padding
puts(s->fam);  // OK, reads the fam, but surprise

// Unpredictable; prints 3 uninitialized bytes, then (if no previous 
'\0') "foobar"

strcpy(s->fam, "foobar");  // OK, writes at the fam
puts((char *) s + sizeof(struct s));  // NOK, reads after padding

// prints: "bar"

strcpy(s->fam, "foobar");  // OK
puts((char *) s + offsetof(struct s, fam));  // OK; with offsetof, 
equivalent to s->fam

// prints: "foobar"

strcpy((char *) s + offsetof(struct s, fam), "foobar");  // Also OK
puts(s->fam);  // OK

// prints: "foobar"

> but
> other than that - what
> would you suggest to fix such reported warnings?

To fix the warnings, replace all invocations of `sizeof(struct contains_fam)`
by `offsetof(struct contains_fam, fam)`.

Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5



OpenPGP_signature
Description: OpenPGP digital signature


Re: Should GCC warn about sizeof(flexible_struct)?

2023-08-11 Thread Alejandro Colomar via Gcc
On 2023-08-11 20:29, Alejandro Colomar wrote:
> Hi!
> 
> Structures with flexible array members have restrictions about being
> used in arrays or within other structures, as the size of the enclosing
> aggregate type would be... inconsistent.
> 
> In general, sizeof(flexible_struct) is a problematic thing that rarely
> means what programmers think it means.  It is not the size of the
> structure up to the flexible array member; or expressed using C,
> the following can be true:
> 
>   sizeof(s) != offsetof(s, fam)
> 
> See the program at the bottom that demonstrates how this is problematic.
> 
> It's true that if one uses
> 
>   malloc(offseof(s, fam) + sizeof_member(s, fam[0]) * N);
> 
> and N is very small (0 or 1 usually), the allocation would be smaller
> than the object size, which for GCC seems to be fine, but I'm worried the
> standard is not clear enough about its validity[1].
> 
> [1]: 
> 
> To avoid having UB there, pedantically one would need to call
> 
>   malloc(MAX(sizeof(s), offseof(s, fam) + sizeof_member(s, fam[0]) * N));
> 
> But I think that's the only correct use of sizeof() with structures
> containing flexible array members.  So it seems sizeof() by itself is
> a valid thing, but when adding it to something else to get the total size,
> or doing any arithmetic with it, that's dubious code.
> 
> How about some -Wfam-sizeof-arithmetic that would not warn about taking
> sizeof(s) but would warn if that sizeof is used in any arithmetic?
> 
> Cheers,
> Alex
> 
> ---
> 
> $ cat off.c 
> #include 
> #include 
> #include 
> #include 
> #include 
> 
> 
> struct s {
>   int   i;
>   char  c;
>   char  fam[];
> };
> 
> 
> static inline void *xmalloc(size_t size);
> 
> 
> int
> main(void)
> {
>   char  *p;
>   struct s  *s;
> 
>   printf("sizeof: %zu\n", sizeof(struct s));
>   printf("offsetof: %zu\n", offsetof(struct s, fam));
> 
>   puts("\nWith sizeof():");
> 
>   s = xmalloc(sizeof(struct s) + sizeof("Hello, sizeof!"));
>   strcpy(s->fam, "Hello, sizeof!");
>   p = (char *) s + sizeof(struct s);
>   puts(p);
>   free(s);
> 
>   puts("\nWith offsetof(3):");
> 
>   s = xmalloc(offsetof(struct s, fam) + sizeof("Hello, offsetof!"));
>   strcpy(s->fam, "Hello, offsetof!");
>   p = (char *) s + offsetof(struct s, fam);
>   puts(p);
>   free(s);
> 
>   exit(EXIT_SUCCESS);
> }
> 
> 
> static inline void *
> xmalloc(size_t size)
> {
>   void  *p;
> 
>   p = malloc(size);
>   if (p == NULL)
>   err(EXIT_FAILURE, "malloc");
>   return p;
> }
> $ gcc-13 -Wall -Wextra -Wpadded -fanalyzer off.c
> off.c:12:1: warning: padding struct size to alignment boundary with 3 bytes 
> [-Wpadded]
>12 | };
>   | ^

I forgot to paste the output of the program:)

$ ./a.out 
sizeof: 8
offsetof: 5

With sizeof():
lo, sizeof!

With offsetof(3):
Hello, offsetof!

> 
> 
> The only warning I know that is triggered in the code above is -Wpadded,
> which is related to this problem, but I think there should be something
> to warn about sizeof() in this context.
> 
> 

-- 

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5



OpenPGP_signature
Description: OpenPGP digital signature


Should GCC warn about sizeof(flexible_struct)?

2023-08-11 Thread Alejandro Colomar via Gcc
Hi!

Structures with flexible array members have restrictions about being
used in arrays or within other structures, as the size of the enclosing
aggregate type would be... inconsistent.

In general, sizeof(flexible_struct) is a problematic thing that rarely
means what programmers think it means.  It is not the size of the
structure up to the flexible array member; or expressed using C,
the following can be true:

sizeof(s) != offsetof(s, fam)

See the program at the bottom that demonstrates how this is problematic.

It's true that if one uses

malloc(offseof(s, fam) + sizeof_member(s, fam[0]) * N);

and N is very small (0 or 1 usually), the allocation would be smaller
than the object size, which for GCC seems to be fine, but I'm worried the
standard is not clear enough about its validity[1].

[1]: 

To avoid having UB there, pedantically one would need to call

malloc(MAX(sizeof(s), offseof(s, fam) + sizeof_member(s, fam[0]) * N));

But I think that's the only correct use of sizeof() with structures
containing flexible array members.  So it seems sizeof() by itself is
a valid thing, but when adding it to something else to get the total size,
or doing any arithmetic with it, that's dubious code.

How about some -Wfam-sizeof-arithmetic that would not warn about taking
sizeof(s) but would warn if that sizeof is used in any arithmetic?

Cheers,
Alex

---

$ cat off.c 
#include 
#include 
#include 
#include 
#include 


struct s {
int   i;
char  c;
char  fam[];
};


static inline void *xmalloc(size_t size);


int
main(void)
{
char  *p;
struct s  *s;

printf("sizeof: %zu\n", sizeof(struct s));
printf("offsetof: %zu\n", offsetof(struct s, fam));

puts("\nWith sizeof():");

s = xmalloc(sizeof(struct s) + sizeof("Hello, sizeof!"));
strcpy(s->fam, "Hello, sizeof!");
p = (char *) s + sizeof(struct s);
puts(p);
free(s);

puts("\nWith offsetof(3):");

s = xmalloc(offsetof(struct s, fam) + sizeof("Hello, offsetof!"));
strcpy(s->fam, "Hello, offsetof!");
p = (char *) s + offsetof(struct s, fam);
puts(p);
free(s);

exit(EXIT_SUCCESS);
}


static inline void *
xmalloc(size_t size)
{
void  *p;

p = malloc(size);
if (p == NULL)
err(EXIT_FAILURE, "malloc");
return p;
}
$ gcc-13 -Wall -Wextra -Wpadded -fanalyzer off.c
off.c:12:1: warning: padding struct size to alignment boundary with 3 bytes 
[-Wpadded]
   12 | };
  | ^


The only warning I know that is triggered in the code above is -Wpadded,
which is related to this problem, but I think there should be something
to warn about sizeof() in this context.


-- 

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5


OpenPGP_signature
Description: OpenPGP digital signature


Re: ISO C's [static] (was: _Nullable and _Nonnull in GCC's analyzer)

2023-08-09 Thread Alejandro Colomar via Gcc
Hi Martin!

On 2023-08-09 14:03, Martin Uecker wrote:
[...]

>> GCC could perfectly add a warning for the following case:
>>
>> void foo(size_t n, int a[n]);
>>
>> int
>> main(void)
>> {
>> int a[7];
>>
>> foo(42, a);
>> }
>>
>> Nobody in their right mind would specify a size of an array
>> in a parameter and expect that passing a smaller array than
>> that can produce a valid program.  So, why not make that a
>> Wall warning?
> 
> But we have this warning! is even activated by 
> default without -Wall and already since GCC 11:

Ahh, I hadn't realized that was added (relatievly) recently.
That's great news!  Thanks!

> https://godbolt.org/z/sMbTon458
> 
> But this is for minimum required elements. How do 
> we differentiate between null and non-null?
> 
> We have:
> 
> int[] or int* // no bound, nullable
> int[N]  // at least N, nullable
> int[static N] // at least N, nonnull
> 
> The 'static' implies nonnull, so we could 
> use 'static' to diffentiate between nonnull 
> and nullable. 

Since static is only for arrays (okay, they're all pointers, but
it's only usable with array syntax), and nullability is a thing
of pointers, I think static is not the right choice.

I oppose using array syntax for pointers, as it can confuse
programmers.

In any case, [static] is not that different from [_Nonnull], and
_Nonnull is more flexible, since you can also use it as
*_Nonnull.

Regarding the issues you have with _Nonnull being a qualifier,
I've been thinking about it for a long time and don't yet have
a concrete answer.  The more I think about it, the more I'm
convinced it is like `restrict`.  You can't have null correctness
as we can do with `const`.  With const, we know it's always bad
to store a const object in a non-const pointer.  With NULL, it
depends on the context: if you've checked for NULL in the lines
above, then it's fine to do the conversion.

There is a proposal for Clang to add `_Optional`, a qualifier to
the pointee, as a more correct version of _Nullable.  The
following would be equivalent:

int *_Nullable  i;
_Optional int   *j;

However, I don't believe it's a good choice, because of the above.
Instead, I think _Nullable or _Nonnull should be like `restrict`
and used only in function prototypes.  Let the analyzer do the
job.  I know `restrict` is hard to grasp, but I couldn't think of
anything better.

> 
> What is missing something which implies bounds
> also inside the callee.  You can use the "access"
> attribute or we extend the meaning of int[N]
> and int[static N] also imply a maximum bound.

Why is the upper bound important?  It's usually irrelevant.

In the case where one wants to make sure that an array is of an
exact size (instead of just "at least"), pointers to arrays can be
used.  They're just not often used, because you don't need that
so often.  I don't think we need a new addition to the language,
do we?

$ cat ap.c
#include 

void
foo(size_t n, int (*a)[n])
{
for (size_t i = 0; i < n; i++)
(*a)[i] = 42;
}
$ gcc-13 -S -Wall -Wextra ap.c 
$ 

In the function above, n is not a lower bound, but an exact bound.

GCC should add a warning for the following code:

$ cat ap2.c
#include 

void foo(size_t n, int (*a)[n]);

void
bar(void)
{
int x[7];

foo(3, &x);
foo(9, &x);
}
$ gcc-13 -S -Wall -Wextra ap2.c
$ 

Neither GCC nor Clang warn about that.  Not even with -fanalyzer.

Cheers,
Alex

-- 

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5



OpenPGP_signature
Description: OpenPGP digital signature


ISO C's [static] (was: _Nullable and _Nonnull in GCC's analyzer)

2023-08-09 Thread Alejandro Colomar via Gcc
Hi Martin,

On 2023-08-09 09:26, Martin Uecker wrote:
> it is a bit cumbersome to use, but one can use [static]
> instead, which gives you the same static warnings.
> 
> [static] does not work with __builtin_dynamic_object_size,
> but maybe this could be changed (there is a bug filed.)
> 
> I am not sure whether [static] should imply [[gnu::nonnull]]

I have a gripe with ISO C's [static].  As you mention, ISO
conflated two functionalities in [static]:

-  The size of the array passed as argument must not be less
   than the size specified in the parameter's [].

-  The pointer must not be NULL.

And there are valid cases where you may want the first but
not the second.  Or the second but not the first (that's the
case for _Nonnull, of course).

In fact, it's so badly damaged, that it prompted a proposal
to ISO C of using [static 1] as an equivalent of _Nonnull in
the prototypes that accepted a pointer that should not be
NULL.  However, that proposal didn't include the functions
that actually take arrays as input (because they are taken
in the opposite order, so array syntax is not legal).  Don't
you find it ironic that ISO C could have used array syntax
for pointers and pointer syntax for arrays?  I do.

As for when one would want to mean the first (size of array)
but not _Nonnull: for a function where you may pass either
an array (which should not be smaller than the size), or a
sentinel NULL value.

Nevertheless, I floated the idea that [static] is completely
unnecessary, and nobody has yet been against it.

GCC could perfectly add a warning for the following case:

void foo(size_t n, int a[n]);

int
main(void)
{
int a[7];

foo(42, a);
}

Nobody in their right mind would specify a size of an array
in a parameter and expect that passing a smaller array than
that can produce a valid program.  So, why not make that a
Wall warning?

And so [static] would be irrelevant in GNU C, because well,
what does it add?  In fact, I like that [static] is so badly
designed, because then we can repurpose plain [size] to mean
the right thing, which would produce cleaner programs
([static] just adds noise to the source).

What do you think of giving [42] a meaning, instead of just
ignoring it?

Cheers,
Alex

> which would then also trigger the optimization. I think
> clang uses it for optimization.
> 
> Martin

-- 

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5



OpenPGP_signature
Description: OpenPGP digital signature


Re: _Nullable and _Nonnull in GCC's analyzer (was: [PATCH v5] libio: Add nonnull attribute for most FILE * arguments in stdio.h)

2023-07-10 Thread Alejandro Colomar via Gcc

On 7/10/23 22:14, Alejandro Colomar wrote:

[CC += Andrew]

Hi Xi, Andrew,

On 7/10/23 20:41, Xi Ruoyao wrote:

Maybe we should have a weaker version of nonnull which only performs the
diagnostic, not the optimization.  But it looks like they hate the idea:
https://gcc.gnu.org/PR110617.


This is the one thing that makes me use both Clang and GCC to compile,
because while any of them would be enough to build, I want as much
static analysis as I can get, and so I want -fanalyzer (so I need GCC),
but I also use _Nullable (so I need Clang).

If GCC had support for _Nullable, I would have in GCC the superset of
features that I need from both in a single vendor.  Moreover, Clang's
static analyzer is brain-damaged (sorry, but it doesn't have a simple
command line to run it, contrary to GCC's easy -fanalyzer), so having
GCC's analyzer get over those _Nullable qualifiers would be great.

Clang's _Nullable (and _Nonnull) are not very useful outside of analyzer
mode, as there are many cases where the compiler doesn't have enough
information, and the analyzer can get rid of false negatives and
positives.  See: 

I'll back the ask for the qualifiers in GCC, for compatibility with
Clang.


BTW, Bionic libc is adding those qualifiers:






Thanks,
Alex



--

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5



OpenPGP_signature
Description: OpenPGP digital signature


_Nullable and _Nonnull in GCC's analyzer (was: [PATCH v5] libio: Add nonnull attribute for most FILE * arguments in stdio.h)

2023-07-10 Thread Alejandro Colomar via Gcc

[CC += Andrew]

Hi Xi, Andrew,

On 7/10/23 20:41, Xi Ruoyao wrote:

Maybe we should have a weaker version of nonnull which only performs the
diagnostic, not the optimization.  But it looks like they hate the idea:
https://gcc.gnu.org/PR110617.


This is the one thing that makes me use both Clang and GCC to compile,
because while any of them would be enough to build, I want as much
static analysis as I can get, and so I want -fanalyzer (so I need GCC),
but I also use _Nullable (so I need Clang).

If GCC had support for _Nullable, I would have in GCC the superset of
features that I need from both in a single vendor.  Moreover, Clang's
static analyzer is brain-damaged (sorry, but it doesn't have a simple
command line to run it, contrary to GCC's easy -fanalyzer), so having
GCC's analyzer get over those _Nullable qualifiers would be great.

Clang's _Nullable (and _Nonnull) are not very useful outside of analyzer
mode, as there are many cases where the compiler doesn't have enough
information, and the analyzer can get rid of false negatives and
positives.  See: 

I'll back the ask for the qualifiers in GCC, for compatibility with
Clang.

Thanks,
Alex

--

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5



OpenPGP_signature
Description: OpenPGP digital signature


Re: [wish] Flexible array members in unions

2023-05-11 Thread Alejandro Colomar via Gcc
Hi Joseph, Kees,

On 5/12/23 00:52, Joseph Myers wrote:
> On Thu, 11 May 2023, Kees Cook via Gcc wrote:
> 
>> Okay, understood. If this is a C-only thing, we can ignore the C++
>> impact.
> 
> We're a lot more careful lately in WG14 about checking for C++ 
> compatibility issues and expecting approval from the liaison group for 
> anything with possible compatibility concerns for syntax in the common 
> subset of C and C++.  So, no, we can't ignore the C++ impact for adding 
> empty types; it would need careful consideration in the liaison group.
> 
>> What depends on the "different objects have different addresses"
>> principle? And why do unions not break this -- they could point to the
>> same locations within the object? And don't flexible arrays already need
>> special handling in this regard?
> 
> "including a pointer to an object and a subobject at its beginning" and 
> "one is a pointer to one past the end of one array object and the other is 
> a pointer to the start of a different array object that happens to 
> immediately follow the first array object in the address space" are both 
> cases included in the semantics for comparison operators.  If you allow 
> zero-size objects you get more special cases there (and quite possibly 
> affect optimizations based on points-to analysis that can determine 
> pointers are based on different objects, if an object is not known at 
> compile time to have nonzero size).

Since GNU C already supports empty structs, how about allowing that in GCC
with no intention of adding it to ISO C?  We'll see how good it behaves in
GCC, and if so suggest it for inclusion in the standard.

Why should GNU C, which allows empty structures, and de facto supports
flexible arrays in empty structs and in unions (via the empty preceeding
struct and the wrapper struct tricks, as the kernel does), shouldn't
support them officially without tricks?

Apart from violating artificial rules that disallow that use of flexible
arrays, I believe the example program I provided in the first post
doesn't violate aliasing rules or other similar rules that would result
in optimization conflicts.  Does it?

About zero-sized types, a union consisting of only flexible-array members
would effectively be a zero-sized type, so GCC should have similar issues
having this in unions and in empty structs.  So far, I don't see GCC
complaining about such horrible thing as an array of empty structs:

$ cat arr.c
#include 

struct s {};

struct s x[10];

int
main(void)
{
printf("x:%zu, %p\n", sizeof(x), &x);
printf("x[3]: %zu, %p\n", sizeof(x[3]), &x[3]);
}
$ gcc-13 arr.c -Wall -Wextra
$ ./a.out 
x:0, 0x55c5f6d72019
x[3]: 0, 0x55c5f6d72019



So, in GNU C land, I think it is reasonable to add this feature.

Cheers,
Alex

-- 

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5


OpenPGP_signature
Description: OpenPGP digital signature


Re: [wish] Flexible array members in unions

2023-05-11 Thread Alejandro Colomar via Gcc
[CC += Kees, Andrew]

[start of thread: 
]

On 5/11/23 18:07, Alejandro Colomar wrote:
> Hi!
> 
> Currently, one can have pseudo-flexible array members in unions with
> [0] syntax, but it's not allowed with [] syntax.
> 
> Here's an example of how it is possible today:
> 
> struct s {
>   ...
> 
>   size_t  n;
>   union {
>   ptrdiff_t  off[0];  // [n]; offsets from s->data.
>   char   data[0];
>   };
> };
> 
> which is useful to have a structure with two (or really several)
> consecutive flexible arrays: one of offsets, which mark the positions
> of data, and another with the actual data.  Below goes an example
> program, which works fine with GCC, and I believe rewriting it to
> not use the union would make it less clear, since I'd need to add
> casts to it.
> 
> It works thanks to [0] pseudo-flexible arrays, but it doesn't
> compile with C99 flexible arrays.  And of course, [0] arrays have
> issues with -fstrict-flex-arrays=3.
> 
> 
> $ cat flexi4.c 
> #include 
> #include 
> #include 
> #include 
> 
> struct s {
>   size_t n;
>   union {
>   ptrdiff_t  off[0];
>   char   data[0];
>   };
> };
> 
> int
> main(void)
> {
>   char  *p;
>   struct s  *s;
> 
>   s = malloc(offsetof(struct s, off) +
>  sizeof(ptrdiff_t) * 2 +
>  sizeof("foobar") + sizeof("baz"));
> 
>   s->n = 2;
>   p = s->data + sizeof(ptrdiff_t) * s->n;
> 
>   s->off[0] = p - s->data;
>   p = stpcpy(p, "foobar") + 1;
>   s->off[1] = p - s->data;
>   p = stpcpy(p, "baz") + 1;
> 
>   puts(s->data + s->off[0]);
>   puts(s->data + s->off[1]);
> 
>   free(s);
> }
> $ gcc-13 -Wall -Wextra -Werror -fanalyzer \
>  -fsanitize=undefined -fsanitize=address \
>  -D_FORTIFY_SOURCE=3 -fstrict-flex-arrays=2 \
>  flexi4.c 
> $ ./a.out 
> foobar
> baz
> $ gcc-13 -Wall -Wextra -Werror -fanalyzer \
>  -fsanitize=undefined -fsanitize=address \
>  -D_FORTIFY_SOURCE=3 -fstrict-flex-arrays=3 \
>  flexi4.c 
> $ ./a.out 
> flexi4.c:27:8: runtime error: index 0 out of bounds for type 'ptrdiff_t [*]'
> flexi4.c:29:8: runtime error: index 1 out of bounds for type 'ptrdiff_t [*]'
> flexi4.c:32:23: runtime error: index 0 out of bounds for type 'ptrdiff_t [*]'
> foobar
> flexi4.c:33:23: runtime error: index 1 out of bounds for type 'ptrdiff_t [*]'
> baz
> 
> 
> Would you allow flexible array members in unions?  Is there any
> strong reason to disallow them?
> 
> Currently, I get:
> 
> $ gcc-13 -Wall -Wextra -fanalyzer \
>  -fsanitize=undefined -fsanitize=address \
>  -D_FORTIFY_SOURCE=3 -fstrict-flex-arrays=3 \
>  flexi4-true.c 
> flexi4-true.c:9:28: error: flexible array member in union
> 9 | ptrdiff_t  off[];
>   |^~~
> flexi4-true.c:10:28: error: flexible array member in union
>10 | char   data[];
>   |^~~~
> 
> 

Currently, the Linux kernel has to go through some hoops due to this
restriction:


$ grepc -tm __DECLARE_FLEX_ARRAY *
include/uapi/linux/stddef.h:42:
#define __DECLARE_FLEX_ARRAY(TYPE, NAME)\
struct { \
struct { } __empty_ ## NAME; \
TYPE NAME[]; \
}


tools/include/uapi/linux/stddef.h:42:
#define __DECLARE_FLEX_ARRAY(TYPE, NAME)\
struct { \
struct { } __empty_ ## NAME; \
TYPE NAME[]; \
}
$ grep -rnB2 __DECLARE_FLEX_ARRAY * | head
include/uapi/rdma/rdma_user_rxe.h-153-  __u32   reserved;
include/uapi/rdma/rdma_user_rxe.h-154-  union {
include/uapi/rdma/rdma_user_rxe.h:155:  __DECLARE_FLEX_ARRAY(__u8, 
inline_data);
include/uapi/rdma/rdma_user_rxe.h:156:  __DECLARE_FLEX_ARRAY(__u8, 
atomic_wr);
include/uapi/rdma/rdma_user_rxe.h:157:  __DECLARE_FLEX_ARRAY(struct 
rxe_sge, sge);
--
include/uapi/scsi/scsi_netlink_fc.h-53- union {
include/uapi/scsi/scsi_netlink_fc.h-54- __u32 event_data;
include/uapi/scsi/scsi_netlink_fc.h:55: __DECLARE_FLEX_ARRAY(__u8, 
event_data_flex);
--




-- 

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5


OpenPGP_signature
Description: OpenPGP digital signature


[wish] Flexible array members in unions

2023-05-11 Thread Alejandro Colomar via Gcc
Hi!

Currently, one can have pseudo-flexible array members in unions with
[0] syntax, but it's not allowed with [] syntax.

Here's an example of how it is possible today:

struct s {
...

size_t  n;
union {
ptrdiff_t  off[0];  // [n]; offsets from s->data.
char   data[0];
};
};

which is useful to have a structure with two (or really several)
consecutive flexible arrays: one of offsets, which mark the positions
of data, and another with the actual data.  Below goes an example
program, which works fine with GCC, and I believe rewriting it to
not use the union would make it less clear, since I'd need to add
casts to it.

It works thanks to [0] pseudo-flexible arrays, but it doesn't
compile with C99 flexible arrays.  And of course, [0] arrays have
issues with -fstrict-flex-arrays=3.


$ cat flexi4.c 
#include 
#include 
#include 
#include 

struct s {
size_t n;
union {
ptrdiff_t  off[0];
char   data[0];
};
};

int
main(void)
{
char  *p;
struct s  *s;

s = malloc(offsetof(struct s, off) +
   sizeof(ptrdiff_t) * 2 +
   sizeof("foobar") + sizeof("baz"));

s->n = 2;
p = s->data + sizeof(ptrdiff_t) * s->n;

s->off[0] = p - s->data;
p = stpcpy(p, "foobar") + 1;
s->off[1] = p - s->data;
p = stpcpy(p, "baz") + 1;

puts(s->data + s->off[0]);
puts(s->data + s->off[1]);

free(s);
}
$ gcc-13 -Wall -Wextra -Werror -fanalyzer \
 -fsanitize=undefined -fsanitize=address \
 -D_FORTIFY_SOURCE=3 -fstrict-flex-arrays=2 \
 flexi4.c 
$ ./a.out 
foobar
baz
$ gcc-13 -Wall -Wextra -Werror -fanalyzer \
 -fsanitize=undefined -fsanitize=address \
 -D_FORTIFY_SOURCE=3 -fstrict-flex-arrays=3 \
 flexi4.c 
$ ./a.out 
flexi4.c:27:8: runtime error: index 0 out of bounds for type 'ptrdiff_t [*]'
flexi4.c:29:8: runtime error: index 1 out of bounds for type 'ptrdiff_t [*]'
flexi4.c:32:23: runtime error: index 0 out of bounds for type 'ptrdiff_t [*]'
foobar
flexi4.c:33:23: runtime error: index 1 out of bounds for type 'ptrdiff_t [*]'
baz


Would you allow flexible array members in unions?  Is there any
strong reason to disallow them?

Currently, I get:

$ gcc-13 -Wall -Wextra -fanalyzer \
 -fsanitize=undefined -fsanitize=address \
 -D_FORTIFY_SOURCE=3 -fstrict-flex-arrays=3 \
 flexi4-true.c 
flexi4-true.c:9:28: error: flexible array member in union
9 | ptrdiff_t  off[];
  |^~~
flexi4-true.c:10:28: error: flexible array member in union
   10 | char   data[];
  |^~~~


Cheers,
Alex

-- 

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5


OpenPGP_signature
Description: OpenPGP digital signature


[PATCH v3] sockaddr.3type: POSIX Issue 8 will solve strict-aliasing issues with these types

2023-04-21 Thread Alejandro Colomar via Gcc
Link: 
Reported-by: Bastien Roucariès 
Reported-by: Alejandro Colomar 
Reviewed-by: Eric Blake 
Cc: glibc 
Cc: GCC 
Cc: Stefan Puiu 
Cc: Igor Sysoev 
Cc: Rich Felker 
Cc: Andrew Clayton 
Cc: Richard Biener 
Cc: Zack Weinberg 
Cc: Florian Weimer 
Cc: Joseph Myers 
Cc: Jakub Jelinek 
Cc: Sam James 
Signed-off-by: Alejandro Colomar 
---

Hi Eric,

I took your informal review as a "Reviewed-by".  Please confirm.
I've also modified the small wording thingy you suggested.

I'll float this patch in the list in case anyone has comments, and
will push some time this weekend (depending on many variables).

Cheers,
Alex

 man3type/sockaddr.3type | 8 
 1 file changed, 8 insertions(+)

diff --git a/man3type/sockaddr.3type b/man3type/sockaddr.3type
index 2fdf56c59..cf8d601f5 100644
--- a/man3type/sockaddr.3type
+++ b/man3type/sockaddr.3type
@@ -117,6 +117,14 @@ .SH HISTORY
 was invented by POSIX.
 See also
 .BR accept (2).
+.PP
+These structures were invented before modern ISO C strict-aliasing rules.
+If aliasing rules are applied strictly,
+these structures would be extremely difficult to use
+without invoking Undefined Behavior.
+POSIX Issue 8 will fix this by requiring that implementations
+make sure that these structures
+can be safely used as they were designed.
 .SH NOTES
 .I socklen_t
 is also defined in
-- 
2.40.0



Re: [PATCH] sockaddr.3type: Document that sockaddr_storage is the API to be used

2023-04-21 Thread Alejandro Colomar via Gcc


On 4/21/23 16:58, Alejandro Colomar wrote:
> Hi Eric,
> 
> On 4/14/23 18:08, Zack Weinberg via Libc-alpha wrote:
>> On 2023-04-06 3:37 PM, Eric Blake wrote:
>>> Since Issue 7 is tied to C99, and Issue 8 will be tied to C17, both of
>>> which use the same section number despite being a different edition of
>>> the C standard, being that specific may work.  Or, we might try
>>> something focusing more on wording instead of document location, as
>>> in:
>>>
>>> Additionally, the structures shall be defined in such a way that the
>>> compiler treats an access to the stored value of the sa_family_t
>>> member of any of these structures, via an lvalue expression whose type
>>> involves any other one of these structures, as permissible even if the
>>> types involved would not otherwise be deemed compatible with the
>>> effective type of the object ultimately being accessed.
> 
> The wording I see in 
> doesn't seem to cover the case of aliasing a sockaddr_storage as a
> protocol-specific address for setting other members.
> 
> Aliasing rules don't allow one to declare an object of type
> sockaddr_storage and then fill the structure as if it were another
> structure, even if alignment and size are correct.  We would need
> some wording that says something like:
> 
> When a pointer to a sockaddr_storage structure is first aliased as a
> pointer to a protocol-specific address structure, the effective type
> of the object will be set to the protocol-specific structure.
> 
> This is similar to what happens when malloc(3) is assigned to a
> non-character type.  That's a big hammer, but it does the job.  Maybe
> we would need some looser language?  I CCd GCC, in case they have
> concerns about this wording.
> 
> Cheers,
> Alex
> 
>>
>> I quite like this way of putting it.  It subsumes both what I wrote and 
>> the related potential headache with deciding whether the sa_family_t 
>> field is considered an object or just a range of bytes within a larger 
>> object.
>>
>> zw
> 

For the man pages, I've rewritten it to the following:


$ git diff
diff --git a/man3type/sockaddr.3type b/man3type/sockaddr.3type
index 2fdf56c59..e610aa0f5 100644
--- a/man3type/sockaddr.3type
+++ b/man3type/sockaddr.3type
@@ -117,6 +117,14 @@ .SH HISTORY
 was invented by POSIX.
 See also
 .BR accept (2).
+.PP
+These structures were invented before modern ISO C strict-aliasing rules.
+If aliasing rules are applied strictly,
+these structures would be impossible to use
+without invoking Undefined Behavior (UB).
+POSIX Issue 8 will fix this by requiring that implementations
+make sure that these structures
+can be safely used as they were designed.
 .SH NOTES
 .I socklen_t
 is also defined in


I guess this is simple enough that it should work as documentation.

-- 

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH] sockaddr.3type: Document that sockaddr_storage is the API to be used

2023-04-21 Thread Alejandro Colomar via Gcc
Hi Eric,

On 4/14/23 18:08, Zack Weinberg via Libc-alpha wrote:
> On 2023-04-06 3:37 PM, Eric Blake wrote:
>> Since Issue 7 is tied to C99, and Issue 8 will be tied to C17, both of
>> which use the same section number despite being a different edition of
>> the C standard, being that specific may work.  Or, we might try
>> something focusing more on wording instead of document location, as
>> in:
>>
>> Additionally, the structures shall be defined in such a way that the
>> compiler treats an access to the stored value of the sa_family_t
>> member of any of these structures, via an lvalue expression whose type
>> involves any other one of these structures, as permissible even if the
>> types involved would not otherwise be deemed compatible with the
>> effective type of the object ultimately being accessed.

The wording I see in 
doesn't seem to cover the case of aliasing a sockaddr_storage as a
protocol-specific address for setting other members.

Aliasing rules don't allow one to declare an object of type
sockaddr_storage and then fill the structure as if it were another
structure, even if alignment and size are correct.  We would need
some wording that says something like:

When a pointer to a sockaddr_storage structure is first aliased as a
pointer to a protocol-specific address structure, the effective type
of the object will be set to the protocol-specific structure.

This is similar to what happens when malloc(3) is assigned to a
non-character type.  That's a big hammer, but it does the job.  Maybe
we would need some looser language?  I CCd GCC, in case they have
concerns about this wording.

Cheers,
Alex

> 
> I quite like this way of putting it.  It subsumes both what I wrote and 
> the related potential headache with deciding whether the sa_family_t 
> field is considered an object or just a range of bytes within a larger 
> object.
> 
> zw

-- 

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH] sockaddr.3type: Document that sockaddr_storage is the API to be used

2023-04-06 Thread Alejandro Colomar via Gcc
Hi Eric,

On 4/6/23 18:24, Eric Blake wrote:
> On Wed, Apr 05, 2023 at 02:42:04AM +0200, Alejandro Colomar wrote:
>> Hi Eric,
>>
>> I'm going to reply both your emails here so that GCC is CCed, and they can
>> suggest better stuff.  I'm worried about sending something to POSIX without
>> enough eyes checking it.  So this will be a long email.
> 
> Because your mail landed in a publicly archived mailing list, the
> POSIX folks saw it anyways ;)

:)

> 
> ...
>>>
>>> Whether gcc already has all the attributes you need is not my area of
>>> expertise.  In my skim of the glibc list conversation, I saw mention
>>> of attribute [[gnu:transparent_union]] rather than [[__may_alias__]] -
>>> if that's a better implementation-defined extension that does what we
>>> need, then use it.  The standard developers were a bit uncomfortable
>>> directly putting [[gnu:transparent_union]] in the standard, but
>>> [[__may_alias__]] was noncontroversial (it's in the namespace reserved
>>> for the implementation)
>>
>> Not really; implementation-defined attributes are required to use an
>> implementation-defined prefix like 'gnu::'.  So [[__may_alias__]] is
>> reserved by ISO C, AFAIR.  Maybe it would be better to just mention
>> attributes without any specific attribute name; being fuzzy about it
>> would help avoid making promises that we can't hold.
> 
> On this point, the group agreed, and we intentionally loosened to
> wording to just mention an implementation-defined extension, rather
> than giving any specific attribute name.
> 
> ...
>>
>> I would just make it more fuzzy about which standard version did what.
>> How about this?:
>>
>> [[
>> Note that defining the sockaddr_storage and sockaddr structures using
>> only mechanisms defined in editions of the ISO C standard may produce
>> aliasing diagnostics.  Because of the large body of existing code
>> utilizing sockets in a way that could trigger undefined behavior due
>> to strict aliasing rules, this standard mandates that the various socket
>> address structures can alias each other for accessing their first member,
> 
> The sa_family_t member is not necessarily the first member on all
> platforms (it happens to be first in Linux, but as a counter-example,
> https://man.freebsd.org/cgi/man.cgi?query=unix&sektion=4 shows
> sun_family as the second one-byte field in struct sockaddr_un).  The
> emphasis is on derefencing the family member (whatever offset it is
> at) to learn what cast to use to then safely access the rest of the
> storage.
> 
> As such, here's the updated wording that the Austin Group tried today
> (and we plan on starting a 30-day interpretation feedback window if
> there are still adjustments to be made to the POSIX wording):
> 
> https://austingroupbugs.net/view.php?id=1641#c6255

Thanks!  That wording (both paragraphs) LGTM.

Cheers,
Alex

-- 

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH] sockaddr.3type: Document that sockaddr_storage is the API to be used

2023-04-04 Thread Alejandro Colomar via Gcc
Hi Eric,

I'm going to reply both your emails here so that GCC is CCed, and they can
suggest better stuff.  I'm worried about sending something to POSIX without
enough eyes checking it.  So this will be a long email.


On 3/30/23 20:36, eblake wrote:
> On Thu, Mar 30, 2023 at 06:25:30PM +0200, Alejandro Colomar wrote:
>> Hi Eric!
>>
>> On 3/30/23 17:22, Austin Group Bug Tracker via austin-group-l at The Open 
>> Group wrote:
>>>
>>> The following issue has been RESOLVED. 
>>> == 
>>> https://austingroupbugs.net/view.php?id=1641
> ...
>>
>> Thanks for taking care of this bug!
> 
> My pleasure.

:-)

> 
>>
>>
>> On 3/30/23 17:20, Austin Group Bug Tracker via austin-group-l at The Open 
>> Group wrote:
>>> On page 386 line 13115 section  DESCRIPTION, change:
>>>
>>> When a pointer to a sockaddr_storage structure is cast as a pointer to 
>>> a sockaddr structure, the ss_family field of the sockaddr_storage structure 
>>> shall map onto the sa_family field of the sockaddr structure. When a 
>>> pointer to a sockaddr_storage structure is cast as a pointer to a 
>>> protocol-specific address structure, the ss_family field shall map onto a 
>>> field of that structure that is of type sa_family_t and that identifies the 
>>> protocol’s address family.
>>>
>>> to:
>>>
>>> When a pointer to a sockaddr_storage structure is cast as a pointer to 
>>> a sockaddr structure, or vice versa, the ss_family field of the 
>>> sockaddr_storage structure shall map onto the sa_family field of the 
>>> sockaddr structure. When a pointer to a sockaddr_storage structure is cast 
>>> as a pointer to a protocol-specific address structure, or vice versa, the 
>>> ss_family field shall map onto a field of that structure that is of type 
>>> sa_family_t and that identifies the protocol’s address family. When a 
>>> pointer to a sockaddr structure is cast as a pointer to a protocol-specific 
>>> address structure, or vice versa, the sa_family field shall map onto a 
>>> field of that structure that is of type sa_family_t and that identifies the 
>>> protocol’s address family. Additionally, the structures shall be defined in 
>>> such a way that these casts do not cause the compiler to produce 
>>> diagnostics about aliasing issues when compiling conforming application 
>>> (xref to XBD section 2.2) source files.
>>
>> I will add a CAVEATS section in sockaddr(3type) covering this, and will
>> CC you just to check.
> 
> Sure, I'll be happy to review.
> 
> The intent from the meeting (and perhaps requires a bit of reading
> between the lines compared to what was captured as the approved text)
> was that implementations MUST ensure that existing code does not get
> miscompiled under the guise of undefined behavior, but without stating
> how to do it other than suggesting that implementation-specific
> extensions may be needed (somewhat similar as how POSIX requires that
> when dlsym() returns a void* for a function entry point, converting
> that pointer to a function pointer that can then be called MUST be
> compiled to work as intended, even though C doesn't define it).  We
> want the burden to be on the libc and system header providers to
> guarantee defined behavior, and not on the average coder to make
> careful use of memcpy() between storage of different effective types
> to avoid what might be otherwise undefined if it were written using
> types declared using only C99 syntax.
> 
> Whether gcc already has all the attributes you need is not my area of
> expertise.  In my skim of the glibc list conversation, I saw mention
> of attribute [[gnu:transparent_union]] rather than [[__may_alias__]] -
> if that's a better implementation-defined extension that does what we
> need, then use it.  The standard developers were a bit uncomfortable
> directly putting [[gnu:transparent_union]] in the standard, but
> [[__may_alias__]] was noncontroversial (it's in the namespace reserved
> for the implementation)

Not really; implementation-defined attributes are required to use an
implementation-defined prefix like 'gnu::'.  So [[__may_alias__]] is
reserved by ISO C, AFAIR.  Maybe it would be better to just mention
attributes without any specific attribute name; being fuzzy about it
would help avoid making promises that we can't hold.

> and deemed to be a sufficient hint for
> developers to figure out that they can use whatever works best to meet
> the actual requirement of not letting the compiler optimize away
> socket operations under the premise of undefined behavior.
> 
>>>
>>> On page 390 line 13260 section  APPLICATION USAGE, append a 
>>> sentence:
>>>
>>> Note that this example only deals with size and alignment; see 
>>> RATIONALE for additional issues related to these structures.
>>>
>>>
>>>
>>> On page 390 line 13291 section , change RATIONALE from "None" 
>>> to:
>>>
>>> Note that defining the sockaddr_storage and sockaddr structures using 
>>> only mechanism

[PATCH] sockaddr.3type: Document that sockaddr_storage is the API to be used

2023-03-30 Thread Alejandro Colomar via Gcc
POSIX.1 Issue 8 will fix the long-standing issue with sockaddr APIs,
which inevitably caused UB either on user code, libc, or more likely,
both.  sockaddr_storage has been clarified to be implemented in a manner
that aliasing it is safe (suggesting a unnamed union, or other compiler
magic).

Link: 
Reported-by: Bastien Roucariès 
Reported-by: Alejandro Colomar 
Cc: glibc 
Cc: GCC 
Cc: Eric Blake 
Cc: Stefan Puiu 
Cc: Igor Sysoev 
Cc: Rich Felker 
Cc: Andrew Clayton 
Cc: Richard Biener 
Cc: Zack Weinberg 
Cc: Florian Weimer 
Cc: Joseph Myers 
Cc: Jakub Jelinek 
Cc: Sam James 
Signed-off-by: Alejandro Colomar 
---

Hi all,

This is my proposal for documenting the POSIX decission of fixing the
definition of sockaddr_storage.  Bastien, I believe you had something
similar in mind; please review.  Eric, thanks again for the fix!  Could
you please also have a look at this?

Cheers,

Alex

 man3type/sockaddr.3type | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/man3type/sockaddr.3type b/man3type/sockaddr.3type
index 32c3c5bd0..d1db87d5d 100644
--- a/man3type/sockaddr.3type
+++ b/man3type/sockaddr.3type
@@ -23,6 +23,14 @@ .SH SYNOPSIS
 .PP
 .B struct sockaddr_storage {
 .BR "sa_family_t ss_family;" "  /* Address family */"
+.PP
+.RS 4
+/* This structure is not really implemented this way.  It may be
+\&   implemented with an unnamed union or some compiler magic to
+\&   avoid breaking aliasing rules when accessed as any other of the
+\&   sockaddr_* structures documented in this page.  See CAVEATS.
+\& */
+.RE
 .B };
 .PP
 .BR typedef " /* ... */ " socklen_t;
@@ -122,6 +130,20 @@ .SH NOTES
 .I 
 and
 .IR  .
+.SH CAVEATS
+To avoid breaking aliasing rules,
+programs that use functions that receive pointers to
+.I sockaddr
+structures should declare objects of type
+.IR sockaddr_storage ,
+which is defined in a way that it
+can be accessed as any of the different structures defined in this page.
+Failure to do so may result in Undefined Behavior.
+.PP
+New functions should be written to accept pointers to
+.I sockaddr_storage
+instead of the traditional
+.IR sockaddr .
 .SH SEE ALSO
 .BR accept (2),
 .BR bind (2),
-- 
2.39.2



  1   2   3   >