[Bug c++/109936] error: extended character ≠ is not valid in an identifier

2023-05-23 Thread adam at wozniakconsulting dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #26 from Adam Wozniak  ---
(In reply to Jonathan Wakely from comment #19)
> (In reply to Andreas Schwab from comment #10)
> > It is a valid preprocessing token ("non-whitespace character that cannot be
> > one of the above").
> 
> Ah right, yes. It's a preprocessing token, but is never converted to a
> token, so doesn't need to be a keyword, identifier etc.

i feel like it should work for stringification reasons too.  e.g.

#define X(x) #x
const char *letterA = X(A);   // this works
const char *notequal = X(≠);  // this does not
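
A minimal sketch of the stringification behaviour referred to above, restricted to ASCII arguments so it compiles regardless of how the compiler treats ≠. The macro name `STR` and the variable names are hypothetical; `STR` simply mirrors the `X(x)` macro from the comment. The last line shows one possible workaround, assuming a string-literal argument is acceptable to the caller.

```cpp
#include <cstring>

// STR is a hypothetical name; it mirrors the X(x) macro from the comment.
#define STR(x) #x

// Stringification turns the spelling of the argument into a string literal.
const char *letter_a = STR(A);        // same as "A"
const char *expression = STR(a + b);  // same as "a + b"

// Passing a string literal keeps the argument a single string-literal
// preprocessing token, so its contents are never lexed as an identifier.
// The quotes themselves become part of the result.
const char *quoted = STR("A");        // same as "\"A\""
```

Note that the quoted form changes the result: the quotes end up inside the string, so a consumer would have to strip them.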

[Bug c++/109936] error: extended character ≠ is not valid in an identifier

2023-05-23 Thread adam at wozniakconsulting dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #24 from Adam Wozniak  ---
(In reply to Jonathan Wakely from comment #23)
> (In reply to Adam Wozniak from comment #20)
> > i get this response:
> > 
> > This page contains the following errors:
> > error on line 20 at column 54: AttValue: " or ' expected
> > Below is a rendering of the page up to the first error.
> 
> That seems to be a problem at your end, the page is well-formed:
> https://validator.w3.org/nu/?doc=https%3A%2F%2Fgcc.gnu.org%2Fgit%2Fgitweb.
> cgi%3Fp%3Dgcc.git%3Bh%3D7d112d6670a0e0e662

works now.  did not before.  weird.

[Bug c++/109936] error: extended character ≠ is not valid in an identifier

2023-05-23 Thread adam at wozniakconsulting dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #22 from Adam Wozniak  ---
(In reply to Jonathan Wakely from comment #19)
> (In reply to Andreas Schwab from comment #10)
> > It is a valid preprocessing token ("non-whitespace character that cannot be
> > one of the above").
> 
> Ah right, yes. It's a preprocessing token, but is never converted to a
> token, so doesn't need to be a keyword, identifier etc.

Correct.

[Bug c++/109936] error: extended character ≠ is not valid in an identifier

2023-05-23 Thread adam at wozniakconsulting dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #21 from Adam Wozniak  ---
(In reply to Andrew Pinski from comment #16)
> It is funny arguing with folks who write parts of GCC on an idea of
> integrated vs separate preprocessor really.

yeah, i've been pounding out C since the late 80s, my dinosaur is probably
showing.  they'll probably call me in 2038 like they called the old COBOL
programmers for Y2K.

it's weird to me to think of them not separately.  i've even used the C
preprocessor in contexts unrelated to parsing C code.

it's also weird to see someone who thinks of the C preprocessor only in terms
of its service to the compiler.

whatever, that's drifting off topic.

main point for me was, i don't see any reason to disallow these unicode
chars other than "the spec says so".  i don't see any HARM in allowing them,
and i certainly see use cases where there is BENEFIT to allowing them.

not all macro args get turned into C++ identifiers.  some get thrown away. 
some get stringified.  in the particular case where i tripped over this, they
get thrown away, and i have ANOTHER postprocessing step that picks them up and
does other magic stuff with them.
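
The two fates of a macro argument described above can be sketched as follows. The macro and function names (`DISCARD`, `STRINGIFY`, `answer`, `marker_text`) are hypothetical and chosen only for illustration; the argument is plain ASCII so the sketch compiles on any GCC version, and the external postprocessing step is not modelled here.

```cpp
#include <cstring>

// Hypothetical macros, named for illustration only.
#define DISCARD(x)          // empty replacement list: the argument vanishes
#define STRINGIFY(x) #x     // the argument's spelling becomes a string literal

// The argument below never becomes a C++ token; it is dropped during
// macro expansion and leaves no trace in the compiled code.
DISCARD(some_marker)
int answer() { return 42; }

// Here the same spelling survives, but only as the contents of a string.
const char *marker_text() { return STRINGIFY(some_marker); }
```

In neither case does `some_marker` have to name anything that exists, which is the point being made: the argument is never looked up as an identifier.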

also, there's probably a really good case for allowing some of these things,
like emoji, as C++ identifiers.

[Bug c++/109936] error: extended character ≠ is not valid in an identifier

2023-05-23 Thread adam at wozniakconsulting dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #20 from Adam Wozniak  ---
(In reply to Andrew Pinski from comment #17)
> (In reply to Adam Wozniak from comment #13)
> > (In reply to Jakub Jelinek from comment #11)
> > > Bisection points to r10-3309-g7d112d6670a0e0e662
> > 
> > that link gives me an error
> 
> https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=7d112d6670a0e0e662
> 
> Does that link work?

i get this response:

This page contains the following errors:
error on line 20 at column 54: AttValue: " or ' expected
Below is a rendering of the page up to the first error.

[Bug c++/109936] error: extended character ≠ is not valid in an identifier

2023-05-22 Thread adam at wozniakconsulting dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #15 from Adam Wozniak  ---
(In reply to Jonathan Wakely from comment #6)
> ≠ cannot be used in an identifier, and it's none of the other forms either.

at the risk of beating a dead horse, what you are saying here is that ≠ simply
cannot be used, ever, anywhere, in C/C++.

that seems like kind of a waste.  a whole raft of unicode characters that
simply cannot be used.  so much for embracing unicode.  Maybe someone wants to
name a variable "§32" for some reason, but can't because...

why exactly?

because the spec says so.

[Bug c++/109936] error: extended character ≠ is not valid in an identifier

2023-05-22 Thread adam at wozniakconsulting dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #14 from Adam Wozniak  ---
(In reply to Adam Wozniak from comment #12)
> (In reply to Jonathan Wakely from comment #9)
> > (In reply to Adam Wozniak from comment #8)
> > > i don't think of the preprocessor as part of the compiler.
> > > it's a different step, a different executable, that happens BEFORE the
> > > compiler.
> > 
> > No it isn't. Preprocessing is done by the compiler, using libcpp. There is
> > no different executable. GCC has worked that way for many, many years.
> 
> 
> No, i am fairly CERTAIN they are different executables.
> 
> i can even invoke one without the other; /lib/cpp can be invoked directly,
> and g++ can be told to skip the preprocessor by renaming your source file
> *.i or *.ii.
> 
> $ ls -la /lib/cpp
> lrwxrwxrwx 1 root root 21 May 11  2022 /lib/cpp -> /etc/alternatives/cpp
> $ ls -la /etc/alternatives/cpp
> lrwxrwxrwx 1 root root 12 May 11  2022 /etc/alternatives/cpp -> /usr/bin/cpp
> $ ls -la /usr/bin/cpp
> lrwxrwxrwx 1 root root 6 May 11  2022 /usr/bin/cpp -> cpp-11
> $ ls -la /usr/bin/cpp-11
> lrwxrwxrwx 1 root root 23 Jan 16 05:17 /usr/bin/cpp-11 -> x86_64-linux-gnu-cpp-11
> $ ls -la /usr/bin/x86_64-linux-gnu-cpp-11
> -rwxr-xr-x 1 root root 862976 Jan 16 05:17 /usr/bin/x86_64-linux-gnu-cpp-11
> $ which g++
> /usr/bin/g++
> $ ls -la /usr/bin/g++
> lrwxrwxrwx 1 root root 21 May 22 16:06 /usr/bin/g++ -> /etc/alternatives/g++
> $ ls -la /etc/alternatives/g++
> lrwxrwxrwx 1 root root 15 May 22 19:31 /etc/alternatives/g++ -> /usr/bin/g++-11
> $ ls -la /usr/bin/g++-11
> lrwxrwxrwx 1 root root 23 Jan 16 05:17 /usr/bin/g++-11 -> x86_64-linux-gnu-g++-11
> $ ls -la /usr/bin/x86_64-linux-gnu-g++-11
> -rwxr-xr-x 1 root root 862976 Jan 16 05:17 /usr/bin/x86_64-linux-gnu-g++-11

lest someone claim they are the same because of identical sizes...

$ md5sum /usr/bin/x86_64-linux-gnu-g++-11
f0b26412421754aa03b9457a4d2ee40c  /usr/bin/x86_64-linux-gnu-g++-11

$ md5sum /usr/bin/x86_64-linux-gnu-cpp-11
3bddc1f50d7631ad22da0f875babe7a3  /usr/bin/x86_64-linux-gnu-cpp-11

[Bug c++/109936] error: extended character ≠ is not valid in an identifier

2023-05-22 Thread adam at wozniakconsulting dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #13 from Adam Wozniak  ---
(In reply to Jakub Jelinek from comment #11)
> Bisection points to r10-3309-g7d112d6670a0e0e662

that link gives me an error

[Bug c++/109936] error: extended character ≠ is not valid in an identifier

2023-05-22 Thread adam at wozniakconsulting dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #12 from Adam Wozniak  ---
(In reply to Jonathan Wakely from comment #9)
> (In reply to Adam Wozniak from comment #8)
> > i don't think of the preprocessor as part of the compiler.
> > it's a different step, a different executable, that happens BEFORE the
> > compiler.
> 
> No it isn't. Preprocessing is done by the compiler, using libcpp. There is
> no different executable. GCC has worked that way for many, many years.


No, i am fairly CERTAIN they are different executables.

i can even invoke one without the other; /lib/cpp can be invoked directly, and
g++ can be told to skip the preprocessor by renaming your source file *.i or
*.ii.

$ ls -la /lib/cpp
lrwxrwxrwx 1 root root 21 May 11  2022 /lib/cpp -> /etc/alternatives/cpp
$ ls -la /etc/alternatives/cpp
lrwxrwxrwx 1 root root 12 May 11  2022 /etc/alternatives/cpp -> /usr/bin/cpp
$ ls -la /usr/bin/cpp
lrwxrwxrwx 1 root root 6 May 11  2022 /usr/bin/cpp -> cpp-11
$ ls -la /usr/bin/cpp-11
lrwxrwxrwx 1 root root 23 Jan 16 05:17 /usr/bin/cpp-11 -> x86_64-linux-gnu-cpp-11
$ ls -la /usr/bin/x86_64-linux-gnu-cpp-11
-rwxr-xr-x 1 root root 862976 Jan 16 05:17 /usr/bin/x86_64-linux-gnu-cpp-11
$ which g++
/usr/bin/g++
$ ls -la /usr/bin/g++
lrwxrwxrwx 1 root root 21 May 22 16:06 /usr/bin/g++ -> /etc/alternatives/g++
$ ls -la /etc/alternatives/g++
lrwxrwxrwx 1 root root 15 May 22 19:31 /etc/alternatives/g++ -> /usr/bin/g++-11
$ ls -la /usr/bin/g++-11
lrwxrwxrwx 1 root root 23 Jan 16 05:17 /usr/bin/g++-11 -> x86_64-linux-gnu-g++-11
$ ls -la /usr/bin/x86_64-linux-gnu-g++-11
-rwxr-xr-x 1 root root 862976 Jan 16 05:17 /usr/bin/x86_64-linux-gnu-g++-11

[Bug c++/109936] error: extended character ≠ is not valid in an identifier

2023-05-22 Thread adam at wozniakconsulting dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #8 from Adam Wozniak  ---
(In reply to Jonathan Wakely from comment #6)
> That isn't the point. The compiler has to tokenize the input in order to
> perform the preprocessing step. That means it has to be able to decide what
> the bytes comprising the ≠ mean. Are they multiple tokens? A single token
> consisting of an identifier? A C++ operator?
> 
> The standard says "Each preprocessing token that is converted to a token
> (5.6) shall have the lexical form of a keyword, an identifier, a literal, or
> an operator or punctuator."
> 
> ≠ cannot be used in an identifier, and it's none of the other forms either.
> 
> > it should be perfectly legal to use these as arguments.
> 
> By that argument, you could say X(£), but that isn't allowed either.
> 
> > note the emoji passes through flawlessly.
> 
> Not with -Wpedantic

i would argue that X(£) should also be allowed.
i don't think of the preprocessor as part of the compiler.
it's a different step, a different executable, that happens BEFORE the
compiler.
hence the name, PREprocessor.

i cannot argue with "the standard", however.

[Bug c++/109936] error: extended character ≠ is not valid in an identifier

2023-05-22 Thread adam at wozniakconsulting dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

Adam Wozniak  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|INVALID                     |---
             Status|RESOLVED                    |UNCONFIRMED
            Version|11.3.0                      |12.1.0

--- Comment #4 from Adam Wozniak  ---
reopening.  this is not at all "expected".

C++ papers P1041R4 and P1139R2 cover literal constants in code.
they do not at all cover anything about arguments to C preprocessor macros.

in this case, the macro generates no code.
it should be perfectly legal to use these as arguments.
note the emoji passes through flawlessly.

bug also exists in 12.1.0, so updating "Version".

[Bug c++/109936] New: error: extended character ≠ is not valid in an identifier

2023-05-22 Thread adam at wozniakconsulting dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

            Bug ID: 109936
           Summary: error: extended character ≠ is not valid in an identifier
           Product: gcc
           Version: 11.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: adam at wozniakconsulting dot com
  Target Milestone: ---

Created attachment 55138
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55138&action=edit
cpp file that demonstrates bug

#define X(x)
X(樂) // emojis work
X(≠)  // this "not equal" does NOT work!

///
#if 0
compile with "g++ -c bad.cpp" gives:

bad.cpp:3:3: error: extended character ≠ is not valid in an identifier
3 | X(≠)
  |   ^

compile with "g++ -c -fextended-identifiers bad.cpp" gives the same error.

g++ --version says:

g++ (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

/lib/cpp --version says:

cpp (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


manual for both gcc and cpp says:

   -fextended-identifiers
  Accept universal character names and extended characters in
  identifiers.  This option is enabled by default for C99 (and later
  C standard versions) and C++.

BTW, i get a similar error with the following unicode code points.  while some
may have reasonable explanations, many do not.

0080 - 00a7
00a9
00ab - 00ac
00ae
00b0 - 00b1
00b6
00bb
00bf
00d7
00f7
0300 - 036f
1680
180e
1dc0 - 1dff
2000 - 200a
200e - 2029
202f - 203e
2041 - 2053
2055 - 205f
20d0 - 20ff
2190 - 245f
2500 - 2775
2794 - 2bff
2e00 - 2e7f
3000 - 3003
3008 - 3020
3030
e000 - f8ff
fdd0 - fdef
fe20 - fe2f
fe45 - fe46

#endif
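
The code point list above can be encoded as a lookup table, for example to pre-screen macro arguments before a file is handed to g++. This is only a sketch: the names `Range`, `kReportedRanges` and `in_reported_ranges` are hypothetical, and the table mirrors the reporter's list for g++ 11.3.0 as given in this report, not any table taken from GCC or from the C++ standard.

```cpp
#include <cstdint>

// Inclusive range of Unicode code points.
struct Range {
    std::uint32_t first;
    std::uint32_t last;
};

// Ranges copied from the report above; single code points use first == last.
constexpr Range kReportedRanges[] = {
    {0x0080, 0x00a7}, {0x00a9, 0x00a9}, {0x00ab, 0x00ac}, {0x00ae, 0x00ae},
    {0x00b0, 0x00b1}, {0x00b6, 0x00b6}, {0x00bb, 0x00bb}, {0x00bf, 0x00bf},
    {0x00d7, 0x00d7}, {0x00f7, 0x00f7}, {0x0300, 0x036f}, {0x1680, 0x1680},
    {0x180e, 0x180e}, {0x1dc0, 0x1dff}, {0x2000, 0x200a}, {0x200e, 0x2029},
    {0x202f, 0x203e}, {0x2041, 0x2053}, {0x2055, 0x205f}, {0x20d0, 0x20ff},
    {0x2190, 0x245f}, {0x2500, 0x2775}, {0x2794, 0x2bff}, {0x2e00, 0x2e7f},
    {0x3000, 0x3003}, {0x3008, 0x3020}, {0x3030, 0x3030}, {0xe000, 0xf8ff},
    {0xfdd0, 0xfdef}, {0xfe20, 0xfe2f}, {0xfe45, 0xfe46},
};

// Returns true if cp falls inside one of the reported ranges.
bool in_reported_ranges(std::uint32_t cp) {
    for (const Range &r : kReportedRanges) {
        if (cp >= r.first && cp <= r.last) {
            return true;
        }
    }
    return false;
}
```

With this table, U+2260 (≠), U+00A3 (£) and U+00A7 (§) are all reported as problematic, while an ASCII letter such as U+0041 is not.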