https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103305

Pekka S <p...@gcc-bugzilla.mail.kapsi.fi> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |p...@gcc-bugzilla.mail.kaps
                   |                            |i.fi

--- Comment #14 from Pekka S <p...@gcc-bugzilla.mail.kapsi.fi> ---
(In reply to Jonathan Wakely from comment #6)
> (In reply to Jonathan Wakely from comment #5)
> >      static const mask blank    = space;
> 
> We might want to use blank = _ISspace | _ISblank for this last one, but I
> don't really understand what newlib defines those categories as:
> 
> 
> #define isblank(__c) \
>   __extension__ ({ __typeof__ (__c) __x = (__c);              \
>         (__ctype_lookup(__x)&_B) || (int) (__x) == '\t';})
>         (__ctype_lookup(__x)&_ISblank) || (int) (__x) == '\t';})
> 
> This definition is weird ... why is '\t' not already handled by _ISblank?

It has been attempted in the past:

https://sourceware.org/legacy-ml/newlib/2009/threads.html#00342

The 8-bit mask used is simply not wide enough to disambiguate all POSIX
character classes; namely, the space, blank and print classes are the ones
that cannot be distinguished properly.  The naming of newlib's character
classes does not fully align with POSIX either, which also comes down to
space constraints.

Also, libstdc++-v3/config/locale/newlib/ctype_members.cc does not handle the
blank class even though newlib supports wctype("blank").  As explained above,
in this case it doesn't really matter, since matching a character to a (true
POSIX) class using a mask bit alone is not possible.
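
To illustrate the point, here is a minimal sketch of an 8-bit classification
scheme.  The bit names, values and table entries are hypothetical and only
loosely modelled on a newlib-style layout; they are not newlib's actual
definitions.  It assumes the "blank" bit also feeds into print, which is why
'\t' cannot carry it and needs an explicit comparison instead.

```cpp
#include <cstdint>

// Hypothetical 8-bit class bits (illustrative assumption, not newlib's values).
namespace sketch {
using mask = std::uint8_t;

constexpr mask kUpper  = 0x01;
constexpr mask kLower  = 0x02;
constexpr mask kDigit  = 0x04;
constexpr mask kSpace  = 0x08;
constexpr mask kPunct  = 0x10;
constexpr mask kCntrl  = 0x20;
constexpr mask kXdigit = 0x40;
constexpr mask kBlank  = 0x80;  // in this model it also means "printable space"

// Assumed table entries for three characters only; everything else is 0.
constexpr mask classify(char c) {
  switch (c) {
    case ' ':  return static_cast<mask>(kSpace | kBlank);
    case '\t': return static_cast<mask>(kCntrl | kSpace);  // no kBlank: it would become printable
    case '\n': return static_cast<mask>(kCntrl | kSpace);
    default:   return 0;
  }
}

// print is derived from several bits, kBlank among them.
constexpr bool is_print(char c) {
  return (classify(c) & (kPunct | kUpper | kLower | kDigit | kBlank)) != 0;
}

// The mask bit alone misses '\t', hence the explicit comparison.
constexpr bool is_blank(char c) {
  return (classify(c) & kBlank) != 0 || c == '\t';
}

// What a pure mask test (as a ctype-style mask constant would do) reports.
constexpr bool matches_any(char c, mask m) {
  return (classify(c) & m) != 0;
}
}  // namespace sketch
```

In this model, testing against kBlank alone misses '\t', while testing against
kSpace | kBlank also accepts '\n'.  No single mask value yields exactly the
POSIX blank set, so any choice of mask is an approximation.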

Anyway, I made a similar patch but never got around to submitting it.  I also
used _ISblank | _ISspace, since IMHO it is "less wrong" than _ISspace (or equal
to space) alone, and added a note explaining the issue.

(Yes, I was about to repeat history.)
