Re: [PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string
On 10/27/22 13:16, Ben Boeckel wrote:
> This simplifies the interface for other UTF-8 validity detections when a
> simple "yes" or "no" answer is sufficient.
>
> Signed-off-by: Ben Boeckel
> ---
>  libcpp/ChangeLog  |  6 ++++++
>  libcpp/charset.cc | 18 ++++++++++++++++++
>  libcpp/internal.h |  2 ++
>  3 files changed, 26 insertions(+)
>
> diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
> index 4d707277531..4e2c7900ae2 100644
> --- a/libcpp/ChangeLog
> +++ b/libcpp/ChangeLog
> @@ -1,3 +1,9 @@
> +2022-10-27  Ben Boeckel
> +
> +	* include/charset.cc: Add `_cpp_valid_utf8_str` which determines
> +	whether a C string is valid UTF-8 or not.
> +	* include/internal.h: Add prototype for `_cpp_valid_utf8_str`.
> +
>  2022-10-27  Ben Boeckel
>
>  	* include/charset.cc: Reject encodings of codepoints above 0x10.
> diff --git a/libcpp/charset.cc b/libcpp/charset.cc
> index e9da6674b5f..0524ab6beba 100644
> --- a/libcpp/charset.cc
> +++ b/libcpp/charset.cc
> @@ -1864,6 +1864,24 @@ _cpp_valid_utf8 (cpp_reader *pfile,
>    return true;
>  }
>

Please add a comment before the function.

> +extern bool
> +_cpp_valid_utf8_str (const char *name)
> +{
> +  const uchar* in = (const uchar*)name;
> +  size_t len = strlen(name);
> +  cppchar_t cp;
> +
> +  while (*in)
> +{
> +  if (one_utf8_to_cppchar(&in, &len, &cp))
> +  {
> +    return false;
> +  }
> +}

We usually omit unnecessary { } around single statements.

> +  return true;
> +}
> +
>  /* Subroutine of convert_hex and convert_oct.  N is the representation
>     in the execution character set of a numeric escape; write it into the
>     string buffer TBUF and update the end-of-string pointer therein.  WIDE
> diff --git a/libcpp/internal.h b/libcpp/internal.h
> index badfd1b40da..4f2dd4a2f5c 100644
> --- a/libcpp/internal.h
> +++ b/libcpp/internal.h
> @@ -834,6 +834,8 @@ extern bool _cpp_valid_utf8 (cpp_reader *pfile,
>                               struct normalize_state *nst,
>                               cppchar_t *cp);
>
> +extern bool _cpp_valid_utf8_str (const char *str);
> +
>  extern void _cpp_destroy_iconv (cpp_reader *);
>  extern unsigned char *_cpp_convert_input (cpp_reader *, const char *,
>                                            unsigned char *, size_t, size_t,
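For illustration, here is one way the function might look with both review comments applied: a comment in front of the definition and no braces around the single-statement loop body and `if`. This is a hedged, standalone sketch, not the code that went into v3. To compile outside libcpp it replaces the internal `one_utf8_to_cppchar` with a hypothetical stand-in, `decode_one_utf8`, which is assumed to follow the convention the patch relies on (return 0 on success, nonzero on an invalid or truncated sequence, advancing the input pointer and remaining length). The `uchar` and `cppchar_t` typedefs are likewise assumptions, and the stand-in's rejection rules (overlong forms, surrogates, values above U+10FFFF) follow RFC 3629 and are not claimed to match libcpp's decoder exactly.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed stand-ins for libcpp's internal types.  */
typedef unsigned char uchar;
typedef unsigned int cppchar_t;   /* assumption: at least 32 bits */

/* Hypothetical stand-in for libcpp's one_utf8_to_cppchar: decode one
   UTF-8 sequence at *INBUFP, store it in *CP, advance *INBUFP and
   decrease *INBYTESLEFTP, and return 0 on success or a nonzero errno
   value on failure.  Overlong forms, surrogates and values above
   U+10FFFF are rejected.  */
static int
decode_one_utf8 (const uchar **inbufp, size_t *inbytesleftp, cppchar_t *cp)
{
  /* Smallest code point that needs 1, 2, 3 or 4 bytes.  */
  static const cppchar_t min_value[5] = { 0, 0, 0x80, 0x800, 0x10000 };
  const uchar *in = *inbufp;
  size_t left = *inbytesleftp;
  size_t nbytes, i;
  cppchar_t c;

  if (left == 0)
    return EINVAL;

  /* The lead byte gives the sequence length and the top payload bits.  */
  if (in[0] < 0x80)
    {
      nbytes = 1;
      c = in[0];
    }
  else if ((in[0] & 0xE0) == 0xC0)
    {
      nbytes = 2;
      c = in[0] & 0x1F;
    }
  else if ((in[0] & 0xF0) == 0xE0)
    {
      nbytes = 3;
      c = in[0] & 0x0F;
    }
  else if ((in[0] & 0xF8) == 0xF0)
    {
      nbytes = 4;
      c = in[0] & 0x07;
    }
  else
    return EILSEQ;  /* Stray continuation byte or 0xF8..0xFF.  */

  if (left < nbytes)
    return EINVAL;  /* Sequence truncated by the end of the string.  */

  /* Each continuation byte must be 10xxxxxx; append its six bits.  */
  for (i = 1; i < nbytes; i++)
    {
      if ((in[i] & 0xC0) != 0x80)
        return EILSEQ;
      c = (c << 6) | (in[i] & 0x3F);
    }

  if (c < min_value[nbytes] || (c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF)
    return EILSEQ;

  *inbufp = in + nbytes;
  *inbytesleftp = left - nbytes;
  *cp = c;
  return 0;
}

/* Return true if the NUL-terminated string NAME is entirely valid UTF-8,
   false otherwise.  */
extern bool
_cpp_valid_utf8_str (const char *name)
{
  const uchar *in = (const uchar *) name;
  size_t len = strlen (name);
  cppchar_t cp;

  while (*in)
    if (decode_one_utf8 (&in, &len, &cp))
      return false;

  return true;
}
```

With this sketch, `_cpp_valid_utf8_str ("caf\xc3\xa9")` returns true, while a truncated `"\xc3"` or the surrogate encoding `"\xed\xa0\x80"` returns false. Because `len` starts at `strlen (name)` and is decremented in step with `in`, the `while (*in)` test and the decoder's length check agree on where the string ends.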
Re: [PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string
On Fri, Oct 28, 2022 at 08:59:16 -0400, David Malcolm wrote:
> On Thu, 2022-10-27 at 19:16 -0400, Ben Boeckel wrote:
> > This simplifies the interface for other UTF-8 validity detections when a
> > simple "yes" or "no" answer is sufficient.
> >
> > Signed-off-by: Ben Boeckel
> > ---
> >  libcpp/ChangeLog  |  6 ++++++
> >  libcpp/charset.cc | 18 ++++++++++++++++++
> >  libcpp/internal.h |  2 ++
> >  3 files changed, 26 insertions(+)
> >
> > diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
> > index 4d707277531..4e2c7900ae2 100644
> > --- a/libcpp/ChangeLog
> > +++ b/libcpp/ChangeLog
> > @@ -1,3 +1,9 @@
> > +2022-10-27  Ben Boeckel
> > +
> > +	* include/charset.cc: Add `_cpp_valid_utf8_str` which determines
> > +	whether a C string is valid UTF-8 or not.
> > +	* include/internal.h: Add prototype for `_cpp_valid_utf8_str`.
> > +
> >  2022-10-27  Ben Boeckel
> >
> >  	* include/charset.cc: Reject encodings of codepoints above 0x10.
>
> The patch looks good to me, with the same potential caveat that you
> might need to move the ChangeLog entry from the patch "body" to the
> leading blurb, to satisfy:
>   ./contrib/gcc-changelog/git_check_commit.py

Ah, I had missed that. Now fixed locally for patches 1 and 2; will be in
v3 pending some time for further reviews.

Thanks,

--Ben
Re: [PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string
On Thu, 2022-10-27 at 19:16 -0400, Ben Boeckel wrote:
> This simplifies the interface for other UTF-8 validity detections when a
> simple "yes" or "no" answer is sufficient.
>
> Signed-off-by: Ben Boeckel
> ---
>  libcpp/ChangeLog  |  6 ++++++
>  libcpp/charset.cc | 18 ++++++++++++++++++
>  libcpp/internal.h |  2 ++
>  3 files changed, 26 insertions(+)
>
> diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
> index 4d707277531..4e2c7900ae2 100644
> --- a/libcpp/ChangeLog
> +++ b/libcpp/ChangeLog
> @@ -1,3 +1,9 @@
> +2022-10-27  Ben Boeckel
> +
> +	* include/charset.cc: Add `_cpp_valid_utf8_str` which determines
> +	whether a C string is valid UTF-8 or not.
> +	* include/internal.h: Add prototype for `_cpp_valid_utf8_str`.
> +
>  2022-10-27  Ben Boeckel
>
>  	* include/charset.cc: Reject encodings of codepoints above 0x10.

The patch looks good to me, with the same potential caveat that you
might need to move the ChangeLog entry from the patch "body" to the
leading blurb, to satisfy:
  ./contrib/gcc-changelog/git_check_commit.py

Thanks
Dave
[PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string
This simplifies the interface for other UTF-8 validity detections when a
simple "yes" or "no" answer is sufficient.

Signed-off-by: Ben Boeckel
---
 libcpp/ChangeLog  |  6 ++++++
 libcpp/charset.cc | 18 ++++++++++++++++++
 libcpp/internal.h |  2 ++
 3 files changed, 26 insertions(+)

diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
index 4d707277531..4e2c7900ae2 100644
--- a/libcpp/ChangeLog
+++ b/libcpp/ChangeLog
@@ -1,3 +1,9 @@
+2022-10-27  Ben Boeckel
+
+	* include/charset.cc: Add `_cpp_valid_utf8_str` which determines
+	whether a C string is valid UTF-8 or not.
+	* include/internal.h: Add prototype for `_cpp_valid_utf8_str`.
+
 2022-10-27  Ben Boeckel
 
 	* include/charset.cc: Reject encodings of codepoints above 0x10.
diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index e9da6674b5f..0524ab6beba 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1864,6 +1864,24 @@ _cpp_valid_utf8 (cpp_reader *pfile,
   return true;
 }
 
+extern bool
+_cpp_valid_utf8_str (const char *name)
+{
+  const uchar* in = (const uchar*)name;
+  size_t len = strlen(name);
+  cppchar_t cp;
+
+  while (*in)
+{
+  if (one_utf8_to_cppchar(&in, &len, &cp))
+  {
+    return false;
+  }
+}
+
+  return true;
+}
+
 /* Subroutine of convert_hex and convert_oct.  N is the representation
    in the execution character set of a numeric escape; write it into the
    string buffer TBUF and update the end-of-string pointer therein.  WIDE
diff --git a/libcpp/internal.h b/libcpp/internal.h
index badfd1b40da..4f2dd4a2f5c 100644
--- a/libcpp/internal.h
+++ b/libcpp/internal.h
@@ -834,6 +834,8 @@ extern bool _cpp_valid_utf8 (cpp_reader *pfile,
                              struct normalize_state *nst,
                              cppchar_t *cp);
 
+extern bool _cpp_valid_utf8_str (const char *str);
+
 extern void _cpp_destroy_iconv (cpp_reader *);
 extern unsigned char *_cpp_convert_input (cpp_reader *, const char *,
                                           unsigned char *, size_t, size_t,
-- 
2.37.3