Re: [PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string

2022-11-07 Thread Jason Merrill via Gcc-patches

On 10/27/22 13:16, Ben Boeckel wrote:

This simplifies the interface for other UTF-8 validity detections when a
simple "yes" or "no" answer is sufficient.

Signed-off-by: Ben Boeckel 
---
  libcpp/ChangeLog  |  6 ++
  libcpp/charset.cc | 18 ++
  libcpp/internal.h |  2 ++
  3 files changed, 26 insertions(+)

diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
index 4d707277531..4e2c7900ae2 100644
--- a/libcpp/ChangeLog
+++ b/libcpp/ChangeLog
@@ -1,3 +1,9 @@
+2022-10-27  Ben Boeckel  
+
+   * include/charset.cc: Add `_cpp_valid_utf8_str` which determines
+   whether a C string is valid UTF-8 or not.
+   * include/internal.h: Add prototype for `_cpp_valid_utf8_str`.
+
  2022-10-27  Ben Boeckel  
  
  	* include/charset.cc: Reject encodings of codepoints above 0x10FFFF.

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index e9da6674b5f..0524ab6beba 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1864,6 +1864,24 @@ _cpp_valid_utf8 (cpp_reader *pfile,
return true;
  }


Please add a comment before the function.


+extern bool
+_cpp_valid_utf8_str (const char *name)
+{
+  const uchar* in = (const uchar*)name;
+  size_t len = strlen(name);
+  cppchar_t cp;
+
+  while (*in)
+{
+  if (one_utf8_to_cppchar(&in, &len, &cp))
+   {
+ return false;
+   }
+}


We usually omit unnecessary { } around single statements.


+  return true;
+}
+
  /* Subroutine of convert_hex and convert_oct.  N is the representation
 in the execution character set of a numeric escape; write it into the
 string buffer TBUF and update the end-of-string pointer therein.  WIDE
diff --git a/libcpp/internal.h b/libcpp/internal.h
index badfd1b40da..4f2dd4a2f5c 100644
--- a/libcpp/internal.h
+++ b/libcpp/internal.h
@@ -834,6 +834,8 @@ extern bool _cpp_valid_utf8 (cpp_reader *pfile,
 struct normalize_state *nst,
 cppchar_t *cp);
  
+extern bool _cpp_valid_utf8_str (const char *str);

+
  extern void _cpp_destroy_iconv (cpp_reader *);
  extern unsigned char *_cpp_convert_input (cpp_reader *, const char *,
  unsigned char *, size_t, size_t,




Re: [PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string

2022-10-28 Thread Ben Boeckel via Gcc-patches
On Fri, Oct 28, 2022 at 08:59:16 -0400, David Malcolm wrote:
> On Thu, 2022-10-27 at 19:16 -0400, Ben Boeckel wrote:
> > This simplifies the interface for other UTF-8 validity detections
> > when a
> > simple "yes" or "no" answer is sufficient.
> > 
> > Signed-off-by: Ben Boeckel 
> > ---
> >  libcpp/ChangeLog  |  6 ++
> >  libcpp/charset.cc | 18 ++
> >  libcpp/internal.h |  2 ++
> >  3 files changed, 26 insertions(+)
> > 
> > diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
> > index 4d707277531..4e2c7900ae2 100644
> > --- a/libcpp/ChangeLog
> > +++ b/libcpp/ChangeLog
> > @@ -1,3 +1,9 @@
> > +2022-10-27  Ben Boeckel  
> > +
> > +   * include/charset.cc: Add `_cpp_valid_utf8_str` which
> > determines
> > +   whether a C string is valid UTF-8 or not.
> > +   * include/internal.h: Add prototype for
> > `_cpp_valid_utf8_str`.
> > +
> >  2022-10-27  Ben Boeckel  
> >  
> > * include/charset.cc: Reject encodings of codepoints above
> > 0x10FFFF.
> 
> The patch looks good to me, with the same potential caveat that you
> might need to move the ChangeLog entry from the patch "body" to the
> leading blurb, to satisfy:
>   ./contrib/gcc-changelog/git_check_commit.py

Ah, I had missed that. Now fixed locally for patches 1 and 2; will be in
v3 pending some time for further reviews.

Thanks,

--Ben


Re: [PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string

2022-10-28 Thread David Malcolm via Gcc-patches
On Thu, 2022-10-27 at 19:16 -0400, Ben Boeckel wrote:
> This simplifies the interface for other UTF-8 validity detections
> when a
> simple "yes" or "no" answer is sufficient.
> 
> Signed-off-by: Ben Boeckel 
> ---
>  libcpp/ChangeLog  |  6 ++
>  libcpp/charset.cc | 18 ++
>  libcpp/internal.h |  2 ++
>  3 files changed, 26 insertions(+)
> 
> diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
> index 4d707277531..4e2c7900ae2 100644
> --- a/libcpp/ChangeLog
> +++ b/libcpp/ChangeLog
> @@ -1,3 +1,9 @@
> +2022-10-27  Ben Boeckel  
> +
> +   * include/charset.cc: Add `_cpp_valid_utf8_str` which
> determines
> +   whether a C string is valid UTF-8 or not.
> +   * include/internal.h: Add prototype for
> `_cpp_valid_utf8_str`.
> +
>  2022-10-27  Ben Boeckel  
>  
> * include/charset.cc: Reject encodings of codepoints above
> 0x10FFFF.

The patch looks good to me, with the same potential caveat that you
might need to move the ChangeLog entry from the patch "body" to the
leading blurb, to satisfy:
  ./contrib/gcc-changelog/git_check_commit.py

Thanks
Dave



[PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string

2022-10-27 Thread Ben Boeckel via Gcc-patches
This simplifies the interface for other UTF-8 validity detections when a
simple "yes" or "no" answer is sufficient.

Signed-off-by: Ben Boeckel 
---
 libcpp/ChangeLog  |  6 ++
 libcpp/charset.cc | 18 ++
 libcpp/internal.h |  2 ++
 3 files changed, 26 insertions(+)

diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
index 4d707277531..4e2c7900ae2 100644
--- a/libcpp/ChangeLog
+++ b/libcpp/ChangeLog
@@ -1,3 +1,9 @@
+2022-10-27  Ben Boeckel  
+
+   * include/charset.cc: Add `_cpp_valid_utf8_str` which determines
+   whether a C string is valid UTF-8 or not.
+   * include/internal.h: Add prototype for `_cpp_valid_utf8_str`.
+
 2022-10-27  Ben Boeckel  
 
* include/charset.cc: Reject encodings of codepoints above 0x10FFFF.
diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index e9da6674b5f..0524ab6beba 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1864,6 +1864,24 @@ _cpp_valid_utf8 (cpp_reader *pfile,
   return true;
 }
 
+extern bool
+_cpp_valid_utf8_str (const char *name)
+{
+  const uchar* in = (const uchar*)name;
+  size_t len = strlen(name);
+  cppchar_t cp;
+
+  while (*in)
+{
+  if (one_utf8_to_cppchar(&in, &len, &cp))
+   {
+ return false;
+   }
+}
+
+  return true;
+}
+
 /* Subroutine of convert_hex and convert_oct.  N is the representation
in the execution character set of a numeric escape; write it into the
string buffer TBUF and update the end-of-string pointer therein.  WIDE
diff --git a/libcpp/internal.h b/libcpp/internal.h
index badfd1b40da..4f2dd4a2f5c 100644
--- a/libcpp/internal.h
+++ b/libcpp/internal.h
@@ -834,6 +834,8 @@ extern bool _cpp_valid_utf8 (cpp_reader *pfile,
 struct normalize_state *nst,
 cppchar_t *cp);
 
+extern bool _cpp_valid_utf8_str (const char *str);
+
 extern void _cpp_destroy_iconv (cpp_reader *);
 extern unsigned char *_cpp_convert_input (cpp_reader *, const char *,
  unsigned char *, size_t, size_t,
-- 
2.37.3
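
For readers who want to try the yes/no check outside of libcpp, here is a
minimal standalone sketch of the same loop. It is not the code from the
patch: decode_one_utf8 is a hypothetical stand-in for libcpp's
one_utf8_to_cppchar, and the rules it enforces (no overlong forms, no
surrogates, nothing above 0x10FFFF) are the usual UTF-8 well-formedness
rules, assumed here rather than taken from libcpp. The wrapper follows the
two style comments from the review: a comment before the function, and no
braces around single statements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

/* Hypothetical stand-in for libcpp's one_utf8_to_cppchar.  Decode one
   code point from *INP, which has *LEFTP bytes remaining.  On success,
   store it in *CP, advance *INP, decrease *LEFTP, and return 0.
   Return nonzero if the input is malformed or truncated.  */
static int
decode_one_utf8 (const unsigned char **inp, size_t *leftp, uint32_t *cp)
{
  const unsigned char *in = *inp;
  size_t left = *leftp;
  if (left == 0)
    return 1;

  unsigned char c = in[0];
  size_t n;        /* Length of the sequence in bytes.  */
  uint32_t min;    /* Smallest code point that needs N bytes.  */
  uint32_t value;  /* Code point accumulated so far.  */

  if (c < 0x80)
    { n = 1; min = 0; value = c; }
  else if ((c & 0xE0) == 0xC0)
    { n = 2; min = 0x80; value = c & 0x1F; }
  else if ((c & 0xF0) == 0xE0)
    { n = 3; min = 0x800; value = c & 0x0F; }
  else if ((c & 0xF8) == 0xF0)
    { n = 4; min = 0x10000; value = c & 0x07; }
  else
    return 1;  /* Lone continuation byte or invalid lead byte.  */

  if (left < n)
    return 1;  /* Sequence is cut off by the end of the string.  */

  for (size_t i = 1; i < n; i++)
    {
      if ((in[i] & 0xC0) != 0x80)
        return 1;  /* Not a continuation byte.  */
      value = (value << 6) | (in[i] & 0x3F);
    }

  if (value < min)
    return 1;  /* Overlong encoding.  */
  if (value > 0x10FFFF)
    return 1;  /* Beyond the Unicode range.  */
  if (value >= 0xD800 && value <= 0xDFFF)
    return 1;  /* UTF-16 surrogate.  */

  *inp = in + n;
  *leftp = left - n;
  *cp = value;
  return 0;
}

/* Return true if the NUL-terminated string STR is valid UTF-8.  */
bool
valid_utf8_str (const char *str)
{
  assert (str != nullptr);
  const unsigned char *in = (const unsigned char *) str;
  size_t len = strlen (str);
  uint32_t cp;

  while (*in)
    if (decode_one_utf8 (&in, &len, &cp))
      return false;

  return true;
}
```

The decoder advances the input pointer only on success, so the loop either
reaches the terminating NUL or returns false at the first malformed
sequence. The length is tracked alongside the pointer so that a truncated
multi-byte sequence at the end of the string is rejected without reading
past the terminator.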