Re: [PATCH] Allocate extra 16 bytes for -fsanitize=address

2012-12-03 Thread Tom Tromey
 Jakub == Jakub Jelinek ja...@redhat.com writes:

Jakub 2012-11-23  Jakub Jelinek  ja...@redhat.com
Jakub  PR bootstrap/55380
Jakub  * files.c (read_file_guts): Allocate extra 16 bytes instead of
Jakub  1 byte at the end of buf.  Pass size + 16 instead of size
Jakub  to _cpp_convert_input.
Jakub  * charset.c (_cpp_convert_input): Reallocate if there aren't
Jakub  at least 16 bytes beyond to.len in the buffer.  Clear 16 bytes
Jakub  at to.text + to.len.

Jakub +  buf = XNEWVEC (uchar, size + 16);

I think the magic constant 16 could use a comment, here and elsewhere.

Otherwise ok.

Tom


Re: [PATCH] Allocate extra 16 bytes for -fsanitize=address

2012-11-26 Thread H.J. Lu
On Fri, Nov 23, 2012 at 11:50 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Fri, Nov 23, 2012 at 11:33:37AM -0800, H.J. Lu wrote:
 2012-11-21  H.J. Lu  hongjiu...@intel.com

   PR bootstrap/55380
   * charset.c (_cpp_convert_input): Clear CPP_PAD_BUFFER_SIZE
   bytes if CLEAR_CPP_PAD_BUFFER isn't 0.  Allocate extra
   CPP_PAD_BUFFER_SIZE bytes and clear it if CLEAR_CPP_PAD_BUFFER
   isn't 0.

   * files.c (read_file_guts): Allocate extra CPP_PAD_BUFFER_SIZE
   bytes.

   * internal.h (CPP_PAD_BUFFER_SIZE): New.  Defined to 16 for
   -fsanitize=address if __SANITIZE_ADDRESS__ is defined.
   (CLEAR_CPP_PAD_BUFFER): New.

 I'd say you are making this way too much complicated.
 The following patch just changes those + 1 to + 16, adds memset of the 16
 bytes unconditionally, but as it also fixes a thinko which IMHO affects
 the most common cases (regular file, with no conversion needed), I'd say
 it ought to be faster than the old version (the old version would allocate
 st.st_size + 1 bytes long buffer, read st.st_size bytes into it,
 call _cpp_convert_input with len st.st_size and size st.st_size (the latter
 should have been st.st_size + 1, i.e. the allocated size) and thus
 to.len = to.asize was true and we called realloc with the same length
 as malloc has been called originally.

 2012-11-23  Jakub Jelinek  ja...@redhat.com

 PR bootstrap/55380
 * files.c (read_file_guts): Allocate extra 16 bytes instead of
 1 byte at the end of buf.  Pass size + 16 instead of size
 to _cpp_convert_input.
 * charset.c (_cpp_convert_input): Reallocate if there aren't
 at least 16 bytes beyond to.len in the buffer.  Clear 16 bytes
 at to.text + to.len.

 --- libcpp/files.c.jj   2012-11-22 11:04:24.0 +0100
 +++ libcpp/files.c  2012-11-23 20:37:03.146379853 +0100
 @@ -671,7 +671,7 @@ read_file_guts (cpp_reader *pfile, _cpp_
 the majority of C source files.  */
  size = 8 * 1024;

 -  buf = XNEWVEC (uchar, size + 1);
 +  buf = XNEWVEC (uchar, size + 16);
total = 0;
while ((count = read (file-fd, buf + total, size - total))  0)
  {
 @@ -682,7 +682,7 @@ read_file_guts (cpp_reader *pfile, _cpp_
   if (regular)
 break;
   size *= 2;
 - buf = XRESIZEVEC (uchar, buf, size + 1);
 + buf = XRESIZEVEC (uchar, buf, size + 16);
 }
  }

 @@ -699,7 +699,7 @@ read_file_guts (cpp_reader *pfile, _cpp_

file-buffer = _cpp_convert_input (pfile,
  CPP_OPTION (pfile, input_charset),
 -buf, size, total,
 +buf, size + 16, total,
  file-buffer_start,
  file-st.st_size);
file-buffer_valid = true;
 --- libcpp/charset.c.jj 2011-01-06 10:22:00.0 +0100
 +++ libcpp/charset.c2012-11-23 20:40:39.736116642 +0100
 @@ -1,6 +1,6 @@
  /* CPP Library - charsets
 Copyright (C) 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2006, 2008, 2009,
 -   2010 Free Software Foundation, Inc.
 +   2010, 2012 Free Software Foundation, Inc.

 Broken out of c-lex.c Apr 2003, adding valid C99 UCN ranges.

 @@ -1729,9 +1729,12 @@ _cpp_convert_input (cpp_reader *pfile, c
  iconv_close (input_cset.cd);

/* Resize buffer if we allocated substantially too much, or if we
 - haven't enough space for the \n-terminator.  */
 -  if (to.len + 4096  to.asize || to.len = to.asize)
 -to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
 + haven't enough space for the \n-terminator or following
 + 15 bytes of padding.  */
 +  if (to.len + 4096  to.asize || to.len + 16  to.asize)
 +to.text = XRESIZEVEC (uchar, to.text, to.len + 16);
 +
 +  memset (to.text + to.len, '\0', 16);

/* If the file is using old-school Mac line endings (\r only),
   terminate with another \r, not an \n, so that we do not mistake


 Jakub

You should also mention:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54691

in ChangeLog entry.

Thanks.


-- 
H.J.


Re: [PATCH] Allocate extra 16 bytes for -fsanitize=address

2012-11-24 Thread Hans-Peter Nilsson
On Fri, 23 Nov 2012, H.J. Lu wrote:
 On Fri, Nov 23, 2012 at 9:38 AM, Jakub Jelinek ja...@redhat.com wrote:
  On Fri, Nov 23, 2012 at 09:23:37AM -0800, H.J. Lu wrote:
  This patch allocates extra 16 bytes for -fsanitize=address so that
  asan won't report read beyond memory buffer. It is used by
  bootstrap-asan.  OK to install?
 
  As valgrind warns about that too, I'd say we should do that unconditionally,
  the additional 16-bytes just aren't a big deal.

 This isn't sufficient for valgrind since valgrind will also report
 reading uninitialized data,

Only if that initialized data is actually used in a conditional.

 which requires additional memset
 on extra 16 bytes.

(For plain operations, valgrind just operate on, typically
passing along, the undefinedness.)

Maybe that's what you meant, in which case my comment just
clarifies what I saw as an ambiguity in your statement.

brgds, H-P


Re: [PATCH] Allocate extra 16 bytes for -fsanitize=address

2012-11-23 Thread Jakub Jelinek
On Fri, Nov 23, 2012 at 09:23:37AM -0800, H.J. Lu wrote:
 This patch allocates extra 16 bytes for -fsanitize=address so that
 asan won't report read beyond memory buffer. It is used by
 bootstrap-asan.  OK to install?

As valgrind warns about that too, I'd say we should do that unconditionally,
the additional 16-bytes just aren't a big deal.
But, _cpp_convert_input isn't the only place which needs that, IMHO you want
to also change the caller, read_file_guts, where it does
  buf = XNEWVEC (uchar, size + 1);
and
  buf = XRESIZEVEC (uchar, buf, size + 1);
I'll defer the review to Tom though.

 2012-11-21  H.J. Lu  hongjiu...@intel.com
 
   PR bootstrap/55380
   * charset.c (_cpp_convert_input): Allocate extra 16 bytes for
   -fsanitize=address if __SANITIZE_ADDRESS__ is defined.
 
 diff --git a/libcpp/charset.c b/libcpp/charset.c
 index cba19a6..dea8bb1 100644
 --- a/libcpp/charset.c
 +++ b/libcpp/charset.c
 @@ -1729,9 +1729,16 @@ _cpp_convert_input (cpp_reader *pfile, const char 
 *input_charset,
  iconv_close (input_cset.cd);
  
/* Resize buffer if we allocated substantially too much, or if we
 - haven't enough space for the \n-terminator.  */
 + haven't enough space for the \n-terminator.  Allocate extra 16
 + bytes for -fsanitize=address.  */
if (to.len + 4096  to.asize || to.len = to.asize)
 -to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
 +{
 +#ifdef __SANITIZE_ADDRESS__
 +  to.text = XRESIZEVEC (uchar, to.text, to.len + 17);
 +#else
 +  to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
 +#endif
 +}
  
/* If the file is using old-school Mac line endings (\r only),
   terminate with another \r, not an \n, so that we do not mistake

Jakub


Re: [PATCH] Allocate extra 16 bytes for -fsanitize=address

2012-11-23 Thread H.J. Lu
On Fri, Nov 23, 2012 at 9:38 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Fri, Nov 23, 2012 at 09:23:37AM -0800, H.J. Lu wrote:
 This patch allocates extra 16 bytes for -fsanitize=address so that
 asan won't report read beyond memory buffer. It is used by
 bootstrap-asan.  OK to install?

 As valgrind warns about that too, I'd say we should do that unconditionally,
 the additional 16-bytes just aren't a big deal.

This isn't sufficient for valgrind since valgrind will also report
reading uninitialized data, which requires additional memset
on extra 16 bytes.

 But, _cpp_convert_input isn't the only place which needs that, IMHO you want

This patch is sufficient to compile libcpp with -fsanitize=address.

 to also change the caller, read_file_guts, where it does
   buf = XNEWVEC (uchar, size + 1);
 and
   buf = XRESIZEVEC (uchar, buf, size + 1);

I don't think it is necessary since there is no real data in
those extra 16 bytes.  They can have garbage and libcpp still
functions correctly.  They are purely used to silence ASAN.

 I'll defer the review to Tom though.

 2012-11-21  H.J. Lu  hongjiu...@intel.com

   PR bootstrap/55380
   * charset.c (_cpp_convert_input): Allocate extra 16 bytes for
   -fsanitize=address if __SANITIZE_ADDRESS__ is defined.

 diff --git a/libcpp/charset.c b/libcpp/charset.c
 index cba19a6..dea8bb1 100644
 --- a/libcpp/charset.c
 +++ b/libcpp/charset.c
 @@ -1729,9 +1729,16 @@ _cpp_convert_input (cpp_reader *pfile, const char 
 *input_charset,
  iconv_close (input_cset.cd);

/* Resize buffer if we allocated substantially too much, or if we
 - haven't enough space for the \n-terminator.  */
 + haven't enough space for the \n-terminator.  Allocate extra 16
 + bytes for -fsanitize=address.  */
if (to.len + 4096  to.asize || to.len = to.asize)
 -to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
 +{
 +#ifdef __SANITIZE_ADDRESS__
 +  to.text = XRESIZEVEC (uchar, to.text, to.len + 17);
 +#else
 +  to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
 +#endif
 +}

/* If the file is using old-school Mac line endings (\r only),
   terminate with another \r, not an \n, so that we do not mistake

 Jakub



-- 
H.J.


Re: [PATCH] Allocate extra 16 bytes for -fsanitize=address

2012-11-23 Thread Konstantin Serebryany
On Fri, Nov 23, 2012 at 10:08 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Nov 23, 2012 at 9:38 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Fri, Nov 23, 2012 at 09:23:37AM -0800, H.J. Lu wrote:
 This patch allocates extra 16 bytes for -fsanitize=address so that
 asan won't report read beyond memory buffer. It is used by
 bootstrap-asan.  OK to install?

 As valgrind warns about that too, I'd say we should do that unconditionally,
 the additional 16-bytes just aren't a big deal.

 This isn't sufficient for valgrind since valgrind will also report
 reading uninitialized data, which requires additional memset
 on extra 16 bytes.

Valgrind should not report reading uninitialized data if that data is
actually not used later (extra bytes are cleared by AND-ing with
zeroes).
If the code was mine, I would simply use (to.len + 17) w/o any conditionals.

--kcc



 But, _cpp_convert_input isn't the only place which needs that, IMHO you want

 This patch is sufficient to compile libcpp with -fsanitize=address.

 to also change the caller, read_file_guts, where it does
   buf = XNEWVEC (uchar, size + 1);
 and
   buf = XRESIZEVEC (uchar, buf, size + 1);

 I don't think it is necessary since there is no real data in
 those extra 16 bytes.  They can have garbage and libcpp still
 functions correctly.  They are purely used to silence ASAN.

 I'll defer the review to Tom though.

 2012-11-21  H.J. Lu  hongjiu...@intel.com

   PR bootstrap/55380
   * charset.c (_cpp_convert_input): Allocate extra 16 bytes for
   -fsanitize=address if __SANITIZE_ADDRESS__ is defined.

 diff --git a/libcpp/charset.c b/libcpp/charset.c
 index cba19a6..dea8bb1 100644
 --- a/libcpp/charset.c
 +++ b/libcpp/charset.c
 @@ -1729,9 +1729,16 @@ _cpp_convert_input (cpp_reader *pfile, const char 
 *input_charset,
  iconv_close (input_cset.cd);

/* Resize buffer if we allocated substantially too much, or if we
 - haven't enough space for the \n-terminator.  */
 + haven't enough space for the \n-terminator.  Allocate extra 16
 + bytes for -fsanitize=address.  */
if (to.len + 4096  to.asize || to.len = to.asize)
 -to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
 +{
 +#ifdef __SANITIZE_ADDRESS__
 +  to.text = XRESIZEVEC (uchar, to.text, to.len + 17);
 +#else
 +  to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
 +#endif
 +}

/* If the file is using old-school Mac line endings (\r only),
   terminate with another \r, not an \n, so that we do not mistake

 Jakub



 --
 H.J.


Re: [PATCH] Allocate extra 16 bytes for -fsanitize=address

2012-11-23 Thread Jakub Jelinek
On Fri, Nov 23, 2012 at 10:08:11AM -0800, H.J. Lu wrote:
  to also change the caller, read_file_guts, where it does
buf = XNEWVEC (uchar, size + 1);
  and
buf = XRESIZEVEC (uchar, buf, size + 1);
 
 I don't think it is necessary since there is no real data in
 those extra 16 bytes.  They can have garbage and libcpp still
 functions correctly.  They are purely used to silence ASAN.

The thing is, if the buf from the caller has such size/total
combination that if (to.len + 4096  to.asize || to.len = to.asize)
isn't true, there won't be any reallocation and the buffer passed
in from the caller will be used instead.  And, if it doesn't have those
extra 16 bytes, it will still result in asan warning...
Guess you need to read file from stdin instead of a file for that
to trigger that.

For valgrind I bet just clearing those 16 bytes might still be cheap enough
not to worry about it.

  I'll defer the review to Tom though.
 
  2012-11-21  H.J. Lu  hongjiu...@intel.com
 
PR bootstrap/55380
* charset.c (_cpp_convert_input): Allocate extra 16 bytes for
-fsanitize=address if __SANITIZE_ADDRESS__ is defined.
 
  diff --git a/libcpp/charset.c b/libcpp/charset.c
  index cba19a6..dea8bb1 100644
  --- a/libcpp/charset.c
  +++ b/libcpp/charset.c
  @@ -1729,9 +1729,16 @@ _cpp_convert_input (cpp_reader *pfile, const char 
  *input_charset,
   iconv_close (input_cset.cd);
 
 /* Resize buffer if we allocated substantially too much, or if we
  - haven't enough space for the \n-terminator.  */
  + haven't enough space for the \n-terminator.  Allocate extra 16
  + bytes for -fsanitize=address.  */
 if (to.len + 4096  to.asize || to.len = to.asize)
  -to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
  +{
  +#ifdef __SANITIZE_ADDRESS__
  +  to.text = XRESIZEVEC (uchar, to.text, to.len + 17);
  +#else
  +  to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
  +#endif
  +}
 
 /* If the file is using old-school Mac line endings (\r only),
terminate with another \r, not an \n, so that we do not mistake

Jakub


Re: [PATCH] Allocate extra 16 bytes for -fsanitize=address

2012-11-23 Thread H.J. Lu
On Fri, Nov 23, 2012 at 10:12 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Fri, Nov 23, 2012 at 10:08:11AM -0800, H.J. Lu wrote:
  to also change the caller, read_file_guts, where it does
buf = XNEWVEC (uchar, size + 1);
  and
buf = XRESIZEVEC (uchar, buf, size + 1);

 I don't think it is necessary since there is no real data in
 those extra 16 bytes.  They can have garbage and libcpp still
 functions correctly.  They are purely used to silence ASAN.

 The thing is, if the buf from the caller has such size/total
 combination that if (to.len + 4096  to.asize || to.len = to.asize)
 isn't true, there won't be any reallocation and the buffer passed
 in from the caller will be used instead.  And, if it doesn't have those
 extra 16 bytes, it will still result in asan warning...
 Guess you need to read file from stdin instead of a file for that
 to trigger that.

I see.  I will double check and update my patch.

 For valgrind I bet just clearing those 16 bytes might still be cheap enough
 not to worry about it.

This is what I did for valgrind:

http://gcc.gnu.org/git/?p=gcc.git;a=blobdiff;f=libcpp/charset.c;h=16a37c36f0cbf7a69c93ae7acb5d79893d57b643;hp=cba19a67178c796dcef1c8c70ac5c43dcbc69071;hb=8ab0e2cd4ae0fae01d0b84d6ccc382acb81ab876;hpb=e5e0d29f8b8ccff799a26fc3e6435af8dbf358fc

   /* Resize buffer if we allocated substantially too much, or if we
- haven't enough space for the \n-terminator.  */
+ haven't enough space for the \n-terminator.  Allocate and
+ initialize extra 16 bytes for --enable-checking=valgrind.  */
   if (to.len + 4096  to.asize || to.len = to.asize)
-to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
+{
+#ifdef ENABLE_VALGRIND_CHECKING
+  to.text = XRESIZEVEC (uchar, to.text, to.len + 17);
+  memset (to.text + to.len + 1, 0, 16);
+#else
+  to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
+#endif
+}

I will check if I need to update it for stdin.

Thanks.

-- 
H.J.


Re: [PATCH] Allocate extra 16 bytes for -fsanitize=address

2012-11-23 Thread Uros Bizjak
Hello!

 This patch allocates extra 16 bytes for -fsanitize=address so that
 asan won't report read beyond memory buffer. It is used by
 bootstrap-asan.  OK to install?

   /* Resize buffer if we allocated substantially too much, or if we
- haven't enough space for the \n-terminator.  */
+ haven't enough space for the \n-terminator.  Allocate extra 16
+ bytes for -fsanitize=address.  */

I guess that extra _15_ bytes should be enough? The maximum we need
for SSE stringops is additional 15 bytes, when only the first byte is
allocated in the 16byte bundle.

Uros,


Re: [PATCH] Allocate extra 16 bytes for -fsanitize=address

2012-11-23 Thread H.J. Lu
On Fri, Nov 23, 2012 at 10:59 AM, Uros Bizjak ubiz...@gmail.com wrote:
 Hello!

 This patch allocates extra 16 bytes for -fsanitize=address so that
 asan won't report read beyond memory buffer. It is used by
 bootstrap-asan.  OK to install?

/* Resize buffer if we allocated substantially too much, or if we
 - haven't enough space for the \n-terminator.  */
 + haven't enough space for the \n-terminator.  Allocate extra 16
 + bytes for -fsanitize=address.  */

 I guess that extra _15_ bytes should be enough? The maximum we need
 for SSE stringops is additional 15 bytes, when only the first byte is
 allocated in the 16byte bundle.


If we need to clear the extra bytes, clearing 16 bytes can be faster
than 16 bytes.


-- 
H.J.


Re: [PATCH] Allocate extra 16 bytes for -fsanitize=address

2012-11-23 Thread Uros Bizjak
On Fri, Nov 23, 2012 at 8:16 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Nov 23, 2012 at 10:59 AM, Uros Bizjak ubiz...@gmail.com wrote:
 Hello!

 This patch allocates extra 16 bytes for -fsanitize=address so that
 asan won't report read beyond memory buffer. It is used by
 bootstrap-asan.  OK to install?

/* Resize buffer if we allocated substantially too much, or if we
 - haven't enough space for the \n-terminator.  */
 + haven't enough space for the \n-terminator.  Allocate extra 16
 + bytes for -fsanitize=address.  */

 I guess that extra _15_ bytes should be enough? The maximum we need
 for SSE stringops is additional 15 bytes, when only the first byte is
 allocated in the 16byte bundle.


 If we need to clear the extra bytes, clearing 16 bytes can be faster
 than 16 bytes.

(I assume ... than 15 bytes).

However, this is ideal for the code below, where we add either +16 or
+1, considering also additional byte for the terminator.

Uros.


Re: [PATCH] Allocate extra 16 bytes for -fsanitize=address

2012-11-23 Thread H.J. Lu
On Fri, Nov 23, 2012 at 11:23 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Fri, Nov 23, 2012 at 8:16 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Nov 23, 2012 at 10:59 AM, Uros Bizjak ubiz...@gmail.com wrote:
 Hello!

 This patch allocates extra 16 bytes for -fsanitize=address so that
 asan won't report read beyond memory buffer. It is used by
 bootstrap-asan.  OK to install?

/* Resize buffer if we allocated substantially too much, or if we
 - haven't enough space for the \n-terminator.  */
 + haven't enough space for the \n-terminator.  Allocate extra 16
 + bytes for -fsanitize=address.  */

 I guess that extra _15_ bytes should be enough? The maximum we need
 for SSE stringops is additional 15 bytes, when only the first byte is
 allocated in the 16byte bundle.


 If we need to clear the extra bytes, clearing 16 bytes can be faster
 than 16 bytes.

 (I assume ... than 15 bytes).

 However, this is ideal for the code below, where we add either +16 or
 +1, considering also additional byte for the terminator.

 Uros.

Here is the updated patch.  I added CLEAR_CPP_PAD_BUFFER to
be used later for valgrind.  I have no real preferences for 15 vs 16 extra
bytes.


-- 
H.J.
---
2012-11-21  H.J. Lu  hongjiu...@intel.com

PR bootstrap/55380
* charset.c (_cpp_convert_input): Clear CPP_PAD_BUFFER_SIZE
bytes if CLEAR_CPP_PAD_BUFFER isn't 0.  Allocate extra
CPP_PAD_BUFFER_SIZE bytes and clear it if CLEAR_CPP_PAD_BUFFER
isn't 0.

* files.c (read_file_guts): Allocate extra CPP_PAD_BUFFER_SIZE
bytes.

* internal.h (CPP_PAD_BUFFER_SIZE): New.  Defined to 16 for
-fsanitize=address if __SANITIZE_ADDRESS__ is defined.
(CLEAR_CPP_PAD_BUFFER): New.
diff --git a/libcpp/charset.c b/libcpp/charset.c
index cba19a6..1ca94a1 100644
--- a/libcpp/charset.c
+++ b/libcpp/charset.c
@@ -1709,6 +1709,8 @@ _cpp_convert_input (cpp_reader *pfile, const char *input_c
harset,
   to.text = input;
   to.asize = size;
   to.len = len;
+  if (CLEAR_CPP_PAD_BUFFER)
+   memset (input + size, 0, CPP_PAD_BUFFER_SIZE);
 }
   else
 {
@@ -1731,7 +1733,12 @@ _cpp_convert_input (cpp_reader *pfile, const char *input_
charset,
   /* Resize buffer if we allocated substantially too much, or if we
  haven't enough space for the \n-terminator.  */
   if (to.len + 4096  to.asize || to.len = to.asize)
-to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
+{
+  to.text = XRESIZEVEC (uchar, to.text,
+   to.len + CPP_PAD_BUFFER_SIZE + 1);
+  if (CLEAR_CPP_PAD_BUFFER)
+   memset (to.text + to.len + 1, 0, CPP_PAD_BUFFER_SIZE);
+}

   /* If the file is using old-school Mac line endings (\r only),
  terminate with another \r, not an \n, so that we do not mistake
diff --git a/libcpp/files.c b/libcpp/files.c
index 9f84d8c..090f928 100644
--- a/libcpp/files.c
+++ b/libcpp/files.c
@@ -671,7 +671,7 @@ read_file_guts (cpp_reader *pfile, _cpp_file *file)
the majority of C source files.  */
 size = 8 * 1024;

-  buf = XNEWVEC (uchar, size + 1);
+  buf = XNEWVEC (uchar, size + CPP_PAD_BUFFER_SIZE + 1);
   total = 0;
   while ((count = read (file-fd, buf + total, size - total))  0)
 {
@@ -682,7 +682,7 @@ read_file_guts (cpp_reader *pfile, _cpp_file *file)
  if (regular)
break;
  size *= 2;
- buf = XRESIZEVEC (uchar, buf, size + 1);
+ buf = XRESIZEVEC (uchar, buf, size + CPP_PAD_BUFFER_SIZE + 1);
}
 }

diff --git a/libcpp/internal.h b/libcpp/internal.h
index 312b8b5..bc1f311 100644
--- a/libcpp/internal.h
+++ b/libcpp/internal.h
@@ -77,6 +77,16 @@ struct cset_converter
efficiency, and partly to limit runaway recursion.  */
 #define CPP_STACK_MAX 200

+/* Allocate extra 16 bytes for -fsanitize=address.  */
+#ifdef __SANITIZE_ADDRESS__
+#define CPP_PAD_BUFFER_SIZE 16
+#else
+#define CPP_PAD_BUFFER_SIZE 0
+#endif
+
+/* 1 to clear extra 16 bytes.  */
+#define CLEAR_CPP_PAD_BUFFER 0
+
 /* Host alignment handling.  */
 struct dummy
 {


Re: [PATCH] Allocate extra 16 bytes for -fsanitize=address

2012-11-23 Thread Jakub Jelinek
On Fri, Nov 23, 2012 at 11:33:37AM -0800, H.J. Lu wrote:
 2012-11-21  H.J. Lu  hongjiu...@intel.com
 
   PR bootstrap/55380
   * charset.c (_cpp_convert_input): Clear CPP_PAD_BUFFER_SIZE
   bytes if CLEAR_CPP_PAD_BUFFER isn't 0.  Allocate extra
   CPP_PAD_BUFFER_SIZE bytes and clear it if CLEAR_CPP_PAD_BUFFER
   isn't 0.
 
   * files.c (read_file_guts): Allocate extra CPP_PAD_BUFFER_SIZE
   bytes.
 
   * internal.h (CPP_PAD_BUFFER_SIZE): New.  Defined to 16 for
   -fsanitize=address if __SANITIZE_ADDRESS__ is defined.
   (CLEAR_CPP_PAD_BUFFER): New.

I'd say you are making this way too much complicated.
The following patch just changes those + 1 to + 16, adds memset of the 16
bytes unconditionally, but as it also fixes a thinko which IMHO affects
the most common cases (regular file, with no conversion needed), I'd say
it ought to be faster than the old version (the old version would allocate
st.st_size + 1 bytes long buffer, read st.st_size bytes into it,
call _cpp_convert_input with len st.st_size and size st.st_size (the latter
should have been st.st_size + 1, i.e. the allocated size) and thus
to.len = to.asize was true and we called realloc with the same length
as malloc has been called originally.

2012-11-23  Jakub Jelinek  ja...@redhat.com

PR bootstrap/55380
* files.c (read_file_guts): Allocate extra 16 bytes instead of
1 byte at the end of buf.  Pass size + 16 instead of size
to _cpp_convert_input.
* charset.c (_cpp_convert_input): Reallocate if there aren't
at least 16 bytes beyond to.len in the buffer.  Clear 16 bytes
at to.text + to.len.

--- libcpp/files.c.jj   2012-11-22 11:04:24.0 +0100
+++ libcpp/files.c  2012-11-23 20:37:03.146379853 +0100
@@ -671,7 +671,7 @@ read_file_guts (cpp_reader *pfile, _cpp_
the majority of C source files.  */
 size = 8 * 1024;
 
-  buf = XNEWVEC (uchar, size + 1);
+  buf = XNEWVEC (uchar, size + 16);
   total = 0;
   while ((count = read (file-fd, buf + total, size - total))  0)
 {
@@ -682,7 +682,7 @@ read_file_guts (cpp_reader *pfile, _cpp_
  if (regular)
break;
  size *= 2;
- buf = XRESIZEVEC (uchar, buf, size + 1);
+ buf = XRESIZEVEC (uchar, buf, size + 16);
}
 }
 
@@ -699,7 +699,7 @@ read_file_guts (cpp_reader *pfile, _cpp_
 
   file-buffer = _cpp_convert_input (pfile,
 CPP_OPTION (pfile, input_charset),
-buf, size, total,
+buf, size + 16, total,
 file-buffer_start,
 file-st.st_size);
   file-buffer_valid = true;
--- libcpp/charset.c.jj 2011-01-06 10:22:00.0 +0100
+++ libcpp/charset.c2012-11-23 20:40:39.736116642 +0100
@@ -1,6 +1,6 @@
 /* CPP Library - charsets
Copyright (C) 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2006, 2008, 2009,
-   2010 Free Software Foundation, Inc.
+   2010, 2012 Free Software Foundation, Inc.
 
Broken out of c-lex.c Apr 2003, adding valid C99 UCN ranges.
 
@@ -1729,9 +1729,12 @@ _cpp_convert_input (cpp_reader *pfile, c
 iconv_close (input_cset.cd);
 
   /* Resize buffer if we allocated substantially too much, or if we
- haven't enough space for the \n-terminator.  */
-  if (to.len + 4096  to.asize || to.len = to.asize)
-to.text = XRESIZEVEC (uchar, to.text, to.len + 1);
+ haven't enough space for the \n-terminator or following
+ 15 bytes of padding.  */
+  if (to.len + 4096  to.asize || to.len + 16  to.asize)
+to.text = XRESIZEVEC (uchar, to.text, to.len + 16);
+
+  memset (to.text + to.len, '\0', 16);
 
   /* If the file is using old-school Mac line endings (\r only),
  terminate with another \r, not an \n, so that we do not mistake


Jakub