Re: bug#32592: heap-use-after-free in regex module

2018-09-06 Thread Jim Meyering
On Thu, Sep 6, 2018 at 12:18 AM Paul Eggert  wrote:
> Jim Meyering wrote:
> > I couldn't help but notice this nonsense right after the line
> > you inserted:
> >
> >if (err == REG_NOMATCH)
> >  continue;
> >  }
> >
> > That is an "if (...) continue;" just before the closing brace of a
> > for-loop. Those two lines constitute a no-op and should be removed,
> > though not as part of your change.
>
> Actually I think the abovementioned code should be kept, and the nonsense 
> comes
> from the fact that some code is missing after the "if". When err != 
> REG_NOMATCH
> && err != REG_NOERROR, the function should exit the loop and return 
> immediately,
> because there is a memory allocation error in a subroutine.
>
> What a coincidence that we would find two bugs right next to each other, 
> huh?...

Indeed. Glad you realized that.



Re: bug#32592: heap-use-after-free in regex module

2018-09-06 Thread Paul Eggert

Assaf Gordon wrote:

Speaking of coincidences,
I just found this use-after-free bug was already reported (but not fixed)
back in 2015:https://sourceware.org/bugzilla/show_bug.cgi?id=18040  .


Thanks, I had looked for a duplicate bug report before filing glibc bug 23609 
but did not find that one. I have added notes to glibc bugs 18040 and 23609 
suggesting that they be merged (which is apparently not something I can do via 
the web UI).




Re: bug#32592: heap-use-after-free in regex module

2018-09-06 Thread Assaf Gordon
Thank you all for the review and comments.

On Thu, Sep 6, 2018 at 1:18 AM, Paul Eggert  wrote:
> What a coincidence that we would find two bugs right next to each other,
> huh?...
>
> I filed a bug report against glibc, and unless there's an objection I would
> like to fix both bugs in glibc and propagate the fix into gnulib. Please see
> the glibc bug here:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=23609

Speaking of coincidences,
I just found this use-after-free bug was already reported (but not fixed)
back in 2015: https://sourceware.org/bugzilla/show_bug.cgi?id=18040 .


regards,
 - assaf



Re: bug#32592: heap-use-after-free in regex module

2018-09-06 Thread Paul Eggert

Jim Meyering wrote:

I couldn't help but notice this nonsense right after the line
you inserted:

   if (err == REG_NOMATCH)
 continue;
 }

That is an "if (...) continue;" just before the closing brace of a
for-loop. Those two lines constitute a no-op and should be removed,
though not as part of your change.


Actually I think the abovementioned code should be kept, and the nonsense comes 
from the fact that some code is missing after the "if". When err != REG_NOMATCH 
&& err != REG_NOERROR, the function should exit the loop and return immediately, 
because there is a memory allocation error in a subroutine.


What a coincidence that we would find two bugs right next to each other, huh?...

I filed a bug report against glibc, and unless there's an objection I would like 
to fix both bugs in glibc and propagate the fix into gnulib. Please see the 
glibc bug here:


https://sourceware.org/bugzilla/show_bug.cgi?id=23609



Re: bug#32592: heap-use-after-free in regex module

2018-09-05 Thread Jim Meyering
On Wed, Sep 5, 2018 at 6:28 PM Assaf Gordon  wrote:
>
> Bruno alerted me off-list:
>
> On 05/09/18 07:19 PM, Bruno Haible wrote:
>  > Is the ChangeLog entry up-to-date?
>  >
>  > +* regexec.c (get_subexp): Update 'buf' after call to get_subexp_sub.
>  > +Additionally, check for allocation errors and bail out if needed.
>  >
>  > I don't see a code change for
>  > "check for allocation errors and bail out if needed".
>
> Thanks!
>
> I initially had a check for REG_NOERROR there, but removed it.
>
> Attached an updated patch without the outdated comment.

Very nice work!

Your change looks fine: set "buf" to account for potentially-moved
allocation, just as is done on three other lines above.
However, I couldn't help but notice this nonsense right after the line
you inserted:

  if (err == REG_NOMATCH)
continue;
}

That is an "if (...) continue;" just before the closing brace of a
for-loop. Those two lines constitute a no-op and should be removed,
though not as part of your change.



Re: bug#32592: heap-use-after-free in regex module

2018-09-05 Thread Jim Meyering
On Wed, Sep 5, 2018 at 6:08 PM Assaf Gordon  wrote:
> Assuming the gnulib bugfix is valid (in my previous email),
> I suggest adding the following test to sed (after updating gnulib).

Thank you, Assaf.
Only tiny suggestions:


sed-test-tweak.diff
Description: Binary data


Re: bug#32592: heap-use-after-free in regex module

2018-09-05 Thread Assaf Gordon

Bruno alerted me off-list:

On 05/09/18 07:19 PM, Bruno Haible wrote:
> Is the ChangeLog entry up-to-date?
>
> +  * regexec.c (get_subexp): Update 'buf' after call to get_subexp_sub.
> +  Additionally, check for allocation errors and bail out if needed.
>
> I don't see a code change for
> "check for allocation errors and bail out if needed".

Thanks!

I initially had a check for REG_NOERROR there, but removed it.

Attached an updated patch without the outdated comment.

-assaf


>From 3e6bc87d1a8dc6e22c6d60d06aef0b0b6cb03a49 Mon Sep 17 00:00:00 2001
From: Assaf Gordon 
Date: Wed, 5 Sep 2018 17:40:28 -0600
Subject: [PATCH] regex: fix heap-use-after-free error

Problem reported by Saito Takaaki  in
https://debbugs.gnu.org/32592
Calling get_subexp() -> get_subexp_sub() -> clean_state_log_if_needed() may
call extend_buffers() which reallocates the re_string_t's internal buffer.
Local variable 'buf' was not updated in such case, resulting in
use-after-free.
* regexec.c (get_subexp): Update 'buf' after calling get_subexp_sub.
---
 ChangeLog | 11 +++
 lib/regexec.c |  1 +
 2 files changed, 12 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index 23689545a..e3c01c644 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,14 @@
+2018-09-05  Assaf Gordon 
+
+	regex: fix heap-use-after-free error
+	Problem reported by Saito Takaaki  in
+	https://debbugs.gnu.org/32592
+	Call stack get_subexp->get_subexp_sub->clean_state_log_if_needed may
+	call extend_buffers which reallocates the re_string_t internal buffer.
+	Local variable 'buf' was not updated in such case, resulting in
+	use-after-free.
+	* regexec.c (get_subexp): Update 'buf' after call to get_subexp_sub.
+
 2018-09-05  Eric Blake  
 
 	doc: mention environ pitfall
diff --git a/lib/regexec.c b/lib/regexec.c
index 73644c234..61a4ea26d 100644
--- a/lib/regexec.c
+++ b/lib/regexec.c
@@ -2777,6 +2777,7 @@ get_subexp (re_match_context_t *mctx, Idx bkref_node, Idx bkref_str_idx)
 	return REG_ESPACE;
 	  err = get_subexp_sub (mctx, sub_top, sub_last, bkref_node,
 bkref_str_idx);
+	  buf = (const char *) re_string_get_buffer (>input);
 	  if (err == REG_NOMATCH)
 	continue;
 	}
-- 
2.11.0



Re: bug#32592: heap-use-after-free in regex module

2018-09-05 Thread Assaf Gordon

Hello,
Assuming the gnulib bugfix is valid (in my previous email),
I suggest adding the following test to sed (after updating gnulib).

comments welcomed,
 - assaf


>From bc2794c76cd4202df5172bdbe364a4006e6edbe6 Mon Sep 17 00:00:00 2001
From: Assaf Gordon 
Date: Wed, 5 Sep 2018 18:58:55 -0600
Subject: [PATCH] sed: fix heap-use-after-free bug in s/// command

sed would accesses freed memory when given specific backreferences
in 's' command. Reported by Saito Takaaki 
in https://debbugs.gnu.org/32592 .

This is a gnulib/glibc bug which can be triggered by sed.
If the bug is detected, it is recommended to rebuild sed with the built-in
regex engine (./configure --with-included-regex).

Example:

echo 'abcdefghijk!!abcdefghijk!!' \
| valgrind ./sed/sed -E 's/(.+).*\1//i'

* NEWS: Mention the fix.
* testsuite/bug32592.sh: Test for the bug.
* testsuite/local.mk (T): Add new test.
---
 NEWS  |   6 +++
 testsuite/bug32592.sh | 140 ++
 testsuite/local.mk|   1 +
 3 files changed, 147 insertions(+)
 create mode 100755 testsuite/bug32592.sh

diff --git a/NEWS b/NEWS
index e25d26b..ecbba45 100644
--- a/NEWS
+++ b/NEWS
@@ -20,6 +20,12 @@ GNU sed NEWS-*- outline -*-
   sed no longer accesses invalid memory (heap overflow) with s/$//n regexes.
   [bug#32271, present since sed-4.3].
 
+  sed no longer accesses freed memory when given specific backreferences
+  in 's' command. This is a gnulib/glibc bug which can be triggered by sed.
+  If the bug is detected, it is recommended to rebuild sed with the built-in
+  regex engine (./configure --with-included-regex) [bug#32592, present at
+  least since sed-4.0.6].
+
 
 * Noteworthy changes in release 4.5 (2018-03-31) [stable]
 
diff --git a/testsuite/bug32592.sh b/testsuite/bug32592.sh
new file mode 100755
index 000..863768e
--- /dev/null
+++ b/testsuite/bug32592.sh
@@ -0,0 +1,140 @@
+#!/bin/sh
+# sed would access freed memory under certain uses.
+# Before sed-4.6, this would result in "Invalid read of size 1" and
+# "Address 0x56096d0 is 16 bytes inside a block of size 42 free'd"
+#
+# The root cause is a gnulib/glibc bug (not sed code).
+# If this test fail, it is recommended to build sed with internal
+# regex implementation (./configure --with-included-regex).
+
+# Copyright (C) 2018 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+
+## This warning will be shown on top in the test-suite.log file - hopefully
+## users/testers will see it and avoid submitting false-positive bugs.
+printf "%s\n" \
+   "" \
+   "===" \
+   "===" \
+   "" \
+   "  If this test failed (bug32592) you are likely using a buggy glibc." \
+   "  consider recompiling with './configure --with-internal-regex'" \
+   "" \
+   "" \
+   "" \
+   ""
+
+
+. "${srcdir=.}/testsuite/init.sh"; path_prepend_ ./sed
+print_ver_ sed
+
+require_valgrind_
+
+
+
+##
+## Test 1: minimal reproducer
+##
+
+printf 'abcdefghijk!!!abcdefghijk!!!' > in1 \
+  || framework_failure_
+printf 's/(.+).*\1//i' > prog1 || framework_failure_
+
+valgrind --quiet --error-exitcode=1 \
+  sed -E 's/(.+).*\1//i' in1 > out1 2> err1 || fail=1
+
+echo "valgrind report for 'test1':"
+echo "=="
+cat err1
+echo "=="
+
+# Work around a bug in CentOS 5.10's valgrind
+# FIXME: remove in 2018 or when CentOS 5 is no longer officially supported
+grep 'valgrind: .*Assertion.*failed' err-posix > /dev/null \
+  && skip_ 'you seem to have a buggy version of valgrind'
+
+compare /dev/null out1 || fail=1
+compare /dev/null err1 || fail=1
+
+
+
+
+##
+## Test 2: The original bug report
+##
+
+cat<<\EOF>prog2 || framework_failure_
+s/$/\nabcdefghijklmnopqrstuvwxyz/
+s/\(\(.\).\+\(.\)\)\(.*\n.*\1\)/\2-\3\4/i
+P
+d
+EOF
+
+cat<<\EOF>in2 || framework_failure_
+abcdefghijklmns!!
+abcdefghijklmns!
+abcdefghijklmns
+abcdefghijklmns!!!
+abcdefghijklmns!!
+abcdefghijklmns!!
+EOF
+

Re: bug#32592: heap-use-after-free in regex module

2018-09-05 Thread Assaf Gordon

Hello,

On Wed, Sep 5, 2018 at 12:32 AM Assaf Gordon  wrote:  On 04/09/18 07:02 PM, Saito Takaaki wrote:>>> 
https://ideone.com/Sq5xJX>> I hope this helps even a bit. The 
linked snippet you provided exposed a heap-use-after-free bug

in gnulib's regex module (possibly in glibc as well).


Please find the attached patch as a suggested fix.

Comments and review very welcomed,
 - assaf
>From d58391ad0377f0fde07e8f83bff8125772d3 Mon Sep 17 00:00:00 2001
From: Assaf Gordon 
Date: Wed, 5 Sep 2018 17:40:28 -0600
Subject: [PATCH] regex: fix heap-use-after-free error

Problem reported by Saito Takaaki  in
https://debbugs.gnu.org/32592
Calling get_subexp() -> get_subexp_sub() -> clean_state_log_if_needed() may
call extend_buffers() which reallocates the re_string_t's internal buffer.
Local variable 'buf' was not updated in such case, resulting in
use-after-free.
* regexec.c (get_subexp): Update 'buf' after calling get_subexp_sub.
---
 ChangeLog | 12 
 lib/regexec.c |  1 +
 2 files changed, 13 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index 23689545a..3cafe2177 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,15 @@
+2018-09-05  Assaf Gordon 
+
+	regex: fix heap-use-after-free error
+	Problem reported by Saito Takaaki  in
+	https://debbugs.gnu.org/32592
+	Call stack get_subexp->get_subexp_sub->clean_state_log_if_needed may
+	call extend_buffers which reallocates the re_string_t internal buffer.
+	Local variable 'buf' was not updated in such case, resulting in
+	use-after-free.
+	* regexec.c (get_subexp): Update 'buf' after call to get_subexp_sub.
+	Additionally, check for allocation errors and bail out if needed.
+
 2018-09-05  Eric Blake  
 
 	doc: mention environ pitfall
diff --git a/lib/regexec.c b/lib/regexec.c
index 73644c234..61a4ea26d 100644
--- a/lib/regexec.c
+++ b/lib/regexec.c
@@ -2777,6 +2777,7 @@ get_subexp (re_match_context_t *mctx, Idx bkref_node, Idx bkref_str_idx)
 	return REG_ESPACE;
 	  err = get_subexp_sub (mctx, sub_top, sub_last, bkref_node,
 bkref_str_idx);
+	  buf = (const char *) re_string_get_buffer (>input);
 	  if (err == REG_NOMATCH)
 	continue;
 	}
-- 
2.11.0



Re: bug#32592: heap-use-after-free in regex module (was: s with i modifier seems to work incorrectly)

2018-09-05 Thread Jim Meyering
On Wed, Sep 5, 2018 at 12:32 AM Assaf Gordon  wrote:
>
> (adding gnulib)
>
> On 04/09/18 07:02 PM, Saito Takaaki wrote:
> [... discussing a sed bug ...]
> > However, a friend showed me a more complex case which is
> > problematic even with sed 4.4 on ideone.  The last two lines of the
> > output (for the identical input lines) are  particularly interesting.
> > https://ideone.com/Sq5xJX
> >
> > I hope this helps even a bit.
>
> Thank you for persisting with this bug.
>
> The linked snippet you provided exposed a heap-use-after-free bug
> in gnulib's regex module (possibly in glibc as well).
>
> A simple way to reproduce with latest sed:
>
>cd sed
>./bootstrap
>./configure --with-included-regex
>make
>echo 'abcdefghijklmns!!' \
>   | valgrind ./sed/sed -E 'h;G;s/((.).+(.))(.*\n.*\1)/\2-\3\4/i'
>
> Results in a use-after-free relating to the back-references (valgrind
> output below). There's some interplay with the input length - if the
> exclamation marks are removed, the bug is not triggered.
> The bug does not trigger without the case-insensitive flag (s///i).
>
> This is easier to trigger with gnulib (hence --with-included-regex)
> but happens also with glibc's regex module.
>
> This could also mean that the bug you previously reported and I surmised
> was fixed is not fixed at all - could be that it was just much harder to
> trigger with later sed versions.
>
> I'm still learning the code so don't have a fix yet.

Wow, another!?! Thanks for pursuing!