Re: [OpenWrt-Devel] [musl] regex issue / asterisk / musl / sed

2016-03-01 Thread Bastian Bittorf
* Szabolcs Nagy  [29.02.2016 20:35]:
> a possible fix is attached, the handling of ^ and $
> in BRE is suboptimal, but that will need a bigger
> refactoring.

thank you, fixes it for me on x86/UML and MIPS/ar71xx.

bye, bastian
___
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/cgi-bin/mailman/listinfo/openwrt-devel


Re: [OpenWrt-Devel] [musl] regex issue / asterisk / musl / sed

2016-02-29 Thread Szabolcs Nagy
* Szabolcs Nagy  [2016-02-29 14:53:48 +0100]:
> * Bastian Bittorf  [2016-02-29 13:57:36 +0100]:
> > root@box:~ echo 'o*o' | sed -e 's/*/asterisk/g'
> > sed: bad regex '*': Invalid regexp
> > root@box:~ echo 'o*o' | sed -e 's/\*/asterisk/g'
> > oasterisko
> > 
> > it's musl 1.1.14 on OpenWrt / r48814
> > both commands are working fine with glibc and uclibc
> > but the first invocation fails with musl 1.1.14 but
> > works with musl 1.1.13. unsure if the problem is on my
> > side, maybe $you have an idea...
> 
> yes, i introduced this regression in
> http://git.musl-libc.org/cgit/musl/commit/?id=7eaa76fc2e7993582989d3838b1ac32dd8abac09
> 
> because i missed the special * behaviour for BRE,
> but even before that ^* was broken so just reverting
> the patch is not enough, handling * after an anchor
> or assertion correctly needs more code changes.

a possible fix is attached, the handling of ^ and $
in BRE is suboptimal, but that will need a bigger
refactoring.

From b4abe263b2bc0c183274d1aec70cc586e4a46ba1 Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy 
Date: Mon, 29 Feb 2016 15:04:46 +
Subject: [PATCH 1/2] fix * at the start of a BRE subexpression

commit 7eaa76fc2e7993582989d3838b1ac32dd8abac09 made * invalid at
the start of a BRE subexpression, but it should be accepted as
literal * there according to the standard.

This patch does not fix subexpressions starting with ^*.
---
 src/regex/regcomp.c |    4 ----
 1 file changed, 4 deletions(-)

diff --git a/src/regex/regcomp.c b/src/regex/regcomp.c
index da6abd1..7a2864c 100644
--- a/src/regex/regcomp.c
+++ b/src/regex/regcomp.c
@@ -889,7 +889,6 @@ static reg_errcode_t parse_atom(tre_parse_ctx_t *ctx, const char *s)
 		s++;
 		break;
 	case '*':
-		return REG_BADPAT;
 	case '{':
 	case '+':
 	case '?':
@@ -978,9 +977,6 @@ static reg_errcode_t tre_parse(tre_parse_ctx_t *ctx)
 		}
 
 	parse_iter:
-		/* extension: repetitions are rejected after an empty node
-		   eg. (+), |*, {2}, but assertions are not treated as empty
-		   so ^* or $? are accepted currently. */
 		for (;;) {
 			int min, max;
 
-- 
1.7.9.5

From d24223c8b344ab3c58f1b9200379bd5349bb8cee Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy 
Date: Mon, 29 Feb 2016 16:36:25 +
Subject: [PATCH 2/2] fix ^* at the start of a complete BRE

This is a workaround to treat * as literal * at the start of a BRE.

Ideally ^ would be treated as an anchor at the start of any BRE
subexpression and similarly $ would be an anchor at the end of any
subexpression.  This is not required by the standard and hard to do
with the current code, but it's the existing practice.  If it is
changed, * should be treated as literal after such anchor as well.
---
 src/regex/regcomp.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/regex/regcomp.c b/src/regex/regcomp.c
index 7a2864c..5fad98b 100644
--- a/src/regex/regcomp.c
+++ b/src/regex/regcomp.c
@@ -994,6 +994,10 @@ static reg_errcode_t tre_parse(tre_parse_ctx_t *ctx)
 			if (*s=='\\')
 				s++;
 
+			/* handle ^* at the start of a complete BRE. */
+			if (!ere && s==ctx->re+1 && s[-1]=='^')
+				break;
+
 			/* extension: multiple consecutive *+?{,} is unspecified,
 			   but (a+)+ has to be supported so accepting a++ makes
 			   sense, note however that the RE_DUP_MAX limit can be
-- 
1.7.9.5
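
For reference, the behaviour these two patches aim for can be checked
from C through the POSIX regcomp()/regexec() interface. This is only a
sketch: bre_match_len() is a hypothetical helper written for this
illustration, not part of musl, and the results mentioned below assume
a libc that treats * as a literal at the start of a BRE, right after \(
and right after a leading ^.

```c
#include <assert.h>
#include <regex.h>

/* Hypothetical helper (not a musl function): compile `pattern` as a
   POSIX BRE (cflags == 0) and return the length in bytes of the first
   match in `text`. Returns -1 if nothing matches and -2 if regcomp()
   rejects the pattern. */
int bre_match_len(const char *pattern, const char *text)
{
	regex_t re;
	regmatch_t m;
	int len;

	assert(pattern && text);
	if (regcomp(&re, pattern, 0) != 0)
		return -2;
	if (regexec(&re, text, 1, &m, 0) == 0)
		len = (int)(m.rm_eo - m.rm_so);
	else
		len = -1;
	regfree(&re);
	return len;
}
```

With such a libc, bre_match_len("\\(*\\)", "a*b") should give 1 and
bre_match_len("^*a", "*a") should give 2, since * is a literal in both
patterns. bre_match_len("a*", "aaa") gives 3, because there * follows
an atom and is an ordinary repetition.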



Re: [OpenWrt-Devel] [musl] regex issue / asterisk / musl / sed

2016-02-29 Thread Szabolcs Nagy
* Bastian Bittorf  [2016-02-29 13:57:36 +0100]:
> root@box:~ echo 'o*o' | sed -e 's/*/asterisk/g'
> sed: bad regex '*': Invalid regexp
> root@box:~ echo 'o*o' | sed -e 's/\*/asterisk/g'
> oasterisko
> 
> it's musl 1.1.14 on OpenWrt / r48814
> both commands are working fine with glibc and uclibc
> but the first invocation fails with musl 1.1.14 but
> works with musl 1.1.13. unsure if the problem is on my
> side, maybe $you have an idea...

yes, i introduced this regression in
http://git.musl-libc.org/cgit/musl/commit/?id=7eaa76fc2e7993582989d3838b1ac32dd8abac09

because i missed the special * behaviour for BRE,
but even before that ^* was broken so just reverting
the patch is not enough, handling * after an anchor
or assertion correctly needs more code changes.
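
The report can also be reproduced without sed by calling regcomp()
directly with cflags 0 (BRE), assuming sed hands the pattern to
regcomp() unchanged. A minimal sketch follows; bre_first_match() is a
hypothetical helper name used only for this illustration.

```c
#include <assert.h>
#include <regex.h>

/* Hypothetical helper (not part of musl or sed): compile `pattern` as a
   POSIX BRE (cflags == 0) and return the byte offset of the first match
   in `text`. Returns -1 if nothing matches and -2 if regcomp() rejects
   the pattern, which is the failure sed reports above as
   "Invalid regexp". */
int bre_first_match(const char *pattern, const char *text)
{
	regex_t re;
	regmatch_t m;
	int off;

	assert(pattern && text);
	if (regcomp(&re, pattern, 0) != 0)
		return -2;
	off = regexec(&re, text, 1, &m, 0) == 0 ? (int)m.rm_so : -1;
	regfree(&re);
	return off;
}
```

On a libc that accepts a leading * as a literal, both
bre_first_match("*", "o*o") and bre_first_match("\\*", "o*o") return 1,
the offset of the asterisk. A return value of -2 for the first call
would correspond to the regression described here.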