* Szabolcs Nagy <n...@port70.net> [2016-02-29 14:53:48 +0100]:
> * Bastian Bittorf <bitt...@bluebottle.com> [2016-02-29 13:57:36 +0100]:
> > root@box:~ echo 'o*o' | sed -e 's/*/asterisk/g'
> > sed: bad regex '*': Invalid regexp
> > root@box:~ echo 'o*o' | sed -e 's/\*/asterisk/g'
> > oasterisko
> >
> > it's musl 1.1.14 on OpenWrt / r48814
> > both commands are working fine with glibc and uclibc
> > but the first invocation fails with musl 1.1.14 but
> > works with musl 1.1.13. unsure if the prob is on my
> > side, maybe $you have an idea...
>
> yes, i introduced this regression in
> http://git.musl-libc.org/cgit/musl/commit/?id=7eaa76fc2e7993582989d3838b1ac32dd8abac09
>
> because i missed the special * behaviour for BRE,
> but even before that ^* was broken so just reverting
> the patch is not enough, handling * after an anchor
> or assertion correctly needs more code changes.

a possible fix is attached. the handling of ^ and $ in BRE
is still suboptimal, but fixing that will need a bigger
refactoring.

From b4abe263b2bc0c183274d1aec70cc586e4a46ba1 Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <n...@port70.net>
Date: Mon, 29 Feb 2016 15:04:46 +0000
Subject: [PATCH 1/2] fix * at the start of a BRE subexpression

commit 7eaa76fc2e7993582989d3838b1ac32dd8abac09 made * invalid at
the start of a BRE subexpression, but it should be accepted as
literal * there according to the standard.

This patch does not fix subexpressions starting with ^*.
---
 src/regex/regcomp.c |    4 ----
 1 file changed, 4 deletions(-)

diff --git a/src/regex/regcomp.c b/src/regex/regcomp.c
index da6abd1..7a2864c 100644
--- a/src/regex/regcomp.c
+++ b/src/regex/regcomp.c
@@ -889,7 +889,6 @@ static reg_errcode_t parse_atom(tre_parse_ctx_t *ctx, const char *s)
 		s++;
 		break;
 	case '*':
-		return REG_BADPAT;
 	case '{':
 	case '+':
 	case '?':
@@ -978,9 +977,6 @@ static reg_errcode_t tre_parse(tre_parse_ctx_t *ctx)
 		}
 
 	parse_iter:
-		/* extension: repetitions are rejected after an empty node
-		   eg. (+), |*, {2}, but assertions are not treated as empty
-		   so ^* or $? are accepted currently. */
 		for (;;) {
 			int min, max;
 
-- 
1.7.9.5

From d24223c8b344ab3c58f1b9200379bd5349bb8cee Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <n...@port70.net>
Date: Mon, 29 Feb 2016 16:36:25 +0000
Subject: [PATCH 2/2] fix ^* at the start of a complete BRE

This is a workaround to treat * as literal * at the start of a BRE.

Ideally ^ would be treated as an anchor at the start of any BRE
subexpression and similarly $ would be an anchor at the end of any
subexpression.  This is not required by the standard and hard to do
with the current code, but it's the existing practice.  If it is
changed, * should be treated as literal after such anchor as well.
---
 src/regex/regcomp.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/regex/regcomp.c b/src/regex/regcomp.c
index 7a2864c..5fad98b 100644
--- a/src/regex/regcomp.c
+++ b/src/regex/regcomp.c
@@ -994,6 +994,10 @@ static reg_errcode_t tre_parse(tre_parse_ctx_t *ctx)
 			if (*s=='\\')
 				s++;
 
+			/* handle ^* at the start of a complete BRE. */
+			if (!ere && s==ctx->re+1 && s[-1]=='^')
+				break;
+
 			/* extension: multiple consecutive *+?{,} is unspecified,
 			   but (a+)+ has to be supported so accepting a++ makes
 			   sense, note however that the RE_DUP_MAX limit can be
-- 
1.7.9.5
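
for anyone who wants to check a libc against the cases discussed in this thread, a minimal C sketch follows. it is not part of either patch: bre_matches is a hypothetical helper name, and it only relies on the POSIX regcomp/regexec/regfree calls. the assumption (taken from the commit messages, not verified here against every libc) is that a BRE starting with *, \(* or ^* should compile and treat the * as a literal character.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only.
 * Compiles `pattern` as a POSIX basic regular expression (BRE) and
 * runs it against `text`.
 * Returns  1 if the pattern compiles and matches,
 *          0 if it compiles but does not match,
 *         -1 if regcomp() rejects the pattern.
 */
static int bre_matches(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	assert(pattern != NULL && text != NULL);

	/* no REG_EXTENDED in cflags, so the pattern is parsed as a BRE */
	rc = regcomp(&re, pattern, REG_NOSUB);
	if (rc != 0)
		return -1;

	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0 ? 1 : 0;
}
```

calling bre_matches("*", "o*o") corresponds to the failing sed invocation in the report, bre_matches("\\(*\\)", "o*o") to the subexpression case from patch 1, and bre_matches("^*", "*o") to the case from patch 2. a return value of -1 would indicate the regression.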