Re: [leaf-devel] Busybox has buggy regex handling
Hello Charles Steinkuehler,

On Thu 2008-02-28 at 23:12 -0600, Charles Steinkuehler wrote:
> Mats Erik Andersson wrote:
> | Hi folks,
> |
> | this is preliminary information that Busybox does
> | not possess a full command of regular expression ...
>
> Can you provide the exact sed code you're working with? I don't have the
> validator.sh code handy to examine directly.

This is not essential! Instead, try the code snippet below.

> It looks like the regular expression you're passing to grep is:
>
> ~ $ echo "^$Trenne$"
> ~ ^\(1\?[0-9]\|2[0-5]\)\(-\(1\?[0-9]\|2[0-5]\)\)\{2\}$
>
> For portability I would suggest directly crafting an extended regular
> expression (rather than escaping all the extended metacharacters in a ...
>
> Minor glitches like this are why the old (2.2 based kernel) releases used
> the 'real' sed. :)
>
> --
> Charles Steinkuehler
> [EMAIL PROTECTED]

I have narrowed down the cause of the failure step by step. It turns out that uClibc cannot handle the desired regular expression. This is the case for uClibc 0.9.28 (on Bering-uC and on an ATNGW100/avr32 board), and since the relevant code has been untouched for quite some time, I expect the same to hold for uClibc 0.9.29.

After some experimentation, a minimal test is as follows:

CODE
#!/bin/sh
lager=/tmp/tillf
# printf rather than echo -e: some /bin/sh implementations print the "-e" literally
{
    printf 'abab\nbaab\nabba\n' | egrep '^(a?[ab]|ba){2}$'
    printf 'abab\nbaab\nabba\n' | egrep '^(ba|a?[ab]){2}$'
} > "$lager"    # or: | tee "$lager"
echo $(wc -l < "$lager") matches out of intended 6.
cat "$lager"
rm "$lager"
END OF CODE

The intended six matches appear on a GNU system and on OpenBSD, but only five appear on Bering-uC 3.1 and on a uClibc-0.9.28-based avr32 system. The problem arises when the alternative containing '?' comes before '|', as in (a?[ab]|ba), but not the other way around, as in (ba|a?[ab]).

Regards
Mats E Andersson

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
leaf-devel mailing list
leaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/leaf-devel
Re: [leaf-devel] Busybox has buggy regex handling
Hi Mats

Just out of curiosity I tested your code on AIX:

# #!/bin/sh
#
# lager=/tmp/tillf
#
# { echo -e "abab\nbaab\nabba" | egrep '^(a?[ab]|ba){2}$'
echo -e "abab\nbaab\nabba" | egrep '^(ba|a?[ab]){2}$'
} > $lager # | tee $lager
#
# echo $(wc -l < $lager) matches out of intended 6.
4 matches out of intended 6.
# cat $lager
baab
abba
baab
abba

On SuSE 10.0:

[EMAIL PROTECTED]:~ echo $(wc -l < $lager) matches out of intended 6.
6 matches out of intended 6.
[EMAIL PROTECTED]:~ cat $lager
abab
baab
abba
abab
baab
abba

On Bering 1.2:

greatwall: -root- # echo $(wc -l < $lager) matches out of intended 6.
5 matches out of intended 6.
greatwall: -root- # cat $lager
abab
abba
abab
baab
abba

On Bering 3.x:

gatekeeper# #!/bin/sh
gatekeeper#
gatekeeper# lager=/tmp/tillf
gatekeeper#
gatekeeper# { echo -e "abab\nbaab\nabba" | egrep '^(a?[ab]|ba){2}$'
echo -e "abab\nbaab\nabba" | egrep '^(ba|a?[ab]){2}$'
} > $lager # | tee $lager
gatekeeper#
gatekeeper# echo $(wc -l < $lager) matches out of intended 6.
5 matches out of intended 6.
gatekeeper# cat $lager
abab
abba
abab
baab
abba

In this selection, the only system that interprets your code 'right' is a full-blown Linux. Mind you, Bering 1.2 has a sed binary, as does AIX.

I guess your code is not very portable then :-(

cheers
Erich
Re: [leaf-devel] Busybox has buggy regex handling
Dear Erich,

Here is one more result, for OpenBSD (see below).

The really bad news is that uClibc (Bering 1.2/3.1) is asymmetric: it gives five hits, although I went to some effort to construct symmetric test cases. For AIX the answer is symmetric, but could you possibly test a deeper grouping of the patterns:

echo -e "abab\nbaab\nabba" | egrep '^((a?[ab])|ba){2}$'
echo -e "abab\nbaab\nabba" | egrep '^(ba|(a?[ab])){2}$'

The additional depth changes nothing for GNU regex, but it targets exactly the point where AIX failed. Does anyone have access to a Solaris system?

Regards
Mats E A

On OpenBSD 4.2:

6 matches out of intended 6.
abab
baab
abba
abab
baab
abba
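A small harness along these lines could run every grouping variant tried in this thread against the same three words and print how many match each one. It is a sketch, assuming a POSIX shell and a grep that supports -E and -c (grep -E is the POSIX spelling of egrep); the function name count_matches is hypothetical. All six patterns describe the same language, so on a conforming engine (the thread reports GNU and OpenBSD) each should report 3.

```shell
#!/bin/sh
# Hypothetical helper: feed the three test words to grep -E and print
# how many lines match the extended regex given as $1.
count_matches() {
    printf 'abab\nbaab\nabba\n' | grep -Ec "$1"
}

# The grouping variants from this thread; all describe the same language,
# so a conforming engine should report 3 for every one of them.
for re in \
    '^(a?[ab]|ba){2}$' \
    '^(ba|a?[ab]){2}$' \
    '^((a?[ab])|ba){2}$' \
    '^(ba|(a?[ab])){2}$' \
    '^((a?[ab])|(ba)){2}$' \
    '^((ba)|(a?[ab])){2}$'
do
    printf '%s: %s of 3\n' "$re" "$(count_matches "$re")"
done
```

Any line reporting fewer than 3 points at the engine rather than the pattern.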
Re: [leaf-devel] Busybox has buggy regex handling
Hi Mats

Mats Erik Andersson wrote:
> Dear Eric,
> [...]
> As for AIX, the answer is symmetric, but could you possibly test a
> deeper grouping of patterns:
>
> echo -e "abab\nbaab\nabba" | egrep '^((a?[ab])|ba){2}$'
> echo -e "abab\nbaab\nabba" | egrep '^(ba|(a?[ab])){2}$'

# echo -e "abab\nbaab\nabba" | egrep '^((a?[ab])|ba){2}$'
baab
abba
# echo -e "abab\nbaab\nabba" | egrep '^(ba|(a?[ab])){2}$'
baab
abba

Regards
Erich
Re: [leaf-devel] Busybox has buggy regex handling
Hello again Erich,

I am beginning to doubt the portability of regular expressions altogether. A fully grouped pattern seems to be what AIX needs:

echo -e "abab\nbaab\nabba" | egrep '^((a?[ab])|(ba)){2}$'
echo -e "abab\nbaab\nabba" | egrep '^((ba)|(a?[ab])){2}$'

If this does not work, AIX is hopeless! Could it be that AIX produces a match for

echo abaaba | egrep '^((a?[ab])|ba){2}$'

which would honestly be terrible?

Back to the original matter. Following the suggestion of Charles Steinkuehler, I have restored the functionality in validator.sh by putting the asymmetry to ugly use:

    $THIS(-$THIS){3}   --->   ($THIS-){3}$THIS
       (broken)                 (functional)

where THIS is similar to (r?a|is), except that in validator.sh I use basic (non-extended) regular expressions; this detail is internal to validator.sh.

As some of you have noticed, I have reported the problem on the uClibc mailing list, so hopefully matters will improve as the next beta phase of Bering-uC unfolds.

Best regards
Mats E A

On Fri 2008-02-29 at 18:20 +0100, Erich Titl wrote:
> Hi Mats
>
> # echo -e "abab\nbaab\nabba" | egrep '^((a?[ab])|ba){2}$'
> baab
> abba
> # echo -e "abab\nbaab\nabba" | egrep '^(ba|(a?[ab])){2}$'
> baab
> abba

I expected abab as well in both cases, as GNU provides.

> Regards
> Erich
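The two forms in the workaround accept the same strings on a conforming engine, which one could check side by side. This is a sketch under stated assumptions: THIS is taken to be exactly '(r?a|is)' (the message only says THIS is similar to that), extended regexes are used rather than the basic ones in validator.sh, and the helper name matches is hypothetical. Per this thread, only the leading-repetition form behaves correctly on uClibc 0.9.28.

```shell
#!/bin/sh
# Illustrative stand-in for one field; the real THIS in validator.sh differs.
THIS='(r?a|is)'
# Repetition on the trailing group: the form reported broken on uClibc 0.9.28.
TRAILING="^$THIS(-$THIS){3}\$"
# Repetition moved to the leading group: the reported workaround.
LEADING="^($THIS-){3}$THIS\$"

# Hypothetical helper: succeed if string $2 matches extended regex $1.
matches() {
    printf '%s\n' "$2" | grep -Eq "$1"
}

for s in ra-is-a-ra a-a-a-a is-ra-a ra-is-a-ra-is; do
    if matches "$TRAILING" "$s"; then t=yes; else t=no; fi
    if matches "$LEADING" "$s"; then l=yes; else l=no; fi
    echo "$s: trailing=$t leading=$l"
done
```

With GNU grep both columns should agree: the four-field strings match, the three- and five-field strings do not.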
[leaf-devel] Busybox has buggy regex handling
Hi folks,

this is preliminary information that Busybox does not have full command of regular expressions to the extent desirable for Bering 3.1. I know for certain that this disturbs the validation I have implemented for Webconf, but those of you who rely on regular expressions in other subsystems of Bering ought to read the following.

The problem is that the regex handling in Busybox cannot correctly resolve repetitions using \{n\}. The following is a reduction of the actual case that disturbs my validation of IP addresses inside /var/webconf/lib/validator.sh.

firewall# ### Matches exactly {0,...,25}
firewall# tjugufem="\(1\?[0-9]\|2[0-5]\)"
firewall# ### Should match n-m-k, where n,m,k in {0,...,25}
firewall# Trenne="$tjugufem\(-$tjugufem\)\{2\}"
firewall#
firewall# echo 25-19-25 | grep "^$Trenne$"
25-19-25
firewall# echo 25-20-25 | grep "^$Trenne$"
firewall#

Using the full-grown sed on my Debian system, the expected match on 25-20-25 does appear, but not so on Bering 3.1. It is not very likely that some other subsystem uses this kind of regular expression, but you ought to take this into consideration until I find time to develop a patch for Busybox, which in any case will not take effect until the next release of Bering!

Best regards
Mats Erik Andersson
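Until the regex engine is fixed, one way to sidestep repetition altogether is to split the string in the shell and check each field numerically. This is only a sketch of an alternative, not what validator.sh does; the function names valid_triple and in_range_0_25 are hypothetical. It aims to accept exactly what Trenne is meant to accept: three fields from 0 to 25 without leading zeros (the set 1?[0-9]|2[0-5] describes), separated by '-'.

```shell
#!/bin/sh
# Hypothetical helper: succeed if $1 is a decimal number 0..25 written
# without leading zeros (the same set that 1?[0-9]|2[0-5] describes).
in_range_0_25() {
    case $1 in
        '' | *[!0-9]*) return 1 ;;   # empty, or contains a non-digit
        0?*) return 1 ;;             # leading zero such as "07"
    esac
    [ "${#1}" -le 2 ] && [ "$1" -le 25 ]
}

# Hypothetical helper: succeed if $1 has the form n-m-k with every field
# accepted by in_range_0_25. Only shell patterns and parameter expansion
# are used; no regular expression is involved.
valid_triple() {
    case $1 in
        *-*-*-*) return 1 ;;         # more than two separators
        *-*-*) ;;                    # exactly two separators
        *) return 1 ;;               # fewer than two separators
    esac
    vt_n=${1%%-*}                    # field before the first '-'
    vt_rest=${1#*-}                  # everything after the first '-'
    vt_m=${vt_rest%%-*}
    vt_k=${vt_rest#*-}
    in_range_0_25 "$vt_n" && in_range_0_25 "$vt_m" && in_range_0_25 "$vt_k"
}

for s in 25-19-25 25-20-25 26-0-0 25-20 07-1-2; do
    if valid_triple "$s"; then echo "$s: ok"; else echo "$s: rejected"; fi
done
```

The design choice here is to trade a single regex for a few lines of shell, so the result no longer depends on the C library's regex code.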
Re: [leaf-devel] Busybox has buggy regex handling
Mats Erik Andersson wrote:
| Hi folks,
|
| this is preliminary information that Busybox does
| not possess a full command of regular expression
| [...]
| firewall# ### Matches exactly {0,...,25}
| firewall# tjugufem="\(1\?[0-9]\|2[0-5]\)"
| firewall# ### Should match n-m-k, where n,m,k in {0,...,25}
| firewall# Trenne="$tjugufem\(-$tjugufem\)\{2\}"
| firewall#
| firewall# echo 25-19-25 | grep "^$Trenne$"
| 25-19-25
| firewall# echo 25-20-25 | grep "^$Trenne$"
| firewall#

Can you provide the exact sed code you're working with? I don't have the validator.sh code handy to examine directly.

It looks like the regular expression you're passing to grep is:

~ $ echo "^$Trenne$"
~ ^\(1\?[0-9]\|2[0-5]\)\(-\(1\?[0-9]\|2[0-5]\)\)\{2\}$

For portability I would suggest directly crafting an extended regular expression (rather than escaping all the extended metacharacters in a standard regex). To GNU sed, the expression above and the ones below are identical, but they might not be to Busybox. Try the following as an extended expression (i.e. egrep or sed -r), and see if it's still buggy:

~ egrep '^(1?[0-9]|2[0-5])(-(1?[0-9]|2[0-5])){2}$'

...and try moving the iterator to the first group:

~ egrep '^((1?[0-9]|2[0-5])-){2}(1?[0-9]|2[0-5])$'

Minor glitches like this are why the old (2.2-based kernel) releases used the 'real' sed. :)

--
Charles Steinkuehler
[EMAIL PROTECTED]
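Charles's two suggestions can be run side by side over a few sample strings. This is a sketch, assuming a POSIX shell and grep -E (the POSIX equivalent of egrep); the variable names ITER_LAST and ITER_FIRST and the sample strings are illustrative. On GNU grep both forms should agree; later in the thread, Mats reports that moving the iterator to the first group is the form that works on uClibc 0.9.28.

```shell
#!/bin/sh
# One field: a number 0..25, Mats's tjugufem written as an ERE.
NUM='(1?[0-9]|2[0-5])'
# First suggestion: iterator on the trailing group.
ITER_LAST="^$NUM(-$NUM){2}\$"
# Second suggestion: iterator moved to the first group.
ITER_FIRST="^($NUM-){2}$NUM\$"

for s in 25-19-25 25-20-25 0-0-0 26-1-1 25-20; do
    last=no; first=no
    printf '%s\n' "$s" | grep -Eq "$ITER_LAST" && last=yes
    printf '%s\n' "$s" | grep -Eq "$ITER_FIRST" && first=yes
    echo "$s: iterator-last=$last iterator-first=$first"
done
```

Building both patterns from one NUM variable keeps the field definition in a single place, so the two forms differ only in where the {2} sits.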