Re: [leaf-devel] Busybox has buggy regex handling

2008-02-29 Thread Mats Erik Andersson
Hello Charles Steinkuehler,

On Thu, 2008-02-28 at 23:12 -0600, Charles Steinkuehler wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> Mats Erik Andersson wrote:
> | Hi folks,
> |
> | this is preliminary information that Busybox does
> | not possess a full command of regular expressions
> ...
> Can you provide the exact sed code you're working with?  I don't have
> the validator.sh code handy to examine directly.

This is not essential! Instead try the code snippet below.

> It looks like the regular expression you're passing to grep is:
>
> ~  $ echo ^$Trenne$
> ~  ^\(1\?[0-9]\|2[0-5]\)\(-\(1\?[0-9]\|2[0-5]\)\)\{2\}$
>
> For portability I would suggest directly crafting an extended regular
> expression (rather than escaping all the extended metacharacters in a
> ...
> Minor glitches like this are why the old (2.2 based kernel) releases
> used the 'real' sed.  :)
>
> - --
> Charles Steinkuehler
> [EMAIL PROTECTED]

I have narrowed down the cause of the failure step by step.
It turns out that uClibc cannot handle the desired
regular expression. This is the case for 0.9.28
(on Bering-uC and ATNGW100/avr32), and since the
relevant code base has been untouched for quite
some time, I expect the same to hold for
uClibc 0.9.29.

After some experimentation, a minimal test is as follows:

 CODE 
#!/bin/sh

lager=/tmp/tillf

{
echo -e "abab\nbaab\nabba" | egrep '^(a?[ab]|ba){2}$'
echo -e "abab\nbaab\nabba" | egrep '^(ba|a?[ab]){2}$'
} > $lager # | tee $lager

echo $(wc -l < $lager) matches out of intended 6.
cat $lager
rm $lager
 END OF CODE 

The intended six matches appear on a GNU-system
and on OpenBSD, but only five matches appear on
Bering-uC 3.1 and a uC-0.9.28-based avr32-system.
The problem arises when the '?' precedes the '|', but not
the other way around.
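
A per-pattern variant of the test can make the asymmetry easier to
see. The sketch below is only an illustration: the helper name
check_pattern is invented here, grep -E is used in place of egrep, and
printf replaces echo -e because POSIX does not specify -e for echo.
On a system without the fault, both lines should report "none".

```shell
#!/bin/sh
# Run each pattern on its own and name the inputs it fails to match.
inputs="abab baab abba"

# check_pattern PATTERN: print PATTERN followed by the inputs it missed.
check_pattern() {
    pattern=$1
    missed=""
    for s in $inputs; do
        if ! printf '%s\n' "$s" | grep -Eq -- "$pattern"; then
            missed="$missed $s"
        fi
    done
    printf '%s missed:%s\n' "$pattern" "${missed:- none}"
}

check_pattern '^(a?[ab]|ba){2}$'
check_pattern '^(ba|a?[ab]){2}$'
```

If the two lines differ, the alternation order alone changes the
result, which is the asymmetry described above.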

Regards

Mats E Andersson


-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/

___
leaf-devel mailing list
leaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/leaf-devel


Re: [leaf-devel] Busybox has buggy regex handling

2008-02-29 Thread Erich Titl
Hi Mats

Just out of curiosity I tested your code

 
on AIX
 
# #!/bin/sh
#
# lager=/tmp/tillf
#
# {
  echo -e "abab\nbaab\nabba" | egrep '^(a?[ab]|ba){2}$'
  echo -e "abab\nbaab\nabba" | egrep '^(ba|a?[ab]){2}$'
  } > $lager # | tee $lager
#
# echo $(wc -l < $lager) matches out of intended 6.
4 matches out of intended 6.
# cat $lager
baab
abba
baab
abba
 
On a SuSE 10.0
 
[EMAIL PROTECTED]:~> echo $(wc -l < $lager) matches out of intended 6.
6 matches out of intended 6.
[EMAIL PROTECTED]:~> cat $lager
abab
baab
abba
abab
baab
abba
 
On Bering 1.2
 
greatwall: -root-
# echo $(wc -l < $lager) matches out of intended 6.
   5  matches out of intended 6.

greatwall: -root-
# cat $lager
abab
abba
abab
baab
abba
 
On Bering 3.x
 
gatekeeper# #!/bin/sh
gatekeeper#
gatekeeper# lager=/tmp/tillf
gatekeeper#
gatekeeper# {
  echo -e "abab\nbaab\nabba" | egrep '^(a?[ab]|ba){2}$'
  echo -e "abab\nbaab\nabba" | egrep '^(ba|a?[ab]){2}$'
  } > $lager # | tee $lager
gatekeeper#
gatekeeper# echo $(wc -l < $lager) matches out of intended 6.
5 matches out of intended 6.
gatekeeper# cat $lager
abab
abba
abab
baab
abba

In this selection, the only system that interprets your code 'right' is
a full-blown Linux. Mind you, Bering 1.2 has a sed binary, as does AIX.

I guess your code is not very portable then :-(

cheers

Erich




Re: [leaf-devel] Busybox has buggy regex handling

2008-02-29 Thread Mats Erik Andersson
Dear Erich,

I will leave all your code for reference, and add one more
for OpenBSD. The really bad news is that uClibc (Bering 1.2/3.1)
is asymmetric: five hits, although I took care to construct
symmetric test cases. As for AIX, the answer is symmetric, but
could you possibly test a deeper grouping of patterns:

echo -e "abab\nbaab\nabba" | egrep '^((a?[ab])|ba){2}$'
echo -e "abab\nbaab\nabba" | egrep '^(ba|(a?[ab])){2}$'

This additional depth changes nothing for GNU regex,
but it targets exactly the point where AIX failed.

Does anyone have access to a Solaris system?

Regards

Mats E A 


on OpenBSD 4.2

   6  matches out of intended 6.
abab
baab
abba
abab
baab
abba

On Fri, 2008-02-29 at 13:26 +, Erich Titl wrote:
> Hi Mats
>
> Just out of curiosity I tested your code
>
> on AIX
>
> # #!/bin/sh
> #
> # lager=/tmp/tillf
> #
> # {
>   echo -e "abab\nbaab\nabba" | egrep '^(a?[ab]|ba){2}$'
>   echo -e "abab\nbaab\nabba" | egrep '^(ba|a?[ab]){2}$'
>   } > $lager # | tee $lager
> #
> # echo $(wc -l < $lager) matches out of intended 6.
> 4 matches out of intended 6.
> # cat $lager
> baab
> abba
> baab
> abba
>
> On a SuSE 10.0
>
> [EMAIL PROTECTED]:~> echo $(wc -l < $lager) matches out of intended 6.
> 6 matches out of intended 6.
> [EMAIL PROTECTED]:~> cat $lager
> abab
> baab
> abba
> abab
> baab
> abba
>
> On Bering 1.2
>
> greatwall: -root-
> # echo $(wc -l < $lager) matches out of intended 6.
>    5  matches out of intended 6.
>
> greatwall: -root-
> # cat $lager
> abab
> abba
> abab
> baab
> abba
>
> On Bering 3.x
>
> gatekeeper# #!/bin/sh
> gatekeeper#
> gatekeeper# lager=/tmp/tillf
> gatekeeper#
> gatekeeper# {
>   echo -e "abab\nbaab\nabba" | egrep '^(a?[ab]|ba){2}$'
>   echo -e "abab\nbaab\nabba" | egrep '^(ba|a?[ab]){2}$'
>   } > $lager # | tee $lager
> gatekeeper#
> gatekeeper# echo $(wc -l < $lager) matches out of intended 6.
> 5 matches out of intended 6.
> gatekeeper# cat $lager
> abab
> abba
> abab
> baab
> abba
>
> In this selection, the only system that interprets your code 'right' is
> a full-blown Linux. Mind you, Bering 1.2 has a sed binary, as does AIX.
>
> I guess your code is not very portable then :-(
>
> cheers
>
> Erich
 



Re: [leaf-devel] Busybox has buggy regex handling

2008-02-29 Thread Erich Titl
Hi Mats

Mats Erik Andersson wrote:
> Dear Erich,
>
> I will leave all your code for reference, and add one more
> for OpenBSD. The really bad news is that uClibc (Bering 1.2/3.1)
> is asymmetric: five hits, although I took care to construct
> symmetric test cases. As for AIX, the answer is symmetric, but
> could you possibly test a deeper grouping of patterns:
>
> echo -e "abab\nbaab\nabba" | egrep '^((a?[ab])|ba){2}$'
> echo -e "abab\nbaab\nabba" | egrep '^(ba|(a?[ab])){2}$'

# echo -e "abab\nbaab\nabba" | egrep '^((a?[ab])|ba){2}$'
baab
abba
# echo -e "abab\nbaab\nabba" | egrep '^(ba|(a?[ab])){2}$'
baab
abba
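
One possible explanation for the AIX result, offered here as an
untested hypothesis rather than a finding from this thread: echo -e is
not specified by POSIX, and an echo that does not recognise -e may
print it literally while still expanding \n. The first input line
would then be "-e abab", which neither pattern can match, and the
output would be baab and abba for both patterns regardless of
grouping. The sketch emulates such an echo with printf
(emulated_echo_output is an invented name) so the effect can be
checked on any system.

```shell
#!/bin/sh
# Hypothesis only: emulate an echo that prints "-e" literally but
# still turns \n into line breaks, then feed the test patterns.
emulated_echo_output() {
    printf '%s\n' '-e abab' 'baab' 'abba'
}

emulated_echo_output | grep -E '^(a?[ab]|ba){2}$'
emulated_echo_output | grep -E '^(ba|a?[ab]){2}$'
```

Replacing echo -e with printf 'abab\nbaab\nabba\n' in the test would
take this variable out of the comparison.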

Regards

Erich




Re: [leaf-devel] Busybox has buggy regex handling

2008-02-29 Thread Mats Erik Andersson
Hello again Erich,

I am beginning to doubt the portability of regular expressions
altogether. A fully grouped pattern seems to be what AIX needs:

echo -e "abab\nbaab\nabba" | egrep '^((a?[ab])|(ba)){2}$'
echo -e "abab\nbaab\nabba" | egrep '^((ba)|(a?[ab])){2}$'

If this does not work, AIX stays hopeless! Could it be that
AIX produces a match for

echo abaaba | egrep '^((a?[ab])|ba){2}$'

which would be honestly terrible.
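
Whether abaaba may match can be settled without any regex engine. One
pass through the group (a?[ab]|ba) consumes a, b, aa, ab or ba, so two
passes consume at most four characters, and a six-character string
such as abaaba has to be rejected by a correct implementation. The
sketch below enumerates the accepted strings by plain concatenation;
the function names are illustrative and not part of validator.sh.

```shell
#!/bin/sh
# Regex-free reference for '^(a?[ab]|ba){2}$'.
# One pass of the group matches a, b, aa, ab (from a?[ab]) or ba.
pieces="a b aa ab ba"

# Print every string the pattern should accept, one per line.
expected_language() {
    for first in $pieces; do
        for second in $pieces; do
            echo "$first$second"
        done
    done | sort -u
}

# should_match STRING: succeed if STRING is in the reference set.
# grep -F -x compares whole lines as fixed strings, not as regexes.
should_match() {
    expected_language | grep -Fqx -- "$1"
}

for s in abab baab abba abaaba; do
    if should_match "$s"; then
        echo "$s: should match"
    else
        echo "$s: should NOT match"
    fi
done
```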

Back to the original matter. Following the suggestion
of Charles Steinkuehler, the functionality in validator.sh
is restored when I put the asymmetry to ugly use:

$THIS(-$THIS){3}   --->   ($THIS-){3}$THIS

    (broken)              (functional)

where THIS is similar to

  (r?a|is)

except that I use ordinary (basic) regexes in validator.sh, not the
extended version; this check is in any case internal to validator.sh.
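
As a rough illustration of the rearrangement, here is the same idea
applied to the 0..25 triple from the original report, written as an
extended regex with the repeated group placed first. The names octet25
and is_triple are made up for this sketch, and validator.sh itself
uses basic regexes and a different repetition count.

```shell
#!/bin/sh
# Workaround sketch: repetition first, ($THIS-){n}$THIS,
# instead of $THIS(-$THIS){n}.
octet25='(1?[0-9]|2[0-5])'      # a number in 0..25

# is_triple STRING: succeed if STRING is n-m-k with n, m, k in 0..25.
is_triple() {
    printf '%s\n' "$1" | grep -Eq -- "^($octet25-){2}$octet25\$"
}

for s in 25-19-25 25-20-25 26-1-1; do
    if is_triple "$s"; then
        echo "$s: accepted"
    else
        echo "$s: rejected"
    fi
done
```

Whether this ordering avoids the fault on a given uClibc build would
still need to be checked on that system.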

As some of you have noticed, I have filed a report with the uClibc
mailing list, so hopefully the matter will be resolved when the next
beta phase of Bering-uC rolls out.

Best regards

Mats E A


On Fri, 2008-02-29 at 18:20 +0100, Erich Titl wrote:
> Hi Mats
>
> Mats Erik Andersson wrote:
> > Dear Erich,
> >
> >
> > echo -e "abab\nbaab\nabba" | egrep '^((a?[ab])|ba){2}$'
> > echo -e "abab\nbaab\nabba" | egrep '^(ba|(a?[ab])){2}$'
>
> # echo -e "abab\nbaab\nabba" | egrep '^((a?[ab])|ba){2}$'
> baab
> abba
> # echo -e "abab\nbaab\nabba" | egrep '^(ba|(a?[ab])){2}$'
> baab
> abba

I also expected abab in both cases, as GNU provides.

> Regards
>
> Erich
 



[leaf-devel] Busybox has buggy regex handling

2008-02-28 Thread Mats Erik Andersson
Hi folks,

this is preliminary information that Busybox does
not have as full a command of regular expressions
as is desirable for Bering 3.1.
I know for certain that this disturbs the validation
I have implemented for Webconf, but those of you who
rely on regular expressions in other subsystems of
Bering ought to read the following exposition.

The problem is that the regex handling in Busybox
cannot correctly resolve repetitions using \{n\}.
The following is a reduction of the actual case
that disturbs my validation of IP addresses inside
/var/webconf/lib/validator.sh.

firewall# ### Matches exactly {0,...,25}
firewall# tjugufem="\(1\?[0-9]\|2[0-5]\)"
firewall# ### Should match n-m-k, where n,m,k in {0,...,25}
firewall# Trenne="$tjugufem\(-$tjugufem\)\{2\}"
firewall#
firewall# echo 25-19-25 | grep "^$Trenne$"
25-19-25
firewall# echo 25-20-25 | grep "^$Trenne$"
firewall#

Using the full-grown sed on my Debian system, the
expected match on 25-20-25 does appear, but not
so on Bering 3.1.
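
Until the regex handling is fixed, one way to sidestep the regex
engine entirely is to validate with shell pattern matching instead.
This is a different technique from the one used in validator.sh,
sketched here for the reduced n-m-k case only; the function names are
hypothetical.

```shell
#!/bin/sh
# Regex-free check for n-m-k with n, m, k in 0..25.
# The case patterns are shell globs, not regular expressions.

in_range_0_25() {
    case $1 in
        [0-9]|1[0-9]|2[0-5]) return 0 ;;
        *)                   return 1 ;;
    esac
}

is_triple_noregex() {
    case $1 in
        # Reject empty input, any character other than a digit or '-',
        # and a leading or trailing '-'.
        ''|*[!0-9-]*|-*|*-) return 1 ;;
    esac
    old_ifs=$IFS
    IFS=-
    set -- $1                 # split on '-' into $1, $2, ...
    IFS=$old_ifs
    [ $# -eq 3 ] &&
        in_range_0_25 "$1" && in_range_0_25 "$2" && in_range_0_25 "$3"
}

for s in 25-19-25 25-20-25 26-1-1; do
    if is_triple_noregex "$s"; then
        echo "$s: accepted"
    else
        echo "$s: rejected"
    fi
done
```

The glob alternatives mirror the two branches of the regex, so a
two-digit value with a leading zero is rejected here as well.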

It is not very likely that any other subsystem
uses this kind of regular expression, but you ought
to keep it in mind until I find time to
develop a patch for Busybox, which in any case will not
take effect until the next release of Bering!

Best regards

Mats Erik Andersson 





Re: [leaf-devel] Busybox has buggy regex handling

2008-02-28 Thread Charles Steinkuehler
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Mats Erik Andersson wrote:
| Hi folks,
|
| this is preliminary information that Busybox does
| not have as full a command of regular expressions
| as is desirable for Bering 3.1.
| I know for certain that this disturbs the validation
| I have implemented for Webconf, but those of you who
| rely on regular expressions in other subsystems of
| Bering ought to read the following exposition.
|
| The problem is that the regex handling in Busybox
| cannot correctly resolve repetitions using \{n\}.
| The following is a reduction of the actual case
| that disturbs my validation of IP addresses inside
| /var/webconf/lib/validator.sh.
|
| firewall# ### Matches exactly {0,...,25}
| firewall# tjugufem="\(1\?[0-9]\|2[0-5]\)"
| firewall# ### Should match n-m-k, where n,m,k in {0,...,25}
| firewall# Trenne="$tjugufem\(-$tjugufem\)\{2\}"
| firewall#
| firewall# echo 25-19-25 | grep "^$Trenne$"
| 25-19-25
| firewall# echo 25-20-25 | grep "^$Trenne$"
| firewall#

Can you provide the exact sed code you're working with?  I don't have
the validator.sh code handy to examine directly.

It looks like the regular expression you're passing to grep is:

~  $ echo ^$Trenne$
~  ^\(1\?[0-9]\|2[0-5]\)\(-\(1\?[0-9]\|2[0-5]\)\)\{2\}$

For portability I would suggest directly crafting an extended regular
expression (rather than escaping all the extended metacharacters in a
standard regex).  To gnu sed, the above and below are identical, but
they might not be to busybox.  Try the following, as an extended
expression (ie: egrep or sed -r), and see if it's still buggy:

~  egrep '^(1?[0-9]|2[0-5])(-(1?[0-9]|2[0-5])){2}$'

...and try changing the location of the iterator to the first regex:

~  egrep '^((1?[0-9]|2[0-5])-){2}(1?[0-9]|2[0-5])$'

Minor glitches like this are why the old (2.2 based kernel) releases
used the 'real' sed.  :)
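
The three spellings discussed above can be compared side by side with
a small loop. This is only a sketch: the helper names match and yn are
invented, and it is worth keeping in mind that \? and \| inside a
basic regex are GNU extensions that POSIX leaves undefined, so the
basic form may legitimately behave differently on other systems.

```shell
#!/bin/sh
# Compare the basic form, the extended form with the repetition at
# the end, and the extended form with the repetition at the front.
bre='^\(1\?[0-9]\|2[0-5]\)\(-\(1\?[0-9]\|2[0-5]\)\)\{2\}$'
ere_tail='^(1?[0-9]|2[0-5])(-(1?[0-9]|2[0-5])){2}$'
ere_head='^((1?[0-9]|2[0-5])-){2}(1?[0-9]|2[0-5])$'

# match MODE PATTERN STRING: MODE is "bre" or "ere".
match() {
    case $1 in
        ere) printf '%s\n' "$3" | grep -Eq -- "$2" ;;
        *)   printf '%s\n' "$3" | grep -q -- "$2" ;;
    esac
}

# yn MODE PATTERN STRING: print y on a match, n otherwise.
yn() {
    if match "$@"; then echo y; else echo n; fi
}

for s in 25-19-25 25-20-25 0-0-0 26-1-1; do
    echo "$s bre=$(yn bre "$bre" "$s")" \
         "ere_tail=$(yn ere "$ere_tail" "$s")" \
         "ere_head=$(yn ere "$ere_head" "$s")"
done
```

Any row where the three columns disagree points at the form that the
local regex library handles differently.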

- --
Charles Steinkuehler
[EMAIL PROTECTED]
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHx5RWLywbqEHdNFwRAv19AJoD+CGhagwaUQOEGKuDlDCQGFAI3ACgidZ+
QPHg8StqqnhqHq1SE2GKwtA=
=Q51A
-END PGP SIGNATURE-
