http://nagoya.apache.org/bugzilla/show_bug.cgi?id=11689

Pattern parsing problems

           Summary: Pattern parsing problems
           Product: Regexp
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: Major
          Priority: Other
         Component: Other
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


Hello!

I'm a user of Regexp-1.2. While working on a magazine article about using
Regexp-1.2 in Java projects, I found a bug. When I specify this pattern:
^[^\x5B\x5D]+$

or the equivalent pattern:
^[^\[\]]+$

strings that contain the '\' character do not match in Regexp-1.2. This is
incorrect behaviour.
I looked at the Regexp-1.2 sources and found the bug:

//in RECompiler.java
class RERange
{
...
    void remove(int min, int max)
    {
        // Loop through ranges
        for (int i = 0; i < num; i++)
        {
            // minRange[i]-maxRange[i] is subsumed by min-max
            if (minRange[i] >= min && maxRange[i] <= max)
            {
                delete(i);
                i--;
                return;
            }

            // min-max is subsumed by minRange[i]-maxRange[i]
            else if (min >= minRange[i] && max <= maxRange[i])
            {
                int minr = minRange[i];
                int maxr = maxRange[i];
                delete(i);
                // Corrected by me; the bug was here.
                // The original condition was: if (minr < min - 1)
                if (minr <= min - 1)
                {
                    merge(minr, min - 1);
                }
                // Corrected by me; the bug was here.
                // The original condition was: if (max + 1 < maxr)
                if (max + 1 <= maxr)
                {
                    merge(max + 1, maxr);
                }
                return;
            }
        ...
        }
    ...
    }
...
}

The bug occurs because the two conditions are incorrect. When RECompiler
processes the character class from my example, the range just before the
x5D (93 in decimal) character is handled is: 0-90, 92-65535. Then
remove(93, 93) is invoked, with minr = 92 and maxr = 65535. The condition
(92 < 93 - 1) is false, so merge(92, 92) is not invoked, and after the
entire character class has been processed the range is: 0-90, 94-65535.
The '\' character has decimal code 92, which is not in the range, so
strings that contain '\' do not match the pattern.
If the conditions are changed to "if (minr <= min - 1)" and
"if (max + 1 <= maxr)", everything works (for my example, the resulting
range is 0-90, 92-92, 94-65535).
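
The arithmetic above can be replayed without Regexp itself. The following is
only a sketch: RangeRemoveDemo, its methods and the "strict" flag are
hypothetical names, not part of Regexp-1.2. It models only the "min-max is
subsumed by an existing range" branch of RERange.remove() on a plain list of
inclusive ranges.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for RERange; only the "subsumed" branch is modelled.
public class RangeRemoveDemo {

    // Removes [min, max] from a list of disjoint inclusive ranges.
    // strict = true uses the original "<" comparisons, false uses "<=".
    static List<int[]> remove(List<int[]> ranges, int min, int max, boolean strict) {
        List<int[]> out = new ArrayList<>();
        for (int[] r : ranges) {
            int minr = r[0];
            int maxr = r[1];
            if (min >= minr && max <= maxr) {
                // Split the range and keep the non-empty pieces on either side.
                boolean keepLeft = strict ? minr < min - 1 : minr <= min - 1;
                boolean keepRight = strict ? max + 1 < maxr : max + 1 <= maxr;
                if (keepLeft) out.add(new int[] {minr, min - 1});
                if (keepRight) out.add(new int[] {max + 1, maxr});
            } else {
                out.add(r); // range not affected in this simplified model
            }
        }
        return out;
    }

    // Formats ranges as "a-b, c-d", matching the notation used in this report.
    static String format(List<int[]> ranges) {
        StringBuilder sb = new StringBuilder();
        for (int[] r : ranges) {
            if (sb.length() > 0) sb.append(", ");
            sb.append(r[0]).append('-').append(r[1]);
        }
        return sb.toString();
    }

    // Replays [^\x5B\x5D]: start from the full range, remove 0x5B, then 0x5D.
    static String simulate(boolean strict) {
        List<int[]> ranges = new ArrayList<>();
        ranges.add(new int[] {0, 65535});
        ranges = remove(ranges, 0x5B, 0x5B, strict);
        ranges = remove(ranges, 0x5D, 0x5D, strict);
        return format(ranges);
    }

    public static void main(String[] args) {
        System.out.println("original: " + simulate(true));  // 0-90, 94-65535
        System.out.println("patched:  " + simulate(false)); // 0-90, 92-92, 94-65535
    }
}
```

With the strict comparisons the single-character piece 92-92 is dropped, which
is exactly the missing '\' described above.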

Test application:

import org.apache.regexp.RE;

public class Test
{
    public static void main(String[] argv) throws Exception
    {
        RE re = new RE("^[^\\x5B\\x5D]+$");
        if (re.match("sdfsd\\fsdf"))
        {
            System.out.println("Matched");
        }
        else
        {
            System.out.println("Not matched");
        }
        System.out.println("Finished!");
    }
}
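
As an independent cross-check of the expected result, the same two patterns
can be run through java.util.regex from the JDK. Note that this is a different
regular expression engine, used here only to confirm that the input should
match; the class name ExpectedBehaviourCheck and the helper matches() are
made up for this sketch.

```java
import java.util.regex.Pattern;

// Cross-check with java.util.regex (not Regexp-1.2).
public class ExpectedBehaviourCheck {

    // Returns true if the pattern is found in the input.
    static boolean matches(String pattern, String input) {
        return Pattern.compile(pattern).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "sdfsd\\fsdf"; // contains '\' but neither '[' nor ']'
        System.out.println(matches("^[^\\x5B\\x5D]+$", input));
        System.out.println(matches("^[^\\[\\]]+$", input));
    }
}
```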

If you have any questions about this bug, don't hesitate to contact
me. I'm glad to help.

Best regards,
Igor Vinnikov
