FYI.

I reported the bug to Sun. From what I can see in the code, they 'forgot' to
implement a DollarUnix node and also forgot to handle '\u0085' in their
processing when UNIX_LINES is not set.
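A quick probe like the one below shows whether '$' treats U+0085 as a line terminator on the JDK you run, with and without UNIX_LINES. The class name is made up. The Pattern documentation for current JDKs lists U+0085 as a line terminator, so the default-mode result there should be true. Per this report, JDK 1.4 RC behaved differently.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical probe: does '$' (without MULTILINE) match just before a
 * trailing U+0085 NEXT LINE character, for the given Pattern flags?
 */
public class NelProbe {

    static boolean dollarBeforeNel(int flags) {
        return Pattern.compile("end of text$", flags)
                      .matcher("end of text\u0085")
                      .find();
    }

    public static void main(String[] args) {
        // Default mode: U+0085 counts as a line terminator on current JDKs.
        System.out.println("default:    " + dollarBeforeNel(0));
        // UNIX_LINES: only '\n' is a line terminator, so '$' should not match.
        System.out.println("UNIX_LINES: " + dollarBeforeNel(Pattern.UNIX_LINES));
    }
}
```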

...

Stephane


----- Original Message -----
From: "Stephane Bailliez" <[EMAIL PROTECTED]>
To: "Ant Developers List" <[EMAIL PROTECTED]>
Sent: Tuesday, January 29, 2002 10:39 PM
Subject: Re: Ant Regexp wrappers [Re: multiline mode and platform issues]


> ----- Original Message -----
> From: "Stefan Bodewig" <[EMAIL PROTECTED]>
> [...]
> > If you want to apply any magic at any point, there will always be
> > situations where the things Ant does are wrong.  Let people deal with
> > these problems explicitly themselves (they could use <fixcrlf> for
> > example).
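The "deal with it explicitly" approach could look like this sketch. It normalizes input to '\n' line endings, much as <fixcrlf> does for files, and then matches in UNIX_LINES mode. The class and method names are hypothetical. Only java.util.regex is assumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: normalize line endings to '\n' before matching,
 * in the spirit of running <fixcrlf> over the input first. Not part of Ant.
 */
public class NormalizeThenMatch {

    /** Converts CRLF and lone CR to '\n'; all other characters are left alone. */
    static String toUnixLineEndings(String s) {
        // Replace CRLF first so the CR of a CRLF pair does not become an extra '\n'.
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    static boolean endsWithText(String input) {
        // UNIX_LINES: only '\n' counts as a line terminator for '.', '^' and '$'.
        Pattern p = Pattern.compile("end of text$", Pattern.UNIX_LINES);
        Matcher m = p.matcher(toUnixLineEndings(input));
        return m.find();
    }

    public static void main(String[] args) {
        System.out.println(endsWithText("end of text\r\n")); // true after normalization
        System.out.println(endsWithText("end of text\r"));   // true after normalization
        System.out.println(endsWithText("end of text\n"));   // true
    }
}
```

The design choice is to fix the data once rather than teach every regexp wrapper about every separator. Characters such as U+2028 are deliberately left untouched.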
>
> This is *much* more complicated than I expected...
>
> 1) Jakarta Oro works fine and is perfectly consistent.
>
> 2) Jakarta RegExp uses a platform-dependent line separator, and its logic
> is not correct: on Windows it will always return false when the \n
> immediately ends the string. This is a bug; see the code below:
>
>     /** @return true if at the i-th position in the 'search' a newline ends */
>     private boolean isNewline(int i) {
> //#### will fail here if the string is "end of line\n" since it compares with "\r\n" size...
>         if (i < NEWLINE.length() - 1) {
>             return false;
>         }
>
>         if (search.charAt(i) == '\n') {
>             return true;
>         }
>
>         for (int j = NEWLINE.length() - 1; j >= 0; j--, i--) {
>             if (NEWLINE.charAt(j) != search.charAt(i)) {
>                 return false;
>             }
>         }
>         return true;
>     }
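This is a minimal sketch of one possible fix. It is not the actual Jakarta RegExp code. It checks for a bare '\n' before the length guard. With "\r\n" as the separator, the original guard rejects a line feed at position 0, and this version does not. The class wrapper and the `search`/`newline` fields are hypothetical stand-ins for RE's internals.

```java
/**
 * Hypothetical standalone version of isNewline(int) with the checks
 * reordered. 'search' and 'newline' stand in for the RE fields.
 */
public class NewlineCheck {

    private final String search;
    private final String newline; // e.g. System.getProperty("line.separator")

    NewlineCheck(String search, String newline) {
        this.search = search;
        this.newline = newline;
    }

    /** @return true if a newline sequence ends at position i of 'search'. */
    boolean isNewline(int i) {
        if (i < 0 || i >= search.length()) {
            return false;
        }
        // A bare '\n' always ends a line, whatever the platform separator is.
        // Checking it first keeps the length guard below from rejecting it.
        if (search.charAt(i) == '\n') {
            return true;
        }
        // Otherwise, check whether the full platform separator ends at i.
        if (i < newline.length() - 1) {
            return false; // too few characters before i to hold the separator
        }
        for (int j = newline.length() - 1; j >= 0; j--, i--) {
            if (newline.charAt(j) != search.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        NewlineCheck win = new NewlineCheck("\nend of line\n", "\r\n");
        System.out.println(win.isNewline(0));  // true: bare '\n' at position 0
        System.out.println(win.isNewline(12)); // true: trailing '\n'
        System.out.println(win.isNewline(3));  // false: 'd'
    }
}
```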
>
>
> 3) JDK 1.4 ignores the UNIX_LINES option for $; it seems to use it only
> for normal processing of text. Yay! :-( Plus, it does not handle the
> next-line character \u0085. Argh!
> I did not debug it, but that's what I can roughly read from the code; the
> testcase at the end of this mail does not work AT ALL.
>
> Code snippet from JDK 1.4 RC:
>         boolean match(Matcher matcher, int i, CharSequence seq) {
>             if (i < matcher.to) {
>                 char ch = seq.charAt(i);
>                 if (ch == '\n' || (ch|1) == '\u2029') {
>                     i++;
>                 } else if (ch == '\r') {
>                     i++;
>                     if (i < matcher.to && seq.charAt(i) == '\n') {
>                         i++;
>                     }
>                 } else {
>                     return false;
>                 }
>                 if (multiline == false && i != matcher.to) {
>                     return false;
>                 }
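For comparison, here is a hypothetical sketch of a non-MULTILINE '$' check. It honors UNIX_LINES, as the missing 'DollarUnix' node mentioned above would, and it treats U+0085 as a terminator. It mirrors the structure of the snippet but is not JDK code. `dollarMatches` and its parameters are made up for illustration.

```java
/**
 * Hypothetical sketch, not JDK code: can '$' (without MULTILINE) match at
 * position i of seq, where 'to' plays the role of matcher.to?
 */
public class DollarSketch {

    static boolean dollarMatches(CharSequence seq, int i, int to, boolean unixLines) {
        if (i == to) {
            return true; // at the very end of the input
        }
        if (i > to) {
            return false;
        }
        char ch = seq.charAt(i);
        if (unixLines) {
            // UNIX_LINES: only one trailing '\n' may follow the match.
            return ch == '\n' && i + 1 == to;
        }
        // (ch | 1) == 0x2029 covers both U+2028 and U+2029, as in the snippet.
        if (ch == '\n' || ch == '\u0085' || (ch | 1) == '\u2029') {
            i++;
        } else if (ch == '\r') {
            i++;
            if (i < to && seq.charAt(i) == '\n') {
                i++; // treat CRLF as a single terminator
            }
        } else {
            return false;
        }
        // Without MULTILINE, the terminator must be the last thing in the input.
        return i == to;
    }

    public static void main(String[] args) {
        String s = "end of text\u0085";
        int pos = "end of text".length();
        System.out.println(dollarMatches(s, pos, s.length(), false)); // true
        System.out.println(dollarMatches(s, pos, s.length(), true));  // false
    }
}
```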
>
>
> I did the following test:
>
>         // 'reg' is the regexp matcher implementation under test
>         reg.setPattern("end of text$");
>         assertTrue("Windows line separator", !reg.matches("end of text\r\n"));
>         assertTrue("Unix line separator", reg.matches("end of text\n"));
>         assertTrue("standalone CR", !reg.matches("end of text\r"));
>         assertTrue("next-line character", !reg.matches("end of text\u0085"));
>         assertTrue("line-separator character", !reg.matches("end of text\u2028"));
>         assertTrue("paragraph character", !reg.matches("end of text\u2029"));
>         reg.setPattern("end of text\r$");
>         assertTrue("Windows line separator", reg.matches("end of text\r\n"));
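The same expectations can be checked against java.util.regex directly. This sketch assumes `reg.matches()` behaves like `Matcher.find()`. It uses `Pattern.UNIX_LINES` and bypasses Ant's wrapper entirely. On a current JDK, as opposed to the 1.4 RC discussed here, these checks are expected to pass.

```java
import java.util.regex.Pattern;

/**
 * Sketch: the quoted test expectations run against java.util.regex with
 * UNIX_LINES. Ant's own wrapper ('reg' above) is not used here.
 */
public class DollarUnixLinesCheck {

    static boolean endMatches(String regex, String input) {
        // find() rather than matches(): '$' is zero-width, so matches() would
        // require the pattern itself to consume the trailing terminator.
        return Pattern.compile(regex, Pattern.UNIX_LINES).matcher(input).find();
    }

    static void check(String label, boolean ok) {
        if (!ok) {
            throw new AssertionError(label);
        }
    }

    public static void main(String[] args) {
        String re = "end of text$";
        check("Windows line separator", !endMatches(re, "end of text\r\n"));
        check("Unix line separator", endMatches(re, "end of text\n"));
        check("standalone CR", !endMatches(re, "end of text\r"));
        check("next-line character", !endMatches(re, "end of text\u0085"));
        check("line-separator character", !endMatches(re, "end of text\u2028"));
        check("paragraph character", !endMatches(re, "end of text\u2029"));
        check("Windows line separator, explicit \\r",
              endMatches("end of text\r$", "end of text\r\n"));
        System.out.println("all checks passed");
    }
}
```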
>
> Stephane
>

