FYI, I reported the bug to Sun. From what I see in the code, they 'forgot' to implement a DollarUnix node to handle $ under UNIX_LINES, and they also forgot to handle '\u0085' in their processing when UNIX_LINES is not set.
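The $ handling described above can be checked against any given JDK with a small probe. This is only a sketch: the class and method names (DollarUnixLinesCheck, endsLine) are made up for illustration, and the "expected" values in the comments are what the java.util.regex.Pattern documentation describes, not necessarily what JDK 1.4 RC does.

```java
import java.util.regex.Pattern;

// Hypothetical probe class; not part of Ant or the JDK.
final class DollarUnixLinesCheck {

    /** Returns true if "end of text$" is found in the input under the given flags. */
    static boolean endsLine(String input, int flags) {
        return Pattern.compile("end of text$", flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String[] labels = {"LF", "CR LF", "CR", "NEL \\u0085", "LS \\u2028", "PS \\u2029"};
        String[] inputs = {
            "end of text\n", "end of text\r\n", "end of text\r",
            "end of text\u0085", "end of text\u2028", "end of text\u2029"
        };
        for (int k = 0; k < inputs.length; k++) {
            // Documented behavior: with UNIX_LINES only the LF case should match;
            // without it every terminator listed here should match.
            System.out.println(labels[k]
                + "  default=" + endsLine(inputs[k], 0)
                + "  UNIX_LINES=" + endsLine(inputs[k], Pattern.UNIX_LINES));
        }
    }
}
```

If a JDK prints true in the UNIX_LINES column for anything other than LF, or false in the default column for \u0085, it shows the behavior reported here.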
Stephane

----- Original Message -----
From: "Stephane Bailliez" <[EMAIL PROTECTED]>
To: "Ant Developers List" <[EMAIL PROTECTED]>
Sent: Tuesday, January 29, 2002 10:39 PM
Subject: Re: Ant Regexp wrappers [Re: multiline mode and platform issues]

> ----- Original Message -----
> From: "Stefan Bodewig" <[EMAIL PROTECTED]>
> [...]
> > If you want to apply any magic at any point, there will always be
> > situations where the things Ant does are wrong. Let people deal with
> > these problems explicitly themselves (they could use <fixcrlf> for
> > example).
>
> This is *much* more complicated than I expected...
>
> 1) Jakarta Oro works fine and is perfectly consistent.
>
> 2) Jakarta RegExp uses the platform-dependent line separator, and the logic
> is not correct: it will always return false on Windows when the \n
> immediately ends the string. This is a bug; see the code below:
>
> /** @return true if at the i-th position in the 'search' a newline ends */
> private boolean isNewline(int i) {
>     //#### will fail here if the string is "end of line\n" since it compares
>     // with "\r\n" size...
>     if (i < NEWLINE.length() - 1) {
>         return false;
>     }
>
>     if (search.charAt(i) == '\n') {
>         return true;
>     }
>
>     for (int j = NEWLINE.length() - 1; j >= 0; j--, i--) {
>         if (NEWLINE.charAt(j) != search.charAt(i)) {
>             return false;
>         }
>     }
>     return true;
> }
>
>
> 3) JDK 1.4 does not care about the UNIX_LINES option for $; it seems to use
> it only for normal processing of text. Yay! :-( Plus it does not process
> the next-line character \u0085... argh!
> I did not debug it, but that's what I can roughly read from the code; the
> test case at the end of this mail does not work AT ALL.
>
> code snippet from JDK 1.4 RC
> boolean match(Matcher matcher, int i, CharSequence seq) {
>     if (i < matcher.to) {
>         char ch = seq.charAt(i);
>         if (ch == '\n' || (ch|1) == '\u2029') {
>             i++;
>         } else if (ch == '\r') {
>             i++;
>             if (i < matcher.to && seq.charAt(i) == '\n') {
>                 i++;
>             }
>         } else {
>             return false;
>         }
>         if (multiline == false && i != matcher.to) {
>             return false;
>         }
>
>
> I did the following test:
>
> reg.setPattern("end of text$");
> assertTrue("Windows line separator", !reg.matches("end of text\r\n"));
> assertTrue("Unix line separator", reg.matches("end of text\n"));
> assertTrue("standalone CR", !reg.matches("end of text\r"));
> assertTrue("next-line character", !reg.matches("end of text\u0085"));
> assertTrue("line-separator character", !reg.matches("end of text\u2028"));
> assertTrue("paragraph character", !reg.matches("end of text\u2029"));
> reg.setPattern("end of text\r$");
> assertTrue("Windows line separator", reg.matches("end of text\r\n"));
>
> Stephane
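
Regarding the Jakarta RegExp isNewline() check quoted above, here is a minimal sketch of a variant that tests for a bare '\n' before applying any length guard and bounds-checks the comparison with the platform separator. It is an illustration only, not the actual Jakarta RegExp fix: the class name NewlineCheck is hypothetical, and the searched text and the separator are passed in as parameters instead of being read from the 'search' and NEWLINE fields.

```java
// Hypothetical stand-alone version of the quoted isNewline() logic.
final class NewlineCheck {

    /**
     * Returns true if a newline ends at index i of 'search'.
     * 'newline' stands in for the platform separator (the NEWLINE field
     * in the quoted code); it is a parameter here so both cases can be tried.
     */
    static boolean isNewline(CharSequence search, int i, String newline) {
        if (i < 0 || i >= search.length()) {
            return false;
        }
        // A bare '\n' always ends a line, whatever the platform separator is.
        // Doing this first avoids rejecting short inputs such as "\n".
        if (search.charAt(i) == '\n') {
            return true;
        }
        // Otherwise the platform separator must end exactly at index i.
        int start = i - newline.length() + 1;
        if (start < 0) {
            return false;
        }
        for (int j = 0; j < newline.length(); j++) {
            if (search.charAt(start + j) != newline.charAt(j)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isNewline("end of line\n", 11, "\r\n"));
        System.out.println(isNewline("\n", 0, "\r\n"));
    }
}
```

Whether a lone '\r' should count as a line end when the platform separator is "\r\n" is left as in the quoted code: it does not.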