Hi,
I'm posting this here before entering a bug in bugzilla just to make sure
that it's not related to bug #125...
My results were the same in both oro-dev-2.0.2-dev-2 and oro-dev-2.0.3:
patterns compiled with MULTILINE_MASK that use the end anchor '$' do not match
at line endings in non-UNIX files.
(I don't have a Macintosh to test with,
but I've confirmed this on Windows NT.)
So, if I have the regular expression
/<matching pattern>$/m
and I do the following:
- Slurp a file into a string (using the system's line.separator between lines)
- Try to match the file string (which contains a line that matches)
then the results are:
+ On Solaris, the pattern matches without any problems.
+ On WinNT, the pattern doesn't match.
The workaround I'm using now is to write the regular expression so that it looks like
/<matching pattern>([\r\n]|$)/sm
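
The difference between the two patterns can be reproduced without ORO on the classpath. The sketch below is an illustration only: it uses java.util.regex rather than ORO, and sets Pattern.UNIX_LINES so that '$' treats only '\n' as a line terminator, which is an assumption chosen to mimic the behaviour reported above. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Illustration only: java.util.regex stands in for ORO here.
// UNIX_LINES makes '$' recognise only '\n' as a line terminator,
// which mimics the behaviour described in this report.
class EndAnchorDemo {
    static final int FLAGS = Pattern.MULTILINE | Pattern.UNIX_LINES;

    /** Plain end anchor, in the spirit of /x$/m. */
    static boolean plainAnchor(String text) {
        return Pattern.compile("x$", FLAGS).matcher(text).find();
    }

    /** Workaround pattern, in the spirit of /x([\r\n]|$)/sm. */
    static boolean workaround(String text) {
        return Pattern.compile("x([\\r\\n]|$)", FLAGS | Pattern.DOTALL)
                      .matcher(text).find();
    }

    public static void main(String[] args) {
        String unixText = "stops at x\nsecond line";
        String winText = "stops at x\r\nsecond line";
        System.out.println("plain, \\n text:        " + plainAnchor(unixText));
        System.out.println("plain, \\r\\n text:      " + plainAnchor(winText));
        System.out.println("workaround, \\r\\n text: " + workaround(winText));
    }
}
```

With only '\n' recognised, the plain anchor fails on the "\r\n" text because the 'x' is followed by '\r', while the workaround accepts either line-ending character explicitly.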
I see that there are many checks in the oro code that look for a character
equal to '\n'. My suggested fix is to create a helper class (I don't know which
package it belongs in) with a static method "boolean isLineEnding( char )" or
similar that could replace all the current "<char> == '\n'" checks. I'd be
happy to implement this (with a little guidance as to where it belongs).
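
A minimal sketch of what such a helper might look like follows. The class name LineEndings and the extra method atEndOfLine are hypothetical, and nothing here is existing ORO code; only the isLineEnding( char ) signature comes from the suggestion above.

```java
// Hypothetical helper: the class name and its placement are assumptions,
// not part of ORO.
final class LineEndings {
    private LineEndings() { }

    /** True for the characters that end a line on UNIX, Macintosh or Win32. */
    static boolean isLineEnding(char c) {
        return c == '\n' || c == '\r';
    }

    /**
     * True if offset is at the end of the input or immediately before a
     * line-ending character, i.e. a position where a multiline '$' could
     * be allowed to match.
     */
    static boolean atEndOfLine(CharSequence input, int offset) {
        return offset == input.length()
            || isLineEnding(input.charAt(offset));
    }

    public static void main(String[] args) {
        String text = "stops at x\r\nsecond line";
        int afterX = text.indexOf('x') + 1;
        System.out.println(atEndOfLine(text, afterX));
    }
}
```

One design point a real fix would need to settle: with a per-character check like this, the position between '\r' and '\n' also counts as an end of line, so the matcher would probably have to treat "\r\n" as a single terminator to avoid seeing an empty line there.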
Notes on test results using the attached TestEndAnchor code:
On a Solaris 2.6 machine, there were two failures when trying to match the
first pattern. On a WinNT 4.0 machine, there were three failures when trying
to match the same pattern. The added failure was from the string that uses
the System.getProperty( "line.separator" ) for its line ending.
Thanks,
Ed.
P.S.
Daniel, I would have liked to give you the regular expressions I used in my
timing test code (from mid-May), but I was advised against doing that because
the REs were very application-specific.
import org.apache.oro.text.regex.Pattern;
import org.apache.oro.text.regex.PatternMatcher;
import org.apache.oro.text.regex.Perl5Matcher;
import org.apache.oro.text.regex.Perl5Compiler;
import org.apache.oro.text.regex.PatternMatcherInput;
import org.apache.oro.text.regex.MalformedPatternException;
/**
* Test the regular expression end anchor using different line endings.
*
* @author Ed Chidester
*/
public class TestEndAnchor {
    /**
     * An array of Perl-syntax regular expression strings for testing.
     * Regular expressions are tested against both the
     * {@link #failString__ failing string} and the
     * {@link #matchString__ matching string} arrays.
     */
    private static String [ ] reString__ = {
        "/x$/m" ,
        "/x([\\r\\n]|$)/sm" ,
    };

    /**
     * An array of matching strings for testing.
     * This array is used for the tests performed in the
     * {@link #main main testing method}.
     * Each element is expected to match every expression in
     * the {@link #reString__ regular expression array}.
     */
    private static String [ ] matchString__ = {
        "This line ends with x" ,
        "This also stops at x\r"
            + "but it uses a \\r char like a Macintosh file would" ,
        "This also stops at x\n"
            + "but it uses a \\n char like a Solaris file would" ,
        "This also stops at x\r\n"
            + "but it uses both \\r and \\n like Win32 files would" ,
        "This line also stops with x"
            + System.getProperty( "line.separator" )
            + "and it uses the system-dependent line ending character(s)." ,
    };

    /**
     * An array of failing strings for testing.
     * This array is used for the tests performed in the
     * {@link #main main testing method}.
     * Each element is expected to fail to match every expression in
     * the {@link #reString__ regular expression array}.
     */
    private static String [ ] failString__ = {
        "This line ends with the wrong character" ,
        "This one also ends with the wrong character" ,
        "Wrong characters abound in the failString__\r"
            + "array" ,
        "Wrong characters abound in the failString__\r"
            + "array" ,
        "Neither tubas nor xylophones should cause"
            + System.getProperty( "line.separator" )
            + "the regular expressions to match."
    };

    /**
     * <p>
     * Main test method. Compiles every expression in
     * {@link #reString__} and checks it against every matching
     * and failing string.
     * </p>
     */
    public static void main( String [ ] args ) {
        int i;
        int x;
        int size = matchString__.length;
        int oroMatchFlags;
        int firstIndex;
        int lastIndex;
        int finalIndex;
        Pattern [] oroObject = new Pattern [ reString__.length ];
        Perl5Matcher oroMatcher;
        Perl5Compiler oroCompiler;

        // Only test indexes that exist in both string arrays
        if ( failString__.length < size ) {
            size = failString__.length;
        }

        try {
            // Initialize the Oro objects
            oroMatcher = new Perl5Matcher( );
            oroCompiler = new Perl5Compiler( );

            // ---------------------------------------------
            // Initialize all the regular expression objects
            // ---------------------------------------------
            for ( i = 0 ; i < reString__.length ; i++ ) {
                oroMatchFlags = Perl5Compiler.DEFAULT_MASK;
                firstIndex = reString__[ i ].indexOf( '/' );
                lastIndex = reString__[ i ].lastIndexOf( '/' );
                finalIndex = reString__[ i ].length( );
                if ( lastIndex <= firstIndex ) {
                    System.err.println( "Error reading regular expression \""
                        + reString__[ i ] + "\"" );
                    lastIndex = reString__[ i ].length( );
                }

                // Translate the trailing option letters (i, m, s) into flags
                for ( int j = lastIndex + 1 ; j < finalIndex ; j++ ) {
                    if ( reString__[ i ].charAt( j ) == 'i' ) {
                        // // Testing printout...
                        // System.out.println( "Case independent\t"
                        //     + reString__[ i ] );
                        oroMatchFlags |= Perl5Compiler.CASE_INSENSITIVE_MASK;
                    }
                    else if ( reString__[ i ].charAt( j ) == 'm' ) {
                        // // Testing printout...
                        // System.out.println( "Multiline match\t"
                        //     + reString__[ i ] );
                        oroMatchFlags |= Perl5Compiler.MULTILINE_MASK;
                    }
                    else if ( reString__[ i ].charAt( j ) == 's' ) {
                        // // Testing printout...
                        // System.out.println( "Singleline match\t"
                        //     + reString__[ i ] );
                        oroMatchFlags |= Perl5Compiler.SINGLELINE_MASK;
                    }
                    else {
                        System.err.println( "Regular expression option \"/"
                            + reString__[ i ].charAt( j )
                            + "\" is being ignored" );
                    }
                } // End for j

                // Compile the text between the first and last '/'
                oroObject[ i ] = oroCompiler.compile(
                    reString__[i].substring( (firstIndex + 1), lastIndex ) ,
                    oroMatchFlags );
            } // End for i

            // -------------
            // Begin testing
            // -------------
            // Testing printout...
            System.out.println( "About to begin testing" );
            for ( x = 0 ; x < reString__.length ; x++ ) {
                for ( i = 0 ; i < size ; i++ ) {
                    PatternMatcherInput pmiMatch =
                        new PatternMatcherInput( matchString__[ i ] );
                    PatternMatcherInput pmiFail =
                        new PatternMatcherInput( failString__[ i ] );

                    // A matching string that does not match is an error
                    if ( ! oroMatcher.contains( pmiMatch , oroObject[x] ) ) {
                        System.err.println("Error with Perl5 match["+i+"]");
                        System.err.println( reString__[ x ] );
                        System.err.println( matchString__[ i ] );
                    }
                    else {
                        // Testing printout...
                        System.out.println( "match[ " + i + " ] okay" );
                    }

                    // A failing string that does match is an error
                    if ( oroMatcher.contains( pmiFail , oroObject[x] ) ) {
                        System.err.println("Error with Perl5 fail["+i+"]");
                        System.err.println( reString__[ x ] );
                        System.err.println( failString__[ i ] );
                    }
                    else {
                        // Testing printout...
                        System.out.println( "fail[ " + i + " ] okay" );
                    }
                } // End for i
            } // End for x
        }
        catch ( Exception e ) {
            System.err.println( e + " caught while running test" );
            e.printStackTrace( System.err );
            System.exit( 1 );
        }

        // Testing printout...
        System.out.println( "Finished testing" );
    } // End main method
} // TestEndAnchor class