: Yes the version of lucene and java are exactly the same on the different : machines. : Infact we unjared lucene and jared it with our jar and are running from the : same nfs mounts on both the machines
i didn't do an indepth code read, but a quick skim of StandardTokenizerImpl didn't turn up any questionale uses of APIs that might have differnet behavior depending on the default locale/charset of the JVM running it ... everthing is simple char or String based access. Are you *certain* that you are providing Lucene with the Strings you think you are? Is it possible that you are using a FileReader or InputStreamReader that rely on the default character encoding of the JVM (which may not be correct for the data you are reading in) ? Can you write a simple junit test that fails on one machine and passes on the other? If so i'd love to see that test along with the output of this code... java.util.Enumeration e = System.getProperties().propertyNames(); while(e.hasMoreElements()) { String prop = (String)e.nextElement(); System.out.println(prop + " = " + java.net.URLEncoder.encode(System.getProperty(prop), "US-ASCII")); } -Hoss --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]