Re: Lucene standard analyzer internationalization

Chris Hostetter Tue, 22 Apr 2008 14:04:19 -0700

: Yes the version of lucene and java are exactly the same on the different
: machines.
: Infact we unjared lucene and jared it with our jar and are running from the
: same nfs mounts on both the machines


i didn't do an indepth code read, but a quick skim of 
StandardTokenizerImpl didn't turn up any questionale uses of APIs that 
might have differnet behavior depending on the default locale/charset of 
the JVM running it ... everthing is simple char or String based access.

Are you *certain* that you are providing Lucene with the Strings you think 
you are?  Is it possible that you are using a FileReader or 
InputStreamReader that rely on the default character encoding of the JVM 
(which may not be correct for the data you are reading in) ?

Can you write a simple junit test that fails on one machine and passes on 
the other?  If so i'd love to see that test along with the output of this 
code...

  java.util.Enumeration e = System.getProperties().propertyNames();
  while(e.hasMoreElements()) {
    String prop = (String)e.nextElement();
    System.out.println(prop + " = " + 
java.net.URLEncoder.encode(System.getProperty(prop), "US-ASCII"));
  }


-Hoss

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Lucene standard analyzer internationalization

Reply via email to