Re: Problem using Lucene on Ubuntu

Grant Ingersoll Mon, 18 Feb 2008 05:45:26 -0800

Not sure about WordExtractor; does it also take a Reader? I would try:

Reader input = new InputStreamReader(new FileInputStream(file),"ENCODING");

WordExtractor extractor = new WordExtractor(input);
content = extractor.getText();
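To illustrate the underlying point with the JDK alone: the charset you name when turning bytes into characters decides what text you get back. This sketch assumes a plain-text file (a binary .doc is parsed by POI itself). The class and method names ReadWithCharset and readText are hypothetical, and the sample word is arbitrary.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadWithCharset {

    // Hypothetical helper: decode a plain-text file with an explicitly named
    // charset instead of relying on the platform default, which can differ
    // between machines (e.g. a Windows box vs. an Ubuntu box).
    static String readText(Path file, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("enc", ".txt");
        // U+00E9 (e-acute) is stored as the two UTF-8 bytes C3 A9.
        Files.write(tmp, "caf\u00e9".getBytes(StandardCharsets.UTF_8));
        String right = readText(tmp, StandardCharsets.UTF_8);
        String wrong = readText(tmp, StandardCharsets.ISO_8859_1);
        System.out.println(right.equals("caf\u00e9")); // true
        System.out.println(wrong.length());            // 5: one char per byte
        Files.delete(tmp);
    }
}
```

Reading the same bytes with the wrong charset does not fail; it silently produces different text, which is why the encoding has to be known (or detected) up front.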

Note: ENCODING is whatever encoding the file is in, e.g. "UTF-8" if that is what your files are in. If you don't know the encoding, you will need to add some type of character encoding detection tool. Search the web for that, as I know there are some out there (I don't know any names offhand).
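If you go the detection route, here is a deliberately crude heuristic using only the JDK, assuming the input is plain text held in a byte array: honour a byte-order mark if there is one, otherwise accept UTF-8 if the bytes decode strictly as UTF-8, otherwise fall back to ISO-8859-1. It is not a substitute for a real detection library. GuessCharset and guess are hypothetical names, and the ISO-8859-1 fallback is an assumption: it never fails, but it can silently mis-decode other 8-bit encodings.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class GuessCharset {

    // Crude heuristic, NOT a real detector (hypothetical helper):
    // 1. honour a byte-order mark if present;
    // 2. otherwise accept UTF-8 if the bytes decode as strict UTF-8;
    // 3. otherwise fall back to ISO-8859-1 (assumption), which accepts any bytes.
    static Charset guess(byte[] data) {
        if (data.length >= 3 && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB && (data[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        try {
            // REPORT makes the decoder throw on invalid UTF-8 instead of
            // substituting replacement characters.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            return StandardCharsets.ISO_8859_1;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(guess(utf8).name());   // UTF-8
        System.out.println(guess(latin1).name()); // ISO-8859-1
    }
}
```

The lone byte E9 at the end of the Latin-1 sample is not valid UTF-8, so the strict decoder rejects it and the fallback kicks in.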

Bottom line, it sounds like you need to figure out how to load your files based on their encoding. That problem is not really core to Lucene, but you should be able to search the archives here to find others with similar questions.


-Grant

On Feb 18, 2008, at 8:13 AM, kratoras wrote:


No problem about the misunderstanding.
I am using

InputStream input = new URL("file:///" + file.getAbsolutePath()).openStream();
WordExtractor extractor = new WordExtractor(input);
content = extractor.getText();

where WordExtractor is org.apache.poi.hwpf.extractor.WordExtractor.

WordExtractor takes an InputStream as an argument. Should I determine the encoding of the InputStream, and if so, how?
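One way to approach this question is to check what the stream actually contains before picking a charset. Legacy Word .doc files are OLE2 compound documents, which begin with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1. Since WordExtractor takes an InputStream and parses that binary container itself, it presumably wants the raw bytes rather than a Reader. The sketch below uses only the JDK; Ole2Check and looksLikeOle2 are hypothetical names, and it only tests for the signature without parsing anything.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Ole2Check {

    // First 8 bytes of every OLE2 compound document (legacy .doc, .xls, .ppt).
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    // Hypothetical helper: reads up to 8 bytes and compares them with the
    // signature. It consumes those bytes, so open a fresh stream (or use
    // mark/reset) before handing the file to a parser afterwards.
    static boolean looksLikeOle2(InputStream in) throws IOException {
        byte[] head = new byte[OLE2_SIGNATURE.length];
        int read = 0;
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n == -1) {
                return false; // shorter than the signature
            }
            read += n;
        }
        return Arrays.equals(head, OLE2_SIGNATURE);
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeDocHeader = Arrays.copyOf(OLE2_SIGNATURE, 16); // signature + zero padding
        byte[] plainText = "hello, world".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikeOle2(new ByteArrayInputStream(fakeDocHeader))); // true
        System.out.println(looksLikeOle2(new ByteArrayInputStream(plainText)));     // false
    }
}
```

If the check passes, the file is a binary container rather than text, and the charset question applies to the text the extractor returns, not to the raw stream.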
--
View this message in context: 
http://www.nabble.com/Problem-using-Lucene-on-Ubuntu-tp15543843p15545082.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
