Re: Problem using Lucene on Ubuntu

Jan Peter Stotz Mon, 18 Feb 2008 06:01:16 -0800

Grant Ingersoll wrote:

Note: ENCODING is whatever encoding the file is in, as in "UTF-8", if that is what your files are in.

I think there is a misunderstanding: the WordExtractor extracts text from MS Word (.doc) files. Those files are binary and therefore do not have an encoding.

I would print the extracted text into plain text files and check whether the file generated on Windows differs from the one generated on Linux/Ubuntu. This makes it possible to determine whether this is a WordExtractor problem or a Lucene problem.
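A rough sketch of that comparison, in case it helps. The class and method names here are made up for illustration, and I assume the extracted text is already available as a String (for example from the extractor's getText() call). The dump is written with an explicit UTF-8 encoding so that the platform default charset, which may differ between Windows and Ubuntu, cannot influence the result.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Lucene or POI.
public class ExtractedTextDump {

    // Write the extracted text using a fixed encoding (UTF-8), so the dump
    // does not depend on the platform default charset.
    static void dump(String extractedText, Path target) throws IOException {
        Files.write(target, extractedText.getBytes(StandardCharsets.UTF_8));
    }

    // Return the index of the first differing character between two dumps,
    // or -1 if both files contain exactly the same text.
    static int firstDifference(Path a, Path b) throws IOException {
        String left = new String(Files.readAllBytes(a), StandardCharsets.UTF_8);
        String right = new String(Files.readAllBytes(b), StandardCharsets.UTF_8);
        int common = Math.min(left.length(), right.length());
        for (int i = 0; i < common; i++) {
            if (left.charAt(i) != right.charAt(i)) {
                return i;
            }
        }
        // Same prefix: either identical, or one file is longer than the other.
        return left.length() == right.length() ? -1 : common;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ExtractedTextDump <windows-dump> <ubuntu-dump>");
            return;
        }
        int pos = firstDifference(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(pos < 0 ? "identical" : "first difference at char " + pos);
    }
}
```

If the two dumps are identical, the extraction step is probably fine and the difference is more likely to be on the indexing or searching side. If they differ, the reported position shows where to look in the extracted text.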


Jan
