Hi,

Currently we use PDFBox to process PDF files and POI to process DOC/XLS files before sending the extracted strings to Lucene for indexing.

Does anyone know whether PDFBox or POI can process multi-byte characters, such as Japanese, in various encodings (whatever is specified in the PDF or DOC file)?

Thanks very much for any help,
Lisheng