RE: Indexing Open office documents

Uwe Schindler Fri, 21 Nov 2008 05:15:30 -0800

To extract the full text of documents as plain text for indexing, look at
Apache Tika, which has a converter for OpenDocument:
http://lucene.apache.org/tika/
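
For illustration only, here is a minimal sketch of the manual approach
described in the quoted message below (unzip the file, then read content.xml),
using nothing but the JDK. It is not how Tika does it and it is no substitute
for Tika; the class name OdfTextExtractor and its method names are made up for
this example, and it assumes the document keeps its text in a top-level
content.xml entry with paragraphs and headings in elements whose local names
are "p" and "h".

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Lucene or Tika.
public class OdfTextExtractor {

    /** Returns the character data of content.xml, or "" if there is no such entry. */
    public static String extractText(InputStream odfStream) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(odfStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("content.xml".equals(entry.getName())) {
                    // The zip stream now yields exactly this entry's bytes.
                    return parseCharacters(zip);
                }
            }
        }
        return "";
    }

    private static String parseCharacters(InputStream xml) throws Exception {
        final StringBuilder text = new StringBuilder();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so localName is "p", not "text:p"
        factory.newSAXParser().parse(xml, new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Assumption: paragraphs and headings end a line of text.
                if ("p".equals(localName) || "h".equals(localName)) {
                    text.append('\n');
                }
            }
        });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.print(extractText(in));
        }
    }
}
```

The returned string is what you would then hand to your indexing code as the
document's content field. Metadata such as the title could presumably be read
from meta.xml in the same way, but that is left out here.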


This mailing list is *about* the development of Lucene itself, not for
questions on *how* to develop your own code that uses Lucene.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [EMAIL PROTECTED]

> -----Original Message-----
> From: ganesh H D [mailto:[EMAIL PROTECTED]
> Sent: Friday, November 21, 2008 1:50 PM
> To: [email protected]
> Subject: Indexing Open office documents
> 
> 
> Hi,
> 
> I have been working with Apache Lucene for the past 3 days. I deployed the
> sample application that ships with the Lucene distribution, and it works
> fine. It indexes all kinds of files, such as .pdf, .xml, .java and .txt,
> and it also indexes OpenOffice documents. But when I search for words from
> the OpenOffice documents, it does not show the expected results. Later I
> learned that OpenOffice documents are ZIP archives that contain XML files.
> We need to uncompress the file using Java's ZIP support, then parse
> meta.xml to get the title etc. and content.xml to get the document's
> content. But I couldn't find much information about this. Please help me
> solve this issue.
> 
> regards,
> ganesh
> 
> --
> View this message in context:
> http://www.nabble.com/Indexing-Open-office-documents-tp20620421p20620421.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]



