To convert documents to plain text for indexing, look at Apache Tika, which has a converter for OpenDocument: http://lucene.apache.org/tika/
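
If you would rather do it by hand, along the lines described in the quoted message below (unzip the file, then read content.xml), here is a rough sketch that uses only the JDK. The class and method names (OdfTextExtractor, readEntry, xmlToText, extractText) are made up for illustration and are not part of Lucene or Tika. It simply concatenates all text nodes, so it ignores styles and structure, and it assumes content.xml has no external DTD reference.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not a Lucene or Tika class.
class OdfTextExtractor {

    /** Returns the bytes of the named ZIP entry (e.g. "content.xml"), or null if absent. */
    public static byte[] readEntry(InputStream odf, String entryName) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(odf)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entryName.equals(entry.getName())) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = zip.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    return out.toByteArray();
                }
            }
        }
        return null;
    }

    /** Parses the XML and joins all non-blank text nodes with single spaces. */
    public static String xmlToText(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        StringBuilder sb = new StringBuilder();
        collect(doc.getDocumentElement(), sb);
        return sb.toString().trim();
    }

    // Depth-first walk over the DOM, appending each text node's trimmed value.
    private static void collect(Node node, StringBuilder sb) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                sb.append(text).append(' ');
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), sb);
        }
    }

    /** Extracts the body text of an OpenDocument file; returns "" if content.xml is missing. */
    public static String extractText(InputStream odf) throws Exception {
        byte[] content = readEntry(odf, "content.xml");
        return content == null ? "" : xmlToText(content);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java OdfTextExtractor <file.odt>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(extractText(in));
        }
    }
}
```

The resulting string can then go into a Lucene field in place of the raw file contents. The same readEntry call with "meta.xml" should give you the metadata file for the title. For anything beyond a quick experiment, Tika is likely the more robust choice.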

This mailing list is *about* the development of Lucene itself, not for questions about *how* to develop your own code that uses Lucene.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [EMAIL PROTECTED]

> -----Original Message-----
> From: ganesh H D [mailto:[EMAIL PROTECTED]
> Sent: Friday, November 21, 2008 1:50 PM
> To: java-dev@lucene.apache.org
> Subject: Indexing Open office documents
>
> Hi,
>
> I have been working with Apache Lucene for the past 3 days. I tried to
> deploy the sample application that comes with the Lucene distribution.
> It's working absolutely fine. It indexes all types of files, like .pdf,
> .xml, .java, .txt, etc., and it also indexes OpenOffice documents. But
> when I search for words from the OpenOffice documents, it does not show
> the exact result. Later I came to know that OpenOffice documents are ZIP
> archives that contain XML files. We need to uncompress the file using
> Java's ZIP support, then parse meta.xml to get the title etc. and
> content.xml to get the document's content. But I couldn't find much
> information about this issue. Please help me to solve it.
>
> regards,
> ganesh
>
> --
> View this message in context:
> http://www.nabble.com/Indexing-Open-office-documents-tp20620421p20620421.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]