Hi Prasad,

I was looking through the documentation a few days ago and found some helpful information in the Lucene FAQ.
Here are the links:

http://wiki.apache.org/lucene-java/LuceneFAQ#How_can_I_index_PDF_documents.3F
http://wiki.apache.org/lucene-java/LuceneFAQ#How_can_I_index_file_formats_like_OpenDocument_.28aka_OpenOffice.org.29.2C_RTF.2C_Microsoft_Word.2C_Excel.2C_PowerPoint.2C_Visio.2C_etc.3F

These are a good starting point for indexing PDF and other file formats. For example, you can extract the text from PDF documents using one of the tools mentioned there.

-param

On 2/1/12 11:53 AM, "Prasad KVSH" <prasad.kokep...@ness.com> wrote:

>Hi,
>
>Please find our requirement below, which we are trying to accomplish.
>
>Our client is looking for an extended search engine that searches for the
>given text inside documents (PDF, Msg, Excel, XML, Word, TXT
>etc) and returns the list of file names where it finds the text. Using the
>returned list we can populate them in the User Interface after validating
>against user access rights. We have one image server with a few folders
>and sub folders; each folder may have 10,000
>files.
>
>So far we are searching text in TXT files only, using lucene-3.0.3.
>
>Thanks
>
>Prasad
>
>
>________________________________
>
>From: KARTHIK SHIVAKUMAR [mailto:nskarthi...@gmail.com]
>Sent: Wed 2/1/2012 7:04 PM
>To: java-user@lucene.apache.org
>Subject: Re: lucene-3.0.3
>
>
>
>Hi
>
>>>lucene-3.0.3 can be used for searching a text from
>
>Lucene's primary job is to do a text search.
>
>May it be PDF/HTML/XML/MSword/PPT/XLS
>
>You have to have plugin code to do 2 things
>
>1) Strip text from either of the Documents (PDF/HTML/XML/MSword/PPT/XLS)
>2) Index this processed text using Lucene
>
>The resulting index can later be used for searching through the required
>content.
>
>;)
>with regards
>karthik
>
>
>On Wed, Feb 1, 2012 at 6:37 PM, Prasad KVSH
><prasad.kokep...@ness.com>wrote:
>
>> Hi,
>>
>>
>>
>> Can lucene-3.0.3 be used for searching text in PDF, xlsx, docx, doc,
>> xls, msg, TXT files? Is there any common function to accomplish
>> this? Please help me on this.
>>
>>
>>
>> Thanks
>>
>> Prasad
>>
>>
>>
>>
>
>
>--
>*N.S.KARTHIK
>R.M.S.COLONY
>BEHIND BANK OF INDIA
>R.M.V 2ND STAGE
>BANGALORE
>560094*
>
>
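
P.S. To make the two steps Karthik lists above a bit more concrete (strip the text out of each document, then index it), here is a minimal Java sketch of how the plumbing could look. It is only one possible structure and not code from the FAQ or from Lucene: the names ExtractAndIndexSketch, TextExtractor, TextIndexer, register and indexTree are all made up for illustration. Only a plain-text extractor is shown. Extractors for PDF, Word, Excel and so on would wrap whichever tool you pick from the FAQ pages, and a Lucene-backed TextIndexer would add one document per file to the index.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: none of these types are part of Lucene.
class ExtractAndIndexSketch {

    /** Step 1: turns one file into plain text. One implementation per format. */
    interface TextExtractor {
        String extract(Path file) throws IOException;
    }

    /** Step 2: receives a file and its text. A Lucene-backed version would index it here. */
    interface TextIndexer {
        void index(Path file, String text) throws IOException;
    }

    // Maps a lower-case file extension ("txt", "pdf", ...) to its extractor.
    private final Map<String, TextExtractor> extractors = new HashMap<>();

    void register(String extension, TextExtractor extractor) {
        extractors.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Returns the lower-case extension of a file name, or "" if it has none. */
    static String extensionOf(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /**
     * Walks root and all sub folders, extracts text from every file that has a
     * registered extractor, and hands it to the indexer.
     * Returns the files that were skipped because no extractor was registered.
     */
    List<Path> indexTree(Path root, TextIndexer indexer) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<Path> skipped = new ArrayList<>();
        for (Path file : files) {
            TextExtractor extractor = extractors.get(extensionOf(file));
            if (extractor == null) {
                skipped.add(file);
                continue;
            }
            indexer.index(file, extractor.extract(file));
        }
        return skipped;
    }

    public static void main(String[] args) throws IOException {
        ExtractAndIndexSketch sketch = new ExtractAndIndexSketch();
        // Plain text needs no real extraction; this assumes the files are UTF-8.
        sketch.register("txt", file -> new String(Files.readAllBytes(file), StandardCharsets.UTF_8));

        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // Stand-in indexer that only prints; replace with real indexing code.
        List<Path> skipped = sketch.indexTree(root,
                (file, text) -> System.out.println("would index " + file + " (" + text.length() + " chars)"));
        System.out.println(skipped.size() + " file(s) skipped: no extractor registered");
    }
}
```

The list of skipped files is returned rather than ignored so that you can see which formats still need an extractor. Storing the file path alongside the text is what lets a search return the list of file names that you then filter by user access rights.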