Hi Prasad,

I was looking through the documentation a few days ago and found some
helpful information in the Lucene FAQ.

Here are the links:
http://wiki.apache.org/lucene-java/LuceneFAQ#How_can_I_index_PDF_documents.3F
http://wiki.apache.org/lucene-java/LuceneFAQ#How_can_I_index_file_formats_like_OpenDocument_.28aka_OpenOffice.org.29.2C_RTF.2C_Microsoft_Word.2C_Excel.2C_PowerPoint.2C_Visio.2C_etc.3F

This will be a good starting point for indexing PDF and other files.

(For example, you can extract the text from PDF documents using one of
the tools mentioned there.)
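To make the overall flow concrete (extract the text from each file, index it, then search and get back file names), here is a minimal sketch in plain Java. It is not Lucene code: a small in-memory inverted index stands in for Lucene's IndexWriter/IndexSearcher, and TextExtractor is a hypothetical interface that you would back with one of the PDF/Office parsers listed in the FAQ. The search returns the matching file names, which is the list you would then check against the user's access rights.

```java
import java.util.*;

/**
 * Sketch of "extract text, then index it". A plain in-memory inverted index
 * stands in for Lucene here; TextExtractor is a hypothetical interface, not
 * a Lucene API.
 */
public class ExtractAndIndexSketch {

    /** Hypothetical hook: one implementation per file format (PDF, Word, Excel, ...). */
    interface TextExtractor {
        String extract(String fileName);
    }

    // term -> names of files whose extracted text contains that term
    private final Map<String, Set<String>> index = new HashMap<>();

    /** Indexes the already-extracted text of one file. */
    void add(String fileName, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) continue; // split can yield a leading empty token
            index.computeIfAbsent(term, k -> new TreeSet<>()).add(fileName);
        }
    }

    /** Returns the names of files containing every term of the query. */
    Set<String> search(String query) {
        Set<String> result = null;
        for (String term : query.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) continue;
            Set<String> hits = index.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits); // AND semantics across query terms
            }
        }
        return result == null ? Collections.emptySet() : result;
    }

    public static void main(String[] args) {
        // Fake file contents standing in for a real PDF/Word/Excel parser.
        Map<String, String> fakeContents = new HashMap<>();
        fakeContents.put("report.pdf", "Quarterly sales report for 2011");
        fakeContents.put("notes.txt", "Meeting notes: sales targets");
        fakeContents.put("budget.xls", "Budget 2012");
        TextExtractor extractor = fakeContents::get;

        ExtractAndIndexSketch idx = new ExtractAndIndexSketch();
        for (String file : fakeContents.keySet()) {
            idx.add(file, extractor.extract(file)); // extract, then index
        }
        System.out.println(idx.search("sales")); // [notes.txt, report.pdf]
    }
}
```

In a real lucene-3.0.3 setup, add() would roughly correspond to IndexWriter.addDocument() with a stored, non-analyzed file-name field and an analyzed contents field, and search() to parsing the query with QueryParser and running it through an IndexSearcher, reading the file name back from each hit.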

-param

On 2/1/12 11:53 AM, "Prasad KVSH" <prasad.kokep...@ness.com> wrote:

>Hi,
> 
>Please find our requirement below; this is what we are trying to accomplish.
>
>Our client is looking for an extended search engine that searches for the
>given text inside documents (PDF, MSG, Excel, XML, Word, TXT, etc.) and
>returns the list of file names where the text is found. Using the returned
>list, we can populate the user interface after validating the results
>against the user's access rights. We have one image server with a few
>folders and subfolders, and each folder may have 10,000 files.
>
>So far we are searching text in TXT files only, using lucene-3.0.3.
>
>Thanks
>
>Prasad
>
>
>________________________________
>
>From: KARTHIK SHIVAKUMAR [mailto:nskarthi...@gmail.com]
>Sent: Wed 2/1/2012 7:04 PM
>To: java-user@lucene.apache.org
>Subject: Re: lucene-3.0.3
>
>
>
>Hi
>
>>>lucene-3.0.3 can be used for searching a text from
>
>Lucene's primary job is to do text search.
>
>Be it PDF/HTML/XML/MSword/PPT/XLS,
>
>you need plugin code that does two things:
>
>1) Strip the text from any of the documents (PDF/HTML/XML/MSword/PPT/XLS)
>2) Index this processed text using Lucene
>
>The resulting index can later be used to search through the required
>content.
>
>;)
>with regards
>karthik
>
>
>On Wed, Feb 1, 2012 at 6:37 PM, Prasad KVSH
><prasad.kokep...@ness.com> wrote:
>
>> Hi,
>>
>>
>>
>> Can lucene-3.0.3 be used for searching text in PDF, xlsx, docx, doc,
>> xls, msg, and TXT files? Is there a common function to accomplish
>> this? Please help me with this.
>>
>>
>>
>> Thanks
>>
>> Prasad
>>
>>
>>
>>
>
>
>--
>*N.S.KARTHIK
>R.M.S.COLONY
>BEHIND BANK OF INDIA
>R.M.V 2ND STAGE
>BANGALORE
>560094*
>
>

