Re: Pdf in Lucene?

Kalani Ruwanpathirana Thu, 04 Dec 2008 05:47:48 -0800

Hi Tiziano,

What is the error you got? I think you can get the text easily using the
code shown below.



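// Classes used: java.io (File, FileInputStream) and PDFBox
// (PDFParser, COSDocument, PDDocument, PDFTextStripper).

// Open the PDF and parse it into PDFBox's low-level COS document.
FileInputStream fi = new FileInputStream(new File("sample.pdf"));
PDFParser parser = new PDFParser(fi);
parser.parse();
COSDocument cd = parser.getDocument();

// Extract the plain text of the document.
PDFTextStripper stripper = new PDFTextStripper();
String text = stripper.getText(new PDDocument(cd));

// Release the document and the underlying stream.
cd.close();
fi.close();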

After getting the value for text you can simply create the Lucene document.

Document doc = new Document();
doc.add(new Field("content", text, Field.Store.NO, Field.Index.TOKENIZED));
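
If the extraction step fails, one thing worth ruling out before looking at
PDFBox itself is whether the file can be opened and really is a PDF. The
sketch below uses only the JDK (Java 9 or later) and checks for the "%PDF-"
marker that PDF files start with. PdfHeaderCheck and looksLikePdf are
hypothetical names made up for this example; they are not part of PDFBox or
Lucene. Some PDFs carry a few junk bytes before the marker, which lenient
readers tolerate, so a "false" result here is a hint rather than proof.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper for illustration only; not a PDFBox or Lucene class.
class PdfHeaderCheck {

    // PDF files are expected to begin with the ASCII marker "%PDF-".
    private static final byte[] MAGIC =
            "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /**
     * Returns true if the path is a readable regular file whose first
     * bytes are exactly "%PDF-". Returns false for missing, unreadable,
     * too-short, or non-PDF files.
     */
    static boolean looksLikePdf(Path file) throws IOException {
        if (!Files.isRegularFile(file) || !Files.isReadable(file)) {
            return false;
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[MAGIC.length];
            // readNBytes keeps reading until the buffer is full or EOF.
            int n = in.readNBytes(head, 0, head.length);
            return n == head.length && Arrays.equals(head, MAGIC);
        }
    }
}
```

Calling PdfHeaderCheck.looksLikePdf(Paths.get("sample.pdf")) before creating
the PDFParser separates "wrong path or not a PDF" problems from genuine
parsing errors.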
>
>
>
>
> On Thu, Dec 4, 2008 at 6:20 PM, tiziano bernardi <[EMAIL PROTECTED]>wrote:
>
>>
>> Thanks, that is very kind.
>> I have tried that code, but it does not work for me.
>> Could you please send me a simple working class that uses it?
>> Thanks
>>
>> > Date: Thu, 4 Dec 2008 15:19:26 +0530
>> > From: [EMAIL PROTECTED]
>> > To: [email protected]
>> > Subject: Re: Pdf in Lucene?
>> >
>> > Hi,
>> >
>> > In my case I used PDFBox, just to extract the text from PDF document and
>> > then I created the Lucene document giving the extracted text. (I didn't
>> > use the PDFBox built in Lucene search engine). So I didn't get any
>> > incompatibility problems.
>> >
>> > This blog post shows the way.
>> > http://kalanir.blogspot.com/2008/08/indexing-pdf-documents-with-lucene.html
>> >
>> > It worked perfect for me.
>> >
>> > Thanks.
>>
>
>
>
> --
> Kalani Ruwanpathirana
> Department of Computer Science & Engineering
> University of Moratuwa
>



-- 
Kalani Ruwanpathirana
Department of Computer Science & Engineering
University of Moratuwa
