Hi Tiziano,
What error did you get? You should be able to extract the text with the
code below.
// Parse the PDF and extract its text with PDFBox
FileInputStream fi = new FileInputStream(new File("sample.pdf"));
PDFParser parser = new PDFParser(fi);
parser.parse();
COSDocument cd = parser.getDocument();
PDFTextStripper stripper = new PDFTextStripper();
String text = stripper.getText(new PDDocument(cd));
// Release the document and the underlying stream
cd.close();
fi.close();
Once you have the value of text, you can simply create the Lucene document:
Document doc = new Document();
doc.add(new Field("id", "2", Field.Store.YES, Field.Index.TOKENIZED));
// Pass the extracted text (the variable "text" from above) as the content
doc.add(new Field("content", text, Field.Store.NO, Field.Index.TOKENIZED));
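Since you asked for a complete class, here is a rough sketch that puts the two snippets together. It is only an illustration under some assumptions of mine: the old PDFBox 0.7.x jar (org.pdfbox packages) and a Lucene 2.x jar on the classpath, a file called sample.pdf in the working directory, and an index directory called "index". The class name PdfIndexer and the method extractText are placeholders I made up, so rename them as you like.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.pdfbox.cos.COSDocument;
import org.pdfbox.pdfparser.PDFParser;
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.util.PDFTextStripper;

// Hypothetical example class; assumes PDFBox 0.7.x and Lucene 2.x.
public class PdfIndexer {

    // Extracts the plain text of a PDF file using PDFBox.
    static String extractText(File pdf) throws IOException {
        FileInputStream in = new FileInputStream(pdf);
        COSDocument cd = null;
        try {
            PDFParser parser = new PDFParser(in);
            parser.parse();
            cd = parser.getDocument();
            PDFTextStripper stripper = new PDFTextStripper();
            return stripper.getText(new PDDocument(cd));
        } finally {
            // Close the PDF document and the stream even if parsing fails.
            if (cd != null) {
                cd.close();
            }
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // "sample.pdf" and "index" are placeholder paths.
        String text = extractText(new File("sample.pdf"));

        Document doc = new Document();
        doc.add(new Field("id", "2", Field.Store.YES, Field.Index.TOKENIZED));
        doc.add(new Field("content", text, Field.Store.NO, Field.Index.TOKENIZED));

        // true = create a new index, overwriting any existing one.
        IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), true);
        try {
            writer.addDocument(doc);
        } finally {
            writer.close();
        }
    }
}
```

If this still fails for you, please send the exact exception and stack trace, because the package names and constructors differ between library versions.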
On Thu, Dec 4, 2008 at 6:20 PM, tiziano bernardi <[EMAIL PROTECTED]> wrote:
>
> Thanks, very kind...
> I've tried that code, but it does not work...
> Could you please send me a simple working class that uses it?
> Thanks
>
> > Date: Thu, 4 Dec 2008 15:19:26 +0530
> > From: [EMAIL PROTECTED]
> > To: [EMAIL PROTECTED]
> > Subject: Re: Pdf in Lucene?
> >
> > Hi,
> >
> > In my case I used PDFBox, just to extract the text from PDF document and
> > then I created the Lucene document giving the extracted text. (I didn't
> > use the PDFBox built in Lucene search engine). So I didn't get any
> > incompatibility problems.
> >
> > This blog post shows the way.
> > http://kalanir.blogspot.com/2008/08/indexing-pdf-documents-with-lucene.html
> >
> > It worked perfect for me.
> >
> > Thanks.
>
--
Kalani Ruwanpathirana
Department of Computer Science & Engineering
University of Moratuwa