Hi Tiziano, What is the error you got? I think you can get the text easily using the code shown below.
FileInputStream fi = new FileInputStream(new File("sample.pdf")); PDFParser parser = new PDFParser(fi); parser.parse(); COSDocument cd = parser.getDocument(); PDFTextStripper stripper = new PDFTextStripper(); String text = stripper.getText(new PDDocument(cd)); cd.close(); After getting the value for text you can simply create the Lucene document. Document doc = new Document(); doc.add(new Field("id", "2", Field.Store.YES, Field.Index.TOKENIZED)); doc.add(new Field("content", docText,Field.Store.NO, Field.Index.TOKENIZED)); On Thu, Dec 4, 2008 at 6:20 PM, tiziano bernardi <[EMAIL PROTECTED]> wrote: > > Thanks very kind ... > But I've tried that code but I do not work ... > You could send me a simple working class that uses it please? > Thanks> Date: Thu, 4 Dec 2008 15:19:26 +0530> From: [EMAIL PROTECTED]> To: > java-user@lucene.apache.org> Subject: Re: Pdf in Lucene?> > Hi,> > In my > case I used PDFBox, just to extract the text from PDF document and> then I > created the Lucene document giving the extracted text. (I didn't use> the > PDFBox built in Lucene search engine). So I didn't get any> incompatibility > problems.> > This blog post shows the way.> > http://kalanir.blogspot.com/2008/08/indexing-pdf-documents-with-lucene.html> > > It worked perfect for me.> > Thanks. > _________________________________________________________________ > Ci sai fare con l'italiano? Scoprilo con Typectionary! > http://typectionary.it.msn.com/ > -- Kalani Ruwanpathirana Department of Computer Science & Engineering University of Moratuwa