PDFBox comes with the class org.pdfbox.searchengine.lucene.LucenePDFDocument, which shows how to parse and index a PDF document.
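To make "indexing" concrete, here is a hypothetical, dependency-free Java sketch. It is not the PDFBox or Lucene API. It assumes the text has already been extracted from the PDF (the step a parser such as PDFBox performs) and builds a tiny inverted index that maps each lowercase term to the files containing it. The class name TinyPdfIndex and its methods are made up for illustration.

```java
import java.util.*;

/**
 * Hypothetical sketch, NOT the PDFBox or Lucene API.
 * Shows what indexing extracted PDF text amounts to: split the text into
 * lowercase terms and record which documents contain each term.
 */
class TinyPdfIndex {
    // term -> sorted set of document paths containing that term
    private final Map<String, Set<String>> postings = new TreeMap<>();

    // Assumption: the caller passes text already extracted by a PDF parser.
    public void addDocument(String path, String extractedText) {
        // Split on any run of characters that are not letters or digits.
        for (String token : extractedText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // a leading delimiter yields one empty token
            }
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(path);
        }
    }

    // Returns the paths of documents containing the term (case-insensitive).
    public Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        TinyPdfIndex index = new TinyPdfIndex();
        index.addDocument("report.pdf", "Quarterly report: Lucene indexing results.");
        index.addDocument("manual.pdf", "PDFBox manual, text extraction.");
        System.out.println(index.search("lucene")); // prints [report.pdf]
        System.out.println(index.search("PDFBox")); // prints [manual.pdf]
    }
}
```

A real Lucene index also stores fields, term positions and scoring data; this sketch only keeps the term-to-document mapping.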
Ben

On Tue, 15 Jul 2003, alvaro z wrote:
>
> I'm using Lucene with TXT and HTML files, and it's working.
>
> The only problem with HTML files is that I have to index them as TXT first,
> before indexing them as HTML.
>
> Has anyone tried to index PDF files?
>
> I'm trying PDFBox. Are there any samples for indexing PDF files? (I can't find
> any samples for doing that with any of the parsers: PDFBox, JPedal, etc.)
>
> Thanks for helping,
>
> Alvaro, from Lima - Peru