Hi,

In my case I used PDFBox only to extract the text from the PDF document, and then created the Lucene document myself from the extracted text. I didn't use PDFBox's built-in Lucene integration (LucenePDFDocument), so I didn't run into any incompatibility problems.
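If you want to confirm that the problem really is a signature mismatch (the NoSuchMethodError in the quoted trace names Document.add with a Field parameter), a small reflection check can list the overloads that the jar on your classpath actually exposes. This is only a rough sketch, not something from the blog post: the names SignatureCheck, describeOverloads and hasSingleArgMethod are my own, and the default arguments point at a JDK class so that it runs without Lucene. With Lucene on the classpath you would pass org.apache.lucene.document.Document and add instead.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper (not part of Lucene or PDFBox): lists the public
// overloads of a method so they can be compared with the signature
// reported in a NoSuchMethodError.
final class SignatureCheck {

    // Returns sorted, de-duplicated signatures such as "add(java.lang.Object)"
    // for every public method named methodName on the given class.
    static List<String> describeOverloads(String className, String methodName) {
        Class<?> type;
        try {
            type = Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Class not on classpath: " + className, e);
        }
        // A TreeSet drops duplicates caused by compiler-generated bridge methods.
        Set<String> signatures = new TreeSet<>();
        for (Method method : type.getMethods()) {
            if (!method.getName().equals(methodName)) {
                continue;
            }
            StringBuilder sb = new StringBuilder(methodName).append('(');
            Class<?>[] params = method.getParameterTypes();
            for (int i = 0; i < params.length; i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(params[i].getName());
            }
            signatures.add(sb.append(')').toString());
        }
        return new ArrayList<>(signatures);
    }

    // True if the class declares or inherits a public one-argument overload
    // whose parameter type is exactly paramClassName.
    static boolean hasSingleArgMethod(String className, String methodName, String paramClassName) {
        return describeOverloads(className, methodName)
                .contains(methodName + "(" + paramClassName + ")");
    }

    public static void main(String[] args) {
        // Defaults use a JDK class so the sketch runs without extra jars.
        String className = args.length > 0 ? args[0] : "java.util.ArrayList";
        String methodName = args.length > 1 ? args[1] : "add";
        for (String signature : describeOverloads(className, methodName)) {
            System.out.println(signature);
        }
    }
}
```

If the overload named in the error is missing from the output for your Lucene jar, then the PDFBox class was compiled against a different Lucene version than the one you are running, and building the Document yourself avoids the problem.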
This blog post shows the way:
http://kalanir.blogspot.com/2008/08/indexing-pdf-documents-with-lucene.html
It worked perfectly for me.

Thanks.

On Tue, Dec 2, 2008 at 2:33 PM, tiziano bernardi <[EMAIL PROTECTED]> wrote:
> This is the exception:
>
> Exception in thread "main" java.lang.NoSuchMethodError:
> org.apache.lucene.document.Document.add(Lorg/apache/lucene/document/Field;)V
>     at org.pdfbox.searchengine.lucene.LucenePDFDocument.addUnindexedField(LucenePDFDocument.java:224)
>     at org.pdfbox.searchengine.lucene.LucenePDFDocument.convertDocument(LucenePDFDocument.java:265)
>     at org.pdfbox.searchengine.lucene.LucenePDFDocument.getDocument(LucenePDFDocument.java:377)
>     at SimplePdfSearch.main(SimplePdfSearch.java:30)
>
> I thank you for the time you spent
>
> From: [EMAIL PROTECTED]
> To: java-user@lucene.apache.org
> Subject: Re: Pdf in Lucene?
> Date: Mon, 1 Dec 2008 17:40:12 -0500
>
> > I certainly don't either, since you haven't said what the actual
> > exception is. If I had to guess, though, I would say it is the line
> > Document document = LucenePDFDocument.getDocument
> >
> > And that the Lucene library expected by PDFBox is not the same version
> > of Lucene you are using.
> > I would suggest not relying on PDFBox to create your document, and
> > instead look at the PDFBox calls that you need to make to then create
> > your Document.
> >
> > On Dec 1, 2008, at 9:18 AM, tiziano bernardi wrote:
> >
> > > this is my class, I use eclipse and I haven't any errors. Do not
> > > understand where the problem ....
> > >
> > > import java.io.File;
> > > import java.io.IOException;
> > >
> > > import org.apache.lucene.analysis.Analyzer;
> > > import org.apache.lucene.analysis.standard.StandardAnalyzer;
> > > import org.apache.lucene.document.Document;
> > > import org.apache.lucene.index.IndexWriter;
> > > import org.apache.lucene.index.Term;
> > > import org.apache.lucene.search.Hits;
> > > import org.apache.lucene.search.IndexSearcher;
> > > import org.apache.lucene.search.Query;
> > > import org.apache.lucene.search.TermQuery;
> > > import org.apache.lucene.store.Directory;
> > > import org.apache.lucene.store.RAMDirectory;
> > > import org.pdfbox.searchengine.lucene.LucenePDFDocument;
> > >
> > > public final class SimplePdfSearch
> > > {
> > >     private static final String PDF_FILE_PATH = "C:\\Users\\Tiziano\\Desktop\\doc_di_prova\\prova.pdf";
> > >     private static final String SEARCH_TERM = "prova";
> > >
> > >     public static final void main(String[] args) throws IOException
> > >     {
> > >         Directory directory = null;
> > >
> > >         try
> > >         {
> > >             File pdfFile = new File(PDF_FILE_PATH);
> > >             Document document = LucenePDFDocument.getDocument(pdfFile);
> > >
> > >             directory = new RAMDirectory();
> > >
> > >             IndexWriter indexWriter = null;
> > >
> > >             try
> > >             {
> > >                 Analyzer analyzer = new StandardAnalyzer();
> > >                 indexWriter = new IndexWriter(directory, analyzer, true);
> > >
> > >                 indexWriter.addDocument(document);
> > >             }
> > >             finally
> > >             {
> > >                 if (indexWriter != null)
> > >                 {
> > >                     try
> > >                     {
> > >                         indexWriter.close();
> > >                     }
> > >                     catch (IOException ignore)
> > >                     {
> > >                         // Ignore
> > >                     }
> > >
> > >                     indexWriter = null;
> > >                 }
> > >             }
> > >
> > >             IndexSearcher indexSearcher = null;
> > >
> > >             try
> > >             {
> > >                 indexSearcher = new IndexSearcher(directory);
> > >
> > >                 Term term = new Term("contents", SEARCH_TERM);
> > >                 Query query = new TermQuery(term);
> > >
> > >                 Hits hits = indexSearcher.search(query);
> > >
> > >                 System.out.println((hits.length() != 0) ? "Found" : "Not Found");
> > >             }
> > >             finally
> > >             {
> > >                 if (indexSearcher != null)
> > >                 {
> > >                     try
> > >                     {
> > >                         indexSearcher.close();
> > >                     }
> > >                     catch (IOException ignore)
> > >                     {
> > >                         // Ignore
> > >                     }
> > >
> > >                     indexSearcher = null;
> > >                 }
> > >             }
> > >         }
> > >         finally
> > >         {
> > >             if (directory != null)
> > >             {
> > >                 try
> > >                 {
> > >                     directory.close();
> > >                 }
> > >                 catch (IOException ignore)
> > >                 {
> > >                     // Ignore
> > >                 }
> > >
> > >                 directory = null;
> > >             }
> > >         }
> > >     }
> > > }
> > >
> > > From: [EMAIL PROTECTED]
> > > To: java-user@lucene.apache.org
> > > Subject: Re: Pdf in Lucene?
> > > Date: Mon, 1 Dec 2008 08:22:58 -0500
> > >
> > > > On Dec 1, 2008, at 8:01 AM, tiziano bernardi wrote:
> > > >
> > > > > I tried to use pdfbox but gives me an error.
> > > > > That the version of lucene and the pdfbox are incompatible.
> > > >
> > > > Lucene knows nothing about PDFBox, so I don't see how they could be
> > > > incompatible, unless you are referring to PDFBox's Lucene Document
> > > > creator, in which case, you should ask on the PDFBox mailing list.
> > > > I think, however, that it's pretty straightforward to create a
> > > > Lucene document from PDFBox, so you shouldn't need to rely on their
> > > > version.
> > > >
> > > > Personally, I'd have a look at Tika (http://lucene.apache.org/tika),
> > > > which wraps PDFBox (and other extraction libraries) and gives you
> > > > back SAX-like events via a ContentHandler, which you can then use to
> > > > create Lucene documents.
> > > > Else, I've been working on SOLR-284, which integrates Tika into
> > > > Solr, see https://issues.apache.org/jira/browse/SOLR-284
> > > >
> > > > -Grant
> > > >
> > > > > I use pdf box 0.7.3 and lucene 2.1.0
> > > > >
> > > > > Date: Mon, 1 Dec 2008 11:43:00 +0000
> > > > > From: [EMAIL PROTECTED]
> > > > > To: java-user@lucene.apache.org
> > > > > Subject: Re: Pdf in Lucene?
> > > > >
> > > > > > Hi
> > > > > >
> > > > > > Lucene only indexes text so you'll have to get the text out of
> > > > > > the PDF and feed it to lucene.
> > > > > >
> > > > > > Google for lucene pdf, or go straight to http://www.pdfbox.org/
> > > > > >
> > > > > > --
> > > > > > Ian.
> > > > > >
> > > > > > 2008/12/1 tiziano bernardi <[EMAIL PROTECTED]>:
> > > > > > > Hi,
> > > > > > > I want to index PDF files with lucene is possible?
> > > > > > > What like?
> > > > > > > Thanks Tiziano Bernardi
> > > > > >
> > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: java-user-[EMAIL PROTECTED]
> > > > > > For additional commands, e-mail: [EMAIL PROTECTED]
> > > >
> > > > --------------------------
> > > > Grant Ingersoll
> > > >
> > > > Lucene Helpful Hints:
> > > > http://wiki.apache.org/lucene-java/BasicsOfPerformance
> > > > http://wiki.apache.org/lucene-java/LuceneFAQ

--
Kalani Ruwanpathirana
Department of Computer Science & Engineering
University of Moratuwa