This is my class. I use Eclipse and it does not report any errors. I do not
understand where the problem is ....

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.pdfbox.searchengine.lucene.LucenePDFDocument;

public final class SimplePdfSearch
{
    private static final String PDF_FILE_PATH =
        "C:\\Users\\Tiziano\\Desktop\\doc_di_prova\\prova.pdf";
    private static final String SEARCH_TERM = "prova";

    public static final void main(String[] args) throws IOException
    {
        Directory directory = null;
        try
        {
            // Let PDFBox build a Lucene Document from the PDF file.
            File pdfFile = new File(PDF_FILE_PATH);
            Document document = LucenePDFDocument.getDocument(pdfFile);

            // Index the document into an in-memory directory.
            directory = new RAMDirectory();
            IndexWriter indexWriter = null;
            try
            {
                Analyzer analyzer = new StandardAnalyzer();
                // Third argument 'true' creates a new index.
                indexWriter = new IndexWriter(directory, analyzer, true);
                indexWriter.addDocument(document);
            }
            finally
            {
                if (indexWriter != null)
                {
                    try
                    {
                        indexWriter.close();
                    }
                    catch (IOException ignore)
                    {
                        // Ignore
                    }
                    indexWriter = null;
                }
            }

            // Search the "contents" field for the exact term.
            IndexSearcher indexSearcher = null;
            try
            {
                indexSearcher = new IndexSearcher(directory);
                Term term = new Term("contents", SEARCH_TERM);
                Query query = new TermQuery(term);
                Hits hits = indexSearcher.search(query);
                System.out.println((hits.length() != 0) ? "Found" : "Not Found");
            }
            finally
            {
                if (indexSearcher != null)
                {
                    try
                    {
                        indexSearcher.close();
                    }
                    catch (IOException ignore)
                    {
                        // Ignore
                    }
                    indexSearcher = null;
                }
            }
        }
        finally
        {
            if (directory != null)
            {
                try
                {
                    directory.close();
                }
                catch (IOException ignore)
                {
                    // Ignore
                }
                directory = null;
            }
        }
    }
}

> From: [EMAIL PROTECTED]
> To: [email protected]
> Subject: Re: Pdf in Lucene?
> Date: Mon, 1 Dec 2008 08:22:58 -0500
>
> On Dec 1, 2008, at 8:01 AM, tiziano bernardi wrote:
>
> > I tried to use pdfbox but gives me an error.
> > That the version of lucene and the pdfbox are incompatible.
>
> Lucene knows nothing about PDFBox, so I don't see how they could be
> incompatible, unless your are referring to PDFBox's Lucene Document
> creator, in which case, you should ask on the PDFBox mailing list. I
> think, however, that it's pretty straightforward to create a Lucene
> document from PDFBox, so you shouldn't need to rely on their version.
>
> Personally, I'd have a look at Tika (http://lucene.apache.org/tika),
> which wraps PDFBox (and other extraction libraries) and gives you back
> SAX-like events via a ContentHandler, which you can then use to create
> Lucene documents. Else, I've been working on SOLR-284, which
> integrates Tika into Solr, see
> https://issues.apache.org/jira/browse/SOLR-284
>
> -Grant
>
> > I use pdf box 0.7.3 and lucene 2.1.0
> >
> > > Date: Mon, 1 Dec 2008 11:43:00 +0000
> > > From: [EMAIL PROTECTED]
> > > To: [email protected]
> > > Subject: Re: Pdf in Lucene?
> > >
> > > Hi
> > >
> > > Lucene only indexes text so you'll have to get the text out of the
> > > PDF and feed it to lucene.
> > >
> > > Google for lucene pdf, or go straight to http://www.pdfbox.org/
> > >
> > > --
> > > Ian.
> > >
> > > 2008/12/1 tiziano bernardi <[EMAIL PROTECTED]>:
> > > >
> > > > Hi,
> > > > I want to index PDF files with lucene is possible?
> > > > What like?
> > > > Thanks Tiziano Bernardi
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]
>
> --------------------------
> Grant Ingersoll
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
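
To make sure I understand the flow described in the replies above (get the
text out of the PDF, feed it to the index, then look up a term), here is a toy
sketch in plain Java. It does not use Lucene or PDFBox at all: the class
ToyTermIndex and its methods addField and containsTerm are hypothetical names
made up for illustration, and the sample text stands in for whatever a PDF
extractor would return. The one behaviour it imitates is that the indexed text
is split and lowercased while the query term is matched exactly as given.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy stand-in for an index: NOT Lucene, only an illustration of the idea.
public class ToyTermIndex {
    // Field name -> set of lowercased terms seen in that field.
    private final Map<String, Set<String>> termsByField = new HashMap<>();

    // "Analyze" the text: split on anything that is not a letter or digit,
    // lowercase each token, and store it under the given field.
    public void addField(String field, String text) {
        Set<String> terms = termsByField.computeIfAbsent(field, k -> new HashSet<>());
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
    }

    // Exact lookup: the query term is compared as given, with no analysis.
    public boolean containsTerm(String field, String term) {
        Set<String> terms = termsByField.get(field);
        return terms != null && terms.contains(term);
    }

    public static void main(String[] args) {
        ToyTermIndex index = new ToyTermIndex();
        // Assumed sample text in place of text extracted from a PDF.
        index.addField("contents", "This is a Prova document.");

        System.out.println(index.containsTerm("contents", "prova") ? "Found" : "Not Found");
        System.out.println(index.containsTerm("contents", "Prova") ? "Found" : "Not Found");
        System.out.println(index.containsTerm("title", "prova") ? "Found" : "Not Found");
    }
}
```

In this toy only the first lookup prints "Found": the second uses a different
case than the stored term, and the third asks a field that was never filled.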