Re: index document pdf

Luke Shannon Wed, 17 Nov 2004 09:49:42 -0800

Hello;

Hopefully I understand the question.


1. Modify the indexDoc(file) method so that it also accepts files with the
.pdf extension:

else if (file.getPath().endsWith(".html")
        || file.getPath().endsWith(".pdf")) {

2. Add a branch that builds the Lucene document from the PDF file and then
adds it to the index:

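if (file.getPath().endsWith(".pdf")) {
    try {
        // LucenePDFDocument is the PDFBox helper that builds a Lucene Document from a PDF
        Document doc = LucenePDFDocument.getDocument(file);
        writer.addDocument(doc);
    } catch (Exception e) {
        // Report the failure and continue with the remaining files
        System.out.println("INDEXING ERROR: Unable to index pdf document: "
                + file.getPath()
                + " "
                + e.getMessage());
    }
}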

Note: Make sure you apply step 2 in both cases: when uidIter != null and
when uidIter == null.

That should do it.
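
To avoid repeating the PDF branch in both places, the two steps above could
be pulled into small helpers. This is only a sketch: PdfIndexSketch, Indexer,
isPdf and tryIndexPdf are made-up names, and the Indexer interface stands in
for the LucenePDFDocument.getDocument(file) plus writer.addDocument(doc)
calls so that the example runs without Lucene or PDFBox. The case-insensitive
extension check is also an addition; the snippet above is case sensitive.

```java
import java.io.File;
import java.util.Locale;

class PdfIndexSketch {

    // Hypothetical stand-in for "build the Lucene Document and add it to the writer".
    public interface Indexer {
        void index(File file) throws Exception;
    }

    // True when the path ends in ".pdf", ignoring case (an assumption, see above).
    public static boolean isPdf(File file) {
        return file.getPath().toLowerCase(Locale.ROOT).endsWith(".pdf");
    }

    // Intended to be called from both the uidIter != null and uidIter == null branches.
    // Returns true if the file was indexed, false if it was skipped after an error.
    public static boolean tryIndexPdf(File file, Indexer pdfIndexer) {
        try {
            pdfIndexer.index(file);
            return true;
        } catch (Exception e) {
            System.out.println("INDEXING ERROR: Unable to index pdf document: "
                    + file.getPath() + " " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulated indexers: one succeeds, one fails like an unreadable PDF would.
        Indexer ok = f -> System.out.println("indexed " + f.getPath());
        Indexer broken = f -> { throw new Exception("simulated parse failure"); };

        File good = new File("manual.PDF");
        File bad = new File("broken.pdf");

        if (isPdf(good)) {
            tryIndexPdf(good, ok);
        }
        if (isPdf(bad)) {
            tryIndexPdf(bad, broken);
        }
    }
}
```

In the real demo the lambda bodies would be replaced by the two lines from
step 2, so a PDF that fails to parse is logged and skipped in either branch.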

Concerning PDFBox, make sure you have all the required jars. I had a little
trouble getting this going at first: it needs log4j.jar to run. If you have
any problems with the appenders, I found this message thread helpful:
http://java2.5341.com/msg/32909.html

Luke

----- Original Message ----- 
From: "Miguel Angel" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, November 17, 2004 12:28 PM
Subject: index document pdf


> Hi, I am downloading pdfbox 0.6.4. What do I need to add to the source
> code of Lucene's demo?
>
> -- 
> Miguel Angel Angeles R.
> Asesoria en Conectividad y Servidores
> Telf. 97451277
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>



