Re: about PDF / HTML index

Ben Litchfield Wed, 16 Jul 2003 03:28:30 -0700

PDFBox comes with the class
org.pdfbox.searchengine.lucene.LucenePDFDocument, which shows how to
parse and index a PDF document.
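Below is a minimal sketch of how that class might be called from a simple indexing loop. It assumes the 2003-era PDFBox package (`org.pdfbox.*`) with a static `LucenePDFDocument.getDocument(File)` and the Lucene 1.x `IndexWriter(String, Analyzer, boolean)` constructor. The class name `PdfIndexSketch`, the method `indexDirectory`, and the directory arguments are hypothetical. Later releases of both libraries moved or changed these APIs, so check the javadoc for the versions you have installed.

```java
import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.pdfbox.searchengine.lucene.LucenePDFDocument;

/**
 * Hypothetical sketch: adds every *.pdf file found directly in one directory
 * to a new Lucene index. Assumes the PDFBox 0.6-era org.pdfbox.* package and
 * the Lucene 1.x IndexWriter(String, Analyzer, boolean) constructor.
 */
public class PdfIndexSketch {

    /** Indexes the PDFs in pdfDir into indexPath; returns how many were added. */
    public static int indexDirectory(File pdfDir, String indexPath) throws Exception {
        // 'true' creates a new index at indexPath, replacing any existing one.
        IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), true);
        int added = 0;
        try {
            File[] files = pdfDir.listFiles();
            if (files != null) {
                for (File f : files) {
                    if (f.isFile() && f.getName().toLowerCase().endsWith(".pdf")) {
                        // LucenePDFDocument extracts the PDF's text and builds a
                        // Lucene Document; the field layout is defined by that class.
                        Document doc = LucenePDFDocument.getDocument(f);
                        writer.addDocument(doc);
                        added++;
                    }
                }
            }
            writer.optimize();
        } finally {
            writer.close();
        }
        return added;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: java PdfIndexSketch <pdf-directory> <index-directory>
        int n = indexDirectory(new File(args[0]), args[1]);
        System.out.println("Indexed " + n + " PDF file(s)");
    }
}
```

Reading the source of LucenePDFDocument itself is the best way to see which fields it stores, so that your search code queries the right field names.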


Ben


On Tue, 15 Jul 2003, alvaro z wrote:

>
> I'm using Lucene with TXT and HTML files, and it's working.
>
> The only problem with HTML files is that I have to index them as TXT
> first, before indexing them as HTML.
>
> Has anyone tried to index PDF files?
>
> I'm trying PDFBox. Are there any samples for indexing PDF files with any
> of the parsers (PDFBox, JPedal, etc.)? I can't find any.
>
> Thanks for helping,
>
> Alvaro, from Lima, Peru
