This may help:
http://www.pdfbox.org/userguide/text_extraction.html#Lucene+Integration
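
In case it is useful, here is a rough sketch of indexing the extracted text by hand. It assumes the Lucene 2.x API (IndexWriter taking a directory path, Field.Store / Field.Index constants) and the same old org.pdfbox packages as the code quoted below. The class name PDFIndexer, the index directory, and the field names "path" and "contents" are my own placeholders, not anything PDFBox or Lucene requires. The PDFBox page linked above describes a ready-made helper for the same job, which may be simpler.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.util.PDFTextStripper;

public class PDFIndexer
{
    // Extract the text of a PDF, always closing the document afterwards.
    static String extractText(String pdfFile) throws IOException
    {
        PDDocument pdf = PDDocument.load(pdfFile);
        try
        {
            return new PDFTextStripper().getText(pdf);
        }
        finally
        {
            pdf.close();
        }
    }

    // Create a fresh index in indexDir containing one document.
    // Field names "path" and "contents" are illustrative choices.
    static void indexText(String indexDir, String path, String content) throws IOException
    {
        Document luceneDoc = new Document();
        // Stored but not tokenized, so the path can be shown in search results.
        luceneDoc.add(new Field("path", path, Field.Store.YES, Field.Index.UN_TOKENIZED));
        // Tokenized for full-text search; not stored, to keep the index small.
        luceneDoc.add(new Field("contents", content, Field.Store.NO, Field.Index.TOKENIZED));

        // true = create a new index, overwriting any existing one in indexDir.
        IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), true);
        try
        {
            writer.addDocument(luceneDoc);
            writer.optimize();
        }
        finally
        {
            writer.close();
        }
    }

    public static void main(String[] args) throws IOException
    {
        String pdfFile = "D:\\ASHWIN\\res\\ashwin.pdf";
        String indexDir = "D:\\ASHWIN\\index"; // hypothetical location
        indexText(indexDir, pdfFile, extractText(pdfFile));
    }
}
```

To add more PDFs to an existing index, pass false instead of true to the IndexWriter constructor so the index is opened rather than recreated.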
ashwin kumar wrote:
> Hi all, I am able to convert a PDF into a text file using PDFBox, and this
> is the code that I used:
>
> import org.pdfbox.pdmodel.PDDocument;
> import org.pdfbox.util.PDFTextStripper;
>
> public class PDFConvert
> {
>     public static void main(String[] args)
>     {
>         PDDocument doc = null;
>         try
>         {
>             // Load the PDF and extract its text
>             String pdfFile = "D:\\ASHWIN\\res\\ashwin.pdf";
>             doc = PDDocument.load(pdfFile);
>             PDFTextStripper strip = new PDFTextStripper();
>             String content = strip.getText(doc);
>             System.out.println(content);
>         }
>         catch (Exception e)
>         {
>             e.printStackTrace();
>         }
>         finally
>         {
>             // Release the file handle held by the document
>             if (doc != null)
>             {
>                 try { doc.close(); } catch (Exception e) { e.printStackTrace(); }
>             }
>         }
>     }
> }
>
> Now I want to index this text with Lucene. What code is required for
> that? Please help.
>
> Regards,
> Ashwin
>