This may help:

http://www.pdfbox.org/userguide/text_extraction.html#Lucene+Integration
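
If you would rather index the string you already get from PDFTextStripper, here is a minimal sketch. It assumes a Lucene 2.x-era API (IndexWriter taking a directory path, Field.Index.TOKENIZED); the class name TextIndexer, the index directory "index" and the field names "path" and "contents" are my own placeholders, not anything PDFBox or Lucene requires.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

import java.io.IOException;

// Hypothetical helper: writes one extracted text into a Lucene index.
public class TextIndexer
{
    // Assumes a Lucene 2.x-era API; later releases changed these signatures.
    public static void indexText(String indexDir, String path, String content)
        throws IOException
    {
        // true = create a new index, replacing any existing one in indexDir.
        IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), true);
        try
        {
            Document doc = new Document();
            // Store the path untokenized so a search hit can be traced back to the file.
            doc.add(new Field("path", path, Field.Store.YES, Field.Index.UN_TOKENIZED));
            // Tokenize the extracted text for full-text search, without storing it.
            doc.add(new Field("contents", content, Field.Store.NO, Field.Index.TOKENIZED));
            writer.addDocument(doc);
            writer.optimize();
        }
        finally
        {
            // Closing the writer flushes the index to disk and releases its lock.
            writer.close();
        }
    }

    public static void main(String[] args) throws IOException
    {
        // In PDFConvert, pass the string returned by strip.getText(doc) instead.
        indexText("index", "D:\\ASHWIN\\res\\ashwin.pdf", "text extracted by PDFTextStripper");
    }
}
```

To add to an existing index instead of replacing it, pass false as the third IndexWriter argument. As far as I recall, PDFBox releases of that period also ship a LucenePDFDocument helper that builds the Document for you; the page linked above should cover it.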

ashwin kumar wrote:
> Hi all, I am able to convert a PDF into a text file using PDFBox, and this
> is the code that I used:
> 
> import org.pdfbox.pdmodel.PDDocument;
> import org.pdfbox.util.PDFTextStripper;
> 
> public class PDFConvert
> {
>     public static void main(String[] args)
>     {
>         String pdfFile = "D:\\ASHWIN\\res\\ashwin.pdf";
>         PDDocument doc = null;
>         try
>         {
>             // Load the PDF and extract its plain text.
>             doc = PDDocument.load(pdfFile);
>             PDFTextStripper strip = new PDFTextStripper();
>             String content = strip.getText(doc);
>             System.out.println(content);
>         }
>         catch (Exception e)
>         {
>             e.printStackTrace();
>         }
>         finally
>         {
>             // Always close the document to release its resources.
>             if (doc != null)
>             {
>                 try
>                 {
>                     doc.close();
>                 }
>                 catch (Exception e)
>                 {
>                     e.printStackTrace();
>                 }
>             }
>         }
>     }
> }
> 
> Now I want to index this text with Lucene. What code is required for
> that? Please help.
> 
> regards
> ashwin
> 


