Re: PDF Text extraction

Ben Litchfield Fri, 27 Dec 2002 06:04:21 -0800

You need to do something like

//first get the document field
Field contentsField = doc.getField( "contents" );


//Then get the reader from the field
BufferedReader contentsReader =
    new BufferedReader( contentsField.readerValue() );

//finally dump the contents of the reader to System.out
String line = null;
while( (line = contentsReader.readLine() ) != null )
{
    System.out.println( line );
}

I have not tested if this compiles but it should be pretty close.

Ben Litchfield


On Fri, 27 Dec 2002, Suhas Indra wrote:

> Hello List
>
> I am using PDFBox to index some of the PDF documents. The parser works fine
> and I can read the summary. But the contents are displayed as
> java.io.InputStream.
>
> When I try the following:
> System.out.println(doc.getField("contents")) (where doc is the Document
> object)
>
> The result will be:
>
> Text<contents:java.io.InputStreamReader@127dc0>
>
> I want to print the extracted data.
>
> Can anyone please let me know how to extract the contents?
>
> Regards
>
> Suhas
>
>
>
> --------------------------------------------------------------
> Robosoft Technologies - Partners in Product Development
>
>
>
>
>
>
>
>
>
> --
> To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
>

-- 


--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>

Re: PDF Text extraction

Reply via email to