Please don't cross-post to pdfbox-dev. All devs are expected to also be
on the user list. Thanks.

If your PDF actually contained PDF forms (which it doesn't), you could
use the ExportFDF or ExportXFDF tool to extract the form data. But
your Tax.pdf has the form mixed with the form data as normal text, and
there are no structure tags that identify individual values. The only
thing you can do is use the ExtractText tool as suggested earlier and try
to construct rules that find the values you're looking for in the
extracted text. I don't expect that to work reliably, though. The
alternative is to get your PDF producer to generate either PDF forms or
structure tags in the content. The latter is probably more difficult, and
I don't know whether PDFBox would be of any help extracting the values,
so PDF forms are most probably the way to go.
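
To illustrate what I mean by "rules" on the extracted text, here is a
minimal sketch. It assumes you have already saved the ExtractText output
into a String and that the label and its value appear next to each other,
as in "Account Number: 12345-678". The class name, the pattern, and the
sample text are made up for illustration; your Tax.pdf may lay things out
differently, which is exactly why this approach tends to be fragile.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: applies a text rule to
// text that was already extracted from the PDF (e.g. by ExtractText).
class FieldRules {

    // Assumed rule: the label "Account Number", an optional colon, then
    // the next run of letters, digits, or hyphens is taken as the value.
    private static final Pattern ACCOUNT_NUMBER =
            Pattern.compile("Account\\s+Number\\s*:?\\s*([A-Za-z0-9-]+)");

    static Optional<String> findAccountNumber(String extractedText) {
        Matcher m = ACCOUNT_NUMBER.matcher(extractedText);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up sample standing in for real ExtractText output.
        String sample = "Name: J. Doe\nAccount Number: 12345-678\nTax Year 2008\n";
        System.out.println(findAccountNumber(sample).orElse("(not found)"));
    }
}
```

If the label and value end up on different lines or in a different order
in the extracted text, the pattern has to be adapted, and every layout
change in the PDF can break it again.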

On 13.11.2008 13:28:58 Duseja, Sushil wrote:
> Thank you very much for the response.
> 
> I have gone through the links mentioned below; however that didn't help
> me.
> 
> The pdf I want to extract the text from, contains multiple forms. I have
> attached a sample pdf for your kind reference.
> 
> Please advise as to how I can fetch a particular value (ex. Account
> Number).
> 
> Thanks again.
> 
> 
> -----Original Message-----
> From: Jeremias Maerki [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, November 13, 2008 5:36 PM
> To: [email protected]
> Subject: Re: Text Extraction
> 
> Have you looked at the documentation already?
> 
> 0.7.3 release:
> http://pdfbox.org/userguide/text_extraction.html
> 
> Development code:
> http://incubator.apache.org/pdfbox/userguide/text_extraction.html
> 
> You can also look at the "ExtractText" tool's source code for another
> working example to extract text from a PDF.
> 
> On 13.11.2008 11:27:04 Duseja, Sushil wrote:
> > Can anyone kindly respond to my question below?
> > 
> > Thanks!
> > 
> > -----Original Message-----
> > From: Duseja, Sushil 
> > Sent: Monday, November 10, 2008 8:09 PM
> > To: [email protected]
> > Subject: Text Extraction
> > 
> > Hello,
> > 
> > Can anyone please let me know as to how can I extract text from a pdf
> > file (with multiple forms) using PDFBox? Is creating and accessing
> > bookmarks the way to go? If possible, please point me to some working
> > examples.
> > 
> > Thanks. 
> > 
> 
> 
> 
> 
> Jeremias Maerki
> 




Jeremias Maerki
