Hi,
I've just added this line:

// after stripper.extractRegions();
stripper.getText(document);

After doing this I got some text for the regions, but this text seems to
belong to page 1. Have you found an example of how to use the
Stripper? Maybe someone else on the list can help you, since I don't have
any experience with the Stripper.

If I have some time this evening, I will test it again.
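
To narrow down whether the garbage is an encoding problem, it may help to print the code points of the extracted string instead of looking at the console output. The following is only a minimal sketch using plain Java, not PDFBox; the class name CodePointDump and the method dump are made up for illustration, and the sample string assumes the garbage is exactly the four characters À¾´» as they appear in this thread.

```java
import java.util.StringJoiner;

public class CodePointDump {
    // Hypothetical helper: lists every Unicode code point of the text as U+XXXX.
    public static String dump(String text) {
        StringJoiner out = new StringJoiner(" ");
        text.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumption: the garbage reported for "Year of Form", written with
        // escapes so the source file encoding does not matter.
        String garbage = "\u00C0\u00BE\u00B4\u00BB";
        System.out.println(dump(garbage)); // U+00C0 U+00BE U+00B4 U+00BB
        System.out.println(dump("2007"));  // U+0032 U+0030 U+0030 U+0037
    }
}
```

If the string returned by the stripper really contains code points like the first line rather than the digits, the wrong characters are already in the extracted text and are not just a console display problem. That would point towards the font encoding inside the PDF, but this is a guess on my side.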


Bye,
Daniel

2008/12/29 Duseja, Sushil <[email protected]>

>  Hello Daniel,
>
>
>
> I tried the compiled version you sent, with no luck.
>
>
>
> I tried running a Java program (for text extraction) with PDFBox 0.7.3 and
> 0.8 in the classpath separately. With 0.8, I am not able to
> fetch anything. With 0.7.3, however, I could extract all values apart from
> "Year of Form", whose value is garbage (À¾´»), which is why you recommended
> using 0.8.
>
>
>
> Note: the Java programs and my PDF are attached for your reference. The
> names of the Java files are self-explanatory and indicate which version
> they use. The contents of these Java files are exactly the same.
>
>
>
> Please advise.
>
>
>
> Thanks!
>
>
>
> *From:* Daniel Manzke [mailto:[email protected]]
> *Sent:* Monday, December 29, 2008 2:45 PM
>
> *To:* Duseja, Sushil
> *Cc:* [email protected]; Rally, Menka
> *Subject:* Re: Garbage Output
>
>
>
> Just check out the latest source code and run Maven.
>
>
>
> I will send you a compiled version.
>
>
>
>
>
> Bye
>
> 2008/12/29 Duseja, Sushil <[email protected]>
>
> Thanks Daniel.
>
>
>
> Do you mean that I need to fetch the latest source code from the trunk in
> the Subversion repository? If not, how can I get the source code for 0.8?
>
>
>
> I would really appreciate it if you could build me a compiled version. I hope I
> am not bothering you.
>
>
>
> Thanking you in anticipation.
>
>
>
> *From:* Daniel Manzke [mailto:[email protected]]
> *Sent:* Monday, December 29, 2008 1:41 PM
>
>
> *To:* Duseja, Sushil
> *Cc:* [email protected]; Rally, Menka
> *Subject:* Re: Garbage Output
>
>
>
> PDFBox is still in incubation and there is no 0.8 distribution yet. What
> you could do is download the source code and build it on your own. Then
> you could have a look at the code and debug where the garbage is
> produced. Or ask me and I will build you a compiled version.
>
>
>
>
>
> Daniel
>
> 2008/12/29 Duseja, Sushil <[email protected]>
>
> Thanks again for responding.
>
>
>
> Can you please point me to the URL/location from which version 0.8 can be
> downloaded?
>
>
>
> I referred to -
> http://sourceforge.net/project/showfiles.php?group_id=78314; however it
> shows the latest version is 0.7.3.
>
>
>
> Thanks for your time.
>
>
>
> *From:* Daniel Manzke [mailto:[email protected]]
> *Sent:* Monday, December 29, 2008 1:29 PM
> *To:* Duseja, Sushil
> *Cc:* [email protected]; Rally, Menka
> *Subject:* Re: Garbage Output
>
>
>
> Try checking out the latest development build, because 0.7.3 is
> outdated (from 2006). In 0.8 a lot of issues have been fixed.
>
>
>
>
>
> Bye,
>
> daniel
>
> 2008/12/29 Duseja, Sushil <[email protected]>
>
> Hello Daniel,
>
> Thanks for the response.
>
> I am using version 0.7.3.
>
> Thanks!
>
>
> -----Original Message-----
> From: Daniel Manzke [mailto:[email protected]]
> Sent: Friday, December 26, 2008 9:11 PM
> To: [email protected]
> Subject: Re: Garbage Output
>
> Hi,
> standard question. ;) Which version are you using?
>
>
> Daniel
>
> 2008/12/26 Duseja, Sushil <[email protected]>
>
> >  Hello,
> >
> >
> >
> > While extracting text from a pdf file (attached for your kind reference)
> > using PDFBox, I get garbage output (*À¾´»*) for a special text value
> > "*2007*" (please see page 2); I can fetch other values correctly though.
> >
> > Is this an *encoding issue*? If so, can anyone please let me know how to
> > fix it? If possible, please point me to some working examples.
> >
> >
> >
> > Thanks in advance.
> >
>
>
>
> --
> Kind regards
>
> Daniel Manzke
>
>
>
>



-- 
Kind regards

Daniel Manzke
