Sorry, this would be a job for one of the PDFBox developers. So far I have
only been doing some support on the list and don't have much know-how about
it.
So I can only have a look in the evening, and maybe I will find a solution.
;)


Daniel

2008/12/29 Duseja, Sushil <[email protected]>

>  If possible, can you please let us know your contact number to discuss
> this issue?
>
>
>
> Thanks!
>
>
>
> *From:* Daniel Manzke [mailto:[email protected]]
> *Sent:* Monday, December 29, 2008 5:12 PM
> *To:* Duseja, Sushil; [email protected]
> *Cc:* Rally, Menka
> *Subject:* Re: Garbage Output
>
>
>
> Hi,
>
>
>
> I've just added this line:
>
>
>
> //after stripper.extractRegions();
>
> stripper.getText(document);
>
>
>
> After doing this I got some text for the regions. But it seems that this
> text comes from page 1. Have you found an example of how to use the
> Stripper? Maybe someone else can help you, since I don't have
> any knowledge of the Stripper.
>
>
>
> If I have some time in the evening I will give it another test.
>
>
>
>
>
> Bye,
>
> Daniel
>
> 2008/12/29 Duseja, Sushil <[email protected]>
>
> Hello Daniel,
>
>
>
> I tried the compiled version you sent, with no luck.
>
>
>
> I tried running a Java program (for text extraction) with PDFBox 0.7.3 and
> 0.8 in the classpath separately. With 0.8, I am not able to
> fetch anything. However, with 0.7.3, I could extract all values apart from
> "Year of Form", whose value is garbage (À¾´»), which is why you recommended
> using 0.8.
>
>
>
> Note: the Java programs and my PDF are attached for your reference. The
> names of the Java files are self-explanatory and indicate which version
> they use. The contents of these Java files are exactly the same.
>
>
>
> Please advise.
>
>
>
> Thanks!
>
>
>
> *From:* Daniel Manzke [mailto:[email protected]]
> *Sent:* Monday, December 29, 2008 2:45 PM
>
>
> *To:* Duseja, Sushil
> *Cc:* [email protected]; Rally, Menka
> *Subject:* Re: Garbage Output
>
>
>
> Just check out the latest source code and run Maven.
>
>
>
> I will send you a compiled version.
>
>
>
>
>
> Bye
>
> 2008/12/29 Duseja, Sushil <[email protected]>
>
> Thanks Daniel.
>
>
>
> Do you mean that I need to fetch the latest source code from the trunk in
> the Subversion repository? If not, how can I get the source code for 0.8?
>
>
>
> I would really appreciate it if you could build me a compiled version. I hope I
> am not bothering you.
>
>
>
> Thanking you in anticipation.
>
>
>
> *From:* Daniel Manzke [mailto:[email protected]]
> *Sent:* Monday, December 29, 2008 1:41 PM
>
>
> *To:* Duseja, Sushil
> *Cc:* [email protected]; Rally, Menka
> *Subject:* Re: Garbage Output
>
>
>
> PDFBox is still in incubation and there is no 0.8 distribution yet. What
> you could do is download the source code and build it on your own. Then
> you could have a look at the code and debug where the garbage is
> produced. Or ask me and I will build you a compiled version.
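
As a starting point for that kind of debugging, it can help to look at the
exact code points of the extracted text instead of the rendered glyphs. The
following is only a minimal sketch using the Java standard library; the class
name CodePointDump and the method describe are made up for illustration and
are not part of PDFBox. The "extracted" string is the garbage value reported
in this thread, hard-coded here; in a real test it would come from the text
extraction call.

```java
// Hypothetical helper (not part of PDFBox): prints the Unicode code points
// of a string so that expected and extracted text can be compared exactly.
class CodePointDump {

    /** Formats each code point of the text as U+XXXX, separated by spaces. */
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // The value that is visible in the PDF.
        String expected = "2007";
        // The garbage reported in this thread (À¾´»), written with escapes
        // so the source file encoding does not matter.
        String extracted = "\u00C0\u00BE\u00B4\u00BB";

        System.out.println("expected : " + describe(expected));
        System.out.println("extracted: " + describe(extracted));
    }
}
```

In this dump the two "0" digits come out as two different characters (U+00BE
and U+00B4), so the garbage does not look like a simple one-to-one character
substitution. What actually causes it cannot be told from the dump alone.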
>
>
>
>
>
> Daniel
>
> 2008/12/29 Duseja, Sushil <[email protected]>
>
> Thanks again for responding.
>
>
>
> Can you please point me to the URL/location from which the 0.8 version can be
> downloaded?
>
>
>
> I looked at
> http://sourceforge.net/project/showfiles.php?group_id=78314; however, it
> shows 0.7.3 as the latest version.
>
>
>
> Thanks for your time.
>
>
>
> *From:* Daniel Manzke [mailto:[email protected]]
> *Sent:* Monday, December 29, 2008 1:29 PM
> *To:* Duseja, Sushil
> *Cc:* [email protected]; Rally, Menka
> *Subject:* Re: Garbage Output
>
>
>
> Try checking out the latest development build, since 0.7.3 is
> outdated (from 2006). In 0.8 a lot of issues have been fixed.
>
>
>
>
>
> Bye,
>
> daniel
>
> 2008/12/29 Duseja, Sushil <[email protected]>
>
> Hello Daniel,
>
> Thanks for the response.
>
> I am using version 0.7.3.
>
> Thanks!
>
>
> -----Original Message-----
> From: Daniel Manzke [mailto:[email protected]]
> Sent: Friday, December 26, 2008 9:11 PM
> To: [email protected]
> Subject: Re: Garbage Output
>
> Hi,
> standard question. ;) Which version are you using?
>
>
> Daniel
>
> 2008/12/26 Duseja, Sushil <[email protected]>
>
> >  Hello,
> >
> >
> >
> > While extracting text from a pdf file (attached for your kind reference)
> > using PDFBox, I get garbage output (*À¾´»*) for a special text value
> > "*2007*" (please see page 2); I can fetch other values correctly though.
> >
> > Is this an *encoding issue*? If so, can anyone please let me know how to
> > fix it? If possible, please point me to some working examples.
> >
> >
> >
> > Thanks in advance.
> >
>
>
>
> --
> Kind regards,
>
> Daniel Manzke



-- 
Kind regards,

Daniel Manzke
