Re: Type1Glyph2D No glyph for 41 (.notdef) in font Helvetica

2015-02-02 Thread Andreas Lüdtke
Hi John,
 
I tried pdfbox-app-2.0.0-20150203.010142-1018.jar this morning and I still get 
the Exception in thread "main" java.lang.StringIndexOutOfBoundsException: 
String index out of range: 0.
 
Can I help you fix this?
 
Andreas
 

Sent: Saturday, 31 January 2015 at 20:25
From: "John Hewson" 
To: users@pdfbox.apache.org, "Andreas Lüdtke" 
Subject: Re: Type1Glyph2D No glyph for 41 (.notdef) in font Helvetica
Thanks Andreas. I took a look at the arial.ttf file which you sent me and the 
problem was as I had suspected - Microsoft have changed the tables in the 
Windows 8.1 version of the font.

PDFBox relies on the PostScript glyph names in the ‘post’ table when 
substituting a TTF in place of a Type 1 font. However, the new Windows 8.1 
version of Arial uses a format 3 ‘post’ table which does not include any names. 
That means that every glyph lookup fails.

The solution is to extend FontBox's TrueTypeFont class to allow PostScript 
names to be looked up by mapping them to entries in the ‘cmap’ table. I’ve 
opened PDFBOX-2650 to address this.
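
The idea of resolving a PostScript glyph name through the ‘cmap’ table can be sketched roughly as follows. This is an illustration only, not the FontBox implementation: the class GlyphNameLookup, the three-entry name table and the Map standing in for a parsed ‘cmap’ subtable are all hypothetical. The only outside convention it relies on is the "uniXXXX" glyph naming scheme.

```java
import java.util.HashMap;
import java.util.Map;

public class GlyphNameLookup {

    // Hypothetical stand-in for a name-to-Unicode list (only three entries here).
    private static final Map<String, Integer> NAME_TO_UNICODE = new HashMap<>();
    static {
        NAME_TO_UNICODE.put("A", 0x0041);
        NAME_TO_UNICODE.put("M", 0x004D);          // character code 77
        NAME_TO_UNICODE.put("parenright", 0x0029); // character code 41
    }

    // Hypothetical stand-in for a Unicode 'cmap' subtable: code point -> glyph ID.
    private final Map<Integer, Integer> cmap;

    public GlyphNameLookup(Map<Integer, Integer> cmap) {
        this.cmap = cmap;
    }

    /** Returns the glyph ID for a PostScript name, or 0 (.notdef) if it cannot be resolved. */
    public int nameToGid(String name) {
        Integer codePoint = NAME_TO_UNICODE.get(name);
        // Fall back to names of the form "uniXXXX" (four upper-case hex digits).
        if (codePoint == null && name.matches("uni[0-9A-F]{4}")) {
            codePoint = Integer.parseInt(name.substring(3), 16);
        }
        if (codePoint == null) {
            return 0;
        }
        Integer gid = cmap.get(codePoint);
        return gid == null ? 0 : gid;
    }
}
```

With a lookup like this, a font whose ‘post’ table carries no names can still answer "which glyph is parenright?" as long as its ‘cmap’ covers the corresponding code point.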

Thanks for taking the time to report this and provide the necessary details.

-- John

> On 31 Jan 2015, at 05:45, Andreas Lüdtke  wrote:
>
> John,
>
> I forgot that I have a Windows 8.1 tablet at home: so here is the output of 
> pdfbox-app. I copied only the first lines since they are basically the same:
>
> D:\__test>java -jar pdfbox-app-2.0.0-20150129.180809-996.jar PDFToImage 
> rg-1234567890BA.pdf
> Jan 31, 2015 2:34:34 PM org.apache.pdfbox.rendering.font.Type1Glyph2D 
> getPathForCharacterCode
> WARNUNG: No glyph for 77 (.notdef) in font Helvetica
> Jan 31, 2015 2:34:34 PM org.apache.pdfbox.rendering.font.Type1Glyph2D 
> getPathForCharacterCode
> WARNUNG: No glyph for 97 (.notdef) in font Helvetica
> Jan 31, 2015 2:34:34 PM org.apache.pdfbox.rendering.font.Type1Glyph2D 
> getPathForCharacterCode
> WARNUNG: No glyph for 114 (.notdef) in font Helvetica
> Jan 31, 2015 2:34:34 PM org.apache.pdfbox.rendering.font.Type1Glyph2D 
> getPathForCharacterCode
> WARNUNG: No glyph for 107 (.notdef) in font Helvetica
> Jan 31, 2015 2:34:34 PM org.apache.pdfbox.rendering.font.Type1Glyph2D 
> getPathForCharacterCode
> WARNUNG: No glyph for 117 (.notdef) in font Helvetica
>
> I'll send you the font off-list.
>
> Best regards
>
> Andreas
>
>
> -- Original message --
> From: "John Hewson" <j...@jahewson.com>
> To: users@pdfbox.apache.org 
> Sent: 30.01.2015 21:27:43
> Subject: Re: Type1Glyph2D No glyph for 41 (.notdef) in font Helvetica
>
>> Your list of fonts looks normal, Helvetica gets mapped to ArialMT on 
>> Windows. I wonder if the Arial font has changed on Windows 8.1 in a way 
>> which is causing PDFBox to parse it incorrectly? If you send me 
>> C:\Windows\FONTS\arial.ttf off-list, I can take a look at it.
>>
>> When you run pdfbox-app do you see any other font-related messages in the 
>> log?
>>
>> -- John
>>
>>> On 29 Jan 2015, at 23:05, Andreas Lüdtke  wrote:
>>>
>>> John,
>>>
>>> below you can find the output from a Windows 8.1 machine. When I run the 
>>> test on a Windows 7 machine, I also have no problems with the generated 
>>> image.
>>> If you want me to run other tests, please let me know.
>>>
>>> Best regards
>>>
>>> Andreas
>>>
>>> output from DumpFonts on Windows 8.1 Enterprise 64bit:
>>> ---
>>> TTF: UtsaahItalic: C:\Windows\FONTS\utsaahi.ttf
>>> TTF: LeelawadeeUIBold: C:\Windows\FONTS\LeelaUIb.ttf
>>> TTF: GeorgiaItalic: C:\Windows\FONTS\georgiai.ttf
>>> TTF: DilleniaUPCItalic: C:\Windows\FONTS\upcdi.ttf
>>> TTF: Vrinda: C:\Windows\FONTS\vrinda.ttf
>>> TTF: IskoolaPotaBold: C:\Windows\FONTS\iskpotab.ttf
>>> TTF: JavaneseText: C:\Windows\FONTS\javatext.ttf
>>> TTF: Bauhaus93: C:\Windows\FONTS\BAUHS93.TTF
>>> TTF: BookAntiqua-Italic: C:\Windows\FONTS\ANTQUAI.TTF
>>> TTF: BookAntiquaItalic: C:\Windows\FONTS\ANTQUAI.TTF
>>> TTF: LucidaBright-Demi: C:\Windows\FONTS\LBRITED.TTF
>>> TTF: UrduTypesetting-Bold: C:\Windows\FONTS\UrdTypeb.ttf
>>> TTF: TraditionalArabicBold: C:\Windows\FONTS\tradbdo.ttf
>>> TTF: YuMinchoDemibold: C:\Windows\FONTS\yumindb.ttf
>>> TTF: Corbel-Italic: C:\Windows\FONTS\corbeli.ttf
>>> TTF: NiagaraSolidReg: C:\Windows\FONTS\NIAGSOL.TTF
>>> TTF: SegoeUI-LightItalic: C:\Windows\FONTS\seguili.ttf
>>> TTF: EucrosiaUPCItalic: C:\Windows\FONTS\upcei.ttf
>>> TTF: Tahoma: C:\Windows\FONTS\tahoma.ttf
>>> TTF: CenturyGothic-Italic: C:\Windows\FONTS\GOTHICI.TTF
>>> TTF: Mangal-Bold: C:\Windows\FONTS\mangalb.ttf
>>> TTF: Aparajita: C:\Windows\FONTS\aparaj.ttf
>>> TTF: ArialBoldItalicMT: C:\Windows\FONTS\arialbi.ttf
>>> TTF: LucidaFax-DemiItalic: C:\Windows\FONTS\LFAXDI.TTF
>>> TTF: GaramondItalic: C:\Windows\FONTS\GARAIT.TTF
>>> TTF: Modern-Regular: C:\Windows\FONTS\MOD20.TTF
>>> TTF: NiagaraSolid-Reg: C:\Windows\FONTS\NIAGSOL.TTF
>>> TTF: CourierNewPSBoldMT: C:\Windows\FONTS\courbd.ttf
>>> TTF: SegoeUI: C:\Windows\FONTS\segoeui.ttf
>>> TTF: Ahar

Re: PDF extraction

2015-02-02 Thread Frank van der Hulst
I agree with everything Peter has said.

My 'solutions' work for the tables I wanted to extract, but won't work for
others.

I think it's common for a table to have at least one column which is present
in every row. I use that to break the table up into rows: when the X
position goes back to the left of this column, it is (probably) a new row.
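
A rough sketch of that row-splitting heuristic, under stated assumptions: the text fragments arrive in reading order, each with the x coordinate of its left edge, and the always-present column is the leftmost one. The names RowSplitter, Fragment, keyColumnX and tolerance are made up for illustration and are not taken from my classes.

```java
import java.util.ArrayList;
import java.util.List;

public class RowSplitter {

    /** Hypothetical holder for one text fragment and the x coordinate of its left edge. */
    public static final class Fragment {
        public final String text;
        public final float x;

        public Fragment(String text, float x) {
            this.text = text;
            this.x = x;
        }
    }

    /**
     * Splits fragments (in reading order) into rows. A new row starts when x
     * moves backwards and lands at or left of the key column's left edge.
     */
    public static List<List<Fragment>> splitRows(List<Fragment> fragments,
                                                 float keyColumnX, float tolerance) {
        List<List<Fragment>> rows = new ArrayList<>();
        List<Fragment> current = null;
        float previousX = Float.NEGATIVE_INFINITY;
        for (Fragment f : fragments) {
            boolean jumpedBack = f.x < previousX && f.x <= keyColumnX + tolerance;
            if (current == null || jumpedBack) {
                current = new ArrayList<>();
                rows.add(current);
            }
            current.add(f);
            previousX = f.x;
        }
        return rows;
    }
}
```

A fragment that moves left but stays to the right of the key column (for example a second line of text inside a later cell) remains in the current row, which is why the check uses the key column rather than any backwards movement.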

I have also used the header row of the table to identify the column limits.
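
As an illustration of the header-row idea, the sketch below assigns a fragment to a column from the left edges of the header cells. It assumes left-aligned columns and header edges sorted in ascending order; right-aligned numeric columns would need the right edges instead. ColumnLocator and headerLefts are hypothetical names, not part of PDFBox or of my classes.

```java
public class ColumnLocator {

    /**
     * Returns the index of the column containing x, given the left edge of
     * each header cell in ascending order. Anything left of the first header
     * is assigned to column 0.
     */
    public static int columnOf(float[] headerLefts, float x) {
        int column = 0;
        for (int i = 0; i < headerLefts.length; i++) {
            if (x >= headerLefts[i]) {
                column = i;
            }
        }
        return column;
    }
}
```

For headers starting at x = 10, 100 and 200, a fragment at x = 105 would land in column 1.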

Incidentally, it's also a good idea to strip out the page headers & footers
before trying to parse a table.

I'm happy for my work to be included in PDFBox as a starting point for
other people trying to extract tables.

Frank

On Tue, Feb 3, 2015 at 8:48 AM, Peter Murray-Rust  wrote:

> I agree with all those who emphasise that there is no deterministic
> algorithm. I also agree that Tabula is likely to be the best place to start
> and am working with them.
>
> The first question is:
>
> "How do you know where the tables are?"
>
> In some cases you can look for the Anglophone word "Table", and a regex of
> something like:
> - "Tab(le)?\s*((\d+)|([IVXL]+)) "
> or you can look for
>  - grid lines
> or you can look for whitespace patterns:
>
> Is      this
> a       table
>
> or just fortuitous.
>
> and some tables use zebra stripes.
>
> I suspect at least 100 person years (and probably much more) have been
> spent on trying to extract tables. If we take the heuristic approach then
> it's worth pooling our efforts and trying to share code. I'm sharing mine
> on:
> https://bitbucket.org/petermr/svg2xml/wiki/Home (which is built on PDFBox
> and https://bitbucket.org/petermr/pdf2svg/wiki/Home).
>
> Other people have built systems that use adaptive methods to decide where
> the whitespace is.
>
> I'd recommend splitting the PDF2Character part (I use SVG for the modelling
> syntax) and characters2tables as it means we can use more character
> extractors and combine them with the table synthesizers.
>
> P.
>
>
>
>
>
> On Mon, Feb 2, 2015 at 6:56 PM, Frank van der Hulst
> <drifter.fr...@gmail.com> wrote:
>
> > I have written a couple of Java classes that extract tabular data to
> > arrays of Strings.
> >
> > One works where the location of each column is fixed. The other figures
> > out the locations of columns from the table headers and outline drawing.
> >
> > The usual story applies... hardly any documentation, and they only work
> > for limited cases. I've sent the code to Lorena... I'd be grateful if you
> > could improve the documentation.
> >
> > NB: I'll be out of reach of my computer (and therefore my source code)
> > for the next few days, but will probably still be able to answer emails.
> >
> > Frank
> >
> >
> > On Tue, Feb 3, 2015 at 7:07 AM, Tilman Hausherr 
> > wrote:
> >
> > > Hi Lorena,
> > >
> > > There is no concept of table in a PDF, except in a tagged PDF.
> > >
> > > A table is just lines and text. In no specific order. It could also be
> > > an image of a table.
> > >
> > > You can succeed in this only if you know the structure of the PDF in
> > > advance, e.g. when it all comes from the same client.
> > >
> > >
> > > https://stackoverflow.com/questions/23495372/extract-table-data-from-pdf
> > > https://stackoverflow.com/questions/17591426/extract-table-from-a-pdf
> > > https://stackoverflow.com/questions/17217194/extracting-table-contents-from-a-collection-of-pdf-files
> > > https://stackoverflow.com/questions/3424588/programmatically-extract-pdf-tables
> > >
> > > Tilman
> > >
> > >
> > > On 02.02.2015 at 16:29, Lorena Leishman wrote:
> > >
> > >> Hi,
> > >> I have a PDF that has information displayed on tables. Example:
> > >>
> > >> Company Name:   Barnes & Noble   Bank Of America   Macy's
> > >> Account #:      123x             345               679
> > >> Status:         Open             Closed            Open
> > >> Balance:        $23.             $0.00             $100
> > >>
> > >> Is there a way with PDFbox to extract a specific value(s) from the
> > >> table?
> > >> Example: Bank Of America  and $0.00
> > >> And also is there a way to cut the whole table and paste it into a
> > >> different PDF?
> > >> Please let me know, Thanks!
> > >> Lorena
> > >>
> > >
> > >
> > > -
> > > To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> > > For additional commands, e-mail: users-h...@pdfbox.apache.org
> > >
> > >
> >
>
>
>
> --
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069
>


Re: [PDFBOX-2.0] Signature Issue

2015-02-02 Thread Isaias Barroso
Thank you,

After testing I'll give feedback.

BR

On Mon, Feb 2, 2015 at 6:05 PM, Andreas Lehmkuehler 
wrote:

> Hi,
>
> On 02.02.2015 at 20:24, Isaias Barroso wrote:
>
>> Hi Andreas,
>>
>> Does today's SNAPSHOT (pdfbox-2.0.0-20150202.110005-1034) already
>> contain the fixed code?
>>
> I'm afraid not. You have to wait for the next successful build.
>
> BR
> Andreas Lehmkühler
>
>
>  BR
>>
>> On Mon, Feb 2, 2015 at 5:12 PM, Andreas Lehmkuehler 
>> wrote:
>>
>>  Hi,
>>>
>>>
>>> On 29.01.2015 at 16:10, Isaias Barroso wrote:
>>>
>>>  Hi Ruben,

 I think it isn't the same problem, because the file is correctly signed
 using PDFBOX 1.8.8 and BouncyCastle 1.45.

  I guess the problem was a missing trailer. I've fixed that in the
>>> trunk,
>>> see [1] for further details.
>>>
>>> Please, double check if everything is fine now.
>>>
>>> BR
>>> Andreas Lehmkühler
>>>
>>> [1] https://issues.apache.org/jira/browse/PDFBOX-2656
>>>
>>>
>>>  Best regards


 On Thu, Jan 29, 2015 at 12:05 PM, Ruben Lagar 
 wrote:

   Hi Isaias,

>
> I had a similar problem, and I think it is related to the problem
> described
> in this Jira
>
> https://issues.apache.org/jira/browse/PDFBOX-1822
>
> There is no fix yet, as far as I know.
>
>
> On Thu Jan 29 2015 at 1:39:40 PM, Isaias Barroso (isaias.barr...@gmail.com)
> wrote:
>
>   Hi Andreas,
>
>>
>> I got the updated SNAPSHOT (pdfbox-2.0.0-20150129.080600-1013.jar)
>> and
>> used the sign_me.pdf, keystore.p12 provided on test case. Follow the
>>
>>  result
>
>  file, now Adobe Reader says that the signature is invalid and when I
>>
>>  close
>
>  the save message appears.
>> I've tried using the CreateSignature.class of
>> pdfbox-examples-2.0.0-20150129.080737-985.jar SNAPSHOT too.
>>
>> BouncyCastle 1.51 are being used.
>>
>> Best regards
>>
>>
>> On Thu, Jan 29, 2015 at 9:53 AM, Andreas Lehmkühler > >
>> wrote:
>>
>>   Hi,
>>
>>>
>>>
>>> On 28 January 2015 at 12:35, Isaias Barroso wrote:


 Hi all,

 I'm trying the PDFBOX 2 SNAPSHOT and I have a issue with Signature,

  the
>>>
>>
>  file is processed and the size are increased but when I open the file
>>
>>>
  on
>>>
>>
>  Adobe Reader the signature information aren't showed. When I close the
>>
>>>
  an
>>>
>>>  information that the document was modified appears, so I'm thinking

  that
>>>
>>
>  process wasn't completed correctly, although none exception are thrown
>>
>>>
 To make the tests, I've used a pdfbox-examples snapshot
 (org.apache.pdfbox.examples.signature.CreateSignature)


https://repository.apache.org/content/groups/snapshots/org/
>>>
>> apache/pdfbox/pdfbox-examples/2.0.0-SNAPSHOT/
>
>  What exact SNAPSHOT version did you use as there were recently some
>>
>>> changes.
>>>
>>>   Do you have any suggestion to investigate the root cause?
>>>

  What exactly did you do to sign the pdf? Did you have a look at the
>>> provided
>>> testcase [1], which demonstrates all necessary steps to sign a pdf.
>>>
>>>   Best regards
>>>

 --
 Isaías Barroso
 Belo Horizonte - MG


>>> BR
>>> Andreas Lehmkühler
>>>
>>> [1]
>>>
>>>
>>>   http://svn.apache.org/viewvc/pdfbox/trunk/examples/src/
>>>
>> test/java/org/apache/pdfbox/examples/pdmodel/
> TestCreateSignature.java?view=markup
>
>
>>
>>>
>>
>> --
>> Isaías Barroso
>> Belo Horizonte - MG
>>


-- 
Isaías Barroso
Belo Horizonte - MG


Re: [PDFBOX-2.0] Signature Issue

2015-02-02 Thread Andreas Lehmkuehler

Hi,

On 02.02.2015 at 20:24, Isaias Barroso wrote:
> Hi Andreas,
>
> Does today's SNAPSHOT (pdfbox-2.0.0-20150202.110005-1034) already contain
> the fixed code?

I'm afraid not. You have to wait for the next successful build.

BR
Andreas Lehmkühler


BR

On Mon, Feb 2, 2015 at 5:12 PM, Andreas Lehmkuehler 
wrote:


Hi,


On 29.01.2015 at 16:10, Isaias Barroso wrote:


Hi Ruben,

I think it isn't the same problem, because the file is correctly signed
using PDFBOX 1.8.8 and BouncyCastle 1.45.


I guess the problem was a missing trailer. I've fixed that in the trunk,
see [1] for further details.

Please, double check if everything is fine now.

BR
Andreas Lehmkühler

[1] https://issues.apache.org/jira/browse/PDFBOX-2656



Best regards


On Thu, Jan 29, 2015 at 12:05 PM, Ruben Lagar 
wrote:

  Hi Isaias,


I had a similar problem, and I think it is related to the problem
described
in this Jira

https://issues.apache.org/jira/browse/PDFBOX-1822

There is no fix yet, as far as I know.


On Thu Jan 29 2015 at 1:39:40 PM, Isaias Barroso (isaias.barr...@gmail.com)
wrote:

  Hi Andreas,


I got the updated SNAPSHOT (pdfbox-2.0.0-20150129.080600-1013.jar) and
used the sign_me.pdf, keystore.p12 provided on test case. Follow the


result


file, now Adobe Reader says that the signature is invalid and when I


close


the save message appears.
I've tried using the CreateSignature.class of
pdfbox-examples-2.0.0-20150129.080737-985.jar SNAPSHOT too.

BouncyCastle 1.51 are being used.

Best regards


On Thu, Jan 29, 2015 at 9:53 AM, Andreas Lehmkühler 
wrote:

  Hi,



On 28 January 2015 at 12:35, Isaias Barroso wrote:


Hi all,

I'm trying the PDFBOX 2 SNAPSHOT and I have a issue with Signature,


the



file is processed and the size are increased but when I open the file



on



Adobe Reader the signature information aren't showed. When I close the



an


information that the document was modified appears, so I'm thinking


that



process wasn't completed correctly, although none exception are thrown


To make the tests, I've used a pdfbox-examples snapshot
(org.apache.pdfbox.examples.signature.CreateSignature)



  https://repository.apache.org/content/groups/snapshots/org/

apache/pdfbox/pdfbox-examples/2.0.0-SNAPSHOT/


What exact SNAPSHOT version did you use as there were recently some

changes.

  Do you have any suggestion to investigate the root cause?



What exactly did you do to sign the pdf? Did you have a look at the
provided
testcase [1], which demonstrates all necessary steps to sign a pdf.

  Best regards


--
Isaías Barroso
Belo Horizonte - MG



BR
Andreas Lehmkühler

[1]


  http://svn.apache.org/viewvc/pdfbox/trunk/examples/src/

test/java/org/apache/pdfbox/examples/pdmodel/
TestCreateSignature.java?view=markup








--
Isaías Barroso
Belo Horizonte - MG




Re: PDF extraction

2015-02-02 Thread Peter Murray-Rust
I agree with all those who emphasise that there is no deterministic
algorithm. I also agree that Tabula is likely to be the best place to start
and am working with them.

The first question is:

"How do you know where the tables are?"

In some cases you can look for the Anglophone word "Table", and a regex of
something like:
- "Tab(le)?\s*((\d+)|([IVXL]+)) "
or you can look for
 - grid lines
or you can look for whitespace patterns:

Is      this
a       table

or just fortuitous.

and some tables use zebra stripes.
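
The caption regex can be tried out in Java roughly as below. This is only a sketch: the class name TableCaptionFinder is made up, and the pattern is a slightly tightened variant of the one above (character class for Roman numerals, word boundaries, optional full stop after "Tab"), so it is an assumption about what a useful pattern looks like rather than a tested recipe for real documents.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TableCaptionFinder {

    // "Table 3", "Tab. IV", "Tab5": the number is Arabic or a Roman numeral up to L.
    private static final Pattern CAPTION =
            Pattern.compile("\\bTab(?:le)?\\.?\\s*(\\d+|[IVXL]+)\\b");

    /** Returns the table number found in the line, or null if there is none. */
    public static String captionNumber(String line) {
        Matcher m = CAPTION.matcher(line);
        return m.find() ? m.group(1) : null;
    }
}
```

The trailing word boundary keeps ordinary words such as "Tables" or "Table Imports" from being mistaken for captions, but a heuristic like this will still miss captions in other languages or layouts.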

I suspect at least 100 person years (and probably much more) have been
spent on trying to extract tables. If we take the heuristic approach then
it's worth pooling our efforts and trying to share code. I'm sharing mine on:
https://bitbucket.org/petermr/svg2xml/wiki/Home (which is built on PDFBox
and https://bitbucket.org/petermr/pdf2svg/wiki/Home).

Other people have built systems that use adaptive methods to decide where
the whitespace is.

I'd recommend splitting the PDF2Character part (I use SVG for the modelling
syntax) and characters2tables as it means we can use more character
extractors and combine them with the table synthesizers.

P.





On Mon, Feb 2, 2015 at 6:56 PM, Frank van der Hulst  wrote:

> I have written a couple of Java classes that extract tabular data to arrays
> of Strings.
>
> One works where the location of each column is fixed. The other figures out
> the locations of columns from the table headers and outline drawing.
>
> The usual story applies... hardly any documentation, and they only work for
> limited cases. I've sent the code to Lorena... I'd be grateful if you could
> improve the documentation.
>
> NB: I'll be out of reach of my computer (and therefore my source code) for
> the next few days, but will probably still be able to answer emails.
>
> Frank
>
>
> On Tue, Feb 3, 2015 at 7:07 AM, Tilman Hausherr 
> wrote:
>
> > Hi Lorena,
> >
> > There is no concept of table in a PDF, except in a tagged PDF.
> >
> > A table is just lines and text. In no specific order. It could also be an
> > image of a table.
> >
> > You can succeed in this only if you know the structure of the PDF in
> > advance, e.g. when it all comes from the same client.
> >
> > https://stackoverflow.com/questions/23495372/extract-table-data-from-pdf
> > https://stackoverflow.com/questions/17591426/extract-table-from-a-pdf
> > https://stackoverflow.com/questions/17217194/extracting-table-contents-from-a-collection-of-pdf-files
> > https://stackoverflow.com/questions/3424588/programmatically-extract-pdf-tables
> >
> > Tilman
> >
> >
> > On 02.02.2015 at 16:29, Lorena Leishman wrote:
> >
> >> Hi,
> >> I have a PDF that has information displayed on tables. Example:
> >>
> >> Company Name:   Barnes & Noble   Bank Of America   Macy's
> >> Account #:      123x             345               679
> >> Status:         Open             Closed            Open
> >> Balance:        $23.             $0.00             $100
> >>
> >> Is there a way with PDFbox to extract a specific value(s) from the
> >> table?
> >> Example: Bank Of America  and $0.00
> >> And also is there a way to cut the whole table and paste it into a
> >> different PDF?
> >> Please let me know, Thanks!
> >> Lorena
> >>
> >
> >
> >
> >
>



-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069


Re: [PDFBOX-2.0] Signature Issue

2015-02-02 Thread Isaias Barroso
Hi Andreas,

Does today's SNAPSHOT (pdfbox-2.0.0-20150202.110005-1034) already contain
the fixed code?

BR

On Mon, Feb 2, 2015 at 5:12 PM, Andreas Lehmkuehler 
wrote:

> Hi,
>
>
> On 29.01.2015 at 16:10, Isaias Barroso wrote:
>
>> Hi Ruben,
>>
>> I think it isn't the same problem, because the file is correctly signed
>> using PDFBOX 1.8.8 and BouncyCastle 1.45.
>>
> I guess the problem was a missing trailer. I've fixed that in the trunk,
> see [1] for further details.
>
> Please, double check if everything is fine now.
>
> BR
> Andreas Lehmkühler
>
> [1] https://issues.apache.org/jira/browse/PDFBOX-2656
>
>
>> Best regards
>>
>>
>> On Thu, Jan 29, 2015 at 12:05 PM, Ruben Lagar 
>> wrote:
>>
>>  Hi Isaias,
>>>
>>> I had a similar problem, and I think it is related to the problem
>>> described
>>> in this Jira
>>>
>>> https://issues.apache.org/jira/browse/PDFBOX-1822
>>>
>>> There is no fix yet, as far as I know.
>>>
>>>
>>> On Thu Jan 29 2015 at 1:39:40 PM, Isaias Barroso (isaias.barr...@gmail.com)
>>> wrote:
>>>
>>>  Hi Andreas,

 I got the updated SNAPSHOT (pdfbox-2.0.0-20150129.080600-1013.jar) and
 used the sign_me.pdf, keystore.p12 provided on test case. Follow the

>>> result
>>>
 file, now Adobe Reader says that the signature is invalid and when I

>>> close
>>>
 the save message appears.
 I've tried using the CreateSignature.class of
 pdfbox-examples-2.0.0-20150129.080737-985.jar SNAPSHOT too.

 BouncyCastle 1.51 are being used.

 Best regards


 On Thu, Jan 29, 2015 at 9:53 AM, Andreas Lehmkühler 
 wrote:

  Hi,
>
>
>> On 28 January 2015 at 12:35, Isaias Barroso wrote:
>>
>>
>> Hi all,
>>
>> I'm trying the PDFBOX 2 SNAPSHOT and I have a issue with Signature,
>>
> the
>>>
 file is processed and the size are increased but when I open the file
>>
> on
>>>
 Adobe Reader the signature information aren't showed. When I close the
>>
> an
>
>> information that the document was modified appears, so I'm thinking
>>
> that
>>>
 process wasn't completed correctly, although none exception are thrown
>>
>> To make the tests, I've used a pdfbox-examples snapshot
>> (org.apache.pdfbox.examples.signature.CreateSignature)
>>
>>
>  https://repository.apache.org/content/groups/snapshots/org/
>>> apache/pdfbox/pdfbox-examples/2.0.0-SNAPSHOT/
>>>
 What exact SNAPSHOT version did you use as there were recently some
> changes.
>
>  Do you have any suggestion to investigate the root cause?
>>
> What exactly did you do to sign the pdf? Did you have a look at the
> provided
> testcase [1], which demonstrates all necessary steps to sign a pdf.
>
>  Best regards
>>
>> --
>> Isaías Barroso
>> Belo Horizonte - MG
>>
>
> BR
> Andreas Lehmkühler
>
> [1]
>
>
>  http://svn.apache.org/viewvc/pdfbox/trunk/examples/src/
>>> test/java/org/apache/pdfbox/examples/pdmodel/
>>> TestCreateSignature.java?view=markup
>>>

>


 --
 Isaías Barroso
 Belo Horizonte - MG

>
>


-- 
Isaías Barroso
Belo Horizonte - MG


Re: [PDFBOX-2.0] Signature Issue

2015-02-02 Thread Andreas Lehmkuehler

Hi,

On 29.01.2015 at 16:10, Isaias Barroso wrote:
> Hi Ruben,
>
> I think it isn't the same problem, because the file is correctly signed
> using PDFBOX 1.8.8 and BouncyCastle 1.45.

I guess the problem was a missing trailer. I've fixed that in the trunk, see [1]
for further details.

Please double-check if everything is fine now.

BR
Andreas Lehmkühler

[1] https://issues.apache.org/jira/browse/PDFBOX-2656

> Best regards
>
> On Thu, Jan 29, 2015 at 12:05 PM, Ruben Lagar wrote:
>
>> Hi Isaias,
>>
>> I had a similar problem, and I think it is related to the problem described
>> in this Jira
>>
>> https://issues.apache.org/jira/browse/PDFBOX-1822
>>
>> There is no fix yet, as far as I know.
>>
>> On Thu Jan 29 2015 at 1:39:40 PM, Isaias Barroso (isaias.barr...@gmail.com)
>> wrote:
>>
>>> Hi Andreas,
>>>
>>> I got the updated SNAPSHOT (pdfbox-2.0.0-20150129.080600-1013.jar) and
>>> used the sign_me.pdf and keystore.p12 provided in the test case. The
>>> result file follows. Adobe Reader now says that the signature is invalid,
>>> and when I close it the save message appears.
>>> I've also tried using the CreateSignature class of the
>>> pdfbox-examples-2.0.0-20150129.080737-985.jar SNAPSHOT.
>>>
>>> BouncyCastle 1.51 is being used.
>>>
>>> Best regards
>>>
>>> On Thu, Jan 29, 2015 at 9:53 AM, Andreas Lehmkühler wrote:
>>>
>>>> Hi,
>>>>
>>>> On 28 January 2015 at 12:35, Isaias Barroso wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm trying the PDFBOX 2 SNAPSHOT and I have an issue with signatures:
>>>>> the file is processed and its size increases, but when I open the file
>>>>> in Adobe Reader the signature information isn't shown. When I close
>>>>> it, a message that the document was modified appears, so I think the
>>>>> process wasn't completed correctly, although no exception is thrown.
>>>>>
>>>>> To run the tests, I've used a pdfbox-examples snapshot
>>>>> (org.apache.pdfbox.examples.signature.CreateSignature)
>>>>>
>>>>> https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-examples/2.0.0-SNAPSHOT/
>>>>
>>>> What exact SNAPSHOT version did you use? There were some changes
>>>> recently.
>>>>
>>>>> Do you have any suggestions for investigating the root cause?
>>>>
>>>> What exactly did you do to sign the PDF? Did you have a look at the
>>>> provided test case [1], which demonstrates all the steps necessary to
>>>> sign a PDF?
>>>>
>>>>> Best regards
>>>>>
>>>>> --
>>>>> Isaías Barroso
>>>>> Belo Horizonte - MG
>>>>
>>>> BR
>>>> Andreas Lehmkühler
>>>>
>>>> [1]
>>>> http://svn.apache.org/viewvc/pdfbox/trunk/examples/src/test/java/org/apache/pdfbox/examples/pdmodel/TestCreateSignature.java?view=markup
>>>
>>> --
>>> Isaías Barroso
>>> Belo Horizonte - MG




Re: PDF extraction

2015-02-02 Thread Frank van der Hulst
I have written a couple of Java classes that extract tabular data to arrays
of Strings.

One works where the location of each column is fixed. The other figures out
the locations of columns from the table headers and outline drawing.

The usual story applies... hardly any documentation, and they only work for
limited cases. I've sent the code to Lorena... I'd be grateful if you could
improve the documentation.

NB: I'll be out of reach of my computer (and therefore my source code) for
the next few days, but will probably still be able to answer emails.

Frank


On Tue, Feb 3, 2015 at 7:07 AM, Tilman Hausherr 
wrote:

> Hi Lorena,
>
> There is no concept of table in a PDF, except in a tagged PDF.
>
> A table is just lines and text. In no specific order. It could also be an
> image of a table.
>
> You can succeed in this only if you know the structure of the PDF in
> advance, e.g. when it all comes from the same client.
>
> https://stackoverflow.com/questions/23495372/extract-table-data-from-pdf
> https://stackoverflow.com/questions/17591426/extract-table-from-a-pdf
> https://stackoverflow.com/questions/17217194/extracting-table-contents-from-a-collection-of-pdf-files
> https://stackoverflow.com/questions/3424588/programmatically-extract-pdf-tables
>
> Tilman
>
>
> On 02.02.2015 at 16:29, Lorena Leishman wrote:
>
>> Hi,
>> I have a PDF that has information displayed on tables. Example:
>>
>> Company Name:   Barnes & Noble   Bank Of America   Macy's
>> Account #:      123x             345               679
>> Status:         Open             Closed            Open
>> Balance:        $23.             $0.00             $100
>>
>> Is there a way with PDFbox to extract a specific value(s) from the table?
>> Example: Bank Of America  and $0.00
>> And also is there a way to cut the whole table and paste it into a
>> different PDF?
>> Please let me know, Thanks!
>> Lorena
>>
>
>
>
>


Re: PDF extraction

2015-02-02 Thread Frank van der Hulst
On Tue, Feb 3, 2015 at 4:29 AM, Lorena Leishman <
lorenaleish...@yahoo.com.invalid> wrote:

> Hi,
> I have a PDF that has information displayed on tables. Example:
>
> Company Name:   Barnes & Noble   Bank Of America   Macy's
> Account #:      123x             345               679
> Status:         Open             Closed            Open
> Balance:        $23.             $0.00             $100
>
> Is there a way with PDFbox to extract a specific value(s) from the table?
> Example: Bank Of America  and $0.00
> And also is there a way to cut the whole table and paste it into a
> different PDF?
> Please let me know, Thanks!
> Lorena
package input.pdf;

import com.vividsolutions.jts.geom.Coordinate;
import com.vividsolutions.jts.geom.LineString;
import java.awt.Dimension;
import java.awt.Toolkit;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.Arrays;
import javafx.application.Platform;
import javafx.scene.Scene;
import javafx.scene.canvas.Canvas;
import javafx.scene.canvas.GraphicsContext;
import javafx.scene.layout.StackPane;
import javafx.scene.paint.Color;
import javafx.stage.Stage;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.geotools.data.simple.SimpleFeatureCollection;
import org.geotools.data.simple.SimpleFeatureIterator;
import output.ShapeStyle;
import topography.LineFeature;
/**
 *
 * @author Frank van der Hulst 
 */

public class Display1 {

  private final static Logger log = Logger.getLogger(Display1.class.getName());

  private final Dimension screen = Toolkit.getDefaultToolkit().getScreenSize();
  private StackPane root;
  private GraphicsContext gc;
  private Stage stage = null;
  private Canvas canvas;
  private double width, height, scale, xOffset, yOffset;

  /**
   * Must not be called from the FX thread: the constructor blocks until the
   * FX thread has created the window.
   *
   * @param title window title
   * @param pageWidth page width (scaled so the page fills the screen height)
   * @param pageHeight page height (scaled so the page fills the screen height)
   * @param xOffset
   * @param yOffset
   */
  @SuppressWarnings("SleepWhileInLoop")
  public Display1(final String title, double pageWidth, double pageHeight, double xOffset, double yOffset) {
    this.xOffset = xOffset;
    this.yOffset = yOffset;
    // Build the window on the FX thread
    Platform.runLater(() -> {
      stage = new Stage();
      root = new StackPane();
      stage.setX(0);
      stage.setY(0);
      stage.setTitle(title);
      scale = (double) screen.height / pageHeight;
      width = pageWidth * scale;
      height = screen.height;
      Scene scene = new Scene(root, width, height);
      canvas = new Canvas(width, height);
      root.getChildren().add(canvas);
      stage.setWidth(width);
      stage.setHeight(height);
      stage.setScene(scene);
      stage.show();
      gc = canvas.getGraphicsContext2D();
    });
    // Poll until the FX thread has created the graphics context
    int count = 0;
    while (gc == null) {
      try {
        Thread.sleep(100);
        count++;
      } catch (InterruptedException ex) {
        // Keep the interrupt status and stop waiting
        Thread.currentThread().interrupt();
        break;
      }
    }
    log.trace("Waited " + (count * 100) + "ms for graphics");
  }

  public void close() {
    // close the window on the FX thread
    Platform.runLater(stage::close);
    canvas = null;
    gc = null;
  }

  public Color javaFX(java.awt.Color awt) {
    // Divide by 255.0: integer division would turn every component below 255 into 0
    return new javafx.scene.paint.Color(awt.getRed() / 255.0, awt.getGreen() / 255.0, awt.getBlue() / 255.0, awt.getAlpha() / 255.0);
  }

  public void drawPolyLine(final ArrayList<java.awt.Point.Float> L, final Color c, final float lw) {
    assert gc != null : "Null gc";
    if (L == null || L.isEmpty()) {
      return;
    }
    //log.debug("DrawLine: " + L.size());
    // Copy the points into scaled coordinate arrays
    final int numPoints = L.size();
    final double[] x = new double[numPoints];
    final double[] y = new double[numPoints];
    int i = 0;
    for (java.awt.Point.Float P : L) {
      x[i] = P.x * scale;
      y[i++] = P.y * scale;
    }

    Platform.runLater(() -> {
      gc.setStroke(c);
      gc.setFill(null);
      //  gc.setFill(lc);
      gc.setLineWidth(lw);
      // Short outlines are drawn as one closed polygon
      if (numPoints < 256) {
        gc.strokePolygon(x, y, numPoints);
        return;
      }
      // Longer outlines are drawn in chunks of up to 250 points;
      // stepping by 249 makes consecutive chunks share a point
      for (int i1 = 0; i1 < numPoints; i1 += 249) {
        final int numPts = Math.min(250, numPoints - i1);
        gc.strokePolyline(Arrays.copyOfRange(x, i1, i1 + numPts), Arrays.copyOfRange(y, i1, i1 + numPts), numPts);
      }
      // Close the outline
      gc.strokeLine(x[0], y[0], x[numPoints - 1], y[numPoints - 1]);
    });
  }

  public void drawSegment(float x1, float y1, float x2, float y2, final Color c, final float lw) {
assert gc != null : "Null gc";
//log.debug("DrawCell: " + L.size());
final double[] x = {x1 * scale, x2 * scale};
final double[] y = {y1 * scale, y2 * scale};

Platform.runLater(() -> {
  gc.setStroke(c);
  gc.setLineWidth(lw);
  gc.strokePolygon(x, y, x.length);
});
  }

  public void drawRectangle(Rectangle2D.Float area, final Color c, final float lw) {
assert gc != null : "Null gc";
if (area == null) {
  return;
}
//log.debug("DrawRectangle: " + area.toString());
final double[] x = {area.x * scale, (area.x + area.width) * scale, (area.x + area.width) * scale, area.x * scale};
final dou
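
The constructor in the listing above polls `gc` in a sleep loop until the FX thread has run. One possible alternative is a `CountDownLatch`. The following is only a minimal sketch under stated assumptions: `LatchInit`, `resource` and `demo` are hypothetical names, and a plain `ExecutorService` stands in for the FX thread so that the example runs without JavaFX. In the real class, `Platform.runLater` would take the place of `uiThread.execute`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the Display1 constructor logic; not the original class.
final class LatchInit {

  // stands in for the GraphicsContext that the FX thread creates
  private volatile Object resource;

  /** Schedules the initialisation on uiThread and blocks until it has finished or timed out. */
  boolean init(ExecutorService uiThread, long timeoutMs) {
    final CountDownLatch ready = new CountDownLatch(1);
    uiThread.execute(() -> {
      resource = new Object();
      ready.countDown(); // signal the waiting thread
    });
    try {
      return ready.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException ex) {
      Thread.currentThread().interrupt(); // keep the interrupt visible to callers
      return false;
    }
  }

  Object resource() {
    return resource;
  }

  /** Runs the sketch once with a single-threaded executor; true if initialisation completed. */
  static boolean demo() {
    ExecutorService uiThread = Executors.newSingleThreadExecutor();
    try {
      LatchInit display = new LatchInit();
      return display.init(uiThread, 2000) && display.resource() != null;
    } finally {
      uiThread.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("initialised: " + demo());
  }
}
```

The latch wakes the caller as soon as the work is done instead of after the next 100 ms sleep, and the timeout avoids waiting forever if the other thread never runs.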

Re: PDF extraction

2015-02-02 Thread Tilman Hausherr

Hi Lorena,

There is no concept of table in a PDF, except in a tagged PDF.

A table is just lines and text, in no specific order. It could also be 
an image of a table.


You can succeed in this only if you know the structure of the PDF in 
advance, e.g. when it all comes from the same client.
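
If the layout is known in advance, one way to approach this is to collect the text together with its page coordinates and rebuild the rows yourself. The sketch below is not PDFBox code: `TableRows`, `Fragment`, `toRows` and `lookup` are hypothetical names, the coordinates in `demo` are made up, and y is assumed to grow downward. With PDFBox the coordinates could presumably come from a `PDFTextStripper` subclass that reads `TextPosition` objects, but merging those glyph positions into cell-sized fragments is left out here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of PDFBox.
final class TableRows {

  /** A piece of text plus the page coordinates where it starts (assumed to be known). */
  static final class Fragment {
    final String text;
    final float x;
    final float y;

    Fragment(String text, float x, float y) {
      this.text = text;
      this.x = x;
      this.y = y;
    }
  }

  /** Groups fragments into rows (ascending y) with the cells of each row ordered by ascending x. */
  static List<List<String>> toRows(List<Fragment> fragments, float yTolerance) {
    List<Fragment> sorted = new ArrayList<>(fragments);
    sorted.sort(Comparator.comparingDouble((Fragment f) -> f.y));

    List<List<Fragment>> rows = new ArrayList<>();
    for (Fragment f : sorted) {
      List<Fragment> current = rows.isEmpty() ? null : rows.get(rows.size() - 1);
      // start a new row when y is too far from the first fragment of the current row
      if (current == null || Math.abs(f.y - current.get(0).y) > yTolerance) {
        current = new ArrayList<>();
        rows.add(current);
      }
      current.add(f);
    }

    List<List<String>> result = new ArrayList<>();
    for (List<Fragment> row : rows) {
      row.sort(Comparator.comparingDouble((Fragment f) -> f.x));
      List<String> cells = new ArrayList<>();
      for (Fragment f : row) {
        cells.add(f.text);
      }
      result.add(cells);
    }
    return result;
  }

  /** Returns the cell under columnHeader (searched in the first row) in the row starting with rowLabel, or null. */
  static String lookup(List<List<String>> rows, String columnHeader, String rowLabel) {
    if (rows.isEmpty()) {
      return null;
    }
    int column = rows.get(0).indexOf(columnHeader);
    if (column < 0) {
      return null;
    }
    for (List<String> row : rows) {
      if (!row.isEmpty() && row.get(0).equals(rowLabel) && column < row.size()) {
        return row.get(column);
      }
    }
    return null;
  }

  /** Two rows of the example table with made-up coordinates, deliberately out of order. */
  static String demo() {
    List<Fragment> fragments = Arrays.asList(
        new Fragment("$0.00", 250f, 160.4f),
        new Fragment("Company Name:", 50f, 100f),
        new Fragment("Balance:", 50f, 160f),
        new Fragment("Bank Of America", 250f, 100.3f),
        new Fragment("Barnes & Noble", 150f, 100f),
        new Fragment("$23.", 150f, 160f),
        new Fragment("Macy's", 350f, 100f),
        new Fragment("$100", 350f, 160.2f));
    return lookup(toRows(fragments, 2f), "Bank Of America", "Balance:");
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

The tolerance is needed because text on the same visual row rarely has exactly the same y value. A table drawn as an image would need OCR first and is not covered by this approach.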


https://stackoverflow.com/questions/23495372/extract-table-data-from-pdf
https://stackoverflow.com/questions/17591426/extract-table-from-a-pdf
https://stackoverflow.com/questions/17217194/extracting-table-contents-from-a-collection-of-pdf-files
https://stackoverflow.com/questions/3424588/programmatically-extract-pdf-tables

Tilman


On 02.02.2015 at 16:29, Lorena Leishman wrote:

Hi,
I have a PDF that has information displayed on tables. Example:
Company Name:   Barnes & Noble   Bank Of America   Macy's
Account #:      123x             345               679
Status:         Open             Closed            Open
Balance:        $23.             $0.00             $100
Is there a way with PDFbox to extract a specific value(s) from the table? 
Example: Bank Of America  and $0.00
And also is there a way to cut the whole table and paste it into a different 
PDF?
Please let me know, Thanks!
Lorena



-
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org



Re: PDF extraction

2015-02-02 Thread Gilad Denneboom
It might be possible to extract the text you want but I don't think there's
a built-in method in PDFBox that will allow you to do it. It will have to
be based either on the text's location on the page, the context (relation
to other text), a certain pattern, etc. Each of these things can be
implemented but it would require a custom-made tool to be created.
It's very hard to say for sure without seeing the actual file, though.
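
As a rough illustration of the pattern-based variant, the sketch below works on text that has already been extracted from the page (for example with PDFBox's text stripper) and picks the amounts that follow a label on the same line. It is only a sketch under assumptions: `LabelPattern` and `amountsAfter` are hypothetical names, and the sample text assumes that each table row comes out as one line, which depends on the actual file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox.
final class LabelPattern {

  // a dollar amount such as $0.00, $100 or $23. (digits after the point are optional)
  private static final Pattern AMOUNT = Pattern.compile("\\$\\d+(?:\\.\\d*)?");

  /** Returns all dollar amounts that follow the label on the same line of the text. */
  static List<String> amountsAfter(String text, String label) {
    List<String> amounts = new ArrayList<>();
    int start = text.indexOf(label);
    if (start < 0) {
      return amounts; // label not found
    }
    int from = start + label.length();
    int lineEnd = text.indexOf('\n', from);
    int to = lineEnd < 0 ? text.length() : lineEnd;

    // restrict the search to the rest of the label's line
    Matcher m = AMOUNT.matcher(text).region(from, to);
    while (m.find()) {
      amounts.add(m.group());
    }
    return amounts;
  }

  /** Sample text modelled on the table in the question; the line layout is an assumption. */
  static String demo() {
    String text = "Company Name: Barnes & Noble Bank Of America Macy's\n"
        + "Account #: 123x 345 679\n"
        + "Status: Open Closed Open\n"
        + "Balance: $23. $0.00 $100\n";
    // Bank Of America is the second company, so its balance is the second amount
    return amountsAfter(text, "Balance:").get(1);
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

This only works while the column order is fixed and every cell is filled; an empty cell would shift the index, which is where the location-based approach becomes necessary.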

On Mon, Feb 2, 2015 at 4:29 PM, Lorena Leishman <
lorenaleish...@yahoo.com.invalid> wrote:

> Hi,
> I have a PDF that has information displayed on tables. Example:
> Company Name:   Barnes & Noble   Bank Of America   Macy's
> Account #:      123x             345               679
> Status:         Open             Closed            Open
> Balance:        $23.             $0.00             $100
> Is there a way with PDFbox to extract a specific value(s) from the table?
> Example: Bank Of America  and $0.00
> And also is there a way to cut the whole table and paste it into a
> different PDF?
> Please let me know, Thanks!
> Lorena


PDF extraction

2015-02-02 Thread Lorena Leishman
Hi,
I have a PDF that has information displayed on tables. Example:
Company Name:   Barnes & Noble   Bank Of America   Macy's
Account #:      123x             345               679
Status:         Open             Closed            Open
Balance:        $23.             $0.00             $100
Is there a way with PDFBox to extract specific values from the table?
Example: Bank Of America and $0.00.
Also, is there a way to cut the whole table and paste it into a different
PDF?
Please let me know, Thanks!
Lorena

Re: Release schedule 2.0.0

2015-02-02 Thread Maruan Sahyoun
Hi,

We are currently working towards a release date by reviewing the open issues 
and moving the ones we will not be targeting into subsequent releases.

This is the list of (currently) open issues targeted to be handled in 2.0  
https://issues.apache.org/jira/browse/PDFBOX/fixforversion/12319281/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-issues-panel

If you feel there are issues you can help with, please let us know. The 
easiest way would be to attach a patch to the issue. If you can follow our 
coding conventions https://pdfbox.apache.org/codingconventions.html that 
would be great.

Feel free to ping me towards the end of the week if there is no progress.

BR
Maruan

On 02.02.2015 at 14:07, Jan De Moerloose wrote:

> Hi,
> 
> As I did not get an answer, I assume there is no schedule yet? Would it be 
> possible to tag a version as a release candidate or milestone instead?
> I would be willing to dedicate a small amount of time if that helps, but the 
> list of open issues for this version is daunting (100+). Please let me know 
> if there is anything I can help with (as a Java programmer, not a PDF 
> specialist).
> By the way, the product works fine for me, so thanks to all developers for 
> doing such a great job!
> 
> cheers,
> Jan
> On 01/12/2015 02:07 PM, Jan De Moerloose wrote:
>> Hi,
>> 
>> I am using a snapshot version of PDFBox 2.0.0 for converting PDFs to 
>> images. Is there already a schedule for the release of this version?
>> 
>> Regards,
>> Jan
>> 



Re: Release schedule 2.0.0

2015-02-02 Thread Jan De Moerloose

Hi,

As I did not get an answer, I assume there is no schedule yet? Would it 
be possible to tag a version as a release candidate or milestone instead?
I would be willing to dedicate a small amount of time if that helps, but 
the list of open issues for this version is daunting (100+). Please let 
me know if there is anything I can help with (as a Java programmer, not 
a PDF specialist).
By the way, the product works fine for me, so thanks to all developers for 
doing such a great job!


cheers,
Jan
On 01/12/2015 02:07 PM, Jan De Moerloose wrote:

Hi,

I am using a snapshot version of PDFBox 2.0.0 for converting PDFs to 
images. Is there already a schedule for the release of this 
version?


Regards,
Jan



