Re: [VOTE] Release Apache PDFBox 1.8.7

2014-09-16 Thread Timo Boehme

+1, thanks for managing the release process

Timo


On 15.09.2014 at 20:49, Andreas Lehmkuehler wrote:

Hi,

a candidate for the PDFBox 1.8.7 release is available at:

 http://people.apache.org/~lehmi/pdfbox/1.8.7/

The release candidate is a zip archive of the sources in:

 http://svn.apache.org/repos/asf/pdfbox/tags/1.8.7/

The SHA1 checksum of the archive is
ba7f83a1db9e697bcd0d3613571e1b397968daf6.

Please vote on releasing this package as Apache PDFBox 1.8.7.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

 [ ] +1 Release this package as Apache PDFBox 1.8.7
 [ ] -1 Do not release this package because...

Here is my +1

BR
Andreas Lehmkühler



--

 Timo Boehme
 OntoChem GmbH
 H.-Damerow-Str. 4
 06120 Halle/Saale
 T: +49 345 4780474
 F: +49 345 4780471
 timo.boe...@ontochem.com

_

 OntoChem GmbH
 Geschäftsführer: Dr. Lutz Weber
 Sitz: Halle / Saale
 Registergericht: Stendal
 Registernummer: HRB 215461
_



[jira] [Commented] (PDFBOX-2350) Type1 Parser hangs indefinitely

2014-09-16 Thread Daniel Scheibe (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135091#comment-14135091
 ] 

Daniel Scheibe commented on PDFBOX-2350:


Thanks Tilman for your feedback.

What I currently do is:

_pdDocument = PDDocument.load(_inputStream);

if (_pdDocument.isEncrypted())
{
    _logger.warn("Document is encrypted, trying to decrypt without password");
    _pdDocument.decrypt("");
}

// 2.0.0-SNAPSHOT
_pdRenderer = new PDFRenderer(_pdDocument, true);

// ...

So I guess the additional call to decrypt that you mentioned is already in 
place and should work?



 Type1 Parser hangs indefinitely
 ---

 Key: PDFBOX-2350
 URL: https://issues.apache.org/jira/browse/PDFBOX-2350
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 2.0.0
 Environment: Windows 7, JDK 1.7.0_51-b13
Reporter: Daniel Scheibe

 When rendering the first page of my PDF document, the Type1Parser 
 (org.apache.fontbox.type1.Type1Parser) hangs in a loop in 
 {{parseBinary(byte[] bytes) throws IOException}}
 and kills our rendering pipeline. The loop that hangs is:
 // find /Private dict
 while (!lexer.peekToken().getText().equals("Private"))
 {
     lexer.nextToken();
 }
 No token named Private ever appears in the list of returned tokens 
 (they are empty all the time).
 Furthermore, going deeper into the source code, it seems that the class reading 
 the tokens (Type1Lexer) never advances the buffer position and always 
 returns an empty name token from the readToken(Token prevToken) method.
 Looking at the decrypted buffer, I cannot make sense of it based 
 on my current understanding.
 Unfortunately I cannot provide the PDF in question, as it contains confidential 
 data.
 Acrobat Reader XI Version 11.0.08 renders the document just fine.
 In addition, it seems the PDF was encrypted (40-bit RC4) with an empty 
 password, and it reports PDF version 1.5.
 Does this provide enough information, or can I do anything else to help 
 nail this one down?
 I guess this might be a PDF document structure/feature that is not yet 
 completely supported, but PDFBox should at least throw an exception instead of 
 failing silently...
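
 The "throw instead of hang" behaviour asked for above could look roughly like
 the sketch below. It is only an illustration: GuardedTokenScan, skipUntil and
 the maxTokens limit are hypothetical names, and a plain Iterator of strings
 stands in for Type1Lexer. This is not the actual FontBox code or its fix.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class GuardedTokenScan {

    /**
     * Consumes tokens until one equal to {@code wanted} has been read and
     * returns how many tokens were consumed. Throws an IOException when the
     * input ends, or when {@code maxTokens} tokens were read without a match,
     * so a lexer that keeps returning empty tokens cannot stall the caller.
     */
    static int skipUntil(Iterator<String> tokens, String wanted, int maxTokens)
            throws IOException {
        int seen = 0;
        while (tokens.hasNext()) {
            if (seen >= maxTokens) {
                throw new IOException("Gave up after " + seen
                        + " tokens without finding '" + wanted + "'");
            }
            String token = tokens.next();
            seen++;
            if (wanted.equals(token)) {
                return seen;
            }
        }
        throw new IOException("Reached end of input without finding '" + wanted + "'");
    }

    public static void main(String[] args) throws IOException {
        List<String> tokens = Arrays.asList("dict", "begin", "Private", "dup");
        // "Private" is the third token in this hypothetical stream
        System.out.println(skipUntil(tokens.iterator(), "Private", 1000));
    }
}
```

 With a guard like this, a token stream that only ever yields empty names ends
 in an IOException that the rendering pipeline can catch and report.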



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PDFBOX-2353) ArrayIndexOutOfBoundsException in Type2CharString.drawAlternatingCurve

2014-09-16 Thread simon steiner (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135114#comment-14135114
 ] 

simon steiner commented on PDFBOX-2353:
---

Similar to PDFBOX-2177

 ArrayIndexOutOfBoundsException in Type2CharString.drawAlternatingCurve
 --

 Key: PDFBOX-2353
 URL: https://issues.apache.org/jira/browse/PDFBOX-2353
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 2.0.0
Reporter: Tilman Hausherr

 The file from PDFBOX-2348 fails with this exception:
 {code}
 java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 3
   at java.util.Vector.get(Vector.java:744)
   at org.apache.fontbox.cff.Type2CharString.drawAlternatingCurve(Type2CharString.java:333)
   at org.apache.fontbox.cff.Type2CharString.handleCommand(Type2CharString.java:181)
   at org.apache.fontbox.cff.Type2CharString.access$000(Type2CharString.java:32)
   at org.apache.fontbox.cff.Type2CharString$1.handleCommand(Type2CharString.java:104)
   at org.apache.fontbox.cff.CharStringHandler.handleSequence(CharStringHandler.java:45)
   at org.apache.fontbox.cff.Type2CharString.convertType1ToType2(Type2CharString.java:107)
   at org.apache.fontbox.cff.Type2CharString.<init>(Type2CharString.java:58)
   at org.apache.fontbox.cff.CIDKeyedType2CharString.<init>(CIDKeyedType2CharString.java:46)
   at org.apache.fontbox.cff.CFFCIDFont.getType2CharString(CFFCIDFont.java:233)
   at org.apache.pdfbox.pdmodel.font.PDCIDFontType0.getType2CharString(PDCIDFontType0.java:210)
   at org.apache.pdfbox.rendering.font.CIDType0Glyph2D.getPathForCharacterCode(CIDType0Glyph2D.java:63)
   at org.apache.pdfbox.rendering.PageDrawer.drawGlyph2D(PageDrawer.java:431)
 {code}





[DISCUSS] move documentation and examples to git

2014-09-16 Thread Maruan Sahyoun
Hi there,

in order to make it easier for people to contribute to the documentation and 
examples, I thought about the potential benefits of moving these to a git-based 
repository instead of svn. The main idea is to allow people to 
contribute via GitHub, opening another channel of communication and making it 
easier to contribute. 

Proposed names are pdfbox-docs and pdfbox-examples. Take a look at 
https://github.com/apache/cordova-docs for an example of that.

I haven’t thought about all potential implications and necessary changes yet, 
but wanted to get some initial feedback on support for the idea before putting 
more effort into it.

WDYT?

Maruan

Re: [DISCUSS] move documentation and examples to git

2014-09-16 Thread Andreas Lehmkühler
Hi,

 Maruan Sahyoun sahy...@fileaffairs.de wrote on 16 September 2014 at 10:21:


 Hi there,

 in order to make it easier for people to contribute to the documentation and
 examples I thought about the potential benefits of moving these to a git based
 repository instead of svn. The main idea behind that is to allow people to
 contribute via github opening another channel of communication and making it
 easier to contribute.

 Proposed names are pdfbox-docs and pdfbox-examples. Take a look at
 https://github.com/apache/cordova-docs for an example of that.

 I haven’t thought about all potential implications and changes necessary yet
 but wanted to get a first feedback about support for that idea before putting
 more effort into that.

 WDYT?
Good idea, but I'm not sure whether a split repo configuration (svn/git) is
supported by infra. So maybe this is only possible if we migrate the whole
project to git.

 Maruan

BR
Andreas Lehmkühler


Re: [DISCUSS] move documentation and examples to git

2014-09-16 Thread Maruan Sahyoun
what about having extra repos for pdfbox-docs and pdfbox-examples?

Maruan




Re: [DISCUSS] move documentation and examples to git

2014-09-16 Thread Andreas Lehmkühler
 Maruan Sahyoun sahy...@fileaffairs.de wrote on 16 September 2014 at 11:46:


 what about having extra repos for pdfbox-docs and pdfbox-examples?
Hmm, I'm a little bit puzzled. Your original proposal was already about extra
git repos for docs and examples, wasn't it?

Andreas





Re: [DISCUSS] move documentation and examples to git

2014-09-16 Thread Maruan Sahyoun
OK, I see what you mean; I misunderstood your question. We can check with infra, 
but I don’t see a reason why pdfbox-docs and pdfbox-examples can't live in new 
git-based repos while pdfbox stays in the old one. 
They would behave just like 'different' projects.

So if it’s possible, shall we do it?

Moving the whole project to git is a different story. I’d see the same benefit 
applying to pdfbox, but the impact is larger. So moving the docs and examples 
might also be a good test case.

Maruan


 



Re: [DISCUSS] move documentation and examples to git

2014-09-16 Thread Andreas Lehmkühler


 Maruan Sahyoun sahy...@fileaffairs.de wrote on 16 September 2014 at 12:06:


 OK - I see what you mean, got your question wrong. We can check with infra but
 I don’t see a reason why pdfbox-docs and pdfbox-examples can't exist in new
 repos and there is pdfbox in the old one and the new repos being git based.
 Would behave just like ‚different‘ projects.

Technically yes, but we should ask infra whether it's possible from an
organizational point of view.

 So if it’s possible shall we do it?
+1,

We have to split the build if we move the examples to a git repo and concatenate
them.

 Moving the whole project to git is a different story. I’d see the same benefit
 applying to pdfbox but the impact is larger. So moving the docs and examples
 might also be a good test case.

Yes, that would be a perfect opportunity

 Maruan

Andreas


 



[jira] [Created] (PDFBOX-2354) DataFormatException: incorrect header check

2014-09-16 Thread simon steiner (JIRA)
simon steiner created PDFBOX-2354:
-

 Summary: DataFormatException: incorrect header check
 Key: PDFBOX-2354
 URL: https://issues.apache.org/jira/browse/PDFBOX-2354
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 2.0.0
Reporter: simon steiner


java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar 
WriteDecodedDoc -nonSeq 601501018.pdf 
java.util.zip.DataFormatException: incorrect header check
at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83)





[jira] [Updated] (PDFBOX-2354) DataFormatException: incorrect header check

2014-09-16 Thread simon steiner (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

simon steiner updated PDFBOX-2354:
--
Description: 
http://svn.apache.org/viewvc/incubator/pdfbox/trunk/test/input/601501018.pdf?revision=682412&view=co&pathrev=793348

java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar 
WriteDecodedDoc -nonSeq 601501018.pdf 
java.util.zip.DataFormatException: incorrect header check
at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83)

  was:
java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar 
WriteDecodedDoc -nonSeq 601501018.pdf 
java.util.zip.DataFormatException: incorrect header check
at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83)


 DataFormatException: incorrect header check
 ---

 Key: PDFBOX-2354
 URL: https://issues.apache.org/jira/browse/PDFBOX-2354
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 2.0.0
Reporter: simon steiner

 http://svn.apache.org/viewvc/incubator/pdfbox/trunk/test/input/601501018.pdf?revision=682412&view=co&pathrev=793348
 java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar 
 WriteDecodedDoc -nonSeq 601501018.pdf 
 java.util.zip.DataFormatException: incorrect header check
   at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83)





Re: [DISCUSS] move documentation and examples to git

2014-09-16 Thread Andreas Lehmkühler
Hi,

 Maruan Sahyoun sahy...@fileaffairs.de wrote on 16 September 2014 at 14:35:



 On 16.09.2014 at 14:27, Andreas Lehmkühler andr...@lehmi.de wrote:

  Maruan Sahyoun sahy...@fileaffairs.de wrote on 16 September 2014 at 14:23:
 
 
  On 16.09.2014 at 14:08, Andreas Lehmkühler andr...@lehmi.de wrote:
 
 
 
  Maruan Sahyoun sahy...@fileaffairs.de wrote on 16 September 2014 at 12:06:
 
 
  OK - I see what you mean, got your question wrong. We can check with infra,
  but I don’t see a reason why pdfbox-docs and pdfbox-examples can't exist in
  new repos and there is pdfbox in the old one and the new repos being git
  based. Would behave just like 'different' projects.
 
  Technically yes, but we should ask infra whether it's possible from an
  organizational point of view.
 
  You or me going to ask?
  Be my guest ;-)
 

 Thank you - looking forward to your feedback. In the meantime I’ll start with
 the changes for the content.
Done, I simply created a JIRA ticket. Let's see what happens.

https://issues.apache.org/jira/browse/INFRA-8357

BR
Andreas


[jira] [Closed] (PDFBOX-2353) ArrayIndexOutOfBoundsException in Type2CharString.drawAlternatingCurve

2014-09-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-2353.
---
Resolution: Duplicate

 ArrayIndexOutOfBoundsException in Type2CharString.drawAlternatingCurve
 --

 Key: PDFBOX-2353
 URL: https://issues.apache.org/jira/browse/PDFBOX-2353
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 2.0.0
Reporter: Tilman Hausherr



[jira] [Commented] (PDFBOX-2350) Type1 Parser hangs indefinitely

2014-09-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135708#comment-14135708
 ] 

Tilman Hausherr commented on PDFBOX-2350:
-

Please also try 
{code}
PDDocument.loadNonSeq(new File(pdfFilename), "");
{code}
which does the decryption if needed.

Also, the correct way to decrypt with the old parser is
{code}
if (document.isEncrypted())
{
    try
    {
        // empty password
        StandardDecryptionMaterial sdm = new StandardDecryptionMaterial("");
        document.openProtection(sdm);
    }
    catch (InvalidPasswordException e)
    {
        System.err.println("Error: The document is encrypted.");
    }
}
{code}

I'm not saying that this will solve your problems, but it is worth a try.

If it still doesn't work, please save the decrypted byte array (in the 
parseBinary method) to a file and post it here.

 Type1 Parser hangs indefinitely
 ---

 Key: PDFBOX-2350
 URL: https://issues.apache.org/jira/browse/PDFBOX-2350
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 2.0.0
 Environment: Windows 7, JDK 1.7.0_51-b13
Reporter: Daniel Scheibe



[jira] [Closed] (PDFBOX-2321) java.lang.ExceptionInInitializerError in PDFRenderer.renderImageWithDPI

2014-09-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-2321.
---
Resolution: Incomplete

Closing, please reopen and attach the file as described. We won't register with 
scribd to get the PDF.

 java.lang.ExceptionInInitializerError in PDFRenderer.renderImageWithDPI
 ---

 Key: PDFBOX-2321
 URL: https://issues.apache.org/jira/browse/PDFBOX-2321
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
 Environment: Windows 8.1 x64
Reporter: Marino

 An unhandled exception of type 'java.lang.ExceptionInInitializerError' occurs 
 when calling the method with the following pdf and 96 dpi.
 renderImageWithDPI(i, 96);





[jira] [Closed] (PDFBOX-2354) DataFormatException: incorrect header check

2014-09-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-2354.
---
Resolution: Invalid

The file is most probably broken, and here's why:

- I first traced to find out which object has the problem. It is 97 0 obj.
- I then traced to see whether the object is decrypted. Yes, it is.
- I traced to see whether other objects have the problem. Yes: 108, 115, 123, 
129, 133, 171, 196, 204, 223. All these objects are streams with length 7.
- I then ran qpdf on the file. It reports this error message:
(file position 160266): error decoding stream data for object 108 0: 
stream inflate: inflate: data: incorrect header check
stream inflate: inflate: data: incorrect header check
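
The same class of failure can be reproduced without PDFBox. The sketch below
feeds seven arbitrary bytes (an assumption chosen to mirror the 7-byte streams
above, not the actual stream content) to java.util.zip.Inflater, which expects
a zlib header. FlateHeaderCheck, tryInflate and deflate are hypothetical names
for this illustration only.

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlateHeaderCheck {

    /** Returns null if the bytes inflate cleanly, otherwise a description of the failure. */
    static String tryInflate(byte[] data) {
        Inflater inflater = new Inflater(); // default mode expects a zlib header
        inflater.setInput(data);
        byte[] out = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(out);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    return "truncated or incomplete stream";
                }
            }
            return null;
        } catch (DataFormatException e) {
            // the message text comes from the underlying zlib library
            return String.valueOf(e.getMessage());
        } finally {
            inflater.end();
        }
    }

    /** Compresses the bytes with the default zlib wrapper (small inputs only). */
    static byte[] deflate(byte[] plain) {
        Deflater deflater = new Deflater();
        deflater.setInput(plain);
        deflater.finish();
        byte[] buf = new byte[plain.length + 64];
        int len = deflater.deflate(buf);
        deflater.end();
        return Arrays.copyOf(buf, len);
    }

    public static void main(String[] args) {
        byte[] garbage = {1, 2, 3, 4, 5, 6, 7};
        System.out.println("garbage: " + tryInflate(garbage));
        System.out.println("valid:   " + tryInflate(deflate("hello".getBytes())));
    }
}
```

Data that does not start with a valid zlib header is rejected with a
DataFormatException before any content is decoded, which is consistent with
both PDFBox and qpdf failing on the same objects.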

 DataFormatException: incorrect header check
 ---

 Key: PDFBOX-2354
 URL: https://issues.apache.org/jira/browse/PDFBOX-2354
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 2.0.0
Reporter: simon steiner

 http://svn.apache.org/viewvc/incubator/pdfbox/trunk/test/input/601501018.pdf?revision=682412&view=co&pathrev=793348
 java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar 
 WriteDecodedDoc -nonSeq 601501018.pdf 
 java.util.zip.DataFormatException: incorrect header check
   at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83)





[jira] [Updated] (PDFBOX-2301) RandomAccessBuffer consumes too much memory.

2014-09-16 Thread gee (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gee updated PDFBOX-2301:

Attachment: clone2.diff

Although this patch needs Java 1.7, it should fix the issues you raised earlier. 
Now adding or removing an element in a shallow-cloned list changes the internal data.

 RandomAccessBuffer consumes too much memory.
 

 Key: PDFBOX-2301
 URL: https://issues.apache.org/jira/browse/PDFBOX-2301
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.8.6, 2.0.0
Reporter: gee
Assignee: Andreas Lehmkühler
 Fix For: 2.0.0

 Attachments: clone.diff, clone2.diff


 RandomAccessBuffer holds the uncompressed image during operation, because that 
 is exactly what PDFBox ExtractImages does.
 But holding the uncompressed image in memory instead of the compressed one 
 consumes too much memory, not to mention the many PDF XObjects that can use a 
 filter to compress themselves. It would be good if PDFBox provided an option to 
 revert a COSObject to the state just before the RandomAccess object was created 
 (the state in which the PDF XObject stream has been parsed and the COSDictionary 
 objects have not yet been created, because the user has not requested them via 
 the get() method). This is a crucial feature for letting PDFBox analyze huge 
 PDF files (100MB).
 In the current source, one must close the COSStream unless it is required (and 
 I know a closed stream cannot be reopened).
 Class Name                                                               | Shallow Heap | Retained Heap
 ---------------------------------------------------------------------------------------------------------
 org.apache.pdfbox.cos.COSObject @ 0x5ad4940                              |           24 |     8,187,264
 |- class class org.apache.pdfbox.cos.COSObject @ 0x58c4020               |            0 |             0
 |- generationNumber org.apache.pdfbox.cos.COSInteger @ 0x5ad0080         |           24 |            24
 |- baseObject org.apache.pdfbox.cos.COSStream @ 0x5b25ea0                |           32 |     8,187,216
 |  |- class class org.apache.pdfbox.cos.COSStream @ 0x58c3e00            |            8 |             8
 |  |- items java.util.LinkedHashMap @ 0x5b2a0f0                          |           56 |           552
 |  |- file org.apache.pdfbox.io.RandomAccessBuffer @ 0x5b2a128           |           48 |     8,186,528
 |  |  |- class class org.apache.pdfbox.io.RandomAccessBuffer @ 0x5ad2b00 |

Re: [DISCUSS] move documentation and examples to git

2014-09-16 Thread Santosh Arakeri
Please don't send me mail.


[jira] [Updated] (PDFBOX-2301) RandomAccessBuffer consumes too much memory.

2014-09-16 Thread gee (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gee updated PDFBOX-2301:

Attachment: clone3.diff

clone2.diff is now invalid.
This patch fills the remaining hole in the previous patch. Now every write 
operation ensures that changes are not seen by any other cloned object.
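
The property described here (a write through one clone is never visible
through another) is commonly achieved with copy-on-write. The sketch below
illustrates that general technique only. It is not the content of clone3.diff,
and CopyOnWriteBuffer is a hypothetical class, not RandomAccessBuffer.

```java
import java.util.Arrays;

public class CopyOnWriteBuffer implements Cloneable {

    private byte[] data;
    // true while 'data' may also be referenced by another instance
    private boolean shared;

    public CopyOnWriteBuffer(int size) {
        data = new byte[size];
    }

    /** Cheap clone: both instances keep pointing at the same array until one of them writes. */
    @Override
    public CopyOnWriteBuffer clone() {
        try {
            CopyOnWriteBuffer copy = (CopyOnWriteBuffer) super.clone();
            this.shared = true;
            copy.shared = true;
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }

    public int read(int pos) {
        return data[pos] & 0xff;
    }

    /** Copies the array first if it might be shared, so other clones never see the change. */
    public void write(int pos, int value) {
        if (shared) {
            data = Arrays.copyOf(data, data.length);
            shared = false;
        }
        data[pos] = (byte) value;
    }

    public static void main(String[] args) {
        CopyOnWriteBuffer a = new CopyOnWriteBuffer(4);
        a.write(0, 7);
        CopyOnWriteBuffer b = a.clone();
        b.write(0, 9);
        System.out.println(a.read(0) + " " + b.read(0));
    }
}
```

Cloning stays cheap because no data is copied until a write happens, which
matters when the buffer holds several megabytes of image data. The sketch is
not thread-safe.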

 RandomAccessBuffer consumes too much memory.
 

 Key: PDFBOX-2301
 URL: https://issues.apache.org/jira/browse/PDFBOX-2301
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.8.6, 2.0.0
Reporter: gee
Assignee: Andreas Lehmkühler
 Fix For: 2.0.0

 Attachments: clone.diff, clone2.diff, clone3.diff



[jira] [Created] (PDFBOX-2356) Error Validating PDF Archive Document

2014-09-16 Thread Cetra Free (JIRA)
Cetra Free created PDFBOX-2356:
--

 Summary: Error Validating PDF Archive Document
 Key: PDFBOX-2356
 URL: https://issues.apache.org/jira/browse/PDFBOX-2356
 Project: PDFBox
  Issue Type: Bug
  Components: Preflight
Affects Versions: 1.8.6, 1.8.5, 1.8.4
Reporter: Cetra Free


When trying to validate a PDF archive file (attached to this ticket) we get the 
following error:

{code}
7.2   - Error on MetaData, ModificationDate present in the document catalog 
dictionary doesn't match with XMP information
{code}

This is because the Modification Date in the Dictionary is parsed 
differently from the XMP Metadata.  The XMP Metadata is correct, but the Date 
from the Dictionary has an extra 30 minutes added.

The following is the raw COSObject from the PDF File
{code}
COSString{D:20140917122850+09'30'}
{code}

The Long value should be *1410922730000*

The *org.apache.pdfbox.util.DateConverter* *parseDate* method returns the Date 
with Long *1410924530000* which is 30 minutes ahead.

XMP Modification Date is parsed differently and returns the correct date.

This means that validation will fail for PDF Archives.

My suggestion would be to refactor the parseDate function to use the Standard 
Java library.

Here's an example class which will be compatible with the PDF Specification:

{code}
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Calendar;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

static class DateParser {

  // maps the length of each pattern prefix to the format that parses it
  private Map<Integer, SimpleDateFormat> formats =
    new HashMap<Integer, SimpleDateFormat>();

  public DateParser() {
    String expr = "";

    for (String part : Arrays.asList("yyyy", "MM", "dd", "HH", "mm", "ss", "Z")) {
      expr = expr + part;
      formats.put(expr.length(), new SimpleDateFormat(expr));
    }
  }

  public Calendar parseDate(String expr) {
    try {
      // strip the "D:" prefix and the apostrophes; treat a bare "Z" as UTC
      expr = expr.replace("D:", "").replace("'", "").replace("Z", "+0000");
      Date date = formats.get(Math.min(expr.length(), 15)).parse(expr);

      Calendar calendar = Calendar.getInstance();
      calendar.setTime(date);

      return calendar;
    } catch (ParseException e) {
      return null;
    }
  }
}
{code}
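
The expected Long value can be cross-checked with the standard library alone.
The sketch below handles only the full date form used in this ticket.
PdfDateCheck and toEpochMillis are hypothetical names for this illustration,
not part of PDFBox.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class PdfDateCheck {

    /** Parses a full PDF date string such as D:20140917122850+09'30' into epoch milliseconds. */
    static long toEpochMillis(String pdfDate) throws ParseException {
        // drop the "D:" prefix and the apostrophes so the offset reads as +0930
        String cleaned = pdfDate.replace("D:", "").replace("'", "");
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMddHHmmssZ");
        Date date = format.parse(cleaned);
        return date.getTime();
    }

    public static void main(String[] args) throws ParseException {
        long millis = toEpochMillis("D:20140917122850+09'30'");
        System.out.println(millis);                    // 1410922730000
        System.out.println(millis + 30L * 60L * 1000L); // 1410924530000, the value 30 minutes ahead
    }
}
```

12:28:50 at UTC+09:30 is 02:58:50 UTC on 17 September 2014, i.e.
1410922730000 ms since the epoch. The value reported from DateConverter
differs from this by exactly 1,800,000 ms.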






[jira] [Updated] (PDFBOX-2356) Error Validating PDF Archive Document

2014-09-16 Thread Cetra Free (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cetra Free updated PDFBOX-2356:
---
Attachment: pdfafile.pdf

 Error Validating PDF Archive Document
 -

 Key: PDFBOX-2356
 URL: https://issues.apache.org/jira/browse/PDFBOX-2356
 Project: PDFBox
  Issue Type: Bug
  Components: Preflight
Affects Versions: 1.8.4, 1.8.5, 1.8.6
Reporter: Cetra Free
 Attachments: pdfafile.pdf

