Visible signature image

2014-03-14 Thread Tilman Hausherr
I believe somebody mentioned somewhere that creating the signature
image didn't work properly, but I can't find out who it was. While
working on a test for JPEGFactory (PDFBOX-1969) I noticed that
JPEGFactory.createFromImage() was temporarily broken (hopefully fixed
now), and this method is only used by
PDVisibleSigBuilder.createSignatureImage().


I see now that this was created in PDFBOX-1766 by Thomas and Vakhtang - 
please test whether it still works.


Tilman


[jira] [Commented] (PDFBOX-1969) JPEGFactory bug

2014-03-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935918#comment-13935918
 ] 

John Hewson commented on PDFBOX-1969:
-

Thanks, that's great. I've never encountered a transparent JPEG, though they 
are part of the spec. Maybe I can make one in Photoshop.

> JPEGFactory bug
> ---
>
>              Key: PDFBOX-1969
>              URL: https://issues.apache.org/jira/browse/PDFBOX-1969
>          Project: PDFBox
>       Issue Type: Bug
> Affects Versions: 2.0.0
>         Reporter: Steven Burg
>          Fix For: 2.0.0
>
>
> Attempted to run the RubberStampWithImage sample and received the following
> error:
> Exception in thread "main" java.lang.NullPointerException
>     at org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory.createFromStream(JPEGFactory.java:72)
>     at org.apache.pdfbox.examples.pdmodel.RubberStampWithImage.doIt(RubberStampWithImage.java:93)
>     at org.apache.pdfbox.examples.pdmodel.RubberStampWithImage.main(RubberStampWithImage.java:185)
> This happens with any jpg I tested with.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PDFBOX-1969) JPEGFactory bug

2014-03-14 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-1969:


Fix Version/s: 2.0.0



[jira] [Updated] (PDFBOX-1969) JPEGFactory bug

2014-03-14 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-1969:


Affects Version/s: 2.0.0



[jira] [Comment Edited] (PDFBOX-1847) TSA Time Signature

2014-03-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935714#comment-13935714
 ] 

John Hewson edited comment on PDFBOX-1847 at 3/14/14 10:02 PM:
---

1. Ok, will use Java's SHA-256
2. Ok, will use a 32-bit range instead.
3. :)
4. Ok, will remove the header
5. Yes, please do. The IOExceptions are fine, it's just that
ClassCastException should not be thrown by Bouncy Castle (though it could be
possible).
6. I would advise against using your own sources of entropy. SecureRandom is
already able to provide cryptographically secure random numbers, and the
document hash is almost certainly less random than the randomness produced by
SecureRandom, e.g. the hash is always the same when the document contents are
the same. In fact, if multiplying by the hash causes the value to overflow you
will actually lose entropy, resulting in lower security.
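
To illustrate point 6, here is a minimal sketch of drawing the nonce straight from SecureRandom with no extra entropy mixed in. It is not the code from the patch: the class and method names are made up for the example, and the 32-bit width is only taken from point 2.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

/** Sketch only: class and method names are hypothetical, not from the patch. */
final class NonceSketch {
    // One shared generator for all nonces.
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns a nonce drawn uniformly from [0, 2^32), i.e. a 32-bit range. */
    static BigInteger newNonce() {
        // BigInteger(numBits, rnd) yields a uniformly distributed value
        // in the range 0 .. 2^numBits - 1.
        return new BigInteger(32, RANDOM);
    }

    public static void main(String[] args) {
        System.out.println("nonce = " + newNonce());
    }
}
```

The value is never combined with the document hash, so nothing can reduce the randomness that SecureRandom already provides.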


> TSA Time Signature
> --
>
>              Key: PDFBOX-1847
>              URL: https://issues.apache.org/jira/browse/PDFBOX-1847
>          Project: PDFBox
>       Issue Type: Improvement
>       Components: Signing
> Affects Versions: 2.0.0
>         Reporter: vakhtang koroghlishvili
>         Assignee: John Hewson
>          Fix For: 2.0.0
>
>      Attachments: CreateSignature-updated.java.patch,
>                   TSATimeSignature.patch, resultOfSigning.jpg
>
>
> When we signed a document, we used our own local time. For more
> security we can use a Time Stamp server.
> "Trusted timestamping is the process of securely keeping track of the 
> creation and modification time of a document. Security here means that no one 
> — not even the owner of the document — should be able to change it once it 
> has been recorded provided that the timestamper's integrity is never 
> compromised."(wiki)





[jira] [Comment Edited] (PDFBOX-1847) TSA Time Signature

2014-03-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935714#comment-13935714
 ] 

John Hewson edited comment on PDFBOX-1847 at 3/14/14 10:03 PM:
---

1. Ok, will use Java's SHA-256
2. Ok, will use a 32-bit range instead.
3. :)
4. Ok, will remove the header
5. Yes, please do. The IOExceptions are fine, it's just that
ClassCastException should not be thrown by Bouncy Castle (though it could be
possible).
6. I would advise against using your own sources of entropy. SecureRandom is
already able to provide cryptographically secure random numbers, and the
document hash is almost certainly less random than the randomness produced by
SecureRandom, e.g. the hash is always the same when the document contents are
the same. In fact, if multiplying by the hash causes the nonce to overflow you
will actually lose entropy, resulting in less security.
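
The overflow point can be checked with a few lines of Java. This is only a sketch: mix() is a hypothetical stand-in for "multiply the nonce by the hash" and the hash value is arbitrary. Because int multiplication wraps modulo 2^32, an even multiplier discards the top bit of the nonce, so two different nonces give the same result.

```java
/** Sketch only: mix() is a hypothetical stand-in for "nonce times hash". */
final class OverflowSketch {
    static int mix(int nonce, int hash) {
        return nonce * hash; // Java int arithmetic wraps modulo 2^32
    }

    public static void main(String[] args) {
        int hash = 0x7f3a9c10;   // arbitrary even value standing in for a hash
        int a = 12345;
        int b = a ^ 0x80000000;  // differs from a only in the top bit
        // Both nonces collapse to the same product: one bit of entropy is gone.
        System.out.println(mix(a, hash) == mix(b, hash)); // prints true
    }
}
```

The more trailing zero bits the multiplier has, the more bits of the nonce are thrown away.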




[jira] [Comment Edited] (PDFBOX-1847) TSA Time Signature

2014-03-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935714#comment-13935714
 ] 

John Hewson edited comment on PDFBOX-1847 at 3/14/14 10:03 PM:
---

1. Ok, will use Java's SHA-256
2. Ok, will use a 32-bit range instead.
3. :)
4. Ok, will remove the header
5. Yes, please do. The IOExceptions are fine, it's just that
ClassCastException should not be thrown by Bouncy Castle (though it could be
possible).
6. I would advise against using your own sources of entropy. SecureRandom is
already able to provide cryptographically secure random numbers, and the
document hash is almost certainly less random than the randomness produced by
SecureRandom, e.g. the hash is always the same when the document contents are
the same. In fact, if multiplying by the hash causes the nonce to overflow you
will actually lose entropy, resulting in lower security.




[jira] [Comment Edited] (PDFBOX-1847) TSA Time Signature

2014-03-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935714#comment-13935714
 ] 

John Hewson edited comment on PDFBOX-1847 at 3/14/14 10:02 PM:
---

1. Ok, will use Java's SHA-256
2. Ok, will use a 32-bit range instead.
3. :)
4. Ok, will remove the header
5. Yes, please do. The IOExceptions are fine, it's just that
ClassCastException should not be thrown by Bouncy Castle (though it could be
possible).
6. I would advise against using your own sources of entropy. SecureRandom is
already able to provide cryptographically secure random numbers, and the
document hash is almost certainly less random than the randomness produced by
SecureRandom, e.g. the hash is always the same when the document contents are
the same. In fact, if multiplying by the hash causes the value to overflow you
will actually lose entropy, decreasing security.




[jira] [Comment Edited] (PDFBOX-1847) TSA Time Signature

2014-03-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935714#comment-13935714
 ] 

John Hewson edited comment on PDFBOX-1847 at 3/14/14 10:00 PM:
---

1. Ok, will use Java's SHA-256
2. Ok, will use a 32-bit range instead.
3. :)
4. Ok, will remove the header
5. Yes, please do. The IOExceptions are fine, it's just that
ClassCastException should not be thrown by Bouncy Castle (though it could be
possible).
6. I would advise against using your own sources of entropy. SecureRandom is
already able to provide cryptographically secure random numbers, and the
document hash is almost certainly less random than the randomness produced by
SecureRandom, e.g. the hash is always the same when the document contents are
the same. In fact, if multiplying by the hash causes the value to overflow you
will actually lose entropy and decrease security.




[jira] [Comment Edited] (PDFBOX-1847) TSA Time Signature

2014-03-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935714#comment-13935714
 ] 

John Hewson edited comment on PDFBOX-1847 at 3/14/14 9:59 PM:
--

1. Ok, will use Java's SHA-256
2. Ok, will use a 32-bit range instead.
3. :)
4. Ok, will remove the header
5. Yes, please do. The IOExceptions are fine, it's just that
ClassCastException should not be thrown by Bouncy Castle (though it could be
possible).
6. I would advise against using your own sources of entropy. SecureRandom is
already able to provide cryptographically secure random numbers, and the
document hash is almost certainly less random than the randomness produced by
SecureRandom, e.g. the hash is always the same when the document contents are
the same.




[jira] [Comment Edited] (PDFBOX-1847) TSA Time Signature

2014-03-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935714#comment-13935714
 ] 

John Hewson edited comment on PDFBOX-1847 at 3/14/14 9:54 PM:
--

1. Ok, will use Java's SHA-256
2. Ok, will use a 32-bit range instead.
3. :)
4. Ok, will remove the header
5. Yes, please do. The IOExceptions are fine, it's just that
ClassCastException should not be thrown by Bouncy Castle (though it could be
possible).
6. I would advise against using your own sources of entropy. SecureRandom is
already able to provide cryptographically secure random numbers, and the
document hash is almost certainly less random than the randomness produced by
SecureRandom.




[jira] [Comment Edited] (PDFBOX-1847) TSA Time Signature

2014-03-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935714#comment-13935714
 ] 

John Hewson edited comment on PDFBOX-1847 at 3/14/14 9:56 PM:
--

1. Ok, will use Java's SHA-256
2. Ok, will use a 32-bit range instead.
3. :)
4. Ok, will remove the header
5. Yes, please do. The IOExceptions are fine, it's just that
ClassCastException should not be thrown by Bouncy Castle (though it could be
possible).
6. I would advise against using your own sources of entropy. SecureRandom is
already able to provide cryptographically secure random numbers, and the
document hash is almost certainly less random than the randomness produced by
SecureRandom, as it is always the same when the document contents are the same.




[jira] [Comment Edited] (PDFBOX-1847) TSA Time Signature

2014-03-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935714#comment-13935714
 ] 

John Hewson edited comment on PDFBOX-1847 at 3/14/14 9:55 PM:
--

1. Ok, will use Java's SHA-256
2. Ok, will use a 32-bit range instead.
3. :)
4. Ok, will remove the header
5. Yes, please do. The IOExceptions are fine, it's just that
ClassCastException should not be thrown by Bouncy Castle (though it could be
possible).
6. I would advise against using your own sources of entropy. SecureRandom is
already able to provide cryptographically secure random numbers, and the
document hash is almost certainly less random than the randomness produced by
SecureRandom, as it is always the same when the document contents are the same.




[jira] [Commented] (PDFBOX-1847) TSA Time Signature

2014-03-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935714#comment-13935714
 ] 

John Hewson commented on PDFBOX-1847:
-

1. Ok, will use Java's SHA-256
2. Ok, will use a 32-bit range instead.
3. :)
4. Ok, will remove the header
5. Yes, please do. The IOExceptions are fine, it's just that
ClassCastException should not be thrown by Bouncy Castle (though it could be
possible).
6. I would advise against using your own sources of entropy. SecureRandom is
able to provide cryptographically secure random numbers, and the document hash
is almost certainly less random than the randomness produced by SecureRandom.
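
For point 1, "Java's SHA-256" presumably means the JDK's own MessageDigest rather than a third-party digest class. A minimal sketch, assuming that reading (the class and helper names are invented for the example and are not from the patch):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch only: class and method names are hypothetical, not from the patch. */
final class Sha256Sketch {
    /** Hashes the given bytes with the JDK's built-in SHA-256 implementation. */
    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    /** Renders a byte array as lowercase hex for display. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] digest = sha256("abc".getBytes(StandardCharsets.US_ASCII));
        System.out.println(toHex(digest));
    }
}
```

The 32-byte digest is what would be sent to the time stamp server as the message imprint.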



[jira] [Commented] (PDFBOX-1988) PDFBox ExtractText issue of PDF with no embedded fonts

2014-03-14 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935713#comment-13935713
 ] 

Tilman Hausherr commented on PDFBOX-1988:
-

Yeah, works great!

> PDFBox ExtractText issue of PDF with no embedded fonts
> --
>
>                Key: PDFBOX-1988
>                URL: https://issues.apache.org/jira/browse/PDFBOX-1988
>            Project: PDFBox
>         Issue Type: Bug
>         Components: Rendering, Text extraction
>   Affects Versions: 1.8.4
>        Environment: Windows 7
>                     Also, PASE on IBM i
>           Reporter: Craig Strong
>             Labels: patch
>            Fix For: 1.8.5, 2.0.0
>
>        Attachments: Test1.pdf
>
>  Original Estimate: 120h
> Remaining Estimate: 120h
>
> I have been using PDFBox 1.8.4 to extract text from several different PDF 
> files fine.  I use the latest PDFBox app with ExtractText command line.  
> There is one PDF that PDFBox (and iText) fails to extract any text even 
> though I can extract the text with Adobe Reader and also pdftotext.exe part 
> of XPdf.  "java -jar pdfbox-app-1.8.4.jar ExtractText Test1.pdf Out.txt".  I 
> don't want to have to rely on using pdftotext.exe from a PC since this is 
> part of an automated application.  I think the error relates to an unknown 
> font type and having to use the few fonts installed in the jar file.  I tried 
> running the API classes and trying to force a font from a certain location 
> but I still got errors.  I thought I loaded the font with the loadTTF method 
> but I don't know if that did anything with the font.  I would really like to 
> have this working straight from the ExtractText class anyway.
> Here are the errors I am getting.  I tried this from both a Windows 7 PC and 
> our IBM i in the PASE environment but I get the same errors.  The section 
> starting processEncodedText and on repeats a few times so I just included the 
> first entries.
>  
> Mar 10, 2014 3:50:44 PM org.apache.pdfbox.pdmodel.font.PDFontFactory createFont
> WARNING: Substituting TrueType for unknown font subtype=
> Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
> WARNING: java.lang.NullPointerException
> Throwable occurred: java.lang.NullPointerException
>     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.loadDescriptorDictionary(PDTrueTypeFont.java:375)
>     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.ensureFontDescriptor(PDTrueTypeFont.java:221)
>     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.<init>(PDTrueTypeFont.java:119)
>     at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:121)
>     at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:204)
>     at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:604)
>     at org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:54)
>     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
>     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>     at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)
>     at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381)
>     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
>     at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:275)
>     at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)
>     at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)
> Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine processEncodedText
> WARNING: java.lang.NullPointerException
> Throwable occurred: java.lang.NullPointerException
>     at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:355)
>     at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
>     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
>     at org.apache.pdfbox.util.P

[jira] [Commented] (PDFBOX-1969) JPEGFactory bug

2014-03-14 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935702#comment-13935702
 ] 

Tilman Hausherr commented on PDFBOX-1969:
-

I first isolated the "JPEG only" stuff to deal with shorter code.

I then applied a change similar to the one I made in CCITT. The test now works. 
An image-to-pdf test I did (temporarily modified the example to use that 
method) also worked. What I couldn't test is whether the alpha handling works, 
i.e. whether it is possible to put a semi-transparent JPEG in a PDF.

This was done in rev 1577734 and rev 1577735. John, please review this, if you 
want.
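
For readers unfamiliar with the "alpha thing": one common way to carry transparency in a PDF is to store the colour data and the alpha channel as two separate images, with the alpha acting as a soft mask. The thread does not say how JPEGFactory handles this, so the following is only a minimal sketch of the splitting step using java.awt. The class name AlphaSplit and its methods are hypothetical and are not PDFBox API.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

/** Hypothetical helper (not PDFBox API): splits an ARGB image into colour and alpha parts. */
final class AlphaSplit {
    private AlphaSplit() {}

    /** Copies the colour channels into an opaque RGB image; the alpha bits are dropped. */
    static BufferedImage rgbOf(BufferedImage argb) {
        BufferedImage rgb = new BufferedImage(
                argb.getWidth(), argb.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < argb.getHeight(); y++) {
            for (int x = 0; x < argb.getWidth(); x++) {
                rgb.setRGB(x, y, argb.getRGB(x, y));
            }
        }
        return rgb;
    }

    /** Copies the alpha channel into an 8-bit grayscale image (0 = transparent, 255 = opaque). */
    static BufferedImage alphaOf(BufferedImage argb) {
        BufferedImage gray = new BufferedImage(
                argb.getWidth(), argb.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster out = gray.getRaster();
        for (int y = 0; y < argb.getHeight(); y++) {
            for (int x = 0; x < argb.getWidth(); x++) {
                int alpha = argb.getRGB(x, y) >>> 24; // top 8 bits hold the alpha value
                out.setSample(x, y, 0, alpha);
            }
        }
        return gray;
    }
}
```

The colour image can then be JPEG-encoded as usual, while the grayscale image would be stored alongside it as the mask.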

> JPEGFactory bug
> ---
>
> Key: PDFBOX-1969
> URL: https://issues.apache.org/jira/browse/PDFBOX-1969
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Steven Burg
>
> Attempted to run the RubberStampWithImage sample and received the following 
> errors:
> Exception in thread "main" java.lang.NullPointerException
>     at org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory.createFromStream(JPEGFactory.java:72)
>     at org.apache.pdfbox.examples.pdmodel.RubberStampWithImage.doIt(RubberStampWithImage.java:93)
>     at org.apache.pdfbox.examples.pdmodel.RubberStampWithImage.main(RubberStampWithImage.java:185)
> This happens with any jpg I tested with.





[jira] [Commented] (PDFBOX-1988) PDFBox ExtractText issue of PDF with no embedded fonts

2014-03-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935699#comment-13935699
 ] 

John Hewson commented on PDFBOX-1988:
-

Tilman, I checked with my fix in revision 1577725 and 2.0 rendering is now good.

> PDFBox ExtractText issue of PDF with no embedded fonts
> --
>
> Key: PDFBOX-1988
> URL: https://issues.apache.org/jira/browse/PDFBOX-1988
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering, Text extraction
>Affects Versions: 1.8.4
> Environment: Windows 7
> Also, PASE on IBM i
>Reporter: Craig Strong
>  Labels: patch
> Fix For: 1.8.5, 2.0.0
>
> Attachments: Test1.pdf
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> I have been using PDFBox 1.8.4 to extract text from several different PDF 
> files fine.  I use the latest PDFBox app with ExtractText command line.  
> There is one PDF that PDFBox (and iText) fails to extract any text even 
> though I can extract the text with Adobe Reader and also pdftotext.exe part 
> of XPdf.  "java -jar pdfbox-app-1.8.4.jar ExtractText Test1.pdf Out.txt".  I 
> don't want to have to rely on using pdftotext.exe from a PC since this is 
> part of an automated application.  I think the error relates to an unknown 
> font type and having to use the few fonts installed in the jar file.  I tried 
> running the API classes and trying to force a font from a certain location 
> but I still got errors.  I thought I loaded the font with the loadTTF method 
> but I don't know if that did anything with the font.  I would really like to 
> have this working straight from the ExtractText class anyway.
> Here are the errors I am getting.  I tried this from both a Windows 7 PC and 
> our IBM i in the PASE environment but I get the same errors.  The section 
> starting processEncodedText and on repeats a few times so I just included the 
> first entries.
>  
> Mar 10, 2014 3:50:44 PM org.apache.pdfbox.pdmodel.font.PDFontFactory createFont
> WARNING: Substituting TrueType for unknown font subtype=
> Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
> WARNING: java.lang.NullPointerException
> Throwable occurred: java.lang.NullPointerException
>     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.loadDescriptorDictionary(PDTrueTypeFont.java:375)
>     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.ensureFontDescriptor(PDTrueTypeFont.java:221)
>     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.<init>(PDTrueTypeFont.java:119)
>     at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:121)
>     at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:204)
>     at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:604)
>     at org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:54)
>     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
>     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>     at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)
>     at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381)
>     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
>     at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:275)
>     at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)
>     at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)
> Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine processEncodedText
> WARNING: java.lang.NullPointerException
> Throwable occurred: java.lang.NullPointerException
>     at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:355)
>     at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
>     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.ja

[jira] [Issue Comment Deleted] (PDFBOX-1988) PDFBox ExtractText issue of PDF with no embedded fonts

2014-03-14 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1988:


Comment: was deleted

(was: {quote}
adding a line for Courier-Bold in the PDFBox_External_Fonts.properties file.
{quote}

Courier-Bold is one of the Standard 14 fonts, it's not an external font. Also 
it's a Type 1 font.)


[jira] [Comment Edited] (PDFBOX-1988) PDFBox ExtractText issue of PDF with no embedded fonts

2014-03-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935693#comment-13935693
 ] 

John Hewson edited comment on PDFBOX-1988 at 3/14/14 9:37 PM:
--

{quote}
adding a line for Courier-Bold in the PDFBox_External_Fonts.properties file.
{quote}

Courier-Bold is one of the Standard 14 fonts; it's not an external font. It's 
also a Type 1 font.


was (Author: jahewson):
{quote}
adding a line for Courier-Bold in the PDFBox_External_Fonts.properties file.
{quote}

Courier-Bold is one of the Standard 14 fonts, it's not an external font.
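
To make the point about the Standard 14 fonts concrete, here is a small lookup sketch. The fourteen names are the base font names defined by the PDF specification. The class Standard14 and its contains method are hypothetical helpers for illustration only, not PDFBox API.

```java
import java.util.Set;

/** Hypothetical helper (not PDFBox API): the base font names of the PDF Standard 14 fonts. */
final class Standard14 {
    private Standard14() {}

    static final Set<String> NAMES = Set.of(
            "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
            "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
            "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
            "Symbol", "ZapfDingbats");

    /** Returns true if the given base font name is one of the Standard 14 fonts. */
    static boolean contains(String baseFont) {
        return NAMES.contains(baseFont);
    }
}
```

A check like Standard14.contains("Courier-Bold") returns true, which is why an entry for it in an external-font mapping should not be needed.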


[jira] [Commented] (PDFBOX-1988) PDFBox ExtractText issue of PDF with no embedded fonts

2014-03-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935693#comment-13935693
 ] 

John Hewson commented on PDFBOX-1988:
-

{quote}
adding a line for Courier-Bold in the PDFBox_External_Fonts.properties file.
{quote}

Courier-Bold is one of the Standard 14 fonts, it's not an external font.


[jira] [Resolved] (PDFBOX-1988) PDFBox ExtractText issue of PDF with no embedded fonts

2014-03-14 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson resolved PDFBOX-1988.
-

   Resolution: Fixed
Fix Version/s: 2.0.0

Revision 1577682 adds a new getName method to COSDictionary for convenience and 
a new COSName.EMPTY value. It also does some cleaning up of COSName such as 
removing the JavaDoc which was not meaningful.

Revision 1577683 adds the ability to get a COSName from a COSDictionary for 
convenience.

I've made the fix to PDFontFactory in revision 1577725 in trunk and 1577728 in 
1.8.


[jira] [Reopened] (PDFBOX-1946) Running within an Applet has many AccessControlException 's

2014-03-14 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson reopened PDFBOX-1946:
-


I've encountered a case where this fix masks a bug in PDFBox. In PDFBOX-1988 a 
bug causes the font in PDFStreamEngine to be null, which should cause an NPE, 
causing the user to file a bug report. In 1.8.4 this happens, but in 1.8.5 and 
2.0 it does not, because it's been wrapped with if (font != null) so the NPE is 
silently swallowed.

We need to find some other way of fixing the null font issue, identifying 
exactly why it is null in an applet and perhaps just substituting a default 
font such as PDType1Font.HELVETICA so that null remains an invalid value. I've 
removed the fix from PDFStreamEngine in revision 1577725 in trunk and 1577728 
in 1.8, and am re-opening this issue.
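
The idea of substituting a default in one place while keeping null invalid everywhere else can be sketched roughly as follows. All names here (FontFallbackSketch, Font, DEFAULT_FONT, resolve, describe) are hypothetical stand-ins for illustration. They are not the PDFBox classes and this is not the change that was committed.

```java
import java.util.Objects;

/** Hypothetical stand-ins for illustration; these are not PDFBox classes. */
final class FontFallbackSketch {
    private FontFallbackSketch() {}

    /** Minimal font stand-in that only carries a name. */
    static final class Font {
        final String name;

        Font(String name) {
            this.name = name;
        }
    }

    /** Assumed default, playing the role of a Helvetica substitute. */
    static final Font DEFAULT_FONT = new Font("Helvetica");

    /** Resolution step: the only place that tolerates null, and it replaces it with the default. */
    static Font resolve(Font loaded) {
        return loaded != null ? loaded : DEFAULT_FONT;
    }

    /** Consumer: a null font is a bug here, so fail loudly instead of silently skipping the text. */
    static String describe(Font font, String text) {
        Objects.requireNonNull(font, "font must be resolved before text is processed");
        return font.name + ": " + text;
    }
}
```

With this split, a resource-loading problem yields a usable default, while a genuine bug that leaves the font unset still surfaces as a NullPointerException rather than being hidden behind an if (font != null) guard.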

> Running within an Applet has many AccessControlException 's
> ---
>
> Key: PDFBOX-1946
> URL: https://issues.apache.org/jira/browse/PDFBOX-1946
> Project: PDFBox
>  Issue Type: Wish
>Affects Versions: 1.8.4
> Environment: Running within an Applet
>Reporter: Fred Andrews
>  Labels: Security
> Fix For: 1.8.5, 2.0.0
>
> Attachments: patch.zip
>
>
> I've identified 6 modules that should be modified to avoid 
> AccessControlException's while running within an Applet.  My solution would 
> be to catch each AccessControlException and then use a default or continue 
> on.  For most of these, that is probably the best solution, for a few 
> especially PDFStreamEngine someone may have a better idea.
> The modules that have issues:
> pdfbox\pdfparser\BaseParser -- line 131 call to Boolean.getBoolean, line 170 
> call to Integer.getInteger
> pdfbox\util\PDFTextStripper -- line 79 call to System.getProperty()
> pdfbox\util\ResourceLoader -- line 67 call to getSystemClassLoader()
> pdfbox\pdmodel\graphics\color\PDColorState -- line 50, call to Color.getColor
> pdfbox/encoding/Encoding -- line 78, call to System.getProperty
> pdfbox\util\PDFStreamEngine -- Line 351 & 364 check for font == null (will be 
> null if had resource loading problems)
> Not sure what the best way is to proceed.  Please advise.
> Thanks
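
The "catch and use a default" approach proposed above for calls such as System.getProperty, Boolean.getBoolean and Integer.getInteger might look roughly like this. It is a sketch only: the class SafeProperties and its methods are hypothetical, not part of PDFBox or of the attached patch. It catches SecurityException, of which AccessControlException is a subclass.

```java
import java.util.function.Supplier;

/** Hypothetical helper (not PDFBox API): reads system properties without failing in a sandbox. */
final class SafeProperties {
    private SafeProperties() {}

    /** Returns the looked-up value, or the fallback if access is denied or the value is null. */
    static <T> T orDefault(Supplier<T> lookup, T fallback) {
        try {
            T value = lookup.get();
            return value != null ? value : fallback;
        } catch (SecurityException e) {
            // AccessControlException extends SecurityException, so applet denials land here.
            return fallback;
        }
    }

    static String getProperty(String key, String fallback) {
        return orDefault(() -> System.getProperty(key), fallback);
    }

    /** Boolean.getBoolean never returns null, so the fallback only applies when access is denied. */
    static boolean getBoolean(String key, boolean fallback) {
        return orDefault(() -> Boolean.getBoolean(key), fallback);
    }

    static int getInteger(String key, int fallback) {
        return orDefault(() -> Integer.getInteger(key), fallback);
    }
}
```

Funnelling every lookup through one helper keeps the fallback policy in a single place instead of scattering try/catch blocks over the six modules listed.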





[jira] [Comment Edited] (PDFBOX-1988) PDFBox ExtractText issue of PDF with no embedded fonts

2014-03-14 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935533#comment-13935533
 ] 

Tilman Hausherr edited comment on PDFBOX-1988 at 3/14/14 8:15 PM:
--

Text extraction always works with the 2.0 version.

Rendering is bad with the 2.0 version. I'm able to render the file properly 
with 2.0 by making two changes:

- adding 
{code}
dic.setName( COSName.TYPE, COSName.TRUE_TYPE.toString() );
{code}
after the "Substituting TrueType for unknown font subtype=" warning

(I'm not committing this because I prefer to have another opinion. The idea is 
that if we are to fake a TT font, that we should attach the type)

- adding a line for Courier-Bold in the PDFBox_External_Fonts.properties file.



was (Author: tilman):
Rendering is also bad.

> PDFBox ExtractText issue of PDF with no embedded fonts
> --
>
> Key: PDFBOX-1988
> URL: https://issues.apache.org/jira/browse/PDFBOX-1988
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering, Text extraction
>Affects Versions: 1.8.4
> Environment: Windows 7
> Also, PASE on IBM i
>Reporter: Craig Strong
>  Labels: patch
> Fix For: 1.8.5
>
> Attachments: Test1.pdf
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>

[jira] [Commented] (PDFBOX-1988) PDFBox ExtractText issue of PDF with no embedded fonts

2014-03-14 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935533#comment-13935533
 ] 

Tilman Hausherr commented on PDFBOX-1988:
-

Rendering is also bad.

> PDFBox ExtractText issue of PDF with no embedded fonts
> --
>
> Key: PDFBOX-1988
> URL: https://issues.apache.org/jira/browse/PDFBOX-1988
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering, Text extraction
>Affects Versions: 1.8.4
> Environment: Windows 7
> Also, PASE on IBM i
>Reporter: Craig Strong
>  Labels: patch
> Fix For: 1.8.5
>
> Attachments: Test1.pdf
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> I have been using PDFBox 1.8.4 to extract text from several different PDF 
> files fine.  I use the latest PDFBox app with ExtractText command line.  
> There is one PDF that PDFBox (and iText) fails to extract any text even 
> though I can extract the text with Adobe Reader and also pdftotext.exe part 
> of XPdf.  "java -jar pdfbox-app-1.8.4.jar ExtractText Test1.pdf Out.txt".  I 
> don't want to have to rely on using pdftotext.exe from a PC since this is 
> part of an automated application.  I think the error relates to an unknown 
> font type and having to use the few fonts installed in the jar file.  I tried 
> running the API classes and trying to force a font from a certain location 
> but I still got errors.  I thought I loaded the font with the loadTTF method 
> but I don't know if that did anything with the font.  I would really like to 
> have this working straight from the ExtractText class anyway.
> Here are the errors I am getting.  I tried this from both a Windows 7 PC and 
> our IBM i in the PASE environment but I get the same errors.  The section 
> starting processEncodedText and on repeats a few times so I just included the 
> first entries.
>  
> Mar 10, 2014 3:50:44 PM org.apache.pdfbox.pdmodel.font.PDFontFactory 
> createFont   
> WARNING: Substituting TrueType for unknown font subtype=  
> 
> Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine 
> processOperator
> WARNING: java.lang.NullPointerException   
> 
> Throwable occurred: java.lang.NullPointerException
> 
> at 
> org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.loadDescriptorDictionary(PDTrueTypeFont.java:375)
> at 
> org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.ensureFontDescriptor(PDTrueTypeFont.java:221)
> 
> at 
> org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.<init>(PDTrueTypeFont.java:119)
>
> at 
> org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:121)
>   
> at 
> org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:204)  
>
> at 
> org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:604) 
>
> at 
> org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:54)  
>
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
>  
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>
> at 
> org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)  
>
> at 
> org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381) 
>
> at 
> org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
>
> at 
> org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:275)   
>
> at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)
>   
> at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)  
>   
> Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine 
> processEncodedText   
> WARNING: java.lang.NullPointerException   
>   
> Throwable occurred: java.lang.NullPointerException
> 
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:355)
> at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45) 
> 
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
>
> at 
> org.apache.pdfbox.util.PDFS

[jira] [Updated] (PDFBOX-1988) PDFBox ExtractText issue of PDF with no embedded fonts

2014-03-14 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-1988:


Component/s: Rendering

> PDFBox ExtractText issue of PDF with no embedded fonts
> --
>
> Key: PDFBOX-1988
> URL: https://issues.apache.org/jira/browse/PDFBOX-1988
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering, Text extraction
>Affects Versions: 1.8.4
> Environment: Windows 7
> Also, PASE on IBM i
>Reporter: Craig Strong
>  Labels: patch
> Fix For: 1.8.5
>
> Attachments: Test1.pdf
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> I have been using PDFBox 1.8.4 to extract text from several different PDF 
> files fine.  I use the latest PDFBox app with ExtractText command line.  
> There is one PDF that PDFBox (and iText) fails to extract any text even 
> though I can extract the text with Adobe Reader and also pdftotext.exe part 
> of XPdf.  "java -jar pdfbox-app-1.8.4.jar ExtractText Test1.pdf Out.txt".  I 
> don't want to have to rely on using pdftotext.exe from a PC since this is 
> part of an automated application.  I think the error relates to an unknown 
> font type and having to use the few fonts installed in the jar file.  I tried 
> running the API classes and trying to force a font from a certain location 
> but I still got errors.  I thought I loaded the font with the loadTTF method 
> but I don't know if that did anything with the font.  I would really like to 
> have this working straight from the ExtractText class anyway.
> Here are the errors I am getting.  I tried this from both a Windows 7 PC and 
> our IBM i in the PASE environment but I get the same errors.  The section 
> starting processEncodedText and on repeats a few times so I just included the 
> first entries.
>  
> Mar 10, 2014 3:50:44 PM org.apache.pdfbox.pdmodel.font.PDFontFactory 
> createFont   
> WARNING: Substituting TrueType for unknown font subtype=  
> 
> Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine 
> processOperator
> WARNING: java.lang.NullPointerException   
> 
> Throwable occurred: java.lang.NullPointerException
> 
> at 
> org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.loadDescriptorDictionary(PDTrueTypeFont.java:375)
> at 
> org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.ensureFontDescriptor(PDTrueTypeFont.java:221)
> 
> at 
> org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.<init>(PDTrueTypeFont.java:119)
>
> at 
> org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:121)
>   
> at 
> org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:204)  
>
> at 
> org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:604) 
>
> at 
> org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:54)  
>
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
>  
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>
> at 
> org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)  
>
> at 
> org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381) 
>
> at 
> org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
>
> at 
> org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:275)   
>
> at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)
>   
> at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)  
>   
> Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine 
> processEncodedText   
> WARNING: java.lang.NullPointerException   
>   
> Throwable occurred: java.lang.NullPointerException
> 
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:355)
> at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45) 
> 
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
>
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268

[jira] [Commented] (PDFBOX-1975) Improve TestImageIOUtils unit tests to check image resolution and compression

2014-03-14 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935457#comment-13935457
 ] 

Tilman Hausherr commented on PDFBOX-1975:
-

I changed the POM files to use a different repository, which I found here:
http://stackoverflow.com/a/8923663/535646
https://github.com/stain/jai-imageio-core
Reason: [~msahyoun] sent me three JPX test files that the current jai_imageio 
couldn't render (the output was blank). The new one does render the files. 
(For production, you should use the three JPedal files I mentioned in 
PDFBOX-1752 instead; these are not available through Maven.) 
This was done in rev. 1577658. The three JPX files were added in rev. 1577661.

> Improve TestImageIOUtils unit tests to check image resolution and compression
> -
>
> Key: PDFBOX-1975
> URL: https://issues.apache.org/jira/browse/PDFBOX-1975
> Project: PDFBox
>  Issue Type: Task
>  Components: Utilities
>Affects Versions: 2.0.0
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Minor
>  Labels: imageio, test, tiff
> Fix For: 2.0.0
>
>
> Because of the problems with recent changes (see PDFBOX-1963), I will improve 
> the unit tests so that image resolution and compression is checked.
> I found out that JPEGs don't have a resolution, BMP had the wrong resolution. 
> The fault wasn't in the java TIFF writer as I thought before, it is in the 
> java PNG writer, which uses the PixelSize values wrongly, i.e. it interprets 
> them as "pixels per mm" instead of "mm per pixel" as per specification. The 
> JPEG writer throws an exception "JFIF APP0 must be first marker after SOI". 
> The BMP writer can set the resolution, but the BMP reader doesn't read it.
> (Some of this might be different depending on the version)
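
To make the unit mix-up described above concrete, here is a small sketch of the
two conversions. It is not PDFBox code; the class and method names are made up
for illustration. The standard javax_imageio_1.0 metadata format defines
HorizontalPixelSize and VerticalPixelSize as the size of one pixel in
millimeters, so a 300 dpi image should carry a value of roughly 0.0847, whereas
a writer that treats the field as "pixels per mm" would expect about 11.81.

```java
// Hypothetical helper, for illustration only (not part of PDFBox or ImageIO).
final class PixelSizeSketch {
    static final double MM_PER_INCH = 25.4;

    // Value the specification asks for: millimeters covered by one pixel.
    static double mmPerPixel(int dpi) {
        return MM_PER_INCH / dpi;
    }

    // Value a writer would need if it (wrongly) read the field as pixels per mm.
    static double pixelsPerMm(int dpi) {
        return dpi / MM_PER_INCH;
    }

    public static void main(String[] args) {
        int dpi = 300;
        System.out.printf("mm per pixel at %d dpi: %.4f%n", dpi, mmPerPixel(dpi));
        System.out.printf("pixels per mm at %d dpi: %.4f%n", dpi, pixelsPerMm(dpi));
    }
}
```

The two values are reciprocals of each other, which is why a resolution check
in a unit test catches the mistake right away: the resolution read back is off
by orders of magnitude rather than by a rounding error.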



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PDFBOX-1975) Improve TestImageIOUtils unit tests to check image resolution and compression

2014-03-14 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935432#comment-13935432
 ] 

Tilman Hausherr commented on PDFBOX-1975:
-

I added a CCITT G4 compressed file as yet another test file in rev 1577653.

> Improve TestImageIOUtils unit tests to check image resolution and compression
> -
>
> Key: PDFBOX-1975
> URL: https://issues.apache.org/jira/browse/PDFBOX-1975
> Project: PDFBox
>  Issue Type: Task
>  Components: Utilities
>Affects Versions: 2.0.0
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Minor
>  Labels: imageio, test, tiff
> Fix For: 2.0.0
>
>
> Because of the problems with recent changes (see PDFBOX-1963), I will improve 
> the unit tests so that image resolution and compression is checked.
> I found out that JPEGs don't have a resolution, BMP had the wrong resolution. 
> The fault wasn't in the java TIFF writer as I thought before, it is in the 
> java PNG writer, which uses the PixelSize values wrongly, i.e. it interprets 
> them as "pixels per mm" instead of "mm per pixel" as per specification. The 
> JPEG writer throws an exception "JFIF APP0 must be first marker after SOI". 
> The BMP writer can set the resolution, but the BMP reader doesn't read it.
> (Some of this might be different depending on the version)





[jira] [Commented] (PDFBOX-1988) PDFBox ExtractText issue of PDF with no embedded fonts

2014-03-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935427#comment-13935427
 ] 

John Hewson commented on PDFBOX-1988:
-

The attached file uses the "Courier Bold" font, one of the "standard 14" fonts, 
which do not need to be embedded, so this looks like a bug in PDFBox rather 
than a font embedding issue.
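
For reference, the "standard 14" fonts mentioned above are a fixed list of
PostScript font names, so checking a font against it is a plain lookup. The
sketch below is only an illustration: the class and method names are made up,
it does not use PDFBox, and it assumes the font is identified by its PostScript
name (e.g. Courier-Bold), which may not be how the attached file spells it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, for illustration only (not part of PDFBox).
final class Standard14Sketch {
    // The 14 standard PDF fonts, by PostScript name.
    static final Set<String> STANDARD_14 = Collections.unmodifiableSet(
        new HashSet<>(Arrays.asList(
            "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
            "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
            "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
            "Symbol", "ZapfDingbats")));

    // True if the given base font name is one of the standard 14.
    static boolean isStandard14(String baseFontName) {
        return baseFontName != null && STANDARD_14.contains(baseFontName);
    }

    public static void main(String[] args) {
        System.out.println(isStandard14("Courier-Bold"));
        System.out.println(isStandard14("Arial"));
    }
}
```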

> PDFBox ExtractText issue of PDF with no embedded fonts
> --
>
> Key: PDFBOX-1988
> URL: https://issues.apache.org/jira/browse/PDFBOX-1988
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 1.8.4
> Environment: Windows 7
> Also, PASE on IBM i
>Reporter: Craig Strong
>  Labels: patch
> Fix For: 1.8.5
>
> Attachments: Test1.pdf
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> I have been using PDFBox 1.8.4 to extract text from several different PDF 
> files fine.  I use the latest PDFBox app with ExtractText command line.  
> There is one PDF that PDFBox (and iText) fails to extract any text even 
> though I can extract the text with Adobe Reader and also pdftotext.exe part 
> of XPdf.  "java -jar pdfbox-app-1.8.4.jar ExtractText Test1.pdf Out.txt".  I 
> don't want to have to rely on using pdftotext.exe from a PC since this is 
> part of an automated application.  I think the error relates to an unknown 
> font type and having to use the few fonts installed in the jar file.  I tried 
> running the API classes and trying to force a font from a certain location 
> but I still got errors.  I thought I loaded the font with the loadTTF method 
> but I don't know if that did anything with the font.  I would really like to 
> have this working straight from the ExtractText class anyway.
> Here are the errors I am getting.  I tried this from both a Windows 7 PC and 
> our IBM i in the PASE environment but I get the same errors.  The section 
> starting processEncodedText and on repeats a few times so I just included the 
> first entries.
>  
> Mar 10, 2014 3:50:44 PM org.apache.pdfbox.pdmodel.font.PDFontFactory 
> createFont   
> WARNING: Substituting TrueType for unknown font subtype=  
> 
> Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine 
> processOperator
> WARNING: java.lang.NullPointerException   
> 
> Throwable occurred: java.lang.NullPointerException
> 
> at 
> org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.loadDescriptorDictionary(PDTrueTypeFont.java:375)
> at 
> org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.ensureFontDescriptor(PDTrueTypeFont.java:221)
> 
> at 
> org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.<init>(PDTrueTypeFont.java:119)
>
> at 
> org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:121)
>   
> at 
> org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:204)  
>
> at 
> org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:604) 
>
> at 
> org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:54)  
>
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
>  
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>
> at 
> org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)  
>
> at 
> org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381) 
>
> at 
> org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
>
> at 
> org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:275)   
>
> at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)
>   
> at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)  
>   
> Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine 
> processEncodedText   
> WARNING: java.lang.NullPointerException   
>   
> Throwable occurred: java.lang.NullPointerException
> 
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:355)
> at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45) 
>   

[jira] [Reopened] (PDFBOX-1969) JPEGFactory bug

2014-03-14 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr reopened PDFBOX-1969:
-


While writing the test for the CCITTFactory (PDFBOX-1983), I thought I'd write 
a test for this class as well, before noticing that there was already a test in 
the example package.

The test for createFromImage() failed with the now familiar "Stream was not 
read" exception. I'll correct that one this weekend unless someone else does it 
first.

I remember that someone said on the dev list that the signature images didn't 
appear. This might be caused by this bug.

In the meantime, I have committed a first version of the test in rev 1577622, 
with only one of two tests active. 

> JPEGFactory bug
> ---
>
> Key: PDFBOX-1969
> URL: https://issues.apache.org/jira/browse/PDFBOX-1969
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Steven Burg
>
> Attempted to run the RubberStampWithImage sample and received the following 
> errors:
> Exception in thread "main" java.lang.NullPointerException
>at 
> org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory.createFromStream(JPEGFactory.java:72)
>at 
> org.apache.pdfbox.examples.pdmodel.RubberStampWithImage.doIt(RubberStampWithImage.java:93)
>at 
> org.apache.pdfbox.examples.pdmodel.RubberStampWithImage.main(RubberStampWithImage.java:185)
> This happens with any jpg I tested with.





[jira] [Comment Edited] (PDFBOX-1969) JPEGFactory bug

2014-03-14 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935328#comment-13935328
 ] 

Tilman Hausherr edited comment on PDFBOX-1969 at 3/14/14 5:52 PM:
--

While writing the test for the CCITTFactory (PDFBOX-1983), I thought I'd write 
a test for this class as well, before reading here / noticing that there was 
already a test in the example package.

The test for createFromImage() failed with the now familiar "Stream was not 
read" exception. I'll correct that one this weekend unless someone else does it 
first.

I remember that someone said on the dev list that the signature images didn't 
appear. This might be caused by this bug.

In the meantime, I have committed a first version of the test in rev 1577622, 
with only one of two tests active. 


was (Author: tilman):
While writing the test for the CCITTFactory (PDFBOX-1983), I thought I'd write 
a test for this class as well, before noticing that there was already a test in 
the example package.

The test for createFromImage() failed with the now familiar "Stream was not 
read" exception. I'll correct that one this weekend unless someone else does it 
first.

I remember that someone said on the dev list that the signature images didn't 
appear. This might be caused by this bug.

In the meantime, I have committed a first version of the test in rev 1577622, 
with only one of two tests active. 

> JPEGFactory bug
> ---
>
> Key: PDFBOX-1969
> URL: https://issues.apache.org/jira/browse/PDFBOX-1969
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Steven Burg
>
> Attempted to run the RubberStampWithImage sample and received the following 
> errors:
> Exception in thread "main" java.lang.NullPointerException
>at 
> org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory.createFromStream(JPEGFactory.java:72)
>at 
> org.apache.pdfbox.examples.pdmodel.RubberStampWithImage.doIt(RubberStampWithImage.java:93)
>at 
> org.apache.pdfbox.examples.pdmodel.RubberStampWithImage.main(RubberStampWithImage.java:185)
> This happens with any jpg I tested with.





[jira] [Comment Edited] (PDFBOX-1983) Unable to add TIF images, CCITTFactory not working

2014-03-14 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935295#comment-13935295
 ] 

Tilman Hausherr edited comment on PDFBOX-1983 at 3/14/14 5:34 PM:
--

I added a test in rev 1577618 and 1577619. Note the part at the end of the 
code, it's unclear to me if ximage.getImage() should return an image. 
(Currently it doesn't)


was (Author: tilman):
I added a test in rev 1577618. Note the part at the end of the code, it's 
unclear to me if ximage.getImage() should return an image. (Currently it 
doesn't)

> Unable to add TIF images, CCITTFactory not working
> --
>
> Key: PDFBOX-1983
> URL: https://issues.apache.org/jira/browse/PDFBOX-1983
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.0
>Reporter: Joel Kääpä
>Assignee: Tilman Hausherr
> Fix For: 2.0.0
>
> Attachments: G4.tif, huhu.pdf
>
>
> As used in the AddImageToPDF example, the following line generates an error 
> with tif image:
> PDImageXObject ximage =  CCITTFactory.createFromRandomAccess(document, new 
> RandomAccessFile(new File(imagePath), "r"));
> java.io.IOException: Stream was not read
> at org.apache.pdfbox.cos.COSStream.getDecodeResult(COSStream.java:235)
> at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:80)
> at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:70)
> at 
> org.apache.pdfbox.pdmodel.graphics.image.CCITTFactory.createFromRandomAccess(CCITTFactory.java:50)





[jira] [Commented] (PDFBOX-1983) Unable to add TIF images, CCITTFactory not working

2014-03-14 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935295#comment-13935295
 ] 

Tilman Hausherr commented on PDFBOX-1983:
-

I added a test in rev 1577618. Note the part at the end of the code, it's 
unclear to me if ximage.getImage() should return an image. (Currently it 
doesn't)

> Unable to add TIF images, CCITTFactory not working
> --
>
> Key: PDFBOX-1983
> URL: https://issues.apache.org/jira/browse/PDFBOX-1983
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.0
>Reporter: Joel Kääpä
>Assignee: Tilman Hausherr
> Fix For: 2.0.0
>
> Attachments: G4.tif, huhu.pdf
>
>
> As used in the AddImageToPDF example, the following line generates an error 
> with tif image:
> PDImageXObject ximage =  CCITTFactory.createFromRandomAccess(document, new 
> RandomAccessFile(new File(imagePath), "r"));
> java.io.IOException: Stream was not read
> at org.apache.pdfbox.cos.COSStream.getDecodeResult(COSStream.java:235)
> at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:80)
> at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:70)
> at 
> org.apache.pdfbox.pdmodel.graphics.image.CCITTFactory.createFromRandomAccess(CCITTFactory.java:50)





[jira] [Updated] (PDFBOX-1988) PDFBox ExtractText issue of PDF with no embedded fonts

2014-03-14 Thread Craig Strong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Strong updated PDFBOX-1988:
-

Attachment: Test1.pdf

Problem PDF with no embedded fonts.

> PDFBox ExtractText issue of PDF with no embedded fonts
> --
>
> Key: PDFBOX-1988
> URL: https://issues.apache.org/jira/browse/PDFBOX-1988
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 1.8.4
> Environment: Windows 7
> Also, PASE on IBM i
>Reporter: Craig Strong
>  Labels: patch
> Fix For: 1.8.5
>
> Attachments: Test1.pdf
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> I have been using PDFBox 1.8.4 to extract text from several different PDF 
> files fine.  I use the latest PDFBox app with ExtractText command line.  
> There is one PDF that PDFBox (and iText) fails to extract any text even 
> though I can extract the text with Adobe Reader and also pdftotext.exe part 
> of XPdf.  "java -jar pdfbox-app-1.8.4.jar ExtractText Test1.pdf Out.txt".  I 
> don't want to have to rely on using pdftotext.exe from a PC since this is 
> part of an automated application.  I think the error relates to an unknown 
> font type and having to use the few fonts installed in the jar file.  I tried 
> running the API classes and trying to force a font from a certain location 
> but I still got errors.  I thought I loaded the font with the loadTTF method 
> but I don't know if that did anything with the font.  I would really like to 
> have this working straight from the ExtractText class anyway.
> Here are the errors I am getting.  I tried this from both a Windows 7 PC and 
> our IBM i in the PASE environment but I get the same errors.  The section 
> starting processEncodedText and on repeats a few times so I just included the 
> first entries.
>  
> Mar 10, 2014 3:50:44 PM org.apache.pdfbox.pdmodel.font.PDFontFactory 
> createFont   
> WARNING: Substituting TrueType for unknown font subtype=  
> 
> Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine 
> processOperator
> WARNING: java.lang.NullPointerException   
> 
> Throwable occurred: java.lang.NullPointerException
> 
> at 
> org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.loadDescriptorDictionary(PDTrueTypeFont.java:375)
> at 
> org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.ensureFontDescriptor(PDTrueTypeFont.java:221)
> 
> at 
> org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.<init>(PDTrueTypeFont.java:119)
>
> at 
> org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:121)
>   
> at 
> org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:204)  
>
> at 
> org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:604) 
>
> at 
> org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:54)  
>
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
>  
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>
> at 
> org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)  
>
> at 
> org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381) 
>
> at 
> org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
>
> at 
> org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:275)   
>
> at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)
>   
> at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)  
>   
> Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine 
> processEncodedText   
> WARNING: java.lang.NullPointerException   
>   
> Throwable occurred: java.lang.NullPointerException
> 
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:355)
> at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45) 
> 
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
>
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFSt

[jira] [Created] (PDFBOX-1988) PDFBox ExtractText issue of PDF with no embedded fonts

2014-03-14 Thread Craig Strong (JIRA)
Craig Strong created PDFBOX-1988:


 Summary: PDFBox ExtractText issue of PDF with no embedded fonts
 Key: PDFBOX-1988
 URL: https://issues.apache.org/jira/browse/PDFBOX-1988
 Project: PDFBox
  Issue Type: Bug
  Components: Text extraction
Affects Versions: 1.8.4
 Environment: Windows 7
Also, PASE on IBM i
Reporter: Craig Strong
 Fix For: 1.8.5


I have been using PDFBox 1.8.4 to extract text from several different PDF files 
fine.  I use the latest PDFBox app with ExtractText command line.  There is one 
PDF that PDFBox (and iText) fails to extract any text even though I can extract 
the text with Adobe Reader and also pdftotext.exe part of XPdf.  "java -jar 
pdfbox-app-1.8.4.jar ExtractText Test1.pdf Out.txt".  I don't want to have to 
rely on using pdftotext.exe from a PC since this is part of an automated 
application.  I think the error relates to an unknown font type and having to 
use the few fonts installed in the jar file.  I tried running the API classes 
and trying to force a font from a certain location but I still got errors.  I 
thought I loaded the font with the loadTTF method but I don't know if that did 
anything with the font.  I would really like to have this working straight from 
the ExtractText class anyway.
Here are the errors I am getting.  I tried this from both a Windows 7 PC and 
our IBM i in the PASE environment but I get the same errors.  The section 
starting processEncodedText and on repeats a few times so I just included the 
first entries.

 

Mar 10, 2014 3:50:44 PM org.apache.pdfbox.pdmodel.font.PDFontFactory createFont
WARNING: Substituting TrueType for unknown font subtype=
Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
WARNING: java.lang.NullPointerException
Throwable occurred: java.lang.NullPointerException
    at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.loadDescriptorDictionary(PDTrueTypeFont.java:375)
    at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.ensureFontDescriptor(PDTrueTypeFont.java:221)
    at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.<init>(PDTrueTypeFont.java:119)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:121)
    at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:204)
    at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:604)
    at org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:54)
    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
    at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:275)
    at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)
    at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)
Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine processEncodedText
WARNING: java.lang.NullPointerException
Throwable occurred: java.lang.NullPointerException
    at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:355)
    at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
at org.apache.pdfbo

[jira] [Commented] (PDFBOX-1847) TSA Time Signature

2014-03-14 Thread vakhtang koroghlishvili (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934946#comment-13934946
 ] 

vakhtang koroghlishvili commented on PDFBOX-1847:
-

1. We can do it with Java's built-in "SHA-256" message digest too. :) You can 
switch to it; there is no difference. :)
2. It is just a number, nothing special. It would be better if we added the 
document hash to the nonce too.
3. You are right.
4. I was testing TSA for emails. We should remove this: if we have nothing in 
common with emails, we shouldn't use this header. 
5. I don't remember... I will test it... In addition, in.readObject() might 
throw java.io.EOFException or java.io.IOException, so the method needs some 
refactoring. :) 
6. Sure. Then I will add the document hash to the nonce too, for more security. :) 
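
As a rough illustration of points 1 and 2, the sketch below computes a SHA-256
digest with the JDK's own java.security.MessageDigest and generates a random
nonce. It is not the code from the patch: the class and method names are
hypothetical, the 64-bit nonce length is an arbitrary choice, and how the
digest and nonce are placed into the timestamp request is left out.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

// Hypothetical helper, for illustration only (not the code from the patch).
final class TsaDigestSketch {
    // SHA-256 is available in every Java platform implementation.
    static byte[] sha256(byte[] content) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(content);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Lower-case hex rendering of a byte array, for logging and comparison.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Non-negative random nonce of at most 64 bits (length chosen arbitrarily).
    static BigInteger randomNonce() {
        return new BigInteger(64, new SecureRandom());
    }

    public static void main(String[] args) {
        byte[] digest = sha256("abc".getBytes(StandardCharsets.US_ASCII));
        System.out.println(toHex(digest));
        System.out.println(randomNonce());
    }
}
```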


> TSA Time Signature
> --
>
> Key: PDFBOX-1847
> URL: https://issues.apache.org/jira/browse/PDFBOX-1847
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Signing
>Affects Versions: 2.0.0
>Reporter: vakhtang koroghlishvili
>Assignee: John Hewson
> Fix For: 2.0.0
>
> Attachments: CreateSignature-updated.java.patch, 
> TSATimeSignature.patch, resultOfSigning.jpg
>
>
> When we were signing a document, we were using our own local time. For more 
> security we can use a Time Stamp server. 
> "Trusted timestamping is the process of securely keeping track of the 
> creation and modification time of a document. Security here means that no one 
> — not even the owner of the document — should be able to change it once it 
> has been recorded provided that the timestamper's integrity is never 
> compromised."(wiki)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (PDFBOX-1847) TSA Time Signature

2014-03-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931975#comment-13931975
 ] 

John Hewson edited comment on PDFBOX-1847 at 3/14/14 9:14 AM:
--

Just to recap our discussion on the mailing list: this patch is going to go 
into a new signing module:

{quote}
*Vakhtang Koroghlishvili:*
I think it's time to create another project named sign-box or something like 
that. At the moment I have time and I can create that project with a very good 
architecture and show you a patch, or committers can create that project 
with the existing features and then we will add new features step by step.
{quote}

and the existing non-core signing classes will be moved there too.

For anyone wondering why we would want to move the signing classes, one reason 
is that not all of them are part of the PDF specification. The spec defines a 
generic plug-in framework for signatures with some basic built-in 
functionality, but nothing beyond that. Instead, there are various third party 
signing standards, such as [PAdES|http://en.wikipedia.org/wiki/PAdES].



[jira] [Comment Edited] (PDFBOX-1847) TSA Time Signature

2014-03-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934798#comment-13934798
 ] 

John Hewson edited comment on PDFBOX-1847 at 3/14/14 9:12 AM:
--

I'm in the process of adding this patch and making the necessary changes to 
PDFBox. I have some questions and feedback which need to be discussed:

# In CreateSignature you define SHA_256_IDENTIFIER which is passed to the 
getTimeStampAttribute method of TSACreator, which means the message digest 
which gets created is:
{code}
MessageDigest.getInstance("2.16.840.1.101.3.4.2.1", new BouncyCastleProvider());
{code}
Why not use Java's built-in "SHA-256" message digest?

# In TSACreator, the getTimeStampToken method generates a nonce as follows:
{code}
BigInteger nonce = TSAUtils.generateNonce(0, 97359710);
{code}
Where does the magic value of 97359710 come from? Is it special?

# In TSACreator, creation of the nonce is immediately followed by:
{code}
log.log(Level.INFO, "nonce is" + nonce);
{code}
Which leaks the nonce to the log file before it has been sent to the server. I 
have fixed this, but I still wanted to let you know about it.

# In TSACreator, the openTSAConnection method sets the following HTTP header:
{code}
connection.setRequestProperty("Content-Transfer-Encoding", "binary");
{code}
But "Content-Transfer-Encoding" is a MIME header for e-mails, not HTTP 
requests. Is there some reason why it's there?

# In TSAUtils, the byteToASN1Object method has a try/catch for 
ClassCastException, but I don't see why this exception needs catching. Does 
Bouncy Castle throw it? (It shouldn't, but it is known to do things like that.)

# In TSAUtils, the generateNonce method is as follows:
{code}
public static BigInteger generateNonce(int min, int max)
{
    Random rand = new Random();
    Integer randomNum = rand.nextInt((max - min) + 1) + min;
    BigInteger nonce = new BigInteger(randomNum.toString());

    Long timeLong = System.currentTimeMillis();
    Integer timeInt = timeLong != null ? timeLong.intValue() : 761820123;

    return nonce.multiply(new BigInteger(timeInt.toString()));
}
{code}
There are some problems with this approach. The first is that 
_java.util.Random_ is not cryptographically secure ([see 
documentation|http://docs.oracle.com/javase/7/docs/api/java/util/Random.html]). 
The second is that the current time is not (on its own) a source of 
cryptographically secure entropy. Furthermore, it's already included in the seed 
for _java.util.Random_, which means that multiplying the random nonce by the 
current time cannot improve the nonce. I will fix these issues by using 
SecureRandom, but I wanted to let you know about them.

Thanks!
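
One possible shape for a SecureRandom-based nonce is sketched here. It is not the committed fix: the class name NonceSketch and the 64-bit length are assumptions chosen only for illustration.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Illustrative sketch; the class name and the 64-bit length are assumptions.
final class NonceSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    private NonceSketch() {}

    // Returns a uniformly distributed value in the range [0, 2^64 - 1].
    static BigInteger generateNonce() {
        return new BigInteger(64, RANDOM);
    }

    public static void main(String[] args) {
        System.out.println(generateNonce());
    }
}
```

Drawing the whole value from SecureRandom removes the need for the min/max range and for mixing in the current time.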



[jira] [Comment Edited] (PDFBOX-1847) TSA Time Signature

2014-03-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934798#comment-13934798
 ] 

John Hewson edited comment on PDFBOX-1847 at 3/14/14 9:11 AM:
--

I'm in the process of adding this patch and making the necessary changes to 
PDFBox. I have some questions and feedback which need to be discussed:

# In CreateSignature you define SHA_256_IDENTIFIER which is passed to the 
getTimeStampAttribute method of TSACreator, which means the message digest 
which gets created is:
{code}
MessageDigest.getInstance("2.16.840.1.101.3.4.2.1", new BouncyCastleProvider());
{code}
Why not use Java's built-in "SHA-256" message digest?

# In TSACreator, the getTimeStampToken method generates a nonce as follows:
{code}
BigInteger nonce = TSAUtils.generateNonce(0, 97359710);
{code}
Where does the magic value of 97359710 come from? Is it special?

# In TSACreator, creation of the nonce is immediately followed by:
{code}
log.log(Level.INFO, "nonce is" + nonce);
{code}
Which leaks the nonce to the log file before it has been sent to the server. I 
have fixed this, but I still wanted to let you know about it.

# In TSACreator, the openTSAConnection method sets the following HTTP header:
{code}
connection.setRequestProperty("Content-Transfer-Encoding", "binary");
{code}
But "Content-Transfer-Encoding" is a MIME header for e-mails, not HTTP 
requests. Is there some reason why it's there?

# In TSAUtils, the byteToASN1Object method has a try/catch for 
ClassCastException, but I don't see why this exception needs catching. Does 
Bouncy Castle throw it? (It shouldn't, but it is known to do things like that.)

# In TSAUtils, the generateNonce method is as follows:
{code}
public static BigInteger generateNonce(int min, int max)
{
    Random rand = new Random();
    Integer randomNum = rand.nextInt((max - min) + 1) + min;
    BigInteger nonce = new BigInteger(randomNum.toString());

    Long timeLong = System.currentTimeMillis();
    Integer timeInt = timeLong != null ? timeLong.intValue() : 761820123;

    return nonce.multiply(new BigInteger(timeInt.toString()));
}
{code}
There are some problems with this approach. The first is that 
_java.util.Random_ is not cryptographically secure ([see 
documentation|http://docs.oracle.com/javase/7/docs/api/java/util/Random.html]). 
The second is that the current time is not a source of cryptographically secure 
entropy. Furthermore, it's already included in the seed for _java.util.Random_, 
which means that multiplying the random nonce by the current time cannot 
improve the nonce. I will fix these issues by using SecureRandom, but I wanted 
to let you know about them.

Thanks!
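
On point 4, RFC 3161 describes sending time-stamp requests over HTTP with the content type application/timestamp-query. The sketch below assumes that transport; the class name TsaConnectionSketch, the method prepare and the example URL are hypothetical and do not come from the patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Illustrative sketch; class name, method name and URL are hypothetical.
final class TsaConnectionSketch {
    private TsaConnectionSketch() {}

    // Configures (but does not open) an HTTP POST connection to a TSA endpoint.
    static HttpURLConnection prepare(String tsaUrl) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(tsaUrl).toURL().openConnection();
            connection.setDoOutput(true);
            connection.setRequestMethod("POST");
            // HTTP content type for time-stamp requests per RFC 3161;
            // no MIME "Content-Transfer-Encoding" header is set.
            connection.setRequestProperty("Content-Type", "application/timestamp-query");
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection c = prepare("http://tsa.example.invalid/");
        System.out.println(c.getRequestMethod() + " " + c.getRequestProperty("Content-Type"));
    }
}
```

No network traffic happens here, because the connection is only configured and never connected.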



[jira] [Commented] (PDFBOX-1847) TSA Time Signature

2014-03-14 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934798#comment-13934798
 ] 

John Hewson commented on PDFBOX-1847:
-

I'm in the process of adding this patch and making the necessary changes to 
PDFBox. I have some questions and feedback which need to be discussed:

# In CreateSignature you define SHA_256_IDENTIFIER which is passed to the 
getTimeStampAttribute method of TSACreator, which means the message digest 
which gets created is:
{code}
MessageDigest.getInstance("2.16.840.1.101.3.4.2.1", new BouncyCastleProvider());
{code}
Why not use Java's built-in "SHA-256" message digest?

# In TSACreator, the getTimeStampToken method generates a nonce as follows:
{code}
BigInteger nonce = TSAUtils.generateNonce(0, 97359710);
{code}
Where does the magic value of 97359710 come from? Is it special?

# In TSACreator, creation of the nonce is immediately followed by:
{code}
log.log(Level.INFO, "nonce is" + nonce);
{code}
Which leaks the nonce to the log file before it has been sent to the server. I 
have fixed this, but I still wanted to let you know about it.

# In TSACreator, the openTSAConnection method sets the following HTTP header:
{code}
connection.setRequestProperty("Content-Transfer-Encoding", "binary");
{code}
But "Content-Transfer-Encoding" is a MIME header for e-mails, not HTTP 
requests. Is there some reason why it's there?

# In TSAUtils, the byteToASN1Object method has a try/catch for 
ClassCastException, but I don't see why this exception needs catching. Does 
Bouncy Castle throw it? (It shouldn't, but it is known to do things like that.)

# In TSAUtils, the generateNonce method is as follows:
{code}
public static BigInteger generateNonce(int min, int max)
{
    Random rand = new Random();
    Integer randomNum = rand.nextInt((max - min) + 1) + min;
    BigInteger nonce = new BigInteger(randomNum.toString());

    Long timeLong = System.currentTimeMillis();
    Integer timeInt = timeLong != null ? timeLong.intValue() : 761820123;

    return nonce.multiply(new BigInteger(timeInt.toString()));
}
{code}
There are some problems with this approach. The first is that 
_java.util.Random_ is not cryptographically secure ([see 
documentation|http://docs.oracle.com/javase/7/docs/api/java/util/Random.html]). 
The second is that the current time is not a source of cryptographically secure 
entropy. Furthermore, it's already included in the seed for _java.util.Random_, 
which means that multiplying the random nonce by the current time cannot 
improve the nonce. I will fix these issues by using SecureRandom, but I wanted 
to let you know about the issues with this method.
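
Two of these weaknesses can be shown with the JDK alone. This is only a sketch with made-up names (PredictableRandomDemo, sameSequence, truncatedTime), and it uses an explicit seed to keep the demonstration reproducible, which the patch itself does not do.

```java
import java.util.Random;

// Illustrative sketch; the class and method names are hypothetical.
final class PredictableRandomDemo {
    private PredictableRandomDemo() {}

    // Two java.util.Random instances with the same seed yield the same numbers,
    // so anyone who learns or guesses the seed can reproduce the "random" values.
    static boolean sameSequence(long seed, int count) {
        Random a = new Random(seed);
        Random b = new Random(seed);
        for (int i = 0; i < count; i++) {
            if (a.nextInt() != b.nextInt()) {
                return false;
            }
        }
        return true;
    }

    // Mirrors Long.intValue(): only the low 32 bits of the timestamp survive.
    static int truncatedTime(long millis) {
        return (int) millis;
    }

    public static void main(String[] args) {
        System.out.println(sameSequence(42L, 1000));
        System.out.println(truncatedTime(System.currentTimeMillis()));
    }
}
```

Because the truncated time can be zero or negative, multiplying by it can also make the resulting nonce zero or negative.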


[jira] [Commented] (PDFBOX-1987) Provide a PDF Lexer as a base for PDF parsing

2014-03-14 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934744#comment-13934744
 ] 

Maruan Sahyoun commented on PDFBOX-1987:


I attached a version of a PDF lexer together with a set of tests and some 
helper classes which extend RandomAccessRead to be able to read test data from 
strings for easier testing.

The purpose is that people who are interested - and have a better programming 
background - can inspect and comment on the code. 

An area which I left out is how to handle malformed tokens such as strings which 
have unbalanced parentheses. For relaxed processing such errors 
should be fixed. For strict processing such errors should be reported and 
potentially fixed, as the process shouldn't stop at the first error.

The current idea I have in mind is that the lexer raises events in such cases, 
which a parser could listen for and react to. Again, I'm looking for comments and 
ideas on this.
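
The event idea could look roughly like the sketch below. It is not taken from the attached src.zip: the names LexerErrorListener, StringTokenSketch and readLiteralString are hypothetical, and the reader handles only the parentheses and backslash rules of a PDF literal string.

```java
// Illustrative sketch; all names are hypothetical and not taken from the attachment.

// Callback a parser could register to be told about malformed tokens.
interface LexerErrorListener {
    void malformedToken(int offset, String message);
}

final class StringTokenSketch {
    private StringTokenSketch() {}

    // Reads a literal string whose opening '(' is at index start (precondition).
    // Returns the content without the outer parentheses. If the input ends before
    // the parentheses balance, the listener is notified and the partial content
    // is returned instead of aborting.
    static String readLiteralString(String input, int start, LexerErrorListener listener) {
        StringBuilder content = new StringBuilder();
        int depth = 0;
        for (int i = start; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()) {
                // Keep escape sequences verbatim; an escaped parenthesis does not nest.
                content.append(c).append(input.charAt(++i));
            } else if (c == '(') {
                if (depth > 0) {
                    content.append(c);
                }
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth == 0) {
                    return content.toString();
                }
                content.append(c);
            } else {
                content.append(c);
            }
        }
        listener.malformedToken(start, "unbalanced parentheses in literal string");
        return content.toString();
    }
}
```

A strict parser could collect these events and report them all at the end, while a relaxed one could ignore them and continue with the recovered content.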

> Provide a PDF Lexer as a base for PDF parsing
> -
>
> Key: PDFBOX-1987
> URL: https://issues.apache.org/jira/browse/PDFBOX-1987
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Parsing
>Reporter: Maruan Sahyoun
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: src.zip
>
>
> In order to enhance the parsing process and as a foundation for a combination 
> of the different parsers a PDF lexer should be provided.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PDFBOX-1987) Provide a PDF Lexer as a base for PDF parsing

2014-03-14 Thread Maruan Sahyoun (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maruan Sahyoun updated PDFBOX-1987:
---

Attachment: src.zip

A PDF lexer


[jira] [Created] (PDFBOX-1987) Provide a PDF Lexer as a base for PDF parsing

2014-03-14 Thread Maruan Sahyoun (JIRA)
Maruan Sahyoun created PDFBOX-1987:
--

 Summary: Provide a PDF Lexer as a base for PDF parsing
 Key: PDFBOX-1987
 URL: https://issues.apache.org/jira/browse/PDFBOX-1987
 Project: PDFBox
  Issue Type: Improvement
  Components: Parsing
Reporter: Maruan Sahyoun
Priority: Minor
 Fix For: 2.0.0


In order to enhance the parsing process and as a foundation for a combination 
of the different parsers a PDF lexer should be provided.





[jira] [Comment Edited] (PDFBOX-1466) Rendering of pattern colorspace fails

2014-03-14 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934711#comment-13934711
 ] 

Tilman Hausherr edited comment on PDFBOX-1466 at 3/14/14 7:37 AM:
--

Here's a current rendering; it is almost perfect now. The only problem left is 
a weird shadow. I'm not sure whether the shadow is related to the pattern 
colorspace. It might be a problem similar to the ones in PDFBOX-1830 and 
PDFBOX-1954 (line width).


was (Author: tilman):
Here's a current rendering; it is almost perfect now. The only problem left is 
a weird shadow. I'm not sure whether the shadow is related to the pattern 
colorspace. It might be a problem similar to the one in PDFBOX-1954 (line width).

> Rendering of pattern colorspace fails
> -
>
> Key: PDFBOX-1466
> URL: https://issues.apache.org/jira/browse/PDFBOX-1466
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 1.7.1, 1.8.4, 2.0.0
> Environment: Windows 7, JDK 1.6 / 1.7
>Reporter: Maurice Koch
>  Labels: tilingpattern
> Fix For: 2.0.0
>
> Attachments: pdfbox-1466.pdf-1.png, report.pdf, report.png
>
>
> I was trying to print a PDF which was generated by iText v2.1.5. 
> Unfortunately parts of it were printed in white – the filling color was 
> missing. I could reduce the problem to the attached PDF. When trying to print 
> with e.g. PDDocument.silentPrint I get the following info message:
> [INFO] [org.apache.pdfbox.pdfviewer.PageDrawer] ColorSpace Pattern doesn't 
> provide a non-stroking color, using white instead!



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PDFBOX-1466) Rendering of pattern colorspace fails

2014-03-14 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-1466:


Fix Version/s: 2.0.0


[jira] [Updated] (PDFBOX-1466) Rendering of pattern colorspace fails

2014-03-14 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-1466:


Attachment: pdfbox-1466.pdf-1.png

Here's a current rendering; it is almost perfect now. The only problem left is 
a weird shadow. I'm not sure whether the shadow is related to the pattern 
colorspace. It might be a problem similar to the one in PDFBOX-1954 (line width).


[jira] [Updated] (PDFBOX-1466) Rendering of pattern colorspace fails

2014-03-14 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-1466:


Affects Version/s: 2.0.0
   1.8.4
