[jira] [Updated] (PDFBOX-2403) false negative? Font damaged, The FontFile can't be read

2014-10-10 Thread Schreiber (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Schreiber updated PDFBOX-2403:
--
Attachment: reportforfile_pdfa1b

Preflight report for file pdfa1b.pdf, Preflight version 11.0.09 (119)

 false negative? Font damaged, The FontFile can't be read
 --

 Key: PDFBOX-2403
 URL: https://issues.apache.org/jira/browse/PDFBOX-2403
 Project: PDFBox
  Issue Type: Bug
  Components: Preflight
Affects Versions: 2.0.0
 Environment: deb7, java 7
Reporter: Ralf Hauser
 Attachments: Problems_pdfa1b.pdf_07.10.2014_001.pdf, 
 patchBetterErrorMessages.txt, patchPDFBOX-2403.txt, 
 patchPDFBOX-2403Type1.txt, pdfA_Validation_Report.eml, pdfa1b.pdf, 
 pdfa1b_summary_0001.pdf, report, reportforfile_pdfa1b, validation_report.xml


 - 1: 3.2.1 : Font damaged, The FontFile can't be read
  - 2: 3.2.1 : Font damaged, The FontFile can't be read
  - 3: 3.1.6 : Invalid Font definition, Width of the character 48 in the 
 font program SURPPV+HeiseiMaruGoStd-W8-Identity-H is inconsistent with the 
 width in the PDF dictionary.
  - 4: 3.1.6 : Invalid Font definition, Width of the character 36 in the 
 font program OIZFRF+KozMinProVI-Regular-Identity-H is inconsistent with the 
 width in the PDF dictionary.
  - 5: 3.3.1 : Glyph error, The character 74 in the font program 
 OIZFRF+KozMinProVI-Regular-Identity-H is missing from the Charater Encoding.
  - 6: 3.1.6 : Invalid Font definition, Width of the character 80 in the 
 font program OIZFRF+KozMinProVI-Regular-Identity-H is inconsistent with the 
 width in the PDF dictionary.
  - 7: 3.1.6 : Invalid Font definition, Width of the character 420 in the 
 font program RRATCX+MathematicalPiLTStd-Identity-H is inconsistent with the 
 width in the PDF dictionary.
 possibly related to PDFBOX-2299?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PDFBOX-2403) false negative? Font damaged, The FontFile can't be read

2014-10-10 Thread Schreiber (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166467#comment-14166467
 ] 

Schreiber edited comment on PDFBOX-2403 at 10/10/14 6:50 AM:
-

Here is the Preflight report for file pdfa1b.pdf, Preflight version 11.0.09 (119)


was (Author: csch1):
Report Preflight fpr file pdfa1b.pdf, Preflight version 11.0.09 (119)



[jira] [Created] (PDFBOX-2422) PDFont.getStringWidth results in stackoverflow

2014-10-10 Thread Cornelis Hoeflake (JIRA)
Cornelis Hoeflake created PDFBOX-2422:
-

 Summary: PDFont.getStringWidth results in stackoverflow
 Key: PDFBOX-2422
 URL: https://issues.apache.org/jira/browse/PDFBOX-2422
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake


Loading a TrueType font and calling getStringWidth("é") results in a 
stack overflow. Calling the method with a 'regular' character is ok.

{code:title=Example code}
PDDocument doc = new PDDocument();
// load a font which is bundled with PDFBox
PDTrueTypeFont font = PDTrueTypeFont.loadTTF(doc, 
    getClass().getResourceAsStream("/org/apache/pdfbox/resources/ttf/LiberationSans-Regular.ttf"));
font.getStringWidth("éé");
{code}






GSoC2015

2014-10-10 Thread Tilman Hausherr

Some ideas for GSoC2015:

- improved PDFDebugger (because of the difficulty of seeing the 
different sequence in PDFBOX-2401 and because the product shown at 
https://www.youtube.com/watch?v=g-QcU9B4qMc is better)

   - hex view
   - view of non printable characters
   - saving streams
   - color mark of PDF operators
   - show images that are streams
   - show PDIndexed gradient
   - show PDSeparation color
   - edit fields and streams
   - save altered PDF

- improved PDF Viewer (Zoom, drag and drop, resize view)

This could possibly be a candidate for Google Code-in 2014, although I'm 
not sure if Apache participates. I saw a message from 2013 which suggested not.



- a working TIFF decoder
- a working JPX decoder
- the text extraction test suite for TIKA that Tim mentioned some time ago




Tilman


PS: No I won't participate in the Semester of Code because I don't 
have a project idea, and I want to relax somewhat. The work on GSoC2014 
has been pretty intense, i.e. reviewing code and making tests.


[jira] [Commented] (PDFBOX-2403) false negative? Font damaged, The FontFile can't be read

2014-10-10 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167173#comment-14167173
 ] 

John Hewson commented on PDFBOX-2403:
-

The latest XML report contains the error:

{quote}
Ungültiger Wert für PDF/A-Konformitätslevel (muss B sein)
{quote}

In English:
{quote}
Invalid value for PDF / A conformance level (must be B)
{quote}

Which I think means you're running Acrobat's PDF/A-1a validation against this 
file, but this file is PDF/A-1b.



[jira] [Comment Edited] (PDFBOX-2403) false negative? Font damaged, The FontFile can't be read

2014-10-10 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167173#comment-14167173
 ] 

John Hewson edited comment on PDFBOX-2403 at 10/10/14 5:41 PM:
---

The latest XML report contains the error:

{quote}
Ungültiger Wert für PDF/A-Konformitätslevel (muss B sein)
{quote}

In English:
{quote}
Invalid value for PDF / A conformance level (must be B)
{quote}

Which I think means you're running Acrobat's PDF/A-1a validation against this 
file, but this file is actually PDF/A-1b.


was (Author: jahewson):
The latest XML report contains the error:

{quote}
Ungültiger Wert für PDF/A-Konformitätslevel (muss B sein)
{quote}

In English:
{quote}
Invalid value for PDF / A conformance level (must be B)
{quote}

Which I think means you're running Acrobat's PDF/A-1a validation against this 
file, but this file is PDF/A-1b.



[jira] [Comment Edited] (PDFBOX-2403) false negative? Font damaged, The FontFile can't be read

2014-10-10 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167173#comment-14167173
 ] 

John Hewson edited comment on PDFBOX-2403 at 10/10/14 5:42 PM:
---

The latest XML report contains the error:

{quote}
Ungültiger Wert für PDF/A-Konformitätslevel (muss B sein)
{quote}

In English:
{quote}
Invalid value for PDF / A conformance level (must be B)
{quote}

Which I think means you're running Acrobat's PDF/A-1a validation against this 
file, but this file is actually PDF/A-1b, so you need to run that validation 
instead.


was (Author: jahewson):
The latest XML report contains the error:

{quote}
Ungültiger Wert für PDF/A-Konformitätslevel (muss B sein)
{quote}

In English:
{quote}
Invalid value for PDF / A conformance level (must be B)
{quote}

Which I think means you're running Acrobat's PDF/A-1a validation against this 
file, but this file is actually PDF/A-1b.



[jira] [Assigned] (PDFBOX-2422) PDFont.getStringWidth results in stackoverflow

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson reassigned PDFBOX-2422:
---

Assignee: John Hewson



[jira] [Commented] (PDFBOX-2422) PDFont.getStringWidth results in stackoverflow

2014-10-10 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167197#comment-14167197
 ] 

John Hewson commented on PDFBOX-2422:
-

This code works fine for me with the latest trunk. Can you make sure you're 
using the latest version from SVN or the latest snapshot jar?



Re: 2.0

2014-10-10 Thread John Hewson
Simon,

Andreas has the best handle on this, but off the top of my head what we need is
to finish making breaking API changes and for the code to have been stable for
a while before making a 2.0 release.

Improvements and fixes which still need breaking API changes include:
- Pattern rendering
- Pages resource caching (significant memory usage issues)
- Font embedding (particularly TTF)
- Parsing (Andreas?)
- Page Tree (needs a complete rewrite)
- Text extraction on Java 8 (this might end up being a breaking change to the sort)

There’s probably more, such as work on AcroForms, and we need to have much
better example code for 2.0 due to all the changes.

This seems like a good time to explicitly try to make sure that we have JIRA
issues open for all outstanding tasks, so that we can track how close 2.0 is to
being ready. The stability of the code is a pretty good indicator - we’re not
there yet.

I’m going to open some JIRA issues. Andreas, Tilman - please open issues for any
2.0 features which you think we need.

Thanks

-- John

On 10 Oct 2014, at 08:08, Simon Steiner simonsteiner1...@gmail.com wrote:

 Hi,
 
 
 
 Could you set a target date for 2.0 release. What's missing to make a
 release?
 
 
 
 Thanks
 



Re: 2.0

2014-10-10 Thread John Hewson
Andreas - can we create a new “Later” version in JIRA so that we can assign
issues that we’ve decided to defer until after 2.0? That way we can have a
definitive list of what does and doesn’t need attention.

-- John




[jira] [Created] (PDFBOX-2423) Page tree handling needs rewriting

2014-10-10 Thread John Hewson (JIRA)
John Hewson created PDFBOX-2423:
---

 Summary: Page tree handling needs rewriting
 Key: PDFBOX-2423
 URL: https://issues.apache.org/jira/browse/PDFBOX-2423
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.8.7, 2.0.0
Reporter: John Hewson


The way in which PDFBox handles the page tree needs to be rewritten, preferably 
from scratch. Currently the document catalog returns the raw objects from the 
page tree, wrapped in either a PDPage or PDPageNode.

We need to abstract over the page tree and get rid of PDPageNode. We should 
provide methods which can add/remove PDPage objects *only*. The existing 
low-level access to the page tree is not needed at the PD-level.

Inheritance of page properties such as crop box, resources, and rotation should 
be reimplemented to use whatever new page tree abstraction we invent. We can 
finally remove the old broken methods which didn't look up the inheritance tree 
when retrieving these values.
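
The two ideas above (exposing only leaf pages, and resolving inheritable properties by walking up the tree) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Node` class, its fields, and the `pages` and `findRotation` methods are hypothetical names, not PDFBox API, and rotation stands in for any inheritable property such as crop box or resources.

```java
import java.util.ArrayList;
import java.util.List;

final class PageTreeSketch {
    /** Hypothetical tree node: intermediate nodes hold kids, leaves are pages. */
    static final class Node {
        final Node parent;
        final List<Node> kids = new ArrayList<Node>();
        final boolean isPage;
        Integer rotation; // null means "not set on this node, inherit from the parent"

        Node(Node parent, boolean isPage) {
            this.parent = parent;
            this.isPage = isPage;
            if (parent != null) {
                parent.kids.add(this);
            }
        }

        /** Walks up the parent chain until a value is found; falls back to 0. */
        int findRotation() {
            for (Node n = this; n != null; n = n.parent) {
                if (n.rotation != null) {
                    return n.rotation;
                }
            }
            return 0;
        }
    }

    /** Flattens the tree into its leaf pages, depth-first, in document order. */
    static List<Node> pages(Node root) {
        List<Node> out = new ArrayList<Node>();
        collect(root, out);
        return out;
    }

    private static void collect(Node node, List<Node> out) {
        if (node.isPage) {
            out.add(node);
            return;
        }
        for (Node kid : node.kids) {
            collect(kid, out);
        }
    }

    public static void main(String[] args) {
        Node root = new Node(null, false);
        root.rotation = 90;                 // set once on the root
        Node inner = new Node(root, false); // intermediate node, sets nothing
        Node first = new Node(inner, true); // inherits 90 through two levels
        Node second = new Node(root, true);
        second.rotation = 180;              // overrides the inherited value

        System.out.println(pages(root).size());    // 2
        System.out.println(first.findRotation());  // 90
        System.out.println(second.findRotation()); // 180
    }
}
```

Callers of such an abstraction would only ever see the flattened list of pages, while the intermediate nodes stay an internal detail.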





[jira] [Updated] (PDFBOX-2423) Page tree handling needs rewriting

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-2423:

Fix Version/s: 2.0.0



[jira] [Commented] (PDFBOX-2400) Insert page

2014-10-10 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167252#comment-14167252
 ] 

John Hewson commented on PDFBOX-2400:
-

The page tree in 1.8 is fundamentally broken and probably shouldn't receive any 
more attention. However, in 2.0 we're planning on fixing this, see PDFBOX-2423, 
after which we could probably add an insertPage method with ease.

 Insert page
 ---

 Key: PDFBOX-2400
 URL: https://issues.apache.org/jira/browse/PDFBOX-2400
 Project: PDFBox
  Issue Type: New Feature
  Components: PDModel
Reporter: Patrick Tucker
Priority: Minor

 It would be nice if PDDocument had an insertPage function similar to addPage, 
 but taking a number that indicates where to add the new page in the current set 
 of pages.
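
A minimal sketch of the requested semantics, assuming the pages are available as a flat, zero-based list. The `Doc` class and both of its methods are hypothetical stand-ins rather than PDFBox API, and pages are represented by plain strings to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

final class InsertPageSketch {
    /** Hypothetical stand-in for a document that keeps its pages as a flat list. */
    static final class Doc {
        private final List<String> pages = new ArrayList<String>();

        /** Appends a page at the end, like the existing addPage. */
        void addPage(String page) {
            pages.add(page);
        }

        /** Inserts before the page currently at index; index == page count appends. */
        void insertPage(int index, String page) {
            if (index < 0 || index > pages.size()) {
                throw new IndexOutOfBoundsException(
                        "index " + index + ", page count " + pages.size());
            }
            pages.add(index, page);
        }

        /** Returns a copy so callers cannot modify the internal list. */
        List<String> getPages() {
            return new ArrayList<String>(pages);
        }
    }

    public static void main(String[] args) {
        Doc doc = new Doc();
        doc.addPage("A");
        doc.addPage("C");
        doc.insertPage(1, "B");
        System.out.println(doc.getPages()); // [A, B, C]
    }
}
```

Whether the position should be zero-based or one-based is left open in the request; the sketch picks zero-based to match java.util.List.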





[jira] [Updated] (PDFBOX-2400) Insert page

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-2400:

Fix Version/s: 2.0.0



[jira] [Updated] (PDFBOX-2400) Insert page

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-2400:

Affects Version/s: 2.0.0, 1.8.7



[jira] [Updated] (PDFBOX-2370) Move caching outside of PDResources

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-2370:

Component/s: PDModel

 Move caching outside of PDResources
 ---

 Key: PDFBOX-2370
 URL: https://issues.apache.org/jira/browse/PDFBOX-2370
 Project: PDFBox
  Issue Type: Improvement
  Components: PDModel
Affects Versions: 2.0.0
Reporter: John Hewson
 Fix For: 2.0.0


 *Note:* This issue is based on a discussion which occurred regarding 
 PDFBOX-2301 but is actually a separate issue.
 Currently we cache the page resources in PDResources which belongs to a 
 specific PDPage. This causes two problems, 1) users who want to hold many 
 PDPage objects in memory will have high memory use (but this is often by 
 accident*). 2) By caching resources in PDPage we only get to keep that cache 
 for the lifetime of the page, which e.g. in PDFRenderer is a single page 
 only. That means that a font which appears on 40 pages has to be parsed 40 
 times, which causes slow running times, but also memory thrashing as objects 
 are destroyed frequently only to be re-created.
 What PDFRenderer really needs is not page-wide caching but document-wide 
 caching, so that it can cache fonts, cmaps, color profiles, etc. only once. 
 But that won't work for images, because they're too large. What we're 
 beginning to realise is that caching is use-case specific and probably 
 shouldn't be built-in to PDFBox's pdmodel. Instead we should remove 
 resource caching from PDPage/PDResources and implement custom caching in 
 PDFRenderer and other downstream classes such as PDFTextStripper. I'll 
 happily volunteer myself. The existing high-level PDFBox APIs will continue 
 to just work and power users will get a level of control that they 
 appreciate.
 This strategy could be enhanced by removing memory-hungry methods on 
 PDResources such as getFonts() and getXObjects() which force all resources of 
 a particular type to be loaded, whether or not they are needed, or actually 
 used in the content stream. They would be replaced by methods to retrieve a 
 single resource, e.g. getFont(name).
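
As a rough illustration of document-wide caching with single-resource lookup, the sketch below keeps one cache for the lifetime of a renderer, so a font used on 40 pages is parsed once. All names here (`FontCache`, `ParsedFont`, `parseCount`) are hypothetical, and the string key is an assumption made for brevity: resource names in a PDF are scoped to a page's resource dictionary, so a real cache would need a key that identifies the underlying font object.

```java
import java.util.HashMap;
import java.util.Map;

final class DocumentFontCacheSketch {
    /** Hypothetical parsed font; creating one stands in for expensive parsing. */
    static final class ParsedFont {
        final String key;

        ParsedFont(String key) {
            this.key = key;
        }
    }

    /** Cache owned by the renderer, so it outlives any single page. */
    static final class FontCache {
        private final Map<String, ParsedFont> fonts = new HashMap<String, ParsedFont>();
        int parseCount = 0; // counts how often the expensive step actually ran

        /** Returns the cached font for this key, parsing it on first use only. */
        ParsedFont getFont(String key) {
            ParsedFont font = fonts.get(key);
            if (font == null) {
                parseCount++;
                font = new ParsedFont(key);
                fonts.put(key, font);
            }
            return font;
        }
    }

    public static void main(String[] args) {
        FontCache cache = new FontCache();
        for (int page = 0; page < 40; page++) {
            cache.getFont("LiberationSans-Regular"); // same font on every page
        }
        System.out.println(cache.parseCount); // 1
    }
}
```

Images could be left out of such a cache, or given a separate size-bounded one, since they are described above as too large to keep for the whole document.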
 ---
 \* There probably isn't a legitimate use case for 1) any more, we've solved 
 the issues which we used to have with image caching (in fact, the 
 clearCache() method actually no longer needs to be called by PDFRenderer, 
 though it currently is). The real problem is that it's easy to accidentally 
 retain PDPage objects, the PDDocument#getDocumentCatalog().getAllPages() 
 method is dangerous as looping over it will cause pages to be retained during 
 processing, like so:
 {code}
 for (PDPage page : document.getDocumentCatalog().getAllPages()) // java.util.List
 {
  // ... this is idiomatic in PDFBox 1.8
 } 
 // List returned by getAllPages() kept in scope until here (bad)
 {code}
 I added a couple of methods a while ago to avoid this by fetching each 
 PDPage one at a time, and this is now used internally in PDFBox to avoid the 
 memory problems we used to have:
 {code}
 for (int i = 0; i < document.getNumberOfPages(); i++)
 {
 PDPage page = document.getPage(i);
 // ... this is the new 2.0 way
 // current page falls out of scope here (good)
 }
 {code}
 To solve this problem, we could change getAllPages() so that instead of 
 returning a List it returns an Iterator<PDPage>, which would provide a nicer 
 API than getPage(int) and most existing code will continue to work. This is 
 also an opportunity to also fix type safety issues due to PDPageNode and 
 incorrect handling of the page tree (this is similar to the issue we had 
 recently with the acroform field tree).
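
The lazy-iteration idea can be sketched as below. This is a sketch under assumptions, not the eventual PDFBox design: `Doc` and `Page` are hypothetical stand-ins, and the document implements Iterable rather than returning a bare Iterator, because Java's for-each loop only accepts an Iterable or an array. Each page object is created when the loop asks for it, so no list of all pages is held for the duration of the loop.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

final class LazyPagesSketch {
    /** Hypothetical stand-in for PDPage. */
    static final class Page {
        final int index;

        Page(int index) {
            this.index = index;
        }
    }

    /** Hypothetical document that creates page objects on demand. */
    static final class Doc implements Iterable<Page> {
        private final int pageCount;
        int created = 0; // how many page objects have been created so far

        Doc(int pageCount) {
            this.pageCount = pageCount;
        }

        /** Creates a fresh page object for the given zero-based index. */
        Page getPage(int i) {
            created++;
            return new Page(i);
        }

        public Iterator<Page> iterator() {
            return new Iterator<Page>() {
                private int cursor = 0;

                public boolean hasNext() {
                    return cursor < pageCount;
                }

                public Page next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return getPage(cursor++); // page is created only when requested
                }

                public void remove() {
                    throw new UnsupportedOperationException();
                }
            };
        }
    }

    public static void main(String[] args) {
        Doc doc = new Doc(3);
        int sum = 0;
        for (Page page : doc) {
            sum += page.index; // the previous page is unreachable by now
        }
        System.out.println(sum + " " + doc.created); // 3 3
    }
}
```

Existing for-each loops over the pages would keep compiling against such a type, while code that relied on List methods such as get(int) or size() would need to change.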





[jira] [Updated] (PDFBOX-2370) Move caching outside of PDResources

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-2370:

Affects Version/s: 2.0.0



[jira] [Updated] (PDFBOX-2400) Add insertPage() method

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-2400:

Summary: Add insertPage() method  (was: Insert page)

 Add insertPage() method
 ---

 Key: PDFBOX-2400
 URL: https://issues.apache.org/jira/browse/PDFBOX-2400
 Project: PDFBox
  Issue Type: New Feature
  Components: PDModel
Affects Versions: 1.8.7, 2.0.0
Reporter: Patrick Tucker
Priority: Minor
 Fix For: 2.0.0


 It would be nice if PDDocument had an insertPage function similar to addPage, 
 but takes a number to indicate where to add the new page in the current set 
 of pages.
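The index semantics requested here can be sketched with a plain list. This is only an illustration: PageList and its methods are hypothetical names, not PDFBox API, and a real PDF page tree is hierarchical rather than flat.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical flat page list; a real PDF page tree is hierarchical, so this
// only illustrates the index semantics of the requested method.
final class PageList<P> {
    private final List<P> pages = new ArrayList<>();

    // Appends a page at the end, like the existing addPage.
    void addPage(P page) {
        pages.add(page);
    }

    // Inserts a page so that it ends up at the given zero-based index;
    // index == size() behaves like addPage.
    void insertPage(int index, P page) {
        if (index < 0 || index > pages.size()) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + pages.size());
        }
        pages.add(index, page);
    }

    // Read-only view of the current page order.
    List<P> pages() {
        return Collections.unmodifiableList(pages);
    }
}
```

Whether the index should be zero-based, and whether an out-of-range index should throw or append, are open design choices that the request does not settle.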





[jira] [Updated] (PDFBOX-2370) Move caching outside of PDResources

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-2370:

Fix Version/s: 2.0.0

 Move caching outside of PDResources
 ---

 Key: PDFBOX-2370
 URL: https://issues.apache.org/jira/browse/PDFBOX-2370
 Project: PDFBox
  Issue Type: Improvement
  Components: PDModel
Affects Versions: 2.0.0
Reporter: John Hewson
 Fix For: 2.0.0


 *Note:* This issue is based on a discussion which occurred regarding 
 PDFBOX-2301 but is actually a separate issue.
 Currently we cache the page resources in PDResources which belongs to a 
 specific PDPage. This causes two problems, 1) users who want to hold many 
 PDPage objects in memory will have high memory use (but this is often by 
 accident*). 2) By caching resources in PDPage we only get to keep that cache 
 for the lifetime of the page, which e.g. in PDFRenderer is a single page 
 only. That means that a font which appears on 40 pages has to be parsed 40 
 times, which causes slow running times, but also memory thrashing as objects 
 are destroyed frequently only to be re-created.
 What PDFRenderer really needs is not page-wide caching but document-wide 
 caching, so that it can cache fonts, cmaps, color profiles, etc. only once. 
 But that won't work for images, because they're too large. What we're 
 beginning to realise is that caching is use-case specific and probably 
 shouldn't be built-in to PDFBox's pdmodel. Instead we should remove 
 resource caching from PDPage/PDResources and implement custom caching in 
 PDFRenderer and other downstream classes such as PDFTextStripper. I'll 
 happily volunteer myself. The existing high-level PDFBox APIs will continue 
 to just work and power users will get a level of control that they 
 appreciate.
 This strategy could be enhanced by removing memory-hungry methods on 
 PDResources such as getFonts() and getXObjects() which force all resources of 
 a particular type to be loaded, whether or not they are needed, or actually 
 used in the content stream. They would be replaced by methods to retrieve a 
 single resource, e.g. getFont(name).
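A minimal sketch of the document-wide caching idea, assuming a cache object that lives as long as the document rather than the page. DocumentResourceCache and its loader function are hypothetical names for illustration and are not part of PDFBox; the "parsing" is faked with a string so the example runs on its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical document-scoped cache; not part of the PDFBox API.
final class DocumentResourceCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> loader;
    private int loads = 0;

    DocumentResourceCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, invoking the loader only on the first
    // request for a key (a loader that returns null is retried each time).
    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            cache.put(key, value);
        }
        return value;
    }

    // Number of times the loader was actually invoked.
    int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        DocumentResourceCache<String, String> fonts =
                new DocumentResourceCache<>(name -> "parsed:" + name);
        for (int page = 0; page < 40; page++) {
            fonts.get("F1"); // the same font referenced on every page
        }
        System.out.println(fonts.loadCount()); // prints 1
    }
}
```

Because the cache is owned by the caller (for example a renderer), it can apply its own policy per resource type, such as caching fonts but not images, which is the use-case-specific control described above.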
 ---
 \* There probably isn't a legitimate use case for 1) any more, we've solved 
 the issues which we used to have with image caching (in fact, the 
 clearCache() method actually no longer needs to be called by PDFRenderer, 
 though it currently is). The real problem is that it's easy to accidentally 
 retain PDPage objects, the PDDocument#getDocumentCatalog().getAllPages() 
 method is dangerous as looping over it will cause pages to be retained during 
 processing, like so:
 {code}
 for (PDPage page : document.getDocumentCatalog().getAllPages()) // java.util.List
 {
  // ... this is idiomatic in PDFBox 1.8
 } 
 // List returned by getAllPages() kept in scope until here (bad)
 {code}
 I added a couple of methods a while ago to avoid this by fetching each 
 PDPage one at a time, and this is now used internally in PDFBox to avoid the 
 memory problems we used to have:
 {code}
 for (int i = 0; i < document.getNumberOfPages(); i++)
 {
 PDPage page = document.getPage(i);
 // ... this is the new 2.0 way
 // current page falls out of scope here (good)
 }
 {code}
 To solve this problem, we could change getAllPages() so that instead of 
 returning a List it returns an Iterator<PDPage>, which would provide a nicer 
 API than getPage(int) and most existing code will continue to work. This is 
 also an opportunity to fix type safety issues due to PDPageNode and 
 incorrect handling of the page tree (this is similar to the issue we had 
 recently with the acroform field tree).
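One way the lazy iteration proposed here might look is sketched below. LazyPages is a hypothetical adapter, not PDFBox API; it assumes only a page count and an index-based accessor such as the getPage(int) shown earlier. Note that a Java for-each loop needs an Iterable rather than a bare Iterator, so the sketch implements Iterable.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

// Hypothetical adapter: exposes index-based page access as an Iterable,
// so a for-each loop holds only the current page.
final class LazyPages<P> implements Iterable<P> {
    private final int count;
    private final IntFunction<P> pageAt;

    LazyPages(int count, IntFunction<P> pageAt) {
        this.count = count;
        this.pageAt = pageAt;
    }

    @Override
    public Iterator<P> iterator() {
        return new Iterator<P>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < count;
            }

            @Override
            public P next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return pageAt.apply(next++); // fetched on demand, never stored
            }
        };
    }
}
```

With a document at hand, usage might read `for (PDPage page : new LazyPages<PDPage>(document.getNumberOfPages(), document::getPage))`, assuming those two methods behave as in the loop above. No list of pages is ever built, so each page becomes unreachable once the loop moves on.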





[jira] [Updated] (PDFBOX-2340) Overhaul PDFBox Documentation

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-2340:

Fix Version/s: 2.0.0

 Overhaul PDFBox Documentation
 -

 Key: PDFBOX-2340
 URL: https://issues.apache.org/jira/browse/PDFBOX-2340
 Project: PDFBox
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 2.0.0
Reporter: Maruan Sahyoun
 Fix For: 2.0.0

 Attachments: Mockup-20140912.png, Mockup_Documentation.png


 In order to make it easier for users of PDFBox to work with the library there 
 shall be enhanced documentation consisting of an introduction, API 
 references and more well-documented examples and code snippets (Cookbook).
 In order to make it easier to contribute, the Cookbook shall be built 
 automatically from the examples/snippet ‚repository‘.





[jira] [Updated] (PDFBOX-2366) Improve high-level font API

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-2366:

Fix Version/s: 2.0.0

 Improve high-level font API
 ---

 Key: PDFBOX-2366
 URL: https://issues.apache.org/jira/browse/PDFBOX-2366
 Project: PDFBox
  Issue Type: Improvement
  Components: PDModel
Reporter: John Hewson
Assignee: John Hewson
Priority: Minor
 Fix For: 2.0.0


 The PDFont and Type1Equivalent APIs could expose some higher-level details, 
 such as a consistent way to get names and Type1Equivalent instances.





[jira] [Updated] (PDFBOX-2340) Overhaul PDFBox Documentation

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-2340:

Affects Version/s: 2.0.0

 Overhaul PDFBox Documentation
 -

 Key: PDFBOX-2340
 URL: https://issues.apache.org/jira/browse/PDFBOX-2340
 Project: PDFBox
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 2.0.0
Reporter: Maruan Sahyoun
 Fix For: 2.0.0

 Attachments: Mockup-20140912.png, Mockup_Documentation.png


 In order to make it easier for users of PDFBox to work with the library there 
 shall be enhanced documentation consisting of an introduction, API 
 references and more well-documented examples and code snippets (Cookbook).
 In order to make it easier to contribute, the Cookbook shall be built 
 automatically from the examples/snippet ‚repository‘.





[jira] [Updated] (PDFBOX-2337) Add an example for highlighting text based on a string

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-2337:

Affects Version/s: 2.0.0
   1.8.7

 Add an example for highlighting text based on a string 
 ---

 Key: PDFBOX-2337
 URL: https://issues.apache.org/jira/browse/PDFBOX-2337
 Project: PDFBox
  Issue Type: New Feature
  Components: Utilities
Affects Versions: 1.8.7, 2.0.0
Reporter: Joël Kuiper
 Fix For: 1.8.7


 An often heard request is to be able to highlight a certain text within a PDF 
 programmatically, similar to the highlight functionality in Acrobat or 
 Preview.app.
 The actual implementation of this functionality is trickier than it appears, 
 since it requires the calculation of bounding boxes from TextPositions. 
 An example class may help people with implementing this (common) 
 functionality. 
 (see for example this discussion 
 https://mail-archives.apache.org/mod_mbox/pdfbox-users/201409.mbox/%3CC8340BB9-E299-4A76-A50B-6155504A0D5B%40joelkuiper.eu%3E)
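The bounding-box step mentioned above can be sketched as a union of per-glyph rectangles. This is a simplified illustration under stated assumptions: the glyph rectangles are taken as given (deriving them from TextPosition objects is the hard part and is not shown), the text is assumed to sit on a single unrotated line, and HighlightBounds is a hypothetical helper name.

```java
import java.awt.geom.Rectangle2D;
import java.util.List;

// Hypothetical helper: merges per-glyph boxes into one highlight rectangle.
final class HighlightBounds {
    // Returns the smallest rectangle enclosing all glyph boxes,
    // or null when the list is empty.
    static Rectangle2D union(List<Rectangle2D> glyphBoxes) {
        Rectangle2D result = null;
        for (Rectangle2D box : glyphBoxes) {
            result = (result == null)
                    ? (Rectangle2D) box.clone()   // copy so the input is not modified
                    : result.createUnion(box);    // grow to include the next glyph
        }
        return result;
    }
}
```

A match that spans several lines would need one rectangle per line, since a single union would also cover unrelated text between the first and last glyph.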
  





[jira] [Updated] (PDFBOX-2337) Add an example for highlighting text based on a string

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-2337:

Fix Version/s: 1.8.7

 Add an example for highlighting text based on a string 
 ---

 Key: PDFBOX-2337
 URL: https://issues.apache.org/jira/browse/PDFBOX-2337
 Project: PDFBox
  Issue Type: New Feature
  Components: Utilities
Affects Versions: 1.8.7, 2.0.0
Reporter: Joël Kuiper
 Fix For: 1.8.7


 An often heard request is to be able to highlight a certain text within a PDF 
 programmatically, similar to the highlight functionality in Acrobat or 
 Preview.app.
 The actual implementation of this functionality is trickier than it appears, 
 since it requires the calculation of bounding boxes from TextPositions. 
 An example class may help people with implementing this (common) 
 functionality. 
 (see for example this discussion 
 https://mail-archives.apache.org/mod_mbox/pdfbox-users/201409.mbox/%3CC8340BB9-E299-4A76-A50B-6155504A0D5B%40joelkuiper.eu%3E)
  





[jira] [Updated] (PDFBOX-2335) NPE in DictionaryEncoding constructor

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-2335:

Affects Version/s: 2.0.0

 NPE in DictionaryEncoding constructor
 -

 Key: PDFBOX-2335
 URL: https://issues.apache.org/jira/browse/PDFBOX-2335
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
Assignee: John Hewson
 Fix For: 2.0.0

 Attachments: PDFBOX-2335-203040-p17.pdf


 I get an NPE with the attached file:
 {code}
 Sep 09, 2014 9:16:57 PM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
 WARNUNG: Using fallback font 'TimesNewRomanPSMT' for 'ZapfDingbats'
 Exception in thread "main" java.lang.NullPointerException
   at org.apache.pdfbox.encoding.DictionaryEncoding.<init>(DictionaryEncoding.java:91)
   at org.apache.pdfbox.pdmodel.font.PDSimpleFont.readEncoding(PDSimpleFont.java:126)
   at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:256)
   at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:65)
   at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:171)
   at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:556)
   at org.apache.pdfbox.util.operator.text.SetTextFont.process(SetTextFont.java:48)
 {code}





[jira] [Updated] (PDFBOX-2333) Overhaul the apperance generation for PDF forms

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-2333:

Affects Version/s: 2.0.0

 Overhaul the apperance generation for PDF forms
 ---

 Key: PDFBOX-2333
 URL: https://issues.apache.org/jira/browse/PDFBOX-2333
 Project: PDFBox
  Issue Type: Improvement
  Components: AcroForm
Affects Versions: 2.0.0
Reporter: Maruan Sahyoun
 Fix For: 2.0.0

 Attachments: AcroForms-SimpleTextFields.1.8.7.pdf, 
 AcroForms-SimpleTextFields.1.8.7.png, AcroForms-SimpleTextFields.pdf


 The appearance handling for forms in 1.x is limited and does not reflect all 
 settings possible for form fields. In addition the current code is not very 
 modular and does not follow the box model used for form fields. 
 Unfortunately only the basics of form handling are defined in the PDF spec. 
 The details like padding of boxes, text placement etc. have to be determined 
 by looking at how Adobe forms are generated.
 Update: The file from PDFBOX-2310 has bad rendering which might be related?





[jira] [Updated] (PDFBOX-2289) New Example:

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-2289:

Summary: New Example:   (was: provide an example how to set 
PDDocumentCatalog.PAGE_MODE_USE_ATTACHMENTS to an aes encrypted pdf)

 New Example: 
 -

 Key: PDFBOX-2289
 URL: https://issues.apache.org/jira/browse/PDFBOX-2289
 Project: PDFBox
  Issue Type: Wish
Reporter: Ralf Hauser

 Provide an example how to set PDDocumentCatalog.PAGE_MODE_USE_ATTACHMENTS to 
 an AES encrypted pdf such that it shows the attachment section after entering 
 the decryption password in Acrobat Viewer





[jira] [Updated] (PDFBOX-2289) provide an example how to set PDDocumentCatalog.PAGE_MODE_USE_ATTACHMENTS to an aes encrypted pdf

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-2289:

Description: Provide an example how to set 
PDDocumentCatalog.PAGE_MODE_USE_ATTACHMENTS to an AES encrypted pdf such that 
it shows the attachment section after entering the decryption password in 
Acrobat Viewer  (was: such that it shows the attachment section after entering 
the decryption password in Acrobat Viewer)

 provide an example how to set PDDocumentCatalog.PAGE_MODE_USE_ATTACHMENTS to 
 an aes encrypted pdf
 -

 Key: PDFBOX-2289
 URL: https://issues.apache.org/jira/browse/PDFBOX-2289
 Project: PDFBox
  Issue Type: Wish
Reporter: Ralf Hauser

 Provide an example how to set PDDocumentCatalog.PAGE_MODE_USE_ATTACHMENTS to 
 an AES encrypted pdf such that it shows the attachment section after entering 
 the decryption password in Acrobat Viewer





[jira] [Updated] (PDFBOX-2289) Example: Attachments with AES encrypted PDF

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-2289:

Summary: Example: Attachments with AES encrypted PDF  (was: New Example: )

 Example: Attachments with AES encrypted PDF
 ---

 Key: PDFBOX-2289
 URL: https://issues.apache.org/jira/browse/PDFBOX-2289
 Project: PDFBox
  Issue Type: Wish
Reporter: Ralf Hauser

 Provide an example how to set PDDocumentCatalog.PAGE_MODE_USE_ATTACHMENTS to 
 an AES encrypted pdf such that it shows the attachment section after entering 
 the decryption password in Acrobat Viewer





[jira] [Resolved] (PDFBOX-2289) Example: Attachments with AES encrypted PDF

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson resolved PDFBOX-2289.
-
Resolution: Won't Fix

This isn't really an example but a how-to question which would be best asked 
on the mailing list. If we had a concrete example we could add it to PDFBox's 
examples suite later.

 Example: Attachments with AES encrypted PDF
 ---

 Key: PDFBOX-2289
 URL: https://issues.apache.org/jira/browse/PDFBOX-2289
 Project: PDFBox
  Issue Type: Wish
Reporter: Ralf Hauser

 Provide an example how to set PDDocumentCatalog.PAGE_MODE_USE_ATTACHMENTS to 
 an AES encrypted pdf such that it shows the attachment section after entering 
 the decryption password in Acrobat Viewer





[jira] [Resolved] (PDFBOX-2232) Is there difference between character \n and character space(32) in pdf stream

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson resolved PDFBOX-2232.
-
Resolution: Cannot Reproduce

There doesn't seem to be any information to go on here, or any real indication 
that this is a bug.

 Is there difference between character \n and character space(32) in pdf stream
 --

 Key: PDFBOX-2232
 URL: https://issues.apache.org/jira/browse/PDFBOX-2232
 Project: PDFBox
  Issue Type: Bug
  Components: Text extraction
Reporter: huangchangan

 When extracting text from PDF files with PDFTextStripper, I get a space (32) at 
 the end of each paragraph or table cell, while in the text copied from 
 Adobe Reader the end character is \n. I wonder whether PDFBox converts the 
 character \n to a space (32). I checked the function processEncodedText in 
 PDFStreamEngine and got no useful information.





[jira] [Commented] (PDFBOX-2168) Different behavior of Undo feature when form was pre filled by PDFBox

2014-10-10 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167330#comment-14167330
 ] 

John Hewson commented on PDFBOX-2168:
-

Can you add the Affects Version/s for this issue?

 Different behavior of Undo feature when form was pre filled by PDFBox
 -

 Key: PDFBOX-2168
 URL: https://issues.apache.org/jira/browse/PDFBOX-2168
 Project: PDFBox
  Issue Type: Bug
  Components: AcroForm
Reporter: Maruan Sahyoun
Priority: Minor
  Labels: Appearance
 Attachments: formtemplate-filled-pdfbox.pdf, 
 formtemplate-filled-reader.pdf, formtemplate.pdf


 When a form is pre-filled by PDFBox, the Undo feature in Adobe Reader will 
 reset the field value but not change the visible appearance of the field, i.e. 
 the old value will still be visible. The same form filled by Adobe 
 Reader/Acrobat behaves correctly.





[jira] [Updated] (PDFBOX-2142) some /ICCBased colorspaces not rendered correctly

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-2142:

Fix Version/s: 2.0.0

 some /ICCBased colorspaces not rendered correctly
 -

 Key: PDFBOX-2142
 URL: https://issues.apache.org/jira/browse/PDFBOX-2142
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
 Fix For: 2.0.0

 Attachments: PDFBOX-2142.pdf, PDFBOX-2142.pdf-1.png, PDFBOX-2142.ps


 I have created a test file from PostScript to show that -CIELAB and XYZ- some 
 colors are different when rendered by PDFBox.
 Btw the RGB colors in the file have no meaning, nor do the colors have a 
 relationship to each other, i.e. they do not have to look identical to 
 any other color anywhere.
 The postscript file was created based on files by [James 
 Cloos|http://jhcloos.com/PostScript/].





[jira] [Updated] (PDFBOX-2424) ClassCastException in getMetaData if no real meta data

2014-10-10 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2424:

Component/s: Parsing

 ClassCastException in getMetaData if no real meta data
 --

 Key: PDFBOX-2424
 URL: https://issues.apache.org/jira/browse/PDFBOX-2424
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 1.8.7, 1.8.8, 2.0.0
Reporter: Tilman Hausherr

 Here's an exception from [~talli...@apache.org] latest TIKA test (too lazy to 
 test it myself, the cause is obvious) with the attached file:
 {code}
 org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser
   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:249)
   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:247)
   at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
   at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:137)
   at org.apache.tika.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:120)
   at org.apache.tika.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:153)
   at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:96)
   at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:38)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:724)
 Caused by: java.lang.ClassCastException: org.apache.pdfbox.cos.COSDictionary cannot be cast to org.apache.pdfbox.cos.COSStream
   at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getMetadata(PDDocumentCatalog.java:312)
   at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:181)
   at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:158)
   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:247)
   ... 13 more
 {code}
 here's the excerpt in the PDF:
 {code}
 241 0 obj << /Type /Metadata /Subtype /XML >> endobj
 {code}
 the current code is
 {code}
 COSStream stream = (COSStream)root.getDictionaryObject( COSName.METADATA );
 {code}
 shall we keep it that way or rather put out a warning if the meta data is not 
 a stream and return null? Adobe Reader does nothing when looking for the 
 properties.
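The "warn and return null" option raised here could look roughly like the sketch below. It is not the actual PDFBox fix: CosObject, CosDictionary, CosStream and MetadataLookup are stand-in types invented so the example compiles on its own, mirroring only the fact that a stream is a kind of dictionary.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Stand-in types so the sketch compiles on its own; they only mimic the
// relationship seen in the report (a stream is a kind of dictionary).
class CosObject {}

class CosDictionary extends CosObject {
    final Map<String, CosObject> items = new HashMap<>();
}

class CosStream extends CosDictionary {}

final class MetadataLookup {
    private static final Logger LOG = Logger.getLogger(MetadataLookup.class.getName());

    // Returns the metadata stream, or null (with a warning) when the entry
    // exists but is not a stream.
    static CosStream getMetadata(CosDictionary root) {
        CosObject entry = root.items.get("Metadata");
        if (entry instanceof CosStream) {
            return (CosStream) entry;
        }
        if (entry != null) {
            LOG.warning("Metadata entry is not a stream, ignoring it");
        }
        return null;
    }
}
```

Checking with instanceof before the cast turns the malformed entry from the excerpt (a plain dictionary without stream data) into a logged warning instead of a ClassCastException, which would match the lenient behaviour reported for Adobe Reader.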





[jira] [Updated] (PDFBOX-2366) Improve high-level font API

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-2366:

Affects Version/s: 2.0.0

 Improve high-level font API
 ---

 Key: PDFBOX-2366
 URL: https://issues.apache.org/jira/browse/PDFBOX-2366
 Project: PDFBox
  Issue Type: Improvement
  Components: PDModel
Affects Versions: 2.0.0
Reporter: John Hewson
Assignee: John Hewson
Priority: Minor
 Fix For: 2.0.0


 The PDFont and Type1Equivalent APIs could expose some higher-level details, 
 such as a consistent way to get names and Type1Equivalent instances.





[jira] [Created] (PDFBOX-2424) ClassCastException in getMetaData if no real meta data

2014-10-10 Thread Tilman Hausherr (JIRA)
Tilman Hausherr created PDFBOX-2424:
---

 Summary: ClassCastException in getMetaData if no real meta data
 Key: PDFBOX-2424
 URL: https://issues.apache.org/jira/browse/PDFBOX-2424
 Project: PDFBox
  Issue Type: Bug
Reporter: Tilman Hausherr


Here's an exception from [~talli...@apache.org] latest TIKA test (too lazy to 
test it myself, the cause is obvious) with the attached file:
{code}
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser
   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:249)
   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:247)
   at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
   at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:137)
   at org.apache.tika.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:120)
   at org.apache.tika.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:153)
   at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:96)
   at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:38)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.ClassCastException: org.apache.pdfbox.cos.COSDictionary cannot be cast to org.apache.pdfbox.cos.COSStream
   at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getMetadata(PDDocumentCatalog.java:312)
   at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:181)
   at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:158)
   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:247)
   ... 13 more
{code}
here's the excerpt in the PDF:
{code}
241 0 obj << /Type /Metadata /Subtype /XML >> endobj
{code}
the current code is
{code}
COSStream stream = (COSStream)root.getDictionaryObject( COSName.METADATA );
{code}
shall we keep it that way or rather put out a warning if the meta data is not a 
stream and return null? Adobe Reader does nothing when looking for the 
properties.





[jira] [Updated] (PDFBOX-2424) ClassCastException in getMetaData if no real meta data

2014-10-10 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2424:

Affects Version/s: 2.0.0
   1.8.8
   1.8.7

 ClassCastException in getMetaData if no real meta data
 --

 Key: PDFBOX-2424
 URL: https://issues.apache.org/jira/browse/PDFBOX-2424
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 1.8.7, 1.8.8, 2.0.0
Reporter: Tilman Hausherr

 Here's an exception from [~talli...@apache.org] latest TIKA test (too lazy to 
 test it myself, the cause is obvious) with the attached file:
 {code}
 org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser
   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:249)
   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:247)
   at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
   at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:137)
   at org.apache.tika.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:120)
   at org.apache.tika.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:153)
   at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:96)
   at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:38)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:724)
 Caused by: java.lang.ClassCastException: org.apache.pdfbox.cos.COSDictionary cannot be cast to org.apache.pdfbox.cos.COSStream
   at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getMetadata(PDDocumentCatalog.java:312)
   at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:181)
   at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:158)
   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:247)
   ... 13 more
 {code}
 here's the excerpt in the PDF:
 {code}
 241 0 obj << /Type /Metadata /Subtype /XML >> endobj
 {code}
 the current code is
 {code}
 COSStream stream = (COSStream)root.getDictionaryObject( COSName.METADATA );
 {code}
 shall we keep it that way or rather put out a warning if the meta data is not 
 a stream and return null? Adobe Reader does nothing when looking for the 
 properties.





[jira] [Updated] (PDFBOX-2142) some /ICCBased colorspaces not rendered correctly

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-2142:

Affects Version/s: 2.0.0

 some /ICCBased colorspaces not rendered correctly
 -

 Key: PDFBOX-2142
 URL: https://issues.apache.org/jira/browse/PDFBOX-2142
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
 Fix For: 2.0.0

 Attachments: PDFBOX-2142.pdf, PDFBOX-2142.pdf-1.png, PDFBOX-2142.ps


 I have created a test file from PostScript to show that -CIELAB and XYZ- some 
 colors are different when rendered by PDFBox.
 Btw the RGB colors in the file have no meaning, nor do the colors have a 
 relationship to each other, i.e. they do not have to look identical to 
 any other color anywhere.
 The postscript file was created based on files by [James 
 Cloos|http://jhcloos.com/PostScript/].





[jira] [Commented] (PDFBOX-2130) PDAnnotationLinks are empty after saving as in Acrobat

2014-10-10 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167332#comment-14167332
 ] 

John Hewson commented on PDFBOX-2130:
-

What are the Affects Version/s?

 PDAnnotationLinks are empty after saving as in Acrobat
 --

 Key: PDFBOX-2130
 URL: https://issues.apache.org/jira/browse/PDFBOX-2130
 Project: PDFBox
  Issue Type: Bug
  Components: Writing
Reporter: Andreas Weiss

 Hello dear PDFBox team,
 Do you have any idea how to fix the problem of links no longer working 
 after "Save As" in Acrobat? 
 This affects PDAnnotationLink with a go-to-page action and any 
 PDPageDestination (PDPageFitHeightDestination, etc.; the type doesn't matter).
 If you create a copy of the document using the "Save As" option, then the links 
 in the new document are empty: no properties, no destinations.
 It seems that Acrobat overwrites them. Only the links created with Acrobat 
 remain.
 Besides, the COSArrays of the destinations created manually in Acrobat and 
 those created automatically using PDFBox are slightly different.
 It's a big problem if you cannot save the document under another name without 
 the links being destroyed.





[jira] [Updated] (PDFBOX-2424) ClassCastException in getMetaData if no real meta data

2014-10-10 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2424:

Attachment: 333472.pdf

 ClassCastException in getMetaData if no real meta data
 --

 Key: PDFBOX-2424
 URL: https://issues.apache.org/jira/browse/PDFBOX-2424
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 1.8.7, 1.8.8, 2.0.0
Reporter: Tilman Hausherr
 Attachments: 333472.pdf


 Here's an exception from [~talli...@apache.org] latest TIKA test (too lazy to 
 test it myself, the cause is obvious) with the attached file:
 {code}
 org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser
   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:249)
   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:247)
   at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
   at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:137)
   at org.apache.tika.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:120)
   at org.apache.tika.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:153)
   at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:96)
   at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:38)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:724)
 Caused by: java.lang.ClassCastException: org.apache.pdfbox.cos.COSDictionary cannot be cast to org.apache.pdfbox.cos.COSStream
   at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getMetadata(PDDocumentCatalog.java:312)
   at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:181)
   at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:158)
   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:247)
   ... 13 more
 {code}
 here's the excerpt in the PDF:
 {code}
 241 0 obj << /Type /Metadata /Subtype /XML >> endobj
 {code}
 the current code is
 {code}
 COSStream stream = (COSStream)root.getDictionaryObject( COSName.METADATA );
 {code}
 Shall we keep it that way, or rather log a warning and return null if the 
 metadata is not a stream? Adobe Reader does nothing when looking for the 
 properties.
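
The "warn and return null" option could look roughly like the sketch below. CosBase, CosDictionary and CosStream are hypothetical stand-ins for the PDFBox COS classes so that the example compiles on its own; it is not the actual PDFBox code, and the method name is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical stand-ins for org.apache.pdfbox.cos.COSBase / COSDictionary / COSStream.
class CosBase { }

class CosDictionary extends CosBase {
    private final Map<String, CosBase> items = new HashMap<>();

    void setItem(String key, CosBase value) { items.put(key, value); }

    CosBase getDictionaryObject(String key) { return items.get(key); }
}

class CosStream extends CosDictionary { }

final class MetadataGuard {
    private static final Logger LOG = Logger.getLogger(MetadataGuard.class.getName());

    private MetadataGuard() { }

    /** Returns the /Metadata entry if it is a stream; otherwise logs a warning and returns null. */
    static CosStream metadataStreamOrNull(CosDictionary root) {
        CosBase entry = root.getDictionaryObject("Metadata");
        if (entry instanceof CosStream) {
            return (CosStream) entry;
        }
        if (entry != null) {
            // A plain dictionary here is what triggers the ClassCastException with an unchecked cast.
            LOG.warning("/Metadata is not a stream, ignoring it: " + entry.getClass().getSimpleName());
        }
        return null;
    }
}
```

With this shape, callers such as a text extractor would only need a null check instead of catching a RuntimeException.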





[jira] [Updated] (PDFBOX-1979) TypeTestingHelper is non-deterministic

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1979:

Fix Version/s: 2.0.0

 TypeTestingHelper is non-deterministic
 --

 Key: PDFBOX-1979
 URL: https://issues.apache.org/jira/browse/PDFBOX-1979
 Project: PDFBox
  Issue Type: Bug
  Components: XmpBox
Affects Versions: 1.8.7, 2.0.0
Reporter: John Hewson
Assignee: Guillaume Bailleul
 Fix For: 2.0.0

 Attachments: nd_test.patch


 TypeTestingHelper generates random calendar data and random UUIDs for 
 testing, which means that it is non-deterministic.
 As discussed in PDFBOX-1977, we should alter this test to make sure that it 
 has deterministic (regression test) functionality as well as the existing 
 non-deterministic (fuzz test) functionality.
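
A minimal sketch of how one data generator could serve both modes: all random values come from a java.util.Random created from an explicit seed. The class SeededTestData and its methods are hypothetical and not part of XmpBox.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Random;
import java.util.TimeZone;
import java.util.UUID;

final class SeededTestData {
    private final Random random;

    SeededTestData(long seed) {
        this.random = new Random(seed);
    }

    /**
     * Builds a UUID from two pseudo-random longs instead of calling UUID.randomUUID().
     * The version bits are not set, so this is only an opaque test value.
     */
    UUID nextUuid() {
        return new UUID(random.nextLong(), random.nextLong());
    }

    /** Builds a UTC calendar from a pseudo-random instant between 1970 and 2100. */
    Calendar nextCalendar() {
        long millis = (long) (random.nextDouble() * 4_102_444_800_000L);
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.setTimeInMillis(millis);
        return cal;
    }
}
```

A regression run would use a fixed seed such as new SeededTestData(42L); a fuzz run would pick a fresh seed and log it so that a failure can be replayed.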





[jira] [Updated] (PDFBOX-2168) Different behavior of Undo feature when form was pre filled by PDFBox

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-2168:

Fix Version/s: 2.0.0

 Different behavior of Undo feature when form was pre filled by PDFBox
 -

 Key: PDFBOX-2168
 URL: https://issues.apache.org/jira/browse/PDFBOX-2168
 Project: PDFBox
  Issue Type: Bug
  Components: AcroForm
Reporter: Maruan Sahyoun
Priority: Minor
  Labels: Appearance
 Fix For: 2.0.0

 Attachments: formtemplate-filled-pdfbox.pdf, 
 formtemplate-filled-reader.pdf, formtemplate.pdf


 When a form is prefilled by PDFBox, the Undo feature in Adobe Reader resets 
 the field value but does not change the visible appearance of the field, i.e. 
 the old value is still visible. The same form filled by Adobe 
 Reader/Acrobat behaves correctly.





[jira] [Updated] (PDFBOX-1987) Provide a PDF Lexer as a base for PDF parsing

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1987:

Affects Version/s: 2.0.0

 Provide a PDF Lexer as a base for PDF parsing
 -

 Key: PDFBOX-1987
 URL: https://issues.apache.org/jira/browse/PDFBOX-1987
 Project: PDFBox
  Issue Type: Improvement
  Components: Parsing
Affects Versions: 2.0.0
Reporter: Maruan Sahyoun
Priority: Minor
 Fix For: 2.0.0

 Attachments: src.zip


 In order to enhance the parsing process, and as a foundation for combining 
 the different parsers, a PDF lexer should be provided.
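
To illustrate what such a lexer does, here is a deliberately tiny sketch. It is not the code in the attached src.zip; the class TinyPdfLexer is hypothetical, and it only recognizes dictionary and array delimiters, names, numbers and keywords. Literal strings, hex strings, comments and streams are not handled.

```java
import java.util.ArrayList;
import java.util.List;

final class TinyPdfLexer {
    private TinyPdfLexer() { }

    /** Splits input into tokens: "<<", ">>", "[", "]", names such as "/Type", numbers and keywords. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (isWhitespace(c)) {
                i++;
            } else if (input.startsWith("<<", i) || input.startsWith(">>", i)) {
                tokens.add(input.substring(i, i + 2));
                i += 2;
            } else if (c == '[' || c == ']') {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                // A name, number or keyword runs until the next whitespace or delimiter.
                int start = i++;
                while (i < input.length()
                        && !isWhitespace(input.charAt(i))
                        && !isDelimiter(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            }
        }
        return tokens;
    }

    private static boolean isWhitespace(char c) {
        return c == 0 || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
    }

    private static boolean isDelimiter(char c) {
        return "()<>[]{}/%".indexOf(c) >= 0;
    }
}
```

The different parsers could then share one token stream instead of each reading raw bytes, which is the kind of common foundation the request describes.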





[jira] [Updated] (PDFBOX-1979) TypeTestingHelper is non-deterministic

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1979:

Affects Version/s: 2.0.0
   1.8.7

 TypeTestingHelper is non-deterministic
 --

 Key: PDFBOX-1979
 URL: https://issues.apache.org/jira/browse/PDFBOX-1979
 Project: PDFBox
  Issue Type: Bug
  Components: XmpBox
Affects Versions: 1.8.7, 2.0.0
Reporter: John Hewson
Assignee: Guillaume Bailleul
 Attachments: nd_test.patch


 TypeTestingHelper generates random calendar data and random UUIDs for 
 testing, which means that it is non-deterministic.
 As discussed in PDFBOX-1977, we should alter this test to make sure that it 
 has deterministic (regression test) functionality as well as the existing 
 non-deterministic (fuzz test) functionality.





[jira] [Updated] (PDFBOX-1978) Type1FontUtilTest is non-deterministic

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1978:

Affects Version/s: 2.0.0
   1.8.7

 Type1FontUtilTest is non-deterministic
 --

 Key: PDFBOX-1978
 URL: https://issues.apache.org/jira/browse/PDFBOX-1978
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 1.8.7, 2.0.0
Reporter: John Hewson
 Fix For: 2.0.0


 Type1FontUtilTest uses java.util.Random to generate random test data, which 
 means that it is non-deterministic.
 As discussed in PDFBOX-1977, we should alter this test to make sure that it 
 has deterministic (regression test) functionality as well as the existing 
 non-deterministic (fuzz test) functionality.
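
One possible shape for a test that has both behaviours is sketched below: the same check runs once with a fixed seed (regression) and once with a fresh seed that is printed so a failure can be reproduced (fuzz). The transform method is a hypothetical stand-in for the code under test, not anything from FontBox.

```java
import java.util.Arrays;
import java.util.Random;

final class SeededRoundTripCheck {
    private SeededRoundTripCheck() { }

    /** Hypothetical stand-in for the code under test: XOR with a fixed key is its own inverse. */
    static byte[] transform(byte[] data) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ 0x5A);
        }
        return out;
    }

    /** Runs one round trip on data derived from the seed; the seed is reported on failure. */
    static void checkRoundTrip(long seed) {
        Random random = new Random(seed);
        byte[] data = new byte[1 + random.nextInt(256)];
        random.nextBytes(data);
        byte[] back = transform(transform(data));
        if (!Arrays.equals(data, back)) {
            throw new AssertionError("round trip failed for seed " + seed);
        }
    }

    public static void main(String[] args) {
        checkRoundTrip(12345L);          // regression: same data on every run
        long seed = System.nanoTime();   // fuzz: new data on every run
        System.out.println("fuzz seed = " + seed);
        checkRoundTrip(seed);
    }
}
```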





[jira] [Updated] (PDFBOX-1978) Type1FontUtilTest is non-deterministic

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1978:

Fix Version/s: 2.0.0

 Type1FontUtilTest is non-deterministic
 --

 Key: PDFBOX-1978
 URL: https://issues.apache.org/jira/browse/PDFBOX-1978
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 1.8.7, 2.0.0
Reporter: John Hewson
 Fix For: 2.0.0


 Type1FontUtilTest uses java.util.Random to generate random test data, which 
 means that it is non-deterministic.
 As discussed in PDFBOX-1977, we should alter this test to make sure that it 
 has deterministic (regression test) functionality as well as the existing 
 non-deterministic (fuzz test) functionality.





[jira] [Commented] (PDFBOX-1873) NoninvertibleTransformException if form field isn't set with Scroll long text option

2014-10-10 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167336#comment-14167336
 ] 

John Hewson commented on PDFBOX-1873:
-

The Affects Version/s were not given, so I'm guessing this is a 1.8 bug and 
should be fixed in at least 2.0, if not 1.8.

 NoninvertibleTransformException if form field isn't set with Scroll long 
 text option
 --

 Key: PDFBOX-1873
 URL: https://issues.apache.org/jira/browse/PDFBOX-1873
 Project: PDFBox
  Issue Type: Bug
  Components: AcroForm
Affects Versions: 1.8.7, 2.0.0
Reporter: Álison Fernandes
Priority: Critical
 Fix For: 2.0.0


 When a PDF with a form field is created in Adobe Acrobat X Pro and the field 
 doesn't have the Scroll long text option set (which can be set in the Options 
 tab of the field's Properties), PDFBox throws a long series of the same 
 exception (probably one for each char being drawn).
 Exception:
 Jan 31, 2014 10:31:59 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont writeFont
 SEVERE: Error in org.apache.pdfbox.pdmodel.font.PDType1Font.writeFont
 java.awt.geom.NoninvertibleTransformException: Determinant is 0
   at java.awt.geom.AffineTransform.createInverse(AffineTransform.java:2707)
   at org.apache.pdfbox.pdmodel.font.PDSimpleFont.writeFont(PDSimpleFont.java:339)
   at org.apache.pdfbox.pdmodel.font.PDSimpleFont.drawString(PDSimpleFont.java:147)
   at org.apache.pdfbox.pdfviewer.PageDrawer.processTextPosition(PageDrawer.java:246)
   at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:496)
   at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
   at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
   at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:156)
   at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:801)
   at org.xpandit.vvp.signedpdf.PdfboxUtils.renderToPanel(PdfboxUtils.java:186)
   at org.xpandit.vvp.signedpdf.PDFApplet.repaintPDF(PDFApplet.java:843)
   at org.xpandit.vvp.signedpdf.PDFApplet.actionPerformed(PDFApplet.java:925)
   at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2018)
   at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2341)
   at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
   at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
   at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:252)
   at java.awt.Component.processMouseEvent(Component.java:6505)
   at javax.swing.JComponent.processMouseEvent(JComponent.java:3320)
   at java.awt.Component.processEvent(Component.java:6270)
   at java.awt.Container.processEvent(Container.java:2229)
   at java.awt.Component.dispatchEventImpl(Component.java:4861)
   at java.awt.Container.dispatchEventImpl(Container.java:2287)
   at java.awt.Component.dispatchEvent(Component.java:4687)
   at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4832)
   at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4492)
   at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4422)
   at java.awt.Container.dispatchEventImpl(Container.java:2273)
   at java.awt.Component.dispatchEvent(Component.java:4687)
   at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:735)
   at java.awt.EventQueue.access$200(EventQueue.java:103)
   at java.awt.EventQueue$3.run(EventQueue.java:694)
   at java.awt.EventQueue$3.run(EventQueue.java:692)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
   at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:87)
   at java.awt.EventQueue$4.run(EventQueue.java:708)
   at java.awt.EventQueue$4.run(EventQueue.java:706)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
   at java.awt.EventQueue.dispatchEvent(EventQueue.java:705)
   at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:242)
   at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:161)
   at 
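
For readers unfamiliar with the exception: AffineTransform.createInverse() throws NoninvertibleTransformException when the matrix has a zero determinant, for example when a scale factor is 0. Whether a zero scale or font size is the actual cause in this report is an assumption, not something the trace confirms. The helper below (hypothetical class SafeInverse) is only a sketch of a defensive call, not the PDFBox fix.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.NoninvertibleTransformException;

final class SafeInverse {
    private SafeInverse() { }

    /** Returns the inverse of the transform, or null if it cannot be inverted. */
    static AffineTransform inverseOrNull(AffineTransform transform) {
        try {
            return transform.createInverse();
        } catch (NoninvertibleTransformException e) {
            // Thrown when the determinant is zero, e.g. for a scale factor of 0.
            return null;
        }
    }
}
```

A caller that draws text could skip the glyph when null is returned rather than logging one SEVERE entry per character.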
 

[jira] [Updated] (PDFBOX-1873) NoninvertibleTransformException if form field isn't set with Scroll long text option

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1873:

Fix Version/s: 2.0.0

 NoninvertibleTransformException if form field isn't set with Scroll long 
 text option
 --

 Key: PDFBOX-1873
 URL: https://issues.apache.org/jira/browse/PDFBOX-1873
 Project: PDFBox
  Issue Type: Bug
  Components: AcroForm
Affects Versions: 1.8.7, 2.0.0
Reporter: Álison Fernandes
Priority: Critical
 Fix For: 2.0.0


 When a PDF with a form field is created in Adobe Acrobat X Pro and the field 
 doesn't have the Scroll long text option set (which can be set in the Options 
 tab of the field's Properties), PDFBox throws a long series of the same 
 exception (probably one for each char being drawn).
 Exception:
 Jan 31, 2014 10:31:59 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont writeFont
 SEVERE: Error in org.apache.pdfbox.pdmodel.font.PDType1Font.writeFont
 java.awt.geom.NoninvertibleTransformException: Determinant is 0
   at java.awt.geom.AffineTransform.createInverse(AffineTransform.java:2707)
   at org.apache.pdfbox.pdmodel.font.PDSimpleFont.writeFont(PDSimpleFont.java:339)
   at org.apache.pdfbox.pdmodel.font.PDSimpleFont.drawString(PDSimpleFont.java:147)
   at org.apache.pdfbox.pdfviewer.PageDrawer.processTextPosition(PageDrawer.java:246)
   at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:496)
   at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
   at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
   at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:156)
   at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:801)
   at org.xpandit.vvp.signedpdf.PdfboxUtils.renderToPanel(PdfboxUtils.java:186)
   at org.xpandit.vvp.signedpdf.PDFApplet.repaintPDF(PDFApplet.java:843)
   at org.xpandit.vvp.signedpdf.PDFApplet.actionPerformed(PDFApplet.java:925)
   at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2018)
   at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2341)
   at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
   at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
   at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:252)
   at java.awt.Component.processMouseEvent(Component.java:6505)
   at javax.swing.JComponent.processMouseEvent(JComponent.java:3320)
   at java.awt.Component.processEvent(Component.java:6270)
   at java.awt.Container.processEvent(Container.java:2229)
   at java.awt.Component.dispatchEventImpl(Component.java:4861)
   at java.awt.Container.dispatchEventImpl(Container.java:2287)
   at java.awt.Component.dispatchEvent(Component.java:4687)
   at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4832)
   at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4492)
   at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4422)
   at java.awt.Container.dispatchEventImpl(Container.java:2273)
   at java.awt.Component.dispatchEvent(Component.java:4687)
   at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:735)
   at java.awt.EventQueue.access$200(EventQueue.java:103)
   at java.awt.EventQueue$3.run(EventQueue.java:694)
   at java.awt.EventQueue$3.run(EventQueue.java:692)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
   at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:87)
   at java.awt.EventQueue$4.run(EventQueue.java:708)
   at java.awt.EventQueue$4.run(EventQueue.java:706)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
   at java.awt.EventQueue.dispatchEvent(EventQueue.java:705)
   at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:242)
   at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:161)
   at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:150)
   at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:146)
   at 

[jira] [Updated] (PDFBOX-1873) NoninvertibleTransformException if form field isn't set with Scroll long text option

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1873:

Affects Version/s: 2.0.0
   1.8.7

 NoninvertibleTransformException if form field isn't set with Scroll long 
 text option
 --

 Key: PDFBOX-1873
 URL: https://issues.apache.org/jira/browse/PDFBOX-1873
 Project: PDFBox
  Issue Type: Bug
  Components: AcroForm
Affects Versions: 1.8.7, 2.0.0
Reporter: Álison Fernandes
Priority: Critical
 Fix For: 2.0.0


 When a PDF with a form field is created in Adobe Acrobat X Pro and the field 
 doesn't have the Scroll long text option set (which can be set in the Options 
 tab of the field's Properties), PDFBox throws a long series of the same 
 exception (probably one for each char being drawn).
 Exception:
 Jan 31, 2014 10:31:59 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont writeFont
 SEVERE: Error in org.apache.pdfbox.pdmodel.font.PDType1Font.writeFont
 java.awt.geom.NoninvertibleTransformException: Determinant is 0
   at java.awt.geom.AffineTransform.createInverse(AffineTransform.java:2707)
   at org.apache.pdfbox.pdmodel.font.PDSimpleFont.writeFont(PDSimpleFont.java:339)
   at org.apache.pdfbox.pdmodel.font.PDSimpleFont.drawString(PDSimpleFont.java:147)
   at org.apache.pdfbox.pdfviewer.PageDrawer.processTextPosition(PageDrawer.java:246)
   at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:496)
   at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
   at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
   at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:156)
   at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:801)
   at org.xpandit.vvp.signedpdf.PdfboxUtils.renderToPanel(PdfboxUtils.java:186)
   at org.xpandit.vvp.signedpdf.PDFApplet.repaintPDF(PDFApplet.java:843)
   at org.xpandit.vvp.signedpdf.PDFApplet.actionPerformed(PDFApplet.java:925)
   at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2018)
   at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2341)
   at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
   at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
   at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:252)
   at java.awt.Component.processMouseEvent(Component.java:6505)
   at javax.swing.JComponent.processMouseEvent(JComponent.java:3320)
   at java.awt.Component.processEvent(Component.java:6270)
   at java.awt.Container.processEvent(Container.java:2229)
   at java.awt.Component.dispatchEventImpl(Component.java:4861)
   at java.awt.Container.dispatchEventImpl(Container.java:2287)
   at java.awt.Component.dispatchEvent(Component.java:4687)
   at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4832)
   at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4492)
   at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4422)
   at java.awt.Container.dispatchEventImpl(Container.java:2273)
   at java.awt.Component.dispatchEvent(Component.java:4687)
   at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:735)
   at java.awt.EventQueue.access$200(EventQueue.java:103)
   at java.awt.EventQueue$3.run(EventQueue.java:694)
   at java.awt.EventQueue$3.run(EventQueue.java:692)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
   at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:87)
   at java.awt.EventQueue$4.run(EventQueue.java:708)
   at java.awt.EventQueue$4.run(EventQueue.java:706)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
   at java.awt.EventQueue.dispatchEvent(EventQueue.java:705)
   at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:242)
   at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:161)
   at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:150)
   at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:146)
   at 

[jira] [Updated] (PDFBOX-1842) Warn if command-line pdf encryption destroys a pre-existing signature

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1842:

Fix Version/s: 2.0.0

 Warn if command-line pdf encryption destroys a pre-existing signature
 -

 Key: PDFBOX-1842
 URL: https://issues.apache.org/jira/browse/PDFBOX-1842
 Project: PDFBox
  Issue Type: Bug
  Components: Utilities
Affects Versions: 1.8.7, 2.0.0
Reporter: Ralf Hauser
Priority: Minor
 Fix For: 2.0.0


 See also PDFBOX-1594, PDFBOX-912





[jira] [Updated] (PDFBOX-1863) Can't resize PDFPagePanel render

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1863:

Affects Version/s: 2.0.0
   1.8.7

 Can't resize PDFPagePanel render
 

 Key: PDFBOX-1863
 URL: https://issues.apache.org/jira/browse/PDFBOX-1863
 Project: PDFBox
  Issue Type: Bug
  Components: Swing GUI
Affects Versions: 1.8.7, 2.0.0
Reporter: Álison Fernandes

 I tried to use PDFPagePanel to render a PDF in an applet, but I had to change 
 my implementation because PDFPagePanel could not resize the rendering to make 
 it bigger. 
 I've checked the source code (of pdfbox-1.8.2 and the SVN trunk): the 
 Dimension drawDimension variable that sets the rendering size isn't accessible 
 from outside, and the panel draws using the dimensions of the PDF CropBox.
 My current workaround is:
 - Create a JPanel
 - Render the page to an image using PDPage.convertToImage(...)
 - Add the image to the JPanel using JLabel picLabel = new JLabel(new 
 ImageIcon(page.convertToImage(...)));
 - Repeat for all the pages
 - Set the panel as a viewport in a JScrollPane
 Unfortunately, this method takes far too long if you have to render 
 things multiple times (~1 second for more complex pages with forms).
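
One way to soften the repeated-render cost of the workaround above is to render each page once and reuse the image. The sketch below assumes the page content does not change between repaints; PageImageCache is a hypothetical helper, and the renderer function stands in for a call such as PDPage.convertToImage(...).

```java
import java.awt.image.BufferedImage;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

final class PageImageCache {
    private final Map<Integer, BufferedImage> cache = new HashMap<>();
    private final IntFunction<BufferedImage> renderer;
    private int renderCount;

    /** The renderer is a hypothetical stand-in for a call such as PDPage.convertToImage(...). */
    PageImageCache(IntFunction<BufferedImage> renderer) {
        this.renderer = renderer;
    }

    /** Renders the page on first request and returns the cached image afterwards. */
    BufferedImage get(int pageIndex) {
        BufferedImage image = cache.get(pageIndex);
        if (image == null) {
            image = renderer.apply(pageIndex);
            renderCount++;
            cache.put(pageIndex, image);
        }
        return image;
    }

    /** Number of times the renderer was actually invoked. */
    int renderCount() {
        return renderCount;
    }
}
```

The cache is unbounded, so for large documents an eviction policy would be needed; pages whose form fields change would also have to be invalidated.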





[jira] [Updated] (PDFBOX-1842) Warn if command-line pdf encryption destroys a pre-existing signature

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1842:

Affects Version/s: 2.0.0
   1.8.7

 Warn if command-line pdf encryption destroys a pre-existing signature
 -

 Key: PDFBOX-1842
 URL: https://issues.apache.org/jira/browse/PDFBOX-1842
 Project: PDFBox
  Issue Type: Bug
  Components: Utilities
Affects Versions: 1.8.7, 2.0.0
Reporter: Ralf Hauser
Priority: Minor
 Fix For: 2.0.0


 See also PDFBOX-1594, PDFBOX-912





[jira] [Updated] (PDFBOX-1833) BaseParser tidy up

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1833:

Affects Version/s: 2.0.0

 BaseParser tidy up
 --

 Key: PDFBOX-1833
 URL: https://issues.apache.org/jira/browse/PDFBOX-1833
 Project: PDFBox
  Issue Type: Improvement
  Components: Parsing
Affects Versions: 2.0.0
Reporter: Jens Kapitza
Priority: Minor
 Fix For: 2.0.0

 Attachments: baseparser.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 Tidy up logic (should not change the parsing result).
 Character.isWhitespace(c) is the only point which may have side effects (but I 
 assume there is no file separator in parseCOSHexString),
 so this should pass as it did before.
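
The concern about Character.isWhitespace can be made concrete by listing the byte values on which it disagrees with the white-space set of the PDF specification (NUL, HT, LF, FF, CR, SP). The sketch below is independent of the patch; the class WhitespaceDiff is hypothetical, and whether these bytes can actually reach parseCOSHexString is not checked here.

```java
import java.util.ArrayList;
import java.util.List;

final class WhitespaceDiff {
    private WhitespaceDiff() { }

    /** White-space characters as defined by the PDF specification: NUL, HT, LF, FF, CR, SP. */
    static boolean isPdfWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    /** Returns the byte values (0-255) on which Character.isWhitespace and the PDF definition disagree. */
    static List<Integer> disagreements() {
        List<Integer> result = new ArrayList<>();
        for (int c = 0; c < 256; c++) {
            if (Character.isWhitespace(c) != isPdfWhitespace(c)) {
                result.add(c);
            }
        }
        return result;
    }
}
```

The result is 0, 11 and 28 to 31: NUL is PDF white space but not Java white space, while vertical tab and the file, group, record and unit separators are Java white space only.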



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PDFBOX-1833) BaseParser tidy up

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1833:

Fix Version/s: 2.0.0

 BaseParser tidy up
 --

 Key: PDFBOX-1833
 URL: https://issues.apache.org/jira/browse/PDFBOX-1833
 Project: PDFBox
  Issue Type: Improvement
  Components: Parsing
Affects Versions: 2.0.0
Reporter: Jens Kapitza
Priority: Minor
 Fix For: 2.0.0

 Attachments: baseparser.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 Tidy up logic (should not change the parsing result).
 Character.isWhitespace(c) is the only point which may have side effects (but I 
 assume there is no file separator in parseCOSHexString),
 so this should pass as it did before.





[jira] [Resolved] (PDFBOX-1788) [PATCH] Show warning if system font not found

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson resolved PDFBOX-1788.
-
Resolution: Won't Fix

We're no longer using AWT fonts in 2.0, so this patch no longer applies.

 [PATCH] Show warning if system font not found
 -

 Key: PDFBOX-1788
 URL: https://issues.apache.org/jira/browse/PDFBOX-1788
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Reporter: simon steiner
 Attachments: warnmissingfonts.patch


 If you process a PDF which doesn't embed a font, PDFBox will try to use a system 
 font, but that font may not exist, so we should print a warning.
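
The requested behaviour amounts to a lookup with a logged fallback. The sketch below is not the attached patch and does not use AWT or PDFBox; the class FontFallback and the map of available fonts (font name to file path) are assumptions made for illustration.

```java
import java.util.Map;
import java.util.logging.Logger;

final class FontFallback {
    private static final Logger LOG = Logger.getLogger(FontFallback.class.getName());

    private FontFallback() { }

    /**
     * Looks up a font by name in the given map of available fonts (name to font file path);
     * logs a warning and returns the fallback path when the font is not available.
     */
    static String resolve(String fontName, Map<String, String> available, String fallbackPath) {
        String path = available.get(fontName);
        if (path == null) {
            LOG.warning("Font '" + fontName + "' is not embedded and was not found on this system; using "
                    + fallbackPath + " instead");
            return fallbackPath;
        }
        return path;
    }
}
```

A warning of this kind makes it visible when the same PDF is processed differently on two machines because only one of them has the font installed.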





[jira] [Updated] (PDFBOX-1710) PDF structure report as XML

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1710:

Affects Version/s: 1.8.2

 PDF structure report as XML
 ---

 Key: PDFBOX-1710
 URL: https://issues.apache.org/jira/browse/PDFBOX-1710
 Project: PDFBox
  Issue Type: New Feature
  Components: Utilities
Affects Versions: 1.8.2
Reporter: Axel Rose
Priority: Minor
 Attachments: PdfAnalysis.java


 I wrote a utility to get an XML report of a PDF input file.
 Please see the attached source code and check if it can be incorporated into 
 a pdfbox release.
 I'm happy to hear about problems in my code and tasks to correct this.
 Test on a command line like this:
 java -cp pdfbox-1.8.2.jar:commons-logging-1.0.4.jar:bcprov-jdk16-145.jar:. PdfAnalysis file.pdf





[jira] [Closed] (PDFBOX-1579) add logging if /FontFile2 entry is missing and a system font is used instead

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson closed PDFBOX-1579.
---
Resolution: Won't Fix

We're no longer using AWT fonts in 2.0, so this patch no longer applies.

 add logging if /FontFile2 entry is missing and a system font is used instead
 

 Key: PDFBOX-1579
 URL: https://issues.apache.org/jira/browse/PDFBOX-1579
 Project: PDFBox
  Issue Type: Improvement
  Components: PDModel
Reporter: Luis Bernardo
Priority: Trivial
 Attachments: pdfbox-1579.patch


 This issue became apparent when output from a PDF that had a non-embedded TTF 
 (/FontFile2 entry missing) differed between machines. After 
 investigation I found out PDFBox was using a system font, but on one machine 
 the font was installed and on the other it wasn't. 





[jira] [Commented] (PDFBOX-1573) pdf text highlighting

2014-10-10 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167345#comment-14167345
 ] 

John Hewson commented on PDFBOX-1573:
-

Didn't we add an example to 1.8 which does this?

 pdf text highlighting
 -

 Key: PDFBOX-1573
 URL: https://issues.apache.org/jira/browse/PDFBOX-1573
 Project: PDFBox
  Issue Type: New Feature
  Components: Utilities
Reporter: Arun

 Please add a method which returns a list of coordinates for the given text:
 find all locations of the text and determine their x/y coordinates and width/height.
 The feature is similar to https://pdfclown.wordpress.com/tag/text-highlighting/
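
The core of the request (find every occurrence of a text and report its rectangle) can be sketched without PDFBox once character positions are known. The Glyph class below is a hypothetical stand-in for whatever positioned-text objects the extractor provides, and the sketch assumes the glyphs arrive in reading order with no line breaks inside a match.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

final class TextLocator {
    /** Hypothetical positioned glyph: one character plus its bounding box on the page. */
    static final class Glyph {
        final char ch;
        final Rectangle2D.Float box;

        Glyph(char ch, float x, float y, float width, float height) {
            this.ch = ch;
            this.box = new Rectangle2D.Float(x, y, width, height);
        }
    }

    private TextLocator() { }

    /** Returns one bounding rectangle per occurrence of the search text in the glyph sequence. */
    static List<Rectangle2D> findAll(List<Glyph> glyphs, String text) {
        List<Rectangle2D> hits = new ArrayList<>();
        if (text.isEmpty()) {
            return hits;
        }
        StringBuilder sb = new StringBuilder();
        for (Glyph g : glyphs) {
            sb.append(g.ch);
        }
        String all = sb.toString();
        for (int start = all.indexOf(text); start >= 0; start = all.indexOf(text, start + 1)) {
            // Union of the boxes of all glyphs that make up this occurrence.
            Rectangle2D bounds = new Rectangle2D.Float();
            bounds.setRect(glyphs.get(start).box);
            for (int i = start + 1; i < start + text.length(); i++) {
                bounds.add(glyphs.get(i).box);
            }
            hits.add(bounds);
        }
        return hits;
    }
}
```

Each returned rectangle gives the x/y position and width/height that a highlight annotation would need.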





[jira] [Updated] (PDFBOX-1573) pdf text highlighting

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1573:

Priority: Minor  (was: Major)

 pdf text highlighting
 -

 Key: PDFBOX-1573
 URL: https://issues.apache.org/jira/browse/PDFBOX-1573
 Project: PDFBox
  Issue Type: New Feature
  Components: Utilities
Affects Versions: 1.8.6, 2.0.0
Reporter: Arun
Priority: Minor

 Please add a method which returns a list of coordinates for the given text:
 find all locations of the text and determine their x/y coordinates and width/height.
 The feature is similar to https://pdfclown.wordpress.com/tag/text-highlighting/





[jira] [Updated] (PDFBOX-1573) pdf text highlighting

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1573:

Affects Version/s: 2.0.0
   1.8.6

 pdf text highlighting
 -

 Key: PDFBOX-1573
 URL: https://issues.apache.org/jira/browse/PDFBOX-1573
 Project: PDFBox
  Issue Type: New Feature
  Components: Utilities
Affects Versions: 1.8.6, 2.0.0
Reporter: Arun

 Please add a method which returns a list of coordinates for the given text:
 find all locations of the text and determine their x/y coordinates and width/height.
 The feature is similar to https://pdfclown.wordpress.com/tag/text-highlighting/





[jira] [Closed] (PDFBOX-1537) [PATCH] Java crash, Type 2 CID Fonts and image alpha channels not properly handled in Imported PDFs

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson closed PDFBOX-1537.
---
Resolution: Won't Fix

1.8's font handling isn't going to receive any further attention.

 [PATCH] Java crash, Type 2 CID Fonts and image alpha channels not properly 
 handled in Imported PDFs
 ---

 Key: PDFBOX-1537
 URL: https://issues.apache.org/jira/browse/PDFBOX-1537
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Reporter: simon steiner
 Attachments: pdfinpstrunk.patch, simontest.pdf, test.fo


 Running from FOP:
 fop test.fo -ps out.ps
 The CID font causes a crash; if I disable CID, the font is wrong and the image is inverted.





[jira] [Closed] (PDFBOX-1534) Graphics2D to create PDF

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson closed PDFBOX-1534.
---
Resolution: Won't Fix

No.

 Graphics2D to create PDF
 

 Key: PDFBOX-1534
 URL: https://issues.apache.org/jira/browse/PDFBOX-1534
 Project: PDFBox
  Issue Type: New Feature
  Components: Writing
Reporter: Samuel Pfitzer
Priority: Minor

 Apache FOP has a PDFGraphics2D. It is used for drawing into a pdf document. 
 Drawing into an image is not an alternative because the size of the document 
 gets too big.
 Is it planned to have this feature for PDFBox?





[jira] [Updated] (PDFBOX-1450) document how to encrypt with AES 256

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1450:

Fix Version/s: 2.0.0

 document how to encrypt with AES 256
 

 Key: PDFBOX-1450
 URL: https://issues.apache.org/jira/browse/PDFBOX-1450
 Project: PDFBox
  Issue Type: Wish
  Components: Documentation
Affects Versions: 2.0.0
Reporter: Ralf Hauser
Priority: Minor
 Fix For: 2.0.0


 Please add a Java code sample showing how to do this to the website and link 
 to it from 
 http://pdfbox.apache.org/commandlineutilities/Encrypt.html
 See also PDFBOX-953 and PDFBOX-135.





[jira] [Updated] (PDFBOX-1450) document how to encrypt with AES 256

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1450:

Affects Version/s: 2.0.0

 document how to encrypt with AES 256
 

 Key: PDFBOX-1450
 URL: https://issues.apache.org/jira/browse/PDFBOX-1450
 Project: PDFBox
  Issue Type: Wish
  Components: Documentation
Affects Versions: 2.0.0
Reporter: Ralf Hauser
Priority: Minor
 Fix For: 2.0.0


 Please add a Java code sample showing how to do this to the website and link 
 to it from 
 http://pdfbox.apache.org/commandlineutilities/Encrypt.html
 See also PDFBOX-953 and PDFBOX-135.





[jira] [Resolved] (PDFBOX-1409) Create Preflight documentation

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson resolved PDFBOX-1409.
-
Resolution: Fixed

Close enough.

 Create Preflight documentation
 --

 Key: PDFBOX-1409
 URL: https://issues.apache.org/jira/browse/PDFBOX-1409
 Project: PDFBox
  Issue Type: Task
  Components: Preflight
Reporter: Eric Leleu
Priority: Minor

 Add documentation about the preflight module. 





[jira] [Resolved] (PDFBOX-1386) Proposal for classes to handle optional contents

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson resolved PDFBOX-1386.
-
Resolution: Won't Fix

We already have classes for using optional content in the 
org.apache.pdfbox.pdmodel.graphics.optionalcontent package.

 Proposal for classes to handle optional contents
 

 Key: PDFBOX-1386
 URL: https://issues.apache.org/jira/browse/PDFBOX-1386
 Project: PDFBox
  Issue Type: New Feature
  Components: PDModel
Reporter: Dominic Tubach
Priority: Minor
 Attachments: DTCOSName.java, DTPDContentUsageDictionary.java, 
 DTPDContentUsageDictionaryTest.java, DTPDOptionalContentConfiguration.java, 
 DTPDOptionalContentConfigurationTest.java, DTPDOptionalContentGroup.java, 
 DTPDOptionalContentGroupTest.java, 
 DTPDOptionalContentMembershipDictionary.java, 
 DTPDOptionalContentMembershipDictionaryTest.java, 
 DTPDOptionalContentProperties.java, DTPDOptionalContentPropertiesTest.java, 
 DTPDUsageApplicationDictionary.java, DTPDUsageApplicationDictionaryTest.java


 Attached are classes as proposal to handle optional contents.
 It requires the classes in the issues #PDFBOX-1383 and #PDFBOX-1385
 It requires Java 1.6 (It might be enough to remove the @Override annotations 
 for Java 1.5 compatibility.) 





[jira] [Closed] (PDFBOX-1385) Proposal for a PD tree that represents a tree based on arrays.

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson closed PDFBOX-1385.
---
Resolution: Won't Fix

2.0 has taken a different direction with the handling of trees.

 Proposal for a PD tree that represents a tree based on arrays.
 --

 Key: PDFBOX-1385
 URL: https://issues.apache.org/jira/browse/PDFBOX-1385
 Project: PDFBox
  Issue Type: New Feature
  Components: PDModel
Reporter: Dominic Tubach
Priority: Minor
 Attachments: DTPDTreeIntermediateNode.java, DTPDTreeLabeledNode.java, 
 DTPDTreeLeafNode.java, DTPDTreeNode.java, DTPDTreeNodeTest.java


 Attached is a proposal for a PD tree that represents a tree that is based on 
 arrays (such as RBGroups).
 The required COSArrayList and COSBaseConverter can be found in issue 
 #PDFBOX-1383
 It requires Java 1.6 (It might be enough to remove the @Override annotations 
 for Java 1.5 compatibility.)





[jira] [Updated] (PDFBOX-1383) Proposal for a new COSArrayList

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1383:

Affects Version/s: 2.0.0
   1.8.7

 Proposal for a new COSArrayList
 ---

 Key: PDFBOX-1383
 URL: https://issues.apache.org/jira/browse/PDFBOX-1383
 Project: PDFBox
  Issue Type: Improvement
  Components: PDModel
Affects Versions: 1.8.7, 2.0.0
Reporter: Dominic Tubach
Priority: Minor
 Fix For: 2.0.0

 Attachments: DTCOSArrayList.java, DTCOSArrayListTest.java, 
 DTCOSBaseConverter.java, DefaultDTCOSBaseConverter.java, 
 DefaultDTCOSBaseConverterTest.java


 Attached is a proposal for a new COSArrayList.
 Main differences to the existing COSArrayList:
 - type safety through generics.
 - it's always clear which types of objects the array holds.
 - flexible loading of objects from a dictionary through COSBaseConverter (see 
 below).
 - correct updating of dictionary entry, no matter whether it is optional, a 
 single value is allowed, or it is required.
 - listener interface.
 However there are some drawbacks:
 - it allows only classes/interfaces that implement/extend COSObjectable.
 - DualCOSObjectables are not possible. (Would require an extra class.)
 - no Java types such as String or Float (I see this as an advantage as I was a 
 bit confused when I expected an Array with COSNames, but got Strings. By the 
 way adding a String in that case would not add a COSName as one might expect, 
 but a COSString.)
 - replacing the existing COSArrayList would require changes in existing code.
 - requires (as of now) Java 1.6 (It might be enough to remove the @Override 
 annotations for Java 1.5 compatibility.)
 Now to the COSBaseConverter. The COSBaseConverter is just an interface that 
 defines a conversion method to convert a COSBase object to a class that 
 implements COSObjectable.
 The default implementation tries to find a fitting constructor to instantiate 
 the object.
 If the destination class is an Enum it tries to find a fitting static valueOf 
 method to create the object.
 (To avoid a conflict with the existing COSArrayList I prefixed everything 
 with my initials.)
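
 The core idea of the proposal (a type-safe list whose elements are converted
 on the fly and whose writes go straight to the underlying storage) can be
 sketched with plain Java collections. This is only an illustration of the
 pattern, not the attached DTCOSArrayList: RawConverter and TypedListView are
 hypothetical names, and a List<Object> stands in for the COS array.

```java
import java.util.AbstractList;
import java.util.List;

/** Hypothetical stand-in for the proposal's COSBaseConverter. */
interface RawConverter<T> {
    T fromRaw(Object raw);   // low-level value -> typed object
    Object toRaw(T value);   // typed object -> low-level value
}

/** Hypothetical typed view over an untyped backing list. */
final class TypedListView<T> extends AbstractList<T> {
    private final List<Object> backing;
    private final RawConverter<T> converter;

    TypedListView(List<Object> backing, RawConverter<T> converter) {
        this.backing = backing;
        this.converter = converter;
    }

    @Override
    public T get(int index) {
        return converter.fromRaw(backing.get(index));
    }

    @Override
    public int size() {
        return backing.size();
    }

    @Override
    public void add(int index, T value) {
        // Writes go straight through, so the backing list never goes stale.
        backing.add(index, converter.toRaw(value));
    }

    @Override
    public T remove(int index) {
        return converter.fromRaw(backing.remove(index));
    }
}
```

 Because AbstractList implements add(E), iterators and the other List methods
 on top of these four, the view behaves like an ordinary List<T> while the
 element type stays fixed at compile time.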





[jira] [Updated] (PDFBOX-1383) Proposal for a new COSArrayList

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1383:

Priority: Major  (was: Minor)

 Proposal for a new COSArrayList
 ---

 Key: PDFBOX-1383
 URL: https://issues.apache.org/jira/browse/PDFBOX-1383
 Project: PDFBox
  Issue Type: Improvement
  Components: PDModel
Affects Versions: 1.8.7, 2.0.0
Reporter: Dominic Tubach
 Fix For: 2.0.0

 Attachments: DTCOSArrayList.java, DTCOSArrayListTest.java, 
 DTCOSBaseConverter.java, DefaultDTCOSBaseConverter.java, 
 DefaultDTCOSBaseConverterTest.java


 Attached is a proposal for a new COSArrayList.
 Main differences to the existing COSArrayList:
 - type safety through generics.
 - it's always clear which types of objects the array holds.
 - flexible loading of objects from a dictionary through COSBaseConverter (see 
 below).
 - correct updating of dictionary entry, no matter whether it is optional, a 
 single value is allowed, or it is required.
 - listener interface.
 However there are some drawbacks:
 - it allows only classes/interfaces that implement/extend COSObjectable.
 - DualCOSObjectables are not possible. (Would require an extra class.)
 - no Java types such as String or Float (I see this as an advantage as I was a 
 bit confused when I expected an Array with COSNames, but got Strings. By the 
 way adding a String in that case would not add a COSName as one might expect, 
 but a COSString.)
 - replacing the existing COSArrayList would require changes in existing code.
 - requires (as of now) Java 1.6 (It might be enough to remove the @Override 
 annotations for Java 1.5 compatibility.)
 Now to the COSBaseConverter. The COSBaseConverter is just an interface that 
 defines a conversion method to convert a COSBase object to a class that 
 implements COSObjectable.
 The default implementation tries to find a fitting constructor to instantiate 
 the object.
 If the destination class is an Enum it tries to find a fitting static valueOf 
 method to create the object.
 (To avoid a conflict with the existing COSArrayList I prefixed everything 
 with my initials.)





[jira] [Closed] (PDFBOX-1317) PDFBox giving AcroFields size zero for some pdf document.

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson closed PDFBOX-1317.
---
Resolution: Cannot Reproduce

The PDF file in the link is no longer available.

 PDFBox giving AcroFields size zero for some pdf document.
 -

 Key: PDFBOX-1317
 URL: https://issues.apache.org/jira/browse/PDFBOX-1317
 Project: PDFBox
  Issue Type: Bug
  Components: AcroForm
Reporter: Manoj Patel

 I am working on a PDF form-fill utility and found that some PDFs return an 
 empty AcroField array. 
 Download the PDF document from the link below:
 https://skydrive.live.com/redir?resid=C420713A859E927D!118authkey=!AJqh1odSC8MqMrE
 It will give an empty list of AcroFields.





[jira] [Resolved] (PDFBOX-1301) Wrong characters in HTML/TXT file from PDF containing scanned pages/images

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson resolved PDFBOX-1301.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

This is fixed in 2.0

 Wrong characters in HTML/TXT file from PDF containing scanned pages/images
 --

 Key: PDFBOX-1301
 URL: https://issues.apache.org/jira/browse/PDFBOX-1301
 Project: PDFBox
  Issue Type: Bug
  Components: Text extraction
 Environment: Windows XP, java version 1.6.0_29
Reporter: Jan Divis
 Fix For: 2.0.0

 Attachments: 54391-scan.pdf, converted-wrong-chars.html, 
 correct-chars-when-converted-splitted-page.html


 When trying to extract text/html from attached PDF file, there are some wrong 
 characters (instead of characters with diacritics):
 Pro úþely tohoto Protokolu mohou bêt sdělení ]asílána prostřednictvím 
 elektronickêch nebo Makêchkoli Minêch prostředkĤ
 instead of
 Pro účely tohoto Protokolu mohou být sdělení zasílána prostřednictvím 
 elektronických nebo jakýchkoli jiných prostředků
 resp. 
 Pro &#250;&#254;ely tohoto Protokolu mohou b&#234;t sd&#283;len&#237; 
 ]as&#237;l&#225;na prost&#345;ednictv&#237;m elektronick&#234;ch nebo 
 Mak&#234;chkoli Min&#234;ch prost&#345;edk&#292; 
 instead of
 Pro &#250;&#269;ely tohoto Protokolu mohou b&#253;t sd&#283;len&#237; 
 zas&#237;l&#225;na prost&#345;ednictv&#237;m elektronick&#253;ch nebo 
 jak&#253;chkoli jin&#253;ch prost&#345;edk&#367;





[jira] [Updated] (PDFBOX-1234) NPE at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551)

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1234:

Affects Version/s: 2.0.0
   1.8.4

 NPE at 
 org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551)
 ---

 Key: PDFBOX-1234
 URL: https://issues.apache.org/jira/browse/PDFBOX-1234
 Project: PDFBox
  Issue Type: Bug
  Components: AcroForm
Affects Versions: 1.8.4, 2.0.0
Reporter: Christer Palm
 Fix For: 2.0.0

 Attachments: 200221.pdf, SetPDFFieldValueTest.java, fw8bene--dft.pdf


 Using SVN trunk revision 1291094 (2012-02-18)
 Getting the following stack trace when trying to call PDField.setValue() on an 
 AcroForm field in the attached document:
 java.lang.NullPointerException
   at 
 org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551)
   at 
 org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.insertGeneratedAppearance(PDAppearance.java:371)
   at 
 org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.setAppearanceValue(PDAppearance.java:281)
   at 
 org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.setValue(PDVariableText.java:131)
 The reason seems to be that PDAppearance.getFontAndUpdateResources() returns null, 
 in turn because the font dictionary for the DA of the field (/Cour 11 Tf 0 
 g) is not present in the document.
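
 To make the failure mode above concrete, here is a small sketch of pulling
 the font name and size out of a default appearance string such as
 "/Cour 11 Tf 0 g" and falling back when that font is missing, instead of
 dereferencing null. DefaultAppearance and resolveFont are hypothetical names
 and a plain Map stands in for the font resources; this is not how PDFBox
 itself handles the case, only one possible guard.

```java
import java.util.Map;

/** Hypothetical holder for the font name and size found in a /DA string. */
final class DefaultAppearance {
    final String fontName;
    final float fontSize;

    private DefaultAppearance(String fontName, float fontSize) {
        this.fontName = fontName;
        this.fontSize = fontSize;
    }

    /**
     * Looks for the "/Name size Tf" operator triple in a default appearance
     * string such as "/Cour 11 Tf 0 g". Returns null if there is none.
     */
    static DefaultAppearance parse(String da) {
        if (da == null) {
            return null;
        }
        String[] tokens = da.trim().split("\\s+");
        for (int i = 2; i < tokens.length; i++) {
            if ("Tf".equals(tokens[i]) && tokens[i - 2].startsWith("/")) {
                try {
                    float size = Float.parseFloat(tokens[i - 1]);
                    return new DefaultAppearance(tokens[i - 2].substring(1), size);
                } catch (NumberFormatException e) {
                    return null; // malformed size operand
                }
            }
        }
        return null;
    }

    /** Returns the font named in the DA string, or fallback if it is absent. */
    static <F> F resolveFont(Map<String, F> fonts, String da, F fallback) {
        DefaultAppearance parsed = parse(da);
        F font = parsed == null ? null : fonts.get(parsed.fontName);
        return font != null ? font : fallback;
    }
}
```

 With an empty font map, resolveFont returns the fallback rather than null,
 so a later font-size calculation has something to work with.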





[jira] [Updated] (PDFBOX-1234) NPE at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551)

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1234:

Fix Version/s: 2.0.0

 NPE at 
 org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551)
 ---

 Key: PDFBOX-1234
 URL: https://issues.apache.org/jira/browse/PDFBOX-1234
 Project: PDFBox
  Issue Type: Bug
  Components: AcroForm
Affects Versions: 1.8.4, 2.0.0
Reporter: Christer Palm
 Fix For: 2.0.0

 Attachments: 200221.pdf, SetPDFFieldValueTest.java, fw8bene--dft.pdf


 Using SVN trunk revision 1291094 (2012-02-18)
 Getting the following stack trace when trying to call PDField.setValue() on an 
 AcroForm field in the attached document:
 java.lang.NullPointerException
   at 
 org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551)
   at 
 org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.insertGeneratedAppearance(PDAppearance.java:371)
   at 
 org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.setAppearanceValue(PDAppearance.java:281)
   at 
 org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.setValue(PDVariableText.java:131)
 The reason seems to be that PDAppearance.getFontAndUpdateResources() returns null, 
 in turn because the font dictionary for the DA of the field (/Cour 11 Tf 0 
 g) is not present in the document.





[jira] [Updated] (PDFBOX-1176) Watermark Annotations

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1176:

Summary: Watermark Annotations  (was: Watermark)

 Watermark Annotations
 -

 Key: PDFBOX-1176
 URL: https://issues.apache.org/jira/browse/PDFBOX-1176
 Project: PDFBox
  Issue Type: Wish
  Components: Writing
Reporter: Rubesh MX
  Labels: Watermark
   Original Estimate: 24h
  Remaining Estimate: 24h

 I am checking whether watermarks can be added to a PDF document and removed 
 again in the same way. So far I could not find any option to do that with 
 PDFBox. It would be good to have an option to add and remove a watermark in a PDF.





[jira] [Updated] (PDFBOX-1176) Watermark Annotations

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1176:

Affects Version/s: 2.0.0
   1.8.7

 Watermark Annotations
 -

 Key: PDFBOX-1176
 URL: https://issues.apache.org/jira/browse/PDFBOX-1176
 Project: PDFBox
  Issue Type: Wish
  Components: Writing
Affects Versions: 1.8.7, 2.0.0
Reporter: Rubesh MX
  Labels: Watermark
 Fix For: 2.0.0

   Original Estimate: 24h
  Remaining Estimate: 24h

 I am checking whether watermarks can be added to a PDF document and removed 
 again in the same way. So far I could not find any option to do that with 
 PDFBox. It would be good to have an option to add and remove a watermark in a PDF.





[jira] [Updated] (PDFBOX-1176) Watermark Annotations

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1176:

Fix Version/s: 2.0.0

 Watermark Annotations
 -

 Key: PDFBOX-1176
 URL: https://issues.apache.org/jira/browse/PDFBOX-1176
 Project: PDFBox
  Issue Type: Wish
  Components: Writing
Affects Versions: 1.8.7, 2.0.0
Reporter: Rubesh MX
  Labels: Watermark
 Fix For: 2.0.0

   Original Estimate: 24h
  Remaining Estimate: 24h

 I am checking whether watermarks can be added to a PDF document and removed 
 again in the same way. So far I could not find any option to do that with 
 PDFBox. It would be good to have an option to add and remove a watermark in a PDF.





[jira] [Updated] (PDFBOX-1155) setSuppressDuplicateOverlappingText sometimes removes characters that it shouldn't

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1155:

Fix Version/s: 2.0.0

 setSuppressDuplicateOverlappingText sometimes removes characters that it 
 shouldn't
 --

 Key: PDFBOX-1155
 URL: https://issues.apache.org/jira/browse/PDFBOX-1155
 Project: PDFBox
  Issue Type: Bug
  Components: Text extraction
Affects Versions: 1.8.7, 2.0.0
Reporter: Michael McCandless
Priority: Minor
 Fix For: 2.0.0

 Attachments: 000527.pdf, dedup.diffs.txt


 The duplicate detection (in PDFTextStripper.java) checks whether the
 same character was placed nearish to where we are about to place
 another and de-dups it if so; this is to catch documents that rewind
 and overwrite in order to bold word(s).
 But in some cases I see it removing valid characters (that were not
 dups).
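
 The kind of check described above can be sketched as follows. This is a
 simplified model, not the code in PDFTextStripper.java: OverlapDeduper,
 Placed and the tolerance parameter are hypothetical. It shows how a
 proximity test catches an overprinted (fake bold) character, and how a
 tolerance that is too generous also swallows a legitimate repeated letter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of position-based duplicate suppression. */
final class OverlapDeduper {
    /** One placed character: its text and the position it was drawn at. */
    static final class Placed {
        final String ch;
        final float x, y;

        Placed(String ch, float x, float y) {
            this.ch = ch;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Keeps a character unless the same character was already kept
     * within tolerance units of it in both x and y.
     */
    static List<Placed> dedupe(List<Placed> input, float tolerance) {
        List<Placed> kept = new ArrayList<>();
        for (Placed p : input) {
            boolean duplicate = false;
            for (Placed k : kept) {
                if (k.ch.equals(p.ch)
                        && Math.abs(k.x - p.x) < tolerance
                        && Math.abs(k.y - p.y) < tolerance) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                kept.add(p);
            }
        }
        return kept;
    }
}
```

 For an "l" drawn at x=10, overprinted at x=10.2, and followed by a genuine
 second "l" at x=13, a tolerance of 1 keeps two characters while a tolerance
 of 5 keeps only one, which is the symptom reported here.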





[jira] [Updated] (PDFBOX-1155) setSuppressDuplicateOverlappingText sometimes removes characters that it shouldn't

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1155:

Affects Version/s: 2.0.0
   1.8.7

 setSuppressDuplicateOverlappingText sometimes removes characters that it 
 shouldn't
 --

 Key: PDFBOX-1155
 URL: https://issues.apache.org/jira/browse/PDFBOX-1155
 Project: PDFBox
  Issue Type: Bug
  Components: Text extraction
Affects Versions: 1.8.7, 2.0.0
Reporter: Michael McCandless
Priority: Minor
 Fix For: 2.0.0

 Attachments: 000527.pdf, dedup.diffs.txt


 The duplicate detection (in PDFTextStripper.java) checks whether the
 same character was placed nearish to where we are about to place
 another and de-dups it if so; this is to catch documents that rewind
 and overwrite in order to bold word(s).
 But in some cases I see it removing valid characters (that were not
 dups).





[jira] [Updated] (PDFBOX-1143) PDFTextStripper doesn't process text annotations

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1143:

Affects Version/s: 1.7.0

 PDFTextStripper doesn't process text annotations
 

 Key: PDFBOX-1143
 URL: https://issues.apache.org/jira/browse/PDFBOX-1143
 Project: PDFBox
  Issue Type: Bug
  Components: Text extraction
Affects Versions: 1.7.0
Reporter: Michael McCandless
Priority: Minor
 Fix For: 2.0.0


 Users are able to add annotations (comments) to a PDF, and PDFBox
 processes them correctly: you can retrieve them via
 PDPage.getAnnotations.
 But PDFTextStripper currently doesn't extract the text from
 annotations.
 I think it [optionally] should?
 I think we'd add a boolean (shouldProcessAnnotations?), and if
 enabled, we'd visit the annotations that have sub-type FreeText, and
 extract what text we can (Subject, TitlePopup, Contents, maybe
 RichContents?), associate the .getRectangle with the text to make a
 TextPosition, and then somehow associate that with the right
 article (so that annotations over a given article are rendered
 with it).
 Alternatively we just put all annotations into their own article?
 I'm not familiar enough with PDF text positioning nor PDFTextStripper
 to work out a real patch here... but I think this approach should
 work?





[jira] [Updated] (PDFBOX-1143) PDFTextStripper doesn't process text annotations

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1143:

Fix Version/s: 2.0.0

 PDFTextStripper doesn't process text annotations
 

 Key: PDFBOX-1143
 URL: https://issues.apache.org/jira/browse/PDFBOX-1143
 Project: PDFBox
  Issue Type: Bug
  Components: Text extraction
Affects Versions: 1.7.0
Reporter: Michael McCandless
Priority: Minor
 Fix For: 2.0.0


 Users are able to add annotations (comments) to a PDF, and PDFBox
 processes them correctly: you can retrieve them via
 PDPage.getAnnotations.
 But PDFTextStripper currently doesn't extract the text from
 annotations.
 I think it [optionally] should?
 I think we'd add a boolean (shouldProcessAnnotations?), and if
 enabled, we'd visit the annotations that have sub-type FreeText, and
 extract what text we can (Subject, TitlePopup, Contents, maybe
 RichContents?), associate the .getRectangle with the text to make a
 TextPosition, and then somehow associate that with the right
 article (so that annotations over a given article are rendered
 with it).
 Alternatively we just put all annotations into their own article?
 I'm not familiar enough with PDF text positioning nor PDFTextStripper
 to work out a real patch here... but I think this approach should
 work?





[jira] [Resolved] (PDFBOX-1121) PDF Fields becomes non editable

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson resolved PDFBOX-1121.
-
Resolution: Invalid

This issue is so old that it's almost certainly no longer valid.

 PDF Fields becomes non editable
 ---

 Key: PDFBOX-1121
 URL: https://issues.apache.org/jira/browse/PDFBOX-1121
 Project: PDFBox
  Issue Type: Bug
  Components: AcroForm
Reporter: Rubesh MX
Priority: Minor
  Labels: .NET, newbie
   Original Estimate: 8h
  Remaining Estimate: 8h

 Hi, I am new to using PDFBox, so apologies if this is not a bug.
 I am using the .NET version of PDFBox. I have a PDF file with editable 
 fields, and I am trying to read all the field values and write them 
 if necessary. Reading is fine, but when I write values to the fields and 
 save the document programmatically, the fields become non-editable. Could you 
 please tell me what is going wrong? I even set the permission 
 canFillinForm, but it is of no use; my PDF is not password protected.
 Also, when I open a different PDF file, I see this error during load: 
 expected='startxref' actual='3281' org.pdfbox.io.PushBackInputStream@329c933
 Could you please advise me on the above issues?





[jira] [Updated] (PDFBOX-1109) Data corruption related to scratch file use

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1109:

Fix Version/s: 2.0.0

 Data corruption related to scratch file use
 ---

 Key: PDFBOX-1109
 URL: https://issues.apache.org/jira/browse/PDFBOX-1109
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 1.8.7, 2.0.0
Reporter: Stefan Mücke
Assignee: Andreas Lehmkühler
Priority: Critical
 Fix For: 2.0.0

 Attachments: COSDocument.java, PagedMultiRandomAccessFile.java, 
 PagedMultiRandomAccessFileTest.java


 PDFBox uses a scratch file to reduce memory consumption. However, there is no 
 mechanism that prevents two PDStreams from writing to the scratch file at the 
 same time. When this happens, the resulting PDF contains garbage in some 
 streams. This problem occurred several times to me (e.g. when writing to an 
 image stream while constructing a page).
 Reproducing the bug
 ***
 One can easily reproduce the bug. Open file AddImageToPDF.java and move the 
 following line:
 PDPageContentStream contentStream =
 new PDPageContentStream(doc, page, true, true);
 immediately after the line in which the PDPage object is fetched:
 PDPage page =
 (PDPage)doc.getDocumentCatalog().getAllPages().get( 0 );
 
 With this modification, one will still get a PDF file, but Acrobat Reader 
 will report that the image could not be processed. BTW, the files 
 AddImageToPDF.java and ImageToPDF.java are almost identical. One of them 
 should be deleted.
 Bug-Fix
 ***
 The problem can be solved by using a scratch file that is divided into pages 
 (e.g. of 4 KB). Each PDStream in the scratch file is then associated with a 
 list of pages. This list grows as more data is written to the stream.
 The bug fix requires minimal changes to the existing code. The very nice 
 RandomAccess interface made this very easy.
 Here is what needs to be changed:
 - Add the attached PagedMultiRandomAccessFile.java to the I/O package
 - Change COSDocument.getScratchFile() to return a RandomAccess
   instance provided by PagedMultiRandomAccessFile:
   private PagedMultiRandomAccessFile scratchFile = null;
   [...]
   public COSDocument(File scratchDir) throws IOException {
       tmpFile = File.createTempFile("pdfbox", "tmp", scratchDir);
       scratchFile = new PagedMultiRandomAccessFile(
               new RandomAccessFile(tmpFile, "rw"));
   }
   public COSDocument(RandomAccess file) {
       // scratchFile = file;
       throw new RuntimeException("Not yet implemented."); //$NON-NLS-1$
   }
   
   [...]
   /**
    * Returns a new scratch file.
    *
    * @return the newly created scratch file
    */
   public RandomAccess getScratchFile() {
       return scratchFile.getNewRandomAcess();
   }
 One of the COSDocument constructors takes a RandomAccess file. This 
 constructor is only called in a single location, namely, in method 
 PDFParser.parse(). I am not sure if the RandomAccess parameter provided here 
 is really a scratch file. Someone will have to decide what to do with this 
 one.
 The code has been thoroughly tested and has been used in the production of 
 several books without any problems.
 In the attachment please find the code. There is also a JUnit test that was 
 used to debug my code. I have added an Apache license header and adopted 
 PDFBox's code style. Feel free to make any desired changes.
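
 The paging idea described under "Bug-Fix" can be modelled in memory as
 follows. This is not the attached PagedMultiRandomAccessFile, only a sketch
 of the same principle: PagedScratch and ScratchStream are hypothetical
 names, and a list of byte arrays stands in for the temporary file. Each
 stream owns its own list of pages, so two writers that interleave never
 touch the same page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of a scratch file split into fixed-size pages. */
final class PagedScratch {
    private final int pageSize;
    private final List<byte[]> pages = new ArrayList<>();

    PagedScratch(int pageSize) {
        this.pageSize = pageSize;
    }

    /** Each stream gets its own page list, so writers never share a page. */
    ScratchStream newStream() {
        return new ScratchStream();
    }

    final class ScratchStream {
        private final List<Integer> pageIndexes = new ArrayList<>();
        private long length = 0;

        void write(byte[] data) {
            for (byte b : data) {
                int offset = (int) (length % pageSize);
                if (offset == 0) {
                    // Current page is full (or none exists yet): claim a fresh one.
                    pages.add(new byte[pageSize]);
                    pageIndexes.add(pages.size() - 1);
                }
                int current = pageIndexes.get(pageIndexes.size() - 1);
                pages.get(current)[offset] = b;
                length++;
            }
        }

        byte[] readAll() {
            byte[] out = new byte[(int) length];
            for (int i = 0; i < out.length; i++) {
                int page = pageIndexes.get(i / pageSize);
                out[i] = pages.get(page)[i % pageSize];
            }
            return out;
        }
    }
}
```

 Writing alternately to two streams obtained from newStream() and then
 calling readAll() on each returns exactly the bytes written to that stream,
 which is the property the shared scratch file was missing.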





[jira] [Updated] (PDFBOX-1109) Data corruption related to scratch file use

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1109:

Affects Version/s: 2.0.0
   1.8.7

 Data corruption related to scratch file use
 ---

 Key: PDFBOX-1109
 URL: https://issues.apache.org/jira/browse/PDFBOX-1109
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 1.8.7, 2.0.0
Reporter: Stefan Mücke
Assignee: Andreas Lehmkühler
Priority: Critical
 Attachments: COSDocument.java, PagedMultiRandomAccessFile.java, 
 PagedMultiRandomAccessFileTest.java


 PDFBox uses a scratch file to reduce memory consumption. However, there is no 
 mechanism that prevents two PDStreams from writing to the scratch file at the 
 same time. When this happens, the resulting PDF contains garbage in some 
 streams. This problem occurred several times to me (e.g. when writing to an 
 image stream while constructing a page).
 Reproducing the bug
 ***
 One can easily reproduce the bug. Open file AddImageToPDF.java and move the 
 following line:
 PDPageContentStream contentStream =
 new PDPageContentStream(doc, page, true, true);
 immediately after the line in which the PDPage object is fetched:
 PDPage page =
 (PDPage)doc.getDocumentCatalog().getAllPages().get( 0 );
 
 With this modification, one will still get a PDF file, but Acrobat Reader 
 will report that the image could not be processed. BTW, the files 
 AddImageToPDF.java and ImageToPDF.java are almost identical. One of them 
 should be deleted.
 Bug-Fix
 ***
 The problem can be solved by using a scratch file that is divided into pages 
 (e.g. of 4 KB). Each PDStream in the scratch file is then associated with a 
 list of pages. This list grows as more data is written to the stream.
 The bug fix requires minimal changes to the existing code. The very nice 
 RandomAccess interface made this very easy.
 Here is what needs to be changed:
 - Add the attached PagedMultiRandomAccessFile.java to the I/O package
 - Change COSDocument.getScratchFile() to return a RandomAccess
   instance provided by PagedMultiRandomAccessFile:
   private PagedMultiRandomAccessFile scratchFile = null;
   [...]
   public COSDocument(File scratchDir) throws IOException {
       tmpFile = File.createTempFile("pdfbox", "tmp", scratchDir);
       scratchFile = new PagedMultiRandomAccessFile(
               new RandomAccessFile(tmpFile, "rw"));
   }
   public COSDocument(RandomAccess file) {
       // scratchFile = file;
       throw new RuntimeException("Not yet implemented."); //$NON-NLS-1$
   }
   
   [...]
   /**
    * Returns a new scratch file.
    *
    * @return the newly created scratch file
    */
   public RandomAccess getScratchFile() {
       return scratchFile.getNewRandomAcess();
   }
 One of the COSDocument constructors takes a RandomAccess file. This 
 constructor is only called in a single location, namely, in method 
 PDFParser.parse(). I am not sure if the RandomAccess parameter provided here 
 is really a scratch file. Someone will have to decide what to do with this 
 one.
 The code has been thoroughly tested and has been used in the production of 
 several books without any problems.
 In the attachment please find the code. There is also a JUnit test that was 
 used to debug my code. I have added an Apache license header and adopted 
 PDFBox's code style. Feel free to make any desired changes.





[jira] [Resolved] (PDFBOX-1086) Error when decoding CCITT compressed data that contains EOLs, fill bits etc.

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson resolved PDFBOX-1086.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

This is good enough to count as fixed.

 Error when decoding CCITT compressed data that contains EOLs, fill bits etc.
 

 Key: PDFBOX-1086
 URL: https://issues.apache.org/jira/browse/PDFBOX-1086
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Reporter: Jeremias Maerki
Assignee: Jeremias Maerki
  Labels: CCITTFaxDecode, ccitt
 Fix For: 2.0.0


 The TIFFFaxDecoder class (originally coming from JAI via XML Graphics 
 Commons) does not handle cases like EOLs between lines and in front. But the 
 PDF CCITTFaxDecode filter needs to allow many different variants of the 
 encoding. Apparently, TIFF has a relatively restricted way of encoding CCITT 
 data, so TIFFFaxDecoder was not written to be as flexible as we need it. 
 Ideally, PDFBox should handle anything that gets thrown at it.
 It appears that it would be rather difficult to retrofit TIFFFaxDecoder with 
 the necessary flexibility. So, new decoders for T.4 and T.6 should probably 
 be written.





[jira] [Updated] (PDFBOX-1000) Conforming parser

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1000:

Fix Version/s: 2.0.0

 Conforming parser
 -

 Key: PDFBOX-1000
 URL: https://issues.apache.org/jira/browse/PDFBOX-1000
 Project: PDFBox
  Issue Type: New Feature
  Components: Parsing
Affects Versions: 1.6.0
Reporter: Adam Nichols
Assignee: Adam Nichols
 Fix For: 1.7.0, 2.0.0

 Attachments: COSUnread.java, ConformingPDDocument.java, 
 ConformingPDFParser.java, ConformingPDFParserTest.java, PDFLexer.java, 
 PDFLexer.java, PDFStreamConstants.java, PDFStreamConstants.java, 
 XrefEntry.java, conforming-parser.patch, gdb-refcard.pdf


 A conforming parser will start at the end of the file and read backward until 
 it has read the EOF marker, the xref location, and trailer[1].  Once this is 
 read, it will read in the xref table so it can locate other objects and 
 revisions.  This also allows skipping objects which have been rendered 
 obsolete (per the xref table)[2].  It also allows the minimum amount of 
 information to be read when the file is loaded, and then subsequent 
 information will be loaded if and when it is requested.  This is all laid out 
 in the official PDF specification, ISO 32000-1:2008.
 Existing code will be re-used where possible, but this will require new 
 classes in order to accommodate the lazy reading which is a very different 
 paradigm from the existing parser.  Using separate classes will also 
 eliminate the possibility of regression bugs from making their way into the 
 PDDocument or BaseParser classes.  Changes to existing classes will be kept 
 to a minimum in order to prevent regression bugs.
 [1] Section 7.5.5 Conforming readers should read a PDF file from its end
 [2] Section 7.5.4 the entire file need not be read to locate any particular 
 object
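
 As an illustration of the "read from the end" step described above, here is a minimal sketch that locates the xref offset from the tail of a file. It is not the code of ConformingPDFParser; the class and method names are hypothetical, the input is taken as an in-memory byte array for simplicity, and the 1024-byte tail window is an arbitrary assumption.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: find the xref offset by inspecting only the end of a PDF. */
final class StartXrefSketch {
    /** Returns the byte offset that follows the last "startxref" keyword, or -1 if it cannot be found. */
    static long findStartXref(byte[] pdf) {
        int tailLen = Math.min(pdf.length, 1024); // assumed window size, not taken from the specification
        // ISO-8859-1 maps every byte to exactly one char, so string indexes equal byte indexes.
        String tail = new String(pdf, pdf.length - tailLen, tailLen, StandardCharsets.ISO_8859_1);
        int eof = tail.lastIndexOf("%%EOF");
        if (eof < 0) {
            return -1;
        }
        int keyword = tail.lastIndexOf("startxref", eof);
        if (keyword < 0) {
            return -1;
        }
        String between = tail.substring(keyword + "startxref".length(), eof).trim();
        try {
            return Long.parseLong(between);
        } catch (NumberFormatException e) {
            return -1; // the bytes between the keyword and the EOF marker were not a plain number
        }
    }
}
```

 A real parser would then seek to that offset and read the xref table and trailer, loading individual objects only when they are requested.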





[jira] [Updated] (PDFBOX-1000) Conforming parser

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1000:

Affects Version/s: 1.6.0

 Conforming parser
 -

 Key: PDFBOX-1000
 URL: https://issues.apache.org/jira/browse/PDFBOX-1000
 Project: PDFBox
  Issue Type: New Feature
  Components: Parsing
Affects Versions: 1.6.0
Reporter: Adam Nichols
Assignee: Adam Nichols
 Fix For: 1.7.0, 2.0.0

 Attachments: COSUnread.java, ConformingPDDocument.java, 
 ConformingPDFParser.java, ConformingPDFParserTest.java, PDFLexer.java, 
 PDFLexer.java, PDFStreamConstants.java, PDFStreamConstants.java, 
 XrefEntry.java, conforming-parser.patch, gdb-refcard.pdf


 A conforming parser will start at the end of the file and read backward until 
 it has read the EOF marker, the xref location, and trailer[1].  Once this is 
 read, it will read in the xref table so it can locate other objects and 
 revisions.  This also allows skipping objects which have been rendered 
 obsolete (per the xref table)[2].  It also allows the minimum amount of 
 information to be read when the file is loaded, and then subsequent 
 information will be loaded if and when it is requested.  This is all laid out 
 in the official PDF specification, ISO 32000-1:2008.
 Existing code will be re-used where possible, but this will require new 
 classes in order to accommodate the lazy reading which is a very different 
 paradigm from the existing parser.  Using separate classes will also 
 eliminate the possibility of regression bugs from making their way into the 
 PDDocument or BaseParser classes.  Changes to existing classes will be kept 
 to a minimum in order to prevent regression bugs.
 [1] Section 7.5.5 Conforming readers should read a PDF file from its end
 [2] Section 7.5.4 the entire file need not be read to locate any particular 
 object





[jira] [Updated] (PDFBOX-1000) Conforming parser

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1000:

Fix Version/s: 1.7.0

 Conforming parser
 -

 Key: PDFBOX-1000
 URL: https://issues.apache.org/jira/browse/PDFBOX-1000
 Project: PDFBox
  Issue Type: New Feature
  Components: Parsing
Affects Versions: 1.6.0
Reporter: Adam Nichols
Assignee: Adam Nichols
 Fix For: 1.7.0, 2.0.0

 Attachments: COSUnread.java, ConformingPDDocument.java, 
 ConformingPDFParser.java, ConformingPDFParserTest.java, PDFLexer.java, 
 PDFLexer.java, PDFStreamConstants.java, PDFStreamConstants.java, 
 XrefEntry.java, conforming-parser.patch, gdb-refcard.pdf


 A conforming parser will start at the end of the file and read backward until 
 it has read the EOF marker, the xref location, and trailer[1].  Once this is 
 read, it will read in the xref table so it can locate other objects and 
 revisions.  This also allows skipping objects which have been rendered 
 obsolete (per the xref table)[2].  It also allows the minimum amount of 
 information to be read when the file is loaded, and then subsequent 
 information will be loaded if and when it is requested.  This is all laid out 
 in the official PDF specification, ISO 32000-1:2008.
 Existing code will be re-used where possible, but this will require new 
 classes in order to accommodate the lazy reading which is a very different 
 paradigm from the existing parser.  Using separate classes will also 
 eliminate the possibility of regression bugs from making their way into the 
 PDDocument or BaseParser classes.  Changes to existing classes will be kept 
 to a minimum in order to prevent regression bugs.
 [1] Section 7.5.5 Conforming readers should read a PDF file from its end
 [2] Section 7.5.4 the entire file need not be read to locate any particular 
 object





[jira] [Commented] (PDFBOX-1000) Conforming parser

2014-10-10 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167409#comment-14167409
 ] 

John Hewson commented on PDFBOX-1000:
-

This issue has been open for 3 years, despite ConformingPDFParser being 
introduced in PDFBox 1.7.0. Can we close this issue now? Any further changes 
should be new issues.

 Conforming parser
 -

 Key: PDFBOX-1000
 URL: https://issues.apache.org/jira/browse/PDFBOX-1000
 Project: PDFBox
  Issue Type: New Feature
  Components: Parsing
Affects Versions: 1.6.0
Reporter: Adam Nichols
Assignee: Adam Nichols
 Fix For: 1.7.0, 2.0.0

 Attachments: COSUnread.java, ConformingPDDocument.java, 
 ConformingPDFParser.java, ConformingPDFParserTest.java, PDFLexer.java, 
 PDFLexer.java, PDFStreamConstants.java, PDFStreamConstants.java, 
 XrefEntry.java, conforming-parser.patch, gdb-refcard.pdf


 A conforming parser will start at the end of the file and read backward until 
 it has read the EOF marker, the xref location, and trailer[1].  Once this is 
 read, it will read in the xref table so it can locate other objects and 
 revisions.  This also allows skipping objects which have been rendered 
 obsolete (per the xref table)[2].  It also allows the minimum amount of 
 information to be read when the file is loaded, and then subsequent 
 information will be loaded if and when it is requested.  This is all laid out 
 in the official PDF specification, ISO 32000-1:2008.
 Existing code will be re-used where possible, but this will require new 
 classes in order to accommodate the lazy reading which is a very different 
 paradigm from the existing parser.  Using separate classes will also 
 eliminate the possibility of regression bugs from making their way into the 
 PDDocument or BaseParser classes.  Changes to existing classes will be kept 
 to a minimum in order to prevent regression bugs.
 [1] Section 7.5.5 Conforming readers should read a PDF file from its end
 [2] Section 7.5.4 the entire file need not be read to locate any particular 
 object





[jira] [Updated] (PDFBOX-830) Setting of logical page numbers

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-830:
---
Affects Version/s: 1.3.1

 Setting of logical page numbers
 ---

 Key: PDFBOX-830
 URL: https://issues.apache.org/jira/browse/PDFBOX-830
 Project: PDFBox
  Issue Type: New Feature
  Components: PDModel
Affects Versions: 1.3.1
 Environment: JDK 1.6.0_21, PDFBox 1.3.0-snapshot
Reporter: MH

 When viewing PDFs processed with PDFBox, Acrobat Reader / Foxit Reader show 
 logical page numbers. I guess PDFBox is somehow generating such logical page 
 numbers. However, the current automatic logical page numbering is not always 
 as expected/wished. So an API to change/set these logical page numbers would be 
 useful.





[jira] [Updated] (PDFBOX-830) Setting of logical page numbers

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-830:
---
Fix Version/s: 2.0.0

 Setting of logical page numbers
 ---

 Key: PDFBOX-830
 URL: https://issues.apache.org/jira/browse/PDFBOX-830
 Project: PDFBox
  Issue Type: New Feature
  Components: PDModel
Affects Versions: 1.3.1
 Environment: JDK 1.6.0_21, PDFBox 1.3.0-snapshot
Reporter: MH
 Fix For: 2.0.0


 When viewing PDFs processed with PDFBox, Acrobat Reader / Foxit Reader show 
 logical page numbers. I guess PDFBox is somehow generating such logical page 
 numbers. However, the current automatic logical page numbering is not always 
 as expected/wished. So an API to change/set these logical page numbers would be 
 useful.





[jira] [Updated] (PDFBOX-800) Wrong text extract from vertical textboxes in pdf files

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-800:
---
Affects Version/s: 1.7.0

 Wrong text extract from vertical textboxes in pdf files
 ---

 Key: PDFBOX-800
 URL: https://issues.apache.org/jira/browse/PDFBOX-800
 Project: PDFBox
  Issue Type: Bug
  Components: Text extraction
Affects Versions: 1.7.0
 Environment: Windows 7, VS 2010 C#, Tika Library
Reporter: Sandor Dj
 Fix For: 2.0.0

 Attachments: problemdoc.doc, problemdoc.pdf


 Vertical textboxes in pdf files are not extracted correctly (using the tika 
 library in C#).
 For example, if there is a vertical textbox "hello" in a pdf file (!WITHOUT! 
 line breaks):
 H
 E
 L
 L
 O
 the parser returns 5 strings, each with a single letter, even though there is NO 
 line break after every letter.
 Is there an option to avoid this problem?





[jira] [Closed] (PDFBOX-824) Support for PDF/A (long-term archiving)

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson closed PDFBOX-824.
--
Resolution: Won't Fix

There's nothing to stop you generating PDF/A with PDFBox. Modifying all the 
APIs to stop you doing anything invalid in PDF/A is not viable.

 Support for PDF/A (long-term archiving)
 ---

 Key: PDFBOX-824
 URL: https://issues.apache.org/jira/browse/PDFBOX-824
 Project: PDFBox
  Issue Type: New Feature
  Components: PDModel
Reporter: MH

 Apache FOP already supports PDF/A by setting a renderer option
   <pdf-a-mode>PDF/A-1b</pdf-a-mode>
 It would be a useful feature for PDFBox to also support this (and other 
 PDF/A derivatives).





[jira] [Updated] (PDFBOX-800) Wrong text extract from vertical textboxes in pdf files

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-800:
---
Fix Version/s: 2.0.0

 Wrong text extract from vertical textboxes in pdf files
 ---

 Key: PDFBOX-800
 URL: https://issues.apache.org/jira/browse/PDFBOX-800
 Project: PDFBox
  Issue Type: Bug
  Components: Text extraction
Affects Versions: 1.7.0
 Environment: Windows 7, VS 2010 C#, Tika Library
Reporter: Sandor Dj
 Fix For: 2.0.0

 Attachments: problemdoc.doc, problemdoc.pdf


 Vertical textboxes in pdf files are not extracted correctly (using the tika 
 library in C#).
 For example, if there is a vertical textbox "hello" in a pdf file (!WITHOUT! 
 line breaks):
 H
 E
 L
 L
 O
 the parser returns 5 strings, each with a single letter, even though there is NO 
 line break after every letter.
 Is there an option to avoid this problem?





[jira] [Resolved] (PDFBOX-720) Inconsistency in parsing PDFs between Windows and Linux

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson resolved PDFBOX-720.

Resolution: Not a Problem

Having heard of no issues for 3 years, I presume this is no longer a problem.

 Inconsistency in parsing PDFs between Windows and Linux
 ---

 Key: PDFBOX-720
 URL: https://issues.apache.org/jira/browse/PDFBOX-720
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
 Environment: Windows Vista 32-bit, Sun JDK 1.5.0_06, PDFBox HEAD tag 
 (revision 941073)
 vs.
 Red Hat Linux, 2.6.9-67.ELsmp kernel, Java 1.5.0_06, PDFBox HEAD tag 
 (revision 941073)
Reporter: Adam Nichols
 Attachments: 238_Page_Report.pdf


 Run this same code using the same PDF and you'll get different results on 
 Linux than on Windows.  Regardless of which one you consider correct, it 
 should be consistent.
 doc = PDDocument.load(inputFile);
 PDDocumentOutline outline = doc.getDocumentCatalog().getDocumentOutline();
 if (outline == null)
     System.out.println("Document outline was null");
 else
     System.out.println("Document outline was not null");
 Some interesting notes about this PDF: Seems that Acrobat Distiller 8.1.0 
 basically just concatenated two PDFs into one.  There are two trailers, they 
 both refer to object 1600 0 as the root.  1600 0 appears multiple times, 
 one time it doesn't have Outlines in the dictionary, the other time it has 
 Outlines 1667 0.  Windows picks up the latter and shows the outline 
 correctly.  Linux picks up the former and thus returns null for the outline.  
 I tried debugging through PDFParser and BaseParser, but I'm not really sure 
 how that code works and I quickly got lost.
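
 To make the "object appears multiple times" observation concrete, here is a small diagnostic sketch that lists every place where a given object header occurs in a file. It is not part of PDFParser or BaseParser; the class and method names are hypothetical, and a raw text scan like this can be fooled by binary stream data, so it is only a debugging aid.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: locate every "N G obj" header for one object number in a PDF. */
final class DuplicateObjectSketch {
    /** Returns the byte offsets at which the header "objNum gen obj" starts. */
    static List<Integer> findDefinitions(byte[] pdf, int objNum, int gen) {
        // ISO-8859-1 keeps a one-to-one mapping between bytes and chars.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        String header = objNum + " " + gen + " obj";
        List<Integer> offsets = new ArrayList<>();
        int from = 0;
        while (true) {
            int at = text.indexOf(header, from);
            if (at < 0) {
                break;
            }
            // Skip matches that are the tail of a longer number, e.g. "21600 0 obj".
            boolean startsToken = at == 0 || !Character.isDigit(text.charAt(at - 1));
            if (startsToken) {
                offsets.add(at);
            }
            from = at + 1;
        }
        return offsets;
    }
}
```

 If such a scan reports more than one offset, which definition a reader ends up using depends on which xref section and trailer it follows, which would be consistent with the differing results described above.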





[jira] [Closed] (PDFBOX-577) TextPosition should expose its bounding box

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson closed PDFBOX-577.
--
Resolution: Invalid

The Ascent and Descent values in the PDF dictionary are **not** used when 
computing glyph positions. In fact, it's common for these values to be missing 
or invalid. In any case, the BBox value is actually what is wanted, but that 
suffers from the same problem.

If somebody wants to tackle this problem in the future, it can be fairly easily 
done in 2.0 with the new APIs provided by PDFont which can extract the BBox 
from the embedded or substituted font - or even compute exact bounds from the 
glyph outlines. A new issue or patch addressing this is welcome.

 TextPosition should expose its bounding box
 ---

 Key: PDFBOX-577
 URL: https://issues.apache.org/jira/browse/PDFBOX-577
 Project: PDFBox
  Issue Type: Improvement
  Components: PDModel
Reporter: Villu Ruusmann
 Attachments: 
 0001-PDFont.java-Add-methods-to-retreive-the-Ascent-and-D.patch, 
 AFM-getHeight.png, AFM-getUpperRightY.png, textposition-randombg.zip


 It does not seem to be possible to calculate the bounding box of a 
 TextPosition.
 IIUC, TextPosition#getY is the baseline of the text and 
 TextPosition#getHeight is the absolute height of the text. When I subtract 
 the latter from the former I get a top line, but this is only correct if the 
 text does not contain descender characters.
 Below is a screenshot (AFM-getHeight.png) which shows the bounding boxes of 
 TextPositions calculated as {#getX(), #getY() - #getHeight, #getWidth, 
 #getHeight} painted in random colors. For example, the bounding boxes of 
 parentheses are severely misplaced, which makes the line-by-line text 
 extraction impossible.
 Right now I've solved the problem by tweaking AFM FontMetrics code so that it 
 returns BoundingBox#getUpperRightY instead of BoundingBox#getHeight when 
 queried via PDSimpleFont#getFontHeight(byte[], int, int). Another screenshot 
 (AFM-getUpperRightY.png) shows how this restores the previously broken text 
 extraction ability.
 It seems like a good idea to rework TextPosition so that it would be aware of 
 its bounding box:
 *) Replace methods PDSimpleFont#getFontWidth(byte[], int, int) and 
 PDSimpleFont#getFontHeight(byte[], int, int) with a single method 
 PDSimpleFont#getFontBoundingBox(byte[], int, int)
 *) Replace the constructor TextPosition(Matrix, Matrix) with 
 TextPosition(Matrix, BoundingBox)
 *) Add new methods TextPosition#getBoundingBox, 
 TextPosition#getBoundingBoxDir. This shouldn't affect existing application 
 clients, because TextPosition#getY and TextPosition#getHeight remain in place.
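
 The descender problem described above can be shown with a small worked sketch. It assumes a coordinate system in which y grows downward, as the formula {#getX(), #getY() - #getHeight, #getWidth, #getHeight} implies. The class GlyphBoxSketch and the ascent/descent inputs are hypothetical and are not part of TextPosition; as noted in the closing comment, such values read from the PDF font dictionary are often missing or invalid.

```java
/** Hypothetical sketch: two ways of deriving a text bounding box from a baseline position. */
final class GlyphBoxSketch {
    /** Axis-aligned box; (x, top) is the upper-left corner in a y-down coordinate system. */
    static final class Box {
        final float x;
        final float top;
        final float width;
        final float height;

        Box(float x, float top, float width, float height) {
            this.x = x;
            this.top = top;
            this.width = width;
            this.height = height;
        }

        float bottom() {
            return top + height;
        }
    }

    /** Box as computed in the report: it ends at the baseline, so descenders fall outside it. */
    static Box fromBaselineAndHeight(float x, float baselineY, float width, float height) {
        return new Box(x, baselineY - height, width, height);
    }

    /** Box that extends below the baseline by the descent, so descenders are covered. */
    static Box fromAscentAndDescent(float x, float baselineY, float width, float ascent, float descent) {
        return new Box(x, baselineY - ascent, width, ascent + descent);
    }
}
```

 With a baseline at y = 100, an ascent of 8 and a descent of 2, the first box spans 92 to 100 while the second spans 92 to 102, which is the part of characters such as parentheses that the first approach cuts off.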





[jira] [Updated] (PDFBOX-566) PDChoiceField does not handle some valid PDFs

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-566:
---
Summary: PDChoiceField does not handle some valid PDFs   (was: 
PDChoiceField does not handle some valid PDF's )

 PDChoiceField does not handle some valid PDFs 
 --

 Key: PDFBOX-566
 URL: https://issues.apache.org/jira/browse/PDFBOX-566
 Project: PDFBox
  Issue Type: Bug
  Components: AcroForm
Reporter: Yonas Jongkind
 Attachments: PDChoiceField.java, PDChoiceField.java.diff


 The problem is that the format is sometimes an array and sometimes a 
 singleton. The attached fix allows it to work smoothly for either form and 
 for mixed cases. It may also be more efficient.
 See attached diff and corrected source file.
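
 The general idea of accepting either an array or a singleton can be sketched as follows. This is not the attached PDChoiceField.java patch; plain Java objects stand in for the PDF object types, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: normalize a value that may be absent, a single item, or a list of items. */
final class ChoiceValueSketch {
    /** Always returns a list, whatever shape the input value has. */
    static List<String> asList(Object value) {
        if (value == null) {
            return Collections.emptyList();
        }
        if (value instanceof List) {
            List<String> out = new ArrayList<>();
            for (Object item : (List<?>) value) {
                out.add(String.valueOf(item));
            }
            return out;
        }
        // Singleton case: wrap the lone value so callers only ever handle lists.
        return Collections.singletonList(String.valueOf(value));
    }
}
```

 Normalizing at a single point means the rest of the field-handling code does not need separate branches for the two forms.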



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PDFBOX-448) Columns in text not extracted separately

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-448:
---
Summary: Columns in text not extracted separately  (was: Columns in text 
not extracted separately.  )

 Columns in text not extracted separately
 

 Key: PDFBOX-448
 URL: https://issues.apache.org/jira/browse/PDFBOX-448
 Project: PDFBox
  Issue Type: Bug
  Components: Text extraction
Reporter: Brian Carrier
 Attachments: WBPaper3120.pdf


 The paper that is attached to PDFBOX-80 has two columns of text, but the 
 extracted text is not separated by column.  Instead it combines the text in 
 each column on each line. 
 PDFTextStripper has a notion of columns and articles / beads, but they are 
 not being used with this file.  
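
 The difference between line-by-line and column-by-column reading can be illustrated with a minimal sketch. It does not show how PDFTextStripper works internally; the Fragment class, the single split position and the method name are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: emit the left column in full before the right column. */
final class ColumnSplitSketch {
    /** A piece of text with the position of its start; y grows downward. */
    static final class Fragment {
        final float x;
        final float y;
        final String text;

        Fragment(float x, float y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /** Splits fragments at splitX, then reads each column from top to bottom. */
    static String readByColumn(List<Fragment> fragments, float splitX) {
        List<Fragment> left = new ArrayList<>();
        List<Fragment> right = new ArrayList<>();
        for (Fragment f : fragments) {
            (f.x < splitX ? left : right).add(f);
        }
        Comparator<Fragment> topToBottom = Comparator.comparingDouble(f -> f.y);
        left.sort(topToBottom);
        right.sort(topToBottom);
        StringBuilder sb = new StringBuilder();
        for (Fragment f : left) {
            sb.append(f.text).append('\n');
        }
        for (Fragment f : right) {
            sb.append(f.text).append('\n');
        }
        return sb.toString();
    }
}
```

 Sorting purely by vertical position would interleave the two columns on every line, which matches the behaviour reported for the attached paper.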





[jira] [Closed] (PDFBOX-566) PDChoiceField does not handle some valid PDFs

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson closed PDFBOX-566.
--
Resolution: Invalid

As far as I can tell this no longer applies, at least not in 2.0.

 PDChoiceField does not handle some valid PDFs 
 --

 Key: PDFBOX-566
 URL: https://issues.apache.org/jira/browse/PDFBOX-566
 Project: PDFBox
  Issue Type: Bug
  Components: AcroForm
Reporter: Yonas Jongkind
 Attachments: PDChoiceField.java, PDChoiceField.java.diff


 The problem is that the format is sometimes an array and sometimes a 
 singleton. The attached fix allows it to work smoothly for either form and 
 for mixed cases. It may also be more efficient.
 See attached diff and corrected source file.





[jira] [Updated] (PDFBOX-448) Columns in text not extracted separately

2014-10-10 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-448:
---
Fix Version/s: 2.0.0

 Columns in text not extracted separately
 

 Key: PDFBOX-448
 URL: https://issues.apache.org/jira/browse/PDFBOX-448
 Project: PDFBox
  Issue Type: Bug
  Components: Text extraction
Affects Versions: 1.8.7, 2.0.0
Reporter: Brian Carrier
 Fix For: 2.0.0

 Attachments: WBPaper3120.pdf


 The paper that is attached to PDFBOX-80 has two columns of text, but the 
 extracted text is not separated by column.  Instead it combines the text in 
 each column on each line. 
 PDFTextStripper has a notion of columns and articles / beads, but they are 
 not being used with this file.  




