[jira] [Commented] (PDFBOX-5795) Crash for Softmask with incorrect backdrop color components

2024-04-02 Thread Daniel Persson (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833175#comment-17833175
 ] 

Daniel Persson commented on PDFBOX-5795:


Hi [~tilman]

Seems reasonable. I've tried it, and it seems to work just fine visually. In 
any case, my patch would also have needed a null-pointer check so that we don't 
introduce that error. I vote for your suggestion.

Best regards

Daniel

> Crash for Softmask with incorrect backdrop color components
> ---
>
> Key: PDFBOX-5795
> URL: https://issues.apache.org/jira/browse/PDFBOX-5795
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Daniel Persson
>Priority: Major
> Attachments: borsen-2065-20111030-1-p4.pdf, crashfix.patch
>
>
> This error occurred in production while processing an old archive. None of 
> the files crashed in any other viewer (Chrome, Adobe, Firefox, Poppler, 
> etc.). 
> 
> I've read up on the subject in the PDF 1.7 specification, and PDFBox seems 
> to follow the specification, but refusing to open these files seems a bit 
> too strict.
> 
> The easiest way to reproduce this is to open the attached file with the 
> debugger.
> {code:java}
> java -jar debugger-app-4.0.0-SNAPSHOT.jar borsen-2065-20111030-1-p4.pdf {code}
> The application will crash with this exception:
> {code:java}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
>     org.apache.pdfbox.pdmodel.graphics.color.PDColor.toRGB(PDColor.java:155)
>     org.apache.pdfbox.rendering.PageDrawer$TransparencyGroup.<init>(PageDrawer.java:1696)
>     org.apache.pdfbox.rendering.PageDrawer$TransparencyGroup.<init>(PageDrawer.java:1573)
>     org.apache.pdfbox.rendering.PageDrawer.applySoftMaskToPaint(PageDrawer.java:604)
>     org.apache.pdfbox.rendering.PageDrawer.showTransparencyGroupOnGraphics(PageDrawer.java:1549)
>     org.apache.pdfbox.rendering.PageDrawer.showTransparencyGroup(PageDrawer.java:1489)
>     org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:81)
>     org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:872)
>     org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:511)
>     org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:489)
>     org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:158)
>     org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:270)
>     org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:346)
>     org.apache.pdfbox.debugger.pagepane.PagePane$RenderWorker.doInBackground(PagePane.java:527)
>     org.apache.pdfbox.debugger.pagepane.PagePane$RenderWorker.doInBackground(PagePane.java:506)
>     java.base/java.lang.Thread.run(Thread.java:833)
> {code}
> My solution, added as a patch, is to add a fallback to the color space 
> available in the graphics context. This works for the files I've tried so 
> far.
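
To illustrate why the reported ArrayIndexOutOfBoundsException occurs and what a tolerant fallback could look like, here is a minimal plain-Java sketch. It is not the attached crashfix.patch and uses no PDFBox API; the class BackdropFallback and its method are hypothetical. The assumption is that the backdrop array (/BC) is interpreted by the number of components it actually contains (1 = gray, 3 = RGB, 4 = naive CMYK without an ICC profile) instead of by the group's color space, and that anything else falls back to black, which is the default backdrop the PDF specification gives for a missing /BC.

```java
/**
 * Hypothetical helper (not PDFBox API): converts soft-mask backdrop components
 * (/BC) to RGB based on how many components are actually present, so it never
 * reads past the end of the array.
 */
final class BackdropFallback {

    private BackdropFallback() {
    }

    static float[] toRgb(float[] bc) {
        if (bc == null) {
            // No backdrop given: use black, the specification's default backdrop
            return new float[] {0f, 0f, 0f};
        }
        switch (bc.length) {
            case 1: // gray: replicate the single component
                return new float[] {bc[0], bc[0], bc[0]};
            case 3: // already RGB
                return bc.clone();
            case 4: { // naive CMYK to RGB, no ICC profile involved
                float k = 1f - bc[3];
                return new float[] {(1f - bc[0]) * k, (1f - bc[1]) * k, (1f - bc[2]) * k};
            }
            default: // unexpected count: fall back to black instead of indexing past the end
                return new float[] {0f, 0f, 0f};
        }
    }
}
```

With input like the reported file, a one-component /BC in a group whose color space expects more components, toRgb(new float[] {0.5f}) yields a mid-gray backdrop rather than an exception.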



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Created] (PDFBOX-5795) Crash for Softmask with incorrect backdrop color components

2024-04-02 Thread Daniel Persson (Jira)
Daniel Persson created PDFBOX-5795:
--

 Summary: Crash for Softmask with incorrect backdrop color 
components
 Key: PDFBOX-5795
 URL: https://issues.apache.org/jira/browse/PDFBOX-5795
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 3.0.2 PDFBox, 2.0.31
Reporter: Daniel Persson
 Attachments: borsen-2065-20111030-1-p4.pdf, crashfix.patch

This error occurred in production while processing an old archive. None of the 
files crashed in any other viewer (Chrome, Adobe, Firefox, Poppler, etc.). 

I've read up on the subject in the PDF 1.7 specification, and PDFBox seems to 
follow the specification, but refusing to open these files seems a bit too 
strict.

The easiest way to reproduce this is to open the attached file with the 
debugger.
{code:java}
java -jar debugger-app-4.0.0-SNAPSHOT.jar borsen-2065-20111030-1-p4.pdf {code}
The application will crash with this exception:
{code:java}
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
    org.apache.pdfbox.pdmodel.graphics.color.PDColor.toRGB(PDColor.java:155)
    org.apache.pdfbox.rendering.PageDrawer$TransparencyGroup.<init>(PageDrawer.java:1696)
    org.apache.pdfbox.rendering.PageDrawer$TransparencyGroup.<init>(PageDrawer.java:1573)
    org.apache.pdfbox.rendering.PageDrawer.applySoftMaskToPaint(PageDrawer.java:604)
    org.apache.pdfbox.rendering.PageDrawer.showTransparencyGroupOnGraphics(PageDrawer.java:1549)
    org.apache.pdfbox.rendering.PageDrawer.showTransparencyGroup(PageDrawer.java:1489)
    org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:81)
    org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:872)
    org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:511)
    org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:489)
    org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:158)
    org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:270)
    org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:346)
    org.apache.pdfbox.debugger.pagepane.PagePane$RenderWorker.doInBackground(PagePane.java:527)
    org.apache.pdfbox.debugger.pagepane.PagePane$RenderWorker.doInBackground(PagePane.java:506)
    java.base/java.lang.Thread.run(Thread.java:833)
{code}
My solution, added as a patch, is to add a fallback to the color space 
available in the graphics context. This works for the files I've tried so far.






[jira] [Created] (PDFBOX-5788) ID References changes when saving PDFs.

2024-03-19 Thread Daniel Persson (Jira)
Daniel Persson created PDFBOX-5788:
--

 Summary: ID References changes when saving PDFs.
 Key: PDFBOX-5788
 URL: https://issues.apache.org/jira/browse/PDFBOX-5788
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 3.0.2 PDFBox, 3.0.1 PDFBox
Reporter: Daniel Persson


 
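{code:java}
import static org.apache.commons.codec.binary.Hex.encodeHexString;

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;

private static void runPDF(String name) throws IOException, NoSuchAlgorithmException {
    try (PDDocument doc = Loader.loadPDF(new File(name))) {
        // First save: print the SHA-256 of the written file
        File tmpFile = File.createTempFile("tmp", ".pdf");
        doc.save(tmpFile);
        byte[] hash = MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(tmpFile.toPath()));
        System.out.println(encodeHexString(hash));

        // Second save of the same, unmodified document: the hash should be identical
        File tmpFile2 = File.createTempFile("tmp", ".pdf");
        doc.save(tmpFile2);
        byte[] hash2 = MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(tmpFile2.toPath()));
        System.out.println(encodeHexString(hash2));
    }
} {code}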
I'm not sure whether this is expected behavior, but it makes my testing 
framework less robust, so I thought I'd report it here. In the newer versions 
3.0.1 and 3.0.2, when you save a PDF a second time, the reference IDs continue 
incrementing, which means that the PDF stored the first time is not identical 
to the one stored the second time.

In my test case, depending on which thread executes first, the output can 
differ between runs, so the expected result changes.

I've not seen this with 3.0.0 or earlier versions of PDFBox.
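
A save-twice check like the one above can also be run entirely in memory, which makes it easier to see where the two outputs diverge. The sketch below is an illustration under assumptions, not PDFBox API: SaveDeterminism and its Saver interface are hypothetical names, and instead of comparing hashes it reports the offset of the first differing byte using Arrays.mismatch (Java 9+).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Hypothetical helper: checks whether saving the same object twice yields identical bytes. */
final class SaveDeterminism {

    /** Anything that can write itself to a stream. */
    @FunctionalInterface
    interface Saver {
        void save(OutputStream out) throws IOException;
    }

    private SaveDeterminism() {
    }

    private static byte[] saveToBytes(Saver saver) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            saver.save(buffer);
        } catch (IOException e) {
            // Rethrow unchecked so callers don't need checked-exception handling
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /**
     * Saves twice and returns the offset of the first differing byte,
     * or -1 if both outputs are byte-identical.
     */
    static int firstDifference(Saver saver) {
        byte[] first = saveToBytes(saver);
        byte[] second = saveToBytes(saver);
        return Arrays.mismatch(first, second);
    }
}
```

With PDFBox, SaveDeterminism.firstDifference(doc::save) would be expected to bind to PDDocument.save(OutputStream); any result other than -1 is an offset worth inspecting in both outputs, for example to see which object numbers changed.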






[jira] [Created] (PDFBOX-5442) Rending with the incorrect color

2022-05-24 Thread Daniel Persson (Jira)
Daniel Persson created PDFBOX-5442:
--

 Summary: Rending with the incorrect color
 Key: PDFBOX-5442
 URL: https://issues.apache.org/jira/browse/PDFBOX-5442
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.26
Reporter: Daniel Persson
 Attachments: 23115_133_1_25693_17.pdf, 23115_133_1_25693_171.png

Hi Team.

 

We have noticed that PDFBox sometimes renders colors brighter than other 
renderers do. This doesn't matter much for photos, but when a PDF is split 
into several smaller images and the images aren't all rendered with the same 
hue, the combined result looks strange.

 

To reproduce: Open PDF in Debugger or render an image with PDFToImage.
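
To make "brighter colors" measurable when comparing PDFBox output with another renderer, one option is to render the same page region at the same size with both and compare the average channel values. The following minimal sketch uses only java.awt.image.BufferedImage; the class name ColorDrift is hypothetical, and it assumes both images are already loaded (for example with ImageIO.read) and have identical dimensions.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: average per-channel difference between two renderings. */
final class ColorDrift {

    private ColorDrift() {
    }

    /**
     * Returns {meanR, meanG, meanB} of (a - b) over all pixels, in 0..255 units.
     * Positive values mean image a is brighter in that channel.
     */
    static double[] meanChannelDifference(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("images must have the same size");
        }
        double[] sum = new double[3];
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                int pa = a.getRGB(x, y);
                int pb = b.getRGB(x, y);
                sum[0] += ((pa >> 16) & 0xFF) - ((pb >> 16) & 0xFF);
                sum[1] += ((pa >> 8) & 0xFF) - ((pb >> 8) & 0xFF);
                sum[2] += (pa & 0xFF) - (pb & 0xFF);
            }
        }
        double n = (double) a.getWidth() * a.getHeight();
        return new double[] {sum[0] / n, sum[1] / n, sum[2] / n};
    }
}
```

Computing this per tile would show whether the tiles of a split page drift by different amounts, which is the visible symptom described above.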






[jira] [Updated] (PDFBOX-5294) Incorrect rendering of Type3 character

2021-10-12 Thread Daniel Persson (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-5294:
---
Attachment: issue-1.pdf
incorrect.png
correct.png

> Incorrect rendering of Type3 character
> --
>
> Key: PDFBOX-5294
> URL: https://issues.apache.org/jira/browse/PDFBOX-5294
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.24
>Reporter: Daniel Persson
>Priority: Major
> Attachments: correct.png, incorrect.png, issue-1.pdf, 
> type3resources.patch
>
>
> Hi Team.
>  
> We got a report from one of our customers that their images weren't rendered 
> correctly. Looking into it, we found that a Type3 character contained an 
> image. 
> 
> That image was present in the character glyph's resource table and not in 
> the font's resource table, which is a bit strange: according to the 
> specification, this should not be allowed. 
> Then again, Chrome, Opera, IE 11, and Adobe render this file correctly, while 
> Safari, Firefox, and Poppler do not.
>  
> I've created a small patch that will solve this issue.
>  
> Best regards
> Daniel
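
The lookup order implied by the description can be sketched as a scoped search: check the glyph's own resources first, then the font's /Resources, then the resources of the page on which the font is used. For reference, the PDF specification defines an optional /Resources entry in the Type 3 font dictionary and, when it is absent, has names looked up in the resources of the page using the font; resources on the glyph stream itself are the non-standard case reported here. This is a plain-Java illustration with hypothetical names (ScopedResourceLookup, find), not the contents of type3resources.patch; resource dictionaries are modeled as Maps and absent dictionaries as null.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: resolves a named resource through an ordered list of scopes. */
final class ScopedResourceLookup {

    private ScopedResourceLookup() {
    }

    /**
     * Returns the value from the first scope that defines {@code name}.
     * Assumed scope order for Type3 glyphs: glyph resources, font resources,
     * page resources. Null scopes (absent dictionaries) are skipped.
     */
    static <T> Optional<T> find(String name, List<Map<String, T>> scopes) {
        for (Map<String, T> scope : scopes) {
            if (scope != null && scope.containsKey(name)) {
                return Optional.ofNullable(scope.get(name));
            }
        }
        return Optional.empty();
    }
}
```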






[jira] [Updated] (PDFBOX-5294) Incorrect rendering of Type3 character

2021-10-12 Thread Daniel Persson (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-5294:
---
Attachment: type3resources.patch




[jira] [Created] (PDFBOX-5294) Incorrect rendering of Type3 character

2021-10-12 Thread Daniel Persson (Jira)
Daniel Persson created PDFBOX-5294:
--

 Summary: Incorrect rendering of Type3 character
 Key: PDFBOX-5294
 URL: https://issues.apache.org/jira/browse/PDFBOX-5294
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.24
Reporter: Daniel Persson
 Attachments: type3resources.patch

Hi Team.

We got a report from one of our customers that their images weren't rendered 
correctly. Looking into it, we found that a Type3 character contained an image. 

That image was present in the character glyph's resource table and not in the 
font's resource table, which is a bit strange: according to the specification, 
this should not be allowed. 

Then again, Chrome, Opera, IE 11, and Adobe render this file correctly, while 
Safari, Firefox, and Poppler do not.

I've created a small patch that will solve this issue.

Best regards

Daniel






[jira] [Commented] (PDFBOX-5170) Compression creates issue with Page structure

2021-04-21 Thread Daniel Persson (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17327107#comment-17327107
 ] 

Daniel Persson commented on PDFBOX-5170:


Hi [~mkl]

 

No, I thought there was a difference between tables and streams and that you 
only needed one of them.

"Applications that do not support PDF 1.5 cannot access objects that are 
referenced by cross-reference streams. If a file uses cross-reference streams 
exclusively, it cannot be opened by such applications."

So my understanding was that the table alone is enough to read the document, 
and that the streams are more efficient but not supported by older readers. 
But I'm still trying to figure this out.
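
One way to check which form a given file uses is to follow the last startxref offset: a classic cross-reference table begins with the keyword xref, while a cross-reference stream is an indirect object and begins with an object header such as "12 0 obj". The sketch below uses hypothetical names (XrefKind, detect) and could be applied to output.pdf and output2.pdf from the description. It only inspects the last section, so incremental updates are not followed; it does not verify /Type /XRef in the object it finds; and in a hybrid-reference file the offset points at the table (the stream is referenced from the trailer's /XRefStm entry), so such files report TABLE.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reports whether the last startxref points at a table or an object. */
final class XrefKind {

    enum Kind { TABLE, STREAM, UNKNOWN }

    // An indirect object header such as "12 0 obj"
    private static final Pattern OBJECT_HEADER = Pattern.compile("\\d+\\s+\\d+\\s+obj");

    private XrefKind() {
    }

    static Kind detect(byte[] pdf) {
        // ISO-8859-1 maps every byte to one char, so string indices equal byte offsets
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int marker = text.lastIndexOf("startxref");
        if (marker < 0) {
            return Kind.UNKNOWN;
        }
        // The byte offset is the first token after the keyword
        String[] tokens = text.substring(marker + "startxref".length()).trim().split("\\s+");
        int offset;
        try {
            offset = Integer.parseInt(tokens[0]);
        } catch (NumberFormatException e) {
            return Kind.UNKNOWN;
        }
        if (offset < 0 || offset >= text.length()) {
            return Kind.UNKNOWN;
        }
        if (text.startsWith("xref", offset)) {
            return Kind.TABLE;
        }
        // Assumed to be a cross-reference stream if an object header starts at the offset
        Matcher m = OBJECT_HEADER.matcher(text);
        m.region(offset, text.length());
        return m.lookingAt() ? Kind.STREAM : Kind.UNKNOWN;
    }
}
```

Usage would be along the lines of XrefKind.detect(java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("output.pdf"))).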

 

Best regards

Daniel

> Compression creates issue with Page structure
> -
>
> Key: PDFBOX-5170
> URL: https://issues.apache.org/jira/browse/PDFBOX-5170
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.0 PDFBox
>Reporter: Daniel Persson
>Priority: Minor
>
>  
> Hi Team.
>  
> PDFBox version 3.0.0-RC1
> pdftoppm version 21.04.0
> mupdf-gl version 1.18.0
>  
> This might be an unusual issue, but it might need to be checked. The simple 
> code below creates a PDF that can't be viewed with Poppler because of 
> "error: malformed page tree".
> {code:java}
> PDDocument testPdf = Loader.loadPDF(new File("input.pdf"));
> testPdf.save(new File("output.pdf"));
> testPdf.close();
> PDDocument testPdf2 = Loader.loadPDF(new File("input.pdf"));
> testPdf2.save(new File("output2.pdf"), CompressParameters.NO_COMPRESSION);
> testPdf2.close();
> {code}
> This is not a content issue: all PDFs from the same producer have the same 
> problem, and I've just picked one example.
> Best regards
> Daniel
>  






[jira] [Commented] (PDFBOX-5170) Compression creates issue with Page structure

2021-04-21 Thread Daniel Persson (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326553#comment-17326553
 ] 

Daniel Persson commented on PDFBOX-5170:


Hi [~mkl]

 

I can verify that your fix solves the issue. Another thing that might be 
related, which I found when reading the specification on cross-reference 
streams (new knowledge for me), is that they may not be encrypted and need a 
different flag if compressed.

But do I understand the specification correctly that this is extra information 
for performance and not required to present the document correctly?

 

Best regards

Daniel




[jira] [Commented] (PDFBOX-5170) Compression creates issue with Page structure

2021-04-20 Thread Daniel Persson (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326282#comment-17326282
 ] 

Daniel Persson commented on PDFBOX-5170:


Hi Tilman.

 

I've tested it now with SNAPSHOT pdfbox-3.0.0-20210420.210105-2628.jar.

 

The issue still exists. Compressing the file makes it unreadable; if I turn 
compression off, it is readable.

In the example above, output.pdf is not readable, but output2.pdf is.

 

Best regards

Daniel




[jira] [Updated] (PDFBOX-5170) Compression creates issue with Page structure

2021-04-20 Thread Daniel Persson (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-5170:
---
Description: 
 

Hi Team.

 

PDFBox version 3.0.0-RC1

pdftoppm version 21.04.0

mupdf-gl version 1.18.0

 

This might be an unusual issue, but it might need to be checked. The simple 
code below creates a PDF that can't be viewed with Poppler because of "error: 
malformed page tree".
{code:java}
PDDocument testPdf = Loader.loadPDF(new File("input.pdf"));
testPdf.save(new File("output.pdf"));
testPdf.close();
PDDocument testPdf2 = Loader.loadPDF(new File("input.pdf"));
testPdf2.save(new File("output2.pdf"), CompressParameters.NO_COMPRESSION);
testPdf2.close();
{code}
This is not a content issue: all PDFs from the same producer have the same 
problem, and I've just picked one example.

Best regards

Daniel

 

  was:
 

Hi Team.

 

This might be an unusual issue, but it might need to be checked. The simple 
code below creates a PDF that can't be viewed with Poppler because of "error: 
malformed page tree".
{code:java}
PDDocument testPdf = Loader.loadPDF(new File("input.pdf"));
testPdf.save(new File("output.pdf"));
testPdf.close();
PDDocument testPdf2 = Loader.loadPDF(new File("input.pdf"));
testPdf2.save(new File("output2.pdf"), CompressParameters.NO_COMPRESSION);
testPdf2.close();
{code}
This is not a content issue: all PDFs from the same producer have the same 
problem, and I've just picked one example.

Best regards

Daniel

 





[jira] [Commented] (PDFBOX-5170) Compression creates issue with Page structure

2021-04-20 Thread Daniel Persson (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325622#comment-17325622
 ] 

Daniel Persson commented on PDFBOX-5170:


Uploading larger PDFs still fails, so I added the file to an old directory of 
incorrect PDFs:

https://drive.google.com/drive/folders/1mddhI_rpvyNojj4MKMyunRBrBOQe54HF?usp=sharing




[jira] [Created] (PDFBOX-5170) Compression creates issue with Page structure

2021-04-20 Thread Daniel Persson (Jira)
Daniel Persson created PDFBOX-5170:
--

 Summary: Compression creates issue with Page structure
 Key: PDFBOX-5170
 URL: https://issues.apache.org/jira/browse/PDFBOX-5170
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 3.0.0 PDFBox
Reporter: Daniel Persson


 

Hi Team.

 

This might be an unusual issue, but it might need to be checked. The simple 
code below creates a PDF that can't be viewed with Poppler because of "error: 
malformed page tree".
{code:java}
PDDocument testPdf = Loader.loadPDF(new File("input.pdf"));
testPdf.save(new File("output.pdf"));
testPdf.close();
PDDocument testPdf2 = Loader.loadPDF(new File("input.pdf"));
testPdf2.save(new File("output2.pdf"), CompressParameters.NO_COMPRESSION);
testPdf2.close();
{code}
This is not a content issue: all PDFs from the same producer have the same 
problem, and I've just picked one example.

Best regards

Daniel

 






[jira] [Commented] (PDFBOX-5135) Image can't render text.

2021-03-18 Thread Daniel Persson (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304413#comment-17304413
 ] 

Daniel Persson commented on PDFBOX-5135:


Hi Tilman.

 

Great work, but I guess this will not be in the upcoming release?

 

Best regards

Daniel

> Image can't render text.
> 
>
> Key: PDFBOX-5135
> URL: https://issues.apache.org/jira/browse/PDFBOX-5135
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.22
>Reporter: Daniel Persson
>Assignee: Tilman Hausherr
>Priority: Major
> Attachments: 514867_709_1_18803-27-1.jpg, 
> 514867_709_1_18803-27-ppm-1.jpg, 517551_709_1_19315-23-1.jpg, 
> 517551_709_1_19315-23-ppm-1.jpg, image-2021-03-18-18-02-17-417.png
>
>
> Hi Team
>  
> We have found a PDF that can't be rendered correctly in PDFBox. It renders 
> correctly in Adobe and Poppler.
>  
> PDFs could not be uploaded, so I've added them to a Google Drive folder. If 
> that doesn't work, please tell me and provide a way to send them.
>  
> [https://drive.google.com/drive/folders/1mddhI_rpvyNojj4MKMyunRBrBOQe54HF?usp=sharing]
>  
> Best regards
> Daniel






[jira] [Updated] (PDFBOX-5135) Image can't render text.

2021-03-18 Thread Daniel Persson (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-5135:
---
Description: 
Hi Team

 

We have found a PDF that can't be rendered correctly in PDFBox. It renders 
correctly in Adobe and Poppler.

 

PDFs could not be uploaded, so I've added them to a Google Drive folder. If 
that doesn't work, please tell me and provide a way to send them.

 

[https://drive.google.com/drive/folders/1mddhI_rpvyNojj4MKMyunRBrBOQe54HF?usp=sharing]

 

Best regards

Daniel

  was:
Hi Team

 

We have found a PDF that can't be rendered correctly in PDFBox. It renders 
correctly in Adobe and Poppler.

 

PDFs could not be uploaded, so I've added them to a Google Drive folder. If 
that doesn't work, please tell me and provide a way to send them.

 

https://drive.google.com/drive/folders/1mddhI_rpvyNojj4MKMyunRBrBOQe54HF?usp=sharing

 

Best regards

Daniel





[jira] [Updated] (PDFBOX-5135) Image can't render text.

2021-03-18 Thread Daniel Persson (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-5135:
---
Description: 
Hi Team

 

We have found a PDF that can't be rendered correctly in PDFBox. It renders 
correctly in Adobe and Poppler.

 

PDFs could not be uploaded, so I've added them to a Google Drive folder. If 
that doesn't work, please tell me and provide a way to send them.

 

https://drive.google.com/drive/folders/1mddhI_rpvyNojj4MKMyunRBrBOQe54HF?usp=sharing

 

Best regards

Daniel

  was:
Hi Team

 

We have found a PDF that can't be rendered correctly in PDFBox. It renders 
correctly in Adobe and Poppler.

 

Best regards

Daniel





[jira] [Updated] (PDFBOX-5135) Image can't render text.

2021-03-18 Thread Daniel Persson (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-5135:
---
Attachment: 514867_709_1_18803-27-1.jpg
517551_709_1_19315-23-1.jpg
514867_709_1_18803-27-ppm-1.jpg
517551_709_1_19315-23-ppm-1.jpg




[jira] [Created] (PDFBOX-5135) Image can't render text.

2021-03-18 Thread Daniel Persson (Jira)
Daniel Persson created PDFBOX-5135:
--

 Summary: Image can't render text.
 Key: PDFBOX-5135
 URL: https://issues.apache.org/jira/browse/PDFBOX-5135
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.22
Reporter: Daniel Persson


Hi Team

 

We have found a PDF that can't be rendered correctly in PDFBox. It renders 
correctly in Adobe and Poppler.

 

Best regards

Daniel






[jira] [Commented] (PDFBOX-4917) Images are blurry after updating to 2.0.20

2020-07-16 Thread Daniel Persson (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17159438#comment-17159438
 ] 

Daniel Persson commented on PDFBOX-4917:


Hi again, update.

 

I checked out the 2.0 branch and ran the application from there, and the 
image created on that branch did not have the issue mentioned above. Perhaps 
you can verify whether this is solved in the upcoming release. If so, we will 
wait patiently for the release in the coming weeks.

 

Best regards

Daniel

> Images are blurry after updating to 2.0.20
> --
>
> Key: PDFBOX-4917
> URL: https://issues.apache.org/jira/browse/PDFBOX-4917
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Daniel Persson
>Priority: Critical
> Attachments: issue.pdf, pdfbox-app-2.0.19.jpg, pdfbox-app-2.0.20.jpg
>
>
> Hi team.
> We have noticed that after updating PDFBox to 2.0.20, some images are 
> blurry and unreadable even at 300 DPI.
> We have rendered both of these images with the same parameters, just with 
> different versions of PDFBox.
> {code:java}
> java -jar pdfbox-app-2.0.19.jar PDFToImage -dpi 300 -quality 0.95 issue.pdf
> java -jar pdfbox-app-2.0.20.jar PDFToImage -dpi 300 -quality 0.95 issue.pdf{code}
> We hope we can find a solution to this problem; perhaps it is related to 
> PDFBOX-4516?
> Best regards
> Daniel






[jira] [Created] (PDFBOX-4917) Images are blurry after updating to 2.0.20

2020-07-16 Thread Daniel Persson (Jira)
Daniel Persson created PDFBOX-4917:
--

 Summary: Images are blurry after updating to 2.0.20
 Key: PDFBOX-4917
 URL: https://issues.apache.org/jira/browse/PDFBOX-4917
 Project: PDFBox
  Issue Type: Bug
Reporter: Daniel Persson
 Attachments: issue.pdf, pdfbox-app-2.0.19.jpg, pdfbox-app-2.0.20.jpg

Hi team.

We have noticed that after updating to PDFBox to 2.0.20 some images are blurry 
and unreadable even in 300 DPI.

We have rendered both these images with the same parameters just different 
versions of PDFBox.
{code:java}
java -jar pdfbox-app-2.0.19.jar PDFToImage -dpi 300 -quality 0.95 issue.pdf

java -jar pdfbox-app-2.0.20.jar PDFToImage -dpi 300 -quality 0.95 
issue.pdf{code}
Hope we can find a solution to this problem, perhaps it is related to 
PDFBOX-4516?
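To put a number on "blurry" when comparing the two renderings above, a crude sharpness score can help: the average luminance difference between horizontally adjacent pixels, which drops when edges are smeared. This is a minimal sketch using only the JDK (javax.imageio, java.awt.image). The class name `RenderSharpness` and the metric itself are illustrative assumptions, not part of PDFBox or of this report, and JPEG compression alone will shift the score slightly.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public final class RenderSharpness {
    private RenderSharpness() {}

    /** Integer luma (0..255) of a packed RGB pixel, using Rec. 601 weights. */
    static int luma(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (r * 299 + g * 587 + b * 114) / 1000;
    }

    /** Mean absolute luma difference between horizontal neighbors; higher means sharper edges. */
    public static double sharpness(BufferedImage img) {
        if (img.getWidth() < 2) {
            throw new IllegalArgumentException("image must be at least 2 pixels wide");
        }
        long sum = 0;
        long count = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 1; x < img.getWidth(); x++) {
                sum += Math.abs(luma(img.getRGB(x, y)) - luma(img.getRGB(x - 1, y)));
                count++;
            }
        }
        return (double) sum / count;
    }

    public static void main(String[] args) throws IOException {
        // Print one score per image file given on the command line.
        for (String path : args) {
            BufferedImage img = ImageIO.read(new File(path));
            if (img == null) {
                System.err.println(path + ": not a readable image");
                continue;
            }
            System.out.printf("%s: %.2f%n", path, sharpness(img));
        }
    }
}
```

For example, `java RenderSharpness pdfbox-app-2.0.19.jpg pdfbox-app-2.0.20.jpg`; a clearly lower score for the 2.0.20 image would be consistent with the blur described in this report.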

Best regards

Daniel






[jira] [Updated] (PDFBOX-4917) Images are blurry after updating to 2.0.20

2020-07-16 Thread Daniel Persson (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-4917:
---
Attachment: pdfbox-app-2.0.20.jpg
pdfbox-app-2.0.19.jpg
issue.pdf

> Images are blurry after updating to 2.0.20
> --
>
> Key: PDFBOX-4917
> URL: https://issues.apache.org/jira/browse/PDFBOX-4917
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Daniel Persson
>Priority: Critical
> Attachments: issue.pdf, pdfbox-app-2.0.19.jpg, pdfbox-app-2.0.20.jpg
>
>
> Hi team.
> We have noticed that after updating PDFBox to 2.0.20, some images are 
> blurry and unreadable even at 300 DPI.
> We have rendered both of these images with the same parameters, just with 
> different versions of PDFBox.
> {code:java}
> java -jar pdfbox-app-2.0.19.jar PDFToImage -dpi 300 -quality 0.95 issue.pdf
> java -jar pdfbox-app-2.0.20.jar PDFToImage -dpi 300 -quality 0.95 issue.pdf{code}
> We hope we can find a solution to this problem; perhaps it is related to 
> PDFBOX-4516?
> Best regards
> Daniel






[jira] [Updated] (PDFBOX-4852) Image rendering issue 3

2020-05-29 Thread Daniel Persson (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-4852:
---
Attachment: issue3.pdf
issue3-pdfbox.jpg
issue3-poppler.jpg

> Image rendering issue 3
> ---
>
> Key: PDFBOX-4852
> URL: https://issues.apache.org/jira/browse/PDFBOX-4852
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.19
>Reporter: Daniel Persson
>Priority: Minor
> Attachments: issue3-pdfbox.jpg, issue3-poppler.jpg, issue3.pdf
>
>
> Text is unreadable in the image rendered using PDFToImage.
>  
> Text is readable if you render with Poppler instead.






[jira] [Created] (PDFBOX-4852) Image rendering issue 3

2020-05-29 Thread Daniel Persson (Jira)
Daniel Persson created PDFBOX-4852:
--

 Summary: Image rendering issue 3
 Key: PDFBOX-4852
 URL: https://issues.apache.org/jira/browse/PDFBOX-4852
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.19
Reporter: Daniel Persson
 Attachments: issue3-pdfbox.jpg, issue3-poppler.jpg, issue3.pdf

Text is unreadable in the image rendered using PDFToImage.

 

Text is readable if you render with Poppler instead.






[jira] [Created] (PDFBOX-4851) Image rendering issue 2

2020-05-29 Thread Daniel Persson (Jira)
Daniel Persson created PDFBOX-4851:
--

 Summary: Image rendering issue 2
 Key: PDFBOX-4851
 URL: https://issues.apache.org/jira/browse/PDFBOX-4851
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.19
Reporter: Daniel Persson
 Attachments: issue2-pdfbox.jpg, issue2-poppler.jpg, issue2.pdf

Text is missing in the image rendered using PDFToImage.

 

Text is present if you render with Poppler instead.






[jira] [Updated] (PDFBOX-4851) Image rendering issue 2

2020-05-29 Thread Daniel Persson (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-4851:
---
Attachment: issue2.pdf
issue2-poppler.jpg
issue2-pdfbox.jpg

> Image rendering issue 2
> ---
>
> Key: PDFBOX-4851
> URL: https://issues.apache.org/jira/browse/PDFBOX-4851
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.19
>Reporter: Daniel Persson
>Priority: Minor
> Attachments: issue2-pdfbox.jpg, issue2-poppler.jpg, issue2.pdf
>
>
> Text is missing in the image rendered using PDFToImage.
>  
> Text is present if you render with Poppler instead.






[jira] [Updated] (PDFBOX-4850) Image rendering issue

2020-05-29 Thread Daniel Persson (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-4850:
---
Attachment: issue.pdf
issue-pdfbox.jpg
issue-poppler.jpg

> Image rendering issue
> -
>
> Key: PDFBOX-4850
> URL: https://issues.apache.org/jira/browse/PDFBOX-4850
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.19
>Reporter: Daniel Persson
>Priority: Minor
> Attachments: issue-pdfbox.jpg, issue-poppler.jpg, issue.pdf
>
>
> Rendering the file using PDFToImage produces a strange result in which 
> embedded images aren't rotated or scaled correctly.
>  
> Rendering the same PDF using Poppler produces correct-looking output.






[jira] [Created] (PDFBOX-4850) Image rendering issue

2020-05-29 Thread Daniel Persson (Jira)
Daniel Persson created PDFBOX-4850:
--

 Summary: Image rendering issue
 Key: PDFBOX-4850
 URL: https://issues.apache.org/jira/browse/PDFBOX-4850
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.19
Reporter: Daniel Persson
 Attachments: issue-pdfbox.jpg, issue-poppler.jpg, issue.pdf

Rendering the file using PDFToImage produces a strange result in which embedded 
images aren't rotated or scaled correctly.

 

Rendering the same PDF using Poppler produces correct-looking output.
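For checking where an embedded image should land: in PDF, an image is painted into a unit square, and the current transformation matrix [a b c d e f] (set with the `cm` operator) maps that square onto the page, so any rotation or scaling of the image comes from this matrix. The sketch maps the four corners of the unit square with java.awt.geom.AffineTransform, whose six-argument constructor takes the matrix entries in the same order as PDF. The class name `ImagePlacement` is hypothetical, and the example matrix `0 200 -100 0 300 50` is an assumed value, not taken from issue.pdf.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

public final class ImagePlacement {
    private ImagePlacement() {}

    /** Maps the unit-square corners (0,0), (1,0), (1,1), (0,1) through the PDF matrix [a b c d e f]. */
    static Point2D[] placedCorners(double a, double b, double c, double d, double e, double f) {
        // AffineTransform(m00, m10, m01, m11, m02, m12) matches PDF's [a b c d e f] ordering:
        // x' = a*x + c*y + e, y' = b*x + d*y + f
        AffineTransform ctm = new AffineTransform(a, b, c, d, e, f);
        Point2D[] unit = {
            new Point2D.Double(0, 0), new Point2D.Double(1, 0),
            new Point2D.Double(1, 1), new Point2D.Double(0, 1)
        };
        Point2D[] placed = new Point2D[unit.length];
        for (int i = 0; i < unit.length; i++) {
            placed[i] = ctm.transform(unit[i], null);
        }
        return placed;
    }

    public static void main(String[] args) {
        // Example matrix: image axes scaled to 200 x 100 and rotated 90 degrees counterclockwise.
        for (Point2D p : placedCorners(0, 200, -100, 0, 300, 50)) {
            System.out.println(p.getX() + ", " + p.getY());
        }
    }
}
```

Comparing these expected corners with where the image actually appears in the PDFBox and Poppler renderings is one way to narrow down whether the matrix is being applied incorrectly.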






[jira] [Updated] (PDFBOX-4850) Image rendering issue

2020-05-29 Thread Daniel Persson (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-4850:
---
Issue Type: Bug  (was: Improvement)

> Image rendering issue
> -
>
> Key: PDFBOX-4850
> URL: https://issues.apache.org/jira/browse/PDFBOX-4850
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.19
>Reporter: Daniel Persson
>Priority: Minor
> Attachments: issue-pdfbox.jpg, issue-poppler.jpg, issue.pdf
>
>
> Rendering the file using PDFToImage produces a strange result in which 
> embedded images aren't rotated or scaled correctly.
>  
> Rendering the same PDF using Poppler produces correct-looking output.






[jira] [Updated] (PDFBOX-4762) Inconsistent handling of incorrect data

2020-02-04 Thread Daniel Persson (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-4762:
---
Description: 
We had a PDF with a strange page containing 200 MB+ of text to extract, and the 
deflate function did not work correctly. 

This caused a fatal error in PDFBox. While debugging, I noticed that we handle 
SetNonStrokingColorSpace and SetStrokingColorSpace differently: one of them 
checks whether the input data is incorrect and returns early, while the other 
has no such check.

I made a small patch, attached to this issue, to rectify this inconsistency.

 

I have added the crashing PDF to my Google Drive if you want to test with it:

https://drive.google.com/open?id=1bcT27NoqNM-pphYiFCy13bq81potqUc6

 

Best regards

Daniel

  was:
We had a PDF with a strange page containing 200 MB+ of text to extract, and the 
deflate function did not work correctly. 

This caused a fatal error in PDFBox. While debugging, I noticed that we handle 
SetNonStrokingColorSpace and SetStrokingColorSpace differently: one of them 
checks whether the input data is incorrect and returns early, while the other 
has no such check.

I made a small patch, attached to this issue, to rectify this inconsistency.

Best regards

Daniel


> Inconsistent handling of incorrect data
> ---
>
> Key: PDFBOX-4762
> URL: https://issues.apache.org/jira/browse/PDFBOX-4762
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.18
>Reporter: Daniel Persson
>Priority: Minor
>  Labels: patch
> Attachments: inconsistant.patch
>
>
> We had a PDF with a strange page containing 200 MB+ of text to extract, and 
> the deflate function did not work correctly. 
> This caused a fatal error in PDFBox. While debugging, I noticed that we handle 
> SetNonStrokingColorSpace and SetStrokingColorSpace differently: one of them 
> checks whether the input data is incorrect and returns early, while the other 
> has no such check.
> I made a small patch, attached to this issue, to rectify this inconsistency.
>  
> I have added the crashing PDF to my Google Drive if you want to test with it:
> https://drive.google.com/open?id=1bcT27NoqNM-pphYiFCy13bq81potqUc6
>  
> Best regards
> Daniel






[jira] [Created] (PDFBOX-4762) Inconsistent handling of incorrect data

2020-02-04 Thread Daniel Persson (Jira)
Daniel Persson created PDFBOX-4762:
--

 Summary: Inconsistent handling of incorrect data
 Key: PDFBOX-4762
 URL: https://issues.apache.org/jira/browse/PDFBOX-4762
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.18
Reporter: Daniel Persson
 Attachments: inconsistant.patch

We had a PDF with a strange page containing 200 MB+ of text to extract, and the 
deflate function did not work correctly. 

This caused a fatal error in PDFBox. While debugging, I noticed that we handle 
SetNonStrokingColorSpace and SetStrokingColorSpace differently: one of them 
checks whether the input data is incorrect and returns early, while the other 
has no such check.

I made a small patch, attached to this issue, to rectify this inconsistency.
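The consistency fix described above can be sketched as one shared operand check that both color-space operators call, so malformed input is skipped the same way for `CS` (stroking) and `cs` (non-stroking). This is a hypothetical, self-contained illustration: `ColorSpaceOperands`, `Name`, `GraphicsState` and `validColorSpaceName` are invented names, and the sketch reproduces neither PDFBox's operator classes nor the contents of inconsistant.patch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public final class ColorSpaceOperands {
    private ColorSpaceOperands() {}

    /** Hypothetical stand-in for a PDF name operand such as /DeviceRGB. */
    static final class Name {
        final String value;
        Name(String value) { this.value = value; }
    }

    /** Hypothetical minimal graphics state holding the current color spaces. */
    static final class GraphicsState {
        String strokingColorSpace = "DeviceGray";
        String nonStrokingColorSpace = "DeviceGray";
    }

    /** Shared check: returns the color space name, or null if the operands are malformed. */
    static String validColorSpaceName(List<Object> operands) {
        if (operands.isEmpty() || !(operands.get(0) instanceof Name)) {
            return null;
        }
        return ((Name) operands.get(0)).value;
    }

    /** "CS" operator: ignore malformed operands instead of failing. */
    static void setStrokingColorSpace(List<Object> operands, GraphicsState gs) {
        String name = validColorSpaceName(operands);
        if (name == null) {
            return;
        }
        gs.strokingColorSpace = name;
    }

    /** "cs" operator: same check, so both operators behave identically on bad input. */
    static void setNonStrokingColorSpace(List<Object> operands, GraphicsState gs) {
        String name = validColorSpaceName(operands);
        if (name == null) {
            return;
        }
        gs.nonStrokingColorSpace = name;
    }

    public static void main(String[] args) {
        GraphicsState gs = new GraphicsState();
        setStrokingColorSpace(Collections.emptyList(), gs);          // missing operand: ignored
        setNonStrokingColorSpace(Arrays.asList((Object) 42), gs);    // wrong type: ignored
        setNonStrokingColorSpace(Arrays.asList((Object) new Name("DeviceRGB")), gs);
        System.out.println(gs.strokingColorSpace + " " + gs.nonStrokingColorSpace);
    }
}
```

Running `main` prints `DeviceGray DeviceRGB`: the two malformed operators are ignored and only the valid one changes the state. Whether to skip silently, log, or throw is a design choice; the point is that both operators make the same choice.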

Best regards

Daniel






[jira] [Updated] (PDFBOX-4762) Inconsistent handling of incorrect data

2020-02-04 Thread Daniel Persson (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-4762:
---
Attachment: inconsistant.patch

> Inconsistent handling of incorrect data
> ---
>
> Key: PDFBOX-4762
> URL: https://issues.apache.org/jira/browse/PDFBOX-4762
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.18
>Reporter: Daniel Persson
>Priority: Minor
>  Labels: patch
> Attachments: inconsistant.patch
>
>
> We had a PDF with a strange page containing 200 MB+ of text to extract, and 
> the deflate function did not work correctly. 
> This caused a fatal error in PDFBox. While debugging, I noticed that we handle 
> SetNonStrokingColorSpace and SetStrokingColorSpace differently: one of them 
> checks whether the input data is incorrect and returns early, while the other 
> has no such check.
> I made a small patch, attached to this issue, to rectify this inconsistency.
> Best regards
> Daniel






[jira] [Commented] (PDFBOX-4743) Long rendering time of fonts in a specific PDF

2020-01-17 Thread Daniel Persson (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018294#comment-17018294
 ] 

Daniel Persson commented on PDFBOX-4743:


Hi Tilman.

 

I have added two new images. One is without the instructions that set a font 
and write text, and the other is without the drawing instructions for images.

 

The version without images is slow; the version without text is fast.

 

Best regards

Daniel

> Long rendering time of fonts in a specific PDF
> --
>
> Key: PDFBOX-4743
> URL: https://issues.apache.org/jira/browse/PDFBOX-4743
> Project: PDFBox
>  Issue Type: Improvement
> Environment: Gentoo Linux, Java 8
>Reporter: Daniel Persson
>Priority: Minor
> Attachments: slow_rendering.pdf, without_images.pdf, without_text.pdf
>
>
> Hi Team.
>  
> We have found a PDF that takes a long time to render images.
>  
> After some checking, we found that this one page takes more than 2 minutes to 
> render, but if we remove the font information and render the PDF without 
> text, it takes 3 seconds.
>  
> Looking at the font information alone, it doesn't seem to be a lot of data: 
> 3-5 KB per font, and only about seven fonts are defined. So something else 
> must be complicating things.
>  
> Best regards
> Daniel






[jira] [Comment Edited] (PDFBOX-4743) Long rendering time of fonts in a specific PDF

2020-01-17 Thread Daniel Persson (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018294#comment-17018294
 ] 

Daniel Persson edited comment on PDFBOX-4743 at 1/17/20 8:35 PM:
-

Hi Tilman.

 

I have added two new PDFs. One is without the instructions that set a font and 
write text, and the other is without the drawing instructions for images.

 

The version without images is slow; the version without text is fast.

 

Best regards

Daniel


was (Author: kalaspuffar):
Hi Tilman.

 

I have added two new images. One is without the instructions that set a font 
and write text, and the other is without the drawing instructions for images.

 

The version without images is slow; the version without text is fast.

 

Best regards

Daniel

> Long rendering time of fonts in a specific PDF
> --
>
> Key: PDFBOX-4743
> URL: https://issues.apache.org/jira/browse/PDFBOX-4743
> Project: PDFBox
>  Issue Type: Improvement
> Environment: Gentoo Linux, Java 8
>Reporter: Daniel Persson
>Priority: Minor
> Attachments: slow_rendering.pdf, without_images.pdf, without_text.pdf
>
>
> Hi Team.
>  
> We have found a PDF that takes a long time to render images.
>  
> After some checking, we found that this one page takes more than 2 minutes to 
> render, but if we remove the font information and render the PDF without 
> text, it takes 3 seconds.
>  
> Looking at the font information alone, it doesn't seem to be a lot of data: 
> 3-5 KB per font, and only about seven fonts are defined. So something else 
> must be complicating things.
>  
> Best regards
> Daniel






[jira] [Updated] (PDFBOX-4743) Long rendering time of fonts in a specific PDF

2020-01-17 Thread Daniel Persson (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-4743:
---
Attachment: without_text.pdf
without_images.pdf

> Long rendering time of fonts in a specific PDF
> --
>
> Key: PDFBOX-4743
> URL: https://issues.apache.org/jira/browse/PDFBOX-4743
> Project: PDFBox
>  Issue Type: Improvement
> Environment: Gentoo Linux, Java 8
>Reporter: Daniel Persson
>Priority: Minor
> Attachments: slow_rendering.pdf, without_images.pdf, without_text.pdf
>
>
> Hi Team.
>  
> We have found a PDF that takes a long time to render images.
>  
> After some checking, we found that this one page takes more than 2 minutes to 
> render, but if we remove the font information and render the PDF without 
> text, it takes 3 seconds.
>  
> Looking at the font information alone, it doesn't seem to be a lot of data: 
> 3-5 KB per font, and only about seven fonts are defined. So something else 
> must be complicating things.
>  
> Best regards
> Daniel






[jira] [Created] (PDFBOX-4743) Long rendering time of fonts in a specific PDF

2020-01-17 Thread Daniel Persson (Jira)
Daniel Persson created PDFBOX-4743:
--

 Summary: Long rendering time of fonts in a specific PDF
 Key: PDFBOX-4743
 URL: https://issues.apache.org/jira/browse/PDFBOX-4743
 Project: PDFBox
  Issue Type: Improvement
 Environment: Gentoo Linux, Java 8
Reporter: Daniel Persson
 Attachments: slow_rendering.pdf

Hi Team.

 

We have found a PDF that takes a long time to render images.

 

After some checking, we found that this one page takes more than 2 minutes to 
render, but if we remove the font information and render the PDF without text, 
it takes 3 seconds.

 

Looking at the font information alone, it doesn't seem to be a lot of data: 
3-5 KB per font, and only about seven fonts are defined. So something else must 
be complicating things.

 

Best regards

Daniel






[jira] [Created] (PDFBOX-4501) References numbers in embedded PDF become floats

2019-03-29 Thread Daniel Persson (JIRA)
Daniel Persson created PDFBOX-4501:
--

 Summary: References numbers in embedded PDF become floats
 Key: PDFBOX-4501
 URL: https://issues.apache.org/jira/browse/PDFBOX-4501
 Project: PDFBox
  Issue Type: Bug
Reporter: Daniel Persson
 Attachments: float_pointer.patch

Hi everyone.

We found an issue that sometimes happens with smaller producers that create PDF 
files with embedded advertisements or other articles. 

For some reason, this embedded content makes the library throw an exception and 
not read the file. In many cases, we can read most of the pages, and only the 
embedded data is missing.

I wrote a small patch that handles the issue, but I don't know how to decode the 
embedded data, so I have not debugged the issue further. I am adding a link to 
the file because, at 124 MB, it is too large to upload with the issue.

[https://drive.google.com/file/d/1hQslqtrbIoo5bTmMXgH1NDSYXuvIUOAQ/view?usp=sharing]

It would be great if we could find a solution where the PDF is read correctly; 
the current behavior of not reading it at all is not ideal.

 

```
java.io.IOException: expected number, actual=COSFloat{18446744073221199360} at offset 127766191
    org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:166)
    org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:279)
    org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:212)
    org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:864)
    org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:912)
    org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:881)
    org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:801)
    org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:761)
    org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:187)
    org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
    org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1069)
    org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1007)
    org.apache.pdfbox.debugger.PDFDebugger$12.open(PDFDebugger.java:1272)
    org.apache.pdfbox.debugger.PDFDebugger$DocumentOpener.parse(PDFDebugger.java:1383)
    org.apache.pdfbox.debugger.PDFDebugger.readPDFFile(PDFDebugger.java:1275)
    org.apache.pdfbox.debugger.PDFDebugger.readPDFFile(PDFDebugger.java:1252)
    org.apache.pdfbox.debugger.PDFDebugger.main(PDFDebugger.java:1243)
```
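One observation on the value in this exception, offered as a hypothesis rather than a diagnosis: 18446744073221199360 is larger than `Long.MAX_VALUE` (9223372036854775807), so it cannot be held as a signed 64-bit integer. It also equals 2^64 - 488352256, i.e. the unsigned reading of the 64-bit pattern for -488352256. Whether the producer wrote a wrapped negative number, and whether this is why the token ended up as a COSFloat, are assumptions. The JDK-only sketch below checks the arithmetic; `OversizedObjectNumber` and `parseObjectNumber` are hypothetical names for a tolerant parser that returns null instead of throwing.

```java
public final class OversizedObjectNumber {
    private OversizedObjectNumber() {}

    /** Hypothetical tolerant parser: returns null when the token does not fit in a signed long. */
    static Long parseObjectNumber(String token) {
        try {
            return Long.parseLong(token);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String token = "18446744073221199360"; // value from the exception message above
        // Too large for a signed 64-bit long, so the tolerant parser gives up: prints "null".
        System.out.println(parseObjectNumber(token));
        // Read as an unsigned 64-bit value, the same bits are the signed long -488352256.
        System.out.println(Long.parseUnsignedLong(token));
    }
}
```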



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-4306) Image clipping area rounding error

2018-09-17 Thread Daniel Persson (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617769#comment-16617769
 ] 

Daniel Persson commented on PDFBOX-4306:


Hi @tilman

Well, after some consideration, and since there has been no other response to 
this issue, I would ask you to include the last patch in the next release if 
possible. 

Best regards
Daniel

> Image clipping area rounding error
> --
>
> Key: PDFBOX-4306
> URL: https://issues.apache.org/jira/browse/PDFBOX-4306
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Daniel Persson
>Priority: Major
>  Labels: rendering
> Attachments: page-1.pdf, page-2.pdf, patch.diff, patch2.diff, test.jpg
>
>
> When we create images of two connecting pages with PDFBox and merge them 
> together, a white line appears between the images.
> We have looked into the issue, tried to fix it, and found that the clipping 
> area is a bit too tight, so the images are not rendered correctly. My guess 
> is that this is due to a rounding error when using floats. 
> Most of the graphics functions in Java use double precision while PDFBox uses 
> floats, so when layer upon layer of bounding boxes intersect the clipping 
> area, it might get skewed into a bad bounding box.
> I've added a patch to this issue with the code we use as a workaround today. 
> It's by no means the final solution to the problem, but it resolves the white 
> line issue.
> To make sure you get the error when generating the images, use the 
> following command:
> ```
> java -jar pdfbox-app-3.0.0-SNAPSHOT.jar PDFToImage -dpi 150 -quality 0.95 -format jpg page-1.pdf
> ```
> We run Java 8 on our machines.






[jira] [Commented] (PDFBOX-4306) Image clipping area rounding error

2018-09-01 Thread Daniel Persson (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599636#comment-16599636
 ] 

Daniel Persson commented on PDFBOX-4306:


Well, if it's a rounding error when drawing elements, then I feel that you 
should increase the precision instead of changing the image ratio.

With this solution, we create images that are 1 pixel narrower than the same 
images produced by Poppler. Then again, that might not be the correct resolution. 
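A 1-pixel difference like this can come purely from how a fractional page size is rounded. A length in PDF points (1 pt = 1/72 inch) spans `points * dpi / 72` device pixels, and floor, round and ceil can disagree by one pixel. In this minimal sketch, the class name `PixelSize` is hypothetical, 595.5 pt is an arbitrary example width, and which rounding rule PDFBox or Poppler actually applies is not stated in this thread.

```java
public final class PixelSize {
    private PixelSize() {}

    /** Exact (fractional) number of device pixels spanned by a length in PDF points. */
    static double exactPixels(double lengthPt, double dpi) {
        return lengthPt * dpi / 72.0; // 1 pt = 1/72 inch
    }

    public static void main(String[] args) {
        double px = exactPixels(595.5, 150); // 1240.625 pixels
        // Three plausible rounding rules for the output image width:
        System.out.println("floor: " + (int) Math.floor(px)); // 1240
        System.out.println("round: " + Math.round(px));       // 1241
        System.out.println("ceil:  " + (int) Math.ceil(px));  // 1241
    }
}
```

When pages are placed side by side in a spread, the rule matters less than applying the same one everywhere; mixing floor on one page with ceil on the next is exactly the kind of mismatch that leaves a 1-pixel seam.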

The thing that worries me is that when you create a spread of multiple images, 
you could introduce a jagged edge between the images if one pixel is missing.

Best regards
Daniel

> Image clipping area rounding error
> --
>
> Key: PDFBOX-4306
> URL: https://issues.apache.org/jira/browse/PDFBOX-4306
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Daniel Persson
>Priority: Major
>  Labels: rendering
> Attachments: page-1.pdf, page-2.pdf, patch.diff, patch2.diff, test.jpg
>
>
> When we create images of two connecting pages with PDFBox and merge them 
> together, a white line appears between the images.
> We have looked into the issue, tried to fix it, and found that the clipping 
> area is a bit too tight, so the images are not rendered correctly. My guess 
> is that this is due to a rounding error when using floats. 
> Most of the graphics functions in Java use double precision while PDFBox uses 
> floats, so when layer upon layer of bounding boxes intersect the clipping 
> area, it might get skewed into a bad bounding box.
> I've added a patch to this issue with the code we use as a workaround today. 
> It's by no means the final solution to the problem, but it resolves the white 
> line issue.
> To make sure you get the error when generating the images, use the 
> following command:
> ```
> java -jar pdfbox-app-3.0.0-SNAPSHOT.jar PDFToImage -dpi 150 -quality 0.95 -format jpg page-1.pdf
> ```
> We run Java 8 on our machines.






[jira] [Commented] (PDFBOX-4306) Image clipping area rounding error

2018-08-31 Thread Daniel Persson (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598736#comment-16598736
 ] 

Daniel Persson commented on PDFBOX-4306:


Hi again.

We noticed that patch.diff created artifacts in smaller images on other pages, 
so we realized it was not a solution we could use going forward. 

patch2.diff is a hackier solution, but it seems to solve the immediate problem, 
at least for us. There must be a better way, though.

Best regards

Daniel

> Image clipping area rounding error
> --
>
> Key: PDFBOX-4306
> URL: https://issues.apache.org/jira/browse/PDFBOX-4306
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Daniel Persson
>Priority: Major
>  Labels: rendering
> Attachments: page-1.pdf, page-2.pdf, patch.diff, patch2.diff, test.jpg
>
>
> When we create images of two connecting pages with PDFBox and merge them 
> together, a white line appears between the images.
> We have looked into the issue, tried to fix it, and found that the clipping 
> area is a bit too tight, so the images are not rendered correctly. My guess 
> is that this is due to a rounding error when using floats. 
> Most of the graphics functions in Java use double precision while PDFBox uses 
> floats, so when layer upon layer of bounding boxes intersect the clipping 
> area, it might get skewed into a bad bounding box.
> I've added a patch to this issue with the code we use as a workaround today. 
> It's by no means the final solution to the problem, but it resolves the white 
> line issue.
> To make sure you get the error when generating the images, use the 
> following command:
> ```
> java -jar pdfbox-app-3.0.0-SNAPSHOT.jar PDFToImage -dpi 150 -quality 0.95 -format jpg page-1.pdf
> ```
> We run Java 8 on our machines.






[jira] [Updated] (PDFBOX-4306) Image clipping area rounding error

2018-08-31 Thread Daniel Persson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-4306:
---
Attachment: patch2.diff

> Image clipping area rounding error
> --
>
> Key: PDFBOX-4306
> URL: https://issues.apache.org/jira/browse/PDFBOX-4306
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Daniel Persson
>Priority: Major
>  Labels: rendering
> Attachments: page-1.pdf, page-2.pdf, patch.diff, patch2.diff, test.jpg
>
>
> When we create images of two connecting pages with PDFBox and merge them 
> together, a white line appears between the images.
> We have looked into the issue, tried to fix it, and found that the clipping 
> area is a bit too tight, so the images are not rendered correctly. My guess 
> is that this is due to a rounding error when using floats. 
> Most of the graphics functions in Java use double precision while PDFBox uses 
> floats, so when layer upon layer of bounding boxes intersect the clipping 
> area, it might get skewed into a bad bounding box.
> I've added a patch to this issue with the code we use as a workaround today. 
> It's by no means the final solution to the problem, but it resolves the white 
> line issue.
> To make sure you get the error when generating the images, use the 
> following command:
> ```
> java -jar pdfbox-app-3.0.0-SNAPSHOT.jar PDFToImage -dpi 150 -quality 0.95 -format jpg page-1.pdf
> ```
> We run Java 8 on our machines.






[jira] [Created] (PDFBOX-4306) Image clipping area rounding error

2018-08-31 Thread Daniel Persson (JIRA)
Daniel Persson created PDFBOX-4306:
--

 Summary: Image clipping area rounding error
 Key: PDFBOX-4306
 URL: https://issues.apache.org/jira/browse/PDFBOX-4306
 Project: PDFBox
  Issue Type: Bug
Reporter: Daniel Persson
 Attachments: page-1.pdf, page-2.pdf, patch.diff, test.jpg

When we create images of two connecting pages with PDFBox and merge them, a 
white line appears between the images.

We looked into the issue, tried to fix it, and found that the clipping area is 
slightly too tight, so the images are not rendered correctly. My guess is that 
this is due to a rounding error when using floats. 

Most of the graphics functions in Java use double precision while PDFBox uses 
floats, so after many layers of bounding boxes are intersected with the 
clipping area, the result may be skewed into an incorrect bounding box.
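The suspected float-versus-double effect can be illustrated with a small Java sketch. It is not the attached patch.diff or patch2.diff; it assumes that one way to avoid the gap is to round a device-space clip rectangle outward to whole pixels. The class `ClipRoundingSketch`, its methods, and the 595.3 pt page width are hypothetical.

```java
import java.awt.geom.Rectangle2D;

class ClipRoundingSketch {
    // Hypothetical helper: round a clip rectangle outward to whole device
    // pixels so a tiny rounding error cannot remove the last pixel row/column.
    static Rectangle2D snapOutward(Rectangle2D clip) {
        double x0 = Math.floor(clip.getMinX());
        double y0 = Math.floor(clip.getMinY());
        double x1 = Math.ceil(clip.getMaxX());
        double y1 = Math.ceil(clip.getMaxY());
        return new Rectangle2D.Double(x0, y0, x1 - x0, y1 - y0);
    }

    // The same page-to-pixel conversion, once in float and once in double precision.
    static float widthInPixelsFloat(float widthPt, float dpi) {
        return widthPt * (dpi / 72f);
    }

    static double widthInPixelsDouble(double widthPt, double dpi) {
        return widthPt * (dpi / 72.0);
    }

    public static void main(String[] args) {
        float f = widthInPixelsFloat(595.3f, 150f);
        double d = widthInPixelsDouble(595.3, 150.0);
        // The results differ slightly; intersecting many such boxes can
        // shrink a clip by a fraction of a pixel.
        System.out.println("float:      " + f);
        System.out.println("double:     " + d);
        System.out.println("difference: " + Math.abs(f - d));

        Rectangle2D clip = new Rectangle2D.Double(0.4, 0.6, 99.2, 50.1);
        System.out.println("snapped:    " + snapOutward(clip));
    }
}
```

Rounding outward trades a possible sub-pixel overdraw at the edge for never losing a pixel row, which is usually the better failure mode when tiles are stitched together.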

I've added a patch to this issue with the code we use as a workaround today. 
It's by no means the final solution to the problem but it resolves the white 
line issue.

To reproduce the error when generating the images, use the following command:

```
java -jar pdfbox-app-3.0.0-SNAPSHOT.jar PDFToImage -dpi 150 -quality 0.95 -format jpg page-1.pdf
```

We run Java 8 on our machines.






[jira] [Created] (PDFBOX-4296) Question: Performance

2018-08-21 Thread Daniel Persson (JIRA)
Daniel Persson created PDFBOX-4296:
--

 Summary: Question: Performance
 Key: PDFBOX-4296
 URL: https://issues.apache.org/jira/browse/PDFBOX-4296
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.11
Reporter: Daniel Persson


Hi Team.

We use a tool built on PDFBox to extract text from about 10k pages per day, 
and a separate tool that uses Poppler to extract images.

We want to use PDFBox for both tasks, but sadly we see a performance hit of 
roughly 3 times when using PDFBox.

Do you have any backlog items, technical debt or ideas on how to improve 
performance?

We have tried -Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=true and 
that made image generation much slower.
We have set System.setProperty("sun.java2d.cmm", 
"sun.java2d.cmm.kcms.KcmsServiceProvider") in code.

We use image libraries from TwelveMonkeys, PDFBox and the standard JAI project.

I've read in the code that images using transparency are written twice, which 
might be a culprit.
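The two settings mentioned above (`org.apache.pdfbox.rendering.UsePureJavaCMYKConversion` and `sun.java2d.cmm`) are plain JVM system properties. A minimal sketch, assuming they are set at the very start of `main()`, before PDFBox or Java2D performs any color conversion; the class `RenderingProperties` and its `configure` method are hypothetical, while the property names and values are the ones quoted in this issue.

```java
class RenderingProperties {
    static final String CMM_PROPERTY = "sun.java2d.cmm";
    static final String KCMS_PROVIDER = "sun.java2d.cmm.kcms.KcmsServiceProvider";
    static final String PURE_JAVA_CMYK = "org.apache.pdfbox.rendering.UsePureJavaCMYKConversion";

    // Hypothetical helper: call this first thing in main(). As far as we know the
    // color management module is picked once, on first use, so setting the
    // property later may have no effect.
    static void configure(boolean pureJavaCmyk) {
        System.setProperty(CMM_PROPERTY, KCMS_PROVIDER);
        System.setProperty(PURE_JAVA_CMYK, Boolean.toString(pureJavaCmyk));
    }

    public static void main(String[] args) {
        // The report above found the pure-Java CMYK conversion slower, so leave it off.
        configure(false);
        System.out.println(CMM_PROPERTY + "=" + System.getProperty(CMM_PROPERTY));
        System.out.println(PURE_JAVA_CMYK + "=" + System.getProperty(PURE_JAVA_CMYK));
    }
}
```

The command-line equivalent is passing `-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider` to `java`, which sidesteps any question of initialization order.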

I have been given time to work on this if we find some solid leads or a 
roadmap toward better performance.

I hope it's okay to track this here instead of asking on the mailing list.

Best regards

Daniel






[jira] [Updated] (PDFBOX-4228) PDFBox crashes when a Type3 font don't have an embedded encoding.

2018-05-23 Thread Daniel Persson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-4228:
---
Attachment: example.pdf

> PDFBox crashes when a Type3 font don't have an embedded encoding.
> -
>
> Key: PDFBOX-4228
> URL: https://issues.apache.org/jira/browse/PDFBOX-4228
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Daniel Persson
>Priority: Critical
>  Labels: patch
> Attachments: example.pdf, type3_fixed.patch
>
>
> When PDFBox processes a PDF in which a Type3 font's encoding is given as the 
> name WinAnsiEncoding, it crashes without producing any output.
> {code:java}
> Exception in thread "main" java.lang.ClassCastException: org.apache.pdfbox.cos.COSName cannot be cast to org.apache.pdfbox.cos.COSDictionary
>     at org.apache.pdfbox.pdmodel.font.PDType3Font.readEncoding(PDType3Font.java:82)
>     at org.apache.pdfbox.pdmodel.font.PDType3Font.<init>(PDType3Font.java:66)
>     at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:79)
>     at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:143)
>     at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:60)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:841)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:498)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:472)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.showForm(PDFStreamEngine.java:181)
>     at org.apache.pdfbox.contentstream.operator.DrawObject.process(DrawObject.java:65)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:841)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:498)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:472)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.showForm(PDFStreamEngine.java:181)
>     at org.apache.pdfbox.contentstream.operator.DrawObject.process(DrawObject.java:65)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:841)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:498)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:472)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)
>     at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:141)
>     at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:360)
>     at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:288)
>     at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:235)
>     at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:237)
>     at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:82)
>     at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:59)
> {code}






[jira] [Updated] (PDFBOX-4228) PDFBox crashes when a Type3 font don't have an embedded encoding.

2018-05-23 Thread Daniel Persson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-4228:
---
Attachment: type3_fixed.patch







[jira] [Created] (PDFBOX-4228) PDFBox crashes when a Type3 font don't have an embedded encoding.

2018-05-23 Thread Daniel Persson (JIRA)
Daniel Persson created PDFBOX-4228:
--

 Summary: PDFBox crashes when a Type3 font don't have an embedded 
encoding.
 Key: PDFBOX-4228
 URL: https://issues.apache.org/jira/browse/PDFBOX-4228
 Project: PDFBox
  Issue Type: Bug
Reporter: Daniel Persson


When PDFBox processes a PDF in which a Type3 font's encoding is given as the 
name WinAnsiEncoding, it crashes without producing any output.
{code:java}
Exception in thread "main" java.lang.ClassCastException: org.apache.pdfbox.cos.COSName cannot be cast to org.apache.pdfbox.cos.COSDictionary
    at org.apache.pdfbox.pdmodel.font.PDType3Font.readEncoding(PDType3Font.java:82)
    at org.apache.pdfbox.pdmodel.font.PDType3Font.<init>(PDType3Font.java:66)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:79)
    at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:143)
    at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:60)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:841)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:498)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:472)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showForm(PDFStreamEngine.java:181)
    at org.apache.pdfbox.contentstream.operator.DrawObject.process(DrawObject.java:65)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:841)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:498)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:472)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showForm(PDFStreamEngine.java:181)
    at org.apache.pdfbox.contentstream.operator.DrawObject.process(DrawObject.java:65)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:841)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:498)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:472)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)
    at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:141)
    at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:360)
    at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:288)
    at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:235)
    at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:237)
    at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:82)
    at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:59)
{code}
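The trace shows the /Encoding entry arriving as a name where a dictionary was expected, and a blind cast turning that into a ClassCastException. A defensive type check avoids the crash. The sketch below is not the attached type3_fixed.patch: it uses hypothetical stand-in classes instead of PDFBox's COSBase, COSName and COSDictionary so that it runs without PDFBox on the classpath, and `describeEncoding` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

class Type3EncodingSketch {
    // Hypothetical stand-ins for PDFBox's COSBase, COSName and COSDictionary.
    interface CosBase {}

    static final class CosName implements CosBase {
        final String name;
        CosName(String name) { this.name = name; }
    }

    static final class CosDictionary implements CosBase {
        final Map<String, CosBase> entries = new HashMap<>();
    }

    // Decide how to handle a Type3 font's /Encoding entry without assuming its type.
    static String describeEncoding(CosBase encoding) {
        if (encoding instanceof CosDictionary) {
            return "dictionary";                     // the expected form for Type3 fonts
        }
        if (encoding instanceof CosName) {
            // e.g. /WinAnsiEncoding: a predefined encoding referenced by name
            return "named:" + ((CosName) encoding).name;
        }
        return "fallback";                           // missing or unexpected type, no cast attempted
    }

    public static void main(String[] args) {
        System.out.println(describeEncoding(new CosName("WinAnsiEncoding")));
        System.out.println(describeEncoding(new CosDictionary()));
        System.out.println(describeEncoding(null));
    }
}
```

Checking with `instanceof` before casting lets the reader choose a sensible fallback for malformed files instead of aborting the whole extraction.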






[jira] [Updated] (PDFBOX-4140) Crash when repeating flag is outside of range.

2018-03-05 Thread Daniel Persson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-4140:
---
Attachment: LP-180302-08.pdf

> Crash when repeating flag is outside of range.
> --
>
> Key: PDFBOX-4140
> URL: https://issues.apache.org/jira/browse/PDFBOX-4140
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.8
>Reporter: Daniel Persson
>Priority: Major
>  Labels: patch
> Attachments: LP-180302-08.pdf, fixing_broken_pdf.diff
>
>
> When PDFBox renders images from a PDF containing bad data, the tool crashes 
> and no image is rendered.






[jira] [Updated] (PDFBOX-4140) Crash when repeating flag is outside of range.

2018-03-05 Thread Daniel Persson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-4140:
---
Attachment: fixing_broken_pdf.diff

> Crash when repeating flag is outside of range.
> --
>
> Key: PDFBOX-4140
> URL: https://issues.apache.org/jira/browse/PDFBOX-4140
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.8
>Reporter: Daniel Persson
>Priority: Major
>  Labels: patch
> Attachments: fixing_broken_pdf.diff
>
>
> When PDFBox renders images from a PDF containing bad data, the tool crashes 
> and no image is rendered.






[jira] [Created] (PDFBOX-4140) Crash when repeating flag is outside of range.

2018-03-05 Thread Daniel Persson (JIRA)
Daniel Persson created PDFBOX-4140:
--

 Summary: Crash when repeating flag is outside of range.
 Key: PDFBOX-4140
 URL: https://issues.apache.org/jira/browse/PDFBOX-4140
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 2.0.8
Reporter: Daniel Persson
 Attachments: fixing_broken_pdf.diff

When PDFBox renders images from a PDF containing bad data, the tool crashes 
and no image is rendered.
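Assuming the "repeating flag" in the summary is the REPEAT_FLAG (bit 3, 0x08) of TrueType simple-glyph outline flags, where the following byte gives the number of additional repetitions, a bounds-checked expansion could look like the hypothetical sketch below. It is not the attached fixing_broken_pdf.diff, and `GlyphFlagSketch.expandFlags` is an illustrative name.

```java
import java.util.Arrays;

class GlyphFlagSketch {
    static final int REPEAT_FLAG = 0x08; // bit 3 of a TrueType simple-glyph flag byte

    // Expands the compressed flag bytes of a simple glyph into one flag per point.
    // A repeat count that would run past pointCount is clamped instead of
    // causing an ArrayIndexOutOfBoundsException; missing bytes leave flags at 0.
    static int[] expandFlags(int[] data, int pointCount) {
        int[] flags = new int[pointCount];
        int pos = 0;
        for (int index = 0; index < pointCount && pos < data.length; ) {
            int flag = data[pos++];
            flags[index++] = flag;
            if ((flag & REPEAT_FLAG) != 0 && pos < data.length) {
                int repeat = data[pos++];
                for (int r = 0; r < repeat && index < pointCount; r++) {
                    flags[index++] = flag;
                }
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        // Well-formed: flag 0x09 repeated twice more, then flag 0x01.
        System.out.println(Arrays.toString(expandFlags(new int[] {0x09, 2, 0x01}, 4)));
        // Broken: repeat count 200 for a glyph with only 3 points.
        System.out.println(Arrays.toString(expandFlags(new int[] {0x09, 200}, 3)));
    }
}
```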






[jira] [Created] (PDFBOX-4021) Building from source missing dependency

2017-11-22 Thread Daniel Persson (JIRA)
Daniel Persson created PDFBOX-4021:
--

 Summary: Building from source missing dependency
 Key: PDFBOX-4021
 URL: https://issues.apache.org/jira/browse/PDFBOX-4021
 Project: PDFBox
  Issue Type: Improvement
  Components: Documentation
Reporter: Daniel Persson
Priority: Minor


I downloaded and built trunk from source today and got a failing test due to a 
missing Noto font.

```
2017-11-23 08:19:58 ERROR org.apache.pdfbox.pdmodel.font.FileSystemFontProvider:661 - Could not load font file: /usr/share/fonts/noto/NotoSansCoptic-Regular.ttf
java.io.FileNotFoundException: /usr/share/fonts/noto/NotoSansCoptic-Regular.ttf (No such file or directory)
    at java.io.RandomAccessFile.open0(Native Method)
    at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
    at org.apache.fontbox.ttf.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:88)
    at org.apache.fontbox.ttf.RAFDataStream.<init>(RAFDataStream.java:63)
    at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:84)
    at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.readTrueTypeFont(FileSystemFontProvider.java:682)
    at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.getTrueTypeFont(FileSystemFontProvider.java:650)
    at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.access$200(FileSystemFontProvider.java:55)
    at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider$FSFontInfo.getFont(FileSystemFontProvider.java:126)
    at org.apache.pdfbox.pdmodel.font.FontMapperImpl.getCIDFont(FontMapperImpl.java:518)
    at org.apache.pdfbox.pdmodel.font.PDCIDFontType0.<init>(PDCIDFontType0.java:128)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createDescendantFont(PDFontFactory.java:121)
    at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:80)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:83)
    at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.getFonts(ResourcesValidationProcess.java:125)
    at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validateFonts(ResourcesValidationProcess.java:94)
    at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validate(ResourcesValidationProcess.java:77)
    at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:84)
    at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:57)
    at org.apache.pdfbox.preflight.process.reflect.SinglePageValidationProcess.validateResources(SinglePageValidationProcess.java:169)
    at org.apache.pdfbox.preflight.process.reflect.SinglePageValidationProcess.validate(SinglePageValidationProcess.java:84)
    at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:84)
    at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:57)
    at org.apache.pdfbox.preflight.process.PageTreeValidationProcess.validatePage(PageTreeValidationProcess.java:69)
    at org.apache.pdfbox.preflight.process.PageTreeValidationProcess.validate(PageTreeValidationProcess.java:57)
    at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:84)
    at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:122)
    at org.apache.pdfbox.preflight.PreflightDocument.validate(PreflightDocument.java:163)
    at org.apache.pdfbox.preflight.TestIsartorBavaria.validate(TestIsartorBavaria.java:190)
```

```
validate[target/pdfs/Isartor testsuite/PDFA-1b/6.3 Fonts/6.3.4 Embedded font programs/isartor-6-3-4-t01-fail-c.pdf](org.apache.pdfbox.preflight.TestIsartorBavaria)  Time elapsed: 0.025 sec  <<< ERROR!
java.lang.NullPointerException: null
    at org.apache.pdfbox.pdmodel.font.PDCIDFontType0.<init>(PDCIDFontType0.java:158)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createDescendantFont(PDFontFactory.java:121)
    at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:80)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:83)
    at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.getFonts(ResourcesValidationProcess.java:125)
    at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validateFonts(ResourcesValidationProcess.java:94)
    at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validate(ResourcesValidationProcess.java:77)
    at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:84)
    at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:57)
    at
```

[jira] [Commented] (PDFBOX-3806) Nullpointer exception in getLeftSideBearing

2017-05-26 Thread Daniel Persson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16026203#comment-16026203
 ] 

Daniel Persson commented on PDFBOX-3806:


I've tested our application with trunk (3.0.0-SNAPSHOT), pdfbox-app 
(2.0.7-SNAPSHOT) and pdfbox-debugger (2.0.7-SNAPSHOT) installed.

I don't see any errors.

Best regards
Daniel

> Nullpointer exception in getLeftSideBearing
> ---
>
> Key: PDFBOX-3806
> URL: https://issues.apache.org/jira/browse/PDFBOX-3806
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.6
>Reporter: Daniel Persson
>Assignee: Tilman Hausherr
>Priority: Minor
> Fix For: 2.0.7, 3.0.0
>
> Attachments: font.raw
>
>
> While processing today's batch of data we got a NullPointerException in 
> getLeftSideBearing. Sadly I can't give you the PDF.
> ```
> public int getLeftSideBearing(int gid) {
>     return gid < this.numHMetrics ? this.leftSideBearing[gid]
>             : this.nonHorizontalLeftSideBearing[gid - this.numHMetrics];
> }
> ```
> In this function there could be a case where nonHorizontalLeftSideBearing is 
> null and you still ask for a GID larger than or equal to numHMetrics.
> This is the first time I've seen this issue, and so far only 4 characters in 
> one PDF have it, so it's not critical.






[jira] [Updated] (PDFBOX-3806) Nullpointer exception in getLeftSideBearing

2017-05-25 Thread Daniel Persson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-3806:
---
Attachment: font.raw

> Nullpointer exception in getLeftSideBearing
> ---
>
> Key: PDFBOX-3806
> URL: https://issues.apache.org/jira/browse/PDFBOX-3806
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Daniel Persson
>Priority: Minor
> Attachments: font.raw
>
>
> While processing today's batch of data we got a NullPointerException in 
> getLeftSideBearing. Sadly I can't give you the PDF.
> ```
> public int getLeftSideBearing(int gid) {
>     return gid < this.numHMetrics ? this.leftSideBearing[gid]
>             : this.nonHorizontalLeftSideBearing[gid - this.numHMetrics];
> }
> ```
> In this function there could be a case where nonHorizontalLeftSideBearing is 
> null and you still ask for a GID larger than or equal to numHMetrics.
> This is the first time I've seen this issue, and so far only 4 characters in 
> one PDF have it, so it's not critical.






[jira] [Commented] (PDFBOX-3806) Nullpointer exception in getLeftSideBearing

2017-05-25 Thread Daniel Persson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025846#comment-16025846
 ] 

Daniel Persson commented on PDFBOX-3806:


Regarding the code: it seems I copied it from the decompiled code in IntelliJ. 
The source looks like this:

    public int getLeftSideBearing(int gid)
    {
        if (gid < numHMetrics)
        {
            return leftSideBearing[gid];
        }
        else
        {
            return nonHorizontalLeftSideBearing[gid - numHMetrics];
        }
    }

> Nullpointer exception in getLeftSideBearing
> ---
>
> Key: PDFBOX-3806
> URL: https://issues.apache.org/jira/browse/PDFBOX-3806
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Daniel Persson
>Priority: Minor
>
> While processing today's batch of data we got a NullPointerException in 
> getLeftSideBearing. Sadly I can't give you the PDF.
> ```
> public int getLeftSideBearing(int gid) {
>     return gid < this.numHMetrics ? this.leftSideBearing[gid]
>             : this.nonHorizontalLeftSideBearing[gid - this.numHMetrics];
> }
> ```
> In this function there could be a case where nonHorizontalLeftSideBearing is 
> null and you still ask for a GID larger than or equal to numHMetrics.
> This is the first time I've seen this issue, and so far only 4 characters in 
> one PDF have it, so it's not critical.






[jira] [Commented] (PDFBOX-3806) Nullpointer exception in getLeftSideBearing

2017-05-25 Thread Daniel Persson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025845#comment-16025845
 ] 

Daniel Persson commented on PDFBOX-3806:


java.lang.NullPointerException
    at org.apache.fontbox.ttf.HorizontalMetricsTable.getLeftSideBearing(HorizontalMetricsTable.java:122)
    at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:195)
    at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:176)
    at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getPath(PDTrueTypeFont.java:447)
    at org.apache.pdfbox.debugger.fontencodingpane.SimpleFont.getGlyphs(SimpleFont.java:72)
    at org.apache.pdfbox.debugger.fontencodingpane.SimpleFont.<init>(SimpleFont.java:44)
    at org.apache.pdfbox.debugger.fontencodingpane.FontEncodingPaneController.<init>(FontEncodingPaneController.java:89)
    at org.apache.pdfbox.debugger.PDFDebugger.showFont(PDFDebugger.java:1069)
    at org.apache.pdfbox.debugger.PDFDebugger.jTree1ValueChanged(PDFDebugger.java:801)
    at org.apache.pdfbox.debugger.PDFDebugger.access$200(PDFDebugger.java:118)
    at org.apache.pdfbox.debugger.PDFDebugger$3.valueChanged(PDFDebugger.java:330)
    at javax.swing.JTree.fireValueChanged(JTree.java:2927)
    at javax.swing.JTree$TreeSelectionRedirector.valueChanged(JTree.java:3391)
    at javax.swing.tree.DefaultTreeSelectionModel.fireValueChanged(DefaultTreeSelectionModel.java:635)
    at javax.swing.tree.DefaultTreeSelectionModel.notifyPathChange(DefaultTreeSelectionModel.java:1093)
    at javax.swing.tree.DefaultTreeSelectionModel.setSelectionPaths(DefaultTreeSelectionModel.java:294)
    at javax.swing.tree.DefaultTreeSelectionModel.setSelectionPath(DefaultTreeSelectionModel.java:188)
    at javax.swing.JTree.setSelectionPath(JTree.java:1634)
    at javax.swing.plaf.basic.BasicTreeUI.selectPathForEvent(BasicTreeUI.java:2393)
    at javax.swing.plaf.basic.BasicTreeUI$Handler.handleSelection(BasicTreeUI.java:3609)
    at javax.swing.plaf.basic.BasicTreeUI$Handler.mousePressed(BasicTreeUI.java:3548)
    at java.awt.Component.processMouseEvent(Component.java:6530)
    at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
    at java.awt.Component.processEvent(Component.java:6298)
    at java.awt.Container.processEvent(Container.java:2236)
    at java.awt.Component.dispatchEventImpl(Component.java:4889)
    at java.awt.Container.dispatchEventImpl(Container.java:2294)
    at java.awt.Component.dispatchEvent(Component.java:4711)
    at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
    at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4522)
    at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
    at java.awt.Container.dispatchEventImpl(Container.java:2280)
    at java.awt.Window.dispatchEventImpl(Window.java:2746)
    at java.awt.Component.dispatchEvent(Component.java:4711)
    at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
    at java.awt.EventQueue.access$500(EventQueue.java:97)
    at java.awt.EventQueue$3.run(EventQueue.java:709)
    at java.awt.EventQueue$3.run(EventQueue.java:703)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:80)
    at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:90)
    at java.awt.EventQueue$4.run(EventQueue.java:731)
    at java.awt.EventQueue$4.run(EventQueue.java:729)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:80)
    at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
    at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
    at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
    at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)


> Nullpointer exception in getLeftSideBearing
> ---
>
> Key: PDFBOX-3806
> URL: https://issues.apache.org/jira/browse/PDFBOX-3806
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Daniel Persson
>Priority: Minor
>
> While processing today's batch of data we got a NullPointerException in 
> getLeftSideBearing. Sadly I can't 

[jira] [Created] (PDFBOX-3806) Nullpointer exception in getLeftSideBearing

2017-05-23 Thread Daniel Persson (JIRA)
Daniel Persson created PDFBOX-3806:
--

 Summary: Nullpointer exception in getLeftSideBearing
 Key: PDFBOX-3806
 URL: https://issues.apache.org/jira/browse/PDFBOX-3806
 Project: PDFBox
  Issue Type: Bug
Reporter: Daniel Persson
Priority: Minor


While processing today's batch of data we got a NullPointerException in 
getLeftSideBearing. Sadly I can't give you the PDF.

```
public int getLeftSideBearing(int gid) {
    return gid < this.numHMetrics ? this.leftSideBearing[gid]
            : this.nonHorizontalLeftSideBearing[gid - this.numHMetrics];
}
```

In this function there could be a case where nonHorizontalLeftSideBearing is 
null and you still ask for a GID larger than or equal to numHMetrics.

This is the first time I've seen this issue, and so far only 4 characters in 
one PDF have it, so it's not critical.
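A null-safe variant of the method quoted above might look like the following sketch. The class `HorizontalMetricsSketch`, its constructor, the use of `int[]` arrays, and the choice to return 0 for missing data are assumptions for illustration, not the fix that was committed for this issue.

```java
class HorizontalMetricsSketch {
    private final int numHMetrics;
    private final int[] leftSideBearing;              // entries for gid < numHMetrics
    private final int[] nonHorizontalLeftSideBearing; // may be null in a broken font

    HorizontalMetricsSketch(int numHMetrics, int[] leftSideBearing,
                            int[] nonHorizontalLeftSideBearing) {
        this.numHMetrics = numHMetrics;
        this.leftSideBearing = leftSideBearing;
        this.nonHorizontalLeftSideBearing = nonHorizontalLeftSideBearing;
    }

    int getLeftSideBearing(int gid) {
        if (gid < 0) {
            return 0;
        }
        if (gid < numHMetrics) {
            return gid < leftSideBearing.length ? leftSideBearing[gid] : 0;
        }
        int index = gid - numHMetrics;
        if (nonHorizontalLeftSideBearing == null || index >= nonHorizontalLeftSideBearing.length) {
            // Assumption: treat missing bearing data as 0 instead of throwing.
            return 0;
        }
        return nonHorizontalLeftSideBearing[index];
    }

    public static void main(String[] args) {
        HorizontalMetricsSketch broken = new HorizontalMetricsSketch(2, new int[] {10, 20}, null);
        System.out.println(broken.getLeftSideBearing(1)); // 20
        System.out.println(broken.getLeftSideBearing(3)); // 0 instead of a NullPointerException
    }
}
```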






[jira] [Created] (PDFBOX-3802) Images wrong color

2017-05-22 Thread Daniel Persson (JIRA)
Daniel Persson created PDFBOX-3802:
--

 Summary: Images wrong color
 Key: PDFBOX-3802
 URL: https://issues.apache.org/jira/browse/PDFBOX-3802
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.6
 Environment: Gentoo, Ubuntu
Reporter: Daniel Persson
Priority: Minor
 Attachments: pdfbox.png, poppler.png, test.pdf

We found that some images in our PDF processing flow didn't have the correct 
colors after extraction.

After some investigation, it seemed that we had the same problem with both 
Poppler and PDFBox. We found a solution for Poppler: recompiling it with 
version 2.8 of Little CMS.

The images in this issue were created with these commands:

```
java -jar pdfbox-app-2.1.0-SNAPSHOT.jar PDFToImage -imageType png test.pdf
```

```
pdftoppm test.pdf -png poppler
```

Best regards
Daniel






[jira] [Resolved] (PDFBOX-3764) 100 times performance hit on creating images

2017-04-24 Thread Daniel Persson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson resolved PDFBOX-3764.

Resolution: Invalid

> 100 times performance hit on creating images
> 
>
> Key: PDFBOX-3764
> URL: https://issues.apache.org/jira/browse/PDFBOX-3764
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.6
>Reporter: Daniel Persson
>  Labels: image, performance
> Attachments: callstack_1.png, callstack_2.png, test.pdf
>
>
> We found that PDFBox creates better images than Poppler, so we wanted to 
> switch our environment over to get these improvements, but we found a file 
> that took about 10 minutes to render one image with PDFBox and only about 6 
> seconds with Poppler. That is a 100 times performance hit if we were to switch.
> I've done some rudimentary profiling on the code and found that most of the 
> time is spent in ColorConvertOp.filter. Maybe there is a leaner way to 
> implement this in order to get better performance?
> Best regards
> Daniel






[jira] [Commented] (PDFBOX-3764) 100 times performance hit on creating images

2017-04-24 Thread Daniel Persson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15980990#comment-15980990
 ] 

Daniel Persson commented on PDFBOX-3764:


Hi Tilman.

Thank you for the support. Sadly, I hadn't read the getting started guide at

https://pdfbox.apache.org/2.0/getting-started.html

I've been using PDFBox for text extraction for years now, so I must have 
missed this update.

Now we're down to 19 seconds rendering instead of 10 minutes. :)

And after I also set the org.apache.pdfbox.rendering.UsePureJavaCMYKConversion 
property, the most time-consuming call is InputStream.read, which seems 
reasonable.
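The property mentioned above can also be set from code instead of on the command line. This is a minimal sketch, assuming the flag takes the value "true" and that it must be set before any PDFBox rendering class is loaded. The `sun.java2d.cmm` line reflects the Java 8 workaround (switching back to the KCMS color module) that we believe the getting started guide describes; treat it as an assumption that only applies to JDKs that still ship KCMS.

```java
// Sketch: apply rendering-related system properties early, before PDFBox
// or the Java2D color management classes are initialized.
final class RenderingProperties {

    // Property named in the comment above; the value "true" is our assumption.
    static final String PURE_JAVA_CMYK = "org.apache.pdfbox.rendering.UsePureJavaCMYKConversion";

    // Assumed Java 8-era switch from LittleCMS back to the KCMS module.
    static final String JAVA2D_CMM = "sun.java2d.cmm";
    static final String KCMS_PROVIDER = "sun.java2d.cmm.kcms.KcmsServiceProvider";

    private RenderingProperties() {}

    /** Sets the CMYK flag, and optionally the KCMS provider on older JDKs. */
    static void apply(boolean useKcms) {
        System.setProperty(PURE_JAVA_CMYK, "true");
        if (useKcms) {
            System.setProperty(JAVA2D_CMM, KCMS_PROVIDER);
        }
    }
}
```

Call `RenderingProperties.apply(...)` as the first statement of `main`; the equivalent command-line form is `-Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=true`.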

Thank you for the quick response.

Best regards
Daniel

> 100 times performance hit on creating images
> 
>
> Key: PDFBOX-3764
> URL: https://issues.apache.org/jira/browse/PDFBOX-3764
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.6
>Reporter: Daniel Persson
>  Labels: image, performance
> Attachments: callstack_1.png, callstack_2.png, test.pdf
>
>
> We found that PDFBox creates a better image than poppler so we wanted to 
> switch out our environment to get these improvements but found a file that 
> took about 10 minutes to create one image with PDFBox and only about 6 
> seconds with poppler. So a 100 times performance hit if we where to change.
> I've done some rudimentary profiling on the code and found that most of the 
> time is spent in ColorConvertOp.filter. Maybe there is a leaner way to 
> implement this in order to get a better result?
> best regards
> Daniel






[jira] [Updated] (PDFBOX-3764) 100 times performance hit on creating images

2017-04-24 Thread Daniel Persson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-3764:
---
Component/s: Rendering

> 100 times performance hit on creating images
> 
>
> Key: PDFBOX-3764
> URL: https://issues.apache.org/jira/browse/PDFBOX-3764
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.6
>Reporter: Daniel Persson
>  Labels: image, performance
> Attachments: callstack_1.png, callstack_2.png, test.pdf
>
>
> We found that PDFBox creates a better image than poppler so we wanted to 
> switch out our environment to get these improvements but found a file that 
> took about 10 minutes to create one image with PDFBox and only about 6 
> seconds with poppler. So a 100 times performance hit if we where to change.
> I've done some rudimentary profiling on the code and found that most of the 
> time is spent in ColorConvertOp.filter. Maybe there is a leaner way to 
> implement this in order to get a better result?
> best regards
> Daniel






[jira] [Updated] (PDFBOX-3764) 100 times performance hit on creating images

2017-04-24 Thread Daniel Persson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-3764:
---
Affects Version/s: 2.0.6

> 100 times performance hit on creating images
> 
>
> Key: PDFBOX-3764
> URL: https://issues.apache.org/jira/browse/PDFBOX-3764
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.6
>Reporter: Daniel Persson
>  Labels: image, performance
> Attachments: callstack_1.png, callstack_2.png, test.pdf
>
>
> We found that PDFBox creates a better image than poppler so we wanted to 
> switch out our environment to get these improvements but found a file that 
> took about 10 minutes to create one image with PDFBox and only about 6 
> seconds with poppler. So a 100 times performance hit if we where to change.
> I've done some rudimentary profiling on the code and found that most of the 
> time is spent in ColorConvertOp.filter. Maybe there is a leaner way to 
> implement this in order to get a better result?
> best regards
> Daniel






[jira] [Created] (PDFBOX-3764) 100 times performance hit on creating images

2017-04-24 Thread Daniel Persson (JIRA)
Daniel Persson created PDFBOX-3764:
--

 Summary: 100 times performance hit on creating images
 Key: PDFBOX-3764
 URL: https://issues.apache.org/jira/browse/PDFBOX-3764
 Project: PDFBox
  Issue Type: Improvement
Reporter: Daniel Persson
 Attachments: callstack_1.png, callstack_2.png, test.pdf

We found that PDFBox creates better images than Poppler, so we wanted to switch 
our environment over to get these improvements. However, we found a file that 
took about 10 minutes to render one image with PDFBox and only about 6 seconds 
with Poppler: a 100-fold performance hit if we were to switch.

I've done some rudimentary profiling and found that most of the time is spent 
in ColorConvertOp.filter. Maybe there is a leaner way to implement this that 
performs better?

Best regards
Daniel
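To illustrate what a "leaner" path than `ColorConvertOp.filter` could look like, here is a hypothetical, naive CMYK to RGB conversion done with plain per-pixel arithmetic. It is not PDFBox's implementation and it ignores ICC profiles entirely, so colors will differ from a color-managed conversion; it only shows the speed/accuracy trade-off in question.

```java
// Hypothetical naive CMYK -> RGB conversion that bypasses ColorConvertOp.
// No ICC profile is used, so results are fast but not color-accurate.
final class NaiveCmyk {
    private NaiveCmyk() {}

    /** c, m, y, k in [0, 1]; returns a packed 0xRRGGBB value. */
    static int toRgb(float c, float m, float y, float k) {
        int r = Math.round(255f * (1f - c) * (1f - k));
        int g = Math.round(255f * (1f - m) * (1f - k));
        int b = Math.round(255f * (1f - y) * (1f - k));
        return (r << 16) | (g << 8) | b;
    }

    /** Converts interleaved 8-bit CMYK samples into packed RGB pixels. */
    static int[] convert(byte[] cmyk) {
        int[] rgb = new int[cmyk.length / 4];
        for (int i = 0; i < rgb.length; i++) {
            float c = (cmyk[4 * i] & 0xFF) / 255f;
            float m = (cmyk[4 * i + 1] & 0xFF) / 255f;
            float y = (cmyk[4 * i + 2] & 0xFF) / 255f;
            float k = (cmyk[4 * i + 3] & 0xFF) / 255f;
            rgb[i] = toRgb(c, m, y, k);
        }
        return rgb;
    }
}
```

The resulting array could be written into a `BufferedImage` of type `TYPE_INT_RGB` with `setRGB`.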






[jira] [Created] (PDFBOX-3724) Wrong size in rendering of some artifacts

2017-03-19 Thread Daniel Persson (JIRA)
Daniel Persson created PDFBOX-3724:
--

 Summary: Wrong size in rendering of some artifacts
 Key: PDFBOX-3724
 URL: https://issues.apache.org/jira/browse/PDFBOX-3724
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.5
Reporter: Daniel Persson
Priority: Minor
 Attachments: example1.pdf, example1-pdfbox1.png, example1-poppler-1.png

It seems that some artifacts get the wrong width when rendered. Through testing, 
I've narrowed the artifact down to a stroked line, and the rendered stroke seems 
wider than a single pixel. Could it be that the stroke width should only apply 
to how wide a stroke is, while the length of the stroke has some minimum? 
Poppler seems to handle this stroke correctly.
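To reason about whether a stroke "should" be a single pixel wide, it helps to convert the PDF line width to device pixels. This sketch assumes an identity CTM and the default PDF user-space unit of 1/72 inch. The one-pixel floor mirrors the PDF rule that a line width of 0 means the thinnest line the device can render; applying that floor to all thin lines is our assumption, not documented PDFBox behavior.

```java
// Sketch: map a PDF line width (default user space, 1/72 inch) to device
// pixels at a given rendering DPI, assuming an identity CTM.
final class StrokeWidth {
    private StrokeWidth() {}

    /** Line width in device pixels for the given DPI. */
    static double toDevicePixels(double userWidth, double dpi) {
        return userWidth * dpi / 72.0;
    }

    /** Same, but never thinner than one device pixel (hypothetical policy). */
    static double withMinimumPixel(double userWidth, double dpi) {
        return Math.max(1.0, toDevicePixels(userWidth, dpi));
    }
}
```

With a non-identity CTM, the width is additionally scaled by the current transformation, so the same PDF line can end up several pixels wide.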

- OFF TOPIC
We do text extraction with PDFBox and currently use Poppler to extract our 
images, because we had a lot of artifacts earlier. But with the tremendous work 
the team has done to solve the PDFBOX-3000 issues, we are looking into using 
PDFBox for image rendering. Many of our examples show even more detail than the 
Poppler-rendered images.
Great work, people.






[jira] [Created] (PDFBOX-3511) NullPointerException - missing glyph description

2016-09-22 Thread Daniel Persson (JIRA)
Daniel Persson created PDFBOX-3511:
--

 Summary: NullPointerException - missing glyph description
 Key: PDFBOX-3511
 URL: https://issues.apache.org/jira/browse/PDFBOX-3511
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 2.0.3, 2.0.2, 2.0.1, 2.0.0
Reporter: Daniel Persson
Priority: Minor


Hi Team.

We process many PDF documents every day, and today we ran into a file that we 
couldn't render to an image. For some reason it has glyphs that don't have any 
glyph description. 

In GlyfCompositeDescript there are at least two methods (lines 258 and 271) 
that fetch a GlyphDescription from a map like this:
GlyphDescription gd = descriptions.get(c.getGlyphIndex());

The methods then use the description without a null check, which results in a 
NullPointerException.

Exception in thread "main" java.lang.NullPointerException
    at org.apache.fontbox.ttf.GlyfCompositeDescript.getCompositeCompEndPt(GlyfCompositeDescript.java:272)
    at org.apache.fontbox.ttf.GlyfCompositeDescript.getEndPtOfContours(GlyfCompositeDescript.java:126)
    at org.apache.fontbox.ttf.GlyphRenderer.describe(GlyphRenderer.java:72)
    at org.apache.fontbox.ttf.GlyphRenderer.getPath(GlyphRenderer.java:56)
    at org.apache.fontbox.ttf.GlyphData.getPath(GlyphData.java:116)
    at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.getPath(PDCIDFontType2.java:446)
    at org.apache.pdfbox.pdmodel.font.PDType0Font.getPath(PDType0Font.java:506)
    at org.apache.pdfbox.rendering.TTFGlyph2D.getPathForGID(TTFGlyph2D.java:137)
    at org.apache.pdfbox.rendering.TTFGlyph2D.getPathForCharacterCode(TTFGlyph2D.java:93)
    at org.apache.pdfbox.rendering.PageDrawer.drawGlyph2D(PageDrawer.java:353)
    at org.apache.pdfbox.rendering.PageDrawer.showFontGlyph(PageDrawer.java:334)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showGlyph(PDFStreamEngine.java:744)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showText(PDFStreamEngine.java:701)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.showTextString(PDFStreamEngine.java:564)
    at org.apache.pdfbox.contentstream.operator.text.ShowText.process(ShowText.java:55)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:815)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:472)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:446)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:189)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:145)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:94)
    at org.apache.pdfbox.tools.PDFToImage.main(PDFToImage.java:236)
    at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:94)
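The null check described above could look roughly like this. This is a hypothetical sketch: `SimpleGlyphDescription` and `CompositeGlyphSketch` are illustrative stand-ins, not FontBox's `GlyphDescription`/`GlyfCompositeDescript` API, and treating a missing component as contributing no points is one possible policy among several.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal stand-in for FontBox's GlyphDescription.
interface SimpleGlyphDescription {
    int getPointCount();
}

// Hypothetical composite glyph holding component descriptions by glyph index.
final class CompositeGlyphSketch {
    private final Map<Integer, SimpleGlyphDescription> descriptions = new HashMap<>();

    void put(int glyphIndex, SimpleGlyphDescription gd) {
        descriptions.put(glyphIndex, gd);
    }

    /** Point count of one component; a missing description counts as empty. */
    int componentPointCount(int glyphIndex) {
        SimpleGlyphDescription gd = descriptions.get(glyphIndex);
        if (gd == null) {
            // No glyph description for this component: skip it instead of
            // dereferencing null and throwing a NullPointerException.
            return 0;
        }
        return gd.getPointCount();
    }
}
```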

So far we have only seen one file with this issue in our processing. I've tried 
running PDFToImage with all versions of PDFBox 2, and they all fail. 

PDFBox 1.8.12 gives some error output but generates a working image.

Sep 23, 2016 7:36:53 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Sep 23, 2016 7:36:53 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Sep 23, 2016 7:36:55 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont drawString
WARNING: Changing font on <•> from  to the default font
Sep 23, 2016 7:36:55 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont drawString
WARNING: Changing font on <•> from  to the default font
Sep 23, 2016 7:36:55 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont drawString
WARNING: Changing font on <•> from  to the default font
Sep 23, 2016 7:36:55 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont drawString
WARNING: Changing font on <•> from  to the default font
Sep 23, 2016 7:36:55 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont drawString
WARNING: Changing font on <•> from  to the default font
Sep 23, 2016 7:36:55 AM org.apache.pdfbox.util.PDFImageWriter writeImage
INFO: Writing: [Removed Identifier]_01_07_201609231.jpg

At the time of writing, the file is too fresh to disclose. I might be able to 
attach it in a week or so, depending on the customer and on whether it's 
required to resolve this issue.

Thanks for your time.






[jira] [Commented] (PDFBOX-3488) NullPointerException in PDTrueTypeFont.java if glyf table is missing

2016-09-09 Thread Daniel Persson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15477561#comment-15477561
 ] 

Daniel Persson commented on PDFBOX-3488:


I hope this doesn't revert the fix for PDFBOX-3395.

This might be a logical continuation of that fix. Maybe all fonts need a null 
check for when the table is missing, but an empty table isn't a missing table.

Looking forward to 2.0.3; it's going to solve a lot of our problems. Keep up the 
great work.

> NullPointerException in PDTrueTypeFont.java if glyf table is missing
> 
>
> Key: PDFBOX-3488
> URL: https://issues.apache.org/jira/browse/PDFBOX-3488
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox, Rendering
>Affects Versions: 2.0.2, 2.0.3
>Reporter: Tilman Hausherr
>
> {code}
> Caused by: java.lang.NullPointerException: null
> 
> org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getPath(PDTrueTypeFont.java:444)
> 
> org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getNormalizedPath(PDTrueTypeFont.java:502)
> 
> org.apache.pdfbox.rendering.GlyphCache.getPathForCharacterCode(GlyphCache.java:71)
> org.apache.pdfbox.rendering.PageDrawer.showFontGlyph(PageDrawer.java:350)
> 
> org.apache.pdfbox.contentstream.PDFStreamEngine.showGlyph(PDFStreamEngine.java:756)
> 
> org.apache.pdfbox.debugger.pagepane.DebugPageDrawer.showGlyph(DebugPageDrawer.java:59)
> 
> org.apache.pdfbox.contentstream.PDFStreamEngine.showText(PDFStreamEngine.java:713)
> 
> org.apache.pdfbox.contentstream.PDFStreamEngine.showTextString(PDFStreamEngine.java:572)
> 
> org.apache.pdfbox.contentstream.operator.text.ShowText.process(ShowText.java:55)
> {code}
> The cause is the change in PDFBOX-3395; previously PDFBox would consider the 
> font to be bad and replace it. Now we don't do that because the glyf table is 
> not always needed.
> I'm throwing an exception for now but a better solution should be found. 
> Adobe Reader displays glyphs.






[jira] [Commented] (PDFBOX-3464) character height 3 times higher than expected

2016-08-24 Thread Daniel Persson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435380#comment-15435380
 ] 

Daniel Persson commented on PDFBOX-3464:


I also looked at the supplied PDF, and our PDFBox-based tool extracts the 
correct height after normalizing the fonts. Both fonts have an em square of 
2048.

> character height 3 times higher than expected
> -
>
> Key: PDFBOX-3464
> URL: https://issues.apache.org/jira/browse/PDFBOX-3464
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Reporter: Roman
>Priority: Minor
> Attachments: notHelped.png, nowItsHelped.png, screenshot-1.png, 
> screenshot.png, subnode.docx.pdf
>
>
> The issue basically same as PDFBOX-2749, but wrong sample was attached to it 
> by mistake. Correct PDF is attached here.
> The core of the problem is that font height for this specific font is 
> determined incorrectly, please see code with comments below.
> The issue was reproduced on Pdfbox 1.8.4, but as we tested before, same 
> result we get on 1.8.9 and 2.0 versions.
> {code}
> public class Extractor extends PDFTextStripper {
> //<...CUT...>
>   protected void writePage() throws IOException {
>   for (List textList : charactersByArticle) { 
> //charactersByArticle was inherited from base class
>   Iterator textIter = textList.iterator();
> //<...CUT...>
>   while (textIter.hasNext()) {
>   TextPosition position = (TextPosition) 
> textIter.next();
> //<...CUT...>
>   PDFontDescriptor fontDescriptor = 
> position.getFont().getFontDescriptor();
> //<...CUT...>
>   float yscale = position.getTextPos().getYScale();
>   float asc = Math.abs(fontDescriptor.getAscent() / 1000 * 
> yscale);
>   float rh = 
> Math.abs(fontDescriptor.getFontBoundingBox().getUpperRightY() / 1000 * 
> yscale);
>   float desc = Math.abs(fontDescriptor.getDescent() / 1000 * 
> yscale);
>   float capHeight = Math.abs(fontDescriptor.getCapHeight() / 1000 
> * yscale);
>   if (capHeight == 0)
>   capHeight = position.getHeight();
>   float h = (rh + Math.max(Math.max(capHeight, 
> position.getHeight()), asc)) / 2;
> //"h" evaluates to 37.39 (should be between 11 and 12)
> //"desc" evaluates to 2.664
> //"capHeight" evaluates to 37.39
> //"position.getHeight()" evaluates to 33.48
> {code}






[jira] [Commented] (PDFBOX-3464) character height 3 times higher than expected

2016-08-24 Thread Daniel Persson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435348#comment-15435348
 ] 

Daniel Persson commented on PDFBOX-3464:


You might be right that all fonts in PDFs should have an em square of size 
1000, but both OpenType and TrueType define unitsPerEm in their head table, and 
when it is applied to your calculations the resulting height seems accurate. 

OpenType head
https://www.microsoft.com/typography/otspec/head.htm

TrueType head
https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6head.html
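The scaling being discussed can be written as a one-line conversion. This sketch assumes the metric really is expressed in the font's own design units (as declared by `unitsPerEm` in the head table) rather than in the PDF's 1000-unit glyph space; if so, dividing by 1000 instead of 2048 overstates the value by a factor of 2048/1000.

```java
// Sketch: convert a metric from font design units to text space.
// The divisor must match the units the value was written in:
// 1000 for PDF glyph space, or the head table's unitsPerEm (e.g. 2048).
final class EmScaling {
    private EmScaling() {}

    /** designUnits / unitsPerEm, scaled by the font size. */
    static float toTextSpace(float designUnits, int unitsPerEm, float fontSize) {
        return designUnits / unitsPerEm * fontSize;
    }
}
```

For example, the same raw value of 1434 at size 10 gives 14.34 when divided by 1000 but about 7.0 when divided by 2048.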

> character height 3 times higher than expected
> -
>
> Key: PDFBOX-3464
> URL: https://issues.apache.org/jira/browse/PDFBOX-3464
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Reporter: Roman
>Priority: Minor
> Attachments: notHelped.png, nowItsHelped.png, screenshot-1.png, 
> screenshot.png, subnode.docx.pdf
>
>
> The issue basically same as PDFBOX-2749, but wrong sample was attached to it 
> by mistake. Correct PDF is attached here.
> The core of the problem is that font height for this specific font is 
> determined incorrectly, please see code with comments below.
> The issue was reproduced on Pdfbox 1.8.4, but as we tested before, same 
> result we get on 1.8.9 and 2.0 versions.
> {code}
> public class Extractor extends PDFTextStripper {
> //<...CUT...>
>   protected void writePage() throws IOException {
>   for (List textList : charactersByArticle) { 
> //charactersByArticle was inherited from base class
>   Iterator textIter = textList.iterator();
> //<...CUT...>
>   while (textIter.hasNext()) {
>   TextPosition position = (TextPosition) 
> textIter.next();
> //<...CUT...>
>   PDFontDescriptor fontDescriptor = 
> position.getFont().getFontDescriptor();
> //<...CUT...>
>   float yscale = position.getTextPos().getYScale();
>   float asc = Math.abs(fontDescriptor.getAscent() / 1000 * 
> yscale);
>   float rh = 
> Math.abs(fontDescriptor.getFontBoundingBox().getUpperRightY() / 1000 * 
> yscale);
>   float desc = Math.abs(fontDescriptor.getDescent() / 1000 * 
> yscale);
>   float capHeight = Math.abs(fontDescriptor.getCapHeight() / 1000 
> * yscale);
>   if (capHeight == 0)
>   capHeight = position.getHeight();
>   float h = (rh + Math.max(Math.max(capHeight, 
> position.getHeight()), asc)) / 2;
> //"h" evaluates to 37.39 (should be between 11 and 12)
> //"desc" evaluates to 2.664
> //"capHeight" evaluates to 37.39
> //"position.getHeight()" evaluates to 33.48
> {code}






[jira] [Commented] (PDFBOX-3464) character height 3 times higher than expected

2016-08-24 Thread Daniel Persson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434771#comment-15434771
 ] 

Daniel Persson commented on PDFBOX-3464:


Just a thought.

Could it be because of the UPM square?

"With the knowledge that your font is using a 1000, 1024, or 2048 UPM, you need 
to set up the drawing of your glyphs to ensure that all aspects of your 
typeface fit adequately into that UPM square."

All values in your scaling are computed with a UPM square of 1000, but this 
font might be using a 2048 square instead?

> character height 3 times higher than expected
> -
>
> Key: PDFBOX-3464
> URL: https://issues.apache.org/jira/browse/PDFBOX-3464
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Reporter: Roman
>Priority: Minor
> Attachments: notHelped.png, nowItsHelped.png, screenshot-1.png, 
> screenshot.png, subnode.docx.pdf
>
>
> The issue basically same as PDFBOX-2749, but wrong sample was attached to it 
> by mistake. Correct PDF is attached here.
> The core of the problem is that font height for this specific font is 
> determined incorrectly, please see code with comments below.
> The issue was reproduced on Pdfbox 1.8.4, but as we tested before, same 
> result we get on 1.8.9 and 2.0 versions.
> {code}
> public class Extractor extends PDFTextStripper {
> //<...CUT...>
>   protected void writePage() throws IOException {
>   for (List textList : charactersByArticle) { 
> //charactersByArticle was inherited from base class
>   Iterator textIter = textList.iterator();
> //<...CUT...>
>   while (textIter.hasNext()) {
>   TextPosition position = (TextPosition) 
> textIter.next();
> //<...CUT...>
>   PDFontDescriptor fontDescriptor = 
> position.getFont().getFontDescriptor();
> //<...CUT...>
>   float yscale = position.getTextPos().getYScale();
>   float asc = Math.abs(fontDescriptor.getAscent() / 1000 * 
> yscale);
>   float rh = 
> Math.abs(fontDescriptor.getFontBoundingBox().getUpperRightY() / 1000 * 
> yscale);
>   float desc = Math.abs(fontDescriptor.getDescent() / 1000 * 
> yscale);
>   float capHeight = Math.abs(fontDescriptor.getCapHeight() / 1000 
> * yscale);
>   if (capHeight == 0)
>   capHeight = position.getHeight();
>   float h = (rh + Math.max(Math.max(capHeight, 
> position.getHeight()), asc)) / 2;
> //"h" evaluates to 37.39 (should be between 11 and 12)
> //"desc" evaluates to 2.664
> //"capHeight" evaluates to 37.39
> //"position.getHeight()" evaluates to 33.48
> {code}






[jira] [Created] (PDFBOX-3468) ERROR: dash lengths all zero, ignored

2016-08-21 Thread Daniel Persson (JIRA)
Daniel Persson created PDFBOX-3468:
--

 Summary: ERROR: dash lengths all zero, ignored
 Key: PDFBOX-3468
 URL: https://issues.apache.org/jira/browse/PDFBOX-3468
 Project: PDFBox
  Issue Type: Wish
  Components: Parsing
Affects Versions: 2.0.2
Reporter: Daniel Persson
Priority: Trivial


On Friday, our production log system alerted us to an error ("dash lengths all 
zero, ignored"). We investigated and found that the processed PDF also produced 
an error when opened in Adobe Reader, but the page looked fine and was processed 
fine. Still, we got this error. For us this is a false positive: even though a 
line dash pattern should not be empty, the page isn't broken and can still be 
viewed, so why treat it as an error? 

My suggestion is to log the conditions in the code below at info or warning 
level instead of error.

In our case, we received an updated PDF an hour later that didn't have the 
empty line dash pattern.

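{code:title=SetLineDashPattern.java|borderStyle=solid}
// dashArray (a COSArray) is read from the operator arguments earlier in process()
boolean allZero = true;
for (COSBase base : dashArray)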
{
if (base instanceof COSNumber)
{
COSNumber num = (COSNumber) base;
if (num.floatValue() != 0)
{
allZero = false;
break;
}
}
else
{
LOG.error("dash array has non number element " + base + ", ignored");
dashArray = new COSArray();
break;
}
}
if (dashArray.size() > 0 && allZero)
{
LOG.error("dash lengths all zero, ignored");
dashArray = new COSArray();
}
{code}
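The suggested behavior can be sketched as a standalone check that performs the same two tests but reports them at WARNING level, so error-based alerting is not triggered. Assumptions: `java.util.logging` stands in for PDFBox's logging setup, and a plain `List` of numbers stands in for `COSArray`/`COSNumber`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.logging.Logger;

// Sketch of the proposed change: same checks as the excerpt above,
// logged as warnings instead of errors.
final class DashPatternCheck {
    private static final Logger LOG = Logger.getLogger(DashPatternCheck.class.getName());

    private DashPatternCheck() {}

    /** Returns the dash lengths to use; an empty list means a solid line. */
    static List<Float> sanitize(List<?> dashArray) {
        List<Float> lengths = new ArrayList<>();
        boolean allZero = true;
        for (Object base : dashArray) {
            if (!(base instanceof Number)) {
                LOG.warning("dash array has non number element " + base + ", ignored");
                return Collections.emptyList();
            }
            float value = ((Number) base).floatValue();
            if (value != 0) {
                allZero = false;
            }
            lengths.add(value);
        }
        if (!lengths.isEmpty() && allZero) {
            LOG.warning("dash lengths all zero, ignored");
            return Collections.emptyList();
        }
        return lengths;
    }
}
```

Unlike the excerpt, which stops at the first non-zero entry, this version checks every element for a non-number value.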






[jira] [Commented] (PDFBOX-3353) Create appearance streams for annotations

2016-07-22 Thread Daniel Persson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390513#comment-15390513
 ] 

Daniel Persson commented on PDFBOX-3353:


Hi John

I just had to respond to your last comment. The reasoning for not making a 
class inheritable is a solid one at first glance, but it might have 
consequences. 

When you make a class private/protected, you lock it down, and those who need a 
quick fix may end up trying to work around it. In the worst case, someone might 
have to build a dummy subsystem just to change one value that the author won't 
change for some reason.

Using a third-party library can be annoying for many reasons; a good API is 
extensible and open. 

If you get complaints when you fix bugs, that seems more like a community 
problem than a code problem.

It's hard to judge the tone of text when English isn't your native language, 
but I hope you read my message as a reflection on your comment and not as 
criticism. 

This community has made a great tool that I'm happy to use and contribute to.

Best regards
Daniel

> Create appearance streams for annotations
> -
>
> Key: PDFBOX-3353
> URL: https://issues.apache.org/jira/browse/PDFBOX-3353
> Project: PDFBox
>  Issue Type: Task
>  Components: PDModel, Rendering
>Affects Versions: 1.8.12, 2.0.0, 2.0.1, 2.0.2, 2.1.0
>Reporter: Tilman Hausherr
>  Labels: Annotations
> Attachments: SquareAnnotations.pdf, showAnnotation.java
>
>
> Create appearance streams for annotations when missing.
> I'll start by replacing current code for Ink and Link annotations.
> Good example PDFs:
> http://www.pdfill.com/example/pdf_commenting_new.pdf
> https://github.com/mozilla/pdf.js/issues/6810






[jira] [Commented] (PDFBOX-3395) Throwing exception when PDF has unused empty fonts embedded.

2016-07-13 Thread Daniel Persson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375639#comment-15375639
 ] 

Daniel Persson commented on PDFBOX-3395:


Ran some of my test cases and the errors are gone. Now I only have warnings for 
missing Unicode mappings which are unrelated to this issue.

> Throwing exception when PDF has unused empty fonts embedded.
> 
>
> Key: PDFBOX-3395
> URL: https://issues.apache.org/jira/browse/PDFBOX-3395
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.1, 2.0.2, 2.0.3
>Reporter: Daniel Persson
> Fix For: 2.0.3, 2.1.0
>
>
> I was trying to follow up on the issues in our system and found that some PDF 
> files threw ERRORs. These PDFs are produced by a publishing system and that 
> system seems to add fonts when you change to them and add them even though 
> they are never used. Or only space is used. Then they add this font with an 
> empty glyf table. This results in that errors are thrown on files that are 
> fine.
> Line 310 in TTFParser removes empty glyf tables.
> // skip tables with zero length
> if (table.getLength() == 0)
> {
> return null;
> }
> return table;
> Line 215 of TTFParser throws exception when glyf table is missing.
> if (font.getGlyph() == null)
> {
> throw new IOException("glyf is mandatory");
> }






[jira] [Commented] (PDFBOX-3395) Throwing exception when PDF has unused empty fonts embedded.

2016-07-13 Thread Daniel Persson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375379#comment-15375379
 ] 

Daniel Persson commented on PDFBOX-3395:


True, if you read the glyph table you'll get an empty one. The problem is the 
line in the parser that skips all tables of length 0, which seems a bit odd. If 
the font has defined a table, then an empty one should still be a valid table, 
right?
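The distinction argued for here (an empty table is still a table; only an unlisted table is missing) can be sketched as follows. The class and method names are hypothetical and illustrative only, not FontBox's `TTFParser` API.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical table directory: zero-length tables are kept, and only a
// table that was never listed counts as missing.
final class TableDirectorySketch {
    private final Map<String, byte[]> tables = new HashMap<>();

    void addTable(String tag, byte[] data) {
        // No "skip tables with zero length" step: empty tables are kept.
        tables.put(tag, data);
    }

    byte[] requireGlyf() throws IOException {
        byte[] glyf = tables.get("glyf");
        if (glyf == null) {
            throw new IOException("glyf is mandatory"); // table truly absent
        }
        return glyf; // may be empty, e.g. a font only used for spaces
    }
}
```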

> Throwing exception when PDF has unused empty fonts embedded.
> 
>
> Key: PDFBOX-3395
> URL: https://issues.apache.org/jira/browse/PDFBOX-3395
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Reporter: Daniel Persson
>
> I was trying to follow up on the issues in our system and found that some PDF 
> files threw ERRORs. These PDFs are produced by a publishing system and that 
> system seems to add fonts when you change to them and add them even though 
> they are never used. Or only space is used. Then they add this font with an 
> empty glyf table. This results in that errors are thrown on files that are 
> fine.
> Line 310 in TTFParser removes empty glyf tables.
> // skip tables with zero length
> if (table.getLength() == 0)
> {
> return null;
> }
> return table;
> Line 215 of TTFParser throws exception when glyf table is missing.
> if (font.getGlyph() == null)
> {
> throw new IOException("glyf is mandatory");
> }






[jira] [Commented] (PDFBOX-3395) Throwing exception when PDF has unused empty fonts embedded.

2016-07-12 Thread Daniel Persson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374390#comment-15374390
 ] 

Daniel Persson commented on PDFBOX-3395:


Then again, the specification doesn't say that a glyph table requires any 
glyphs, so why should an empty one generate a warning? A missing table, yes, 
that is an error.

> Throwing exception when PDF has unused empty fonts embedded.
> 
>
> Key: PDFBOX-3395
> URL: https://issues.apache.org/jira/browse/PDFBOX-3395
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Reporter: Daniel Persson
>
> I was trying to follow up on the issues in our system and found that some PDF 
> files threw ERRORs. These PDFs are produced by a publishing system and that 
> system seems to add fonts when you change to them and add them even though 
> they are never used. Or only space is used. Then they add this font with an 
> empty glyf table. This results in that errors are thrown on files that are 
> fine.
> Line 310 in TTFParser removes empty glyf tables.
> // skip tables with zero length
> if (table.getLength() == 0)
> {
> return null;
> }
> return table;
> Line 215 of TTFParser throws exception when glyf table is missing.
> if (font.getGlyph() == null)
> {
> throw new IOException("glyf is mandatory");
> }






[jira] [Commented] (PDFBOX-3395) Throwing exception when PDF has unused empty fonts embedded.

2016-07-12 Thread Daniel Persson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374385#comment-15374385
 ] 

Daniel Persson commented on PDFBOX-3395:


Correct, that's why I logged this as a minor wish issue. Our logging framework 
alerts us on errors, and this isn't one, so waking up to a false positive isn't 
ideal.

> Throwing exception when PDF has unused empty fonts embedded.
> 
>
> Key: PDFBOX-3395
> URL: https://issues.apache.org/jira/browse/PDFBOX-3395
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Reporter: Daniel Persson
>
> I was following up on the issues in our system and found that some PDF
> files threw ERRORs. These PDFs are produced by a publishing system that
> seems to embed a font whenever you switch to it, even if the font is never
> used (or only the space character is used). Such a font is embedded with an
> empty glyf table, so errors are thrown for files that are fine.
> Line 310 in TTFParser skips any table with zero length, which drops an empty
> glyf table:
> // skip tables with zero length
> if (table.getLength() == 0)
> {
>     return null;
> }
> return table;
> Line 215 of TTFParser then throws an exception because the glyf table is
> missing:
> if (font.getGlyph() == null)
> {
>     throw new IOException("glyf is mandatory");
> }



[jira] [Commented] (PDFBOX-3395) Throwing exception when PDF has unused empty fonts embedded.

2016-07-12 Thread Daniel Persson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372664#comment-15372664
 ] 

Daniel Persson commented on PDFBOX-3395:


Thanks for the heads-up. It's not terribly important that it won't be indexed;
after all, it's just an ad page. And I guess you might want to use it as a test
case later.

> Throwing exception when PDF has unused empty fonts embedded.
> 
>
> Key: PDFBOX-3395
> URL: https://issues.apache.org/jira/browse/PDFBOX-3395
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Reporter: Daniel Persson
>
> I was following up on the issues in our system and found that some PDF
> files threw ERRORs. These PDFs are produced by a publishing system that
> seems to embed a font whenever you switch to it, even if the font is never
> used (or only the space character is used). Such a font is embedded with an
> empty glyf table, so errors are thrown for files that are fine.
> Line 310 in TTFParser skips any table with zero length, which drops an empty
> glyf table:
> // skip tables with zero length
> if (table.getLength() == 0)
> {
>     return null;
> }
> return table;
> Line 215 of TTFParser then throws an exception because the glyf table is
> missing:
> if (font.getGlyph() == null)
> {
>     throw new IOException("glyf is mandatory");
> }



[jira] [Updated] (PDFBOX-3395) Throwing exception when PDF has unused empty fonts embedded.

2016-07-12 Thread Daniel Persson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-3395:
---
Attachment: (was: commercal.pdf)

> Throwing exception when PDF has unused empty fonts embedded.
> 
>
> Key: PDFBOX-3395
> URL: https://issues.apache.org/jira/browse/PDFBOX-3395
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Reporter: Daniel Persson
>
> I was following up on the issues in our system and found that some PDF
> files threw ERRORs. These PDFs are produced by a publishing system that
> seems to embed a font whenever you switch to it, even if the font is never
> used (or only the space character is used). Such a font is embedded with an
> empty glyf table, so errors are thrown for files that are fine.
> Line 310 in TTFParser skips any table with zero length, which drops an empty
> glyf table:
> // skip tables with zero length
> if (table.getLength() == 0)
> {
>     return null;
> }
> return table;
> Line 215 of TTFParser then throws an exception because the glyf table is
> missing:
> if (font.getGlyph() == null)
> {
>     throw new IOException("glyf is mandatory");
> }



[jira] [Created] (PDFBOX-3395) Throwing exception when PDF has unused empty fonts embedded.

2016-06-23 Thread Daniel Persson (JIRA)
Daniel Persson created PDFBOX-3395:
--

 Summary: Throwing exception when PDF has unused empty fonts 
embedded.
 Key: PDFBOX-3395
 URL: https://issues.apache.org/jira/browse/PDFBOX-3395
 Project: PDFBox
  Issue Type: Wish
Reporter: Daniel Persson
Priority: Minor


I was following up on the issues in our system and found that some PDF files
threw ERRORs. These PDFs are produced by a publishing system that seems to embed
a font whenever you switch to it, even if the font is never used (or only the
space character is used). Such a font is embedded with an empty glyf table, so
errors are thrown for files that are fine.

Line 310 in TTFParser skips any table with zero length, which drops an empty
glyf table:
// skip tables with zero length
if (table.getLength() == 0)
{
    return null;
}
return table;

Line 215 of TTFParser then throws an exception because the glyf table is
missing:
if (font.getGlyph() == null)
{
    throw new IOException("glyf is mandatory");
}



[jira] [Commented] (PDFBOX-3075) Changed to the getHeight function for fonts so it will return a more accurate height

2015-10-30 Thread Daniel Persson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983261#comment-14983261
 ] 

Daniel Persson commented on PDFBOX-3075:


Thanks for the quick responses. I'll look into the issues on Monday on company
time. It's been a really great chat.

> Changed to the getHeight function for fonts so it will return a more accurate 
> height
> 
>
> Key: PDFBOX-3075
> URL: https://issues.apache.org/jira/browse/PDFBOX-3075
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 2.0.0
>Reporter: Daniel Persson
>Priority: Minor
>  Labels: github-import
> Fix For: 2.0.0
>
> Attachments: get_height.patch
>
>
> The getHeight function in the fonts returned approximate heights, and in some
> cases only returned a height the first time the function was called. I tried
> to clean up the functions so they return a more accurate height for each glyph.



[jira] [Commented] (PDFBOX-3075) Changed to the getHeight function for fonts so it will return a more accurate height

2015-10-30 Thread Daniel Persson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983199#comment-14983199
 ] 

Daniel Persson commented on PDFBOX-3075:


John: I've read your reply at
http://mail-archives.apache.org/mod_mbox/pdfbox-users/201510.mbox/%3cbb3c23ee-0c8c-4b5f-a806-eb8d9373a...@jahewson.com%3E

As you say there, you need to rethink the font height so that it works with
PDFTextStripper. My changes passed the test cases, so I don't think the
stripper can depend that much on the actual text height. It uses the font's
bounding box height, not font.getHeight(int code), which gives you the height
of a specific glyph.

Furthermore, not all font types have glyphs defined. That could be wrong
behavior, but in those cases you can only approximate the height. My patch
gave me a unified font height in the 1000-unit em system, so I could make
accurate calculations of the position and height of glyphs.
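As a rough illustration of what a unified height in a 1000-unit em can mean for a TrueType font, the sketch below scales a glyph's bounding-box height from font units to 1000 units per em. It assumes the glyph's yMin/yMax and the font's unitsPerEm (from the 'head' table) are already available. The class and method names are hypothetical, and this is not the code in get_height.patch:

```java
/**
 * Hypothetical sketch: normalize a glyph's height from TrueType font units
 * to a 1000-unit em, the scale used for PDF glyph-space metrics.
 */
public class GlyphHeight {

    /**
     * @param yMin       lowest point of the glyph outline, in font units
     * @param yMax       highest point of the glyph outline, in font units
     * @param unitsPerEm the font's em size from its 'head' table (e.g. 1000 or 2048)
     * @return the glyph height scaled to a 1000-unit em
     */
    static float heightIn1000Em(int yMin, int yMax, int unitsPerEm) {
        if (unitsPerEm <= 0) {
            throw new IllegalArgumentException("unitsPerEm must be positive");
        }
        return (yMax - yMin) * 1000f / unitsPerEm;
    }

    public static void main(String[] args) {
        // A glyph spanning -200..1400 in a 2048-unit font
        System.out.println(heightIn1000Em(-200, 1400, 2048)); // 781.25
    }
}
```

Normalizing every glyph to the same 1000-unit scale is what lets heights from fonts with different em sizes be compared or combined directly.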

I've been running a many tests on these functions but I would like to 
contribute back because the help I've gotten from PDFBOX is great. When it 
comes to the width advance it's pretty accurate as long as I make small changes 
when we have vertical texts and texts that writes from right to left. But we've 
solved those too.

The API documentation only states:

Description copied from interface: PDFontLike
Returns the height of the given character, in glyph space. This can be 
expensive to calculate. Results are only approximate.

That is not very descriptive. So what do you recommend I do going forward?
I would like to build my solution on PDFBOX, and my company has allotted time
for me to contribute code back to PDFBOX when our work requires changes in the
PDFBOX engine.

This only works if we go in the same direction. Should all fonts have glyphs?

> Changed to the getHeight function for fonts so it will return a more accurate 
> height
> 
>
> Key: PDFBOX-3075
> URL: https://issues.apache.org/jira/browse/PDFBOX-3075
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 2.0.0
>Reporter: Daniel Persson
>Priority: Minor
>  Labels: github-import
> Fix For: 2.0.0
>
> Attachments: get_height.patch
>
>
> The getHeight function in the fonts returned approximate heights, and in some
> cases only returned a height the first time the function was called. I tried
> to clean up the functions so they return a more accurate height for each glyph.



[jira] [Commented] (PDFBOX-3073) Change to use media box for page size instead of cropbox.

2015-10-30 Thread Daniel Persson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983152#comment-14983152
 ] 

Daniel Persson commented on PDFBOX-3073:


Yes, but if you use all the information from the PDF with the local coordinates
and your function in PDFTextStreamEngine.java, then all the data ends up in the
wrong place when the media box and crop box actually differ in size. I've run
about 500 examples and get the wrong text placement every time. But if I change
this to the media box and then recalculate the data relative to the crop box
after it has been extracted, I get the correct positions.
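The comment describes extracting positions against the media box and then recalculating them relative to the crop box. Assuming both the position and the crop box are expressed in PDF user space (origin at the bottom left), that recalculation amounts to subtracting the crop box's lower-left corner. The helper below is a hypothetical sketch of that step, not the patch itself:

```java
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;

/**
 * Hypothetical sketch: re-express a position measured in page user space
 * relative to the crop box's lower-left corner.
 */
public class CropBoxTranslation {

    /** Both the point and the crop box are in PDF user space (origin bottom-left). */
    static Point2D toCropBoxSpace(Point2D mediaPoint, Rectangle2D cropBox) {
        return new Point2D.Double(
                mediaPoint.getX() - cropBox.getMinX(),
                mediaPoint.getY() - cropBox.getMinY());
    }

    public static void main(String[] args) {
        // Crop box inset by 36pt (half an inch) from the media box's corner
        Rectangle2D crop = new Rectangle2D.Double(36, 36, 540, 720);
        Point2D p = toCropBoxSpace(new Point2D.Double(100, 200), crop);
        System.out.println(p.getX() + ", " + p.getY()); // 64.0, 164.0
    }
}
```

If the extracted coordinates use a top-down y axis instead, the y term would need to be measured from the crop box's upper edge rather than its lower one.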

> Change to use media box for page size instead of cropbox.
> -
>
> Key: PDFBOX-3073
> URL: https://issues.apache.org/jira/browse/PDFBOX-3073
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 2.0.0
>Reporter: Daniel Persson
>Priority: Minor
>  Labels: github-import
> Fix For: 2.0.0
>
> Attachments: mediabox_for_content.patch
>
>
> For PDF documents where the media box is larger or smaller than the crop box,
> the content gets squeezed or stretched.
> For PDF content, the media box should be used as the page size.
> More information about this at
> http://www.prepressure.com/pdf/basics/page-boxes



[jira] [Commented] (PDFBOX-3075) Changed to the getHeight function for fonts so it will return a more accurate height

2015-10-30 Thread Daniel Persson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983139#comment-14983139
 ] 

Daniel Persson commented on PDFBOX-3075:


I've implemented a bounding box function in PDFTextStreamEngine.java that can
give you an accurate box without requiring you to check the text direction. Is
this function also not a valid contribution? It uses the getHeight function.
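One way to get a box that needs no direction check is to transform the glyph's rectangle and take the axis-aligned bounds of the result. The sketch below assumes the glyph's width and height are already known in text space and that a single AffineTransform maps text space to page space. The class and parameter names are hypothetical, and this is not the function from the patch:

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

/**
 * Hypothetical sketch: axis-aligned bounding box of a glyph that does not
 * depend on writing direction or rotation.
 */
public class GlyphBounds {

    /**
     * @param textToPage maps text space to page space (includes position and rotation)
     * @param width      advance width of the glyph in text space
     * @param height     glyph height in text space (e.g. from a getHeight-style call)
     */
    static Rectangle2D bounds(AffineTransform textToPage, double width, double height) {
        // Transform the whole rectangle, then take the bounds of the resulting shape
        Rectangle2D glyphBox = new Rectangle2D.Double(0, 0, width, height);
        return textToPage.createTransformedShape(glyphBox).getBounds2D();
    }

    public static void main(String[] args) {
        // Glyph placed at (100, 500) and rotated 90 degrees, as in vertical text
        AffineTransform at = new AffineTransform();
        at.translate(100, 500);
        at.rotate(Math.PI / 2);
        System.out.println(bounds(at, 10, 12));
    }
}
```

Because the min/max is taken after the transform, the same call works for horizontal, vertical, rotated or right-to-left placement.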

> Changed to the getHeight function for fonts so it will return a more accurate 
> height
> 
>
> Key: PDFBOX-3075
> URL: https://issues.apache.org/jira/browse/PDFBOX-3075
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 2.0.0
>Reporter: Daniel Persson
>Priority: Minor
>  Labels: github-import
> Fix For: 2.0.0
>
> Attachments: get_height.patch
>
>
> The getHeight function in the fonts returned approximate heights, and in some
> cases only returned a height the first time the function was called. I tried
> to clean up the functions so they return a more accurate height for each glyph.



[jira] [Commented] (PDFBOX-3074) Mark transparency groups

2015-10-30 Thread Daniel Persson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983130#comment-14983130
 ] 

Daniel Persson commented on PDFBOX-3074:


Thanks for the input. I haven't found any good information about this embedded
content that should not be shown, though. Transparency groups aren't stacked,
and only the data inside a marked content section is actually part of a group.
Or have I misunderstood this concept?

> Mark transparency groups
> 
>
> Key: PDFBOX-3074
> URL: https://issues.apache.org/jira/browse/PDFBOX-3074
> Project: PDFBox
>  Issue Type: New Feature
>  Components: Text extraction
>Affects Versions: 2.0.0
>Reporter: Daniel Persson
>Priority: Minor
>  Labels: github-import
> Fix For: 2.0.0
>
> Attachments: mark_transparency_groups.patch
>
>
> We read text from PDF files, but some of the files include extra data that
> is never shown. These segments are usually grouped in transparency groups,
> so for us a function that flags marked content as a transparency group is
> quite useful.
> If there is already a way to do this, please tell me; and if there is a better
> way to remove text that isn't drawn when the PDF is viewed, I'm all ears.



[jira] [Updated] (PDFBOX-3075) Changed to the getHeight function for fonts so it will return a more accurate height

2015-10-30 Thread Daniel Persson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-3075:
---
Attachment: get_height.patch

Patch for this issue

> Changed to the getHeight function for fonts so it will return a more accurate 
> height
> 
>
> Key: PDFBOX-3075
> URL: https://issues.apache.org/jira/browse/PDFBOX-3075
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 2.0.0
>Reporter: Daniel Persson
>Priority: Minor
>  Labels: github-import
> Fix For: 2.0.0
>
> Attachments: get_height.patch
>
>
> The getHeight function in the fonts returned approximate heights, and in some
> cases only returned a height the first time the function was called. I tried
> to clean up the functions so they return a more accurate height for each glyph.



[jira] [Updated] (PDFBOX-3074) Mark transparency groups

2015-10-30 Thread Daniel Persson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-3074:
---
Attachment: mark_transparency_groups.patch

Patch for this issue

> Mark transparency groups
> 
>
> Key: PDFBOX-3074
> URL: https://issues.apache.org/jira/browse/PDFBOX-3074
> Project: PDFBox
>  Issue Type: New Feature
>  Components: Text extraction
>Affects Versions: 2.0.0
>Reporter: Daniel Persson
>Priority: Minor
>  Labels: github-import
> Fix For: 2.0.0
>
> Attachments: mark_transparency_groups.patch
>
>
> We read text from PDF files, but some of the files include extra data that
> is never shown. These segments are usually grouped in transparency groups,
> so for us a function that flags marked content as a transparency group is
> quite useful.
> If there is already a way to do this, please tell me; and if there is a better
> way to remove text that isn't drawn when the PDF is viewed, I'm all ears.



[jira] [Updated] (PDFBOX-3073) Change to use media box for page size instead of cropbox.

2015-10-30 Thread Daniel Persson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Persson updated PDFBOX-3073:
---
Attachment: mediabox_for_content.patch

Patch for this issue.

> Change to use media box for page size instead of cropbox.
> -
>
> Key: PDFBOX-3073
> URL: https://issues.apache.org/jira/browse/PDFBOX-3073
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 2.0.0
>Reporter: Daniel Persson
>Priority: Minor
>  Labels: github-import
> Fix For: 2.0.0
>
> Attachments: mediabox_for_content.patch
>
>
> For PDF documents where the media box is larger or smaller than the crop box,
> the content gets squeezed or stretched.
> For PDF content, the media box should be used as the page size.
> More information about this at
> http://www.prepressure.com/pdf/basics/page-boxes



[jira] [Created] (PDFBOX-3075) Changed to the getHeight function for fonts so it will return a more accurate height

2015-10-30 Thread Daniel Persson (JIRA)
Daniel Persson created PDFBOX-3075:
--

 Summary: Changed to the getHeight function for fonts so it will 
return a more accurate height
 Key: PDFBOX-3075
 URL: https://issues.apache.org/jira/browse/PDFBOX-3075
 Project: PDFBox
  Issue Type: Bug
  Components: Text extraction
Affects Versions: 2.0.0
Reporter: Daniel Persson
Priority: Minor
 Fix For: 2.0.0


The getHeight function in the fonts returned approximate heights, and in some
cases only returned a height the first time the function was called. I tried to
clean up the functions so they return a more accurate height for each glyph.
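One way to keep an expensive per-glyph calculation cheap while still returning a value for every glyph on every call is to cache results per character code. The sketch below assumes a hypothetical compute function supplied by the caller. It is not how PDFBox or the patch implements getHeight:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntToDoubleFunction;

/**
 * Hypothetical sketch: cache heights per character code, so each code gets
 * its own value and repeated calls return the same result cheaply.
 */
public class GlyphHeightCache {

    private final Map<Integer, Double> heights = new HashMap<>();
    private final IntToDoubleFunction compute; // the expensive per-glyph calculation

    GlyphHeightCache(IntToDoubleFunction compute) {
        this.compute = compute;
    }

    double getHeight(int code) {
        // Compute once per code; later calls for the same code hit the map
        return heights.computeIfAbsent(code, c -> compute.applyAsDouble(c));
    }

    public static void main(String[] args) {
        GlyphHeightCache cache = new GlyphHeightCache(code -> code == 'g' ? 720.0 : 680.0);
        System.out.println(cache.getHeight('A')); // 680.0
        System.out.println(cache.getHeight('g')); // 720.0
    }
}
```

Keying the cache by character code, rather than keeping a single cached value for the font, is what avoids handing back one glyph's height for another.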



[jira] [Created] (PDFBOX-3073) Change to use media box for page size instead of cropbox.

2015-10-30 Thread Daniel Persson (JIRA)
Daniel Persson created PDFBOX-3073:
--

 Summary: Change to use media box for page size instead of cropbox.
 Key: PDFBOX-3073
 URL: https://issues.apache.org/jira/browse/PDFBOX-3073
 Project: PDFBox
  Issue Type: Bug
  Components: Text extraction
Affects Versions: 2.0.0
Reporter: Daniel Persson
Priority: Minor
 Fix For: 2.0.0


For PDF documents where the media box is larger or smaller than the crop box,
the content gets squeezed or stretched.

For PDF content, the media box should be used as the page size.

More information about this at
http://www.prepressure.com/pdf/basics/page-boxes



[jira] [Created] (PDFBOX-3074) Mark transparency groups

2015-10-30 Thread Daniel Persson (JIRA)
Daniel Persson created PDFBOX-3074:
--

 Summary: Mark transparency groups
 Key: PDFBOX-3074
 URL: https://issues.apache.org/jira/browse/PDFBOX-3074
 Project: PDFBox
  Issue Type: New Feature
  Components: Text extraction
Affects Versions: 2.0.0
Reporter: Daniel Persson
Priority: Minor
 Fix For: 2.0.0


We read text from PDF files, but some of the files include extra data that is
never shown. These segments are usually grouped in transparency groups, so for
us a function that flags marked content as a transparency group is quite useful.

If there is already a way to do this, please tell me; and if there is a better
way to remove text that isn't drawn when the PDF is viewed, I'm all ears.


