[jira] [Commented] (PDFBOX-3568) Characters widths and x-positions incorrect

2016-11-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15672986#comment-15672986
 ] 

Tilman Hausherr commented on PDFBOX-3568:
-

I'm saying that the position of 200 for the second "t" matches the rendering 
in 1.8.

The position and width values you get in 2.0.3 are correct.

> Characters widths and x-positions incorrect
> ---
>
> Key: PDFBOX-3568
> URL: https://issues.apache.org/jira/browse/PDFBOX-3568
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.12
>Reporter: Roman
> Attachments: screenshot-1.png, test_width1.pdf, test_width11.png
>
>
> Using PdfBox 1.8.12 we are extracting character coordinates from 
> [^test_width1.pdf], as described in PDFBOX-3464
> We got two issues here:
> - wrong height, this issue was already addressed by PDFBOX-3464
> - wrong width and X positions of characters, which we are getting from PdfBox 
> via the code:
> {code} 
>position.getX();
>position.getWidth();
> {code}
> The coordinates, returned by PdfBox shown on [^screenshot-1.png] with blue 
> color.
> This code works fine on most PDFs. Something is wrong with this specific PDF. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-3568) Characters widths and x-positions incorrect

2016-11-16 Thread Roman (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15672531#comment-15672531
 ] 

Roman commented on PDFBOX-3568:
---

Why are you saying that a position of 200 is correct for the second "t"? 
At the same time, you are saying that the correct position is 195.81, and that 
it can only be extracted via PdfBox 2.0?

The blue rectangles on my screenshot are drawn using values extracted by PdfBox 
1.8.12. I understand that this will be fixed once we migrate to 2.0.3.




[jira] [Commented] (PDFBOX-3576) Add getter methods to TextPosition

2016-11-16 Thread Kurt Devlin (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671425#comment-15671425
 ] 

Kurt Devlin commented on PDFBOX-3576:
-

I'm OK with TextPosition being a final class. I think of it just like Java's 
String. But I think there should be tools that let me make a copy of a 
TextPosition instance or compare two instances. I think adding getters for 
any of the fields that are needed by the TextPosition constructor (along with 
equals() and hashCode()) would allow this.
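
As a rough illustration of the request above, the following sketch shows an immutable value class that exposes every constructor argument through a getter and defines equals() and hashCode(), which is enough to copy or compare instances from outside the class. The class GlyphBox, its four fields, and the withX helper are hypothetical names invented for this example; this is not PDFBox's TextPosition, whose constructor takes many more parameters.

```java
import java.util.Objects;

// Hypothetical stand-in for illustration only; NOT PDFBox's TextPosition.
final class GlyphBox {
    private final String unicode;
    private final float x;
    private final float y;
    private final float width;

    GlyphBox(String unicode, float x, float y, float width) {
        this.unicode = Objects.requireNonNull(unicode, "unicode");
        this.x = x;
        this.y = y;
        this.width = width;
    }

    // Read-only access to every value the constructor needs.
    String getUnicode() { return unicode; }
    float getX() { return x; }
    float getY() { return y; }
    float getWidth() { return width; }

    // Returns a copy that differs only in x; the original is left unchanged.
    GlyphBox withX(float newX) {
        return new GlyphBox(unicode, newX, y, width);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof GlyphBox)) {
            return false;
        }
        GlyphBox other = (GlyphBox) obj;
        return unicode.equals(other.unicode)
                && Float.compare(x, other.x) == 0
                && Float.compare(y, other.y) == 0
                && Float.compare(width, other.width) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(unicode, x, y, width);
    }

    @Override
    public String toString() {
        return "GlyphBox[" + unicode + ", x=" + x + ", y=" + y + ", width=" + width + "]";
    }

    public static void main(String[] args) {
        GlyphBox original = new GlyphBox("t", 200f, 100f, 5f);
        GlyphBox moved = original.withX(195.81f);
        System.out.println(original + " -> " + moved);
        System.out.println("equal after move: " + original.equals(moved));
    }
}
```

With getters in place, a caller can build a modified copy itself by reading each value and passing it back to the constructor, so the class can stay final and immutable.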

For our application, we are processing PDFs and creating an XML representation 
of them. Some of what we do may be generally applicable to PDFBox, but I'm still 
trying to understand the scope of the project, so I'm not prepared to make any 
general suggestions or recommendations yet. The project started out as a proof 
of concept and, to accelerate development, a copy of the PDFBox source was 
pulled in and modified rather than extended. I took over the project recently, 
and maintaining a Frankenstein version (a combination of 1.7 and 1.8) of PDFBox 
alongside our own custom code is more than I'm willing to take on. I've been 
working over the last month or so to identify what is core PDFBox and what is 
ours, and what needs to change in order to fully factor out the forked code and 
switch over to PDFBox 2.0. It hasn't been pretty.


> Add getter methods to TextPosition
> --
>
> Key: PDFBOX-3576
> URL: https://issues.apache.org/jira/browse/PDFBOX-3576
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.3
>Reporter: Kurt Devlin
>
> I've inherited code in my department that created a local copy of 
> TextPosition and broke its immutability. I'm trying to refactor this code 
> to use the core Apache implementation, and I understand the need for 
> TextPosition to be immutable, but can getters be added to provide read-only 
> access to the class's fields?
> We have custom code that compares endX/endY values to determine whether 
> there are overlapping characters, among other features. In our application, we 
> also have a swap feature where we need to create/clone new TextPosition 
> instances that differ only slightly from an existing TextPosition instance. 
> Since many of the fields required by the constructor of the new instances 
> can't be read from an existing instance, we need a way to access these 
> fields.
> There probably should also be an override for equals() and hashCode().
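
The overlap check mentioned in the description can be sketched as follows. This is only a guess at the kind of comparison meant, reduced to the horizontal axis: the HorizontalSpan class and its startX/endX fields are hypothetical names for this example, not part of PDFBox, and the sample coordinates are arbitrary.

```java
// Hypothetical helper for illustration; not a PDFBox class.
final class HorizontalSpan {
    private final float startX;
    private final float endX;

    HorizontalSpan(float startX, float endX) {
        if (endX < startX) {
            throw new IllegalArgumentException("endX must not be less than startX");
        }
        this.startX = startX;
        this.endX = endX;
    }

    float getStartX() { return startX; }
    float getEndX() { return endX; }

    // Two spans overlap when each one starts before the other one ends.
    // Spans that merely touch at an edge are not counted as overlapping.
    boolean overlaps(HorizontalSpan other) {
        return startX < other.endX && other.startX < endX;
    }

    public static void main(String[] args) {
        HorizontalSpan first = new HorizontalSpan(195.81f, 200.5f);
        HorizontalSpan second = new HorizontalSpan(200f, 205f);
        System.out.println("overlapping: " + first.overlaps(second));
    }
}
```

A check like this only needs read access to the start and end coordinates, which is why getters alone would cover it; the same comparison applied to the vertical extent would handle the endY case.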






[jira] [Commented] (PDFBOX-3569) Performance regression in PDColorSpace#toRGBImageAWT

2016-11-16 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671337#comment-15671337
 ] 

John Hewson commented on PDFBOX-3569:
-

Surely this problem will apply to any ICCBased color space too, not just 
DeviceCMYK?

> Performance regression in PDColorSpace#toRGBImageAWT
> 
>
> Key: PDFBOX-3569
> URL: https://issues.apache.org/jira/browse/PDFBOX-3569
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.3, 2.1.0
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
> Fix For: 2.0.4, 2.1.0
>
> Attachments: PDFBOX-1058.pdf-3.png, PDFBOX-1058.pdf-3.png, 
> PDFBOX-1058.pdf-3.png-diff.png, PDFBOX-2700-JCS_YCCK.pdf-1.png, 
> PDFBOX-2700-JCS_YCCK.pdf-1.png, PDFBOX-2700-JCS_YCCK.pdf-1.png-diff.png, 
> PDFBOX-3569-patch_v2.txt
>
>
> I've a private pdf containing 1900 tiny inline images (CMYK, 8bit) which 
> renders way too slow. Again the CMYK2RGB conversion is the culprit here, BUT 
> the known issue with the KCMS/LCMS change isn't the main problem here.
> I ran some tests on linux (PDFToImage time -imageType png -resolution 150)
> 1.6.0_37: 355s
> 1.7.0_25: 289s
> 1.7.0_75: 298s
> 1.8.0_101: cancelled after 15 min






[jira] [Commented] (PDFBOX-3571) sRGB Color Space Profile is subject to 3rd party copyright

2016-11-16 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671275#comment-15671275
 ] 

John Hewson commented on PDFBOX-3571:
-

That would work.

> sRGB Color Space Profile is subject to 3rd party copyright
> --
>
> Key: PDFBOX-3571
> URL: https://issues.apache.org/jira/browse/PDFBOX-3571
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.3
>Reporter: gil cattaneo
>
> Hi
> The file /examples/src/main/resources/org/apache/pdfbox/resources/pdfa/sRGB 
> Color Space Profile.icm
> The license: 
> /examples/src/main/resources/org/apache/pdfbox/resources/pdfa/sRGB Color 
> Space Profile.icm.LICENSE.txt
> contains the following:
>  "...permission to use, copy and distribute this file for any purpose is
>  hereby granted without fee, provided that the file is not changed
>  including the HP copyright notice tag, ... "
> The license says: "provided that the file is not changed"
> It does not respect the criteria "The license must meet the Open Source 
> Definition."
> The OSD [1] says:
> "3. Derived Works
> The license must allow modifications"
> [1] http://www.opensource.org/osd.html
> http://www.apache.org/legal/resolved.html#no-modification






[jira] [Comment Edited] (PDFBOX-3576) Add getter methods to TextPosition

2016-11-16 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671239#comment-15671239
 ] 

John Hewson edited comment on PDFBOX-3576 at 11/16/16 6:44 PM:
---

If you have any general improvements to TextStripper we'd be happy to include 
them in PDFBox, providing they don't cause regressions. You can attach an SVN 
patch to this issue via More > Attach files. That way we maintain it :)

Is any of your code not generally applicable to PDFBox users? If not, then 
these modifications to PDFBox belong in PDFBox itself.


was (Author: jahewson):
If you have any general improvements to TextStripper we'd be happy to include 
them in PDFBox, providing they don't cause regressions. You can attach an SVN 
patch to this issue via More > Attach files. That way we maintain it :)




[jira] [Commented] (PDFBOX-3576) Add getter methods to TextPosition

2016-11-16 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671255#comment-15671255
 ] 

John Hewson commented on PDFBOX-3576:
-

I'm very wary of extending the ability to subclass TextStripper. These things 
always come back to bite us, and the people whose subclasses no longer work.




[jira] [Commented] (PDFBOX-3576) Add getter methods to TextPosition

2016-11-16 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671249#comment-15671249
 ] 

Maruan Sahyoun commented on PDFBOX-3576:


And if it dies, this will be in 3.0.0, I think, as this is a breaking change. 
So I'd also think that adding the getters will help in the short term until the 
new functionality is in place. As soon as that is available we can deprecate 
the old classes and remove them in 3.0.0.




[jira] [Comment Edited] (PDFBOX-3576) Add getter methods to TextPosition

2016-11-16 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671244#comment-15671244
 ] 

John Hewson edited comment on PDFBOX-3576 at 11/16/16 6:40 PM:
---

We could, but I'd rather support use of TextPosition and TextStripper in a 
vanilla manner first - then accommodate subclassing only if strictly necessary.

How many times has "let me arbitrarily subclass this thing" actually meant "I 
need one small feature added to PDFBox"? I've lost count.


was (Author: jahewson):
We could, but I'd rather support use of TextPosition and TextStripper in a 
vanilla manner first - then accommodate subclassing only if strictly necessary.




[jira] [Commented] (PDFBOX-3576) Add getter methods to TextPosition

2016-11-16 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671244#comment-15671244
 ] 

John Hewson commented on PDFBOX-3576:
-

We could, but I'd rather support use of TextPosition and TextStripper in a 
vanilla manner first - then accommodate subclassing only if strictly necessary.




[jira] [Commented] (PDFBOX-3576) Add getter methods to TextPosition

2016-11-16 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671239#comment-15671239
 ] 

John Hewson commented on PDFBOX-3576:
-

If you have any general improvements to TextStripper we'd be happy to include 
them in PDFBox, providing they don't cause regressions. You can attach an SVN 
patch to this issue via More > Attach files. That way we maintain it :)




[jira] [Commented] (PDFBOX-3576) Add getter methods to TextPosition

2016-11-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671236#comment-15671236
 ] 

Tilman Hausherr commented on PDFBOX-3576:
-

Yes we discussed this some time ago but no code has been written, except the 
DrawPrintTextLocations example.

Anyway, if you think that TextPosition is about to die soon, why not add the 
two getters he's asking for?




[jira] [Commented] (PDFBOX-3576) Add getter methods to TextPosition

2016-11-16 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671222#comment-15671222
 ] 

John Hewson commented on PDFBOX-3576:
-

We did discuss replacing the TextStripper in its entirety and switching to a 
visual glyph-bounds approach. And visual glyph bounds code has been progressing 
recently.

It's not a foregone conclusion, but it doesn't seem like TextPosition has much 
life left in it.




[jira] [Comment Edited] (PDFBOX-3576) Add getter methods to TextPosition

2016-11-16 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671174#comment-15671174
 ] 

John Hewson edited comment on PDFBOX-3576 at 11/16/16 6:10 PM:
---

TextPosition is rapidly approaching its end-of-life, so I doubt that this will 
happen.


was (Author: jahewson):
TextPosition is rapidly approach its end-of-life, so I doubt that this will 
happen.

> Add getter methods to TextPosition
> --
>
> Key: PDFBOX-3576
> URL: https://issues.apache.org/jira/browse/PDFBOX-3576
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.3
>Reporter: Kurt Devlin
>
> I've inherited code in my department that created a local copy of 
> TextPosition and broke its immutability. I'm trying to refactor this code 
> and use the core Apache implementation and I understand the need for 
> TextPosition, but can getters be added to provide read-only access to the 
> class's fields?
> We have custom code that does comparison of endX/endY values to determine if 
> there are overlapping characters and other features. In our application, we 
> also have a swap feature where we need to create/clone new TextPosition 
> instances with minor differences between an existing TextPosition instance. 
> Since there are a lot of fields that are required by the constructor of the 
> new instances, but can't be read from an existing instance, we need a way to 
> access these fields.
> There probably should also be an override for equals() and hashCode().






[jira] [Commented] (PDFBOX-3576) Add getter methods to TextPosition

2016-11-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671189#comment-15671189
 ] 

Tilman Hausherr commented on PDFBOX-3576:
-

{quote}
TextPosition is rapidly approaching its end-of-life
{quote}
really? I haven't noticed any development.




[jira] [Commented] (PDFBOX-3576) Add getter methods to TextPosition

2016-11-16 Thread Kurt Devlin (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671184#comment-15671184
 ] 

Kurt Devlin commented on PDFBOX-3576:
-

I'm new to Apache PDFBox (and to our company's code that uses it). Is there a 
recommendation on how to move forward now without having forked copies of 
Apache PDFBox code in our project?




[jira] [Updated] (PDFBOX-3576) Add getter methods to TextPosition

2016-11-16 Thread Kurt Devlin (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Devlin updated PDFBOX-3576:

Description: 
I've inherited code in my department that created a local copy of TextPosition 
and broke its immutability. I'm trying to refactor this code and use the core 
Apache implementation and I understand the need for TextPosition to be 
immutable, but can getters be added to provide read-only access to the class's 
fields?

We have custom code that does comparison of endX/endY values to determine if 
there are overlapping characters and other features. In our application, we 
also have a swap feature where we need to create/clone new TextPosition 
instances with minor differences between an existing TextPosition instance. 
Since there are a lot of fields that are required by the constructor of the new 
instances, but can't be read from an existing instance, we need a way to access 
these fields.

There probably should also be an override for equals() and hashCode().

  was:
I've inherited code in my department that created a local copy of TextPosition 
and broke its immutability. I'm trying to refactor this code and use the core 
Apache implementation and I understand the need for TextPosition, but can 
getters be added to provide read-only access to the class's fields?

We have custom code that does comparison of endX/endY values to determine if 
there are overlapping characters and other features. In our application, we 
also have a swap feature where we need to create/clone new TextPosition 
instances with minor differences between an existing TextPosition instance. 
Since there are a lot of fields that are required by the constructor of the new 
instances, but can't be read from an existing instance, we need a way to access 
these fields.

There probably should also be an override for equals() and hashCode().


> Add getter methods to TextPosition
> --
>
> Key: PDFBOX-3576
> URL: https://issues.apache.org/jira/browse/PDFBOX-3576
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.3
>Reporter: Kurt Devlin
>
> I've inherited code in my department that created a local copy of 
> TextPosition and broke its immutability. I'm trying to refactor this code 
> and use the core Apache implementation and I understand the need for 
> TextPosition to be immutable, but can getters be added to provide read-only 
> access to the class's fields?
> We have custom code that does comparison of endX/endY values to determine if 
> there are overlapping characters and other features. In our application, we 
> also have a swap feature where we need to create/clone new TextPosition 
> instances with minor differences between an existing TextPosition instance. 
> Since there are a lot of fields that are required by the constructor of the 
> new instances, but can't be read from an existing instance, we need a way to 
> access these fields.
> There probably should also be an override for equals() and hashCode().






[jira] [Commented] (PDFBOX-3576) Add getter methods to TextPosition

2016-11-16 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671174#comment-15671174
 ] 

John Hewson commented on PDFBOX-3576:
-

TextPosition is rapidly approaching its end-of-life, so I doubt that this will 
happen.

> Add getter methods to TextPosition
> --
>
> Key: PDFBOX-3576
> URL: https://issues.apache.org/jira/browse/PDFBOX-3576
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.3
>Reporter: Kurt Devlin
>
> I've inherited code in my department that created a local copy of 
> TextPosition and broke its immutability. I'm trying to refactor this code 
> and use the core Apache implementation and I understand the need for 
> TextPosition, but can getters be added to provide read-only access to the 
> class's fields?
> We have custom code that does comparison of endX/endY values to determine if 
> there are overlapping characters and other features. In our application, we 
> also have a swap feature where we need to create/clone new TextPosition 
> instances with minor differences between an existing TextPosition instance. 
> Since there are a lot of fields that are required by the constructor of the 
> new instances, but can't be read from an existing instance, we need a way to 
> access these fields.
> There probably should also be an override for equals() and hashCode().






[jira] [Created] (PDFBOX-3576) Add getter methods to TextPosition

2016-11-16 Thread Kurt Devlin (JIRA)
Kurt Devlin created PDFBOX-3576:
---

 Summary: Add getter methods to TextPosition
 Key: PDFBOX-3576
 URL: https://issues.apache.org/jira/browse/PDFBOX-3576
 Project: PDFBox
  Issue Type: Improvement
Affects Versions: 2.0.3
Reporter: Kurt Devlin


I've inherited code in my department that created a local copy of TextPosition 
and broke its immutability. I'm trying to refactor this code and use the core 
Apache implementation and I understand the need for TextPosition, but can 
getters be added to provide read-only access to the class's fields?

We have custom code that does comparison of endX/endY values to determine if 
there are overlapping characters and other features. In our application, we 
also have a swap feature where we need to create/clone new TextPosition 
instances with minor differences between an existing TextPosition instance. 
Since there are a lot of fields that are required by the constructor of the new 
instances, but can't be read from an existing instance, we need a way to access 
these fields.

There probably should also be an override for equals() and hashCode().
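
The request above (read-only getters, cloning with small differences, overlap checks on endX, plus equals() and hashCode()) can be illustrated with a small sketch. This is not PDFBox code: `GlyphBox`, its fields, and the `withUnicode` and `overlapsHorizontally` helpers are hypothetical names invented for illustration, and the real TextPosition constructor takes many more arguments.

```java
import java.util.Objects;

/**
 * Hypothetical immutable value object, NOT part of PDFBox. It only mirrors the
 * kind of data the request mentions: start and end coordinates plus the text.
 */
final class GlyphBox {
    private final float x;
    private final float y;
    private final float endX;
    private final float endY;
    private final String unicode;

    GlyphBox(float x, float y, float endX, float endY, String unicode) {
        this.x = x;
        this.y = y;
        this.endX = endX;
        this.endY = endY;
        this.unicode = Objects.requireNonNull(unicode, "unicode");
    }

    // Getters without setters: callers can read every field, but not change it.
    float getX() { return x; }
    float getY() { return y; }
    float getEndX() { return endX; }
    float getEndY() { return endY; }
    String getUnicode() { return unicode; }

    /** Returns a copy that differs only in its text (the "swap" use case). */
    GlyphBox withUnicode(String newUnicode) {
        return new GlyphBox(x, y, endX, endY, newUnicode);
    }

    /** True if the horizontal extents of the two boxes intersect. */
    boolean overlapsHorizontally(GlyphBox other) {
        return x < other.endX && other.x < endX;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GlyphBox)) return false;
        GlyphBox b = (GlyphBox) o;
        return Float.compare(x, b.x) == 0
                && Float.compare(y, b.y) == 0
                && Float.compare(endX, b.endX) == 0
                && Float.compare(endY, b.endY) == 0
                && unicode.equals(b.unicode);
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y, endX, endY, unicode);
    }
}
```

Because every field is final and exposed only through a getter, a "with" method can build a modified copy without weakening immutability, which is the combination the description asks for.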






[jira] [Closed] (PDFBOX-3575) Missing (auto)close on RandomFileAccess in CCITTFactory

2016-11-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-3575.
---
Resolution: Duplicate

Duplicate of PDFBOX-3517. Get a snapshot or wait for the release of 2.0.4 :-)

> Missing (auto)close on RandomFileAccess in CCITTFactory
> ---
>
> Key: PDFBOX-3575
> URL: https://issues.apache.org/jira/browse/PDFBOX-3575
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.3
> Environment: Windows 10/64, java version "1.8.0_112" Java(TM) SE 
> Runtime Environment (build 1.8.0_112-b15) Java HotSpot(TM) 64-Bit Server VM 
> (build 25.112-b15, mixed mode)
>Reporter: Maurice Betzel
>
> In org.apache.pdfbox.pdmodel.graphics.image.CCITTFactory the method 
> createFromFile instantiates a new RandomAccessFile but does not close it when 
> done. This keeps a lock on the file until the JVM exits.
> Calling close() or using try-with-resources (AutoCloseable) resolves the issue.
> public static PDImageXObject createFromFile(PDDocument document, File file, int number)
>         throws IOException
> {
>     return createFromRandomAccessImpl(document, new RandomAccessFile(file, "r"), number);
> }
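
The reporter's suggestion (close the RandomAccessFile, for example via AutoCloseable) can be sketched with plain JDK classes. This is only an illustration of the try-with-resources pattern, not the change that was actually committed for PDFBOX-3517; the class `CloseAfterUse` and the method `lengthOf` are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

final class CloseAfterUse {
    /**
     * Reads the file length through a RandomAccessFile. The try-with-resources
     * statement closes the handle on every exit path, so no lock on the file
     * outlives the call.
     */
    static long lengthOf(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            return raf.length();
        }
    }
}
```

On Windows, a file whose handle is still open usually cannot be deleted, so being able to delete the file right after the call is a quick check that the handle was released.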






[jira] [Commented] (PDFBOX-3573) Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

2016-11-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671023#comment-15671023
 ] 

Tilman Hausherr commented on PDFBOX-3573:
-

Could you please attach the PDF file and tell us which -Xmx setting you are 
using (if any)?
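
If the -Xmx value is not known, the effective heap limit can be read from the running JVM. The following is a minimal sketch using only the JDK; the class name `HeapReport` and the method `maxHeap` are hypothetical and not part of PDFBox.

```java
final class HeapReport {
    /** Formats the maximum heap the JVM will attempt to use, in MiB. */
    static String maxHeap() {
        long maxBytes = Runtime.getRuntime().maxMemory();
        return (maxBytes / (1024L * 1024L)) + " MiB";
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeap());
    }
}
```

Running it with the same JVM options as the failing program shows the limit that was in effect when the OutOfMemoryError occurred.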

> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> --
>
> Key: PDFBOX-3573
> URL: https://issues.apache.org/jira/browse/PDFBOX-3573
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing, Utilities
>Affects Versions: 2.0.3
> Environment: Ubuntu 15.10
>Reporter: Dmitri Russu
>
> I am trying to extract images from a PDF file whose pages are images; the 
> total size of the PDF file is about 1 MB.
> The code I run is as follows:
> {code}
>   public static void testExtractImages() throws Exception {
>   File resource = new File("test/t1_edited.pdf");
>   PDDocument document = PDDocument.load(resource);
>   int page = 1;
>   for (final PDPage pdPage : document.getPages())
>   {
>   final int currentPage = page;
>   PDFGraphicsStreamEngine pdfGraphicsStreamEngine = new 
> PDFGraphicsStreamEngine(pdPage)
>   {
>   int index = 0;
>   @Override
>   public void drawImage(PDImage pdImage) throws 
> IOException
>   {
>   if (pdImage instanceof PDImageXObject)
>   {
>   PDImageXObject image = 
> (PDImageXObject)pdImage;
>   File file = new File("test/", 
> String.format("10948-new-engine-%s-%s.%s", currentPage, index, 
> image.getSuffix()));
>   
> ImageIOUtil.writeImage(image.getImage(), image.getSuffix(), new 
> FileOutputStream(file));
>   index++;
>   }
>   }
>   @Override
>   public void appendRectangle(Point2D p0, Point2D 
> p1, Point2D p2, Point2D p3) throws IOException { }
>   @Override
>   public void clip(int windingRule) throws 
> IOException { }
>   @Override
>   public void moveTo(float x, float y) throws 
> IOException {  }
>   @Override
>   public void lineTo(float x, float y) throws 
> IOException { }
>   @Override
>   public void curveTo(float x1, float y1, float 
> x2, float y2, float x3, float y3) throws IOException {  }
>   @Override
>   public Point2D getCurrentPoint() throws 
> IOException { return null; }
>   @Override
>   public void closePath() throws IOException { }
>   @Override
>   public void endPath() throws IOException { }
>   @Override
>   public void strokePath() throws IOException { }
>   @Override
>   public void fillPath(int windingRule) throws 
> IOException { }
>   @Override
>   public void fillAndStrokePath(int windingRule) 
> throws IOException { }
>   @Override
>   public void shadingFill(COSName shadingName) 
> throws IOException { }
>   };
>   pdfGraphicsStreamEngine.processPage(pdPage);
>   page++;
>   }
>   }
> --
> ERROR
> -
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>   at java.util.Arrays.copyOf(Arrays.java:3230)
>   at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>   at 
> java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>   at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>   at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:125)
>   at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:64)
>   at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:69)
>   at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:162)
>   at 
> org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream

[jira] [Closed] (PDFBOX-3574) Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

2016-11-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-3574.
---
Resolution: Duplicate

Duplicate of PDFBOX-3573

> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> --
>
> Key: PDFBOX-3574
> URL: https://issues.apache.org/jira/browse/PDFBOX-3574
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing, Utilities
>Affects Versions: 2.0.3
> Environment: Ubuntu 15.10
>Reporter: Dmitri Russu
>
> I am trying to extract images from a PDF file.
> The code I run is:
> {code}
>   public static void testExtractImages() throws Exception {
>   File resource = new File("test/t1_edited.pdf");
>   PDDocument document = PDDocument.load(resource);
>   int page = 1;
>   for (final PDPage pdPage : document.getPages())
>   {
>   final int currentPage = page;
>   PDFGraphicsStreamEngine pdfGraphicsStreamEngine = new 
> PDFGraphicsStreamEngine(pdPage)
>   {
>   int index = 0;
>   @Override
>   public void drawImage(PDImage pdImage) throws 
> IOException
>   {
>   if (pdImage instanceof PDImageXObject)
>   {
>   PDImageXObject image = 
> (PDImageXObject)pdImage;
>   File file = new File("test/", 
> String.format("10948-new-engine-%s-%s.%s", currentPage, index, 
> image.getSuffix()));
>   
> ImageIOUtil.writeImage(image.getImage(), image.getSuffix(), new 
> FileOutputStream(file));
>   index++;
>   }
>   }
>   @Override
>   public void appendRectangle(Point2D p0, Point2D 
> p1, Point2D p2, Point2D p3) throws IOException { }
>   @Override
>   public void clip(int windingRule) throws 
> IOException { }
>   @Override
>   public void moveTo(float x, float y) throws 
> IOException {  }
>   @Override
>   public void lineTo(float x, float y) throws 
> IOException { }
>   @Override
>   public void curveTo(float x1, float y1, float 
> x2, float y2, float x3, float y3) throws IOException {  }
>   @Override
>   public Point2D getCurrentPoint() throws 
> IOException { return null; }
>   @Override
>   public void closePath() throws IOException { }
>   @Override
>   public void endPath() throws IOException { }
>   @Override
>   public void strokePath() throws IOException { }
>   @Override
>   public void fillPath(int windingRule) throws 
> IOException { }
>   @Override
>   public void fillAndStrokePath(int windingRule) 
> throws IOException { }
>   @Override
>   public void shadingFill(COSName shadingName) 
> throws IOException { }
>   };
>   pdfGraphicsStreamEngine.processPage(pdPage);
>   page++;
>   }
>   }
> {code}
> Error received:
> {code}
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>   at java.util.Arrays.copyOf(Arrays.java:3230)
>   at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>   at 
> java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>   at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>   at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:125)
>   at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:64)
>   at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:69)
>   at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:162)
>   at 
> org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:235)
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:147)
>   at 
> org.apache.pdfbox.pdmodel.graphics.PDXObject.createXObj

[jira] [Updated] (PDFBOX-3573) Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

2016-11-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-3573:

Description: 
I am trying to extract images from a PDF file whose pages are images; the 
total size of the PDF file is about 1 MB.

The code I run is as follows:
{code}
public static void testExtractImages() throws Exception {

    File resource = new File("test/t1_edited.pdf");

    PDDocument document = PDDocument.load(resource);
    int page = 1;
    for (final PDPage pdPage : document.getPages())
    {
        final int currentPage = page;
        PDFGraphicsStreamEngine pdfGraphicsStreamEngine = new PDFGraphicsStreamEngine(pdPage)
        {
            int index = 0;

            @Override
            public void drawImage(PDImage pdImage) throws IOException
            {
                if (pdImage instanceof PDImageXObject)
                {
                    PDImageXObject image = (PDImageXObject)pdImage;
                    File file = new File("test/",
                            String.format("10948-new-engine-%s-%s.%s", currentPage, index, image.getSuffix()));

                    ImageIOUtil.writeImage(image.getImage(), image.getSuffix(), new FileOutputStream(file));
                    index++;
                }
            }

            @Override
            public void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) throws IOException { }

            @Override
            public void clip(int windingRule) throws IOException { }

            @Override
            public void moveTo(float x, float y) throws IOException { }

            @Override
            public void lineTo(float x, float y) throws IOException { }

            @Override
            public void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) throws IOException { }

            @Override
            public Point2D getCurrentPoint() throws IOException { return null; }

            @Override
            public void closePath() throws IOException { }

            @Override
            public void endPath() throws IOException { }

            @Override
            public void strokePath() throws IOException { }

            @Override
            public void fillPath(int windingRule) throws IOException { }

            @Override
            public void fillAndStrokePath(int windingRule) throws IOException { }

            @Override
            public void shadingFill(COSName shadingName) throws IOException { }
        };
        pdfGraphicsStreamEngine.processPage(pdPage);
        page++;
    }
}

--
ERROR
-
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3230)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at 
java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:125)
at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:64)
at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:69)
at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:162)
at 
org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:235)
at 
org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:147)
at 
org.apache.pdfbox.pdmodel.graphics.PDXObject.createXObject(PDXObject.java:70)
at 
org.apache.pdfbox.pdmodel.PDResources.getXObject(PDResources.java:409)
at 
org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:53)
at 
org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:815)
at 
org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:472)
at 
org.apache.pdfbox.cont

[jira] [Updated] (PDFBOX-3574) Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

2016-11-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-3574:

Fix Version/s: (was: 2.0.4)

> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> --
>
> Key: PDFBOX-3574
> URL: https://issues.apache.org/jira/browse/PDFBOX-3574
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing, Utilities
>Affects Versions: 2.0.3
> Environment: Ubuntu 15.10
>Reporter: Dmitri Russu
>
> I am trying to extract images from a PDF file.
> The code I run is:
> {code}
>   public static void testExtractImages() throws Exception {
>   File resource = new File("test/t1_edited.pdf");
>   PDDocument document = PDDocument.load(resource);
>   int page = 1;
>   for (final PDPage pdPage : document.getPages())
>   {
>   final int currentPage = page;
>   PDFGraphicsStreamEngine pdfGraphicsStreamEngine = new 
> PDFGraphicsStreamEngine(pdPage)
>   {
>   int index = 0;
>   @Override
>   public void drawImage(PDImage pdImage) throws 
> IOException
>   {
>   if (pdImage instanceof PDImageXObject)
>   {
>   PDImageXObject image = 
> (PDImageXObject)pdImage;
>   File file = new File("test/", 
> String.format("10948-new-engine-%s-%s.%s", currentPage, index, 
> image.getSuffix()));
>   
> ImageIOUtil.writeImage(image.getImage(), image.getSuffix(), new 
> FileOutputStream(file));
>   index++;
>   }
>   }
>   @Override
>   public void appendRectangle(Point2D p0, Point2D 
> p1, Point2D p2, Point2D p3) throws IOException { }
>   @Override
>   public void clip(int windingRule) throws 
> IOException { }
>   @Override
>   public void moveTo(float x, float y) throws 
> IOException {  }
>   @Override
>   public void lineTo(float x, float y) throws 
> IOException { }
>   @Override
>   public void curveTo(float x1, float y1, float 
> x2, float y2, float x3, float y3) throws IOException {  }
>   @Override
>   public Point2D getCurrentPoint() throws 
> IOException { return null; }
>   @Override
>   public void closePath() throws IOException { }
>   @Override
>   public void endPath() throws IOException { }
>   @Override
>   public void strokePath() throws IOException { }
>   @Override
>   public void fillPath(int windingRule) throws 
> IOException { }
>   @Override
>   public void fillAndStrokePath(int windingRule) 
> throws IOException { }
>   @Override
>   public void shadingFill(COSName shadingName) 
> throws IOException { }
>   };
>   pdfGraphicsStreamEngine.processPage(pdPage);
>   page++;
>   }
>   }
> {code}
> Error received:
> {code}
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>   at java.util.Arrays.copyOf(Arrays.java:3230)
>   at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>   at 
> java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>   at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>   at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:125)
>   at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:64)
>   at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:69)
>   at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:162)
>   at 
> org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:235)
>   at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:147)
>   at 
> org.apache.pdfbox.pdmodel.graphics.PDXObject.createXObject(PDXObject.

[jira] [Updated] (PDFBOX-3574) Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

2016-11-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-3574:

Description: 
I am trying to extract images from a PDF file.

The code I run is:

{code}
public static void testExtractImages() throws Exception {

    File resource = new File("test/t1_edited.pdf");

    PDDocument document = PDDocument.load(resource);
    int page = 1;
    for (final PDPage pdPage : document.getPages())
    {
        final int currentPage = page;
        PDFGraphicsStreamEngine pdfGraphicsStreamEngine = new PDFGraphicsStreamEngine(pdPage)
        {
            int index = 0;

            @Override
            public void drawImage(PDImage pdImage) throws IOException
            {
                if (pdImage instanceof PDImageXObject)
                {
                    PDImageXObject image = (PDImageXObject)pdImage;
                    File file = new File("test/",
                            String.format("10948-new-engine-%s-%s.%s", currentPage, index, image.getSuffix()));

                    ImageIOUtil.writeImage(image.getImage(), image.getSuffix(), new FileOutputStream(file));
                    index++;
                }
            }

            @Override
            public void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) throws IOException { }

            @Override
            public void clip(int windingRule) throws IOException { }

            @Override
            public void moveTo(float x, float y) throws IOException { }

            @Override
            public void lineTo(float x, float y) throws IOException { }

            @Override
            public void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) throws IOException { }

            @Override
            public Point2D getCurrentPoint() throws IOException { return null; }

            @Override
            public void closePath() throws IOException { }

            @Override
            public void endPath() throws IOException { }

            @Override
            public void strokePath() throws IOException { }

            @Override
            public void fillPath(int windingRule) throws IOException { }

            @Override
            public void fillAndStrokePath(int windingRule) throws IOException { }

            @Override
            public void shadingFill(COSName shadingName) throws IOException { }
        };
        pdfGraphicsStreamEngine.processPage(pdPage);
        page++;
    }
}
{code}
Error received:
{code}
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3230)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at 
java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:125)
at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:64)
at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:69)
at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:162)
at 
org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:235)
at 
org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:147)
at 
org.apache.pdfbox.pdmodel.graphics.PDXObject.createXObject(PDXObject.java:70)
at 
org.apache.pdfbox.pdmodel.PDResources.getXObject(PDResources.java:409)
at 
org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:53)
at 
org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:815)
at 
org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:472)
at 
org.apache.pdfbox.contentstream.PDFStreamEngine.processTransparencyGroup(PDFStreamEngine.java:213)

[jira] [Comment Edited] (PDFBOX-3000) Transparency Group issues

2016-11-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670975#comment-15670975
 ] 

Tilman Hausherr edited comment on PDFBOX-3000 at 11/16/16 5:08 PM:
---

The last commit solves the regression for the file from PDFBOX-2182. The file I 
just added (PDFBOX-2182_mod.pdf) shows the problem more clearly: the shading 
pattern is more visible, once as an annotation and once in the page. Both must 
look identical.


was (Author: tilman):
The last commit solves PDFBOX-2182. The file I just added (PDFBOX-2182_mod.pdf) 
shows the problem more clearly: the shading pattern is more visible, once as an 
annotation and once in the page. Both must look identical.

> Transparency Group issues
> -
>
> Key: PDFBOX-3000
> URL: https://issues.apache.org/jira/browse/PDFBOX-3000
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.0
>Reporter: John Hewson
> Fix For: 2.1.0
>
> Attachments: 007087-payment-due-p58_reduced2.pdf, 
> PDFBOX-1697-reduced-rotations.pdf, PDFBOX-2182_mod.pdf, PDFBOX-3400-RGB.pdf, 
> PDFBOX-3494_reduced.pdf, PDFBOX-3564-Mask.pdf, 
> PDFBox3359PanelTestEnhanced.java, PDFJS-2845-p1.pdf, 
> PDFJS-5811-2-p4_reduced-rotations.pdf, PDFJS-5811-2.pdf, 
> PDFJS-5853_reduced.pdf, gs-bugzilla691157.pdf, gs-bugzilla691157_mod_unc.pdf, 
> gs-bugzilla691157_mod_unc.png, gs-bugzilla691348.pdf, 
> gs-bugzilla693322_reduced.pdf, gs-bugzilla694556-3.pdf, 
> gs-bugzilla695354.pdf, gs-bugzilla695582-transparency-fill-stroke.pdf, 
> gs-bugzilla695582-transparency-fill-stroke.pdf-1.png, 
> samsung_galaxy_s_4_um-p1_reduced.pdf, softmask-rewrite-alt1.patch, 
> softmask-rewrite.patch
>
>
> This is a follow-up issue for transparency group issues from PDFBOX-2423. 
> More details to come.






[jira] [Comment Edited] (PDFBOX-3000) Transparency Group issues

2016-11-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670975#comment-15670975
 ] 

Tilman Hausherr edited comment on PDFBOX-3000 at 11/16/16 5:08 PM:
---

The last commit solves PDFBOX-2182. The file I just added (PDFBOX-2182_mod.pdf) 
shows the problem more clearly: the shading pattern is more visible, once as an 
annotation and once in the page. Both must look identical.


was (Author: tilman):
The last commit solves PDFBOX-2182. The file I just added (PDFBOX-2182_mod.pdf) 
shows the problem more clearly: the shading pattern, is more visible, once as 
an annotation and once in the page. Both must look identical.

> Transparency Group issues
> -
>
> Key: PDFBOX-3000
> URL: https://issues.apache.org/jira/browse/PDFBOX-3000
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.0
>Reporter: John Hewson
> Fix For: 2.1.0
>
> Attachments: 007087-payment-due-p58_reduced2.pdf, 
> PDFBOX-1697-reduced-rotations.pdf, PDFBOX-2182_mod.pdf, PDFBOX-3400-RGB.pdf, 
> PDFBOX-3494_reduced.pdf, PDFBOX-3564-Mask.pdf, 
> PDFBox3359PanelTestEnhanced.java, PDFJS-2845-p1.pdf, 
> PDFJS-5811-2-p4_reduced-rotations.pdf, PDFJS-5811-2.pdf, 
> PDFJS-5853_reduced.pdf, gs-bugzilla691157.pdf, gs-bugzilla691157_mod_unc.pdf, 
> gs-bugzilla691157_mod_unc.png, gs-bugzilla691348.pdf, 
> gs-bugzilla693322_reduced.pdf, gs-bugzilla694556-3.pdf, 
> gs-bugzilla695354.pdf, gs-bugzilla695582-transparency-fill-stroke.pdf, 
> gs-bugzilla695582-transparency-fill-stroke.pdf-1.png, 
> samsung_galaxy_s_4_um-p1_reduced.pdf, softmask-rewrite-alt1.patch, 
> softmask-rewrite.patch
>
>
> This is a follow-up issue for transparency group issues from PDFBOX-2423. 
> More details to come.






[jira] [Commented] (PDFBOX-3000) Transparency Group issues

2016-11-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670975#comment-15670975
 ] 

Tilman Hausherr commented on PDFBOX-3000:
-

The last commit solves PDFBOX-2182. The file I just added (PDFBOX-2182_mod.pdf) 
shows the problem more clearly: the shading pattern is more visible, once as 
an annotation and once in the page. Both must look identical.

> Transparency Group issues
> -
>
> Key: PDFBOX-3000
> URL: https://issues.apache.org/jira/browse/PDFBOX-3000
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.0
>Reporter: John Hewson
> Fix For: 2.1.0
>
> Attachments: 007087-payment-due-p58_reduced2.pdf, 
> PDFBOX-1697-reduced-rotations.pdf, PDFBOX-2182_mod.pdf, PDFBOX-3400-RGB.pdf, 
> PDFBOX-3494_reduced.pdf, PDFBOX-3564-Mask.pdf, 
> PDFBox3359PanelTestEnhanced.java, PDFJS-2845-p1.pdf, 
> PDFJS-5811-2-p4_reduced-rotations.pdf, PDFJS-5811-2.pdf, 
> PDFJS-5853_reduced.pdf, gs-bugzilla691157.pdf, gs-bugzilla691157_mod_unc.pdf, 
> gs-bugzilla691157_mod_unc.png, gs-bugzilla691348.pdf, 
> gs-bugzilla693322_reduced.pdf, gs-bugzilla694556-3.pdf, 
> gs-bugzilla695354.pdf, gs-bugzilla695582-transparency-fill-stroke.pdf, 
> gs-bugzilla695582-transparency-fill-stroke.pdf-1.png, 
> samsung_galaxy_s_4_um-p1_reduced.pdf, softmask-rewrite-alt1.patch, 
> softmask-rewrite.patch
>
>
> This is a follow-up issue for transparency group issues from PDFBOX-2423. 
> More details to come.






[jira] [Updated] (PDFBOX-3000) Transparency Group issues

2016-11-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-3000:

Attachment: PDFBOX-2182_mod.pdf

> Transparency Group issues
> -
>
> Key: PDFBOX-3000
> URL: https://issues.apache.org/jira/browse/PDFBOX-3000
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.0
>Reporter: John Hewson
> Fix For: 2.1.0
>
> Attachments: 007087-payment-due-p58_reduced2.pdf, 
> PDFBOX-1697-reduced-rotations.pdf, PDFBOX-2182_mod.pdf, PDFBOX-3400-RGB.pdf, 
> PDFBOX-3494_reduced.pdf, PDFBOX-3564-Mask.pdf, 
> PDFBox3359PanelTestEnhanced.java, PDFJS-2845-p1.pdf, 
> PDFJS-5811-2-p4_reduced-rotations.pdf, PDFJS-5811-2.pdf, 
> PDFJS-5853_reduced.pdf, gs-bugzilla691157.pdf, gs-bugzilla691157_mod_unc.pdf, 
> gs-bugzilla691157_mod_unc.png, gs-bugzilla691348.pdf, 
> gs-bugzilla693322_reduced.pdf, gs-bugzilla694556-3.pdf, 
> gs-bugzilla695354.pdf, gs-bugzilla695582-transparency-fill-stroke.pdf, 
> gs-bugzilla695582-transparency-fill-stroke.pdf-1.png, 
> samsung_galaxy_s_4_um-p1_reduced.pdf, softmask-rewrite-alt1.patch, 
> softmask-rewrite.patch
>
>
> This is a follow-up issue for transparency group issues from PDFBOX-2423. 
> More details to come.






[jira] [Commented] (PDFBOX-3000) Transparency Group issues

2016-11-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670967#comment-15670967
 ] 

ASF subversion and git services commented on PDFBOX-3000:
-

Commit 1770016 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1770016 ]

PDFBOX-3000: set initialMatrix for annotations

> Transparency Group issues
> -
>
> Key: PDFBOX-3000
> URL: https://issues.apache.org/jira/browse/PDFBOX-3000
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.0
>Reporter: John Hewson
> Fix For: 2.1.0
>
> Attachments: 007087-payment-due-p58_reduced2.pdf, 
> PDFBOX-1697-reduced-rotations.pdf, PDFBOX-3400-RGB.pdf, 
> PDFBOX-3494_reduced.pdf, PDFBOX-3564-Mask.pdf, 
> PDFBox3359PanelTestEnhanced.java, PDFJS-2845-p1.pdf, 
> PDFJS-5811-2-p4_reduced-rotations.pdf, PDFJS-5811-2.pdf, 
> PDFJS-5853_reduced.pdf, gs-bugzilla691157.pdf, gs-bugzilla691157_mod_unc.pdf, 
> gs-bugzilla691157_mod_unc.png, gs-bugzilla691348.pdf, 
> gs-bugzilla693322_reduced.pdf, gs-bugzilla694556-3.pdf, 
> gs-bugzilla695354.pdf, gs-bugzilla695582-transparency-fill-stroke.pdf, 
> gs-bugzilla695582-transparency-fill-stroke.pdf-1.png, 
> samsung_galaxy_s_4_um-p1_reduced.pdf, softmask-rewrite-alt1.patch, 
> softmask-rewrite.patch
>
>
> This is a follow-up issue for transparency group issues from PDFBOX-2423. 
> More details to come.






[jira] [Commented] (PDFBOX-3000) Transparency Group issues

2016-11-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670968#comment-15670968
 ] 

ASF subversion and git services commented on PDFBOX-3000:
-

Commit 1770017 from [~tilman] in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1770017 ]

PDFBOX-3000: set initialMatrix for annotations

> Transparency Group issues
> -
>
> Key: PDFBOX-3000
> URL: https://issues.apache.org/jira/browse/PDFBOX-3000
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.0
>Reporter: John Hewson
> Fix For: 2.1.0
>



[jira] [Updated] (PDFBOX-3568) Characters widths and x-positions incorrect

2016-11-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-3568:

Affects Version/s: 1.8.12

> Characters widths and x-positions incorrect
> ---
>
> Key: PDFBOX-3568
> URL: https://issues.apache.org/jira/browse/PDFBOX-3568
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.12
>Reporter: Roman
> Attachments: screenshot-1.png, test_width1.pdf, test_width11.png
>
>
> Using PdfBox 1.8.12 we are extracting character coordinates from 
> [^test_width1.pdf], as described in PDFBOX-3464.
> We have two issues here:
> - wrong height; this issue was already addressed by PDFBOX-3464
> - wrong width and X positions of characters, which we get from PdfBox 
> via the code:
> {code} 
>    position.getX();
>    position.getWidth();
> {code}
> The coordinates returned by PdfBox are shown in blue on 
> [^screenshot-1.png].
> This code works fine on most PDFs. Something is wrong with this specific PDF. 






[jira] [Commented] (PDFBOX-3568) Characters widths and x-positions incorrect

2016-11-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670960#comment-15670960
 ] 

Tilman Hausherr commented on PDFBOX-3568:
-

I've changed my earlier post so that it contains getX() and getWidth(). They 
are the same as the values that are already shown by PrintTextLocations.

I'm not sure you understood my question. By "rendering", I meant: how did 
you get the glyphs in the image? By doing something yourself, with Adobe 
Reader, or with PDFBox? Because when I use PDFBox, the positions are correct. 
I have attached a rendering made with 1.8.13.

You can verify that the second "t" of "test" really starts at 200.

I suspect you are comparing the coordinates with a different rendering, i.e. 
one not done by PDFBox.

To get correct coordinates, use the current version, 2.0.3.

> Characters widths and x-positions incorrect
> ---
>
> Key: PDFBOX-3568
> URL: https://issues.apache.org/jira/browse/PDFBOX-3568
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.12
>Reporter: Roman
> Attachments: screenshot-1.png, test_width1.pdf, test_width11.png
>
>



[jira] [Updated] (PDFBOX-3568) Characters widths and x-positions incorrect

2016-11-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-3568:

Attachment: test_width11.png

> Characters widths and x-positions incorrect
> ---
>
> Key: PDFBOX-3568
> URL: https://issues.apache.org/jira/browse/PDFBOX-3568
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Roman
> Attachments: screenshot-1.png, test_width1.pdf, test_width11.png
>
>



[jira] [Commented] (PDFBOX-3570) JDK-8054565 Java 8 close contract issue

2016-11-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670955#comment-15670955
 ] 

Tilman Hausherr commented on PDFBOX-3570:
-

That was a mistake; it's tagged now. I don't know when the next 1.8 release 
will be; we skipped it last time, when we released version 2.0.3.

> JDK-8054565 Java 8 close contract issue
> ---
>
> Key: PDFBOX-3570
> URL: https://issues.apache.org/jira/browse/PDFBOX-3570
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.10, 1.8.11, 1.8.12
>Reporter: Caleb Cushing
>Assignee: Tilman Hausherr
> Fix For: 1.8.13, 2.0.4, 2.1.0
>
>
> Java 8 bug uncovered, and wondering if the PDFBox team would be willing to
> work around it?  You should probably reply with an emphatic "no", but I
> figure it is worth a shot.
> Here is the openjdk bug:  https://bugs.openjdk.java.net/browse/JDK-8054565
> PDDocument.saveIncremental(OutputStream) calls close() twice - once in
> try{} and once in finally{}, relying on the Closeable contract which says it
> will do nothing if the stream is already closed.
> But, we see this:
> {code}
> Caused by: java.io.IOException: Closed LOB
> at oracle.jdbc.driver.DatabaseError.SQLToIOException(DatabaseError.java:519)
> at oracle.jdbc.driver.OracleBlobOutputStream.ensureOpen(OracleBlobOutputStream.java:231)
> at oracle.jdbc.driver.OracleBlobOutputStream.flush(OracleBlobOutputStream.java:167)
> at java.io.FilterOutputStream.flush(FilterOutputStream.java:140)
> at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
> at org.apache.pdfbox.pdfwriter.COSWriter.close(COSWriter.java:300)
> at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1366)
> at ourpackage.util.pdf.PdfDRM.applyDRM(PdfDRM.java:225)
> at ourpackage.db.liquibase.customchanges.IQMDxContentLoader.applyDrmToCesPdfDocumentInDatabase(IQMDxContentLoader.java:383)
> at ourpackage.db.liquibase.customchanges.IQMDxContentLoader.applyZippedCesDocumentChangesToDatabase(IQMDxContentLoader.java:265)
> at ourpackage.db.liquibase.customchanges.IQMDxContentLoader.generateStatements(IQMDxContentLoader.java:153)
> ... 33 more
> {code}
> This happens because Java 8's FilterOutputStream.close() calls flush() on the
> second close, and Oracle's driver code doesn't like that.
> The bug can be worked around by implementing close() in
> COSStandardOutputStream as below:
> {code}
> private boolean closed;
> @Override
> public void close() throws IOException
> {
>     try (OutputStream ostream = out)
>     {
>         if (!closed)
>             flush();
>     }
>     closed = true;
> }
> {code}
> I've done this in our project code base, by cloning and owning
> COSStandardOutputStream and adding it to our classpath first.  Not ideal.
> Also, mailing list thread on openjdk that recognizes the bug
> http://marc.info/?t=14176740874&r=1&w=2.  Although, it is fixed in Java
> 9 with no plan of backporting.  Not sure how to request a backport, but
> that would be the ideal solution.
> original mailing list report 
> http://asfmail.lucidworks.io/mail_files/pdfbox-users/201509.mbox/%3ccalrfkrtvyzc1y7cfxg8x17kkmd+7byyxgxsr0umqxz_mvd0...@mail.gmail.com%3E
> (note: listed versions I know this affects)
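
The double-close failure described in this issue can be reproduced and guarded against in isolation. What follows is only a minimal sketch, not the change that was committed to PDFBox: the names CloseOnceOutputStream, StrictStream and closeTwice are made up for illustration, and StrictStream merely imitates a stream that, like the Oracle BLOB stream in the stack trace, rejects flush() once it has been closed.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical wrapper: makes close() idempotent so a second call never
// reaches flush() or close() on the underlying stream.
public class CloseOnceOutputStream extends FilterOutputStream {
    private boolean closed;

    public CloseOnceOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return; // second and later calls are no-ops
        }
        closed = true;
        try {
            flush();
        } finally {
            out.close(); // always release the underlying stream once
        }
    }

    // Stand-in for a strict stream (assumption: behaves like the Oracle
    // stream in the report, i.e. flush() after close() throws).
    public static class StrictStream extends OutputStream {
        public int closeCalls;
        private boolean closed;

        @Override
        public void write(int b) throws IOException {
            if (closed) throw new IOException("Closed LOB");
        }

        @Override
        public void flush() throws IOException {
            if (closed) throw new IOException("Closed LOB");
        }

        @Override
        public void close() {
            closed = true;
            closeCalls++;
        }
    }

    // Closes the wrapper twice and reports how often the target was closed.
    public static int closeTwice() {
        StrictStream target = new StrictStream();
        CloseOnceOutputStream wrapper = new CloseOnceOutputStream(target);
        try {
            wrapper.write(42);
            wrapper.close();
            wrapper.close(); // would throw "Closed LOB" without the guard on a JDK that re-flushes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target.closeCalls;
    }

    public static void main(String[] args) {
        System.out.println("underlying close() calls: " + closeTwice());
    }
}
```

Unlike the workaround quoted above, this variant also skips the second close() of the underlying stream, so it does not depend on that stream honouring the Closeable contract either.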






[jira] [Comment Edited] (PDFBOX-3568) Characters widths and x-positions incorrect

2016-11-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668076#comment-15668076
 ] 

Tilman Hausherr edited comment on PDFBOX-3568 at 11/16/16 4:49 PM:
---

Output of PrintTextLocations:

1.8.13:
String[181.92,83.52002 fs=12.0 xscale=12.0 height=6.984375 space=3.8160002 width=4.7039948]t - getX(): 181.92, getwidth(): 4.7039948
String[186.5909,83.52002 fs=12.0 xscale=12.0 height=6.984375 space=3.8160002 width=7.380005]e - getX(): 186.5909, getwidth(): 7.380005
String[193.93779,83.52002 fs=12.0 xscale=12.0 height=6.984375 space=3.8160002 width=6.251999]s - getX(): 193.93779, getwidth(): 6.251999
String[200.1567,83.52002 fs=12.0 xscale=12.0 height=6.984375 space=3.8160002 width=4.7039948]t - getX(): 200.1567, getwidth(): 4.7039948

2.1:
String[181.92,83.52002 fs=12.0 xscale=12.0 height=6.918 space=3.0 width=3.9960022]t
String[185.8829,83.52002 fs=12.0 xscale=12.0 height=6.918 space=3.0 width=5.328003]e
String[191.17781,83.52002 fs=12.0 xscale=12.0 height=6.918 space=3.0 width=4.6679993]s
String[195.81271,83.52002 fs=12.0 xscale=12.0 height=6.918 space=3.0 width=3.9960022]t

The results for 1.8.13 are incorrect, but then, much font-related stuff in 
1.8.* is incorrect.


was (Author: tilman):
Output of PrintTextLocations:

1.8.13:
String[181.92,83.52002 fs=12.0 xscale=12.0 height=6.984375 space=3.8160002 width=4.7039948]t
String[186.5909,83.52002 fs=12.0 xscale=12.0 height=6.984375 space=3.8160002 width=7.380005]e
String[193.93779,83.52002 fs=12.0 xscale=12.0 height=6.984375 space=3.8160002 width=6.251999]s
String[200.1567,83.52002 fs=12.0 xscale=12.0 height=6.984375 space=3.8160002 width=4.7039948]t

2.1:
String[181.92,83.52002 fs=12.0 xscale=12.0 height=6.918 space=3.0 width=3.9960022]t
String[185.8829,83.52002 fs=12.0 xscale=12.0 height=6.918 space=3.0 width=5.328003]e
String[191.17781,83.52002 fs=12.0 xscale=12.0 height=6.918 space=3.0 width=4.6679993]s
String[195.81271,83.52002 fs=12.0 xscale=12.0 height=6.918 space=3.0 width=3.9960022]t

The results for 1.8.13 are incorrect, but then, much font-related stuff in 
1.8.* is incorrect.
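
The effect of the differing widths can be made concrete with a little arithmetic on the numbers quoted in this comment. The following is a rough sketch only: the class name GlyphDrift is made up for illustration, and the two arrays are copied by hand from the 1.8.13 and 2.1 listings for the glyphs "t", "e", "s", "t". It prints, per glyph, how far the 1.8.13 x-position lies to the right of the 2.1 one.

```java
import java.util.Locale;

public class GlyphDrift {
    // x-positions of "t", "e", "s", "t" as reported by PrintTextLocations
    public static final double[] X_1_8_13 = {181.92, 186.5909, 193.93779, 200.1567};
    public static final double[] X_2_1 = {181.92, 185.8829, 191.17781, 195.81271};

    // Per-glyph difference: older x-position minus newer x-position.
    public static double[] drift(double[] older, double[] newer) {
        double[] d = new double[older.length];
        for (int i = 0; i < older.length; i++) {
            d[i] = older[i] - newer[i];
        }
        return d;
    }

    public static void main(String[] args) {
        String text = "test";
        double[] d = drift(X_1_8_13, X_2_1);
        for (int i = 0; i < d.length; i++) {
            System.out.printf(Locale.ROOT, "%c: 1.8.13=%.5f  2.1=%.5f  drift=%.5f%n",
                    text.charAt(i), X_1_8_13[i], X_2_1[i], d[i]);
        }
    }
}
```

The difference grows from glyph to glyph because each 1.8.13 width is larger than its 2.1 counterpart, so by the second "t" the two versions disagree by roughly 4.34 units (200.1567 versus 195.81271).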

> Characters widths and x-positions incorrect
> ---
>
> Key: PDFBOX-3568
> URL: https://issues.apache.org/jira/browse/PDFBOX-3568
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Roman
> Attachments: screenshot-1.png, test_width1.pdf
>
>



[jira] [Resolved] (PDFBOX-3570) JDK-8054565 Java 8 close contract issue

2016-11-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-3570.
-
   Resolution: Fixed
 Assignee: Tilman Hausherr
Fix Version/s: 2.1.0
   2.0.4

> JDK-8054565 Java 8 close contract issue
> ---
>
> Key: PDFBOX-3570
> URL: https://issues.apache.org/jira/browse/PDFBOX-3570
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.10, 1.8.11, 1.8.12
>Reporter: Caleb Cushing
>Assignee: Tilman Hausherr
> Fix For: 2.0.4, 2.1.0
>
>



[jira] [Updated] (PDFBOX-3570) JDK-8054565 Java 8 close contract issue

2016-11-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-3570:

Fix Version/s: 1.8.13

> JDK-8054565 Java 8 close contract issue
> ---
>
> Key: PDFBOX-3570
> URL: https://issues.apache.org/jira/browse/PDFBOX-3570
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.10, 1.8.11, 1.8.12
>Reporter: Caleb Cushing
>Assignee: Tilman Hausherr
> Fix For: 1.8.13, 2.0.4, 2.1.0
>
>



[jira] [Commented] (PDFBOX-3570) JDK-8054565 Java 8 close contract issue

2016-11-16 Thread Caleb Cushing (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670782#comment-15670782
 ] 

Caleb Cushing commented on PDFBOX-3570:
---

I noticed you didn't tag it, so I'm asking: will there be a 1.8.13 with this 
fix (since you also committed to that branch)? I'm not intimately familiar 
with PDFBox's release cycle.

> JDK-8054565 Java 8 close contract issue
> ---
>
> Key: PDFBOX-3570
> URL: https://issues.apache.org/jira/browse/PDFBOX-3570
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.10, 1.8.11, 1.8.12
>Reporter: Caleb Cushing
>Assignee: Tilman Hausherr
> Fix For: 1.8.13, 2.0.4, 2.1.0
>
>



[jira] [Comment Edited] (PDFBOX-3570) JDK-8054565 Java 8 close contract issue

2016-11-16 Thread Caleb Cushing (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670687#comment-15670687
 ] 

Caleb Cushing edited comment on PDFBOX-3570 at 11/16/16 3:20 PM:
-

tested, it does appear to work now without any workarounds/special exception 
handling. Thanks


was (Author: xenoterracide):
tested, it does appear to work now without any workarounds/special exception 
handling

> JDK-8054565 Java 8 close contract issue
> ---
>
> Key: PDFBOX-3570
> URL: https://issues.apache.org/jira/browse/PDFBOX-3570
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.10, 1.8.11, 1.8.12
>Reporter: Caleb Cushing
>



[jira] [Commented] (PDFBOX-3570) JDK-8054565 Java 8 close contract issue

2016-11-16 Thread Caleb Cushing (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670687#comment-15670687
 ] 

Caleb Cushing commented on PDFBOX-3570:
---

tested, it does appear to work now without any workarounds/special exception 
handling

> JDK-8054565 Java 8 close contract issue
> ---
>
> Key: PDFBOX-3570
> URL: https://issues.apache.org/jira/browse/PDFBOX-3570
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.10, 1.8.11, 1.8.12
>Reporter: Caleb Cushing
>



[jira] [Comment Edited] (PDFBOX-3568) Characters widths and x-positions incorrect

2016-11-16 Thread Roman (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670562#comment-15670562
 ] 

Roman edited comment on PDFBOX-3568 at 11/16/16 2:39 PM:
-

The screenshot was made by rendering based on coordinates extracted by PdfBox 
(as described in PDFBOX-3464). We use it in our tool, which runs in the browser.


was (Author: rmakarov):
Screenshot is done via rendering based on coordinates extracted by PdfBox (as 
described in PDFBOX-3464). 

> Characters widths and x-positions incorrect
> ---
>
> Key: PDFBOX-3568
> URL: https://issues.apache.org/jira/browse/PDFBOX-3568
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Roman
> Attachments: screenshot-1.png, test_width1.pdf
>
>
> Using PdfBox 1.8.12 we are extracting character coordinates from 
> [^test_width1.pdf], as described in PDFBOX-3464.
> We have two issues here:
> - wrong height; this issue was already addressed by PDFBOX-3464
> - wrong width and X positions of characters, which we are getting from PdfBox 
> via the code:
> {code} 
> position.getX();
> position.getWidth();
> {code}
> The coordinates returned by PdfBox are shown in blue on [^screenshot-1.png].
> This code works fine on most PDFs. Something is wrong with this specific PDF. 






[jira] [Commented] (PDFBOX-3568) Characters widths and x-positions incorrect

2016-11-16 Thread Roman (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670562#comment-15670562
 ] 

Roman commented on PDFBOX-3568:
---

The screenshot was produced by rendering the coordinates extracted by PdfBox (as 
described in PDFBOX-3464). 

> Characters widths and x-positions incorrect
> ---
>
> Key: PDFBOX-3568
> URL: https://issues.apache.org/jira/browse/PDFBOX-3568
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Roman
> Attachments: screenshot-1.png, test_width1.pdf
>
>
> Using PdfBox 1.8.12 we are extracting character coordinates from 
> [^test_width1.pdf], as described in PDFBOX-3464.
> We have two issues here:
> - wrong height; this issue was already addressed by PDFBOX-3464
> - wrong width and X positions of characters, which we are getting from PdfBox 
> via the code:
> {code} 
> position.getX();
> position.getWidth();
> {code}
> The coordinates returned by PdfBox are shown in blue on [^screenshot-1.png].
> This code works fine on most PDFs. Something is wrong with this specific PDF. 






[jira] [Commented] (PDFBOX-3571) sRGB Color Space Profile is subject to 3rd party copyright

2016-11-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670183#comment-15670183
 ] 

Andreas Lehmkühler commented on PDFBOX-3571:


How about using the [sRGB 
profile|https://www.freedesktop.org/wiki/OpenIcc/ProfilePackages/] provided by 
the [OpenICC project|https://www.freedesktop.org/wiki/OpenIcc/]?

> sRGB Color Space Profile is subject to 3rd party copyright
> --
>
> Key: PDFBOX-3571
> URL: https://issues.apache.org/jira/browse/PDFBOX-3571
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.3
>Reporter: gil cattaneo
>
> Hi
> The file /examples/src/main/resources/org/apache/pdfbox/resources/pdfa/sRGB 
> Color Space Profile.icm
> The license: 
> /examples/src/main/resources/org/apache/pdfbox/resources/pdfa/sRGB Color 
> Space Profile.icm.LICENSE.txt
> contains the following:
>  "...permission to use, copy and distribute this file for any purpose is
>  hereby granted without fee, provided that the file is not changed
>  including the HP copyright notice tag, ... "
> The license says: "provided that the file is not changed"
> It does not respect the criteria "The license must meet the Open Source 
> Definition."
> The OSD [1] says:
> "3. Derived Works
> The license must allow modifications"
> [1] http://www.opensource.org/osd.html
> http://www.apache.org/legal/resolved.html#no-modification






[jira] [Comment Edited] (PDFBOX-3571) sRGB Color Space Profile is subject to 3rd party copyright

2016-11-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670183#comment-15670183
 ] 

Andreas Lehmkühler edited comment on PDFBOX-3571 at 11/16/16 11:15 AM:
---

How about using the [sRGB 
profile|https://www.freedesktop.org/wiki/OpenIcc/ProfilePackages/] provided by 
the [OpenICC project|https://www.freedesktop.org/wiki/OpenIcc/]? They are 
zlib/libpng licensed


was (Author: lehmi):
How about using the [sRGB 
profile|https://www.freedesktop.org/wiki/OpenIcc/ProfilePackages/] provided by 
the [OpenICC project|https://www.freedesktop.org/wiki/OpenIcc/]?

> sRGB Color Space Profile is subject to 3rd party copyright
> --
>
> Key: PDFBOX-3571
> URL: https://issues.apache.org/jira/browse/PDFBOX-3571
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.3
>Reporter: gil cattaneo
>
> Hi
> The file /examples/src/main/resources/org/apache/pdfbox/resources/pdfa/sRGB 
> Color Space Profile.icm
> The license: 
> /examples/src/main/resources/org/apache/pdfbox/resources/pdfa/sRGB Color 
> Space Profile.icm.LICENSE.txt
> contains the following:
>  "...permission to use, copy and distribute this file for any purpose is
>  hereby granted without fee, provided that the file is not changed
>  including the HP copyright notice tag, ... "
> The license says: "provided that the file is not changed"
> It does not respect the criteria "The license must meet the Open Source 
> Definition."
> The OSD [1] says:
> "3. Derived Works
> The license must allow modifications"
> [1] http://www.opensource.org/osd.html
> http://www.apache.org/legal/resolved.html#no-modification






[jira] [Commented] (PDFBOX-3569) Performance regression in PDColorSpace#toRGBImageAWT

2016-11-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670152#comment-15670152
 ] 

Andreas Lehmkühler commented on PDFBOX-3569:


I'm going to make it configurable. 

Any ideas for a name to be used as parameter?

- org.apache.pdfbox.rendering.UsePureJavaCMYKConversion
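
A switch like the one proposed above would typically be read as a JVM system property. The following is only a sketch of that idea: the class `RenderingConfig` and its method are hypothetical names, and the property name is simply the one suggested in this comment; nothing here confirms what PDFBox finally adopted.

```java
// Hypothetical helper, not part of PDFBox: reads the switch proposed in the comment above.
final class RenderingConfig {
    // Property name as proposed in the comment; its final name in PDFBox is an assumption.
    static final String PURE_JAVA_CMYK =
            "org.apache.pdfbox.rendering.UsePureJavaCMYKConversion";

    private RenderingConfig() { }

    /** Returns true only if the system property is set to "true" (case-insensitive). */
    static boolean usePureJavaCmykConversion() {
        return Boolean.getBoolean(PURE_JAVA_CMYK);
    }

    public static void main(String[] args) {
        // Prints the current state of the switch for this JVM.
        System.out.println(usePureJavaCmykConversion());
    }
}
```

With this approach the switch would be enabled on the command line, for example with `java -Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=true ...`, and it stays off when the property is absent.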

> Performance regression in PDColorSpace#toRGBImageAWT
> 
>
> Key: PDFBOX-3569
> URL: https://issues.apache.org/jira/browse/PDFBOX-3569
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.3, 2.1.0
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
> Fix For: 2.0.4, 2.1.0
>
> Attachments: PDFBOX-1058.pdf-3.png, PDFBOX-1058.pdf-3.png, 
> PDFBOX-1058.pdf-3.png-diff.png, PDFBOX-2700-JCS_YCCK.pdf-1.png, 
> PDFBOX-2700-JCS_YCCK.pdf-1.png, PDFBOX-2700-JCS_YCCK.pdf-1.png-diff.png, 
> PDFBOX-3569-patch_v2.txt
>
>
> I have a private PDF containing 1900 tiny inline images (CMYK, 8-bit) which 
> renders far too slowly. Again the CMYK2RGB conversion is the culprit here, BUT 
> the known issue with the KCMS/LCMS change isn't the main problem here.
> I ran some tests on Linux (PDFToImage time -imageType png -resolution 150):
> 1.6.0_37: 355s
> 1.7.0_25: 289s
> 1.7.0_75: 298s
> 1.8.0_101: cancelled after 15 min






[jira] [Commented] (PDFBOX-3569) Performance regression in PDColorSpace#toRGBImageAWT

2016-11-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670145#comment-15670145
 ] 

Andreas Lehmkühler commented on PDFBOX-3569:


The classes in sun.java2d.cmm don't provide such information.

> Performance regression in PDColorSpace#toRGBImageAWT
> 
>
> Key: PDFBOX-3569
> URL: https://issues.apache.org/jira/browse/PDFBOX-3569
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.3, 2.1.0
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
> Fix For: 2.0.4, 2.1.0
>
> Attachments: PDFBOX-1058.pdf-3.png, PDFBOX-1058.pdf-3.png, 
> PDFBOX-1058.pdf-3.png-diff.png, PDFBOX-2700-JCS_YCCK.pdf-1.png, 
> PDFBOX-2700-JCS_YCCK.pdf-1.png, PDFBOX-2700-JCS_YCCK.pdf-1.png-diff.png, 
> PDFBOX-3569-patch_v2.txt
>
>
> I have a private PDF containing 1900 tiny inline images (CMYK, 8-bit) which 
> renders far too slowly. Again the CMYK2RGB conversion is the culprit here, BUT 
> the known issue with the KCMS/LCMS change isn't the main problem here.
> I ran some tests on Linux (PDFToImage time -imageType png -resolution 150):
> 1.6.0_37: 355s
> 1.7.0_25: 289s
> 1.7.0_75: 298s
> 1.8.0_101: cancelled after 15 min






[jira] [Updated] (PDFBOX-3574) Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

2016-11-16 Thread Dmitri Russu (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitri Russu updated PDFBOX-3574:
-
Description: 
I am trying to extract images from a PDF file.

The code I run is:


public static void testExtractImages() throws Exception {

    File resource = new File("test/t1_edited.pdf");

    PDDocument document = PDDocument.load(resource);
    int page = 1;
    for (final PDPage pdPage : document.getPages())
    {
        final int currentPage = page;
        PDFGraphicsStreamEngine pdfGraphicsStreamEngine = new PDFGraphicsStreamEngine(pdPage)
        {
            int index = 0;

            @Override
            public void drawImage(PDImage pdImage) throws IOException
            {
                if (pdImage instanceof PDImageXObject)
                {
                    PDImageXObject image = (PDImageXObject) pdImage;
                    File file = new File("test/",
                            String.format("10948-new-engine-%s-%s.%s", currentPage, index, image.getSuffix()));

                    ImageIOUtil.writeImage(image.getImage(), image.getSuffix(), new FileOutputStream(file));
                    index++;
                }
            }

            @Override
            public void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) throws IOException { }

            @Override
            public void clip(int windingRule) throws IOException { }

            @Override
            public void moveTo(float x, float y) throws IOException { }

            @Override
            public void lineTo(float x, float y) throws IOException { }

            @Override
            public void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) throws IOException { }

            @Override
            public Point2D getCurrentPoint() throws IOException { return null; }

            @Override
            public void closePath() throws IOException { }

            @Override
            public void endPath() throws IOException { }

            @Override
            public void strokePath() throws IOException { }

            @Override
            public void fillPath(int windingRule) throws IOException { }

            @Override
            public void fillAndStrokePath(int windingRule) throws IOException { }

            @Override
            public void shadingFill(COSName shadingName) throws IOException { }
        };
        pdfGraphicsStreamEngine.processPage(pdPage);
        page++;
    }
}

Error received:


Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3230)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at 
java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:125)
at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:64)
at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:69)
at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:162)
at 
org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:235)
at 
org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:147)
at 
org.apache.pdfbox.pdmodel.graphics.PDXObject.createXObject(PDXObject.java:70)
at 
org.apache.pdfbox.pdmodel.PDResources.getXObject(PDResources.java:409)
at 
org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:53)
at 
org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:815)
at 
org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:472)
at 
org.apache.pdfbox.contentstream.PDFStreamEngine.processTransparencyGroup(PDFStreamEngine.java:213)
at 
org.apache

[jira] [Updated] (PDFBOX-3575) Missing (auto)close on RandomFileAccess in CCITTFactory

2016-11-16 Thread Maurice Betzel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maurice Betzel updated PDFBOX-3575:
---
Description: 
In org.apache.pdfbox.pdmodel.graphics.image.CCITTFactory the method 
createFromFile instantiates a new RandomAccessFile but does not close it when 
done. This keeps a lock on the file until the JVM exits.
Using close() or try-with-resources (AutoCloseable) resolves the issue.

public static PDImageXObject createFromFile(PDDocument document, File file, int number)
        throws IOException
{
    return createFromRandomAccessImpl(document, new RandomAccessFile(file, "r"), number);
}

  was:
In org.apache.pdfbox.pdmodel.graphics.image.CCITTFactory the method 
createFromFile instantiates a new RandomAccessFile but does not close it wen 
done. This keeps a lock on the File untill the JVM is closed.
Use close or autocloseable resolves the issue.

public static PDImageXObject createFromFile(PDDocument document, File file, int 
number)
throws IOException
{
return createFromRandomAccessImpl(document, new RandomAccessFile(file, 
"r"), number);
}


> Missing (auto)close on RandomFileAccess in CCITTFactory
> ---
>
> Key: PDFBOX-3575
> URL: https://issues.apache.org/jira/browse/PDFBOX-3575
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.3
> Environment: Windows 10/64, java version "1.8.0_112" Java(TM) SE 
> Runtime Environment (build 1.8.0_112-b15) Java HotSpot(TM) 64-Bit Server VM 
> (build 25.112-b15, mixed mode)
>Reporter: Maurice Betzel
>
> In org.apache.pdfbox.pdmodel.graphics.image.CCITTFactory the method 
> createFromFile instantiates a new RandomAccessFile but does not close it when 
> done. This keeps a lock on the file until the JVM exits.
> Using close() or try-with-resources (AutoCloseable) resolves the issue.
> public static PDImageXObject createFromFile(PDDocument document, File file, int number)
>         throws IOException
> {
>     return createFromRandomAccessImpl(document, new RandomAccessFile(file, "r"), number);
> }






[jira] [Created] (PDFBOX-3574) Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

2016-11-16 Thread Dmitri Russu (JIRA)
Dmitri Russu created PDFBOX-3574:


 Summary: Exception in thread "main" java.lang.OutOfMemoryError: 
Java heap space
 Key: PDFBOX-3574
 URL: https://issues.apache.org/jira/browse/PDFBOX-3574
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing, Utilities
Affects Versions: 2.0.3
 Environment: Ubuntu 15.10
Reporter: Dmitri Russu
 Fix For: 2.0.4


The code I run is:


public static void testExtractImages() throws Exception {

    File resource = new File("test/t1_edited.pdf");

    PDDocument document = PDDocument.load(resource);
    int page = 1;
    for (final PDPage pdPage : document.getPages())
    {
        final int currentPage = page;
        PDFGraphicsStreamEngine pdfGraphicsStreamEngine = new PDFGraphicsStreamEngine(pdPage)
        {
            int index = 0;

            @Override
            public void drawImage(PDImage pdImage) throws IOException
            {
                if (pdImage instanceof PDImageXObject)
                {
                    PDImageXObject image = (PDImageXObject) pdImage;
                    File file = new File("test/",
                            String.format("10948-new-engine-%s-%s.%s", currentPage, index, image.getSuffix()));

                    ImageIOUtil.writeImage(image.getImage(), image.getSuffix(), new FileOutputStream(file));
                    index++;
                }
            }

            @Override
            public void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) throws IOException { }

            @Override
            public void clip(int windingRule) throws IOException { }

            @Override
            public void moveTo(float x, float y) throws IOException { }

            @Override
            public void lineTo(float x, float y) throws IOException { }

            @Override
            public void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) throws IOException { }

            @Override
            public Point2D getCurrentPoint() throws IOException { return null; }

            @Override
            public void closePath() throws IOException { }

            @Override
            public void endPath() throws IOException { }

            @Override
            public void strokePath() throws IOException { }

            @Override
            public void fillPath(int windingRule) throws IOException { }

            @Override
            public void fillAndStrokePath(int windingRule) throws IOException { }

            @Override
            public void shadingFill(COSName shadingName) throws IOException { }
        };
        pdfGraphicsStreamEngine.processPage(pdPage);
        page++;
    }
}

Error received:


Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3230)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at 
java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:125)
at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:64)
at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:69)
at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:162)
at 
org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:235)
at 
org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:147)
at 
org.apache.pdfbox.pdmodel.graphics.PDXObject.createXObject(PDXObject.java:70)
at 
org.apache.pdfbox.pdmodel.PDResources.getXObject(PDResources.java:409)
at 
org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:53)
at 
org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:81

[jira] [Created] (PDFBOX-3575) Missing (auto)close on RandomFileAccess in CCITTFactory

2016-11-16 Thread Maurice Betzel (JIRA)
Maurice Betzel created PDFBOX-3575:
--

 Summary: Missing (auto)close on RandomFileAccess in CCITTFactory
 Key: PDFBOX-3575
 URL: https://issues.apache.org/jira/browse/PDFBOX-3575
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 2.0.3
 Environment: Windows 10/64, java version "1.8.0_112" Java(TM) SE 
Runtime Environment (build 1.8.0_112-b15) Java HotSpot(TM) 64-Bit Server VM 
(build 25.112-b15, mixed mode)
Reporter: Maurice Betzel


In org.apache.pdfbox.pdmodel.graphics.image.CCITTFactory the method 
createFromFile instantiates a new RandomAccessFile but does not close it when 
done. This keeps a lock on the file until the JVM exits.
Using close() or try-with-resources (AutoCloseable) resolves the issue.

public static PDImageXObject createFromFile(PDDocument document, File file, int number)
        throws IOException
{
    return createFromRandomAccessImpl(document, new RandomAccessFile(file, "r"), number);
}
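
The closing pattern the report asks for can be sketched with the JDK alone. This is not the PDFBox patch: `CloseRandomAccessFileDemo` and `readFirstByte` are hypothetical names used only to show a RandomAccessFile being released by try-with-resources, so the file is no longer held open once the method returns.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;

// Hypothetical demo class; it only illustrates the closing pattern, not CCITTFactory itself.
final class CloseRandomAccessFileDemo {
    private CloseRandomAccessFileDemo() { }

    /** Reads the first byte (or -1 for an empty file) and always closes the handle. */
    static int readFirstByte(File file) throws IOException {
        // try-with-resources calls raf.close() on both normal and exceptional exit.
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            return raf.read();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("ccitt-demo", ".tif");
        Files.write(file.toPath(), new byte[] {0x49, 0x49, 0x2A, 0x00});
        System.out.println(readFirstByte(file));
        // Deleting succeeds because no handle is left open on the file.
        System.out.println(file.delete());
    }
}
```

If the handle were left open, as in the reported createFromFile, the delete call is the kind of operation that would fail on Windows while the JVM is still running.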






[jira] [Created] (PDFBOX-3573) Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

2016-11-16 Thread Dmitri Russu (JIRA)
Dmitri Russu created PDFBOX-3573:


 Summary: Exception in thread "main" java.lang.OutOfMemoryError: 
Java heap space
 Key: PDFBOX-3573
 URL: https://issues.apache.org/jira/browse/PDFBOX-3573
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing, Utilities
Affects Versions: 2.0.3
 Environment: Ubuntu 15.10
Reporter: Dmitri Russu


I am trying to extract images from a PDF file whose pages are images; the total 
size of the PDF file is about 1 MB.

The code I run is:

public static void testExtractImages() throws Exception {

    File resource = new File("test/t1_edited.pdf");

    PDDocument document = PDDocument.load(resource);
    int page = 1;
    for (final PDPage pdPage : document.getPages())
    {
        final int currentPage = page;
        PDFGraphicsStreamEngine pdfGraphicsStreamEngine = new PDFGraphicsStreamEngine(pdPage)
        {
            int index = 0;

            @Override
            public void drawImage(PDImage pdImage) throws IOException
            {
                if (pdImage instanceof PDImageXObject)
                {
                    PDImageXObject image = (PDImageXObject) pdImage;
                    File file = new File("test/",
                            String.format("10948-new-engine-%s-%s.%s", currentPage, index, image.getSuffix()));

                    ImageIOUtil.writeImage(image.getImage(), image.getSuffix(), new FileOutputStream(file));
                    index++;
                }
            }

            @Override
            public void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) throws IOException { }

            @Override
            public void clip(int windingRule) throws IOException { }

            @Override
            public void moveTo(float x, float y) throws IOException { }

            @Override
            public void lineTo(float x, float y) throws IOException { }

            @Override
            public void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) throws IOException { }

            @Override
            public Point2D getCurrentPoint() throws IOException { return null; }

            @Override
            public void closePath() throws IOException { }

            @Override
            public void endPath() throws IOException { }

            @Override
            public void strokePath() throws IOException { }

            @Override
            public void fillPath(int windingRule) throws IOException { }

            @Override
            public void fillAndStrokePath(int windingRule) throws IOException { }

            @Override
            public void shadingFill(COSName shadingName) throws IOException { }
        };
        pdfGraphicsStreamEngine.processPage(pdPage);
        page++;
    }
}

--
ERROR
-
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3230)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at 
java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:125)
at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:64)
at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:69)
at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:162)
at 
org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:235)
at 
org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:147)
at 
org.apache.pdfbox.pdmodel.graphics.PDXObject.createXObject(PDXObject.java:70)
at 
org.apache.pdfbox.pdmodel.PDResources.getXObject(PDResources.java:409)
at 
org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:53)
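
A side note on the reproduction code: it passes `new FileOutputStream(file)` to the writer and never closes it. This thread does not establish that as the cause of the OutOfMemoryError, and it does not say whether ImageIOUtil.writeImage closes the stream it is given. The sketch below only shows the general pattern of closing the output stream explicitly; it uses the JDK's `ImageIO.write` as a stand-in for ImageIOUtil, and `ImageWriteDemo` and `writePng` are hypothetical names.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.imageio.ImageIO;

// Hypothetical demo class; ImageIO.write stands in for PDFBox's ImageIOUtil here.
final class ImageWriteDemo {
    private ImageWriteDemo() { }

    /** Writes the image as PNG and closes the stream even if encoding fails. */
    static boolean writePng(BufferedImage image, File target) throws IOException {
        try (OutputStream out = new FileOutputStream(target)) {
            // ImageIO.write returns false if no suitable writer was found.
            return ImageIO.write(image, "png", out);
        }
    }

    public static void main(String[] args) throws IOException {
        BufferedImage image = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        File target = File.createTempFile("image-demo", ".png");
        System.out.println(writePng(image, target));
        target.delete();
    }
}
```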

[jira] [Resolved] (PDFBOX-3572) AES-Decryption with Bouncycastle throws NullPointerException

2016-11-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-3572.
-
Resolution: Fixed
  Assignee: Tilman Hausherr

> AES-Decryption with Bouncycastle throws NullPointerException
> 
>
> Key: PDFBOX-3572
> URL: https://issues.apache.org/jira/browse/PDFBOX-3572
> Project: PDFBox
>  Issue Type: Bug
>  Components: Crypto
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.0.3
>Reporter: Markus Fensterer
>Assignee: Tilman Hausherr
>  Labels: bouncycastle, decrypt, nullpointerexception
> Fix For: 2.0.4, 2.1.0
>
>
> Using AES encryption with PdfBox and Bouncycastle yields a NullPointerException. 
> When JCE is used everything works nicely: com.sun.crypto.provider.AESCipher 
> returns an empty byte array for the last 16 bytes of the metadata dictionary. 
> Bouncycastle returns a null reference.
> {code:title=Demo.java|borderStyle=solid}
> public class Demo {
> public static void main(String[] args) throws IOException, 
> NoSuchAlgorithmException, NoSuchPaddingException {
> String password = "pw";
> String cipherString = "AES/CBC/PKCS5Padding";
> String testFilename = "test.pdf";
> PDDocument document = new PDDocument();
> AccessPermission ap = new AccessPermission();
> ap.setReadOnly();
> StandardProtectionPolicy policy = new 
> StandardProtectionPolicy(password, password, ap);
> policy.setEncryptionKeyLength(128);
> policy.setPreferAES(true);
> document.protect(policy);
> document.getDocumentInformation().setAuthor("author");
> document.save(testFilename);
> document.close();
> // Decryption with SunJCE works
> Cipher cipher = Cipher.getInstance(cipherString);
> System.out.printf("Provider to use for %s decryption: %s\n", 
> cipherString, cipher.getProvider());
> document = PDDocument.load(new File(testFilename), password);
> Security.removeProvider("SunJCE");
> // Decryption with BouncyCastle fails with NPE
> cipher = Cipher.getInstance(cipherString);
> System.out.printf("Provider to use for %s decryption: %s\n", 
> cipherString, cipher.getProvider());
> document = PDDocument.load(new File(testFilename), password);
> }
> }
> {code}
> {code:title=Output with stacktrace}
> Provider to use for AES/CBC/PKCS5Padding decryption: SunJCE version 1.8
> Provider to use for AES/CBC/PKCS5Padding decryption: BC version 1.54
> Exception in thread "main" java.lang.NullPointerException
>   at java.io.OutputStream.write(OutputStream.java:75)
>   at 
> org.apache.pdfbox.pdmodel.encryption.SecurityHandler.encryptDataAESother(SecurityHandler.java:269)
>   at 
> org.apache.pdfbox.pdmodel.encryption.SecurityHandler.encryptData(SecurityHandler.java:152)
>   at 
> org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decryptString(SecurityHandler.java:532)
>   at 
> org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decrypt(SecurityHandler.java:391)
>   at 
> org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decryptDictionary(SecurityHandler.java:512)
>   at 
> org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decrypt(SecurityHandler.java:399)
>   at 
> org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:798)
>   at 
> org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:726)
>   at 
> org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:657)
>   at 
> org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2092)
>   at 
> org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:201)
>   at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:249)
>   at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:891)
>   at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:831)
> {code}
> This could possibly be fixed with a null check in the SecurityHandler before 
> writing to the OutputStream.
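
The null check suggested above might look like the following sketch. It does not reproduce the change actually made in PDFBox for this issue; `NullSafeWrite` and `writeIfPresent` are hypothetical names, and the only assumption taken from the report is that a provider may hand back `null` instead of an empty array.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper, not PDFBox code: skips the write when the data is null.
final class NullSafeWrite {
    private NullSafeWrite() { }

    /** Writes data to out unless data is null, in which case nothing is written. */
    static void writeIfPresent(OutputStream out, byte[] data) throws IOException {
        if (data != null) {
            out.write(data);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeIfPresent(out, null);                  // ignored instead of throwing
        writeIfPresent(out, new byte[] {1, 2, 3});  // written as usual
        System.out.println(out.size());             // prints 3
    }
}
```

Calling `out.write(null)` directly is what raises the NullPointerException at the top of the stack trace, so guarding that single call is enough to tolerate both provider behaviours.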






[jira] [Comment Edited] (PDFBOX-3572) AES-Decryption with Bouncycastle throws NullPointerException

2016-11-16 Thread Markus Fensterer (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669787#comment-15669787
 ] 

Markus Fensterer edited comment on PDFBOX-3572 at 11/16/16 8:12 AM:


I tested it with the newest 2.0.4-SNAPSHOT and it works.
Thanks!


was (Author: markus.f):
I tested it with the newest 2.0.4-SNAPSHOT and it works.

> AES-Decryption with Bouncycastle throws NullPointerException
> 
>
> Key: PDFBOX-3572
> URL: https://issues.apache.org/jira/browse/PDFBOX-3572
> Project: PDFBox
>  Issue Type: Bug
>  Components: Crypto
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.0.3
>Reporter: Markus Fensterer
>  Labels: bouncycastle, decrypt, nullpointerexception
> Fix For: 2.0.4, 2.1.0
>
>
> Using AES encryption with PdfBox and Bouncycastle yields a NullPointerException. 
> When JCE is used everything works nicely: com.sun.crypto.provider.AESCipher 
> returns an empty byte array for the last 16 bytes of the metadata dictionary. 
> Bouncycastle returns a null reference.
> {code:title=Demo.java|borderStyle=solid}
> public class Demo {
> public static void main(String[] args) throws IOException, 
> NoSuchAlgorithmException, NoSuchPaddingException {
> String password = "pw";
> String cipherString = "AES/CBC/PKCS5Padding";
> String testFilename = "test.pdf";
> PDDocument document = new PDDocument();
> AccessPermission ap = new AccessPermission();
> ap.setReadOnly();
> StandardProtectionPolicy policy = new 
> StandardProtectionPolicy(password, password, ap);
> policy.setEncryptionKeyLength(128);
> policy.setPreferAES(true);
> document.protect(policy);
> document.getDocumentInformation().setAuthor("author");
> document.save(testFilename);
> document.close();
> // Decryption with SunJCE works
> Cipher cipher = Cipher.getInstance(cipherString);
> System.out.printf("Provider to use for %s decryption: %s\n", 
> cipherString, cipher.getProvider());
> document = PDDocument.load(new File(testFilename), password);
> Security.removeProvider("SunJCE");
> // Decryption with BouncyCastle fails with NPE
> cipher = Cipher.getInstance(cipherString);
> System.out.printf("Provider to use for %s decryption: %s\n", 
> cipherString, cipher.getProvider());
> document = PDDocument.load(new File(testFilename), password);
> }
> }
> {code}
> {code:title=Output with stacktrace}
> Provider to use for AES/CBC/PKCS5Padding decryption: SunJCE version 1.8
> Provider to use for AES/CBC/PKCS5Padding decryption: BC version 1.54
> Exception in thread "main" java.lang.NullPointerException
>   at java.io.OutputStream.write(OutputStream.java:75)
>   at 
> org.apache.pdfbox.pdmodel.encryption.SecurityHandler.encryptDataAESother(SecurityHandler.java:269)
>   at 
> org.apache.pdfbox.pdmodel.encryption.SecurityHandler.encryptData(SecurityHandler.java:152)
>   at 
> org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decryptString(SecurityHandler.java:532)
>   at 
> org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decrypt(SecurityHandler.java:391)
>   at 
> org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decryptDictionary(SecurityHandler.java:512)
>   at 
> org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decrypt(SecurityHandler.java:399)
>   at 
> org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:798)
>   at 
> org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:726)
>   at 
> org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:657)
>   at 
> org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2092)
>   at 
> org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:201)
>   at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:249)
>   at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:891)
>   at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:831)
> {code}
> This could possibly be fixed with a null check in the SecurityHandler before 
> writing to the OutputStream.






[jira] [Commented] (PDFBOX-3572) AES-Decryption with Bouncycastle throws NullPointerException

2016-11-16 Thread Markus Fensterer (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669787#comment-15669787
 ] 

Markus Fensterer commented on PDFBOX-3572:
--

I tested it with the newest 2.0.4-SNAPSHOT and it works.

> AES-Decryption with Bouncycastle throws NullPointerException
> 
>
> Key: PDFBOX-3572
> URL: https://issues.apache.org/jira/browse/PDFBOX-3572
> Project: PDFBox
>  Issue Type: Bug
>  Components: Crypto
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.0.3
>Reporter: Markus Fensterer
>  Labels: bouncycastle, decrypt, nullpointerexception
> Fix For: 2.0.4, 2.1.0
>
>
> Using AES encryption with PdfBox and Bouncycastle yields a NullPointerException. 
> When JCE is used everything works nicely: com.sun.crypto.provider.AESCipher 
> returns an empty byte array for the last 16 bytes of the metadata dictionary. 
> Bouncycastle returns a null reference.
> {code:title=Demo.java|borderStyle=solid}
> import java.io.File;
> import java.io.IOException;
> import java.security.NoSuchAlgorithmException;
> import java.security.Security;
> import javax.crypto.Cipher;
> import javax.crypto.NoSuchPaddingException;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
> import org.apache.pdfbox.pdmodel.encryption.StandardProtectionPolicy;
>
> public class Demo {
>     public static void main(String[] args)
>             throws IOException, NoSuchAlgorithmException, NoSuchPaddingException {
>         String password = "pw";
>         String cipherString = "AES/CBC/PKCS5Padding";
>         String testFilename = "test.pdf";
>         // Create a document and protect it with 128-bit AES
>         PDDocument document = new PDDocument();
>         AccessPermission ap = new AccessPermission();
>         ap.setReadOnly();
>         StandardProtectionPolicy policy =
>                 new StandardProtectionPolicy(password, password, ap);
>         policy.setEncryptionKeyLength(128);
>         policy.setPreferAES(true);
>         document.protect(policy);
>         document.getDocumentInformation().setAuthor("author");
>         document.save(testFilename);
>         document.close();
>         // Decryption with SunJCE works
>         Cipher cipher = Cipher.getInstance(cipherString);
>         System.out.printf("Provider to use for %s decryption: %s\n",
>                 cipherString, cipher.getProvider());
>         document = PDDocument.load(new File(testFilename), password);
>         document.close();
>         // Assumes the BouncyCastle provider is already registered,
>         // so that it takes over once SunJCE is removed
>         Security.removeProvider("SunJCE");
>         // Decryption with BouncyCastle fails with NPE
>         cipher = Cipher.getInstance(cipherString);
>         System.out.printf("Provider to use for %s decryption: %s\n",
>                 cipherString, cipher.getProvider());
>         document = PDDocument.load(new File(testFilename), password);
>         document.close();
>     }
> }
> {code}
> {code:title=Output with stacktrace}
> Provider to use for AES/CBC/PKCS5Padding decryption: SunJCE version 1.8
> Provider to use for AES/CBC/PKCS5Padding decryption: BC version 1.54
> Exception in thread "main" java.lang.NullPointerException
>   at java.io.OutputStream.write(OutputStream.java:75)
>   at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.encryptDataAESother(SecurityHandler.java:269)
>   at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.encryptData(SecurityHandler.java:152)
>   at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decryptString(SecurityHandler.java:532)
>   at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decrypt(SecurityHandler.java:391)
>   at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decryptDictionary(SecurityHandler.java:512)
>   at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decrypt(SecurityHandler.java:399)
>   at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:798)
>   at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:726)
>   at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:657)
>   at org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2092)
>   at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:201)
>   at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:249)
>   at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:891)
>   at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:831)
> {code}
> This could possibly be fixed with a null check in the SecurityHandler before 
> writing to the OutputStream.
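
The null check suggested at the end of the report could look roughly like the sketch below. This is not the actual PDFBox change; NullSafeWrite and writeIfPresent are hypothetical names used only for illustration. The idea rests on the javax.crypto.Cipher Javadoc, which allows update() to return null when a block cipher has too little input to produce a new block, while OutputStream.write(byte[]) throws a NullPointerException for a null argument.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper, not part of PDFBox: skips the write when a cipher
// call produced no output buffer.
final class NullSafeWrite {
    private NullSafeWrite() {
    }

    // Writes chunk to out unless chunk is null; returns the number of bytes written.
    static int writeIfPresent(OutputStream out, byte[] chunk) throws IOException {
        if (chunk == null) {
            // OutputStream.write(null) would throw a NullPointerException
            return 0;
        }
        out.write(chunk);
        return chunk.length;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Simulates a provider that returns null because no block is ready
        int skipped = writeIfPresent(out, null);
        // Simulates a provider that returns decrypted bytes
        int written = writeIfPresent(out, new byte[] {1, 2, 3});
        System.out.println(skipped + " " + written + " " + out.size());
    }
}
```

In a decryption loop, the results of cipher.update(...) and cipher.doFinal() would be passed through such a guard instead of being handed to the stream directly, so an empty array and a null reference are both treated as "nothing to write".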


