Re: Subversion integration with JIRA

2014-07-23 Thread Maruan Sahyoun
according to the sample provided in http://www.apache.org/dev/svngit2jira.html 
the commit will be shown in the comments.

Maruan

On 23.07.2014 at 08:33, Thomas Chojecki wrote:

> On 2014-07-23 07:57, Tilman Hausherr wrote:
>> Let's try it. TIKA has something similar, see e.g. here:
>> https://issues.apache.org/jira/browse/TIKA-1325
>> Tilman
> 
> Looks like they misuse Hudson to do something that JIRA already 
> supports in a similar way. I think the solution from infra is the better one. 
> So the code changes will be shown only in the source code section of a ticket. 
> :-)
> 
> The ability to link source code with an issue is IMO a must-have.
> 
> +1
> 
>> On 22.07.2014 19:53, Andreas Lehmkuehler wrote:
>>> Hi,
>>> our infra guys provide an integration of subversion with JIRA tickets. All 
>>> subversion commits will be automatically added as a comment to the 
>>> corresponding JIRA ticket as long as the ticket number is used within the 
>>> svn commit comment.
>>> See http://www.apache.org/dev/svngit2jira.html for any further details.
>>> Should we ask infra to enable that feature for PDFBox?
>>> WDYT?
>>> BR
>>> Andreas Lehmkühler
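
The rule described above (a commit is attached to a ticket only if the ticket number appears in the svn commit comment) can be illustrated with a small sketch. This is not the svngit2jira implementation; the class name JiraKeyFinder and the key pattern (upper-case project key, dash, number) are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of svngit2jira or PDFBox.
class JiraKeyFinder {

    // Assumed key format: upper-case project key, a dash, then digits.
    private static final Pattern KEY = Pattern.compile("\\b([A-Z][A-Z0-9]+-\\d+)\\b");

    // Returns every ticket key mentioned in the commit comment, in order.
    static List<String> findKeys(String commitMessage) {
        List<String> keys = new ArrayList<String>();
        Matcher m = KEY.matcher(commitMessage);
        while (m.find()) {
            keys.add(m.group(1));
        }
        return keys;
    }

    public static void main(String[] args) {
        // prints [PDFBOX-2236]
        System.out.println(findKeys("PDFBOX-2236: replace bouncycastle clone method with copyof"));
    }
}
```

A commit comment without such a key would yield an empty list, which corresponds to a commit that is not attached to any ticket.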



[jira] [Updated] (PDFBOX-2233) Make PreflightParser sandbox safe

2014-07-23 Thread simon steiner (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

simon steiner updated PDFBOX-2233:
--

Attachment: avoidtmpfile.patch

Patch that doesn't care about performance

> Make PreflightParser sandbox safe
> -
>
> Key: PDFBOX-2233
> URL: https://issues.apache.org/jira/browse/PDFBOX-2233
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Preflight
>Affects Versions: 2.0.0
>Reporter: simon steiner
> Attachments: avoidtmpfile.patch
>
>
> It should be possible to pass a DataSource into PreflightParser without a 
> temp file being created. Temp file is being created in NonSequentialPDFParser 
> causing a SecurityException.
> java.lang.SecurityException: Unable to create temporary file
>   at java.io.File.createTempFile(File.java:2018)
>   at java.io.File.createTempFile(File.java:2070)
>   at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.createTmpFile(NonSequentialPDFParser.java:281)
>   at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.<init>(NonSequentialPDFParser.java:261)
>   at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.<init>(NonSequentialPDFParser.java:247)
>   at org.apache.pdfbox.preflight.parser.PreflightParser.<init>(PreflightParser.java:125)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PDFBOX-2234) Invalid Color space preflight error on Java 8

2014-07-23 Thread simon steiner (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

simon steiner updated PDFBOX-2234:
--

Attachment: expected.pdf

> Invalid Color space preflight error on Java 8
> -
>
> Key: PDFBOX-2234
> URL: https://issues.apache.org/jira/browse/PDFBOX-2234
> Project: PDFBox
>  Issue Type: Bug
>  Components: Preflight
>Affects Versions: 2.0.0
>Reporter: simon steiner
> Attachments: expected.pdf
>
>
> java -cp 
> pdf-box-svn/preflight/target/preflight-2.0.0-SNAPSHOT.jar:pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar:pdf-box-svn/xmpbox/target/xmpbox-2.0.0-SNAPSHOT.jar:lib/commons-io-1.3.1.jar
>  org.apache.pdfbox.preflight.Validator_A1b expected.pdf
> Java 7:
> The file expected.pdf is a valid PDF/A-1b file
> Java 8:
> The fileexpected.pdf is not valid, error(s) :
> 2.4.3 : Invalid Color space, The operator "G" can't be used without Color 
> Profile





[jira] [Created] (PDFBOX-2234) Invalid Color space preflight error on Java 8

2014-07-23 Thread simon steiner (JIRA)
simon steiner created PDFBOX-2234:
-

 Summary: Invalid Color space preflight error on Java 8
 Key: PDFBOX-2234
 URL: https://issues.apache.org/jira/browse/PDFBOX-2234
 Project: PDFBox
  Issue Type: Bug
  Components: Preflight
Affects Versions: 2.0.0
Reporter: simon steiner
 Attachments: expected.pdf

java -cp 
pdf-box-svn/preflight/target/preflight-2.0.0-SNAPSHOT.jar:pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar:pdf-box-svn/xmpbox/target/xmpbox-2.0.0-SNAPSHOT.jar:lib/commons-io-1.3.1.jar
 org.apache.pdfbox.preflight.Validator_A1b expected.pdf

Java 7:
The file expected.pdf is a valid PDF/A-1b file

Java 8:
The fileexpected.pdf is not valid, error(s) :
2.4.3 : Invalid Color space, The operator "G" can't be used without Color 
Profile






[jira] [Updated] (PDFBOX-2235) Lines missing PDF to image

2014-07-23 Thread simon steiner (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

simon steiner updated PDFBOX-2235:
--

Attachment: lines.pdf

> Lines missing PDF to image
> --
>
> Key: PDFBOX-2235
> URL: https://issues.apache.org/jira/browse/PDFBOX-2235
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.0
>Reporter: simon steiner
> Attachments: lines.pdf
>
>
> java -jar /home/simon/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar 
> PDFToImage lines.pdf





[jira] [Created] (PDFBOX-2235) Lines missing PDF to image

2014-07-23 Thread simon steiner (JIRA)
simon steiner created PDFBOX-2235:
-

 Summary: Lines missing PDF to image
 Key: PDFBOX-2235
 URL: https://issues.apache.org/jira/browse/PDFBOX-2235
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: simon steiner
 Attachments: lines.pdf

java -jar /home/simon/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar 
PDFToImage lines.pdf





Re: Subversion integration with JIRA

2014-07-23 Thread Thomas Chojecki

On 2014-07-23 09:00, Maruan Sahyoun wrote:

> according to the sample provided in
> http://www.apache.org/dev/svngit2jira.html the commit will be shown in
> the comments.


You are right. Hmm, I hope this will not generate additional mailing 
list mails :|


I only knew the solution with the source section. For this, the JIRA 
project would need to be linked with an SCM; it then grabs the commit 
messages and links the changesets to the right ticket.


IMO linking the source in any way would help track changes in the 
source and make them more transparent.


Either way +1

Best regards
Thomas




Re: Subversion integration with JIRA

2014-07-23 Thread Andreas Lehmkuehler

On 23.07.2014 07:57, Tilman Hausherr wrote:

> Let's try it. TIKA has something similar, see e.g. here:
> https://issues.apache.org/jira/browse/TIKA-1325

That's something different. Those comments are added after any build based on a 
svn commit.


BR
Andreas Lehmkühler









[jira] [Created] (PDFBOX-2236) Useless dependency in specific usage

2014-07-23 Thread Cyril Bremaud (JIRA)
Cyril Bremaud created PDFBOX-2236:
-

 Summary: Useless dependency in specific usage
 Key: PDFBOX-2236
 URL: https://issues.apache.org/jira/browse/PDFBOX-2236
 Project: PDFBox
  Issue Type: Wish
  Components: Signing
Affects Versions: 1.8.6
Reporter: Cyril Bremaud
Priority: Trivial
 Fix For: 1.8.7


In class 
org.apache.pdfbox.pdmodel.interactive.digitalsignature.visible.PDVisibleSignDesigner
 line 324 you use Arrays.clone() method from BouncyCastle.

This creates a dependency on BouncyCastle if we try to add a visual signature 
(computed by a third-party solution) to a PDF document.
For this specific usage the dependency seems unnecessary and could be avoided 
by using the copy method of the class java.util.Arrays.

Could you fix this issue in a future version?






Re: Subversion integration with JIRA

2014-07-23 Thread Andreas Lehmkuehler


On 23.07.2014 15:16, Thomas Chojecki wrote:

> On 2014-07-23 09:00, Maruan Sahyoun wrote:
>
>> according to the sample provided in
>> http://www.apache.org/dev/svngit2jira.html the commit will be shown in
>> the comments.
>
> You are right. Hmmm I hope this will not do additional mailinglist mails :|

It will, but if it is too much traffic we might consider asking for a new mailing 
list like issues@pdfbox and sending all JIRA mails to that list instead of dev@pdfbox.

> I only knew the solution with the source section. For this the jira project
> would need to be linked with a scm and then it grabs the commit messages and
> link the changesets to the right ticket.

That's a JIRA plugin and was used in the past. But the ASF repository is so huge 
that the plugin crashed more or less regularly with an OOM exception.

> IMO linking the source any kind would help tracking changes in the source and
> make it more transparent.
>
> Either way +1
>
> Best regards
> Thomas






[jira] [Closed] (PDFBOX-2235) Lines missing PDF to image

2014-07-23 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-2235.
---

Resolution: Duplicate

Duplicate of PDFBOX-1288 :-( Renders fine at 300 dpi, but not at some low 
resolutions.



[jira] [Commented] (PDFBOX-2233) Make PreflightParser sandbox safe

2014-07-23 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071870#comment-14071870
 ] 

Tilman Hausherr commented on PDFBOX-2233:
-

... and that is the problem, sorry. The memory footprint shouldn't increase for 
files. The solution should not use more memory for files than it does now. 
Files don't have to be cached in full, but that is what your patch does.
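
One way to reconcile both concerns might be to keep the temp file as the default and fall back to the heap only when the sandbox refuses it. The following is a minimal sketch under that assumption; it is neither the attached patch nor PDFBox code, and the class name SandboxSafeBuffer and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for illustration; this is not PDFBox code.
class SandboxSafeBuffer {

    // Copies everything from in to out using a small fixed-size buffer.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[4096];
        int len;
        while ((len = in.read(chunk)) != -1) {
            out.write(chunk, 0, len);
        }
    }

    // Returns a stream over a buffered copy of the input:
    // file-backed when temp files are allowed, heap-backed otherwise.
    static InputStream buffer(InputStream in) throws IOException {
        File tmp;
        try {
            tmp = File.createTempFile("tmpPDF", ".pdf");
        } catch (SecurityException e) {
            // Sandbox: no temp file allowed, so keep the bytes on the heap.
            ByteArrayOutputStream heap = new ByteArrayOutputStream();
            copy(in, heap);
            return new ByteArrayInputStream(heap.toByteArray());
        }
        tmp.deleteOnExit();
        OutputStream out = new FileOutputStream(tmp);
        try {
            copy(in, out);
        } finally {
            out.close();
        }
        return new FileInputStream(tmp);
    }

    public static void main(String[] args) throws IOException {
        InputStream buffered = buffer(new ByteArrayInputStream(new byte[] { 37, 80, 68, 70 }));
        System.out.println(buffered.read()); // first byte of the buffered copy
        buffered.close();
    }
}
```

With this shape the memory footprint stays unchanged wherever temp files are permitted; only sandboxed callers pay for a full in-memory copy.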



[jira] [Resolved] (PDFBOX-2236) Useless dependency in specific usage

2014-07-23 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-2236.
-

Resolution: Fixed
  Assignee: Tilman Hausherr

Thanks, done in rev 1612855 for the 1.8 branch. You'll find a snapshot soon 
here:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox/1.8.7-SNAPSHOT/
Please comment and/or reopen if it doesn't work properly.

> Useless dependency in specific usage
> 
>
> Key: PDFBOX-2236
> URL: https://issues.apache.org/jira/browse/PDFBOX-2236
> Project: PDFBox
>  Issue Type: Wish
>  Components: Signing
>Affects Versions: 1.8.6
>Reporter: Cyril Bremaud
>Assignee: Tilman Hausherr
>Priority: Trivial
>  Labels: easyfix
> Fix For: 1.8.7
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> In class 
> org.apache.pdfbox.pdmodel.interactive.digitalsignature.visible.PDVisibleSignDesigner
>  line 324 you use Arrays.clone() method from BouncyCastle.
> This creates a dependency on BouncyCastle if we try to add a visual signature 
> (computed by third party solution) to a pdf document.
> For this specific usage this dependency seems useless and  could be avoided 
> using the copy method of the class java.util.Arrays.
> Could you fix this issue in a future version ?





Build failed in Jenkins: PDFBox 1.8.x #224

2014-07-23 Thread Apache Jenkins Server
See 

Changes:

[tilman] PDFBOX-2236: replace bouncycastle clone method with copyof, as 
suggested by Cyril Bremaud

--
[...truncated 277 lines...]
Running org.apache.xmpbox.schema.AdobePDFTest
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.059 sec
Running org.apache.xmpbox.schema.XMPSchemaTest
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.152 sec
Running org.apache.xmpbox.schema.PDFAIdentificationOthersTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.025 sec
Running org.apache.xmpbox.schema.PhotoshopSchemaTest
Tests run: 90, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.567 sec
Running org.apache.xmpbox.parser.DeserializationTest
Tests run: 17, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.141 sec
Running org.apache.xmpbox.SaveMetadataHelperTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.006 sec
Running org.apache.xmpbox.TestXMPWithDefinedSchemas
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.056 sec
Running org.apache.xmpbox.type.TestLayerType
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.036 sec
Running org.apache.xmpbox.type.TestThumbnailType
Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.076 sec
Running org.apache.xmpbox.type.TestResourceEventType
Tests run: 24, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.123 sec
Running org.apache.xmpbox.type.TestJobType
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.055 sec
Running org.apache.xmpbox.type.TestVersionType
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.019 sec
Running org.apache.xmpbox.type.AttributeTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec
Running org.apache.xmpbox.type.TestSimpleMetadataProperties
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.038 sec
Running org.apache.xmpbox.type.TestDerivedType
Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.044 sec
Running org.apache.xmpbox.type.TestResourceRefType
Tests run: 60, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.308 sec
Running org.apache.xmpbox.type.TestAbstractStructuredType
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.028 sec

Results :

Tests run: 424, Failures: 0, Errors: 0, Skipped: 0

[JENKINS] Recording test results
[INFO] 
[INFO] --- maven-bundle-plugin:2.3.7:bundle (default-bundle) @ xmpbox ---
[INFO] 
[INFO] --- maven-site-plugin:3.0:attach-descriptor (attach-descriptor) @ xmpbox 
---
[INFO] 
[INFO] --- apache-rat-plugin:0.6:check (default) @ xmpbox ---
[INFO] Exclude: release.properties
[INFO] 
[INFO] --- maven-install-plugin:2.3.1:install (default-install) @ xmpbox ---
[INFO] Installing 

 to 
/home/jenkins/jenkins-slave/maven-repositories/0/org/apache/pdfbox/xmpbox/1.8.7-SNAPSHOT/xmpbox-1.8.7-SNAPSHOT.jar
[INFO] Installing 
 to 
/home/jenkins/jenkins-slave/maven-repositories/0/org/apache/pdfbox/xmpbox/1.8.7-SNAPSHOT/xmpbox-1.8.7-SNAPSHOT.pom
[INFO] 
[INFO] --- maven-bundle-plugin:2.3.7:install (default-install) @ xmpbox ---
[INFO] Installing 
org/apache/pdfbox/xmpbox/1.8.7-SNAPSHOT/xmpbox-1.8.7-SNAPSHOT.jar
[INFO] Writing OBR metadata
[INFO] 
[INFO] --- maven-deploy-plugin:2.6:deploy (default-deploy) @ xmpbox ---
Downloading: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/xmpbox/1.8.7-SNAPSHOT/maven-metadata.xml
Downloaded: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/xmpbox/1.8.7-SNAPSHOT/maven-metadata.xml
 (776 B at 4.9 KB/sec)
Uploading: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/xmpbox/1.8.7-SNAPSHOT/xmpbox-1.8.7-20140723.160635-79.jar
Uploaded: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/xmpbox/1.8.7-SNAPSHOT/xmpbox-1.8.7-20140723.160635-79.jar
 (112 KB at 490.8 KB/sec)
Uploading: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/xmpbox/1.8.7-SNAPSHOT/xmpbox-1.8.7-20140723.160635-79.pom
Uploaded: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/xmpbox/1.8.7-SNAPSHOT/xmpbox-1.8.7-20140723.160635-79.pom
 (4 KB at 27.2 KB/sec)
Downloading: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/xmpbox/maven-metadata.xml
Downloaded: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/xmpbox/maven-metadata.xml
 (383 B at 4.3 KB/sec)
Uploading: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/xmpbox/1.8.7-SNAPSHOT/maven-metadata.xml
Uploaded: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/xmpbox/1.8.7-SNAPSHOT/maven-metadat

Build failed in Jenkins: PDFBox 1.8.x » Apache PDFBox #224

2014-07-23 Thread Apache Jenkins Server
See 


Changes:

[tilman] PDFBOX-2236: replace bouncycastle clone method with copyof, as 
suggested by Cyril Bremaud

--
[INFO] 
[INFO] 
[INFO] Building Apache PDFBox 1.8.7-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.4.1:clean (default-clean) @ pdfbox ---
[INFO] Deleting 

[INFO] 
[INFO] --- maven-remote-resources-plugin:1.2.1:process (default) @ pdfbox ---
[INFO] 
[INFO] --- maven-antrun-plugin:1.6:run (default) @ pdfbox ---
[INFO] Executing tasks

main:

find.adobefiles:

get.adobefiles:

testexist:

downloadfile:
[unjar] Expanding: 

 into 

[unjar] Expanding: 

 into 

[INFO] Executed tasks
[INFO] 
[INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ pdfbox ---
[debug] execute contextualize
[INFO] Using 'ISO-8859-1' encoding to copy filtered resources.
[INFO] Copying 192 resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ pdfbox ---
[INFO] Compiling 529 source files to 

[INFO] -
[ERROR] COMPILATION ERROR : 
[INFO] -
[ERROR] 
:[319,39]
 cannot find symbol
symbol  : method copyOf(byte[],int)
location: class java.util.Arrays
[INFO] 1 error
[INFO] -


Re: svn commit: r1612855 - /pdfbox/branches/1.8/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/interactive/digitalsignature/visible/PDVisibleSignDesigner.java

2014-07-23 Thread Andreas Lehmkuehler

Arrays.copyOf isn't available in Java 5 (it was added in Java 6).

On 23.07.2014 17:51, til...@apache.org wrote:

Author: tilman
Date: Wed Jul 23 15:51:56 2014
New Revision: 1612855

URL: http://svn.apache.org/r1612855
Log:
PDFBOX-2236: replace bouncycastle clone method with copyof, as suggested by 
Cyril Bremaud

Modified:
 
pdfbox/branches/1.8/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/interactive/digitalsignature/visible/PDVisibleSignDesigner.java

Modified: 
pdfbox/branches/1.8/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/interactive/digitalsignature/visible/PDVisibleSignDesigner.java
URL: 
http://svn.apache.org/viewvc/pdfbox/branches/1.8/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/interactive/digitalsignature/visible/PDVisibleSignDesigner.java?rev=1612855&r1=1612854&r2=1612855&view=diff
==
--- 
pdfbox/branches/1.8/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/interactive/digitalsignature/visible/PDVisibleSignDesigner.java
 (original)
+++ 
pdfbox/branches/1.8/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/interactive/digitalsignature/visible/PDVisibleSignDesigner.java
 Wed Jul 23 15:51:56 2014
@@ -22,6 +22,7 @@ import java.io.ByteArrayOutputStream;
  import java.io.FileInputStream;
  import java.io.IOException;
  import java.io.InputStream;
+import java.util.Arrays;
  import java.util.List;

  import javax.imageio.ImageIO;
@@ -29,7 +30,6 @@ import javax.imageio.ImageIO;
  import org.apache.pdfbox.pdmodel.PDDocument;
  import org.apache.pdfbox.pdmodel.PDPage;
  import org.apache.pdfbox.pdmodel.common.PDRectangle;
-import org.bouncycastle.util.Arrays;

  /**
   *
@@ -54,11 +54,9 @@ public class PDVisibleSignDesigner
  private byte[] AffineTransformParams =   { 1, 0, 0, 1, 0, 0 }; // default
  private float imageSizeInPercents;
  private PDDocument document = null;
-
-

  /**
- *
+ *
   * @param originalDocumenStream
   * @param imageStream
   * @param page the page number the visible signature is added to.
@@ -80,7 +78,6 @@ public class PDVisibleSignDesigner
   */
  public PDVisibleSignDesigner(String documentPath, InputStream 
imageStream, int page) throws IOException
  {
-
  // set visible singature image Input stream
  signatureImageStream(imageStream);

@@ -114,7 +111,6 @@ public class PDVisibleSignDesigner
   */
  private void calculatePageSize(PDDocument document, int page)
  {
-
  if (page < 1)
  {
  throw new IllegalArgumentException("First page of pdf is 1, not " 
+ page);
@@ -309,7 +305,6 @@ public class PDVisibleSignDesigner
   */
  private PDVisibleSignDesigner signatureImageStream(InputStream 
imageStream) throws IOException
  {
-
  ByteArrayOutputStream baos = new ByteArrayOutputStream();
  byte[] buffer = new byte[1024];
  int len;
@@ -321,7 +316,7 @@ public class PDVisibleSignDesigner
  baos.close();

  byte[] byteArray = baos.toByteArray();
-byte[] byteArraySecond = Arrays.clone(byteArray);
+byte[] byteArraySecond = Arrays.copyOf(byteArray, byteArray.length);

  InputStream inputForBufferedImage = new 
ByteArrayInputStream(byteArray);
  InputStream revertInputStream = new 
ByteArrayInputStream(byteArraySecond);
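
Since the 1.8 build failed on Arrays.copyOf, a Java 5 compatible way to get the second array could look like the sketch below. What the follow-up revision actually changed is not shown here, so this is only an illustration; the class ArrayCopyJava5 and its method names are hypothetical, and only the variable names byteArray and byteArraySecond come from the diff above.

```java
// Hypothetical demo class for illustration; not the committed fix.
class ArrayCopyJava5 {

    // clone() on an array is available in Java 5 and yields an independent copy.
    static byte[] copyWithClone(byte[] source) {
        return source.clone();
    }

    // Equivalent copy with System.arraycopy, also available in Java 5.
    static byte[] copyWithArraycopy(byte[] source) {
        byte[] target = new byte[source.length];
        System.arraycopy(source, 0, target, 0, source.length);
        return target;
    }

    public static void main(String[] args) {
        byte[] byteArray = { 1, 2, 3 };
        byte[] byteArraySecond = copyWithClone(byteArray);
        byteArraySecond[0] = 9; // changing the copy must not touch the original
        System.out.println(byteArray[0]); // prints 1
    }
}
```

Both variants avoid the BouncyCastle dependency as well as the Java 6 API.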






RE: [jira] [Resolved] (PDFBOX-2101) Surprising memory consumption when extracting images

2014-07-23 Thread Allison, Timothy B.
Andreas and Tilman,

  Thank you very much for fixing this so quickly.  I'm finally getting around 
to figuring out if we should change anything in the Tika code based on your 
fixes.  If I follow the example of the latest ExtractImages for the 1.8.x 
branch, I think I see that we should add:

1) resources.clear() at the end of processResources()
2) image.clear() after image.write2file()

Is there anything else that our client code should do to decrease the memory 
footprint during extraction of images?  Thank you, again!

 Best,

  Tim


From: Andreas Lehmkühler (JIRA) [j...@apache.org]
Sent: Sunday, June 15, 2014 7:36 AM
To: dev@pdfbox.apache.org
Subject: [jira] [Resolved] (PDFBOX-2101) Surprising memory consumption when 
extracting images

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler resolved PDFBOX-2101.


Resolution: Fixed

> Surprising memory consumption when extracting images
> 
>
> Key: PDFBOX-2101
> URL: https://issues.apache.org/jira/browse/PDFBOX-2101
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 1.8.5
> Environment: Windows 7
> java version "1.7.0_55"
> Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
>Reporter: Tim Allison
>Assignee: Andreas Lehmkühler
>Priority: Minor
> Fix For: 1.8.6, 2.0.0
>
> Attachments: 239665.pdf, PDFBOX-2101-298-good.jpg, 
> PDFBOX-2101-714-poor.jpg, java.hprof.zip
>
>
> ExtractImages seems to fail to release memory resources on some files in both 
> PDFBox 1.8.5 and trunk.
> On this 4MB file 
> [http://digitalcorpora.org/corp/nps/files/govdocs1/239/239665.pdf], if 
> extracting every image on every page (and there are many, many duplicate 
> images), there is an OOM with -Xmx1g.  If there is no Xmx and there is > 2.5g 
> available, ExtractImages will work.
> With some experimentation, the triggers seem to be JPEG images that have 
> masks.  I'm not sure, though, whether the issue is with PDFBox or Java.
> Commandlines:
> 1.8.5:
> java -Xmx1g -cp pdfbox-app-1.8.5.jar org.apache.pdfbox.ExtractImages 
> 239665.pdf
> 2.0_SNAPSHOT:
> java -Xmx1g -cp pdfbox-app-2.0.0-SNAPSHOT.jar 
> org.apache.pdfbox.tools.ExtractImages -addkey 239665.pdf
> Results:
> 1.8.5: 906 files before OOM
> {noformat}
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2271)
> at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
> at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
> at org.apache.pdfbox.pdmodel.common.PDStream.getByteArray(PDStream.java:514)
> at org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:217)
> at org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.write2OutputStream(PDPixelMap.java:363)
> at org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage.write2file(PDXObjectImage.java:254)
> at org.apache.pdfbox.ExtractImages.processResources(ExtractImages.java:202)
> at org.apache.pdfbox.ExtractImages.extractImages(ExtractImages.java:160)
> at org.apache.pdfbox.ExtractImages.main(ExtractImages.java:65)
> {noformat}
> 2.0_SNAPSHOT: 428 files before OOM
> {noformat}
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2271)
> at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
> at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
> at org.apache.pdfbox.io.IOUtils.copy(IOUtils.java:70)
> at org.apache.pdfbox.io.IOUtils.toByteArray(IOUtils.java:52)
> at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.from8bit(SampledImageReader.java:171)
> at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:154)
> at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:171)
> at org.apache.pdfbox.tools.ExtractImages.write2file(ExtractImages.java:231)
> at org.apache.pdfbox.tools.ExtractImages.processResources(ExtractImages.java:206)
> at org.apache.pdfbox.tools.ExtractImages.extractImages(ExtractImages.java:164)
> at org.apache.pdfbox.tools.ExtractImages.main(ExtractImages.jav

[jira] [Comment Edited] (PDFBOX-2236) Useless dependency in specific usage

2014-07-23 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071880#comment-14071880
 ] 

Tilman Hausherr edited comment on PDFBOX-2236 at 7/23/14 4:37 PM:
--

Thanks, done in rev 1612855 and 1612868 for the 1.8 branch. You'll find a 
snapshot soon here:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox/1.8.7-SNAPSHOT/
Please comment and/or reopen if it doesn't work properly.


was (Author: tilman):
Thanks, done in rev 1612855 for the 1.8 branch. You'll find a snapshot soon 
here:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox/1.8.7-SNAPSHOT/
Please comment and/or reopen if it doesn't work properly.



Jenkins build is back to normal : PDFBox 1.8.x » Apache PDFBox #225

2014-07-23 Thread Apache Jenkins Server
See 




Jenkins build is back to normal : PDFBox 1.8.x #225

2014-07-23 Thread Apache Jenkins Server
See 



Re: [jira] [Resolved] (PDFBOX-2101) Surprising memory consumption when extracting images

2014-07-23 Thread Tilman Hausherr

Hi Tim,
if you're working with pages (PDPage), you can also call .clear() after 
you're done.

Tilman

On 23.07.2014 18:26, Allison, Timothy B. wrote:

Andreas and Tilman,

   Thank you very much for fixing this so quickly.  I'm finally getting around 
to figuring out if we should change anything in the Tika code based on your 
fixes.  If I follow the example of the latest ExtractImages for the 1.8x 
branch, I think I see that we should add:

1) resources.clear() at the end of processResources()
2) image.clear() after image.write2File()

Is there anything else that our client code should do to decrease the memory 
footprint during extraction of images?  Thank you, again!

  Best,

   Tim


From: Andreas Lehmkühler (JIRA) [j...@apache.org]
Sent: Sunday, June 15, 2014 7:36 AM
To: dev@pdfbox.apache.org
Subject: [jira] [Resolved] (PDFBOX-2101) Surprising memory consumption when 
extracting images

  [ 
https://issues.apache.org/jira/browse/PDFBOX-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler resolved PDFBOX-2101.


 Resolution: Fixed


Surprising memory consumption when extracting images


 Key: PDFBOX-2101
 URL: https://issues.apache.org/jira/browse/PDFBOX-2101
 Project: PDFBox
  Issue Type: Bug
  Components: Utilities
Affects Versions: 1.8.5
 Environment: Windows 7
java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
Reporter: Tim Allison
Assignee: Andreas Lehmkühler
Priority: Minor
 Fix For: 1.8.6, 2.0.0

 Attachments: 239665.pdf, PDFBOX-2101-298-good.jpg, 
PDFBOX-2101-714-poor.jpg, java.hprof.zip


ExtractImages seems to fail to release memory resources on some files in both 
PDFBox 1.8.5 and trunk.
On this 4MB file 
[http://digitalcorpora.org/corp/nps/files/govdocs1/239/239665.pdf], extracting 
every image on every page (and there are many, many duplicate images) causes an 
OOM with -Xmx1g.  If no -Xmx is set and more than 2.5g is available, 
ExtractImages will work.
With some experimentation, the triggers seem to be JPEG images that have masks. 
 I'm not sure, though, whether the issue is with PDFBox or Java.
Command lines:
1.8.5:
java -Xmx1g -cp pdfbox-app-1.8.5.jar org.apache.pdfbox.ExtractImages 239665.pdf
2.0_SNAPSHOT:
java -Xmx1g -cp pdfbox-app-2.0.0-SNAPSHOT.jar 
org.apache.pdfbox.tools.ExtractImages -addkey 239665.pdf
Results:
1.8.5: 906 files before OOM
{noformat}
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
 at java.util.Arrays.copyOf(Arrays.java:2271)
 at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
 at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
 at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
 at org.apache.pdfbox.pdmodel.common.PDStream.getByteArray(PDStream.java:514)
 at org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:217)
 at org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.write2OutputStream(PDPixelMap.java:363)
 at org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage.write2file(PDXObjectImage.java:254)
 at org.apache.pdfbox.ExtractImages.processResources(ExtractImages.java:202)
 at org.apache.pdfbox.ExtractImages.extractImages(ExtractImages.java:160)
 at org.apache.pdfbox.ExtractImages.main(ExtractImages.java:65)
{noformat}
2.0_SNAPSHOT: 428 files before OOM
{noformat}
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
 at java.util.Arrays.copyOf(Arrays.java:2271)
 at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
 at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
 at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
 at org.apache.pdfbox.io.IOUtils.copy(IOUtils.java:70)
 at org.apache.pdfbox.io.IOUtils.toByteArray(IOUtils.java:52)
 at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.from8bit(SampledImageReader.java:171)
 at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:154)
 at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:171)
 at org.apache.pdfbox.tools.ExtractImages.write2file(ExtractImages.java:231)
 at org.apache.pdfbox.tools.ExtractImages.processResources(ExtractImages.java:206)
 at org.apache.pdfbox.tools.ExtractImages.extractImages(ExtractImages.java:164)
 at org.apache.pdfbox.tools.ExtractImages.main(ExtractImag

[jira] [Created] (PDFBOX-2237) java.io.IOException: Image stream is empty for inline image

2014-07-23 Thread Tilman Hausherr (JIRA)
Tilman Hausherr created PDFBOX-2237:
---

 Summary: java.io.IOException: Image stream is empty for inline 
image
 Key: PDFBOX-2237
 URL: https://issues.apache.org/jira/browse/PDFBOX-2237
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
 Fix For: 2.0.0
 Attachments: PDFBOX-2237-041715.pdf

The attached file throws an exception:
{code}
java.io.IOException: Image stream is empty
    at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:117)
    at org.apache.pdfbox.pdmodel.graphics.image.PDInlineImage.getImage(PDInlineImage.java:234)
    at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1082)
    at org.apache.pdfbox.util.operator.graphics.BeginInlineImage.process(BeginInlineImage.java:40)
{code}
The reason is this:
{code}
BI
/H 1 /W 256 /CS /RGB /BPC 8 /F []
...
{code}
The empty filter array is processed incorrectly in PDInlineImage, which results 
in an empty output stream. Treating an empty array the same way as null 
solves the problem.
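
The kind of check described above might look roughly like the following
sketch. It is not the PDFBox source: InlineImageFilters and decode() are
hypothetical names, and real filter decoding is deliberately left out.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for inline image filter handling; not the PDFBox source.
final class InlineImageFilters {
    // Returns the raw bytes unchanged when no filter applies.
    // An empty list ("/F []") is treated exactly like a missing /F entry (null).
    static byte[] decode(byte[] raw, List<String> filters) {
        if (filters == null || filters.isEmpty()) {
            return raw;
        }
        // Actual filter decoding is out of scope for this sketch.
        throw new UnsupportedOperationException("filter decoding not modelled: " + filters);
    }

    public static void main(String[] args) {
        byte[] raw = {10, 20, 30};
        System.out.println(decode(raw, null).length);
        System.out.println(decode(raw, Collections.<String>emptyList()).length);
    }
}
```

Without the isEmpty() test, an empty list would fall through to the filter
path even though there is nothing to decode.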





[jira] [Updated] (PDFBOX-2237) java.io.IOException: Image stream is empty for inline image

2014-07-23 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2237:


Attachment: PDFBOX-2237-041715.pdf



[jira] [Commented] (PDFBOX-2237) java.io.IOException: Image stream is empty for inline image

2014-07-23 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072079#comment-14072079
 ] 

Tilman Hausherr commented on PDFBOX-2237:
-

Done in rev 1612907 for the trunk.



RE: [jira] [Resolved] (PDFBOX-2101) Surprising memory consumption when extracting images

2014-07-23 Thread Allison, Timothy B.
Got it.  Will do.  Thank you.


From: Tilman Hausherr [thaush...@t-online.de]
Sent: Wednesday, July 23, 2014 1:28 PM
To: dev@pdfbox.apache.org
Subject: Re: [jira] [Resolved] (PDFBOX-2101) Surprising memory consumption when 
extracting images

Hi Tim,
if you're working with pages (PDPage), you can also call .clear() after
you're done.
Tilman


[jira] [Updated] (PDFBOX-2237) java.io.IOException: Image stream is empty for inline image

2014-07-23 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2237:


Fix Version/s: 1.8.7

> java.io.IOException: Image stream is empty for inline image
> ---
>
> Key: PDFBOX-2237
> URL: https://issues.apache.org/jira/browse/PDFBOX-2237
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 1.8.6, 1.8.7, 2.0.0
> Reporter: Tilman Hausherr
> Assignee: Tilman Hausherr
> Fix For: 1.8.7, 2.0.0
>
> Attachments: PDFBOX-2237-041715.pdf
>
>
> The attached file throws an exception:
> {code}
> java.io.IOException: Image stream is empty
>     at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:117)
>     at org.apache.pdfbox.pdmodel.graphics.image.PDInlineImage.getImage(PDInlineImage.java:234)
>     at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1082)
>     at org.apache.pdfbox.util.operator.graphics.BeginInlineImage.process(BeginInlineImage.java:40)
> {code}
> The reason is this:
> {code}
> BI
> /H 1 /W 256 /CS /RGB /BPC 8 /F []
> ...
> {code}
> The empty filter array is processed incorrectly in PDInlineImage, which 
> results in an empty output stream. Treating an empty array the same way as 
> null solves the problem.





[jira] [Updated] (PDFBOX-2237) java.io.IOException: Image stream is empty for inline image

2014-07-23 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2237:


Affects Version/s: 1.8.7
   1.8.6



[jira] [Commented] (PDFBOX-2237) java.io.IOException: Image stream is empty for inline image

2014-07-23 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072105#comment-14072105
 ] 

Tilman Hausherr commented on PDFBOX-2237:
-

The exception is different in 1.8
{code}
java.lang.ArrayIndexOutOfBoundsException
    at java.lang.System.arraycopy(Native Method)
    at org.apache.pdfbox.pdmodel.graphics.xobject.PDInlinedImage.createImage(PDInlinedImage.java:218)
    at org.apache.pdfbox.util.operator.pagedrawer.BeginInlineImage.process(BeginInlineImage.java:69)
{code}
but the fix is the same. Done in rev 1612911.



[jira] [Resolved] (PDFBOX-2237) java.io.IOException: Image stream is empty for inline image

2014-07-23 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-2237.
-

Resolution: Fixed



[jira] [Commented] (PDFBOX-1511) pdfMerger App produces Garbage

2014-07-23 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072175#comment-14072175
 ] 

Tilman Hausherr commented on PDFBOX-1511:
-

Although you made a diff to the wrong version, I think I see what would have to 
be changed. The previous strategy is indeed risky, and I wonder why there 
haven't been any complaints except here? Many PDF files name their images 
"Im1", "Im2", etc. Is there anybody here who does a lot of merging, and could 
test a modification?
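
To make the naming risk concrete, here is a small sketch of one way to keep
equally named resources from different source documents apart while merging.
The thread does not spell out the actual modification, so this is only an
assumption about the general idea; ResourceNameMerge and
mergeWithUniqueNames() are hypothetical names, and plain string maps stand in
for PDF resource dictionaries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ResourceNameMerge {
    // Adds every source entry to target. A name already used in target gets a
    // fresh name instead of being assumed to refer to the same resource.
    // Returns old name -> new name for each source entry, since content
    // streams that refer to the old names would have to be updated.
    static Map<String, String> mergeWithUniqueNames(Map<String, String> target,
                                                    Map<String, String> source) {
        Map<String, String> renames = new LinkedHashMap<String, String>();
        for (Map.Entry<String, String> entry : source.entrySet()) {
            String name = entry.getKey();
            int suffix = 1;
            while (target.containsKey(name)) {
                name = entry.getKey() + "_" + suffix++;
            }
            target.put(name, entry.getValue());
            renames.put(entry.getKey(), name);
        }
        return renames;
    }

    public static void main(String[] args) {
        Map<String, String> target = new LinkedHashMap<String, String>();
        target.put("Im1", "image from first document");
        Map<String, String> source = new LinkedHashMap<String, String>();
        source.put("Im1", "image from second document");
        source.put("Im2", "another image from second document");
        System.out.println(mergeWithUniqueNames(target, source));
        System.out.println(target.keySet());
    }
}
```

Two files that both call their first image "Im1" then end up with two
distinct entries in the merged result instead of one overwriting or shadowing
the other.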

> pdfMerger App produces Garbage
> --
>
> Key: PDFBOX-1511
> URL: https://issues.apache.org/jira/browse/PDFBOX-1511
> Project: PDFBox
> Issue Type: Bug
> Components: Utilities
> Affects Versions: 1.7.1
> Environment: Win XP; Windows Server 2008 R2; java version "1.6.0_21", 
> Reporter: Michael Huber
> Attachments: 1.pdf, 2.pdf, PDFMergerUtility.java, 
> PDFMergerUtility.java.diff, PdfRenderer.java, targetPdfMergeJava.pdf, 
> targetPdfMergeUtilityApp.pdf
>
>
> pdfbox Utility pdfMerger produces a merged document containing garbage. All 
> merged pdf files are contained but Strings are destroyed.
> The source pdf files are created with graphviz and are readable without error 
> or disturbance both with Acrobat X and pdfbox pdfDebug Utility.
> Another astoundig thing is that a handcoded merger using pdfMergerUtility 
> class works fine when run within Eclipse Juno and creates same garbage when 
> run from cmd line (pls. see attached source)
> I checked everything that comes in mind to find the differences, e.g. Java 
> version, encoding/codepage issues, memory settings, found nothing.





[jira] [Updated] (PDFBOX-1511) pdfMerger App produces Garbage

2014-07-23 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-1511:


Description: 
pdfbox Utility pdfMerger produces a merged document containing garbage. All 
merged pdf files are contained but Strings are destroyed.

The source pdf files are created with graphviz and are readable without error 
or disturbance both with Acrobat X and pdfbox pdfDebug Utility.

Another astounding thing is that a handcoded merger using pdfMergerUtility 
class works fine when run within Eclipse Juno and creates same garbage when run 
from cmd line (pls. see attached source PdfRenderer.java)

I checked everything that comes in mind to find the differences, e.g. Java 
version, encoding/codepage issues, memory settings, found nothing.

  was:
pdfbox Utility pdfMerger produces a merged document containing garbage. All 
merged pdf files are contained but Strings are destroyed.

The source pdf files are created with graphviz and are readable without error 
or disturbance both with Acrobat X and pdfbox pdfDebug Utility.

Another astoundig thing is that a handcoded merger using pdfMergerUtility class 
works fine when run within Eclipse Juno and creates same garbage when run from 
cmd line (pls. see attached source)

I checked everything that comes in mind to find the differences, e.g. Java 
version, encoding/codepage issues, memory settings, found nothing.


> pdfMerger App produces Garbage
> --
>
> Key: PDFBOX-1511
> URL: https://issues.apache.org/jira/browse/PDFBOX-1511
> Project: PDFBox
> Issue Type: Bug
> Components: Utilities
> Affects Versions: 1.7.1
> Environment: Win XP; Windows Server 2008 R2; java version "1.6.0_21", 
> Reporter: Michael Huber
> Attachments: 1.pdf, 2.pdf, PDFMergerUtility.java, 
> PDFMergerUtility.java.diff, PdfRenderer.java, targetPdfMergeJava.pdf, 
> targetPdfMergeUtilityApp.pdf
>
>
> pdfbox Utility pdfMerger produces a merged document containing garbage. All 
> merged pdf files are contained but Strings are destroyed.
> The source pdf files are created with graphviz and are readable without error 
> or disturbance both with Acrobat X and pdfbox pdfDebug Utility.
> Another astounding thing is that a handcoded merger using pdfMergerUtility 
> class works fine when run within Eclipse Juno and creates same garbage when 
> run from cmd line (pls. see attached source PdfRenderer.java)
> I checked everything that comes in mind to find the differences, e.g. Java 
> version, encoding/codepage issues, memory settings, found nothing.





[jira] [Commented] (PDFBOX-1511) pdfMerger App produces Garbage

2014-07-23 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072875#comment-14072875
 ] 

Maruan Sahyoun commented on PDFBOX-1511:


[~tilman] I could test a modification against a set of files we are using for a 
customer where we are merging banking documents. I'm not using PDFMergerUtility 
though. 

 



[jira] [Commented] (PDFBOX-1511) pdfMerger App produces Garbage

2014-07-23 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072902#comment-14072902
 ] 

Tilman Hausherr commented on PDFBOX-1511:
-

Ok, committed for the trunk only as rev 1613017. NOT done for the 1.8 branch 
yet.
