Re: Jenkins Error

2014-09-11 Thread Andreas Lehmkühler
Hi,


 John Hewson j...@jahewson.com wrote on 10 September 2014 at 22:11:


 I’m getting strange build errors on Jenkins with HTTP 401 “Unauthorized” from
 https://repository.apache.org.

 Here’s the log:

 [ERROR] Failed to execute goal
 org.apache.maven.plugins:maven-deploy-plugin:2.8.1:deploy (default-deploy) on
 project pdfbox-parent: Failed to deploy artifacts: Could not transfer artifact
 org.apache.pdfbox:pdfbox-parent:pom:2.0.0-20140910.200319-587 from/to
 apache.snapshots.https
 (https://repository.apache.org/content/repositories/snapshots): Failed to
 transfer file:
 https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-parent/2.0.0-SNAPSHOT/pdfbox-parent-2.0.0-20140910.200319-587.pom.
 Return code is: 401, ReasonPhrase: Unauthorized. - [Help 1]

According to infra@ there was an issue with Nexus yesterday, which should be
solved by now. I've already triggered a build manually to check that.

 -- John


BR
Andreas Lehmkühler


Jenkins build is back to normal : PDFBox-trunk » PDFBox parent #1264

2014-09-11 Thread Apache Jenkins Server
See 
https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$pdfbox-parent/1264/



[jira] [Commented] (PDFBOX-2340) Overhaul PDFBox Documentation

2014-09-11 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129732#comment-14129732
 ] 

Maruan Sahyoun commented on PDFBOX-2340:


For now I’ll take a slightly different approach, keeping the current web site as 
is, with the Cookbook becoming a kind of detached microsite. That way the changes 
are minimized. At a later stage we can still reintegrate it and drive it directly 
from the examples package if we choose to do so.

 Overhaul PDFBox Documentation
 -

 Key: PDFBOX-2340
 URL: https://issues.apache.org/jira/browse/PDFBOX-2340
 Project: PDFBox
  Issue Type: Improvement
  Components: Documentation
Reporter: Maruan Sahyoun
 Attachments: Mockup_Documentation.png


 In order to make it easier for users of PDFBox to work with the library, there 
 shall be enhanced documentation consisting of an introduction, API 
 references, and more well-documented examples and code snippets (Cookbook).
 In order to make it easier to contribute, the Cookbook shall be built 
 automatically from the examples/snippet 'repository'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: PDFBox 1.8.7 release?

2014-09-11 Thread Maruan Sahyoun
Hi Andreas,

what are your current plans to cut the new release? Dependent on that I could 
do https://issues.apache.org/jira/browse/PDFBOX-91 [Comb Fields] as a quick fix 
this weekend to the 1.8 branch.

BR
Maruan

On 14.08.2014 at 09:08, Andreas Lehmkühler andr...@lehmi.de wrote:

 Andreas Lehmkühler andr...@lehmi.de wrote on 7 August 2014 at 12:35:

 Hi,

 there is already a number of solved issues and I guess it's
 time for a new bugfix release.

 I'm working on PDFBOX-2250 and I'd like to finish that
 first but how about a new release in 2 or 3 weeks from now?

 WDYT?

 As there weren't any objections, I'm targeting the first week of September to
 cut the release.
  
 BR
 Andreas Lehmkühler



[jira] [Commented] (PDFBOX-2337) Add an example for highlighting text based on a string

2014-09-11 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129762#comment-14129762
 ] 

Maruan Sahyoun commented on PDFBOX-2337:


Do we really need an ICLA / CCLA in this case? As per [~lehmi]’s comment on the 
users mailing list, that might not be necessary.

Could we come up with a licensing header for such cases? Others might be 
interested in writing up samples or helping to enhance the documentation. I would 
like to see us make this process as simple as possible.

I’d also vote for taking this sample as is and including it in 1.8.x, if that 
doesn’t hold up the release process for long, and fixing it for 2.0 after that. 
That could be done by Joël or us.

WDYT?

 Add an example for highlighting text based on a string 
 ---

 Key: PDFBOX-2337
 URL: https://issues.apache.org/jira/browse/PDFBOX-2337
 Project: PDFBox
  Issue Type: New Feature
  Components: Utilities
Reporter: Joël Kuiper

 An often heard request is to be able to highlight a certain text within a PDF 
 programmatically, similar to the highlight functionality in Acrobat or 
 Preview.app.
 The actual implementation of this functionality is trickier than it appears, 
 since it requires the calculation of bounding boxes from TextPositions. 
 An example class may help people with implementing this (common) 
 functionality. 
 (see for example this discussion 
 https://mail-archives.apache.org/mod_mbox/pdfbox-users/201409.mbox/%3CC8340BB9-E299-4A76-A50B-6155504A0D5B%40joelkuiper.eu%3E)
  





[jira] [Commented] (PDFBOX-1614) Digitally sign PDFs without file system access

2014-09-11 Thread Andrei Solntsev (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129782#comment-14129782
 ] 

Andrei Solntsev commented on PDFBOX-1614:
-

Hi!
Can you say when PDFBOX 2.0 will be released? We are waiting for this feature 
to be available.

 Digitally sign PDFs without file system access
 --

 Key: PDFBOX-1614
 URL: https://issues.apache.org/jira/browse/PDFBOX-1614
 Project: PDFBox
  Issue Type: Improvement
  Components: Signing
Affects Versions: 1.8.1
Reporter: Thierry Boschat
Assignee: Thomas Chojecki
 Fix For: 2.0.0


 Hi, I'm using pdfbox-1.8.1 to digitally sign PDFs.
 I found the sample below for doing this.
 But in this example I have to use a FileInputStream, whereas I want to do it 
 only through streams (without any file system access). I tried to extend 
 FileInputStream to deal with it but I failed. Any tips for me about that 
 problem?
 Thanks.
 // copy buffer (its declaration is missing from the snippet as posted)
 byte[] buffer = new byte[8 * 1024];
 File outputDocument = new File("resources/signed" + document.getName());
 FileInputStream fis = new FileInputStream(document);
 FileOutputStream fos = new FileOutputStream(outputDocument);
 // copy the unsigned document to the output file
 int c;
 while ((c = fis.read(buffer)) != -1)
 {
   fos.write(buffer, 0, c);
 }
 fis.close();
 fis = new FileInputStream(outputDocument);
 // load document
 PDDocument doc = PDDocument.load(document);
 // create signature dictionary
 PDSignature signature = new PDSignature();
 signature.setFilter(PDSignature.FILTER_ADOBE_PPKLITE); // default filter
 // subfilter for basic and PAdES Part 2 signatures
 signature.setSubFilter(PDSignature.SUBFILTER_ADBE_PKCS7_DETACHED);
 signature.setName("signer name");
 signature.setLocation("signer location");
 signature.setReason("reason for signature");
 // the signing date, needed for valid signature
 signature.setSignDate(Calendar.getInstance());
 // register signature dictionary and sign interface
 doc.addSignature(signature, this);
 // write incremental (only for signing purpose)
 doc.saveIncremental(fis, fos);
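
 For illustration only, the file-to-file copy loop in the sample can be done entirely in memory with standard JDK streams. This is a minimal sketch of that copy step alone; the class and method names (InMemoryCopy, readAll) are made up for the example. It does not show PDFBox accepting such streams for signing, which is exactly what this issue asks for.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox.
final class InMemoryCopy {

    /** Reads the whole stream into a byte array without touching the file system. */
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8 * 1024];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a PDF that arrives as a stream (e.g. from a network socket).
        byte[] original = "%PDF-1.4 dummy".getBytes(StandardCharsets.US_ASCII);
        byte[] copy = readAll(new ByteArrayInputStream(original));

        // A fresh ByteArrayInputStream over the copy can be created as often as needed,
        // which replaces re-opening the output file with a second FileInputStream.
        InputStream rereadable = new ByteArrayInputStream(copy);
        System.out.println(copy.length + " bytes buffered, first byte: " + (char) rereadable.read());
    }
}
```

 The trade-off is that the whole document is held in memory, which may matter for large files.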





[jira] [Commented] (PDFBOX-2337) Add an example for highlighting text based on a string

2014-09-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129830#comment-14129830
 ] 

Andreas Lehmkühler commented on PDFBOX-2337:


{quote}
Do we really need a ICLA / CCLA in this case?
{quote}
There are no explicit rules to decide that, just a rule of thumb: only 
substantial changes require signing a CLA. IMO an example on how to use PDFBox 
doesn't qualify for that. It simply uses PDFBox and doesn't add any new 
features.

{quote}
The Apache header you've used is for ASF projects, e.g. Licensed to the Apache 
Software Foundation (ASF) under one or more contributor license agreements. 
isn't true unless you've signed a CLA and contributed this code to Apache.
{quote}
Good catch. That should be changed to the more general version of the AL 2.0 
which can be found [here|http://apache.org/licenses/LICENSE-2.0] in the 
APPENDIX section at the end.

{quote}
I’d also vote for taking this sample as is and include that in 1.8.x 
{quote}
+1, we agreed to stop adding any new features to the 1.8 branch. But as this is 
just an example I don't see any reason not to add it.



Re: Jenkins Error

2014-09-11 Thread Andreas Lehmkühler
Hi,

infra@ just blogged about the incident

https://blogs.apache.org/infra/entry/nexus_reduced_performance_issues_resolved

BR
Andreas Lehmkühler

 Andreas Lehmkühler andr...@lehmi.de wrote on 11 September 2014 at 08:45:


 Hi,


  John Hewson j...@jahewson.com wrote on 10 September 2014 at 22:11:

  I’m getting strange build errors on Jenkins with HTTP 401 “Unauthorized” from
  https://repository.apache.org.

  Here’s the log:

  [ERROR] Failed to execute goal
  org.apache.maven.plugins:maven-deploy-plugin:2.8.1:deploy (default-deploy) on
  project pdfbox-parent: Failed to deploy artifacts: Could not transfer artifact
  org.apache.pdfbox:pdfbox-parent:pom:2.0.0-20140910.200319-587 from/to
  apache.snapshots.https
  (https://repository.apache.org/content/repositories/snapshots): Failed to
  transfer file:
  https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-parent/2.0.0-SNAPSHOT/pdfbox-parent-2.0.0-20140910.200319-587.pom.
  Return code is: 401, ReasonPhrase: Unauthorized. - [Help 1]

 According to infra@ there was an issue with Nexus yesterday, which should be
 solved by now. I've already triggered a build manually to check that.

  -- John


 BR
 Andreas Lehmkühler


Re: PDFBox 1.8.7 release?

2014-09-11 Thread Andreas Lehmkühler
Hi Maruan,

 Maruan Sahyoun sahy...@fileaffairs.de wrote on 11 September 2014 at 09:32:


 Hi Andreas,

 what are your current plans to cut the new release? Dependent on that I could
 do https://issues.apache.org/jira/browse/PDFBOX-91 [Comb Fields] as a quick
 fix this weekend to the 1.8 branch.

I'm targeting next week, so that your plan would fit in.

 BR
 Maruan

BR
Andreas Lehmkühler


 On 14.08.2014 at 09:08, Andreas Lehmkühler andr...@lehmi.de wrote:

  Andreas Lehmkühler andr...@lehmi.de wrote on 7 August 2014 at 12:35:

  Hi,

  there is already a number of solved issues and I guess it's
  time for a new bugfix release.

  I'm working on PDFBOX-2250 and I'd like to finish that
  first but how about a new release in 2 or 3 weeks from now?

  WDYT?

  As there weren't any objections, I'm targeting the first week of September to
  cut the release.
  
  BR
  Andreas Lehmkühler



[jira] [Commented] (PDFBOX-2301) RandomAccessBuffer consumes too much memory.

2014-09-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129848#comment-14129848
 ] 

Andreas Lehmkühler commented on PDFBOX-2301:


The original issue (PDFBOX-1625) was about merging PDFs, and the underlying issue 
was the usage of the scratch file when merging PDFs (see PDFBOX-1586). 
PDFBOX-1586 reduces the usage of the scratch file and PDFBOX-1625 tries to 
detach the source and the destination PDF by cloning. Both are just workarounds, 
and PDFBOX-1625 has some side effects.
IMHO, we have to overhaul the stream handling within the COS layer, and we 
shouldn't expose the scratch file anymore. The whole thing should be handled 
under the hood. The only thing the user should decide is whether to use the file 
system or memory as the temp area.

 RandomAccessBuffer consumes too much memory.
 

 Key: PDFBOX-2301
 URL: https://issues.apache.org/jira/browse/PDFBOX-2301
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Reporter: gee
 Attachments: clone.diff


 RandomAccessBuffer holds the uncompressed image during operation, because that 
 is exactly what PDFBox's ExtractImages does.
 But holding the uncompressed image in memory instead of the compressed one 
 consumes too much memory, not to mention the many PDF XObjects that can use a 
 filter to compress themselves. It would be good if PDFBox provided an option 
 to revert to the COSObject state just before the RandomAccess object was 
 created (the state in which the PDF XObject stream has been parsed and the 
 COSDictionary objects have not been created because the user has not requested 
 them with the get() method). This is a crucial feature for letting PDFBox 
 analyze huge PDF files (100MB).
 In the current source, one must close the COSStream unless it is required (and 
 I know a closed stream cannot be reopened).
 Class Name                                                       | Shallow Heap | Retained Heap
 -------------------------------------------------------------------------------------------------
 org.apache.pdfbox.cos.COSObject @ 0x5ad4940                      |           24 |     8,187,264
 |- class class org.apache.pdfbox.cos.COSObject @ 0x58c4020       |            0 |             0
 |- generationNumber org.apache.pdfbox.cos.COSInteger @ 0x5ad0080 |           24 |            24
 |- baseObject org.apache.pdfbox.cos.COSStream @ 0x5b25ea0        |           32 |     8,187,216
 |  |- class class org.apache.pdfbox.cos.COSStream @ 0x58c3e00    |            8 |             8
 |  |- items java.util.LinkedHashMap @ 0x5b2a0f0                  |           56 |           552
 |  |- file org.apache.pdfbox.io.RandomAccessBuffer @ 0x5b2a128

[jira] [Created] (PDFBOX-2341) WriteDecodedDoc cant decrypt pdf correctly

2014-09-11 Thread simon steiner (JIRA)
simon steiner created PDFBOX-2341:
-

 Summary: WriteDecodedDoc cant decrypt pdf correctly
 Key: PDFBOX-2341
 URL: https://issues.apache.org/jira/browse/PDFBOX-2341
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 2.0.0
Reporter: simon steiner


java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar 
WriteDecodedDoc aes256_57.pdf tmp.pdf

Kind Regards missing





[jira] [Updated] (PDFBOX-2341) WriteDecodedDoc cant decrypt pdf correctly

2014-09-11 Thread simon steiner (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

simon steiner updated PDFBOX-2341:
--
Attachment: aes256_57.pdf



[jira] [Updated] (PDFBOX-2341) WriteDecodedDoc cant decrypt pdf correctly

2014-09-11 Thread simon steiner (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

simon steiner updated PDFBOX-2341:
--
Description: 
java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar 
WriteDecodedDoc aes256_57.pdf tmp.pdf

Kind Regards missing

I guess you will ask me to use nonseq

  was:
java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar 
WriteDecodedDoc aes256_57.pdf tmp.pdf

Kind Regards missing




[jira] [Commented] (PDFBOX-2341) WriteDecodedDoc cant decrypt pdf correctly

2014-09-11 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129876#comment-14129876
 ] 

Tilman Hausherr commented on PDFBOX-2341:
-

There's a part cut off. End of correct stream:
{code}
BT
/F15 10 Tf
1 0 0 -1 0 359.34899902 Tm [002000130016001B00010021000600170004000C001B0012] 
TJ
1 0 0 -1 0 371.34899902 Tm 
[00220023002400250026002700010024000300260025002800250027002900210024 74 
0029 18 002A0021] TJ
ET
Q
{code}

end of bad stream:
{code}
BT
/F15 10 Tf
1 0 0 -1 0 359.34899902 Tm [002000130016001B00010021000600170004000C001B0012] 
TJ
1 0 0 -1 0 371.34899902 Tm [002200230024002500260027000100
{code}

However, PDFDebugger shows the correct contents (which is where I got the above 
data). PDFReader can't render the file: it gets the bad stream and throws the 
exception "Missing closing bracket for hex string".



[jira] [Created] (PDFBOX-2342) WriteDecodedDoc cant decrypt pdf form correctly

2014-09-11 Thread simon steiner (JIRA)
simon steiner created PDFBOX-2342:
-

 Summary: WriteDecodedDoc cant decrypt pdf form correctly
 Key: PDFBOX-2342
 URL: https://issues.apache.org/jira/browse/PDFBOX-2342
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 2.0.0
Reporter: simon steiner


java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar 
WriteDecodedDoc -nonSeq test.pdf

country selection is wrong





[jira] [Updated] (PDFBOX-2342) WriteDecodedDoc cant decrypt pdf form correctly

2014-09-11 Thread simon steiner (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

simon steiner updated PDFBOX-2342:
--
Attachment: test.pdf



[jira] [Commented] (PDFBOX-2337) Add an example for highlighting text based on a string

2014-09-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129931#comment-14129931
 ] 

Joël Kuiper commented on PDFBOX-2337:
-

I'll change the License header and code style (assuming the Eclipse settings in 
the repo are still up to date), also I'd be happy to sign a CLA if needed. 

I could port the functionality to 2.0, however I need (a modified version) of 
this code in production which still runs 1.8 … so I'll probably only do that 
after a public release of 2.0 (even a pre-release would be fine, as long as it 
in Maven) 





[jira] [Comment Edited] (PDFBOX-2337) Add an example for highlighting text based on a string

2014-09-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129931#comment-14129931
 ] 

Joël Kuiper edited comment on PDFBOX-2337 at 9/11/14 11:56 AM:
---

I'll change the License header and code style (assuming the Eclipse settings in 
the repo are still up to date), also I'd be happy to sign a CLA if needed. 

I could port the functionality to 2.0, however I need (a modified version) of 
this code in production which still runs 1.8 … so I'll probably only do that 
after a public release of 2.0 (even a pre-release would be fine, as long as it 
is in Maven) 




was (Author: joelkuiper):
I'll change the License header and code style (assuming the Eclipse settings in 
the repo are still up to date), also I'd be happy to sign a CLA if needed. 

I could port the functionality to 2.0, however I need (a modified version) of 
this code in production which still runs 1.8 … so I'll probably only do that 
after a public release of 2.0 (even a pre-release would be fine, as long as it 
in Maven) 





[jira] [Commented] (PDFBOX-2301) RandomAccessBuffer consumes too much memory.

2014-09-11 Thread Timo Boehme (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129954#comment-14129954
 ] 

Timo Boehme commented on PDFBOX-2301:
-

I'm also interested in fixing the RandomAccess buffer issue, since the cloning 
workaround seems a bit problematic to me too. I tried to understand the 
issue behind this workaround and came to the following conclusions:
# the problem is that the buffer stream is closed when a document is closed, while 
its COS objects may still be used somewhere else and need access to the buffered data
# the workaround is implemented for COSStream objects only and clones the buffer 
if it is a RandomAccessBuffer; for a RandomAccessFile this is not done, 
so there the problem from point 1 still persists (?)
# since only the COSStream will access the data it writes into the buffer, why not 
simply create a new buffer object instead of cloning? At least this would not 
duplicate data that is already in the buffer but not used by the stream. The buffer 
will be garbage collected when there is no reference left to the COSStream or to any 
input stream created on the buffer

WDYT?
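
To make point 3 concrete, here is a minimal sketch of the memory difference between cloning a shared buffer and starting with a fresh one. MemBuffer is a hypothetical stand-in written for this example; it is not PDFBox's RandomAccessBuffer and makes no claim about how that class is implemented.

```java
import java.io.ByteArrayOutputStream;

final class BufferChoiceSketch {

    /** Hypothetical, simplified in-memory buffer (an assumption for illustration only). */
    static final class MemBuffer {
        private final ByteArrayOutputStream data = new ByteArrayOutputStream();

        void write(byte[] bytes) {
            data.write(bytes, 0, bytes.length);
        }

        int size() {
            return data.size();
        }

        /** Cloning duplicates every byte already held, whether or not the caller needs it. */
        MemBuffer copy() {
            MemBuffer clone = new MemBuffer();
            clone.write(data.toByteArray());
            return clone;
        }
    }

    public static void main(String[] args) {
        MemBuffer shared = new MemBuffer();
        shared.write(new byte[1_000_000]);   // data written earlier by other streams

        MemBuffer cloned = shared.copy();    // workaround style: detach by cloning
        MemBuffer fresh = new MemBuffer();   // suggested alternative: start empty

        byte[] ownData = {1, 2, 3};          // what this one stream actually writes
        cloned.write(ownData);
        fresh.write(ownData);

        System.out.println("cloned buffer holds " + cloned.size() + " bytes");
        System.out.println("fresh buffer holds  " + fresh.size() + " bytes");
    }
}
```

In this toy model the cloned buffer carries the full megabyte of unrelated data, while the fresh buffer holds only the three bytes the stream wrote itself.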


[jira] [Created] (PDFBOX-2343) Giving NullPoint exception when we call PDType1Font.HELVETICA_BOLD.getStringWidth(Some String)

2014-09-11 Thread Gayan Wijenayaka (JIRA)
Gayan Wijenayaka created PDFBOX-2343:


 Summary: Giving NullPoint exception when we call 
PDType1Font.HELVETICA_BOLD.getStringWidth(Some String)
 Key: PDFBOX-2343
 URL: https://issues.apache.org/jira/browse/PDFBOX-2343
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Gayan Wijenayaka


When we call PDType1Font.HELVETICA_BOLD.getStringWidth(Some String), it 
throws java.lang.NullPointerException after the 
pdfbox-app-2.0.0-20140903.210612-518 release. Could you please fix this issue?





[jira] [Resolved] (PDFBOX-2341) WriteDecodedDoc cant decrypt pdf correctly

2014-09-11 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-2341.
-
Resolution: Won't Fix
  Assignee: Tilman Hausherr

The old parser works as described; I traced through it, saved intermediate 
files etc. What really happened is a border case that can't be solved without 
creating trouble elsewhere: the encrypted stream has a hex 0D at the end. In 
the sequential parser that 0x0D isn't taken into the stream because it is 
assumed not to belong to it, i.e. one expects stream-data 0D 0A 
endstream, or stream-data 0D endstream. The non-sequential parser just takes 
the stream according to the length.

To prove that theory I changed your file so that there is one more 0D in the 
stream. Now it works with the sequential parser. This suggests that the 
missing 0D matters to the decryption routine. I also did the opposite 
test, changing the length of the stream so that the 0D isn't included, and now it 
failed with the non-sequential parser.

So, long story short: use nonseq :-)
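
The difference between the two strategies can be sketched in a few lines. This is a simplified illustration, not the code of either PDFBox parser: the class and method names are made up, and the scan-based variant only models the rule described above (drop one trailing 0D 0A, 0D or 0A before the endstream keyword).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class StreamEndSketch {
    private static final byte[] ENDSTREAM = "endstream".getBytes(StandardCharsets.US_ASCII);

    /** Length-based strategy: trust the stream length and copy exactly that many bytes. */
    public static byte[] readByLength(byte[] file, int start, int length) {
        return Arrays.copyOfRange(file, start, start + length);
    }

    /** Scan-based strategy: stop at "endstream" and drop one trailing EOL (0D 0A, 0D or 0A). */
    public static byte[] readByScan(byte[] file, int start) {
        int end = indexOf(file, ENDSTREAM, start);
        if (end < 0) {
            throw new IllegalArgumentException("no endstream keyword found");
        }
        if (end > start && file[end - 1] == 0x0A) {
            end--;
        }
        if (end > start && file[end - 1] == 0x0D) {
            end--;
        }
        return Arrays.copyOfRange(file, start, end);
    }

    /** Returns the first index of needle in haystack at or after from, or -1 if absent. */
    private static int indexOf(byte[] haystack, byte[] needle, int from) {
        outer:
        for (int i = from; i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Encrypted payload that happens to end in 0D, followed by 0A and "endstream".
        byte[] payload = {0x10, 0x20, 0x30, 0x0D};
        byte[] file = new byte[payload.length + 1 + ENDSTREAM.length];
        System.arraycopy(payload, 0, file, 0, payload.length);
        file[payload.length] = 0x0A;
        System.arraycopy(ENDSTREAM, 0, file, payload.length + 1, ENDSTREAM.length);

        System.out.println("by length: " + readByLength(file, 0, payload.length).length + " bytes"); // 4
        System.out.println("by scan:   " + readByScan(file, 0).length + " bytes");                   // 3
    }
}
```

The scan-based reader treats the payload's final 0D as part of the end-of-line marker and returns one byte too few, which is the kind of truncation that would break decryption.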



[jira] [Reopened] (PDFBOX-2341) WriteDecodedDoc cant decrypt pdf correctly

2014-09-11 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr reopened PDFBOX-2341:
-

oops, meant to close only



[jira] [Commented] (PDFBOX-2341) WriteDecodedDoc cant decrypt pdf correctly

2014-09-11 Thread simon steiner (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130166#comment-14130166
 ] 

simon steiner commented on PDFBOX-2341:
---

I tried moving to nonseq but blocked by PDFBOX-2342



[jira] [Commented] (PDFBOX-2342) WriteDecodedDoc cant decrypt pdf form correctly

2014-09-11 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130228#comment-14130228
 ] 

Tilman Hausherr commented on PDFBOX-2342:
-

To clarify:
Country drop down contents are garbage when WriteDecodedDoc is used with the 
-nonSeq option



[jira] [Commented] (PDFBOX-2261) Extremely long hang during getFields() on a few PDF files

2014-09-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130271#comment-14130271
 ] 

ASF subversion and git services commented on PDFBOX-2261:
-

Commit 1624334 from [~lehmi] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1624334 ]

PDFBOX-2261: move setValue to super class PDChoice

 Extremely long hang during getFields() on a few PDF files
 -

 Key: PDFBOX-2261
 URL: https://issues.apache.org/jira/browse/PDFBOX-2261
 Project: PDFBox
  Issue Type: Bug
  Components: AcroForm
Affects Versions: 1.8.6
Reporter: Tim Allison
Assignee: Andreas Lehmkühler
Priority: Minor
 Fix For: 2.0.0

 Attachments: 966679.pdf, RadioButtons.pdf, screenshot-pdfdebugger.png


 When I run oap.examples.fdf.PrintFields from trunk, the code seems to hang 
 during acroForm.getFields().  This is a heavy load hang.





[jira] [Resolved] (PDFBOX-2261) Extremely long hang during getFields() on a few PDF files

2014-09-11 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler resolved PDFBOX-2261.

Resolution: Fixed

I'm done here. PDFBOX-2333 is a follow-up for the creation of the appearance 
stream.

 Extremely long hang during getFields() on a few PDF files
 -

 Key: PDFBOX-2261
 URL: https://issues.apache.org/jira/browse/PDFBOX-2261
 Project: PDFBox
  Issue Type: Bug
  Components: AcroForm
Affects Versions: 1.8.6
Reporter: Tim Allison
Assignee: Andreas Lehmkühler
Priority: Minor
 Fix For: 2.0.0

 Attachments: 966679.pdf, RadioButtons.pdf, screenshot-pdfdebugger.png


 When I run oap.examples.fdf.PrintFields from trunk, the code seems to hang 
 during acroForm.getFields().  This is a heavy load hang.





[jira] [Commented] (PDFBOX-2261) Extremely long hang during getFields() on a few PDF files

2014-09-11 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130341#comment-14130341
 ] 

Tim Allison commented on PDFBOX-2261:
-

Thank you, all!

 Extremely long hang during getFields() on a few PDF files
 -

 Key: PDFBOX-2261
 URL: https://issues.apache.org/jira/browse/PDFBOX-2261
 Project: PDFBox
  Issue Type: Bug
  Components: AcroForm
Affects Versions: 1.8.6
Reporter: Tim Allison
Assignee: Andreas Lehmkühler
Priority: Minor
 Fix For: 2.0.0

 Attachments: 966679.pdf, RadioButtons.pdf, screenshot-pdfdebugger.png


 When I run oap.examples.fdf.PrintFields from trunk, the code seems to hang 
 during acroForm.getFields().  This is a heavy load hang.





[jira] [Commented] (PDFBOX-2342) WriteDecodedDoc cant decrypt pdf form correctly

2014-09-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130367#comment-14130367
 ] 

ASF subversion and git services commented on PDFBOX-2342:
-

Commit 1624347 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1624347 ]

PDFBOX-2342: decrypt COSArray too, not just COSString

 WriteDecodedDoc cant decrypt pdf form correctly
 ---

 Key: PDFBOX-2342
 URL: https://issues.apache.org/jira/browse/PDFBOX-2342
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 2.0.0
Reporter: simon steiner
 Attachments: test.pdf


 java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar 
 WriteDecodedDoc -nonSeq test.pdf
 country selection is wrong





[jira] [Commented] (PDFBOX-2342) WriteDecodedDoc cant decrypt pdf form correctly

2014-09-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130370#comment-14130370
 ] 

ASF subversion and git services commented on PDFBOX-2342:
-

Commit 1624348 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1624348 ]

PDFBOX-2342: allow public access to decryptArray

 WriteDecodedDoc cant decrypt pdf form correctly
 ---

 Key: PDFBOX-2342
 URL: https://issues.apache.org/jira/browse/PDFBOX-2342
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 2.0.0
Reporter: simon steiner
 Attachments: test.pdf


 java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar 
 WriteDecodedDoc -nonSeq test.pdf
 country selection is wrong





[jira] [Commented] (PDFBOX-2342) WriteDecodedDoc cant decrypt pdf form correctly

2014-09-11 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130386#comment-14130386
 ] 

Tilman Hausherr commented on PDFBOX-2342:
-

The non-sequential parser doesn't decrypt recursively like the sequential one 
does (why?). In a COSDictionary, only COSStrings are decrypted; the rest is 
left untouched. There's already a suspicious TODO there. For now, I've just 
added the decryption of COSArray, which solves [~ssteiner1]'s problem. But I 
wonder what else is incorrectly (not) decrypted, e.g. a COSDictionary within a 
COSDictionary. Why aren't we using this nice do-everything method?
{code}
private void decrypt(COSBase obj, long objNum, long genNum) throws IOException
{code}
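
To make the concern about nested objects concrete, here is a minimal, self-contained sketch of what "decrypting recursively" means. It is not PDFBox code: the Node, Str, Arr and Dict types are hypothetical stand-ins for COSBase, COSString, COSArray and COSDictionary, and the XOR "cipher" is a toy replacement for the real RC4/AES handling, which also derives its key from the object and generation numbers.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RecursiveDecryptSketch {

    /** Hypothetical stand-in for COSBase; not the PDFBox class. */
    interface Node {}

    /** Hypothetical stand-in for COSString: holds the (encrypted) bytes. */
    static final class Str implements Node {
        byte[] bytes;
        Str(byte[] bytes) { this.bytes = bytes; }
    }

    /** Hypothetical stand-in for COSArray. */
    static final class Arr implements Node {
        final List<Node> items = new ArrayList<>();
    }

    /** Hypothetical stand-in for COSDictionary. */
    static final class Dict implements Node {
        final Map<String, Node> entries = new LinkedHashMap<>();
    }

    /** Toy symmetric cipher (assumption): XOR every byte with a fixed key. */
    static byte[] xor(byte[] data, byte key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key);
        }
        return out;
    }

    /** Decrypts strings wherever they occur, descending into arrays and dictionaries. */
    static void decrypt(Node node, byte key) {
        if (node instanceof Str) {
            Str s = (Str) node;
            s.bytes = xor(s.bytes, key);
        } else if (node instanceof Arr) {
            for (Node child : ((Arr) node).items) {
                decrypt(child, key);
            }
        } else if (node instanceof Dict) {
            for (Node child : ((Dict) node).entries.values()) {
                decrypt(child, key);
            }
        }
        // Any other node type carries no encrypted data in this sketch.
    }

    public static void main(String[] args) {
        byte key = 0x5A;

        // A choice field whose options live in an array inside a nested dictionary.
        Arr options = new Arr();
        options.items.add(new Str(xor("Germany".getBytes(StandardCharsets.UTF_8), key)));
        options.items.add(new Str(xor("France".getBytes(StandardCharsets.UTF_8), key)));
        Dict field = new Dict();
        field.entries.put("Opt", options);
        Dict root = new Dict();
        root.entries.put("Field", field);

        decrypt(root, key);
        for (Node n : options.items) {
            System.out.println(new String(((Str) n).bytes, StandardCharsets.UTF_8));
        }
    }
}
```

A decrypt step that handled only Str values sitting directly in the top-level Dict would leave both option strings encrypted, which matches the garbage drop-down contents described in this issue. A real implementation would additionally have to deal with streams, and with indirect references so that objects are neither decrypted twice nor followed in a cycle; the sketch ignores both.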

 WriteDecodedDoc cant decrypt pdf form correctly
 ---

 Key: PDFBOX-2342
 URL: https://issues.apache.org/jira/browse/PDFBOX-2342
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 2.0.0
Reporter: simon steiner
 Attachments: test.pdf


 java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar 
 WriteDecodedDoc -nonSeq test.pdf
 country selection is wrong





[jira] [Comment Edited] (PDFBOX-2342) WriteDecodedDoc cant decrypt pdf form correctly

2014-09-11 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130386#comment-14130386
 ] 

Tilman Hausherr edited comment on PDFBOX-2342 at 9/11/14 5:52 PM:
--

The non-sequential parser doesn't decrypt recursively like the sequential one 
does (why?). In a COSDictionary, only COSStrings are decrypted; the rest is 
left untouched. There's already a suspicious TODO there. For now, I've just 
added the decryption of COSArray, which solves [~ssteiner1]'s problem. But I 
wonder what else is incorrectly (not) decrypted, e.g. a COSDictionary within a 
COSDictionary. Why aren't we using this nice do-everything method in the 
SecurityHandler class?
{code}
private void decrypt(COSBase obj, long objNum, long genNum) throws IOException
{code}


was (Author: tilman):
The non-sequential parser doesn't decrypt recursively like the sequential one 
does (why?). In a COSDictionary, only COSStrings are decrypted; the rest is 
left untouched. There's already a suspicious TODO there. For now, I've just 
added the decryption of COSArray, which solves [~ssteiner1]'s problem. But I 
wonder what else is incorrectly (not) decrypted, e.g. a COSDictionary within a 
COSDictionary. Why aren't we using this nice do-everything method?
{code}
private void decrypt(COSBase obj, long objNum, long genNum) throws IOException
{code}

 WriteDecodedDoc cant decrypt pdf form correctly
 ---

 Key: PDFBOX-2342
 URL: https://issues.apache.org/jira/browse/PDFBOX-2342
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 2.0.0
Reporter: simon steiner
 Attachments: test.pdf


 java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar 
 WriteDecodedDoc -nonSeq test.pdf
 country selection is wrong





[jira] [Comment Edited] (PDFBOX-2342) WriteDecodedDoc cant decrypt pdf form correctly

2014-09-11 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130228#comment-14130228
 ] 

Tilman Hausherr edited comment on PDFBOX-2342 at 9/11/14 5:53 PM:
--

To clarify:
The country drop-down box contents are garbage when WriteDecodedDoc is used with 
the -nonSeq option, but they are fine with the old parser.


was (Author: tilman):
To clarify:
The country drop-down contents are garbage when WriteDecodedDoc is used with the 
-nonSeq option.

 WriteDecodedDoc cant decrypt pdf form correctly
 ---

 Key: PDFBOX-2342
 URL: https://issues.apache.org/jira/browse/PDFBOX-2342
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 2.0.0
Reporter: simon steiner
 Attachments: test.pdf


 java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar 
 WriteDecodedDoc -nonSeq test.pdf
 country selection is wrong


