Re: [DISCUSS] move documentation and examples to git

2014-09-17 Thread Maruan Sahyoun
Dear Santosh,

you can unregister using the link below.

https://pdfbox.apache.org/mailinglists.html

With kind regards
Maruan

 On 17.09.2014 at 03:00, Santosh Arakeri santosh.arak...@gmail.com wrote:
 
 Please don't send me mail.
 On 16 Sep 2014 13:52, Maruan Sahyoun sahy...@fileaffairs.de wrote:
 
 Hi there,
 
 in order to make it easier for people to contribute to the documentation
 and examples I thought about the potential benefits of moving these to a
 git based repository instead of svn. The main idea behind that is to allow
 people to contribute via github opening another channel of communication
 and making it easier to contribute.
 
 Proposed names are pdfbox-docs and pdfbox-examples. Take a look at
 https://github.com/apache/cordova-docs for an example of that.
 
 I haven’t thought about all potential implications and changes necessary
 yet but wanted to get a first feedback about support for that idea before
 putting more effort into that.
 
 WDYT?
 
 Maruan


Re: [DISCUSS] move documentation and examples to git

2014-09-17 Thread Andreas Lehmkühler
Hi,

 Tilman Hausherr thaush...@t-online.de wrote on 16 September 2014 at 18:03:


 -1, I don't like the idea to have different repository types.
Hmmm, is this just an "I don't like it, but I can live with it" or is it a clear
veto?

In case of a veto, how about starting with moving parts of the docs to a new
git repo? IMO sooner or later the project will move from svn to git, and that
would be a good opportunity to get used to the general usage of git and of course
to the special processes used here at the ASF, so that we are not thrown in at
the deep end after the migration.

 Tilman

BR
Andreas


 On 16.09.2014 at 10:21, Maruan Sahyoun wrote:
  Hi there,
 
  in order to make it easier for people to contribute to the documentation and
  examples I thought about the potential benefits of moving these to a git
  based repository instead of svn. The main idea behind that is to allow
  people to contribute via github opening another channel of communication and
  making it easier to contribute.
 
  Proposed names are pdfbox-docs and pdfbox-examples. Take a look at
  https://github.com/apache/cordova-docs for an example of that.
 
  I haven’t thought about all potential implications and changes necessary yet
  but wanted to get a first feedback about support for that idea before
  putting more effort into that.
 
  WDYT?
 
  Maruan



Re: test failures trunk

2014-09-17 Thread Cornelis Hoeflake
Hi,

Ok. My issue was that I could not build while the PDFBox Jenkins has
successful builds, so I was wondering why. But I ran the tests in Eclipse,
which ignores the test exclusions in the pom.xml. Now it is clear to me.

Cornelis Hoeflake


2014-09-15 20:44 GMT+02:00 John Hewson j...@jahewson.com:

 Yep, this test was failing back in January when I first encountered it. It
 uses some external certificate files, which could be the problem, but I
 really don’t know anything about it.

 -- John

 On 13 Sep 2014, at 08:32, Tilman Hausherr thaush...@t-online.de wrote:

  I had a look:
  - yes it fails in 2.0 but not in 1.8
  - it still fails even when the correction I mentioned is applied (can't match
 the recipient)
  - the test was removed long ago which is why the builds don't fail
  - that part was written by one Benoit Guillon, who has moved on
 
  I don't know why the test was removed. John mentioned this before in
 PDFBOX-1825. I don't know enough about bc to understand what's wrong.
 
  Tilman
 
  On 12.09.2014 at 10:49, Cornelis Hoeflake wrote:
  Hi,
 
  I get exceptions when running the
 org.apache.pdfbox.encryption.TestPublicKeyEncryption tests. It is saying
 that the document is already closed when calling the save method in reload.
 
  I get failures when running org.apache.pdfbox.util.TestPDFToImage.
 
  Is there something wrong with my test setup or are there more
 developers having these errors/failures?
 
  Kind regards,
 
  Cornelis Hoeflake
 




[jira] [Created] (PDFBOX-2357) PDTrueTypeFont has no method to load font from stream

2014-09-17 Thread Cornelis Hoeflake (JIRA)
Cornelis Hoeflake created PDFBOX-2357:
-

 Summary: PDTrueTypeFont has no method to load font from stream
 Key: PDFBOX-2357
 URL: https://issues.apache.org/jira/browse/PDFBOX-2357
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake


PDTrueTypeFont formerly had a static method to load a font from a stream. That 
method is now gone, as far as I can see without a reason. It was probably removed 
by mistake.

Could that method be restored?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PDFBOX-2357) PDTrueTypeFont has no method to load font from stream

2014-09-17 Thread Cornelis Hoeflake (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136873#comment-14136873
 ] 

Cornelis Hoeflake commented on PDFBOX-2357:
---

The method:

/**
 * Loads a TTF to be embedded into a document.
 *
 * @param doc The PDF document that will hold the embedded font.
 * @param is a stream of a ttf file.
 * @return a PDTrueTypeFont instance.
 * @throws IOException If there is an error loading the data.
 */
public static PDTrueTypeFont loadTTF(PDDocument doc, InputStream is) throws IOException
{
    return new PDTrueTypeFont(doc, is);
}



[jira] [Comment Edited] (PDFBOX-2357) PDTrueTypeFont has no method to load font from stream

2014-09-17 Thread Cornelis Hoeflake (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136873#comment-14136873
 ] 

Cornelis Hoeflake edited comment on PDFBOX-2357 at 9/17/14 7:02 AM:


The method:
{code:title=PDTrueTypeFont.java|borderStyle=solid}
/**
 * Loads a TTF to be embedded into a document.
 *
 * @param doc The PDF document that will hold the embedded font.
 * @param is a stream of a ttf file.
 * @return a PDTrueTypeFont instance.
 * @throws IOException If there is an error loading the data.
 */
public static PDTrueTypeFont loadTTF(PDDocument doc, InputStream is) throws IOException
{
    return new PDTrueTypeFont(doc, is);
}
{code}


was (Author: c.hoeflake):
The method:

/**
 * Loads a TTF to be embedded into a document.
 *
 * @param doc The PDF document that will hold the embedded font.
 * @param file a ttf file.
 * @return a PDTrueTypeFont instance.
 * @throws IOException If there is an error loading the data.
 */
public static PDTrueTypeFont loadTTF(PDDocument doc, InputStream is) throws 
IOException
{
return new PDTrueTypeFont(doc, is);
}



[jira] [Created] (PDFBOX-2358) ExternalFonts uses classloader of class in font-box

2014-09-17 Thread Cornelis Hoeflake (JIRA)
Cornelis Hoeflake created PDFBOX-2358:
-

 Summary: ExternalFonts uses classloader of class in font-box
 Key: PDFBOX-2358
 URL: https://issues.apache.org/jira/browse/PDFBOX-2358
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake


ExternalFonts loads some default fonts via the 
org.apache.fontbox.util.ResourceLoader. That ResourceLoader uses its own 
classloader (ResourceLoader.class.getClassLoader()) for loading the given 
resource.
The problem is that the resource is in the PDFBox project and the 
ResourceLoader is in FontBox. In an OSGi environment this is a problem, because 
each bundle has its own classloader.






[jira] [Commented] (PDFBOX-2358) ExternalFonts uses classloader of class in font-box

2014-09-17 Thread Cornelis Hoeflake (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136940#comment-14136940
 ] 

Cornelis Hoeflake commented on PDFBOX-2358:
---

One solution is to pass a classloader to the ResourceLoader. But passing 
classloaders around can cause issues.

Another solution is to load the file directly without the ResourceLoader; as 
far as I can see, the ResourceLoader does nothing extra in this case. The same 
goes for CMapParser, which also uses the ResourceLoader.

The next step would be to remove ResourceLoader or to make it very clear that the 
ResourceLoader cannot be used outside FontBox. My personal opinion is that the 
ResourceLoader is a black box which does some magic tricks (for example, 
returning a FileInputStream if the resource could not be found in the classloader; 
this is somewhat obscure and could be dangerous).
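
To illustrate the first option, here is a minimal sketch. The class and method 
names (ResourceLookup, open) are made up for this example and are not part of 
FontBox. The point is only that the caller decides which classloader is searched, 
and that a resource the loader cannot see is reported as null instead of 
silently falling back to the file system.

```java
import java.io.InputStream;

/** Hypothetical helper for illustration, not the FontBox ResourceLoader. */
final class ResourceLookup
{
    private ResourceLookup()
    {
    }

    /**
     * Opens a classpath resource using the classloader supplied by the caller.
     *
     * @return the resource stream, or null if that loader cannot see the resource
     */
    static InputStream open(ClassLoader loader, String resourceName)
    {
        // No fallback to FileInputStream: an invisible resource is reported as null.
        return loader.getResourceAsStream(resourceName);
    }

    /** Convenience variant: search with the classloader of a class in the calling module. */
    static InputStream open(Class<?> anchor, String resourceName)
    {
        ClassLoader loader = anchor.getClassLoader();
        if (loader == null)
        {
            // Classes loaded by the bootstrap loader report null here.
            loader = ClassLoader.getSystemClassLoader();
        }
        return open(loader, resourceName);
    }
}
```

A caller inside the PDFBox module could then pass one of its own classes as the 
anchor, so that the lookup happens with the classloader of the bundle that 
actually contains the resource.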



[jira] [Commented] (PDFBOX-2301) RandomAccessBuffer consumes too much memory.

2014-09-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137026#comment-14137026
 ] 

Andreas Lehmkühler commented on PDFBOX-2301:


[~jojelino] PDFBox 1.8.x has Java 1.5 and 2.0.x has Java 1.6 as minimum 
requirement, so your patch isn't valid. However, we have already started a 
refactoring in 2.0 and we don't need cloning anymore. I guess we won't fix this 
in the 1.8 branch.

 RandomAccessBuffer consumes too much memory.
 

 Key: PDFBOX-2301
 URL: https://issues.apache.org/jira/browse/PDFBOX-2301
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.8.6, 2.0.0
Reporter: gee
Assignee: Andreas Lehmkühler
 Fix For: 2.0.0

 Attachments: clone.diff, clone2.diff, clone3.diff


 RandomAccessBuffer holds the uncompressed image during operation, because that 
 is exactly what pdfbox ExtractImages does.
 But holding the uncompressed image in memory instead of the compressed one 
 consumes too much memory, and the same applies to the many PDF XObjects that can 
 use a filter to compress themselves. It would be good if pdfbox provided an option 
 that reverts a COSObject to its state just before the RandomAccess object was 
 created (the state in which the PDF XObject stream has been parsed but the 
 COSDictionary objects have not been created yet, because the user has not 
 requested them via the get() method). This is a crucial feature for pdfbox to be 
 able to analyze huge PDF files (100MB).
 In the current source, one must close a COSStream unless it is required (and I 
 know a closed stream cannot be reopened).
 Class Name                                                               | Shallow Heap | Retained Heap
 ---------------------------------------------------------------------------------------------------------
 org.apache.pdfbox.cos.COSObject @ 0x5ad4940                              |           24 |     8,187,264
 |- class class org.apache.pdfbox.cos.COSObject @ 0x58c4020               |            0 |             0
 |- generationNumber org.apache.pdfbox.cos.COSInteger @ 0x5ad0080         |           24 |            24
 |- baseObject org.apache.pdfbox.cos.COSStream @ 0x5b25ea0                |           32 |     8,187,216
 |  |- class class org.apache.pdfbox.cos.COSStream @ 0x58c3e00            |            8 |             8
 |  |- items java.util.LinkedHashMap @ 0x5b2a0f0                          |           56 |           552
 |  |- file org.apache.pdfbox.io.RandomAccessBuffer @ 0x5b2a128           |           48 |     8,186,528
 |  |  |- class class org.apache.pdfbox.io.RandomAccessBuffer @ 0x5ad2b00

[jira] [Commented] (PDFBOX-2301) RandomAccessBuffer consumes too much memory.

2014-09-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137038#comment-14137038
 ] 

Andreas Lehmkühler commented on PDFBOX-2301:


[~tboehme] I'm still thinking about the options concerning the scratch file 
issue. It's quite easy to switch back to using one scratch file and I'm going to 
do that soon, at least as a workaround. But the root issue is that the parser 
creates all streams at the beginning and thus allocates a lot of 
memory/creates a lot of scratch files. IMO we have to refactor the whole 
parser to change that behaviour. That brings us to the point of implementing 
on-demand parsing.


[jira] [Commented] (PDFBOX-2357) PDTrueTypeFont has no method to load font from stream

2014-09-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137047#comment-14137047
 ] 

Andreas Lehmkühler commented on PDFBOX-2357:


Have a look at org.apache.pdfbox.pdmodel.font.PDTrueTypeFontEmbedder, it should 
do the trick



[jira] [Issue Comment Deleted] (PDFBOX-2357) PDTrueTypeFont has no method to load font from stream

2014-09-17 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-2357:
---
Comment: was deleted

(was: Have a look at org.apache.pdfbox.pdmodel.font.PDTrueTypeFontEmbedder, it 
should do the trick)



[jira] [Commented] (PDFBOX-2357) PDTrueTypeFont has no method to load font from stream

2014-09-17 Thread Cornelis Hoeflake (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137078#comment-14137078
 ] 

Cornelis Hoeflake commented on PDFBOX-2357:
---

That class is not a public class.



[jira] [Commented] (PDFBOX-2357) PDTrueTypeFont has no method to load font from stream

2014-09-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/PDFBOX-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137097#comment-14137097
 ] 

Andreas Lehmkühler commented on PDFBOX-2357:


I know, that's why I deleted my former comment, obviously I wasn't fast enough 
;-)



[jira] [Commented] (PDFBOX-2301) RandomAccessBuffer consumes too much memory.

2014-09-17 Thread Timo Boehme (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137140#comment-14137140
 ] 

Timo Boehme commented on PDFBOX-2301:
-

[~lehmi] The NonSeqParser is by design an on-demand parser. Only because other 
parts of PDFBox require data to be already parsed does it initialize/parse all 
objects in the init procedure (see the parseMinimalCatalog variable) as a 
workaround. So COSObject and its subclasses should only be stubs in the beginning, 
and using one (any method call) should trigger parsing of the object by the parser 
(NonSequentialPDFParser.parseObjectDynamically). Thus COSDocument needs to have 
a reference to the parser.
For the scratch file workaround I'm still in favor of a split in-memory/file 
usage, so that only large PDFs need to write to a file.
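
The stub idea could be sketched roughly as follows. This is not PDFBox code: 
LazyObject is a made-up name, and java.util.function.Supplier (Java 8) is used 
only for brevity, where the real classes would need their own callback interface 
or a reference to the parser, given the Java 1.6 minimum of 2.0.x.

```java
import java.util.function.Supplier;

/**
 * Hypothetical lazy stub, not a PDFBox class: the wrapped value is only
 * produced (parsed) on first access and cached afterwards.
 */
final class LazyObject<T>
{
    private Supplier<T> parser;   // callback into the parser; dropped after first use
    private T value;
    private boolean loaded;

    LazyObject(Supplier<T> parser)
    {
        this.parser = parser;
    }

    /** Triggers parsing on the first call and returns the cached result afterwards. */
    synchronized T get()
    {
        if (!loaded)
        {
            value = parser.get();
            loaded = true;
            parser = null;   // let the callback and anything it captured be collected
        }
        return value;
    }

    /** True once the underlying object has actually been parsed. */
    synchronized boolean isLoaded()
    {
        return loaded;
    }
}
```

With such a stub, objects that are never touched are never parsed, so no stream 
buffers or scratch files are allocated for them.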


[jira] [Commented] (PDFBOX-2356) Error Validating PDF Archive Document

2014-09-17 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137490#comment-14137490
 ] 

Tilman Hausherr commented on PDFBOX-2356:
-

Are you building from source? If yes, please try this:
{code}
private static void adjustTimeZoneNicely(GregorianCalendar cal, TimeZone tz)
{
    cal.setTimeZone(tz);
    int offset = (cal.get(Calendar.ZONE_OFFSET) + cal.get(Calendar.DST_OFFSET))
            / MILLIS_PER_MINUTE;
    cal.add(Calendar.MINUTE, -offset);
}
{code}
If no, please post the minimal code or command line that you used to check your 
file (I never use preflight) and I'll test it.


 Error Validating PDF Archive Document
 -

 Key: PDFBOX-2356
 URL: https://issues.apache.org/jira/browse/PDFBOX-2356
 Project: PDFBox
  Issue Type: Bug
  Components: Preflight
Affects Versions: 1.8.4, 1.8.5, 1.8.6
Reporter: Cetra Free
 Attachments: pdfafile.pdf


 When trying to validate a PDF archive file (attached to this ticket) we get 
 the following error:
 {code}
 7.2   - Error on MetaData, ModificationDate present in the document catalog 
 dictionary doesn't match with XMP information
 {code}
 This is because the Modification Date in the Dictionary is parsed 
 differently from the XMP Metadata.  The XMP Metadata is correct, but the Date 
 from the Dictionary has an extra 30 minutes added.
 The following is the raw COSObject from the PDF File
 {code}
 COSString{D:20140917122850+09'30'}
 {code}
 The Long value should be *1410922730000*
 The *org.apache.pdfbox.util.DateConverter* *parseDate* method returns the 
 Date with Long *1410924530000* which is 30 minutes ahead.
 XMP Modification Date is parsed differently and returns the correct date.
 This means that validation will fail for PDF Archives.
 My suggestion would be to refactor the parseDate function to use the Standard 
 Java library.
 Here's an example class which will be compatible with the PDF Specification:
 {code}
 static class DateParser {
   private Map<Integer, SimpleDateFormat> formats =
     new HashMap<Integer, SimpleDateFormat>();

   public DateParser() {
     String expr = "";

     for (String part : Arrays.asList("yyyy", "MM", "dd", "HH", "mm", "ss", "Z")) {
       expr = expr + part;
       formats.put(expr.length(), new SimpleDateFormat(expr));
     }
   }

   public Calendar parseDate(String expr) {
     try {
       expr = expr.replace("D:", "").replace("'", "").replace("Z", "+0000");
       Date date = formats.get(Math.min(expr.length(), 15)).parse(expr);

       Calendar calendar = Calendar.getInstance();
       calendar.setTime(date);

       return calendar;
     } catch (ParseException e) {
       return null;
     }
   }
 }
 {code}
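
 As a cross-check of the expected instant, here is a minimal sketch using 
 java.time. It assumes Java 8, which is newer than what PDFBox 1.8.x targets, it 
 handles only the full D:YYYYMMDDHHmmSS+HH'mm' form, and the class name 
 PdfDateCheck is made up for this example. For the COSString quoted in this 
 report it yields 2014-09-17T02:58:50Z, i.e. the +09'30' offset is applied 
 exactly once.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical cross-check, not part of PDFBox. */
final class PdfDateCheck
{
    private static final DateTimeFormatter LOCAL =
            DateTimeFormatter.ofPattern("uuuuMMddHHmmss");

    private PdfDateCheck()
    {
    }

    /** Parses a full PDF date string such as D:20140917122850+09'30'. */
    static OffsetDateTime parse(String pdfDate)
    {
        String s = pdfDate.startsWith("D:") ? pdfDate.substring(2) : pdfDate;
        // First 14 characters: local date and time without any zone information.
        LocalDateTime local = LocalDateTime.parse(s.substring(0, 14), LOCAL);
        // Remainder: "+09'30'" becomes "+0930", a form ZoneOffset.of accepts.
        ZoneOffset offset = ZoneOffset.of(s.substring(14).replace("'", ""));
        return OffsetDateTime.of(local, offset);
    }
}
```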





Re: [DISCUSS] move documentation and examples to git

2014-09-17 Thread Tilman Hausherr
It is an "I don't like it, but I can live with it", but I think it might be 
a pain. A soft -1.


Tilman

On 17.09.2014 at 08:40, Andreas Lehmkühler wrote:

Hi,


Tilman Hausherr thaush...@t-online.de wrote on 16 September 2014 at 18:03:


-1, I don't like the idea to have different repository types.

Hmmm, is this just an "I don't like it, but I can live with it" or is it a clear
veto?

In case of a veto, how about starting with moving parts of the docs to a new
git repo? IMO sooner or later the project will move from svn to git, and that
would be a good opportunity to get used to the general usage of git and of course
to the special processes used here at the ASF, so that we are not thrown in at
the deep end after the migration.


Tilman

BR
Andreas


On 16.09.2014 at 10:21, Maruan Sahyoun wrote:

Hi there,

in order to make it easier for people to contribute to the documentation and
examples I thought about the potential benefits of moving these to a git
based repository instead of svn. The main idea behind that is to allow
people to contribute via github opening another channel of communication and
making it easier to contribute.

Proposed names are pdfbox-docs and pdfbox-examples. Take a look at
https://github.com/apache/cordova-docs for an example of that.

I haven’t thought about all potential implications and changes necessary yet
but wanted to get a first feedback about support for that idea before
putting more effort into that.

WDYT?

Maruan




Re: [DISCUSS] move documentation and examples to git

2014-09-17 Thread Maruan Sahyoun
Is that because of the examples, the docs or both?

BR

Maruan

On 17.09.2014 at 18:46, Tilman Hausherr thaush...@t-online.de wrote:

 It is an "I don't like it, but I can live with it", but I think it might be a 
 pain. A soft -1.
 
 Tilman
 
 On 17.09.2014 at 08:40, Andreas Lehmkühler wrote:
 Hi,
 
 Tilman Hausherr thaush...@t-online.de wrote on 16 September 2014 at 18:03:
 
 
 -1, I don't like the idea to have different repository types.
 Hmmm, is this just an "I don't like it, but I can live with it" or is it a clear
 veto?
 
 In case of a veto, how about starting with moving parts of the docs to a new
 git repo? IMO sooner or later the project will move from svn to git, and that
 would be a good opportunity to get used to the general usage of git and of course
 to the special processes used here at the ASF, so that we are not thrown in at
 the deep end after the migration.
 
 Tilman
 BR
 Andreas
 
 On 16.09.2014 at 10:21, Maruan Sahyoun wrote:
 Hi there,
 
 in order to make it easier for people to contribute to the documentation and
 examples I thought about the potential benefits of moving these to a git
 based repository instead of svn. The main idea behind that is to allow
 people to contribute via github opening another channel of communication and
 making it easier to contribute.
 
 Proposed names are pdfbox-docs and pdfbox-examples. Take a look at
 https://github.com/apache/cordova-docs for an example of that.
 
 I haven’t thought about all potential implications and changes necessary yet
 but wanted to get a first feedback about support for that idea before
 putting more effort into that.
 
 WDYT?
 
 Maruan
 



[jira] [Commented] (PDFBOX-2299) Isartor tests don't work anymore

2014-09-17 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137574#comment-14137574
 ] 

John Hewson commented on PDFBOX-2299:
-

That's ok, I know where to find it.

P.S. You can attach files with More > Attach Files.

 Isartor tests don't work anymore
 

 Key: PDFBOX-2299
 URL: https://issues.apache.org/jira/browse/PDFBOX-2299
 Project: PDFBox
  Issue Type: Bug
  Components: Preflight
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
Assignee: John Hewson
Priority: Critical
  Labels: Isartor, regression

 Sorry, I hadn't thought about this when testing the no-awt version, but the 
 Isartor tests don't work anymore (I have them enabled for my own version 
 since PDFBOX-2179).
 {code}
 ---
 Test set: org.apache.pdfbox.preflight.TestIsartor
 ---
 Tests run: 204, Failures: 35, Errors: 0, Skipped: 0, Time elapsed: 29.485 sec <<< FAILURE! - in org.apache.pdfbox.preflight.TestIsartor
 validate[target\pdfs\Isartor testsuite\PDFA-1b\6.2 Graphics\6.2.3 Colour spaces\6.2.3.3 Uncalibrated colour spaces\isartor-6-2-3-3-t02-fail-h.pdf](org.apache.pdfbox.preflight.TestIsartor)  Time elapsed: 0.092 sec  <<< FAILURE!
 java.lang.AssertionError: isartor-6-2-3-3-t02-fail-h.pdf : IllegalArgumentException raised , message=Built-in Encoding required for symbolic font
   at org.junit.Assert.fail(Assert.java:88)
   at org.apache.pdfbox.preflight.TestIsartor.validate(TestIsartor.java:175)
 validate[target\pdfs\Isartor testsuite\PDFA-1b\6.2 Graphics\6.2.3 Colour spaces\6.2.3.3 Uncalibrated colour spaces\isartor-6-2-3-3-t02-fail-i.pdf](org.apache.pdfbox.preflight.TestIsartor)  Time elapsed: 0.006 sec  <<< FAILURE!
 java.lang.AssertionError: isartor-6-2-3-3-t02-fail-i.pdf : IllegalArgumentException raised , message=Built-in Encoding required for symbolic font
   at org.junit.Assert.fail(Assert.java:88)
   at org.apache.pdfbox.preflight.TestIsartor.validate(TestIsartor.java:175)
 validate[target\pdfs\Isartor testsuite\PDFA-1b\6.3 Fonts\6.3.2 Font types\isartor-6-3-2-t01-fail-b.pdf](org.apache.pdfbox.preflight.TestIsartor)  Time elapsed: 1.837 sec  <<< FAILURE!
 java.lang.AssertionError: isartor-6-3-2-t01-fail-b.pdf : NullPointerException raised , message=null
   at org.junit.Assert.fail(Assert.java:88)
   at org.apache.pdfbox.preflight.TestIsartor.validate(TestIsartor.java:175)
 validate[target\pdfs\Isartor testsuite\PDFA-1b\6.3 Fonts\6.3.3 Composite fonts\6.3.3.1 General\isartor-6-3-3-1-t01-fail-a.pdf](org.apache.pdfbox.preflight.TestIsartor)  Time elapsed: 0.191 sec  <<< FAILURE!
 java.lang.AssertionError: isartor-6-3-3-1-t01-fail-a.pdf : NullPointerException raised , message=null
   at org.junit.Assert.fail(Assert.java:88)
   at org.apache.pdfbox.preflight.TestIsartor.validate(TestIsartor.java:175)
 validate[target\pdfs\Isartor testsuite\PDFA-1b\6.3 Fonts\6.3.3 Composite fonts\6.3.3.1 General\isartor-6-3-3-1-t01-fail-b.pdf](org.apache.pdfbox.preflight.TestIsartor)  Time elapsed: 0.051 sec  <<< FAILURE!
 java.lang.AssertionError: isartor-6-3-3-1-t01-fail-b.pdf : NullPointerException raised , message=null
   at org.junit.Assert.fail(Assert.java:88)
   at org.apache.pdfbox.preflight.TestIsartor.validate(TestIsartor.java:175)
 validate[target\pdfs\Isartor testsuite\PDFA-1b\6.3 Fonts\6.3.3 Composite fonts\6.3.3.2 CIDFonts\isartor-6-3-3-2-t01-fail-a.pdf](org.apache.pdfbox.preflight.TestIsartor)  Time elapsed: 0.051 sec  <<< FAILURE!
 java.lang.AssertionError: isartor-6-3-3-2-t01-fail-a.pdf : NullPointerException raised , message=null
   at org.junit.Assert.fail(Assert.java:88)
   at org.apache.pdfbox.preflight.TestIsartor.validate(TestIsartor.java:175)
 validate[target\pdfs\Isartor testsuite\PDFA-1b\6.3 Fonts\6.3.3 Composite fonts\6.3.3.3 CMaps\isartor-6-3-3-3-t01-fail-a.pdf](org.apache.pdfbox.preflight.TestIsartor)  Time elapsed: 0.199 sec  <<< FAILURE!
 java.lang.AssertionError: isartor-6-3-3-3-t01-fail-a.pdf : NullPointerException raised , message=null
   at org.junit.Assert.fail(Assert.java:88)
   at org.apache.pdfbox.preflight.TestIsartor.validate(TestIsartor.java:175)
 validate[target\pdfs\Isartor testsuite\PDFA-1b\6.3 Fonts\6.3.3 Composite fonts\6.3.3.3 CMaps\isartor-6-3-3-3-t02-fail-a.pdf](org.apache.pdfbox.preflight.TestIsartor)  Time elapsed: 0.042 sec  <<< FAILURE!
 java.lang.AssertionError: isartor-6-3-3-3-t02-fail-a.pdf : NullPointerException raised , message=null
   at org.junit.Assert.fail(Assert.java:88)
   at org.apache.pdfbox.preflight.TestIsartor.validate(TestIsartor.java:175)
 

[jira] [Commented] (PDFBOX-2350) Type1 Parser hangs indefinitely

2014-09-17 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137583#comment-14137583
 ] 

John Hewson commented on PDFBOX-2350:
-

Can you post a stack trace for the point where the application hangs? (You 
might have to write one down by hand using your debugger). I'd be interested to 
see where exactly Type1Parser is being called from in this specific case. I 
suspect the problem is that bad data is being passed to Type1Parser.

 Type1 Parser hangs indefinitely
 ---

 Key: PDFBOX-2350
 URL: https://issues.apache.org/jira/browse/PDFBOX-2350
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 2.0.0
 Environment: Windows 7, JDK 1.7.0_51-b13
Reporter: Daniel Scheibe

 When rendering the first page of my pdf document the Type1Parser 
 (org.apache.fontbox.type1.Type1Parser) hangs in a loop in 
 {{parseBinary(byte[] bytes) throws IOException}}
 and kills our rendering pipeline. Please find the loop that hangs below:
 // find /Private dict
 while (!lexer.peekToken().getText().equals("Private"))
 {
     lexer.nextToken();
 }
 There is never a token named Private in the list of returned tokens 
 (they're empty all the time).  
 Furthermore, going deeper into the source code, it seems the class reading the 
 tokens (Type1Lexer) never actually advances the buffer position and always 
 returns an empty name token in the readToken(Token prevToken) method.
 Looking at the decrypted buffer I cannot get anything useful out of it based 
 on my current understanding.
 Unfortunately I cannot provide the pdf in question as it contains confidential 
 data.
 Acrobat Reader XI Version 11.0.08 renders the document just fine.
 In addition it seems the pdf was encrypted (40-Bit RC4) with an empty 
 password and says it's pdf version 1.5.
 Does this provide enough information or can I do anything else to help 
 nail this one down?
 I guess this might be a pdf document structure/feature that is not yet 
 supported completely but at least pdfbox should throw an exception instead of 
 failing silently...
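
 The request above (throw an exception rather than loop forever) could be
 sketched roughly as follows. This is only an illustration, not the FontBox
 code: GuardedTokenScan, skipUntil and the maxTokens limit are hypothetical
 names, and the token source is modelled as a plain Supplier instead of
 Type1Lexer.

```java
import java.io.IOException;
import java.util.function.Supplier;

// Hypothetical stand-in for the scanning loop; not part of FontBox.
final class GuardedTokenScan {

    /**
     * Consumes tokens until one equals `wanted` and returns how many tokens
     * were skipped before it. Fails with an IOException instead of spinning
     * when the source runs dry or keeps returning useless tokens.
     */
    static int skipUntil(Supplier<String> nextToken, String wanted, int maxTokens)
            throws IOException {
        for (int consumed = 0; consumed < maxTokens; consumed++) {
            String text = nextToken.get();
            if (text == null) {
                // The source signalled end of data before the keyword showed up.
                throw new IOException("End of data before '" + wanted + "'");
            }
            if (wanted.equals(text)) {
                return consumed;
            }
        }
        // Upper bound reached: the input is probably bad data, so give up.
        throw new IOException("Gave up looking for '" + wanted + "' after "
                + maxTokens + " tokens");
    }
}
```

 A fixed token limit is a blunt guard; checking that the lexer position
 actually advances would be the more precise test, but needs access to the
 lexer internals.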



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] move documentation and examples to git

2014-09-17 Thread Andreas Lehmkuehler

On 17.09.2014 at 19:01, Maruan Sahyoun wrote:

is that because of the examples, the docs or both?

The examples could be tricky as they depend on the source code in the svn repo.


BR

Maruan


BR
Andreas



On 17.09.2014 at 18:46, Tilman Hausherr thaush...@t-online.de wrote:


It is an "I don't like it, but I can live with it", but I think it might be a pain. A 
soft -1.

Tilman

On 17.09.2014 at 08:40, Andreas Lehmkühler wrote:

Hi,


Tilman Hausherr thaush...@t-online.de wrote on 16 September 2014 at 18:03:


-1, I don't like the idea to have different repository types.

Hmmm, is this just an "I don't like it, but I can live with it" or is it a clear
veto?

In case of a veto, how about starting with moving parts of the docs to a new
git repo? IMO sooner or later the project will move from svn to git and that
would be a good opportunity to get used to the general usage of git and of course
to the special processes used here at the ASF so that we are not thrown in at
the deep end after the migration.


Tilman

BR
Andreas


On 16.09.2014 at 10:21, Maruan Sahyoun wrote:

Hi there,

in order to make it easier for people to contribute to the documentation and
examples I thought about the potential benefits of moving these to a git
based repository instead of svn. The main idea behind that is to allow
people to contribute via github opening another channel of communication and
making it easier to contribute.

Proposed names are pdfbox-docs and pdfbox-examples. Take a look at
https://github.com/apache/cordova-docs for an example of that.

I haven’t thought about all potential implications and changes necessary yet
but wanted to get a first feedback about support for that idea before
putting more effort into that.

WDYT?

Maruan









Re: [DISCUSS] move documentation and examples to git

2014-09-17 Thread Tilman Hausherr

Hi Maruan,

The examples only.

With the docs I assume you mean the website. I've never touched it 
(although I might in the future), it isn't part of the project, so I 
don't mind.


Tilman

On 17.09.2014 at 19:01, Maruan Sahyoun wrote:

is that because of the examples, the docs or both?

BR

Maruan


[jira] [Commented] (PDFBOX-2355) newDocuments is private in Splitter

2014-09-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137641#comment-14137641
 ] 

ASF subversion and git services commented on PDFBOX-2355:
-

Commit 1625717 from [~jahewson] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1625717 ]

PDFBOX-2355: Refactor Splitter protected members

 newDocuments is private in Splitter
 ---

 Key: PDFBOX-2355
 URL: https://issues.apache.org/jira/browse/PDFBOX-2355
 Project: PDFBox
  Issue Type: Bug
  Components: Utilities
Affects Versions: 1.8.6
 Environment: Ubuntu 14.04, Java 8_20
Reporter: G. Ralph Kuntz
Priority: Critical
  Labels: pdfbox

 The method `createNewDocument` in `Splitter` is protected, so it can be 
 overridden, but one of the things it needs to do with the new document is add 
 it to the `newDocuments` list, which is private.





[jira] [Commented] (PDFBOX-2355) newDocuments is private in Splitter

2014-09-17 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137649#comment-14137649
 ] 

John Hewson commented on PDFBOX-2355:
-

The Splitter class has not had any attention for a while. Clearly you're the 
first person to try to use the subclassing functionality it provides, and it 
isn't actually possible. I've refactored the internals of Splitter and reduced 
its protected API to just 5 methods.

The createNewDocument() method can now simply call getSourceDocument() to 
access the source document, and can now return a PDDocument instance which will 
be added to newDocuments (now destinationDocuments) internally.

As nobody else has tried using the API in this manner, please let me know how 
well it works for you - we can incorporate any useful changes needed.
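
The shape of that refactoring can be sketched as below. This is a simplified 
model with hypothetical stand-in classes (SketchDocument, SketchSplitter, 
TaggedSplitter), not the actual Splitter source: the base class keeps the 
destination list private and adds whatever the overridable method returns, so 
a subclass never needs access to the list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a document type; not a PDFBox class.
class SketchDocument {
    final String label;

    SketchDocument(String label) {
        this.label = label;
    }
}

// Hypothetical stand-in for the refactored splitter.
class SketchSplitter {
    private final SketchDocument source;
    // Stays private: only the base class ever adds to it.
    private final List<SketchDocument> destinationDocuments = new ArrayList<>();

    SketchSplitter(SketchDocument source) {
        this.source = source;
    }

    protected final SketchDocument getSourceDocument() {
        return source;
    }

    // Extension point: build and return the new document, nothing else.
    protected SketchDocument createNewDocument() {
        return new SketchDocument(source.label + "-part");
    }

    List<SketchDocument> split(int parts) {
        for (int i = 0; i < parts; i++) {
            // The base class does the bookkeeping on behalf of subclasses.
            destinationDocuments.add(createNewDocument());
        }
        return Collections.unmodifiableList(destinationDocuments);
    }
}

// A subclass customises document creation without touching the list.
class TaggedSplitter extends SketchSplitter {
    TaggedSplitter(SketchDocument source) {
        super(source);
    }

    @Override
    protected SketchDocument createNewDocument() {
        return new SketchDocument("tagged:" + getSourceDocument().label);
    }
}
```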

 newDocuments is private in Splitter
 ---

 Key: PDFBOX-2355
 URL: https://issues.apache.org/jira/browse/PDFBOX-2355
 Project: PDFBox
  Issue Type: Bug
  Components: Utilities
Affects Versions: 1.8.6
 Environment: Ubuntu 14.04, Java 8_20
Reporter: G. Ralph Kuntz
Priority: Critical
  Labels: pdfbox
 Fix For: 2.0.0


 The method `createNewDocument` in `Splitter` is protected, so it can be 
 overridden, but one of the things it needs to do with the new document is add 
 it to the `newDocuments` list, which is private.





[jira] [Assigned] (PDFBOX-2355) newDocuments is private in Splitter

2014-09-17 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson reassigned PDFBOX-2355:
---

Assignee: John Hewson

 newDocuments is private in Splitter
 ---

 Key: PDFBOX-2355
 URL: https://issues.apache.org/jira/browse/PDFBOX-2355
 Project: PDFBox
  Issue Type: Bug
  Components: Utilities
Affects Versions: 1.8.6
 Environment: Ubuntu 14.04, Java 8_20
Reporter: G. Ralph Kuntz
Assignee: John Hewson
Priority: Critical
  Labels: pdfbox
 Fix For: 2.0.0


 The method `createNewDocument` in `Splitter` is protected, so it can be 
 overridden, but one of the things it needs to do with the new document is add 
 it to the `newDocuments` list, which is private.





[jira] [Resolved] (PDFBOX-2355) newDocuments is private in Splitter

2014-09-17 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson resolved PDFBOX-2355.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

 newDocuments is private in Splitter
 ---

 Key: PDFBOX-2355
 URL: https://issues.apache.org/jira/browse/PDFBOX-2355
 Project: PDFBox
  Issue Type: Bug
  Components: Utilities
Affects Versions: 1.8.6
 Environment: Ubuntu 14.04, Java 8_20
Reporter: G. Ralph Kuntz
Assignee: John Hewson
Priority: Critical
  Labels: pdfbox
 Fix For: 2.0.0


 The method `createNewDocument` in `Splitter` is protected, so it can be 
 overridden, but one of the things it needs to do with the new document is add 
 it to the `newDocuments` list, which is private.





[jira] [Updated] (PDFBOX-2355) newDocuments is private in Splitter

2014-09-17 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-2355:

Priority: Major  (was: Critical)

 newDocuments is private in Splitter
 ---

 Key: PDFBOX-2355
 URL: https://issues.apache.org/jira/browse/PDFBOX-2355
 Project: PDFBox
  Issue Type: Bug
  Components: Utilities
Affects Versions: 1.8.6
 Environment: Ubuntu 14.04, Java 8_20
Reporter: G. Ralph Kuntz
Assignee: John Hewson
  Labels: pdfbox
 Fix For: 2.0.0


 The method `createNewDocument` in `Splitter` is protected, so it can be 
 overridden, but one of the things it needs to do with the new document is add 
 it to the `newDocuments` list, which is private.





[jira] [Commented] (PDFBOX-2355) newDocuments is private in Splitter

2014-09-17 Thread G. Ralph Kuntz (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137658#comment-14137658
 ] 

G. Ralph Kuntz commented on PDFBOX-2355:


 Clearly you're the first person to try to use the subclassing functionality 
 it provides

I kind of figured :-)

 newDocuments is private in Splitter
 ---

 Key: PDFBOX-2355
 URL: https://issues.apache.org/jira/browse/PDFBOX-2355
 Project: PDFBox
  Issue Type: Bug
  Components: Utilities
Affects Versions: 1.8.6
 Environment: Ubuntu 14.04, Java 8_20
Reporter: G. Ralph Kuntz
Assignee: John Hewson
  Labels: pdfbox
 Fix For: 2.0.0


 The method `createNewDocument` in `Splitter` is protected, so it can be 
 overridden, but one of the things it needs to do with the new document is add 
 it to the `newDocuments` list, which is private.





[jira] [Comment Edited] (PDFBOX-2355) newDocuments is private in Splitter

2014-09-17 Thread G. Ralph Kuntz (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137658#comment-14137658
 ] 

G. Ralph Kuntz edited comment on PDFBOX-2355 at 9/17/14 6:04 PM:
-

 Clearly you're the first person to try to use the subclassing functionality 
 it provides

I kind of figured :-)

I used reflection to access the private field. I'll rewrite it properly once 
the new code is released.
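
For readers wondering what such a workaround looks like, here is a minimal 
sketch. LegacySplitter and ReflectionWorkaround are hypothetical stand-ins, 
not the reporter's code or the PDFBox class; only the field name newDocuments 
is taken from the issue.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a class with an inaccessible private list.
class LegacySplitter {
    private final List<String> newDocuments = new ArrayList<>();
}

final class ReflectionWorkaround {

    /** Reads a private List field declared directly on the target's class. */
    @SuppressWarnings("unchecked")
    static List<String> privateList(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true); // bypass the private modifier
            return (List<String>) field.get(target);
        } catch (ReflectiveOperationException e) {
            // Field missing or access denied: surface it as a programming error.
            throw new IllegalStateException("Cannot read field " + fieldName, e);
        }
    }
}
```

This kind of access is brittle: it breaks silently if the field is renamed, 
which is what happened here once newDocuments became destinationDocuments.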


was (Author: grkuntzmd):
 Clearly you're the first person to try to use the subclassing functionality 
 it provides

I kind of figured :-)

 newDocuments is private in Splitter
 ---

 Key: PDFBOX-2355
 URL: https://issues.apache.org/jira/browse/PDFBOX-2355
 Project: PDFBox
  Issue Type: Bug
  Components: Utilities
Affects Versions: 1.8.6
 Environment: Ubuntu 14.04, Java 8_20
Reporter: G. Ralph Kuntz
Assignee: John Hewson
  Labels: pdfbox
 Fix For: 2.0.0


 The method `createNewDocument` in `Splitter` is protected, so it can be 
 overridden, but one of the things it needs to do with the new document is add 
 it to the `newDocuments` list, which is private.





Jenkins build became unstable: PDFBox-trunk #1279

2014-09-17 Thread Apache Jenkins Server
See https://builds.apache.org/job/PDFBox-trunk/1279/changes



Jenkins build became unstable: PDFBox-trunk » Apache PDFBox #1279

2014-09-17 Thread Apache Jenkins Server
See 
https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$pdfbox/1279/changes



[jira] [Assigned] (PDFBOX-2358) ExternalFonts uses classloader of class in font-box

2014-09-17 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson reassigned PDFBOX-2358:
---

Assignee: John Hewson

 ExternalFonts uses classloader of class in font-box
 ---

 Key: PDFBOX-2358
 URL: https://issues.apache.org/jira/browse/PDFBOX-2358
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake
Assignee: John Hewson

 ExternalFonts loads some default fonts via the 
 org.apache.fontbox.util.ResourceLoader. That resourceloader uses its own 
 classloader (ResourceLoader.class.getClassLoader()) for loading the given 
 resource.
 The problem is that the resource is in the PDFBox project and the 
 ResourceLoader in the FontBox. In an OSGI environment this is a problem.
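
 Some background on why this matters: in OSGi each bundle normally has its
 own classloader, so a loader obtained from a class in one bundle may not see
 resources packaged in another. A minimal sketch of the alternative follows,
 where the lookup goes through the loader of a class supplied by the caller.
 ResourceLookup and isVisible are hypothetical names, not PDFBox or FontBox
 API.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; not the FontBox or PDFBox ResourceLoader.
final class ResourceLookup {

    /**
     * Checks whether `path` can be opened through the classloader that
     * loaded `owner`, i.e. the code that actually ships the resource.
     */
    static boolean isVisible(Class<?> owner, String path) {
        ClassLoader loader = owner.getClassLoader();
        if (loader == null) {
            // Classes from the bootstrap loader report null here.
            loader = ClassLoader.getSystemClassLoader();
        }
        try (InputStream in = loader.getResourceAsStream(path)) {
            return in != null;
        } catch (IOException e) {
            // Only close() can fail here; treat it as not usable.
            return false;
        }
    }
}
```

 Called with a class from the module that packages the resource, the lookup
 no longer depends on which module the helper itself lives in.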





[jira] [Comment Edited] (PDFBOX-2340) Overhaul PDFBox Documentation

2014-09-17 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137665#comment-14137665
 ] 

Tilman Hausherr edited comment on PDFBOX-2340 at 9/17/14 6:08 PM:
--

While the mockups look very nice, I'm more a content guy who doesn't care 
about looks (this only applies to software :-) ). There are two things that are 
missing in the documentation, one is sample code for rendering, the other is an 
improved text for people opening issues. Here's a text that could be merged 
with the existing text at https://pdfbox.apache.org/support.html

We want to help you. We don't respond by clicking on boilerplate texts. Solving 
your issues is what makes PDFBox better and better!

Do's:
- attach the PDF that makes trouble by using More, Attach files. 
- If your file is too large, upload it to a sharehoster, or use the PDFSplit 
application to isolate the troublesome page
- Mention the PDFBox version you are using.
- Attach the shortest possible code that reproduces the problem. Insert java 
code between \{code\}...\{code\}. Or try to reproduce the problem with the 
command line applications.
- mention what you were doing, what was the expected behaviour, and what 
happened instead
- Provide a stack trace of an exception if there is one
- Try using the non-sequential parser (loadNonSeq() instead of load(), and 
-nonSeq with the command line applications)
- Search JIRA if your problem has been mentioned before.
- Be patient: all the people here are unpaid volunteers who work for you in 
their free time

Don'ts:
- upload files to a hoster that requires registration to read the file.
- create an issue in JIRA and then go on vacation so you won't respond to our 
questions / suggestions.
- ask "how to" questions. Ask such questions on the mailing lists, on 
stackoverflow.com, and look at the sample and the test code in the sources.
- attach PDF files with confidential and/or personal data (name, DoB, bank 
data, health data, SSN) without getting permission from the client and/or the 
people mentioned on the PDF
- create issues about obsolete PDFBox versions

We can sometimes solve problems without having the PDF, but it is difficult.



was (Author: tilman):
While it looks very nice, I'm more a content guy who doesn't care about looks 
(this only applies to software :-) ). There are two things that are missing in 
the documentation, one is sample code for rendering, the other is an improved 
text for people opening issues. Here's a text that could be merged with the 
existing text at https://pdfbox.apache.org/support.html

We want to help you. We don't respond by clicking on boilerplate texts. Solving 
your issues is what makes PDFBox better and better!

Do's:
- attach the PDF that makes trouble by using More, Attach files. 
- If your file is too large, upload it to a sharehoster, or use the PDFSplit 
application to isolate the troublesome page
- Mention the PDFBox version you are using.
- Attach the shortest possible code that reproduces the problem. Insert java 
code between \{code\}...\{code\}. Or try to reproduce the problem with the 
command line applications.
- mention what you were doing, what was the expected behaviour, and what 
happened instead
- Provide a stack trace of an exception if there is one
- Try using the non-sequential parser (loadNonSeq() instead of load(), and 
-nonSeq with the command line applications)
- Search JIRA if your problem has been mentioned before.
- Be patient: all the people here are unpaid volunteers who work for you in 
their free time

Don'ts:
- upload files to a hoster that requires registration to read the file.
- create an issue in JIRA and then go on vacation so you won't respond to our 
questions / suggestions.
- ask "how to" questions. Ask such questions on the mailing lists, on 
stackoverflow.com, and look at the sample and the test code in the sources.
- attach PDF files with confidential and/or personal data (name, DoB, bank 
data, health data, SSN) without getting permission from the client and/or the 
people mentioned on the PDF
- create issues about obsolete PDFBox versions

We can sometimes solve problems without having the PDF, but it is difficult.


 Overhaul PDFBox Documentation
 -

 Key: PDFBOX-2340
 URL: https://issues.apache.org/jira/browse/PDFBOX-2340
 Project: PDFBox
  Issue Type: Improvement
  Components: Documentation
Reporter: Maruan Sahyoun
 Attachments: Mockup-20140912.png, Mockup_Documentation.png


 In order to make it easier for users of PDFBox to work with the library there 
 shall be an enhanced documentation consisting of an introduction, API 
 references and more well documented examples and code snippets (Cookbook).
 In order to make it easier to contribute the Cookbook shall be built 
 automatically from 

[jira] [Commented] (PDFBOX-2340) Overhaul PDFBox Documentation

2014-09-17 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137665#comment-14137665
 ] 

Tilman Hausherr commented on PDFBOX-2340:
-

While it looks very nice, I'm more a content guy who doesn't care about looks 
(this only applies to software :-) ). There are two things that are missing in 
the documentation, one is sample code for rendering, the other is an improved 
text for people opening issues. Here's a text that could be merged with the 
existing text at https://pdfbox.apache.org/support.html

We want to help you. We don't respond by clicking on boilerplate texts. Solving 
your issues is what makes PDFBox better and better!

Do's:
- attach the PDF that makes trouble by using More, Attach files. 
- If your file is too large, upload it to a sharehoster, or use the PDFSplit 
application to isolate the troublesome page
- Mention the PDFBox version you are using.
- Attach the shortest possible code that reproduces the problem. Insert java 
code between \{code\}...\{code\}. Or try to reproduce the problem with the 
command line applications.
- mention what you were doing, what was the expected behaviour, and what 
happened instead
- Provide a stack trace of an exception if there is one
- Try using the non-sequential parser (loadNonSeq() instead of load(), and 
-nonSeq with the command line applications)
- Search JIRA if your problem has been mentioned before.
- Be patient: all the people here are unpaid volunteers who work for you in 
their free time

Don'ts:
- upload files to a hoster that requires registration to read the file.
- create an issue in JIRA and then go on vacation so you won't respond to our 
questions / suggestions.
- ask "how to" questions. Ask such questions on the mailing lists, on 
stackoverflow.com, and look at the sample and the test code in the sources.
- attach PDF files with confidential and/or personal data (name, DoB, bank 
data, health data, SSN) without getting permission from the client and/or the 
people mentioned on the PDF
- create issues about obsolete PDFBox versions

We can sometimes solve problems without having the PDF, but it is difficult.


 Overhaul PDFBox Documentation
 -

 Key: PDFBOX-2340
 URL: https://issues.apache.org/jira/browse/PDFBOX-2340
 Project: PDFBox
  Issue Type: Improvement
  Components: Documentation
Reporter: Maruan Sahyoun
 Attachments: Mockup-20140912.png, Mockup_Documentation.png


 In order to make it easier for users of PDFBox to work with the library there 
 shall be an enhanced documentation consisting of an introduction, API 
 references and more well documented examples and code snippets (Cookbook).
 In order to make it easier to contribute the Cookbook shall be built 
 automatically from the examples/snippet ‚repository‘.





[jira] [Comment Edited] (PDFBOX-2358) ExternalFonts uses classloader of class in font-box

2014-09-17 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137721#comment-14137721
 ] 

John Hewson edited comment on PDFBOX-2358 at 9/17/14 6:40 PM:
--

The use of FontBox's ResourceLoader in ExternalFonts is an IDE autocomplete 
accident. PDFBox has its own ResourceLoader class which should have been used.

The way that CMapParser uses the FontBox ResourceLoader is more troubling. As 
you say, passing resource paths from PDFBox to FontBox is not OSGI friendly. 
The solution would seem to be to move the cmap resource files to FontBox, 
because they are not PDF-specific.


was (Author: jahewson):
The use of FontBox's ResourceLoader in ExternalFonts is an IDE autocomplete 
accident. PDFBox has its own ResourceLoader class which should have been used.

The way that CMapParser uses the FontBox ResourceLoader is more troubling, 
passing resource paths from PDFBox to FontBox is not OSGI friendly. The 
solution would seem to be to move the cmap resource files to FontBox, because 
they are not PDF-specific.

 ExternalFonts uses classloader of class in font-box
 ---

 Key: PDFBOX-2358
 URL: https://issues.apache.org/jira/browse/PDFBOX-2358
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake
Assignee: John Hewson

 ExternalFonts loads some default fonts via the 
 org.apache.fontbox.util.ResourceLoader. That resourceloader uses its own 
 classloader (ResourceLoader.class.getClassLoader()) for loading the given 
 resource.
 The problem is that the resource is in the PDFBox project and the 
 ResourceLoader in the FontBox. In an OSGI environment this is a problem.





[jira] [Commented] (PDFBOX-2358) ExternalFonts uses classloader of class in font-box

2014-09-17 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137721#comment-14137721
 ] 

John Hewson commented on PDFBOX-2358:
-

The use of FontBox's ResourceLoader in ExternalFonts is an IDE autocomplete 
accident. PDFBox has its own ResourceLoader class which should have been used.

The way that CMapParser uses the FontBox ResourceLoader is more troubling, 
passing resource paths from PDFBox to FontBox is not OSGI friendly. The 
solution would seem to be to move the cmap resource files to FontBox, because 
they are not PDF-specific.

 ExternalFonts uses classloader of class in font-box
 ---

 Key: PDFBOX-2358
 URL: https://issues.apache.org/jira/browse/PDFBOX-2358
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake
Assignee: John Hewson

 ExternalFonts loads some default fonts via the 
 org.apache.fontbox.util.ResourceLoader. That resourceloader uses its own 
 classloader (ResourceLoader.class.getClassLoader()) for loading the given 
 resource.
 The problem is that the resource is in the PDFBox project and the 
 ResourceLoader in the FontBox. In an OSGI environment this is a problem.





[jira] [Comment Edited] (PDFBOX-2358) ExternalFonts uses classloader of class in font-box

2014-09-17 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137721#comment-14137721
 ] 

John Hewson edited comment on PDFBOX-2358 at 9/17/14 6:40 PM:
--

The use of FontBox's ResourceLoader in ExternalFonts is an IDE autocomplete 
accident. PDFBox has its own ResourceLoader class which should have been used.

The way that CMapParser uses the FontBox ResourceLoader is more troubling. As 
you say, passing resource paths from PDFBox to FontBox is not OSGI friendly. 
The solution would seem to be to move the cmap resource files from PDFBox to 
FontBox, because they are not PDF-specific.


was (Author: jahewson):
The use of FontBox's ResourceLoader in ExternalFonts is an IDE autocomplete 
accident. PDFBox has its own ResourceLoader class which should have been used.

The way that CMapParser uses the FontBox ResourceLoader is more troubling. As 
you say, passing resource paths from PDFBox to FontBox is not OSGI friendly. 
The solution would seem to be to move the cmap resource files to FontBox, 
because they are not PDF-specific.

 ExternalFonts uses classloader of class in font-box
 ---

 Key: PDFBOX-2358
 URL: https://issues.apache.org/jira/browse/PDFBOX-2358
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake
Assignee: John Hewson

 ExternalFonts loads some default fonts via the 
 org.apache.fontbox.util.ResourceLoader. That resourceloader uses its own 
 classloader (ResourceLoader.class.getClassLoader()) for loading the given 
 resource.
 The problem is that the resource is in the PDFBox project and the 
 ResourceLoader in the FontBox. In an OSGI environment this is a problem.





[jira] [Comment Edited] (PDFBOX-2358) ExternalFonts uses classloader of class in font-box

2014-09-17 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137721#comment-14137721
 ] 

John Hewson edited comment on PDFBOX-2358 at 9/17/14 6:41 PM:
--

The use of FontBox's ResourceLoader in ExternalFonts is an IDE autocomplete 
accident. PDFBox has its own ResourceLoader class which should have been used.

The way that CMapParser uses the FontBox ResourceLoader is more troubling. As 
you say, passing resource paths from PDFBox to FontBox is not OSGI friendly. 
The solution would seem to be to move the cmap resource files from PDFBox to 
FontBox, because they are not PDF-specific, they are more properly part of 
Adobe's CIDFont system.


was (Author: jahewson):
The use of FontBox's ResourceLoader in ExternalFonts is an IDE autocomplete 
accident. PDFBox has its own ResourceLoader class which should have been used.

The way that CMapParser uses the FontBox ResourceLoader is more troubling. As 
you say, passing resource paths from PDFBox to FontBox is not OSGI friendly. 
The solution would seem to be to move the cmap resource files from PDFBox to 
FontBox, because they are not PDF-specific.

 ExternalFonts uses classloader of class in font-box
 ---

 Key: PDFBOX-2358
 URL: https://issues.apache.org/jira/browse/PDFBOX-2358
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake
Assignee: John Hewson

 ExternalFonts loads some default fonts via the 
 org.apache.fontbox.util.ResourceLoader. That resourceloader uses its own 
 classloader (ResourceLoader.class.getClassLoader()) for loading the given 
 resource.
 The problem is that the resource is in the PDFBox project and the 
 ResourceLoader in the FontBox. In an OSGI environment this is a problem.





[jira] [Commented] (PDFBOX-2306) Error reading stream, expected='endstream' actual='endobj'

2014-09-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137724#comment-14137724
 ] 

ASF subversion and git services commented on PDFBOX-2306:
-

Commit 1625736 from [~tilman] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1625736 ]

PDFBOX-2306: be lenient, allow stream to end with endobj

 Error reading stream, expected='endstream' actual='endobj'
 --

 Key: PDFBOX-2306
 URL: https://issues.apache.org/jira/browse/PDFBOX-2306
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
 Fix For: 2.0.0


 I get this exception with the file of PDFBOX-269:
 {code}
 java.io.IOException: Error reading stream, expected='endstream' actual='endobj' at offset 183468
   at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseCOSStream(NonSequentialPDFParser.java:1578)
   at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseObjectDynamically(NonSequentialPDFParser.java:1249)
   at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseObjectDynamically(NonSequentialPDFParser.java:1176)
   at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseDictObjects(NonSequentialPDFParser.java:1152)
   at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:487)
   at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:755)
   at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1155)
   at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1138)
 {code}
 The cause is that a stream ends with endobj instead of endstream. This is 
 accepted in the non-sequential parser in readUntilEndStream(), but rejected 
 later in parseCOSStream(). It is a problem that was fixed in the old parser 
 many years ago. My fix is for the sequential parser. I also changed a 
 misleading error message nearby.
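
 The leniency described above could be sketched roughly like this. It is an 
 illustration only, not the committed parser change: StreamEndCheck, 
 closesStream() and the lenient flag are hypothetical names.

```java
// Hypothetical sketch (not PDFBox code): decide whether the keyword read
// after the stream data is allowed to terminate the stream.
final class StreamEndCheck {
    static final String ENDSTREAM = "endstream";
    static final String ENDOBJ = "endobj";

    private StreamEndCheck() {}

    /**
     * A strict reader accepts only "endstream". A lenient reader also accepts
     * "endobj", which some broken files write instead.
     */
    static boolean closesStream(String keyword, boolean lenient) {
        if (ENDSTREAM.equals(keyword)) {
            return true;
        }
        return lenient && ENDOBJ.equals(keyword);
    }
}
```

 With lenient set to false the file from this issue would still be rejected, 
 which matches the exception shown above.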





[jira] [Updated] (PDFBOX-2306) Error reading stream, expected='endstream' actual='endobj'

2014-09-17 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2306:

Affects Version/s: 1.8.8
   1.8.7
   1.8.6

 Error reading stream, expected='endstream' actual='endobj'
 --

 Key: PDFBOX-2306
 URL: https://issues.apache.org/jira/browse/PDFBOX-2306
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
 Fix For: 1.8.8, 2.0.0




[jira] [Resolved] (PDFBOX-2306) Error reading stream, expected='endstream' actual='endobj'

2014-09-17 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-2306.
-
Resolution: Fixed

 Error reading stream, expected='endstream' actual='endobj'
 --

 Key: PDFBOX-2306
 URL: https://issues.apache.org/jira/browse/PDFBOX-2306
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
 Fix For: 1.8.8, 2.0.0




[jira] [Updated] (PDFBOX-2296) Wrong stream length used for truetype font

2014-09-17 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2296:

Affects Version/s: 1.8.8
   1.8.7
   1.8.6

 Wrong stream length used for truetype font
 --

 Key: PDFBOX-2296
 URL: https://issues.apache.org/jira/browse/PDFBOX-2296
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
 Fix For: 1.8.8, 2.0.0


 The file from PDFBOX-2048 has a wrongly encoded font stream length: it is 4412 
 in the PDF, but the real length is about 27350. This wrong length is used to 
 read the encoded font stream, which results in further trouble (EOF).
 The problem is that the wrong length is passed to createFilteredStream() 
 instead of just calling it without parameters. In COSStream.doDecode(), 
 unFilteredStream = filteredStream (there is a FIXME there!!!), and in 
 doDecode(COSName filterName, int filterIndex) unFilteredStream.getLength() is 
 used, which returns the expectedLength.
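
 The general idea of not trusting a declared stream length can be sketched as 
 below. This is not the PDFBox implementation: StreamLengthSketch and its 
 methods are hypothetical, and scanning for the keyword is a simplification 
 that would be fooled by binary data containing the bytes "endstream".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch (not PDFBox code): verify a declared stream length and
// fall back to scanning for the "endstream" keyword when it is wrong.
final class StreamLengthSketch {
    private static final byte[] ENDSTREAM =
            "endstream".getBytes(StandardCharsets.US_ASCII);

    private StreamLengthSketch() {}

    /** True when the keyword starts exactly at pos. */
    static boolean matchesAt(byte[] data, int pos) {
        if (pos < 0 || pos + ENDSTREAM.length > data.length) {
            return false;
        }
        for (int j = 0; j < ENDSTREAM.length; j++) {
            if (data[pos + j] != ENDSTREAM[j]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns declaredLength if "endstream" follows it (after optional CR/LF),
     * otherwise the length found by scanning from start (one trailing EOL
     * stripped), or -1 if no keyword is found. Assumes 0 <= start.
     */
    static int effectiveLength(byte[] data, int start, int declaredLength) {
        long declaredEnd = (long) start + declaredLength;
        if (declaredLength >= 0 && declaredEnd <= data.length) {
            int p = (int) declaredEnd;
            while (p < data.length && (data[p] == '\r' || data[p] == '\n')) {
                p++;
            }
            if (matchesAt(data, p)) {
                return declaredLength;
            }
        }
        for (int i = start; i < data.length; i++) {
            if (matchesAt(data, i)) {
                int len = i - start;
                if (len > 0 && data[start + len - 1] == '\n') {
                    len--;
                }
                if (len > 0 && data[start + len - 1] == '\r') {
                    len--;
                }
                return len;
            }
        }
        return -1;
    }
}
```

 For a file like the one in this issue, the declared value (4412) would fail 
 the check and the scan result would be used instead.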





[jira] [Commented] (PDFBOX-2296) Wrong stream length used for truetype font

2014-09-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137734#comment-14137734
 ] 

ASF subversion and git services commented on PDFBOX-2296:
-

Commit 1625743 from [~tilman] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1625743 ]

PDFBOX-2296: don't call createFilteredStream() with an expected length if we 
know that length is wrong

 Wrong stream length used for truetype font
 --

 Key: PDFBOX-2296
 URL: https://issues.apache.org/jira/browse/PDFBOX-2296
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
 Fix For: 1.8.8, 2.0.0




[jira] [Updated] (PDFBOX-2296) Wrong stream length used for truetype font

2014-09-17 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2296:

Fix Version/s: 2.0.0
   1.8.8

 Wrong stream length used for truetype font
 --

 Key: PDFBOX-2296
 URL: https://issues.apache.org/jira/browse/PDFBOX-2296
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
 Fix For: 1.8.8, 2.0.0




[jira] [Updated] (PDFBOX-2296) Wrong stream length used for truetype font

2014-09-17 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2296:

Assignee: (was: Tilman Hausherr)

 Wrong stream length used for truetype font
 --

 Key: PDFBOX-2296
 URL: https://issues.apache.org/jira/browse/PDFBOX-2296
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0
Reporter: Tilman Hausherr
 Fix For: 1.8.8, 2.0.0




[jira] [Commented] (PDFBOX-2320) IOException: Could not read embedded TTF for font TimesNewRoman

2014-09-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137790#comment-14137790
 ] 

ASF subversion and git services commented on PDFBOX-2320:
-

Commit 1625776 from [~tilman] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1625776 ]

PDFBOX-2320: use readUntilEndStream from BaseParser, remove the method from 
NonSequentialParser; better log output

 IOException: Could not read embedded TTF for font TimesNewRoman
 ---

 Key: PDFBOX-2320
 URL: https://issues.apache.org/jira/browse/PDFBOX-2320
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 2.0.0
Reporter: simon steiner
Assignee: Tilman Hausherr
 Attachments: Stream-1410081173856.txt



[jira] [Updated] (PDFBOX-2320) IOException: Could not read embedded TTF for font TimesNewRoman

2014-09-17 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2320:

Affects Version/s: 1.8.8
   1.8.7
   1.8.6

 IOException: Could not read embedded TTF for font TimesNewRoman
 ---

 Key: PDFBOX-2320
 URL: https://issues.apache.org/jira/browse/PDFBOX-2320
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0
Reporter: simon steiner
Assignee: Tilman Hausherr
 Attachments: Stream-1410081173856.txt


 http://svn.apache.org/viewvc/incubator/pdfbox/trunk/test/input/TEST_SetCharSpacing_Error.pdf?revision=682412&view=co&pathrev=793348
 java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar PDFToImage -nonSeq TEST_SetCharSpacing_Error.pdf
 {code}
 Sep 5, 2014 10:56:40 AM org.apache.pdfbox.pdfparser.NonSequentialPDFParser validateStreamLength
 SEVERE: The end of the stream doesn't point to the correct offset, using workaround to read the stream
 Sep 5, 2014 10:56:40 AM org.apache.pdfbox.pdfparser.NonSequentialPDFParser validateStreamLength
 SEVERE: The end of the stream doesn't point to the correct offset, using workaround to read the stream
 Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode
 SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
 Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode
 SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
 Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode
 SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
 Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode
 SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
 Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode
 SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
 Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode
 SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
 Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode
 SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
 Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode
 SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
 Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode
 SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
 Sep 5, 2014 10:56:40 AM org.apache.pdfbox.filter.FlateFilter decode
 SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
 Exception in thread "main" java.io.IOException: Could not read embedded TTF for font TimesNewRoman
   at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.<init>(PDTrueTypeFont.java:116)
   at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:73)
   at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:171)
   at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:543)
   at org.apache.pdfbox.util.operator.text.SetTextFont.process(SetTextFont.java:48)
   at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:510)
   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:275)
   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:240)
   at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:194)
   at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:176)
   at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:228)
   at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:160)
   at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:109)
   at org.apache.pdfbox.tools.PDFToImage.main(PDFToImage.java:265)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
 Caused by: java.io.IOException: java.util.zip.DataFormatException: invalid block type
   at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83)
   at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:365)
   at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:286)
   at 
 

[jira] [Commented] (PDFBOX-2320) IOException: Could not read embedded TTF for font TimesNewRoman

2014-09-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137793#comment-14137793
 ] 

ASF subversion and git services commented on PDFBOX-2320:
-

Commit 1625777 from [~tilman] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1625777 ]

PDFBOX-2320: set readUntilEndStream from BaseParser to protected to allow 
access from nonseq parser

 IOException: Could not read embedded TTF for font TimesNewRoman
 ---

 Key: PDFBOX-2320
 URL: https://issues.apache.org/jira/browse/PDFBOX-2320
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0
Reporter: simon steiner
Assignee: Tilman Hausherr
 Attachments: Stream-1410081173856.txt



[jira] [Resolved] (PDFBOX-2320) IOException: Could not read embedded TTF for font TimesNewRoman

2014-09-17 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-2320.
-
   Resolution: Fixed
Fix Version/s: 2.0.0
   1.8.8

 IOException: Could not read embedded TTF for font TimesNewRoman
 ---

 Key: PDFBOX-2320
 URL: https://issues.apache.org/jira/browse/PDFBOX-2320
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0
Reporter: simon steiner
Assignee: Tilman Hausherr
 Fix For: 1.8.8, 2.0.0

 Attachments: Stream-1410081173856.txt



[jira] [Updated] (PDFBOX-2332) Error reading stream, expected='endstream' actual='endstream8' at offset 1993

2014-09-17 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2332:

Affects Version/s: 1.8.8
   1.8.7
   1.8.6

 Error reading stream, expected='endstream' actual='endstream8' at offset 1993
 -

 Key: PDFBOX-2332
 URL: https://issues.apache.org/jira/browse/PDFBOX-2332
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0
Reporter: simon steiner
Assignee: Tilman Hausherr
 Fix For: 2.0.0


 PDF from PDFBOX-195
 java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar PDFToImage -nonSeq test_bad.pdf
 Exception in thread "main" java.io.IOException: Error reading stream, expected='endstream' actual='endstream8' at offset 1993
   at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseCOSStream(NonSequentialPDFParser.java:1576)





[jira] [Resolved] (PDFBOX-2332) Error reading stream, expected='endstream' actual='endstream8' at offset 1993

2014-09-17 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-2332.
-
   Resolution: Fixed
Fix Version/s: 1.8.8

 Error reading stream, expected='endstream' actual='endstream8' at offset 1993
 -

 Key: PDFBOX-2332
 URL: https://issues.apache.org/jira/browse/PDFBOX-2332
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0
Reporter: simon steiner
Assignee: Tilman Hausherr
 Fix For: 1.8.8, 2.0.0




[jira] [Commented] (PDFBOX-2332) Error reading stream, expected='endstream' actual='endstream8' at offset 1993

2014-09-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137798#comment-14137798
 ] 

ASF subversion and git services commented on PDFBOX-2332:
-

Commit 1625778 from [~tilman] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1625778 ]

PDFBOX-2332: allow missing space characters after endstream in the 
non-sequential parser, e.g. endstream8 0 obj

 Error reading stream, expected='endstream' actual='endstream8' at offset 1993
 -

 Key: PDFBOX-2332
 URL: https://issues.apache.org/jira/browse/PDFBOX-2332
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0
Reporter: simon steiner
Assignee: Tilman Hausherr
 Fix For: 1.8.8, 2.0.0


 PDF from PDFBOX-195
 java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar PDFToImage -nonSeq test_bad.pdf
 Exception in thread "main" java.io.IOException: Error reading stream, expected='endstream' actual='endstream8' at offset 1993
   at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseCOSStream(NonSequentialPDFParser.java:1576)
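
The lenient matching described in the commit message can be pictured roughly as follows. This is only an illustrative sketch, not the code from r1625778: the names EndstreamKeyword, matchesStrict, matchesLenient and trailing are made up for this example, and the real parser works on raw bytes rather than on ready-made string tokens.

```java
// Illustrative sketch only; all names are hypothetical and this is not PDFBox code.
final class EndstreamKeyword {
    private static final String ENDSTREAM = "endstream";

    /** Strict check: the token must be exactly "endstream". */
    static boolean matchesStrict(String token) {
        return ENDSTREAM.equals(token);
    }

    /** Lenient check: also accept the keyword when the next token is glued onto it. */
    static boolean matchesLenient(String token) {
        return token != null && token.startsWith(ENDSTREAM);
    }

    /** Returns whatever follows the keyword, e.g. "8" for "endstream8". */
    static String trailing(String token) {
        if (!matchesLenient(token)) {
            throw new IllegalArgumentException("Not an endstream token: " + token);
        }
        return token.substring(ENDSTREAM.length());
    }
}
```

With the token from this report, matchesStrict("endstream8") is false while matchesLenient("endstream8") is true, and trailing("endstream8") hands back the "8" so that it can still be read as the start of the following "8 0 obj".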





[jira] [Commented] (PDFBOX-2342) WriteDecodedDoc cant decrypt pdf form correctly

2014-09-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137838#comment-14137838
 ] 

ASF subversion and git services commented on PDFBOX-2342:
-

Commit 1625779 from [~tilman] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1625779 ]

PDFBOX-2342: decrypt COSArray too, not just COSString

 WriteDecodedDoc cant decrypt pdf form correctly
 ---

 Key: PDFBOX-2342
 URL: https://issues.apache.org/jira/browse/PDFBOX-2342
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 2.0.0
Reporter: simon steiner
 Attachments: test.pdf


 java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar 
 WriteDecodedDoc -nonSeq test.pdf
 country selection is wrong





[jira] [Commented] (PDFBOX-2342) WriteDecodedDoc cant decrypt pdf form correctly

2014-09-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137841#comment-14137841
 ] 

ASF subversion and git services commented on PDFBOX-2342:
-

Commit 1625780 from [~tilman] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1625780 ]

PDFBOX-2342: allow public access to decryptArray

 WriteDecodedDoc cant decrypt pdf form correctly
 ---

 Key: PDFBOX-2342
 URL: https://issues.apache.org/jira/browse/PDFBOX-2342
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 2.0.0
Reporter: simon steiner
 Attachments: test.pdf


 java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar 
 WriteDecodedDoc -nonSeq test.pdf
 country selection is wrong





[jira] [Updated] (PDFBOX-2342) WriteDecodedDoc cant decrypt pdf form correctly

2014-09-17 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2342:

Affects Version/s: 1.8.8
   1.8.7
   1.8.6

 WriteDecodedDoc cant decrypt pdf form correctly
 ---

 Key: PDFBOX-2342
 URL: https://issues.apache.org/jira/browse/PDFBOX-2342
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0
Reporter: simon steiner
 Attachments: test.pdf


 java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar 
 WriteDecodedDoc -nonSeq test.pdf
 country selection is wrong





[jira] [Assigned] (PDFBOX-2342) WriteDecodedDoc cant decrypt pdf form correctly

2014-09-17 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr reassigned PDFBOX-2342:
---

Assignee: Tilman Hausherr

 WriteDecodedDoc cant decrypt pdf form correctly
 ---

 Key: PDFBOX-2342
 URL: https://issues.apache.org/jira/browse/PDFBOX-2342
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0
Reporter: simon steiner
Assignee: Tilman Hausherr
 Attachments: test.pdf


 java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar 
 WriteDecodedDoc -nonSeq test.pdf
 country selection is wrong





[jira] [Resolved] (PDFBOX-2342) WriteDecodedDoc cant decrypt pdf form correctly

2014-09-17 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-2342.
-
   Resolution: Fixed
Fix Version/s: 2.0.0
   1.8.8

 WriteDecodedDoc cant decrypt pdf form correctly
 ---

 Key: PDFBOX-2342
 URL: https://issues.apache.org/jira/browse/PDFBOX-2342
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0
Reporter: simon steiner
Assignee: Tilman Hausherr
 Fix For: 1.8.8, 2.0.0

 Attachments: test.pdf


 java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar 
 WriteDecodedDoc -nonSeq test.pdf
 country selection is wrong





[jira] [Comment Edited] (PDFBOX-2350) Type1 Parser hangs indefinitely

2014-09-17 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135708#comment-14135708
 ] 

Tilman Hausherr edited comment on PDFBOX-2350 at 9/17/14 7:47 PM:
--

Please also try
{code}
PDDocument.loadNonSeq(new File(pdfFilename), "");
{code}
which does the decryption if needed.

Also, the correct way to decrypt with the old parser is
{code}
if (document.isEncrypted())
{
    try
    {
        // empty password
        StandardDecryptionMaterial sdm = new StandardDecryptionMaterial("");
        document.openProtection(sdm);
    }
    catch (InvalidPasswordException e)
    {
        System.err.println("Error: The document is encrypted.");
    }
}
{code}

I'm not saying that this will solve your problems but it is worth a try.

If it still doesn't work, please change the code so that parseBinary() saves 
the decrypted byte array into a file (below line 448), and attach that file 
here.

{code}
FileOutputStream fos = new FileOutputStream(new File("font-" + System.currentTimeMillis() + ".txt"));
fos.write(decrypted);
fos.close();
{code}
Unless that font is also confidential :-(

Another thing to try if you are using Windows: download and use qpdf to 
uncompress the file and see if there are any error messages.

qpdf --stream-data=uncompress file.pdf fileU.pdf


was (Author: tilman):
Please also try
{code}
PDDocument.loadNonSeq(new File(pdfFilename), "");
{code}
which does the decryption if needed.

Also, the correct way to decrypt with the old parser is
{code}
if (document.isEncrypted())
{
    try
    {
        // empty password
        StandardDecryptionMaterial sdm = new StandardDecryptionMaterial("");
        document.openProtection(sdm);
    }
    catch (InvalidPasswordException e)
    {
        System.err.println("Error: The document is encrypted.");
    }
}
{code}

I'm not saying that this will solve your problems but it is worth a try.

If it still doesn't work, please save the decrypted byte array (in the 
parseBinary method) to a file and post it here.

 Type1 Parser hangs indefinitely
 ---

 Key: PDFBOX-2350
 URL: https://issues.apache.org/jira/browse/PDFBOX-2350
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 2.0.0
 Environment: Windows 7, JDK 1.7.0_51-b13
Reporter: Daniel Scheibe

 When rendering the first page of my pdf document the Type1Parser 
 (org.apache.fontbox.type1.Type1Parser) hangs in a loop in 
 {{parseBinary(byte[] bytes) throws IOException}}
 and kills our rendering pipeline. Please find the loop that hangs below:
 // find /Private dict
 while (!lexer.peekToken().getText().equals("Private"))
 {
     lexer.nextToken();
 }
 There is never a token named Private in the list of returned tokens 
 (they are empty all the time).  
 Furthermore, going deeper into the source code, it seems that the class reading 
 the tokens (Type1Lexer) never advances the buffer position and always 
 returns an empty name token in the readToken(Token prevToken) method.
 Looking at the decrypted buffer, I cannot get anything useful out of it based 
 on my current understanding.
 Unfortunately I cannot provide the PDF in question as it contains confidential 
 data.
 Acrobat Reader XI Version 11.0.08 renders the document just fine.
 In addition, it seems the PDF was encrypted (40-bit RC4) with an empty 
 password, and it reports PDF version 1.5.
 Does this provide enough information, or can I do anything else to help 
 nail this one down?
 I guess this might be a PDF document structure/feature that is not yet 
 completely supported, but PDFBox should at least throw an exception instead of 
 failing silently...
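
The request above (throw an exception rather than spin forever) could be met by a scan loop along the following lines. This is a minimal sketch under stated assumptions, not the Type1Parser code: TokenSource, ArrayTokenSource and GuardedScan are hypothetical names, the token source is assumed to return null at end of data, and unlike the peek-based loop quoted above this version consumes the token it is looking for.

```java
import java.io.IOException;

/** Hypothetical token source; next() returns null at end of data. */
interface TokenSource {
    String next();

    int position();
}

/** Hypothetical array-backed source, used only to exercise the sketch. */
final class ArrayTokenSource implements TokenSource {
    private final String[] tokens;
    private int index;

    ArrayTokenSource(String... tokens) {
        this.tokens = tokens;
    }

    @Override
    public String next() {
        return index < tokens.length ? tokens[index++] : null;
    }

    @Override
    public int position() {
        return index;
    }
}

final class GuardedScan {
    /** Consumes tokens up to and including `wanted`; fails instead of looping forever. */
    static void skipUntil(TokenSource source, String wanted) throws IOException {
        int last = source.position();
        while (true) {
            String token = source.next();
            if (token == null) {
                throw new IOException("Reached end of data before '" + wanted + "'");
            }
            if (wanted.equals(token)) {
                return;
            }
            int now = source.position();
            if (now <= last) {
                // The source keeps returning tokens without moving forward.
                throw new IOException("Lexer made no progress at position " + now);
            }
            last = now;
        }
    }
}
```

The two guards cover the two ways such a loop can fail to terminate: the data ends without the wanted token, or the lexer keeps returning (empty) tokens without advancing, which is the behaviour described in this report.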





Build failed in Jenkins: PDFBox 1.8.x (JDK7) #82

2014-09-17 Thread Apache Jenkins Server
See https://builds.apache.org/job/PDFBox%201.8.x%20(JDK7)/82/changes

Changes:

[tilman] PDFBOX-2342: allow public access to decryptArray

[tilman] PDFBOX-2342: decrypt COSArray too, not just COSString

[tilman] PDFBOX-2332: allow missing space characters after endstream in 
non-sequential parser, e.g. endstream8 0 obj

[tilman] PDFBOX-2320: set readUntilEndStream from BaseParser to protected to 
allow access from nonseq parser

[tilman] PDFBOX-2320: use readUntilEndStream from BaseParser, remove the method 
from NonSequentialParser; better log output

--
[...truncated 295 lines...]
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.03 sec
Running org.apache.xmpbox.type.TestResourceEventType
Tests run: 24, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.083 sec
Running org.apache.xmpbox.type.TestSimpleMetadataProperties
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.032 sec
Running org.apache.xmpbox.type.TestVersionType
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.017 sec
Running org.apache.xmpbox.type.AttributeTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 sec
Running org.apache.xmpbox.schema.PDFAIdentificationTest
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.014 sec
Running org.apache.xmpbox.schema.XMPBasicTest
Tests run: 22, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.069 sec
Running org.apache.xmpbox.schema.PDFAIdentificationOthersTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.006 sec
Running org.apache.xmpbox.schema.DublinCoreTest
Tests run: 60, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.198 sec
Running org.apache.xmpbox.schema.XMPMediaManagementTest
Tests run: 22, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.075 sec
Running org.apache.xmpbox.schema.XMPSchemaTest
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.028 sec
Running org.apache.xmpbox.schema.AdobePDFErrorsTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.012 sec
Running org.apache.xmpbox.schema.BasicJobTicketSchemaTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.062 sec
Running org.apache.xmpbox.schema.PhotoshopSchemaTest
Tests run: 90, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.262 sec
Running org.apache.xmpbox.schema.XmpRightsSchemaTest
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.025 sec
Running org.apache.xmpbox.schema.AdobePDFTest
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.027 sec
Running org.apache.xmpbox.TestXMPWithDefinedSchemas
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.056 sec
Running org.apache.xmpbox.parser.DeserializationTest
Tests run: 17, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.095 sec
Running org.apache.xmpbox.DoubleSameTypeSchemaTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 sec
Running org.apache.xmpbox.SaveMetadataHelperTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 sec

Results :

Tests run: 424, Failures: 0, Errors: 0, Skipped: 0

[JENKINS] Recording test results
[INFO] 
[INFO] --- maven-bundle-plugin:2.3.7:bundle (default-bundle) @ xmpbox ---
[INFO] 
[INFO] --- maven-site-plugin:3.0:attach-descriptor (attach-descriptor) @ xmpbox 
---
[INFO] 
[INFO] --- apache-rat-plugin:0.6:check (default) @ xmpbox ---
[INFO] Exclude: release.properties
[INFO] 
[INFO] --- maven-install-plugin:2.3.1:install (default-install) @ xmpbox ---
[INFO] Installing 
https://builds.apache.org/job/PDFBox%201.8.x%20(JDK7)/ws/1.8/xmpbox/target/xmpbox-1.8.8-SNAPSHOT.jar
 to 
/home/jenkins/jenkins-slave/maven-repositories/0/org/apache/pdfbox/xmpbox/1.8.8-SNAPSHOT/xmpbox-1.8.8-SNAPSHOT.jar
[INFO] Installing 
https://builds.apache.org/job/PDFBox%201.8.x%20(JDK7)/ws/1.8/xmpbox/pom.xml 
to 
/home/jenkins/jenkins-slave/maven-repositories/0/org/apache/pdfbox/xmpbox/1.8.8-SNAPSHOT/xmpbox-1.8.8-SNAPSHOT.pom
[INFO] 
[INFO] --- maven-bundle-plugin:2.3.7:install (default-install) @ xmpbox ---
[INFO] Installing 
org/apache/pdfbox/xmpbox/1.8.8-SNAPSHOT/xmpbox-1.8.8-SNAPSHOT.jar
[INFO] Writing OBR metadata
[INFO] 
[INFO] --- maven-deploy-plugin:2.6:deploy (default-deploy) @ xmpbox ---
Downloading: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/xmpbox/1.8.8-SNAPSHOT/maven-metadata.xml
Downloaded: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/xmpbox/1.8.8-SNAPSHOT/maven-metadata.xml
 (773 B at 3.3 KB/sec)
Uploading: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/xmpbox/1.8.8-SNAPSHOT/xmpbox-1.8.8-20140917.195426-5.jar
Uploaded: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/xmpbox/1.8.8-SNAPSHOT/xmpbox-1.8.8-20140917.195426-5.jar
 (112 KB at 405.6 KB/sec)
Uploading: 

[jira] [Commented] (PDFBOX-2342) WriteDecodedDoc cant decrypt pdf form correctly

2014-09-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137862#comment-14137862
 ] 

ASF subversion and git services commented on PDFBOX-2342:
-

Commit 1625791 from [~tilman] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1625791 ]

PDFBOX-2342: catch CryptographyException like it is done elsewhere

 WriteDecodedDoc cant decrypt pdf form correctly
 ---

 Key: PDFBOX-2342
 URL: https://issues.apache.org/jira/browse/PDFBOX-2342
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0
Reporter: simon steiner
Assignee: Tilman Hausherr
 Fix For: 1.8.8, 2.0.0

 Attachments: test.pdf


 java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar 
 WriteDecodedDoc -nonSeq test.pdf
 country selection is wrong





[jira] [Commented] (PDFBOX-2296) Wrong stream length used for truetype font

2014-09-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137956#comment-14137956
 ] 

ASF subversion and git services commented on PDFBOX-2296:
-

Commit 1625817 from [~tilman] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1625817 ]

PDFBOX-2296: fix documentation

 Wrong stream length used for truetype font
 --

 Key: PDFBOX-2296
 URL: https://issues.apache.org/jira/browse/PDFBOX-2296
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 1.8.6, 1.8.7, 1.8.8, 2.0.0
Reporter: Tilman Hausherr
 Fix For: 1.8.8, 2.0.0


 The file of PDFBOX-2048 has a wrong encoded font length, it is 4412 in the 
 PDF but it is really about 27350. This wrong length is used to read the 
 encoded font stream and this results in further trouble (EOF).
 The problem is that the wrong length is passed to createFilteredStream() 
 instead of just calling it without parameters. In cosStream.doDecode() 
 unFilteredStream = filteredStream (there is a FIXME there!!!), and in 
 doDecode(COSName filterName, int filterIndex) unFilteredStream.getLength() is 
 used, which returns the expectedLength.
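
As a generic illustration of not trusting a declared length, the sketch below reads a stream until it actually ends. It is not the PDFBox fix and does not use createFilteredStream(); StreamBytes and readAll are hypothetical names chosen for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch only; the class and method names are hypothetical.
final class StreamBytes {
    /** Reads every remaining byte, ignoring any length the caller was told to expect. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

A caller can then compare the returned array's length with the declared value (4412 versus roughly 27350 in this report) and log the mismatch instead of truncating the data.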





Jenkins build is back to normal : PDFBox 1.8.x (JDK7) » Apache PDFBox #83

2014-09-17 Thread Apache Jenkins Server
See 
https://builds.apache.org/job/PDFBox%201.8.x%20(JDK7)/org.apache.pdfbox$pdfbox/83/changes



Jenkins build is back to normal : PDFBox 1.8.x (JDK7) #83

2014-09-17 Thread Apache Jenkins Server
See https://builds.apache.org/job/PDFBox%201.8.x%20(JDK7)/83/changes



Re: [DISCUSS] move documentation and examples to git

2014-09-17 Thread Maruan Sahyoun
The docs are part of the website.

By docs I currently mean the cookbook, how to build the project, the architecture, and so on.

Maruan

On 17.09.2014 at 19:26, Tilman Hausherr thaush...@t-online.de wrote:

 Hi Maruan,
 
 The examples only.
 
 With the docs I assume you mean the website. I've never touched it 
 (although I might in the future), it isn't part of the project, so I don't 
 mind.
 
 Tilman
 
 On 17.09.2014 at 19:01, Maruan Sahyoun wrote:
 is that because of the examples, the docs or both?
 
 BR
 
 Maruan
 
 On 17.09.2014 at 18:46, Tilman Hausherr thaush...@t-online.de wrote:
 
 It is an "I don't like it, but I can live with it", but I think it might be a 
 pain. A soft -1.
 
 Tilman
 
 On 17.09.2014 at 08:40, Andreas Lehmkühler wrote:
 Hi,
 
 Tilman Hausherr thaush...@t-online.de wrote on 16 September 2014 at 18:03:
 
 
 -1, I don't like the idea to have different repository types.
 Hmmm, is this just an "I don't like it, but I can live with it", or is it a 
 clear veto?
 
 In case of a veto, how about starting by moving parts of the docs to a new 
 git repo? IMO sooner or later the project will move from svn to git, and that 
 would be a good opportunity to get used to the general usage of git and, of 
 course, to the special processes used here at the ASF, so that we are not 
 thrown in at the deep end after the migration.
 
 Tilman
 BR
 Andreas
 
 On 16.09.2014 at 10:21, Maruan Sahyoun wrote:
 Hi there,
 
 in order to make it easier for people to contribute to the documentation 
 and
 examples I thought about the potential benefits of moving these to a git
 based repository instead of svn. The main idea behind that is to allow
 people to contribute via github opening another channel of communication 
 and
 making it easier to contribute.
 
 Proposed names are pdfbox-docs and pdfbox-examples. Take a look at
 https://github.com/apache/cordova-docs for an example of that.
 
 I haven’t thought about all potential implications and changes necessary 
 yet
 but wanted to get a first feedback about support for that idea before
 putting more effort into that.
 
 WDYT?
 
 Maruan
 
 



[jira] [Commented] (PDFBOX-2358) ExternalFonts uses classloader of class in font-box

2014-09-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138034#comment-14138034
 ] 

ASF subversion and git services commented on PDFBOX-2358:
-

Commit 1625834 from [~jahewson] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1625834 ]

PDFBOX-2358: Move CMaps from PDFBox to FontBox and remove ResourceLoader

 ExternalFonts uses classloader of class in font-box
 ---

 Key: PDFBOX-2358
 URL: https://issues.apache.org/jira/browse/PDFBOX-2358
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake
Assignee: John Hewson

 ExternalFonts loads some default fonts via the 
 org.apache.fontbox.util.ResourceLoader. That ResourceLoader uses its own 
 classloader (ResourceLoader.class.getClassLoader()) for loading the given 
 resource.
 The problem is that the resource is in the PDFBox project and the 
 ResourceLoader is in FontBox. In an OSGI environment this is a problem.
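
One common way around this kind of problem is to resolve a resource through a class that lives in the same module as the resource, instead of through a utility class in another module. The sketch below shows that idea only; it is not the change made for this issue, and OwnerRelativeResources and open are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch only; the class and method names are hypothetical.
final class OwnerRelativeResources {
    /**
     * Opens a resource via the class that owns it, so the lookup goes through
     * that class's own loader rather than the loader of a shared helper class.
     */
    static InputStream open(Class<?> owner, String name) throws IOException {
        InputStream in = owner.getResourceAsStream(name);
        if (in == null) {
            throw new IOException(
                    "Resource not found: " + name + " (relative to " + owner.getName() + ")");
        }
        return in;
    }
}
```

A caller passes one of its own classes as `owner`. A name without a leading slash is resolved relative to that class's package, and a missing resource surfaces as an IOException instead of a null stream.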





Build failed in Jenkins: PDFBox-trunk » PDFBox parent #1280

2014-09-17 Thread Apache Jenkins Server
See 
https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$pdfbox-parent/1280/

--
[...truncated 824 lines...]
Downloaded: 
http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-sink-api/1.4/doxia-sink-api-1.4.jar
 (11 KB at 1215.7 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/xml-apis/xml-apis/1.3.04/xml-apis-1.3.04.jar
Downloaded: 
http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-logging-api/1.4/doxia-logging-api-1.4.jar
 (12 KB at 1380.7 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/org/apache/httpcomponents/httpclient/4.0.2/httpclient-4.0.2.jar
Downloaded: 
http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-core/1.4/doxia-core-1.4.jar
 (162 KB at 13419.2 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/commons-logging/commons-logging/1.1.1/commons-logging-1.1.1.jar
Downloaded: 
http://repo.maven.apache.org/maven2/org/sonatype/sisu/sisu-guice/2.1.7/sisu-guice-2.1.7-noaop.jar
 (461 KB at 20029.6 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/commons-codec/commons-codec/1.3/commons-codec-1.3.jar
Downloaded: 
http://repo.maven.apache.org/maven2/xml-apis/xml-apis/1.3.04/xml-apis-1.3.04.jar
 (190 KB at 8627.2 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/org/apache/httpcomponents/httpcore/4.0.1/httpcore-4.0.1.jar
Downloaded: 
http://repo.maven.apache.org/maven2/commons-logging/commons-logging/1.1.1/commons-logging-1.1.1.jar
 (60 KB at 4233.1 KB/sec)
Downloaded: 
http://repo.maven.apache.org/maven2/org/apache/httpcomponents/httpclient/4.0.2/httpclient-4.0.2.jar
 (287 KB at 13001.2 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-module-apt/1.4/doxia-module-apt-1.4.jar
Downloading: 
http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-module-xhtml/1.4/doxia-module-xhtml-1.4.jar
Downloaded: 
http://repo.maven.apache.org/maven2/commons-codec/commons-codec/1.3/commons-codec-1.3.jar
 (46 KB at 2074.1 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-module-xdoc/1.4/doxia-module-xdoc-1.4.jar
Downloaded: 
http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-module-apt/1.4/doxia-module-apt-1.4.jar
 (51 KB at 3635.0 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-module-fml/1.4/doxia-module-fml-1.4.jar
Downloaded: 
http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-module-xhtml/1.4/doxia-module-xhtml-1.4.jar
 (16 KB at 1001.8 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-module-markdown/1.4/doxia-module-markdown-1.4.jar
Downloaded: 
http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-module-xdoc/1.4/doxia-module-xdoc-1.4.jar
 (36 KB at 3580.4 KB/sec)
Downloaded: 
http://repo.maven.apache.org/maven2/org/apache/httpcomponents/httpcore/4.0.1/httpcore-4.0.1.jar
 (169 KB at 7340.7 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/org/pegdown/pegdown/1.2.1/pegdown-1.2.1.jar
Downloading: 
http://repo.maven.apache.org/maven2/org/parboiled/parboiled-java/1.1.4/parboiled-java-1.1.4.jar
Downloaded: 
http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-module-markdown/1.4/doxia-module-markdown-1.4.jar
 (12 KB at 1626.3 KB/sec)
Downloaded: 
http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-module-fml/1.4/doxia-module-fml-1.4.jar
 (37 KB at 4617.3 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/org/parboiled/parboiled-core/1.1.4/parboiled-core-1.1.4.jar
Downloading: http://repo.maven.apache.org/maven2/org/ow2/asm/asm/4.1/asm-4.1.jar
Downloaded: 
http://repo.maven.apache.org/maven2/xerces/xercesImpl/2.9.1/xercesImpl-2.9.1.jar
 (1201 KB at 23083.0 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/org/ow2/asm/asm-tree/4.1/asm-tree-4.1.jar
Downloaded: 
http://repo.maven.apache.org/maven2/org/pegdown/pegdown/1.2.1/pegdown-1.2.1.jar 
(58 KB at 7154.4 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/org/ow2/asm/asm-analysis/4.1/asm-analysis-4.1.jar
Downloaded: 
http://repo.maven.apache.org/maven2/org/parboiled/parboiled-java/1.1.4/parboiled-java-1.1.4.jar
 (72 KB at 7940.5 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/org/ow2/asm/asm-util/4.1/asm-util-4.1.jar
Downloaded: http://repo.maven.apache.org/maven2/org/ow2/asm/asm/4.1/asm-4.1.jar 
(47 KB at 5138.8 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/javax/servlet/servlet-api/2.5/servlet-api-2.5.jar
Downloaded: 
http://repo.maven.apache.org/maven2/org/ow2/asm/asm-tree/4.1/asm-tree-4.1.jar 
(22 KB at 2705.3 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-decoration-model/1.4/doxia-decoration-model-1.4.jar
Downloaded: 
http://repo.maven.apache.org/maven2/org/parboiled/parboiled-core/1.1.4/parboiled-core-1.1.4.jar
 (181 KB at 13893.0 KB/sec)
Downloading: 

Re: Custom TextStripper / PDGraphicsState Not Reading Color

2014-09-17 Thread John Hewson
Hi All

Just to follow up on this thread, I haven’t yet removed the .properties 
functionality.

However, removing it has now become not just desirable but necessary, as PDFBOX-2358 has 
shown that PDFBox is not handling resource loading in an OSGI-compatible manner.

Basically, a class shouldn’t load resources from other packages, which means 
that the mechanism for overloading operators via .properties isn’t safe in 
subclasses. The PrintImageLocations in the “examples” package breaks this rule 
with the following code:

public PrintImageLocations() throws IOException
{
    super( ResourceLoader.loadProperties(
            "org/apache/pdfbox/resources/PDFTextStripper.properties", true ) );
}

It breaks the rule because it loads resources from the “pdfbox” package. This also means 
that nobody can subclass the built-in PDFBox classes and safely load the 
built-in PDFBox .properties.

I’m going to migrate the .properties to the existing 
registerOperatorProcessor() mechanism.
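
A rough sketch of what in-code registration could look like, in the spirit of registerOperatorProcessor(). The types and methods here (OperatorHandler, BaseEngine, ImageLocatingEngine, registerOperator, dispatch) are hypothetical stand-ins rather than PDFBox classes; the point is only that each subclass declares its operators in its own constructor, so nothing has to be loaded from another module's resources.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical callback invoked for one content-stream operator. */
interface OperatorHandler {
    void process(String operator, List<String> operands);
}

/** Hypothetical engine base class; it registers no operators itself. */
class BaseEngine {
    private final Map<String, OperatorHandler> handlers = new HashMap<>();

    protected final void registerOperator(String operator, OperatorHandler handler) {
        handlers.put(operator, handler);
    }

    /** Returns true if a handler was registered for the operator and has run. */
    final boolean dispatch(String operator, List<String> operands) {
        OperatorHandler handler = handlers.get(operator);
        if (handler == null) {
            return false;
        }
        handler.process(operator, operands);
        return true;
    }
}

/** Hypothetical subclass: its operator set is declared in code, not in a .properties file. */
final class ImageLocatingEngine extends BaseEngine {
    final List<String> seen = new ArrayList<>();

    ImageLocatingEngine() {
        registerOperator("Do", (operator, operands) -> seen.add(operator + " " + operands));
    }
}
```

Because the constructor is the only place where the operator set is defined, there is a single authoritative source for it, and a subclass in another module can extend the set by calling registerOperator again.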

-- John

On 30 Jul 2014, at 10:10, John Hewson j...@jahewson.com wrote:

 On 29 Jul 2014, at 23:12, Maruan Sahyoun sahy...@fileaffairs.de wrote:
 
 +1 for removing the .properties file if the new mechanism is easier to 
 understand and handle. The discussion doesn’t provide that proof or some 
 information about that.
 
 What would a replacement look like?
 
 Basically like registerOperatorProcessor(), as used in PreflightStreamEngine.
 
 
 OTOH if it’s a documentation issue we could also add some more information 
 to the javadocs to explain the dependencies. 
 
 We could add a register/unregister method to allow to add/remove custom 
 operator handling or provide a service discovery mechanism. This way we 
 still have the old flexibility.
 
 
 As Andreas notes, there’s a registerOperatorProcessor method which does this, 
 so the mechanism is already in place. The problem is not that we don’t have 
 the mechanism, it’s that we’re using .properties files at all. The list of 
 operators can’t be controlled from both code and from .properties lists; one 
 source has to be authoritative - otherwise we’d end up with a situation where 
 we have an operator disabled in a .properties file and then re-enabled in 
 code. Currently we have a situation where that could happen.
 
 Therefore, removing the .properties is the only workable solution. It’s 
 important to note that it’s very, very unlikely that anybody is using the 
 .properties files in a use-case where they are not also making some code 
 changes, so the supposed benefit of “not having to recompile” never existed. 
 Adding an operator would always require compile-time changes to PDFBox so 
 that the PDFStreamEngine subclass actually does something with the new 
 operator.
 
 -- John
 
 BR
 Maruan
 
 On 29.07.2014 at 21:48, John Hewson j...@jahewson.com wrote:
 
 Right but we need to address the confusion and complexity that has been 
 caused by .properties files which made PDFBOX-2246 so tricky to figure out.
 
 Let’s remove this wart!
 
 -- John
 
 On 29 Jul 2014, at 10:44, Tilman Hausherr thaush...@t-online.de wrote:
 
 Hi,
 
 At this time, the problem I see and wanted to solve (PDFBOX-2246) exists 
 regardless whether we use a properties file or initialize directly in the 
 code.
 
 Tilman
 
 
 On 29.07.2014 at 19:41, John Hewson wrote:
 On 29 Jul 2014, at 03:44, Andreas Lehmkühler andr...@lehmi.de wrote:
 
 Hi,
 
 it's not a black and white issue (comments inline)
 
 John Hewson j...@jahewson.com wrote on 29 July 2014 at 07:44:
 
 
 Yes, really I should have said subclasses of PDFStreamEngine -  that's 
 where
 the .properties file originates. I'd propose replacing the properties
 mechanism with a simple method containing the mapping which can be 
 overridden
 in subclasses. Ultimately, users expect to be able to subclass the 
 behaviour
 of a class by just subclassing the class.
 PDFStreamEngine doesn't configure any operator set itself. The 
 subclasses are
 supposed to configure their own set of operators depending on the 
 particular
 usecase. E.g. to extend the text extraction one has to subclass 
 PDFTextStripper
 and so on.
 It’s PDFStreamEngine which implements the .property mechanism though, via 
 the
 PDFStreamEngine(Properties properties) constructor.
 
 E.g. to extend the text extraction one has to subclass PDFTextStripper 
 and so on.
 That’s true, but it’s only half the story, don’t forget that the 
 .properties files need
 to be copied and pasted elsewhere and modified along with overriding 
 which .property
 file is passed in the constructor if you want to truly override the 
 class’ behaviour.
 
 We've seen a number of incidents of confusion on the mailing list due 
 to the
 current design.
 IMHO, most of the confusion is based on the lack of knowledge of the pdf 
 spec.
 One can't understand how pdfbox works under the hood by simply looking 
 at the
 code. One has to understand the pdf spec as well, at least the base 
 concepts.
 I’m specifically talking about 

[jira] [Commented] (PDFBOX-2358) ExternalFonts uses classloader of class in font-box

2014-09-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138134#comment-14138134
 ] 

ASF subversion and git services commented on PDFBOX-2358:
-

Commit 1625840 from [~jahewson] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1625840 ]

PDFBOX-2358: Remove ResourceLoader usage in PDFBox

 ExternalFonts uses classloader of class in font-box
 ---

 Key: PDFBOX-2358
 URL: https://issues.apache.org/jira/browse/PDFBOX-2358
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake
Assignee: John Hewson

 ExternalFonts loads some default fonts via the 
 org.apache.fontbox.util.ResourceLoader. That ResourceLoader uses its own 
 classloader (ResourceLoader.class.getClassLoader()) for loading the given 
 resource.
 The problem is that the resource is in the PDFBox project and the 
 ResourceLoader is in FontBox. In an OSGI environment this is a problem.





[jira] [Commented] (PDFBOX-2358) ExternalFonts uses classloader of class in font-box

2014-09-17 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138135#comment-14138135
 ] 

John Hewson commented on PDFBOX-2358:
-

I've also removed as many uses of PDFBox's ResourceLoader as possible, because 
as you point out it is obscure and dangerous. It is now only used by 
PDFStreamEngine and its subclasses for loading .properties files. However, this 
usage is itself not OSGI compatible as it is designed to be used by subclasses 
in a different module, where loading resources from the pdfbox module is not 
OSGI friendly.

This looks like the final straw for the PDFStreamEngine .properties mechanism, 
which was already a candidate for removal.

 ExternalFonts uses classloader of class in font-box
 ---

 Key: PDFBOX-2358
 URL: https://issues.apache.org/jira/browse/PDFBOX-2358
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake
Assignee: John Hewson

 ExternalFonts loads some default fonts via the 
 org.apache.fontbox.util.ResourceLoader. That ResourceLoader uses its own 
 classloader (ResourceLoader.class.getClassLoader()) for loading the given 
 resource.
 The problem is that the resource is in the PDFBox project and the 
 ResourceLoader in the FontBox. In an OSGI environment this is a problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PDFBOX-2358) ExternalFonts uses classloader of class in font-box

2014-09-17 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138135#comment-14138135
 ] 

John Hewson edited comment on PDFBOX-2358 at 9/17/14 10:39 PM:
---

I've also removed as many uses of PDFBox's ResourceLoader as possible, because 
as you point out it is obscure and dangerous. It is now only used by 
PDFStreamEngine and its subclasses for loading .properties files. However, this 
usage is itself not OSGI compatible as it is designed to be used by subclasses 
in a different module, where loading resources from the pdfbox module is not 
OSGI friendly.

This looks like the final straw for the PDFStreamEngine .properties mechanism, 
which was already a candidate for removal.


was (Author: jahewson):
I've also removed as may uses of PDFBox's ResourceLoader as possible, because 
as you point out it is obscure and dangerous. It is now only used by 
PDFStreamEngine and its subclasses for loading .properties files. However, this 
usage is itself not OSGI compatible as it is designed to by subclasses by 
classes in a different module, where loading resources from the pdfbox module 
is not OSGI friendly.

This looks like the final straw for the PDFStreamEngine .properties mechanism, 
which was already a candidate for removal.

 ExternalFonts uses classloader of class in font-box
 ---

 Key: PDFBOX-2358
 URL: https://issues.apache.org/jira/browse/PDFBOX-2358
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake
Assignee: John Hewson

 ExternalFonts loads some default fonts via the 
 org.apache.fontbox.util.ResourceLoader. That ResourceLoader uses its own 
 classloader (ResourceLoader.class.getClassLoader()) for loading the given 
 resource.
 The problem is that the resource is in the PDFBox project and the 
 ResourceLoader in the FontBox. In an OSGI environment this is a problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: PDFBox-trunk #1281

2014-09-17 Thread Apache Jenkins Server
See https://builds.apache.org/job/PDFBox-trunk/1281/changes

Changes:

[jahewson] PDFBOX-2358: Remove ResourceLoader usage in PDFBox

--
[...truncated 1756 lines...]
[TASKS] Scanning folder 
'https://builds.apache.org/job/PDFBox-trunk/ws/trunk/parent' for files 
matching the pattern '**/*.java' - excludes: 
[TASKS] Found 0 files to scan for tasks
Found 0 open tasks.
[TASKS] Computing warning deltas based on reference build #1279
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ pdfbox-parent 
---
Downloading: 
http://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-common-artifact-filters/1.4/maven-common-artifact-filters-1.4.pom
Downloaded: 
http://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-common-artifact-filters/1.4/maven-common-artifact-filters-1.4.pom
 (4 KB at 39.9 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-container-default/1.5.5/plexus-container-default-1.5.5.pom
Downloaded: 
http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-container-default/1.5.5/plexus-container-default-1.5.5.pom
 (3 KB at 76.9 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-classworlds/2.2.2/plexus-classworlds-2.2.2.pom
Downloaded: 
http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-classworlds/2.2.2/plexus-classworlds-2.2.2.pom
 (4 KB at 112.5 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/org/apache/xbean/xbean-reflect/3.4/xbean-reflect-3.4.pom
Downloaded: 
http://repo.maven.apache.org/maven2/org/apache/xbean/xbean-reflect/3.4/xbean-reflect-3.4.pom
 (3 KB at 78.4 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/org/apache/xbean/xbean/3.4/xbean-3.4.pom
Downloaded: 
http://repo.maven.apache.org/maven2/org/apache/xbean/xbean/3.4/xbean-3.4.pom 
(19 KB at 377.0 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/commons-logging/commons-logging-api/1.1/commons-logging-api-1.1.pom
Downloaded: 
http://repo.maven.apache.org/maven2/commons-logging/commons-logging-api/1.1/commons-logging-api-1.1.pom
 (6 KB at 149.2 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/com/google/collections/google-collections/1.0/google-collections-1.0.pom
Downloaded: 
http://repo.maven.apache.org/maven2/com/google/collections/google-collections/1.0/google-collections-1.0.pom
 (3 KB at 67.2 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/com/google/google/1/google-1.pom
Downloaded: 
http://repo.maven.apache.org/maven2/com/google/google/1/google-1.pom (2 KB at 
43.4 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-resources/1.0-alpha-7/plexus-resources-1.0-alpha-7.pom
Downloaded: 
http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-resources/1.0-alpha-7/plexus-resources-1.0-alpha-7.pom
 (2 KB at 36.9 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-common-artifact-filters/1.4/maven-common-artifact-filters-1.4.jar
Downloading: 
http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-resources/1.0-alpha-7/plexus-resources-1.0-alpha-7.jar
Downloading: 
http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/1.5.15/plexus-utils-1.5.15.jar
Downloaded: 
http://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-common-artifact-filters/1.4/maven-common-artifact-filters-1.4.jar
 (31 KB at 629.7 KB/sec)
Downloaded: 
http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-resources/1.0-alpha-7/plexus-resources-1.0-alpha-7.jar
 (23 KB at 360.5 KB/sec)
Downloaded: 
http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/1.5.15/plexus-utils-1.5.15.jar
 (223 KB at 1638.0 KB/sec)
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
pdfbox-parent ---
Downloading: 
http://repo.maven.apache.org/maven2/org/apache/maven/reporting/maven-reporting-exec/1.1/maven-reporting-exec-1.1.pom
Downloaded: 
http://repo.maven.apache.org/maven2/org/apache/maven/reporting/maven-reporting-exec/1.1/maven-reporting-exec-1.1.pom
 (11 KB at 289.2 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-shared-utils/0.3/maven-shared-utils-0.3.pom
Downloaded: 
http://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-shared-utils/0.3/maven-shared-utils-0.3.pom
 (4 KB at 104.0 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-shared-components/18/maven-shared-components-18.pom
Downloaded: 
http://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-shared-components/18/maven-shared-components-18.pom
 (5 KB at 137.7 KB/sec)
Downloading: 
http://repo.maven.apache.org/maven2/com/google/code/findbugs/jsr305/2.0.1/jsr305-2.0.1.pom
Downloaded: 
http://repo.maven.apache.org/maven2/com/google/code/findbugs/jsr305/2.0.1/jsr305-2.0.1.pom
 (965 B at 27.7 KB/sec)
Downloading: 

[jira] [Commented] (PDFBOX-2301) RandomAccessBuffer consumes too much memory.

2014-09-17 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138167#comment-14138167
 ] 

John Hewson commented on PDFBOX-2301:
-

Why use scratch files at all for parsing? Couldn't we use a 
java.io.RandomAccessFile to read the PDF (or perhaps one per COS stream) which 
would avoid the extra round-trip to disk:

With Java's RandomAccessFile:
- COSStream uses RandomAccessFile to read the data from disk when it is needed

With scratch files:
- Read the entire COS stream from the PDF
- Write it to disk
- Re-read the stream from disk when COSStream is read
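
The first option above, reading stream data straight from the PDF only when it is needed, can be sketched with the standard library. This is a minimal sketch under assumptions: `LazyStreamReadSketch` and `readRange` are hypothetical names, and the offset and length are assumed to be already known from parsing. It is not how COSStream is implemented.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class LazyStreamReadSketch {

    // Hypothetical helper: reads `length` bytes starting at `offset` directly
    // from the source file, so nothing is copied to a scratch file first.
    static byte[] readRange(String path, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(offset);
            byte[] data = new byte[length];
            // readFully throws EOFException if the file ends before `length` bytes.
            raf.readFully(data);
            return data;
        }
    }
}
```

A caller would keep only the offset and length per stream and call `readRange` on demand. Streams that must be decoded or decrypted first would still need a buffer for the decoded bytes.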

 RandomAccessBuffer consumes too much memory.
 

 Key: PDFBOX-2301
 URL: https://issues.apache.org/jira/browse/PDFBOX-2301
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.8.6, 2.0.0
Reporter: gee
Assignee: Andreas Lehmkühler
 Fix For: 2.0.0

 Attachments: clone.diff, clone2.diff, clone3.diff


 RandomAccessBuffer holds the uncompressed image during operation, because that 
 is exactly what pdfbox ExtractImages does.
 But holding the uncompressed image in memory instead of the compressed one 
 consumes too much memory, not to mention the many PDF XObjects that can use a 
 filter to compress themselves. It would be good if pdfbox provided an option 
 that reverts to the COSObject state just before the RandomAccess object was 
 created (the state in which the PDF XObject stream is parsed and COSDictionary 
 objects haven't been created because the user hasn't requested them using the 
 get() method). It is a crucial feature so that pdfbox can analyze huge PDF 
 files (100MB).
 In the current source, one must close a COSStream unless it is required (and I 
 know a closed stream cannot be reopened again).
 Class Name                                                        | Shallow Heap | Retained Heap
 -------------------------------------------------------------------------------------------------
 org.apache.pdfbox.cos.COSObject @ 0x5ad4940                       |           24 |     8,187,264
 |- class class org.apache.pdfbox.cos.COSObject @ 0x58c4020        |            0 |             0
 |- generationNumber org.apache.pdfbox.cos.COSInteger @ 0x5ad0080  |           24 |            24
 |- baseObject org.apache.pdfbox.cos.COSStream @ 0x5b25ea0         |           32 |     8,187,216
 |  |- class class org.apache.pdfbox.cos.COSStream @ 0x58c3e00     |            8 |             8
 |  |- items java.util.LinkedHashMap @ 0x5b2a0f0                   |           56 |           552
 |  |- file org.apache.pdfbox.io.RandomAccessBuffer @ 0x5b2a128    |           48 | 

[jira] [Resolved] (PDFBOX-2357) PDTrueTypeFont has no method to load font from stream

2014-09-17 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson resolved PDFBOX-2357.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

 PDTrueTypeFont has no method to load font from stream
 -

 Key: PDFBOX-2357
 URL: https://issues.apache.org/jira/browse/PDFBOX-2357
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake
Assignee: John Hewson
 Fix For: 2.0.0


 The PDTrueTypeFont formerly had a static method to load a font from a stream. 
 Now that method is gone, as far as I can see without a reason. It was probably 
 removed by mistake.
 Could that method be restored?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (PDFBOX-2357) PDTrueTypeFont has no method to load font from stream

2014-09-17 Thread John Hewson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson reassigned PDFBOX-2357:
---

Assignee: John Hewson

 PDTrueTypeFont has no method to load font from stream
 -

 Key: PDFBOX-2357
 URL: https://issues.apache.org/jira/browse/PDFBOX-2357
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake
Assignee: John Hewson
 Fix For: 2.0.0


 The PDTrueTypeFont formerly had a static method to load a font from a stream. 
 Now that method is gone, as far as I can see without a reason. It was probably 
 removed by mistake.
 Could that method be restored?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PDFBOX-2357) PDTrueTypeFont has no method to load font from stream

2014-09-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138170#comment-14138170
 ] 

ASF subversion and git services commented on PDFBOX-2357:
-

Commit 1625849 from [~jahewson] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1625849 ]

PDFBOX-2357: Add PDTrueTypeFont constructor with InputStream

 PDTrueTypeFont has no method to load font from stream
 -

 Key: PDFBOX-2357
 URL: https://issues.apache.org/jira/browse/PDFBOX-2357
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake
Assignee: John Hewson
 Fix For: 2.0.0


 The PDTrueTypeFont formerly had a static method to load a font from a stream. 
 Now that method is gone, as far as I can see without a reason. It was probably 
 removed by mistake.
 Could that method be restored?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PDFBOX-2357) PDTrueTypeFont has no method to load font from stream

2014-09-17 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138169#comment-14138169
 ] 

John Hewson commented on PDFBOX-2357:
-

Yep, it was a mistake, I will add this method back.

 PDTrueTypeFont has no method to load font from stream
 -

 Key: PDFBOX-2357
 URL: https://issues.apache.org/jira/browse/PDFBOX-2357
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Cornelis Hoeflake
 Fix For: 2.0.0


 The PDTrueTypeFont formerly had a static method to load a font from a stream. 
 Now that method is gone, as far as I can see without a reason. It was probably 
 removed by mistake.
 Could that method be restored?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PDFBOX-2355) newDocuments is private in Splitter

2014-09-17 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138183#comment-14138183
 ] 

John Hewson commented on PDFBOX-2355:
-

{quote}
I'll rewrite it properly once the new code is released.
{quote}

If you wait until then we won't be able to incorporate any feedback into 2.0 
before the API is made stable.

 newDocuments is private in Splitter
 ---

 Key: PDFBOX-2355
 URL: https://issues.apache.org/jira/browse/PDFBOX-2355
 Project: PDFBox
  Issue Type: Bug
  Components: Utilities
Affects Versions: 1.8.6
 Environment: Ubuntu 14.04, Java 8_20
Reporter: G. Ralph Kuntz
Assignee: John Hewson
  Labels: pdfbox
 Fix For: 2.0.0


 The method `createNewDocument` in `Splitter` is protected, so it can be 
 overridden, but one of the things it needs to do with the new document is add 
 it to the `newDocuments` list, which is private.
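
 The visibility problem described above can be reproduced with a small stand-alone example. The classes below are hypothetical stand-ins, not the PDFBox `Splitter`, and the protected accessor is only one possible remedy, shown as an assumption rather than as the change that was made.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitterVisibilitySketch {

    // Hypothetical stand-in for a splitter base class.
    static class BaseSplitter {
        private final List<String> newDocuments = new ArrayList<>();

        // Overridable factory method; the default registers the new document.
        protected String createNewDocument() {
            String doc = "document";
            newDocuments.add(doc);
            return doc;
        }

        // Hypothetical accessor: without it, subclasses cannot reach the list.
        protected final List<String> getNewDocuments() {
            return newDocuments;
        }
    }

    static class CustomSplitter extends BaseSplitter {
        @Override
        protected String createNewDocument() {
            String doc = "custom document";
            // Writing `newDocuments.add(doc)` here would not compile, because
            // the field is private to BaseSplitter.
            getNewDocuments().add(doc);
            return doc;
        }
    }

    public static void main(String[] args) {
        CustomSplitter splitter = new CustomSplitter();
        splitter.createNewDocument();
        System.out.println(splitter.getNewDocuments());
    }
}
```

 An override that cannot register its result defeats the purpose of making the factory method protected, which is why the two members need matching visibility.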



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PDFBOX-2356) Error Validating PDF Archive Document

2014-09-17 Thread Cetra Free (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138197#comment-14138197
 ] 

Cetra Free commented on PDFBOX-2356:


I'm just using the code from here:

http://pdfbox.apache.org/cookbook/pdfavalidation.html

{code}
ValidationResult result = null;

FileDataSource fd = new FileDataSource(args[0]);
PreflightParser parser = new PreflightParser(fd);
try {

  /* Parse the PDF file with PreflightParser that inherits from the 
NonSequentialParser.
   * Some additional controls are present to check a set of PDF/A requirements. 
   * (Stream length consistency, EOL after some Keyword...)
   */
  parser.parse();

  /* Once the syntax validation is done, 
   * the parser can provide a PreflightDocument 
   * (that inherits from PDDocument) 
   * This document process the end of PDF/A validation.
   */
  PreflightDocument document = parser.getPreflightDocument();
  document.validate();

  // Get validation result
  result = document.getResult();
  document.close();

} catch (SyntaxValidationException e) {
  /* The parse method can throw a SyntaxValidationException 
   * if the PDF file can't be parsed.
   * In this case, the exception contains an instance of ValidationResult.
   */
  result = e.getResult();
}

// display validation result
if (result.isValid()) {
  System.out.println("The file " + args[0] + " is a valid PDF/A-1b file");
} else {
  System.out.println("The file " + args[0] + " is not valid, error(s) :");
  for (ValidationError error : result.getErrorsList()) {
    System.out.println(error.getErrorCode() + " : " + error.getDetails());
  }
}
{code}


 Error Validating PDF Archive Document
 -

 Key: PDFBOX-2356
 URL: https://issues.apache.org/jira/browse/PDFBOX-2356
 Project: PDFBox
  Issue Type: Bug
  Components: Preflight
Affects Versions: 1.8.4, 1.8.5, 1.8.6
Reporter: Cetra Free
 Attachments: pdfafile.pdf


 When trying to validate a PDF archive file (attached to this ticket) we get 
 the following error:
 {code}
 7.2   - Error on MetaData, ModificationDate present in the document catalog 
 dictionary doesn't match with XMP information
 {code}
 This is because the Modification Date in the Dictionary is parsed 
 differently from the XMP Metadata.  The XMP Metadata is correct, but the Date 
 parsed from the Dictionary is 30 minutes ahead.
 The following is the raw COSObject from the PDF File
 {code}
 COSString{D:20140917122850+09'30'}
 {code}
 The Long value should be *1410922730000*
 The *org.apache.pdfbox.util.DateConverter* *parseDate* method returns the 
 Date with Long *1410924530000* which is 30 minutes ahead.
 XMP Modification Date is parsed differently and returns the correct date.
 This means that validation will fail for PDF Archives.
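
 The expected value for the date string above can be checked with the standard library alone. This is a small sketch, assuming that stripping the `D:` prefix and the apostrophes is enough for this particular string; `PdfDateCheck` and `toEpochMillis` are hypothetical names and do not cover every date form the PDF specification allows.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class PdfDateCheck {

    // Hypothetical helper: converts a full PDF date string with an explicit
    // offset, such as D:20140917122850+09'30', to milliseconds since the epoch.
    static long toEpochMillis(String pdfDate) {
        // "D:20140917122850+09'30'" becomes "20140917122850+0930"
        String normalized = pdfDate.replace("D:", "").replace("'", "");
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMddHHmmssZ", Locale.ROOT);
        format.setLenient(false);
        try {
            return format.parse(normalized).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a full PDF date: " + pdfDate, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("D:20140917122850+09'30'"));
    }
}
```

 12:28:50 at an offset of +09:30 is 02:58:50 UTC on 2014-09-17, so the program prints 1410922730000. A result 1,800,000 ms larger would correspond to the 30 minute shift reported here.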
 My suggestion would be to refactor the parseDate function to use the Standard 
 Java library.
 Here's an example class which will be compatible with the PDF Specification:
 {code}
 static class DateParser {
   private Map<Integer, SimpleDateFormat> formats =
       new HashMap<Integer, SimpleDateFormat>();

   public DateParser() {
     String expr = "";

     // Register one format per prefix length: yyyy, yyyyMM, ..., yyyyMMddHHmmssZ
     for (String part : Arrays.asList("yyyy", "MM", "dd", "HH", "mm", "ss", "Z")) {
       expr = expr + part;
       formats.put(expr.length(), new SimpleDateFormat(expr));
     }
   }

   public Calendar parseDate(String expr) {
     try {
       // Normalize "D:20140917122850+09'30'" to "20140917122850+0930"
       expr = expr.replace("D:", "").replace("'", "").replace("Z", "+0000");
       Date date = formats.get(Math.min(expr.length(), 15)).parse(expr);

       Calendar calendar = Calendar.getInstance();
       calendar.setTime(date);

       return calendar;
     } catch (ParseException e) {
       return null;
     }
   }
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: PDFBox-trunk » Apache FontBox #1282

2014-09-17 Thread Apache Jenkins Server
See https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$fontbox/1282/

--
[INFO] 
[INFO] 
[INFO] Building Apache FontBox 2.0.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ fontbox ---
[TASKS] Scanning folder 
'https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$fontbox/ws/' 
for files matching the pattern '**/*.java' - excludes: 
[TASKS] Found 93 files to scan for tasks
Found 17 open tasks.
[TASKS] Computing warning deltas based on reference build #1279
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ fontbox ---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ fontbox 
---
[INFO] Using 'ISO-8859-1' encoding to copy filtered resources.
[INFO] Copying 89 resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ fontbox ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 88 source files to 
https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$fontbox/ws/target/classes
[WARNING] Note: Some input files use unchecked or unsafe operations.
[WARNING] Note: Recompile with -Xlint:unchecked for details.
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ 
fontbox ---
[INFO] Using 'ISO-8859-1' encoding to copy filtered resources.
[INFO] Copying 2 resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
fontbox ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 5 source files to 
https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$fontbox/ws/target/test-classes
[INFO] 
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ fontbox ---
[INFO] Surefire report directory: 
https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$fontbox/ws/target/surefire-reports

---
 T E S T S
---
Running org.apache.fontbox.cff.Type1FontUtilTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.027 sec - in 
org.apache.fontbox.cff.Type1FontUtilTest
Running org.apache.fontbox.cmap.TestCMap
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 sec - in 
org.apache.fontbox.cmap.TestCMap
Running org.apache.fontbox.cmap.TestCMapParser
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 sec - in 
org.apache.fontbox.cmap.TestCMapParser
Running org.apache.fontbox.ttf.TestTTFParser
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.044 sec - in 
org.apache.fontbox.ttf.TestTTFParser
Running org.apache.fontbox.ttf.TestMemoryTTFDataStream
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec - in 
org.apache.fontbox.ttf.TestMemoryTTFDataStream

Results :

Tests run: 7, Failures: 0, Errors: 0, Skipped: 0

[JENKINS] Recording test results
[INFO] 
[INFO] --- maven-bundle-plugin:2.4.0:bundle (default-bundle) @ fontbox ---
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
fontbox ---
[INFO] 
[INFO] --- apache-rat-plugin:0.10:check (default) @ fontbox ---
[INFO] 51 implicit excludes (use -debug for more details).
[INFO] Exclude: release.properties
[INFO] 202 resources included (use -debug for more details)
[INFO] Rat check: Summary of files. Unapproved: 89 unknown: 89 generated: 0 
approved: 107 licence.


Re: [DISCUSS] move documentation and examples to git

2014-09-17 Thread John Hewson
I agree with Tilman on this point, the examples need to stay in the trunk where 
they can be built along with it.
It’s very common to modify an example to take into account API changes. They’re 
also currently distributed along with the main PDFBox source bundle, which is a 
good thing.

I’d be surprised if anybody outside of the project wanted to contribute to the 
documentation, almost nobody seems to like writing it. Perhaps we could do this 
as a trial - see if it really increases contributions or not? It would be great 
if it did.

It’s worth adding that I’m (reluctantly) against moving PDFBox trunk over to 
GitHub because GitHub Issues is not powerful enough for our needs (e.g. no file 
attachments), which is really a shame.

-- John

On 17 Sep 2014, at 10:26, Tilman Hausherr thaush...@t-online.de wrote:

 Hi Maruan,
 
 The examples only.
 
 With the docs I assume you mean the website. I've never touched it 
 (although I might in the future), it isn't part of the project, so I don't 
 mind.
 
 Tilman
 
 On 17.09.2014 at 19:01, Maruan Sahyoun wrote:
 is that because of the examples, the docs or both?
 
 BR
 
 Maruan
 
 On 17.09.2014 at 18:46, Tilman Hausherr thaush...@t-online.de wrote:
 
 It is an "I don't like it, but I can live with it", but I think it might be a 
 pain. A soft -1.
 
 Tilman
 
 On 17.09.2014 at 08:40, Andreas Lehmkühler wrote:
 Hi,
 
 Tilman Hausherr thaush...@t-online.de wrote on 16 September 2014 at 18:03:
 
 
 -1, I don't like the idea to have different repository types.
 Hmmm, is this just an "I don't like it, but I can live with it" or is it a 
 clear veto?
 
 In the case of a veto, how about starting with moving parts of the docs to a 
 new git repo? IMO sooner or later the project will move from svn to git, and 
 that would be a good opportunity to get used to the general usage of git and, 
 of course, to the special processes used here at the ASF, so that we are not 
 thrown in at the deep end after the migration.
 
 Tilman
 BR
 Andreas
 
 On 16.09.2014 at 10:21, Maruan Sahyoun wrote:
 Hi there,
 
 in order to make it easier for people to contribute to the documentation 
 and examples I thought about the potential benefits of moving these to a git 
 based repository instead of svn. The main idea behind that is to allow 
 people to contribute via github, opening another channel of communication 
 and making it easier to contribute.
 
 Proposed names are pdfbox-docs and pdfbox-examples. Take a look at
 https://github.com/apache/cordova-docs for an example of that.
 
 I haven’t thought about all potential implications and changes necessary 
 yet but wanted to get first feedback about support for that idea before 
 putting more effort into that.
 
 WDYT?
 
 Maruan
 
 



[jira] [Comment Edited] (PDFBOX-2340) Overhaul PDFBox Documentation

2014-09-17 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138255#comment-14138255
 ] 

John Hewson edited comment on PDFBOX-2340 at 9/18/14 12:14 AM:
---

The first mockup of the documentation was better: as a user I want content, not 
empty green space, or - heaven forbid - anything which resembles a PDF :). 
Perhaps we're getting ahead of ourselves here, the content should come first.


was (Author: jahewson):
The first mockup of the documentation was better: as a user I want content, not 
empty green space, or - heaven forbid - anything which resembles a PDF :). 
Perhaps we're getting ahead of ourselves here, after all content should come 
first.

 Overhaul PDFBox Documentation
 -

 Key: PDFBOX-2340
 URL: https://issues.apache.org/jira/browse/PDFBOX-2340
 Project: PDFBox
  Issue Type: Improvement
  Components: Documentation
Reporter: Maruan Sahyoun
 Attachments: Mockup-20140912.png, Mockup_Documentation.png


 In order to make it easier for users of PDFBox to work with the library there 
 shall be enhanced documentation consisting of an introduction, API 
 references and more well documented examples and code snippets (Cookbook).
 In order to make it easier to contribute, the Cookbook shall be built 
 automatically from the examples/snippet 'repository'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PDFBOX-2340) Overhaul PDFBox Documentation

2014-09-17 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138255#comment-14138255
 ] 

John Hewson commented on PDFBOX-2340:
-

The first mockup of the documentation was better: as a user I want content, not 
empty green space, or - heaven forbid - anything which resembles a PDF :). 
Perhaps we're getting ahead of ourselves here, after all content should come 
first.

 Overhaul PDFBox Documentation
 -

 Key: PDFBOX-2340
 URL: https://issues.apache.org/jira/browse/PDFBOX-2340
 Project: PDFBox
  Issue Type: Improvement
  Components: Documentation
Reporter: Maruan Sahyoun
 Attachments: Mockup-20140912.png, Mockup_Documentation.png


 In order to make it easier for users of PDFBox to work with the library there 
 shall be enhanced documentation consisting of an introduction, API 
 references and more well documented examples and code snippets (Cookbook).
 In order to make it easier to contribute, the Cookbook shall be built 
 automatically from the examples/snippet 'repository'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] move documentation and examples to git

2014-09-17 Thread Maruan Sahyoun


Maruan Sahyoun

 On 18.09.2014 at 02:03, John Hewson j...@jahewson.com wrote:
 
 I agree with Tilman on this point, the examples need to stay in the trunk 
 where they can be built along with it.
 It’s very common to modify an example to take into account API changes. 
 They’re also currently distributed along with the main PDFBox source bundle, 
 which is a good thing.
 
 I’d be surprised if anybody outside of the project wanted to contribute to 
 the documentation, almost nobody seems to like writing it. Perhaps we could 
 do this as a trial - see if it really increases contributions or not? It 
 would be great if it did.
 

OK, so let's try with the docs. 

To mention it for completeness: the build process for the web site and the 
documentation contained within will still be done by the Apache CMS. 

 It’s worth adding that I’m (reluctantly) against moving PDFBox trunk over to 
 GitHub because GitHub Issues is not powerful enough for our needs (e.g. no 
 file attachments), which is really a shame.
 

Issue tracking would still be done using Jira, same as for most other Apache 
projects.

 -- John
 
 On 17 Sep 2014, at 10:26, Tilman Hausherr thaush...@t-online.de wrote:
 
 Hi Maruan,
 
 The examples only.
 
 With the docs I assume you mean the website. I've never touched it 
 (although I might in the future), it isn't part of the project, so I don't 
 mind.
 
 Tilman
 
 On 17.09.2014 at 19:01, Maruan Sahyoun wrote:
 is that because of the examples, the docs or both?
 
 BR
 
 Maruan
 
 On 17.09.2014 at 18:46, Tilman Hausherr thaush...@t-online.de wrote:
 
 It is an "I don't like it, but I can live with it", but I think it might be a 
 pain. A soft -1.
 
 Tilman
 
 On 17.09.2014 at 08:40, Andreas Lehmkühler wrote:
 Hi,
 
 Tilman Hausherr thaush...@t-online.de wrote on 16 September 2014 at 18:03:
 
 
 -1, I don't like the idea to have different repository types.
 Hmmm, is this just an "I don't like it, but I can live with it" or is it a 
 clear veto?
 
 In the case of a veto, how about starting with moving parts of the docs to a 
 new git repo? IMO sooner or later the project will move from svn to git, and 
 that would be a good opportunity to get used to the general usage of git and, 
 of course, to the special processes used here at the ASF, so that we are not 
 thrown in at the deep end after the migration.
 
 Tilman
 BR
 Andreas
 
 On 16.09.2014 at 10:21, Maruan Sahyoun wrote:
 Hi there,
 
 in order to make it easier for people to contribute to the documentation 
 and examples I thought about the potential benefits of moving these to a git 
 based repository instead of svn. The main idea behind that is to allow 
 people to contribute via github, opening another channel of communication 
 and making it easier to contribute.
 
 Proposed names are pdfbox-docs and pdfbox-examples. Take a look at
 https://github.com/apache/cordova-docs for an example of that.
 
 I haven’t thought about all potential implications and changes necessary 
 yet but wanted to get first feedback about support for that idea before 
 putting more effort into that.
 
 WDYT?
 
 Maruan
 


[jira] [Commented] (PDFBOX-2301) RandomAccessBuffer consumes too much memory.

2014-09-17 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138545#comment-14138545
 ] 

Tilman Hausherr commented on PDFBOX-2301:
-

I thought about it too, but there are some bizarre situations where the objects 
are themselves compressed in a stream, or where streams are encrypted. So in 
these cases we would have to keep temp files in memory or on disk.

 RandomAccessBuffer consumes too much memory.
 

 Key: PDFBOX-2301
 URL: https://issues.apache.org/jira/browse/PDFBOX-2301
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.8.6, 2.0.0
Reporter: gee
Assignee: Andreas Lehmkühler
 Fix For: 2.0.0

 Attachments: clone.diff, clone2.diff, clone3.diff


 RandomAccessBuffer holds the uncompressed image during operation, because that 
 is exactly what pdfbox ExtractImages does.
 But holding the uncompressed image in memory instead of the compressed one 
 consumes too much memory, not to mention the many PDF XObjects that can use a 
 filter to compress themselves. It would be good if pdfbox provided an option 
 that reverts to the COSObject state just before the RandomAccess object was 
 created (the state in which the PDF XObject stream is parsed and COSDictionary 
 objects haven't been created because the user hasn't requested them using the 
 get() method). It is a crucial feature so that pdfbox can analyze huge PDF 
 files (100MB).
 In the current source, one must close a COSStream unless it is required (and I 
 know a closed stream cannot be reopened again).
 Class Name                                                        | Shallow Heap | Retained Heap
 -------------------------------------------------------------------------------------------------
 org.apache.pdfbox.cos.COSObject @ 0x5ad4940                       |           24 |     8,187,264
 |- class class org.apache.pdfbox.cos.COSObject @ 0x58c4020        |            0 |             0
 |- generationNumber org.apache.pdfbox.cos.COSInteger @ 0x5ad0080  |           24 |            24
 |- baseObject org.apache.pdfbox.cos.COSStream @ 0x5b25ea0         |           32 |     8,187,216
 |  |- class class org.apache.pdfbox.cos.COSStream @ 0x58c3e00     |            8 |             8
 |  |- items java.util.LinkedHashMap @ 0x5b2a0f0                   |           56 |           552
 |  |- file org.apache.pdfbox.io.RandomAccessBuffer @ 0x5b2a128    |           48 |     8,186,528
 |  |  |- class class org.apache.pdfbox.io.RandomAccessBuffer @ 0x5ad2b00