date:20140209

[jira] [Commented] (PDFBOX-1330) Generic changes

2014-02-09 Thread John Hewson (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896286#comment-13896286
 ] 

John Hewson commented on PDFBOX-1330:
-

These sorts of changes are definitely needed in 2.0.0. There are hundreds of 
places where generics and enums could replace older Java constructs, so while 
the dozen or so here are certainly relevant, a much bigger refactor is needed.
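
To make the kind of change concrete, here is a minimal sketch contrasting the older constructs with their generic and enum counterparts. All names (GenericsSketch, RenderMode, firstRaw, firstTyped) are hypothetical and chosen only for illustration; none of them are taken from the PDFBox sources or from the attached diffs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names for illustration; not taken from the PDFBox code base.
final class GenericsSketch {

    // Older style: int constants stand in for a fixed set of choices.
    static final int RENDER_FILL = 0;
    static final int RENDER_STROKE = 1;

    // Newer style: an enum makes the set of legal values explicit and type-checked.
    enum RenderMode { FILL, STROKE }

    // Older style: a raw List forces a cast on every read.
    @SuppressWarnings("rawtypes")
    static String firstRaw(List items) {
        return (String) items.get(0);
    }

    // Newer style: the element type is part of the signature, so no cast is needed.
    static String firstTyped(List<String> items) {
        return items.get(0);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<String>();
        names.add("Helvetica");
        System.out.println(firstRaw(names));
        System.out.println(firstTyped(names));
        System.out.println(RenderMode.valueOf("FILL"));
    }
}
```

With the typed variant, passing a list of the wrong element type is rejected at compile time instead of failing with a ClassCastException at run time.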

> Generic changes
> ---
>
> Key: PDFBOX-1330
> URL: https://issues.apache.org/jira/browse/PDFBOX-1330
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 1.8.0
>Reporter: Jens Kapitza
> Attachments: PDFTextStripperByArea.java.diff, PDPageNode.java.diff, 
> PDPageNode.java.diff, TextNormalize.diff, TextPosition.diff
>
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

[jira] [Commented] (PDFBOX-1734) ImageIoUtil.WriteImage doesn't work with tiff images

2014-02-09 Thread Tilman Hausherr (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896285#comment-13896285
 ] 

Tilman Hausherr commented on PDFBOX-1734:
-

Sadly, I found two arguments against Apache Imaging - for now. One is that the 
size is sometimes bigger than with Java ImageIO (see IMAGING-126 that I just 
created), the other is that it doesn't support writing JPEGs.

About the Sun code you just found: it looks quite similar to the module names 
from the com.sun.* code. We'll have to see whether it does what's needed.

> ImageIoUtil.WriteImage doesn't work with tiff images
> 
>
> Key: PDFBOX-1734
> URL: https://issues.apache.org/jira/browse/PDFBOX-1734
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
> Environment: XP, W7
>Reporter: Tilman Hausherr
>Priority: Minor
>  Labels: tiff
> Attachments: ImageIOUtil.patch, TestImageIOUtils.patch
>
>
> ImageIoUtil.WriteImage brings an I/O error exception when trying to write a 
> tiff file. Debugging shows that the cause is "Bits per sample must be 1 for 
> RLE compression!". This means that the compression used (the first one of the 
> following list, returned by writerParams.getCompressionTypes() ) is only 
> allowed for bitonal images.
> CCITT RLE
> CCITT T.4
> CCITT T.6
> LZW
> JPEG
> ZLib
> PackBits
> Deflate
> EXIF JPEG
> After correcting this, the next problem was that tiff images didn't have the 
> proper resolutions. I added that too. Yes it uses the com.sun.* classes; 
> however there is no other way. Even apache xmlgraphics uses them, although in 
> a very different way than I do
> https://svn.apache.org/repos/asf/xmlgraphics/commons/tags/commons-1_3_1/src/java/org/apache/xmlgraphics/image/writer/imageio/ImageIOTIFFImageWriter.java
> writeImage() has a parameter "int imageType" which is never used. Why?
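
The selection problem described above can be sketched without any imaging library. The following is only an illustration under stated assumptions: the class and method names are hypothetical, the rule "names starting with CCITT are bitonal-only" is inferred from the list in the report, and this is not necessarily what the attached patch does.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of ImageIOUtil or of the attached patch.
final class TiffCompressionSketch {

    // Returns the first compression type that is usable for the given bit depth.
    // Assumption: types whose name starts with "CCITT" only accept 1-bit images.
    static String choose(List<String> types, int bitsPerSample) {
        for (String type : types) {
            boolean bitonalOnly = type.startsWith("CCITT");
            if (bitsPerSample == 1 || !bitonalOnly) {
                return type;
            }
        }
        throw new IllegalArgumentException("no usable compression type");
    }

    public static void main(String[] args) {
        // The list as quoted in the report.
        List<String> types = Arrays.asList("CCITT RLE", "CCITT T.4", "CCITT T.6",
                "LZW", "JPEG", "ZLib", "PackBits", "Deflate", "EXIF JPEG");
        System.out.println(choose(types, 1));
        System.out.println(choose(types, 8));
    }
}
```

Taking the first entry of the list unconditionally is what leads to the "Bits per sample must be 1" error for non-bitonal images; checking the bit depth first avoids it.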




[jira] [Commented] (PDFBOX-1330) Generic changes

2014-02-09 Thread Thomas Chojecki (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896280#comment-13896280
 ] 

Thomas Chojecki commented on PDFBOX-1330:
-

These changes will be done in PDFBox 2.0.0.
Maybe we can use some of the diffs if the code hasn't changed too much since 
they were written.






[jira] [Commented] (PDFBOX-1895) Font definitions must precede font references

2014-02-09 Thread John Hewson (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896281#comment-13896281
 ] 

John Hewson commented on PDFBOX-1895:
-

In a PDF file the order in which the objects are numbered does not matter, from 
the PDF spec:

{quote}
Indirect objects may be numbered sequentially within a PDF file, but this is 
not required; object numbers may be assigned in any arbitrary order. 
{quote}

So the object numbering is not the cause of your problem. Without the file 
there's really no way to know what's wrong. Are you able to generate a 
non-sensitive file which exhibits these problems, or perhaps use PDFSplit to 
extract a non-sensitive page?
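
As a rough illustration of why numbering order is irrelevant, references are resolved by looking up the object number, not by position. The sketch below models a reference chain as a map; the class name and the map-based model are assumptions made for illustration and have nothing to do with PDFBox's actual object model. The object numbers are the ones quoted elsewhere in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: each entry maps an object number to the object number it references.
final class ObjectNumberSketch {

    // Follows references until an object with no outgoing reference is reached.
    // Assumes the chain contains no cycles.
    static int resolve(Map<Integer, Integer> refs, int start) {
        int current = start;
        while (refs.containsKey(current)) {
            current = refs.get(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> input = new HashMap<Integer, Integer>();
        input.put(7, 6);      // Type0 font -> descendant font
        input.put(6, 5);      // descendant font -> font descriptor

        Map<Integer, Integer> output = new HashMap<Integer, Integer>();
        output.put(145, 375); // same chain, different numbering
        output.put(375, 405);

        System.out.println(resolve(input, 7));
        System.out.println(resolve(output, 145));
    }
}
```

Both chains end at the font descriptor, whether the descriptor has a lower or a higher number than the objects that refer to it.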

> Font definitions must precede font references
> -
>
> Key: PDFBOX-1895
> URL: https://issues.apache.org/jira/browse/PDFBOX-1895
> Project: PDFBox
>  Issue Type: Bug
>  Components: Writing
>Affects Versions: 1.8.3, 1.8.4
>Reporter: Pat Hickey
>
> When re-writing a document with font descriptions, Adobe Reader is unable to 
> display the fonts in the document.  Reader can display the fonts in the 
> original document. The difference is that in the original document, the font 
> descriptions are in lower object numbers than the font references; in the 
> output document, the font descriptions are in higher object numbers than the 
> font references.  Is there a quick way to re-order them?




Build failed in Jenkins: PDFBox-trunk » PDFBox reactor #801

2014-02-09 Thread Apache Jenkins Server

See 


--
[INFO] 
[INFO] 
[INFO] Building PDFBox reactor 2.0.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ pdfbox-reactor ---
[INFO] Deleting 

[TASKS] Scanning folder 
'
 for files matching the pattern '**/*.java' - excludes: parent/**/*, 
fontbox/**/*, jempbox/**/*, xmpbox/**/*, pdfbox/**/*, preflight/**/*, 
preflight-app/**/*, war/**/*, tools/**/*, app/**/*, examples/**/*
[TASKS] Found 0 files to scan for tasks
Found 0 open tasks.
[TASKS] Computing warning deltas based on reference build #800
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.4:process (default) @ pdfbox-reactor 
---
[INFO] 
[INFO] --- maven-site-plugin:3.2:attach-descriptor (attach-descriptor) @ 
pdfbox-reactor ---
[INFO] 
[INFO] --- apache-rat-plugin:0.10:check (default) @ pdfbox-reactor ---
[INFO] 62 implicit excludes (use -debug for more details).
[INFO] Exclude: release.properties
[INFO] 25 resources included (use -debug for more details)
[INFO] Rat check: Summary of files. Unapproved: 2 unknown: 2 generated: 0 
approved: 9 licence.
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] PDFBox parent . SUCCESS [39.902s]
[INFO] Apache FontBox  SUCCESS [36.598s]
[INFO] Apache JempBox  SUCCESS [20.583s]
[INFO] Apache XmpBox . SUCCESS [30.753s]
[INFO] Apache PDFBox . SUCCESS [1:14.105s]
[INFO] Apache Preflight .. SUCCESS [31.625s]
[INFO] Apache Preflight application .. SUCCESS [23.368s]
[INFO] Apache PDFBox webapp .. SUCCESS [13.856s]
[INFO] Apache PDFBox tools ... SUCCESS [15.257s]
[INFO] Apache PDFBox application . SUCCESS [29.693s]
[INFO] Apache PDFBox examples  SUCCESS [13.076s]
[INFO] PDFBox reactor  FAILURE [3.965s]
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 5:53.277s
[INFO] Finished at: Mon Feb 10 07:10:04 UTC 2014
[INFO] Final Memory: 42M/417M
[INFO] 
Waiting for Jenkins to finish collecting data
[ERROR] Failed to execute goal 
org.apache.rat:apache-rat-plugin:0.10:check (default) on project 
pdfbox-reactor: Too many files with unapproved license: 2 See RAT report in: 

 -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :pdfbox-reactor

Build failed in Jenkins: PDFBox-trunk #801

2014-02-09 Thread Apache Jenkins Server

See 

Changes:

[lehmi] added John and Tilman as new PMC members

[lehmi] ignore some subdirs

--
[...truncated 992 lines...]
[JENKINS] Recording test results
[INFO] 
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ pdfbox-tools ---
[INFO] Building jar: 

[INFO] 
[INFO] --- maven-site-plugin:3.2:attach-descriptor (attach-descriptor) @ 
pdfbox-tools ---
[INFO] 
[INFO] --- apache-rat-plugin:0.10:check (default) @ pdfbox-tools ---
[INFO] 51 implicit excludes (use -debug for more details).
[INFO] Exclude: release.properties
[INFO] 24 resources included (use -debug for more details)
[INFO] Rat check: Summary of files. Unapproved: 0 unknown: 0 generated: 0 
approved: 23 licence.
[INFO] 
[INFO] --- maven-install-plugin:2.3.1:install (default-install) @ pdfbox-tools 
---
[INFO] Installing 

 to 
/home/jenkins/jenkins-slave/maven-repositories/1/org/apache/pdfbox/pdfbox-tools/2.0.0-SNAPSHOT/pdfbox-tools-2.0.0-SNAPSHOT.jar
[INFO] Installing 
 to 
/home/jenkins/jenkins-slave/maven-repositories/1/org/apache/pdfbox/pdfbox-tools/2.0.0-SNAPSHOT/pdfbox-tools-2.0.0-SNAPSHOT.pom
[INFO] 
[INFO] --- maven-deploy-plugin:2.7:deploy (default-deploy) @ pdfbox-tools ---
Downloading: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-tools/2.0.0-SNAPSHOT/maven-metadata.xml
Downloaded: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-tools/2.0.0-SNAPSHOT/maven-metadata.xml
 (779 B at 2.2 KB/sec)
Uploading: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-tools/2.0.0-SNAPSHOT/pdfbox-tools-2.0.0-20140210.070908-3.jar
Uploaded: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-tools/2.0.0-SNAPSHOT/pdfbox-tools-2.0.0-20140210.070908-3.jar
 (74 KB at 196.5 KB/sec)
Uploading: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-tools/2.0.0-SNAPSHOT/pdfbox-tools-2.0.0-20140210.070908-3.pom
Uploaded: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-tools/2.0.0-SNAPSHOT/pdfbox-tools-2.0.0-20140210.070908-3.pom
 (3 KB at 6.8 KB/sec)
Downloading: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-tools/maven-metadata.xml
Downloaded: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-tools/maven-metadata.xml
 (289 B at 1.2 KB/sec)
Uploading: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-tools/2.0.0-SNAPSHOT/maven-metadata.xml
Uploaded: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-tools/2.0.0-SNAPSHOT/maven-metadata.xml
 (779 B at 2.0 KB/sec)
Uploading: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-tools/maven-metadata.xml
Uploaded: 
https://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-tools/maven-metadata.xml
 (289 B at 0.8 KB/sec)
[INFO] 
[INFO] 
[INFO] Building Apache PDFBox application 2.0.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ pdfbox-app ---
[INFO] Deleting 
[TASKS] Scanning folder 
' for files matching 
the pattern '**/*.java' - excludes: 
[TASKS] Found 0 files to scan for tasks
Found 0 open tasks.
[TASKS] Computing warning deltas based on reference build #800
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.4:process (default) @ pdfbox-app ---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ 
pdfbox-app ---
[INFO] Using 'ISO-8859-1' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 

[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.5.1:compile (default-compile) @ pdfbox-app 
---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ 
pdfbox-app ---
[INFO] Using 'ISO-8859-1' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 

[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.5.1:testCompile (default-testCompile) @ 
pdfbox-app ---

[jira] [Commented] (PDFBOX-1803) StringIndexOutOfBound on DateConverter.toCalendar

2014-02-09 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896180#comment-13896180
 ] 

Tim Allison commented on PDFBOX-1803:
-

[~zweibieren], thank you for all of your work on date parsing!
This patch does a far better job than I did with my duplicate (PDFBOX-1883) of 
dealing with back-compat and the history of the exceptions/return values. 

 Would it be possible to consider two small mods (first is important, second 
isn't):
1) add handling and tests for a string that is made up of only spaces.  Current 
patch still throws StringIndexOutOfBounds for that case.
2) add a syntactic sugar static isBad(Calendar cal) function (or similar) so 
that clients don't have to test for null and test for the magic bad date. 
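
The second suggestion could look roughly like the sketch below. The sentinel value here is purely hypothetical: the actual "magic bad date" used by the patch is not shown in this thread, and the class name is made up for illustration.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

// Hypothetical helper; not the DateConverter API.
final class DateCheckSketch {

    // Assumed sentinel for "could not parse"; the real value in the patch may differ.
    static final Calendar BAD_DATE = new GregorianCalendar(1, Calendar.JANUARY, 1);

    // One call replaces the two checks clients would otherwise write themselves.
    static boolean isBad(Calendar cal) {
        return cal == null || BAD_DATE.equals(cal);
    }

    public static void main(String[] args) {
        System.out.println(isBad(null));
        System.out.println(isBad(BAD_DATE));
        System.out.println(isBad(new GregorianCalendar(2014, Calendar.FEBRUARY, 9)));
    }
}
```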

> StringIndexOutOfBound on DateConverter.toCalendar
> -
>
> Key: PDFBOX-1803
> URL: https://issues.apache.org/jira/browse/PDFBOX-1803
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel, Utilities
>Affects Versions: 1.8.3
>Reporter: Eric Leleu
>Priority: Minor
> Attachments: PDFBOX-DateConverter-1.8-fred.patch, 
> PDFBOX-DateConverter-Trunk-fred.patch, PDFBox-DateConverter-Br18.patch, 
> PDFBox-DateConverter-Trunk.patch
>
>
> Some PDFs have an empty string as CreationDate and ModDate in the 
> Information Dictionary.
> According to the PDF specification, these two elements are optional.
> My first fix was to test for null and the empty string in the 
> toCalendar(String, String[]) method and return null if either condition 
> holds.
> But according to a test case (TestDateUtil), a NullPointerException is 
> expected on a null value of text. Can you explain why this behaviour has been 
> adopted?
> To fix this unexpected exception in my execution path, I have added a test 
> for the empty string in the deprecated method toCalendar(String). (Patch in 
> attachment)
> I'm waiting for your comments before committing this patch (or replacing it 
> with my first implementation).
> BR,
> Eric




[jira] [Commented] (PDFBOX-1895) Font definitions must precede font references

2014-02-09 Thread Pat Hickey (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896173#comment-13896173
 ] 

Pat Hickey commented on PDFBOX-1895:


Oops... forgot to add object 405...

405 0 obj
<<
/Ascent 859
/CapHeight 669
/Descent -140
/Flags 7
/FontBBox [-82 -136 996 859]
/FontName /MS-Mincho
/ItalicAngle 0
/StemV 92
/Style <<
/Panose 
>>
/Type /FontDescriptor
/XHeight 439
>>
endobj






[jira] [Commented] (PDFBOX-1895) Font definitions must precede font references

2014-02-09 Thread Pat Hickey (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896171#comment-13896171
 ] 

Pat Hickey commented on PDFBOX-1895:


I am very sorry not to have permission (yet) to share the document. I'm working 
on that.
I'll just illustrate one of the items that don't seem to be working.
The input document is looking for font F0 as object number 7.
It has the FontDescriptor as object number 5, 
which is before it is used in the Font defined in object number 6, 
which is then used as the DescendantFonts in object number 7, where font F0 is 
expected.
This is the pattern of the input document: the font descriptors precede the 
references to them.
Adobe Reader is able to find and display these fonts.
The output document is looking for font F0 as object number 145.
Object number 145 references the DescendantFonts in object number 375.
Object number 375 references the FontDescriptor in object number 405.
This is the pattern of the output document: the font descriptors follow the 
references to them.
Adobe Reader is *not* able to find or display these fonts.
Again, the operation here is, basically, just load and save.

h4. INPUT DOCUMENT (just the good parts):
3 0 obj
<<
/Font <<
/F0 7 0 R
/F1 10 0 R
/G1 277 0 R
>>
5 0 obj
<<
/Ascent 859
/CapHeight 669
/Descent -140
/Flags 7
/FontBBox [-82 -136 996 859]
/FontName /MS-Mincho
/ItalicAngle 0
/StemV 92
/Style <<
/Panose 
>>
/Type /FontDescriptor
/XHeight 439
>>
endobj
6 0 obj
<<
/BaseFont /MS-Mincho
/CIDSystemInfo <<
/CIDToGIDMap /Identity
/Ordering <15E1735673F3>
/Registry <1EE46C5578>
/Supplement 4
>>
/DW 1000
/FontDescriptor 5 0 R
/Subtype /CIDFontType2
/Type /Font
/W [ WIDTHS REMOVED FOR BREVITY ]
>>
endobj
7 0 obj
<<
/BaseFont /MS-Mincho
/DescendantFonts [6 0 R]
/Encoding /UniJIS-UCS2-H
/Subtype /Type0
/Type /Font
>>
endobj

h4. OUTPUT DOCUMENT (just the good parts):

54 0 obj
<<
/Font <<
/F0 145 0 R
/F1 146 0 R
/G1 147 0 R
>>
145 0 obj
<<
/BaseFont /MS-Mincho
/DescendantFonts [375 0 R]
/Encoding /UniJIS-UCS2-H
/Subtype /Type0
/Type /Font
>>
endobj
375 0 obj
<<
/BaseFont /MS-Mincho
/CIDSystemInfo <<
/CIDToGIDMap /Identity
/Ordering <15E1735673F3>
/Registry <1EE46C5578>
/Supplement 4
>>
/DW 1000
/FontDescriptor 405 0 R
/Subtype /CIDFontType2
/Type /Font
/W [ WIDTHS DELETED FOR BREVITY ]
>>
endobj





[jira] [Commented] (PDFBOX-1883) Avoid StringIndexOutOfBoundsException in DateConverter

2014-02-09 Thread Fred Hansen (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896143#comment-13896143
 ] 

Fred Hansen commented on PDFBOX-1883:
-

1803 is still open because no committer has yet gotten around to incorporating 
the patch there.

A key question is whether that patch would resolve your problem. If so, you 
should vote for 1803. If not, please let me know what about the behavior should 
change.

Thanks, Fred Hansen

> Avoid StringIndexOutOfBoundsException in DateConverter
> --
>
> Key: PDFBOX-1883
> URL: https://issues.apache.org/jira/browse/PDFBOX-1883
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 1.8.4
>Reporter: Tim Allison
>Priority: Trivial
>  Labels: easyfix
> Fix For: 1.8.5
>
> Attachments: PDFBOX-1883.patch
>
>
> Passing an empty string to parseDate can result in a 
> StringIndexOutOfBoundsException.
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
> range: 0
>   at java.lang.String.charAt(Unknown Source)
>   at 
> org.apache.pdfbox.util.DateConverter.parseDate(DateConverter.java:680)
>   at 
> org.apache.pdfbox.util.DateConverter.toCalendar(DateConverter.java:808)
>   at 
> org.apache.pdfbox.util.DateConverter.toCalendar(DateConverter.java:780)
>   at 
> org.apache.pdfbox.util.DateConverter.toCalendar(DateConverter.java:754)
>   at org.apache.pdfbox.cos.COSDictionary.getDate(COSDictionary.java:797)
>   at 
> org.apache.pdfbox.pdmodel.PDDocumentInformation.getCreationDate(PDDocumentInformation.java:210)
>   at 
> org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:170)
>   at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:142)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> I can't share the triggering document, but I'll submit a patch with a test 
> case shortly.
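
The failure in the trace above comes from calling charAt(0) on a string with no characters. A minimal sketch of a guard follows; the class and method names are hypothetical stand-ins, not the real DateConverter code, and returning null for unusable input is just one possible convention.

```java
// Hypothetical stand-in for the first step of a date parser.
final class ParseGuardSketch {

    // True if the text contains at least one non-whitespace character.
    static boolean hasContent(String text) {
        return text != null && text.trim().length() > 0;
    }

    // Returns the first significant character, or null for null, empty,
    // or whitespace-only input instead of throwing.
    static Character firstChar(String text) {
        if (!hasContent(text)) {
            return null;
        }
        return Character.valueOf(text.trim().charAt(0));
    }

    public static void main(String[] args) {
        System.out.println(firstChar(""));
        System.out.println(firstChar("   "));
        System.out.println(firstChar(" D:20140209"));
    }
}
```

Trimming before the check also covers the whitespace-only case, which an emptiness test alone would miss.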




[jira] [Updated] (PDFBOX-1167) PDFStreamEngine#processSubStream should throw original IOException instead of RuntimeException + FIX

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1167:


Component/s: Documentation

> PDFStreamEngine#processSubStream should throw original IOException instead of 
> RuntimeException + FIX
> 
>
> Key: PDFBOX-1167
> URL: https://issues.apache.org/jira/browse/PDFBOX-1167
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.7.0
>Reporter: Timo Boehme
>Priority: Minor
> Attachments: PDFStreamEngine_processSubStream.java
>
>
> PDFStreamEngine#processSubStream(COSStream) uses TokenIterator from 
> PDFStreamParser. This iterator, when hasNext() is called, might hit an 
> IOException, which it wraps inside a RuntimeException (because hasNext has no 
> declared exceptions). When such an IOException occurs, it normally won't be 
> handled by any calling class, because only IOException is declared to be 
> thrown and none of the classes is prepared to handle RuntimeException. 
> Therefore, within processSubStream the RuntimeException should be caught and 
> tested for an embedded IOException. In this case the IOException should be 
> thrown.
> I will attach the processSubStream method with the added catch block (method 
> taken from rev. 1163297).
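
The catch block described above can be sketched in isolation as follows. This is not the attached method: FailingIterator and countTokens are hypothetical stand-ins for the token iterator and for processSubStream, used only to show the unwrap-and-rethrow pattern.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-ins; not the PDFStreamEngine or PDFStreamParser code.
final class UnwrapSketch {

    // An iterator whose hasNext() cannot declare IOException, so it wraps it.
    static final class FailingIterator implements Iterator<String> {
        public boolean hasNext() {
            throw new RuntimeException(new IOException("read failed"));
        }
        public String next() {
            throw new NoSuchElementException();
        }
        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

    // Consumes the iterator and rethrows a wrapped IOException as itself.
    static int countTokens(Iterator<String> tokens) throws IOException {
        int count = 0;
        try {
            while (tokens.hasNext()) {
                tokens.next();
                count++;
            }
        } catch (RuntimeException e) {
            if (e.getCause() instanceof IOException) {
                throw (IOException) e.getCause();
            }
            throw e; // unrelated runtime failures are left untouched
        }
        return count;
    }

    public static void main(String[] args) {
        try {
            countTokens(new FailingIterator());
        } catch (IOException e) {
            System.out.println("caught IOException: " + e.getMessage());
        }
    }
}
```

Callers that already handle IOException then see the original failure without needing a separate handler for RuntimeException.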




[jira] [Updated] (PDFBOX-1187) Cut dependency between pdfbox and jempbox

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1187:


Component/s: JempBox

> Cut dependency between pdfbox and jempbox
> -
>
> Key: PDFBOX-1187
> URL: https://issues.apache.org/jira/browse/PDFBOX-1187
> Project: PDFBox
>  Issue Type: Wish
>  Components: JempBox
>Reporter: Guillaume Bailleul
>Assignee: Guillaume Bailleul
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: cut_jempbox.patch
>
>
> The pdfbox artifact depends on jempbox only in the PDMetadata class, where 
> two methods export or import XMPMetadata:
> * exportXMPMetadata
> * importXMPMetadata
> The serializing/deserializing work could be done in the calling code without 
> added complexity (see attached patch).
> Please give your opinion.




[jira] [Updated] (PDFBOX-240) Overlay generates invalid PDF

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-240:
---

Component/s: Utilities

> Overlay generates invalid PDF
> -
>
> Key: PDFBOX-240
> URL: https://issues.apache.org/jira/browse/PDFBOX-240
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Priority: Minor
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1649236
> Originally submitted by lag0m0rph on 2007-01-31 14:16.
> I'm trying to use Overlay to combine two PDFs and the resultant PDF appears 
> to have dangling references to objects not in the PDF so is therefore 
> invalid. The error message I get is "Could not find the ColorSpace named 
> 'Cs6overlay'." This object is defined by PDFBox during the Overlay operation. 
> There is an object Cs6 in the PDF in the foreground. I get this error with 
> the last release code as well as the nightly build 0.7.4-dev-20070130.
> I will attach a PDF that when used as an overlay will exhibit the problem 
> with any general PDF I tried to overlay it on.
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1649236&file_id=213941
> badoverlay.pdf (application/pdf), 5917 bytes
> file that when used as overlay generates invalid PDF




[jira] [Updated] (PDFBOX-1086) Error when decoding CCITT compressed data that contains EOLs, fill bits etc.

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1086:


Component/s: Parsing

> Error when decoding CCITT compressed data that contains EOLs, fill bits etc.
> 
>
> Key: PDFBOX-1086
> URL: https://issues.apache.org/jira/browse/PDFBOX-1086
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Reporter: Jeremias Maerki
>Assignee: Jeremias Maerki
>
> The TIFFFaxDecoder class (originally coming from JAI via XML Graphics 
> Commons) does not handle cases like EOLs between lines and in front. But the 
> PDF CCITTFaxDecode filter needs to allow many different variants of the 
> encoding. Apparently, TIFF has a relatively restricted way of encoding CCITT 
> data, so TIFFFaxDecoder was not written to be as flexible as we need it. 
> Ideally, PDFBox should handle anything that gets thrown at it.
> It appears that it would be rather difficult to retrofit TIFFFaxDecoder with 
> the necessary flexibility. So, new decoders for T.4 and T.6 should probably 
> be written.




[jira] [Updated] (PDFBOX-1462) OOM on object deflate

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1462:


Component/s: Parsing

> OOM on object deflate
> -
>
> Key: PDFBOX-1462
> URL: https://issues.apache.org/jira/browse/PDFBOX-1462
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 1.7.1
>Reporter: Edoardo Causarano
>
> org.apache.pdfbox.filter.FlateFilter#decompress can cause OOM (depending on 
> the size of the document and configured heap of course.) Consider using 
> something like 
> http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/io/FileBackedOutputStream.html
>  to constrain memory pressure.
> Best,
> Edoardo
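
The idea behind a file-backed output stream can be sketched with the standard library alone: keep data in memory up to a threshold, then spill to a temporary file. The class below is a simplified illustration written for this thread, not Guava's FileBackedOutputStream and not PDFBox code; the name, the threshold parameter, and the temp-file naming are all assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Simplified sketch: buffers in memory, switches to a temp file past a threshold.
final class SpillingOutputStream extends OutputStream {
    private final int threshold;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream fileOut;
    private File file;
    private long written;

    SpillingOutputStream(int threshold) {
        this.threshold = threshold;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        if (fileOut == null && written + len > threshold) {
            // Threshold exceeded: move what is buffered so far to disk.
            file = File.createTempFile("spill", ".tmp");
            file.deleteOnExit();
            fileOut = new FileOutputStream(file);
            memory.writeTo(fileOut);
            memory = null;
        }
        if (fileOut != null) {
            fileOut.write(buf, off, len);
        } else {
            memory.write(buf, off, len);
        }
        written += len;
    }

    @Override
    public void close() throws IOException {
        if (fileOut != null) {
            fileOut.close();
        }
    }

    boolean isOnDisk() {
        return file != null;
    }

    long size() {
        return written;
    }
}
```

A decompression routine writing into such a stream would keep its heap use bounded by the threshold plus its own copy buffer, at the cost of disk I/O for large streams.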




[jira] [Updated] (PDFBOX-1268) OutOfMemory Error because of huge colors

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1268:


Component/s: Parsing

> OutOfMemory Error because of huge colors
> 
>
> Key: PDFBOX-1268
> URL: https://issues.apache.org/jira/browse/PDFBOX-1268
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 1.6.0
>Reporter: Christophe Vandeplas
> Attachments: CVE-2009-3957 PDF 2009-09-21_RANDInfo.pdf
>
>
> Hi,
> On 26.03.2012 07:42, Christophe Vandeplas wrote:
> Hello List,
> I'm working on a PDF scanning tool and with a specific (malicious) PDF
> I always get OutOfMemory Errors.
> The backtrace is:
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>at 
> org.apache.pdfbox.filter.FlateFilter.decodePredictor(FlateFilter.java:218)
>at 
> org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:170)
>at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:279)
>at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:221)
>at 
> org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:156)
>at ScanPdf.checkCOSBaseObject(ScanPdf.java:199)
> ...
> When looking in the PDFBox code FlateFilter.java:218 is
> byte[] lastline = new byte[rowlength];
> In that context rowlength = 1073741838   =>  seems rather big, no?
> Looking back in the code it seems that it's colors that is so big.
> Colors seems to be extracted from the dict in FlateFilter.java:96:
> colors = dict.getInt(COSName.COLORS);
> The (malicious) PDF has indeed the definition :/Colors 1073741838
> Hmm, that sounds quite large, but the pdf spec describes the colors value as 
> follows:
> "(May be used only if Predictor is greater than 1) The number of interleaved 
> colour components per sample. Valid values are 1 to 4 (PDF 1.0) and 1 or 
> greater (PDF 1.3). Default value: 1."
> So my question is now:
> Is this something I need to catch in my own code, or should PDFBox be
> patched to catch such issues? (like the caught OutOfMemoryError in
> FlateFilter:124)
> PDFBox should handle that. Please create an issue on JIRA [1] and attach the 
> pdf in question.
> Thanks for your expertise
> Christophe
> BR
> Andreas Lehmkühler
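
One way to handle such input is to validate the predictor parameters before allocating the row buffer. The sketch below is an assumption-laden illustration: the class name, the row-length formula, and the 16 MB limit are chosen for the example and are not taken from FlateFilter.

```java
import java.io.IOException;

// Hypothetical guard; the limit and the formula are illustrative assumptions.
final class PredictorGuardSketch {

    // Arbitrary upper bound on a single decoded row, in bytes.
    static final int MAX_ROW_BYTES = 16 * 1024 * 1024;

    // Computes the row length in bytes, rejecting implausible parameters
    // instead of attempting a huge allocation.
    static int rowLength(int colors, int bitsPerComponent, int columns) throws IOException {
        if (colors < 1 || colors > MAX_ROW_BYTES
                || columns < 1 || columns > MAX_ROW_BYTES
                || bitsPerComponent < 1 || bitsPerComponent > 16) {
            throw new IOException("implausible predictor parameters");
        }
        // Use long arithmetic so the product cannot overflow an int.
        long bits = (long) colors * bitsPerComponent * columns;
        long bytes = (bits + 7) / 8;
        if (bytes > MAX_ROW_BYTES) {
            throw new IOException("row too long: " + bytes + " bytes");
        }
        return (int) bytes;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(rowLength(3, 8, 100));
        try {
            rowLength(1073741838, 8, 1); // the /Colors value from the report
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing with an IOException lets callers treat the file as malformed, which is easier to handle than an OutOfMemoryError.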




[jira] [Updated] (PDFBOX-194) PDPageXYZDestination jumps one page too far

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-194:
---

Component/s: PDModel

> PDPageXYZDestination jumps one page too far
> ---
>
> Key: PDFBOX-194
> URL: https://issues.apache.org/jira/browse/PDFBOX-194
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Priority: Minor
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1549236
> Originally submitted by nobody on 2006-08-30 05:05.
> When using PDPageXYZDestination with zoom -1, top 0 and
> left 0 (other values, too) it jumps one page further
> than it should.
> This also means that PDPageXYZDestination can't jump to
> the first page.




[jira] [Updated] (PDFBOX-1788) [PATCH] Show warning if system font not found

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1788:


Component/s: Rendering

> [PATCH] Show warning if system font not found
> -
>
> Key: PDFBOX-1788
> URL: https://issues.apache.org/jira/browse/PDFBOX-1788
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Reporter: simon steiner
> Attachments: warnmissingfonts.patch
>
>
> If you process a PDF which doesn't embed a font, PDFBox will try to use a 
> system font, but that font may not exist, so we should print a warning.
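
The requested behaviour amounts to a lookup with a logged fallback. The sketch below is illustrative only: the class name, the resolve method, and the use of java.util.logging are assumptions and do not reflect the attached patch or PDFBox's font handling.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical helper; not PDFBox code.
final class FontFallbackSketch {
    private static final Logger LOG = Logger.getLogger(FontFallbackSketch.class.getName());

    // Returns the requested font if installed, otherwise warns and uses the fallback.
    static String resolve(String requested, Set<String> installed, String fallback) {
        if (installed.contains(requested)) {
            return requested;
        }
        LOG.warning("Font '" + requested + "' is not embedded and not installed; using '"
                + fallback + "' instead");
        return fallback;
    }

    public static void main(String[] args) {
        Set<String> installed = new HashSet<String>(Arrays.asList("Arial", "Courier"));
        System.out.println(resolve("Arial", installed, "Courier"));
        System.out.println(resolve("MS-Mincho", installed, "Courier"));
    }
}
```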




[jira] [Updated] (PDFBOX-877) processOperator breaks contract - never throws IOException

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-877:
---

Component/s: Documentation

> processOperator breaks contract - never throws IOException
> --
>
> Key: PDFBOX-877
> URL: https://issues.apache.org/jira/browse/PDFBOX-877
> Project: PDFBox
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.1.0
>Reporter: Jimmy Juncker
>
> PDFStreamEngine.processOperator is documented to throw IOException. However, all 
> Exceptions are swallowed (in a catch Exception clause), which makes it 
> impossible for us to handle them:
> /**
>  * This is used to handle an operation.
>  *
>  * @param operator The operation to perform.
>  * @param arguments The list of arguments.
>  *
>  * @throws IOException If there is an error processing the operation.
>  */
> protected void processOperator( PDFOperator operator, List arguments ) throws IOException
> {
>     try
>     {
>         String operation = operator.getOperation();
>         OperatorProcessor processor = (OperatorProcessor)operators.get( operation );
>         if( processor != null )
>         {
>             processor.setContext(this);
>             processor.process( operator, arguments );
>         }
>         else
>         {
>             if (!unsupportedOperators.contains(operation))
>             {
>                 log.info("unsupported/disabled operation: " + operation);
>                 unsupportedOperators.add(operation);
>             }
>         }
>     }
>     catch (Exception e)
>     {
>         log.warn(e, e);
>     }
> }
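
A minimal sketch of the behaviour the report asks for, written without PDFBox itself: the dispatcher simply does not catch IOException, so the caller sees it as the Javadoc promises. OperatorDispatchSketch and its Handler interface are hypothetical stand-ins for PDFStreamEngine and OperatorProcessor, not the PDFBox API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for PDFStreamEngine; not the PDFBox API.
final class OperatorDispatchSketch {
    /** Hypothetical handler type standing in for OperatorProcessor. */
    interface Handler {
        void process(String operation, List<Object> arguments) throws IOException;
    }

    private final Map<String, Handler> operators = new HashMap<>();

    void register(String operation, Handler handler) {
        operators.put(operation, handler);
    }

    /**
     * Dispatches one operator. IOException is deliberately not caught here,
     * so it propagates to the caller instead of being logged and dropped.
     */
    void processOperator(String operation, List<Object> arguments) throws IOException {
        Handler handler = operators.get(operation);
        if (handler != null) {
            handler.process(operation, arguments);
        }
        // Unknown operators are silently ignored in this sketch.
    }

    public static void main(String[] args) {
        OperatorDispatchSketch engine = new OperatorDispatchSketch();
        engine.register("Tj", (op, a) -> { throw new IOException("broken stream"); });
        try {
            engine.processOperator("Tj", new ArrayList<Object>());
        } catch (IOException e) {
            // The caller can now react to the failure.
            System.out.println("caller saw: " + e.getMessage());
        }
    }
}
```

Whether unexpected runtime exceptions should still be logged and swallowed is a separate design decision that this sketch leaves open.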




[jira] [Updated] (PDFBOX-577) TextPosition should expose its bounding box

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-577:
---

Component/s: PDModel

> TextPosition should expose its bounding box
> ---
>
> Key: PDFBOX-577
> URL: https://issues.apache.org/jira/browse/PDFBOX-577
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Reporter: Villu Ruusmann
> Attachments: 
> 0001-PDFont.java-Add-methods-to-retreive-the-Ascent-and-D.patch, 
> AFM-getHeight.png, AFM-getUpperRightY.png, textposition-randombg.zip
>
>
> It does not seem to be possible to calculate the bounding box of a 
> TextPosition.
> IIUC, TextPosition#getY is the baseline of the text and 
> TextPosition#getHeight is the absolute height of the text. When I subtract 
> the latter from the former I get a top line, but this is only correct if the 
> text does not contain descender characters.
> Below is a screenshot (AFM-getHeight.png) which shows the bounding boxes of 
> TextPositions calculated as {#getX(), #getY() - #getHeight, #getWidth, 
> #getHeight} painted in random colors. For example, the bounding boxes of 
> parentheses are severely misplaced, which makes the line-by-line text 
> extraction impossible.
> Right now I've solved the problem by tweaking AFM FontMetrics code so that it 
> returns BoundingBox#getUpperRightY instead of BoundingBox#getHeight when 
> queried via PDSimpleFont#getFontHeight(byte[], int, int). Another screenshot 
> (AFM-getUpperRightY.png) shows how this restores the previously broken text 
> extraction ability.
> It seems like a good idea to rework TextPosition so that it would be aware of 
> its bounding box:
> *) Replace methods PDSimpleFont#getFontWidth(byte[], int, int) and 
> PDSimpleFont#getFontHeight(byte[], int, int) with a single method 
> PDSimpleFont#getFontBoundingBox(byte[], int, int)
> *) Replace the constructor TextPosition(Matrix, Matrix) with 
> TextPosition(Matrix, BoundingBox)
> *) Add new methods TextPosition#getBoundingBox, 
> TextPosition#getBoundingBoxDir. This shouldn't affect existing application 
> clients, because TextPosition#getY and TextPosition#getHeight remain in place.
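
The geometry described in the report can be sketched as follows, assuming a y-down coordinate system (as implied by "getY() - getHeight" giving the top line) and assuming the font supplies separate ascent and descent values. GlyphBoxSketch and its parameter names are hypothetical and are not part of TextPosition.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper; not part of the PDFBox TextPosition API.
final class GlyphBoxSketch {
    /**
     * Builds a bounding box around text drawn on a baseline.
     *
     * @param x         left edge of the text
     * @param baselineY baseline position, with y growing downwards
     * @param width     advance width of the text
     * @param ascent    extent above the baseline (positive)
     * @param descent   extent below the baseline (positive)
     */
    static Rectangle2D.Float boundingBox(float x, float baselineY, float width,
                                         float ascent, float descent) {
        // The top edge sits one ascent above the baseline; the box height
        // covers both the ascent and the descent, so descenders are included.
        return new Rectangle2D.Float(x, baselineY - ascent, width, ascent + descent);
    }

    public static void main(String[] args) {
        // Illustrative numbers only.
        System.out.println(boundingBox(10f, 100f, 5f, 8f, 2f));
    }
}
```

Using only the height, as in the original calculation, corresponds to passing a descent of zero, which is why characters that extend below the baseline end up with misplaced boxes.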




[jira] [Updated] (PDFBOX-961) Unable to extract string from COSArray with indirect reference to COSString

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-961:
---

Component/s: Parsing

> Unable to extract string from COSArray with indirect reference to COSString
> ---
>
> Key: PDFBOX-961
> URL: https://issues.apache.org/jira/browse/PDFBOX-961
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Reporter: Kevin Jackson
> Attachments: COSArray.java.patch
>
>
> COSArray.getString() returned null if the COSArray contained an indirect 
> reference to the String.  This resulted in a NPE in a calling function.
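
A rough sketch of the kind of fix described, using a tiny hypothetical object model instead of the real COS classes: the array accessor resolves an indirect reference before checking whether the element is a string. Node, Str and Ref are illustrative names only and do not reflect the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature object model; not the PDFBox COS classes.
final class IndirectStringSketch {
    interface Node {}

    /** A direct string value. */
    static final class Str implements Node {
        final String value;
        Str(String value) { this.value = value; }
    }

    /** An indirect reference to another node. */
    static final class Ref implements Node {
        final Node target;
        Ref(Node target) { this.target = target; }
    }

    /** Returns the string at index, following one level of indirection, or null. */
    static String getString(List<Node> array, int index) {
        Node item = array.get(index);
        if (item instanceof Ref) {
            // Dereference first; without this step the instanceof test below fails.
            item = ((Ref) item).target;
        }
        return item instanceof Str ? ((Str) item).value : null;
    }

    public static void main(String[] args) {
        List<Node> array = new ArrayList<>();
        array.add(new Str("direct"));
        array.add(new Ref(new Str("indirect")));
        System.out.println(getString(array, 0) + ", " + getString(array, 1));
    }
}
```

Callers still need to handle a null result for elements that are not strings at all.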




[jira] [Updated] (PDFBOX-944) number of pages returns the incorrect number for some PDFs

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-944:
---

Component/s: Parsing

> number of pages returns the incorrect number for some PDFs
> --
>
> Key: PDFBOX-944
> URL: https://issues.apache.org/jira/browse/PDFBOX-944
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 1.4.0
>Reporter: Adam Nichols
>
> This is a regression bug which appeared between 1.3.1 and 1.4.0, as the 
> former returns the correct page count while the latter does not.  
> Unfortunately, the PDF which demonstrates this problem is confidential, so I 
> cannot attach it here; however, I will describe the things which may be 
> causing this problem as best I can.
> The problem does not occur after using the "uncompress" feature of pdftk.  
> The problem does not occur after using PdfDecompressor from PDFBox.  The 
> original file which was given to me is Linearized.  In Adobe Acrobat Standard 
> -> File -> Properties, it says the Application was "Adobe Photoshop CS4 
> Windows", the PDF Producer was "Adobe Photoshop for Windows -- Image 
> Conversion Plug-in" and the PDF Version is 1.7 (Acrobat 8.x).  Fast Web View 
> is set to "No".  I suspect that the problem has to do with the fact that it's 
> Linearized or the fact that it uses ObjStm.  I don't have enough time to 
> trace through this, so I'm either going to revert back to PDFBox 1.3.1 or 
> pre-process all the ObjStm objects, save the uncompressed file, and then 
> process that.  The latter is less efficient, but I think it'll handle more 
> cases.  I just wanted to make sure to open an issue here on JIRA so we can 
> eventually get a proper solution to this problem.




[jira] [Updated] (PDFBOX-25) Import/Export of XML Data Package files (XDP)

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-25:
--

Component/s: AcroForm

> Import/Export of XML Data Package files (XDP)
> -
>
> Key: PDFBOX-25
> URL: https://issues.apache.org/jira/browse/PDFBOX-25
> Project: PDFBox
>  Issue Type: New Feature
>  Components: AcroForm
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552835&aid=1047923
> Originally submitted by thansson on 2004-10-15 10:34.
> Please, add support for import and export of XDP forms 
> data. Attached is a sample PDF form and the exported 
> XDP file. The PDF file was created using Adobe Designer 
> 6.0
> --
> Tomas
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552835&aid=1047923&file_id=105253
> Forms.zip (), 111975 bytes
> [comment on SourceForge]
> Originally sent by gnatware.
> Logged In: YES 
> user_id=73363
> I just started playing with Adobe Designer (60-day trial). 
> What we need (but will probably run up against Adobe patents
> trying to implement) is an XDP processor that does some or
> all of what Adobe Form Server (or LiveCycle Forms) does:
> take the form specified in the XDP file and generate client
> side HTML/Javascript via servlets and JSPs.  The
> HTML/Javascript (I'm guessing, since I don't have $3
> required to license Form Server) will include all the
> validation provided in the PDF form (through Adobe Reader),
> and the backend servlet will also be responsible for
> connecting to data sources (ODBC, XML w/ schema, etc),
> posting the updated data, providing pre-filled PDF and or
> XML versions of the data to be downloaded, emailed, etc.
> I suppose we could start by transforming the XDP using XSLT
> to generate simple text, checkbox, radio, select and submit
> inputs...




[jira] [Closed] (PDFBOX-1883) Avoid StringIndexOutOfBoundsException in DateConverter

2014-02-09 Thread Tim Allison (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison closed PDFBOX-1883.
---

Resolution: Duplicate

PDFBOX-1803.  Apologies for duplicate!

> Avoid StringIndexOutOfBoundsException in DateConverter
> --
>
> Key: PDFBOX-1883
> URL: https://issues.apache.org/jira/browse/PDFBOX-1883
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 1.8.4
>Reporter: Tim Allison
>Priority: Trivial
>  Labels: easyfix
> Fix For: 1.8.5
>
> Attachments: PDFBOX-1883.patch
>
>
> Passing an empty string to parseDate can result in a 
> StringIndexOutOfBoundsException.
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
> range: 0
>   at java.lang.String.charAt(Unknown Source)
>   at 
> org.apache.pdfbox.util.DateConverter.parseDate(DateConverter.java:680)
>   at 
> org.apache.pdfbox.util.DateConverter.toCalendar(DateConverter.java:808)
>   at 
> org.apache.pdfbox.util.DateConverter.toCalendar(DateConverter.java:780)
>   at 
> org.apache.pdfbox.util.DateConverter.toCalendar(DateConverter.java:754)
>   at org.apache.pdfbox.cos.COSDictionary.getDate(COSDictionary.java:797)
>   at 
> org.apache.pdfbox.pdmodel.PDDocumentInformation.getCreationDate(PDDocumentInformation.java:210)
>   at 
> org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:170)
>   at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:142)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> I can't share the triggering document, but I'll submit a patch with a test case 
> shortly.
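
The general shape of such a guard might look like the sketch below: check for null or blank input before the first charAt(0) call. EmptyDateGuardSketch and leadingYear are hypothetical and much simpler than DateConverter.parseDate; they only illustrate where the check belongs.

```java
// Hypothetical helper; not the PDFBox DateConverter.
final class EmptyDateGuardSketch {
    /**
     * Extracts a leading four-digit year from a date string such as
     * "D:20140209", or returns null when the text has no usable year.
     */
    static Integer leadingYear(String text) {
        if (text == null) {
            return null;
        }
        String s = text.trim();
        if (s.isEmpty()) {
            // Guard: charAt(0) below would throw StringIndexOutOfBoundsException.
            return null;
        }
        if (s.charAt(0) == 'D' && s.startsWith("D:")) {
            s = s.substring(2);
        }
        if (s.length() < 4) {
            return null;
        }
        for (int i = 0; i < 4; i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return null;
            }
        }
        return Integer.valueOf(s.substring(0, 4));
    }

    public static void main(String[] args) {
        System.out.println(leadingYear(""));
        System.out.println(leadingYear("D:20140209"));
    }
}
```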




[jira] [Updated] (PDFBOX-248) Document outline landscape pages missing

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-248:
---

Component/s: (was: Parsing)
 PDModel

> Document outline landscape pages missing
> 
>
> Key: PDFBOX-248
> URL: https://issues.apache.org/jira/browse/PDFBOX-248
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Priority: Minor
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1663446
> Originally submitted by pbeichert on 2007-02-19 04:55.
> I create an index which contains all PDF bookmarks and the corresponding 
> pages of the bookmarks. This works fine when all pages are in portrait. As 
> soon as one page in between is in landscape mode, the returned page number is 
> null.
> The result is the same with PDFBox 0.7.3 and with PDFBox-0.7.4-dev.
> Here is the code I use:
> // first map all pages to their page numbers:
> List allPages = doc.getDocumentCatalog().getAllPages();
> HashMap page2PageNumber = new HashMap();
> for (int i = 0; i < allPages.size(); i++) {
>     page2PageNumber.put(allPages.get(i), new Integer(i + 1));
> }
> PDDocumentOutline bookmark = doc.getDocumentCatalog().getDocumentOutline();
> PDOutlineItem current = bookmark.getFirstChild();
> // now iterate through the bookmarks
> ...
> System.out.println("Page Nr: " +
>     page2PageNumber.get(current.findDestinationPage(doc))
>     + " Title = " + current.getTitle());




[jira] [Updated] (PDFBOX-244) XMP: Create TIFF Sample

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-244:
---

Summary: XMP: Create TIFF Sample  (was: Create TIFF Sample)

> XMP: Create TIFF Sample
> ---
>
> Key: PDFBOX-244
> URL: https://issues.apache.org/jira/browse/PDFBOX-244
> Project: PDFBox
>  Issue Type: New Feature
>  Components: JempBox
>Priority: Minor
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=164503&atid=831990&aid=1655371
> Originally submitted by benlitchfield on 2007-02-08 08:45.
> Create a sample showing how to modify XMP data in a TIFF file.




[jira] [Updated] (PDFBOX-242) XMP: Create JPG Sample

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-242:
---

Summary: XMP: Create JPG Sample  (was: Create JPG Sample)

> XMP: Create JPG Sample
> --
>
> Key: PDFBOX-242
> URL: https://issues.apache.org/jira/browse/PDFBOX-242
> Project: PDFBox
>  Issue Type: New Feature
>  Components: JempBox
>Priority: Minor
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=164503&atid=831990&aid=1654373
> Originally submitted by benlitchfield on 2007-02-07 09:17.
> Create a sample that will read/write metadata into a JPG file.
> [comment on SourceForge]
> Originally sent by benlitchfield.
> Logged In: YES 
> user_id=601708
> Originator: YES
> Many users have requested a JPG example, I tried loading a JPG with ImageIO 
> and it does not appear to give access to the XMP metadata stream :(
> If someone can supply a sample showing raw access to the XMP stream using 
> standard Java classes then please send it over to me.
> Thanks,
> Ben




[jira] [Updated] (PDFBOX-248) Document outline landscape pages missing

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-248:
---

Summary: Document outline landscape pages missing  (was: landscape parsing 
problem)

> Document outline landscape pages missing
> 
>
> Key: PDFBOX-248
> URL: https://issues.apache.org/jira/browse/PDFBOX-248
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Priority: Minor
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1663446
> Originally submitted by pbeichert on 2007-02-19 04:55.
> I create an index which contains all PDF bookmarks and the corresponding 
> pages of the bookmarks. This works fine when all pages are in portrait. As 
> soon as one page in between is in landscape mode, the returned page number is 
> null.
> The result is the same with PDFBox 0.7.3 and with PDFBox-0.7.4-dev.
> Here is the code I use:
> // first map all pages to their page numbers:
> List allPages = doc.getDocumentCatalog().getAllPages();
> HashMap page2PageNumber = new HashMap();
> for (int i = 0; i < allPages.size(); i++) {
>     page2PageNumber.put(allPages.get(i), new Integer(i + 1));
> }
> PDDocumentOutline bookmark = doc.getDocumentCatalog().getDocumentOutline();
> PDOutlineItem current = bookmark.getFirstChild();
> // now iterate through the bookmarks
> ...
> System.out.println("Page Nr: " +
>     page2PageNumber.get(current.findDestinationPage(doc))
>     + " Title = " + current.getTitle());




[jira] [Updated] (PDFBOX-1584) Add unit test for RandomAccessFileOutputStream

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1584:


Component/s: (was: Parsing)
 Writing

> Add unit test for RandomAccessFileOutputStream
> --
>
> Key: PDFBOX-1584
> URL: https://issues.apache.org/jira/browse/PDFBOX-1584
> Project: PDFBox
>  Issue Type: Test
>  Components: Writing
>Affects Versions: 1.8.1
>Reporter: Fredrik Kjellberg
>Priority: Minor
> Attachments: TestRandomAccessFileOutputStream_diff.txt
>
>
> This patch includes a unit test for RandomAccessFileOutputStream




[jira] [Updated] (PDFBOX-1759) NullPointerException when loading/saving a PDF

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1759:


Component/s: (was: Parsing)
 Writing

> NullPointerException when loading/saving a PDF
> --
>
> Key: PDFBOX-1759
> URL: https://issues.apache.org/jira/browse/PDFBOX-1759
> Project: PDFBox
>  Issue Type: Bug
>  Components: Writing
>Affects Versions: 1.8.2
> Environment: Ubuntu Linux & Windows 7 (both JDK6)
>Reporter: William Palmer
>Priority: Minor
>
> Loading and saving a PDF causes a NullPointerException:
> java.lang.NullPointerException
>   at org.apache.pdfbox.pdmodel.PDPageNode.updateCount(PDPageNode.java:98)
>   at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1319)
>   at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1305)
> Files that cause this exception:
> https://github.com/openplanets/format-corpus/blob/master/pdfCabinetOfHorrors/encryption_openpassword.pdf
> https://github.com/openplanets/format-corpus/blob/master/pdfCabinetOfHorrors/encryption_notextaccess.pdf
> https://github.com/openplanets/format-corpus/blob/master/pdfCabinetOfHorrors/encryption_noprinting.pdf
> https://github.com/openplanets/format-corpus/blob/master/pdfCabinetOfHorrors/encryption_nocopy.pdf
> Code to reproduce:
> PDFParser parser = new PDFParser(new FileInputStream("file"));
> parser.parse();
> File temp = File.createTempFile("temp-", ".pdf");
> parser.getPDDocument().save(temp);
> parser.getDocument().close();
> I imagine the exception is thrown because the files have restrictions within 
> them but I would not expect to get a NullPointerException if that is the case.




[jira] [Updated] (PDFBOX-1256) [PATCH] Split PDFStreamEngine, moving functionality to simpler stream processor base class

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1256:


Component/s: (was: Text extraction)
 (was: Parsing)
 PDModel

> [PATCH] Split PDFStreamEngine, moving functionality to simpler stream 
> processor base class
> --
>
> Key: PDFBOX-1256
> URL: https://issues.apache.org/jira/browse/PDFBOX-1256
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 1.7.0, 2.0.0
> Environment: N/A
>Reporter: Craig Ringer
>Priority: Minor
>  Labels: api, refactoring, streams
> Attachments: 
> 0002-New-PDFStreamProcessor-base-of-PDFStreamEngine-adds-.patch
>
>
> The attached patch restructures PDFStreamEngine to move the basic 
> functionality of invoking callbacks for each operator in a stream into a 
> parent class. The parent class knows nothing about the meaning of operators, 
> it just invokes handlers with accumulated arguments whenever it encounters an 
> operator. PDFStreamEngine retains all the "knowledge" of what those operators 
> mean, the state of the graphics state stack, etc.
> The purpose of the change is to make it simpler and easier to use PDFBox's 
> PDF stream processor/parser code without dealing with the full features of 
> PDFStreamEngine with its built-in operator handlers, awareness of the 
> graphics stack, etc when that functionality isn't required. Specifically, I 
> needed to write a tool that copies a PDF stream, renaming resource references 
> as it goes but otherwise leaving it unchanged. I wanted to handle all 
> operators including future or unknown ones, and only needed to special-case a 
> couple of them. PDFStreamEngine was poorly suited to that because it doesn't 
> support a default handler fallback, tries to "understand" the stream, etc. 
> Rather than write a new class that duplicated much of PDFStreamEngine I 
> thought I'd try to factor the required functionality out, so others could use 
> it too.
> The changes should be backward compatible with existing code that uses 
> PDFStreamEngine. No changes in any PDFStreamEngine clients in PDFBox were 
> required for the test suite to pass, text extraction tool to work, etc. 
> Nonetheless, it's possible you'll only consider these changes for inclusion 
> in PDFBox 2.0, in which case they can be cleaned up to remove some of the 
> backward compatibility crap that's currently in them. Let me know.
> In terms of open issues or TODOs, the class naming could probably use work. I 
> can't rename PDFStreamEngine or OperatorProcessor for backward compatibility 
> reasons, so I've had to come up with more contrived names than I'd like.
> The logic of the changes is:
> - Move content stream argument accumulation and operator callback 
> functionality into new PDFStreamProcessor class
> - Add support for a default (fallback) handler to PDFStreamProcessor so 
> operators not explicitly matched may be handled
> - Modify PDFStreamEngine to extend PDFStreamProcessor, retaining all its 
> existing methods though some are now inherited.
> - Deprecate the properties-map based configuration of PDFStreamEngine because 
> it'll be fragile whenever more than one classloader is in use. Add 
> PDFStreamProcessor.replaceOperatorProcessors(...) for equivalent 
> functionality using a type-safe, multi-classloader-safe HashMap of operator 
> names to handler instances. This isn't added as a ctor override because 
> operator handler registration/unregistration methods are not final (to 
> preserve compatibility with PDFStreamEngine) and if overridden, they might 
> use data from a not-yet-initialized derived class. If a ctor override is 
> required then registerOperatorProcessor must be made final, breaking BC with 
> PDFStreamEngine.
> - Deprecate OperatorProcessor (the PDFStreamEngine operator handler class). 
> Instances of this are bound to a particular PDFStreamEngine via the `context' 
> property and they carry state when they don't have to. They're also an 
> abstract class, so handlers can't extend any other class. OperatorProcessor 
> based handlers continue to be supported just fine via a simple wrapper that's 
> used automatically where required.
> - Introduce new PDFStreamProcessor.OperatorHandler interface to replace 
> OperatorProcessor . It's a simple one-method interface that passes the 
> PDFStreamProcessor as an argument, so application designers are free to 
> choose whether to tie their OperationProcessorHandler implementations to 
> PDFStreamProcessor instances or whether they want to re-use the same handler 
> on many different processors. This change is useful for my app and removes 
> unnecessary stateful API, but isn't strictly n

[jira] [Updated] (PDFBOX-1256) [PATCH] Split PDFStreamEngine, moving functionality to simpler stream processor base class

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1256:


Priority: Major  (was: Minor)

> [PATCH] Split PDFStreamEngine, moving functionality to simpler stream 
> processor base class
> --
>
> Key: PDFBOX-1256
> URL: https://issues.apache.org/jira/browse/PDFBOX-1256
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 1.7.0, 2.0.0
> Environment: N/A
>Reporter: Craig Ringer
>  Labels: api, refactoring, streams
> Attachments: 
> 0002-New-PDFStreamProcessor-base-of-PDFStreamEngine-adds-.patch
>
>
> The attached patch restructures PDFStreamEngine to move the basic 
> functionality of invoking callbacks for each operator in a stream into a 
> parent class. The parent class knows nothing about the meaning of operators, 
> it just invokes handlers with accumulated arguments whenever it encounters an 
> operator. PDFStreamEngine retains all the "knowledge" of what those operators 
> mean, the state of the graphics state stack, etc.
> The purpose of the change is to make it simpler and easier to use PDFBox's 
> PDF stream processor/parser code without dealing with the full features of 
> PDFStreamEngine with its built-in operator handlers, awareness of the 
> graphics stack, etc when that functionality isn't required. Specifically, I 
> needed to write a tool that copies a PDF stream, renaming resource references 
> as it goes but otherwise leaving it unchanged. I wanted to handle all 
> operators including future or unknown ones, and only needed to special-case a 
> couple of them. PDFStreamEngine was poorly suited to that because it doesn't 
> support a default handler fallback, tries to "understand" the stream, etc. 
> Rather than write a new class that duplicated much of PDFStreamEngine I 
> thought I'd try to factor the required functionality out, so others could use 
> it too.
> The changes should be backward compatible with existing code that uses 
> PDFStreamEngine. No changes in any PDFStreamEngine clients in PDFBox were 
> required for the test suite to pass, text extraction tool to work, etc. 
> Nonetheless, it's possible you'll only consider these changes for inclusion 
> in PDFBox 2.0, in which case they can be cleaned up to remove some of the 
> backward compatibility crap that's currently in them. Let me know.
> In terms of open issues or TODOs, the class naming could probably use work. I 
> can't rename PDFStreamEngine or OperatorProcessor for backward compatibility 
> reasons, so I've had to come up with more contrived names than I'd like.
> The logic of the changes is:
> - Move content stream argument accumulation and operator callback 
> functionality into new PDFStreamProcessor class
> - Add support for a default (fallback) handler to PDFStreamProcessor so 
> operators not explicitly matched may be handled
> - Modify PDFStreamEngine to extend PDFStreamProcessor, retaining all its 
> existing methods though some are now inherited.
> - Deprecate the properties-map based configuration of PDFStreamEngine because 
> it'll be fragile whenever more than one classloader is in use. Add 
> PDFStreamProcessor.replaceOperatorProcessors(...) for equivalent 
> functionality using a type-safe, multi-classloader-safe HashMap of operator 
> names to handler instances. This isn't added as a ctor override because 
> operator handler registration/unregistration methods are not final (to 
> preserve compatibility with PDFStreamEngine) and if overridden, they might 
> use data from a not-yet-initialized derived class. If a ctor override is 
> required then registerOperatorProcessor must be made final, breaking BC with 
> PDFStreamEngine.
> - Deprecate OperatorProcessor (the PDFStreamEngine operator handler class). 
> Instances of this are bound to a particular PDFStreamEngine via the `context' 
> property and they carry state when they don't have to. They're also an 
> abstract class, so handlers can't extend any other class. OperatorProcessor 
> based handlers continue to be supported just fine via a simple wrapper that's 
> used automatically where required.
> - Introduce new PDFStreamProcessor.OperatorHandler interface to replace 
> OperatorProcessor . It's a simple one-method interface that passes the 
> PDFStreamProcessor as an argument, so application designers are free to 
> choose whether to tie their OperationProcessorHandler implementations to 
> PDFStreamProcessor instances or whether they want to re-use the same handler 
> on many different processors. This change is useful for my app and removes 
> unnecessary stateful API, but isn't strictly necessary and can be dropped 
> while retaining the PDFStreamEngine / PDFStreamProcessor split. As pa
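
The default-handler idea from the list above can be sketched roughly as follows. This is not the attached patch: FallbackProcessorSketch, its OperatorHandler interface and the output list are hypothetical, and only mirror the described design in which a stateless handler receives the processor as an argument and unmatched operators fall through to a default handler.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the described split; not the PDFBox API.
final class FallbackProcessorSketch {
    /** One-method handler that receives the processor, so it can stay stateless. */
    interface OperatorHandler {
        void handle(FallbackProcessorSketch processor, String operator, List<Object> arguments);
    }

    private final Map<String, OperatorHandler> handlers = new HashMap<>();
    private OperatorHandler defaultHandler;

    /** Collected output, standing in for a rewritten content stream. */
    final List<String> output = new ArrayList<>();

    void register(String operator, OperatorHandler handler) {
        handlers.put(operator, handler);
    }

    void setDefaultHandler(OperatorHandler handler) {
        defaultHandler = handler;
    }

    /** Invokes the matching handler, or the default handler if none matches. */
    void dispatch(String operator, List<Object> arguments) {
        OperatorHandler handler = handlers.getOrDefault(operator, defaultHandler);
        if (handler != null) {
            handler.handle(this, operator, arguments);
        }
    }

    public static void main(String[] args) {
        FallbackProcessorSketch p = new FallbackProcessorSketch();
        // Fallback: copy any operator and its arguments through unchanged.
        p.setDefaultHandler((proc, op, a) -> proc.output.add(a + " " + op));
        // Special case: replace the resource name used by the "Do" operator.
        p.register("Do", (proc, op, a) -> proc.output.add("[/Renamed] " + op));

        p.dispatch("q", new ArrayList<Object>());
        List<Object> doArgs = new ArrayList<>();
        doArgs.add("/Im1");
        p.dispatch("Do", doArgs);
        System.out.println(p.output);
    }
}
```

Because the handlers hold no reference to a particular processor, the same instances could be registered on several processors at once.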

[jira] [Updated] (PDFBOX-1167) PDFStreamEngine#processSubStream should throw original IOException instead of RuntimeException + FIX

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1167:


Component/s: (was: Parsing)

> PDFStreamEngine#processSubStream should throw original IOException instead of 
> RuntimeException + FIX
> 
>
> Key: PDFBOX-1167
> URL: https://issues.apache.org/jira/browse/PDFBOX-1167
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 1.7.0
>Reporter: Timo Boehme
>Priority: Minor
> Attachments: PDFStreamEngine_processSubStream.java
>
>
> PDFStreamEngine#processSubStream(COSStream) uses TokenIterator from 
> PDFStreamParser. When hasNext() is called, this iterator might encounter an 
> IOException, which it wraps inside a RuntimeException (because hasNext has no 
> declared exceptions). When such an IOException occurs, it normally won't 
> be handled by any calling class, because only IOException is declared to be 
> thrown and none of the classes is prepared to handle RuntimeException. 
> Therefore, within processSubStream the RuntimeException 
> should be caught and tested for an embedded IOException. In this case the 
> IOException should be thrown.
> I will attach the processSubStream method with the added catch block (method 
> taken from rev. 1163297).
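
The unwrapping step described above can be sketched in isolation like this. UnwrapIOExceptionSketch and drain are hypothetical names; the attached method may differ in detail.

```java
import java.io.IOException;
import java.util.Iterator;

// Hypothetical illustration; not the attached processSubStream method.
final class UnwrapIOExceptionSketch {
    /**
     * Consumes all tokens and returns how many there were. If the iterator
     * reports an IOException wrapped in a RuntimeException, the original
     * IOException is rethrown so that callers can handle it as declared.
     */
    static int drain(Iterator<?> tokens) throws IOException {
        int count = 0;
        try {
            while (tokens.hasNext()) {
                tokens.next();
                count++;
            }
        } catch (RuntimeException e) {
            if (e.getCause() instanceof IOException) {
                throw (IOException) e.getCause();
            }
            // Anything else is a genuine runtime failure; pass it on untouched.
            throw e;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Iterator<String> broken = new Iterator<String>() {
            public boolean hasNext() {
                throw new RuntimeException(new IOException("unexpected end of stream"));
            }
            public String next() {
                return null;
            }
        };
        try {
            drain(broken);
        } catch (IOException e) {
            System.out.println("unwrapped: " + e.getMessage());
        }
    }
}
```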




[jira] [Updated] (PDFBOX-1833) BaseParser tidy up

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1833:


Priority: Minor  (was: Major)

> BaseParser tidy up
> --
>
> Key: PDFBOX-1833
> URL: https://issues.apache.org/jira/browse/PDFBOX-1833
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Parsing
>Reporter: Jens Kapitza
>Priority: Minor
> Attachments: baseparser.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Tidy up logic (should not change the parsing result).
> Character.isWhitespace(c) is the only point which may have side effects (but I 
> assume there is no file separator in parseCOSHexString),
> so this should pass as it passed before.
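
For reference, the concern about Character.isWhitespace can be made concrete with a small comparison against the white-space characters listed in the PDF specification (NUL, HT, LF, FF, CR, SP). PdfWhitespaceSketch is a hypothetical helper and not code from the attached patch.

```java
// Hypothetical comparison helper; not part of BaseParser.
final class PdfWhitespaceSketch {
    /** White-space characters as defined by the PDF specification. */
    static boolean isPdfWhitespace(int c) {
        return c == 0x00 || c == 0x09 || c == 0x0A
            || c == 0x0C || c == 0x0D || c == 0x20;
    }

    public static void main(String[] args) {
        // List every code point up to SPACE where the two definitions disagree.
        for (int c = 0; c <= 0x20; c++) {
            if (Character.isWhitespace(c) != isPdfWhitespace(c)) {
                System.out.printf("0x%02X: java=%b pdf=%b%n",
                        c, Character.isWhitespace(c), isPdfWhitespace(c));
            }
        }
    }
}
```

The two definitions disagree for NUL, for vertical tab, and for the separator control characters U+001C to U+001F, which includes the file separator mentioned in the report.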




[jira] [Updated] (PDFBOX-1731) Converting pdf to Image

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1731:


Component/s: (was: Parsing)

> Converting pdf to Image
> ---
>
> Key: PDFBOX-1731
> URL: https://issues.apache.org/jira/browse/PDFBOX-1731
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.2
> Environment: Windows 8  and Linux 
> JDK 1.7
>Reporter: Paulo R C Mello Junior
>  Labels: newbie
>
> I'm trying to convert a pdf page to image but an exception occurs:
> 17:28:20,652 ERROR [org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap] 
> (Thread-69) Something went wrong ... the pixelmap doesn't contain any data.
> 17:28:20,654 WARN  [org.apache.pdfbox.util.operator.pagedrawer.Invoke] 
> (Thread-69) getRGBImage returned NULL
> 17:28:20,661 INFO  [org.apache.pdfbox.util.PDFStreamEngine] (Thread-69) 
> unsupported/disabled operation: i
> 17:28:36,809 ERROR [stderr] (Thread-70) Exception in thread "Thread-70" 
> java.lang.OutOfMemoryError: Java heap space
> 17:28:36,811 ERROR [stderr] (Thread-70)   at 
> java.awt.image.DataBufferByte.<init>(DataBufferByte.java:92)
> 17:28:36,812 ERROR [stderr] (Thread-70)   at 
> java.awt.image.ComponentSampleModel.createDataBuffer(ComponentSampleModel.java:415)
> 17:28:36,814 ERROR [stderr] (Thread-70)   at 
> java.awt.image.Raster.createWritableRaster(Raster.java:941)
> 17:28:36,814 ERROR [stderr] (Thread-70)   at 
> javax.imageio.ImageTypeSpecifier.createBufferedImage(ImageTypeSpecifier.java:1073)
> 17:28:36,815 ERROR [stderr] (Thread-70)   at 
> javax.imageio.ImageReader.getDestination(ImageReader.java:2896)
> 17:28:36,816 ERROR [stderr] (Thread-70)   at 
> com.sun.imageio.plugins.jpeg.JPEGImageReader.readInternal(JPEGImageReader.java:1066)
> 17:28:36,817 ERROR [stderr] (Thread-70)   at 
> com.sun.imageio.plugins.jpeg.JPEGImageReader.read(JPEGImageReader.java:1034)
> 17:28:36,818 ERROR [stderr] (Thread-70)   at 
> javax.imageio.ImageIO.read(ImageIO.java:1448)
> 17:28:36,818 ERROR [stderr] (Thread-70)   at 
> javax.imageio.ImageIO.read(ImageIO.java:1352)
> 17:28:36,819 ERROR [stderr] (Thread-70)   at 
> org.apache.pdfbox.pdmodel.graphics.xobject.PDJpeg.getRGBImage(PDJpeg.java:264)
> 17:28:36,820 ERROR [stderr] (Thread-70)   at 
> org.apache.pdfbox.util.operator.pagedrawer.Invoke.process(Invoke.java:83)
> 17:28:36,821 ERROR [stderr] (Thread-70)   at 
> org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
> 17:28:36,823 ERROR [stderr] (Thread-70)   at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
> 17:28:36,824 ERROR [stderr] (Thread-70)   at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
> 17:28:36,825 ERROR [stderr] (Thread-70)   at 
> org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
> 17:28:36,826 ERROR [stderr] (Thread-70)   at 
> org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:125)
> 17:28:36,827 ERROR [stderr] (Thread-70)   at 
> org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:769)
> My code:
> public static List<BufferedImage> getPdfPagesAsImages(String pdfPath)
>         throws IOException {
>     File f = new File(pdfPath);
>     PDDocument pdfDocument = null;
>     pdfDocument = PDDocument.loadNonSeq(f, null);
>     List<BufferedImage> bImages = new ArrayList<BufferedImage>();
>     try {
>         System.out.println(pdfPath);
>         int resolution = 185;
>         if (pdfDocument != null) {
>             @SuppressWarnings("unchecked")
>             List<PDPage> pages = (List<PDPage>) pdfDocument
>                     .getDocumentCatalog().getAllPages();
>             for (PDPage p : pages) {
>                 BufferedImage convertedImage = p.convertToImage(
>                         BufferedImage.TYPE_INT_RGB, resolution);
>                 if (isNegativeImage(convertedImage)) {
>                     bImages.add(invertNegativeImage(convertedImage));
>                 } else {
>                     bImages.add(convertedImage);
>                 }
>             }
>         }
>     } catch (FileNotFoundException e) {
>         e.printStackTrace();
>         e.getMessage();
>         e.getCause();
>     } catch (IOException e) {
>         e.printStackTrace();
>         e.getMessage();
>   e
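
For context on the OutOfMemoryError, a rough estimate of how much heap one rendered page occupies may help. The sketch below is a back-of-the-envelope calculation, not PDFBox code: it assumes a US Letter page (612 x 792 points), 72 points per inch, and 4 bytes per pixel for an image of type TYPE_INT_RGB. The class and method names are hypothetical.

```java
final class RenderMemoryEstimate {

    // PDF user space uses 72 units per inch by default.
    static final double POINTS_PER_INCH = 72.0;

    // Pixel length of a page edge given in points, rendered at the given resolution (dpi).
    static long pixels(double points, int dpi) {
        return Math.round(points * dpi / POINTS_PER_INCH);
    }

    // Approximate raster size in bytes, assuming 4 bytes per pixel (one int per pixel).
    static long rasterBytes(double widthPt, double heightPt, int dpi) {
        return pixels(widthPt, dpi) * pixels(heightPt, dpi) * 4L;
    }

    public static void main(String[] args) {
        long bytes = rasterBytes(612, 792, 185);
        System.out.println(bytes + " bytes per page");
    }
}
```

Under these assumptions a single page at 185 dpi needs roughly 12.8 million bytes, before counting whatever the embedded JPEGs need while being decoded. A list that keeps every rendered page alive therefore grows quickly; writing each image out before rendering the next would keep only one raster reachable at a time.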

[jira] [Updated] (PDFBOX-1715) java.lang.OutOfMemoryError when extracting images

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1715:


Component/s: (was: Parsing)

> java.lang.OutOfMemoryError when extracting images
> -
>
> Key: PDFBOX-1715
> URL: https://issues.apache.org/jira/browse/PDFBOX-1715
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.1
> Environment: LSB Version:
> :core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch
> Distributor ID: CentOS
> Description:CentOS release 4.7 (Final)
> Release:4.7
> Codename:   Final
> Java 1.6.0
>Reporter: sarathy
>
> We are trying to extract images from a PDF file. As part of that, we are 
> converting each PDPage into an image using the PDPage.convertToImage method. 
> It's a 52-page document.
> While doing so, we see the trace below.
> Here are the steps:
> PDDocument document = PDDocument.load(inputStream);
> List<PDPage> pages = document.getDocumentCatalog().getAllPages();
> for (PDPage pdPage : pages) {
>    if (pdPage.getResources() != null && pdPage.getResources().getImages() != null) {
>      PageInfo page = new PageInfo(pdPage, true, rotation);
>      ...
>    }
> }
> In PageInfo, we are doing:
> BufferedImage bimage = page.convertToImage();
> And after processing about 12 or so pages, it starts complaining as follows.
> java.lang.OutOfMemoryError: Java heap space
> at 
> org.apache.pdfbox.io.RandomAccessBuffer.expandBuffer(RandomAccessBuffer.java:263)
> at 
> org.apache.pdfbox.io.RandomAccessBuffer.write(RandomAccessBuffer.java:222)
> at 
> org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:108)
> at java.io.OutputStream.write(OutputStream.java:75)
> at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:102)
> at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:295)
> at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:237)
> at 
> org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:172)
> at 
> org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:231)
> at 
> org.apache.pdfbox.pdmodel.common.PDStream.getByteArray(PDStream.java:509)
> at 
> org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:185)
> at 
> org.apache.pdfbox.util.operator.pagedrawer.Invoke.process(Invoke.java:83)
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
> at 
> org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:125)
> at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:781)
> at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:712)
> at oss.rcpt.PageInfo.<init>(PageInfo.java:328)
> at oss.utl.PDFImageSplitter.execute(PDFImageSplitter.java:217)
> at oss.utl.PDFUtilities.getImageCount(PDFUtilities.java:165)
> at cms.utl.PDFImageOperations.main(PDFImageOperations.java:157)
> When we run this from the command line:
> * if we set -Xms512m and -Xmx512m, it complains after 12 pages.
> * if we set -Xms1024m and -Xmx1024m, it complains after 27 pages.
> On the side, we are also getting a "Colour key masking isn't supported" message 
> for each image in the file.
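
The two data points above allow a rough estimate of how much heap each page retains. The following sketch assumes, purely for illustration, that heap use grows linearly with the number of pages processed; the class and method names are hypothetical and nothing here is measured from PDFBox itself.

```java
final class HeapGrowthEstimate {

    // Slope of a line through two (pages, heapMb) observations: MB retained per page.
    static double mbPerPage(int pages1, int heapMb1, int pages2, int heapMb2) {
        return (double) (heapMb2 - heapMb1) / (pages2 - pages1);
    }

    // Heap that would be needed for all pages if the linear trend held.
    static double projectedHeapMb(int totalPages, int pages1, int heapMb1,
                                  int pages2, int heapMb2) {
        double slope = mbPerPage(pages1, heapMb1, pages2, heapMb2);
        double baseline = heapMb1 - slope * pages1;
        return baseline + slope * totalPages;
    }

    public static void main(String[] args) {
        System.out.printf("%.1f MB per page%n", mbPerPage(12, 512, 27, 1024));
        System.out.printf("%.0f MB for 52 pages%n",
                projectedHeapMb(52, 12, 512, 27, 1024));
    }
}
```

Going from 12 pages at 512 MB to 27 pages at 1024 MB works out to about 34 MB per page, which would be consistent with rendered pages or decoded streams staying reachable after each iteration rather than being released.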




[jira] [Updated] (PDFBOX-1532) extra space added to rotated text

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1532:


Component/s: (was: Parsing)
 Text extraction

> extra space added to rotated text 
> --
>
> Key: PDFBOX-1532
> URL: https://issues.apache.org/jira/browse/PDFBOX-1532
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 1.7.1, 1.8.0
>Reporter: Jinder Aujla
> Attachments: 0049-My-squashed-commits.patch, rotated.pdf
>
>
> Extra line break added after first character is read in a document that has 
> rotated text.




[jira] [Updated] (PDFBOX-1507) Getting Issue at text reading

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1507:


Component/s: .NET

> Getting Issue at text reading 
> --
>
> Key: PDFBOX-1507
> URL: https://issues.apache.org/jira/browse/PDFBOX-1507
> Project: PDFBox
>  Issue Type: Bug
>  Components: .NET, Text extraction
>Affects Versions: 1.7.1
> Environment: Windows, running PDFBox in .NET using ikvm-7.2.4630.5 
> conversion; we are actually converting PDF into an ALTO file
>Reporter: Tanmay Mandal
> Attachments: Pdf2Text.zip
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
>  xmlns="http://www.loc.gov/standards/
> alto/alto-v2.0.xsd">inch1200 scription>
> 
> 
> 
> 
> Feb 04, 2013 8:40:03 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
> WARNING: java.lang.NullPointerException
> java.lang.NullPointerException
> at org.apache.pdfbox.util.PDFTextStripper.processTextPosition(PDFTextStripper.java:954)
> at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:498)
> at org.apache.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:62)
> at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
> at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:271)
> at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:237)
> at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:218)
> at cli.org.apache.pdfbox.examples.util.PrintWordLocations.processDocuments(PrintWordLocation.cs:185)
> at cli.org.apache.pdfbox.examples.util.PrintWordLocations.Main(PrintWordLocation.cs:228)
> at cli.System.AppDomain._nExecuteAssembly(Unknown Source)
> at cli.System.AppDomain.ExecuteAssembly(Unknown Source)
> at cli.Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly(Unknown Source)
>

[jira] [Updated] (PDFBOX-1507) Getting Issue at text reading

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1507:


Component/s: (was: Parsing)
 Text extraction

> Getting Issue at text reading 
> --
>
> Key: PDFBOX-1507
> URL: https://issues.apache.org/jira/browse/PDFBOX-1507
> Project: PDFBox
>  Issue Type: Bug
>  Components: .NET, Text extraction
>Affects Versions: 1.7.1
> Environment: Windows, running PDFBox in .NET using ikvm-7.2.4630.5 
> conversion; we are actually converting PDF into an ALTO file
>Reporter: Tanmay Mandal
> Attachments: Pdf2Text.zip
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h

[jira] [Updated] (PDFBOX-1284) PDFBox giving ??? characters

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1284:


Component/s: (was: Parsing)
 Text extraction

> PDFBox giving ??? characters
> 
>
> Key: PDFBOX-1284
> URL: https://issues.apache.org/jira/browse/PDFBOX-1284
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 1.6.0
>Reporter: Ravi Kumar
>  Labels: textextraction
> Attachments: aaa1.pdf
>
>
> I wrote a sample standalone application using version 1.6 to read PDFs. 
> The parser returns ??? characters for one particular PDF; a few other PDFs work 
> fine.
> Is there a problem with the PDF file? I have checked it with other vendors' 
> parsers and they return the proper text. I am getting these ??? characters from 
> PDFBox only.




[jira] [Commented] (PDFBOX-712) SecurityHandlersManager May stop the application Server when running PDFParser in a Servlet.

2014-02-09 Thread peter_lena...@ibi.com (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896124#comment-13896124
 ] 

peter_lena...@ibi.com commented on PDFBOX-712:
--

I am currently on vacation
I have no access to e-mail.

Peter



> SecurityHandlersManager May stop the application Server when running 
> PDFParser in a Servlet.
> 
>
> Key: PDFBOX-712
> URL: https://issues.apache.org/jira/browse/PDFBOX-712
> Project: PDFBox
>  Issue Type: New Feature
>  Components: PDModel
>Affects Versions: 1.1.0
>Reporter: peter_lena...@ibi.com
>Priority: Minor
>
> When parsing a PDF document within an application server, there should never 
> be a code path which calls System.exit().
> I am not sure what invokes the class 
> org.apache.pdfbox.pdmodel.encryption.SecurityHandlersManager() from within the 
> code, so I do not have a clue how to fix this.
> I imagine that the best way to notify PDFBox that it is running in an 
> application would be something like this:
> PDDocument.setApplication(true or false);
> I would like to be able to tell the parser that it is not running as an 
> application so this code is never hit, but I did not see a way to do this.
>
> catch(Exception e)
> {
> System.err.println("SecurityHandlersManager strange error with builtin handlers: " + e.getMessage());
> System.exit(1);
> }
> Bug: new org.apache.pdfbox.pdmodel.encryption.SecurityHandlersManager() 
> invokes System.exit(...), which shuts down the entire virtual machine
> Pattern id: DM_EXIT, type: Dm, category: BAD_PRACTICE
> Invoking System.exit shuts down the entire Java virtual machine. This should 
> only be done when it is appropriate. Such calls make it hard or impossible 
> for your code to be invoked by other code. Consider throwing a 
> RuntimeException instead.
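
The FindBugs advice quoted above, throwing an unchecked exception instead of calling System.exit, can be sketched as follows. This is not the PDFBox source: HandlerRegistry, registerBuiltinHandlers, and the fail flag are hypothetical stand-ins used only to show the pattern.

```java
final class HandlerRegistry {

    // Hypothetical registration step; 'fail' simulates an error in a built-in handler.
    static void registerBuiltinHandlers(boolean fail) {
        try {
            if (fail) {
                throw new Exception("simulated handler failure");
            }
        } catch (Exception e) {
            // Propagate to the caller instead of calling System.exit(1), so a
            // servlet container hosting this code keeps running.
            throw new IllegalStateException(
                    "Error with built-in handlers: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        try {
            registerBuiltinHandlers(true);
        } catch (IllegalStateException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

With this shape the decision about whether a failure is fatal moves to the caller: a command-line tool may still exit, while a servlet can log the error and continue serving other requests.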




[jira] [Updated] (PDFBOX-1207) PDFPageProcessor.processStream() take 10 minutes to return

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1207:


Component/s: (was: Parsing)

> PDFPageProcessor.processStream() take 10 minutes to return
> --
>
> Key: PDFBOX-1207
> URL: https://issues.apache.org/jira/browse/PDFBOX-1207
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.6.0
> Environment: Seen on multiple platforms
>Reporter: Dan Krause
>
> Attempting to extract images and text from each page. Long processing time is 
> specific to this file: 
> http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/pdf/Installation_Guide/Red_Hat_Enterprise_Linux-6-Installation_Guide-en-US.pdf




[jira] [Updated] (PDFBOX-1061) PDFBox can't correctly extract text after bullet punctuation

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1061:


Component/s: (was: Parsing)

> PDFBox can't correctly extract text after bullet punctuation
> 
>
> Key: PDFBOX-1061
> URL: https://issues.apache.org/jira/browse/PDFBOX-1061
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 1.5.0, 1.6.0, 1.7.0
> Environment: jdk.1.6
>Reporter: Funfel
> Attachments: PageFrom_DURP2011_115_0666_01_p7.html, 
> PageFrom_DURP2011_115_0666_01_p7.pdf
>
>
> PDFBox can't correctly extract text after bullet punctuation.
> After a bullet character, the whole line gets a strange encoding, but the next 
> line is correct.
> This is probably a font/encoding problem.
> (tested on ver. 1.5.0, 1.6.0, 1.7.0-SNAPSHOT)




[jira] [Updated] (PDFBOX-1086) Error when decoding CCITT compressed data that contains EOLs, fill bits etc.

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1086:


Component/s: (was: Parsing)

> Error when decoding CCITT compressed data that contains EOLs, fill bits etc.
> 
>
> Key: PDFBOX-1086
> URL: https://issues.apache.org/jira/browse/PDFBOX-1086
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Jeremias Maerki
>Assignee: Jeremias Maerki
>
> The TIFFFaxDecoder class (originally coming from JAI via XML Graphics 
> Commons) does not handle cases like EOLs between lines and in front. But the 
> PDF CCITTFaxDecode filter needs to allow many different variants of the 
> encoding. Apparently, TIFF has a relatively restricted way of encoding CCITT 
> data, so TIFFFaxDecoder was not written to be as flexible as we need it. 
> Ideally, PDFBox should handle anything that gets thrown at it.
> It appears that it would be rather difficult to retrofit TIFFFaxDecoder with 
> the necessary flexibility. So, new decoders for T.4 and T.6 should probably 
> be written.
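
To illustrate what "EOLs" and "fill bits" mean for a decoder: in CCITT T.4 the end-of-line code is eleven 0 bits followed by a 1 bit, and fill consists of extra 0 bits inserted before it. The sketch below is a minimal illustration that assumes the bits are supplied as a string of '0' and '1' characters for readability. It is not the TIFFFaxDecoder code, and the class and method names are hypothetical.

```java
final class T4EolScanner {

    // Number of 0 bits in the EOL code word (000000000001).
    static final int EOL_ZERO_BITS = 11;

    /**
     * If an EOL (optionally preceded by fill bits) starts at 'pos', returns the
     * index just past its terminating 1 bit; otherwise returns -1.
     */
    static int skipEol(String bits, int pos) {
        int i = pos;
        while (i < bits.length() && bits.charAt(i) == '0') {
            i++;
        }
        int zeros = i - pos;
        if (zeros >= EOL_ZERO_BITS && i < bits.length() && bits.charAt(i) == '1') {
            return i + 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(skipEol("000000000001", 0));      // plain EOL
        System.out.println(skipEol("0000000000000001", 0));  // EOL preceded by fill bits
        System.out.println(skipEol("00110101", 0));          // not an EOL
    }
}
```

A decoder that tolerates such optional EOLs before and between rows would call a check like this at each row boundary and simply continue when it returns -1.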




[jira] [Updated] (PDFBOX-712) SecurityHandlersManager May stop the application Server when running PDFParser in a Servlet.

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-712:
---

Priority: Minor  (was: Major)

> SecurityHandlersManager May stop the application Server when running 
> PDFParser in a Servlet.
> 
>
> Key: PDFBOX-712
> URL: https://issues.apache.org/jira/browse/PDFBOX-712
> Project: PDFBox
>  Issue Type: New Feature
>  Components: PDModel
>Affects Versions: 1.1.0
>Reporter: peter_lena...@ibi.com
>Priority: Minor
>




[jira] [Updated] (PDFBOX-712) SecurityHandlersManager May stop the application Server when running PDFParser in a Servlet.

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-712:
---

Component/s: (was: Parsing)
 PDModel

> SecurityHandlersManager May stop the application Server when running 
> PDFParser in a Servlet.
> 
>
> Key: PDFBOX-712
> URL: https://issues.apache.org/jira/browse/PDFBOX-712
> Project: PDFBox
>  Issue Type: New Feature
>  Components: PDModel
>Affects Versions: 1.1.0
>Reporter: peter_lena...@ibi.com
>




[jira] [Updated] (PDFBOX-800) Wrong text extract from vertical textboxes in pdf files

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-800:
---

Description: 
Vertical textboxes in pdf files are not extracted correctly (using the tika 
library in C#).
For example if there is a vertical textbox "hello" in a pdf file (!WITHOUT! 
line breaks):
H
E
L
L
O
the parser returns 5 strings, each with a single letter, even though there is NO 
line break after every letter.
Is there an option to avoid this problem?

  was:
I was told to move this issue to the pdfbox parser, so I hope this is the right 
section.
Vertical textboxes in pdf files are not extracted correctly (using the tika 
library in C#).
For example if there is a vertical textbox "hello" in a pdf file (!WITHOUT! 
line breaks):
H
E
L
L
O
the parser returns 5 strings, each with a single letter, even though there is NO 
line break after every letter.
Is there an option to avoid this problem?


> Wrong text extract from vertical textboxes in pdf files
> ---
>
> Key: PDFBOX-800
> URL: https://issues.apache.org/jira/browse/PDFBOX-800
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
> Environment: Windows 7, VS 2010 C#, Tika Library
>Reporter: Sandor Dj
> Attachments: problemdoc.doc, problemdoc.pdf
>
>
> Vertical textboxes in pdf files are not extracted correctly (using the tika 
> library in C#).
> For example if there is a vertical textbox "hello" in a pdf file (!WITHOUT! 
> line breaks):
> H
> E
> L
> L
> O
> the parser returns 5 strings, each with a single letter, even though there is NO 
> line break after every letter.
> Is there an option to avoid this problem?
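
One conceivable post-processing step for the behaviour described above is to merge consecutive single-character fragments that share an x coordinate and are stacked closely in y. The sketch below only illustrates that idea under stated assumptions: Fragment, its fields, and the two tolerance parameters are hypothetical, y is assumed to grow downward, and nothing here is a PDFBox or Tika API.

```java
import java.util.ArrayList;
import java.util.List;

final class VerticalTextMerger {

    // Hypothetical text fragment: its text and its position on the page.
    static final class Fragment {
        final String text;
        final double x;
        final double y;

        Fragment(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Joins consecutive fragments whose x matches within xTol and whose
    // vertical gap is between 0 and maxGap; other fragments start a new string.
    static List<String> merge(List<Fragment> fragments, double xTol, double maxGap) {
        List<String> result = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        Fragment previous = null;
        for (Fragment f : fragments) {
            boolean sameColumn = previous != null
                    && Math.abs(f.x - previous.x) <= xTol
                    && f.y - previous.y >= 0
                    && f.y - previous.y <= maxGap;
            if (previous != null && !sameColumn) {
                result.add(current.toString());
                current.setLength(0);
            }
            current.append(f.text);
            previous = f;
        }
        if (previous != null) {
            result.add(current.toString());
        }
        return result;
    }
}
```

Fed the five letters of the vertical "HELLO" textbox at the same x and evenly spaced y values, merge would return them as one string; suitable tolerances would depend on the font size and are left to the caller.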




[jira] [Updated] (PDFBOX-800) Wrong text extract from vertical textboxes in pdf files

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-800:
---

Component/s: (was: Parsing)
 Text extraction

> Wrong text extract from vertical textboxes in pdf files
> ---
>
> Key: PDFBOX-800
> URL: https://issues.apache.org/jira/browse/PDFBOX-800
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
> Environment: Windows 7, VS 2010 C#, Tika Library
>Reporter: Sandor Dj
> Attachments: problemdoc.doc, problemdoc.pdf
>
>




[jira] [Updated] (PDFBOX-982) Unable to convert valid pdf to html

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-982:
---

Component/s: (was: Text extraction)

> Unable to convert valid pdf to html
> ---
>
> Key: PDFBOX-982
> URL: https://issues.apache.org/jira/browse/PDFBOX-982
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 1.5.0
> Environment: Ubuntu X86_64, Win7 X86_64
>Reporter: Varun Bhansaly
> Attachments: team21_devel.pdf
>
>
> Encountered an exception while converting a PDF to HTML/text using 
> pdfbox-app-1.5.0.
> The file in this case is "team21_devel.pdf"; please note that this seems to be a 
> valid PDF, as it opens in Adobe Reader.
> I used the command-line utility as follows:
> java -jar pdfbox-app-1.5.0.jar ExtractText -html team21_devel.pdf 
> The Exception :
> ExtractText failed with the following exception:
> java.io.IOException: Expected='null' actual='nullnullnull'
> at 
> org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:1025)
> at 
> org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:802)
> at 
> org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:1011)
> at 
> org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:179)
> at 
> org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:292)
> at 
> org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:1000)
> at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:533)
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:180)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:881)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:846)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:771)
> at org.apache.pdfbox.ExtractText.main(ExtractText.java:179)
> at org.apache.pdfbox.PDFBox.main(PDFBox.java:42)
> Related markmail - 
> http://pdfbox-users.markmail.org/message/4hu3awc4a775fczz?q=type:users+list:org%2Eapache%2Epdfbox%2Eusers&page=1




[jira] [Updated] (PDFBOX-816) 1.2.1 - PDFTextStripper* uses different Y values when cropbox has non-zero Y: not so for X coordinates.

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-816:
---

Component/s: (was: Parsing)
 Utilities

> 1.2.1 - PDFTextStripper* uses different Y values when cropbox has non-zero Y: 
> not so for X coordinates.
> ---
>
> Key: PDFBOX-816
> URL: https://issues.apache.org/jira/browse/PDFBOX-816
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 1.2.1
> Environment: Mac OS X 10.6.4 + JDK 1.6.0_20, and Ubuntu 10.4 (kernel 
> 2.6.32-24) with Sun Java 1.6.0_20. 
>Reporter: Larry West
> Attachments: TaxReturn-1.pdf
>
>   Original Estimate: 31h
>  Remaining Estimate: 31h
>
> [First off, kudos to the folks who work on PDFBox.  It's got some great 
> functionality.]
> The issue is that a cropbox with non-zero "lower-left-corner" changes 
> positions reported for text by PDFTextStripper.  In the Y coordinate only.
> See page 5 of the attached PDF (which is a phony tax return, not real data).
> As an example, near the top is the tax year, "2009".   Using a program such 
> as Apple's Preview, one would estimate that a snug bounding rectangle for 
> that text would be x=300, y=54, w=41, and h=18.
> And on other PDFs, that would be fine with PDFTextStripperByArea.  But this 
> PDF has a non-zero-origin cropbox set, one with the alleged lower-left-corner 
> at [-24.0, -24.0].   So the region coordinates that PDFTextStripperByArea 
> wants to see need to be offset by subtracting -24 from x and y, i.e., 
> yielding x=324, y=78.
> Or so you would think.  It turns out that the X coordinate stays the same, 
> only the Y coordinate gets affected by the cropbox setting.
> The sample program PrintTextLocations, which, like 
> PDFTextStripperByArea, derives from PDFTextStripper, reports both coordinates 
> as being offset by 24 in its processTextPosition():
> ...
> String[92.0,94.0 fs=12.0 xscale=1.0 height=9.0720005 space=3.3360004 width=186.71997]U.S. Individual Income Tax Retur
> String[278.71997,94.0 fs=12.0 xscale=1.0 height=9.0720005 space=3.3360004 width=7.3320007]n
> String[301.0,94.0 fs=18.0 xscale=1.0 height=13.122001 space=5.0040007 width=30.023987]200
> String[331.024,94.0 fs=18.0 xscale=1.0 height=13.122001 space=5.0040007 width=10.007996]9
> String[368.0,94.0 fs=8.0 xscale=1.0 height=7.525 space=2.2240002 width=11.559998](99
> String[379.56,94.0 fs=8.0 xscale=1.0 height=7.525 space=2.2240002 width=2.6640015])
> String[399.0,94.0 fs=6.0 xscale=1.0 height=4.5360003 space=1.6680002 width=36.34201]IRS Use Only
> ...
> (Entries 3 and 4 are the key ones; the others are for comparison.)
> To make sense of this: 301.0 is close enough to 300 for the X coordinate: if 
> I go to 324, I don't get the "200", just the "9".
> Also, the y coordinate of 94 is the bottom of the text, with a height of 18 
> that roughly extends up to 76, but 78 works as far as extractRegions() is 
> concerned (I think it only cares about the lower-left corner of each 
> character).
> So the bounding rectangle reported above for "2009" is lower-left corner 
> (301.0, 94.0) to upper-right corner approx (341.03, 76).
> In those coordinates, the region that works with extractRegions() is LL (300, 
> 96) to UR (341, 78).
> (Or, the exact Rectangle2D I pass to extractRegions: x=300, y=78, w=41, h=18).
> This applies to any field you choose on this page.
> So:
> (a) it doesn't seem to me that a cropbox has any business changing the 
> coordinates.  But I could be wrong.
> (b) if it does make sense for a cropbox to affect the coordinates, it should 
> do so in both X and Y dimensions, shouldn't it?
> (c) I suppose it would be too much to ask for notes explaining the 
> coordinates used for each method, but it's a nice thought.
> I tried looking through PDFTextStripper* but I'm not sufficiently familiar 
> with the code to determine where the coordinate perturbation occurs.   It 
> might be in how PDFStreamEngine.processEncodedText() is using the 
> graphicsState (initialized with the cropbox) to transform textMatrixStDisp, 
> but that seems to be initialized with a Dimension, so I don't see how an 
> offset would affect it.
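
The offset arithmetic described in this report can be written down as a small
sketch. It uses only java.awt.geom and does not call PDFBox; the class and
method names (CropBoxRegionSketch, toStripperRegion) are made up for
illustration, and the rule it encodes (shift Y by the crop box's lower-left Y,
leave X alone) is only the behaviour observed above for 1.2.1, not a
documented contract.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper, not part of PDFBox: it only reproduces the behaviour
// observed in this report.
class CropBoxRegionSketch {

    // visual: rectangle estimated in a viewer such as Preview.
    // cropLowerLeftY: Y of the crop box's lower-left corner, e.g. -24.0.
    static Rectangle2D toStripperRegion(Rectangle2D visual, double cropLowerLeftY) {
        // Observed: X is unchanged, Y is shifted by subtracting the crop box Y.
        return new Rectangle2D.Double(
                visual.getX(),
                visual.getY() - cropLowerLeftY,
                visual.getWidth(),
                visual.getHeight());
    }

    public static void main(String[] args) {
        Rectangle2D estimate = new Rectangle2D.Double(300, 54, 41, 18);
        System.out.println(toStripperRegion(estimate, -24.0));
    }
}
```

With the values from the report (estimate x=300, y=54, w=41, h=18 and a crop
box Y of -24), this gives the region x=300, y=78, w=41, h=18 that worked with
extractRegions().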




[jira] [Commented] (PDFBOX-1895) Font definitions must precede font references

2014-02-09 Thread John Hewson (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896123#comment-13896123
 ] 

John Hewson commented on PDFBOX-1895:
-

I don't know what you mean by "font descriptions" or "font references". You 
need to provide a clear description of the problem using the standard 
terminology from the [PDF 
Reference|http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf]
 or attach a PDF file which demonstrates the problem.

> Font definitions must precede font references
> -
>
> Key: PDFBOX-1895
> URL: https://issues.apache.org/jira/browse/PDFBOX-1895
> Project: PDFBox
>  Issue Type: Bug
>  Components: Writing
>Affects Versions: 1.8.3, 1.8.4
>Reporter: Pat Hickey
>
> When re-writing a document with font descriptions, Adobe Reader is unable to 
> display the fonts in the document.  Reader can display the fonts in the 
> original document. The difference is that in the original document, the font 
> descriptions are in lower object numbers than the font references; in the 
> output document, the font descriptions are in higher object numbers than the 
> font references.  Is there a quick way to re-order them?




[jira] [Commented] (PDFBOX-1895) Font definitions must precede font references

2014-02-09 Thread Pat Hickey (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896121#comment-13896121
 ] 

Pat Hickey commented on PDFBOX-1895:


Did you make a change that would ensure that the font descriptors are written 
before the fonts?
Could you point me to the code? I'd like to change our local copy (perhaps by 
deriving new classes).
Thanks!






[jira] [Updated] (PDFBOX-1667) org.apache.pdfbox.pdmodel.graphics.color.PDOutputIntent throws Exception while it can throw IOException instead

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1667:


Component/s: (was: FontBox)
 PDModel

> org.apache.pdfbox.pdmodel.graphics.color.PDOutputIntent throws Exception 
> while it can throw IOException instead
> ---
>
> Key: PDFBOX-1667
> URL: https://issues.apache.org/jira/browse/PDFBOX-1667
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 2.0.0
>Reporter: Max Gilead
>Priority: Trivial
>  Labels: easyfix
> Fix For: 2.0.0
>
> Attachments: PDFBOX-1667.patch
>
>
> public PDOutputIntent(PDDocument doc, InputStream colorProfile) throws Exception
> can be
> public PDOutputIntent(PDDocument doc, InputStream colorProfile) throws IOException




[jira] [Updated] (PDFBOX-1571) Images Look Fuzzy

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1571:


Component/s: (was: JempBox)
 (was: FontBox)
 Rendering

> Images Look Fuzzy 
> --
>
> Key: PDFBOX-1571
> URL: https://issues.apache.org/jira/browse/PDFBOX-1571
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 1.7.0, 1.8.1
> Environment: Java, Linux, Mac OS
>Reporter: Harippriya Parameswaran
>Priority: Minor
>  Labels: Fuzzy
> Attachments: Test.pdf, text.jpeg
>
>
> I tried to create an image and superimpose it on the PDF, but the image looks fuzzy. 
> Also, if I try to zoom the created image, it looks fuzzy even for the first 
> zoom out. 
> I tried increasing the resolution, but this increases the image size when 
> superimposing and looks even fuzzier with a huge image. I tried the 
> piece of code below.
> String fontFile = "Monika.ttf";
> String text = "Hello";
> float fontSize = 20;
> Font font = null;
> try {
>     font = Font.createFont(Font.TRUETYPE_FONT,
>             Sample.class.getClassLoader()
>                     .getResource(fontFile).openStream());
>     font = font.deriveFont(fontSize);
> } catch (Exception e) {
>     throw new IOException("could not load TrueTypeFont for file: "
>             + fontFile, e);
> }
> FontRenderContext fc = new FontRenderContext(null, true, true);
> Rectangle2D bounds = font.getStringBounds(text, fc);
> int width = (int) bounds.getWidth();
> int height = (int) bounds.getHeight();
> int maxWidth = 500;
> int maxHeight = 50;
> int minFontSize = 20;
> // Shrink the font until the text fits the maximum box or the minimum size is reached
> while (width > (maxWidth - 2 * 5)
>         || height > (maxHeight - 2 * 5)) {
>     if (fontSize <= minFontSize) {
>         break;
>     }
>     fontSize--;
>     font = font.deriveFont(fontSize);
>     bounds = font.getStringBounds(text, fc);
>     width = (int) bounds.getWidth();
>     height = (int) bounds.getHeight();
> }
> int paddingWidth = 5;
> int paddingHeight = 5;
> BufferedImage buffer = null;
> PDDocument doc = new PDDocument();
> // Page sized to the text bounds plus padding
> PDPage page = new PDPage(new PDRectangle(width + 2 * paddingWidth,
>         height + 2 * paddingHeight));
> BufferedImage newBufferedImage = ImageIO.read(Sample.class
>         .getClassLoader()
>         .getResource(Sample.get("image.blue.background"))
>         .openStream());
> PDJpeg newImage = new PDJpeg(doc, newBufferedImage);
> PDFont pdffont = null;
> try {
>     pdffont = PDTrueTypeFont.loadTTF(doc,
>             Sample.class.getClassLoader().getResource(fontFile).openStream());
> } catch (Exception e) {
>     throw new IOException(
>             "could not load PDTrueTypeFont for file: " + fontFile,
>             e);
> }
> // Draw the background image, then the text on top of it
> PDPageContentStream stream = new PDPageContentStream(doc, page);
> stream.drawImage(newImage, 0, 0);
> stream.setNonStrokingColor(Color.BLACK);
> stream.setStrokingColor(Color.BLACK);
> stream.beginText();
> stream.setFont(pdffont, fontSize);
> stream.moveTextPositionByAmount(paddingWidth,
>         (float) (height / 2.5 + paddingHeight));
> stream.drawString(text);
> stream.endText();
> stream.close();
> // Render the page to an RGB image with a resolution argument of 94
> buffer = page.convertToImage(BufferedImage.TYPE_INT_RGB, 94);
> // Convert image to PDXObjectImage
> PDXObjectImage watermark = new PDJpeg(doc, buffer);
> @SuppressWarnings("rawtypes")
> List pages = doc.getDocumentCatalog().getAllPages();
> Iterator iterator1 = pages.iterator();
> PDPage page1 = null;
> // Walk to the last page of the document
> while (iterator1.hasNext()) {
>     page1 = (PDPage) iterator1.next();
> }
> PDPageContentStream stream1 = new PDPageContentStream(doc, page1, true, true);
> stream1.drawImage(w

[jira] [Resolved] (PDFBOX-1645) [PATCH] Improved the accuracy of the bounding box for each rendered CFF glyph

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson resolved PDFBOX-1645.
-

Resolution: Later

Closing this issue with the resolution "Later". If anyone wants this 
implemented at some point please open a new issue, as this one is mixed up with 
the operand checks that went in as part of PDFBOX-1844.

> [PATCH] Improved the accuracy of the bounding box for each rendered CFF glyph
> -
>
> Key: PDFBOX-1645
> URL: https://issues.apache.org/jira/browse/PDFBOX-1645
> Project: PDFBox
>  Issue Type: Improvement
>  Components: FontBox
>Affects Versions: 1.8.2
>Reporter: Robert Meyer
>Assignee: Andreas Lehmkühler
> Fix For: 2.0.0
>
> Attachments: characterl.png, charactert.png, patch-20131202.diff, 
> patch.diff
>
>
> In a previous patch to the CharStringRenderer class, I resolved the rendering 
> issues and added a method to retrieve the bounding box for a CFF glyph. This 
> utilized the GeneralPath.getBounds() method to retrieve its bounding box. 
> Unfortunately, it was found that the method uses the control points of the 
> Bézier curves instead of the actual lines and was not very accurate. I have 
> therefore added several new methods to calculate the correct extents of the 
> glyph, so that it now matches the measurements found in tools like 
> FontForge.
> As a side note, several checks relating to the number of arguments provided 
> with an operator, which were originally added in my patch, were 
> unfortunately removed. I have one Adobe font (Adobe Heiti Standard - 
> CID-Keyed OTF) which has one or more glyphs which trip up on this and cause 
> an array index out of bounds exception. Each glyph renders correctly even 
> though this issue occurs, and I would therefore be grateful if these checks 
> could be left in. I have re-added these checks in the patch I am about to add.
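
To illustrate the difference between a control-point box and the actual
extents of a curve, here is a minimal sketch for a single cubic Bézier
segment. It is not the code from the attached patch; the class and method
names are invented for this example. It finds the extrema per axis by solving
for the zeros of the curve's derivative and evaluating the curve there.

```java
// Illustrative only: tight extents of one cubic Bezier segment.
class CubicBoundsSketch {

    // Value of the cubic Bezier polynomial at parameter t for one axis.
    static double eval(double p0, double p1, double p2, double p3, double t) {
        double u = 1 - t;
        return u * u * u * p0 + 3 * u * u * t * p1 + 3 * u * t * t * p2 + t * t * t * p3;
    }

    // Returns {min, max} of one axis over t in [0, 1].
    static double[] axisRange(double p0, double p1, double p2, double p3) {
        double min = Math.min(p0, p3);
        double max = Math.max(p0, p3);
        // Derivative divided by 3 is a*t^2 + b*t + c.
        double a = -p0 + 3 * p1 - 3 * p2 + p3;
        double b = 2 * (p0 - 2 * p1 + p2);
        double c = p1 - p0;
        double[] roots;
        if (Math.abs(a) < 1e-12) {
            // Degenerates to a linear equation (or has no root at all).
            roots = Math.abs(b) < 1e-12 ? new double[0] : new double[] { -c / b };
        } else {
            double disc = b * b - 4 * a * c;
            if (disc < 0) {
                roots = new double[0];
            } else {
                double s = Math.sqrt(disc);
                roots = new double[] { (-b + s) / (2 * a), (-b - s) / (2 * a) };
            }
        }
        for (double t : roots) {
            if (t > 0 && t < 1) {
                double v = eval(p0, p1, p2, p3, t);
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
        }
        return new double[] { min, max };
    }

    public static void main(String[] args) {
        // Segment (0,0) -> (100,0) with control points (0,100) and (100,100).
        double[] y = axisRange(0, 100, 100, 0);
        System.out.println("control-point box height: 100.0, curve height: " + (y[1] - y[0]));
    }
}
```

For the segment in main(), the control points reach y=100 but the curve
itself peaks at y=75 (at t=0.5), which is the kind of overestimate the issue
describes.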




[jira] [Closed] (PDFBOX-1645) [PATCH] Improved the accuracy of the bounding box for each rendered CFF glyph

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson closed PDFBOX-1645.
---






[jira] [Updated] (PDFBOX-1740) Umlaut not rendered correctly

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1740:


Component/s: (was: FontBox)
 Rendering

> Umlaut not rendered correctly
> -
>
> Key: PDFBOX-1740
> URL: https://issues.apache.org/jira/browse/PDFBOX-1740
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.0
> Environment: XP, W7
>Reporter: Tilman Hausherr
> Attachments: FreeSansTestÜ.pdf, FreeSansTestÜ.pdf-1.png
>
>
> The dots above the "U" in the attached file are not rendered correctly. From 
> looking at the points array, I think that the cause is NOT the calculation of 
> the shape path itself (PDFBOX-1435), it must be before, i.e. the calculation 
> of the point coordinates that are used later for the shapes, done in 
> GlyfCompositeDescript.getXCoordinate() or even deeper.
> The X coordinates from the "U" are between 80 and 640. The X coordinates of 
> the two dots are between 406 and 587, i.e. the two dots are at the right:
> points:
> Point(547,-729,onCurve,)
> Point(640,-729,onCurve,)
> Point(640,-217,onCurve,)
> Point(640,-107,,)
> Point(487,23,,)
> Point(359,23,onCurve,)
> Point(229,23,,)
> Point(80,-106,,)
> Point(80,-217,onCurve,)
> Point(80,-729,onCurve,)
> Point(173,-729,onCurve,)
> Point(173,-217,onCurve,)
> Point(173,-138,,)
> Point(274,-59,,)
> Point(359,-59,onCurve,)
> Point(447,-59,,)
> Point(547,-143,,)
> Point(547,-217,onCurve,endOfContour)
> Point(510,-881,onCurve,)
> Point(510,-777,onCurve,)
> Point(406,-777,onCurve,)
> Point(406,-881,onCurve,endOfContour)
> Point(587,-881,onCurve,)
> Point(587,-777,onCurve,)
> Point(483,-777,onCurve,)
> Point(483,-881,onCurve,endOfContour)
> The font can be found here:
> http://ftp.gnu.org/gnu/freefont/freefont-ttf-20120503.zip
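
The X ranges quoted above can be checked mechanically from the point dump.
The following sketch is only a helper for reading the dump: it hard-codes the
X coordinates listed in the issue and the positions of the endOfContour
markers, and does not use FontBox. The class name ContourRangeSketch is made
up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Reads the X coordinates from the dump above and reports min/max per contour.
class ContourRangeSketch {

    // X coordinates copied from the point list in the issue, in order.
    static final int[] XS = {
        547, 640, 640, 640, 487, 359, 229, 80, 80, 80,
        173, 173, 173, 274, 359, 447, 547, 547,   // contour 1: the "U"
        510, 510, 406, 406,                       // contour 2: first dot
        587, 587, 483, 483                        // contour 3: second dot
    };

    // Index of the last point of each contour (the endOfContour markers).
    static final int[] CONTOUR_ENDS = { 17, 21, 25 };

    static List<int[]> xRanges(int[] xs, int[] ends) {
        List<int[]> ranges = new ArrayList<int[]>();
        int start = 0;
        for (int end : ends) {
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            for (int i = start; i <= end; i++) {
                min = Math.min(min, xs[i]);
                max = Math.max(max, xs[i]);
            }
            ranges.add(new int[] { min, max });
            start = end + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (int[] r : xRanges(XS, CONTOUR_ENDS)) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

The "U" spans 80 to 640 (centre 360), while the two dot contours span 406 to
510 and 483 to 587: both sit to the right of the centre and overlap each
other, which matches the rendering problem described.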




[jira] [Updated] (PDFBOX-1827) Broken Link

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1827:


Component/s: (was: FontBox)
 Documentation

> Broken Link
> ---
>
> Key: PDFBOX-1827
> URL: https://issues.apache.org/jira/browse/PDFBOX-1827
> Project: PDFBox
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.8.3
>Reporter: gunasilan
>  Labels: features
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> http://pdfbox.apache.org/commandlineutilities/ExtractText.html link is broken.




[jira] [Updated] (PDFBOX-1278) PDF file containing PDCIDFontType0 (PDType1CFont) does not render correctly to image

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1278:


Component/s: (was: FontBox)
 Rendering

> PDF file containing PDCIDFontType0 (PDType1CFont) does not render correctly 
> to image
> 
>
> Key: PDFBOX-1278
> URL: https://issues.apache.org/jira/browse/PDFBOX-1278
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 1.7.0
> Environment: Windows 7, JDK 1.6
>Reporter: Hamed Iravanchi
> Attachments: l.pdf
>
>
> Some of the PDF files that contain CID fonts (which are parsed by class 
> PDType1CFont) do not produce correct images.
> This is, I think, because the AWT font created by the PDType1CFont.getawtFont 
> method is not right.
> The AWT font does not return correct glyphs for either the Unicode 
> characters or the code points.
> I will attach a sample PDF to demonstrate the case.




[jira] [Updated] (PDFBOX-664) Incorrect rendering

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-664:
---

Component/s: (was: FontBox)
 Text extraction

> Incorrect rendering
> ---
>
> Key: PDFBOX-664
> URL: https://issues.apache.org/jira/browse/PDFBOX-664
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 1.1.0
>Reporter: Villu Ruusmann
> Attachments: frontpage.png
>
>
> Peter Zavadsky reported to the PDFBox users' mailing list about unsatisfactory 
> results when trying to perform text extraction from the following Slovak 
> language PDF document:
> http://www.justice.gov.sk/kop/ovest/ov10/03/050/OV050A.pdf
> While I'm not expert enough to say anything about text extraction, I clearly 
> see numerous rendering problems. Please take a look at the image attachment 
> frontpage.png
> Quite obviously, the Slovak language makes use of custom character encoding 
> schemes.




[jira] [Updated] (PDFBOX-1192) getWidth() == 0 if external fonts are used

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1192:


Component/s: (was: FontBox)

> getWidth() == 0 if external fonts are used
> --
>
> Key: PDFBOX-1192
> URL: https://issues.apache.org/jira/browse/PDFBOX-1192
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 1.6.0
>Reporter: Andreas Lehmkühler
>  Labels: width
> Attachments: ApacheConPDFBox_outsidein-8.3.7.pdf
>
>
> getWidth() == 0 if a font isn't embedded to the pdf




[jira] [Updated] (PDFBOX-1124) PDFBox Is Throwing Exception in extraction in case of few pdf in .NET 3.5

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1124:


Component/s: (was: FontBox)
 .NET

> PDFBox Is Throwing Exception in extraction in case of few pdf  in .NET 3.5
> --
>
> Key: PDFBOX-1124
> URL: https://issues.apache.org/jira/browse/PDFBOX-1124
> Project: PDFBox
>  Issue Type: Bug
>  Components: .NET
>Affects Versions: 0.7.3
>Reporter: gagan deep sharma
> Attachments: QT_Install_Guide.pdf
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> I am using the PDFBox libraries in C# with .NET Framework 3.5.
> My code is:
> PDDocument doc = PDDocument.load("pdf file path");
> PDFTextStripper stripper = new PDFTextStripper();
> Result = stripper.getText(doc);
> This code runs and works fine, but for a few PDFs (especially those 
> containing images)
> it gave an error about the missing assembly bcprov-jdk14-132. I added a 
> reference to this assembly.
> Now it gives the following error on the third line, when fetching the text:
> The type initializer for 'gnu.java.util.regex.RESyntax' threw an 
> exception.
> Please solve the problem. It is very urgent.
> I have also attached the PDF which gives the error when extracting the text.
> Thanks in advance :)




[jira] [Updated] (PDFBOX-1134) fontbox not decoding font correctly for all characters

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1134:


Component/s: (was: FontBox)
 Rendering

> fontbox not decoding font correctly for all characters
> --
>
> Key: PDFBOX-1134
> URL: https://issues.apache.org/jira/browse/PDFBOX-1134
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 1.6.0
>Reporter: Joseph Berglund
> Attachments: 
> ASF.LICENSE.NOT.GRANTED--pdftoimage_screenshot_of_brochure.jpg, Avaya Aura 
> Server and Gateway Brochure.pdf
>
>
> Attached is a particular PDF with a font encoding that PDFBox does not always 
> understand. Some letters are correct, but most are not. It appears to have 
> something to do with a particular font, which I was hoping would degrade 
> gracefully. I don't know enough about the issue to tell if it is a problem with 
> embedded fonts.




[jira] [Updated] (PDFBOX-970) TeX-created ligatures and umlauts are not recognised

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-970:
---

Component/s: (was: FontBox)
 Text extraction

> TeX-created ligatures and umlauts are not recognised
> 
>
> Key: PDFBOX-970
> URL: https://issues.apache.org/jira/browse/PDFBOX-970
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 1.5.0
> Environment: Mac OS X 10.6.6, Java(TM) SE Runtime Environment (build 
> 1.6.0_22-b04-307-10M3261)
>Reporter: Thomas Fischer
>  Labels: textExtraction
> Attachments: A Python Library for Provenance Recording and 
> Querying.txt, A Python Library for Provenance Recording and Querying.txt, 
> Test.pdf, Test.pdf, Test2-1.6.txt, Test2.1.4.txt, Test2.pdf
>
>
> Ligatures in a TeX-created document, which were recognised by v. 1.4, are lost 
> in v. 1.5, e.g.
>   1.4       1.5
>   official  ocial
>   effort    e ort
>   fields    elds
>   first     rst
> In addition, German umlauts (ä, ö, ü) are represented as ( a,  o,  u), 




[jira] [Updated] (PDFBOX-1600) COSDocument and PDDocument declare throws IOException when they don't

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1600:


Component/s: (was: PDModel)
 Documentation

> COSDocument and PDDocument declare throws IOException when they don't
> -
>
> Key: PDFBOX-1600
> URL: https://issues.apache.org/jira/browse/PDFBOX-1600
> Project: PDFBox
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 1.8.1
>Reporter: Patrick Tucker
>Priority: Trivial
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The doc for COSDocument() says it throws an IOException if there is an error 
> creating the temp file.  If you dig through the code, a temp file is never 
> created; a value of null is assigned to tmpFile.
> Upon fixing the COSDocument() constructor, the constructor for PDDocument 
> will also no longer need to declare IOException in its throws clause.




[jira] [Updated] (PDFBOX-1637) Faulty documentation of PDStream.getInputStreamAsString()

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1637:


Component/s: (was: PDModel)
 Documentation

> Faulty documentation of PDStream.getInputStreamAsString()
> -
>
> Key: PDFBOX-1637
> URL: https://issues.apache.org/jira/browse/PDFBOX-1637
> Project: PDFBox
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.8.2
>Reporter: Dominic Tubach
>Priority: Trivial
>  Labels: documentation
>
> In the documentation of the method getInputStreamAsString() in PDStream it 
> says: "Uses the default system encoding." However, the code correctly uses 
> ISO-8859-1 (see PDFBOX-945).
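
The practical difference between the documented and the actual behaviour can
be shown with plain JDK calls. This is only a sketch under the assumption that
the stream bytes are decoded in one step; it is not the PDStream
implementation, and the class name Latin1DecodeSketch is made up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Contrasts a fixed ISO-8859-1 decode with a platform-dependent one.
class Latin1DecodeSketch {

    // Deterministic: every byte maps to exactly one character.
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Depends on the JVM's default charset, so results vary between systems.
    static String decodeDefault(byte[] bytes) {
        return new String(bytes, Charset.defaultCharset());
    }

    public static void main(String[] args) {
        byte[] raw = { (byte) 0x42, (byte) 0xE4, (byte) 0x72 }; // "B", a-umlaut, "r" in ISO-8859-1
        System.out.println(decodeLatin1(raw));
        System.out.println(decodeDefault(raw) + " (default: " + Charset.defaultCharset() + ")");
    }
}
```

The first call always yields the same three characters, whereas the second
depends on the machine it runs on, which is why the wording in the
documentation matters.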



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

[jira] [Resolved] (PDFBOX-1073) Error when handling a TIFF image

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson resolved PDFBOX-1073.
-

Resolution: Incomplete

Closing due to a lack of information.

> Error when handling a TIFF image
> 
>
> Key: PDFBOX-1073
> URL: https://issues.apache.org/jira/browse/PDFBOX-1073
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 1.6.0
> Environment: Ubuntu 11/04
>Reporter: Mehdi Houshmand
>Priority: Trivial
> Attachments: tiffErrorFix.patch
>
>
> There is an error in a TIFF image within a PDF document, and when 
> PDCcitt.getRGBImage() is called, there is a disparity between current PDFBox 
> and v0.8.0. The issue is that there are now TIFF decoding features, whereas the 
> older PDFBox used to hand that off to javax.imageio. 
> The bug here isn't in the Ccitt decoder, but rather in how 
> PDCcitt.getRGBImage() handles an error. Rather than bombing out, I have 
> reimplemented the old feature as a fallback when an error is thrown.
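
The fallback idea described here can be sketched in isolation. The code below
is not taken from tiffErrorFix.patch; the Decoder interface and the class name
are hypothetical, with the first decoder standing in for the built-in CCITT
decoding and the second for the older ImageIO-based path.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.UncheckedIOException;

// Try a primary decoder and fall back to a second one instead of failing outright.
class FallbackDecodeSketch {

    // Hypothetical abstraction; PDFBox has no interface by this name.
    interface Decoder {
        BufferedImage decode(byte[] data) throws IOException;
    }

    static BufferedImage decodeWithFallback(Decoder primary, Decoder fallback, byte[] data) {
        try {
            return primary.decode(data);
        } catch (IOException primaryFailure) {
            try {
                return fallback.decode(data);
            } catch (IOException fallbackFailure) {
                // Keep the first error visible for debugging.
                fallbackFailure.addSuppressed(primaryFailure);
                throw new UncheckedIOException("both decoders failed", fallbackFailure);
            }
        }
    }

    public static void main(String[] args) {
        BufferedImage image = decodeWithFallback(
                data -> { throw new IOException("primary decoder failed"); },
                data -> new BufferedImage(1, 1, BufferedImage.TYPE_INT_RGB),
                new byte[0]);
        System.out.println("decoded width: " + image.getWidth());
    }
}
```

Attaching the first failure as a suppressed exception is a design choice of
this sketch, so that a failing fallback does not hide the original error.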




[jira] [Closed] (PDFBOX-1073) Error when handling a TIFF image

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson closed PDFBOX-1073.
---






[jira] [Updated] (PDFBOX-1792) Different metadata extracted with NonSequentialPDFParser vs classic parser on some documents

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1792:


Component/s: (was: PDModel)
 Parsing

> Different metadata extracted with NonSequentialPDFParser vs classic parser on 
> some documents
> 
>
> Key: PDFBOX-1792
> URL: https://issues.apache.org/jira/browse/PDFBOX-1792
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 1.8.3
>Reporter: Tim Allison
>Priority: Minor
> Attachments: PDFBOX-1792.tar.gz, testPDF_acroForm2.pdf
>
>
> The traditional parser is able to extract metadata from a test document from 
> TIKA-738.  The NonSequentialPDFParser is not able to extract metadata from 
> that file.  Another file from the Tika test suite has metadata that can be 
> extracted by the NonSequentialPDFParser but not by classic. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

[jira] [Updated] (PDFBOX-1383) Proposal for a new COSArrayList

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1383:


Fix Version/s: 2.0.0

> Proposal for a new COSArrayList
> ---
>
> Key: PDFBOX-1383
> URL: https://issues.apache.org/jira/browse/PDFBOX-1383
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Reporter: Dominic Tubach
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: DTCOSArrayList.java, DTCOSArrayListTest.java, 
> DTCOSBaseConverter.java, DefaultDTCOSBaseConverter.java, 
> DefaultDTCOSBaseConverterTest.java
>
>
> Attached is a proposal for a new COSArrayList.
> Main differences to the existing COSArrayList:
> - type safety through generics.
> - it's always clear which types of objects the array holds.
> - flexible loading of objects from a dictionary through COSBaseConverter (see 
> below).
> - correct updating of dictionary entry, no matter whether it is optional, a 
> single value is allowed, or it is required.
> - listener interface.
> However there are some drawbacks:
> - it allows only classes/interfaces that implement/extend COSObjectable.
> -> DualCOSObjectables are not possible. (Would require an extra class.)
> -> no Java types such as String or Float (I see this as advantage as I was a 
> bit confused when I expected an Array with COSNames, but got Strings. By the 
> way adding a String in that case would not add a COSName as one might expect, 
> but a COSString.)
> - replacing the existing COSArrayList would require changes in existing code.
> - requires (as of now) Java 1.6 (It might be enough to remove the @Override 
> annotations for Java 1.5 compatibility.)
> Now to the COSBaseConverter. The COSBaseConverter is just an interface that 
> defines a conversion method to convert a COSBase object to a class that 
> implements COSObjectable.
> The default implementation tries to find a fitting constructor to instantiate 
> the object.
> If the destination class is an Enum it tries to find a fitting static valueOf 
> method to create the object.
> (To avoid a conflict with the existing COSArrayList, I prefixed everything 
> with my initials.)
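
The default conversion strategy described above (look for a fitting constructor, or a static valueOf for enums) can be sketched with plain reflection. This is not the attached DefaultDTCOSBaseConverter: the class name ReflectiveConverter and its convert method are hypothetical, and JDK types (BigDecimal, TimeUnit) stand in for COSBase sources and COSObjectable targets.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.math.BigDecimal;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the attached DefaultDTCOSBaseConverter.
public class ReflectiveConverter {

    // Converts source to an instance of target using the two rules from the proposal.
    public static <T> T convert(Object source, Class<T> target)
            throws ReflectiveOperationException {
        if (target.isEnum()) {
            // Enum targets: call the compiler-generated static valueOf(String).
            Method valueOf = target.getMethod("valueOf", String.class);
            return target.cast(valueOf.invoke(null, source.toString()));
        }
        // Other targets: use the first public one-argument constructor
        // whose parameter type accepts the source object.
        for (Constructor<?> candidate : target.getConstructors()) {
            Class<?>[] params = candidate.getParameterTypes();
            if (params.length == 1 && params[0].isInstance(source)) {
                return target.cast(candidate.newInstance(source));
            }
        }
        throw new NoSuchMethodException("No fitting constructor in "
                + target.getName() + " for " + source.getClass().getName());
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        BigDecimal number = convert("1.5", BigDecimal.class); // constructor path
        TimeUnit unit = convert("SECONDS", TimeUnit.class);   // enum path
        System.out.println(number + " " + unit);
    }
}
```

A real converter would probably also need a rule for ambiguous constructors; the sketch simply takes the first match.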




[jira] [Updated] (PDFBOX-1390) Is COSNumber mutable or immutable?

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1390:


Fix Version/s: 2.0.0

> Is COSNumber mutable or immutable?
> --
>
> Key: PDFBOX-1390
> URL: https://issues.apache.org/jira/browse/PDFBOX-1390
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 1.7.0
>Reporter: Aaron Stewart
>Priority: Minor
>  Labels: COS
> Fix For: 2.0.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> I'm writing code to clone a PDPage as a deep copy.  I'm trying to decide 
> which objects are mutable and which are immutable.  
> COSInteger is confusing.  It has a factory method, which suggests there is 
> some internal caching going on, but it also has a setValue() method.  Caching 
> makes sense for immutable objects.  If it is caching values, then setValue() 
> should probably be deprecated or removed.  
> * * *
> Proposed JUnit code:
> COSInteger original = COSInteger.get(1);
> COSInteger copy = COSInteger.get(1);
> copy.setValue(5);
> assertEquals(1L, original.longValue());
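
The point about caching and immutability can be shown without PDFBox. The sketch below uses a hypothetical CachedInt class (it is not COSInteger and does not claim to mirror its internals): a factory method hands out shared instances, which is only safe because the value is final and there is no setter.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a cached number wrapper; this is not COSInteger.
public final class CachedInt {
    private static final Map<Long, CachedInt> CACHE = new HashMap<Long, CachedInt>();

    // final and no setter: a shared instance can never change under its holders.
    private final long value;

    private CachedInt(long value) {
        this.value = value;
    }

    // Factory method: equal values share one instance.
    public static synchronized CachedInt get(long value) {
        CachedInt cached = CACHE.get(value);
        if (cached == null) {
            cached = new CachedInt(value);
            CACHE.put(value, cached);
        }
        return cached;
    }

    public long longValue() {
        return value;
    }

    public static void main(String[] args) {
        CachedInt original = CachedInt.get(1);
        CachedInt copy = CachedInt.get(1);
        // Both names refer to the same cached object.
        System.out.println(original == copy);
        // "Changing" a value means asking the factory for a different instance.
        CachedInt changed = CachedInt.get(5);
        System.out.println(original.longValue() + " " + changed.longValue());
    }
}
```

If such a class also offered a setValue() method, calling it on copy would silently change original as well, which is the situation the proposed JUnit test is meant to catch.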




[jira] [Updated] (PDFBOX-1270) Change internal page resolution to float everywhere

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1270:


Component/s: Rendering

> Change internal page resolution to float everywhere
> ---
>
> Key: PDFBOX-1270
> URL: https://issues.apache.org/jira/browse/PDFBOX-1270
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel, Rendering
>Affects Versions: 1.6.0
>Reporter: Benjamin Pick
>Priority: Minor
>  Labels: PDFToImage
> Attachments: PDFBox-1270-2.patch
>
>
> Was: PDPage.convertToImage() : Parameter "resolution" should be float
> Is there any specific reason why the method signature reads "int resolution", 
> not "float resolution"?
> I want to create an image of a certain pixel size, so it would be 
> easier/faster to calculate the exact resolution instead of resizing the image 
> afterwards. 
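
The arithmetic behind this request can be sketched as follows, assuming the default PDF user space of 72 points per inch. The class and method names are hypothetical and nothing here calls PDFBox; the page width of 612 points (US Letter) is only an example value.

```java
// Hypothetical helper; shows why an exact pixel size needs a fractional resolution.
public class ResolutionForPixelWidth {
    private static final float POINTS_PER_INCH = 72f;

    // Resolution (dpi) at which a page of the given width renders to the target pixel width.
    public static float resolutionFor(float pageWidthPoints, int targetWidthPixels) {
        return targetWidthPixels * POINTS_PER_INCH / pageWidthPoints;
    }

    // Rendered width in pixels for a page of the given width at the given resolution.
    public static float widthInPixels(float pageWidthPoints, float resolution) {
        return pageWidthPoints * resolution / POINTS_PER_INCH;
    }

    public static void main(String[] args) {
        float letterWidth = 612f; // US Letter is 8.5 in wide
        float exact = resolutionFor(letterWidth, 1000);
        System.out.println("exact resolution: " + exact);
        System.out.println("width at exact resolution: " + widthInPixels(letterWidth, exact));
        // Truncating to int, as an "int resolution" parameter forces, misses the target.
        System.out.println("width at truncated resolution: " + widthInPixels(letterWidth, (int) exact));
    }
}
```

For a 1000-pixel target the exact resolution is roughly 117.65 dpi; truncated to 117 dpi the page comes out 994.5 pixels wide, so the image would have to be resized afterwards.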




[jira] [Updated] (PDFBOX-1329) Update PDPage to enum

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1329:


Fix Version/s: 2.0.0

> Update PDPage to enum
> -
>
> Key: PDFBOX-1329
> URL: https://issues.apache.org/jira/browse/PDFBOX-1329
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 1.8.0
> Environment: Linux, UBUNTU 12.04, openjdk-7
>Reporter: Jens Kapitza
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: change_pdpage.diff
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>





[jira] [Updated] (PDFBOX-1094) Pattern colorspace support

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1094:


Component/s: (was: PDModel)
 Rendering

> Pattern colorspace support
> --
>
> Key: PDFBOX-1094
> URL: https://issues.apache.org/jira/browse/PDFBOX-1094
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 1.6.0
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
>Priority: Minor
>
> PDFBox doesn't support PDPattern colorspaces




[jira] [Resolved] (PDFBOX-1070) __NSAutoreleaseNoPool messages in headless mode on Mac OS X

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson resolved PDFBOX-1070.
-

Resolution: Invalid

Closing, as this is not a bug in PDFBox.

> __NSAutoreleaseNoPool messages in headless mode on Mac OS X
> ---
>
> Key: PDFBOX-1070
> URL: https://issues.apache.org/jira/browse/PDFBOX-1070
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 1.6.0
> Environment: Mac OS 10.6.7
>Reporter: Sarah Kelley
>Priority: Minor
> Attachments: sakelley_pdf_rendering_problem.zip
>
>
>  On MacOS, running the headless tests ("ant run-headless")
> generates multiple instances of messages like this:
> 
> *** __NSAutoreleaseNoPool(): Object 0x10b60a5a0 of class 
> NSConcreteMapTableValueEnumerator autoreleased with no pool 
> in place - just leaking
> was able to get this stack trace (use "ant run-headless-gdb" to reproduce):
> 
> Breakpoint 1, 0x7fff86465d34 in __NSAutoreleaseNoPool ()
> #0  0x7fff86465d34 in __NSAutoreleaseNoPool ()
> #1  0x7fff863b0ea9 in _CFAutoreleasePoolAddObject ()
> #2  0x7fff863b0c16 in -[NSObject(NSObject) autorelease] ()
> #3  0x7fff8539f3e0 in __NSReactToFontSetChange ()
> #4  0x7fff863b4000 in __CFXNotificationPost ()
> #5  0x7fff863a0578 in _CFXNotificationPostNotification ()
> #6  0x7fff890a5bc6 in AsynchronousLocalNotificationTimerCallBack ()
> #7  0x7fff863a8be8 in __CFRunLoopRun ()
> #8  0x7fff863a6dbf in CFRunLoopRunSpecific ()
> #9  0x0001486d in dyld_stub_strcmp ()
> #10 0x000142ca in dyld_stub_strcmp ()
> #11 0x00011ac8 in dyld_stub_strcmp ()
> Reported to Apple as Radar #9793519




[jira] [Updated] (PDFBOX-1187) Cut dependency between pdfbox and jempbox

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1187:


Component/s: (was: PDModel)

> Cut dependency between pdfbox and jempbox
> -
>
> Key: PDFBOX-1187
> URL: https://issues.apache.org/jira/browse/PDFBOX-1187
> Project: PDFBox
>  Issue Type: Wish
>Reporter: Guillaume Bailleul
>Assignee: Guillaume Bailleul
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: cut_jempbox.patch
>
>
> The pdfbox artifact depends on jempbox only in the PDMetadata class, where two 
> methods export or import XMPMetadata:
> * exportXMPMetadata
> * importXMPMetadata
> The serializing and deserializing could be done in the calling code without 
> added complexity (see attached patch).
> Please give your opinion.




[jira] [Closed] (PDFBOX-1070) __NSAutoreleaseNoPool messages in headless mode on Mac OS X

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson closed PDFBOX-1070.
---


> __NSAutoreleaseNoPool messages in headless mode on Mac OS X
> ---
>
> Key: PDFBOX-1070
> URL: https://issues.apache.org/jira/browse/PDFBOX-1070
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 1.6.0
> Environment: Mac OS 10.6.7
>Reporter: Sarah Kelley
>Priority: Minor
> Attachments: sakelley_pdf_rendering_problem.zip
>
>
>  On MacOS, running the headless tests ("ant run-headless")
> generates multiple instances of messages like this:
> 
> *** __NSAutoreleaseNoPool(): Object 0x10b60a5a0 of class 
> NSConcreteMapTableValueEnumerator autoreleased with no pool 
> in place - just leaking
> was able to get this stack trace (use "ant run-headless-gdb" to reproduce):
> 
> Breakpoint 1, 0x7fff86465d34 in __NSAutoreleaseNoPool ()
> #0  0x7fff86465d34 in __NSAutoreleaseNoPool ()
> #1  0x7fff863b0ea9 in _CFAutoreleasePoolAddObject ()
> #2  0x7fff863b0c16 in -[NSObject(NSObject) autorelease] ()
> #3  0x7fff8539f3e0 in __NSReactToFontSetChange ()
> #4  0x7fff863b4000 in __CFXNotificationPost ()
> #5  0x7fff863a0578 in _CFXNotificationPostNotification ()
> #6  0x7fff890a5bc6 in AsynchronousLocalNotificationTimerCallBack ()
> #7  0x7fff863a8be8 in __CFRunLoopRun ()
> #8  0x7fff863a6dbf in CFRunLoopRunSpecific ()
> #9  0x0001486d in dyld_stub_strcmp ()
> #10 0x000142ca in dyld_stub_strcmp ()
> #11 0x00011ac8 in dyld_stub_strcmp ()
> Reported to Apple as Radar #9793519




[jira] [Updated] (PDFBOX-239) PDFToImage prints every word at the start of the line

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-239:
---

Component/s: (was: PDModel)
 Rendering

> PDFToImage prints every word at the start of the line
> -
>
> Key: PDFBOX-239
> URL: https://issues.apache.org/jira/browse/PDFBOX-239
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 1.3.1
>Reporter: Jukka Zitting
>Priority: Minor
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1641283
> Originally submitted by trejkaz on 2007-01-21 21:30.
> I'm evaluating using PDFBox to convert PDF files to image files.  However, on 
> the most basic test I could come up with, PDFToImage fails to create the 
> result I would expect.
> Each word on the line is printed at the very start of the line, the text 
> overlapping the previous words.
> Attached is a simple PDF file containing nothing but "Hello World!", which 
> was created via FOP.
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1641283&file_id=212437
> helloworld1.png (image/png), 5488 bytes
> helloworld1.png (result of running PDFToImage on helloworld.pdf)
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1641283&file_id=212436
> helloworld.pdf (application/pdf), 4775 bytes
> helloworld.pdf
> [comment on SourceForge]
> Originally sent by trejkaz.
> Logged In: YES 
> user_id=639492
> Originator: YES
> Would this happen to be related to PDFStreamEngine.java line 334?
> //todo, handle horizontal displacement
> [comment on SourceForge]
> Originally sent by trejkaz.
> Logged In: YES 
> user_id=639492
> Originator: YES
> Attaching the PNG file resulting from rendering this PDF.
> File Added: helloworld1.png




[jira] [Updated] (PDFBOX-287) Inserting TIF / BMP / GIF into PDF

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-287:
---

Component/s: (was: PDModel)
 Writing

> Inserting TIF / BMP / GIF into PDF
> --
>
> Key: PDFBOX-287
> URL: https://issues.apache.org/jira/browse/PDFBOX-287
> Project: PDFBox
>  Issue Type: New Feature
>  Components: Writing
>Priority: Minor
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552835&aid=1748641
> Originally submitted by robertsearle on 2007-07-05 10:13.
> I would like three new classes---PDtif, PDbmp, and PDgif.  These classes 
> should work like PDJpeg.  I really hope the constructor is the same (pdfDoc, 
> inputStream).




[jira] [Closed] (PDFBOX-229) error with PDFToImage

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson closed PDFBOX-229.
--


> error with PDFToImage
> -
>
> Key: PDFBOX-229
> URL: https://issues.apache.org/jira/browse/PDFBOX-229
> Project: PDFBox
>  Issue Type: New Feature
>  Components: PDModel
>Priority: Minor
> Fix For: 1.8.0
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1620717
> Originally submitted by nobody on 2006-12-22 02:42.
> When trying to convert the pages of a PDF file to images, I get the following 
> error (we are using version 0.7.3). I attach the PDF file that produces the 
> error.
> My email is jesus.cri...@altia.es
> C:\Documents and Settings\Administrador\Escritorio\prueba>java org.pdfbox.PDFToImage -imageType png fichero1.pdf
> java.io.IOException: Unknown stream filter:COSName{JBIG2Decode}
> at org.pdfbox.filter.FilterManager.getFilter(FilterManager.java:116)
> at org.pdfbox.cos.COSStream.doDecode(COSStream.java:262)
> at org.pdfbox.cos.COSStream.doDecode(COSStream.java:235)
> at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170)
> at org.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:226)
> at org.pdfbox.pdmodel.common.PDStream.getByteArray(PDStream.java:481)
> at org.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:138)
> at org.pdfbox.util.operator.pagedrawer.Invoke.process(Invoke.java:81)
> at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:452)
> at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:215)
> at org.pdfbox.util.operator.pagedrawer.Invoke.process(Invoke.java:182)
> at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:452)
> at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:215)
> at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174)
> at org.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:104)
> at org.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:657)
> at org.pdfbox.PDFToImage.main(PDFToImage.java:183)
> Exception in thread "main" java.lang.NoClassDefFoundError: org/fontbox/cmap/CMapParser
> at org.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:534)
> at org.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:387)
> at org.pdfbox.util.PDFStreamEngine.showString(PDFStreamEngine.java:325)
> at org.pdfbox.util.operator.ShowText.process(ShowText.java:64)
> at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:452)
> at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:215)
> at org.pdfbox.util.operator.pagedrawer.Invoke.process(Invoke.java:182)
> at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:452)
> at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:215)
> at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174)
> at org.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:104)
> at org.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:657)
> at org.pdfbox.PDFToImage.main(PDFToImage.java:183)
> C:\Documents and Settings\Administrador\Escritorio\prueba>
> [comment on SourceForge]
> Originally sent by benlitchfield.
> Logged In: YES 
> user_id=601708
> Originator: NO
> Two comments
> 1)"NoClassDefFoundError: org/fontbox/cmap/CMap" Make sure you have FontBox in 
> your classpath
> 2)JBIG2 is not supported by Java yet, but you can vote for it to get 
> implemented!
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4799898
> Ben




[jira] [Resolved] (PDFBOX-229) error with PDFToImage

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson resolved PDFBOX-229.


   Resolution: Fixed
Fix Version/s: 1.8.0

JBIG2 support was added in 1.8.0 (https://code.google.com/p/jbig2-imageio/)

> error with PDFToImage
> -
>
> Key: PDFBOX-229
> URL: https://issues.apache.org/jira/browse/PDFBOX-229
> Project: PDFBox
>  Issue Type: New Feature
>  Components: PDModel
>Priority: Minor
> Fix For: 1.8.0
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1620717
> Originally submitted by nobody on 2006-12-22 02:42.
> When trying to convert the pages of a PDF file to images, I get the following 
> error (we are using version 0.7.3). I attach the PDF file that produces the 
> error.
> My email is jesus.cri...@altia.es
> C:\Documents and Settings\Administrador\Escritorio\prueba>java org.pdfbox.PDFToImage -imageType png fichero1.pdf
> java.io.IOException: Unknown stream filter:COSName{JBIG2Decode}
> at org.pdfbox.filter.FilterManager.getFilter(FilterManager.java:116)
> at org.pdfbox.cos.COSStream.doDecode(COSStream.java:262)
> at org.pdfbox.cos.COSStream.doDecode(COSStream.java:235)
> at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170)
> at org.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:226)
> at org.pdfbox.pdmodel.common.PDStream.getByteArray(PDStream.java:481)
> at org.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:138)
> at org.pdfbox.util.operator.pagedrawer.Invoke.process(Invoke.java:81)
> at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:452)
> at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:215)
> at org.pdfbox.util.operator.pagedrawer.Invoke.process(Invoke.java:182)
> at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:452)
> at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:215)
> at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174)
> at org.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:104)
> at org.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:657)
> at org.pdfbox.PDFToImage.main(PDFToImage.java:183)
> Exception in thread "main" java.lang.NoClassDefFoundError: org/fontbox/cmap/CMapParser
> at org.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:534)
> at org.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:387)
> at org.pdfbox.util.PDFStreamEngine.showString(PDFStreamEngine.java:325)
> at org.pdfbox.util.operator.ShowText.process(ShowText.java:64)
> at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:452)
> at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:215)
> at org.pdfbox.util.operator.pagedrawer.Invoke.process(Invoke.java:182)
> at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:452)
> at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:215)
> at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174)
> at org.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:104)
> at org.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:657)
> at org.pdfbox.PDFToImage.main(PDFToImage.java:183)
> C:\Documents and Settings\Administrador\Escritorio\prueba>
> [comment on SourceForge]
> Originally sent by benlitchfield.
> Logged In: YES 
> user_id=601708
> Originator: NO
> Two comments
> 1)"NoClassDefFoundError: org/fontbox/cmap/CMap" Make sure you have FontBox in 
> your classpath
> 2)JBIG2 is not supported by Java yet, but you can vote for it to get 
> implemented!
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4799898
> Ben




[jira] [Updated] (PDFBOX-1881) Visual Signature created by other lib not printable via PDFBox

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1881:


Component/s: (was: PDModel)
 Signing

> Visual Signature created by other lib not printable via PDFBox
> --
>
> Key: PDFBOX-1881
> URL: https://issues.apache.org/jira/browse/PDFBOX-1881
> Project: PDFBox
>  Issue Type: Bug
>  Components: Signing
>Affects Versions: 1.6.0, 1.8.4, 2.0.0
> Environment: Win7, Java7 32bit
>Reporter: Matthias Küng
> Attachments: PDFBox_VisSignature_Print.pdf, 
> pdfbox_vissignature_print.pdf-1.png
>
>
> The attached pdf with the visual signature is created by another tool and I 
> want to print out the document.
> This basically works - only the images of the signatures are missing. These 
> are displayed in Adobe Reader and Foxit Reader.




[jira] [Updated] (PDFBOX-1895) Font definitions must precede font references

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1895:


Component/s: (was: PDModel)
 Writing

> Font definitions must precede font references
> -
>
> Key: PDFBOX-1895
> URL: https://issues.apache.org/jira/browse/PDFBOX-1895
> Project: PDFBox
>  Issue Type: Bug
>  Components: Writing
>Affects Versions: 1.8.3, 1.8.4
>Reporter: Pat Hickey
>
> When re-writing a document with font descriptions, Adobe Reader is unable to 
> display the fonts in the document.  Reader can display the fonts in the 
> original document. The difference is that in the original document, the font 
> descriptions are in lower object numbers than the font references; in the 
> output document, the font descriptions are in higher object numbers than the 
> font references.  Is there a quick way to re-order them?




[jira] [Updated] (PDFBOX-1892) Empty pages after rendering images: org.apache.pdfbox.util.operator.pagedrawer.Invoke

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1892:


Component/s: (was: PDModel)
 Rendering

> Empty pages after rendering images: 
> org.apache.pdfbox.util.operator.pagedrawer.Invoke
> -
>
> Key: PDFBOX-1892
> URL: https://issues.apache.org/jira/browse/PDFBOX-1892
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 1.8.4, 2.0.0
> Environment: Windows 7 x64, Java 7
>Reporter: Lukas Vasek
>  Labels: noimage, pdfbox
> Fix For: 1.8.5
>
> Attachments: test.pdf, test2.pdf
>
>
> Hello, 
> I'm printing a file (test.pdf) in which each page has a generated number in 
> a different font. I'm using PDDocument.loadNonSeq() to load the data. In the 
> logs I can see 
> Feb 6, 2014 3:25:26 PM org.apache.pdfbox.util.operator.pagedrawer.Invoke process
> WARNING: Can't find the XObject for 'Xf1' 
> and nothing except the generated numbers is printed.
> I saw in an old bug that an ImageIO library was needed, but I no longer see 
> it in the dependencies (http://pdfbox.apache.org/dependencies.html).
> Could you please fix this?
> Thanks




[jira] [Updated] (PDFBOX-158) Renaming of form fields to identical names

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-158:
---

Component/s: (was: PDModel)
 AcroForm

> Renaming of form fields to identical names
> --
>
> Key: PDFBOX-158
> URL: https://issues.apache.org/jira/browse/PDFBOX-158
> Project: PDFBox
>  Issue Type: New Feature
>  Components: AcroForm
>Priority: Minor
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552835&aid=1482326
> Originally submitted by mmajis on 2006-05-05 02:21.
> In Acrobat 7, I can create form fields with identical
> names which will show the same entered value. I tried
> to rename fields to the same effect using PDFBox but it
> did not work. The PDF specification says there should
> be a construct where a parent of two nameless fields
> has this common name. 
> Would this functionality be easy to implement in PDFBox?




[jira] [Updated] (PDFBOX-1879) Gibberish characters when converting pdf to image

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1879:


Component/s: (was: PDModel)
 Rendering

> Gibberish characters when converting pdf to image
> -
>
> Key: PDFBOX-1879
> URL: https://issues.apache.org/jira/browse/PDFBOX-1879
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 1.8.4
> Environment: Windows Server 2008 R2 64 bits / Scala app
>Reporter: Harry Cossec
> Attachments: stinky_vat.pdf, stinky_vat.pdf-1.png, 
> stinky_vat_image.jpg
>
>
> Hello,
> I am an API user of PDFBox 1.8.4.
> I am encountering an image conversion problem with a specific .pdf file 
> (attached).
> Some paragraphs are displayed with gibberish characters, while other 
> paragraphs are still readable.
> The console shows the log below:
> févr. 03, 2014 3:07:46 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont getawtFont
> INFO: Using font SansSerif.bold instead
> Thank you for helping us to investigate this issue.




[jira] [Updated] (PDFBOX-1822) Signature byte range is Invalid

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1822:


Component/s: (was: Parsing)
 (was: AcroForm)
 (was: PDModel)

> Signature byte range is Invalid
> ---
>
> Key: PDFBOX-1822
> URL: https://issues.apache.org/jira/browse/PDFBOX-1822
> Project: PDFBox
>  Issue Type: Bug
>  Components: Signing
>Affects Versions: 1.8.3, 1.8.4, 2.0.0
>Reporter: vakhtang koroghlishvili
>Assignee: Andreas Lehmkühler
> Fix For: 2.0.0
>
> Attachments: 
> SignatureFileSet-PDFBOX-1.8.2_TO_1.8.4-SNAPSHOT_SEQ_AND_NONSEQ.zip, 
> araxis-merge - compare two document.jpg, damaged-sig.jpg, 
> unsigned-signed.pdf, unsigned.pdf, unsigned_signed_fix.pdf
>
>
> A person sent me an unsigned PDF document that he wanted to sign. When I try 
> to sign it (using PDFBox), I run into a problem.
> After signing, Adobe Reader tells me "The signature byte range is invalid".  
> I will attach the original and the signed document.
> I think it is a PDFBox parser error; other signature libraries sign the 
> document without problems. I'm investigating the problem at the moment in 
> order to fix it.




[jira] [Updated] (PDFBOX-1798) Performance problem with PDDocument.saveIncremental (when signing document)

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1798:


Component/s: (was: PDModel)
 Writing

> Performance problem with PDDocument.saveIncremental (when signing document)
> ---
>
> Key: PDFBOX-1798
> URL: https://issues.apache.org/jira/browse/PDFBOX-1798
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Signing, Writing
>Affects Versions: 1.8.3, 2.0.0
>Reporter: Dmytro Karimov
>  Labels: patch, performance
> Attachments: PDFBOX-1798.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Performance problem in class COSWriter:
> The constructor of COSWriter takes 2 args:
> COSWriter(OutputStream os, FileInputStream is)
> The method saveIncremental in class PDDocument:
> saveIncremental(FileInputStream input, OutputStream output)
> creates a COSWriter with these args. If I pass a FileInputStream to 
> saveIncremental, signing the document takes quite a long time.
> If a BufferedInputStream is passed instead, signing is faster. Alas, this is 
> not possible, because the parameter types of saveIncremental do not allow it.
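
The request amounts to widening a parameter type. The sketch below does not use PDFBox: copyAll is a hypothetical method standing in for a writer, and the assumption that it reads its input one byte at a time is mine, not something stated in the report. Because copyAll accepts any InputStream, the caller decides whether to add buffering.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch; copyAll is not a PDFBox method.
public class BufferedInputSketch {

    // Accepts any InputStream, so callers may pass a buffered one.
    // Reads byte by byte (an assumption), which is where buffering pays off for files.
    public static long copyAll(InputStream in, OutputStream out) throws IOException {
        long count = 0;
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[64 * 1024];
        // A ByteArrayInputStream stands in for the FileInputStream of the report.
        InputStream raw = new ByteArrayInputStream(data);
        InputStream buffered = new BufferedInputStream(raw);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(copyAll(buffered, out) + " bytes copied");
    }
}
```

With a signature that demands FileInputStream, the BufferedInputStream wrapper above would not compile as an argument, which is the limitation the report describes.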



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

[jira] [Updated] (PDFBOX-1877) Radial Shading (type 3) fails Ghent Workgroup tests

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1877:


Component/s: (was: PDModel)
 Rendering

> Radial Shading (type 3) fails Ghent Workgroup tests
> ---
>
> Key: PDFBOX-1877
> URL: https://issues.apache.org/jira/browse/PDFBOX-1877
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.0
> Environment: W7
>Reporter: Tilman Hausherr
> Fix For: 2.0.0
>
> Attachments: GWG060_Shading_x1a.pdf, GWG061_Shading_x1a.pdf, 
> gwg060_shading_x1a.pdf-1.png, gwg061_shading_x1a.pdf-1.png, 
> pdfbox-615-radial-input.pdf-1-bad.png, 
> pdfbox-615-radial-input.pdf-1-good.png, shading_pattern.pdf
>
>
> GWG 6.0 test: the black rectangle around the circle is missing
> GWG 6.1 test: The rectangles are there but shouldn't be. Plus, the second 
> type 3 shading has wrong colors. (Maybe same problem as in PDFBOX-1876 ?)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

[jira] [Updated] (PDFBOX-1666) Missing StemV font descriptor entry when embedding AFM fonts

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1666:


Component/s: (was: PDModel)
 Writing

> Missing StemV font descriptor entry when embedding AFM fonts
> 
>
> Key: PDFBOX-1666
> URL: https://issues.apache.org/jira/browse/PDFBOX-1666
> Project: PDFBox
>  Issue Type: Bug
>  Components: Writing
>Affects Versions: 2.0.0
>Reporter: Max Gilead
>  Labels: easyfix, patch
> Fix For: 2.0.0
>
> Attachments: AFMTest.java, AFM_Test-invalid.pdf, AFM_Test-valid.pdf, 
> n022004l.afm, n022004l.pfb, sRGB_IEC61966-2-1_black_scaled.icc
>
>
> When embedding an AFM font the StemV field is missing in the PDF which 
> renders it not PDF/A-1b compliant.
> As the StemV value is not included in AFM files it seems to be OK to simply 
> set it to 0. A quick test in Firefox, Chrome, OSX Preview and Acrobat Reader 
> indicates having StemV set to 0 does not impact font rendering in any obvious 
> way. FOP computes StemV from other values stored in PFM files but the fields 
> are optional so can't be relied upon [1] (hence results are often 0 anyway) 
> and Word [2] and iOS [3] seem to use 0 by default.
> Verified in SVN trunk 1504502 (2013.07.18)
> [1] http://xmlgraphics.apache.org/fop/1.1/fonts.html
> [2] http://tracker.luatex.org/view.php?id=32
> [3] http://blog.nomzit.com/2010/08/18/annoying-bug-in-quartz-pdfcontext-font-handling/
>  -- just a link to a dissected PDF that originated on iOS; it has nothing to 
> do with the bug the article is about




[jira] [Updated] (PDFBOX-1706) Reading PDF documents that contain special characters (e.g. €) cause warning and invalid parse result

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1706:


Component/s: (was: PDModel)
 Text extraction

> Reading PDF documents that contain special characters (e.g. €) cause warning 
> and invalid parse result
> -
>
> Key: PDFBOX-1706
> URL: https://issues.apache.org/jira/browse/PDFBOX-1706
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 1.8.2, 2.0.0
> Environment: Windows
>Reporter: Robert Neumann
>  Labels: patch
>
> When trying to call stripper.getText on the PDF file 
> http://www.edi-energy.de/files2/EDI@Energy%20UTILMD%205.1_20130401.pdf, 
> PDFBox 1.8.2 emits the following warning:
> 08:48:20,222  WARN PDFStreamEngine:567 - java.io.IOException: Error: Could 
> not find font(COSName{F7}) in 
> map={F1=org.apache.pdfbox.pdmodel.font.PDTrueTypeFont@676825b5, 
> F2=org.apache.pdfbox.pdmodel.font.PDTrueTypeFont@547e97d8}
> java.io.IOException: Error: Could not find font(COSName{F7}) in 
> map={F1=org.apache.pdfbox.pdmodel.font.PDTrueTypeFont@676825b5, 
> F2=org.apache.pdfbox.pdmodel.font.PDTrueTypeFont@547e97d8}
> at 
> org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:57)
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
> at 
> org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
> at 
> org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:455)
> at 
> org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:379)
> at 
> org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:335)
> at 
> org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:254)
> Interestingly, PDFBox 2.0 emits a different warning that calls out the 
> problem more precisely:
> Aug 27, 2013 9:35:30 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont 
> extractToUnicodeEncoding
> SEVERE: Error: Could not load embedded ToUnicode CMap
> Aug 27, 2013 9:35:30 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont 
> getSpaceWidth
> SEVERE: Can't determine the width of the space character using 250 as default
> java.lang.NullPointerException
>   at 
> org.apache.pdfbox.pdmodel.font.PDSimpleFont.getSpaceWidth(PDSimpleFont.java:406)
>   at 
> org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:343)
>   at 
> org.apache.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:62)
>   at 
> org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:529)
>   at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:258)
>   at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:225)
>   at 
> org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:205)
>   at 
> org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:455)
>   at 
> org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:379)
>   at 
> org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:335)
>   at 
> org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:254)
> We traced the problem to pages that contain special characters (e.g. €). 
> In the referenced PDF document, pages without such characters do not 
> trigger the above-mentioned warning. 
> The text that triggers the warning is not parsed correctly: the extracted 
> result contains garbage bytes. 
> Adobe Reader displays the entire document correctly.
> The following snippet should serve as a repro:
> package com.regiocom.bpo.mig;
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.FileNotFoundException;
> import java.io.IOException;
> import java.util.List;
> import org.apache.pdfbox.pdfparser.PDFParser;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.util.PDFTextStripper;
> import org.apache.pdfbox.util.Splitter;
> public class Repro {
>   
>   public Repro() {
>   
>   try {
>   stripper = new PDFTextStripper();
>   } catch (IOException e) {
>   e.printStackTrace();
>   }
>   }
>   // use this PDF as input: 
> http://www.edi-energy.de/files2/EDI@Energ

[jira] [Commented] (PDFBOX-1661) Fix font subtype automatically

2014-02-09 Thread John Hewson (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896068#comment-13896068
 ] 

John Hewson commented on PDFBOX-1661:
-

There is indeed a problem reading the Type 0 fonts in this file. I tried 2.0.0 
trunk and got the following, along with bad rendering:

{code}
Feb 09, 2014 1:57:14 PM org.apache.pdfbox.pdfviewer.PageDrawer createAWTFont
INFO: Unsupported type of font org.apache.pdfbox.pdmodel.font.PDType0Font
Feb 09, 2014 1:57:15 PM org.apache.pdfbox.pdfviewer.PageDrawer createAWTFont
INFO: Using font Helvetica-Light instead
Feb 09, 2014 1:57:15 PM org.apache.pdfbox.pdfviewer.PageDrawer createAWTFont
INFO: Unsupported type of font org.apache.pdfbox.pdmodel.font.PDType0Font
Feb 09, 2014 1:57:15 PM org.apache.pdfbox.pdfviewer.PageDrawer createAWTFont
INFO: Using font Helvetica-Light instead
Feb 09, 2014 1:57:15 PM org.apache.pdfbox.util.PDFImageWriter writeImage
INFO: Writing: page-081.jpg
{code}

> Fix font subtype automatically
> --
>
> Key: PDFBOX-1661
> URL: https://issues.apache.org/jira/browse/PDFBOX-1661
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel, Rendering
>Affects Versions: 1.8.1
> Environment: PDFBox: PDFBox 1.8.1
> Reader: Adobe Reader 11.0.0
> Generator:  TCPDF 4.5.041
> PDF Content:
> < /BaseFont /AdobeSongStd-Light,Bold-UniGB-UTF16-H
> /Subtype /Type0
> /Encoding /UniGB-UTF16-H
> /DescendantFonts [27 0 R]
>Reporter: Raymond Wu
>  Labels: encoding, font
> Attachments: adobe-screenshot.png, page-08.pdf, pdf-screenshot.png
>
>
> Subtype is parsed as "Type0" by PDFBox, but as "Type1" by Adobe Reader.
> This is not a bug in PDFBox.
> The cause is that TCPDF 4.5.041 generates the font AdobeSongStd-Light with a 
> bad subtype, "Type0".
> It should be "Type1".
> I have tested the following code and it works.
> File: org/apache/pdfbox/pdmodel/font/PDFontFactory.java
> Method: public static PDFont createFont( COSDictionary dic ) throws IOException
> Original:
> else if( subType.equals( COSName.TYPE0 ) )
> {
>     retval = new PDType0Font( dic );
> }
> Fixed:
> else if( subType.equals( COSName.TYPE0 ) )
> {
>     COSName encoding = (COSName)dic.getDictionaryObject(COSName.ENCODING);
>     retval = (encoding!=null) ? new PDType1Font( dic ) : new PDType0Font( dic );
> }
> With this patch, PDFBox will act like Adobe Reader.
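
The dispatch change proposed in the issue can be sketched without PDFBox. What follows is a toy model, not the PDFontFactory code: the font dictionary is a plain Map, and the class FontSubtypeSketch, the enum FontKind and the methods original() and patched() are hypothetical names introduced only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the subtype dispatch quoted above; this is not PDFBox code. */
final class FontSubtypeSketch {

    /** Hypothetical stand-ins for PDType0Font and PDType1Font. */
    enum FontKind { TYPE0, TYPE1 }

    /** Behaviour labelled "Original" in the issue: trust the declared /Subtype. */
    static FontKind original(Map<String, String> fontDict) {
        String subtype = fontDict.get("Subtype");
        if ("Type0".equals(subtype)) {
            return FontKind.TYPE0;
        }
        if ("Type1".equals(subtype)) {
            return FontKind.TYPE1;
        }
        throw new IllegalArgumentException("subtype not modelled: " + subtype);
    }

    /** Behaviour labelled "Fixed": a Type0 dictionary that has /Encoding is treated as Type1. */
    static FontKind patched(Map<String, String> fontDict) {
        if ("Type0".equals(fontDict.get("Subtype"))) {
            return fontDict.get("Encoding") != null ? FontKind.TYPE1 : FontKind.TYPE0;
        }
        return original(fontDict);
    }

    public static void main(String[] args) {
        // Entries taken from the font dictionary quoted in the issue.
        Map<String, String> dict = new HashMap<>();
        dict.put("Subtype", "Type0");
        dict.put("Encoding", "UniGB-UTF16-H");
        System.out.println("original: " + original(dict));
        System.out.println("patched:  " + patched(dict));
    }
}
```

Because the rule keys only on the presence of /Encoding, any Type0 dictionary carrying that entry takes the Type1 path in this model, not just the TCPDF-generated one. That side effect would need checking against well-formed Type0 fonts before a patch of this shape is applied.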




[jira] [Updated] (PDFBOX-1661) Fix font subtype automatically

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1661:


Component/s: Rendering





[jira] [Updated] (PDFBOX-1594) Add support for AES256 Encryption

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1594:


Component/s: (was: PDModel)

> Add support for AES256 Encryption 
> --
>
> Key: PDFBOX-1594
> URL: https://issues.apache.org/jira/browse/PDFBOX-1594
> Project: PDFBox
>  Issue Type: Improvement
>Reporter: Maruan Sahyoun
> Fix For: 2.0.0
>
> Attachments: pdfbox-1.8.4-aes256.diff
>
>
> Adobe Acrobat 9 added support for AES 256 encryption. Further information is 
> available at 
> http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/adobe_supplement_iso32000.pdf
>  (especially section 3.5.1) or in ISO 32000-2.




[jira] [Updated] (PDFBOX-1614) Digitally sign PDFs without file system access

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1614:


Component/s: (was: Writing)
 (was: PDModel)

> Digitally sign PDFs without file system access
> --
>
> Key: PDFBOX-1614
> URL: https://issues.apache.org/jira/browse/PDFBOX-1614
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Signing
>Affects Versions: 1.8.1
>Reporter: Thierry Boschat
>Assignee: Thomas Chojecki
>
> Hi, I'm using pdfbox-1.8.1 to digitally sign PDFs.
> I found the sample below, which handles it.
> However, the example requires a FileInputStream, and I want to work only 
> with streams (without any file system access). I tried to extend 
> FileInputStream to deal with this, but failed. Any tips for this problem?
> Thanks.
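> File outputDocument = new File("resources/signed" + document.getName());
> FileInputStream fis = new FileInputStream(document);
> FileOutputStream fos = new FileOutputStream(outputDocument);
> // copy the unsigned document to the output file
> // (buffer is not declared in the quoted sample; a byte[] is assumed)
> byte[] buffer = new byte[8 * 1024];
> int c;
> while ((c = fis.read(buffer)) != -1)
> {
>     fos.write(buffer, 0, c);
> }
> fis.close();
> // reopen the copy as the input for the incremental save
> fis = new FileInputStream(outputDocument);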
> // load document
> PDDocument doc = PDDocument.load(document);
> // create signature dictionary
> PDSignature signature = new PDSignature();
> signature.setFilter(PDSignature.FILTER_ADOBE_PPKLITE); // default filter
> // subfilter for basic and PAdES Part 2 signatures
> signature.setSubFilter(PDSignature.SUBFILTER_ADBE_PKCS7_DETACHED);
> signature.setName("signer name");
> signature.setLocation("signer location");
> signature.setReason("reason for signature");
> // the signing date, needed for valid signature
> signature.setSignDate(Calendar.getInstance());
> // register signature dictionary and sign interface
> doc.addSignature(signature, this);
> // write incremental (only for signing purpose)
> doc.saveIncremental(fis, fos);
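
The copy-then-reopen step in the quoted sample has a stream-only equivalent in plain java.io, sketched below. This is only an illustration of the buffering technique: the class InMemoryCopySketch and the method readFully() are hypothetical names, no PDFBox call is made, and whether the signing API accepts a generic InputStream in place of a FileInputStream is exactly what this issue asks for and is not assumed here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Buffers a document in memory so it can be re-read without touching the file system. */
final class InMemoryCopySketch {

    /** Reads the whole stream into a byte array, using the same loop as the quoted sample. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8 * 1024];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bytes standing in for an unsigned PDF received from any source.
        byte[] unsigned = "placeholder for PDF bytes".getBytes(StandardCharsets.US_ASCII);
        InputStream source = new ByteArrayInputStream(unsigned);

        // One pass over the source; afterwards the bytes can be replayed any number of times.
        byte[] copy = readFully(source);
        InputStream replay = new ByteArrayInputStream(copy);

        // The output side also stays in memory instead of using a FileOutputStream.
        ByteArrayOutputStream signedOut = new ByteArrayOutputStream();
        signedOut.write(copy);

        System.out.println(copy.length + " bytes buffered, "
                + replay.available() + " available for re-reading");
    }
}
```

The design trade-off is memory: the whole document is held in a byte array, which is fine for typical PDFs but may not suit very large files.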




[jira] [Closed] (PDFBOX-1566) reduce duplicated code and add caching to pdpagenode

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson closed PDFBOX-1566.
---


> reduce duplicated code and add caching to pdpagenode
> 
>
> Key: PDFBOX-1566
> URL: https://issues.apache.org/jira/browse/PDFBOX-1566
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 1.8.2
>Reporter: Jens Kapitza
> Fix For: 2.0.0
>
> Attachments: pdfbox-pdpage.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>





[jira] [Resolved] (PDFBOX-1566) reduce duplicated code and add caching to pdpagenode

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson resolved PDFBOX-1566.
-

Resolution: Won't Fix

Closing due to the age of this issue as it can no longer be applied against the 
trunk.






[jira] [Updated] (PDFBOX-1562) Thumbnail of PDF is missing image

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-1562:


Component/s: (was: PDModel)
 Rendering

> Thumbnail of PDF is missing image
> -
>
> Key: PDFBOX-1562
> URL: https://issues.apache.org/jira/browse/PDFBOX-1562
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 1.8.0
> Environment: Running Oracle JDK 1.7 U17 on Linux with Fontbox 1.8.0, 
> Jempbox 1.8.0, and PDFBox 1.8.0. The lib directory also contains 
> levigo-jbig2-imageio-1.5.2.jar
>Reporter: George Sexton
> Attachments: Metro Parent May.pdf
>
>
> When rendering a thumbnail of the attached PDF, the image of the boy holding 
> the drumsticks is missing.




[jira] [Closed] (PDFBOX-1545) ReplaceString fails to replace text, however RemoveText or TextExtraction works fine

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson closed PDFBOX-1545.
---


Closing as ReplaceString was removed at some point.

> ReplaceString fails to replace text, however RemoveText or TextExtraction 
> works fine
> 
>
> Key: PDFBOX-1545
> URL: https://issues.apache.org/jira/browse/PDFBOX-1545
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 1.7.1
> Environment: ubuntu 32bit, Java 6
>Reporter: MartinV
>  Labels: patch
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> org.apache.pdfbox.examples.pdmodel.ReplaceString does not replace any strings 
> in this PDF:
> https://docs.google.com/file/d/0B4SxNalgkoJ3VjRDTEN0VER6WGc/edit?usp=sharing
> (anyone with the link can view and download it...)
> As I found while iterating over the "Tj" and "TJ" operators:
>  COSString previous = (COSString)tokens.get( j-1 );
>  String string = previous.getString();
> Those strings are either empty or of length 2 (whitespace only) ... 
> I would expect to get separate groups of words from my PDF.
> I tried this on version 1.7.1 and then downloaded the latest code from SVN 
> (today); both versions showed the same behaviour. Is my PDF special in any 
> way, or which objects should be explored next? I tried another two PDFs 
> downloaded from Google Drive and both had the same issue (maybe Google 
> formats PDFs in a special way?).
> I am surprised that RemoveText works fine on this PDF and that text 
> extraction also gives good results, so there must be a way... Thank you
> PS: I don't mind fixing the bug on my own, but I do not have any significant 
> knowledge of internal PDF structure. Hints welcome.




[jira] [Resolved] (PDFBOX-1545) ReplaceString fails to replace text, however RemoveText or TextExtraction works fine

2014-02-09 Thread John Hewson (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson resolved PDFBOX-1545.
-

Resolution: Unresolved




