[jira] [Commented] (PDFBOX-1918) PDF convert error

2014-03-26 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948970#comment-13948970
 ] 

Tilman Hausherr commented on PDFBOX-1918:
-

There will always be PDFs that are not correct. Your Java application should 
simply report that and say that it won't be able to analyse the file, maybe 
pointing to an FAQ that answers "why didn't we index that PDF that was 
generated by this multi-billion-dollar corporation and is displayed by Adobe 
Viewer?"

The customer will then have to find a way to get a correct PDF. In this case, 
either by learning about the "binary" option in his FTP software (if that was 
the cause), or (if the PDF was really generated this way by Oracle) by 
explaining to an Oracle help desk intern in Kazakhstan the difference 
between Unix newlines and Windows newlines, and then praying that this 
information will make it up seven hierarchy levels and reach a developer who 
will put it on the "todo" list so that it is included in the next major release.

That I was able to fix this PDF is just luck. Most PDFs aren't ASCII-readable 
like that one.
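
A rough sketch of the "simply report that" idea, in case it helps. Everything 
here is hypothetical: PdfIndexAttempt and TextExtractor are made-up names, and 
the extractor stands in for whatever really parses the file (Tika, PDFBox, ...). 
The only point is that a parser failure becomes a message for the user instead 
of a crash of the archiving job.

```java
import java.io.IOException;
import java.nio.file.Path;

/** Hypothetical wrapper: turns a parser failure into a report instead of a crash. */
final class PdfIndexAttempt {

    /** Stand-in for the real text extraction (an assumption, not a PDFBox or Tika type). */
    interface TextExtractor {
        String extract(Path pdf) throws IOException;
    }

    final Path file;
    final String text;          // extracted text, empty if the PDF could not be parsed
    final String failureReason; // null when extraction succeeded

    private PdfIndexAttempt(Path file, String text, String failureReason) {
        this.file = file;
        this.text = text;
        this.failureReason = failureReason;
    }

    boolean indexed() {
        return failureReason == null;
    }

    /** Runs the extractor and records the failure instead of propagating it. */
    static PdfIndexAttempt run(Path pdf, TextExtractor extractor) {
        try {
            return new PdfIndexAttempt(pdf, extractor.extract(pdf), null);
        } catch (IOException | RuntimeException e) {
            String reason = "Could not analyse " + pdf.getFileName() + ": " + e.getMessage();
            return new PdfIndexAttempt(pdf, "", reason);
        }
    }
}
```

The caller can then log failureReason, skip the attachment, and point the user 
to the FAQ.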

> PDF convert error
> -
>
> Key: PDFBOX-1918
> URL: https://issues.apache.org/jira/browse/PDFBOX-1918
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing, Utilities
> Affects Versions: 1.8.4
> Reporter: Jr. John
> Attachments: rpt1390780234888753.pdf, rpt1390780234888753.pdf
>
>
> The current version, 1.8.4, has the same problem:
> D:\Software\pdfbox>java -jar pdfbox-app-1.8.4.jar ConvertColorspace rpt1390780234888753.pdf test.pdf
> Feb 07, 2014 4:59:11 PM org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
> WARNING: Specified stream length 15353 is wrong. Fall back to reading stream until 'endstream'.
> Feb 07, 2014 4:59:11 PM org.apache.pdfbox.pdfparser.BaseParser parseCOSStream
> WARNING: Specified stream length 12156 is wrong. Fall back to reading stream until 'endstream'.
> Feb 07, 2014 4:59:11 PM org.apache.pdfbox.pdfparser.XrefTrailerResolver setStartxref
> WARNING: Did not found XRef object at specified startxref position 83636
> ConvertColorspace failed with the following exception:
> java.io.IOException: Missing closing bracket for hex string. Reached EOS.
>   at org.apache.pdfbox.pdfparser.BaseParser.parseCOSHexString(BaseParser.java:1023)
>   at org.apache.pdfbox.pdfparser.BaseParser.parseCOSString(BaseParser.java:816)
>   at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:259)
>   at org.apache.pdfbox.pdfparser.PDFStreamParser.parse(PDFStreamParser.java:133)
>   at org.apache.pdfbox.ConvertColorspace.replaceColors(ConvertColorspace.java:88)
>   at org.apache.pdfbox.ConvertColorspace.main(ConvertColorspace.java:385)
>   at org.apache.pdfbox.PDFBox.main(PDFBox.java:46)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PDFBOX-1918) PDF convert error

2014-03-26 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948953#comment-13948953
 ] 

Maruan Sahyoun commented on PDFBOX-1918:


As Tilman wrote, the PDF could have been corrupted after it was generated, so 
you should check how the file is transferred to you. What is the process from 
PDF generation until the file reaches you? Does that process change the file 
unintentionally? Could you get a file from the customer directly at the 
source, written straight to disk? Does PDFBox still complain about that file?
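
One way to answer the "does the process change the file" question is to compare 
a checksum taken at the source with one taken of the file as you receive it. 
The following is only a sketch; TransferCheck and its method names are made up 
for illustration, and it assumes you can get at both copies.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: detects whether a file changed between source and destination. */
final class TransferCheck {

    /** Returns the SHA-256 digest of the file as a lowercase hex string. */
    static String sha256(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** True if both copies are byte-for-byte identical (up to hash collisions). */
    static boolean sameContent(Path atSource, Path asReceived) throws IOException {
        return sha256(atSource).equals(sha256(asReceived));
    }
}
```

If the two digests differ, something between generation and your application 
(an FTP client in ASCII mode, a mail gateway, ...) has modified the file.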



[jira] [Comment Edited] (PDFBOX-1918) PDF convert error

2014-03-26 Thread Jr. John (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948933#comment-13948933
 ] 

Jr. John edited comment on PDFBOX-1918 at 3/27/14 6:18 AM:
---

This report comes from Oracle 10gR2 AS Reports Services.
We are developing a system that archives all mail for Exchange Server, and all 
PDF files in the mails are parsed with Tika (via PDFBox). Our Java application 
fails when a mail contains this kind of PDF. We can't ask employees to correct 
their Oracle PDFs.

We hope you can help. Thank you.


was (Author: jrjohn):
This report comes from Oracle 10gR2 AS Reports Services.
We are developing a system that archives all mail for Exchange Server, and all 
PDF files in the mails are parsed with Tika (via PDFBox). Our Java application 
fails when a mail contains this kind of PDF. We can't ask employees to correct 
their Oracle PDFs.

We hope you can help. Thank you.



[jira] [Comment Edited] (PDFBOX-1918) PDF convert error

2014-03-26 Thread Jr. John (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948933#comment-13948933
 ] 

Jr. John edited comment on PDFBOX-1918 at 3/27/14 6:18 AM:
---

This report comes from Oracle 10gR2 AS Reports Services.
We are developing a system that archives all mail for Exchange Server, and all 
PDF files in the mails are parsed with Tika (via PDFBox). Our Java application 
fails when a mail contains this kind of PDF. We can't ask employees to correct 
their Oracle PDFs.

We hope you can help. Thank you.


was (Author: jrjohn):
This report comes from Oracle 10gR2 AS Reports Services.
We are developing a system that archives all mail for Exchange Server, and all 
PDF files in the mails are parsed with Tika (via PDFBox). When a mail contains 
this kind of PDF, our Java system fails. We can't ask employees to correct 
their Oracle PDFs.

We hope you can help. Thank you.



[jira] [Commented] (PDFBOX-1918) PDF convert error

2014-03-26 Thread Jr. John (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948933#comment-13948933
 ] 

Jr. John commented on PDFBOX-1918:
--

This report comes from Oracle 10gR2 AS Reports Services.
We are developing a system that archives all mail for Exchange Server, and all 
PDF files in the mails are parsed with Tika (via PDFBox). When a mail contains 
this kind of PDF, our Java system fails. We can't ask employees to correct 
their Oracle PDFs.

We hope you can help. Thank you.



[jira] [Commented] (PDFBOX-1918) PDF convert error

2014-03-26 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948927#comment-13948927
 ] 

Tilman Hausherr commented on PDFBOX-1918:
-

Good statement by a company that also does PDF conversion: [Buggy PDF Files, 
should we try to fix them?|http://blog.amyuni.com/?p=1627]




[jira] [Updated] (PDFBOX-1918) PDF convert error

2014-03-26 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-1918:


Attachment: rpt1390780234888753.pdf

The whole xref table is incorrect. The likely reason is that the PDF file was 
transferred via FTP in ASCII mode instead of binary mode. I used Notepad++ to 
replace 0D 0A with 0D, and suddenly the xref table matches. I am attaching the 
modified PDF so that you can compare both with a hex editor.
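
For anyone who wants to try the same check and repair without a hex editor, 
here is a minimal sketch. It is not PDFBox code: XrefSanityCheck and its 
methods are hypothetical names, and it only handles a classic xref table 
(where the startxref offset points at the "xref" keyword), not xref streams. 
The replacement byte is a parameter; 0x0D mirrors the replacement described 
above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: checks the startxref offset and undoes a CR LF expansion. */
final class XrefSanityCheck {

    /** Returns the offset announced after the last "startxref" keyword, or -1 if none is found. */
    static long declaredStartxref(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so indexes equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int keyword = text.lastIndexOf("startxref");
        if (keyword < 0) {
            return -1;
        }
        int i = keyword + "startxref".length();
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        int start = i;
        while (i < text.length() && Character.isDigit(text.charAt(i))) {
            i++;
        }
        return i > start ? Long.parseLong(text.substring(start, i)) : -1;
    }

    /** True if the bytes at the declared offset really are the "xref" keyword. */
    static boolean startxrefPointsToXref(byte[] pdf) {
        long offset = declaredStartxref(pdf);
        if (offset < 0 || offset + 4 > pdf.length) {
            return false;
        }
        return new String(pdf, (int) offset, 4, StandardCharsets.ISO_8859_1).equals("xref");
    }

    /** Replaces every two-byte sequence 0D 0A with the single replacement byte. */
    static byte[] collapseCrLf(byte[] data, byte replacement) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length);
        for (int i = 0; i < data.length; i++) {
            if (data[i] == 0x0D && i + 1 < data.length && data[i + 1] == 0x0A) {
                out.write(replacement);
                i++; // skip the 0A that followed the 0D
            } else {
                out.write(data[i]);
            }
        }
        return out.toByteArray();
    }
}
```

Such a blind replacement is only safe on a file like this one that is 
ASCII-readable throughout; in a PDF with binary streams it would also alter 
any 0D 0A pair that is genuine stream data.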



[jira] [Updated] (PDFBOX-1074) TIFFFaxDecoder5 when using PDFImageWriter

2014-03-26 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-1074:
---

Assignee: Tilman Hausherr  (was: Andreas Lehmkühler)

> TIFFFaxDecoder5 when using PDFImageWriter
> -
>
> Key: PDFBOX-1074
> URL: https://issues.apache.org/jira/browse/PDFBOX-1074
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 1.6.0, 1.8.4, 1.8.5
>Reporter: Anton Stremoukhov
>Assignee: Tilman Hausherr
>  Labels: CCITTFaxDecode, ccitt
> Fix For: 1.8.5, 2.0.0
>
> Attachments: 34315.pdf, page_83.pdf, s2130312-100.pdf, 
> s2130312-100.pdf-1.tif, s2130312.pdf
>
>
> I'm getting this when I call PDFImageWriter.writeImage() on a PDF with one 
> page (see attached page_83.pdf):
> Caused by: java.lang.Error: TIFFFaxDecoder5
>   at org.apache.pdfbox.filter.TIFFFaxDecoder.decodeT6(TIFFFaxDecoder.java:1005)
>   at org.apache.pdfbox.filter.CCITTFaxDecodeFilter.decode(CCITTFaxDecodeFilter.java:101)
>   at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:279)
>   at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:221)
>   at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:156)
>   at org.apache.pdfbox.pdmodel.graphics.xobject.PDCcitt.getRGBImage(PDCcitt.java:153)
>   at org.apache.pdfbox.util.operator.pagedrawer.Invoke.process(Invoke.java:78)
>   at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:551)
>   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:274)
>   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
>   at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
>   at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:107)
>   at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:722)
>   at org.apache.pdfbox.util.PDFImageWriter.writeImage(PDFImageWriter.java:135)
>   at org.apache.pdfbox.util.PDFImageWriter.writeImage(PDFImageWriter.java:105)
> If you look at the PDF file I'm using (see attached page_83.pdf) you'll 
> notice it's completely blank, but this is OK: the page was obtained from a 
> source PDF file with 84 pages where the last one is blank (see attached 
> 34315.pdf).
> The source PDF was split into pages (without any errors) via Splitter like 
> so:
> // file, destDir and writeDocument() are the reporter's own (not shown here)
> FileInputStream fis = new FileInputStream(file);
> PDFParser parser = new PDFParser(fis);
> parser.parse();
> COSDocument cosDoc = parser.getDocument();
> PDDocument pdDoc = new PDDocument(cosDoc);
> 
> Splitter splitter = new Splitter();
> List<PDDocument> pages = splitter.split(pdDoc);
> for (int i = 0; i < pages.size(); i++){
>  PDDocument pageDoc = pages.get(i);
>  String fileNameNew = "page_" + i + ".pdf";
>  writeDocument(pageDoc, new File(destDir, fileNameNew).getPath());
>  pageDoc.close();
> }
> fis.close();
> cosDoc.close();
> pdDoc.close();





[jira] [Commented] (PDFBOX-2002) Show deprecation in the build

2014-03-26 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948265#comment-13948265
 ] 

Tilman Hausherr commented on PDFBOX-2002:
-

I modified the parent POM in the trunk in rev. 1581986.

> Show deprecation in the build
> -
>
> Key: PDFBOX-2002
> URL: https://issues.apache.org/jira/browse/PDFBOX-2002
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Tilman Hausherr
>Priority: Minor
> Fix For: 2.0.0
>
>
> According to 
> https://pdfbox.apache.org/ideas.html
> one of the tasks is "Remove all deprecated methods". Therefore, I will modify 
> the parent POM to show the deprecated calls. This will show such calls, but 
> not fail the build. It is a gentle hint to fix these calls. Lets leave this 
> issue open until all is done.
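
To illustrate what "show the deprecated calls" means in practice, here is a 
small made-up example (LegacyApi and LegacyCaller are hypothetical and not 
part of PDFBox). Compiling it with javac -Xlint:deprecation prints a warning 
for the marked call but still produces class files. In a Maven build the 
compiler plugin's showDeprecation parameter has the same effect; whether rev. 
1581986 uses exactly that setting is an assumption on my part.

```java
/** Hypothetical API with one deprecated method and its replacement. */
final class LegacyApi {

    /** @deprecated use {@link #pageCount(int[])} instead. */
    @Deprecated
    static int getNumberOfPages(int[] pages) {
        return pageCount(pages);
    }

    /** The replacement that callers should migrate to. */
    static int pageCount(int[] pages) {
        return pages.length;
    }
}

/** Caller in a separate class, so the compiler reports the deprecated use. */
final class LegacyCaller {

    static int countPages(int[] pages) {
        // javac -Xlint:deprecation warns about this line; the build still succeeds.
        return LegacyApi.getNumberOfPages(pages);
    }
}
```

The caller lives in its own class on purpose: the compiler does not warn about 
deprecated members used inside the class that declares them.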





[jira] [Created] (PDFBOX-2002) Show deprecation in the build

2014-03-26 Thread Tilman Hausherr (JIRA)
Tilman Hausherr created PDFBOX-2002:
---

 Summary: Show deprecation in the build
 Key: PDFBOX-2002
 URL: https://issues.apache.org/jira/browse/PDFBOX-2002
 Project: PDFBox
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
Priority: Minor
 Fix For: 2.0.0


According to 
https://pdfbox.apache.org/ideas.html
one of the tasks is "Remove all deprecated methods". Therefore, I will modify 
the parent POM to show the deprecated calls. This will show such calls, but not 
fail the build. It is a gentle hint to fix these calls. Lets leave this issue 
open until all is done.





[jira] [Commented] (PDFBOX-1994) PDDocument.load(filename.pdf) hangs for pdf files having size

2014-03-26 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948183#comment-13948183
 ] 

Tilman Hausherr commented on PDFBOX-1994:
-

I haven't tested whether it will work on Java 1.4. I have 1.7 here and soon 1.8.

If your car runs on "super" gas, you don't fill the gas tank with "regular" 
gas, do you?

What you could do, if you have an old PC somewhere: install JRE 1.4 and see 
what happens when you run your application.
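
Since the code works locally but not on the server, it may also be worth 
logging which runtime the application actually runs on in each place. A small 
sketch (RuntimeVersionCheck is a made-up name); it reads the standard 
java.version system property and handles both the old "1.x" numbering and the 
newer scheme.

```java
/** Hypothetical helper: reports which Java runtime the application is running on. */
final class RuntimeVersionCheck {

    /** Extracts the feature release from a java.version string such as "1.4.2_19" or "17.0.2". */
    static int featureVersion(String javaVersion) {
        String[] parts = javaVersion.split("[._\\-+]");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]); // legacy scheme: 1.4, 1.7, 1.8
        }
        return first;
    }

    /** Builds a one-line description suitable for a startup log message. */
    static String describeRuntime() {
        String version = System.getProperty("java.version");
        return "Running on Java " + featureVersion(version) + " (java.version=" + version + ")";
    }
}
```

Printing describeRuntime() at startup on both machines shows at once whether 
they use the same Java version.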

> PDDocument.load(filename.pdf) hangs for pdf files having size
> -
>
> Key: PDFBOX-1994
> URL: https://issues.apache.org/jira/browse/PDFBOX-1994
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.4
>Reporter: brijesh
>
> I am using the code below to load my PDF. The file is not zero-sized, has 
> full permissions, and is not corrupt, but I didn't get any error from the 
> code; it just hangs. 
> It works locally, but not on the server
> (I created .jar files and then an .exe, and the .exe is executed on the server).
> Java version used: 1.4
> PDDocument pdf=PDDocument.load("d:\\filename.pdf");
> pdf.print();
> Please tell me why the same code does not work on the server.





[jira] [Commented] (PDFBOX-1086) Error when decoding CCITT compressed data that contains EOLs, fill bits etc.

2014-03-26 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948169#comment-13948169
 ] 

Tilman Hausherr commented on PDFBOX-1086:
-

I fixed two of three decoders re fill bits. (I could fix the third one but 
would prefer to have a test file.) Now there's only PDFBOX-457 left. It could 
be an EOL, but I can neither prove nor disprove that theory. An EOL would make 
no sense in a G4-encoded document, at least according to Wikipedia:
https://en.wikipedia.org/wiki/Group_4_compression

> Error when decoding CCITT compressed data that contains EOLs, fill bits etc.
> 
>
> Key: PDFBOX-1086
> URL: https://issues.apache.org/jira/browse/PDFBOX-1086
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Reporter: Jeremias Maerki
>Assignee: Jeremias Maerki
>  Labels: CCITTFaxDecode, ccitt
>
> The TIFFFaxDecoder class (originally coming from JAI via XML Graphics 
> Commons) does not handle cases like EOLs between lines and in front. But the 
> PDF CCITTFaxDecode filter needs to allow many different variants of the 
> encoding. Apparently, TIFF has a relatively restricted way of encoding CCITT 
> data, so TIFFFaxDecoder was not written to be as flexible as we need it. 
> Ideally, PDFBox should handle anything that gets thrown at it.
> It appears that it would be rather difficult to retrofit TIFFFaxDecoder with 
> the necessary flexibility. So, new decoders for T.4 and T.6 should probably 
> be written.





[jira] [Resolved] (PDFBOX-1708) IndexOutOfBoundsException on convertToImage with an embedded Fax-Image

2014-03-26 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-1708.
-

   Resolution: Fixed
Fix Version/s: 2.0.0
   1.8.5

Fixed in rev 1581928 for the trunk and rev 1581946 for the 1.8 branch. The 
cause was that EncodedByteAlign = true was ignored.

Try to download a snapshot version to see that it works :-)
https://repository.apache.org/snapshots/org/apache/pdfbox/
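
For readers wondering what EncodedByteAlign changes: when the flag is true, 
the encoder pads each encoded line with extra 0 bits so that the next line 
starts on a byte boundary, and a decoder has to skip that padding before it 
reads the next line. The sketch below only shows that skipping step; 
AlignedBitReader is a hypothetical class and not the code that was committed.

```java
/** Hypothetical bit reader showing the byte alignment step between encoded lines. */
final class AlignedBitReader {

    private final byte[] data;
    private int bitPos; // absolute bit index, most significant bit of each byte first

    AlignedBitReader(byte[] data) {
        this.data = data;
    }

    /** Reads one bit, or returns -1 at the end of the data. */
    int readBit() {
        if (bitPos >= data.length * 8) {
            return -1;
        }
        int bit = (data[bitPos / 8] >> (7 - bitPos % 8)) & 1;
        bitPos++;
        return bit;
    }

    /** Skips the padding bits so that the next read starts on a byte boundary. */
    void alignToByte() {
        bitPos = (bitPos + 7) / 8 * 8;
    }
}
```

A decoder that honours the flag would call alignToByte() at each line boundary 
when EncodedByteAlign is true; if the call is left out, the padding bits are 
read as if they were part of the next line's data.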

> IndexOutOfBoundsException on convertToImage with an embedded Fax-Image
> --
>
> Key: PDFBOX-1708
> URL: https://issues.apache.org/jira/browse/PDFBOX-1708
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel, Rendering
>Affects Versions: 1.8.2
>Reporter: Martin Withake
>  Labels: CCITTFaxDecode, ccitt
> Fix For: 1.8.5, 2.0.0
>
> Attachments: IN06119.PDF
>
>
> PDPage.convertToImage brings me this stacktrace:
> java.lang.IndexOutOfBoundsException: offset + length > bit count
>   at org.apache.pdfbox.io.ccitt.PackedBitArray.setBits(PackedBitArray.java:108)
>   at org.apache.pdfbox.io.ccitt.CCITTFaxG31DDecodeInputStream.writeRun(CCITTFaxG31DDecodeInputStream.java:184)
>   at org.apache.pdfbox.io.ccitt.CCITTFaxG31DDecodeInputStream.access$400(CCITTFaxG31DDecodeInputStream.java:29)
>   at org.apache.pdfbox.io.ccitt.CCITTFaxG31DDecodeInputStream$RunLengthTreeNode.execute(CCITTFaxG31DDecodeInputStream.java:375)
>   at org.apache.pdfbox.io.ccitt.CCITTFaxG31DDecodeInputStream.decodeLine(CCITTFaxG31DDecodeInputStream.java:165)
>   at org.apache.pdfbox.io.ccitt.CCITTFaxG31DDecodeInputStream.read(CCITTFaxG31DDecodeInputStream.java:98)
>   at java.io.InputStream.read(InputStream.java:163)
>   at java.io.FilterInputStream.read(FilterInputStream.java:116)
>   at org.apache.pdfbox.io.ccitt.FillOrderChangeInputStream.read(FillOrderChangeInputStream.java:45)
>   at java.io.FilterInputStream.read(FilterInputStream.java:90)
>   at org.apache.pdfbox.io.IOUtils.copy(IOUtils.java:68)
>   at org.apache.pdfbox.filter.CCITTFaxDecodeFilter.decode(CCITTFaxDecodeFilter.java:114)
>   at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:295)
>   at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:237)
>   at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:172)
>   at org.apache.pdfbox.pdmodel.graphics.xobject.PDCcitt.getRGBImage(PDCcitt.java:155)
>   at org.apache.pdfbox.util.operator.pagedrawer.Invoke.process(Invoke.java:83)
>   at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
>   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
>   at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
>   at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>   at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:125)
>   at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:781)
>   at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:712)
>   at de.rekers.ui.table.YDateianlageTable$4.doInBackground(YDateianlageTable.java:740)
>   at de.rekers.ui.table.YDateianlageTable$4.doInBackground(YDateianlageTable.java:1)
>   at javax.swing.SwingWorker$1.call(SwingWorker.java:277)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at javax.swing.SwingWorker.run(SwingWorker.java:316)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> The document is partially rendered. The document is created by our fax 
> software. Acrobat Reader shows the document without an error.
> Thanks in advance!
> Martin





[jira] [Commented] (PDFBOX-2001) Digital Signature information

2014-03-26 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947947#comment-13947947
 ] 

Maruan Sahyoun commented on PDFBOX-2001:


That’s for the field type. It checks whether a field is a signature field [ISO 
32000 Table 220]. That is different information from the signature dictionary, 
which may or may not have Sig as its Type entry, as you correctly found in the 
spec.
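
To make the distinction concrete, here is a sketch that models the two 
dictionaries as plain maps. It does not use PDFBox's COS classes, and 
SignatureChecks is a made-up name. The first check looks at the field type 
(FT) of a form field; the second treats the Type entry of a signature 
dictionary as optional, as the spec says.

```java
import java.util.Map;

/** Hypothetical checks on dictionaries modeled as simple key/value maps. */
final class SignatureChecks {

    /** A form field is a signature field when its FT (field type) entry is Sig. */
    static boolean isSignatureField(Map<String, String> fieldDictionary) {
        return "Sig".equals(fieldDictionary.get("FT"));
    }

    /** Type is optional in a signature dictionary; if present it must be Sig. */
    static boolean hasAcceptableType(Map<String, String> signatureDictionary) {
        String type = signatureDictionary.get("Type");
        return type == null || "Sig".equals(type);
    }
}
```

So a signature dictionary without a Type entry passes the second check, while 
the first check is about a different dictionary altogether.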

> Digital Signature information
> -
>
> Key: PDFBOX-2001
> URL: https://issues.apache.org/jira/browse/PDFBOX-2001
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.3
>Reporter: Nicolas Kaczmarski
> Attachments: D.1_signiert.pdf, acrobatSignatureExample.PNG
>
>
> We have a signed PDF, but its signature is described without the key "Sig".
> As you can see in the standard PDF 32000-1:2008, Table 252 (Entries in a 
> signature dictionary), this key is optional:
> "(Optional) The type of PDF object that this dictionary describes; if 
> present, shall be Sig for a signature dictionary."
> But PDFBox seems to limit its research of signature only if this key "Sig" is 
> present.
> What is your position about that?





[jira] [Commented] (PDFBOX-2001) Digital Signature information

2014-03-26 Thread Nicolas Bouillon (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947938#comment-13947938
 ] 

Nicolas Bouillon commented on PDFBOX-2001:
--

In the following source, at line 447: 
http://svn.apache.org/viewvc/pdfbox/tags/1.8.4/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDocument.java?view=markup#l447

there is a test for the value COSName.SIG.

Isn't that related?

> Digital Signature information
> -
>
> Key: PDFBOX-2001
> URL: https://issues.apache.org/jira/browse/PDFBOX-2001
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.3
>Reporter: Nicolas Kaczmarski
> Attachments: D.1_signiert.pdf, acrobatSignatureExample.PNG
>
>
> We have a signed PDF but signature is described without key "Sig".
> As you can see in the standard PDF 32000-1:2008 - Table 252 - Entries in a 
> signature dictionary, this key is optional :
> "(Optional) The type of PDF object that this dictionary describes; if 
> present, shall be Sig for a signature dictionary. "
> But PDFBox seems to limit its research of signature only if this key "Sig" is 
> present.
> What is your position about that?





[jira] [Commented] (PDFBOX-2001) Digital Signature information

2014-03-26 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947924#comment-13947924
 ] 

Maruan Sahyoun commented on PDFBOX-2001:


PDFBox has an issue parsing the document. If you load it with 
PDDocument.loadNonSeq, you get a number of errors about wrong offset 
information.

PDFBox doesn’t rely on the Sig key being present for the Type information of 
the dictionary. So currently the parsing seems to be the root cause.

As for our position on "But PDFBox seems to limit its research of signature 
only if this key "Sig" is present.": this limitation doesn’t seem to exist.

We will need to look into why the parsing fails.

> Digital Signature information
> -
>
> Key: PDFBOX-2001
> URL: https://issues.apache.org/jira/browse/PDFBOX-2001
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.3
>Reporter: Nicolas Kaczmarski
> Attachments: D.1_signiert.pdf, acrobatSignatureExample.PNG
>
>
> We have a signed PDF but signature is described without key "Sig".
> As you can see in the standard PDF 32000-1:2008 - Table 252 - Entries in a 
> signature dictionary, this key is optional :
> "(Optional) The type of PDF object that this dictionary describes; if 
> present, shall be Sig for a signature dictionary. "
> But PDFBox seems to limit its research of signature only if this key "Sig" is 
> present.
> What is your position about that?





[jira] [Comment Edited] (PDFBOX-1994) PDDocument.load(filename.pdf) hangs for pdf files having size

2014-03-26 Thread brijesh (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947898#comment-13947898
 ] 

brijesh edited comment on PDFBOX-1994 at 3/26/14 1:27 PM:
--

On the server I am using J2RE 1.4.
Locally I have Java 6, but in my project (JDeveloper) the compiler level is 
set to 1.4. That is why I said that I am working with Java 1.4 locally.
I will update the JRE on the server to 5 or 6 and then create the .exe.
I will test and let you know as soon as possible.
Is it certain that PDDocument.load(filename.pdf) does not work with Java 1.4?


was (Author: bpv):
On the server I am using J2RE 1.4.
Locally I have Java 6, but in my project (JDeveloper) the compiler level is 
set to 1.4. That is why I said that I am working with Java 1.4 locally.
I will update the JRE on the server to 5 or 6 and then create the .exe.
I will test and let you know as soon as possible.
Is it certain that PDDocument.load(filename.pdf) does not work with 1.4?

> PDDocument.load(filename.pdf) hangs for pdf files having size
> -
>
> Key: PDFBOX-1994
> URL: https://issues.apache.org/jira/browse/PDFBOX-1994
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.4
>Reporter: brijesh
>
> I am using the code below to load my PDF. The file is not zero-sized, has 
> full permissions and is not corrupt, but I do not get any error from the 
> code; it just hangs. 
> It works locally but not on the server
> (I created .jar files and then an .exe, and the .exe is executed on the server).
> Java version used: 1.4
> PDDocument pdf=PDDocument.load("d:\\filename.pdf");
> pdf.print();
> Please tell me why the same code does not work on the server.
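
Since the report hinges on which Java runtime the server actually uses, a small check like the following could be bundled into the .exe to print it. This is a sketch only: the class name is hypothetical, and the minimum of Java 5 is an assumption that should be checked against the requirements of the PDFBox release in use. It sticks to APIs that already exist in Java 1.4 so that it can run on the old runtime as well.

```java
// Hypothetical helper: reports which Java runtime is executing the program.
final class RuntimeCheck {
    /** Converts a specification version such as "1.4", "1.6" or "17" to its major number. */
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        // Versions before Java 9 are written "1.x"; later ones are written "x".
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        int major = majorVersion(spec);
        System.out.println("Running on Java " + major + " (" + System.getProperty("java.version") + ")");

        // Assumption: Java 5 is the minimum; verify against the PDFBox release notes.
        int assumedMinimum = 5;
        if (major < assumedMinimum) {
            System.out.println("Runtime is older than the assumed minimum of Java " + assumedMinimum);
        }
    }
}
```

Running this on both the local machine and the server shows whether the two environments differ before any PDF is loaded.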





[jira] [Commented] (PDFBOX-1994) PDDocument.load(filename.pdf) hangs for pdf files having size

2014-03-26 Thread brijesh (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947898#comment-13947898
 ] 

brijesh commented on PDFBOX-1994:
-

On the server I am using J2RE 1.4.
Locally I have Java 6, but in my project (JDeveloper) the compiler level is 
set to 1.4. That is why I said that I am working with Java 1.4 locally.
I will update the JRE on the server to 5 or 6 and then create the .exe.
I will test and let you know as soon as possible.
Is it certain that PDDocument.load(filename.pdf) does not work with 1.4?

> PDDocument.load(filename.pdf) hangs for pdf files having size
> -
>
> Key: PDFBOX-1994
> URL: https://issues.apache.org/jira/browse/PDFBOX-1994
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.4
>Reporter: brijesh
>
> I am using the code below to load my PDF. The file is not zero-sized, has 
> full permissions and is not corrupt, but I do not get any error from the 
> code; it just hangs. 
> It works locally but not on the server
> (I created .jar files and then an .exe, and the .exe is executed on the server).
> Java version used: 1.4
> PDDocument pdf=PDDocument.load("d:\\filename.pdf");
> pdf.print();
> Please tell me why the same code does not work on the server.





[jira] [Updated] (PDFBOX-2001) Digital Signature information

2014-03-26 Thread Nicolas Kaczmarski (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Kaczmarski updated PDFBOX-2001:
---

Attachment: acrobatSignatureExample.PNG

> Digital Signature information
> -
>
> Key: PDFBOX-2001
> URL: https://issues.apache.org/jira/browse/PDFBOX-2001
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.3
>Reporter: Nicolas Kaczmarski
> Attachments: D.1_signiert.pdf, acrobatSignatureExample.PNG
>
>
> We have a signed PDF but signature is described without key "Sig".
> As you can see in the standard PDF 32000-1:2008 - Table 252 - Entries in a 
> signature dictionary, this key is optional :
> "(Optional) The type of PDF object that this dictionary describes; if 
> present, shall be Sig for a signature dictionary. "
> But PDFBox seems to limit its research of signature only if this key "Sig" is 
> present.
> What is your position about that?





[jira] [Commented] (PDFBOX-2001) Digital Signature information

2014-03-26 Thread Nicolas Kaczmarski (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947884#comment-13947884
 ] 

Nicolas Kaczmarski commented on PDFBOX-2001:


If you open this document with Acrobat or Foxit Reader, the signature is 
detected correctly (I have attached a screenshot).
This is not the case with PDFBox.


> Digital Signature information
> -
>
> Key: PDFBOX-2001
> URL: https://issues.apache.org/jira/browse/PDFBOX-2001
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.3
>Reporter: Nicolas Kaczmarski
> Attachments: D.1_signiert.pdf
>
>
> We have a signed PDF but signature is described without key "Sig".
> As you can see in the standard PDF 32000-1:2008 - Table 252 - Entries in a 
> signature dictionary, this key is optional :
> "(Optional) The type of PDF object that this dictionary describes; if 
> present, shall be Sig for a signature dictionary. "
> But PDFBox seems to limit its research of signature only if this key "Sig" is 
> present.
> What is your position about that?





[jira] [Commented] (PDFBOX-1234) NPE at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551)

2014-03-26 Thread Damon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947770#comment-13947770
 ] 

Damon Li commented on PDFBOX-1234:
--

I am working with the attached 'fw8bene--dft.pdf', which throws the same error 
when setting a value on a field.

Has anyone solved this issue yet? I have attached a unit test which fails for 
me when trying to set a value for the first text field, whose name is, I 
believe, 'topmostSubform[0].Page1[0].f1_001[0]'.

> NPE at 
> org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551)
> ---
>
> Key: PDFBOX-1234
> URL: https://issues.apache.org/jira/browse/PDFBOX-1234
> Project: PDFBox
>  Issue Type: Bug
>  Components: AcroForm
>Reporter: Christer Palm
> Attachments: 200221.pdf, SetPDFFieldValueTest.java, fw8bene--dft.pdf
>
>
> Using SVN trunk revision 1291094 (2012-02-18)
> I get the following stack trace when calling PDField.setValue() on an 
> AcroForm field in the attached document:
> java.lang.NullPointerException
>   at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551)
>   at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.insertGeneratedAppearance(PDAppearance.java:371)
>   at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.setAppearanceValue(PDAppearance.java:281)
>   at org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.setValue(PDVariableText.java:131)
> The reason seems to be that PDAppearance.getFontAndUpdateResources() returns 
> null, in turn because the font dictionary for the DA of the field 
> ("/Cour 11 Tf 0 g") is not present in the document.
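
The failure mode described in the report can be illustrated without PDFBox. The sketch below extracts the font resource name from a default appearance (DA) string and looks it up in a map standing in for the form's font resources. All class and method names are hypothetical, and this is not how PDAppearance is implemented; it only shows how a missing font could be reported with a descriptive exception instead of surfacing later as a NullPointerException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the map stands in for the font resources of a form.
final class DefaultAppearanceSketch {
    /** Returns the font resource name preceding the Tf operator, or null if there is none. */
    static String fontName(String da) {
        String[] tokens = da.trim().split("\\s+");
        for (int i = 2; i < tokens.length; i++) {
            // Tf takes two operands: the /FontName and the font size.
            if (tokens[i].equals("Tf") && tokens[i - 2].startsWith("/")) {
                return tokens[i - 2].substring(1);
            }
        }
        return null;
    }

    /** Looks up the DA font in the given resources and fails with a readable message if it is absent. */
    static String resolveFont(String da, Map<String, String> fontResources) {
        String name = fontName(da);
        String font = (name == null) ? null : fontResources.get(name);
        if (font == null) {
            throw new IllegalStateException(
                    "Font '" + name + "' from DA \"" + da + "\" is not present in the resources");
        }
        return font;
    }

    public static void main(String[] args) {
        Map<String, String> resources = new HashMap<>();
        resources.put("Helv", "Helvetica"); // hypothetical resources that lack /Cour

        System.out.println(fontName("/Cour 11 Tf 0 g"));
        try {
            resolveFont("/Cour 11 Tf 0 g", resources);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```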





[jira] [Issue Comment Deleted] (PDFBOX-1234) NPE at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551)

2014-03-26 Thread Damon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damon Li updated PDFBOX-1234:
-

Comment: was deleted

(was: A unit test for fw8bene--dft.pdf)

> NPE at 
> org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551)
> ---
>
> Key: PDFBOX-1234
> URL: https://issues.apache.org/jira/browse/PDFBOX-1234
> Project: PDFBox
>  Issue Type: Bug
>  Components: AcroForm
>Reporter: Christer Palm
> Attachments: 200221.pdf, SetPDFFieldValueTest.java, fw8bene--dft.pdf
>
>
> Using SVN trunk revision 1291094 (2012-02-18)
> I get the following stack trace when calling PDField.setValue() on an 
> AcroForm field in the attached document:
> java.lang.NullPointerException
>   at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551)
>   at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.insertGeneratedAppearance(PDAppearance.java:371)
>   at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.setAppearanceValue(PDAppearance.java:281)
>   at org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.setValue(PDVariableText.java:131)
> The reason seems to be that PDAppearance.getFontAndUpdateResources() returns 
> null, in turn because the font dictionary for the DA of the field 
> ("/Cour 11 Tf 0 g") is not present in the document.





[jira] [Updated] (PDFBOX-1234) NPE at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551)

2014-03-26 Thread Damon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damon Li updated PDFBOX-1234:
-

Attachment: SetPDFFieldValueTest.java

A unit test for fw8bene--dft.pdf

> NPE at 
> org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551)
> ---
>
> Key: PDFBOX-1234
> URL: https://issues.apache.org/jira/browse/PDFBOX-1234
> Project: PDFBox
>  Issue Type: Bug
>  Components: AcroForm
>Reporter: Christer Palm
> Attachments: 200221.pdf, SetPDFFieldValueTest.java, fw8bene--dft.pdf
>
>
> Using SVN trunk revision 1291094 (2012-02-18)
> I get the following stack trace when calling PDField.setValue() on an 
> AcroForm field in the attached document:
> java.lang.NullPointerException
>   at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551)
>   at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.insertGeneratedAppearance(PDAppearance.java:371)
>   at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.setAppearanceValue(PDAppearance.java:281)
>   at org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.setValue(PDVariableText.java:131)
> The reason seems to be that PDAppearance.getFontAndUpdateResources() returns 
> null, in turn because the font dictionary for the DA of the field 
> ("/Cour 11 Tf 0 g") is not present in the document.





[jira] [Updated] (PDFBOX-1234) NPE at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551)

2014-03-26 Thread Damon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damon Li updated PDFBOX-1234:
-

Attachment: fw8bene--dft.pdf

> NPE at 
> org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551)
> ---
>
> Key: PDFBOX-1234
> URL: https://issues.apache.org/jira/browse/PDFBOX-1234
> Project: PDFBox
>  Issue Type: Bug
>  Components: AcroForm
>Reporter: Christer Palm
> Attachments: 200221.pdf, fw8bene--dft.pdf
>
>
> Using SVN trunk revision 1291094 (2012-02-18)
> I get the following stack trace when calling PDField.setValue() on an 
> AcroForm field in the attached document:
> java.lang.NullPointerException
>   at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:551)
>   at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.insertGeneratedAppearance(PDAppearance.java:371)
>   at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.setAppearanceValue(PDAppearance.java:281)
>   at org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.setValue(PDVariableText.java:131)
> The reason seems to be that PDAppearance.getFontAndUpdateResources() returns 
> null, in turn because the font dictionary for the DA of the field 
> ("/Cour 11 Tf 0 g") is not present in the document.





[jira] [Commented] (PDFBOX-2001) Digital Signature information

2014-03-26 Thread Thomas Chojecki (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947753#comment-13947753
 ] 

Thomas Chojecki commented on PDFBOX-2001:
-

There is a problem parsing the field dictionary. It should contain two entries, 
but when I fetch the first entry, I get the field dictionary object. It is 
some kind of parsing problem.

The document catalog looks really weird: it contains only direct objects up to 
the fields. It also has a second signature encapsulated inside the perms 
dictionary. 

> Digital Signature information
> -
>
> Key: PDFBOX-2001
> URL: https://issues.apache.org/jira/browse/PDFBOX-2001
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.3
>Reporter: Nicolas Kaczmarski
> Attachments: D.1_signiert.pdf
>
>
> We have a signed PDF but signature is described without key "Sig".
> As you can see in the standard PDF 32000-1:2008 - Table 252 - Entries in a 
> signature dictionary, this key is optional :
> "(Optional) The type of PDF object that this dictionary describes; if 
> present, shall be Sig for a signature dictionary. "
> But PDFBox seems to limit its research of signature only if this key "Sig" is 
> present.
> What is your position about that?





[jira] [Commented] (PDFBOX-2000) White page when converting first page to image

2014-03-26 Thread Hong-Thai Nguyen (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947708#comment-13947708
 ] 

Hong-Thai Nguyen commented on PDFBOX-2000:
--

Thanks,
I've noticed that PDFBox works fine with the same file on Linux.

> White page when converting first page to image
> --
>
> Key: PDFBOX-2000
> URL: https://issues.apache.org/jira/browse/PDFBOX-2000
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 1.8.4
> Environment: windows
>Reporter: Hong-Thai Nguyen
> Fix For: 2.0.0
>
> Attachments: wrongpdf.pdf
>
>
> When converting the first page of the attached PDF to an image with this code:
> {code}
> private static BufferedImage computeImage(PDDocument document) throws IOException {
>     int imageType = BufferedImage.TYPE_INT_RGB;
>     int resolution;
>     try {
>         resolution = Toolkit.getDefaultToolkit().getScreenResolution();
>     } catch (HeadlessException e) {
>         resolution = 96;
>     }
>     PDPage page = (PDPage) document.getDocumentCatalog().getAllPages().get(0);
>     try {
>         BufferedImage image = page.convertToImage(imageType, resolution);
>         return image;
>     } finally {
>         page = null;
>     }
> }
> {code}
> the returned image is that of a white page.





[jira] [Updated] (PDFBOX-2001) Digital Signature information

2014-03-26 Thread Nicolas Kaczmarski (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Kaczmarski updated PDFBOX-2001:
---

Attachment: D.1_signiert.pdf

File with signature not found

> Digital Signature information
> -
>
> Key: PDFBOX-2001
> URL: https://issues.apache.org/jira/browse/PDFBOX-2001
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.3
>Reporter: Nicolas Kaczmarski
> Attachments: D.1_signiert.pdf
>
>
> We have a signed PDF but signature is described without key "Sig".
> As you can see in the standard PDF 32000-1:2008 - Table 252 - Entries in a 
> signature dictionary, this key is optional :
> "(Optional) The type of PDF object that this dictionary describes; if 
> present, shall be Sig for a signature dictionary. "
> But PDFBox seems to limit its research of signature only if this key "Sig" is 
> present.
> What is your position about that?





[jira] [Created] (PDFBOX-2001) Digital Signature information

2014-03-26 Thread Nicolas Kaczmarski (JIRA)
Nicolas Kaczmarski created PDFBOX-2001:
--

 Summary: Digital Signature information
 Key: PDFBOX-2001
 URL: https://issues.apache.org/jira/browse/PDFBOX-2001
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 1.8.3
Reporter: Nicolas Kaczmarski


We have a signed PDF but signature is described without key "Sig".
As you can see in the standard PDF 32000-1:2008 - Table 252 - Entries in a 
signature dictionary, this key is optional :
"(Optional) The type of PDF object that this dictionary describes; if present, 
shall be Sig for a signature dictionary. "

But PDFBox seems to limit its research of signature only if this key "Sig" is 
present.

What is your position about that?






[jira] [Updated] (PDFBOX-1512) TextPositionComparator is not compatible with Java 7

2014-03-26 Thread Maruan Sahyoun (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maruan Sahyoun updated PDFBOX-1512:
---

Attachment: Topo.pdf
Topo.txt
TopoOverlap.pdf
TopoOverlap.txt
TopoContained.pdf
TopoContained.txt

A series of sample files modeled after the chart in 
http://en.wikipedia.org/wiki/Topological_sorting, together with the text 
extraction done by Adobe Reader.
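
For readers unfamiliar with the term, topological sorting orders items so that each one comes after everything that must precede it. Below is a minimal sketch using Kahn's algorithm on a hypothetical "comes before" relation between three text fragments named A, B and C. It is not PDFBox code and not a proposed fix, only an illustration of the kind of ordering the sample files exercise.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of Kahn's algorithm on string-labelled nodes.
final class TopoSortSketch {
    /** The edges map links each node to the nodes that must come after it. */
    static List<String> sort(Map<String, List<String>> edges) {
        // Count, for every node, how many predecessors are still unplaced.
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : edges.entrySet()) {
            inDegree.putIfAbsent(e.getKey(), 0);
            for (String to : e.getValue()) {
                inDegree.merge(to, 1, Integer::sum);
            }
        }
        // Nodes without predecessors can be emitted right away.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.remove();
            order.add(node);
            for (String to : edges.getOrDefault(node, Collections.<String>emptyList())) {
                int remaining = inDegree.get(to) - 1;
                inDegree.put(to, remaining);
                if (remaining == 0) {
                    ready.add(to);
                }
            }
        }
        if (order.size() != inDegree.size()) {
            throw new IllegalArgumentException("The relation contains a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = new LinkedHashMap<>();
        edges.put("C", Arrays.asList("A")); // hypothetical: C comes before A
        edges.put("A", Arrays.asList("B")); // hypothetical: A comes before B
        System.out.println(sort(edges)); // prints [C, A, B]
    }
}
```

Unlike a comparator-based sort, this approach rejects a cyclic relation explicitly instead of silently producing an arbitrary order.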

> TextPositionComparator is not compatible with Java 7
> 
>
> Key: PDFBOX-1512
> URL: https://issues.apache.org/jira/browse/PDFBOX-1512
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 1.7.1
> Environment: Java 7
>Reporter: Benjamin Papez
>Assignee: Andreas Lehmkühler
> Attachments: FOP-2252.pdf, TextPositionComparator.java, Topo.pdf, 
> Topo.txt, TopoContained.pdf, TopoContained.txt, TopoOverlap.pdf, 
> TopoOverlap.txt, WFI_PDFParser_TextPostionComparator.txt, 
> illustration-of-inconsistent-sorting.png, immo-kurier_arsenal_93x62.pdf
>
>
> The TextPositionComparator causes the following exception when running on Java 7: 
> Unexpected RuntimeException from 
> org.apache.tika.parser.ParserDecorator$1@9007fa2 Original cause: Comparison 
> method violates its general contract!
> I think the problem is with this check:
> if ( yDifference < .1 ||
> (pos2YBottom >= pos1YTop && pos2YBottom <= pos1YBottom) ||
> (pos1YBottom >= pos2YTop && pos1YBottom <= pos2YBottom))
> as it violates the contract requirement:
> The implementor must also ensure that the relation is transitive: 
> ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0.
> Finally, the implementor must ensure that compare(x, y)==0 implies that 
> sgn(compare(x, z))==sgn(compare(y, z)) for all z.
> Java 7 now is strict and throws exceptions when the contract is violated.
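
The transitivity problem can be reproduced in isolation. The following sketch uses a simplified comparator built around the same overlap check as the quoted snippet: boxes on the "same line" are ordered by x, all others by y. The Box class, its fields and the sample coordinates are hypothetical, and this is not the actual TextPositionComparator. Box A overlaps B and B overlaps C, but A and C do not overlap, so compare(A, B) > 0 and compare(B, C) > 0 while compare(A, C) < 0.

```java
import java.util.Comparator;

// Hypothetical reproduction of a non-transitive comparator.
final class ContractViolationDemo {
    /** Stand-in for a text position: x offset plus vertical extent (y grows downward). */
    static final class Box {
        final String name;
        final float x;
        final float yTop;
        final float yBottom;

        Box(String name, float x, float yTop, float yBottom) {
            this.name = name;
            this.x = x;
            this.yTop = yTop;
            this.yBottom = yBottom;
        }
    }

    /** Same shape as the quoted check: bottoms nearly equal, or one bottom inside the other box. */
    static boolean sameLine(Box p1, Box p2) {
        float yDifference = Math.abs(p1.yBottom - p2.yBottom);
        return yDifference < .1
                || (p2.yBottom >= p1.yTop && p2.yBottom <= p1.yBottom)
                || (p1.yBottom >= p2.yTop && p1.yBottom <= p2.yBottom);
    }

    /** Orders by x on the same line, otherwise by vertical position. */
    static final Comparator<Box> COMPARATOR = new Comparator<Box>() {
        @Override
        public int compare(Box p1, Box p2) {
            return sameLine(p1, p2)
                    ? Float.compare(p1.x, p2.x)
                    : Float.compare(p1.yBottom, p2.yBottom);
        }
    };

    /** True when x > y and y > z hold but x > z does not. */
    static boolean violatesTransitivity(Box x, Box y, Box z) {
        return COMPARATOR.compare(x, y) > 0
                && COMPARATOR.compare(y, z) > 0
                && COMPARATOR.compare(x, z) <= 0;
    }

    static boolean demoViolates() {
        Box a = new Box("A", 30, 0, 10);
        Box b = new Box("B", 20, 5, 15);  // overlaps A vertically
        Box c = new Box("C", 10, 12, 22); // overlaps B but not A
        return violatesTransitivity(a, b, c);
    }

    public static void main(String[] args) {
        System.out.println("transitivity violated: " + demoViolates()); // prints "transitivity violated: true"
    }
}
```

Sorting just these three boxes will not necessarily throw; the sort in Java 7 notices such inconsistencies only in some situations, typically on larger inputs.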





[jira] [Comment Edited] (PDFBOX-1512) TextPositionComparator is not compatible with Java 7

2014-03-26 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947608#comment-13947608
 ] 

Maruan Sahyoun edited comment on PDFBOX-1512 at 3/26/14 7:32 AM:
-

I’d think we can find a sorting algorithm that handles such cases. 
Before that, what sorting result would we expect for the drawing Hannes 
provided? Should we inspect the results of other tools such as Adobe Reader 
and replicate their behavior?

[Update] Adobe Reader would extract the sample as C, A, B.

I’m willing to look into solving the issue but would like some input on the 
expected end result first.

Maruan


was (Author: msahyoun):
I’d think we can find a sorting algorithm that handles such cases. 
Before that, what sorting result would we expect for the drawing Hannes 
provided? Should we inspect the results of other tools such as Adobe Reader 
and replicate their behavior?

I’m willing to look into solving the issue but would like some input on the 
expected end result first.

Maruan

> TextPositionComparator is not compatible with Java 7
> 
>
> Key: PDFBOX-1512
> URL: https://issues.apache.org/jira/browse/PDFBOX-1512
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 1.7.1
> Environment: Java 7
>Reporter: Benjamin Papez
>Assignee: Andreas Lehmkühler
> Attachments: FOP-2252.pdf, TextPositionComparator.java, 
> WFI_PDFParser_TextPostionComparator.txt, 
> illustration-of-inconsistent-sorting.png, immo-kurier_arsenal_93x62.pdf
>
>
> The TextPositionComparator causes the following exception when running on Java 7: 
> Unexpected RuntimeException from 
> org.apache.tika.parser.ParserDecorator$1@9007fa2 Original cause: Comparison 
> method violates its general contract!
> I think the problem is with this check:
> if ( yDifference < .1 ||
> (pos2YBottom >= pos1YTop && pos2YBottom <= pos1YBottom) ||
> (pos1YBottom >= pos2YTop && pos1YBottom <= pos2YBottom))
> as it violates the contract requirement:
> The implementor must also ensure that the relation is transitive: 
> ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0.
> Finally, the implementor must ensure that compare(x, y)==0 implies that 
> sgn(compare(x, z))==sgn(compare(y, z)) for all z.
> Java 7 now is strict and throws exceptions when the contract is violated.





[jira] [Commented] (PDFBOX-1512) TextPositionComparator is not compatible with Java 7

2014-03-26 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947608#comment-13947608
 ] 

Maruan Sahyoun commented on PDFBOX-1512:


I’d think we can find a sorting algorithm that handles such cases. 
Before that, what sorting result would we expect for the drawing Hannes 
provided? Should we inspect the results of other tools such as Adobe Reader 
and replicate their behavior?

I’m willing to look into solving the issue but would like some input on the 
expected end result first.

Maruan

> TextPositionComparator is not compatible with Java 7
> 
>
> Key: PDFBOX-1512
> URL: https://issues.apache.org/jira/browse/PDFBOX-1512
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 1.7.1
> Environment: Java 7
>Reporter: Benjamin Papez
>Assignee: Andreas Lehmkühler
> Attachments: FOP-2252.pdf, TextPositionComparator.java, 
> WFI_PDFParser_TextPostionComparator.txt, 
> illustration-of-inconsistent-sorting.png, immo-kurier_arsenal_93x62.pdf
>
>
> The TextPositionComparator causes the following exception when running on Java 7: 
> Unexpected RuntimeException from 
> org.apache.tika.parser.ParserDecorator$1@9007fa2 Original cause: Comparison 
> method violates its general contract!
> I think the problem is with this check:
> if ( yDifference < .1 ||
> (pos2YBottom >= pos1YTop && pos2YBottom <= pos1YBottom) ||
> (pos1YBottom >= pos2YTop && pos1YBottom <= pos2YBottom))
> as it violates the contract requirement:
> The implementor must also ensure that the relation is transitive: 
> ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0.
> Finally, the implementor must ensure that compare(x, y)==0 implies that 
> sgn(compare(x, z))==sgn(compare(y, z)) for all z.
> Java 7 now is strict and throws exceptions when the contract is violated.


