[
https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612945#comment-13612945
]
Wolfgang Kronberg commented on PDFBOX-940:
------------------------------------------
I still see this issue with 1.8.0 and 1.9.0-SNAPSHOT. In my case, the filename
consists of binary rubbish, plus '-UCS2'.
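One way to make reports like this easier to compare is to log the garbled name with its non-printable characters escaped. Below is a minimal, self-contained Java sketch. It assumes the name is available as a String. CMapNameDiagnostics and its methods are hypothetical helpers and are not part of PDFBox.

```java
public class CMapNameDiagnostics {

    /** True if every character of the name is printable ASCII (0x20..0x7E). */
    public static boolean isPrintableAscii(String name) {
        return name.chars().allMatch(c -> c >= 0x20 && c <= 0x7E);
    }

    /**
     * Returns a copy of the name in which every character outside printable
     * ASCII is replaced by a "<U+XXXX>" marker, so a garbled CMap resource
     * name can be logged and compared readably.
     */
    public static String escapeNonPrintable(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c >= 0x20 && c <= 0x7E) {
                sb.append(c);
            } else {
                sb.append(String.format("<U+%04X>", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Built from char casts to avoid Java's source-level unicode escapes.
        String garbled = "" + (char) 0x01 + (char) 0xFE + "AB-UCS2";
        System.out.println(isPrintableAscii(garbled));   // false
        System.out.println(escapeNonPrintable(garbled)); // <U+0001><U+00FE>AB-UCS2
    }
}
```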
Looking at the code of PDCIDFont.determineEncoding(), it seems to me that the
error message is misleading:
    cmap = parseCmap( resourceRootCMAP,
        ResourceLoader.loadResource( resourceName ));
    if( cmap == null)
    {
        log.error("Error: Could not parse predefined CMAP file for '"
            + cidSystemInfo + "'" );
    }
Obviously, the message is so harsh because parseCmap() must never fail on a
predefined file (one included with PDFBox); if it did, that would be a bug in
PDFBox.
Usually, however, the reason for this message is not a parse failure but
simply that there is no predefined file for the given resource name.
In my opinion, such a case should not be treated more harshly than the case
where getCIDSystemInfo() returns null in the first place.
PDCIDFont.determineEncoding() handles this case by silently calling
super.determineEncoding(), which usually completes without any errors. Thus, in
my opinion, the code snippet above should be changed to:
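    InputStream resIn = ResourceLoader.loadResource( resourceName );
    if (resIn != null) {
        // A predefined CMap exists, so failing to parse it is a real error.
        cmap = parseCmap( resourceRootCMAP, resIn );
        if( cmap == null)
        {
            log.error("Error: Could not parse predefined CMAP file for '"
                + cidSystemInfo + "'" );
        }
    } else {
        // No predefined CMap for this name: fall back silently, as in the
        // case where getCIDSystemInfo() returns null.
        super.determineEncoding();
    }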
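The distinction proposed above, between a missing resource and a resource that fails to parse, can be sketched in self-contained Java that does not depend on PDFBox. All names here are hypothetical. parse() is a stand-in for a real CMap parser. The sketch relies only on ClassLoader.getResourceAsStream returning null for a missing resource. Whether PDFBox's ResourceLoader.loadResource reports a missing resource the same way is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class CMapLookupSketch {

    /** The three cases the comment wants handled differently. */
    public enum Outcome { PARSED, NOT_BUNDLED, PARSE_FAILED }

    /**
     * Hypothetical stand-in for a real CMap parser (not PDFBox's API):
     * it reads the whole stream and treats empty input as unparseable.
     */
    static Optional<String> parse(InputStream in) throws IOException {
        byte[] data = in.readAllBytes(); // Java 9+
        if (data.length == 0) {
            return Optional.empty();
        }
        return Optional.of(new String(data, StandardCharsets.ISO_8859_1));
    }

    /** Looks up a bundled CMap by name and classifies the result. */
    public static Outcome lookup(String resourceName) {
        // getResourceAsStream returns null if no resource with this name is on
        // the classpath: that means "not bundled", not "parse failed".
        InputStream in = CMapLookupSketch.class.getClassLoader()
                .getResourceAsStream(resourceName);
        if (in == null) {
            return Outcome.NOT_BUNDLED; // caller falls back to the generic encoding
        }
        // Only a resource that exists but cannot be parsed is a genuine error.
        try (InputStream res = in) {
            return parse(res).isPresent() ? Outcome.PARSED : Outcome.PARSE_FAILED;
        } catch (IOException e) {
            return Outcome.PARSE_FAILED;
        }
    }

    /** Log message per outcome: only PARSE_FAILED deserves error severity. */
    public static String describe(Outcome outcome, String resourceName) {
        switch (outcome) {
            case PARSED:
                return "INFO: using predefined CMap '" + resourceName + "'";
            case NOT_BUNDLED:
                return "DEBUG: no predefined CMap '" + resourceName
                        + "', falling back to the generic encoding";
            default:
                return "ERROR: could not parse predefined CMap '" + resourceName + "'";
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "no/such/cmap/XYZ-UCS2";
        System.out.println(describe(lookup(name), name));
    }
}
```

Run it with a resource name as the argument. With no argument, it looks up a name that does not exist and prints the DEBUG fallback message instead of an error.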
Anyway, the binary rubbish I observe probably points to some other bug, which I
have not been able to pin down. I have loads of PDF documents exhibiting this
bug, all of them unfortunately confidential. If any team member is interested,
please email me and I can provide some examples.
> [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for
> 'PDFXC-Indentity0-0'
> -------------------------------------------------------------------------------------------
>
> Key: PDFBOX-940
> URL: https://issues.apache.org/jira/browse/PDFBOX-940
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 1.4.0
> Environment: Tomcat 6.0.18, windows server 2003, pdfbox-1.4.0.jar
> Reporter: krishna
> Attachments: gen_preview1.png, oob_pdf.pdf, pdf fonts1.JPG, pdf
> fonts2.JPG, pdf fonts.JPG, pdf properties1.JPG, pdf properties2.JPG, pdf
> properties3.JPG
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> Hi,
> when I try to upload a PDF document, the following error is thrown in
> Tomcat. I am using pdfbox-1.4.0.jar.
> 17:29:33,465 ERROR [pdmodel.font.PDFont] Error: Could not parse predefined
> CMAP file for 'PDFXC-Indentity0-0'
> Please help me find a solution.