[
https://issues.apache.org/jira/browse/PDFBOX-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519225#comment-16519225
]
Artur Jablonski commented on PDFBOX-4247:
-----------------------------------------
Hmm... OK, I don't think I follow your comment, most likely because I
have only a very vague idea of PDF format internals.
So what you're saying is that the reason I don't get anything from the
`PDFTextStripper` class has nothing to do with permissions, but with the way
the text is represented in the file: not as a collection of glyphs, but
as some sort of vector graphics.
If that's the case, is there an accurate, programmatic way in PDFBox to detect
this 'vector' text and then run OCR on it?
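
One rough heuristic, sketched here under assumptions, is to extract text page by page and treat any page that yields no non-whitespace characters as an OCR candidate. The class and method names are hypothetical and not part of PDFBox; in PDFBox 2.0.x the per-page strings could be obtained from `PDFTextStripper` by calling `setStartPage` and `setEndPage` with the same page number before `getText`. A blank result only shows that the stripper produced no text; it cannot tell outlined (vector) text apart from a scanned image or a genuinely empty page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of PDFBox. */
final class OcrCandidates {

    /**
     * Returns the 1-based numbers of pages whose extracted text is blank.
     * pageTexts is assumed to hold one extracted string per page, in order.
     */
    static List<Integer> blankPages(List<String> pageTexts) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            String text = pageTexts.get(i);
            // Null or whitespace-only text means nothing usable was extracted.
            if (text == null || text.trim().isEmpty()) {
                result.add(i + 1);
            }
        }
        return result;
    }
}
```

Pages reported by `blankPages` could then be rendered to images and passed to an OCR engine of choice.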
> Access permissions read by pdfbox are wrong.
> --------------------------------------------
>
> Key: PDFBOX-4247
> URL: https://issues.apache.org/jira/browse/PDFBOX-4247
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.8, 2.0.9
> Reporter: Artur Jablonski
> Priority: Major
> Attachments: PDFBOX-4247.pdf
>
>
> For a PDF whose permissions to extract content and assemble the document are
> shown as not granted in Acrobat Reader, PDFBox returns {{true}} for both
> {{PDDocument.getCurrentAccessPermission().canExtractContent()}} and
> {{PDDocument.getCurrentAccessPermission().canAssembleDocument()}}.
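
For context, with the PDF standard security handler the user permissions are stored as an integer `/P` in the encryption dictionary. Bit 5 covers copying or extracting content and bit 11 covers assembling the document, counting from bit 1 as the lowest-order bit. The sketch below decodes those two flags from a raw `/P` value; the class `PermissionBits` is hypothetical and is not PDFBox's own implementation.

```java
/** Hypothetical decoder for the /P permission integer; not PDFBox code. */
final class PermissionBits {
    // Bit positions as numbered in the PDF specification (1 = lowest-order bit).
    static final int EXTRACT_BIT = 5;
    static final int ASSEMBLE_BIT = 11;

    /** True if the given 1-based bit is set in the permission integer. */
    static boolean isSet(int p, int bit) {
        return (p & (1 << (bit - 1))) != 0;
    }

    static boolean canExtractContent(int p) {
        return isSet(p, EXTRACT_BIT);
    }

    static boolean canAssembleDocument(int p) {
        return isSet(p, ASSEMBLE_BIT);
    }
}
```

For example, a `/P` value of `-3904` has both bits cleared, so both methods return `false`, while `-1` has every bit set and both return `true`.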