[jira] [Commented] (PDFBOX-4247) Access permissions read by pdfbox are wrong.

Artur Jablonski (JIRA) Thu, 21 Jun 2018 03:56:07 -0700


    [ 
https://issues.apache.org/jira/browse/PDFBOX-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519225#comment-16519225
 ]


Artur Jablonski commented on PDFBOX-4247:
-----------------------------------------

Hmmm... OK, I don't think I follow your comment, most likely because I have 
only a vague idea of PDF format internals. 

So what you're saying is that the reason I don't get anything using the 
`PDFTextStripper` class has nothing to do with permissions, but with the way 
the text is represented in the file, which is not a collection of glyphs, but 
some sort of vector graphics format.

If that's the case, is there any accurate, programmatic way via PDFBox to 
detect this 'vector' text and then run OCR on it?

> Access permissions read by pdfbox are wrong.
> --------------------------------------------
>
>                 Key: PDFBOX-4247
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4247
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.8, 2.0.9
>            Reporter: Artur Jablonski
>            Priority: Major
>         Attachments: PDFBOX-4247.pdf
>
>
> For a PDF that Acrobat Reader reports as not granting permission to extract 
> content or assemble the document, PDFBox returns {{true}} for both 
> {{PDDocument.getCurrentAccessPermission().canExtractContent()}} and 
> {{PDDocument.getCurrentAccessPermission().canAssembleDocument()}}.
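
For context on what the two flags above stand for: in the PDF standard security handler, user access permissions are stored as bit flags in an integer (the /P entry of the encryption dictionary), with bit 5 covering content extraction and bit 11 covering document assembly (bit positions are 1-based). The sketch below decodes those two bits in plain Java. It is only an illustration, not PDFBox's implementation: the class name `PermissionBits`, its methods, and the sample values are hypothetical and are not taken from the attached file.

```java
// Hypothetical helper, not part of PDFBox: decodes two of the user access
// permission bits defined for the PDF standard security handler.
class PermissionBits {
    // Bit positions are 1-based, as in the PDF specification.
    static final int EXTRACT_BIT = 5;    // copy or otherwise extract text and graphics
    static final int ASSEMBLE_BIT = 11;  // insert, rotate, or delete pages

    // Returns true if the given 1-based bit is set in the permission integer.
    static boolean isSet(int permissions, int bit) {
        return (permissions & (1 << (bit - 1))) != 0;
    }

    static boolean canExtractContent(int permissions) {
        return isSet(permissions, EXTRACT_BIT);
    }

    static boolean canAssembleDocument(int permissions) {
        return isSet(permissions, ASSEMBLE_BIT);
    }

    public static void main(String[] args) {
        // Illustrative values only (assumptions, not read from any real file):
        int everythingGranted = -4;   // all permission bits set, bits 1-2 clear
        int restricted = -4 & ~((1 << (EXTRACT_BIT - 1)) | (1 << (ASSEMBLE_BIT - 1)));

        System.out.println("granted:    extract=" + canExtractContent(everythingGranted)
                + " assemble=" + canAssembleDocument(everythingGranted));
        System.out.println("restricted: extract=" + canExtractContent(restricted)
                + " assemble=" + canAssembleDocument(restricted));
    }
}
```

A file where Acrobat Reader shows both permissions as denied would be expected to carry a value with both of these bits cleared, so comparing the raw integer against what the library reports is one way to narrow down where a mismatch comes from.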



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
