Tilman Hausherr created TIKA-1489:
-------------------------------------
Summary: PDF Text extraction without permission
Key: TIKA-1489
URL: https://issues.apache.org/jira/browse/TIKA-1489
Project: Tika
Issue Type: Bug
Affects Versions: 1.7
Reporter: Tilman Hausherr
In TIKA-1442, text extraction works for files such as 717226.pdf even though
they do not grant text extraction permission. Permissions in PDF files are
enforced only by the consuming application (here, PDFBox); the text itself is
not stored separately in encrypted form.
The PDFBox ExtractText command-line tool does throw an exception for such files.
So I wonder why Tika is able to extract the text: either Tika itself or the
PDFBox call it uses bypasses the permission check.
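To illustrate why a bypass is possible at all, here is a minimal sketch. It is
not Tika or PDFBox code: the class PermissionSketch and its methods are
hypothetical, and the only fact taken from outside this report is that the PDF
specification assigns bit 5 of the permission flags (the /P entry) to copying
or extracting text. The sketch assumes the content has already been decrypted,
so the permission flag is the only thing standing between the caller and the
text.

```java
import java.io.IOException;

// Hypothetical illustration only; these names do not exist in PDFBox or Tika.
class PermissionSketch {
    // Bit 5 (counting from 1) of the /P permission flags: copy or extract text.
    static final int EXTRACT_BIT = 1 << 4;

    // True if the permission flags allow text extraction.
    static boolean canExtractContent(int permissionFlags) {
        return (permissionFlags & EXTRACT_BIT) != 0;
    }

    // Returns the already-decrypted text. The permission flag is consulted
    // only when the caller asks for enforcement; nothing else protects the text.
    static String extractText(String decryptedText, int permissionFlags,
                              boolean enforce) throws IOException {
        if (enforce && !canExtractContent(permissionFlags)) {
            throw new IOException("You do not have permission to extract text");
        }
        return decryptedText;
    }
}
```

With enforce set to true the call fails for a document whose flags lack the
extract bit, much as ExtractText does. With enforce set to false the same text
comes back unchanged, which is the behaviour observed here if the check is
skipped somewhere along the Tika code path.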