[ https://issues.apache.org/jira/browse/TIKA-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823239#comment-16823239 ]
Tim Allison edited comment on TIKA-2858 at 4/22/19 5:01 PM:
------------------------------------------------------------

Hmmmm....I can't get PDFBox to open the file directly with the following...can you see what I'm doing wrong?
{noformat}
final String password = " ! < > \" \\ € œ ¤ ¼ ½ \uD841\uDF0E \uD867\uDD98 \uD83D\uDE00 ";
ParseContext parseContext = new ParseContext();
parseContext.set(PasswordProvider.class, new PasswordProvider() {
    @Override
    public String getPassword(Metadata metadata) {
        return password;
    }
});
PDFParser p = new PDFParser();
BodyContentHandler handler = new BodyContentHandler();
try (InputStream is = this.getClass().getResourceAsStream("/testUnicodePassword.pdf")) {
    p.parse(is, handler, new Metadata(), parseContext);
}
{noformat}
Or if I set this literally:
{noformat}
final String password = " ! < > \" \\ € œ ¤ ¼ ½ 𠜎 𩶘 😀 ";
{noformat}

was (Author: talli...@mitre.org):
Hmmmm....I can't get PDFBox to open the file directly with the following...can you see what I'm doing wrong?
{noformat}
final String password = " ! < > \" \\ € œ ¤ ¼ ½ \uD841\uDF0E \uD867\uDD98 \uD83D\uDE00 ";
ParseContext parseContext = new ParseContext();
parseContext.set(PasswordProvider.class, new PasswordProvider() {
    @Override
    public String getPassword(Metadata metadata) {
        return password;
    }
});
PDFParser p = new PDFParser();
BodyContentHandler handler = new BodyContentHandler();
try (InputStream is = this.getClass().getResourceAsStream("/testUnicodePassword.pdf")) {
    p.parse(is, handler, new Metadata(), parseContext);
}
{noformat}

> JAXRS server: allow passwords with special chars (MIME encoded words)
> ---------------------------------------------------------------------
>
>                 Key: TIKA-2858
>                 URL: https://issues.apache.org/jira/browse/TIKA-2858
>             Project: Tika
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 1.20
>            Reporter: Ross Johnson
>            Priority: Minor
>         Attachments: protected - 4 space password.pdf, protected - Unicode password.pdf
>
>
> Tika Server allows passing a document password in a special {{Password}}
> request header; however, I don't believe this header allows for passwords
> with non-US-ASCII characters, or for passwords with leading or trailing
> spaces.
> One potential solution would be to allow MIME encoded-word values (RFC 2047)
> in the password header so that one could specify any password with only
> US-ASCII. This extra decoding could be enabled / disabled with some other
> flag or header value, in order to avoid any breaking changes for clients that
> are not encoding this header (e.g. if the password happens to literally be
> "{{=?UTF-8?B??=}}").
> Attached are 2 sample PDF files that I'm unable to use with Tika Server due
> to their passwords. These passwords are a bit contrived, but I have come
> across this issue with real passwords. I've included the passwords in code
> blocks to prevent the issue editor / viewer from collapsing multiple spaces
> into one.
> The file named "{{protected - 4 space password.pdf}}" has a password of 4
> literal spaces:
> {code:java}
> // Password is on line below (4 literal spaces)
>
> {code}
> The file named "{{protected - Unicode password.pdf}}" has a password of
> mostly special characters, with 2 leading spaces and 2 trailing spaces thrown
> in for good measure:
> {code:java}
> // Password is on following line (with 2 leading spaces, 2 trailing spaces)
> ! < > " \ € œ ¤ ¼ ½ 𠜎 𩶘 😀
> {code}
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
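
For illustration only, here is a minimal sketch of the RFC 2047 idea from the issue description: the client wraps the UTF-8 bytes of the password in a "B" (Base64) encoded-word so the header value stays US-ASCII, and the receiver unwraps it. The class name {{EncodedWordPassword}} and both method names are hypothetical, and the decoding side is an assumption about how a server could behave, not existing Tika Server behaviour. The sketch accepts only the exact prefix {{=?UTF-8?B?}} and ignores the 75-character encoded-word limit and the case-insensitive charset / encoding tokens that RFC 2047 allows.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class EncodedWordPassword {

    // Simplification: only the exact UTF-8 / "B" form is recognised.
    private static final String PREFIX = "=?UTF-8?B?";
    private static final String SUFFIX = "?=";

    /** Client side: wrap the UTF-8 bytes of the password in an encoded-word. */
    public static String encode(String password) {
        byte[] utf8 = password.getBytes(StandardCharsets.UTF_8);
        return PREFIX + Base64.getEncoder().encodeToString(utf8) + SUFFIX;
    }

    /**
     * Hypothetical receiving side: unwrap an encoded-word, or return the
     * value unchanged when it does not look like one.
     */
    public static String decode(String headerValue) {
        boolean looksEncoded =
                headerValue.length() >= PREFIX.length() + SUFFIX.length()
                && headerValue.startsWith(PREFIX)
                && headerValue.endsWith(SUFFIX);
        if (!looksEncoded) {
            return headerValue;
        }
        String base64 = headerValue.substring(
                PREFIX.length(), headerValue.length() - SUFFIX.length());
        // Throws IllegalArgumentException if the payload is not valid Base64.
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String fourSpaces = "    ";
        String encoded = encode(fourSpaces);
        System.out.println(encoded);                             // =?UTF-8?B?ICAgIA==?=
        System.out.println(decode(encoded).equals(fourSpaces));  // true
    }
}
```

Note that this {{decode}} turns the literal value {{=?UTF-8?B??=}} into an empty string, which is the breaking-change case the description mentions and the reason the decoding would need to be opt-in.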