[ 
https://issues.apache.org/jira/browse/TIKA-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823239#comment-16823239
 ] 

Tim Allison edited comment on TIKA-2858 at 4/22/19 5:01 PM:
------------------------------------------------------------

Hmmmm....I can't get PDFBox to open the file directly with the following...can 
you see what I'm doing wrong?
{noformat}
        final String password = "  ! < > \" \\ € œ ¤ ¼ ½ \uD841\uDF0E \uD867\uDD98 \uD83D\uDE00  ";
        ParseContext parseContext = new ParseContext();
        parseContext.set(PasswordProvider.class, new PasswordProvider() {
            @Override
            public String getPassword(Metadata metadata) {
                return password;
            }
        });
        PDFParser p = new PDFParser();
        BodyContentHandler handler = new BodyContentHandler();
        try (InputStream is = this.getClass().getResourceAsStream("/testUnicodePassword.pdf")) {
            p.parse(is, handler, new Metadata(), parseContext);
        }
{noformat}

Or if I set this literally:
{noformat}
        final String password = "  ! < > \" \\ € œ ¤ ¼ ½ 𠜎 𩶘 😀  ";
{noformat}




> JAXRS server: allow passwords with special chars (MIME encoded words)
> ---------------------------------------------------------------------
>
>                 Key: TIKA-2858
>                 URL: https://issues.apache.org/jira/browse/TIKA-2858
>             Project: Tika
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 1.20
>            Reporter: Ross Johnson
>            Priority: Minor
>         Attachments: protected - 4 space password.pdf, protected - Unicode 
> password.pdf
>
>
> Tika Server allows passing a document password in a special {{Password}} 
> request header; however, I don't believe this header allows for passwords 
> with non-US-ASCII characters, or for passwords with leading or trailing 
> spaces.
> One potential solution would be to allow MIME encoded-word values (RFC 2047) 
> in the password header so that one could specify any password with only 
> US-ASCII. This extra decoding could be enabled / disabled with some other 
> flag or header value, in order to avoid any breaking changes for clients that 
> are not encoding this header (e.g. if the password happens to literally be 
> "{{=?UTF-8?B??=}}").
> Attached are 2 sample PDF files that I'm unable to use with Tika Server due 
> to their passwords. These passwords are a bit contrived, but I have come 
> across this issue with real passwords. I've included the passwords in code 
> blocks to prevent the issue editor / viewer from collapsing multiple spaces 
> into one.
> The file named "{{protected - 4 space password.pdf}}" has a password of 4 
> literal spaces:
> {code:java}
> // Password is on line below (4 literal spaces)
>     
> {code}
> The file named "{{protected - Unicode password.pdf}}" has a password of 
> mostly special characters, with 2 leading spaces and 2 trailing spaces thrown 
> in for good measure:
> {code:java}
> // Password is on following line (with 2 leading spaces, 2 trailing spaces)
>   ! < > " \ € œ ¤ ¼ ½ 𠜎 𩶘 😀  
> {code}
>      
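
For reference, the RFC 2047 encoded-word idea from the issue description could look roughly like the sketch below. It uses only java.util.Base64 and is not Tika code: the class and method names (EncodedWordSketch, encode, decode) are hypothetical, it handles only the UTF-8 / "B" (Base64) form, and it ignores the 75-character limit and multi-word splitting that RFC 2047 defines.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of Tika: wraps a password as a single
// RFC 2047 "B" encoded-word in UTF-8 and unwraps it again.
class EncodedWordSketch {
    private static final String PREFIX = "=?UTF-8?B?";
    private static final String SUFFIX = "?=";

    // Client side: produce a US-ASCII-only header value.
    static String encode(String password) {
        byte[] utf8 = password.getBytes(StandardCharsets.UTF_8);
        return PREFIX + Base64.getEncoder().encodeToString(utf8) + SUFFIX;
    }

    // Server side: decode if the value looks like an encoded-word,
    // otherwise return it unchanged (a plain, unencoded password).
    static String decode(String headerValue) {
        boolean looksEncoded =
                headerValue.length() >= PREFIX.length() + SUFFIX.length()
                && headerValue.regionMatches(true, 0, PREFIX, 0, PREFIX.length())
                && headerValue.endsWith(SUFFIX);
        if (!looksEncoded) {
            return headerValue;
        }
        String payload = headerValue.substring(
                PREFIX.length(), headerValue.length() - SUFFIX.length());
        // Throws IllegalArgumentException if the payload is not valid Base64.
        return new String(Base64.getDecoder().decode(payload), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The 4-space password from the first attachment.
        System.out.println(encode("    ")); // prints =?UTF-8?B?ICAgIA==?=
    }
}
```

Note that decode() as written would turn a literal password of "=?UTF-8?B??=" into an empty string, which is exactly the ambiguity the description mentions. That is why the decoding would probably need to sit behind an opt-in flag or a separate header.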



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
