[ https://issues.apache.org/jira/browse/TIKA-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256798#comment-16256798 ]
Nick Burch commented on TIKA-2505:
----------------------------------

Are you using 1.6 or 1.16? (They're very different!)

When you made your request to the server, what content encoding headers (if any) did you send?

That said, I think there might be a problem with the PDF itself, or PDFBox. When I try your file with the Tika App, I get errors like these;
{code}
WARN No Unicode mapping for 0 (0) in font null
WARN No Unicode mapping for 1 (1) in font null
WARN No Unicode mapping for 2 (2) in font null
WARN No Unicode mapping for .notdef (3) in font null
WARN No Unicode mapping for 4 (4) in font null
WARN No Unicode mapping for 5 (5) in font null
WARN No Unicode mapping for 6 (6) in font null
WARN No Unicode mapping for 7 (7) in font null
WARN No Unicode mapping for 8 (8) in font null
WARN No Unicode mapping for 9 (9) in font null
WARN No Unicode mapping for 10 (10) in font null
WARN No Unicode mapping for 11 (11) in font null
{code}

> Tika server output encoding problems
> ------------------------------------
>
>                 Key: TIKA-2505
>                 URL: https://issues.apache.org/jira/browse/TIKA-2505
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.16
>            Reporter: Fanni Kovacs
>         Attachments: original.pdf, response.txt
>
>
> Hello,
> We noticed during a conversion of large amount of files, there are some
> issues when we get a non UTF-8 response from tika server 1.6.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)