Arabic PDF doesn't extract correctly
------------------------------------

Key: TIKA-722
URL: https://issues.apache.org/jira/browse/TIKA-722
Project: Tika
Issue Type: Bug
Components: parser
Reporter: Michael McCandless
Priority: Minor

I have a PDF with an Arabic font from which Tika fails to extract text (the output is all gibberish). It looks like the PDF does not include the separate Unicode text metadata (hmm: would Tika extract that if it were present?), and copying and pasting out of the PDF also produces gibberish. To fix this, I think we'd somehow have to know the character mapping for the font (this particular font is AXTManal)?
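
To make the "know the mapping for the font" idea concrete, here is a rough sketch of what a post-extraction remapping step might look like. Everything in it is an assumption for illustration: `LegacyFontRemapper` is a hypothetical helper, not part of Tika or PDFBox, and the table entries used in the usage note are placeholders, since the real AXTManal code-to-Unicode table is not known here. It also includes a simple heuristic for spotting output that contains no Arabic-script characters at all.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not a Tika or PDFBox class): remaps characters that
// came out of a PDF in a font-specific encoding to real Unicode text.
final class LegacyFontRemapper {
    // Legacy code point -> Unicode replacement. The contents must come from
    // knowledge of the specific font; nothing here is the actual AXTManal table.
    private final Map<Integer, String> table = new HashMap<>();

    // Register one mapping; returns this so calls can be chained.
    LegacyFontRemapper map(int legacyCode, String unicode) {
        table.put(legacyCode, unicode);
        return this;
    }

    // Translate the text code point by code point.
    // Code points without a table entry pass through unchanged.
    String remap(String extracted) {
        StringBuilder out = new StringBuilder(extracted.length());
        extracted.codePoints().forEach(cp -> {
            String mapped = table.get(cp);
            if (mapped != null) {
                out.append(mapped);
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    // Heuristic check: true if at least one code point belongs to the Arabic script.
    // Text extracted from an Arabic PDF that fails this check is likely gibberish.
    static boolean containsArabic(String text) {
        return text.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.ARABIC);
    }
}
```

A caller might build the table with something like `new LegacyFontRemapper().map('A', "\u0633")` (a placeholder entry, not a real AXTManal mapping), run `remap` on the extracted text, and use `containsArabic` before and after to see whether the remapping had any effect. This only covers a one-to-one character substitution; it does not address reordering or shaping issues that may also be present.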