[ https://issues.apache.org/jira/browse/TIKA-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177081#comment-13177081 ]

Nick Burch commented on TIKA-793:
---------------------------------

Comment (COM/COMM) tag handling fixed in r1225480 - it uses a different form to the other text tags so needs explicit encoding aware handling of the different parts of it.

> Invalid ASCII character (65533) when retriving MP3 metadata
> -----------------------------------------------------------
>
>                 Key: TIKA-793
>                 URL: https://issues.apache.org/jira/browse/TIKA-793
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata, parser
>    Affects Versions: 1.0
>         Environment: Ubuntu 10.04 (x64), Android (2.2 +)
>            Reporter: William Seemann
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: TikaTest.java
>
>
> When extracting metadata from certain mp3's (the id3 version appears to be
> 2.4) I'm seeing invalid characters at the end of the parsed fields. For
> example:
> American M�
> which should be:
> American Me

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
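
To illustrate why the comment tag needs separate, encoding-aware handling of its parts: in ID3v2.3/2.4 a COMM frame body is laid out as one text-encoding byte, a three-byte language code, a short description terminated by a null (one byte for single-byte encodings, two for UTF-16), and then the comment text itself. The following is only a rough sketch of that layout, not the change made in r1225480; the class `CommFrameSketch` and its methods are hypothetical names and are not part of Tika.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not Tika's code): decodes the body of an ID3v2.3/2.4 COMM frame.
class CommFrameSketch {

    /** Maps the ID3v2 text-encoding byte to a Java charset. */
    static Charset charsetFor(int encodingByte) {
        switch (encodingByte) {
            case 0: return StandardCharsets.ISO_8859_1;
            case 1: return StandardCharsets.UTF_16;   // BOM-prefixed; the decoder consumes the BOM
            case 2: return StandardCharsets.UTF_16BE; // defined in ID3v2.4 only
            case 3: return StandardCharsets.UTF_8;    // defined in ID3v2.4 only
            default:
                throw new IllegalArgumentException("Unknown text encoding: " + encodingByte);
        }
    }

    /** Decodes a byte range and drops any trailing null padding. */
    static String decode(byte[] data, int offset, int length, Charset cs) {
        String s = new String(data, offset, length, cs);
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == '\u0000') {
            end--;
        }
        return s.substring(0, end);
    }

    /** Returns {language, description, text} for a COMM frame body (frame header already removed). */
    static String[] parse(byte[] body) {
        if (body.length < 4) {
            throw new IllegalArgumentException("COMM frame body too short");
        }
        int enc = body[0] & 0xFF;
        Charset cs = charsetFor(enc);
        boolean wide = (enc == 1 || enc == 2);       // UTF-16 variants use a two-byte terminator
        int step = wide ? 2 : 1;
        String language = new String(body, 1, 3, StandardCharsets.ISO_8859_1);

        // Scan for the description terminator, stepping by the code-unit width
        // so that a UTF-16 terminator is only matched on a code-unit boundary.
        int start = 4;
        int end = start;
        boolean found = false;
        while (end + step <= body.length) {
            if (body[end] == 0 && (!wide || body[end + 1] == 0)) {
                found = true;
                break;
            }
            end += step;
        }
        if (!found) {
            // Assumption: with no terminator, treat everything as comment text.
            return new String[] {language, "", decode(body, start, body.length - start, cs)};
        }
        String description = decode(body, start, end - start, cs);
        int textStart = end + step;
        String text = decode(body, textStart, body.length - textStart, cs);
        return new String[] {language, description, text};
    }

    public static void main(String[] args) {
        byte[] body = "\0engdesc\0American Me".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(String.join(" | ", parse(body)));
    }
}
```

The point of the sketch is that the language code is always read as three single-byte characters, while the description and the text are each decoded with the charset named by the first byte, and the terminator width depends on that same byte. Decoding the whole body as a single string, as works for plain text frames, would mix these parts together.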