[ https://issues.apache.org/jira/browse/HIVE-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HIVE-4199:
------------------------------

    Attachment: HIVE-4199.HIVE-4199.HIVE-4199.D9501.3.patch

sxyuan updated the revision "HIVE-4199 [jira] ORC writer doesn't handle non-UTF8 encoded Text properly".

  Making the new data file binary.

Reviewers: kevinwilfong

REVISION DETAIL
  https://reviews.facebook.net/D9501

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D9501?vs=29973&id=30009#toc

AFFECTED FILES
  data/files/nonutf8.txt
  ql/src/test/results/clientpositive/orc_nonutf8.q.out
  ql/src/test/queries/clientpositive/orc_nonutf8.q
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java

To: kevinwilfong, sxyuan
Cc: JIRA

> ORC writer doesn't handle non-UTF8 encoded Text properly
> --------------------------------------------------------
>
>                 Key: HIVE-4199
>                 URL: https://issues.apache.org/jira/browse/HIVE-4199
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Samuel Yuan
>            Assignee: Samuel Yuan
>            Priority: Minor
>         Attachments: HIVE-4199.HIVE-4199.HIVE-4199.D9501.1.patch,
>                      HIVE-4199.HIVE-4199.HIVE-4199.D9501.2.patch,
>                      HIVE-4199.HIVE-4199.HIVE-4199.D9501.3.patch
>
>
> StringTreeWriter currently converts fields stored as Text objects into
> Strings. This can lose information (see
> http://en.wikipedia.org/wiki/Replacement_character#Replacement_character),
> and is also unnecessary since the dictionary stores Text objects.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
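
For readers who want to see the information loss described in the issue, here is a minimal sketch in plain Java. It does not use Hive's StringTreeWriter or Hadoop's Text class; the class name NonUtf8RoundTrip and the method roundTripThroughString are hypothetical and exist only for this illustration. It assumes the lossy step is equivalent to decoding the raw bytes as UTF-8 into a java.lang.String and encoding them back, which is how the standard JDK decoder substitutes U+FFFD for malformed input.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not code from the HIVE-4199 patch.
public class NonUtf8RoundTrip {

    // Decode raw bytes as UTF-8 into a String, then encode back to UTF-8.
    // Malformed input is replaced with U+FFFD during decoding, so the
    // original bytes cannot be recovered afterwards.
    static byte[] roundTripThroughString(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    // Render bytes as space-separated hex for inspection.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Two different byte sequences, neither of which is valid UTF-8.
        byte[] first = {0x66, 0x6f, (byte) 0xe9};  // "fo" + Latin-1 e-acute
        byte[] second = {0x66, 0x6f, (byte) 0xff}; // "fo" + an invalid byte

        byte[] firstBack = roundTripThroughString(first);
        byte[] secondBack = roundTripThroughString(second);

        System.out.println("first  in : " + hex(first));
        System.out.println("first  out: " + hex(firstBack));
        System.out.println("second in : " + hex(second));
        System.out.println("second out: " + hex(secondBack));

        // The round trip changes the bytes, and distinct inputs collapse
        // to the same output.
        System.out.println("first preserved:   " + Arrays.equals(first, firstBack));
        System.out.println("inputs now equal:  " + Arrays.equals(firstBack, secondBack));
    }
}
```

Keeping the values as raw bytes end to end, as the issue suggests is possible because the dictionary already stores Text objects, avoids this substitution altogether.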