[ https://issues.apache.org/jira/browse/HIVE-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HIVE-2917:
------------------------------
    Attachment: HIVE-2917.D2619.1.patch

flyinggarden requested code review of "HIVE-2917 [jira] Add support for various charsets in LazySimpleSerDe".

Reviewers: JIRA

https://issues.apache.org/jira/browse/HIVE-2917

HIVE-2917: Add support for various charsets in LazySimpleSerDe

Currently Hive can only serialize and deserialize data encoded in UTF-8. It would be useful to specify the data's charset when creating the table.

The idea is to add a new keyword, CHARSET, that sets the charset at the table level. For example:

  CREATE TABLE tbl1 (col1 STRING) ROW FORMAT CHARSET "GBK" DELIMITED FIELDS TERMINATED BY '\t';

CHARSET can also be used in a TRANSFORM clause. For example:

  SELECT TRANSFORM(col1, col2) ROW FORMAT CHARSET 'gbk'
  USING 'some_script'
  AS (col3, col4) ROW FORMAT CHARSET 'utf-8';

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D2619

AFFECTED FILES
  hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java
  data/files/gbk.txt
  serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyArrayMapStruct.java
  serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyPrimitive.java
  serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyCharset.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyString.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/LazyObjectInspectorFactory.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/LazyUnionObjectInspector.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/primitive/LazyStringObjectInspector.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/primitive/LazyPrimitiveObjectInspectorFactory.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/LazyListObjectInspector.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/LazyMapObjectInspector.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/LazySimpleStructObjectInspector.java
  serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarSerDe.java
  serde/src/gen/thrift/gen-py/org_apache_hadoop_hive_serde/constants.py
  serde/src/gen/thrift/gen-cpp/serde_constants.cpp
  serde/src/gen/thrift/gen-cpp/serde_constants.h
  serde/src/gen/thrift/gen-rb/serde_constants.rb
  serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/Constants.java
  serde/src/gen/thrift/gen-php/serde/serde_constants.php
  serde/if/serde.thrift
  ql/src/test/results/clientpositive/charset.q.out
  ql/src/test/results/clientpositive/input35.q.out
  ql/src/test/results/clientpositive/input36.q.out
  ql/src/test/results/clientpositive/transform_charset.q.out
  ql/src/test/queries/clientpositive/transform_charset.q
  ql/src/test/queries/clientpositive/charset.q
  ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
> Add support for various charsets in LazySimpleSerDe
> ---------------------------------------------------
>
>                 Key: HIVE-2917
>                 URL: https://issues.apache.org/jira/browse/HIVE-2917
>             Project: Hive
>          Issue Type: New Feature
>          Components: CLI, Serializers/Deserializers
>    Affects Versions: 0.9.0
>            Reporter: Kai Zhang
>         Attachments: HIVE-2917.1.patch.txt, HIVE-2917.2.patch.txt, HIVE-2917.3.patch.txt, HIVE-2917.D2619.1.patch
>
>
> Currently Hive can only serialize and deserialize data encoded in UTF-8.
> It would be useful to specify the data's charset when creating the table.
> The idea is to add a new keyword, CHARSET, that sets the charset at the table level.
> For example:
> CREATE TABLE tbl1 (col1 STRING) ROW FORMAT CHARSET "GBK" DELIMITED FIELDS TERMINATED BY '\t';
> CHARSET can also be used in a TRANSFORM clause.
> For example:
> SELECT TRANSFORM(col1, col2) ROW FORMAT CHARSET 'gbk'
> USING 'some_script'
> AS (col3, col4) ROW FORMAT CHARSET 'utf-8';
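
To illustrate the conversion that the issue description asks for, here is a minimal Java sketch of decoding field bytes with a table-declared charset and re-encoding them as UTF-8. It is not taken from the patch: the class CharsetSketch and the helpers decodeField and encodeField are hypothetical names, and the sketch assumes the JDK in use ships the GBK charset.

```java
import java.nio.charset.Charset;

// Hypothetical illustration only; not the code in HIVE-2917.D2619.1.patch.
class CharsetSketch {

    // Decode raw field bytes using the charset declared for the table (assumed helper).
    static String decodeField(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    // Encode a field value into bytes in the requested charset (assumed helper).
    static byte[] encodeField(String value, String charsetName) {
        return value.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Two Chinese characters (U+4E2D U+6587) encoded in GBK: two bytes each.
        byte[] gbk = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};

        // Deserialize with the declared charset, then serialize as UTF-8.
        String text = decodeField(gbk, "GBK");
        byte[] utf8 = encodeField(text, "UTF-8");

        // prints "2 chars, 6 UTF-8 bytes"
        System.out.println(text.length() + " chars, " + utf8.length + " UTF-8 bytes");
    }
}
```

Reading the same four bytes as UTF-8 would not yield the original two characters, which is why the data's charset has to be declared somewhere, for example at the table level as proposed.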