[
https://issues.apache.org/jira/browse/HIVE-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Phabricator updated HIVE-2917:
------------------------------
Attachment: HIVE-2917.D2619.1.patch
flyinggarden requested code review of "HIVE-2917 [jira] Add support for various
charsets in LazySimpleSerDe".
Reviewers: JIRA
https://issues.apache.org/jira/browse/HIVE-2917
HIVE-2917: Add support for various charsets in LazySimpleSerDe
Currently Hive can only serialize/deserialize data encoded in UTF-8.
It would be useful to specify the data's charset when creating a table.
The idea is to add a new keyword, CHARSET, that sets the charset at the table level.
For example:
CREATE TABLE tbl1 (col1 STRING) ROW FORMAT CHARSET "GBK" DELIMITED FIELDS
TERMINATED BY '\t';
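To illustrate why a table-level charset matters, here is a minimal Python sketch (not Hive code, and not taken from the patch). It shows that GBK-encoded bytes read as UTF-8 come out garbled, while decoding with the declared charset recovers the text. The helper name decode_field and the sample value are assumptions made for illustration only.

```python
def decode_field(raw: bytes, charset: str = "utf-8") -> str:
    """Decode one field's raw bytes using the table's declared charset."""
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    return raw.decode(charset, errors="replace")


# Two Chinese characters, encoded the way they would sit in a GBK data file.
original = "\u4e2d\u6587"
raw = original.encode("gbk")

as_utf8 = decode_field(raw)         # reader assumes UTF-8: wrong charset
as_gbk = decode_field(raw, "gbk")   # reader uses the declared charset

print(as_gbk == original)   # the declared charset round-trips the text
print(as_utf8 == original)  # the UTF-8 assumption does not
```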
CHARSET can also be used in a TRANSFORM clause.
For example:
SELECT TRANSFORM(col1, col2) ROW FORMAT CHARSET 'gbk'
USING 'some_script'
AS (col3, col4) ROW FORMAT CHARSET 'utf-8';
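As a rough sketch of what 'some_script' might look like under this proposal, the Python below reads tab-separated rows in one charset and writes them back in another. It assumes the first CHARSET clause governs the rows sent to the script and the second governs the rows read back from it, in line with how Hive's existing TRANSFORM row formats are ordered; the patch itself may behave differently. The function name transform and the column swap are placeholders, not part of the proposal.

```python
import io


def transform(src, dst, in_charset="gbk", out_charset="utf-8"):
    """Read tab-separated rows from src and write transformed rows to dst.

    Both src and dst are binary streams, so the script controls the
    decoding and encoding itself instead of relying on the locale.
    """
    for line in src:
        cols = line.decode(in_charset).rstrip("\n").split("\t")
        # Placeholder transformation: emit the input columns in reverse order.
        out = "\t".join(reversed(cols)) + "\n"
        dst.write(out.encode(out_charset))


# Demonstration with in-memory streams standing in for stdin and stdout.
demo_in = io.BytesIO("\u4e2d\t\u6587\n".encode("gbk"))
demo_out = io.BytesIO()
transform(demo_in, demo_out)
print(demo_out.getvalue().decode("utf-8"), end="")
```

A real script would call transform(sys.stdin.buffer, sys.stdout.buffer) so that it reads and writes raw bytes.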
TEST PLAN
EMPTY
REVISION DETAIL
https://reviews.facebook.net/D2619
AFFECTED FILES
hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java
data/files/gbk.txt
serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyArrayMapStruct.java
serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyPrimitive.java
serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyCharset.java
serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java
serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyString.java
serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java
serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java
serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/LazyObjectInspectorFactory.java
serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/LazyUnionObjectInspector.java
serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/primitive/LazyStringObjectInspector.java
serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/primitive/LazyPrimitiveObjectInspectorFactory.java
serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/LazyListObjectInspector.java
serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/LazyMapObjectInspector.java
serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/LazySimpleStructObjectInspector.java
serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarSerDe.java
serde/src/gen/thrift/gen-py/org_apache_hadoop_hive_serde/constants.py
serde/src/gen/thrift/gen-cpp/serde_constants.cpp
serde/src/gen/thrift/gen-cpp/serde_constants.h
serde/src/gen/thrift/gen-rb/serde_constants.rb
serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/Constants.java
serde/src/gen/thrift/gen-php/serde/serde_constants.php
serde/if/serde.thrift
ql/src/test/results/clientpositive/charset.q.out
ql/src/test/results/clientpositive/input35.q.out
ql/src/test/results/clientpositive/input36.q.out
ql/src/test/results/clientpositive/transform_charset.q.out
ql/src/test/queries/clientpositive/transform_charset.q
ql/src/test/queries/clientpositive/charset.q
ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableDesc.java
ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g
ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java
ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
> Add support for various charsets in LazySimpleSerDe
> ---------------------------------------------------
>
> Key: HIVE-2917
> URL: https://issues.apache.org/jira/browse/HIVE-2917
> Project: Hive
> Issue Type: New Feature
> Components: CLI, Serializers/Deserializers
> Affects Versions: 0.9.0
> Reporter: Kai Zhang
> Attachments: HIVE-2917.1.patch.txt, HIVE-2917.2.patch.txt,
> HIVE-2917.3.patch.txt, HIVE-2917.D2619.1.patch