[ https://issues.apache.org/jira/browse/HIVE-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-2917:
------------------------------

    Attachment: HIVE-2917.D2619.1.patch

flyinggarden requested code review of "HIVE-2917 [jira] Add support for various charsets in LazySimpleSerDe".
Reviewers: JIRA

  https://issues.apache.org/jira/browse/HIVE-2917

  HIVE-2917: Add support for various charsets in LazySimpleSerDe

  Currently Hive can only serialize/deserialize data encoded in UTF-8.

  It would be useful to be able to specify the data's charset when creating a table.

  The idea is to add a new keyword, CHARSET, that sets the charset at the table level.
  For example:
  CREATE TABLE tbl1 (col1 STRING) ROW FORMAT CHARSET "GBK" DELIMITED FIELDS TERMINATED BY '\t';
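  On the read path, a table-level CHARSET means the raw bytes of each row are decoded with that charset instead of UTF-8. A minimal Java sketch under that assumption; the class CharsetRowDecoder and its decodeRow method are hypothetical names, not part of Hive or the HIVE-2917 patch:

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Hive or the HIVE-2917 patch): turns the raw
 * bytes of one delimited row into field strings using a table-level charset
 * rather than assuming UTF-8.
 */
public class CharsetRowDecoder {
    private final Charset charset;
    private final String delimiterRegex;

    public CharsetRowDecoder(String charsetName, char fieldDelimiter) {
        // Throws UnsupportedCharsetException if the JVM does not know the name.
        this.charset = Charset.forName(charsetName);
        // Quote the delimiter so characters such as '|' are matched literally.
        this.delimiterRegex = Pattern.quote(String.valueOf(fieldDelimiter));
    }

    /** Decodes the whole row first, then splits it on the delimiter character. */
    public List<String> decodeRow(byte[] rowBytes) {
        // new String(byte[], Charset) replaces malformed input with U+FFFD.
        String row = new String(rowBytes, charset);
        // A negative limit keeps trailing empty fields, e.g. "a\t" -> ["a", ""].
        return Arrays.asList(row.split(delimiterRegex, -1));
    }
}
```

  Decoding before splitting keeps multi-byte characters intact whatever the charset; splitting on the delimiter byte first is only safe when that byte can never occur inside a multi-byte character, which does not hold for every charset (UTF-16, for example).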

  Another place to use CHARSET is the TRANSFORM clause.
  For example:
  SELECT TRANSFORM(col1, col2) ROW FORMAT CHARSET 'gbk'
  USING 'some_script'
  AS (col3, col4) ROW FORMAT CHARSET 'utf-8';
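  In the TRANSFORM case the conversion happens at both ends of the script pipe. A hedged Java 8+ sketch, assuming the input ROW FORMAT CHARSET governs the bytes written to the script's stdin and the output ROW FORMAT CHARSET governs how its stdout is read; TransformCharsetBridge, toScript and fromScript are hypothetical names, not Hive APIs:

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not Hive's script operator): encodes rows sent to a
 * TRANSFORM script in the input charset and decodes the script's output in
 * the output charset. Unlike String.getBytes, it fails loudly on characters
 * the charset cannot represent instead of silently substituting them.
 */
public final class TransformCharsetBridge {
    private TransformCharsetBridge() {}

    /** Joins fields with the delimiter and encodes them for the script's stdin. */
    public static byte[] toScript(List<String> fields, String charsetName, char delimiter) {
        // REPORT is already the default for newEncoder(); set explicitly for clarity.
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        String row = String.join(String.valueOf(delimiter), fields);
        try {
            ByteBuffer buf = encoder.encode(CharBuffer.wrap(row));
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException("row not representable in " + charsetName, e);
        }
    }

    /** Decodes one line of the script's stdout and splits it into fields. */
    public static String[] fromScript(byte[] line, String charsetName, char delimiter) {
        CharsetDecoder decoder = Charset.forName(charsetName).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            String row = decoder.decode(ByteBuffer.wrap(line)).toString();
            return row.split(Pattern.quote(String.valueOf(delimiter)), -1);
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException("script output is not valid " + charsetName, e);
        }
    }
}
```

  With an identity script such as `cat`, bytes written in GBK come back unchanged, so reading them with GBK recovers the original fields, while reading them as UTF-8 raises an error instead of producing garbled strings.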

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D2619

AFFECTED FILES
  hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java
  data/files/gbk.txt
  serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyArrayMapStruct.java
  serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyPrimitive.java
  serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyCharset.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyString.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/LazyObjectInspectorFactory.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/LazyUnionObjectInspector.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/primitive/LazyStringObjectInspector.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/primitive/LazyPrimitiveObjectInspectorFactory.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/LazyListObjectInspector.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/LazyMapObjectInspector.java
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/LazySimpleStructObjectInspector.java
  serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarSerDe.java
  serde/src/gen/thrift/gen-py/org_apache_hadoop_hive_serde/constants.py
  serde/src/gen/thrift/gen-cpp/serde_constants.cpp
  serde/src/gen/thrift/gen-cpp/serde_constants.h
  serde/src/gen/thrift/gen-rb/serde_constants.rb
  serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/Constants.java
  serde/src/gen/thrift/gen-php/serde/serde_constants.php
  serde/if/serde.thrift
  ql/src/test/results/clientpositive/charset.q.out
  ql/src/test/results/clientpositive/input35.q.out
  ql/src/test/results/clientpositive/input36.q.out
  ql/src/test/results/clientpositive/transform_charset.q.out
  ql/src/test/queries/clientpositive/transform_charset.q
  ql/src/test/queries/clientpositive/charset.q
  ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java

> Add support for various charsets in LazySimpleSerDe
> ---------------------------------------------------
>
>                 Key: HIVE-2917
>                 URL: https://issues.apache.org/jira/browse/HIVE-2917
>             Project: Hive
>          Issue Type: New Feature
>          Components: CLI, Serializers/Deserializers
>    Affects Versions: 0.9.0
>            Reporter: Kai Zhang
>         Attachments: HIVE-2917.1.patch.txt, HIVE-2917.2.patch.txt, HIVE-2917.3.patch.txt, HIVE-2917.D2619.1.patch

--
This message is automatically generated by JIRA.