Re: Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe

Mohit Sabharwal Sun, 23 Nov 2014 14:59:53 -0800

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/#review62744
-----------------------------------------------------------




data/files/parquet_types.txt
<https://reviews.apache.org/r/28147/#comment104827>

    I think this is a bit confusing, since the 0b prefix gives the
    impression that the data is read in binary format, whereas it is
    actually read as a string.
    
    I think we can either write (preferably non-ASCII) binary data instead
    (for example, see data/files/string.txt), or alternatively write it
    legibly in hex, like 68656c6c6f ("hello"), and convert it to binary
    using unhex() in the INSERT OVERWRITE query. What do you think?
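
    To make the hex suggestion concrete, here is a minimal sketch in
    Python rather than HiveQL. The helpers unhex() and to_hex() are
    hypothetical stand-ins for the idea behind Hive's unhex() and hex(),
    not Hive's implementation. It only shows that legible hex text in the
    data file can be turned back into raw bytes, and that unprintable
    bytes become legible again when hex-encoded.

```python
# Illustration only: hypothetical Python stand-ins for the hex round trip
# described above. These are not Hive's UDFs.

def unhex(hex_text: str) -> bytes:
    """Decode a string of hex digits into the raw bytes it represents."""
    return bytes.fromhex(hex_text)


def to_hex(raw: bytes) -> str:
    """Encode raw bytes as a legible string of hex digits."""
    return raw.hex()


# What the staging file would hold as plain text.
staged_value = "68656c6c6f"

# What would end up in the binary column after conversion.
binary_value = unhex(staged_value)
print(binary_value)

# Unprintable bytes are legible again once hex-encoded for test output.
unprintable = bytes([0x00, 0xFF, 0x10])
print(to_hex(unprintable))
```

    The same round trip is what keeps both the data file and the .q.out
    file readable when the column holds arbitrary bytes.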



ql/src/test/queries/clientpositive/parquet_types.q
<https://reviews.apache.org/r/28147/#comment104828>

    If we write the data in hex format (like 68656c6c6f) in
    data/files/parquet_types.txt, we can just use unhex() here to convert
    it to binary:
    
    INSERT OVERWRITE TABLE parquet_types
    SELECT cint, ctinyint, csmallint, cfloat, cdouble, cstring1, t, cchar,
           cvarchar, unhex(cbinary), m1, l1, st1
    FROM parquet_types_staging;



ql/src/test/queries/clientpositive/parquet_types.q
<https://reviews.apache.org/r/28147/#comment104830>

    Since the cbinary column may contain unprintable characters, instead of
    "select * from parquet_types" you can list the columns and pass cbinary
    through hex() to make it legible:
    
    SELECT cint, ctinyint, csmallint, cfloat, cdouble, cstring1, t, cchar,
           cvarchar, hex(cbinary), m1, l1, st1
    FROM parquet_types;



ql/src/test/queries/clientpositive/parquet_types.q
<https://reviews.apache.org/r/28147/#comment104829>

    No need to unhex here. It can just be:
    
    SELECT cchar, LENGTH(cchar), cvarchar, LENGTH(cvarchar), cbinary
    FROM parquet_types
    
    Or you can pass it through hex() if the original data has unprintable
    characters:
    
    SELECT cchar, LENGTH(cchar), cvarchar, LENGTH(cvarchar), hex(cbinary)
    FROM parquet_types


- Mohit Sabharwal


On Nov. 21, 2014, 8:53 a.m., cheng xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28147/
> -----------------------------------------------------------
> 
> (Updated Nov. 21, 2014, 8:53 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This patch includes:
> 1. binary support for ParquetHiveSerde
> 2. related test cases both in unit and ql test
> 
> 
> Diffs
> -----
> 
>   data/files/parquet_types.txt d342062 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe73 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864 
>   ql/src/test/queries/clientpositive/parquet_types.q 22585c3 
>   ql/src/test/results/clientpositive/parquet_types.q.out 275897c 
> 
> Diff: https://reviews.apache.org/r/28147/diff/
> 
> 
> Testing
> -------
> 
> related UT and QL tests passed
> 
> 
> Thanks,
> 
> cheng xu
> 
>
