Re: Review Request 28372: HIVE-8950: Add support in ParquetHiveSerde to create table schema from a parquet file

Sergio Pena Mon, 24 Nov 2014 16:38:07 -0800

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28372/#review62911
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ColInfoFromParquetFile.java
<https://reviews.apache.org/r/28372/#comment105080>

    This class will need more work to detect unannotated types, as described
    in the following ticket and the proposed list and map spec:
    https://issues.apache.org/jira/browse/HIVE-8909
    
https://github.com/rdblue/incubator-parquet-format/blob/PARQUET-113-add-list-and-map-spec/LogicalTypes.md
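To make "more work" concrete, here is a rough sketch of detecting the unannotated cases directly from the file schema. It is written against a hypothetical, stripped-down Field model (not the parquet-mr schema API), covers only a handful of primitive types, and ignores annotations other than LIST. The list handling is my approximation of the backward-compatibility rules in the LogicalTypes.md draft linked above: a repeated field outside a LIST group is an unannotated list. Inside a LIST group, the repeated field is itself the element if it is primitive, does not have exactly one field, or is named "array" or "<list name>_tuple". Otherwise its single child is the element.

```java
import java.util.List;
import java.util.StringJoiner;

/**
 * Sketch only: derive Hive type strings from a Parquet-like schema, including
 * unannotated repeated fields. Field is a HYPOTHETICAL stand-in for a Parquet
 * schema node; it is NOT the parquet-mr API.
 */
public class ParquetSchemaToHive {

  enum Repetition { REQUIRED, OPTIONAL, REPEATED }

  /** Hypothetical schema node: primitives set primitiveType, groups set children. */
  static final class Field {
    final Repetition repetition;
    final String name;
    final String primitiveType; // "int32", "int64", ...; null for groups
    final String annotation;    // e.g. "LIST"; null if unannotated
    final List<Field> children;

    Field(Repetition repetition, String name, String primitiveType,
          String annotation, List<Field> children) {
      this.repetition = repetition;
      this.name = name;
      this.primitiveType = primitiveType;
      this.annotation = annotation;
      this.children = children;
    }

    boolean isPrimitive() { return primitiveType != null; }
  }

  static Field primitive(Repetition r, String type, String name) {
    return new Field(r, name, type, null, List.of());
  }

  static Field group(Repetition r, String name, String annotation, Field... children) {
    return new Field(r, name, null, annotation, List.of(children));
  }

  /** Hive type of a field; a repeated field reaching this point is an unannotated list. */
  static String fieldType(Field f) {
    String base = typeIgnoringRepetition(f);
    return f.repetition == Repetition.REPEATED ? "array<" + base + ">" : base;
  }

  static String typeIgnoringRepetition(Field f) {
    if (f.isPrimitive()) {
      return primitiveType(f.primitiveType);
    }
    if ("LIST".equals(f.annotation)) {
      return listType(f);
    }
    // Any other group (other annotations are ignored in this sketch) becomes a struct.
    StringJoiner struct = new StringJoiner(",", "struct<", ">");
    for (Field child : f.children) {
      struct.add(child.name + ":" + fieldType(child));
    }
    return struct.toString();
  }

  /** LIST group: decide whether the repeated field or its only child is the element. */
  static String listType(Field list) {
    if (list.children.size() != 1 || list.children.get(0).repetition != Repetition.REPEATED) {
      throw new IllegalArgumentException("LIST group needs exactly one repeated field: " + list.name);
    }
    Field repeated = list.children.get(0);
    boolean repeatedIsElement = repeated.isPrimitive()
        || repeated.children.size() != 1
        || repeated.name.equals("array")
        || repeated.name.equals(list.name + "_tuple");
    String element = repeatedIsElement
        ? typeIgnoringRepetition(repeated)      // two-level list
        : fieldType(repeated.children.get(0));  // three-level list
    return "array<" + element + ">";
  }

  static String primitiveType(String parquetType) {
    switch (parquetType) {
      case "boolean": return "boolean";
      case "int32": return "int";
      case "int64": return "bigint";
      case "float": return "float";
      case "double": return "double";
      default: throw new IllegalArgumentException("not covered by this sketch: " + parquetType);
    }
  }

  public static void main(String[] args) {
    Field listOfInts = primitive(Repetition.REPEATED, "int32", "list_of_ints");
    System.out.println(listOfInts.name + " " + fieldType(listOfInts)); // list_of_ints array<int>
  }
}
```

Fed the three schemas from the debugging examples, this walk should produce array<bigint>, array<struct<latitude:double,longitude:double>> and array<int>, matching the Hive column types listed there.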
    
    I was going to add more comments, but then I noticed that this code looks a
    little similar (for loops and converters) to the HIVE-8909 patch. Of course,
    the HIVE-8909 patch returns converter objects, while this one returns the
    column names & types. So I was wondering whether we could use the
    DataWritableRecordConverter.java class to get the correct converters, and
    then translate those converters to column names & types.
    
    This is what I found while debugging valid parquet files:
    Each example has 3 blocks:
    - parquet file schema
    - Hive column names & types
    - converters returned by DataWritableRecordConverter
    
    Could we use the converter objects and translate them to column names & types?
    
    message SingleFieldGroupInList {
      optional group single_element_groups (LIST) {
        repeated group single_element_group {
          required int64 count;
        }
      }
    }
    
    single_element_groups ARRAY<BIGINT>
    
        hivestructconverter
            hivecollectionconverter:elementconverter    array<>
                EINT64_CONVERTER                        bigint
    
--------------------------------------------------------------------------------
    
    message HiveRequiredGroupInList {
      optional group locations (LIST) {
        repeated group bag {
          required group element {
            required double latitude;
            required double longitude;
          }
        }
      }
    }
                        
    locations ARRAY<STRUCT<latitude: DOUBLE, longitude: DOUBLE>>
    
        hivestructconverter
            hivecollectionconverter:elementconverter    array<>
                hivestructconverter                     struct<>
                    DOUBLE_CONVERTER                    double
                    DOUBLE_CONVERTER                    double
    
--------------------------------------------------------------------------------
                            
                        
    message UnannotatedListOfPrimitives {
      repeated int32 list_of_ints;
    }
        
    list_of_ints ARRAY<INT>
    
        hivestructconverter
            RepeatedPrimitiveConverter                  array<>
                EINT32_CONVERTER                        int
    
--------------------------------------------------------------------------------
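Here is a rough sketch of the translation I am suggesting, working from a converter tree shaped like the dumps above. ConverterNode and its Kind values are hypothetical stand-ins that mirror the converter names printed while debugging. The real converters returned by DataWritableRecordConverter do not necessarily expose their children, field names or target Hive types, so that information would have to be added to them or looked up from the Parquet schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

/**
 * Sketch only: translate a converter tree (shaped like the debug dumps in this
 * review) into Hive column names & types. ConverterNode is HYPOTHETICAL; it is
 * not the API of the converters built by DataWritableRecordConverter.
 */
public class ConverterTreeToHive {

  /** Assumed mapping from the converter names seen while debugging. */
  enum Kind {
    STRUCT,             // hivestructconverter
    COLLECTION,         // hivecollectionconverter:elementconverter
    REPEATED_PRIMITIVE, // RepeatedPrimitiveConverter
    PRIMITIVE           // EINT32_CONVERTER, EINT64_CONVERTER, DOUBLE_CONVERTER, ...
  }

  static final class ConverterNode {
    final Kind kind;
    final String fieldName; // assumed to be known, e.g. taken from the Parquet field
    final String hiveType;  // only set for PRIMITIVE, e.g. "bigint"
    final List<ConverterNode> children;

    ConverterNode(Kind kind, String fieldName, String hiveType, List<ConverterNode> children) {
      this.kind = kind;
      this.fieldName = fieldName;
      this.hiveType = hiveType;
      this.children = children;
    }
  }

  static ConverterNode struct(String name, ConverterNode... fields) {
    return new ConverterNode(Kind.STRUCT, name, null, List.of(fields));
  }

  static ConverterNode collection(String name, ConverterNode element) {
    return new ConverterNode(Kind.COLLECTION, name, null, List.of(element));
  }

  static ConverterNode repeatedPrimitive(String name, ConverterNode element) {
    return new ConverterNode(Kind.REPEATED_PRIMITIVE, name, null, List.of(element));
  }

  static ConverterNode primitive(String name, String hiveType) {
    return new ConverterNode(Kind.PRIMITIVE, name, hiveType, List.of());
  }

  /** Each converter kind maps to exactly one Hive type constructor. */
  static String hiveType(ConverterNode n) {
    switch (n.kind) {
      case PRIMITIVE:
        return n.hiveType;
      case COLLECTION:
      case REPEATED_PRIMITIVE:
        return "array<" + hiveType(n.children.get(0)) + ">";
      case STRUCT: {
        StringJoiner fields = new StringJoiner(",", "struct<", ">");
        for (ConverterNode child : n.children) {
          fields.add(child.fieldName + ":" + hiveType(child));
        }
        return fields.toString();
      }
      default:
        throw new IllegalStateException("unknown converter kind: " + n.kind);
    }
  }

  /** The root struct converter's children become the table columns. */
  static List<String> columns(ConverterNode root) {
    List<String> columns = new ArrayList<>();
    for (ConverterNode child : root.children) {
      columns.add(child.fieldName + " " + hiveType(child));
    }
    return columns;
  }

  public static void main(String[] args) {
    ConverterNode root = struct(null,
        collection("single_element_groups", primitive("count", "bigint")));
    System.out.println(columns(root)); // [single_element_groups array<bigint>]
  }
}
```

The appeal of this approach is that the LIST/unannotated decisions are made once, when the converters are built, and the type string then falls out of a single recursive walk. The open question is whether the real converters can be made to carry their field names and Hive types without too much extra plumbing.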


- Sergio Pena


On Nov. 23, 2014, 1:34 a.m., Ashish Singh wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28372/
> -----------------------------------------------------------
> 
> (Updated Nov. 23, 2014, 1:34 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-8950
>     https://issues.apache.org/jira/browse/HIVE-8950
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-8950: Add support in ParquetHiveSerde to create table schema from a 
> parquet file
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fafd78e63e9b41c9fdb0e017b567dc719d151784
>   data/files/data.parq PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ColInfoFromParquetFile.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe736fcf9d3715f03eed9885c299a7aa040dd
>   ql/src/test/queries/clientpositive/parquet_create_gen_schema.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_create_gen_schema1.q PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_create_gen_schema.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_create_gen_schema1.q.out PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/28372/diff/
> 
> 
> Testing
> -------
> 
> Tested by adding appropriate qTests.
> 
> 
> Thanks,
> 
> Ashish Singh
> 
>
