> On Nov. 25, 2014, 12:37 a.m., Sergio Pena wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ColInfoFromParquetFile.java,
> >  line 35
> > <https://reviews.apache.org/r/28372/diff/1/?file=773791#file773791line35>
> >
> >     This class will need more work in order to detect unannotated types as 
> > specified in the following tickets:
> >     https://issues.apache.org/jira/browse/HIVE-8909
> >     
> > https://github.com/rdblue/incubator-parquet-format/blob/PARQUET-113-add-list-and-map-spec/LogicalTypes.md
> >     
> >     I was going to add more comments, but then I noticed that this code 
> > will look a little similar (for loops and converters) to the HIVE-8909 
> > patch. Of course, the HIVE-8909 patch returns converter objects, while this 
> > code returns the column names & types. So I was wondering whether we could 
> > use the DataWritableRecordConverter.java class to get the correct 
> > converters, and then translate the converters to column names & types. 
> >     
> >     This is what I found while debugging valid parquet files:
> >     Each example has 3 blocks:
> >     - parquet file schema
> >     - hive column names & types
> >     - converters returned by DataWritableRecordConverter
> >     
> >     Could we use the converter objects and translate them to names? 
> >     
> >     message SingleFieldGroupInList {
> >       optional group single_element_groups (LIST) {
> >         repeated group single_element_group {
> >           required int64 count;
> >         }
> >       }
> >     }
> >     
> >     single_element_groups ARRAY<BIGINT>
> >     
> >             hivestructconverter
> >                     hivecollectionconverter:elementconverter        array<>
> >                             EINT64_CONVERTER                        bigint
> >     
> > --------------------------------------------------------------------------------
> >     
> >     message HiveRequiredGroupInList {
> >       optional group locations (LIST) {
> >         repeated group bag {
> >           required group element {
> >             required double latitude;
> >             required double longitude;
> >           }
> >         }
> >       }
> >     }
> >                             
> >     locations ARRAY<STRUCT<latitude: DOUBLE, longitude: DOUBLE>>
> >     
> >             hivestructconverter
> >                     hivecollectionconverter:elementconverter        array<>
> >                             hivestructconverter                     struct<>
> >                                     DOUBLE_CONVERTER                double
> >                                     DOUBLE_CONVERTER                double
> >     
> > --------------------------------------------------------------------------------
> >                                 
> >                             
> >     message UnannotatedListOfPrimitives {
> >       repeated int32 list_of_ints;
> >     }
> >             
> >     list_of_ints ARRAY<INT>
> >     
> >             hivestructconverter
> >                     RepeatedPrimitiveConverter                      array<>
> >                             EINT32_CONVERTER                        int
> >     
> > --------------------------------------------------------------------------------

Sergio, thanks for the review and valuable comments!

DataWritableRecordConverter expects a Hive schema as input, so I am not sure 
it can be used to create the Hive schema in the first place. I have updated 
the patch to handle the rules listed at:
https://github.com/rdblue/incubator-parquet-format/blob/PARQUET-113-add-list-and-map-spec/LogicalTypes.md
Kindly take a look.
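
To make the list rules concrete, here is a minimal, self-contained sketch. It 
is not the code in the patch and it does not use the parquet-mr or Hive APIs: 
the HiveTypeSketch and Node classes are hypothetical stand-ins for a Parquet 
schema tree, and the element-resolution rules are a paraphrase of the list 
backward-compatibility rules, so treat them as an assumption rather than the 
spec text. Only a handful of primitive types are mapped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustration only: a toy schema model, not the parquet-mr or Hive API. */
final class HiveTypeSketch {

    /** Hypothetical stand-in for one node of a Parquet schema. */
    static final class Node {
        final String repetition;   // "required", "optional" or "repeated"
        final String name;
        final String primitive;    // e.g. "int64"; null for groups
        final boolean isList;      // true for a group annotated with (LIST)
        final List<Node> children; // empty for primitives

        Node(String repetition, String name, String primitive,
             boolean isList, List<Node> children) {
            this.repetition = repetition;
            this.name = name;
            this.primitive = primitive;
            this.isList = isList;
            this.children = children;
        }
        static Node primitive(String repetition, String type, String name) {
            return new Node(repetition, name, type, false, new ArrayList<Node>());
        }
        static Node group(String repetition, String name, Node... fields) {
            return new Node(repetition, name, null, false, Arrays.asList(fields));
        }
        static Node list(String repetition, String name, Node repeated) {
            return new Node(repetition, name, null, true, Arrays.asList(repeated));
        }
        boolean isGroup() { return primitive == null; }
    }

    /** Returns "name TYPE" for a top-level field. */
    static String hiveColumn(Node field) {
        return field.name + " " + hiveType(field);
    }

    /** An unannotated repeated field is read as a list of its own type. */
    static String hiveType(Node n) {
        String type = plainType(n);
        return "repeated".equals(n.repetition) ? "ARRAY<" + type + ">" : type;
    }

    /** Type of a node, ignoring the node's own repetition. */
    static String plainType(Node n) {
        if (!n.isGroup()) return primitiveType(n.primitive);
        if (n.isList) {
            Node element = listElement(n.name, n.children.get(0));
            return "ARRAY<" + plainType(element) + ">";
        }
        StringBuilder sb = new StringBuilder("STRUCT<");
        for (int i = 0; i < n.children.size(); i++) {
            Node c = n.children.get(i);
            if (i > 0) sb.append(", ");
            sb.append(c.name).append(": ").append(hiveType(c));
        }
        return sb.append(">").toString();
    }

    /** Picks the element node under a LIST group (assumed rule set). */
    static Node listElement(String listName, Node repeated) {
        if (!repeated.isGroup()) return repeated;           // repeated primitive
        if (repeated.children.size() > 1) return repeated;  // multi-field group
        if (repeated.name.equals("array")
                || repeated.name.equals(listName + "_tuple")) {
            return repeated;                                // legacy writer names
        }
        return repeated.children.get(0);                    // synthetic wrapper layer
    }

    static String primitiveType(String p) {
        switch (p) {
            case "int32":   return "INT";
            case "int64":   return "BIGINT";
            case "float":   return "FLOAT";
            case "double":  return "DOUBLE";
            case "boolean": return "BOOLEAN";
            default: throw new IllegalArgumentException("not covered by this sketch: " + p);
        }
    }

    public static void main(String[] args) {
        // The three schemas quoted in the review above.
        System.out.println(hiveColumn(Node.list("optional", "single_element_groups",
            Node.group("repeated", "single_element_group",
                Node.primitive("required", "int64", "count")))));
        System.out.println(hiveColumn(Node.list("optional", "locations",
            Node.group("repeated", "bag",
                Node.group("required", "element",
                    Node.primitive("required", "double", "latitude"),
                    Node.primitive("required", "double", "longitude"))))));
        System.out.println(hiveColumn(Node.primitive("repeated", "int32", "list_of_ints")));
    }
}
```

For the three schemas quoted in the review, this sketch yields the same 
column types listed there: ARRAY<BIGINT>, 
ARRAY<STRUCT<latitude: DOUBLE, longitude: DOUBLE>> and ARRAY<INT>.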


- Ashish


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28372/#review62911
-----------------------------------------------------------


On Nov. 27, 2014, 1:08 a.m., Ashish Singh wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28372/
> -----------------------------------------------------------
> 
> (Updated Nov. 27, 2014, 1:08 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-8950
>     https://issues.apache.org/jira/browse/HIVE-8950
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-8950: Add support in ParquetHiveSerde to create table schema from a 
> parquet file
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fafd78e63e9b41c9fdb0e017b567dc719d151784
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ColInfoFromParquetFile.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe736fcf9d3715f03eed9885c299a7aa040dd
>   ql/src/test/queries/clientpositive/parquet_array_of_multi_field_struct_gen_schema.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_array_of_optional_elements_gen_schema.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_array_of_required_elements_gen_schema.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_array_of_single_field_struct_gen_schema.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_array_of_structs_gen_schema.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_array_of_unannotated_groups_gen_schema.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_array_of_unannotated_primitives_gen_schema.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_avro_array_of_primitives_gen_schema.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_avro_array_of_single_field_struct_gen_schema.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_decimal_gen_schema.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_thrift_array_of_primitives_gen_schema.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_thrift_array_of_single_field_struct_gen_schema.q PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_array_of_multi_field_struct_gen_schema.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_array_of_optional_elements_gen_schema.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_array_of_required_elements_gen_schema.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_array_of_single_field_struct_gen_schema.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_array_of_structs_gen_schema.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_array_of_unannotated_groups_gen_schema.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_array_of_unannotated_primitives_gen_schema.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_avro_array_of_primitives_gen_schema.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_avro_array_of_single_field_struct_gen_schema.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_decimal_gen_schema.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_thrift_array_of_primitives_gen_schema.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_thrift_array_of_single_field_struct_gen_schema.q.out PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/28372/diff/
> 
> 
> Testing
> -------
> 
> Tested by adding appropriate qTests.
> 
> 
> Thanks,
> 
> Ashish Singh
> 
>
