> On Nov. 25, 2014, 12:37 a.m., Sergio Pena wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ColInfoFromParquetFile.java, line 35
> > <https://reviews.apache.org/r/28372/diff/1/?file=773791#file773791line35>
> >
> >     This class will need more work in order to detect unannotated types, as specified in the following ticket and spec:
> >     https://issues.apache.org/jira/browse/HIVE-8909
> >     https://github.com/rdblue/incubator-parquet-format/blob/PARQUET-113-add-list-and-map-spec/LogicalTypes.md
> >
> >     I was going to add more comments, but then I noticed that this code will look a little similar (for loops and converters) to the HIVE-8909 patch. Of course, the HIVE-8909 patch returns converter objects, while this one returns the column names and types. So I was wondering whether we could use the DataWritableRecordConverter.java class to get the correct converters, and then translate those converters into column names and types.
> >
> >     This is what I found while debugging valid Parquet files. Each example has 3 blocks:
> >     - Parquet file schema
> >     - Hive column names and types
> >     - converters returned by DataWritableRecordConverter
> >
> >     Could we use the converter objects and translate them to names?
> >
> >     message SingleFieldGroupInList {
> >       optional group single_element_groups (LIST) {
> >         repeated group single_element_group {
> >           required int64 count;
> >         }
> >       }
> >     }
> >
> >     single_element_groups ARRAY<BIGINT>
> >
> >     hivestructconverter
> >     hivecollectionconverter:elementconverter array<>
> >     EINT64_CONVERTER
> >     bigint
> >
> >
> >     --------------------------------------------------------------------------------
> >
> >     message HiveRequiredGroupInList {
> >       optional group locations (LIST) {
> >         repeated group bag {
> >           required group element {
> >             required double latitude;
> >             required double longitude;
> >           }
> >         }
> >       }
> >     }
> >
> >     locations ARRAY<STRUCT<latitude: DOUBLE, longitude: DOUBLE>>
> >
> >     hivestructconverter
> >     hivecollectionconverter:elementconverter array<>
> >     hivestructconverter
> >     struct<>
> >     DOUBLE_CONVERTER
> >     double
> >     DOUBLE_CONVERTER
> >     double
> >
> >
> >     --------------------------------------------------------------------------------
> >
> >
> >     message UnannotatedListOfPrimitives {
> >       repeated int32 list_of_ints;
> >     }
> >
> >     list_of_ints ARRAY<INT>
> >
> >     hivestructconverter
> >     RepeatedPrimitiveConverter
> >     array<>
> >     EINT32_CONVERTER
> >     int
> >
> >
> >     --------------------------------------------------------------------------------
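To make the three examples above concrete, here is a minimal sketch of how a Parquet schema could be translated directly into Hive column types without going through converters, roughly following the backward-compatibility rules for LIST in the LogicalTypes.md spec linked above. Everything here is hypothetical and for illustration only: `ParquetToHiveSketch`, `Node`, `prim`, `group` and `columns` are a simplified stand-in schema model, not the parquet-mr API and not the code in ColInfoFromParquetFile. Only the handful of primitives the examples need are mapped.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class ParquetToHiveSketch {

    enum Repetition { REQUIRED, OPTIONAL, REPEATED }

    // Hypothetical, simplified stand-in for a Parquet schema node (NOT the parquet-mr API).
    static final class Node {
        final String name;
        final Repetition repetition;
        final String primitive;   // "int32", "int64", "double", ...; null for groups
        final String annotation;  // "LIST", or null when unannotated
        final List<Node> fields;  // children of a group; empty for primitives

        Node(String name, Repetition repetition, String primitive,
             String annotation, List<Node> fields) {
            this.name = name;
            this.repetition = repetition;
            this.primitive = primitive;
            this.annotation = annotation;
            this.fields = fields;
        }

        boolean isPrimitive() { return primitive != null; }
        boolean isList() { return "LIST".equals(annotation); }
    }

    static Node prim(Repetition r, String type, String name) {
        return new Node(name, r, type, null, Collections.<Node>emptyList());
    }

    static Node group(Repetition r, String name, String annotation, Node... fields) {
        return new Node(name, r, null, annotation, Arrays.asList(fields));
    }

    // Only the primitives needed for the examples in this thread.
    static String primitiveToHive(String p) {
        switch (p) {
            case "int32": return "int";
            case "int64": return "bigint";
            case "double": return "double";
            case "boolean": return "boolean";
            default: throw new IllegalArgumentException("unsupported primitive: " + p);
        }
    }

    // Hive type of a node, ignoring the node's own repetition.
    static String elementType(Node n) {
        if (n.isPrimitive()) return primitiveToHive(n.primitive);
        if (n.isList()) return listType(n);
        String members = n.fields.stream()
                .map(f -> f.name + ":" + fieldType(f))
                .collect(Collectors.joining(","));
        return "struct<" + members + ">";
    }

    // Hive type of a field: an unannotated REPEATED field becomes an array of its type.
    static String fieldType(Node n) {
        if (n.repetition == Repetition.REPEATED && !n.isList()) {
            return "array<" + elementType(n) + ">";
        }
        return elementType(n);
    }

    // Chooses the element of a LIST-annotated group.
    static String listType(Node list) {
        Node repeated = list.fields.get(0);   // the single repeated child of the LIST group
        Node element;
        if (repeated.isPrimitive()                               // not a group
                || repeated.fields.size() > 1                    // group with several fields
                || repeated.name.equals("array")                 // legacy name "array"
                || repeated.name.equals(list.name + "_tuple")) { // legacy name "<list>_tuple"
            element = repeated;                                  // the repeated field is the element
        } else {
            element = repeated.fields.get(0);                    // 3-level list: its only child is the element
        }
        return "array<" + elementType(element) + ">";
    }

    // "name type" for every top-level column of a message.
    static List<String> columns(Node message) {
        return message.fields.stream()
                .map(f -> f.name + " " + fieldType(f))
                .collect(Collectors.toList());
    }
}
```

Under these assumptions, `columns(...)` on the three schemas above returns `single_element_groups array<bigint>`, `locations array<struct<latitude:double,longitude:double>>` and `list_of_ints array<int>`. Apart from letter case, these match the Hive column types listed in the examples.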
Sergio, thanks for the review and the valuable comments! DataWritableRecordConverter expects a Hive schema as input, so I am not sure it can be used to create the Hive schema in the first place. I have updated the patch to handle the rules listed in https://github.com/rdblue/incubator-parquet-format/blob/PARQUET-113-add-list-and-map-spec/LogicalTypes.md. Kindly take a look.


- Ashish


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28372/#review62911
-----------------------------------------------------------


On Nov. 27, 2014, 1:08 a.m., Ashish Singh wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28372/
> -----------------------------------------------------------
> 
> (Updated Nov. 27, 2014, 1:08 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-8950
>     https://issues.apache.org/jira/browse/HIVE-8950
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-8950: Add support in ParquetHiveSerde to create table schema from a parquet file
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fafd78e63e9b41c9fdb0e017b567dc719d151784 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ColInfoFromParquetFile.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe736fcf9d3715f03eed9885c299a7aa040dd 
>   ql/src/test/queries/clientpositive/parquet_array_of_multi_field_struct_gen_schema.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_array_of_optional_elements_gen_schema.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_array_of_required_elements_gen_schema.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_array_of_single_field_struct_gen_schema.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_array_of_structs_gen_schema.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_array_of_unannotated_groups_gen_schema.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_array_of_unannotated_primitives_gen_schema.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_avro_array_of_primitives_gen_schema.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_avro_array_of_single_field_struct_gen_schema.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_decimal_gen_schema.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_thrift_array_of_primitives_gen_schema.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_thrift_array_of_single_field_struct_gen_schema.q PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_array_of_multi_field_struct_gen_schema.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_array_of_optional_elements_gen_schema.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_array_of_required_elements_gen_schema.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_array_of_single_field_struct_gen_schema.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_array_of_structs_gen_schema.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_array_of_unannotated_groups_gen_schema.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_array_of_unannotated_primitives_gen_schema.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_avro_array_of_primitives_gen_schema.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_avro_array_of_single_field_struct_gen_schema.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_decimal_gen_schema.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_thrift_array_of_primitives_gen_schema.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_thrift_array_of_single_field_struct_gen_schema.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/28372/diff/
> 
> 
> Testing
> -------
> 
> Tested by adding appropriate qTests.
> 
> 
> Thanks,
> 
> Ashish Singh
> 
>
