-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28372/#review62911
-----------------------------------------------------------

ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ColInfoFromParquetFile.java
<https://reviews.apache.org/r/28372/#comment105080>

This class will need more work in order to detect unannotated types as specified in the following ticket and spec:
https://issues.apache.org/jira/browse/HIVE-8909
https://github.com/rdblue/incubator-parquet-format/blob/PARQUET-113-add-list-and-map-spec/LogicalTypes.md

I was going to add more comments, but then I noticed that this code will look a little similar (for loops and converters) to the HIVE-8909 patch. Of course, the HIVE-8909 patch returns converter objects, and this returns the column names & types.

So I was wondering whether we could use the DataWritableRecordConverter.java class to get the correct converters, and then translate the converters to column names & types.

This is what I found while debugging valid Parquet files. Each example has 3 blocks:
- Parquet file schema
- Hive column names & types
- converters returned by DataWritableRecordConverter

Could we use the converter objects and translate them to names?

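To make the question concrete, here is a rough sketch of what such a translation could look like. It does not use the real Hive converter classes: Node, Primitive, Repeated and Struct are hypothetical stand-ins for the converter tree shown in the examples, and the lowercase array<...> / struct<name:type> strings are only my assumption of the type notation we would want. For brevity the three example columns sit under one root struct here, although they come from three separate files.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ConverterTypeSketch {

    /** Hypothetical stand-in for one node of the converter tree; not a Hive class. */
    interface Node {
        String hiveType();
    }

    /** Leaf such as EINT64_CONVERTER or DOUBLE_CONVERTER, tagged with its Hive type name. */
    static final class Primitive implements Node {
        private final String type;
        Primitive(String type) { this.type = type; }
        public String hiveType() { return type; }
    }

    /** Stands in for both the collection converter and the repeated primitive converter. */
    static final class Repeated implements Node {
        private final Node element;
        Repeated(Node element) { this.element = element; }
        public String hiveType() { return "array<" + element.hiveType() + ">"; }
    }

    /** Stands in for the struct converter; keeps fields in declaration order. */
    static final class Struct implements Node {
        private final Map<String, Node> fields = new LinkedHashMap<>();

        Struct field(String name, Node node) {
            fields.put(name, node);
            return this;
        }

        public String hiveType() {
            List<String> parts = new ArrayList<>();
            for (Map.Entry<String, Node> e : fields.entrySet()) {
                parts.add(e.getKey() + ":" + e.getValue().hiveType());
            }
            return "struct<" + String.join(",", parts) + ">";
        }

        /** The fields of the root struct are treated as the table columns. */
        Map<String, String> columns() {
            Map<String, String> cols = new LinkedHashMap<>();
            for (Map.Entry<String, Node> e : fields.entrySet()) {
                cols.put(e.getKey(), e.getValue().hiveType());
            }
            return cols;
        }
    }

    public static void main(String[] args) {
        // Hand-built tree mirroring the three converter dumps in this review.
        Struct root = new Struct()
            .field("single_element_groups", new Repeated(new Primitive("bigint")))
            .field("locations", new Repeated(new Struct()
                .field("latitude", new Primitive("double"))
                .field("longitude", new Primitive("double"))))
            .field("list_of_ints", new Repeated(new Primitive("int")));

        for (Map.Entry<String, String> c : root.columns().entrySet()) {
            System.out.println(c.getKey() + " " + c.getValue());
        }
    }
}
```

In a real patch the walk would have to run over the converters that DataWritableRecordConverter actually builds, which may not expose their children or field names today. I have not checked that part.
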
message SingleFieldGroupInList {
  optional group single_element_groups (LIST) {
    repeated group single_element_group {
      required int64 count;
    }
  }
}

single_element_groups    ARRAY<BIGINT>

hivestructconverter
  hivecollectionconverter:elementconverter    array<>
    EINT64_CONVERTER    bigint

--------------------------------------------------------------------------------

message HiveRequiredGroupInList {
  optional group locations (LIST) {
    repeated group bag {
      required group element {
        required double latitude;
        required double longitude;
      }
    }
  }
}

locations    ARRAY<STRUCT<latitude: DOUBLE, longitude: DOUBLE>>

hivestructconverter
  hivecollectionconverter:elementconverter    array<>
    hivestructconverter    struct<>
      DOUBLE_CONVERTER    double
      DOUBLE_CONVERTER    double

--------------------------------------------------------------------------------

message UnannotatedListOfPrimitives {
  repeated int32 list_of_ints;
}

list_of_ints    ARRAY<INT>

hivestructconverter
  RepeatedPrimitiveConverter    array<>
    EINT32_CONVERTER    int

--------------------------------------------------------------------------------

- Sergio Pena


On Nov. 23, 2014, 1:34 a.m., Ashish Singh wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28372/
> -----------------------------------------------------------
>
> (Updated Nov. 23, 2014, 1:34 a.m.)
>
>
> Review request for hive.
>
>
> Bugs: HIVE-8950
>     https://issues.apache.org/jira/browse/HIVE-8950
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> HIVE-8950: Add support in ParquetHiveSerde to create table schema from a parquet file
>
>
> Diffs
> -----
>
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fafd78e63e9b41c9fdb0e017b567dc719d151784
>   data/files/data.parq PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ColInfoFromParquetFile.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe736fcf9d3715f03eed9885c299a7aa040dd
>   ql/src/test/queries/clientpositive/parquet_create_gen_schema.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_create_gen_schema1.q PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_create_gen_schema.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_create_gen_schema1.q.out PRE-CREATION
>
> Diff: https://reviews.apache.org/r/28372/diff/
>
>
> Testing
> -------
>
> Tested by adding appropriate qTests.
>
>
> Thanks,
>
> Ashish Singh
>
>