Repository: spark Updated Branches: refs/heads/master 6a68c5d7b -> 5de1737b0
[SPARK-16777][SQL] Do not use deprecated listType API in ParquetSchemaConverter ## What changes were proposed in this pull request? This PR removes build waning as below. ```scala [WARNING] .../spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala:448: method listType in object ConversionPatterns is deprecated: see corresponding Javadoc for more information. [WARNING] ConversionPatterns.listType( [WARNING] ^ [WARNING] .../spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala:464: method listType in object ConversionPatterns is deprecated: see corresponding Javadoc for more information. [WARNING] ConversionPatterns.listType( [WARNING] ^ ``` This should not use `listOfElements` (recommended to be replaced from `listType`) instead because the new method checks if the name of elements in Parquet's `LIST` is `element` in Parquet schema and throws an exception if not. However, It seems Spark prior to 1.4.x writes `ArrayType` with Parquet's `LIST` but with `array` as its element name. Therefore, this PR avoids to use both `listOfElements` and `listType` but just use the existing schema builder to construct the same `GroupType`. ## How was this patch tested? Existing tests should cover this. Author: hyukjinkwon <gurwls...@gmail.com> Closes #14399 from HyukjinKwon/SPARK-16777. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5de1737b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5de1737b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5de1737b Branch: refs/heads/master Commit: 5de1737b02710e36f6804d2ae243d1aeb30a0b32 Parents: 6a68c5d Author: hyukjinkwon <gurwls...@gmail.com> Authored: Wed Sep 28 00:39:47 2016 +0800 Committer: Cheng Lian <l...@databricks.com> Committed: Wed Sep 28 00:39:47 2016 +0800 ---------------------------------------------------------------------- .../parquet/ParquetSchemaConverter.scala | 26 +++++++++++++------- 1 file changed, 17 insertions(+), 9 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/spark/blob/5de1737b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala ---------------------------------------------------------------------- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala index c81a65f..b4f36ce 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala @@ -445,14 +445,20 @@ private[parquet] class ParquetSchemaConverter( // repeated <element-type> array; // } // } - ConversionPatterns.listType( - repetition, - field.name, - Types + + // This should not use `listOfElements` here because this new method checks if the + // element name is `element` in the `GroupType` and throws an exception if not. + // As mentioned above, Spark prior to 1.4.x writes `ArrayType` as `LIST` but with + // `array` as its element name as below. Therefore, we build manually + // the correct group type here via the builder. (See SPARK-16777) + Types + .buildGroup(repetition).as(LIST) + .addField(Types .buildGroup(REPEATED) - // "array_element" is the name chosen by parquet-hive (1.7.0 and prior version) + // "array" is the name chosen by parquet-hive (1.7.0 and prior version) .addField(convertField(StructField("array", elementType, nullable))) .named("bag")) + .named(field.name) // Spark 1.4.x and prior versions convert ArrayType with non-nullable elements into a 2-level // LIST structure. This behavior mimics parquet-avro (1.6.0rc3). Note that this case is @@ -461,11 +467,13 @@ private[parquet] class ParquetSchemaConverter( // <list-repetition> group <name> (LIST) { // repeated <element-type> element; // } - ConversionPatterns.listType( - repetition, - field.name, + + // Here too, we should not use `listOfElements`. (See SPARK-16777) + Types + .buildGroup(repetition).as(LIST) // "array" is the name chosen by parquet-avro (1.7.0 and prior version) - convertField(StructField("array", elementType, nullable), REPEATED)) + .addField(convertField(StructField("array", elementType, nullable), REPEATED)) + .named(field.name) // Spark 1.4.x and prior versions convert MapType into a 3-level group annotated by // MAP_KEY_VALUE. This is covered by `convertGroupField(field: GroupType): DataType`. --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org