[jira] [Commented] (SPARK-22474) cannot read a parquet file containing a Seq[Map[MyCaseClass, String]]
[ https://issues.apache.org/jira/browse/SPARK-22474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932257#comment-16932257 ]

Arun Khetarpal commented on SPARK-22474:
----------------------------------------

Why is this closed? It seems we still hit this issue with 2.4.0.

> cannot read a parquet file containing a Seq[Map[MyCaseClass, String]]
> ---------------------------------------------------------------------
>
>                 Key: SPARK-22474
>                 URL: https://issues.apache.org/jira/browse/SPARK-22474
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.2, 2.2.0
>            Reporter: Mikael Valot
>            Priority: Major
>              Labels: bulk-closed
>
> The following code, run in spark-shell, throws an exception. It works fine in Spark 2.0.2.
> {code:java}
> case class MyId(v: String)
> case class MyClass(infos: Seq[Map[MyId, String]])
> val seq = Seq(MyClass(Seq(Map(MyId("1234") -> "blah"))))
> seq.toDS().write.parquet("/tmp/myclass")
> spark.read.parquet("/tmp/myclass").as[MyClass].collect()
>
> Caused by: org.apache.spark.sql.AnalysisException: Map key type is expected to be a primitive type, but found: required group key {
>   optional binary v (UTF8);
> };
> 	at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkConversionRequirement(ParquetSchemaConverter.scala:581)
> 	at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertGroupField$2.apply(ParquetSchemaConverter.scala:246)
> 	at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertGroupField$2.apply(ParquetSchemaConverter.scala:201)
> 	at scala.Option.fold(Option.scala:158)
> 	at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertGroupField(ParquetSchemaConverter.scala:201)
> 	at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:109)
> 	at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$2.apply(ParquetSchemaConverter.scala:87)
> 	at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$2.apply(ParquetSchemaConverter.scala:84)
> 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> 	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> 	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> 	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> 	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> 	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> 	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> 	at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.org$apache$spark$sql$execution$datasources$parquet$ParquetSchemaConverter$$convert(ParquetSchemaConverter.scala:84)
> 	at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertGroupField$1.apply(ParquetSchemaConverter.scala:201)
> 	at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertGroupField$1.apply(ParquetSchemaConverter.scala:201)
> 	at scala.Option.fold(Option.scala:158)
> 	at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertGroupField(ParquetSchemaConverter.scala:201)
> 	at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:109)
> 	at org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$ParquetArrayConverter.<init>(ParquetRowConverter.scala:483)
> 	at org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter.org$apache$spark$sql$execution$datasources$parquet$ParquetRowConverter$$newConverter(ParquetRowConverter.scala:298)
> 	at org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$$anonfun$6.apply(ParquetRowConverter.scala:183)
> 	at org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$$anonfun$6.apply(ParquetRowConverter.scala:180)
> 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> 	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> 	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> 	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> 	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> 	at org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter.<init>(ParquetRowConverter.scala:180)
> 	at ...
> {code}
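For reference, the failure in the report above is specific to the non-primitive map key: the same round trip with a primitive key type succeeds. A minimal spark-shell sketch illustrating the contrast (`OkClass`/`BadClass` are hypothetical names, not part of the report):

{code:java}
// Works: the map key is a primitive (String), which the Parquet converter accepts.
case class OkClass(infos: Seq[Map[String, String]])
Seq(OkClass(Seq(Map("1234" -> "blah")))).toDS().write.parquet("/tmp/okclass")
spark.read.parquet("/tmp/okclass").as[OkClass].collect()

// Fails on read: the map key is a case class, written by Spark as a Parquet
// group, which the schema converter rejects as a map key type.
case class MyId(v: String)
case class BadClass(infos: Seq[Map[MyId, String]])
Seq(BadClass(Seq(Map(MyId("1234") -> "blah")))).toDS().write.parquet("/tmp/badclass")
spark.read.parquet("/tmp/badclass").as[BadClass].collect()  // AnalysisException
{code}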
[ https://issues.apache.org/jira/browse/SPARK-22474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245915#comment-16245915 ]

Kazuaki Ishizaki commented on SPARK-22474:
------------------------------------------

This check was introduced by [this PR|https://github.com/apache/spark/commit/8ab50765cd793169091d983b50d87a391f6ac1f4]. While I did not run it with Spark 2.0.2, Spark 2.0.2 seems to include this PR, too.
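For context, the check discussed here boils down to rejecting complex map key types during Parquet schema conversion. A rough sketch of that rule over a Catalyst schema (my own illustration, not Spark's actual code; `mapKeysArePrimitive` is a hypothetical name):

{code:java}
import org.apache.spark.sql.types._

// Rough sketch (not Spark's actual code) of the requirement the converter
// enforces: no MapType anywhere in the schema may have a complex key type.
def mapKeysArePrimitive(dt: DataType): Boolean = dt match {
  case MapType(key, value, _) =>
    val keyOk = key match {
      case _: StructType | _: ArrayType | _: MapType => false // complex key: rejected
      case _                                         => true  // primitive key: fine
    }
    keyOk && mapKeysArePrimitive(value)
  case ArrayType(element, _) => mapKeysArePrimitive(element)
  case s: StructType         => s.fields.forall(f => mapKeysArePrimitive(f.dataType))
  case _                     => true
}
{code}

Under this rule, `Seq[Map[MyId, String]]` fails because `MyId` maps to a `StructType` key nested inside an `ArrayType`.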
[ https://issues.apache.org/jira/browse/SPARK-22474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245875#comment-16245875 ]

Mikael Valot commented on SPARK-22474:
--------------------------------------

I can read the file if I comment out line 581 in org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter:

{code:java}
def checkConversionRequirement(f: => Boolean, message: String): Unit = {
  if (!f) {
    // throw new AnalysisException(message)
  }
}
{code}
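Disabling that check requires a custom Spark build. A user-side alternative (my own suggestion, not from the thread) is to keep the map key primitive at the storage boundary and rebuild the case-class key after reading; `MyClassStored`, `toStored`, and `fromStored` below are hypothetical names:

{code:java}
case class MyId(v: String)
case class MyClass(infos: Seq[Map[MyId, String]])
// Hypothetical storage-side shape with a primitive map key.
case class MyClassStored(infos: Seq[Map[String, String]])

// Flatten the case-class key to its underlying String before writing...
def toStored(c: MyClass): MyClassStored =
  MyClassStored(c.infos.map(_.map { case (k, v) => k.v -> v }))

// ...and rebuild it after reading.
def fromStored(c: MyClassStored): MyClass =
  MyClass(c.infos.map(_.map { case (k, v) => MyId(k) -> v }))

// In spark-shell (encoders come from spark.implicits._):
// seq.map(toStored).toDS().write.parquet("/tmp/myclass")
// spark.read.parquet("/tmp/myclass").as[MyClassStored].map(fromStored).collect()
{code}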