[jira] [Commented] (SPARK-22474) cannot read a parquet file containing a Seq[Map[MyCaseClass, String]]

2019-09-18 Thread Arun Khetarpal (Jira)


[ https://issues.apache.org/jira/browse/SPARK-22474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932257#comment-16932257 ]

Arun Khetarpal commented on SPARK-22474:
----------------------------------------

Why is this closed? It seems we still hit this issue with 2.4.0

> cannot read a parquet file containing a Seq[Map[MyCaseClass, String]]
> ---------------------------------------------------------------------
>
>                 Key: SPARK-22474
>                 URL: https://issues.apache.org/jira/browse/SPARK-22474
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.2, 2.2.0
>            Reporter: Mikael Valot
>            Priority: Major
>              Labels: bulk-closed
>
> The following code, run in spark-shell, throws an exception. It works fine
> in Spark 2.0.2.
> {code:java}
> case class MyId(v: String)
> case class MyClass(infos: Seq[Map[MyId, String]])
> val seq = Seq(MyClass(Seq(Map(MyId("1234") -> "blah"))))
> seq.toDS().write.parquet("/tmp/myclass")
> spark.read.parquet("/tmp/myclass").as[MyClass].collect()
> Caused by: org.apache.spark.sql.AnalysisException: Map key type is expected to be a primitive type, but found: required group key {
>   optional binary v (UTF8);
> };
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkConversionRequirement(ParquetSchemaConverter.scala:581)
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertGroupField$2.apply(ParquetSchemaConverter.scala:246)
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertGroupField$2.apply(ParquetSchemaConverter.scala:201)
>   at scala.Option.fold(Option.scala:158)
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertGroupField(ParquetSchemaConverter.scala:201)
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:109)
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$2.apply(ParquetSchemaConverter.scala:87)
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$2.apply(ParquetSchemaConverter.scala:84)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.org$apache$spark$sql$execution$datasources$parquet$ParquetSchemaConverter$$convert(ParquetSchemaConverter.scala:84)
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertGroupField$1.apply(ParquetSchemaConverter.scala:201)
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertGroupField$1.apply(ParquetSchemaConverter.scala:201)
>   at scala.Option.fold(Option.scala:158)
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertGroupField(ParquetSchemaConverter.scala:201)
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:109)
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$ParquetArrayConverter.<init>(ParquetRowConverter.scala:483)
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter.org$apache$spark$sql$execution$datasources$parquet$ParquetRowConverter$$newConverter(ParquetRowConverter.scala:298)
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$$anonfun$6.apply(ParquetRowConverter.scala:183)
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$$anonfun$6.apply(ParquetRowConverter.scala:180)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter.<init>(ParquetRowConverter.scala:180)
>   at ...
> {code}
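
A workaround that avoids the non-primitive map key altogether is to flatten the case-class key to its underlying String before writing and rebuild it on read. A minimal spark-shell sketch, assuming the MyId/MyClass definitions from the report; MyClassFlat is a hypothetical helper, not part of the reported code:

{code:java}
// Run in spark-shell, where spark and spark.implicits._ are already in scope.
case class MyId(v: String)
case class MyClass(infos: Seq[Map[MyId, String]])

// Hypothetical mirror of MyClass whose map key is a primitive (String),
// which the Parquet schema converter accepts.
case class MyClassFlat(infos: Seq[Map[String, String]])

val data = Seq(MyClass(Seq(Map(MyId("1234") -> "blah"))))

// Flatten MyId to its String field before writing...
val flat = data.map(c => MyClassFlat(c.infos.map(_.map { case (k, v) => k.v -> v })))
flat.toDS().write.parquet("/tmp/myclass_flat")

// ...and rebuild the case-class key after reading.
spark.read.parquet("/tmp/myclass_flat").as[MyClassFlat]
  .map(c => MyClass(c.infos.map(_.map { case (k, v) => MyId(k) -> v })))
  .collect()
{code}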

[jira] [Commented] (SPARK-22474) cannot read a parquet file containing a Seq[Map[MyCaseClass, String]]

2017-11-09 Thread Kazuaki Ishizaki (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-22474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16245915#comment-16245915 ]

Kazuaki Ishizaki commented on SPARK-22474:
------------------------------------------

This check was introduced by [this PR|https://github.com/apache/spark/commit/8ab50765cd793169091d983b50d87a391f6ac1f4]. While I did not run it with Spark 2.0.2, Spark 2.0.2 seems to include this PR, too.
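
For context, the requirement that fires here is the map-key check in convertGroupField; a paraphrase of the call in the affected versions (not a verbatim copy of the Spark source):

{code:java}
// Paraphrase of the map-key check near ParquetSchemaConverter.scala:246 in the
// affected versions; keyType is the Parquet type of the MAP group's key field.
ParquetSchemaConverter.checkConversionRequirement(
  keyType.isPrimitive,
  s"Map key type is expected to be a primitive type, but found: $keyType")
{code}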


[jira] [Commented] (SPARK-22474) cannot read a parquet file containing a Seq[Map[MyCaseClass, String]]

2017-11-09 Thread Mikael Valot (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-22474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16245875#comment-16245875 ]

Mikael Valot commented on SPARK-22474:
--------------------------------------

I can read the file if I comment out line 581 in 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter
{code:java}
  def checkConversionRequirement(f: => Boolean, message: String): Unit = {
    if (!f) {
      // throw new AnalysisException(message)
    }
  }
{code}
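
Worth noting: this disables every conversion requirement in the converter, not only the map-key check, so it is a local experiment that confirms the diagnosis rather than a fix. A sketch of a variant of the same patch that at least surfaces the violated requirement instead of dropping it silently:

{code:java}
  // Same local experiment, but print the violated requirement to stderr so
  // other relaxed checks remain visible (sketch only, not a proposed fix).
  def checkConversionRequirement(f: => Boolean, message: String): Unit = {
    if (!f) {
      System.err.println(s"Relaxed conversion requirement: $message")
      // was: throw new AnalysisException(message)
    }
  }
{code}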


