Hello,

We are trying to insert a case class into Parquet using Spark SQL. When I create the SchemaRDD from a case class that includes a Set, I get the following exception:
scala> sqc.createSchemaRDD(r)
scala.MatchError: Set[(scala.Int, scala.Int)] (of class scala.reflect.internal.Types$TypeRef$$anon$1)
	at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:53)
	at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:64)
	at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:62)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
	at scala.collection.AbstractTraversable.map(Traversable.scala:105)
	at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:62)
	at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:50)
	at org.apache.spark.sql.catalyst.ScalaReflection$.attributesFor(ScalaReflection.scala:44)
	at org.apache.spark.sql.execution.ExistingRdd$.fromProductRdd(basicOperators.scala:229)
	at org.apache.spark.sql.SQLContext.createSchemaRDD(SQLContext.scala:94)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:54)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:56)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:58)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:60)
	at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:62)
	at $iwC$$iwC$$iwC$$iwC.<init>(<console>:64)
	at $iwC$$iwC$$iwC.<init>(<console>:66)
	at $iwC$$iwC.<init>(<console>:68)
	at $iwC.<init>(<console>:70)
	at <init>(<console>:72)
	at .<init>(<console>:76)
	at .<clinit>(<console>)
	at .<init>(<console>:7)
	at .<clinit>(<console>)
	at $print(<console>)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
	at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
	at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:615)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:646)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:610)
	at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:814)
	at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:859)
	at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:771)
	at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:616)
	at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:624)
	at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:629)
	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:954)
	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:902)
	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:902)
	at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:902)
	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:997)
	at org.apache.spark.repl.Main$.main(Main.scala:31)
	at org.apache.spark.repl.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Here is a code snippet that reproduces it:

scala> case class A(a: Set[(Int, Int)])
defined class A

scala> val a = A(Set((1, 2), (1, 3)))
a: A = A(Set((1,2), (1,3)))

scala> val r = sc.parallelize(Array(a))
r: org.apache.spark.rdd.RDD[A] = ParallelCollectionRDD[17] at parallelize at <console>:44

scala> sqc.createSchemaRDD(r)
scala.MatchError: Set[(scala.Int, scala.Int)] (of class scala.reflect.internal.Types$TypeRef$$anon$1)
	at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:53)
	at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:64)
	at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:62)
	....

We have tested this code with Spark 1.1.0 and 1.2.0. Is this the expected behaviour, or are we doing something wrong?

Regards,

Jorge López-Malla Matute
Big Data Developer
Vía de las Dos Castillas, 33. Ática 4. 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: 91 828 64 73 // @stratiobd
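P.S. In case it is useful, here is a sketch of the workaround we are trying in the meantime. It assumes the root cause is that ScalaReflection.schemaFor has no case for Set (while Seq is supported and maps to an ArrayType), so we mirror the case class with a Seq field and convert before calling createSchemaRDD. The ASeq class and toSeqRecord helper are just illustrative names, not part of any API:

```scala
// Original case class: the Set field is what triggers the MatchError
// in ScalaReflection.schemaFor on Spark 1.1.0/1.2.0.
case class A(a: Set[(Int, Int)])

// Mirror case class using Seq, which schemaFor does handle
// (illustrative name, not a Spark type).
case class ASeq(a: Seq[(Int, Int)])

// Convert each record before building the SchemaRDD,
// e.g. sqc.createSchemaRDD(r.map(toSeqRecord))
def toSeqRecord(x: A): ASeq = ASeq(x.a.toSeq)
```

The conversion loses the set semantics in the stored schema (duplicates would no longer be collapsed after a round trip), so deduplication would have to be reapplied when reading back, if that matters for the use case.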