[ https://issues.apache.org/jira/browse/SPARK-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng Lian updated SPARK-6587:
------------------------------
    Comment: was deleted

(was: JSON needs this kind of schema inference because JSON is weakly typed. 
The JSON sample you provided is actually considered dirty data rather than 
OO-like "polymorphism". So the type reconciliation in the case of JSON is 
designed to deal with dirty data. Scala case classes are already well typed, 
so there shouldn't be this kind of dirty, conflicting data.

I think the thing you're looking for is actually [union types in Hive|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-UnionTypes], which unfortunately are not supported in Spark SQL yet.)
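
For context, a minimal sketch of the reconciliation behavior described above (assuming a Spark 1.3-era {{SQLContext}} bound to {{sqlContext}}): when two JSON records disagree on a field's type, schema inference widens the conflicting field rather than failing.

{code}
// Minimal sketch, assuming sc and sqlContext are already set up.
// The two records disagree on the type of "v"; JSON schema inference
// reconciles the conflict by widening "v" to a string column.
val json = sc.parallelize(Seq("""{"v": 42}""", """{"v": "hello"}"""))
val df = sqlContext.jsonRDD(json)
df.printSchema()
// root
//  |-- v: string (nullable = true)
{code}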

> Inferring schema for case class hierarchy fails with mysterious message
> -----------------------------------------------------------------------
>
>                 Key: SPARK-6587
>                 URL: https://issues.apache.org/jira/browse/SPARK-6587
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0
>         Environment: At least Windows 8, Scala 2.11.2.  
>            Reporter: Spiro Michaylov
>
> (Don't know if this is a functionality bug, an error-reporting bug, or an RFE ...)
> I define the following hierarchy:
> {code}
>     private abstract class MyHolder
>     private case class StringHolder(s: String) extends MyHolder
>     private case class IntHolder(i: Int) extends MyHolder
>     private case class BooleanHolder(b: Boolean) extends MyHolder
> {code}
> and a top level case class:
> {code}
>     private case class Thing(key: Integer, foo: MyHolder)
> {code}
> When I try to convert it:
> {code}
>     val things = Seq(
>       Thing(1, IntHolder(42)),
>       Thing(2, StringHolder("hello")),
>       Thing(3, BooleanHolder(false))
>     )
>     val thingsDF = sc.parallelize(things, 4).toDF()
>     thingsDF.registerTempTable("things")
>     val all = sqlContext.sql("SELECT * from things")
> {code}
> I get the following stack trace:
> {noformat}
> Exception in thread "main" scala.MatchError: sql.CaseClassSchemaProblem.MyHolder (of class scala.reflect.internal.Types$ClassNoArgsTypeRef)
>       at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:112)
>       at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:30)
>       at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:159)
>       at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:157)
>       at scala.collection.immutable.List.map(List.scala:276)
>       at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:157)
>       at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:30)
>       at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:107)
>       at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:30)
>       at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:312)
>       at org.apache.spark.sql.SQLContext$implicits$.rddToDataFrameHolder(SQLContext.scala:250)
>       at sql.CaseClassSchemaProblem$.main(CaseClassSchemaProblem.scala:35)
>       at sql.CaseClassSchemaProblem.main(CaseClassSchemaProblem.scala)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
> {noformat}
> I wrote this to answer [a question on StackOverflow|http://stackoverflow.com/questions/29310405/what-is-the-right-way-to-represent-an-any-type-in-spark-sql], which uses a much simpler approach and suffers from the same problem.
> Looking at what seems to me to be the [relevant unit test suite|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/ScalaReflectionRelationSuite.scala], I see that this case is not covered.
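> One possible workaround (a sketch only; the {{FlatThing}} class and its field names are purely illustrative): flatten the hierarchy into a single case class with one {{Option}} field per variant, which the reflection-based schema inference handles as nullable columns:
> {code}
>     private case class FlatThing(
>         key: Integer,
>         s: Option[String] = None,
>         i: Option[Int] = None,
>         b: Option[Boolean] = None)
>     val flat = Seq(
>       FlatThing(1, i = Some(42)),
>       FlatThing(2, s = Some("hello")),
>       FlatThing(3, b = Some(false))
>     )
>     // Schema inference succeeds: key plus three nullable columns.
>     val flatDF = sc.parallelize(flat, 4).toDF()
> {code}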


