Burak Yavuz created SPARK-15835:
-----------------------------------

             Summary: The read path of json doesn't support write path when schema contains Options
                 Key: SPARK-15835
                 URL: https://issues.apache.org/jira/browse/SPARK-15835
             Project: Spark
          Issue Type: Bug
            Reporter: Burak Yavuz



My schema contains optional fields. When these fields are written as JSON and
every record has None for them, the fields are omitted from the output. On
read, the inferred schema therefore lacks these fields, and converting the
result back to the Dataset throws an exception.
Either the writer should emit the fields as `null`, or the Dataset should not
require a field to exist in the DataFrame when that field is an Option (which
may be the better solution).

{code}
case class Bug(field1: String, field2: Option[String])
Seq(Bug("abc", None)).toDS.write.json("/tmp/sqlBug")
spark.read.json("/tmp/sqlBug").as[Bug]
{code}

Stack trace:
{code}
org.apache.spark.sql.AnalysisException: cannot resolve '`field2`' given input columns: [field1]
        at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:62)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:59)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:287)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:287)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:68)
{code}
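
A possible workaround until this is fixed, sketched under the assumption that
the reader is given the schema explicitly instead of inferring it from the
files: derive the schema from the case class and pass it to the reader, so
that the omitted column is still present and comes back as null (and hence as
None). The object name ExplicitSchemaWorkaround and the helper roundTrip are
hypothetical names used only for illustration; this is not the proposed fix
itself.

{code}
import org.apache.spark.sql.{Encoders, SparkSession}

// Same case class as in the reproduction above; kept top-level so that
// Spark can derive an encoder for it.
case class Bug(field1: String, field2: Option[String])

object ExplicitSchemaWorkaround {
  // Hypothetical helper: writes one record whose optional field is None,
  // then reads it back using a schema derived from the case class.
  def roundTrip(spark: SparkSession, path: String): Seq[Bug] = {
    import spark.implicits._

    // field2 is omitted from the JSON output because it is None.
    Seq(Bug("abc", None)).toDS.write.mode("overwrite").json(path)

    // Schema taken from the case class rather than inferred from the data,
    // so it still contains field2 even though no record carries it.
    val schema = Encoders.product[Bug].schema

    spark.read.schema(schema).json(path).as[Bug].collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-15835-workaround")
      .getOrCreate()
    try roundTrip(spark, "/tmp/sqlBug").foreach(println)
    finally spark.stop()
  }
}
{code}

This only sidesteps schema inference on the read side; the written files
still omit the field.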


