Josh Rosen created SPARK-17093:
----------------------------------

             Summary: Roundtrip encoding of array<struct<>> fields is wrong 
when whole-stage codegen is disabled
                 Key: SPARK-17093
                 URL: https://issues.apache.org/jira/browse/SPARK-17093
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.0
            Reporter: Josh Rosen
            Priority: Critical


The following failing test demonstrates a bug where Spark mis-encodes 
array-of-struct fields if whole-stage codegen is disabled:

{code}
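// Force the interpreted (non-whole-stage-codegen) evaluation path.
withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
  val data = Array(Array((1, 2), (3, 4)))
  val ds = spark.sparkContext.parallelize(data).toDS()
  // Round trip through the encoder; expected to return the input unchanged.
  assert(ds.collect() === data)
}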
{code}

When whole-stage codegen is enabled (the default), this works fine. When it's 
disabled, as in the test above, Spark returns {{Array(Array((3,4), (3,4)))}}. 
Because the last element of the array appears to be repeated, my best guess is 
that the interpreted evaluation codepath is missing a {{copy()}} call somewhere.
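
To illustrate the kind of mistake suspected above, here is a minimal, Spark-free 
sketch of how reusing one mutable object without copying it can make every 
element of a collection look like the last one. It is only an analogy for the 
guess in this report, not the confirmed root cause; {{MutablePair}} and 
{{CopyDemo}} are hypothetical names and are not Spark classes.

```scala
// Hypothetical stand-in for a reused mutable row; NOT a Spark class.
final class MutablePair(var a: Int, var b: Int) {
  def copy(): MutablePair = new MutablePair(a, b)
  def toTuple: (Int, Int) = (a, b)
}

object CopyDemo {
  // Buggy variant: every output slot holds a reference to the same buffer,
  // so all slots end up showing the values written last.
  def withoutCopy(input: Seq[(Int, Int)]): Seq[(Int, Int)] = {
    val reused = new MutablePair(0, 0)
    val out = input.map { case (a, b) =>
      reused.a = a
      reused.b = b
      reused // stored without copying
    }
    out.map(_.toTuple)
  }

  // Fixed variant: snapshot the buffer before storing it.
  def withCopy(input: Seq[(Int, Int)]): Seq[(Int, Int)] = {
    val reused = new MutablePair(0, 0)
    val out = input.map { case (a, b) =>
      reused.a = a
      reused.b = b
      reused.copy() // each slot gets its own object
    }
    out.map(_.toTuple)
  }
}
```

With the input {{Seq((1, 2), (3, 4))}}, {{CopyDemo.withoutCopy}} yields 
{{List((3,4), (3,4))}}, the same shape as the wrong result above, while 
{{CopyDemo.withCopy}} returns the input unchanged.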


