[ https://issues.apache.org/jira/browse/SPARK-17093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen updated SPARK-17093:
-------------------------------
    Priority: Blocker  (was: Critical)

> Roundtrip encoding of array<struct<>> fields is wrong when whole-stage codegen is disabled
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-17093
>                 URL: https://issues.apache.org/jira/browse/SPARK-17093
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Josh Rosen
>            Priority: Blocker
>
> The following failing test demonstrates a bug where Spark mis-encodes
> array-of-struct fields if whole-stage codegen is disabled:
> {code}
> withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
>   val data = Array(Array((1, 2), (3, 4)))
>   val ds = spark.sparkContext.parallelize(data).toDS()
>   assert(ds.collect() === data)
> }
> {code}
> When whole-stage codegen is enabled (the default), this works fine. When it's
> disabled, as in the test above, Spark returns {{Array(Array((3,4), (3,4)))}}.
> Because the last element of the array appears to be repeated, my best guess is
> that the interpreted evaluation codepath forgot to {{copy()}} somewhere.
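
The repeated-last-element symptom is what one would expect when every element is written into a single reused mutable row and the references are kept without copying. Below is a minimal sketch of that failure mode in plain Scala. It is not Spark's actual code path, and the cause is only guessed at in the report; {{MutableRow}}, {{CopyBugSketch}} and {{decode}} are hypothetical names invented for this illustration.

```scala
// Hypothetical stand-in for a mutable row that is reused between elements.
// This is NOT a Spark class; it only mimics the "reuse one buffer" pattern.
final class MutableRow(var a: Int, var b: Int) {
  // Snapshot of the current contents, independent of later mutation.
  def copy(): MutableRow = new MutableRow(a, b)
  def toTuple: (Int, Int) = (a, b)
}

object CopyBugSketch {
  // Writes each input pair into one shared row, then keeps either the
  // shared reference (copyRows = false) or a snapshot (copyRows = true).
  def decode(input: Seq[(Int, Int)], copyRows: Boolean): Seq[(Int, Int)] = {
    val reused = new MutableRow(0, 0)
    val kept = input.map { case (a, b) =>
      reused.a = a
      reused.b = b
      if (copyRows) reused.copy() else reused
    }
    // Without copies, every kept entry aliases the same row, which now
    // holds the values of the last element that was written.
    kept.map(_.toTuple)
  }

  def main(args: Array[String]): Unit = {
    val data = Seq((1, 2), (3, 4))
    println(decode(data, copyRows = false)) // List((3,4), (3,4))
    println(decode(data, copyRows = true))  // List((1,2), (3,4))
  }
}
```

With {{copyRows = false}} the output matches the shape of the reported result, which is why a missing {{copy()}} is a plausible first suspect; confirming it would still require inspecting the interpreted evaluation path itself.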