Josh Rosen created SPARK-17093:
----------------------------------

             Summary: Roundtrip encoding of array<struct<>> fields is wrong when whole-stage codegen is disabled
                 Key: SPARK-17093
                 URL: https://issues.apache.org/jira/browse/SPARK-17093
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.0
            Reporter: Josh Rosen
            Priority: Critical

The following failing test demonstrates a bug where Spark mis-encodes array-of-struct fields if whole-stage codegen is disabled:

{code}
withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
  val data = Array(Array((1, 2), (3, 4)))
  val ds = spark.sparkContext.parallelize(data).toDS()
  assert(ds.collect() === data)
}
{code}

When whole-stage codegen is enabled (the default), this works fine. When it's disabled, as in the test above, Spark returns {{Array(Array((3,4), (3,4)))}}. Because the last element of the array appears to be repeated, my best guess is that the interpreted evaluation codepath forgot to {{copy()}} somewhere.
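
To illustrate the kind of mistake this guess refers to, here is a minimal, self-contained sketch in plain Scala. It is not Spark code and does not reproduce the actual codepath: {{MutableRow}}, {{ReuseDemo}}, {{withoutCopy}} and {{withCopy}} are hypothetical names invented for the example. It only shows how storing references to a single reused mutable buffer, without calling {{copy()}}, makes every collected element look like the last one written.

```scala
// Hypothetical stand-in for a reused mutable row buffer; NOT Spark's InternalRow.
final class MutableRow(var a: Int, var b: Int) {
  def copy(): MutableRow = new MutableRow(a, b)
  def toTuple: (Int, Int) = (a, b)
}

object ReuseDemo {
  // Buggy pattern: every element keeps a reference to the same buffer,
  // so all of them observe the values written last.
  def withoutCopy(input: List[(Int, Int)]): List[(Int, Int)] = {
    val buffer = new MutableRow(0, 0)
    val refs = input.map { case (a, b) =>
      buffer.a = a
      buffer.b = b
      buffer // shared reference, no copy
    }
    refs.map(_.toTuple)
  }

  // Fixed pattern: snapshot the buffer before it is overwritten.
  def withCopy(input: List[(Int, Int)]): List[(Int, Int)] = {
    val buffer = new MutableRow(0, 0)
    val rows = input.map { case (a, b) =>
      buffer.a = a
      buffer.b = b
      buffer.copy()
    }
    rows.map(_.toTuple)
  }

  def main(args: Array[String]): Unit = {
    val data = List((1, 2), (3, 4))
    println(withoutCopy(data)) // List((3,4), (3,4))
    println(withCopy(data))    // List((1,2), (3,4))
  }
}
```

The output of {{withoutCopy}} has the same shape as the wrong result reported above, which is why a missing {{copy()}} seems a plausible cause; whether that is what actually happens in the interpreted path still needs to be confirmed.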