Zamil Majdy created SPARK-45604:
-----------------------------------

             Summary: Converting timestamp_ntz to array<timestamp_ntz> can cause NPE or SEGFAULT on parquet vectorized reader
                 Key: SPARK-45604
                 URL: https://issues.apache.org/jira/browse/SPARK-45604
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.5.0
            Reporter: Zamil Majdy

Repro:

```
spark.conf.set("spark.databricks.photon.enabled", "false")

val path = "/tmp/somepath"
// Write a single row whose column is map<string, timestamp_ntz>.
val df = sql("SELECT MAP('key', CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ)) AS field")

df.write.mode("overwrite").parquet(path)
// Read it back with a mismatched schema: the map value is declared as array<timestamp_ntz>.
spark.read.schema("field map<string, array<timestamp_ntz>>").parquet(path).collect()
```

Depending on the memory mode in use, this produces an NPE in on-heap mode and a segfault in off-heap mode.