panbingkun commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-2010991612
- Why does `to_csv` display inconsistent results in Scala and Python for this case? On the Python side, the value ultimately ends up as `GenericArrayData`, which happens to implement `toString`, so `to_csv` displays readable text. https://github.com/apache/spark/blob/11247d804cd370aaeb88736a706c587e7f5c83b3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala#L85
  On the Scala side, however, the value ultimately ends up as `UnsafeArrayData`. Unfortunately, that class does not implement `toString` and falls back to the default `Object.toString`, so `to_csv` displays an opaque object identifier (the address of the object) rather than the array contents.
- During the implementation of this PR, one commit made `to_csv` display non-standard but pretty strings, as follows: https://github.com/apache/spark/pull/44665/commits/9695e975f3299556e7c268918ecd51be7a03c157
  <img width="605" alt="image" src="https://github.com/apache/spark/assets/15246973/fd07dc0a-4d61-4663-8631-daff518da278">
  The disadvantage of this approach is that the output cannot currently be read back through `from_csv`. If the discussion concludes that this is acceptable, it should be easy to bring this feature back.
- Another possible compromise is to add a configuration. By default, `to_csv` would not support displaying data of type [Array, Map, Struct ...] as non-standard but pretty strings. Should we restore the original behavior when the user enables this configuration?
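
The `toString` difference described in the first point can be illustrated with a small analogy. This is only a sketch in plain Python, not Spark code: `GenericArrayLike`, `UnsafeArrayLike`, and `to_csv_line` are hypothetical names invented for the example, and Python's default object representation stands in for the JVM's default `Object.toString`.

```python
class GenericArrayLike:
    """Hypothetical stand-in for a container that overrides its string method."""

    def __init__(self, values):
        self.values = list(values)

    def __str__(self):
        # Readable rendering of the contents.
        return "[" + ",".join(str(v) for v in self.values) + "]"


class UnsafeArrayLike:
    """Hypothetical stand-in for a container that keeps the inherited default."""

    def __init__(self, values):
        self.values = list(values)


def to_csv_line(fields):
    # Naive writer: stringify each field and join with commas.
    # It does no quoting or escaping; it only shows where str() is called.
    return ",".join(str(field) for field in fields)


readable = to_csv_line([1, GenericArrayLike([1, 2, 3])])
opaque = to_csv_line([1, UnsafeArrayLike([1, 2, 3])])

print(readable)  # 1,[1,2,3]
print(opaque)    # second field is the default object representation
```

The writer is identical in both calls. Only the field's own string method decides whether the output shows the contents or an object identifier that varies from run to run. The readable form also contains unquoted commas, which gives a sense of why such output is non-standard and hard to parse back.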