Github user mn-mikke commented on the issue:

    https://github.com/apache/spark/pull/20858

@maropu What other libraries do you mean? I'm not aware of any library providing this functionality on top of Spark SQL.

When using Spark SQL as an ETL tool for structured and nested data, people are forced to use UDFs for transforming arrays, since the current API for array columns is lacking. This approach has several drawbacks:

- poor code readability
- Catalyst is blind when performing optimizations
- inability to track the data lineage of the transformation (a key aspect for the financial industry; see [Spline](https://absaoss.github.io/spline/) and the [Spline paper](https://github.com/AbsaOSS/spline/releases/download/release%2F0.2.7/Spline_paper_IEEE_2018.pdf))

So my colleagues and I decided to extend the current Spark SQL API with well-known collection functions like `concat`, `flatten`, `zipWithIndex`, etc. We don't want to keep this functionality just in our fork of Spark, but would like to share it with others.
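For readers unfamiliar with these names: the functions proposed here mirror well-known collection operations (as found, e.g., in the Scala standard library). A minimal plain-Python sketch of their semantics on ordinary lists follows; the function names track the proposal, but the eventual Spark SQL signatures and null-handling rules may differ:

```python
# Plain-Python sketch of the semantics of the proposed array functions.
# These are illustrative reimplementations, not the Spark API itself.

def concat(*arrays):
    """concat: join several arrays into a single array, in order."""
    result = []
    for arr in arrays:
        result.extend(arr)
    return result

def flatten(array_of_arrays):
    """flatten: collapse one level of nesting."""
    return concat(*array_of_arrays)

def zip_with_index(array):
    """zipWithIndex: pair each element with its position."""
    return [(elem, i) for i, elem in enumerate(array)]
```

Expressing such operations as built-in SQL functions (rather than opaque UDFs) lets Catalyst see into the transformation, which addresses the optimization and lineage drawbacks listed above.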