Hello,

On Netflix's algorithm team, we work a lot on ranking problems, where we naturally deal with datasets that contain nested lists of structs. We built Scala APIs such as map, filter, drop, and withColumn that operate efficiently on nested lists of structs using SQL expressions with codegen.
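
To make the intended semantics concrete, here is a minimal sketch in plain Python, not Spark and not the API proposed in the JIRA. The helper names (filter_items, drop_fields, with_field, map_items) and the sample row are hypothetical, chosen only to illustrate what filtering, dropping a field, and adding a field would mean for a nested list of structs inside a single row.

```python
from typing import Any, Callable, Dict, List

Struct = Dict[str, Any]  # a struct is modeled as a plain dict (assumption)


def map_items(items: List[Struct], fn: Callable[[Struct], Struct]) -> List[Struct]:
    """Apply fn to a copy of every struct in the nested list."""
    return [fn(dict(item)) for item in items]


def filter_items(items: List[Struct], pred: Callable[[Struct], bool]) -> List[Struct]:
    """Keep only the structs for which pred returns True."""
    return [dict(item) for item in items if pred(item)]


def drop_fields(items: List[Struct], *names: str) -> List[Struct]:
    """Remove the named fields from every struct."""
    return [{k: v for k, v in item.items() if k not in names} for item in items]


def with_field(items: List[Struct], name: str, fn: Callable[[Struct], Any]) -> List[Struct]:
    """Add (or overwrite) one field on every struct, computed from that struct."""
    return [{**item, name: fn(item)} for item in items]


# One hypothetical row of a ranking dataset: a user with a nested list of items.
row = {
    "user_id": 1,
    "items": [
        {"title": "a", "score": 9, "debug": "x"},
        {"title": "b", "score": 2, "debug": "y"},
    ],
}

kept = filter_items(row["items"], lambda item: item["score"] > 5)
slim = drop_fields(kept, "debug")
boosted = with_field(slim, "boosted", lambda item: item["score"] + 1)
titles = map_items(row["items"], lambda item: {"title": item["title"].upper()})
```

The helpers return new lists and leave the input row untouched, which mirrors the immutable, expression-style transformations one would expect from a DataFrame API; the real implementation would express these as SQL expressions rather than Python loops.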

Here is our proposal for how the APIs would look, and we would like to share it with the community to get more feedback:

https://issues.apache.org/jira/browse/SPARK-22231

It would be great to share some building blocks with Databricks's higher-order function feature.

Thanks.

On Fri, Jun 9, 2017 at 5:04 PM, Antoine HOM <antoine....@gmail.com> wrote:
> Good news :) Thx Sameer.
>
> On Friday, June 9, 2017, Sameer Agarwal <sam...@databricks.com> wrote:
>>>
>>> * As a heavy user of complex data types I was wondering if there was
>>> any plan to push those changes upstream?
>>
>> Yes, we intend to contribute this to open source.
>>
>>>
>>> * In addition, I was wondering if as part of this change it also tries
>>> to solve the column pruning / filter pushdown issues with complex
>>> datatypes?
>>
>> For parquet, this effort is primarily tracked via SPARK-4502 (see
>> https://github.com/apache/spark/pull/16578) and is currently targeted for
>> 2.3.

--
Sincerely,

DB Tsai
----------------------------------------------------------
PGP Key ID: 0x5CED8B896A6BDFA0