[ https://issues.apache.org/jira/browse/SPARK-15687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin updated SPARK-15687: -------------------------------- Priority: Critical (was: Major) > Columnar execution engine > ------------------------- > > Key: SPARK-15687 > URL: https://issues.apache.org/jira/browse/SPARK-15687 > Project: Spark > Issue Type: New Feature > Components: SQL > Reporter: Reynold Xin > Priority: Critical > > This ticket tracks progress in making the entire engine columnar, especially > in the context of nested data type support. > In Spark 2.0, we have used the internal column batch interface in Parquet > reading (via a vectorized Parquet decoder) and low cardinality aggregation. > Other parts of the engine are already using whole-stage code generation, > which is in many ways more efficient than a columnar execution engine for > flat data types. > The goal here is to figure out a story to work towards making column batch > the common data exchange format between operators outside whole-stage code > generation, as well as with external systems (e.g. Pandas). > Some of the important questions to answer are: > - What is the end state architecture? > - Should aggregation be columnar? > - Should sorting be columnar? > - How do we handle nested data types? > - What is the transition plan towards the end state? -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org