GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/10577

    [SPARK-12616] [SQL] Adding a New Logical Operator Unions

    The `Union` logical operator supports only two children. This PR therefore adds a new logical operator, `Unions`, which can have an arbitrary number of children.

    The `Union` logical plan is a binary node. However, a typical use case for union is to union a very large number of input sources (DataFrames, RDDs, or files). It is not uncommon to union hundreds of thousands of files. In this case, our optimizer can become very slow due to the large number of logical unions. We should change the Union logical plan to support an arbitrary number of children, and add a single rule in the optimizer to collapse all adjacent `Union`s into a single `Unions`. Note that this problem doesn't exist in the physical plan, because the physical Union already supports an arbitrary number of children.

    After this is merged, I will submit a separate PR that adds a new optimizer rule: push `Unions` through `Filter` and `Project`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark unionAllMultiChildren

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10577.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10577

----
commit 73270c8aa7b7e387e7b0e75369dfcbf8c554aa5e
Author: gatorsmile <gatorsm...@gmail.com>
Date:   2016-01-04T20:09:50Z

    added a new logical operator UNIONS

commit d9811c7bb3f2c15ef9ba6fe95ec0b09f8f66b973
Author: gatorsmile <gatorsm...@gmail.com>
Date:   2016-01-04T20:21:36Z

    Merge remote-tracking branch 'upstream/master' into unionAllMultiChildren

----
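
For readers unfamiliar with the proposed rewrite, here is a minimal sketch of collapsing adjacent binary `Union` nodes into a single n-ary `Unions` node. It is a toy model in Python, not Spark's Catalyst code: the `Relation`, `Union`, and `Unions` classes and the `collapse_unions` function are hypothetical names made up for illustration, and only leaf relations and union nodes are modelled. The traversal is iterative on the assumption that a chain of many thousands of unions could otherwise exhaust the recursion limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical leaf node standing in for one input source."""
    name: str


@dataclass(frozen=True)
class Union:
    """Hypothetical binary union node (exactly two children)."""
    left: object
    right: object


@dataclass(frozen=True)
class Unions:
    """Hypothetical n-ary union node (any number of children)."""
    children: tuple


def collapse_unions(plan):
    """Flatten a tree of adjacent Union/Unions nodes into one Unions node.

    Any plan that is not a union is returned unchanged. Child order is
    preserved left to right.
    """
    if not isinstance(plan, (Union, Unions)):
        return plan
    flat = []
    stack = [plan]
    while stack:
        node = stack.pop()
        if isinstance(node, Union):
            # Push right first so the left child is popped (visited) first.
            stack.append(node.right)
            stack.append(node.left)
        elif isinstance(node, Unions):
            stack.extend(reversed(node.children))
        else:
            flat.append(node)
    return Unions(tuple(flat))


if __name__ == "__main__":
    plan = Union(
        Union(Relation("a"), Relation("b")),
        Union(Relation("c"), Relation("d")),
    )
    collapsed = collapse_unions(plan)
    print([r.name for r in collapsed.children])  # → ['a', 'b', 'c', 'd']
```

In this sketch, three binary nodes become one node with four children, so an optimizer pass would visit a single union instead of a deep chain. A real rule would also have to descend into non-union operators, which this toy model leaves out.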