GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/10577

    [SPARK-12616] [SQL] Adding a New Logical Operator Unions

    The `Union` logical operator supports only two children. This PR adds a new 
logical operator, `Unions`, which can have an arbitrary number of children.
    
    The `Union` logical plan is a binary node. However, a typical use case for 
union is to combine a very large number of input sources (DataFrames, RDDs, or 
files). It is not uncommon to union hundreds of thousands of files. In this 
case, the optimizer can become very slow because of the large number of logical 
unions. The logical plan should support a union with an arbitrary number 
of children, with a single optimizer rule that collapses all adjacent 
`Union`s into a single `Unions`. Note that this problem does not exist in the 
physical plan, because the physical Union already supports an arbitrary number of 
children.
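
The collapsing step described above can be illustrated with a small, self-contained toy model. This is only a sketch, not the code in this PR and not Spark's Catalyst API: the classes `Relation`, `Union`, and `Unions` and the function `collapse_unions` are hypothetical stand-ins written in Python for brevity. It assumes that every non-`Union` node is a leaf, so it does not look for unions hidden beneath other operators.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Relation:
    """Hypothetical leaf node standing in for one input source."""
    name: str


@dataclass(frozen=True)
class Union:
    """Hypothetical binary union node: exactly two children."""
    left: object
    right: object


@dataclass(frozen=True)
class Unions:
    """Hypothetical n-ary union node: any number of children."""
    children: Tuple[object, ...]


def collapse_unions(plan):
    """Flatten a tree of adjacent binary Union nodes into one Unions node.

    An explicit stack is used instead of recursion so that a very deep
    chain of unions does not exhaust the interpreter's call stack.
    """
    if not isinstance(plan, Union):
        return plan  # nothing to collapse

    children = []
    stack = [plan]
    while stack:
        node = stack.pop()
        if isinstance(node, Union):
            # Push right first so the left child is visited first,
            # which keeps the children in left-to-right order.
            stack.append(node.right)
            stack.append(node.left)
        else:
            children.append(node)
    return Unions(tuple(children))


# Usage: a left-deep chain ((a UNION b) UNION c) UNION d
a, b, c, d = (Relation(n) for n in "abcd")
flat = collapse_unions(Union(Union(Union(a, b), c), d))
print([child.name for child in flat.children])
```

In this toy model the four-input chain becomes a single node with four children, so a later pass over the plan visits one union node instead of three.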
    
    After this is merged, I will submit a separate PR that adds a new optimizer 
rule: push `Unions` through `Filter` and `Project`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark unionAllMultiChildren

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10577.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10577
    
----
commit 73270c8aa7b7e387e7b0e75369dfcbf8c554aa5e
Author: gatorsmile <gatorsm...@gmail.com>
Date:   2016-01-04T20:09:50Z

    added a new logical operator UNIONS

commit d9811c7bb3f2c15ef9ba6fe95ec0b09f8f66b973
Author: gatorsmile <gatorsm...@gmail.com>
Date:   2016-01-04T20:21:36Z

    Merge remote-tracking branch 'upstream/master' into unionAllMultiChildren

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
