[GitHub] spark pull request: [SPARK-3414][SQL] Replace LowerCaseSchema with...

marmbrus Sat, 13 Sep 2014 12:18:55 -0700

GitHub user marmbrus opened a pull request:

    https://github.com/apache/spark/pull/2382


    [SPARK-3414][SQL] Replace LowerCaseSchema with Resolver

    _This PR is a follow up to #2293 (and to a lesser extent #2262 #2334)._
    
    In #2293 the catalog was changed to store analyzed logical plans instead of
unresolved ones.  While this change fixed the reported bug (which was caused by
yet another instance of us forgetting to put in a `LowerCaseSchema` operator),
it had the consequence of breaking assumptions made by `MultiInstanceRelation`.
Specifically, we can't swap out leaf operators in a tree without
rewriting changed expression ids (which happens when you self join the same RDD
that has been registered as a temp table).
    
    In this PR, I remove the need to insert `LowerCaseSchema` operators
at all, and instead move the concern of matching up identifiers completely into
analysis.  Doing so allows the test cases from both #2293 and #2262 to pass at
the same time (and likely fixes a slew of other "unknown unknown" bugs).
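
    As a rough illustration of what moving identifier matching into analysis could look like, the sketch below models a resolver as a plain name-comparison function that the lookup consults. This is an assumption about the general shape, not the code in this PR: `ResolverSketch`, `Attribute`, `resolve`, `caseSensitive` and `caseInsensitive` are all hypothetical names chosen for the example.

```scala
// Hypothetical sketch: none of these names are taken from the patch itself.
object ResolverSketch {
  // Assumption: a resolver is just a predicate deciding whether two
  // identifiers should be treated as the same name.
  type Resolver = (String, String) => Boolean

  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Simplified stand-in for an attribute in a logical plan's output.
  final case class Attribute(name: String, exprId: Long)

  // Look up `name` among a plan's output attributes using the supplied
  // resolver. Returns None when nothing matches and fails on ambiguity.
  def resolve(
      name: String,
      output: Seq[Attribute],
      resolver: Resolver): Option[Attribute] =
    output.filter(a => resolver(a.name, name)) match {
      case Seq()       => None
      case Seq(single) => Some(single)
      case many =>
        sys.error(
          s"Ambiguous reference '$name': ${many.map(_.name).mkString(", ")}")
    }
}
```

    Because the comparison is passed in at lookup time, the schema keeps its original casing and no extra operator has to be inserted into the plan to normalize names; switching between case-sensitive and case-insensitive behavior only means passing a different function.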
    
    While it is rolled back in this PR, storing the analyzed plan might
actually be a good idea.  For instance, it is kind of confusing if you register
a temporary table, change the case sensitivity of resolution, and then can no
longer query that table.  This can be addressed in a follow-up PR.
    
    Follow-ups:
     - Configurable case sensitivity
     - Consider storing analyzed plans for temp tables

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark lowercase

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2382.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2382
    
----
commit 19a61b93b2d3f835d1f1e286b2c91cfcf16371f8
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-08-27T19:43:38Z

    Decrease partitions when testing

commit 76b3e04a7140fb4e06bc51827341659282bff2f8
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-08-27T20:40:48Z

    increase test parallelism

commit fd7b671267687d46d4b7304cd5cdd44c5c926b2f
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-09-09T01:58:28Z

    Make parquet tests less order dependent

commit dc7cb6ea81f5cd1da2247a43e8b4f81e78b52715
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-09-10T20:59:40Z

    more test fixes

commit 9188b59b0b741376608b91dd71afdd3b90ac07f9
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-09-13T05:55:19Z

    Merge branch 'shufflePartitions' into lowercase

commit c2f2ec8b5967b3e69efee4ba054afb96320ac673
Author: Michael Armbrust <mich...@databricks.com>
Date:   2014-09-13T19:00:29Z

    Replace LowerCaseSchema with Resolver.

----


