[GitHub] spark issue #22343: [SPARK-25391][SQL] Make behaviors consistent when conver...

2018-09-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22343 Thank YOU for your PR and the open discussion on this, @seancxmao. Let's continue in other PRs.

2018-09-16 Thread seancxmao
Github user seancxmao commented on the issue: https://github.com/apache/spark/pull/22343 Sure, I'll close this PR. Thank you all for your time and insights.

2018-09-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22343 Could you close this PR and JIRA, @seancxmao ?

2018-09-12 Thread seancxmao
Github user seancxmao commented on the issue: https://github.com/apache/spark/pull/22343 I agree that correctness is more important. If we should not make the behaviors consistent when doing the conversion, I will close this PR. @cloud-fan @gatorsmile what do you think?

2018-09-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22343 Compatibility is not a golden rule if it sacrifices correctness. A fast but **wrong** result doesn't look like a benefit to me. Do you think customers want to get a wrong result, as Hive returns?

2018-09-11 Thread seancxmao
Github user seancxmao commented on the issue: https://github.com/apache/spark/pull/22343 Setting `spark.sql.hive.convertMetastoreParquet=false` keeps Hive compatibility but loses the performance benefit. We can do better by enabling the conversion and still keeping Hive

2018-09-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22343 @seancxmao . For Hive compatibility, `spark.sql.hive.convertMetastoreParquet=false` looks sufficient to me.

2018-09-11 Thread seancxmao
Github user seancxmao commented on the issue: https://github.com/apache/spark/pull/22343 Could we treat this as a behavior change? We could add a legacy conf (e.g. `spark.sql.hive.legacy.convertMetastoreParquet`, possibly defined in HiveUtils) to let users revert to the previous

2018-09-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22343 Thank you for the pointer, @seancxmao . And thank you for the clarification, @cloud-fan . It looks like this PR somewhat re-creates a correctness issue when

2018-09-11 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22343 To clarify: this is just a workaround for when we hit a problematic Hive parquet table (one having case-insensitive duplicated field names in the parquet file) and we want to read it with the native

2018-09-10 Thread seancxmao
Github user seancxmao commented on the issue: https://github.com/apache/spark/pull/22343 @dongjoon-hyun It is a little complicated. There has been a discussion about this in #22184. Below are some key comments from @cloud-fan and @gatorsmile, just FYI. *

2018-09-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22343 What I asked was the following, wasn't it? > In case-insensitive mode, when converting hive parquet table to parquet data source, we switch the duplicated fields resolution mode to ask

2018-09-10 Thread seancxmao
Github user seancxmao commented on the issue: https://github.com/apache/spark/pull/22343 Hi, @dongjoon-hyun. When we find duplicated field names in the case of `convertMetastoreXXX`, we have 2 options: (1) raise an exception, as the parquet data source does. Most end users do not
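
To make option (1) concrete, below is a minimal Python sketch of case-insensitive field resolution against the field names physically stored in a Parquet file. It is not Spark's implementation: `resolve_field`, `DuplicateFieldError`, and the `"first"` mode are hypothetical names chosen for illustration. The `"fail"` mode raises when more than one physical field matches ignoring case; the `"first"` mode silently picks one of them, which shows how a lenient resolution can return a column the user did not intend.

```python
from typing import List, Optional


class DuplicateFieldError(RuntimeError):
    """Hypothetical error for case-insensitive duplicate field names."""


def resolve_field(requested: str, file_fields: List[str], mode: str = "fail") -> Optional[str]:
    """Resolve `requested` against physical field names, ignoring case.

    mode="fail":  raise if several physical fields match (option (1) above).
    mode="first": return the first match without complaint (hypothetical lenient mode).
    Returns None when no field matches.
    """
    if mode not in ("fail", "first"):
        raise ValueError(f"unknown mode: {mode!r}")
    # Collect every physical field whose name equals the requested one, ignoring case.
    matches = [f for f in file_fields if f.lower() == requested.lower()]
    if not matches:
        return None
    if len(matches) > 1 and mode == "fail":
        raise DuplicateFieldError(
            f"duplicate fields for {requested!r} in case-insensitive mode: {matches}"
        )
    return matches[0]
```

For a file whose schema contains both `A` and `a`, `resolve_field("a", ["id", "A", "a"])` raises, while `mode="first"` quietly returns `"A"`, whose values may differ from what the user expected.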

2018-09-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22343 Hi, @seancxmao . Should we be consistent? IIRC, all the previous PRs raise an exception to prevent any potential issues. In this case, I have a feeling that `convertMetastoreXXX` should be used

2018-09-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22343 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95864/

2018-09-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22343 Merged build finished. Test PASSed.

2018-09-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22343 **[Test build #95864 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95864/testReport)** for PR 22343 at commit

2018-09-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22343 **[Test build #95864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95864/testReport)** for PR 22343 at commit

2018-09-10 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22343 retest this please

2018-09-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22343 Merged build finished. Test FAILed.

2018-09-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22343 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95857/

2018-09-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22343 **[Test build #95857 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95857/testReport)** for PR 22343 at commit

2018-09-09 Thread seancxmao
Github user seancxmao commented on the issue: https://github.com/apache/spark/pull/22343 @dongjoon-hyun @HyukjinKwon I created a new JIRA ticket and tried to use a more complete and clear title for this PR. What do you think?