[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19560 Merged build finished. Test PASSed.

[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19560 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91092/ Test PASSed.

[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19560 **[Test build #91092 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91092/testReport)** for PR 19560 at commit

[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19560 **[Test build #91092 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91092/testReport)** for PR 19560 at commit

[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2018-03-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19560 Merged build finished. Test PASSed.

[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2018-03-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19560 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88775/ Test PASSed.

[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2018-03-31 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19560 **[Test build #88775 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88775/testReport)** for PR 19560 at commit

[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2018-03-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19560 **[Test build #88775 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88775/testReport)** for PR 19560 at commit

[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2017-11-14 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19560 @wangyum Makes sense. You can also try the approach in this PR. If there are many (tens of thousands of) ETLs in the warehouse, we cannot afford to give that many hints or fix all the

[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2017-11-14 Thread wangyum
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/19560 I also hit this issue:
```sql
select * from A join B on a.key = b.key
```
Table A is small but table B is big, and table B's stats are incorrect, so Spark will broadcast table B.

[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2017-10-25 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19560 I can see the value and also the potential extra overhead (more expensive for object stores), although this does not resolve the root cause. Before we provide adaptive runtime

[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2017-10-24 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19560 > My main concern is, we'd better not put the burden on Spark to deal with metastore failures. I think this makes sense. I was also thinking about this when proposing this PR. I do agree

[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2017-10-24 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19560 My main concern is, we'd better not put the burden on Spark to deal with metastore failures, because Spark doesn't have control over metastores. The system using Spark and metastore should be

[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2017-10-24 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19560 > Users usually do not know there is an error in stats. Aren't there any exceptions or error messages when updating the table/stats fails? I suppose the system is able to know it through logging or

[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2017-10-24 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19560 @wzhfy Thanks for the comment; I see your point. In my cluster, the namenode is under heavy pressure, so errors in stats are quite likely. Users usually do not know there is an error in stats.

[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2017-10-24 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19560 I wonder when this config should be used. If the user knows there is some error in the stats, why not just analyze the table (specifying "noscan" if only the size is needed)? This can fix the problem instead of

[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2017-10-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19560 Merged build finished. Test PASSed.

[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2017-10-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19560 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83008/ Test PASSed.

[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2017-10-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19560 **[Test build #83008 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83008/testReport)** for PR 19560 at commit

[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2017-10-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19560 **[Test build #83008 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83008/testReport)** for PR 19560 at commit

[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2017-10-24 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19560 @viirya Thanks a lot for the comments. 1. In the current change, I verify the stats against the file system only when the relation is under a join. 2. I added a warning when the size from the file system
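
The idea described in the comment above can be illustrated with a rough, self-contained sketch. This is not the code from the PR: the `Relation` class, the `effective_size` and `should_broadcast` helpers, and the rule "warn whenever the two sizes differ" are assumptions made for illustration, since the original comment is cut off before it states the warning condition. The 10 MB threshold mirrors Spark's documented default for `spark.sql.autoBroadcastJoinThreshold`.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("stats_check_sketch")

# Spark's documented default for spark.sql.autoBroadcastJoinThreshold (10 MB).
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


@dataclass
class Relation:
    """Hypothetical stand-in for a table relation in a query plan."""
    name: str
    metastore_size: int   # size recorded in the metastore, possibly stale
    filesystem_size: int  # size measured from the underlying files


def effective_size(rel: Relation, under_join: bool, verify: bool) -> int:
    """Return the size to trust for planning.

    The file system is consulted only when the relation is under a join
    and verification is enabled; otherwise the metastore value is used.
    """
    if not (under_join and verify):
        return rel.metastore_size
    if rel.filesystem_size != rel.metastore_size:
        # Assumed warning condition: any mismatch between the two sizes.
        logger.warning(
            "Size of %s differs: metastore=%d bytes, file system=%d bytes",
            rel.name, rel.metastore_size, rel.filesystem_size,
        )
    return rel.filesystem_size


def should_broadcast(rel: Relation, under_join: bool = True,
                     verify: bool = True) -> bool:
    """Decide whether the relation is small enough to broadcast."""
    return effective_size(rel, under_join, verify) <= BROADCAST_THRESHOLD_BYTES
```

With a table whose metastore entry claims 1 KB while its files occupy 50 GB, `should_broadcast(rel, verify=False)` would choose a broadcast, whereas `should_broadcast(rel, verify=True)` would not. The trade-off raised elsewhere in this thread still applies: measuring the size from the file system is an extra call that can be expensive on object stores.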

[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2017-10-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19560 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83002/ Test PASSed.

[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2017-10-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19560 Merged build finished. Test PASSed.

[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2017-10-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19560 **[Test build #83002 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83002/testReport)** for PR 19560 at commit

[GitHub] spark issue #19560: [SPARK-22334][SQL] Check table size from filesystem in c...

2017-10-23 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19560 @gatorsmile @dongjoon-hyun Thanks a lot for looking into this. This PR aims to avoid OOM when the metastore fails to update table properties after the data has already been produced. With the