[GitHub] spark issue #19831: [SPARK-22626][SQL] Wrong Hive table statistics may trigg...

2017-12-02 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19831 @cloud-fan Yes, Spark doesn't allow users to set (Spark's) statistics manually. This PR handles a row count of 0 in **Hive's stats**; it doesn't affect the logic for Spark's stats. Besides, Spark

2017-12-01 Thread wangyum
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/19831 Instead of manually setting up table statistics, I'm just trying to simulate the statistics of these tables this way. If `totalSize > 0` (or `rawDataSize > 0`) and `rowCount = 0`, at least one

2017-12-01 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19831 Is it really an issue? If you manually set wrong statistics, what would you expect the system to do? I think data source tables don't allow you to set the statistics manually, so this problem is

2017-12-01 Thread wangyum
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/19831 cc @gatorsmile @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

2017-11-30 Thread wangyum
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/19831 Yes, I saw some of these tables in my cluster, but the user did not manually modify this parameter: ``` # Detailed Table Information Database dw Table

2017-11-30 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19831 Since Hive can't prevent users from setting wrong stats properties, I think this solution can alleviate the problem. Besides, it's consistent with what we do for `totalSize` and `rawDataSize` (only use the
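A minimal Python sketch of the rule being discussed, assuming the fix amounts to ignoring a non-positive Hive row count in the same way non-positive `totalSize`/`rawDataSize` values are ignored. The property keys `totalSize`, `rawDataSize` and `numRows` are Hive's table parameter names; `TableStats`, `_positive_int` and `read_hive_stats` are hypothetical names for illustration, not Spark's API.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TableStats:
    # Hypothetical container for the stats a planner would consume.
    size_in_bytes: int
    row_count: Optional[int]


def _positive_int(props: Mapping[str, str], key: str) -> Optional[int]:
    """Return the property as an int only if it is present and > 0."""
    raw = props.get(key)
    if raw is None:
        return None
    value = int(raw)
    return value if value > 0 else None


def read_hive_stats(props: Mapping[str, str]) -> Optional[TableStats]:
    # Size: prefer totalSize, fall back to rawDataSize; each only if positive.
    size = _positive_int(props, "totalSize")
    if size is None:
        size = _positive_int(props, "rawDataSize")
    if size is None:
        return None  # no usable size stats at all
    # Row count: a 0 next to a positive size is treated as unreliable and
    # dropped, mirroring how zero sizes are already ignored above.
    return TableStats(size_in_bytes=size, row_count=_positive_int(props, "numRows"))
```

With `{"totalSize": "5000000000", "numRows": "0"}` the size is kept and the row count is dropped, so a row-count-based estimate can no longer collapse to a tiny size.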

2017-11-30 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19831 > Besides, if the size stats totalSize or rawDataSize is wrong, the problem exists whether CBO is enabled or not. > If CBO enabled, the outputRowCount == 0, the getOutputSize is 1,

2017-11-30 Thread wangyum
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/19831 If CBO is enabled and [`outputRowCount == 0`](https://github.com/apache/spark/pull/19831#L67), then [`getOutputSize`](https://github.com/apache/spark/pull/19831#L60) returns 1, so `sizeInBytes` is 1 and
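A simplified Python model of the chain described above, assuming the per-row overhead is folded into a single hypothetical `size_per_row` argument and that the broadcast check compares the estimated size against the default `spark.sql.autoBroadcastJoinThreshold` of 10 MB. `get_output_size` and `can_broadcast` here are illustrative stand-ins, not Spark's actual signatures.

```python
# Default value of spark.sql.autoBroadcastJoinThreshold (10 MB).
AUTO_BROADCAST_JOIN_THRESHOLD = 10 * 1024 * 1024


def get_output_size(output_row_count: int, size_per_row: int) -> int:
    # The estimate is never reported as 0 bytes: with 0 rows it becomes 1.
    return output_row_count * size_per_row if output_row_count > 0 else 1


def can_broadcast(size_in_bytes: int,
                  threshold: int = AUTO_BROADCAST_JOIN_THRESHOLD) -> bool:
    # A side is a broadcast candidate if its estimated size fits the threshold.
    return 0 <= size_in_bytes <= threshold
```

For a table whose metastore reports `numRows = 0`, `can_broadcast(get_output_size(0, 32))` is `True` no matter how large the table really is.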

2017-11-28 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19831 Besides, if the size stats `totalSize` or `rawDataSize` are wrong, the problem also exists whether or not CBO is enabled. We need to change that in the title too.
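To see why a wrong `totalSize` matters with or without CBO, here is a hypothetical Python model of how a Hive table's plan size might be chosen: with CBO on and a row count available, the size is derived from the row count; otherwise the metastore size is used directly. The function names, the `size_per_row` parameter and the 10 MB threshold (the documented default of `spark.sql.autoBroadcastJoinThreshold`) are assumptions for illustration.

```python
from typing import Optional

# Default value of spark.sql.autoBroadcastJoinThreshold (10 MB).
BROADCAST_THRESHOLD = 10 * 1024 * 1024


def leaf_size_in_bytes(total_size: int, row_count: Optional[int],
                       size_per_row: int, cbo_enabled: bool) -> int:
    """Hypothetical model of how a Hive table's plan size is chosen."""
    if cbo_enabled and row_count is not None:
        # CBO path: size estimated from the row count (never below 1 byte).
        return row_count * size_per_row if row_count > 0 else 1
    # Size-only path: the metastore's totalSize/rawDataSize is used as-is.
    return total_size


def would_broadcast(size_in_bytes: int) -> bool:
    return 0 <= size_in_bytes <= BROADCAST_THRESHOLD
```

With a wrong `totalSize` of 1024 bytes and no row count, `would_broadcast` returns `True` for both `cbo_enabled=False` and `cbo_enabled=True`, which is the point being made: the bad size alone is enough to pick a broadcast join.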

2017-11-28 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19831 BTW, the case here is not about join reordering; it's actually about the broadcast decision. Could you update the title of this PR?

2017-11-28 Thread wangyum
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/19831 cc @wzhfy

2017-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19831 Merged build finished. Test PASSed.

2017-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19831 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84259/ Test PASSed.

2017-11-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19831 **[Test build #84259 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84259/testReport)** for PR 19831 at commit

2017-11-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19831 **[Test build #84259 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84259/testReport)** for PR 19831 at commit

2017-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19831 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84255/ Test FAILed.

2017-11-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19831 Merged build finished. Test FAILed.

2017-11-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19831 **[Test build #84255 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84255/testReport)** for PR 19831 at commit