Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19831
@cloud-fan Yes, Spark doesn't allow users to set (Spark's) statistics
manually.
This PR handles a 0 row count in **Hive's stats**; it doesn't affect the logic
for Spark's stats. Besides, Spark
Github user wangyum commented on the issue:
https://github.com/apache/spark/pull/19831
Instead of manually setting up table statistics, I'm just trying to
simulate the statistics for these tables this way.
If `totalSize` (or `rawDataSize`) > 0 and `rowCount` = 0, at least one
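The consistency rule described here can be sketched as a small check. This is a hypothetical Python model, not Spark's (Scala) code; the function name `sanitize_row_count` and the choice to return `None` for "unknown" are assumptions for illustration.

```python
from typing import Optional


def sanitize_row_count(
    total_size: Optional[int],
    raw_data_size: Optional[int],
    row_count: Optional[int],
) -> Optional[int]:
    """Return the Hive rowCount, or None if it contradicts the size stats.

    A positive totalSize or rawDataSize means the table holds at least one
    row, so a rowCount of exactly 0 is inconsistent and is treated as unknown.
    """
    has_data = any(size is not None and size > 0 for size in (total_size, raw_data_size))
    if has_data and row_count == 0:
        return None  # unknown is safer than a misleading 0
    return row_count


# Hypothetical stats for a non-empty table whose numRows was left at 0.
print(sanitize_row_count(total_size=5_000_000_000, raw_data_size=None, row_count=0))  # prints None
```

Returning "unknown" instead of 0 lets later estimation fall back to the size stats rather than concluding the table is empty.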
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19831
Is it really an issue? If you manually set wrong statistics, what would
you expect the system to do? I think data source tables don't allow you to set the
statistics manually, so this problem is
Github user wangyum commented on the issue:
https://github.com/apache/spark/pull/19831
cc @gatorsmile @cloud-fan
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail:
Github user wangyum commented on the issue:
https://github.com/apache/spark/pull/19831
Yes, I saw some of these tables in my cluster, but the user did not
manually modify this parameter:
```
# Detailed Table Information
Database            dw
Table
```
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19831
Since Hive can't prevent users from setting wrong stats properties, I think this
solution can alleviate the problem. Besides, it's consistent with what we do
for `totalSize` and `rawDataSize` (only use the
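Assuming the truncated remark means the size stats are only used when they are positive, the same rule extends naturally to the row count. This is a hypothetical Python sketch: `totalSize`, `rawDataSize`, and `numRows` are Hive's table-property keys, but `TableStats`, `positive`, and `read_hive_stats` are illustrative names, not Spark APIs, and the preference for `totalSize` over `rawDataSize` is also an assumption.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TableStats:
    size_in_bytes: int
    row_count: Optional[int]  # None means "unknown"


def positive(value: Optional[int]) -> Optional[int]:
    """Keep a stat only if it is a positive number; otherwise treat it as unknown."""
    return value if value is not None and value > 0 else None


def read_hive_stats(params: Mapping[str, str]) -> Optional[TableStats]:
    """Build table stats from Hive table parameters, trusting only positive values."""
    def parse(key: str) -> Optional[int]:
        raw = params.get(key)
        return int(raw) if raw is not None else None

    total_size = positive(parse("totalSize"))
    raw_data_size = positive(parse("rawDataSize"))
    row_count = positive(parse("numRows"))  # same rule as the size stats

    # Prefer totalSize, fall back to rawDataSize, else report no stats at all.
    size = total_size if total_size is not None else raw_data_size
    if size is None:
        return None
    return TableStats(size_in_bytes=size, row_count=row_count)


print(read_hive_stats({"totalSize": "0", "rawDataSize": "2048", "numRows": "0"}))
# prints TableStats(size_in_bytes=2048, row_count=None)
```

Applying one `positive` filter to all three values keeps the treatment of size and row-count stats uniform.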
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19831
> Besides, if the size stats totalSize or rawDataSize is wrong, the problem
exists whether CBO is enabled or not.
> If CBO enabled, the outputRowCount == 0, the getOutputSize is 1,
Github user wangyum commented on the issue:
https://github.com/apache/spark/pull/19831
If CBO is enabled and [`outputRowCount == 0`](https://github.com/apache/spark/pull/19831#L67), then
[`getOutputSize`](https://github.com/apache/spark/pull/19831#L60) is 1,
so `sizeInBytes` is 1 and
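The chain described in this comment can be modeled in a few lines. The function below is a hypothetical Python simplification named after the linked `getOutputSize`; here the per-row width is assumed to be a fixed number passed in by the caller, and the numbers in the example are made up.

```python
def get_output_size(output_row_count: int, size_per_row: int) -> int:
    """Simplified CBO output-size estimate: row count times an estimated row width.

    A row count of 0 does not produce 0 bytes; the estimate falls back to 1.
    """
    if output_row_count > 0:
        return output_row_count * size_per_row
    return 1


# A large Hive table whose numRows was wrongly recorded as 0 (hypothetical numbers).
print(get_output_size(output_row_count=0, size_per_row=120))  # prints 1
print(get_output_size(output_row_count=40_000_000, size_per_row=120))  # prints 4800000000
```

However large the table really is, a stored row count of 0 collapses its estimated size to a single byte.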
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19831
Besides, if the size stats `totalSize` or `rawDataSize` are wrong, the
problem exists regardless of whether CBO is enabled. We need to reflect that in
the title too.
---
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19831
BTW, the case here is not about join reorder; it's actually about the broadcast
decision. Could you update the title of this PR?
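Why this is a broadcast problem: the planner compares a relation's estimated size against `spark.sql.autoBroadcastJoinThreshold` (10 MB by default). The sketch below is a hypothetical, size-only Python model; Spark's actual join selection also depends on join type and hints, which are omitted here, and the 50 GB figure is made up.

```python
# Default of spark.sql.autoBroadcastJoinThreshold: 10 MB.
DEFAULT_BROADCAST_THRESHOLD = 10 * 1024 * 1024


def can_broadcast(size_in_bytes: int, threshold: int = DEFAULT_BROADCAST_THRESHOLD) -> bool:
    """Size-only model of the broadcast decision.

    A relation qualifies when its estimated size is non-negative and does not
    exceed the threshold; a negative threshold therefore disables broadcasting.
    """
    return 0 <= size_in_bytes <= threshold


estimated = 1                    # size estimate derived from a wrong rowCount of 0
actual = 50 * 1024 ** 3          # hypothetical real size: 50 GB
print(can_broadcast(estimated))  # prints True: the table would be broadcast
print(can_broadcast(actual))     # prints False
```

A bad row count therefore changes the physical plan rather than the join order, which is why the broadcast decision is the relevant symptom.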
---
Github user wangyum commented on the issue:
https://github.com/apache/spark/pull/19831
cc @wzhfy
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19831
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19831
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84259/
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19831
**[Test build #84259 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84259/testReport)**
for PR 19831 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19831
**[Test build #84259 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84259/testReport)**
for PR 19831 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19831
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84255/
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19831
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19831
**[Test build #84255 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84255/testReport)**
for PR 19831 at commit