[GitHub] spark pull request #19952: [SPARK-21322][SQL][followup] support histogram in...

2017-12-12 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19952#discussion_r156555044 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -147,65 +139,76 @@ object

[GitHub] spark issue #19783: [SPARK-21322][SQL] support histogram in filter cardinali...

2017-12-11 Thread ron8hu
Github user ron8hu commented on the issue: https://github.com/apache/spark/pull/19783 retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #19783: [SPARK-21322][SQL] support histogram in filter cardinali...

2017-12-11 Thread ron8hu
Github user ron8hu commented on the issue: https://github.com/apache/spark/pull/19783 For the past 2 test builds #84725 and #84732, I checked the test result on the web. Actually there were no failures. See https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84725

[GitHub] spark issue #19783: [SPARK-21322][SQL] support histogram in filter cardinali...

2017-12-11 Thread ron8hu
Github user ron8hu commented on the issue: https://github.com/apache/spark/pull/19783 retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-12-10 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r155964396 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -114,4 +114,171 @@ object

[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-12-10 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r155963930 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala --- @@ -359,7 +371,7 @@ class

[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-12-10 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r155963740 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -471,37 +508,47 @@ case

[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-12-10 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r155963622 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -332,8 +332,41 @@ case

[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-12-07 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r155683828 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -114,4 +114,144 @@ object

[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-12-06 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r155343962 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -114,4 +114,171 @@ object

[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-12-05 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r155125157 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -114,4 +114,194 @@ object

[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-12-05 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r155125029 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -114,4 +114,194 @@ object

[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-11-30 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154255068 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala --- @@ -578,6 +590,112 @@ class

[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-11-30 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154254419 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala --- @@ -359,7 +371,7 @@ class

[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-11-30 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154254145 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -784,11 +879,16 @@ case

[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-11-30 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154252063 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -513,10 +560,9 @@ case

[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-11-30 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154250197 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -332,8 +332,45 @@ case

[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-11-30 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154249995 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -114,4 +114,197 @@ object

[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-11-30 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154225069 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -114,4 +114,197 @@ object

[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-11-30 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154223769 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -114,4 +114,197 @@ object

[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-11-30 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154223705 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -114,4 +114,197 @@ object

[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-11-29 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r153902382 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -114,4 +114,197 @@ object

[GitHub] spark pull request #19594: [SPARK-21984] [SQL] Join estimation based on equi...

2017-11-28 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19594#discussion_r153665547 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/JoinEstimationSuite.scala --- @@ -67,6 +68,205 @@ class

[GitHub] spark pull request #19594: [SPARK-21984] [SQL] Join estimation based on equi...

2017-11-28 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19594#discussion_r153665092 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/JoinEstimationSuite.scala --- @@ -67,6 +68,205 @@ class

[GitHub] spark issue #19783: [SPARK-21322][SQL] support histogram in filter cardinali...

2017-11-19 Thread ron8hu
Github user ron8hu commented on the issue: https://github.com/apache/spark/pull/19783 cc @sameeragarwal @cloud-fan @gatorsmile @wzhfy --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #19357: [SPARK-21322][SQL][WIP] support histogram in filter card...

2017-11-19 Thread ron8hu
Github user ron8hu commented on the issue: https://github.com/apache/spark/pull/19357 This pull request was created while there were several dependencies that had not been defined yet. As a result, it has caused many conflicts. I decided to close this pull request and start a clean

[GitHub] spark pull request #19357: [SPARK-21322][SQL][WIP] support histogram in filt...

2017-11-19 Thread ron8hu
Github user ron8hu closed the pull request at: https://github.com/apache/spark/pull/19357 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19783: support histogram in filter cardinality estimatio...

2017-11-19 Thread ron8hu
GitHub user ron8hu opened a pull request: https://github.com/apache/spark/pull/19783 support histogram in filter cardinality estimation ## What changes were proposed in this pull request? Histogram is effective in dealing with skewed distribution. After we generate

[GitHub] spark pull request #19479: [SPARK-17074] [SQL] Generate equi-height histogra...

2017-11-03 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19479#discussion_r148724807 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala --- @@ -275,6 +327,64 @@ object ColumnStat extends

[GitHub] spark pull request #19479: [SPARK-17074] [SQL] Generate equi-height histogra...

2017-10-19 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19479#discussion_r145840719 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala --- @@ -216,65 +218,61 @@ object ColumnStat extends

[GitHub] spark pull request #19479: [SPARK-17074] [SQL] Generate equi-height histogra...

2017-10-19 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19479#discussion_r145828713 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala --- @@ -216,65 +218,61 @@ object ColumnStat extends

[GitHub] spark issue #19357: [SPARK-21322][SQL][WIP] support histogram in filter card...

2017-09-26 Thread ron8hu
Github user ron8hu commented on the issue: https://github.com/apache/spark/pull/19357 cc @wzhfy Please review code first before I request the community to review it. Thanks. --- - To unsubscribe, e-mail: reviews

[GitHub] spark pull request #19357: support histogram in filter cardinality estimatio...

2017-09-26 Thread ron8hu
GitHub user ron8hu opened a pull request: https://github.com/apache/spark/pull/19357 support histogram in filter cardinality estimation ## What changes were proposed in this pull request? Histogram is effective in dealing with skewed distribution. After we generate

[GitHub] spark issue #17918: [SPARK-20678][SQL] Ndv for columns not in filter conditi...

2017-05-09 Thread ron8hu
Github user ron8hu commented on the issue: https://github.com/apache/spark/pull/17918 I agree that we need to scale down NDV for all the referenced columns in a query if a filter condition reduces the number of qualified rows. Do you find this problem when running tpc-ds benchmark

[GitHub] spark pull request #17546: [SPARK-20233] [SQL] Apply star-join filter heuris...

2017-04-07 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17546#discussion_r110480494 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -736,6 +736,12 @@ object SQLConf { .checkValue(weight

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-04-03 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109476536 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -550,6 +565,221 @@ case

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-04-03 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109476505 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -550,6 +565,221 @@ case

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-04-03 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109476460 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala --- @@ -491,7 +599,22 @@ class

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-04-03 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109471156 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -550,6 +565,220 @@ case

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-04-02 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109340165 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -550,6 +565,220 @@ case

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-04-02 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109326608 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -550,6 +565,225 @@ case

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-04-02 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109323394 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -550,6 +565,225 @@ case

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-04-02 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109323397 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -550,6 +565,225 @@ case

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-04-02 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109323376 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -550,6 +565,225 @@ case

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-04-01 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109293607 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -550,6 +565,220 @@ case

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-03-31 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109273331 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -550,6 +565,140 @@ case

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-03-31 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109273334 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -550,6 +565,140 @@ case

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-03-31 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109273326 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -550,6 +565,140 @@ case

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-03-31 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109268076 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -550,6 +565,140 @@ case

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-03-31 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109267675 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -550,6 +565,140 @@ case

[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...

2017-03-30 Thread ron8hu
Github user ron8hu commented on the issue: https://github.com/apache/spark/pull/17415 cc @sameeragarwal @cloud-fan @gatorsmile @wzhfy After a few round of changes and commits, this PR should be in good shape. If we can include in Spark 2.2, then it can help tpc-h queries

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-03-29 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r108754614 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -515,8 +530,138 @@ case

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-03-29 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r108753830 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -515,8 +530,138 @@ case

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-03-29 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r108752975 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -515,8 +530,138 @@ case

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-03-29 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r108751882 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -515,8 +530,138 @@ case

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-03-28 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r108583109 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -515,8 +530,135 @@ case

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-03-28 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r108582594 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -515,8 +530,135 @@ case

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-03-28 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r108582540 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -509,8 +524,131 @@ case

[GitHub] spark issue #17446: [SPARK-17075][SQL][followup] Add Estimation of Constant ...

2017-03-27 Thread ron8hu
Github user ron8hu commented on the issue: https://github.com/apache/spark/pull/17446 The logic is straightforward. LGTM. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-03-27 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r108308949 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -509,8 +524,131 @@ case

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-03-27 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r108307672 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala --- @@ -381,7 +461,22 @@ class

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-03-27 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r108307490 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala --- @@ -381,7 +461,22 @@ class

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-03-27 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r108246637 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -509,8 +524,131 @@ case

[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...

2017-03-24 Thread ron8hu
Github user ron8hu commented on the issue: https://github.com/apache/spark/pull/17415 cc @sameeragarwal @cloud-fan @gatorsmile This Jira is not on Spark 2.2 blocker list. If time permits, we can include it in Spark 2.2. If not, we can wait for a maintenance release. Thanks

[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...

2017-03-24 Thread ron8hu
GitHub user ron8hu opened a pull request: https://github.com/apache/spark/pull/17415 [SPARK-19408][SQL] filter estimation on two columns of same table ## What changes were proposed in this pull request? In SQL queries, we also see predicate expressions involving two columns

[GitHub] spark pull request #15363: [SPARK-17791][SQL] Join reordering using star sch...

2017-03-17 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/15363#discussion_r106766281 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -20,19 +20,347 @@ package

[GitHub] spark pull request #15363: [SPARK-17791][SQL] Join reordering using star sch...

2017-03-15 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/15363#discussion_r106285527 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala --- @@ -51,6 +51,11 @@ case class

[GitHub] spark pull request #15363: [SPARK-17791][SQL] Join reordering using star sch...

2017-03-15 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/15363#discussion_r106271250 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala --- @@ -51,6 +51,11 @@ case class

[GitHub] spark pull request #15363: [SPARK-17791][SQL] Join reordering using star sch...

2017-03-15 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/15363#discussion_r106237955 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala --- @@ -51,6 +51,11 @@ case class

[GitHub] spark pull request #15363: [SPARK-17791][SQL] Join reordering using star sch...

2017-03-14 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/15363#discussion_r106084556 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala --- @@ -51,6 +51,11 @@ case class

[GitHub] spark pull request #17138: [SPARK-17080] [SQL] join reorder

2017-03-07 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17138#discussion_r104760579 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala --- @@ -0,0 +1,274 @@ +/* + * Licensed

[GitHub] spark pull request #17148: [SPARK-17075][SQL][followup] fix filter estimatio...

2017-03-06 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17148#discussion_r104557770 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -90,32 +95,43 @@ case class

[GitHub] spark pull request #17148: [SPARK-17075][SQL][followup] fix filter estimatio...

2017-03-06 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17148#discussion_r104557294 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -90,32 +95,43 @@ case class

[GitHub] spark pull request #17148: [SPARK-17075][SQL][followup] fix filter estimatio...

2017-03-06 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17148#discussion_r104527462 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala --- @@ -254,133 +270,118 @@ class

[GitHub] spark pull request #17148: [SPARK-17075][SQL][followup] fix filter estimatio...

2017-03-03 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17148#discussion_r104267627 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala --- @@ -157,7 +157,7 @@ class

[GitHub] spark pull request #17148: [SPARK-17075][SQL][followup] fix filter estimatio...

2017-03-03 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17148#discussion_r104267295 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -101,21 +101,23 @@ case

[GitHub] spark pull request #17065: [SPARK-17075][SQL][followup] fix some minor issue...

2017-02-25 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17065#discussion_r103090517 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -361,57 +343,52 @@ case

[GitHub] spark pull request #17065: [SPARK-17075][SQL][followup] fix some minor issue...

2017-02-25 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17065#discussion_r103087483 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -95,15 +84,16 @@ case class

[GitHub] spark pull request #17065: [SPARK-17075][SQL][followup] fix some minor issue...

2017-02-25 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17065#discussion_r103087345 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -297,6 +278,8 @@ case class

[GitHub] spark pull request #16395: [SPARK-17075][SQL] implemented filter estimation

2017-02-20 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/16395#discussion_r102133390 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala --- @@ -0,0 +1,389

[GitHub] spark issue #16395: [SPARK-17075][SQL] implemented filter estimation

2017-02-19 Thread ron8hu
Github user ron8hu commented on the issue: https://github.com/apache/spark/pull/16395 @cloud-fan I have updated code based on your feedback. Please review it again. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub

[GitHub] spark issue #16395: [SPARK-17075][SQL] implemented filter estimation

2017-02-16 Thread ron8hu
Github user ron8hu commented on the issue: https://github.com/apache/spark/pull/16395 Hi @cloud-fan I revised the code using latest Range class. Thanks for reviewing the code. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub

[GitHub] spark pull request #16395: [SPARK-17075][SQL] implemented filter estimation

2017-02-13 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/16395#discussion_r100955392 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -0,0 +1,623

[GitHub] spark pull request #16395: [SPARK-17075][SQL] implemented filter estimation

2017-02-13 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/16395#discussion_r100954234 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -0,0 +1,623

[GitHub] spark pull request #16395: [SPARK-17075][SQL] implemented filter estimation

2017-02-13 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/16395#discussion_r100952454 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -0,0 +1,623

[GitHub] spark pull request #16395: [SPARK-17075][SQL] implemented filter estimation

2017-02-13 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/16395#discussion_r100941831 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -0,0 +1,623

[GitHub] spark pull request #16395: [SPARK-17075][SQL] implemented filter estimation

2017-02-13 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/16395#discussion_r100940942 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -0,0 +1,623

[GitHub] spark pull request #16696: [SPARK-19350] [SQL] Cardinality estimation of Lim...

2017-02-04 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/16696#discussion_r99474494 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/StatsEstimationSuite.scala --- @@ -18,12 +18,41 @@ package

[GitHub] spark pull request #16696: [SPARK-19350] [SQL] Cardinality estimation of Lim...

2017-01-31 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/16696#discussion_r98776491 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -727,37 +728,18 @@ case class

[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

2017-01-23 Thread ron8hu
Github user ron8hu commented on the issue: https://github.com/apache/spark/pull/16594 To show a very large Long number, there is no need to print out every digit in the number. We can use exponent. For example, a number 120,000,000,005,123 can be printed as 1.2*10**14, where 10**14

[GitHub] spark issue #16395: [SPARK-17075][SQL] implemented filter estimation

2017-01-18 Thread ron8hu
Github user ron8hu commented on the issue: https://github.com/apache/spark/pull/16395 @wzhfy For predicate condition d_date >= '2000-01-27', we do not support it because Spark SQL cast d_date column to String first before comparison. For predicate condition d_date >= cast('2

[GitHub] spark pull request #16395: [SPARK-17075][SQL] implemented filter estimation

2017-01-18 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/16395#discussion_r96718076 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala --- @@ -0,0 +1,303

[GitHub] spark pull request #16395: [SPARK-17075][SQL] implemented filter estimation

2017-01-16 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/16395#discussion_r96325991 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -0,0 +1,620

[GitHub] spark pull request #16395: [SPARK-17075][SQL] implemented filter estimation

2017-01-16 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/16395#discussion_r96324552 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -0,0 +1,620

[GitHub] spark pull request #16395: [SPARK-17075][SQL] implemented filter estimation

2017-01-16 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/16395#discussion_r96308583 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala --- @@ -0,0 +1,309

[GitHub] spark issue #16395: [SPARK-17075][SQL] implemented filter estimation

2017-01-14 Thread ron8hu
Github user ron8hu commented on the issue: https://github.com/apache/spark/pull/16395 cc @rxin @wzhfy Have updated code. Please review again. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project

[GitHub] spark pull request #16395: [SPARK-17075][SQL] implemented filter estimation

2017-01-14 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/16395#discussion_r96128057 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/Range.scala --- @@ -0,0 +1,75 @@ +/* + * Licensed

[GitHub] spark pull request #16395: [SPARK-17075][SQL] implemented filter estimation

2017-01-14 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/16395#discussion_r96128021 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/Range.scala --- @@ -0,0 +1,75 @@ +/* + * Licensed

[GitHub] spark pull request #16395: [SPARK-17075][SQL] implemented filter estimation

2017-01-12 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/16395#discussion_r95935452 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -116,6 +116,12 @@ case class Filter

[GitHub] spark pull request #16395: [SPARK-17075][SQL] implemented filter estimation

2017-01-12 Thread ron8hu
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/16395#discussion_r95916520 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -0,0 +1,555

  1   2   >