[GitHub] [spark] beliefer commented on a change in pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-13 Thread GitBox


beliefer commented on a change in pull request #27428:
URL: https://github.com/apache/spark/pull/27428#discussion_r453544249



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
##
@@ -102,23 +102,126 @@ import org.apache.spark.sql.types.IntegerType
  * {{{
  * Aggregate(
  *key = ['key]
- *functions = [count(if (('gid = 1)) 'cat1 else null),
- * count(if (('gid = 2)) 'cat2 else null),
+ *functions = [count(if (('gid = 1)) '_gen_attr_1 else null),
+ * count(if (('gid = 2)) '_gen_attr_2 else null),
  * first(if (('gid = 0)) 'total else null) ignore nulls]
  *output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
  *   Aggregate(
- *  key = ['key, 'cat1, 'cat2, 'gid]
- *  functions = [sum('value) with FILTER('id > 1)]
- *  output = ['key, 'cat1, 'cat2, 'gid, 'total])
+ *  key = ['key, '_gen_attr_1, '_gen_attr_2, 'gid]
+ *  functions = [sum('_gen_attr_3)]
+ *  output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, 'total])
  * Expand(
- *projections = [('key, null, null, 0, cast('value as bigint), 'id),
+ *projections = [('key, null, null, 0, if ('id > 1) cast('value as bigint) else null, 'id),
  *   ('key, 'cat1, null, 1, null, null),
  *   ('key, null, 'cat2, 2, null, null)]
- *output = ['key, 'cat1, 'cat2, 'gid, 'value, 'id])
+ *output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, '_gen_attr_3, 'id])
+ *   LocalTableScan [...]
+ * }}}
+ *
+ * Third example: single distinct aggregate function with filter clauses and have
+ * not other distinct aggregate function (in sql):
+ * {{{
+ *   SELECT
+ * COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt,
+ * SUM(value) AS total
+ *  FROM
+ *data
+ *  GROUP BY
+ *key
+ * }}}
+ *
+ * This translates to the following (pseudo) logical plan:
+ * {{{
+ * Aggregate(
+ *key = ['key]
+ *functions = [COUNT(DISTINCT 'cat1) with FILTER('id > 1),
+ * sum('value)]
+ *output = ['key, 'cat1_cnt, 'total])
+ *   LocalTableScan [...]

Review comment:
   I think I can keep my previous behavior. `AggregationIterator` already does this.

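   For context, a minimal runnable sketch of the equivalence this rewrite leans on (the SparkSession setup and the toy `data` view here are illustrative assumptions, not part of the PR): a FILTER clause on a non-distinct aggregate can be folded into the aggregated expression as a conditional, which is exactly what the `if ('id > 1) cast('value as bigint) else null` projection above encodes.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("filter-rewrite-sketch").getOrCreate()
// toy stand-in for the `data` table used in the doc examples
spark.range(5).selectExpr("id % 2 AS key", "id", "id AS value").createOrReplaceTempView("data")

// aggregate with a FILTER clause, as written by the user
val withFilter = spark.sql(
  "SELECT key, SUM(value) FILTER (WHERE id > 1) AS total FROM data GROUP BY key")

// the rewritten shape: the filter becomes an if/else on the aggregated child
val rewritten = spark.sql(
  "SELECT key, SUM(IF(id > 1, CAST(value AS BIGINT), NULL)) AS total FROM data GROUP BY key")

withFilter.show()  // both queries return the same rows
rewritten.show()
```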







[GitHub] [spark] SparkQA commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.

2020-07-13 Thread GitBox


SparkQA commented on pull request #27735:
URL: https://github.com/apache/spark/pull/27735#issuecomment-657467768


   **[Test build #125766 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125766/testReport)** for PR 27735 at commit [`19504d6`](https://github.com/apache/spark/commit/19504d688486dc0ea7764b721b25771388bb20d5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] beliefer commented on a change in pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-13 Thread GitBox


beliefer commented on a change in pull request #27428:
URL: https://github.com/apache/spark/pull/27428#discussion_r453541715



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
##
@@ -102,23 +102,126 @@ import org.apache.spark.sql.types.IntegerType
  * {{{
  * Aggregate(
  *key = ['key]
- *functions = [count(if (('gid = 1)) 'cat1 else null),
- * count(if (('gid = 2)) 'cat2 else null),
+ *functions = [count(if (('gid = 1)) '_gen_attr_1 else null),
+ * count(if (('gid = 2)) '_gen_attr_2 else null),
  * first(if (('gid = 0)) 'total else null) ignore nulls]
  *output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
  *   Aggregate(
- *  key = ['key, 'cat1, 'cat2, 'gid]
- *  functions = [sum('value) with FILTER('id > 1)]
- *  output = ['key, 'cat1, 'cat2, 'gid, 'total])
+ *  key = ['key, '_gen_attr_1, '_gen_attr_2, 'gid]
+ *  functions = [sum('_gen_attr_3)]
+ *  output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, 'total])
  * Expand(
- *projections = [('key, null, null, 0, cast('value as bigint), 'id),
+ *projections = [('key, null, null, 0, if ('id > 1) cast('value as bigint) else null, 'id),
  *   ('key, 'cat1, null, 1, null, null),
  *   ('key, null, 'cat2, 2, null, null)]
- *output = ['key, 'cat1, 'cat2, 'gid, 'value, 'id])
+ *output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, '_gen_attr_3, 'id])
+ *   LocalTableScan [...]
+ * }}}
+ *
+ * Third example: single distinct aggregate function with filter clauses and have
+ * not other distinct aggregate function (in sql):
+ * {{{
+ *   SELECT
+ * COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt,
+ * SUM(value) AS total
+ *  FROM
+ *data
+ *  GROUP BY
+ *key
+ * }}}
+ *
+ * This translates to the following (pseudo) logical plan:
+ * {{{
+ * Aggregate(
+ *key = ['key]
+ *functions = [COUNT(DISTINCT 'cat1) with FILTER('id > 1),
+ * sum('value)]
+ *output = ['key, 'cat1_cnt, 'total])
+ *   LocalTableScan [...]
+ * }}}
+ *
+ * This rule rewrites this logical plan to the following (pseudo) logical plan:
+ * {{{
+ *   Aggregate(
+ *  key = ['key]
+ *  functions = [count('_gen_attr_1),
+  *  sum('_gen_attr_2)]
+ *  output = ['key, 'cat1_cnt, 'total])
+ * Project(
+ *projectionList = ['key, if ('id > 1) 'cat1 else null, cast('value as bigint)]
+ *output = ['key, '_gen_attr_1, '_gen_attr_2])
  *   LocalTableScan [...]
  * }}}
  *
- * The rule does the following things here:
+ * Four example: single distinct aggregate function with filter clauses (in sql):
+ * {{{
+ *   SELECT
+ * COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt,
+ * COUNT(DISTINCT cat2) as cat2_cnt,
+ * SUM(value) AS total
+ *  FROM
+ *data
+ *  GROUP BY
+ *key
+ * }}}
+ *
+ * This translates to the following (pseudo) logical plan:
+ * {{{
+ * Aggregate(
+ *key = ['key]
+ *functions = [COUNT(DISTINCT 'cat1) with FILTER('id > 1),
+ * COUNT(DISTINCT 'cat2),
+ * sum('value)]
+ *output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
+ *   LocalTableScan [...]
+ * }}}
+ *
+ * This rule rewrites this logical plan to the following (pseudo) logical plan:
+ * {{{
+ *   Aggregate(
+ *  key = ['key]
+ *  functions = [count(if (('gid = 1)) '_gen_attr_1 else null),
+ *   count(if (('gid = 2)) '_gen_attr_2 else null),
+ *   first(if (('gid = 0)) 'total else null) ignore nulls]
+ *  output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
+ * Aggregate(
+ *key = ['key, '_gen_attr_1, '_gen_attr_2, 'gid]
+ *functions = [sum('_gen_attr_3)]
+ *output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, 'total])
+ *   Expand(
+ *  projections = [('key, null, null, 0, cast('value as bigint)),
+ * ('key, if ('id > 1) 'cat1 else null, null, 1, null),
+ * ('key, null, 'cat2, 2, null)]
+ *  output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, '_gen_attr_3])
+ * LocalTableScan [...]
+ * }}}
+ *
+ * The rule consists of the two phases as follows:
+ *
+ * In the first phase, if the aggregate query exists filter clauses, project the output of
+ * the child of the aggregate query:
+ * 1. Project the data. There are three aggregation groups in this query:
+ *i. the non-distinct group;
+ *ii. the distinct 'cat1 group;
+ *iii. the distinct 'cat2 group with filter clause.
+ *Because there is at least one group with filter clause (e.g. the distinct 'cat2
+ *group with filter clause), then will project the data.
+ * 2. Avoid projections that may output the same attributes. There are three aggregation groups
+ *in this query:
+ *i. the non-distinct 'cat1 group;
+ *ii. the distinct 

[GitHub] [spark] beliefer commented on a change in pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-13 Thread GitBox


beliefer commented on a change in pull request #27428:
URL: https://github.com/apache/spark/pull/27428#discussion_r453539953



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
##
@@ -102,23 +102,126 @@ import org.apache.spark.sql.types.IntegerType
  * {{{
  * Aggregate(
  *key = ['key]
- *functions = [count(if (('gid = 1)) 'cat1 else null),
- * count(if (('gid = 2)) 'cat2 else null),
+ *functions = [count(if (('gid = 1)) '_gen_attr_1 else null),
+ * count(if (('gid = 2)) '_gen_attr_2 else null),
  * first(if (('gid = 0)) 'total else null) ignore nulls]
  *output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
  *   Aggregate(
- *  key = ['key, 'cat1, 'cat2, 'gid]
- *  functions = [sum('value) with FILTER('id > 1)]
- *  output = ['key, 'cat1, 'cat2, 'gid, 'total])
+ *  key = ['key, '_gen_attr_1, '_gen_attr_2, 'gid]
+ *  functions = [sum('_gen_attr_3)]
+ *  output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, 'total])
  * Expand(
- *projections = [('key, null, null, 0, cast('value as bigint), 'id),
+ *projections = [('key, null, null, 0, if ('id > 1) cast('value as bigint) else null, 'id),
  *   ('key, 'cat1, null, 1, null, null),
  *   ('key, null, 'cat2, 2, null, null)]
- *output = ['key, 'cat1, 'cat2, 'gid, 'value, 'id])
+ *output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, '_gen_attr_3, 'id])
+ *   LocalTableScan [...]
+ * }}}
+ *
+ * Third example: single distinct aggregate function with filter clauses and have
+ * not other distinct aggregate function (in sql):
+ * {{{
+ *   SELECT
+ * COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt,
+ * SUM(value) AS total
+ *  FROM
+ *data
+ *  GROUP BY
+ *key
+ * }}}
+ *
+ * This translates to the following (pseudo) logical plan:
+ * {{{
+ * Aggregate(
+ *key = ['key]
+ *functions = [COUNT(DISTINCT 'cat1) with FILTER('id > 1),
+ * sum('value)]
+ *output = ['key, 'cat1_cnt, 'total])
+ *   LocalTableScan [...]
+ * }}}
+ *
+ * This rule rewrites this logical plan to the following (pseudo) logical plan:
+ * {{{
+ *   Aggregate(
+ *  key = ['key]
+ *  functions = [count('_gen_attr_1),
+  *  sum('_gen_attr_2)]
+ *  output = ['key, 'cat1_cnt, 'total])
+ * Project(
+ *projectionList = ['key, if ('id > 1) 'cat1 else null, cast('value as bigint)]
+ *output = ['key, '_gen_attr_1, '_gen_attr_2])
  *   LocalTableScan [...]
  * }}}
  *
- * The rule does the following things here:
+ * Four example: single distinct aggregate function with filter clauses (in sql):
+ * {{{
+ *   SELECT
+ * COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt,
+ * COUNT(DISTINCT cat2) as cat2_cnt,
+ * SUM(value) AS total
+ *  FROM
+ *data
+ *  GROUP BY
+ *key
+ * }}}
+ *
+ * This translates to the following (pseudo) logical plan:
+ * {{{
+ * Aggregate(
+ *key = ['key]
+ *functions = [COUNT(DISTINCT 'cat1) with FILTER('id > 1),
+ * COUNT(DISTINCT 'cat2),
+ * sum('value)]
+ *output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
+ *   LocalTableScan [...]
+ * }}}
+ *
+ * This rule rewrites this logical plan to the following (pseudo) logical plan:
+ * {{{
+ *   Aggregate(
+ *  key = ['key]
+ *  functions = [count(if (('gid = 1)) '_gen_attr_1 else null),
+ *   count(if (('gid = 2)) '_gen_attr_2 else null),
+ *   first(if (('gid = 0)) 'total else null) ignore nulls]
+ *  output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
+ * Aggregate(
+ *key = ['key, '_gen_attr_1, '_gen_attr_2, 'gid]
+ *functions = [sum('_gen_attr_3)]
+ *output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, 'total])
+ *   Expand(
+ *  projections = [('key, null, null, 0, cast('value as bigint)),
+ * ('key, if ('id > 1) 'cat1 else null, null, 1, null),
+ * ('key, null, 'cat2, 2, null)]
+ *  output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, '_gen_attr_3])
+ * LocalTableScan [...]
+ * }}}
+ *
+ * The rule consists of the two phases as follows:
+ *
+ * In the first phase, if the aggregate query exists filter clauses, project the output of
+ * the child of the aggregate query:
+ * 1. Project the data. There are three aggregation groups in this query:
+ *i. the non-distinct group;
+ *ii. the distinct 'cat1 group;
+ *iii. the distinct 'cat2 group with filter clause.

Review comment:
   OK






[GitHub] [spark] AmplabJenkins removed a comment on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #27428:
URL: https://github.com/apache/spark/pull/27428#issuecomment-657461572


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125761/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #27428:
URL: https://github.com/apache/spark/pull/27428#issuecomment-657461556


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-13 Thread GitBox


SparkQA commented on pull request #27428:
URL: https://github.com/apache/spark/pull/27428#issuecomment-657461188


   **[Test build #125761 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125761/testReport)** for PR 27428 at commit [`5bbbfd7`](https://github.com/apache/spark/commit/5bbbfd771e7aa1cfa6c41ccced92c8f0cf760a4d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-13 Thread GitBox


SparkQA commented on pull request #27428:
URL: https://github.com/apache/spark/pull/27428#issuecomment-657460682


   **[Test build #125756 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125756/testReport)** for PR 27428 at commit [`d531864`](https://github.com/apache/spark/commit/d531864c7da4244674377369854ab2f02b5acfa2).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] beliefer commented on a change in pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-13 Thread GitBox


beliefer commented on a change in pull request #27428:
URL: https://github.com/apache/spark/pull/27428#discussion_r453537350



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
##
@@ -102,23 +102,126 @@ import org.apache.spark.sql.types.IntegerType
  * {{{
  * Aggregate(
  *key = ['key]
- *functions = [count(if (('gid = 1)) 'cat1 else null),
- * count(if (('gid = 2)) 'cat2 else null),
+ *functions = [count(if (('gid = 1)) '_gen_attr_1 else null),
+ * count(if (('gid = 2)) '_gen_attr_2 else null),
  * first(if (('gid = 0)) 'total else null) ignore nulls]
  *output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
  *   Aggregate(
- *  key = ['key, 'cat1, 'cat2, 'gid]
- *  functions = [sum('value) with FILTER('id > 1)]
- *  output = ['key, 'cat1, 'cat2, 'gid, 'total])
+ *  key = ['key, '_gen_attr_1, '_gen_attr_2, 'gid]
+ *  functions = [sum('_gen_attr_3)]
+ *  output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, 'total])
  * Expand(
- *projections = [('key, null, null, 0, cast('value as bigint), 'id),
+ *projections = [('key, null, null, 0, if ('id > 1) cast('value as bigint) else null, 'id),
  *   ('key, 'cat1, null, 1, null, null),
  *   ('key, null, 'cat2, 2, null, null)]
- *output = ['key, 'cat1, 'cat2, 'gid, 'value, 'id])
+ *output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, '_gen_attr_3, 'id])
+ *   LocalTableScan [...]
+ * }}}
+ *
+ * Third example: single distinct aggregate function with filter clauses and have
+ * not other distinct aggregate function (in sql):
+ * {{{
+ *   SELECT
+ * COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt,
+ * SUM(value) AS total
+ *  FROM
+ *data
+ *  GROUP BY
+ *key
+ * }}}
+ *
+ * This translates to the following (pseudo) logical plan:
+ * {{{
+ * Aggregate(
+ *key = ['key]
+ *functions = [COUNT(DISTINCT 'cat1) with FILTER('id > 1),
+ * sum('value)]
+ *output = ['key, 'cat1_cnt, 'total])
+ *   LocalTableScan [...]
+ * }}}
+ *
+ * This rule rewrites this logical plan to the following (pseudo) logical plan:
+ * {{{
+ *   Aggregate(
+ *  key = ['key]
+ *  functions = [count('_gen_attr_1),
+  *  sum('_gen_attr_2)]
+ *  output = ['key, 'cat1_cnt, 'total])
+ * Project(
+ *projectionList = ['key, if ('id > 1) 'cat1 else null, cast('value as bigint)]
+ *output = ['key, '_gen_attr_1, '_gen_attr_2])
  *   LocalTableScan [...]
  * }}}
  *
- * The rule does the following things here:
+ * Four example: single distinct aggregate function with filter clauses (in sql):

Review comment:
   OK. How about `at least two distinct aggregate function and one of them contains filter clauses (in sql)`?








[GitHub] [spark] SparkQA commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-13 Thread GitBox


SparkQA commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-657459572


   **[Test build #125764 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125764/testReport)** for PR 29045 at commit [`743ffe3`](https://github.com/apache/spark/commit/743ffe3775c03ef1b6a37b48cc3f3fe767f64a52).






[GitHub] [spark] SparkQA commented on pull request #28957: [SPARK-32138] Drop Python 2.7, 3.4 and 3.5

2020-07-13 Thread GitBox


SparkQA commented on pull request #28957:
URL: https://github.com/apache/spark/pull/28957#issuecomment-657459581


   **[Test build #125765 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125765/testReport)** for PR 28957 at commit [`f2356c8`](https://github.com/apache/spark/commit/f2356c80a105c0b74e578152282013c1d3b9c647).






[GitHub] [spark] SparkQA commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.

2020-07-13 Thread GitBox


SparkQA commented on pull request #27735:
URL: https://github.com/apache/spark/pull/27735#issuecomment-657459685


   **[Test build #125766 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125766/testReport)** for PR 27735 at commit [`19504d6`](https://github.com/apache/spark/commit/19504d688486dc0ea7764b721b25771388bb20d5).






[GitHub] [spark] HyukjinKwon commented on pull request #29057: [SPARK-32245][INFRA] Run Spark tests in Github Actions

2020-07-13 Thread GitBox


HyukjinKwon commented on pull request #29057:
URL: https://github.com/apache/spark/pull/29057#issuecomment-657455681


   @dbtsai that sounds great. I will leave the link at SPARK-32253






[GitHub] [spark] SparkQA commented on pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE

2020-07-13 Thread GitBox


SparkQA commented on pull request #29064:
URL: https://github.com/apache/spark/pull/29064#issuecomment-657455001


   **[Test build #125763 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125763/testReport)** for PR 29064 at commit [`c95fa7d`](https://github.com/apache/spark/commit/c95fa7d0153fae2df34e53d410b183353b040f29).






[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-13 Thread GitBox


SaurabhChawla100 commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-657455284


   > @dongjoon-hyun / @maropu - It seems like some issue happened after I rebased this current branch with the master branch.
   > 
   > It's now showing the number of files changed as -185. Not sure why it's showing this difference.
   
   Now it's fine.






[GitHub] [spark] yaooqinn commented on a change in pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE

2020-07-13 Thread GitBox


yaooqinn commented on a change in pull request #29064:
URL: https://github.com/apache/spark/pull/29064#discussion_r453531441



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
##
@@ -90,6 +93,51 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
 ResetCommand
   }
 
+  /**
+   * Create a [[SetCommand]] logical plan to set [[SQLConf.SESSION_LOCAL_TIMEZONE]]
+   * Example SQL :
+   * {{{
+   *   SET TIME ZONE LOCAL;
+   *   SET TIME ZONE 'Asia/Shanghai';
+   *   SET TIME ZONE INTERVAL 10 HOURS;
+   * }}}
+   */
+  override def visitSetTimeZone(ctx: SetTimeZoneContext): LogicalPlan = withOrigin(ctx) {
+    val key = SQLConf.SESSION_LOCAL_TIMEZONE.key
+    if (ctx.interval != null) {
+      val interval = parseIntervalLiteral(ctx.interval)
+      if (interval.months != 0 || interval.days != 0 ||
+        math.abs(interval.microseconds) > 18 * DateTimeConstants.MICROS_PER_HOUR) {
+        throw new ParseException("The interval value must be in the range of [-18, +18] hours",

Review comment:
   If we want to fail this completely, we need a new interval parser; otherwise, we can only forbid it based on the resulting interval value, e.g. `INTERVAL 1 MICROSECOND -1 MICROSECOND 1 HOUR` will still work...

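   A small arithmetic sketch of the loophole described above, assuming only the `DateTimeConstants` referenced in the diff: the guard inspects the already-summed interval, so positive and negative parts that cancel out still pass.

```scala
import org.apache.spark.sql.catalyst.util.DateTimeConstants

// INTERVAL 1 MICROSECOND -1 MICROSECOND 1 HOUR sums to exactly one hour of microseconds
val microseconds = 1L + (-1L) + DateTimeConstants.MICROS_PER_HOUR

// the quoted range check only sees the summed result, so this value passes
assert(math.abs(microseconds) <= 18 * DateTimeConstants.MICROS_PER_HOUR)
```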







[GitHub] [spark] AmplabJenkins removed a comment on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.

2020-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #27735:
URL: https://github.com/apache/spark/pull/27735#issuecomment-657453855


   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125762/
   Test PASSed.






[GitHub] [spark] SparkQA removed a comment on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.

2020-07-13 Thread GitBox


SparkQA removed a comment on pull request #27735:
URL: https://github.com/apache/spark/pull/27735#issuecomment-657446183


   **[Test build #125762 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125762/testReport)** for PR 27735 at commit [`d554659`](https://github.com/apache/spark/commit/d554659e5cc9a493fa01cc196ded53d0a9fe31c8).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.

2020-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #27735:
URL: https://github.com/apache/spark/pull/27735#issuecomment-657453846


   Merged build finished. Test PASSed.






[GitHub] [spark] SparkQA commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.

2020-07-13 Thread GitBox


SparkQA commented on pull request #27735:
URL: https://github.com/apache/spark/pull/27735#issuecomment-657453623


   **[Test build #125762 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125762/testReport)** for PR 27735 at commit [`d554659`](https://github.com/apache/spark/commit/d554659e5cc9a493fa01cc196ded53d0a9fe31c8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `class MountConfDirExecutorFeatureStep(conf: KubernetesConf)`






[GitHub] [spark] SparkQA removed a comment on pull request #29018: [SPARK-32202][ML][WIP] tree models auto infer compact integer type

2020-07-13 Thread GitBox


SparkQA removed a comment on pull request #29018:
URL: https://github.com/apache/spark/pull/29018#issuecomment-657394110


   **[Test build #125753 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125753/testReport)** for PR 29018 at commit [`19e4bcf`](https://github.com/apache/spark/commit/19e4bcfac6d876afd289b736bc19008032564c2d).






[GitHub] [spark] SparkQA commented on pull request #29018: [SPARK-32202][ML][WIP] tree models auto infer compact integer type

2020-07-13 Thread GitBox


SparkQA commented on pull request #29018:
URL: https://github.com/apache/spark/pull/29018#issuecomment-657452051


   **[Test build #125753 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125753/testReport)** for PR 29018 at commit [`19e4bcf`](https://github.com/apache/spark/commit/19e4bcfac6d876afd289b736bc19008032564c2d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-13 Thread GitBox


SaurabhChawla100 commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-657451506


   @dongjoon-hyun / @maropu - It seems like some issue happened after I rebased this current branch with the master branch.
   
   It's now showing the number of files changed as -185. Not sure why it's showing this difference.






[GitHub] [spark] yaooqinn commented on a change in pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE

2020-07-13 Thread GitBox


yaooqinn commented on a change in pull request #29064:
URL: https://github.com/apache/spark/pull/29064#discussion_r453527973



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
##
@@ -90,6 +93,51 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
 ResetCommand
   }
 
+  /**
+   * Create a [[SetCommand]] logical plan to set [[SQLConf.SESSION_LOCAL_TIMEZONE]]
+   * Example SQL :
+   * {{{
+   *   SET TIME ZONE LOCAL;
+   *   SET TIME ZONE 'Asia/Shanghai';
+   *   SET TIME ZONE INTERVAL 10 HOURS;
+   * }}}
+   */
+  override def visitSetTimeZone(ctx: SetTimeZoneContext): LogicalPlan = withOrigin(ctx) {
+    val key = SQLConf.SESSION_LOCAL_TIMEZONE.key
+    if (ctx.interval != null) {
+      val interval = parseIntervalLiteral(ctx.interval)
+      if (interval.months != 0 || interval.days != 0 ||
+        math.abs(interval.microseconds) > 18 * DateTimeConstants.MICROS_PER_HOUR) {
+        throw new ParseException("The interval value must be in the range of [-18, +18] hours",
+          ctx.interval())
+      } else {
+        val seconds = (interval.microseconds / DateTimeConstants.MICROS_PER_SECOND).toInt
+        SetCommand(Some(key -> Some(ZoneOffset.ofTotalSeconds(seconds).toString)))
+      }
+    } else if (ctx.timezone != null) {
+      ctx.timezone.getType match {
+        case SqlBaseParser.LOCAL =>
+          SetCommand(Some(key -> Some(TimeZone.getDefault.getID)))

Review comment:
   This is the same as the default value of `spark.sql.session.timeZone`, shall I change that part too if I change it here?

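   A hedged sanity check of that equivalence (this assumes `spark.sql.session.timeZone` really does default to the JVM zone via `TimeZone.getDefault.getID`; verify against `SQLConf` before relying on it):

```scala
import java.util.TimeZone
import org.apache.spark.sql.internal.SQLConf

val localZone = TimeZone.getDefault.getID
val confZone = SQLConf.get.getConf(SQLConf.SESSION_LOCAL_TIMEZONE)
// holds as long as the session has not overridden spark.sql.session.timeZone
assert(localZone == confZone)
```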







[GitHub] [spark] yaooqinn commented on a change in pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE

2020-07-13 Thread GitBox


yaooqinn commented on a change in pull request #29064:
URL: https://github.com/apache/spark/pull/29064#discussion_r453525655



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
##
@@ -90,6 +93,51 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder(conf) {
 ResetCommand
   }
 
+  /**
+   * Create a [[SetCommand]] logical plan to set [[SQLConf.SESSION_LOCAL_TIMEZONE]]
+   * Example SQL :
+   * {{{
+   *   SET TIME ZONE LOCAL;
+   *   SET TIME ZONE 'Asia/Shanghai';
+   *   SET TIME ZONE INTERVAL 10 HOURS;
+   * }}}
+   */
+  override def visitSetTimeZone(ctx: SetTimeZoneContext): LogicalPlan = withOrigin(ctx) {
+    val key = SQLConf.SESSION_LOCAL_TIMEZONE.key
+    if (ctx.interval != null) {
+      val interval = parseIntervalLiteral(ctx.interval)
+      if (interval.months != 0 || interval.days != 0 ||
+        math.abs(interval.microseconds) > 18 * DateTimeConstants.MICROS_PER_HOUR) {
+        throw new ParseException("The interval value must be in the range of [-18, +18] hours",
+          ctx.interval())
+      } else {
+        val seconds = (interval.microseconds / DateTimeConstants.MICROS_PER_SECOND).toInt
+        SetCommand(Some(key -> Some(ZoneOffset.ofTotalSeconds(seconds).toString)))
+      }
+    } else if (ctx.timezone != null) {
+      ctx.timezone.getType match {
+        case SqlBaseParser.LOCAL =>
+          SetCommand(Some(key -> Some(TimeZone.getDefault.getID)))
+        case _ =>
+          SetCommand(Some(key -> Some(string(ctx.STRING()))))
+      }
+    } else {
+      throw new ParseException("Invalid time zone displacement value", ctx)
+    }
+  }
+
+  /**
+   * Create a [[ListTimeZonesCommand]] to retrieve all the supported region-based Zone IDs
+   * supported.
+   * Example SQL :
+   * {{{
+   *   SET TIME ZONE ALL;

Review comment:
   I removed this; it is not standard.








[GitHub] [spark] yaooqinn commented on a change in pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE

2020-07-13 Thread GitBox


yaooqinn commented on a change in pull request #29064:
URL: https://github.com/apache/spark/pull/29064#discussion_r453524124



##
File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
##
@@ -240,6 +240,10 @@ statement
 | MSCK REPAIR TABLE multipartIdentifier            #repairTable
 | op=(ADD | LIST) identifier (STRING | .*?)        #manageResource
 | SET ROLE .*?                                     #failNativeCommand
+| SET TIME ZONE ALL                                #listTimeZones
+| SET TIME ZONE interval                           #setTimeZone
+| SET TIME ZONE timezone=(STRING | LOCAL)          #setTimeZone
+| SET TIME ZONE .*?                                #setTimeZone

Review comment:
   This is used to fail invalid SET TIME ZONE syntax explicitly, because right now we support:
   
   ```sql
   spark-sql (default)> set time zone abcd;
   key  value
   time zone abcd   
   ```

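   A hedged usage sketch of the intended behavior under this patch (the `spark` session is an assumption for illustration): with the catch-all alternative, malformed input fails at parse time instead of being stored as a bogus conf key.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.parser.ParseException

val spark = SparkSession.builder().master("local[1]").getOrCreate()
spark.sql("SET TIME ZONE 'Asia/Shanghai'")  // valid: sets spark.sql.session.timeZone

try {
  spark.sql("SET TIME ZONE abcd")           // invalid displacement value
} catch {
  case e: ParseException => println(e.getMessage)  // fails at parse time, no conf is set
}
```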







[GitHub] [spark] SparkQA commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.

2020-07-13 Thread GitBox


SparkQA commented on pull request #27735:
URL: https://github.com/apache/spark/pull/27735#issuecomment-657446183


   **[Test build #125762 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125762/testReport)** for PR 27735 at commit [`d554659`](https://github.com/apache/spark/commit/d554659e5cc9a493fa01cc196ded53d0a9fe31c8).






[GitHub] [spark] Ngone51 commented on a change in pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-13 Thread GitBox


Ngone51 commented on a change in pull request #29015:
URL: https://github.com/apache/spark/pull/29015#discussion_r453521642



##
File path: core/src/main/scala/org/apache/spark/deploy/master/ui/MasterWebUI.scala
##
@@ -49,6 +56,23 @@ class MasterWebUI(
   "/app/kill", "/", masterPage.handleAppKillRequest, httpMethods = Set("POST")))
 attachHandler(createRedirectHandler(
   "/driver/kill", "/", masterPage.handleDriverKillRequest, httpMethods = Set("POST")))
+attachHandler(createServletHandler("/workers/kill", new HttpServlet {
+  override def doPost(req: HttpServletRequest, resp: HttpServletResponse): Unit = {
+    val hostPorts: Seq[String] = Option(req.getParameterValues("host"))
+      .getOrElse(Array[String]()).map(_.toLowerCase(Locale.ENGLISH)).toSeq
+    logInfo(s"Received request to decommission hostPorts $hostPorts from ${req.getRemoteAddr}")

Review comment:
   nit: `decommission hostPorts` -> `decommission workers`?

##
File path: core/src/main/scala/org/apache/spark/deploy/master/Master.scala
##
@@ -863,7 +872,36 @@ private[deploy] class Master(
 true
   }
 
-  private def decommissionWorker(worker: WorkerInfo): Unit = {
+  private def handleDecommissionWorkers(hostPorts: Seq[String]): Integer = {
+    val hostsWithoutPorts = new HashSet[String]
+    val addressToRemove = new HashSet[RpcAddress]
+    hostPorts.foreach(hp => {

Review comment:
   We can first filter out the valid workers and then apply `decommissionWorker` to each, e.g.
   
   `hostPorts.filter(...).foreach(...)` (see the sketch below)

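   A minimal sketch of that shape, reusing names from the quoted diff (`Utils.parseHostPort`, `addressToWorker`, `RpcAddress`, and `decommissionWorker` are assumed to be in scope inside `Master`):

```scala
// keep only host:port entries that map to a known worker, then decommission each
val targets = hostPorts
  .map(Utils.parseHostPort)                                            // (host, port) pairs
  .collect { case (host, port) if port != 0 => RpcAddress(host, port) }
  .filter(addressToWorker.contains)
targets.foreach(addr => decommissionWorker(addressToWorker(addr)))
```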
##
File path: core/src/main/scala/org/apache/spark/deploy/master/Master.scala
##
@@ -863,7 +872,36 @@ private[deploy] class Master(
 true
   }
 
-  private def decommissionWorker(worker: WorkerInfo): Unit = {
+  private def handleDecommissionWorkers(hostPorts: Seq[String]): Integer = {
+    val hostsWithoutPorts = new HashSet[String]
+    val addressToRemove = new HashSet[RpcAddress]
+    hostPorts.foreach(hp => {
+      val (host, port) = Utils.parseHostPort(hp)
+      if (port == 0) {
+        hostsWithoutPorts.add(host)
+      } else {
+        val rpcAddress = new RpcAddress(host, port)
+        if (addressToWorker.contains(rpcAddress)) {
+          addressToRemove.add(rpcAddress)
+        }
+      }
+    })
+
+    if (hostsWithoutPorts.nonEmpty) {
+      addressToRemove ++= addressToWorker.keys.filter(addr => hostsWithoutPorts.contains(addr.host))
+    }

Review comment:
   It seems that comparing the `host` only should be enough?

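   And a matching sketch of the host-only comparison (same scope assumptions as the sketch above), which would also subsume the `port == 0` bookkeeping:

```scala
// treat every request as a host match, ignoring the RPC port entirely
val hosts = hostPorts.map(hp => Utils.parseHostPort(hp)._1).toSet
val addressesToRemove = addressToWorker.keys.filter(addr => hosts.contains(addr.host))
```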
##
File path: core/src/main/scala/org/apache/spark/deploy/master/ui/MasterWebUI.scala
##
@@ -49,6 +56,23 @@ class MasterWebUI(
   "/app/kill", "/", masterPage.handleAppKillRequest, httpMethods = Set("POST")))
 attachHandler(createRedirectHandler(
   "/driver/kill", "/", masterPage.handleDriverKillRequest, httpMethods = Set("POST")))
+attachHandler(createServletHandler("/workers/kill", new HttpServlet {
+  override def doPost(req: HttpServletRequest, resp: HttpServletResponse): Unit = {
+    val hostPorts: Seq[String] = Option(req.getParameterValues("host"))
+      .getOrElse(Array[String]()).map(_.toLowerCase(Locale.ENGLISH)).toSeq

Review comment:
   Why use `Locale.ENGLISH`?








[GitHub] [spark] cloud-fan commented on a change in pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE

2020-07-13 Thread GitBox


cloud-fan commented on a change in pull request #29064:
URL: https://github.com/apache/spark/pull/29064#discussion_r453520007



##
File path: sql/core/src/test/resources/sql-tests/results/timezone.sql.out
##
@@ -0,0 +1,121 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 11
+
+
+-- !query
+SET TIME ZONE 'Asia/Hong_Kong'
+-- !query schema
+struct<key:string,value:string>
+-- !query output
+spark.sql.session.timeZone Asia/Hong_Kong

Review comment:
   This looks like a weird output for `SET TIME ZONE 'Asia/Hong_Kong'`. 
Shall we add a `Project` node on the `SetCommand` to give better output?








[GitHub] [spark] AmplabJenkins removed a comment on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #27428:
URL: https://github.com/apache/spark/pull/27428#issuecomment-657442634







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE

2020-07-13 Thread GitBox


cloud-fan commented on a change in pull request #29064:
URL: https://github.com/apache/spark/pull/29064#discussion_r453519340



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
##
@@ -90,6 +93,51 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder(conf) {
 ResetCommand
   }
 
+  /**
+   * Create a [[SetCommand]] logical plan to set 
[[SQLConf.SESSION_LOCAL_TIMEZONE]]
+   * Example SQL :
+   * {{{
+   *   SET TIME ZONE LOCAL;
+   *   SET TIME ZONE 'Asia/Shanghai';
+   *   SET TIME ZONE INTERVAL 10 HOURS;
+   * }}}
+   */
+  override def visitSetTimeZone(ctx: SetTimeZoneContext): LogicalPlan = 
withOrigin(ctx) {
+val key = SQLConf.SESSION_LOCAL_TIMEZONE.key
+if (ctx.interval != null) {
+  val interval = parseIntervalLiteral(ctx.interval)
+  if (interval.months != 0 || interval.days != 0 ||
+math.abs(interval.microseconds) > 18 * 
DateTimeConstants.MICROS_PER_HOUR) {
+throw new ParseException("The interval value must be in the range of 
[-18, +18] hours",
+  ctx.interval())
+  } else {
+val seconds = (interval.microseconds / 
DateTimeConstants.MICROS_PER_SECOND).toInt
+SetCommand(Some(key -> 
Some(ZoneOffset.ofTotalSeconds(seconds).toString)))
+  }
+} else if (ctx.timezone != null) {
+  ctx.timezone.getType match {
+case SqlBaseParser.LOCAL =>
+  SetCommand(Some(key -> Some(TimeZone.getDefault.getID)))
+case _ =>
+  SetCommand(Some(key -> Some(string(ctx.STRING
+  }
+} else {
+  throw new ParseException("Invalid time zone displacement value", ctx)
+}
+  }
+
+  /**
+   * Create a [[ListTimeZonesCommand]] to retrieve all the supported
+   * region-based Zone IDs.
+   * Example SQL :
+   * {{{
+   *   SET TIME ZONE ALL;

Review comment:
   is this a standard SQL syntax?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE

2020-07-13 Thread GitBox


cloud-fan commented on a change in pull request #29064:
URL: https://github.com/apache/spark/pull/29064#discussion_r453519063



##
File path: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
##
@@ -240,6 +240,10 @@ statement
 | MSCK REPAIR TABLE multipartIdentifier
#repairTable
 | op=(ADD | LIST) identifier (STRING | .*?)
#manageResource
 | SET ROLE .*? 
#failNativeCommand
+| SET TIME ZONE ALL
#listTimeZones
+| SET TIME ZONE interval   
#setTimeZone
+| SET TIME ZONE timezone=(STRING | LOCAL)  
#setTimeZone
+| SET TIME ZONE .*?
#setTimeZone

Review comment:
   so we add this only for a better parser error message?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-13 Thread GitBox


AmplabJenkins commented on pull request #27428:
URL: https://github.com/apache/spark/pull/27428#issuecomment-657442634







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE

2020-07-13 Thread GitBox


cloud-fan commented on a change in pull request #29064:
URL: https://github.com/apache/spark/pull/29064#discussion_r453518326



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
##
@@ -90,6 +93,51 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder(conf) {
 ResetCommand
   }
 
+  /**
+   * Create a [[SetCommand]] logical plan to set 
[[SQLConf.SESSION_LOCAL_TIMEZONE]]
+   * Example SQL :
+   * {{{
+   *   SET TIME ZONE LOCAL;
+   *   SET TIME ZONE 'Asia/Shanghai';
+   *   SET TIME ZONE INTERVAL 10 HOURS;
+   * }}}
+   */
+  override def visitSetTimeZone(ctx: SetTimeZoneContext): LogicalPlan = 
withOrigin(ctx) {
+val key = SQLConf.SESSION_LOCAL_TIMEZONE.key
+if (ctx.interval != null) {
+  val interval = parseIntervalLiteral(ctx.interval)
+  if (interval.months != 0 || interval.days != 0 ||
+math.abs(interval.microseconds) > 18 * 
DateTimeConstants.MICROS_PER_HOUR) {
+throw new ParseException("The interval value must be in the range of 
[-18, +18] hours",
+  ctx.interval())
+  } else {
+val seconds = (interval.microseconds / 
DateTimeConstants.MICROS_PER_SECOND).toInt
+SetCommand(Some(key -> 
Some(ZoneOffset.ofTotalSeconds(seconds).toString)))
+  }
+} else if (ctx.timezone != null) {
+  ctx.timezone.getType match {
+case SqlBaseParser.LOCAL =>
+  SetCommand(Some(key -> Some(TimeZone.getDefault.getID)))

Review comment:
   shall we use the `ZoneId.systemDefault()`?
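
   For reference, a minimal JDK-only comparison of the two calls (illustrative; the
   object name is made up here):

   ```scala
   import java.time.ZoneId
   import java.util.TimeZone

   object DefaultZoneDemo extends App {
     // Both report the JVM default zone; java.time just avoids the legacy class.
     println(TimeZone.getDefault.getID)    // e.g. Asia/Shanghai
     println(ZoneId.systemDefault().getId) // e.g. Asia/Shanghai
   }
   ```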





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE

2020-07-13 Thread GitBox


cloud-fan commented on a change in pull request #29064:
URL: https://github.com/apache/spark/pull/29064#discussion_r453518326



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
##
@@ -90,6 +93,51 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder(conf) {
 ResetCommand
   }
 
+  /**
+   * Create a [[SetCommand]] logical plan to set 
[[SQLConf.SESSION_LOCAL_TIMEZONE]]
+   * Example SQL :
+   * {{{
+   *   SET TIME ZONE LOCAL;
+   *   SET TIME ZONE 'Asia/Shanghai';
+   *   SET TIME ZONE INTERVAL 10 HOURS;
+   * }}}
+   */
+  override def visitSetTimeZone(ctx: SetTimeZoneContext): LogicalPlan = 
withOrigin(ctx) {
+val key = SQLConf.SESSION_LOCAL_TIMEZONE.key
+if (ctx.interval != null) {
+  val interval = parseIntervalLiteral(ctx.interval)
+  if (interval.months != 0 || interval.days != 0 ||
+math.abs(interval.microseconds) > 18 * 
DateTimeConstants.MICROS_PER_HOUR) {
+throw new ParseException("The interval value must be in the range of 
[-18, +18] hours",
+  ctx.interval())
+  } else {
+val seconds = (interval.microseconds / 
DateTimeConstants.MICROS_PER_SECOND).toInt
+SetCommand(Some(key -> 
Some(ZoneOffset.ofTotalSeconds(seconds).toString)))
+  }
+} else if (ctx.timezone != null) {
+  ctx.timezone.getType match {
+case SqlBaseParser.LOCAL =>
+  SetCommand(Some(key -> Some(TimeZone.getDefault.getID)))

Review comment:
   shall we use the `ZoneId`?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-13 Thread GitBox


SparkQA commented on pull request #27428:
URL: https://github.com/apache/spark/pull/27428#issuecomment-657441762


   **[Test build #125761 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125761/testReport)**
 for PR 27428 at commit 
[`5bbbfd7`](https://github.com/apache/spark/commit/5bbbfd771e7aa1cfa6c41ccced92c8f0cf760a4d).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE

2020-07-13 Thread GitBox


cloud-fan commented on a change in pull request #29064:
URL: https://github.com/apache/spark/pull/29064#discussion_r453518043



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
##
@@ -90,6 +93,51 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder(conf) {
 ResetCommand
   }
 
+  /**
+   * Create a [[SetCommand]] logical plan to set 
[[SQLConf.SESSION_LOCAL_TIMEZONE]]
+   * Example SQL :
+   * {{{
+   *   SET TIME ZONE LOCAL;
+   *   SET TIME ZONE 'Asia/Shanghai';
+   *   SET TIME ZONE INTERVAL 10 HOURS;
+   * }}}
+   */
+  override def visitSetTimeZone(ctx: SetTimeZoneContext): LogicalPlan = 
withOrigin(ctx) {
+val key = SQLConf.SESSION_LOCAL_TIMEZONE.key
+if (ctx.interval != null) {
+  val interval = parseIntervalLiteral(ctx.interval)
+  if (interval.months != 0 || interval.days != 0 ||
+math.abs(interval.microseconds) > 18 * 
DateTimeConstants.MICROS_PER_HOUR) {
+throw new ParseException("The interval value must be in the range of 
[-18, +18] hours",

Review comment:
   shall we also fail for things like `INTERVAL 1 MICROSECOND`?
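
   One possible shape for such a guard, reusing `interval`, `ctx` and `DateTimeConstants`
   from the hunk above (a sketch of the suggestion, not what the patch currently does):

   ```scala
   // Hypothetical extra check: zone offsets are whole seconds, so also reject
   // fractional-second intervals such as INTERVAL 1 MICROSECOND.
   if (interval.microseconds % DateTimeConstants.MICROS_PER_SECOND != 0) {
     throw new ParseException(
       "The interval value must be a whole number of seconds", ctx.interval())
   }
   ```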





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-13 Thread GitBox


cloud-fan commented on a change in pull request #27428:
URL: https://github.com/apache/spark/pull/27428#discussion_r453513546



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
##
@@ -102,23 +102,126 @@ import org.apache.spark.sql.types.IntegerType
  * {{{
  * Aggregate(
  *key = ['key]
- *functions = [count(if (('gid = 1)) 'cat1 else null),
- * count(if (('gid = 2)) 'cat2 else null),
+ *functions = [count(if (('gid = 1)) '_gen_attr_1 else null),
+ * count(if (('gid = 2)) '_gen_attr_2 else null),
  * first(if (('gid = 0)) 'total else null) ignore nulls]
  *output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
  *   Aggregate(
- *  key = ['key, 'cat1, 'cat2, 'gid]
- *  functions = [sum('value) with FILTER('id > 1)]
- *  output = ['key, 'cat1, 'cat2, 'gid, 'total])
+ *  key = ['key, '_gen_attr_1, '_gen_attr_2, 'gid]
+ *  functions = [sum('_gen_attr_3)]
+ *  output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, 'total])
  * Expand(
- *projections = [('key, null, null, 0, cast('value as bigint), 'id),
+ *projections = [('key, null, null, 0, if ('id > 1) cast('value as 
bigint) else null, 'id),
  *   ('key, 'cat1, null, 1, null, null),
  *   ('key, null, 'cat2, 2, null, null)]
- *output = ['key, 'cat1, 'cat2, 'gid, 'value, 'id])
+ *output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, '_gen_attr_3, 'id])
+ *   LocalTableScan [...]
+ * }}}
+ *
+ * Third example: a single distinct aggregate function with filter clauses and
+ * no other distinct aggregate functions (in SQL):
+ * {{{
+ *   SELECT
+ * COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt,
+ * SUM(value) AS total
+ *  FROM
+ *data
+ *  GROUP BY
+ *key
+ * }}}
+ *
+ * This translates to the following (pseudo) logical plan:
+ * {{{
+ * Aggregate(
+ *key = ['key]
+ *functions = [COUNT(DISTINCT 'cat1) with FILTER('id > 1),
+ * sum('value)]
+ *output = ['key, 'cat1_cnt, 'total])
+ *   LocalTableScan [...]

Review comment:
   Do we need to rewrite this query? The planner can handle single distinct 
agg func AFAIK.
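
   A quick way to inspect what the planner produces for the single-distinct case,
   assuming a `spark` session with a registered `data` table (both hypothetical here):

   ```scala
   // Prints the physical plan; with a single DISTINCT aggregate the planner
   // can use two-stage partial/final aggregation without the Expand-based rewrite.
   spark.sql(
     """SELECT key,
       |       COUNT(DISTINCT cat1) FILTER (WHERE id > 1) AS cat1_cnt,
       |       SUM(value) AS total
       |FROM data
       |GROUP BY key""".stripMargin).explain()
   ```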





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-13 Thread GitBox


cloud-fan commented on a change in pull request #27428:
URL: https://github.com/apache/spark/pull/27428#discussion_r453511010



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
##
@@ -102,23 +102,126 @@ import org.apache.spark.sql.types.IntegerType
  * {{{
  * Aggregate(
  *key = ['key]
- *functions = [count(if (('gid = 1)) 'cat1 else null),
- * count(if (('gid = 2)) 'cat2 else null),
+ *functions = [count(if (('gid = 1)) '_gen_attr_1 else null),
+ * count(if (('gid = 2)) '_gen_attr_2 else null),
  * first(if (('gid = 0)) 'total else null) ignore nulls]
  *output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
  *   Aggregate(
- *  key = ['key, 'cat1, 'cat2, 'gid]
- *  functions = [sum('value) with FILTER('id > 1)]
- *  output = ['key, 'cat1, 'cat2, 'gid, 'total])
+ *  key = ['key, '_gen_attr_1, '_gen_attr_2, 'gid]
+ *  functions = [sum('_gen_attr_3)]
+ *  output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, 'total])
  * Expand(
- *projections = [('key, null, null, 0, cast('value as bigint), 'id),
+ *projections = [('key, null, null, 0, if ('id > 1) cast('value as 
bigint) else null, 'id),
  *   ('key, 'cat1, null, 1, null, null),
  *   ('key, null, 'cat2, 2, null, null)]
- *output = ['key, 'cat1, 'cat2, 'gid, 'value, 'id])
+ *output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, '_gen_attr_3, 'id])
+ *   LocalTableScan [...]
+ * }}}
+ *
+ * Third example: a single distinct aggregate function with filter clauses and
+ * no other distinct aggregate functions (in SQL):
+ * {{{
+ *   SELECT
+ * COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt,
+ * SUM(value) AS total
+ *  FROM
+ *data
+ *  GROUP BY
+ *key
+ * }}}
+ *
+ * This translates to the following (pseudo) logical plan:
+ * {{{
+ * Aggregate(
+ *key = ['key]
+ *functions = [COUNT(DISTINCT 'cat1) with FILTER('id > 1),
+ * sum('value)]
+ *output = ['key, 'cat1_cnt, 'total])
+ *   LocalTableScan [...]
+ * }}}
+ *
+ * This rule rewrites this logical plan to the following (pseudo) logical plan:
+ * {{{
+ *   Aggregate(
+ *  key = ['key]
+ *  functions = [count('_gen_attr_1),
+  *  sum('_gen_attr_2)]
+ *  output = ['key, 'cat1_cnt, 'total])
+ * Project(
+ *projectionList = ['key, if ('id > 1) 'cat1 else null, cast('value as 
bigint)]
+ *output = ['key, '_gen_attr_1, '_gen_attr_2])
  *   LocalTableScan [...]
  * }}}
  *
- * The rule does the following things here:
+ * Fourth example: single distinct aggregate function with filter clauses (in SQL):
+ * {{{
+ *   SELECT
+ * COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt,
+ * COUNT(DISTINCT cat2) as cat2_cnt,
+ * SUM(value) AS total
+ *  FROM
+ *data
+ *  GROUP BY
+ *key
+ * }}}
+ *
+ * This translates to the following (pseudo) logical plan:
+ * {{{
+ * Aggregate(
+ *key = ['key]
+ *functions = [COUNT(DISTINCT 'cat1) with FILTER('id > 1),
+ * COUNT(DISTINCT 'cat2),
+ * sum('value)]
+ *output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
+ *   LocalTableScan [...]
+ * }}}
+ *
+ * This rule rewrites this logical plan to the following (pseudo) logical plan:
+ * {{{
+ *   Aggregate(
+ *  key = ['key]
+ *  functions = [count(if (('gid = 1)) '_gen_attr_1 else null),
+ *   count(if (('gid = 2)) '_gen_attr_2 else null),
+ *   first(if (('gid = 0)) 'total else null) ignore nulls]
+ *  output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
+ * Aggregate(
+ *key = ['key, '_gen_attr_1, '_gen_attr_2, 'gid]
+ *functions = [sum('_gen_attr_3)]
+ *output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, 'total])
+ *   Expand(
+ *  projections = [('key, null, null, 0, cast('value as bigint)),
+ * ('key, if ('id > 1) 'cat1 else null, null, 1, null),
+ * ('key, null, 'cat2, 2, null)]
+ *  output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, '_gen_attr_3])
+ * LocalTableScan [...]
+ * }}}
+ *
+ * The rule consists of two phases, as follows:
+ *
+ * In the first phase, if the aggregate query has filter clauses, project the output of
+ * the child of the aggregate query:
+ * 1. Project the data. There are three aggregation groups in this query:
+ *i. the non-distinct group;
+ *ii. the distinct 'cat1 group;
+ *iii. the distinct 'cat2 group with filter clause.
+ *Because there is at least one group with filter clause (e.g. the 
distinct 'cat2
+ *group with filter clause), then will project the data.
+ * 2. Avoid projections that may output the same attributes. There are three 
aggregation groups
+ *in this query:
+ *i. the non-distinct 'cat1 group;
+ *ii. the distinct 

[GitHub] [spark] cloud-fan commented on a change in pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-13 Thread GitBox


cloud-fan commented on a change in pull request #27428:
URL: https://github.com/apache/spark/pull/27428#discussion_r453510469



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
##
@@ -102,23 +102,126 @@ import org.apache.spark.sql.types.IntegerType
  * {{{
  * Aggregate(
  *key = ['key]
- *functions = [count(if (('gid = 1)) 'cat1 else null),
- * count(if (('gid = 2)) 'cat2 else null),
+ *functions = [count(if (('gid = 1)) '_gen_attr_1 else null),
+ * count(if (('gid = 2)) '_gen_attr_2 else null),
  * first(if (('gid = 0)) 'total else null) ignore nulls]
  *output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
  *   Aggregate(
- *  key = ['key, 'cat1, 'cat2, 'gid]
- *  functions = [sum('value) with FILTER('id > 1)]
- *  output = ['key, 'cat1, 'cat2, 'gid, 'total])
+ *  key = ['key, '_gen_attr_1, '_gen_attr_2, 'gid]
+ *  functions = [sum('_gen_attr_3)]
+ *  output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, 'total])
  * Expand(
- *projections = [('key, null, null, 0, cast('value as bigint), 'id),
+ *projections = [('key, null, null, 0, if ('id > 1) cast('value as 
bigint) else null, 'id),
  *   ('key, 'cat1, null, 1, null, null),
  *   ('key, null, 'cat2, 2, null, null)]
- *output = ['key, 'cat1, 'cat2, 'gid, 'value, 'id])
+ *output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, '_gen_attr_3, 'id])
+ *   LocalTableScan [...]
+ * }}}
+ *
+ * Third example: a single distinct aggregate function with filter clauses and
+ * no other distinct aggregate functions (in SQL):
+ * {{{
+ *   SELECT
+ * COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt,
+ * SUM(value) AS total
+ *  FROM
+ *data
+ *  GROUP BY
+ *key
+ * }}}
+ *
+ * This translates to the following (pseudo) logical plan:
+ * {{{
+ * Aggregate(
+ *key = ['key]
+ *functions = [COUNT(DISTINCT 'cat1) with FILTER('id > 1),
+ * sum('value)]
+ *output = ['key, 'cat1_cnt, 'total])
+ *   LocalTableScan [...]
+ * }}}
+ *
+ * This rule rewrites this logical plan to the following (pseudo) logical plan:
+ * {{{
+ *   Aggregate(
+ *  key = ['key]
+ *  functions = [count('_gen_attr_1),
+  *  sum('_gen_attr_2)]
+ *  output = ['key, 'cat1_cnt, 'total])
+ * Project(
+ *projectionList = ['key, if ('id > 1) 'cat1 else null, cast('value as 
bigint)]
+ *output = ['key, '_gen_attr_1, '_gen_attr_2])
  *   LocalTableScan [...]
  * }}}
  *
- * The rule does the following things here:
+ * Fourth example: single distinct aggregate function with filter clauses (in SQL):
+ * {{{
+ *   SELECT
+ * COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt,
+ * COUNT(DISTINCT cat2) as cat2_cnt,
+ * SUM(value) AS total
+ *  FROM
+ *data
+ *  GROUP BY
+ *key
+ * }}}
+ *
+ * This translates to the following (pseudo) logical plan:
+ * {{{
+ * Aggregate(
+ *key = ['key]
+ *functions = [COUNT(DISTINCT 'cat1) with FILTER('id > 1),
+ * COUNT(DISTINCT 'cat2),
+ * sum('value)]
+ *output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
+ *   LocalTableScan [...]
+ * }}}
+ *
+ * This rule rewrites this logical plan to the following (pseudo) logical plan:
+ * {{{
+ *   Aggregate(
+ *  key = ['key]
+ *  functions = [count(if (('gid = 1)) '_gen_attr_1 else null),
+ *   count(if (('gid = 2)) '_gen_attr_2 else null),
+ *   first(if (('gid = 0)) 'total else null) ignore nulls]
+ *  output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
+ * Aggregate(
+ *key = ['key, '_gen_attr_1, '_gen_attr_2, 'gid]
+ *functions = [sum('_gen_attr_3)]
+ *output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, 'total])
+ *   Expand(
+ *  projections = [('key, null, null, 0, cast('value as bigint)),
+ * ('key, if ('id > 1) 'cat1 else null, null, 1, null),
+ * ('key, null, 'cat2, 2, null)]
+ *  output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, '_gen_attr_3])
+ * LocalTableScan [...]
+ * }}}
+ *
+ * The rule consists of two phases, as follows:
+ *
+ * In the first phase, if the aggregate query has filter clauses, project the output of
+ * the child of the aggregate query:
+ * 1. Project the data. There are three aggregation groups in this query:
+ *i. the non-distinct group;
+ *ii. the distinct 'cat1 group;
+ *iii. the distinct 'cat2 group with filter clause.
+ *Because there is at least one group with filter clause (e.g. the 
distinct 'cat2

Review comment:
   When at least one aggregate function has a filter clause, we add a project
   node on the input plan.

##
File path: 

[GitHub] [spark] cloud-fan commented on a change in pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-13 Thread GitBox


cloud-fan commented on a change in pull request #27428:
URL: https://github.com/apache/spark/pull/27428#discussion_r453509648



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
##
@@ -102,23 +102,126 @@ import org.apache.spark.sql.types.IntegerType
  * {{{
  * Aggregate(
  *key = ['key]
- *functions = [count(if (('gid = 1)) 'cat1 else null),
- * count(if (('gid = 2)) 'cat2 else null),
+ *functions = [count(if (('gid = 1)) '_gen_attr_1 else null),
+ * count(if (('gid = 2)) '_gen_attr_2 else null),
  * first(if (('gid = 0)) 'total else null) ignore nulls]
  *output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
  *   Aggregate(
- *  key = ['key, 'cat1, 'cat2, 'gid]
- *  functions = [sum('value) with FILTER('id > 1)]
- *  output = ['key, 'cat1, 'cat2, 'gid, 'total])
+ *  key = ['key, '_gen_attr_1, '_gen_attr_2, 'gid]
+ *  functions = [sum('_gen_attr_3)]
+ *  output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, 'total])
  * Expand(
- *projections = [('key, null, null, 0, cast('value as bigint), 'id),
+ *projections = [('key, null, null, 0, if ('id > 1) cast('value as 
bigint) else null, 'id),
  *   ('key, 'cat1, null, 1, null, null),
  *   ('key, null, 'cat2, 2, null, null)]
- *output = ['key, 'cat1, 'cat2, 'gid, 'value, 'id])
+ *output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, '_gen_attr_3, 'id])
+ *   LocalTableScan [...]
+ * }}}
+ *
+ * Third example: a single distinct aggregate function with filter clauses and
+ * no other distinct aggregate functions (in SQL):
+ * {{{
+ *   SELECT
+ * COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt,
+ * SUM(value) AS total
+ *  FROM
+ *data
+ *  GROUP BY
+ *key
+ * }}}
+ *
+ * This translates to the following (pseudo) logical plan:
+ * {{{
+ * Aggregate(
+ *key = ['key]
+ *functions = [COUNT(DISTINCT 'cat1) with FILTER('id > 1),
+ * sum('value)]
+ *output = ['key, 'cat1_cnt, 'total])
+ *   LocalTableScan [...]
+ * }}}
+ *
+ * This rule rewrites this logical plan to the following (pseudo) logical plan:
+ * {{{
+ *   Aggregate(
+ *  key = ['key]
+ *  functions = [count('_gen_attr_1),
+  *  sum('_gen_attr_2)]
+ *  output = ['key, 'cat1_cnt, 'total])
+ * Project(
+ *projectionList = ['key, if ('id > 1) 'cat1 else null, cast('value as 
bigint)]
+ *output = ['key, '_gen_attr_1, '_gen_attr_2])
  *   LocalTableScan [...]
  * }}}
  *
- * The rule does the following things here:
+ * Fourth example: single distinct aggregate function with filter clauses (in SQL):
+ * {{{
+ *   SELECT
+ * COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt,
+ * COUNT(DISTINCT cat2) as cat2_cnt,
+ * SUM(value) AS total
+ *  FROM
+ *data
+ *  GROUP BY
+ *key
+ * }}}
+ *
+ * This translates to the following (pseudo) logical plan:
+ * {{{
+ * Aggregate(
+ *key = ['key]
+ *functions = [COUNT(DISTINCT 'cat1) with FILTER('id > 1),
+ * COUNT(DISTINCT 'cat2),
+ * sum('value)]
+ *output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
+ *   LocalTableScan [...]
+ * }}}
+ *
+ * This rule rewrites this logical plan to the following (pseudo) logical plan:
+ * {{{
+ *   Aggregate(
+ *  key = ['key]
+ *  functions = [count(if (('gid = 1)) '_gen_attr_1 else null),
+ *   count(if (('gid = 2)) '_gen_attr_2 else null),
+ *   first(if (('gid = 0)) 'total else null) ignore nulls]
+ *  output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
+ * Aggregate(
+ *key = ['key, '_gen_attr_1, '_gen_attr_2, 'gid]
+ *functions = [sum('_gen_attr_3)]
+ *output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, 'total])
+ *   Expand(
+ *  projections = [('key, null, null, 0, cast('value as bigint)),
+ * ('key, if ('id > 1) 'cat1 else null, null, 1, null),
+ * ('key, null, 'cat2, 2, null)]
+ *  output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, '_gen_attr_3])
+ * LocalTableScan [...]
+ * }}}
+ *
+ * The rule consists of two phases, as follows:
+ *
+ * In the first phase, if the aggregate query has filter clauses, project the output of
+ * the child of the aggregate query:
+ * 1. Project the data. There are three aggregation groups in this query:
+ *i. the non-distinct group;
+ *ii. the distinct 'cat1 group;
+ *iii. the distinct 'cat2 group with filter clause.

Review comment:
   This doesn't match the group. Maybe just make it general `the distinct 
group without filter clause` and `the distinct group with filter clause`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

[GitHub] [spark] cloud-fan commented on a change in pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-13 Thread GitBox


cloud-fan commented on a change in pull request #27428:
URL: https://github.com/apache/spark/pull/27428#discussion_r453508040



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
##
@@ -102,23 +102,126 @@ import org.apache.spark.sql.types.IntegerType
  * {{{
  * Aggregate(
  *key = ['key]
- *functions = [count(if (('gid = 1)) 'cat1 else null),
- * count(if (('gid = 2)) 'cat2 else null),
+ *functions = [count(if (('gid = 1)) '_gen_attr_1 else null),
+ * count(if (('gid = 2)) '_gen_attr_2 else null),
  * first(if (('gid = 0)) 'total else null) ignore nulls]
  *output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
  *   Aggregate(
- *  key = ['key, 'cat1, 'cat2, 'gid]
- *  functions = [sum('value) with FILTER('id > 1)]
- *  output = ['key, 'cat1, 'cat2, 'gid, 'total])
+ *  key = ['key, '_gen_attr_1, '_gen_attr_2, 'gid]
+ *  functions = [sum('_gen_attr_3)]
+ *  output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, 'total])
  * Expand(
- *projections = [('key, null, null, 0, cast('value as bigint), 'id),
+ *projections = [('key, null, null, 0, if ('id > 1) cast('value as 
bigint) else null, 'id),
  *   ('key, 'cat1, null, 1, null, null),
  *   ('key, null, 'cat2, 2, null, null)]
- *output = ['key, 'cat1, 'cat2, 'gid, 'value, 'id])
+ *output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, '_gen_attr_3, 'id])
+ *   LocalTableScan [...]
+ * }}}
+ *
+ * Third example: a single distinct aggregate function with filter clauses and
+ * no other distinct aggregate functions (in SQL):
+ * {{{
+ *   SELECT
+ * COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt,
+ * SUM(value) AS total
+ *  FROM
+ *data
+ *  GROUP BY
+ *key
+ * }}}
+ *
+ * This translates to the following (pseudo) logical plan:
+ * {{{
+ * Aggregate(
+ *key = ['key]
+ *functions = [COUNT(DISTINCT 'cat1) with FILTER('id > 1),
+ * sum('value)]
+ *output = ['key, 'cat1_cnt, 'total])
+ *   LocalTableScan [...]
+ * }}}
+ *
+ * This rule rewrites this logical plan to the following (pseudo) logical plan:
+ * {{{
+ *   Aggregate(
+ *  key = ['key]
+ *  functions = [count('_gen_attr_1),
+  *  sum('_gen_attr_2)]
+ *  output = ['key, 'cat1_cnt, 'total])
+ * Project(
+ *projectionList = ['key, if ('id > 1) 'cat1 else null, cast('value as 
bigint)]
+ *output = ['key, '_gen_attr_1, '_gen_attr_2])
  *   LocalTableScan [...]
  * }}}
  *
- * The rule does the following things here:
+ * Fourth example: single distinct aggregate function with filter clauses (in SQL):

Review comment:
   `single`  -> `more than one`?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29084: [SPARK-32220][SQL][FOLLOW-UP]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

2020-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29084:
URL: https://github.com/apache/spark/pull/29084#issuecomment-657424545







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-657424743







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE

2020-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29064:
URL: https://github.com/apache/spark/pull/29064#issuecomment-657424673







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29075: [SPARK-32284][SQL] Avoid expanding too many CNF predicates in partition pruning

2020-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29075:
URL: https://github.com/apache/spark/pull/29075#issuecomment-657424693







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-13 Thread GitBox


AmplabJenkins commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-657424743







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29084: [SPARK-32220][SQL][FOLLOW-UP]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

2020-07-13 Thread GitBox


AmplabJenkins commented on pull request #29084:
URL: https://github.com/apache/spark/pull/29084#issuecomment-657424545







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29075: [SPARK-32284][SQL] Avoid expanding too many CNF predicates in partition pruning

2020-07-13 Thread GitBox


AmplabJenkins commented on pull request #29075:
URL: https://github.com/apache/spark/pull/29075#issuecomment-657424693







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE

2020-07-13 Thread GitBox


AmplabJenkins commented on pull request #29064:
URL: https://github.com/apache/spark/pull/29064#issuecomment-657424673







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE

2020-07-13 Thread GitBox


SparkQA commented on pull request #29064:
URL: https://github.com/apache/spark/pull/29064#issuecomment-657423952


   **[Test build #125759 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125759/testReport)**
 for PR 29064 at commit 
[`95a756a`](https://github.com/apache/spark/commit/95a756a435377edf5e4b7d9f0de7863d7ed6f9a5).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-13 Thread GitBox


SparkQA commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-657423953


   **[Test build #125760 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125760/testReport)**
 for PR 29045 at commit 
[`4469e6d`](https://github.com/apache/spark/commit/4469e6d1c5970bc5de3d4a4dc728e9775e43e141).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29084: [SPARK-32220][SQL][FOLLOW-UP]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

2020-07-13 Thread GitBox


SparkQA commented on pull request #29084:
URL: https://github.com/apache/spark/pull/29084#issuecomment-657423819


   **[Test build #125757 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125757/testReport)**
 for PR 29084 at commit 
[`d24dbed`](https://github.com/apache/spark/commit/d24dbede2b50707e30bd8261bf324e9f9cc731b4).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29075: [SPARK-32284][SQL] Avoid expanding too many CNF predicates in partition pruning

2020-07-13 Thread GitBox


SparkQA commented on pull request #29075:
URL: https://github.com/apache/spark/pull/29075#issuecomment-657423860


   **[Test build #125758 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125758/testReport)**
 for PR 29075 at commit 
[`6fe106c`](https://github.com/apache/spark/commit/6fe106ce84e83641d66d572c45050b25761bbf3d).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #27983: [SPARK-32105][SQL]Refactor current ScriptTransformationExec code

2020-07-13 Thread GitBox


cloud-fan commented on pull request #27983:
URL: https://github.com/apache/spark/pull/27983#issuecomment-657422187


   thanks, merging to master!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan closed pull request #27983: [SPARK-32105][SQL]Refactor current ScriptTransformationExec code

2020-07-13 Thread GitBox


cloud-fan closed pull request #27983:
URL: https://github.com/apache/spark/pull/27983


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29084: [SPARK-32220][SQL][FOLLOW-UP]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

2020-07-13 Thread GitBox


AngersZhuuuu commented on a change in pull request #29084:
URL: https://github.com/apache/spark/pull/29084#discussion_r453496821



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
##
@@ -191,15 +191,19 @@ abstract class SparkStrategies extends 
QueryPlanner[SparkPlan] {
 def createSortMergeJoin() = {
   if (RowOrdering.isOrderable(leftKeys)) {
 Some(Seq(joins.SortMergeJoinExec(
-  leftKeys, rightKeys, joinType, condition, planLater(left), 
planLater(right
+  leftKeys, rightKeys, joinType, nonEquiCond, planLater(left), 
planLater(right
   } else {
 None
   }
 }
 
 def createCartesianProduct() = {
   if (joinType.isInstanceOf[InnerLike]) {
-Some(Seq(joins.CartesianProductExec(planLater(left), 
planLater(right), p.condition)))
+// Since for join hint [[SHUFFLE_REPLICATE_NL]] when having equi 
join conditions,
+// we still choose Cartesian Product Join, but in 
ExtractEquiJoinKeys, it will filter
+// out equi condition. Instead of using the condition extracted by 
ExtractEquiJoinKeys,
+// we should use the original join condition "j.condition".

Review comment:
   > how about:
   > 
   > ```
   > `CartesianProductExec` can't implicitly evaluate the equal-join condition; here we should
   > pass the original condition, which includes both equal and non-equal conditions.
   > ```
   
   Updated





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29084: [SPARK-32220][SQL][FOLLOW-UP]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

2020-07-13 Thread GitBox


cloud-fan commented on a change in pull request #29084:
URL: https://github.com/apache/spark/pull/29084#discussion_r453480891



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
##
@@ -191,15 +191,19 @@ abstract class SparkStrategies extends 
QueryPlanner[SparkPlan] {
 def createSortMergeJoin() = {
   if (RowOrdering.isOrderable(leftKeys)) {
 Some(Seq(joins.SortMergeJoinExec(
-  leftKeys, rightKeys, joinType, condition, planLater(left), 
planLater(right
+  leftKeys, rightKeys, joinType, nonEquiCond, planLater(left), 
planLater(right
   } else {
 None
   }
 }
 
 def createCartesianProduct() = {
   if (joinType.isInstanceOf[InnerLike]) {
-Some(Seq(joins.CartesianProductExec(planLater(left), 
planLater(right), p.condition)))
+// Since for join hint [[SHUFFLE_REPLICATE_NL]] when having equi 
join conditions,
+// we still choose Cartesian Product Join, but in 
ExtractEquiJoinKeys, it will filter
+// out equi condition. Instead of using the condition extracted by 
ExtractEquiJoinKeys,
+// we should use the original join condition "j.condition".

Review comment:
   how about:
   ```
   `CartesianProductExec` can't implicitly evaluate the equal-join condition; here we should
   pass the original condition, which includes both equal and non-equal conditions.
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #27428:
URL: https://github.com/apache/spark/pull/27428#issuecomment-657404261







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-13 Thread GitBox


AmplabJenkins commented on pull request #27428:
URL: https://github.com/apache/spark/pull/27428#issuecomment-657404261







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28957: [SPARK-32138] Drop Python 2.7, 3.4 and 3.5

2020-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #28957:
URL: https://github.com/apache/spark/pull/28957#issuecomment-657404294







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28957: [SPARK-32138] Drop Python 2.7, 3.4 and 3.5

2020-07-13 Thread GitBox


AmplabJenkins commented on pull request #28957:
URL: https://github.com/apache/spark/pull/28957#issuecomment-657404294







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28957: [SPARK-32138] Drop Python 2.7, 3.4 and 3.5

2020-07-13 Thread GitBox


SparkQA commented on pull request #28957:
URL: https://github.com/apache/spark/pull/28957#issuecomment-657403743


   **[Test build #125755 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125755/testReport)**
 for PR 28957 at commit 
[`5c27ea8`](https://github.com/apache/spark/commit/5c27ea8c6c3bbf0c86e539c0002aa24ec18f591d).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-13 Thread GitBox


SparkQA commented on pull request #27428:
URL: https://github.com/apache/spark/pull/27428#issuecomment-657403782


   **[Test build #125756 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125756/testReport)**
 for PR 27428 at commit 
[`d531864`](https://github.com/apache/spark/commit/d531864c7da4244674377369854ab2f02b5acfa2).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dbtsai commented on pull request #29057: [SPARK-32245][INFRA] Run Spark tests in Github Actions

2020-07-13 Thread GitBox


dbtsai commented on pull request #29057:
URL: https://github.com/apache/spark/pull/29057#issuecomment-657401623


   @HyukjinKwon and @dongjoon-hyun it seems there are plugins for GitHub Actions to
show the failing tests, for example https://github.com/check-run-reporter/action.
This might help people read the test results.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dbtsai edited a comment on pull request #29057: [SPARK-32245][INFRA] Run Spark tests in Github Actions

2020-07-13 Thread GitBox


dbtsai edited a comment on pull request #29057:
URL: https://github.com/apache/spark/pull/29057#issuecomment-657401623


   Great job! @HyukjinKwon and @dongjoon-hyun it seems there are plugins for GitHub
Actions to show the failing tests, for example
https://github.com/check-run-reporter/action. This might help people read the
test results.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29084: [SPARK-32220][SQL][FOLLOW-UP]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

2020-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29084:
URL: https://github.com/apache/spark/pull/29084#issuecomment-657400982







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29084: [SPARK-32220][SQL][FOLLOW-UP]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

2020-07-13 Thread GitBox


AmplabJenkins commented on pull request #29084:
URL: https://github.com/apache/spark/pull/29084#issuecomment-657400982







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29084: [SPARK-32220][SQL][FOLLOW-UP]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

2020-07-13 Thread GitBox


SparkQA commented on pull request #29084:
URL: https://github.com/apache/spark/pull/29084#issuecomment-657400399


   **[Test build #125754 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125754/testReport)**
 for PR 29084 at commit 
[`f8774ec`](https://github.com/apache/spark/commit/f8774ecc0e5576b4c42d8cca8759c586dbe7c000).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu opened a new pull request #29084: [SPARK-32220][SQL][FOLLOW-UP]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

2020-07-13 Thread GitBox


AngersZhuuuu opened a new pull request #29084:
URL: https://github.com/apache/spark/pull/29084


   ### What changes were proposed in this pull request?
   Follow-up to the review comment
   https://github.com/apache/spark/pull/29035#discussion_r453468999
   to add the explanatory comment requested there.
   
   
   ### Why are the changes needed?
   Adds an explanatory code comment; no functional change.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Not needed.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

2020-07-13 Thread GitBox


AngersZhuuuu commented on a change in pull request #29035:
URL: https://github.com/apache/spark/pull/29035#discussion_r453473254



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
##
@@ -199,7 +199,7 @@ abstract class SparkStrategies extends 
QueryPlanner[SparkPlan] {
 
 def createCartesianProduct() = {
   if (joinType.isInstanceOf[InnerLike]) {
-Some(Seq(joins.CartesianProductExec(planLater(left), 
planLater(right), condition)))
+Some(Seq(joins.CartesianProductExec(planLater(left), 
planLater(right), p.condition)))

Review comment:
   > We need to write a comment above this line and explain what it is 
doing.
   > // instead of using the condition extracted by ExtractEquiJoinKeys, we 
should use the original join condition, i.e., "p.condition".
   
   Raised a PR: https://github.com/apache/spark/pull/29084





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on pull request #29084: [SPARK-32220][SQL][FOLLOW-UP]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

2020-07-13 Thread GitBox


AngersZhuuuu commented on pull request #29084:
URL: https://github.com/apache/spark/pull/29084#issuecomment-657398681


   cc @dongjoon-hyun @gatorsmile 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29018: [SPARK-32202][ML][WIP] tree models auto infer compact integer type

2020-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29018:
URL: https://github.com/apache/spark/pull/29018#issuecomment-657394198







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29018: [SPARK-32202][ML][WIP] tree models auto infer compact integer type

2020-07-13 Thread GitBox


AmplabJenkins commented on pull request #29018:
URL: https://github.com/apache/spark/pull/29018#issuecomment-657394198







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29018: [SPARK-32202][ML][WIP] tree models auto infer compact integer type

2020-07-13 Thread GitBox


SparkQA commented on pull request #29018:
URL: https://github.com/apache/spark/pull/29018#issuecomment-657394110


   **[Test build #125753 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125753/testReport)** for PR 29018 at commit [`19e4bcf`](https://github.com/apache/spark/commit/19e4bcfac6d876afd289b736bc19008032564c2d).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gatorsmile commented on a change in pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

2020-07-13 Thread GitBox


gatorsmile commented on a change in pull request #29035:
URL: https://github.com/apache/spark/pull/29035#discussion_r453469535



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
##
@@ -159,7 +159,7 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
   //   4. Pick cartesian product if join type is inner like.
   //   5. Pick broadcast nested loop join as the final solution. It may OOM but we don't have
   //  other choice.
-  case ExtractEquiJoinKeys(joinType, leftKeys, rightKeys, condition, left, right, hint) =>
+  case p @ ExtractEquiJoinKeys(joinType, leftKeys, rightKeys, condition, left, right, hint) =>

Review comment:
   To avoid making similar mistakes, we need to rename `condition` to a self-descriptive name. "otherConditions"? It is a little hard to name, TBH.
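   
   As a reference point, a sketch of the return shape being discussed (simplified; the actual Catalyst definition may differ) showing why the fourth element reads ambiguously under the name `condition`:
   
   ```scala
   import org.apache.spark.sql.catalyst.expressions.Expression
   import org.apache.spark.sql.catalyst.plans.JoinType
   import org.apache.spark.sql.catalyst.plans.logical.{JoinHint, LogicalPlan}
   
   object ExtractEquiJoinKeysShape {
     // Simplified model of what the extractor binds: the Option[Expression]
     // holds only the predicates left over AFTER the equi-join keys are
     // split out, so a name like `otherCondition` would be self-descriptive.
     type ReturnType =
       (JoinType, Seq[Expression], Seq[Expression],  // joinType, leftKeys, rightKeys
         Option[Expression],                         // residual (non-equi) predicates
         LogicalPlan, LogicalPlan, JoinHint)         // left, right, hint
   }
   ```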
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-657393312


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125750/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #28363:
URL: https://github.com/apache/spark/pull/28363#issuecomment-657393281


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125751/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29075: [SPARK-32284][SQL] Avoid expanding too many CNF predicates in partition pruning

2020-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29075:
URL: https://github.com/apache/spark/pull/29075#issuecomment-657393342


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125746/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-13 Thread GitBox


SparkQA removed a comment on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-657361898







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-657393265







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-07-13 Thread GitBox


SparkQA removed a comment on pull request #28363:
URL: https://github.com/apache/spark/pull/28363#issuecomment-657367971


   **[Test build #125751 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125751/testReport)** for PR 28363 at commit [`b648156`](https://github.com/apache/spark/commit/b64815622bb4e8cd8b474cb2983f2c9b78ed9342).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #28363:
URL: https://github.com/apache/spark/pull/28363#issuecomment-657393278


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29075: [SPARK-32284][SQL] Avoid expanding too many CNF predicates in partition pruning

2020-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29075:
URL: https://github.com/apache/spark/pull/29075#issuecomment-657393340


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #29075: [SPARK-32284][SQL] Avoid expanding too many CNF predicates in partition pruning

2020-07-13 Thread GitBox


SparkQA removed a comment on pull request #29075:
URL: https://github.com/apache/spark/pull/29075#issuecomment-657337043


   **[Test build #125746 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125746/testReport)** for PR 29075 at commit [`df08390`](https://github.com/apache/spark/commit/df083906f2d9938d77b4b41878271af1826b5220).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


