[GitHub] [spark] beliefer commented on a change in pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
beliefer commented on a change in pull request #27428: URL: https://github.com/apache/spark/pull/27428#discussion_r453544249 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ## @@ -102,23 +102,126 @@ import org.apache.spark.sql.types.IntegerType * {{{ * Aggregate( *key = ['key] - *functions = [count(if (('gid = 1)) 'cat1 else null), - * count(if (('gid = 2)) 'cat2 else null), + *functions = [count(if (('gid = 1)) '_gen_attr_1 else null), + * count(if (('gid = 2)) '_gen_attr_2 else null), * first(if (('gid = 0)) 'total else null) ignore nulls] *output = ['key, 'cat1_cnt, 'cat2_cnt, 'total]) * Aggregate( - * key = ['key, 'cat1, 'cat2, 'gid] - * functions = [sum('value) with FILTER('id > 1)] - * output = ['key, 'cat1, 'cat2, 'gid, 'total]) + * key = ['key, '_gen_attr_1, '_gen_attr_2, 'gid] + * functions = [sum('_gen_attr_3)] + * output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, 'total]) * Expand( - *projections = [('key, null, null, 0, cast('value as bigint), 'id), + *projections = [('key, null, null, 0, if ('id > 1) cast('value as bigint) else null, 'id), * ('key, 'cat1, null, 1, null, null), * ('key, null, 'cat2, 2, null, null)] - *output = ['key, 'cat1, 'cat2, 'gid, 'value, 'id]) + *output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, '_gen_attr_3, 'id]) + * LocalTableScan [...] + * }}} + * + * Third example: single distinct aggregate function with filter clauses and have + * not other distinct aggregate function (in sql): + * {{{ + * SELECT + * COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt, + * SUM(value) AS total + * FROM + *data + * GROUP BY + *key + * }}} + * + * This translates to the following (pseudo) logical plan: + * {{{ + * Aggregate( + *key = ['key] + *functions = [COUNT(DISTINCT 'cat1) with FILTER('id > 1), + * sum('value)] + *output = ['key, 'cat1_cnt, 'total]) + * LocalTableScan [...] Review comment: I think I can keep my previous behavior. `AggregationIterator` has already done this.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
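The "third example" in the quoted doc comment (a single `COUNT(DISTINCT cat1) FILTER (WHERE id > 1)` plus `SUM(value)`) can be simulated in plain Python to show why folding the filter into a projection is enough when there is no other distinct aggregate: project `_gen_attr_1 = cat1` only when `id > 1` (else null), then run an ordinary distinct count over the projected column. This is a minimal sketch of the semantics, not Spark code; the sample rows are invented for illustration.

```python
rows = [
    # (key, cat1, value, id)
    ("a", "x", 10, 1),
    ("a", "x", 20, 2),
    ("a", "y", 30, 3),
    ("b", "x", 40, 1),
]

# Project phase: fold the FILTER predicate into the projection,
# mirroring projectionList = ['key, if ('id > 1) 'cat1 else null, 'value].
projected = [
    (key, cat1 if id_ > 1 else None, value)  # (key, _gen_attr_1, _gen_attr_2)
    for (key, cat1, value, id_) in rows
]

# Aggregate phase: a single pass; no FILTER is needed any more because
# filtered-out rows carry None, which count(DISTINCT ...) ignores.
result = {}
for key, gen1, gen2 in projected:
    distinct, total = result.setdefault(key, (set(), 0))
    if gen1 is not None:
        distinct.add(gen1)
    result[key] = (distinct, total + gen2)

summary = {k: (len(d), t) for k, (d, t) in result.items()}
print(summary)  # {'a': (2, 60), 'b': (0, 40)}
```

For key `a`, only the rows with `id > 1` contribute cat1 values (`x` and `y`), while `SUM(value)` still sees every row.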
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.
AmplabJenkins removed a comment on pull request #27735: URL: https://github.com/apache/spark/pull/27735#issuecomment-657468027
[GitHub] [spark] SparkQA commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.
SparkQA commented on pull request #27735: URL: https://github.com/apache/spark/pull/27735#issuecomment-657467768 **[Test build #125766 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125766/testReport)** for PR 27735 at commit [`19504d6`](https://github.com/apache/spark/commit/19504d688486dc0ea7764b721b25771388bb20d5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] beliefer commented on a change in pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
beliefer commented on a change in pull request #27428: URL: https://github.com/apache/spark/pull/27428#discussion_r453541715 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ## @@ -102,23 +102,126 @@ import org.apache.spark.sql.types.IntegerType * {{{ * Aggregate( *key = ['key] - *functions = [count(if (('gid = 1)) 'cat1 else null), - * count(if (('gid = 2)) 'cat2 else null), + *functions = [count(if (('gid = 1)) '_gen_attr_1 else null), + * count(if (('gid = 2)) '_gen_attr_2 else null), * first(if (('gid = 0)) 'total else null) ignore nulls] *output = ['key, 'cat1_cnt, 'cat2_cnt, 'total]) * Aggregate( - * key = ['key, 'cat1, 'cat2, 'gid] - * functions = [sum('value) with FILTER('id > 1)] - * output = ['key, 'cat1, 'cat2, 'gid, 'total]) + * key = ['key, '_gen_attr_1, '_gen_attr_2, 'gid] + * functions = [sum('_gen_attr_3)] + * output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, 'total]) * Expand( - *projections = [('key, null, null, 0, cast('value as bigint), 'id), + *projections = [('key, null, null, 0, if ('id > 1) cast('value as bigint) else null, 'id), * ('key, 'cat1, null, 1, null, null), * ('key, null, 'cat2, 2, null, null)] - *output = ['key, 'cat1, 'cat2, 'gid, 'value, 'id]) + *output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, '_gen_attr_3, 'id]) + * LocalTableScan [...] + * }}} + * + * Third example: single distinct aggregate function with filter clauses and have + * not other distinct aggregate function (in sql): + * {{{ + * SELECT + * COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt, + * SUM(value) AS total + * FROM + *data + * GROUP BY + *key + * }}} + * + * This translates to the following (pseudo) logical plan: + * {{{ + * Aggregate( + *key = ['key] + *functions = [COUNT(DISTINCT 'cat1) with FILTER('id > 1), + * sum('value)] + *output = ['key, 'cat1_cnt, 'total]) + * LocalTableScan [...] 
+ * }}} + * + * This rule rewrites this logical plan to the following (pseudo) logical plan: + * {{{ + * Aggregate( + * key = ['key] + * functions = [count('_gen_attr_1), + * sum('_gen_attr_2)] + * output = ['key, 'cat1_cnt, 'total]) + * Project( + *projectionList = ['key, if ('id > 1) 'cat1 else null, cast('value as bigint)] + *output = ['key, '_gen_attr_1, '_gen_attr_2]) * LocalTableScan [...] * }}} * - * The rule does the following things here: + * Four example: single distinct aggregate function with filter clauses (in sql): + * {{{ + * SELECT + * COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt, + * COUNT(DISTINCT cat2) as cat2_cnt, + * SUM(value) AS total + * FROM + *data + * GROUP BY + *key + * }}} + * + * This translates to the following (pseudo) logical plan: + * {{{ + * Aggregate( + *key = ['key] + *functions = [COUNT(DISTINCT 'cat1) with FILTER('id > 1), + * COUNT(DISTINCT 'cat2), + * sum('value)] + *output = ['key, 'cat1_cnt, 'cat2_cnt, 'total]) + * LocalTableScan [...] + * }}} + * + * This rule rewrites this logical plan to the following (pseudo) logical plan: + * {{{ + * Aggregate( + * key = ['key] + * functions = [count(if (('gid = 1)) '_gen_attr_1 else null), + * count(if (('gid = 2)) '_gen_attr_2 else null), + * first(if (('gid = 0)) 'total else null) ignore nulls] + * output = ['key, 'cat1_cnt, 'cat2_cnt, 'total]) + * Aggregate( + *key = ['key, '_gen_attr_1, '_gen_attr_2, 'gid] + *functions = [sum('_gen_attr_3)] + *output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, 'total]) + * Expand( + * projections = [('key, null, null, 0, cast('value as bigint)), + * ('key, if ('id > 1) 'cat1 else null, null, 1, null), + * ('key, null, 'cat2, 2, null)] + * output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, '_gen_attr_3]) + * LocalTableScan [...] + * }}} + * + * The rule consists of the two phases as follows: + * + * In the first phase, if the aggregate query exists filter clauses, project the output of + * the child of the aggregate query: + * 1. 
Project the data. There are three aggregation groups in this query: + *i. the non-distinct group; + *ii. the distinct 'cat1 group; + *iii. the distinct 'cat2 group with filter clause. + *Because there is at least one group with filter clause (e.g. the distinct 'cat2 + *group with filter clause), then will project the data. + * 2. Avoid projections that may output the same attributes. There are three aggregation groups + *in this query: + *i. the non-distinct 'cat1 group; + *ii. the distinct
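The "fourth example" above (two distinct aggregates, one of them with a filter clause) needs the Expand/gid machinery rather than a plain projection. A plain-Python walk-through of that pseudo plan: each input row is expanded into three rows tagged with a grouping id, the filter is folded into the gid=1 projection, and two aggregation passes recombine the groups. This is a hand-rolled sketch of the semantics only; the `gid` values and `_gen_attr_*` slots mirror the quoted doc, and the sample rows are invented.

```python
from collections import defaultdict

rows = [
    # (key, cat1, cat2, value, id)
    ("a", "x", "p", 10, 1),
    ("a", "x", "q", 20, 2),
    ("a", "y", "p", 30, 3),
]

# Expand: three projections per input row, one per aggregation group.
expanded = []
for key, cat1, cat2, value, id_ in rows:
    expanded.append((key, None, None, 0, value))                      # gid=0: non-distinct group
    expanded.append((key, cat1 if id_ > 1 else None, None, 1, None))  # gid=1: distinct cat1, filter folded in
    expanded.append((key, None, cat2, 2, None))                       # gid=2: distinct cat2

# First Aggregate: group by (key, _gen_attr_1, _gen_attr_2, gid), sum the value slot.
partial = defaultdict(int)
for key, g1, g2, gid, v in expanded:
    partial[(key, g1, g2, gid)] += v or 0

# Final Aggregate: group by key only; each gid feeds one output aggregate.
cat1_cnt = defaultdict(set)
cat2_cnt = defaultdict(set)
total = defaultdict(int)
for (key, g1, g2, gid), s in partial.items():
    if gid == 1 and g1 is not None:
        cat1_cnt[key].add(g1)
    elif gid == 2 and g2 is not None:
        cat2_cnt[key].add(g2)
    elif gid == 0:
        total[key] += s

result = {k: (len(cat1_cnt[k]), len(cat2_cnt[k]), total[k]) for k in total}
print(result)  # {'a': (2, 2, 60)}
```

Only the gid=1 group sees the `id > 1` filter, so `cat1_cnt` counts `{x, y}` from the two qualifying rows while `cat2_cnt` and `total` are computed over all rows.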
[GitHub] [spark] beliefer commented on a change in pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
beliefer commented on a change in pull request #27428: URL: https://github.com/apache/spark/pull/27428#discussion_r453539953 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ## @@ -102,23 +102,126 @@ import org.apache.spark.sql.types.IntegerType * {{{ * Aggregate( *key = ['key] - *functions = [count(if (('gid = 1)) 'cat1 else null), - * count(if (('gid = 2)) 'cat2 else null), + *functions = [count(if (('gid = 1)) '_gen_attr_1 else null), + * count(if (('gid = 2)) '_gen_attr_2 else null), * first(if (('gid = 0)) 'total else null) ignore nulls] *output = ['key, 'cat1_cnt, 'cat2_cnt, 'total]) * Aggregate( - * key = ['key, 'cat1, 'cat2, 'gid] - * functions = [sum('value) with FILTER('id > 1)] - * output = ['key, 'cat1, 'cat2, 'gid, 'total]) + * key = ['key, '_gen_attr_1, '_gen_attr_2, 'gid] + * functions = [sum('_gen_attr_3)] + * output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, 'total]) * Expand( - *projections = [('key, null, null, 0, cast('value as bigint), 'id), + *projections = [('key, null, null, 0, if ('id > 1) cast('value as bigint) else null, 'id), * ('key, 'cat1, null, 1, null, null), * ('key, null, 'cat2, 2, null, null)] - *output = ['key, 'cat1, 'cat2, 'gid, 'value, 'id]) + *output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, '_gen_attr_3, 'id]) + * LocalTableScan [...] + * }}} + * + * Third example: single distinct aggregate function with filter clauses and have + * not other distinct aggregate function (in sql): + * {{{ + * SELECT + * COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt, + * SUM(value) AS total + * FROM + *data + * GROUP BY + *key + * }}} + * + * This translates to the following (pseudo) logical plan: + * {{{ + * Aggregate( + *key = ['key] + *functions = [COUNT(DISTINCT 'cat1) with FILTER('id > 1), + * sum('value)] + *output = ['key, 'cat1_cnt, 'total]) + * LocalTableScan [...] 
+ * }}} + * + * This rule rewrites this logical plan to the following (pseudo) logical plan: + * {{{ + * Aggregate( + * key = ['key] + * functions = [count('_gen_attr_1), + * sum('_gen_attr_2)] + * output = ['key, 'cat1_cnt, 'total]) + * Project( + *projectionList = ['key, if ('id > 1) 'cat1 else null, cast('value as bigint)] + *output = ['key, '_gen_attr_1, '_gen_attr_2]) * LocalTableScan [...] * }}} * - * The rule does the following things here: + * Four example: single distinct aggregate function with filter clauses (in sql): + * {{{ + * SELECT + * COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt, + * COUNT(DISTINCT cat2) as cat2_cnt, + * SUM(value) AS total + * FROM + *data + * GROUP BY + *key + * }}} + * + * This translates to the following (pseudo) logical plan: + * {{{ + * Aggregate( + *key = ['key] + *functions = [COUNT(DISTINCT 'cat1) with FILTER('id > 1), + * COUNT(DISTINCT 'cat2), + * sum('value)] + *output = ['key, 'cat1_cnt, 'cat2_cnt, 'total]) + * LocalTableScan [...] + * }}} + * + * This rule rewrites this logical plan to the following (pseudo) logical plan: + * {{{ + * Aggregate( + * key = ['key] + * functions = [count(if (('gid = 1)) '_gen_attr_1 else null), + * count(if (('gid = 2)) '_gen_attr_2 else null), + * first(if (('gid = 0)) 'total else null) ignore nulls] + * output = ['key, 'cat1_cnt, 'cat2_cnt, 'total]) + * Aggregate( + *key = ['key, '_gen_attr_1, '_gen_attr_2, 'gid] + *functions = [sum('_gen_attr_3)] + *output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, 'total]) + * Expand( + * projections = [('key, null, null, 0, cast('value as bigint)), + * ('key, if ('id > 1) 'cat1 else null, null, 1, null), + * ('key, null, 'cat2, 2, null)] + * output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, '_gen_attr_3]) + * LocalTableScan [...] + * }}} + * + * The rule consists of the two phases as follows: + * + * In the first phase, if the aggregate query exists filter clauses, project the output of + * the child of the aggregate query: + * 1. 
Project the data. There are three aggregation groups in this query: + *i. the non-distinct group; + *ii. the distinct 'cat1 group; + *iii. the distinct 'cat2 group with filter clause. Review comment: OK
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
AmplabJenkins removed a comment on pull request #27428: URL: https://github.com/apache/spark/pull/27428#issuecomment-657461572 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125761/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
AmplabJenkins removed a comment on pull request #27428: URL: https://github.com/apache/spark/pull/27428#issuecomment-657461556 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
AmplabJenkins removed a comment on pull request #27428: URL: https://github.com/apache/spark/pull/27428#issuecomment-657461072
[GitHub] [spark] SparkQA removed a comment on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
SparkQA removed a comment on pull request #27428: URL: https://github.com/apache/spark/pull/27428#issuecomment-657403782
[GitHub] [spark] AmplabJenkins commented on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
AmplabJenkins commented on pull request #27428: URL: https://github.com/apache/spark/pull/27428#issuecomment-657461556
[GitHub] [spark] SparkQA commented on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
SparkQA commented on pull request #27428: URL: https://github.com/apache/spark/pull/27428#issuecomment-657461188 **[Test build #125761 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125761/testReport)** for PR 27428 at commit [`5bbbfd7`](https://github.com/apache/spark/commit/5bbbfd771e7aa1cfa6c41ccced92c8f0cf760a4d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28957: [SPARK-32138] Drop Python 2.7, 3.4 and 3.5
AmplabJenkins removed a comment on pull request #28957: URL: https://github.com/apache/spark/pull/28957#issuecomment-657460424
[GitHub] [spark] SparkQA commented on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
SparkQA commented on pull request #27428: URL: https://github.com/apache/spark/pull/27428#issuecomment-657460682 **[Test build #125756 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125756/testReport)** for PR 27428 at commit [`d531864`](https://github.com/apache/spark/commit/d531864c7da4244674377369854ab2f02b5acfa2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] beliefer commented on a change in pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
beliefer commented on a change in pull request #27428: URL: https://github.com/apache/spark/pull/27428#discussion_r453537350 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ## @@ -102,23 +102,126 @@ import org.apache.spark.sql.types.IntegerType * {{{ * Aggregate( *key = ['key] - *functions = [count(if (('gid = 1)) 'cat1 else null), - * count(if (('gid = 2)) 'cat2 else null), + *functions = [count(if (('gid = 1)) '_gen_attr_1 else null), + * count(if (('gid = 2)) '_gen_attr_2 else null), * first(if (('gid = 0)) 'total else null) ignore nulls] *output = ['key, 'cat1_cnt, 'cat2_cnt, 'total]) * Aggregate( - * key = ['key, 'cat1, 'cat2, 'gid] - * functions = [sum('value) with FILTER('id > 1)] - * output = ['key, 'cat1, 'cat2, 'gid, 'total]) + * key = ['key, '_gen_attr_1, '_gen_attr_2, 'gid] + * functions = [sum('_gen_attr_3)] + * output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, 'total]) * Expand( - *projections = [('key, null, null, 0, cast('value as bigint), 'id), + *projections = [('key, null, null, 0, if ('id > 1) cast('value as bigint) else null, 'id), * ('key, 'cat1, null, 1, null, null), * ('key, null, 'cat2, 2, null, null)] - *output = ['key, 'cat1, 'cat2, 'gid, 'value, 'id]) + *output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, '_gen_attr_3, 'id]) + * LocalTableScan [...] + * }}} + * + * Third example: single distinct aggregate function with filter clauses and have + * not other distinct aggregate function (in sql): + * {{{ + * SELECT + * COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt, + * SUM(value) AS total + * FROM + *data + * GROUP BY + *key + * }}} + * + * This translates to the following (pseudo) logical plan: + * {{{ + * Aggregate( + *key = ['key] + *functions = [COUNT(DISTINCT 'cat1) with FILTER('id > 1), + * sum('value)] + *output = ['key, 'cat1_cnt, 'total]) + * LocalTableScan [...] 
+ * }}} + * + * This rule rewrites this logical plan to the following (pseudo) logical plan: + * {{{ + * Aggregate( + * key = ['key] + * functions = [count('_gen_attr_1), + * sum('_gen_attr_2)] + * output = ['key, 'cat1_cnt, 'total]) + * Project( + *projectionList = ['key, if ('id > 1) 'cat1 else null, cast('value as bigint)] + *output = ['key, '_gen_attr_1, '_gen_attr_2]) * LocalTableScan [...] * }}} * - * The rule does the following things here: + * Four example: single distinct aggregate function with filter clauses (in sql): Review comment: OK. How about `at least two distinct aggregate function and one of them contains filter clauses (in sql)` ?
[GitHub] [spark] AmplabJenkins commented on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
AmplabJenkins commented on pull request #27428: URL: https://github.com/apache/spark/pull/27428#issuecomment-657461072
[GitHub] [spark] AmplabJenkins commented on pull request #28957: [SPARK-32138] Drop Python 2.7, 3.4 and 3.5
AmplabJenkins commented on pull request #28957: URL: https://github.com/apache/spark/pull/28957#issuecomment-657460424
[GitHub] [spark] SparkQA commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
SparkQA commented on pull request #29045: URL: https://github.com/apache/spark/pull/29045#issuecomment-657459572 **[Test build #125764 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125764/testReport)** for PR 29045 at commit [`743ffe3`](https://github.com/apache/spark/commit/743ffe3775c03ef1b6a37b48cc3f3fe767f64a52).
[GitHub] [spark] SparkQA commented on pull request #28957: [SPARK-32138] Drop Python 2.7, 3.4 and 3.5
SparkQA commented on pull request #28957: URL: https://github.com/apache/spark/pull/28957#issuecomment-657459581 **[Test build #125765 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125765/testReport)** for PR 28957 at commit [`f2356c8`](https://github.com/apache/spark/commit/f2356c80a105c0b74e578152282013c1d3b9c647).
[GitHub] [spark] SparkQA commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.
SparkQA commented on pull request #27735: URL: https://github.com/apache/spark/pull/27735#issuecomment-657459685 **[Test build #125766 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125766/testReport)** for PR 27735 at commit [`19504d6`](https://github.com/apache/spark/commit/19504d688486dc0ea7764b721b25771388bb20d5).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
AmplabJenkins removed a comment on pull request #29045: URL: https://github.com/apache/spark/pull/29045#issuecomment-657455720
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE
AmplabJenkins removed a comment on pull request #29064: URL: https://github.com/apache/spark/pull/29064#issuecomment-657455734
[GitHub] [spark] HyukjinKwon commented on pull request #29057: [SPARK-32245][INFRA] Run Spark tests in Github Actions
HyukjinKwon commented on pull request #29057: URL: https://github.com/apache/spark/pull/29057#issuecomment-657455681 @dbtsai that sounds great. I will leave the link at SPARK-32253
[GitHub] [spark] AmplabJenkins commented on pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE
AmplabJenkins commented on pull request #29064: URL: https://github.com/apache/spark/pull/29064#issuecomment-657455734
[GitHub] [spark] AmplabJenkins commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
AmplabJenkins commented on pull request #29045: URL: https://github.com/apache/spark/pull/29045#issuecomment-657455720
[GitHub] [spark] SparkQA commented on pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE
SparkQA commented on pull request #29064: URL: https://github.com/apache/spark/pull/29064#issuecomment-657455001 **[Test build #125763 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125763/testReport)** for PR 29064 at commit [`c95fa7d`](https://github.com/apache/spark/commit/c95fa7d0153fae2df34e53d410b183353b040f29).
[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
SaurabhChawla100 commented on pull request #29045: URL: https://github.com/apache/spark/pull/29045#issuecomment-657455284 > @dongjoon-hyun / @maropu - Its seems like some issue happened after I rebase this current branch with the master branch. > > Its now showing number of files change is -185. Not sure why its showing this difference Now it's fine.
[GitHub] [spark] yaooqinn commented on a change in pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE
yaooqinn commented on a change in pull request #29064: URL: https://github.com/apache/spark/pull/29064#discussion_r453531441 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ## @@ -90,6 +93,51 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) { ResetCommand } + /** + * Create a [[SetCommand]] logical plan to set [[SQLConf.SESSION_LOCAL_TIMEZONE]] + * Example SQL : + * {{{ + * SET TIME ZONE LOCAL; + * SET TIME ZONE 'Asia/Shanghai'; + * SET TIME ZONE INTERVAL 10 HOURS; + * }}} + */ + override def visitSetTimeZone(ctx: SetTimeZoneContext): LogicalPlan = withOrigin(ctx) { +val key = SQLConf.SESSION_LOCAL_TIMEZONE.key +if (ctx.interval != null) { + val interval = parseIntervalLiteral(ctx.interval) + if (interval.months != 0 || interval.days != 0 || +math.abs(interval.microseconds) > 18 * DateTimeConstants.MICROS_PER_HOUR) { +throw new ParseException("The interval value must be in the range of [-18, +18] hours", Review comment: If we want to fail cases like this completely, we need a new interval parser; otherwise, we can only forbid it based on the interval result. e.g. `INTERVAL 1 MICROSECOND -1 MICROSECOND 1 HOUR` will work...
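The result-based check the reviewer describes can be sketched outside Spark: validate a parsed interval only by its final `(months, days, microseconds)` value, so a compound literal that nets out inside [-18, +18] hours still passes. This is a hypothetical Python illustration of that logic; the constant name mirrors `DateTimeConstants.MICROS_PER_HOUR`, and `valid_zone_offset` is an invented helper, not Spark API.

```python
MICROS_PER_HOUR = 60 * 60 * 1_000_000  # mirrors DateTimeConstants.MICROS_PER_HOUR

def valid_zone_offset(months: int, days: int, micros: int) -> bool:
    """True iff the parsed interval can serve as a SET TIME ZONE offset:
    no month or day component, and the net result within [-18, +18] hours."""
    return months == 0 and days == 0 and abs(micros) <= 18 * MICROS_PER_HOUR

# INTERVAL 1 MICROSECOND -1 MICROSECOND 1 HOUR nets to exactly one hour,
# so a check on the parsed *result* accepts it, as the comment notes.
assert valid_zone_offset(0, 0, 1 - 1 + MICROS_PER_HOUR)
assert not valid_zone_offset(0, 0, 19 * MICROS_PER_HOUR)   # out of range
assert not valid_zone_offset(1, 0, 0)                      # month component
```

Rejecting the compound literal itself, rather than its net value, would require inspecting the literal's syntax before normalization, which is why the reviewer says a new interval parser would be needed.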
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.
AmplabJenkins removed a comment on pull request #27735: URL: https://github.com/apache/spark/pull/27735#issuecomment-657453855

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125762/
[GitHub] [spark] AmplabJenkins commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.
AmplabJenkins commented on pull request #27735: URL: https://github.com/apache/spark/pull/27735#issuecomment-657453846
[GitHub] [spark] SparkQA removed a comment on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.
SparkQA removed a comment on pull request #27735: URL: https://github.com/apache/spark/pull/27735#issuecomment-657446183

**[Test build #125762 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125762/testReport)** for PR 27735 at commit [`d554659`](https://github.com/apache/spark/commit/d554659e5cc9a493fa01cc196ded53d0a9fe31c8).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.
AmplabJenkins removed a comment on pull request #27735: URL: https://github.com/apache/spark/pull/27735#issuecomment-657453846

Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.
SparkQA commented on pull request #27735: URL: https://github.com/apache/spark/pull/27735#issuecomment-657453623

**[Test build #125762 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125762/testReport)** for PR 27735 at commit [`d554659`](https://github.com/apache/spark/commit/d554659e5cc9a493fa01cc196ded53d0a9fe31c8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class MountConfDirExecutorFeatureStep(conf: KubernetesConf)`
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29018: [SPARK-32202][ML][WIP] tree models auto infer compact integer type
AmplabJenkins removed a comment on pull request #29018: URL: https://github.com/apache/spark/pull/29018#issuecomment-657453017
[GitHub] [spark] SparkQA removed a comment on pull request #29018: [SPARK-32202][ML][WIP] tree models auto infer compact integer type
SparkQA removed a comment on pull request #29018: URL: https://github.com/apache/spark/pull/29018#issuecomment-657394110

**[Test build #125753 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125753/testReport)** for PR 29018 at commit [`19e4bcf`](https://github.com/apache/spark/commit/19e4bcfac6d876afd289b736bc19008032564c2d).
[GitHub] [spark] AmplabJenkins commented on pull request #29018: [SPARK-32202][ML][WIP] tree models auto infer compact integer type
AmplabJenkins commented on pull request #29018: URL: https://github.com/apache/spark/pull/29018#issuecomment-657453017
[GitHub] [spark] SparkQA commented on pull request #29018: [SPARK-32202][ML][WIP] tree models auto infer compact integer type
SparkQA commented on pull request #29018: URL: https://github.com/apache/spark/pull/29018#issuecomment-657452051

**[Test build #125753 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125753/testReport)** for PR 29018 at commit [`19e4bcf`](https://github.com/apache/spark/commit/19e4bcfac6d876afd289b736bc19008032564c2d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
SaurabhChawla100 commented on pull request #29045: URL: https://github.com/apache/spark/pull/29045#issuecomment-657451506

@dongjoon-hyun / @maropu - It seems like some issue happened after I rebased this branch onto the master branch. It is now showing the number of files changed as -185. Not sure why it shows this difference.
[GitHub] [spark] yaooqinn commented on a change in pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE
yaooqinn commented on a change in pull request #29064: URL: https://github.com/apache/spark/pull/29064#discussion_r453527973

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
## @@ -90,6 +93,51 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
     ResetCommand
   }

+  /**
+   * Create a [[SetCommand]] logical plan to set [[SQLConf.SESSION_LOCAL_TIMEZONE]]
+   * Example SQL :
+   * {{{
+   *   SET TIME ZONE LOCAL;
+   *   SET TIME ZONE 'Asia/Shanghai';
+   *   SET TIME ZONE INTERVAL 10 HOURS;
+   * }}}
+   */
+  override def visitSetTimeZone(ctx: SetTimeZoneContext): LogicalPlan = withOrigin(ctx) {
+    val key = SQLConf.SESSION_LOCAL_TIMEZONE.key
+    if (ctx.interval != null) {
+      val interval = parseIntervalLiteral(ctx.interval)
+      if (interval.months != 0 || interval.days != 0 ||
+          math.abs(interval.microseconds) > 18 * DateTimeConstants.MICROS_PER_HOUR) {
+        throw new ParseException("The interval value must be in the range of [-18, +18] hours",
+          ctx.interval())
+      } else {
+        val seconds = (interval.microseconds / DateTimeConstants.MICROS_PER_SECOND).toInt
+        SetCommand(Some(key -> Some(ZoneOffset.ofTotalSeconds(seconds).toString)))
+      }
+    } else if (ctx.timezone != null) {
+      ctx.timezone.getType match {
+        case SqlBaseParser.LOCAL =>
+          SetCommand(Some(key -> Some(TimeZone.getDefault.getID)))

Review comment: This is the same as the default value of `spark.sql.session.timeZone`; shall I change that part too if I change it here?
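For context on the two "local" spellings under discussion (the legacy `java.util.TimeZone` API in the hunk above, and the `java.time.ZoneId` API suggested later in this review): on a typical JVM they report the same region-based ID, since `ZoneId.systemDefault()` is derived from the default `TimeZone`, though legacy three-letter IDs can normalize differently. A minimal sketch:

```scala
import java.time.ZoneId
import java.util.TimeZone

object DefaultTzDemo {
  // Legacy java.util spelling, as used in the quoted hunk.
  def legacyId: String = TimeZone.getDefault.getID
  // java.time spelling suggested elsewhere in the review.
  def modernId: String = ZoneId.systemDefault().getId
}
```

Both return a non-empty zone ID such as "Asia/Shanghai"; whichever is chosen, the point in the comment stands: it matches the documented default of `spark.sql.session.timeZone`.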
[GitHub] [spark] yaooqinn commented on a change in pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE
yaooqinn commented on a change in pull request #29064: URL: https://github.com/apache/spark/pull/29064#discussion_r453525655

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
## @@ -90,6 +93,51 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
     ResetCommand
   }

+  /**
+   * Create a [[SetCommand]] logical plan to set [[SQLConf.SESSION_LOCAL_TIMEZONE]]
+   * Example SQL :
+   * {{{
+   *   SET TIME ZONE LOCAL;
+   *   SET TIME ZONE 'Asia/Shanghai';
+   *   SET TIME ZONE INTERVAL 10 HOURS;
+   * }}}
+   */
+  override def visitSetTimeZone(ctx: SetTimeZoneContext): LogicalPlan = withOrigin(ctx) {
+    val key = SQLConf.SESSION_LOCAL_TIMEZONE.key
+    if (ctx.interval != null) {
+      val interval = parseIntervalLiteral(ctx.interval)
+      if (interval.months != 0 || interval.days != 0 ||
+          math.abs(interval.microseconds) > 18 * DateTimeConstants.MICROS_PER_HOUR) {
+        throw new ParseException("The interval value must be in the range of [-18, +18] hours",
+          ctx.interval())
+      } else {
+        val seconds = (interval.microseconds / DateTimeConstants.MICROS_PER_SECOND).toInt
+        SetCommand(Some(key -> Some(ZoneOffset.ofTotalSeconds(seconds).toString)))
+      }
+    } else if (ctx.timezone != null) {
+      ctx.timezone.getType match {
+        case SqlBaseParser.LOCAL =>
+          SetCommand(Some(key -> Some(TimeZone.getDefault.getID)))
+        case _ =>
+          SetCommand(Some(key -> Some(string(ctx.STRING()))))
+      }
+    } else {
+      throw new ParseException("Invalid time zone displacement value", ctx)
+    }
+  }
+
+  /**
+   * Create a [[ListTimeZonesCommand]] to retrieve all the supported region-based Zone IDs
+   * supported.
+   * Example SQL :
+   * {{{
+   *   SET TIME ZONE ALL;

Review comment: I removed this, this is not standard.
[GitHub] [spark] yaooqinn commented on a change in pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE
yaooqinn commented on a change in pull request #29064: URL: https://github.com/apache/spark/pull/29064#discussion_r453524124

## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
## @@ -240,6 +240,10 @@ statement
     | MSCK REPAIR TABLE multipartIdentifier                            #repairTable
     | op=(ADD | LIST) identifier (STRING | .*?)                        #manageResource
     | SET ROLE .*?                                                     #failNativeCommand
+    | SET TIME ZONE ALL                                                #listTimeZones
+    | SET TIME ZONE interval                                           #setTimeZone
+    | SET TIME ZONE timezone=(STRING | LOCAL)                          #setTimeZone
+    | SET TIME ZONE .*?                                                #setTimeZone

Review comment: this is used to fail invalid SET TIME ZONE syntax explicitly, because right now we support:
```sql
spark-sql (default)> set time zone abcd;
key	value
time zone	abcd
```
[GitHub] [spark] SparkQA commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.
SparkQA commented on pull request #27735: URL: https://github.com/apache/spark/pull/27735#issuecomment-657446183

**[Test build #125762 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125762/testReport)** for PR 27735 at commit [`d554659`](https://github.com/apache/spark/commit/d554659e5cc9a493fa01cc196ded53d0a9fe31c8).
[GitHub] [spark] Ngone51 commented on a change in pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI
Ngone51 commented on a change in pull request #29015: URL: https://github.com/apache/spark/pull/29015#discussion_r453521642

## File path: core/src/main/scala/org/apache/spark/deploy/master/ui/MasterWebUI.scala
## @@ -49,6 +56,23 @@ class MasterWebUI(
       "/app/kill", "/", masterPage.handleAppKillRequest, httpMethods = Set("POST")))
     attachHandler(createRedirectHandler(
       "/driver/kill", "/", masterPage.handleDriverKillRequest, httpMethods = Set("POST")))
+    attachHandler(createServletHandler("/workers/kill", new HttpServlet {
+      override def doPost(req: HttpServletRequest, resp: HttpServletResponse): Unit = {
+        val hostPorts: Seq[String] = Option(req.getParameterValues("host"))
+          .getOrElse(Array[String]()).map(_.toLowerCase(Locale.ENGLISH)).toSeq
+        logInfo(s"Received request to decommission hostPorts $hostPorts from ${req.getRemoteAddr}")

Review comment: nit: `decommission hostPorts` -> `decommission workers`?

## File path: core/src/main/scala/org/apache/spark/deploy/master/Master.scala
## @@ -863,7 +872,36 @@ private[deploy] class Master(
     true
   }

-  private def decommissionWorker(worker: WorkerInfo): Unit = {
+  private def handleDecommissionWorkers(hostPorts: Seq[String]): Integer = {
+    val hostsWithoutPorts = new HashSet[String]
+    val addressToRemove = new HashSet[RpcAddress]
+    hostPorts.foreach(hp => {

Review comment: we can first filter the valid workers and then apply `decommissionWorker` for each worker, e.g. `hostPorts.filter(...).foreach(...)`

## File path: core/src/main/scala/org/apache/spark/deploy/master/Master.scala
## @@ -863,7 +872,36 @@ private[deploy] class Master(
     true
   }

-  private def decommissionWorker(worker: WorkerInfo): Unit = {
+  private def handleDecommissionWorkers(hostPorts: Seq[String]): Integer = {
+    val hostsWithoutPorts = new HashSet[String]
+    val addressToRemove = new HashSet[RpcAddress]
+    hostPorts.foreach(hp => {
+      val (host, port) = Utils.parseHostPort(hp)
+      if (port == 0) {
+        hostsWithoutPorts.add(host)
+      } else {
+        val rpcAddress = new RpcAddress(host, port)
+        if (addressToWorker.contains(rpcAddress)) {
+          addressToRemove.add(rpcAddress)
+        }
+      }
+    })
+
+    if (hostsWithoutPorts.nonEmpty) {
+      addressToRemove ++= addressToWorker.keys.filter(addr => hostsWithoutPorts.contains(addr.host))
+    }

Review comment: It seems that comparing the `host` only should be enough?

## File path: core/src/main/scala/org/apache/spark/deploy/master/ui/MasterWebUI.scala
## @@ -49,6 +56,23 @@ class MasterWebUI(
       "/app/kill", "/", masterPage.handleAppKillRequest, httpMethods = Set("POST")))
     attachHandler(createRedirectHandler(
       "/driver/kill", "/", masterPage.handleDriverKillRequest, httpMethods = Set("POST")))
+    attachHandler(createServletHandler("/workers/kill", new HttpServlet {
+      override def doPost(req: HttpServletRequest, resp: HttpServletResponse): Unit = {
+        val hostPorts: Seq[String] = Option(req.getParameterValues("host"))
+          .getOrElse(Array[String]()).map(_.toLowerCase(Locale.ENGLISH)).toSeq

Review comment: Why use `Locale.ENGLISH`?
[GitHub] [spark] cloud-fan commented on a change in pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE
cloud-fan commented on a change in pull request #29064: URL: https://github.com/apache/spark/pull/29064#discussion_r453520007

## File path: sql/core/src/test/resources/sql-tests/results/timezone.sql.out
## @@ -0,0 +1,121 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 11
+
+
+-- !query
+SET TIME ZONE 'Asia/Hong_Kong'
+-- !query schema
+struct
+-- !query output
+spark.sql.session.timeZone	Asia/Hong_Kong

Review comment: This looks like a weird output for `SET TIME ZONE 'Asia/Hong_Kong'`. Shall we add a `Project` node on the `SetCommand` to give better output?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
AmplabJenkins removed a comment on pull request #27428: URL: https://github.com/apache/spark/pull/27428#issuecomment-657442634
[GitHub] [spark] cloud-fan commented on a change in pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE
cloud-fan commented on a change in pull request #29064: URL: https://github.com/apache/spark/pull/29064#discussion_r453519340

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
## @@ -90,6 +93,51 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
     ResetCommand
   }

+  /**
+   * Create a [[SetCommand]] logical plan to set [[SQLConf.SESSION_LOCAL_TIMEZONE]]
+   * Example SQL :
+   * {{{
+   *   SET TIME ZONE LOCAL;
+   *   SET TIME ZONE 'Asia/Shanghai';
+   *   SET TIME ZONE INTERVAL 10 HOURS;
+   * }}}
+   */
+  override def visitSetTimeZone(ctx: SetTimeZoneContext): LogicalPlan = withOrigin(ctx) {
+    val key = SQLConf.SESSION_LOCAL_TIMEZONE.key
+    if (ctx.interval != null) {
+      val interval = parseIntervalLiteral(ctx.interval)
+      if (interval.months != 0 || interval.days != 0 ||
+          math.abs(interval.microseconds) > 18 * DateTimeConstants.MICROS_PER_HOUR) {
+        throw new ParseException("The interval value must be in the range of [-18, +18] hours",
+          ctx.interval())
+      } else {
+        val seconds = (interval.microseconds / DateTimeConstants.MICROS_PER_SECOND).toInt
+        SetCommand(Some(key -> Some(ZoneOffset.ofTotalSeconds(seconds).toString)))
+      }
+    } else if (ctx.timezone != null) {
+      ctx.timezone.getType match {
+        case SqlBaseParser.LOCAL =>
+          SetCommand(Some(key -> Some(TimeZone.getDefault.getID)))
+        case _ =>
+          SetCommand(Some(key -> Some(string(ctx.STRING()))))
+      }
+    } else {
+      throw new ParseException("Invalid time zone displacement value", ctx)
+    }
+  }
+
+  /**
+   * Create a [[ListTimeZonesCommand]] to retrieve all the supported region-based Zone IDs
+   * supported.
+   * Example SQL :
+   * {{{
+   *   SET TIME ZONE ALL;

Review comment: is this a standard SQL syntax?
[GitHub] [spark] cloud-fan commented on a change in pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE
cloud-fan commented on a change in pull request #29064: URL: https://github.com/apache/spark/pull/29064#discussion_r453519063

## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
## @@ -240,6 +240,10 @@ statement
     | MSCK REPAIR TABLE multipartIdentifier                            #repairTable
     | op=(ADD | LIST) identifier (STRING | .*?)                        #manageResource
     | SET ROLE .*?                                                     #failNativeCommand
+    | SET TIME ZONE ALL                                                #listTimeZones
+    | SET TIME ZONE interval                                           #setTimeZone
+    | SET TIME ZONE timezone=(STRING | LOCAL)                          #setTimeZone
+    | SET TIME ZONE .*?                                                #setTimeZone

Review comment: so we add this only for a better parser message?
[GitHub] [spark] AmplabJenkins commented on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
AmplabJenkins commented on pull request #27428: URL: https://github.com/apache/spark/pull/27428#issuecomment-657442634
[GitHub] [spark] cloud-fan commented on a change in pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE
cloud-fan commented on a change in pull request #29064: URL: https://github.com/apache/spark/pull/29064#discussion_r453518326

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
## @@ -90,6 +93,51 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
     ResetCommand
   }

+  /**
+   * Create a [[SetCommand]] logical plan to set [[SQLConf.SESSION_LOCAL_TIMEZONE]]
+   * Example SQL :
+   * {{{
+   *   SET TIME ZONE LOCAL;
+   *   SET TIME ZONE 'Asia/Shanghai';
+   *   SET TIME ZONE INTERVAL 10 HOURS;
+   * }}}
+   */
+  override def visitSetTimeZone(ctx: SetTimeZoneContext): LogicalPlan = withOrigin(ctx) {
+    val key = SQLConf.SESSION_LOCAL_TIMEZONE.key
+    if (ctx.interval != null) {
+      val interval = parseIntervalLiteral(ctx.interval)
+      if (interval.months != 0 || interval.days != 0 ||
+          math.abs(interval.microseconds) > 18 * DateTimeConstants.MICROS_PER_HOUR) {
+        throw new ParseException("The interval value must be in the range of [-18, +18] hours",
+          ctx.interval())
+      } else {
+        val seconds = (interval.microseconds / DateTimeConstants.MICROS_PER_SECOND).toInt
+        SetCommand(Some(key -> Some(ZoneOffset.ofTotalSeconds(seconds).toString)))
+      }
+    } else if (ctx.timezone != null) {
+      ctx.timezone.getType match {
+        case SqlBaseParser.LOCAL =>
+          SetCommand(Some(key -> Some(TimeZone.getDefault.getID)))

Review comment: shall we use the `ZoneId.systemDefault()`?
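For reference, the `ZoneOffset.ofTotalSeconds` call in the quoted hunk turns the interval's whole-second total into an ISO-8601 offset ID; note that `java.time.ZoneOffset` itself only accepts offsets within ±18 hours, which lines up with the parser's pre-check. A minimal standalone illustration (hypothetical helper name, not the actual Spark code):

```scala
import java.time.ZoneOffset

object OffsetDemo {
  // Mirrors the conversion in the quoted hunk: microseconds -> whole
  // seconds -> ISO-8601 offset ID such as "+10:00" (or "Z" for zero).
  // ZoneOffset.ofTotalSeconds throws DateTimeException beyond +-18 hours.
  def offsetIdFromMicros(micros: Long): String = {
    val seconds = (micros / 1000000L).toInt
    ZoneOffset.ofTotalSeconds(seconds).toString
  }
}
```

So `SET TIME ZONE INTERVAL 10 HOURS` ends up storing the string `+10:00` in `spark.sql.session.timeZone`.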
[GitHub] [spark] SparkQA commented on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
SparkQA commented on pull request #27428: URL: https://github.com/apache/spark/pull/27428#issuecomment-657441762

**[Test build #125761 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125761/testReport)** for PR 27428 at commit [`5bbbfd7`](https://github.com/apache/spark/commit/5bbbfd771e7aa1cfa6c41ccced92c8f0cf760a4d).
[GitHub] [spark] cloud-fan commented on a change in pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE
cloud-fan commented on a change in pull request #29064: URL: https://github.com/apache/spark/pull/29064#discussion_r453518043

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
## @@ -90,6 +93,51 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
     ResetCommand
   }

+  /**
+   * Create a [[SetCommand]] logical plan to set [[SQLConf.SESSION_LOCAL_TIMEZONE]]
+   * Example SQL :
+   * {{{
+   *   SET TIME ZONE LOCAL;
+   *   SET TIME ZONE 'Asia/Shanghai';
+   *   SET TIME ZONE INTERVAL 10 HOURS;
+   * }}}
+   */
+  override def visitSetTimeZone(ctx: SetTimeZoneContext): LogicalPlan = withOrigin(ctx) {
+    val key = SQLConf.SESSION_LOCAL_TIMEZONE.key
+    if (ctx.interval != null) {
+      val interval = parseIntervalLiteral(ctx.interval)
+      if (interval.months != 0 || interval.days != 0 ||
+          math.abs(interval.microseconds) > 18 * DateTimeConstants.MICROS_PER_HOUR) {
+        throw new ParseException("The interval value must be in the range of [-18, +18] hours",

Review comment: shall we also fail for things like `INTERVAL 1 MICROSECOND`?
[GitHub] [spark] cloud-fan commented on a change in pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
cloud-fan commented on a change in pull request #27428: URL: https://github.com/apache/spark/pull/27428#discussion_r453513546

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala

@@ -102,23 +102,126 @@ import org.apache.spark.sql.types.IntegerType

 * {{{
 *   Aggregate(
 *      key = ['key]
- *      functions = [count(if (('gid = 1)) 'cat1 else null),
- *                   count(if (('gid = 2)) 'cat2 else null),
+ *      functions = [count(if (('gid = 1)) '_gen_attr_1 else null),
+ *                   count(if (('gid = 2)) '_gen_attr_2 else null),
 *                   first(if (('gid = 0)) 'total else null) ignore nulls]
 *      output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
 *     Aggregate(
- *        key = ['key, 'cat1, 'cat2, 'gid]
- *        functions = [sum('value) with FILTER('id > 1)]
- *        output = ['key, 'cat1, 'cat2, 'gid, 'total])
+ *        key = ['key, '_gen_attr_1, '_gen_attr_2, 'gid]
+ *        functions = [sum('_gen_attr_3)]
+ *        output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, 'total])
 *       Expand(
- *          projections = [('key, null, null, 0, cast('value as bigint), 'id),
+ *          projections = [('key, null, null, 0, if ('id > 1) cast('value as bigint) else null, 'id),
 *                         ('key, 'cat1, null, 1, null, null),
 *                         ('key, null, 'cat2, 2, null, null)]
- *          output = ['key, 'cat1, 'cat2, 'gid, 'value, 'id])
+ *          output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, '_gen_attr_3, 'id])
+ *         LocalTableScan [...]
+ * }}}
+ *
+ * Third example: single distinct aggregate function with filter clauses and no
+ * other distinct aggregate function (in sql):
+ * {{{
+ *   SELECT
+ *     COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt,
+ *     SUM(value) AS total
+ *   FROM
+ *     data
+ *   GROUP BY
+ *     key
+ * }}}
+ *
+ * This translates to the following (pseudo) logical plan:
+ * {{{
+ *   Aggregate(
+ *      key = ['key]
+ *      functions = [COUNT(DISTINCT 'cat1) with FILTER('id > 1),
+ *                   sum('value)]
+ *      output = ['key, 'cat1_cnt, 'total])
+ *     LocalTableScan [...]

Review comment: Do we need to rewrite this query? The planner can handle a single distinct agg func AFAIK.
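The "third example" rewrite above can be simulated outside Spark. The sketch below (plain Python over hypothetical rows, not Catalyst's operators) shows the essence of the rule: project `_gen_attr_1 = cat1 if id > 1 else null`, then count distinct non-null values of the generated attribute, which is equivalent to `COUNT(DISTINCT cat1) FILTER (WHERE id > 1)`.

```python
# Hypothetical rows for the `data` table in the example query.
rows = [
    {"key": "a", "cat1": "x", "value": 1, "id": 1},
    {"key": "a", "cat1": "y", "value": 2, "id": 2},
    {"key": "a", "cat1": "y", "value": 3, "id": 3},
    {"key": "b", "cat1": "z", "value": 4, "id": 5},
]

def cat1_cnt(rows):
    """COUNT(DISTINCT cat1) FILTER (WHERE id > 1) per key, expressed the way the
    rule does it: fold the filter into an if/else projection, then count
    distinct non-null values of the generated attribute."""
    distinct = {}
    for r in rows:
        gen_attr_1 = r["cat1"] if r["id"] > 1 else None  # filter folded into projection
        distinct.setdefault(r["key"], set())
        if gen_attr_1 is not None:
            distinct[r["key"]].add(gen_attr_1)
    return {k: len(v) for k, v in distinct.items()}

# Row (a, x, 1, 1) is filtered out by id > 1, so key 'a' counts only {'y'}.
print(cat1_cnt(rows))  # {'a': 1, 'b': 1}
```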
[GitHub] [spark] cloud-fan commented on a change in pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
cloud-fan commented on a change in pull request #27428: URL: https://github.com/apache/spark/pull/27428#discussion_r453511010

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala

+ * This rule rewrites this logical plan to the following (pseudo) logical plan:
+ * {{{
+ *   Aggregate(
+ *      key = ['key]
+ *      functions = [count('_gen_attr_1),
+ *                   sum('_gen_attr_2)]
+ *      output = ['key, 'cat1_cnt, 'total])
+ *     Project(
+ *        projectionList = ['key, if ('id > 1) 'cat1 else null, cast('value as bigint)]
+ *        output = ['key, '_gen_attr_1, '_gen_attr_2])
  *       LocalTableScan [...]
  * }}}
  *
- * The rule does the following things here:
+ * Four example: single distinct aggregate function with filter clauses (in sql):
+ * {{{
+ *   SELECT
+ *     COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt,
+ *     COUNT(DISTINCT cat2) as cat2_cnt,
+ *     SUM(value) AS total
+ *   FROM
+ *     data
+ *   GROUP BY
+ *     key
+ * }}}
+ *
+ * This translates to the following (pseudo) logical plan:
+ * {{{
+ *   Aggregate(
+ *      key = ['key]
+ *      functions = [COUNT(DISTINCT 'cat1) with FILTER('id > 1),
+ *                   COUNT(DISTINCT 'cat2),
+ *                   sum('value)]
+ *      output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
+ *     LocalTableScan [...]
+ * }}}
+ *
+ * This rule rewrites this logical plan to the following (pseudo) logical plan:
+ * {{{
+ *   Aggregate(
+ *      key = ['key]
+ *      functions = [count(if (('gid = 1)) '_gen_attr_1 else null),
+ *                   count(if (('gid = 2)) '_gen_attr_2 else null),
+ *                   first(if (('gid = 0)) 'total else null) ignore nulls]
+ *      output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
+ *     Aggregate(
+ *        key = ['key, '_gen_attr_1, '_gen_attr_2, 'gid]
+ *        functions = [sum('_gen_attr_3)]
+ *        output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, 'total])
+ *       Expand(
+ *          projections = [('key, null, null, 0, cast('value as bigint)),
+ *                         ('key, if ('id > 1) 'cat1 else null, null, 1, null),
+ *                         ('key, null, 'cat2, 2, null)]
+ *          output = ['key, '_gen_attr_1, '_gen_attr_2, 'gid, '_gen_attr_3])
+ *         LocalTableScan [...]
+ * }}}
+ *
+ * The rule consists of the two phases as follows:
+ *
+ * In the first phase, if the aggregate query exists filter clauses, project the output of
+ * the child of the aggregate query:
+ *    1. Project the data. There are three aggregation groups in this query:
+ *       i. the non-distinct group;
+ *       ii. the distinct 'cat1 group;
+ *       iii. the distinct 'cat2 group with filter clause.
+ *       Because there is at least one group with filter clause (e.g. the distinct 'cat2
+ *       group with filter clause), then will project the data.
+ *    2. Avoid projections that may output the same attributes. There are three aggregation groups
+ *       in this query:
+ *       i. the non-distinct 'cat1 group;
+ *       ii. the distinct
[GitHub] [spark] cloud-fan commented on a change in pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
cloud-fan commented on a change in pull request #27428: URL: https://github.com/apache/spark/pull/27428#discussion_r453510469

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala

+ *       Because there is at least one group with filter clause (e.g. the distinct 'cat2
+ *       group with filter clause), then will project the data.

Review comment: When there is at least one aggregate function that has a filter clause, we add a Project node on the input plan.
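The reviewer's point reduces to a small predicate over the aggregate expressions. The sketch below (hypothetical dict-based expression records, not Catalyst's classes) shows the decision: a Project is inserted on the input plan only when at least one aggregate function carries a filter clause.

```python
def needs_project(agg_exprs):
    """A Project node is added on the input plan iff at least one
    aggregate function has a FILTER clause."""
    return any(e.get("filter") is not None for e in agg_exprs)

# Hypothetical representation of the aggregates in the example query.
aggs = [
    {"func": "count_distinct", "child": "cat1", "filter": "id > 1"},
    {"func": "sum", "child": "value", "filter": None},
]
print(needs_project(aggs))       # True: COUNT(DISTINCT cat1) has a filter
print(needs_project(aggs[1:]))   # False: no filter clause anywhere
```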
[GitHub] [spark] cloud-fan commented on a change in pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
cloud-fan commented on a change in pull request #27428: URL: https://github.com/apache/spark/pull/27428#discussion_r453509648

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala

+ *    1. Project the data. There are three aggregation groups in this query:
+ *       i. the non-distinct group;
+ *       ii. the distinct 'cat1 group;
+ *       iii. the distinct 'cat2 group with filter clause.

Review comment: This doesn't match the group. Maybe just make it general: `the distinct group without filter clause` and `the distinct group with filter clause`.
[GitHub] [spark] cloud-fan commented on a change in pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
cloud-fan commented on a change in pull request #27428: URL: https://github.com/apache/spark/pull/27428#discussion_r453508040

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala

+ * Four example: single distinct aggregate function with filter clauses (in sql):

Review comment: `single` -> `more than one`?
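The Expand step discussed in this review can be simulated as well: each input row is replicated once per aggregation group with a group id (`gid`), columns that don't belong to the group are null-padded, and a filter clause is folded into its distinct group's projection. A hedged Python sketch over hypothetical rows (plain tuples, not Catalyst plans):

```python
# Hypothetical rows for the `data` table in the "fourth example" query.
rows = [
    {"key": "a", "cat1": "x", "cat2": "m", "value": 1, "id": 1},
    {"key": "a", "cat1": "y", "cat2": "m", "value": 2, "id": 2},
    {"key": "b", "cat1": "y", "cat2": "n", "value": 3, "id": 4},
]

def expand(rows):
    """Replicate each row into three groups, matching the Expand projections:
    gid 0 carries the sum input, gid 1 the (filtered) distinct cat1 value,
    gid 2 the distinct cat2 value. Tuple layout:
    (key, _gen_attr_1, _gen_attr_2, gid, _gen_attr_3)."""
    out = []
    for r in rows:
        out.append((r["key"], None, None, 0, r["value"]))                          # non-distinct group
        out.append((r["key"], r["cat1"] if r["id"] > 1 else None, None, 1, None))  # distinct cat1, filter folded in
        out.append((r["key"], None, r["cat2"], 2, None))                           # distinct cat2
    return out

expanded = expand(rows)
print(len(expanded))  # 3 input rows x 3 groups = 9
# The id=1 row fails the filter, so its gid-1 projection nulls out cat1:
print(("a", None, None, 1, None) in expanded)  # True
```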
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29084: [SPARK-32220][SQL][FOLLOW-UP] SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result
AmplabJenkins removed a comment on pull request #29084: URL: https://github.com/apache/spark/pull/29084#issuecomment-657424545
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
AmplabJenkins removed a comment on pull request #29045: URL: https://github.com/apache/spark/pull/29045#issuecomment-657424743
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE
AmplabJenkins removed a comment on pull request #29064: URL: https://github.com/apache/spark/pull/29064#issuecomment-657424673
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29075: [SPARK-32284][SQL] Avoid expanding too many CNF predicates in partition pruning
AmplabJenkins removed a comment on pull request #29075: URL: https://github.com/apache/spark/pull/29075#issuecomment-657424693
[GitHub] [spark] AmplabJenkins commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
AmplabJenkins commented on pull request #29045: URL: https://github.com/apache/spark/pull/29045#issuecomment-657424743
[GitHub] [spark] AmplabJenkins commented on pull request #29084: [SPARK-32220][SQL][FOLLOW-UP] SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result
AmplabJenkins commented on pull request #29084: URL: https://github.com/apache/spark/pull/29084#issuecomment-657424545
[GitHub] [spark] AmplabJenkins commented on pull request #29075: [SPARK-32284][SQL] Avoid expanding too many CNF predicates in partition pruning
AmplabJenkins commented on pull request #29075: URL: https://github.com/apache/spark/pull/29075#issuecomment-657424693
[GitHub] [spark] AmplabJenkins commented on pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE
AmplabJenkins commented on pull request #29064: URL: https://github.com/apache/spark/pull/29064#issuecomment-657424673
[GitHub] [spark] SparkQA commented on pull request #29064: [SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE
SparkQA commented on pull request #29064: URL: https://github.com/apache/spark/pull/29064#issuecomment-657423952 **[Test build #125759 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125759/testReport)** for PR 29064 at commit [`95a756a`](https://github.com/apache/spark/commit/95a756a435377edf5e4b7d9f0de7863d7ed6f9a5).
[GitHub] [spark] SparkQA commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
SparkQA commented on pull request #29045: URL: https://github.com/apache/spark/pull/29045#issuecomment-657423953 **[Test build #125760 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125760/testReport)** for PR 29045 at commit [`4469e6d`](https://github.com/apache/spark/commit/4469e6d1c5970bc5de3d4a4dc728e9775e43e141).
[GitHub] [spark] SparkQA commented on pull request #29084: [SPARK-32220][SQL][FOLLOW-UP] SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result
SparkQA commented on pull request #29084: URL: https://github.com/apache/spark/pull/29084#issuecomment-657423819 **[Test build #125757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125757/testReport)** for PR 29084 at commit [`d24dbed`](https://github.com/apache/spark/commit/d24dbede2b50707e30bd8261bf324e9f9cc731b4).
[GitHub] [spark] SparkQA commented on pull request #29075: [SPARK-32284][SQL] Avoid expanding too many CNF predicates in partition pruning
SparkQA commented on pull request #29075: URL: https://github.com/apache/spark/pull/29075#issuecomment-657423860 **[Test build #125758 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125758/testReport)** for PR 29075 at commit [`6fe106c`](https://github.com/apache/spark/commit/6fe106ce84e83641d66d572c45050b25761bbf3d).
[GitHub] [spark] cloud-fan commented on pull request #27983: [SPARK-32105][SQL] Refactor current ScriptTransformationExec code
cloud-fan commented on pull request #27983: URL: https://github.com/apache/spark/pull/27983#issuecomment-657422187 thanks, merging to master!
[GitHub] [spark] cloud-fan closed pull request #27983: [SPARK-32105][SQL] Refactor current ScriptTransformationExec code
cloud-fan closed pull request #27983: URL: https://github.com/apache/spark/pull/27983
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29084: [SPARK-32220][SQL][FOLLOW-UP]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result
AngersZhuuuu commented on a change in pull request #29084: URL: https://github.com/apache/spark/pull/29084#discussion_r453496821

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala

@@ -191,15 +191,19 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {

   def createSortMergeJoin() = {
     if (RowOrdering.isOrderable(leftKeys)) {
       Some(Seq(joins.SortMergeJoinExec(
-        leftKeys, rightKeys, joinType, condition, planLater(left), planLater(right))))
+        leftKeys, rightKeys, joinType, nonEquiCond, planLater(left), planLater(right))))
     } else {
       None
     }
   }

   def createCartesianProduct() = {
     if (joinType.isInstanceOf[InnerLike]) {
-      Some(Seq(joins.CartesianProductExec(planLater(left), planLater(right), p.condition)))
+      // Since for join hint [[SHUFFLE_REPLICATE_NL]] when having equi join conditions,
+      // we still choose Cartesian Product Join, but in ExtractEquiJoinKeys, it will filter
+      // out equi condition. Instead of using the condition extracted by ExtractEquiJoinKeys,
+      // we should use the original join condition "j.condition".

Review comment:
> how about:
>
> ```
> `CartesianProductExec` can't implicitly evaluate the equal join condition, here we should
> pass the original condition which includes both equal and non-equal conditions.
> ```

Updated
[GitHub] [spark] cloud-fan commented on a change in pull request #29084: [SPARK-32220][SQL][FOLLOW-UP]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result
cloud-fan commented on a change in pull request #29084: URL: https://github.com/apache/spark/pull/29084#discussion_r453480891

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala

Review comment: how about:
```
`CartesianProductExec` can't implicitly evaluate the equal join condition, here we should
pass the original condition which includes both equal and non-equal conditions.
```
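The point about `CartesianProductExec` is general: a cartesian (nested-loop) join has no join keys, so the full predicate, equal and non-equal parts alike, must be evaluated explicitly. A hedged Python sketch over hypothetical relations (not Spark's operator) shows what goes wrong if only the non-equi residual is passed:

```python
# Hypothetical relations: tuples of (join_key, payload).
left = [(1, 10), (2, 20)]
right = [(1, 15), (2, 5), (3, 30)]

def cartesian_join(left, right, condition):
    """Nested-loop join: the predicate must include the equi part too,
    since there are no join keys to evaluate it implicitly."""
    return [(l, r) for l in left for r in right if condition(l, r)]

# Full original condition: equal part (l[0] == r[0]) AND non-equal part (l[1] < r[1]).
full = cartesian_join(left, right, lambda l, r: l[0] == r[0] and l[1] < r[1])
# Only the non-equi residual, as ExtractEquiJoinKeys would leave it:
residual_only = cartesian_join(left, right, lambda l, r: l[1] < r[1])

print(full)                # [((1, 10), (1, 15))] -- the correct join result
print(len(residual_only))  # 3 -- extra rows: the equi condition was silently dropped
```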
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
AmplabJenkins removed a comment on pull request #27428: URL: https://github.com/apache/spark/pull/27428#issuecomment-657404261
[GitHub] [spark] AmplabJenkins commented on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
AmplabJenkins commented on pull request #27428: URL: https://github.com/apache/spark/pull/27428#issuecomment-657404261
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28957: [SPARK-32138] Drop Python 2.7, 3.4 and 3.5
AmplabJenkins removed a comment on pull request #28957: URL: https://github.com/apache/spark/pull/28957#issuecomment-657404294
[GitHub] [spark] AmplabJenkins commented on pull request #28957: [SPARK-32138] Drop Python 2.7, 3.4 and 3.5
AmplabJenkins commented on pull request #28957: URL: https://github.com/apache/spark/pull/28957#issuecomment-657404294
[GitHub] [spark] SparkQA commented on pull request #28957: [SPARK-32138] Drop Python 2.7, 3.4 and 3.5
SparkQA commented on pull request #28957: URL: https://github.com/apache/spark/pull/28957#issuecomment-657403743 **[Test build #125755 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125755/testReport)** for PR 28957 at commit [`5c27ea8`](https://github.com/apache/spark/commit/5c27ea8c6c3bbf0c86e539c0002aa24ec18f591d).
[GitHub] [spark] SparkQA commented on pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
SparkQA commented on pull request #27428: URL: https://github.com/apache/spark/pull/27428#issuecomment-657403782 **[Test build #125756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125756/testReport)** for PR 27428 at commit [`d531864`](https://github.com/apache/spark/commit/d531864c7da4244674377369854ab2f02b5acfa2).
[GitHub] [spark] dbtsai commented on pull request #29057: [SPARK-32245][INFRA] Run Spark tests in Github Actions
dbtsai commented on pull request #29057: URL: https://github.com/apache/spark/pull/29057#issuecomment-657401623

@HyukjinKwon and @dongjoon-hyun, it seems there are plugins for GitHub Actions that surface the failing tests, for example https://github.com/check-run-reporter/action. This might help people read the test results.
[GitHub] [spark] dbtsai edited a comment on pull request #29057: [SPARK-32245][INFRA] Run Spark tests in Github Actions
dbtsai edited a comment on pull request #29057: URL: https://github.com/apache/spark/pull/29057#issuecomment-657401623

Great job! @HyukjinKwon and @dongjoon-hyun, it seems there are plugins for GitHub Actions that surface the failing tests, for example https://github.com/check-run-reporter/action. This might help people read the test results.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29084: [SPARK-32220][SQL][FOLLOW-UP]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result
AmplabJenkins removed a comment on pull request #29084: URL: https://github.com/apache/spark/pull/29084#issuecomment-657400982
[GitHub] [spark] AmplabJenkins commented on pull request #29084: [SPARK-32220][SQL][FOLLOW-UP]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result
AmplabJenkins commented on pull request #29084: URL: https://github.com/apache/spark/pull/29084#issuecomment-657400982
[GitHub] [spark] SparkQA commented on pull request #29084: [SPARK-32220][SQL][FOLLOW-UP]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result
SparkQA commented on pull request #29084: URL: https://github.com/apache/spark/pull/29084#issuecomment-657400399 **[Test build #125754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125754/testReport)** for PR 29084 at commit [`f8774ec`](https://github.com/apache/spark/commit/f8774ecc0e5576b4c42d8cca8759c586dbe7c000).
[GitHub] [spark] AngersZhuuuu opened a new pull request #29084: [SPARK-32220][SQL][FOLLOW-UP]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result
AngersZh opened a new pull request #29084: URL: https://github.com/apache/spark/pull/29084

### What changes were proposed in this pull request?
Follow up on the comment at https://github.com/apache/spark/pull/29035#discussion_r453468999 and explain the change in this PR.

### Why are the changes needed?
Adds a clarifying comment.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not needed.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result
AngersZh commented on a change in pull request #29035: URL: https://github.com/apache/spark/pull/29035#discussion_r453473254

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ##

```
@@ -199,7 +199,7 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
     def createCartesianProduct() = {
       if (joinType.isInstanceOf[InnerLike]) {
-        Some(Seq(joins.CartesianProductExec(planLater(left), planLater(right), condition)))
+        Some(Seq(joins.CartesianProductExec(planLater(left), planLater(right), p.condition)))
```

Review comment:
> We need to write a comment above this line and explain what it is doing.
> // instead of using the condition extracted by ExtractEquiJoinKeys, we should use the original join condition, i.e., "p.condition".

Raised a PR: https://github.com/apache/spark/pull/29084
[GitHub] [spark] AngersZhuuuu commented on pull request #29084: [SPARK-32220][SQL][FOLLOW-UP]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result
AngersZh commented on pull request #29084: URL: https://github.com/apache/spark/pull/29084#issuecomment-657398681 cc @dongjoon-hyun @gatorsmile
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29018: [SPARK-32202][ML][WIP] tree models auto infer compact integer type
AmplabJenkins removed a comment on pull request #29018: URL: https://github.com/apache/spark/pull/29018#issuecomment-657394198
[GitHub] [spark] AmplabJenkins commented on pull request #29018: [SPARK-32202][ML][WIP] tree models auto infer compact integer type
AmplabJenkins commented on pull request #29018: URL: https://github.com/apache/spark/pull/29018#issuecomment-657394198
[GitHub] [spark] SparkQA commented on pull request #29018: [SPARK-32202][ML][WIP] tree models auto infer compact integer type
SparkQA commented on pull request #29018: URL: https://github.com/apache/spark/pull/29018#issuecomment-657394110 **[Test build #125753 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125753/testReport)** for PR 29018 at commit [`19e4bcf`](https://github.com/apache/spark/commit/19e4bcfac6d876afd289b736bc19008032564c2d).
[GitHub] [spark] gatorsmile commented on a change in pull request #29035: [SPARK-32220][SQL]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result
gatorsmile commented on a change in pull request #29035: URL: https://github.com/apache/spark/pull/29035#discussion_r453469535

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ##

```
@@ -159,7 +159,7 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
       // 4. Pick cartesian product if join type is inner like.
       // 5. Pick broadcast nested loop join as the final solution. It may OOM but we don't have
       //    other choice.
-      case ExtractEquiJoinKeys(joinType, leftKeys, rightKeys, condition, left, right, hint) =>
+      case p @ ExtractEquiJoinKeys(joinType, leftKeys, rightKeys, condition, left, right, hint) =>
```

Review comment: To avoid making similar mistakes, we need to rename `condition` to a self-descriptive name. "otherConditions"? It is a little bit hard to name it, TBH.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
AmplabJenkins removed a comment on pull request #29045: URL: https://github.com/apache/spark/pull/29045#issuecomment-657393312 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125750/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files
AmplabJenkins removed a comment on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-657393281 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125751/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29075: [SPARK-32284][SQL] Avoid expanding too many CNF predicates in partition pruning
AmplabJenkins removed a comment on pull request #29075: URL: https://github.com/apache/spark/pull/29075#issuecomment-657393342 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125746/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
SparkQA removed a comment on pull request #29045: URL: https://github.com/apache/spark/pull/29045#issuecomment-657361898
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
AmplabJenkins removed a comment on pull request #29045: URL: https://github.com/apache/spark/pull/29045#issuecomment-657393265
[GitHub] [spark] SparkQA removed a comment on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files
SparkQA removed a comment on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-657367971 **[Test build #125751 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125751/testReport)** for PR 28363 at commit [`b648156`](https://github.com/apache/spark/commit/b64815622bb4e8cd8b474cb2983f2c9b78ed9342).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files
AmplabJenkins removed a comment on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-657393278 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29075: [SPARK-32284][SQL] Avoid expanding too many CNF predicates in partition pruning
AmplabJenkins removed a comment on pull request #29075: URL: https://github.com/apache/spark/pull/29075#issuecomment-657393340 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #29075: [SPARK-32284][SQL] Avoid expanding too many CNF predicates in partition pruning
SparkQA removed a comment on pull request #29075: URL: https://github.com/apache/spark/pull/29075#issuecomment-657337043 **[Test build #125746 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125746/testReport)** for PR 29075 at commit [`df08390`](https://github.com/apache/spark/commit/df083906f2d9938d77b4b41878271af1826b5220).