[GitHub] [spark] SparkQA removed a comment on issue #25372: [SPARK-28640][SQL] Only give warning when session catalog is not defined
URL: https://github.com/apache/spark/pull/25372#issuecomment-518874024 **[Test build #108733 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108733/testReport)** for PR 25372 at commit [`8dd1feb`](https://github.com/apache/spark/commit/8dd1feba8231dc0b73e08935997f4acd8eb957d6). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #25372: [SPARK-28640][SQL] Only give warning when session catalog is not defined
URL: https://github.com/apache/spark/pull/25372#issuecomment-518875170 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #25372: [SPARK-28640][SQL] Only give warning when session catalog is not defined
URL: https://github.com/apache/spark/pull/25372#issuecomment-518912413 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108733/
[GitHub] [spark] AmplabJenkins commented on issue #25348: [RFC][SPARK-28554][SQL] Adds a v1 fallback writer implementation for v2 data source codepaths
URL: https://github.com/apache/spark/pull/25348#issuecomment-518912460 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108738/
[GitHub] [spark] AmplabJenkins commented on issue #25348: [RFC][SPARK-28554][SQL] Adds a v1 fallback writer implementation for v2 data source codepaths
URL: https://github.com/apache/spark/pull/25348#issuecomment-518912451 Merged build finished. Test FAILed.
[GitHub] [spark] WeichenXu123 commented on issue #25289: [WIP][SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
URL: https://github.com/apache/spark/pull/25289#issuecomment-518912457 @HyukjinKwon @dongjoon-hyun I also updated `github_jira_sync.py` and `run-test-jenkins.py`. Now all scripts are covered.
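The kind of changes a Python 2-to-3 migration of such dev scripts typically involves can be sketched as follows (an illustrative example, not the actual diff from this PR):

```python
# Common Python 2 idioms that break on Python 3, with portable replacements.

d = {"a": 1, "b": 2}

# Before: print "total:", len(d)           (print statement, Python 2 only)
print("total:", len(d))                    # print() works on both 2.7 and 3

# Before: for k, v in d.iteritems(): ...   (iteritems() removed in Python 3)
items = sorted(d.items())                  # .items() exists in both

# Before: name = raw_input("name: ")       (renamed to input() in Python 3)
# name = input("name: ")                   # left commented: would block on stdin

assert items == [("a", 1), ("b", 2)]
```

Running such scripts under both interpreters during the transition is what motivates keeping the replacements compatible with 2.7 as well as 3.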
[GitHub] [spark] SparkQA commented on issue #25289: [WIP][SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
URL: https://github.com/apache/spark/pull/25289#issuecomment-518913142 **[Test build #108740 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108740/testReport)** for PR 25289 at commit [`2735255`](https://github.com/apache/spark/commit/273525538128ce753b7f3b3c1b0f39838a4dea82).
[GitHub] [spark] SparkQA removed a comment on issue #25348: [RFC][SPARK-28554][SQL] Adds a v1 fallback writer implementation for v2 data source codepaths
URL: https://github.com/apache/spark/pull/25348#issuecomment-518911536 **[Test build #108738 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108738/testReport)** for PR 25348 at commit [`3e35d5c`](https://github.com/apache/spark/commit/3e35d5c84b4ba0b332981382f71f68eb05b966e2).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25372: [SPARK-28640][SQL] Only give warning when session catalog is not defined
URL: https://github.com/apache/spark/pull/25372#issuecomment-518912413 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108733/
[GitHub] [spark] AmplabJenkins removed a comment on issue #25372: [SPARK-28640][SQL] Only give warning when session catalog is not defined
URL: https://github.com/apache/spark/pull/25372#issuecomment-518912412 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25348: [RFC][SPARK-28554][SQL] Adds a v1 fallback writer implementation for v2 data source codepaths
URL: https://github.com/apache/spark/pull/25348#issuecomment-518912451 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25289: [WIP][SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
URL: https://github.com/apache/spark/pull/25289#issuecomment-518912743 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13820/
[GitHub] [spark] AmplabJenkins removed a comment on issue #25348: [RFC][SPARK-28554][SQL] Adds a v1 fallback writer implementation for v2 data source codepaths
URL: https://github.com/apache/spark/pull/25348#issuecomment-518912460 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108738/
[GitHub] [spark] AmplabJenkins removed a comment on issue #25289: [WIP][SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
URL: https://github.com/apache/spark/pull/25289#issuecomment-518912736 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25289: [WIP][SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
URL: https://github.com/apache/spark/pull/25289#issuecomment-518912743 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13820/
[GitHub] [spark] AmplabJenkins commented on issue #25289: [WIP][SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
URL: https://github.com/apache/spark/pull/25289#issuecomment-518912736 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25348: [RFC][SPARK-28554][SQL] Adds a v1 fallback writer implementation for v2 data source codepaths
URL: https://github.com/apache/spark/pull/25348#issuecomment-518912442 **[Test build #108738 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108738/testReport)** for PR 25348 at commit [`3e35d5c`](https://github.com/apache/spark/commit/3e35d5c84b4ba0b332981382f71f68eb05b966e2).

* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait SupportsV1Write extends SparkPlan`
[GitHub] [spark] AmplabJenkins commented on issue #25372: [SPARK-28640][SQL] Only give warning when session catalog is not defined
URL: https://github.com/apache/spark/pull/25372#issuecomment-518912412 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25289: [WIP][SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
URL: https://github.com/apache/spark/pull/25289#issuecomment-518911554 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108739/
[GitHub] [spark] WeichenXu123 commented on a change in pull request #25289: [WIP][SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
URL: https://github.com/apache/spark/pull/25289#discussion_r311345989

File path: python/run-tests.py

```diff
@@ -161,8 +161,13 @@ def run_individual_python_test(target_dir, test_name, pyspark_python):

 def get_default_python_executables():
     python_execs = [x for x in ["python2.7", "python3.6", "pypy"] if which(x)]
-    if "python2.7" not in python_execs:
-        LOGGER.warning("Not testing against `python2.7` because it could not be found; falling"
+    if ("python3.6" not in python_execs) and which("python3"):
+        LOGGER.warning("Not testing against `python3.6` because it could not be found; falling"
+                       " back to `python3` instead")
+        python_execs.insert(0, "python3")
```

Review comment: I reverted the change here.
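The fallback under review can be illustrated as a standalone sketch (hypothetical: `which` here is `shutil.which` and the logging call is dropped, unlike the real `python/run-tests.py` helper):

```python
from shutil import which


def get_default_python_executables():
    """Return the Python executables to test against, preferring python3.6
    but falling back to a generic `python3` when 3.6 is not on the PATH."""
    python_execs = [x for x in ["python2.7", "python3.6", "pypy"] if which(x)]
    if "python3.6" not in python_execs and which("python3"):
        # python3.6 was not found but a generic python3 exists: put it
        # first so it becomes the primary interpreter under test.
        python_execs.insert(0, "python3")
    return python_execs


print(get_default_python_executables())
```

The result depends on which interpreters are installed on the machine running the script, which is exactly why the review discusses whether the fallback belongs here at all.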
[GitHub] [spark] AmplabJenkins removed a comment on issue #25289: [WIP][SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
URL: https://github.com/apache/spark/pull/25289#issuecomment-518914175 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25289: [WIP][SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
URL: https://github.com/apache/spark/pull/25289#issuecomment-518914177 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13821/
[GitHub] [spark] beliefer commented on issue #25172: [SPARK-28412][SQL] ANSI SQL: OVERLAY function support byte array
URL: https://github.com/apache/spark/pull/25172#issuecomment-518913939 @maropu Could you continue to take a look?
[GitHub] [spark] AmplabJenkins commented on issue #25289: [WIP][SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
URL: https://github.com/apache/spark/pull/25289#issuecomment-518914175 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25289: [WIP][SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
URL: https://github.com/apache/spark/pull/25289#issuecomment-518914177 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13821/
[GitHub] [spark] gatorsmile commented on issue #25279: [SPARK-28519][SQL] Use StrictMath log, pow functions for platform independence
URL: https://github.com/apache/spark/pull/25279#issuecomment-518915141 @srowen Could you show the perf benchmark? The performance regression is expected, right?
[GitHub] [spark] dongjoon-hyun commented on issue #25229: [SPARK-27900][K8s] Add jvm oom flag
URL: https://github.com/apache/spark/pull/25229#issuecomment-518916061 @skonto . My name is `Dongjoon Hyun` with GitHub id `@dongjoon-hyun`. :) I knew the history of #24792 and Spark uses `YarnSparkHadoopUtil.addOutOfMemoryErrorArgument`. I agree with the necessity of this since that PR, and tried to review/merge this PR. However, the current implementation seems a little too complicated and not robust, especially the part that copies `/opt/spark/conf/spark.properties` to `/tmp/spark.properties` and replaces it. In general, I believe the best UX is to keep it simple by reusing the existing general mechanism. `OnOutOfMemoryError` is a well-known option for JVM users and `spark.driver.extraJavaOptions` is for that kind of option. As of now, I prefer a new documentation, but other committers may have different opinions. Let me ping them to get their advice. Hi, @srowen , @squito , @mccheah , @zsxwing , @tgravescs . This is the continuation of #24796 . Could you review this PR's implementation to make progress and finalize the issue?
[GitHub] [spark] dongjoon-hyun edited a comment on issue #25229: [SPARK-27900][K8s] Add jvm oom flag
URL: https://github.com/apache/spark/pull/25229#issuecomment-518916061 @skonto . My name is `Dongjoon Hyun` with GitHub id `@dongjoon-hyun`. :) I knew the history of #24796 and Spark uses `YarnSparkHadoopUtil.addOutOfMemoryErrorArgument`. I agree with the necessity of this since that PR, and tried to review/merge this PR. However, the current implementation seems a little too complicated and not robust, especially the part that copies `/opt/spark/conf/spark.properties` to `/tmp/spark.properties` and replaces it. In general, I believe the best UX is to keep it simple by reusing the existing general mechanism. `OnOutOfMemoryError` is a well-known option for JVM users and `spark.driver.extraJavaOptions` is for that kind of option. As of now, I prefer a new documentation, but other committers may have different opinions. Let me ping them to get their advice. Hi, @srowen , @squito , @mccheah , @zsxwing , @tgravescs . This is the continuation of #24796 . Could you review this PR's implementation to make progress and finalize the issue?
[GitHub] [spark] beliefer commented on issue #25366: [SPARK-27924][SQL][TEST][FOLLOW-UP] Open comment about boolean test
URL: https://github.com/apache/spark/pull/25366#issuecomment-518916476

> I made a PR to you, @beliefer . Please review and merge.
>
> * [beliefer#2](https://github.com/beliefer/spark/pull/2)

Thanks for your help! It's my mistake. I forgot something.
[GitHub] [spark] beliefer commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-518917065 @dongjoon-hyun Could you check this PR?
[GitHub] [spark] SparkQA commented on issue #25366: [SPARK-27924][SQL][TEST][FOLLOW-UP] Open comment about boolean test
URL: https://github.com/apache/spark/pull/25366#issuecomment-518917430 **[Test build #108741 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108741/testReport)** for PR 25366 at commit [`30ed8d3`](https://github.com/apache/spark/commit/30ed8d3f61e5455d17efe77e6e5985c7e72d0109).
[GitHub] [spark] srowen commented on issue #25279: [SPARK-28519][SQL] Use StrictMath log, pow functions for platform independence
URL: https://github.com/apache/spark/pull/25279#issuecomment-518917299 I did not benchmark this as I think it's a correctness issue that would be worth a perf hit. I also expect it makes almost no difference - computing a function in SQL is dominated by so much more than the math here. Let me assess that though with some microbenchmarks.
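The shape of the microbenchmark being proposed can be sketched like this (illustrative Python using `timeit`; the real comparison would pit Java's `Math.log` against `StrictMath.log` inside Spark's expression evaluation, and the function names below are made up for the sketch):

```python
import math
import timeit

N = 50_000  # iterations per measurement


def expr_with_log(x):
    # stand-in for evaluating a SQL expression whose cost includes a log() call
    return math.log(x) + x * 2.0 - 1.0


def expr_without_log(x):
    # the same expression with the math call removed, to isolate its cost
    return x + x * 2.0 - 1.0


t_with = timeit.timeit(lambda: expr_with_log(3.14), number=N)
t_without = timeit.timeit(lambda: expr_without_log(3.14), number=N)
print(f"with log: {t_with:.4f}s  without log: {t_without:.4f}s")
```

Comparing the two timings shows how much of the per-row cost the math call actually accounts for, which is the question at issue before accepting any `StrictMath` slowdown.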
[GitHub] [spark] AmplabJenkins removed a comment on issue #25366: [SPARK-27924][SQL][TEST][FOLLOW-UP] Open comment about boolean test
URL: https://github.com/apache/spark/pull/25366#issuecomment-518918462 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13822/
[GitHub] [spark] AmplabJenkins commented on issue #25366: [SPARK-27924][SQL][TEST][FOLLOW-UP] Open comment about boolean test
URL: https://github.com/apache/spark/pull/25366#issuecomment-518918460 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25366: [SPARK-27924][SQL][TEST][FOLLOW-UP] Open comment about boolean test
AmplabJenkins removed a comment on issue #25366: [SPARK-27924][SQL][TEST][FOLLOW-UP] Open comment about boolean test URL: https://github.com/apache/spark/pull/25366#issuecomment-518918460 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25366: [SPARK-27924][SQL][TEST][FOLLOW-UP] Open comment about boolean test
AmplabJenkins commented on issue #25366: [SPARK-27924][SQL][TEST][FOLLOW-UP] Open comment about boolean test URL: https://github.com/apache/spark/pull/25366#issuecomment-518918462 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13822/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25289: [SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
SparkQA commented on issue #25289: [SPARK-27889][INFRA] Make development scripts under dev/ support Python 3 URL: https://github.com/apache/spark/pull/25289#issuecomment-518918889 **[Test build #108742 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108742/testReport)** for PR 25289 at commit [`34c3af7`](https://github.com/apache/spark/commit/34c3af7e22e1c9fb6fad27bb03b45a1e95f6d828).
[GitHub] [spark] RussellSpitzer commented on a change in pull request #25348: [RFC][SPARK-28554][SQL] Adds a v1 fallback writer implementation for v2 data source codepaths
RussellSpitzer commented on a change in pull request #25348: [RFC][SPARK-28554][SQL] Adds a v1 fallback writer implementation for v2 data source codepaths URL: https://github.com/apache/spark/pull/25348#discussion_r311351159 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala ## @@ -501,3 +528,19 @@ private[v2] case class DataWritingSparkTaskResult( * Sink progress information collected after commit. */ private[sql] case class StreamWriterCommitProgress(numOutputRows: Long) + +/** + * A trait that allows Tables that use V1 Writer interfaces to write data. + */ +sealed trait SupportsV1Write extends V2TableWriteExec { + def plan: LogicalPlan + + protected def writeWithV1( + relation: CreatableRelationProvider, + mode: SaveMode, + options: CaseInsensitiveStringMap): RDD[InternalRow] = { +relation.createRelation( + sqlContext, mode, options.asScala.toMap, Dataset.ofRows(sqlContext.sparkSession, plan)) +sparkContext.emptyRDD Review comment: ``` val writtenRows = writer match { case v1: V1WriteBuilder => writeWithV1(v1.buildForV1Write(), writeOptions) case v2 => doWrite(v2.buildForBatch()) } ``` If this is always empty, why do we save it as writtenRows here? Is this just to hold a reference to the empty result set?
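To make the question above concrete, here is a minimal Python sketch (with hypothetical names mirroring the quoted Scala, not Spark's actual API) of a v1-fallback write dispatch: both branches must produce the same result type, so the v1 path, which writes purely as a side effect, returns an empty result only to satisfy the common signature.

```python
# Hypothetical sketch of the v1/v2 write dispatch being discussed.
# The names mirror the quoted Scala snippet; they are not Spark's API.

class V1WriteBuilder:
    """A 'v1' writer: writes as a side effect, reports nothing back."""
    def build_for_v1_write(self):
        return lambda rows: None  # side-effecting write

class V2WriteBuilder:
    """A 'v2' writer: returns the rows it wrote."""
    def build_for_batch(self):
        return lambda rows: list(rows)

def write(writer, rows):
    # Both arms must return the same type, so the v1 arm returns an
    # empty result ("writtenRows" is always empty on that path).
    if isinstance(writer, V1WriteBuilder):
        writer.build_for_v1_write()(rows)
        return []
    return writer.build_for_batch()(rows)
```

This illustrates why the v1 branch's result is always empty: the value exists only so the two match arms type-check against a common return type.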
[GitHub] [spark] AmplabJenkins removed a comment on issue #25289: [SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
AmplabJenkins removed a comment on issue #25289: [SPARK-27889][INFRA] Make development scripts under dev/ support Python 3 URL: https://github.com/apache/spark/pull/25289#issuecomment-518920002 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13823/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25289: [SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
AmplabJenkins removed a comment on issue #25289: [SPARK-27889][INFRA] Make development scripts under dev/ support Python 3 URL: https://github.com/apache/spark/pull/25289#issuecomment-518919997 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25289: [SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
AmplabJenkins commented on issue #25289: [SPARK-27889][INFRA] Make development scripts under dev/ support Python 3 URL: https://github.com/apache/spark/pull/25289#issuecomment-518919997 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25289: [SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
AmplabJenkins commented on issue #25289: [SPARK-27889][INFRA] Make development scripts under dev/ support Python 3 URL: https://github.com/apache/spark/pull/25289#issuecomment-518920002 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13823/ Test PASSed.
[GitHub] [spark] advancedxy commented on a change in pull request #25306: [SPARK-28573][SQL] Convert InsertIntoTable(HiveTableRelation) to DataSource inserting for partitioned table
advancedxy commented on a change in pull request #25306: [SPARK-28573][SQL] Convert InsertIntoTable(HiveTableRelation) to DataSource inserting for partitioned table URL: https://github.com/apache/spark/pull/25306#discussion_r311351669 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveCommandSuite.scala ## @@ -58,7 +58,11 @@ class HiveCommandSuite extends QueryTest with SQLTestUtils with TestHiveSingleto |TBLPROPERTIES('prop1Key'="prop1Val", '`prop2Key`'="prop2Val") """.stripMargin) sql("CREATE TABLE parquet_tab3(col1 int, `col 2` int)") -sql("CREATE TABLE parquet_tab4 (price int, qty int) partitioned by (year int, month int)") +sql( + """ +|CREATE TABLE parquet_tab4 (price int, qty int) partitioned by (year int, month int) +|STORED AS PARQUET Review comment: It's modified because I randomly chose it to make sure the insert into a partitioned table can be safely converted. However, it should already be covered by other test cases, so it's a neutral change. I can revert it if you think it's unnecessary.
[GitHub] [spark] xianyinxin commented on a change in pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder
xianyinxin commented on a change in pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder URL: https://github.com/apache/spark/pull/24983#discussion_r311352726 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ## @@ -470,3 +397,427 @@ object JoinReorderDPFilters extends PredicateHelper { * extended with the set of connected/unconnected plans. */ case class JoinGraphInfo (starJoins: Set[Int], nonStarJoins: Set[Int]) + +/** + * Reorder the joins using a genetic algorithm. The algorithm treat the reorder problem + * to a traveling salesmen problem, and use genetic algorithm give an optimized solution. + * + * The implementation refs the geqo in postgresql, which is contibuted by Darrell Whitley: + * https://www.postgresql.org/docs/9.1/geqo-pg-intro.html + * + * For more info about genetic algorithm and the edge recombination crossover, pls see: + * "A Genetic Algorithm Tutorial, Darrell Whitley" + * https://link.springer.com/article/10.1007/BF00175354 + * and "Scheduling Problems and Traveling Salesmen: The Genetic Edge Recombination Operator, + * Darrell Whitley et al." https://dl.acm.org/citation.cfm?id=657238 + * respectively. + */ +object JoinReorderGA extends PredicateHelper with Logging { + + def search( + conf: SQLConf, + items: Seq[LogicalPlan], + conditions: Set[Expression], + output: Seq[Attribute]): Option[LogicalPlan] = { + +val startTime = System.nanoTime() + +val itemsWithIndex = items.zipWithIndex.map { + case (plan, id) => id -> JoinPlan(Set(id), plan, Set.empty, Cost(0, 0)) +}.toMap + +val topOutputSet = AttributeSet(output) + +val pop = Population(conf, itemsWithIndex, conditions, topOutputSet).evolve + +val durationInMs = (System.nanoTime() - startTime) / (1000 * 1000) +logInfo(s"Join reordering finished. Duration: $durationInMs ms, number of items: " + +s"${items.length}, number of plans in memo: ${ pop.chromos.size}") + +assert(pop.chromos.head.basicPlans.size == items.length) +pop.chromos.head.integratedPlan match { + case Some(joinPlan) => joinPlan.plan match { +case p @ Project(projectList, _: Join) if projectList != output => + assert(topOutputSet == p.outputSet) + // Keep the same order of final output attributes. + Some(p.copy(projectList = output)) +case finalPlan if !sameOutput(finalPlan, output) => + Some(Project(output, finalPlan)) +case finalPlan => + Some(finalPlan) + } + case _ => None +} + } +} + +/** + * A pair of parent individuals can breed a child with certain crossover process. + * With crossover, child can inherit gene from its parents, and these gene snippets + * finally compose a new [[Chromosome]]. + */ +@DeveloperApi +trait Crossover { + + /** + * Generate a new [[Chromosome]] from the given parent [[Chromosome]]s, + * with this crossover algorithm. + */ + def newChromo(father: Chromosome, mother: Chromosome) : Chromosome +} + +case class EdgeTable(table: Map[JoinPlan, Seq[JoinPlan]]) + +/** + * This class implements the Genetic Edge Recombination algorithm. + * For more information about the Genetic Edge Recombination, + * see "Scheduling Problems and Traveling Salesmen: The Genetic Edge + * Recombination Operator" by Darrell Whitley et al. + * https://dl.acm.org/citation.cfm?id=657238 + */ +object EdgeRecombination extends Crossover { + + def genEdgeTable(father: Chromosome, mother: Chromosome) : EdgeTable = { +val fatherTable = father.basicPlans.map(g => g -> findNeighbours(father.basicPlans, g)).toMap +val motherTable = mother.basicPlans.map(g => g -> findNeighbours(mother.basicPlans, g)).toMap +EdgeTable( + fatherTable.map(entry => entry._1 -> (entry._2 ++ motherTable(entry._1 + } + + def findNeighbours(genes: Seq[JoinPlan], g: JoinPlan) : Seq[JoinPlan] = { +val genesIndexed = genes.toIndexedSeq +val index = genesIndexed.indexOf(g) +val length = genes.size +if (index > 0 && index < length - 1) { + Seq(genesIndexed(index - 1), genesIndexed(index + 1)) +} else if (index == 0) { + Seq(genesIndexed(1), genesIndexed(length - 1)) +} else if (index == length - 1) { + Seq(genesIndexed(0), genesIndexed(length - 2)) +} else { + Seq() +} + } + + override def newChromo(father: Chromosome, mother: Chromosome): Chromosome = { +var newGenes: Seq[JoinPlan] = Seq() +// 1. Generate the edge table. +var table = genEdgeTable(father, mother).table +// 2. Choose a start point randomly from the heads of father/mother. +var current = + if (util.Random.nextInt(2) == 0) father.basicPlans.head else mother.basicPlans.head +newGenes :+= current + +var stop = false +while (!stop) { + // 3. Filter out the chosen point from the edge table. + table = table.map( + …
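The `findNeighbours`/`genEdgeTable` logic quoted above can be summarized compactly: each gene's neighbours are the elements adjacent to it in a parent tour (treating the tour as a cycle), and the edge table is the union of those neighbour lists over both parents. Here is a minimal Python sketch of that idea; it mirrors the quoted Scala but uses plain lists and modular indexing instead of the explicit three-way branch, and is an illustration, not Spark's implementation.

```python
# Sketch of edge-recombination bookkeeping: neighbours in a circular tour,
# and the merged "edge table" over two parent tours.

def find_neighbours(genes, g):
    """Neighbours of g in the tour `genes`, treated as a cycle."""
    if len(genes) < 2 or g not in genes:
        return []
    i = genes.index(g)
    n = len(genes)
    # Modular indexing covers all three cases of the quoted Scala
    # (interior element, first element, last element) at once.
    return [genes[(i - 1) % n], genes[(i + 1) % n]]

def gen_edge_table(father, mother):
    """Union (with duplicates) of each gene's neighbours in both parents."""
    return {g: find_neighbours(father, g) + find_neighbours(mother, g)
            for g in father}
```

For example, with parent tours `["A", "B", "C", "D"]` and `["B", "D", "A", "C"]`, the entry for `"A"` collects `D, B` from the first tour and `D, C` from the second, so its distinct neighbours are `{B, C, D}`.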
[GitHub] [spark] xianyinxin commented on a change in pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder
xianyinxin commented on a change in pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder URL: https://github.com/apache/spark/pull/24983#discussion_r311352702 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ## (quotes the same `JoinReorderGA` diff hunk as the comment above, ending at `object EdgeRecombination extends Crossover {`) Review comment: Done. Added a simple description and an example of the algorithm.
[GitHub] [spark] xianyinxin commented on a change in pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder
xianyinxin commented on a change in pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder URL: https://github.com/apache/spark/pull/24983#discussion_r311352767 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ## (quotes the same `JoinReorderGA`/`EdgeRecombination` diff hunk as the first comment above) Review comment: …
[GitHub] [spark] xianyinxin commented on a change in pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder
xianyinxin commented on a change in pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder URL: https://github.com/apache/spark/pull/24983#discussion_r311352547 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ## (quotes the same `JoinReorderGA` diff hunk as the comment above, ending at `case class EdgeTable(table: Map[JoinPlan, Seq[JoinPlan]])`) Review comment: Removed `EdgeRecombination` since there's only one explicit use of this class.
[GitHub] [spark] advancedxy commented on issue #25002: [SPARK-28203][Core][Python] PythonRDD should respect SparkContext's hadoop configuration
advancedxy commented on issue #25002: [SPARK-28203][Core][Python] PythonRDD should respect SparkContext's hadoop configuration URL: https://github.com/apache/spark/pull/25002#issuecomment-518921394 Gently ping @cloud-fan, @HyukjinKwon and @dongjoon-hyun.
[GitHub] [spark] xianyinxin commented on a change in pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder
xianyinxin commented on a change in pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder URL: https://github.com/apache/spark/pull/24983#discussion_r311353000

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala

## @@ -470,3 +397,427 @@ object JoinReorderDPFilters extends PredicateHelper {
  * extended with the set of connected/unconnected plans.
  */
 case class JoinGraphInfo (starJoins: Set[Int], nonStarJoins: Set[Int])
+
+/**
+ * Reorders the joins using a genetic algorithm. The algorithm treats the reordering problem
+ * as a traveling salesman problem and uses a genetic algorithm to find an optimized solution.
+ *
+ * The implementation follows GEQO in PostgreSQL, which was contributed by Darrell Whitley:
+ * https://www.postgresql.org/docs/9.1/geqo-pg-intro.html
+ *
+ * For more about genetic algorithms and the edge recombination crossover, see
+ * "A Genetic Algorithm Tutorial, Darrell Whitley"
+ * https://link.springer.com/article/10.1007/BF00175354
+ * and "Scheduling Problems and Traveling Salesmen: The Genetic Edge Recombination Operator,
+ * Darrell Whitley et al." https://dl.acm.org/citation.cfm?id=657238
+ * respectively.
+ */
+object JoinReorderGA extends PredicateHelper with Logging {
+
+  def search(
+      conf: SQLConf,
+      items: Seq[LogicalPlan],
+      conditions: Set[Expression],
+      output: Seq[Attribute]): Option[LogicalPlan] = {
+
+    val startTime = System.nanoTime()
+
+    val itemsWithIndex = items.zipWithIndex.map {
+      case (plan, id) => id -> JoinPlan(Set(id), plan, Set.empty, Cost(0, 0))
+    }.toMap
+
+    val topOutputSet = AttributeSet(output)
+
+    val pop = Population(conf, itemsWithIndex, conditions, topOutputSet).evolve
+
+    val durationInMs = (System.nanoTime() - startTime) / (1000 * 1000)
+    logInfo(s"Join reordering finished. Duration: $durationInMs ms, number of items: " +
+      s"${items.length}, number of plans in memo: ${pop.chromos.size}")
+
+    assert(pop.chromos.head.basicPlans.size == items.length)
+    pop.chromos.head.integratedPlan match {
+      case Some(joinPlan) => joinPlan.plan match {
+        case p @ Project(projectList, _: Join) if projectList != output =>
+          assert(topOutputSet == p.outputSet)
+          // Keep the same order of final output attributes.
+          Some(p.copy(projectList = output))
+        case finalPlan if !sameOutput(finalPlan, output) =>
+          Some(Project(output, finalPlan))
+        case finalPlan =>
+          Some(finalPlan)
+      }
+      case _ => None
+    }
+  }
+}
+
+/**
+ * A pair of parent individuals can breed a child through a certain crossover process.
+ * With crossover, the child inherits genes from its parents, and these gene snippets
+ * finally compose a new [[Chromosome]].
+ */
+@DeveloperApi
+trait Crossover {
+
+  /**
+   * Generates a new [[Chromosome]] from the given parent [[Chromosome]]s
+   * with this crossover algorithm.
+   */
+  def newChromo(father: Chromosome, mother: Chromosome): Chromosome
+}
+
+case class EdgeTable(table: Map[JoinPlan, Seq[JoinPlan]])
+
+/**
+ * This object implements the Genetic Edge Recombination algorithm.
+ * For more information about Genetic Edge Recombination,
+ * see "Scheduling Problems and Traveling Salesmen: The Genetic Edge
+ * Recombination Operator" by Darrell Whitley et al.
+ * https://dl.acm.org/citation.cfm?id=657238
+ */
+object EdgeRecombination extends Crossover {
+
+  def genEdgeTable(father: Chromosome, mother: Chromosome): EdgeTable = {
+    val fatherTable = father.basicPlans.map(g => g -> findNeighbours(father.basicPlans, g)).toMap
+    val motherTable = mother.basicPlans.map(g => g -> findNeighbours(mother.basicPlans, g)).toMap
+    EdgeTable(
+      fatherTable.map(entry => entry._1 -> (entry._2 ++ motherTable(entry._1))))
+  }
+
+  def findNeighbours(genes: Seq[JoinPlan], g: JoinPlan): Seq[JoinPlan] = {
+    val genesIndexed = genes.toIndexedSeq
+    val index = genesIndexed.indexOf(g)
+    val length = genes.size
+    if (index > 0 && index < length - 1) {
+      Seq(genesIndexed(index - 1), genesIndexed(index + 1))
+    } else if (index == 0) {
+      Seq(genesIndexed(1), genesIndexed(length - 1))
+    } else if (index == length - 1) {
+      Seq(genesIndexed(0), genesIndexed(length - 2))
+    } else {
+      Seq()
+    }
+  }
+
+  override def newChromo(father: Chromosome, mother: Chromosome): Chromosome = {
+    var newGenes: Seq[JoinPlan] = Seq()
+    // 1. Generate the edge table.
+    var table = genEdgeTable(father, mother).table
+    // 2. Choose a start point randomly from the heads of father/mother.
+    var current =
+      if (util.Random.nextInt(2) == 0) father.basicPlans.head else mother.basicPlans.head
+    newGenes :+= current
+
+    var stop = false
+    while (!stop) {
+      // 3. Filter out the chosen point from the edge table.
+      table = table.map(
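The edge recombination operator quoted above builds, for each join item, the list of its neighbours in both parent orderings, treating each parent's join order as a circular tour. A rough Python sketch of just that neighbour lookup and edge-table step (function names are illustrative; the PR's actual implementation is the Scala above, operating on `JoinPlan` objects):

```python
def find_neighbours(genes, g):
    """Neighbours of g when the sequence is read as a circular tour:
    the previous and next elements, wrapping around at both ends."""
    n = len(genes)
    if n < 2:
        return []
    i = genes.index(g)
    return [genes[(i - 1) % n], genes[(i + 1) % n]]

def gen_edge_table(father, mother):
    """For each gene, concatenate its neighbours from both parent tours."""
    return {g: find_neighbours(father, g) + find_neighbours(mother, g)
            for g in father}
```

The crossover then repeatedly picks the current gene's neighbour with the fewest remaining edges, which is what the (truncated) `newChromo` loop in the diff implements.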
[GitHub] [spark] AmplabJenkins commented on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder
AmplabJenkins commented on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder URL: https://github.com/apache/spark/pull/24983#issuecomment-518921599 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder
AmplabJenkins commented on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder URL: https://github.com/apache/spark/pull/24983#issuecomment-518921601 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13824/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API
SparkQA commented on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API URL: https://github.com/apache/spark/pull/24232#issuecomment-518922387 **[Test build #108734 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108734/testReport)** for PR 24232 at commit [`a8c7ecd`](https://github.com/apache/spark/commit/a8c7ecd27b8d0fcabfd86571eeba801bb5c7e62a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder
SparkQA commented on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder URL: https://github.com/apache/spark/pull/24983#issuecomment-518921923 **[Test build #108743 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108743/testReport)** for PR 24983 at commit [`75b5037`](https://github.com/apache/spark/commit/75b50373fa3120d9b1726155756909315e2b8b58).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder
AmplabJenkins removed a comment on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder URL: https://github.com/apache/spark/pull/24983#issuecomment-518921601 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13824/ Test PASSed.
[GitHub] [spark] xianyinxin commented on a change in pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder
xianyinxin commented on a change in pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder URL: https://github.com/apache/spark/pull/24983#discussion_r311353280 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
[GitHub] [spark] AmplabJenkins removed a comment on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder
AmplabJenkins removed a comment on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder URL: https://github.com/apache/spark/pull/24983#issuecomment-518921599 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API
SparkQA removed a comment on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API URL: https://github.com/apache/spark/pull/24232#issuecomment-518885405 **[Test build #108734 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108734/testReport)** for PR 24232 at commit [`a8c7ecd`](https://github.com/apache/spark/commit/a8c7ecd27b8d0fcabfd86571eeba801bb5c7e62a).
[GitHub] [spark] maropu commented on a change in pull request #25357: [SPARK-28617][SQL][TEST] Fix misplacement when comment is at the end of the query
maropu commented on a change in pull request #25357: [SPARK-28617][SQL][TEST] Fix misplacement when comment is at the end of the query URL: https://github.com/apache/spark/pull/25357#discussion_r310901893

## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala

## @@ -229,6 +229,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {
   // List of SQL queries to run
   // note: this is not a robust way to split queries using semicolon, but works for now.
   val queries = code.mkString("\n").split("(?<=[^\\\\]);").map(_.trim).filter(_ != "").toSeq
+    .map(_.split("\n").filterNot(_.startsWith("--")).mkString("\n")).map(_.trim).filter(_ != "")

Review comment: I feel this is a little complicated, so could you describe what this code does in a comment?
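For context, the line under review splits the test file into queries on semicolons and then drops full-line `--` comments from each query, so that a comment trailing the last query does not become a spurious query of its own. A simplified Python sketch of that behaviour (function name is illustrative, and unlike the Scala regex it does not handle escaped semicolons):

```python
def split_queries(code: str):
    """Split a SQL script on semicolons, drop full-line '--' comments
    from each query, and discard queries left empty."""
    raw = [q.strip() for q in code.split(";") if q.strip()]
    queries = []
    for q in raw:
        # Remove whole-line comments inside the query.
        kept = "\n".join(line for line in q.splitlines()
                         if not line.startswith("--")).strip()
        if kept:
            queries.append(kept)
    return queries
```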
[GitHub] [spark] gatorsmile closed pull request #25358: [SPARK-28622][SQL][PYTHON] Rename PullOutPythonUDFInJoinCondition to ExtractPythonUDFFromJoinCondition and move to 'Extract Python UDFs'
gatorsmile closed pull request #25358: [SPARK-28622][SQL][PYTHON] Rename PullOutPythonUDFInJoinCondition to ExtractPythonUDFFromJoinCondition and move to 'Extract Python UDFs' URL: https://github.com/apache/spark/pull/25358
[GitHub] [spark] maropu commented on issue #25253: [SPARK-28470][SQL] Cast to decimal throws ArithmeticException on overflow
maropu commented on issue #25253: [SPARK-28470][SQL] Cast to decimal throws ArithmeticException on overflow URL: https://github.com/apache/spark/pull/25253#issuecomment-518534442 Can anyone check this for sign-off before merging?
[GitHub] [spark] AmplabJenkins commented on issue #25365: [SPARK-28537][SQL][HOTFIX][FOLLOW-UP] Add supportColumnar in DebugExec
AmplabJenkins commented on issue #25365: [SPARK-28537][SQL][HOTFIX][FOLLOW-UP] Add supportColumnar in DebugExec URL: https://github.com/apache/spark/pull/25365#issuecomment-518536143 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default
AmplabJenkins commented on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default URL: https://github.com/apache/spark/pull/25355#issuecomment-518536231 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default
AmplabJenkins removed a comment on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default URL: https://github.com/apache/spark/pull/25355#issuecomment-518536231 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25365: [SPARK-28537][SQL][HOTFIX][FOLLOW-UP] Add supportColumnar in DebugExec
AmplabJenkins removed a comment on issue #25365: [SPARK-28537][SQL][HOTFIX][FOLLOW-UP] Add supportColumnar in DebugExec URL: https://github.com/apache/spark/pull/25365#issuecomment-518536143 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #25365: [SPARK-28537][SQL][HOTFIX][FOLLOW-UP] Add supportColumnar in DebugExec
SparkQA removed a comment on issue #25365: [SPARK-28537][SQL][HOTFIX][FOLLOW-UP] Add supportColumnar in DebugExec URL: https://github.com/apache/spark/pull/25365#issuecomment-518513993 **[Test build #108698 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108698/testReport)** for PR 25365 at commit [`352a3cb`](https://github.com/apache/spark/commit/352a3cb40c851cdba5e4289095d54809438f0a7b).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25357: [SPARK-28617][SQL][TEST] Fix misplacement when comment is at the end of the query
AmplabJenkins removed a comment on issue #25357: [SPARK-28617][SQL][TEST] Fix misplacement when comment is at the end of the query URL: https://github.com/apache/spark/pull/25357#issuecomment-518536212 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default
SparkQA removed a comment on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default URL: https://github.com/apache/spark/pull/25355#issuecomment-518505387 **[Test build #108694 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108694/testReport)** for PR 25355 at commit [`a07dce5`](https://github.com/apache/spark/commit/a07dce5a71b62d21064fc585f7ef746fb2fff6cc).
[GitHub] [spark] SparkQA removed a comment on issue #25357: [SPARK-28617][SQL][TEST] Fix misplacement when comment is at the end of the query
SparkQA removed a comment on issue #25357: [SPARK-28617][SQL][TEST] Fix misplacement when comment is at the end of the query URL: https://github.com/apache/spark/pull/25357#issuecomment-518507447 **[Test build #108695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108695/testReport)** for PR 25357 at commit [`eeb7405`](https://github.com/apache/spark/commit/eeb7405ad0c7cc1004e2cad36929d20d95ab2726).
[GitHub] [spark] AmplabJenkins commented on issue #25357: [SPARK-28617][SQL][TEST] Fix misplacement when comment is at the end of the query
AmplabJenkins commented on issue #25357: [SPARK-28617][SQL][TEST] Fix misplacement when comment is at the end of the query URL: https://github.com/apache/spark/pull/25357#issuecomment-518536226 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108695/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default
AmplabJenkins commented on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default URL: https://github.com/apache/spark/pull/25355#issuecomment-518536234 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108694/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25357: [SPARK-28617][SQL][TEST] Fix misplacement when comment is at the end of the query
AmplabJenkins commented on issue #25357: [SPARK-28617][SQL][TEST] Fix misplacement when comment is at the end of the query URL: https://github.com/apache/spark/pull/25357#issuecomment-518536212 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25365: [SPARK-28537][SQL][HOTFIX][FOLLOW-UP] Add supportColumnar in DebugExec
AmplabJenkins commented on issue #25365: [SPARK-28537][SQL][HOTFIX][FOLLOW-UP] Add supportColumnar in DebugExec URL: https://github.com/apache/spark/pull/25365#issuecomment-518536154 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108698/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing
AmplabJenkins removed a comment on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing URL: https://github.com/apache/spark/pull/25328#issuecomment-518537763 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13785/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing
AmplabJenkins removed a comment on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing URL: https://github.com/apache/spark/pull/25328#issuecomment-518537760 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing
AmplabJenkins commented on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing URL: https://github.com/apache/spark/pull/25328#issuecomment-518537760 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing
AmplabJenkins commented on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing URL: https://github.com/apache/spark/pull/25328#issuecomment-518537763 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13785/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing
SparkQA commented on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing URL: https://github.com/apache/spark/pull/25328#issuecomment-518538601 **[Test build #108700 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108700/testReport)** for PR 25328 at commit [`0652c22`](https://github.com/apache/spark/commit/0652c224466be741f985b77104cfbebb2cbf1a9e).
[GitHub] [spark] beliefer commented on a change in pull request #25309: [SPARK-28577][YARN]Resource capability requested for each executor add offHeapMemorySize
beliefer commented on a change in pull request #25309: [SPARK-28577][YARN] Resource capability requested for each executor add offHeapMemorySize URL: https://github.com/apache/spark/pull/25309#discussion_r310905901

## File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala

## @@ -184,4 +184,29 @@ object YarnSparkHadoopUtil {
     ConverterUtils.toContainerId(containerIdString)
   }

+  /**
+   * If MEMORY_OFFHEAP_ENABLED is true, we should ensure that the requested
+   * executorOverheadMemory value is not less than MEMORY_OFFHEAP_SIZE; otherwise the
+   * memory resource requested for the executor may not be enough.
+   */
+  def executorMemoryOverheadRequested(sparkConf: SparkConf): Int = {
+    val executorMemory = sparkConf.get(EXECUTOR_MEMORY).toInt
+    val overhead = sparkConf.get(EXECUTOR_MEMORY_OVERHEAD).getOrElse(
+      math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toInt, MEMORY_OVERHEAD_MIN)).toInt
+    val offHeap = if (sparkConf.get(MEMORY_OFFHEAP_ENABLED)) {
+      val size =
+        sparkConf.getSizeAsMb(MEMORY_OFFHEAP_SIZE.key, MEMORY_OFFHEAP_SIZE.defaultValueString)
+      require(size > 0,
+        s"${MEMORY_OFFHEAP_SIZE.key} must be > 0 when ${MEMORY_OFFHEAP_ENABLED.key} == true")
+      if (size > overhead) {
+        logWarning(s"The value of ${MEMORY_OFFHEAP_SIZE.key}(${size}MB) will be used as " +
+          s"executorMemoryOverhead to request resource to ensure that Executor has enough memory " +
+          s"to use. It is recommended that the configuration value of " +
+          s"${EXECUTOR_MEMORY_OVERHEAD.key} should be no less than ${MEMORY_OFFHEAP_SIZE.key} " +
+          s"when ${MEMORY_OFFHEAP_ENABLED.key} is true.")
+      }
+      size
+    } else 0
+    math.max(overhead, offHeap).toInt
+  }

Review comment: I have checked the code and docs, and there are some inconsistencies. According to the docs, `memoryOverhead` should comprise `pysparkWorkerMemory`, but the code behaves differently. We need to fix this inconsistency. I think we should reduce the number of parameters that control memory, because that is simpler.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
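The calculation under review takes the larger of the configured (or derived) overhead and the off-heap size. A minimal Python sketch of that logic (not Spark code; the constant values and parameter names below are illustrative assumptions standing in for the Spark config entries):

```python
# Illustrative stand-ins for MEMORY_OVERHEAD_FACTOR and MEMORY_OVERHEAD_MIN.
MEMORY_OVERHEAD_FACTOR = 0.10
MEMORY_OVERHEAD_MIN = 384  # MB

def executor_memory_overhead_requested(executor_memory_mb,
                                       configured_overhead_mb=None,
                                       offheap_enabled=False,
                                       offheap_size_mb=0):
    """Return the memory overhead (MB) to request from the cluster manager."""
    # Use the explicit setting if present, else derive it from executor memory.
    if configured_overhead_mb is not None:
        overhead = configured_overhead_mb
    else:
        overhead = max(int(MEMORY_OVERHEAD_FACTOR * executor_memory_mb),
                       MEMORY_OVERHEAD_MIN)
    if offheap_enabled:
        if offheap_size_mb <= 0:
            raise ValueError("off-heap size must be > 0 when off-heap is enabled")
        # The larger of the two wins, mirroring math.max(overhead, offHeap).
        return max(overhead, offheap_size_mb)
    return overhead
```

For example, with a 10240 MB executor and a 2048 MB off-heap region, the derived overhead is 1024 MB, so the requested overhead becomes 2048 MB, which is the situation the warning in the patch is about.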
[GitHub] [spark] beliefer commented on a change in pull request #25309: [SPARK-28577][YARN] Resource capability requested for each executor add offHeapMemorySize
beliefer commented on a change in pull request #25309: [SPARK-28577][YARN] Resource capability requested for each executor add offHeapMemorySize URL: https://github.com/apache/spark/pull/25309#discussion_r310905901 @JoshRosen Could you take a look at this PR?
[GitHub] [spark] cloud-fan commented on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default
cloud-fan commented on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default URL: https://github.com/apache/spark/pull/25355#issuecomment-518541569 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default
AmplabJenkins removed a comment on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default URL: https://github.com/apache/spark/pull/25355#issuecomment-518543465 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default
AmplabJenkins commented on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default URL: https://github.com/apache/spark/pull/25355#issuecomment-518543471 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13786/
[GitHub] [spark] AmplabJenkins commented on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default
AmplabJenkins commented on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default URL: https://github.com/apache/spark/pull/25355#issuecomment-518543465 Merged build finished. Test PASSed.
[GitHub] [spark] HyukjinKwon commented on issue #25356: [SPARK-28616][INFRA] Improve merge-spark-pr script to warn WIP PRs and strip trailing dots
HyukjinKwon commented on issue #25356: [SPARK-28616][INFRA] Improve merge-spark-pr script to warn WIP PRs and strip trailing dots URL: https://github.com/apache/spark/pull/25356#issuecomment-518543461 Looks intermittent, though...
[GitHub] [spark] HyukjinKwon commented on issue #25356: [SPARK-28616][INFRA] Improve merge-spark-pr script to warn WIP PRs and strip trailing dots
HyukjinKwon commented on issue #25356: [SPARK-28616][INFRA] Improve merge-spark-pr script to warn WIP PRs and strip trailing dots URL: https://github.com/apache/spark/pull/25356#issuecomment-518543383 Yes, seems so - https://github.com/apache/spark/pull/25363#issuecomment-518482370
[GitHub] [spark] mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal buffer for long sum
mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal buffer for long sum URL: https://github.com/apache/spark/pull/25347#issuecomment-518550239 Yes @maropu, you're right. The reason I didn't change the output attribute was to avoid a breaking change. But since we are introducing a flag for it, it may be OK to do so. What do you think? cc @cloud-fan
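For context on why SPARK-28610 proposes a decimal buffer: a 64-bit long sum buffer silently wraps around on overflow, while an arbitrary-precision decimal buffer stays exact. A small Python illustration that simulates Java's overflowing 64-bit arithmetic (hypothetical helper names; not Spark code):

```python
from decimal import Decimal

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def wrap_int64(x):
    """Simulate Java's overflowing 64-bit long arithmetic."""
    return (x - INT64_MIN) % 2**64 + INT64_MIN

def long_sum(values):
    # Sum with a 64-bit long buffer: overflow wraps silently.
    total = 0
    for v in values:
        total = wrap_int64(total + v)
    return total

def decimal_sum(values):
    # Sum with an arbitrary-precision decimal buffer: no overflow.
    total = Decimal(0)
    for v in values:
        total += Decimal(v)
    return total

values = [INT64_MAX, 10]  # the true sum exceeds the long range
assert long_sum(values) != sum(values)     # silently wrapped
assert decimal_sum(values) == sum(values)  # exact
```

This is the trade-off the flag in the PR is meant to expose: the decimal buffer is slower but cannot return a silently wrong result.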
[GitHub] [spark] AmplabJenkins removed a comment on issue #25366: [SPARK-27918][SQL][TEST][FOLLOW-UP] Open comment about boolean test.
AmplabJenkins removed a comment on issue #25366: [SPARK-27918][SQL][TEST][FOLLOW-UP] Open comment about boolean test. URL: https://github.com/apache/spark/pull/25366#issuecomment-518558209 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA commented on issue #25366: [SPARK-27918][SQL][TEST][FOLLOW-UP] Open comment about boolean test.
SparkQA commented on issue #25366: [SPARK-27918][SQL][TEST][FOLLOW-UP] Open comment about boolean test. URL: https://github.com/apache/spark/pull/25366#issuecomment-518559393 **[Test build #108702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108702/testReport)** for PR 25366 at commit [`9654246`](https://github.com/apache/spark/commit/965424655dbe8bfdb5a9b162724b270bfedf5cbe).
[GitHub] [spark] beliefer commented on a change in pull request #25309: [SPARK-28577][YARN] Resource capability requested for each executor add offHeapMemorySize
beliefer commented on a change in pull request #25309: [SPARK-28577][YARN] Resource capability requested for each executor add offHeapMemorySize URL: https://github.com/apache/spark/pull/25309#discussion_r310905901 Review comment: Let me check!
[GitHub] [spark] viirya commented on a change in pull request #25328: [SPARK-28595][SQL] explain should not trigger partition listing
viirya commented on a change in pull request #25328: [SPARK-28595][SQL] explain should not trigger partition listing URL: https://github.com/apache/spark/pull/25328#discussion_r310907987

File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala

@@ -346,6 +348,12 @@ case class FileSourceScanExec(
       } else {
         None
       }
+    } ++ {
+      if (relation.partitionSchemaOption.isDefined) {
+        Some("numPartitions" -> SQLMetrics.createMetric(sparkContext, "number of partitions read"))

Review comment: Although it was previously `PartitionCount`, `numPartitions` looks more consistent.
[GitHub] [spark] gatorsmile commented on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing
gatorsmile commented on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing URL: https://github.com/apache/spark/pull/25328#issuecomment-518535625 retest this please
[GitHub] [spark] SparkQA commented on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing
SparkQA commented on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing URL: https://github.com/apache/spark/pull/25328#issuecomment-518535835 **[Test build #108699 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108699/testReport)** for PR 25328 at commit [`0652c22`](https://github.com/apache/spark/commit/0652c224466be741f985b77104cfbebb2cbf1a9e).
[GitHub] [spark] SparkQA commented on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default
SparkQA commented on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default URL: https://github.com/apache/spark/pull/25355#issuecomment-518536012 **[Test build #108694 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108694/testReport)** for PR 25355 at commit [`a07dce5`](https://github.com/apache/spark/commit/a07dce5a71b62d21064fc585f7ef746fb2fff6cc). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #25365: [SPARK-28537][SQL][HOTFIX][FOLLOW-UP] Add supportColumnar in DebugExec
SparkQA commented on issue #25365: [SPARK-28537][SQL][HOTFIX][FOLLOW-UP] Add supportColumnar in DebugExec URL: https://github.com/apache/spark/pull/25365#issuecomment-518536015 **[Test build #108698 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108698/testReport)** for PR 25365 at commit [`352a3cb`](https://github.com/apache/spark/commit/352a3cb40c851cdba5e4289095d54809438f0a7b). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #25357: [SPARK-28617][SQL][TEST] Fix misplacement when comment is at the end of the query
SparkQA commented on issue #25357: [SPARK-28617][SQL][TEST] Fix misplacement when comment is at the end of the query URL: https://github.com/apache/spark/pull/25357#issuecomment-518536013 **[Test build #108695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108695/testReport)** for PR 25357 at commit [`eeb7405`](https://github.com/apache/spark/commit/eeb7405ad0c7cc1004e2cad36929d20d95ab2726). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] viirya commented on issue #25356: [SPARK-28616][INFRA] Improve merge-spark-pr script to warn WIP PRs and strip trailing dots
viirya commented on issue #25356: [SPARK-28616][INFRA] Improve merge-spark-pr script to warn WIP PRs and strip trailing dots URL: https://github.com/apache/spark/pull/25356#issuecomment-518537430 > * checking CRAN incoming feasibility ...Error in readRDS(con) : This looks different from the previous CRAN error. Did it happen again?
[GitHub] [spark] AmplabJenkins removed a comment on issue #25365: [SPARK-28537][SQL][HOTFIX][FOLLOW-UP] Add supportColumnar in DebugExec
AmplabJenkins removed a comment on issue #25365: [SPARK-28537][SQL][HOTFIX][FOLLOW-UP] Add supportColumnar in DebugExec URL: https://github.com/apache/spark/pull/25365#issuecomment-518536154 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108698/
[GitHub] [spark] AmplabJenkins removed a comment on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default
AmplabJenkins removed a comment on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default URL: https://github.com/apache/spark/pull/25355#issuecomment-518536234 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108694/