[GitHub] [spark] SparkQA removed a comment on issue #25372: [SPARK-28640][SQL] Only give warning when session catalog is not defined
URL: https://github.com/apache/spark/pull/25372#issuecomment-518874024 **[Test build #108733 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108733/testReport)** for PR 25372 at commit [`8dd1feb`](https://github.com/apache/spark/commit/8dd1feba8231dc0b73e08935997f4acd8eb957d6). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #25372: [SPARK-28640][SQL] Only give warning when session catalog is not defined
URL: https://github.com/apache/spark/pull/25372#issuecomment-518875170 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #25372: [SPARK-28640][SQL] Only give warning when session catalog is not defined
URL: https://github.com/apache/spark/pull/25372#issuecomment-518912413 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108733/
[GitHub] [spark] AmplabJenkins commented on issue #25348: [RFC][SPARK-28554][SQL] Adds a v1 fallback writer implementation for v2 data source codepaths
URL: https://github.com/apache/spark/pull/25348#issuecomment-518912460 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108738/
[GitHub] [spark] AmplabJenkins commented on issue #25348: [RFC][SPARK-28554][SQL] Adds a v1 fallback writer implementation for v2 data source codepaths
URL: https://github.com/apache/spark/pull/25348#issuecomment-518912451 Merged build finished. Test FAILed.
[GitHub] [spark] WeichenXu123 commented on issue #25289: [WIP][SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
URL: https://github.com/apache/spark/pull/25289#issuecomment-518912457 @HyukjinKwon @dongjoon-hyun I also updated `github_jira_sync.py` and `run-test-jenkins.py`. Now all scripts are covered.
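The kind of changes a Python 2-to-3 migration of such dev scripts typically involves can be sketched as follows (an illustrative example, not the actual diff from this PR):

```python
# Common Python 2 idioms that break on Python 3, with portable replacements.

d = {"a": 1, "b": 2}

# Before: print "total:", len(d)           (print statement, Python 2 only)
print("total:", len(d))                    # print() works on both 2.7 and 3

# Before: for k, v in d.iteritems(): ...   (iteritems() removed in Python 3)
items = sorted(d.items())                  # .items() exists in both

# Before: name = raw_input("name: ")       (renamed to input() in Python 3)
# name = input("name: ")                   # left commented: would block on stdin

assert items == [("a", 1), ("b", 2)]
```

Running such scripts under both interpreters during the transition is what motivates keeping the replacements compatible with 2.7 as well as 3.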
[GitHub] [spark] SparkQA commented on issue #25289: [WIP][SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
URL: https://github.com/apache/spark/pull/25289#issuecomment-518913142 **[Test build #108740 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108740/testReport)** for PR 25289 at commit [`2735255`](https://github.com/apache/spark/commit/273525538128ce753b7f3b3c1b0f39838a4dea82).
[GitHub] [spark] SparkQA removed a comment on issue #25348: [RFC][SPARK-28554][SQL] Adds a v1 fallback writer implementation for v2 data source codepaths
URL: https://github.com/apache/spark/pull/25348#issuecomment-518911536 **[Test build #108738 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108738/testReport)** for PR 25348 at commit [`3e35d5c`](https://github.com/apache/spark/commit/3e35d5c84b4ba0b332981382f71f68eb05b966e2).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25372: [SPARK-28640][SQL] Only give warning when session catalog is not defined
URL: https://github.com/apache/spark/pull/25372#issuecomment-518912413 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108733/
[GitHub] [spark] AmplabJenkins removed a comment on issue #25372: [SPARK-28640][SQL] Only give warning when session catalog is not defined
URL: https://github.com/apache/spark/pull/25372#issuecomment-518912412 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25348: [RFC][SPARK-28554][SQL] Adds a v1 fallback writer implementation for v2 data source codepaths
URL: https://github.com/apache/spark/pull/25348#issuecomment-518912451 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25289: [WIP][SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
URL: https://github.com/apache/spark/pull/25289#issuecomment-518912743 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13820/
[GitHub] [spark] AmplabJenkins removed a comment on issue #25348: [RFC][SPARK-28554][SQL] Adds a v1 fallback writer implementation for v2 data source codepaths
URL: https://github.com/apache/spark/pull/25348#issuecomment-518912460 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108738/
[GitHub] [spark] AmplabJenkins removed a comment on issue #25289: [WIP][SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
URL: https://github.com/apache/spark/pull/25289#issuecomment-518912736 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25289: [WIP][SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
URL: https://github.com/apache/spark/pull/25289#issuecomment-518912743 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13820/
[GitHub] [spark] AmplabJenkins commented on issue #25289: [WIP][SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
URL: https://github.com/apache/spark/pull/25289#issuecomment-518912736 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25348: [RFC][SPARK-28554][SQL] Adds a v1 fallback writer implementation for v2 data source codepaths
URL: https://github.com/apache/spark/pull/25348#issuecomment-518912442 **[Test build #108738 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108738/testReport)** for PR 25348 at commit [`3e35d5c`](https://github.com/apache/spark/commit/3e35d5c84b4ba0b332981382f71f68eb05b966e2).

* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait SupportsV1Write extends SparkPlan`
[GitHub] [spark] AmplabJenkins commented on issue #25372: [SPARK-28640][SQL] Only give warning when session catalog is not defined
URL: https://github.com/apache/spark/pull/25372#issuecomment-518912412 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25289: [WIP][SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
URL: https://github.com/apache/spark/pull/25289#issuecomment-518911554 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108739/
[GitHub] [spark] WeichenXu123 commented on a change in pull request #25289: [WIP][SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
URL: https://github.com/apache/spark/pull/25289#discussion_r311345989

File path: python/run-tests.py

```diff
@@ -161,8 +161,13 @@ def run_individual_python_test(target_dir, test_name, pyspark_python):

 def get_default_python_executables():
     python_execs = [x for x in ["python2.7", "python3.6", "pypy"] if which(x)]
-    if "python2.7" not in python_execs:
-        LOGGER.warning("Not testing against `python2.7` because it could not be found; falling"
+    if ("python3.6" not in python_execs) and which("python3"):
+        LOGGER.warning("Not testing against `python3.6` because it could not be found; falling"
+                       " back to `python3` instead")
+        python_execs.insert(0, "python3")
```

Review comment: I reverted the change here.
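The fallback under review can be illustrated as a standalone sketch (hypothetical: `which` here is `shutil.which` and the logging call is dropped, unlike the real `python/run-tests.py` helper):

```python
from shutil import which


def get_default_python_executables():
    """Return the Python executables to test against, preferring python3.6
    but falling back to a generic `python3` when 3.6 is not on the PATH."""
    python_execs = [x for x in ["python2.7", "python3.6", "pypy"] if which(x)]
    if "python3.6" not in python_execs and which("python3"):
        # python3.6 was not found but a generic python3 exists: put it
        # first so it becomes the primary interpreter under test.
        python_execs.insert(0, "python3")
    return python_execs


print(get_default_python_executables())
```

The result depends on which interpreters are installed on the machine running the script, which is exactly why the review discusses whether the fallback belongs here at all.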
[GitHub] [spark] AmplabJenkins removed a comment on issue #25289: [WIP][SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
URL: https://github.com/apache/spark/pull/25289#issuecomment-518914175 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25289: [WIP][SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
URL: https://github.com/apache/spark/pull/25289#issuecomment-518914177 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13821/
[GitHub] [spark] beliefer commented on issue #25172: [SPARK-28412][SQL] ANSI SQL: OVERLAY function support byte array
URL: https://github.com/apache/spark/pull/25172#issuecomment-518913939 @maropu Could you continue to take a look?
[GitHub] [spark] AmplabJenkins commented on issue #25289: [WIP][SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
URL: https://github.com/apache/spark/pull/25289#issuecomment-518914175 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25289: [WIP][SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
URL: https://github.com/apache/spark/pull/25289#issuecomment-518914177 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13821/
[GitHub] [spark] gatorsmile commented on issue #25279: [SPARK-28519][SQL] Use StrictMath log, pow functions for platform independence
URL: https://github.com/apache/spark/pull/25279#issuecomment-518915141 @srowen Could you show the perf benchmark? The performance regression is expected, right?
[GitHub] [spark] dongjoon-hyun commented on issue #25229: [SPARK-27900][K8s] Add jvm oom flag
URL: https://github.com/apache/spark/pull/25229#issuecomment-518916061 @skonto . My name is `Dongjoon Hyun` with GitHub id `@dongjoon-hyun`. :) I knew the history of #24792 and Spark uses `YarnSparkHadoopUtil.addOutOfMemoryErrorArgument`. I agree with the necessity of this since that PR, and tried to review/merge this PR. However, the current implementation seems a little too complicated and not robust, especially the part that copies `/opt/spark/conf/spark.properties` to `/tmp/spark.properties` and replaces it. In general, I believe the best UX is to keep it simple by reusing the existing general mechanism. `OnOutOfMemoryError` is a well-known option for JVM users and `spark.driver.extraJavaOptions` is for that kind of option. As of now, I prefer a new documentation, but other committers may have different opinions. Let me ping them to get their advice. Hi, @srowen , @squito , @mccheah , @zsxwing , @tgravescs . This is the continuation of #24796 . Could you review this PR's implementation to make progress and finalize the issue?
[GitHub] [spark] dongjoon-hyun edited a comment on issue #25229: [SPARK-27900][K8s] Add jvm oom flag
URL: https://github.com/apache/spark/pull/25229#issuecomment-518916061 @skonto . My name is `Dongjoon Hyun` with GitHub id `@dongjoon-hyun`. :) I knew the history of #24796 and Spark uses `YarnSparkHadoopUtil.addOutOfMemoryErrorArgument`. I agree with the necessity of this since that PR, and tried to review/merge this PR. However, the current implementation seems a little too complicated and not robust, especially the part that copies `/opt/spark/conf/spark.properties` to `/tmp/spark.properties` and replaces it. In general, I believe the best UX is to keep it simple by reusing the existing general mechanism. `OnOutOfMemoryError` is a well-known option for JVM users and `spark.driver.extraJavaOptions` is for that kind of option. As of now, I prefer a new documentation, but other committers may have different opinions. Let me ping them to get their advice. Hi, @srowen , @squito , @mccheah , @zsxwing , @tgravescs . This is the continuation of #24796 . Could you review this PR's implementation to make progress and finalize the issue?
[GitHub] [spark] beliefer commented on issue #25366: [SPARK-27924][SQL][TEST][FOLLOW-UP] Open comment about boolean test
URL: https://github.com/apache/spark/pull/25366#issuecomment-518916476

> I made a PR to you, @beliefer . Please review and merge.
>
> * [beliefer#2](https://github.com/beliefer/spark/pull/2)

Thanks for your help! It's my mistake. I forgot something.
[GitHub] [spark] beliefer commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-518917065 @dongjoon-hyun Could you check this PR?
[GitHub] [spark] SparkQA commented on issue #25366: [SPARK-27924][SQL][TEST][FOLLOW-UP] Open comment about boolean test
URL: https://github.com/apache/spark/pull/25366#issuecomment-518917430 **[Test build #108741 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108741/testReport)** for PR 25366 at commit [`30ed8d3`](https://github.com/apache/spark/commit/30ed8d3f61e5455d17efe77e6e5985c7e72d0109).
[GitHub] [spark] srowen commented on issue #25279: [SPARK-28519][SQL] Use StrictMath log, pow functions for platform independence
URL: https://github.com/apache/spark/pull/25279#issuecomment-518917299 I did not benchmark this as I think it's a correctness issue that would be worth a perf hit. I also expect it makes almost no difference - computing a function in SQL is dominated by so much more than the math here. Let me assess that though with some microbenchmarks.
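The shape of the microbenchmark being proposed can be sketched like this (illustrative Python using `timeit`; the real comparison would pit Java's `Math.log` against `StrictMath.log` inside Spark's expression evaluation, and the function names below are made up for the sketch):

```python
import math
import timeit

N = 50_000  # iterations per measurement


def expr_with_log(x):
    # stand-in for evaluating a SQL expression whose cost includes a log() call
    return math.log(x) + x * 2.0 - 1.0


def expr_without_log(x):
    # the same expression with the math call removed, to isolate its cost
    return x + x * 2.0 - 1.0


t_with = timeit.timeit(lambda: expr_with_log(3.14), number=N)
t_without = timeit.timeit(lambda: expr_without_log(3.14), number=N)
print(f"with log: {t_with:.4f}s  without log: {t_without:.4f}s")
```

Comparing the two timings shows how much of the per-row cost the math call actually accounts for, which is the question at issue before accepting any `StrictMath` slowdown.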
[GitHub] [spark] AmplabJenkins removed a comment on issue #25366: [SPARK-27924][SQL][TEST][FOLLOW-UP] Open comment about boolean test
URL: https://github.com/apache/spark/pull/25366#issuecomment-518918462 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13822/
[GitHub] [spark] AmplabJenkins commented on issue #25366: [SPARK-27924][SQL][TEST][FOLLOW-UP] Open comment about boolean test
URL: https://github.com/apache/spark/pull/25366#issuecomment-518918460 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25366: [SPARK-27924][SQL][TEST][FOLLOW-UP] Open comment about boolean test
AmplabJenkins removed a comment on issue #25366: [SPARK-27924][SQL][TEST][FOLLOW-UP] Open comment about boolean test URL: https://github.com/apache/spark/pull/25366#issuecomment-518918460 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25366: [SPARK-27924][SQL][TEST][FOLLOW-UP] Open comment about boolean test
AmplabJenkins commented on issue #25366: [SPARK-27924][SQL][TEST][FOLLOW-UP] Open comment about boolean test URL: https://github.com/apache/spark/pull/25366#issuecomment-518918462 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13822/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25289: [SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
SparkQA commented on issue #25289: [SPARK-27889][INFRA] Make development scripts under dev/ support Python 3 URL: https://github.com/apache/spark/pull/25289#issuecomment-518918889 **[Test build #108742 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108742/testReport)** for PR 25289 at commit [`34c3af7`](https://github.com/apache/spark/commit/34c3af7e22e1c9fb6fad27bb03b45a1e95f6d828).
[GitHub] [spark] RussellSpitzer commented on a change in pull request #25348: [RFC][SPARK-28554][SQL] Adds a v1 fallback writer implementation for v2 data source codepaths
RussellSpitzer commented on a change in pull request #25348: [RFC][SPARK-28554][SQL] Adds a v1 fallback writer implementation for v2 data source codepaths URL: https://github.com/apache/spark/pull/25348#discussion_r311351159 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala ## @@ -501,3 +528,19 @@ private[v2] case class DataWritingSparkTaskResult( * Sink progress information collected after commit. */ private[sql] case class StreamWriterCommitProgress(numOutputRows: Long) + +/** + * A trait that allows Tables that use V1 Writer interfaces to write data. + */ +sealed trait SupportsV1Write extends V2TableWriteExec { + def plan: LogicalPlan + + protected def writeWithV1( + relation: CreatableRelationProvider, + mode: SaveMode, + options: CaseInsensitiveStringMap): RDD[InternalRow] = { +relation.createRelation( + sqlContext, mode, options.asScala.toMap, Dataset.ofRows(sqlContext.sparkSession, plan)) +sparkContext.emptyRDD Review comment: ``` val writtenRows = writer match { case v1: V1WriteBuilder => writeWithV1(v1.buildForV1Write(), writeOptions) case v2 => doWrite(v2.buildForBatch()) } ``` If this is always empty, why do we save it as writtenRows here? Is this just to hold a reference to the empty result set?
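To make the question above concrete, here is a minimal Python sketch (with hypothetical names mirroring the quoted Scala, not Spark's actual API) of a v1-fallback write dispatch: both branches must produce the same result type, so the v1 path, which writes purely as a side effect, returns an empty result only to satisfy the common signature.

```python
# Hypothetical sketch of the v1/v2 write dispatch being discussed.
# The names mirror the quoted Scala snippet; they are not Spark's API.

class V1WriteBuilder:
    """A 'v1' writer: writes as a side effect, reports nothing back."""
    def build_for_v1_write(self):
        return lambda rows: None  # side-effecting write

class V2WriteBuilder:
    """A 'v2' writer: returns the rows it wrote."""
    def build_for_batch(self):
        return lambda rows: list(rows)

def write(writer, rows):
    # Both arms must return the same type, so the v1 arm returns an
    # empty result ("writtenRows" is always empty on that path).
    if isinstance(writer, V1WriteBuilder):
        writer.build_for_v1_write()(rows)
        return []
    return writer.build_for_batch()(rows)
```

This illustrates why the v1 branch's result is always empty: the value exists only so the two match arms type-check against a common return type.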
[GitHub] [spark] AmplabJenkins removed a comment on issue #25289: [SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
AmplabJenkins removed a comment on issue #25289: [SPARK-27889][INFRA] Make development scripts under dev/ support Python 3 URL: https://github.com/apache/spark/pull/25289#issuecomment-518920002 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13823/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25289: [SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
AmplabJenkins removed a comment on issue #25289: [SPARK-27889][INFRA] Make development scripts under dev/ support Python 3 URL: https://github.com/apache/spark/pull/25289#issuecomment-518919997 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25289: [SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
AmplabJenkins commented on issue #25289: [SPARK-27889][INFRA] Make development scripts under dev/ support Python 3 URL: https://github.com/apache/spark/pull/25289#issuecomment-518919997 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25289: [SPARK-27889][INFRA] Make development scripts under dev/ support Python 3
AmplabJenkins commented on issue #25289: [SPARK-27889][INFRA] Make development scripts under dev/ support Python 3 URL: https://github.com/apache/spark/pull/25289#issuecomment-518920002 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13823/ Test PASSed.
[GitHub] [spark] advancedxy commented on a change in pull request #25306: [SPARK-28573][SQL] Convert InsertIntoTable(HiveTableRelation) to DataSource inserting for partitioned table
advancedxy commented on a change in pull request #25306: [SPARK-28573][SQL] Convert InsertIntoTable(HiveTableRelation) to DataSource inserting for partitioned table URL: https://github.com/apache/spark/pull/25306#discussion_r311351669 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveCommandSuite.scala ## @@ -58,7 +58,11 @@ class HiveCommandSuite extends QueryTest with SQLTestUtils with TestHiveSingleto |TBLPROPERTIES('prop1Key'="prop1Val", '`prop2Key`'="prop2Val") """.stripMargin) sql("CREATE TABLE parquet_tab3(col1 int, `col 2` int)") -sql("CREATE TABLE parquet_tab4 (price int, qty int) partitioned by (year int, month int)") +sql( + """ +|CREATE TABLE parquet_tab4 (price int, qty int) partitioned by (year int, month int) +|STORED AS PARQUET Review comment: It's modified because I randomly chose it to make sure the insert into a partitioned table can be safely converted. However, it should already be covered by other test cases, so it's a neutral change. I can revert it if you think it's unnecessary.
[GitHub] [spark] xianyinxin commented on a change in pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder
xianyinxin commented on a change in pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder URL: https://github.com/apache/spark/pull/24983#discussion_r311352726 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ## @@ -470,3 +397,427 @@ object JoinReorderDPFilters extends PredicateHelper { * extended with the set of connected/unconnected plans. */ case class JoinGraphInfo (starJoins: Set[Int], nonStarJoins: Set[Int]) + +/** + * Reorder the joins using a genetic algorithm. The algorithm treat the reorder problem + * to a traveling salesmen problem, and use genetic algorithm give an optimized solution. + * + * The implementation refs the geqo in postgresql, which is contibuted by Darrell Whitley: + * https://www.postgresql.org/docs/9.1/geqo-pg-intro.html + * + * For more info about genetic algorithm and the edge recombination crossover, pls see: + * "A Genetic Algorithm Tutorial, Darrell Whitley" + * https://link.springer.com/article/10.1007/BF00175354 + * and "Scheduling Problems and Traveling Salesmen: The Genetic Edge Recombination Operator, + * Darrell Whitley et al." https://dl.acm.org/citation.cfm?id=657238 + * respectively. + */ +object JoinReorderGA extends PredicateHelper with Logging { + + def search( + conf: SQLConf, + items: Seq[LogicalPlan], + conditions: Set[Expression], + output: Seq[Attribute]): Option[LogicalPlan] = { + +val startTime = System.nanoTime() + +val itemsWithIndex = items.zipWithIndex.map { + case (plan, id) => id -> JoinPlan(Set(id), plan, Set.empty, Cost(0, 0)) +}.toMap + +val topOutputSet = AttributeSet(output) + +val pop = Population(conf, itemsWithIndex, conditions, topOutputSet).evolve + +val durationInMs = (System.nanoTime() - startTime) / (1000 * 1000) +logInfo(s"Join reordering finished. Duration: $durationInMs ms, number of items: " + +s"${items.length}, number of plans in memo: ${ pop.chromos.size}") + +assert(pop.chromos.head.basicPlans.size == items.length) +pop.chromos.head.integratedPlan match { + case Some(joinPlan) => joinPlan.plan match { +case p @ Project(projectList, _: Join) if projectList != output => + assert(topOutputSet == p.outputSet) + // Keep the same order of final output attributes. + Some(p.copy(projectList = output)) +case finalPlan if !sameOutput(finalPlan, output) => + Some(Project(output, finalPlan)) +case finalPlan => + Some(finalPlan) + } + case _ => None +} + } +} + +/** + * A pair of parent individuals can breed a child with certain crossover process. + * With crossover, child can inherit gene from its parents, and these gene snippets + * finally compose a new [[Chromosome]]. + */ +@DeveloperApi +trait Crossover { + + /** + * Generate a new [[Chromosome]] from the given parent [[Chromosome]]s, + * with this crossover algorithm. + */ + def newChromo(father: Chromosome, mother: Chromosome) : Chromosome +} + +case class EdgeTable(table: Map[JoinPlan, Seq[JoinPlan]]) + +/** + * This class implements the Genetic Edge Recombination algorithm. + * For more information about the Genetic Edge Recombination, + * see "Scheduling Problems and Traveling Salesmen: The Genetic Edge + * Recombination Operator" by Darrell Whitley et al. + * https://dl.acm.org/citation.cfm?id=657238 + */ +object EdgeRecombination extends Crossover { + + def genEdgeTable(father: Chromosome, mother: Chromosome) : EdgeTable = { +val fatherTable = father.basicPlans.map(g => g -> findNeighbours(father.basicPlans, g)).toMap +val motherTable = mother.basicPlans.map(g => g -> findNeighbours(mother.basicPlans, g)).toMap +EdgeTable( + fatherTable.map(entry => entry._1 -> (entry._2 ++ motherTable(entry._1 + } + + def findNeighbours(genes: Seq[JoinPlan], g: JoinPlan) : Seq[JoinPlan] = { +val genesIndexed = genes.toIndexedSeq +val index = genesIndexed.indexOf(g) +val length = genes.size +if (index > 0 && index < length - 1) { + Seq(genesIndexed(index - 1), genesIndexed(index + 1)) +} else if (index == 0) { + Seq(genesIndexed(1), genesIndexed(length - 1)) +} else if (index == length - 1) { + Seq(genesIndexed(0), genesIndexed(length - 2)) +} else { + Seq() +} + } + + override def newChromo(father: Chromosome, mother: Chromosome): Chromosome = { +var newGenes: Seq[JoinPlan] = Seq() +// 1. Generate the edge table. +var table = genEdgeTable(father, mother).table +// 2. Choose a start point randomly from the heads of father/mother. +var current = + if (util.Random.nextInt(2) == 0) father.basicPlans.head else mother.basicPlans.head +newGenes :+= current + +var stop = false +while (!stop) { + // 3. Filter out the chosen point from the edge table. + table = table.map( + …
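The `findNeighbours`/`genEdgeTable` logic quoted above can be summarized compactly: each gene's neighbours are the elements adjacent to it in a parent tour (treating the tour as a cycle), and the edge table is the union of those neighbour lists over both parents. Here is a minimal Python sketch of that idea; it mirrors the quoted Scala but uses plain lists and modular indexing instead of the explicit three-way branch, and is an illustration, not Spark's implementation.

```python
# Sketch of edge-recombination bookkeeping: neighbours in a circular tour,
# and the merged "edge table" over two parent tours.

def find_neighbours(genes, g):
    """Neighbours of g in the tour `genes`, treated as a cycle."""
    if len(genes) < 2 or g not in genes:
        return []
    i = genes.index(g)
    n = len(genes)
    # Modular indexing covers all three cases of the quoted Scala
    # (interior element, first element, last element) at once.
    return [genes[(i - 1) % n], genes[(i + 1) % n]]

def gen_edge_table(father, mother):
    """Union (with duplicates) of each gene's neighbours in both parents."""
    return {g: find_neighbours(father, g) + find_neighbours(mother, g)
            for g in father}
```

For example, with parent tours `["A", "B", "C", "D"]` and `["B", "D", "A", "C"]`, the entry for `"A"` collects `D, B` from the first tour and `D, C` from the second, so its distinct neighbours are `{B, C, D}`.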
[GitHub] [spark] xianyinxin commented on a change in pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder
xianyinxin commented on a change in pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder URL: https://github.com/apache/spark/pull/24983#discussion_r311352702 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ## (quotes the same `JoinReorderGA` diff hunk as the comment above, ending at `object EdgeRecombination extends Crossover {`) Review comment: Done. Added a simple description and an example of the algorithm.
[GitHub] [spark] xianyinxin commented on a change in pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder
xianyinxin commented on a change in pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder URL: https://github.com/apache/spark/pull/24983#discussion_r311352767 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ## (quotes the same `JoinReorderGA`/`EdgeRecombination` diff hunk as the first comment above) Review comment: …
[GitHub] [spark] xianyinxin commented on a change in pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder
xianyinxin commented on a change in pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder URL: https://github.com/apache/spark/pull/24983#discussion_r311352547 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ## (quotes the same `JoinReorderGA` diff hunk as the comment above, ending at `case class EdgeTable(table: Map[JoinPlan, Seq[JoinPlan]])`) Review comment: Removed `EdgeRecombination` since there's only one explicit use of this class.
[GitHub] [spark] advancedxy commented on issue #25002: [SPARK-28203][Core][Python] PythonRDD should respect SparkContext's hadoop configuration
advancedxy commented on issue #25002: [SPARK-28203][Core][Python] PythonRDD should respect SparkContext's hadoop configuration URL: https://github.com/apache/spark/pull/25002#issuecomment-518921394 Gently ping @cloud-fan, @HyukjinKwon and @dongjoon-hyun.
[GitHub] [spark] xianyinxin commented on a change in pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder
xianyinxin commented on a change in pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder URL: https://github.com/apache/spark/pull/24983#discussion_r311353000

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala

## @@ -470,3 +397,427 @@ object JoinReorderDPFilters extends PredicateHelper {
  * extended with the set of connected/unconnected plans.
  */
 case class JoinGraphInfo (starJoins: Set[Int], nonStarJoins: Set[Int])
+
+/**
+ * Reorders the joins using a genetic algorithm. The algorithm treats the reordering problem
+ * as a traveling salesman problem and uses a genetic algorithm to find an optimized solution.
+ *
+ * The implementation follows GEQO in PostgreSQL, which was contributed by Darrell Whitley:
+ * https://www.postgresql.org/docs/9.1/geqo-pg-intro.html
+ *
+ * For more about genetic algorithms and the edge recombination crossover, see
+ * "A Genetic Algorithm Tutorial, Darrell Whitley"
+ * https://link.springer.com/article/10.1007/BF00175354
+ * and "Scheduling Problems and Traveling Salesmen: The Genetic Edge Recombination Operator,
+ * Darrell Whitley et al." https://dl.acm.org/citation.cfm?id=657238
+ * respectively.
+ */
+object JoinReorderGA extends PredicateHelper with Logging {
+
+  def search(
+      conf: SQLConf,
+      items: Seq[LogicalPlan],
+      conditions: Set[Expression],
+      output: Seq[Attribute]): Option[LogicalPlan] = {
+
+    val startTime = System.nanoTime()
+
+    val itemsWithIndex = items.zipWithIndex.map {
+      case (plan, id) => id -> JoinPlan(Set(id), plan, Set.empty, Cost(0, 0))
+    }.toMap
+
+    val topOutputSet = AttributeSet(output)
+
+    val pop = Population(conf, itemsWithIndex, conditions, topOutputSet).evolve
+
+    val durationInMs = (System.nanoTime() - startTime) / (1000 * 1000)
+    logInfo(s"Join reordering finished. Duration: $durationInMs ms, number of items: " +
+      s"${items.length}, number of plans in memo: ${pop.chromos.size}")
+
+    assert(pop.chromos.head.basicPlans.size == items.length)
+    pop.chromos.head.integratedPlan match {
+      case Some(joinPlan) => joinPlan.plan match {
+        case p @ Project(projectList, _: Join) if projectList != output =>
+          assert(topOutputSet == p.outputSet)
+          // Keep the same order of final output attributes.
+          Some(p.copy(projectList = output))
+        case finalPlan if !sameOutput(finalPlan, output) =>
+          Some(Project(output, finalPlan))
+        case finalPlan =>
+          Some(finalPlan)
+      }
+      case _ => None
+    }
+  }
+}
+
+/**
+ * A pair of parent individuals can breed a child through a certain crossover process.
+ * With crossover, the child inherits genes from its parents, and these gene snippets
+ * finally compose a new [[Chromosome]].
+ */
+@DeveloperApi
+trait Crossover {
+
+  /**
+   * Generates a new [[Chromosome]] from the given parent [[Chromosome]]s
+   * with this crossover algorithm.
+   */
+  def newChromo(father: Chromosome, mother: Chromosome): Chromosome
+}
+
+case class EdgeTable(table: Map[JoinPlan, Seq[JoinPlan]])
+
+/**
+ * This object implements the Genetic Edge Recombination algorithm.
+ * For more information about Genetic Edge Recombination,
+ * see "Scheduling Problems and Traveling Salesmen: The Genetic Edge
+ * Recombination Operator" by Darrell Whitley et al.
+ * https://dl.acm.org/citation.cfm?id=657238
+ */
+object EdgeRecombination extends Crossover {
+
+  def genEdgeTable(father: Chromosome, mother: Chromosome): EdgeTable = {
+    val fatherTable = father.basicPlans.map(g => g -> findNeighbours(father.basicPlans, g)).toMap
+    val motherTable = mother.basicPlans.map(g => g -> findNeighbours(mother.basicPlans, g)).toMap
+    EdgeTable(
+      fatherTable.map(entry => entry._1 -> (entry._2 ++ motherTable(entry._1))))
+  }
+
+  def findNeighbours(genes: Seq[JoinPlan], g: JoinPlan): Seq[JoinPlan] = {
+    val genesIndexed = genes.toIndexedSeq
+    val index = genesIndexed.indexOf(g)
+    val length = genes.size
+    if (index > 0 && index < length - 1) {
+      Seq(genesIndexed(index - 1), genesIndexed(index + 1))
+    } else if (index == 0) {
+      Seq(genesIndexed(1), genesIndexed(length - 1))
+    } else if (index == length - 1) {
+      Seq(genesIndexed(0), genesIndexed(length - 2))
+    } else {
+      Seq()
+    }
+  }
+
+  override def newChromo(father: Chromosome, mother: Chromosome): Chromosome = {
+    var newGenes: Seq[JoinPlan] = Seq()
+    // 1. Generate the edge table.
+    var table = genEdgeTable(father, mother).table
+    // 2. Choose a start point randomly from the heads of father/mother.
+    var current =
+      if (util.Random.nextInt(2) == 0) father.basicPlans.head else mother.basicPlans.head
+    newGenes :+= current
+
+    var stop = false
+    while (!stop) {
+      // 3. Filter out the chosen point from the edge table.
+      table = table.map(
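The edge recombination operator quoted above builds, for each join item, the list of its neighbours in both parent orderings, treating each parent's join order as a circular tour. A rough Python sketch of just that neighbour lookup and edge-table step (function names are illustrative; the PR's actual implementation is the Scala above, operating on `JoinPlan` objects):

```python
def find_neighbours(genes, g):
    """Neighbours of g when the sequence is read as a circular tour:
    the previous and next elements, wrapping around at both ends."""
    n = len(genes)
    if n < 2:
        return []
    i = genes.index(g)
    return [genes[(i - 1) % n], genes[(i + 1) % n]]

def gen_edge_table(father, mother):
    """For each gene, concatenate its neighbours from both parent tours."""
    return {g: find_neighbours(father, g) + find_neighbours(mother, g)
            for g in father}
```

The crossover then repeatedly picks the current gene's neighbour with the fewest remaining edges, which is what the (truncated) `newChromo` loop in the diff implements.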
[GitHub] [spark] AmplabJenkins commented on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder
AmplabJenkins commented on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder URL: https://github.com/apache/spark/pull/24983#issuecomment-518921599 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder
AmplabJenkins commented on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder URL: https://github.com/apache/spark/pull/24983#issuecomment-518921601 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13824/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API
SparkQA commented on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API URL: https://github.com/apache/spark/pull/24232#issuecomment-518922387 **[Test build #108734 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108734/testReport)** for PR 24232 at commit [`a8c7ecd`](https://github.com/apache/spark/commit/a8c7ecd27b8d0fcabfd86571eeba801bb5c7e62a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder
SparkQA commented on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder URL: https://github.com/apache/spark/pull/24983#issuecomment-518921923 **[Test build #108743 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108743/testReport)** for PR 24983 at commit [`75b5037`](https://github.com/apache/spark/commit/75b50373fa3120d9b1726155756909315e2b8b58).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder
AmplabJenkins removed a comment on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder URL: https://github.com/apache/spark/pull/24983#issuecomment-518921601 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13824/ Test PASSed.
[GitHub] [spark] xianyinxin commented on a change in pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder
xianyinxin commented on a change in pull request #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder URL: https://github.com/apache/spark/pull/24983#discussion_r311353280 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
[GitHub] [spark] AmplabJenkins removed a comment on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder
AmplabJenkins removed a comment on issue #24983: [SPARK-27714][SQL][CBO] Support Genetic Algorithm based join reorder URL: https://github.com/apache/spark/pull/24983#issuecomment-518921599 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API
SparkQA removed a comment on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API URL: https://github.com/apache/spark/pull/24232#issuecomment-518885405 **[Test build #108734 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108734/testReport)** for PR 24232 at commit [`a8c7ecd`](https://github.com/apache/spark/commit/a8c7ecd27b8d0fcabfd86571eeba801bb5c7e62a).
[GitHub] [spark] maropu commented on a change in pull request #25357: [SPARK-28617][SQL][TEST] Fix misplacement when comment is at the end of the query
maropu commented on a change in pull request #25357: [SPARK-28617][SQL][TEST] Fix misplacement when comment is at the end of the query URL: https://github.com/apache/spark/pull/25357#discussion_r310901893

## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala

## @@ -229,6 +229,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {
   // List of SQL queries to run
   // note: this is not a robust way to split queries using semicolon, but works for now.
   val queries = code.mkString("\n").split("(?<=[^\\\\]);").map(_.trim).filter(_ != "").toSeq
+    .map(_.split("\n").filterNot(_.startsWith("--")).mkString("\n")).map(_.trim).filter(_ != "")

Review comment: I feel this is a little complicated, so could you describe what this code does in a comment?
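For context, the line under review splits the test file into queries on semicolons and then drops full-line `--` comments from each query, so that a comment trailing the last query does not become a spurious query of its own. A simplified Python sketch of that behaviour (function name is illustrative, and unlike the Scala regex it does not handle escaped semicolons):

```python
def split_queries(code: str):
    """Split a SQL script on semicolons, drop full-line '--' comments
    from each query, and discard queries left empty."""
    raw = [q.strip() for q in code.split(";") if q.strip()]
    queries = []
    for q in raw:
        # Remove whole-line comments inside the query.
        kept = "\n".join(line for line in q.splitlines()
                         if not line.startswith("--")).strip()
        if kept:
            queries.append(kept)
    return queries
```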
[GitHub] [spark] gatorsmile closed pull request #25358: [SPARK-28622][SQL][PYTHON] Rename PullOutPythonUDFInJoinCondition to ExtractPythonUDFFromJoinCondition and move to 'Extract Python UDFs'
gatorsmile closed pull request #25358: [SPARK-28622][SQL][PYTHON] Rename PullOutPythonUDFInJoinCondition to ExtractPythonUDFFromJoinCondition and move to 'Extract Python UDFs' URL: https://github.com/apache/spark/pull/25358
[GitHub] [spark] maropu commented on issue #25253: [SPARK-28470][SQL] Cast to decimal throws ArithmeticException on overflow
maropu commented on issue #25253: [SPARK-28470][SQL] Cast to decimal throws ArithmeticException on overflow URL: https://github.com/apache/spark/pull/25253#issuecomment-518534442 Can anyone check this for sign-off before merging?
[GitHub] [spark] AmplabJenkins commented on issue #25365: [SPARK-28537][SQL][HOTFIX][FOLLOW-UP] Add supportColumnar in DebugExec
AmplabJenkins commented on issue #25365: [SPARK-28537][SQL][HOTFIX][FOLLOW-UP] Add supportColumnar in DebugExec URL: https://github.com/apache/spark/pull/25365#issuecomment-518536143 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default
AmplabJenkins commented on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default URL: https://github.com/apache/spark/pull/25355#issuecomment-518536231 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default
AmplabJenkins removed a comment on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default URL: https://github.com/apache/spark/pull/25355#issuecomment-518536231 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25365: [SPARK-28537][SQL][HOTFIX][FOLLOW-UP] Add supportColumnar in DebugExec
AmplabJenkins removed a comment on issue #25365: [SPARK-28537][SQL][HOTFIX][FOLLOW-UP] Add supportColumnar in DebugExec URL: https://github.com/apache/spark/pull/25365#issuecomment-518536143 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #25365: [SPARK-28537][SQL][HOTFIX][FOLLOW-UP] Add supportColumnar in DebugExec
SparkQA removed a comment on issue #25365: [SPARK-28537][SQL][HOTFIX][FOLLOW-UP] Add supportColumnar in DebugExec URL: https://github.com/apache/spark/pull/25365#issuecomment-518513993 **[Test build #108698 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108698/testReport)** for PR 25365 at commit [`352a3cb`](https://github.com/apache/spark/commit/352a3cb40c851cdba5e4289095d54809438f0a7b).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25357: [SPARK-28617][SQL][TEST] Fix misplacement when comment is at the end of the query
AmplabJenkins removed a comment on issue #25357: [SPARK-28617][SQL][TEST] Fix misplacement when comment is at the end of the query URL: https://github.com/apache/spark/pull/25357#issuecomment-518536212 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default
SparkQA removed a comment on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default URL: https://github.com/apache/spark/pull/25355#issuecomment-518505387 **[Test build #108694 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108694/testReport)** for PR 25355 at commit [`a07dce5`](https://github.com/apache/spark/commit/a07dce5a71b62d21064fc585f7ef746fb2fff6cc).
[GitHub] [spark] SparkQA removed a comment on issue #25357: [SPARK-28617][SQL][TEST] Fix misplacement when comment is at the end of the query
SparkQA removed a comment on issue #25357: [SPARK-28617][SQL][TEST] Fix misplacement when comment is at the end of the query URL: https://github.com/apache/spark/pull/25357#issuecomment-518507447 **[Test build #108695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108695/testReport)** for PR 25357 at commit [`eeb7405`](https://github.com/apache/spark/commit/eeb7405ad0c7cc1004e2cad36929d20d95ab2726).
[GitHub] [spark] AmplabJenkins commented on issue #25357: [SPARK-28617][SQL][TEST] Fix misplacement when comment is at the end of the query
AmplabJenkins commented on issue #25357: [SPARK-28617][SQL][TEST] Fix misplacement when comment is at the end of the query URL: https://github.com/apache/spark/pull/25357#issuecomment-518536226 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108695/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default
AmplabJenkins commented on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default URL: https://github.com/apache/spark/pull/25355#issuecomment-518536234 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108694/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25357: [SPARK-28617][SQL][TEST] Fix misplacement when comment is at the end of the query
AmplabJenkins commented on issue #25357: [SPARK-28617][SQL][TEST] Fix misplacement when comment is at the end of the query URL: https://github.com/apache/spark/pull/25357#issuecomment-518536212 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25365: [SPARK-28537][SQL][HOTFIX][FOLLOW-UP] Add supportColumnar in DebugExec
AmplabJenkins commented on issue #25365: [SPARK-28537][SQL][HOTFIX][FOLLOW-UP] Add supportColumnar in DebugExec URL: https://github.com/apache/spark/pull/25365#issuecomment-518536154 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108698/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing
AmplabJenkins removed a comment on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing URL: https://github.com/apache/spark/pull/25328#issuecomment-518537763 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13785/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing
AmplabJenkins removed a comment on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing URL: https://github.com/apache/spark/pull/25328#issuecomment-518537760 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing
AmplabJenkins commented on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing URL: https://github.com/apache/spark/pull/25328#issuecomment-518537760 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing
AmplabJenkins commented on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing URL: https://github.com/apache/spark/pull/25328#issuecomment-518537763 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13785/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing
SparkQA commented on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing URL: https://github.com/apache/spark/pull/25328#issuecomment-518538601 **[Test build #108700 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108700/testReport)** for PR 25328 at commit [`0652c22`](https://github.com/apache/spark/commit/0652c224466be741f985b77104cfbebb2cbf1a9e).
[GitHub] [spark] beliefer commented on a change in pull request #25309: [SPARK-28577][YARN]Resource capability requested for each executor add offHeapMemorySize
beliefer commented on a change in pull request #25309: [SPARK-28577][YARN] Resource capability requested for each executor add offHeapMemorySize URL: https://github.com/apache/spark/pull/25309#discussion_r310905901

## File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala

## @@ -184,4 +184,29 @@ object YarnSparkHadoopUtil {
     ConverterUtils.toContainerId(containerIdString)
   }

+  /**
+   * If MEMORY_OFFHEAP_ENABLED is true, we should ensure that the requested
+   * executorOverheadMemory value is not less than MEMORY_OFFHEAP_SIZE; otherwise the
+   * memory resource requested for the executor may not be enough.
+   */
+  def executorMemoryOverheadRequested(sparkConf: SparkConf): Int = {
+    val executorMemory = sparkConf.get(EXECUTOR_MEMORY).toInt
+    val overhead = sparkConf.get(EXECUTOR_MEMORY_OVERHEAD).getOrElse(
+      math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toInt, MEMORY_OVERHEAD_MIN)).toInt
+    val offHeap = if (sparkConf.get(MEMORY_OFFHEAP_ENABLED)) {
+      val size =
+        sparkConf.getSizeAsMb(MEMORY_OFFHEAP_SIZE.key, MEMORY_OFFHEAP_SIZE.defaultValueString)
+      require(size > 0,
+        s"${MEMORY_OFFHEAP_SIZE.key} must be > 0 when ${MEMORY_OFFHEAP_ENABLED.key} == true")
+      if (size > overhead) {
+        logWarning(s"The value of ${MEMORY_OFFHEAP_SIZE.key}(${size}MB) will be used as " +
+          s"executorMemoryOverhead to request resource to ensure that Executor has enough memory " +
+          s"to use. It is recommended that the configuration value of " +
+          s"${EXECUTOR_MEMORY_OVERHEAD.key} should be no less than ${MEMORY_OFFHEAP_SIZE.key} " +
+          s"when ${MEMORY_OFFHEAP_ENABLED.key} is true.")
+      }
+      size
+    } else 0
+    math.max(overhead, offHeap).toInt
+  }

Review comment: I have checked the code and docs, and there are some inconsistencies. According to the docs, `memoryOverhead` should comprise `pysparkWorkerMemory`, but the code behaves differently. We need to fix this inconsistency. I think we should reduce the number of parameters that control memory, because that is simpler.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
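The calculation under review takes the larger of the configured (or derived) overhead and the off-heap size. A minimal Python sketch of that logic (not Spark code; the constant values and parameter names below are illustrative assumptions standing in for the Spark config entries):

```python
# Illustrative stand-ins for MEMORY_OVERHEAD_FACTOR and MEMORY_OVERHEAD_MIN.
MEMORY_OVERHEAD_FACTOR = 0.10
MEMORY_OVERHEAD_MIN = 384  # MB

def executor_memory_overhead_requested(executor_memory_mb,
                                       configured_overhead_mb=None,
                                       offheap_enabled=False,
                                       offheap_size_mb=0):
    """Return the memory overhead (MB) to request from the cluster manager."""
    # Use the explicit setting if present, else derive it from executor memory.
    if configured_overhead_mb is not None:
        overhead = configured_overhead_mb
    else:
        overhead = max(int(MEMORY_OVERHEAD_FACTOR * executor_memory_mb),
                       MEMORY_OVERHEAD_MIN)
    if offheap_enabled:
        if offheap_size_mb <= 0:
            raise ValueError("off-heap size must be > 0 when off-heap is enabled")
        # The larger of the two wins, mirroring math.max(overhead, offHeap).
        return max(overhead, offheap_size_mb)
    return overhead
```

For example, with a 10240 MB executor and a 2048 MB off-heap region, the derived overhead is 1024 MB, so the requested overhead becomes 2048 MB, which is the situation the warning in the patch is about.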
[GitHub] [spark] beliefer commented on a change in pull request #25309: [SPARK-28577][YARN] Resource capability requested for each executor add offHeapMemorySize
beliefer commented on a change in pull request #25309: [SPARK-28577][YARN] Resource capability requested for each executor add offHeapMemorySize URL: https://github.com/apache/spark/pull/25309#discussion_r310905901 @JoshRosen Could you take a look at this PR?
[GitHub] [spark] cloud-fan commented on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default
cloud-fan commented on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default URL: https://github.com/apache/spark/pull/25355#issuecomment-518541569 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default
AmplabJenkins removed a comment on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default URL: https://github.com/apache/spark/pull/25355#issuecomment-518543465 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default
AmplabJenkins commented on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default URL: https://github.com/apache/spark/pull/25355#issuecomment-518543471 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13786/
[GitHub] [spark] AmplabJenkins commented on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default
AmplabJenkins commented on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default URL: https://github.com/apache/spark/pull/25355#issuecomment-518543465 Merged build finished. Test PASSed.
[GitHub] [spark] HyukjinKwon commented on issue #25356: [SPARK-28616][INFRA] Improve merge-spark-pr script to warn WIP PRs and strip trailing dots
HyukjinKwon commented on issue #25356: [SPARK-28616][INFRA] Improve merge-spark-pr script to warn WIP PRs and strip trailing dots URL: https://github.com/apache/spark/pull/25356#issuecomment-518543461 Looks intermittent, though...
[GitHub] [spark] HyukjinKwon commented on issue #25356: [SPARK-28616][INFRA] Improve merge-spark-pr script to warn WIP PRs and strip trailing dots
HyukjinKwon commented on issue #25356: [SPARK-28616][INFRA] Improve merge-spark-pr script to warn WIP PRs and strip trailing dots URL: https://github.com/apache/spark/pull/25356#issuecomment-518543383 Yes, seems so - https://github.com/apache/spark/pull/25363#issuecomment-518482370
[GitHub] [spark] mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal buffer for long sum
mgaido91 commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal buffer for long sum URL: https://github.com/apache/spark/pull/25347#issuecomment-518550239 Yes @maropu, you're right. The reason I didn't change the output attribute was to avoid a breaking change. But since we are introducing a flag for it, it may be OK to do so. What do you think? cc @cloud-fan
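For context on why SPARK-28610 proposes a decimal buffer: a 64-bit long sum buffer silently wraps around on overflow, while an arbitrary-precision decimal buffer stays exact. A small Python illustration that simulates Java's overflowing 64-bit arithmetic (hypothetical helper names; not Spark code):

```python
from decimal import Decimal

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def wrap_int64(x):
    """Simulate Java's overflowing 64-bit long arithmetic."""
    return (x - INT64_MIN) % 2**64 + INT64_MIN

def long_sum(values):
    # Sum with a 64-bit long buffer: overflow wraps silently.
    total = 0
    for v in values:
        total = wrap_int64(total + v)
    return total

def decimal_sum(values):
    # Sum with an arbitrary-precision decimal buffer: no overflow.
    total = Decimal(0)
    for v in values:
        total += Decimal(v)
    return total

values = [INT64_MAX, 10]  # the true sum exceeds the long range
assert long_sum(values) != sum(values)     # silently wrapped
assert decimal_sum(values) == sum(values)  # exact
```

This is the trade-off the flag in the PR is meant to expose: the decimal buffer is slower but cannot return a silently wrong result.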
[GitHub] [spark] AmplabJenkins removed a comment on issue #25366: [SPARK-27918][SQL][TEST][FOLLOW-UP] Open comment about boolean test.
AmplabJenkins removed a comment on issue #25366: [SPARK-27918][SQL][TEST][FOLLOW-UP] Open comment about boolean test. URL: https://github.com/apache/spark/pull/25366#issuecomment-518558209 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA commented on issue #25366: [SPARK-27918][SQL][TEST][FOLLOW-UP] Open comment about boolean test.
SparkQA commented on issue #25366: [SPARK-27918][SQL][TEST][FOLLOW-UP] Open comment about boolean test. URL: https://github.com/apache/spark/pull/25366#issuecomment-518559393 **[Test build #108702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108702/testReport)** for PR 25366 at commit [`9654246`](https://github.com/apache/spark/commit/965424655dbe8bfdb5a9b162724b270bfedf5cbe).
[GitHub] [spark] beliefer commented on a change in pull request #25309: [SPARK-28577][YARN] Resource capability requested for each executor add offHeapMemorySize
beliefer commented on a change in pull request #25309: [SPARK-28577][YARN] Resource capability requested for each executor add offHeapMemorySize URL: https://github.com/apache/spark/pull/25309#discussion_r310905901 Review comment: Let me check!
[GitHub] [spark] viirya commented on a change in pull request #25328: [SPARK-28595][SQL] explain should not trigger partition listing
viirya commented on a change in pull request #25328: [SPARK-28595][SQL] explain should not trigger partition listing URL: https://github.com/apache/spark/pull/25328#discussion_r310907987

File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala

@@ -346,6 +348,12 @@ case class FileSourceScanExec(
       } else {
         None
       }
+    } ++ {
+      if (relation.partitionSchemaOption.isDefined) {
+        Some("numPartitions" -> SQLMetrics.createMetric(sparkContext, "number of partitions read"))

Review comment: Although it was previously `PartitionCount`, `numPartitions` looks more consistent.
[GitHub] [spark] gatorsmile commented on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing
gatorsmile commented on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing URL: https://github.com/apache/spark/pull/25328#issuecomment-518535625 retest this please
[GitHub] [spark] SparkQA commented on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing
SparkQA commented on issue #25328: [SPARK-28595][SQL] explain should not trigger partition listing URL: https://github.com/apache/spark/pull/25328#issuecomment-518535835 **[Test build #108699 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108699/testReport)** for PR 25328 at commit [`0652c22`](https://github.com/apache/spark/commit/0652c224466be741f985b77104cfbebb2cbf1a9e).
[GitHub] [spark] SparkQA commented on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default
SparkQA commented on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default URL: https://github.com/apache/spark/pull/25355#issuecomment-518536012 **[Test build #108694 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108694/testReport)** for PR 25355 at commit [`a07dce5`](https://github.com/apache/spark/commit/a07dce5a71b62d21064fc585f7ef746fb2fff6cc). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #25365: [SPARK-28537][SQL][HOTFIX][FOLLOW-UP] Add supportColumnar in DebugExec
SparkQA commented on issue #25365: [SPARK-28537][SQL][HOTFIX][FOLLOW-UP] Add supportColumnar in DebugExec URL: https://github.com/apache/spark/pull/25365#issuecomment-518536015 **[Test build #108698 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108698/testReport)** for PR 25365 at commit [`352a3cb`](https://github.com/apache/spark/commit/352a3cb40c851cdba5e4289095d54809438f0a7b). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #25357: [SPARK-28617][SQL][TEST] Fix misplacement when comment is at the end of the query
SparkQA commented on issue #25357: [SPARK-28617][SQL][TEST] Fix misplacement when comment is at the end of the query URL: https://github.com/apache/spark/pull/25357#issuecomment-518536013 **[Test build #108695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108695/testReport)** for PR 25357 at commit [`eeb7405`](https://github.com/apache/spark/commit/eeb7405ad0c7cc1004e2cad36929d20d95ab2726). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] viirya commented on issue #25356: [SPARK-28616][INFRA] Improve merge-spark-pr script to warn WIP PRs and strip trailing dots
viirya commented on issue #25356: [SPARK-28616][INFRA] Improve merge-spark-pr script to warn WIP PRs and strip trailing dots URL: https://github.com/apache/spark/pull/25356#issuecomment-518537430 > * checking CRAN incoming feasibility ...Error in readRDS(con) : This looks different from the previous CRAN error. Did it happen again?
[GitHub] [spark] AmplabJenkins removed a comment on issue #25365: [SPARK-28537][SQL][HOTFIX][FOLLOW-UP] Add supportColumnar in DebugExec
AmplabJenkins removed a comment on issue #25365: [SPARK-28537][SQL][HOTFIX][FOLLOW-UP] Add supportColumnar in DebugExec URL: https://github.com/apache/spark/pull/25365#issuecomment-518536154 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108698/
[GitHub] [spark] AmplabJenkins removed a comment on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default
AmplabJenkins removed a comment on issue #25355: [SPARK-28615][SQL][DOCS] Add a guide line for dataframe functions to say column signature function is by default URL: https://github.com/apache/spark/pull/25355#issuecomment-518536234 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108694/