[GitHub] [spark] HyukjinKwon closed pull request #30056: [SPARK-33160][SQL] Allow saving/loading INT96 in parquet w/o rebasing

2020-10-19 Thread GitBox


HyukjinKwon closed pull request #30056:
URL: https://github.com/apache/spark/pull/30056


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
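For readers outside the thread: the "rebasing" in SPARK-33160 concerns INT96 Parquet timestamps written under the legacy hybrid Julian/Gregorian calendar versus Spark 3's Proleptic Gregorian calendar; for dates before the 1582 cutover the two calendars disagree by several days, so reading old files without rebasing shifts such timestamps. A minimal, Spark-independent sketch of that gap using the standard Julian-day-number formulas (pure Python, not Spark code):

```python
def jdn_gregorian(year, month, day):
    """Julian day number of a (proleptic) Gregorian calendar date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - y // 100 + y // 400 - 32045

def jdn_julian(year, month, day):
    """Julian day number of a Julian calendar date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083

# The 1582 cutover: Julian Oct 4 is immediately followed by Gregorian Oct 15.
assert jdn_julian(1582, 10, 4) + 1 == jdn_gregorian(1582, 10, 15)

# In the year 1000 the calendars disagree by 5 days -- interpreting a legacy
# INT96 value without rebasing shifts such timestamps by this amount.
print(jdn_julian(1000, 1, 1) - jdn_gregorian(1000, 1, 1))  # -> 5
```

The point of the PR title's "w/o rebasing" option is to skip this conversion entirely when files are known to be written and read by the same calendar convention.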



[GitHub] [spark] HyukjinKwon removed a comment on pull request #30056: [SPARK-33160][SQL] Allow saving/loading INT96 in parquet w/o rebasing

2020-10-19 Thread GitBox


HyukjinKwon removed a comment on pull request #30056:
URL: https://github.com/apache/spark/pull/30056#issuecomment-712610574


   @cloud-fan can you take a look too just for doubly sure?






[GitHub] [spark] HyukjinKwon commented on pull request #30056: [SPARK-33160][SQL] Allow saving/loading INT96 in parquet w/o rebasing

2020-10-19 Thread GitBox


HyukjinKwon commented on pull request #30056:
URL: https://github.com/apache/spark/pull/30056#issuecomment-712612351


   Merged to master.






[GitHub] [spark] HyukjinKwon commented on pull request #30056: [SPARK-33160][SQL] Allow saving/loading INT96 in parquet w/o rebasing

2020-10-19 Thread GitBox


HyukjinKwon commented on pull request #30056:
URL: https://github.com/apache/spark/pull/30056#issuecomment-712610574


   @cloud-fan can you take a look too just for doubly sure?






[GitHub] [spark] viirya commented on pull request #29812: [SPARK-32941][SQL] Optimize UpdateFields expression chain and put the rule early in Analysis phase

2020-10-19 Thread GitBox


viirya commented on pull request #29812:
URL: https://github.com/apache/spark/pull/29812#issuecomment-712607215


   @HyukjinKwon Thanks for the suggestion. I updated this PR description.



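For context on SPARK-32941: each chained `withField`/`dropField` call produces an `UpdateFields` expression node, and the optimization collapses adjacent nodes into a single last-wins update. A language-agnostic sketch of the collapse idea (pure Python with hypothetical names, not Spark's Catalyst API):

```python
# Each update is ("with", name, value) or ("drop", name, None).
def collapse(updates):
    """Collapse a chain of struct-field updates into a single last-wins
    operation list, mimicking the UpdateFields chain rewrite."""
    result = {}   # field name -> (op, value), later updates overwrite earlier
    order = []    # first-seen order of field names
    for op, name, value in updates:
        if name not in result:
            order.append(name)
        result[name] = (op, value)
    return [(result[n][0], n, result[n][1]) for n in order]

chain = [("with", "a", 1), ("with", "b", 2), ("with", "a", 3), ("drop", "b", None)]
print(collapse(chain))  # -> [('with', 'a', 3), ('drop', 'b', None)]
```

Collapsing early (in the analysis phase, as the PR title says) keeps later rules from repeatedly traversing a long chain of single-field updates.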



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]

2020-10-19 Thread GitBox


AmplabJenkins removed a comment on pull request #30097:
URL: https://github.com/apache/spark/pull/30097#issuecomment-712602334


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130031/
   Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]

2020-10-19 Thread GitBox


AmplabJenkins commented on pull request #30097:
URL: https://github.com/apache/spark/pull/30097#issuecomment-712602328










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]

2020-10-19 Thread GitBox


AmplabJenkins removed a comment on pull request #30097:
URL: https://github.com/apache/spark/pull/30097#issuecomment-712602328


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]

2020-10-19 Thread GitBox


SparkQA removed a comment on pull request #30097:
URL: https://github.com/apache/spark/pull/30097#issuecomment-712578947


   **[Test build #130031 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130031/testReport)** for PR 30097 at commit [`4792ebd`](https://github.com/apache/spark/commit/4792ebd71d20739ec8345ba2edd0e8d7b28dfd48).






[GitHub] [spark] SparkQA commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]

2020-10-19 Thread GitBox


SparkQA commented on pull request #30097:
URL: https://github.com/apache/spark/pull/30097#issuecomment-712602038


   **[Test build #130031 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130031/testReport)** for PR 30097 at commit [`4792ebd`](https://github.com/apache/spark/commit/4792ebd71d20739ec8345ba2edd0e8d7b28dfd48).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `  class RemoveAllHints extends Rule[LogicalPlan] `
 * `  class DisableHints extends RemoveAllHints `



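The build report above lists two new rule classes, `RemoveAllHints` and `DisableHints`. The general pattern of a plan rule that strips hint nodes from a tree can be sketched generically (hypothetical minimal structures in Python; Spark's actual rules operate on Catalyst `LogicalPlan` trees):

```python
class Node:
    """A trivial plan node: a name plus child nodes."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

class RemoveAllHints:
    """Replace every hint node with its (single) child, bottom-up."""
    def apply(self, plan):
        children = [self.apply(c) for c in plan.children]
        if plan.name == "hint":
            return children[0]  # hint nodes wrap exactly one child here
        plan.children = children
        return plan

class DisableHints(RemoveAllHints):
    """Same rewrite, intended to run when hints are disabled by config."""
    pass

plan = Node("project", [Node("hint", [Node("scan")])])
out = RemoveAllHints().apply(plan)
print(out.name, [c.name for c in out.children])  # -> project ['scan']
```

Note how `DisableHints` reuses the traversal by subclassing, mirroring the reported `class DisableHints extends RemoveAllHints` relationship.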



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]

2020-10-19 Thread GitBox


AmplabJenkins removed a comment on pull request #30097:
URL: https://github.com/apache/spark/pull/30097#issuecomment-712598524


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34637/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path

2020-10-19 Thread GitBox


AmplabJenkins removed a comment on pull request #30057:
URL: https://github.com/apache/spark/pull/30057#issuecomment-712588757


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130030/
   Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path

2020-10-19 Thread GitBox


SparkQA removed a comment on pull request #30057:
URL: https://github.com/apache/spark/pull/30057#issuecomment-712531902


   **[Test build #130030 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130030/testReport)** for PR 30057 at commit [`351af96`](https://github.com/apache/spark/commit/351af96604aa50cee0197994dcbb6f91a6994304).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path

2020-10-19 Thread GitBox


AmplabJenkins removed a comment on pull request #30057:
URL: https://github.com/apache/spark/pull/30057#issuecomment-712588752


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]

2020-10-19 Thread GitBox


AmplabJenkins removed a comment on pull request #30097:
URL: https://github.com/apache/spark/pull/30097#issuecomment-712598520


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]

2020-10-19 Thread GitBox


AmplabJenkins commented on pull request #30097:
URL: https://github.com/apache/spark/pull/30097#issuecomment-712598520










[GitHub] [spark] SparkQA commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]

2020-10-19 Thread GitBox


SparkQA commented on pull request #30097:
URL: https://github.com/apache/spark/pull/30097#issuecomment-712598500


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34637/






[GitHub] [spark] HyukjinKwon commented on a change in pull request #30024: [SPARK-32229][SQL]Fix PostgresConnectionProvider and MSSQLConnectionProvider by accessing wrapped driver

2020-10-19 Thread GitBox


HyukjinKwon commented on a change in pull request #30024:
URL: https://github.com/apache/spark/pull/30024#discussion_r508216007



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/DriverRegistry.scala
##
@@ -58,5 +59,15 @@ object DriverRegistry extends Logging {
       }
     }
   }
+
+  def get(className: String): Driver = {

Review comment:
   oh! got it








[GitHub] [spark] AngersZhuuuu commented on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path

2020-10-19 Thread GitBox


AngersZhuuuu commented on pull request #30057:
URL: https://github.com/apache/spark/pull/30057#issuecomment-712595799


   > **[Test build #130030 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130030/testReport)** for PR 30057 at commit [`351af96`](https://github.com/apache/spark/commit/351af96604aa50cee0197994dcbb6f91a6994304).
   > 
   > * This patch **fails Spark unit tests**.
   > * This patch merges cleanly.
   > * This patch adds no public classes.
   
   Seems test `org.apache.spark.sql.hive.thriftserver.ThriftServerQueryTestSuite.subquery/scalar-subquery/scalar-subquery-select.sql` unstable?  Is there anyone working on this?






[GitHub] [spark] HyukjinKwon commented on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client

2020-10-19 Thread GitBox


HyukjinKwon commented on pull request #29843:
URL: https://github.com/apache/spark/pull/29843#issuecomment-712594756


   I am okay with not blocking this and @tgravescs PRs by the test failures - 
hope we could fix it very soon though.






[GitHub] [spark] SparkQA commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]

2020-10-19 Thread GitBox


SparkQA commented on pull request #30097:
URL: https://github.com/apache/spark/pull/30097#issuecomment-712589515


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34637/






[GitHub] [spark] sunchao edited a comment on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client

2020-10-19 Thread GitBox


sunchao edited a comment on pull request #29843:
URL: https://github.com/apache/spark/pull/29843#issuecomment-712586049


   > @sunchao, yes I think we can do that but would you mind creating a 
separate PR to fix the test first though? Using python3 with my workaround fix 
should be good enough.
   
   @HyukjinKwon Currently the github action tests pass without the `python3` 
change (I later made some change which seem to fix the tests related to 
python3), and the jenkins tests fail either w/ or w/o it: in the latter case it 
fails with error such as:
   ```
   20/10/16 19:20:36 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (amp-jenkins-worker-03.amp executor 1): org.apache.spark.SparkException: 
   Error from python worker:
     Traceback (most recent call last):
       File "/usr/lib64/python2.6/runpy.py", line 104, in _run_module_as_main
         loader, code, fname = _get_module_details(mod_name)
       File "/usr/lib64/python2.6/runpy.py", line 79, in _get_module_details
         loader = get_loader(mod_name)
       File "/usr/lib64/python2.6/pkgutil.py", line 456, in get_loader
         return find_loader(fullname)
       File "/usr/lib64/python2.6/pkgutil.py", line 466, in find_loader
         for importer in iter_importers(fullname):
       File "/usr/lib64/python2.6/pkgutil.py", line 422, in iter_importers
         __import__(pkg)
       File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/__init__.py", line 53, in <module>
         from pyspark.rdd import RDD, RDDBarrier
       File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/rdd.py", line 34, in <module>
         from pyspark.java_gateway import local_connect_and_auth
       File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/java_gateway.py", line 29, in <module>
         from py4j.java_gateway import java_import, JavaGateway, JavaObject, GatewayParameters
       File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 60
         PY4J_TRUE = {"yes", "y", "t", "true"}
                   ^
       SyntaxError: invalid syntax
   ```
   
   So do we still need a separate PR for that right now?
   
   > Also, seems like we're going to split PR (?). The first one (this) is for 
preparation, and second one is actually bumping up to Hadoop version to 3.2.1 
(?). Would you mind clarifying the plan and what this PR proposes in the 
description/title?
   
   Yes that's right. The plan is to have a separate PR bumping Hadoop version 
to 3.2.2 when that comes out (probably will be soon). There is a 
[bug](https://issues.apache.org/jira/browse/HDFS-15191) in 3.2.1 which affects 
wire compatibility between 3.2 clients and 2.x server. 
   
   I'll update the PR description soon. Thanks.
   
   






[GitHub] [spark] sunchao edited a comment on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client

2020-10-19 Thread GitBox


sunchao edited a comment on pull request #29843:
URL: https://github.com/apache/spark/pull/29843#issuecomment-712586049


   > @sunchao, yes I think we can do that but would you mind creating a 
separate PR to fix the test first though? Using python3 with my workaround fix 
should be good enough.
   
   @HyukjinKwon Currently the github action tests pass without the `python3` 
change, and the jenkins tests fail either w/ or w/o it: in the latter case it 
fails with error such as:
   ```
   20/10/16 19:20:36 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (amp-jenkins-worker-03.amp executor 1): org.apache.spark.SparkException: 
   Error from python worker:
     Traceback (most recent call last):
       File "/usr/lib64/python2.6/runpy.py", line 104, in _run_module_as_main
         loader, code, fname = _get_module_details(mod_name)
       File "/usr/lib64/python2.6/runpy.py", line 79, in _get_module_details
         loader = get_loader(mod_name)
       File "/usr/lib64/python2.6/pkgutil.py", line 456, in get_loader
         return find_loader(fullname)
       File "/usr/lib64/python2.6/pkgutil.py", line 466, in find_loader
         for importer in iter_importers(fullname):
       File "/usr/lib64/python2.6/pkgutil.py", line 422, in iter_importers
         __import__(pkg)
       File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/__init__.py", line 53, in <module>
         from pyspark.rdd import RDD, RDDBarrier
       File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/rdd.py", line 34, in <module>
         from pyspark.java_gateway import local_connect_and_auth
       File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/java_gateway.py", line 29, in <module>
         from py4j.java_gateway import java_import, JavaGateway, JavaObject, GatewayParameters
       File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 60
         PY4J_TRUE = {"yes", "y", "t", "true"}
                   ^
       SyntaxError: invalid syntax
   ```
   
   So do we still need a separate PR for that right now?
   
   > Also, seems like we're going to split PR (?). The first one (this) is for 
preparation, and second one is actually bumping up to Hadoop version to 3.2.1 
(?). Would you mind clarifying the plan and what this PR proposes in the 
description/title?
   
   Yes that's right. The plan is to have a separate PR bumping Hadoop version 
to 3.2.2 when that comes out (probably will be soon). There is a 
[bug](https://issues.apache.org/jira/browse/HDFS-15191) in 3.2.1 which affects 
wire compatibility between 3.2 clients and 2.x server. 
   
   I'll update the PR description soon. Thanks.
   
   






[GitHub] [spark] AmplabJenkins commented on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path

2020-10-19 Thread GitBox


AmplabJenkins commented on pull request #30057:
URL: https://github.com/apache/spark/pull/30057#issuecomment-712588752










[GitHub] [spark] SparkQA commented on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path

2020-10-19 Thread GitBox


SparkQA commented on pull request #30057:
URL: https://github.com/apache/spark/pull/30057#issuecomment-712588337


   **[Test build #130030 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130030/testReport)** for PR 30057 at commit [`351af96`](https://github.com/apache/spark/commit/351af96604aa50cee0197994dcbb6f91a6994304).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] Ngone51 commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-19 Thread GitBox


Ngone51 commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r508169611



##
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -0,0 +1,915 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.io.BufferedOutputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Executor;
+import java.util.concurrent.Executors;
+
+import com.google.common.base.Objects;
+import com.google.common.base.Preconditions;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.Weigher;
+import com.google.common.collect.Maps;
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import org.roaringbitmap.RoaringBitmap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.client.StreamCallbackWithID;
+import org.apache.spark.network.protocol.Encoders;
+import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge;
+import org.apache.spark.network.shuffle.protocol.MergeStatuses;
+import org.apache.spark.network.shuffle.protocol.PushBlockStream;
+import org.apache.spark.network.util.JavaUtils;
+import org.apache.spark.network.util.NettyUtils;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle
+ * service processing logic to support push based shuffle.
+ */
+public class RemoteBlockPushResolver implements MergedShuffleFileManager {
+
+  private static final Logger logger = LoggerFactory.getLogger(RemoteBlockPushResolver.class);
+  private static final String MERGE_MANAGER_DIR = "merge_manager";
+
+  private final ConcurrentMap appsPathInfo;
+  private final ConcurrentMap partitions;
+
+  private final Executor directoryCleaner;
+  private final TransportConf conf;
+  private final int minChunkSize;
+  private final String relativeMergeDirPathPattern;
+  private final ErrorHandler.BlockPushErrorHandler errorHandler;
+
+  @SuppressWarnings("UnstableApiUsage")
+  private final LoadingCache<File, ShuffleIndexInformation> indexCache;
+
+  @SuppressWarnings("UnstableApiUsage")
+  public RemoteBlockPushResolver(TransportConf conf, String relativeMergeDirPathPattern) {
+    this.conf = conf;
+    this.partitions = Maps.newConcurrentMap();
+    this.appsPathInfo = Maps.newConcurrentMap();
+    this.directoryCleaner = Executors.newSingleThreadExecutor(
+        // Add `spark` prefix because it will run in NM in Yarn mode.
+        NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner"));
+    this.minChunkSize = conf.minChunkSizeInMergedShuffleFile();
+    CacheLoader<File, ShuffleIndexInformation> indexCacheLoader =
+        new CacheLoader<File, ShuffleIndexInformation>() {
+          public ShuffleIndexInformation load(File file) throws IOException {
+            return new ShuffleIndexInformation(file);
+          }
+        };
+    indexCache = CacheBuilder.newBuilder()
+        .maximumWeight(conf.mergedIndexCacheSize())
+        .weigher((Weigher<File, ShuffleIndexInformation>) (file, indexInfo) -> indexInfo.getSize())
+        .build(indexCacheLoader);
+    this.relativeMergeDirPathPattern = relativeMergeDirPathPattern;
+    this.errorHandler = new ErrorHandler.BlockPushErrorHandler();
+  }
+
+  /**
+   * Given an ID that uniquely identifies a given shuffle partition of an application, retrieves
+   * the associated metadata. If 
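The constructor in the diff above builds a Guava `LoadingCache` for shuffle index files, bounded by total index size through a `Weigher` rather than by entry count. A minimal Python sketch of that size-weighted, load-on-miss cache pattern (a hypothetical simplified version; Guava's real eviction policy is more sophisticated than plain LRU):

```python
from collections import OrderedDict

class WeightedLoadingCache:
    """Load-on-miss cache that evicts least-recently-used entries
    once the summed entry weights exceed max_weight."""
    def __init__(self, max_weight, loader, weigher):
        self.max_weight = max_weight
        self.loader = loader        # key -> value, invoked on cache miss
        self.weigher = weigher      # (key, value) -> int weight
        self.entries = OrderedDict()  # key -> (value, weight), LRU order
        self.total = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as recently used
            return self.entries[key][0]
        value = self.loader(key)
        weight = self.weigher(key, value)
        self.entries[key] = (value, weight)
        self.total += weight
        while self.total > self.max_weight and len(self.entries) > 1:
            _, (_, w) = self.entries.popitem(last=False)  # evict LRU entry
            self.total -= w
        return value

cache = WeightedLoadingCache(10, loader=lambda k: "x" * k, weigher=lambda k, v: len(v))
cache.get(4)
cache.get(5)
cache.get(3)  # total weight would be 12 > 10, so the entry for key 4 is evicted
print(sorted(cache.entries))  # -> [3, 5]
```

Weighing by byte size (here string length, in Spark the index file size) keeps memory use bounded even when individual cached entries vary widely in size.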

[GitHub] [spark] sunchao commented on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client

2020-10-19 Thread GitBox


sunchao commented on pull request #29843:
URL: https://github.com/apache/spark/pull/29843#issuecomment-712586049


   > @sunchao, yes I think we can do that but would you mind creating a 
separate PR to fix the test first though? Using python3 with my workaround fix 
should be good enough.
   
   Currently the github action tests pass without the `python3` change, and the 
jenkins tests fail either w/ or w/o the `python3` change: in the latter case it 
fails with error such as:
   ```
   20/10/16 19:20:36 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (amp-jenkins-worker-03.amp executor 1): org.apache.spark.SparkException:
   Error from python worker:
     Traceback (most recent call last):
       File "/usr/lib64/python2.6/runpy.py", line 104, in _run_module_as_main
         loader, code, fname = _get_module_details(mod_name)
       File "/usr/lib64/python2.6/runpy.py", line 79, in _get_module_details
         loader = get_loader(mod_name)
       File "/usr/lib64/python2.6/pkgutil.py", line 456, in get_loader
         return find_loader(fullname)
       File "/usr/lib64/python2.6/pkgutil.py", line 466, in find_loader
         for importer in iter_importers(fullname):
       File "/usr/lib64/python2.6/pkgutil.py", line 422, in iter_importers
         __import__(pkg)
       File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/__init__.py", line 53, in <module>
         from pyspark.rdd import RDD, RDDBarrier
       File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/rdd.py", line 34, in <module>
         from pyspark.java_gateway import local_connect_and_auth
       File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/java_gateway.py", line 29, in <module>
         from py4j.java_gateway import java_import, JavaGateway, JavaObject, GatewayParameters
       File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 60
         PY4J_TRUE = {"yes", "y", "t", "true"}
                   ^
     SyntaxError: invalid syntax
   ```
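The `SyntaxError` on py4j's line 60 comes from the set-literal syntax: set literals were only added in Python 2.7, so a 2.6 interpreter fails before it can even import the module. A quick illustration (runs on any modern Python):

```python
# Set literals such as {"yes", "y"} are a Python 2.7+ feature; Python 2.6
# rejects them at parse time, which is why py4j 0.10.9 cannot even be
# imported there. The 2.6-compatible spelling goes through the set()
# constructor and builds an identical set:
PY4J_TRUE_LITERAL = {"yes", "y", "t", "true"}        # SyntaxError on 2.6
PY4J_TRUE_PORTABLE = set(["yes", "y", "t", "true"])  # parses on 2.6 and later

assert PY4J_TRUE_LITERAL == PY4J_TRUE_PORTABLE
```

This is why the failure disappears once the workers run under `python3` instead of the system's `/usr/lib64/python2.6`.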
   
   > Also, it seems like we're going to split the PR (?). The first one (this) is 
for preparation, and the second one actually bumps the Hadoop version to 3.2.1 
(?). Would you mind clarifying the plan and what this PR proposes in the 
description/title?
   
   Yes, that's right. The plan is to have a separate PR bumping the Hadoop version 
to 3.2.2 when that comes out (which will probably be soon). There is a 
[bug](https://issues.apache.org/jira/browse/HDFS-15191) in 3.2.1 which affects 
wire compatibility between 3.2 clients and 2.x servers. 
   
   I'll update the PR description soon. Thanks.
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on a change in pull request #30024: [SPARK-32229][SQL]Fix PostgresConnectionProvider and MSSQLConnectionProvider by accessing wrapped driver

2020-10-19 Thread GitBox


HeartSaVioR commented on a change in pull request #30024:
URL: https://github.com/apache/spark/pull/30024#discussion_r508204618



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/DriverRegistry.scala
##
@@ -58,5 +59,15 @@ object DriverRegistry extends Logging {
   }
 }
   }
+
+  def get(className: String): Driver = {

Review comment:
   This block actually contains a one-line change:
   `case d: DriverWrapper if d.wrapped.getClass.getCanonicalName == className => d.wrapped`
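The cited one-line change unwraps a `DriverWrapper` when the *wrapped* driver's class name matches the lookup. An analogous lookup sketched in Python (class and function names here are illustrative only, not Spark's API):

```python
class DriverWrapper:
    """Stand-in for a wrapper around drivers loaded by a separate classloader."""
    def __init__(self, wrapped):
        self.wrapped = wrapped

def get_driver(registered_drivers, class_name):
    # Prefer the wrapped driver when a wrapper holds an instance of the
    # requested class -- this mirrors the guard in the review comment:
    #   case d: DriverWrapper if d.wrapped.getClass.getCanonicalName == className => d.wrapped
    for d in registered_drivers:
        if isinstance(d, DriverWrapper) and type(d.wrapped).__name__ == class_name:
            return d.wrapped
        if type(d).__name__ == class_name:
            return d
    raise LookupError("Driver %s not found" % class_name)
```

Without the wrapper-aware branch, a caller asking for the concrete driver class would never find it, because only the wrapper is registered.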








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28026: [SPARK-31257][SQL] Unify create table syntax

2020-10-19 Thread GitBox


AmplabJenkins removed a comment on pull request #28026:
URL: https://github.com/apache/spark/pull/28026#issuecomment-712581742










[GitHub] [spark] AmplabJenkins commented on pull request #28026: [SPARK-31257][SQL] Unify create table syntax

2020-10-19 Thread GitBox


AmplabJenkins commented on pull request #28026:
URL: https://github.com/apache/spark/pull/28026#issuecomment-712581742










[GitHub] [spark] SparkQA removed a comment on pull request #28026: [SPARK-31257][SQL] Unify create table syntax

2020-10-19 Thread GitBox


SparkQA removed a comment on pull request #28026:
URL: https://github.com/apache/spark/pull/28026#issuecomment-712494713


   **[Test build #130025 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130025/testReport)**
 for PR 28026 at commit 
[`59d43de`](https://github.com/apache/spark/commit/59d43debc2301fccaffa54c924363242c9167ef7).






[GitHub] [spark] SparkQA commented on pull request #28026: [SPARK-31257][SQL] Unify create table syntax

2020-10-19 Thread GitBox


SparkQA commented on pull request #28026:
URL: https://github.com/apache/spark/pull/28026#issuecomment-712581228


   **[Test build #130025 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130025/testReport)**
 for PR 28026 at commit 
[`59d43de`](https://github.com/apache/spark/commit/59d43debc2301fccaffa54c924363242c9167ef7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] HeartSaVioR commented on a change in pull request #30076: [SPARK-32862][SS] Left semi stream-stream join

2020-10-19 Thread GitBox


HeartSaVioR commented on a change in pull request #30076:
URL: https://github.com/apache/spark/pull/30076#discussion_r508201850



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##
@@ -99,13 +99,20 @@ class SymmetricHashJoinStateManager(
   /**
* Get all the matched values for given join condition, with marking matched.
* This method is designed to mark joined rows properly without exposing 
internal index of row.
+   *
+   * @param joinOnlyFirstTimeMatchedRow Only join with first-time matched row.

Review comment:
   That would depend on whether the follow-up PR would revert the change done 
here (i.e., go back and forth) or not. If it's likely to go back and forth and 
the follow-up is not a big change (say, fewer than 100 lines of code, doable in 
a couple of days), then it's probably better to address it here, to save the 
effort of reviewing something that will be reverted later.
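For context, the semantics of the `joinOnlyFirstTimeMatchedRow` flag under discussion can be sketched as follows — a hypothetical, much-simplified model of the state lookup, not Spark's actual `SymmetricHashJoinStateManager` implementation:

```python
def get_joined_rows(buffered_rows, join_only_first_time_matched_row=False):
    """Return rows matching the join condition, marking each as matched.

    When join_only_first_time_matched_row is set, rows already matched on a
    previous call are skipped -- for a left semi join, each left-side row
    should produce output at most once.
    """
    out = []
    for row in buffered_rows:
        if join_only_first_time_matched_row and row.get("matched"):
            continue  # already joined once; left semi join must not re-emit
        row["matched"] = True
        out.append(row)
    return out
```

A second call with the flag set then yields nothing, which is exactly the "first-time match only" behaviour the parameter name describes.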








[GitHub] [spark] HyukjinKwon commented on a change in pull request #30024: [SPARK-32229][SQL]Fix PostgresConnectionProvider and MSSQLConnectionProvider by accessing wrapped driver

2020-10-19 Thread GitBox


HyukjinKwon commented on a change in pull request #30024:
URL: https://github.com/apache/spark/pull/30024#discussion_r508201461



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/jdbc/DriverRegistrySuite.scala
##
@@ -0,0 +1,29 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.jdbc
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.execution.datasources.jdbc.connection.TestDriver
+
+class DriverRegistrySuite extends SparkFunSuite {
+  test("SPARK-32229: get must give back wrapped driver if wrapped") {

Review comment:
   If it's just a simple refactoring, we wouldn't need a test.








[GitHub] [spark] HyukjinKwon commented on a change in pull request #30024: [SPARK-32229][SQL]Fix PostgresConnectionProvider and MSSQLConnectionProvider by accessing wrapped driver

2020-10-19 Thread GitBox


HyukjinKwon commented on a change in pull request #30024:
URL: https://github.com/apache/spark/pull/30024#discussion_r508201158



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/DriverRegistry.scala
##
@@ -58,5 +59,15 @@ object DriverRegistry extends Logging {
   }
 }
   }
+
+  def get(className: String): Driver = {

Review comment:
   I am okay either way, but what I mean is that I wanted to know how it 
relates to the PR description/title and the JIRA








[GitHub] [spark] SparkQA commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]

2020-10-19 Thread GitBox


SparkQA commented on pull request #30097:
URL: https://github.com/apache/spark/pull/30097#issuecomment-712578947


   **[Test build #130031 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130031/testReport)**
 for PR 30097 at commit 
[`4792ebd`](https://github.com/apache/spark/commit/4792ebd71d20739ec8345ba2edd0e8d7b28dfd48).






[GitHub] [spark] HyukjinKwon commented on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client

2020-10-19 Thread GitBox


HyukjinKwon commented on pull request #29843:
URL: https://github.com/apache/spark/pull/29843#issuecomment-712578041


   @sunchao, yes I think we can do that but would you mind creating a separate 
PR to fix the test first though? Using `python3` with my workaround fix should 
be good enough.
   
   Also, it seems like we're going to split the PR (?). The first one (this) is 
for preparation, and the second one actually bumps the Hadoop version to 3.2.1 
(?). Would you mind clarifying the plan and what this PR proposes in the 
description/title?






[GitHub] [spark] leanken opened a new pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]

2020-10-19 Thread GitBox


leanken opened a new pull request #30097:
URL: https://github.com/apache/spark/pull/30097


   ### What changes were proposed in this pull request?
   
   Since [SPARK-33139](https://issues.apache.org/jira/browse/SPARK-33139) has 
been resolved, `SQLConf.get` and `SparkSession.active` are now more reliable, so 
we refine the existing code that passes a `SQLConf` or `SparkSession` into 
sub-classes of `Rule[QueryPlan]`.
   
   In this PR.
   
   * Remove `SQLConf` from the constructor parameters of all sub-classes of `Rule[QueryPlan]`.
   * Use `SQLConf.get` in place of the original `SQLConf` instance.
   * Remove `SparkSession` from the constructor parameters of all sub-classes of `Rule[QueryPlan]`.
   * Use `SparkSession.active` in place of the original `SparkSession` instance.
   
   ### Why are the changes needed?
   
   Code refinement.
   
   
   ### Does this PR introduce any user-facing change?
   No.
   
   
   ### How was this patch tested?
   
   Existing tests
   






[GitHub] [spark] HyukjinKwon commented on a change in pull request #29812: [SPARK-32941][SQL] Optimize UpdateFields expression chain and put the rule early in Analysis phase

2020-10-19 Thread GitBox


HyukjinKwon commented on a change in pull request #29812:
URL: https://github.com/apache/spark/pull/29812#discussion_r508195964



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UpdateFields.scala
##
@@ -17,19 +17,68 @@
 
 package org.apache.spark.sql.catalyst.optimizer
 
-import org.apache.spark.sql.catalyst.expressions.UpdateFields
+import java.util.Locale
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.catalyst.expressions.{Expression, UpdateFields, WithField}
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
 import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.internal.SQLConf
 
 
 /**
- * Combines all adjacent [[UpdateFields]] expression into a single [[UpdateFields]] expression.
+ * Optimizes [[UpdateFields]] expression chains.
  */
-object CombineUpdateFields extends Rule[LogicalPlan] {
-  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
+object OptimizeUpdateFields extends Rule[LogicalPlan] {
+  private def canOptimize(names: Seq[String]): Boolean = {
+if (SQLConf.get.caseSensitiveAnalysis) {
+  names.distinct.length != names.length
+} else {
+  names.map(_.toLowerCase(Locale.ROOT)).distinct.length != names.length
+}
+  }
+
+  val optimizeUpdateFields: PartialFunction[Expression, Expression] = {
+case UpdateFields(structExpr, fieldOps)
+  if fieldOps.forall(_.isInstanceOf[WithField]) &&
+canOptimize(fieldOps.map(_.asInstanceOf[WithField].name)) =>
+  val caseSensitive = SQLConf.get.caseSensitiveAnalysis
+
+  val withFields = fieldOps.map(_.asInstanceOf[WithField])
+  val names = withFields.map(_.name)
+  val values = withFields.map(_.valExpr)
+
+  val newNames = mutable.ArrayBuffer.empty[String]
+  val newValues = mutable.ArrayBuffer.empty[Expression]
+
+  if (caseSensitive) {
+names.zip(values).reverse.foreach { case (name, value) =>

Review comment:
   I wonder if we could just do something like 
`collection.immutable.ListMap(names.zip(values): _*)`, which will keep the 
last-win behaviour here and preserve the order of fields for later use. But I 
guess it's no big deal. Just saying.
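The `ListMap` suggestion deduplicates field names with last-win semantics while keeping field order. Python's insertion-ordered `dict` (guaranteed since 3.7) behaves the same way, which makes the idea easy to illustrate — a sketch of the intended semantics, not the Spark code itself:

```python
names = ["a", "b", "a"]          # duplicate field name "a"
values = [1, 2, 3]

# dict(zip(...)) keeps insertion order and lets the last occurrence of a
# duplicate key win -- the "last win" behaviour the review comment wants
# from collection.immutable.ListMap(names.zip(values): _*).
merged = dict(zip(names, values))

assert merged == {"a": 3, "b": 2}
assert list(merged) == ["a", "b"]
```

The hand-rolled reverse-iteration in the diff achieves the same result while also tracking case-insensitive name matching, which a plain map keyed on raw names would not cover.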








[GitHub] [spark] HyukjinKwon commented on pull request #29812: [SPARK-32941][SQL] Optimize UpdateFields expression chain and put the rule early in Analysis phase

2020-10-19 Thread GitBox


HyukjinKwon commented on pull request #29812:
URL: https://github.com/apache/spark/pull/29812#issuecomment-712573143


   @viirya, BTW, do you mind fixing the PR description to explain what this PR 
specifically improves?
   
   > This patch proposes to add more optimization to `UpdateFields` expression 
chain.
   
   Seems like this PR description does not explain what exactly it optimizes. Is 
my understanding correct that this PR proposes two separate optimizations?
   
   - Deduplicates `WithField` at `UpdateFields`
   - Respect nullability in input struct at `GetStructField(UpdateFields(..., 
struct))`, and unwrap if-else.
   
   






[GitHub] [spark] sunchao commented on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client

2020-10-19 Thread GitBox


sunchao commented on pull request #29843:
URL: https://github.com/apache/spark/pull/29843#issuecomment-712572297


   Thanks @HyukjinKwon for confirming that! I'll consider this PR to have passed 
all tests then. As a next step I'll change the Hadoop version to 3.2.0 and bump 
up the version in a separate PR, as discussed before.






[GitHub] [spark] c21 commented on a change in pull request #30076: [SPARK-32862][SS] Left semi stream-stream join

2020-10-19 Thread GitBox


c21 commented on a change in pull request #30076:
URL: https://github.com/apache/spark/pull/30076#discussion_r508191906



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##
@@ -99,13 +99,20 @@ class SymmetricHashJoinStateManager(
   /**
* Get all the matched values for given join condition, with marking matched.
* This method is designed to mark joined rows properly without exposing 
internal index of row.
+   *
+   * @param joinOnlyFirstTimeMatchedRow Only join with first-time matched row.

Review comment:
   IMO it would be good to add early eviction for matched left-side rows in a 
follow-up PR. If people strongly think we should add that in this PR in the 
first place, I can add it as well, but I need more time to polish it.








[GitHub] [spark] HeartSaVioR commented on a change in pull request #30076: [SPARK-32862][SS] Left semi stream-stream join

2020-10-19 Thread GitBox


HeartSaVioR commented on a change in pull request #30076:
URL: https://github.com/apache/spark/pull/30076#discussion_r508188143



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##
@@ -99,13 +99,20 @@ class SymmetricHashJoinStateManager(
   /**
* Get all the matched values for given join condition, with marking matched.
* This method is designed to mark joined rows properly without exposing 
internal index of row.
+   *
+   * @param joinOnlyFirstTimeMatchedRow Only join with first-time matched row.

Review comment:
   Or something like `excludeRowsAlreadyMatched`. I guess we'd like to have 
another method to deal with left semi join efficiently, as I commented (and as 
noted in the PR description).








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client

2020-10-19 Thread GitBox


AmplabJenkins removed a comment on pull request #29843:
URL: https://github.com/apache/spark/pull/29843#issuecomment-712559058










[GitHub] [spark] SparkQA removed a comment on pull request #29874: [SPARK-32998] Add ability to override default remote repos with inter…

2020-10-19 Thread GitBox


SparkQA removed a comment on pull request #29874:
URL: https://github.com/apache/spark/pull/29874#issuecomment-712518103


   **[Test build #130028 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130028/testReport)**
 for PR 29874 at commit 
[`ff644eb`](https://github.com/apache/spark/commit/ff644eb0c291a4ede8ac0237ae31a3fd68a22a8e).






[GitHub] [spark] SparkQA removed a comment on pull request #28026: [SPARK-31257][SQL] Unify create table syntax

2020-10-19 Thread GitBox


SparkQA removed a comment on pull request #28026:
URL: https://github.com/apache/spark/pull/28026#issuecomment-712450230


   **[Test build #130020 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130020/testReport)**
 for PR 28026 at commit 
[`c7d5591`](https://github.com/apache/spark/commit/c7d5591c48e219e581d3907463619f642996b2b5).






[GitHub] [spark] SparkQA removed a comment on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client

2020-10-19 Thread GitBox


SparkQA removed a comment on pull request #29843:
URL: https://github.com/apache/spark/pull/29843#issuecomment-712507245


   **[Test build #130027 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130027/testReport)**
 for PR 29843 at commit 
[`bcd81b7`](https://github.com/apache/spark/commit/bcd81b72b6f13a9ee44d9e1bc83e72b014005f09).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29874: [SPARK-32998] Add ability to override default remote repos with inter…

2020-10-19 Thread GitBox


AmplabJenkins removed a comment on pull request #29874:
URL: https://github.com/apache/spark/pull/29874#issuecomment-712564310










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28026: [SPARK-31257][SQL] Unify create table syntax

2020-10-19 Thread GitBox


AmplabJenkins removed a comment on pull request #28026:
URL: https://github.com/apache/spark/pull/28026#issuecomment-712551323










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path

2020-10-19 Thread GitBox


AmplabJenkins removed a comment on pull request #30057:
URL: https://github.com/apache/spark/pull/30057#issuecomment-712552167










[GitHub] [spark] AmplabJenkins commented on pull request #29874: [SPARK-32998] Add ability to override default remote repos with inter…

2020-10-19 Thread GitBox


AmplabJenkins commented on pull request #29874:
URL: https://github.com/apache/spark/pull/29874#issuecomment-712564310










[GitHub] [spark] SparkQA commented on pull request #29874: [SPARK-32998] Add ability to override default remote repos with inter…

2020-10-19 Thread GitBox


SparkQA commented on pull request #29874:
URL: https://github.com/apache/spark/pull/29874#issuecomment-712563758


   **[Test build #130028 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130028/testReport)**
 for PR 29874 at commit 
[`ff644eb`](https://github.com/apache/spark/commit/ff644eb0c291a4ede8ac0237ae31a3fd68a22a8e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] Ngone51 commented on pull request #29906: [SPARK-32037][CORE] Rename blacklisting feature

2020-10-19 Thread GitBox


Ngone51 commented on pull request #29906:
URL: https://github.com/apache/spark/pull/29906#issuecomment-712561183


   Shall we add the `DeveloperApi` annotation to `SparkFirehoseListener`, since 
we all agree it was missed before? @tgravescs 






[GitHub] [spark] Ngone51 commented on a change in pull request #29906: [SPARK-32037][CORE] Rename blacklisting feature

2020-10-19 Thread GitBox


Ngone51 commented on a change in pull request #29906:
URL: https://github.com/apache/spark/pull/29906#discussion_r508182684



##
File path: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
##
@@ -284,80 +284,127 @@ private[spark] class AppStatusListener(
   }
 
   override def onExecutorBlacklisted(event: SparkListenerExecutorBlacklisted): 
Unit = {
-updateBlackListStatus(event.executorId, true)
+    updateExclusionStatus(event.executorId, true)
+  }
+
+  override def onExecutorExcluded(event: SparkListenerExecutorExcluded): Unit = {
+    updateExclusionStatus(event.executorId, true)
   }
 
   override def onExecutorBlacklistedForStage(
       event: SparkListenerExecutorBlacklistedForStage): Unit = {
-    val now = System.nanoTime()
+    updateExclusionStatusForStage(event.stageId, event.stageAttemptId, event.executorId)
+  }
 
-    Option(liveStages.get((event.stageId, event.stageAttemptId))).foreach { stage =>
-      setStageBlackListStatus(stage, now, event.executorId)
-    }
-    liveExecutors.get(event.executorId).foreach { exec =>
-      addBlackListedStageTo(exec, event.stageId, now)
-    }
+  override def onExecutorExcludedForStage(
+      event: SparkListenerExecutorExcludedForStage): Unit = {
+    updateExclusionStatusForStage(event.stageId, event.stageAttemptId, event.executorId)
   }
 
   override def onNodeBlacklistedForStage(event: SparkListenerNodeBlacklistedForStage): Unit = {
-    val now = System.nanoTime()
+    updateNodeExclusionStatusForStage(event.stageId, event.stageAttemptId, event.hostId)
+  }
 
-    // Implicitly blacklist every available executor for the stage associated with this node
-    Option(liveStages.get((event.stageId, event.stageAttemptId))).foreach { stage =>
-      val executorIds = liveExecutors.values.filter(_.host == event.hostId).map(_.executorId).toSeq
-      setStageBlackListStatus(stage, now, executorIds: _*)
-    }
-    liveExecutors.values.filter(_.hostname == event.hostId).foreach { exec =>
-      addBlackListedStageTo(exec, event.stageId, now)
-    }
+  override def onNodeExcludedForStage(event: SparkListenerNodeExcludedForStage): Unit = {
+    updateNodeExclusionStatusForStage(event.stageId, event.stageAttemptId, event.hostId)
   }
 
-  private def addBlackListedStageTo(exec: LiveExecutor, stageId: Int, now: Long): Unit = {
-    exec.blacklistedInStages += stageId
+  private def addExcludedStageTo(exec: LiveExecutor, stageId: Int, now: Long): Unit = {
+    exec.excludedInStages += stageId
     liveUpdate(exec, now)
   }
 
   private def setStageBlackListStatus(stage: LiveStage, now: Long, executorIds: String*): Unit = {
     executorIds.foreach { executorId =>
       val executorStageSummary = stage.executorSummary(executorId)
-      executorStageSummary.isBlacklisted = true
+      executorStageSummary.isExcluded = true
+      maybeUpdate(executorStageSummary, now)
+    }
+    stage.excludedExecutors ++= executorIds
+    maybeUpdate(stage, now)
+  }
+
+  private def setStageExcludedStatus(stage: LiveStage, now: Long, executorIds: String*): Unit = {
+    executorIds.foreach { executorId =>
+      val executorStageSummary = stage.executorSummary(executorId)
+      executorStageSummary.isExcluded = true
       maybeUpdate(executorStageSummary, now)
     }
-    stage.blackListedExecutors ++= executorIds
+    stage.excludedExecutors ++= executorIds
     maybeUpdate(stage, now)
   }
 
   override def onExecutorUnblacklisted(event: SparkListenerExecutorUnblacklisted): Unit = {
-    updateBlackListStatus(event.executorId, false)
+    updateExclusionStatus(event.executorId, false)
+  }
+
+  override def onExecutorUnexcluded(event: SparkListenerExecutorUnexcluded): Unit = {
+    updateExclusionStatus(event.executorId, false)
   }
 
   override def onNodeBlacklisted(event: SparkListenerNodeBlacklisted): Unit = {

Review comment:
   Do we need to implement these deprecated methods for internal listeners? 
I assume they are only used for external listeners.
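   If the deprecated handlers do need implementations here, one option (a hedged sketch, not the merged code; it reuses the `updateExclusionStatus` helper from the hunk above) is to have each deprecated `*Blacklisted` callback delegate to the same helper as its `*Excluded` counterpart, so the listener behaves identically whichever event is fired:

```scala
// Sketch: the deprecated "blacklisted" callback and the new "excluded"
// callback both route through one shared helper, as in the diff above.
override def onExecutorBlacklisted(event: SparkListenerExecutorBlacklisted): Unit = {
  updateExclusionStatus(event.executorId, true)
}

override def onExecutorExcluded(event: SparkListenerExecutorExcluded): Unit = {
  updateExclusionStatus(event.executorId, true)
}
```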





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on a change in pull request #30026: [SPARK-32978][SQL] Make sure the number of dynamic part metric is correct

2020-10-19 Thread GitBox


LuciferYang commented on a change in pull request #30026:
URL: https://github.com/apache/spark/pull/30026#discussion_r507352925



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/BasicWriteJobStatsTrackerMetricSuite.scala
##
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.{LocalSparkSession, SparkSession}
+
+class BasicWriteJobStatsTrackerMetricSuite extends SparkFunSuite with LocalSparkSession {
+
+  test("SPARK-32978: make sure the number of dynamic part metric is correct") {
+try {
+  val partitions = "50"
+  spark = SparkSession.builder().master("local[4]").getOrCreate()
+  val statusStore = spark.sharedState.statusStore
+  val oldExecutionsSize = statusStore.executionsList().size
+
+  spark.sql("create table dynamic_partition(i bigint, part bigint) " +
+"using parquet partitioned by (part)").collect()
+  spark.sql("insert overwrite table dynamic_partition partition(part) " +
+s"select id, id % $partitions as part from range(1)").collect()
+
+  // Wait for listener to finish computing the metrics for the executions.
+  while (statusStore.executionsList().size - oldExecutionsSize < 4 ||
+statusStore.executionsList().last.metricValues == null) {
+Thread.sleep(100)
+  }
+
+  // There should be 4 SQLExecutionUIData in executionsList and the 3rd item is what we need,

Review comment:
   > why there are 4? is it because of collect?
   
   Yes, without `.collect` it should be 2.
   
   > BTW can we call val oldExecutionsSize = statusStore.executionsList().size after create table? then we just need to wait for one SQLExecutionUIData.
   
   @cloud-fan Addressed in 15c7519 to fix this.
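   A sketch of the suggested restructuring (hypothetical, based on the test quoted above; taking the baseline after the CREATE TABLE means the polling loop only has to wait for the executions triggered by the INSERT):

```scala
spark.sql("create table dynamic_partition(i bigint, part bigint) " +
  "using parquet partitioned by (part)").collect()
// Take the baseline only after CREATE TABLE, so the loop below waits
// solely for the executions of the INSERT statement.
val oldExecutionsSize = statusStore.executionsList().size
spark.sql("insert overwrite table dynamic_partition partition(part) " +
  s"select id, id % $partitions as part from range(1)").collect()
// Wait for the listener to finish computing the metrics for the executions.
while (statusStore.executionsList().size - oldExecutionsSize < 2 ||
  statusStore.executionsList().last.metricValues == null) {
  Thread.sleep(100)
}
```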








[GitHub] [spark] HeartSaVioR commented on a change in pull request #30024: [SPARK-32229][SQL]Fix PostgresConnectionProvider and MSSQLConnectionProvider by accessing wrapped driver

2020-10-19 Thread GitBox


HeartSaVioR commented on a change in pull request #30024:
URL: https://github.com/apache/spark/pull/30024#discussion_r508181273



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/DriverRegistry.scala
##
@@ -58,5 +59,15 @@ object DriverRegistry extends Logging {
   }
 }
   }
+
+  def get(className: String): Driver = {

Review comment:
   That said, why don't we look up `wrapperMap` before iterating through `DriverManager.getDrivers`?








[GitHub] [spark] AmplabJenkins commented on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client

2020-10-19 Thread GitBox


AmplabJenkins commented on pull request #29843:
URL: https://github.com/apache/spark/pull/29843#issuecomment-712559058










[GitHub] [spark] HeartSaVioR commented on a change in pull request #30024: [SPARK-32229][SQL]Fix PostgresConnectionProvider and MSSQLConnectionProvider by accessing wrapped driver

2020-10-19 Thread GitBox


HeartSaVioR commented on a change in pull request #30024:
URL: https://github.com/apache/spark/pull/30024#discussion_r508180571




##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/DriverRegistry.scala
##
@@ -58,5 +59,15 @@ object DriverRegistry extends Logging {
   }
 }
   }
+
+  def get(className: String): Driver = {

Review comment:
   I'm actually in favor of this change - `DriverRegistry` deals with 
wrapping on register, and this will also let `DriverRegistry` deal with 
unwrapping on get. `JdbcUtils` no longer needs to know about these details - it 
just needs to know that it should use `DriverRegistry` instead of 
`DriverManager`.








[GitHub] [spark] SparkQA commented on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client

2020-10-19 Thread GitBox


SparkQA commented on pull request #29843:
URL: https://github.com/apache/spark/pull/29843#issuecomment-712558282


   **[Test build #130027 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130027/testReport)** for PR 29843 at commit [`bcd81b7`](https://github.com/apache/spark/commit/bcd81b72b6f13a9ee44d9e1bc83e72b014005f09).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path

2020-10-19 Thread GitBox


AmplabJenkins commented on pull request #30057:
URL: https://github.com/apache/spark/pull/30057#issuecomment-712552167










[GitHub] [spark] SparkQA commented on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path

2020-10-19 Thread GitBox


SparkQA commented on pull request #30057:
URL: https://github.com/apache/spark/pull/30057#issuecomment-712552152


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34636/
   






[GitHub] [spark] AmplabJenkins commented on pull request #28026: [SPARK-31257][SQL] Unify create table syntax

2020-10-19 Thread GitBox


AmplabJenkins commented on pull request #28026:
URL: https://github.com/apache/spark/pull/28026#issuecomment-712551323










[GitHub] [spark] Ngone51 commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-19 Thread GitBox


Ngone51 commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r508173722



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -0,0 +1,915 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.io.BufferedOutputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Executor;
+import java.util.concurrent.Executors;
+
+import com.google.common.base.Objects;
+import com.google.common.base.Preconditions;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.Weigher;
+import com.google.common.collect.Maps;
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import org.roaringbitmap.RoaringBitmap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.client.StreamCallbackWithID;
+import org.apache.spark.network.protocol.Encoders;
+import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge;
+import org.apache.spark.network.shuffle.protocol.MergeStatuses;
+import org.apache.spark.network.shuffle.protocol.PushBlockStream;
+import org.apache.spark.network.util.JavaUtils;
+import org.apache.spark.network.util.NettyUtils;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * An implementation of {@link MergedShuffleFileManager} that provides the most essential shuffle
+ * service processing logic to support push based shuffle.
+ */
+public class RemoteBlockPushResolver implements MergedShuffleFileManager {
+
+  private static final Logger logger = LoggerFactory.getLogger(RemoteBlockPushResolver.class);
+  private static final String MERGE_MANAGER_DIR = "merge_manager";
+
+  private final ConcurrentMap appsPathInfo;
+  private final ConcurrentMap partitions;
+
+  private final Executor directoryCleaner;
+  private final TransportConf conf;
+  private final int minChunkSize;
+  private final String relativeMergeDirPathPattern;
+  private final ErrorHandler.BlockPushErrorHandler errorHandler;
+
+  @SuppressWarnings("UnstableApiUsage")
+  private final LoadingCache indexCache;
+
+  @SuppressWarnings("UnstableApiUsage")
+  public RemoteBlockPushResolver(TransportConf conf, String relativeMergeDirPathPattern) {
+    this.conf = conf;
+    this.partitions = Maps.newConcurrentMap();
+    this.appsPathInfo = Maps.newConcurrentMap();
+    this.directoryCleaner = Executors.newSingleThreadExecutor(
+        // Add `spark` prefix because it will run in NM in Yarn mode.
+        NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner"));
+    this.minChunkSize = conf.minChunkSizeInMergedShuffleFile();
+    CacheLoader indexCacheLoader =
+        new CacheLoader() {
+          public ShuffleIndexInformation load(File file) throws IOException {
+            return new ShuffleIndexInformation(file);
+          }
+        };
+    indexCache = CacheBuilder.newBuilder()
+        .maximumWeight(conf.mergedIndexCacheSize())
+        .weigher((Weigher) (file, indexInfo) -> indexInfo.getSize())
+        .build(indexCacheLoader);
+    this.relativeMergeDirPathPattern = relativeMergeDirPathPattern;
+    this.errorHandler = new ErrorHandler.BlockPushErrorHandler();
+  }
+
+  /**
+   * Given an ID that uniquely identifies a given shuffle partition of an application, retrieves
+   * the associated metadata. If 
[GitHub] [spark] SparkQA commented on pull request #28026: [SPARK-31257][SQL] Unify create table syntax

2020-10-19 Thread GitBox


SparkQA commented on pull request #28026:
URL: https://github.com/apache/spark/pull/28026#issuecomment-712550608


   **[Test build #130020 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130020/testReport)** for PR 28026 at commit [`c7d5591`](https://github.com/apache/spark/commit/c7d5591c48e219e581d3907463619f642996b2b5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29818: [SPARK-32953][PYTHON] Add Arrow self_destruct support to toPandas

2020-10-19 Thread GitBox


AmplabJenkins removed a comment on pull request #29818:
URL: https://github.com/apache/spark/pull/29818#issuecomment-712549555










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30095: [SPARK-33181][SQL][DOCS] Document Load Table Directly from File in SQL Select Reference

2020-10-19 Thread GitBox


AmplabJenkins removed a comment on pull request #30095:
URL: https://github.com/apache/spark/pull/30095#issuecomment-712549315










[GitHub] [spark] AmplabJenkins commented on pull request #29818: [SPARK-32953][PYTHON] Add Arrow self_destruct support to toPandas

2020-10-19 Thread GitBox


AmplabJenkins commented on pull request #29818:
URL: https://github.com/apache/spark/pull/29818#issuecomment-712549555










[GitHub] [spark] AmplabJenkins commented on pull request #30095: [SPARK-33181][SQL][DOCS] Document Load Table Directly from File in SQL Select Reference

2020-10-19 Thread GitBox


AmplabJenkins commented on pull request #30095:
URL: https://github.com/apache/spark/pull/30095#issuecomment-712549315










[GitHub] [spark] SparkQA removed a comment on pull request #29818: [SPARK-32953][PYTHON] Add Arrow self_destruct support to toPandas

2020-10-19 Thread GitBox


SparkQA removed a comment on pull request #29818:
URL: https://github.com/apache/spark/pull/29818#issuecomment-712461557


   **[Test build #130021 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130021/testReport)** for PR 29818 at commit [`c114166`](https://github.com/apache/spark/commit/c114166afb682081f99b8893bce60bbd38560b3e).






[GitHub] [spark] SparkQA commented on pull request #30095: [SPARK-33181][SQL][DOCS] Document Load Table Directly from File in SQL Select Reference

2020-10-19 Thread GitBox


SparkQA commented on pull request #30095:
URL: https://github.com/apache/spark/pull/30095#issuecomment-712549297


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34635/
   






[GitHub] [spark] SparkQA commented on pull request #29818: [SPARK-32953][PYTHON] Add Arrow self_destruct support to toPandas

2020-10-19 Thread GitBox


SparkQA commented on pull request #29818:
URL: https://github.com/apache/spark/pull/29818#issuecomment-712548936


   **[Test build #130021 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130021/testReport)** for PR 29818 at commit [`c114166`](https://github.com/apache/spark/commit/c114166afb682081f99b8893bce60bbd38560b3e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] HyukjinKwon commented on a change in pull request #30024: [SPARK-32229][SQL]Fix PostgresConnectionProvider and MSSQLConnectionProvider by accessing wrapped driver

2020-10-19 Thread GitBox


HyukjinKwon commented on a change in pull request #30024:
URL: https://github.com/apache/spark/pull/30024#discussion_r508171138



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/DriverRegistry.scala
##
@@ -58,5 +59,15 @@ object DriverRegistry extends Logging {
   }
 }
   }
+
+  def get(className: String): Driver = {

Review comment:
   The change seems okay, but why do we need to do this? It just moves the code around.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-19 Thread GitBox


AmplabJenkins removed a comment on pull request #30062:
URL: https://github.com/apache/spark/pull/30062#issuecomment-712536768


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130026/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-19 Thread GitBox


AmplabJenkins removed a comment on pull request #30062:
URL: https://github.com/apache/spark/pull/30062#issuecomment-712536767


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-19 Thread GitBox


SparkQA removed a comment on pull request #30062:
URL: https://github.com/apache/spark/pull/30062#issuecomment-712502294


   **[Test build #130026 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130026/testReport)** for PR 30062 at commit [`fbdd333`](https://github.com/apache/spark/commit/fbdd33385083adb4be83adf46cd518d519650307).






[GitHub] [spark] Ngone51 commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-19 Thread GitBox


Ngone51 commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r508170165



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -0,0 +1,915 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.io.BufferedOutputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Executor;
+import java.util.concurrent.Executors;
+
+import com.google.common.base.Objects;
+import com.google.common.base.Preconditions;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.Weigher;
+import com.google.common.collect.Maps;
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import org.roaringbitmap.RoaringBitmap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.client.StreamCallbackWithID;
+import org.apache.spark.network.protocol.Encoders;
+import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge;
+import org.apache.spark.network.shuffle.protocol.MergeStatuses;
+import org.apache.spark.network.shuffle.protocol.PushBlockStream;
+import org.apache.spark.network.util.JavaUtils;
+import org.apache.spark.network.util.NettyUtils;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * An implementation of {@link MergedShuffleFileManager} that provides the 
most essential shuffle
+ * service processing logic to support push based shuffle.
+ */
+public class RemoteBlockPushResolver implements MergedShuffleFileManager {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(RemoteBlockPushResolver.class);
+  private static final String MERGE_MANAGER_DIR = "merge_manager";
+
+  private final ConcurrentMap appsPathInfo;
+  private final ConcurrentMap 
partitions;
+
+  private final Executor directoryCleaner;
+  private final TransportConf conf;
+  private final int minChunkSize;
+  private final String relativeMergeDirPathPattern;
+  private final ErrorHandler.BlockPushErrorHandler errorHandler;
+
+  @SuppressWarnings("UnstableApiUsage")
+  private final LoadingCache indexCache;
+
+  @SuppressWarnings("UnstableApiUsage")
+  public RemoteBlockPushResolver(TransportConf conf, String 
relativeMergeDirPathPattern) {
+this.conf = conf;
+this.partitions = Maps.newConcurrentMap();
+this.appsPathInfo = Maps.newConcurrentMap();
+this.directoryCleaner = Executors.newSingleThreadExecutor(
+// Add `spark` prefix because it will run in NM in Yarn mode.
+
NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner"));
+this.minChunkSize = conf.minChunkSizeInMergedShuffleFile();
+CacheLoader indexCacheLoader =
+new CacheLoader() {
+  public ShuffleIndexInformation load(File file) throws IOException {
+return new ShuffleIndexInformation(file);
+  }
+};
+indexCache = CacheBuilder.newBuilder()
+.maximumWeight(conf.mergedIndexCacheSize())
+.weigher((Weigher) (file, indexInfo) -> 
indexInfo.getSize())
+.build(indexCacheLoader);
+this.relativeMergeDirPathPattern = relativeMergeDirPathPattern;
+this.errorHandler = new ErrorHandler.BlockPushErrorHandler();
+  }
+
+  /**
+   * Given an ID that uniquely identifies a given shuffle partition of an 
application, retrieves
+   * the associated metadata. If 

[GitHub] [spark] Ngone51 commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-19 Thread GitBox


Ngone51 commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r508169611



##
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -0,0 +1,915 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.io.BufferedOutputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Executor;
+import java.util.concurrent.Executors;
+
+import com.google.common.base.Objects;
+import com.google.common.base.Preconditions;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.Weigher;
+import com.google.common.collect.Maps;
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import org.roaringbitmap.RoaringBitmap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.client.StreamCallbackWithID;
+import org.apache.spark.network.protocol.Encoders;
+import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge;
+import org.apache.spark.network.shuffle.protocol.MergeStatuses;
+import org.apache.spark.network.shuffle.protocol.PushBlockStream;
+import org.apache.spark.network.util.JavaUtils;
+import org.apache.spark.network.util.NettyUtils;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * An implementation of {@link MergedShuffleFileManager} that provides the most essential
+ * shuffle service processing logic to support push based shuffle.
+ */
+public class RemoteBlockPushResolver implements MergedShuffleFileManager {
+
+  private static final Logger logger = LoggerFactory.getLogger(RemoteBlockPushResolver.class);
+  private static final String MERGE_MANAGER_DIR = "merge_manager";
+
+  private final ConcurrentMap<String, AppPathsInfo> appsPathInfo;
+  private final ConcurrentMap<AppShufflePartitionId, AppShufflePartitionInfo> partitions;
+
+  private final Executor directoryCleaner;
+  private final TransportConf conf;
+  private final int minChunkSize;
+  private final String relativeMergeDirPathPattern;
+  private final ErrorHandler.BlockPushErrorHandler errorHandler;
+
+  @SuppressWarnings("UnstableApiUsage")
+  private final LoadingCache<File, ShuffleIndexInformation> indexCache;
+
+  @SuppressWarnings("UnstableApiUsage")
+  public RemoteBlockPushResolver(TransportConf conf, String relativeMergeDirPathPattern) {
+    this.conf = conf;
+    this.partitions = Maps.newConcurrentMap();
+    this.appsPathInfo = Maps.newConcurrentMap();
+    this.directoryCleaner = Executors.newSingleThreadExecutor(
+      // Add `spark` prefix because it will run in NM in Yarn mode.
+      NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner"));
+    this.minChunkSize = conf.minChunkSizeInMergedShuffleFile();
+    CacheLoader<File, ShuffleIndexInformation> indexCacheLoader =
+      new CacheLoader<File, ShuffleIndexInformation>() {
+        public ShuffleIndexInformation load(File file) throws IOException {
+          return new ShuffleIndexInformation(file);
+        }
+      };
+    indexCache = CacheBuilder.newBuilder()
+      .maximumWeight(conf.mergedIndexCacheSize())
+      .weigher((Weigher<File, ShuffleIndexInformation>) (file, indexInfo) -> indexInfo.getSize())
+      .build(indexCacheLoader);
+    this.relativeMergeDirPathPattern = relativeMergeDirPathPattern;
+    this.errorHandler = new ErrorHandler.BlockPushErrorHandler();
+  }
+
+  /**
+   * Given an ID that uniquely identifies a given shuffle partition of an application,
+   * retrieves the associated metadata. If 
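The `indexCache` built in the constructor above is a weight-bounded, load-on-miss Guava cache keyed by index file. The same pattern can be sketched with only the JDK; the class below (`WeightBoundedCache`, a hypothetical single-threaded stand-in, not part of the PR) uses an access-ordered `LinkedHashMap` for LRU iteration and the payload size as the weight, mirroring `maximumWeight` plus a `Weigher`:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical JDK-only sketch of the weight-bounded, load-on-miss cache idea
// behind indexCache (Guava CacheBuilder with maximumWeight + Weigher + CacheLoader).
// Single-threaded; String keys stand in for index files, byte[] for parsed index data.
public class WeightBoundedCache {
  private final long maxWeight;
  private long currentWeight = 0;
  // Access-ordered LinkedHashMap iterates least-recently-used entries first.
  private final LinkedHashMap<String, byte[]> entries =
      new LinkedHashMap<>(16, 0.75f, true);

  public WeightBoundedCache(long maxWeight) {
    this.maxWeight = maxWeight;
  }

  // Load-on-miss, like LoadingCache.get(key): compute and cache on a miss.
  public byte[] get(String key) {
    byte[] value = entries.get(key);
    if (value == null) {
      value = load(key);
      entries.put(key, value);
      currentWeight += value.length;  // the "weigher": weight = payload size
      evictIfNeeded();
    }
    return value;
  }

  // Stand-in loader for `new ShuffleIndexInformation(file)`.
  private byte[] load(String key) {
    return new byte[key.length()];
  }

  // Drop least-recently-used entries until the total weight fits the bound.
  private void evictIfNeeded() {
    Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
    while (currentWeight > maxWeight && it.hasNext()) {
      Map.Entry<String, byte[]> eldest = it.next();
      currentWeight -= eldest.getValue().length;
      it.remove();
    }
  }

  public int size() {
    return entries.size();
  }

  public static void main(String[] args) {
    WeightBoundedCache cache = new WeightBoundedCache(8);
    cache.get("aaaa");  // weight 4
    cache.get("bbbb");  // weight 8
    cache.get("cc");    // weight 10 -> evicts "aaaa" (LRU), back to 6
    System.out.println(cache.size());  // prints 2
  }
}
```

Guava's `CacheBuilder` layers concurrency and loader exception handling on top of this idea; bounding by total weight (cached index bytes) rather than entry count is what keeps many small index files and a few large ones under the same memory budget.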

[GitHub] [spark] SparkQA commented on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path

2020-10-19 Thread GitBox


SparkQA commented on pull request #30057:
URL: https://github.com/apache/spark/pull/30057#issuecomment-712545891


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34636/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon closed pull request #29831: [SPARK-32351][SQL] Show partially pushed down partition filters in explain()

2020-10-19 Thread GitBox


HyukjinKwon closed pull request #29831:
URL: https://github.com/apache/spark/pull/29831


   






[GitHub] [spark] HyukjinKwon commented on pull request #29831: [SPARK-32351][SQL] Show partially pushed down partition filters in explain()

2020-10-19 Thread GitBox


HyukjinKwon commented on pull request #29831:
URL: https://github.com/apache/spark/pull/29831#issuecomment-712543428


   Merged to master.






[GitHub] [spark] HyukjinKwon edited a comment on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client

2020-10-19 Thread GitBox


HyukjinKwon edited a comment on pull request #29843:
URL: https://github.com/apache/spark/pull/29843#issuecomment-712540024


   > BTW I'm still not sure why my PR will trigger the YARN/Python test 
failures - seems it shouldn't be related.
   
   This is because the regular test cases do not trigger the YARN test cases. I am sure it was already broken before (in Jenkins). The relevant YARN test cases are triggered only when the PR changes something _on the YARN side_. See also https://github.com/apache/spark/blob/31a16fbb405a19dc3eb732347e0e1f873b16971d/dev/sparktestsupport/modules.py#L615
   
   See also https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130016/testReport/ at https://github.com/apache/spark/pull/29906
   
   cc @tgravescs FYI






[GitHub] [spark] HyukjinKwon edited a comment on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client

2020-10-19 Thread GitBox


HyukjinKwon edited a comment on pull request #29843:
URL: https://github.com/apache/spark/pull/29843#issuecomment-712540024


   > BTW I'm still not sure why my PR will trigger the YARN/Python test 
failures - seems it shouldn't be related.
   
   This is because the regular test cases do not trigger the YARN test cases. I am sure it was already broken before (in Jenkins). The relevant YARN test cases are triggered only when the PR changes something _on the YARN side_. See also https://github.com/apache/spark/blob/31a16fbb405a19dc3eb732347e0e1f873b16971d/dev/sparktestsupport/modules.py#L615
   
   See also https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130016/testReport/ at https://github.com/apache/spark/pull/29906






[GitHub] [spark] SparkQA commented on pull request #30095: [SPARK-33181][SQL][DOCS] Document Load Table Directly from File in SQL Select Reference

2020-10-19 Thread GitBox


SparkQA commented on pull request #30095:
URL: https://github.com/apache/spark/pull/30095#issuecomment-712542212


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34635/
   






[GitHub] [spark] HyukjinKwon edited a comment on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client

2020-10-19 Thread GitBox


HyukjinKwon edited a comment on pull request #29843:
URL: https://github.com/apache/spark/pull/29843#issuecomment-712540024


   > BTW I'm still not sure why my PR will trigger the YARN/Python test 
failures - seems it shouldn't be related.
   
   This is because the regular test cases do not trigger the YARN test cases. I am sure it was already broken before (in Jenkins). The relevant YARN test cases are triggered only when the PR changes something _on the YARN side_. See also https://github.com/apache/spark/blob/31a16fbb405a19dc3eb732347e0e1f873b16971d/dev/sparktestsupport/modules.py#L615
   






[GitHub] [spark] HyukjinKwon commented on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client

2020-10-19 Thread GitBox


HyukjinKwon commented on pull request #29843:
URL: https://github.com/apache/spark/pull/29843#issuecomment-712540024


   > BTW I'm still not sure why my PR will trigger the YARN/Python test 
failures - seems it shouldn't be related.
   
   This is because the regular test cases do not trigger the YARN test cases. I am sure it was already broken before. The relevant YARN test cases are triggered only when the PR changes something _on the YARN side_. See also https://github.com/apache/spark/blob/31a16fbb405a19dc3eb732347e0e1f873b16971d/dev/sparktestsupport/modules.py#L615
   






[GitHub] [spark] HyukjinKwon commented on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client

2020-10-19 Thread GitBox


HyukjinKwon commented on pull request #29843:
URL: https://github.com/apache/spark/pull/29843#issuecomment-712538725


   @sunchao, what I meant is: from my observation, Jenkins' `python` executable points to Python 2. So yes, it looks like we should switch `python` to Python 3 in Jenkins, cc @shaneknapp.
   
   Usually I would prefer to keep environment issues separate from the code so we can handle them separately. Also, from what I know, Shane is busy right now training a backup engineer.
   
   So I think changing to `python3` is fine for the time being as a workaround.






[GitHub] [spark] viirya commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

2020-10-19 Thread GitBox


viirya commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r508160798



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##
@@ -1056,8 +1058,14 @@ object EliminateSorts extends Rule[LogicalPlan] {
     case s @ Sort(orders, _, child) if orders.isEmpty || orders.exists(_.child.foldable) =>
       val newOrders = orders.filterNot(_.child.foldable)
       if (newOrders.isEmpty) child else s.copy(order = newOrders)
-    case Sort(orders, true, child) if SortOrder.orderingSatisfies(child.outputOrdering, orders) =>
-      child
+    case s @ Sort(orders, global, child)
+        if SortOrder.orderingSatisfies(child.outputOrdering, orders) =>
+      (global, child) match {
+        case (false, _) => child
+        case (true, r: Range) => r

Review comment:
   This assumes we know Range's global ordering in advance. This seems to 
leak physical stuff into the Optimizer.
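For context on the `orderingSatisfies` guard in the diff above, here is a simplified stand-in (hypothetical, not Spark's actual `SortOrder` API, which compares `SortOrder` expressions semantically): a requested ordering is satisfied when it is a prefix of the child's output ordering, which is why the `Sort` node becomes redundant and the rule can drop it.

```java
import java.util.List;

// Simplified, hypothetical version of the SortOrder.orderingSatisfies check
// referenced in the diff: the requested ordering is satisfied when it is a
// prefix of the child's output ordering. Spark compares SortOrder expressions
// semantically; plain strings are used here purely for illustration.
public class OrderingCheck {
  static boolean orderingSatisfies(List<String> childOrdering, List<String> requested) {
    if (requested.size() > childOrdering.size()) {
      return false;
    }
    for (int i = 0; i < requested.size(); i++) {
      if (!childOrdering.get(i).equals(requested.get(i))) {
        return false;
      }
    }
    return true;  // an empty requested ordering is trivially satisfied
  }

  public static void main(String[] args) {
    // A Sort on (a, b) over a child already ordered by (a, b, c) is redundant...
    System.out.println(orderingSatisfies(
        List.of("a ASC", "b ASC", "c ASC"), List.of("a ASC", "b ASC")));  // true
    // ...but not when the requested ordering is not a prefix of the child's.
    System.out.println(orderingSatisfies(
        List.of("a ASC", "b ASC"), List.of("b ASC")));  // false
  }
}
```

The bug under discussion is that this check only tells you the *ordering* matches; for a *global* sort the optimizer must additionally know the child's ordering holds across partitions, which is the physical-layer knowledge viirya is objecting to leaking into the rule.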








[GitHub] [spark] AmplabJenkins commented on pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-19 Thread GitBox


AmplabJenkins commented on pull request #30062:
URL: https://github.com/apache/spark/pull/30062#issuecomment-712536767










[GitHub] [spark] SparkQA commented on pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-19 Thread GitBox


SparkQA commented on pull request #30062:
URL: https://github.com/apache/spark/pull/30062#issuecomment-712536396


   **[Test build #130026 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130026/testReport)**
 for PR 30062 at commit 
[`fbdd333`](https://github.com/apache/spark/commit/fbdd33385083adb4be83adf46cd518d519650307).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] viirya commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

2020-10-19 Thread GitBox


viirya commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r508159260



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##
@@ -1056,8 +1058,14 @@ object EliminateSorts extends Rule[LogicalPlan] {
     case s @ Sort(orders, _, child) if orders.isEmpty || orders.exists(_.child.foldable) =>
       val newOrders = orders.filterNot(_.child.foldable)
       if (newOrders.isEmpty) child else s.copy(order = newOrders)
-    case Sort(orders, true, child) if SortOrder.orderingSatisfies(child.outputOrdering, orders) =>

Review comment:
   This was added in 2.4. 
   
   cc @cloud-fan 








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30095: [SPARK-33181][SQL][DOCS] Document Load Table Directly from File in SQL Select Reference

2020-10-19 Thread GitBox


AmplabJenkins removed a comment on pull request #30095:
URL: https://github.com/apache/spark/pull/30095#issuecomment-712532723










[GitHub] [spark] AmplabJenkins commented on pull request #30095: [SPARK-33181][SQL][DOCS] Document Load Table Directly from File in SQL Select Reference

2020-10-19 Thread GitBox


AmplabJenkins commented on pull request #30095:
URL: https://github.com/apache/spark/pull/30095#issuecomment-712532723










[GitHub] [spark] SparkQA removed a comment on pull request #30095: [SPARK-33181][SQL][DOCS] Document Load Table Directly from File in SQL Select Reference

2020-10-19 Thread GitBox


SparkQA removed a comment on pull request #30095:
URL: https://github.com/apache/spark/pull/30095#issuecomment-712529719


   **[Test build #130029 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130029/testReport)**
 for PR 30095 at commit 
[`dbb8111`](https://github.com/apache/spark/commit/dbb811140f80a30ea5b781e266443c7a76483564).






[GitHub] [spark] SparkQA commented on pull request #30095: [SPARK-33181][SQL][DOCS] Document Load Table Directly from File in SQL Select Reference

2020-10-19 Thread GitBox


SparkQA commented on pull request #30095:
URL: https://github.com/apache/spark/pull/30095#issuecomment-712532633


   **[Test build #130029 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130029/testReport)**
 for PR 30095 at commit 
[`dbb8111`](https://github.com/apache/spark/commit/dbb811140f80a30ea5b781e266443c7a76483564).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #30057: [SPARK-32838][SQL]Check DataSource insert command path with actual path

2020-10-19 Thread GitBox


SparkQA commented on pull request #30057:
URL: https://github.com/apache/spark/pull/30057#issuecomment-712531902


   **[Test build #130030 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130030/testReport)**
 for PR 30057 at commit 
[`351af96`](https://github.com/apache/spark/commit/351af96604aa50cee0197994dcbb6f91a6994304).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28026: [SPARK-31257][SQL] Unify create table syntax

2020-10-19 Thread GitBox


AmplabJenkins removed a comment on pull request #28026:
URL: https://github.com/apache/spark/pull/28026#issuecomment-712517112


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130024/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28026: [SPARK-31257][SQL] Unify create table syntax

2020-10-19 Thread GitBox


AmplabJenkins removed a comment on pull request #28026:
URL: https://github.com/apache/spark/pull/28026#issuecomment-712517108


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30095: [SPARK-33181][SQL][DOCS] Document Load Table Directly from File in SQL Select Reference

2020-10-19 Thread GitBox


AmplabJenkins removed a comment on pull request #30095:
URL: https://github.com/apache/spark/pull/30095#issuecomment-712475253


   Can one of the admins verify this patch?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-10-19 Thread GitBox


AmplabJenkins removed a comment on pull request #30062:
URL: https://github.com/apache/spark/pull/30062#issuecomment-712518597










[GitHub] [spark] SparkQA removed a comment on pull request #29818: [SPARK-32953][PYTHON] Add Arrow self_destruct support to toPandas

2020-10-19 Thread GitBox


SparkQA removed a comment on pull request #29818:
URL: https://github.com/apache/spark/pull/29818#issuecomment-712398796


   **[Test build #130015 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130015/testReport)**
 for PR 29818 at commit 
[`1b875c1`](https://github.com/apache/spark/commit/1b875c19e3c6318fcb26c1ea62b5397d6e75d1f8).





