[GitHub] [spark] AmplabJenkins removed a comment on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement

2020-10-11 Thread GitBox


AmplabJenkins removed a comment on pull request #29893:
URL: https://github.com/apache/spark/pull/29893#issuecomment-706913968


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #29975: [SPARK-33092][SQL] Support subexpression elimination in ProjectExec

2020-10-11 Thread GitBox


SparkQA removed a comment on pull request #29975:
URL: https://github.com/apache/spark/pull/29975#issuecomment-706811835


   **[Test build #129650 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129650/testReport)**
 for PR 29975 at commit 
[`2414bb0`](https://github.com/apache/spark/commit/2414bb041f47c00de28428014372ab6c55435e56).






[GitHub] [spark] SparkQA removed a comment on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement

2020-10-11 Thread GitBox


SparkQA removed a comment on pull request #29893:
URL: https://github.com/apache/spark/pull/29893#issuecomment-706886780


   **[Test build #129666 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129666/testReport)**
 for PR 29893 at commit 
[`c76a2aa`](https://github.com/apache/spark/commit/c76a2aacb86411ef9d0c58ac996b274dce49beb7).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30010: [SPARK-33117][BUILD] Update zstd-jni to 1.4.5-6

2020-10-11 Thread GitBox


AmplabJenkins removed a comment on pull request #30010:
URL: https://github.com/apache/spark/pull/30010#issuecomment-706914459










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29975: [SPARK-33092][SQL] Support subexpression elimination in ProjectExec

2020-10-11 Thread GitBox


AmplabJenkins removed a comment on pull request #29975:
URL: https://github.com/apache/spark/pull/29975#issuecomment-706913861










[GitHub] [spark] SparkQA removed a comment on pull request #29881: [SPARK-32852][SQL] spark.sql.hive.metastore.jars support HDFS location

2020-10-11 Thread GitBox


SparkQA removed a comment on pull request #29881:
URL: https://github.com/apache/spark/pull/29881#issuecomment-706849086


   **[Test build #129662 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129662/testReport)**
 for PR 29881 at commit 
[`9ee1a86`](https://github.com/apache/spark/commit/9ee1a861e650f716870befcefd9820dc8425b9ae).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29881: [SPARK-32852][SQL] spark.sql.hive.metastore.jars support HDFS location

2020-10-11 Thread GitBox


AmplabJenkins removed a comment on pull request #29881:
URL: https://github.com/apache/spark/pull/29881#issuecomment-706911022










[GitHub] [spark] SparkQA commented on pull request #30010: [SPARK-33117][BUILD] Update zstd-jni to 1.4.5-6

2020-10-11 Thread GitBox


SparkQA commented on pull request #30010:
URL: https://github.com/apache/spark/pull/30010#issuecomment-706914439


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34269/
   






[GitHub] [spark] AmplabJenkins commented on pull request #30010: [SPARK-33117][BUILD] Update zstd-jni to 1.4.5-6

2020-10-11 Thread GitBox


AmplabJenkins commented on pull request #30010:
URL: https://github.com/apache/spark/pull/30010#issuecomment-706914459










[GitHub] [spark] AmplabJenkins commented on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement

2020-10-11 Thread GitBox


AmplabJenkins commented on pull request #29893:
URL: https://github.com/apache/spark/pull/29893#issuecomment-706913968










[GitHub] [spark] AmplabJenkins commented on pull request #29975: [SPARK-33092][SQL] Support subexpression elimination in ProjectExec

2020-10-11 Thread GitBox


AmplabJenkins commented on pull request #29975:
URL: https://github.com/apache/spark/pull/29975#issuecomment-706913861










[GitHub] [spark] SparkQA commented on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement

2020-10-11 Thread GitBox


SparkQA commented on pull request #29893:
URL: https://github.com/apache/spark/pull/29893#issuecomment-706913662


   **[Test build #129666 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129666/testReport)**
 for PR 29893 at commit 
[`c76a2aa`](https://github.com/apache/spark/commit/c76a2aacb86411ef9d0c58ac996b274dce49beb7).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] HeartSaVioR commented on pull request #29767: [SPARK-32896][SS] Add DataStreamWriter.table API

2020-10-11 Thread GitBox


HeartSaVioR commented on pull request #29767:
URL: https://github.com/apache/spark/pull/29767#issuecomment-706912198


   You're right. Would you like to go with a revert and another PR, or is this just for information?






[GitHub] [spark] SparkQA commented on pull request #29975: [SPARK-33092][SQL] Support subexpression elimination in ProjectExec

2020-10-11 Thread GitBox


SparkQA commented on pull request #29975:
URL: https://github.com/apache/spark/pull/29975#issuecomment-706912508


   **[Test build #129650 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129650/testReport)**
 for PR 29975 at commit 
[`2414bb0`](https://github.com/apache/spark/commit/2414bb041f47c00de28428014372ab6c55435e56).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] HeartSaVioR edited a comment on pull request #29767: [SPARK-32896][SS] Add DataStreamWriter.table API

2020-10-11 Thread GitBox


HeartSaVioR edited a comment on pull request #29767:
URL: https://github.com/apache/spark/pull/29767#issuecomment-706912198


   You're right. Would you like to go with a revert and another PR, or is this just for information? Either is fine with me.






[GitHub] [spark] SparkQA commented on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement

2020-10-11 Thread GitBox


SparkQA commented on pull request #29893:
URL: https://github.com/apache/spark/pull/29893#issuecomment-706911985


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34270/
   






[GitHub] [spark] AmplabJenkins commented on pull request #29881: [SPARK-32852][SQL] spark.sql.hive.metastore.jars support HDFS location

2020-10-11 Thread GitBox


AmplabJenkins commented on pull request #29881:
URL: https://github.com/apache/spark/pull/29881#issuecomment-706911022










[GitHub] [spark] SparkQA commented on pull request #29881: [SPARK-32852][SQL] spark.sql.hive.metastore.jars support HDFS location

2020-10-11 Thread GitBox


SparkQA commented on pull request #29881:
URL: https://github.com/apache/spark/pull/29881#issuecomment-706910030


   **[Test build #129662 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129662/testReport)**
 for PR 29881 at commit 
[`9ee1a86`](https://github.com/apache/spark/commit/9ee1a861e650f716870befcefd9820dc8425b9ae).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #30011: [WIP][SPARK-32281][SQL] Spark keep SORTED spec in metastore

2020-10-11 Thread GitBox


SparkQA commented on pull request #30011:
URL: https://github.com/apache/spark/pull/30011#issuecomment-706909487


   **[Test build #129669 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129669/testReport)**
 for PR 30011 at commit 
[`9364dd9`](https://github.com/apache/spark/commit/9364dd9699f2a756c191389a6923be70c6e22759).






[GitHub] [spark] SparkQA commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.

2020-10-11 Thread GitBox


SparkQA commented on pull request #27735:
URL: https://github.com/apache/spark/pull/27735#issuecomment-706907596


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34271/
   






[GitHub] [spark] AngersZhuuuu opened a new pull request #30011: [WIP][SPARK-32281][SQL] Spark keep SORTED spec in metastore

2020-10-11 Thread GitBox


AngersZhuuuu opened a new pull request #30011:
URL: https://github.com/apache/spark/pull/30011


   ### What changes were proposed in this pull request?
   Spark only supports bucket columns with ASC ordering, and the `CatalogTable` metadata does not keep Hive's bucket `SORTED BY` information. After certain commands cause Spark to rewrite the table metadata, the SORT spec in the Hive metastore is lost.
   
   
   ### Why are the changes needed?
   Spark should not change the `SORTED BY` part of a Hive table's metadata.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   WIP
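
   For illustration, a minimal sketch (hypothetical table and column names, assuming a Hive-enabled `spark` session) of the kind of Hive DDL whose sort spec is at stake:

```scala
// A Hive bucketed table declared with a SORTED BY spec. The concern described
// above is that CatalogTable does not carry the Hive bucket sort information,
// so when Spark later rewrites the table metadata, the SORTED BY spec stored
// in the metastore can be silently dropped.
spark.sql("""
  CREATE TABLE sales (id BIGINT, amount DOUBLE)
  CLUSTERED BY (id) SORTED BY (id ASC) INTO 8 BUCKETS
  STORED AS ORC
""")
```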






[GitHub] [spark] LantaoJin commented on pull request #29890: [SPARK-33014][SQL] Support multiple bucket columns in DataSourceV2 table

2020-10-11 Thread GitBox


LantaoJin commented on pull request #29890:
URL: https://github.com/apache/spark/pull/29890#issuecomment-706906267


   gentle ping @cloud-fan @HyukjinKwon @dongjoon-hyun 






[GitHub] [spark] leanken commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

2020-10-11 Thread GitBox


leanken commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706905860


   > > found two more place return Double.NaN with DivideByZero case. should I 
update these as well??
   > > and change ConfigName to spark.sql.legacy.divideByZeroEvaluation ??
   > 
   > Or, `spark.sql.legacy.centralMomentAndCovarianceAgg`?
   
   Changing it to `spark.sql.legacy.statisticalAggregate`.






[GitHub] [spark] leanken edited a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

2020-10-11 Thread GitBox


leanken edited a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706905860


   > > found two more place return Double.NaN with DivideByZero case. should I 
update these as well??
   > > and change ConfigName to spark.sql.legacy.divideByZeroEvaluation ??
   > 
   > Or, `spark.sql.legacy.centralMomentAndCovarianceAgg`?
   
   Changing it to `spark.sql.legacy.statisticalAggregate`, since all of these methods are statistical aggregate functions.
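
   As a sketch of the behavior being gated (using the config name proposed just above; the flag does not exist until this PR lands):

```scala
// Sketch of the divide-by-zero behavior change, using the config name proposed
// above (the flag only exists once this PR lands). stddev_samp over a single
// row divides by n - 1 = 0: the legacy behavior returns NaN, the new one null.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("stddev-demo").master("local[*]").getOrCreate()
spark.range(1).createOrReplaceTempView("t")

spark.conf.set("spark.sql.legacy.statisticalAggregate", "true")
spark.sql("SELECT stddev_samp(id) FROM t").show()   // NaN (legacy)

spark.conf.set("spark.sql.legacy.statisticalAggregate", "false")
spark.sql("SELECT stddev_samp(id) FROM t").show()   // NULL (proposed default)
```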






[GitHub] [spark] leanken commented on a change in pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

2020-10-11 Thread GitBox


leanken commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r503069104



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/WindowQuerySuite.scala
##
@@ -59,56 +60,115 @@ class WindowQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton
   }
 
   test("windowing.q -- 15. testExpressions") {
-// Moved because:
-// - Spark uses a different default stddev (sample instead of pop)
-// - Tiny numerical differences in stddev results.
-// - Different StdDev behavior when n=1 (NaN instead of 0)
-checkAnswer(sql(s"""
-  |select  p_mfgr,p_name, p_size,
-  |rank() over(distribute by p_mfgr sort by p_name) as r,
-  |dense_rank() over(distribute by p_mfgr sort by p_name) as dr,
-  |cume_dist() over(distribute by p_mfgr sort by p_name) as cud,
-  |percent_rank() over(distribute by p_mfgr sort by p_name) as pr,
-  |ntile(3) over(distribute by p_mfgr sort by p_name) as nt,
-  |count(p_size) over(distribute by p_mfgr sort by p_name) as ca,
-  |avg(p_size) over(distribute by p_mfgr sort by p_name) as avg,
-  |stddev(p_size) over(distribute by p_mfgr sort by p_name) as st,
-  |first_value(p_size % 5) over(distribute by p_mfgr sort by p_name) as fv,
-  |last_value(p_size) over(distribute by p_mfgr sort by p_name) as lv,
-  |first_value(p_size) over w1  as fvW1
-  |from part
-  |window w1 as (distribute by p_mfgr sort by p_mfgr, p_name
-  | rows between 2 preceding and 2 following)
+withSQLConf(SQLConf.LEGACY_CENTRAL_MOMENT_AGG.key -> "true") {

Review comment:
   done.

##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/WindowQuerySuite.scala
##
@@ -59,56 +60,115 @@ class WindowQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton
   }
 
   test("windowing.q -- 15. testExpressions") {
-// Moved because:
-// - Spark uses a different default stddev (sample instead of pop)
-// - Tiny numerical differences in stddev results.
-// - Different StdDev behavior when n=1 (NaN instead of 0)
-checkAnswer(sql(s"""
-  |select  p_mfgr,p_name, p_size,
-  |rank() over(distribute by p_mfgr sort by p_name) as r,
-  |dense_rank() over(distribute by p_mfgr sort by p_name) as dr,
-  |cume_dist() over(distribute by p_mfgr sort by p_name) as cud,
-  |percent_rank() over(distribute by p_mfgr sort by p_name) as pr,
-  |ntile(3) over(distribute by p_mfgr sort by p_name) as nt,
-  |count(p_size) over(distribute by p_mfgr sort by p_name) as ca,
-  |avg(p_size) over(distribute by p_mfgr sort by p_name) as avg,
-  |stddev(p_size) over(distribute by p_mfgr sort by p_name) as st,
-  |first_value(p_size % 5) over(distribute by p_mfgr sort by p_name) as fv,
-  |last_value(p_size) over(distribute by p_mfgr sort by p_name) as lv,
-  |first_value(p_size) over w1  as fvW1
-  |from part
-  |window w1 as (distribute by p_mfgr sort by p_mfgr, p_name
-  | rows between 2 preceding and 2 following)
+withSQLConf(SQLConf.LEGACY_CENTRAL_MOMENT_AGG.key -> "true") {
+  // Moved because:
+  // - Spark uses a different default stddev (sample instead of pop)
+  // - Tiny numerical differences in stddev results.
+  // - Different StdDev behavior when n=1 (NaN instead of 0)

Review comment:
   done

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CentralMomentAgg.scala
##
@@ -174,7 +175,9 @@ case class StddevSamp(child: Expression) extends 
CentralMomentAgg(child) {
 
   override val evaluateExpression: Expression = {
 If(n === 0.0, Literal.create(null, DoubleType),
-  If(n === 1.0, Double.NaN, sqrt(m2 / (n - 1.0))))
+  If(n === 1.0,
+if (SQLConf.get.legacyCentralMomentAgg) Double.NaN else 
Literal.create(null, DoubleType),

Review comment:
   done








[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero

2020-10-11 Thread GitBox


SparkQA commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706905385


   **[Test build #129668 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129668/testReport)**
 for PR 29983 at commit 
[`16bc5d5`](https://github.com/apache/spark/commit/16bc5d5c4310452fcbd0dea6e7e19e2fcbd10442).






[GitHub] [spark] SparkQA commented on pull request #30010: [SPARK-33117][BUILD] Update zstd-jni to 1.4.5-6

2020-10-11 Thread GitBox


SparkQA commented on pull request #30010:
URL: https://github.com/apache/spark/pull/30010#issuecomment-706904773


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34269/
   






[GitHub] [spark] cloud-fan commented on a change in pull request #29880: [SPARK-33004][SQL] Migrate DESCRIBE column to use UnresolvedTableOrView to resolve the identifier

2020-10-11 Thread GitBox


cloud-fan commented on a change in pull request #29880:
URL: https://github.com/apache/spark/pull/29880#discussion_r503068179



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala
##
@@ -315,6 +315,17 @@ case class DescribeRelation(
   override def output: Seq[Attribute] = 
DescribeTableSchema.describeTableAttributes()
 }
 
+/**
+ * The logical plan of the DESCRIBE relation_name col_name command that works 
for v2 tables.
+ */
+case class DescribeColumn(
+relation: LogicalPlan,
+colNameParts: Seq[String],

Review comment:
   Yeah, we can't break it. When we add a v2 `DescribeColumnExec` later, we may want to change `Seq[String]` to `Attribute` here, so that we can unify the column resolution logic for v1 and v2 commands.
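
   A rough, purely hypothetical sketch of that direction (the class name, `isExtended` field, and imports below are mine, for illustration only, not the PR's code):

```scala
// Hypothetical sketch of the suggestion above: carry a resolved Attribute
// instead of raw name parts, so a future v2 DescribeColumnExec could reuse the
// same column-resolution logic as the v1 command.
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

case class DescribeColumnSketch(
    relation: LogicalPlan,
    column: Attribute,      // replaces colNameParts: Seq[String]; resolved during analysis
    isExtended: Boolean)
```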








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29965: [SPARK-33016][SQL] Potential SQLMetrics missed which might cause WEB UI display issue while AQE is on.

2020-10-11 Thread GitBox


AmplabJenkins removed a comment on pull request #29965:
URL: https://github.com/apache/spark/pull/29965#issuecomment-706898226










[GitHub] [spark] SparkQA commented on pull request #29965: [SPARK-33016][SQL] Potential SQLMetrics missed which might cause WEB UI display issue while AQE is on.

2020-10-11 Thread GitBox


SparkQA commented on pull request #29965:
URL: https://github.com/apache/spark/pull/29965#issuecomment-706898191


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34268/
   






[GitHub] [spark] AmplabJenkins commented on pull request #29965: [SPARK-33016][SQL] Potential SQLMetrics missed which might cause WEB UI display issue while AQE is on.

2020-10-11 Thread GitBox


AmplabJenkins commented on pull request #29965:
URL: https://github.com/apache/spark/pull/29965#issuecomment-706898226










[GitHub] [spark] AmplabJenkins removed a comment on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.

2020-10-11 Thread GitBox


AmplabJenkins removed a comment on pull request #27735:
URL: https://github.com/apache/spark/pull/27735#issuecomment-706895987










[GitHub] [spark] AmplabJenkins commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.

2020-10-11 Thread GitBox


AmplabJenkins commented on pull request #27735:
URL: https://github.com/apache/spark/pull/27735#issuecomment-706895987










[GitHub] [spark] SparkQA commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.

2020-10-11 Thread GitBox


SparkQA commented on pull request #27735:
URL: https://github.com/apache/spark/pull/27735#issuecomment-706895485


   **[Test build #129667 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129667/testReport)**
 for PR 27735 at commit 
[`20fe56f`](https://github.com/apache/spark/commit/20fe56fbe80b3e2714dcd4fcae7ebfceb95e5443).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.

2020-10-11 Thread GitBox


SparkQA removed a comment on pull request #27735:
URL: https://github.com/apache/spark/pull/27735#issuecomment-706886873


   **[Test build #129667 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129667/testReport)**
 for PR 27735 at commit 
[`20fe56f`](https://github.com/apache/spark/commit/20fe56fbe80b3e2714dcd4fcae7ebfceb95e5443).






[GitHub] [spark] ScrapCodes commented on a change in pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.

2020-10-11 Thread GitBox


ScrapCodes commented on a change in pull request #27735:
URL: https://github.com/apache/spark/pull/27735#discussion_r503061670



##
File path: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountConfDirExecutorFeatureStep.scala
##
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.k8s.features
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model.{ContainerBuilder, PodBuilder}
+
+import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod}
+import org.apache.spark.deploy.k8s.Constants.{SPARK_CONF_DIR_INTERNAL, 
SPARK_CONF_VOL_EXEC}
+import org.apache.spark.deploy.k8s.submit.KubernetesClientUtils
+
+/**
+ * Populate the contents of SPARK_CONF_DIR, except spark-(conf|properties) and 
templates to
+ * configMap to be mounted on executor pods.
+ */
+private[spark] class MountConfDirExecutorFeatureStep(conf: KubernetesConf)

Review comment:
   Hi @dongjoon-hyun, thanks for taking a look.
   It could be merged into `BasicExecutorFeatureStep`, but I found that step is already doing a lot of stuff.
   
   Do you think keeping it as a separate step improves readability? I have no strong preference one way or the other.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30008: [SPARK-21708][BUILD][FOLLOWUP] Rename hdpVersion to hadoopVersionValue

2020-10-11 Thread GitBox


AmplabJenkins removed a comment on pull request #30008:
URL: https://github.com/apache/spark/pull/30008#issuecomment-706888990










[GitHub] [spark] AmplabJenkins commented on pull request #30008: [SPARK-21708][BUILD][FOLLOWUP] Rename hdpVersion to hadoopVersionValue

2020-10-11 Thread GitBox


AmplabJenkins commented on pull request #30008:
URL: https://github.com/apache/spark/pull/30008#issuecomment-706888990










[GitHub] [spark] dongjoon-hyun commented on pull request #30010: [SPARK-33117][BUILD] Update zstd-jni to 1.4.5-6

2020-10-11 Thread GitBox


dongjoon-hyun commented on pull request #30010:
URL: https://github.com/apache/spark/pull/30010#issuecomment-706888560


   Thank you for review and approval, @HyukjinKwon and @maropu .






[GitHub] [spark] SparkQA removed a comment on pull request #30008: [SPARK-21708][BUILD][FOLLOWUP] Rename hdpVersion to hadoopVersionValue

2020-10-11 Thread GitBox


SparkQA removed a comment on pull request #30008:
URL: https://github.com/apache/spark/pull/30008#issuecomment-706823341


   **[Test build #129654 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129654/testReport)**
 for PR 30008 at commit 
[`1161bd0`](https://github.com/apache/spark/commit/1161bd073e9fd4012903fd8fe05d62b198c60824).






[GitHub] [spark] SparkQA commented on pull request #30008: [SPARK-21708][BUILD][FOLLOWUP] Rename hdpVersion to hadoopVersionValue

2020-10-11 Thread GitBox


SparkQA commented on pull request #30008:
URL: https://github.com/apache/spark/pull/30008#issuecomment-706887924


   **[Test build #129654 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129654/testReport)**
 for PR 30008 at commit 
[`1161bd0`](https://github.com/apache/spark/commit/1161bd073e9fd4012903fd8fe05d62b198c60824).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement

2020-10-11 Thread GitBox


SparkQA commented on pull request #29893:
URL: https://github.com/apache/spark/pull/29893#issuecomment-706886780


   **[Test build #129666 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129666/testReport)**
 for PR 29893 at commit 
[`c76a2aa`](https://github.com/apache/spark/commit/c76a2aacb86411ef9d0c58ac996b274dce49beb7).






[GitHub] [spark] SparkQA commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.

2020-10-11 Thread GitBox


SparkQA commented on pull request #27735:
URL: https://github.com/apache/spark/pull/27735#issuecomment-706886873


   **[Test build #129667 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129667/testReport)**
 for PR 27735 at commit 
[`20fe56f`](https://github.com/apache/spark/commit/20fe56fbe80b3e2714dcd4fcae7ebfceb95e5443).






[GitHub] [spark] SparkQA commented on pull request #29965: [SPARK-33016][SQL] Potential SQLMetrics missed which might cause WEB UI display issue while AQE is on.

2020-10-11 Thread GitBox


SparkQA commented on pull request #29965:
URL: https://github.com/apache/spark/pull/29965#issuecomment-706886426


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34268/
   






[GitHub] [spark] Ngone51 commented on a change in pull request #29855: [SPARK-32915][CORE] Network-layer and shuffle RPC layer changes to support push shuffle blocks

2020-10-11 Thread GitBox


Ngone51 commented on a change in pull request #29855:
URL: https://github.com/apache/spark/pull/29855#discussion_r503055352



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ErrorHandler.java
##
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.net.ConnectException;
+
+/**
+ * Plugs into {@link RetryingBlockFetcher} to further control when an 
exception should be retried
+ * and logged.
+ * Note: {@link RetryingBlockFetcher} will delegate the exception to this 
handler only when
+ * - remaining retries < max retries
+ * - exception is an IOException
+ */
+
+public interface ErrorHandler {
+
+  boolean shouldRetryError(Throwable t);
+
+  default boolean shouldLogError(Throwable t) {
+return true;
+  }
+
+  /**
+   * A no-op error handler instance.
+   */
+  ErrorHandler NOOP_ERROR_HANDLER = t -> true;
+
+  /**
+   * The error handler for pushing shuffle blocks to remote shuffle services.
+   */
+  class BlockPushErrorHandler implements ErrorHandler {
+/**
+ * String constant used for generating exception messages indicating a 
block to be merged
+ * arrives too late on the server side, and also for later checking such 
exceptions on the
+ * client side. When we get a block push failure because of the block 
arrives too late, we
+ * will not retry pushing the block nor log the exception on the client 
side.
+ */
+public static final String TOO_LATE_MESSAGE_SUFFIX =
+"received after merged shuffle is finalized";
+
+/**
+ * String constant used for generating exception messages indicating the 
server couldn't
+ * append a block after all available attempts due to collision with other 
blocks belonging

Review comment:
   BTW, how do you determine the partition info of a given shuffle block? Do you use the block's `mapId`? If so, you should be aware of the new shuffle fetch protocol in 3.0: by default we use the `taskAttemptId` (which is unique across task attempts) instead of the `partitionId` as the `mapId`. This might break the current assumption for preventing collisions.
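
   For context, a small sketch of that 3.0 protocol detail (example ids are made up; not the push-based shuffle code itself):

```scala
// The mapId in a shuffle block id is the map task's unique attempt id rather
// than the map partition index, so two attempts of the same map partition
// produce different block ids.
import org.apache.spark.storage.ShuffleBlockId

val firstAttempt = ShuffleBlockId(shuffleId = 0, mapId = 12345L, reduceId = 3)
val retryAttempt = ShuffleBlockId(shuffleId = 0, mapId = 12346L, reduceId = 3) // same partition, new attempt
println(firstAttempt.name) // shuffle_0_12345_3
println(retryAttempt.name) // shuffle_0_12346_3
```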








[GitHub] [spark] yaooqinn commented on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement

2020-10-11 Thread GitBox


yaooqinn commented on pull request #29893:
URL: https://github.com/apache/spark/pull/29893#issuecomment-706885025


   Thanks for reviewing, and sorry for the late reply; I spent a long holiday with my family. I've addressed the comments, please take another look @gatorsmile @cloud-fan @wangyum. Thanks.
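
   For readers of the thread, a quick sketch of the syntax SPARK-32976 adds (hypothetical table, assuming a `spark` session as in `spark-shell`):

```scala
// Hypothetical illustration of the INSERT column-list syntax this PR targets;
// the table and values are made up. Values map positionally to the listed
// columns, which can appear in any order.
spark.sql("CREATE TABLE target(c1 INT, c2 STRING, c3 DOUBLE) USING parquet")
spark.sql("INSERT INTO target (c3, c1, c2) VALUES (2.0, 1, 'a')")
spark.sql("SELECT * FROM target").show() // c1=1, c2=a, c3=2.0
```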






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29933: [SPARK-26533][SQL] Support query auto timeout cancel on thriftserver

2020-10-11 Thread GitBox


AmplabJenkins removed a comment on pull request #29933:
URL: https://github.com/apache/spark/pull/29933#issuecomment-706879549










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29644: [SPARK-32598][Scheduler] Fix missing driver logs under UI App-Executors tab in standalone cluster mode

2020-10-11 Thread GitBox


AmplabJenkins removed a comment on pull request #29644:
URL: https://github.com/apache/spark/pull/29644#issuecomment-706879116










[GitHub] [spark] SparkQA removed a comment on pull request #29644: [SPARK-32598][Scheduler] Fix missing driver logs under UI App-Executors tab in standalone cluster mode

2020-10-11 Thread GitBox


SparkQA removed a comment on pull request #29644:
URL: https://github.com/apache/spark/pull/29644#issuecomment-706823356


   **[Test build #129656 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129656/testReport)**
 for PR 29644 at commit 
[`a0bdce9`](https://github.com/apache/spark/commit/a0bdce946d8d5df9a9dcf480e4a7a00ff43a0335).






[GitHub] [spark] cloud-fan commented on pull request #29767: [SPARK-32896][SS] Add DataStreamWriter.table API

2020-10-11 Thread GitBox


cloud-fan commented on pull request #29767:
URL: https://github.com/apache/spark/pull/29767#issuecomment-706883233


   We need to update the PR title; it's `saveAsTable`, not `table`.






[GitHub] [spark] SparkQA commented on pull request #30010: [SPARK-33117][BUILD] Update zstd-jni to 1.4.5-6

2020-10-11 Thread GitBox


SparkQA commented on pull request #30010:
URL: https://github.com/apache/spark/pull/30010#issuecomment-706880332


   **[Test build #129665 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129665/testReport)**
 for PR 30010 at commit 
[`aa7004e`](https://github.com/apache/spark/commit/aa7004e44a948ad86f6756e6ca3085e75462ee47).






[GitHub] [spark] cloud-fan commented on a change in pull request #29975: [SPARK-33092][SQL] Support subexpression elimination in ProjectExec

2020-10-11 Thread GitBox


cloud-fan commented on a change in pull request #29975:
URL: https://github.com/apache/spark/pull/29975#discussion_r503051694



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
##
@@ -1761,16 +1772,21 @@ object CodeGenerator extends Logging {
 
 case ref: BoundReference if ctx.currentVars != null &&
 ctx.currentVars(ref.ordinal) != null =>
-  val ExprCode(_, isNull, value) = ctx.currentVars(ref.ordinal)
-  collectLocalVariable(value)
-  collectLocalVariable(isNull)
+  val exprCode = ctx.currentVars(ref.ordinal)
+  // If the referred variable is not evaluated yet.
+  if (exprCode.code != EmptyBlock) {
+exprCodesNeedEvaluate += exprCode.copy()

Review comment:
   Sorry I was wrong, the copy here is necessary.
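
   A self-contained toy sketch of why the copy matters (names are illustrative, not Spark's actual `ExprCode`): the objects are mutable, so a copy taken before the original's code is blanked out keeps the one-time evaluation code.

```scala
// Toy analogue of the ExprCode pattern discussed above: take a copy that still
// carries the evaluation code, then clear the original so later references to
// the same variable emit no code and just reuse the already-computed value.
final case class EvalCode(var code: String, value: String)

val original = EvalCode(code = "int x = compute();", value = "x")
val evaluateOnce = original.copy() // keeps the one-time evaluation code
original.code = ""                 // subsequent references only use `x`

println(evaluateOnce.code) // int x = compute();
println(original.code)     // (empty)
```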








[GitHub] [spark] viirya commented on a change in pull request #29975: [SPARK-33092][SQL] Support subexpression elimination in ProjectExec

2020-10-11 Thread GitBox


viirya commented on a change in pull request #29975:
URL: https://github.com/apache/spark/pull/29975#discussion_r503051291



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
##
@@ -1761,16 +1772,21 @@ object CodeGenerator extends Logging {
 
 case ref: BoundReference if ctx.currentVars != null &&
 ctx.currentVars(ref.ordinal) != null =>
-  val ExprCode(_, isNull, value) = ctx.currentVars(ref.ordinal)
-  collectLocalVariable(value)
-  collectLocalVariable(isNull)
+  val exprCode = ctx.currentVars(ref.ordinal)
+  // If the referred variable is not evaluated yet.
+  if (exprCode.code != EmptyBlock) {
+exprCodesNeedEvaluate += exprCode.copy()

Review comment:
   `exprCode.code = EmptyBlock`, do you mean this?








[GitHub] [spark] maropu commented on a change in pull request #29544: [SPARK-32704][SQL] Logging plan changes for execution

2020-10-11 Thread GitBox


maropu commented on a change in pull request #29544:
URL: https://github.com/apache/spark/pull/29544#discussion_r503051134



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -229,17 +229,17 @@ object SQLConf {
 "'trace', 'debug', 'info', 'warn' and 'error'.")
 .createWithDefault("trace")
 
-  val OPTIMIZER_PLAN_CHANGE_LOG_RULES = 
buildConf("spark.sql.optimizer.planChangeLog.rules")
+  val PLAN_CHANGE_LOG_RULES = buildConf("spark.sql.planChangeLog.rules")
 .internal()
-.doc("Configures a list of rules to be logged in the optimizer, in which 
the rules are " +
+.doc("Configures a list of rules for logging plan changes, in which the 
rules are " +
   "specified by their rule names and separated by comma.")
 .version("3.0.0")
 .stringConf
 .createOptional
 
-  val OPTIMIZER_PLAN_CHANGE_LOG_BATCHES = 
buildConf("spark.sql.optimizer.planChangeLog.batches")
+  val PLAN_CHANGE_LOG_BATCHES = buildConf("spark.sql.planChangeLog.batches")
 .internal()
-.doc("Configures a list of batches to be logged in the optimizer, in which 
the batches " +
+.doc("Configures a list of batches for logging plan changes, in which the 
batches " +
   "are specified by their batch names and separated by comma.")
 .version("3.0.0")

Review comment:
   Ah, I missed it. I'll update it in a follow-up.

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -229,17 +229,17 @@ object SQLConf {
 "'trace', 'debug', 'info', 'warn' and 'error'.")
 .createWithDefault("trace")
 
-  val OPTIMIZER_PLAN_CHANGE_LOG_RULES = 
buildConf("spark.sql.optimizer.planChangeLog.rules")
+  val PLAN_CHANGE_LOG_RULES = buildConf("spark.sql.planChangeLog.rules")
 .internal()
-.doc("Configures a list of rules to be logged in the optimizer, in which 
the rules are " +
+.doc("Configures a list of rules for logging plan changes, in which the 
rules are " +
   "specified by their rule names and separated by comma.")
 .version("3.0.0")
 .stringConf
 .createOptional
 
-  val OPTIMIZER_PLAN_CHANGE_LOG_BATCHES = 
buildConf("spark.sql.optimizer.planChangeLog.batches")
+  val PLAN_CHANGE_LOG_BATCHES = buildConf("spark.sql.planChangeLog.batches")
 .internal()
-.doc("Configures a list of batches to be logged in the optimizer, in which 
the batches " +
+.doc("Configures a list of batches for logging plan changes, in which the 
batches " +
   "are specified by their batch names and separated by comma.")
 .version("3.0.0")

Review comment:
   Ah, I missed it. I'll update it in a follow-up. Thanks!
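
   A small usage sketch of the renamed configs shown in the diff above (assuming a `spark` session; the rule name is just an example):

```scala
// Usage sketch for the renamed (still internal) configs: log plan changes made
// by one specific rule at WARN level. The "optimizer." prefix is dropped since
// this PR extends plan-change logging beyond the optimizer (to execution).
spark.conf.set("spark.sql.planChangeLog.level", "warn")
spark.conf.set("spark.sql.planChangeLog.rules",
  "org.apache.spark.sql.catalyst.optimizer.PushDownPredicates")

spark.range(10).filter("id > 5").selectExpr("id + 1").collect()
// If the listed rule changes the plan, the before/after plans are logged at WARN.
```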








[GitHub] [spark] SparkQA commented on pull request #29933: [SPARK-26533][SQL] Support query auto timeout cancel on thriftserver

2020-10-11 Thread GitBox


SparkQA commented on pull request #29933:
URL: https://github.com/apache/spark/pull/29933#issuecomment-706879523


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34267/
   






[GitHub] [spark] AmplabJenkins commented on pull request #29933: [SPARK-26533][SQL] Support query auto timeout cancel on thriftserver

2020-10-11 Thread GitBox


AmplabJenkins commented on pull request #29933:
URL: https://github.com/apache/spark/pull/29933#issuecomment-706879549







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29644: [SPARK-32598][Scheduler] Fix missing driver logs under UI App-Executors tab in standalone cluster mode

2020-10-11 Thread GitBox


AmplabJenkins commented on pull request #29644:
URL: https://github.com/apache/spark/pull/29644#issuecomment-706879116







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29975: [SPARK-33092][SQL] Support subexpression elimination in ProjectExec

2020-10-11 Thread GitBox


cloud-fan commented on a change in pull request #29975:
URL: https://github.com/apache/spark/pull/29975#discussion_r503050704



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
##
@@ -1761,16 +1772,21 @@ object CodeGenerator extends Logging {
 
 case ref: BoundReference if ctx.currentVars != null &&
 ctx.currentVars(ref.ordinal) != null =>
-  val ExprCode(_, isNull, value) = ctx.currentVars(ref.ordinal)
-  collectLocalVariable(value)
-  collectLocalVariable(isNull)
+  val exprCode = ctx.currentVars(ref.ordinal)
+  // If the referred variable is not evaluated yet.
+  if (exprCode.code != EmptyBlock) {
+exprCodesNeedEvaluate += exprCode.copy()

Review comment:
   > We need to empty code of exprCode so we don't re-evaluate the code.
   
   AFAIK when we empty the code, we also do a copy, right?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun opened a new pull request #30010: [SPARK-33117][BUILD] Update zstd-jni to 1.4.5-6

2020-10-11 Thread GitBox


dongjoon-hyun opened a new pull request #30010:
URL: https://github.com/apache/spark/pull/30010


   ### What changes were proposed in this pull request?
   
   This PR aims to upgrade ZStandard library for Apache Spark 3.1.0.
   
   ### Why are the changes needed?
   
   This will bring the latest bug fixes.
   - 
https://github.com/luben/zstd-jni/commit/2662fbdc320ce482a24c20b8fcac8b1d5b79fe33
   - 
https://github.com/luben/zstd-jni/commit/bbe140b758be2e0ba64566e16d44cafd6e4ba142
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Pass the CI.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29644: [SPARK-32598][Scheduler] Fix missing driver logs under UI App-Executors tab in standalone cluster mode

2020-10-11 Thread GitBox


SparkQA commented on pull request #29644:
URL: https://github.com/apache/spark/pull/29644#issuecomment-706878158


   **[Test build #129656 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129656/testReport)**
 for PR 29644 at commit 
[`a0bdce9`](https://github.com/apache/spark/commit/a0bdce946d8d5df9a9dcf480e4a7a00ff43a0335).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gatorsmile commented on a change in pull request #29544: [SPARK-32704][SQL] Logging plan changes for execution

2020-10-11 Thread GitBox


gatorsmile commented on a change in pull request #29544:
URL: https://github.com/apache/spark/pull/29544#discussion_r503049350



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -229,17 +229,17 @@ object SQLConf {
 "'trace', 'debug', 'info', 'warn' and 'error'.")
 .createWithDefault("trace")
 
-  val OPTIMIZER_PLAN_CHANGE_LOG_RULES = 
buildConf("spark.sql.optimizer.planChangeLog.rules")
+  val PLAN_CHANGE_LOG_RULES = buildConf("spark.sql.planChangeLog.rules")
 .internal()
-.doc("Configures a list of rules to be logged in the optimizer, in which 
the rules are " +
+.doc("Configures a list of rules for logging plan changes, in which the 
rules are " +
   "specified by their rule names and separated by comma.")
 .version("3.0.0")
 .stringConf
 .createOptional
 
-  val OPTIMIZER_PLAN_CHANGE_LOG_BATCHES = 
buildConf("spark.sql.optimizer.planChangeLog.batches")
+  val PLAN_CHANGE_LOG_BATCHES = buildConf("spark.sql.planChangeLog.batches")
 .internal()
-.doc("Configures a list of batches to be logged in the optimizer, in which 
the batches " +
+.doc("Configures a list of batches for logging plan changes, in which the 
batches " +
   "are specified by their batch names and separated by comma.")
 .version("3.0.0")

Review comment:
   The version of these three configurations needs to be updated to 3.1.0? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29817: [SPARK-32850][CORE][K8S] Simplify the RPC message flow of decommission

2020-10-11 Thread GitBox


AmplabJenkins removed a comment on pull request #29817:
URL: https://github.com/apache/spark/pull/29817#issuecomment-706875853







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29881: [SPARK-32852][SQL] spark.sql.hive.metastore.jars support HDFS location

2020-10-11 Thread GitBox


AmplabJenkins removed a comment on pull request #29881:
URL: https://github.com/apache/spark/pull/29881#issuecomment-706875583







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29817: [SPARK-32850][CORE][K8S] Simplify the RPC message flow of decommission

2020-10-11 Thread GitBox


AmplabJenkins commented on pull request #29817:
URL: https://github.com/apache/spark/pull/29817#issuecomment-706875853







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29881: [SPARK-32852][SQL] spark.sql.hive.metastore.jars support HDFS location

2020-10-11 Thread GitBox


AmplabJenkins commented on pull request #29881:
URL: https://github.com/apache/spark/pull/29881#issuecomment-706875583







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29881: [SPARK-32852][SQL] spark.sql.hive.metastore.jars support HDFS location

2020-10-11 Thread GitBox


SparkQA commented on pull request #29881:
URL: https://github.com/apache/spark/pull/29881#issuecomment-706875569


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34266/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #29817: [SPARK-32850][CORE][K8S] Simplify the RPC message flow of decommission

2020-10-11 Thread GitBox


SparkQA removed a comment on pull request #29817:
URL: https://github.com/apache/spark/pull/29817#issuecomment-706821615


   **[Test build #129652 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129652/testReport)**
 for PR 29817 at commit 
[`17bbbd9`](https://github.com/apache/spark/commit/17bbbd9461a898c898b87635e4167ef5e01360d2).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29817: [SPARK-32850][CORE][K8S] Simplify the RPC message flow of decommission

2020-10-11 Thread GitBox


SparkQA commented on pull request #29817:
URL: https://github.com/apache/spark/pull/29817#issuecomment-706874710


   **[Test build #129652 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129652/testReport)**
 for PR 29817 at commit 
[`17bbbd9`](https://github.com/apache/spark/commit/17bbbd9461a898c898b87635e4167ef5e01360d2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29933: [SPARK-26533][SQL] Support query auto timeout cancel on thriftserver

2020-10-11 Thread GitBox


AmplabJenkins removed a comment on pull request #29933:
URL: https://github.com/apache/spark/pull/29933#issuecomment-706874052







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29933: [SPARK-26533][SQL] Support query auto timeout cancel on thriftserver

2020-10-11 Thread GitBox


AmplabJenkins commented on pull request #29933:
URL: https://github.com/apache/spark/pull/29933#issuecomment-706874052







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #29933: [SPARK-26533][SQL] Support query auto timeout cancel on thriftserver

2020-10-11 Thread GitBox


SparkQA removed a comment on pull request #29933:
URL: https://github.com/apache/spark/pull/29933#issuecomment-706855156


   **[Test build #129663 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129663/testReport)**
 for PR 29933 at commit 
[`942ce58`](https://github.com/apache/spark/commit/942ce58331e5c535f7e3f1885f4caf3a93eb176d).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29933: [SPARK-26533][SQL] Support query auto timeout cancel on thriftserver

2020-10-11 Thread GitBox


SparkQA commented on pull request #29933:
URL: https://github.com/apache/spark/pull/29933#issuecomment-706873675


   **[Test build #129663 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129663/testReport)**
 for PR 29933 at commit 
[`942ce58`](https://github.com/apache/spark/commit/942ce58331e5c535f7e3f1885f4caf3a93eb176d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29933: [SPARK-26533][SQL] Support query auto timeout cancel on thriftserver

2020-10-11 Thread GitBox


SparkQA commented on pull request #29933:
URL: https://github.com/apache/spark/pull/29933#issuecomment-706871871


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34267/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] leanken commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

2020-10-11 Thread GitBox


leanken commented on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706869405


   > > Seems plausible to me.
   > 
   > ```
   > case class CovSample(left: Expression, right: Expression) extends 
Covariance(left, right) {
   >   override val evaluateExpression: Expression = {
   > If(n === 0.0, Literal.create(null, DoubleType),
   >   If(n === 1.0, Double.NaN, ck / (n - 1.0)))
   >   }
   > 
   > case class Corr(x: Expression, y: Expression)
   >   extends PearsonCorrelation(x, y) {
   > 
   >   override val evaluateExpression: Expression = {
   > If(n === 0.0, Literal.create(null, DoubleType),
   >   If(n === 1.0, Double.NaN, ck / sqrt(xMk * yMk)))
   >   }
   > ```
   > 
   > found two more place return Double.NaN with DivideByZero case. should I 
update these as well??
   > and change ConfigName to `spark.sql.legacy.divideByZeroEvaluation` ??
   
   @cloud-fan how about CovSample and Corr ?? should I update them as well in 
these patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.

2020-10-11 Thread GitBox


SparkQA commented on pull request #2:
URL: https://github.com/apache/spark/pull/2#issuecomment-706869300


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34265/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.

2020-10-11 Thread GitBox


AmplabJenkins commented on pull request #2:
URL: https://github.com/apache/spark/pull/2#issuecomment-706869315







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.

2020-10-11 Thread GitBox


AmplabJenkins removed a comment on pull request #2:
URL: https://github.com/apache/spark/pull/2#issuecomment-706869315







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] leanken edited a comment on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

2020-10-11 Thread GitBox


leanken edited a comment on pull request #29983:
URL: https://github.com/apache/spark/pull/29983#issuecomment-706869405


   > > Seems plausible to me.
   > 
   > ```
   > case class CovSample(left: Expression, right: Expression) extends 
Covariance(left, right) {
   >   override val evaluateExpression: Expression = {
   > If(n === 0.0, Literal.create(null, DoubleType),
   >   If(n === 1.0, Double.NaN, ck / (n - 1.0)))
   >   }
   > 
   > case class Corr(x: Expression, y: Expression)
   >   extends PearsonCorrelation(x, y) {
   > 
   >   override val evaluateExpression: Expression = {
   > If(n === 0.0, Literal.create(null, DoubleType),
   >   If(n === 1.0, Double.NaN, ck / sqrt(xMk * yMk)))
   >   }
   > ```
   > 
   > found two more place return Double.NaN with DivideByZero case. should I 
update these as well??
   > and change ConfigName to `spark.sql.legacy.divideByZeroEvaluation` ??
   
   @cloud-fan how about CovSample and Corr ?? should I update them as well in 
this patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29881: [SPARK-32852][SQL] spark.sql.hive.metastore.jars support HDFS location

2020-10-11 Thread GitBox


SparkQA commented on pull request #29881:
URL: https://github.com/apache/spark/pull/29881#issuecomment-706868381


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34266/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29965: [SPARK-33016][SQL] Potential SQLMetrics missed which might cause WEB UI display issue while AQE is on.

2020-10-11 Thread GitBox


SparkQA commented on pull request #29965:
URL: https://github.com/apache/spark/pull/29965#issuecomment-706867654


   **[Test build #129664 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129664/testReport)**
 for PR 29965 at commit 
[`77923c5`](https://github.com/apache/spark/commit/77923c5542ba762646c8a56fe97a1a2e3f553fab).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] leanken commented on a change in pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero

2020-10-11 Thread GitBox


leanken commented on a change in pull request #29983:
URL: https://github.com/apache/spark/pull/29983#discussion_r503043290



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/WindowQuerySuite.scala
##
@@ -59,56 +60,115 @@ class WindowQuerySuite extends QueryTest with SQLTestUtils 
with TestHiveSingleto
   }
 
   test("windowing.q -- 15. testExpressions") {
-// Moved because:
-// - Spark uses a different default stddev (sample instead of pop)
-// - Tiny numerical differences in stddev results.
-// - Different StdDev behavior when n=1 (NaN instead of 0)
-checkAnswer(sql(s"""
-  |select  p_mfgr,p_name, p_size,
-  |rank() over(distribute by p_mfgr sort by p_name) as r,
-  |dense_rank() over(distribute by p_mfgr sort by p_name) as dr,
-  |cume_dist() over(distribute by p_mfgr sort by p_name) as cud,
-  |percent_rank() over(distribute by p_mfgr sort by p_name) as pr,
-  |ntile(3) over(distribute by p_mfgr sort by p_name) as nt,
-  |count(p_size) over(distribute by p_mfgr sort by p_name) as ca,
-  |avg(p_size) over(distribute by p_mfgr sort by p_name) as avg,
-  |stddev(p_size) over(distribute by p_mfgr sort by p_name) as st,
-  |first_value(p_size % 5) over(distribute by p_mfgr sort by p_name) as fv,
-  |last_value(p_size) over(distribute by p_mfgr sort by p_name) as lv,
-  |first_value(p_size) over w1  as fvW1
-  |from part
-  |window w1 as (distribute by p_mfgr sort by p_mfgr, p_name
-  | rows between 2 preceding and 2 following)
+withSQLConf(SQLConf.LEGACY_CENTRAL_MOMENT_AGG.key -> "true") {

Review comment:
   I see your point. the legacy coverage code will only stay in 
DataFrameWindowFunctionsSuite, other UT will just be updated to expected answer.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30005: [SPARK-33107][BUILD][FOLLOW-UP] Remove com.twitter:parquet-hadoop-bundle:1.6.0 and orc.classifier

2020-10-11 Thread GitBox


AmplabJenkins removed a comment on pull request #30005:
URL: https://github.com/apache/spark/pull/30005#issuecomment-706865981


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129655/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] leanken commented on a change in pull request #29965: [SPARK-33016][SQL] Potential SQLMetrics missed which might cause WEB UI display issue while AQE is on.

2020-10-11 Thread GitBox


leanken commented on a change in pull request #29965:
URL: https://github.com/apache/spark/pull/29965#discussion_r503042637



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala
##
@@ -341,7 +341,7 @@ class SQLAppStatusListener(
 
 val exec = getOrCreateExecution(executionId)
 exec.physicalPlanDescription = physicalPlanDescription
-exec.metrics = sqlPlanMetrics
+exec.metrics = exec.metrics ++ sqlPlanMetrics

Review comment:
   done

##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListenerSuite.scala
##
@@ -754,8 +754,6 @@ class SQLAppStatusListenerSuite extends SharedSparkSession 
with JsonTestUtils
   SparkPlanGraph(newPlan)
 .allNodes.flatMap(_.metrics.map(_.accumulatorId))
 
-// Assume that AQE update sparkPlanInfo with newPlan
-// ExecutionMetrics will be replaced using newPlan's SQLMetrics

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30005: [SPARK-33107][BUILD][FOLLOW-UP] Remove com.twitter:parquet-hadoop-bundle:1.6.0 and orc.classifier

2020-10-11 Thread GitBox


AmplabJenkins commented on pull request #30005:
URL: https://github.com/apache/spark/pull/30005#issuecomment-706865978







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30005: [SPARK-33107][BUILD][FOLLOW-UP] Remove com.twitter:parquet-hadoop-bundle:1.6.0 and orc.classifier

2020-10-11 Thread GitBox


AmplabJenkins removed a comment on pull request #30005:
URL: https://github.com/apache/spark/pull/30005#issuecomment-706865978


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30005: [SPARK-33107][BUILD][FOLLOW-UP] Remove com.twitter:parquet-hadoop-bundle:1.6.0 and orc.classifier

2020-10-11 Thread GitBox


SparkQA removed a comment on pull request #30005:
URL: https://github.com/apache/spark/pull/30005#issuecomment-706823328


   **[Test build #129655 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129655/testReport)**
 for PR 30005 at commit 
[`286ccc5`](https://github.com/apache/spark/commit/286ccc5d3025b6ff2da34c22e670746fd92e21f4).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30005: [SPARK-33107][BUILD][FOLLOW-UP] Remove com.twitter:parquet-hadoop-bundle:1.6.0 and orc.classifier

2020-10-11 Thread GitBox


SparkQA commented on pull request #30005:
URL: https://github.com/apache/spark/pull/30005#issuecomment-706865108


   **[Test build #129655 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129655/testReport)**
 for PR 30005 at commit 
[`286ccc5`](https://github.com/apache/spark/commit/286ccc5d3025b6ff2da34c22e670746fd92e21f4).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29881: [SPARK-32852][SQL] spark.sql.hive.metastore.jars support HDFS location

2020-10-11 Thread GitBox


AngersZh commented on a change in pull request #29881:
URL: https://github.com/apache/spark/pull/29881#discussion_r503041748



##
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
##
@@ -396,6 +417,93 @@ private[spark] object HiveUtils extends Logging {
 config = configurations,
 barrierPrefixes = hiveMetastoreBarrierPrefixes,
 sharedPrefixes = hiveMetastoreSharedPrefixes)
+} else if (hiveMetastoreJars == "path") {
+
+  def addLocalHiveJars(file: File): Seq[URL] = {
+if (file.getName == "*") {
+  val files = file.getParentFile.listFiles()
+  if (files == null) {
+logWarning(s"Hive jar path '${file.getPath}' does not exist.")
+Nil
+  } else {
+
files.filter(_.getName.toLowerCase(Locale.ROOT).endsWith(".jar")).map(_.toURL).toSeq
+  }
+} else {
+  file.toURL :: Nil
+}
+  }
+
+  def checkRemoteHiveJars(path: String): Seq[URL] = {
+try {
+  val hadoopPath = new Path(path)
+  val fs = hadoopPath.getFileSystem(hadoopConf)
+  if (hadoopPath.getName == "*") {
+val parent = hadoopPath.getParent
+if (!fs.exists(parent)) {
+  logWarning(s"Hive Jar ${path} does not exist.")
+  Nil
+} else if (!fs.getFileStatus(parent).isDirectory) {
+  logWarning(s"Hive Jar ${parent} is not a directory.")
+  Nil
+} else {
+  fs.listStatus(parent).map(_.getPath.toUri.toURL)
+}
+  } else {
+if (!fs.exists(hadoopPath)) {
+  logWarning(s"Hive Jar ${path} does not exist.")
+  Nil
+} else if (fs.getFileStatus(hadoopPath).isDirectory) {
+  logWarning(s"Hive Jar ${path} not allow directory without `*`")
+  Nil
+} else {
+  // Since tar/tar.gz file we can't know it's final path yet, not 
support it
+  hadoopPath.toUri.toURL :: Nil

Review comment:
   > I see. So Hadoop side has a `URLStreamHandlerFactory` impl which 
provides handlers for the Hadoop-specific URL schemes. Not sure about the S3 
error though :-/
   
   Yea,  dig into the code, seems s3 find config will call URLClassLoader and 
not equal make it reload element again and again.  





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.

2020-10-11 Thread GitBox


AmplabJenkins removed a comment on pull request #2:
URL: https://github.com/apache/spark/pull/2#issuecomment-706863643


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34264/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.

2020-10-11 Thread GitBox


AmplabJenkins removed a comment on pull request #2:
URL: https://github.com/apache/spark/pull/2#issuecomment-706863638


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.

2020-10-11 Thread GitBox


AmplabJenkins commented on pull request #2:
URL: https://github.com/apache/spark/pull/2#issuecomment-706863638







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.

2020-10-11 Thread GitBox


SparkQA commented on pull request #2:
URL: https://github.com/apache/spark/pull/2#issuecomment-706863628


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34264/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gatorsmile edited a comment on pull request #29552: [SPARK-32481][CORE][SQL] Support truncate table to move data to trash

2020-10-11 Thread GitBox


gatorsmile edited a comment on pull request #29552:
URL: https://github.com/apache/spark/pull/29552#issuecomment-706862532


   I like the concept of Trash, but I think this PR might just resolve a very 
specific issue by introducing a mechanism without a proper design doc. This 
could make the usage more complex.
   
   I think we need to consider the big picture. Trash directory is an important 
concept. If we decide to introduce it, we should consider all the code paths of 
Spark SQL that could delete the data, instead of Truncate only. We also need to 
consider what is the current behavior if the underlying file system does not 
provide the API `Trash.moveToAppropriateTrash`. Is the exception good? How 
about the performance when users are using the object store instead of HDFS? 
Will it impact the GDPR compliance? 
   
   I think we should not merge the PR without the design doc and implementation 
plan. Above just lists some questions I have. The other community members might 
have more relevant questions/issues we need to resolve. 
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gatorsmile commented on pull request #29552: [SPARK-32481][CORE][SQL] Support truncate table to move data to trash

2020-10-11 Thread GitBox


gatorsmile commented on pull request #29552:
URL: https://github.com/apache/spark/pull/29552#issuecomment-706862532


   I like the concept of Trash, but I think this PR might just resolve a very 
specific issue by introducing a mechanism without a proper design doc. This 
could make the usage more complex.
   
   I think we need to consider the big picture. Trash directory is an important 
concept. If we decide to introduce it, we should consider all the code paths of 
Spark SQL that could delete the data, instead of Truncate only. We also need to 
consider what is the current behavior if the underlying file system does not 
provide the API `Trash.moveToAppropriateTrash`. Is the exception good? How 
about the performance when users are using the object store instead of HDFS? 
Will it impact the GDPR compliance? 
   
   Without resolving the above issues, I think we should not merge the PR 
without the design and implementation plan.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.

2020-10-11 Thread GitBox


SparkQA commented on pull request #2:
URL: https://github.com/apache/spark/pull/2#issuecomment-706862463


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34265/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Ngone51 commented on a change in pull request #29855: [SPARK-32915][CORE] Network-layer and shuffle RPC layer changes to support push shuffle blocks

2020-10-11 Thread GitBox


Ngone51 commented on a change in pull request #29855:
URL: https://github.com/apache/spark/pull/29855#discussion_r503040276



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ErrorHandler.java
##
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.net.ConnectException;
+
+import com.google.common.base.Throwables;
+
+/**
+ * Plugs into {@link RetryingBlockFetcher} to further control when an 
exception should be retried
+ * and logged.
+ * Note: {@link RetryingBlockFetcher} will delegate the exception to this 
handler only when
+ * - remaining retries < max retries
+ * - exception is an IOException
+ */
+
+public interface ErrorHandler {
+
+  boolean shouldRetryError(Throwable t);
+
+  default boolean shouldLogError(Throwable t) {
+return true;
+  }
+
+  /**
+   * A no-op error handler instance.
+   */
+  ErrorHandler NOOP_ERROR_HANDLER = t -> true;
+
+  /**
+   * The error handler for pushing shuffle blocks to remote shuffle services.
+   */
+  class BlockPushErrorHandler implements ErrorHandler {
+/**
+ * String constant used for generating exception messages indicating a 
block to be merged
+ * arrives too late on the server side, and also for later checking such 
exceptions on the
+ * client side. When we get a block push failure because of the block 
arrives too late, we
+ * will not retry pushing the block nor log the exception on the client 
side.
+ */
+public static final String TOO_LATE_MESSAGE_SUFFIX =
+  "received after merged shuffle is finalized";
+
+/**
+ * String constant used for generating exception messages indicating the 
server couldn't
+ * append a block after all available attempts due to collision with other 
blocks belonging
+ * to the same shuffle partition, and also for later checking such 
exceptions on the client
+ * side. When we get a block push failure because of the block couldn't be 
written due to
+ * this reason, we will not log the exception on the client side.
+ */
+public static final String COULD_NOT_FIND_OPPORTUNITY_MSG_PREFIX =
+  "Couldn't find an opportunity to write block";
+
+@Override
+public boolean shouldRetryError(Throwable t) {
+  // If it is a connection time out or a connection closed exception, no 
need to retry.
+  if (t.getCause() != null && t.getCause() instanceof ConnectException) {
+return false;
+  }
+  // If the block is too late, there is no need to retry it
+  return 
!Throwables.getStackTraceAsString(t).contains(TOO_LATE_MESSAGE_SUFFIX);
+}
+
+@Override
+public boolean shouldLogError(Throwable t) {
+  String errorStackTrace = Throwables.getStackTraceAsString(t);
+  return !errorStackTrace.contains(COULD_NOT_FIND_OPPORTUNITY_MSG_PREFIX) 
&&
+!errorStackTrace.contains(TOO_LATE_MESSAGE_SUFFIX);

Review comment:
   > You mean the one on line 84? It should be `&&`, since we need to check 
both places.
   
   If we need to check both places, shouldn't we ensure the error message 
contains 2 COULD_NOT_FIND_OPPORTUNITY_MSG_PREFIX / TOO_LATE_MESSAGE_SUFFIX 
separately?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sunchao edited a comment on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client

2020-10-11 Thread GitBox


sunchao edited a comment on pull request #29843:
URL: https://github.com/apache/spark/pull/29843#issuecomment-706824666


   I'll take a look at the test failure which is new. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sunchao commented on a change in pull request #29881: [SPARK-32852][SQL] spark.sql.hive.metastore.jars support HDFS location

2020-10-11 Thread GitBox


sunchao commented on a change in pull request #29881:
URL: https://github.com/apache/spark/pull/29881#discussion_r503039921



##
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
##
@@ -396,6 +417,93 @@ private[spark] object HiveUtils extends Logging {
 config = configurations,
 barrierPrefixes = hiveMetastoreBarrierPrefixes,
 sharedPrefixes = hiveMetastoreSharedPrefixes)
+} else if (hiveMetastoreJars == "path") {
+
+  def addLocalHiveJars(file: File): Seq[URL] = {
+if (file.getName == "*") {
+  val files = file.getParentFile.listFiles()
+  if (files == null) {
+logWarning(s"Hive jar path '${file.getPath}' does not exist.")
+Nil
+  } else {
+
files.filter(_.getName.toLowerCase(Locale.ROOT).endsWith(".jar")).map(_.toURL).toSeq
+  }
+} else {
+  file.toURL :: Nil
+}
+  }
+
+  def checkRemoteHiveJars(path: String): Seq[URL] = {
+try {
+  val hadoopPath = new Path(path)
+  val fs = hadoopPath.getFileSystem(hadoopConf)
+  if (hadoopPath.getName == "*") {
+val parent = hadoopPath.getParent
+if (!fs.exists(parent)) {
+  logWarning(s"Hive Jar ${path} does not exist.")
+  Nil
+} else if (!fs.getFileStatus(parent).isDirectory) {
+  logWarning(s"Hive Jar ${parent} is not a directory.")
+  Nil
+} else {
+  fs.listStatus(parent).map(_.getPath.toUri.toURL)
+}
+  } else {
+if (!fs.exists(hadoopPath)) {
+  logWarning(s"Hive Jar ${path} does not exist.")
+  Nil
+} else if (fs.getFileStatus(hadoopPath).isDirectory) {
+  logWarning(s"Hive Jar ${path} not allow directory without `*`")
+  Nil
+} else {
+  // Since tar/tar.gz file we can't know it's final path yet, not 
support it
+  hadoopPath.toUri.toURL :: Nil

Review comment:
   I see. So Hadoop side has a `URLStreamHandlerFactory` impl which 
provides handlers for the Hadoop-specific URL schemes. Not sure about the S3 
error though :-/





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Ngone51 commented on a change in pull request #29855: [SPARK-32915][CORE] Network-layer and shuffle RPC layer changes to support push shuffle blocks

2020-10-11 Thread GitBox


Ngone51 commented on a change in pull request #29855:
URL: https://github.com/apache/spark/pull/29855#discussion_r503039472



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockPusher.java
##
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.Map;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.buffer.NioManagedBuffer;
+import org.apache.spark.network.client.RpcResponseCallback;
+import org.apache.spark.network.client.TransportClient;
+import org.apache.spark.network.shuffle.protocol.PushBlockStream;
+
+/**
+ * Similar to {@link OneForOneBlockFetcher}, but for pushing blocks to remote 
shuffle service to
+ * be merged instead of for fetching them from remote shuffle services. This 
is used by
+ * ShuffleWriter when the block push process is initiated. The supplied 
BlockFetchingListener
+ * is used to handle the success or failure in pushing each blocks.
+ */
+public class OneForOneBlockPusher {
+  private static final Logger logger = 
LoggerFactory.getLogger(OneForOneBlockPusher.class);
+  private static final ErrorHandler PUSH_ERROR_HANDLER = new 
ErrorHandler.BlockPushErrorHandler();
+
+  private final TransportClient client;
+  private final String appId;
+  private final String[] blockIds;
+  private final BlockFetchingListener listener;
+  private final Map buffers;
+
+  public OneForOneBlockPusher(
+  TransportClient client,
+  String appId,
+  String[] blockIds,
+  BlockFetchingListener listener,
+  Map buffers) {
+this.client = client;
+this.appId = appId;
+this.blockIds = blockIds;
+this.listener = listener;
+this.buffers = buffers;
+  }
+
+  private class BlockPushCallback implements RpcResponseCallback {
+
+private int index;
+private String blockId;
+
+BlockPushCallback(int index, String blockId) {
+  this.index = index;
+  this.blockId = blockId;
+}
+
+@Override
+public void onSuccess(ByteBuffer response) {
+  // On receipt of a successful block push
+  listener.onBlockFetchSuccess(blockId, new 
NioManagedBuffer(ByteBuffer.allocate(0)));
+}
+
+@Override
+public void onFailure(Throwable e) {
+  // Since block push is best effort, i.e., if we encountered a block push 
failure that's not
+  // retriable or exceeding the max retires, we should not fail all 
remaining block pushes.
+  // The best effort nature makes block push tolerable of a partial 
completion. Thus, we only
+  // fail the block that's actually failed. Not that, on the 
RetryingBlockFetcher side, once
+  // retry is initiated, it would still invalidate the previous active 
retry listener, and
+  // retry all outstanding blocks. We are preventing forwarding 
unnecessary block push failures
+  // to the parent listener of the retry listener. The only exceptions 
would be if the block
+  // push failure is due to block arriving on the server side after merge 
finalization, or the
+  // client fails to establish connection to the server side. In both 
cases, we would fail all
+  // remaining blocks.
+  if (PUSH_ERROR_HANDLER.shouldRetryError(e)) {
+String[] targetBlockId = Arrays.copyOfRange(blockIds, index, index + 
1);
+failRemainingBlocks(targetBlockId, e);
+  } else {
+String[] targetBlockId = Arrays.copyOfRange(blockIds, index, 
blockIds.length);

Review comment:
   > Here, if the exception is something that will happen for every block 
push, i.e., blocking being too late or we have a client side connection issue, 
all remaining blocks will fail.
   
   Would it have any side effect by calling `failRemainingBlocks` for multiple 
times for the same block in this case?
   
   I mean, we may call `failRemainingBlocks` for multiple times like 
`failRemainingBlocks(Array(0, 1, ..., 4), e)`, `failRemainingBlocks(Array(1, 
..., 4), e)`, ..., `failRemainingBlocks(Array(4), e)`.




-

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30006: [SPARK-33106][BUILD] Fix resolvers clash in SBT

2020-10-11 Thread GitBox


AmplabJenkins removed a comment on pull request #30006:
URL: https://github.com/apache/spark/pull/30006#issuecomment-706860082







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   3   4   5   >