[GitHub] [spark] manuzhang commented on pull request #29797: [SPARK-32932][SQL] Do not use local shuffle reader on RepartitionByExpression when coalescing disabled

2020-09-17 Thread GitBox


manuzhang commented on pull request #29797:
URL: https://github.com/apache/spark/pull/29797#issuecomment-694671759


   cc @maryannxue @cloud-fan @JkSelf @viirya 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29798: [SPARK-32931][SQL] Unevaluable Expressions are not Foldable

2020-09-17 Thread GitBox


AmplabJenkins commented on pull request #29798:
URL: https://github.com/apache/spark/pull/29798#issuecomment-694669964







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gatorsmile opened a new pull request #29798: [SPARK-32931][SQL] Unevaluable Expressions are not Foldable

2020-09-17 Thread GitBox


gatorsmile opened a new pull request #29798:
URL: https://github.com/apache/spark/pull/29798


   ### What changes were proposed in this pull request?
   Unevaluable expressions are not foldable because we don't have an eval for 
it. This PR is to clean up the code and enforce it. 
   
   ### Why are the changes needed?
   Ensure that we will not hit the weird cases that trigger ConstantFolding.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   The existing tests.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

2020-09-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-694667239







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

2020-09-17 Thread GitBox


AmplabJenkins commented on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-694667239







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

2020-09-17 Thread GitBox


SparkQA commented on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-694666900


   **[Test build #128856 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128856/testReport)**
 for PR 29779 at commit 
[`a79b48e`](https://github.com/apache/spark/commit/a79b48eb95f3199a6cff4db6217db562d4c3f62d).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on a change in pull request #29767: [SPARK-32896][SS] Add DataStreamWriter.table API

2020-09-17 Thread GitBox


HeartSaVioR commented on a change in pull request #29767:
URL: https://github.com/apache/spark/pull/29767#discussion_r490715412



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
##
@@ -300,54 +301,44 @@ final class DataStreamWriter[T] private[sql](ds: 
Dataset[T]) {
 "write files of Hive data source directly.")
 }
 
-if (source == "memory") {
+if (source == SOURCE_NAME_TABLE) {
+  assertNotPartitioned("table")
+
+  import df.sparkSession.sessionState.analyzer.CatalogAndIdentifier
+
+  import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
+  val CatalogAndIdentifier(catalog, identifier) = 
df.sparkSession.sessionState.sqlParser

Review comment:
   Done in e7cd27d - now it looks up (global) temp view directly and 
provide error message a bit clearer. Also added relevant tests.
   
   That said, I can't find the logic for fail-back in DataFrameWriterV2. It 
simply looks up with catalog, which temp view will not be found. Do I 
understand correctly, and if then is it a desired/expected behavior?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

2020-09-17 Thread GitBox


HyukjinKwon commented on a change in pull request #29779:
URL: https://github.com/apache/spark/pull/29779#discussion_r490714909



##
File path: python/docs/source/getting_started/installation.rst
##
@@ -1,114 +0,0 @@
-..  Licensed to the Apache Software Foundation (ASF) under one

Review comment:
   Ah .. looks like there are too many diff and it doesn't know that it was 
renamed ..





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29767: [SPARK-32896][SS] Add DataStreamWriter.table API

2020-09-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29767:
URL: https://github.com/apache/spark/pull/29767#issuecomment-694664677







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29767: [SPARK-32896][SS] Add DataStreamWriter.table API

2020-09-17 Thread GitBox


AmplabJenkins commented on pull request #29767:
URL: https://github.com/apache/spark/pull/29767#issuecomment-694664677







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29767: [SPARK-32896][SS] Add DataStreamWriter.table API

2020-09-17 Thread GitBox


SparkQA commented on pull request #29767:
URL: https://github.com/apache/spark/pull/29767#issuecomment-694664243


   **[Test build #128855 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128855/testReport)**
 for PR 29767 at commit 
[`e7cd27d`](https://github.com/apache/spark/commit/e7cd27dd3afd32d9b28e13d56af26f6b5e097c85).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #29780: [SPARK-32906][SQL] Struct field names should not change after normalizing floats

2020-09-17 Thread GitBox


dongjoon-hyun commented on pull request #29780:
URL: https://github.com/apache/spark/pull/29780#issuecomment-694663873


   Thank you, @maropu and @viirya and all!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhli1142015 commented on pull request #29757: [SPARK-32886][WEBUI] fix 'undefined' link in event timeline view

2020-09-17 Thread GitBox


zhli1142015 commented on pull request #29757:
URL: https://github.com/apache/spark/pull/29757#issuecomment-694663610


   > Thank you for pinging me, @zhli1142015 . The patch looks good to me since 
it's revised already.
   > 
   > Just one question, this bug seems to exist in `branch-2.4`, doesn't it? If 
then, could you update the affected version in 
[SPARK-32886](https://issues.apache.org/jira/browse/SPARK-32886) accordingly? 
Currently, the affected versions are 3.0.0 and 3.1.0.
   
   Thanks for your comments. I just verified, this bug can be reproduced in 
2.4.4. updated the JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] williamhyun commented on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


williamhyun commented on pull request #29796:
URL: https://github.com/apache/spark/pull/29796#issuecomment-694662809


   @HyukjinKwon I narrowed downed my PR according to your advice. Could you 
take a look again, please?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29796:
URL: https://github.com/apache/spark/pull/29796#issuecomment-694662074







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


AmplabJenkins commented on pull request #29796:
URL: https://github.com/apache/spark/pull/29796#issuecomment-694662074







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


SparkQA commented on pull request #29796:
URL: https://github.com/apache/spark/pull/29796#issuecomment-694661637


   **[Test build #128854 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128854/testReport)**
 for PR 29796 at commit 
[`71a0465`](https://github.com/apache/spark/commit/71a046515efd306f36e4894069abe234271b99d2).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] williamhyun commented on a change in pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


williamhyun commented on a change in pull request #29796:
URL: https://github.com/apache/spark/pull/29796#discussion_r490710752



##
File path: 
streaming/src/main/scala/org/apache/spark/streaming/util/HdfsUtils.scala
##
@@ -92,6 +92,10 @@ private[streaming] object HdfsUtils {
   def checkFileExists(path: String, conf: Configuration): Boolean = {
 val hdpPath = new Path(path)
 val fs = getFileSystemForPath(hdpPath, conf)
-fs.isFile(hdpPath)
+try {
+  fs.getFileStatus(hdpPath).isFile
+} catch {
+  case _: FileNotFoundException => false
+}

Review comment:
   This is the same logic with the isFile method. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] williamhyun commented on a change in pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


williamhyun commented on a change in pull request #29796:
URL: https://github.com/apache/spark/pull/29796#discussion_r490710569



##
File path: 
streaming/src/main/scala/org/apache/spark/streaming/util/HdfsUtils.scala
##
@@ -58,7 +58,7 @@ private[streaming] object HdfsUtils {
 // If we are really unlucky, the file may be deleted as we're opening 
the stream.
 // This can happen as clean up is performed by daemon threads that may 
be left over from
 // previous runs.
-if (!dfs.isFile(dfsPath)) null else throw e
+if (!dfs.getFileStatus(dfsPath).isFile) null else throw e

Review comment:
   This is safe because of lines 53 and 55. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


AmplabJenkins commented on pull request #29796:
URL: https://github.com/apache/spark/pull/29796#issuecomment-694659598







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29796:
URL: https://github.com/apache/spark/pull/29796#issuecomment-694659598







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


SparkQA commented on pull request #29796:
URL: https://github.com/apache/spark/pull/29796#issuecomment-694659074


   **[Test build #128853 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128853/testReport)**
 for PR 29796 at commit 
[`1c7dbae`](https://github.com/apache/spark/commit/1c7dbaef34827935d0b30bf3d6d55bcbb3005143).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] williamhyun commented on a change in pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


williamhyun commented on a change in pull request #29796:
URL: https://github.com/apache/spark/pull/29796#discussion_r490709194



##
File path: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
##
@@ -812,11 +812,12 @@ private[history] class FsHistoryProvider(conf: SparkConf, 
clock: Clock)
   try {
 // Fetch the entry first to avoid an RPC when it's already removed.
 listing.read(classOf[LogInfo], inProgressLog)
-if (!fs.isFile(new Path(inProgressLog))) {
+if (!fs.getFileStatus(new Path(inProgressLog)).isFile) {
   listing.delete(classOf[LogInfo], inProgressLog)
 }
   } catch {
 case _: NoSuchElementException =>
+case _: FileNotFoundException =>

Review comment:
   This is added for getFileStatus at line 815.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


HyukjinKwon commented on pull request #29796:
URL: https://github.com/apache/spark/pull/29796#issuecomment-694658323







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] williamhyun commented on a change in pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


williamhyun commented on a change in pull request #29796:
URL: https://github.com/apache/spark/pull/29796#discussion_r490708937



##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -1881,7 +1881,7 @@ class SparkContext(config: SparkConf) extends Logging {
   if (!fs.exists(hadoopPath)) {
 throw new FileNotFoundException(s"Jar ${path} not found")
   }
-  if (fs.isDirectory(hadoopPath)) {
+  if (fs.getFileStatus(hadoopPath).isDirectory) {

Review comment:
   This is safe because of `if (!fs.exists(hadoopPath)) {` at line 1881.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29796:
URL: https://github.com/apache/spark/pull/29796#issuecomment-694654723


   Can one of the admins verify this patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29794: [SPARK-32927][SQL] Bitwise OR, AND and XOR should have similar canonicalization rules to boolean OR and AND

2020-09-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29794:
URL: https://github.com/apache/spark/pull/29794#issuecomment-694657079







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29794: [SPARK-32927][SQL] Bitwise OR, AND and XOR should have similar canonicalization rules to boolean OR and AND

2020-09-17 Thread GitBox


AmplabJenkins commented on pull request #29794:
URL: https://github.com/apache/spark/pull/29794#issuecomment-694657079







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29794: [SPARK-32927][SQL] Bitwise OR, AND and XOR should have similar canonicalization rules to boolean OR and AND

2020-09-17 Thread GitBox


SparkQA commented on pull request #29794:
URL: https://github.com/apache/spark/pull/29794#issuecomment-694656707


   **[Test build #128852 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128852/testReport)**
 for PR 29794 at commit 
[`456ffd2`](https://github.com/apache/spark/commit/456ffd23efac94af15749e2bfb79fd262a16fb94).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29796:
URL: https://github.com/apache/spark/pull/29796#issuecomment-694654345


   Can one of the admins verify this patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


AmplabJenkins commented on pull request #29796:
URL: https://github.com/apache/spark/pull/29796#issuecomment-694654723


   Can one of the admins verify this patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


AmplabJenkins commented on pull request #29796:
URL: https://github.com/apache/spark/pull/29796#issuecomment-694654345


   Can one of the admins verify this patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya closed pull request #29780: [SPARK-32906][SQL] Struct field names should not change after normalizing floats

2020-09-17 Thread GitBox


viirya closed pull request #29780:
URL: https://github.com/apache/spark/pull/29780


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #29780: [SPARK-32906][SQL] Struct field names should not change after normalizing floats

2020-09-17 Thread GitBox


viirya commented on pull request #29780:
URL: https://github.com/apache/spark/pull/29780#issuecomment-694652505


   Thanks all! Merging to master/3.0.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] williamhyun commented on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


williamhyun commented on pull request #29796:
URL: https://github.com/apache/spark/pull/29796#issuecomment-694652478


   Thank you @HyukjinKwon I'll narrow down the score of this PR.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29585: [SPARK-32741][SQL] Check if the same ExprId refers to the unique attribute in logical plans

2020-09-17 Thread GitBox


cloud-fan commented on a change in pull request #29585:
URL: https://github.com/apache/spark/pull/29585#discussion_r490702813



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
##
@@ -203,3 +203,62 @@ abstract class BinaryNode extends LogicalPlan {
 abstract class OrderPreservingUnaryNode extends UnaryNode {
   override final def outputOrdering: Seq[SortOrder] = child.outputOrdering
 }
+
+object LogicalPlanIntegrity {
+
+  private def canGetOutputAttrs(p: LogicalPlan): Boolean = {
+p.resolved && !p.expressions.exists { e =>
+  e.collectFirst {
+// We cannot call `output` in plans with a `ScalarSubquery` expr 
having no column,
+// so, we filter out them in advance.
+case s: ScalarSubquery if s.plan.schema.fields.isEmpty => true
+  }.isDefined
+}
+  }
+
+  /**
+   * Since some logical plans (e.g., `Union`) can build `AttributeReference`s 
in their `output`,
+   * this method checks if the same `ExprId` refers to a semantically-equal 
attribute
+   * in a plan output.
+   */
+  def hasUniqueExprIdsForOutput(plan: LogicalPlan): Boolean = {
+val allOutputAttrs = plan.collect { case p if canGetOutputAttrs(p) =>
+  // NOTE: we still need to filter resolved expressions here because the 
output of
+  // some resolved logical plans can have unresolved references,
+  // e.g., outer references in `ExistenceJoin`.
+  p.output.filter(_.resolved).map(_.canonicalized.asInstanceOf[Attribute])

Review comment:
   I'd like to make things more explicit. If our goal is to make sure all 
the attributes with the same expr ID to have the same data type, let's write 
code to check it explicitly, instead of relying on `canonicalized`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

2020-09-17 Thread GitBox


HyukjinKwon commented on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-694651249


   Thank you @srowen.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29585: [SPARK-32741][SQL] Check if the same ExprId refers to the unique attribute in logical plans

2020-09-17 Thread GitBox


cloud-fan commented on a change in pull request #29585:
URL: https://github.com/apache/spark/pull/29585#discussion_r490701918



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala
##
@@ -458,7 +463,10 @@ object RewriteCorrelatedScalarSubquery extends 
Rule[LogicalPlan] {
 sys.error(s"Unexpected operator in scalar subquery: $lp")
 }
 
-val resultMap = evalPlan(plan)
+val resultMap = evalPlan(plan).mapValues { _.transform {
+case a: Alias => a.newInstance() // Assigns a new `ExprId`

Review comment:
   I don't quite understand this change. Will it break parent nodes that 
refer to this alias?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29780: [SPARK-32906][SQL] Struct field names should not change after normalizing floats

2020-09-17 Thread GitBox


cloud-fan commented on a change in pull request #29780:
URL: https://github.com/apache/spark/pull/29780#discussion_r490699951



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala
##
@@ -129,10 +129,10 @@ object NormalizeFloatingNumbers extends Rule[LogicalPlan] 
{
   Coalesce(children.map(normalize))
 
 case _ if expr.dataType.isInstanceOf[StructType] =>
-  val fields = expr.dataType.asInstanceOf[StructType].fields.indices.map { 
i =>
-normalize(GetStructField(expr, i))
+  val fields = 
expr.dataType.asInstanceOf[StructType].fieldNames.zipWithIndex.map {
+case (name, i) => Seq(Literal(name), normalize(GetStructField(expr, 
i)))
   }
-  val struct = CreateStruct(fields)
+  val struct = CreateNamedStruct(fields.flatten.toSeq)

Review comment:
   Yea I think so. `CreateNamedStruct` allows you to specify the name for 
each field.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #29780: [SPARK-32906][SQL] Struct field names should not change after normalizing floats

2020-09-17 Thread GitBox


HyukjinKwon commented on a change in pull request #29780:
URL: https://github.com/apache/spark/pull/29780#discussion_r490697423



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala
##
@@ -129,10 +129,10 @@ object NormalizeFloatingNumbers extends Rule[LogicalPlan] 
{
   Coalesce(children.map(normalize))
 
 case _ if expr.dataType.isInstanceOf[StructType] =>
-  val fields = expr.dataType.asInstanceOf[StructType].fields.indices.map { 
i =>
-normalize(GetStructField(expr, i))
+  val fields = 
expr.dataType.asInstanceOf[StructType].fieldNames.zipWithIndex.map {
+case (name, i) => Seq(Literal(name), normalize(GetStructField(expr, 
i)))
   }
-  val struct = CreateStruct(fields)
+  val struct = CreateNamedStruct(fields.flatten.toSeq)

Review comment:
   Looks logically fine but just to confirm that I understood correctly, 
it's just a cleanup fix to match the column names, right? and doesn't affect 
the final output?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29795: [SPARK-32511][SQL][WIP] Add dropFields method to Column class

2020-09-17 Thread GitBox


AmplabJenkins commented on pull request #29795:
URL: https://github.com/apache/spark/pull/29795#issuecomment-694645847







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29795: [SPARK-32511][SQL][WIP] Add dropFields method to Column class

2020-09-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29795:
URL: https://github.com/apache/spark/pull/29795#issuecomment-694645847







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #29795: [SPARK-32511][SQL][WIP] Add dropFields method to Column class

2020-09-17 Thread GitBox


SparkQA removed a comment on pull request #29795:
URL: https://github.com/apache/spark/pull/29795#issuecomment-694556609


   **[Test build #128840 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128840/testReport)**
 for PR 29795 at commit 
[`ac149a5`](https://github.com/apache/spark/commit/ac149a52b88e7eb9ab5aa50d1cf341caf9f59faa).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29797: [SPARK-32932][SQL] Do not change number of partitions on RepartitionByExpression when coalescing disabled

2020-09-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29797:
URL: https://github.com/apache/spark/pull/29797#issuecomment-694645141







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29797: [SPARK-32932][SQL] Do not change number of partitions on RepartitionByExpression when coalescing disabled

2020-09-17 Thread GitBox


AmplabJenkins commented on pull request #29797:
URL: https://github.com/apache/spark/pull/29797#issuecomment-694645141







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29795: [SPARK-32511][SQL][WIP] Add dropFields method to Column class

2020-09-17 Thread GitBox


SparkQA commented on pull request #29795:
URL: https://github.com/apache/spark/pull/29795#issuecomment-694645071


   **[Test build #128840 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128840/testReport)**
 for PR 29795 at commit 
[`ac149a5`](https://github.com/apache/spark/commit/ac149a52b88e7eb9ab5aa50d1cf341caf9f59faa).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29797: [SPARK-32932][SQL] Do not change number of partitions on RepartitionByExpression when coalescing disabled

2020-09-17 Thread GitBox


SparkQA commented on pull request #29797:
URL: https://github.com/apache/spark/pull/29797#issuecomment-694644786


   **[Test build #128851 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128851/testReport)**
 for PR 29797 at commit 
[`fc831f6`](https://github.com/apache/spark/commit/fc831f67d4a52bd148328857cf4bac640434ff2b).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29767: [SPARK-32896][SS] Add DataStreamWriter.table API

2020-09-17 Thread GitBox


cloud-fan commented on a change in pull request #29767:
URL: https://github.com/apache/spark/pull/29767#discussion_r490696111



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
##
@@ -300,54 +301,44 @@ final class DataStreamWriter[T] private[sql](ds: 
Dataset[T]) {
 "write files of Hive data source directly.")
 }
 
-if (source == "memory") {
+if (source == SOURCE_NAME_TABLE) {
+  assertNotPartitioned("table")
+
+  import df.sparkSession.sessionState.analyzer.CatalogAndIdentifier
+
+  import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
+  val CatalogAndIdentifier(catalog, identifier) = 
df.sparkSession.sessionState.sqlParser

Review comment:
   > (only if the temp view is a single data source scan node)
   
   As I mentioned before, the temp view must be very simple, like 
`spark.table(name)` or `CREATE TEMP VIEW v USING parquet OPTIONS(...)`
   
   I believe there are tests, but I don't remember where they are. You can 
update `ResolveRelations` to drop the support of inserting temp views, and see 
which tests fail.
   
   For this particular PR, I'm OK to not support temp view for now, as we need 
to refactor it a little bit and have a logical plan for streaming write. But 
for consistency with other places that lookup a table, we should still lookup 
temp views, and just fail if a temp view is returned.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wzhfy edited a comment on pull request #29764: [SPARK-32738][CORE][2.4] Should reduce the number of active threads if fatal error happens in `Inbox.process`

2020-09-17 Thread GitBox


wzhfy edited a comment on pull request #29764:
URL: https://github.com/apache/spark/pull/29764#issuecomment-694643818


   Finally all tests are passed...
   cc @cloud-fan @mridulm  This is a backport for branch 2.4.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wzhfy commented on pull request #29764: [SPARK-32738][CORE][2.4] Should reduce the number of active threads if fatal error happens in `Inbox.process`

2020-09-17 Thread GitBox


wzhfy commented on pull request #29764:
URL: https://github.com/apache/spark/pull/29764#issuecomment-694643818


   Finally all tests are passed..
   cc @cloud-fan @mridulm  This is a backport for branch 2.4.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] manuzhang opened a new pull request #29797: [SPARK-32932][SQL] Do not change number of partitions on RepartitionByExpression when coalescing disabled

2020-09-17 Thread GitBox


manuzhang opened a new pull request #29797:
URL: https://github.com/apache/spark/pull/29797


   ### What changes were proposed in this pull request?
   Do not change number of partitions on RepartitionByExpression when 
`spark.sql.adaptive.coalescePartitions.enabled=false`.
   
   ### Why are the changes needed?
   Users usually repartition with partition column on dynamic partition 
overwrite. AQE could break it when changing number of partitions in coalescing 
shuffle partitions or using local shuffle reader. That could lead to a large 
number of output files, even exceeding the file system limit.  A simple fix is 
to allow users to disable it by setting coalescingPartitions to false.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes.
   
   
   ### How was this patch tested?
   Add test.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #29794: [SPARK-32927][SQL] Bitwise OR, AND and XOR should have similar canonicalization rules to boolean OR and AND

2020-09-17 Thread GitBox


maropu commented on a change in pull request #29794:
URL: https://github.com/apache/spark/pull/29794#discussion_r490693684



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
##
@@ -95,4 +96,19 @@ class CanonicalizeSuite extends SparkFunSuite {
 val castWithTimeZoneId = Cast(literal, LongType, 
Some(TimeZone.getDefault.getID))
 assert(castWithTimeZoneId.semanticEquals(cast))
   }
+
+  test("SPARK-32927: Bitwise operations are commutative") {

Review comment:
   Could you add more test cases, e.g., non-deterministic exprs, literals, 
and mixed cases `BitwiseOr`/`BitwiseAnd`/`BitwiseXor`?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] tanelk commented on a change in pull request #29794: [SPARK-32927][SQL] Bitwise OR, AND and XOR should have similar canonicalization rules to boolean OR and AND

2020-09-17 Thread GitBox


tanelk commented on a change in pull request #29794:
URL: https://github.com/apache/spark/pull/29794#discussion_r490689250



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
##
@@ -95,4 +96,19 @@ class CanonicalizeSuite extends SparkFunSuite {
 val castWithTimeZoneId = Cast(literal, LongType, 
Some(TimeZone.getDefault.getID))
 assert(castWithTimeZoneId.semanticEquals(cast))
   }
+
+  test("SPARK-32927: Bitwise operations are commutative") {
+Seq(

Review comment:
   Oh, didn't know that syntax, thank you, will change.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


HyukjinKwon commented on a change in pull request #29796:
URL: https://github.com/apache/spark/pull/29796#discussion_r490687211



##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -1881,7 +1881,7 @@ class SparkContext(config: SparkConf) extends Logging {
   if (!fs.exists(hadoopPath)) {
 throw new FileNotFoundException(s"Jar ${path} not found")
   }
-  if (fs.isDirectory(hadoopPath)) {
+  if (fs.getFileStatus(hadoopPath).isDirectory) {

Review comment:
   @williamhyun, I think they were deprecated to avoid calling redundant 
`getFileStatus` at 
[HADOOP-13321](https://issues.apache.org/jira/browse/HADOOP-13321):
   
   > These methods have a habit of fostering inefficient call patterns in 
applications, resulting in multiple redundant getFileStatus calls.
   
You can have one util that takes both `FileStatus` or `path` for example.
   
   If you'd like to take a more conservative approach, you can keep only some 
valid calls (with `fs.exists()` as an example).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


HyukjinKwon commented on a change in pull request #29796:
URL: https://github.com/apache/spark/pull/29796#discussion_r490687765



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
##
@@ -1192,7 +1192,7 @@ class HiveDDLSuite
   expectedDBUri,
   Map.empty))
 // the database directory was created
-assert(fs.exists(dbPath) && fs.isDirectory(dbPath))
+assert(fs.exists(dbPath) && fs.getFileStatus(dbPath).isDirectory)

Review comment:
   this too





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


HyukjinKwon commented on a change in pull request #29796:
URL: https://github.com/apache/spark/pull/29796#discussion_r490687648



##
File path: 
core/src/test/scala/org/apache/spark/deploy/history/EventLogFileWritersSuite.scala
##
@@ -213,7 +213,7 @@ class SingleEventLogFileWriterSuite extends 
EventLogFileWritersSuite {
   compressionCodecShortName)
 
 val finalLogPath = new Path(logPath)
-assert(fileSystem.exists(finalLogPath) && fileSystem.isFile(finalLogPath))
+assert(fileSystem.exists(finalLogPath) && 
fileSystem.getFileStatus(finalLogPath).isFile)

Review comment:
   this looks valid too





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


HyukjinKwon commented on a change in pull request #29796:
URL: https://github.com/apache/spark/pull/29796#discussion_r490687461



##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -1881,7 +1881,7 @@ class SparkContext(config: SparkConf) extends Logging {
   if (!fs.exists(hadoopPath)) {
 throw new FileNotFoundException(s"Jar ${path} not found")
   }
-  if (fs.isDirectory(hadoopPath)) {
+  if (fs.getFileStatus(hadoopPath).isDirectory) {

Review comment:
   For example, `fs.getFileStatus(hadoopPath).isDirectory` here looks valid.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


HyukjinKwon commented on a change in pull request #29796:
URL: https://github.com/apache/spark/pull/29796#discussion_r490687211



##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -1881,7 +1881,7 @@ class SparkContext(config: SparkConf) extends Logging {
   if (!fs.exists(hadoopPath)) {
 throw new FileNotFoundException(s"Jar ${path} not found")
   }
-  if (fs.isDirectory(hadoopPath)) {
+  if (fs.getFileStatus(hadoopPath).isDirectory) {

Review comment:
   @williamhyun, I think they were deprecated to avoid calling redundant 
`getFileStatus` at 
[HADOOP-13321](https://issues.apache.org/jira/browse/HADOOP-13321):
   
   > These methods have a habit of fostering inefficient call patterns in 
applications, resulting in multiple redundant getFileStatus calls.
   
You can have one util that takes both `FileStatus` and `path` for example.
   
   If you'd like to take more conservative, you can keep some valid calls (with 
`fs.exists()` as an example).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on a change in pull request #29756: [SPARK-32885][SS] Add DataStreamReader.table API

2020-09-17 Thread GitBox


HeartSaVioR commented on a change in pull request #29756:
URL: https://github.com/apache/spark/pull/29756#discussion_r490687079



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -846,9 +847,9 @@ class Analyzer(
*/
   object ResolveTempViews extends Rule[LogicalPlan] {
 def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
-  case u @ UnresolvedRelation(ident, _) =>
+  case u @ UnresolvedRelation(ident, _, _) =>
 lookupTempView(ident).getOrElse(u)

Review comment:
   OK I agree it should be better to fix in different PR. I still think we 
shouldn't support confused things just because we supported them. The fix would 
depend on change of the PR (isStreaming flag) - I'll wait for this PR and try 
to fix it after the PR.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29789: [SPARK-32898][CORE] Fix wrong executorRunTime when task killed before real start

2020-09-17 Thread GitBox


AmplabJenkins commented on pull request #29789:
URL: https://github.com/apache/spark/pull/29789#issuecomment-694630959







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29767: [SPARK-32896][SS] Add DataStreamWriter.table API

2020-09-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29767:
URL: https://github.com/apache/spark/pull/29767#issuecomment-694630939







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29767: [SPARK-32896][SS] Add DataStreamWriter.table API

2020-09-17 Thread GitBox


AmplabJenkins commented on pull request #29767:
URL: https://github.com/apache/spark/pull/29767#issuecomment-694630939







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29789: [SPARK-32898][CORE] Fix wrong executorRunTime when task killed before real start

2020-09-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29789:
URL: https://github.com/apache/spark/pull/29789#issuecomment-694630959







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29767: [SPARK-32896][SS] Add DataStreamWriter.table API

2020-09-17 Thread GitBox


SparkQA commented on pull request #29767:
URL: https://github.com/apache/spark/pull/29767#issuecomment-694630669


   **[Test build #128850 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128850/testReport)**
 for PR 29767 at commit 
[`4fd5cb9`](https://github.com/apache/spark/commit/4fd5cb9586c9a2badd4025082f811829c60bade6).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29789: [SPARK-32898][CORE] Fix wrong executorRunTime when task killed before real start

2020-09-17 Thread GitBox


SparkQA commented on pull request #29789:
URL: https://github.com/apache/spark/pull/29789#issuecomment-694630611


   **[Test build #128849 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128849/testReport)**
 for PR 29789 at commit 
[`d1ef396`](https://github.com/apache/spark/commit/d1ef396289d7a952843a5ae5f246a1bee60ac056).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on a change in pull request #29767: [SPARK-32896][SS] Add DataStreamWriter.table API

2020-09-17 Thread GitBox


HeartSaVioR commented on a change in pull request #29767:
URL: https://github.com/apache/spark/pull/29767#discussion_r490683314



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala
##
@@ -169,26 +171,32 @@ class InMemoryTable(
 
 new WriteBuilder with SupportsTruncate with SupportsOverwrite with 
SupportsDynamicOverwrite {
   private var writer: BatchWrite = Append
+  private var streamingWriter: StreamingWrite = StreamingAppend
 
   override def truncate(): WriteBuilder = {
 assert(writer == Append)
 writer = TruncateAndAppend
+streamingWriter = StreamingTruncateAndAppend
 this
   }
 
   override def overwrite(filters: Array[Filter]): WriteBuilder = {
 assert(writer == Append)
 writer = new Overwrite(filters)
+// streaming writer doesn't have equivalent semantic

Review comment:
   Just changed to explicitly fail for the case.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Ngone51 commented on a change in pull request #29789: [SPARK-32898][CORE] Fix wrong executorRunTime when task killed before real start

2020-09-17 Thread GitBox


Ngone51 commented on a change in pull request #29789:
URL: https://github.com/apache/spark/pull/29789#discussion_r490682482



##
File path: core/src/main/scala/org/apache/spark/executor/Executor.scala
##
@@ -400,7 +400,10 @@ private[spark] class Executor(
   // Report executor runtime and JVM gc time
   Option(task).foreach(t => {
 t.metrics.setExecutorRunTime(TimeUnit.NANOSECONDS.toMillis(
-  System.nanoTime() - taskStartTimeNs))
+  // SPARK-32898: it's possible that a task be killed before it reaches
+  // "taskStartTimeNs = System.nanoTime()". In this case, the 
executorRunTime

Review comment:
   Thanks, addressed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] williamhyun closed pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


williamhyun closed pull request #29796:
URL: https://github.com/apache/spark/pull/29796


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #29268: [SPARK-29544] Collect the row count info and optimize the skewed conditions with row count info in the OptimizeSkewedJoin rule

2020-09-17 Thread GitBox


SparkQA removed a comment on pull request #29268:
URL: https://github.com/apache/spark/pull/29268#issuecomment-694629097


   **[Test build #128848 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128848/testReport)**
 for PR 29268 at commit 
[`d0e0084`](https://github.com/apache/spark/commit/d0e0084211109c819d2176b85c26f02abf09ec77).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] williamhyun commented on a change in pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


williamhyun commented on a change in pull request #29796:
URL: https://github.com/apache/spark/pull/29796#discussion_r490682171



##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -1881,7 +1881,7 @@ class SparkContext(config: SparkConf) extends Logging {
   if (!fs.exists(hadoopPath)) {
 throw new FileNotFoundException(s"Jar ${path} not found")
   }
-  if (fs.isDirectory(hadoopPath)) {
+  if (fs.getFileStatus(hadoopPath).isDirectory) {

Review comment:
   I missed that. Sorry. In that case, the utility methods will be 
identical with the deprecated Hadoop 3.2 `isFile/isDirectory` themselves. I'll 
close my PR. Thank you for your feedback.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29268: [SPARK-29544] Collect the row count info and optimize the skewed conditions with row count info in the OptimizeSkewedJoin rule

2020-09-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29268:
URL: https://github.com/apache/spark/pull/29268#issuecomment-694629616


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128848/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29268: [SPARK-29544] Collect the row count info and optimize the skewed conditions with row count info in the OptimizeSkewedJoin rule

2020-09-17 Thread GitBox


AmplabJenkins commented on pull request #29268:
URL: https://github.com/apache/spark/pull/29268#issuecomment-694629611







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29268: [SPARK-29544] Collect the row count info and optimize the skewed conditions with row count info in the OptimizeSkewedJoin rule

2020-09-17 Thread GitBox


SparkQA commented on pull request #29268:
URL: https://github.com/apache/spark/pull/29268#issuecomment-694629601


   **[Test build #128848 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128848/testReport)**
 for PR 29268 at commit 
[`d0e0084`](https://github.com/apache/spark/commit/d0e0084211109c819d2176b85c26f02abf09ec77).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29268: [SPARK-29544] Collect the row count info and optimize the skewed conditions with row count info in the OptimizeSkewedJoin rule

2020-09-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29268:
URL: https://github.com/apache/spark/pull/29268#issuecomment-694629611


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29268: [SPARK-29544] Collect the row count info and optimize the skewed conditions with row count info in the OptimizeSkewedJoin rule

2020-09-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29268:
URL: https://github.com/apache/spark/pull/29268#issuecomment-694629351







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29268: [SPARK-29544] Collect the row count info and optimize the skewed conditions with row count info in the OptimizeSkewedJoin rule

2020-09-17 Thread GitBox


AmplabJenkins commented on pull request #29268:
URL: https://github.com/apache/spark/pull/29268#issuecomment-694629351







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29732: [SPARK-32857][CORE] Fix flaky o.a.s.s.BarrierTaskContextSuite.throw exception if the number of barrier() calls are not the same

2020-09-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29732:
URL: https://github.com/apache/spark/pull/29732#issuecomment-694629309







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29732: [SPARK-32857][CORE] Fix flaky o.a.s.s.BarrierTaskContextSuite.throw exception if the number of barrier() calls are not the same on ever

2020-09-17 Thread GitBox


AmplabJenkins commented on pull request #29732:
URL: https://github.com/apache/spark/pull/29732#issuecomment-694629309







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yaooqinn edited a comment on pull request #29777: [SPARK-32905][Core][Yarn] ApplicationMaster fails to receive UpdateDelegationTokens message

2020-09-17 Thread GitBox


yaooqinn edited a comment on pull request #29777:
URL: https://github.com/apache/spark/pull/29777#issuecomment-694607845


   thanks, @tgravescs, and @mridulm 
   
   I am running the sub-module `kyuubi-spark-sql-engine` of 
https://github.com/yaooqinn/kyuubi
   
   The simplest way to reproduce the bug and verify this fix is to follow these 
steps
   
    1 build the `kyuubi-spark-sql-engine` module
   ```
   mvn clean package -pl :kyuubi-spark-sql-engine
   ```
   
    2. config the spark with Kerberos settings towards your secured cluster
   
    3. start it in the background
   ```
   nohup bin/spark-submit --class org.apache.kyuubi.engine.spark.SparkSQLEngine 
../kyuubi-spark-sql-engine-1.0.0-SNAPSHOT.jar > kyuubi.log &
   ```
   
    4. check the AM log and see 
   
   "Updating delegation tokens ..." for SUCCESS
   
   "Inbox: Ignoring error .. does not implement 'receive'" for FAILURE



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29268: [SPARK-29544] Collect the row count info and optimize the skewed conditions with row count info in the OptimizeSkewedJoin rule

2020-09-17 Thread GitBox


SparkQA commented on pull request #29268:
URL: https://github.com/apache/spark/pull/29268#issuecomment-694629097


   **[Test build #128848 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128848/testReport)**
 for PR 29268 at commit 
[`d0e0084`](https://github.com/apache/spark/commit/d0e0084211109c819d2176b85c26f02abf09ec77).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29732: [SPARK-32857][CORE] Fix flaky o.a.s.s.BarrierTaskContextSuite.throw exception if the number of barrier() calls are not the same on every task

2020-09-17 Thread GitBox


SparkQA commented on pull request #29732:
URL: https://github.com/apache/spark/pull/29732#issuecomment-694629080


   **[Test build #128847 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128847/testReport)**
 for PR 29732 at commit 
[`6e7044c`](https://github.com/apache/spark/commit/6e7044c458233938b7070d97858dc6da742a28af).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Ngone51 commented on pull request #29732: [SPARK-32857][CORE] Fix flaky o.a.s.s.BarrierTaskContextSuite.throw exception if the number of barrier() calls are not the same on every task

2020-09-17 Thread GitBox


Ngone51 commented on pull request #29732:
URL: https://github.com/apache/spark/pull/29732#issuecomment-694628984


   > can we simply resolve the issue by setting longer barrier sync timeout?
   
   Increased the timeout to 5s. I think it should be enough long.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29789: [SPARK-32898][CORE] Fix wrong executorRunTime when task killed before real start

2020-09-17 Thread GitBox


dongjoon-hyun commented on a change in pull request #29789:
URL: https://github.com/apache/spark/pull/29789#discussion_r490677550



##
File path: core/src/main/scala/org/apache/spark/executor/Executor.scala
##
@@ -400,7 +400,10 @@ private[spark] class Executor(
   // Report executor runtime and JVM gc time
   Option(task).foreach(t => {
 t.metrics.setExecutorRunTime(TimeUnit.NANOSECONDS.toMillis(
-  System.nanoTime() - taskStartTimeNs))
+  // SPARK-32898: it's possible that a task be killed before it reaches
+  // "taskStartTimeNs = System.nanoTime()". In this case, the 
executorRunTime

Review comment:
   `before it reaches "taskStartTimeNs = System.nanoTime()"` might be a 
little misleading because this looks like equality. If we want to pointing 
[line 
470](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L470),
 can we use a different wording? For example, something like, `it's possible 
that a task is killed when taskStartTimeNs has the initial value(=0) still`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29789: [SPARK-32898][CORE] Fix wrong executorRunTime when task killed before real start

2020-09-17 Thread GitBox


dongjoon-hyun commented on a change in pull request #29789:
URL: https://github.com/apache/spark/pull/29789#discussion_r490677550



##
File path: core/src/main/scala/org/apache/spark/executor/Executor.scala
##
@@ -400,7 +400,10 @@ private[spark] class Executor(
   // Report executor runtime and JVM gc time
   Option(task).foreach(t => {
 t.metrics.setExecutorRunTime(TimeUnit.NANOSECONDS.toMillis(
-  System.nanoTime() - taskStartTimeNs))
+  // SPARK-32898: it's possible that a task be killed before it reaches
+  // "taskStartTimeNs = System.nanoTime()". In this case, the 
executorRunTime

Review comment:
   `before it reaches "taskStartTimeNs = System.nanoTime()"` might be a 
little misleading because this looks like equality. If we want to pointing 
[line 
470](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L470),
 can we use a different wording?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29789: [SPARK-32898][CORE] Fix wrong executorRunTime when task killed before real start

2020-09-17 Thread GitBox


dongjoon-hyun commented on a change in pull request #29789:
URL: https://github.com/apache/spark/pull/29789#discussion_r490677550



##
File path: core/src/main/scala/org/apache/spark/executor/Executor.scala
##
@@ -400,7 +400,10 @@ private[spark] class Executor(
   // Report executor runtime and JVM gc time
   Option(task).foreach(t => {
 t.metrics.setExecutorRunTime(TimeUnit.NANOSECONDS.toMillis(
-  System.nanoTime() - taskStartTimeNs))
+  // SPARK-32898: it's possible that a task be killed before it reaches
+  // "taskStartTimeNs = System.nanoTime()". In this case, the 
executorRunTime

Review comment:
   `before it reaches "taskStartTimeNs = System.nanoTime()"` might be a 
little misleading because this looks like equality. If we want to pointing 
[line 
473](https://github.com/apache/spark/pull/29789/files#diff-5a0de266c82b95adb47d9bca714e1f1bR473),
 can we use a different wording?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


HyukjinKwon commented on a change in pull request #29796:
URL: https://github.com/apache/spark/pull/29796#discussion_r490673471



##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -1881,7 +1881,7 @@ class SparkContext(config: SparkConf) extends Logging {
   if (!fs.exists(hadoopPath)) {
 throw new FileNotFoundException(s"Jar ${path} not found")
   }
-  if (fs.isDirectory(hadoopPath)) {
+  if (fs.getFileStatus(hadoopPath).isDirectory) {

Review comment:
   @williamhyun, looks `fs.getFileStatus(hadoopPath).isDirectory` isn't the 
exact synonym of `fs.isDirectory`, e.g.)
   
   ```java
 public boolean isDirectory(Path f) throws IOException {
   try {
 return getFileStatus(f).isDirectory();
   } catch (FileNotFoundException e) {
 return false;   // f does not exist
   }
 }
   ```
   
   I think you can create utility methods under `Utils.scala` to reuse.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


HyukjinKwon commented on a change in pull request #29796:
URL: https://github.com/apache/spark/pull/29796#discussion_r490673471



##
File path: core/src/main/scala/org/apache/spark/SparkContext.scala
##
@@ -1881,7 +1881,7 @@ class SparkContext(config: SparkConf) extends Logging {
   if (!fs.exists(hadoopPath)) {
 throw new FileNotFoundException(s"Jar ${path} not found")
   }
-  if (fs.isDirectory(hadoopPath)) {
+  if (fs.getFileStatus(hadoopPath).isDirectory) {

Review comment:
   @williamhyun, looks `fs.getFileStatus(hadoopPath).isDirectory` isn't 
exact synonym of `fs.isDirectory`, e.g.)
   
   ```java
 public boolean isDirectory(Path f) throws IOException {
   try {
 return getFileStatus(f).isDirectory();
   } catch (FileNotFoundException e) {
 return false;   // f does not exist
   }
 }
   ```
   
   I think you can create a utility methods under `Utils.scala` to reuse.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29789: [SPARK-32898][CORE] Fix wrong executorRunTime when task killed before real start

2020-09-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29789:
URL: https://github.com/apache/spark/pull/29789#issuecomment-694620254







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29789: [SPARK-32898][CORE] Fix wrong executorRunTime when task killed before real start

2020-09-17 Thread GitBox


AmplabJenkins commented on pull request #29789:
URL: https://github.com/apache/spark/pull/29789#issuecomment-694620254







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29789: [SPARK-32898][CORE] Fix wrong executorRunTime when task killed before real start

2020-09-17 Thread GitBox


SparkQA commented on pull request #29789:
URL: https://github.com/apache/spark/pull/29789#issuecomment-694619794


   **[Test build #128846 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128846/testReport)**
 for PR 29789 at commit 
[`91623a7`](https://github.com/apache/spark/commit/91623a750755704af7575df577d2e32851e81954).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #29788: [SPARK-32913][CORE][K8S] Improve ExecutorDecommissionInfo and ExecutorDecommissionState for different use cases

2020-09-17 Thread GitBox


HyukjinKwon commented on pull request #29788:
URL: https://github.com/apache/spark/pull/29788#issuecomment-694618798


   @tgravescs, @mridulm, @squito FYI



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #29789: [SPARK-32898][CORE] Fix wrong executorRunTime when task killed before real start

2020-09-17 Thread GitBox


HyukjinKwon commented on pull request #29789:
URL: https://github.com/apache/spark/pull/29789#issuecomment-694618522


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun removed a comment on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


dongjoon-hyun removed a comment on pull request #29796:
URL: https://github.com/apache/spark/pull/29796#issuecomment-694618049


   ok to test



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon closed pull request #29786: [SPARK-32908][SQL][2.4] Fix target error calculation in `percentile_approx()`

2020-09-17 Thread GitBox


HyukjinKwon closed pull request #29786:
URL: https://github.com/apache/spark/pull/29786


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #29782: [SPARK-32907][ML] adaptively blockify instances - revert blockify gmm

2020-09-17 Thread GitBox


HyukjinKwon commented on a change in pull request #29782:
URL: https://github.com/apache/spark/pull/29782#discussion_r490671441



##
File path: python/pyspark/ml/clustering.py
##
@@ -432,13 +429,6 @@ def setAggregationDepth(self, value):
 """
 return self._set(aggregationDepth=value)
 
-@since("3.1.0")
-def setBlockSize(self, value):

Review comment:
   @zero323 FYI





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


AmplabJenkins commented on pull request #29796:
URL: https://github.com/apache/spark/pull/29796#issuecomment-694618030


   Can one of the admins verify this patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


dongjoon-hyun commented on pull request #29796:
URL: https://github.com/apache/spark/pull/29796#issuecomment-694618049


   ok to test



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


AmplabJenkins removed a comment on pull request #29796:
URL: https://github.com/apache/spark/pull/29796#issuecomment-694617722







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


AmplabJenkins commented on pull request #29796:
URL: https://github.com/apache/spark/pull/29796#issuecomment-694617722


   Can one of the admins verify this patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] williamhyun opened a new pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods

2020-09-17 Thread GitBox


williamhyun opened a new pull request #29796:
URL: https://github.com/apache/spark/pull/29796


   ### What changes were proposed in this pull request?
   
   ```scala
   - fs.isDirectory(hadoopPath)
   + fs.getFileStatus(hadoopPath).isDirectory
   ```
   
   ```scala
   - fs.isFile(new Path(inProgressLog))
   + fs.getFileStatus(new Path(inProgressLog)).isFile
   ```
   
   ### Why are the changes needed?
   
   
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2-hive-2.3/1244/consoleFull
   
   ```
   [warn] 
/home/jenkins/workspace/spark-master-test-sbt-hadoop-3.2-hive-2.3/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:815:
 method isFile in class FileSystem is deprecated: see corresponding Javadoc for 
more information.
   [warn] if (!fs.isFile(new Path(inProgressLog))) {
   ```
   
   ```
   [warn] 
/home/jenkins/workspace/spark-master-test-sbt-hadoop-3.2-hive-2.3/core/src/main/scala/org/apache/spark/SparkContext.scala:1884:
 method isDirectory in class FileSystem is deprecated: see corresponding 
Javadoc for more information.
   [warn]   if (fs.isDirectory(hadoopPath)) {
   ```
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   
   ### How was this patch tested?
   
   Pass the Jenkins.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   3   4   5   6   7   8   >