[GitHub] [spark] manuzhang commented on pull request #29797: [SPARK-32932][SQL] Do not use local shuffle reader on RepartitionByExpression when coalescing disabled
manuzhang commented on pull request #29797: URL: https://github.com/apache/spark/pull/29797#issuecomment-694671759 cc @maryannxue @cloud-fan @JkSelf @viirya This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29798: [SPARK-32931][SQL] Unevaluable Expressions are not Foldable
AmplabJenkins commented on pull request #29798: URL: https://github.com/apache/spark/pull/29798#issuecomment-694669964
[GitHub] [spark] gatorsmile opened a new pull request #29798: [SPARK-32931][SQL] Unevaluable Expressions are not Foldable
gatorsmile opened a new pull request #29798: URL: https://github.com/apache/spark/pull/29798
### What changes were proposed in this pull request?
Unevaluable expressions are not foldable because they have no eval implementation. This PR cleans up the code and enforces that invariant.
### Why are the changes needed?
To ensure we will not hit the weird cases that trigger ConstantFolding on unevaluable expressions.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
The existing tests.
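The invariant this PR enforces — an expression with no eval must never report itself as foldable, so ConstantFolding skips it — can be sketched in plain Python (a hypothetical model, not Spark's actual Catalyst code):

```python
class Literal:
    """A constant; always foldable and trivially evaluable."""
    foldable = True

    def __init__(self, value):
        self.value = value

    def eval(self):
        return self.value


class Unevaluable:
    """Models Catalyst's Unevaluable trait: no eval, hence never foldable."""
    foldable = False  # the invariant SPARK-32931 enforces

    def eval(self):
        raise NotImplementedError("Unevaluable expression has no eval")


class Add:
    """Foldable only if both children are foldable."""
    def __init__(self, left, right):
        self.left, self.right = left, right

    @property
    def foldable(self):
        return self.left.foldable and self.right.foldable

    def eval(self):
        return self.left.eval() + self.right.eval()


def constant_fold(expr):
    """Replace a foldable expression with a Literal of its value;
    anything containing an unevaluable node is left untouched."""
    return Literal(expr.eval()) if expr.foldable else expr
```

Because `Unevaluable.foldable` is hard-wired to `False`, `constant_fold(Add(Literal(1), Unevaluable()))` leaves the tree alone instead of calling the missing `eval`.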
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide
AmplabJenkins removed a comment on pull request #29779: URL: https://github.com/apache/spark/pull/29779#issuecomment-694667239
[GitHub] [spark] AmplabJenkins commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide
AmplabJenkins commented on pull request #29779: URL: https://github.com/apache/spark/pull/29779#issuecomment-694667239
[GitHub] [spark] SparkQA commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide
SparkQA commented on pull request #29779: URL: https://github.com/apache/spark/pull/29779#issuecomment-694666900 **[Test build #128856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128856/testReport)** for PR 29779 at commit [`a79b48e`](https://github.com/apache/spark/commit/a79b48eb95f3199a6cff4db6217db562d4c3f62d).
[GitHub] [spark] HeartSaVioR commented on a change in pull request #29767: [SPARK-32896][SS] Add DataStreamWriter.table API
HeartSaVioR commented on a change in pull request #29767: URL: https://github.com/apache/spark/pull/29767#discussion_r490715412
## File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
## @@ -300,54 +301,44 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
         "write files of Hive data source directly.")
     }
-    if (source == "memory") {
+    if (source == SOURCE_NAME_TABLE) {
+      assertNotPartitioned("table")
+
+      import df.sparkSession.sessionState.analyzer.CatalogAndIdentifier
+
+      import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
+      val CatalogAndIdentifier(catalog, identifier) = df.sparkSession.sessionState.sqlParser
Review comment: Done in e7cd27d - it now looks up the (global) temp view directly and provides a clearer error message. Also added relevant tests. That said, I can't find the fallback logic in DataFrameWriterV2. It simply looks up via the catalog, so a temp view will not be found. Do I understand correctly, and if so, is that the desired/expected behavior?
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide
HyukjinKwon commented on a change in pull request #29779: URL: https://github.com/apache/spark/pull/29779#discussion_r490714909
## File path: python/docs/source/getting_started/installation.rst
## @@ -1,114 +0,0 @@
-.. Licensed to the Apache Software Foundation (ASF) under one
Review comment: Ah .. looks like there are too many diffs, so it doesn't know that the file was renamed ..
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29767: [SPARK-32896][SS] Add DataStreamWriter.table API
AmplabJenkins removed a comment on pull request #29767: URL: https://github.com/apache/spark/pull/29767#issuecomment-694664677
[GitHub] [spark] AmplabJenkins commented on pull request #29767: [SPARK-32896][SS] Add DataStreamWriter.table API
AmplabJenkins commented on pull request #29767: URL: https://github.com/apache/spark/pull/29767#issuecomment-694664677
[GitHub] [spark] SparkQA commented on pull request #29767: [SPARK-32896][SS] Add DataStreamWriter.table API
SparkQA commented on pull request #29767: URL: https://github.com/apache/spark/pull/29767#issuecomment-694664243 **[Test build #128855 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128855/testReport)** for PR 29767 at commit [`e7cd27d`](https://github.com/apache/spark/commit/e7cd27dd3afd32d9b28e13d56af26f6b5e097c85).
[GitHub] [spark] dongjoon-hyun commented on pull request #29780: [SPARK-32906][SQL] Struct field names should not change after normalizing floats
dongjoon-hyun commented on pull request #29780: URL: https://github.com/apache/spark/pull/29780#issuecomment-694663873 Thank you, @maropu and @viirya and all!
[GitHub] [spark] zhli1142015 commented on pull request #29757: [SPARK-32886][WEBUI] fix 'undefined' link in event timeline view
zhli1142015 commented on pull request #29757: URL: https://github.com/apache/spark/pull/29757#issuecomment-694663610
> Thank you for pinging me, @zhli1142015 . The patch looks good to me since it's revised already.
>
> Just one question, this bug seems to exist in `branch-2.4`, doesn't it? If then, could you update the affected version in [SPARK-32886](https://issues.apache.org/jira/browse/SPARK-32886) accordingly? Currently, the affected versions are 3.0.0 and 3.1.0.

Thanks for your comments. I just verified that this bug can be reproduced in 2.4.4, and updated the JIRA.
[GitHub] [spark] williamhyun commented on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
williamhyun commented on pull request #29796: URL: https://github.com/apache/spark/pull/29796#issuecomment-694662809 @HyukjinKwon I narrowed down my PR according to your advice. Could you take a look again, please?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
AmplabJenkins removed a comment on pull request #29796: URL: https://github.com/apache/spark/pull/29796#issuecomment-694662074
[GitHub] [spark] AmplabJenkins commented on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
AmplabJenkins commented on pull request #29796: URL: https://github.com/apache/spark/pull/29796#issuecomment-694662074
[GitHub] [spark] SparkQA commented on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
SparkQA commented on pull request #29796: URL: https://github.com/apache/spark/pull/29796#issuecomment-694661637 **[Test build #128854 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128854/testReport)** for PR 29796 at commit [`71a0465`](https://github.com/apache/spark/commit/71a046515efd306f36e4894069abe234271b99d2).
[GitHub] [spark] williamhyun commented on a change in pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
williamhyun commented on a change in pull request #29796: URL: https://github.com/apache/spark/pull/29796#discussion_r490710752
## File path: streaming/src/main/scala/org/apache/spark/streaming/util/HdfsUtils.scala
## @@ -92,6 +92,10 @@ private[streaming] object HdfsUtils {
   def checkFileExists(path: String, conf: Configuration): Boolean = {
     val hdpPath = new Path(path)
     val fs = getFileSystemForPath(hdpPath, conf)
-    fs.isFile(hdpPath)
+    try {
+      fs.getFileStatus(hdpPath).isFile
+    } catch {
+      case _: FileNotFoundException => false
+    }
Review comment: This is the same logic as the isFile method.
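The replacement pattern — fetch the file status explicitly and map "not found" to `false`, which is the same logic the deprecated `FileSystem.isFile` implemented internally — can be illustrated outside Hadoop with a small Python stand-in (a hypothetical helper, for illustration only; the real code uses Hadoop's `FileSystem.getFileStatus`):

```python
import os
import stat

def check_file_exists(path: str) -> bool:
    """Python analogue of the patched HdfsUtils.checkFileExists:
    ask for the status explicitly and treat a missing path as False,
    rather than relying on a deprecated convenience method."""
    try:
        return stat.S_ISREG(os.stat(path).st_mode)
    except FileNotFoundError:
        return False
```

Note that a directory also yields `False`, matching `isFile` semantics, not mere existence.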
[GitHub] [spark] williamhyun commented on a change in pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
williamhyun commented on a change in pull request #29796: URL: https://github.com/apache/spark/pull/29796#discussion_r490710569
## File path: streaming/src/main/scala/org/apache/spark/streaming/util/HdfsUtils.scala
## @@ -58,7 +58,7 @@ private[streaming] object HdfsUtils {
     // If we are really unlucky, the file may be deleted as we're opening the stream.
     // This can happen as clean up is performed by daemon threads that may be left over from
     // previous runs.
-    if (!dfs.isFile(dfsPath)) null else throw e
+    if (!dfs.getFileStatus(dfsPath).isFile) null else throw e
Review comment: This is safe because of lines 53 and 55.
[GitHub] [spark] AmplabJenkins commented on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
AmplabJenkins commented on pull request #29796: URL: https://github.com/apache/spark/pull/29796#issuecomment-694659598
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
AmplabJenkins removed a comment on pull request #29796: URL: https://github.com/apache/spark/pull/29796#issuecomment-694659598
[GitHub] [spark] SparkQA commented on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
SparkQA commented on pull request #29796: URL: https://github.com/apache/spark/pull/29796#issuecomment-694659074 **[Test build #128853 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128853/testReport)** for PR 29796 at commit [`1c7dbae`](https://github.com/apache/spark/commit/1c7dbaef34827935d0b30bf3d6d55bcbb3005143).
[GitHub] [spark] williamhyun commented on a change in pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
williamhyun commented on a change in pull request #29796: URL: https://github.com/apache/spark/pull/29796#discussion_r490709194
## File path: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
## @@ -812,11 +812,12 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
       try {
         // Fetch the entry first to avoid an RPC when it's already removed.
         listing.read(classOf[LogInfo], inProgressLog)
-        if (!fs.isFile(new Path(inProgressLog))) {
+        if (!fs.getFileStatus(new Path(inProgressLog)).isFile) {
           listing.delete(classOf[LogInfo], inProgressLog)
         }
       } catch {
         case _: NoSuchElementException =>
+        case _: FileNotFoundException =>
Review comment: This is added for the getFileStatus call at line 815.
[GitHub] [spark] HyukjinKwon commented on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
HyukjinKwon commented on pull request #29796: URL: https://github.com/apache/spark/pull/29796#issuecomment-694658323
[GitHub] [spark] williamhyun commented on a change in pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
williamhyun commented on a change in pull request #29796: URL: https://github.com/apache/spark/pull/29796#discussion_r490708937
## File path: core/src/main/scala/org/apache/spark/SparkContext.scala
## @@ -1881,7 +1881,7 @@ class SparkContext(config: SparkConf) extends Logging {
           if (!fs.exists(hadoopPath)) {
             throw new FileNotFoundException(s"Jar ${path} not found")
           }
-          if (fs.isDirectory(hadoopPath)) {
+          if (fs.getFileStatus(hadoopPath).isDirectory) {
Review comment: This is safe because of `if (!fs.exists(hadoopPath)) {` at line 1881.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
AmplabJenkins removed a comment on pull request #29796: URL: https://github.com/apache/spark/pull/29796#issuecomment-694654723 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29794: [SPARK-32927][SQL] Bitwise OR, AND and XOR should have similar canonicalization rules to boolean OR and AND
AmplabJenkins removed a comment on pull request #29794: URL: https://github.com/apache/spark/pull/29794#issuecomment-694657079
[GitHub] [spark] AmplabJenkins commented on pull request #29794: [SPARK-32927][SQL] Bitwise OR, AND and XOR should have similar canonicalization rules to boolean OR and AND
AmplabJenkins commented on pull request #29794: URL: https://github.com/apache/spark/pull/29794#issuecomment-694657079
[GitHub] [spark] SparkQA commented on pull request #29794: [SPARK-32927][SQL] Bitwise OR, AND and XOR should have similar canonicalization rules to boolean OR and AND
SparkQA commented on pull request #29794: URL: https://github.com/apache/spark/pull/29794#issuecomment-694656707 **[Test build #128852 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128852/testReport)** for PR 29794 at commit [`456ffd2`](https://github.com/apache/spark/commit/456ffd23efac94af15749e2bfb79fd262a16fb94).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
AmplabJenkins removed a comment on pull request #29796: URL: https://github.com/apache/spark/pull/29796#issuecomment-694654345 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
AmplabJenkins commented on pull request #29796: URL: https://github.com/apache/spark/pull/29796#issuecomment-694654723 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
AmplabJenkins commented on pull request #29796: URL: https://github.com/apache/spark/pull/29796#issuecomment-694654345 Can one of the admins verify this patch?
[GitHub] [spark] viirya closed pull request #29780: [SPARK-32906][SQL] Struct field names should not change after normalizing floats
viirya closed pull request #29780: URL: https://github.com/apache/spark/pull/29780
[GitHub] [spark] viirya commented on pull request #29780: [SPARK-32906][SQL] Struct field names should not change after normalizing floats
viirya commented on pull request #29780: URL: https://github.com/apache/spark/pull/29780#issuecomment-694652505 Thanks all! Merging to master/3.0.
[GitHub] [spark] williamhyun commented on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
williamhyun commented on pull request #29796: URL: https://github.com/apache/spark/pull/29796#issuecomment-694652478 Thank you @HyukjinKwon. I'll narrow down the scope of this PR.
[GitHub] [spark] cloud-fan commented on a change in pull request #29585: [SPARK-32741][SQL] Check if the same ExprId refers to the unique attribute in logical plans
cloud-fan commented on a change in pull request #29585: URL: https://github.com/apache/spark/pull/29585#discussion_r490702813
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
## @@ -203,3 +203,62 @@ abstract class BinaryNode extends LogicalPlan {
 abstract class OrderPreservingUnaryNode extends UnaryNode {
   override final def outputOrdering: Seq[SortOrder] = child.outputOrdering
 }
+
+object LogicalPlanIntegrity {
+
+  private def canGetOutputAttrs(p: LogicalPlan): Boolean = {
+    p.resolved && !p.expressions.exists { e =>
+      e.collectFirst {
+        // We cannot call `output` in plans with a `ScalarSubquery` expr having no column,
+        // so, we filter out them in advance.
+        case s: ScalarSubquery if s.plan.schema.fields.isEmpty => true
+      }.isDefined
+    }
+  }
+
+  /**
+   * Since some logical plans (e.g., `Union`) can build `AttributeReference`s in their `output`,
+   * this method checks if the same `ExprId` refers to a semantically-equal attribute
+   * in a plan output.
+   */
+  def hasUniqueExprIdsForOutput(plan: LogicalPlan): Boolean = {
+    val allOutputAttrs = plan.collect { case p if canGetOutputAttrs(p) =>
+      // NOTE: we still need to filter resolved expressions here because the output of
+      // some resolved logical plans can have unresolved references,
+      // e.g., outer references in `ExistenceJoin`.
+      p.output.filter(_.resolved).map(_.canonicalized.asInstanceOf[Attribute])
Review comment: I'd like to make things more explicit. If our goal is to make sure all the attributes with the same expr ID have the same data type, let's write code that checks it explicitly, instead of relying on `canonicalized`.
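The explicit check cloud-fan suggests — every attribute sharing an `ExprId` must carry the same data type — might look like this in a simplified Python model, where an attribute is reduced to an `(expr_id, data_type)` pair (a hypothetical sketch, not the actual Catalyst code):

```python
def all_expr_ids_consistent(attributes):
    """Return True iff no two attributes share an ExprId while
    disagreeing on data type - the explicit invariant, checked
    directly instead of via canonicalized-form comparison."""
    seen = {}  # expr_id -> first data type observed for that id
    for expr_id, data_type in attributes:
        if seen.setdefault(expr_id, data_type) != data_type:
            return False
    return True
```

Checking the pair directly makes the intent obvious and fails fast on the first mismatch, rather than depending on whatever `canonicalized` happens to normalize.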
[GitHub] [spark] HyukjinKwon commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide
HyukjinKwon commented on pull request #29779: URL: https://github.com/apache/spark/pull/29779#issuecomment-694651249 Thank you @srowen.
[GitHub] [spark] cloud-fan commented on a change in pull request #29585: [SPARK-32741][SQL] Check if the same ExprId refers to the unique attribute in logical plans
cloud-fan commented on a change in pull request #29585: URL: https://github.com/apache/spark/pull/29585#discussion_r490701918
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala
## @@ -458,7 +463,10 @@ object RewriteCorrelatedScalarSubquery extends Rule[LogicalPlan] {
         sys.error(s"Unexpected operator in scalar subquery: $lp")
     }
-    val resultMap = evalPlan(plan)
+    val resultMap = evalPlan(plan).mapValues { _.transform {
+      case a: Alias => a.newInstance() // Assigns a new `ExprId`
Review comment: I don't quite understand this change. Will it break parent nodes that refer to this alias?
[GitHub] [spark] cloud-fan commented on a change in pull request #29780: [SPARK-32906][SQL] Struct field names should not change after normalizing floats
cloud-fan commented on a change in pull request #29780: URL: https://github.com/apache/spark/pull/29780#discussion_r490699951

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala
## @@ -129,10 +129,10 @@ object NormalizeFloatingNumbers extends Rule[LogicalPlan] {
      Coalesce(children.map(normalize))
    case _ if expr.dataType.isInstanceOf[StructType] =>
-      val fields = expr.dataType.asInstanceOf[StructType].fields.indices.map { i =>
-        normalize(GetStructField(expr, i))
+      val fields = expr.dataType.asInstanceOf[StructType].fieldNames.zipWithIndex.map {
+        case (name, i) => Seq(Literal(name), normalize(GetStructField(expr, i)))
      }
-      val struct = CreateStruct(fields)
+      val struct = CreateNamedStruct(fields.flatten.toSeq)

Review comment:
Yea I think so. `CreateNamedStruct` allows you to specify the name for each field.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29780: [SPARK-32906][SQL] Struct field names should not change after normalizing floats
HyukjinKwon commented on a change in pull request #29780: URL: https://github.com/apache/spark/pull/29780#discussion_r490697423

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala
## @@ -129,10 +129,10 @@ object NormalizeFloatingNumbers extends Rule[LogicalPlan] {
      Coalesce(children.map(normalize))
    case _ if expr.dataType.isInstanceOf[StructType] =>
-      val fields = expr.dataType.asInstanceOf[StructType].fields.indices.map { i =>
-        normalize(GetStructField(expr, i))
+      val fields = expr.dataType.asInstanceOf[StructType].fieldNames.zipWithIndex.map {
+        case (name, i) => Seq(Literal(name), normalize(GetStructField(expr, i)))
      }
-      val struct = CreateStruct(fields)
+      val struct = CreateNamedStruct(fields.flatten.toSeq)

Review comment:
Looks logically fine, but just to confirm that I understood correctly: it's just a cleanup fix to match the column names, right? And it doesn't affect the final output?
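A plain-Scala sketch (hypothetical helper names, not Spark's `Expression` classes) of the shape of the fix discussed above: `CreateNamedStruct` takes an interleaved sequence `(name1, value1, name2, value2, ...)`, so normalized struct fields keep their original names instead of falling back to generated ones.

```scala
// Toy stand-ins for Spark's struct fields and float normalization.
case class Field(name: String, value: Double)

// Illustrative normalization: map -0.0 to 0.0 (the primitive comparison
// 0.0 == -0.0 is true, so both zeros normalize to positive zero).
def normalize(v: Double): Double = if (v == 0.0) 0.0 else v

object NamedStructDemo extends App {
  val struct = Seq(Field("lat", -0.0), Field("lon", 3.5))
  // Interleave each field name with its normalized value, then flatten,
  // mirroring `CreateNamedStruct(fields.flatten.toSeq)` in the diff above.
  val args: Seq[Any] = struct.flatMap(f => Seq(f.name, normalize(f.value)))
  assert(args == Seq("lat", 0.0, "lon", 3.5))
  println(args.mkString(", "))
}
```

The point of the change is only that the names travel alongside the values; the normalized values themselves are unchanged.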
[GitHub] [spark] AmplabJenkins commented on pull request #29795: [SPARK-32511][SQL][WIP] Add dropFields method to Column class
AmplabJenkins commented on pull request #29795: URL: https://github.com/apache/spark/pull/29795#issuecomment-694645847
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29795: [SPARK-32511][SQL][WIP] Add dropFields method to Column class
AmplabJenkins removed a comment on pull request #29795: URL: https://github.com/apache/spark/pull/29795#issuecomment-694645847
[GitHub] [spark] SparkQA removed a comment on pull request #29795: [SPARK-32511][SQL][WIP] Add dropFields method to Column class
SparkQA removed a comment on pull request #29795: URL: https://github.com/apache/spark/pull/29795#issuecomment-694556609

**[Test build #128840 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128840/testReport)** for PR 29795 at commit [`ac149a5`](https://github.com/apache/spark/commit/ac149a52b88e7eb9ab5aa50d1cf341caf9f59faa).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29797: [SPARK-32932][SQL] Do not change number of partitions on RepartitionByExpression when coalescing disabled
AmplabJenkins removed a comment on pull request #29797: URL: https://github.com/apache/spark/pull/29797#issuecomment-694645141
[GitHub] [spark] AmplabJenkins commented on pull request #29797: [SPARK-32932][SQL] Do not change number of partitions on RepartitionByExpression when coalescing disabled
AmplabJenkins commented on pull request #29797: URL: https://github.com/apache/spark/pull/29797#issuecomment-694645141
[GitHub] [spark] SparkQA commented on pull request #29795: [SPARK-32511][SQL][WIP] Add dropFields method to Column class
SparkQA commented on pull request #29795: URL: https://github.com/apache/spark/pull/29795#issuecomment-694645071

**[Test build #128840 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128840/testReport)** for PR 29795 at commit [`ac149a5`](https://github.com/apache/spark/commit/ac149a52b88e7eb9ab5aa50d1cf341caf9f59faa).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29797: [SPARK-32932][SQL] Do not change number of partitions on RepartitionByExpression when coalescing disabled
SparkQA commented on pull request #29797: URL: https://github.com/apache/spark/pull/29797#issuecomment-694644786

**[Test build #128851 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128851/testReport)** for PR 29797 at commit [`fc831f6`](https://github.com/apache/spark/commit/fc831f67d4a52bd148328857cf4bac640434ff2b).
[GitHub] [spark] cloud-fan commented on a change in pull request #29767: [SPARK-32896][SS] Add DataStreamWriter.table API
cloud-fan commented on a change in pull request #29767: URL: https://github.com/apache/spark/pull/29767#discussion_r490696111

## File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
## @@ -300,54 +301,44 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
        "write files of Hive data source directly.")
    }
-    if (source == "memory") {
+    if (source == SOURCE_NAME_TABLE) {
+      assertNotPartitioned("table")
+
+      import df.sparkSession.sessionState.analyzer.CatalogAndIdentifier
+
+      import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
+      val CatalogAndIdentifier(catalog, identifier) = df.sparkSession.sessionState.sqlParser

Review comment:
> (only if the temp view is a single data source scan node)

As I mentioned before, the temp view must be very simple, like `spark.table(name)` or `CREATE TEMP VIEW v USING parquet OPTIONS(...)`. I believe there are tests, but I don't remember where they are. You can update `ResolveRelations` to drop the support of inserting temp views, and see which tests fail.

For this particular PR, I'm OK to not support temp view for now, as we need to refactor it a little bit and have a logical plan for streaming write. But for consistency with other places that look up a table, we should still look up temp views, and just fail if a temp view is returned.
[GitHub] [spark] wzhfy edited a comment on pull request #29764: [SPARK-32738][CORE][2.4] Should reduce the number of active threads if fatal error happens in `Inbox.process`
wzhfy edited a comment on pull request #29764: URL: https://github.com/apache/spark/pull/29764#issuecomment-694643818

Finally all tests are passed... cc @cloud-fan @mridulm This is a backport for branch 2.4.
[GitHub] [spark] wzhfy commented on pull request #29764: [SPARK-32738][CORE][2.4] Should reduce the number of active threads if fatal error happens in `Inbox.process`
wzhfy commented on pull request #29764: URL: https://github.com/apache/spark/pull/29764#issuecomment-694643818

Finally all tests are passed.. cc @cloud-fan @mridulm This is a backport for branch 2.4.
[GitHub] [spark] manuzhang opened a new pull request #29797: [SPARK-32932][SQL] Do not change number of partitions on RepartitionByExpression when coalescing disabled
manuzhang opened a new pull request #29797: URL: https://github.com/apache/spark/pull/29797

### What changes were proposed in this pull request?
Do not change the number of partitions on `RepartitionByExpression` when `spark.sql.adaptive.coalescePartitions.enabled=false`.

### Why are the changes needed?
Users usually repartition by the partition column before a dynamic partition overwrite. AQE can break this by changing the number of partitions, either when coalescing shuffle partitions or when using the local shuffle reader. That can lead to a large number of output files, even exceeding the file system limit. A simple fix is to let users disable this behavior by setting partition coalescing to false.

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
Added a test.
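As a hedged illustration of the workaround the PR description proposes: the config key `spark.sql.adaptive.coalescePartitions.enabled` is the real one cited above, but the session, DataFrame, column, and table names below are made up for the example.

```scala
// Illustrative only: disable AQE partition coalescing so that a repartition
// by the partition column keeps its requested number of shuffle partitions
// before a dynamic partition overwrite (the scenario from SPARK-32932).
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "false")

df.repartition($"dt")          // aim for one output file group per dt value
  .write
  .mode("overwrite")
  .insertInto("target_table")  // hypothetical dynamic-partition-overwrite target
```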
[GitHub] [spark] maropu commented on a change in pull request #29794: [SPARK-32927][SQL] Bitwise OR, AND and XOR should have similar canonicalization rules to boolean OR and AND
maropu commented on a change in pull request #29794: URL: https://github.com/apache/spark/pull/29794#discussion_r490693684

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
## @@ -95,4 +96,19 @@ class CanonicalizeSuite extends SparkFunSuite {
    val castWithTimeZoneId = Cast(literal, LongType, Some(TimeZone.getDefault.getID))
    assert(castWithTimeZoneId.semanticEquals(cast))
  }
+
+  test("SPARK-32927: Bitwise operations are commutative") {

Review comment:
Could you add more test cases, e.g., non-deterministic exprs, literals, and mixed cases of `BitwiseOr`/`BitwiseAnd`/`BitwiseXor`?
[GitHub] [spark] tanelk commented on a change in pull request #29794: [SPARK-32927][SQL] Bitwise OR, AND and XOR should have similar canonicalization rules to boolean OR and AND
tanelk commented on a change in pull request #29794: URL: https://github.com/apache/spark/pull/29794#discussion_r490689250

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
## @@ -95,4 +96,19 @@ class CanonicalizeSuite extends SparkFunSuite {
    val castWithTimeZoneId = Cast(literal, LongType, Some(TimeZone.getDefault.getID))
    assert(castWithTimeZoneId.semanticEquals(cast))
  }
+
+  test("SPARK-32927: Bitwise operations are commutative") {
+    Seq(

Review comment:
Oh, I didn't know that syntax, thank you, will change.
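A minimal plain-Scala sketch of the canonicalization idea under discussion (hypothetical classes, not Spark's `Expression` tree): a commutative operator reorders its operands deterministically, here by `hashCode` as Spark's canonicalization does, so that semantically equal trees become structurally equal.

```scala
// Toy expression tree with one commutative operator.
sealed trait Expr
case class Leaf(name: String) extends Expr
case class BitOr(left: Expr, right: Expr) extends Expr

// Reorder the operands of the commutative node deterministically so that
// BitOr(a, b) and BitOr(b, a) canonicalize to the same tree.
def canonicalize(e: Expr): Expr = e match {
  case BitOr(l, r) =>
    val sorted = Seq(canonicalize(l), canonicalize(r)).sortBy(_.hashCode)
    BitOr(sorted(0), sorted(1))
  case other => other
}

object CanonicalizeDemo extends App {
  val ab = canonicalize(BitOr(Leaf("a"), Leaf("b")))
  val ba = canonicalize(BitOr(Leaf("b"), Leaf("a")))
  assert(ab == ba) // commutative trees agree after canonicalization
}
```

Spark additionally guards this with determinism checks; the sketch only shows the reordering step itself.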
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
HyukjinKwon commented on a change in pull request #29796: URL: https://github.com/apache/spark/pull/29796#discussion_r490687211

## File path: core/src/main/scala/org/apache/spark/SparkContext.scala
## @@ -1881,7 +1881,7 @@ class SparkContext(config: SparkConf) extends Logging {
      if (!fs.exists(hadoopPath)) {
        throw new FileNotFoundException(s"Jar ${path} not found")
      }
-      if (fs.isDirectory(hadoopPath)) {
+      if (fs.getFileStatus(hadoopPath).isDirectory) {

Review comment:
@williamhyun, I think they were deprecated to avoid calling redundant `getFileStatus` at [HADOOP-13321](https://issues.apache.org/jira/browse/HADOOP-13321):

> These methods have a habit of fostering inefficient call patterns in applications, resulting in multiple redundant getFileStatus calls.

You can have one util that takes both `FileStatus` or `path` for example. If you'd like to take a more conservative approach, you can keep only some valid calls (with `fs.exists()` as an example).
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
HyukjinKwon commented on a change in pull request #29796: URL: https://github.com/apache/spark/pull/29796#discussion_r490687765

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
## @@ -1192,7 +1192,7 @@ class HiveDDLSuite
      expectedDBUri,
      Map.empty))
    // the database directory was created
-    assert(fs.exists(dbPath) && fs.isDirectory(dbPath))
+    assert(fs.exists(dbPath) && fs.getFileStatus(dbPath).isDirectory)

Review comment:
this too
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
HyukjinKwon commented on a change in pull request #29796: URL: https://github.com/apache/spark/pull/29796#discussion_r490687648

## File path: core/src/test/scala/org/apache/spark/deploy/history/EventLogFileWritersSuite.scala
## @@ -213,7 +213,7 @@ class SingleEventLogFileWriterSuite extends EventLogFileWritersSuite {
      compressionCodecShortName)
    val finalLogPath = new Path(logPath)
-    assert(fileSystem.exists(finalLogPath) && fileSystem.isFile(finalLogPath))
+    assert(fileSystem.exists(finalLogPath) && fileSystem.getFileStatus(finalLogPath).isFile)

Review comment:
this looks valid too
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
HyukjinKwon commented on a change in pull request #29796: URL: https://github.com/apache/spark/pull/29796#discussion_r490687461

## File path: core/src/main/scala/org/apache/spark/SparkContext.scala
## @@ -1881,7 +1881,7 @@ class SparkContext(config: SparkConf) extends Logging {
      if (!fs.exists(hadoopPath)) {
        throw new FileNotFoundException(s"Jar ${path} not found")
      }
-      if (fs.isDirectory(hadoopPath)) {
+      if (fs.getFileStatus(hadoopPath).isDirectory) {

Review comment:
For example, `fs.getFileStatus(hadoopPath).isDirectory` here looks valid.
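The redundant-`getFileStatus` concern from HADOOP-13321 can be sketched outside Hadoop. The snippet below uses `java.nio` as an analogy (an assumption for illustration, not the Hadoop `FileSystem` API): one metadata read answers several questions that would otherwise each stat the path again.

```scala
import java.nio.file.{Files, Paths}
import java.nio.file.attribute.BasicFileAttributes

object StatOnceDemo extends App {
  val path = Paths.get(System.getProperty("java.io.tmpdir"))

  // One metadata read; it throws NoSuchFileException for a missing path,
  // which doubles as the existence check (no separate exists() probe).
  val attrs: BasicFileAttributes =
    Files.readAttributes(path, classOf[BasicFileAttributes])

  assert(attrs.isDirectory)    // answered from the cached attributes
  assert(!attrs.isRegularFile) // no second stat call needed
}
```

The Hadoop equivalent of the same pattern is fetching a `FileStatus` once via `fs.getFileStatus(path)` and then calling `isDirectory`/`isFile` on that object, which is exactly what the diff above does.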
[GitHub] [spark] HeartSaVioR commented on a change in pull request #29756: [SPARK-32885][SS] Add DataStreamReader.table API
HeartSaVioR commented on a change in pull request #29756: URL: https://github.com/apache/spark/pull/29756#discussion_r490687079

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
## @@ -846,9 +847,9 @@ class Analyzer(
   */
  object ResolveTempViews extends Rule[LogicalPlan] {
    def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
-      case u @ UnresolvedRelation(ident, _) =>
+      case u @ UnresolvedRelation(ident, _, _) =>
        lookupTempView(ident).getOrElse(u)

Review comment:
OK, I agree it would be better to fix this in a different PR. I still think we shouldn't support confusing things just because we supported them before. The fix would depend on a change in this PR (the isStreaming flag) - I'll wait for this PR and try to fix it afterward.
[GitHub] [spark] AmplabJenkins commented on pull request #29789: [SPARK-32898][CORE] Fix wrong executorRunTime when task killed before real start
AmplabJenkins commented on pull request #29789: URL: https://github.com/apache/spark/pull/29789#issuecomment-694630959
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29767: [SPARK-32896][SS] Add DataStreamWriter.table API
AmplabJenkins removed a comment on pull request #29767: URL: https://github.com/apache/spark/pull/29767#issuecomment-694630939
[GitHub] [spark] AmplabJenkins commented on pull request #29767: [SPARK-32896][SS] Add DataStreamWriter.table API
AmplabJenkins commented on pull request #29767: URL: https://github.com/apache/spark/pull/29767#issuecomment-694630939
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29789: [SPARK-32898][CORE] Fix wrong executorRunTime when task killed before real start
AmplabJenkins removed a comment on pull request #29789: URL: https://github.com/apache/spark/pull/29789#issuecomment-694630959
[GitHub] [spark] SparkQA commented on pull request #29767: [SPARK-32896][SS] Add DataStreamWriter.table API
SparkQA commented on pull request #29767: URL: https://github.com/apache/spark/pull/29767#issuecomment-694630669

**[Test build #128850 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128850/testReport)** for PR 29767 at commit [`4fd5cb9`](https://github.com/apache/spark/commit/4fd5cb9586c9a2badd4025082f811829c60bade6).
[GitHub] [spark] SparkQA commented on pull request #29789: [SPARK-32898][CORE] Fix wrong executorRunTime when task killed before real start
SparkQA commented on pull request #29789: URL: https://github.com/apache/spark/pull/29789#issuecomment-694630611

**[Test build #128849 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128849/testReport)** for PR 29789 at commit [`d1ef396`](https://github.com/apache/spark/commit/d1ef396289d7a952843a5ae5f246a1bee60ac056).
[GitHub] [spark] HeartSaVioR commented on a change in pull request #29767: [SPARK-32896][SS] Add DataStreamWriter.table API
HeartSaVioR commented on a change in pull request #29767: URL: https://github.com/apache/spark/pull/29767#discussion_r490683314

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala
## @@ -169,26 +171,32 @@ class InMemoryTable(
    new WriteBuilder with SupportsTruncate with SupportsOverwrite with SupportsDynamicOverwrite {
      private var writer: BatchWrite = Append
+      private var streamingWriter: StreamingWrite = StreamingAppend

      override def truncate(): WriteBuilder = {
        assert(writer == Append)
        writer = TruncateAndAppend
+        streamingWriter = StreamingTruncateAndAppend
        this
      }

      override def overwrite(filters: Array[Filter]): WriteBuilder = {
        assert(writer == Append)
        writer = new Overwrite(filters)
+        // streaming writer doesn't have equivalent semantic

Review comment:
Just changed to explicitly fail for the case.
[GitHub] [spark] Ngone51 commented on a change in pull request #29789: [SPARK-32898][CORE] Fix wrong executorRunTime when task killed before real start
Ngone51 commented on a change in pull request #29789: URL: https://github.com/apache/spark/pull/29789#discussion_r490682482

## File path: core/src/main/scala/org/apache/spark/executor/Executor.scala
## @@ -400,7 +400,10 @@ private[spark] class Executor(
        // Report executor runtime and JVM gc time
        Option(task).foreach(t => {
          t.metrics.setExecutorRunTime(TimeUnit.NANOSECONDS.toMillis(
-            System.nanoTime() - taskStartTimeNs))
+            // SPARK-32898: it's possible that a task be killed before it reaches
+            // "taskStartTimeNs = System.nanoTime()". In this case, the executorRunTime

Review comment:
Thanks, addressed.
[GitHub] [spark] williamhyun closed pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
williamhyun closed pull request #29796: URL: https://github.com/apache/spark/pull/29796
[GitHub] [spark] SparkQA removed a comment on pull request #29268: [SPARK-29544] Collect the row count info and optimize the skewed conditions with row count info in the OptimizeSkewedJoin rule
SparkQA removed a comment on pull request #29268: URL: https://github.com/apache/spark/pull/29268#issuecomment-694629097

**[Test build #128848 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128848/testReport)** for PR 29268 at commit [`d0e0084`](https://github.com/apache/spark/commit/d0e0084211109c819d2176b85c26f02abf09ec77).
[GitHub] [spark] williamhyun commented on a change in pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
williamhyun commented on a change in pull request #29796: URL: https://github.com/apache/spark/pull/29796#discussion_r490682171

## File path: core/src/main/scala/org/apache/spark/SparkContext.scala
## @@ -1881,7 +1881,7 @@ class SparkContext(config: SparkConf) extends Logging {
      if (!fs.exists(hadoopPath)) {
        throw new FileNotFoundException(s"Jar ${path} not found")
      }
-      if (fs.isDirectory(hadoopPath)) {
+      if (fs.getFileStatus(hadoopPath).isDirectory) {

Review comment:
I missed that, sorry. In that case, the utility methods would be identical to the deprecated Hadoop 3.2 `isFile`/`isDirectory` methods themselves. I'll close my PR. Thank you for your feedback.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29268: [SPARK-29544] Collect the row count info and optimize the skewed conditions with row count info in the OptimizeSkewedJoin rule
AmplabJenkins removed a comment on pull request #29268: URL: https://github.com/apache/spark/pull/29268#issuecomment-694629616 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128848/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29268: [SPARK-29544] Collect the row count info and optimize the skewed conditions with row count info in the OptimizeSkewedJoin rule
AmplabJenkins commented on pull request #29268: URL: https://github.com/apache/spark/pull/29268#issuecomment-694629611
[GitHub] [spark] SparkQA commented on pull request #29268: [SPARK-29544] Collect the row count info and optimize the skewed conditions with row count info in the OptimizeSkewedJoin rule
SparkQA commented on pull request #29268: URL: https://github.com/apache/spark/pull/29268#issuecomment-694629601 **[Test build #128848 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128848/testReport)** for PR 29268 at commit [`d0e0084`](https://github.com/apache/spark/commit/d0e0084211109c819d2176b85c26f02abf09ec77).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29268: [SPARK-29544] Collect the row count info and optimize the skewed conditions with row count info in the OptimizeSkewedJoin rule
AmplabJenkins removed a comment on pull request #29268: URL: https://github.com/apache/spark/pull/29268#issuecomment-694629611 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29268: [SPARK-29544] Collect the row count info and optimize the skewed conditions with row count info in the OptimizeSkewedJoin rule
AmplabJenkins removed a comment on pull request #29268: URL: https://github.com/apache/spark/pull/29268#issuecomment-694629351
[GitHub] [spark] AmplabJenkins commented on pull request #29268: [SPARK-29544] Collect the row count info and optimize the skewed conditions with row count info in the OptimizeSkewedJoin rule
AmplabJenkins commented on pull request #29268: URL: https://github.com/apache/spark/pull/29268#issuecomment-694629351
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29732: [SPARK-32857][CORE] Fix flaky o.a.s.s.BarrierTaskContextSuite.throw exception if the number of barrier() calls are not the same on every task
AmplabJenkins removed a comment on pull request #29732: URL: https://github.com/apache/spark/pull/29732#issuecomment-694629309
[GitHub] [spark] AmplabJenkins commented on pull request #29732: [SPARK-32857][CORE] Fix flaky o.a.s.s.BarrierTaskContextSuite.throw exception if the number of barrier() calls are not the same on every task
AmplabJenkins commented on pull request #29732: URL: https://github.com/apache/spark/pull/29732#issuecomment-694629309
[GitHub] [spark] yaooqinn edited a comment on pull request #29777: [SPARK-32905][Core][Yarn] ApplicationMaster fails to receive UpdateDelegationTokens message
yaooqinn edited a comment on pull request #29777: URL: https://github.com/apache/spark/pull/29777#issuecomment-694607845

Thanks, @tgravescs and @mridulm. I am running the sub-module `kyuubi-spark-sql-engine` of https://github.com/yaooqinn/kyuubi. The simplest way to reproduce the bug and verify this fix is to follow these steps:

1. Build the `kyuubi-spark-sql-engine` module:
```
mvn clean package -pl :kyuubi-spark-sql-engine
```
2. Configure Spark with the Kerberos settings towards your secured cluster.
3. Start it in the background:
```
nohup bin/spark-submit --class org.apache.kyuubi.engine.spark.SparkSQLEngine ../kyuubi-spark-sql-engine-1.0.0-SNAPSHOT.jar > kyuubi.log &
```
4. Check the AM log: "Updating delegation tokens ..." indicates SUCCESS; "Inbox: Ignoring error .. does not implement 'receive'" indicates FAILURE.
[GitHub] [spark] SparkQA commented on pull request #29268: [SPARK-29544] Collect the row count info and optimize the skewed conditions with row count info in the OptimizeSkewedJoin rule
SparkQA commented on pull request #29268: URL: https://github.com/apache/spark/pull/29268#issuecomment-694629097 **[Test build #128848 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128848/testReport)** for PR 29268 at commit [`d0e0084`](https://github.com/apache/spark/commit/d0e0084211109c819d2176b85c26f02abf09ec77).
[GitHub] [spark] SparkQA commented on pull request #29732: [SPARK-32857][CORE] Fix flaky o.a.s.s.BarrierTaskContextSuite.throw exception if the number of barrier() calls are not the same on every task
SparkQA commented on pull request #29732: URL: https://github.com/apache/spark/pull/29732#issuecomment-694629080 **[Test build #128847 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128847/testReport)** for PR 29732 at commit [`6e7044c`](https://github.com/apache/spark/commit/6e7044c458233938b7070d97858dc6da742a28af).
[GitHub] [spark] Ngone51 commented on pull request #29732: [SPARK-32857][CORE] Fix flaky o.a.s.s.BarrierTaskContextSuite.throw exception if the number of barrier() calls are not the same on every task
Ngone51 commented on pull request #29732: URL: https://github.com/apache/spark/pull/29732#issuecomment-694628984

> can we simply resolve the issue by setting longer barrier sync timeout?

Increased the timeout to 5s. I think it should be long enough.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29789: [SPARK-32898][CORE] Fix wrong executorRunTime when task killed before real start
dongjoon-hyun commented on a change in pull request #29789: URL: https://github.com/apache/spark/pull/29789#discussion_r490677550

## File path: core/src/main/scala/org/apache/spark/executor/Executor.scala

## @@ -400,7 +400,10 @@ private[spark] class Executor(
       // Report executor runtime and JVM gc time
       Option(task).foreach(t => {
         t.metrics.setExecutorRunTime(TimeUnit.NANOSECONDS.toMillis(
-          System.nanoTime() - taskStartTimeNs))
+          // SPARK-32898: it's possible that a task be killed before it reaches
+          // "taskStartTimeNs = System.nanoTime()". In this case, the executorRunTime

Review comment: `before it reaches "taskStartTimeNs = System.nanoTime()"` might be a little misleading because this looks like an equality. If we want to point to [line 470](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L470), can we use a different wording? For example, something like: `it's possible that a task is killed while taskStartTimeNs still has the initial value (=0)`.
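The race discussed in this review can be sketched in isolation. The following is a hypothetical illustration, not Spark's actual code (the helper name and signature are invented): if a task is killed while `taskStartTimeNs` still holds its initial value 0, `System.nanoTime() - taskStartTimeNs` would report a nonsense runtime, so a guarded helper falls back to 0 instead.

```java
import java.util.concurrent.TimeUnit;

final class RunTimeSketch {
    // Hypothetical helper, not Spark's code: converts a task's elapsed time to
    // milliseconds, but treats an unset start time (initial value 0) as
    // "no run time" instead of subtracting 0 from the current nanoTime.
    static long executorRunTimeMillis(long taskStartTimeNs, long nowNs) {
        if (taskStartTimeNs <= 0) {
            return 0L; // killed before reaching taskStartTimeNs = System.nanoTime()
        }
        return TimeUnit.NANOSECONDS.toMillis(nowNs - taskStartTimeNs);
    }
}
```

Without the guard, a task killed at, say, 10 minutes of executor uptime would be charged the full uptime as run time, which is the wrong metric SPARK-32898 fixes.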
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
HyukjinKwon commented on a change in pull request #29796: URL: https://github.com/apache/spark/pull/29796#discussion_r490673471

## File path: core/src/main/scala/org/apache/spark/SparkContext.scala

## @@ -1881,7 +1881,7 @@ class SparkContext(config: SparkConf) extends Logging {
       if (!fs.exists(hadoopPath)) {
         throw new FileNotFoundException(s"Jar ${path} not found")
       }
-      if (fs.isDirectory(hadoopPath)) {
+      if (fs.getFileStatus(hadoopPath).isDirectory) {

Review comment: @williamhyun, it looks like `fs.getFileStatus(hadoopPath).isDirectory` isn't an exact synonym of `fs.isDirectory`, e.g.:

```java
public boolean isDirectory(Path f) throws IOException {
  try {
    return getFileStatus(f).isDirectory();
  } catch (FileNotFoundException e) {
    return false;  // f does not exist
  }
}
```

I think you can create utility methods under `Utils.scala` to reuse.
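The point above is that the deprecated convenience method swallows `FileNotFoundException`, so any `Utils.scala` helper would need to reproduce that. A minimal sketch, with Hadoop's `FileSystem.getFileStatus` stubbed by a tiny functional interface so it compiles without a Hadoop dependency (all names here are hypothetical, not Spark's or Hadoop's actual API):

```java
import java.io.FileNotFoundException;

interface StatusLookup {
    // Stands in for fs.getFileStatus(path).isDirectory; throws
    // FileNotFoundException when the path does not exist, as Hadoop does.
    boolean directoryStatus(String path) throws FileNotFoundException;
}

final class FsUtilsSketch {
    // Mirrors the deprecated FileSystem.isDirectory contract: a missing path
    // yields false rather than propagating the exception.
    static boolean isDirectory(StatusLookup fs, String path) {
        try {
            return fs.directoryStatus(path);
        } catch (FileNotFoundException e) {
            return false; // path does not exist
        }
    }
}
```

This is exactly why a naive find-and-replace of `fs.isDirectory(p)` with `fs.getFileStatus(p).isDirectory` changes behavior for nonexistent paths.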
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29789: [SPARK-32898][CORE] Fix wrong executorRunTime when task killed before real start
AmplabJenkins removed a comment on pull request #29789: URL: https://github.com/apache/spark/pull/29789#issuecomment-694620254
[GitHub] [spark] AmplabJenkins commented on pull request #29789: [SPARK-32898][CORE] Fix wrong executorRunTime when task killed before real start
AmplabJenkins commented on pull request #29789: URL: https://github.com/apache/spark/pull/29789#issuecomment-694620254
[GitHub] [spark] SparkQA commented on pull request #29789: [SPARK-32898][CORE] Fix wrong executorRunTime when task killed before real start
SparkQA commented on pull request #29789: URL: https://github.com/apache/spark/pull/29789#issuecomment-694619794 **[Test build #128846 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128846/testReport)** for PR 29789 at commit [`91623a7`](https://github.com/apache/spark/commit/91623a750755704af7575df577d2e32851e81954).
[GitHub] [spark] HyukjinKwon commented on pull request #29788: [SPARK-32913][CORE][K8S] Improve ExecutorDecommissionInfo and ExecutorDecommissionState for different use cases
HyukjinKwon commented on pull request #29788: URL: https://github.com/apache/spark/pull/29788#issuecomment-694618798 @tgravescs, @mridulm, @squito FYI
[GitHub] [spark] HyukjinKwon commented on pull request #29789: [SPARK-32898][CORE] Fix wrong executorRunTime when task killed before real start
HyukjinKwon commented on pull request #29789: URL: https://github.com/apache/spark/pull/29789#issuecomment-694618522 retest this please
[GitHub] [spark] dongjoon-hyun removed a comment on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
dongjoon-hyun removed a comment on pull request #29796: URL: https://github.com/apache/spark/pull/29796#issuecomment-694618049 ok to test
[GitHub] [spark] HyukjinKwon closed pull request #29786: [SPARK-32908][SQL][2.4] Fix target error calculation in `percentile_approx()`
HyukjinKwon closed pull request #29786: URL: https://github.com/apache/spark/pull/29786
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29782: [SPARK-32907][ML] adaptively blockify instances - revert blockify gmm
HyukjinKwon commented on a change in pull request #29782: URL: https://github.com/apache/spark/pull/29782#discussion_r490671441

## File path: python/pyspark/ml/clustering.py

## @@ -432,13 +429,6 @@ def setAggregationDepth(self, value):
         """
         return self._set(aggregationDepth=value)

-    @since("3.1.0")
-    def setBlockSize(self, value):

Review comment: @zero323 FYI
[GitHub] [spark] AmplabJenkins commented on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
AmplabJenkins commented on pull request #29796: URL: https://github.com/apache/spark/pull/29796#issuecomment-694618030 Can one of the admins verify this patch?
[GitHub] [spark] dongjoon-hyun commented on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
dongjoon-hyun commented on pull request #29796: URL: https://github.com/apache/spark/pull/29796#issuecomment-694618049 ok to test
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
AmplabJenkins removed a comment on pull request #29796: URL: https://github.com/apache/spark/pull/29796#issuecomment-694617722
[GitHub] [spark] AmplabJenkins commented on pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
AmplabJenkins commented on pull request #29796: URL: https://github.com/apache/spark/pull/29796#issuecomment-694617722 Can one of the admins verify this patch?
[GitHub] [spark] williamhyun opened a new pull request #29796: [SPARK-32930][CORE] Replace deprecated isFile/isDirectory methods
williamhyun opened a new pull request #29796: URL: https://github.com/apache/spark/pull/29796

### What changes were proposed in this pull request?

```scala
- fs.isDirectory(hadoopPath)
+ fs.getFileStatus(hadoopPath).isDirectory
```

```scala
- fs.isFile(new Path(inProgressLog))
+ fs.getFileStatus(new Path(inProgressLog)).isFile
```

### Why are the changes needed?

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2-hive-2.3/1244/consoleFull

```
[warn] /home/jenkins/workspace/spark-master-test-sbt-hadoop-3.2-hive-2.3/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:815: method isFile in class FileSystem is deprecated: see corresponding Javadoc for more information.
[warn]     if (!fs.isFile(new Path(inProgressLog))) {
```

```
[warn] /home/jenkins/workspace/spark-master-test-sbt-hadoop-3.2-hive-2.3/core/src/main/scala/org/apache/spark/SparkContext.scala:1884: method isDirectory in class FileSystem is deprecated: see corresponding Javadoc for more information.
[warn]     if (fs.isDirectory(hadoopPath)) {
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the Jenkins.
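The behavioral caveat raised in HyukjinKwon's review above (the status-lookup replacement throws on a missing path where the deprecated method returned false) has a close JDK analogue. The sketch below uses `java.nio.file` purely for illustration, not Hadoop's API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

class DirCheckSketch {
    // Convenience style, like the deprecated fs.isDirectory: a missing path is
    // simply reported as "not a directory", with no exception.
    static boolean lenientIsDirectory(String path) {
        return Files.isDirectory(Paths.get(path));
    }

    // Status-lookup style, like fs.getFileStatus(path).isDirectory: a missing
    // path surfaces as a NoSuchFileException instead of returning false.
    static boolean strictIsDirectory(String path) throws IOException {
        BasicFileAttributes attrs =
                Files.readAttributes(Paths.get(path), BasicFileAttributes.class);
        return attrs.isDirectory();
    }
}
```

For paths that exist, the two agree; they diverge only on nonexistent paths, which is why the review thread ends with the suggestion to keep the lenient semantics in a shared utility rather than inlining the strict call.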