[jira] [Commented] (SPARK-44469) Utils.getOrCreateLocalRootDirs will never take effect after the first call fails, even if the exception is recovered
[ https://issues.apache.org/jira/browse/SPARK-44469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17744047#comment-17744047 ] Yazhi Wang commented on SPARK-44469: I'll work on it.
> Utils.getOrCreateLocalRootDirs will never take effect after the first call
> fails, even if the exception is recovered
>
> Key: SPARK-44469
> URL: https://issues.apache.org/jira/browse/SPARK-44469
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.4.1
> Reporter: Yazhi Wang
> Priority: Minor
>
> {code:java}
> private[spark] def getOrCreateLocalRootDirs(conf: SparkConf): Array[String] = {
>   if (localRootDirs == null || localRootDirs.isEmpty) {
>     this.synchronized {
>       if (localRootDirs == null) {
>         localRootDirs = getOrCreateLocalRootDirsImpl(conf)
>       }
>     }
>   }
>   localRootDirs
> }{code}
> localRootDirs is only initialized once in the Executor/Driver life cycle. If
> it fails due to a FileSystem exception (such as a full disk) during the first
> initialization, localRootDirs will be assigned an empty value (None) instead
> of null.
> Even if the FileSystem exception is later recovered from, localRootDirs won't
> be re-created from the FileSystem, causing tasks on the Executor to keep
> failing (tasks rely on fetching files locally)
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44469) Utils.getOrCreateLocalRootDirs will never take effect after the first call fails, even if the exception is recovered
Yazhi Wang created SPARK-44469: -- Summary: Utils.getOrCreateLocalRootDirs will never take effect after the first call fails, even if the exception is recovered Key: SPARK-44469 URL: https://issues.apache.org/jira/browse/SPARK-44469 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.4.1 Reporter: Yazhi Wang
{code:java}
private[spark] def getOrCreateLocalRootDirs(conf: SparkConf): Array[String] = {
  if (localRootDirs == null || localRootDirs.isEmpty) {
    this.synchronized {
      if (localRootDirs == null) {
        localRootDirs = getOrCreateLocalRootDirsImpl(conf)
      }
    }
  }
  localRootDirs
}{code}
localRootDirs is only initialized once in the Executor/Driver life cycle. If it fails due to a FileSystem exception (such as a full disk) during the first initialization, localRootDirs will be assigned an empty value (None) instead of null. Even if the FileSystem exception is later recovered from, localRootDirs won't be re-created from the FileSystem, causing tasks on the Executor to keep failing (tasks rely on fetching files locally)
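For context, a minimal sketch of one possible direction (an illustration, not the actual Spark patch; `LocalDirsCache` and the `create` parameter are hypothetical names): keep the double-checked caching, but make the inner check match the outer one and cache only a non-empty result, so a later call can retry once the underlying FileSystem problem (e.g. a full disk) clears.

```scala
// Hypothetical sketch: cache the local root dirs, but leave the cache
// unset when creation fails so a later call can retry after the
// FileSystem error is resolved.
object LocalDirsCache {
  @volatile private var localRootDirs: Array[String] = null

  def getOrCreate(create: () => Array[String]): Array[String] = {
    if (localRootDirs == null || localRootDirs.isEmpty) {
      this.synchronized {
        // Inner check mirrors the outer one (the reported bug is that the
        // original inner check only tested for null, not emptiness).
        if (localRootDirs == null || localRootDirs.isEmpty) {
          val dirs = create()            // may return an empty array on failure
          if (dirs != null && dirs.nonEmpty) {
            localRootDirs = dirs         // cache only a successful result
          } else {
            return Array.empty           // do NOT cache the failure
          }
        }
      }
    }
    localRootDirs
  }
}
```

With this shape, a failed first call leaves the cache unset, and a later call invokes `create` again instead of returning the stale empty result.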
[jira] [Updated] (SPARK-41192) Task finished before speculative task scheduled leads to holding idle executors
[ https://issues.apache.org/jira/browse/SPARK-41192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yazhi Wang updated SPARK-41192: --- Description: When a task finishes before its speculative copy has been scheduled by the DAGScheduler, the speculative task is still considered pending and counts towards the calculation of the number of needed executors, which leads to requesting more executors than needed h2. Background & Reproduce In one of our production jobs, we found that ExecutorAllocationManager was holding more executors than needed. We found it difficult to reproduce in the test environment. In order to stably reproduce and debug, we temporarily commented out the scheduling code of speculative tasks in TaskSetManager:363 to ensure that the task is completed before the speculative task is scheduled.
{code:java}
// Original code
private def dequeueTask(
    execId: String,
    host: String,
    maxLocality: TaskLocality.Value): Option[(Int, TaskLocality.Value, Boolean)] = {
  // Tries to schedule a regular task first; if it returns None, then schedules
  // a speculative task
  dequeueTaskHelper(execId, host, maxLocality, false).orElse(
    dequeueTaskHelper(execId, host, maxLocality, true))
}

// Speculative task will never be scheduled
private def dequeueTask(
    execId: String,
    host: String,
    maxLocality: TaskLocality.Value): Option[(Int, TaskLocality.Value, Boolean)] = {
  // Tries to schedule a regular task first; if it returns None, then schedules
  // a speculative task
  dequeueTaskHelper(execId, host, maxLocality, false)
}
{code}
Referring to the examples in SPARK-30511: you will see that when running the last task, we would hold 38 executors (see attachment), which is exactly ceil((149 + 1) / 4) = 38. But actually only 2 tasks are running, which requires only Math.max(20, ceil(2 / 4)) = 20 executors (20 being the configured minimum).
{code:java}
./bin/spark-shell --master yarn --conf spark.speculation=true --conf spark.executor.cores=4 --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.minExecutors=20 --conf spark.dynamicAllocation.maxExecutors=1000
{code}
{code:java}
val n = 4000
val someRDD = sc.parallelize(1 to n, n)
someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => {
  if (index > 3998) {
    Thread.sleep(1000 * 1000)
  } else if (index > 3850) {
    Thread.sleep(50 * 1000) // Fake running tasks
  } else {
    Thread.sleep(100)
  }
  Array.fill[Int](1)(1).iterator
}).count(){code}
I will have a PR ready to fix this issue was: When a task finishes before its speculative copy has been scheduled by the DAGScheduler, the speculative task is still considered pending and counts towards the calculation of the number of needed executors, which leads to requesting more executors than needed h2. Background & Reproduce In one of our production jobs, we found that ExecutorAllocationManager was holding more executors than needed. We found it difficult to reproduce in the test environment. In order to stably reproduce and debug, we temporarily commented out the scheduling code of speculative tasks in TaskSetManager:363 to ensure that the task is completed before the speculative task is scheduled.
{code:java}
// Original code
private def dequeueTask(
    execId: String,
    host: String,
    maxLocality: TaskLocality.Value): Option[(Int, TaskLocality.Value, Boolean)] = {
  // Tries to schedule a regular task first; if it returns None, then schedules
  // a speculative task
  dequeueTaskHelper(execId, host, maxLocality, false).orElse(
    dequeueTaskHelper(execId, host, maxLocality, true))
}

// Speculative task will never be scheduled
private def dequeueTask(
    execId: String,
    host: String,
    maxLocality: TaskLocality.Value): Option[(Int, TaskLocality.Value, Boolean)] = {
  // Tries to schedule a regular task first; if it returns None, then schedules
  // a speculative task
  dequeueTaskHelper(execId, host, maxLocality, false)
}
{code}
Referring to the examples in SPARK-30511: you will see that when running the last task, we would hold 38 executors (see attachment), which is exactly ceil((149 + 1) / 4) = 38. But actually only 2 tasks are running, which requires only Math.max(20, ceil(2 / 4)) = 20 executors (20 being the configured minimum).
{code:java}
./bin/spark-shell --master yarn --conf spark.speculation=true --conf spark.executor.cores=4 --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.minExecutors=20 --conf spark.dynamicAllocation.maxExecutors=1000
{code}
{code:java}
val n = 4000
val someRDD = sc.parallelize(1 to n, n)
someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => {
  if (index > 3998) {
    Thread.sleep(1000 * 1000)
  } else if (index > 3850) {
    Thread.sleep(20 * 1000) // Fake running tasks
  } else {
    Thread.sleep(100)
  }
  Array.fill[Int](1)(1).iterator
}).count(){code}
I will have a PR ready to fix this issue > Task finished before speculative task scheduled leads to holding idle > executors >
[jira] [Updated] (SPARK-41192) Task finished before speculative task scheduled leads to holding idle executors
[ https://issues.apache.org/jira/browse/SPARK-41192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yazhi Wang updated SPARK-41192: --- Attachment: dynamic-executors
> Task finished before speculative task scheduled leads to holding idle
> executors
> ---
>
> Key: SPARK-41192
> URL: https://issues.apache.org/jira/browse/SPARK-41192
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.2.2, 3.3.1
> Reporter: Yazhi Wang
> Priority: Minor
> Labels: dynamic_allocation
> Attachments: dynamic-executors, dynamic-log
>
>
> When a task finishes before its speculative copy has been scheduled by the
> DAGScheduler, the speculative task is still considered pending and counts
> towards the calculation of the number of needed executors, which leads to
> requesting more executors than needed
> h2. Background & Reproduce
> In one of our production jobs, we found that ExecutorAllocationManager was
> holding more executors than needed.
> We found it difficult to reproduce in the test environment. In order to
> stably reproduce and debug, we temporarily commented out the scheduling code
> of speculative tasks in TaskSetManager:363 to ensure that the task is
> completed before the speculative task is scheduled.
> {code:java}
> // Original code
> private def dequeueTask(
>     execId: String,
>     host: String,
>     maxLocality: TaskLocality.Value): Option[(Int, TaskLocality.Value, Boolean)] = {
>   // Tries to schedule a regular task first; if it returns None, then schedules
>   // a speculative task
>   dequeueTaskHelper(execId, host, maxLocality, false).orElse(
>     dequeueTaskHelper(execId, host, maxLocality, true))
> }
>
> // Speculative task will never be scheduled
> private def dequeueTask(
>     execId: String,
>     host: String,
>     maxLocality: TaskLocality.Value): Option[(Int, TaskLocality.Value, Boolean)] = {
>   // Tries to schedule a regular task first; if it returns None, then schedules
>   // a speculative task
>   dequeueTaskHelper(execId, host, maxLocality, false)
> }{code}
> Referring to the examples in SPARK-30511: you will see that when running the
> last task, we would hold 38 executors (see attachment), which is exactly
> ceil((149 + 1) / 4) = 38. But actually only 2 tasks are running, which
> requires only Math.max(20, ceil(2 / 4)) = 20 executors (20 being the
> configured minimum).
> {code:java}
> ./bin/spark-shell --master yarn --conf spark.speculation=true --conf spark.executor.cores=4 --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.minExecutors=20 --conf spark.dynamicAllocation.maxExecutors=1000{code}
> {code:java}
> val n = 4000
> val someRDD = sc.parallelize(1 to n, n)
> someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => {
>   if (index > 3998) {
>     Thread.sleep(1000 * 1000)
>   } else if (index > 3850) {
>     Thread.sleep(20 * 1000) // Fake running tasks
>   } else {
>     Thread.sleep(100)
>   }
>   Array.fill[Int](1)(1).iterator
> }).count(){code}
>
> I will have a PR ready to fix this issue
[jira] [Updated] (SPARK-41192) Task finished before speculative task scheduled leads to holding idle executors
[ https://issues.apache.org/jira/browse/SPARK-41192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yazhi Wang updated SPARK-41192: --- Attachment: dynamic-log
> Task finished before speculative task scheduled leads to holding idle
> executors
> ---
>
> Key: SPARK-41192
> URL: https://issues.apache.org/jira/browse/SPARK-41192
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.2.2, 3.3.1
> Reporter: Yazhi Wang
> Priority: Minor
> Labels: dynamic_allocation
> Attachments: dynamic-executors, dynamic-log
>
>
> When a task finishes before its speculative copy has been scheduled by the
> DAGScheduler, the speculative task is still considered pending and counts
> towards the calculation of the number of needed executors, which leads to
> requesting more executors than needed
> h2. Background & Reproduce
> In one of our production jobs, we found that ExecutorAllocationManager was
> holding more executors than needed.
> We found it difficult to reproduce in the test environment. In order to
> stably reproduce and debug, we temporarily commented out the scheduling code
> of speculative tasks in TaskSetManager:363 to ensure that the task is
> completed before the speculative task is scheduled.
> {code:java}
> // Original code
> private def dequeueTask(
>     execId: String,
>     host: String,
>     maxLocality: TaskLocality.Value): Option[(Int, TaskLocality.Value, Boolean)] = {
>   // Tries to schedule a regular task first; if it returns None, then schedules
>   // a speculative task
>   dequeueTaskHelper(execId, host, maxLocality, false).orElse(
>     dequeueTaskHelper(execId, host, maxLocality, true))
> }
>
> // Speculative task will never be scheduled
> private def dequeueTask(
>     execId: String,
>     host: String,
>     maxLocality: TaskLocality.Value): Option[(Int, TaskLocality.Value, Boolean)] = {
>   // Tries to schedule a regular task first; if it returns None, then schedules
>   // a speculative task
>   dequeueTaskHelper(execId, host, maxLocality, false)
> }{code}
> Referring to the examples in SPARK-30511: you will see that when running the
> last task, we would hold 38 executors (see attachment), which is exactly
> ceil((149 + 1) / 4) = 38. But actually only 2 tasks are running, which
> requires only Math.max(20, ceil(2 / 4)) = 20 executors (20 being the
> configured minimum).
> {code:java}
> ./bin/spark-shell --master yarn --conf spark.speculation=true --conf spark.executor.cores=4 --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.minExecutors=20 --conf spark.dynamicAllocation.maxExecutors=1000{code}
> {code:java}
> val n = 4000
> val someRDD = sc.parallelize(1 to n, n)
> someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => {
>   if (index > 3998) {
>     Thread.sleep(1000 * 1000)
>   } else if (index > 3850) {
>     Thread.sleep(20 * 1000) // Fake running tasks
>   } else {
>     Thread.sleep(100)
>   }
>   Array.fill[Int](1)(1).iterator
> }).count(){code}
>
> I will have a PR ready to fix this issue
[jira] [Created] (SPARK-41192) Task finished before speculative task scheduled leads to holding idle executors
Yazhi Wang created SPARK-41192: -- Summary: Task finished before speculative task scheduled leads to holding idle executors Key: SPARK-41192 URL: https://issues.apache.org/jira/browse/SPARK-41192 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.3.1, 3.2.2 Reporter: Yazhi Wang When a task finishes before its speculative copy has been scheduled by the DAGScheduler, the speculative task is still considered pending and counts towards the calculation of the number of needed executors, which leads to requesting more executors than needed h2. Background & Reproduce In one of our production jobs, we found that ExecutorAllocationManager was holding more executors than needed. We found it difficult to reproduce in the test environment. In order to stably reproduce and debug, we temporarily commented out the scheduling code of speculative tasks in TaskSetManager:363 to ensure that the task is completed before the speculative task is scheduled.
{code:java}
// Original code
private def dequeueTask(
    execId: String,
    host: String,
    maxLocality: TaskLocality.Value): Option[(Int, TaskLocality.Value, Boolean)] = {
  // Tries to schedule a regular task first; if it returns None, then schedules
  // a speculative task
  dequeueTaskHelper(execId, host, maxLocality, false).orElse(
    dequeueTaskHelper(execId, host, maxLocality, true))
}

// Speculative task will never be scheduled
private def dequeueTask(
    execId: String,
    host: String,
    maxLocality: TaskLocality.Value): Option[(Int, TaskLocality.Value, Boolean)] = {
  // Tries to schedule a regular task first; if it returns None, then schedules
  // a speculative task
  dequeueTaskHelper(execId, host, maxLocality, false)
}
{code}
Referring to the examples in SPARK-30511: you will see that when running the last task, we would hold 38 executors (see attachment), which is exactly ceil((149 + 1) / 4) = 38. But actually only 2 tasks are running, which requires only Math.max(20, ceil(2 / 4)) = 20 executors (20 being the configured minimum).
{code:java}
./bin/spark-shell --master yarn --conf spark.speculation=true --conf spark.executor.cores=4 --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.minExecutors=20 --conf spark.dynamicAllocation.maxExecutors=1000
{code}
{code:java}
val n = 4000
val someRDD = sc.parallelize(1 to n, n)
someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => {
  if (index > 3998) {
    Thread.sleep(1000 * 1000)
  } else if (index > 3850) {
    Thread.sleep(20 * 1000) // Fake running tasks
  } else {
    Thread.sleep(100)
  }
  Array.fill[Int](1)(1).iterator
}).count(){code}
I will have a PR ready to fix this issue
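The arithmetic in the report can be made concrete with a simplified model (an illustration, not the real ExecutorAllocationManager code) of how pending plus running tasks translate into a target executor count. Stale speculative tasks inflate the pending count, so the target stays at ceil((149 + 1) / 4) = 38 even though the 2 genuinely running tasks only justify the configured minimum of 20:

```scala
// Simplified model: target executors = max(configured minimum,
// ceil((pending + running tasks) / tasks per executor)).
def executorsNeeded(pendingTasks: Int, runningTasks: Int,
                    tasksPerExecutor: Int, minExecutors: Int): Int = {
  // Integer ceiling division: ceil(a / b) == (a + b - 1) / b
  val fromTasks =
    (pendingTasks + runningTasks + tasksPerExecutor - 1) / tasksPerExecutor
  math.max(minExecutors, fromTasks)
}

// With 149 stale speculative tasks wrongly counted as pending plus 1
// running task: (149 + 1 + 3) / 4 = 38 executors are requested.
// With only the 2 truly running tasks: max(20, 1) = 20, the minimum.
```

This is why the job in the report held 38 executors instead of the 20 that dynamic allocation's `minExecutors` setting would otherwise dictate.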
[jira] [Updated] (SPARK-37469) Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI
[ https://issues.apache.org/jira/browse/SPARK-37469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yazhi Wang updated SPARK-37469: --- Attachment: executor-page.png sql-page.png
> Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI
> ---
>
> Key: SPARK-37469
> URL: https://issues.apache.org/jira/browse/SPARK-37469
> Project: Spark
> Issue Type: Improvement
> Components: Web UI
> Affects Versions: 3.2.0
> Reporter: Yazhi Wang
> Priority: Minor
> Attachments: executor-page.png, sql-page.png
>
>
> The metric is shown on the Executor/Task page as "Shuffle Read Block Time",
> but on the SQL page as "fetch wait time", which is confusing.
>
> !image-2021-11-26-12-12-46-896.png!
> !image-2021-11-26-12-15-28-204.png!
[jira] [Updated] (SPARK-37469) Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI
[ https://issues.apache.org/jira/browse/SPARK-37469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yazhi Wang updated SPARK-37469: --- Description: The metric is shown on the Executor/Task page as "Shuffle Read Block Time", but on the SQL page as "fetch wait time", which is confusing. !executor-page.png! !sql-page.png! was: The metric is shown on the Executor/Task page as "Shuffle Read Block Time", but on the SQL page as "fetch wait time", which is confusing. !executor-page.png!
> Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI
> ---
>
> Key: SPARK-37469
> URL: https://issues.apache.org/jira/browse/SPARK-37469
> Project: Spark
> Issue Type: Improvement
> Components: Web UI
> Affects Versions: 3.2.0
> Reporter: Yazhi Wang
> Priority: Minor
> Attachments: executor-page.png, sql-page.png
>
>
> The metric is shown on the Executor/Task page as "Shuffle Read Block Time",
> but on the SQL page as "fetch wait time", which is confusing.
> !executor-page.png!
> !sql-page.png!
[jira] [Commented] (SPARK-37469) Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI
[ https://issues.apache.org/jira/browse/SPARK-37469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17449371#comment-17449371 ] Yazhi Wang commented on SPARK-37469: I'm working on it
> Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI
> ---
>
> Key: SPARK-37469
> URL: https://issues.apache.org/jira/browse/SPARK-37469
> Project: Spark
> Issue Type: Improvement
> Components: Web UI
> Affects Versions: 3.2.0
> Reporter: Yazhi Wang
> Priority: Minor
> Attachments: executor-page.png, sql-page.png
>
>
> The metric is shown on the Executor/Task page as "Shuffle Read Block Time",
> but on the SQL page as "fetch wait time", which is confusing.
> !executor-page.png!
> !sql-page.png!
[jira] [Updated] (SPARK-37469) Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI
[ https://issues.apache.org/jira/browse/SPARK-37469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yazhi Wang updated SPARK-37469: --- Description: The metric is shown on the Executor/Task page as "Shuffle Read Block Time", but on the SQL page as "fetch wait time", which is confusing. !executor-page.png! was: The metric is shown on the Executor/Task page as "Shuffle Read Block Time", but on the SQL page as "fetch wait time", which is confusing. !image-2021-11-26-12-12-46-896.png! !image-2021-11-26-12-15-28-204.png!
> Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI
> ---
>
> Key: SPARK-37469
> URL: https://issues.apache.org/jira/browse/SPARK-37469
> Project: Spark
> Issue Type: Improvement
> Components: Web UI
> Affects Versions: 3.2.0
> Reporter: Yazhi Wang
> Priority: Minor
> Attachments: executor-page.png, sql-page.png
>
>
> The metric is shown on the Executor/Task page as "Shuffle Read Block Time",
> but on the SQL page as "fetch wait time", which is confusing.
> !executor-page.png!
[jira] [Created] (SPARK-37469) Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI
Yazhi Wang created SPARK-37469: -- Summary: Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI Key: SPARK-37469 URL: https://issues.apache.org/jira/browse/SPARK-37469 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 3.2.0 Reporter: Yazhi Wang The metric is shown on the Executor/Task page as "Shuffle Read Block Time", but on the SQL page as "fetch wait time", which is confusing. !image-2021-11-26-12-12-46-896.png! !image-2021-11-26-12-15-28-204.png!
[jira] [Commented] (SPARK-37141) WorkerSuite cannot run on Mac OS
[ https://issues.apache.org/jira/browse/SPARK-37141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435199#comment-17435199 ] Yazhi Wang commented on SPARK-37141: I'm working on it > WorkerSuite cannot run on Mac OS > > > Key: SPARK-37141 > URL: https://issues.apache.org/jira/browse/SPARK-37141 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > > After SPARK-35907 run `org.apache.spark.deploy.worker.WorkerSuite` on Mac > os(both M1 and Intel) failed > {code:java} > mvn clean install -DskipTests -pl core -am > mvn test -pl core -Dtest=none > -DwildcardSuites=org.apache.spark.deploy.worker.WorkerSuite > {code} > {code:java} > WorkerSuite: > - test isUseLocalNodeSSLConfig > - test maybeUpdateSSLSettings > - test clearing of finishedExecutors (small number of executors) > - test clearing of finishedExecutors (more executors) > - test clearing of finishedDrivers (small number of drivers) > - test clearing of finishedDrivers (more drivers) > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 47.973 s > [INFO] Finished at: 2021-10-28T13:46:56+08:00 > [INFO] > > [ERROR] Failed to execute goal > org.scalatest:scalatest-maven-plugin:2.0.2:test (test) on project > spark-core_2.12: There are test failures -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. 
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please
> read the following articles:
> [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> {code}
> {code:java}
> 21/10/28 13:46:56.133 dispatcher-event-loop-1 ERROR Utils: Failed to create
> directory /tmp
> java.nio.file.FileAlreadyExistsException: /tmp
> at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88)
> at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
> at java.nio.file.Files.createDirectory(Files.java:674)
> at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
> at java.nio.file.Files.createDirectories(Files.java:727)
> at org.apache.spark.util.Utils$.createDirectory(Utils.scala:292)
> at org.apache.spark.deploy.worker.Worker.createWorkDir(Worker.scala:221)
> at org.apache.spark.deploy.worker.Worker.onStart(Worker.scala:232)
> at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:120)
> at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
> at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
> at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
> at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
>
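The stack trace above shows `Files.createDirectories` rejecting a path component that exists as a symlink rather than a real directory: on macOS, /tmp is a symlink to /private/tmp, so creating the worker's work dir directly under /tmp fails with FileAlreadyExistsException. A hedged sketch of a workaround (`createDirectoriesFollowingLinks` is a hypothetical helper, not Spark's actual fix):

```scala
import java.nio.file.{Files, Path}

// Hypothetical helper: Files.isDirectory follows symlinks by default, so a
// symlink pointing at a directory counts as an existing directory and we
// skip the createDirectories call that would throw
// FileAlreadyExistsException on the symlink itself.
def createDirectoriesFollowingLinks(path: Path): Path = {
  if (Files.isDirectory(path)) path.toRealPath()
  else Files.createDirectories(path)
}
```

The key behavior: `Files.createDirectories` internally checks the existing path with `LinkOption.NOFOLLOW_LINKS`, so a symlink-to-directory is treated as "exists but is not a directory" and the `FileAlreadyExistsException` is rethrown, which matches the trace from the WorkerSuite run.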
[jira] [Updated] (SPARK-35862) Watermark timestamp only can be format in UTC timeZone, unfriendly to users in other time zones
[ https://issues.apache.org/jira/browse/SPARK-35862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yazhi Wang updated SPARK-35862: --- Description: Timestamp is formatted in `ProgressReporter` by `formatTimestamp` for watermark and eventTime stats. The timestampFormat is hardcoded to the UTC time zone. ` private val timestampFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'") // ISO8601 timestampFormat.setTimeZone(DateTimeUtils.getTimeZone("UTC")) ` When users set a different timezone via the java option `-Duser.timezone`, they may be confused by log output that mixes timezones. eg ` {color:#FF}*2021-06-23 16:12:07*{color} [stream execution thread for [id = 92f4f363-df85-48e9-aef9-5ea6f2b70316, runId = 5733ef8e-11d1-46c4-95cc-219bde6e7a20]] INFO [MicroBatchExecution:54]: Streaming query made progress: { "id" : "92f4f363-df85-48e9-aef9-5ea6f2b70316", "runId" : "5733ef8e-11d1-46c4-95cc-219bde6e7a20", "name" : null, "timestamp" : "2021-06-23T08:11:56.790Z", "batchId" : 91740, "numInputRows" : 2577, "inputRowsPerSecond" : 155.33453887884266, "processedRowsPerSecond" : 242.29033471229786, "durationMs" : { "addBatch" : 8671, "getBatch" : 3, "getOffset" : 1139, "queryPlanning" : 79, "triggerExecution" : 10636, "walCommit" : 162 }, "eventTime" : {color:#FF}*{ "avg" : "2021-06-23T08:11:46.307Z", "max" : "2021-06-23T08:11:55.000Z", "min" : "2021-06-23T08:11:37.000Z", "watermark" : "2021-06-23T07:41:39.000Z" }*{color}, ` maybe we need to unify the timezone used for time formatting was: Timestamp is formatted in `ProgressReporter` by `formatTimestamp` for watermark and eventTime stats. The timestampFormat is hardcoded to the UTC time zone. ` private val timestampFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'") // ISO8601 timestampFormat.setTimeZone(DateTimeUtils.getTimeZone("UTC")) ` When users set a different timezone via the java option `-Duser.timezone`, they may be confused by log output that mixes timezones.
eg ` 2021-06-23 16:12:07 [stream execution thread for [id = 92f4f363-df85-48e9-aef9-5ea6f2b70316, runId = 5733ef8e-11d1-46c4-95cc-219bde6e7a20]] INFO [MicroBatchExecution:54]: Streaming query made progress: { "id" : "92f4f363-df85-48e9-aef9-5ea6f2b70316", "runId" : "5733ef8e-11d1-46c4-95cc-219bde6e7a20", "name" : null, "timestamp" : "2021-06-23T08:11:56.790Z", "batchId" : 91740, "numInputRows" : 2577, "inputRowsPerSecond" : 155.33453887884266, "processedRowsPerSecond" : 242.29033471229786, "durationMs" : { "addBatch" : 8671, "getBatch" : 3, "getOffset" : 1139, "queryPlanning" : 79, "triggerExecution" : 10636, "walCommit" : 162 }, "eventTime" : { "avg" : "2021-06-23T08:11:46.307Z", "max" : "2021-06-23T08:11:55.000Z", "min" : "2021-06-23T08:11:37.000Z", "watermark" : "2021-06-23T07:41:39.000Z" }, ` maybe we need to unify the timezone used for time formatting
> Watermark timestamp only can be format in UTC timeZone, unfriendly to users
> in other time zones
> ---
>
> Key: SPARK-35862
> URL: https://issues.apache.org/jira/browse/SPARK-35862
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.1.2
> Reporter: Yazhi Wang
> Priority: Minor
>
> Timestamp is formatted in `ProgressReporter` by `formatTimestamp` for
> watermark and eventTime stats. The timestampFormat is hardcoded to the UTC
> time zone.
> `
> private val timestampFormat = new
> SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'") // ISO8601
> timestampFormat.setTimeZone(DateTimeUtils.getTimeZone("UTC"))
> `
> When users set a different timezone via the java option `-Duser.timezone`,
> they may be confused by log output that mixes timezones.
> eg
> `
> {color:#FF}*2021-06-23 16:12:07*{color} [stream execution thread for [id
> = 92f4f363-df85-48e9-aef9-5ea6f2b70316, runId =
> 5733ef8e-11d1-46c4-95cc-219bde6e7a20]] INFO [MicroBatchExecution:54]:
> Streaming query made progress: {
> "id" : "92f4f363-df85-48e9-aef9-5ea6f2b70316",
> "runId" : "5733ef8e-11d1-46c4-95cc-219bde6e7a20",
> "name" : null,
> "timestamp" : "2021-06-23T08:11:56.790Z",
> "batchId" : 91740,
> "numInputRows" : 2577,
> "inputRowsPerSecond" : 155.33453887884266,
> "processedRowsPerSecond" : 242.29033471229786,
> "durationMs" : { "addBatch" : 8671, "getBatch" : 3, "getOffset" : 1139, "queryPlanning" : 79, "triggerExecution" : 10636, "walCommit" : 162 },
> "eventTime" : {color:#FF}*{ "avg" : "2021-06-23T08:11:46.307Z", "max" : "2021-06-23T08:11:55.000Z", "min" : "2021-06-23T08:11:37.000Z", "watermark" : "2021-06-23T07:41:39.000Z" }*{color},
> `
> maybe we need to unify the timezone used for time formatting
[jira] [Commented] (SPARK-35862) Watermark timestamp only can be format in UTC timeZone, unfriendly to users in other time zones
[ https://issues.apache.org/jira/browse/SPARK-35862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368022#comment-17368022 ] Yazhi Wang commented on SPARK-35862: I'm working on it
> Watermark timestamp only can be format in UTC timeZone, unfriendly to users
> in other time zones
> ---
>
> Key: SPARK-35862
> URL: https://issues.apache.org/jira/browse/SPARK-35862
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.1.2
> Reporter: Yazhi Wang
> Priority: Minor
>
> Timestamp is formatted in `ProgressReporter` by `formatTimestamp` for
> watermark and eventTime stats. The timestampFormat is hardcoded to the UTC
> time zone.
> `
> private val timestampFormat = new
> SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'") // ISO8601
> timestampFormat.setTimeZone(DateTimeUtils.getTimeZone("UTC"))
> `
> When users set a different timezone via the java option `-Duser.timezone`,
> they may be confused by log output that mixes timezones.
> eg
> `
> 2021-06-23 16:12:07 [stream execution thread for [id =
> 92f4f363-df85-48e9-aef9-5ea6f2b70316, runId =
> 5733ef8e-11d1-46c4-95cc-219bde6e7a20]] INFO [MicroBatchExecution:54]:
> Streaming query made progress: {
> "id" : "92f4f363-df85-48e9-aef9-5ea6f2b70316",
> "runId" : "5733ef8e-11d1-46c4-95cc-219bde6e7a20",
> "name" : null,
> "timestamp" : "2021-06-23T08:11:56.790Z",
> "batchId" : 91740,
> "numInputRows" : 2577,
> "inputRowsPerSecond" : 155.33453887884266,
> "processedRowsPerSecond" : 242.29033471229786,
> "durationMs" : {
> "addBatch" : 8671,
> "getBatch" : 3,
> "getOffset" : 1139,
> "queryPlanning" : 79,
> "triggerExecution" : 10636,
> "walCommit" : 162
> },
> "eventTime" : {
> "avg" : "2021-06-23T08:11:46.307Z",
> "max" : "2021-06-23T08:11:55.000Z",
> "min" : "2021-06-23T08:11:37.000Z",
> "watermark" : "2021-06-23T07:41:39.000Z"
> },
> `
> maybe we need to unify the timezone used for time formatting
[jira] [Created] (SPARK-35862) Watermark timestamp only can be format in UTC timeZone, unfriendly to users in other time zones
Yazhi Wang created SPARK-35862: -- Summary: Watermark timestamp only can be format in UTC timeZone, unfriendly to users in other time zones Key: SPARK-35862 URL: https://issues.apache.org/jira/browse/SPARK-35862 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.1.2 Reporter: Yazhi Wang Timestamps are formatted in `ProgressReporter` by `formatTimestamp` for the watermark and eventTime stats. The timestampFormat is hardcoded to the UTC time zone: ` private val timestampFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'") // ISO8601 timestampFormat.setTimeZone(DateTimeUtils.getTimeZone("UTC")) ` When users set a different time zone via the Java option `-Duser.timezone`, they may be confused by output that mixes time zones. e.g. ` 2021-06-23 16:12:07 [stream execution thread for [id = 92f4f363-df85-48e9-aef9-5ea6f2b70316, runId = 5733ef8e-11d1-46c4-95cc-219bde6e7a20]] INFO [MicroBatchExecution:54]: Streaming query made progress: { "id" : "92f4f363-df85-48e9-aef9-5ea6f2b70316", "runId" : "5733ef8e-11d1-46c4-95cc-219bde6e7a20", "name" : null, "timestamp" : "2021-06-23T08:11:56.790Z", "batchId" : 91740, "numInputRows" : 2577, "inputRowsPerSecond" : 155.33453887884266, "processedRowsPerSecond" : 242.29033471229786, "durationMs" : { "addBatch" : 8671, "getBatch" : 3, "getOffset" : 1139, "queryPlanning" : 79, "triggerExecution" : 10636, "walCommit" : 162 }, "eventTime" : { "avg" : "2021-06-23T08:11:46.307Z", "max" : "2021-06-23T08:11:55.000Z", "min" : "2021-06-23T08:11:37.000Z", "watermark" : "2021-06-23T07:41:39.000Z" }, ` Maybe we need to unify the time zone used for time formatting.
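The mismatch reported above can be reproduced outside Spark. The following is a minimal Java sketch (the actual Spark code is Scala); the epoch value `1624435916790L` is chosen to match the `2021-06-23T08:11:56.790Z` timestamp from the log sample, and `Asia/Shanghai` stands in for whatever zone a user might pass via `-Duser.timezone`:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TimestampFormatDemo {
    public static void main(String[] args) {
        // Epoch millis for 2021-06-23T08:11:56.790Z (matches the log sample)
        long millis = 1624435916790L;

        // Mirrors the formatter hardcoded in ProgressReporter: ISO8601, UTC
        SimpleDateFormat utcFormat =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
        utcFormat.setTimeZone(TimeZone.getTimeZone("UTC"));

        // What a user running with e.g. -Duser.timezone=Asia/Shanghai sees
        // in the surrounding log lines (local wall-clock time, UTC+8 here)
        SimpleDateFormat localFormat =
                new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        localFormat.setTimeZone(TimeZone.getTimeZone("Asia/Shanghai"));

        System.out.println(utcFormat.format(new Date(millis)));   // 2021-06-23T08:11:56.790Z
        System.out.println(localFormat.format(new Date(millis))); // 2021-06-23 16:11:56
    }
}
```

The same instant renders eight hours apart in the two lines, which is exactly the confusion the issue describes: the log prefix uses the JVM's time zone while the progress JSON is pinned to UTC.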
[jira] [Commented] (SPARK-35796) UT `handles k8s cluster mode` fails on MacOs >= 10.15
[ https://issues.apache.org/jira/browse/SPARK-35796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364819#comment-17364819 ] Yazhi Wang commented on SPARK-35796: I'm working on it. > UT `handles k8s cluster mode` fails on MacOs >= 10.15 > - > > Key: SPARK-35796 > URL: https://issues.apache.org/jira/browse/SPARK-35796 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.1.3 > Environment: MacOs 10.15.7 >Reporter: Yazhi Wang >Priority: Minor > > When I run SparkSubmitSuite on macOS 10.15.7, the `handles k8s cluster mode` > test fails with an AssertionError after PR > [SPARK-35691|https://issues.apache.org/jira/browse/SPARK-35691], because > `File(path).getCanonicalFile().toURI()` called with an absolute path returns a > path beginning with /System/Volumes/Data. > e.g. /home/testjars.jar becomes > [file:/System/Volumes/Data/home/testjars.jar|file:///System/Volumes/Data/home/testjars.jar]
[jira] [Updated] (SPARK-35796) UT `handles k8s cluster mode` fails on MacOs >= 10.15
[ https://issues.apache.org/jira/browse/SPARK-35796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yazhi Wang updated SPARK-35796: --- Description: When I run SparkSubmitSuite on macOS 10.15.7, the `handles k8s cluster mode` test fails with an AssertionError after PR [SPARK-35691|https://issues.apache.org/jira/browse/SPARK-35691], because `File(path).getCanonicalFile().toURI()` called with an absolute path returns a path beginning with /System/Volumes/Data. e.g. /home/testjars.jar becomes [file:/System/Volumes/Data/home/testjars.jar|file:///System/Volumes/Data/home/testjars.jar] was: When I run SparkSubmitSuite on MacOs 10.15.7, I got AssertionError for `handles k8s cluster mode` test after pr SPARK-3561 due to `File(path).getCanonicalFile().toURI()` function with absolute path as parameter will return path begin with /System/Volumes/Data. eg. /home/testjars.jar will get file:/System/Volumes/Data/home/testjars.jar > UT `handles k8s cluster mode` fails on MacOs >= 10.15 > - > > Key: SPARK-35796 > URL: https://issues.apache.org/jira/browse/SPARK-35796 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.1.3 > Environment: MacOs 10.15.7 >Reporter: Yazhi Wang >Priority: Minor > > When I run SparkSubmitSuite on macOS 10.15.7, the `handles k8s cluster mode` > test fails with an AssertionError after PR > [SPARK-35691|https://issues.apache.org/jira/browse/SPARK-35691], because > `File(path).getCanonicalFile().toURI()` called with an absolute path returns a > path beginning with /System/Volumes/Data. > e.g. /home/testjars.jar becomes > [file:/System/Volumes/Data/home/testjars.jar|file:///System/Volumes/Data/home/testjars.jar]
[jira] [Created] (SPARK-35796) UT `handles k8s cluster mode` fails on MacOs >= 10.15
Yazhi Wang created SPARK-35796: -- Summary: UT `handles k8s cluster mode` fails on MacOs >= 10.15 Key: SPARK-35796 URL: https://issues.apache.org/jira/browse/SPARK-35796 Project: Spark Issue Type: Improvement Components: Tests Affects Versions: 3.1.3 Environment: MacOs 10.15.7 Reporter: Yazhi Wang When I run SparkSubmitSuite on MacOs 10.15.7, I got AssertionError for `handles k8s cluster mode` test after pr SPARK-3561 due to `File(path).getCanonicalFile().toURI()` function with absolute path as parameter will return path begin with /System/Volumes/Data. eg. /home/testjars.jar will get file:/System/Volumes/Data/home/testjars.jar
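The path resolution the suite performs can be sketched in plain Java. This is a minimal reproduction, not the actual SparkSubmitSuite code; the path `/home/testjars.jar` is the example from the issue and need not exist for canonicalization to succeed:

```java
import java.io.File;
import java.io.IOException;
import java.net.URI;

public class CanonicalUriDemo {
    public static void main(String[] args) throws IOException {
        // Resolve an absolute jar path the way the test does after SPARK-35691:
        // canonicalize the file, then convert it to a file: URI.
        URI uri = new File("/home/testjars.jar").getCanonicalFile().toURI();

        // On Linux this yields file:/home/testjars.jar. On macOS >= 10.15 the
        // root volume is read-only and user paths are firmlinked through
        // /System/Volumes/Data, so canonicalization instead yields
        // file:/System/Volumes/Data/home/testjars.jar and a test that compares
        // against the literal input path fails with an AssertionError.
        System.out.println(uri);
    }
}
```

The output is platform-dependent, which is exactly why the assertion only breaks on newer macOS: a robust comparison should canonicalize the expected path the same way rather than compare against the raw input string.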