[jira] [Updated] (SPARK-45189) Creating UnresolvedRelation from TableIdentifier should include the catalog field
[ https://issues.apache.org/jira/browse/SPARK-45189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang updated SPARK-45189:
-----------------------------------
    Affects Version/s: 3.5.1

> Creating UnresolvedRelation from TableIdentifier should include the catalog field
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-45189
>                 URL: https://issues.apache.org/jira/browse/SPARK-45189
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>    Affects Versions: 4.0.0, 3.5.1
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45189) Creating UnresolvedRelation from TableIdentifier should include the catalog field
[ https://issues.apache.org/jira/browse/SPARK-45189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang updated SPARK-45189:
-----------------------------------
    Fix Version/s: 3.5.1

> Creating UnresolvedRelation from TableIdentifier should include the catalog field
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-45189
>                 URL: https://issues.apache.org/jira/browse/SPARK-45189
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>    Affects Versions: 4.0.0, 3.5.1
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0, 3.5.1
[jira] [Created] (SPARK-45229) Show the number of drivers waiting in SUBMITTED status
Dongjoon Hyun created SPARK-45229:
----------------------------------

             Summary: Show the number of drivers waiting in SUBMITTED status
                 Key: SPARK-45229
                 URL: https://issues.apache.org/jira/browse/SPARK-45229
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 4.0.0
            Reporter: Dongjoon Hyun
[jira] [Updated] (SPARK-44112) Drop Java 8 and 11 support
[ https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-44112:
----------------------------------
    Summary: Drop Java 8 and 11 support  (was: Drop Java 8 Support)

> Drop Java 8 and 11 support
> --------------------------
>
>                 Key: SPARK-44112
>                 URL: https://issues.apache.org/jira/browse/SPARK-44112
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Build
>    Affects Versions: 4.0.0
>            Reporter: Dongjoon Hyun
>            Priority: Major
>         Attachments: image-2023-09-20-10-52-59-327.png, image-2023-09-20-10-53-34-956.png
[jira] [Commented] (SPARK-45225) XML: XSD file URL support
[ https://issues.apache.org/jira/browse/SPARK-45225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766983#comment-17766983 ]

Snoot.io commented on SPARK-45225:
----------------------------------

User 'sandip-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/43000

> XML: XSD file URL support
> -------------------------
>
>                 Key: SPARK-45225
>                 URL: https://issues.apache.org/jira/browse/SPARK-45225
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Sandip Agarwala
>            Assignee: Sandip Agarwala
>            Priority: Major
>             Fix For: 4.0.0
[jira] [Resolved] (SPARK-45225) XML: XSD file URL support
[ https://issues.apache.org/jira/browse/SPARK-45225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-45225.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 43000
[https://github.com/apache/spark/pull/43000]

> XML: XSD file URL support
> -------------------------
>
>                 Key: SPARK-45225
>                 URL: https://issues.apache.org/jira/browse/SPARK-45225
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Sandip Agarwala
>            Assignee: Sandip Agarwala
>            Priority: Major
>             Fix For: 4.0.0
[jira] [Assigned] (SPARK-45225) XML: XSD file URL support
[ https://issues.apache.org/jira/browse/SPARK-45225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-45225:
------------------------------------
    Assignee: Sandip Agarwala

> XML: XSD file URL support
> -------------------------
>
>                 Key: SPARK-45225
>                 URL: https://issues.apache.org/jira/browse/SPARK-45225
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Sandip Agarwala
>            Assignee: Sandip Agarwala
>            Priority: Major
[jira] [Resolved] (SPARK-44622) Implement FetchErrorDetails RPC
[ https://issues.apache.org/jira/browse/SPARK-44622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Herman van Hövell resolved SPARK-44622.
---------------------------------------
    Fix Version/s: 4.0.0
         Assignee: Yihong He
       Resolution: Fixed

> Implement FetchErrorDetails RPC
> -------------------------------
>
>                 Key: SPARK-44622
>                 URL: https://issues.apache.org/jira/browse/SPARK-44622
>             Project: Spark
>          Issue Type: New Feature
>          Components: Connect
>    Affects Versions: 3.5.0
>            Reporter: Yihong He
>            Assignee: Yihong He
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
[jira] [Commented] (SPARK-45218) Refine docstring of `Column.isin`
[ https://issues.apache.org/jira/browse/SPARK-45218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766980#comment-17766980 ]

Snoot.io commented on SPARK-45218:
----------------------------------

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/43001

> Refine docstring of `Column.isin`
> ---------------------------------
>
>                 Key: SPARK-45218
>                 URL: https://issues.apache.org/jira/browse/SPARK-45218
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation, PySpark
>    Affects Versions: 4.0.0
>            Reporter: Allison Wang
>            Priority: Major
>
> Refine the docstring of `Column.isin`
[jira] [Commented] (SPARK-44112) Drop Java 8 Support
[ https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766979#comment-17766979 ]

Dongjoon Hyun commented on SPARK-44112:
---------------------------------------

Never mind because you already made a PR, [~LuciferYang].

> Drop Java 8 Support
> -------------------
>
>                 Key: SPARK-44112
>                 URL: https://issues.apache.org/jira/browse/SPARK-44112
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Build
>    Affects Versions: 4.0.0
>            Reporter: Dongjoon Hyun
>            Priority: Major
>         Attachments: image-2023-09-20-10-52-59-327.png, image-2023-09-20-10-53-34-956.png
[jira] [Commented] (SPARK-45226) Refine docstring of `rand/randn`
[ https://issues.apache.org/jira/browse/SPARK-45226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766977#comment-17766977 ]

Snoot.io commented on SPARK-45226:
----------------------------------

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/43003

> Refine docstring of `rand/randn`
> --------------------------------
>
>                 Key: SPARK-45226
>                 URL: https://issues.apache.org/jira/browse/SPARK-45226
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation
>    Affects Versions: 4.0.0
>            Reporter: BingKun Pan
>            Priority: Minor
[jira] [Resolved] (SPARK-44463) Improve error handling in Connect foreachBatch worker.
[ https://issues.apache.org/jira/browse/SPARK-44463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-44463.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 42986
[https://github.com/apache/spark/pull/42986]

> Improve error handling in Connect foreachBatch worker.
> ------------------------------------------------------
>
>                 Key: SPARK-44463
>                 URL: https://issues.apache.org/jira/browse/SPARK-44463
>             Project: Spark
>          Issue Type: Task
>          Components: Connect, Structured Streaming
>    Affects Versions: 3.4.1
>            Reporter: Raghu Angadi
>            Priority: Major
>             Fix For: 4.0.0
>
> An error in user code inside foreachBatch worker is not propagated correctly
> to the user. We should.
[jira] [Commented] (SPARK-43498) Enable StatsTests.test_axis_on_dataframe for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766972#comment-17766972 ]

Snoot.io commented on SPARK-43498:
----------------------------------

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/43002

> Enable StatsTests.test_axis_on_dataframe for pandas 2.0.0.
> ----------------------------------------------------------
>
>                 Key: SPARK-43498
>                 URL: https://issues.apache.org/jira/browse/SPARK-43498
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Pandas API on Spark
>    Affects Versions: 4.0.0
>            Reporter: Haejoon Lee
>            Priority: Major
>
> Enable StatsTests.test_axis_on_dataframe for pandas 2.0.0.
[jira] [Updated] (SPARK-45228) Update `test_axis_on_dataframe` when Pandas regression is fixed
[ https://issues.apache.org/jira/browse/SPARK-45228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haejoon Lee updated SPARK-45228:
--------------------------------
    Summary: Update `test_axis_on_dataframe` when Pandas regression is fixed  (was: Restore `test_axis_on_dataframe` in normal state when Pandas regression is fixed)

> Update `test_axis_on_dataframe` when Pandas regression is fixed
> ---------------------------------------------------------------
>
>                 Key: SPARK-45228
>                 URL: https://issues.apache.org/jira/browse/SPARK-45228
>             Project: Spark
>          Issue Type: Bug
>          Components: Pandas API on Spark, Tests
>    Affects Versions: 4.0.0
>            Reporter: Haejoon Lee
>            Priority: Major
>
> We manually cast the datatype when testing `test_axis_on_dataframe` from
> [https://github.com/apache/spark/pull/43002,|https://github.com/apache/spark/pull/43002.]
> but it's not a normal way to test properly.
> After the regression of Pandas is resolved, we should return the test back to
> normal way.
> See Pandas regression: https://github.com/pandas-dev/pandas/issues/55194
[jira] [Created] (SPARK-45228) Restore `test_axis_on_dataframe` in normal state when Pandas regression is fixed
Haejoon Lee created SPARK-45228:
--------------------------------

             Summary: Restore `test_axis_on_dataframe` in normal state when Pandas regression is fixed
                 Key: SPARK-45228
                 URL: https://issues.apache.org/jira/browse/SPARK-45228
             Project: Spark
          Issue Type: Bug
          Components: Pandas API on Spark, Tests
    Affects Versions: 4.0.0
            Reporter: Haejoon Lee

We manually cast the datatype when testing `test_axis_on_dataframe` from [https://github.com/apache/spark/pull/43002,|https://github.com/apache/spark/pull/43002.] but it's not a normal way to test properly.

After the regression of Pandas is resolved, we should return the test back to normal way.

See Pandas regression: https://github.com/pandas-dev/pandas/issues/55194
[jira] [Commented] (SPARK-44112) Drop Java 8 Support
[ https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766970#comment-17766970 ]

Snoot.io commented on SPARK-44112:
----------------------------------

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/43005

> Drop Java 8 Support
> -------------------
>
>                 Key: SPARK-44112
>                 URL: https://issues.apache.org/jira/browse/SPARK-44112
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Build
>    Affects Versions: 4.0.0
>            Reporter: Dongjoon Hyun
>            Priority: Major
>         Attachments: image-2023-09-20-10-52-59-327.png, image-2023-09-20-10-53-34-956.png
[jira] [Commented] (SPARK-45134) Data duplication may occur when fallback to origin shuffle block
[ https://issues.apache.org/jira/browse/SPARK-45134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766969#comment-17766969 ]

Snoot.io commented on SPARK-45134:
----------------------------------

User 'gaoyajun02' has created a pull request for this issue:
https://github.com/apache/spark/pull/43004

> Data duplication may occur when fallback to origin shuffle block
> ----------------------------------------------------------------
>
>                 Key: SPARK-45134
>                 URL: https://issues.apache.org/jira/browse/SPARK-45134
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>    Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.5.0
>            Reporter: gaoyajun02
>            Priority: Critical
>
> One possible situation that has been found is that, during the process of
> requesting mergedBlockMeta, when the channel is closed, it may trigger the
> callback twice and result in duplicate data for the original shuffle
> blocks.
> # The first time is when the channel is inactivated, the responseHandler
> will execute the callback for all outstandingRpcs.
> # The second time is when the listener corresponding to
> shuffleClient.writeAndFlush executes the callback after the channel is closed.
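The double invocation described in the two numbered items above can be illustrated with a small guard. This is only a sketch in Java under stated assumptions: `FailureCallback`, `once`, and `simulateDoubleFailure` are hypothetical names made up for illustration, not Spark's interfaces, and nothing here claims to be what pull request 43004 does. It shows one generic way to make sure that a callback reachable from two failure paths is delivered only once.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class OnceCallbackSketch {
    /** Hypothetical callback shape; not Spark's actual interface. */
    interface FailureCallback {
        void onFailure(Throwable cause);
    }

    /** Wraps a callback so that only the first failure notification is delivered. */
    static FailureCallback once(FailureCallback delegate) {
        AtomicBoolean fired = new AtomicBoolean(false);
        return cause -> {
            // compareAndSet succeeds for exactly one caller, even across threads.
            if (fired.compareAndSet(false, true)) {
                delegate.onFailure(cause);
            }
        };
    }

    /** Simulates both failure paths firing for one request; returns the delivery count. */
    static int simulateDoubleFailure() {
        AtomicInteger fallbacks = new AtomicInteger();
        FailureCallback guarded = once(cause -> fallbacks.incrementAndGet());
        // Path 1 (assumed): the channel goes inactive and outstanding requests are failed.
        guarded.onFailure(new IOException("connection closed"));
        // Path 2 (assumed): the write listener reports the failed send afterwards.
        guarded.onFailure(new IOException("failed to send RPC"));
        return fallbacks.get();
    }

    public static void main(String[] args) {
        System.out.println("fallback invoked " + simulateDoubleFailure() + " time(s)");
    }
}
```

With the guard in place the fallback runs once even though both paths report a failure; without it, each path would trigger its own fallback to the original shuffle blocks.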
> Some Error Logs:
> {code:java}
> 23/09/08 09:22:21 ERROR shuffle-client-7-1 TransportResponseHandler: Still have 1 requests outstanding when connection from host/ip:prot is closed
> 23/09/08 09:22:21 ERROR shuffle-client-7-1 PushBasedFetchHelper: Failed to get the meta of push-merged block for (3, 54) from host:port
> java.io.IOException: Connection from host:port closed
>         at org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:147)
>         at org.apache.spark.network.server.TransportChannelHandler.channelInactive(TransportChannelHandler.java:117)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
>         at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
>         at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
>         at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
>         at org.apache.spark.network.util.TransportFrameDecoder.channelInactive(TransportFrameDecoder.java:225)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
>         at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
>         at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
>         at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818)
>         at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
>         at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>         at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>         at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>         at java.lang.Thread.run(Thread.java:745)
>
> 23/09/08 09:22:21 ERROR shuffle-client-7-1 PushBasedFetchHelper: Failed to get the meta of push-merged block for (3, 54) from host:port
> java.io.IOException: Failed to send RPC RPC 8079698359363123411 to host/ip:port: java.nio.channels.ClosedChannelException
>         at
[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck, by making CoarseGrainedExecutorBackend.taskResources thread-safe
[ https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Xiong updated SPARK-45227:
-----------------------------
    Description: 
h2. Symptom

Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking at Spark UI, we saw that an executor process hung over 1 hour. After we manually killed the executor process, the app succeeded. Note that the same EMR cluster with two worker nodes was able to run the same app without any issue before and after the incident.

h2. Observations

Below is what's observed from relevant container logs and thread dump.
 * A regular task that's sent to the executor, which also reported back to the driver upon the task completion.

{quote}$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 4432 bytes) taskResourceAssignments Map()
23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)

$zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923

$zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 4495 bytes result sent to driver}}
{quote}
 * Another task that's sent to the executor but didn't get launched since the single-threaded dispatcher was stuck (presumably in an "infinite loop" as explained later).

{quote}$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 4432 bytes) taskResourceAssignments Map()

$zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924

$zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
>> note that the above command has no matching result, indicating that task 153.0 in stage 23.0 (TID 924) was never launched}}
{quote}
 * Thread dump shows that the dispatcher-Executor thread has the following stack trace.

{quote}"dispatcher-Executor" #40 daemon prio=5 os_prio=0 tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
java.lang.Thread.State: RUNNABLE
at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
at scala.collection.mutable.HashMap.put(HashMap.scala:126)
at scala.collection.mutable.HashMap.update(HashMap.scala:131)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
at org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown Source)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)}}
{quote}

h2. Relevant code paths

Within an executor process, there's a [dispatcher thread|https://github.com/apache/spark/blob/1fdd46f173f7bc90e0523eb0a2d5e8e27e990102/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L170] dedicated to CoarseGrainedExecutorBackend (a single RPC endpoint) that launches tasks scheduled by the driver. Each task is run on a TaskRunner thread backed by a thread pool created for the executor. The TaskRunner thread and the dispatcher thread are different. However, they
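The message above distinguishes the dispatcher thread from the TaskRunner threads, and the issue title proposes making taskResources thread-safe. As a rough illustration only, here is a Java sketch of a map shared by several threads. It assumes, as the title suggests, that more than one thread updates the map; `TaskResourcesSketch`, its `taskResources` field, and `runConcurrently` are hypothetical stand-ins rather than Spark's code, and `ConcurrentHashMap` is one possible choice, not necessarily the one the fix uses.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TaskResourcesSketch {
    // Hypothetical stand-in for a taskId -> resources map shared between threads.
    static final Map<Long, String> taskResources = new ConcurrentHashMap<>();

    /** Runs put/remove pairs on several threads and returns the final map size. */
    static int runConcurrently(int threads, int tasksPerThread) throws InterruptedException {
        taskResources.clear();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final long base = (long) t * tasksPerThread; // disjoint id range per thread
            pool.submit(() -> {
                for (long id = base; id < base + tasksPerThread; id++) {
                    taskResources.put(id, "resources-for-" + id); // register on launch
                    taskResources.remove(id);                     // drop on completion
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return taskResources.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("entries left: " + runConcurrently(4, 10_000));
    }
}
```

Every id is added and then removed by the same thread, so `runConcurrently` returns 0 with a concurrent map. A map without thread-safety guarantees can have its internal structure corrupted by unsynchronized concurrent updates, which is the kind of failure the stack trace inside `HashMap.put` points to.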
[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck, by making CoarseGrainedExecutorBackend.taskResources thread-safe
[ https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Xiong updated SPARK-45227:
-----------------------------
       Fix Version/s:     (was: 4.0.0)
                          (was: 3.5.1)
    Target Version/s:   (was: 3.3.1)

> Fix an issue where an executor process randomly gets stuck, by making CoarseGrainedExecutorBackend.taskResources thread-safe
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-45227
>                 URL: https://issues.apache.org/jira/browse/SPARK-45227
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.3.1, 3.5.0, 4.0.0
>            Reporter: Bo Xiong
>            Priority: Critical
>              Labels: hang, infinite-loop, race-condition, stuck, threadsafe
>         Attachments: hashtable1.png, hashtable2.png
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> h2. Symptom
> Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very
> last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking
> at Spark UI, we saw that an executor process hung over 1 hour. After we
> manually killed the executor process, the app succeeded. Note that the same
> EMR cluster with two worker nodes was able to run the same app without any
> issue before and after the incident.
> h2. Observations
> Below is what's observed from relevant container logs and thread dump.
> * A regular task that's sent to the executor, which also reported back to
> the driver upon the task completion.
> {quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 4432 bytes) taskResourceAssignments Map()
> 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)
> $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923
> $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
> 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 4495 bytes result sent to driver}}
> {quote}
> * Another task that's sent to the executor but didn't get launched since the
> single-threaded dispatcher was stuck (presumably in an "infinite loop" as
> explained later).
> {quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 4432 bytes) taskResourceAssignments Map()
> $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924
> $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
> >> note that the above command has no matching result, indicating that task 153.0 in stage 23.0 (TID 924) was never launched}}
> {quote}
> * Thread dump shows that the dispatcher-Executor thread has the
> following stack trace.
> {quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
> java.lang.Thread.State: RUNNABLE
> at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
> at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
> at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
> at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
> at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
> at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
> at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
> at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
> at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
> at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
> at scala.collection.mutable.HashMap.put(HashMap.scala:126)
> at scala.collection.mutable.HashMap.update(HashMap.scala:131)
> at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
> at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
> at org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown Source)
> at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
> at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
> at
[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck, by making CoarseGrainedExecutorBackend.taskResources thread-safe
[ https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Xiong updated SPARK-45227:
-----------------------------
    Summary: Fix an issue where an executor process randomly gets stuck, by making CoarseGrainedExecutorBackend.taskResources thread-safe  (was: Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe)

> Fix an issue where an executor process randomly gets stuck, by making CoarseGrainedExecutorBackend.taskResources thread-safe
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-45227
>                 URL: https://issues.apache.org/jira/browse/SPARK-45227
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.3.1, 3.5.0, 4.0.0
>            Reporter: Bo Xiong
>            Priority: Critical
>              Labels: hang, infinite-loop, race-condition, stuck, threadsafe
>             Fix For: 4.0.0, 3.5.1
>
>         Attachments: hashtable1.png, hashtable2.png
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> h2. Symptom
> Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very
> last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking
> at Spark UI, we saw that an executor process hung over 1 hour. After we
> manually killed the executor process, the app succeeded. Note that the same
> EMR cluster with two worker nodes was able to run the same app without any
> issue before and after the incident.
> h2. Observations
> Below is what's observed from relevant container logs and thread dump.
> * A regular task that's sent to the executor, which also reported back to
> the driver upon the task completion.
> {quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 4432 bytes) taskResourceAssignments Map()
> 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)
> $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923
> $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
> 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 4495 bytes result sent to driver}}
> {quote}
> * Another task that's sent to the executor but didn't get launched since the
> single-threaded dispatcher was stuck (presumably in an "infinite loop" as
> explained later).
> {quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 4432 bytes) taskResourceAssignments Map()
> $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924
> $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
> >> note that the above command has no matching result, indicating that task 153.0 in stage 23.0 (TID 924) was never launched}}
> {quote}
> * Thread dump shows that the dispatcher-Executor thread has the
> following stack trace.
> {quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
> java.lang.Thread.State: RUNNABLE
> at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
> at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
> at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
> at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
> at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
> at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
> at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
> at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
> at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
> at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
> at scala.collection.mutable.HashMap.put(HashMap.scala:126)
> at scala.collection.mutable.HashMap.update(HashMap.scala:131)
> at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
> at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
> at org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown Source)
> at
[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe
[ https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-45227: - Description: h2. Symptom Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very last step of writing a data frame to S3 by calling {{df.write}}. Looking at the Spark UI, we saw that an executor process hung for over 1 hour. After we manually killed the executor process, the app succeeded. Note that the same EMR cluster with two worker nodes was able to run the same app without any issue before and after the incident. h2. Observations Below is what we observed from the relevant container logs and a thread dump. * A regular task that was sent to the executor and reported back to the driver upon task completion. {quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 4432 bytes) taskResourceAssignments Map() 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200) $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923 $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923) 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 4495 bytes result sent to driver}} {quote} * Another task that's sent to the executor but didn't get launched since the single-threaded dispatcher was stuck (presumably in an "infinite loop" as explained later). 
{quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 4432 bytes) taskResourceAssignments Map() $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924 $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz >> note that the above command has no matching result, indicating that task >> 153.0 in stage 23.0 (TID 924) was never launched}} {quote} * Thread dump shows that the dispatcher-Executor thread has the following stack trace. {quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 tid=0x98e37800 nid=0x1aff runnable [0x73bba000] java.lang.Thread.State: RUNNABLE at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142) at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131) at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123) at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365) at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365) at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44) at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140) at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169) at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167) at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44) at scala.collection.mutable.HashMap.put(HashMap.scala:126) at scala.collection.mutable.HashMap.update(HashMap.scala:131) at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200) at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115) at org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown Source) at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) at 
org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75) at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750)}} {quote} h2. Relevant code paths Within an executor process, there's a [dispatcher thread|https://github.com/apache/spark/blob/1fdd46f173f7bc90e0523eb0a2d5e8e27e990102/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L170] dedicated to CoarseGrainedExecutorBackend (a single RPC endpoint) that launches tasks scheduled by the driver. Each task is run on a TaskRunner thread backed by a thread pool created for the executor. The TaskRunner thread and the dispatcher thread are different.
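The two-thread arrangement described above can be sketched in isolation. This is a minimal illustration, not Spark's code: the class {{TaskResourcesSketch}}, the method {{run}}, and the String values are hypothetical names chosen for the example, and the use of {{java.util.concurrent.ConcurrentHashMap}} is an assumption about one way to make a shared map safe; the actual change to CoarseGrainedExecutorBackend is not shown in this message. One thread plays the dispatcher and registers an entry per task, while pool threads play the TaskRunner role and remove the entry when the task finishes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TaskResourcesSketch {
    // Hypothetical stand-in for a per-task resource map, keyed by task ID.
    // ConcurrentHashMap tolerates concurrent put/remove from different threads.
    static final Map<Long, String> taskResources = new ConcurrentHashMap<>();

    // The calling thread acts as the dispatcher: it registers each task and
    // hands it to a pool thread, which removes the entry on "completion".
    // Returns the number of entries left once all pool threads have finished.
    static int run(int numTasks, int poolSize) {
        taskResources.clear();
        ExecutorService taskRunners = Executors.newFixedThreadPool(poolSize);
        for (long taskId = 0; taskId < numTasks; taskId++) {
            final long id = taskId;
            taskResources.put(id, "resources-for-" + id);   // dispatcher side
            taskRunners.submit(() -> taskResources.remove(id)); // task-runner side
        }
        taskRunners.shutdown();
        try {
            if (!taskRunners.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("task runners did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return taskResources.size();
    }

    public static void main(String[] args) {
        System.out.println("entries left: " + run(100_000, 8));
    }
}
```

With a map that is not designed for concurrent use, interleaved writes from the two kinds of threads can leave the internal buckets in an inconsistent state, which is consistent with a lookup that never terminates as seen in the stack trace above. The sketch only shows the safe variant and does not try to reproduce the hang.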
[jira] [Comment Edited] (SPARK-44112) Drop Java 8 Support
[ https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766966#comment-17766966 ] Dongjoon Hyun edited comment on SPARK-44112 at 9/20/23 3:03 AM: Oh. Thanks. It seems that I was out of date and missed the discussion. was (Author: dongjoon): Oh. Thanks. It seems that I was out of date and missed the discussion. On the dev mailing list, did Sean agree to drop Java 11? > Drop Java 8 Support > --- > > Key: SPARK-44112 > URL: https://issues.apache.org/jira/browse/SPARK-44112 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Attachments: image-2023-09-20-10-52-59-327.png, > image-2023-09-20-10-53-34-956.png
[jira] [Commented] (SPARK-44112) Drop Java 8 Support
[ https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766967#comment-17766967 ] Dongjoon Hyun commented on SPARK-44112: --- My bad! You're right in that part, but please keep Java 11 in a separate JIRA if you don't mind. > Drop Java 8 Support > --- > > Key: SPARK-44112 > URL: https://issues.apache.org/jira/browse/SPARK-44112 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Attachments: image-2023-09-20-10-52-59-327.png, > image-2023-09-20-10-53-34-956.png
[jira] [Commented] (SPARK-44112) Drop Java 8 Support
[ https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766966#comment-17766966 ] Dongjoon Hyun commented on SPARK-44112: --- Oh. Thanks. It seems that I was out of date and missed the discussion. On the dev mailing list, did Sean agree to drop Java 11? > Drop Java 8 Support > --- > > Key: SPARK-44112 > URL: https://issues.apache.org/jira/browse/SPARK-44112 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Attachments: image-2023-09-20-10-52-59-327.png, > image-2023-09-20-10-53-34-956.png
[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe
[ https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-45227: - Description: h3. Symptom Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very last step of writing a data frame to S3 by calling {{df.write}}. Looking at the Spark UI, we saw that an executor process hung for over 1 hour. After we manually killed the executor process, the app succeeded. Note that the same EMR cluster with two worker nodes was able to run the same app without any issue before and after the incident. h3. Observations Below is what we observed from the relevant container logs and a thread dump. * A regular task that was sent to the executor and reported back to the driver upon task completion. {quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 4432 bytes) taskResourceAssignments Map() 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200) $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923 $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923) 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 4495 bytes result sent to driver}} {quote} * Another task that's sent to the executor but didn't get launched since the single-threaded dispatcher was stuck (presumably in an "infinite loop" as explained later). 
{quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 4432 bytes) taskResourceAssignments Map() $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924 $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz >> note that the above command has no matching result, indicating that task >> 153.0 in stage 23.0 (TID 924) was never launched}} {quote} * Thread dump shows that the dispatcher-Executor thread has the following stack trace. {quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 tid=0x98e37800 nid=0x1aff runnable [0x73bba000] java.lang.Thread.State: RUNNABLE at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142) at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131) at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123) at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365) at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365) at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44) at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140) at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169) at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167) at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44) at scala.collection.mutable.HashMap.put(HashMap.scala:126) at scala.collection.mutable.HashMap.update(HashMap.scala:131) at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200) at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115) at org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown Source) at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) at 
org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75) at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750)}} {quote} * Relevant code paths {quote}Within an executor process, there's a [dispatcher thread|https://github.com/apache/spark/blob/1fdd46f173f7bc90e0523eb0a2d5e8e27e990102/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L170] dedicated to CoarseGrainedExecutorBackend (a single RPC endpoint) that launches tasks scheduled by the driver. Each task is run on a TaskRunner thread backed by a thread pool created for the executor. The TaskRunner thread and the dispatcher thread
[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe
[ https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-45227: - Attachment: hashtable1.png > Fix an issue where an executor process randomly gets stuck by making > CoarseGrainedExecutorBackend.taskResources thread-safe > --- > > Key: SPARK-45227 > URL: https://issues.apache.org/jira/browse/SPARK-45227 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.1, 3.5.0, 4.0.0 >Reporter: Bo Xiong >Priority: Critical > Labels: hang, infinite-loop, race-condition, stuck, threadsafe > Fix For: 4.0.0, 3.5.1 > > Attachments: hashtable1.png, hashtable2.png > > Original Estimate: 4h > Remaining Estimate: 4h > > h3. Symptom > Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very > last step of writing a data frame to S3 by calling {{df.write}}. Looking > at the Spark UI, we saw that an executor process hung for over 1 hour. After we > manually killed the executor process, the app succeeded. > Note that the same EMR cluster with two worker nodes was able to run the same > app without any issue before and after the incident. > h3. Observations > Below is what we observed from the relevant container logs and a thread dump. > * A regular task that was sent to the executor and reported back to > the driver upon task completion. 
> > {quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz > 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID > 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, > 4432 bytes) taskResourceAssignments Map() > 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID > 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200) > $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz > 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923 > $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz > 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923) > 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). > 4495 bytes result sent to driver}} > {quote} > * Another task that's sent to the executor but didn't get launched since the > single-threaded dispatcher was stuck (presumably in an "infinite loop" as > explained later). > > > {quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz > 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID > 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, > 4432 bytes) taskResourceAssignments Map() > $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz > 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924 > $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz > >> note that the above command has no matching result, indicating that task > >> 153.0 in stage 23.0 (TID 924) was never launched}} > {quote} * Thread dump shows that the dispatcher-Executor thread has the > following stack trace. 
> > {quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 > tid=0x98e37800 nid=0x1aff runnable [0x73bba000] > java.lang.Thread.State: RUNNABLE > at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142) > at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131) > at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123) > at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365) > at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365) > at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44) > at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140) > at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169) > at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167) > at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44) > at scala.collection.mutable.HashMap.put(HashMap.scala:126) > at scala.collection.mutable.HashMap.update(HashMap.scala:131) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200) > at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115) > at > org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown > Source) > at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) > at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) > at > org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75) >
[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe
[ https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-45227: - Attachment: hashtable2.png > Fix an issue where an executor process randomly gets stuck by making > CoarseGrainedExecutorBackend.taskResources thread-safe > --- > > Key: SPARK-45227 > URL: https://issues.apache.org/jira/browse/SPARK-45227 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.1, 3.5.0, 4.0.0 >Reporter: Bo Xiong >Priority: Critical > Labels: hang, infinite-loop, race-condition, stuck, threadsafe > Fix For: 4.0.0, 3.5.1 > > Attachments: hashtable1.png, hashtable2.png > > Original Estimate: 4h > Remaining Estimate: 4h
[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe
[ https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-45227: - Attachment: (was: Screenshot 2023-09-19 at 7.55.37 PM.png) > Fix an issue where an executor process randomly gets stuck by making > CoarseGrainedExecutorBackend.taskResources thread-safe > --- > > Key: SPARK-45227 > URL: https://issues.apache.org/jira/browse/SPARK-45227 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.1, 3.5.0, 4.0.0 >Reporter: Bo Xiong >Priority: Critical > Labels: hang, infinite-loop, race-condition, stuck, threadsafe > Fix For: 4.0.0, 3.5.1 > > Attachments: hashtable1.png, hashtable2.png > > Original Estimate: 4h > Remaining Estimate: 4h
[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe
[ https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-45227: - Attachment: (was: Screenshot 2023-09-19 at 7.55.31 PM.png) > Fix an issue where an executor process randomly gets stuck by making > CoarseGrainedExecutorBackend.taskResources thread-safe > --- > > Key: SPARK-45227 > URL: https://issues.apache.org/jira/browse/SPARK-45227 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.1, 3.5.0, 4.0.0 >Reporter: Bo Xiong >Priority: Critical > Labels: hang, infinite-loop, race-condition, stuck, threadsafe > Fix For: 4.0.0, 3.5.1 > > Attachments: hashtable1.png, hashtable2.png > > Original Estimate: 4h > Remaining Estimate: 4h
[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe
[ https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-45227: - Description: h3. Symptom Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking at Spark UI, we saw that an executor process hung over 1 hour. After we manually killed the executor process, the app succeeded. Note that the same EMR cluster with two worker nodes was able to run the same app without any issue before and after the incident. h3. Observations Below is what's observed from relevant container logs and thread dump. * A regular task that's sent to the executor, which also reported back to the driver upon the task completion. {quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 4432 bytes) taskResourceAssignments Map() 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200) $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923 $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923) 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 4495 bytes result sent to driver}} {quote} * Another task that's sent to the executor but didn't get launched since the single-threaded dispatcher was stuck (presumably in an "infinite loop" as explained later). 
{quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 4432 bytes) taskResourceAssignments Map() $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924 $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz >> note that the above command has no matching result, indicating that task >> 153.0 in stage 23.0 (TID 924) was never launched}} {quote} * Thread dump shows that the dispatcher-Executor thread has the following stack trace. {quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 tid=0x98e37800 nid=0x1aff runnable [0x73bba000] java.lang.Thread.State: RUNNABLE at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142) at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131) at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123) at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365) at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365) at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44) at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140) at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169) at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167) at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44) at scala.collection.mutable.HashMap.put(HashMap.scala:126) at scala.collection.mutable.HashMap.update(HashMap.scala:131) at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200) at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115) at org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown Source) at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) at 
org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75) at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750)}} {quote} * Relevant code paths {quote}Within an executor process, there's a [dispatcher thread|https://github.com/apache/spark/blob/1fdd46f173f7bc90e0523eb0a2d5e8e27e990102/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L170] dedicated to CoarseGrainedExecutorBackend (a single RPC endpoint) that launches tasks scheduled by the driver. Each task is run on a TaskRunner thread backed by a thread pool created for the executor. The TaskRunner thread and the dispatcher thread are
[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe
[ https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-45227: - Attachment: Screenshot 2023-09-19 at 7.55.37 PM.png > Fix an issue where an executor process randomly gets stuck by making > CoarseGrainedExecutorBackend.taskResources thread-safe > --- > > Key: SPARK-45227 > URL: https://issues.apache.org/jira/browse/SPARK-45227 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.1, 3.5.0, 4.0.0 >Reporter: Bo Xiong >Priority: Critical > Labels: hang, infinite-loop, race-condition, stuck, threadsafe > Fix For: 4.0.0, 3.5.1 > > Attachments: Screenshot 2023-09-19 at 7.55.31 PM.png, Screenshot > 2023-09-19 at 7.55.37 PM.png > > Original Estimate: 4h > Remaining Estimate: 4h > > h3. Symptom > Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very > last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking > at Spark UI, we saw that an executor process hung over 1 hour. After we > manually killed the executor process, the app succeeded. > Note that the same EMR cluster with two worker nodes was able to run the same > app without any issue before and after the incident. > h3. Observations > Below is what's observed from relevant container logs and thread dump. > * A regular task that's sent to the executor, which also reported back to > the driver upon the task completion. 
> > {quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz > 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID > 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, > 4432 bytes) taskResourceAssignments Map() > 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID > 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200) > $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz > 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923 > $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz > 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923) > 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). > 4495 bytes result sent to driver}} > {quote} > * Another task that's sent to the executor but didn't get launched since the > single-threaded dispatcher was stuck (presumably in an "infinite loop" as > explained later). > > > {quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz > 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID > 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, > 4432 bytes) taskResourceAssignments Map() > $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz > 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924 > $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz > >> note that the above command has no matching result, indicating that task > >> 153.0 in stage 23.0 (TID 924) was never launched}} > {quote} * Thread dump shows that the dispatcher-Executor thread has the > following stack trace. 
> > {quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 > tid=0x98e37800 nid=0x1aff runnable [0x73bba000] > java.lang.Thread.State: RUNNABLE > at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142) > at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131) > at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123) > at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365) > at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365) > at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44) > at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140) > at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169) > at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167) > at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44) > at scala.collection.mutable.HashMap.put(HashMap.scala:126) > at scala.collection.mutable.HashMap.update(HashMap.scala:131) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200) > at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115) > at > org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown > Source) > at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) > at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) > at >
[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe
[ https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-45227: - Attachment: Screenshot 2023-09-19 at 7.55.31 PM.png > Fix an issue where an executor process randomly gets stuck by making > CoarseGrainedExecutorBackend.taskResources thread-safe > --- > > Key: SPARK-45227 > URL: https://issues.apache.org/jira/browse/SPARK-45227 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.1, 3.5.0, 4.0.0 >Reporter: Bo Xiong >Priority: Critical > Labels: hang, infinite-loop, race-condition, stuck, threadsafe > Fix For: 4.0.0, 3.5.1 > > Attachments: Screenshot 2023-09-19 at 7.55.31 PM.png, Screenshot > 2023-09-19 at 7.55.37 PM.png > > Original Estimate: 4h > Remaining Estimate: 4h > > h3. Symptom > Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very > last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking > at Spark UI, we saw that an executor process hung over 1 hour. After we > manually killed the executor process, the app succeeded. > Note that the same EMR cluster with two worker nodes was able to run the same > app without any issue before and after the incident. > h3. Observations > Below is what's observed from relevant container logs and thread dump. > * A regular task that's sent to the executor, which also reported back to > the driver upon the task completion. 
> > {quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz > 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID > 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, > 4432 bytes) taskResourceAssignments Map() > 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID > 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200) > $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz > 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923 > $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz > 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923) > 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). > 4495 bytes result sent to driver}} > {quote} > * Another task that's sent to the executor but didn't get launched since the > single-threaded dispatcher was stuck (presumably in an "infinite loop" as > explained later). > > > {quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz > 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID > 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, > 4432 bytes) taskResourceAssignments Map() > $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz > 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924 > $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz > >> note that the above command has no matching result, indicating that task > >> 153.0 in stage 23.0 (TID 924) was never launched}} > {quote} * Thread dump shows that the dispatcher-Executor thread has the > following stack trace. 
> > {quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 > tid=0x98e37800 nid=0x1aff runnable [0x73bba000] > java.lang.Thread.State: RUNNABLE > at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142) > at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131) > at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123) > at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365) > at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365) > at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44) > at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140) > at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169) > at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167) > at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44) > at scala.collection.mutable.HashMap.put(HashMap.scala:126) > at scala.collection.mutable.HashMap.update(HashMap.scala:131) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200) > at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115) > at > org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown > Source) > at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) > at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) > at >
[jira] [Commented] (SPARK-44112) Drop Java 8 Support
[ https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766965#comment-17766965 ]

Yang Jie commented on SPARK-44112:
--
!image-2023-09-20-10-53-34-956.png|width=729,height=161!

I have modified the Jira title because the release notes of Apache Spark 3.5.0 state that the minimum supported Java version for the next major version will be Java 17.

> Drop Java 8 Support
> ---
>
> Key: SPARK-44112
> URL: https://issues.apache.org/jira/browse/SPARK-44112
> Project: Spark
> Issue Type: Sub-task
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Priority: Major
> Attachments: image-2023-09-20-10-52-59-327.png, image-2023-09-20-10-53-34-956.png
[jira] [Updated] (SPARK-44112) Drop Java 8 Support
[ https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-44112: - Attachment: image-2023-09-20-10-53-34-956.png > Drop Java 8 Support > --- > > Key: SPARK-44112 > URL: https://issues.apache.org/jira/browse/SPARK-44112 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Attachments: image-2023-09-20-10-52-59-327.png, > image-2023-09-20-10-53-34-956.png
[jira] [Updated] (SPARK-44112) Drop Java 8 Support
[ https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-44112: - Attachment: image-2023-09-20-10-52-59-327.png > Drop Java 8 Support > --- > > Key: SPARK-44112 > URL: https://issues.apache.org/jira/browse/SPARK-44112 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Attachments: image-2023-09-20-10-52-59-327.png, > image-2023-09-20-10-53-34-956.png
[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe
[ https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Xiong updated SPARK-45227:
---
Description:

h3. Symptom

Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very last step of writing a data frame to S3 by calling {{df.write}}. Looking at the Spark UI, we saw that an executor process had hung for over an hour. After we manually killed the executor process, the app succeeded. Note that the same EMR cluster with two worker nodes was able to run the same app without any issue before and after the incident.

h3. Observations

Below is what's observed from the relevant container logs and thread dump.

* A regular task that's sent to the executor, which also reported back to the driver upon task completion.

{quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 4432 bytes) taskResourceAssignments Map()
23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)
$zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923
$zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 4495 bytes result sent to driver}}
{quote}

* Another task that's sent to the executor but didn't get launched since the single-threaded dispatcher was stuck (presumably in an "infinite loop" as explained later).

{quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 4432 bytes) taskResourceAssignments Map()
$zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924
$zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
>> note that the above command has no matching result, indicating that task
>> 153.0 in stage 23.0 (TID 924) was never launched}}
{quote}

* Thread dump shows that the dispatcher-Executor thread has the following stack trace.

{quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
java.lang.Thread.State: RUNNABLE
at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
at scala.collection.mutable.HashMap.put(HashMap.scala:126)
at scala.collection.mutable.HashMap.update(HashMap.scala:131)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
at org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown Source)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)}}
{quote}

* Relevant code paths

{quote}Within an executor process, there's a [dispatcher thread|https://github.com/apache/spark/blob/1fdd46f173f7bc90e0523eb0a2d5e8e27e990102/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L170] dedicated to CoarseGrainedExecutorBackend (a single RPC endpoint) that launches tasks scheduled by the driver. Each task is run on a TaskRunner thread backed by a thread pool created for the executor. The TaskRunner thread and the dispatcher thread are different.
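The threading arrangement described above can be sketched in plain Java. This is only an illustration, not the Spark code or the actual patch: the class {{TaskResourcesSketch}}, its methods, and the resource values are hypothetical, and it assumes the TaskRunner threads also modify the same task-ID-keyed map that the dispatcher writes to in the stack trace. One thread inserts an entry before handing each task to a pool, and the pool threads remove the entry when the task finishes. Backing the map with {{java.util.concurrent.ConcurrentHashMap}} keeps those concurrent updates from corrupting the table.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TaskResourcesSketch {
    // Hypothetical stand-in for CoarseGrainedExecutorBackend.taskResources:
    // task ID -> resources assigned to that task. ConcurrentHashMap tolerates
    // concurrent put/remove from different threads; a plain HashMap does not.
    private final Map<Long, Map<String, String>> taskResources = new ConcurrentHashMap<>();

    // "Dispatcher" side: record the task's resources, then hand it to the pool.
    // The pool thread plays the TaskRunner and drops the entry when it is done.
    void launchTask(long taskId, ExecutorService pool) {
        taskResources.put(taskId, Collections.singletonMap("cpus", "1"));
        pool.execute(() -> taskResources.remove(taskId));
    }

    // Launches numTasks tasks from the calling thread and returns how many
    // entries are left in the map once every task runner has finished.
    static int simulate(int numTasks) {
        TaskResourcesSketch backend = new TaskResourcesSketch();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (long id = 0; id < numTasks; id++) {
            backend.launchTask(id, pool);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("task runners did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for task runners", e);
        }
        return backend.taskResources.size();
    }

    public static void main(String[] args) {
        System.out.println("entries left: " + simulate(10_000));
    }
}
```

Every entry is inserted before its task is submitted and removed exactly once, so the map should be empty when {{simulate}} returns. Replacing the ConcurrentHashMap with a non-thread-safe hash map in this sketch would make the outcome undefined, which is the class of failure the stack trace above points at.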
[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe
[ https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-45227: - Fix Version/s: 4.0.0 Description: h3. Symptom Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking at Spark UI, we saw that an executor process hung over 1 hour. After we manually killed the executor process, the app succeeded. Note that the same EMR cluster with two worker nodes was able to run the same app without any issue before and after the incident. h3. Observations Below is what's observed from relevant container logs and thread dump. * A regular task that's sent to the executor, which also reported back to the driver upon the task completion. {quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 4432 bytes) taskResourceAssignments Map() 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200) $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923 $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923) 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 4495 bytes result sent to driver}} {quote} * Another task that's sent to the executor but didn't get launched since the single-threaded dispatcher was stuck (presumably in an "infinite loop" as explained later). 
{quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 4432 bytes) taskResourceAssignments Map() $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924 $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz >> note that the above command has no matching result, indicating that task >> 153.0 in stage 23.0 (TID 924) was never launched}} {quote} * Thread dump shows that the dispatcher-Executor thread has the following stack trace. {quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 tid=0x98e37800 nid=0x1aff runnable [0x73bba000] java.lang.Thread.State: RUNNABLE at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142) at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131) at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123) at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365) at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365) at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44) at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140) at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169) at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167) at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44) at scala.collection.mutable.HashMap.put(HashMap.scala:126) at scala.collection.mutable.HashMap.update(HashMap.scala:131) at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200) at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115) at org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown Source) at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) at 
org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75) at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750)}} {quote} * Relevant code paths {quote}Within an executor process, there's a [dispatcher thread|https://github.com/apache/spark/blob/1fdd46f173f7bc90e0523eb0a2d5e8e27e990102/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L170] dedicated to CoarseGrainedExecutorBackend (a single RPC endpoint) that launches tasks scheduled by the driver. Each task is run on a TaskRunner thread backed by a thread pool created for the executor. The TaskRunner thread and the dispatcher
[jira] [Created] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe
Bo Xiong created SPARK-45227: Summary: Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe Key: SPARK-45227 URL: https://issues.apache.org/jira/browse/SPARK-45227 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.5.0, 3.3.1, 4.0.0 Reporter: Bo Xiong Fix For: 3.5.1 h3. Symptom Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking at Spark UI, we saw that an executor process hung over 1 hour. After we manually killed the executor process, the app succeeded. Note that the same EMR cluster with two worker nodes was able to run the same app without any issue before and after the incident. h3. Observations Below is what's observed from relevant container logs and thread dump. * A regular task that's sent to the executor, which also reported back to the driver upon the task completion. {quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 4432 bytes) taskResourceAssignments Map() 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200) $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923 $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923) 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 4495 bytes result sent to driver}} {quote} * Another task that's sent to the executor but didn't get launched since the single-threaded dispatcher was stuck (presumably in an "infinite loop" as explained later). 
{quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 4432 bytes) taskResourceAssignments Map() $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924 $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz >> note that the above command has no matching result, indicating that task >> 153.0 in stage 23.0 (TID 924) was never launched}} {quote} * Thread dump shows that the dispatcher-Executor thread has the following stack trace. {quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 tid=0x98e37800 nid=0x1aff runnable [0x73bba000] java.lang.Thread.State: RUNNABLE at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142) at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131) at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123) at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365) at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365) at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44) at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140) at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169) at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167) at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44) at scala.collection.mutable.HashMap.put(HashMap.scala:126) at scala.collection.mutable.HashMap.update(HashMap.scala:131) at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200) at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115) at org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown Source) at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) at 
org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75) at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750)}} {quote} *
[jira] [Updated] (SPARK-44112) Drop Java 8 Support
[ https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44112: -- Summary: Drop Java 8 Support (was: Drop Java 8 and Java 11 Support) > Drop Java 8 Support > --- > > Key: SPARK-44112 > URL: https://issues.apache.org/jira/browse/SPARK-44112 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44112) Drop Java 8 and Java 11 Support
[ https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766963#comment-17766963 ] Dongjoon Hyun commented on SPARK-44112: --- Sorry, but don't change this JIRA, [~LuciferYang]. If we want to drop Java 11, it needs another JIRA. I'll recover the scope of this JIRA issue > Drop Java 8 and Java 11 Support > --- > > Key: SPARK-44112 > URL: https://issues.apache.org/jira/browse/SPARK-44112 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44112) Drop Java 8 and Java 11 Support
[ https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-44112: - Summary: Drop Java 8 and Java 11 Support (was: Drop Java 8 Support) > Drop Java 8 and Java 11 Support > --- > > Key: SPARK-44112 > URL: https://issues.apache.org/jira/browse/SPARK-44112 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45178) Fallback to use single batch executor for Trigger.AvailableNow with unsupported sources rather than using wrapper
[ https://issues.apache.org/jira/browse/SPARK-45178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-45178. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42940 [https://github.com/apache/spark/pull/42940] > Fallback to use single batch executor for Trigger.AvailableNow with > unsupported sources rather than using wrapper > - > > Key: SPARK-45178 > URL: https://issues.apache.org/jira/browse/SPARK-45178 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We have observed cases where the wrapper implementation of > Trigger.AvailableNow ( > AvailableNowDataStreamWrapper and subclasses) is not fully compatible with > a 3rd party data source and causes a correctness issue. > > While we could persuade a 3rd party data source to support > Trigger.AvailableNow, persuading all 3rd parties to do this is too aggressive > and challenging a goal for us to ever achieve. Also, it may not be > possible to come up with a wrapper implementation that has zero > issues with any arbitrary source. > > As a mitigation, we want to make a slight behavioral change for such cases, > falling back to single batch execution (a.k.a. Trigger.Once) rather than > using the wrapper implementation. The exact behaviors of Trigger.AvailableNow > and Trigger.Once are different, so it's technically a behavioral change, but > it's probably a lot less surprising than failing the query. > > For the extreme case where users are confident that there will be no issue at all > in using the wrapper, we will come up with a flag to provide the previous > behavior. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
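The fallback rule described in this issue can be sketched as a small decision function. This is only an illustration in Python under stated assumptions: `ExecutionMode`, `choose_execution_mode`, and the `use_wrapper_flag` parameter are hypothetical names, and the excerpt does not say what the real flag is called or how Spark detects source support.

```python
from enum import Enum


class ExecutionMode(Enum):
    AVAILABLE_NOW = "multi-batch AvailableNow"
    SINGLE_BATCH = "single batch (Trigger.Once-like)"
    AVAILABLE_NOW_WRAPPER = "AvailableNow via wrapper"


def choose_execution_mode(sources_support_available_now, use_wrapper_flag=False):
    """Pick how to run a Trigger.AvailableNow query.

    sources_support_available_now: iterable of bools, one per source
        (hypothetical input; True means the source supports the trigger).
    use_wrapper_flag: hypothetical opt-in that restores the previous
        wrapper-based behavior.
    """
    if all(sources_support_available_now):
        return ExecutionMode.AVAILABLE_NOW
    if use_wrapper_flag:
        # The user explicitly asked for the old behavior.
        return ExecutionMode.AVAILABLE_NOW_WRAPPER
    # New default: at least one source is unsupported, so fall back.
    return ExecutionMode.SINGLE_BATCH
```

For example, `choose_execution_mode([True, False])` yields the single-batch mode, while passing `use_wrapper_flag=True` with the same inputs keeps the wrapper.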
[jira] [Assigned] (SPARK-45178) Fallback to use single batch executor for Trigger.AvailableNow with unsupported sources rather than using wrapper
[ https://issues.apache.org/jira/browse/SPARK-45178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-45178: Assignee: Jungtaek Lim > Fallback to use single batch executor for Trigger.AvailableNow with > unsupported sources rather than using wrapper > - > > Key: SPARK-45178 > URL: https://issues.apache.org/jira/browse/SPARK-45178 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Labels: pull-request-available > > We have observed cases where the wrapper implementation of > Trigger.AvailableNow ( > AvailableNowDataStreamWrapper and subclasses) is not fully compatible with > a 3rd party data source and causes a correctness issue. > > While we could persuade a 3rd party data source to support > Trigger.AvailableNow, persuading all 3rd parties to do this is too aggressive > and challenging a goal for us to ever achieve. Also, it may not be > possible to come up with a wrapper implementation that has zero > issues with any arbitrary source. > > As a mitigation, we want to make a slight behavioral change for such cases, > falling back to single batch execution (a.k.a. Trigger.Once) rather than > using the wrapper implementation. The exact behaviors of Trigger.AvailableNow > and Trigger.Once are different, so it's technically a behavioral change, but > it's probably a lot less surprising than failing the query. > > For the extreme case where users are confident that there will be no issue at all > in using the wrapper, we will come up with a flag to provide the previous > behavior. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45192) lineInterpolate for graphviz edge is overdue
[ https://issues.apache.org/jira/browse/SPARK-45192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-45192. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42969 [https://github.com/apache/spark/pull/42969] > lineInterpolate for graphviz edge is overdue > > > Key: SPARK-45192 > URL: https://issues.apache.org/jira/browse/SPARK-45192 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45192) lineInterpolate for graphviz edge is overdue
[ https://issues.apache.org/jira/browse/SPARK-45192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-45192: Assignee: Kent Yao > lineInterpolate for graphviz edge is overdue > > > Key: SPARK-45192 > URL: https://issues.apache.org/jira/browse/SPARK-45192 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Trivial > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45226) Refine docstring of `rand/randn`
BingKun Pan created SPARK-45226: --- Summary: Refine docstring of `rand/randn` Key: SPARK-45226 URL: https://issues.apache.org/jira/browse/SPARK-45226 Project: Spark Issue Type: Sub-task Components: Documentation Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43432) Fix `min_periods` for Rolling to work same as pandas
[ https://issues.apache.org/jira/browse/SPARK-43432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-43432: Parent: (was: SPARK-44101) Issue Type: Improvement (was: Sub-task) > Fix `min_periods` for Rolling to work same as pandas > - > > Key: SPARK-43432 > URL: https://issues.apache.org/jira/browse/SPARK-43432 > Project: Spark > Issue Type: Improvement > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > > Fix `min_periods` for Rolling to work same as pandas > https://github.com/pandas-dev/pandas/issues/31302 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45208) Website doesn't have horizontal scrollbar
[ https://issues.apache.org/jira/browse/SPARK-45208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45208: -- Summary: Website doesn't have horizontal scrollbar (was: Kubernetes Configuration in Spark Community Website doesn't have horizontal scrollbar) > Website doesn't have horizontal scrollbar > - > > Key: SPARK-45208 > URL: https://issues.apache.org/jira/browse/SPARK-45208 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.5.0 >Reporter: Qian Sun >Priority: Major > > I found a recent issue with the official Spark documentation on the website. > Specifically, the Kubernetes configuration lists on the right-hand side are > not visible and the doc doesn't have a horizontal scrollbar. > > - > [https://spark.apache.org/docs/3.5.0/running-on-kubernetes.html#configuration] > - > [https://spark.apache.org/docs/3.4.1/running-on-kubernetes.html#configuration] > Wide tables are broken in the same way. > - https://spark.apache.org/docs/latest/spark-standalone.html -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45208) Kubernetes Configuration in Spark Community Website doesn't have horizontal scrollbar
[ https://issues.apache.org/jira/browse/SPARK-45208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45208: -- Description: I found a recent issue with the official Spark documentation on the website. Specifically, the Kubernetes configuration lists on the right-hand side are not visible and the doc doesn't have a horizontal scrollbar. - [https://spark.apache.org/docs/3.5.0/running-on-kubernetes.html#configuration] - [https://spark.apache.org/docs/3.4.1/running-on-kubernetes.html#configuration] Wide tables are broken in the same way. - https://spark.apache.org/docs/latest/spark-standalone.html was: I found a recent issue with the official Spark documentation on the website. Specifically, the Kubernetes configuration lists on the right-hand side are not visible and the doc doesn't have a horizontal scrollbar. - [https://spark.apache.org/docs/3.5.0/running-on-kubernetes.html#configuration] - [https://spark.apache.org/docs/3.4.1/running-on-kubernetes.html#configuration] > Kubernetes Configuration in Spark Community Website doesn't have horizontal > scrollbar > - > > Key: SPARK-45208 > URL: https://issues.apache.org/jira/browse/SPARK-45208 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.5.0 >Reporter: Qian Sun >Priority: Major > > I found a recent issue with the official Spark documentation on the website. > Specifically, the Kubernetes configuration lists on the right-hand side are > not visible and the doc doesn't have a horizontal scrollbar. > > - > [https://spark.apache.org/docs/3.5.0/running-on-kubernetes.html#configuration] > - > [https://spark.apache.org/docs/3.4.1/running-on-kubernetes.html#configuration] > Wide tables are broken in the same way. > - https://spark.apache.org/docs/latest/spark-standalone.html -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45225) XML: XSD file URL support
Sandip Agarwala created SPARK-45225: --- Summary: XML: XSD file URL support Key: SPARK-45225 URL: https://issues.apache.org/jira/browse/SPARK-45225 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Sandip Agarwala -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45224) Add examples w/ map and array as parameters of sql()
Max Gekk created SPARK-45224: Summary: Add examples w/ map and array as parameters of sql() Key: SPARK-45224 URL: https://issues.apache.org/jira/browse/SPARK-45224 Project: Spark Issue Type: Test Components: SQL Affects Versions: 4.0.0 Reporter: Max Gekk Assignee: Max Gekk Add a few more examples to the `sql()` method in PySpark that show how to use map and array parameters. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45220) Refine docstring of `DataFrame.join`
[ https://issues.apache.org/jira/browse/SPARK-45220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated SPARK-45220: - Description: Refine the docstring of `DataFrame.join`. The examples should also include: left join, left anti join, join on multiple columns and column names, join on multiple conditions was:Refine the docstring of `DataFrame.join`. > Refine docstring of `DataFrame.join` > > > Key: SPARK-45220 > URL: https://issues.apache.org/jira/browse/SPARK-45220 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > > Refine the docstring of `DataFrame.join`. > The examples should also include: left join, left anti join, join on multiple > columns and column names, join on multiple conditions -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45223) Refine docstring of `Column.when`
Allison Wang created SPARK-45223: Summary: Refine docstring of `Column.when` Key: SPARK-45223 URL: https://issues.apache.org/jira/browse/SPARK-45223 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Allison Wang Refine the docstring of Column.when -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45222) Refine docstring of `DataFrameReader.json`
Allison Wang created SPARK-45222: Summary: Refine docstring of `DataFrameReader.json` Key: SPARK-45222 URL: https://issues.apache.org/jira/browse/SPARK-45222 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Allison Wang Refine the docstring of read json -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45221) Refine docstring of `DataFrameReader.parquet`
Allison Wang created SPARK-45221: Summary: Refine docstring of `DataFrameReader.parquet` Key: SPARK-45221 URL: https://issues.apache.org/jira/browse/SPARK-45221 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Allison Wang Refine the docstring of read parquet -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45220) Refine docstring of `DataFrame.join`
Allison Wang created SPARK-45220: Summary: Refine docstring of `DataFrame.join` Key: SPARK-45220 URL: https://issues.apache.org/jira/browse/SPARK-45220 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Allison Wang Refine the docstring of `DataFrame.join`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45219) Refine docstring of `DataFrame.withColumnRenamed`
Allison Wang created SPARK-45219: Summary: Refine docstring of `DataFrame.withColumnRenamed` Key: SPARK-45219 URL: https://issues.apache.org/jira/browse/SPARK-45219 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Allison Wang Refine the docstring of `DataFrame.withColumnRenamed` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45218) Refine docstring of `Column.isin`
Allison Wang created SPARK-45218: Summary: Refine docstring of `Column.isin` Key: SPARK-45218 URL: https://issues.apache.org/jira/browse/SPARK-45218 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Allison Wang Refine the docstring of `Column.isin` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45217) Support change log level of specific package or class
Zhongwei Zhu created SPARK-45217: Summary: Support change log level of specific package or class Key: SPARK-45217 URL: https://issues.apache.org/jira/browse/SPARK-45217 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.5.0 Reporter: Zhongwei Zhu Add SparkContext.setLogLevel(loggerName: String, logLevel: String) to support changing the log level of a specific package or class. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
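The idea of setting a level for one named logger, rather than globally, can be illustrated with Python's standard `logging` module. This is an analogy only, not the proposed Spark API: `set_log_level` is a hypothetical helper that mirrors the `(loggerName, logLevel)` shape of the signature in the issue, and the logger names in the usage note are made up.

```python
import logging


def set_log_level(logger_name, log_level):
    """Set the level of a single named logger.

    log_level is a level name such as "DEBUG" or "warn" (case-insensitive).
    Logger.setLevel raises ValueError for an unknown level name.
    """
    logging.getLogger(logger_name).setLevel(log_level.upper())
```

Calling `set_log_level("org.example.noisy", "debug")` changes only that logger and, through the dotted-name hierarchy, its descendants that have no level of their own; sibling and parent loggers keep their settings.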
[jira] [Updated] (SPARK-45207) Implement Error Enrichment for Scala Client
[ https://issues.apache.org/jira/browse/SPARK-45207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yihong He updated SPARK-45207: -- Summary: Implement Error Enrichment for Scala Client (was: Implement FetchErrorDetails RPC) > Implement Error Enrichment for Scala Client > --- > > Key: SPARK-45207 > URL: https://issues.apache.org/jira/browse/SPARK-45207 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 4.0.0 >Reporter: Yihong He >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44306) Group FileStatus with few RPC calls within Yarn Client
[ https://issues.apache.org/jira/browse/SPARK-44306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-44306. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42357 [https://github.com/apache/spark/pull/42357] > Group FileStatus with few RPC calls within Yarn Client > -- > > Key: SPARK-44306 > URL: https://issues.apache.org/jira/browse/SPARK-44306 > Project: Spark > Issue Type: New Feature > Components: Spark Submit >Affects Versions: 0.9.2, 2.3.0, 3.5.0 >Reporter: SHU WANG >Assignee: SHU WANG >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > It's inefficient to obtain *FileStatus* for each resource [one by > one|https://github.com/apache/spark/blob/531ec8bddc8dd22ca39486dbdd31e62e989ddc15/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientDistributedCacheManager.scala#L71C1]. > In our company setting, we are running Spark with Hadoop Yarn and HDFS. We > noticed the current behavior has two major drawbacks: > # Since each *getFileStatus* call involves network delays, the overall delay > can be *large* and add *uncertainty* to the overall Spark job runtime. > Specifically, we quantified this overhead within our cluster. We see the p50 > overhead is around 10s, p80 is 1 min, and p100 is up to 15 mins. When HDFS is > overloaded, the delays become more severe. > # In our cluster, we have nearly 100 million *getFileStatus* calls to HDFS > daily. We noticed that in our cluster, most resources come from the same HDFS > directory for each user (See our [engineering blog > post|https://engineering.linkedin.com/blog/2023/reducing-apache-spark-application-dependencies-upload-by-99-] > about why we took this approach). Therefore, we can greatly reduce nearly > 100 million *getFileStatus* calls to 0.1 million *listStatus* calls daily. > This will further reduce overhead from the HDFS side. 
> All in all, a more efficient way to fetch the *FileStatus* for each resource > is highly needed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
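The saving described above, replacing one *getFileStatus* call per resource with one *listStatus* call per parent directory, can be sketched as follows. This is a minimal Python illustration, not the Yarn client code: `FakeFileSystem`, `statuses_one_by_one`, and `statuses_grouped` are hypothetical names, and the in-memory file system only counts calls to stand in for remote RPCs.

```python
import posixpath
from collections import defaultdict


class FakeFileSystem:
    """Hypothetical in-memory file system that counts simulated remote calls."""

    def __init__(self, files):
        self._files = dict(files)  # path -> size in bytes
        self.calls = 0

    def get_file_status(self, path):
        self.calls += 1
        return {"path": path, "size": self._files[path]}

    def list_status(self, directory):
        self.calls += 1
        return [
            {"path": p, "size": s}
            for p, s in self._files.items()
            if posixpath.dirname(p) == directory
        ]


def statuses_one_by_one(fs, paths):
    # One remote call per resource.
    return {p: fs.get_file_status(p) for p in paths}


def statuses_grouped(fs, paths):
    # Group the requested paths by parent directory, then issue a single
    # listing per directory and keep only the entries that were asked for.
    by_dir = defaultdict(set)
    for p in paths:
        by_dir[posixpath.dirname(p)].add(p)
    result = {}
    for directory, wanted in by_dir.items():
        for status in fs.list_status(directory):
            if status["path"] in wanted:
                result[status["path"]] = status
    return result
```

With 100 resources in one directory, the first function makes 100 calls and the second makes 1. The trade-off is that a listing returns every entry in the directory, so grouping pays off mainly when, as the description says, most resources share a directory.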
[jira] [Assigned] (SPARK-44306) Group FileStatus with few RPC calls within Yarn Client
[ https://issues.apache.org/jira/browse/SPARK-44306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-44306: --- Assignee: SHU WANG > Group FileStatus with few RPC calls within Yarn Client > -- > > Key: SPARK-44306 > URL: https://issues.apache.org/jira/browse/SPARK-44306 > Project: Spark > Issue Type: New Feature > Components: Spark Submit >Affects Versions: 0.9.2, 2.3.0, 3.5.0 >Reporter: SHU WANG >Assignee: SHU WANG >Priority: Major > Labels: pull-request-available > > It's inefficient to obtain *FileStatus* for each resource [one by > one|https://github.com/apache/spark/blob/531ec8bddc8dd22ca39486dbdd31e62e989ddc15/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientDistributedCacheManager.scala#L71C1]. > In our company setting, we are running Spark with Hadoop Yarn and HDFS. We > noticed the current behavior has two major drawbacks: > # Since each *getFileStatus* call involves network delays, the overall delay > can be *large* and add *uncertainty* to the overall Spark job runtime. > Specifically, we quantified this overhead within our cluster. We see the p50 > overhead is around 10s, p80 is 1 min, and p100 is up to 15 mins. When HDFS is > overloaded, the delays become more severe. > # In our cluster, we have nearly 100 million *getFileStatus* calls to HDFS > daily. We noticed that in our cluster, most resources come from the same HDFS > directory for each user (See our [engineering blog > post|https://engineering.linkedin.com/blog/2023/reducing-apache-spark-application-dependencies-upload-by-99-] > about why we took this approach). Therefore, we can greatly reduce nearly > 100 million *getFileStatus* calls to 0.1 million *listStatus* calls daily. > This will further reduce overhead from the HDFS side. > All in all, a more efficient way to fetch the *FileStatus* for each resource > is highly needed. 
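The saving described in SPARK-44306 comes from replacing one *getFileStatus* RPC per resource with one *listStatus* RPC per parent directory. A minimal Python sketch of that grouping idea follows. `FakeFileSystem`, `fetch_statuses_one_by_one` and `fetch_statuses_grouped` are hypothetical names invented for illustration, and the in-memory file system only counts simulated calls; this is not the Yarn Client code or the Hadoop API.

```python
import posixpath
from collections import defaultdict


class FakeFileSystem:
    """Hypothetical in-memory stand-in for a remote file system that counts RPCs."""

    def __init__(self, files):
        # Map each path to a fake status record (here just a size in bytes).
        self.files = dict(files)
        self.rpc_calls = 0

    def get_file_status(self, path):
        # One simulated RPC per file.
        self.rpc_calls += 1
        return self.files[path]

    def list_status(self, directory):
        # One simulated RPC returns the status of every direct child of a directory.
        self.rpc_calls += 1
        return {p: s for p, s in self.files.items()
                if posixpath.dirname(p) == directory}


def fetch_statuses_one_by_one(fs, paths):
    """Baseline: one call per resource."""
    return {p: fs.get_file_status(p) for p in paths}


def fetch_statuses_grouped(fs, paths):
    """Group the resources by parent directory and list each directory once."""
    by_dir = defaultdict(list)
    for p in paths:
        by_dir[posixpath.dirname(p)].append(p)
    statuses = {}
    for directory, wanted in by_dir.items():
        listing = fs.list_status(directory)
        for p in wanted:
            statuses[p] = listing[p]
    return statuses


# 100 dependencies in one directory plus one application jar elsewhere.
files = {f"/user/alice/libs/dep{i}.jar": 1000 + i for i in range(100)}
files["/user/alice/app/main.jar"] = 5000
paths = list(files)

fs_a = FakeFileSystem(files)
one_by_one = fetch_statuses_one_by_one(fs_a, paths)

fs_b = FakeFileSystem(files)
grouped = fetch_statuses_grouped(fs_b, paths)

print(fs_a.rpc_calls, fs_b.rpc_calls)  # prints "101 2"
```

In this toy setup the grouped variant needs one call per distinct directory. The benefit depends on many resources sharing a directory, as the ticket reports for its cluster; a directory holding many unrelated files could make a listing more expensive than a few individual lookups.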
[jira] [Resolved] (SPARK-45215) Combine HiveCatalogedDDLSuite and HiveDDLSuite
[ https://issues.apache.org/jira/browse/SPARK-45215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45215. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42992 [https://github.com/apache/spark/pull/42992] > Combine HiveCatalogedDDLSuite and HiveDDLSuite > --- > > Key: SPARK-45215 > URL: https://issues.apache.org/jira/browse/SPARK-45215 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45215) Combine HiveCatalogedDDLSuite and HiveDDLSuite
[ https://issues.apache.org/jira/browse/SPARK-45215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45215: - Assignee: BingKun Pan > Combine HiveCatalogedDDLSuite and HiveDDLSuite > --- > > Key: SPARK-45215 > URL: https://issues.apache.org/jira/browse/SPARK-45215 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43453) Enable OpsOnDiffFramesEnabledTests.test_concat_column_axis for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-43453. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42991 [https://github.com/apache/spark/pull/42991] > Enable OpsOnDiffFramesEnabledTests.test_concat_column_axis for pandas 2.0.0. > > > Key: SPARK-43453 > URL: https://issues.apache.org/jira/browse/SPARK-43453 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 4.0.0 > > > Enable OpsOnDiffFramesEnabledTests.test_concat_column_axis for pandas 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43453) Enable OpsOnDiffFramesEnabledTests.test_concat_column_axis for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-43453: - Assignee: Haejoon Lee > Enable OpsOnDiffFramesEnabledTests.test_concat_column_axis for pandas 2.0.0. > > > Key: SPARK-43453 > URL: https://issues.apache.org/jira/browse/SPARK-43453 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > Enable OpsOnDiffFramesEnabledTests.test_concat_column_axis for pandas 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43453) Ignore the names of MultiIndex when axis=1 for concat
[ https://issues.apache.org/jira/browse/SPARK-43453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-43453: -- Summary: Ignore the names of MultiIndex when axis=1 for concat (was: Enable OpsOnDiffFramesEnabledTests.test_concat_column_axis for pandas 2.0.0.) > Ignore the names of MultiIndex when axis=1 for concat > - > > Key: SPARK-43453 > URL: https://issues.apache.org/jira/browse/SPARK-43453 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 4.0.0 > > > Enable OpsOnDiffFramesEnabledTests.test_concat_column_axis for pandas 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert
[ https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766881#comment-17766881 ]

Enrico Minack commented on SPARK-38200:
---
Sadly, still no feedback from reviewers.

> [SQL] Spark JDBC Savemode Supports Upsert
> -
>
> Key: SPARK-38200
> URL: https://issues.apache.org/jira/browse/SPARK-38200
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: melin
> Priority: Major
>
> Upsert SQL for different databases. Most databases support MERGE SQL:
> sqlserver merge into sql:
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java]
> mysql:
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java]
> oracle merge into sql:
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java]
> postgres:
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java]
> postgres merge into sql:
> [https://www.postgresql.org/docs/current/sql-merge.html]
> db2 merge into sql:
> [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge]
> derby merge into sql:
> [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html]
> h2 merge into sql:
> [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm]
>
> [~yao]
>
> https://github.com/melin/datatunnel/tree/master/plugins/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/dialect
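The SPARK-38200 description points at per-database dialect classes because each database spells an upsert differently. As a rough illustration of that idea only, the Python sketch below builds the statement text for two dialects. `build_upsert_sql` and its parameters are hypothetical names; this is not the proposed Spark implementation or the linked SeaTunnel code, and identifier quoting and the MERGE-based dialects are left out for brevity.

```python
def build_upsert_sql(dialect, table, columns, key_columns):
    """Return a parameterised upsert statement for a few illustrative dialects."""
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    insert = f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"

    # Only non-key columns are overwritten when the row already exists.
    updates = [c for c in columns if c not in key_columns]
    if not updates:
        raise ValueError("need at least one non-key column to update")

    if dialect == "mysql":
        # MySQL: INSERT ... ON DUPLICATE KEY UPDATE
        assignments = ", ".join(f"{c} = VALUES({c})" for c in updates)
        return f"{insert} ON DUPLICATE KEY UPDATE {assignments}"
    if dialect == "postgres":
        # PostgreSQL: INSERT ... ON CONFLICT (keys) DO UPDATE SET ...
        keys = ", ".join(key_columns)
        assignments = ", ".join(f"{c} = EXCLUDED.{c}" for c in updates)
        return f"{insert} ON CONFLICT ({keys}) DO UPDATE SET {assignments}"
    raise ValueError(f"unsupported dialect: {dialect}")


mysql_sql = build_upsert_sql("mysql", "users", ["id", "name"], ["id"])
postgres_sql = build_upsert_sql("postgres", "users", ["id", "name"], ["id"])
print(mysql_sql)
print(postgres_sql)
```

Keeping the dialect-specific text behind one function mirrors the one-class-per-database layout of the linked dialect files; a real implementation would also have to quote identifiers and bind parameters through the driver.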
[jira] [Updated] (SPARK-45200) Spark 3.4.0 always using default log4j profile
[ https://issues.apache.org/jira/browse/SPARK-45200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitin Dominic updated SPARK-45200: -- Flags: Important > Spark 3.4.0 always using default log4j profile > -- > > Key: SPARK-45200 > URL: https://issues.apache.org/jira/browse/SPARK-45200 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Jitin Dominic >Priority: Major > > I've been using Spark core 3.2.2 and am upgrading to 3.4.0. > When I run my Java code with 3.4.0, it generates an extra set of > logs, which I don't see with 3.2.2. > > I noticed that the logs say _Using Spark's default log4j profile: > org/apache/spark/log4j2-defaults.properties._ > > Is this a bug, or is there a configuration to disable the use of the > default log4j profile? > I didn't see anything in the documentation. > > > {code:java} > Using Spark's default log4j profile: > org/apache/spark/log4j2-defaults.properties > 23/09/18 20:05:08 INFO SparkContext: Running Spark version 3.4.0 > 23/09/18 20:05:08 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 23/09/18 20:05:08 INFO ResourceUtils: > == > 23/09/18 20:05:08 INFO ResourceUtils: No custom resources configured for > spark.driver. 
> 23/09/18 20:05:08 INFO ResourceUtils: > == > 23/09/18 20:05:08 INFO SparkContext: Submitted application: XYZ > 23/09/18 20:05:08 INFO ResourceProfile: Default ResourceProfile created, > executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , > memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: > offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: > cpus, amount: 1.0) > 23/09/18 20:05:08 INFO ResourceProfile: Limiting resource is cpu > 23/09/18 20:05:08 INFO ResourceProfileManager: Added ResourceProfile id: 0 > 23/09/18 20:05:08 INFO SecurityManager: Changing view acls to: jd > 23/09/18 20:05:08 INFO SecurityManager: Changing modify acls to: jd > 23/09/18 20:05:08 INFO SecurityManager: Changing view acls groups to: > 23/09/18 20:05:08 INFO SecurityManager: Changing modify acls groups to: > 23/09/18 20:05:08 INFO SecurityManager: SecurityManager: authentication > disabled; ui acls disabled; users with view permissions: jd; groups with view > permissions: EMPTY; users with modify permissions: jd; groups with modify > permissions: EMPTY > 23/09/18 20:05:08 INFO Utils: Successfully started service 'sparkDriver' on > port 39155. 
> 23/09/18 20:05:08 INFO SparkEnv: Registering MapOutputTracker > 23/09/18 20:05:08 INFO SparkEnv: Registering BlockManagerMaster > 23/09/18 20:05:08 INFO BlockManagerMasterEndpoint: Using > org.apache.spark.storage.DefaultTopologyMapper for getting topology > information > 23/09/18 20:05:08 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint > up > 23/09/18 20:05:08 INFO SparkEnv: Registering BlockManagerMasterHeartbeat > 23/09/18 20:05:08 INFO DiskBlockManager: Created local directory at > /tmp/blockmgr-226d599c-1511-4fae-b0e7-aae81b684012 > 23/09/18 20:05:08 INFO MemoryStore: MemoryStore started with capacity 2004.6 > MiB > 23/09/18 20:05:08 INFO SparkEnv: Registering OutputCommitCoordinator > 23/09/18 20:05:08 INFO JettyUtils: Start Jetty 0.0.0.0:4040 for SparkUI > 23/09/18 20:05:09 INFO Utils: Successfully started service 'SparkUI' on port > 4040. > 23/09/18 20:05:09 INFO Executor: Starting executor ID driver on host jd > 23/09/18 20:05:09 INFO Executor: Starting executor with user classpath > (userClassPathFirst = false): '' > 23/09/18 20:05:09 INFO Utils: Successfully started service > 'org.apache.spark.network.netty.NettyBlockTransferService' on port 32819. 
> 23/09/18 20:05:09 INFO NettyBlockTransferService: Server created on jd:32819 > 23/09/18 20:05:09 INFO BlockManager: Using > org.apache.spark.storage.RandomBlockReplicationPolicy for block replication > policy > 23/09/18 20:05:09 INFO BlockManagerMaster: Registering BlockManager > BlockManagerId(driver, jd, 32819, None) > 23/09/18 20:05:09 INFO BlockManagerMasterEndpoint: Registering block manager > jd:32819 with 2004.6 MiB RAM, BlockManagerId(driver, jd, 32819, None) > 23/09/18 20:05:09 INFO BlockManagerMaster: Registered BlockManager > BlockManagerId(driver, jd, 32819, None) > 23/09/18 20:05:09 INFO BlockManager: Initialized BlockManager: > BlockManagerId(driver, jd, 32819, None) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional
[jira] [Commented] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert
[ https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766815#comment-17766815 ]

Yair Ofek commented on SPARK-38200:
---
[~EnricoMi] any news on when this important feature is going to be merged?

> [SQL] Spark JDBC Savemode Supports Upsert
> -
>
> Key: SPARK-38200
> URL: https://issues.apache.org/jira/browse/SPARK-38200
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: melin
> Priority: Major
>
> Upsert SQL for different databases. Most databases support MERGE SQL:
> sqlserver merge into sql:
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java]
> mysql:
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java]
> oracle merge into sql:
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java]
> postgres:
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java]
> postgres merge into sql:
> [https://www.postgresql.org/docs/current/sql-merge.html]
> db2 merge into sql:
> [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge]
> derby merge into sql:
> [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html]
> h2 merge into sql:
> [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm]
>
> [~yao]
>
> https://github.com/melin/datatunnel/tree/master/plugins/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/dialect
[jira] [Updated] (SPARK-44622) Implement FetchErrorDetails RPC
[ https://issues.apache.org/jira/browse/SPARK-44622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yihong He updated SPARK-44622: -- Summary: Implement FetchErrorDetails RPC (was: Implement error enrichment and JVM stacktrace) > Implement FetchErrorDetails RPC > --- > > Key: SPARK-44622 > URL: https://issues.apache.org/jira/browse/SPARK-44622 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.5.0 >Reporter: Yihong He >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45183) ExecutorPodsLifecycleManager delete a pod multi times.
[ https://issues.apache.org/jira/browse/SPARK-45183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766802#comment-17766802 ]

hgs commented on SPARK-45183:
-
I have compared Spark 3.2.0 with Spark 3.5.0. Pod deletion in `ExecutorPodsLifecycleManager` is unchanged between the two, so I suspect version 3.5.0 has the same issue. [~dongjoon]

> ExecutorPodsLifecycleManager delete a pod multi times.
> --
>
> Key: SPARK-45183
> URL: https://issues.apache.org/jira/browse/SPARK-45183
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 3.2.0
> Environment: Spark 3.2.0
> Reporter: hgs
> Priority: Minor
>
> Because `ExecutorPodsLifecycleManager.removedExecutorsCache` is not thread
> safe, a pod can be deleted multiple times when
> `ExecutorPodsLifecycleManager.onNewSnapshots` is called by multiple threads.
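The problem reported in SPARK-45183 is a check-then-act race on a shared cache: two threads can both see that an executor is not yet recorded and both issue a delete. The toy Python model below shows one common remedy, making the check and the insert a single atomic step. `PodDeleter` and its methods are hypothetical names invented here; this is a sketch of the general technique, not the Spark code or its eventual fix.

```python
import threading


class PodDeleter:
    """Hypothetical toy model: delete each executor pod at most once."""

    def __init__(self):
        self._removed = set()        # plays the role of a removed-executors cache
        self._lock = threading.Lock()
        self.delete_calls = []       # records every simulated delete request

    def on_new_snapshot(self, exec_id):
        # The membership check and the insert happen under one lock, so only
        # the first caller for a given executor id goes on to delete the pod.
        with self._lock:
            if exec_id in self._removed:
                return False
            self._removed.add(exec_id)
        self._delete_pod(exec_id)
        return True

    def _delete_pod(self, exec_id):
        # Stand-in for the real delete request; guarded so the record stays consistent.
        with self._lock:
            self.delete_calls.append(exec_id)


deleter = PodDeleter()
threads = [threading.Thread(target=deleter.on_new_snapshot, args=(7,))
           for _ in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(deleter.delete_calls))  # prints 1
```

Without the lock around the check and the insert, several of the 16 threads could pass the membership test before any of them records the id, which is the duplicate deletion the ticket describes.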
[jira] [Updated] (SPARK-45216) Fix non-deterministic seeded Dataset APIs
[ https://issues.apache.org/jira/browse/SPARK-45216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Toth updated SPARK-45216:
---
Description:
If we run the following example, the result is, as expected, 2 equal columns:
{noformat}
val c = rand()
df.select(c, c)

+--------------------------+--------------------------+
|rand(-4522010140232537566)|rand(-4522010140232537566)|
+--------------------------+--------------------------+
|        0.4520819282997137|        0.4520819282997137|
+--------------------------+--------------------------+
{noformat}

But if we use other similar APIs, their results are incorrect:
{noformat}
val r1 = random()
val r2 = uuid()
val r3 = shuffle(col("x"))
val x = df.select(r1, r1, r2, r2, r3, r3)

+------------------+------------------+--------------------+--------------------+----------+----------+
|            rand()|            rand()|              uuid()|              uuid()|shuffle(x)|shuffle(x)|
+------------------+------------------+--------------------+--------------------+----------+----------+
|0.7407604956381952|0.7957319451135009|e55bc4b0-74e6-4b0...|a587163b-d06b-4bb...| [1, 2, 3]| [2, 1, 3]|
+------------------+------------------+--------------------+--------------------+----------+----------+
{noformat}

> Fix non-deterministic seeded Dataset APIs
> -
>
> Key: SPARK-45216
> URL: https://issues.apache.org/jira/browse/SPARK-45216
> Project: Spark
> Issue Type: Bug
> Components: Connect, SQL
> Affects Versions: 4.0.0
> Reporter: Peter Toth
> Priority: Major
>
> If we run the following example, the result is, as expected, 2 equal columns:
> {noformat}
> val c = rand()
> df.select(c, c)
>
> +--------------------------+--------------------------+
> |rand(-4522010140232537566)|rand(-4522010140232537566)|
> +--------------------------+--------------------------+
> |        0.4520819282997137|        0.4520819282997137|
> +--------------------------+--------------------------+
> {noformat}
>
> But if we use other similar APIs, their results are incorrect:
> {noformat}
> val r1 = random()
> val r2 = uuid()
> val r3 = shuffle(col("x"))
> val x = df.select(r1, r1, r2, r2, r3, r3)
>
> +------------------+------------------+--------------------+--------------------+----------+----------+
> |            rand()|            rand()|              uuid()|              uuid()|shuffle(x)|shuffle(x)|
> +------------------+------------------+--------------------+--------------------+----------+----------+
> |0.7407604956381952|0.7957319451135009|e55bc4b0-74e6-4b0...|a587163b-d06b-4bb...| [1, 2, 3]| [2, 1, 3]|
> +------------------+------------------+--------------------+--------------------+----------+----------+
> {noformat}
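The column headers in SPARK-45216 hint at the difference: the working case prints `rand(-4522010140232537566)`, so the seed appears to be fixed in the expression itself, while the failing cases print a bare `rand()`. The toy Python model below contrasts the two behaviours. `SeededRand` and `LateSeededRand` are hypothetical classes written for this note and say nothing about how Spark implements these functions.

```python
import random


class SeededRand:
    """Toy expression whose seed is fixed when the expression is created."""

    def __init__(self, seed=None):
        # Pick the seed once, up front, if the caller did not supply one.
        self.seed = random.getrandbits(63) if seed is None else seed

    def evaluate(self):
        # Every evaluation starts a fresh generator from the stored seed.
        return random.Random(self.seed).random()


class LateSeededRand:
    """Toy expression that only draws a seed each time it is evaluated."""

    def evaluate(self):
        return random.Random(random.getrandbits(63)).random()


# Reusing one early-seeded expression twice gives two identical values.
c = SeededRand()
print(c.evaluate() == c.evaluate())  # prints True

# Reusing one late-seeded expression twice draws two independent seeds,
# so the two values will almost certainly differ.
r = LateSeededRand()
a, b = r.evaluate(), r.evaluate()
print(a, b)
```

In this model, selecting the same expression object twice is only reproducible when the seed travels with the expression, which matches the equal columns in the first table and the differing columns in the second.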
[jira] [Created] (SPARK-45216) Fix non-deterministic seeded Dataset APIs
Peter Toth created SPARK-45216: -- Summary: Fix non-deterministic seeded Dataset APIs Key: SPARK-45216 URL: https://issues.apache.org/jira/browse/SPARK-45216 Project: Spark Issue Type: Bug Components: Connect, SQL Affects Versions: 4.0.0 Reporter: Peter Toth -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-43433) Match `GroupBy.nth` behavior with new pandas behavior
[ https://issues.apache.org/jira/browse/SPARK-43433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee reopened SPARK-43433: - > Match `GroupBy.nth` behavior with new pandas behavior > - > > Key: SPARK-43433 > URL: https://issues.apache.org/jira/browse/SPARK-43433 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > > Match behavior with > https://pandas.pydata.org/docs/dev/whatsnew/v2.0.0.html#dataframegroupby-nth-and-seriesgroupby-nth-now-behave-as-filtrations -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45215) Combine HiveCatalogedDDLSuite and HiveDDLSuite
BingKun Pan created SPARK-45215: --- Summary: Combine HiveCatalogedDDLSuite and HiveDDLSuite Key: SPARK-45215 URL: https://issues.apache.org/jira/browse/SPARK-45215 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45211) Scala 2.13 daily test failed
[ https://issues.apache.org/jira/browse/SPARK-45211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Jie reassigned SPARK-45211:
Assignee: Yang Jie

> Scala 2.13 daily test failed
> -
>
> Key: SPARK-45211
> URL: https://issues.apache.org/jira/browse/SPARK-45211
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 4.0.0, 3.5.1
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Major
>
> * [https://github.com/apache/spark/actions/runs/6215331575/job/16868131377]
> {code:java}
> [info] - abandoned query gets INVALID_HANDLE.OPERATION_ABANDONED error *** FAILED *** (157 milliseconds)
> [info] Expected exception org.apache.spark.SparkException to be thrown, but java.lang.StackOverflowError was thrown (ReattachableExecuteSuite.scala:172)
> [info] org.scalatest.exceptions.TestFailedException:
> [info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
> [info] at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
> [info] at org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564)
> [info] at org.scalatest.Assertions.intercept(Assertions.scala:756)
> [info] at org.scalatest.Assertions.intercept$(Assertions.scala:746)
> [info] at org.scalatest.funsuite.AnyFunSuite.intercept(AnyFunSuite.scala:1564)
> [info] at org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.$anonfun$new$18(ReattachableExecuteSuite.scala:172)
> [info] at org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.$anonfun$new$18$adapted(ReattachableExecuteSuite.scala:168)
> [info] at org.apache.spark.sql.connect.SparkConnectServerTest.withCustomBlockingStub(SparkConnectServerTest.scala:222)
> [info] at org.apache.spark.sql.connect.SparkConnectServerTest.withCustomBlockingStub$(SparkConnectServerTest.scala:216)
> [info] at org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.withCustomBlockingStub(ReattachableExecuteSuite.scala:30)
> [info] at org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.$anonfun$new$16(ReattachableExecuteSuite.scala:168)
> [info] at org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.$anonfun$new$16$adapted(ReattachableExecuteSuite.scala:151)
> [info] at org.apache.spark.sql.connect.SparkConnectServerTest.withClient(SparkConnectServerTest.scala:199)
> [info] at org.apache.spark.sql.connect.SparkConnectServerTest.withClient$(SparkConnectServerTest.scala:191)
> [info] at org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.withClient(ReattachableExecuteSuite.scala:30)
> [info] at org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.$anonfun$new$15(ReattachableExecuteSuite.scala:151)
> [info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
> [info] at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127)
> [info] at org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282)
> [info] at org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231)
> [info] at org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230)
> [info] at org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69)
> [info] at org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155)
> [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info] at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info] at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info] at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> [info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:227)
> [info] at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> [info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> [info] at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:69)
> [info] at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
> [info] at
[jira] [Resolved] (SPARK-45211) Scala 2.13 daily test failed
[ https://issues.apache.org/jira/browse/SPARK-45211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Jie resolved SPARK-45211.
--
Fix Version/s: 3.5.1
               4.0.0
Resolution: Fixed

Issue resolved by pull request 42981
[https://github.com/apache/spark/pull/42981]

> Scala 2.13 daily test failed
> -
>
> Key: SPARK-45211
> URL: https://issues.apache.org/jira/browse/SPARK-45211
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 4.0.0, 3.5.1
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Major
> Fix For: 3.5.1, 4.0.0
>
>
> * [https://github.com/apache/spark/actions/runs/6215331575/job/16868131377]
> {code:java}
> [info] - abandoned query gets INVALID_HANDLE.OPERATION_ABANDONED error *** FAILED *** (157 milliseconds)
> [info] Expected exception org.apache.spark.SparkException to be thrown, but java.lang.StackOverflowError was thrown (ReattachableExecuteSuite.scala:172)
> [info] org.scalatest.exceptions.TestFailedException:
> [info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
> [info] at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
> [info] at org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564)
> [info] at org.scalatest.Assertions.intercept(Assertions.scala:756)
> [info] at org.scalatest.Assertions.intercept$(Assertions.scala:746)
> [info] at org.scalatest.funsuite.AnyFunSuite.intercept(AnyFunSuite.scala:1564)
> [info] at org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.$anonfun$new$18(ReattachableExecuteSuite.scala:172)
> [info] at org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.$anonfun$new$18$adapted(ReattachableExecuteSuite.scala:168)
> [info] at org.apache.spark.sql.connect.SparkConnectServerTest.withCustomBlockingStub(SparkConnectServerTest.scala:222)
> [info] at org.apache.spark.sql.connect.SparkConnectServerTest.withCustomBlockingStub$(SparkConnectServerTest.scala:216)
> [info] at org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.withCustomBlockingStub(ReattachableExecuteSuite.scala:30)
> [info] at org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.$anonfun$new$16(ReattachableExecuteSuite.scala:168)
> [info] at org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.$anonfun$new$16$adapted(ReattachableExecuteSuite.scala:151)
> [info] at org.apache.spark.sql.connect.SparkConnectServerTest.withClient(SparkConnectServerTest.scala:199)
> [info] at org.apache.spark.sql.connect.SparkConnectServerTest.withClient$(SparkConnectServerTest.scala:191)
> [info] at org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.withClient(ReattachableExecuteSuite.scala:30)
> [info] at org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.$anonfun$new$15(ReattachableExecuteSuite.scala:151)
> [info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
> [info] at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127)
> [info] at org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282)
> [info] at org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231)
> [info] at org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230)
> [info] at org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69)
> [info] at org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155)
> [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info] at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info] at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info] at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> [info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:227)
> [info] at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> [info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> [info] at
>
[jira] [Updated] (SPARK-45213) Assign name to _LEGACY_ERROR_TEMP_2151
[ https://issues.apache.org/jira/browse/SPARK-45213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deng Ziming updated SPARK-45213:
Description:
In DatasetSuite, test("CLASS_UNSUPPORTED_BY_MAP_OBJECTS when creating dataset") uses _LEGACY_ERROR_TEMP_2151. We should use a proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`.

*NOTE:* Please reply to this ticket before you start working on it, to avoid several people working on the same ticket at the same time.

was:
We should use a proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`.

*NOTE:* Please reply to this ticket before you start working on it, to avoid several people working on the same ticket at the same time.

> Assign name to _LEGACY_ERROR_TEMP_2151
> --
>
> Key: SPARK-45213
> URL: https://issues.apache.org/jira/browse/SPARK-45213
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Deng Ziming
> Assignee: Haejoon Lee
> Priority: Major
>
> In DatasetSuite, test("CLASS_UNSUPPORTED_BY_MAP_OBJECTS when creating
> dataset") uses _LEGACY_ERROR_TEMP_2151. We should use a proper error
> class name rather than `_LEGACY_ERROR_TEMP_xxx`.
>
> *NOTE:* Please reply to this ticket before you start working on it, to avoid
> several people working on the same ticket at the same time.
[jira] [Updated] (SPARK-45213) Assign name to _LEGACY_ERROR_TEMP_2151
[ https://issues.apache.org/jira/browse/SPARK-45213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deng Ziming updated SPARK-45213:
Affects Version/s: 3.5.0 (was: 3.4.0)

> Assign name to _LEGACY_ERROR_TEMP_2151
> --
>
> Key: SPARK-45213
> URL: https://issues.apache.org/jira/browse/SPARK-45213
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Deng Ziming
> Assignee: Haejoon Lee
> Priority: Major
>
> We should use a proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`.
>
> *NOTE:* Please reply to this ticket before you start working on it, to avoid
> several people working on the same ticket at the same time.
[jira] [Updated] (SPARK-45213) Assign name to _LEGACY_ERROR_TEMP_2151
[ https://issues.apache.org/jira/browse/SPARK-45213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deng Ziming updated SPARK-45213:
Fix Version/s: (was: 3.4.0)

> Assign name to _LEGACY_ERROR_TEMP_2151
> --
>
> Key: SPARK-45213
> URL: https://issues.apache.org/jira/browse/SPARK-45213
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Deng Ziming
> Assignee: Haejoon Lee
> Priority: Major
>
> We should use a proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`.
>
> *NOTE:* Please reply to this ticket before you start working on it, to avoid
> several people working on the same ticket at the same time.
[jira] [Created] (SPARK-45214) Columns should not be visible for filter after projection
Jakub Wozniak created SPARK-45214:
-------------------------------------

             Summary: Columns should not be visible for filter after projection
                 Key: SPARK-45214
                 URL: https://issues.apache.org/jira/browse/SPARK-45214
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.3.1
            Reporter: Jakub Wozniak


After a projection, the dropped columns are still visible to a filter, although they are not visible to a select. Moreover, the behaviour is different after a union: in that case the columns are no longer visible for filtering.

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType, StructField, StructType

# Reuses the active session if there is one (e.g. in the pyspark shell).
spark = SparkSession.builder.getOrCreate()

data1 = []
data2 = []

for i in range(2):
    data1.append((1, i))
    data2.append((2, i + 10))

schema1 = StructType([
    StructField('f1', IntegerType(), True),
    StructField('f2', IntegerType(), True)
])

df1 = spark.createDataFrame(data1, schema1)
df2 = spark.createDataFrame(data2, schema1)

df1.show()
df2.show()

# works, f1 is available for filter (though it should not be)
df1.select('f2').where('f1=1').show()

# error, f1 is not available
df1.select('f2').union(df2.select('f2')).where('f1=1').show()

# the two cases are not treated symmetrically -> incorrect
{code}

This is similar to https://issues.apache.org/jira/browse/SPARK-30421. Perhaps it adds to the argument for fixing this, as the current behaviour is logically inconsistent.
[jira] [Created] (SPARK-45213) Assign name to _LEGACY_ERROR_TEMP_2151
Deng Ziming created SPARK-45213:
-----------------------------------

             Summary: Assign name to _LEGACY_ERROR_TEMP_2151
                 Key: SPARK-45213
                 URL: https://issues.apache.org/jira/browse/SPARK-45213
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Deng Ziming
            Assignee: Haejoon Lee
             Fix For: 3.4.0


We should use a proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`.

*NOTE:* Please reply to this ticket before you start working on it, to avoid several people working on the same ticket at the same time.
[jira] [Updated] (SPARK-45101) Spark UI: A stage is still active even when all of it's tasks are succeeded
[ https://issues.apache.org/jira/browse/SPARK-45101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] RickyMa updated SPARK-45101: Priority: Critical (was: Major) > Spark UI: A stage is still active even when all of it's tasks are succeeded > --- > > Key: SPARK-45101 > URL: https://issues.apache.org/jira/browse/SPARK-45101 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.1, 3.5.0, 4.0.0 >Reporter: RickyMa >Priority: Critical > Attachments: 1.png, 2.png, 3.png > > > In the stage UI, we can see all the tasks' statuses are SUCCESS. > But the stage is still marked as active. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-43182) Mutilple tables join with limit when AE is enabled and one table is skewed
[ https://issues.apache.org/jira/browse/SPARK-43182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766690#comment-17766690 ]

Qian Sun edited comment on SPARK-43182 at 9/19/23 8:14 AM:
-----------------------------------------------------------

Hi [~Resol1992], I ran your SQL, tried different configuration combinations, and believe the regression is caused by *spark.sql.adaptive.forceOptimizeSkewedJoin*, which introduces extra shuffles. When *spark.sql.adaptive.forceOptimizeSkewedJoin* is false, AQE can give up the skew-join optimization if it would introduce an extra shuffle.

cc [~cloud_fan]

ref: [https://github.com/apache/spark/blob/87a5442f7ed96b11051d8a9333476d080054e5a0/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala#L225-L229]


was (Author: dcoliversun):
Hi [~Resol1992], I ran your SQL, tried different configuration combinations, and believe the regression is caused by *spark.sql.adaptive.forceOptimizeSkewedJoin*, which introduces extra shuffles. When *spark.sql.adaptive.forceOptimizeSkewedJoin* is false, AQE can give up the skew-join optimization if it would introduce an extra shuffle.

cc [~cloud_fan]
 * https://github.com/apache/spark/blob/87a5442f7ed96b11051d8a9333476d080054e5a0/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala#L225-L229

> Mutilple tables join with limit when AE is enabled and one table is skewed
> --------------------------------------------------------------------------
>
>                 Key: SPARK-43182
>                 URL: https://issues.apache.org/jira/browse/SPARK-43182
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Liu Shuo
>            Priority: Critical
>         Attachments: part-m-0.zip, part-m-1.zip, part-m-2.zip,
> part-m-3.zip, part-m-4.zip, part-m-5.zip, part-m-6.zip,
> part-m-7.zip, part-m-8.zip, part-m-9.zip, part-m-00010.zip,
> part-m-00011.zip, part-m-00012.zip, part-m-00013.zip, part-m-00014.zip,
> part-m-00015.zip, part-m-00016.zip, part-m-00017.zip, part-m-00018.zip,
> part-m-00019.zip
>
>
> When we test AQE in Spark 3.4.0 with the following case, we find that if we disable
> AQE, or enable AQE but disable skewJoin, the SQL finishes in 20s, but if we
> enable AQE and enable skewJoin, it takes a very long time.
> The test case:
> {code:java}
> ###uncompress the part-m-***.zip attachment, and put these files under '/tmp/spark-warehouse/data/' dir.
> create table source_aqe(c1 int,c18 string) using csv options(path > 'file:///tmp/spark-warehouse/data/'); > create table hive_snappy_aqe_table1(c1 int)stored as PARQUET partitioned > by(c18 string); > insert into table hive_snappy_aqe_table1 partition(c18=1)select c1 from > source_aqe; > insert into table hive_snappy_aqe_table1 partition(c18=2)select c1 from > source_aqe limit 12; > insert into table hive_snappy_aqe_table1 partition(c18=3)select c1 from > source_aqe limit 15;create table hive_snappy_aqe_table2(c1 int)stored as > PARQUET partitioned by(c18 string); > insert into table hive_snappy_aqe_table2 partition(c18=1)select c1 from > source_aqe limit 16; > insert into table hive_snappy_aqe_table2 partition(c18=2)select c1 from > source_aqe limit 12;create table hive_snappy_aqe_table3(c1 int)stored as > PARQUET partitioned by(c18 string); > insert into table hive_snappy_aqe_table3 partition(c18=1)select c1 from > source_aqe limit 16; > insert into table hive_snappy_aqe_table3 partition(c18=2)select c1 from > source_aqe limit 12; > set spark.sql.adaptive.enabled=false; > set spark.sql.adaptive.forceOptimizeSkewedJoin = false; > set spark.sql.adaptive.skewJoin.skewedPartitionFactor=1; > set spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=10KB; > set spark.sql.adaptive.advisoryPartitionSizeInBytes=100KB; > set spark.sql.autoBroadcastJoinThreshold = 51200; > > ###it will finish in 20s > select * from hive_snappy_aqe_table1 join hive_snappy_aqe_table2 on > hive_snappy_aqe_table1.c18=hive_snappy_aqe_table2.c18 join > hive_snappy_aqe_table3 on > hive_snappy_aqe_table1.c18=hive_snappy_aqe_table3.c18 limit 10; > set spark.sql.adaptive.enabled=true; > set spark.sql.adaptive.forceOptimizeSkewedJoin = true; > set spark.sql.adaptive.skewJoin.skewedPartitionFactor=1; > set spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=10KB; > set spark.sql.adaptive.advisoryPartitionSizeInBytes=100KB; > set spark.sql.autoBroadcastJoinThreshold = 51200; > ###it will 
take very long time > select * from hive_snappy_aqe_table1 join hive_snappy_aqe_table2 on > hive_snappy_aqe_table1.c18=hive_snappy_aqe_table2.c18 join > hive_snappy_aqe_table3 on > hive_snappy_aqe_table1.c18=hive_snappy_aqe_table3.c18 limit 10; > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To
[jira] [Commented] (SPARK-43182) Mutilple tables join with limit when AE is enabled and one table is skewed
[ https://issues.apache.org/jira/browse/SPARK-43182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766690#comment-17766690 ]

Qian Sun commented on SPARK-43182:
----------------------------------

Hi [~Resol1992], I ran your SQL, tried different configuration combinations, and believe the regression is caused by *spark.sql.adaptive.forceOptimizeSkewedJoin*, which introduces extra shuffles. When *spark.sql.adaptive.forceOptimizeSkewedJoin* is false, AQE can give up the skew-join optimization if it would introduce an extra shuffle.

cc [~cloud_fan]
 * https://github.com/apache/spark/blob/87a5442f7ed96b11051d8a9333476d080054e5a0/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala#L225-L229

> Mutilple tables join with limit when AE is enabled and one table is skewed
> --------------------------------------------------------------------------
>
>                 Key: SPARK-43182
>                 URL: https://issues.apache.org/jira/browse/SPARK-43182
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Liu Shuo
>            Priority: Critical
>         Attachments: part-m-0.zip, part-m-1.zip, part-m-2.zip,
> part-m-3.zip, part-m-4.zip, part-m-5.zip, part-m-6.zip,
> part-m-7.zip, part-m-8.zip, part-m-9.zip, part-m-00010.zip,
> part-m-00011.zip, part-m-00012.zip, part-m-00013.zip, part-m-00014.zip,
> part-m-00015.zip, part-m-00016.zip, part-m-00017.zip, part-m-00018.zip,
> part-m-00019.zip
>
>
> When we test AQE in Spark 3.4.0 with the following case, we find that if we disable
> AQE, or enable AQE but disable skewJoin, the SQL finishes in 20s, but if we
> enable AQE and enable skewJoin, it takes a very long time.
> The test case:
> {code:java}
> ###uncompress the part-m-***.zip attachment, and put these files under '/tmp/spark-warehouse/data/' dir.
> create table source_aqe(c1 int,c18 string) using csv options(path > 'file:///tmp/spark-warehouse/data/'); > create table hive_snappy_aqe_table1(c1 int)stored as PARQUET partitioned > by(c18 string); > insert into table hive_snappy_aqe_table1 partition(c18=1)select c1 from > source_aqe; > insert into table hive_snappy_aqe_table1 partition(c18=2)select c1 from > source_aqe limit 12; > insert into table hive_snappy_aqe_table1 partition(c18=3)select c1 from > source_aqe limit 15;create table hive_snappy_aqe_table2(c1 int)stored as > PARQUET partitioned by(c18 string); > insert into table hive_snappy_aqe_table2 partition(c18=1)select c1 from > source_aqe limit 16; > insert into table hive_snappy_aqe_table2 partition(c18=2)select c1 from > source_aqe limit 12;create table hive_snappy_aqe_table3(c1 int)stored as > PARQUET partitioned by(c18 string); > insert into table hive_snappy_aqe_table3 partition(c18=1)select c1 from > source_aqe limit 16; > insert into table hive_snappy_aqe_table3 partition(c18=2)select c1 from > source_aqe limit 12; > set spark.sql.adaptive.enabled=false; > set spark.sql.adaptive.forceOptimizeSkewedJoin = false; > set spark.sql.adaptive.skewJoin.skewedPartitionFactor=1; > set spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=10KB; > set spark.sql.adaptive.advisoryPartitionSizeInBytes=100KB; > set spark.sql.autoBroadcastJoinThreshold = 51200; > > ###it will finish in 20s > select * from hive_snappy_aqe_table1 join hive_snappy_aqe_table2 on > hive_snappy_aqe_table1.c18=hive_snappy_aqe_table2.c18 join > hive_snappy_aqe_table3 on > hive_snappy_aqe_table1.c18=hive_snappy_aqe_table3.c18 limit 10; > set spark.sql.adaptive.enabled=true; > set spark.sql.adaptive.forceOptimizeSkewedJoin = true; > set spark.sql.adaptive.skewJoin.skewedPartitionFactor=1; > set spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=10KB; > set spark.sql.adaptive.advisoryPartitionSizeInBytes=100KB; > set spark.sql.autoBroadcastJoinThreshold = 51200; > ###it will 
take very long time > select * from hive_snappy_aqe_table1 join hive_snappy_aqe_table2 on > hive_snappy_aqe_table1.c18=hive_snappy_aqe_table2.c18 join > hive_snappy_aqe_table3 on > hive_snappy_aqe_table1.c18=hive_snappy_aqe_table3.c18 limit 10; > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
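The behaviour described in the comment above (AQE may give up the skew-join optimization when it would add a shuffle, unless *spark.sql.adaptive.forceOptimizeSkewedJoin* is set) can be sketched as a small decision function. This is a toy model for illustration only, not Spark's implementation: the function name `keeps_skew_join_optimization` and its two boolean parameters are hypothetical, and the case with no extra shuffle is assumed to keep the optimization. The linked OptimizeSkewedJoin.scala remains the authoritative source.

```python
def keeps_skew_join_optimization(introduces_extra_shuffle: bool,
                                 force_optimize_skewed_join: bool) -> bool:
    """Toy model (hypothetical helper): is the skew-join rewrite kept?

    Mirrors only the statement in the comment: with an extra shuffle,
    the rewrite is kept only when it is forced.
    """
    if not introduces_extra_shuffle:
        # Assumption: without an extra shuffle there is nothing to give up.
        return True
    # An extra shuffle would be introduced: keep the rewrite only if forced.
    return force_optimize_skewed_join


if __name__ == "__main__":
    # Print the full truth table of the toy model.
    for extra in (False, True):
        for force in (False, True):
            kept = keeps_skew_join_optimization(extra, force)
            print(f"extra_shuffle={extra} force={force} -> kept={kept}")
```

Under this model, the slow run in the test case corresponds to the forced case: the rewrite is kept even though it adds shuffles.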
[jira] [Updated] (SPARK-45101) Spark UI: A stage is still active even when all of it's tasks are succeeded
[ https://issues.apache.org/jira/browse/SPARK-45101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] RickyMa updated SPARK-45101: Affects Version/s: 3.5.0 4.0.0 > Spark UI: A stage is still active even when all of it's tasks are succeeded > --- > > Key: SPARK-45101 > URL: https://issues.apache.org/jira/browse/SPARK-45101 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.1, 3.5.0, 4.0.0 >Reporter: RickyMa >Priority: Major > Attachments: 1.png, 2.png, 3.png > > > In the stage UI, we can see all the tasks' statuses are SUCCESS. > But the stage is still marked as active. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45212) Install independent Python linter dependencies for branch-3.5
[ https://issues.apache.org/jira/browse/SPARK-45212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-45212.
-----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 42990
[https://github.com/apache/spark/pull/42990]

> Install independent Python linter dependencies for branch-3.5
> -------------------------------------------------------------
>
>                 Key: SPARK-45212
>                 URL: https://issues.apache.org/jira/browse/SPARK-45212
>             Project: Spark
>          Issue Type: Bug
>          Components: Project Infra
>    Affects Versions: 4.0.0
>            Reporter: Yang Jie
>            Assignee: Yang Jie
>            Priority: Major
>             Fix For: 4.0.0
>
>
> The Python linter failed in the branch-3.5 daily test:
>  * [https://github.com/apache/spark/actions/runs/6221638911/job/16884068430]
> {code:java}
> Run PYTHON_EXECUTABLE=python3.9 ./dev/lint-python
> starting python compilation test...
> python compilation succeeded.
>
> starting black test...
> black checks failed:
> Oh no! The required version `22.6.0` does not match the running version `23.9.1`!
> Please run 'dev/reformat-python' script.
> 1
> Error: Process completed with exit code 1. {code}
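The failure in the log above is a mismatch between the black version the branch requires (`22.6.0`) and the version installed on the runner (`23.9.1`). A minimal sketch of that kind of check follows. `version_mismatch` is a hypothetical helper written for illustration; it is not code from `dev/lint-python` or from black, and it assumes that an exact string match between the two versions is required.

```python
from typing import Optional


def version_mismatch(required: str, running: str) -> Optional[str]:
    """Hypothetical helper: return a diagnostic if the versions differ.

    Assumption: the pinned version must match the running one exactly.
    Returns None when the two versions agree.
    """
    if required.strip() == running.strip():
        return None
    return (f"required version `{required}` does not match "
            f"the running version `{running}`")


if __name__ == "__main__":
    # Versions taken from the CI log quoted in the ticket.
    problem = version_mismatch("22.6.0", "23.9.1")
    if problem is not None:
        print(problem)
```

Pinning the linter dependencies per branch, as the ticket title proposes, keeps the two versions passed to such a check in agreement.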
[jira] [Assigned] (SPARK-45212) Install independent Python linter dependencies for branch-3.5
[ https://issues.apache.org/jira/browse/SPARK-45212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-45212:
-------------------------------------

    Assignee: Yang Jie

> Install independent Python linter dependencies for branch-3.5
> -------------------------------------------------------------
>
>                 Key: SPARK-45212
>                 URL: https://issues.apache.org/jira/browse/SPARK-45212
>             Project: Spark
>          Issue Type: Bug
>          Components: Project Infra
>    Affects Versions: 4.0.0
>            Reporter: Yang Jie
>            Assignee: Yang Jie
>            Priority: Major
>
> The Python linter failed in the branch-3.5 daily test:
>  * [https://github.com/apache/spark/actions/runs/6221638911/job/16884068430]
> {code:java}
> Run PYTHON_EXECUTABLE=python3.9 ./dev/lint-python
> starting python compilation test...
> python compilation succeeded.
>
> starting black test...
> black checks failed:
> Oh no! The required version `22.6.0` does not match the running version `23.9.1`!
> Please run 'dev/reformat-python' script.
> 1
> Error: Process completed with exit code 1. {code}
[jira] [Updated] (SPARK-45200) Spark 3.4.0 always using default log4j profile
[ https://issues.apache.org/jira/browse/SPARK-45200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitin Dominic updated SPARK-45200: -- Description: I've been using Spark core 3.2.2 and was upgrading to 3.4.0 On execution of my Java code with the 3.4.0, it generates some extra set of logs but don't face this issue with 3.2.2. I noticed that logs says _Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties._ Is this a bug or do we have a a configuration to disable the using of default log4j profile? I didn't see anything in the documentation {code:java} Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties 23/09/18 20:05:08 INFO SparkContext: Running Spark version 3.4.0 23/09/18 20:05:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 23/09/18 20:05:08 INFO ResourceUtils: == 23/09/18 20:05:08 INFO ResourceUtils: No custom resources configured for spark.driver. 
23/09/18 20:05:08 INFO ResourceUtils: == 23/09/18 20:05:08 INFO SparkContext: Submitted application: XYZ 23/09/18 20:05:08 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0) 23/09/18 20:05:08 INFO ResourceProfile: Limiting resource is cpu 23/09/18 20:05:08 INFO ResourceProfileManager: Added ResourceProfile id: 0 23/09/18 20:05:08 INFO SecurityManager: Changing view acls to: jd 23/09/18 20:05:08 INFO SecurityManager: Changing modify acls to: jd 23/09/18 20:05:08 INFO SecurityManager: Changing view acls groups to: 23/09/18 20:05:08 INFO SecurityManager: Changing modify acls groups to: 23/09/18 20:05:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: jd; groups with view permissions: EMPTY; users with modify permissions: jd; groups with modify permissions: EMPTY 23/09/18 20:05:08 INFO Utils: Successfully started service 'sparkDriver' on port 39155. 
23/09/18 20:05:08 INFO SparkEnv: Registering MapOutputTracker 23/09/18 20:05:08 INFO SparkEnv: Registering BlockManagerMaster 23/09/18 20:05:08 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 23/09/18 20:05:08 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up 23/09/18 20:05:08 INFO SparkEnv: Registering BlockManagerMasterHeartbeat 23/09/18 20:05:08 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-226d599c-1511-4fae-b0e7-aae81b684012 23/09/18 20:05:08 INFO MemoryStore: MemoryStore started with capacity 2004.6 MiB 23/09/18 20:05:08 INFO SparkEnv: Registering OutputCommitCoordinator 23/09/18 20:05:08 INFO JettyUtils: Start Jetty 0.0.0.0:4040 for SparkUI 23/09/18 20:05:09 INFO Utils: Successfully started service 'SparkUI' on port 4040. 23/09/18 20:05:09 INFO Executor: Starting executor ID driver on host jd 23/09/18 20:05:09 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): '' 23/09/18 20:05:09 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 32819. 
23/09/18 20:05:09 INFO NettyBlockTransferService: Server created on jd:32819 23/09/18 20:05:09 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 23/09/18 20:05:09 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, jd, 32819, None) 23/09/18 20:05:09 INFO BlockManagerMasterEndpoint: Registering block manager jd:32819 with 2004.6 MiB RAM, BlockManagerId(driver, jd, 32819, None) 23/09/18 20:05:09 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, jd, 32819, None) 23/09/18 20:05:09 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, jd, 32819, None) {code} was: I've been using Spark core 3.2.2 and was upgrading to 3.4.0 On execution of my Java code with the 3.4.0, it generates some extra set of logs but don't face this issue with 3.2.2. I noticed that logs says _Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties._ Is this a bug or do we have a a configuration to disable the using of default log4j profile? I didn't see anything in the documentation {code:java} Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties 23/09/18 20:05:08 INFO SparkContext: Running Spark version 3.4.0 23/09/18 20:05:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[jira] [Updated] (SPARK-45200) Spark 3.4.0 always using default log4j profile
[ https://issues.apache.org/jira/browse/SPARK-45200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitin Dominic updated SPARK-45200: -- Description: I've been using Spark core 3.2.2 and was upgrading to 3.4.0 On execution of my Java code with the 3.4.0, it generates some extra set of logs but don't face this issue with 3.2.2. I noticed that logs says _Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties._ Is this a bug or do we have a a configuration to disable the using of default log4j profile? I didn't see anything in the documentation {code:java} Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties 23/09/18 20:05:08 INFO SparkContext: Running Spark version 3.4.0 23/09/18 20:05:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 23/09/18 20:05:08 INFO ResourceUtils: == 23/09/18 20:05:08 INFO ResourceUtils: No custom resources configured for spark.driver. 
23/09/18 20:05:08 INFO ResourceUtils: == 23/09/18 20:05:08 INFO SparkContext: Submitted application: XYZ 23/09/18 20:05:08 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0) 23/09/18 20:05:08 INFO ResourceProfile: Limiting resource is cpu 23/09/18 20:05:08 INFO ResourceProfileManager: Added ResourceProfile id: 0 23/09/18 20:05:08 INFO SecurityManager: Changing view acls to: jd 23/09/18 20:05:08 INFO SecurityManager: Changing modify acls to: jd 23/09/18 20:05:08 INFO SecurityManager: Changing view acls groups to: 23/09/18 20:05:08 INFO SecurityManager: Changing modify acls groups to: 23/09/18 20:05:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: jd; groups with view permissions: EMPTY; users with modify permissions: jd; groups with modify permissions: EMPTY 23/09/18 20:05:08 INFO Utils: Successfully started service 'sparkDriver' on port 39155. 
23/09/18 20:05:08 INFO SparkEnv: Registering MapOutputTracker 23/09/18 20:05:08 INFO SparkEnv: Registering BlockManagerMaster 23/09/18 20:05:08 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 23/09/18 20:05:08 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up 23/09/18 20:05:08 INFO SparkEnv: Registering BlockManagerMasterHeartbeat 23/09/18 20:05:08 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-226d599c-1511-4fae-b0e7-aae81b684012 23/09/18 20:05:08 INFO MemoryStore: MemoryStore started with capacity 2004.6 MiB 23/09/18 20:05:08 INFO SparkEnv: Registering OutputCommitCoordinator 23/09/18 20:05:08 INFO JettyUtils: Start Jetty 0.0.0.0:4040 for SparkUI 23/09/18 20:05:09 INFO Utils: Successfully started service 'SparkUI' on port 4040. 23/09/18 20:05:09 INFO Executor: Starting executor ID driver on host jd 23/09/18 20:05:09 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): '' 23/09/18 20:05:09 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 32819. 
23/09/18 20:05:09 INFO NettyBlockTransferService: Server created on jd:32819 23/09/18 20:05:09 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 23/09/18 20:05:09 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, jd, 32819, None) 23/09/18 20:05:09 INFO BlockManagerMasterEndpoint: Registering block manager jd:32819 with 2004.6 MiB RAM, BlockManagerId(driver, jd, 32819, None) 23/09/18 20:05:09 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, jd, 32819, None) 23/09/18 20:05:09 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, jd, 32819, None) {code} was: I've been using Spark core 3.2.2 and was upgrading to 3.4.0 On execution of my Java code with the 3.4.0, it generates some extra set of logs but don't face this issue with 3.2.2. I noticed that logs says _Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties._ Is there a configuration to disable the using of default log4j profile? {code:java} Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties 23/09/18 20:05:08 INFO SparkContext: Running Spark version 3.4.0 23/09/18 20:05:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 23/09/18 20:05:08 INFO ResourceUtils:
[jira] [Updated] (SPARK-45200) Spark 3.4.0 always using default log4j profile
[ https://issues.apache.org/jira/browse/SPARK-45200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitin Dominic updated SPARK-45200: -- Summary: Spark 3.4.0 always using default log4j profile (was: Ignore/Disable Spark's default log4j profile) > Spark 3.4.0 always using default log4j profile > -- > > Key: SPARK-45200 > URL: https://issues.apache.org/jira/browse/SPARK-45200 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Jitin Dominic >Priority: Major > > I've been using Spark core 3.2.2 and was upgrading to 3.4.0 > > On execution of my Java code with the 3.4.0, it generates some extra set of > logs but don't face this issue with 3.2.2. > > I noticed that logs says _Using Spark's default log4j profile: > org/apache/spark/log4j2-defaults.properties._ > > Is there a configuration to disable the using of default log4j profile? > > > {code:java} > Using Spark's default log4j profile: > org/apache/spark/log4j2-defaults.properties > 23/09/18 20:05:08 INFO SparkContext: Running Spark version 3.4.0 > 23/09/18 20:05:08 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 23/09/18 20:05:08 INFO ResourceUtils: > == > 23/09/18 20:05:08 INFO ResourceUtils: No custom resources configured for > spark.driver. 
> 23/09/18 20:05:08 INFO ResourceUtils: > == > 23/09/18 20:05:08 INFO SparkContext: Submitted application: XYZ > 23/09/18 20:05:08 INFO ResourceProfile: Default ResourceProfile created, > executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , > memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: > offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: > cpus, amount: 1.0) > 23/09/18 20:05:08 INFO ResourceProfile: Limiting resource is cpu > 23/09/18 20:05:08 INFO ResourceProfileManager: Added ResourceProfile id: 0 > 23/09/18 20:05:08 INFO SecurityManager: Changing view acls to: jd > 23/09/18 20:05:08 INFO SecurityManager: Changing modify acls to: jd > 23/09/18 20:05:08 INFO SecurityManager: Changing view acls groups to: > 23/09/18 20:05:08 INFO SecurityManager: Changing modify acls groups to: > 23/09/18 20:05:08 INFO SecurityManager: SecurityManager: authentication > disabled; ui acls disabled; users with view permissions: jd; groups with view > permissions: EMPTY; users with modify permissions: jd; groups with modify > permissions: EMPTY > 23/09/18 20:05:08 INFO Utils: Successfully started service 'sparkDriver' on > port 39155. 
> 23/09/18 20:05:08 INFO SparkEnv: Registering MapOutputTracker > 23/09/18 20:05:08 INFO SparkEnv: Registering BlockManagerMaster > 23/09/18 20:05:08 INFO BlockManagerMasterEndpoint: Using > org.apache.spark.storage.DefaultTopologyMapper for getting topology > information > 23/09/18 20:05:08 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint > up > 23/09/18 20:05:08 INFO SparkEnv: Registering BlockManagerMasterHeartbeat > 23/09/18 20:05:08 INFO DiskBlockManager: Created local directory at > /tmp/blockmgr-226d599c-1511-4fae-b0e7-aae81b684012 > 23/09/18 20:05:08 INFO MemoryStore: MemoryStore started with capacity 2004.6 > MiB > 23/09/18 20:05:08 INFO SparkEnv: Registering OutputCommitCoordinator > 23/09/18 20:05:08 INFO JettyUtils: Start Jetty 0.0.0.0:4040 for SparkUI > 23/09/18 20:05:09 INFO Utils: Successfully started service 'SparkUI' on port > 4040. > 23/09/18 20:05:09 INFO Executor: Starting executor ID driver on host jd > 23/09/18 20:05:09 INFO Executor: Starting executor with user classpath > (userClassPathFirst = false): '' > 23/09/18 20:05:09 INFO Utils: Successfully started service > 'org.apache.spark.network.netty.NettyBlockTransferService' on port 32819. > 23/09/18 20:05:09 INFO NettyBlockTransferService: Server created on jd:32819 > 23/09/18 20:05:09 INFO BlockManager: Using > org.apache.spark.storage.RandomBlockReplicationPolicy for block replication > policy > 23/09/18 20:05:09 INFO BlockManagerMaster: Registering BlockManager > BlockManagerId(driver, jd, 32819, None) > 23/09/18 20:05:09 INFO BlockManagerMasterEndpoint: Registering block manager > jd:32819 with 2004.6 MiB RAM, BlockManagerId(driver, jd, 32819, None) > 23/09/18 20:05:09 INFO BlockManagerMaster: Registered BlockManager > BlockManagerId(driver, jd, 32819, None) > 23/09/18 20:05:09 INFO BlockManager: Initialized BlockManager: > BlockManagerId(driver, jd, 32819, None) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail:
[jira] [Updated] (SPARK-45200) Ignore/Disable Spark's default log4j profile
[ https://issues.apache.org/jira/browse/SPARK-45200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitin Dominic updated SPARK-45200: -- Issue Type: Bug (was: Question) > Ignore/Disable Spark's default log4j profile > > > Key: SPARK-45200 > URL: https://issues.apache.org/jira/browse/SPARK-45200 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Jitin Dominic >Priority: Major > > I've been using Spark core 3.2.2 and was upgrading to 3.4.0 > > On execution of my Java code with the 3.4.0, it generates some extra set of > logs but don't face this issue with 3.2.2. > > I noticed that logs says _Using Spark's default log4j profile: > org/apache/spark/log4j2-defaults.properties._ > > Is there a configuration to disable the using of default log4j profile? > > > {code:java} > Using Spark's default log4j profile: > org/apache/spark/log4j2-defaults.properties > 23/09/18 20:05:08 INFO SparkContext: Running Spark version 3.4.0 > 23/09/18 20:05:08 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 23/09/18 20:05:08 INFO ResourceUtils: > == > 23/09/18 20:05:08 INFO ResourceUtils: No custom resources configured for > spark.driver. 
> 23/09/18 20:05:08 INFO ResourceUtils: > == > 23/09/18 20:05:08 INFO SparkContext: Submitted application: XYZ > 23/09/18 20:05:08 INFO ResourceProfile: Default ResourceProfile created, > executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , > memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: > offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: > cpus, amount: 1.0) > 23/09/18 20:05:08 INFO ResourceProfile: Limiting resource is cpu > 23/09/18 20:05:08 INFO ResourceProfileManager: Added ResourceProfile id: 0 > 23/09/18 20:05:08 INFO SecurityManager: Changing view acls to: jd > 23/09/18 20:05:08 INFO SecurityManager: Changing modify acls to: jd > 23/09/18 20:05:08 INFO SecurityManager: Changing view acls groups to: > 23/09/18 20:05:08 INFO SecurityManager: Changing modify acls groups to: > 23/09/18 20:05:08 INFO SecurityManager: SecurityManager: authentication > disabled; ui acls disabled; users with view permissions: jd; groups with view > permissions: EMPTY; users with modify permissions: jd; groups with modify > permissions: EMPTY > 23/09/18 20:05:08 INFO Utils: Successfully started service 'sparkDriver' on > port 39155. 
> 23/09/18 20:05:08 INFO SparkEnv: Registering MapOutputTracker
> 23/09/18 20:05:08 INFO SparkEnv: Registering BlockManagerMaster
> 23/09/18 20:05:08 INFO BlockManagerMasterEndpoint: Using
> org.apache.spark.storage.DefaultTopologyMapper for getting topology
> information
> 23/09/18 20:05:08 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint
> up
> 23/09/18 20:05:08 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
> 23/09/18 20:05:08 INFO DiskBlockManager: Created local directory at
> /tmp/blockmgr-226d599c-1511-4fae-b0e7-aae81b684012
> 23/09/18 20:05:08 INFO MemoryStore: MemoryStore started with capacity 2004.6
> MiB
> 23/09/18 20:05:08 INFO SparkEnv: Registering OutputCommitCoordinator
> 23/09/18 20:05:08 INFO JettyUtils: Start Jetty 0.0.0.0:4040 for SparkUI
> 23/09/18 20:05:09 INFO Utils: Successfully started service 'SparkUI' on port
> 4040.
> 23/09/18 20:05:09 INFO Executor: Starting executor ID driver on host jd
> 23/09/18 20:05:09 INFO Executor: Starting executor with user classpath
> (userClassPathFirst = false): ''
> 23/09/18 20:05:09 INFO Utils: Successfully started service
> 'org.apache.spark.network.netty.NettyBlockTransferService' on port 32819.
> 23/09/18 20:05:09 INFO NettyBlockTransferService: Server created on jd:32819
> 23/09/18 20:05:09 INFO BlockManager: Using
> org.apache.spark.storage.RandomBlockReplicationPolicy for block replication
> policy
> 23/09/18 20:05:09 INFO BlockManagerMaster: Registering BlockManager
> BlockManagerId(driver, jd, 32819, None)
> 23/09/18 20:05:09 INFO BlockManagerMasterEndpoint: Registering block manager
> jd:32819 with 2004.6 MiB RAM, BlockManagerId(driver, jd, 32819, None)
> 23/09/18 20:05:09 INFO BlockManagerMaster: Registered BlockManager
> BlockManagerId(driver, jd, 32819, None)
> 23/09/18 20:05:09 INFO BlockManager: Initialized BlockManager:
> BlockManagerId(driver, jd, 32819, None)
> {code}