[jira] [Updated] (SPARK-45189) Creating UnresolvedRelation from TableIdentifier should include the catalog field

2023-09-19 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-45189:
---
Affects Version/s: 3.5.1

> Creating UnresolvedRelation from TableIdentifier should include the catalog 
> field
> -
>
> Key: SPARK-45189
> URL: https://issues.apache.org/jira/browse/SPARK-45189
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-45189) Creating UnresolvedRelation from TableIdentifier should include the catalog field

2023-09-19 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-45189:
---
Fix Version/s: 3.5.1

> Creating UnresolvedRelation from TableIdentifier should include the catalog 
> field
> -
>
> Key: SPARK-45189
> URL: https://issues.apache.org/jira/browse/SPARK-45189
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.1
>
>







[jira] [Created] (SPARK-45229) Show the number of drivers waiting in SUBMITTED status

2023-09-19 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-45229:
-

 Summary: Show the number of drivers waiting in SUBMITTED status
 Key: SPARK-45229
 URL: https://issues.apache.org/jira/browse/SPARK-45229
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-44112) Drop Java 8 and 11 support

2023-09-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44112:
--
Summary: Drop Java 8 and 11 support  (was: Drop Java 8 Support)

> Drop Java 8 and 11 support
> --
>
> Key: SPARK-44112
> URL: https://issues.apache.org/jira/browse/SPARK-44112
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
> Attachments: image-2023-09-20-10-52-59-327.png, 
> image-2023-09-20-10-53-34-956.png
>
>







[jira] [Commented] (SPARK-45225) XML: XSD file URL support

2023-09-19 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766983#comment-17766983
 ] 

Snoot.io commented on SPARK-45225:
--

User 'sandip-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/43000

> XML: XSD file URL support
> -
>
> Key: SPARK-45225
> URL: https://issues.apache.org/jira/browse/SPARK-45225
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Sandip Agarwala
>Assignee: Sandip Agarwala
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-45225) XML: XSD file URL support

2023-09-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45225.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43000
[https://github.com/apache/spark/pull/43000]

> XML: XSD file URL support
> -
>
> Key: SPARK-45225
> URL: https://issues.apache.org/jira/browse/SPARK-45225
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Sandip Agarwala
>Assignee: Sandip Agarwala
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-45225) XML: XSD file URL support

2023-09-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45225:


Assignee: Sandip Agarwala

> XML: XSD file URL support
> -
>
> Key: SPARK-45225
> URL: https://issues.apache.org/jira/browse/SPARK-45225
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Sandip Agarwala
>Assignee: Sandip Agarwala
>Priority: Major
>







[jira] [Resolved] (SPARK-44622) Implement FetchErrorDetails RPC

2023-09-19 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-44622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-44622.
---
Fix Version/s: 4.0.0
 Assignee: Yihong He
   Resolution: Fixed

> Implement FetchErrorDetails RPC
> ---
>
> Key: SPARK-44622
> URL: https://issues.apache.org/jira/browse/SPARK-44622
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yihong He
>Assignee: Yihong He
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Commented] (SPARK-45218) Refine docstring of `Column.isin`

2023-09-19 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766980#comment-17766980
 ] 

Snoot.io commented on SPARK-45218:
--

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/43001

> Refine docstring of `Column.isin`
> -
>
> Key: SPARK-45218
> URL: https://issues.apache.org/jira/browse/SPARK-45218
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>
> Refine the docstring of `Column.isin`






[jira] [Commented] (SPARK-44112) Drop Java 8 Support

2023-09-19 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766979#comment-17766979
 ] 

Dongjoon Hyun commented on SPARK-44112:
---

Never mind because you already made a PR, [~LuciferYang]. 

> Drop Java 8 Support
> ---
>
> Key: SPARK-44112
> URL: https://issues.apache.org/jira/browse/SPARK-44112
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
> Attachments: image-2023-09-20-10-52-59-327.png, 
> image-2023-09-20-10-53-34-956.png
>
>







[jira] [Commented] (SPARK-45226) Refine docstring of `rand/randn`

2023-09-19 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766977#comment-17766977
 ] 

Snoot.io commented on SPARK-45226:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/43003

> Refine docstring of `rand/randn`
> 
>
> Key: SPARK-45226
> URL: https://issues.apache.org/jira/browse/SPARK-45226
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Commented] (SPARK-45226) Refine docstring of `rand/randn`

2023-09-19 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766975#comment-17766975
 ] 

Snoot.io commented on SPARK-45226:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/43003

> Refine docstring of `rand/randn`
> 
>
> Key: SPARK-45226
> URL: https://issues.apache.org/jira/browse/SPARK-45226
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Resolved] (SPARK-44463) Improve error handling in Connect foreachBatch worker.

2023-09-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44463.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42986
[https://github.com/apache/spark/pull/42986]

> Improve error handling in Connect foreachBatch worker.
> --
>
> Key: SPARK-44463
> URL: https://issues.apache.org/jira/browse/SPARK-44463
> Project: Spark
>  Issue Type: Task
>  Components: Connect, Structured Streaming
>Affects Versions: 3.4.1
>Reporter: Raghu Angadi
>Priority: Major
> Fix For: 4.0.0
>
>
> An error in user code inside the foreachBatch worker is not propagated correctly 
> to the user. We should propagate it properly.






[jira] [Commented] (SPARK-43498) Enable StatsTests.test_axis_on_dataframe for pandas 2.0.0.

2023-09-19 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766972#comment-17766972
 ] 

Snoot.io commented on SPARK-43498:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/43002

> Enable StatsTests.test_axis_on_dataframe for pandas 2.0.0.
> --
>
> Key: SPARK-43498
> URL: https://issues.apache.org/jira/browse/SPARK-43498
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable StatsTests.test_axis_on_dataframe for pandas 2.0.0.






[jira] [Commented] (SPARK-43498) Enable StatsTests.test_axis_on_dataframe for pandas 2.0.0.

2023-09-19 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766971#comment-17766971
 ] 

Snoot.io commented on SPARK-43498:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/43002

> Enable StatsTests.test_axis_on_dataframe for pandas 2.0.0.
> --
>
> Key: SPARK-43498
> URL: https://issues.apache.org/jira/browse/SPARK-43498
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable StatsTests.test_axis_on_dataframe for pandas 2.0.0.






[jira] [Updated] (SPARK-45228) Update `test_axis_on_dataframe` when Pandas regression is fixed

2023-09-19 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-45228:

Summary: Update `test_axis_on_dataframe` when Pandas regression is fixed  
(was: Restore `test_axis_on_dataframe` in normal state when Pandas regression 
is fixed)

> Update `test_axis_on_dataframe` when Pandas regression is fixed
> ---
>
> Key: SPARK-45228
> URL: https://issues.apache.org/jira/browse/SPARK-45228
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark, Tests
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We manually cast the datatype when testing `test_axis_on_dataframe` in 
> [https://github.com/apache/spark/pull/43002], but that is not the proper way to 
> test.
> After the Pandas regression is resolved, we should restore the test to its 
> normal form.
> See Pandas regression: https://github.com/pandas-dev/pandas/issues/55194






[jira] [Created] (SPARK-45228) Restore `test_axis_on_dataframe` in normal state when Pandas regression is fixed

2023-09-19 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-45228:
---

 Summary: Restore `test_axis_on_dataframe` in normal state when 
Pandas regression is fixed
 Key: SPARK-45228
 URL: https://issues.apache.org/jira/browse/SPARK-45228
 Project: Spark
  Issue Type: Bug
  Components: Pandas API on Spark, Tests
Affects Versions: 4.0.0
Reporter: Haejoon Lee


We manually cast the datatype when testing `test_axis_on_dataframe` in 
[https://github.com/apache/spark/pull/43002], but that is not the proper way to 
test.

After the Pandas regression is resolved, we should restore the test to its normal 
form.

See Pandas regression: https://github.com/pandas-dev/pandas/issues/55194






[jira] [Commented] (SPARK-44112) Drop Java 8 Support

2023-09-19 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766970#comment-17766970
 ] 

Snoot.io commented on SPARK-44112:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/43005

> Drop Java 8 Support
> ---
>
> Key: SPARK-44112
> URL: https://issues.apache.org/jira/browse/SPARK-44112
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
> Attachments: image-2023-09-20-10-52-59-327.png, 
> image-2023-09-20-10-53-34-956.png
>
>







[jira] [Commented] (SPARK-45134) Data duplication may occur when fallback to origin shuffle block

2023-09-19 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766969#comment-17766969
 ] 

Snoot.io commented on SPARK-45134:
--

User 'gaoyajun02' has created a pull request for this issue:
https://github.com/apache/spark/pull/43004

> Data duplication may occur when fallback to origin shuffle block
> 
>
> Key: SPARK-45134
> URL: https://issues.apache.org/jira/browse/SPARK-45134
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.5.0
>Reporter: gaoyajun02
>Priority: Critical
>
> One situation we have found: while requesting mergedBlockMeta, if the channel is 
> closed, the callback may be triggered twice, resulting in duplicate data for the 
> original shuffle blocks (see the sketch after this list).
>  # The first time is when the channel becomes inactive: the responseHandler 
> executes the callback for all outstandingRpcs.
>  # The second time is when the listener attached to 
> shuffleClient.writeAndFlush executes the callback after the channel is closed.
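
A minimal defensive sketch, not the actual Spark fix: one way to keep a duplicate 
invocation (channelInactive plus the writeAndFlush listener) from triggering the 
original-shuffle-block fallback twice is to make the callback idempotent. The 
MergedBlockMetaCallback trait below is a simplified, hypothetical stand-in for the 
real listener interface.

{code:scala}
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical, simplified stand-in for the listener invoked when a
// push-merged block metadata request completes or fails.
trait MergedBlockMetaCallback {
  def onSuccess(meta: AnyRef): Unit
  def onFailure(e: Throwable): Unit
}

// Lets only the first completion (success or failure) through, so a second
// callback fired after the channel closes cannot trigger the fallback to the
// original shuffle blocks again.
class OnceOnlyCallback(delegate: MergedBlockMetaCallback) extends MergedBlockMetaCallback {
  private val completed = new AtomicBoolean(false)

  override def onSuccess(meta: AnyRef): Unit =
    if (completed.compareAndSet(false, true)) delegate.onSuccess(meta)

  override def onFailure(e: Throwable): Unit =
    if (completed.compareAndSet(false, true)) delegate.onFailure(e)
}
{code}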
> Some Error Logs:
> {code:java}
> 23/09/08 09:22:21 ERROR shuffle-client-7-1 TransportResponseHandler: Still 
> have 1 requests outstanding when connection from host/ip:prot is closed
> 23/09/08 09:22:21 ERROR shuffle-client-7-1 PushBasedFetchHelper: Failed to 
> get the meta of push-merged block for (3, 54) from host:port
> java.io.IOException: Connection from host:port closed
>         at 
> org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:147)
>         at 
> org.apache.spark.network.server.TransportChannelHandler.channelInactive(TransportChannelHandler.java:117)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
>         at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
>         at 
> io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
>         at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
>         at 
> org.apache.spark.network.util.TransportFrameDecoder.channelInactive(TransportFrameDecoder.java:225)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
>         at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
>         at 
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
>         at 
> io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818)
>         at 
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
>         at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497)
>         at 
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>         at 
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>         at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>         at java.lang.Thread.run(Thread.java:745)
>  
> 23/09/08 09:22:21 ERROR shuffle-client-7-1 PushBasedFetchHelper: Failed to 
> get the meta of push-merged block for (3, 54) from host:port
> java.io.IOException: Failed to send RPC RPC 8079698359363123411 to 
> host/ip:port: java.nio.channels.ClosedChannelException
>         at 

[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck, by making CoarseGrainedExecutorBackend.taskResources thread-safe

2023-09-19 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-45227:
-
Description: 
h2. Symptom

Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very 
last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking at 
Spark UI, we saw that an executor process hung for over an hour. After we manually 
killed the executor process, the app succeeded. Note that the same EMR cluster 
with two worker nodes was able to run the same app without any issue before and 
after the incident.
h2. Observations

Below is what's observed from relevant container logs and thread dump.
 * A regular task that's sent to the executor, which also reported back to the 
driver upon the task completion.

{quote}$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 
923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 4432 
bytes) taskResourceAssignments Map()
23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 
923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)

$zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923

$zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 
4495 bytes result sent to driver}}
{quote} * Another task that's sent to the executor but didn't get launched 
since the single-threaded dispatcher was stuck (presumably in an "infinite 
loop" as explained later).

{quote}$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 
924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 4432 
bytes) taskResourceAssignments Map()

$zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924

$zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
>> note that the above command has no matching result, indicating that task 
>> 153.0 in stage 23.0 (TID 924) was never launched}}
{quote}* Thread dump shows that the dispatcher-Executor thread has the 
following stack trace.

{quote}"dispatcher-Executor" #40 daemon prio=5 os_prio=0 tid=0x98e37800 
nid=0x1aff runnable [0x73bba000]
java.lang.Thread.State: RUNNABLE
at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
at scala.collection.mutable.HashMap.put(HashMap.scala:126)
at scala.collection.mutable.HashMap.update(HashMap.scala:131)
at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
at org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown 
Source)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at 
org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)}}
{quote}
h2. Relevant code paths

Within an executor process, there's a [dispatcher 
thread|https://github.com/apache/spark/blob/1fdd46f173f7bc90e0523eb0a2d5e8e27e990102/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L170]
 dedicated to CoarseGrainedExecutorBackend (a single RPC endpoint) that 
launches tasks scheduled by the driver. Each task is run on a TaskRunner thread 
backed by a thread pool created for the executor. The TaskRunner thread and the 
dispatcher thread are different. However, they 

[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck, by making CoarseGrainedExecutorBackend.taskResources thread-safe

2023-09-19 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-45227:
-
   Fix Version/s: (was: 4.0.0)
  (was: 3.5.1)
Target Version/s:   (was: 3.3.1)

> Fix an issue where an executor process randomly gets stuck, by making 
> CoarseGrainedExecutorBackend.taskResources thread-safe
> 
>
> Key: SPARK-45227
> URL: https://issues.apache.org/jira/browse/SPARK-45227
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.1, 3.5.0, 4.0.0
>Reporter: Bo Xiong
>Priority: Critical
>  Labels: hang, infinite-loop, race-condition, stuck, threadsafe
> Attachments: hashtable1.png, hashtable2.png
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> h2. Symptom
> Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very 
> last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking 
> at Spark UI, we saw that an executor process hung for over an hour. After we 
> manually killed the executor process, the app succeeded. Note that the same 
> EMR cluster with two worker nodes was able to run the same app without any 
> issue before and after the incident.
> h2. Observations
> Below is what's observed from relevant container logs and thread dump.
>  * A regular task that's sent to the executor, which also reported back to 
> the driver upon the task completion.
> {quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 
> 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 
> 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)
> $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923
> $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
> 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 
> 4495 bytes result sent to driver}}
> {quote}
> * Another task that's sent to the executor but didn't get launched since the 
> single-threaded dispatcher was stuck (presumably in an "infinite loop" as 
> explained later).
> {quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 
> 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924
> $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
> >> note that the above command has no matching result, indicating that task 
> >> 153.0 in stage 23.0 (TID 924) was never launched}}
> {quote} * Thread dump shows that the dispatcher-Executor thread has the 
> following stack trace.
> {quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 
> tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
> java.lang.Thread.State: RUNNABLE
> at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
> at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
> at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
> at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
> at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
> at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
> at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
> at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
> at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
> at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
> at scala.collection.mutable.HashMap.put(HashMap.scala:126)
> at scala.collection.mutable.HashMap.update(HashMap.scala:131)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
> at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
> at 
> org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown 
> Source)
> at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
> at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
> at 
> 

[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck, by making CoarseGrainedExecutorBackend.taskResources thread-safe

2023-09-19 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-45227:
-
Summary: Fix an issue where an executor process randomly gets stuck, by 
making CoarseGrainedExecutorBackend.taskResources thread-safe  (was: Fix an 
issue where an executor process randomly gets stuck by making 
CoarseGrainedExecutorBackend.taskResources thread-safe)

> Fix an issue where an executor process randomly gets stuck, by making 
> CoarseGrainedExecutorBackend.taskResources thread-safe
> 
>
> Key: SPARK-45227
> URL: https://issues.apache.org/jira/browse/SPARK-45227
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.1, 3.5.0, 4.0.0
>Reporter: Bo Xiong
>Priority: Critical
>  Labels: hang, infinite-loop, race-condition, stuck, threadsafe
> Fix For: 4.0.0, 3.5.1
>
> Attachments: hashtable1.png, hashtable2.png
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> h2. Symptom
> Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very 
> last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking 
> at Spark UI, we saw that an executor process hung for over an hour. After we 
> manually killed the executor process, the app succeeded. Note that the same 
> EMR cluster with two worker nodes was able to run the same app without any 
> issue before and after the incident.
> h2. Observations
> Below is what's observed from relevant container logs and thread dump.
>  * A regular task that's sent to the executor, which also reported back to 
> the driver upon the task completion.
> {quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 
> 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 
> 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)
> $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923
> $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
> 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 
> 4495 bytes result sent to driver}}
> {quote}
> * Another task that's sent to the executor but didn't get launched since the 
> single-threaded dispatcher was stuck (presumably in an "infinite loop" as 
> explained later).
> {quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 
> 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924
> $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
> >> note that the above command has no matching result, indicating that task 
> >> 153.0 in stage 23.0 (TID 924) was never launched}}
> {quote} * Thread dump shows that the dispatcher-Executor thread has the 
> following stack trace.
> {quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 
> tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
> java.lang.Thread.State: RUNNABLE
> at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
> at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
> at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
> at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
> at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
> at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
> at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
> at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
> at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
> at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
> at scala.collection.mutable.HashMap.put(HashMap.scala:126)
> at scala.collection.mutable.HashMap.update(HashMap.scala:131)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
> at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
> at 
> org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown 
> Source)
> at 

[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe

2023-09-19 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-45227:
-
Description: 
h2. Symptom

Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very 
last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking at 
Spark UI, we saw that an executor process hung for over an hour. After we manually 
killed the executor process, the app succeeded. Note that the same EMR cluster 
with two worker nodes was able to run the same app without any issue before and 
after the incident.
h2. Observations

Below is what's observed from relevant container logs and thread dump.
 * A regular task that's sent to the executor, which also reported back to the 
driver upon the task completion.

{quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 
923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 4432 
bytes) taskResourceAssignments Map()
23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 
923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)

$zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923

$zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 
4495 bytes result sent to driver}}
{quote}
* Another task that's sent to the executor but didn't get launched since the 
single-threaded dispatcher was stuck (presumably in an "infinite loop" as 
explained later).
{quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 
924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 4432 
bytes) taskResourceAssignments Map()

$zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924

$zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
>> note that the above command has no matching result, indicating that task 
>> 153.0 in stage 23.0 (TID 924) was never launched}}
{quote} * Thread dump shows that the dispatcher-Executor thread has the 
following stack trace.

{quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 
tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
java.lang.Thread.State: RUNNABLE
at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
at scala.collection.mutable.HashMap.put(HashMap.scala:126)
at scala.collection.mutable.HashMap.update(HashMap.scala:131)
at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
at org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown 
Source)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at 
org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)}}
{quote}
h2. Relevant code paths

Within an executor process, there's a [dispatcher 
thread|https://github.com/apache/spark/blob/1fdd46f173f7bc90e0523eb0a2d5e8e27e990102/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L170]
 dedicated to CoarseGrainedExecutorBackend (a single RPC endpoint) that 
launches tasks scheduled by the driver. Each task is run on a TaskRunner thread 
backed by a thread pool created for the executor. The TaskRunner thread and the 
dispatcher thread are different. 
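
A minimal sketch of the idea in the issue title, under the simplifying assumption 
that taskResources is an unsynchronized mutable HashMap keyed by task ID that the 
dispatcher thread writes while TaskRunner threads clean up entries. Unsynchronized 
concurrent updates can corrupt the hash table's internal chains, which is consistent 
with the dispatcher thread spinning inside HashTable.findEntry0 in the stack trace 
above. One way to make such a map thread-safe is to back it with a ConcurrentHashMap; 
the TaskResourceTracker class below is illustrative only, not the actual Spark code.

{code:scala}
import java.util.concurrent.ConcurrentHashMap

// Illustrative replacement for an unsynchronized mutable.HashMap shared between
// the single dispatcher thread and the TaskRunner thread pool. Resource values
// are simplified to Map[String, Seq[String]] for this sketch.
final class TaskResourceTracker {
  private val taskResources = new ConcurrentHashMap[Long, Map[String, Seq[String]]]()

  // Called on the dispatcher thread when a task-launch message is handled.
  def register(taskId: Long, resources: Map[String, Seq[String]]): Unit =
    taskResources.put(taskId, resources)

  // Called on a TaskRunner thread when the task finishes; the removal cannot
  // race against a concurrent put from the dispatcher on the map's internals.
  def release(taskId: Long): Option[Map[String, Seq[String]]] =
    Option(taskResources.remove(taskId))
}
{code}

An equivalent alternative is to keep the existing map and synchronize every read and 
write on a common lock; either way, put and remove from different threads can no 
longer interleave inside the hash table.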

[jira] [Comment Edited] (SPARK-44112) Drop Java 8 Support

2023-09-19 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766966#comment-17766966
 ] 

Dongjoon Hyun edited comment on SPARK-44112 at 9/20/23 3:03 AM:


Oh. Thanks. It seems that I was outdated and missed the discussion.


was (Author: dongjoon):
Oh. Thanks. It seems that I was outdated and missed the discussion. In the dev 
mailing, did Sean agree to drop Java 11 ?

> Drop Java 8 Support
> ---
>
> Key: SPARK-44112
> URL: https://issues.apache.org/jira/browse/SPARK-44112
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
> Attachments: image-2023-09-20-10-52-59-327.png, 
> image-2023-09-20-10-53-34-956.png
>
>







[jira] [Commented] (SPARK-44112) Drop Java 8 Support

2023-09-19 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766967#comment-17766967
 ] 

Dongjoon Hyun commented on SPARK-44112:
---

My bad! You're right in that part, but please keep Java 11 in a separate JIRA 
if you don't mind.

> Drop Java 8 Support
> ---
>
> Key: SPARK-44112
> URL: https://issues.apache.org/jira/browse/SPARK-44112
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
> Attachments: image-2023-09-20-10-52-59-327.png, 
> image-2023-09-20-10-53-34-956.png
>
>







[jira] [Commented] (SPARK-44112) Drop Java 8 Support

2023-09-19 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766966#comment-17766966
 ] 

Dongjoon Hyun commented on SPARK-44112:
---

Oh. Thanks. It seems that I was outdated and missed the discussion. In the dev 
mailing, did Sean agree to drop Java 11 ?

> Drop Java 8 Support
> ---
>
> Key: SPARK-44112
> URL: https://issues.apache.org/jira/browse/SPARK-44112
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
> Attachments: image-2023-09-20-10-52-59-327.png, 
> image-2023-09-20-10-53-34-956.png
>
>







[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe

2023-09-19 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-45227:
-
Description: 
h3. Symptom

Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very 
last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking at 
Spark UI, we saw that an executor process hung for over an hour. After we manually 
killed the executor process, the app succeeded.

Note that the same EMR cluster with two worker nodes was able to run the same 
app without any issue before and after the incident.
h3. Observations

Below is what's observed from relevant container logs and thread dump.
 * A regular task that's sent to the executor, which also reported back to the 
driver upon the task completion.

 
{quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 
923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 4432 
bytes) taskResourceAssignments Map()
23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 
923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)

$zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923

$zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 
4495 bytes result sent to driver}}
{quote} * Another task that's sent to the executor but didn't get launched 
since the single-threaded dispatcher was stuck (presumably in an "infinite 
loop" as explained later).

 

 
{quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 
924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 4432 
bytes) taskResourceAssignments Map()

$zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924

$zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
>> note that the above command has no matching result, indicating that task 
>> 153.0 in stage 23.0 (TID 924) was never launched}}
{quote}
* Thread dump shows that the dispatcher-Executor thread has the following stack 
trace.

 

 
{quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 
tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
java.lang.Thread.State: RUNNABLE
at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
at scala.collection.mutable.HashMap.put(HashMap.scala:126)
at scala.collection.mutable.HashMap.update(HashMap.scala:131)
at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
at org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown 
Source)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at 
org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)}}
{quote} * Relevant code paths

 
{quote}Within an executor process, there's a [dispatcher 
thread|https://github.com/apache/spark/blob/1fdd46f173f7bc90e0523eb0a2d5e8e27e990102/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L170]
 dedicated to CoarseGrainedExecutorBackend (a single RPC endpoint) that 
launches tasks scheduled by the driver. Each task is run on a TaskRunner thread 
backed by a thread pool created for the executor. The TaskRunner thread and the 
dispatcher thread 

[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe

2023-09-19 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-45227:
-
Attachment: hashtable1.png

> Fix an issue where an executor process randomly gets stuck by making 
> CoarseGrainedExecutorBackend.taskResources thread-safe
> ---
>
> Key: SPARK-45227
> URL: https://issues.apache.org/jira/browse/SPARK-45227
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.1, 3.5.0, 4.0.0
>Reporter: Bo Xiong
>Priority: Critical
>  Labels: hang, infinite-loop, race-condition, stuck, threadsafe
> Fix For: 4.0.0, 3.5.1
>
> Attachments: hashtable1.png, hashtable2.png
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> h3. Symptom
> Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very 
> last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking 
> at Spark UI, we saw that an executor process hung for over an hour. After we 
> manually killed the executor process, the app succeeded.
> Note that the same EMR cluster with two worker nodes was able to run the same 
> app without any issue before and after the incident.
> h3. Observations
> Below is what's observed from relevant container logs and thread dump.
>  * A regular task that's sent to the executor, which also reported back to 
> the driver upon the task completion.
>  
> {quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 
> 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 
> 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)
> $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923
> $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
> 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 
> 4495 bytes result sent to driver}}
> {quote}
> * Another task that's sent to the executor but didn't get launched since the 
> single-threaded dispatcher was stuck (presumably in an "infinite loop" as 
> explained later).
>  
>  
> {quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 
> 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924
> $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
> >> note that the above command has no matching result, indicating that task 
> >> 153.0 in stage 23.0 (TID 924) was never launched}}
> {quote} * Thread dump shows that the dispatcher-Executor thread has the 
> following stack trace.
>  
> {quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 
> tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
> java.lang.Thread.State: RUNNABLE
> at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
> at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
> at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
> at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
> at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
> at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
> at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
> at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
> at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
> at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
> at scala.collection.mutable.HashMap.put(HashMap.scala:126)
> at scala.collection.mutable.HashMap.update(HashMap.scala:131)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
> at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
> at 
> org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown 
> Source)
> at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
> at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
> at 
> org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
> 

[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe

2023-09-19 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-45227:
-
Attachment: hashtable2.png

> Fix an issue where an executor process randomly gets stuck by making 
> CoarseGrainedExecutorBackend.taskResources thread-safe
> ---
>
> Key: SPARK-45227
> URL: https://issues.apache.org/jira/browse/SPARK-45227
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.1, 3.5.0, 4.0.0
>Reporter: Bo Xiong
>Priority: Critical
>  Labels: hang, infinite-loop, race-condition, stuck, threadsafe
> Fix For: 4.0.0, 3.5.1
>
> Attachments: hashtable1.png, hashtable2.png
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> h3. Symptom
> Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very 
> last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking 
> at Spark UI, we saw that an executor process hung for over an hour. After we 
> manually killed the executor process, the app succeeded.
> Note that the same EMR cluster with two worker nodes was able to run the same 
> app without any issue before and after the incident.
> h3. Observations
> Below is what's observed from relevant container logs and thread dump.
>  * A regular task that's sent to the executor, which also reported back to 
> the driver upon the task completion.
>  
> {quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 
> 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 
> 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)
> $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923
> $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
> 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 
> 4495 bytes result sent to driver}}
> {quote}
> * Another task that's sent to the executor but didn't get launched since the 
> single-threaded dispatcher was stuck (presumably in an "infinite loop" as 
> explained later).
>  
>  
> {quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 
> 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924
> $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
> >> note that the above command has no matching result, indicating that task 
> >> 153.0 in stage 23.0 (TID 924) was never launched}}
> {quote} * Thread dump shows that the dispatcher-Executor thread has the 
> following stack trace.
>  
> {quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 
> tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
> java.lang.Thread.State: RUNNABLE
> at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
> at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
> at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
> at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
> at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
> at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
> at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
> at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
> at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
> at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
> at scala.collection.mutable.HashMap.put(HashMap.scala:126)
> at scala.collection.mutable.HashMap.update(HashMap.scala:131)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
> at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
> at 
> org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown 
> Source)
> at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
> at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
> at 
> org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
> 

[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe

2023-09-19 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-45227:
-
Attachment: (was: Screenshot 2023-09-19 at 7.55.37 PM.png)

> Fix an issue where an executor process randomly gets stuck by making 
> CoarseGrainedExecutorBackend.taskResources thread-safe
> ---
>
> Key: SPARK-45227
> URL: https://issues.apache.org/jira/browse/SPARK-45227
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.1, 3.5.0, 4.0.0
>Reporter: Bo Xiong
>Priority: Critical
>  Labels: hang, infinite-loop, race-condition, stuck, threadsafe
> Fix For: 4.0.0, 3.5.1
>
> Attachments: hashtable1.png, hashtable2.png
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> h3. Symptom
> Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very 
> last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking 
> at Spark UI, we saw that an executor process hung over 1 hour. After we 
> manually killed the executor process, the app succeeded.
> Note that the same EMR cluster with two worker nodes was able to run the same 
> app without any issue before and after the incident.
> h3. Observations
> Below is what's observed from relevant container logs and thread dump.
>  * A regular task that's sent to the executor, which also reported back to 
> the driver upon the task completion.
>  
> {quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 
> 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 
> 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)
> $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923
> $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
> 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 
> 4495 bytes result sent to driver}}
> {quote}
> * Another task that's sent to the executor but didn't get launched since the 
> single-threaded dispatcher was stuck (presumably in an "infinite loop" as 
> explained later).
>  
>  
> {quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 
> 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924
> $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
> >> note that the above command has no matching result, indicating that task 
> >> 153.0 in stage 23.0 (TID 924) was never launched}}
> {quote} * Thread dump shows that the dispatcher-Executor thread has the 
> following stack trace.
>  
> {quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 
> tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
> java.lang.Thread.State: RUNNABLE
> at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
> at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
> at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
> at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
> at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
> at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
> at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
> at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
> at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
> at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
> at scala.collection.mutable.HashMap.put(HashMap.scala:126)
> at scala.collection.mutable.HashMap.update(HashMap.scala:131)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
> at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
> at 
> org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown 
> Source)
> at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
> at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
> at 
> 

[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe

2023-09-19 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-45227:
-
Attachment: (was: Screenshot 2023-09-19 at 7.55.31 PM.png)

> Fix an issue where an executor process randomly gets stuck by making 
> CoarseGrainedExecutorBackend.taskResources thread-safe
> ---
>
> Key: SPARK-45227
> URL: https://issues.apache.org/jira/browse/SPARK-45227
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.1, 3.5.0, 4.0.0
>Reporter: Bo Xiong
>Priority: Critical
>  Labels: hang, infinite-loop, race-condition, stuck, threadsafe
> Fix For: 4.0.0, 3.5.1
>
> Attachments: hashtable1.png, hashtable2.png
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> h3. Symptom
> Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very 
> last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking 
> at Spark UI, we saw that an executor process hung over 1 hour. After we 
> manually killed the executor process, the app succeeded.
> Note that the same EMR cluster with two worker nodes was able to run the same 
> app without any issue before and after the incident.
> h3. Observations
> Below is what's observed from relevant container logs and thread dump.
>  * A regular task that's sent to the executor, which also reported back to 
> the driver upon the task completion.
>  
> {quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 
> 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 
> 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)
> $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923
> $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
> 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 
> 4495 bytes result sent to driver}}
> {quote}
> * Another task that's sent to the executor but didn't get launched since the 
> single-threaded dispatcher was stuck (presumably in an "infinite loop" as 
> explained later).
>  
>  
> {quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 
> 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924
> $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
> >> note that the above command has no matching result, indicating that task 
> >> 153.0 in stage 23.0 (TID 924) was never launched}}
> {quote} * Thread dump shows that the dispatcher-Executor thread has the 
> following stack trace.
>  
> {quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 
> tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
> java.lang.Thread.State: RUNNABLE
> at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
> at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
> at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
> at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
> at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
> at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
> at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
> at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
> at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
> at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
> at scala.collection.mutable.HashMap.put(HashMap.scala:126)
> at scala.collection.mutable.HashMap.update(HashMap.scala:131)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
> at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
> at 
> org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown 
> Source)
> at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
> at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
> at 
> 

[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe

2023-09-19 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-45227:
-
Description: 
h3. Symptom

Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very
last step of writing a data frame to S3 by calling {{df.write}}. Looking at the
Spark UI, we saw that an executor process hung for over 1 hour. After we
manually killed the executor process, the app succeeded.

Note that the same EMR cluster with two worker nodes was able to run the same 
app without any issue before and after the incident.
h3. Observations

Below is what's observed from relevant container logs and thread dump.
 * A regular task that's sent to the executor, which also reported back to the 
driver upon the task completion.

 
{quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 
923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 4432 
bytes) taskResourceAssignments Map()
23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 
923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)

$zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923

$zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 
4495 bytes result sent to driver}}
{quote}
* Another task that's sent to the executor but didn't get launched since the 
single-threaded dispatcher was stuck (presumably in an "infinite loop" as 
explained later).

{quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 
924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 4432 
bytes) taskResourceAssignments Map()

$zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924

$zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
>> note that the above command has no matching result, indicating that task 
>> 153.0 in stage 23.0 (TID 924) was never launched}}
{quote} * Thread dump shows that the dispatcher-Executor thread has the 
following stack trace.

 
{quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 
tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
java.lang.Thread.State: RUNNABLE
at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
at scala.collection.mutable.HashMap.put(HashMap.scala:126)
at scala.collection.mutable.HashMap.update(HashMap.scala:131)
at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
at org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown 
Source)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at 
org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)}}
{quote}
* Relevant code paths

 
{quote}Within an executor process, there's a [dispatcher 
thread|https://github.com/apache/spark/blob/1fdd46f173f7bc90e0523eb0a2d5e8e27e990102/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L170]
 dedicated to CoarseGrainedExecutorBackend (a single RPC endpoint) that 
launches tasks scheduled by the driver. Each task is run on a TaskRunner thread 
backed by a thread pool created for the executor. The TaskRunner thread and the 
dispatcher thread are 
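
To make the failure mode above concrete, here is a minimal, self-contained
Scala sketch (hypothetical code, not taken from Spark; all names are made up)
of the same access pattern: two threads mutating one unsynchronized
scala.collection.mutable.HashMap, just as the dispatcher thread and a
TaskRunner thread both touch taskResources. Under such a race the map's
internal hash table can become corrupted, and a later put() may spin forever
in HashTable.findOrAddEntry, which matches the RUNNABLE dispatcher thread in
the stack trace above.

{code:scala}
import scala.collection.mutable

// Hypothetical demo, not Spark code: concurrent, unsynchronized mutation of a
// scala.collection.mutable.HashMap from two threads.
object UnsafeTaskResourcesDemo {
  def main(args: Array[String]): Unit = {
    // Stand-in for a taskId -> resources map; the value type is simplified.
    val taskResources = mutable.HashMap[Long, String]()

    // Plays the role of the dispatcher thread registering launched tasks.
    val dispatcher = new Thread(() => {
      for (taskId <- 0L until 200000L) taskResources(taskId) = "resources"
    })
    // Plays the role of a TaskRunner thread cleaning up finished tasks.
    val taskRunner = new Thread(() => {
      for (taskId <- 0L until 200000L) taskResources.remove(taskId)
    })

    dispatcher.start(); taskRunner.start()
    dispatcher.join(); taskRunner.join()

    // With no synchronization this run may throw, lose entries, or hang inside
    // a put() and never reach this line, mirroring the hung dispatcher.
    println(s"final size = ${taskResources.size}")
  }
}
{code}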

[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe

2023-09-19 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-45227:
-
Attachment: Screenshot 2023-09-19 at 7.55.37 PM.png

> Fix an issue where an executor process randomly gets stuck by making 
> CoarseGrainedExecutorBackend.taskResources thread-safe
> ---
>
> Key: SPARK-45227
> URL: https://issues.apache.org/jira/browse/SPARK-45227
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.1, 3.5.0, 4.0.0
>Reporter: Bo Xiong
>Priority: Critical
>  Labels: hang, infinite-loop, race-condition, stuck, threadsafe
> Fix For: 4.0.0, 3.5.1
>
> Attachments: Screenshot 2023-09-19 at 7.55.31 PM.png, Screenshot 
> 2023-09-19 at 7.55.37 PM.png
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> h3. Symptom
> Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very 
> last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking 
> at Spark UI, we saw that an executor process hung over 1 hour. After we 
> manually killed the executor process, the app succeeded.
> Note that the same EMR cluster with two worker nodes was able to run the same 
> app without any issue before and after the incident.
> h3. Observations
> Below is what's observed from relevant container logs and thread dump.
>  * A regular task that's sent to the executor, which also reported back to 
> the driver upon the task completion.
>  
> {quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 
> 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 
> 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)
> $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923
> $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
> 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 
> 4495 bytes result sent to driver}}
> {quote}
> * Another task that's sent to the executor but didn't get launched since the 
> single-threaded dispatcher was stuck (presumably in an "infinite loop" as 
> explained later).
>  
>  
> {quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 
> 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924
> $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
> >> note that the above command has no matching result, indicating that task 
> >> 153.0 in stage 23.0 (TID 924) was never launched}}
> {quote} * Thread dump shows that the dispatcher-Executor thread has the 
> following stack trace.
>  
> {quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 
> tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
> java.lang.Thread.State: RUNNABLE
> at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
> at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
> at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
> at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
> at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
> at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
> at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
> at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
> at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
> at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
> at scala.collection.mutable.HashMap.put(HashMap.scala:126)
> at scala.collection.mutable.HashMap.update(HashMap.scala:131)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
> at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
> at 
> org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown 
> Source)
> at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
> at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
> at 
> 

[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe

2023-09-19 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-45227:
-
Attachment: Screenshot 2023-09-19 at 7.55.31 PM.png

> Fix an issue where an executor process randomly gets stuck by making 
> CoarseGrainedExecutorBackend.taskResources thread-safe
> ---
>
> Key: SPARK-45227
> URL: https://issues.apache.org/jira/browse/SPARK-45227
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.1, 3.5.0, 4.0.0
>Reporter: Bo Xiong
>Priority: Critical
>  Labels: hang, infinite-loop, race-condition, stuck, threadsafe
> Fix For: 4.0.0, 3.5.1
>
> Attachments: Screenshot 2023-09-19 at 7.55.31 PM.png, Screenshot 
> 2023-09-19 at 7.55.37 PM.png
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> h3. Symptom
> Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very 
> last step of writing a data frame to S3 by calling {{{}df.write{}}}. Looking 
> at Spark UI, we saw that an executor process hung over 1 hour. After we 
> manually killed the executor process, the app succeeded.
> Note that the same EMR cluster with two worker nodes was able to run the same 
> app without any issue before and after the incident.
> h3. Observations
> Below is what's observed from relevant container logs and thread dump.
>  * A regular task that's sent to the executor, which also reported back to 
> the driver upon the task completion.
>  
> {quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 
> 923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> 23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 
> 923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)
> $zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923
> $zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
> 23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 
> 4495 bytes result sent to driver}}
> {quote}
> * Another task that's sent to the executor but didn't get launched since the 
> single-threaded dispatcher was stuck (presumably in an "infinite loop" as 
> explained later).
>  
>  
> {quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
> 23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 
> 924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 
> 4432 bytes) taskResourceAssignments Map()
> $zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
> 23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924
> $zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
> >> note that the above command has no matching result, indicating that task 
> >> 153.0 in stage 23.0 (TID 924) was never launched}}
> {quote} * Thread dump shows that the dispatcher-Executor thread has the 
> following stack trace.
>  
> {quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 
> tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
> java.lang.Thread.State: RUNNABLE
> at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
> at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
> at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
> at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
> at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
> at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
> at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
> at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
> at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
> at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
> at scala.collection.mutable.HashMap.put(HashMap.scala:126)
> at scala.collection.mutable.HashMap.update(HashMap.scala:131)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
> at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
> at 
> org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown 
> Source)
> at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
> at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
> at 
> 

[jira] [Commented] (SPARK-44112) Drop Java 8 Support

2023-09-19 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766965#comment-17766965
 ] 

Yang Jie commented on SPARK-44112:
--

!image-2023-09-20-10-53-34-956.png|width=729,height=161!

 

I have modified the Jira title because the Apache Spark 3.5.0 release notes
state that the minimum supported Java version for the next major version will
be Java 17.

 

> Drop Java 8 Support
> ---
>
> Key: SPARK-44112
> URL: https://issues.apache.org/jira/browse/SPARK-44112
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
> Attachments: image-2023-09-20-10-52-59-327.png, 
> image-2023-09-20-10-53-34-956.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe

2023-09-19 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-45227:
-
Description: 
h3. Symptom

Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very
last step of writing a data frame to S3 by calling {{df.write}}. Looking at the
Spark UI, we saw that an executor process hung for over 1 hour. After we
manually killed the executor process, the app succeeded.

Note that the same EMR cluster with two worker nodes was able to run the same 
app without any issue before and after the incident.
h3. Observations

Below is what's observed from relevant container logs and thread dump.
 * A regular task that's sent to the executor, which also reported back to the 
driver upon the task completion.

 
{quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 
923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 4432 
bytes) taskResourceAssignments Map()
23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 
923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)

$zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923

$zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 
4495 bytes result sent to driver}}
{quote} * Another task that's sent to the executor but didn't get launched 
since the single-threaded dispatcher was stuck (presumably in an "infinite 
loop" as explained later).

 
{quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 
924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 4432 
bytes) taskResourceAssignments Map()

$zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924

$zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
>> note that the above command has no matching result, indicating that task 
>> 153.0 in stage 23.0 (TID 924) was never launched}}
{quote}
* Thread dump shows that the dispatcher-Executor thread has the following stack 
trace.

 
{quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 
tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
java.lang.Thread.State: RUNNABLE
at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
at scala.collection.mutable.HashMap.put(HashMap.scala:126)
at scala.collection.mutable.HashMap.update(HashMap.scala:131)
at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
at org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown 
Source)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at 
org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)}}
{quote} * Relevant code paths
{quote}Within an executor process, there's a [dispatcher 
thread|https://github.com/apache/spark/blob/1fdd46f173f7bc90e0523eb0a2d5e8e27e990102/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L170]
 dedicated to CoarseGrainedExecutorBackend (a single RPC endpoint) that 
launches tasks scheduled by the driver. Each task is run on a TaskRunner thread 
backed by a thread pool created for the executor. The TaskRunner thread and the 
dispatcher thread are 

[jira] [Updated] (SPARK-44112) Drop Java 8 Support

2023-09-19 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-44112:
-
Attachment: image-2023-09-20-10-53-34-956.png

> Drop Java 8 Support
> ---
>
> Key: SPARK-44112
> URL: https://issues.apache.org/jira/browse/SPARK-44112
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
> Attachments: image-2023-09-20-10-52-59-327.png, 
> image-2023-09-20-10-53-34-956.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44112) Drop Java 8 Support

2023-09-19 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-44112:
-
Attachment: image-2023-09-20-10-52-59-327.png

> Drop Java 8 Support
> ---
>
> Key: SPARK-44112
> URL: https://issues.apache.org/jira/browse/SPARK-44112
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
> Attachments: image-2023-09-20-10-52-59-327.png, 
> image-2023-09-20-10-53-34-956.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe

2023-09-19 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-45227:
-
Description: 
h3. Symptom

Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very
last step of writing a data frame to S3 by calling {{df.write}}. Looking at the
Spark UI, we saw that an executor process hung for over 1 hour. After we
manually killed the executor process, the app succeeded.

Note that the same EMR cluster with two worker nodes was able to run the same 
app without any issue before and after the incident.
h3. Observations

Below is what's observed from relevant container logs and thread dump.
 * A regular task that's sent to the executor, which also reported back to the 
driver upon the task completion.

 
{quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 
923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 4432 
bytes) taskResourceAssignments Map()
23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 
923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)

$zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923

$zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 
4495 bytes result sent to driver}}
{quote}
* Another task that's sent to the executor but didn't get launched since the 
single-threaded dispatcher was stuck (presumably in an "infinite loop" as 
explained later).

 
{quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 
924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 4432 
bytes) taskResourceAssignments Map()

$zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924

$zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
>> note that the above command has no matching result, indicating that task 
>> 153.0 in stage 23.0 (TID 924) was never launched}}
{quote} * Thread dump shows that the dispatcher-Executor thread has the 
following stack trace.
{quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 
tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
java.lang.Thread.State: RUNNABLE
at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
at scala.collection.mutable.HashMap.put(HashMap.scala:126)
at scala.collection.mutable.HashMap.update(HashMap.scala:131)
at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
at org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown 
Source)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at 
org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)}}
{quote}
 * Relevant code paths
{quote}Within an executor process, there's a [dispatcher 
thread|https://github.com/apache/spark/blob/1fdd46f173f7bc90e0523eb0a2d5e8e27e990102/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L170]
 dedicated to CoarseGrainedExecutorBackend (a single RPC endpoint) that 
launches tasks scheduled by the driver. Each task is run on a TaskRunner thread 
backed by a thread pool created for the executor. The TaskRunner thread and the 
dispatcher thread are 

[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe

2023-09-19 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-45227:
-
Description: 
h3. Symptom

Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very
last step of writing a data frame to S3 by calling {{df.write}}. Looking at the
Spark UI, we saw that an executor process hung for over 1 hour. After we
manually killed the executor process, the app succeeded.

Note that the same EMR cluster with two worker nodes was able to run the same 
app without any issue before and after the incident.
h3. Observations

Below is what's observed from relevant container logs and thread dump.
 * A regular task that's sent to the executor, which also reported back to the 
driver upon the task completion.

{quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 
923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 4432 
bytes) taskResourceAssignments Map()
23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 
923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)

$zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923

$zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 
4495 bytes result sent to driver}}
{quote} * Another task that's sent to the executor but didn't get launched 
since the single-threaded dispatcher was stuck (presumably in an "infinite 
loop" as explained later).
{quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 
924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 4432 
bytes) taskResourceAssignments Map()

$zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924

$zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
>> note that the above command has no matching result, indicating that task 
>> 153.0 in stage 23.0 (TID 924) was never launched}}
{quote}
 * Thread dump shows that the dispatcher-Executor thread has the following 
stack trace.
{quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 
tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
java.lang.Thread.State: RUNNABLE
at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
at scala.collection.mutable.HashMap.put(HashMap.scala:126)
at scala.collection.mutable.HashMap.update(HashMap.scala:131)
at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
at org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown 
Source)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at 
org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)}}
{quote}
 * Relevant code paths
{quote}Within an executor process, there's a [dispatcher 
thread|https://github.com/apache/spark/blob/1fdd46f173f7bc90e0523eb0a2d5e8e27e990102/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L170]
 dedicated to CoarseGrainedExecutorBackend (a single RPC endpoint) that 
launches tasks scheduled by the driver. Each task is run on a TaskRunner thread 
backed by a thread pool created for the executor. The TaskRunner thread and the 
dispatcher thread are different. 
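
The fix direction in the title is to make that shared map thread-safe. Purely
as a hedged illustration (the actual Spark patch may look different), one
straightforward shape for such a registry is to back it with
java.util.concurrent.ConcurrentHashMap, so the dispatcher thread and
TaskRunner threads can update it without external locking:

{code:scala}
import java.util.concurrent.ConcurrentHashMap

// Hypothetical sketch, not the actual Spark change: a taskId -> resources
// registry that can be called from the dispatcher thread and from TaskRunner
// threads at the same time. The value type is simplified to String here.
final class TaskResourceRegistry {
  private val taskResources = new ConcurrentHashMap[Long, String]()

  // Called when a LaunchTask message is processed.
  def register(taskId: Long, resources: String): Unit =
    taskResources.put(taskId, resources)

  // Called when a task finishes and its resources are released.
  def remove(taskId: Long): Option[String] =
    Option(taskResources.remove(taskId))

  def size: Int = taskResources.size()
}
{code}

An equally valid alternative is to keep the Scala map and guard every access
with a single lock; the essential property is only that no two threads ever
mutate the underlying hash table concurrently.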

[jira] [Updated] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe

2023-09-19 Thread Bo Xiong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Xiong updated SPARK-45227:
-
Fix Version/s: 4.0.0
  Description: 
h3. Symptom

Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very
last step of writing a data frame to S3 by calling {{df.write}}. Looking at the
Spark UI, we saw that an executor process hung for over 1 hour. After we
manually killed the executor process, the app succeeded.

Note that the same EMR cluster with two worker nodes was able to run the same 
app without any issue before and after the incident.
h3. Observations

Below is what's observed from relevant container logs and thread dump.
 * A regular task that's sent to the executor, which also reported back to the 
driver upon the task completion.

{quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 
923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 4432 
bytes) taskResourceAssignments Map()
23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 
923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)

$zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923

$zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 
4495 bytes result sent to driver}}
{quote}
* Another task that's sent to the executor but didn't get launched since the 
single-threaded dispatcher was stuck (presumably in an "infinite loop" as 
explained later).
{quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 
924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 4432 
bytes) taskResourceAssignments Map()

$zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924

$zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
>> note that the above command has no matching result, indicating that task 
>> 153.0 in stage 23.0 (TID 924) was never launched}}
{quote}
* Thread dump shows that the dispatcher-Executor thread has the following stack 
trace.
{quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 
tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
java.lang.Thread.State: RUNNABLE
at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
at scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
at scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
at scala.collection.mutable.HashMap.put(HashMap.scala:126)
at scala.collection.mutable.HashMap.update(HashMap.scala:131)
at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
at org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown 
Source)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at 
org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)}}
{quote}
* Relevant code paths
{quote}Within an executor process, there's a [dispatcher 
thread|https://github.com/apache/spark/blob/1fdd46f173f7bc90e0523eb0a2d5e8e27e990102/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L170]
 dedicated to CoarseGrainedExecutorBackend (a single RPC endpoint) that 
launches tasks scheduled by the driver. Each task is run on a TaskRunner thread 
backed by a thread pool created for the executor. The TaskRunner thread and the 
dispatcher 

[jira] [Created] (SPARK-45227) Fix an issue where an executor process randomly gets stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe

2023-09-19 Thread Bo Xiong (Jira)
Bo Xiong created SPARK-45227:


 Summary: Fix an issue where an executor process randomly gets 
stuck by making CoarseGrainedExecutorBackend.taskResources thread-safe
 Key: SPARK-45227
 URL: https://issues.apache.org/jira/browse/SPARK-45227
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.5.0, 3.3.1, 4.0.0
Reporter: Bo Xiong
 Fix For: 3.5.1


h3. Symptom

Our Spark 3 app running on EMR 6.10.0 with Spark 3.3.1 got stuck in the very
last step of writing a data frame to S3 by calling {{df.write}}. Looking at the
Spark UI, we saw that an executor process hung for over 1 hour. After we
manually killed the executor process, the app succeeded.

Note that the same EMR cluster with two worker nodes was able to run the same 
app without any issue before and after the incident.
h3. Observations

Below is what's observed from relevant container logs and thread dump.
 * A regular task that's sent to the executor, which also reported back to the 
driver upon the task completion.

{quote}{{$zgrep 'task 150' container_1694029806204_12865_01_01/stderr.gz
23/09/12 18:13:55 INFO TaskSetManager: Starting task 150.0 in stage 23.0 (TID 
923) (ip-10-0-185-107.ec2.internal, executor 3, partition 150, NODE_LOCAL, 4432 
bytes) taskResourceAssignments Map()
23/09/12 18:13:55 INFO TaskSetManager: Finished task 150.0 in stage 23.0 (TID 
923) in 126 ms on ip-10-0-185-107.ec2.internal (executor 3) (16/200)

$zgrep 'task 923' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 923

$zgrep 'task 150' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO Executor: Running task 150.0 in stage 23.0 (TID 923)
23/09/12 18:13:55 INFO Executor: Finished task 150.0 in stage 23.0 (TID 923). 
4495 bytes result sent to driver}}
{quote} * Another task that's sent to the executor but didn't get launched 
since the single-threaded dispatcher was stuck (presumably in an "infinite 
loop" as explained later).

{quote}{{$zgrep 'task 153' container_1694029806204_12865_01_01/stderr.gz
23/09/12 18:13:55 INFO TaskSetManager: Starting task 153.0 in stage 23.0 (TID 
924) (ip-10-0-185-107.ec2.internal, executor 3, partition 153, NODE_LOCAL, 4432 
bytes) taskResourceAssignments Map()

$zgrep ' 924' container_1694029806204_12865_01_04/stderr.gz
23/09/12 18:13:55 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 924

$zgrep 'task 153' container_1694029806204_12865_01_04/stderr.gz
>> note that the above command has no matching result, indicating that task 
>> 153.0 in stage 23.0 (TID 924) was never launched}}
{quote} * Thread dump shows that the dispatcher-Executor thread has the 
following stack trace.

{quote}{{"dispatcher-Executor" #40 daemon prio=5 os_prio=0 
tid=0x98e37800 nid=0x1aff runnable [0x73bba000]
   java.lang.Thread.State: RUNNABLE
at scala.runtime.BoxesRunTime.equalsNumObject(BoxesRunTime.java:142)
at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:131)
at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
at scala.collection.mutable.HashTable.elemEquals(HashTable.scala:365)
at scala.collection.mutable.HashTable.elemEquals$(HashTable.scala:365)
at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:44)
at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:140)
at 
scala.collection.mutable.HashTable.findOrAddEntry(HashTable.scala:169)
at 
scala.collection.mutable.HashTable.findOrAddEntry$(HashTable.scala:167)
at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:44)
at scala.collection.mutable.HashMap.put(HashMap.scala:126)
at scala.collection.mutable.HashMap.update(HashMap.scala:131)
at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:200)
at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
at 
org.apache.spark.rpc.netty.Inbox$$Lambda$323/1930826709.apply$mcV$sp(Unknown 
Source)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at 
org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
at 
org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)}}
{quote} * 

[jira] [Updated] (SPARK-44112) Drop Java 8 Support

2023-09-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44112:
--
Summary: Drop Java 8 Support  (was: Drop Java 8 and Java 11 Support)

> Drop Java 8 Support
> ---
>
> Key: SPARK-44112
> URL: https://issues.apache.org/jira/browse/SPARK-44112
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44112) Drop Java 8 and Java 11 Support

2023-09-19 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766963#comment-17766963
 ] 

Dongjoon Hyun commented on SPARK-44112:
---

Sorry, but please don't change this JIRA, [~LuciferYang]. If we want to drop
Java 11, it needs another JIRA. I'll restore the scope of this JIRA issue.

> Drop Java 8 and Java 11 Support
> ---
>
> Key: SPARK-44112
> URL: https://issues.apache.org/jira/browse/SPARK-44112
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44112) Drop Java 8 and Java 11 Support

2023-09-19 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-44112:
-
Summary: Drop Java 8 and Java 11 Support  (was: Drop Java 8 Support)

> Drop Java 8 and Java 11 Support
> ---
>
> Key: SPARK-44112
> URL: https://issues.apache.org/jira/browse/SPARK-44112
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45178) Fallback to use single batch executor for Trigger.AvailableNow with unsupported sources rather than using wrapper

2023-09-19 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-45178.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42940
[https://github.com/apache/spark/pull/42940]

> Fallback to use single batch executor for Trigger.AvailableNow with 
> unsupported sources rather than using wrapper
> -
>
> Key: SPARK-45178
> URL: https://issues.apache.org/jira/browse/SPARK-45178
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We have observed cases where the wrapper implementation of Trigger.AvailableNow 
> (AvailableNowDataStreamWrapper and subclasses) is not fully compatible with 
> 3rd party data sources and has led to correctness issues.
>  
> While we could ask 3rd party data sources to support Trigger.AvailableNow, 
> persuading every 3rd party to do so is too aggressive a goal and one we may 
> never achieve. It may also not be possible to come up with a wrapper 
> implementation that has zero issues with any arbitrary source.
>  
> As a mitigation, we want to make a slight behavioral change for such cases, 
> falling back to single batch execution (a.k.a. Trigger.Once) rather than 
> using the wrapper implementation. The exact behaviors of Trigger.AvailableNow 
> and Trigger.Once differ, so this is technically a behavioral change, but it 
> is probably far less surprising than failing the query.
>  
> For the extreme case where users are confident that there will be no issue at 
> all with the wrapper, we will come up with a flag to provide the previous 
> behavior.
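
For readers unfamiliar with the two modes discussed above, here is a minimal PySpark sketch (not from the ticket; the rate source and console sink are placeholders chosen only for illustration) showing where each trigger is set:

{code:python}
# Minimal sketch only; source/sink choices are placeholders for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
stream = spark.readStream.format("rate").load()

# Trigger.AvailableNow: process all currently available data, possibly
# split across several micro-batches, then stop.
q = stream.writeStream.format("console").trigger(availableNow=True).start()
q.awaitTermination()

# Trigger.Once: process all currently available data in a single
# micro-batch, then stop (the proposed fallback behavior).
q = stream.writeStream.format("console").trigger(once=True).start()
q.awaitTermination()
{code}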



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45178) Fallback to use single batch executor for Trigger.AvailableNow with unsupported sources rather than using wrapper

2023-09-19 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-45178:


Assignee: Jungtaek Lim

> Fallback to use single batch executor for Trigger.AvailableNow with 
> unsupported sources rather than using wrapper
> -
>
> Key: SPARK-45178
> URL: https://issues.apache.org/jira/browse/SPARK-45178
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>  Labels: pull-request-available
>
> We have observed cases where the wrapper implementation of Trigger.AvailableNow 
> (AvailableNowDataStreamWrapper and subclasses) is not fully compatible with 
> 3rd party data sources and has led to correctness issues.
>  
> While we could ask 3rd party data sources to support Trigger.AvailableNow, 
> persuading every 3rd party to do so is too aggressive a goal and one we may 
> never achieve. It may also not be possible to come up with a wrapper 
> implementation that has zero issues with any arbitrary source.
>  
> As a mitigation, we want to make a slight behavioral change for such cases, 
> falling back to single batch execution (a.k.a. Trigger.Once) rather than 
> using the wrapper implementation. The exact behaviors of Trigger.AvailableNow 
> and Trigger.Once differ, so this is technically a behavioral change, but it 
> is probably far less surprising than failing the query.
>  
> For the extreme case where users are confident that there will be no issue at 
> all with the wrapper, we will come up with a flag to provide the previous 
> behavior.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45192) lineInterpolate for graphviz edge is overdue

2023-09-19 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-45192.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42969
[https://github.com/apache/spark/pull/42969]

> lineInterpolate for graphviz edge is overdue
> 
>
> Key: SPARK-45192
> URL: https://issues.apache.org/jira/browse/SPARK-45192
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45192) lineInterpolate for graphviz edge is overdue

2023-09-19 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-45192:


Assignee: Kent Yao

> lineInterpolate for graphviz edge is overdue
> 
>
> Key: SPARK-45192
> URL: https://issues.apache.org/jira/browse/SPARK-45192
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Trivial
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45226) Refine docstring of `rand/randn`

2023-09-19 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-45226:
---

 Summary: Refine docstring of `rand/randn`
 Key: SPARK-45226
 URL: https://issues.apache.org/jira/browse/SPARK-45226
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 4.0.0
Reporter: BingKun Pan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43432) Fix `min_periods` for Rolling to work same as pandas

2023-09-19 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43432:

Parent: (was: SPARK-44101)
Issue Type: Improvement  (was: Sub-task)

> Fix `min_periods` for Rolling to work same as pandas 
> -
>
> Key: SPARK-43432
> URL: https://issues.apache.org/jira/browse/SPARK-43432
> Project: Spark
>  Issue Type: Improvement
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Fix `min_periods` for Rolling to work same as pandas
> https://github.com/pandas-dev/pandas/issues/31302
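
A minimal sketch of the parameter under discussion, using the pandas API on Spark and assuming `rolling` accepts `min_periods` as in pandas:

{code:python}
# Minimal sketch, assuming Rolling.min_periods behaves as in pandas.
import pyspark.pandas as ps

s = ps.Series([1.0, 2.0, 3.0, 4.0])

# With min_periods=1, the first (incomplete) windows still yield values
# instead of nulls; the ticket is about matching pandas' semantics exactly.
print(s.rolling(window=3, min_periods=1).sum())
{code}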



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45208) Website doesn't have horizontal scrollbar

2023-09-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45208:
--
Summary: Website doesn't have horizontal scrollbar  (was: Kubernetes 
Configuration in Spark Community Website doesn't have horizontal scrollbar)

> Website doesn't have horizontal scrollbar
> -
>
> Key: SPARK-45208
> URL: https://issues.apache.org/jira/browse/SPARK-45208
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.5.0
>Reporter: Qian Sun
>Priority: Major
>
> I find a recent issue with the official Spark documentation on the website. 
> Specifically, the Kubernetes configuration lists on the right-hand side are 
> not visible and doc doesn't have a horizontal scrollbar.
>  
> - 
> [https://spark.apache.org/docs/3.5.0/running-on-kubernetes.html#configuration]
> - 
> [https://spark.apache.org/docs/3.4.1/running-on-kubernetes.html#configuration]
> Wide tables are broken in the same way.
> - https://spark.apache.org/docs/latest/spark-standalone.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45208) Kubernetes Configuration in Spark Community Website doesn't have horizontal scrollbar

2023-09-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45208:
--
Description: 
I find a recent issue with the official Spark documentation on the website. 
Specifically, the Kubernetes configuration lists on the right-hand side are not 
visible and doc doesn't have a horizontal scrollbar.

 
- [https://spark.apache.org/docs/3.5.0/running-on-kubernetes.html#configuration]
- [https://spark.apache.org/docs/3.4.1/running-on-kubernetes.html#configuration]

Wide tables are broken in the same way.

- https://spark.apache.org/docs/latest/spark-standalone.html

  was:
I find a recent issue with the official Spark documentation on the website. 
Specifically, the Kubernetes configuration lists on the right-hand side are not 
visible and doc doesn't have a horizontal scrollbar.

 
- [https://spark.apache.org/docs/3.5.0/running-on-kubernetes.html#configuration]
- [https://spark.apache.org/docs/3.4.1/running-on-kubernetes.html#configuration]


> Kubernetes Configuration in Spark Community Website doesn't have horizontal 
> scrollbar
> -
>
> Key: SPARK-45208
> URL: https://issues.apache.org/jira/browse/SPARK-45208
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.5.0
>Reporter: Qian Sun
>Priority: Major
>
> I find a recent issue with the official Spark documentation on the website. 
> Specifically, the Kubernetes configuration lists on the right-hand side are 
> not visible and doc doesn't have a horizontal scrollbar.
>  
> - 
> [https://spark.apache.org/docs/3.5.0/running-on-kubernetes.html#configuration]
> - 
> [https://spark.apache.org/docs/3.4.1/running-on-kubernetes.html#configuration]
> Wide tables are broken in the same way.
> - https://spark.apache.org/docs/latest/spark-standalone.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45225) XML: XSD file URL support

2023-09-19 Thread Sandip Agarwala (Jira)
Sandip Agarwala created SPARK-45225:
---

 Summary: XML: XSD file URL support
 Key: SPARK-45225
 URL: https://issues.apache.org/jira/browse/SPARK-45225
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Sandip Agarwala






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45224) Add examples w/ map and array as parameters of sql()

2023-09-19 Thread Max Gekk (Jira)
Max Gekk created SPARK-45224:


 Summary: Add examples w/ map and array as parameters of sql()
 Key: SPARK-45224
 URL: https://issues.apache.org/jira/browse/SPARK-45224
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 4.0.0
Reporter: Max Gekk
Assignee: Max Gekk


Add a few more examples to the `sql()` method in PySpark and show how to use 
map and array parameters.
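
A minimal sketch of what such examples might look like, assuming the `args` parameter of `SparkSession.sql` accepts Column literals built with `create_map` and `array` (the names and values below are illustrative only):

{code:python}
# Sketch only; assumes named parameters may be bound to map/array Column literals.
from pyspark.sql import SparkSession
from pyspark.sql.functions import array, create_map, lit

spark = SparkSession.builder.getOrCreate()

spark.sql(
    "SELECT element_at(:m, 'a') AS map_value, element_at(:arr, 1) AS first_element",
    args={
        "m": create_map(lit("a"), lit(1), lit("b"), lit(2)),
        "arr": array(lit(10), lit(20), lit(30)),
    },
).show()
{code}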



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45220) Refine docstring of `DataFrame.join`

2023-09-19 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-45220:
-
Description: 
Refine the docstring of `DataFrame.join`.

The examples should also include: left join, left anti join, join on multiple 
columns and column names, and join on multiple conditions.

  was:Refine the docstring of `DataFrame.join`.


> Refine docstring of `DataFrame.join`
> 
>
> Key: SPARK-45220
> URL: https://issues.apache.org/jira/browse/SPARK-45220
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>
> Refine the docstring of `DataFrame.join`.
> The examples should also include: left join, left anti join, join on multiple 
> columns and column names, and join on multiple conditions.
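
A sketch of the requested example categories, using the standard `DataFrame.join` API (the sample data is illustrative only):

{code:python}
# Sketch of the example categories listed above.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
left = spark.createDataFrame([(1, "a", 100), (2, "b", 200)], ["id", "dept", "v"])
right = spark.createDataFrame([(1, "a", 10)], ["id", "dept", "w"])

left.join(right, on="id", how="left").show()                # left join on a column name
left.join(right, on=["id", "dept"], how="left_anti").show() # left anti join on multiple column names
left.join(right, (left.id == right.id) & (left.v > right.w), "inner").show()  # multiple conditions
{code}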



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45223) Refine docstring of `Column.when`

2023-09-19 Thread Allison Wang (Jira)
Allison Wang created SPARK-45223:


 Summary: Refine docstring of `Column.when`
 Key: SPARK-45223
 URL: https://issues.apache.org/jira/browse/SPARK-45223
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Allison Wang


Refine the docstring of Column.when 
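
A minimal illustration of the chaining behaviour the docstring should cover (sample data is illustrative only):

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import when

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1,), (5,), (10,)], ["x"])

# Column.when chains additional conditions; Column.otherwise sets the default.
df.select(
    when(df.x < 3, "small").when(df.x < 8, "medium").otherwise("large").alias("size")
).show()
{code}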



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45222) Refine docstring of `DataFrameReader.json`

2023-09-19 Thread Allison Wang (Jira)
Allison Wang created SPARK-45222:


 Summary: Refine docstring of `DataFrameReader.json`
 Key: SPARK-45222
 URL: https://issues.apache.org/jira/browse/SPARK-45222
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Allison Wang


Refine the docstring of read json
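
A minimal sketch of the usage the docstring should demonstrate (the paths are placeholders):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Single-line (JSON Lines) input, the default.
df = spark.read.json("/tmp/people.jsonl")                              # hypothetical path

# Multi-line JSON documents need the multiLine option.
df_ml = spark.read.option("multiLine", True).json("/tmp/people.json")  # hypothetical path
df.printSchema()
{code}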



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45221) Refine docstring of `DataFrameReader.parquet`

2023-09-19 Thread Allison Wang (Jira)
Allison Wang created SPARK-45221:


 Summary: Refine docstring of `DataFrameReader.parquet`
 Key: SPARK-45221
 URL: https://issues.apache.org/jira/browse/SPARK-45221
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Allison Wang


Refine the docstring of read parquet
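
A minimal sketch of the usage the docstring should demonstrate (the paths are placeholders):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet("/tmp/events.parquet")                        # hypothetical path
both = spark.read.parquet("/tmp/day1.parquet", "/tmp/day2.parquet")   # multiple paths at once
df.printSchema()
{code}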



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45220) Refine docstring of `DataFrame.join`

2023-09-19 Thread Allison Wang (Jira)
Allison Wang created SPARK-45220:


 Summary: Refine docstring of `DataFrame.join`
 Key: SPARK-45220
 URL: https://issues.apache.org/jira/browse/SPARK-45220
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Allison Wang


Refine the docstring of `DataFrame.join`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45219) Refine docstring of `DataFrame.withColumnRenamed`

2023-09-19 Thread Allison Wang (Jira)
Allison Wang created SPARK-45219:


 Summary: Refine docstring of `DataFrame.withColumnRenamed`
 Key: SPARK-45219
 URL: https://issues.apache.org/jira/browse/SPARK-45219
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Allison Wang


Refine the docstring of `DataFrame.withColumnRenamed`
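
A minimal illustration of the behaviour the docstring should spell out:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a")], ["id", "val"])

df.withColumnRenamed("val", "value").printSchema()   # renames the column
df.withColumnRenamed("missing", "x").printSchema()   # no-op when the column does not exist
{code}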



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45218) Refine docstring of `Column.isin`

2023-09-19 Thread Allison Wang (Jira)
Allison Wang created SPARK-45218:


 Summary: Refine docstring of `Column.isin`
 Key: SPARK-45218
 URL: https://issues.apache.org/jira/browse/SPARK-45218
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Allison Wang


Refine the docstring of `Column.isin`
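
A minimal illustration of the two accepted call forms:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ["name", "age"])

df.filter(df.name.isin("Alice", "Tom")).show()   # varargs form
df.filter(df.age.isin([2, 3, 4])).show()         # list form
{code}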



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45217) Support change log level of specific package or class

2023-09-19 Thread Zhongwei Zhu (Jira)
Zhongwei Zhu created SPARK-45217:


 Summary: Support change log level of specific package or class
 Key: SPARK-45217
 URL: https://issues.apache.org/jira/browse/SPARK-45217
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.5.0
Reporter: Zhongwei Zhu


Add SparkContext.setLogLevel(loggerName: String, logLevel: String) to support 
changing the log level of a specific package or class.
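
A hypothetical usage sketch of the proposal; only the single-argument form exists today, and the two-argument call below merely mirrors the Scala signature quoted above:

{code:python}
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# Existing API: sets the log level globally.
sc.setLogLevel("WARN")

# Proposed overload (hypothetical, not yet available): scope the change to one
# logger, e.g. a single package or class.
# sc.setLogLevel("org.apache.spark.scheduler", "DEBUG")
{code}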



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45207) Implement Error Enrichment for Scala Client

2023-09-19 Thread Yihong He (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yihong He updated SPARK-45207:
--
Summary: Implement Error Enrichment for Scala Client  (was: Implement 
FetchErrorDetails RPC)

> Implement Error Enrichment for Scala Client
> ---
>
> Key: SPARK-45207
> URL: https://issues.apache.org/jira/browse/SPARK-45207
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Yihong He
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44306) Group FileStatus with few RPC calls within Yarn Client

2023-09-19 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan resolved SPARK-44306.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42357
[https://github.com/apache/spark/pull/42357]

> Group FileStatus with few RPC calls within Yarn Client
> --
>
> Key: SPARK-44306
> URL: https://issues.apache.org/jira/browse/SPARK-44306
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Submit
>Affects Versions: 0.9.2, 2.3.0, 3.5.0
>Reporter: SHU WANG
>Assignee: SHU WANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> It's inefficient to obtain *FileStatus* for each resource [one by 
> one|https://github.com/apache/spark/blob/531ec8bddc8dd22ca39486dbdd31e62e989ddc15/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientDistributedCacheManager.scala#L71C1].
>  In our company setting, we are running Spark with Hadoop Yarn and HDFS. We 
> noticed the current behavior has two major drawbacks:
>  # Since each *getFileStatus* call involves network delays, the overall delay 
> can be *large* and add *uncertainty* to the overall Spark job runtime. 
> Specifically, we quantify this overhead within our cluster. We see the p50 
> overhead is around 10s, p80 is 1 min, and p100 is up to 15 mins. When HDFS is 
> overloaded, the delays become more severe. 
>  # In our cluster, we have nearly 100 million *getFileStatus* calls to HDFS 
> daily. We noticed that in our cluster, most resources come from the same HDFS 
> directory for each user (see our [engineering blog 
> post|https://engineering.linkedin.com/blog/2023/reducing-apache-spark-application-dependencies-upload-by-99-]
>  about why we took this approach). Therefore, we can greatly reduce nearly 
> 100 million *getFileStatus* calls to 0.1 million *listStatus* calls daily. 
> This will further reduce overhead from the HDFS side. 
> All in all, a more efficient way to fetch the *FileStatus* for each resource 
> is highly needed.
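
The actual change lives in the Scala YARN client, but the grouping idea can be sketched generically; `list_status` below is a hypothetical stand-in for the Hadoop FileSystem listStatus call, not a real API:

{code:python}
# Conceptual sketch only: one listing call per parent directory instead of one
# getFileStatus call per file. `list_status` is a hypothetical stand-in.
from collections import defaultdict
from posixpath import dirname

def statuses_per_file(paths, list_status):
    by_dir = defaultdict(list)
    for p in paths:
        by_dir[dirname(p)].append(p)

    result = {}
    for d, files in by_dir.items():
        listed = {s.path: s for s in list_status(d)}   # one RPC per directory
        for p in files:
            result[p] = listed.get(p)
    return result
{code}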



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44306) Group FileStatus with few RPC calls within Yarn Client

2023-09-19 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-44306:
---

Assignee: SHU WANG

> Group FileStatus with few RPC calls within Yarn Client
> --
>
> Key: SPARK-44306
> URL: https://issues.apache.org/jira/browse/SPARK-44306
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Submit
>Affects Versions: 0.9.2, 2.3.0, 3.5.0
>Reporter: SHU WANG
>Assignee: SHU WANG
>Priority: Major
>  Labels: pull-request-available
>
> It's inefficient to obtain *FileStatus* for each resource [one by 
> one|https://github.com/apache/spark/blob/531ec8bddc8dd22ca39486dbdd31e62e989ddc15/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientDistributedCacheManager.scala#L71C1].
>  In our company setting, we are running Spark with Hadoop Yarn and HDFS. We 
> noticed the current behavior has two major drawbacks:
>  # Since each *getFileStatus* call involves network delays, the overall delay 
> can be *large* and add *uncertainty* to the overall Spark job runtime. 
> Specifically, we quantify this overhead within our cluster. We see the p50 
> overhead is around 10s, p80 is 1 min, and p100 is up to 15 mins. When HDFS is 
> overloaded, the delays become more severe. 
>  # In our cluster, we have nearly 100 million *getFileStatus* calls to HDFS 
> daily. We noticed that in our cluster, most resources come from the same HDFS 
> directory for each user (see our [engineering blog 
> post|https://engineering.linkedin.com/blog/2023/reducing-apache-spark-application-dependencies-upload-by-99-]
>  about why we took this approach). Therefore, we can greatly reduce nearly 
> 100 million *getFileStatus* calls to 0.1 million *listStatus* calls daily. 
> This will further reduce overhead from the HDFS side. 
> All in all, a more efficient way to fetch the *FileStatus* for each resource 
> is highly needed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45215) Combine HiveCatalogedDDLSuite and HiveDDLSuite

2023-09-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45215.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42992
[https://github.com/apache/spark/pull/42992]

> Combine HiveCatalogedDDLSuite and HiveDDLSuite 
> ---
>
> Key: SPARK-45215
> URL: https://issues.apache.org/jira/browse/SPARK-45215
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45215) Combine HiveCatalogedDDLSuite and HiveDDLSuite

2023-09-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45215:
-

Assignee: BingKun Pan

> Combine HiveCatalogedDDLSuite and HiveDDLSuite 
> ---
>
> Key: SPARK-45215
> URL: https://issues.apache.org/jira/browse/SPARK-45215
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43453) Enable OpsOnDiffFramesEnabledTests.test_concat_column_axis for pandas 2.0.0.

2023-09-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-43453.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42991
[https://github.com/apache/spark/pull/42991]

> Enable OpsOnDiffFramesEnabledTests.test_concat_column_axis for pandas 2.0.0.
> 
>
> Key: SPARK-43453
> URL: https://issues.apache.org/jira/browse/SPARK-43453
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 4.0.0
>
>
> Enable OpsOnDiffFramesEnabledTests.test_concat_column_axis for pandas 2.0.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-43453) Enable OpsOnDiffFramesEnabledTests.test_concat_column_axis for pandas 2.0.0.

2023-09-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-43453:
-

Assignee: Haejoon Lee

> Enable OpsOnDiffFramesEnabledTests.test_concat_column_axis for pandas 2.0.0.
> 
>
> Key: SPARK-43453
> URL: https://issues.apache.org/jira/browse/SPARK-43453
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> Enable OpsOnDiffFramesEnabledTests.test_concat_column_axis for pandas 2.0.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43453) Ignore the names of MultiIndex when axis=1 for concat

2023-09-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-43453:
--
Summary: Ignore the names of MultiIndex when axis=1 for concat  (was: 
Enable OpsOnDiffFramesEnabledTests.test_concat_column_axis for pandas 2.0.0.)

> Ignore the names of MultiIndex when axis=1 for concat
> -
>
> Key: SPARK-43453
> URL: https://issues.apache.org/jira/browse/SPARK-43453
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 4.0.0
>
>
> Enable OpsOnDiffFramesEnabledTests.test_concat_column_axis for pandas 2.0.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert

2023-09-19 Thread Enrico Minack (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766881#comment-17766881
 ] 

Enrico Minack commented on SPARK-38200:
---

Sadly, still no feedback from reviewers.

> [SQL] Spark JDBC Savemode Supports Upsert
> -
>
> Key: SPARK-38200
> URL: https://issues.apache.org/jira/browse/SPARK-38200
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: melin
>Priority: Major
>
> Upsert SQL for different databases; most databases support MERGE SQL:
> sqlserver merge into sql : 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java]
> mysql: 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java]
> oracle merge into sql : 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java]
> postgres: 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java]
> postgres merge into sql : 
> [https://www.postgresql.org/docs/current/sql-merge.html]
> db2 merge into sql : 
> [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge]
> derby merge into sql: 
> [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html]
> h2 merge into sql : 
> [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm]
>  
> [~yao] 
>  
> https://github.com/melin/datatunnel/tree/master/plugins/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/dialect
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45200) Spark 3.4.0 always using default log4j profile

2023-09-19 Thread Jitin Dominic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitin Dominic updated SPARK-45200:
--
Flags: Important

> Spark 3.4.0 always using default log4j profile
> --
>
> Key: SPARK-45200
> URL: https://issues.apache.org/jira/browse/SPARK-45200
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Jitin Dominic
>Priority: Major
>
> I've been using Spark core 3.2.2 and was upgrading to 3.4.0.
> When I execute my Java code with 3.4.0, it generates an extra set of logs, but 
> I don't face this issue with 3.2.2.
>  
> I noticed that logs says _Using Spark's default log4j profile: 
> org/apache/spark/log4j2-defaults.properties._
>  
> Is this a bug, or is there a configuration to disable the use of the default 
> log4j profile?
> I didn't see anything in the documentation.
>  
>  
> {code:java}
> Using Spark's default log4j profile: 
> org/apache/spark/log4j2-defaults.properties
> 23/09/18 20:05:08 INFO SparkContext: Running Spark version 3.4.0
> 23/09/18 20:05:08 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 23/09/18 20:05:08 INFO ResourceUtils: 
> ==
> 23/09/18 20:05:08 INFO ResourceUtils: No custom resources configured for 
> spark.driver.
> 23/09/18 20:05:08 INFO ResourceUtils: 
> ==
> 23/09/18 20:05:08 INFO SparkContext: Submitted application: XYZ
> 23/09/18 20:05:08 INFO ResourceProfile: Default ResourceProfile created, 
> executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , 
> memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: 
> offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: 
> cpus, amount: 1.0)
> 23/09/18 20:05:08 INFO ResourceProfile: Limiting resource is cpu
> 23/09/18 20:05:08 INFO ResourceProfileManager: Added ResourceProfile id: 0
> 23/09/18 20:05:08 INFO SecurityManager: Changing view acls to: jd
> 23/09/18 20:05:08 INFO SecurityManager: Changing modify acls to: jd
> 23/09/18 20:05:08 INFO SecurityManager: Changing view acls groups to: 
> 23/09/18 20:05:08 INFO SecurityManager: Changing modify acls groups to: 
> 23/09/18 20:05:08 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: jd; groups with view 
> permissions: EMPTY; users with modify permissions: jd; groups with modify 
> permissions: EMPTY
> 23/09/18 20:05:08 INFO Utils: Successfully started service 'sparkDriver' on 
> port 39155.
> 23/09/18 20:05:08 INFO SparkEnv: Registering MapOutputTracker
> 23/09/18 20:05:08 INFO SparkEnv: Registering BlockManagerMaster
> 23/09/18 20:05:08 INFO BlockManagerMasterEndpoint: Using 
> org.apache.spark.storage.DefaultTopologyMapper for getting topology 
> information
> 23/09/18 20:05:08 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint 
> up
> 23/09/18 20:05:08 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
> 23/09/18 20:05:08 INFO DiskBlockManager: Created local directory at 
> /tmp/blockmgr-226d599c-1511-4fae-b0e7-aae81b684012
> 23/09/18 20:05:08 INFO MemoryStore: MemoryStore started with capacity 2004.6 
> MiB
> 23/09/18 20:05:08 INFO SparkEnv: Registering OutputCommitCoordinator
> 23/09/18 20:05:08 INFO JettyUtils: Start Jetty 0.0.0.0:4040 for SparkUI
> 23/09/18 20:05:09 INFO Utils: Successfully started service 'SparkUI' on port 
> 4040.
> 23/09/18 20:05:09 INFO Executor: Starting executor ID driver on host jd
> 23/09/18 20:05:09 INFO Executor: Starting executor with user classpath 
> (userClassPathFirst = false): ''
> 23/09/18 20:05:09 INFO Utils: Successfully started service 
> 'org.apache.spark.network.netty.NettyBlockTransferService' on port 32819.
> 23/09/18 20:05:09 INFO NettyBlockTransferService: Server created on jd:32819
> 23/09/18 20:05:09 INFO BlockManager: Using 
> org.apache.spark.storage.RandomBlockReplicationPolicy for block replication 
> policy
> 23/09/18 20:05:09 INFO BlockManagerMaster: Registering BlockManager 
> BlockManagerId(driver, jd, 32819, None)
> 23/09/18 20:05:09 INFO BlockManagerMasterEndpoint: Registering block manager 
> jd:32819 with 2004.6 MiB RAM, BlockManagerId(driver, jd, 32819, None)
> 23/09/18 20:05:09 INFO BlockManagerMaster: Registered BlockManager 
> BlockManagerId(driver, jd, 32819, None)
> 23/09/18 20:05:09 INFO BlockManager: Initialized BlockManager: 
> BlockManagerId(driver, jd, 32819, None)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional 

[jira] [Commented] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert

2023-09-19 Thread Yair Ofek (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766815#comment-17766815
 ] 

Yair Ofek commented on SPARK-38200:
---

[~EnricoMi] any news on when this important feature is going to be merged?

> [SQL] Spark JDBC Savemode Supports Upsert
> -
>
> Key: SPARK-38200
> URL: https://issues.apache.org/jira/browse/SPARK-38200
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: melin
>Priority: Major
>
> Upsert SQL for different databases; most databases support MERGE SQL:
> sqlserver merge into sql : 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java]
> mysql: 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java]
> oracle merge into sql : 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java]
> postgres: 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java]
> postgres merge into sql : 
> [https://www.postgresql.org/docs/current/sql-merge.html]
> db2 merge into sql : 
> [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge]
> derby merge into sql: 
> [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html]
> h2 merge into sql : 
> [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm]
>  
> [~yao] 
>  
> https://github.com/melin/datatunnel/tree/master/plugins/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/dialect
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44622) Implement FetchErrorDetails RPC

2023-09-19 Thread Yihong He (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yihong He updated SPARK-44622:
--
Summary: Implement FetchErrorDetails RPC  (was: Implement error enrichment 
and JVM stacktrace)

> Implement FetchErrorDetails RPC
> ---
>
> Key: SPARK-44622
> URL: https://issues.apache.org/jira/browse/SPARK-44622
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yihong He
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45183) ExecutorPodsLifecycleManager delete a pod multi times.

2023-09-19 Thread hgs (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766802#comment-17766802
 ] 

hgs commented on SPARK-45183:
-

I have compared Spark 3.2.0 with Spark 3.5.0. The deletion of pods is no 
different in `ExecutorPodsLifecycleManager`, so I suspect version 3.5.0 may 
have the same issue. [~dongjoon] 

> ExecutorPodsLifecycleManager delete a pod multi times.
> --
>
> Key: SPARK-45183
> URL: https://issues.apache.org/jira/browse/SPARK-45183
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.0
> Environment: Spark 3.2.0
>Reporter: hgs
>Priority: Minor
>
>  Because `ExecutorPodsLifecycleManager`.`removedExecutorsCache` is not thread 
> safe, a pod can be deleted many times when 
> `ExecutorPodsLifecycleManager`.`onNewSnapshots` is called by multiple threads.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45216) Fix non-deterministic seeded Dataset APIs

2023-09-19 Thread Peter Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Toth updated SPARK-45216:
---
Description: 
If we run the following example, the result is as expected: the 2 columns are equal:

{noformat}
val c = rand()
df.select(c, c)

+--+--+
|rand(-4522010140232537566)|rand(-4522010140232537566)|
+--+--+
|0.4520819282997137|0.4520819282997137|
+--+--+
{noformat}

 
But if we use other similar APIs, their results are incorrect:

{noformat}
val r1 = random()
val r2 = uuid()
val r3 = shuffle(col("x"))
val x = df.select(r1, r1, r2, r2, r3, r3)

+--+--+++--+--+
|rand()|rand()|  uuid()|  uuid()|shuffle(x)|shuffle(x)|
+--+--+++--+--+
|0.7407604956381952|0.7957319451135009|e55bc4b0-74e6-4b0...|a587163b-d06b-4bb...| [1, 2, 3]| [2, 1, 3]|
+--+--+++--+--+
{noformat}


> Fix non-deterministic seeded Dataset APIs
> -
>
> Key: SPARK-45216
> URL: https://issues.apache.org/jira/browse/SPARK-45216
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, SQL
>Affects Versions: 4.0.0
>Reporter: Peter Toth
>Priority: Major
>
> If we run the following example, the result is as expected: the 2 columns are equal:
> {noformat}
> val c = rand()
> df.select(c, c)
> +--+--+
> |rand(-4522010140232537566)|rand(-4522010140232537566)|
> +--+--+
> |0.4520819282997137|0.4520819282997137|
> +--+--+
> {noformat}
>  
> But if we use other similar APIs, their results are incorrect:
> {noformat}
> val r1 = random()
> val r2 = uuid()
> val r3 = shuffle(col("x"))
> val x = df.select(r1, r1, r2, r2, r3, r3)
> +--+--+++--+--+
> |rand()|rand()|  uuid()|  uuid()|shuffle(x)|shuffle(x)|
> +--+--+++--+--+
> |0.7407604956381952|0.7957319451135009|e55bc4b0-74e6-4b0...|a587163b-d06b-4bb...| [1, 2, 3]| [2, 1, 3]|
> +--+--+++--+--+
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45216) Fix non-deterministic seeded Dataset APIs

2023-09-19 Thread Peter Toth (Jira)
Peter Toth created SPARK-45216:
--

 Summary: Fix non-deterministic seeded Dataset APIs
 Key: SPARK-45216
 URL: https://issues.apache.org/jira/browse/SPARK-45216
 Project: Spark
  Issue Type: Bug
  Components: Connect, SQL
Affects Versions: 4.0.0
Reporter: Peter Toth






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-43433) Match `GroupBy.nth` behavior with new pandas behavior

2023-09-19 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee reopened SPARK-43433:
-

> Match `GroupBy.nth` behavior with new pandas behavior
> -
>
> Key: SPARK-43433
> URL: https://issues.apache.org/jira/browse/SPARK-43433
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Match behavior with 
> https://pandas.pydata.org/docs/dev/whatsnew/v2.0.0.html#dataframegroupby-nth-and-seriesgroupby-nth-now-behave-as-filtrations



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45215) Combine HiveCatalogedDDLSuite and HiveDDLSuite

2023-09-19 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-45215:
---

 Summary: Combine HiveCatalogedDDLSuite and HiveDDLSuite 
 Key: SPARK-45215
 URL: https://issues.apache.org/jira/browse/SPARK-45215
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: BingKun Pan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45211) Scala 2.13 daily test failed

2023-09-19 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-45211:


Assignee: Yang Jie

> Scala 2.13 daily  test failed
> -
>
> Key: SPARK-45211
> URL: https://issues.apache.org/jira/browse/SPARK-45211
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>
> * [https://github.com/apache/spark/actions/runs/6215331575/job/16868131377]
> {code:java}
> [info] - abandoned query gets INVALID_HANDLE.OPERATION_ABANDONED error *** 
> FAILED *** (157 milliseconds)
> 19991[info]   Expected exception org.apache.spark.SparkException to be 
> thrown, but java.lang.StackOverflowError was thrown 
> (ReattachableExecuteSuite.scala:172)
> 19992[info]   org.scalatest.exceptions.TestFailedException:
> 19993[info]   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
> 19994[info]   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
> 19995[info]   at 
> org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564)
> 19996[info]   at org.scalatest.Assertions.intercept(Assertions.scala:756)
> 19997[info]   at org.scalatest.Assertions.intercept$(Assertions.scala:746)
> 19998[info]   at 
> org.scalatest.funsuite.AnyFunSuite.intercept(AnyFunSuite.scala:1564)
> 1[info]   at 
> org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.$anonfun$new$18(ReattachableExecuteSuite.scala:172)
> 2[info]   at 
> org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.$anonfun$new$18$adapted(ReattachableExecuteSuite.scala:168)
> 20001[info]   at 
> org.apache.spark.sql.connect.SparkConnectServerTest.withCustomBlockingStub(SparkConnectServerTest.scala:222)
> 20002[info]   at 
> org.apache.spark.sql.connect.SparkConnectServerTest.withCustomBlockingStub$(SparkConnectServerTest.scala:216)
> 20003[info]   at 
> org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.withCustomBlockingStub(ReattachableExecuteSuite.scala:30)
> 20004[info]   at 
> org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.$anonfun$new$16(ReattachableExecuteSuite.scala:168)
> 20005[info]   at 
> org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.$anonfun$new$16$adapted(ReattachableExecuteSuite.scala:151)
> 20006[info]   at 
> org.apache.spark.sql.connect.SparkConnectServerTest.withClient(SparkConnectServerTest.scala:199)
> 20007[info]   at 
> org.apache.spark.sql.connect.SparkConnectServerTest.withClient$(SparkConnectServerTest.scala:191)
> 20008[info]   at 
> org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.withClient(ReattachableExecuteSuite.scala:30)
> 20009[info]   at 
> org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.$anonfun$new$15(ReattachableExecuteSuite.scala:151)
> 20010[info]   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
> 20011[info]   at 
> org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127)
> 20012[info]   at 
> org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282)
> 20013[info]   at 
> org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231)
> 20014[info]   at 
> org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230)
> 20015[info]   at 
> org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69)
> 20016[info]   at 
> org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155)
> 20017[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> 20018[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> 20019[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> 20020[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> 20021[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> 20022[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> 20023[info]   at 
> org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:227)
> 20024[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> 20025[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> 20026[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> 20027[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> 20028[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> 20029[info]   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:69)
> 20030[info]   at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
> 20031[info]   at 

[jira] [Resolved] (SPARK-45211) Scala 2.13 daily test failed

2023-09-19 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-45211.
--
Fix Version/s: 3.5.1
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 42981
[https://github.com/apache/spark/pull/42981]

> Scala 2.13 daily  test failed
> -
>
> Key: SPARK-45211
> URL: https://issues.apache.org/jira/browse/SPARK-45211
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.5.1, 4.0.0
>
>
> * [https://github.com/apache/spark/actions/runs/6215331575/job/16868131377]
> {code:java}
> [info] - abandoned query gets INVALID_HANDLE.OPERATION_ABANDONED error *** 
> FAILED *** (157 milliseconds)
> 19991[info]   Expected exception org.apache.spark.SparkException to be 
> thrown, but java.lang.StackOverflowError was thrown 
> (ReattachableExecuteSuite.scala:172)
> 19992[info]   org.scalatest.exceptions.TestFailedException:
> 19993[info]   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
> 19994[info]   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
> 19995[info]   at 
> org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564)
> 19996[info]   at org.scalatest.Assertions.intercept(Assertions.scala:756)
> 19997[info]   at org.scalatest.Assertions.intercept$(Assertions.scala:746)
> 19998[info]   at 
> org.scalatest.funsuite.AnyFunSuite.intercept(AnyFunSuite.scala:1564)
> 1[info]   at 
> org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.$anonfun$new$18(ReattachableExecuteSuite.scala:172)
> 2[info]   at 
> org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.$anonfun$new$18$adapted(ReattachableExecuteSuite.scala:168)
> 20001[info]   at 
> org.apache.spark.sql.connect.SparkConnectServerTest.withCustomBlockingStub(SparkConnectServerTest.scala:222)
> 20002[info]   at 
> org.apache.spark.sql.connect.SparkConnectServerTest.withCustomBlockingStub$(SparkConnectServerTest.scala:216)
> 20003[info]   at 
> org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.withCustomBlockingStub(ReattachableExecuteSuite.scala:30)
> 20004[info]   at 
> org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.$anonfun$new$16(ReattachableExecuteSuite.scala:168)
> 20005[info]   at 
> org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.$anonfun$new$16$adapted(ReattachableExecuteSuite.scala:151)
> 20006[info]   at 
> org.apache.spark.sql.connect.SparkConnectServerTest.withClient(SparkConnectServerTest.scala:199)
> 20007[info]   at 
> org.apache.spark.sql.connect.SparkConnectServerTest.withClient$(SparkConnectServerTest.scala:191)
> 20008[info]   at 
> org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.withClient(ReattachableExecuteSuite.scala:30)
> 20009[info]   at 
> org.apache.spark.sql.connect.execution.ReattachableExecuteSuite.$anonfun$new$15(ReattachableExecuteSuite.scala:151)
> 20010[info]   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
> 20011[info]   at 
> org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127)
> 20012[info]   at 
> org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282)
> 20013[info]   at 
> org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231)
> 20014[info]   at 
> org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230)
> 20015[info]   at 
> org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69)
> 20016[info]   at 
> org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155)
> 20017[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> 20018[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> 20019[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> 20020[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> 20021[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> 20022[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> 20023[info]   at 
> org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:227)
> 20024[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> 20025[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> 20026[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> 20027[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> 20028[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> 20029[info]   at 
> 

[jira] [Updated] (SPARK-45213) Assign name to _LEGACY_ERROR_TEMP_2151

2023-09-19 Thread Deng Ziming (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deng Ziming updated SPARK-45213:

Description: 
In DatasetSuite test("CLASS_UNSUPPORTED_BY_MAP_OBJECTS when creating dataset"), 
we are using _LEGACY_ERROR_TEMP_2151. We should use a proper error class name 
rather than `_LEGACY_ERROR_TEMP_xxx`.

 

*NOTE:* Please reply to this ticket before starting work on it, to avoid 
multiple people working on the same ticket at the same time.

  was:
We should use proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`.

 

*NOTE:* Please reply to this ticket before start working on it, to avoid 
working on same ticket at a time


> Assign name to _LEGACY_ERROR_TEMP_2151
> --
>
> Key: SPARK-45213
> URL: https://issues.apache.org/jira/browse/SPARK-45213
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Deng Ziming
>Assignee: Haejoon Lee
>Priority: Major
>
> In DatasetSuite test("CLASS_UNSUPPORTED_BY_MAP_OBJECTS when creating 
> dataset"), we are using _LEGACY_ERROR_TEMP_2151. We should use a proper error 
> class name rather than `_LEGACY_ERROR_TEMP_xxx`.
>  
> *NOTE:* Please reply to this ticket before starting work on it, to avoid 
> multiple people working on the same ticket at the same time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45213) Assign name to _LEGACY_ERROR_TEMP_2151

2023-09-19 Thread Deng Ziming (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deng Ziming updated SPARK-45213:

Affects Version/s: 3.5.0
   (was: 3.4.0)

> Assign name to _LEGACY_ERROR_TEMP_2151
> --
>
> Key: SPARK-45213
> URL: https://issues.apache.org/jira/browse/SPARK-45213
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Deng Ziming
>Assignee: Haejoon Lee
>Priority: Major
>
> We should use proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`.
>  
> *NOTE:* Please reply to this ticket before starting work on it, to avoid 
> multiple people working on the same ticket at the same time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45213) Assign name to _LEGACY_ERROR_TEMP_2151

2023-09-19 Thread Deng Ziming (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deng Ziming updated SPARK-45213:

Fix Version/s: (was: 3.4.0)

> Assign name to _LEGACY_ERROR_TEMP_2151
> --
>
> Key: SPARK-45213
> URL: https://issues.apache.org/jira/browse/SPARK-45213
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Deng Ziming
>Assignee: Haejoon Lee
>Priority: Major
>
> We should use proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`.
>  
> *NOTE:* Please reply to this ticket before starting work on it, to avoid 
> multiple people working on the same ticket at the same time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45214) Columns should not be visible for filter after projection

2023-09-19 Thread Jakub Wozniak (Jira)
Jakub Wozniak created SPARK-45214:
-

 Summary: Columns should not be visible for filter after projection
 Key: SPARK-45214
 URL: https://issues.apache.org/jira/browse/SPARK-45214
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.1
Reporter: Jakub Wozniak


Columns are visible for filtering, but not for select, after a projection. 
Moreover, the behaviour is different after a union (in that case the columns are 
no longer visible for filtering).
{code:java}
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType

spark = SparkSession.builder.getOrCreate()

data1 = []
data2 = []
for i in range(2):
    data1.append((1, i))
    data2.append((2, i + 10))

schema1 = StructType([
    StructField('f1', IntegerType(), True),
    StructField('f2', IntegerType(), True)
])

df1 = spark.createDataFrame(data1, schema1)
df2 = spark.createDataFrame(data2, schema1)

df1.show()
df2.show()

# works: f1 is still available for the filter (though it should not be)
df1.select('f2').where('f1=1').show()

# fails: f1 is not available after the union
df1.select('f2').union(df2.select('f2')).where('f1=1').show()

# the two cases are semantically not symmetric -> incorrect
{code}

This is similar to https://issues.apache.org/jira/browse/SPARK-30421.
Perhaps this report gives a bit more argument for why the behaviour should be 
fixed, as it is logically incorrect.
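
Not part of the original report, but a possible workaround until this is resolved 
is to apply the filter before the projection, so the query behaves the same with 
and without the union. A minimal sketch reusing the dataframes above:
{code:java}
# workaround sketch (not from the report): filter on f1 before projecting it away,
# which behaves consistently with and without the union
df1.where('f1 = 1').select('f2').show()
df1.where('f1 = 1').select('f2').union(df2.where('f1 = 1').select('f2')).show()
{code}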




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45213) Assign name to _LEGACY_ERROR_TEMP_2151

2023-09-19 Thread Deng Ziming (Jira)
Deng Ziming created SPARK-45213:
---

 Summary: Assign name to _LEGACY_ERROR_TEMP_2151
 Key: SPARK-45213
 URL: https://issues.apache.org/jira/browse/SPARK-45213
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Deng Ziming
Assignee: Haejoon Lee
 Fix For: 3.4.0


We should use a proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`.

 

*NOTE:* Please reply to this ticket before starting work on it, to avoid multiple 
people working on the same ticket at the same time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45101) Spark UI: A stage is still active even when all of its tasks have succeeded

2023-09-19 Thread RickyMa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

RickyMa updated SPARK-45101:

Priority: Critical  (was: Major)

> Spark UI: A stage is still active even when all of its tasks have succeeded
> ---
>
> Key: SPARK-45101
> URL: https://issues.apache.org/jira/browse/SPARK-45101
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.1, 3.5.0, 4.0.0
>Reporter: RickyMa
>Priority: Critical
> Attachments: 1.png, 2.png, 3.png
>
>
> In the stage UI, we can see all the tasks' statuses are SUCCESS.
> But the stage is still marked as active.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-43182) Multiple tables join with limit when AE is enabled and one table is skewed

2023-09-19 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766690#comment-17766690
 ] 

Qian Sun edited comment on SPARK-43182 at 9/19/23 8:14 AM:
---

Hi [~Resol1992]

I ran your SQL and tried different configuration combinations, and I believe the 
regression is caused by *spark.sql.adaptive.forceOptimizeSkewedJoin*, which 
introduces extra shuffles. AQE can give up the skew-join optimization when it 
would introduce an extra shuffle, but only if 
*spark.sql.adaptive.forceOptimizeSkewedJoin* is false. cc [~cloud_fan]

 

ref: 

[https://github.com/apache/spark/blob/87a5442f7ed96b11051d8a9333476d080054e5a0/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala#L225-L229]
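
For reference, a minimal sketch of the session-level settings discussed above, 
assuming an existing SparkSession `spark`:
{code:java}
# keep AQE and skew-join handling enabled, but do not force the skew-join
# optimization when it would introduce extra shuffles
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
spark.conf.set("spark.sql.adaptive.forceOptimizeSkewedJoin", "false")
{code}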


was (Author: dcoliversun):
Hi [~Resol1992]

I ran your sql, tried different configuration combinations and believe 
regression caused by *spark.sql.adaptive.forceOptimizeSkewedJoin* , which 
introduces 
extra shuffles. AQE can give up skewJoin Optimization if extra shuffle 
introduced when *spark.sql.adaptive.forceOptimizeSkewedJoin* is false. cc 
[~cloud_fan]  * 
https://github.com/apache/spark/blob/87a5442f7ed96b11051d8a9333476d080054e5a0/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala#L225-L229

> Multiple tables join with limit when AE is enabled and one table is skewed
> --
>
> Key: SPARK-43182
> URL: https://issues.apache.org/jira/browse/SPARK-43182
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Liu Shuo
>Priority: Critical
> Attachments: part-m-0.zip, part-m-1.zip, part-m-2.zip, 
> part-m-3.zip, part-m-4.zip, part-m-5.zip, part-m-6.zip, 
> part-m-7.zip, part-m-8.zip, part-m-9.zip, part-m-00010.zip, 
> part-m-00011.zip, part-m-00012.zip, part-m-00013.zip, part-m-00014.zip, 
> part-m-00015.zip, part-m-00016.zip, part-m-00017.zip, part-m-00018.zip, 
> part-m-00019.zip
>
>
> When we test AE in Spark 3.4.0 with the following case, we find that if we 
> disable AE, or enable AE but disable skewJoin, the SQL finishes in 20s, but if 
> we enable both AE and skewJoin, it takes a very long time.
> The test case:
> {code:java}
> ###uncompress the part-m-***.zip attachment, and put these files under 
> '/tmp/spark-warehouse/data/' dir.
> create table source_aqe(c1 int,c18 string) using csv options(path 
> 'file:///tmp/spark-warehouse/data/');
> create table hive_snappy_aqe_table1(c1 int)stored as PARQUET partitioned 
> by(c18 string); 
> insert into table hive_snappy_aqe_table1 partition(c18=1)select c1 from 
> source_aqe;
> insert into table hive_snappy_aqe_table1 partition(c18=2)select c1 from 
> source_aqe limit 12;
> insert into table hive_snappy_aqe_table1 partition(c18=3)select c1 from 
> source_aqe limit 15;create table hive_snappy_aqe_table2(c1 int)stored as 
> PARQUET partitioned by(c18 string); 
> insert into table hive_snappy_aqe_table2 partition(c18=1)select c1 from 
> source_aqe limit 16;
> insert into table hive_snappy_aqe_table2 partition(c18=2)select c1 from 
> source_aqe limit 12;create table hive_snappy_aqe_table3(c1 int)stored as 
> PARQUET partitioned by(c18 string); 
> insert into table hive_snappy_aqe_table3 partition(c18=1)select c1 from 
> source_aqe limit 16;
> insert into table hive_snappy_aqe_table3 partition(c18=2)select c1 from 
> source_aqe limit 12;
> set spark.sql.adaptive.enabled=false;
> set spark.sql.adaptive.forceOptimizeSkewedJoin = false;
> set spark.sql.adaptive.skewJoin.skewedPartitionFactor=1;
> set spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=10KB;
> set spark.sql.adaptive.advisoryPartitionSizeInBytes=100KB;
> set spark.sql.autoBroadcastJoinThreshold = 51200;
>  
> ###it will finish in 20s 
> select * from hive_snappy_aqe_table1 join hive_snappy_aqe_table2 on 
> hive_snappy_aqe_table1.c18=hive_snappy_aqe_table2.c18 join 
> hive_snappy_aqe_table3 on 
> hive_snappy_aqe_table1.c18=hive_snappy_aqe_table3.c18 limit 10;
> set spark.sql.adaptive.enabled=true;
> set spark.sql.adaptive.forceOptimizeSkewedJoin = true;
> set spark.sql.adaptive.skewJoin.skewedPartitionFactor=1;
> set spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=10KB;
> set spark.sql.adaptive.advisoryPartitionSizeInBytes=100KB;
> set spark.sql.autoBroadcastJoinThreshold = 51200;
> ###it will take very long time 
> select * from hive_snappy_aqe_table1 join hive_snappy_aqe_table2 on 
> hive_snappy_aqe_table1.c18=hive_snappy_aqe_table2.c18 join 
> hive_snappy_aqe_table3 on 
> hive_snappy_aqe_table1.c18=hive_snappy_aqe_table3.c18 limit 10;
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-43182) Multiple tables join with limit when AE is enabled and one table is skewed

2023-09-19 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766690#comment-17766690
 ] 

Qian Sun commented on SPARK-43182:
--

Hi [~Resol1992]

I ran your SQL and tried different configuration combinations, and I believe the 
regression is caused by *spark.sql.adaptive.forceOptimizeSkewedJoin*, which 
introduces extra shuffles. AQE can give up the skew-join optimization when it 
would introduce an extra shuffle, but only if 
*spark.sql.adaptive.forceOptimizeSkewedJoin* is false. cc [~cloud_fan]

ref: 
https://github.com/apache/spark/blob/87a5442f7ed96b11051d8a9333476d080054e5a0/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala#L225-L229

> Multiple tables join with limit when AE is enabled and one table is skewed
> --
>
> Key: SPARK-43182
> URL: https://issues.apache.org/jira/browse/SPARK-43182
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Liu Shuo
>Priority: Critical
> Attachments: part-m-0.zip, part-m-1.zip, part-m-2.zip, 
> part-m-3.zip, part-m-4.zip, part-m-5.zip, part-m-6.zip, 
> part-m-7.zip, part-m-8.zip, part-m-9.zip, part-m-00010.zip, 
> part-m-00011.zip, part-m-00012.zip, part-m-00013.zip, part-m-00014.zip, 
> part-m-00015.zip, part-m-00016.zip, part-m-00017.zip, part-m-00018.zip, 
> part-m-00019.zip
>
>
> When we test AE in Spark 3.4.0 with the following case, we find that if we 
> disable AE, or enable AE but disable skewJoin, the SQL finishes in 20s, but if 
> we enable both AE and skewJoin, it takes a very long time.
> The test case:
> {code:java}
> ###uncompress the part-m-***.zip attachment, and put these files under 
> '/tmp/spark-warehouse/data/' dir.
> create table source_aqe(c1 int,c18 string) using csv options(path 
> 'file:///tmp/spark-warehouse/data/');
> create table hive_snappy_aqe_table1(c1 int)stored as PARQUET partitioned 
> by(c18 string); 
> insert into table hive_snappy_aqe_table1 partition(c18=1)select c1 from 
> source_aqe;
> insert into table hive_snappy_aqe_table1 partition(c18=2)select c1 from 
> source_aqe limit 12;
> insert into table hive_snappy_aqe_table1 partition(c18=3)select c1 from 
> source_aqe limit 15;create table hive_snappy_aqe_table2(c1 int)stored as 
> PARQUET partitioned by(c18 string); 
> insert into table hive_snappy_aqe_table2 partition(c18=1)select c1 from 
> source_aqe limit 16;
> insert into table hive_snappy_aqe_table2 partition(c18=2)select c1 from 
> source_aqe limit 12;create table hive_snappy_aqe_table3(c1 int)stored as 
> PARQUET partitioned by(c18 string); 
> insert into table hive_snappy_aqe_table3 partition(c18=1)select c1 from 
> source_aqe limit 16;
> insert into table hive_snappy_aqe_table3 partition(c18=2)select c1 from 
> source_aqe limit 12;
> set spark.sql.adaptive.enabled=false;
> set spark.sql.adaptive.forceOptimizeSkewedJoin = false;
> set spark.sql.adaptive.skewJoin.skewedPartitionFactor=1;
> set spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=10KB;
> set spark.sql.adaptive.advisoryPartitionSizeInBytes=100KB;
> set spark.sql.autoBroadcastJoinThreshold = 51200;
>  
> ###it will finish in 20s 
> select * from hive_snappy_aqe_table1 join hive_snappy_aqe_table2 on 
> hive_snappy_aqe_table1.c18=hive_snappy_aqe_table2.c18 join 
> hive_snappy_aqe_table3 on 
> hive_snappy_aqe_table1.c18=hive_snappy_aqe_table3.c18 limit 10;
> set spark.sql.adaptive.enabled=true;
> set spark.sql.adaptive.forceOptimizeSkewedJoin = true;
> set spark.sql.adaptive.skewJoin.skewedPartitionFactor=1;
> set spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=10KB;
> set spark.sql.adaptive.advisoryPartitionSizeInBytes=100KB;
> set spark.sql.autoBroadcastJoinThreshold = 51200;
> ###it will take very long time 
> select * from hive_snappy_aqe_table1 join hive_snappy_aqe_table2 on 
> hive_snappy_aqe_table1.c18=hive_snappy_aqe_table2.c18 join 
> hive_snappy_aqe_table3 on 
> hive_snappy_aqe_table1.c18=hive_snappy_aqe_table3.c18 limit 10;
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45101) Spark UI: A stage is still active even when all of its tasks have succeeded

2023-09-19 Thread RickyMa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

RickyMa updated SPARK-45101:

Affects Version/s: 3.5.0
   4.0.0

> Spark UI: A stage is still active even when all of its tasks have succeeded
> ---
>
> Key: SPARK-45101
> URL: https://issues.apache.org/jira/browse/SPARK-45101
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.1, 3.5.0, 4.0.0
>Reporter: RickyMa
>Priority: Major
> Attachments: 1.png, 2.png, 3.png
>
>
> In the stage UI, we can see all the tasks' statuses are SUCCESS.
> But the stage is still marked as active.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45212) Install independent Python linter dependencies for branch-3.5

2023-09-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45212.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42990
[https://github.com/apache/spark/pull/42990]

> Install independent Python linter dependencies for branch-3.5
> -
>
> Key: SPARK-45212
> URL: https://issues.apache.org/jira/browse/SPARK-45212
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 4.0.0
>
>
> The Python linter failed in the branch-3.5 daily test:
>  * [https://github.com/apache/spark/actions/runs/6221638911/job/16884068430]
> {code:java}
> Run PYTHON_EXECUTABLE=python3.9 ./dev/lint-python
> starting python compilation test...
> python compilation succeeded.
> 
> starting black test...
> black checks failed:
> Oh no!    The required version `22.6.0` does not match the running 
> version `23.9.1`!
> Please run 'dev/reformat-python' script.
> 1
> Error: Process completed with exit code 1. {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45212) Install independent Python linter dependencies for branch-3.5

2023-09-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45212:
-

Assignee: Yang Jie

> Install independent Python linter dependencies for branch-3.5
> -
>
> Key: SPARK-45212
> URL: https://issues.apache.org/jira/browse/SPARK-45212
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>
> The Python linter failed in the branch-3.5 daily test:
>  * [https://github.com/apache/spark/actions/runs/6221638911/job/16884068430]
> {code:java}
> Run PYTHON_EXECUTABLE=python3.9 ./dev/lint-python
> starting python compilation test...
> python compilation succeeded.
> 
> starting black test...
> black checks failed:
> Oh no!    The required version `22.6.0` does not match the running 
> version `23.9.1`!
> Please run 'dev/reformat-python' script.
> 1
> Error: Process completed with exit code 1. {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45200) Spark 3.4.0 always using default log4j profile

2023-09-19 Thread Jitin Dominic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitin Dominic updated SPARK-45200:
--
Description: 
I've been using Spark core 3.2.2 and was upgrading to 3.4.0

On executing my Java code with 3.4.0, it generates an extra set of logs; I don't 
face this issue with 3.2.2.

 

I noticed that the logs say _Using Spark's default log4j profile: 
org/apache/spark/log4j2-defaults.properties._

 

Is this a bug, or is there a configuration to disable the use of the default 
log4j profile?

I didn't see anything in the documentation.

 

 
{code:java}
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
23/09/18 20:05:08 INFO SparkContext: Running Spark version 3.4.0
23/09/18 20:05:08 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
23/09/18 20:05:08 INFO ResourceUtils: 
==
23/09/18 20:05:08 INFO ResourceUtils: No custom resources configured for 
spark.driver.
23/09/18 20:05:08 INFO ResourceUtils: 
==
23/09/18 20:05:08 INFO SparkContext: Submitted application: XYZ
23/09/18 20:05:08 INFO ResourceProfile: Default ResourceProfile created, 
executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , 
memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: 
offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: 
cpus, amount: 1.0)
23/09/18 20:05:08 INFO ResourceProfile: Limiting resource is cpu
23/09/18 20:05:08 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/09/18 20:05:08 INFO SecurityManager: Changing view acls to: jd
23/09/18 20:05:08 INFO SecurityManager: Changing modify acls to: jd
23/09/18 20:05:08 INFO SecurityManager: Changing view acls groups to: 
23/09/18 20:05:08 INFO SecurityManager: Changing modify acls groups to: 
23/09/18 20:05:08 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: jd; groups with view 
permissions: EMPTY; users with modify permissions: jd; groups with modify 
permissions: EMPTY
23/09/18 20:05:08 INFO Utils: Successfully started service 'sparkDriver' on 
port 39155.
23/09/18 20:05:08 INFO SparkEnv: Registering MapOutputTracker
23/09/18 20:05:08 INFO SparkEnv: Registering BlockManagerMaster
23/09/18 20:05:08 INFO BlockManagerMasterEndpoint: Using 
org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/09/18 20:05:08 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/09/18 20:05:08 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/09/18 20:05:08 INFO DiskBlockManager: Created local directory at 
/tmp/blockmgr-226d599c-1511-4fae-b0e7-aae81b684012
23/09/18 20:05:08 INFO MemoryStore: MemoryStore started with capacity 2004.6 MiB
23/09/18 20:05:08 INFO SparkEnv: Registering OutputCommitCoordinator
23/09/18 20:05:08 INFO JettyUtils: Start Jetty 0.0.0.0:4040 for SparkUI
23/09/18 20:05:09 INFO Utils: Successfully started service 'SparkUI' on port 
4040.
23/09/18 20:05:09 INFO Executor: Starting executor ID driver on host jd
23/09/18 20:05:09 INFO Executor: Starting executor with user classpath 
(userClassPathFirst = false): ''
23/09/18 20:05:09 INFO Utils: Successfully started service 
'org.apache.spark.network.netty.NettyBlockTransferService' on port 32819.
23/09/18 20:05:09 INFO NettyBlockTransferService: Server created on jd:32819
23/09/18 20:05:09 INFO BlockManager: Using 
org.apache.spark.storage.RandomBlockReplicationPolicy for block replication 
policy
23/09/18 20:05:09 INFO BlockManagerMaster: Registering BlockManager 
BlockManagerId(driver, jd, 32819, None)
23/09/18 20:05:09 INFO BlockManagerMasterEndpoint: Registering block manager 
jd:32819 with 2004.6 MiB RAM, BlockManagerId(driver, jd, 32819, None)
23/09/18 20:05:09 INFO BlockManagerMaster: Registered BlockManager 
BlockManagerId(driver, jd, 32819, None)
23/09/18 20:05:09 INFO BlockManager: Initialized BlockManager: 
BlockManagerId(driver, jd, 32819, None)
 {code}
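
(Not from the original report.) Spark falls back to its bundled log4j2 defaults 
only when Log4j2 finds no other configuration, so providing a log4j2.properties 
on the application classpath, or in Spark's conf directory, generally replaces 
the default profile. For the extra INFO lines specifically, a minimal sketch of 
raising the log level after startup, shown here in PySpark and assuming the same 
SparkContext.setLogLevel method is used from the Java API:
{code:java}
from pyspark.sql import SparkSession

# sketch: silence INFO chatter once the context is up; note the one-line
# "default log4j profile" banner is printed before user code can intervene
spark = SparkSession.builder.master("local[*]").appName("quiet-logs").getOrCreate()
spark.sparkContext.setLogLevel("WARN")
{code}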

  was:
I've been using Spark core 3.2.2 and was upgrading to 3.4.0

 

On execution of my Java code with the 3.4.0,  it generates some extra set of 
logs but don't face this issue with 3.2.2.

 

I noticed that logs says _Using Spark's default log4j profile: 
org/apache/spark/log4j2-defaults.properties._

 

Is this a bug or do we have a  a configuration to disable the using of default 
log4j profile?

 

I didn't see anything in the documentation

 

 
{code:java}
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
23/09/18 20:05:08 INFO SparkContext: Running Spark version 3.4.0
23/09/18 20:05:08 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable

[jira] [Updated] (SPARK-45200) Spark 3.4.0 always using default log4j profile

2023-09-19 Thread Jitin Dominic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitin Dominic updated SPARK-45200:
--
Description: 
I've been using Spark core 3.2.2 and was upgrading to 3.4.0

 

On executing my Java code with 3.4.0, it generates an extra set of logs; I don't 
face this issue with 3.2.2.

 

I noticed that the logs say _Using Spark's default log4j profile: 
org/apache/spark/log4j2-defaults.properties._

 

Is this a bug, or is there a configuration to disable the use of the default 
log4j profile?

 

I didn't see anything in the documentation.

 

 
{code:java}
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
23/09/18 20:05:08 INFO SparkContext: Running Spark version 3.4.0
23/09/18 20:05:08 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
23/09/18 20:05:08 INFO ResourceUtils: 
==
23/09/18 20:05:08 INFO ResourceUtils: No custom resources configured for 
spark.driver.
23/09/18 20:05:08 INFO ResourceUtils: 
==
23/09/18 20:05:08 INFO SparkContext: Submitted application: XYZ
23/09/18 20:05:08 INFO ResourceProfile: Default ResourceProfile created, 
executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , 
memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: 
offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: 
cpus, amount: 1.0)
23/09/18 20:05:08 INFO ResourceProfile: Limiting resource is cpu
23/09/18 20:05:08 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/09/18 20:05:08 INFO SecurityManager: Changing view acls to: jd
23/09/18 20:05:08 INFO SecurityManager: Changing modify acls to: jd
23/09/18 20:05:08 INFO SecurityManager: Changing view acls groups to: 
23/09/18 20:05:08 INFO SecurityManager: Changing modify acls groups to: 
23/09/18 20:05:08 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: jd; groups with view 
permissions: EMPTY; users with modify permissions: jd; groups with modify 
permissions: EMPTY
23/09/18 20:05:08 INFO Utils: Successfully started service 'sparkDriver' on 
port 39155.
23/09/18 20:05:08 INFO SparkEnv: Registering MapOutputTracker
23/09/18 20:05:08 INFO SparkEnv: Registering BlockManagerMaster
23/09/18 20:05:08 INFO BlockManagerMasterEndpoint: Using 
org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/09/18 20:05:08 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/09/18 20:05:08 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/09/18 20:05:08 INFO DiskBlockManager: Created local directory at 
/tmp/blockmgr-226d599c-1511-4fae-b0e7-aae81b684012
23/09/18 20:05:08 INFO MemoryStore: MemoryStore started with capacity 2004.6 MiB
23/09/18 20:05:08 INFO SparkEnv: Registering OutputCommitCoordinator
23/09/18 20:05:08 INFO JettyUtils: Start Jetty 0.0.0.0:4040 for SparkUI
23/09/18 20:05:09 INFO Utils: Successfully started service 'SparkUI' on port 
4040.
23/09/18 20:05:09 INFO Executor: Starting executor ID driver on host jd
23/09/18 20:05:09 INFO Executor: Starting executor with user classpath 
(userClassPathFirst = false): ''
23/09/18 20:05:09 INFO Utils: Successfully started service 
'org.apache.spark.network.netty.NettyBlockTransferService' on port 32819.
23/09/18 20:05:09 INFO NettyBlockTransferService: Server created on jd:32819
23/09/18 20:05:09 INFO BlockManager: Using 
org.apache.spark.storage.RandomBlockReplicationPolicy for block replication 
policy
23/09/18 20:05:09 INFO BlockManagerMaster: Registering BlockManager 
BlockManagerId(driver, jd, 32819, None)
23/09/18 20:05:09 INFO BlockManagerMasterEndpoint: Registering block manager 
jd:32819 with 2004.6 MiB RAM, BlockManagerId(driver, jd, 32819, None)
23/09/18 20:05:09 INFO BlockManagerMaster: Registered BlockManager 
BlockManagerId(driver, jd, 32819, None)
23/09/18 20:05:09 INFO BlockManager: Initialized BlockManager: 
BlockManagerId(driver, jd, 32819, None)
 {code}

  was:
I've been using Spark core 3.2.2 and was upgrading to 3.4.0

 

On execution of my Java code with the 3.4.0,  it generates some extra set of 
logs but don't face this issue with 3.2.2.

 

I noticed that logs says _Using Spark's default log4j profile: 
org/apache/spark/log4j2-defaults.properties._

 

Is there a configuration to disable the using of default log4j profile?

 

 
{code:java}
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
23/09/18 20:05:08 INFO SparkContext: Running Spark version 3.4.0
23/09/18 20:05:08 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
23/09/18 20:05:08 INFO ResourceUtils: 

[jira] [Updated] (SPARK-45200) Spark 3.4.0 always using default log4j profile

2023-09-19 Thread Jitin Dominic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitin Dominic updated SPARK-45200:
--
Summary: Spark 3.4.0 always using default log4j profile  (was: 
Ignore/Disable Spark's default log4j profile)

> Spark 3.4.0 always using default log4j profile
> --
>
> Key: SPARK-45200
> URL: https://issues.apache.org/jira/browse/SPARK-45200
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Jitin Dominic
>Priority: Major
>
> I've been using Spark core 3.2.2 and was upgrading to 3.4.0
>  
> On executing my Java code with 3.4.0, it generates an extra set of logs; I 
> don't face this issue with 3.2.2.
>  
> I noticed that the logs say _Using Spark's default log4j profile: 
> org/apache/spark/log4j2-defaults.properties._
>  
> Is there a configuration to disable the use of the default log4j profile?
>  
>  
> {code:java}
> Using Spark's default log4j profile: 
> org/apache/spark/log4j2-defaults.properties
> 23/09/18 20:05:08 INFO SparkContext: Running Spark version 3.4.0
> 23/09/18 20:05:08 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 23/09/18 20:05:08 INFO ResourceUtils: 
> ==
> 23/09/18 20:05:08 INFO ResourceUtils: No custom resources configured for 
> spark.driver.
> 23/09/18 20:05:08 INFO ResourceUtils: 
> ==
> 23/09/18 20:05:08 INFO SparkContext: Submitted application: XYZ
> 23/09/18 20:05:08 INFO ResourceProfile: Default ResourceProfile created, 
> executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , 
> memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: 
> offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: 
> cpus, amount: 1.0)
> 23/09/18 20:05:08 INFO ResourceProfile: Limiting resource is cpu
> 23/09/18 20:05:08 INFO ResourceProfileManager: Added ResourceProfile id: 0
> 23/09/18 20:05:08 INFO SecurityManager: Changing view acls to: jd
> 23/09/18 20:05:08 INFO SecurityManager: Changing modify acls to: jd
> 23/09/18 20:05:08 INFO SecurityManager: Changing view acls groups to: 
> 23/09/18 20:05:08 INFO SecurityManager: Changing modify acls groups to: 
> 23/09/18 20:05:08 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: jd; groups with view 
> permissions: EMPTY; users with modify permissions: jd; groups with modify 
> permissions: EMPTY
> 23/09/18 20:05:08 INFO Utils: Successfully started service 'sparkDriver' on 
> port 39155.
> 23/09/18 20:05:08 INFO SparkEnv: Registering MapOutputTracker
> 23/09/18 20:05:08 INFO SparkEnv: Registering BlockManagerMaster
> 23/09/18 20:05:08 INFO BlockManagerMasterEndpoint: Using 
> org.apache.spark.storage.DefaultTopologyMapper for getting topology 
> information
> 23/09/18 20:05:08 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint 
> up
> 23/09/18 20:05:08 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
> 23/09/18 20:05:08 INFO DiskBlockManager: Created local directory at 
> /tmp/blockmgr-226d599c-1511-4fae-b0e7-aae81b684012
> 23/09/18 20:05:08 INFO MemoryStore: MemoryStore started with capacity 2004.6 
> MiB
> 23/09/18 20:05:08 INFO SparkEnv: Registering OutputCommitCoordinator
> 23/09/18 20:05:08 INFO JettyUtils: Start Jetty 0.0.0.0:4040 for SparkUI
> 23/09/18 20:05:09 INFO Utils: Successfully started service 'SparkUI' on port 
> 4040.
> 23/09/18 20:05:09 INFO Executor: Starting executor ID driver on host jd
> 23/09/18 20:05:09 INFO Executor: Starting executor with user classpath 
> (userClassPathFirst = false): ''
> 23/09/18 20:05:09 INFO Utils: Successfully started service 
> 'org.apache.spark.network.netty.NettyBlockTransferService' on port 32819.
> 23/09/18 20:05:09 INFO NettyBlockTransferService: Server created on jd:32819
> 23/09/18 20:05:09 INFO BlockManager: Using 
> org.apache.spark.storage.RandomBlockReplicationPolicy for block replication 
> policy
> 23/09/18 20:05:09 INFO BlockManagerMaster: Registering BlockManager 
> BlockManagerId(driver, jd, 32819, None)
> 23/09/18 20:05:09 INFO BlockManagerMasterEndpoint: Registering block manager 
> jd:32819 with 2004.6 MiB RAM, BlockManagerId(driver, jd, 32819, None)
> 23/09/18 20:05:09 INFO BlockManagerMaster: Registered BlockManager 
> BlockManagerId(driver, jd, 32819, None)
> 23/09/18 20:05:09 INFO BlockManager: Initialized BlockManager: 
> BlockManagerId(driver, jd, 32819, None)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-45200) Ignore/Disable Spark's default log4j profile

2023-09-19 Thread Jitin Dominic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitin Dominic updated SPARK-45200:
--
Issue Type: Bug  (was: Question)

> Ignore/Disable Spark's default log4j profile
> 
>
> Key: SPARK-45200
> URL: https://issues.apache.org/jira/browse/SPARK-45200
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Jitin Dominic
>Priority: Major
>
> I've been using Spark core 3.2.2 and was upgrading to 3.4.0
>  
> On executing my Java code with 3.4.0, it generates an extra set of logs; I 
> don't face this issue with 3.2.2.
>  
> I noticed that the logs say _Using Spark's default log4j profile: 
> org/apache/spark/log4j2-defaults.properties._
>  
> Is there a configuration to disable the use of the default log4j profile?
>  
>  
> {code:java}
> Using Spark's default log4j profile: 
> org/apache/spark/log4j2-defaults.properties
> 23/09/18 20:05:08 INFO SparkContext: Running Spark version 3.4.0
> 23/09/18 20:05:08 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 23/09/18 20:05:08 INFO ResourceUtils: 
> ==
> 23/09/18 20:05:08 INFO ResourceUtils: No custom resources configured for 
> spark.driver.
> 23/09/18 20:05:08 INFO ResourceUtils: 
> ==
> 23/09/18 20:05:08 INFO SparkContext: Submitted application: XYZ
> 23/09/18 20:05:08 INFO ResourceProfile: Default ResourceProfile created, 
> executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , 
> memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: 
> offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: 
> cpus, amount: 1.0)
> 23/09/18 20:05:08 INFO ResourceProfile: Limiting resource is cpu
> 23/09/18 20:05:08 INFO ResourceProfileManager: Added ResourceProfile id: 0
> 23/09/18 20:05:08 INFO SecurityManager: Changing view acls to: jd
> 23/09/18 20:05:08 INFO SecurityManager: Changing modify acls to: jd
> 23/09/18 20:05:08 INFO SecurityManager: Changing view acls groups to: 
> 23/09/18 20:05:08 INFO SecurityManager: Changing modify acls groups to: 
> 23/09/18 20:05:08 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: jd; groups with view 
> permissions: EMPTY; users with modify permissions: jd; groups with modify 
> permissions: EMPTY
> 23/09/18 20:05:08 INFO Utils: Successfully started service 'sparkDriver' on 
> port 39155.
> 23/09/18 20:05:08 INFO SparkEnv: Registering MapOutputTracker
> 23/09/18 20:05:08 INFO SparkEnv: Registering BlockManagerMaster
> 23/09/18 20:05:08 INFO BlockManagerMasterEndpoint: Using 
> org.apache.spark.storage.DefaultTopologyMapper for getting topology 
> information
> 23/09/18 20:05:08 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint 
> up
> 23/09/18 20:05:08 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
> 23/09/18 20:05:08 INFO DiskBlockManager: Created local directory at 
> /tmp/blockmgr-226d599c-1511-4fae-b0e7-aae81b684012
> 23/09/18 20:05:08 INFO MemoryStore: MemoryStore started with capacity 2004.6 
> MiB
> 23/09/18 20:05:08 INFO SparkEnv: Registering OutputCommitCoordinator
> 23/09/18 20:05:08 INFO JettyUtils: Start Jetty 0.0.0.0:4040 for SparkUI
> 23/09/18 20:05:09 INFO Utils: Successfully started service 'SparkUI' on port 
> 4040.
> 23/09/18 20:05:09 INFO Executor: Starting executor ID driver on host jd
> 23/09/18 20:05:09 INFO Executor: Starting executor with user classpath 
> (userClassPathFirst = false): ''
> 23/09/18 20:05:09 INFO Utils: Successfully started service 
> 'org.apache.spark.network.netty.NettyBlockTransferService' on port 32819.
> 23/09/18 20:05:09 INFO NettyBlockTransferService: Server created on jd:32819
> 23/09/18 20:05:09 INFO BlockManager: Using 
> org.apache.spark.storage.RandomBlockReplicationPolicy for block replication 
> policy
> 23/09/18 20:05:09 INFO BlockManagerMaster: Registering BlockManager 
> BlockManagerId(driver, jd, 32819, None)
> 23/09/18 20:05:09 INFO BlockManagerMasterEndpoint: Registering block manager 
> jd:32819 with 2004.6 MiB RAM, BlockManagerId(driver, jd, 32819, None)
> 23/09/18 20:05:09 INFO BlockManagerMaster: Registered BlockManager 
> BlockManagerId(driver, jd, 32819, None)
> 23/09/18 20:05:09 INFO BlockManager: Initialized BlockManager: 
> BlockManagerId(driver, jd, 32819, None)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org