[jira] [Updated] (SPARK-48580) Add consistency check and fallback for mapIds in push-merged block meta
[ https://issues.apache.org/jira/browse/SPARK-48580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-48580: --- Parent: SPARK-33235 Issue Type: Sub-task (was: Bug) > Add consistency check and fallback for mapIds in push-merged block meta > --- > > Key: SPARK-48580 > URL: https://issues.apache.org/jira/browse/SPARK-48580 > Project: Spark > Issue Type: Sub-task > Components: Shuffle >Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.5.0 >Reporter: gaoyajun02 >Priority: Major > Attachments: image-2024-06-11-10-19-57-227.png > > > When push-based shuffle enabled, 0.03% of the spark application in our > cluster experienced shuffle data loss. The metrics of Exchange as follows: > !image-2024-06-11-10-19-57-227.png|width=405,height=170! > We eventually found some WARN logs on the shuffle server: > > {code:java} > WARN shuffle-server-8-216 > org.apache.spark.network.shuffle.RemoteBlockPushResolver: Application > application_ shuffleId 0 shuffleMergeId 0 reduceId 133 update to > index/meta failed{code} > > And analyzed the cause from the code: > The merge metadata obtained by the reduce side from the driver comes from the > {{mapTracker}} in the server's memory, while the actual reading of chunk data > is based on the records in the shuffle server's {{{}metaFile{}}}. There is no > consistency check between the two. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48580) Add consistency check and fallback for mapIds in push-merged block meta
[ https://issues.apache.org/jira/browse/SPARK-48580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-48580: --- Summary: Add consistency check and fallback for mapIds in push-merged block meta (was: MergedBlock read by reduce have missing chunks, leading to inconsistent shuffle data) > Add consistency check and fallback for mapIds in push-merged block meta > --- > > Key: SPARK-48580 > URL: https://issues.apache.org/jira/browse/SPARK-48580 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.5.0 >Reporter: gaoyajun02 >Priority: Major > Attachments: image-2024-06-11-10-19-57-227.png > > > When push-based shuffle enabled, 0.03% of the spark application in our > cluster experienced shuffle data loss. The metrics of Exchange as follows: > !image-2024-06-11-10-19-57-227.png|width=405,height=170! > We eventually found some WARN logs on the shuffle server: > > {code:java} > WARN shuffle-server-8-216 > org.apache.spark.network.shuffle.RemoteBlockPushResolver: Application > application_ shuffleId 0 shuffleMergeId 0 reduceId 133 update to > index/meta failed{code} > > And analyzed the cause from the code: > The merge metadata obtained by the reduce side from the driver comes from the > {{mapTracker}} in the server's memory, while the actual reading of chunk data > is based on the records in the shuffle server's {{{}metaFile{}}}. There is no > consistency check between the two. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48580) MergedBlock read by reduce have missing chunks, leading to inconsistent shuffle data
[ https://issues.apache.org/jira/browse/SPARK-48580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-48580: --- Description: When push-based shuffle enabled, 0.03% of the spark application in our cluster experienced shuffle data loss. The metrics of Exchange as follows: !image-2024-06-11-10-19-57-227.png|width=405,height=170! We eventually found some WARN logs on the shuffle server: {code:java} WARN shuffle-server-8-216 org.apache.spark.network.shuffle.RemoteBlockPushResolver: Application application_ shuffleId 0 shuffleMergeId 0 reduceId 133 update to index/meta failed{code} And analyzed the cause from the code: The merge metadata obtained by the reduce side from the driver comes from the {{mapTracker}} in the server's memory, while the actual reading of chunk data is based on the records in the shuffle server's {{{}metaFile{}}}. There is no consistency check between the two. was: When push-based shuffle enabled, 0.03% of the spark application in our cluster experienced shuffle data loss. The metrics for the job execution plan's Exchange are as follows: !image-2024-06-11-10-19-57-227.png|width=405,height=170! We eventually found some WARN logs on the shuffle server: {code:java} WARN shuffle-server-8-216 org.apache.spark.network.shuffle.RemoteBlockPushResolver: Application application_ shuffleId 0 shuffleMergeId 0 reduceId 133 update to index/meta failed{code} And analyzed the cause from the code: The merge metadata obtained by the reduce side from the driver comes from the {{mapTracker}} in the server's memory, while the actual reading of chunk data is based on the records in the shuffle server's {{{}metaFile{}}}. There is no consistency check between the two. > MergedBlock read by reduce have missing chunks, leading to inconsistent > shuffle data > > > Key: SPARK-48580 > URL: https://issues.apache.org/jira/browse/SPARK-48580 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.5.0 >Reporter: gaoyajun02 >Priority: Major > Attachments: image-2024-06-11-10-19-57-227.png > > > When push-based shuffle enabled, 0.03% of the spark application in our > cluster experienced shuffle data loss. The metrics of Exchange as follows: > !image-2024-06-11-10-19-57-227.png|width=405,height=170! > We eventually found some WARN logs on the shuffle server: > > {code:java} > WARN shuffle-server-8-216 > org.apache.spark.network.shuffle.RemoteBlockPushResolver: Application > application_ shuffleId 0 shuffleMergeId 0 reduceId 133 update to > index/meta failed{code} > > And analyzed the cause from the code: > The merge metadata obtained by the reduce side from the driver comes from the > {{mapTracker}} in the server's memory, while the actual reading of chunk data > is based on the records in the shuffle server's {{{}metaFile{}}}. There is no > consistency check between the two. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48580) MergedBlock read by reduce have missing chunks, leading to inconsistent shuffle data
[ https://issues.apache.org/jira/browse/SPARK-48580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-48580: --- Description: When push-based shuffle enabled, 0.03% of the spark application in our cluster experienced shuffle data loss. The metrics for the job execution plan's Exchange are as follows: !image-2024-06-11-10-19-57-227.png|width=405,height=170! We eventually found some WARN logs on the shuffle server: {code:java} WARN shuffle-server-8-216 org.apache.spark.network.shuffle.RemoteBlockPushResolver: Application application_ shuffleId 0 shuffleMergeId 0 reduceId 133 update to index/meta failed{code} And analyzed the cause from the code: The merge metadata obtained by the reduce side from the driver comes from the {{mapTracker}} in the server's memory, while the actual reading of chunk data is based on the records in the shuffle server's {{{}metaFile{}}}. There is no consistency check between the two. > MergedBlock read by reduce have missing chunks, leading to inconsistent > shuffle data > > > Key: SPARK-48580 > URL: https://issues.apache.org/jira/browse/SPARK-48580 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.5.0 >Reporter: gaoyajun02 >Priority: Major > Attachments: image-2024-06-11-10-19-57-227.png > > > When push-based shuffle enabled, 0.03% of the spark application in our > cluster experienced shuffle data loss. The metrics for the job execution > plan's Exchange are as follows: > !image-2024-06-11-10-19-57-227.png|width=405,height=170! > We eventually found some WARN logs on the shuffle server: > > {code:java} > WARN shuffle-server-8-216 > org.apache.spark.network.shuffle.RemoteBlockPushResolver: Application > application_ shuffleId 0 shuffleMergeId 0 reduceId 133 update to > index/meta failed{code} > > And analyzed the cause from the code: > The merge metadata obtained by the reduce side from the driver comes from the > {{mapTracker}} in the server's memory, while the actual reading of chunk data > is based on the records in the shuffle server's {{{}metaFile{}}}. There is no > consistency check between the two. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48580) MergedBlock read by reduce have missing chunks, leading to inconsistent shuffle data
[ https://issues.apache.org/jira/browse/SPARK-48580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-48580: --- Attachment: (was: image-2024-06-11-10-19-22-284.png) > MergedBlock read by reduce have missing chunks, leading to inconsistent > shuffle data > > > Key: SPARK-48580 > URL: https://issues.apache.org/jira/browse/SPARK-48580 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.5.0 >Reporter: gaoyajun02 >Priority: Major > Attachments: image-2024-06-11-10-19-57-227.png > > > When push-based shuffle enabled, 0.03% of the spark application in our > cluster experienced shuffle data loss. The metrics for the job execution > plan's Exchange are as follows: > !image-2024-06-11-10-19-57-227.png|width=405,height=170! > We eventually found some WARN logs on the shuffle server: > > {code:java} > WARN shuffle-server-8-216 > org.apache.spark.network.shuffle.RemoteBlockPushResolver: Application > application_ shuffleId 0 shuffleMergeId 0 reduceId 133 update to > index/meta failed{code} > > And analyzed the cause from the code: > The merge metadata obtained by the reduce side from the driver comes from the > {{mapTracker}} in the server's memory, while the actual reading of chunk data > is based on the records in the shuffle server's {{{}metaFile{}}}. There is no > consistency check between the two. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48580) MergedBlock read by reduce have missing chunks, leading to inconsistent shuffle data
[ https://issues.apache.org/jira/browse/SPARK-48580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-48580: --- Attachment: image-2024-06-11-10-19-57-227.png > MergedBlock read by reduce have missing chunks, leading to inconsistent > shuffle data > > > Key: SPARK-48580 > URL: https://issues.apache.org/jira/browse/SPARK-48580 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.5.0 >Reporter: gaoyajun02 >Priority: Major > Attachments: image-2024-06-11-10-19-22-284.png, > image-2024-06-11-10-19-57-227.png > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48580) MergedBlock read by reduce have missing chunks, leading to inconsistent shuffle data
[ https://issues.apache.org/jira/browse/SPARK-48580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-48580: --- Attachment: image-2024-06-11-10-19-22-284.png > MergedBlock read by reduce have missing chunks, leading to inconsistent > shuffle data > > > Key: SPARK-48580 > URL: https://issues.apache.org/jira/browse/SPARK-48580 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.5.0 >Reporter: gaoyajun02 >Priority: Major > Attachments: image-2024-06-11-10-19-22-284.png > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48580) MergedBlock read by reduce have missing chunks, leading to inconsistent shuffle data
[ https://issues.apache.org/jira/browse/SPARK-48580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-48580: --- Summary: MergedBlock read by reduce have missing chunks, leading to inconsistent shuffle data (was: The merge blocks read by reduce have missing chunks, leading to inconsistent shuffle data) > MergedBlock read by reduce have missing chunks, leading to inconsistent > shuffle data > > > Key: SPARK-48580 > URL: https://issues.apache.org/jira/browse/SPARK-48580 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.5.0 >Reporter: gaoyajun02 >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48580) The merge blocks read by reduce have missing chunks, leading to inconsistent shuffle data
gaoyajun02 created SPARK-48580: -- Summary: The merge blocks read by reduce have missing chunks, leading to inconsistent shuffle data Key: SPARK-48580 URL: https://issues.apache.org/jira/browse/SPARK-48580 Project: Spark Issue Type: Bug Components: Shuffle Affects Versions: 3.5.0, 3.4.0, 3.3.0, 3.2.0 Reporter: gaoyajun02 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42694) Data duplication and loss occur after executing 'insert overwrite...' in Spark 3.1.1
[ https://issues.apache.org/jira/browse/SPARK-42694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846293#comment-17846293 ] gaoyajun02 commented on SPARK-42694: Have you enabled push-based shuffle? > Data duplication and loss occur after executing 'insert overwrite...' in > Spark 3.1.1 > > > Key: SPARK-42694 > URL: https://issues.apache.org/jira/browse/SPARK-42694 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.1 > Environment: Spark 3.1.1 > Hadoop 3.2.1 > Hive 3.1.2 >Reporter: FengZhou >Priority: Critical > Labels: shuffle, spark > Attachments: image-2023-03-07-15-59-08-818.png, > image-2023-03-07-15-59-27-665.png > > > We are currently using Spark version 3.1.1 in our production environment. We > have noticed that occasionally, after executing 'insert overwrite ... > select', the resulting data is inconsistent, with some data being duplicated > or lost. This issue does not occur all the time and seems to be more > prevalent on large tables with tens of millions of records. > We compared the execution plans for two runs of the same SQL and found that > they were identical. In the case where the SQL was executed successfully, the > amount of data being written and read during the shuffle stage was the same. > However, in the case where the problem occurred, the amount of data being > written and read during the shuffle stage was different. Please see the > attached screenshots for the write/read data during shuffle stage. > > Normal SQL: > !image-2023-03-07-15-59-08-818.png! > SQL with issues: > !image-2023-03-07-15-59-27-665.png! > > Is this problem caused by a bug in version 3.1.1, specifically (SPARK-34534): > 'New protocol FetchShuffleBlocks in OneForOneBlockFetcher lead to data loss > or correctness'? Or is it caused by something else? What could be the root > cause of this problem? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45134) Data duplication may occur when fallback to origin shuffle block
[ https://issues.apache.org/jira/browse/SPARK-45134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-45134: --- Priority: Critical (was: Blocker) > Data duplication may occur when fallback to origin shuffle block > > > Key: SPARK-45134 > URL: https://issues.apache.org/jira/browse/SPARK-45134 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.5.0 >Reporter: gaoyajun02 >Priority: Critical > > One possible situation that has been found is that, during the process of > requesting mergedBlockMeta, when the channel is closed, it may trigger two > callback callbacks and result in duplicate data for the original shuffle > blocks. > # The first time is when the channel is inactivated, the responseHandler > will execute the callback for all outstandingRpcs. > # The second time is when the listener corresponding to > shuffleClient.writeAndFlush executes the callback after the channel is closed. > Some Error Logs: > {code:java} > 23/09/08 09:22:21 ERROR shuffle-client-7-1 TransportResponseHandler: Still > have 1 requests outstanding when connection from host/ip:prot is closed > 23/09/08 09:22:21 ERROR shuffle-client-7-1 PushBasedFetchHelper: Failed to > get the meta of push-merged block for (3, 54) from host:port > java.io.IOException: Connection from host:port closed > at > org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:147) > at > org.apache.spark.network.server.TransportChannelHandler.channelInactive(TransportChannelHandler.java:117) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) > at > io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81) > at > io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) > at > io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81) > at > org.apache.spark.network.util.TransportFrameDecoder.channelInactive(TransportFrameDecoder.java:225) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) > at > io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) > at > io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) > at > io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818) > at > io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) > at > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) > at > io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) > at > io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:745) > > 23/09/08 09:22:21 ERROR shuffle-client-7-1 PushBasedFetchHelper: Failed to > get the meta of push-merged block for (3, 54) from host:port > java.io.IOException: Failed to send RPC RPC 8079698359363123411 to > host/ip:port: java.nio.channels.ClosedChannelException > at > org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.ja
[jira] [Updated] (SPARK-45134) Data duplication may occur when fallback to origin shuffle block
[ https://issues.apache.org/jira/browse/SPARK-45134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-45134: --- Priority: Blocker (was: Critical) > Data duplication may occur when fallback to origin shuffle block > > > Key: SPARK-45134 > URL: https://issues.apache.org/jira/browse/SPARK-45134 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.5.0 >Reporter: gaoyajun02 >Priority: Blocker > > One possible situation that has been found is that, during the process of > requesting mergedBlockMeta, when the channel is closed, it may trigger two > callback callbacks and result in duplicate data for the original shuffle > blocks. > # The first time is when the channel is inactivated, the responseHandler > will execute the callback for all outstandingRpcs. > # The second time is when the listener corresponding to > shuffleClient.writeAndFlush executes the callback after the channel is closed. > Some Error Logs: > {code:java} > 23/09/08 09:22:21 ERROR shuffle-client-7-1 TransportResponseHandler: Still > have 1 requests outstanding when connection from host/ip:prot is closed > 23/09/08 09:22:21 ERROR shuffle-client-7-1 PushBasedFetchHelper: Failed to > get the meta of push-merged block for (3, 54) from host:port > java.io.IOException: Connection from host:port closed > at > org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:147) > at > org.apache.spark.network.server.TransportChannelHandler.channelInactive(TransportChannelHandler.java:117) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) > at > io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81) > at > io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) > at > io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81) > at > org.apache.spark.network.util.TransportFrameDecoder.channelInactive(TransportFrameDecoder.java:225) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) > at > io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) > at > io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) > at > io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818) > at > io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) > at > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) > at > io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) > at > io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:745) > > 23/09/08 09:22:21 ERROR shuffle-client-7-1 PushBasedFetchHelper: Failed to > get the meta of push-merged block for (3, 54) from host:port > java.io.IOException: Failed to send RPC RPC 8079698359363123411 to > host/ip:port: java.nio.channels.ClosedChannelException > at > org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.jav
[jira] [Updated] (SPARK-45134) Data duplication may occur when fallback to origin shuffle block
[ https://issues.apache.org/jira/browse/SPARK-45134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-45134: --- Priority: Critical (was: Major) > Data duplication may occur when fallback to origin shuffle block > > > Key: SPARK-45134 > URL: https://issues.apache.org/jira/browse/SPARK-45134 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.5.0 >Reporter: gaoyajun02 >Priority: Critical > > One possible situation that has been found is that, during the process of > requesting mergedBlockMeta, when the channel is closed, it may trigger two > callback callbacks and result in duplicate data for the original shuffle > blocks. > # The first time is when the channel is inactivated, the responseHandler > will execute the callback for all outstandingRpcs. > # The second time is when the listener corresponding to > shuffleClient.writeAndFlush executes the callback after the channel is closed. > Some Error Logs: > {code:java} > 23/09/08 09:22:21 ERROR shuffle-client-7-1 TransportResponseHandler: Still > have 1 requests outstanding when connection from host/ip:prot is closed > 23/09/08 09:22:21 ERROR shuffle-client-7-1 PushBasedFetchHelper: Failed to > get the meta of push-merged block for (3, 54) from host:port > java.io.IOException: Connection from host:port closed > at > org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:147) > at > org.apache.spark.network.server.TransportChannelHandler.channelInactive(TransportChannelHandler.java:117) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) > at > io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81) > at > io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) > at > io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81) > at > org.apache.spark.network.util.TransportFrameDecoder.channelInactive(TransportFrameDecoder.java:225) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) > at > io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) > at > io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) > at > io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818) > at > io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) > at > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) > at > io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) > at > io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:745) > > 23/09/08 09:22:21 ERROR shuffle-client-7-1 PushBasedFetchHelper: Failed to > get the meta of push-merged block for (3, 54) from host:port > java.io.IOException: Failed to send RPC RPC 8079698359363123411 to > host/ip:port: java.nio.channels.ClosedChannelException > at > org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java
[jira] [Updated] (SPARK-45134) Data duplication may occur when fallback to origin shuffle block
[ https://issues.apache.org/jira/browse/SPARK-45134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-45134: --- Affects Version/s: 3.5.0 3.4.0 3.3.0 > Data duplication may occur when fallback to origin shuffle block > > > Key: SPARK-45134 > URL: https://issues.apache.org/jira/browse/SPARK-45134 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.5.0 >Reporter: gaoyajun02 >Priority: Major > > One possible situation that has been found is that, during the process of > requesting mergedBlockMeta, when the channel is closed, it may trigger two > callback callbacks and result in duplicate data for the original shuffle > blocks. > # The first time is when the channel is inactivated, the responseHandler > will execute the callback for all outstandingRpcs. > # The second time is when the listener corresponding to > shuffleClient.writeAndFlush executes the callback after the channel is closed. > Some Error Logs: > {code:java} > 23/09/08 09:22:21 ERROR shuffle-client-7-1 TransportResponseHandler: Still > have 1 requests outstanding when connection from host/ip:prot is closed > 23/09/08 09:22:21 ERROR shuffle-client-7-1 PushBasedFetchHelper: Failed to > get the meta of push-merged block for (3, 54) from host:port > java.io.IOException: Connection from host:port closed > at > org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:147) > at > org.apache.spark.network.server.TransportChannelHandler.channelInactive(TransportChannelHandler.java:117) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) > at > io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81) > at > io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) > at > io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81) > at > org.apache.spark.network.util.TransportFrameDecoder.channelInactive(TransportFrameDecoder.java:225) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) > at > io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) > at > io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) > at > io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818) > at > io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) > at > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) > at > io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) > at > io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:745) > > 23/09/08 09:22:21 ERROR shuffle-client-7-1 PushBasedFetchHelper: Failed to > get the meta of push-merged block for (3, 54) from host:port > java.io.IOException: Failed to send RPC RPC 8079698359363123411 to > host/ip:port: java.nio.channels.ClosedChannelException > at > org.apache.spark.network.client.TransportClient$RpcCha
[jira] [Commented] (SPARK-45134) Data duplication may occur when fallback to origin shuffle block
[ https://issues.apache.org/jira/browse/SPARK-45134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764130#comment-17764130 ] gaoyajun02 commented on SPARK-45134: Hi, [~csingh] [~vsowrirajan] [~mshen] , Can you take a look? hope some suggestions. > Data duplication may occur when fallback to origin shuffle block > > > Key: SPARK-45134 > URL: https://issues.apache.org/jira/browse/SPARK-45134 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.2.0 >Reporter: gaoyajun02 >Priority: Major > > One possible situation that has been found is that, during the process of > requesting mergedBlockMeta, when the channel is closed, it may trigger two > callback callbacks and result in duplicate data for the original shuffle > blocks. > # The first time is when the channel is inactivated, the responseHandler > will execute the callback for all outstandingRpcs. > # The second time is when the listener corresponding to > shuffleClient.writeAndFlush executes the callback after the channel is closed. > Some Error Logs: > {code:java} > 23/09/08 09:22:21 ERROR shuffle-client-7-1 TransportResponseHandler: Still > have 1 requests outstanding when connection from host/ip:prot is closed > 23/09/08 09:22:21 ERROR shuffle-client-7-1 PushBasedFetchHelper: Failed to > get the meta of push-merged block for (3, 54) from host:port > java.io.IOException: Connection from host:port closed > at > org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:147) > at > org.apache.spark.network.server.TransportChannelHandler.channelInactive(TransportChannelHandler.java:117) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) > at > io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81) > at > io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) > at > io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81) > at > org.apache.spark.network.util.TransportFrameDecoder.channelInactive(TransportFrameDecoder.java:225) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) > at > io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) > at > io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) > at > io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818) > at > io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) > at > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) > at > io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) > at > io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:745) > > 23/09/08 09:22:21 ERROR shuffle-client-7-1 PushBasedFetchHelper: Failed to > get the meta of push-merged block for (3, 54) from host:port > java.io.IOException: Failed to send RPC RPC 8079698359363123411 to > host/ip:port: java.nio.channels.ClosedChannelException > at > org.apache.spark.ne
[jira] [Updated] (SPARK-45134) Data duplication may occur when fallback to origin shuffle block
[ https://issues.apache.org/jira/browse/SPARK-45134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-45134: --- Description: One possible situation that has been found is that, during the process of requesting mergedBlockMeta, when the channel is closed, it may trigger two callback callbacks and result in duplicate data for the original shuffle blocks. # The first time is when the channel is inactivated, the responseHandler will execute the callback for all outstandingRpcs. # The second time is when the listener corresponding to shuffleClient.writeAndFlush executes the callback after the channel is closed. Some Error Logs: {code:java} 23/09/08 09:22:21 ERROR shuffle-client-7-1 TransportResponseHandler: Still have 1 requests outstanding when connection from host/ip:prot is closed 23/09/08 09:22:21 ERROR shuffle-client-7-1 PushBasedFetchHelper: Failed to get the meta of push-merged block for (3, 54) from host:port java.io.IOException: Connection from host:port closed at org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:147) at org.apache.spark.network.server.TransportChannelHandler.channelInactive(TransportChannelHandler.java:117) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81) at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81) at org.apache.spark.network.util.TransportFrameDecoder.channelInactive(TransportFrameDecoder.java:225) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818) at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:745) 23/09/08 09:22:21 ERROR shuffle-client-7-1 PushBasedFetchHelper: Failed to get the meta of push-merged block for (3, 54) from host:port java.io.IOException: Failed to send RPC RPC 8079698359363123411 to host/ip:port: java.nio.channels.ClosedChannelException at org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:433) at org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:409) at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577) at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551) at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490) at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615) at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608) at io.netty.util.conc
[jira] [Updated] (SPARK-45134) Data duplication may occur when fallback to origin shuffle block
[ https://issues.apache.org/jira/browse/SPARK-45134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-45134: --- Description: One possible situation that has been found is that, during the process of requesting mergedBlockMeta, when the channel is closed, it may trigger two callback callbacks and result in duplicate data for the original shuffle blocks. # The first time is when the channel is inactivated, the responseHandler will execute the callback for all outstandingRpcs. # The second time is when the listener corresponding to shuffleClient.writeAndFlush executes the callback after the channel is closed. Some Error Logs {code:java} // code placeholder 23/09/08 09:22:21 ERROR shuffle-client-7-1 TransportResponseHandler: Still have 1 requests outstanding when connection from host/ip:prot is closed 23/09/08 09:22:21 ERROR shuffle-client-7-1 PushBasedFetchHelper: Failed to get the meta of push-merged block for (3, 54) from host:port java.io.IOException: Connection from host:port closed at org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:147) at org.apache.spark.network.server.TransportChannelHandler.channelInactive(TransportChannelHandler.java:117) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81) at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81) at org.apache.spark.network.util.TransportFrameDecoder.channelInactive(TransportFrameDecoder.java:225) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818) at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:745) 23/09/08 09:22:21 ERROR shuffle-client-7-1 PushBasedFetchHelper: Failed to get the meta of push-merged block for (3, 54) from host:port java.io.IOException: Failed to send RPC RPC 8079698359363123411 to host/ip:port: java.nio.channels.ClosedChannelException at org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:433) at org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:409) at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577) at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551) at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490) at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615) at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608) at
[jira] [Updated] (SPARK-45134) Data duplication may occur when fallback to origin shuffle block
[ https://issues.apache.org/jira/browse/SPARK-45134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-45134: --- Description: One possible situation that has been found is that, during the process of requesting mergedBlockMeta, when the channel is closed, it may trigger two callback callbacks and result in duplicate data for the original shuffle blocks. # The first time is when the channel is inactivated, the responseHandler will execute the callback for all outstandingRpcs. # The second time is when the listener corresponding to shuffleClient.writeAndFlush executes the callback after the channel is closed. Some Error Logs was: One possible situation that has been found is that, during the process of requesting mergedBlockMeta, when the channel is closed, it may trigger two callback callbacks and result in duplicate data for the original shuffle blocks. # The first time is when the channel is inactivated, the responseHandler will execute the callback for all outstandingRpcs. # The second time is when the listener corresponding to shuffleClient.writeAndFlush executes the callback after the channel is closed. Some Error Logs ``` 23/09/08 09:22:21 ERROR shuffle-client-7-1 TransportResponseHandler: Still have 1 requests outstanding when connection from host/ip:prot is closed 23/09/08 09:22:21 ERROR shuffle-client-7-1 PushBasedFetchHelper: Failed to get the meta of push-merged block for (3, 54) from host:port java.io.IOException: Connection from host:port closed at org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:147) at org.apache.spark.network.server.TransportChannelHandler.channelInactive(TransportChannelHandler.java:117) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81) at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81) at org.apache.spark.network.util.TransportFrameDecoder.channelInactive(TransportFrameDecoder.java:225) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818) at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:745) 23/09/08 09:22:21 ERROR shuffle-client-7-1 PushBasedFetchHelper: Failed to get the meta of push-merged block for (3, 54) from host:port java.io.IOException: Failed to send RPC RPC 8079698359363123411 to host/ip:port: java.nio.channels.ClosedChannelException at org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:433) at org.apache.spark.network.client.TransportClient$StdChannelListener.operationCompl
[jira] [Created] (SPARK-45134) Data duplication may occur when fallback to origin shuffle block
gaoyajun02 created SPARK-45134: -- Summary: Data duplication may occur when fallback to origin shuffle block Key: SPARK-45134 URL: https://issues.apache.org/jira/browse/SPARK-45134 Project: Spark Issue Type: Bug Components: Shuffle Affects Versions: 3.2.0 Reporter: gaoyajun02 One possible situation that has been found is that, during the process of requesting mergedBlockMeta, when the channel is closed, it may trigger two callback callbacks and result in duplicate data for the original shuffle blocks. # The first time is when the channel is inactivated, the responseHandler will execute the callback for all outstandingRpcs. # The second time is when the listener corresponding to shuffleClient.writeAndFlush executes the callback after the channel is closed. Some Error Logs ``` 23/09/08 09:22:21 ERROR shuffle-client-7-1 TransportResponseHandler: Still have 1 requests outstanding when connection from host/ip:prot is closed 23/09/08 09:22:21 ERROR shuffle-client-7-1 PushBasedFetchHelper: Failed to get the meta of push-merged block for (3, 54) from host:port java.io.IOException: Connection from host:port closed at org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:147) at org.apache.spark.network.server.TransportChannelHandler.channelInactive(TransportChannelHandler.java:117) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81) at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81) at org.apache.spark.network.util.TransportFrameDecoder.channelInactive(TransportFrameDecoder.java:225) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818) at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:745) 23/09/08 09:22:21 ERROR shuffle-client-7-1 PushBasedFetchHelper: Failed to get the meta of push-merged block for (3, 54) from host:port java.io.IOException: Failed to send RPC RPC 8079698359363123411 to host/ip:port: java.nio.channels.ClosedChannelException at org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:433) at org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:409) at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577) at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551) at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490) at io
[jira] [Commented] (SPARK-42203) JsonProtocol should skip logging of push-based shuffle read metrics when push-based shuffle is disabled
[ https://issues.apache.org/jira/browse/SPARK-42203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17746225#comment-17746225 ] gaoyajun02 commented on SPARK-42203: ping [~thejdeep] when can you resolve it? > JsonProtocol should skip logging of push-based shuffle read metrics when > push-based shuffle is disabled > --- > > Key: SPARK-42203 > URL: https://issues.apache.org/jira/browse/SPARK-42203 > Project: Spark > Issue Type: Sub-task > Components: Shuffle >Affects Versions: 3.4.0 >Reporter: Josh Rosen >Priority: Major > > This is a followup to SPARK-36620: > When push-based shuffle is disabled (the default), I think that we should > skip the logging of the new push-based shuffle read metrics. Because these > metrics are logged for every task, they will add significant additional size > to Spark event logs. It would be great to avoid this cost in cases where it's > not necessary. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43864) Versions of the package net.sourceforge.htmlunit:htmlunit from 0 and before 3.0.0 are vulnerable to Remote Code Execution (RCE) via XSTL
[ https://issues.apache.org/jira/browse/SPARK-43864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728724#comment-17728724 ] gaoyajun02 commented on SPARK-43864: It looks like a series of test package dependencies need to be changed.I'm not very familiar with these, Can you solve it? @[~panbingkun] > Versions of the package net.sourceforge.htmlunit:htmlunit from 0 and before > 3.0.0 are vulnerable to Remote Code Execution (RCE) via XSTL > > > Key: SPARK-43864 > URL: https://issues.apache.org/jira/browse/SPARK-43864 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: gaoyajun02 >Priority: Minor > > CVE-2023-26119 Detail: [https://nvd.nist.gov/vuln/detail/CVE-2023-26119] > It is recommended to replace 'net.sourceforge.htmlunit'' by 'org.htmlunit' in > spark > {code:java} > > org.htmlunit > htmlunit > test > > > org.htmlunit > htmlunit-core-js > test > {code} > see: [https://www.htmlunit.org/migration.html] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43864) Versions of the package net.sourceforge.htmlunit:htmlunit from 0 and before 3.0.0 are vulnerable to Remote Code Execution (RCE) via XSTL
[ https://issues.apache.org/jira/browse/SPARK-43864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17727827#comment-17727827 ] gaoyajun02 commented on SPARK-43864: @[~srowen] thank you for your reply,It doesn't matter for spark, but it will affect some users who use community spark to package or deploy. Some companies that pay more attention to security will disable these packages with security vulnerabilities. The implementation is to check dependencies during the packaging process. Once a dependency is introduced, it will prevent release and deployment, regardless of whether it is in the test scope. > Versions of the package net.sourceforge.htmlunit:htmlunit from 0 and before > 3.0.0 are vulnerable to Remote Code Execution (RCE) via XSTL > > > Key: SPARK-43864 > URL: https://issues.apache.org/jira/browse/SPARK-43864 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: gaoyajun02 >Priority: Minor > > CVE-2023-26119 Detail: [https://nvd.nist.gov/vuln/detail/CVE-2023-26119] > It is recommended to replace 'net.sourceforge.htmlunit'' by 'org.htmlunit' in > spark > {code:java} > > org.htmlunit > htmlunit > test > > > org.htmlunit > htmlunit-core-js > test > {code} > see: [https://www.htmlunit.org/migration.html] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43864) Versions of the package net.sourceforge.htmlunit:htmlunit from 0 and before 3.0.0 are vulnerable to Remote Code Execution (RCE) via XSTL
[ https://issues.apache.org/jira/browse/SPARK-43864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17727064#comment-17727064 ] gaoyajun02 commented on SPARK-43864: Can you take a look @[~yangjie01] > Versions of the package net.sourceforge.htmlunit:htmlunit from 0 and before > 3.0.0 are vulnerable to Remote Code Execution (RCE) via XSTL > > > Key: SPARK-43864 > URL: https://issues.apache.org/jira/browse/SPARK-43864 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: gaoyajun02 >Priority: Major > > CVE-2023-26119 Detail: [https://nvd.nist.gov/vuln/detail/CVE-2023-26119] > It is recommended to replace 'net.sourceforge.htmlunit'' by 'org.htmlunit' in > spark > {code:java} > > org.htmlunit > htmlunit > test > > > org.htmlunit > htmlunit-core-js > test > {code} > see: [https://www.htmlunit.org/migration.html] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43864) Versions of the package net.sourceforge.htmlunit:htmlunit from 0 and before 3.0.0 are vulnerable to Remote Code Execution (RCE) via XSTL
[ https://issues.apache.org/jira/browse/SPARK-43864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-43864: --- Attachment: (was: image-2023-05-29-18-18-44-132.png) > Versions of the package net.sourceforge.htmlunit:htmlunit from 0 and before > 3.0.0 are vulnerable to Remote Code Execution (RCE) via XSTL > > > Key: SPARK-43864 > URL: https://issues.apache.org/jira/browse/SPARK-43864 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: gaoyajun02 >Priority: Major > > CVE-2023-26119 Detail: [https://nvd.nist.gov/vuln/detail/CVE-2023-26119] > It is recommended to replace 'net.sourceforge.htmlunit'' by 'org.htmlunit' in > spark > {code:java} > > org.htmlunit > htmlunit > test > > > org.htmlunit > htmlunit-core-js > test > {code} > see: [https://www.htmlunit.org/migration.html] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43864) Versions of the package net.sourceforge.htmlunit:htmlunit from 0 and before 3.0.0 are vulnerable to Remote Code Execution (RCE) via XSTL
[ https://issues.apache.org/jira/browse/SPARK-43864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-43864: --- Attachment: image-2023-05-29-18-18-44-132.png > Versions of the package net.sourceforge.htmlunit:htmlunit from 0 and before > 3.0.0 are vulnerable to Remote Code Execution (RCE) via XSTL > > > Key: SPARK-43864 > URL: https://issues.apache.org/jira/browse/SPARK-43864 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: gaoyajun02 >Priority: Major > > CVE-2023-26119 Detail: [https://nvd.nist.gov/vuln/detail/CVE-2023-26119] > It is recommended to replace 'net.sourceforge.htmlunit'' by 'org.htmlunit' in > spark > {code:java} > > org.htmlunit > htmlunit > test > > > org.htmlunit > htmlunit-core-js > test > {code} > see: [https://www.htmlunit.org/migration.html] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43864) Versions of the package net.sourceforge.htmlunit:htmlunit from 0 and before 3.0.0 are vulnerable to Remote Code Execution (RCE) via XSTL
[ https://issues.apache.org/jira/browse/SPARK-43864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-43864: --- Description: CVE-2023-26119 Detail: [https://nvd.nist.gov/vuln/detail/CVE-2023-26119] It is recommended to replace 'net.sourceforge.htmlunit'' by 'org.htmlunit' in spark {code:java} org.htmlunit htmlunit test org.htmlunit htmlunit-core-js test {code} see: [https://www.htmlunit.org/migration.html] was: CVE-2023-26119 Detail: [https://nvd.nist.gov/vuln/detail/CVE-2023-26119] It is recommended to replace 'net.sourceforge.htmlunit'' by 'org.htmlunit' in spark ``` org.htmlunit htmlunit test org.htmlunit htmlunit-core-js test ``` see: [https://www.htmlunit.org/migration.html] > Versions of the package net.sourceforge.htmlunit:htmlunit from 0 and before > 3.0.0 are vulnerable to Remote Code Execution (RCE) via XSTL > > > Key: SPARK-43864 > URL: https://issues.apache.org/jira/browse/SPARK-43864 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: gaoyajun02 >Priority: Major > > CVE-2023-26119 Detail: [https://nvd.nist.gov/vuln/detail/CVE-2023-26119] > It is recommended to replace 'net.sourceforge.htmlunit'' by 'org.htmlunit' in > spark > {code:java} > > org.htmlunit > htmlunit > test > > > org.htmlunit > htmlunit-core-js > test > {code} > see: [https://www.htmlunit.org/migration.html] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43864) Versions of the package net.sourceforge.htmlunit:htmlunit from 0 and before 3.0.0 are vulnerable to Remote Code Execution (RCE) via XSTL
[ https://issues.apache.org/jira/browse/SPARK-43864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-43864: --- Description: CVE-2023-26119 Detail: [https://nvd.nist.gov/vuln/detail/CVE-2023-26119] It is recommended to replace 'net.sourceforge.htmlunit'' by 'org.htmlunit' in spark ``` org.htmlunit htmlunit test org.htmlunit htmlunit-core-js test ``` see: [https://www.htmlunit.org/migration.html] > Versions of the package net.sourceforge.htmlunit:htmlunit from 0 and before > 3.0.0 are vulnerable to Remote Code Execution (RCE) via XSTL > > > Key: SPARK-43864 > URL: https://issues.apache.org/jira/browse/SPARK-43864 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: gaoyajun02 >Priority: Major > > CVE-2023-26119 Detail: [https://nvd.nist.gov/vuln/detail/CVE-2023-26119] > It is recommended to replace 'net.sourceforge.htmlunit'' by 'org.htmlunit' in > spark > ``` > > org.htmlunit > htmlunit > test > > > org.htmlunit > htmlunit-core-js > test > > ``` > see: [https://www.htmlunit.org/migration.html] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43864) Versions of the package net.sourceforge.htmlunit:htmlunit from 0 and before 3.0.0 are vulnerable to Remote Code Execution (RCE) via XSTL
gaoyajun02 created SPARK-43864: -- Summary: Versions of the package net.sourceforge.htmlunit:htmlunit from 0 and before 3.0.0 are vulnerable to Remote Code Execution (RCE) via XSTL Key: SPARK-43864 URL: https://issues.apache.org/jira/browse/SPARK-43864 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.4.0 Reporter: gaoyajun02 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40872) Fallback to original shuffle block when a push-merged shuffle chunk is zero-size
[ https://issues.apache.org/jira/browse/SPARK-40872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-40872: --- Parent: SPARK-33235 Issue Type: Sub-task (was: Improvement) > Fallback to original shuffle block when a push-merged shuffle chunk is > zero-size > > > Key: SPARK-40872 > URL: https://issues.apache.org/jira/browse/SPARK-40872 > Project: Spark > Issue Type: Sub-task > Components: Shuffle >Affects Versions: 3.3.0, 3.2.2 >Reporter: gaoyajun02 >Priority: Major > > A large number of shuffle tests in our cluster show that bad nodes with chunk > corruption appear have a probability of fetching zero-size shuffleChunks. In > this case, we can fall back to original shuffle blocks. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40872) Fallback to original shuffle block when a push-merged shuffle chunk is zero-size
gaoyajun02 created SPARK-40872: -- Summary: Fallback to original shuffle block when a push-merged shuffle chunk is zero-size Key: SPARK-40872 URL: https://issues.apache.org/jira/browse/SPARK-40872 Project: Spark Issue Type: Improvement Components: Shuffle Affects Versions: 3.2.2, 3.3.0 Reporter: gaoyajun02 A large number of shuffle tests in our cluster show that bad nodes with chunk corruption appear have a probability of fetching zero-size shuffleChunks. In this case, we can fall back to original shuffle blocks. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38010) Push-based shuffle disabled due to insufficient mergeLocations
[ https://issues.apache.org/jira/browse/SPARK-38010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 resolved SPARK-38010. Resolution: Fixed > Push-based shuffle disabled due to insufficient mergeLocations > -- > > Key: SPARK-38010 > URL: https://issues.apache.org/jira/browse/SPARK-38010 > Project: Spark > Issue Type: Question > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: gaoyajun02 >Priority: Major > > The current shuffle merger locations is obtained based on the host of the > active or dead Executors. > When dynamic executor allocation is enabled, when an application submits the > first few stages, there are often not enough locations to satisfy the push > merge, which causes most shuffles to not benefit from the push bashed shuffle. > The first few shuffle write stages of spark applications are generally the > stages for reading tables or data sources, which account for a large amount > of shuffled data. Because push merge shuffle is disabled, the end-to-end > improvement of spark applications is very limited. > I probably thought of a way, but not sure if it's possible: > * Lazy initialize shuffle merger locations, After the mapper writes the > local shuffle data, it obtains the merge location in the push thread. > Looking for advice and solutions on this issue -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38010) Push-based shuffle disabled due to insufficient mergeLocations
[ https://issues.apache.org/jira/browse/SPARK-38010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-38010: --- Issue Type: Question (was: Brainstorming) > Push-based shuffle disabled due to insufficient mergeLocations > -- > > Key: SPARK-38010 > URL: https://issues.apache.org/jira/browse/SPARK-38010 > Project: Spark > Issue Type: Question > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: gaoyajun02 >Priority: Major > > The current shuffle merger locations is obtained based on the host of the > active or dead Executors. > When dynamic executor allocation is enabled, when an application submits the > first few stages, there are often not enough locations to satisfy the push > merge, which causes most shuffles to not benefit from the push bashed shuffle. > The first few shuffle write stages of spark applications are generally the > stages for reading tables or data sources, which account for a large amount > of shuffled data. Because push merge shuffle is disabled, the end-to-end > improvement of spark applications is very limited. > I probably thought of a way, but not sure if it's possible: > * Lazy initialize shuffle merger locations, After the mapper writes the > local shuffle data, it obtains the merge location in the push thread. > Looking for advice and solutions on this issue -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38010) Push-based shuffle disabled due to insufficient mergeLocations
[ https://issues.apache.org/jira/browse/SPARK-38010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17481614#comment-17481614 ] gaoyajun02 commented on SPARK-38010: https://issues.apache.org/jira/browse/SPARK-34826 can solve it? [~vsowrirajan] > Push-based shuffle disabled due to insufficient mergeLocations > -- > > Key: SPARK-38010 > URL: https://issues.apache.org/jira/browse/SPARK-38010 > Project: Spark > Issue Type: Brainstorming > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: gaoyajun02 >Priority: Major > > The current shuffle merger locations is obtained based on the host of the > active or dead Executors. > When dynamic executor allocation is enabled, when an application submits the > first few stages, there are often not enough locations to satisfy the push > merge, which causes most shuffles to not benefit from the push bashed shuffle. > The first few shuffle write stages of spark applications are generally the > stages for reading tables or data sources, which account for a large amount > of shuffled data. Because push merge shuffle is disabled, the end-to-end > improvement of spark applications is very limited. > I probably thought of a way, but not sure if it's possible: > * Lazy initialize shuffle merger locations, After the mapper writes the > local shuffle data, it obtains the merge location in the push thread. > Looking for advice and solutions on this issue -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38010) Push-based shuffle disabled due to insufficient mergeLocations
[ https://issues.apache.org/jira/browse/SPARK-38010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-38010: --- Parent: (was: SPARK-33235) Issue Type: Brainstorming (was: Technical task) > Push-based shuffle disabled due to insufficient mergeLocations > -- > > Key: SPARK-38010 > URL: https://issues.apache.org/jira/browse/SPARK-38010 > Project: Spark > Issue Type: Brainstorming > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: gaoyajun02 >Priority: Major > > The current shuffle merger locations is obtained based on the host of the > active or dead Executors. > When dynamic executor allocation is enabled, when an application submits the > first few stages, there are often not enough locations to satisfy the push > merge, which causes most shuffles to not benefit from the push bashed shuffle. > The first few shuffle write stages of spark applications are generally the > stages for reading tables or data sources, which account for a large amount > of shuffled data. Because push merge shuffle is disabled, the end-to-end > improvement of spark applications is very limited. > I probably thought of a way, but not sure if it's possible: > * Lazy initialize shuffle merger locations, After the mapper writes the > local shuffle data, it obtains the merge location in the push thread. > Looking for advice and solutions on this issue -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38010) Push-based shuffle disabled due to insufficient mergeLocations
[ https://issues.apache.org/jira/browse/SPARK-38010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-38010: --- Issue Type: Technical task (was: Sub-task) > Push-based shuffle disabled due to insufficient mergeLocations > -- > > Key: SPARK-38010 > URL: https://issues.apache.org/jira/browse/SPARK-38010 > Project: Spark > Issue Type: Technical task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: gaoyajun02 >Priority: Major > > The current shuffle merger locations is obtained based on the host of the > active or dead Executors. > When dynamic executor allocation is enabled, when an application submits the > first few stages, there are often not enough locations to satisfy the push > merge, which causes most shuffles to not benefit from the push bashed shuffle. > The first few shuffle write stages of spark applications are generally the > stages for reading tables or data sources, which account for a large amount > of shuffled data. Because push merge shuffle is disabled, the end-to-end > improvement of spark applications is very limited. > I probably thought of a way, but not sure if it's possible: > * Lazy initialize shuffle merger locations, After the mapper writes the > local shuffle data, it obtains the merge location in the push thread. > Looking for advice and solutions on this issue -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38010) Push-based shuffle disabled due to insufficient mergeLocations
[ https://issues.apache.org/jira/browse/SPARK-38010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-38010: --- Parent: SPARK-33235 Issue Type: Sub-task (was: Improvement) > Push-based shuffle disabled due to insufficient mergeLocations > -- > > Key: SPARK-38010 > URL: https://issues.apache.org/jira/browse/SPARK-38010 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: gaoyajun02 >Priority: Major > > The current shuffle merger locations is obtained based on the host of the > active or dead Executors. > When dynamic executor allocation is enabled, when an application submits the > first few stages, there are often not enough locations to satisfy the push > merge, which causes most shuffles to not benefit from the push bashed shuffle. > The first few shuffle write stages of spark applications are generally the > stages for reading tables or data sources, which account for a large amount > of shuffled data. Because push merge shuffle is disabled, the end-to-end > improvement of spark applications is very limited. > I probably thought of a way, but not sure if it's possible: > * Lazy initialize shuffle merger locations, After the mapper writes the > local shuffle data, it obtains the merge location in the push thread. > Looking for advice and solutions on this issue -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38010) Push-based shuffle disabled due to insufficient mergeLocations
[ https://issues.apache.org/jira/browse/SPARK-38010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-38010: --- Description: The current shuffle merger locations is obtained based on the host of the active or dead Executors. When dynamic executor allocation is enabled, when an application submits the first few stages, there are often not enough locations to satisfy the push merge, which causes most shuffles to not benefit from the push bashed shuffle. The first few shuffle write stages of spark applications are generally the stages for reading tables or data sources, which account for a large amount of shuffled data. Because push merge shuffle is disabled, the end-to-end improvement of spark applications is very limited. I probably thought of a way, but not sure if it's possible: * Lazy initialize shuffle merger locations, After the mapper writes the local shuffle data, it obtains the merge location in the push thread. Looking for advice and solutions on this issue was: The current shuffle merger position is obtained based on the host of the active or dead Executor. When dynamic resource allocation is enabled, when the application submits the first few stages, there are often not enough locations to satisfy the push merge, which causes most shuffles to not benefit from the push bashed shuffle. The first few shuffle write stages of spark applications are generally the stages for reading tables or data sources, which account for a large amount of shuffled data and the proportion of data. Because push cannot be used, the end-to-end improvement of spark applications is very limited. I probably thought of a way, but not sure if it's possible: * Lazy initialize shuffle merger locations, After the mapper writes the local shuffle data, it obtains the merge location in the push thread. Looking for advice and solutions on this issue > Push-based shuffle disabled due to insufficient mergeLocations > -- > > Key: SPARK-38010 > URL: https://issues.apache.org/jira/browse/SPARK-38010 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: gaoyajun02 >Priority: Major > > The current shuffle merger locations is obtained based on the host of the > active or dead Executors. > When dynamic executor allocation is enabled, when an application submits the > first few stages, there are often not enough locations to satisfy the push > merge, which causes most shuffles to not benefit from the push bashed shuffle. > The first few shuffle write stages of spark applications are generally the > stages for reading tables or data sources, which account for a large amount > of shuffled data. Because push merge shuffle is disabled, the end-to-end > improvement of spark applications is very limited. > I probably thought of a way, but not sure if it's possible: > * Lazy initialize shuffle merger locations, After the mapper writes the > local shuffle data, it obtains the merge location in the push thread. > Looking for advice and solutions on this issue -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38010) Push-based shuffle disabled due to insufficient mergeLocations
gaoyajun02 created SPARK-38010: -- Summary: Push-based shuffle disabled due to insufficient mergeLocations Key: SPARK-38010 URL: https://issues.apache.org/jira/browse/SPARK-38010 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Affects Versions: 3.1.0 Reporter: gaoyajun02 The current shuffle merger position is obtained based on the host of the active or dead Executor. When dynamic resource allocation is enabled, when the application submits the first few stages, there are often not enough locations to satisfy the push merge, which causes most shuffles to not benefit from the push bashed shuffle. The first few shuffle write stages of spark applications are generally the stages for reading tables or data sources, which account for a large amount of shuffled data and the proportion of data. Because push cannot be used, the end-to-end improvement of spark applications is very limited. I probably thought of a way, but not sure if it's possible: * Lazy initialize shuffle merger locations, After the mapper writes the local shuffle data, it obtains the merge location in the push thread. Looking for advice and solutions on this issue -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36964) Reuse CachedDNSToSwitchMapping for yarn container requests
[ https://issues.apache.org/jira/browse/SPARK-36964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-36964: --- Affects Version/s: 3.3.0 3.2.0 > Reuse CachedDNSToSwitchMapping for yarn container requests > --- > > Key: SPARK-36964 > URL: https://issues.apache.org/jira/browse/SPARK-36964 > Project: Spark > Issue Type: Improvement > Components: Spark Core, YARN >Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0 >Reporter: gaoyajun02 >Priority: Major > > Similar to SPARK-13704, In some cases, YarnAllocator add container requests > with locality preference can be expensive, it may call the topology script > for rack awareness. > When submit a very large job in a very large Yarn cluster, the topology > script may take signifiant time to run. And this blocks receiving > YarnSchedulerBackend's RequestExecutors rpc calls, This request comes from > spark dynamic executor allocation thread, which may blocks the > ExecutorAllocationListener, and then result in executorManagement queue > backlog. > > Some logs: > {code:java} > 21/09/29 12:04:35 INFO spark-dynamic-executor-allocation > ExecutorAllocationManager: Error reaching cluster manager.21/09/29 12:04:35 > INFO spark-dynamic-executor-allocation ExecutorAllocationManager: Error > reaching cluster manager.org.apache.spark.rpc.RpcTimeoutException: Futures > timed out after [120 seconds]. This timeout is controlled by > spark.rpc.askTimeout at > org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:839) > at > org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:411) > at > org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:361) > at > org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:316) > at > org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:227) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745)Caused by: > java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] > at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259) at > scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263) at > org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:294) at > org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) ... 12 > more21/09/29 12:04:35 WARN spark-dynamic-executor-allocation > ExecutorAllocationManager: Unable to reach the cluster manager to request > 1922 total executors! > 21/09/29 12:04:35 INFO spark-dynamic-executor-allocation > ExecutorAllocationManager: Error reaching cluster manager.21/09/29 12:04:35 > INFO spark-dynamic-executor-allocation ExecutorAllocationManager: Error > reaching cluster manager.org.apache.spark.rpc.RpcTimeoutException: Futures > timed out after [120 seconds]. This timeout is controlled by > spark.rpc.askTimeout at > org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:839) > at > org.apache.spark.ExecutorAllocationManager.addExecutors(E
[jira] [Updated] (SPARK-36964) Reuse CachedDNSToSwitchMapping for yarn container requests
[ https://issues.apache.org/jira/browse/SPARK-36964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-36964: --- Description: Similar to SPARK-13704, In some cases, YarnAllocator add container requests with locality preference can be expensive, it may call the topology script for rack awareness. When submit a very large job in a very large Yarn cluster, the topology script may take signifiant time to run. And this blocks receiving YarnSchedulerBackend's RequestExecutors rpc calls, This request comes from spark dynamic executor allocation thread, which may blocks the ExecutorAllocationListener, and then result in executorManagement queue backlog. Some logs: {code:java} 21/09/29 12:04:35 INFO spark-dynamic-executor-allocation ExecutorAllocationManager: Error reaching cluster manager.21/09/29 12:04:35 INFO spark-dynamic-executor-allocation ExecutorAllocationManager: Error reaching cluster manager.org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47) at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62) at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:839) at org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:411) at org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:361) at org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:316) at org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:227) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263) at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:294) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) ... 12 more21/09/29 12:04:35 WARN spark-dynamic-executor-allocation ExecutorAllocationManager: Unable to reach the cluster manager to request 1922 total executors! 21/09/29 12:04:35 INFO spark-dynamic-executor-allocation ExecutorAllocationManager: Error reaching cluster manager.21/09/29 12:04:35 INFO spark-dynamic-executor-allocation ExecutorAllocationManager: Error reaching cluster manager.org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47) at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62) at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:839) at org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:411) at org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:361) at org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:316) at org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:227) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThrea
[jira] [Updated] (SPARK-36964) Reuse CachedDNSToSwitchMapping for yarn container requests
[ https://issues.apache.org/jira/browse/SPARK-36964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-36964: --- Description: Similar to SPARK-13704, In some cases, YarnAllocator add or remove container requests can be expensive, it may call the topology script for rack awareness. When submit a very large job in a very large Yarn cluster, the topology script may take signifiant time to run. And this blocks receiving YarnSchedulerBackend's RequestExecutors rpc calls, This request comes from spark dynamic executor allocation thread, which may blocks the ExecutorAllocationListener, {code} 12:04:35 INFO spark-dynamic-executor-allocation ExecutorAllocationManager: Error reaching cluster manager.21/09/29 12:04:35 INFO spark-dynamic-executor-allocation ExecutorAllocationManager: Error reaching cluster manager.org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47) at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62) at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:839) at org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:411) at org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:361) at org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:316) at org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:227) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263) at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:294) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) ... 12 more21/09/29 12:04:35 WARN spark-dynamic-executor-allocation ExecutorAllocationManager: Unable to reach the cluster manager to request 1922 total executors!{code} and then result in executorManagement queue backlog. e.g. some log: {code} 21/09/29 12:02:49 ERROR dag-scheduler-event-loop AsyncEventQueue: Dropping event from queue executorManagement. This likely means one of the listeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler. 21/09/29 12:02:49 WARN dag-scheduler-event-loop AsyncEventQueue: Dropped 1 events from executorManagement since the application started. 21/09/29 12:02:55 INFO spark-listener-group-eventLog AsyncEventQueue: Process of event SparkListenerExecutorAdded(1632888172920,543,org.apache.spark.scheduler.cluster.ExecutorData@8cfab8f5,None) by listener EventLoggingListener took 3.037686034s. 21/09/29 12:03:03 INFO spark-listener-group-eventLog AsyncEventQueue: Process of event SparkListenerBlockManagerAdded(1632888181779,BlockManagerId(1359, --, 57233, None),2704696934,Some(2704696934),Some(0)) by listener EventLoggingListener took 1.462598355s. 21/09/29 12:03:49 WARN dispatcher-BlockManagerMaster AsyncEventQueue: Dropped 74388 events from executorManagement since Wed Sep 29 12:02:49 CST 2021. 21/09/29 12:04:35 INFO spark-listener-group-executorManagement AsyncEventQueue: Process of event SparkListenerStageSubmitted(org.apache.spark.scheduler.StageInfo@52f810ad,{...}) by listener ExecutorAllocationListener took 116.526408932s. 21/09/29 12:04:49 WARN heartbeat-receiver-event-loop-thread AsyncEventQueue: Dropped 18892 events from executorManagement since Wed Sep 29 12:03:49 CST 2021. 21/09/29 12:05:49 WARN dispatcher-BlockManagerMaster AsyncEventQueue: Dropped 19397 events from executorManagement since Wed Sep 29 12:04:49 CST 2021. {code} was: Similar to SPARK-13704, In some cases, YarnAllocator add or r
[jira] [Updated] (SPARK-36964) Reuse CachedDNSToSwitchMapping for yarn container requests
[ https://issues.apache.org/jira/browse/SPARK-36964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-36964: --- Description: Similar to SPARK-13704, In some cases, YarnAllocator add or remove container requests can be expensive, it may call the topology script for rack awareness. When submit a very large job in a very large Yarn cluster, the topology script may take signifiant time to run. And this blocks receiving YarnSchedulerBackend's RequestExecutors rpc calls, This request comes from spark dynamic executor allocation thread, which may blocks the ExecutorAllocationListener, {code:text} 21/09/29 12:04:35 INFO spark-dynamic-executor-allocation ExecutorAllocationManager: Error reaching cluster manager.21/09/29 12:04:35 INFO spark-dynamic-executor-allocation ExecutorAllocationManager: Error reaching cluster manager.org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47) at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62) at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:839) at org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:411) at org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:361) at org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:316) at org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:227) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263) at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:294) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) ... 12 more21/09/29 12:04:35 WARN spark-dynamic-executor-allocation ExecutorAllocationManager: Unable to reach the cluster manager to request 1922 total executors!{code} and then result in executorManagement queue backlog. e.g. some log: {code:text} 21/09/29 12:02:49 ERROR dag-scheduler-event-loop AsyncEventQueue: Dropping event from queue executorManagement. This likely means one of the listeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler. 21/09/29 12:02:49 WARN dag-scheduler-event-loop AsyncEventQueue: Dropped 1 events from executorManagement since the application started. 21/09/29 12:02:55 INFO spark-listener-group-eventLog AsyncEventQueue: Process of event SparkListenerExecutorAdded(1632888172920,543,org.apache.spark.scheduler.cluster.ExecutorData@8cfab8f5,None) by listener EventLoggingListener took 3.037686034s. 21/09/29 12:03:03 INFO spark-listener-group-eventLog AsyncEventQueue: Process of event SparkListenerBlockManagerAdded(1632888181779,BlockManagerId(1359, --, 57233, None),2704696934,Some(2704696934),Some(0)) by listener EventLoggingListener took 1.462598355s. 21/09/29 12:03:49 WARN dispatcher-BlockManagerMaster AsyncEventQueue: Dropped 74388 events from executorManagement since Wed Sep 29 12:02:49 CST 2021. 21/09/29 12:04:35 INFO spark-listener-group-executorManagement AsyncEventQueue: Process of event SparkListenerStageSubmitted(org.apache.spark.scheduler.StageInfo@52f810ad,{...}) by listener ExecutorAllocationListener took 116.526408932s. 21/09/29 12:04:49 WARN heartbeat-receiver-event-loop-thread AsyncEventQueue: Dropped 18892 events from executorManagement since Wed Sep 29 12:03:49 CST 2021. 21/09/29 12:05:49 WARN dispatcher-BlockManagerMaster AsyncEventQueue: Dropped 19397 events from executorManagement since Wed Sep 29 12:04:49 CST 2021. {code} was: Similar to SPARK-13704, In some cases,
[jira] [Created] (SPARK-36964) Reuse CachedDNSToSwitchMapping for yarn container requests
gaoyajun02 created SPARK-36964: -- Summary: Reuse CachedDNSToSwitchMapping for yarn container requests Key: SPARK-36964 URL: https://issues.apache.org/jira/browse/SPARK-36964 Project: Spark Issue Type: Improvement Components: Spark Core, YARN Affects Versions: 3.1.2, 3.0.3 Reporter: gaoyajun02 Similar to SPARK-13704, In some cases, YarnAllocator add or remove container requests can be expensive, it may call the topology script for rack awareness. When submit a very large job in a very large Yarn cluster, the topology script may take signifiant time to run. And this blocks receiving YarnSchedulerBackend's RequestExecutors rpc calls, This request comes from spark dynamic executor allocation thread, which may blocks the ExecutorAllocationListener, and then result in executorManagement queue backlog. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36815) Found duplicate rewrite attributes
[ https://issues.apache.org/jira/browse/SPARK-36815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-36815: --- Description: We are using Spark version 3.0.2 in production and some ETLs contain multi-level CTEs and the following error occurs when we join them. {code:java} java.lang.AssertionError: assertion failed: Found duplicate rewrite attributes at scala.Predef$.assert(Predef.scala:223) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:207) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:405) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:403) {code} I reproduced the problem with a simplified SQL as follows: {code:java} -- SQL with a as ( select name, get_json_object(json, '$.id') id, n from ( select get_json_object(json, '$.name') name, json from values ('{"name":"a", "id": 1}' ) people(json) ) LATERAL VIEW explode(array(1, 1, 2)) num as n ), b as ( select a1.name, a1.id, a1.n from a a1 left join (select name, count(1) c from a group by name) a2 on a1.name = a2.name) select b1.name, b1.n, b1.id from b b1 join b b2 on b1.name = b2.name;{code} In debugging I found that a reference to the root Project existed in both subqueries, and when `ResolveReferences` resolved the conflict, `rewrite` occurred in both subqueries, containing two new attrMapping, and they were both eventually passed to the root Project, leading to this error plan: {code:java} Project [name#218, id#219, n#229] +- Join LeftOuter, (name#218 = name#232) :- SubqueryAlias a1 : +- SubqueryAlias a : +- Project [name#218, get_json_object(json#225, $.id) AS id#219, n#229] :+- Generate explode(array(1, 1, 2)), false, num, [n#229] : +- SubqueryAlias __auto_generated_subquery_name : +- Project [get_json_object(json#225, $.name) AS name#218, json#225] : +- SubqueryAlias people :+- LocalRelation [json#225] +- SubqueryAlias a2 +- Aggregate [name#232], [name#232, count(1) AS c#220L] +- SubqueryAlias a +- Project [name#232, get_json_object(json#226, $.id) AS id#219, n#230] +- Generate explode(array(1, 1, 2)), false, num, [n#230] +- SubqueryAlias __auto_generated_subquery_name +- Project [get_json_object(json#226, $.name) AS name#232, json#226] +- SubqueryAlias people +- LocalRelation [json#226] {code} newPlan: {code:java} !Project [name#218, id#219, n#229] +- Join LeftOuter, (name#218 = name#232) :- SubqueryAlias a1 : +- SubqueryAlias a : +- Project [name#218, get_json_object(json#225, $.id) AS id#233, n#229] :+- Generate explode(array(1, 1, 2)), false, num, [n#229] : +- SubqueryAlias __auto_generated_subquery_name : +- Project [get_json_object(json#225, $.name) AS name#218, json#225] : +- SubqueryAlias people :+- LocalRelation [json#225] +- SubqueryAlias a2 +- Aggregate [name#232], [name#232, count(1) AS c#220L] +- SubqueryAlias a +- Project [name#232, get_json_object(json#226, $.id) AS id#234, n#230] +- Generate explode(array(1, 1, 2)), false, num, [n#230] +- SubqueryAlias __auto_generated_subquery_name +- Project [get_json_object(json#226, $.name) AS name#232, json#226] +- SubqueryAlias people +- LocalRelation [json#226] {code} attrMapping: {code:java} attrMapping = {ArrayBuffer@9099} "ArrayBuffer" size = 2 0 = {Tuple2@17769} "(id#219,id#233)" 1 = {Tuple2@17770} "(id#219,id#234)" {code} was: We are using Spark version 3.0.2 in production and some ETLs contain multi-level CETs and the following error occurs when we join them. {code:java} java.lang.AssertionError: assertion failed: Found duplicate rewrite attributes at scala.Predef$.assert(Predef.scala:223) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:207) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:405) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:403) {code} I reproduced the problem with a simplified SQL as follows: {code:java} -- SQL with a as ( select name, get_json_object(json, '$.id') id, n from ( select get_json_object(json, '$.nam
[jira] [Comment Edited] (SPARK-36815) Found duplicate rewrite attributes
[ https://issues.apache.org/jira/browse/SPARK-36815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418064#comment-17418064 ] gaoyajun02 edited comment on SPARK-36815 at 9/21/21, 2:19 PM: -- https://issues.apache.org/jira/browse/SPARK-33272 fixes this issue, but the Spark 3.0.2 branch does not. Hi [~cloud_fan], can we open backport PRs for 3.0? was (Author: gaoyajun02): https://issues.apache.org/jira/browse/SPARK-33272 fixes this issue, but the Spark 3.0.2 branch does not. Hi [~cloud_fan], can we open backport PRs for 3.0.2? > Found duplicate rewrite attributes > -- > > Key: SPARK-36815 > URL: https://issues.apache.org/jira/browse/SPARK-36815 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.2 >Reporter: gaoyajun02 >Priority: Major > Fix For: 3.0.2 > > > We are using Spark version 3.0.2 in production and some ETLs contain > multi-level CETs and the following error occurs when we join them. > {code:java} > java.lang.AssertionError: assertion failed: Found duplicate rewrite > attributes at scala.Predef$.assert(Predef.scala:223) at > org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:207) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:405) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:403) > {code} > I reproduced the problem with a simplified SQL as follows: > {code:java} > -- SQL > with > a as ( select name, get_json_object(json, '$.id') id, n from ( > select get_json_object(json, '$.name') name, json from values > ('{"name":"a", "id": 1}' ) people(json) > ) LATERAL VIEW explode(array(1, 1, 2)) num as n ), > b as ( select a1.name, a1.id, a1.n from a a1 left join (select name, count(1) > c from a group by name) a2 on a1.name = a2.name) > select b1.name, b1.n, b1.id from b b1 join b b2 on b1.name = b2.name;{code} > In debugging I found that a reference to the root Project existed in both > subqueries, and when `ResolveReferences` resolved the conflict, `rewrite` > occurred in both subqueries, containing two new attrMapping, and they were > both eventually passed to the root Project, leading to this error > plan: > {code:java} > Project [name#218, id#219, n#229] > +- Join LeftOuter, (name#218 = name#232) >:- SubqueryAlias a1 >: +- SubqueryAlias a >: +- Project [name#218, get_json_object(json#225, $.id) AS id#219, > n#229] >:+- Generate explode(array(1, 1, 2)), false, num, [n#229] >: +- SubqueryAlias __auto_generated_subquery_name >: +- Project [get_json_object(json#225, $.name) AS name#218, > json#225] >: +- SubqueryAlias people >:+- LocalRelation [json#225] >+- SubqueryAlias a2 > +- Aggregate [name#232], [name#232, count(1) AS c#220L] > +- SubqueryAlias a > +- Project [name#232, get_json_object(json#226, $.id) AS id#219, > n#230] >+- Generate explode(array(1, 1, 2)), false, num, [n#230] > +- SubqueryAlias __auto_generated_subquery_name > +- Project [get_json_object(json#226, $.name) AS > name#232, json#226] > +- SubqueryAlias people >+- LocalRelation [json#226] > {code} > newPlan: > {code:java} > !Project [name#218, id#219, n#229] > +- Join LeftOuter, (name#218 = name#232) >:- SubqueryAlias a1 >: +- SubqueryAlias a >: +- Project [name#218, get_json_object(json#225, $.id) AS id#233, > n#229] >:+- Generate explode(array(1, 1, 2)), false, num, [n#229] >: +- SubqueryAlias __auto_generated_subquery_name >: +- Project [get_json_object(json#225, $.name) AS name#218, > json#225] >: +- SubqueryAlias people >:+- LocalRelation [json#225] >+- SubqueryAlias a2 > +- Aggregate [name#232], [name#232, count(1) AS c#220L] > +- SubqueryAlias a > +- Project [name#232, get_json_object(json#226, $.id) AS id#234, > n#230] >+- Generate explode(array(1, 1, 2)), false, num, [n#230] > +- SubqueryAlias __auto_generated_subquery_name > +- Project [get_json_object(json#226, $.name) AS > name#232, json#226] > +- SubqueryAlias people >+- LocalRelation [json#226] > {code} > attrMapping: > {code:java} > attrMapping = {ArrayBuffer@9099} "ArrayBuffer" size = 2 > 0 = {Tupl
[jira] [Comment Edited] (SPARK-36815) Found duplicate rewrite attributes
[ https://issues.apache.org/jira/browse/SPARK-36815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418064#comment-17418064 ] gaoyajun02 edited comment on SPARK-36815 at 9/21/21, 12:17 PM: --- https://issues.apache.org/jira/browse/SPARK-33272 fixes this issue, but the Spark 3.0.2 branch does not. Hi [~cloud_fan], can we open backport PRs for 3.0.2? was (Author: gaoyajun02): https://issues.apache.org/jira/browse/SPARK-33272 fixes this issue, but the Spark 3.0.2 branch does not > Found duplicate rewrite attributes > -- > > Key: SPARK-36815 > URL: https://issues.apache.org/jira/browse/SPARK-36815 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.2 >Reporter: gaoyajun02 >Priority: Major > Fix For: 3.0.2 > > > We are using Spark version 3.0.2 in production and some ETLs contain > multi-level CETs and the following error occurs when we join them. > {code:java} > java.lang.AssertionError: assertion failed: Found duplicate rewrite > attributes at scala.Predef$.assert(Predef.scala:223) at > org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:207) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:405) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:403) > {code} > I reproduced the problem with a simplified SQL as follows: > {code:java} > -- SQL > with > a as ( select name, get_json_object(json, '$.id') id, n from ( > select get_json_object(json, '$.name') name, json from values > ('{"name":"a", "id": 1}' ) people(json) > ) LATERAL VIEW explode(array(1, 1, 2)) num as n ), > b as ( select a1.name, a1.id, a1.n from a a1 left join (select name, count(1) > c from a group by name) a2 on a1.name = a2.name) > select b1.name, b1.n, b1.id from b b1 join b b2 on b1.name = b2.name;{code} > In debugging I found that a reference to the root Project existed in both > subqueries, and when `ResolveReferences` resolved the conflict, `rewrite` > occurred in both subqueries, containing two new attrMapping, and they were > both eventually passed to the root Project, leading to this error > plan: > {code:java} > Project [name#218, id#219, n#229] > +- Join LeftOuter, (name#218 = name#232) >:- SubqueryAlias a1 >: +- SubqueryAlias a >: +- Project [name#218, get_json_object(json#225, $.id) AS id#219, > n#229] >:+- Generate explode(array(1, 1, 2)), false, num, [n#229] >: +- SubqueryAlias __auto_generated_subquery_name >: +- Project [get_json_object(json#225, $.name) AS name#218, > json#225] >: +- SubqueryAlias people >:+- LocalRelation [json#225] >+- SubqueryAlias a2 > +- Aggregate [name#232], [name#232, count(1) AS c#220L] > +- SubqueryAlias a > +- Project [name#232, get_json_object(json#226, $.id) AS id#219, > n#230] >+- Generate explode(array(1, 1, 2)), false, num, [n#230] > +- SubqueryAlias __auto_generated_subquery_name > +- Project [get_json_object(json#226, $.name) AS > name#232, json#226] > +- SubqueryAlias people >+- LocalRelation [json#226] > {code} > newPlan: > {code:java} > !Project [name#218, id#219, n#229] > +- Join LeftOuter, (name#218 = name#232) >:- SubqueryAlias a1 >: +- SubqueryAlias a >: +- Project [name#218, get_json_object(json#225, $.id) AS id#233, > n#229] >:+- Generate explode(array(1, 1, 2)), false, num, [n#229] >: +- SubqueryAlias __auto_generated_subquery_name >: +- Project [get_json_object(json#225, $.name) AS name#218, > json#225] >: +- SubqueryAlias people >:+- LocalRelation [json#225] >+- SubqueryAlias a2 > +- Aggregate [name#232], [name#232, count(1) AS c#220L] > +- SubqueryAlias a > +- Project [name#232, get_json_object(json#226, $.id) AS id#234, > n#230] >+- Generate explode(array(1, 1, 2)), false, num, [n#230] > +- SubqueryAlias __auto_generated_subquery_name > +- Project [get_json_object(json#226, $.name) AS > name#232, json#226] > +- SubqueryAlias people >+- LocalRelation [json#226] > {code} > attrMapping: > {code:java} > attrMapping = {ArrayBuffer@9099} "ArrayBuffer" size = 2 > 0 = {Tuple2@17769} "(id#219,id#233)" > 1 = {Tuple2@17770} "
[jira] [Commented] (SPARK-36815) Found duplicate rewrite attributes
[ https://issues.apache.org/jira/browse/SPARK-36815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418064#comment-17418064 ] gaoyajun02 commented on SPARK-36815: https://issues.apache.org/jira/browse/SPARK-33272 fixes this issue, but the Spark 3.0.2 branch does not > Found duplicate rewrite attributes > -- > > Key: SPARK-36815 > URL: https://issues.apache.org/jira/browse/SPARK-36815 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.2 >Reporter: gaoyajun02 >Priority: Major > Fix For: 3.0.2 > > > We are using Spark version 3.0.2 in production and some ETLs contain > multi-level CETs and the following error occurs when we join them. > {code:java} > java.lang.AssertionError: assertion failed: Found duplicate rewrite > attributes at scala.Predef$.assert(Predef.scala:223) at > org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:207) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:405) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:403) > {code} > I reproduced the problem with a simplified SQL as follows: > {code:java} > -- SQL > with > a as ( select name, get_json_object(json, '$.id') id, n from ( > select get_json_object(json, '$.name') name, json from values > ('{"name":"a", "id": 1}' ) people(json) > ) LATERAL VIEW explode(array(1, 1, 2)) num as n ), > b as ( select a1.name, a1.id, a1.n from a a1 left join (select name, count(1) > c from a group by name) a2 on a1.name = a2.name) > select b1.name, b1.n, b1.id from b b1 join b b2 on b1.name = b2.name;{code} > In debugging I found that a reference to the root Project existed in both > subqueries, and when `ResolveReferences` resolved the conflict, `rewrite` > occurred in both subqueries, containing two new attrMapping, and they were > both eventually passed to the root Project, leading to this error > plan: > {code:java} > Project [name#218, id#219, n#229] > +- Join LeftOuter, (name#218 = name#232) >:- SubqueryAlias a1 >: +- SubqueryAlias a >: +- Project [name#218, get_json_object(json#225, $.id) AS id#219, > n#229] >:+- Generate explode(array(1, 1, 2)), false, num, [n#229] >: +- SubqueryAlias __auto_generated_subquery_name >: +- Project [get_json_object(json#225, $.name) AS name#218, > json#225] >: +- SubqueryAlias people >:+- LocalRelation [json#225] >+- SubqueryAlias a2 > +- Aggregate [name#232], [name#232, count(1) AS c#220L] > +- SubqueryAlias a > +- Project [name#232, get_json_object(json#226, $.id) AS id#219, > n#230] >+- Generate explode(array(1, 1, 2)), false, num, [n#230] > +- SubqueryAlias __auto_generated_subquery_name > +- Project [get_json_object(json#226, $.name) AS > name#232, json#226] > +- SubqueryAlias people >+- LocalRelation [json#226] > {code} > newPlan: > {code:java} > !Project [name#218, id#219, n#229] > +- Join LeftOuter, (name#218 = name#232) >:- SubqueryAlias a1 >: +- SubqueryAlias a >: +- Project [name#218, get_json_object(json#225, $.id) AS id#233, > n#229] >:+- Generate explode(array(1, 1, 2)), false, num, [n#229] >: +- SubqueryAlias __auto_generated_subquery_name >: +- Project [get_json_object(json#225, $.name) AS name#218, > json#225] >: +- SubqueryAlias people >:+- LocalRelation [json#225] >+- SubqueryAlias a2 > +- Aggregate [name#232], [name#232, count(1) AS c#220L] > +- SubqueryAlias a > +- Project [name#232, get_json_object(json#226, $.id) AS id#234, > n#230] >+- Generate explode(array(1, 1, 2)), false, num, [n#230] > +- SubqueryAlias __auto_generated_subquery_name > +- Project [get_json_object(json#226, $.name) AS > name#232, json#226] > +- SubqueryAlias people >+- LocalRelation [json#226] > {code} > attrMapping: > {code:java} > attrMapping = {ArrayBuffer@9099} "ArrayBuffer" size = 2 > 0 = {Tuple2@17769} "(id#219,id#233)" > 1 = {Tuple2@17770} "(id#219,id#234)" > {code} > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additi
[jira] [Updated] (SPARK-36815) Found duplicate rewrite attributes
[ https://issues.apache.org/jira/browse/SPARK-36815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-36815: --- Description: We are using Spark version 3.0.2 in production and some ETLs contain multi-level CETs and the following error occurs when we join them. {code:java} java.lang.AssertionError: assertion failed: Found duplicate rewrite attributes at scala.Predef$.assert(Predef.scala:223) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:207) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:405) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:403) {code} I reproduced the problem with a simplified SQL as follows: {code:java} -- SQL with a as ( select name, get_json_object(json, '$.id') id, n from ( select get_json_object(json, '$.name') name, json from values ('{"name":"a", "id": 1}' ) people(json) ) LATERAL VIEW explode(array(1, 1, 2)) num as n ), b as ( select a1.name, a1.id, a1.n from a a1 left join (select name, count(1) c from a group by name) a2 on a1.name = a2.name) select b1.name, b1.n, b1.id from b b1 join b b2 on b1.name = b2.name;{code} In debugging I found that a reference to the root Project existed in both subqueries, and when `ResolveReferences` resolved the conflict, `rewrite` occurred in both subqueries, containing two new attrMapping, and they were both eventually passed to the root Project, leading to this error plan: {code:java} Project [name#218, id#219, n#229] +- Join LeftOuter, (name#218 = name#232) :- SubqueryAlias a1 : +- SubqueryAlias a : +- Project [name#218, get_json_object(json#225, $.id) AS id#219, n#229] :+- Generate explode(array(1, 1, 2)), false, num, [n#229] : +- SubqueryAlias __auto_generated_subquery_name : +- Project [get_json_object(json#225, $.name) AS name#218, json#225] : +- SubqueryAlias people :+- LocalRelation [json#225] +- SubqueryAlias a2 +- Aggregate [name#232], [name#232, count(1) AS c#220L] +- SubqueryAlias a +- Project [name#232, get_json_object(json#226, $.id) AS id#219, n#230] +- Generate explode(array(1, 1, 2)), false, num, [n#230] +- SubqueryAlias __auto_generated_subquery_name +- Project [get_json_object(json#226, $.name) AS name#232, json#226] +- SubqueryAlias people +- LocalRelation [json#226] {code} newPlan: {code:java} !Project [name#218, id#219, n#229] +- Join LeftOuter, (name#218 = name#232) :- SubqueryAlias a1 : +- SubqueryAlias a : +- Project [name#218, get_json_object(json#225, $.id) AS id#233, n#229] :+- Generate explode(array(1, 1, 2)), false, num, [n#229] : +- SubqueryAlias __auto_generated_subquery_name : +- Project [get_json_object(json#225, $.name) AS name#218, json#225] : +- SubqueryAlias people :+- LocalRelation [json#225] +- SubqueryAlias a2 +- Aggregate [name#232], [name#232, count(1) AS c#220L] +- SubqueryAlias a +- Project [name#232, get_json_object(json#226, $.id) AS id#234, n#230] +- Generate explode(array(1, 1, 2)), false, num, [n#230] +- SubqueryAlias __auto_generated_subquery_name +- Project [get_json_object(json#226, $.name) AS name#232, json#226] +- SubqueryAlias people +- LocalRelation [json#226] {code} attrMapping: {code:java} attrMapping = {ArrayBuffer@9099} "ArrayBuffer" size = 2 0 = {Tuple2@17769} "(id#219,id#233)" 1 = {Tuple2@17770} "(id#219,id#234)" {code} was: We are using Spark version 3.0.2 in production and some ETLs contain multi-level CETs and the following error occurs when we join them. {code:java} java.lang.AssertionError: assertion failed: Found duplicate rewrite attributes at scala.Predef$.assert(Predef.scala:223) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:207) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:405) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:403) {code} I reproduced the problem with a simplified SQL as follows: {code:java} -- SQL with a as ( select name, get_json_object(json, '$.id') id, n from ( select get_json_object(json, '$.na
[jira] [Created] (SPARK-36815) Found duplicate rewrite attributes
gaoyajun02 created SPARK-36815: -- Summary: Found duplicate rewrite attributes Key: SPARK-36815 URL: https://issues.apache.org/jira/browse/SPARK-36815 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.2 Reporter: gaoyajun02 Fix For: 3.0.2 We are using Spark version 3.0.2 in production and some ETLs contain multi-level CETs and the following error occurs when we join them. {code:java} java.lang.AssertionError: assertion failed: Found duplicate rewrite attributes at scala.Predef$.assert(Predef.scala:223) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:207) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:405) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:403) {code} I reproduced the problem with a simplified SQL as follows: {code:java} -- SQL with a as ( select name, get_json_object(json, '$.id') id, n from ( select get_json_object(json, '$.name') name, json from values ('{"name":"a", "id": 1}' ) people(json) ) LATERAL VIEW explode(array(1, 1, 2)) num as n ), b as ( select a1.name, a1.id, a1.n from a a1 left join (select name, count(1) c from a group by name) a2 on a1.name = a2.name) select b1.name, b1.n, b1.id from b b1 join b b2 on b1.name = b2.name;{code} In debugging I found that a reference to the root Project existed in both subqueries, and when `ResolveReferences` resolved the conflict, `rewrite` occurred in both subqueries, containing two new attrMapping, and they were both eventually passed to the root Project, leading to this error -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36630) Add the option to use physical statistics to avoid large tables being broadcast
[ https://issues.apache.org/jira/browse/SPARK-36630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-36630: --- Parent: (was: SPARK-33828) Issue Type: Question (was: Sub-task) > Add the option to use physical statistics to avoid large tables being > broadcast > --- > > Key: SPARK-36630 > URL: https://issues.apache.org/jira/browse/SPARK-36630 > Project: Spark > Issue Type: Question > Components: SQL >Affects Versions: 3.2.0 >Reporter: gaoyajun02 >Priority: Major > > Currently when AQE's queryStage is not materialized, it uses the stats of the > logical plan to estimate whether the plan can be converted to BHJ, and in > some scenarios the estimated value is several orders of magnitude smaller > than the actual broadcast data, which can lead to large tables being broadcast -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-36630) Add the option to use physical statistics to avoid large tables being broadcast
[ https://issues.apache.org/jira/browse/SPARK-36630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-36630: --- Comment: was deleted (was: close it) > Add the option to use physical statistics to avoid large tables being > broadcast > --- > > Key: SPARK-36630 > URL: https://issues.apache.org/jira/browse/SPARK-36630 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: gaoyajun02 >Priority: Major > > Currently when AQE's queryStage is not materialized, it uses the stats of the > logical plan to estimate whether the plan can be converted to BHJ, and in > some scenarios the estimated value is several orders of magnitude smaller > than the actual broadcast data, which can lead to large tables being broadcast -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-36630) Add the option to use physical statistics to avoid large tables being broadcast
[ https://issues.apache.org/jira/browse/SPARK-36630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 closed SPARK-36630. -- close it > Add the option to use physical statistics to avoid large tables being > broadcast > --- > > Key: SPARK-36630 > URL: https://issues.apache.org/jira/browse/SPARK-36630 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: gaoyajun02 >Priority: Major > > Currently when AQE's queryStage is not materialized, it uses the stats of the > logical plan to estimate whether the plan can be converted to BHJ, and in > some scenarios the estimated value is several orders of magnitude smaller > than the actual broadcast data, which can lead to large tables being broadcast -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36630) Add the option to use physical statistics to avoid large tables being broadcast
[ https://issues.apache.org/jira/browse/SPARK-36630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 resolved SPARK-36630. Resolution: Fixed > Add the option to use physical statistics to avoid large tables being > broadcast > --- > > Key: SPARK-36630 > URL: https://issues.apache.org/jira/browse/SPARK-36630 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: gaoyajun02 >Priority: Major > > Currently when AQE's queryStage is not materialized, it uses the stats of the > logical plan to estimate whether the plan can be converted to BHJ, and in > some scenarios the estimated value is several orders of magnitude smaller > than the actual broadcast data, which can lead to large tables being broadcast -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36630) Add the option to use physical statistics to avoid large tables being broadcast
[ https://issues.apache.org/jira/browse/SPARK-36630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408901#comment-17408901 ] gaoyajun02 commented on SPARK-36630: Found my issue is similar to https://issues.apache.org/jira/browse/SPARK-35264. I can set the autoBroadcastThreshold to -1 to disable logical plan stats, and set spark.sql.adaptive.autoBroadcastJoinThreshold to control the threshold for physical stats > Add the option to use physical statistics to avoid large tables being > broadcast > --- > > Key: SPARK-36630 > URL: https://issues.apache.org/jira/browse/SPARK-36630 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: gaoyajun02 >Priority: Major > > Currently when AQE's queryStage is not materialized, it uses the stats of the > logical plan to estimate whether the plan can be converted to BHJ, and in > some scenarios the estimated value is several orders of magnitude smaller > than the actual broadcast data, which can lead to large tables being broadcast -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36630) Add the option to use physical statistics to avoid large tables being broadcast
[ https://issues.apache.org/jira/browse/SPARK-36630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-36630: --- Parent: SPARK-33828 Issue Type: Sub-task (was: Improvement) > Add the option to use physical statistics to avoid large tables being > broadcast > --- > > Key: SPARK-36630 > URL: https://issues.apache.org/jira/browse/SPARK-36630 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: gaoyajun02 >Priority: Major > > Currently when AQE's queryStage is not materialized, it uses the stats of the > logical plan to estimate whether the plan can be converted to BHJ, and in > some scenarios the estimated value is several orders of magnitude smaller > than the actual broadcast data, which can lead to large tables being broadcast -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36630) Add the option to use physical statistics to avoid large tables being broadcast
[ https://issues.apache.org/jira/browse/SPARK-36630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-36630: --- Description: Currently when AQE's queryStage is not materialized, it uses the stats of the logical plan to estimate whether the plan can be converted to BHJ, and in some scenarios the estimated value is several orders of magnitude smaller than the actual broadcast data, which can lead to large tables being broadcast (was: Currently when AQE's queryStage is not materialized, it uses the stats of the logical plan to estimate whether the plan can be converted to BHJ, and in some scenarios the estimated value is several orders of magnitude larger than the actual broadcast data, which can lead to large tables being broadcast) > Add the option to use physical statistics to avoid large tables being > broadcast > --- > > Key: SPARK-36630 > URL: https://issues.apache.org/jira/browse/SPARK-36630 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: gaoyajun02 >Priority: Major > > Currently when AQE's queryStage is not materialized, it uses the stats of the > logical plan to estimate whether the plan can be converted to BHJ, and in > some scenarios the estimated value is several orders of magnitude smaller > than the actual broadcast data, which can lead to large tables being broadcast -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36630) Add the option to use physical statistics to avoid large tables being broadcast
[ https://issues.apache.org/jira/browse/SPARK-36630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-36630: --- Description: Currently when AQE's queryStage is not materialized, it uses the stats of the logical plan to estimate whether the plan can be converted to BHJ, and in some scenarios the estimated value is several orders of magnitude larger than the actual broadcast data, which can lead to large tables being broadcast (was: Currently AQE is turned on, when queryStage is not materialized, it uses the stats of the logical plan to estimate whether the plan can be converted to BHJ, and in some scenarios the estimated value is several orders of magnitude larger than the actual broadcast data, which can lead to large tables being broadcast) > Add the option to use physical statistics to avoid large tables being > broadcast > --- > > Key: SPARK-36630 > URL: https://issues.apache.org/jira/browse/SPARK-36630 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: gaoyajun02 >Priority: Major > > Currently when AQE's queryStage is not materialized, it uses the stats of the > logical plan to estimate whether the plan can be converted to BHJ, and in > some scenarios the estimated value is several orders of magnitude larger than > the actual broadcast data, which can lead to large tables being broadcast -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36630) Add the option to use physical statistics to avoid large tables being broadcast
gaoyajun02 created SPARK-36630: -- Summary: Add the option to use physical statistics to avoid large tables being broadcast Key: SPARK-36630 URL: https://issues.apache.org/jira/browse/SPARK-36630 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: gaoyajun02 Currently AQE is turned on, when queryStage is not materialized, it uses the stats of the logical plan to estimate whether the plan can be converted to BHJ, and in some scenarios the estimated value is several orders of magnitude larger than the actual broadcast data, which can lead to large tables being broadcast -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36339) aggsBuffer should collect AggregateExpression in the map range
[ https://issues.apache.org/jira/browse/SPARK-36339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-36339: --- Description: show demo for this ISSUE: {code:java} // SQL without error SELECT name, count(name) c FROM VALUES ('Alice'), ('Bob') people(name) GROUP BY name GROUPING SETS(name); // An error is reported after exchanging the order of the query columns: SELECT count(name) c, name FROM VALUES ('Alice'), ('Bob') people(name) GROUP BY name GROUPING SETS(name); {code} The error message is: {code:java} Error in query: expression 'people.`name`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;; Aggregate [name#5, spark_grouping_id#3], [count(name#1) AS c#0L, name#1] +- Expand [List(name#1, name#4, 0)], [name#1, name#5, spark_grouping_id#3] +- Project [name#1, name#1 AS name#4] +- SubqueryAlias `people` +- LocalRelation [name#1] {code} So far, I have checked that there is no problem before version 2.3. During debugging, I found that the behavior of constructAggregateExprs in ResolveGroupingAnalytics has changed. {code:java} /* * Construct new aggregate expressions by replacing grouping functions. */ private def constructAggregateExprs( groupByExprs: Seq[Expression], aggregations: Seq[NamedExpression], groupByAliases: Seq[Alias], groupingAttrs: Seq[Expression], gid: Attribute): Seq[NamedExpression] = aggregations.map { // collect all the found AggregateExpression, so we can check an expression is part of // any AggregateExpression or not. val aggsBuffer = ArrayBuffer[Expression]() // Returns whether the expression belongs to any expressions in `aggsBuffer` or not. def isPartOfAggregation(e: Expression): Boolean = { aggsBuffer.exists(a => a.find(_ eq e).isDefined) } replaceGroupingFunc(_, groupByExprs, gid).transformDown { // AggregateExpression should be computed on the unmodified value of its argument // expressions, so we should not replace any references to grouping expression // inside it. case e: AggregateExpression => aggsBuffer += e e case e if isPartOfAggregation(e) => e case e => // Replace expression by expand output attribute. val index = groupByAliases.indexWhere(_.child.semanticEquals(e)) if (index == -1) { e } else { groupingAttrs(index) } }.asInstanceOf[NamedExpression] } {code} When performing aggregations.map, the aggsBuffer here seems to be outside the scope of the map. It can store the AggregateExpression of all the elements processed by the map function, but this is not before 2.3. was: show demo for this ISSUE: This SQL This SQL can be executed normally {code:java} // SQL without error SELECT name, count(name) c FROM VALUES ('Alice'), ('Bob') people(name) GROUP BY name GROUPING SETS(name); // An error is reported after exchanging the order of the query columns: SELECT count(name) c, name FROM VALUES ('Alice'), ('Bob') people(name) GROUP BY name GROUPING SETS(name); {code} The error message is: {code:java} Error in query: expression 'people.`name`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;; Aggregate [name#5, spark_grouping_id#3], [count(name#1) AS c#0L, name#1] +- Expand [List(name#1, name#4, 0)], [name#1, name#5, spark_grouping_id#3] +- Project [name#1, name#1 AS name#4] +- SubqueryAlias `people` +- LocalRelation [name#1] {code} So far, I have checked that there is no problem before version 2.3. During debugging, I found that the behavior of constructAggregateExprs in ResolveGroupingAnalytics has changed. {code:java} /* * Construct new aggregate expressions by replacing grouping functions. */ private def constructAggregateExprs( groupByExprs: Seq[Expression], aggregations: Seq[NamedExpression], groupByAliases: Seq[Alias], groupingAttrs: Seq[Expression], gid: Attribute): Seq[NamedExpression] = aggregations.map { // collect all the found AggregateExpression, so we can check an expression is part of // any AggregateExpression or not. val aggsBuffer = ArrayBuffer[Expression]() // Returns whether the expression belongs to any expressions in `aggsBuffer` or not. def isPartOfAggregation(e: Expression): Boolean = { aggsBuffer.exists(a => a.find(_ eq e).isDefined) } replaceGroupingFunc(_, groupByExprs, gid).transformDown { // AggregateExpression should be computed on the unmodified value of its argument // expressions, so
[jira] [Created] (SPARK-36339) aggsBuffer should collect AggregateExpression in the map range
gaoyajun02 created SPARK-36339: -- Summary: aggsBuffer should collect AggregateExpression in the map range Key: SPARK-36339 URL: https://issues.apache.org/jira/browse/SPARK-36339 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.2, 3.0.3, 2.4.8 Reporter: gaoyajun02 show demo for this ISSUE: This SQL This SQL can be executed normally {code:java} // SQL without error SELECT name, count(name) c FROM VALUES ('Alice'), ('Bob') people(name) GROUP BY name GROUPING SETS(name); // An error is reported after exchanging the order of the query columns: SELECT count(name) c, name FROM VALUES ('Alice'), ('Bob') people(name) GROUP BY name GROUPING SETS(name); {code} The error message is: {code:java} Error in query: expression 'people.`name`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;; Aggregate [name#5, spark_grouping_id#3], [count(name#1) AS c#0L, name#1] +- Expand [List(name#1, name#4, 0)], [name#1, name#5, spark_grouping_id#3] +- Project [name#1, name#1 AS name#4] +- SubqueryAlias `people` +- LocalRelation [name#1] {code} So far, I have checked that there is no problem before version 2.3. During debugging, I found that the behavior of constructAggregateExprs in ResolveGroupingAnalytics has changed. {code:java} /* * Construct new aggregate expressions by replacing grouping functions. */ private def constructAggregateExprs( groupByExprs: Seq[Expression], aggregations: Seq[NamedExpression], groupByAliases: Seq[Alias], groupingAttrs: Seq[Expression], gid: Attribute): Seq[NamedExpression] = aggregations.map { // collect all the found AggregateExpression, so we can check an expression is part of // any AggregateExpression or not. val aggsBuffer = ArrayBuffer[Expression]() // Returns whether the expression belongs to any expressions in `aggsBuffer` or not. def isPartOfAggregation(e: Expression): Boolean = { aggsBuffer.exists(a => a.find(_ eq e).isDefined) } replaceGroupingFunc(_, groupByExprs, gid).transformDown { // AggregateExpression should be computed on the unmodified value of its argument // expressions, so we should not replace any references to grouping expression // inside it. case e: AggregateExpression => aggsBuffer += e e case e if isPartOfAggregation(e) => e case e => // Replace expression by expand output attribute. val index = groupByAliases.indexWhere(_.child.semanticEquals(e)) if (index == -1) { e } else { groupingAttrs(index) } }.asInstanceOf[NamedExpression] } {code} When performing aggregations.map, the aggsBuffer here seems to be outside the scope of the map. It can store the AggregateExpression of all the elements processed by the map function, but this is not before 2.3. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36121) Write data loss caused by stage retry when enable v2 FileOutputCommitter
[ https://issues.apache.org/jira/browse/SPARK-36121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380646#comment-17380646 ] gaoyajun02 commented on SPARK-36121: Hi [~hyukjin.kwon], Thank you very much. It seems to be related to SPARK-27194. More precisely, my issue has the same phenomenon as SPARK-26682, we are using the earlier Spark 2.2.1 version. I will read and cherrypick this part of the patch. > Write data loss caused by stage retry when enable v2 FileOutputCommitter > > > Key: SPARK-36121 > URL: https://issues.apache.org/jira/browse/SPARK-36121 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.2.1, 3.0.1 >Reporter: gaoyajun02 >Priority: Critical > > All our ETL scenarios are configured: > mapreduce.fileoutputcommitter.algorithm.version=2, when shuffle fetchFailed > occurs, the stage retry is triggered, and then the zombie stage and the retry > stage may write tasks of the same part at the same time, and their task > directory and file name are exactly the same. This may cause data part loss > due to conflicts between delete and rename operations. > For example, this is also a data loss case I encountered recently: Stage 5.0 > is a zombie stage caused by shuffle FetchFailed, and stage 5.1 is a retry > stage. They have two tasks concurrently writing the same part file: > part-00298. > # The task of stage 5.1 has preemptively created part file: part-00298 and > written data. > # At the same time as the task commit of stage 5.1, the task of sage 5.0 is > going to create this part file to write data, because the file already > exists, it throw an exception and delete the task's temporary directory. > # Then stage 5.0 starts commitTask, it will traverse the sub-directories and > execute rename. At this time, because the file has been deleted, it finally > moves empty without any exception, which causes data loss. > > I read this part of the code, and currently I think of two ideas: > # Add stageAttemptNumber to taskAttemptPath to avoid conflicts. > # Check the number of files after commitTask, and throw an exception > directly when it is found to be missing. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36121) Write data loss caused by stage retry when enable v2 FileOutputCommitter
[ https://issues.apache.org/jira/browse/SPARK-36121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-36121: --- Description: All our ETL scenarios are configured: mapreduce.fileoutputcommitter.algorithm.version=2, when shuffle fetchFailed occurs, the stage retry is triggered, and then the zombie stage and the retry stage may write tasks of the same part at the same time, and their task directory and file name are exactly the same. This may cause data part loss due to conflicts between delete and rename operations. For example, this is also a data loss case I encountered recently: Stage 5.0 is a zombie stage caused by shuffle FetchFailed, and stage 5.1 is a retry stage. They have two tasks concurrently writing the same part file: part-00298. # The task of stage 5.1 has preemptively created part file: part-00298 and written data. # At the same time as the task commit of stage 5.1, the task of sage 5.0 is going to create this part file to write data, because the file already exists, it throw an exception and delete the task's temporary directory. # Then stage 5.0 starts commitTask, it will traverse the sub-directories and execute rename. At this time, because the file has been deleted, it finally moves empty without any exception, which causes data loss. I read this part of the code, and currently I think of two ideas: # Add stageAttemptNumber to taskAttemptPath to avoid conflicts. # Check the number of files after commitTask, and throw an exception directly when it is found to be missing. was: All our ETL scenarios are configured: mapreduce.fileoutputcommitter.algorithm.version=2, when shuffle fetchFailed occurs, the stage retry is triggered, and then the zombie stage and the retry stage may write tasks of the same part at the same time, and their task directory and file name are exactly the same. This may cause data part loss due to conflicts between file writing and rename operations. For example, this is also a data loss case I encountered recently: Stage 5.0 is a zombie stage caused by shuffle FetchFailed, and stage 5.1 is a retry stage. They have two tasks concurrently writing the same part file: part-00298. # The task of stage 5.1 has preemptively created part file: part-00298 and written data. # At the same time as the task commit of stage 5.1, the task of sage 5.0 is going to create this part file to write data, because the file already exists, it throw an exception and delete the task's temporary directory. # Then stage 5.0 starts commitTask, it will traverse the sub-directories and execute rename. At this time, because the file has been deleted, it finally moves empty without any exception, which causes data loss. I read this part of the code, and currently I think of two ideas: # Add stageAttemptNumber to taskAttemptPath to avoid conflicts. # Check the number of files after commitTask, and throw an exception directly when it is found to be missing. > Write data loss caused by stage retry when enable v2 FileOutputCommitter > > > Key: SPARK-36121 > URL: https://issues.apache.org/jira/browse/SPARK-36121 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.2.1, 3.0.1 >Reporter: gaoyajun02 >Priority: Critical > > All our ETL scenarios are configured: > mapreduce.fileoutputcommitter.algorithm.version=2, when shuffle fetchFailed > occurs, the stage retry is triggered, and then the zombie stage and the retry > stage may write tasks of the same part at the same time, and their task > directory and file name are exactly the same. This may cause data part loss > due to conflicts between delete and rename operations. > For example, this is also a data loss case I encountered recently: Stage 5.0 > is a zombie stage caused by shuffle FetchFailed, and stage 5.1 is a retry > stage. They have two tasks concurrently writing the same part file: > part-00298. > # The task of stage 5.1 has preemptively created part file: part-00298 and > written data. > # At the same time as the task commit of stage 5.1, the task of sage 5.0 is > going to create this part file to write data, because the file already > exists, it throw an exception and delete the task's temporary directory. > # Then stage 5.0 starts commitTask, it will traverse the sub-directories and > execute rename. At this time, because the file has been deleted, it finally > moves empty without any exception, which causes data loss. > > I read this part of the code, and currently I think of two ideas: > # Add stageAttemptNumber to taskAttemptPath to avoid conflicts. > # Check the number of files after commitTask, and throw an exception > directly when it is found to be missing. > > -- This message w
[jira] [Updated] (SPARK-36121) Write data loss caused by stage retry when enable v2 FileOutputCommitter
[ https://issues.apache.org/jira/browse/SPARK-36121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-36121: --- Description: All our ETL scenarios are configured: mapreduce.fileoutputcommitter.algorithm.version=2, when shuffle fetchFailed occurs, the stage retry is triggered, and then the zombie stage and the retry stage may write tasks of the same part at the same time, and their task directory and file name are exactly the same. This may cause data part loss due to conflicts between file writing and rename operations. For example, this is also a data loss case I encountered recently: Stage 5.0 is a zombie stage caused by shuffle FetchFailed, and stage 5.1 is a retry stage. They have two tasks concurrently writing the same part file: part-00298. # The task of stage 5.1 has preemptively created part file: part-00298 and written data. # At the same time as the task commit of stage 5.1, the task of sage 5.0 is going to create this part file to write data, because the file already exists, it throw an exception and delete the task's temporary directory. # Then stage 5.0 starts commitTask, it will traverse the sub-directories and execute rename. At this time, because the file has been deleted, it finally moves empty without any exception, which causes data loss. I read this part of the code, and currently I think of two ideas: # Add stageAttemptNumber to taskAttemptPath to avoid conflicts. # Check the number of files after commitTask, and throw an exception directly when it is found to be missing. was: All our ETL scenarios are configured: mapreduce.fileoutputcommitter.algorithm.version=2, when shuffle fetchFailed occurs, the stage retry is triggered, and then the zombie stage and the retry stage may write tasks of the same part at the same time, and their task directory and file name are exactly the same. This may cause data part loss due to conflicts between file writing and rename operations. For example, recently encountered a case of data loss: Stage 5.0 is a zombie stage caused by shuffle FetchFailed, and stage 5.1 is a retry stage. They have two tasks concurrently writing the same part file: part-00298. # The task of stage 5.1 has preemptively created part file: part-00298 and written data. # At the same time as the task commit of stage 5.1, the task of sage 5.0 is going to create this part file to write data, because the file already exists, it throw an exception and delete the task's temporary directory. # Then stage 5.0 starts commitTask, it will traverse the sub-directories and execute rename. At this time, because the file has been deleted, it finally moves without any exception, which causes data loss. > Write data loss caused by stage retry when enable v2 FileOutputCommitter > > > Key: SPARK-36121 > URL: https://issues.apache.org/jira/browse/SPARK-36121 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.2.1, 3.0.1 >Reporter: gaoyajun02 >Priority: Critical > > All our ETL scenarios are configured: > mapreduce.fileoutputcommitter.algorithm.version=2, when shuffle fetchFailed > occurs, the stage retry is triggered, and then the zombie stage and the retry > stage may write tasks of the same part at the same time, and their task > directory and file name are exactly the same. This may cause data part loss > due to conflicts between file writing and rename operations. > For example, this is also a data loss case I encountered recently: Stage 5.0 > is a zombie stage caused by shuffle FetchFailed, and stage 5.1 is a retry > stage. They have two tasks concurrently writing the same part file: > part-00298. > # The task of stage 5.1 has preemptively created part file: part-00298 and > written data. > # At the same time as the task commit of stage 5.1, the task of sage 5.0 is > going to create this part file to write data, because the file already > exists, it throw an exception and delete the task's temporary directory. > # Then stage 5.0 starts commitTask, it will traverse the sub-directories and > execute rename. At this time, because the file has been deleted, it finally > moves empty without any exception, which causes data loss. > > I read this part of the code, and currently I think of two ideas: > # Add stageAttemptNumber to taskAttemptPath to avoid conflicts. > # Check the number of files after commitTask, and throw an exception > directly when it is found to be missing. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36121) Write data loss caused by stage retry when enable v2 FileOutputCommitter
gaoyajun02 created SPARK-36121: -- Summary: Write data loss caused by stage retry when enable v2 FileOutputCommitter Key: SPARK-36121 URL: https://issues.apache.org/jira/browse/SPARK-36121 Project: Spark Issue Type: Bug Components: Input/Output Affects Versions: 3.0.1, 2.2.1 Reporter: gaoyajun02 All our ETL scenarios are configured: mapreduce.fileoutputcommitter.algorithm.version=2, when shuffle fetchFailed occurs, the stage retry is triggered, and then the zombie stage and the retry stage may write tasks of the same part at the same time, and their task directory and file name are exactly the same. This may cause data part loss due to conflicts between file writing and rename operations. For example, recently encountered a case of data loss: Stage 5.0 is a zombie stage caused by shuffle FetchFailed, and stage 5.1 is a retry stage. They have two tasks concurrently writing the same part file: part-00298. # The task of stage 5.1 has preemptively created part file: part-00298 and written data. # At the same time as the task commit of stage 5.1, the task of sage 5.0 is going to create this part file to write data, because the file already exists, it throw an exception and delete the task's temporary directory. # Then stage 5.0 starts commitTask, it will traverse the sub-directories and execute rename. At this time, because the file has been deleted, it finally moves without any exception, which causes data loss. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-35920) Upgrade to Chill 0.10.0
[ https://issues.apache.org/jira/browse/SPARK-35920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371213#comment-17371213 ] gaoyajun02 edited comment on SPARK-35920 at 6/29/21, 9:31 AM: -- The common/unsafe module loses this dependency in this update and needs to be added {code:java} com.esotericsoftware kryo-shaded 4.0.2 {code} I submitted a PR to try to fix this problem, please see if there is anything else to add? was (Author: gaoyajun02): The spark core module loses this dependency in this update and needs to be added {code:java} com.esotericsoftware kryo-shaded 4.0.2 {code} > Upgrade to Chill 0.10.0 > --- > > Key: SPARK-35920 > URL: https://issues.apache.org/jira/browse/SPARK-35920 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35920) Upgrade to Chill 0.10.0
[ https://issues.apache.org/jira/browse/SPARK-35920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371213#comment-17371213 ] gaoyajun02 commented on SPARK-35920: The spark core module loses this dependency in this update and needs to be added {code:java} com.esotericsoftware kryo-shaded 4.0.2 {code} > Upgrade to Chill 0.10.0 > --- > > Key: SPARK-35920 > URL: https://issues.apache.org/jira/browse/SPARK-35920 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org