Hi there,
We're joining very big datasets in 5 minutes bucket in on-prem k3 env. We have this situation very often, i think shuffle partition is corrupted as 5.1 TB didn't make sense at all. We're running spark ver 3.5.2 in on-prem kubernetes with Spark Operator. So I'd really like to know these, and really appreciated if anyone give any clue : 1. Is this really critical or i can ignore? 2. what's the root cause, if its corrupted then why is it happening and how i fix this? 3. Any recommendation for changing spark configuration, for instance, tune some network configurations, or any other tuning factors? 4. Can upgrading spark to the latest fix this? Any feedback or comment would be appreciated. Thx Jason 25/09/09 19:53:28 WARN TransportChannelHandler: Exception in connection from /10.0.41.19:53800 │ │ java.lang.IllegalArgumentException: Too large frame: 5135603447297303916 │ │ at org.sparkproject.guava.base.Preconditions.checkArgument(Preconditions.java:119) │ │ at org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:148) │ │ at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:98) │ │ at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) │ │ at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) │ │ at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) │ │ at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) │ │ at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) │ │ at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) │ │ at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) │ │ at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) │ │ at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) │ │ at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724) │ │ at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650) │ │ at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) │ │ at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) │ │ at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) │ │ at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) │ │ at java.base/java.lang.Thread.run(Thread.java:840)