Hi there,

We're joining very large datasets in 5-minute buckets in an on-prem Kubernetes environment.

We hit this situation very often. I suspect the shuffle frame is corrupted,
since a frame size of 5135603447297303916 bytes (roughly 5.1 exabytes)
doesn't make sense at all.


We're running Spark 3.5.2 on on-prem Kubernetes with the Spark Operator.


So I'd really like to know the following, and would really appreciate it if
anyone could give me a clue:


   1. Is this really critical, or can I ignore it?
   2. What's the root cause? If the frame is corrupted, why is it happening
   and how do I fix it?
   3. Any recommendations for changing the Spark configuration, for instance
   tuning some network settings, or any other tuning factors?
   4. Could upgrading Spark to the latest version fix this?
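To make question 3 concrete, these are the kinds of knobs I had in mind. This is only a sketch of candidates I found while searching; the values are guesses I have not tried, and the class/jar names are placeholders for our actual job:

```shell
# Candidate shuffle/network settings only -- untested guesses, not a known fix.
spark-submit \
  --conf spark.sql.shuffle.partitions=2000 \
  --conf spark.network.maxRemoteBlockSizeFetchToMem=200m \
  --conf spark.reducer.maxSizeInFlight=48m \
  --conf spark.shuffle.io.maxRetries=10 \
  --conf spark.network.timeout=300s \
  --class com.example.JoinJob job.jar
```

Would tuning any of these even matter for a frame-size error like the one below, or is that the wrong direction entirely?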


Any feedback or comment would be appreciated.

Thx
Jason




25/09/09 19:53:28 WARN TransportChannelHandler: Exception in connection from /10.0.41.19:53800
java.lang.IllegalArgumentException: Too large frame: 5135603447297303916
    at org.sparkproject.guava.base.Preconditions.checkArgument(Preconditions.java:119)
    at org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:148)
    at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:98)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:840)
