[jira] [Updated] (SPARK-47759) Apps being stuck with an unexpected stack trace when reading/parsing a time string

     [ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Xiong updated SPARK-47759:
-----------------------------
    Description: 
h2. Symptom

Our Spark apps occasionally get stuck with an unexpected stack trace when reading/parsing a legitimate time string. We manually killed the stuck app instances, and the retry went through on the same cluster without requiring any app code change.

*[Stack Trace 1]* The stack trace doesn't make sense, since *120s* is a legitimate time string. The app runs on emr-7.0.0 with the Spark 3.5.0 runtime.
{code:java}
Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time must be specified as seconds (s), milliseconds (ms), microseconds (us), minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us.
Failed to parse time string: 120s
    at org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258)
    at org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275)
    at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166)
    at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131)
    at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41)
    at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33)
    at org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533)
    at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640)
    at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697)
    at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682)
    at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163)
    at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
    at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
    at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
    at
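
To illustrate why *120s* should parse cleanly, here is a minimal Java sketch of a time-string parser that accepts the suffixes listed in the error message above. It is an illustration under assumptions, not Spark's `JavaUtils` code: the class name `TimeStringSketch`, the method `parseSeconds`, and the regular expression are all hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not Spark's JavaUtils.
final class TimeStringSketch {
    // Suffixes taken from the error message in the stack trace.
    private static final Map<String, TimeUnit> SUFFIXES = Map.of(
        "us", TimeUnit.MICROSECONDS,
        "ms", TimeUnit.MILLISECONDS,
        "s", TimeUnit.SECONDS,
        "m", TimeUnit.MINUTES,
        "min", TimeUnit.MINUTES,
        "h", TimeUnit.HOURS,
        "d", TimeUnit.DAYS);

    // Assumed grammar: an optional sign, digits, then an optional lowercase suffix.
    private static final Pattern TIME = Pattern.compile("(-?[0-9]+)([a-z]+)?");

    // Parses a string such as "120s" and returns the value in whole seconds.
    // A missing suffix is treated as seconds.
    static long parseSeconds(String str) {
        String lower = str.toLowerCase(Locale.ROOT).trim();
        Matcher m = TIME.matcher(lower);
        if (!m.matches()) {
            throw new NumberFormatException("Failed to parse time string: " + str);
        }
        long value = Long.parseLong(m.group(1));
        String suffix = m.group(2);
        TimeUnit unit = TimeUnit.SECONDS;
        if (suffix != null) {
            unit = SUFFIXES.get(suffix);
            if (unit == null) {
                throw new NumberFormatException("Invalid suffix: \"" + suffix + "\"");
            }
        }
        return TimeUnit.SECONDS.convert(value, unit);
    }

    public static void main(String[] args) {
        System.out.println(parseSeconds("120s")); // prints 120
    }
}
```

With this sketch, `parseSeconds("120s")` and `parseSeconds("2min")` both return 120, while an unknown suffix such as `120x` raises a `NumberFormatException`. A deterministic parser of this shape has no input-dependent reason to reject `120s`, which is why the trace above is surprising.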
[jira] [Updated] (SPARK-47759) Apps being stuck with an unexpected stack trace when reading/parsing a time string

     [ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Xiong updated SPARK-47759:
-----------------------------
    Summary: Apps being stuck with an unexpected stack trace when reading/parsing a time string  (was: App being stuck with an unexpected stack trace when reading/parsing a time string)

> Apps being stuck with an unexpected stack trace when reading/parsing a time string
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-47759
>                 URL: https://issues.apache.org/jira/browse/SPARK-47759
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.5.0, 4.0.0
>            Reporter: Bo Xiong
>            Assignee: Bo Xiong
>            Priority: Critical
>              Labels: hang, pull-request-available, stuck, threadsafe
>             Fix For: 3.5.0, 4.0.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> h2. Symptom
> It's observed that our Spark apps occasionally got stuck with an unexpected stack trace when reading/parsing a time string.
>
> *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 runtime.
> {code:java}
> Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time must be specified as seconds (s), milliseconds (ms), microseconds (us), minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us.
> Failed to parse time string: 120s
>     at org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258)
>     at org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275)
>     at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166)
>     at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131)
>     at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41)
>     at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33)
>     at org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533)
>     at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640)
>     at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697)
>     at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682)
>     at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163)
>     at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
>     at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
>     at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
>     at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
>     at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
>     at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
>     at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
>     at org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
>     at