jerqi commented on issue #198: URL: https://github.com/apache/incubator-uniffle/issues/198#issuecomment-1246287622
> Following up on this problem.
>
> I found that the gRPC client sometimes throws a DEADLINE_EXCEEDED exception, as follows:
>
> ```
> org.apache.uniffle.common.exception.RssException: Failed to read shuffle data with ShuffleServerGrpcClient for host[10.67.67.68], port[21000] due to DEADLINE_EXCEEDED: deadline exceeded after 59.999946594s. [closed=[], committed=[remote_addr=10.67.67.68/10.67.67.68:21000]]
>     at org.apache.uniffle.storage.handler.impl.LocalFileClientRemoteReadHandler.readShuffleData(LocalFileClientRemoteReadHandler.java:88)
>     at org.apache.uniffle.storage.handler.impl.DataSkippableReadHandler.readShuffleData(DataSkippableReadHandler.java:83)
>     at org.apache.uniffle.storage.handler.impl.LocalFileClientReadHandler.readShuffleData(LocalFileClientReadHandler.java:79)
>     at org.apache.uniffle.storage.handler.impl.LocalFileQuorumClientReadHandler.readShuffleData(LocalFileQuorumClientReadHandler.java:79)
>     at org.apache.uniffle.storage.handler.impl.ComposedClientReadHandler.readShuffleData(ComposedClientReadHandler.java:112)
>     at org.apache.uniffle.client.impl.ShuffleReadClientImpl.read(ShuffleReadClientImpl.java:195)
>     at org.apache.uniffle.client.impl.ShuffleReadClientImpl.readShuffleBlockData(ShuffleReadClientImpl.java:131)
>     at org.apache.spark.shuffle.reader.RssShuffleDataIterator.hasNext(RssShuffleDataIterator.java:101)
>     at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
>     at org.apache.spark.shuffle.reader.RssShuffleReader$MultiPartitionIterator.hasNext(RssShuffleReader.java:238)
>     at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>     at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage20.sort_addToSorter_0$(Unknown Source)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage20.processNext(Unknown Source)
>     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
>     at org.apache.spark.sql.execution.RowIteratorFromScala.advanceNext(RowIterator.scala:83)
>     at org.apache.spark.sql.execution.joins.SortMergeFullOuterJoinScanner.advancedRight(SortMergeJoinExec.scala:1000)
>     at org.apache.spark.sql.execution.joins.SortMergeFullOuterJoinScanner.<init>(SortMergeJoinExec.scala:975)
>     at org.apache.spark.sql.execution.joins.SortMergeJoinExec.$anonfun$doExecute$1(SortMergeJoinExec.scala:220)
>     at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:131)
>     at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> 22/09/13 10:43:48 ERROR ComposedClientReadHandler: Failed to read shuffle data from WARM handler
> ```
>
> However, I found that the shuffle server had already sent this response, yet the client side still threw the exception. What could cause this? Network? GC?
>
> Have you encountered similar problems? @jerqi

The shuffle server may not have sent the response in time, that is, before the client's roughly 60-second deadline expired.
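To illustrate why a response that the server does eventually send can still surface as a timeout on the client, here is a minimal sketch. It does not use Uniffle or gRPC code: `DeadlineSketch`, `fetch`, and `readWithDeadline` are hypothetical names, and a `CompletableFuture` with a bounded wait stands in for a call with a client-side deadline. The point is only that the client gives up once its own deadline passes, regardless of whether the server replies afterwards.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class DeadlineSketch {
    // Hypothetical stand-in for a shuffle server: replies after serverDelayMs.
    static CompletableFuture<String> fetch(long serverDelayMs) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(serverDelayMs); // simulated slow disk read, GC pause, etc.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "shuffle-data";
        });
    }

    // Client-side wait bounded by a deadline, loosely analogous to a call deadline.
    static String readWithDeadline(long serverDelayMs, long deadlineMs) {
        try {
            return fetch(serverDelayMs).get(deadlineMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The server still completes its work later, but the client no longer waits.
            return "DEADLINE_EXCEEDED";
        } catch (InterruptedException | ExecutionException e) {
            return "ERROR";
        }
    }

    public static void main(String[] args) {
        System.out.println(readWithDeadline(50, 1000)); // reply arrives before the deadline
        System.out.println(readWithDeadline(500, 100)); // reply arrives after the deadline
    }
}
```

In the second call the simulated server does produce its reply, but only after the client has stopped waiting, which matches the symptom described above: the server log shows a sent response while the client reports a deadline error.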
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
