zuston commented on issue #198:
URL: https://github.com/apache/incubator-uniffle/issues/198#issuecomment-1246249263

   Following up on this problem.
   
   I found that the gRPC client sometimes throws a DEADLINE_EXCEEDED exception like the following:
   ```
   org.apache.uniffle.common.exception.RssException: Failed to read shuffle data with ShuffleServerGrpcClient for host[10.67.67.68], port[21000] due to DEADLINE_EXCEEDED: deadline exceeded after 59.999946594s. [closed=[], committed=[remote_addr=10.67.67.68/10.67.67.68:21000]]
        at org.apache.uniffle.storage.handler.impl.LocalFileClientRemoteReadHandler.readShuffleData(LocalFileClientRemoteReadHandler.java:88)
        at org.apache.uniffle.storage.handler.impl.DataSkippableReadHandler.readShuffleData(DataSkippableReadHandler.java:83)
        at org.apache.uniffle.storage.handler.impl.LocalFileClientReadHandler.readShuffleData(LocalFileClientReadHandler.java:79)
        at org.apache.uniffle.storage.handler.impl.LocalFileQuorumClientReadHandler.readShuffleData(LocalFileQuorumClientReadHandler.java:79)
        at org.apache.uniffle.storage.handler.impl.ComposedClientReadHandler.readShuffleData(ComposedClientReadHandler.java:112)
        at org.apache.uniffle.client.impl.ShuffleReadClientImpl.read(ShuffleReadClientImpl.java:195)
        at org.apache.uniffle.client.impl.ShuffleReadClientImpl.readShuffleBlockData(ShuffleReadClientImpl.java:131)
        at org.apache.spark.shuffle.reader.RssShuffleDataIterator.hasNext(RssShuffleDataIterator.java:101)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
        at org.apache.spark.shuffle.reader.RssShuffleReader$MultiPartitionIterator.hasNext(RssShuffleReader.java:238)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage20.sort_addToSorter_0$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage20.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
        at org.apache.spark.sql.execution.RowIteratorFromScala.advanceNext(RowIterator.scala:83)
        at org.apache.spark.sql.execution.joins.SortMergeFullOuterJoinScanner.advancedRight(SortMergeJoinExec.scala:1000)
        at org.apache.spark.sql.execution.joins.SortMergeFullOuterJoinScanner.<init>(SortMergeJoinExec.scala:975)
        at org.apache.spark.sql.execution.joins.SortMergeJoinExec.$anonfun$doExecute$1(SortMergeJoinExec.scala:220)
        at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
   22/09/13 10:43:48 ERROR ComposedClientReadHandler: Failed to read shuffle data from WARM handler
   ```
   
   But I found that the shuffle server had already sent the response, yet the client side still threw the exception. What could cause this? Network delay? GC pauses?
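   
   For what it's worth, my understanding of gRPC deadline semantics (a minimal sketch, not Uniffle code) is that the deadline clock starts on the client when the call is created and keeps running end to end. So a long client-side stall, e.g. a GC pause, after the server has already written its reply can still expire the deadline while the response is being delivered. The `io.grpc.Deadline` snippet below illustrates this; the 100 ms budget and the `Thread.sleep` are stand-ins for the 60 s timeout and the stall.
   
   ```java
   import java.util.concurrent.TimeUnit;
   
   import io.grpc.Deadline;
   
   public class DeadlineClockDemo {
     public static void main(String[] args) throws InterruptedException {
       // The deadline clock starts when the call is created on the client,
       // not when the server replies; everything after this point counts
       // against the budget (queuing, server work, response delivery).
       Deadline deadline = Deadline.after(100, TimeUnit.MILLISECONDS);
   
       // Simulate a client-side stall (e.g. a long GC pause) that happens
       // after the server has already sent its response but before the
       // client finishes reading it.
       Thread.sleep(150);
   
       // gRPC re-checks the deadline on the client while delivering the
       // response, so the call would still fail with DEADLINE_EXCEEDED
       // even though the server did its part in time.
       System.out.println("expired = " + deadline.isExpired()); // prints: expired = true
     }
   }
   ```
   
   If that is the mechanism, the server having sent the response does not rule out the client expiring the deadline, which would be consistent with the GC guess above.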
   
   Have you met similar problems? @jerqi
    

