rhh777 opened a new issue, #154:
URL: https://github.com/apache/incubator-uniffle/issues/154

   While testing Hudi, I got an error.
   This is the Spark driver log. The error is `Empty assignment to Shuffle Server`:
   ```
   52278 [dag-scheduler-event-loop] INFO  org.apache.spark.shuffle.RssShuffleManager  - Generate application id used in rss: spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844
   52281 [dag-scheduler-event-loop] ERROR com.tencent.rss.client.impl.ShuffleWriteClientImpl  - Empty assignment to Shuffle Server
   52282 [dag-scheduler-event-loop] ERROR com.tencent.rss.client.impl.ShuffleWriteClientImpl  - Error happened when getShuffleAssignments with appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[6], numMaps[0], partitionNumPerRange[1] to coordinator
   52283 [dag-scheduler-event-loop] WARN  org.apache.spark.scheduler.DAGScheduler  - Creating new stage failed due to exception - job: 5
   com.tencent.rss.common.exception.RssException: Error happened when getShuffleAssignments with appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[6], numMaps[0], partitionNumPerRange[1] to coordinator
           at com.tencent.rss.client.impl.ShuffleWriteClientImpl.throwExceptionIfNecessary(ShuffleWriteClientImpl.java:440)
           at com.tencent.rss.client.impl.ShuffleWriteClientImpl.getShuffleAssignments(ShuffleWriteClientImpl.java:291)
           at org.apache.spark.shuffle.RssShuffleManager.registerShuffle(RssShuffleManager.java:247)
           at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:97)
           at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:87)
           at org.apache.spark.rdd.RDD.$anonfun$dependencies$2(RDD.scala:264)
           at scala.Option.getOrElse(Option.scala:189)
           at org.apache.spark.rdd.RDD.dependencies(RDD.scala:260)
           at org.apache.spark.scheduler.DAGScheduler.getShuffleDependenciesAndResourceProfiles(DAGScheduler.scala:634)
           at org.apache.spark.scheduler.DAGScheduler.getMissingAncestorShuffleDependencies(DAGScheduler.scala:597)
           at org.apache.spark.scheduler.DAGScheduler.getOrCreateShuffleMapStage(DAGScheduler.scala:394)
           at org.apache.spark.scheduler.DAGScheduler.$anonfun$getOrCreateParentStages$1(DAGScheduler.scala:580)
           at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
           at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
           at scala.collection.TraversableLike.map(TraversableLike.scala:238)
           at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
           at scala.collection.mutable.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:48)
           at scala.collection.SetLike.map(SetLike.scala:104)
           at scala.collection.SetLike.map$(SetLike.scala:104)
           at scala.collection.mutable.AbstractSet.map(Set.scala:48)
           at org.apache.spark.scheduler.DAGScheduler.getOrCreateParentStages(DAGScheduler.scala:579)
           at org.apache.spark.scheduler.DAGScheduler.createResultStage(DAGScheduler.scala:564)
           at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1115)
           at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2396)
           at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2388)
           at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2377)
           at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
   52287 [main] INFO  org.apache.spark.scheduler.DAGScheduler  - Job 5 failed: countByKey at BaseSparkCommitActionExecutor.java:191, took 0.076660 s
   ```
   
   This is the coordinator log. The requested partitionNum is 0:
   ```
   [INFO] 2022-08-11 11:29:49,335 Grpc-301 CoordinatorGrpcService getShuffleAssignments - Request of getShuffleAssignments for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[6], partitionNum[0], partitionNumPerRange[1], replica[1]
   ```
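
   A minimal sketch of why a request with `partitionNum[0]` could end up as an empty assignment. This is not the Uniffle/Firestorm implementation: the class `AssignmentSketch`, the methods `assign` and `requireNonEmpty`, and the round-robin strategy are all assumptions made up for illustration. Only the server addresses and the error text are taken from the logs.

   ```java
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;

   // Hypothetical illustration only; names and strategy are not from Uniffle.
   public class AssignmentSketch {

       // Assumed round-robin strategy: partition p goes to server p % servers.size().
       // With partitionNum == 0 the loop body never runs, so the map stays empty.
       static Map<Integer, String> assign(int partitionNum, List<String> servers) {
           Map<Integer, String> result = new LinkedHashMap<>();
           for (int p = 0; p < partitionNum; p++) {
               result.put(p, servers.get(p % servers.size()));
           }
           return result;
       }

       // Assumed client-side check that rejects an empty assignment,
       // reusing the message seen in the driver log.
       static void requireNonEmpty(Map<Integer, String> assignment) {
           if (assignment.isEmpty()) {
               throw new IllegalStateException("Empty assignment to Shuffle Server");
           }
       }

       public static void main(String[] args) {
           List<String> servers =
               List.of("10.1.3.173-19990", "10.1.3.174-19990", "10.1.3.175-19990");

           // 200 partitions: every partition gets a server.
           System.out.println(assign(200, servers).size());

           // 0 partitions: nothing to assign, so the check fails.
           try {
               requireNonEmpty(assign(0, servers));
           } catch (IllegalStateException e) {
               System.out.println(e.getMessage());
           }
       }
   }
   ```

   Under these assumptions, shuffles 0 to 5 (200 partitions each) would be assigned normally, while shuffle 6 (0 partitions) would yield an empty map even though three servers are available.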
   
   Full coordinator log:
   ```
   [INFO] 2022-08-11 11:29:26,946 Grpc-267 CoordinatorGrpcService getShuffleAssignments - Request of getShuffleAssignments for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[0], partitionNum[200], partitionNumPerRange[1], replica[1]
   [WARN] 2022-08-11 11:29:26,946 Grpc-267 PartitionBalanceAssignmentStrategy assign - Can't get expected servers [13] and found only [3]
   [INFO] 2022-08-11 11:29:26,946 Grpc-267 CoordinatorGrpcService logAssignmentResult - Shuffle Servers of assignment for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[0] are [10.1.3.174-19990, 10.1.3.175-19990, 10.1.3.173-19990]
   [INFO] 2022-08-11 11:29:27,033 Grpc-270 CoordinatorGrpcService getShuffleAssignments - Request of getShuffleAssignments for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[1], partitionNum[200], partitionNumPerRange[1], replica[1]
   [WARN] 2022-08-11 11:29:27,033 Grpc-270 PartitionBalanceAssignmentStrategy assign - Can't get expected servers [13] and found only [3]
   [INFO] 2022-08-11 11:29:27,034 Grpc-270 CoordinatorGrpcService logAssignmentResult - Shuffle Servers of assignment for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[1] are [10.1.3.174-19990, 10.1.3.175-19990, 10.1.3.173-19990]
   [INFO] 2022-08-11 11:29:37,957 ApplicationManager-0 ApplicationManager statusCheck - Start to check status for 2 applications
   [INFO] 2022-08-11 11:29:43,047 Grpc-283 CoordinatorGrpcService getShuffleAssignments - Request of getShuffleAssignments for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[2], partitionNum[200], partitionNumPerRange[1], replica[1]
   [WARN] 2022-08-11 11:29:43,048 Grpc-283 PartitionBalanceAssignmentStrategy assign - Can't get expected servers [13] and found only [3]
   [INFO] 2022-08-11 11:29:43,048 Grpc-283 CoordinatorGrpcService logAssignmentResult - Shuffle Servers of assignment for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[2] are [10.1.3.174-19990, 10.1.3.175-19990, 10.1.3.173-19990]
   [INFO] 2022-08-11 11:29:49,165 Grpc-293 CoordinatorGrpcService getShuffleAssignments - Request of getShuffleAssignments for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[3], partitionNum[200], partitionNumPerRange[1], replica[1]
   [WARN] 2022-08-11 11:29:49,166 Grpc-293 PartitionBalanceAssignmentStrategy assign - Can't get expected servers [13] and found only [3]
   [INFO] 2022-08-11 11:29:49,166 Grpc-293 CoordinatorGrpcService logAssignmentResult - Shuffle Servers of assignment for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[3] are [10.1.3.174-19990, 10.1.3.175-19990, 10.1.3.173-19990]
   [INFO] 2022-08-11 11:29:49,247 Grpc-298 CoordinatorGrpcService getShuffleAssignments - Request of getShuffleAssignments for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[4], partitionNum[200], partitionNumPerRange[1], replica[1]
   [WARN] 2022-08-11 11:29:49,247 Grpc-298 PartitionBalanceAssignmentStrategy assign - Can't get expected servers [13] and found only [3]
   [INFO] 2022-08-11 11:29:49,247 Grpc-298 CoordinatorGrpcService logAssignmentResult - Shuffle Servers of assignment for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[4] are [10.1.3.174-19990, 10.1.3.175-19990, 10.1.3.173-19990]
   [INFO] 2022-08-11 11:29:49,267 Grpc-297 CoordinatorGrpcService getShuffleAssignments - Request of getShuffleAssignments for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[5], partitionNum[200], partitionNumPerRange[1], replica[1]
   [WARN] 2022-08-11 11:29:49,267 Grpc-297 PartitionBalanceAssignmentStrategy assign - Can't get expected servers [13] and found only [3]
   [INFO] 2022-08-11 11:29:49,268 Grpc-297 CoordinatorGrpcService logAssignmentResult - Shuffle Servers of assignment for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[5] are [10.1.3.174-19990, 10.1.3.175-19990, 10.1.3.173-19990]
   [INFO] 2022-08-11 11:29:49,335 Grpc-301 CoordinatorGrpcService getShuffleAssignments - Request of getShuffleAssignments for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[6], partitionNum[0], partitionNumPerRange[1], replica[1]
   [WARN] 2022-08-11 11:29:49,335 Grpc-301 PartitionBalanceAssignmentStrategy assign - Can't get expected servers [13] and found only [3]
   [INFO] 2022-08-11 11:30:07,957 ApplicationManager-0 ApplicationManager statusCheck - Start to check status for 2 applications
   [INFO] 2022-08-11 11:30:07,957 ApplicationManager-0 ApplicationManager statusCheck - Remove expired application:spark-d7f3e51ca713472e88568db90c91bdea1660187027133
   ```
   
   Environment:
   ```
   uniffle: firestorm 0.4.1
   spark: 3.1.2
   hudi: 0.11.1
   k8s: v1.21.3
   ```
   

