rhh777 opened a new issue, #154:
URL: https://github.com/apache/incubator-uniffle/issues/154
I hit an error while testing Hudi.
This is the Spark driver log. The error is `Empty assignment to Shuffle Server`:
```
52278 [dag-scheduler-event-loop] INFO org.apache.spark.shuffle.RssShuffleManager - Generate application id used in rss: spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844
52281 [dag-scheduler-event-loop] ERROR com.tencent.rss.client.impl.ShuffleWriteClientImpl - Empty assignment to Shuffle Server
52282 [dag-scheduler-event-loop] ERROR com.tencent.rss.client.impl.ShuffleWriteClientImpl - Error happened when getShuffleAssignments with appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[6], numMaps[0], partitionNumPerRange[1] to coordinator
52283 [dag-scheduler-event-loop] WARN org.apache.spark.scheduler.DAGScheduler - Creating new stage failed due to exception - job: 5
com.tencent.rss.common.exception.RssException: Error happened when getShuffleAssignments with appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[6], numMaps[0], partitionNumPerRange[1] to coordinator
    at com.tencent.rss.client.impl.ShuffleWriteClientImpl.throwExceptionIfNecessary(ShuffleWriteClientImpl.java:440)
    at com.tencent.rss.client.impl.ShuffleWriteClientImpl.getShuffleAssignments(ShuffleWriteClientImpl.java:291)
    at org.apache.spark.shuffle.RssShuffleManager.registerShuffle(RssShuffleManager.java:247)
    at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:97)
    at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:87)
    at org.apache.spark.rdd.RDD.$anonfun$dependencies$2(RDD.scala:264)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.rdd.RDD.dependencies(RDD.scala:260)
    at org.apache.spark.scheduler.DAGScheduler.getShuffleDependenciesAndResourceProfiles(DAGScheduler.scala:634)
    at org.apache.spark.scheduler.DAGScheduler.getMissingAncestorShuffleDependencies(DAGScheduler.scala:597)
    at org.apache.spark.scheduler.DAGScheduler.getOrCreateShuffleMapStage(DAGScheduler.scala:394)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$getOrCreateParentStages$1(DAGScheduler.scala:580)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
    at scala.collection.TraversableLike.map(TraversableLike.scala:238)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
    at scala.collection.mutable.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:48)
    at scala.collection.SetLike.map(SetLike.scala:104)
    at scala.collection.SetLike.map$(SetLike.scala:104)
    at scala.collection.mutable.AbstractSet.map(Set.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.getOrCreateParentStages(DAGScheduler.scala:579)
    at org.apache.spark.scheduler.DAGScheduler.createResultStage(DAGScheduler.scala:564)
    at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1115)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2396)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2388)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2377)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
52287 [main] INFO org.apache.spark.scheduler.DAGScheduler - Job 5 failed: countByKey at BaseSparkCommitActionExecutor.java:191, took 0.076660 s
```
This is the coordinator log. The requested partitionNum is 0:
```
[INFO] 2022-08-11 11:29:49,335 Grpc-301 CoordinatorGrpcService getShuffleAssignments - Request of getShuffleAssignments for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[6], partitionNum[0], partitionNumPerRange[1], replica[1]
```
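One plausible reading of the two logs above is that a request for zero partitions leaves the coordinator with nothing to hand out, and the client then rejects the empty result. The sketch below illustrates that reading only. It is not the Uniffle/Firestorm implementation: `assign_partition_ranges` and `check_assignments` are hypothetical names, and round-robin placement is an assumption made for the example.

```python
from typing import Dict, List, Tuple


def assign_partition_ranges(partition_num: int,
                            partition_num_per_range: int,
                            servers: List[str]) -> Dict[Tuple[int, int], str]:
    """Hypothetical round-robin assignment of partition ranges to servers.

    This is an illustration, not the real coordinator strategy.
    """
    assignments = {}
    starts = range(0, partition_num, partition_num_per_range)
    for i, start in enumerate(starts):
        end = min(start + partition_num_per_range, partition_num) - 1
        assignments[(start, end)] = servers[i % len(servers)]
    return assignments


def check_assignments(assignments: Dict[Tuple[int, int], str]) -> None:
    """Hypothetical client-side guard that rejects an empty assignment."""
    if not assignments:
        raise RuntimeError("Empty assignment to Shuffle Server")


# Server addresses as they appear in the coordinator log.
servers = ["10.1.3.174-19990", "10.1.3.175-19990", "10.1.3.173-19990"]

normal = assign_partition_ranges(200, 1, servers)  # like shuffleId[0]..[5]
empty = assign_partition_ranges(0, 1, servers)     # like shuffleId[6]

check_assignments(normal)  # passes: every range has a server
try:
    check_assignments(empty)
except RuntimeError as err:
    print(f"rejected: {err}")
```

With 200 partitions every range gets a server even though only 3 servers are available. With 0 partitions the loop never runs, so the result is empty regardless of how many servers are registered.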
Full coordinator log:
```
[INFO] 2022-08-11 11:29:26,946 Grpc-267 CoordinatorGrpcService getShuffleAssignments - Request of getShuffleAssignments for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[0], partitionNum[200], partitionNumPerRange[1], replica[1]
[WARN] 2022-08-11 11:29:26,946 Grpc-267 PartitionBalanceAssignmentStrategy assign - Can't get expected servers [13] and found only [3]
[INFO] 2022-08-11 11:29:26,946 Grpc-267 CoordinatorGrpcService logAssignmentResult - Shuffle Servers of assignment for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[0] are [10.1.3.174-19990, 10.1.3.175-19990, 10.1.3.173-19990]
[INFO] 2022-08-11 11:29:27,033 Grpc-270 CoordinatorGrpcService getShuffleAssignments - Request of getShuffleAssignments for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[1], partitionNum[200], partitionNumPerRange[1], replica[1]
[WARN] 2022-08-11 11:29:27,033 Grpc-270 PartitionBalanceAssignmentStrategy assign - Can't get expected servers [13] and found only [3]
[INFO] 2022-08-11 11:29:27,034 Grpc-270 CoordinatorGrpcService logAssignmentResult - Shuffle Servers of assignment for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[1] are [10.1.3.174-19990, 10.1.3.175-19990, 10.1.3.173-19990]
[INFO] 2022-08-11 11:29:37,957 ApplicationManager-0 ApplicationManager statusCheck - Start to check status for 2 applications
[INFO] 2022-08-11 11:29:43,047 Grpc-283 CoordinatorGrpcService getShuffleAssignments - Request of getShuffleAssignments for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[2], partitionNum[200], partitionNumPerRange[1], replica[1]
[WARN] 2022-08-11 11:29:43,048 Grpc-283 PartitionBalanceAssignmentStrategy assign - Can't get expected servers [13] and found only [3]
[INFO] 2022-08-11 11:29:43,048 Grpc-283 CoordinatorGrpcService logAssignmentResult - Shuffle Servers of assignment for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[2] are [10.1.3.174-19990, 10.1.3.175-19990, 10.1.3.173-19990]
[INFO] 2022-08-11 11:29:49,165 Grpc-293 CoordinatorGrpcService getShuffleAssignments - Request of getShuffleAssignments for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[3], partitionNum[200], partitionNumPerRange[1], replica[1]
[WARN] 2022-08-11 11:29:49,166 Grpc-293 PartitionBalanceAssignmentStrategy assign - Can't get expected servers [13] and found only [3]
[INFO] 2022-08-11 11:29:49,166 Grpc-293 CoordinatorGrpcService logAssignmentResult - Shuffle Servers of assignment for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[3] are [10.1.3.174-19990, 10.1.3.175-19990, 10.1.3.173-19990]
[INFO] 2022-08-11 11:29:49,247 Grpc-298 CoordinatorGrpcService getShuffleAssignments - Request of getShuffleAssignments for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[4], partitionNum[200], partitionNumPerRange[1], replica[1]
[WARN] 2022-08-11 11:29:49,247 Grpc-298 PartitionBalanceAssignmentStrategy assign - Can't get expected servers [13] and found only [3]
[INFO] 2022-08-11 11:29:49,247 Grpc-298 CoordinatorGrpcService logAssignmentResult - Shuffle Servers of assignment for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[4] are [10.1.3.174-19990, 10.1.3.175-19990, 10.1.3.173-19990]
[INFO] 2022-08-11 11:29:49,267 Grpc-297 CoordinatorGrpcService getShuffleAssignments - Request of getShuffleAssignments for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[5], partitionNum[200], partitionNumPerRange[1], replica[1]
[WARN] 2022-08-11 11:29:49,267 Grpc-297 PartitionBalanceAssignmentStrategy assign - Can't get expected servers [13] and found only [3]
[INFO] 2022-08-11 11:29:49,268 Grpc-297 CoordinatorGrpcService logAssignmentResult - Shuffle Servers of assignment for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[5] are [10.1.3.174-19990, 10.1.3.175-19990, 10.1.3.173-19990]
[INFO] 2022-08-11 11:29:49,335 Grpc-301 CoordinatorGrpcService getShuffleAssignments - Request of getShuffleAssignments for appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[6], partitionNum[0], partitionNumPerRange[1], replica[1]
[WARN] 2022-08-11 11:29:49,335 Grpc-301 PartitionBalanceAssignmentStrategy assign - Can't get expected servers [13] and found only [3]
[INFO] 2022-08-11 11:30:07,957 ApplicationManager-0 ApplicationManager statusCheck - Start to check status for 2 applications
[INFO] 2022-08-11 11:30:07,957 ApplicationManager-0 ApplicationManager statusCheck - Remove expired application:spark-d7f3e51ca713472e88568db90c91bdea1660187027133
```
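To spot the failing request in a longer coordinator log, a small script can list every shuffle whose `getShuffleAssignments` request carried `partitionNum[0]`. This is a minimal sketch that assumes the log line layout shown above. `zero_partition_shuffles` is a hypothetical helper, and the sample text is copied from the log in this report.

```python
import re
from typing import List

# Matches "shuffleId[N], partitionNum[M]"; \s+ tolerates a line wrap in between.
REQUEST = re.compile(r"shuffleId\[(\d+)\],\s+partitionNum\[(\d+)\]")


def zero_partition_shuffles(log_text: str) -> List[int]:
    """Return shuffle ids whose assignment request asked for 0 partitions."""
    return [int(shuffle_id)
            for shuffle_id, partition_num in REQUEST.findall(log_text)
            if int(partition_num) == 0]


sample = """\
[INFO] 2022-08-11 11:29:49,267 Grpc-297 CoordinatorGrpcService
getShuffleAssignments - Request of getShuffleAssignments for
appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[5],
partitionNum[200], partitionNumPerRange[1], replica[1]
[INFO] 2022-08-11 11:29:49,335 Grpc-301 CoordinatorGrpcService
getShuffleAssignments - Request of getShuffleAssignments for
appId[spark-8304854d3e234816bba3c3a1e8bd0ade1660188566844], shuffleId[6],
partitionNum[0], partitionNumPerRange[1], replica[1]
"""

print(zero_partition_shuffles(sample))  # prints [6]
```

Because the pattern allows any whitespace between the two fields, it works on both wrapped and unwrapped copies of the log.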
Environment:
```
uniffle: firestorm 0.4.1
spark: 3.1.2
hudi: 0.11.1
k8s: v1.21.3
```