leixm opened a new issue, #273:
URL: https://github.com/apache/incubator-uniffle/issues/273

   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Search before asking
   
   - [X] I have searched in the [issues](https://github.com/apache/incubator-uniffle/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Describe the bug
   
   The registerShuffle interface of a ShuffleServer receives an empty remoteStoragePath, which eventually causes getShuffleResult to fail.
   
   Stranger still, the same shuffle has to be registered with 15 ShuffleServers in total, and **only one ShuffleServer receives an empty remoteStoragePath**.
   
   **Why does ShuffleServer receive an empty remoteStoragePath?**
   org.apache.spark.shuffle.RssShuffleManager#registerShuffle sets the path in remoteStorage to an empty string, which is the default used when RSS_REMOTE_STORAGE_PATH is not set in sparkConf (see the code below). In a concurrent scenario, one thread can be setting remoteStorage to an empty string while another thread is using remoteStorage, so that second thread registers with an empty path and getShuffleResult eventually fails.
   `remoteStorage = new RemoteStorageInfo(sparkConf.get(RssSparkConfig.RSS_REMOTE_STORAGE_PATH.key(), ""));`
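The race described above can be sketched in isolation. This is a minimal illustration, not the actual RssShuffleManager code: `StorageInfo`, `ShuffleRegistrar`, the `resolver` callback (standing in for whatever later fills in the real path) and the `observer` hook (standing in for a second thread that reads the field mid-registration) are all hypothetical names introduced here. It assumes the shared field is reset first and overwritten with the resolved path afterwards, and shows one possible remedy: build the value in a local variable and publish it to the shared field only once it is complete.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for RemoteStorageInfo; only the path matters here.
final class StorageInfo {
    private final String path;
    StorageInfo(String path) { this.path = path; }
    String getPath() { return path; }
}

// Hypothetical manager with one shared field, mirroring the shape of the bug.
final class ShuffleRegistrar {
    private volatile StorageInfo remoteStorage = new StorageInfo("");

    StorageInfo current() { return remoteStorage; }

    // Racy shape: the shared field is overwritten with the configured default
    // (possibly "") before the real path is known.
    StorageInfo registerRacy(String configuredPath,
                             Supplier<String> resolver,
                             Runnable observer) {
        remoteStorage = new StorageInfo(configuredPath); // window opens here
        observer.run();                                  // a concurrent reader would run here
        if (remoteStorage.getPath().isEmpty()) {
            remoteStorage = new StorageInfo(resolver.get()); // window closes here
        }
        return remoteStorage;
    }

    // Safer shape: work on a local variable and assign the shared field once,
    // so readers only ever see a previously published, complete value.
    StorageInfo registerSafe(String configuredPath,
                             Supplier<String> resolver,
                             Runnable observer) {
        StorageInfo local = new StorageInfo(configuredPath);
        observer.run();                                  // reader still sees the old value
        if (local.getPath().isEmpty()) {
            local = new StorageInfo(resolver.get());
        }
        remoteStorage = local;                           // single publication
        return local;
    }
}
```

The `observer` hook makes the window deterministic for demonstration; in a real application the interleaving depends on thread scheduling, which would be consistent with only one of the 15 registrations seeing the empty path. Passing the local value to the registration call instead of re-reading the shared field is the design choice that closes the window.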
   
   getShuffleResult exception stack:
   ```
   [ERROR] 2022-10-19 02:59:51,672 Grpc-997 HdfsStorageManager getStorageByAppId - Can't find HDFS storage for appId[application_1664275719770_10420755_1666119585202]
   [ERROR] 2022-10-19 02:59:51,672 Grpc-997 ShuffleServerGrpcService getShuffleResult - Error happened when get shuffle result for appId[application_1664275719770_10420755_1666119585202], shuffleId[4], partitionId[13]
   java.lang.NullPointerException
           at org.apache.uniffle.server.ShuffleTaskManager.getFinishedBlockIds(ShuffleTaskManager.java:281)
           at org.apache.uniffle.server.ShuffleServerGrpcService.getShuffleResult(ShuffleServerGrpcService.java:361)
           at org.apache.uniffle.proto.ShuffleServerGrpc$MethodHandlers.invoke(ShuffleServerGrpc.java:923)
           at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
           at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
           at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
           at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:352)
           at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:866)
           at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
           at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
           at java.lang.Thread.run(Thread.java:745)
   ```
   
   ### Affects Version(s)
   
   0.6.0
   
   ### Uniffle Server Log Output
   
   ```logtalk
   Abnormal ShuffleServer:
   [INFO] 2022-10-19 02:59:46,066 Grpc-215 ShuffleServerGrpcService registerShuffle - Get register request for appId[application_1664275719770_10420755_1666119585202], shuffleId[4], remoteStorage[] with 66 partition ranges
   
   Normal ShuffleServer:
   [INFO] 2022-10-19 02:59:47,385 Grpc-974 ShuffleServerGrpcService registerShuffle - Get register request for appId[application_1664275719770_10420755_1666119585202], shuffleId[4], remoteStorage[hdfs://xxxxxxx/tmp/rss/shuffle_data] with 67 partition ranges
   ```
   
   
   ### Uniffle Engine Log Output
   
   _No response_
   
   ### Uniffle Server Configurations
   
   _No response_
   
   ### Uniffle Engine Configurations
   
   _No response_
   
   ### Additional context
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!

