TaoYang526 commented on a change in pull request #11248: [FLINK-16299] Release containers recovered from previous attempt in w…
URL: https://github.com/apache/flink/pull/11248#discussion_r386823957
 
 

 ##########
 File path: flink-yarn/src/main/java/org/apache/flink/yarn/YarnResourceManager.java
 ##########
 @@ -464,7 +472,15 @@ public void onContainerStarted(ContainerId containerId, Map<String, ByteBuffer>
 
        @Override
        public void onContainerStatusReceived(ContainerId containerId, ContainerStatus containerStatus) {
-               // We are not interested in getting container status
+               // We fetch the status of the container from the previous attempts.
+               if (containerStatus.getState() == ContainerState.NEW) {
 
 Review comment:
   > Are you suggesting that calling NMClientAsync.getContainerStatusAsync on a NEW container might result in onGetContainerStatusError on some Hadoop versions but in onContainerStatusReceived on others?
   
   No, the two callbacks coexist in Hadoop: onContainerStatusReceived is for containers that have already been started by the AM via NMClient#startContainers, while onGetContainerStatusError is for containers that have not been started by the AM, or for other causes such as a lost NM.
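   To make the distinction concrete, here is a toy model (not Hadoop code) of the two callbacks as described above. `StatusCallback`, `FakeNodeManager` and `RecordingCallback` are hypothetical names; plain strings stand in for `ContainerId` and `ContainerStatus`, and while the real `NMClientAsync#getContainerStatusAsync` is asynchronous, this model calls back synchronously for simplicity.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: all names are hypothetical stand-ins for the NMClientAsync callbacks.
class StatusCallbackModel {

    /** Hypothetical mirror of the two status callbacks on NMClientAsync's handler. */
    interface StatusCallback {
        void onContainerStatusReceived(String containerId, String state);
        void onGetContainerStatusError(String containerId, Throwable t);
    }

    /** Hypothetical NodeManager that only knows the containers the AM has started. */
    static class FakeNodeManager {
        private final Set<String> started = new HashSet<>();

        void startContainer(String containerId) {
            started.add(containerId);
        }

        // Per the review comment: a status query succeeds for started containers
        // and ends in the error callback for containers that were never started.
        void getContainerStatus(String containerId, StatusCallback cb) {
            if (started.contains(containerId)) {
                cb.onContainerStatusReceived(containerId, "RUNNING");
            } else {
                cb.onGetContainerStatusError(containerId,
                        new IllegalStateException("Container " + containerId + " not found"));
            }
        }
    }

    /** Records which callback fired, in order. */
    static class RecordingCallback implements StatusCallback {
        final List<String> events = new ArrayList<>();

        @Override
        public void onContainerStatusReceived(String containerId, String state) {
            events.add("received:" + containerId + ":" + state);
        }

        @Override
        public void onGetContainerStatusError(String containerId, Throwable t) {
            events.add("error:" + containerId);
        }
    }
}
```

   Starting `c1` and then querying `c1` and `c2` records a received status for `c1` and an error for `c2`.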
   
   > If that is the case, I think we can have a common method handling releasing the container and removing it from the worker node map.
   
   Yes, a common method is necessary.
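   A minimal sketch of such a shared method, assuming the worker node map is keyed by container id. `RecoveredContainerHandler`, `releaseRecoveredContainer` and `ContainerReleaser` are hypothetical names (the releaser only loosely mirrors `AMRMClientAsync#releaseAssignedContainer`), plain strings stand in for `ContainerId` and `ContainerState`, and this is not the code in the PR.

```java
import java.util.Map;

// Hypothetical sketch of one shared release path used by both status callbacks.
class RecoveredContainerHandler {

    /** Hypothetical stand-in for the RM client that gives containers back to YARN. */
    interface ContainerReleaser {
        void releaseAssignedContainer(String containerId);
    }

    // containerId -> worker description (simplified stand-in for the worker node map)
    private final Map<String, String> workerNodeMap;
    private final ContainerReleaser releaser;

    RecoveredContainerHandler(Map<String, String> workerNodeMap, ContainerReleaser releaser) {
        this.workerNodeMap = workerNodeMap;
        this.releaser = releaser;
    }

    /** Removes the worker entry and releases the container, at most once per id. */
    void releaseRecoveredContainer(String containerId) {
        // Only containers still tracked are released, so a repeated call is a no-op.
        if (workerNodeMap.remove(containerId) != null) {
            releaser.releaseAssignedContainer(containerId);
        }
    }

    void onContainerStatusReceived(String containerId, String state) {
        // Mirrors the diff: a recovered container reported as NEW is released.
        if ("NEW".equals(state)) {
            releaseRecoveredContainer(containerId);
        }
    }

    void onGetContainerStatusError(String containerId, Throwable t) {
        releaseRecoveredContainer(containerId);
    }
}
```

   Guarding the release on the map removal keeps the method safe to reach from both callbacks for the same container: only the first call actually releases it.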
   
   > One more question: how do we know whether a container is NEW or whether there is some other problem in onGetContainerStatusError?
   
   There may be several causes for this error, such as the container not being found on the NM or the NM being unreachable, but they can all be treated as the same problem: the container may not be usable for now because we cannot get its status. I think we can simply handle it as above, no matter what the real cause is.
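   The cause-agnostic handling can be sketched as follows, assuming the error callback receives a container id and a `Throwable`, as `NMClientAsync`'s handler does. `CauseAgnosticErrorHandler` and its two lists are hypothetical; the cause is kept only for diagnostics and never changes the outcome.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: any status error marks the container as unusable for now.
class CauseAgnosticErrorHandler {
    final List<String> released = new ArrayList<>();
    final List<String> diagnostics = new ArrayList<>();

    void onGetContainerStatusError(String containerId, Throwable cause) {
        // Record the cause for logging only; "not found" and "NM unreachable" are handled alike.
        diagnostics.add(containerId + " status unavailable: " + cause.getClass().getSimpleName());
        released.add(containerId);
    }
}
```

   Whether the failure is an `IllegalStateException` or a `java.net.ConnectException`, both containers end up in `released`; only the diagnostic text differs.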

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
