Daniel Halperin created APEXCORE-703:
----------------------------------------

             Summary: Window processing timeout for finished/undeployed container
                 Key: APEXCORE-703
                 URL: https://issues.apache.org/jira/browse/APEXCORE-703
             Project: Apache Apex Core
          Issue Type: Bug
    Affects Versions: 3.5.0
            Reporter: Daniel Halperin


Using Apex 3.5.0 with Apache Beam, I have a 10-container pipeline. The first 
container, id #1, finishes and gets undeployed at 12:41:10 PM.

Then, 60s later (at 12:42:10 PM), Apex marks operator 1 in that container as 
blocked because its window id has not changed for 60s (the window processing 
timeout), declares failure, and restarts the container.

This would seem to be a bug -- shouldn't finished and undeployed operators be 
deregistered from the timeout logic that is detecting stuck operators?
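
To make the expected behavior concrete, here is a minimal sketch of a timeout watchdog that drops an operator from tracking once it is undeployed. It is not Apex's StreamingContainerManager code: the class WindowTimeoutSketch and the methods onWindowAdvanced, onUndeployed and isBlocked are hypothetical names chosen for illustration, and the strict "greater than" comparison is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical watchdog for illustration only; not the Apex implementation.
class WindowTimeoutSketch {
    private final long timeoutMillis;
    // operator id -> time (epoch millis) at which its window id last changed
    private final Map<Integer, Long> lastWindowChange = new HashMap<>();

    WindowTimeoutSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Record that the operator moved to a new window.
    void onWindowAdvanced(int operatorId, long nowMillis) {
        lastWindowChange.put(operatorId, nowMillis);
    }

    // Deregister a finished/undeployed operator so it can no longer time out.
    void onUndeployed(int operatorId) {
        lastWindowChange.remove(operatorId);
    }

    // An operator is blocked only if it is still tracked and has been idle
    // for longer than the timeout (strict comparison is an assumption).
    boolean isBlocked(int operatorId, long nowMillis) {
        Long last = lastWindowChange.get(operatorId);
        return last != null && nowMillis - last > timeoutMillis;
    }

    public static void main(String[] args) {
        WindowTimeoutSketch watchdog = new WindowTimeoutSketch(60_000L);
        watchdog.onWindowAdvanced(1, 1492198869957L); // timestamp from the log
        watchdog.onUndeployed(1);
        // Undeployed operators are never reported as blocked.
        System.out.println(watchdog.isBlocked(1, 1492198930012L)); // prints "false"
    }
}
```

Without the onUndeployed call, the same check returns true for the timestamps in the log, which matches the behavior reported here.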

Log below:

{code}
Apr 14, 2017 12:41:10 PM com.datatorrent.stram.engine.StreamingContainer processHeartbeatResponse
INFO: Undeploy request: [1]
Apr 14, 2017 12:41:10 PM com.datatorrent.stram.engine.StreamingContainer undeploy
INFO: Undeploy complete.
Apr 14, 2017 12:42:10 PM com.datatorrent.stram.StreamingContainerManager updateRecoveryCheckpoints
WARNING: Marking operator PTOperator[id=1,name=TextIO.Read/Read] blocked committed window ffffffffffffffff, recovery window ffffffffffffffff, current time 1492198930012, last window id change time 1492198869957, window processing timeout millis 60000
Apr 14, 2017 12:42:10 PM com.datatorrent.stram.StreamingContainerManager updateCheckpoints
INFO: Blocked operator PTOperator[id=1,name=TextIO.Read/Read] container PTContainer[id=1(container-6),state=ACTIVE] time 60055ms
Apr 14, 2017 12:42:11 PM com.datatorrent.stram.engine.StreamingContainer processHeartbeatResponse
INFO: Received shutdown request
Apr 14, 2017 12:42:11 PM com.datatorrent.stram.StramLocalCluster run
INFO: Container container-6 restart.
Apr 14, 2017 12:42:11 PM com.datatorrent.stram.StreamingContainerManager scheduleContainerRestart
INFO: Initiating recovery for container-6@localhost
Apr 14, 2017 12:42:11 PM com.datatorrent.stram.StreamingContainerManager updateRecoveryCheckpoints
WARNING: Marking operator PTOperator[id=1,name=TextIO.Read/Read] blocked committed window ffffffffffffffff, recovery window ffffffffffffffff, current time 1492198931015, last window id change time 1492198869957, window processing timeout millis 60000
{code}
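
The numbers in the WARNING lines can be checked by hand: the elapsed time is the reported current time minus the last window id change time, compared against the 60000 ms timeout. The small sketch below only redoes that arithmetic with the values copied from the log; the class name LogTimeoutArithmetic and its helper elapsedMillis are hypothetical, and nothing here calls Apex code.

```java
// Arithmetic check of the values printed in the log; not Apex code.
class LogTimeoutArithmetic {
    // Elapsed time since the operator's window id last changed.
    static long elapsedMillis(long currentTime, long lastWindowIdChangeTime) {
        return currentTime - lastWindowIdChangeTime;
    }

    public static void main(String[] args) {
        long lastChange = 1492198869957L; // "last window id change time"
        long timeout = 60000L;            // "window processing timeout millis"

        long first = elapsedMillis(1492198930012L, lastChange);  // first WARNING
        long second = elapsedMillis(1492198931015L, lastChange); // second WARNING

        System.out.println(first + " ms, over timeout: " + (first > timeout));
        // prints "60055 ms, over timeout: true"
        System.out.println(second + " ms, over timeout: " + (second > timeout));
        // prints "61058 ms, over timeout: true"
    }
}
```

The first result matches the "time 60055ms" in the "Blocked operator" line, and the last window id change time is unchanged between the two warnings, which is consistent with the operator still being tracked after the undeploy.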



