[jira] [Created] (APEXCORE-708) Allow number of blocks when bp kicks in to be specified
Pramod Immaneni created APEXCORE-708:
----------------------------------------

    Summary: Allow number of blocks when bp kicks in to be specified
        Key: APEXCORE-708
        URL: https://issues.apache.org/jira/browse/APEXCORE-708
    Project: Apache Apex Core
 Issue Type: Sub-task
   Reporter: Pramod Immaneni
   Assignee: Pramod Immaneni

This can be any number greater than, equal to, or less than the maximum number of in-memory blocks.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (APEXCORE-707) Allow configurability on a per-stream basis using an attribute
Pramod Immaneni created APEXCORE-707: Summary: Allow configurability on a per-stream basis using an attribute Key: APEXCORE-707 URL: https://issues.apache.org/jira/browse/APEXCORE-707 Project: Apache Apex Core Issue Type: Sub-task Reporter: Pramod Immaneni -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (APEXCORE-707) Allow configurability on a per-stream basis using an attribute
[ https://issues.apache.org/jira/browse/APEXCORE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni reassigned APEXCORE-707: Assignee: Pramod Immaneni > Allow configurability on a per-stream basis using an attribute > -- > > Key: APEXCORE-707 > URL: https://issues.apache.org/jira/browse/APEXCORE-707 > Project: Apache Apex Core > Issue Type: Sub-task >Reporter: Pramod Immaneni >Assignee: Pramod Immaneni > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (APEXCORE-706) Drop blocks that have already been read so that the list size does not keep growing
[ https://issues.apache.org/jira/browse/APEXCORE-706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Immaneni reassigned APEXCORE-706: Assignee: Pramod Immaneni > Drop blocks that have already been read so that the list size does not keep > growing > --- > > Key: APEXCORE-706 > URL: https://issues.apache.org/jira/browse/APEXCORE-706 > Project: Apache Apex Core > Issue Type: Sub-task > Components: Buffer Server >Reporter: Pramod Immaneni >Assignee: Pramod Immaneni > > The fault tolerance scenarios where downstream operator fails and needs to > re-read the older blocks should be handled. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (APEXCORE-706) Drop blocks that have already been read so that the list size does not keep growing
Pramod Immaneni created APEXCORE-706: Summary: Drop blocks that have already been read so that the list size does not keep growing Key: APEXCORE-706 URL: https://issues.apache.org/jira/browse/APEXCORE-706 Project: Apache Apex Core Issue Type: Sub-task Reporter: Pramod Immaneni The fault tolerance scenarios where downstream operator fails and needs to re-read the older blocks should be handled. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
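One way to reconcile "drop blocks that have been read" with the fault-tolerance requirement is to drop only blocks that lie before the oldest position any reader could still need after a failure. The following is a minimal sketch under that assumption; `BlockListSketch`, `updateRecoveryPosition` and `dropConsumed` are hypothetical names for illustration and are not taken from the Buffer Server code.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a block list; not the Buffer Server implementation. */
final class BlockListSketch {
    private final Deque<Long> blockIds = new ArrayDeque<>();            // oldest block first
    private final Map<String, Long> recoveryPosition = new HashMap<>(); // per-reader oldest block still needed
    private long nextBlockId;

    /** Appends a new block and returns its id. */
    long append() {
        blockIds.addLast(nextBlockId);
        return nextBlockId++;
    }

    /** Records the oldest block a reader would have to re-read if it failed now (assumed to be known). */
    void updateRecoveryPosition(String reader, long blockId) {
        recoveryPosition.put(reader, blockId);
    }

    /** Drops blocks that no registered reader can need again; returns how many were dropped. */
    int dropConsumed() {
        if (recoveryPosition.isEmpty()) {
            return 0; // be conservative: without readers nothing is known to be safe to drop
        }
        long oldestNeeded = Collections.min(recoveryPosition.values());
        int dropped = 0;
        while (!blockIds.isEmpty() && blockIds.peekFirst() < oldestNeeded) {
            blockIds.removeFirst();
            dropped++;
        }
        return dropped;
    }

    int size() {
        return blockIds.size();
    }

    public static void main(String[] args) {
        BlockListSketch list = new BlockListSketch();
        for (int i = 0; i < 5; i++) {
            list.append(); // blocks 0..4
        }
        list.updateRecoveryPosition("A", 3);
        list.updateRecoveryPosition("B", 1);
        System.out.println(list.dropConsumed() + " dropped, " + list.size() + " kept");
        list.updateRecoveryPosition("B", 4); // slowest reader advances
        System.out.println(list.dropConsumed() + " dropped, " + list.size() + " kept");
    }
}
```

With this design the list size is bounded by the distance between the slowest reader's recovery position and the newest block, rather than growing with everything ever written.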
[jira] [Commented] (APEXCORE-703) Window processing timeout for finished/undeployed container
[ https://issues.apache.org/jira/browse/APEXCORE-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970454#comment-15970454 ]

Vlad Rozov commented on APEXCORE-703:
-------------------------------------

I believe that the second test case is already covered, for example in AtMostOnceTest.testLinearInputOperatorRecovery. Let me know if you think that a separate unit test in StreamingContainerManagerTest is required. I opened a PR to make sure that we agree on the proposed fix and will add 2 or 3 additional unit tests: one that simulates the bug and another in StreamingContainerManagerTest.testOperatorShutdown.

> Window processing timeout for finished/undeployed container
> -----------------------------------------------------------
>
>                 Key: APEXCORE-703
>                 URL: https://issues.apache.org/jira/browse/APEXCORE-703
>             Project: Apache Apex Core
>          Issue Type: Bug
>    Affects Versions: 3.5.0
>            Reporter: Daniel Halperin
>            Assignee: Vlad Rozov
>             Fix For: 3.6.0
>
> Using Apex 3.5.0 with Apache Beam, I have a 10-container pipeline. The first container, id #1, finishes and gets undeployed at 12:41:10 PM.
> Then, 60s later (at 12:42:10 PM), Apex decides that container is blocked because no data has been received for 60s, declares failure, and restarts it.
> This would seem to be a bug -- shouldn't finished and undeployed operators be deregistered from the timeout logic that is detecting stuck operators?
> Log below
> {code}
> Apr 14, 2017 12:41:10 PM com.datatorrent.stram.engine.StreamingContainer processHeartbeatResponse
> INFO: Undeploy request: [1]
> Apr 14, 2017 12:41:10 PM com.datatorrent.stram.engine.StreamingContainer undeploy
> INFO: Undeploy complete.
> Apr 14, 2017 12:42:10 PM com.datatorrent.stram.StreamingContainerManager updateRecoveryCheckpoints
> WARNING: Marking operator PTOperator[id=1,name=TextIO.Read/Read] blocked committed window , recovery window , current time 1492198930012, last window id change time 1492198869957, window processing timeout millis 6
> Apr 14, 2017 12:42:10 PM com.datatorrent.stram.StreamingContainerManager updateCheckpoints
> INFO: Blocked operator PTOperator[id=1,name=TextIO.Read/Read] container PTContainer[id=1(container-6),state=ACTIVE] time 60055ms
> Apr 14, 2017 12:42:11 PM com.datatorrent.stram.engine.StreamingContainer processHeartbeatResponse
> INFO: Received shutdown request
> Apr 14, 2017 12:42:11 PM com.datatorrent.stram.StramLocalCluster run
> INFO: Container container-6 restart.
> Apr 14, 2017 12:42:11 PM com.datatorrent.stram.StreamingContainerManager scheduleContainerRestart
> INFO: Initiating recovery for container-6@localhost
> Apr 14, 2017 12:42:11 PM com.datatorrent.stram.StreamingContainerManager updateRecoveryCheckpoints
> WARNING: Marking operator PTOperator[id=1,name=TextIO.Read/Read] blocked committed window , recovery window , current time 1492198931015, last window id change time 1492198869957, window processing timeout millis 6
> {code}
[jira] [Updated] (APEXCORE-703) Window processing timeout for finished/undeployed container
[ https://issues.apache.org/jira/browse/APEXCORE-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Weise updated APEXCORE-703:
----------------------------------
    Fix Version/s: 3.6.0
[GitHub] apex-core pull request #516: APEXCORE-703 Window processing timeout for fini...
GitHub user vrozov opened a pull request:

    https://github.com/apache/apex-core/pull/516

    APEXCORE-703 Window processing timeout for finished/undeployed container.

    During an operator shutdown, mark it as INACTIVE to exclude it from the blocked operators check.

    @tweise Please review.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vrozov/apex-core APEXCORE-703

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/apex-core/pull/516.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #516

commit 0ebc23ee0e40f5259098f538a1b9cea4aeba9794
Author: Vlad Rozov
Date:   2017-04-16T16:34:09Z

    APEXCORE-703 Window processing timeout for finished/undeployed container. During an operator shutdown mark it as INACTIVE to exclude it from the blocked operators check.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
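The idea in the PR description can be illustrated with a small, self-contained model. This is only a sketch, not the code from PR #516: the class, the `Operator` type and `findBlocked` are hypothetical, and the 60000 ms timeout is assumed from the "60s" mentioned in the bug report. The two timestamps are the ones printed in the reported log.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a blocked-operator check that skips INACTIVE operators. */
final class BlockedOperatorCheckSketch {
    /** Only the two states named in the discussion are modelled. */
    enum State { ACTIVE, INACTIVE }

    /** Hypothetical stand-in for an operator tracked by the container manager. */
    static final class Operator {
        final int id;
        final long lastWindowIdChangeMillis;
        State state = State.ACTIVE;

        Operator(int id, long lastWindowIdChangeMillis) {
            this.id = id;
            this.lastWindowIdChangeMillis = lastWindowIdChangeMillis;
        }
    }

    /** Returns the ids of ACTIVE operators whose window id has not changed within the timeout. */
    static List<Integer> findBlocked(List<Operator> operators, long nowMillis, long timeoutMillis) {
        List<Integer> blocked = new ArrayList<>();
        for (Operator op : operators) {
            if (op.state != State.ACTIVE) {
                continue; // an operator that was shut down is not expected to make progress
            }
            if (nowMillis - op.lastWindowIdChangeMillis > timeoutMillis) {
                blocked.add(op.id);
            }
        }
        return blocked;
    }

    public static void main(String[] args) {
        long lastChange = 1492198869957L; // "last window id change time" from the log
        long now = 1492198930012L;        // "current time" from the log
        long timeout = 60_000L;           // assumption: 60 s, as described in the report

        Operator reader = new Operator(1, lastChange);
        List<Operator> operators = new ArrayList<>();
        operators.add(reader);

        // 60055 ms have elapsed, so an ACTIVE operator is reported as blocked.
        System.out.println(findBlocked(operators, now, timeout));

        // Once the operator is marked INACTIVE during shutdown, it is skipped.
        reader.state = State.INACTIVE;
        System.out.println(findBlocked(operators, now, timeout));
    }
}
```

The elapsed time in the example, 1492198930012 - 1492198869957 = 60055 ms, matches the "time 60055ms" line in the reported log.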
[jira] [Commented] (APEXCORE-703) Window processing timeout for finished/undeployed container
[ https://issues.apache.org/jira/browse/APEXCORE-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970436#comment-15970436 ]

Thomas Weise commented on APEXCORE-703:
---------------------------------------

Perhaps a test that verifies that the operator is marked INACTIVE (StreamingContainerManagerTest.testOperatorShutdown) and one that verifies that the INACTIVE operator is included when the container is scheduled for restart (see StreamingContainerManagerTest and scm.scheduleContainerRestart).
[jira] [Updated] (APEXCORE-654) Recovery window is not updated when Delay Operator is used along with Partitioned Operators
[ https://issues.apache.org/jira/browse/APEXCORE-654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Weise updated APEXCORE-654: -- Fix Version/s: 3.6.0 > Recovery window is not updated when Delay Operator is used along with > Partitioned Operators > --- > > Key: APEXCORE-654 > URL: https://issues.apache.org/jira/browse/APEXCORE-654 > Project: Apache Apex Core > Issue Type: Bug >Affects Versions: 3.5.0 > Environment: Hadoop 2.7.2 > Apache Apex 3.5.0 > Apache Apex Malhar 3.6.0 >Reporter: Ambarish Pande >Assignee: Bhupesh Chawda > Labels: DelayOperator > Fix For: 3.6.0 > > Attachments: ProblemDag.png > > > Checkpointing is not happening when DefaultDelayOperator is used in a DAG in > which some upstream operators are Partitioned. > When used without partitioning, I can see the operators being check-pointed > properly. > Here is the link of the App source code and also the built apa file. > https://github.com/ambarishpande/delay-operator-test -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (APEXCORE-703) Window processing timeout for finished/undeployed container
[ https://issues.apache.org/jira/browse/APEXCORE-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970413#comment-15970413 ]

Vlad Rozov commented on APEXCORE-703:
-------------------------------------

I'll open a PR. Do you have a suggestion for additional unit tests?
[jira] [Assigned] (APEXCORE-703) Window processing timeout for finished/undeployed container
[ https://issues.apache.org/jira/browse/APEXCORE-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vlad Rozov reassigned APEXCORE-703:
-----------------------------------
    Assignee: Vlad Rozov
[jira] [Commented] (APEXCORE-703) Window processing timeout for finished/undeployed container
[ https://issues.apache.org/jira/browse/APEXCORE-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970411#comment-15970411 ]

Thomas Weise commented on APEXCORE-703:
---------------------------------------

I looked at it also, and what you suggest should work. I want to check a bit more closely to confirm that they will be included in the redeploy when there is a container failure. I will put up a PR if that is confirmed, unless you want to work on it.
[jira] [Commented] (APEXCORE-703) Window processing timeout for finished/undeployed container
[ https://issues.apache.org/jira/browse/APEXCORE-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970407#comment-15970407 ]

Vlad Rozov commented on APEXCORE-703:
-------------------------------------

I don't see why INACTIVE operators cannot be redeployed and marked as ACTIVE in case of a recovery. They are still part of the plan anyway.
[jira] [Resolved] (APEXCORE-678) Shutdown of application should start from input nodes
[ https://issues.apache.org/jira/browse/APEXCORE-678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bhupesh Chawda resolved APEXCORE-678.
-------------------------------------
    Resolution: Fixed

> Shutdown of application should start from input nodes
> -----------------------------------------------------
>
>                 Key: APEXCORE-678
>                 URL: https://issues.apache.org/jira/browse/APEXCORE-678
>             Project: Apache Apex Core
>          Issue Type: Bug
>            Reporter: Bhupesh Chawda
>            Assignee: Bhupesh Chawda
>             Fix For: 3.6.0
>
> Streaming container calls shutdown() for all nodes instead of just input nodes.
> {code}
> private void stopInputNodes()
> {
>   for (Entry<Integer, Node<?>> e : nodes.entrySet()) {
>     Node<?> node = e.getValue();
>     if (node instanceof InputNode) {
>       final Thread thread = e.getValue().context.getThread();
>       if (thread == null || !thread.isAlive()) {
>         continue;
>       }
>     }
>     node.shutdown(true);
>   }
> }
> {code}
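In the quoted method, `node.shutdown(true)` sits outside the `instanceof InputNode` check, so it runs for every node. A corrected loop could look like the sketch below. This is an illustration only and not the change merged in PR #509; `Node` and `InputNode` here are simplified, hypothetical stand-ins for the real classes, and the meaning of the boolean passed to `shutdown` is not given in the issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, simplified model of shutting down only the input nodes. */
final class StopInputNodesSketch {
    /** Stand-in for a deployed node; records whether shutdown was requested. */
    static class Node {
        final Thread thread; // may be null if the node was never started
        boolean shutdownRequested;

        Node(Thread thread) {
            this.thread = thread;
        }

        void shutdown(boolean flag) { // flag semantics are not described in the issue
            shutdownRequested = true;
        }
    }

    /** Stand-in for a node that has no upstream operator. */
    static class InputNode extends Node {
        InputNode(Thread thread) {
            super(thread);
        }
    }

    static void stopInputNodes(Map<Integer, Node> nodes) {
        for (Map.Entry<Integer, Node> e : nodes.entrySet()) {
            Node node = e.getValue();
            if (!(node instanceof InputNode)) {
                continue; // only input nodes are shut down here
            }
            Thread thread = node.thread;
            if (thread == null || !thread.isAlive()) {
                continue; // nothing is running, so there is nothing to stop
            }
            node.shutdown(true);
        }
    }

    /** Creates a started daemon thread that stays alive until interrupted. */
    static Thread startIdleThread() {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(60_000L);
            } catch (InterruptedException ignored) {
                // interrupted by the caller to end the demo
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) {
        Thread inputThread = startIdleThread();
        Thread genericThread = startIdleThread();

        Map<Integer, Node> nodes = new LinkedHashMap<>();
        nodes.put(1, new InputNode(inputThread));
        nodes.put(2, new Node(genericThread));
        nodes.put(3, new InputNode(null));

        stopInputNodes(nodes);
        for (Map.Entry<Integer, Node> e : nodes.entrySet()) {
            System.out.println("node " + e.getKey() + " shutdownRequested=" + e.getValue().shutdownRequested);
        }

        inputThread.interrupt();
        genericThread.interrupt();
    }
}
```

In this model only node 1, a live input node, receives the shutdown request; the non-input node and the input node without a running thread are left untouched.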
[jira] [Commented] (APEXCORE-678) Shutdown of application should start from input nodes
[ https://issues.apache.org/jira/browse/APEXCORE-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970283#comment-15970283 ]

ASF GitHub Bot commented on APEXCORE-678:
-----------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/apex-core/pull/509
[GitHub] apex-core pull request #509: APEXCORE-678 Fixed shutdown of input nodes in S...
Github user asfgit closed the pull request at:

    https://github.com/apache/apex-core/pull/509
Re: release blockers
Since releases are infrequent and not on a fixed schedule, I think it is good to give the community enough time to make their issue preferences known and to discuss, on an individual basis, the merits and demerits of including those issues in the release. From past experience, releases haven't dragged on, so I don't think there is currently a need to set a time limit unless we start seeing evidence to the contrary. There should, however, be some common-sense rules, such as not proposing new items late in the discussion process unless they fall into the must-fix categories listed by Vlad earlier in this thread. Having said that, I think the current release proposal is in the right spirit, and the discussions happening there follow it.

Thanks

On Sat, Apr 15, 2017 at 11:14 PM, Vlad Rozov wrote:

> Please propose a reasonable time. IMO, there is always something "almost ready" to be committed that contributors will want to squeeze into a release. A request to include something in a release should point to an open PR, not to a JIRA. Everything else may optionally be included in a release if its PR is merged by the time the RC is cut.
>
> Since graduation, Apex core has been released approximately every 5 months (3.4.0 in late June, 3.5.0 in early December), so this is not sudden. While there is no fixed-time release schedule and it is up to the community to decide when to release, there is usually a reasonable amount of work done by committers every 5 months to make an Apex core release.
>
> Thank you,
>
> Vlad
Re: release blockers
Please propose a reasonable time. IMO, there is always something "almost ready" to be committed that contributors will want to squeeze into a release. A request to include something in a release should point to an open PR, not to a JIRA. Everything else may optionally be included in a release if the PR is merged by the time the RC is cut.

Since graduation, Apex core has been released approximately every 5 months (3.4.0 in late June, 3.5.0 in early December), so this is not sudden. While there is no fixed-time release schedule and it is up to the community to decide when to release, committers usually complete a reasonable amount of work every 5 months to make an Apex core release.

Thank you,

Vlad

On 4/15/17 22:08, Thomas Weise wrote:

I think in the absence of a fixed-time release schedule it is actually reasonable to allow for more than a week to discuss an upcoming release with new features (not patch releases). This gives sufficient time for contributors to react and possibly get their ducks in a row. I don't think that there should be silence for several months and then suddenly someone pops up ready to pull the trigger.

Thomas

On Sat, Apr 15, 2017 at 9:52 PM, Vlad Rozov wrote:

I believe it should be standard Apache voting rules and timing policy. When somebody proposes a release and there are no objections (-1), once voting is over, the RC can be cut and submitted for the vote. IMO, it is reasonable to assume that "way ahead" is one week and not one month.

Thank you,

Vlad

On 4/15/17 16:11, Thomas Weise wrote:

There is a need for the community to agree on timing/scope of a release. That discussion should take place way ahead of cutting it. It is appropriate and desirable that folks think about and express their preferences on what they would like to see as part of the next release. It may be a priority for someone else to fix a particular issue, even if you would not see it that way.

What is important is that everyone who suggests including additional things makes a convincing case for it and is able to complete work in time. Once there is consensus on the scope, I would largely agree with the policy on what is allowed to delay or stop a release, as otherwise it will never go out.

Thomas

On Fri, Apr 14, 2017 at 2:33 PM, Vlad Rozov wrote:

As both 692 & 687 are already resolved, we should focus less on those particular bugs and more on release policies in general. IMO only the following issues should stop the release:

1. Apache license issues
   * Source code is not properly licensed (quite unlikely, as we have a check in place for known file types; the problem may be with new types not covered by the build)
   * Usage of Category X license dependencies
2. Backward compatibility issues
   * Existing API is covered by semantic versioning, but it may not be sufficient
   * New API introduced that is not marked as Evolving
   * Regression in existing functionality
3. Security vulnerabilities
4. JIRAs marked as Blocker (likely to fall into the 3 previous categories anyway, but possibly some critical bugs may fall into this category as well)

Everything else is a nice-to-have and should be included in a release if a PR is ready and PR review is complete. This applies equally to bug fixes, new feature implementations and documentation issues. The apex.apache.org web site update is outside of the release cycle and can be done independently of a release.

Thank you,

Vlad

On 4/14/17 08:46, Dean Lockgaard wrote:

Vlad,

Here is my thought process about these tickets. Both 692 (Apex dev setup sandbox section to reference Apex website downloads page) and 687 (update supported Hadoop v2.6 in Apex docs) are Apex documentation issues, and so they are part of the Apex release process.

Furthermore, 692 directly references 693 (update Apex website downloads page with cleaned up and augmented list of 3rd party binaries), so it makes sense to have 693 updated as well, though of course I agree that it is neither a part of the Apex core release nor a blocker for the release.

Thanks,

Dean

On Fri, Apr 14, 2017 at 11:27 AM, Vlad Rozov wrote:

Dean,

692 and 693 are web site documentation issues and are not part of the Apex core 3.6.0 release. 687 can be covered in the release README (known issues).

Thank you,

Vlad

On 4/13/17 14:11, Dean Lockgaard wrote:

I'd like to request that 687, 692 and 693 be included in the 3.6.0 release. I will send PRs for these shortly.

Thanks,

Dean

On Fri, Apr 14, 2017 at 5:05 AM, Amol Kekre wrote:

+1 to cut a release

Thks
Amol

E: a...@datatorrent.com | M: 510-449-2606 | Twitter: @amolhkekre
www.datatorrent.com

On Thu, Apr 13, 2017 at 9:22 AM, Pramod Immaneni <pra...@datatorrent.com> wrote:

+1

I would like to see 699 and 700 addressed as well.

On Wed, Apr 12, 2017 at 10:16 PM, Tushar Gosavi <