[jira] [Created] (APEXCORE-708) Allow number of blocks when bp kicks in to be specified

2017-04-16 Thread Pramod Immaneni (JIRA)
Pramod Immaneni created APEXCORE-708:


 Summary: Allow number of blocks when bp kicks in to be specified
 Key: APEXCORE-708
 URL: https://issues.apache.org/jira/browse/APEXCORE-708
 Project: Apache Apex Core
  Issue Type: Sub-task
Reporter: Pramod Immaneni
Assignee: Pramod Immaneni


This can be any number greater than, equal to, or less than the maximum number of in-memory blocks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (APEXCORE-707) Allow configurability on a per-stream basis using an attribute

2017-04-16 Thread Pramod Immaneni (JIRA)
Pramod Immaneni created APEXCORE-707:


 Summary: Allow configurability on a per-stream basis using an attribute
 Key: APEXCORE-707
 URL: https://issues.apache.org/jira/browse/APEXCORE-707
 Project: Apache Apex Core
  Issue Type: Sub-task
Reporter: Pramod Immaneni








[jira] [Assigned] (APEXCORE-707) Allow configurability on a per-stream basis using an attribute

2017-04-16 Thread Pramod Immaneni (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXCORE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pramod Immaneni reassigned APEXCORE-707:


Assignee: Pramod Immaneni

> Allow configurability on a per-stream basis using an attribute
> --
>
> Key: APEXCORE-707
> URL: https://issues.apache.org/jira/browse/APEXCORE-707
> Project: Apache Apex Core
>  Issue Type: Sub-task
>Reporter: Pramod Immaneni
>Assignee: Pramod Immaneni
>






[jira] [Assigned] (APEXCORE-706) Drop blocks that have already been read so that the list size does not keep growing

2017-04-16 Thread Pramod Immaneni (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXCORE-706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pramod Immaneni reassigned APEXCORE-706:


Assignee: Pramod Immaneni

> Drop blocks that have already been read so that the list size does not keep 
> growing
> ---
>
> Key: APEXCORE-706
> URL: https://issues.apache.org/jira/browse/APEXCORE-706
> Project: Apache Apex Core
>  Issue Type: Sub-task
>  Components: Buffer Server
>Reporter: Pramod Immaneni
>Assignee: Pramod Immaneni
>
> The fault tolerance scenarios where downstream operator fails and needs to 
> re-read the older blocks should be handled.





[jira] [Created] (APEXCORE-706) Drop blocks that have already been read so that the list size does not keep growing

2017-04-16 Thread Pramod Immaneni (JIRA)
Pramod Immaneni created APEXCORE-706:


 Summary: Drop blocks that have already been read so that the list size does not keep growing
 Key: APEXCORE-706
 URL: https://issues.apache.org/jira/browse/APEXCORE-706
 Project: Apache Apex Core
  Issue Type: Sub-task
Reporter: Pramod Immaneni


The fault tolerance scenarios where downstream operator fails and needs to 
re-read the older blocks should be handled.
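
One way to read the requirement above is that a block may be dropped only once it has been fully read and no downstream recovery can still ask for it. The following is a minimal sketch under that assumption, not the Buffer Server implementation: the `BlockRetention` class, the `Block` fields, and the use of a committed window id as the retention boundary are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of a block list; none of these names come from Apex.
final class BlockRetention {
    static final class Block {
        final long lastWindowId;  // id of the last window stored in this block
        final boolean fullyRead;  // true once every reader has consumed the block

        Block(long lastWindowId, boolean fullyRead) {
            this.lastWindowId = lastWindowId;
            this.fullyRead = fullyRead;
        }
    }

    // Drops blocks from the head of the list while they are fully read AND
    // end at or before the committed window. Blocks after the committed
    // window are kept so that a restarted downstream operator can re-read them.
    static int purge(Deque<Block> blocks, long committedWindowId) {
        int dropped = 0;
        while (!blocks.isEmpty()) {
            Block head = blocks.peekFirst();
            if (!head.fullyRead || head.lastWindowId > committedWindowId) {
                break;
            }
            blocks.removeFirst();
            dropped++;
        }
        return dropped;
    }

    public static void main(String[] args) {
        Deque<Block> blocks = new ArrayDeque<>();
        blocks.add(new Block(1L, true));
        blocks.add(new Block(2L, true));
        blocks.add(new Block(3L, true));   // read, but not yet committed
        blocks.add(new Block(4L, false));  // still being read
        int dropped = purge(blocks, 2L);
        System.out.println("dropped=" + dropped + ", retained=" + blocks.size());
    }
}
```

Purging only from the head keeps the list ordered by window, so a reader that restarts from the committed window always finds a contiguous run of blocks.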





[jira] [Commented] (APEXCORE-703) Window processing timeout for finished/undeployed container

2017-04-16 Thread Vlad Rozov (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970454#comment-15970454
 ] 

Vlad Rozov commented on APEXCORE-703:
-

I believe that the second test case is already covered, for example in 
AtMostOnceTest.testLinearInputOperatorRecovery. Let me know if you think that a 
separate unit test in StreamingContainerManagerTest is required. I opened a PR to 
make sure that we agree on the proposed fix and will add 2 or 3 additional unit 
tests: one that simulates the bug and another in 
StreamingContainerManagerTest.testOperatorShutdown.

> Window processing timeout for finished/undeployed container
> ---
>
> Key: APEXCORE-703
> URL: https://issues.apache.org/jira/browse/APEXCORE-703
> Project: Apache Apex Core
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Daniel Halperin
>Assignee: Vlad Rozov
> Fix For: 3.6.0
>
>
> Using Apex 3.5.0 with Apache Beam, I have a 10-container pipeline. The first 
> container, id #1, finishes and gets undeployed at 12:41:10 PM.
> Then, 60s later (at 12:42:10 PM), Apex decides that container is blocked 
> because no data has been received for 60s, declares failure, and restarts it.
> This would seem to be a bug -- shouldn't finished and undeployed operators be 
> deregistered from the timeout logic that is detecting stuck operators?
> Log below
> {code}
> Apr 14, 2017 12:41:10 PM com.datatorrent.stram.engine.StreamingContainer 
> processHeartbeatResponse
> INFO: Undeploy request: [1]
> Apr 14, 2017 12:41:10 PM com.datatorrent.stram.engine.StreamingContainer 
> undeploy
> INFO: Undeploy complete.
> Apr 14, 2017 12:42:10 PM com.datatorrent.stram.StreamingContainerManager 
> updateRecoveryCheckpoints
> WARNING: Marking operator PTOperator[id=1,name=TextIO.Read/Read] blocked 
> committed window , recovery window , current 
> time 1492198930012, last window id change time 1492198869957, window 
> processing timeout millis 6
> Apr 14, 2017 12:42:10 PM com.datatorrent.stram.StreamingContainerManager 
> updateCheckpoints
> INFO: Blocked operator PTOperator[id=1,name=TextIO.Read/Read] container 
> PTContainer[id=1(container-6),state=ACTIVE] time 60055ms
> Apr 14, 2017 12:42:11 PM com.datatorrent.stram.engine.StreamingContainer 
> processHeartbeatResponse
> INFO: Received shutdown request
> Apr 14, 2017 12:42:11 PM com.datatorrent.stram.StramLocalCluster run
> INFO: Container container-6 restart.
> Apr 14, 2017 12:42:11 PM com.datatorrent.stram.StreamingContainerManager 
> scheduleContainerRestart
> INFO: Initiating recovery for container-6@localhost
> Apr 14, 2017 12:42:11 PM com.datatorrent.stram.StreamingContainerManager 
> updateRecoveryCheckpoints
> WARNING: Marking operator PTOperator[id=1,name=TextIO.Read/Read] blocked 
> committed window , recovery window , current 
> time 1492198931015, last window id change time 1492198869957, window 
> processing timeout millis 6
> {code}
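
The timestamps in the log above are consistent with the reported 60 s delay. The following is a minimal sketch of that arithmetic, assuming the blocked check compares the time since the last window id change against a timeout. The class and method names are hypothetical, and the 60000 ms timeout is an assumption taken from the reporter's "60s" description, because the timeout value in the log is cut off.

```java
// Hypothetical helper, not part of Apex. It only reproduces the arithmetic
// visible in the quoted log lines.
final class BlockedTimeCheck {
    // Milliseconds elapsed since the operator last advanced its window id.
    static long elapsedMillis(long currentTimeMillis, long lastWindowIdChangeMillis) {
        return currentTimeMillis - lastWindowIdChangeMillis;
    }

    // True when the elapsed time exceeds the timeout.
    static boolean isBlocked(long elapsedMillis, long timeoutMillis) {
        return elapsedMillis > timeoutMillis;
    }

    public static void main(String[] args) {
        long current = 1492198930012L;     // "current time" from the log
        long lastChange = 1492198869957L;  // "last window id change time" from the log
        long assumedTimeout = 60_000L;     // assumption: 60 s, per the issue description

        long elapsed = elapsedMillis(current, lastChange);
        // Prints "60055 ms, blocked=true"
        System.out.println(elapsed + " ms, blocked=" + isBlocked(elapsed, assumedTimeout));
    }
}
```

The computed 60055 ms matches the "time 60055ms" value in the "Blocked operator" log line.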





[jira] [Updated] (APEXCORE-703) Window processing timeout for finished/undeployed container

2017-04-16 Thread Thomas Weise (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXCORE-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise updated APEXCORE-703:
--
Fix Version/s: 3.6.0

> Window processing timeout for finished/undeployed container
> ---
>
> Key: APEXCORE-703
> URL: https://issues.apache.org/jira/browse/APEXCORE-703
> Project: Apache Apex Core
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Daniel Halperin
>Assignee: Vlad Rozov
> Fix For: 3.6.0
>
>


[GitHub] apex-core pull request #516: APEXCORE-703 Window processing timeout for fini...

2017-04-16 Thread vrozov
GitHub user vrozov opened a pull request:

https://github.com/apache/apex-core/pull/516

APEXCORE-703 Window processing timeout for finished/undeployed container.

During an operator shutdown, mark it as INACTIVE to exclude it from the 
blocked operators check.
@tweise Please review.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vrozov/apex-core APEXCORE-703

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-core/pull/516.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #516


commit 0ebc23ee0e40f5259098f538a1b9cea4aeba9794
Author: Vlad Rozov 
Date:   2017-04-16T16:34:09Z

APEXCORE-703 Window processing timeout for finished/undeployed container.
During an operator shutdown mark it as INACTIVE to exclude it from the 
blocked operators check.
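
The idea in the commit message can be sketched as follows. This is a simplified model under stated assumptions, not the code in the pull request: `BlockedOperatorScan`, `Op`, and `findBlocked` are hypothetical names, and the real PTOperator and StreamingContainerManager classes carry far more state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the blocked-operator check.
final class BlockedOperatorScan {
    enum State { ACTIVE, INACTIVE }

    static final class Op {
        final int id;
        final State state;
        final long lastWindowIdChangeMillis;

        Op(int id, State state, long lastWindowIdChangeMillis) {
            this.id = id;
            this.state = state;
            this.lastWindowIdChangeMillis = lastWindowIdChangeMillis;
        }
    }

    // Returns the ids of operators whose window id has not changed within
    // the timeout. INACTIVE operators (for example, ones that were shut
    // down) are skipped, so they can never be reported as blocked.
    static List<Integer> findBlocked(List<Op> ops, long nowMillis, long timeoutMillis) {
        List<Integer> blocked = new ArrayList<>();
        for (Op op : ops) {
            if (op.state == State.INACTIVE) {
                continue;
            }
            if (nowMillis - op.lastWindowIdChangeMillis > timeoutMillis) {
                blocked.add(op.id);
            }
        }
        return blocked;
    }

    public static void main(String[] args) {
        List<Op> ops = new ArrayList<>();
        ops.add(new Op(1, State.INACTIVE, 0L));     // finished and undeployed
        ops.add(new Op(2, State.ACTIVE, 0L));       // genuinely stuck
        ops.add(new Op(3, State.ACTIVE, 90_000L));  // recently advanced
        System.out.println(findBlocked(ops, 100_000L, 60_000L));
    }
}
```

In this model operator 1 is as stale as operator 2, but only operator 2 is reported, which is the behavior the issue asks for.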




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (APEXCORE-703) Window processing timeout for finished/undeployed container

2017-04-16 Thread Thomas Weise (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970436#comment-15970436
 ] 

Thomas Weise commented on APEXCORE-703:
---

Perhaps one test that verifies that the operator is marked INACTIVE 
(StreamingContainerManagerTest.testOperatorShutdown) and another that verifies that 
the INACTIVE operator is included when the container is scheduled for restart 
(see StreamingContainerManagerTest and scm.scheduleContainerRestart).

> Window processing timeout for finished/undeployed container
> ---
>
> Key: APEXCORE-703
> URL: https://issues.apache.org/jira/browse/APEXCORE-703
> Project: Apache Apex Core
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Daniel Halperin
>Assignee: Vlad Rozov
>


[jira] [Updated] (APEXCORE-654) Recovery window is not updated when Delay Operator is used along with Partitioned Operators

2017-04-16 Thread Thomas Weise (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXCORE-654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise updated APEXCORE-654:
--
Fix Version/s: 3.6.0

> Recovery window is not updated when Delay Operator is used along with 
> Partitioned Operators
> ---
>
> Key: APEXCORE-654
> URL: https://issues.apache.org/jira/browse/APEXCORE-654
> Project: Apache Apex Core
>  Issue Type: Bug
>Affects Versions: 3.5.0
> Environment: Hadoop 2.7.2
> Apache Apex 3.5.0
> Apache Apex Malhar 3.6.0
>Reporter: Ambarish Pande
>Assignee: Bhupesh Chawda
>  Labels: DelayOperator
> Fix For: 3.6.0
>
> Attachments: ProblemDag.png
>
>
> Checkpointing is not happening when DefaultDelayOperator is used in a DAG in 
> which some upstream operators are Partitioned.
> When used without partitioning, I can see the operators being check-pointed 
> properly.
> Here is the link of the App source code and also the built apa file.
> https://github.com/ambarishpande/delay-operator-test





[jira] [Commented] (APEXCORE-703) Window processing timeout for finished/undeployed container

2017-04-16 Thread Vlad Rozov (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970413#comment-15970413
 ] 

Vlad Rozov commented on APEXCORE-703:
-

I'll open a PR. Do you have a suggestion for additional unit tests?

> Window processing timeout for finished/undeployed container
> ---
>
> Key: APEXCORE-703
> URL: https://issues.apache.org/jira/browse/APEXCORE-703
> Project: Apache Apex Core
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Daniel Halperin
>Assignee: Vlad Rozov
>


[jira] [Assigned] (APEXCORE-703) Window processing timeout for finished/undeployed container

2017-04-16 Thread Vlad Rozov (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXCORE-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Rozov reassigned APEXCORE-703:
---

Assignee: Vlad Rozov

> Window processing timeout for finished/undeployed container
> ---
>
> Key: APEXCORE-703
> URL: https://issues.apache.org/jira/browse/APEXCORE-703
> Project: Apache Apex Core
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Daniel Halperin
>Assignee: Vlad Rozov
>


[jira] [Commented] (APEXCORE-703) Window processing timeout for finished/undeployed container

2017-04-16 Thread Thomas Weise (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970411#comment-15970411
 ] 

Thomas Weise commented on APEXCORE-703:
---

I looked at it also, and what you suggest should work. I want to check a bit more 
closely to confirm that they will be included in the redeploy when there is a 
container failure. I will put up a PR if that is confirmed, unless you want to 
work on it.

> Window processing timeout for finished/undeployed container
> ---
>
> Key: APEXCORE-703
> URL: https://issues.apache.org/jira/browse/APEXCORE-703
> Project: Apache Apex Core
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Daniel Halperin
>


[jira] [Commented] (APEXCORE-703) Window processing timeout for finished/undeployed container

2017-04-16 Thread Vlad Rozov (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970407#comment-15970407
 ] 

Vlad Rozov commented on APEXCORE-703:
-

I don't see why INACTIVE operators cannot be redeployed and marked as ACTIVE 
in the case of a recovery. They are still part of the plan anyway.

> Window processing timeout for finished/undeployed container
> ---
>
> Key: APEXCORE-703
> URL: https://issues.apache.org/jira/browse/APEXCORE-703
> Project: Apache Apex Core
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Daniel Halperin
>


[jira] [Resolved] (APEXCORE-678) Shutdown of application should start from input nodes

2017-04-16 Thread Bhupesh Chawda (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXCORE-678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhupesh Chawda resolved APEXCORE-678.
-
Resolution: Fixed

> Shutdown of application should start from input nodes
> -
>
> Key: APEXCORE-678
> URL: https://issues.apache.org/jira/browse/APEXCORE-678
> Project: Apache Apex Core
>  Issue Type: Bug
>Reporter: Bhupesh Chawda
>Assignee: Bhupesh Chawda
> Fix For: 3.6.0
>
>
> Streaming container calls shutdown() for all nodes instead of just input 
> nodes.
> {code}
>   private void stopInputNodes()
>   {
>     for (Entry e : nodes.entrySet()) {
>       Node node = e.getValue();
>       if (node instanceof InputNode) {
>         final Thread thread = e.getValue().context.getThread();
>         if (thread == null || !thread.isAlive()) {
>           continue;
>         }
>       }
>       node.shutdown(true);
>     }
>   }
> {code}
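
In the quoted method, the `node.shutdown(true)` call sits outside the `instanceof InputNode` check, so it runs for every node. The following is a minimal sketch of the intended behavior, not the change merged for this issue: the `Node` and `InputNode` types here are simplified stand-ins, and the `threadAlive` flag is a hypothetical replacement for the thread lookup in the original.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-ins for the Apex node types; all names are hypothetical.
final class InputNodeShutdown {
    static class Node {
        final boolean threadAlive;  // stands in for the thread == null / isAlive() check
        boolean shutdownCalled;

        Node(boolean threadAlive) {
            this.threadAlive = threadAlive;
        }

        void shutdown() {
            shutdownCalled = true;
        }
    }

    static final class InputNode extends Node {
        InputNode(boolean threadAlive) {
            super(threadAlive);
        }
    }

    // Calls shutdown() only on input nodes whose thread is still alive.
    // All other nodes are left untouched.
    static void stopInputNodes(Map<Integer, Node> nodes) {
        for (Map.Entry<Integer, Node> e : nodes.entrySet()) {
            Node node = e.getValue();
            if (node instanceof InputNode && node.threadAlive) {
                node.shutdown();
            }
        }
    }

    public static void main(String[] args) {
        Map<Integer, Node> nodes = new LinkedHashMap<>();
        nodes.put(1, new InputNode(true));   // live input node
        nodes.put(2, new InputNode(false));  // input node whose thread has ended
        nodes.put(3, new Node(true));        // non-input node
        stopInputNodes(nodes);
        for (Map.Entry<Integer, Node> e : nodes.entrySet()) {
            System.out.println(e.getKey() + " shutdownCalled=" + e.getValue().shutdownCalled);
        }
    }
}
```

Only the live input node is shut down in this model; downstream nodes are expected to stop on their own once their inputs end.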





[jira] [Commented] (APEXCORE-678) Shutdown of application should start from input nodes

2017-04-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970283#comment-15970283
 ] 

ASF GitHub Bot commented on APEXCORE-678:
-

Github user asfgit closed the pull request at:

https://github.com/apache/apex-core/pull/509


> Shutdown of application should start from input nodes
> -
>
> Key: APEXCORE-678
> URL: https://issues.apache.org/jira/browse/APEXCORE-678
> Project: Apache Apex Core
>  Issue Type: Bug
>Reporter: Bhupesh Chawda
>Assignee: Bhupesh Chawda
> Fix For: 3.6.0
>
>


[GitHub] apex-core pull request #509: APEXCORE-678 Fixed shutdown of input nodes in S...

2017-04-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/apex-core/pull/509




Re: release blockers

2017-04-16 Thread Pramod Immaneni
Since the release is an infrequent event and also not on a fixed schedule,
I think it is good to give the community enough time to make their issue
preferences known and also to discuss the merits and demerits of
including these in the release on an individual basis. From past
experience, the releases haven't dragged on, so I don't think there is
currently a need to set a time limit unless we start seeing evidence to the
contrary. There should, however, be some common-sense rules, such as not
proposing new items late in the discussion process unless they fall
into the must-fix categories listed above by Vlad.

Having said that, I think the current release proposal is in the right
spirit and the discussions happening there are following the same.

Thanks

On Sat, Apr 15, 2017 at 11:14 PM, Vlad Rozov 
wrote:

> Please propose a reasonable time. IMO, there is always something "almost
> ready" to be committed that contributors will want to squeeze into a
> release. The request to include something in a release should point to an
> open PR, not to a JIRA. Everything else may optionally be included in a
> release if its PR is merged by the time the RC is cut.
>
> Since graduation, Apex core has been released approximately every 5 months
> (3.4.0 in late June, 3.5.0 in early December), so this is not sudden.
> While there is no fixed release schedule and it is up to the community
> to decide when to release, there is usually a reasonable amount of work
> done by committers every 5 months to make an Apex core release.
>
> Thank you,
>
> Vlad
>
>
> On 4/15/17 22:08, Thomas Weise wrote:
>
>> I think in absence of a fixed time release schedule it is actually
>> reasonable to allow for more than a week to discuss an upcoming release
>> with new features (not patch releases). This gives sufficient time for
>> contributors to react and possibly get their ducks in a row. I don't think
>> that there should be silence for several months and then suddenly someone
>> pops up ready to pull the trigger.
>>
>> Thomas
>>
>>
>> On Sat, Apr 15, 2017 at 9:52 PM, Vlad Rozov wrote:
>>
>>> I believe it should follow standard Apache voting rules and timing policy.
>>> When somebody proposes a release and there are no objections (-1), once
>>> voting is over, the RC can be cut and submitted for the vote. IMO, it is
>>> reasonable to assume that "way ahead" is one week and not one month.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>>
>>> On 4/15/17 16:11, Thomas Weise wrote:
>>>
>>>> There is a need for the community to agree on timing/scope of a release.
>>>> That discussion should take place way ahead of cutting it. It is
>>>> appropriate and desirable that folks think about and express their
>>>> preferences on what they would like to see as part of the next release.
>>>>
>>>> It may be a priority for someone else to fix a particular issue, even if
>>>> you would not see it that way. What is important is that everyone who
>>>> suggests including additional things makes a convincing case for it and
>>>> is able to complete the work in time.
>>>>
>>>> Once there is consensus on the scope, I would largely agree with the
>>>> policy on what is allowed to delay or stop a release, as otherwise it
>>>> will never go out.
>>>>
>>>> Thomas
>>>>
>>>>
>>>> On Fri, Apr 14, 2017 at 2:33 PM, Vlad Rozov wrote:
>>>>
>>>>> As both 692 & 687 are already resolved, we should focus less on those
>>>>> particular bugs and more on release policies in general. IMO only the
>>>>> following issues should stop the release:
>>>>>
>>>>> 1. Apache license issues
>>>>>    * Source code is not properly licensed. This is quite unlikely, as
>>>>>      we have a check in place for known file types. (The problem may
>>>>>      be with new types not covered by the build.)
>>>>>    * Usage of Category X license dependencies
>>>>> 2. Backward compatibility issues
>>>>>    * Existing API is covered by semantic versioning, but it may not
>>>>>      be sufficient
>>>>>    * New API introduced that is not marked as Evolving
>>>>>    * Regression in existing functionality
>>>>> 3. Security vulnerabilities
>>>>> 4. JIRAs marked as Blocker (likely to fall into the 3 previous
>>>>>    categories anyway, but possibly some critical bugs may fall into
>>>>>    this category as well)
>>>>>
>>>>> Everything else is a nice-to-have and should be included in a release
>>>>> if a PR is ready and the PR review is complete. This applies equally to
>>>>> bug fixes, new feature implementations, and documentation issues. The
>>>>> apex.apache.org web site update is outside of the release cycle and can
>>>>> be done independently of a release.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Vlad
>>>>>
>>>>>
>>>>> On 4/14/17 08:46, Dean Lockgaard wrote:
>>>>>
>>>>>> Vlad,
>>>>>>
>>>>>> Here is my thought process about these tickets.  Both 692 (Apex dev
>>>>>> setup

Re: release blockers

2017-04-16 Thread Vlad Rozov
Please propose a reasonable time. IMO, there is always something "almost
ready" to be committed that contributors will want to squeeze into a
release. A request to include something in a release should point to
an open PR, not to a JIRA. Everything else may optionally be included
in a release if its PR is merged by the time the RC is cut.


Since graduation, Apex core has been released approximately every 5 months
(3.4.0 in late June, 3.5.0 in early December), so this is not sudden.
While there is no fixed-time release schedule and it is up to the
community to decide when to release, committers usually complete a
reasonable amount of work every 5 months to make an Apex core release.


Thank you,

Vlad

On 4/15/17 22:08, Thomas Weise wrote:

I think in absence of a fixed time release schedule it is actually
reasonable to allow for more than a week to discuss an upcoming release
with new features (not patch releases). This gives sufficient time for
contributors to react and possibly get their ducks in a row. I don't think
that there should be silence for several months and then suddenly someone
pops up ready to pull the trigger.

Thomas


On Sat, Apr 15, 2017 at 9:52 PM, Vlad Rozov wrote:


I believe it should be standard Apache voting rules and timing policy.
When somebody proposes a release and there are no objections (-1), once
voting is over, the RC can be cut and submitted for the vote. IMO, it is
reasonable to assume that "way ahead" is one week and not one month.

Thank you,

Vlad


On 4/15/17 16:11, Thomas Weise wrote:


There is a need for the community to agree on timing/scope of a release.
That discussion should take place way ahead of cutting it. It is
appropriate and desirable that folks think about and express their
preferences on what they would like to see as part of the next release.

It may be a priority for someone else to fix a particular issue, even if
you would not see it that way. What is important is that everyone who
suggests to include additional things makes a convincing case for it and
is able to complete work in time.

Once there is consensus on the scope, I would largely agree with the
policy on what is allowed to delay or stop a release, as otherwise it
will never go out.

Thomas


On Fri, Apr 14, 2017 at 2:33 PM, Vlad Rozov wrote:

As both 692 & 687 are already resolved, we should focus less on those
particular bugs and more on release policies in general. IMO only the
following issues should stop the release:

1. Apache license issues
   * Source code is not properly licensed. (This is quite unlikely, as
     we have a check in place for known file types. The problem may be
     with new types not covered by the build.)
   * Usage of Category X license dependencies
2. Backward compatibility issues
   * Existing API is covered by semantic versioning, but it may not
     be sufficient
   * New API introduced that is not marked as Evolving
   * Regression in existing functionality
3. Security vulnerabilities
4. JIRAs marked as Blocker (likely to fall into the 3 previous
   categories anyway, but possibly some critical bugs may fall into
   this category as well)

Everything else is a nice-to-have and should be included in a release
if a PR is ready and PR review is complete. This applies equally to bug
fixes, new feature implementations and documentation issues. The
apex.apache.org web site update is outside of the release cycle and can
be done independently of a release.

Thank you,

Vlad


On 4/14/17 08:46, Dean Lockgaard wrote:

Vlad,

Here is my thought process about these tickets.  Both 692 (Apex dev setup
sandbox section to reference Apex website downloads page) and 687 (update
supported Hadoop v2.6 in Apex docs) are Apex documentation issues, and so
they are part of the Apex release process.  Furthermore, 692 directly
references 693 (update Apex website downloads page with cleaned up and
augmented list of 3rd party binaries), so it makes sense to have 693
updated as well, though of course I agree that it is not a part of Apex
core release nor a blocker for the release.

Thanks,
Dean



On Fri, Apr 14, 2017 at 11:27 AM, Vlad Rozov wrote:

Dean,

692 and 693 are web site documentation issues and are not part of the
Apex core 3.6.0 release. 687 can be covered in the release README (known
issues).

Thank you,

Vlad

On 4/13/17 14:11, Dean Lockgaard wrote:

I'd like to request that 687, 692 and 693 be included in the 3.6.0
release.  I will send PRs for these shortly.

Thanks,
Dean



On Fri, Apr 14, 2017 at 5:05 AM, Amol Kekre wrote:

+1 to cut a release

Thks

Amol




On Thu, Apr 13, 2017 at 9:22 AM, Pramod Immaneni <pra...@datatorrent.com> wrote:

+1

I would like to see 699 and 700 addressed as well.

On Wed, Apr 12, 2017 at 10:16 PM, Tushar Gosavi <