[GitHub] helix pull request: [Helix-612] Bump up the version of zkClient an...

2016-01-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/helix/pull/36


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] helix pull request: [Helix-612] Bump up the version of zkClient an...

2016-01-11 Thread kishoreg
Github user kishoreg commented on the pull request:

https://github.com/apache/helix/pull/36#issuecomment-170707954
  
LGTM.




[GitHub] helix pull request: [Helix-612] Bump up the version of zkClient an...

2016-01-11 Thread jicongrui
Github user jicongrui commented on the pull request:

https://github.com/apache/helix/pull/36#issuecomment-170704564
  
Any update for this request?




[jira] [Commented] (HELIX-621) Missing listener notification of LiveInstances changes (and possibly other state change)

2016-01-11 Thread kishore gopalakrishna (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092654#comment-15092654
 ] 

kishore gopalakrishna commented on HELIX-621:
-

Can you show the code? Spectators do not listen to LiveInstances; they only 
listen to ExternalView unless you add a listener explicitly.



> Missing listener notification of LiveInstances changes (and possibly other 
> state change)
> 
>
> Key: HELIX-621
> URL: https://issues.apache.org/jira/browse/HELIX-621
> Project: Apache Helix
> Issue Type: Bug
> Components: helix-core
> Affects Versions: 0.6.5
> Reporter: Marco P.
>
> I noticed sometimes my LiveInstanceChangeListener was not notified of an 
> instance disconnecting.
> Digging a little bit I found out:
>  - A reliable way to consistently reproduce this problem
>  - The problem does not seem to be limited to LiveInstances, it can happen to 
> other listeners using the same strategy
> This is bad as an application relies on notifications, and its view of the 
> system (LiveInstances or else) can get very outdated.
> The problem at the core is this logic:
> 1) Set watch W on some path P
> 2) Event E1 modifies P triggering W
> 3) The callback for W re-sets W on P
> If, however, a second event E2 modifies P between steps 2 and 3, W will not 
> trigger (until P is modified again).
> An example of why this is bad:
>  - 2 live instances L1, L2 and a spectator S watching them.
> 1) L1 disconnects
> 2) S's watch on LIVEINSTANCES fires
> 3) S reads the children of LIVEINSTANCES: {L2}
> 4) L2 disconnects
> 5) S notifies LiveInstanceChangeListeners and goes back to watching 
> LIVEINSTANCES
> The application receives a notification that the live instances now consist 
> of {L2}. 
> And no further notification until another instance joins.
> The reality is that no instances are live.
> Again, this is not limited to LIVEINSTANCES, although that's the one I can 
> reliably reproduce.
> Fixing this is not trivial: it requires firing the watch again when 
> re-setting it IF the version of the watched node changed since the last time 
> the watch fired.
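
The sequence in the report can be modeled in memory to show what the proposed version check would buy. The sketch below is only an illustration of the steps as the reporter describes them. It is not a claim about how ZooKeeper or Helix actually behaves, and every name in it (WatchRaceSketch, Node, Spectator, beforeRearm) is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** In-memory model of the reported sequence; not the ZooKeeper or Helix API. */
final class WatchRaceSketch {

    /** A watched path P with children, a version counter, and a one-shot watch W. */
    static final class Node {
        private final Set<String> children = new TreeSet<>();
        private int version = 0;
        private Runnable watch; // cleared as soon as it fires

        Node(String... initialChildren) {
            for (String child : initialChildren) {
                children.add(child);
            }
        }

        void setWatch(Runnable w) { watch = w; }
        int version() { return version; }
        Set<String> children() { return new TreeSet<>(children); }

        /** Removes a child, bumps the version, and fires the watch if one is set. */
        void remove(String child) {
            if (!children.remove(child)) {
                return; // nothing changed, so no version bump and no event
            }
            version++;
            Runnable w = watch;
            watch = null;
            if (w != null) {
                w.run();
            }
        }
    }

    /** Spectator S; recheckVersion toggles the fix proposed in the report. */
    static final class Spectator {
        final List<Set<String>> notifications = new ArrayList<>();
        Runnable beforeRearm = () -> { }; // hook used to inject event E2
        private final Node node;
        private final boolean recheckVersion;

        Spectator(Node node, boolean recheckVersion) {
            this.node = node;
            this.recheckVersion = recheckVersion;
            node.setWatch(this::onWatchFired);
        }

        void onWatchFired() {
            int versionRead = node.version();
            Set<String> snapshot = node.children();
            beforeRearm.run();                 // E2 lands here, while no watch is set
            notifications.add(snapshot);
            node.setWatch(this::onWatchFired); // re-set W on P
            if (recheckVersion && node.version() != versionRead) {
                onWatchFired();                // P changed in the gap: fire again
            }
        }
    }

    /** Runs the L1/L2 scenario and returns every notification S delivered. */
    static List<Set<String>> run(boolean recheckVersion) {
        Node liveInstances = new Node("L1", "L2");
        Spectator s = new Spectator(liveInstances, recheckVersion);
        s.beforeRearm = () -> liveInstances.remove("L2"); // E2
        liveInstances.remove("L1");                       // E1
        return s.notifications;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints [[L2]]
        System.out.println(run(true));  // prints [[L2], []]
    }
}
```

Without the check, the last thing the listener hears is {L2}, even though no instance is live. With the check, a second notification carrying the empty set follows.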



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HELIX-622) Add new resource configuration option to allow resource to disable emitting monitoring bean.

2016-01-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092582#comment-15092582
 ] 

ASF GitHub Bot commented on HELIX-622:
--

GitHub user lei-xia opened a pull request:

https://github.com/apache/helix/pull/41

A few more task framework improvement

This pull request includes four changes, each described below:

1.  [HELIX-622] Add new resource configuration option to allow resource to 
disable emitting monitoring bean.
  Description:
Helix creates a set of metrics for each resource. Since a job is treated 
as a regular resource by Helix, each job emits a new set of metrics to our 
internal monitoring system. But these metrics are dynamic date metrics; most of 
them are empty, it is meaningless to put alerts on them, and they are barely 
used in practice, so they merely consume the metric name space.

  On the other hand, we still need some stable metrics (a fixed set of 
metric names) for the operations team to monitor the queue and job running status.

  As a short-term solution, we can add an option in JobConfig to enable 
emitting a metric for this job; by default, this is disabled. As a next step, 
we will need to add a new set of metrics for jobs and workflows.


2.  Do not expose internal configuration field names. These field names 
should be used only by Helix. Clients should always use JobConfig.Builder to 
create a JobConfig, and should construct a JobConfig from a HelixProperty before 
reading fields from it. Clients are not recommended to interpret fields from a 
ZNRecord directly.

3. Clean up integration tests for the task framework and move shared parts to 
TaskTestUtil.java.

4.  Job hangs if the target resource no longer exists at the time the job 
is scheduled.
  Problem: When the job gets scheduled, if the target resource does not 
exist any more (e.g., the database has already been deleted but the backup job 
is still there), the job is stuck and all remaining jobs are stuck.
  Change: If the target resource of a job does not exist, the job should be 
failed immediately.
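
To make the behavior change in item 4 concrete, here is a minimal sketch of a queue that fails a job right away when its target resource is missing, instead of letting it block the jobs behind it. This is not the code in the pull request. FailFastSchedulingSketch, Job, JobState and scheduleNext are hypothetical names chosen for illustration, and the real task framework has more states and moving parts.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.Set;

/** Hypothetical model of a job queue; none of these names come from Helix. */
final class FailFastSchedulingSketch {

    enum JobState { NOT_STARTED, IN_PROGRESS, FAILED }

    static final class Job {
        final String name;
        final String targetResource;
        JobState state = JobState.NOT_STARTED;

        Job(String name, String targetResource) {
            this.name = name;
            this.targetResource = targetResource;
        }
    }

    /**
     * Starts the first runnable job in the queue. Jobs whose target resource
     * is missing are marked FAILED and skipped rather than left blocking.
     * Returns the started job, or null if nothing could be started.
     */
    static Job scheduleNext(Deque<Job> queue, Set<String> existingResources) {
        while (!queue.isEmpty()) {
            Job job = queue.poll();
            if (!existingResources.contains(job.targetResource)) {
                job.state = JobState.FAILED; // fail immediately, keep draining
                continue;
            }
            job.state = JobState.IN_PROGRESS;
            return job;
        }
        return null;
    }

    public static void main(String[] args) {
        Job backupDb1 = new Job("backup-db1", "db1"); // db1 was already deleted
        Job backupDb2 = new Job("backup-db2", "db2");
        Deque<Job> queue = new ArrayDeque<>();
        queue.add(backupDb1);
        queue.add(backupDb2);

        Job started = scheduleNext(queue, Collections.singleton("db2"));
        System.out.println(backupDb1.state);                    // prints FAILED
        System.out.println(started.name + " " + started.state); // prints backup-db2 IN_PROGRESS
    }
}
```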

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lei-xia/helix helix-0.6.x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/41.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #41


commit 32c463d9156017f048fe53830872efc26e99b7db
Author: Lei Xia 
Date:   2016-01-09T01:14:00Z

[HELIX-622] Add new resource configuration option to allow resource to 
disable emmiting monitoring bean.

commit a108cfb348b8ea0fdac3764b6c1672755fe64489
Author: Lei Xia 
Date:   2016-01-09T01:25:09Z

[HELIX-623] Do not expose internal configuration field name. Client should 
use JobConfig.Builder to create jobConfig.

commit f72627c7d2c7aa9b31fa69c5832226396995c20a
Author: Lei Xia 
Date:   2016-01-09T01:27:01Z

Clean up unit tests for task framework.

commit 8e2bf24c293afebd83076d9ee810cef4e43ab915
Author: Lei Xia 
Date:   2016-01-09T01:28:17Z

[HELIX-618]  Job hung if the target resource does not exist anymore at the 
time when it is scheduled.




> Add new resource configuration option to allow resource to disable emitting 
> monitoring bean.
> 
>
> Key: HELIX-622
> URL: https://issues.apache.org/jira/browse/HELIX-622
> Project: Apache Helix
> Issue Type: Bug
> Reporter: Lei Xia
> Assignee: Lei Xia
>
> Helix creates a set of metrics for each resource. Since a job is treated as a 
> regular resource by Helix, each job emits a new set of metrics to 
> ingraph. But these metrics are dynamic date metrics; most of them are empty, 
> it is meaningless to put alerts on them, and they are barely used in 
> practice. 
> On the other hand, we still need some stable metrics (a fixed set of 
> metric names) for the operations team to monitor the queue and job running 
> status.
> As a short-term solution, we can add an option in JobConfig to enable emitting 
> a metric for this job; by default, this is disabled. As a next step, we will 
> need to add a new set of metrics for jobs and workflows.
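
As a rough illustration of the opt-in described above, the sketch below shows a per-job flag that defaults to disabled, with metrics registered only for jobs that turn it on. It does not use the real JobConfig API. JobSettings, its Builder, enableMonitoring and MetricRegistry are hypothetical stand-ins, and the job names are invented.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch; JobSettings and MetricRegistry are not Helix classes. */
final class JobMonitoringSketch {

    static final class JobSettings {
        final String jobName;
        final boolean monitoringEnabled;

        private JobSettings(Builder b) {
            this.jobName = b.jobName;
            this.monitoringEnabled = b.monitoringEnabled;
        }

        static final class Builder {
            private final String jobName;
            private boolean monitoringEnabled = false; // disabled by default

            Builder(String jobName) {
                this.jobName = jobName;
            }

            Builder enableMonitoring() {
                this.monitoringEnabled = true;
                return this;
            }

            JobSettings build() {
                return new JobSettings(this);
            }
        }
    }

    /** Registers a metric only for jobs that explicitly opted in. */
    static final class MetricRegistry {
        private final Set<String> registered = new TreeSet<>();

        void onJobCreated(JobSettings job) {
            if (job.monitoringEnabled) {
                registered.add(job.jobName);
            }
        }

        Set<String> registeredJobs() {
            return new TreeSet<>(registered);
        }
    }

    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();
        registry.onJobCreated(new JobSettings.Builder("backup-2016-01-11").build());
        registry.onJobCreated(new JobSettings.Builder("queue-health").enableMonitoring().build());
        System.out.println(registry.registeredJobs()); // prints [queue-health]
    }
}
```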





[GitHub] helix pull request: A few more task framework improvement

2016-01-11 Thread lei-xia
GitHub user lei-xia opened a pull request:

https://github.com/apache/helix/pull/41

A few more task framework improvement




[jira] [Commented] (HELIX-621) Missing listener notification of LiveInstances changes (and possibly other state change)

2016-01-11 Thread Marco P. (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092401#comment-15092401
 ] 

Marco P. commented on HELIX-621:


[~k4j] From your explanation it would seem that it's not possible for the 
application to miss a notification (i.e. permanently be out of sync with 
LiveInstances changes), but one way or another this is happening pretty 
reliably for me.

In other words, my explanation for it was incorrect, but the problem is there.

You can try the following:

1) Start 1 spectator. Make it print notification for LiveInstances changes.
2) Start 2 (or more) participants
3) Kill all participants at more or less the same time
4) At this point you'd expect the spectator to report that no participant is 
alive. However, in some cases it will not: it will say someone is still 
up, and no subsequent notification will correct that.

This requires some luck, but you can repeat from step (2) and it shouldn't take 
long before you see the problem.







