Re: DISCUSS : Monitor the neighbour JVM using neihbour's member-timeout (GEODE-3411)

2017-09-06 Thread Anthony Baker
Are you intending to use server groups to divide the cluster into different 
logical groupings?  Unless the groups host entirely separate data sets, I could 
see an asymmetric response pattern depending on whether the primary is hosted 
on a member with a short or a long timeout.

Anthony


RE: DISCUSS : Monitor the neighbour JVM using neihbour's member-timeout (GEODE-3411)

2017-09-06 Thread Aravind Musigumpula
Hi Udo,

Regarding your question: "If you feel that the member timeout is too short for 
some members, why don't you increase the current member timeout?"

Yes, I feel that member-timeout is too short for some members, and I want to 
increase it for those members only. But a member's configured timeout is not 
used when that member itself is being monitored; instead, the increased timeout 
is used when that member monitors some other member.

So if I want one member to stay in the view a little longer even when it is not 
responding, I currently have to increase the timeout of all the members.

Do you mean that we should increase the current member timeout for all the 
members?


Thanks,
Aravind Musigumpula 



Re: DISCUSS : Monitor the neighbour JVM using neihbour's member-timeout (GEODE-3411)

2017-09-01 Thread Udo Kohlmeyer

Hi there Aravind,

I have a singular problem with this approach.

If some members are designated to do more work and don't have time to 
respond to the cluster that they are alive within the current member 
timeout, then they are not available to accept data. That means they 
are not effective members of the cluster, and we cannot count on them to 
host data or replicates.


This setting is there to safeguard the cluster against non-responsive 
members that cause the whole cluster to be unhealthy if left unchecked 
for too long. This can lead to potential data loss.


If you feel that the member timeout is too short for some members, why 
don't you increase the current member timeout?


My opinion is a -1 for changing the current behavior.

--Udo


RE: DISCUSS : Monitor the neighbour JVM using neihbour's member-timeout (GEODE-3411)

2017-09-01 Thread Aravind Musigumpula
Hi Brian,

This will help when a user has some member doing heavier work than the others; 
in that case we need to give that member some extra time.

Thanks,
Aravind Musigumpula 




Re: DISCUSS : Monitor the neighbour JVM using neihbour's member-timeout (GEODE-3411)

2017-08-31 Thread Brian Baynes
Hi, Aravind.

Can you help me understand why this might be a useful feature for Geode?  I
see that your needs require it, but why would users in general want to
allow longer timeouts for some members?  This is a significant change with
backward-compatibility implications, so it would be good for the community to
understand the potential benefit.

Thanks!
Brian



DISCUSS : Monitor the neighbour JVM using neihbour's member-timeout (GEODE-3411)

2017-08-28 Thread Aravind Musigumpula
Hi Team,

We have a requirement to configure different member timeouts for different 
members, because we need some members to survive in the view longer than the 
other members before being kicked out of the view when they aren't responding.


1.   With the current monitoring system, it is not possible to determine 
when a member will be kicked out of the view if we configure different 
member-timeouts for some members.

2.   This is because, when a member does not respond to heartbeat requests, 
the member that is monitoring it initiates a check-member request.

3.   In this check-member request, the monitoring member pings the 
non-responding member and waits for the monitoring member's own 
member-timeout for a response.

4.   If there is still no response, it initiates a final suspect request to 
the coordinator, and the coordinator performs the final check, waiting for 
the coordinator's member-timeout.

5.   If the coordinator does not get any response, it removes the 
non-responding member from the view and publishes the new view.

6.   So the time taken to remove a member depends on the timeouts of its 
monitoring member and of the coordinator. But which member does the 
monitoring depends on the view, and that may change from time to time.

So if the monitoring member, when checking a member, waits for the 
non-responding member's timeout instead of its own member-timeout, then the 
time at which the non-responding member is removed from the view depends only 
on its own member-timeout and the coordinator's member-timeout.
Hence we can configure a different member-timeout for each member that needs 
one.
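
To make the timing argument above concrete, here is a minimal sketch in Java. 
It is not Geode code and does not use any Geode API: the class and method 
names are hypothetical, the rule that each member is monitored by the member 
just before it in the view is an assumption made only for illustration, and 
the time spent missing heartbeats before the check starts is ignored. It only 
adds up the two waits described in steps 3 to 5.

```java
import java.util.Arrays;
import java.util.List;

public class MemberTimeoutSketch {

    /** A member of the view with its configured member-timeout in milliseconds. */
    static final class Member {
        final String name;
        final long timeoutMillis;

        Member(String name, long timeoutMillis) {
            this.name = name;
            this.timeoutMillis = timeoutMillis;
        }
    }

    // ASSUMPTION (illustration only): a member is monitored by the member
    // just before it in the view, wrapping around at the start of the list.
    static Member monitorOf(List<Member> view, int index) {
        return view.get((index - 1 + view.size()) % view.size());
    }

    // Current behavior as described in steps 3-5: the monitoring member waits
    // for its OWN member-timeout, then the coordinator waits for its own.
    static long currentRemovalMillis(List<Member> view, int index, Member coordinator) {
        return monitorOf(view, index).timeoutMillis + coordinator.timeoutMillis;
    }

    // Proposed behavior: the monitoring member waits for the SUSPECT member's
    // member-timeout, then the coordinator waits for its own.
    static long proposedRemovalMillis(List<Member> view, int index, Member coordinator) {
        return view.get(index).timeoutMillis + coordinator.timeoutMillis;
    }

    public static void main(String[] args) {
        Member coordinator = new Member("coordinator", 5000);
        Member fast = new Member("fast", 5000);
        Member slow = new Member("slow", 20000);
        Member heavy = new Member("heavy", 15000); // the member we want to keep longer

        // Same members, two different views: "heavy" is at index 2 in both,
        // but its monitoring member differs.
        List<Member> view1 = Arrays.asList(coordinator, fast, heavy, slow);
        List<Member> view2 = Arrays.asList(coordinator, slow, heavy, fast);

        // Current behavior: the result depends on who happens to be monitoring.
        System.out.println(currentRemovalMillis(view1, 2, coordinator)); // 10000
        System.out.println(currentRemovalMillis(view2, 2, coordinator)); // 25000

        // Proposed behavior: the result is the same in both views.
        System.out.println(proposedRemovalMillis(view1, 2, coordinator)); // 20000
        System.out.println(proposedRemovalMillis(view2, 2, coordinator)); // 20000
    }
}
```

Under these assumptions, the removal time for the same member moves between 
10 and 25 seconds when only the view changes, whereas with the proposed 
change it stays at 20 seconds in both views.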

I created a pull request based on the above scenario: 
https://github.com/apache/geode/pull/717

Is the above approach correct? Do we have any concerns around this area?
Please give your insights on this issue.

Thanks,
Aravind Musigumpula

This message and the information contained herein is proprietary and 
confidential and subject to the Amdocs policy statement,

you may review at https://www.amdocs.com/about/email-disclaimer