[ 
https://issues.apache.org/jira/browse/GEODE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Ramos updated GEODE-8374:
------------------------------
    Description: 
We have the following within our docs (point 4 
[here|https://geode.apache.org/docs/guide/112/managing/network_partitioning/how_network_partitioning_management_works.html]):
{noformat}
In the first phase, the membership coordinator sends out a view preparation 
message to all members and waits 12 seconds for a view preparation ack return 
message from each member. If the coordinator does not receive an ack message 
from a member within 12 seconds, the coordinator attempts to connect to the 
member’s failure-detection socket. If the coordinator cannot connect to the 
member’s failure-detection socket, the coordinator declares the member dead and 
starts the membership view protocol again from the beginning.
{noformat}
These 12 seconds refer to the {{viewAckTimeout}} property within the 
{{GMSJoinLeave}} class, which is calculated as follows:
{code:java|title=GMSJoinLeave.java|borderStyle=solid}
    long ackCollectionTimeout = config.getMemberTimeout() * 2 * 12437 / 10000;
    if (ackCollectionTimeout < 1500) {
      ackCollectionTimeout = 1500;
    } else if (ackCollectionTimeout > 12437) {
      ackCollectionTimeout = 12437;
    }
    ackCollectionTimeout = Long
        .getLong(GeodeGlossary.GEMFIRE_PREFIX + "VIEW_ACK_TIMEOUT", ackCollectionTimeout)
        .longValue();
    this.viewAckTimeout = ackCollectionTimeout;
{code}
So, the actual value of {{viewAckTimeout}} is {{member-timeout * 2 * 1.2437}} 
milliseconds (roughly 2.5 times {{member-timeout}}), but it can’t be lower than 
{{1.5}} seconds nor higher than {{12.437}} seconds (the 12 seconds quoted in the 
docs), unless the user configures the undocumented {{VIEW_ACK_TIMEOUT}} system 
property (for which I haven't found any tests or related documentation, meaning 
that _*it shouldn't be used at all as we don't know what the negative 
implications - if any - might be*_).
We should either remove the internal check and allow the user to fully 
configure this property ({{member-timeout * 2}} by default) or add better 
documentation about this internal timeout and why it shouldn't be changed 
outside of the fixed interval.
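
To make the clamping concrete, here is a minimal standalone sketch of the calculation quoted above. The class and method names are hypothetical (they are not part of Geode), the system property override is deliberately left out, and the sample {{member-timeout}} values are only illustrative. Assuming {{member-timeout}} is expressed in milliseconds with a default of 5000, the result lands exactly on the 12437 ms ceiling, which would explain the 12 seconds mentioned in the docs.

```java
// Hypothetical helper, not Geode code: reproduces only the arithmetic and
// clamping from the GMSJoinLeave snippet, without the system property override.
public class ViewAckTimeoutSketch {
    static final long MIN_MS = 1500;   // lower bound used in the snippet
    static final long MAX_MS = 12437;  // upper bound used in the snippet

    static long computeViewAckTimeout(long memberTimeoutMs) {
        // Same integer arithmetic as the snippet: member-timeout * 2 * 1.2437
        long timeout = memberTimeoutMs * 2 * 12437 / 10000;
        // Clamp into the fixed [1500, 12437] ms interval
        return Math.max(MIN_MS, Math.min(MAX_MS, timeout));
    }

    public static void main(String[] args) {
        long[] samples = {500, 1000, 3000, 5000, 10000};
        for (long memberTimeout : samples) {
            System.out.println("member-timeout=" + memberTimeout
                + " ms -> viewAckTimeout=" + computeViewAckTimeout(memberTimeout) + " ms");
        }
    }
}
```

With these inputs, only {{member-timeout}} values between roughly 603 ms and 5000 ms produce an unclamped result; anything outside that range is pinned to one of the two bounds.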

  was:
We have the following within our docs (point 4 
[here|https://geode.apache.org/docs/guide/112/managing/network_partitioning/how_network_partitioning_management_works.html]):

{noformat}
In the first phase, the membership coordinator sends out a view preparation 
message to all members and waits 12 seconds for a view preparation ack return 
message from each member. If the coordinator does not receive an ack message 
from a member within 12 seconds, the coordinator attempts to connect to the 
member’s failure-detection socket. If the coordinator cannot connect to the 
member’s failure-detection socket, the coordinator declares the member dead and 
starts the membership view protocol again from the beginning.
{noformat}

These 12 seconds refer to {{viewAckTimeout}} property within the 
{{GMSJoinLeave}} class, and it’s calculated as follows:
{code:title=GMSJoinLeave.java|borderStyle=solid}
    long ackCollectionTimeout = config.getMemberTimeout() * 2 * 12437 / 10000;
    if (ackCollectionTimeout < 1500) {
      ackCollectionTimeout = 1500;
    } else if (ackCollectionTimeout > 12437) {
      ackCollectionTimeout = 12437;
    }
    ackCollectionTimeout = Long
        .getLong(GeodeGlossary.GEMFIRE_PREFIX + "VIEW_ACK_TIMEOUT", 
ackCollectionTimeout)
        .longValue();
    this.viewAckTimeout = ackCollectionTimeout;
{code}

So, the actual value for the {{viewAckTimeout}} is {{member-timeout * 2}} 
seconds, but it can’t be lower than {{1.5}}, neither higher than {{12}}, unless 
the user configures the undocumented {{VIEW_ACK_TIMEOUT}} system property (for 
which I haven't found any tests nor anything related, meaning that _*it 
shouldn't be used at all as we don't know what the negative implications - if 
any - are*_).
We should either remove the internal check and allow the user to fully 
configure this property ({{member-timeout * 2}} by default) or add better 
documentation about this internal timeout and why it shouldn't be changed 
outside of the fixed interval.


> ViewAckTimeout Configuration
> ----------------------------
>
>                 Key: GEODE-8374
>                 URL: https://issues.apache.org/jira/browse/GEODE-8374
>             Project: Geode
>          Issue Type: Bug
>          Components: docs, membership
>            Reporter: Juan Ramos
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
