[jira] [Updated] (HELIX-26) Better support for handling network partition and process freeze

kishore gopalakrishna (JIRA) Tue, 29 Jan 2013 15:03:35 -0800

     [ https://issues.apache.org/jira/browse/HELIX-26?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


kishore gopalakrishna updated HELIX-26:
---------------------------------------

    Fix Version/s: 0.6.1-incubating
    
> Better support for handling network partition and process freeze
> ----------------------------------------------------------------
>
>                 Key: HELIX-26
>                 URL: https://issues.apache.org/jira/browse/HELIX-26
>             Project: Apache Helix
>          Issue Type: Improvement
>            Reporter: kishore gopalakrishna
>             Fix For: 0.6.1-incubating
>
>
> Handling network partitions is tricky in distributed systems. ZooKeeper lets
> us solve this up to a point through heartbeats, but that is not
> sufficient in large-scale systems with many nodes. One of the problems is
> that once the client detects a disconnect, which happens on the client side,
> the options are:
> 1. Put yourself in a paused state until you reconnect.
> 2. Continue whatever you are doing until notified of session expiry.
> Unfortunately, 1 is too aggressive and 2 is too passive. Since Helix comes
> with a centralized controller, it's possible to have a middle-ground
> solution where, once the participant receives a disconnect event, it checks
> with the coordinator(s)/peers to see whether it can continue operating.
> The challenge here is for the node to detect whether it belongs to the same
> partition as the coordinator. So its goal is to reach the controller; if it
> cannot reach the controller, it has to disable/fence itself.
> As of now, Helix simply reports whether it is disconnected from the
> cluster, and the user can choose either 1) or 2).
> This JIRA aims to investigate better ways to improve network partition
> detection.
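
The middle-ground option described above could be sketched roughly as follows. This is only an illustration, not Helix code: `PartitionGuard`, `Action`, the `controllerProbe` callback, and `maxAttempts` are all hypothetical names, and how a participant would actually reach the controller (direct RPC, a peer relay, etc.) is left open by the issue and is assumed here to be supplied by the caller.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch (not part of the Helix API): decides what a
 * participant should do after its ZooKeeper client reports a disconnect.
 */
final class PartitionGuard {
    enum Action { CONTINUE, FENCE }

    // Assumed to return true only if the controller answered the probe.
    private final BooleanSupplier controllerProbe;
    private final int maxAttempts;

    PartitionGuard(BooleanSupplier controllerProbe, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.controllerProbe = controllerProbe;
        this.maxAttempts = maxAttempts;
    }

    /** Call when a disconnect event is received. */
    Action onDisconnect() {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (probeSafely()) {
                // Same partition as the controller: keep operating.
                return Action.CONTINUE;
            }
        }
        // Controller unreachable after all attempts: fence ourselves.
        return Action.FENCE;
    }

    // A probe that throws is treated the same as an unreachable controller.
    private boolean probeSafely() {
        try {
            return controllerProbe.getAsBoolean();
        } catch (RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(new PartitionGuard(() -> true, 3).onDisconnect());
        System.out.println(new PartitionGuard(() -> false, 3).onDisconnect());
    }
}
```

A real implementation would presumably also bound each probe with a timeout and re-evaluate once the session reconnects or expires; those details are outside what the issue specifies.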

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
