Hi Aisling,

In my implementation of a Chord-like protocol for the Seeks Project, I did
notice this problem. My solution is to detect failure within the call to
'stabilize'. The loop tries every successor in the list until it gets an
answer (a response to a 'predecessor' call). Within this loop it is possible
to detect that the predecessor returned is our previous successor (now dead).
Next proceed with the call to 'notify'. Now, the sole 'bug' I could notice
in the Chord protocol in this case is that the 'notify' callback needs
to ping its current predecessor (it is enough to do it every x calls or so,
not in response to every 'notify' call). The new successor (your 2186) can 
then detect that its predecessor (your 2129) is dead and sets the correct 
node (your 2042) to its place.

Summing it up, it is the 'notify' call that triggers the ping from 2186 to 
2129, *after* 2042 has updated its successor to 2186.

Hope this helps.

Note that you could take a look at my code, but this is in alpha stage.
Let me know if you really need to dig in the code.

Em.

On Tue, Jun 29, 2010 at 10:55:16AM +0100, Aisling O' Driscoll wrote:
>    Hi Michael,
>     
>    Thanks for the reply. I found a paper "Bootstrapping Chord in Ad Hoc
>    Network: Not Going Anywhere for a While" that gave a detailed example and
>    that helped me alot in understanding the sequence.
>     
>    I'm currently trying to understand how Chord claims that maintaining
>    a successor list maintains stability in the case of node failure.
>    For example if there are three node 2042, 2129, 2186 and 2129 fails. 2042
>    will next stabilize with 2129 and will not receive a reply after a timeout
>    period. It will therefore look up the next successor in its successor list
>    2186 and stabilize with it.
>     
>    If 2186 has in the meantime sent a predecessor ping to 2129 it will have
>    realised 2129 has failed, will have set its predecessor to nil and when it
>    receives the stabilize message from 2042 will return an answer of nil so
>    that when 2042 sends 2186 a notify message it will accept 2042 as its
>    predecessor. That means everything works fine.
>    However if 2186 hasn't sent a ping to its predecessor in the meantime
>    there's a problem. 2042 will stabilize with 2186 who will reply saying
>    2129 is its predecessor and 2042 will simply update 2129 as its successor
>    (after just removing it).
>     
>    I can't find any information describing the maintenance of successor lists
>    and how they're supposed to work in detail. I suppose I can assume 2042
>    somehow knows 2129 is the incorrect answer from 2186 (maybe by maintaining
>    a timer or something to allow 2186 to update) and then this might
>    work....it seems to get very messy though synchronizing of timers and I
>    suppose I'm just concerned that I'm missing the point of the successor
>    list? 
>     
>    Thanks in advance.
>    Aisling
> 
>    On Fri, Jun 25, 2010 at 8:48 PM, Michael Rogers <m.rog...@cs.ucl.ac.uk>
>    wrote:
> 
>      Hi Aisling,
> 
>      This might not be the solution you're looking for, since I don't know
>      whether it's mentioned in the SIGCOMM paper, but would it make sense for
>      a node to check whether the sender of a notify message should become its
>      successor as well as its predecessor? That would mean that after m's
>      notify message, the pointers would be Ns = m, Np = m, Ms = n, Mp = n,
>      and m's subsequent lookups through n would populate its finger table
>      with a mixture of ms and ns.
> 
>      Cheers,
>      Michael
>      Aisling O' Driscoll wrote:
>      > I am currently implementing a Chord model in OPNET and am anxious to
>      > ensure my implementation is correct (as per the Chord SIGCOMM paper).
>      I
>      > have read a lot about Chord but I still cannot work out how the ring
>      > should be constructed in a distributed manner (I understand quite well
>      > how it should work once fully formed).
>      >
>      > I describe a scenario below (Please point out if any of my logic or
>      the
>      > sequence is incorrect).
>      >
>      >    1.
>      >       Node n wishes to join a Chord ring. There are no other nodes in
>      >       the ring.
>      >    2. It sets its successor = n and its predecessor = NIL. It builds
>      its
>      >       finger table so that all finger successor entries are set to
>      itself.
>      >    3. Subsequently node m wishes to join the network. Node m queries
>      >       node n for its successor.
>      >    4. Node n replies that m*s successor is itself, n.
>      >    5. Node m sets its successor = n and its predecessor = NIL.
>      >    6. As per the Chord node joining procedure, node m then triggers
>      the
>      >       notify function informing node n of its existence so that it can
>      >       update its predecessor pointer to m. At this point, Ns = n, Np =
>      >       m, Ms = n, Mp = NIL.
>      >    7. Now this is where I am confused: Ordinarily, in a fully formed
>      >       Chord ring, node n would at some point send a periodic stabilize
>      >       function to its successor querying its predecessor. In this way
>      it
>      >       would learn about a new node that has joined in between node n
>      and
>      >       its successor, update it successor and notify the new node to
>      >       update its predecessor pointer.
>      >
>      > Clearly node n*s successor should now be node m and node m*s
>      predecessor
>      > should be node n. But node N is not going to send a stabilize packet
>      to
>      > itself so I don*t know how the pointers will be correctly updated.
>      >
>      >      8.  I also don*t understand how node m*s finger tables will be
>      > correctly constructed. Node m will construct its finger table by
>      making
>      > node n do a lookup for each entry. But this will always return node n
>      > (as that is all that*s currently located in node n*s finger table) so
>      > node m will never be referenced either node*s finger table (which is
>      > incorrect).
>      >
>      > I would greatly appreciate if someone could clarify the above as I'm
>      > stuck in my implementation at the moment.
>      >
>      > Many thanks in advance,
>      >
>      >
>      >
>      >
>      >
>      >
>      >
>      >
>      ------------------------------------------------------------------------
>      >
>      > _______________________________________________
>      > p2p-hackers mailing list
>      > p2p-hackers@lists.zooko.com
>      > http://lists.zooko.com/mailman/listinfo/p2p-hackers

> _______________________________________________
> p2p-hackers mailing list
> p2p-hackers@lists.zooko.com
> http://lists.zooko.com/mailman/listinfo/p2p-hackers

_______________________________________________
p2p-hackers mailing list
p2p-hackers@lists.zooko.com
http://lists.zooko.com/mailman/listinfo/p2p-hackers

Reply via email to