(adding missed dev@)

Thanks Sanket to answer questions. Following up questions inline.

Thanks,
Jungtaek Lim (HeartSaVioR)

2017년 1월 25일 (수) 오전 3:13, Sanket Chintapalli <[email protected]>님이 작성:

>
> Questions related to blobstore
>
> 1.  In get-version-for-key, it always create KeySequenceNumber instance so
> synchronized keyword on KeySequenceNumber.getKeySequenceNumber() is no
> effect. But looks like the method shouldn't be run concurrently on same
> key, and furthermore every Nimbuses can call the method concurrently. Then
> we might need to have a kind of global lock on obtaining version of the
> key. Is my understanding right?
>
> Firstly the KeyVersionSequenceNumber is initialized for every topology.
> Yes you are correct there is no need for synchronized. I thought it was
> called from someother place in an other thread. I don't think it is required
>

But get-version-for-key is called from setup-blobstore and
NimbusClient.createStateInZookeeper, which makes KeySequenceNumber called
from multiple Nimbuses concurrently.
'synchronized' keyword is no effect so we can remove that, but what I'm
really wondering is that the logic still work safely without global lock if
multiple Nimbuses try to write their state to children of the blobstore
key, since what the method reads to determine the version can be changed
concurrently.


>
> 2.  We rely on blob-sync function which is skipped on leader Nimbus, then
> how leader Nimbus synchronizes its blobs?
>
> Here the leader is assumed to have all the blobs of active topologies.
> blob-sync is only for non-leader nimbuses as any change in the state on
> zookeeper will trigger them to sync the blobs. We do not want leader to
> sync on itself, it would make no sense whenever state is updated.
>

Our precondition of leader is only checking topology blobs, and it doesn't
check the version of the blobs since it just compares via keys.
(We now have additional jar files as dependencies of topology but not
reflected yet - need to address before releasing 1.1.0.)

So leader Nimbus only guarantees that it can provide topology blobs anyway,
but still doesn't guarantee for other normal blobs, and moreover there is
no guarantee that leader Nimbus has latest blobs on topology blobs.

https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/blobstore/KeySequenceNumber.java#L87-L99

In scenario 2 for example, N1 and N2 has different blob content for the
blob because N1 is crashed before N2 is updating its blob. Blob for N2 was
v2 in point of content view, and promoted to v4 without any content update,
and v3 is available later when N1 is restored but discarded.

This will be more complicated when multiple Nimbuses come in. Supposing N3
has v3 for that blob, but N2 can still get a leadership and in result v3
will be discarded (since leader Nimbus doesn't sync from others).

I guess the logic might be safe if we force replication factor to same to
the count of Nimbuses, but then any follower Nimbus will be SPOF, and not
sure it still be safe when new nimbus comes in.

Please correct me if I'm missing here.


>
> 3. What's the purpose of setup-blobstore function? It seems slight
> different from blob-sync function (not updating the blobs) and it runs with
> both leader and follow Nimbus.
> Setup-blobstore was placed as an initialization function. As far as I
> remember there are ephemeral nodes on zookeeper and they disappear after
> nimbus goes down, it does some initialization operations once they come
> back up to sync all topology blobs not present on the local blobstore.
>

Yes right. BlobStore assumes that leader Nimbus shouldn't sync its blobs
from follower Nimbus so in this case setup-blobstore is needed. I just
thought about the case leader Nimbus need to synchronize its blobs.


>
> 4. In KeySequenceNumber.getKeySequenceNumber(), in some situations version
> number goes 0 (initial version - 1). Is it meant to mark blob as invalid?
> Then can we use 0 for version number if any exceptions are caught on that
> method? Since shallowing the whole exceptions in that method seems intended
> but we met crash on that case, and even we throw the exception and pass to
> caller, rolling back is not easy.
>
> The idea was to say that your version is 0 then you have crashed or come
> up and you might want to sync with latest blobs on leader nimbus. As setup
> blobstore already does the diff of local blobs and active blobs on
> zookeeper and tries to delete blobs not on zookeeper and create locally
> active keys when it crashes and comes up, I think we can mark and return
> the initial sequence number as 1 instead of 0. The 0 will trigger the sync
> again for already download/updated blob. I presume returning the (initial
> version) instead of (initial version - 1) might fix the issue.
>
>
Let's back to the origin issue.

There were three Nimbuses N1, N2, N3 and N1 was the leader. We submitted
topology T1 and after submission we restarted N1. N2 got leader and we
killed T1. While N1 is initializing and syncing up its topology blobs, N2
concurrently removes the ZK path and also max sequence number path for
topology blob in progress of killing topology.

The problem was that N1 calls KeySequenceNumber.getKeySequenceNumber()
while registering local blob, and checkExist on /blobstore/key was succeed
but following getChildren /blobstore/key fails on NoNodeException.
(/blobstoremaxsequencenumber/key might be exist or not, since N2 is
removing the paths)
In this case which is correct return value for that blob?


> Thanks,
> Sanket Chintapalli
>
> On Tuesday, January 24, 2017, 10:35:50 AM CST, Sanket Chintapalli <
> [email protected]> wrote:
> Edit * but not 2
>
> On Tuesday, January 24, 2017, 10:35:22 AM CST, Sanket Chintapalli <
> [email protected]> wrote:
> Yes Bobby I have to look at will reply, I think I can explain 1, 3 and 4
> but not 3. I did not understand 2.
>
> Thanks,
> Sanket
>
> On Tuesday, January 24, 2017, 9:42:35 AM CST, Bobby Evans <
> [email protected]> wrote:
> If you have time to answer some of his question that would really be great.
>
>
> ----- Forwarded Message -----
> *From:* Jungtaek Lim <[email protected]>
> *To:* Bobby Evans <[email protected]>; Dev <[email protected]>
> *Sent:* Tuesday, January 24, 2017, 9:35:36 AM CST
> *Subject:* Re: Local BlobStore and Nimbus H/A
>
> I just skimmed the blobstore code, and some questions about the
> implementation. I'd be really appreciated if someone understanding
> BlobStore answers below questions.
> (Questions are based on current 1.x branch, but I think master branch is
> similar.)
>
> 1. In get-version-for-key, it always create KeySequenceNumber instance so
> synchronized keyword on KeySequenceNumber.getKeySequenceNumber() is no
> effect. But looks like the method shouldn't be run concurrently on same
> key, and furthermore every Nimbuses can call the method concurrently. Then
> we might need to have a kind of global lock on obtaining version of the
> key. Is my understanding right?
>
> 2. We rely on blob-sync function which is skipped on leader Nimbus, then
> how leader Nimbus synchronizes its blobs?
>
> 3. What's the purpose of setup-blobstore function? It seems slight
> different from blob-sync function (not updating the blobs) and it runs with
> both leader and follow Nimbus.
>
> 4. In KeySequenceNumber.getKeySequenceNumber(), in some situations version
> number goes 0 (initial version - 1). Is it meant to mark blob as invalid?
> Then can we use 0 for version number if any exceptions are caught on that
> method? Since shallowing the whole exceptions in that method seems intended
> but we met crash on that case, and even we throw the exception and pass to
> caller, rolling back is not easy.
>
> Thanks in advance,
> Jungtaek Lim (HeartSaVioR)
>
> 2017년 1월 24일 (화) 오후 12:25, Jungtaek Lim <[email protected]>님이 작성:
>
> > More details: there were three Nimbuses N1, N2, N3 and N1 was the leader.
> > We submitted topology T1 and after submission we restarted N1. N2 got
> > leader and we killed T1. While N1 is initializing and syncing up its
> > topology blobs, N2 concurrently removes the ZK path and also max sequence
> > number path for topology blob in progress of killing topology. This race
> > condition is only occurring on Local BlobStore since removing ZK path is
> > done only if Nimbus is using Local BlobStore.
> >
> > So it's the former case, and stopping current sync phase and restarting
> > sync is an ideal way since we're just guaranteeing eventually consistent.
> > I'll take a look at the codebase to see how we can apply, but it should
> be
> > great help for me if someone is familiar with BlobStore codebase and
> > willing to handle it.
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
> >
> > 2017년 1월 23일 (월) 오후 11:33, Bobby Evans <[email protected]>님이 작성:
> >
> > HA for the blobstore was set up so that ZK would hold the source of truth
> > and then the other nimbus nodes would be eventually consistent with each
> > other.  I'm not totally sure of the issue, because I don't understand if
> > this is happening in the context of a follower trying to keep up to date,
> > or with a leader entering the data.  If it is the latter we need some
> > better fencing to prevent multiple leaders trying to write to the DB at
> the
> > same time.  If it is the former we need some better code so the follower
> > can read/update the replica it has without the possibility of trying to
> > auto-vivify a node that is being deleted.  Generally in these cases we
> > would declare the race safe and then just start the sync process over
> again.
> >
> >
> > - Bobby
> >
> >
> > On Monday, January 23, 2017, 2:12:18 AM CST, Jungtaek Lim <
> > [email protected]> wrote:
> > Hi devs,
> >
> > I've been struggling to resolve specific scenario, and found Local
> > BlobStore cares about Nimbus failure scenarios, but not about removing
> > keys.
> >
> > For example, I heard that Nimbus crashed in specific scenario, and error
> > stack trace pointed to below code:
> >
> >
> https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/blobstore/KeySequenceNumber.java#L138-L149
> >
> > checkExists (L138
> > <
> >
> https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/blobstore/KeySequenceNumber.java#L138
> > >)
> > succeeds but getChildren (L149
> > <
> >
> https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/blobstore/KeySequenceNumber.java#L149
> > >)
> > throws NoNodeException, in result sequenceNumbers.last() throws
> > NoSuchElementException.
> >
> > We could have a look deeply and make some workarounds, but given that ZK
> is
> > accessible from every Nimbuses, we can't ensure every paths are safe.
> >
> > I guess that BlobStore needs global lock or single controller to handle
> all
> > the things right. I'm also open to any workarounds or other ideas.
> >
> > What do you think?
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
> >
> >
>

Reply via email to