oh ok

> There were three Nimbuses N1, N2, N3 and N1 was the leader. We submitted
> topology T1 and after submission we restarted N1. N2 became the leader
> and we killed T1. While N1 was initializing and syncing up its topology
> blobs, N2 concurrently removed the ZK path and also the max sequence
> number path for the topology blob in the course of killing the topology.
> The problem was that N1 calls KeySequenceNumber.getKeySequenceNumber()
> while registering the local blob; checkExists on /blobstore/key
> succeeded, but the following getChildren on /blobstore/key fails with
> NoNodeException. (/blobstoremaxsequencenumber/key may or may not exist,
> since N2 is removing the paths.)
> In this case, which is the correct return value for that blob?

I think we might want to have fencing around it: probably throw a "key
deleted" exception and gracefully handle the situation for the blob. If
you have better thoughts I would appreciate them. I do not think we want
to return any version here in this scenario, as returning a lower version
might make it try to sync again.

> In scenario 2 for example, N1 and N2 have different blob content for the
> blob because N1 crashed before N2 updated its blob. The blob for N2 was
> v2 from a content point of view, and was promoted to v4 without any
> content update; v3 becomes available later when N1 is restored, but it
> is discarded.
>
> This will be more complicated when multiple Nimbuses come in. Supposing
> N3 has v3 for that blob, N2 can still get the leadership, and as a
> result v3 will be discarded (since the leader Nimbus doesn't sync from
> others).
> I guess the logic might be safe if we force the replication factor to
> equal the count of Nimbuses, but then any follower Nimbus would be a
> SPOF, and I'm not sure it would still be safe when a new Nimbus comes in.

Well, the idea was that if a Nimbus has an update for a blob and it
crashes before the update gets replicated to another nimbus/nimbii, then
when it comes back up again, since it is the follower now, we would want
the follower to have what the leader actually has, and yes, we would lose
the update. Interestingly, if we want to get past this scenario, then we
should not allow updates until the sync is complete on the other nimbii,
which will make it a bit complicated. And yes, we know that we will lose
updates, but since this was an initial version to merge blobstore and
nimbus HA, I took an easier route; we could implement it if we do not
want to lose updates.

Also, regarding synchronization: after looking at the code, I think I
remember now that I probably put synchronized there to handle updates.
Updating the zookeeper state with a new version will trigger the
callback, and then the corresponding nimbii will call the version method
to see what version they are on and increment their version once it is
complete.

Thanks,
Sanket Chintapalli

On Wednesday, January 25, 2017, 3:51:26 AM CST, Jungtaek Lim
<[email protected]> wrote:

Follow up: for checking topology blobs (and corresponding dependencies) I
filed STORM-2323 <https://issues.apache.org/jira/browse/STORM-2323> and
submitted a pull request.

- Jungtaek Lim (HeartSaVioR)

On Wednesday, January 25, 2017 at 8:46 AM, Jungtaek Lim
<[email protected]> wrote:

> (adding missed dev@)
>
> Thanks Sanket for answering the questions. Follow-up questions inline.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Wednesday, January 25, 2017 at 3:13 AM, Sanket Chintapalli
> <[email protected]> wrote:
>
> > Questions related to blobstore
> >
> > 1. In get-version-for-key, it always creates a KeySequenceNumber
> > instance, so the synchronized keyword on
> > KeySequenceNumber.getKeySequenceNumber() has no effect. But it looks
> > like the method shouldn't be run concurrently on the same key, and
> > furthermore every Nimbus can call the method concurrently. Then we
> > might need a kind of global lock on obtaining the version of the key.
> > Is my understanding right?
>
> Firstly, the KeyVersionSequenceNumber is initialized for every topology.
> Yes, you are correct, there is no need for synchronized. I thought it
> was called from some other place in another thread. I don't think it is
> required.
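To make the global-lock idea in question 1 concrete, a rough sketch using
Curator's InterProcessMutex could look like the following. This is only an
illustration, not the actual Storm code; the lock path and class name are
made up:

    import java.util.concurrent.Callable;

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.recipes.locks.InterProcessMutex;

    // Sketch: serialize version computation for a blob key across all
    // Nimbuses with a ZooKeeper-backed lock. "/blobstorelock" is made up.
    public class GlobalKeyLockSketch {
        private final CuratorFramework zk;

        public GlobalKeyLockSketch(CuratorFramework zk) {
            this.zk = zk;
        }

        public int withKeyLock(String key, Callable<Integer> computeVersion)
                throws Exception {
            InterProcessMutex lock =
                new InterProcessMutex(zk, "/blobstorelock/" + key);
            lock.acquire();                   // blocks until this Nimbus holds the lock
            try {
                return computeVersion.call(); // e.g. read children, derive next version
            } finally {
                lock.release();               // let other Nimbuses proceed
            }
        }
    }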
> But get-version-for-key is called from setup-blobstore and
> NimbusClient.createStateInZookeeper, which means KeySequenceNumber can
> be called from multiple Nimbuses concurrently.
> The 'synchronized' keyword has no effect, so we can remove it, but what
> I'm really wondering is whether the logic still works safely without a
> global lock when multiple Nimbuses try to write their state to children
> of the blobstore key, since what the method reads to determine the
> version can change concurrently.
>
> > 2. We rely on the blob-sync function, which is skipped on the leader
> > Nimbus; then how does the leader Nimbus synchronize its blobs?
>
> Here the leader is assumed to have all the blobs of active topologies.
> blob-sync is only for non-leader Nimbuses, as any change in the state on
> zookeeper will trigger them to sync the blobs. We do not want the leader
> to sync from itself whenever the state is updated; it would make no
> sense.
>
> Our precondition for the leader only checks topology blobs, and it
> doesn't check the version of the blobs since it just compares via keys.
> (We now have additional jar files as dependencies of a topology, but
> that is not reflected yet - it needs to be addressed before releasing
> 1.1.0.)
>
> So the leader Nimbus only guarantees that it can provide topology blobs;
> it still doesn't guarantee anything for other normal blobs, and moreover
> there is no guarantee that the leader Nimbus has the latest topology
> blobs.
>
> https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/blobstore/KeySequenceNumber.java#L87-L99
>
> In scenario 2 for example, N1 and N2 have different blob content for the
> blob because N1 crashed before N2 updated its blob. The blob for N2 was
> v2 from a content point of view, and was promoted to v4 without any
> content update; v3 becomes available later when N1 is restored, but it
> is discarded.
>
> This will be more complicated when multiple Nimbuses come in. Supposing
> N3 has v3 for that blob, N2 can still get the leadership, and as a
> result v3 will be discarded (since the leader Nimbus doesn't sync from
> others).
>
> I guess the logic might be safe if we force the replication factor to
> equal the count of Nimbuses, but then any follower Nimbus would be a
> SPOF, and I'm not sure it would still be safe when a new Nimbus comes in.
>
> Please correct me if I'm missing something here.
>
> > 3. What's the purpose of the setup-blobstore function? It seems
> > slightly different from the blob-sync function (it does not update the
> > blobs), and it runs on both the leader and follower Nimbus.
>
> Setup-blobstore was placed as an initialization function. As far as I
> remember, there are ephemeral nodes on zookeeper and they disappear
> after Nimbus goes down; setup-blobstore does some initialization
> operations once Nimbus comes back up, to sync all topology blobs not
> present in the local blobstore.
>
> Yes, right. BlobStore assumes that the leader Nimbus shouldn't sync its
> blobs from a follower Nimbus, so in this case setup-blobstore is needed.
> I just thought about the case where the leader Nimbus needs to
> synchronize its blobs.
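As a rough picture of the initialization described in question 3, the diff
that setup-blobstore performs amounts to something like the sketch below.
All helper names here are hypothetical, for illustration only:

    import java.util.Set;

    // Sketch of the start-up reconciliation: compare the active keys
    // recorded in ZooKeeper with the keys in the local blobstore, delete
    // stale local blobs, and fetch blobs that appeared while this Nimbus
    // was down. The abstract helpers are made-up names.
    public abstract class SetupBlobStoreSketch {
        abstract Set<String> zkActiveKeys();   // keys under the blobstore ZK root
        abstract Set<String> localKeys();      // keys in the local blobstore
        abstract void deleteLocal(String key);
        abstract void downloadBlob(String key);

        public void setupBlobStore() {
            Set<String> active = zkActiveKeys();
            for (String key : localKeys()) {
                if (!active.contains(key)) {
                    deleteLocal(key);          // removed while this Nimbus was down
                }
            }
            for (String key : active) {
                if (!localKeys().contains(key)) {
                    downloadBlob(key);         // created while this Nimbus was down
                }
            }
        }
    }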
> > 4. In KeySequenceNumber.getKeySequenceNumber(), in some situations the
> > version number goes to 0 (initial version - 1). Is it meant to mark
> > the blob as invalid? Then can we use 0 for the version number if any
> > exceptions are caught in that method? Swallowing all the exceptions in
> > that method seems intended, but we hit a crash in that case, and even
> > if we throw the exception and pass it to the caller, rolling back is
> > not easy.
>
> The idea was to say that if your version is 0 then you have crashed or
> just come up, and you might want to sync with the latest blobs on the
> leader Nimbus. Since setup-blobstore already does the diff of local
> blobs and active blobs on zookeeper, and tries to delete blobs not on
> zookeeper and create the locally active keys when Nimbus crashes and
> comes back up, I think we can mark and return the initial sequence
> number as 1 instead of 0. The 0 will trigger the sync again for an
> already downloaded/updated blob. I presume returning the (initial
> version) instead of (initial version - 1) might fix the issue.
>
> Let's go back to the original issue.
>
> There were three Nimbuses N1, N2, N3 and N1 was the leader. We submitted
> topology T1 and after submission we restarted N1. N2 became the leader
> and we killed T1. While N1 was initializing and syncing up its topology
> blobs, N2 concurrently removed the ZK path and also the max sequence
> number path for the topology blob in the course of killing the topology.
>
> The problem was that N1 calls KeySequenceNumber.getKeySequenceNumber()
> while registering the local blob; checkExists on /blobstore/key
> succeeded, but the following getChildren on /blobstore/key fails with
> NoNodeException. (/blobstoremaxsequencenumber/key may or may not exist,
> since N2 is removing the paths.)
> In this case, which is the correct return value for that blob?
>
> Thanks,
> Sanket Chintapalli
>
> On Tuesday, January 24, 2017, 10:35:50 AM CST, Sanket Chintapalli
> <[email protected]> wrote:
>
> Edit * but not 2
>
> On Tuesday, January 24, 2017, 10:35:22 AM CST, Sanket Chintapalli
> <[email protected]> wrote:
>
> Yes Bobby, I have to look at it and will reply. I think I can explain 1,
> 3 and 4 but not 3. I did not understand 2.
>
> Thanks,
> Sanket
>
> On Tuesday, January 24, 2017, 9:42:35 AM CST, Bobby Evans
> <[email protected]> wrote:
>
> If you have time to answer some of his questions, that would really be
> great.
>
> ----- Forwarded Message -----
> *From:* Jungtaek Lim <[email protected]>
> *To:* Bobby Evans <[email protected]>; Dev <[email protected]>
> *Sent:* Tuesday, January 24, 2017, 9:35:36 AM CST
> *Subject:* Re: Local BlobStore and Nimbus H/A
>
> I just skimmed the blobstore code, and I have some questions about the
> implementation. I'd really appreciate it if someone who understands
> BlobStore could answer the questions below.
> (Questions are based on the current 1.x branch, but I think the master
> branch is similar.)
>
> 1. In get-version-for-key, it always creates a KeySequenceNumber
> instance, so the synchronized keyword on
> KeySequenceNumber.getKeySequenceNumber() has no effect. But it looks
> like the method shouldn't be run concurrently on the same key, and
> furthermore every Nimbus can call the method concurrently. Then we might
> need a kind of global lock on obtaining the version of the key. Is my
> understanding right?
>
> 2. We rely on the blob-sync function, which is skipped on the leader
> Nimbus; then how does the leader Nimbus synchronize its blobs?
>
> 3. What's the purpose of the setup-blobstore function? It seems slightly
> different from the blob-sync function (it does not update the blobs),
> and it runs on both the leader and follower Nimbus.
>
> 4. In KeySequenceNumber.getKeySequenceNumber(), in some situations the
> version number goes to 0 (initial version - 1). Is it meant to mark the
> blob as invalid? Then can we use 0 for the version number if any
> exceptions are caught in that method? Swallowing all the exceptions in
> that method seems intended, but we hit a crash in that case, and even if
> we throw the exception and pass it to the caller, rolling back is not
> easy.
>
> Thanks in advance,
> Jungtaek Lim (HeartSaVioR)
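Connecting question 4 back to Sanket's suggestion earlier in the thread,
returning the initial version instead of (initial version - 1) amounts to
roughly the sketch below. The identifiers are illustrative, not the actual
fields in KeySequenceNumber:

    // Sketch only: when the sequence-number paths for a key are missing,
    // return the initial version (1) rather than (initial version - 1) = 0,
    // so a blob that is already downloaded is not forced through another
    // sync pass.
    public class InitialVersionSketch {
        static final int INITIAL_VERSION = 1;  // illustrative constant

        int versionWhenPathsMissing() {
            // old behavior: return INITIAL_VERSION - 1;  // 0 re-triggers the sync
            return INITIAL_VERSION;             // treat the blob as freshly created
        }
    }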
> On Tuesday, January 24, 2017 at 12:25 PM, Jungtaek Lim
> <[email protected]> wrote:
>
> > More details: there were three Nimbuses N1, N2, N3 and N1 was the
> > leader. We submitted topology T1 and after submission we restarted N1.
> > N2 became the leader and we killed T1. While N1 was initializing and
> > syncing up its topology blobs, N2 concurrently removed the ZK path and
> > also the max sequence number path for the topology blob in the course
> > of killing the topology. This race condition only occurs with the
> > Local BlobStore, since removing the ZK path is done only if Nimbus is
> > using the Local BlobStore.
> >
> > So it's the former case, and stopping the current sync phase and
> > restarting the sync is an ideal way, since we're only guaranteeing
> > eventual consistency. I'll take a look at the codebase to see how we
> > can apply this, but it would be a great help to me if someone familiar
> > with the BlobStore codebase were willing to handle it.
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
> >
> > On Monday, January 23, 2017 at 11:33 PM, Bobby Evans
> > <[email protected]> wrote:
> >
> > HA for the blobstore was set up so that ZK would hold the source of
> > truth and the other Nimbus nodes would be eventually consistent with
> > each other. I'm not totally sure of the issue, because I don't
> > understand whether this is happening in the context of a follower
> > trying to keep up to date, or of a leader entering the data. If it is
> > the latter, we need better fencing to prevent multiple leaders from
> > trying to write to the DB at the same time. If it is the former, we
> > need better code so the follower can read/update the replica it has
> > without the possibility of trying to auto-vivify a node that is being
> > deleted. Generally in these cases we would declare the race safe and
> > then just start the sync process over again.
> >
> > - Bobby
> >
> > On Monday, January 23, 2017, 2:12:18 AM CST, Jungtaek Lim
> > <[email protected]> wrote:
> >
> > Hi devs,
> >
> > I've been struggling to resolve a specific scenario, and found that
> > the Local BlobStore cares about Nimbus failure scenarios, but not
> > about removing keys.
> >
> > For example, I heard that Nimbus crashed in a specific scenario, and
> > the error stack trace pointed to the code below:
> >
> > https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/blobstore/KeySequenceNumber.java#L138-L149
> >
> > checkExists (L138
> > <https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/blobstore/KeySequenceNumber.java#L138>)
> > succeeds but getChildren (L149
> > <https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/blobstore/KeySequenceNumber.java#L149>)
> > throws NoNodeException; as a result sequenceNumbers.last() throws
> > NoSuchElementException.
> >
> > We could take a deeper look and make some workarounds, but given that
> > ZK is accessible from every Nimbus, we can't ensure every path is safe.
> >
> > I guess that BlobStore needs a global lock or a single controller to
> > handle all of this correctly. I'm also open to any workarounds or
> > other ideas.
> >
> > What do you think?
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
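Finally, the fencing discussed throughout this thread could look roughly
like the sketch below: catch the NoNodeException from getChildren and turn
it into an explicit "key deleted" signal instead of letting
sequenceNumbers.last() fail with NoSuchElementException. KeyDeletedException
and the method name are hypothetical, not actual Storm code:

    import java.util.List;

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.zookeeper.KeeperException;

    // Sketch: the checkExists/getChildren pair is not atomic, so
    // getChildren can hit NoNodeException even after checkExists
    // succeeded. Surface the concurrent deletion to the caller so it can
    // skip the key or restart the sync pass instead of crashing Nimbus.
    class KeyDeletedException extends Exception {
        KeyDeletedException(String key) {
            super("blob key deleted concurrently: " + key);
        }
    }

    class SequenceNumberSketch {
        List<String> childrenOrFence(CuratorFramework zk, String key)
                throws Exception {
            try {
                return zk.getChildren().forPath("/blobstore/" + key);
            } catch (KeeperException.NoNodeException e) {
                // another Nimbus removed the key, e.g. while killing the topology
                throw new KeyDeletedException(key);
            }
        }
    }

A caller in the sync loop would catch KeyDeletedException, drop the key
locally, and start the sync pass over, matching Bobby's "declare the race
safe and restart the sync" suggestion above.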
