I have created ZOOKEEPER-2288 and attached my partial fix patch. Jordan, I can see you already spotted it and linked it with CURATOR-268. Thanks again for reporting this.
--Chris Nauroth

On 10/6/15, 11:01 AM, "Flavio Junqueira" <f...@apache.org> wrote:

>You're right that shutdown calls flush, but when the RoD request hits, it
>breaks the while loop and shutdown isn't called. I suppose it is relying
>on PrepRequestProcessor to call shutdown.
>
>-Flavio
>
>> On 06 Oct 2015, at 18:24, Chris Nauroth <cnaur...@hortonworks.com> wrote:
>>
>> Flavio, thank you for confirming.
>>
>> Regarding the SyncRequestProcessor, I thought the case you described
>> would be covered by the final flush call from within
>> SyncRequestProcessor#shutdown. Let me know if I'm missing an important
>> point, though.
>>
>> --Chris Nauroth
>>
>> On 10/6/15, 9:14 AM, "Flavio Junqueira" <f...@apache.org> wrote:
>>
>>> That's a good analysis, Chris. It might be worth doing.
>>>
>>> I have a new conjecture for Jordan's issue. I was having a look at
>>> SyncRequestProcessor, and the following seems to happen. This request
>>> processor tries to accumulate requests to perform a group commit to the
>>> transaction log. If the load is high, it tries to group as many as 1000.
>>> The trouble is that if it receives a RoD at high load, it currently
>>> breaks the main thread loop without flushing the requests accumulated
>>> in the toFlush list. Note that flushing includes executing them in
>>> FinalRequestProcessor.
>>>
>>> This is a change we can try to make and see if it sorts out the problem.
>>>
>>> -Flavio
>>>
>>>> On 05 Oct 2015, at 23:47, Chris Nauroth <cnaur...@hortonworks.com>
>>>> wrote:
>>>>
>>>> I was able to do more debugging today, using Jordan's ZooKeeper-only
>>>> test (no Curator in the mix) mentioned in CURATOR-268.
>>>>
>>>> https://gist.github.com/Randgalt/c3358b740f7f09d47bb0
>>>>
>>>> My observation is that the transactions are getting processed and
>>>> persisted to the log, but the client's connection to the server is
>>>> getting closed before a successful reply is sent.
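[The group-commit behavior Flavio conjectures about, plus the proposed fix of flushing the accumulated batch when the request of death arrives, can be sketched roughly as follows. This is a simplified model, not the actual SyncRequestProcessor code: the class and constant names are illustrative, and a plain Queue stands in for the real blocking queue.]

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class GroupCommitSketch {
    static final String REQUEST_OF_DEATH = "__requestOfDeath__";
    static final int MAX_BATCH = 1000;

    // Drains the queue in order, batching up to MAX_BATCH requests per
    // flush. The real processor blocks waiting for more input; a plain
    // Queue keeps this sketch free of checked exceptions.
    static List<String> run(Queue<String> queue) {
        List<String> committed = new ArrayList<>();
        List<String> toFlush = new ArrayList<>();
        String req;
        while ((req = queue.poll()) != null) {
            if (REQUEST_OF_DEATH.equals(req)) {
                // Proposed fix: flush the accumulated batch before
                // exiting, instead of dropping it on the floor.
                committed.addAll(toFlush);
                toFlush.clear();
                break;
            }
            toFlush.add(req);
            // Group commit: flush when the batch is full or input is idle.
            if (toFlush.size() >= MAX_BATCH || queue.isEmpty()) {
                committed.addAll(toFlush);
                toFlush.clear();
            }
        }
        return committed;
    }

    public static void main(String[] args) {
        Queue<String> q = new ArrayDeque<>();
        q.add("create:/a");
        q.add("setData:/a");
        q.add("delete:/a");
        q.add(REQUEST_OF_DEATH);
        System.out.println(run(q)); // prints [create:/a, setData:/a, delete:/a]
    }
}
```

[Without the flush inside the RoD branch, any requests batched in toFlush when the RoD arrives under load would never reach FinalRequestProcessor, which matches the lost-transaction symptom.]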
>>>> This causes the client to reconnect and retry the same request. For a
>>>> retried delete, this causes a NoNode error. For a retried create, this
>>>> causes a NodeExists error. The effect is also visible by logging the
>>>> last zxid seen by the client, which will show gaps, i.e. 1, 2, 3, 5,
>>>> 6, ...
>>>>
>>>> I think a contributing factor is the shutdown sequence in
>>>> ZooKeeperServerMain. I see NIOServerCnxnFactory.shutdown() getting
>>>> called and closing all connections before shutting down the request
>>>> processing pipeline. If requests are in flight in the pipeline after
>>>> sockets have been closed, but before the pipeline has been stopped,
>>>> then the server could process the transaction and persist it, but not
>>>> tell the client about it. I was able to reduce the frequency of test
>>>> failures by reordering the shutdown sequence so that the request
>>>> processing pipeline is shut down before the sockets, but it appears
>>>> this is only a partial fix. Before this change, I could only run
>>>> Jordan's test for a few seconds before it aborted. After this change,
>>>> I can keep it running for multiple minutes, but it still aborts
>>>> eventually.
>>>>
>>>> Note that even if we improve the handling for this on a graceful
>>>> shutdown, this scenario would still be possible in the event of hard
>>>> failure (i.e. power loss). I know some applications wrap the ZooKeeper
>>>> API calls with additional semantics, so that a NoNode/NodeExists
>>>> response is treated as a successful idempotent response. For example,
>>>> here is something in HBase:
>>>>
>>>> http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.html
>>>>
>>>> I don't know the Curator codebase very well, so I don't know if it has
>>>> similar options.
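[The idempotent-retry idea Chris points at (as in HBase's RecoverableZooKeeper) can be sketched as below. Everything here is illustrative, not the real API: the class, the in-memory store, and the isRetry flag are stand-ins. A production wrapper also has to convince itself that the surviving node resulted from its own earlier attempt rather than another client's, which this toy model omits entirely.]

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: treat NodeExists on a retried create, and NoNode on a retried
// delete, as idempotent success, since the lost-reply scenario means the
// first attempt may already have been persisted server-side.
public class IdempotentOps {
    private final Set<String> store = new HashSet<>(); // stand-in for znodes

    public boolean createIdempotent(String path, boolean isRetry) {
        boolean added = store.add(path);
        if (!added && !isRetry) {
            // A first-attempt collision is a genuine error.
            throw new IllegalStateException("NodeExists: " + path);
        }
        return true; // created now, or created by our earlier lost-reply attempt
    }

    public boolean deleteIdempotent(String path, boolean isRetry) {
        boolean removed = store.remove(path);
        if (!removed && !isRetry) {
            throw new IllegalStateException("NoNode: " + path);
        }
        return true; // gone now, or already deleted by our earlier attempt
    }

    public static void main(String[] args) {
        IdempotentOps zk = new IdempotentOps();
        zk.createIdempotent("/node", false); // first attempt succeeds
        zk.createIdempotent("/node", true);  // retry: NodeExists treated as success
        zk.deleteIdempotent("/node", false); // delete succeeds
        zk.deleteIdempotent("/node", true);  // retry: NoNode treated as success
    }
}
```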
>>>>
>>>> --Chris Nauroth
>>>>
>>>> On 10/5/15, 10:41 AM, "Flavio Junqueira" <f...@apache.org> wrote:
>>>>
>>>>> You said you can reproduce it with raw ZooKeeper, and I thought you
>>>>> were saying you can't reproduce it with raw ZooKeeper, but it sounds
>>>>> like I got it wrong.
>>>>>
>>>>> -Flavio
>>>>>
>>>>>> On 05 Oct 2015, at 18:36, Jordan Zimmerman
>>>>>> <jor...@jordanzimmerman.com> wrote:
>>>>>>
>>>>>>> can reproduce it with raw ZooKeeper -> can't reproduce it with raw
>>>>>>> ZooKeeper, yes?
>>>>>>
>>>>>> I don’t know what you mean. The test that the user posted used
>>>>>> Curator. I changed all the Curator usages to raw ZooKeeper usages
>>>>>> and the problem still shows.
>>>>>>
>>>>>> -JZ
>>>>>>
>>>>>>> On Oct 5, 2015, at 12:17 PM, Flavio Junqueira <f...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>> can reproduce it with raw ZooKeeper -> can't reproduce it with raw
>>>>>>> ZooKeeper, yes?
>>>>>>>
>>>>>>> I'll have a look at the jira to see if I have any insight.
>>>>>>>
>>>>>>> -Flavio
>>>>>>>
>>>>>>>> On 05 Oct 2015, at 18:15, Jordan Zimmerman
>>>>>>>> <jor...@jordanzimmerman.com> wrote:
>>>>>>>>
>>>>>>>> What we’re seeing is transaction rollbacks. The bug was reported
>>>>>>>> against Curator, but I can reproduce it with raw ZooKeeper:
>>>>>>>> https://issues.apache.org/jira/browse/CURATOR-268
>>>>>>>>
>>>>>>>> -JZ
>>>>>>>>
>>>>>>>>> On Oct 5, 2015, at 12:00 PM, Flavio Junqueira <f...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> It is safe because the requests in the submittedRequests queue
>>>>>>>>> haven't been prepared yet. The simplest pipeline is the one of
>>>>>>>>> the standalone server: PrepRequestProcessor ->
>>>>>>>>> SyncRequestProcessor -> FinalRequestProcessor. If a request
>>>>>>>>> hasn't gone through PrepRP, then nothing has changed in the
>>>>>>>>> state of ZooKeeper.
>>>>>>>>> The ones that have gone through PrepRP will complete regularly.
>>>>>>>>> For quorum, the pipeline is a bit more complex, but the
>>>>>>>>> reasoning is very similar.
>>>>>>>>>
>>>>>>>>> -Flavio
>>>>>>>>>
>>>>>>>>>> On 05 Oct 2015, at 17:55, Jordan Zimmerman
>>>>>>>>>> <jor...@jordanzimmerman.com> wrote:
>>>>>>>>>>
>>>>>>>>>> That would mean that there’s no safe way to shut down the
>>>>>>>>>> server, right? Ideally, you’d want the server to shut down
>>>>>>>>>> gracefully: a) stop receiving requests; b) complete current
>>>>>>>>>> requests; c) shut down. That’s how most servers work. Of
>>>>>>>>>> course, you might want a quick-die shutdown, but that’s not
>>>>>>>>>> usual behavior.
>>>>>>>>>>
>>>>>>>>>> -JZ
>>>>>>>>>>
>>>>>>>>>>> On Oct 5, 2015, at 11:30 AM, Flavio Junqueira <f...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> You suggested that it is a bug, and I'm arguing that it isn't
>>>>>>>>>>> a bug. You may want to optimize and still process the requests
>>>>>>>>>>> in the queue before injecting the RoD, but discarding them
>>>>>>>>>>> doesn't sound like a bug, because you can't guarantee that
>>>>>>>>>>> requests submitted concurrently with the server shutting down
>>>>>>>>>>> will be executed. Optimizing isn't the same as spotting a bug.
>>>>>>>>>>> Also, if you are trying to shut down, you probably want to do
>>>>>>>>>>> it asap, rather than wait for a whole batch of operations to
>>>>>>>>>>> complete.
>>>>>>>>>>>
>>>>>>>>>>> -Flavio
>>>>>>>>>>>
>>>>>>>>>>>> On 05 Oct 2015, at 14:57, Jordan Zimmerman
>>>>>>>>>>>> <jor...@jordanzimmerman.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Flavio, that isn’t logical. Just because you can’t make that
>>>>>>>>>>>> guarantee doesn’t imply that you should flush already queued
>>>>>>>>>>>> transactions.
>>>>>>>>>>>>
>>>>>>>>>>>> -JZ
>>>>>>>>>>>>
>>>>>>>>>>>>> On Oct 5, 2015, at 3:24 AM, Flavio Junqueira <f...@apache.org>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Injecting the RoD means that we are shutting down the server
>>>>>>>>>>>>> pipeline. If the server is shutting down, then we can't
>>>>>>>>>>>>> guarantee that a request submitted concurrently will be
>>>>>>>>>>>>> executed anyway, so clearing the queue of submitted requests
>>>>>>>>>>>>> (submitted but not yet prepped for execution) sounds like
>>>>>>>>>>>>> correct behavior to me.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Flavio
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 04 Oct 2015, at 23:05, Chris Nauroth
>>>>>>>>>>>>>> <cnaur...@hortonworks.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Jordan,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That's an interesting find. I think you have a good theory.
>>>>>>>>>>>>>> Have you already tried patching this to see if the bug
>>>>>>>>>>>>>> reported against Curator goes away? (BTW, is there a
>>>>>>>>>>>>>> corresponding Curator JIRA?)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That logic dates all the way back to the initial import of
>>>>>>>>>>>>>> the codebase. I can't find a definitive explanation, but my
>>>>>>>>>>>>>> best guess is that dropping pending requests (instead of
>>>>>>>>>>>>>> gracefully quiescing) can give a faster shutdown in the
>>>>>>>>>>>>>> event of a heavily overloaded server. However, the
>>>>>>>>>>>>>> correctness of this choice looks questionable, especially
>>>>>>>>>>>>>> in stand-alone mode, where you don't have a cluster of
>>>>>>>>>>>>>> other machines to compensate.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Something else interesting is that this doesn't even really
>>>>>>>>>>>>>> guarantee that the request of death is the only thing
>>>>>>>>>>>>>> remaining to be processed.
>>>>>>>>>>>>>> There is no synchronization over the queue covering both
>>>>>>>>>>>>>> the clear and the enqueue of the request of death, so I
>>>>>>>>>>>>>> think there is a window in which other requests could
>>>>>>>>>>>>>> trickle in ahead of the request of death.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --Chris Nauroth
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 10/1/15, 8:21 PM, "Jordan Zimmerman"
>>>>>>>>>>>>>> <jord...@bluejeansnet.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Why does PrepRequestProcessor.shutdown() call
>>>>>>>>>>>>>>> submittedRequests.clear() before adding the death request?
>>>>>>>>>>>>>>> What if there are pending requests? I'm trying to track
>>>>>>>>>>>>>>> down a bug reported in Curator. It only happens in
>>>>>>>>>>>>>>> standalone ZK instances. From what I can tell, shutting
>>>>>>>>>>>>>>> down a standalone instance might result in lost
>>>>>>>>>>>>>>> transactions. Am I looking down the wrong path, or is this
>>>>>>>>>>>>>>> a possibility?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Jordan
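[The unsynchronized clear-then-enqueue window Chris describes can be made concrete with a small sketch. Class, field, and method names are illustrative, not the real ZooKeeper code, and the race is modeled deterministically by running the competing submit inside the window instead of on a real second thread.]

```java
import java.util.concurrent.LinkedBlockingQueue;

// Models shutdown() clearing the queue and then enqueuing the request of
// death as two separate, unsynchronized steps, so a concurrent submitter
// can slip a request in between them.
public class ShutdownRace {
    final LinkedBlockingQueue<String> submittedRequests = new LinkedBlockingQueue<>();
    static final String DEATH = "requestOfDeath";

    void submit(String req) {
        submittedRequests.add(req);
    }

    // 'concurrent' runs between the clear and the enqueue, standing in
    // for another thread calling submit() during the window.
    void shutdownWithWindow(Runnable concurrent) {
        submittedRequests.clear();   // pending requests are dropped
        concurrent.run();            // window: a request lands ahead of DEATH
        submittedRequests.add(DEATH);
    }

    public static void main(String[] args) {
        ShutdownRace s = new ShutdownRace();
        s.submit("pending-1");       // dropped by clear()
        s.shutdownWithWindow(() -> s.submit("late-request"));
        // "late-request" is now queued ahead of the request of death.
        System.out.println(s.submittedRequests); // prints [late-request, requestOfDeath]
    }
}
```

[So even with the clear(), shutdown doesn't actually guarantee that the request of death is the last item processed; closing that window would require holding a lock, or draining, across both steps.]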