Flavio, thank you for the confirmation. Regarding SyncRequestProcessor, I thought the case you described would be covered by the final flush call from within SyncRequestProcessor#shutdown. Let me know if I'm missing an important point, though.
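
Here is a minimal, self-contained sketch of the group-commit pattern we are discussing (hypothetical names, not the real ZooKeeper classes): the run() loop batches requests into toFlush and breaks on the request of death, and shutdown() joins the thread and then performs the final flush, which is what I believe covers the batched requests.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.LinkedBlockingQueue;

    // Toy model of the group-commit pattern (hypothetical names, not the
    // real ZooKeeper classes). Requests batch up in toFlush; a "request
    // of death" breaks the loop; shutdown() then does the final flush.
    public class BatchingProcessor extends Thread {
        private static final String REQUEST_OF_DEATH = "RoD";
        private final LinkedBlockingQueue<String> queued = new LinkedBlockingQueue<>();
        private final List<String> toFlush = new ArrayList<>();

        @Override
        public void run() {
            try {
                while (true) {
                    // Block only when nothing is batched; otherwise poll so
                    // an empty queue triggers a flush of the current batch.
                    String req = toFlush.isEmpty() ? queued.take() : queued.poll();
                    if (req == null) {
                        flush();
                        continue;
                    }
                    if (REQUEST_OF_DEATH.equals(req)) {
                        break; // under load, toFlush may be non-empty here
                    }
                    toFlush.add(req);
                    if (toFlush.size() >= 1000) {
                        flush(); // group commit once the batch is large enough
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        // Stand-in for syncing the transaction log and then running the
        // batched requests through FinalRequestProcessor.
        private void flush() {
            toFlush.forEach(r -> System.out.println("committed: " + r));
            toFlush.clear();
        }

        public void submit(String request) {
            queued.add(request);
        }

        public void shutdown() throws InterruptedException {
            queued.add(REQUEST_OF_DEATH);
            join();  // join() gives happens-before, so reading toFlush is safe
            flush(); // final flush: commits whatever was still batched
        }
    }

If that final flush were missing, any requests still sitting in toFlush when the RoD arrives would indeed be dropped, which is the scenario you describe.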
--Chris Nauroth

On 10/6/15, 9:14 AM, "Flavio Junqueira" <f...@apache.org> wrote:

> That's a good analysis, Chris. It might be worth doing.
>
> I have a new conjecture for Jordan's issue. I was having a look at SyncRequestProcessor, and the following seems to happen. This request processor tries to accumulate requests in order to perform a group commit to the transaction log. If the load is high, it tries to group as many as 1000. The trouble is that if it receives a RoD (request of death) under high load, it currently breaks the main thread loop without flushing the requests accumulated in the toFlush list. Note that flushing includes executing them in FinalRequestProcessor.
>
> This is a change we can try to make to see if it sorts out the problem.
>
> -Flavio
>
>> On 05 Oct 2015, at 23:47, Chris Nauroth <cnaur...@hortonworks.com> wrote:
>>
>> I was able to do more debugging today, using Jordan's ZooKeeper-only test (no Curator in the mix) mentioned in CURATOR-268.
>>
>> https://gist.github.com/Randgalt/c3358b740f7f09d47bb0
>>
>> My observation is that the transactions are getting processed and persisted to the log, but the client's connection to the server is getting closed before a successful reply is sent. This causes the client to reconnect and retry the same request. For a retried delete, this causes a NoNode error. For a retried create, this causes a NodeExists error. The effect is also visible by logging the last zxid seen by the client, which will show gaps, e.g. 1, 2, 3, 5, 6, ...
>>
>> I think a contributing factor is the shutdown sequence in ZooKeeperServerMain. I see NIOServerCnxnFactory.shutdown() getting called and closing all connections before shutting down the request processing pipeline. If requests are in flight in the pipeline after sockets have been closed, but before the pipeline has been stopped, then the server could process the transaction and persist it, but not tell the client about it. I was able to reduce the frequency of test failures by reordering the shutdown sequence so that the request processing pipeline is shut down before the sockets, but it appears this is only a partial fix. Before this change, I could only run Jordan's test for a few seconds before it aborted. After this change, I can keep it running for multiple minutes, but it still aborts eventually.
>>
>> Note that even if we improve the handling for this on a graceful shutdown, this scenario would still be possible in the event of a hard failure (e.g. power loss). I know some applications wrap the ZooKeeper API calls with additional semantics, so that a NoNode/NodeExists response is treated as a successful idempotent response. For example, here is something in HBase:
>>
>> http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.html
>>
>> I don't know the Curator codebase very well, so I don't know if it has similar options.
>>
>> --Chris Nauroth
>>
>> On 10/5/15, 10:41 AM, "Flavio Junqueira" <f...@apache.org> wrote:
>>
>>> You said you can reproduce it with raw ZooKeeper, and I thought you were saying you can't reproduce it with raw ZooKeeper, but it sounds like I got it wrong.
>>>
>>> -Flavio
>>>
>>>> On 05 Oct 2015, at 18:36, Jordan Zimmerman <jor...@jordanzimmerman.com> wrote:
>>>>
>>>>> can reproduce it with raw ZooKeeper -> can't reproduce it with raw ZooKeeper, yes?
>>>>
>>>> I don’t know what you mean.
>>>> The test that the user posted used Curator. I changed all the Curator usages to raw ZooKeeper usages and the problem still shows.
>>>>
>>>> -JZ
>>>>
>>>>> On Oct 5, 2015, at 12:17 PM, Flavio Junqueira <f...@apache.org> wrote:
>>>>>
>>>>> can reproduce it with raw ZooKeeper -> can't reproduce it with raw ZooKeeper, yes?
>>>>>
>>>>> I'll have a look at the jira to see if I have any insight.
>>>>>
>>>>> -Flavio
>>>>>
>>>>>> On 05 Oct 2015, at 18:15, Jordan Zimmerman <jor...@jordanzimmerman.com> wrote:
>>>>>>
>>>>>> What we’re seeing is transaction rollbacks. The bug was reported against Curator, but I can reproduce it with raw ZooKeeper: https://issues.apache.org/jira/browse/CURATOR-268
>>>>>>
>>>>>> -JZ
>>>>>>
>>>>>>> On Oct 5, 2015, at 12:00 PM, Flavio Junqueira <f...@apache.org> wrote:
>>>>>>>
>>>>>>> It is safe because the requests in the submittedRequests queue haven't been prepared yet. The simplest pipeline is the one of the standalone server: PrepRequestProcessor -> SyncRequestProcessor -> FinalRequestProcessor. If a request hasn't gone through PrepRequestProcessor, then nothing has changed in the state of ZooKeeper. The ones that have gone through it will complete regularly. For quorum, the pipeline is a bit more complex, but the reasoning is very similar.
>>>>>>>
>>>>>>> -Flavio
>>>>>>>
>>>>>>>> On 05 Oct 2015, at 17:55, Jordan Zimmerman <jor...@jordanzimmerman.com> wrote:
>>>>>>>>
>>>>>>>> That would mean that there’s no safe way to shut down the server, right? Ideally, you’d want the server to shut down gracefully: a) stop receiving requests; b) complete current requests; c) shut down. That’s how most servers work. Of course, you might want a quick-die shutdown, but that’s not the usual behavior.
>>>>>>>>
>>>>>>>> -JZ
>>>>>>>>
>>>>>>>>> On Oct 5, 2015, at 11:30 AM, Flavio Junqueira <f...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>> You suggested that it is a bug, and I'm arguing that it isn't a bug. You may want to optimize and still process the requests in the queue before injecting the RoD, but discarding them doesn't sound like a bug, because you can't guarantee that requests submitted concurrently with the server shutting down will be executed. Optimizing isn't the same as spotting a bug. Also, if you are trying to shut down, you probably want to do it asap, rather than wait for a whole batch of operations to complete.
>>>>>>>>>
>>>>>>>>> -Flavio
>>>>>>>>>
>>>>>>>>>> On 05 Oct 2015, at 14:57, Jordan Zimmerman <jor...@jordanzimmerman.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Flavio, that isn’t logical. Just because you can’t make that guarantee doesn’t imply that you should discard already queued transactions.
>>>>>>>>>>
>>>>>>>>>> -JZ
>>>>>>>>>>
>>>>>>>>>>> On Oct 5, 2015, at 3:24 AM, Flavio Junqueira <f...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Injecting the RoD means that we are shutting down the server pipeline. If the server is shutting down, then we can't guarantee that a request submitted concurrently will be executed anyway, so clearing the queue of submitted requests (submitted but not yet prepped for execution) sounds like correct behavior to me.
>>>>>>>>>>>
>>>>>>>>>>> -Flavio
>>>>>>>>>>>
>>>>>>>>>>>> On 04 Oct 2015, at 23:05, Chris Nauroth <cnaur...@hortonworks.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Jordan,
>>>>>>>>>>>>
>>>>>>>>>>>> That's an interesting find. I think you have a good theory. Have you already tried patching this to see if the bug reported against Curator goes away? (BTW, is there a corresponding Curator JIRA?)
>>>>>>>>>>>>
>>>>>>>>>>>> That logic dates all the way back to the initial import of the codebase. I can't find a definitive explanation, but my best guess is that dropping pending requests (instead of gracefully quiescing) can give a faster shutdown on a heavily overloaded server. However, the correctness of this choice looks questionable, especially in stand-alone mode, where you don't have a cluster of other machines to compensate.
>>>>>>>>>>>>
>>>>>>>>>>>> Something else interesting is that this doesn't even really guarantee that the request of death is the only thing remaining to be processed. There is no synchronization over the queue covering both the clear and the enqueue of the request of death, so I think there is a window in which other requests could trickle in ahead of the request of death.
>>>>>>>>>>>>
>>>>>>>>>>>> --Chris Nauroth
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/1/15, 8:21 PM, "Jordan Zimmerman" <jord...@bluejeansnet.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Why does PrepRequestProcessor.shutdown() call submittedRequests.clear() before adding the death request? What if there are pending requests? I'm trying to track down a bug reported in Curator. It only happens in standalone ZK instances. From what I can tell, shutting down a standalone instance might result in lost transactions. Am I looking down the wrong path, or is this a possibility?
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Jordan
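
For reference, the shutdown method under discussion looks essentially like this (a paraphrased sketch of PrepRequestProcessor#shutdown; exact code varies by ZooKeeper version):

    public void shutdown() {
        LOG.info("Shutting down");
        // Pending requests that were submitted but not yet prepped are dropped.
        submittedRequests.clear();
        // No lock covers both the clear() and this add(), so a request
        // submitted in between can still land ahead of the request of death.
        submittedRequests.add(Request.requestOfDeath);
        nextProcessor.shutdown();
    }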