What we’re seeing is transaction rollbacks. The bug was reported against Curator, but I can reproduce it with raw ZooKeeper: https://issues.apache.org/jira/browse/CURATOR-268
-JZ

> On Oct 5, 2015, at 12:00 PM, Flavio Junqueira <f...@apache.org> wrote:
>
> It is safe because the requests in the submittedRequests queue haven't been
> prepared yet. The simplest pipeline is the one of the standalone server:
> preprequestprocessor -> syncrequestprocessor -> finalrequestprocessor. If the
> request hasn't gone through prepRP, then nothing has changed in the state of
> zookeeper. The ones that have gone through prepRP will complete regularly.
> For quorum, the pipeline is a bit more complex, but the reasoning is very
> similar.
>
> -Flavio
>
>
>> On 05 Oct 2015, at 17:55, Jordan Zimmerman <jor...@jordanzimmerman.com>
>> wrote:
>>
>> That would mean that there’s no safe way to shut down the server, right?
>> Ideally, you’d want the server to shut down gracefully: a) stop receiving
>> requests; b) complete current requests; c) shut down. That’s how most
>> servers work. Of course, you might want a quick-die shutdown but that’s not
>> usual behavior.
>>
>> -JZ
>>
>>> On Oct 5, 2015, at 11:30 AM, Flavio Junqueira <f...@apache.org> wrote:
>>>
>>> You suggested that it is a bug, and I'm arguing that it isn't a bug. You
>>> may want to optimize and still process the requests in the queue before
>>> injecting RoD, but discarding them doesn't sound like a bug because you
>>> can't guarantee that requests submitted concurrently with the server
>>> shutting down will be executed. Optimizing isn't the same as spotting a
>>> bug. Also, if you are trying to shut down, you probably want to do it asap,
>>> rather than wait for a whole batch of operations to complete.
>>>
>>> -Flavio
>>>
>>>> On 05 Oct 2015, at 14:57, Jordan Zimmerman <jor...@jordanzimmerman.com>
>>>> wrote:
>>>>
>>>> Flavio, that isn’t logical. Just because you can’t make that guarantee
>>>> doesn’t imply that you should flush already queued transactions.
>>>>
>>>> -JZ
>>>>
>>>>> On Oct 5, 2015, at 3:24 AM, Flavio Junqueira <f...@apache.org> wrote:
>>>>>
>>>>> Injecting the RoD means that we are shutting down the server pipeline. If
>>>>> the server is shutting down, then we can't guarantee that a request
>>>>> submitted concurrently will be executed anyway, so clearing the queue of
>>>>> submitted requests (submitted but not prepped for execution) sounds like
>>>>> correct behavior to me.
>>>>>
>>>>> -Flavio
>>>>>
>>>>>
>>>>>> On 04 Oct 2015, at 23:05, Chris Nauroth <cnaur...@hortonworks.com> wrote:
>>>>>>
>>>>>> Hi Jordan,
>>>>>>
>>>>>> That's an interesting find. I think you have a good theory. Have you
>>>>>> already tried patching this to see if the bug reported against Curator
>>>>>> goes away? (BTW, is there a corresponding Curator JIRA?)
>>>>>>
>>>>>> That logic dates all the way back to the initial import of the codebase.
>>>>>> I can't find a definitive explanation, but my best guess is that dropping
>>>>>> pending requests (instead of gracefully quiescing) can give a faster
>>>>>> shutdown in the event of a heavily overloaded server. However, the
>>>>>> correctness of this choice looks questionable, especially in stand-alone
>>>>>> mode where you don't have a cluster of other machines to compensate.
>>>>>>
>>>>>> Something else interesting is that this doesn't even really guarantee
>>>>>> that the request of death is the only thing remaining to be processed.
>>>>>> There is no synchronization over the queue covering both the clear and
>>>>>> the enqueue of the request of death, so I think there is a window in
>>>>>> which other requests could trickle in ahead of the request of death.
>>>>>>
>>>>>> --Chris Nauroth
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 10/1/15, 8:21 PM, "Jordan Zimmerman" <jord...@bluejeansnet.com> wrote:
>>>>>>
>>>>>>> Why does PrepRequestProcessor.shutdown() call submittedRequests.clear();
>>>>>>> before adding the death request? What if there are pending requests?
>>>>>>> I'm trying to track down a bug reported in Curator. It only happens in
>>>>>>> Standalone ZK instances. From what I can tell, shutting down a
>>>>>>> standalone instance might result in lost transactions. Am I looking
>>>>>>> down the wrong path or is this a possibility?
>>>>>>>
>>>>>>> -Jordan
>>>>>>
>>>>>
>>>>
>>>
>>
>
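
To make the two shutdown behaviors under discussion concrete, here is a minimal, single-threaded sketch. It is not the ZooKeeper source: the class ShutdownSketch, its method names, and the string marker standing in for the request of death are all hypothetical, and requests are modeled as plain strings. It only contrasts "clear the queue, then enqueue the marker" with "enqueue the marker behind whatever is already pending".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

public class ShutdownSketch {
    // Hypothetical stand-in for the "request of death" marker.
    static final String REQUEST_OF_DEATH = "REQUEST_OF_DEATH";

    // Behavior described in the thread: drop pending requests, then add the marker.
    static void shutdownByClearing(LinkedBlockingQueue<String> submitted) {
        submitted.clear();
        submitted.add(REQUEST_OF_DEATH);
    }

    // Alternative raised in the thread: keep pending requests, add the marker last.
    static void shutdownByDraining(LinkedBlockingQueue<String> submitted) {
        submitted.add(REQUEST_OF_DEATH);
    }

    // Simplified consumer: handles requests in order until it sees the marker
    // (or the queue is empty, since this sketch runs on a single thread).
    static List<String> process(LinkedBlockingQueue<String> submitted) {
        List<String> processed = new ArrayList<>();
        String request;
        while ((request = submitted.poll()) != null) {
            if (REQUEST_OF_DEATH.equals(request)) {
                break;
            }
            processed.add(request);
        }
        return processed;
    }

    public static void main(String[] args) {
        LinkedBlockingQueue<String> cleared = new LinkedBlockingQueue<>();
        cleared.add("create /a");
        cleared.add("setData /a");
        shutdownByClearing(cleared);
        System.out.println("clear-then-marker processed: " + process(cleared));

        LinkedBlockingQueue<String> drained = new LinkedBlockingQueue<>();
        drained.add("create /a");
        drained.add("setData /a");
        shutdownByDraining(drained);
        System.out.println("marker-after-pending processed: " + process(drained));
    }
}
```

In the first case the two pending requests are never processed; in the second they are processed before the marker stops the loop. The sketch does not model the window Chris describes, where another thread could enqueue a request between the clear and the enqueue of the marker.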