> can reproduce it with raw ZooKeeper -> can't reproduce it with raw ZooKeeper > , yes?
I don’t know what you mean. The test that the user posted used Curator. I changed all the Curator usages to raw ZooKeeper usages and the problem still shows. -JZ > On Oct 5, 2015, at 12:17 PM, Flavio Junqueira <f...@apache.org> wrote: > > can reproduce it with raw ZooKeeper -> can't reproduce it with raw ZooKeeper > , yes? > > I'll have a look at the jira to see if I have any insight. > > -Flavio > >> On 05 Oct 2015, at 18:15, Jordan Zimmerman <jor...@jordanzimmerman.com> >> wrote: >> >> What we’re seeing is transaction rollbacks. The bug was reported against >> Curator but I can reproduce it with raw ZooKeeper: >> https://issues.apache.org/jira/browse/CURATOR-268 >> <https://issues.apache.org/jira/browse/CURATOR-268> >> >> -JZ >> >>> On Oct 5, 2015, at 12:00 PM, Flavio Junqueira <f...@apache.org> wrote: >>> >>> It is safe because the requests in the submittedRequests queue haven't been >>> prepared yet. The simplest pipeline is the one of the standalone server: >>> preprequestprocessor -> syncrequestprocessor -> finalrequestprocessor. If >>> the request hasn't gone through prepRP, then nothing has changed in the >>> state of zookeeper. The ones that have gone through prepPR will complete >>> regularly. For quorum, the pipeline is a bit more complex, but the >>> reasoning is very similar. >>> >>> -Flavio >>> >>> >>>> On 05 Oct 2015, at 17:55, Jordan Zimmerman <jor...@jordanzimmerman.com> >>>> wrote: >>>> >>>> That would mean that there’s no safe way to shut down the server, right? >>>> Ideally, you’d want the server to shut down gracefully: a) stop receiving >>>> requests; b) complete current requests; c) shut down. That’s how most >>>> servers work. Of course, you might want a quick-die shutdown but that’s >>>> not usual behavior. >>>> >>>> -JZ >>>> >>>>> On Oct 5, 2015, at 11:30 AM, Flavio Junqueira <f...@apache.org> wrote: >>>>> >>>>> You suggested that it is a bug, and I'm arguing that it isn't a bug. You >>>>> may want to optimize and still process the requests in the queue before >>>>> injecting RoD, but discarding them doesn't sound like a bug because you >>>>> can't guarantee that requests submitted concurrently with the server >>>>> shutting down will be executed. Optimizing isn't the same as spotting a >>>>> bug. Also, if you are trying to shut down, you probably want to do it >>>>> asap, rather than wait for a whole batch of operations to complete. >>>>> >>>>> -Flavio >>>>> >>>>>> On 05 Oct 2015, at 14:57, Jordan Zimmerman <jor...@jordanzimmerman.com> >>>>>> wrote: >>>>>> >>>>>> Flavio, that isn’t logical. Just because you can’t make that guarantee >>>>>> doesn’t imply that you should flush already queued transactions. >>>>>> >>>>>> -JZ >>>>>> >>>>>>> On Oct 5, 2015, at 3:24 AM, Flavio Junqueira <f...@apache.org> wrote: >>>>>>> >>>>>>> Injecting the RoD means that we are shutting down the server pipeline. >>>>>>> If the server is shutting down, then we can't guarantee that a request >>>>>>> submitted concurrently will be executed anyway, so clearing the queue >>>>>>> of submitted requests (submitted but no preped for execution) sounds >>>>>>> like correct behavior to me. >>>>>>> >>>>>>> -Flavio >>>>>>> >>>>>>> >>>>>>>> On 04 Oct 2015, at 23:05, Chris Nauroth <cnaur...@hortonworks.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>> Hi Jordan, >>>>>>>> >>>>>>>> That's an interesting find. I think you have a good theory. Have you >>>>>>>> already tried patching this to see if the bug reported against Curator >>>>>>>> goes away? (BTW, is there a corresponding Curator JIRA?) >>>>>>>> >>>>>>>> That logic dates all the way back to the initial import of the >>>>>>>> codebase. >>>>>>>> I can't find a definitive explanation, but my best guess is that >>>>>>>> dropping >>>>>>>> pending requests (instead of gracefully quiescing) can give a faster >>>>>>>> shutdown in the event of a heavily overloaded server. However, the >>>>>>>> correctness of this choice looks questionable, especially in >>>>>>>> stand-alone >>>>>>>> mode where you don't have a cluster of other machines to compensate. >>>>>>>> >>>>>>>> Something else interesting is that this doesn't even really guarantee >>>>>>>> that >>>>>>>> the request of death is the only thing remaining to be processed. >>>>>>>> There >>>>>>>> is no synchronization over the queue covering both the clear and the >>>>>>>> enqueue of the request of death, so I think there is a window in which >>>>>>>> other requests could trickle in ahead of the request of death. >>>>>>>> >>>>>>>> --Chris Nauroth >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 10/1/15, 8:21 PM, "Jordan Zimmerman" <jord...@bluejeansnet.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Why does PrepRequestProcessor.shutdown() call >>>>>>>>> submittedRequests.clear(); >>>>>>>>> before adding the death request? What if there are pending requests? >>>>>>>>> I¹m >>>>>>>>> trying to track down a bug reported in Curator. It only happens in >>>>>>>>> Standalone ZK instances. From what I can tell, shutting down a >>>>>>>>> standalone >>>>>>>>> instance might result in lost transactions. Am I looking down the >>>>>>>>> wrong >>>>>>>>> path or is this a possibility? >>>>>>>>> >>>>>>>>> -Jordan >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >