You said you can reproduce it with raw ZooKeeper, and I thought you meant you can't reproduce it with raw ZooKeeper, but it sounds like I got it wrong.
-Flavio

> On 05 Oct 2015, at 18:36, Jordan Zimmerman <jor...@jordanzimmerman.com> wrote:
>
>> can reproduce it with raw ZooKeeper -> can't reproduce it with raw ZooKeeper, yes?
>
> I don’t know what you mean. The test that the user posted used Curator. I
> changed all the Curator usages to raw ZooKeeper usages and the problem still
> shows.
>
> -JZ
>
>> On Oct 5, 2015, at 12:17 PM, Flavio Junqueira <f...@apache.org> wrote:
>>
>> can reproduce it with raw ZooKeeper -> can't reproduce it with raw ZooKeeper, yes?
>>
>> I'll have a look at the jira to see if I have any insight.
>>
>> -Flavio
>>
>>> On 05 Oct 2015, at 18:15, Jordan Zimmerman <jor...@jordanzimmerman.com> wrote:
>>>
>>> What we’re seeing is transaction rollbacks. The bug was reported against
>>> Curator but I can reproduce it with raw ZooKeeper:
>>> https://issues.apache.org/jira/browse/CURATOR-268
>>>
>>> -JZ
>>>
>>>> On Oct 5, 2015, at 12:00 PM, Flavio Junqueira <f...@apache.org> wrote:
>>>>
>>>> It is safe because the requests in the submittedRequests queue haven't
>>>> been prepared yet. The simplest pipeline is the one of the standalone
>>>> server: PrepRequestProcessor -> SyncRequestProcessor ->
>>>> FinalRequestProcessor. If the request hasn't gone through prepRP, then
>>>> nothing has changed in the state of ZooKeeper. The ones that have gone
>>>> through prepRP will complete regularly. For quorum, the pipeline is a bit
>>>> more complex, but the reasoning is very similar.
>>>>
>>>> -Flavio
>>>>
>>>>> On 05 Oct 2015, at 17:55, Jordan Zimmerman <jor...@jordanzimmerman.com> wrote:
>>>>>
>>>>> That would mean that there’s no safe way to shut down the server, right?
>>>>> Ideally, you’d want the server to shut down gracefully: a) stop receiving
>>>>> requests; b) complete current requests; c) shut down. That’s how most
>>>>> servers work. Of course, you might want a quick-die shutdown but that’s
>>>>> not usual behavior.
>>>>>
>>>>> -JZ
>>>>>
>>>>>> On Oct 5, 2015, at 11:30 AM, Flavio Junqueira <f...@apache.org> wrote:
>>>>>>
>>>>>> You suggested that it is a bug, and I'm arguing that it isn't a bug. You
>>>>>> may want to optimize and still process the requests in the queue before
>>>>>> injecting RoD, but discarding them doesn't sound like a bug because you
>>>>>> can't guarantee that requests submitted concurrently with the server
>>>>>> shutting down will be executed. Optimizing isn't the same as spotting a
>>>>>> bug. Also, if you are trying to shut down, you probably want to do it
>>>>>> asap, rather than wait for a whole batch of operations to complete.
>>>>>>
>>>>>> -Flavio
>>>>>>
>>>>>>> On 05 Oct 2015, at 14:57, Jordan Zimmerman <jor...@jordanzimmerman.com> wrote:
>>>>>>>
>>>>>>> Flavio, that isn’t logical. Just because you can’t make that guarantee
>>>>>>> doesn’t imply that you should flush already queued transactions.
>>>>>>>
>>>>>>> -JZ
>>>>>>>
>>>>>>>> On Oct 5, 2015, at 3:24 AM, Flavio Junqueira <f...@apache.org> wrote:
>>>>>>>>
>>>>>>>> Injecting the RoD means that we are shutting down the server pipeline.
>>>>>>>> If the server is shutting down, then we can't guarantee that a request
>>>>>>>> submitted concurrently will be executed anyway, so clearing the queue
>>>>>>>> of submitted requests (submitted but not prepped for execution) sounds
>>>>>>>> like correct behavior to me.
>>>>>>>>
>>>>>>>> -Flavio
>>>>>>>>
>>>>>>>>> On 04 Oct 2015, at 23:05, Chris Nauroth <cnaur...@hortonworks.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Jordan,
>>>>>>>>>
>>>>>>>>> That's an interesting find. I think you have a good theory. Have you
>>>>>>>>> already tried patching this to see if the bug reported against Curator
>>>>>>>>> goes away? (BTW, is there a corresponding Curator JIRA?)
>>>>>>>>>
>>>>>>>>> That logic dates all the way back to the initial import of the
>>>>>>>>> codebase. I can't find a definitive explanation, but my best guess is
>>>>>>>>> that dropping pending requests (instead of gracefully quiescing) can
>>>>>>>>> give a faster shutdown in the event of a heavily overloaded server.
>>>>>>>>> However, the correctness of this choice looks questionable, especially
>>>>>>>>> in stand-alone mode where you don't have a cluster of other machines
>>>>>>>>> to compensate.
>>>>>>>>>
>>>>>>>>> Something else interesting is that this doesn't even really guarantee
>>>>>>>>> that the request of death is the only thing remaining to be processed.
>>>>>>>>> There is no synchronization over the queue covering both the clear and
>>>>>>>>> the enqueue of the request of death, so I think there is a window in
>>>>>>>>> which other requests could trickle in ahead of the request of death.
>>>>>>>>>
>>>>>>>>> --Chris Nauroth
>>>>>>>>>
>>>>>>>>> On 10/1/15, 8:21 PM, "Jordan Zimmerman" <jord...@bluejeansnet.com> wrote:
>>>>>>>>>
>>>>>>>>>> Why does PrepRequestProcessor.shutdown() call
>>>>>>>>>> submittedRequests.clear(); before adding the death request? What if
>>>>>>>>>> there are pending requests? I'm trying to track down a bug reported
>>>>>>>>>> in Curator. It only happens in Standalone ZK instances. From what I
>>>>>>>>>> can tell, shutting down a standalone instance might result in lost
>>>>>>>>>> transactions. Am I looking down the wrong path or is this a
>>>>>>>>>> possibility?
>>>>>>>>>>
>>>>>>>>>> -Jordan
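
For anyone following the thread, here is a minimal sketch of the two shutdown styles being debated: clearing the submitted queue before enqueuing the request of death, versus enqueuing it behind whatever is already pending. This is not ZooKeeper's code. The class ShutdownSketch, its method names, and the string requests are all hypothetical stand-ins, and the sketch is single-threaded, so it does not reproduce the clear/enqueue race Chris describes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the head of the request pipeline; not ZooKeeper's actual code.
class ShutdownSketch {
    // Sentinel playing the role of the "request of death" (RoD).
    static final String REQUEST_OF_DEATH = "REQUEST_OF_DEATH";

    // Style questioned in the thread: clear the queue, then enqueue the sentinel.
    static List<String> shutdownByClearing(LinkedBlockingQueue<String> submitted) {
        submitted.clear();                 // pending, not-yet-prepared requests are dropped
        submitted.add(REQUEST_OF_DEATH);
        return process(submitted);
    }

    // Alternative: enqueue the sentinel behind the pending requests so they run first.
    static List<String> shutdownByDraining(LinkedBlockingQueue<String> submitted) {
        submitted.add(REQUEST_OF_DEATH);
        return process(submitted);
    }

    // Consumes requests until the sentinel is seen; returns the requests that were processed.
    static List<String> process(LinkedBlockingQueue<String> submitted) {
        List<String> processed = new ArrayList<>();
        String request;
        while ((request = submitted.poll()) != null && !request.equals(REQUEST_OF_DEATH)) {
            processed.add(request);
        }
        return processed;
    }

    // Three illustrative pending requests (made-up operations).
    static LinkedBlockingQueue<String> pending() {
        return new LinkedBlockingQueue<>(List.of("create /a", "setData /a", "delete /b"));
    }

    public static void main(String[] args) {
        System.out.println("clearing: " + shutdownByClearing(pending()));
        System.out.println("draining: " + shutdownByDraining(pending()));
    }
}
```

With the clearing style the processed list comes back empty, while the draining style processes all three pending requests before stopping. Whether dropping not-yet-prepared requests counts as a bug or as acceptable shutdown behavior is exactly what the thread is arguing about; the sketch only makes the difference visible.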