> can reproduce it with raw ZooKeeper -> can't reproduce it with raw ZooKeeper 
> , yes?

I don’t know what you mean. The test that the user posted used Curator. I 
changed all the Curator usages to raw ZooKeeper usages and the problem still 
shows.

-JZ

> On Oct 5, 2015, at 12:17 PM, Flavio Junqueira <f...@apache.org> wrote:
> 
> can reproduce it with raw ZooKeeper -> can't reproduce it with raw ZooKeeper 
> , yes?
> 
> I'll have a look at the jira to see if I have any insight.
> 
> -Flavio
> 
>> On 05 Oct 2015, at 18:15, Jordan Zimmerman <jor...@jordanzimmerman.com> 
>> wrote:
>> 
>> What we’re seeing is transaction rollbacks. The bug was reported against 
>> Curator but I can reproduce it with raw ZooKeeper: 
>> https://issues.apache.org/jira/browse/CURATOR-268 
>> 
>> -JZ
>> 
>>> On Oct 5, 2015, at 12:00 PM, Flavio Junqueira <f...@apache.org> wrote:
>>> 
>>> It is safe because the requests in the submittedRequests queue haven't been 
>>> prepared yet. The simplest pipeline is the one of the standalone server: 
>>> PrepRequestProcessor -> SyncRequestProcessor -> FinalRequestProcessor. If 
>>> the request hasn't gone through prepRP, then nothing has changed in the 
>>> state of ZooKeeper. The ones that have gone through prepRP will complete 
>>> regularly. For quorum, the pipeline is a bit more complex, but the 
>>> reasoning is very similar.
>>> 
>>> -Flavio
>>> 
>>> 
>>>> On 05 Oct 2015, at 17:55, Jordan Zimmerman <jor...@jordanzimmerman.com> 
>>>> wrote:
>>>> 
>>>> That would mean that there’s no safe way to shut down the server, right? 
>>>> Ideally, you’d want the server to shut down gracefully: a) stop receiving 
>>>> requests; b) complete current requests; c) shut down. That’s how most 
>>>> servers work. Of course, you might want a quick-die shutdown but that’s 
>>>> not usual behavior.
>>>> 
>>>> -JZ
>>>> 
>>>>> On Oct 5, 2015, at 11:30 AM, Flavio Junqueira <f...@apache.org> wrote:
>>>>> 
>>>>> You suggested that it is a bug, and I'm arguing that it isn't a bug. You 
>>>>> may want to optimize and still process the requests in the queue before 
>>>>> injecting RoD, but discarding them doesn't sound like a bug because you 
>>>>> can't guarantee that requests submitted concurrently with the server 
>>>>> shutting down will be executed. Optimizing isn't the same as spotting a 
>>>>> bug. Also, if you are trying to shut down, you probably want to do it 
>>>>> asap, rather than wait for a whole batch of operations to complete.
>>>>> 
>>>>> -Flavio
>>>>> 
>>>>>> On 05 Oct 2015, at 14:57, Jordan Zimmerman <jor...@jordanzimmerman.com> 
>>>>>> wrote:
>>>>>> 
>>>>>> Flavio, that isn’t logical. Just because you can’t make that guarantee 
>>>>>> doesn’t imply that you should flush already queued transactions.
>>>>>> 
>>>>>> -JZ
>>>>>> 
>>>>>>> On Oct 5, 2015, at 3:24 AM, Flavio Junqueira <f...@apache.org> wrote:
>>>>>>> 
>>>>>>> Injecting the RoD means that we are shutting down the server pipeline. 
>>>>>>> If the server is shutting down, then we can't guarantee that a request 
>>>>>>> submitted concurrently will be executed anyway, so clearing the queue 
>>>>>>> of submitted requests (submitted but not prepped for execution) sounds 
>>>>>>> like correct behavior to me.
>>>>>>> 
>>>>>>> -Flavio  
>>>>>>> 
>>>>>>> 
>>>>>>>> On 04 Oct 2015, at 23:05, Chris Nauroth <cnaur...@hortonworks.com> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Hi Jordan,
>>>>>>>> 
>>>>>>>> That's an interesting find.  I think you have a good theory.  Have you
>>>>>>>> already tried patching this to see if the bug reported against Curator
>>>>>>>> goes away?  (BTW, is there a corresponding Curator JIRA?)
>>>>>>>> 
>>>>>>>> That logic dates all the way back to the initial import of the
>>>>>>>> codebase. I can't find a definitive explanation, but my best guess is
>>>>>>>> that dropping pending requests (instead of gracefully quiescing) can
>>>>>>>> give a faster shutdown in the event of a heavily overloaded server.
>>>>>>>> However, the correctness of this choice looks questionable,
>>>>>>>> especially in stand-alone mode where you don't have a cluster of
>>>>>>>> other machines to compensate.
>>>>>>>> 
>>>>>>>> Something else interesting is that this doesn't even really guarantee
>>>>>>>> that the request of death is the only thing remaining to be
>>>>>>>> processed. There is no synchronization over the queue covering both
>>>>>>>> the clear and the enqueue of the request of death, so I think there
>>>>>>>> is a window in which other requests could trickle in ahead of the
>>>>>>>> request of death.
>>>>>>>> 
>>>>>>>> --Chris Nauroth
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 10/1/15, 8:21 PM, "Jordan Zimmerman" <jord...@bluejeansnet.com> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Why does PrepRequestProcessor.shutdown() call
>>>>>>>>> submittedRequests.clear(); before adding the death request? What if
>>>>>>>>> there are pending requests? I'm trying to track down a bug reported
>>>>>>>>> in Curator. It only happens in Standalone ZK instances. From what I
>>>>>>>>> can tell, shutting down a standalone instance might result in lost
>>>>>>>>> transactions. Am I looking down the wrong path or is this a
>>>>>>>>> possibility?
>>>>>>>>> 
>>>>>>>>> -Jordan
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
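
For anyone following along, here is a rough sketch of the two shutdown behaviors being debated in this thread. It is not the ZooKeeper source: the class ShutdownSketch, its method names, and the use of plain strings as requests are all made up for illustration. It only assumes what the thread states, namely that shutdown clears the submitted-requests queue and then enqueues a request of death.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative only: a toy model of a request queue, not ZooKeeper code.
public class ShutdownSketch {
    // Hypothetical stand-in for the "request of death" marker.
    static final String REQUEST_OF_DEATH = "RoD";

    // Consumes the queue the way a processor thread would, stopping at the
    // death marker. Returns the requests that were actually processed.
    static List<String> process(LinkedBlockingQueue<String> queue) {
        List<String> processed = new ArrayList<>();
        String request;
        while ((request = queue.poll()) != null) {
            if (REQUEST_OF_DEATH.equals(request)) {
                break;
            }
            processed.add(request);
        }
        return processed;
    }

    // Behavior described in the thread: drop pending requests, then enqueue
    // the death marker. Note the two steps are not atomic together.
    static void shutdownByClearing(LinkedBlockingQueue<String> queue) {
        queue.clear();
        queue.add(REQUEST_OF_DEATH);
    }

    // The "graceful" alternative: keep pending requests and put the death
    // marker behind them, so they are processed before the pipeline stops.
    static void shutdownByDraining(LinkedBlockingQueue<String> queue) {
        queue.add(REQUEST_OF_DEATH);
    }

    public static void main(String[] args) {
        List<String> pending = Arrays.asList("create /a", "setData /a");

        LinkedBlockingQueue<String> cleared = new LinkedBlockingQueue<>(pending);
        shutdownByClearing(cleared);
        System.out.println("clearing: " + process(cleared));

        LinkedBlockingQueue<String> drained = new LinkedBlockingQueue<>(pending);
        shutdownByDraining(drained);
        System.out.println("draining: " + process(drained));
    }
}
```

In this toy model the clearing variant processes nothing and the draining variant processes both pending requests. It says nothing about whether a dropped request was already acknowledged to a client, which is the part that matters for CURATOR-268.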
