You said you can reproduce it with raw ZooKeeper, and I thought you meant 
you can't reproduce it with raw ZooKeeper, but it sounds like I got it wrong.

-Flavio

> On 05 Oct 2015, at 18:36, Jordan Zimmerman <jor...@jordanzimmerman.com> wrote:
> 
>> can reproduce it with raw ZooKeeper -> can't reproduce it with raw ZooKeeper, yes?
> 
> I don’t know what you mean. The test that the user posted used Curator. I 
> changed all the Curator usages to raw ZooKeeper usages and the problem still 
> shows.
> 
> -JZ
> 
>> On Oct 5, 2015, at 12:17 PM, Flavio Junqueira <f...@apache.org> wrote:
>> 
>> can reproduce it with raw ZooKeeper -> can't reproduce it with raw ZooKeeper, yes?
>> 
>> I'll have a look at the jira to see if I have any insight.
>> 
>> -Flavio
>> 
>>> On 05 Oct 2015, at 18:15, Jordan Zimmerman <jor...@jordanzimmerman.com> 
>>> wrote:
>>> 
>>> What we’re seeing is transaction rollbacks. The bug was reported against 
>>> Curator but I can reproduce it with raw ZooKeeper: 
>>> https://issues.apache.org/jira/browse/CURATOR-268 
>>> 
>>> -JZ
>>> 
>>>> On Oct 5, 2015, at 12:00 PM, Flavio Junqueira <f...@apache.org> wrote:
>>>> 
>>>> It is safe because the requests in the submittedRequests queue haven't 
>>>> been prepared yet. The simplest pipeline is the one of the standalone 
>>>> server: PrepRequestProcessor -> SyncRequestProcessor -> 
>>>> FinalRequestProcessor. If a request hasn't gone through prepRP, then 
>>>> nothing has changed in the state of ZooKeeper. The ones that have gone 
>>>> through prepRP will complete regularly. For quorum, the pipeline is a bit 
>>>> more complex, but the reasoning is very similar.
>>>> 
>>>> -Flavio
>>>> 
>>>> 
>>>>> On 05 Oct 2015, at 17:55, Jordan Zimmerman <jor...@jordanzimmerman.com> 
>>>>> wrote:
>>>>> 
>>>>> That would mean that there’s no safe way to shut down the server, right? 
>>>>> Ideally, you’d want the server to shut down gracefully: a) stop receiving 
>>>>> requests; b) complete current requests; c) shut down. That’s how most 
>>>>> servers work. Of course, you might want a quick-die shutdown but that’s 
>>>>> not usual behavior.
>>>>> 
>>>>> -JZ
>>>>> 
>>>>>> On Oct 5, 2015, at 11:30 AM, Flavio Junqueira <f...@apache.org> wrote:
>>>>>> 
>>>>>> You suggested that it is a bug, and I'm arguing that it isn't a bug. You 
>>>>>> may want to optimize and still process the requests in the queue before 
>>>>>> injecting RoD, but discarding them doesn't sound like a bug because you 
>>>>>> can't guarantee that requests submitted concurrently with the server 
>>>>>> shutting down will be executed. Optimizing isn't the same as spotting a 
>>>>>> bug. Also, if you are trying to shut down, you probably want to do it 
>>>>>> asap, rather than wait for a whole batch of operations to complete.
>>>>>> 
>>>>>> -Flavio
>>>>>> 
>>>>>>> On 05 Oct 2015, at 14:57, Jordan Zimmerman <jor...@jordanzimmerman.com> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>> Flavio, that isn’t logical. Just because you can’t make that guarantee 
>>>>>>> doesn’t imply that you should flush already queued transactions.
>>>>>>> 
>>>>>>> -JZ
>>>>>>> 
>>>>>>>> On Oct 5, 2015, at 3:24 AM, Flavio Junqueira <f...@apache.org> wrote:
>>>>>>>> 
>>>>>>>> Injecting the RoD means that we are shutting down the server pipeline. 
>>>>>>>> If the server is shutting down, then we can't guarantee that a request 
>>>>>>>> submitted concurrently will be executed anyway, so clearing the queue 
>>>>>>>> of submitted requests (submitted but not prepped for execution) sounds 
>>>>>>>> like correct behavior to me.
>>>>>>>> 
>>>>>>>> -Flavio  
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 04 Oct 2015, at 23:05, Chris Nauroth <cnaur...@hortonworks.com> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Hi Jordan,
>>>>>>>>> 
>>>>>>>>> That's an interesting find.  I think you have a good theory.  Have you
>>>>>>>>> already tried patching this to see if the bug reported against Curator
>>>>>>>>> goes away?  (BTW, is there a corresponding Curator JIRA?)
>>>>>>>>> 
>>>>>>>>> That logic dates all the way back to the initial import of the codebase.
>>>>>>>>> I can't find a definitive explanation, but my best guess is that dropping
>>>>>>>>> pending requests (instead of gracefully quiescing) can give a faster
>>>>>>>>> shutdown in the event of a heavily overloaded server.  However, the
>>>>>>>>> correctness of this choice looks questionable, especially in stand-alone
>>>>>>>>> mode where you don't have a cluster of other machines to compensate.
>>>>>>>>> 
>>>>>>>>> Something else interesting is that this doesn't even really guarantee that
>>>>>>>>> the request of death is the only thing remaining to be processed.  There
>>>>>>>>> is no synchronization over the queue covering both the clear and the
>>>>>>>>> enqueue of the request of death, so I think there is a window in which
>>>>>>>>> other requests could trickle in ahead of the request of death.
>>>>>>>>> 
>>>>>>>>> --Chris Nauroth
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 10/1/15, 8:21 PM, "Jordan Zimmerman" <jord...@bluejeansnet.com> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Why does PrepRequestProcessor.shutdown() call submittedRequests.clear();
>>>>>>>>>> before adding the death request? What if there are pending requests? I'm
>>>>>>>>>> trying to track down a bug reported in Curator. It only happens in
>>>>>>>>>> Standalone ZK instances. From what I can tell, shutting down a standalone
>>>>>>>>>> instance might result in lost transactions. Am I looking down the wrong
>>>>>>>>>> path or is this a possibility?
>>>>>>>>>> 
>>>>>>>>>> -Jordan
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
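
For reference, the two shutdown behaviors debated in this thread can be
sketched in a few lines of Java. This is a simplified, single-threaded model
and not ZooKeeper's actual code: the class name ShutdownSketch, the String
requests, the REQUEST_OF_DEATH marker, and all three method names are
hypothetical stand-ins chosen for illustration. Only the idea (clear the
submitted queue, then enqueue a death request) comes from the discussion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

class ShutdownSketch {
    // Hypothetical marker standing in for the "request of death" (RoD).
    static final String REQUEST_OF_DEATH = "REQUEST_OF_DEATH";

    // Models the behavior questioned in the thread: discard everything that
    // is queued, then enqueue the death marker. drainTo() is used in place
    // of clear() only so the discarded requests can be returned for inspection.
    // As with clear() followed by add(), the two steps are not atomic, so a
    // concurrent producer could slip a request in between them.
    static List<String> shutdownByClearing(LinkedBlockingQueue<String> submitted) {
        List<String> dropped = new ArrayList<>();
        submitted.drainTo(dropped);
        submitted.add(REQUEST_OF_DEATH);
        return dropped;
    }

    // Models the graceful alternative: keep queued requests and append the
    // death marker behind them, so they are processed before the stop.
    static void shutdownGracefully(LinkedBlockingQueue<String> submitted) {
        submitted.add(REQUEST_OF_DEATH);
    }

    // Models the consumer loop: process requests in order until the death
    // marker (or an empty queue) is reached. Returns what was processed.
    static List<String> drain(LinkedBlockingQueue<String> submitted) {
        List<String> processed = new ArrayList<>();
        String request;
        while ((request = submitted.poll()) != null) {
            if (REQUEST_OF_DEATH.equals(request)) {
                break;
            }
            processed.add(request);
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> pending = Arrays.asList("create /a", "setData /a");

        LinkedBlockingQueue<String> q1 = new LinkedBlockingQueue<>(pending);
        List<String> dropped = shutdownByClearing(q1);
        System.out.println("clearing: dropped=" + dropped + " processed=" + drain(q1));

        LinkedBlockingQueue<String> q2 = new LinkedBlockingQueue<>(pending);
        shutdownGracefully(q2);
        System.out.println("graceful: processed=" + drain(q2));
    }
}
```

In this model the clearing variant processes none of the pending requests,
while the graceful variant processes both before stopping. Whether dropping
unprepared requests counts as a bug or as acceptable shutdown behavior is
exactly the point under discussion above.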
