I was able to do more debugging today, using Jordan's ZooKeeper-only test
(no Curator in the mix) mentioned in CURATOR-268.

https://gist.github.com/Randgalt/c3358b740f7f09d47bb0


My observation is that the transactions are getting processed and
persisted to the log, but the client's connection to the server is getting
closed before a successful reply is sent.  This causes the client to
reconnect and retry the same request.  For a retried delete, this causes a
NoNode error.  For a retried create, this causes a NodeExists error.  The
effect is also visible by logging the last zxid seen by the client, which
will show gaps, e.g. 1, 2, 3, 5, 6, ...
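
To make the effect concrete, here is a toy model of the retry behavior
(plain Java, not ZooKeeper code; the data store and result strings are
invented for illustration):

```java
import java.util.HashSet;
import java.util.Set;

// Toy illustration of the retry effect described above: the first delete
// is applied server-side, but if the reply is lost, the reconnected
// client resends the same request and gets NoNode back.
public class RetryDemo {
    static Set<String> store = new HashSet<>();

    // Applies a delete against the toy data store.
    static String delete(String path) {
        return store.remove(path) ? "OK" : "NONODE";
    }

    public static void main(String[] args) {
        store.add("/node");
        String first = delete("/node"); // committed, but imagine the reply never arrives
        String retry = delete("/node"); // the reconnected client retries the same delete
        System.out.println(first + " " + retry); // prints "OK NONODE"
    }
}
```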

I think a contributing factor is the shutdown sequence in
ZooKeeperServerMain.  I see NIOServerCnxnFactory.shutdown() getting called
and closing all connections before shutting down the request processing
pipeline.  If requests are in flight in the pipeline after sockets have
been closed, but before the pipeline has been stopped, then the server
could process the transaction and persist it, but not tell the client
about it.  I was able to reduce the frequency of test failures by
reordering the shutdown sequence so that the request processing pipeline
is shut down before the sockets, but it appears this is only a partial fix.
Before this change, I could only run Jordan's test for a few seconds
before it aborted.  After this change, I can keep it running for multiple
minutes, but it still aborts eventually.
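
Here is a self-contained sketch of why the ordering matters (a toy queue
model, not the actual ZooKeeperServerMain code; the request counts are
made up to show the committed-vs-replied gap):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the shutdown-ordering issue: requests already queued in
// the pipeline still get committed, but a reply only reaches the client
// while its connection is open. Closing sockets before draining the
// pipeline yields committed-but-unacknowledged transactions.
public class ShutdownOrderDemo {

    // Drains a queue of 100 pending requests; returns {committed, replied}.
    static int[] shutdown(boolean closeConnectionsFirst) {
        Deque<Integer> pipeline = new ArrayDeque<>();
        for (int i = 0; i < 100; i++) pipeline.add(i);

        boolean connectionsOpen = !closeConnectionsFirst;

        int committed = 0, replied = 0;
        while (pipeline.poll() != null) {
            committed++;                    // transaction persisted to the log
            if (connectionsOpen) replied++; // client actually sees the result
        }
        return new int[] { committed, replied };
    }

    public static void main(String[] args) {
        int[] socketsFirst  = shutdown(true);  // current ordering
        int[] pipelineFirst = shutdown(false); // reordered shutdown
        System.out.println("sockets first:  committed=" + socketsFirst[0]
                + " replied=" + socketsFirst[1]);
        System.out.println("pipeline first: committed=" + pipelineFirst[0]
                + " replied=" + pipelineFirst[1]);
    }
}
```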

Note that even if we improve the handling for this on a graceful shutdown,
this scenario would still be possible in the event of a hard failure (e.g.
power loss).  I know some applications wrap the ZooKeeper API calls with
additional semantics, so that a NoNode/NodeExists response is treated as a
successful idempotent response.  For example, here is something in HBase:

http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.html


I don't know the Curator codebase very well, so I'm not sure whether it has
similar options.
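
For illustration, here is a rough sketch of the kind of wrapper I mean, in
the spirit of the HBase class above (toy client with invented names, not
the real ZooKeeper or HBase API): after a connection loss, NoNode on a
retried delete is accepted as an idempotent success.

```java
import java.util.HashSet;
import java.util.Set;

// Toy sketch of a client-side retry wrapper that treats NoNode on a
// retried delete as idempotent success. ConnectionLoss, the store, and
// the method names are all illustrative.
public class IdempotentDelete {
    static class ConnectionLoss extends RuntimeException {}

    static Set<String> store = new HashSet<>();
    static boolean dropNextReply = false;

    // Toy "server": applies the delete, then may lose the reply.
    static boolean rawDelete(String path) {
        boolean existed = store.remove(path);
        if (dropNextReply) { dropNextReply = false; throw new ConnectionLoss(); }
        return existed;
    }

    // Retry wrapper: after a connection loss, a NoNode answer to the
    // retried delete probably means our earlier attempt succeeded.
    static void recoverableDelete(String path) {
        boolean retried = false;
        while (true) {
            try {
                boolean existed = rawDelete(path);
                if (existed || retried) return; // success, or idempotent NoNode after retry
                throw new IllegalStateException("NoNode on first attempt: " + path);
            } catch (ConnectionLoss e) {
                retried = true; // reconnect and resend the same request
            }
        }
    }

    public static void main(String[] args) {
        store.add("/node");
        dropNextReply = true;       // the first delete applies but its reply is lost
        recoverableDelete("/node"); // does not throw: NoNode on retry is accepted
        System.out.println(store.contains("/node")); // prints "false"
    }
}
```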

--Chris Nauroth




On 10/5/15, 10:41 AM, "Flavio Junqueira" <f...@apache.org> wrote:

>You said you can reproduce it with raw ZooKeeper and I thought you're
>saying you can't reproduce with raw ZooKeeper, but sounds like I got it
>wrong.
>
>-Flavio
>
>> On 05 Oct 2015, at 18:36, Jordan Zimmerman <jor...@jordanzimmerman.com>
>>wrote:
>> 
>>> can reproduce it with raw ZooKeeper -> can't reproduce it with raw
>>>ZooKeeper , yes?
>> 
>> I don’t know what you mean. The test that the user posted used Curator.
>>I changed all the Curator usages to raw ZooKeeper usages and the problem
>>still shows.
>> 
>> -JZ
>> 
>>> On Oct 5, 2015, at 12:17 PM, Flavio Junqueira <f...@apache.org> wrote:
>>> 
>>> can reproduce it with raw ZooKeeper -> can't reproduce it with raw
>>>ZooKeeper , yes?
>>> 
>>> I'll have a look at the jira to see if I have any insight.
>>> 
>>> -Flavio
>>> 
>>>> On 05 Oct 2015, at 18:15, Jordan Zimmerman
>>>><jor...@jordanzimmerman.com> wrote:
>>>> 
>>>> What we’re seeing is transaction rollbacks. The bug was reported
>>>>against Curator but I can reproduce it with raw ZooKeeper:
>>>>https://issues.apache.org/jira/browse/CURATOR-268
>>>><https://issues.apache.org/jira/browse/CURATOR-268>
>>>> 
>>>> -JZ
>>>> 
>>>>> On Oct 5, 2015, at 12:00 PM, Flavio Junqueira <f...@apache.org> wrote:
>>>>> 
>>>>> It is safe because the requests in the submittedRequests queue
>>>>>haven't been prepared yet. The simplest pipeline is the one of the
>>>>>standalone server: preprequestprocessor -> syncrequestprocessor ->
>>>>>finalrequestprocessor. If the request hasn't gone through prepRP,
>>>>>then nothing has changed in the state of zookeeper. The ones that
>>>>>have gone through prepRP will complete regularly. For quorum, the
>>>>>pipeline is a bit more complex, but the reasoning is very similar.
>>>>> 
>>>>> -Flavio
>>>>> 
>>>>> 
>>>>>> On 05 Oct 2015, at 17:55, Jordan Zimmerman
>>>>>><jor...@jordanzimmerman.com> wrote:
>>>>>> 
>>>>>> That would mean that there’s no safe way to shut down the server,
>>>>>>right? Ideally, you’d want the server to shut down gracefully: a)
>>>>>>stop receiving requests; b) complete current requests; c) shut down.
>>>>>>That’s how most servers work. Of course, you might want a quick-die
>>>>>>shutdown but that’s not usual behavior.
>>>>>> 
>>>>>> -JZ
>>>>>> 
>>>>>>> On Oct 5, 2015, at 11:30 AM, Flavio Junqueira <f...@apache.org>
>>>>>>>wrote:
>>>>>>> 
>>>>>>> You suggested that it is a bug, and I'm arguing that it isn't a
>>>>>>>bug. You may want to optimize and still process the requests in the
>>>>>>>queue before injecting RoD, but discarding them doesn't sound like
>>>>>>>a bug because you can't guarantee that requests submitted
>>>>>>>concurrently with the server shutting down will be executed.
>>>>>>>Optimizing isn't the same as spotting a bug. Also, if you are
>>>>>>>trying to shut down, you probably want to do it asap, rather than
>>>>>>>wait for a whole batch of operations to complete.
>>>>>>> 
>>>>>>> -Flavio
>>>>>>> 
>>>>>>>> On 05 Oct 2015, at 14:57, Jordan Zimmerman
>>>>>>>><jor...@jordanzimmerman.com> wrote:
>>>>>>>> 
>>>>>>>> Flavio, that isn’t logical. Just because you can’t make that
>>>>>>>>guarantee doesn’t imply that you should flush already queued
>>>>>>>>transactions.
>>>>>>>> 
>>>>>>>> -JZ
>>>>>>>> 
>>>>>>>>> On Oct 5, 2015, at 3:24 AM, Flavio Junqueira <f...@apache.org>
>>>>>>>>>wrote:
>>>>>>>>> 
>>>>>>>>> Injecting the RoD means that we are shutting down the server
>>>>>>>>>pipeline. If the server is shutting down, then we can't guarantee
>>>>>>>>>that a request submitted concurrently will be executed anyway, so
>>>>>>>>>clearing the queue of submitted requests (submitted but not prepped
>>>>>>>>>for execution) sounds like correct behavior to me.
>>>>>>>>> 
>>>>>>>>> -Flavio  
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On 04 Oct 2015, at 23:05, Chris Nauroth
>>>>>>>>>><cnaur...@hortonworks.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi Jordan,
>>>>>>>>>> 
>>>>>>>>>> That's an interesting find.  I think you have a good theory.
>>>>>>>>>>Have you
>>>>>>>>>> already tried patching this to see if the bug reported against
>>>>>>>>>>Curator
>>>>>>>>>> goes away?  (BTW, is there a corresponding Curator JIRA?)
>>>>>>>>>> 
>>>>>>>>>> That logic dates all the way back to the initial import of the
>>>>>>>>>>codebase.
>>>>>>>>>> I can't find a definitive explanation, but my best guess is
>>>>>>>>>>that dropping
>>>>>>>>>> pending requests (instead of gracefully quiescing) can give a
>>>>>>>>>>faster
>>>>>>>>>> shutdown in the event of a heavily overloaded server.  However,
>>>>>>>>>>the
>>>>>>>>>> correctness of this choice looks questionable, especially in
>>>>>>>>>>stand-alone
>>>>>>>>>> mode where you don't have a cluster of other machines to
>>>>>>>>>>compensate.
>>>>>>>>>> 
>>>>>>>>>> Something else interesting is that this doesn't even really
>>>>>>>>>>guarantee that
>>>>>>>>>> the request of death is the only thing remaining to be
>>>>>>>>>>processed.  There
>>>>>>>>>> is no synchronization over the queue covering both the clear
>>>>>>>>>>and the
>>>>>>>>>> enqueue of the request of death, so I think there is a window
>>>>>>>>>>in which
>>>>>>>>>> other requests could trickle in ahead of the request of death.
>>>>>>>>>> 
>>>>>>>>>> --Chris Nauroth
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On 10/1/15, 8:21 PM, "Jordan Zimmerman"
>>>>>>>>>><jord...@bluejeansnet.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Why does PrepRequestProcessor.shutdown() call
>>>>>>>>>>>submittedRequests.clear();
>>>>>>>>>>> before adding the death request? What if there are pending
>>>>>>>>>>>requests? I'm
>>>>>>>>>>> trying to track down a bug reported in Curator. It only
>>>>>>>>>>>happens in
>>>>>>>>>>> Standalone ZK instances. From what I can tell, shutting down a
>>>>>>>>>>>standalone
>>>>>>>>>>> instance might result in lost transactions. Am I looking down
>>>>>>>>>>>the wrong
>>>>>>>>>>> path or is this a possibility?
>>>>>>>>>>> 
>>>>>>>>>>> -Jordan
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>
>
