I have created ZOOKEEPER-2288 and attached my partial fix patch.  Jordan,
I can see you already spotted it and linked it with CURATOR-268.  Thanks
again for reporting this.

--Chris Nauroth




On 10/6/15, 11:01 AM, "Flavio Junqueira" <f...@apache.org> wrote:

>You're right that shutdown calls flush, but when the RoD request hits,
>it breaks the while loop and shutdown isn't called. I suppose it is
>relying on PrepRequestProcessor to call shutdown.
>
>-Flavio
>
>> On 06 Oct 2015, at 18:24, Chris Nauroth <cnaur...@hortonworks.com>
>>wrote:
>> 
>> Flavio, thank you for confirmation.
>> 
>> Regarding the SyncRequestProcessor, I thought the case you described
>>would
>> be covered by the final flush call from within
>> SyncRequestProcessor#shutdown.  Let me know if I'm missing an important
>> point though.
>> 
>> --Chris Nauroth
>> 
>> 
>> 
>> 
>> On 10/6/15, 9:14 AM, "Flavio Junqueira" <f...@apache.org> wrote:
>> 
>>> That's a good analysis, Chris. It might be worth doing.
>>> 
>>> I have a new conjecture for Jordan's issue. I was having a look at
>>> SyncRequestProcessor and the following seems to happen. This request
>>> processor tries to accumulate requests to perform a group commit to the
>>> transaction log. If the load is high, it tries to group as many as
>>>1000.
>>> The trouble is that if it receives a RoD at a high load, then it is
>>> currently breaking the main thread loop and not flushing the requests
>>> accumulated in the toFlush list. Note that flushing includes executing
>>> them in FinalRequestProcessor.
>>> 
>>> This is a change we can try to make and see if it sorts out the
>>>problem.
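>>>
>>> For illustration, here is a minimal, self-contained model of the kind of
>>> batching loop described above, with the flush happening before the break
>>> as proposed. The names are generic stand-ins, not the actual
>>> SyncRequestProcessor source:
>>>
>>>    import java.util.ArrayList;
>>>    import java.util.List;
>>>    import java.util.concurrent.BlockingQueue;
>>>    import java.util.concurrent.LinkedBlockingQueue;
>>>
>>>    // Generic model of a group-committing request processor (not ZooKeeper source).
>>>    class BatchingProcessor extends Thread {
>>>        static final Object REQUEST_OF_DEATH = new Object();
>>>        final BlockingQueue<Object> queued = new LinkedBlockingQueue<>();
>>>        final List<Object> toFlush = new ArrayList<>();
>>>
>>>        @Override
>>>        public void run() {
>>>            try {
>>>                while (true) {
>>>                    Object req = queued.take();
>>>                    if (req == REQUEST_OF_DEATH) {
>>>                        flush();   // proposed change: drain the pending batch before leaving the loop
>>>                        break;     // the concern above is that the loop currently exits with toFlush still populated
>>>                    }
>>>                    toFlush.add(req);
>>>                    if (toFlush.size() >= 1000) {
>>>                        flush();   // group commit once the batch is large enough
>>>                    }
>>>                }
>>>            } catch (InterruptedException e) {
>>>                Thread.currentThread().interrupt();
>>>            }
>>>        }
>>>
>>>        void flush() {
>>>            // stand-in for appending the batch to the transaction log and
>>>            // handing each request to the next processor in the pipeline
>>>            toFlush.clear();
>>>        }
>>>
>>>        void shutdownProcessor() throws InterruptedException {
>>>            queued.add(REQUEST_OF_DEATH);
>>>            join();   // a final flush in a shutdown() helper only helps if this path actually runs
>>>        }
>>>    }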
>>> 
>>> -Flavio
>>> 
>>> 
>>>> On 05 Oct 2015, at 23:47, Chris Nauroth <cnaur...@hortonworks.com>
>>>> wrote:
>>>> 
>>>> I was able to do more debugging today, using Jordan's ZooKeeper-only
>>>> test
>>>> (no Curator in the mix) mentioned in CURATOR-268.
>>>> 
>>>> https://gist.github.com/Randgalt/c3358b740f7f09d47bb0
>>>> 
>>>> 
>>>> My observation is that the transactions are getting processed and
>>>> persisted to the log, but the client's connection to the server is
>>>> getting
>>>> closed before a successful reply is sent.  This causes the client to
>>>> reconnect and retry the same request.  For a retried delete, this
>>>> causes a
>>>> NoNode error.  For a retried create, this causes a NodeExists error.
>>>> The
>>>> effect is also visible by logging the last zxid seen by the client,
>>>> which
>>>> will show gaps.  i.e. 1, 2, 3, 5, 6, ...
>>>> 
>>>> I think a contributing factor is the shutdown sequence in
>>>> ZooKeeperServerMain.  I see NIOServerCnxnFactory.shutdown() getting
>>>> called
>>>> and closing all connections before shutting down the request
>>>>processing
>>>> pipeline.  If requests are in flight in the pipeline after sockets
>>>>have
>>>> been closed, but before the pipeline has been stopped, then the server
>>>> could process the transaction and persist it, but not tell the client
>>>> about it.  I was able to reduce the frequency of test failures by
>>>> reordering the shutdown sequence so that the request processing
>>>>pipeline
>>>> is shutdown before the sockets, but it appears this is only a partial
>>>> fix.
>>>> Before this change, I could only run Jordan's test for a few seconds
>>>> before it aborted.  After this change, I can keep it running for
>>>> multiple
>>>> minutes, but it still aborts eventually.
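>>>>
>>>> For concreteness, here is a rough sketch of the reordering described
>>>> above; the helper class, method signature, and wiring are assumptions
>>>> for illustration only, not the actual ZooKeeperServerMain code:
>>>>
>>>>    import org.apache.zookeeper.server.ServerCnxnFactory;
>>>>    import org.apache.zookeeper.server.ZooKeeperServer;
>>>>
>>>>    // Hypothetical helper showing the shutdown ordering only.
>>>>    final class GracefulShutdown {
>>>>        static void shutdown(ZooKeeperServer zkServer, ServerCnxnFactory cnxnFactory) {
>>>>            // 1. Quiesce the request processing pipeline first, so requests
>>>>            //    already in flight can complete and their replies can still
>>>>            //    be written back on the open connections.
>>>>            zkServer.shutdown();
>>>>            // 2. Only then close the client sockets / selector threads.
>>>>            cnxnFactory.shutdown();
>>>>        }
>>>>    }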
>>>> 
>>>> Note that even if we improve the handling for this on a graceful
>>>> shutdown,
>>>> this scenario would still be possible in the event of hard failure
>>>>(i.e.
>>>> power loss).  I know some applications wrap the ZooKeeper API calls
>>>>with
>>>> additional semantics, so that a NoNode/NodeExists response is treated
>>>> as a
>>>> successful idempotent response.  For example, here is something in
>>>> HBase:
>>>> 
>>>> 
>>>> 
>>>>http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.html
>>>> 
>>>> 
>>>> I don't know the Curator codebase very well, so I don't know if it has
>>>> similar options.
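>>>>
>>>> For anyone wanting to apply the same idea at the application layer, here
>>>> is a hedged sketch of that kind of wrapping; the class and method names
>>>> are illustrative, not an existing API:
>>>>
>>>>    import org.apache.zookeeper.CreateMode;
>>>>    import org.apache.zookeeper.KeeperException;
>>>>    import org.apache.zookeeper.ZooDefs;
>>>>    import org.apache.zookeeper.ZooKeeper;
>>>>
>>>>    class IdempotentOps {
>>>>        private final ZooKeeper zk;
>>>>
>>>>        IdempotentOps(ZooKeeper zk) {
>>>>            this.zk = zk;
>>>>        }
>>>>
>>>>        // A delete retried after connection loss may find the node already
>>>>        // gone because the first attempt succeeded on the server.
>>>>        void deleteIdempotent(String path) throws KeeperException, InterruptedException {
>>>>            try {
>>>>                zk.delete(path, -1);
>>>>            } catch (KeeperException.NoNodeException e) {
>>>>                // treat as success: the node is gone, which is what we wanted
>>>>            }
>>>>        }
>>>>
>>>>        // Likewise, a retried create may see NodeExists if the first attempt
>>>>        // was persisted but the reply was lost.
>>>>        void createIdempotent(String path, byte[] data) throws KeeperException, InterruptedException {
>>>>            try {
>>>>                zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
>>>>            } catch (KeeperException.NodeExistsException e) {
>>>>                // treat as success, assuming this client is the only writer of the path
>>>>            }
>>>>        }
>>>>    }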
>>>> 
>>>> --Chris Nauroth
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 10/5/15, 10:41 AM, "Flavio Junqueira" <f...@apache.org> wrote:
>>>> 
>>>>> You said you can reproduce it with raw ZooKeeper and I thought you were
>>>>> saying you can't reproduce with raw ZooKeeper, but it sounds like I got
>>>>>it
>>>>> wrong.
>>>>> 
>>>>> -Flavio
>>>>> 
>>>>>> On 05 Oct 2015, at 18:36, Jordan Zimmerman
>>>>>> <jor...@jordanzimmerman.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> can reproduce it with raw ZooKeeper -> can't reproduce it with raw
>>>>>>> ZooKeeper, yes?
>>>>>> 
>>>>>> I don’t know what you mean. The test that the user posted used
>>>>>> Curator.
>>>>>> I changed all the Curator usages to raw ZooKeeper usages and the
>>>>>> problem
>>>>>> still shows.
>>>>>> 
>>>>>> -JZ
>>>>>> 
>>>>>>> On Oct 5, 2015, at 12:17 PM, Flavio Junqueira <f...@apache.org>
>>>>>>>wrote:
>>>>>>> 
>>>>>>> can reproduce it with raw ZooKeeper -> can't reproduce it with raw
>>>>>>> ZooKeeper, yes?
>>>>>>> 
>>>>>>> I'll have a look at the jira to see if I have any insight.
>>>>>>> 
>>>>>>> -Flavio
>>>>>>> 
>>>>>>>> On 05 Oct 2015, at 18:15, Jordan Zimmerman
>>>>>>>> <jor...@jordanzimmerman.com> wrote:
>>>>>>>> 
>>>>>>>> What we’re seeing is transaction rollbacks. The bug was reported
>>>>>>>> against Curator but I can reproduce it with raw ZooKeeper:
>>>>>>>> https://issues.apache.org/jira/browse/CURATOR-268
>>>>>>>> <https://issues.apache.org/jira/browse/CURATOR-268>
>>>>>>>> 
>>>>>>>> -JZ
>>>>>>>> 
>>>>>>>>> On Oct 5, 2015, at 12:00 PM, Flavio Junqueira <f...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> It is safe because the requests in the submittedRequests queue
>>>>>>>>> haven't been prepared yet. The simplest pipeline is the one of
>>>>>>>>>the
>>>>>>>>> standalone server: preprequestprocessor -> syncrequestprocessor
>>>>>>>>>->
>>>>>>>>> finalrequestprocessor. If the request hasn't gone through prepRP,
>>>>>>>>> then nothing has changed in the state of zookeeper. The ones that
>>>>>>>>> have gone through prepRP will complete regularly. For quorum, the
>>>>>>>>> pipeline is a bit more complex, but the reasoning is very
>>>>>>>>>similar.
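>>>>>>>>>
>>>>>>>>> For reference, the standalone chain is wired roughly like this,
>>>>>>>>> where zks is the ZooKeeperServer instance (paraphrased from memory,
>>>>>>>>> not verbatim source):
>>>>>>>>>
>>>>>>>>>    RequestProcessor finalProcessor = new FinalRequestProcessor(zks);
>>>>>>>>>    RequestProcessor syncProcessor = new SyncRequestProcessor(zks, finalProcessor);
>>>>>>>>>    RequestProcessor firstProcessor = new PrepRequestProcessor(zks, syncProcessor);
>>>>>>>>>    // requests flow: PrepRequestProcessor -> SyncRequestProcessor -> FinalRequestProcessor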
>>>>>>>>> 
>>>>>>>>> -Flavio
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On 05 Oct 2015, at 17:55, Jordan Zimmerman
>>>>>>>>>> <jor...@jordanzimmerman.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> That would mean that there’s no safe way to shut down the
>>>>>>>>>>server,
>>>>>>>>>> right? Ideally, you’d want the server to shut down gracefully:
>>>>>>>>>>a)
>>>>>>>>>> stop receiving requests; b) complete current requests; c) shut
>>>>>>>>>> down.
>>>>>>>>>> That’s how most servers work. Of course, you might want a
>>>>>>>>>> quick-die
>>>>>>>>>> shutdown but that’s not usual behavior.
>>>>>>>>>> 
>>>>>>>>>> -JZ
>>>>>>>>>> 
>>>>>>>>>>> On Oct 5, 2015, at 11:30 AM, Flavio Junqueira <f...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> You suggested that it is a bug, and I'm arguing that it isn't a
>>>>>>>>>>> bug. You may want to optimize and still process the requests in
>>>>>>>>>>> the
>>>>>>>>>>> queue before injecting RoD, but discarding them doesn't sound
>>>>>>>>>>> like
>>>>>>>>>>> a bug because you can't guarantee that requests submitted
>>>>>>>>>>> concurrently with the server shutting down will be executed.
>>>>>>>>>>> Optimizing isn't the same as spotting a bug. Also, if you are
>>>>>>>>>>> trying to shut down, you probably want to do it asap, rather
>>>>>>>>>>>than
>>>>>>>>>>> wait for a whole batch of operations to complete.
>>>>>>>>>>> 
>>>>>>>>>>> -Flavio
>>>>>>>>>>> 
>>>>>>>>>>>> On 05 Oct 2015, at 14:57, Jordan Zimmerman
>>>>>>>>>>>> <jor...@jordanzimmerman.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Flavio, that isn’t logical. Just because you can’t make that
>>>>>>>>>>>> guarantee doesn’t imply that you should flush already queued
>>>>>>>>>>>> transactions.
>>>>>>>>>>>> 
>>>>>>>>>>>> -JZ
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Oct 5, 2015, at 3:24 AM, Flavio Junqueira <f...@apache.org>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Injecting the RoD means that we are shutting down the server
>>>>>>>>>>>>> pipeline. If the server is shutting down, then we can't
>>>>>>>>>>>>> guarantee
>>>>>>>>>>>>> that a request submitted concurrently will be executed
>>>>>>>>>>>>>anyway,
>>>>>>>>>>>>> so
>>>>>>>>>>>>> clearing the queue of submitted requests (submitted but not
>>>>>>>>>>>>> prepped
>>>>>>>>>>>>> for execution) sounds like correct behavior to me.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -Flavio
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 04 Oct 2015, at 23:05, Chris Nauroth
>>>>>>>>>>>>>> <cnaur...@hortonworks.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi Jordan,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> That's an interesting find.  I think you have a good theory.
>>>>>>>>>>>>>> Have you
>>>>>>>>>>>>>> already tried patching this to see if the bug reported
>>>>>>>>>>>>>>against
>>>>>>>>>>>>>> Curator
>>>>>>>>>>>>>> goes away?  (BTW, is there a corresponding Curator JIRA?)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> That logic dates all the way back to the initial import of
>>>>>>>>>>>>>>the
>>>>>>>>>>>>>> codebase.
>>>>>>>>>>>>>> I can't find a definitive explanation, but my best guess is
>>>>>>>>>>>>>> that dropping
>>>>>>>>>>>>>> pending requests (instead of gracefully quiescing) can give
>>>>>>>>>>>>>>a
>>>>>>>>>>>>>> faster
>>>>>>>>>>>>>> shutdown in the event of a heavily overloaded server.
>>>>>>>>>>>>>> However,
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> correctness of this choice looks questionable, especially in
>>>>>>>>>>>>>> stand-alone
>>>>>>>>>>>>>> mode where you don't have a cluster of other machines to
>>>>>>>>>>>>>> compensate.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Something else interesting is that this doesn't even really
>>>>>>>>>>>>>> guarantee that
>>>>>>>>>>>>>> the request of death is the only thing remaining to be
>>>>>>>>>>>>>> processed.  There
>>>>>>>>>>>>>> is no synchronization over the queue covering both the clear
>>>>>>>>>>>>>> and the
>>>>>>>>>>>>>> enqueue of the request of death, so I think there is a
>>>>>>>>>>>>>>window
>>>>>>>>>>>>>> in which
>>>>>>>>>>>>>> other requests could trickle in ahead of the request of
>>>>>>>>>>>>>>death.
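>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Roughly, the window looks like this (illustrative sketch of the
>>>>>>>>>>>>>> two steps as described, not the verbatim code):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    submittedRequests.clear();                      // (1) queue emptied
>>>>>>>>>>>>>>    // <-- producers can still enqueue new requests right here -->
>>>>>>>>>>>>>>    submittedRequests.add(Request.requestOfDeath);  // (2) sentinel lands after them
>>>>>>>>>>>>>>    // one conceivable mitigation: guard both steps and the producer-side
>>>>>>>>>>>>>>    // enqueue with the same lock, or have producers check a stop flag first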
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> --Chris Nauroth
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 10/1/15, 8:21 PM, "Jordan Zimmerman"
>>>>>>>>>>>>>> <jord...@bluejeansnet.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Why does PrepRequestProcessor.shutdown() call
>>>>>>>>>>>>>>> submittedRequests.clear();
>>>>>>>>>>>>>>> before adding the death request? What if there are pending
>>>>>>>>>>>>>>> requests? I’m
>>>>>>>>>>>>>>> trying to track down a bug reported in Curator. It only
>>>>>>>>>>>>>>> happens in
>>>>>>>>>>>>>>> Standalone ZK instances. From what I can tell, shutting
>>>>>>>>>>>>>>>down
>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>> standalone
>>>>>>>>>>>>>>> instance might result in lost transactions. Am I looking
>>>>>>>>>>>>>>>down
>>>>>>>>>>>>>>> the wrong
>>>>>>>>>>>>>>> path or is this a possibility?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 
>
>
