Thank you!
> On Oct 10, 2015, at 5:36 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
>
> I have created ZOOKEEPER-2288 and attached my partial fix patch. Jordan,
> I can see you already spotted it and linked it with CURATOR-268. Thanks
> again for reporting this.
>
> --Chris Nauroth
>
>
>
>
> On 10/6/15, 11:01 AM, "Flavio Junqueira" <f...@apache.org> wrote:
>
>> You're right that shutdown calls flush, but when the RoD request hits,
>> it breaks the while loop and shutdown isn't called. I suppose it is
>> relying on PrepRequestProcessor to call shutdown.
>>
>> -Flavio
>>
>>> On 06 Oct 2015, at 18:24, Chris Nauroth <cnaur...@hortonworks.com>
>>> wrote:
>>>
>>> Flavio, thank you for confirmation.
>>>
>>> Regarding the SyncRequestProcessor, I thought the case you described
>>> would
>>> be covered by the final flush call from within
>>> SyncRequestProcessor#shutdown. Let me know if I'm missing an important
>>> point though.
>>>
>>> --Chris Nauroth
>>>
>>>
>>>
>>>
>>> On 10/6/15, 9:14 AM, "Flavio Junqueira" <f...@apache.org> wrote:
>>>
>>>> That's a good analysis, Chris. It might be worth doing.
>>>>
>>>> I have a new conjecture for Jordan's issue. I was having a look at
>>>> SyncRequestProcessor and the following seems to happen. This request
>>>> processor tries to accumulate requests to perform a group commit to the
>>>> transaction log. Under high load, it tries to group as many as 1000.
>>>> The trouble is that if it receives a RoD under high load, it currently
>>>> breaks out of the main thread loop without flushing the requests
>>>> accumulated in the toFlush list. Note that flushing includes executing
>>>> them in FinalRequestProcessor.
>>>>
>>>> This is a change we can try to make and see if it sorts out the
>>>> problem.
>>>>
>>>> -Flavio
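
[A rough sketch of the conjecture above, as a toy model rather than the actual ZooKeeper source: requests accumulate in toFlush for a group commit, and the fix is to flush before breaking out of the loop on the request of death. Class and field names mirror the email's description, not necessarily the real code.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

// Toy model of the group-commit loop in SyncRequestProcessor. flush()
// stands in for syncing the transaction log AND handing each request
// downstream (FinalRequestProcessor in the real server).
public class SyncLoopSketch {
    static final String REQUEST_OF_DEATH = "RoD";
    final LinkedBlockingQueue<String> queuedRequests = new LinkedBlockingQueue<>();
    final List<String> toFlush = new ArrayList<>();
    final List<String> completed = new ArrayList<>();

    void run() throws InterruptedException {
        while (true) {
            String request = queuedRequests.take();
            if (request.equals(REQUEST_OF_DEATH)) {
                // The conjectured bug: the loop breaks here WITHOUT this
                // flush, discarding everything batched in toFlush.
                flush();
                break;
            }
            toFlush.add(request);
            if (toFlush.size() >= 1000) {
                flush(); // group commit once the batch is full
            }
        }
    }

    void flush() {
        completed.addAll(toFlush); // persist + execute downstream
        toFlush.clear();
    }

    public static void main(String[] args) throws InterruptedException {
        SyncLoopSketch s = new SyncLoopSketch();
        s.queuedRequests.add("create /a");
        s.queuedRequests.add("delete /b");
        s.queuedRequests.add(REQUEST_OF_DEATH);
        s.run();
        System.out.println(s.completed); // both batched requests survive shutdown
    }
}
```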
>>>>
>>>>
>>>>> On 05 Oct 2015, at 23:47, Chris Nauroth <cnaur...@hortonworks.com>
>>>>> wrote:
>>>>>
>>>>> I was able to do more debugging today, using Jordan's ZooKeeper-only
>>>>> test
>>>>> (no Curator in the mix) mentioned in CURATOR-268.
>>>>>
>>>>> https://gist.github.com/Randgalt/c3358b740f7f09d47bb0
>>>>>
>>>>>
>>>>> My observation is that the transactions are getting processed and
>>>>> persisted to the log, but the client's connection to the server is
>>>>> getting
>>>>> closed before a successful reply is sent. This causes the client to
>>>>> reconnect and retry the same request. For a retried delete, this
>>>>> causes a
>>>>> NoNode error. For a retried create, this causes a NodeExists error.
>>>>> The
>>>>> effect is also visible by logging the last zxid seen by the client,
>>>>> which
>>>>> will show gaps, e.g. 1, 2, 3, 5, 6, ...
>>>>>
>>>>> I think a contributing factor is the shutdown sequence in
>>>>> ZooKeeperServerMain. I see NIOServerCnxnFactory.shutdown() getting
>>>>> called
>>>>> and closing all connections before shutting down the request
>>>>> processing
>>>>> pipeline. If requests are in flight in the pipeline after sockets
>>>>> have
>>>>> been closed, but before the pipeline has been stopped, then the server
>>>>> could process the transaction and persist it, but not tell the client
>>>>> about it. I was able to reduce the frequency of test failures by
>>>>> reordering the shutdown sequence so that the request processing
>>>>> pipeline
>>>>> is shutdown before the sockets, but it appears this is only a partial
>>>>> fix.
>>>>> Before this change, I could only run Jordan's test for a few seconds
>>>>> before it aborted. After this change, I can keep it running for
>>>>> multiple
>>>>> minutes, but it still aborts eventually.
>>>>>
>>>>> Note that even if we improve the handling for this on a graceful
>>>>> shutdown,
>>>>> this scenario would still be possible in the event of hard failure
>>>>> (e.g.
>>>>> power loss). I know some applications wrap the ZooKeeper API calls
>>>>> with
>>>>> additional semantics, so that a NoNode/NodeExists response is treated
>>>>> as a
>>>>> successful idempotent response. For example, here is something in
>>>>> HBase:
>>>>>
>>>>>
>>>>>
>>>>> http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.html
>>>>>
>>>>>
>>>>> I don't know the Curator codebase very well, so I don't know if it has
>>>>> similar options.
>>>>>
>>>>> --Chris Nauroth
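
[The wrapping idea Chris describes can be sketched generically. This is a toy model with illustrative names, not the RecoverableZooKeeper API: after a connection loss, a retried delete that fails with NoNode is treated as success, on the assumption that the first attempt was applied before the reply was lost.]

```java
import java.util.Iterator;
import java.util.List;

// Toy model of idempotent retry semantics for delete. The result codes
// stand in for ZooKeeper's KeeperException codes; the attempt outcomes
// are supplied as a list so the logic is easy to exercise.
public class RetryDeleteSketch {
    enum Code { OK, CONNECTION_LOSS, NO_NODE }

    // Returns OK if the delete succeeds, or if a retry after a connection
    // loss reports NO_NODE (the first attempt likely already deleted it).
    static Code deleteWithRetry(Iterator<Code> attempts) {
        boolean lostConnection = false;
        while (attempts.hasNext()) {
            Code result = attempts.next();
            if (result == Code.CONNECTION_LOSS) {
                lostConnection = true;   // retry the same request
                continue;
            }
            if (result == Code.NO_NODE && lostConnection) {
                return Code.OK;          // treat as idempotent success
            }
            return result;
        }
        return Code.CONNECTION_LOSS;     // retries exhausted
    }

    public static void main(String[] args) {
        // The failure mode from the test run: delete applied, reply lost,
        // retry sees NO_NODE -> reported as success.
        Code c = deleteWithRetry(
            List.of(Code.CONNECTION_LOSS, Code.NO_NODE).iterator());
        System.out.println(c);
    }
}
```

A create wrapper would mirror this, mapping a NodeExists after a connection loss to success.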
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 10/5/15, 10:41 AM, "Flavio Junqueira" <f...@apache.org> wrote:
>>>>>
>>>>>> You said you can reproduce it with raw ZooKeeper, but I thought you
>>>>>> were saying you can't reproduce it with raw ZooKeeper. It sounds like
>>>>>> I got it wrong.
>>>>>>
>>>>>> -Flavio
>>>>>>
>>>>>>> On 05 Oct 2015, at 18:36, Jordan Zimmerman
>>>>>>> <jor...@jordanzimmerman.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> can reproduce it with raw ZooKeeper -> can't reproduce it with raw
>>>>>>>> ZooKeeper, yes?
>>>>>>>
>>>>>>> I don’t know what you mean. The test that the user posted used
>>>>>>> Curator.
>>>>>>> I changed all the Curator usages to raw ZooKeeper usages and the
>>>>>>> problem
>>>>>>> still shows.
>>>>>>>
>>>>>>> -JZ
>>>>>>>
>>>>>>>> On Oct 5, 2015, at 12:17 PM, Flavio Junqueira <f...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> can reproduce it with raw ZooKeeper -> can't reproduce it with raw
>>>>>>>> ZooKeeper, yes?
>>>>>>>>
>>>>>>>> I'll have a look at the jira to see if I have any insight.
>>>>>>>>
>>>>>>>> -Flavio
>>>>>>>>
>>>>>>>>> On 05 Oct 2015, at 18:15, Jordan Zimmerman
>>>>>>>>> <jor...@jordanzimmerman.com> wrote:
>>>>>>>>>
>>>>>>>>> What we’re seeing is transaction rollbacks. The bug was reported
>>>>>>>>> against Curator but I can reproduce it with raw ZooKeeper:
>>>>>>>>> https://issues.apache.org/jira/browse/CURATOR-268
>>>>>>>>> <https://issues.apache.org/jira/browse/CURATOR-268>
>>>>>>>>>
>>>>>>>>> -JZ
>>>>>>>>>
>>>>>>>>>> On Oct 5, 2015, at 12:00 PM, Flavio Junqueira <f...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> It is safe because the requests in the submittedRequests queue
>>>>>>>>>> haven't been prepared yet. The simplest pipeline is the one of the
>>>>>>>>>> standalone server: PrepRequestProcessor -> SyncRequestProcessor ->
>>>>>>>>>> FinalRequestProcessor. If the request hasn't gone through PrepRP,
>>>>>>>>>> then nothing has changed in the state of ZooKeeper. The ones that
>>>>>>>>>> have gone through PrepRP will complete regularly. For quorum, the
>>>>>>>>>> pipeline is a bit more complex, but the reasoning is very similar.
>>>>>>>>>>
>>>>>>>>>> -Flavio
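
[The standalone pipeline described above can be modeled as a simple chain of processors. This is an illustrative sketch only; the stage names mirror the real classes, but the bodies just record and forward each request, to show why a request that never reaches the first stage cannot have changed any state.]

```java
import java.util.ArrayList;
import java.util.List;

// Toy chain-of-responsibility model of the standalone request pipeline.
public class PipelineSketch {
    interface RequestProcessor { void processRequest(String request); }

    // Build a stage that records its name and forwards to the next stage.
    static RequestProcessor stage(String name, List<String> trace, RequestProcessor next) {
        return request -> {
            trace.add(name + ":" + request);
            if (next != null) next.processRequest(request);
        };
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        RequestProcessor finalRP = stage("FinalRequestProcessor", trace, null);
        RequestProcessor syncRP  = stage("SyncRequestProcessor", trace, finalRP);
        RequestProcessor prepRP  = stage("PrepRequestProcessor", trace, syncRP);

        // A request only mutates state once it enters at PrepRequestProcessor;
        // one cleared from submittedRequests never even reaches the first stage.
        prepRP.processRequest("create /a");
        System.out.println(trace);
    }
}
```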
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On 05 Oct 2015, at 17:55, Jordan Zimmerman
>>>>>>>>>>> <jor...@jordanzimmerman.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> That would mean that there’s no safe way to shut down the
>>>>>>>>>>> server,
>>>>>>>>>>> right? Ideally, you’d want the server to shut down gracefully:
>>>>>>>>>>> a)
>>>>>>>>>>> stop receiving requests; b) complete current requests; c) shut
>>>>>>>>>>> down.
>>>>>>>>>>> That’s how most servers work. Of course, you might want a
>>>>>>>>>>> quick-die
>>>>>>>>>>> shutdown but that’s not usual behavior.
>>>>>>>>>>>
>>>>>>>>>>> -JZ
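
[The a/b/c sequence Jordan describes is the standard quiesce pattern. As a generic illustration, not ZooKeeper code, java.util.concurrent's ExecutorService implements exactly this ordering:]

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class GracefulShutdownSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<Integer> inFlight = pool.submit(() -> 42);

        pool.shutdown();                            // (a) stop receiving requests
        pool.awaitTermination(5, TimeUnit.SECONDS); // (b) complete current requests
        // (c) shut down: now it is safe to release sockets and other resources

        System.out.println(inFlight.get());         // the queued request still completed
    }
}
```

The quick-die variant would be shutdownNow(), which drains and discards the queue, roughly what clearing submittedRequests does today.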
>>>>>>>>>>>
>>>>>>>>>>>> On Oct 5, 2015, at 11:30 AM, Flavio Junqueira <f...@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> You suggested that it is a bug, and I'm arguing that it isn't a
>>>>>>>>>>>> bug. You may want to optimize and still process the requests in
>>>>>>>>>>>> the
>>>>>>>>>>>> queue before injecting RoD, but discarding them doesn't sound
>>>>>>>>>>>> like
>>>>>>>>>>>> a bug because you can't guarantee that requests submitted
>>>>>>>>>>>> concurrently with the server shutting down will be executed.
>>>>>>>>>>>> Optimizing isn't the same as spotting a bug. Also, if you are
>>>>>>>>>>>> trying to shut down, you probably want to do it asap, rather
>>>>>>>>>>>> than
>>>>>>>>>>>> wait for a whole batch of operations to complete.
>>>>>>>>>>>>
>>>>>>>>>>>> -Flavio
>>>>>>>>>>>>
>>>>>>>>>>>>> On 05 Oct 2015, at 14:57, Jordan Zimmerman
>>>>>>>>>>>>> <jor...@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Flavio, that isn’t logical. Just because you can’t make that
>>>>>>>>>>>>> guarantee doesn’t imply that you should flush already queued
>>>>>>>>>>>>> transactions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -JZ
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Oct 5, 2015, at 3:24 AM, Flavio Junqueira <f...@apache.org>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Injecting the RoD means that we are shutting down the server
>>>>>>>>>>>>>> pipeline. If the server is shutting down, then we can't
>>>>>>>>>>>>>> guarantee
>>>>>>>>>>>>>> that a request submitted concurrently will be executed
>>>>>>>>>>>>>> anyway,
>>>>>>>>>>>>>> so
>>>>>>>>>>>>>> clearing the queue of submitted requests (submitted but not
>>>>>>>>>>>>>> yet prepped for execution) sounds like correct behavior to me.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Flavio
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 04 Oct 2015, at 23:05, Chris Nauroth
>>>>>>>>>>>>>>> <cnaur...@hortonworks.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Jordan,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That's an interesting find. I think you have a good theory.
>>>>>>>>>>>>>>> Have you
>>>>>>>>>>>>>>> already tried patching this to see if the bug reported
>>>>>>>>>>>>>>> against
>>>>>>>>>>>>>>> Curator
>>>>>>>>>>>>>>> goes away? (BTW, is there a corresponding Curator JIRA?)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That logic dates all the way back to the initial import of
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> codebase.
>>>>>>>>>>>>>>> I can't find a definitive explanation, but my best guess is
>>>>>>>>>>>>>>> that dropping
>>>>>>>>>>>>>>> pending requests (instead of gracefully quiescing) can give
>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>> faster
>>>>>>>>>>>>>>> shutdown in the event of a heavily overloaded server.
>>>>>>>>>>>>>>> However,
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> correctness of this choice looks questionable, especially in
>>>>>>>>>>>>>>> stand-alone
>>>>>>>>>>>>>>> mode where you don't have a cluster of other machines to
>>>>>>>>>>>>>>> compensate.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Something else interesting is that this doesn't even really
>>>>>>>>>>>>>>> guarantee that
>>>>>>>>>>>>>>> the request of death is the only thing remaining to be
>>>>>>>>>>>>>>> processed. There
>>>>>>>>>>>>>>> is no synchronization over the queue covering both the clear
>>>>>>>>>>>>>>> and the
>>>>>>>>>>>>>>> enqueue of the request of death, so I think there is a
>>>>>>>>>>>>>>> window
>>>>>>>>>>>>>>> in which
>>>>>>>>>>>>>>> other requests could trickle in ahead of the request of
>>>>>>>>>>>>>>> death.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --Chris Nauroth
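
[The window Chris points out can be seen with plain queue operations. This is a toy model; the real queue holds Request objects, and "requestOfDeath" here is just a placeholder string. Because clear() and the enqueue of the request of death are two separate calls, a concurrent submit can land between them:]

```java
import java.util.concurrent.LinkedBlockingQueue;

public class ShutdownRaceSketch {
    public static void main(String[] args) {
        LinkedBlockingQueue<String> submittedRequests = new LinkedBlockingQueue<>();
        submittedRequests.add("r1");
        submittedRequests.add("r2");

        // PrepRequestProcessor.shutdown(), step 1: drop pending requests.
        submittedRequests.clear();

        // A request submitted concurrently can slip in right here,
        // between the clear and the enqueue of the request of death...
        submittedRequests.add("r3");

        // ...step 2: enqueue the request of death.
        submittedRequests.add("requestOfDeath");

        // r3 sits ahead of the request of death and will still be processed,
        // so the clear does not even guarantee an immediate shutdown.
        System.out.println(submittedRequests);
    }
}
```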
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 10/1/15, 8:21 PM, "Jordan Zimmerman"
>>>>>>>>>>>>>>> <jord...@bluejeansnet.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Why does PrepRequestProcessor.shutdown() call
>>>>>>>>>>>>>>>> submittedRequests.clear();
>>>>>>>>>>>>>>>> before adding the death request? What if there are pending
>>>>>>>>>>>>>>>> requests? I'm
>>>>>>>>>>>>>>>> trying to track down a bug reported in Curator. It only
>>>>>>>>>>>>>>>> happens in
>>>>>>>>>>>>>>>> Standalone ZK instances. From what I can tell, shutting
>>>>>>>>>>>>>>>> down
>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>> standalone
>>>>>>>>>>>>>>>> instance might result in lost transactions. Am I looking
>>>>>>>>>>>>>>>> down
>>>>>>>>>>>>>>>> the wrong
>>>>>>>>>>>>>>>> path or is this a possibility?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>