Re: Interesting elastic/ZK post

Chris Nauroth Mon, 09 May 2016 10:49:36 -0700

I filed ZOOKEEPER-2424 to track this.

--Chris Nauroth





On 5/9/16, 10:18 AM, "Patrick Hunt" <[email protected]> wrote:

>Makes sense to me to add it. Could someone create a ZK JIRA? Sounds like a
>great starter project for someone interested in getting started with ZK.
>3.5+ adds Jetty support for accessing metrics, which sounds like it would
>dovetail nicely.
>
>Patrick
>
>On Mon, May 9, 2016 at 10:12 AM, Chris Nauroth <[email protected]>
>wrote:
>
>> I always sympathize with a major outage report, but on the bright side, it
>> was very satisfying to hear the ZooKeeper cluster had sustained uptime for
>> 3 years.  That agrees with my own user experience.  It's often the most
>> stable component of a distributed infrastructure (as it needs to be).
>>
>> As far as potential improvements, I was wondering if it would make sense
>> to introduce something like Hadoop's JvmPauseMonitor [1].  This is a
>> background thread that attempts to detect GC churn and log warnings about
>> it.  This has been very helpful in diagnosing NameNode misconfigurations
>> that lead to GC churn.
>>
>> This wouldn't have prevented a problem for the Elastic Cloud team, but at
>> least it would have made the root cause more visible.  A warning about GC
>> churn could have been shown in the main ZooKeeper log instead of a
>> separate GC log or inferring it from other sources like JMX.
>>
>> [1] https://s.apache.org/4sdx
>>
>> --Chris Nauroth
>>
>>
>>
>>
>> On 5/8/16, 7:37 PM, "Patrick Hunt" <[email protected]> wrote:
>>
>> >Interesting root cause and mitigations discussion.
>> >
>> >https://www.elastic.co/blog/elastic-cloud-outage-april-2016
>> >
>> >Patrick
>>
>>
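
The pause-monitor idea described in the thread (a background thread that notices GC churn and logs a warning) can be sketched roughly as follows. This is only an illustration of the general technique, not Hadoop's JvmPauseMonitor and not whatever ZOOKEEPER-2424 ends up implementing. The class name PauseMonitorSketch, the method names, the 500 ms sleep interval, and the 1000 ms warning threshold are all assumptions chosen for the example. The thread sleeps for a fixed interval and measures how much longer than requested the sleep took; a large overshoot suggests the whole JVM was stalled, and the GC time reported over JMX during that window hints at whether garbage collection was the cause.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: detects long JVM pauses by measuring sleep overshoot.
public class PauseMonitorSketch implements Runnable {
    private final long sleepMs;          // how long each probe sleeps (assumed value)
    private final long warnThresholdMs;  // overshoot that triggers a warning (assumed value)
    private volatile boolean running = true;

    public PauseMonitorSketch(long sleepMs, long warnThresholdMs) {
        this.sleepMs = sleepMs;
        this.warnThresholdMs = warnThresholdMs;
    }

    // Extra time spent beyond the requested sleep, clamped at zero.
    public static long overshootMs(long elapsedMs, long sleepMs) {
        return Math.max(0L, elapsedMs - sleepMs);
    }

    // Sum of accumulated collection time across all collectors, in ms.
    // getCollectionTime() returns -1 when the value is undefined, so skip those.
    public static long totalGcTimeMs() {
        long total = 0L;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    @Override
    public void run() {
        while (running) {
            long gcBefore = totalGcTimeMs();
            long start = System.nanoTime();
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            long extraMs = overshootMs(elapsedMs, sleepMs);
            if (extraMs > warnThresholdMs) {
                long gcMs = totalGcTimeMs() - gcBefore;
                // A real server would use its normal logger here.
                System.err.println("WARN: detected pause of approximately " + extraMs
                        + " ms (GC time in that window: " + gcMs + " ms)");
            }
        }
    }

    public void stop() {
        running = false;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(new PauseMonitorSketch(500L, 1000L), "pause-monitor");
        t.setDaemon(true);  // must not keep the JVM alive on shutdown
        t.start();
        Thread.sleep(2000L); // let the monitor run briefly for demonstration
    }
}
```

Running the monitor as a daemon thread inside the server process means the warning lands in the main log next to the other server messages, which is the visibility benefit discussed above.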
