almost everything we do in zookeeper is to make sure that we don't
lose data in much worse scenarios. the probability of a loss in this
scenario is really just the probability of a bug in the code. i don't
think that kill -TERM vs kill -KILL changes that probability at all
either way.

ben

On Thu, Jul 28, 2011 at 12:50 AM, Laxman <[email protected]> wrote:
> Thanks for the responses Mahadev, Pat and Ben.
> I understand your explanation.
>
> My only question is "Will there be any probability of data loss in the scenario
> mentioned?"
>
>>>>In worst case, if latest snaps in all zookeeper nodes gets corrupted
> there is a chance of data loss.
>
>>>if we use sigterm in the script, we would want to put a timeout in to
> escalate to a -9
>
> As Ben mentioned, even if we escalate to "kill -9" to ensure shutdown, we
> may still have data loss. But the probability is much lower if we give the
> process a chance to shut down gracefully.
>
> Please do correct me if my understanding is wrong.
> --
> Laxman
>
> -----Original Message-----
> From: Benjamin Reed [mailto:[email protected]]
> Sent: Thursday, July 28, 2011 11:40 AM
> To: [email protected]
> Subject: Re: FW: Does abrupt kill corrupts the datadir?
>
> i agree with pat. if we use sigterm in the script, we would want to
> put a timeout in to escalate to a -9 which makes the script a bit more
> complicated without reason since we don't have any exit hooks that we
> want to run. zookeeper is designed to recover well from hard failures,
> much worse than a kill -9. i don't think we want to change that.
>
> ben
>
> On Wed, Jul 27, 2011 at 10:25 AM, Patrick Hunt <[email protected]> wrote:
>> ZK has been built around the "fail fast" approach. In order to
>> maintain high availability we want to ensure that restarting a server
>> will result in it attempting to rejoin the quorum. IMO we would not
>> want to change this (kill -9).
>>
>> Patrick
>>
>> On Tue, Jul 26, 2011 at 2:02 AM, Laxman <[email protected]> wrote:
>>> Hi Everyone,
>>>
>>> Any thoughts?
>>> Do we need to consider changing abrupt shutdown to
>>>
>>> Implementations in some other Hadoop ecosystem projects, for your
>>> reference:
>>> Hadoop - kill [SIGTERM]
>>> HBase - kill [SIGTERM] and then "kill -9" [SIGKILL] if process hung
>>> ZooKeeper - "kill -9" [SIGKILL]
>>>
>>>
>>> -----Original Message-----
>>> From: Laxman [mailto:[email protected]]
>>> Sent: Wednesday, July 13, 2011 12:36 PM
>>> To: '[email protected]'
>>> Subject: RE: Does abrupt kill corrupts the datadir?
>>>
>>> Hi Mahadev,
>>>
>>> A shutdown hook was just a quick thought. Another approach would be to just
>>> send a kill [SIGTERM], which the process can interpret.
>>>
>>> A first look at the "kill -9" raised the following scenario:
>>>>In worst case, if latest snaps in all zookeeper nodes gets corrupted
>>>>there is a chance of dataloss.
>>>
>>> How can zookeeper deal with this scenario gracefully?
>>>
>>> Also, I feel we should give the application a chance to shut down
>>> gracefully before an abrupt shutdown.
>>>
>>> http://en.wikipedia.org/wiki/SIGKILL
>>>
>>> Because SIGKILL gives the process no opportunity to do cleanup operations on
>>> terminating, in most system shutdown procedures an attempt is first made to
>>> terminate processes using SIGTERM, before resorting to SIGKILL.
>>>
>>> http://rackerhacker.com/2010/03/18/sigterm-vs-sigkill/
>>>
>>> The application can determine what it wants to do once a SIGTERM is
>>> received. While most applications will clean up their resources and stop,
>>> some may not. An application may be configured to do something completely
>>> different when a SIGTERM is received. Also, if the application is in a bad
>>> state, such as waiting for disk I/O, it may not be able to act on the signal
>>> that was sent.
>>>
>>> Most system administrators will usually resort to the more abrupt signal
>>> when an application doesn't respond to a SIGTERM.
>>>
>>> -----Original Message-----
>>> From: Mahadev Konar [mailto:[email protected]]
>>> Sent: Wednesday, July 13, 2011 12:02 PM
>>> To: [email protected]
>>> Subject: Re: Does abrupt kill corrupts the datadir?
>>>
>>> Hi Laxman,
>>>  The servers take care of all the issues with data integrity, so a kill
>>> -9 is OK. Shutdown hooks are tricky. Also, the best way to make sure
>>> everything works reliably is to use kill -9 :).
>>>
>>> Thanks
>>> mahadev
>>>
>>> On 7/12/11 11:16 PM, "Laxman" <[email protected]> wrote:
>>>
>>>>When we stop zookeeper through zkServer.sh stop, we are aborting the
>>>>zookeeper process using "kill -9".
>>>>
>>>>stop)
>>>>    echo -n "Stopping zookeeper ... "
>>>>    if [ ! -f "$ZOOPIDFILE" ]
>>>>    then
>>>>      echo "error: could not find file $ZOOPIDFILE"
>>>>      exit 1
>>>>    else
>>>>      $KILL -9 $(cat "$ZOOPIDFILE")
>>>>      rm "$ZOOPIDFILE"
>>>>      echo STOPPED
>>>>      exit 0
>>>>    fi
>>>>    ;;
>>>>
>>>>This may corrupt the snapshot and transaction logs. Also, it's not
>>>>recommended to use "kill -9".
>>>>
>>>>In the worst case, if the latest snaps in all zookeeper nodes get corrupted,
>>>>there is a chance of data loss.
>>>>
>>>>How about introducing a shutdown hook which will ensure zookeeper is
>>>>shut down gracefully when we call stop?
>>>>
>>>>Note: This is just an observation; it was not found in a test.
>>>>
>>>>--
>>>>
>>>>Thanks,
>>>>
>>>>Laxman
>>>>
>>>
>>>
>>>
>>
>
>
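
For reference, the "sigterm, then a timeout to escalate to a -9" approach
discussed in this thread could look roughly like the sketch below. This is
not what zkServer.sh does (it uses kill -9 directly, as quoted above). The
function name stop_with_escalation and the 10 second default timeout are
made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of zkServer.sh:
# send SIGTERM, wait up to TIMEOUT seconds, then escalate to SIGKILL.
# Usage: stop_with_escalation PID [TIMEOUT_SECONDS]
stop_with_escalation() {
    pid=$1
    timeout=${2:-10}   # assumed default, chosen only for this sketch

    # Ask the process to exit. If the signal cannot be delivered
    # (for example the process is already gone), there is nothing to do.
    kill -TERM "$pid" 2>/dev/null || return 0

    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # kill -0 delivers no signal; it only checks that the pid exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        waited=$((waited + 1))
    done

    # Still running after the timeout: fall back to the hard kill.
    kill -KILL "$pid" 2>/dev/null
    return 0
}
```

In a stop) branch like the one quoted above, this would be called as
stop_with_escalation "$(cat "$ZOOPIDFILE")" 10 in place of the $KILL -9
line. As Ben points out, the extra loop only pays off if the server has
exit hooks worth running.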
