ZK has been built around the "fail fast" approach. In order to maintain high availability we want to ensure that restarting a server will result in it attempting to rejoin the quorum. IMO we would not want to change this (kill -9).
Patrick

On Tue, Jul 26, 2011 at 2:02 AM, Laxman <lakshman...@huawei.com> wrote:
> Hi Everyone,
>
> Any thoughts?
> Do we need consider changing abrupt shutdown to
>
> Implementations in some other hadoop eco system projects for your reference.
> Hadoop - kill [SIGTERM]
> HBase - kill [SIGTERM] and then "kill -9" [SIGKILL] if process hung
> ZooKeeper - "kill -9" [SIGKILL]
>
>
> -----Original Message-----
> From: Laxman [mailto:lakshman...@huawei.com]
> Sent: Wednesday, July 13, 2011 12:36 PM
> To: 'dev@zookeeper.apache.org'
> Subject: RE: Does abrupt kill corrupts the datadir?
>
> Hi Mahadev,
>
> Shutdown hook is just a quick thought. Another approach can be just give a
> kill [SIGTERM] call which can be interpreted by process.
>
> First look at the "kill -9" triggered the following scenario.
>>In worst case, if latest snaps in all zookeeper nodes gets corrupted there
>>is a chance of dataloss.
>
> How does zookeeper can deal with this scenario gracefully?
>
> Also, I feel we should give a chance to application to shutdown gracefully
> before abrupt shutdown.
>
> http://en.wikipedia.org/wiki/SIGKILL
>
> Because SIGKILL gives the process no opportunity to do cleanup operations on
> terminating, in most system shutdown procedures an attempt is first made to
> terminate processes using SIGTERM, before resorting to SIGKILL.
>
> http://rackerhacker.com/2010/03/18/sigterm-vs-sigkill/
>
> The application can determine what it wants to do once a SIGTERM is
> received. While most applications will clean up their resources and stop,
> some may not. An application may be configured to do something completely
> different when a SIGTERM is received. Also, if the application is in a bad
> state, such as waiting for disk I/O, it may not be able to act on the signal
> that was sent.
>
> Most system administrators will usually resort to the more abrupt signal
> when an application doesn't respond to a SIGTERM.
>
> -----Original Message-----
> From: Mahadev Konar [mailto:maha...@hortonworks.com]
> Sent: Wednesday, July 13, 2011 12:02 PM
> To: dev@zookeeper.apache.org
> Subject: Re: Does abrupt kill corrupts the datadir?
>
> Hi Laxman,
> The servers takes care of all the issues with data integrity, so a kill
> -9 is OK. Shutdown hooks are tricky. Also, the best way to make sure
> everything works reliably is use kill -9 :).
>
> Thanks
> mahadev
>
> On 7/12/11 11:16 PM, "Laxman" <lakshman...@huawei.com> wrote:
>
>>When we stop zookeeper through zkServer.sh stop, we are aborting the
>>zookeeper process using "kill -9".
>>
>>stop)
>>    echo -n "Stopping zookeeper ... "
>>    if [ ! -f "$ZOOPIDFILE" ]
>>    then
>>        echo "error: could not find file $ZOOPIDFILE"
>>        exit 1
>>    else
>>        $KILL -9 $(cat "$ZOOPIDFILE")
>>        rm "$ZOOPIDFILE"
>>        echo STOPPED
>>        exit 0
>>    fi
>>    ;;
>>
>>This may corrupt the snapshot and transaction logs. Also, its not
>>recommended to use "kill -9".
>>
>>In worst case, if latest snaps in all zookeeper nodes gets corrupted there
>>is a chance of dataloss.
>>
>>How about introducing a shutdown hook which will ensure zookeeper is
>>shutdown gracefully when we call stop?
>>
>>Note: This is just an observation and its not found in a test.
>>
>>--
>>Thanks,
>>Laxman
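
For reference, the "kill [SIGTERM] and then kill -9 [SIGKILL] if process hung" pattern that the thread attributes to HBase could look roughly like the sketch below. This is only an illustration of the idea under discussion, not what zkServer.sh does and not HBase's actual script. The function name stop_with_grace and the 10 second default timeout are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper (not part of zkServer.sh): send SIGTERM, poll for up
# to <timeout> seconds, then fall back to SIGKILL.
# Usage: stop_with_grace <pid> [timeout_seconds]
# Returns 0 if the process was already gone or exited after SIGTERM,
# 1 if SIGKILL had to be sent.
stop_with_grace() {
    pid=$1
    timeout=${2:-10}   # assumed default, pick whatever suits the deployment

    # If SIGTERM cannot be delivered, treat the process as already stopped.
    kill -TERM "$pid" 2>/dev/null || return 0

    while [ "$timeout" -gt 0 ]; do
        # kill -0 only checks that the pid can still be signalled.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done

    # Still there after the grace period: resort to the abrupt signal.
    kill -KILL "$pid" 2>/dev/null || true
    return 1
}
```

In the stop) branch quoted above, the "$KILL -9" line would then become something like: stop_with_grace "$(cat "$ZOOPIDFILE")" 10. Note that kill -0 also succeeds for an unreaped zombie child, so the polling is most reliable when the stop command runs from a different shell than the one that started the server, which is the usual case for a stop script.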