I agree with Pat. If we use SIGTERM in the script, we would want to add a timeout that escalates to a kill -9, which makes the script more complicated for no good reason, since we don't have any exit hooks that we want to run. ZooKeeper is designed to recover well from hard failures, including failures much worse than a kill -9. I don't think we want to change that.
ben

On Wed, Jul 27, 2011 at 10:25 AM, Patrick Hunt <[email protected]> wrote:
> ZK has been built around the "fail fast" approach. In order to
> maintain high availability, we want to ensure that restarting a server
> will result in it attempting to rejoin the quorum. IMO we would not
> want to change this (kill -9).
>
> Patrick
>
> On Tue, Jul 26, 2011 at 2:02 AM, Laxman <[email protected]> wrote:
>> Hi Everyone,
>>
>> Any thoughts?
>> Do we need to consider changing the abrupt shutdown to
>>
>> Implementations in some other Hadoop ecosystem projects, for your reference:
>> Hadoop - kill [SIGTERM]
>> HBase - kill [SIGTERM], and then "kill -9" [SIGKILL] if the process hangs
>> ZooKeeper - "kill -9" [SIGKILL]
>>
>> -----Original Message-----
>> From: Laxman [mailto:[email protected]]
>> Sent: Wednesday, July 13, 2011 12:36 PM
>> To: '[email protected]'
>> Subject: RE: Does abrupt kill corrupts the datadir?
>>
>> Hi Mahadev,
>>
>> A shutdown hook is just a quick thought. Another approach would be to simply
>> send kill [SIGTERM], which the process can handle.
>>
>> My first look at the "kill -9" raised the following scenario:
>>> In the worst case, if the latest snapshots on all ZooKeeper nodes get
>>> corrupted, there is a chance of data loss.
>>
>> How can ZooKeeper deal with this scenario gracefully?
>>
>> Also, I feel we should give the application a chance to shut down gracefully
>> before an abrupt shutdown.
>>
>> http://en.wikipedia.org/wiki/SIGKILL
>>
>> Because SIGKILL gives the process no opportunity to do cleanup operations on
>> terminating, in most system shutdown procedures an attempt is first made to
>> terminate processes using SIGTERM, before resorting to SIGKILL.
>>
>> http://rackerhacker.com/2010/03/18/sigterm-vs-sigkill/
>>
>> The application can determine what it wants to do once a SIGTERM is
>> received. While most applications will clean up their resources and stop,
>> some may not. An application may be configured to do something completely
>> different when a SIGTERM is received. Also, if the application is in a bad
>> state, such as waiting for disk I/O, it may not be able to act on the signal
>> that was sent.
>>
>> Most system administrators will usually resort to the more abrupt signal
>> when an application doesn't respond to a SIGTERM.
>>
>> -----Original Message-----
>> From: Mahadev Konar [mailto:[email protected]]
>> Sent: Wednesday, July 13, 2011 12:02 PM
>> To: [email protected]
>> Subject: Re: Does abrupt kill corrupts the datadir?
>>
>> Hi Laxman,
>> The servers take care of all the issues with data integrity, so a kill
>> -9 is OK. Shutdown hooks are tricky. Also, the best way to make sure
>> everything works reliably is to use kill -9 :).
>>
>> Thanks
>> mahadev
>>
>> On 7/12/11 11:16 PM, "Laxman" <[email protected]> wrote:
>>
>>> When we stop ZooKeeper through zkServer.sh stop, we are aborting the
>>> ZooKeeper process using "kill -9":
>>>
>>>   stop)
>>>     echo -n "Stopping zookeeper ... "
>>>     if [ ! -f "$ZOOPIDFILE" ]
>>>     then
>>>       echo "error: could not find file $ZOOPIDFILE"
>>>       exit 1
>>>     else
>>>       $KILL -9 $(cat "$ZOOPIDFILE")
>>>       rm "$ZOOPIDFILE"
>>>       echo STOPPED
>>>       exit 0
>>>     fi
>>>     ;;
>>>
>>> This may corrupt the snapshot and transaction logs. Also, it's not
>>> recommended to use "kill -9".
>>>
>>> In the worst case, if the latest snapshots on all ZooKeeper nodes get
>>> corrupted, there is a chance of data loss.
>>>
>>> How about introducing a shutdown hook that ensures ZooKeeper is
>>> shut down gracefully when we call stop?
>>>
>>> Note: This is just an observation; it was not found in a test.
>>>
>>> --
>>>
>>> Thanks,
>>>
>>> Laxman
>>>
>>
>
