On Fri, Apr 24, 2009 at 11:18 AM, Marc Limotte <mlimo...@feeva.com> wrote:
> Actually, I'm concerned about performance of map/reduce jobs for a
> long-running cluster. I.e. it seems to get slower the longer it's running.
> After a restart of HDFS, the jobs seem to run faster. Not concerned about
> the start-up time of HDFS.

Hi Marc,

Does it sound like this JIRA describes your problem?

https://issues.apache.org/jira/browse/HADOOP-4766

If so, restarting just the JT should help with the symptoms. (I say symptoms
because this is clearly a problem! Hadoop should be stable and performant
for months without a cluster restart!)

-Todd

> Of course, as you suggest, this could be poor configuration of the cluster
> on my part; but I'd still like to hear best practices around doing a
> scheduled restart.
>
> Marc
>
> -----Original Message-----
> From: Allen Wittenauer [mailto:a...@yahoo-inc.com]
> Sent: Friday, April 24, 2009 10:17 AM
> To: core-user@hadoop.apache.org
> Subject: Re: Advice on restarting HDFS in a cron
>
> On 4/24/09 9:31 AM, "Marc Limotte" <mlimo...@feeva.com> wrote:
> > I've heard that HDFS starts to slow down after it's been running for a long
> > time. And I believe I've experienced this.
>
> We did an upgrade (== complete restart) of a 2000 node instance in ~20
> minutes on Wednesday. I wouldn't really consider that 'slow', but YMMV.
>
> I suspect people aren't running the secondary name node and therefore have
> a massively large edits file. The name node appears slow on restart because
> it has to apply the edits to the fsimage rather than having the secondary
> keep it up to date.
>
> -----Original Message-----
> From: Marc Limotte
>
> Hi.
>
> I've heard that HDFS starts to slow down after it's been running for a long
> time. And I believe I've experienced this. So, I was thinking to set up a
> cron job to execute every week to shut down HDFS and start it up again.
>
> In concept, it would be something like:
>
> 0 0 * * 0 $HADOOP_HOME/bin/stop-dfs.sh; $HADOOP_HOME/bin/start-dfs.sh
>
> But I'm wondering if there is a safer way to do this. In particular:
>
> * What if a map/reduce job is running when this cron hits? Is
> there a way to suspend jobs while the HDFS restart happens?
>
> * Should I also restart the mapred daemons?
>
> * Should I wait some time after "stop-dfs.sh" for things to settle
> down, before executing "start-dfs.sh"? Or maybe I should run a command to
> verify that it is stopped before I run the start?
>
> Thanks for any help.
> Marc
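
On the "verify that it is stopped before I run the start" question, one possible shape for a more defensive restart is sketched below. It is only a sketch, not a recommended practice from anyone in this thread. It assumes the NameNode pid file lives at $HADOOP_PID_DIR/hadoop-$HADOOP_IDENT_STRING-namenode.pid (defaulting to /tmp and $USER), that the script runs as the user owning the daemons, and that "hadoop dfsadmin -safemode wait" is available in your release. The helper names wait_until_stopped and restart_dfs are made up for illustration.

```shell
#!/bin/sh
# Sketch only: restart HDFS, but do not start until the NameNode has exited.

# wait_until_stopped PIDFILE TIMEOUT_SECONDS   (hypothetical helper)
# Polls once per second while the pid recorded in PIDFILE is still alive.
# Returns 0 once the process is gone (or PIDFILE is missing), 1 on timeout.
wait_until_stopped() {
  wus_pidfile=$1
  wus_timeout=$2
  wus_waited=0
  # A missing pid file is treated as "not running".
  [ -f "$wus_pidfile" ] || return 0
  wus_pid=$(cat "$wus_pidfile")
  # kill -0 sends no signal; it only checks that we could signal the process.
  while kill -0 "$wus_pid" 2>/dev/null; do
    [ "$wus_waited" -lt "$wus_timeout" ] || return 1
    sleep 1
    wus_waited=$((wus_waited + 1))
  done
  return 0
}

# restart_dfs   (hypothetical helper)
# Only the local NameNode is checked; DataNodes on other hosts are not.
restart_dfs() {
  # Assumed pid file location and naming; adjust for your installation.
  pid_dir=${HADOOP_PID_DIR:-/tmp}
  ident=${HADOOP_IDENT_STRING:-$USER}
  "$HADOOP_HOME/bin/stop-dfs.sh" || return 1
  wait_until_stopped "$pid_dir/hadoop-$ident-namenode.pid" 60 || return 1
  "$HADOOP_HOME/bin/start-dfs.sh" || return 1
  # Block until the NameNode has left safe mode before anything uses HDFS.
  "$HADOOP_HOME/bin/hadoop" dfsadmin -safemode wait
}

# Nothing is restarted unless the script is invoked with the argument "run".
if [ "${1:-}" = "run" ]; then
  restart_dfs
fi
```

Saved as, say, restart-dfs.sh (a hypothetical name), the weekly cron entry would invoke it with the "run" argument instead of chaining stop-dfs.sh and start-dfs.sh directly. This does nothing about jobs that are running when the cron fires; that question is left open here.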