Hi, I have faced a somewhat similar issue. I have a couple of MapReduce jobs running on EC2, and after a week or so I get a "no space left on device" error when performing any Linux command. So I end up shutting down Hadoop and HBase, clearing the logs, and then restarting them.
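The manual log clearing can itself be automated without touching the daemons. A minimal sketch, with the caveat that `HADOOP_LOG_DIR`, its default path, and the 7-day window are all assumptions to adjust for your installation:

```shell
#!/bin/sh
# Sketch: prune old, rolled-over Hadoop/HBase logs on a schedule instead of
# restarting the daemons. The default path below is an assumption -- point
# HADOOP_LOG_DIR at your real log directory.
LOG_DIR="${HADOOP_LOG_DIR:-/var/log/hadoop}"
MAX_AGE_DAYS=7

# Only prune if the directory exists. The daemons keep the current *.log
# open, but rolled files such as *.log.2009-04-10 are safe to delete.
if [ -d "$LOG_DIR" ]; then
  find "$LOG_DIR" -type f -name '*.log.*' -mtime +"$MAX_AGE_DAYS" -print -delete
fi
```

Run from cron (e.g. daily) this keeps the partition from filling over the week, without stopping Hadoop or HBase.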
Is there a cleaner way to do it? Thanks, Raakhi

On Fri, Apr 24, 2009 at 11:59 PM, Todd Lipcon <t...@cloudera.com> wrote:

> On Fri, Apr 24, 2009 at 11:18 AM, Marc Limotte <mlimo...@feeva.com> wrote:
>
> > Actually, I'm concerned about performance of map/reduce jobs for a
> > long-running cluster. I.e. it seems to get slower the longer it's
> > running. After a restart of HDFS, the jobs seem to run faster. Not
> > concerned about the start-up time of HDFS.
>
> Hi Marc,
>
> Does it sound like this JIRA describes your problem?
>
> https://issues.apache.org/jira/browse/HADOOP-4766
>
> If so, restarting just the JT should help with the symptoms. (I say
> symptoms because this is clearly a problem! Hadoop should be stable and
> performant for months without a cluster restart!)
>
> -Todd
>
> > Of course, as you suggest, this could be poor configuration of the
> > cluster on my part; but I'd still like to hear best practices around
> > doing a scheduled restart.
> >
> > Marc
> >
> > -----Original Message-----
> > From: Allen Wittenauer [mailto:a...@yahoo-inc.com]
> > Sent: Friday, April 24, 2009 10:17 AM
> > To: core-user@hadoop.apache.org
> > Subject: Re: Advice on restarting HDFS in a cron
> >
> > On 4/24/09 9:31 AM, "Marc Limotte" <mlimo...@feeva.com> wrote:
> > > I've heard that HDFS starts to slow down after it's been running
> > > for a long time. And I believe I've experienced this.
> >
> > We did an upgrade (== complete restart) of a 2000-node instance in
> > ~20 minutes on Wednesday. I wouldn't really consider that 'slow',
> > but YMMV.
> >
> > I suspect people aren't running the secondary namenode and therefore
> > have a massively large edits file. The namenode appears slow on
> > restart because it has to apply the edits to the fsimage rather than
> > having the secondary keep it up to date.
> >
> > -----Original Message-----
> > From: Marc Limotte
> >
> > Hi.
> > I've heard that HDFS starts to slow down after it's been running for a
> > long time, and I believe I've experienced this. So I was thinking to
> > set up a cron job to execute every week to shut down HDFS and start it
> > up again.
> >
> > In concept, it would be something like:
> >
> > 0 0 * * 0 $HADOOP_HOME/bin/stop-dfs.sh; $HADOOP_HOME/bin/start-dfs.sh
> >
> > But I'm wondering if there is a safer way to do this. In particular:
> >
> > * What if a map/reduce job is running when this cron hits? Is there a
> > way to suspend jobs while the HDFS restart happens?
> >
> > * Should I also restart the mapred daemons?
> >
> > * Should I wait some time after "stop-dfs.sh" for things to settle
> > down before executing "start-dfs.sh"? Or maybe I should run a command
> > to verify that it is stopped before I run the start?
> >
> > Thanks for any help.
> > Marc
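For the last question above, one hedged approach is to script the restart so it refuses to start again until the daemons have actually exited, rather than sleeping a fixed time. This sketch is not an official Hadoop tool; the `jps`-based check and the paths are assumptions to verify against your installation:

```shell
#!/bin/sh
# Sketch of a safer restart sequence: stop DFS, poll until the daemon JVMs
# are really gone, then start it again. Paths are assumptions.
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"

# Retry a command once a second until it succeeds or we run out of tries.
wait_until() {  # usage: wait_until <tries> <command...>
  tries=$1; shift
  while [ "$tries" -gt 0 ]; do
    "$@" && return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# True when no NameNode/DataNode JVMs remain (jps ships with the JDK).
dfs_stopped() {
  ! jps 2>/dev/null | grep -Eq 'NameNode|DataNode'
}

restart_dfs() {
  # One could first check `$HADOOP_HOME/bin/hadoop job -list` and skip the
  # restart while jobs are running (output format varies by version).
  "$HADOOP_HOME/bin/stop-dfs.sh"
  wait_until 60 dfs_stopped || { echo "DFS did not stop cleanly" >&2; return 1; }
  "$HADOOP_HOME/bin/start-dfs.sh"
}
```

Invoke `restart_dfs` from the weekly cron entry in place of the bare stop/start pair; on a timeout it aborts instead of starting a second set of daemons over a half-stopped one.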