Hi, I have faced a somewhat similar issue. I have a couple of MapReduce jobs running on EC2, and after a week or so I get a "no space left on device" error when performing any Linux command. So I end up shutting down Hadoop and HBase, clearing the logs, and then restarting them.
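The manual log clearing can itself be automated without touching the daemons. A minimal sketch, with the caveat that `HADOOP_LOG_DIR`, its default path, and the 7-day window are all assumptions to adjust for your installation:

```shell
#!/bin/sh
# Sketch: prune old, rolled-over Hadoop/HBase logs on a schedule instead of
# restarting the daemons. The default path below is an assumption -- point
# HADOOP_LOG_DIR at your real log directory.
LOG_DIR="${HADOOP_LOG_DIR:-/var/log/hadoop}"
MAX_AGE_DAYS=7

# Only prune if the directory exists. The daemons keep the current *.log
# open, but rolled files such as *.log.2009-04-10 are safe to delete.
if [ -d "$LOG_DIR" ]; then
  find "$LOG_DIR" -type f -name '*.log.*' -mtime +"$MAX_AGE_DAYS" -print -delete
fi
```

Run from cron (e.g. daily) this keeps the partition from filling over the week, without stopping Hadoop or HBase.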
Is there a cleaner way to do it? Thanks, Raakhi

On Fri, Apr 24, 2009 at 11:59 PM, Todd Lipcon <t...@cloudera.com> wrote:

> On Fri, Apr 24, 2009 at 11:18 AM, Marc Limotte <mlimo...@feeva.com> wrote:
>
> > Actually, I'm concerned about performance of map/reduce jobs for a
> > long-running cluster. I.e. it seems to get slower the longer it's
> > running. After a restart of HDFS, the jobs seem to run faster. Not
> > concerned about the start-up time of HDFS.
>
> Hi Marc,
>
> Does it sound like this JIRA describes your problem?
>
> https://issues.apache.org/jira/browse/HADOOP-4766
>
> If so, restarting just the JT should help with the symptoms. (I say
> symptoms because this is clearly a problem! Hadoop should be stable and
> performant for months without a cluster restart!)
>
> -Todd
>
> > Of course, as you suggest, this could be poor configuration of the
> > cluster on my part; but I'd still like to hear best practices around
> > doing a scheduled restart.
> >
> > Marc
> >
> > -----Original Message-----
> > From: Allen Wittenauer [mailto:a...@yahoo-inc.com]
> > Sent: Friday, April 24, 2009 10:17 AM
> > To: core-user@hadoop.apache.org
> > Subject: Re: Advice on restarting HDFS in a cron
> >
> > On 4/24/09 9:31 AM, "Marc Limotte" <mlimo...@feeva.com> wrote:
> > > I've heard that HDFS starts to slow down after it's been running
> > > for a long time. And I believe I've experienced this.
> >
> > We did an upgrade (== complete restart) of a 2000-node instance in
> > ~20 minutes on Wednesday. I wouldn't really consider that 'slow',
> > but YMMV.
> >
> > I suspect people aren't running the secondary namenode and therefore
> > have a massively large edits file. The namenode appears slow on
> > restart because it has to apply the edits to the fsimage rather than
> > having the secondary keep it up to date.
> >
> > -----Original Message-----
> > From: Marc Limotte
> >
> > Hi.
> > I've heard that HDFS starts to slow down after it's been running for a
> > long time, and I believe I've experienced this. So I was thinking to
> > set up a cron job to execute every week to shut down HDFS and start it
> > up again.
> >
> > In concept, it would be something like:
> >
> > 0 0 * * 0 $HADOOP_HOME/bin/stop-dfs.sh; $HADOOP_HOME/bin/start-dfs.sh
> >
> > But I'm wondering if there is a safer way to do this. In particular:
> >
> > * What if a map/reduce job is running when this cron hits? Is there a
> > way to suspend jobs while the HDFS restart happens?
> >
> > * Should I also restart the mapred daemons?
> >
> > * Should I wait some time after "stop-dfs.sh" for things to settle
> > down before executing "start-dfs.sh"? Or maybe I should run a command
> > to verify that it is stopped before I run the start?
> >
> > Thanks for any help.
> > Marc
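For the last question above, one hedged approach is to script the restart so it refuses to start again until the daemons have actually exited, rather than sleeping a fixed time. This sketch is not an official Hadoop tool; the `jps`-based check and the paths are assumptions to verify against your installation:

```shell
#!/bin/sh
# Sketch of a safer restart sequence: stop DFS, poll until the daemon JVMs
# are really gone, then start it again. Paths are assumptions.
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"

# Retry a command once a second until it succeeds or we run out of tries.
wait_until() {  # usage: wait_until <tries> <command...>
  tries=$1; shift
  while [ "$tries" -gt 0 ]; do
    "$@" && return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# True when no NameNode/DataNode JVMs remain (jps ships with the JDK).
dfs_stopped() {
  ! jps 2>/dev/null | grep -Eq 'NameNode|DataNode'
}

restart_dfs() {
  # One could first check `$HADOOP_HOME/bin/hadoop job -list` and skip the
  # restart while jobs are running (output format varies by version).
  "$HADOOP_HOME/bin/stop-dfs.sh"
  wait_until 60 dfs_stopped || { echo "DFS did not stop cleanly" >&2; return 1; }
  "$HADOOP_HOME/bin/start-dfs.sh"
}
```

Invoke `restart_dfs` from the weekly cron entry in place of the bare stop/start pair; on a timeout it aborts instead of starting a second set of daemons over a half-stopped one.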