One possibility is that you loaded data, but not enough to trigger a flush; then there appeared to be some network-related problem, and you killed the regionservers hard (-9?) while the filesystem was unavailable. That unfortunate string of circumstances would cause data loss. However, you said the cluster had been running for 6 days, so a major compaction (runs once every 24 hours) would have flushed and persisted data. Is there anything in the bucket? (hadoop fs -lsr ...)
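The bucket check suggested above boils down to "is there anything under the rootdir?". Against the real cluster that is `hadoop fs -lsr s3://BUCKET/hbasedata`; since no cluster is assumed here, the sketch below demonstrates the same test against a local stand-in directory. `RootdirCheck`, `countEntries`, and the sample `-ROOT-/70236052` layout are illustrative inventions, not HBase API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RootdirCheck {
    /** Count entries under a rootdir path; 0 means nothing was ever persisted. */
    static long countEntries(Path rootdir) throws IOException {
        try (var walk = Files.walk(rootdir)) {
            // Files.walk includes the root itself, so exclude it from the count.
            return walk.filter(p -> !p.equals(rootdir)).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the S3 rootdir; a populated one contains at least -ROOT-.
        Path rootdir = Files.createTempDirectory("hbasedata");
        Files.createDirectories(rootdir.resolve("-ROOT-").resolve("70236052"));
        long n = countEntries(rootdir);
        System.out.println(n > 0
            ? "rootdir has content"
            : "rootdir is empty: data was never persisted");
        // prints "rootdir has content"
    }
}
```

An empty listing here would mean no flush or compaction ever wrote to the bucket, consistent with the data-loss scenario described above.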
0.20 is definitely the way to go, for a number of reasons.

   - Andy

________________________________
From: Ananth T. Sarathy <ananth.t.sara...@gmail.com>
To: hbase-user@hadoop.apache.org
Sent: Wed, October 7, 2009 12:46:24 PM
Subject: Re: hbase on s3 and safemode

thanks for all the help

<property>
  <name>hbase.rootdir</name>
  <value>s3://hbase2.s3.amazonaws.com:80/hbasedata</value>
  <description>The directory shared by region servers.
  Should be fully-qualified to include the filesystem to use.
  E.g: hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR
  </description>
</property>

that's in our hbase-site.xml

We had been running for about 6 days with no issues. At 1:30 this morning it
just crapped out. We are thinking about just moving to 0.20.0 and starting
over.

Ananth T Sarathy

On Wed, Oct 7, 2009 at 3:41 PM, Andrew Purtell <apurt...@apache.org> wrote:

> Did you edit hbase-site.xml such that HBase data directories are not in
> /tmp? Maybe a silly question... but it happens sometimes.
>
> If your hbase.rootdir points to an HDFS filesystem, what does 'hadoop fs
> -lsr hdfs://namenode:port/path/to/hbase/root' show?
>
> You said this was working before? Did you shut down and bring HBase back
> up before without trouble? Is this a new install?
>
> - Andy
>
> ________________________________
> From: Ananth T. Sarathy <ananth.t.sara...@gmail.com>
> To: hbase-user@hadoop.apache.org
> Sent: Wed, October 7, 2009 12:34:28 PM
> Subject: Re: hbase on s3 and safemode
>
> ok. so we finally got the regionserver to come up (we killed all the
> processes on the box and finally the regionserver came back up), but when
> it did, there is no data in our tables, though the tables are there. Any
> ideas where the data went or how I can get it back?
>
> Ananth T Sarathy
>
> On Wed, Oct 7, 2009 at 2:46 PM, Andrew Purtell <apurt...@apache.org> wrote:
>
> > One option is to add SYSV init scripts that on boot take the following
> > equivalent actions:
> >
> >   hbase-daemon.sh start zookeeper
> >   hbase-daemon.sh start master
> >   hbase-daemon.sh start regionserver
> >
> > Set the respective init scripts to run according to host role.
> >
> > This presumes you have also added init scripts that start up DFS daemons
> > wherever they should be, equivalents to the following:
> >
> >   hadoop-daemon.sh start namenode
> >   hadoop-daemon.sh start datanode
> >   hadoop-daemon.sh start secondarynamenode
> >
> > You can start everything up all at once. The respective daemons will wait
> > for each other's services to become available. Ignore ZK noise in the
> > logs about connection difficulties unless they persist for minutes.
> >
> > If you want to try out the Cloudera Hadoop distribution for 0.20, they
> > have RPMs that will take care of all of this for you, and we have an RPM
> > for that platform that I can provide you.
> >
> > Do also check your network configuration.
> >
> > - Andy
> >
> > ________________________________
> > From: Ananth T. Sarathy <ananth.t.sara...@gmail.com>
> > To: hbase-user@hadoop.apache.org
> > Sent: Wed, October 7, 2009 11:36:22 AM
> > Subject: Re: hbase on s3 and safemode
> >
> > is there a way to turn my regionservers on individually besides
> > start-hbase.sh?
> >
> > Ananth T Sarathy
> >
> > On Wed, Oct 7, 2009 at 2:31 PM, Andrew Purtell <apurt...@apache.org>
> > wrote:
> >
> > > HBase won't leave safe mode if the regionservers cannot contact the
> > > master. So the question is why your regionservers cannot contact the
> > > master. If the regionserver processes are confirmed running, then it's
> > > most likely a firewall or AWS Security Groups config problem.
> > >
> > > status was a shell command added in 0.20, IIRC.
> > >
> > > - Andy
> > >
> > > ________________________________
> > > From: Ananth T. Sarathy <ananth.t.sara...@gmail.com>
> > > To: hbase-user@hadoop.apache.org
> > > Sent: Wed, October 7, 2009 11:04:03 AM
> > > Subject: Re: hbase on s3 and safemode
> > >
> > > i suppose we need to, but for now it's kind of a pain because we need
> > > to coordinate our clients.
> > >
> > > But the problem is: why was it working, and then all of a sudden it's
> > > stuck in safemode, and how can we get back up?
> > >
> > > Ananth T Sarathy
> > >
> > > On Wed, Oct 7, 2009 at 1:58 PM, stack <st...@duboce.net> wrote:
> > >
> > > > Can you update to 0.20.0? (Oodles of improvements).
> > > > St.Ack
> > > >
> > > > On Wed, Oct 7, 2009 at 10:56 AM, Ananth T. Sarathy <
> > > > ananth.t.sara...@gmail.com> wrote:
> > > >
> > > > > I get an error
> > > > >
> > > > > hbase(main):001:0> status "detailed"
> > > > > NoMethodError: undefined method `status' for #<Object:0x5585c0de>
> > > > >         from (hbase):2
> > > > > hbase(main):002:0> status "detailed"
> > > > > NoMethodError: undefined method `status' for #<Object:0x5585c0de>
> > > > >         from (hbase):3
> > > > >
> > > > > we are running 0.19.3
> > > > >
> > > > > Ananth T Sarathy
> > > > >
> > > > > On Wed, Oct 7, 2009 at 1:51 PM, stack <st...@duboce.net> wrote:
> > > > >
> > > > > > This state persists even if you shut down hbase and zk and
> > > > > > restart?
> > > > > >
> > > > > > In shell, do:
> > > > > >
> > > > > > > status "detailed"
> > > > > >
> > > > > > At the top there is a section which says regions in transition.
> > > > > > Anything there?
> > > > > >
> > > > > > St.Ack
> > > > > >
> > > > > > On Wed, Oct 7, 2009 at 10:35 AM, Ananth T. Sarathy <
> > > > > > ananth.t.sara...@gmail.com> wrote:
> > > > > >
> > > > > > > Here is the log since I started it...
> > > > > > > Wed Oct 7 13:27:26 EDT 2009 Starting master on ip-10-244-9-171
> > > > > > > ulimit -n 1024
> > > > > > > 2009-10-07 13:27:26,404 INFO org.apache.hadoop.hbase.master.HMaster:
> > > > > > > vmName=Java HotSpot(TM) 64-Bit Server VM, vmVendor=Sun Microsystems
> > > > > > > Inc., vmVersion=14.2-b01
> > > > > > > 2009-10-07 13:27:26,405 INFO org.apache.hadoop.hbase.master.HMaster:
> > > > > > > vmInputArguments=[-Xmx2000m, -XX:+HeapDumpOnOutOfMemoryError,
> > > > > > > -Djava.io.tmpdir=/mnt/tmp,
> > > > > > > -Dhbase.log.dir=/mnt/apps/hadoop/hbase/bin/../logs,
> > > > > > > -Dhbase.log.file=hbase-root-master-ip-10-244-9-171.log,
> > > > > > > -Dhbase.home.dir=/mnt/apps/hadoop/hbase/bin/.., -Dhbase.id.str=root,
> > > > > > > -Dhbase.root.logger=INFO,DRFA,
> > > > > > > -Djava.library.path=/mnt/apps/hadoop/hbase/bin/../lib/native/Linux-amd64-64]
> > > > > > > 2009-10-07 13:27:27,525 INFO org.apache.hadoop.hbase.master.HMaster:
> > > > > > > Root region dir: s3://hbase2.s3.amazonaws.com:80/hbasedata/-ROOT-/70236052
> > > > > > > 2009-10-07 13:27:27,751 INFO org.apache.hadoop.hbase.ipc.HBaseRpcMetrics:
> > > > > > > Initializing RPC Metrics with hostName=HMaster, port=60000
> > > > > > > 2009-10-07 13:27:27,827 INFO org.apache.hadoop.hbase.master.HMaster:
> > > > > > > HMaster initialized on 10.244.9.171:60000
> > > > > > > 2009-10-07 13:27:27,829 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> > > > > > > Initializing JVM Metrics with processName=Master, sessionId=HMaster
> > > > > > > 2009-10-07 13:27:27,830 INFO
> > > > > > > org.apache.hadoop.hbase.master.metrics.MasterMetrics: Initialized
> > > > > > > 2009-10-07 13:27:27,932 INFO org.mortbay.util.Credential: Checking
> > > > > > > Resource aliases
> > > > > > > 2009-10-07 13:27:27,936 INFO org.mortbay.http.HttpServer: Version
> > > > > > > Jetty/5.1.4
> > > > > > > 2009-10-07 13:27:27,936 INFO org.mortbay.util.Container: Started
> > > > > > > HttpContext[/logs,/logs]
> > > > > > > 2009-10-07 13:27:28,202 INFO org.mortbay.util.Container: Started
> > > > > > > org.mortbay.jetty.servlet.webapplicationhand...@3209fa8f
> > > > > > > 2009-10-07 13:27:28,244 INFO org.mortbay.util.Container: Started
> > > > > > > WebApplicationContext[/static,/static]
> > > > > > > 2009-10-07 13:27:28,361 INFO org.mortbay.util.Container: Started
> > > > > > > org.mortbay.jetty.servlet.webapplicationhand...@b0c0f66
> > > > > > > 2009-10-07 13:27:28,364 INFO org.mortbay.util.Container: Started
> > > > > > > WebApplicationContext[/,/]
> > > > > > > 2009-10-07 13:27:28,636 INFO org.mortbay.util.Container: Started
> > > > > > > org.mortbay.jetty.servlet.webapplicationhand...@3c2d7440
> > > > > > > 2009-10-07 13:27:28,638 INFO org.mortbay.util.Container: Started
> > > > > > > WebApplicationContext[/api,rest]
> > > > > > > 2009-10-07 13:27:28,639 INFO org.mortbay.http.SocketListener: Started
> > > > > > > SocketListener on 0.0.0.0:60010
> > > > > > > 2009-10-07 13:27:28,639 INFO org.mortbay.util.Container: Started
> > > > > > > org.mortbay.jetty.ser...@28b301f2
> > > > > > > 2009-10-07 13:27:28,640 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > > > Server Responder: starting
> > > > > > > 2009-10-07 13:27:28,641 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > > > Server listener on 60000: starting
> > > > > > > 2009-10-07 13:27:28,641 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > > > Server handler 0 on 60000: starting
> > > > > > > 2009-10-07 13:27:28,641 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > > > Server handler 1 on 60000: starting
> > > > > > > 2009-10-07 13:27:28,641 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > > > Server handler 2 on 60000: starting
> > > > > > > 2009-10-07 13:27:28,642 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > > > Server handler 3 on 60000: starting
> > > > > > > 2009-10-07 13:27:28,642 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > > > Server handler 4 on 60000: starting
> > > > > > > 2009-10-07 13:27:28,642 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > > > Server handler 5 on 60000: starting
> > > > > > > 2009-10-07 13:27:28,642 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > > > Server handler 6 on 60000: starting
> > > > > > > 2009-10-07 13:27:28,642 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > > > Server handler 7 on 60000: starting
> > > > > > > 2009-10-07 13:27:28,642 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > > > Server handler 8 on 60000: starting
> > > > > > > 2009-10-07 13:27:28,642 DEBUG org.apache.hadoop.hbase.master.HMaster:
> > > > > > > Started service threads
> > > > > > > 2009-10-07 13:27:28,643 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > > > Server handler 9 on 60000: starting
> > > > > > > 2009-10-07 13:28:09,519 INFO org.apache.hadoop.hbase.master.RegionManager:
> > > > > > > in safe mode
> > > > > > > 2009-10-07 13:28:11,542 INFO org.apache.hadoop.hbase.master.RegionManager:
> > > > > > > in safe mode
> > > > > > > 2009-10-07 13:28:13,543 INFO org.apache.hadoop.hbase.master.RegionManager:
> > > > > > > in safe mode
> > > > > > > 2009-10-07 13:28:15,545 INFO org.apache.hadoop.hbase.master.RegionManager:
> > > > > > > in safe mode
> > > > > > > 2009-10-07 13:28:17,548 INFO org.apache.hadoop.hbase.master.RegionManager:
> > > > > > > in safe mode
> > > > > > > 2009-10-07 13:28:19,555 INFO org.apache.hadoop.hbase.master.RegionManager:
> > > > > > > in safe mode
> > > > > > > 2009-10-07 13:28:27,834 INFO org.apache.hadoop.hbase.master.BaseScanner:
> > > > > > > All 0 .META. region(s) scanned
> > > > > > > 2009-10-07 13:29:27,832 INFO org.apache.hadoop.hbase.master.BaseScanner:
> > > > > > > All 0 .META. region(s) scanned
> > > > > > > 2009-10-07 13:29:37,593 INFO org.apache.hadoop.hbase.master.RegionManager:
> > > > > > > in safe mode
> > > > > > > 2009-10-07 13:30:27,834 INFO org.apache.hadoop.hbase.master.BaseScanner:
> > > > > > > All 0 .META. region(s) scanned
> > > > > > > 2009-10-07 13:31:27,836 INFO org.apache.hadoop.hbase.master.BaseScanner:
> > > > > > > All 0 .META. region(s) scanned
> > > > > > > 2009-10-07 13:32:27,838 INFO org.apache.hadoop.hbase.master.BaseScanner:
> > > > > > > All 0 .META. region(s) scanned
> > > > > > > 2009-10-07 13:33:27,840 INFO org.apache.hadoop.hbase.master.BaseScanner:
> > > > > > > All 0 .META. region(s) scanned
> > > > > > >
> > > > > > > Ananth T Sarathy
> > > > > > >
> > > > > > > On Wed, Oct 7, 2009 at 1:20 PM, stack <st...@duboce.net> wrote:
> > > > > > >
> > > > > > > > That's interesting to hear. Keep us posted.
> > > > > > > >
> > > > > > > > HBase asks the filesystem if it's in safe mode and, if it is, it
> > > > > > > > parks itself.
> > > > > > > > Here is code from master:
> > > > > > > >
> > > > > > > >   if (this.fs instanceof DistributedFileSystem) {
> > > > > > > >     // Make sure dfs is not in safe mode
> > > > > > > >     String message = "Waiting for dfs to exit safe mode...";
> > > > > > > >     while (((DistributedFileSystem) fs).setSafeMode(
> > > > > > > >         FSConstants.SafeModeAction.SAFEMODE_GET)) {
> > > > > > > >       LOG.info(message);
> > > > > > > >       try {
> > > > > > > >         Thread.sleep(this.threadWakeFrequency);
> > > > > > > >       } catch (InterruptedException e) {
> > > > > > > >         // continue
> > > > > > > >       }
> > > > > > > >     }
> > > > > > > >   }
> > > > > > > >
> > > > > > > > Then there is HBase's notion of safe mode. It will be in safe mode
> > > > > > > > until it does an initial scan of the catalog tables. The master keeps
> > > > > > > > a flag in zookeeper while it's in safe mode so regionservers are
> > > > > > > > aware of the state:
> > > > > > > >
> > > > > > > >   public boolean inSafeMode() {
> > > > > > > >     if (safeMode) {
> > > > > > > >       if (isInitialMetaScanComplete() && regionsInTransition.size() == 0 &&
> > > > > > > >           tellZooKeeperOutOfSafeMode()) {
> > > > > > > >         master.connection.unsetRootRegionLocation();
> > > > > > > >         safeMode = false;
> > > > > > > >         LOG.info("exiting safe mode");
> > > > > > > >       } else {
> > > > > > > >         LOG.info("in safe mode");
> > > > > > > >       }
> > > > > > > >     }
> > > > > > > >     return safeMode;
> > > > > > > >   }
> > > > > > > >
> > > > > > > > Have you seen the .META. and -ROOT- regions deploy to regionservers?
> > > > > > > > Have you seen these regions being scanned in the master log? (Enable
> > > > > > > > DEBUG if not already enabled.)
> > > > > > > >
> > > > > > > > Yours,
> > > > > > > > ST.Ack
> > > > > > > >
> > > > > > > > On Wed, Oct 7, 2009 at 10:06 AM, Ananth T.
> > > > > > > > Sarathy <ananth.t.sara...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > We have been running HBase on an S3 filesystem. It's the hbase
> > > > > > > > > regionserver, not HDFS, since we are using S3. We haven't felt like
> > > > > > > > > it's been too slow, though the amount of data we are pushing isn't
> > > > > > > > > large enough to notice yet.
> > > > > > > > >
> > > > > > > > > Ananth T Sarathy
> > > > > > > > >
> > > > > > > > > On Wed, Oct 7, 2009 at 12:47 PM, stack <st...@duboce.net> wrote:
> > > > > > > > >
> > > > > > > > > > HBase or HDFS is in safe mode. My guess is that it's the latter.
> > > > > > > > > > Can you figure from the HDFS logs why it won't leave safe mode?
> > > > > > > > > > Usually under-replication or the loss of a large swath of the
> > > > > > > > > > cluster will flip on the safe-mode switch.
> > > > > > > > > >
> > > > > > > > > > Are you trying to run HBase on an S3 filesystem? An HBasista
> > > > > > > > > > tried it in the past and, FYI, found it insufferably slow. Let
> > > > > > > > > > us know how it goes for you.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > St.Ack
> > > > > > > > > >
> > > > > > > > > > On Wed, Oct 7, 2009 at 9:33 AM, Ananth T. Sarathy <
> > > > > > > > > > ananth.t.sara...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > my regionserver has been stuck in safemode. What can I do to
> > > > > > > > > > > get it out of safemode?
> > > > > > > > > > >
> > > > > > > > > > > Ananth T Sarathy
> > >
> > > __________________________________________________
> > > Do You Yahoo!?
> > > Tired of spam? Yahoo! Mail has the best spam protection around
> > > http://mail.yahoo.com
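For readers following the thread: the master-side wait that stack quoted is, at its core, a polling loop that sleeps while the filesystem (or HBase's own catalog scan) reports safe mode. Here is a minimal, self-contained sketch of that pattern. `SafeModeWait`, the `BooleanSupplier` stand-in for `DistributedFileSystem.setSafeMode(...)`, the 10 ms sleep (in place of `threadWakeFrequency`), and the `maxPolls` give-up bound are all illustrative inventions, not HBase API; the real master polls indefinitely.

```java
import java.util.function.BooleanSupplier;

public class SafeModeWait {
    /**
     * Poll until the supplier reports safe mode has been left, or give up
     * after maxPolls attempts. Returns true if safe mode was exited.
     */
    static boolean waitForSafeModeExit(BooleanSupplier inSafeMode, int maxPolls)
            throws InterruptedException {
        int polls = 0;
        while (inSafeMode.getAsBoolean()) {
            if (++polls >= maxPolls) {
                return false; // still in safe mode; caller must investigate
            }
            Thread.sleep(10); // stand-in for threadWakeFrequency
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate a namenode that leaves safe mode after three checks.
        int[] checksLeft = {3};
        boolean exited = waitForSafeModeExit(() -> checksLeft[0]-- > 0, 10);
        System.out.println(exited ? "exited safe mode" : "gave up: still in safe mode");
        // prints "exited safe mode"
    }
}
```

The practical consequence discussed in the thread follows directly from this shape: if the underlying condition never clears (here, a master that never sees regionservers check in, so the catalog scan never completes), the loop logs "in safe mode" forever, which is exactly the repeating RegionManager line in the posted log.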