One possibility is that you loaded data, but not enough to trigger a flush; then there appeared to be some network-related problem, and you killed the regionservers hard (-9?) while the filesystem was unavailable. That unfortunate string of circumstances would cause data loss. However, you said the cluster had been running for 6 days, so a major compaction (runs once every 24 hours) would have flushed and persisted data. Is there anything in the bucket? (hadoop fs -lsr ...)
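The bucket check suggested above boils down to "is there anything under the rootdir?". Against the real cluster that is `hadoop fs -lsr s3://BUCKET/hbasedata`; since no cluster is assumed here, the sketch below demonstrates the same test against a local stand-in directory. `RootdirCheck`, `countEntries`, and the sample `-ROOT-/70236052` layout are illustrative inventions, not HBase API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RootdirCheck {
    /** Count entries under a rootdir path; 0 means nothing was ever persisted. */
    static long countEntries(Path rootdir) throws IOException {
        try (var walk = Files.walk(rootdir)) {
            // Files.walk includes the root itself, so exclude it from the count.
            return walk.filter(p -> !p.equals(rootdir)).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the S3 rootdir; a populated one contains at least -ROOT-.
        Path rootdir = Files.createTempDirectory("hbasedata");
        Files.createDirectories(rootdir.resolve("-ROOT-").resolve("70236052"));
        long n = countEntries(rootdir);
        System.out.println(n > 0
            ? "rootdir has content"
            : "rootdir is empty: data was never persisted");
        // prints "rootdir has content"
    }
}
```

An empty listing here would mean no flush or compaction ever wrote to the bucket, consistent with the data-loss scenario described above.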
0.20 is definitely the way to go, for a number of reasons.

   - Andy

________________________________
From: Ananth T. Sarathy <ananth.t.sara...@gmail.com>
To: hbase-user@hadoop.apache.org
Sent: Wed, October 7, 2009 12:46:24 PM
Subject: Re: hbase on s3 and safemode

thanks for all the help

<property>
  <name>hbase.rootdir</name>
  <value>s3://hbase2.s3.amazonaws.com:80/hbasedata</value>
  <description>The directory shared by region servers.
  Should be fully-qualified to include the filesystem to use.
  E.g: hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR
  </description>
</property>

that's in our hbase-site.xml

We had been running for about 6 days with no issues. At 1:30 this morning it
just crapped out. We are thinking about just moving to 0.20.0 and starting
over.

Ananth T Sarathy

On Wed, Oct 7, 2009 at 3:41 PM, Andrew Purtell <apurt...@apache.org> wrote:

> Did you edit hbase-site.xml such that HBase data directories are not in
> /tmp? Maybe a silly question... but it happens sometimes.
>
> If your hbase.rootdir points to an HDFS filesystem, what does 'hadoop fs
> -lsr hdfs://namenode:port/path/to/hbase/root' show?
>
> You said this was working before? Did you shut down and bring HBase back
> up before without trouble? Is this a new install?
>
> - Andy
>
> ________________________________
> From: Ananth T. Sarathy <ananth.t.sara...@gmail.com>
> To: hbase-user@hadoop.apache.org
> Sent: Wed, October 7, 2009 12:34:28 PM
> Subject: Re: hbase on s3 and safemode
>
> ok. so we finally got the regionserver to come up (we killed all the
> processes on the box and finally the regionserver came back up), but when
> it did, there is no data in our tables, though the tables are there. Any
> ideas where the data went or how I can get it back?
>
> Ananth T Sarathy
>
> On Wed, Oct 7, 2009 at 2:46 PM, Andrew Purtell <apurt...@apache.org> wrote:
>
> > One option is to add SYSV init scripts that on boot take the following
> > equivalent actions:
> >
> >   hbase-daemon.sh start zookeeper
> >   hbase-daemon.sh start master
> >   hbase-daemon.sh start regionserver
> >
> > Set the respective init scripts to run according to host role.
> >
> > This presumes you have also added init scripts that start up DFS daemons
> > wherever they should be, equivalents to the following:
> >
> >   hadoop-daemon.sh start namenode
> >   hadoop-daemon.sh start datanode
> >   hadoop-daemon.sh start secondarynamenode
> >
> > You can start everything up all at once. The respective daemons will wait
> > for each other's services to become available. Ignore ZK noise in the
> > logs about connection difficulties unless they persist for minutes.
> >
> > If you want to try out the Cloudera Hadoop distribution for 0.20, they
> > have RPMs that will take care of all of this for you, and we have an RPM
> > for that platform that I can provide you.
> >
> > Do also check your network configuration.
> >
> > - Andy
> >
> > ________________________________
> > From: Ananth T. Sarathy <ananth.t.sara...@gmail.com>
> > To: hbase-user@hadoop.apache.org
> > Sent: Wed, October 7, 2009 11:36:22 AM
> > Subject: Re: hbase on s3 and safemode
> >
> > is there a way to turn my regionservers on individually besides
> > start-hbase.sh?
> >
> > Ananth T Sarathy
> >
> > On Wed, Oct 7, 2009 at 2:31 PM, Andrew Purtell <apurt...@apache.org>
> > wrote:
> >
> > > HBase won't leave safe mode if the regionservers cannot contact the
> > > master. So the question is why your regionservers cannot contact the
> > > master. If the regionserver processes are confirmed running, then it's
> > > most likely a firewall or AWS Security Groups config problem.
> > >
> > > status was a shell command added in 0.20, IIRC.
> > >
> > > - Andy
> > >
> > > ________________________________
> > > From: Ananth T. Sarathy <ananth.t.sara...@gmail.com>
> > > To: hbase-user@hadoop.apache.org
> > > Sent: Wed, October 7, 2009 11:04:03 AM
> > > Subject: Re: hbase on s3 and safemode
> > >
> > > i suppose we need to, but for now it's kind of a pain because we need
> > > to coordinate our clients.
> > >
> > > But the problem is: why was it working, and then all of a sudden it's
> > > stuck in safemode, and how can we get back up?
> > >
> > > Ananth T Sarathy
> > >
> > > On Wed, Oct 7, 2009 at 1:58 PM, stack <st...@duboce.net> wrote:
> > >
> > > > Can you update to 0.20.0? (Oodles of improvements).
> > > > St.Ack
> > > >
> > > > On Wed, Oct 7, 2009 at 10:56 AM, Ananth T. Sarathy <
> > > > ananth.t.sara...@gmail.com> wrote:
> > > >
> > > > > I get an error
> > > > >
> > > > > hbase(main):001:0> status "detailed"
> > > > > NoMethodError: undefined method `status' for #<Object:0x5585c0de>
> > > > >         from (hbase):2
> > > > > hbase(main):002:0> status "detailed"
> > > > > NoMethodError: undefined method `status' for #<Object:0x5585c0de>
> > > > >         from (hbase):3
> > > > >
> > > > > we are running 0.19.3
> > > > >
> > > > > Ananth T Sarathy
> > > > >
> > > > > On Wed, Oct 7, 2009 at 1:51 PM, stack <st...@duboce.net> wrote:
> > > > >
> > > > > > This state persists even if you shut down hbase and zk and
> > > > > > restart?
> > > > > >
> > > > > > In shell, do:
> > > > > >
> > > > > > > status "detailed"
> > > > > >
> > > > > > At the top there is a section which says regions in transition.
> > > > > > Anything there?
> > > > > >
> > > > > > St.Ack
> > > > > >
> > > > > > On Wed, Oct 7, 2009 at 10:35 AM, Ananth T. Sarathy <
> > > > > > ananth.t.sara...@gmail.com> wrote:
> > > > > >
> > > > > > > Here is the log since I started it...
> > > > > > > Wed Oct 7 13:27:26 EDT 2009 Starting master on ip-10-244-9-171
> > > > > > > ulimit -n 1024
> > > > > > > 2009-10-07 13:27:26,404 INFO org.apache.hadoop.hbase.master.HMaster:
> > > > > > > vmName=Java HotSpot(TM) 64-Bit Server VM, vmVendor=Sun Microsystems
> > > > > > > Inc., vmVersion=14.2-b01
> > > > > > > 2009-10-07 13:27:26,405 INFO org.apache.hadoop.hbase.master.HMaster:
> > > > > > > vmInputArguments=[-Xmx2000m, -XX:+HeapDumpOnOutOfMemoryError,
> > > > > > > -Djava.io.tmpdir=/mnt/tmp,
> > > > > > > -Dhbase.log.dir=/mnt/apps/hadoop/hbase/bin/../logs,
> > > > > > > -Dhbase.log.file=hbase-root-master-ip-10-244-9-171.log,
> > > > > > > -Dhbase.home.dir=/mnt/apps/hadoop/hbase/bin/.., -Dhbase.id.str=root,
> > > > > > > -Dhbase.root.logger=INFO,DRFA,
> > > > > > > -Djava.library.path=/mnt/apps/hadoop/hbase/bin/../lib/native/Linux-amd64-64]
> > > > > > > 2009-10-07 13:27:27,525 INFO org.apache.hadoop.hbase.master.HMaster:
> > > > > > > Root region dir: s3://hbase2.s3.amazonaws.com:80/hbasedata/-ROOT-/70236052
> > > > > > > 2009-10-07 13:27:27,751 INFO org.apache.hadoop.hbase.ipc.HBaseRpcMetrics:
> > > > > > > Initializing RPC Metrics with hostName=HMaster, port=60000
> > > > > > > 2009-10-07 13:27:27,827 INFO org.apache.hadoop.hbase.master.HMaster:
> > > > > > > HMaster initialized on 10.244.9.171:60000
> > > > > > > 2009-10-07 13:27:27,829 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> > > > > > > Initializing JVM Metrics with processName=Master, sessionId=HMaster
> > > > > > > 2009-10-07 13:27:27,830 INFO
> > > > > > > org.apache.hadoop.hbase.master.metrics.MasterMetrics: Initialized
> > > > > > > 2009-10-07 13:27:27,932 INFO org.mortbay.util.Credential: Checking
> > > > > > > Resource aliases
> > > > > > > 2009-10-07 13:27:27,936 INFO org.mortbay.http.HttpServer: Version
> > > > > > > Jetty/5.1.4
> > > > > > > 2009-10-07 13:27:27,936 INFO org.mortbay.util.Container: Started
> > > > > > > HttpContext[/logs,/logs]
> > > > > > > 2009-10-07 13:27:28,202 INFO org.mortbay.util.Container: Started
> > > > > > > org.mortbay.jetty.servlet.webapplicationhand...@3209fa8f
> > > > > > > 2009-10-07 13:27:28,244 INFO org.mortbay.util.Container: Started
> > > > > > > WebApplicationContext[/static,/static]
> > > > > > > 2009-10-07 13:27:28,361 INFO org.mortbay.util.Container: Started
> > > > > > > org.mortbay.jetty.servlet.webapplicationhand...@b0c0f66
> > > > > > > 2009-10-07 13:27:28,364 INFO org.mortbay.util.Container: Started
> > > > > > > WebApplicationContext[/,/]
> > > > > > > 2009-10-07 13:27:28,636 INFO org.mortbay.util.Container: Started
> > > > > > > org.mortbay.jetty.servlet.webapplicationhand...@3c2d7440
> > > > > > > 2009-10-07 13:27:28,638 INFO org.mortbay.util.Container: Started
> > > > > > > WebApplicationContext[/api,rest]
> > > > > > > 2009-10-07 13:27:28,639 INFO org.mortbay.http.SocketListener: Started
> > > > > > > SocketListener on 0.0.0.0:60010
> > > > > > > 2009-10-07 13:27:28,639 INFO org.mortbay.util.Container: Started
> > > > > > > org.mortbay.jetty.ser...@28b301f2
> > > > > > > 2009-10-07 13:27:28,640 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > > > Server Responder: starting
> > > > > > > 2009-10-07 13:27:28,641 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > > > Server listener on 60000: starting
> > > > > > > 2009-10-07 13:27:28,641 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > > > Server handler 0 on 60000: starting
> > > > > > > 2009-10-07 13:27:28,641 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > > > Server handler 1 on 60000: starting
> > > > > > > 2009-10-07 13:27:28,641 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > > > Server handler 2 on 60000: starting
> > > > > > > 2009-10-07 13:27:28,642 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > > > Server handler 3 on 60000: starting
> > > > > > > 2009-10-07 13:27:28,642 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > > > Server handler 4 on 60000: starting
> > > > > > > 2009-10-07 13:27:28,642 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > > > Server handler 5 on 60000: starting
> > > > > > > 2009-10-07 13:27:28,642 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > > > Server handler 6 on 60000: starting
> > > > > > > 2009-10-07 13:27:28,642 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > > > Server handler 7 on 60000: starting
> > > > > > > 2009-10-07 13:27:28,642 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > > > Server handler 8 on 60000: starting
> > > > > > > 2009-10-07 13:27:28,642 DEBUG org.apache.hadoop.hbase.master.HMaster:
> > > > > > > Started service threads
> > > > > > > 2009-10-07 13:27:28,643 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > > > Server handler 9 on 60000: starting
> > > > > > > 2009-10-07 13:28:09,519 INFO org.apache.hadoop.hbase.master.RegionManager:
> > > > > > > in safe mode
> > > > > > > 2009-10-07 13:28:11,542 INFO org.apache.hadoop.hbase.master.RegionManager:
> > > > > > > in safe mode
> > > > > > > 2009-10-07 13:28:13,543 INFO org.apache.hadoop.hbase.master.RegionManager:
> > > > > > > in safe mode
> > > > > > > 2009-10-07 13:28:15,545 INFO org.apache.hadoop.hbase.master.RegionManager:
> > > > > > > in safe mode
> > > > > > > 2009-10-07 13:28:17,548 INFO org.apache.hadoop.hbase.master.RegionManager:
> > > > > > > in safe mode
> > > > > > > 2009-10-07 13:28:19,555 INFO org.apache.hadoop.hbase.master.RegionManager:
> > > > > > > in safe mode
> > > > > > > 2009-10-07 13:28:27,834 INFO org.apache.hadoop.hbase.master.BaseScanner:
> > > > > > > All 0 .META. region(s) scanned
> > > > > > > 2009-10-07 13:29:27,832 INFO org.apache.hadoop.hbase.master.BaseScanner:
> > > > > > > All 0 .META. region(s) scanned
> > > > > > > 2009-10-07 13:29:37,593 INFO org.apache.hadoop.hbase.master.RegionManager:
> > > > > > > in safe mode
> > > > > > > 2009-10-07 13:30:27,834 INFO org.apache.hadoop.hbase.master.BaseScanner:
> > > > > > > All 0 .META. region(s) scanned
> > > > > > > 2009-10-07 13:31:27,836 INFO org.apache.hadoop.hbase.master.BaseScanner:
> > > > > > > All 0 .META. region(s) scanned
> > > > > > > 2009-10-07 13:32:27,838 INFO org.apache.hadoop.hbase.master.BaseScanner:
> > > > > > > All 0 .META. region(s) scanned
> > > > > > > 2009-10-07 13:33:27,840 INFO org.apache.hadoop.hbase.master.BaseScanner:
> > > > > > > All 0 .META. region(s) scanned
> > > > > > >
> > > > > > > Ananth T Sarathy
> > > > > > >
> > > > > > > On Wed, Oct 7, 2009 at 1:20 PM, stack <st...@duboce.net> wrote:
> > > > > > >
> > > > > > > > That's interesting to hear. Keep us posted.
> > > > > > > >
> > > > > > > > HBase asks the filesystem if it's in safe mode and, if it is, it
> > > > > > > > parks itself.
> > > > > > > > Here is code from master:
> > > > > > > >
> > > > > > > >   if (this.fs instanceof DistributedFileSystem) {
> > > > > > > >     // Make sure dfs is not in safe mode
> > > > > > > >     String message = "Waiting for dfs to exit safe mode...";
> > > > > > > >     while (((DistributedFileSystem) fs).setSafeMode(
> > > > > > > >         FSConstants.SafeModeAction.SAFEMODE_GET)) {
> > > > > > > >       LOG.info(message);
> > > > > > > >       try {
> > > > > > > >         Thread.sleep(this.threadWakeFrequency);
> > > > > > > >       } catch (InterruptedException e) {
> > > > > > > >         // continue
> > > > > > > >       }
> > > > > > > >     }
> > > > > > > >   }
> > > > > > > >
> > > > > > > > Then there is HBase's notion of safe mode. It will be in safe mode
> > > > > > > > until it does an initial scan of the catalog tables. The master keeps
> > > > > > > > a flag in zookeeper while it's in safe mode so regionservers are
> > > > > > > > aware of the state:
> > > > > > > >
> > > > > > > >   public boolean inSafeMode() {
> > > > > > > >     if (safeMode) {
> > > > > > > >       if (isInitialMetaScanComplete() && regionsInTransition.size() == 0 &&
> > > > > > > >           tellZooKeeperOutOfSafeMode()) {
> > > > > > > >         master.connection.unsetRootRegionLocation();
> > > > > > > >         safeMode = false;
> > > > > > > >         LOG.info("exiting safe mode");
> > > > > > > >       } else {
> > > > > > > >         LOG.info("in safe mode");
> > > > > > > >       }
> > > > > > > >     }
> > > > > > > >     return safeMode;
> > > > > > > >   }
> > > > > > > >
> > > > > > > > Have you seen the .META. and -ROOT- regions deploy to regionservers?
> > > > > > > > Have you seen these regions being scanned in the master log? (Enable
> > > > > > > > DEBUG if not already enabled.)
> > > > > > > >
> > > > > > > > Yours,
> > > > > > > > ST.Ack
> > > > > > > >
> > > > > > > > On Wed, Oct 7, 2009 at 10:06 AM, Ananth T.
> > > > > > > > Sarathy <ananth.t.sara...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > We have been running HBase on an S3 filesystem. It's the hbase
> > > > > > > > > regionserver, not HDFS, since we are using S3. We haven't felt like
> > > > > > > > > it's been too slow, though the amount of data we are pushing isn't
> > > > > > > > > large enough to notice yet.
> > > > > > > > >
> > > > > > > > > Ananth T Sarathy
> > > > > > > > >
> > > > > > > > > On Wed, Oct 7, 2009 at 12:47 PM, stack <st...@duboce.net> wrote:
> > > > > > > > >
> > > > > > > > > > HBase or HDFS is in safe mode. My guess is that it's the latter.
> > > > > > > > > > Can you figure from the HDFS logs why it won't leave safe mode?
> > > > > > > > > > Usually under-replication or the loss of a large swath of the
> > > > > > > > > > cluster will flip on the safe-mode switch.
> > > > > > > > > >
> > > > > > > > > > Are you trying to run HBase on an S3 filesystem? An HBasista
> > > > > > > > > > tried it in the past and, FYI, found it insufferably slow. Let
> > > > > > > > > > us know how it goes for you.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > St.Ack
> > > > > > > > > >
> > > > > > > > > > On Wed, Oct 7, 2009 at 9:33 AM, Ananth T. Sarathy <
> > > > > > > > > > ananth.t.sara...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > my regionserver has been stuck in safemode. What can I do to
> > > > > > > > > > > get it out of safemode?
> > > > > > > > > > >
> > > > > > > > > > > Ananth T Sarathy
> > >
> > > __________________________________________________
> > > Do You Yahoo!?
> > > Tired of spam? Yahoo! Mail has the best spam protection around
> > > http://mail.yahoo.com
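For readers following the thread: the master-side wait that stack quoted is, at its core, a polling loop that sleeps while the filesystem (or HBase's own catalog scan) reports safe mode. Here is a minimal, self-contained sketch of that pattern. `SafeModeWait`, the `BooleanSupplier` stand-in for `DistributedFileSystem.setSafeMode(...)`, the 10 ms sleep (in place of `threadWakeFrequency`), and the `maxPolls` give-up bound are all illustrative inventions, not HBase API; the real master polls indefinitely.

```java
import java.util.function.BooleanSupplier;

public class SafeModeWait {
    /**
     * Poll until the supplier reports safe mode has been left, or give up
     * after maxPolls attempts. Returns true if safe mode was exited.
     */
    static boolean waitForSafeModeExit(BooleanSupplier inSafeMode, int maxPolls)
            throws InterruptedException {
        int polls = 0;
        while (inSafeMode.getAsBoolean()) {
            if (++polls >= maxPolls) {
                return false; // still in safe mode; caller must investigate
            }
            Thread.sleep(10); // stand-in for threadWakeFrequency
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate a namenode that leaves safe mode after three checks.
        int[] checksLeft = {3};
        boolean exited = waitForSafeModeExit(() -> checksLeft[0]-- > 0, 10);
        System.out.println(exited ? "exited safe mode" : "gave up: still in safe mode");
        // prints "exited safe mode"
    }
}
```

The practical consequence discussed in the thread follows directly from this shape: if the underlying condition never clears (here, a master that never sees regionservers check in, so the catalog scan never completes), the loop logs "in safe mode" forever, which is exactly the repeating RegionManager line in the posted log.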