> Are you trying to run HBASE on an S3 filesystem?  An HBasista tried it in
> the past and, FYI, found it insufferably slow.  Let us know how it goes for
> you.
Hi HBasers,

I'm a little late to this conversation, but I thought I should add my 2¢.

I would recommend NOT writing directly to Hadoop's S3 file systems from HBase. Not for speed reasons (I don't know how it would perform), but because S3 is eventually consistent.

Hadoop tends to assume that its underlying distributed file system is consistent. HDFS is consistent, so this works for most users, but the assumption breaks down when you are using one of the S3 file systems (s3:// or s3n://). There are places in Hadoop that write a file and then immediately go to read it again. Normally S3 reaches consistency quickly enough for this not to be a problem, but sometimes it can take a little longer. In most of these cases, Hadoop assumes that if the file isn't there now, it never will be (since HDFS is consistent), so it either ignores the missing file or throws an error. Unless HBase was specifically architected to tolerate eventually consistent datastores, I imagine problems will crop up in production.

I'll admit I'm not familiar with HBase's internals, but I can imagine a situation like this: HBase decides a log file has gotten too large and wants to split it. It finishes writing and then closes the file. (With S3N, the file is actually uploaded to S3 during the close, so this takes longer than it would with HDFS.) As soon as close() returns, HBase opens the file for reading, but the file might not have appeared yet. What does HBase do then? I don't know. Before I trusted HBase on S3 with important data, I'd want to verify that it handles eventual consistency properly.

Also, S3N doesn't support append, which I believe HBase uses in the newer versions (or will soon).

Again, I'm not intimately familiar with HBase's internals; I'm just presenting my worries. Stack and others, please correct me if I'm wrong and HBase already takes this into account.
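To make the read-after-close race concrete, here's a small pure-Python sketch of an eventually consistent store. Everything in it (the class, the delay model, the retry helper) is a hypothetical simulation for illustration, not actual S3 or HBase code:

```python
import random
import time

class EventuallyConsistentStore:
    """Toy model of an S3-like store: a write succeeds immediately,
    but readers only see the key after a short random delay."""

    def __init__(self, visibility_delay=0.05):
        self._data = {}        # committed values
        self._visible_at = {}  # key -> time when reads start seeing it
        self._delay = visibility_delay

    def put(self, key, value):
        # The write "succeeds" now, but may not be visible yet.
        self._data[key] = value
        self._visible_at[key] = time.monotonic() + random.uniform(0, self._delay)

    def get(self, key):
        # A read shortly after the write can miss the key entirely --
        # it looks exactly like "file not found".
        if key in self._data and time.monotonic() >= self._visible_at[key]:
            return self._data[key]
        return None

def read_with_retry(store, key, attempts=50, backoff=0.01):
    """What a consistency-aware client would do: retry instead of
    assuming a missing key will never appear (the HDFS-style assumption)."""
    for _ in range(attempts):
        value = store.get(key)
        if value is not None:
            return value
        time.sleep(backoff)
    raise FileNotFoundError(key)

store = EventuallyConsistentStore()
store.put("hlog-0001", b"log contents")
# A naive get() right here may return None; retrying eventually succeeds.
assert read_with_retry(store, "hlog-0001") == b"log contents"
```

The point of the sketch: code written against HDFS treats a single `get()` returning "not found" as final, which is exactly the step that goes wrong on an eventually consistent store.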
My suggestion would be to run HDFS on your cluster, tell HBase to write to HDFS, and then make periodic snapshots of your data to S3.

Regards,
Andrew

On Wed, Oct 7, 2009 at 9:47 AM, stack <[email protected]> wrote:

> HBase or HDFS is in safe mode.  My guess is that it's the latter.  Can you
> figure from HDFS logs why it won't leave safe mode?  Usually
> under-replication or a loss of a large swath of the cluster will flip on
> the safe-mode switch.
>
> Are you trying to run HBASE on an S3 filesystem?  An HBasista tried it in
> the past and, FYI, found it insufferably slow.  Let us know how it goes for
> you.
>
> Thanks,
> St.Ack
>
> On Wed, Oct 7, 2009 at 9:33 AM, Ananth T. Sarathy <
> [email protected]> wrote:
>
>> my regionserver has been stuck in safemode. What can I do to get it out
>> of safemode?
>>
>> Ananth T Sarathy
>>
