> Are you trying to run HBASE on an S3 filesystem?  An HBasista tried it in
> the past and, FYI, found it insufferably slow.  Let us know how it goes for
> you.
Hi HBasers,

I'm a little late to this conversation, but I thought I should add my 2¢.

I would recommend NOT writing directly to Hadoop's S3 file systems from HBase. Not for speed reasons (I don't know how it would perform), but because S3 is eventually consistent.

Hadoop tends to assume that its underlying distributed file system is consistent. HDFS is consistent, so this works for most users, but the assumption breaks down when you are using one of the S3 file systems (s3:// or s3n://). There are places in Hadoop that write a file and then immediately go to read it again. Normally S3 reaches consistency quickly enough for this not to be a problem, but sometimes it can take a little longer. In most of these cases, Hadoop assumes that if the file isn't there now, it never will be (since HDFS is consistent), so it either ignores the missing file or throws an error. Unless HBase was specifically architected to tolerate eventually consistent datastores, I imagine problems will crop up in production.

I'll admit I'm not familiar with HBase's internals, but I can imagine a situation like this: HBase decides a log file has gotten too large and wants to split it. It finishes writing and then closes the file. (With S3N, the file is actually uploaded to S3 during the close, so this takes longer than it would with HDFS.) As soon as close() returns, HBase opens the file for reading, but the file might not have appeared yet. What does HBase do then? I don't know. Before I trusted HBase on S3 with important data, I'd want to verify that it handles eventual consistency properly.

Also, S3N doesn't support append, which I believe HBase uses in the newer versions (or will soon).

Again, I'm not intimately familiar with HBase's internals; I'm just presenting my worries. Stack and others, please correct me if I'm wrong and HBase already takes this into account.
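To make the read-after-close race concrete, here's a small pure-Python sketch of an eventually consistent store. Everything in it (the class, the delay model, the retry helper) is a hypothetical simulation for illustration, not actual S3 or HBase code:

```python
import random
import time

class EventuallyConsistentStore:
    """Toy model of an S3-like store: a write succeeds immediately,
    but readers only see the key after a short random delay."""

    def __init__(self, visibility_delay=0.05):
        self._data = {}        # committed values
        self._visible_at = {}  # key -> time when reads start seeing it
        self._delay = visibility_delay

    def put(self, key, value):
        # The write "succeeds" now, but may not be visible yet.
        self._data[key] = value
        self._visible_at[key] = time.monotonic() + random.uniform(0, self._delay)

    def get(self, key):
        # A read shortly after the write can miss the key entirely --
        # it looks exactly like "file not found".
        if key in self._data and time.monotonic() >= self._visible_at[key]:
            return self._data[key]
        return None

def read_with_retry(store, key, attempts=50, backoff=0.01):
    """What a consistency-aware client would do: retry instead of
    assuming a missing key will never appear (the HDFS-style assumption)."""
    for _ in range(attempts):
        value = store.get(key)
        if value is not None:
            return value
        time.sleep(backoff)
    raise FileNotFoundError(key)

store = EventuallyConsistentStore()
store.put("hlog-0001", b"log contents")
# A naive get() right here may return None; retrying eventually succeeds.
assert read_with_retry(store, "hlog-0001") == b"log contents"
```

The point of the sketch: code written against HDFS treats a single `get()` returning "not found" as final, which is exactly the step that goes wrong on an eventually consistent store.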
My suggestion would be to run HDFS on your cluster, tell HBase to write to HDFS, and then make periodic snapshots of your data to S3.

Regards,
Andrew

On Wed, Oct 7, 2009 at 9:47 AM, stack <[email protected]> wrote:

> HBase or HDFS is in safe mode.  My guess is that it's the latter.  Can you
> figure from HDFS logs why it won't leave safe mode?  Usually
> under-replication or a loss of a large swath of the cluster will flip on
> the safe-mode switch.
>
> Are you trying to run HBASE on an S3 filesystem?  An HBasista tried it in
> the past and, FYI, found it insufferably slow.  Let us know how it goes for
> you.
>
> Thanks,
> St.Ack
>
> On Wed, Oct 7, 2009 at 9:33 AM, Ananth T. Sarathy <
> [email protected]> wrote:
>
>> my regionserver has been stuck in safemode. What can I do to get it out
>> of safemode?
>>
>> Ananth T Sarathy
>>
