-1 on this idea as suggested. Even Google does not distribute
GFS or Bigtable across data centers (see the Bigtable paper at
http://labs.google.com/papers/bigtable.html ). What the paper
does not mention is that they can replicate a table to multiple
data centers for business continuity and backup. This is on the
road map for HBase but is still quite a way down the road.

In addition, we do want to add 'rack awareness' within a data
center for fault tolerance and load balancing. This is also
not going to happen in the immediate future.

We are currently focusing on making what we have more fault
tolerant and are starting to work on performance issues.

Answers to your two questions inline below.

---
Jim Kellerman, Senior Engineer; Powerset


> -----Original Message-----
> From: Andrew Purtell [mailto:[EMAIL PROTECTED]
> Sent: Monday, January 07, 2008 8:49 PM
> To: hadoop-dev
> Subject: [hbase] table HRegionServer affinity
>
> Hello,
>
> Consider the case of a global federation of Hadoop clusters,
> with a single global HBase master, divided into a number of
> geographic regions each with a local DFS, local workload, and
> region server backed by that DFS. This setup allows for a
> global HBase space, where any region may retrieve rows stored
> by any other region -- which is quite useful -- but, in
> addition to this, it would also be useful to be able to
> specify constraints on data mobility and also to be able to
> scope queries to a particular region.
>
> To be a bit more specific, I have three things in mind:
>
> 1) The ability to fix a given key range to a region. This

I assume here you mean geographic region and not an HBase
table region.

> would both assign a range to a given region, and also disable
> splitting over that range. Aside from API changes, ideally
> there would be a HBase shell command to support this.
>
> 2) Syntactic support in HBase shell for table affinity to a
> given region server:
>
>      CREATE TABLE ... REGION=10.10.10.10
>
> (or similar) This would fix an entire table to a region.
>
> 3) Query support for scoping the result set based on region
> server:
>
>      SELECT ... WHERE @REGION=10.10.10.10 AND ...
>
> (or similar)
>
> Given the inflexibility of IP or hostnames to name regions,
> perhaps a mechanism for assigning logical labels to a region
> server (or even group of region servers, where a prohibition
> on splitting may be relaxed to allow splitting over the
> group) would also be useful.
>
> As I am still coming up to speed on Hadoop and HBase and the
> code base, I kindly ask for the answers to two questions.
>
> First: How invasive to the HBase master/region model is the
> concept of specifying constraints on data mobility?

It would be very disruptive. The current model is that you
run one or more HBase clusters per HDFS cluster. An HBase
cluster does not span HDFS clusters.

As far as I know HDFS clusters do not span data centers.
Latency and network partitioning would be big problems for
a system that requires sub-second response times.

> Second: How difficult would the modifications be to accomplish?

A change such as this would require major changes to the
architecture and to our vision of the model going forward
(replication between data centers, with a single table residing
in multiple data centers and served by separate HBase
instances running on separate HDFS clusters).

> I believe these questions to be related. :-)
>
> Thanks,
>
> Andrew Purtell
> Advanced Threats Research
> Trend Micro, Inc., Pasadena, CA, USA
> (personal mail)