Hello,

Consider a global federation of Hadoop clusters with a single global HBase master, divided into a number of geographic regions, each with a local DFS, a local workload, and a region server backed by that DFS. This setup provides a global HBase space in which any region may retrieve rows stored by any other region, which is quite useful. In addition, it would be useful to be able to specify constraints on data mobility and to scope queries to a particular region.

To be a bit more specific, I have three things in mind:

1) The ability to fix a given key range to a region. This would both assign the range to a given region and disable splitting over that range. Aside from API changes, ideally there would be an HBase shell command to support this.

2) Syntactic support in the HBase shell for table affinity to a given region server:

     CREATE TABLE ... REGION=10.10.10.10    (or similar)

   This would fix an entire table to a region.

3) Query support for scoping the result set based on region server:

     SELECT ... WHERE @REGION=10.10.10.10 AND ...    (or similar)

Given the inflexibility of IP addresses or hostnames as names for regions, perhaps a mechanism for assigning logical labels to a region server (or even to a group of region servers, where the prohibition on splitting may be relaxed to allow splitting over the group) would also be useful.

As I am still coming up to speed on Hadoop, HBase, and the code base, I kindly ask for answers to two questions. First: how invasive to the HBase master/region model is the concept of specifying constraints on data mobility? Second: how difficult would the modifications be to accomplish? I believe these questions to be related. :-)

Thanks,

Andrew Purtell
Advanced Threats Research
Trend Micro, Inc., Pasadena, CA, USA
(personal mail)