Yoav Morag wrote:
J-D, thanks for the quick response.
I currently use the getRegionsInfo method from HTable and scan for the
RegionInfo which has the same address as <this> node; I suppose under the
hood it does about what you suggest below. I would love to have a more solid
node ID (access to the local startcode?) though.
You have to scan .META. to get serverstartcode currently (What'll your crawler do when startcode changes?).

You can subclass HRegionServer (See the just-added regionserver.transactional.TransactionalRegionServer for an example). Maybe a CrawlingRegionServer would make sense in your case?
Also, perhaps you know whether it is possible to manually assign multiple
regions to a specific server? (I thought of implementing a consistent hash
scheme to distribute the keys among regions.) The questions are whether
HBase will split the regions by itself in that case, and whether multiple
regions per node are supported.
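
For readers unfamiliar with the consistent hash idea mentioned above, here is a minimal sketch in plain Java. Nothing in it is HBase API: the class name ConsistentHashRing, the use of CRC32 as the hash, and the virtual-node count are all assumptions made for illustration. HBase itself partitions by sorted row ranges, so a ring like this would sit in the crawler, not in the store.

```java
import java.nio.charset.StandardCharsets;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Hypothetical consistent-hash ring for mapping keys to nodes; not part of HBase. */
public class ConsistentHashRing {
    // Ring position -> node name, kept sorted so we can walk "clockwise".
    private final TreeMap<Long, String> ring = new TreeMap<Long, String>();
    private final int virtualNodes;

    public ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // CRC32 is an arbitrary choice here; any well-spread hash would do.
    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Places several virtual slots for the node on the ring. */
    public void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    /** Removes only the slots that still belong to this node. */
    public void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i), node);
        }
    }

    /** Returns the node owning the first slot at or after the key's hash, wrapping around. */
    public String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes on the ring");
        }
        Long slot = ring.ceilingKey(hash(key));
        return ring.get(slot != null ? slot : ring.firstKey());
    }
}
```

The point of the ring is that removing a node only moves the keys that node owned; keys owned by other nodes stay where they were.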

Many regions per node is 'normal'.

Override the assignment function to change its behavior. See master.RegionManager#assignRegions. Current implementation balances the regions across the cluster. If it's hard for you to get your assignment function into place, let us know -- or patch it -- and we'll fix/commit it.
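
To make "balances the regions across the cluster" concrete, here is a rough sketch of one possible balancing policy: hand each unassigned region to the server that currently holds the fewest. This is not the code in master.RegionManager#assignRegions; the class LeastLoadedAssigner, its method, and the plain String names for servers and regions are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical least-loaded assignment policy; not HBase's actual implementation. */
public class LeastLoadedAssigner {

    /**
     * Returns a plan of region -> server. Each region goes to the server with the
     * fewest regions at that moment; ties go to the server listed first.
     */
    public static Map<String, String> assign(Map<String, Integer> regionCounts,
                                             List<String> unassigned) {
        if (regionCounts.isEmpty()) {
            throw new IllegalArgumentException("no servers available");
        }
        // Work on a copy so the caller's counts are left untouched.
        Map<String, Integer> load = new LinkedHashMap<String, Integer>(regionCounts);
        Map<String, String> plan = new LinkedHashMap<String, String>();
        for (String region : unassigned) {
            String target = null;
            for (Map.Entry<String, Integer> e : load.entrySet()) {
                if (target == null || e.getValue() < load.get(target)) {
                    target = e.getKey();
                }
            }
            plan.put(region, target);
            load.put(target, load.get(target) + 1);
        }
        return plan;
    }
}
```

A custom policy (for instance, pinning a set of regions to one server) would replace the inner loop that picks the target.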

Regionservers run the region splits. When done, they tell the master the region that was split -- now closed -- and the names of the new daughters. Master then assigns new regions which are then 'opened' on the new servers and away we go again.

St.Ack
Yoav.

On Sun, Aug 17, 2008 at 5:24 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:

Yoav,

If this is what you want to do (I'm not very good in crawler design, so my
opinion on your solution may not be the best), you can query the .META.
table directly. This way you have the regions assignment and their row
range.
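
Once the assignment rows have been read (from .META. or otherwise), picking out the ranges hosted on one node is a simple filter. The sketch below does not touch the HBase client API: RegionRow is a hypothetical stand-in for one row of assignment info, and the addresses in the usage note are made up.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: filter region assignment rows down to one server. */
public class LocalRegions {

    /** Stand-in for one assignment row: a key range plus the hosting server. */
    public static final class RegionRow {
        public final String startKey;      // inclusive; "" means start of table
        public final String endKey;        // exclusive; "" means end of table
        public final String serverAddress; // host:port of the hosting server

        public RegionRow(String startKey, String endKey, String serverAddress) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.serverAddress = serverAddress;
        }
    }

    /** Returns the rows whose server address equals localAddress, in input order. */
    public static List<RegionRow> hostedBy(List<RegionRow> rows, String localAddress) {
        List<RegionRow> local = new ArrayList<RegionRow>();
        for (RegionRow row : rows) {
            if (localAddress.equals(row.serverAddress)) {
                local.add(row);
            }
        }
        return local;
    }
}
```

For example, calling hostedBy(rows, "10.0.0.1:60020") on a three-region table where the first and last regions live on 10.0.0.1 returns those two rows. The result is only a snapshot: regions can move or split, so it would need refreshing.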

J-D

On Sun, Aug 17, 2008 at 9:45 AM, Yoav Morag <[EMAIL PROTECTED]> wrote:

no, what I am trying to do is to build a distributed crawler that will limit
the scope of the *local* crawler node to the range of the keys currently
stored in the local DB node. so I am trying to get distribution info from
HBase to synchronize with it.
the node# query was just an idea to have some internal load balance among
the crawler nodes -
what I probably really need is a way to get only the *local* hbase range -
maybe you can tip me on this too ? :-)
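
As an illustration of "limiting the local crawler to the local key range", here is a small sketch of the membership test. It assumes HBase-style ranges: start key inclusive, end key exclusive, keys compared as unsigned bytes, and an empty key meaning "unbounded". The class KeyRange and its methods are hypothetical, not HBase API.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical key range with inclusive start and exclusive end. */
public class KeyRange {
    private final byte[] start; // empty array = from the beginning
    private final byte[] end;   // empty array = to the end

    public KeyRange(byte[] start, byte[] end) {
        this.start = start.clone();
        this.end = end.clone();
    }

    /** Convenience factory for UTF-8 string keys. */
    public static KeyRange of(String start, String end) {
        return new KeyRange(start.getBytes(StandardCharsets.UTF_8),
                            end.getBytes(StandardCharsets.UTF_8));
    }

    /** Lexicographic comparison treating each byte as unsigned. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    /** True if start <= key < end, with empty bounds treated as open. */
    public boolean contains(byte[] key) {
        boolean afterStart = start.length == 0 || compare(key, start) >= 0;
        boolean beforeEnd = end.length == 0 || compare(key, end) < 0;
        return afterStart && beforeEnd;
    }

    public boolean contains(String key) {
        return contains(key.getBytes(StandardCharsets.UTF_8));
    }
}
```

A crawler node would keep one KeyRange per locally hosted region and only fetch a URL key if some range contains it.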

Yoav.

On Sun, Aug 17, 2008 at 3:18 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
Ah yeah sorry, the method is not exposed and casting HMI to HM will surely
fail since it's a proxy. Maybe this is a feature we can expose in
HBaseAdmin, I guess it would be cleaner. BTW, what's your need exactly? Are
you building some kind of custom management interface?

thx

J-D

On Sun, Aug 17, 2008 at 4:39 AM, <[EMAIL PROTECTED]> wrote:

thanks, J-D.
b.t.w, getMaster() returns an HMasterInterface, which I am forced to cast
to HMaster to get that functionality. Is it solid to assume I can always do
this cast ?
Yoav.

Jean-Daniel Cryans wrote:
Yoav,

HBaseAdmin.getMaster().getServersToServerInfo()

J-D

On Thu, Aug 14, 2008 at 8:44 AM, yoav.morag <[EMAIL PROTECTED]> wrote:
hi guys
does anyone know how I can query hbase (or hadoop) for the number of
physical data nodes (slaves) ?
Yoav.
--
View this message in context:

http://www.nabble.com/query-number-of-data-nodes-tp18980802p18980802.html
Sent from the HBase User mailing list archive at Nabble.com.


Quoted from:

http://www.nabble.com/query-number-of-data-nodes-tp18980802p18981558.html

