Re: query number of data nodes

stack Sun, 17 Aug 2008 15:37:32 -0700

Yoav Morag wrote:

J-D, thanks for the quick response.
I currently use the getRegionsInfo method from HTable and scan for the
RegionInfo that has the same address as <this> node. I suppose under the
hood it does about what you suggest below. I would love to have a more solid
node ID (access to the local startcode?) though.

Currently you have to scan .META. to get the serverstartcode. (What will your
crawler do when the startcode changes?)

You can subclass HRegionServer (see the just-added
regionserver.transactional.TransactionalRegionServer for an example). Maybe a
CrawlingRegionServer would make sense in your case?

Also, perhaps you know whether it is possible to manually assign multiple
regions to a specific server? (I thought of implementing a consistent hashing
scheme to distribute the keys among regions.) The questions are whether
hbase will split the regions by itself in that case, and whether multiple
regions per node are supported.


Many regions per node is 'normal'.

Override the assignment function to change its behavior. See
master.RegionManager#assignRegions. The current implementation balances the
regions across the cluster. If it's hard for you to get your assignment
function into place, let us know -- or patch it -- and we'll fix/commit it.

Regionservers run the region splits. When done, they tell the master which
region was split (it is now closed) and the names of the new daughters. The
master then assigns the new regions, which are then 'opened' on their new
servers, and away we go again.
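
For reference, the consistent hashing scheme mentioned in the question can be
sketched on its own, independent of HBase. This is only a minimal
illustration: the class ConsistentHashRing, its methods, and the node names
are hypothetical, it does not call any HBase API, and it is not an
implementation of RegionManager#assignRegions. It only shows the property
that makes the scheme attractive: removing a node moves only the keys that
node owned.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of HBase: a hash ring with virtual nodes.
class ConsistentHashRing {
    // Ring position -> owning node name, kept sorted by position.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Place the node on the ring at several pseudo-random positions.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    // Remove all positions of the node (hash collisions between
    // different nodes are ignored in this sketch).
    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // A key belongs to the first node at or after its position,
    // wrapping around to the start of the ring.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes on the ring");
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // First 8 bytes of the MD5 digest, used as the ring position.
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xff);
            }
            return h;
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

Whether this fits well with HBase is a separate question, since regions are
contiguous ranges of sorted row keys and are split by the regionservers as
described above.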


St.Ack

Yoav.

On Sun, Aug 17, 2008 at 5:24 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:

Yoav,

If this is what you want to do (I'm not very good at crawler design, so my
opinion on your solution may not be the best), you can query the .META.
table directly. This way you get the region assignments and their row
ranges.
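
As a rough sketch of what the crawler could do with that information: once
the assignment rows have been read, keep only the regions hosted at the local
address and test each candidate key against their ranges. Everything named
here (LocalRanges, RegionRow, localRegions, isLocal, the host:port strings)
is hypothetical and no HBase API is called; the sketch uses String keys where
HBase uses byte arrays, and assumes the usual convention that a region covers
[startKey, endKey) with an empty key meaning "unbounded".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase.
class LocalRanges {
    // One assignment row: a key range and the server hosting it.
    static final class RegionRow {
        final String startKey;
        final String endKey;
        final String server;

        RegionRow(String startKey, String endKey, String server) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.server = server;
        }

        // Start is inclusive, end is exclusive; "" means no bound.
        boolean contains(String rowKey) {
            boolean afterStart = startKey.isEmpty() || rowKey.compareTo(startKey) >= 0;
            boolean beforeEnd = endKey.isEmpty() || rowKey.compareTo(endKey) < 0;
            return afterStart && beforeEnd;
        }
    }

    // Keep only the regions assigned to the given server address.
    static List<RegionRow> localRegions(List<RegionRow> all, String localServer) {
        List<RegionRow> local = new ArrayList<RegionRow>();
        for (RegionRow r : all) {
            if (r.server.equals(localServer)) {
                local.add(r);
            }
        }
        return local;
    }

    // True if the key falls inside any locally hosted region.
    static boolean isLocal(List<RegionRow> local, String rowKey) {
        for (RegionRow r : local) {
            if (r.contains(rowKey)) {
                return true;
            }
        }
        return false;
    }
}
```

The list would have to be refreshed whenever regions split or move, since the
assignment is not fixed.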

J-D

On Sun, Aug 17, 2008 at 9:45 AM, Yoav Morag <[EMAIL PROTECTED]> wrote:

No, what I am trying to do is build a distributed crawler that limits the
scope of each *local* crawler node to the range of keys currently stored in
the local DB node, so I am trying to get distribution info from HBase to
synchronize with it.
The node# query was just an idea for some internal load balancing among
the crawler nodes -
what I probably really need is a way to get only the *local* hbase range.

Maybe you can give me a tip on this too? :-)

Yoav.

On Sun, Aug 17, 2008 at 3:18 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:

Ah yeah, sorry, the method is not exposed, and casting HMI to HM will surely
fail since it's a proxy. Maybe this is a feature we can expose in
HBaseAdmin; I guess it would be cleaner. BTW, what's your need exactly? Are
you building some kind of custom management interface?

thx

J-D

On Sun, Aug 17, 2008 at 4:39 AM, <[EMAIL PROTECTED]> wrote:

Thanks, J-D.
BTW, getMaster() returns an HMasterInterface, which I am forced to cast
to HMaster to get that functionality. Is it solid to assume I can always do
this cast?
Yoav.

Jean-Daniel Cryans wrote:

Yoav,

HBaseAdmin.getMaster().getServersToServerInfo()

J-D

On Thu, Aug 14, 2008 at 8:44 AM, yoav.morag <[EMAIL PROTECTED]> wrote:

Hi guys,
does anyone know how I can query hbase (or hadoop) for the number of
physical data nodes (slaves)?
Yoav.
--
View this message in context:

http://www.nabble.com/query-number-of-data-nodes-tp18980802p18980802.html

Sent from the HBase User mailing list archive at Nabble.com.

Quoted from:

http://www.nabble.com/query-number-of-data-nodes-tp18980802p18981558.html
