Re: hfile v2 and bloomfilter

2016-05-15 Thread Jerry He
Another good place to look is the design doc attached to the HFile v2
JIRA:
https://issues.apache.org/jira/browse/HBASE-3857

Jerry


On Sun, May 15, 2016 at 5:12 PM, Stack  wrote:

> On Sun, May 15, 2016 at 5:05 AM, Shushant Arora
> wrote:
>
> > In HFile v2, block-level Bloom filters are stored in the scanned section along
> > with the data blocks and leaf index.
> >
> > The load-on-open section contains Bloom filter data. What is this Bloom filter
> > data?
> >
>
> What are you referring to when you say 'on open section'?
>
>
> > 1. Does it contain an index of the Bloom chunks stored in the scanned section?
> > 2. What do the meta blocks of the non-scanned section contain?
> > 3. Does the leaf-level index contain row keys only? Will having a tall table vs.
> > a wide table affect the size of the leaf index?
> >
> >
> You might be better off reading the code. Have you tried? You'd get a more
> trustworthy answer (smile).
>
> St.Ack
>
>
>
> > Thanks!
> >
>


Re: hbase zookeeper lag

2016-05-15 Thread Stack
On Sat, May 14, 2016 at 7:43 PM, Shushant Arora 
wrote:

> Hi
>
> HBase uses ZooKeeper for various purposes, e.g. for region splits.
>
> The regionserver creates a znode in ZooKeeper with a splitting state, and the master
> gets a notification for this directory. Since ZooKeeper is not fully
> consistent, there may be a lag between the actual directory creation and the
> notification; in the meantime the regionserver will start splitting.
> 1. Will this lag create an issue? The region is already split in two, but the master
> does not even know about it until the ZooKeeper lag has cleared.
>
>
Have you seen an issue? First the regionserver 'asks' the master if it is
ok to split (before splitting). If the master says it is ok by changing the
znode state, then the regionserver proceeds, notifying the master via a state
change in zk. There could be some lag here of course given there is an RPC
-- and in hbase 2.0, the intent is to undo our going via zk -- but we've
not had this identified as a problem. Have you seen it as so?


>
> Also, when a regionserver goes down, the master will be notified, but there
> can be lag there too. So a node in ZooKeeper could be lagging a lot
> behind, say ~2 minutes, and the master would be notified only after 2 minutes.
>


There is RPC so there may be a lag, yes.



> 2. Won't this lag create an issue? The client will get region-not-reachable errors
> and will retry with backoff, but the actual recovery of the region server will
> start only after 2 minutes?
>
>
(Where'd you get the two minutes?)

Yes, lag could slow down recovery.

St.Ack



> Thanks!
>


Re: hfile v2 and bloomfilter

2016-05-15 Thread Stack
On Sun, May 15, 2016 at 5:05 AM, Shushant Arora 
wrote:

> In HFile v2, block-level Bloom filters are stored in the scanned section along
> with the data blocks and leaf index.
>
> The load-on-open section contains Bloom filter data. What is this Bloom filter
> data?
>

What are you referring to when you say 'on open section'?


> 1. Does it contain an index of the Bloom chunks stored in the scanned section?
> 2. What do the meta blocks of the non-scanned section contain?
> 3. Does the leaf-level index contain row keys only? Will having a tall table vs.
> a wide table affect the size of the leaf index?
>
>
You might be better off reading the code. Have you tried? You'd get a more
trustworthy answer (smile).
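
As a rough mental model only, not taken from the HBase source: the sketch below assumes what question 1 suggests, namely that the Bloom filter is split into chunks that sit in the body of the file, while a small index of each chunk's first key is what gets held in memory when the file is opened. The class names, chunk sizing, and hash scheme are all hypothetical and chosen for illustration.

```python
import bisect
import hashlib


class BloomChunk:
    """One small Bloom filter covering a contiguous range of sorted keys."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits          # must be a multiple of 8
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 (illustrative only).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class ChunkedBloom:
    """Hypothetical chunked Bloom filter: chunks plus an index of first keys."""

    def __init__(self, keys_per_chunk=4):
        self.keys_per_chunk = keys_per_chunk
        self.first_keys = []   # the small index (the "load on open" part)
        self.chunks = []       # the chunks (the "scanned section" part)
        self._count_in_last = 0

    def add(self, key):
        # Keys are assumed to arrive in sorted order, as they do in a sorted file.
        if not self.chunks or self._count_in_last >= self.keys_per_chunk:
            self.first_keys.append(key)
            self.chunks.append(BloomChunk())
            self._count_in_last = 0
        self.chunks[-1].add(key)
        self._count_in_last += 1

    def might_contain(self, key):
        # Use the index to pick the single chunk whose key range could hold the key.
        i = bisect.bisect_right(self.first_keys, key) - 1
        if i < 0:
            return False       # key sorts before everything in the file
        return self.chunks[i].might_contain(key)
```

The point of the split in this sketch is that a lookup touches only the index and one chunk, so the whole filter never has to be in memory at once.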

St.Ack



> Thanks!
>


Re: hbase block and columnfamily of a row

2016-05-15 Thread Stack
On Sat, May 14, 2016 at 10:46 PM, Shushant Arora 
wrote:

> Can an HBase table with a single column family have a row spanning
> multiple blocks in the same HFile?
>
>
Yes. A row could have many entries in it, too many for a single block.



> Suppose there is only one HFile. In that case, is it possible for a column family
> having 5-6 columns to span multiple blocks? Or is a block always
> closed at max(64k default, or when all columns of a column family for a
> single row fit in that block)?
>
>
It is possible. It is usual in fact.

At an extreme a single entry/Cell could fill a whole block if it were 64k
or bigger in size. A single Cell will never span blocks.

A block is completed as soon as it exceeds the configured block size.
Blocks are rarely exactly 64k in size.
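
A minimal sketch of the rule above, assuming a simplified cell of (row, qualifier, value) whose size is just the sum of the three lengths. `pack_blocks` and `cell_size` are hypothetical names for illustration, not HBase APIs, and real cells carry more overhead than this.

```python
def cell_size(cell):
    """Approximate size of a (row, qualifier, value) cell in bytes."""
    row, qualifier, value = cell
    return len(row) + len(qualifier) + len(value)


def pack_blocks(cells, block_size=64 * 1024):
    """Group sorted cells into blocks.

    A cell is always written whole into the current block; the block is
    closed only after its size has reached or passed block_size. So a cell
    never spans blocks, and blocks usually end up a bit over the limit.
    """
    blocks = []
    current = []
    current_bytes = 0
    for cell in cells:
        current.append(cell)
        current_bytes += cell_size(cell)
        if current_bytes >= block_size:
            blocks.append(current)
            current = []
            current_bytes = 0
    if current:
        blocks.append(current)   # last, possibly under-filled block
    return blocks


# One row with five columns and a tiny block size: the row ends up in two blocks.
row_cells = [(b"r1", ("c%d" % i).encode(), b"x" * 30) for i in range(5)]
blocks = pack_blocks(row_cells, block_size=100)
```

With the tiny block size used here the first block closes at 102 bytes, not 100, and the same row continues in the next block.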

St.Ack


> Thanks!
>


hfile v2 and bloomfilter

2016-05-15 Thread Shushant Arora
In HFile v2, block-level Bloom filters are stored in the scanned section along
with the data blocks and leaf index.

The load-on-open section contains Bloom filter data. What is this Bloom filter
data?
1. Does it contain an index of the Bloom chunks stored in the scanned section?
2. What do the meta blocks of the non-scanned section contain?
3. Does the leaf-level index contain row keys only? Will having a tall table vs.
a wide table affect the size of the leaf index?

Thanks!