Re: How can i get hbase table memory used? Why hdfs size of hbase table double when i use bulkload?

2016-05-02 Thread Ted Yu
For #1, consider increasing hfile.block.cache.size (assuming the majority of
your reads are not point gets).

FYI
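For reference, the block cache fraction is set in hbase-site.xml. The value below is only an illustrative example, not a recommendation; the right number depends on how much heap the memstores and working set need:

```xml
<!-- hbase-site.xml: fraction of the region server heap given to the
     block cache. Raising it favors read-heavy workloads at the cost
     of memstore and working heap. Example value only. -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.5</value>
</property>
```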

On Mon, May 2, 2016 at 6:41 PM, Jone Zhang  wrote:

> For #1
> My workload is read heavy.
> I use bulkload to write data once a day.
>
> Thanks.


Re: hbase architecture doubts

2016-05-02 Thread Stack
On Mon, May 2, 2016 at 5:34 PM, Shushant Arora 
wrote:

> Thanks Stack.
>
> 1. So at any time there will be two references: 1. active memstore,
> 2. snapshot memstore?
> The snapshot will be initialised at flush time from the active memstore with a
> momentary lock, then the old active will be discarded, reads will be served
> using the snapshot, and writes will go to the new active memstore.
>
>
Yes


> 2. The key of the CSLS is a KeyValue. Which part of the KeyValue is used while
> sorting the set: the whole KeyValue or just the row key? Does an HFile have a
> separate entry for each KeyValue, and are KeyValues of the same row key always
> stored contiguously in the HFile while possibly not in the same block?
>
>
Just the row key. Value is not considered in the sort.

Yes, an HFile has a separate entry for each KeyValue (or 'Cell' in hbase-speak).

Cells in HFile are sorted. Those of the same or near 'Cell' coordinates
will be sorted together and may therefore appear inside the same block.

St.Ack
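The two-reference scheme confirmed above (an active map, plus a snapshot taken under a brief lock while the references are swapped) can be sketched roughly as follows. `MiniMemStore` and its method names are invented for illustration; this is not HBase source:

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch of the memstore flush hand-off, not HBase code.
class MiniMemStore {
    private volatile ConcurrentSkipListMap<String, String> active = new ConcurrentSkipListMap<>();
    private volatile ConcurrentSkipListMap<String, String> snapshot = new ConcurrentSkipListMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    void put(String key, String value) {
        lock.readLock().lock();          // many writers share the "read" side of the lock
        try {
            active.put(key, value);
        } finally {
            lock.readLock().unlock();
        }
    }

    // The exclusive lock is held only long enough to swap references;
    // dumping the snapshot to disk happens afterwards, in the background.
    ConcurrentSkipListMap<String, String> snapshotForFlush() {
        lock.writeLock().lock();
        try {
            snapshot = active;
            active = new ConcurrentSkipListMap<>();
            return snapshot;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Reads consult the new active map first, then the snapshot being flushed.
    String get(String key) {
        String v = active.get(key);
        return (v != null) ? v : snapshot.get(key);
    }
}
```

Writes arriving after the swap land in the new active map, which is why an in-flight flush never loses them.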




Re: How can i get hbase table memory used? Why hdfs size of hbase table double when i use bulkload?

2016-05-02 Thread Jone Zhang
For #1
My workload is read heavy.
I use bulkload to write data once a day.

Thanks.

2016-04-30 1:13 GMT+08:00 Ted Yu :

> For #1, can you clarify whether your workload is read heavy, write heavy or
> mixed load of read and write ?
>
> For #2, have you run major compaction after the second bulk load ?
>
> On Thu, Apr 28, 2016 at 9:16 PM, Jone Zhang 
> wrote:
>
> > *1. How can I get the memory used by an HBase table?*
> > *2. Why does the HDFS size of an HBase table double when I use bulkload?*
> >
> > bulkload file to qimei_info
> >
> > 101.7 G  /user/hbase/data/default/qimei_info
> >
> > bulkload same file to qimei_info again
> >
> > 203.3 G  /user/hbase/data/default/qimei_info
> >
> > hbase(main):001:0> describe 'qimei_info'
> > DESCRIPTION
> >
> >
> >  'qimei_info', {NAME => 'f', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER
> =>
> > 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '
> >
> >  1', COMPRESSION => 'LZO', MIN_VERSIONS => '0', TTL => '2147483647',
> > KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
> >
> >   IN_MEMORY => 'false', BLOCKCACHE => 'true'}
> >
> >
> > 1 row(s) in 1.4170 seconds
> >
> >
> > *Best wishes.*
> > *Thanks.*
> >
>
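On #2: loading the same file twice leaves two sets of hfiles holding duplicate cells, which explains the doubled HDFS footprint. Since the describe output shows VERSIONS => '1', a major compaction should rewrite the hfiles and drop the duplicates, reclaiming the space. A sketch of the check, using the table name from this thread:

```
hbase(main):001:0> major_compact 'qimei_info'
```

Once the compaction finishes, re-running the `hadoop fs -du` check on /user/hbase/data/default/qimei_info should show the size back near the single-load figure.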


Re: hbase architecture doubts

2016-05-02 Thread Shushant Arora
Thanks Stack.

1. So at any time there will be two references: 1. active memstore,
2. snapshot memstore?
The snapshot will be initialised at flush time from the active memstore with a
momentary lock, then the old active will be discarded, reads will be served
using the snapshot, and writes will go to the new active memstore.

2. The key of the CSLS is a KeyValue. Which part of the KeyValue is used while
sorting the set: the whole KeyValue or just the row key? Does an HFile have a
separate entry for each KeyValue, and are KeyValues of the same row key always
stored contiguously in the HFile while possibly not in the same block?


Re: hbase architecture doubts

2016-05-02 Thread Stack
On Mon, May 2, 2016 at 10:06 AM, Shushant Arora 
wrote:

> Thanks Stack
>
> for point 2:
> I am concerned with downtime of HBase for reads and writes.
> If the write lock is held just for the time while we move aside the current
> MemStore,
> then when a write happens to a key it will update the memstore only, but
> the snapshot does not have that update, so when the snapshot is dumped to an
> HFile won't we lose the update?
>
>
>
No. The update is in the new currently active MemStore. The update will be
included in the next flush added to a new hfile.

St.Ack







Re: hbase architecture doubts

2016-05-02 Thread Shushant Arora
Thanks Stack

for point 2:
I am concerned with downtime of HBase for reads and writes.
If the write lock is held just for the time while we move aside the current
MemStore,
then when a write happens to a key it will update the memstore only, but
the snapshot does not have that update, so when the snapshot is dumped to an
HFile won't we lose the update?




Re: hbase architecture doubts

2016-05-02 Thread Stack
On Mon, May 2, 2016 at 1:25 AM, Shushant Arora 
wrote:

> Thanks!
>
> A few doubts:
>
> 1. An LSM tree comprises two tree-like structures, called C0 and C1.
> If an insertion causes the C0 component to exceed a certain size
> threshold, a contiguous segment of entries is removed from C0 and merged
> into C1 on disk.
>
> But in HBase, when C0 (which is the memstore, I guess?) exceeds the threshold
> size it is dumped onto HDFS as an HFile (C1, I guess?) - and is compaction
> the process which here means merging C0 and C1?
>
>
The 'merge' in the quoted high-level description may just mean that the
dumped hfile is 'merged' with the others at read time. Or it may be as
stated, that the 'merge' happens at flush time. Some LSM tree
implementations do it this way -- Bigtable, and it calls the merge of
memstore and a file-on-disk a form of compaction -- but this is not what
HBase does; it just dumps the memstore as a flushed hfile. Later, we'll run
a compaction process to merge hfiles in background.
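The "merged at read time" variant can be sketched as consulting sorted layers newest-first: the memstore, then the hfiles from newest to oldest, with a newer layer shadowing older ones. This is a deliberate simplification of HBase's scanner merge, and `LayeredReader` is an invented name:

```java
import java.util.List;
import java.util.NavigableMap;

// Illustrative read-time merge: consult sources newest-first and
// return the first hit. HBase proper merges scanners across layers;
// this collapses that idea to single-key gets.
class LayeredReader {
    private final List<NavigableMap<String, String>> newestFirst;

    LayeredReader(List<NavigableMap<String, String>> newestFirst) {
        this.newestFirst = newestFirst;
    }

    String get(String key) {
        for (NavigableMap<String, String> layer : newestFirst) {
            String v = layer.get(key);
            if (v != null) return v;   // a newer layer shadows older ones
        }
        return null;                   // key in no layer
    }
}
```

Compaction then just reduces the number of layers a read has to consult.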



> 2.Moves current, active Map aside as a snapshot (while a write lock is held
> for a short period of time), and then creates a new CSLS instances.
>
> In background, the snapshot is then dumped to disk. We get an Iterator on
> CSLS. We write a block at a time. When we exceed configured block size, we
> start a new one.
>
> -- Is the write lock held until the complete CSLS has been dumped to
> disk?



No. Just while we move aside the current MemStore.

What is your concern/objective? Are you studying LSM trees generally or are
you worried that HBase is offline for periods of time for read and write?

Thanks,
St.Ack



> And read is allowed using snapshot.
>
>



> Thanks!
>
>
>


[ANNOUNCE] Apache HBase 0.98.19 is now available for download

2016-05-02 Thread Andrew Purtell
Apache HBase 0.98.19 is now available for download. Get it from an Apache
mirror [1] or Maven repository. The list of changes in this release can be
found in the release notes [2] or at the bottom of this announcement.

Thanks to all who contributed to this release.

Best,
The HBase Dev Team

1. http://www.apache.org/dyn/closer.lua/hbase/
2. https://s.apache.org/z92R


HBASE-11830 TestReplicationThrottler.testThrottling failed on virtual boxes
HBASE-12148 Remove TimeRangeTracker as point of contention when many
threads writing a Store
HBASE-12511 namespace permissions - add support from table creation
privilege in a namespace 'C'
HBASE-12663 unify getTableDescriptors() and
listTableDescriptorsByNamespace()
HBASE-12674 Add permission check to getNamespaceDescriptor()
HBASE-13700 Allow Thrift2 HSHA server to have configurable threads
HBASE-14809 Grant / revoke Namespace admin permission to group
HBASE-14870 Backport namespace permissions to 98 branch
HBASE-14983 Create metrics for per block type hit/miss ratios
HBASE-15191 CopyTable and VerifyReplication - Option to specify batch size,
versions
HBASE-15212 RPCServer should enforce max request size
HBASE-15234 ReplicationLogCleaner can abort due to transient ZK issues
HBASE-15368 Add pluggable window support
HBASE-15386 PREFETCH_BLOCKS_ON_OPEN in HColumnDescriptor is ignored
HBASE-15389 Write out multiple files when compaction
HBASE-15400 Use DateTieredCompactor for Date Tiered Compaction
HBASE-15405 Synchronize final results logging single thread in PE, fix
wrong defaults in help message
HBASE-15412 Add average region size metric
HBASE-15460 Fix infer issues in hbase-common
HBASE-15475 Allow TimestampsFilter to provide a seek hint
HBASE-15479 No more garbage or beware of autoboxing
HBASE-15527 Refactor Compactor related classes
HBASE-15548 SyncTable: sourceHashDir is supposed to be optional but won't
work without
HBASE-15569 Make Bytes.toStringBinary faster
HBASE-15582 SnapshotManifestV1 too verbose when there are no regions
HBASE-15587 FSTableDescriptors.getDescriptor() logs stack trace erronously
HBASE-15614 Report metrics from JvmPauseMonitor
HBASE-15621 Suppress Hbase SnapshotHFile cleaner error messages when a
snapshot is going on
HBASE-15622 Superusers does not consider the keytab credentials
HBASE-15627 Miss space and closing quote in
AccessController#checkSystemOrSuperUser
HBASE-15629 Backport HBASE-14703 to 0.98+
HBASE-15637 TSHA Thrift-2 server should allow limiting call queue size
HBASE-15640 L1 cache doesn't give fair warning that it is showing partial
stats only when it hits limit
HBASE-15647 Backport HBASE-15507 to 0.98
HBASE-15650 Remove TimeRangeTracker as point of contention when many
threads reading a StoreFile
HBASE-15661 Hook up JvmPauseMonitor metrics in Master
HBASE-15662 Hook up JvmPauseMonitor to REST server
HBASE-15663 Hook up JvmPauseMonitor to ThriftServer
HBASE-15664 Use Long.MAX_VALUE instead of HConstants.FOREVER in
CompactionPolicy
HBASE-15665 Support using different StoreFileComparators for different
CompactionPolicies
HBASE-15672
hadoop.hbase.security.visibility.TestVisibilityLabelsWithDeletes fails
HBASE-15673 [PE tool] Fix latency metrics for multiGet
HBASE-15679 Assertion on wrong variable in
TestReplicationThrottler#testThrottling


Re: hbase architecture doubts

2016-05-02 Thread Shushant Arora
Thanks!

A few doubts:

1. An LSM tree comprises two tree-like structures, called C0 and C1.
If an insertion causes the C0 component to exceed a certain size
threshold, a contiguous segment of entries is removed from C0 and merged
into C1 on disk.

But in HBase, when C0 (which is the memstore, I guess?) exceeds the threshold
size it is dumped onto HDFS as an HFile (C1, I guess?) - and is compaction
the process which here means merging C0 and C1?

2.Moves current, active Map aside as a snapshot (while a write lock is held
for a short period of time), and then creates a new CSLS instances.

In background, the snapshot is then dumped to disk. We get an Iterator on
CSLS. We write a block at a time. When we exceed configured block size, we
start a new one.

-- Is the write lock held until the complete CSLS has been dumped to
disk? And is reading allowed using the snapshot?

Thanks!
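The "write a block at a time, start a new one when the configured block size is exceeded" step restated above can be sketched as below. Names are invented and sizes simplified; real HFile blocks also carry headers, encodings, and an index:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Cut a sorted stream of cells into blocks of roughly blockSize bytes,
// in the spirit of dumping a CSLS iterator into HFile blocks.
class BlockCutter {
    static List<List<String>> toBlocks(Iterator<String> cells, int blockSize) {
        List<List<String>> blocks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int bytesInBlock = 0;
        while (cells.hasNext()) {
            String cell = cells.next();
            current.add(cell);
            bytesInBlock += cell.length();
            if (bytesInBlock >= blockSize) {  // block full: seal it, start a new one
                blocks.add(current);
                current = new ArrayList<>();
                bytesInBlock = 0;
            }
        }
        if (!current.isEmpty()) blocks.add(current);  // flush the final partial block
        return blocks;
    }
}
```

Note a block is closed only after the cell that overflows it, so blocks can exceed the configured size by up to one cell; that matches the "when we exceed configured block size, we start a new one" description.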



On Mon, May 2, 2016 at 11:39 AM, Stack  wrote:

> On Sun, May 1, 2016 at 3:36 AM, Shushant Arora 
> wrote:
>
> > 1. Does HBase use a ConcurrentSkipListMap (CSLM) to store data in the memstore?
> >
> > Yes (We use a CSLS but this is implemented over a CSLM).
>
>
> > 2. When the memstore is flushed to HDFS, does it dump the memstore
> > ConcurrentSkipList as an HFile? Then how does it calculate blocks out of
> > the CSLM and dump them to HDFS?
> >
> >
> Moves current, active Map aside as a snapshot (while a write lock is held
> for a short period of time), and then creates a new CSLS instances.
>
> In background, the snapshot is then dumped to disk. We get an Iterator on
> CSLS. We write a block at a time. When we exceed configured block size, we
> start a new one.
>
>
> > 3. After dumping the in-memory CSLM of the memstore to an HFile, is the
> > memstore content discarded?
>
>
> Yes
>
>
>
> > And if a read request comes while the memstore is being dumped,
> > will it be served from a copy of the memstore, or will discarding the
> > memstore be blocked until the read request completes?
> >
> > We will respond using the snapshot until it has been successfully dumped.
> Once dumped, we'll respond using the hfile.
>
> No blocking (other than for the short period during which the snapshot is
> made and the file is swapped into the read path).
>
>
>
> > 4. When a read request comes, does it look in the in-memory CSLM and then in
> > the HFiles?
>
>
> Generally, yes.
>
>
>
> > And what is a Log-Structured Merge tree, and what is its usage in HBase?
> >
> >
> Suggest you read up on LSM Trees (
> https://en.wikipedia.org/wiki/Log-structured_merge-tree) and if you still
> can't see the LSM tree in the HBase forest, ask specific questions and
> we'll help you out.
>
> St.Ack
>
>
>
>
> > Thanks!
> >
>