Re: Hbase fast access

2016-10-24 Thread Mich Talebzadeh
Thanks Dave,

Yes, defragging is a process to get rid of fragmentation and block/page
chaining.

I must admit that HBase's architecture, in terms of memory management, is
similar to what products like Oracle or SAP ASE do. It sounds like, after a
long journey, memory is the best place to do data manipulation. The LSM-tree
structure is pretty impressive compared to the traditional B-tree access in an
RDBMS.
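For readers following the comparison, the general LSM-tree idea being discussed here can be sketched in a few lines of Python. This is an illustration of the technique only, not HBase's implementation; the class name, threshold parameter, and structure are all invented for the sketch:

```python
# Minimal LSM-tree sketch (illustrative only, not HBase code):
# writes go to a sorted in-memory buffer (the "memstore"); when it fills,
# it is flushed to an immutable sorted run; reads check the memstore first,
# then the runs from newest to oldest.

class TinyLSM:
    def __init__(self, flush_threshold=4):
        self.memstore = {}              # key -> value, mutable, in memory
        self.runs = []                  # immutable flushed runs, newest last
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        # An "update" is just another put: the newer value shadows the old one.
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Flushing creates a new immutable sorted run and clears the memstore.
        self.runs.append(dict(sorted(self.memstore.items())))
        self.memstore = {}

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for run in reversed(self.runs):  # newest run wins
            if key in run:
                return run[key]
        return None
```

Note how there is no in-place modification of flushed data anywhere: the "update" is absorbed entirely in memory, which is the contrast with B-tree page updates being drawn above.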



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 24 October 2016 at 18:53, Dave Birdsall <dave.birds...@esgyn.com> wrote:

> At a physical level HBase is append-only.
>
> At a logical level, one can update data in HBase just like one can in any
> RDBMS.
>
> The memstore/block cache and compaction logic are the mechanisms that
> bridge between these two views.
>
> What makes LSMs attractive performance-wise in comparison to traditional
> RDBMS storage architectures is that memory speeds and CPU speeds have
> increased at a faster rate than disk I/O transfer speeds.
>
> Even in a traditional RDBMS, though, it is useful to periodically perform
> file reorganizations, that is, to rewrite scattered disk blocks into sequence
> on disk. Many RDBMSs do this; Tandem did it as far back as the 1980s, for
> example. But caches were not large enough back then to support an LSM-style
> architecture.
>
> Dave
>
> -----Original Message-----
> From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
> Sent: Friday, October 21, 2016 2:09 PM
> To: user@hbase.apache.org
> Subject: Re: Hbase fast access
>
> I was asked an interesting question.
>
> Can one update data in HBase? My answer was that it is append-only.
>
> Can one update data in Hive? My answer was yes, if the table is created as
> ORC with the table property "transactional"="true":
>
>
> STORED AS ORC
> TBLPROPERTIES ( "orc.compress"="SNAPPY", "transactional"="true",
> "orc.create.index"="true", "orc.bloom.filter.columns"="object_id",
> "orc.bloom.filter.fpp"="0.05",
> "orc.stripe.size"="268435456",
> "orc.row.index.stride"="1" )
>
>
>
>
>
>
>
> On 21 October 2016 at 22:01, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > It is true in the sense that hfile, once written (and closed), becomes
> > immutable.
> >
> > Compaction would remove obsolete content and generate new hfiles.
> >
> > Cheers
> >
> > On Fri, Oct 21, 2016 at 1:59 PM, Mich Talebzadeh <
> > mich.talebza...@gmail.com>
> > wrote:
> >
> > > BTW. I always understood that Hbase is append only. is that
> > > generally
> > true?
> > >
> > > thx
> > >

Re: Hbase fast access

2016-10-21 Thread Mich Talebzadeh
I was asked an interesting question.

Can one update data in HBase? My answer was that it is append-only.

Can one update data in Hive? My answer was yes, if the table is created as
ORC with the table property "transactional"="true":


STORED AS ORC
TBLPROPERTIES ( "orc.compress"="SNAPPY",
"transactional"="true",
"orc.create.index"="true",
"orc.bloom.filter.columns"="object_id",
"orc.bloom.filter.fpp"="0.05",
"orc.stripe.size"="268435456",
"orc.row.index.stride"="1" )
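For context, the update mechanism behind "transactional"="true" is, roughly, that updates land in small delta files which readers merge with the base file at read time, and compaction periodically folds the deltas into a new base. A hedged sketch of that idea (illustrative only; the function names are invented and this is not Hive's actual code):

```python
# Hypothetical sketch of the Hive ACID idea: a base snapshot plus ordered
# delta files; a read merges them (later transactions win per row), and a
# major compaction rewrites everything as a single new base.

def read_merged(base, deltas):
    """Merge a base snapshot with delta files applied oldest-first."""
    view = dict(base)
    for delta in deltas:
        view.update(delta)   # a later delta overwrites earlier row versions
    return view

def compact(base, deltas):
    """Major compaction: fold base + deltas into one new base, no deltas left."""
    return read_merged(base, deltas), []
```

The parallel with HBase discussed in this thread is that neither system updates the stored file in place; both defer the merge of new versions over old ones to read time and compaction.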







On 21 October 2016 at 22:01, Ted Yu  wrote:

> It is true in the sense that hfile, once written (and closed), becomes
> immutable.
>
> Compaction would remove obsolete content and generate new hfiles.
>
> Cheers
>
> On Fri, Oct 21, 2016 at 1:59 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> wrote:
>
> > BTW. I always understood that Hbase is append only. is that generally
> true?
> >
> > thx
> >
> >
> >
> >
> > On 21 October 2016 at 21:57, Mich Talebzadeh 
> > wrote:
> >
> > > agreed much like any rdbms
> > >
> > >
> > >
> > >
> > >
> > >
> > > On 21 October 2016 at 21:54, Ted Yu  wrote:
> > >
> > >> Well, updates (in memory) would ultimately be flushed to disk,
> resulting
> > >> in
> > >> new hfiles.
> > >>
> > >> On Fri, Oct 21, 2016 at 1:50 PM, Mich Talebzadeh <
> > >> mich.talebza...@gmail.com>
> > >> wrote:
> > >>
> > >> > thanks
> > >> >
> > >> > bq. all updates are done in memory o disk access
> > >> >
> > >> > I meant data updates are operated in memory, no disk access.
> > >> >
> > >> > in other much like rdbms read data into memory and update it there
> > >> > (assuming that data is not already in memory?)
> > >> >
> > >> > HTH
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On 21 October 2016 at 21:46, Ted Yu  wrote:
> > >> >
> > >> > > bq. this search is carried out through map-reduce on region
> servers?
> > >> > >
> > >> > > No map-reduce. region server uses its own thread(s).
> > >> > >
> > >> > > bq. all updates are done in memory o disk access
> > >> > >
> > >> > > Can you clarify ? There seems to be some missing letters.
> > >> > >
> > >> > > On Fri, Oct 21, 2016 at 1:43 PM, Mich Talebzadeh <
> > >> > > mich.talebza...@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > thanks
> > >> > > >
> > >> > > > 

Re: Hbase fast access

2016-10-21 Thread Mich Talebzadeh
BTW, I always understood that HBase is append-only. Is that generally true?

thx




On 21 October 2016 at 21:57, Mich Talebzadeh 
wrote:

> agreed much like any rdbms
>
>
>
>
>
>
> On 21 October 2016 at 21:54, Ted Yu  wrote:
>
>> Well, updates (in memory) would ultimately be flushed to disk, resulting
>> in
>> new hfiles.
>>
>> On Fri, Oct 21, 2016 at 1:50 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com>
>> wrote:
>>
>> > thanks
>> >
>> > bq. all updates are done in memory o disk access
>> >
>> > I meant data updates are operated in memory, no disk access.
>> >
>> > in other much like rdbms read data into memory and update it there
>> > (assuming that data is not already in memory?)
>> >
>> > HTH
>> >
>> >
>> >
>> >
>> > On 21 October 2016 at 21:46, Ted Yu  wrote:
>> >
>> > > bq. this search is carried out through map-reduce on region servers?
>> > >
>> > > No map-reduce. region server uses its own thread(s).
>> > >
>> > > bq. all updates are done in memory o disk access
>> > >
>> > > Can you clarify ? There seems to be some missing letters.
>> > >
>> > > On Fri, Oct 21, 2016 at 1:43 PM, Mich Talebzadeh <
>> > > mich.talebza...@gmail.com>
>> > > wrote:
>> > >
>> > > > thanks
>> > > >
>> > > > having read the docs it appears to me that the main reason of hbase
>> > being
>> > > > faster is:
>> > > >
>> > > >
>> > > >1. it behaves like an rdbms like oracle tetc. reads are looked
>> for
>> > in
>> > > >the buffer cache for consistent reads and if not found then store
>> > > files
>> > > > on
>> > > >disks are searched. Does this mean that this search is carried
>> out
>> > > > through
>> > > >map-reduce on region servers?
>> > > >2. when the data is written it is written to log file
>> sequentially
>> > > >first, then to in-memory store, sorted like b-tree of rdbms and
>> then
>> > > >flushed to disk. this is exactly what checkpoint in an rdbms does
>> > > >3. one can point out that hbase is faster because log structured
>> > merge
>> > > >tree (LSM-trees)  has less depth than a B-tree in rdbms.
>> > > >4. all updates are done in memory o disk access
>> > > >5. in summary LSM-trees reduce disk access when data is read from
>> > disk
>> > > >because of reduced seek time again less depth to get data with
>> > > LSM-tree
>> > > >
>> > > >
>> > > > appreciate any comments
>> > > >
>> > > >
>> > > > cheers
>> > > >
>> > > >

Re: Hbase fast access

2016-10-21 Thread Mich Talebzadeh
Agreed, much like any RDBMS.






On 21 October 2016 at 21:54, Ted Yu  wrote:

> Well, updates (in memory) would ultimately be flushed to disk, resulting in
> new hfiles.
>
> On Fri, Oct 21, 2016 at 1:50 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> wrote:
>
> > thanks
> >
> > bq. all updates are done in memory o disk access
> >
> > I meant data updates are operated in memory, no disk access.
> >
> > in other much like rdbms read data into memory and update it there
> > (assuming that data is not already in memory?)
> >
> > HTH
> >
> >
> >
> >
> > On 21 October 2016 at 21:46, Ted Yu  wrote:
> >
> > > bq. this search is carried out through map-reduce on region servers?
> > >
> > > No map-reduce. region server uses its own thread(s).
> > >
> > > bq. all updates are done in memory o disk access
> > >
> > > Can you clarify ? There seems to be some missing letters.
> > >
> > > On Fri, Oct 21, 2016 at 1:43 PM, Mich Talebzadeh <
> > > mich.talebza...@gmail.com>
> > > wrote:
> > >
> > > > thanks
> > > >
> > > > having read the docs it appears to me that the main reason of hbase
> > being
> > > > faster is:
> > > >
> > > >
> > > >1. it behaves like an rdbms like oracle tetc. reads are looked for
> > in
> > > >the buffer cache for consistent reads and if not found then store
> > > files
> > > > on
> > > >disks are searched. Does this mean that this search is carried out
> > > > through
> > > >map-reduce on region servers?
> > > >2. when the data is written it is written to log file sequentially
> > > >first, then to in-memory store, sorted like b-tree of rdbms and
> then
> > > >flushed to disk. this is exactly what checkpoint in an rdbms does
> > > >3. one can point out that hbase is faster because log structured
> > merge
> > > >tree (LSM-trees)  has less depth than a B-tree in rdbms.
> > > >4. all updates are done in memory o disk access
> > > >5. in summary LSM-trees reduce disk access when data is read from
> > disk
> > > >because of reduced seek time again less depth to get data with
> > > LSM-tree
> > > >
> > > >
> > > > appreciate any comments
> > > >
> > > >
> > > > cheers
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On 21 October 2016 at 17:51, Ted Yu  wrote:
> > > >
> > > > > See some prior blog:
> > > > >
> > > > > http://www.cyanny.com/2014/03/13/hbase-architecture-analysis-part1-logical-architecture/
> > > > >
> > > > > w.r.t. compaction in Hive, it is used to compact deltas into a base
> > > > > file (in the context of transactions). Likely the two are different.
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Fri, Oct 21, 2016 at 9:08 AM, Mich Talebzadeh <
> > > > > mich.talebza...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Can someone explain, in a nutshell, the HBase use of a
> > > > > > log-structured merge-tree (LSM-tree) as its data storage
> > > > > > architecture?
> > > > > >
> > > > > > The idea of merging smaller files to larger 

Re: Hbase fast access

2016-10-21 Thread Ted Yu
Well, updates (in memory) would ultimately be flushed to disk, resulting in
new hfiles.
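As a rough illustration of how flush-plus-compaction reconciles in-memory updates with immutable hfiles (a simplified sketch, not HBase internals; the names and structures here are invented):

```python
# Illustrative sketch: each flush produces an immutable "hfile" mapping
# key -> (sequence_id, value). Compaction merges several hfiles into one,
# keeping only the newest version of each key and dropping delete markers,
# which is how obsolete content disappears from disk.

DELETED = object()   # stand-in for a delete marker (tombstone)

def compact(hfiles):
    merged = {}
    for hfile in hfiles:
        for key, (seq, value) in hfile.items():
            # Higher sequence id means a more recent write for this key.
            if key not in merged or seq > merged[key][0]:
                merged[key] = (seq, value)
    # Drop keys whose newest version is a tombstone.
    return {k: v for k, v in merged.items() if v[1] is not DELETED}
```

So nothing is ever rewritten in place: newer versions simply shadow older ones until compaction physically discards the obsolete copies.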

On Fri, Oct 21, 2016 at 1:50 PM, Mich Talebzadeh 
wrote:

> thanks
>
> bq. all updates are done in memory o disk access
>
> I meant data updates are operated in memory, no disk access.
>
> in other much like rdbms read data into memory and update it there
> (assuming that data is not already in memory?)
>
> HTH
>
>
>
>
> On 21 October 2016 at 21:46, Ted Yu  wrote:
>
> > bq. this search is carried out through map-reduce on region servers?
> >
> > No map-reduce. region server uses its own thread(s).
> >
> > bq. all updates are done in memory o disk access
> >
> > Can you clarify ? There seems to be some missing letters.
> >
> > On Fri, Oct 21, 2016 at 1:43 PM, Mich Talebzadeh <
> > mich.talebza...@gmail.com>
> > wrote:
> >
> > > thanks
> > >
> > > having read the docs it appears to me that the main reason of hbase
> being
> > > faster is:
> > >
> > >
> > >1. it behaves like an rdbms like oracle tetc. reads are looked for
> in
> > >the buffer cache for consistent reads and if not found then store
> > files
> > > on
> > >disks are searched. Does this mean that this search is carried out
> > > through
> > >map-reduce on region servers?
> > >2. when the data is written it is written to log file sequentially
> > >first, then to in-memory store, sorted like b-tree of rdbms and then
> > >flushed to disk. this is exactly what checkpoint in an rdbms does
> > >3. one can point out that hbase is faster because log structured
> merge
> > >tree (LSM-trees)  has less depth than a B-tree in rdbms.
> > >4. all updates are done in memory o disk access
> > >5. in summary LSM-trees reduce disk access when data is read from
> disk
> > >because of reduced seek time again less depth to get data with
> > LSM-tree
> > >
> > >
> > > appreciate any comments
> > >
> > >
> > > cheers
> > >
> > >
> > >
> > >
> > >
> > > On 21 October 2016 at 17:51, Ted Yu  wrote:
> > >
> > > > See some prior blog:
> > > >
> > > > http://www.cyanny.com/2014/03/13/hbase-architecture-
> > > > analysis-part1-logical-architecture/
> > > >
> > > > w.r.t. compaction in Hive, it is used to compact deltas into a base
> > file
> > > > (in the context of transactions).  Likely they're different.
> > > >
> > > > Cheers
> > > >
> > > > On Fri, Oct 21, 2016 at 9:08 AM, Mich Talebzadeh <
> > > > mich.talebza...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Can someone in a nutshell explain *the *Hbase use of log-structured
> > > > > merge-tree (LSM-tree) as data storage architecture
> > > > >
> > > > > The idea of merging smaller files to larger files periodically to
> > > reduce
> > > > > disk seeks,  is this similar concept to compaction in HDFS or Hive?
> > > > >
> > > > > Thanks
> > > > >
> > > > >

Re: Hbase fast access

2016-10-21 Thread Mich Talebzadeh
thanks

bq. all updates are done in memory o disk access

I meant that data updates are performed in memory, with no disk access.

In other words, much like an RDBMS: read the data into memory and update it
there (assuming the data is not already in memory?)

HTH




On 21 October 2016 at 21:46, Ted Yu  wrote:

> bq. this search is carried out through map-reduce on region servers?
>
> No map-reduce. region server uses its own thread(s).
>
> bq. all updates are done in memory o disk access
>
> Can you clarify ? There seems to be some missing letters.
>
> On Fri, Oct 21, 2016 at 1:43 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> wrote:
>
> > thanks
> >
> > having read the docs it appears to me that the main reason of hbase being
> > faster is:
> >
> >
> >1. it behaves like an rdbms like oracle tetc. reads are looked for in
> >the buffer cache for consistent reads and if not found then store
> files
> > on
> >disks are searched. Does this mean that this search is carried out
> > through
> >map-reduce on region servers?
> >2. when the data is written it is written to log file sequentially
> >first, then to in-memory store, sorted like b-tree of rdbms and then
> >flushed to disk. this is exactly what checkpoint in an rdbms does
> >3. one can point out that hbase is faster because log structured merge
> >tree (LSM-trees)  has less depth than a B-tree in rdbms.
> >4. all updates are done in memory o disk access
> >5. in summary LSM-trees reduce disk access when data is read from disk
> >because of reduced seek time again less depth to get data with
> LSM-tree
> >
> >
> > appreciate any comments
> >
> >
> > cheers
> >
> >
> >
> >
> >
> > On 21 October 2016 at 17:51, Ted Yu  wrote:
> >
> > > See some prior blog:
> > >
> > > http://www.cyanny.com/2014/03/13/hbase-architecture-
> > > analysis-part1-logical-architecture/
> > >
> > > w.r.t. compaction in Hive, it is used to compact deltas into a base
> file
> > > (in the context of transactions).  Likely they're different.
> > >
> > > Cheers
> > >
> > > On Fri, Oct 21, 2016 at 9:08 AM, Mich Talebzadeh <
> > > mich.talebza...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Can someone in a nutshell explain *the *Hbase use of log-structured
> > > > merge-tree (LSM-tree) as data storage architecture
> > > >
> > > > The idea of merging smaller files to larger files periodically to
> > reduce
> > > > disk seeks,  is this similar concept to compaction in HDFS or Hive?
> > > >
> > > > Thanks
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On 21 October 2016 at 15:27, Mich Talebzadeh <
> > mich.talebza...@gmail.com>
> > > > wrote:
> > > >
> > > > > Sorry that should read Hive not Spark here
> > > > >
> > > > > Say compared to Spark that is basically a SQL layer relying on
> > > different
> > > > > engines (mr, Tez, Spark) to execute the code
> > > > >

Re: Hbase fast access

2016-10-21 Thread Ted Yu
bq. this search is carried out through map-reduce on region servers?

No map-reduce. The region server uses its own thread(s).

bq. all updates are done in memory o disk access

Can you clarify? There seem to be some missing letters.

On Fri, Oct 21, 2016 at 1:43 PM, Mich Talebzadeh 
wrote:

> thanks
>
> having read the docs it appears to me that the main reason of hbase being
> faster is:
>
>
>1. it behaves like an rdbms like oracle tetc. reads are looked for in
>the buffer cache for consistent reads and if not found then store files
> on
>disks are searched. Does this mean that this search is carried out
> through
>map-reduce on region servers?
>2. when the data is written it is written to log file sequentially
>first, then to in-memory store, sorted like b-tree of rdbms and then
>flushed to disk. this is exactly what checkpoint in an rdbms does
>3. one can point out that hbase is faster because log structured merge
>tree (LSM-trees)  has less depth than a B-tree in rdbms.
>4. all updates are done in memory o disk access
>5. in summary LSM-trees reduce disk access when data is read from disk
>because of reduced seek time again less depth to get data with LSM-tree
>
>
> appreciate any comments
>
>
> cheers
>
>
>
>
>
> On 21 October 2016 at 17:51, Ted Yu  wrote:
>
> > See some prior blog:
> >
> > http://www.cyanny.com/2014/03/13/hbase-architecture-
> > analysis-part1-logical-architecture/
> >
> > w.r.t. compaction in Hive, it is used to compact deltas into a base file
> > (in the context of transactions).  Likely they're different.
> >
> > Cheers
> >
> > On Fri, Oct 21, 2016 at 9:08 AM, Mich Talebzadeh <
> > mich.talebza...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Can someone in a nutshell explain *the *Hbase use of log-structured
> > > merge-tree (LSM-tree) as data storage architecture
> > >
> > > The idea of merging smaller files to larger files periodically to
> reduce
> > > disk seeks,  is this similar concept to compaction in HDFS or Hive?
> > >
> > > Thanks
> > >
> > >
> > >
> > >
> > >
> > > On 21 October 2016 at 15:27, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> > > wrote:
> > >
> > > > Sorry that should read Hive not Spark here
> > > >
> > > > Say compared to Spark that is basically a SQL layer relying on
> > different
> > > > engines (mr, Tez, Spark) to execute the code
> > > >
> > > >
> > > >
> > > >
> > > > On 21 October 2016 at 13:17, Ted Yu  wrote:
> > > >
> > > >> Mich:
> > > >> Here is brief description of hbase architecture:
> > > >> https://hbase.apache.org/book.html#arch.overview
> > > >>
> > > >> You can also get more details from Lars George's or Nick Dimiduk's
> > > books.
> > > >>
> > > >> HBase doesn't support SQL directly. There is no cost based
> > optimization.
> > > >>
> > > >> Cheers
> > > >>
> > > >> > On Oct 21, 2016, at 1:43 AM, Mich Talebzadeh <
> > > 

Re: Hbase fast access

2016-10-21 Thread Mich Talebzadeh
Thanks.

Having read the docs, it appears to me that the main reasons for HBase
being fast are:


   1. It behaves like an RDBMS such as Oracle etc.: reads are first looked
   for in the block cache for consistent reads, and only if not found there
   are the store files on disk searched. Does this mean that this search is
   carried out through map-reduce on the region servers?
   2. When data is written, it is written sequentially to the log file
   (WAL) first, then to the in-memory store, kept sorted much like the B-tree
   of an RDBMS, and later flushed to disk. This is essentially what a
   checkpoint does in an RDBMS.
   3. One can point out that HBase is faster because a log-structured
   merge-tree (LSM-tree) has less depth than a B-tree in an RDBMS.
   4. All updates are done in memory, with no immediate disk access.
   5. In summary, LSM-trees reduce disk access when data is read from disk
   because of reduced seek time; again, there is less depth to traverse to
   reach the data with an LSM-tree.
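The write path in points 2 and 4 can be sketched in a few lines of Python.
This is a toy illustration of the LSM idea (sequential WAL append, sorted
in-memory store, flush to immutable sorted files); names like MiniLSM are
invented for the example and are not HBase's actual classes:

```python
import bisect

class MiniLSM:
    """Toy sketch of the HBase-style write path: WAL first, then memstore,
    flushed to an immutable sorted file when it grows too large."""

    def __init__(self, flush_threshold=3):
        self.wal = []          # write-ahead log: sequential appends only
        self.memstore = {}     # in-memory store, sorted at flush time
        self.store_files = []  # immutable sorted "HFiles"
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.wal.append((key, value))   # 1. durable sequential log write
        self.memstore[key] = value      # 2. in-memory update, no disk seek
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # 3. write the memstore out as one sorted, immutable file
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore.clear()
        self.wal.clear()                # entries are now persisted

    def get(self, key):
        if key in self.memstore:        # newest data wins
            return self.memstore[key]
        for sf in reversed(self.store_files):  # newest file first
            i = bisect.bisect_left(sf, (key,))
            if i < len(sf) and sf[i][0] == key:
                return sf[i][1]
        return None

db = MiniLSM()
for k, v in [("a", 1), ("b", 2), ("c", 3), ("a", 9)]:
    db.put(k, v)
print(db.get("a"))  # 9: the latest version is found before older files
```

Note that every put costs only a sequential log append plus a memory write,
which is the heart of the LSM performance argument.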


appreciate any comments


cheers


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 21 October 2016 at 17:51, Ted Yu  wrote:

> See some prior blog:
>
> http://www.cyanny.com/2014/03/13/hbase-architecture-
> analysis-part1-logical-architecture/
>
> w.r.t. compaction in Hive, it is used to compact deltas into a base file
> (in the context of transactions).  Likely they're different.
>
> Cheers
>

Re: Hbase fast access

2016-10-21 Thread Ted Yu
See this prior blog post:

http://www.cyanny.com/2014/03/13/hbase-architecture-analysis-part1-logical-architecture/

With regard to compaction in Hive: there it is used to compact deltas into
a base file (in the context of transactions), so the two mechanisms are
likely different.
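For intuition, an HBase-style compaction is essentially a k-way merge of
already-sorted files that keeps only the newest version of each key. A
rough sketch, illustrative only, using invented (key, sequence, value)
triples rather than HBase's real file format:

```python
import heapq

def compact(store_files):
    """Merge several sorted (key, seq, value) files into one, keeping only
    the highest-sequence (newest) version of each key."""
    merged = []
    # Order so that for equal keys the newest (highest seq) comes first.
    for key, seq, value in heapq.merge(
            *store_files, key=lambda kv: (kv[0], -kv[1])):
        if not merged or merged[-1][0] != key:  # first hit per key = newest
            merged.append((key, seq, value))
    return merged

old = [("a", 1, "x"), ("b", 1, "y")]
new = [("a", 2, "x2"), ("c", 2, "z")]
# Each input must already be sorted by (key, -seq) for the merge to work.
print(compact([new, old]))  # [('a', 2, 'x2'), ('b', 1, 'y'), ('c', 2, 'z')]
```

The merge is a single sequential pass over each input file, which is why
compaction trades a bounded amount of rewriting for faster reads later.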

Cheers

On Fri, Oct 21, 2016 at 9:08 AM, Mich Talebzadeh 
wrote:

> Hi,
>
> Can someone in a nutshell explain *the *Hbase use of log-structured
> merge-tree (LSM-tree) as data storage architecture
>
> The idea of merging smaller files to larger files periodically to reduce
> disk seeks,  is this similar concept to compaction in HDFS or Hive?
>
> Thanks
>


Re: Hbase fast access

2016-10-21 Thread Mich Talebzadeh
Hi,

Can someone explain in a nutshell HBase's use of the log-structured
merge-tree (LSM-tree) as its data storage architecture?

The idea is to merge smaller files into larger files periodically to reduce
disk seeks. Is this a similar concept to compaction in HDFS or Hive?
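The seek-reduction argument can be made concrete: a lookup may have to
probe every un-compacted store file, while after merging it probes just
one. A toy illustration (the file layout is invented for the example; in a
real system each file probed costs roughly one disk seek):

```python
import bisect

# Five small sorted "store files" vs. one compacted file.
small_files = [sorted("k%02d" % i for i in range(j, 50, 5)) for j in range(5)]
compacted = sorted(k for f in small_files for k in f)

def probes_to_find(files, key):
    """Count how many files must be consulted; binary search within each."""
    n = 0
    for f in files:
        n += 1
        i = bisect.bisect_left(f, key)
        if i < len(f) and f[i] == key:
            return n
    return n

print(probes_to_find(small_files, "k43"))  # 4: several files touched
print(probes_to_find([compacted], "k43"))  # 1: one file after compaction
```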

Thanks


Dr Mich Talebzadeh






On 21 October 2016 at 15:27, Mich Talebzadeh 
wrote:

> Sorry that should read Hive not Spark here
>
> Say compared to Spark that is basically a SQL layer relying on different
> engines (mr, Tez, Spark) to execute the code


Re: Hbase fast access

2016-10-21 Thread Mich Talebzadeh
Sorry, that should read Hive, not Spark, here:

Say compared to Hive, which is basically a SQL layer relying on different
engines (MR, Tez, Spark) to execute the code

Dr Mich Talebzadeh






On 21 October 2016 at 13:17, Ted Yu  wrote:

> Mich:
> Here is brief description of hbase architecture:
> https://hbase.apache.org/book.html#arch.overview
>
> You can also get more details from Lars George's or Nick Dimiduk's books.
>
> HBase doesn't support SQL directly. There is no cost based optimization.
>
> Cheers
>


Re: Hbase fast access

2016-10-21 Thread Ted Yu
Mich:
Here is a brief description of the HBase architecture:
https://hbase.apache.org/book.html#arch.overview

You can also get more details from Lars George's or Nick Dimiduk's books.

HBase doesn't support SQL directly, and there is no cost-based optimization.

Cheers

> On Oct 21, 2016, at 1:43 AM, Mich Talebzadeh  
> wrote:
> 
> Hi,
> 
> This is a general question.
> 
> Is Hbase fast because Hbase uses Hash tables and provides random access,
> and it stores the data in indexed HDFS files for faster lookups.
> 
> Say compared to Spark that is basically a SQL layer relying on different
> engines (mr, Tez, Spark) to execute the code (although it has Cost Base
> Optimizer), how Hbase fares, beyond relying on these engines
> 
> Thanks
> 
> 
> Dr Mich Talebzadeh
> 
> 
> 