Re: [ANNOUNCE] Apache HBase 2.3.5 is now available for download

2021-04-01 Thread Mich Talebzadeh
Great news HBase team, well done.

I have worked with HBase for many years and I think it is a great product;
it does what it says on the tin, so to speak.

Ironically, if you look around at the NoSQL competitors, most of them are
backed by start-ups, whereas HBase is supported as part of the Apache
suite of products by vendors such as Cloudera and others. Moreover, Google
Cloud Bigtable (proprietary) now has a seamless API to Apache HBase.

For those who would prefer to use SQL on top, there is Apache Phoenix,
which makes life easier for most SQL-savvy people working with HBase.
Problem solved.
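
For anyone who has not tried it, a minimal sketch of what that looks like (my
assumptions: Phoenix is installed with its bin/ on the PATH, ZooKeeper runs on
localhost, and the table, columns and values are invented for the example):

# Run a small SQL script through Phoenix's sqlline.py client.
cat > /tmp/marketdata.sql <<'SQL'
CREATE TABLE IF NOT EXISTS marketdata (rowkey VARCHAR PRIMARY KEY, ticker VARCHAR, price DECIMAL);
UPSERT INTO marketdata VALUES ('7d645a0f', 'IBM', 140.11);
SELECT ticker, price FROM marketdata;
SQL
sqlline.py localhost /tmp/marketdata.sql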

For TCO, HBase is still value for money compared to the others. You do not need
expensive RAM or SSDs with HBase, which makes it easy to onboard in no
time. Also, HBase can be used in a variety of different business
applications, whereas other commercial offerings are focused on narrower niche
markets.

I believe HBase is now approaching its 11th anniversary (the 10th anniversary
was in May 2020). I hope HBase will go from strength to strength and that we
will keep using it for years to come with these frequent upgrades.




   view my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Thu, 1 Apr 2021 at 19:58, Huaxiang Sun  wrote:

> The HBase team is happy to announce the immediate availability of HBase
>
> 2.3.5.
>
>
> Apache HBase™ is an open-source, distributed, versioned, non-relational
>
> database.
>
> Apache HBase gives you low latency random access to billions of rows with
>
> millions of columns atop non-specialized hardware. To learn more about
>
> HBase, see https://hbase.apache.org/.
>
>
> HBase 2.3.5 is the fifth patch release in the HBase 2.3.x line, which aims
>
> to improve the stability and reliability of HBase. This release includes 53
> bug
>
> fixes and improvements since 2.3.4.
>
>
> The full list of issues and release notes can be found here:
>
> CHANGELOG: https://downloads.apache.org/hbase/2.3.5/CHANGES.md
>
> RELEASENOTES: https://downloads.apache.org/hbase/2.3.5/RELEASENOTES.md
>
>
> or via our issue tracker:
>
> https://issues.apache.org/jira/projects/HBASE/versions/12349549
>
>
> To download please follow the links and instructions on our website:
>
>
> https://hbase.apache.org/downloads.html
>
>
> Questions, comments, and problems are always welcome at:
>
> d...@hbase.apache.org
>
> user@hbase.apache.org
>
>
> Thanks to all who contributed and made this release possible.
>
>
> Cheers,
>
> The HBase Dev Team
>


Re: [DISCUSS] Removing problematic terms from our project

2020-06-22 Thread Mich Talebzadeh
Let us look at what *slave* means.

According to Merriam-Webster:

https://www.merriam-webster.com/dictionary/slave

Definition of *slave*

(Entry 1 of 4)
1: a person held in servitude as the chattel of another
2: one that is completely subservient to a dominating influence
3: a device (such as the printer of a computer) that is directly responsive
to another
4: DRUDGE, TOILER

So in the context of HBase, sense *3* is the one that applies. In other words, a
component which is directly responsive to another, the other being the *master*.










On Mon, 22 Jun 2020 at 22:09, Geoffrey Jacoby  wrote:

> For most of the proposals (slave -> worker, blacklist -> denylist,
> whitelist-> allowlist), I'm +1 (nonbinding). Denylist and acceptlist even
> have the advantage of being clearer than the terms they're replacing.
>
> However, I'm not convinced about changing "master" to "coordinator", or
> something similar. Unlike "slave", which is negative in any context,
> "master" has many definitions, including some common ones which do not
> appear problematic. See https://www.merriam-webster.com/dictionary/master
> for
> examples. In particular, the progression of an artisan was from
> "apprentice" to "journeyman" to "master". A master smith, carpenter, or
> artist would run a shop managing lots of workers and apprentices who would
> hope to become masters of their own someday. So "master" and "worker" can
> still go together.
>
> Since it's the least problematic term, and by far the hardest term to
> change (both within HBase and with effects on downstream projects such as
> Ambari), I'm -0 (nonbinding) on changing "master".
>
> Geoffrey
>
> On Mon, Jun 22, 2020 at 1:32 PM Rushabh Shah
>  wrote:
>
> > +1 to renaming.
> >
> >
> > Rushabh Shah
> >
> >- Software Engineering SMTS | Salesforce
> >-
> >   - Mobile: 213 422 9052
> >
> >
> >
> > On Mon, Jun 22, 2020 at 1:18 PM Josh Elser  wrote:
> >
> > > +1
> > >
> > > On 6/22/20 4:03 PM, Sean Busbey wrote:
> > > > We should change our use of these terms. We can be equally or more
> > clear
> > > in
> > > > what we are trying to convey where they are present.
> > > >
> > > > That they have been used historically is only useful if the advantage
> > we
> > > > gain from using them through that shared context outweighs the
> > potential
> > > > friction they add. They make me personally less enthusiastic about
> > > > contributing. That's enough friction for me to advocate removing
> them.
> > > >
> > > > AFAICT reworking our replication stuff in terms of "active" and
> > "passive"
> > > > clusters did not result in a big spike of folks asking new questions
> > > about
> > > > where authority for state was.
> > > >
> > > > On Mon, Jun 22, 2020, 13:39 Andrew Purtell 
> > wrote:
> > > >
> > > >> In response to renewed attention at the Foundation toward addressing
> > > >> culturally problematic language and terms often used in technical
> > > >> documentation and discussion, several projects have begun
> discussions,
> > > or
> > > >> made proposals, or started work along these lines.
> > > >>
> > > >> The HBase PMC began its own discussion on private@ on June 9, 2020
> > > with an
> > > >> observation of this activity and this suggestion:
> > > >>
> > > >> There is a renewed push back against classic technology industry
> terms
> > > that
> > > >> have negative modern connotations.
> > > >>
> > > >> In the case of HBase, the following substitutions might be proposed:
> > > >>
> > > >> - Coordinator instead of master
> > > >>
> > > >> - Worker instead of slave
> > > >>
> > > >> Recommendations for these additional substitutions also come up in
> > this
> > > >> type of discussion:
> > > >>
> > > >> - Accept list instead of white list
> > > >>
> > > >> - Deny list instead of black list
> > > >>
> > > >> Unfortunately we have Master all over our code base, baked into
> > various
> > > >> APIs and configuration variable names, so for us the necessary
> changes
> > > >> amount to a new major release and deprecation cycle. It could well
> be
> > > worth
> > > >> it in the long run. We exist only as long as we draw a willing and
> > > >> sufficient contributor community. It also wouldn’t be great to have
> an
> > > >> activist fork appear somewhere, even if unlikely to be successful.
> > > >>
> > > >> Relevant JIRAs are:
> > > >>
> > > 

Re: [DISCUSS] Removing problematic terms from our project

2020-06-22 Thread Mich Talebzadeh
In mitigation, we should only make the change if the community feels that:


   1. There is a need to revise the historical context
   2. By virtue of accepting the changes, we will make a better team
   3. It will have little or no impact on current functionality
   4. Given that most products in production lag a few versions behind, in
   all likelihood it will take a few years before the changes materialise
   5. If there is a clear majority view that we ought to change, then a
   sensible roadmap should be prepared, with timelines
   6. We should not change things merely because it is fashionable. Those
   who have visited LinkedIn recently will have noticed that a lot of
   companies have, rightly or wrongly, come out in support of the current
   trends, and have equally attracted a lot of criticism for, quote, "not
   being sincere"


I know I am not making myself popular, but I think we ought to weigh things
up in their true context.


HTH








On Mon, 22 Jun 2020 at 20:14, Mich Talebzadeh 
wrote:

>
> Hi,
>
> Thank you for the proposals.
>
> I am afraid I have to agree to differ. The term master and slave (commonly
> used in Big data tools (not confined to HBase only) is BAU and historical)
> and bears no resemblance to anything recent.
>
> Additionally, both whitelist and blacklist simply refer to a
> proposal which is accepted and a proposal which is struck out (black
> pencil line).
>
> So in scientific context these are terminologies used. Terminologies
> become offensive if they are used "in the incorrect context". I don't think
> anyone in HBase or Spark community will have objections if these
> terminologies are used as before. Spark used the term in master/slave in
> Standalone mode if i recall correctly.
>
> Changing something for the sake of "now being in the limelight" does not
> make it right. So I beg to differ on this. Having said that it is indeed a
> sign of a civilised mind to entertain an idea without accepting it so
> whatever the community wishes.
>
> HTH
>
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Mon, 22 Jun 2020 at 19:39, Andrew Purtell  wrote:
>
>> In response to renewed attention at the Foundation toward addressing
>> culturally problematic language and terms often used in technical
>> documentation and discussion, several projects have begun discussions, or
>> made proposals, or started work along these lines.
>>
>> The HBase PMC began its own discussion on private@ on June 9, 2020 with
>> an
>> observation of this activity and this suggestion:
>>
>> There is a renewed push back against classic technology industry terms
>> that
>> have negative modern connotations.
>>
>> In the case of HBase, the following substitutions might be proposed:
>>
>> - Coordinator instead of master
>>
>> - Worker instead of slave
>>
>> Recommendations for these additional substitutions also come up in this
>> type of discussion:
>>
>> - Accept list instead of white list
>>
>> - Deny list instead of black list
>>
>> Unfortunately we have Master all over our code base, baked into various
>> APIs and configuration variable names, so for us the necessary changes
>> amount to a new major release and deprecation cycle. It could well be
>> worth
>> it in the long run. We exist only as long as we draw a willing and
>> sufficient contributor community. It also wouldn’t be great to have an
>> activist fork appear somewhere, even if unlikely to be successful.
>>
>> Relevant JIRAs are:
>>
>>- HBASE-12677 <https://issues.apache.org/jira/browse/HBASE-12677>:
>>Update replicat

Re: [DISCUSS] Removing problematic terms from our project

2020-06-22 Thread Mich Talebzadeh
Hi,

Thank you for the proposals.

I am afraid I have to agree to differ. The terms master and slave, commonly
used in big data tools (not confined to HBase), are business as usual and
historical, and bear no resemblance to anything recent.

Additionally, both whitelist and blacklist simply refer to a proposal which
is accepted and a proposal which is struck out (black pencil line).

So in a scientific context these are the terminologies used. Terminology
becomes offensive if it is used "in the incorrect context". I don't think
anyone in the HBase or Spark community will object if these terms are used
as before. Spark used the master/slave terminology in Standalone mode, if I
recall correctly.

Changing something for the sake of "now being in the limelight" does not
make it right, so I beg to differ on this. Having said that, it is indeed a
sign of a civilised mind to be able to entertain an idea without accepting
it, so whatever the community wishes.

HTH








On Mon, 22 Jun 2020 at 19:39, Andrew Purtell  wrote:

> In response to renewed attention at the Foundation toward addressing
> culturally problematic language and terms often used in technical
> documentation and discussion, several projects have begun discussions, or
> made proposals, or started work along these lines.
>
> The HBase PMC began its own discussion on private@ on June 9, 2020 with an
> observation of this activity and this suggestion:
>
> There is a renewed push back against classic technology industry terms that
> have negative modern connotations.
>
> In the case of HBase, the following substitutions might be proposed:
>
> - Coordinator instead of master
>
> - Worker instead of slave
>
> Recommendations for these additional substitutions also come up in this
> type of discussion:
>
> - Accept list instead of white list
>
> - Deny list instead of black list
>
> Unfortunately we have Master all over our code base, baked into various
> APIs and configuration variable names, so for us the necessary changes
> amount to a new major release and deprecation cycle. It could well be worth
> it in the long run. We exist only as long as we draw a willing and
> sufficient contributor community. It also wouldn’t be great to have an
> activist fork appear somewhere, even if unlikely to be successful.
>
> Relevant JIRAs are:
>
>- HBASE-12677 :
>Update replication docs to clarify terminology
>- HBASE-13852 :
>Replace master-slave terminology in book, site, and javadoc with a more
>modern vocabulary
>- HBASE-24576 :
>Changing "whitelist" and "blacklist" in our docs and project
>
> In response to this proposal, a member of the PMC asked if the term
> 'master' used by itself would be fine, because we only have use of 'slave'
> in replication documentation and that is easily addressed. In response to
> this question, others on the PMC suggested that even if only 'master' is
> used, in this context it is still a problem.
>
> For folks who are surprised or lacking context on the details of this
> discussion, one PMC member offered a link to this draft RFC as background:
> https://tools.ietf.org/id/draft-knodel-terminology-00.html
>
> There was general support for removing the term "master" / "hmaster" from
> our code base and using the terms "coordinator" or "leader" instead. In the
> context of replication, "worker" makes less sense and perhaps "destination"
> or "follower" would be more appropriate terms.
>
> One PMC member's thoughts on language and non-native English speakers is
> worth including in its entirety:
>
> While words like blacklist/whitelist/slave clearly have those negative
> references, word master might not have the same impact for non native
> English speakers like myself where the literal translation to my mother
> tongue does not have this same bad connotation. Replacing all references
> for word *master *on our docs/codebase is a huge effort, I guess such a
> decision would be more suitable for native English speakers folks, and
> maybe we should consider the opinion of contributors from that ethnic
> minority as well?
>
> These are good questions for public discussion.
>
> We have a consensus in the PMC, at this time, that is supportive of making
> the above discussed terminology changes. However, we also have concerns
> about what it would take to accomplish 

Re: Celebrating our 10th birthday for Apache HBase

2020-05-12 Thread Mich Talebzadeh
Hi,

I will be presenting HBase to one of the major European banks this
Friday, 15th May.

Does anyone have the latest bullet points on new features of HBase, so I can
add them to my presentation material?

Many thanks,

Dr Mich Talebzadeh






On Sat, 2 May 2020 at 08:19, Mich Talebzadeh 
wrote:

> Hi,
>
> I have worked with Hbase for many years and I think it is a great product.
> it does what it says on the tin so to speak.
>
> Ironically if you look around the NoSQL competitors, most of them are
> supported by start-ups, whereas Hbase is only supported as part of Apache
> suite of products by vendors like Cloudera, Hortonworks MapR etc.
>
> For those who would prefer to use SQL on top, there is Apache Phoenix
> around which makes life easier for most SQL savvy world to work on Hbase.
> Problem solved
>
> For TCO, Hbase is still value for money compared to others. You don't need
> expensive RAM or SSD with Hbase. That makes it easy to onboard it in no
> time. Also Hbase can be used in a variety of different business
> applications, whereas other commercial ones  are focused on narrower niche
> markets.
>
> Least but last happy 10th anniversary and hope Hbase will go form
> strength to strength and we will keep using it for years to come!
>
>
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Sat, 2 May 2020 at 07:28, Yu Li  wrote:
>
>> Dear HBase developers and users,
>> 亲爱的HBase开发者和用户们,
>>
>> It has been a decade since Apache HBase became an Apache top level project
>> [1]. Ten years is a big milestone and deserves a good celebration. Do you
>> have anything to say to us? Maybe some wishes, good stories or just a
>> happy
>> birthday blessing? Looking forward to your voices.
>>
>> 大家好!距离 HBase 成为 Apache 顶级项目 (TLP) 已经过去了整整 10 年
>> [1],这是一个值得纪念的里程碑。在这个特殊的时刻,您有什么想对 HBase 说的吗?分享您和 HBase 之间发生的故事,表达您对 HBase
>> 的期待,或者是一句简单的“生日快乐”祝福?期待听到您的声音。
>>
>> Best Regards,
>> Yu (on behalf of the Apache HBase PMC)
>>
>> 祝好!
>> Yu (代表Apache HBase PMC)
>>
>> [1] https://whimsy.apache.org/board/minutes/HBase.html#2010-04-21
>>
>


Re: Celebrating our 10th birthday for Apache HBase

2020-05-04 Thread Mich Talebzadeh
Many thanks. I thought HBase deserved a 10-candle virtual cake!

regards,

Dr Mich Talebzadeh







On Mon, 4 May 2020 at 20:04, Stack  wrote:

> Nice testimonial above Mich.
> S
>
> On Sun, May 3, 2020 at 3:45 AM Mich Talebzadeh 
> wrote:
>
> > Hi,
> >
> > Back in 2017, I wrote an article in LinkedIn on HBase titled HBase for
> > impatient
> > <
> https://www.linkedin.com/pulse/apache-hbase-impatient-mich-talebzadeh-ph-d-/
> >
> >
> > Today I wrote a post in celebration of HBase 10 years anniversary in
> > LinkedIn.
> >
> > HTH,
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> > [image: image.png]
> >
> >
> >
> > LinkedIn *
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >*
> >
> >
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> >
> > On Sat, 2 May 2020 at 07:28, Yu Li  wrote:
> >
> >> Dear HBase developers and users,
> >> 亲爱的HBase开发者和用户们,
> >>
> >> It has been a decade since Apache HBase became an Apache top level
> project
> >> [1]. Ten years is a big milestone and deserves a good celebration. Do
> you
> >> have anything to say to us? Maybe some wishes, good stories or just a
> >> happy
> >> birthday blessing? Looking forward to your voices.
> >>
> >> 大家好!距离 HBase 成为 Apache 顶级项目 (TLP) 已经过去了整整 10 年
> >> [1],这是一个值得纪念的里程碑。在这个特殊的时刻,您有什么想对 HBase 说的吗?分享您和 HBase 之间发生的故事,表达您对 HBase
> >> 的期待,或者是一句简单的“生日快乐”祝福?期待听到您的声音。
> >>
> >> Best Regards,
> >> Yu (on behalf of the Apache HBase PMC)
> >>
> >> 祝好!
> >> Yu (代表Apache HBase PMC)
> >>
> >> [1] https://whimsy.apache.org/board/minutes/HBase.html#2010-04-21
> >>
> >
>


Re: Celebrating our 10th birthday for Apache HBase

2020-05-03 Thread Mich Talebzadeh
Hi,

Back in 2017, I wrote an article on LinkedIn about HBase titled HBase for
the impatient
<https://www.linkedin.com/pulse/apache-hbase-impatient-mich-talebzadeh-ph-d-/>

Today I wrote a post on LinkedIn in celebration of HBase's 10th anniversary.

HTH,


Dr Mich Talebzadeh






On Sat, 2 May 2020 at 07:28, Yu Li  wrote:

> Dear HBase developers and users,
> 亲爱的HBase开发者和用户们,
>
> It has been a decade since Apache HBase became an Apache top level project
> [1]. Ten years is a big milestone and deserves a good celebration. Do you
> have anything to say to us? Maybe some wishes, good stories or just a happy
> birthday blessing? Looking forward to your voices.
>
> 大家好!距离 HBase 成为 Apache 顶级项目 (TLP) 已经过去了整整 10 年
> [1],这是一个值得纪念的里程碑。在这个特殊的时刻,您有什么想对 HBase 说的吗?分享您和 HBase 之间发生的故事,表达您对 HBase
> 的期待,或者是一句简单的“生日快乐”祝福?期待听到您的声音。
>
> Best Regards,
> Yu (on behalf of the Apache HBase PMC)
>
> 祝好!
> Yu (代表Apache HBase PMC)
>
> [1] https://whimsy.apache.org/board/minutes/HBase.html#2010-04-21
>


Re: Celebrating our 10th birthday for Apache HBase

2020-05-02 Thread Mich Talebzadeh
Hi,

I have worked with HBase for many years and I think it is a great product;
it does what it says on the tin, so to speak.

Ironically, if you look around at the NoSQL competitors, most of them are
backed by start-ups, whereas HBase is supported as part of the Apache
suite of products by vendors like Cloudera, Hortonworks, MapR, etc.

For those who would prefer to use SQL on top, there is Apache Phoenix,
which makes life easier for most SQL-savvy people working with HBase.
Problem solved.

For TCO, HBase is still value for money compared to the others. You don't need
expensive RAM or SSDs with HBase, which makes it easy to onboard in no
time. Also, HBase can be used in a variety of different business
applications, whereas other commercial offerings are focused on narrower niche
markets.

Last but not least, happy 10th anniversary, and I hope HBase will go from
strength to strength and we will keep using it for years to come!






Dr Mich Talebzadeh







On Sat, 2 May 2020 at 07:28, Yu Li  wrote:

> Dear HBase developers and users,
> 亲爱的HBase开发者和用户们,
>
> It has been a decade since Apache HBase became an Apache top level project
> [1]. Ten years is a big milestone and deserves a good celebration. Do you
> have anything to say to us? Maybe some wishes, good stories or just a happy
> birthday blessing? Looking forward to your voices.
>
> 大家好!距离 HBase 成为 Apache 顶级项目 (TLP) 已经过去了整整 10 年
> [1],这是一个值得纪念的里程碑。在这个特殊的时刻,您有什么想对 HBase 说的吗?分享您和 HBase 之间发生的故事,表达您对 HBase
> 的期待,或者是一句简单的“生日快乐”祝福?期待听到您的声音。
>
> Best Regards,
> Yu (on behalf of the Apache HBase PMC)
>
> 祝好!
> Yu (代表Apache HBase PMC)
>
> [1] https://whimsy.apache.org/board/minutes/HBase.html#2010-04-21
>


Re: Spark reading from Hbase throws java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods

2020-02-23 Thread Mich Talebzadeh
Hi,

Does anyone have any more suggestions for the error I reported below, please?

Thanks,

Mich
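
For reference, here is how the Scala suffixes of the jars line up (a quick
sketch; the jar names are the ones quoted later in this thread, and the loop is
only an illustration, not part of Spark or HBase):

for j in json4s-jackson_2.10-3.2.10.jar json4s_2.11-3.2.11.jar json4s-native_2.10-3.4.0.jar shc-core-1.1.1-2.1-s_2.11.jar; do
  # print each artifact next to the Scala version encoded in its name
  echo "$j -> Scala $(echo "$j" | sed -E 's/.*_(2\.1[01]).*/\1/')"
done
# Every artifact (and the Spark distribution itself) should agree on a single
# Scala line, e.g. _2.11 for a Spark 2.x build against Scala 2.11.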







On Mon, 17 Feb 2020 at 22:27, Mich Talebzadeh 
wrote:

> I stripped everything from the jar list. This is all I have
>
> spark-shell --jars shc-core-1.1.1-2.1-s_2.11.jar, \
>   json4s-native_2.11-3.5.3.jar, \
>   json4s-jackson_2.11-3.5.3.jar, \
>   hbase-client-1.2.3.jar, \
>   hbase-common-1.2.3.jar
>
> Now I still get the same error!
>
> scala> val df = withCatalog(catalog)
> java.lang.NoSuchMethodError:
> org.json4s.jackson.JsonMethods$.parse(Lorg/json4s/JsonInput;Z)Lorg/json4s/JsonAST$JValue;
>   at
> org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:257)
>   at
> org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.(HBaseRelation.scala:80)
>   at
> org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:51)
>   at
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
>   at
> org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
>   at withCatalog(:54)
>
> Thanks
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Mon, 17 Feb 2020 at 21:37, Mich Talebzadeh 
> wrote:
>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>> Many thanks both.
>>
>> Let me check and confirm.
>>
>> regards,
>>
>> Mich
>>
>>
>> On Mon, 17 Feb 2020 at 21:33, Jörn Franke  wrote:
>>
>>> Is there a reason why different Scala (it seems at least 2.10/2.11)
>>> versions are mixed? This never works.
>>> Do you include by accident a dependency to with an old Scala version? Ie
>>> the Hbase datasource maybe?
>>>
>>>
>>> Am 17.02.2020 um 22:15 schrieb Mich Talebzadeh <
>>> mich.talebza...@gmail.com>:
>>>
>>> 
>>> Thanks Muthu,
>>>
>>>
>>> I am using the following jar files for now in local mode i.e.  
>>> spark-shell_local
>>> --jars …..
>>>
>>> json4s-jackson_2.10-3.2.10.jar
>>> json4s_2.11-3.2.11.jar
>>> json4s-native_2.10-3.4.0.jar
>>>
>>> Which one is the incorrect one please/
>>>
>>> Regards,
>>>
>>> Mich
>>>
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>> On Mon, 17 Feb 2020 at 20:28, Muthu Jayakumar 
>>> wrote:
>>>
>>>> I suspect the spark job is somehow having an incorrect (newer) version
>>>> of json4s in the classpath. json4s 3.5.3 is the utmost version that can be
>>>> used.
>>>>
>>>> Thanks,
>>>> Muthu
>>>>
>>>> On Mon, Feb 17, 2020, 06:43 Mich Talebzadeh 
>>>>

Re: Running HBase on Hadoop 3.1, was Re: [ANNOUNCE] Apache HBase 2.1.9 is now available for download

2020-02-21 Thread Mich Talebzadeh
Thanks Nick.

My bad. One of the nodes hosting a region server did not have the Snappy package
installed.

yum install snappy snappy-devel did the trick
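
For anyone hitting the same thing, a couple of quick checks that may help
confirm native Snappy is usable on every region server node (a sketch; it
assumes the standard hadoop and hbase launchers are on the PATH, and the HDFS
path is a placeholder):

# List which native codecs the Hadoop native library can load (snappy should report true).
hadoop checknative -a

# HBase's own round-trip test of a codec against a path.
hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://namenode:8020/tmp/compression-test snappy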

Regards,

Dr Mich Talebzadeh







On Fri, 21 Feb 2020 at 21:36, Nick Dimiduk  wrote:

> Hi Mich,
>
> I'm glad you're able to make progress. It sounds like your native libraries
> are not installed or activated correctly. It's possible they're not drop-in
> compatible between Hadoop2 and Hadoop3, in which case recompilation might
> be necessary. I'm not up-to-current on the state of those affairs (though
> potentially someone else reading here is...)
>
> Have you looked through this section of our book?
> http://hbase.apache.org/book.html#hadoop.native.lib
>
> Bonne chance,
> Nick
>
> On Thu, Feb 20, 2020 at 11:45 AM Mich Talebzadeh <
> mich.talebza...@gmail.com>
> wrote:
>
> > Thanks Nick,
> >
> > Looking at this matrix <http://hbase.apache.org/book.html#hadoop>, I
> > upgraded Hadoop from 3.1 to 3.1.1. This should allow me to used Hbase
> 2.1.8
> >
> > So I did both. One needs to change few hbase-site.xml setting etc to make
> > it work. Also the addition of htrace-core-3.1.0-incubating.jar to
> > $HBASE_HOME/lib.
> >
> > So both the HMaster and HRegionServer  are running on master node
> > and HRegionServer on another node.
> >
> > The drawback is that one cannot use snappy compression in create table
> > command,  something like below
> >
> > create 'trading:MARKETDATAHBASESPEED',  {NAME=> 'PRICE_INFO', COMPRESSION
> > => 'SNAPPY'}
> >
> > This is the error I get on HRegionServer log
> >
> > org.apache.hadoop.hbase.DoNotRetryIOException: Compression algorithm
> > 'snappy' previously failed test.
> >
> > Eventually table is sort of created but mot complete. Also getting rid of
> > this table becomes problematic
> >
> > hbase(main):002:0> drop 'trading:MARKETDATAHBASESPEED'
> > ERROR: The procedure 31 is still running
> > For usage try 'help "drop"'
> > Took 668.3693 seconds
> > hbase(main):003:0> drop 'trading:MARKETDATAHBASESPEED'
> > ERROR: The procedure 32 is still running
> > For usage try 'help "drop"'
> > Took 668.2910 seconds
> >
> >
> > So the table cannot be dropped.
> >
> > All fun and games
> >
> > Cheers,
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn *
> >
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <
> >
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > >*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> >
> > On Thu, 20 Feb 2020 at 18:54, Nick Dimiduk  wrote:
> >
> > > changed subject, dropped dev
> > >
> > > Hello Mich,
> > >
> > > The HBase binary distribution includes all Hadoop client jars necessary
> > for
> > > HBase to function on top of HDFS. The version of those Hadoop jars is
> > that
> > > of Hadoop 2.8.5. Duo is saying that the Hadoop 2.8.5 client works
> > against a
> > > HDFS 3.1.x cluster. Thus, this binary release of HBase is expected to
> > work
> > > with HDFS 3.1.
> > >
> > > Alternatively, you can choose to recompile HBase using Hadoop 3.1.x as
> > the
> > > dependency. In that case, you'll need to retrieve the source,
> establish a
> > > build environment, and perform the build yourself -- the HBase project
> > does
> > > not distribute a binary artifact compiled against Hadoop 3.
> > >
> > > Thanks,
> > > Nick
> > &

Re: Running HBase on Hadoop 3.1, was Re: [ANNOUNCE] Apache HBase 2.1.9 is now available for download

2020-02-20 Thread Mich Talebzadeh
Thanks Nick,

Looking at this matrix <http://hbase.apache.org/book.html#hadoop>, I
upgraded Hadoop from 3.1 to 3.1.1. This should allow me to use HBase 2.1.8.

So I did both. One needs to change a few hbase-site.xml settings, etc., to make
it work, plus add htrace-core-3.1.0-incubating.jar to $HBASE_HOME/lib.

Now both the HMaster and an HRegionServer are running on the master node,
and an HRegionServer on another node.

The drawback is that one cannot use Snappy compression in the create table
command, something like the below:

create 'trading:MARKETDATAHBASESPEED', {NAME => 'PRICE_INFO', COMPRESSION => 'SNAPPY'}

This is the error I get in the HRegionServer log:

org.apache.hadoop.hbase.DoNotRetryIOException: Compression algorithm
'snappy' previously failed test.

Eventually the table is sort of created, but not completely. Also, getting rid
of this table becomes problematic:

hbase(main):002:0> drop 'trading:MARKETDATAHBASESPEED'
ERROR: The procedure 31 is still running
For usage try 'help "drop"'
Took 668.3693 seconds
hbase(main):003:0> drop 'trading:MARKETDATAHBASESPEED'
ERROR: The procedure 32 is still running
For usage try 'help "drop"'
Took 668.2910 seconds


So the table cannot be dropped.

All fun and games
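
(In case it helps anyone else, a sketch of how one might inspect the wedged
procedure; the HBCK2 part is from memory, and the jar path and pid below are
placeholders:)

# Show running/stuck procedures from the HBase 2.x shell in non-interactive mode.
echo "list_procedures" | hbase shell -n

# If a procedure is permanently stuck, the separate HBCK2 tool can bypass it, roughly:
# hbase hbck -j /path/to/hbase-hbck2.jar bypass -o <pid>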

Cheers,

Dr Mich Talebzadeh







On Thu, 20 Feb 2020 at 18:54, Nick Dimiduk  wrote:

> changed subject, dropped dev
>
> Hello Mich,
>
> The HBase binary distribution includes all Hadoop client jars necessary for
> HBase to function on top of HDFS. The version of those Hadoop jars is that
> of Hadoop 2.8.5. Duo is saying that the Hadoop 2.8.5 client works against a
> HDFS 3.1.x cluster. Thus, this binary release of HBase is expected to work
> with HDFS 3.1.
>
> Alternatively, you can choose to recompile HBase using Hadoop 3.1.x as the
> dependency. In that case, you'll need to retrieve the source, establish a
> build environment, and perform the build yourself -- the HBase project does
> not distribute a binary artifact compiled against Hadoop 3.
>
> Thanks,
> Nick
>
> On Thu, Feb 20, 2020 at 3:20 AM Mich Talebzadeh  >
> wrote:
>
> > Thank you.
> >
> > In your points below
> >
> > If you want to run HBase on top of a 3.1.x HDFS cluster, you can just use
> > the binaries, hadoop 2.8.5 client can communicate with 3.1.x server.
> >
> > If you want HBase to use hadoop 3.1.x, then you need to build the
> binaries
> > by yourself, use the hadoop 3 profile.
> >
> > Could you please clarify (since Hbase can work with Hadoop), which
> version
> > of new release has been tested against which version of Hadoop.
> >
> > Also it is not very clear which binaries you are referring to? "Hadoop
> > 2.8.5 client can communicate with "Hadoop" 3.1 server. So basically
> > download Hadoop 2.8.5 client for Hbase use?
> >
> >
> > Regards,
> >
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn *
> >
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <
> >
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > >*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> >
> > On Thu, 20 Feb 2020 at 09:45, 张铎(Duo Zhang) 
> wrote:
> >
> > > If you want to run HBase on top of a 3.1.x HDFS cluster, you can just
> use
> > > the binaries, hadoop 2.8.5 client can communicate with 3.1.x server.
> > >
> > > If you want HBase to use hadoop 3.1.x, then you need to build the
> > binaries
> > > by yourself, use the hadoop 3 profile.
> > >
> > > Mich Talebzadeh  于2020年2月20日周四 下午5:15写道:
> > >
> > > > Hi,
> > > >
> > >

Re: [ANNOUNCE] Apache HBase 2.1.9 is now available for download

2020-02-20 Thread Mich Talebzadeh
Thank you.

Regarding your points below:

If you want to run HBase on top of a 3.1.x HDFS cluster, you can just use
the binaries, hadoop 2.8.5 client can communicate with 3.1.x server.

If you want HBase to use hadoop 3.1.x, then you need to build the binaries
by yourself, use the hadoop 3 profile.
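
(For reference, my understanding is that such a build looks roughly like the
sketch below, run from a source checkout of the matching HBase release; the
exact profile and property names should be checked against the HBase book's
build section, and the versions are placeholders:)

# Build against a Hadoop 3 line instead of the default Hadoop 2.
mvn clean install -DskipTests -Dhadoop.profile=3.0 -Dhadoop-three.version=3.1.1
mvn package assembly:single -DskipTests -Dhadoop.profile=3.0 -Dhadoop-three.version=3.1.1
# The binary tarball then appears under hbase-assembly/target/.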

Could you please clarify (since HBase can work with multiple Hadoop versions):
which Hadoop versions has this new release been tested against?

Also, it is not very clear which binaries you are referring to. You say the
Hadoop 2.8.5 client can communicate with a Hadoop 3.1 server, so basically one
should download the Hadoop 2.8.5 client for HBase use?


Regards,



Dr Mich Talebzadeh







On Thu, 20 Feb 2020 at 09:45, 张铎(Duo Zhang)  wrote:

> If you want to run HBase on top of a 3.1.x HDFS cluster, you can just use
> the binaries, hadoop 2.8.5 client can communicate with 3.1.x server.
>
> If you want HBase to use hadoop 3.1.x, then you need to build the binaries
> by yourself, use the hadoop 3 profile.
>
> Mich Talebzadeh  于2020年2月20日周四 下午5:15写道:
>
> > Hi,
> >
> > Thanks.
> >
> > Does this version of Hbase work with Hadoop 3.1? I am still stuck with
> > Hbase 1.2.7
> >
> > Hadoop 3.1.0
> > Source code repository https://github.com/apache/hadoop -r
> > 16b70619a24cdcf5d3b0fcf4b58ca77238ccbe6d
> > Compiled by centos on 2018-03-30T00:00Z
> > Compiled with protoc 2.5.0
> >
> > Regards,
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn *
> >
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <
> >
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > >*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> >
> > On Thu, 20 Feb 2020 at 08:48, Duo Zhang  wrote:
> >
> > > The HBase team is happy to announce the immediate availability of
> Apache
> > > HBase 2.1.9.
> > >
> > > Download from https://hbase.apache.org/downloads.html
> > >
> > > Apache HBase is an open-source, distributed, versioned, non-relational
> > > database. Apache HBase gives you low latency random access to billions
> of
> > > rows with millions of columns atop non-specialized hardware. To learn
> > more
> > > about HBase, see https://hbase.apache.org/.
> > >
> > > HBase 2.1.9 is the latest release of the HBase 2.1 line, continuing on
> > the
> > > theme of bringing a stable, reliable database to the Apache Big Data
> > > ecosystem and beyond. 2.1.9 includes ~62 bug and improvement fixes done
> > > since the 2.1.8.
> > >
> > > For instructions on verifying ASF release downloads, please see
> > >
> > > https://www.apache.org/dyn/closer.cgi#verify
> > >
> > > Project member signature keys can be found at
> > >
> > > https://www.apache.org/dist/hbase/KEYS
> > >
> > > Thanks to all the contributors who made this release possible!
> > >
> > > Best,
> > > The HBase Dev Team
> > >
> >
>


Re: [ANNOUNCE] Apache HBase 2.1.9 is now available for download

2020-02-20 Thread Mich Talebzadeh
Hi,

Thanks.

Does this version of HBase work with Hadoop 3.1? I am still stuck with
HBase 1.2.7.

Hadoop 3.1.0
Source code repository https://github.com/apache/hadoop -r
16b70619a24cdcf5d3b0fcf4b58ca77238ccbe6d
Compiled by centos on 2018-03-30T00:00Z
Compiled with protoc 2.5.0

Regards,

Dr Mich Talebzadeh







On Thu, 20 Feb 2020 at 08:48, Duo Zhang  wrote:

> The HBase team is happy to announce the immediate availability of Apache
> HBase 2.1.9.
>
> Download from https://hbase.apache.org/downloads.html
>
> Apache HBase is an open-source, distributed, versioned, non-relational
> database. Apache HBase gives you low latency random access to billions of
> rows with millions of columns atop non-specialized hardware. To learn more
> about HBase, see https://hbase.apache.org/.
>
> HBase 2.1.9 is the latest release of the HBase 2.1 line, continuing on the
> theme of bringing a stable, reliable database to the Apache Big Data
> ecosystem and beyond. 2.1.9 includes ~62 bug and improvement fixes done
> since the 2.1.8.
>
> For instructions on verifying ASF release downloads, please see
>
> https://www.apache.org/dyn/closer.cgi#verify
>
> Project member signature keys can be found at
>
> https://www.apache.org/dist/hbase/KEYS
>
> Thanks to all the contributors who made this release possible!
>
> Best,
> The HBase Dev Team
>


Re: How to insert Json records from Flume into Hbase with Kafka source

2020-02-16 Thread Mich Talebzadeh
Hi,

This regex seems to work


JsonAgent.sinks.Hbase-sink.serializer.regex = [^_]*"(.+).{1},(.+),(.+),(.+).{1}

Remember, we were getting the below as the ROW key (incorrectly) beforehand:

{"rowkey":"eff0bdc7-d6b1-40b5-ad0a-b8181173b806"

The first positional column is the ROW_KEY. We need to strip everything except
the UUID itself.

[^_]*"(.+).{1}

means: get rid of everything from the start up to and including the first quote,
and also get rid of the last quote, leaving just the ROW_KEY itself:

eff0bdc7-d6b1-40b5-ad0a-b8181173b806

And we also wanted to get rid of the trailing '}' from the last column, in this
case the price column.

(.+).{1}

means: drop the last character.

Now the search via ROW_KEY works:

hbase(main):483:0> get 'trading:MARKETDATAHBASEBATCH', '19735b2e-91b6-4cc8-afcb-f02c00bd52a3'
COLUMN CELL
 PRICE_INFO:key
timestamp=1581883743642, value=19735b2e-91b6-4cc8-afcb-f02c00bd52a3
 PRICE_INFO:partition
timestamp=1581883743642, value=6
 PRICE_INFO:price
timestamp=1581883743642, value= "price":108.7
 PRICE_INFO:ticker
timestamp=1581883743642, value="ticker":"IBM"
 PRICE_INFO:timeissued
timestamp=1581883743642, value= "timeissued":"2020-02-16T20:19:43"
 PRICE_INFO:timestamp
timestamp=1581883743642, value=1581883739646
 PRICE_INFO:topic
timestamp=1581883743642, value=md
7 row(s) in 0.0040 seconds


Hope this helps
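
As a quick offline way to convince yourself the row key comes out clean, here is
a rough equivalent (a sketch only; sed's ERE dialect is not the Java regex
dialect used by Flume's RegexHbaseEventSerializer, so this just illustrates the
intent):

rec='{"rowkey":"19735b2e-91b6-4cc8-afcb-f02c00bd52a3","ticker":"IBM", "timeissued":"2020-02-16T20:19:43", "price":108.7}'
# Strip everything up to and including the quote that opens the rowkey value, keep the bare UUID.
echo "$rec" | sed -E 's/^[^"]*"[^"]*":"([^"]+)".*/\1/'
# -> 19735b2e-91b6-4cc8-afcb-f02c00bd52a3, which is then searchable with:
#    get 'trading:MARKETDATAHBASEBATCH', '19735b2e-91b6-4cc8-afcb-f02c00bd52a3'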

Dr Mich Talebzadeh







On Sun, 16 Feb 2020 at 10:47, Mich Talebzadeh 
wrote:

> BTW
>
> When I turn out headers in the conf fle
>
> JsonAgent.sinks.Hbase-sink.serializer.depositHeaders=true
>
> I get
>
>  {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"
> *column=PRICE_INFO:key*, timestamp=1581849565330,
> *value=f8a6e006-35bb-4470-9a7b-9273b8aa83f*1
>  {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"
> column=PRICE_INFO:partition, timestamp=1581849565330, value=5
>  {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"
> column=PRICE_INFO:price, timestamp=1581849565330, value= "price":202.74}
>  {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"
> column=PRICE_INFO:ticker, timestamp=1581849565330, value="ticker":"IBM"
>  {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"
> column=PRICE_INFO:timeissued, timestamp=1581849565330, value=
> "timeissued":"2020-02-16T10:50:05"
>  {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"
> column=PRICE_INFO:timestamp, timestamp=1581849565330, value=1581849561330
>  {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"
> column=PRICE_INFO:topic, timestamp=1581849565330, value=md
>
> So it displays the key alright value=f8a6e006-35bb-4470-9a7b-9273b8aa83f1
>
> But cannot search on that key!
>
> hbase(main):333:0> get 'trading:MARKETDATAHBASEBATCH',
> 'f8a6e006-35bb-4470-9a7b-9273b8aa83f1'
> COLUMN CELL
> 0 row(s) in 0.0540 seconds
>
>
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Sat, 15 Feb 2020 at 15:12, Mich Talebzadeh 
> wrote:
>
>> Hi,
>>
>> I have streaming Kafka that sends data to flume in the following JSON
>> format
>>
>> This is the record is sent via Kafka
>>
>> 7d645a0f-0386-4405-8af1-7fca908fe928
>> {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928","ticker":"IBM",
>> "timeissued":"2020-02-14T20:32:29", "price":140.11}
>>
>> Note that "7d645a0

Re: How to insert Json records from Flume into Hbase with Kafka source

2020-02-16 Thread Mich Talebzadeh
BTW

When I turn on headers in the conf file

JsonAgent.sinks.Hbase-sink.serializer.depositHeaders=true

I get

 {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"
column=PRICE_INFO:key, timestamp=1581849565330, value=f8a6e006-35bb-4470-9a7b-9273b8aa83f1
 {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"
column=PRICE_INFO:partition, timestamp=1581849565330, value=5
 {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"
column=PRICE_INFO:price, timestamp=1581849565330, value= "price":202.74}
 {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"
column=PRICE_INFO:ticker, timestamp=1581849565330, value="ticker":"IBM"
 {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"
column=PRICE_INFO:timeissued, timestamp=1581849565330, value=
"timeissued":"2020-02-16T10:50:05"
 {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"
column=PRICE_INFO:timestamp, timestamp=1581849565330, value=1581849561330
 {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"
column=PRICE_INFO:topic, timestamp=1581849565330, value=md

So it displays the key all right (value=f8a6e006-35bb-4470-9a7b-9273b8aa83f1),

but I cannot search on that key!

hbase(main):333:0> get 'trading:MARKETDATAHBASEBATCH',
'f8a6e006-35bb-4470-9a7b-9273b8aa83f1'
COLUMN CELL
0 row(s) in 0.0540 seconds
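
A hedged sketch of what I would check next (my assumption, from the scan
output above, is that the stored row key is the literal string starting with
{"rowkey":" and that the shell display may even be truncating it): quote the
row exactly as it is displayed, or scan by prefix.

get 'trading:MARKETDATAHBASEBATCH', '{"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"'
scan 'trading:MARKETDATAHBASEBATCH', {ROWPREFIXFILTER => '{"rowkey":"f8a6e006'}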






Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Sat, 15 Feb 2020 at 15:12, Mich Talebzadeh 
wrote:

> Hi,
>
> I have streaming Kafka that sends data to flume in the following JSON
> format
>
> This is the record is sent via Kafka
>
> 7d645a0f-0386-4405-8af1-7fca908fe928
> {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928","ticker":"IBM",
> "timeissued":"2020-02-14T20:32:29", "price":140.11}
>
> Note that "7d645a0f-0386-4405-8af1-7fca908fe928" is the key and there are
> 4 columns in value including the key itself as another column.
>
> The Flume configuration file is as follows
>
> # Describing/Configuring the sink
> JsonAgent.channels.hdfs-channel-1.type = memory
> JsonAgent.channels.hdfs-channel-1.capacity = 300
> JsonAgent.channels.hdfs-channel-1.transactionCapacity = 100
> *JsonAgent.sinks.Hbase-sink.type = org.apache.flume.sink.hbase.HBaseSink*
> JsonAgent.sinks.Hbase-sink.channel =hdfs-channel-1
> JsonAgent.sinks.Hbase-sink.table =trading:MARKETDATAHBASEBATCH
> JsonAgent.sinks.Hbase-sink.columnFamily=PRICE_INFO
>
> JsonAgent.sinks.Hbase-sink.serializer=org.apache.flume.sink.hbase.RegexHbaseEventSerializer
> *JsonAgent.sinks.Hbase-sink.serializer.regex =(.+),(.+),(.+),(.+)*
>
> *JsonAgent.sinks.Hbase-sink.serializer.rowKeyIndex  =
> 0JsonAgent.sinks.Hbase-sink.serializer.colNames
> =ROW_KEY,ticker,timeissued,price*
> JsonAgent.sinks.Hbase-sink.serializer.regexIgnoreCase = true
> JsonAgent.sinks.Hbase-sink.batchSize =100
>
> This works and posts records to Hbase as follows:
>
> ROWCOLUMN+CELL
>  {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928"
> column=PRICE_INFO:price, timestamp=1581711715292, value= "price":140.11}
>  {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928"
> column=PRICE_INFO:ticker, timestamp=1581711715292, value="ticker":"IBM"
>  {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928"
> column=PRICE_INFO:timeissued, timestamp=1581711715292, value=
> "timeissued":"2020-02-14T20:32:29"
> 1 row(s) in 0.0050 seconds
>
> However there is a problem. the rowkey value includes redundant
> characters {"rowkey": that do not allow for records to be searched in Hbase
> based on rowkey value! When I try to ignore the redundant characters by
> twicking regex, unfortunately no rows are added to Hbase table. Example as
> follows:
>
> JsonAgent.sinks.Hbase-sink.serializer.regex = (?<=^.{9}).+,(.+),(.+),(.+)
>
> Appreciate any advice.
>
> Thanks,
>
> Mich
>
>
>
>


How to insert Json records from Flume into Hbase with Kafka source

2020-02-15 Thread Mich Talebzadeh
Hi,

I have a Kafka stream that sends data to Flume in the following JSON format

This is the record sent via Kafka

7d645a0f-0386-4405-8af1-7fca908fe928
{"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928","ticker":"IBM",
"timeissued":"2020-02-14T20:32:29", "price":140.11}

Note that "7d645a0f-0386-4405-8af1-7fca908fe928" is the key and there are 4
columns in value including the key itself as another column.

The Flume configuration file is as follows

# Describing/Configuring the sink
JsonAgent.channels.hdfs-channel-1.type = memory
JsonAgent.channels.hdfs-channel-1.capacity = 300
JsonAgent.channels.hdfs-channel-1.transactionCapacity = 100
*JsonAgent.sinks.Hbase-sink.type = org.apache.flume.sink.hbase.HBaseSink*
JsonAgent.sinks.Hbase-sink.channel =hdfs-channel-1
JsonAgent.sinks.Hbase-sink.table =trading:MARKETDATAHBASEBATCH
JsonAgent.sinks.Hbase-sink.columnFamily=PRICE_INFO
JsonAgent.sinks.Hbase-sink.serializer=org.apache.flume.sink.hbase.RegexHbaseEventSerializer
*JsonAgent.sinks.Hbase-sink.serializer.regex =(.+),(.+),(.+),(.+)*

*JsonAgent.sinks.Hbase-sink.serializer.rowKeyIndex = 0
JsonAgent.sinks.Hbase-sink.serializer.colNames = ROW_KEY,ticker,timeissued,price*
JsonAgent.sinks.Hbase-sink.serializer.regexIgnoreCase = true
JsonAgent.sinks.Hbase-sink.batchSize =100

This works and posts records to Hbase as follows:

ROWCOLUMN+CELL
 {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928"
column=PRICE_INFO:price, timestamp=1581711715292, value= "price":140.11}
 {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928"
column=PRICE_INFO:ticker, timestamp=1581711715292, value="ticker":"IBM"
 {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928"
column=PRICE_INFO:timeissued, timestamp=1581711715292, value=
"timeissued":"2020-02-14T20:32:29"
1 row(s) in 0.0050 seconds

However, there is a problem: the rowkey value includes the redundant
characters {"rowkey": which prevent records from being searched in Hbase
by rowkey value! When I try to drop the redundant characters by tweaking
the regex, unfortunately no rows are added to the Hbase table at all.
Example as follows:

JsonAgent.sinks.Hbase-sink.serializer.regex = (?<=^.{9}).+,(.+),(.+),(.+)
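
For comparison, a sketch of an alternative regex (untested on my side, and
reusing the same property names as above) that captures the bare UUID in a
capture group instead of skipping a fixed number of characters, so the UUID
itself becomes the rowkey:

JsonAgent.sinks.Hbase-sink.serializer.regex = .*"rowkey":"([^"]+)".*"ticker":"([^"]+)".*"timeissued":"([^"]+)".*"price":([0-9.]+).*
JsonAgent.sinks.Hbase-sink.serializer.rowKeyIndex = 0
JsonAgent.sinks.Hbase-sink.serializer.colNames = ROW_KEY,ticker,timeissued,price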

Appreciate any advice.

Thanks,

Mich


Re: How to insert Json records from Flume into Hbase table

2020-02-14 Thread Mich Talebzadeh
Thanks Pedro. The mention of
*org.apache.flume.sink.hbase.RegexHbaseEventSerializer* was very useful.

This works

# Describing/Configuring the sink
JsonAgent.channels.hdfs-channel-1.type = memory
JsonAgent.channels.hdfs-channel-1.capacity = 300
JsonAgent.channels.hdfs-channel-1.transactionCapacity = 100
*JsonAgent.sinks.Hbase-sink.type = org.apache.flume.sink.hbase.HBaseSink*
JsonAgent.sinks.Hbase-sink.channel =hdfs-channel-1
JsonAgent.sinks.Hbase-sink.table =trading:MARKETDATAHBASEBATCH
JsonAgent.sinks.Hbase-sink.columnFamily=PRICE_INFO
JsonAgent.sinks.Hbase-sink.serializer=org.apache.flume.sink.hbase.RegexHbaseEventSerializer
*JsonAgent.sinks.Hbase-sink.serializer.regex =(.+),(.+),(.+),(.+)*

*JsonAgent.sinks.Hbase-sink.serializer.rowKeyIndex = 0
JsonAgent.sinks.Hbase-sink.serializer.colNames = ROW_KEY,ticker,timeissued,price*
JsonAgent.sinks.Hbase-sink.serializer.regexIgnoreCase = true
JsonAgent.sinks.Hbase-sink.batchSize =100

This is the record sent via Kafka

7d645a0f-0386-4405-8af1-7fca908fe928
{"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928","ticker":"IBM",
"timeissued":"2020-02-14T20:32:29", "price":140.11}

And the same record in Hbase

 ROWCOLUMN+CELL
 {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928"
column=PRICE_INFO:price, timestamp=1581711715292, value= "price":140.11}
 {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928"
column=PRICE_INFO:ticker, timestamp=1581711715292, value="ticker":"IBM"
 {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928"
column=PRICE_INFO:timeissued, timestamp=1581711715292, value=
"timeissued":"2020-02-14T20:32:29"
1 row(s) in 0.0050 seconds

Regards,

Mich






On Fri, 14 Feb 2020 at 13:41, Pedro Boado  wrote:

> Probably Flume's mailing list would be a better resource to get help about
> this.
>
> SimpleHBaseEventSerializer doesn't do regex, so you can't extract your own
> .
>
> https://github.com/slmnhq/flume/blob/master/flume-ng-sinks/flume-ng-hbase-sink/src/main/java/org/apache/flume/sink/hbase/SimpleHbaseEventSerializer.java#L40
>
> I'd say you should go for RegexHbaseEventRowKeySerializer.
>
>
>
> On Fri, 14 Feb 2020 at 13:27, Mich Talebzadeh 
> wrote:
>
> > Thanks Pedro,
> >
> > As I understand it tries a default rowkey as follows:
> >
> > Row keys are default + UUID_like_string
> > :
> >  defaultfb7cb953-8598-466e-a1c0-277e2863b249
> >
> > But I send rowkey value as well
> >
> > *f2d7174e-6299-49a7-9e87-0d66c248e66b*
> > {"rowkey":"f2d7174e-6299-49a7-9e87-0d66c248e66b","ticker":"BP",
> > "timeissued":"2020-02-14T08:54:13", "price":573.25}
> >
> > But it is still generates its own rowkey. -->
> > defaultfb7cb953-8598-466e-a1c0-277e2863b249
> >
> > How can I make Hbase use the rowkey that flume sends WITHOUT generating
> its
> > own rowkey?
> >
> > Regards,
> >
> > Mich
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> >
> > On Fri, 14 Feb 2020 at 12:27, Pedro Boado  wrote:
> >
> > > If what you're looking after is not achievable by extracting fields
> > through
> > > regex (it looks like it should) and you are after full control over
> > what's
> > > written to HBase you're probably looking at writing your own
> serializer.
> > >
> > > On Fri, 14 Feb 2020 at 11:05, Mich Talebzadeh <
> mich.talebza...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I have an Hbase table 'trading:MARKETDATAHBASEBATCH'
> > > >
> > > > Kafka delivers topic rows into flume.
> > > >
> > > > This is a typical json row
> > > >
> > > > f2d7174e-6299-49a7-9e87-0d66c248e66b
> > > > {"rowkey":"f2d7174e-6299-49a7-9e87-0d66c248e66b","ticker":"BP",
> > > > "timeissued":"2020-02-14T08:54:13", "price":573.25}
> > > >
> > > > The rowkey is UUID
> > > >
> > > > The json.conf for Flume is as follows:
> > > >
> > > > # Describing/Confi

Re: How to insert Json records from Flume into Hbase table

2020-02-14 Thread Mich Talebzadeh
Thanks Pedro,

As I understand it, the serializer generates a default rowkey as follows:

Row keys are default + UUID_like_string, e.g.:
 defaultfb7cb953-8598-466e-a1c0-277e2863b249

But I send rowkey value as well

*f2d7174e-6299-49a7-9e87-0d66c248e66b*
{"rowkey":"f2d7174e-6299-49a7-9e87-0d66c248e66b","ticker":"BP",
"timeissued":"2020-02-14T08:54:13", "price":573.25}

But it still generates its own rowkey -->
defaultfb7cb953-8598-466e-a1c0-277e2863b249

How can I make Hbase use the rowkey that flume sends WITHOUT generating its
own rowkey?
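
(For reference, the configuration that the later follow-ups to this thread,
archived above, report as working: switch from SimpleHbaseEventSerializer to
RegexHbaseEventSerializer and point rowKeyIndex at the first capture group,
roughly as follows.)

JsonAgent.sinks.Hbase-sink.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
JsonAgent.sinks.Hbase-sink.serializer.regex = (.+),(.+),(.+),(.+)
JsonAgent.sinks.Hbase-sink.serializer.rowKeyIndex = 0
JsonAgent.sinks.Hbase-sink.serializer.colNames = ROW_KEY,ticker,timeissued,price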

Regards,

Mich


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Fri, 14 Feb 2020 at 12:27, Pedro Boado  wrote:

> If what you're looking after is not achievable by extracting fields through
> regex (it looks like it should) and you are after full control over what's
> written to HBase you're probably looking at writing your own serializer.
>
> On Fri, 14 Feb 2020 at 11:05, Mich Talebzadeh 
> wrote:
>
> > Hi,
> >
> > I have an Hbase table 'trading:MARKETDATAHBASEBATCH'
> >
> > Kafka delivers topic rows into flume.
> >
> > This is a typical json row
> >
> > f2d7174e-6299-49a7-9e87-0d66c248e66b
> > {"rowkey":"f2d7174e-6299-49a7-9e87-0d66c248e66b","ticker":"BP",
> > "timeissued":"2020-02-14T08:54:13", "price":573.25}
> >
> > The rowkey is UUID
> >
> > The json.conf for Flume is as follows:
> >
> > # Describing/Configuring the sink
> > JsonAgent.channels.hdfs-channel-1.type = memory
> > JsonAgent.channels.hdfs-channel-1.capacity = 300
> > JsonAgent.channels.hdfs-channel-1.transactionCapacity = 100
> > JsonAgent.sinks.Hbase-sink.type = org.apache.flume.sink.hbase.HBaseSink
> > JsonAgent.sinks.Hbase-sink.channel =hdfs-channel-1
> > JsonAgent.sinks.Hbase-sink.table =trading:MARKETDATAHBASEBATCH
> > JsonAgent.sinks.Hbase-sink.columnFamily=PRICE_INFO
> > JsonAgent.sinks.Hbase-sink.serializer
> > =org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
> > ##JsonAgent.sinks.Hbase-sink.serializer.regex =(.+),(.+),(.+),(.+)
> > agent1.sinks.sink1.serializer.regex
> > =[a-zA-Z0-9]*^C[a-zA-Z0-9]*^C[a-zA-Z0-9]*
> > JsonAgent.sinks.Hbase-sink.serializer.rowKeyIndex  = ROW_KEY
> > JsonAgent.sinks.Hbase-sink.serializer.colNames
> > =ROW_KEY,ticker,timeissued,price
> > JsonAgent.sinks.Hbase-sink.serializer.regexIgnoreCase = true
> > JsonAgent.sinks.Hbase-sink.batchSize =100
> >
> > The problem is that the rows are inserted as follows
> >
> > defaultff90d8d3-d8c5-40ff-bc37-6ee1544988c1
> > column=PRICE_INFO:pCol, timestamp=1581670394182,
> > value={"rowkey":"a7464cf4-42a1-40b8-a597-a41fbc3b847f","ticker":"MRW",
> > "timeissued":"2020-02-14T09:03:46", "price":317.13}
> >
> > So it creates a default rowkey value
> > "defaultff90d8d3-d8c5-40ff-bc37-6ee1544988c1" followed by json values all
> > in value column
> >
> > Ideally I would like something similar to below:
> >
> > hbase(main):085:0> put 'trading:MARKETDATAHBASEBATCH',
> > "8b97d3b9-e87b-4f21-9879-b43c4dcccb37", 'PRICE_INFO:rowkey',
> > "8b97d3b9-e87b-4f21-9879-b43c4dcccb37"
> > hbase(main):086:0> put 'trading:MARKETDATAHBASEBATCH',
> > "8b97d3b9-e87b-4f21-9879-b43c4dcccb37", 'PRICE_INFO:ticker', "ORCL"
> > hbase(main):087:0> put 'trading:MARKETDATAHBASEBATCH',
> > "8b97d3b9-e87b-4f21-9879-b43c4dcccb37", 'PRICE_INFO:timeissued',
> > "2020-02-14T09:57:32"
> > hbase(main):001:0> put 'trading:MARKETDATAHBASEBATCH',
> > "8b97d3b9-e87b-4f21-9879-b43c4dcccb37", 'PRICE_INFO:price' ,22.12
> > hbase(main):002:0> get 'trading:MARKETDATAHBASEBATCH',
> > "8b97d3b9-e87b-4f21-9879-b43c4dcccb37"
> > COLUMN CELL
> >  PRICE_INFO:price
> > timestamp=1581676221846, value=22.12
> >  PRICE_INFO:rowkey
> > timestamp=1581675986932, value=8b97d3b9-e87b-4f21-9879-b43c4dcccb37
> >  PRICE_INFO:ticker
> > timestamp=1581676103443, value=ORCL
> >  PRICE_INFO:timeissued
> > timestamp=1581676168656, value=2020-02-14T09:57:32
> >
> > Any advice would be appreciated.
> >
> > Thanks,
> >
> > Mich
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
>
>
> --
> Un saludo.
> Pedro Boado.
>


How to insert Json records from Flume into Hbase table

2020-02-14 Thread Mich Talebzadeh
Hi,

I have an Hbase table 'trading:MARKETDATAHBASEBATCH'

Kafka delivers topic rows into flume.

This is a typical json row

f2d7174e-6299-49a7-9e87-0d66c248e66b
{"rowkey":"f2d7174e-6299-49a7-9e87-0d66c248e66b","ticker":"BP",
"timeissued":"2020-02-14T08:54:13", "price":573.25}

The rowkey is UUID

The json.conf for Flume is as follows:

# Describing/Configuring the sink
JsonAgent.channels.hdfs-channel-1.type = memory
JsonAgent.channels.hdfs-channel-1.capacity = 300
JsonAgent.channels.hdfs-channel-1.transactionCapacity = 100
JsonAgent.sinks.Hbase-sink.type = org.apache.flume.sink.hbase.HBaseSink
JsonAgent.sinks.Hbase-sink.channel =hdfs-channel-1
JsonAgent.sinks.Hbase-sink.table =trading:MARKETDATAHBASEBATCH
JsonAgent.sinks.Hbase-sink.columnFamily=PRICE_INFO
JsonAgent.sinks.Hbase-sink.serializer
=org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
##JsonAgent.sinks.Hbase-sink.serializer.regex =(.+),(.+),(.+),(.+)
agent1.sinks.sink1.serializer.regex
=[a-zA-Z0-9]*^C[a-zA-Z0-9]*^C[a-zA-Z0-9]*
JsonAgent.sinks.Hbase-sink.serializer.rowKeyIndex  = ROW_KEY
JsonAgent.sinks.Hbase-sink.serializer.colNames
=ROW_KEY,ticker,timeissued,price
JsonAgent.sinks.Hbase-sink.serializer.regexIgnoreCase = true
JsonAgent.sinks.Hbase-sink.batchSize =100

The problem is that the rows are inserted as follows

defaultff90d8d3-d8c5-40ff-bc37-6ee1544988c1
column=PRICE_INFO:pCol, timestamp=1581670394182,
value={"rowkey":"a7464cf4-42a1-40b8-a597-a41fbc3b847f","ticker":"MRW",
"timeissued":"2020-02-14T09:03:46", "price":317.13}

So it creates a default rowkey value
"defaultff90d8d3-d8c5-40ff-bc37-6ee1544988c1", with all the JSON values
lumped into a single value column

Ideally I would like something similar to below:

hbase(main):085:0> put 'trading:MARKETDATAHBASEBATCH',
"8b97d3b9-e87b-4f21-9879-b43c4dcccb37", 'PRICE_INFO:rowkey',
"8b97d3b9-e87b-4f21-9879-b43c4dcccb37"
hbase(main):086:0> put 'trading:MARKETDATAHBASEBATCH',
"8b97d3b9-e87b-4f21-9879-b43c4dcccb37", 'PRICE_INFO:ticker', "ORCL"
hbase(main):087:0> put 'trading:MARKETDATAHBASEBATCH',
"8b97d3b9-e87b-4f21-9879-b43c4dcccb37", 'PRICE_INFO:timeissued',
"2020-02-14T09:57:32"
hbase(main):001:0> put 'trading:MARKETDATAHBASEBATCH',
"8b97d3b9-e87b-4f21-9879-b43c4dcccb37", 'PRICE_INFO:price' ,22.12
hbase(main):002:0> get 'trading:MARKETDATAHBASEBATCH',
"8b97d3b9-e87b-4f21-9879-b43c4dcccb37"
COLUMN CELL
 PRICE_INFO:price
timestamp=1581676221846, value=22.12
 PRICE_INFO:rowkey
timestamp=1581675986932, value=8b97d3b9-e87b-4f21-9879-b43c4dcccb37
 PRICE_INFO:ticker
timestamp=1581676103443, value=ORCL
 PRICE_INFO:timeissued
timestamp=1581676168656, value=2020-02-14T09:57:32

Any advice would be appreciated.

Thanks,

Mich

*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Phoenix CsvBulkLoadTool fails with java.sql.SQLException: ERROR 103 (08004): Unable to establish connection

2018-08-20 Thread Mich Talebzadeh
This was working fine before my Hbase upgrade to 1.2.6

I have Hbase version 1.2.6 and Phoenix
version apache-phoenix-4.8.1-HBase-1.2-bin

This command, which bulk loads into Hbase through Phoenix, now fails

HADOOP_CLASSPATH=${HOME}/jars/hbase-protocol-1.2.6.jar:${HBASE_HOME}/conf
hadoop jar ${HBASE_HOME}/lib/phoenix-4.8.1-HBase-1.2-client.jar
org.apache.phoenix.mapreduce.CsvBulkLoadTool --table ${TABLE_NAME} --input
hdfs://rhes75:9000/${REFINED_HBASE_SUB_DIR}/${FILE_NAME}_${dir}.txt

hadoop jar /data6/hduser/hbase-1.2.6/lib/phoenix-4.8.1-HBase-1.2-client.jar
org.apache.phoenix.mapreduce.CsvBulkLoadTool --table MARKETDATAHBASEBATCH
--input
hdfs://rhes75:9000//data/prices/2018-08-20_refined/populate_Phoenix_table_MARKETDATAHBASEBATCH_2018-08-20.txt
+
HADOOP_CLASSPATH=/home/hduser/jars/hbase-protocol-1.2.6.jar:/data6/hduser/hbase-1.2.6/conf


With the following error

2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper: Client
environment:java.library.path=/home/hduser/hadoop-3.1.0/lib
2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/tmp
2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper: Client
environment:java.compiler=
2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper: Client
environment:os.name=Linux
2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper: Client
environment:os.arch=amd64
2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper: Client
environment:os.version=3.10.0-862.3.2.el7.x86_64
2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper: Client
environment:user.name=hduser
2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper: Client
environment:user.home=/home/hduser
2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper: Client
environment:user.dir=/data6/hduser/streaming_data/2018-08-20
2018-08-20 18:29:47,249 INFO  [main] zookeeper.ZooKeeper: Initiating client
connection, connectString=rhes75:2181 sessionTimeout=9
watcher=hconnection-0x493d44230x0, quorum=rhes75:2181, baseZNode=/hbase
2018-08-20 18:29:47,261 INFO  [main-SendThread(rhes75:2181)]
zookeeper.ClientCnxn: Opening socket connection to server rhes75/
50.140.197.220:2181. Will not attempt to authenticate using SASL (unknown
error)
2018-08-20 18:29:47,264 INFO  [main-SendThread(rhes75:2181)]
zookeeper.ClientCnxn: Socket connection established to rhes75/
50.140.197.220:2181, initiating session
2018-08-20 18:29:47,281 INFO  [main-SendThread(rhes75:2181)]
zookeeper.ClientCnxn:
Session establishment complete on server rhes75/50.140.197.220:2181,
sessionid = 0x1002ea99eed0077, negotiated timeout = 4
Exception in thread "main" java.sql.SQLException: ERROR 103 (08004): Unable
to establish connection.
at
org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:455)
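
For what it is worth, two hedged things I would check (assumptions on my
part, since only the HBase side changed in the upgrade): that the Phoenix
client jar still matches the Phoenix server jar deployed under
$HBASE_HOME/lib, and that the tool is pointed at the right quorum, which can
be passed explicitly with its -z/--zookeeper option:

hadoop jar ${HBASE_HOME}/lib/phoenix-4.8.1-HBase-1.2-client.jar \
  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  -z rhes75:2181 \
  --table MARKETDATAHBASEBATCH \
  --input hdfs://rhes75:9000/data/prices/2018-08-20_refined/populate_Phoenix_table_MARKETDATAHBASEBATCH_2018-08-20.txt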

Any thoughts?

Thanks

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


ImportTsv fails with expecting hadoop-mapreduce-client-core-2.5.1.jar in hdfs!

2018-08-19 Thread Mich Talebzadeh
I am trying to import data into an Hbase table from a csv file.

The version of Hbase is 1.2.6

This used to work in the older version of Hbase

 $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv
-Dimporttsv.separator=','
-Dimporttsv.columns="HBASE_ROW_KEY,tock_daily:stock,stock_daily:ticker,stock_daily:Date,stock_daily:open,stock_daily:high,stock_daily:low,stock_daily:close,stock_daily:volume"
-Dimporttsv.skip.bad.lines=true tsco hdfs://rhes75:9000/data/stocks/tsco.csv

But now it throws the following error with "File does not exist:
hdfs://rhes75:9000/data6/hduser/hbase-1.2.6/lib/hadoop-mapreduce-client-core-2.5.1.jar"

2018-08-19 19:11:09,445 INFO  [main] Configuration.deprecation:
io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2018-08-19 19:11:09,557 INFO  [main] mapreduce.JobSubmitter: Cleaning up
the staging area
file:/tmp/hadoop-hduser/mapred/staging/hduser144058/.staging/job_local144058_0001
Exception in thread "main" java.io.FileNotFoundException: File does not
exist:
hdfs://rhes75:9000/data6/hduser/hbase-1.2.6/lib/hadoop-mapreduce-client-core-2.5.1.jar

I am at a loss as to why it is looking in an HDFS directory for this jar
file. The jar exists in $HBASE_HOME/lib!

ls $HBASE_HOME/lib/hadoop-mapreduce-client-core-2.5.1.jar
/data6/hduser/hbase-1.2.6/lib/hadoop-mapreduce-client-core-2.5.1.jar

This has also been reported here, with a somewhat unconventional solution:

https://stackoverflow.com/questions/5091/running-a-mapreduce-job-fails-file-does-not-exist?rq=1
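
For what it is worth, two hedged workarounds (my guesses, not verified
against this setup): either put the jar where the job claims to need it, or
let HBase assemble the MapReduce classpath itself before launching ImportTsv.

# 1) copy the jar to the HDFS path named in the error
hdfs dfs -mkdir -p /data6/hduser/hbase-1.2.6/lib
hdfs dfs -put $HBASE_HOME/lib/hadoop-mapreduce-client-core-2.5.1.jar /data6/hduser/hbase-1.2.6/lib/

# 2) or export the classpath that HBase computes for MapReduce jobs
export HADOOP_CLASSPATH=$(${HBASE_HOME}/bin/hbase mapredcp):${HBASE_HOME}/conf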

Thanks





Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Re: HBase 2.0.1 with Hadoop 2.8.4 causes NoSuchMethodException

2018-07-02 Thread Mich Talebzadeh
You are lucky that HBASE 2.0.1 worked with Hadoop 2.8

I tried HBASE 2.0.1 with Hadoop 3.1 and there were endless problems with the
region server crashing because of a WAL file system issue.

See the thread "Hbase hbase-2.0.1, region server does not start on Hadoop 3.1".

I decided to roll back to Hbase 1.2.6, which works with Hadoop 3.1.

HTH

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Mon, 2 Jul 2018 at 22:43, Andrey Elenskiy
 wrote:

> <property>
>   <name>hbase.wal.provider</name>
>   <value>filesystem</value>
> </property>
>
> Seems to fix it, but would be nice to actually try the fanout wal with
> hadoop 2.8.4.
>
> On Mon, Jul 2, 2018 at 1:03 PM, Andrey Elenskiy <
> andrey.elens...@arista.com>
> wrote:
>
> > Hello, we are running HBase 2.0.1 with official Hadoop 2.8.4 jars and
> > hadoop 2.8.4 client (http://central.maven.org/maven2/org/apache/hadoop/
> > hadoop-client/2.8.4/). Got the following exception on regionserver which
> > brings it down:
> >
> > 18/07/02 18:51:06 WARN concurrent.DefaultPromise: An exception was
> thrown by org.apache.hadoop.hbase.io
> .asyncfs.FanOutOneBlockAsyncDFSOutputHelper$13.operationComplete()
> > java.lang.Error: Couldn't properly initialize access to HDFS internals.
> Please update your WAL Provider to not make use of the 'asyncfs' provider.
> See HBASE-16110 for more information.
> >  at org.apache.hadoop.hbase.io
> .asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper.(FanOutOneBlockAsyncDFSOutputSaslHelper.java:268)
> >  at org.apache.hadoop.hbase.io
> .asyncfs.FanOutOneBlockAsyncDFSOutputHelper.initialize(FanOutOneBlockAsyncDFSOutputHelper.java:661)
> >  at org.apache.hadoop.hbase.io
> .asyncfs.FanOutOneBlockAsyncDFSOutputHelper.access$300(FanOutOneBlockAsyncDFSOutputHelper.java:118)
> >  at org.apache.hadoop.hbase.io
> .asyncfs.FanOutOneBlockAsyncDFSOutputHelper$13.operationComplete(FanOutOneBlockAsyncDFSOutputHelper.java:720)
> >  at org.apache.hadoop.hbase.io
> .asyncfs.FanOutOneBlockAsyncDFSOutputHelper$13.operationComplete(FanOutOneBlockAsyncDFSOutputHelper.java:715)
> >  at
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
> >  at
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500)
> >  at
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479)
> >  at
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
> >  at
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
> >  at
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
> >  at
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.fulfillConnectPromise(AbstractEpollChannel.java:638)
> >  at
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:676)
> >  at
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:552)
> >  at
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:394)
> >  at
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:304)
> >  at
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
> >  at
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> >  at java.lang.Thread.run(Thread.java:748)
> >  Caused by: java.lang.NoSuchMethodException:
> org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(org.apache.hadoop.fs.FileEncryptionInfo)
> >  at java.lang.Class.getDeclaredMethod(Class.java:2130)
> >  at org.apache.hadoop.hbase.io
> .asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper.createTransparentCryptoHelper(FanOutOneBlockAsyncDFSOutputSaslHelper.java:232)
> >  at org.apache.hadoop.hbase.io
> .asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper.(FanOutOneBlockAsyncDFSOutputSaslHelper.java:262)
> >  ... 18 more
> >
> >  FYI, we don't have encryption enabled. Let me know if you need more info
> > about our setup.
> >
>


Re: Hbase hbase-2.0.1, region server does not start on Hadoop 3.1

2018-07-02 Thread Mich Talebzadeh
Hi Sean,

Many thanks for the clarification. I read some notes on GitHub and JIRAs
for Hbase and Hadoop 3 integration.

So my decision was to revert to an earlier stable version of Hbase, as I did
not have the bandwidth to try to make Hbase work with Hadoop 3+.

In fairness to Ted, he has always been very knowledgeable and helpful on the
forum, and being an engineer myself, I do not think his suggestion was far
off.

Kind Regards,

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Mon, 2 Jul 2018 at 14:27, Sean Busbey  wrote:

> Hi Mich,
>
> Please check out the section of our reference guide on Hadoop versions:
>
> http://hbase.apache.org/book.html#hadoop
>
> the short version is that there is not yet a Hadoop 3 version that the
> HBase community considers appropriate for running HBase. if you'd like
> to get into details and work arounds, please join the dev@hbase
> mailing list and bring it up there.
>
> Ted, please stop suggesting folks on the user list use anything other
> than PMC sanctioned releases of HBase.
>
> On Sun, Jul 1, 2018 at 1:09 AM, Mich Talebzadeh
>  wrote:
> > Hi,
> >
> > What is the ETA with version of Hbase that will work with Hadoop 3.1 and
> > may not require HA setup for HDFS?
> >
> > Thanks
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn *
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> >
> > On Sun, 1 Jul 2018 at 00:26, Mich Talebzadeh 
> > wrote:
> >
> >> Thanks Ted.
> >>
> >> Went back to hbase-1.2.6 that works OK with Hadoop 3.1
> >>
> >> Dr Mich Talebzadeh
> >>
> >>
> >>
> >> LinkedIn *
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >> <
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >*
> >>
> >>
> >>
> >> http://talebzadehmich.wordpress.com
> >>
> >>
> >> *Disclaimer:* Use it at your own risk. Any and all responsibility for
> any
> >> loss, damage or destruction of data or any other property which may
> arise
> >> from relying on this email's technical content is explicitly disclaimed.
> >> The author will in no case be liable for any monetary damages arising
> from
> >> such loss, damage or destruction.
> >>
> >>
> >>
> >>
> >> On Sun, 1 Jul 2018 at 00:15, Ted Yu  wrote:
> >>
> >>> Have you tried setting the value for the config to filesystem ?
> >>>
> >>> Cheers
> >>>
> >>> On Sat, Jun 30, 2018 at 4:07 PM, Mich Talebzadeh <
> >>> mich.talebza...@gmail.com>
> >>> wrote:
> >>>
> >>> > One way would be to set WAL outside of Hadoop environment. Will that
> >>> work?
> >>> >
> >>> > The following did not work
> >>> >
> >>> > 
> >>> >   hbase.wal.provider
> >>> >   multiwal
> >>> > 
> >>> >
> >>> >
> >>> > Dr Mich Talebzadeh
> >>> >
> >>> >
> >>> >
> >>> > LinkedIn * https://www.linkedin.com/profile/view?id=
> >>> > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >>> > <
> >>>
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> >>> > OABUrV8Pw>*
> >>> >
> >>> >
> >

Re: Hbase hbase-2.0.1, region server does not start on Hadoop 3.1

2018-07-01 Thread Mich Talebzadeh
Hi,

What is the ETA for a version of Hbase that will work with Hadoop 3.1 and
may not require an HA setup for HDFS?

Thanks

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Sun, 1 Jul 2018 at 00:26, Mich Talebzadeh 
wrote:

> Thanks Ted.
>
> Went back to hbase-1.2.6 that works OK with Hadoop 3.1
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Sun, 1 Jul 2018 at 00:15, Ted Yu  wrote:
>
>> Have you tried setting the value for the config to filesystem ?
>>
>> Cheers
>>
>> On Sat, Jun 30, 2018 at 4:07 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com>
>> wrote:
>>
>> > One way would be to set WAL outside of Hadoop environment. Will that
>> work?
>> >
>> > The following did not work
>> >
>> > 
>> >   hbase.wal.provider
>> >   multiwal
>> > 
>> >
>> >
>> > Dr Mich Talebzadeh
>> >
>> >
>> >
>> > LinkedIn * https://www.linkedin.com/profile/view?id=
>> > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> > <
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
>> > OABUrV8Pw>*
>> >
>> >
>> >
>> > http://talebzadehmich.wordpress.com
>> >
>> >
>> > *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any
>> > loss, damage or destruction of data or any other property which may
>> arise
>> > from relying on this email's technical content is explicitly disclaimed.
>> > The author will in no case be liable for any monetary damages arising
>> from
>> > such loss, damage or destruction.
>> >
>> >
>> >
>> >
>> > On Sat, 30 Jun 2018 at 23:36, Ted Yu  wrote:
>> >
>> > > Please read :
>> > >
>> > > http://hbase.apache.org/book.html#wal.providers
>> > >
>> > > On Sat, Jun 30, 2018 at 3:31 PM, Mich Talebzadeh <
>> > > mich.talebza...@gmail.com>
>> > > wrote:
>> > >
>> > > > Thanks
>> > > >
>> > > > In your point below
>> > > >
>> > > > …. or you can change default WAL to FSHLog.
>> > > >
>> > > > is there any configuration parameter to allow me to do so in
>> > > > hbase-site.xml?
>> > > >
>> > > > Dr Mich Talebzadeh
>> > > >
>> > > >
>> > > >
>> > > > LinkedIn * https://www.linkedin.com/profile/view?id=
>> > > > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> > > > <https://www.linkedin.com/profile/view?id=
>> > AAEWh2gBxianrbJd6zP6AcPCCd
>> > > > OABUrV8Pw>*
>> > > >
>> > > >
>> > > >
>> > > > http://talebzadehmich.wordpress.com
>> > > >
>> > > >
>> > > > *Disclaimer:* Use it at your own risk. Any and all responsibility
>> for
>> > any
>> > > > loss, damage or destruction of data or any other property which may
>> > arise
>> > > > from relying on this email's technical content is explicitly
>> > disclaimed.
>> > > > The author will in no case be liable for any monetary damages
>> arising
>> > > from
>> > > > such loss, damage or destruction.
>> > > >
>> > > >
>> > > >
>> > > >
>&

Re: Hbase hbase-2.0.1, region server does not start on Hadoop 3.1

2018-06-30 Thread Mich Talebzadeh
Thanks Ted.

I went back to hbase-1.2.6, which works OK with Hadoop 3.1.

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Sun, 1 Jul 2018 at 00:15, Ted Yu  wrote:

> Have you tried setting the value for the config to filesystem ?
>
> Cheers
>
> On Sat, Jun 30, 2018 at 4:07 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> wrote:
>
> > One way would be to set WAL outside of Hadoop environment. Will that
> work?
> >
> > The following did not work
> >
> > 
> >   hbase.wal.provider
> >   multiwal
> > 
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn * https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> > OABUrV8Pw>*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> >
> > On Sat, 30 Jun 2018 at 23:36, Ted Yu  wrote:
> >
> > > Please read :
> > >
> > > http://hbase.apache.org/book.html#wal.providers
> > >
> > > On Sat, Jun 30, 2018 at 3:31 PM, Mich Talebzadeh <
> > > mich.talebza...@gmail.com>
> > > wrote:
> > >
> > > > Thanks
> > > >
> > > > In your point below
> > > >
> > > > …. or you can change default WAL to FSHLog.
> > > >
> > > > is there any configuration parameter to allow me to do so in
> > > > hbase-site.xml?
> > > >
> > > > Dr Mich Talebzadeh
> > > >
> > > >
> > > >
> > > > LinkedIn * https://www.linkedin.com/profile/view?id=
> > > > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > > > <https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCd
> > > > OABUrV8Pw>*
> > > >
> > > >
> > > >
> > > > http://talebzadehmich.wordpress.com
> > > >
> > > >
> > > > *Disclaimer:* Use it at your own risk. Any and all responsibility for
> > any
> > > > loss, damage or destruction of data or any other property which may
> > arise
> > > > from relying on this email's technical content is explicitly
> > disclaimed.
> > > > The author will in no case be liable for any monetary damages arising
> > > from
> > > > such loss, damage or destruction.
> > > >
> > > >
> > > >
> > > >
> > > > On Sat, 30 Jun 2018 at 23:25, Ted Yu  wrote:
> > > >
> > > > > Do you plan to deploy onto hadoop 3.1.x ?
> > > > >
> > > > > If so, you'd better build against hadoop 3.1.x yourself.
> > > > > You can either patch in HBASE-20244 and use asyncfswal.
> > > > > Or you can change default WAL to FSHLog.
> > > > >
> > > > > If you don't have to deploy onto hadoop 3.1.x, you can use hbase
> > 2.0.1
> > > > >
> > > > > FYI
> > > > >
> > > > > On Sat, Jun 30, 2018 at 3:21 PM, Mich Talebzadeh <
> > > > > mich.talebza...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > so what options do I have her?. Is there any conf parameter I can
> > set
> > > > in
> > > > > > hbase-site,xml to make this work? or shall I go back to a more
> > stable
> > > > > > version of Hbase?
> > > > > >
> > > > > > cheers
> > > > > >
> > > > > > Dr Mich Talebzadeh
> > > > > >
> > &

Re: Hbase hbase-2.0.1, region server does not start on Hadoop 3.1

2018-06-30 Thread Mich Talebzadeh
One way would be to set the WAL outside of the Hadoop environment. Will that work?

The following did not work

<property>
  <name>hbase.wal.provider</name>
  <value>multiwal</value>
</property>

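(For the record, the value that the 2018-07-02 follow-up in this archive
reports as working is the filesystem provider, i.e. the FSHLog-based WAL,
rather than multiwal:)

<property>
  <name>hbase.wal.provider</name>
  <value>filesystem</value>
</property>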

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Sat, 30 Jun 2018 at 23:36, Ted Yu  wrote:

> Please read :
>
> http://hbase.apache.org/book.html#wal.providers
>
> On Sat, Jun 30, 2018 at 3:31 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> wrote:
>
> > Thanks
> >
> > In your point below
> >
> > …. or you can change default WAL to FSHLog.
> >
> > is there any configuration parameter to allow me to do so in
> > hbase-site.xml?
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn * https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> > OABUrV8Pw>*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> >
> > On Sat, 30 Jun 2018 at 23:25, Ted Yu  wrote:
> >
> > > Do you plan to deploy onto hadoop 3.1.x ?
> > >
> > > If so, you'd better build against hadoop 3.1.x yourself.
> > > You can either patch in HBASE-20244 and use asyncfswal.
> > > Or you can change default WAL to FSHLog.
> > >
> > > If you don't have to deploy onto hadoop 3.1.x, you can use hbase 2.0.1
> > >
> > > FYI
> > >
> > > On Sat, Jun 30, 2018 at 3:21 PM, Mich Talebzadeh <
> > > mich.talebza...@gmail.com>
> > > wrote:
> > >
> > > > so what options do I have her?. Is there any conf parameter I can set
> > in
> > > > hbase-site,xml to make this work? or shall I go back to a more stable
> > > > version of Hbase?
> > > >
> > > > cheers
> > > >
> > > > Dr Mich Talebzadeh
> > > >
> > > >
> > > >
> > > > LinkedIn * https://www.linkedin.com/profile/view?id=
> > > > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > > > <https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCd
> > > > OABUrV8Pw>*
> > > >
> > > >
> > > >
> > > > http://talebzadehmich.wordpress.com
> > > >
> > > >
> > > > *Disclaimer:* Use it at your own risk. Any and all responsibility for
> > any
> > > > loss, damage or destruction of data or any other property which may
> > arise
> > > > from relying on this email's technical content is explicitly
> > disclaimed.
> > > > The author will in no case be liable for any monetary damages arising
> > > from
> > > > such loss, damage or destruction.
> > > >
> > > >
> > > >
> > > >
> > > > On Sat, 30 Jun 2018 at 23:15, Ted Yu  wrote:
> > > >
> > > > > trunk version would correspond to hbase 3.0 which has lot more
> > changes
> > > > > compared to hbase 2.
> > > > > The trunk build wouldn't serve you if your goal is to run hbase on
> > > hadoop
> > > > > 3.1 (see HBASE-20244)
> > > > >
> > > > > FYI
> > > > >
> > > > > On Sat, Jun 30, 2018 at 3:11 PM, Mich Talebzadeh <
> > > > > mich.talebza...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks Ted.
> > > > > >
> > > > > > I downloaded the latest Hbase binary which is 2.0.1 2018/06/19
> > > > > >
> > > > > > Is there any trunc version build for Hadoop 3.1 please and if so
> > >

Re: Hbase hbase-2.0.1, region server does not start on Hadoop 3.1

2018-06-30 Thread Mich Talebzadeh
Thanks

In your point below

…. or you can change default WAL to FSHLog.

Is there any configuration parameter that allows me to do so in hbase-site.xml?

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Sat, 30 Jun 2018 at 23:25, Ted Yu  wrote:

> Do you plan to deploy onto hadoop 3.1.x ?
>
> If so, you'd better build against hadoop 3.1.x yourself.
> You can either patch in HBASE-20244 and use asyncfswal.
> Or you can change default WAL to FSHLog.
>
> If you don't have to deploy onto hadoop 3.1.x, you can use hbase 2.0.1
>
> FYI
>
> On Sat, Jun 30, 2018 at 3:21 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> wrote:
>
> > so what options do I have her?. Is there any conf parameter I can set in
> > hbase-site,xml to make this work? or shall I go back to a more stable
> > version of Hbase?
> >
> > cheers
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn * https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> > OABUrV8Pw>*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> >
> > On Sat, 30 Jun 2018 at 23:15, Ted Yu  wrote:
> >
> > > trunk version would correspond to hbase 3.0 which has lot more changes
> > > compared to hbase 2.
> > > The trunk build wouldn't serve you if your goal is to run hbase on
> hadoop
> > > 3.1 (see HBASE-20244)
> > >
> > > FYI
> > >
> > > On Sat, Jun 30, 2018 at 3:11 PM, Mich Talebzadeh <
> > > mich.talebza...@gmail.com>
> > > wrote:
> > >
> > > > Thanks Ted.
> > > >
> > > > I downloaded the latest Hbase binary which is 2.0.1 2018/06/19
> > > >
> > > > Is there any trunc version build for Hadoop 3.1 please and if so
> where
> > > can
> > > > I download it?
> > > >
> > > > Regards,
> > > >
> > > > Dr Mich Talebzadeh
> > > >
> > > >
> > > >
> > > > LinkedIn * https://www.linkedin.com/profile/view?id=
> > > > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > > > <https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCd
> > > > OABUrV8Pw>*
> > > >
> > > >
> > > >
> > > > http://talebzadehmich.wordpress.com
> > > >
> > > >
> > > > *Disclaimer:* Use it at your own risk. Any and all responsibility for
> > any
> > > > loss, damage or destruction of data or any other property which may
> > arise
> > > > from relying on this email's technical content is explicitly
> > disclaimed.
> > > > The author will in no case be liable for any monetary damages arising
> > > from
> > > > such loss, damage or destruction.
> > > >
> > > >
> > > >
> > > >
> > > > On Sat, 30 Jun 2018 at 22:52, Ted Yu  wrote:
> > > >
> > > > > Which hadoop release was the 2.0.1 built against ?
> > > > >
> > > > > In order to build hbase 2 against hadoop 3.0.1+ / 3.1.0+, you will
> > need
> > > > > HBASE-20244.
> > > > >
> > > > > FYI
> > > > >
> > > > > On Sat, Jun 30, 2018 at 2:34 PM, Mich Talebzadeh <
> > > > > mich.talebza...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > I am using the following hbase-site.xml
> > > > > >
> > > > > > 
> > > > > >   
> &

Re: Hbase hbase-2.0.1, region server does not start on Hadoop 3.1

2018-06-30 Thread Mich Talebzadeh
So what options do I have here? Is there any conf parameter I can set in
hbase-site.xml to make this work, or shall I go back to a more stable
version of Hbase?

cheers

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Sat, 30 Jun 2018 at 23:15, Ted Yu  wrote:

> trunk version would correspond to hbase 3.0 which has lot more changes
> compared to hbase 2.
> The trunk build wouldn't serve you if your goal is to run hbase on hadoop
> 3.1 (see HBASE-20244)
>
> FYI
>
> On Sat, Jun 30, 2018 at 3:11 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> wrote:
>
> > Thanks Ted.
> >
> > I downloaded the latest Hbase binary which is 2.0.1 2018/06/19
> >
> > Is there any trunc version build for Hadoop 3.1 please and if so where
> can
> > I download it?
> >
> > Regards,
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn * https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> > OABUrV8Pw>*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> >
> > On Sat, 30 Jun 2018 at 22:52, Ted Yu  wrote:
> >
> > > Which hadoop release was the 2.0.1 built against ?
> > >
> > > In order to build hbase 2 against hadoop 3.0.1+ / 3.1.0+, you will need
> > > HBASE-20244.
> > >
> > > FYI
> > >
> > > On Sat, Jun 30, 2018 at 2:34 PM, Mich Talebzadeh <
> > > mich.talebza...@gmail.com>
> > > wrote:
> > >
> > > > I am using the following hbase-site.xml
> > > >
> > > > 
> > > >   
> > > > hbase.rootdir
> > > > hdfs://rhes75:9000/hbase
> > > >   
> > > >   
> > > > hbase.zookeeper.property.dataDir
> > > > /home/hduser/zookeeper-3.4.6
> > > >   
> > > > 
> > > > hbase.master
> > > > localhost:6
> > > > 
> > > > 
> > > >   hbase.zookeeper.property.clientPort
> > > >   2181
> > > >
> > > >   
> > > > hbase.cluster.distributed
> > > > true
> > > >   
> > > > 
> > > >  hbase.defaults.for.version.skip
> > > >  true
> > > > 
> > > > 
> > > >  phoenix.query.dateFormatTimeZone
> > > >  Europe/London
> > > > 
> > > > 
> > > > hbase.procedure.store.wal.use.hsync
> > > > false
> > > > `
> > > > 
> > > >   hbase.unsafe.stream.capability.enforce
> > > >   false
> > > > 
> > > > 
> > > >
> > > > master starts OK but region server throws some errors
> > > >
> > > > 2018-06-30 22:23:56,607 INFO  [regionserver/rhes75:16020]
> > > > wal.AbstractFSWAL: WAL configuration: blocksize=256 MB, rollsize=128
> > MB,
> > > > prefix=rhes75%2C16020%2C1530393832024, suffix=,
> > > > logDir=hdfs://rhes75:9000/hbase/WALs/rhes75,16020,153
> > > > 0393832024, archiveDir=hdfs://rhes75:9000/hbase/oldWALs
> > > > 2018-06-30 22:23:56,629 ERROR [regionserver/rhes75:16020]
> > > > regionserver.HRegionServer: ason:
> > > > Type 'org/apache/hadoop/fs/ContentSummary' (current frame,
> > stack[1])
> > > > is
> > > > not assignable to 'org/apache/hadoop/fs/QuotaUsage'
> > > >   Current Frame:
> > > > bci: @105
> > > > flags: { }
> > > > locals: { 'org/apache/hadoop

Re: Hbase hbase-2.0.1, region server does not start on Hadoop 3.1

2018-06-30 Thread Mich Talebzadeh
Thanks Ted.

I downloaded the latest Hbase binary, which is 2.0.1 (2018/06/19).

Is there any trunk version built for Hadoop 3.1, please, and if so where can
I download it?

Regards,

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Sat, 30 Jun 2018 at 22:52, Ted Yu  wrote:

> Which hadoop release was the 2.0.1 built against ?
>
> In order to build hbase 2 against hadoop 3.0.1+ / 3.1.0+, you will need
> HBASE-20244.
>
> FYI
>
> On Sat, Jun 30, 2018 at 2:34 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> wrote:
>
> > I am using the following hbase-site.xml
> >
> > 
> >   
> > hbase.rootdir
> > hdfs://rhes75:9000/hbase
> >   
> >   
> > hbase.zookeeper.property.dataDir
> > /home/hduser/zookeeper-3.4.6
> >   
> > 
> > hbase.master
> > localhost:6
> > 
> > 
> >   hbase.zookeeper.property.clientPort
> >   2181
> >
> >   
> > hbase.cluster.distributed
> > true
> >   
> > 
> >  hbase.defaults.for.version.skip
> >  true
> > 
> > 
> >  phoenix.query.dateFormatTimeZone
> >  Europe/London
> > 
> > 
> > hbase.procedure.store.wal.use.hsync
> > false
> > `
> > 
> >   hbase.unsafe.stream.capability.enforce
> >   false
> > 
> > 
> >
> > master starts OK but region server throws some errors
> >
> > 2018-06-30 22:23:56,607 INFO  [regionserver/rhes75:16020]
> > wal.AbstractFSWAL: WAL configuration: blocksize=256 MB, rollsize=128 MB,
> > prefix=rhes75%2C16020%2C1530393832024, suffix=,
> > logDir=hdfs://rhes75:9000/hbase/WALs/rhes75,16020,153
> > 0393832024, archiveDir=hdfs://rhes75:9000/hbase/oldWALs
> > 2018-06-30 22:23:56,629 ERROR [regionserver/rhes75:16020]
> > regionserver.HRegionServer: ason:
> > Type 'org/apache/hadoop/fs/ContentSummary' (current frame, stack[1])
> > is
> > not assignable to 'org/apache/hadoop/fs/QuotaUsage'
> >   Current Frame:
> > bci: @105
> > flags: { }
> > locals: { 'org/apache/hadoop/fs/ContentSummary',
> > 'org/apache/hadoop/hdfs/protocol/proto/HdfsProtos$
> > ContentSummaryProto$Builder'
> > }
> > stack: {
> > 'org/apache/hadoop/hdfs/protocol/proto/HdfsProtos$
> > ContentSummaryProto$Builder',
> > 'org/apache/hadoop/fs/ContentSummary' }
> >   Bytecode:
> > 0x000: 2ac7 0005 01b0 b805 984c 2b2a b605 99b6
> > 0x010: 059a 2ab6 059b b605 9c2a b605 9db6 059e
> > 0x020: 2ab6 059f b605 a02a b605 a1b6 05a2 2ab6
> > 0x030: 05a3 b605 a42a b605 a5b6 05a6 2ab6 05a7
> > 0x040: b605 a82a b605 a9b6 05aa 2ab6 05ab b605
> > 0x050: ac2a b605 adb6 05ae 572a b605 af9a 000a
> > 0x060: 2ab6 05b0 9900 0c2b 2ab8 0410 b605 b157
> > 0x070: 2bb6 05b2 b0
> >   Stackmap Table:
> > same_frame(@6)
> > append_frame(@103,Object[#2940])
> > same_frame(@112)
> >  *
> > java.lang.VerifyError: Bad type on operand stack
> > Exception Details:
> >   Location:
> >
> > org/apache/hadoop/hdfs/protocolPB/PBHelperClient.
> > convert(Lorg/apache/hadoop/fs/ContentSummary;)Lorg/apache/
> > hadoop/hdfs/protocol/proto/HdfsProtos$ContentSummaryProto;
> > @105: invokestatic
> >   Reason:
> > Type 'org/apache/hadoop/fs/ContentSummary' (current frame, stack[1])
> > is
> > not assignable to 'org/apache/hadoop/fs/QuotaUsage'
> >   Current Frame:
> > bci: @105
> > flags: { }
> > locals: { 'org/apache/hadoop/fs/ContentSummary',
> > 'org/apache/hadoop/hdfs/protocol/proto/HdfsProtos$
> > ContentSummaryProto$Builder'
> > }
> > stack: {
> > 'org/apache/hadoop/hdfs/protocol/proto/HdfsProtos$
> > ContentSummaryProto$Builder',
> > 'org/apache/hadoop/fs/ContentSummary' }
> >   Bytecode:
> > 0x000: 2ac7 0005 01b0 b805 984c 2b2a b605 99b6
> > 0x010: 059a 2ab6 059b b605 9c2a b605 9db6 059e
> > 0x020: 2ab6 059f b605 a02a b605 a1b6 05a2 2ab6
> > 0x03

Hbase hbase-2.0.1, region server does not start on Hadoop 3.1

2018-06-30 Thread Mich Talebzadeh
I am using the following hbase-site.xml

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://rhes75:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/hduser/zookeeper-3.4.6</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>localhost:6</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.defaults.for.version.skip</name>
    <value>true</value>
  </property>
  <property>
    <name>phoenix.query.dateFormatTimeZone</name>
    <value>Europe/London</value>
  </property>
  <property>
    <name>hbase.procedure.store.wal.use.hsync</name>
    <value>false</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
</configuration>

The master starts OK but the region server throws some errors:

2018-06-30 22:23:56,607 INFO  [regionserver/rhes75:16020]
wal.AbstractFSWAL: WAL configuration: blocksize=256 MB, rollsize=128 MB,
prefix=rhes75%2C16020%2C1530393832024, suffix=,
logDir=hdfs://rhes75:9000/hbase/WALs/rhes75,16020,153
0393832024, archiveDir=hdfs://rhes75:9000/hbase/oldWALs
2018-06-30 22:23:56,629 ERROR [regionserver/rhes75:16020]
regionserver.HRegionServer: ason:
Type 'org/apache/hadoop/fs/ContentSummary' (current frame, stack[1]) is
not assignable to 'org/apache/hadoop/fs/QuotaUsage'
  Current Frame:
bci: @105
flags: { }
locals: { 'org/apache/hadoop/fs/ContentSummary',
'org/apache/hadoop/hdfs/protocol/proto/HdfsProtos$ContentSummaryProto$Builder'
}
stack: {
'org/apache/hadoop/hdfs/protocol/proto/HdfsProtos$ContentSummaryProto$Builder',
'org/apache/hadoop/fs/ContentSummary' }
  Bytecode:
0x000: 2ac7 0005 01b0 b805 984c 2b2a b605 99b6
0x010: 059a 2ab6 059b b605 9c2a b605 9db6 059e
0x020: 2ab6 059f b605 a02a b605 a1b6 05a2 2ab6
0x030: 05a3 b605 a42a b605 a5b6 05a6 2ab6 05a7
0x040: b605 a82a b605 a9b6 05aa 2ab6 05ab b605
0x050: ac2a b605 adb6 05ae 572a b605 af9a 000a
0x060: 2ab6 05b0 9900 0c2b 2ab8 0410 b605 b157
0x070: 2bb6 05b2 b0
  Stackmap Table:
same_frame(@6)
append_frame(@103,Object[#2940])
same_frame(@112)
 *
java.lang.VerifyError: Bad type on operand stack
Exception Details:
  Location:

org/apache/hadoop/hdfs/protocolPB/PBHelperClient.convert(Lorg/apache/hadoop/fs/ContentSummary;)Lorg/apache/hadoop/hdfs/protocol/proto/HdfsProtos$ContentSummaryProto;
@105: invokestatic
  Reason:
Type 'org/apache/hadoop/fs/ContentSummary' (current frame, stack[1]) is
not assignable to 'org/apache/hadoop/fs/QuotaUsage'
  Current Frame:
bci: @105
flags: { }
locals: { 'org/apache/hadoop/fs/ContentSummary',
'org/apache/hadoop/hdfs/protocol/proto/HdfsProtos$ContentSummaryProto$Builder'
}
stack: {
'org/apache/hadoop/hdfs/protocol/proto/HdfsProtos$ContentSummaryProto$Builder',
'org/apache/hadoop/fs/ContentSummary' }
  Bytecode:
0x000: 2ac7 0005 01b0 b805 984c 2b2a b605 99b6
0x010: 059a 2ab6 059b b605 9c2a b605 9db6 059e
0x020: 2ab6 059f b605 a02a b605 a1b6 05a2 2ab6
0x030: 05a3 b605 a42a b605 a5b6 05a6 2ab6 05a7
0x040: b605 a82a b605 a9b6 05aa 2ab6 05ab b605
0x050: ac2a b605 adb6 05ae 572a b605 af9a 000a
0x060: 2ab6 05b0 9900 0c2b 2ab8 0410 b605 b157
0x070: 2bb6 05b2 b0
  Stackmap Table:
same_frame(@6)
append_frame(@103,Object[#2940])
same_frame(@112)

any ideas?

thanks

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Re: Problem starting region server with Hbase version hbase-2.0.0

2018-06-07 Thread Mich Talebzadeh
Thanks.

under $HBASE_HOME/lib for version 2 I swapped the phoenix client jar file
as below

phoenix-5.0.0-alpha-HBase-2.0-client.jar_ori
phoenix-4.8.1-HBase-1.2-client.jar

I then started HBASE-2 that worked fine.

For Hbase clients, i.e. the Hbase  connection from edge nodes etc, I will
keep using HBASE-1.2.6 which is the stable version and it connects
successfully to Hbase-2. This appears to be a working solution for now.

Regards

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 7 June 2018 at 21:03, Sean Busbey  wrote:

> Your current problem is caused by this phoenix jar:
>
>
> > hduser@rhes75: /data6/hduser/hbase-2.0.0> find ./ -name '*.jar' -print
> > -exec jar tf {} \; | grep -E "\.jar$|StreamCapabilities" | grep -B 1
> > StreamCapabilities
> > ./lib/phoenix-5.0.0-alpha-HBase-2.0-client.jar
> > org/apache/hadoop/hbase/util/CommonFSUtils$StreamCapabilities.class
> > org/apache/hadoop/fs/StreamCapabilities.class
> > org/apache/hadoop/fs/StreamCapabilities$StreamCapability.class
>
> I don't know what version of Hadoop it's bundling or why, but it's one
> that includes the StreamCapabilities interface, so HBase takes that to
> mean it can check on capabilities. Since Hadoop 2.7 doesn't claim to
> implement any, HBase throws its hands up.
>
> I'd recommend you ask on the phoenix list how to properly install
> phoenix such that you don't need to copy the jars into the HBase
> installation. Hopefully the jar pointed out here is meant to be client
> facing only and not installed into the HBase cluster.
>
>
> On Thu, Jun 7, 2018 at 2:38 PM, Mich Talebzadeh
>  wrote:
> > Hi,
> >
> > Under Hbase Home directory I get
> >
> > hduser@rhes75: /data6/hduser/hbase-2.0.0> find ./ -name '*.jar' -print
> > -exec jar tf {} \; | grep -E "\.jar$|StreamCapabilities" | grep -B 1
> > StreamCapabilities
> > ./lib/phoenix-5.0.0-alpha-HBase-2.0-client.jar
> > org/apache/hadoop/hbase/util/CommonFSUtils$StreamCapabilities.class
> > org/apache/hadoop/fs/StreamCapabilities.class
> > org/apache/hadoop/fs/StreamCapabilities$StreamCapability.class
> > --
> > ./lib/hbase-common-2.0.0.jar
> > org/apache/hadoop/hbase/util/CommonFSUtils$StreamCapabilities.class
> >
> > for Hadoop home directory I get nothing
> >
> > hduser@rhes75: /home/hduser/hadoop-2.7.3> find ./ -name '*.jar' -print
> > -exec jar tf {} \; | grep -E "\.jar$|StreamCapabilities" | grep -B 1
> > StreamCapabilities
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn * https://www.linkedin.com/profile/view?id=
> AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> OABUrV8Pw>*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> > On 7 June 2018 at 15:39, Sean Busbey  wrote:
> >
> >> Somehow, HBase is getting confused by your installation and thinks it
> >> can check for whether or not the underlying FileSystem implementation
> >> (i.e. HDFS) provides hflush/hsync even though that ability is not
> >> present in Hadoop 2.7. Usually this means there's a mix of Hadoop
> >> versions on the classpath. While you do have both Hadoop 2.7.3 and
> >> 2.7.4, that mix shouldn't cause this kind of failure[1].
> >>
> >> Please run this command and copy/paste the output in your HBase and
> >> Hadoop installation directories:
> >>
> >> find . -name '*.jar' -print -exec jar tf {} \; | grep -E
> >> "\.jar$|StreamCapabilities" | grep -B 1 StreamCapabilities
> >>
> >>
> >>
> >> [1]: As an aside, you should follow the guidance in our reference
> >> guide from the section &q

Re: Problem starting region server with Hbase version hbase-2.0.0

2018-06-07 Thread Mich Talebzadeh
Hi,

Under Hbase Home directory I get

hduser@rhes75: /data6/hduser/hbase-2.0.0> find ./ -name '*.jar' -print
-exec jar tf {} \; | grep -E "\.jar$|StreamCapabilities" | grep -B 1
StreamCapabilities
./lib/phoenix-5.0.0-alpha-HBase-2.0-client.jar
org/apache/hadoop/hbase/util/CommonFSUtils$StreamCapabilities.class
org/apache/hadoop/fs/StreamCapabilities.class
org/apache/hadoop/fs/StreamCapabilities$StreamCapability.class
--
./lib/hbase-common-2.0.0.jar
org/apache/hadoop/hbase/util/CommonFSUtils$StreamCapabilities.class

for Hadoop home directory I get nothing

hduser@rhes75: /home/hduser/hadoop-2.7.3> find ./ -name '*.jar' -print
-exec jar tf {} \; | grep -E "\.jar$|StreamCapabilities" | grep -B 1
StreamCapabilities


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 7 June 2018 at 15:39, Sean Busbey  wrote:

> Somehow, HBase is getting confused by your installation and thinks it
> can check for whether or not the underlying FileSystem implementation
> (i.e. HDFS) provides hflush/hsync even though that ability is not
> present in Hadoop 2.7. Usually this means there's a mix of Hadoop
> versions on the classpath. While you do have both Hadoop 2.7.3 and
> 2.7.4, that mix shouldn't cause this kind of failure[1].
>
> Please run this command and copy/paste the output in your HBase and
> Hadoop installation directories:
>
> find . -name '*.jar' -print -exec jar tf {} \; | grep -E
> "\.jar$|StreamCapabilities" | grep -B 1 StreamCapabilities
>
>
>
> [1]: As an aside, you should follow the guidance in our reference
> guide from the section "Replace the Hadoop Bundled With HBase!" in the
> Hadoop chapter: http://hbase.apache.org/book.html#hadoop
>
> But as I mentioned, I don't think it's the underlying cause in this case.
>
> On Thu, Jun 7, 2018 at 8:41 AM, Mich Talebzadeh
>  wrote:
> > Hi,
> >
> > Please find below
> >
> > *bin/hbase version*
> > SLF4J: Class path contains multiple SLF4J bindings.
> > SLF4J: Found binding in
> > [jar:file:/data6/hduser/hbase-2.0.0/lib/phoenix-5.0.0-alpha-
> HBase-2.0-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: Found binding in
> > [jar:file:/data6/hduser/hbase-2.0.0/lib/slf4j-log4j12-1.7.
> 25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: Found binding in
> > [jar:file:/home/hduser/hadoop-2.7.3/share/hadoop/common/lib/
> slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> > explanation.
> > HBase 2.0.0
> > Source code repository git://
> > kalashnikov.att.net/Users/stack/checkouts/hbase.git
> > revision=7483b111e4da77adbfc8062b3b22cbe7c2cb91c1
> > Compiled by stack on Sun Apr 22 20:26:55 PDT 2018
> > From source with checksum a59e806496ef216732e730c746bbe5ac
> >
> > *ls -lah lib/hadoop**
> > -rw-r--r-- 1 hduser hadoop  41K Apr 23 04:26
> > lib/hadoop-annotations-2.7.4.jar
> > -rw-r--r-- 1 hduser hadoop  93K Apr 23 04:26 lib/hadoop-auth-2.7.4.jar
> > -rw-r--r-- 1 hduser hadoop  26K Apr 23 04:29 lib/hadoop-client-2.7.4.jar
> > -rw-r--r-- 1 hduser hadoop 1.9M Apr 23 04:28
> > lib/hadoop-common-2.7.4-tests.jar
> > -rw-r--r-- 1 hduser hadoop 3.4M Apr 23 04:26 lib/hadoop-common-2.7.4.jar
> > -rw-r--r-- 1 hduser hadoop 127K Apr 23 04:29 lib/hadoop-distcp-2.7.4.jar
> > -rw-r--r-- 1 hduser hadoop 3.4M Apr 23 04:29 lib/hadoop-hdfs-2.7.4-tests.
> jar
> > -rw-r--r-- 1 hduser hadoop 8.0M Apr 23 04:29 lib/hadoop-hdfs-2.7.4.jar
> > -rw-r--r-- 1 hduser hadoop 532K Apr 23 04:29
> > lib/hadoop-mapreduce-client-app-2.7.4.jar
> > -rw-r--r-- 1 hduser hadoop 759K Apr 23 04:29
> > lib/hadoop-mapreduce-client-common-2.7.4.jar
> > -rw-r--r-- 1 hduser hadoop 1.5M Apr 23 04:27
> > lib/hadoop-mapreduce-client-core-2.7.4.jar
> > -rw-r--r-- 1 hduser hadoop 188K Apr 23 04:29
> > lib/hadoop-mapreduce-client-hs-2.7.4.jar
> > -rw-r--r-- 1 hduser hadoop  62K Apr 23 04:29
> > lib/hadoop-mapreduce-client-jobclient-2.7.4.jar
> > -rw-r--r-- 1 hduser hadoop  71K Apr 23 04:28
> > lib/hadoop-mapreduce-client-shuffle-2.7.4.jar
> > -rw-r--r-- 1 hduser hadoop  26K Apr 23 04:28
> > lib/h

Re: Problem starting region server with Hbase version hbase-2.0.0

2018-06-07 Thread Mich Talebzadeh
Hi,

Please find below

*bin/hbase version*
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/data6/hduser/hbase-2.0.0/lib/phoenix-5.0.0-alpha-HBase-2.0-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/data6/hduser/hbase-2.0.0/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/hduser/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
HBase 2.0.0
Source code repository git://
kalashnikov.att.net/Users/stack/checkouts/hbase.git
revision=7483b111e4da77adbfc8062b3b22cbe7c2cb91c1
Compiled by stack on Sun Apr 22 20:26:55 PDT 2018
From source with checksum a59e806496ef216732e730c746bbe5ac

*ls -lah lib/hadoop**
-rw-r--r-- 1 hduser hadoop  41K Apr 23 04:26
lib/hadoop-annotations-2.7.4.jar
-rw-r--r-- 1 hduser hadoop  93K Apr 23 04:26 lib/hadoop-auth-2.7.4.jar
-rw-r--r-- 1 hduser hadoop  26K Apr 23 04:29 lib/hadoop-client-2.7.4.jar
-rw-r--r-- 1 hduser hadoop 1.9M Apr 23 04:28
lib/hadoop-common-2.7.4-tests.jar
-rw-r--r-- 1 hduser hadoop 3.4M Apr 23 04:26 lib/hadoop-common-2.7.4.jar
-rw-r--r-- 1 hduser hadoop 127K Apr 23 04:29 lib/hadoop-distcp-2.7.4.jar
-rw-r--r-- 1 hduser hadoop 3.4M Apr 23 04:29 lib/hadoop-hdfs-2.7.4-tests.jar
-rw-r--r-- 1 hduser hadoop 8.0M Apr 23 04:29 lib/hadoop-hdfs-2.7.4.jar
-rw-r--r-- 1 hduser hadoop 532K Apr 23 04:29
lib/hadoop-mapreduce-client-app-2.7.4.jar
-rw-r--r-- 1 hduser hadoop 759K Apr 23 04:29
lib/hadoop-mapreduce-client-common-2.7.4.jar
-rw-r--r-- 1 hduser hadoop 1.5M Apr 23 04:27
lib/hadoop-mapreduce-client-core-2.7.4.jar
-rw-r--r-- 1 hduser hadoop 188K Apr 23 04:29
lib/hadoop-mapreduce-client-hs-2.7.4.jar
-rw-r--r-- 1 hduser hadoop  62K Apr 23 04:29
lib/hadoop-mapreduce-client-jobclient-2.7.4.jar
-rw-r--r-- 1 hduser hadoop  71K Apr 23 04:28
lib/hadoop-mapreduce-client-shuffle-2.7.4.jar
-rw-r--r-- 1 hduser hadoop  26K Apr 23 04:28
lib/hadoop-minicluster-2.7.4.jar
-rw-r--r-- 1 hduser hadoop 2.0M Apr 23 04:27 lib/hadoop-yarn-api-2.7.4.jar
-rw-r--r-- 1 hduser hadoop 163K Apr 23 04:28
lib/hadoop-yarn-client-2.7.4.jar
-rw-r--r-- 1 hduser hadoop 1.7M Apr 23 04:27
lib/hadoop-yarn-common-2.7.4.jar
-rw-r--r-- 1 hduser hadoop 216K Apr 23 04:28
lib/hadoop-yarn-server-applicationhistoryservice-2.7.4.jar
-rw-r--r-- 1 hduser hadoop 380K Apr 23 04:28
lib/hadoop-yarn-server-common-2.7.4.jar
-rw-r--r-- 1 hduser hadoop 703K Apr 23 04:28
lib/hadoop-yarn-server-nodemanager-2.7.4.jar
-rw-r--r-- 1 hduser hadoop 1.3M Apr 23 04:29
lib/hadoop-yarn-server-resourcemanager-2.7.4.jar
-rw-r--r-- 1 hduser hadoop  75K Apr 23 04:28
lib/hadoop-yarn-server-tests-2.7.4-tests.jar
-rw-r--r-- 1 hduser hadoop  58K Apr 23 04:29
lib/hadoop-yarn-server-web-proxy-2.7.4.jar

Also I am on Hadoop 2.7.3

*hadoop version*
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r
baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using
/home/hduser/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 7 June 2018 at 14:20, Sean Busbey  wrote:

> HBase needs HDFS syncs to avoid dataloss during component failure.
>
> What's the output of the command "bin/hbase version"?
>
>
> What's the result of doing the following in the hbase install?
>
> ls -lah lib/hadoop*
>
> On Jun 7, 2018 00:58, "Mich Talebzadeh"  wrote:
>
> yes correct I am using Hbase on hdfs  with hadoop-2.7.3
>
> The file system is ext4.
>
> I was hoping that I can avoid the sync option,
>
> many thanks
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=
> AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> OABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liab

Does HBase importtsv take namespace name as part of table name

2018-06-07 Thread Mich Talebzadeh
I am getting this error when the table name includes a namespace!

org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=','
-Dimporttsv.columns="HBASE_ROW_KEY,price_info:ticker,price_info:timecreated,price_info:price"
"tradeData:marketDataHbaseBatch"

Error: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
Failed 4900 actions:
org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column
family price_info does not exist in region
tradeData:marketDataHbaseBatch,,1528366035267.724c1bff41e867dca86389e9864a1935.
in table 'tradeData:marketDataHbaseBatch', {NAME => 'trade_info',
BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL =>
'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE =>
'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}

If I create the table in default namespace (i.e. without any namespace
name) it works!
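
For what it's worth, the exception above suggests ImportTsv did resolve the
namespaced table name; the complaint is that the table only has a
'trade_info' family while the import writes into 'price_info'. A minimal
sketch (Scala, HBase 1.x-era admin API; the namespace, table and family names
are taken from the error above, everything else is illustrative) of creating
the namespace and a table with the family the import expects:

import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor, NamespaceDescriptor, TableName}
import org.apache.hadoop.hbase.client.ConnectionFactory

val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
val admin = conn.getAdmin

// Create the namespace once, if it is not already there
if (!admin.listNamespaceDescriptors().exists(_.getName == "tradeData")) {
  admin.createNamespace(NamespaceDescriptor.create("tradeData").build())
}

// The table name carries the namespace; the family has to match the
// -Dimporttsv.columns mapping (price_info here)
val desc = new HTableDescriptor(TableName.valueOf("tradeData:marketDataHbaseBatch"))
desc.addFamily(new HColumnDescriptor("price_info"))
admin.createTable(desc)

admin.close()
conn.close()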

Thanks

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Re: Problem starting region server with Hbase version hbase-2.0.0

2018-06-06 Thread Mich Talebzadeh
yes correct I am using Hbase on hdfs  with hadoop-2.7.3

The file system is ext4.

I was hoping that I can avoid the sync option,

many thanks


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 7 June 2018 at 01:43, Sean Busbey  wrote:

> On Wed, Jun 6, 2018 at 6:11 PM, Mich Talebzadeh
>  wrote:
> >
> >
> > so the region server started OK but then I had a problem with master :(
> >
> > java.lang.IllegalStateException: The procedure WAL relies on the
> ability to
> > hsync for proper operation during component failures, but the underlying
> > filesystem does not support doing so. Please check the config value of
> > 'hbase.procedure.store.wal.use.hsync' to set the desired level of
> > robustness and ensure the config value of 'hbase.wal.dir' points to a
> > FileSystem mount that can provide it.
> >
>
> This error means that you're running on top of a Filesystem that
> doesn't provide sync.
>
> Are you using HDFS? What version?
>


Re: Problem starting region server with Hbase version hbase-2.0.0

2018-06-06 Thread Mich Talebzadeh
Thanks all.

in my older version of Hbase 1.2.3 I had added the correct phoenix jar file
(phoenix-4.8.1-HBase-1.2-client.jar) to /lib directory of Hbase.

I found the correct jar file for Hbase 2.0.0
in phoenix-5.0.0-alpha-HBase-2.0-client.jar

jar tvf phoenix-5.0.0-alpha-HBase-2.0-client.jar|grep IndexedWALEditCodec
  1881 Thu Feb 08 17:36:50 GMT 2018
org/apache/hadoop/hbase/regionserver/wal/IndexedWALEditCodec$BinaryCompatibleCompressedIndexKeyValueDecoder.class
  1223 Thu Feb 08 17:36:50 GMT 2018
org/apache/hadoop/hbase/regionserver/wal/IndexedWALEditCodec$BinaryCompatibleIndexKeyValueDecoder.class
   830 Thu Feb 08 17:36:50 GMT 2018
org/apache/hadoop/hbase/regionserver/wal/IndexedWALEditCodec$BinaryCompatiblePhoenixBaseDecoder.class
  1801 Thu Feb 08 17:36:50 GMT 2018
org/apache/hadoop/hbase/regionserver/wal/IndexedWALEditCodec$CompressedIndexKeyValueDecoder.class
  1919 Thu Feb 08 17:36:50 GMT 2018
org/apache/hadoop/hbase/regionserver/wal/IndexedWALEditCodec$CompressedIndexKeyValueEncoder.class
  1143 Thu Feb 08 17:36:50 GMT 2018
org/apache/hadoop/hbase/regionserver/wal/IndexedWALEditCodec$IndexKeyValueDecoder.class
  1345 Thu Feb 08 17:36:50 GMT 2018
org/apache/hadoop/hbase/regionserver/wal/IndexedWALEditCodec$IndexKeyValueEncoder.class
   755 Thu Feb 08 17:36:50 GMT 2018
org/apache/hadoop/hbase/regionserver/wal/IndexedWALEditCodec$PhoenixBaseDecoder.class
   762 Thu Feb 08 17:36:50 GMT 2018
org/apache/hadoop/hbase/regionserver/wal/IndexedWALEditCodec$PhoenixBaseEncoder.class
  4436 Thu Feb 08 17:36:50 GMT 2018
org/apache/hadoop/hbase/regionserver/wal/IndexedWALEditCodec.class

so the region server started OK but then I had a problem with master :(

java.lang.IllegalStateException: The procedure WAL relies on the ability to
hsync for proper operation during component failures, but the underlying
filesystem does not support doing so. Please check the config value of
'hbase.procedure.store.wal.use.hsync' to set the desired level of
robustness and ensure the config value of 'hbase.wal.dir' points to a
FileSystem mount that can provide it.


I tried that mentioned property in hbase-site.xml but no luck. However, I
saw this recent note
<https://stackoverflow.com/questions/50229580/hbase-shell-cannot-use-error-keepererrorcode-nonode-for-hbase-master>


   - "I had similar issues with the recent HBase 2.x beta releases, whereas
     everything was OK with stable 1.x releases. Are you using 2.x beta?" –
     VS_FF <https://stackoverflow.com/users/7241513/vs-ff>, May 8 at 13:16
     <https://stackoverflow.com/questions/50229580/hbase-shell-cannot-use-error-keepererrorcode-nonode-for-hbase-master#comment87486468_50229580>
   - "yes, i guess that it is caused by releases problem" – Solodye
     <https://stackoverflow.com/users/8351601/solodye>, May 10 at 12:58
     <https://stackoverflow.com/questions/50229580/hbase-shell-cannot-use-error-keepererrorcode-nonode-for-hbase-master#comment87563798_50229580>


So I reverted back to the stable release Hbase 1.2.6 unless someone has resolved
this issue.

Thanks

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 6 June 2018 at 23:24, Juan Jose Escobar 
wrote:

> Hello Mich,
>
> Verify you have the right jars (from your commnts I guess should be
> phoenix-5.0.0-alpha-HBase-2.0-server.jar), that it shows in HBase
> classpath
> and that it contains the missing class e.g. with jar -vtf.
>
> Also, check if there are any pending WALs that are making the startup fail,
> I had similar problem and Phoenix seemed to cause problems at startup until
> I removed the WALs.
>
>
>
>
>
>
>
>
> On Wed, Jun 6, 2018 at 10:55 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> wrote:
>
> > Thanks Sean. I downloaded Phoenix for Hbase version 2
> > (apache-phoenix-5.0.0-alpha-HBase-2.0-bin) but still the same error
> >
> > 2018-06-06 21:45:15,297 INFO  [regionserver/rhes75:16020]
> > wal.AbstractFSWAL: WAL configuration: blocksize=256 MB, rollsize=128 MB,
> > prefix=rhes75%2C16020%2C1528317910703, suffix=,
> > logDir=hdfs://rhes75:9000/hbase/WALs/rhes75,16020,152
> > 8317910703, archiveDir=hdfs://rhes75:9000/hbase/oldWALs
> > 2018-06-06 21:45:15,414 ERROR [regionserver/rhes75:16020]
> > regionserver.HRegionServer: * ABORTING region server
> > rhes75,16020,1528317

Re: Problem starting region server with Hbase version hbase-2.0.0

2018-06-06 Thread Mich Talebzadeh
:service=HBase,name=RegionServer,sub=Server",
"modelerType" : "RegionServer,sub=Server",
"tag.Context" : "regionserver",
"tag.Hostname" : "rhes75"
  } ]
}
2018-06-06 21:45:15,430 INFO  [regionserver/rhes75:16020]
regionserver.HRegionServer: * STOPPING region server
'rhes75,16020,1528317910703' *
2018-06-06 21:45:15,430 INFO  [regionserver/rhes75:16020]
regionserver.HRegionServer: STOPPED: Unhandled: Unable to find
org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec
2018-06-06 21:45:15,430 INFO  [regionserver/rhes75:16020]
regionserver.SplitLogWorker: Sending interrupt to stop the worker thread
2018-06-06 21:45:15,430 INFO  [regionserver/rhes75:16020]
regionserver.HRegionServer: Stopping infoServer
2018-06-06 21:45:15,430 INFO  [SplitLogWorker-rhes75:16020]
regionserver.SplitLogWorker: SplitLogWorker interrupted. Exiting.
2018-06-06 21:45:15,430 INFO  [SplitLogWorker-rhes75:16020]
regionserver.SplitLogWorker: SplitLogWorker rhes75,16020,1528317910703
exiting
2018-06-06 21:45:15,434 INFO  [regionserver/rhes75:16020]
handler.ContextHandler: Stopped o.e.j.w.WebAppContext@1e530163
{/,null,UNAVAILABLE}{file:/data6/hduser/hbase-2.0.0/hbase-webapps/regionserver}
2018-06-06 21:45:15,436 INFO  [regionserver/rhes75:16020]
server.AbstractConnector: Stopped ServerConnector@5c60b0a0
{HTTP/1.1,[http/1.1]}{0.0.0.0:16030}
2018-06-06 21:45:15,436 INFO  [regionserver/rhes75:16020]
handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@39651a82
{/static,file:///data6/hduser/hbase-2.0.0/hbase-webapps/static/,UNAVAILABLE}
2018-06-06 21:45:15,436 INFO  [regionserver/rhes75:16020]
handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@70211e49
{/logs,file:///data6/hduser/hbase-2.0.0/logs/,UNAVAILABLE}
2018-06-06 21:45:15,437 INFO  [regionserver/rhes75:16020]
regionserver.HeapMemoryManager: Stopping
2018-06-06 21:45:15,437 INFO  [regionserver/rhes75:16020]
flush.RegionServerFlushTableProcedureManager: Stopping region server flush
procedure manager abruptly.
2018-06-06 21:45:15,437 INFO  [MemStoreFlusher.1]
regionserver.MemStoreFlusher: MemStoreFlusher.1 exiting
2018-06-06 21:45:15,437 INFO  [regionserver/rhes75:16020]
snapshot.RegionServerSnapshotManager: Stopping RegionServerSnapshotManager
abruptly.
2018-06-06 21:45:15,437 INFO  [MemStoreFlusher.0]
regionserver.MemStoreFlusher: MemStoreFlusher.0 exiting
2018-06-06 21:45:15,437 INFO  [regionserver/rhes75:16020]
regionserver.HRegionServer: aborting server rhes75,16020,1528317910703
2018-06-06 21:45:15,437 INFO  [regionserver/rhes75:16020]
zookeeper.ReadOnlyZKClient: Close zookeeper connection 0x2a9ccc02 to
localhost:2181
2018-06-06 21:45:15,439 INFO  [regionserver/rhes75:16020]
regionserver.HRegionServer: stopping server rhes75,16020,1528317910703; all
regions closed.
2018-06-06 21:45:15,440 INFO  [regionserver/rhes75:16020]
regionserver.Leases: Closed leases
2018-06-06 21:45:15,440 INFO  [regionserver/rhes75:16020]
hbase.ChoreService: Chore service for: regionserver/rhes75:16020 had
[[ScheduledChore: Name: CompactionThroughputTuner Period: 6 Unit:
MILLISECONDS], [ScheduledChore: Nam
e: CompactedHFilesCleaner Period: 12 Unit: MILLISECONDS],
[ScheduledChore: Name: MovedRegionsCleaner for region
rhes75,16020,1528317910703 Period: 12 Unit: MILLISECONDS],
[ScheduledChore: Name: MemstoreFlusherChore Period: 1
 Unit: MILLISECONDS]] on shutdown
2018-06-06 21:45:15,440 INFO  [regionserver/rhes75:16020.logRoller]
regionserver.LogRoller: LogRoller exiting.


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 6 June 2018 at 20:49, Sean Busbey  wrote:

> IndexedWALEditCodec is a class from the Apache Phoenix project. your
> cluster must be configured to have Phoenix run but it can't find the
> jars for phoenix.
>
> u...@phoenix.apache.org is probably your best bet for getting things
> going.
>
> On Wed, Jun 6, 2018 at 1:52 PM, Mich Talebzadeh
>  wrote:
> > Hi,
> >
> > I have an old Hbase hbase-1.2.3 that runs fine on both RHES 5.6 and RHES
> 7.5
> >
> > I created a new Hbase hbase-2.0.0 instance on RHES 7.5.
> >
> > I seem to have a problem with my region server as it fails to start
> > throwing error
> >
> > 2018-06-06 19:28:37,033 INFO  [regionserver/rhes75:16020]
> > regionserver.HRegionServer: CompactionChecker runs every PT10S
> &

Problem starting region server with Hbase version hbase-2.0.0

2018-06-06 Thread Mich Talebzadeh
Hi,

I have an old Hbase hbase-1.2.3 that runs fine on both RHES 5.6 and RHES 7.5

I created a new Hbase hbase-2.0.0 instance on RHES 7.5.

I seem to have a problem with my region server as it fails to start
throwing error

2018-06-06 19:28:37,033 INFO  [regionserver/rhes75:16020]
regionserver.HRegionServer: CompactionChecker runs every PT10S
2018-06-06 19:28:37,071 INFO  [SplitLogWorker-rhes75:16020]
regionserver.SplitLogWorker: SplitLogWorker rhes75,16020,1528309715572
starting
2018-06-06 19:28:37,073 INFO  [regionserver/rhes75:16020]
regionserver.HeapMemoryManager: Starting, tuneOn=false
2018-06-06 19:28:37,076 INFO  [regionserver/rhes75:16020]
regionserver.ChunkCreator: Allocating data MemStoreChunkPool with chunk
size 2 MB, max count 2880, initial count 0
2018-06-06 19:28:37,077 INFO  [regionserver/rhes75:16020]
regionserver.ChunkCreator: Allocating index MemStoreChunkPool with chunk
size 204.80 KB, max count 3200, initial count 0
2018-06-06 19:28:37,078 INFO  [ReplicationExecutor-0]
regionserver.ReplicationSourceManager: Current list of replicators:
[rhes75,16020,1528309715572] other RSs: [rhes75,16020,1528309715572]
2018-06-06 19:28:37,099 INFO  [regionserver/rhes75:16020]
regionserver.HRegionServer: Serving as rhes75,16020,1528309715572,
RpcServer on rhes75/50.140.197.220:16020, sessionid=0x163d61b308c0033
2018-06-06 19:28:37,100 INFO  [regionserver/rhes75:16020]
quotas.RegionServerRpcQuotaManager: Quota support disabled
2018-06-06 19:28:37,100 INFO  [regionserver/rhes75:16020]
quotas.RegionServerSpaceQuotaManager: Quota support disabled, not starting
space quota manager.
2018-06-06 19:28:40,133 INFO  [regionserver/rhes75:16020]
wal.AbstractFSWAL: WAL configuration: blocksize=256 MB, rollsize=128 MB,
prefix=rhes75%2C16020%2C1528309715572, suffix=,
logDir=hdfs://rhes75:9000/hbase/WALs/rhes75,16020,152
8309715572, archiveDir=hdfs://rhes75:9000/hbase/oldWALs
2018-06-06 19:28:40,251 ERROR [regionserver/rhes75:16020]
regionserver.HRegionServer: * ABORTING region server
rhes75,16020,1528309715572: Unhandled: Unable to find
org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec **
***

I cannot seem to be able to fix this even after removing hbase directory
from hdfs and zookeeper! Any ideas will be appreciated.

Thanks

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Re: Pro and Cons of using HBase table as an external table in HIVE

2017-06-07 Thread Mich Talebzadeh
As far as I know, using Hive on Hbase can only be done through Hive external tables.

Example

hive> create external table MARKETDATAHBASE (key STRING, TICKER STRING,
      TIMECREATED STRING, PRICE STRING)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ("hbase.columns.mapping" =
      ":key,PRICE_INFO:TICKER,PRICE_INFO:TIMECREATED,PRICE_INFO:PRICE")
      TBLPROPERTIES ("hbase.table.name" = "MARKETDATAHBASE");


The problem here is that, like most Hive external tables, you are creating a
pointer to Hbase via the Hive storage handler, and there is very little
optimization that can be done.


In all probability you would be better off using Apache Phoenix on top of
Hbase with Phoenix secondary indexes. Granted, the SQL capability in Phoenix
may not be as rich as Hive's, but it should do for most purposes.


In Phoenix you can do:



CREATE TABLE MARKETDATAHBASE (PK VARCHAR PRIMARY KEY, PRICE_INFO.TICKER
VARCHAR, PRICE_INFO.TIMECREATED VARCHAR, PRICE_INFO.PRICE VARCHAR);



HTH,

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 7 June 2017 at 11:13, Ramasubramanian Narayanan <
ramasubramanian.naraya...@gmail.com> wrote:

> Hi,
>
> Can you please let us know Pro and Cons of using HBase table as an
> external table in HIVE.
>
> Will there be any performance degrade when using Hive over HBase instead
> of using direct HIVE table.
>
> The table that I am planning to use in HBase will be master table like
> account, customer. Wanting to achieve Slowly Changing Dimension. Please
> through some lights on that too if you have done any such implementations.
>
> Thanks and Regards,
> Rams
>


Reading specific column family and columns in Hbase table through spark

2016-12-29 Thread Mich Talebzadeh
Hi,

I have a routine in Spark that iterates through Hbase rows and tries to
read columns.

My question is: how can I read the columns in the correct order?

example

val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
  classOf[org.apache.hadoop.hbase.client.Result])

val parsed = hBaseRDD.map{ case(b, a) => val iter = a.list().iterator();
( Bytes.toString(a.getRow()).toString,
Bytes.toString( iter.next().getValue()).toString,
Bytes.toString( iter.next().getValue()).toString,
Bytes.toString( iter.next().getValue()).toString,
Bytes.toString(iter.next().getValue())
)}

The above reads the columns of the column family sequentially. How can I
force it to read specific columns only?
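
One approach (a sketch only, reusing the same sc as above and borrowing the
price_info column names used elsewhere in this archive) is to narrow the scan
with TableInputFormat.SCAN_COLUMNS and then fetch each cell by family and
qualifier instead of relying on iterator order:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes

val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, "marketDataHbase")   // illustrative table name
// Space-separated family:qualifier pairs restrict what the scan returns
conf.set(TableInputFormat.SCAN_COLUMNS, "price_info:ticker price_info:price")

val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])

// Pull named cells so column ordering no longer matters
// (getValue returns null if a cell is missing, so guard in real code)
val parsed = hBaseRDD.map { case (_, result) =>
  (Bytes.toString(result.getRow),
   Bytes.toString(result.getValue(Bytes.toBytes("price_info"), Bytes.toBytes("ticker"))),
   Bytes.toString(result.getValue(Bytes.toBytes("price_info"), Bytes.toBytes("price"))))
}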


Thanks


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Creating SQL skin on Hbase table

2016-12-22 Thread Mich Talebzadeh
Hi,

I have used Phoenix to create SQL views on top of an Hbase table and also
created covering indexes. The issue I have noticed is that all ingestion
has to go through Phoenix, otherwise the Phoenix indexes will not be updated.
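
To make the "ingest through Phoenix" point concrete, a minimal sketch over
the Phoenix JDBC driver (Scala; the ZooKeeper quorum, row values and the
MARKETDATAHBASE table sketched earlier in this archive are assumptions):

import java.sql.DriverManager

// Phoenix JDBC URL format is jdbc:phoenix:<zookeeper quorum>; localhost:2181 assumed
val conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")
conn.setAutoCommit(false)

// Writing through Phoenix keeps any secondary/covering indexes on the table in sync
val ps = conn.prepareStatement(
  "UPSERT INTO MARKETDATAHBASE (PK, TICKER, TIMECREATED, PRICE) VALUES (?, ?, ?, ?)")
ps.setString(1, "IBM-20161222120000")      // illustrative row key
ps.setString(2, "IBM")
ps.setString(3, "2016-12-22 12:00:00")
ps.setString(4, "178.25")
ps.executeUpdate()

conn.commit()
conn.close()

Bulk paths exist too (for example Phoenix's CSV bulk load tool), but anything
that writes straight to the underlying Hbase table bypasses index maintenance.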

Another alternative has been to try Hive EXTERNAL tables on top of the Hbase
table, but only EXTERNAL tables are supported. I was wondering about the pros
and cons of using Hive or Phoenix tables on Hbase?

Thanks

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Re: reading Hbase table in Spark

2016-12-11 Thread Mich Talebzadeh
Hi Asher,

As mentioned before, Spark 2 does not work with Phoenix. However, you can
use Spark 2 on top of Hbase directly.

Does that answer your point?

Thanks


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 8 December 2016 at 08:31, Asher <sbzhang...@hotmail.com> wrote:

> Hi
> Mich, can you describe the detail about used phoenix read/write hbase table
> in spark for RDD's process.
> thx
>
>
>
> --
> View this message in context: http://apache-hbase.679495.n3.
> nabble.com/reading-Hbase-table-in-Spark-tp4083260p4084996.html
> Sent from the HBase User mailing list archive at Nabble.com.
>


Re: [ANNOUNCE] New HBase Committer Josh Elser

2016-12-10 Thread Mich Talebzadeh
+ me

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 10 December 2016 at 19:54, Ted Yu <yuzhih...@gmail.com> wrote:

> Congratulations , Josh.
>
> > On Dec 10, 2016, at 11:47 AM, Nick Dimiduk <ndimi...@apache.org> wrote:
> >
> > On behalf of the Apache HBase PMC, I am pleased to announce that Josh
> Elser
> > has accepted the PMC's invitation to become a committer on the project.
> We
> > appreciate all of Josh's generous contributions thus far and look forward
> > to his continued involvement.
> >
> > Allow me to be the first to congratulate and welcome Josh into his new
> role!
>


Re: [ANNOUNCE] Apache Phoenix 4.9 released

2016-12-03 Thread Mich Talebzadeh
Many thanks for this announcement.

This is a question for which I have been seeking verification.

Does the new 4.9.0 release of Phoenix support transactions and ACID
compliance on Hbase? In a naïve way, can one do what an RDBMS does with a
combination of Hbase + Phoenix?

FYI, I am not interested in add-ons or beta-test tools such as Phoenix in
combination with some other product.

Regards,

Mich




Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 1 December 2016 at 21:31, James Taylor <jamestay...@apache.org> wrote:

> Apache Phoenix enables OLTP and operational analytics for Apache Hadoop
> through SQL support using Apache HBase as its backing store and providing
> integration with other projects in the ecosystem such as Apache Spark,
> Apache Hive, Apache Pig, Apache Flume, and Apache MapReduce.
>
> We're pleased to announce our 4.9.0 release which includes:
> - Atomic UPSERT through new ON DUPLICATE KEY syntax [1]
> - Support for DEFAULT declaration in DDL statements [2]
> - Specify guidepost width per table [3]
> - Over 40 bugs fixed [4]
>
> The release is available in source or binary form here [5].
>
> Thanks,
> The Apache Phoenix Team
>
> [1] https://phoenix.apache.org/atomic_upsert.html
> [2] https://phoenix.apache.org/language/index.html#column_def
> [3] https://phoenix.apache.org/update_statistics.html
> [4]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> projectId=12315120=12335845
> [5] https://phoenix.apache.org/download.html
>


Hbase on HDFS versus Cassandra

2016-11-30 Thread Mich Talebzadeh
Hi Guys,

I have used Hbase on HDFS reasonably well. I am happy to stick with it, and
to do more with Hive/Phoenix views and Phoenix indexes where I can.

I have a bunch of users now vocal about the use case for Cassandra and
whether it can do a better job than Hbase.

Unfortunately I am no expert on Cassandra, so some guidance on use-case fit
would be very valuable.

Thanks

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Re: Using Hbase as a transactional table

2016-11-28 Thread Mich Talebzadeh
Thanks Ted.

How does Phoenix provide transaction support?

I have read some docs but it sounds problematic. I need to be sure there
is full commit and rollback if things go wrong!

Also it appears that Phoenix transactional support is in beta phase.

Cheers



Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 23 November 2016 at 18:15, Ted Yu <yuzhih...@gmail.com> wrote:

> Mich:
> Even though related rows are on the same region server, there is no
> intrinsic transaction support.
>
> For #1 under design considerations, multi column family is one
> possibility. You should consider how the queries from RDBMS access the
> related data.
>
> You can also evaluate Phoenix / Trafodion which provides transaction
> support.
>
> Cheers
>
> > On Nov 23, 2016, at 9:19 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
> >
> > Thanks all.
> >
> > As I understand Hbase does not support ACIC compliant transactions over
> > multiple rows or across tables?
> >
> > So this is not supported
> >
> >
> >   1. Hbase can support multi-rows transactions if the rows are on the
> same
> >   table and in the same RegionServer?
> >   2. Hbase does not support multi-rows transactions if the rows are in
> >   different tables but happen to be in the same RegionServer?
> >   3. If I migrated RDBMS transactional tables to the same Hbase table
> (big
> >   if) with different column familities will that work?
> >
> >
> > Design considerations
> >
> >
> >   1. If I have 4 big tables in RDBMS, some having in excess of 200
> columns
> >   (I know this is a joke), can they all go one-to-one to Hbase tables.
> Can
> >   some of these RDBMS tables put into one Hbase schema  with different
> column
> >   families.
> >   2. then another question. If I use hive tables on these hbase tables
> >   with large number of family columns, will it work ok?
> >
> > thanks
> >
> >   1.
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn * https://www.linkedin.com/profile/view?id=
> AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> OABUrV8Pw>*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> >> On 23 November 2016 at 16:43, Denise Rogers <datag...@aol.com> wrote:
> >>
> >> I would recommend MariaDB. HBase is not ACID compliant. MariaDB is.
> >>
> >> Regards,
> >> Denise
> >>
> >>
> >> Sent from mi iPad
> >>
> >>>> On Nov 23, 2016, at 11:27 AM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> >>> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I need to explore if anyone has used Hbase as a transactional table to
> do
> >>> the processing that historically one has done with RDBMSs.
> >>>
> >>> A simple question dealing with a transaction as a unit of work (all or
> >>> nothing). In that case if any part of statement in batch transaction
> >> fails,
> >>> that transaction will be rolled back in its entirety.
> >>>
> >>> Now how does Hbase can handle this? Specifically at the theoretical
> level
> >>> if a standard transactional processing was migrated from RDBMS to Hbase
> >>> tables, will that work.
> >>>
> >>> Has anyone built  successful transaction processing in Hbase?
> >>>
> >>> Thanks
> >>>
> >>>
> >>> Dr Mich Talebzadeh
> >>>
> >>>
> >>>
> >>> LinkedIn * https://www.linkedin.com/profile/view?id=
> >> AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >>> <https://www.linkedin.com/profile/view?id=
> AAEWh2gBxianrbJd6zP6AcPCCd
> >> OABUrV8Pw>*
> >>>
> >>>
> >>>
> >>> http://talebzadehmich.wordpress.com
> >>>
> >>>
> >>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
> any
> >>> loss, damage or destruction of data or any other property which may
> arise
> >>> from relying on this email's technical content is explicitly
> disclaimed.
> >>> The author will in no case be liable for any monetary damages arising
> >> from
> >>> such loss, damage or destruction.
> >>
> >>
>


Re: Storing XML file in Hbase

2016-11-28 Thread Mich Talebzadeh
Thanks Richard.

How would one decide on the number of column families and columns?

Is there a ballpark approach?

Cheers

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 28 November 2016 at 16:04, Richard Startin <richardstar...@outlook.com>
wrote:

> Hi Mich,
>
> If you want to store the file whole, you'll need to enforce a 10MB limit
> to the file size, otherwise you will flush too often (each time the
> memstore fills up) which will slow down writes.
>
> Maybe you could deconstruct the xml by extracting columns from the xml
> using xpath?
>
> If the files are small there might be a tangible performance benefit by
> limiting the number of columns.
>
> Cheers,
> Richard
>
> Sent from my iPhone
>
> > On 28 Nov 2016, at 15:53, Dima Spivak <dimaspi...@apache.org> wrote:
> >
> > Hi Mich,
> >
> > How many files are you looking to store? How often do you need to read
> > them? What's the total size of all the files you need to serve?
> >
> > Cheers,
> > Dima
> >
> > On Mon, Nov 28, 2016 at 7:04 AM Mich Talebzadeh <
> mich.talebza...@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> Storing XML file in Big Data. Are there any strategies to create
> multiple
> >> column families or just one column family and in that case how many
> columns
> >> would be optional?
> >>
> >> thanks
> >>
> >> Dr Mich Talebzadeh
> >>
> >>
> >>
> >> LinkedIn *
> >> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> OABUrV8Pw
> >> <
> >> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> OABUrV8Pw
> >>> *
> >>
> >>
> >>
> >> http://talebzadehmich.wordpress.com
> >>
> >>
> >> *Disclaimer:* Use it at your own risk. Any and all responsibility for
> any
> >> loss, damage or destruction of data or any other property which may
> arise
> >> from relying on this email's technical content is explicitly disclaimed.
> >> The author will in no case be liable for any monetary damages arising
> from
> >> such loss, damage or destruction.
> >>
>


Storing XML file in Hbase

2016-11-28 Thread Mich Talebzadeh
Hi,

I am storing XML files in Big Data. Are there any strategies for choosing
between multiple column families or just one column family, and in that case
how many columns would be optimal?
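
One common pattern, in line with the XPath suggestion in the reply above, is
to flatten each XML document into a single column family with one column per
extracted element. A minimal Scala sketch (the file layout, table and column
names are assumptions, and it needs the scala-xml module on the classpath):

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import scala.xml.XML

// Assumed shape: <trade><ticker>IBM</ticker><timecreated>...</timecreated><price>178.25</price></trade>
val doc         = XML.loadFile("/tmp/trade.xml")
val ticker      = (doc \ "ticker").text
val timeCreated = (doc \ "timecreated").text
val price       = (doc \ "price").text

val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
val table = conn.getTable(TableName.valueOf("marketDataHbase"))   // single family: price_info

// The row key and columns come from the extracted elements, not the raw XML blob
val put = new Put(Bytes.toBytes(s"$ticker-$timeCreated"))
put.addColumn(Bytes.toBytes("price_info"), Bytes.toBytes("ticker"), Bytes.toBytes(ticker))
put.addColumn(Bytes.toBytes("price_info"), Bytes.toBytes("timecreated"), Bytes.toBytes(timeCreated))
put.addColumn(Bytes.toBytes("price_info"), Bytes.toBytes("price"), Bytes.toBytes(price))
table.put(put)

table.close()
conn.close()

Keeping everything in one family tends to be the safer default; Hbase
generally behaves best with a small number of column families.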

thanks

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Re: Using Hbase as a transactional table

2016-11-23 Thread Mich Talebzadeh
Thanks all.

As I understand it, Hbase does not support ACID-compliant transactions over
multiple rows or across tables?

So this is not supported


   1. Hbase can support multi-row transactions if the rows are in the same
   table and in the same RegionServer?
   2. Hbase does not support multi-row transactions if the rows are in
   different tables but happen to be in the same RegionServer?
   3. If I migrated RDBMS transactional tables to the same Hbase table (big
   if) with different column families, will that work?


Design considerations


   1. If I have 4 big tables in an RDBMS, some having in excess of 200 columns
   (I know this is a joke), can they all go one-to-one to Hbase tables? Can
   some of these RDBMS tables be put into one Hbase schema with different
   column families?
   2. Then another question: if I use Hive tables on these Hbase tables with a
   large number of family columns, will it work OK?

thanks



Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 23 November 2016 at 16:43, Denise Rogers <datag...@aol.com> wrote:

> I would recommend MariaDB. HBase is not ACID compliant. MariaDB is.
>
> Regards,
> Denise
>
>
> Sent from mi iPad
>
> > On Nov 23, 2016, at 11:27 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
> >
> > Hi,
> >
> > I need to explore if anyone has used Hbase as a transactional table to do
> > the processing that historically one has done with RDBMSs.
> >
> > A simple question dealing with a transaction as a unit of work (all or
> > nothing). In that case if any part of statement in batch transaction
> fails,
> > that transaction will be rolled back in its entirety.
> >
> > Now how does Hbase can handle this? Specifically at the theoretical level
> > if a standard transactional processing was migrated from RDBMS to Hbase
> > tables, will that work.
> >
> > Has anyone built  successful transaction processing in Hbase?
> >
> > Thanks
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn * https://www.linkedin.com/profile/view?id=
> AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> OABUrV8Pw>*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
>
>


Using Hbase as a transactional table

2016-11-23 Thread Mich Talebzadeh
Hi,

I need to explore if anyone has used Hbase as a transactional table to do
the processing that historically one has done with RDBMSs.

A simple question dealing with a transaction as a unit of work (all or
nothing): if any part of a statement in a batch transaction fails, the
transaction is rolled back in its entirety.

Now, how can Hbase handle this? Specifically, at the theoretical level, if
standard transactional processing was migrated from an RDBMS to Hbase
tables, would that work?
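
For context, the atomicity Hbase provides out of the box is per row: several
cells in one row can be changed as a single atomic unit, but there is no
built-in rollback across rows or tables. A minimal sketch (Scala, illustrative
table and column names) using RowMutations:

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put, RowMutations}
import org.apache.hadoop.hbase.util.Bytes

val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
val table = conn.getTable(TableName.valueOf("accounts"))   // illustrative table

val row = Bytes.toBytes("account-123")
val mutations = new RowMutations(row)

val balanceUpdate = new Put(row)
balanceUpdate.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("balance"), Bytes.toBytes("900"))
val auditUpdate = new Put(row)
auditUpdate.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("last_txn"), Bytes.toBytes("txn-42"))

mutations.add(balanceUpdate)
mutations.add(auditUpdate)

// Both Puts apply atomically only because they target the same row;
// anything spanning rows or tables needs a transaction layer on top
// (e.g. Phoenix with its transaction support)
table.mutateRow(mutations)

table.close()
conn.close()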

Has anyone built  successful transaction processing in Hbase?

Thanks


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Re: Hive on Hbase

2016-11-17 Thread Mich Talebzadeh
Thanks John for info.

Cheers

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 17 November 2016 at 18:44, John Leach <jle...@splicemachine.com> wrote:

> Mich,
>
> Please see slide 9 for architectural differences between Splice Machine,
> Trafodion, and Phoenix.
>
> https://docs.google.com/presentation/d/111t2QSVaI-CPwE_
> ejPHZMFhKJVe5yghCMPLfR3zh9hQ/edit?ts=582def5b#slide=id.g5fcdef5a7_09 <
> https://docs.google.com/presentation/d/111t2QSVaI-CPwE_
> ejPHZMFhKJVe5yghCMPLfR3zh9hQ/edit?ts=582def5b#slide=id.g5fcdef5a7_09>
>
> The performance differences are in the later slides.
>
> Hope this helps.
>
> Regards,
> John Leach
>
> > On Nov 17, 2016, at 10:41 AM, Gunnar Tapper <tapper.gun...@gmail.com>
> wrote:
> >
> > Hi,
> >
> > Trafodion's native storage engine is HBase.
> >
> > You can find its documentation at: trafodion.apache.org/
> documentation.html
> >
> > Since this is an HBase user mailing list, I suggest that we discuss your
> > other questions on u...@trafodion.incubator.apache.org.
> >
> > Thanks,
> >
> > Gunnar
> >
> >
> >
> > On Thu, Nov 17, 2016 at 8:19 AM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> > wrote:
> >
> >> thanks Gunnar.
> >>
> >> have you tried the performance of this product on Hbase. There are a
> number
> >> of options available. However, what makes this product better than hive
> on
> >> hbase?
> >>
> >> regards
> >>
> >> Dr Mich Talebzadeh
> >>
> >>
> >>
> >> LinkedIn * https://www.linkedin.com/profile/view?id=
> >> AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >> <https://www.linkedin.com/profile/view?id=
> AAEWh2gBxianrbJd6zP6AcPCCd
> >> OABUrV8Pw>*
> >>
> >>
> >>
> >> http://talebzadehmich.wordpress.com
> >>
> >>
> >> *Disclaimer:* Use it at your own risk. Any and all responsibility for
> any
> >> loss, damage or destruction of data or any other property which may
> arise
> >> from relying on this email's technical content is explicitly disclaimed.
> >> The author will in no case be liable for any monetary damages arising
> from
> >> such loss, damage or destruction.
> >>
> >>
> >>
> >> On 17 November 2016 at 15:04, Gunnar Tapper <tapper.gun...@gmail.com>
> >> wrote:
> >>
> >>> Apache Trafodion provides SQL on top of HBase.
> >>>
> >>> On Thu, Nov 17, 2016 at 7:40 AM, Mich Talebzadeh <
> >>> mich.talebza...@gmail.com>
> >>> wrote:
> >>>
> >>>> thanks John.
> >>>>
> >>>> How about using Phoenix or using Spark RDDs on top of Hbase?
> >>>>
> >>>> Many people think Phoenix is not a good choice?
> >>>>
> >>>>
> >>>>
> >>>> Dr Mich Talebzadeh
> >>>>
> >>>>
> >>>>
> >>>> LinkedIn * https://www.linkedin.com/profile/view?id=
> >>>> AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >>>> <https://www.linkedin.com/profile/view?id=
> >> AAEWh2gBxianrbJd6zP6AcPCCd
> >>>> OABUrV8Pw>*
> >>>>
> >>>>
> >>>>
> >>>> http://talebzadehmich.wordpress.com
> >>>>
> >>>>
> >>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
> >> any
> >>>> loss, damage or destruction of data or any other property which may
> >> arise
> >>>> from relying on this email's technical content is explicitly
> >> disclaimed.
> >>>> The author will in no case be liable for any monetary damages arising
> >>> from
> >>>> such loss, damage or destruction.
> >>>>
> >>>>
> >>>>
> >>>> On 17 November 2016 at 14:24, John Leach <jle...@splicemachine.com>
> >>&g

Re: Hive on Hbase

2016-11-17 Thread Mich Talebzadeh
Thanks Gunnar.

Have you tested the performance of this product on Hbase? There are a number
of options available. However, what makes this product better than Hive on
Hbase?

regards

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 17 November 2016 at 15:04, Gunnar Tapper <tapper.gun...@gmail.com> wrote:

> Apache Trafodion provides SQL on top of HBase.
>
> On Thu, Nov 17, 2016 at 7:40 AM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> wrote:
>
> > thanks John.
> >
> > How about using Phoenix or using Spark RDDs on top of Hbase?
> >
> > Many people think Phoenix is not a good choice?
> >
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn * https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> > OABUrV8Pw>*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> > On 17 November 2016 at 14:24, John Leach <jle...@splicemachine.com>
> wrote:
> >
> > > Mich,
> > >
> > > I have not found too many happy users of Hive on top of HBase in my
> > > experience.  For every query in Hive, you will have to read the data
> from
> > > the filesystem into hbase and then serialize the data via an HBase
> > scanner
> > > into Hive.  The throughput through this mechanism is pretty poor and
> now
> > > when you read 1 million records you actually read 1 Million records in
> > > HBase and 1 Million Records in Hive.  There are significant resource
> > > management issues with this approach as well.
> > >
> > > At Splice Machine (open source), we have written an implementation to
> > read
> > > the store files directly from the file system (via embedded Spark) and
> > then
> > > we do incremental deltas with HBase to maintain consistency.  When we
> > read
> > > 1 million records, Spark reads most of them directly from the
> filesystem.
> > > Spark provides resource management and fair scheduling of those queries
> > as
> > > well.
> > >
> > > We released some of our performance results at HBaseCon East in NYC.
> > Here
> > > is the video.  https://www.youtube.com/watch?v=cgIz-cjehJ0 <
> > > https://www.youtube.com/watch?v=cgIz-cjehJ0> .
> > >
> > > Regards,
> > > John Leach
> > >
> > > > On Nov 17, 2016, at 6:09 AM, Mich Talebzadeh <
> > mich.talebza...@gmail.com>
> > > wrote:
> > > >
> > > > H,
> > > >
> > > > My approach to have a SQL engine on top of Hbase has been (excluding
> > > Spark
> > > > & Phoenix for now) is to create Hbase table as is, then create an
> > > EXTERNAL
> > > > Hive table on Hbase using Hadoop.hive.HbaseStorageHandler to
> interface
> > > with
> > > > Hbase table.
> > > >
> > > > My reasoning with creating Hive external table is to avoid
> accidentally
> > > > dropping Hbase table etc. Is this a reasonable approach?
> > > >
> > > > Then that Hive table can be used by a variety of tools like Spark,
> > > Tableau,
> > > > Zeppelin.
> > > >
> > > > Is this a viable solution as Hive seems to be preferred on top of
> Hbase
> > > > compared to Phoenix etc.
> > > >
> > > > Thaks
> > > >
> > > > Dr Mich Talebzadeh
> > > >
> > > >
> > > >
> > > > LinkedIn * https://www.linkedin.com/profile/view?id=
> > > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > > > <https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCd
> > > OABUrV8Pw>*
> > > >
> > > >
> > > >
> > > > http://talebzadehmich.wordpress.com
> > > >
> > > >
> > > > *Disclaimer:* Use it at your own risk. Any and all responsibility for
> > any
> > > > loss, damage or destruction of data or any other property which may
> > arise
> > > > from relying on this email's technical content is explicitly
> > disclaimed.
> > > > The author will in no case be liable for any monetary damages arising
> > > from
> > > > such loss, damage or destruction.
> > >
> > >
> >
>
>
>
> --
> Thanks,
>
> Gunnar
> *If you think you can you can, if you think you can't you're right.*
>


Re: Hive on Hbase

2016-11-17 Thread Mich Talebzadeh
Thanks John.

How about using Phoenix, or using Spark RDDs on top of Hbase?

Many people seem to think Phoenix is not a good choice; is that right?



Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 17 November 2016 at 14:24, John Leach <jle...@splicemachine.com> wrote:

> Mich,
>
> I have not found too many happy users of Hive on top of HBase in my
> experience.  For every query in Hive, you will have to read the data from
> the filesystem into hbase and then serialize the data via an HBase scanner
> into Hive.  The throughput through this mechanism is pretty poor and now
> when you read 1 million records you actually read 1 Million records in
> HBase and 1 Million Records in Hive.  There are significant resource
> management issues with this approach as well.
>
> At Splice Machine (open source), we have written an implementation to read
> the store files directly from the file system (via embedded Spark) and then
> we do incremental deltas with HBase to maintain consistency.  When we read
> 1 million records, Spark reads most of them directly from the filesystem.
> Spark provides resource management and fair scheduling of those queries as
> well.
>
> We released some of our performance results at HBaseCon East in NYC.  Here
> is the video.  https://www.youtube.com/watch?v=cgIz-cjehJ0 <
> https://www.youtube.com/watch?v=cgIz-cjehJ0> .
>
> Regards,
> John Leach
>
> > On Nov 17, 2016, at 6:09 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
> >
> > H,
> >
> > My approach to have a SQL engine on top of Hbase has been (excluding
> Spark
> > & Phoenix for now) is to create Hbase table as is, then create an
> EXTERNAL
> > Hive table on Hbase using Hadoop.hive.HbaseStorageHandler to interface
> with
> > Hbase table.
> >
> > My reasoning with creating Hive external table is to avoid accidentally
> > dropping Hbase table etc. Is this a reasonable approach?
> >
> > Then that Hive table can be used by a variety of tools like Spark,
> Tableau,
> > Zeppelin.
> >
> > Is this a viable solution as Hive seems to be preferred on top of Hbase
> > compared to Phoenix etc.
> >
> > Thaks
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn * https://www.linkedin.com/profile/view?id=
> AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> OABUrV8Pw>*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
>
>


Hive on Hbase

2016-11-17 Thread Mich Talebzadeh
Hi,

My approach to having a SQL engine on top of Hbase (excluding Spark &
Phoenix for now) is to create the Hbase table as is, then create an EXTERNAL
Hive table on top of it using org.apache.hadoop.hive.hbase.HBaseStorageHandler
to interface with the Hbase table.

My reasoning for creating the Hive table as external is to avoid accidentally
dropping the underlying Hbase table etc. Is this a reasonable approach?

Then that Hive table can be used by a variety of tools like Spark, Tableau,
Zeppelin.
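
As a quick sanity check that the wiring works end to end, a query like the
one below can be run from Spark once the external table exists. This is a
minimal sketch, assuming Spark 2.x with Hive support on the classpath; the
table and column names are only examples:

import org.apache.spark.sql.SparkSession

// Query the Hive external table that fronts the Hbase table.
val spark = SparkSession.builder()
  .appName("HiveOnHbaseCheck")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("SELECT key, ticker, price FROM marketdatahbase LIMIT 10").show()

Note that the hive-hbase-handler and Hbase client jars, plus hbase-site.xml,
need to be visible to both the driver and the executors for this to work.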

Is this a viable solution, given that Hive seems to be preferred on top of
Hbase compared to Phoenix etc.?

Thanks

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Re: Zeppelin using Spark to access Hbase throws error

2016-10-29 Thread Mich Talebzadeh
Hi Felix,

Yes, it is the same host that I run the Spark shell on and that I start
Zeppelin on.

Have you observed this before?

Thanks

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 29 October 2016 at 21:53, Felix Cheung <felixcheun...@hotmail.com> wrote:

> When you run the code in spark-shell - is that the same machine as where
> Zeppelin is running?
>
> It looks like you are getting socket connection timeout when Spark,
> running from Zeppelin, is trying to connect to HBASE.
>
>
> _____
> From: Mich Talebzadeh <mich.talebza...@gmail.com>
> Sent: Saturday, October 29, 2016 1:30 PM
> Subject: Zeppelin using Spark to access Hbase throws error
> To: <us...@zeppelin.apache.org>, <user@hbase.apache.org>
>
>
> Spark 2.0.1, Zeppelin 0.6.1, hbase-1.2.3
>
> The below code runs fine with Spark shell.
>
> import org.apache.spark._
> import org.apache.spark.rdd.NewHadoopRDD
> import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor}
> import org.apache.hadoop.hbase.mapreduce.TableInputFormat
> import org.apache.hadoop.fs.Path
> import org.apache.hadoop.hbase.HColumnDescriptor
> import org.apache.hadoop.hbase.util.Bytes
> import org.apache.hadoop.hbase.client.Put
> import org.apache.hadoop.hbase.client.HTable
> import scala.util.Random
> import scala.math._
> import org.apache.spark.sql.functions._
> import org.apache.spark.rdd.NewHadoopRDD
> import scala.collection.JavaConversions._
> import scala.collection.JavaConverters._
> import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor}
> import org.apache.hadoop.hbase.mapreduce.TableInputFormat
> import java.nio.ByteBuffer
> val tableName = "MARKETDATAHBASE"
> val conf = HBaseConfiguration.create()
> // Add local HBase conf
> conf.set(TableInputFormat.INPUT_TABLE, tableName)
> //create rdd
> val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
> classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
> classOf[org.apache.hadoop.hbase.client.Result])
> val rdd1 = hBaseRDD.map(tuple => tuple._2).map(result => (result.getRow,
> result.getColumn("PRICE_INFO".getBytes(), "TICKER".getBytes()))).map(row
> => {
> (
>   row._1.map(_.toChar).mkString,
>   row._2.asScala.reduceLeft {
> (a, b) => if (a.getTimestamp > b.getTimestamp) a else b
>   }.getValue.map(_.toChar).mkString
> )
> })
> case class columns (KEY: String, TICKER: String)
> val dfTICKER = rdd1.toDF.map(p => columns(p(0).toString,p(1).toString))
> dfTICKER.show(5)
>
>
> However, in Zeppelin it throws this error:
>
>
> dfTICKER: org.apache.spark.sql.Dataset[columns] = [KEY: string, TICKER:
> string]
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=36, exceptions:
> Sat Oct 29 21:02:41 BST 2016, null, java.net.SocketTimeoutException:
> callTimeout=6, callDuration=68599: row 'MARKETDATAHBASE,,00'
> on table 'hbase:meta' at region=hbase:meta,,1.1588230740,
> hostname=rhes564,16201,1477246132044, seqNum=0
>
>
> Is this related to Hbase region server?
>
>
> Thanks
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destructionof data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.The
> author will in no case be liable for any monetary damages arising from
> suchloss, damage or destruction.
>
>
>
>
>


Zeppelin using Spark to access Hbase throws error

2016-10-29 Thread Mich Talebzadeh
Spark 2.0.1, Zeppelin 0.6.1, hbase-1.2.3

The below code runs fine with Spark shell.

import org.apache.spark._
import org.apache.spark.rdd.NewHadoopRDD
import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor}
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HColumnDescriptor
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.client.HTable
import scala.util.Random
import scala.math._
import org.apache.spark.sql.functions._
import org.apache.spark.rdd.NewHadoopRDD
import scala.collection.JavaConversions._
import scala.collection.JavaConverters._
import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor}
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import java.nio.ByteBuffer
val tableName = "MARKETDATAHBASE"
val conf = HBaseConfiguration.create()
// Add local HBase conf
conf.set(TableInputFormat.INPUT_TABLE, tableName)
//create rdd
val hBaseRDD = sc.newAPIHadoopRDD(conf,
classOf[TableInputFormat],classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],classOf[org.apache.hadoop.hbase.client.Result])
val rdd1 = hBaseRDD.map(tuple => tuple._2).map(result => (result.getRow,
result.getColumn("PRICE_INFO".getBytes(), "TICKER".getBytes()))).map(row =>
{
(
  row._1.map(_.toChar).mkString,
  row._2.asScala.reduceLeft {
(a, b) => if (a.getTimestamp > b.getTimestamp) a else b
  }.getValue.map(_.toChar).mkString
)
})
case class columns (KEY: String, TICKER: String)
val dfTICKER = rdd1.toDF.map(p => columns(p(0).toString,p(1).toString))
dfTICKER.show(5)


However, in Zeppelin it throws this error:


dfTICKER: org.apache.spark.sql.Dataset[columns] = [KEY: string, TICKER:
string]
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
attempts=36, exceptions:
Sat Oct 29 21:02:41 BST 2016, null, java.net.SocketTimeoutException:
callTimeout=6, callDuration=68599: row
'MARKETDATAHBASE,,00' on table 'hbase:meta' at
region=hbase:meta,,1.1588230740, hostname=rhes564,16201,1477246132044,
seqNum=0


Is this related to Hbase region server?
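
One difference worth ruling out is the client configuration visible to the
Zeppelin interpreter JVM: if hbase-site.xml is not on its classpath, the
client falls back to defaults (localhost ZooKeeper) and the hbase:meta lookup
times out. A minimal sketch that removes that dependency by setting the
quorum explicitly, reusing the imports and tableName from the snippet above;
the host name and port are examples only:

// Make the ZooKeeper quorum explicit so the client does not depend on
// hbase-site.xml being on the interpreter's classpath.
val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.quorum", "rhes564")            // example host
conf.set("hbase.zookeeper.property.clientPort", "2181")  // example port
conf.set(TableInputFormat.INPUT_TABLE, tableName)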


Thanks

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Re: Scanner timeouts

2016-10-28 Thread Mich Talebzadeh
OK, what it says is that this was discussed before and there is a Jira for
it on the Hbase side.

It is not a showstopper anyway.

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 28 October 2016 at 23:53, Ted Yu <yuzhih...@gmail.com> wrote:

> Mich:
> The image didn't go through.
>
> Consider using third party website.
>
> On Fri, Oct 28, 2016 at 3:52 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> wrote:
>
> > Gentle reminder :)
> >
> > [image: Inline images 1]
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn * https://www.linkedin.com/profile/view?id=
> AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> OABUrV8Pw>*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> > On 28 October 2016 at 23:05, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> >> You should have written to the mailing list earlier :-)
> >>
> >> hbase community is very responsive.
> >>
> >>
> >> On Fri, Oct 28, 2016 at 2:53 PM, Pat Ferrel <p...@occamsmachete.com>
> >> wrote:
> >>
> >> > After passing in hbase-site.xml with the increased timeout it
> completes
> >> > pretty fast with no errors.
> >> >
> >> > Thanks Ted, we’ve been going crazy trying to figure what was going on.
> >> We
> >> > moved from having Hbase installed on the Spark driver machine (though
> >> not
> >> > used) to containerized installation, where the config was left default
> >> on
> >> > the driver and only existed in the containers. We were passing in the
> >> empty
> >> > config to the spark-submit but it didn’t match the containers and
> fixing
> >> > that has made the system much happier.
> >> >
> >> > Anyway good call, we will be more aware of this with other services
> now.
> >> > Thanks for ending our weeks long fight!  :-)
> >> >
> >> >
> >> > On Oct 28, 2016, at 11:29 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >> >
> >> > bq. with 400 threads hitting HBase at the same time
> >> >
> >> > How many regions are serving the 400 threads ?
> >> > How many region servers do you have ?
> >> >
> >> > If the regions are spread relatively evenly across the cluster, the
> >> above
> >> > may not be big issue.
> >> >
> >> > On Fri, Oct 28, 2016 at 11:21 AM, Pat Ferrel <p...@occamsmachete.com>
> >> > wrote:
> >> >
> >> > > Ok, will do.
> >> > >
> >> > > So the scanner does not indicate of itself that I’ve missed
> something
> >> in
> >> > > handling the data. If not index, then made a fast lookup “key”? I
> ask
> >> > > because the timeout change may work but not be the optimal solution.
> >> The
> >> > > stage that fails is very long compared to other stages. And with 400
> >> > > threads hitting HBase at the same time, this seems like something I
> >> may
> >> > > need to restructure and any advice about that would be welcome.
> >> > >
> >> > > HBase is 1.2.3
> >> > >
> >> > >
> >> > > On Oct 28, 2016, at 10:36 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >> > >
> >> > > For your first question, you need to pass hbase-site.xml which has
> >> config
> >> > > parameters affecting client operations to Spark  executors.
> >> > >
> >> > > bq. missed 

Re: Scanner timeouts

2016-10-28 Thread Mich Talebzadeh
Gentle reminder :)

[image: Inline images 1]

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 28 October 2016 at 23:05, Ted Yu <yuzhih...@gmail.com> wrote:

> You should have written to the mailing list earlier :-)
>
> hbase community is very responsive.
>
> On Fri, Oct 28, 2016 at 2:53 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
> > After passing in hbase-site.xml with the increased timeout it completes
> > pretty fast with no errors.
> >
> > Thanks Ted, we’ve been going crazy trying to figure what was going on. We
> > moved from having Hbase installed on the Spark driver machine (though not
> > used) to containerized installation, where the config was left default on
> > the driver and only existed in the containers. We were passing in the
> empty
> > config to the spark-submit but it didn’t match the containers and fixing
> > that has made the system much happier.
> >
> > Anyway good call, we will be more aware of this with other services now.
> > Thanks for ending our weeks long fight!  :-)
> >
> >
> > On Oct 28, 2016, at 11:29 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > bq. with 400 threads hitting HBase at the same time
> >
> > How many regions are serving the 400 threads ?
> > How many region servers do you have ?
> >
> > If the regions are spread relatively evenly across the cluster, the above
> > may not be big issue.
> >
> > On Fri, Oct 28, 2016 at 11:21 AM, Pat Ferrel <p...@occamsmachete.com>
> > wrote:
> >
> > > Ok, will do.
> > >
> > > So the scanner does not indicate of itself that I’ve missed something
> in
> > > handling the data. If not index, then made a fast lookup “key”? I ask
> > > because the timeout change may work but not be the optimal solution.
> The
> > > stage that fails is very long compared to other stages. And with 400
> > > threads hitting HBase at the same time, this seems like something I may
> > > need to restructure and any advice about that would be welcome.
> > >
> > > HBase is 1.2.3
> > >
> > >
> > > On Oct 28, 2016, at 10:36 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > For your first question, you need to pass hbase-site.xml which has
> config
> > > parameters affecting client operations to Spark  executors.
> > >
> > > bq. missed indexing some column
> > >
> > > hbase doesn't have indexing (as in the sense of traditional RDBMS).
> > >
> > > Let's see what happens after hbase-site.xml is passed to executors.
> > >
> > > BTW Can you tell us the release of hbase you're using ?
> > >
> > >
> > >
> > > On Fri, Oct 28, 2016 at 10:22 AM, Pat Ferrel <p...@occamsmachete.com>
> > > wrote:
> > >
> > >> So to clarify there are some values in hbase/conf/hbase-site.xml that
> > are
> > >> needed by the calling code in the Spark driver and executors and so
> must
> > > be
> > >> passed using --files to spark-submit? If so I can do this.
> > >>
> > >> But do I have a deeper issue? Is it typical to need a scan like this?
> > > Have
> > >> I missed indexing some column maybe?
> > >>
> > >>
> > >> On Oct 28, 2016, at 9:59 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >>
> > >> Mich:
> > >> bq. on table 'hbase:meta' *at region=hbase:meta,,1.1588230740
> > >>
> > >> What you observed was different issue.
> > >> The above looks like trouble with locating region(s) during scan.
> > >>
> > >> On Fri, Oct 28, 2016 at 9:54 AM, Mich Talebzadeh <
> > >> mich.talebza...@gmail.com>
> > >> wrote:
> > >>
> > >>> This is an example I got
> > >>>
> > >>> warning: there were two deprecation warnings; re-run with
> -deprecation
> > >> for
> > >>> details
> > >>> rdd1: org.apache.spar

Re: Scanner timeouts

2016-10-28 Thread Mich Talebzadeh
Thanks Ted, I am aware of that issue of Spark 2.0.1 not handling connections
to Phoenix. For now I use Spark 2.0.1 on Hbase directly, or Spark 2.0.1 on
Hbase through Hive external tables.

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 28 October 2016 at 22:58, Ted Yu <yuzhih...@gmail.com> wrote:

> That's another way of using hbase.
>
> Watch out for PHOENIX-
> <https://issues.apache.org/jira/browse/PHOENIX-> if you're running
> queries with Spark 2.0
>
> On Fri, Oct 28, 2016 at 2:38 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> wrote:
>
> > Hbase does not have indexes but Phoenix will allow one to create
> secondary
> > indexes on Hbase. The index structure will be created on Hbase itself and
> > you can maintain it from Phoenix.
> >
> > HTH
> >
> >
> >
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn * https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> > OABUrV8Pw>*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> > On 28 October 2016 at 19:29, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > bq. with 400 threads hitting HBase at the same time
> > >
> > > How many regions are serving the 400 threads ?
> > > How many region servers do you have ?
> > >
> > > If the regions are spread relatively evenly across the cluster, the
> above
> > > may not be big issue.
> > >
> > > On Fri, Oct 28, 2016 at 11:21 AM, Pat Ferrel <p...@occamsmachete.com>
> > > wrote:
> > >
> > > > Ok, will do.
> > > >
> > > > So the scanner does not indicate of itself that I’ve missed something
> > in
> > > > handling the data. If not index, then made a fast lookup “key”? I ask
> > > > because the timeout change may work but not be the optimal solution.
> > The
> > > > stage that fails is very long compared to other stages. And with 400
> > > > threads hitting HBase at the same time, this seems like something I
> may
> > > > need to restructure and any advice about that would be welcome.
> > > >
> > > > HBase is 1.2.3
> > > >
> > > >
> > > > On Oct 28, 2016, at 10:36 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > >
> > > > For your first question, you need to pass hbase-site.xml which has
> > config
> > > > parameters affecting client operations to Spark  executors.
> > > >
> > > > bq. missed indexing some column
> > > >
> > > > hbase doesn't have indexing (as in the sense of traditional RDBMS).
> > > >
> > > > Let's see what happens after hbase-site.xml is passed to executors.
> > > >
> > > > BTW Can you tell us the release of hbase you're using ?
> > > >
> > > >
> > > >
> > > > On Fri, Oct 28, 2016 at 10:22 AM, Pat Ferrel <p...@occamsmachete.com>
> > > > wrote:
> > > >
> > > > > So to clarify there are some values in hbase/conf/hbase-site.xml
> that
> > > are
> > > > > needed by the calling code in the Spark driver and executors and so
> > > must
> > > > be
> > > > > passed using --files to spark-submit? If so I can do this.
> > > > >
> > > > > But do I have a deeper issue? Is it typical to need a scan like
> this?
> > > > Have
> > > > > I missed indexi

Re: Scanner timeouts

2016-10-28 Thread Mich Talebzadeh
Hbase does not have secondary indexes itself, but Phoenix will allow one to
create secondary indexes on top of Hbase. The index structure is stored in
Hbase itself and is maintained through Phoenix.
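
For illustration, the index can be created from any JDBC client, including
Scala code. A minimal sketch, assuming the Phoenix client jar is on the
classpath; the ZooKeeper host, view and column names are placeholders:

import java.sql.DriverManager

// Connect through the Phoenix JDBC driver (host and port are placeholders).
val conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")
val stmt = conn.createStatement()

// Secondary index on a Phoenix view over an Hbase table; names are illustrative.
stmt.executeUpdate(
  """CREATE INDEX IF NOT EXISTS ticker_idx
    |ON "marketDataHbase" ("price_info"."ticker")
    |INCLUDE ("price_info"."price")""".stripMargin)

stmt.close()
conn.close()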

HTH





Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 28 October 2016 at 19:29, Ted Yu <yuzhih...@gmail.com> wrote:

> bq. with 400 threads hitting HBase at the same time
>
> How many regions are serving the 400 threads ?
> How many region servers do you have ?
>
> If the regions are spread relatively evenly across the cluster, the above
> may not be big issue.
>
> On Fri, Oct 28, 2016 at 11:21 AM, Pat Ferrel <p...@occamsmachete.com>
> wrote:
>
> > Ok, will do.
> >
> > So the scanner does not indicate of itself that I’ve missed something in
> > handling the data. If not index, then made a fast lookup “key”? I ask
> > because the timeout change may work but not be the optimal solution. The
> > stage that fails is very long compared to other stages. And with 400
> > threads hitting HBase at the same time, this seems like something I may
> > need to restructure and any advice about that would be welcome.
> >
> > HBase is 1.2.3
> >
> >
> > On Oct 28, 2016, at 10:36 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > For your first question, you need to pass hbase-site.xml which has config
> > parameters affecting client operations to Spark  executors.
> >
> > bq. missed indexing some column
> >
> > hbase doesn't have indexing (as in the sense of traditional RDBMS).
> >
> > Let's see what happens after hbase-site.xml is passed to executors.
> >
> > BTW Can you tell us the release of hbase you're using ?
> >
> >
> >
> > On Fri, Oct 28, 2016 at 10:22 AM, Pat Ferrel <p...@occamsmachete.com>
> > wrote:
> >
> > > So to clarify there are some values in hbase/conf/hbase-site.xml that
> are
> > > needed by the calling code in the Spark driver and executors and so
> must
> > be
> > > passed using --files to spark-submit? If so I can do this.
> > >
> > > But do I have a deeper issue? Is it typical to need a scan like this?
> > Have
> > > I missed indexing some column maybe?
> > >
> > >
> > > On Oct 28, 2016, at 9:59 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > Mich:
> > > bq. on table 'hbase:meta' *at region=hbase:meta,,1.1588230740
> > >
> > > What you observed was different issue.
> > > The above looks like trouble with locating region(s) during scan.
> > >
> > > On Fri, Oct 28, 2016 at 9:54 AM, Mich Talebzadeh <
> > > mich.talebza...@gmail.com>
> > > wrote:
> > >
> > >> This is an example I got
> > >>
> > >> warning: there were two deprecation warnings; re-run with -deprecation
> > > for
> > >> details
> > >> rdd1: org.apache.spark.rdd.RDD[(String, String)] =
> MapPartitionsRDD[77]
> > > at
> > >> map at :151
> > >> defined class columns
> > >> dfTICKER: org.apache.spark.sql.Dataset[columns] = [KEY: string,
> TICKER:
> > >> string]
> > >> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> after
> > >> attempts=36, exceptions:
> > >> *Fri Oct 28 13:13:46 BST 2016, null, java.net.SocketTimeoutException:
> > >> callTimeout=6, callDuration=68411: row
> > >> 'MARKETDATAHBASE,,00' on table 'hbase:meta' *at
> > >> region=hbase:meta,,1.1588230740, hostname=rhes564,16201,
> 1477246132044,
> > >> seqNum=0
> > >> at
> > >> org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadRepli
> > >> cas.throwEnrichedException(RpcRetryingCallerWithReadRepli
> cas.java:276)
> > >> at
> > >> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(
> > >> ScannerCallableWithReplicas.java:210)
> > >> at
> > >> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(
> > >> ScannerCallableWithReplicas.java:60)
>

Re: Scanner timeouts

2016-10-28 Thread Mich Talebzadeh
Sorry, do you mean that in my error case the issue was locating regions
during the scan?

In that case I do not know why it works through the Spark shell but not
through Zeppelin.

Thanks

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 28 October 2016 at 17:59, Ted Yu <yuzhih...@gmail.com> wrote:

> Mich:
> bq. on table 'hbase:meta' *at region=hbase:meta,,1.1588230740
>
> What you observed was different issue.
> The above looks like trouble with locating region(s) during scan.
>
> On Fri, Oct 28, 2016 at 9:54 AM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> wrote:
>
> > This is an example I got
> >
> > warning: there were two deprecation warnings; re-run with -deprecation
> for
> > details
> > rdd1: org.apache.spark.rdd.RDD[(String, String)] = MapPartitionsRDD[77]
> at
> > map at :151
> > defined class columns
> > dfTICKER: org.apache.spark.sql.Dataset[columns] = [KEY: string, TICKER:
> > string]
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> > attempts=36, exceptions:
> > *Fri Oct 28 13:13:46 BST 2016, null, java.net.SocketTimeoutException:
> > callTimeout=6, callDuration=68411: row
> > 'MARKETDATAHBASE,,00' on table 'hbase:meta' *at
> > region=hbase:meta,,1.1588230740, hostname=rhes564,16201,1477246132044,
> > seqNum=0
> >   at
> > org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadRepli
> > cas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)
> >   at
> > org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(
> > ScannerCallableWithReplicas.java:210)
> >   at
> > org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(
> > ScannerCallableWithReplicas.java:60)
> >   at
> > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(
> > RpcRetryingCaller.java:210)
> >
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn * https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> > OABUrV8Pw>*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> > On 28 October 2016 at 17:52, Pat Ferrel <p...@occamsmachete.com> wrote:
> >
> > > I will check that, but if that is a server startup thing I was not
> aware
> > I
> > > had to send it to the executors. So it’s like a connection timeout from
> > > executor code?
> > >
> > >
> > > On Oct 28, 2016, at 9:48 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > How did you change the timeout(s) ?
> > >
> > > bq. timeout is currently set to 6
> > >
> > > Did you pass hbase-site.xml using --files to Spark job ?
> > >
> > > Cheers
> > >
> > > On Fri, Oct 28, 2016 at 9:27 AM, Pat Ferrel <p...@occamsmachete.com>
> > wrote:
> > >
> > > > Using standalone Spark. I don’t recall seeing connection lost errors,
> > but
> > > > there are lots of logs. I’ve set the scanner and RPC timeouts to
> large
> > > > numbers on the servers.
> > > >
> > > > But I also saw in the logs:
> > > >
> > > >org.apache.hadoop.hbase.client.ScannerTimeoutException: 381788ms
> > > > passed since the last invocation, timeout is currently set to 6
> > > >
> > > > Not sure where that is coming from. Does the driver machine making
> > > queries
> > > > need to have the timeout config also?
> > > >
> > > > And why so large, am I doing something wrong?
> >

Re: Scanner timeouts

2016-10-28 Thread Mich Talebzadeh
This is an example I got

warning: there were two deprecation warnings; re-run with -deprecation for
details
rdd1: org.apache.spark.rdd.RDD[(String, String)] = MapPartitionsRDD[77] at
map at <console>:151
defined class columns
dfTICKER: org.apache.spark.sql.Dataset[columns] = [KEY: string, TICKER:
string]
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
attempts=36, exceptions:
*Fri Oct 28 13:13:46 BST 2016, null, java.net.SocketTimeoutException:
callTimeout=6, callDuration=68411: row
'MARKETDATAHBASE,,00' on table 'hbase:meta' *at
region=hbase:meta,,1.1588230740, hostname=rhes564,16201,1477246132044,
seqNum=0
  at
org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)
  at
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:210)
  at
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
  at
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:210)



Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 28 October 2016 at 17:52, Pat Ferrel <p...@occamsmachete.com> wrote:

> I will check that, but if that is a server startup thing I was not aware I
> had to send it to the executors. So it’s like a connection timeout from
> executor code?
>
>
> On Oct 28, 2016, at 9:48 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> How did you change the timeout(s) ?
>
> bq. timeout is currently set to 6
>
> Did you pass hbase-site.xml using --files to Spark job ?
>
> Cheers
>
> On Fri, Oct 28, 2016 at 9:27 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
> > Using standalone Spark. I don’t recall seeing connection lost errors, but
> > there are lots of logs. I’ve set the scanner and RPC timeouts to large
> > numbers on the servers.
> >
> > But I also saw in the logs:
> >
> >org.apache.hadoop.hbase.client.ScannerTimeoutException: 381788ms
> > passed since the last invocation, timeout is currently set to 6
> >
> > Not sure where that is coming from. Does the driver machine making
> queries
> > need to have the timeout config also?
> >
> > And why so large, am I doing something wrong?
> >
> >
> > On Oct 28, 2016, at 8:50 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > Mich:
> > The OutOfOrderScannerNextException indicated problem with read from
> hbase.
> >
> > How did you know connection to Spark cluster was lost ?
> >
> > Cheers
> >
> > On Fri, Oct 28, 2016 at 8:47 AM, Mich Talebzadeh <
> > mich.talebza...@gmail.com>
> > wrote:
> >
> >> Looks like it lost the connection to Spark cluster.
> >>
> >> What mode you are using with Spark, Standalone, Yarn or others. The
> issue
> >> looks like a resource manager issue.
> >>
> >> I have seen this when running Zeppelin with Spark on Hbase.
> >>
> >> HTH
> >>
> >> Dr Mich Talebzadeh
> >>
> >>
> >>
> >> LinkedIn * https://www.linkedin.com/profile/view?id=
> >> AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >> <https://www.linkedin.com/profile/view?id=
> AAEWh2gBxianrbJd6zP6AcPCCd
> >> OABUrV8Pw>*
> >>
> >>
> >>
> >> http://talebzadehmich.wordpress.com
> >>
> >>
> >> *Disclaimer:* Use it at your own risk. Any and all responsibility for
> any
> >> loss, damage or destruction of data or any other property which may
> arise
> >> from relying on this email's technical content is explicitly disclaimed.
> >> The author will in no case be liable for any monetary damages arising
> > from
> >> such loss, damage or destruction.
> >>
> >>
> >>
> >> On 28 October 2016 at 16:38, Pat Ferrel <p...@occamsmachete.com> wrote:
> >>
> >>> I’m getting data from HBase using a large Spark cluster with
> parallelism
> >>> of near 400. The query fails quire often with the message below.
> >> Sometimes
> >>> a retry will w

Re: Scanner timeouts

2016-10-28 Thread Mich Talebzadeh
Looks like it lost the connection to the Spark cluster.

What mode are you using with Spark: Standalone, Yarn or another? The issue
looks like a resource manager issue.

I have seen this when running Zeppelin with Spark on Hbase.
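
If it does turn out to be a genuine scanner or RPC timeout rather than a
resource problem, the client-side settings can also be raised
programmatically as well as in the hbase-site.xml passed to the executors.
A minimal sketch, assuming Hbase 1.x property names, with values purely as
examples:

import org.apache.hadoop.hbase.HBaseConfiguration

// Raise the client-side scan and RPC timeouts (values are examples only).
val conf = HBaseConfiguration.create()
conf.set("hbase.rpc.timeout", "120000")                    // 2 minutes
conf.set("hbase.client.scanner.timeout.period", "120000")  // 2 minutes
conf.set("hbase.client.scanner.caching", "1000")           // fewer RPCs per next()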

HTH

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 28 October 2016 at 16:38, Pat Ferrel <p...@occamsmachete.com> wrote:

> I’m getting data from HBase using a large Spark cluster with parallelism
> of near 400. The query fails quire often with the message below. Sometimes
> a retry will work and sometimes the ultimate failure results (below).
>
> If I reduce parallelism in Spark it slows other parts of the algorithm
> unacceptably. I have also experimented with very large RPC/Scanner timeouts
> of many minutes—to no avail.
>
> Any clues about what to look for or what may be setup wrong in my tables?
>
> Job aborted due to stage failure: Task 44 in stage 147.0 failed 4 times,
> most recent failure: Lost task 44.3 in stage 147.0 (TID 24833,
> ip-172-16-3-9.eu-central-1.compute.internal): 
> org.apache.hadoop.hbase.DoNotRetryIOException:
> Failed after retry of OutOfOrderScannerNextException: was there a rpc
> timeout?+details
> Job aborted due to stage failure: Task 44 in stage 147.0 failed 4 times,
> most recent failure: Lost task 44.3 in stage 147.0 (TID 24833,
> ip-172-16-3-9.eu-central-1.compute.internal): 
> org.apache.hadoop.hbase.DoNotRetryIOException:
> Failed after retry of OutOfOrderScannerNextException: was there a rpc
> timeout? at 
> org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:403)
> at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(
> TableRecordReaderImpl.java:232) at org.apache.hadoop.hbase.
> mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138) at
>


Re: Hbase fast access

2016-10-24 Thread Mich Talebzadeh
Thanks Dave,

Yes defragging is a process to get rid of fragmentation and block/page
chaining.

I must admit that the Hbase architecture, in terms of memory management, is
similar to what something like Oracle or SAP ASE does. It sounds like, after
a long journey, memory is the best place to do data manipulation. The
LSM-tree structure is pretty impressive compared to the traditional B-tree
access in an RDBMS.



Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 24 October 2016 at 18:53, Dave Birdsall <dave.birds...@esgyn.com> wrote:

> At a physical level HBase is append-only.
>
> At a logical level, one can update data in HBase just like one can in any
> RDBMS.
>
> The memstore/block cache and compaction logic are the mechanisms that
> bridge between these two views.
>
> What makes LSMs attractive performance-wise in comparison to traditional
> RDMS storage architectures is that memory speeds and CPU speeds have
> increased at a faster rate than Disk I/O transfer speeds.
>
> Even in traditional RDBMS though it is useful to periodically perform file
> reorganizations, that is, rewrite scattered disk blocks into sequence on
> disk. Many RDBMSs do this; Tandem did it way back in the 1980s for example.
> But caches were not large enough to have an LSM-style architecture back
> then.
>
> Dave
>
> -Original Message-
> From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
> Sent: Friday, October 21, 2016 2:09 PM
> To: user@hbase.apache.org
> Subject: Re: Hbase fast access
>
> I was asked an interesting question.
>
> Can one update data in Hbase? and my answer was it is only append only
>
> Can one update data in Hive? My answer was yes if table is created as ORC
> and tableproperties set with "transactional"="true"
>
>
> STORED AS ORC
> TBLPROPERTIES ( "orc.compress"="SNAPPY", "transactional"="true",
> "orc.create.index"="true", "orc.bloom.filter.columns"="object_id",
> "orc.bloom.filter.fpp"="0.05",
> "orc.stripe.size"="268435456",
> "orc.row.index.stride"="1" )
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=
> AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> OABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 21 October 2016 at 22:01, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > It is true in the sense that hfile, once written (and closed), becomes
> > immutable.
> >
> > Compaction would remove obsolete content and generate new hfiles.
> >
> > Cheers
> >
> > On Fri, Oct 21, 2016 at 1:59 PM, Mich Talebzadeh <
> > mich.talebza...@gmail.com>
> > wrote:
> >
> > > BTW. I always understood that Hbase is append only. is that
> > > generally
> > true?
> > >
> > > thx
> > >
> > > Dr Mich Talebzadeh
> > >
> > >
> > >
> > > LinkedIn * https://www.linkedin.com/profile/view?id=
> > > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6Ac
> > > PCCd
> > > OABUrV8Pw>*
> > >
> > >
> > >
> > > http://talebzadehmich.wordpress.com
> > >
> > >
> > > *Disclaimer:* Use it at your own risk. Any and all responsibility
> > > for any loss, damage or destruction of data or any other property
> > > which may arise from relying on this email's technical content is
> explicitly disclaimed.
> > > The author will in no case be liable for any monetary damages
> > > arising
> > from
> > > such loss, damage or de

Index in Phoenix view on Hbase is not updated

2016-10-22 Thread Mich Talebzadeh
Hi,

I have a Hbase table that is populated by bulk load every 15 minutes via
org.apache.hadoop.hbase.mapreduce.ImportTsv. This works fine.

In Phoenix I created a view on this table

jdbc:phoenix:rhes564:2181> create index marketDataHbase_idx on
"marketDataHbase" ("price_info"."ticker", "price_info"."price",
"price_info"."timecreated");

This also does what it is supposed to do and shows the correct count.

I then created an index in Phoenix as below

create index index_dx1 on "marketDataHbase"
("price_info"."timecreated","price_info"."ticker", "price_info"."price");

that showed the records OK at that time. I verified this using explain


0: jdbc:phoenix:rhes564:2181> explain select count(1) from
"marketDataHbase";
+-+
|  PLAN   |
+-+
| CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER INDEX_DX1  |
| SERVER FILTER BY FIRST KEY ONLY |
| SERVER AGGREGATE INTO SINGLE ROW|
+-+

Now the issue is that the above does not show new data added to the Hbase
table since the index was built, unless I do the following:

0: jdbc:phoenix:rhes564:2181> alter index INDEX_DX1 on "marketDataHbase"
rebuild;


Which is not how an index should behave (a covered index should be
maintained automatically).
The simple question is how to overcome this problem.

As I understand it, the index in Phoenix is another table, independent of
the original Phoenix view, and presumably it is not updated because the
ImportTsv bulk load writes HFiles straight into Hbase and never goes through
the Phoenix write path that maintains the index?

Thanks


Re: ETL HBase HFile+HLog to ORC(or Parquet) file?

2016-10-21 Thread Mich Talebzadeh
Hi Demai,

As I understand it, you want to use Hbase as the real-time layer and a Hive
data warehouse as the batch layer for analytics.

In other words, ingest data in real time from the source into Hbase and push
that data into Hive on a recurring basis.

If you partition your target ORC table by DtStamp and INSERT/OVERWRITE into
this table using Spark as the execution engine for Hive (as opposed to
map-reduce), it should be pretty fast.

Hive is going to get an in-memory database in the next release or so, which
makes it a perfect choice.


HTH




Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 21 October 2016 at 22:28, Demai Ni <nid...@gmail.com> wrote:

> Mich,
>
> thanks for the detail instructions.
>
> While aware of the Hive method, I have a few questions/concerns:
> 1) the Hive method is a "INSERT FROM SELECT " ,which usually not perform as
> good as a bulk load though I am not familiar with the real implementation
> 2) I have another SQL-on-Hadoop engine working well with ORC file. So if
> possible, I'd like to avoid the system dependency on Hive(one fewer
> component to maintain).
> 3) HBase has well running back-end process for Replication(HBASE-1295) or
> Backup(HBASE-7912), so  wondering anything can be piggy-back on it to deal
> with day-to-day works
>
> The goal is to have HBase as a OLTP front(to receive data), and the ORC
> file(with a SQL engine) as the OLAP end for reporting/analytic. the ORC
> file will also serve as my backup in the case for DR.
>
> Demai
>
>
> On Fri, Oct 21, 2016 at 1:57 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> wrote:
>
> > Create an external table in Hive on Hbase atble. Pretty straight forward.
> >
> > hive>  create external table marketDataHbase (key STRING, ticker STRING,
> > timecreated STRING, price STRING)
> >
> > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'WITH
> > SERDEPROPERTIES ("hbase.columns.mapping" =
> > ":key,price_info:ticker,price_info:timecreated, price_info:price")
> >
> > TBLPROPERTIES ("hbase.table.name" = "marketDataHbase");
> >
> >
> >
> > then create a normal table in hive as ORC
> >
> >
> > CREATE TABLE IF NOT EXISTS marketData (
> >  KEY string
> >, TICKER string
> >, TIMECREATED string
> >, PRICE float
> > )
> > PARTITIONED BY (DateStamp  string)
> > STORED AS ORC
> > TBLPROPERTIES (
> > "orc.create.index"="true",
> > "orc.bloom.filter.columns"="KEY",
> > "orc.bloom.filter.fpp"="0.05",
> > "orc.compress"="SNAPPY",
> > "orc.stripe.size"="16777216",
> > "orc.row.index.stride"="1" )
> > ;
> > --show create table marketData;
> > --Populate target table
> > INSERT OVERWRITE TABLE marketData PARTITION (DateStamp = "${TODAY}")
> > SELECT
> >   KEY
> > , TICKER
> > , TIMECREATED
> > , PRICE
> > FROM MarketDataHbase
> >
> >
> > Run this job as a cron every often
> >
> >
> > HTH
> >
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn * https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> > OABUrV8Pw>*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> > On 21 October 2016 at 21:48, Demai Ni <nid...@gmail.com> wrote:
> >
> > > hi,
> > >
> > > I am wondering whether there are existing methods to ETL HBase data to
> > > ORC(or other open source columnar)

Re: Hbase fast access

2016-10-21 Thread Mich Talebzadeh
I was asked an interesting question.

Can one update data in Hbase? And my answer was that it is append only.

Can one update data in Hive? My answer was yes, if the table is created as
ORC and the table properties are set with "transactional"="true":


STORED AS ORC
TBLPROPERTIES ( "orc.compress"="SNAPPY",
"transactional"="true",
"orc.create.index"="true",
"orc.bloom.filter.columns"="object_id",
"orc.bloom.filter.fpp"="0.05",
"orc.stripe.size"="268435456",
"orc.row.index.stride"="1" )




Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 21 October 2016 at 22:01, Ted Yu <yuzhih...@gmail.com> wrote:

> It is true in the sense that hfile, once written (and closed), becomes
> immutable.
>
> Compaction would remove obsolete content and generate new hfiles.
>
> Cheers
>
> On Fri, Oct 21, 2016 at 1:59 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> wrote:
>
> > BTW. I always understood that Hbase is append only. is that generally
> true?
> >
> > thx
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn * https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> > OABUrV8Pw>*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> > On 21 October 2016 at 21:57, Mich Talebzadeh <mich.talebza...@gmail.com>
> > wrote:
> >
> > > agreed much like any rdbms
> > >
> > >
> > >
> > > Dr Mich Talebzadeh
> > >
> > >
> > >
> > > LinkedIn * https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > > <https://www.linkedin.com/profile/view?id=
> AAEWh2gBxianrbJd6zP6AcPCCd
> > OABUrV8Pw>*
> > >
> > >
> > >
> > > http://talebzadehmich.wordpress.com
> > >
> > >
> > > *Disclaimer:* Use it at your own risk. Any and all responsibility for
> any
> > > loss, damage or destruction of data or any other property which may
> arise
> > > from relying on this email's technical content is explicitly
> disclaimed.
> > > The author will in no case be liable for any monetary damages arising
> > from
> > > such loss, damage or destruction.
> > >
> > >
> > >
> > > On 21 October 2016 at 21:54, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > >> Well, updates (in memory) would ultimately be flushed to disk,
> resulting
> > >> in
> > >> new hfiles.
> > >>
> > >> On Fri, Oct 21, 2016 at 1:50 PM, Mich Talebzadeh <
> > >> mich.talebza...@gmail.com>
> > >> wrote:
> > >>
> > >> > thanks
> > >> >
> > >> > bq. all updates are done in memory o disk access
> > >> >
> > >> > I meant data updates are operated in memory, no disk access.
> > >> >
> > >> > in other much like rdbms read data into memory and update it there
> > >> > (assuming that data is not already in memory?)
> > >> >
> > >> > HTH
> > >> >
> > >> > Dr Mich Talebzadeh
> > >> >
> > >> >
> > >> >
> > >> > LinkedIn * https://www.linkedin.com/profile/view?id=
> > >> > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > >> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrb
> > >> Jd6zP6AcPCCd
> > >> >

Re: Hbase fast access

2016-10-21 Thread Mich Talebzadeh
BTW, I always understood that Hbase is append only. Is that generally true?

thx

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 21 October 2016 at 21:57, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> agreed much like any rdbms
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 21 October 2016 at 21:54, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Well, updates (in memory) would ultimately be flushed to disk, resulting
>> in
>> new hfiles.
>>
>> On Fri, Oct 21, 2016 at 1:50 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com>
>> wrote:
>>
>> > thanks
>> >
>> > bq. all updates are done in memory o disk access
>> >
>> > I meant data updates are operated in memory, no disk access.
>> >
>> > in other much like rdbms read data into memory and update it there
>> > (assuming that data is not already in memory?)
>> >
>> > HTH
>> >
>> > Dr Mich Talebzadeh
>> >
>> >
>> >
>> > LinkedIn * https://www.linkedin.com/profile/view?id=
>> > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrb
>> Jd6zP6AcPCCd
>> > OABUrV8Pw>*
>> >
>> >
>> >
>> > http://talebzadehmich.wordpress.com
>> >
>> >
>> > *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any
>> > loss, damage or destruction of data or any other property which may
>> arise
>> > from relying on this email's technical content is explicitly disclaimed.
>> > The author will in no case be liable for any monetary damages arising
>> from
>> > such loss, damage or destruction.
>> >
>> >
>> >
>> > On 21 October 2016 at 21:46, Ted Yu <yuzhih...@gmail.com> wrote:
>> >
>> > > bq. this search is carried out through map-reduce on region servers?
>> > >
>> > > No map-reduce. region server uses its own thread(s).
>> > >
>> > > bq. all updates are done in memory o disk access
>> > >
>> > > Can you clarify ? There seems to be some missing letters.
>> > >
>> > > On Fri, Oct 21, 2016 at 1:43 PM, Mich Talebzadeh <
>> > > mich.talebza...@gmail.com>
>> > > wrote:
>> > >
>> > > > thanks
>> > > >
>> > > > having read the docs it appears to me that the main reason of hbase
>> > being
>> > > > faster is:
>> > > >
>> > > >
>> > > >1. it behaves like an rdbms like oracle tetc. reads are looked
>> for
>> > in
>> > > >the buffer cache for consistent reads and if not found then store
>> > > files
>> > > > on
>> > > >disks are searched. Does this mean that this search is carried
>> out
>> > > > through
>> > > >map-reduce on region servers?
>> > > >2. when the data is written it is written to log file
>> sequentially
>> > > >first, then to in-memory store, sorted like b-tree of rdbms and
>> then
>> > > >flushed to disk. this is exactly what checkpoint in an rdbms does
>> > > >3. one can point out that hbase is faster because log structured
>> > merge
>> > > >tree (LSM-trees)  has less depth than a B-tree in rdbms.
>> > > >4. all update

Re: Hbase fast access

2016-10-21 Thread Mich Talebzadeh
Agreed, much like any rdbms.



Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 21 October 2016 at 21:54, Ted Yu <yuzhih...@gmail.com> wrote:

> Well, updates (in memory) would ultimately be flushed to disk, resulting in
> new hfiles.
>
> On Fri, Oct 21, 2016 at 1:50 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> wrote:
>
> > thanks
> >
> > bq. all updates are done in memory o disk access
> >
> > I meant data updates are operated in memory, no disk access.
> >
> > in other much like rdbms read data into memory and update it there
> > (assuming that data is not already in memory?)
> >
> > HTH
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn * https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> > OABUrV8Pw>*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> > On 21 October 2016 at 21:46, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > bq. this search is carried out through map-reduce on region servers?
> > >
> > > No map-reduce. region server uses its own thread(s).
> > >
> > > bq. all updates are done in memory o disk access
> > >
> > > Can you clarify ? There seems to be some missing letters.
> > >
> > > On Fri, Oct 21, 2016 at 1:43 PM, Mich Talebzadeh <
> > > mich.talebza...@gmail.com>
> > > wrote:
> > >
> > > > thanks
> > > >
> > > > having read the docs it appears to me that the main reason of hbase
> > being
> > > > faster is:
> > > >
> > > >
> > > >1. it behaves like an rdbms like oracle tetc. reads are looked for
> > in
> > > >the buffer cache for consistent reads and if not found then store
> > > files
> > > > on
> > > >disks are searched. Does this mean that this search is carried out
> > > > through
> > > >map-reduce on region servers?
> > > >2. when the data is written it is written to log file sequentially
> > > >first, then to in-memory store, sorted like b-tree of rdbms and
> then
> > > >flushed to disk. this is exactly what checkpoint in an rdbms does
> > > >3. one can point out that hbase is faster because log structured
> > merge
> > > >tree (LSM-trees)  has less depth than a B-tree in rdbms.
> > > >4. all updates are done in memory o disk access
> > > >5. in summary LSM-trees reduce disk access when data is read from
> > disk
> > > >because of reduced seek time again less depth to get data with
> > > LSM-tree
> > > >
> > > >
> > > > appreciate any comments
> > > >
> > > >
> > > > cheers
> > > >
> > > >
> > > > Dr Mich Talebzadeh
> > > >
> > > >
> > > >
> > > > LinkedIn * https://www.linkedin.com/profile/view?id=
> > > > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > > > <https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCd
> > > > OABUrV8Pw>*
> > > >
> > > >
> > > >
> > > > http://talebzadehmich.wordpress.com
> > > >
> > > >
> > > > *Disclaimer:* Use it at your own risk. Any and all responsibility for
> > any
> > > > loss, damage or destruction of data or any other property which may
> > arise
> > > > from relying on this email's technical content is

Re: ETL HBase HFile+HLog to ORC(or Parquet) file?

2016-10-21 Thread Mich Talebzadeh
Create an external table in Hive on the Hbase table. Pretty straightforward.

hive>  create external table marketDataHbase (key STRING, ticker STRING,
timecreated STRING, price STRING)

STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH
SERDEPROPERTIES ("hbase.columns.mapping" =
":key,price_info:ticker,price_info:timecreated, price_info:price")

TBLPROPERTIES ("hbase.table.name" = "marketDataHbase");



then create a normal table in hive as ORC


CREATE TABLE IF NOT EXISTS marketData (
 KEY string
   , TICKER string
   , TIMECREATED string
   , PRICE float
)
PARTITIONED BY (DateStamp  string)
STORED AS ORC
TBLPROPERTIES (
"orc.create.index"="true",
"orc.bloom.filter.columns"="KEY",
"orc.bloom.filter.fpp"="0.05",
"orc.compress"="SNAPPY",
"orc.stripe.size"="16777216",
"orc.row.index.stride"="1" )
;
--show create table marketData;
--Populate target table
INSERT OVERWRITE TABLE marketData PARTITION (DateStamp = "${TODAY}")
SELECT
  KEY
, TICKER
, TIMECREATED
, PRICE
FROM MarketDataHbase


Run this job as a cron job every so often.
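
If you also need to pick up just the delta since the last run (rather than
rebuilding the whole day's partition), one option is an incremental insert
keyed off timecreated. This is only a sketch: LAST_RUN is a variable you would
have to maintain yourself outside Hive, and it assumes timecreated is an
ISO-style timestamp string so that plain string comparison sorts correctly:

INSERT INTO TABLE marketData PARTITION (DateStamp = "${TODAY}")
SELECT
  KEY
, TICKER
, TIMECREATED
, PRICE
FROM MarketDataHbase
WHERE TIMECREATED > "${LAST_RUN}";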


HTH



Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 21 October 2016 at 21:48, Demai Ni <nid...@gmail.com> wrote:

> hi,
>
> I am wondering whether there are existing methods to ETL HBase data to
> ORC(or other open source columnar) file?
>
> I understand in Hive "insert into Hive_ORC_Table from SELET * from
> HIVE_HBase_Table", can probably get the job done. Is this the common way to
> do so? Performance is acceptable and able to handle the delta update in the
> case HBase table changed?
>
> I did a bit google, and find this
> https://community.hortonworks.com/questions/2632/loading-
> hbase-from-hive-orc-tables.html
>
> which is another way around.
>
> Will it perform better(comparing to above Hive stmt) if using either
> replication logic or snapshot backup to generate ORC file from hbase tables
> and with incremental update ability?
>
> I hope to has as fewer dependency as possible. in the Example of ORC, will
> only depend on Apache ORC's API, and not depend on Hive
>
> Demai
>


Re: Hbase fast access

2016-10-21 Thread Mich Talebzadeh
thanks

bq. all updates are done in memory o disk access

I meant that data updates are performed in memory, with no disk access.

in other words, much like an RDBMS: read the data into memory and update it
there (assuming the data is not already in memory?)

HTH

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 21 October 2016 at 21:46, Ted Yu <yuzhih...@gmail.com> wrote:

> bq. this search is carried out through map-reduce on region servers?
>
> No map-reduce. region server uses its own thread(s).
>
> bq. all updates are done in memory o disk access
>
> Can you clarify ? There seems to be some missing letters.
>
> On Fri, Oct 21, 2016 at 1:43 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> wrote:
>
> > thanks
> >
> > having read the docs it appears to me that the main reason of hbase being
> > faster is:
> >
> >
> >1. it behaves like an rdbms like oracle tetc. reads are looked for in
> >the buffer cache for consistent reads and if not found then store
> files
> > on
> >disks are searched. Does this mean that this search is carried out
> > through
> >map-reduce on region servers?
> >2. when the data is written it is written to log file sequentially
> >first, then to in-memory store, sorted like b-tree of rdbms and then
> >flushed to disk. this is exactly what checkpoint in an rdbms does
> >3. one can point out that hbase is faster because log structured merge
> >tree (LSM-trees)  has less depth than a B-tree in rdbms.
> >4. all updates are done in memory o disk access
> >5. in summary LSM-trees reduce disk access when data is read from disk
> >because of reduced seek time again less depth to get data with
> LSM-tree
> >
> >
> > appreciate any comments
> >
> >
> > cheers
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn * https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> > OABUrV8Pw>*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> > On 21 October 2016 at 17:51, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > See some prior blog:
> > >
> > > http://www.cyanny.com/2014/03/13/hbase-architecture-
> > > analysis-part1-logical-architecture/
> > >
> > > w.r.t. compaction in Hive, it is used to compact deltas into a base
> file
> > > (in the context of transactions).  Likely they're different.
> > >
> > > Cheers
> > >
> > > On Fri, Oct 21, 2016 at 9:08 AM, Mich Talebzadeh <
> > > mich.talebza...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Can someone in a nutshell explain *the *Hbase use of log-structured
> > > > merge-tree (LSM-tree) as data storage architecture
> > > >
> > > > The idea of merging smaller files to larger files periodically to
> > reduce
> > > > disk seeks,  is this similar concept to compaction in HDFS or Hive?
> > > >
> > > > Thanks
> > > >
> > > >
> > > > Dr Mich Talebzadeh
> > > >
> > > >
> > > >
> > > > LinkedIn * https://www.linkedin.com/profile/view?id=
> > > > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > > > <https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCd
> > > > OABUrV8Pw>*
> > > >
> > > >
> > > >
> > > > http://talebzadehmich.wordpress.com
> > > >
> > > >
> > > >

Re: Hbase fast access

2016-10-21 Thread Mich Talebzadeh
thanks

having read the docs, it appears to me that the main reasons HBase is
faster are:


   1. it behaves like an rdbms such as Oracle etc.: reads are looked for in
   the buffer cache first and, if not found, the store files on disk are
   searched. Does this mean that this search is carried out through
   map-reduce on region servers?
   2. when data is written, it is written to the log file sequentially
   first, then to the in-memory store, sorted like the b-tree of an rdbms,
   and then flushed to disk. This is exactly what a checkpoint in an rdbms does
   3. one can point out that hbase is faster because log structured merge
   tree (LSM-trees)  has less depth than a B-tree in rdbms.
   4. all updates are done in memory o disk access
   5. in summary LSM-trees reduce disk access when data is read from disk
   because of reduced seek time again less depth to get data with LSM-tree


appreciate any comments


cheers


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 21 October 2016 at 17:51, Ted Yu <yuzhih...@gmail.com> wrote:

> See some prior blog:
>
> http://www.cyanny.com/2014/03/13/hbase-architecture-
> analysis-part1-logical-architecture/
>
> w.r.t. compaction in Hive, it is used to compact deltas into a base file
> (in the context of transactions).  Likely they're different.
>
> Cheers
>
> On Fri, Oct 21, 2016 at 9:08 AM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> wrote:
>
> > Hi,
> >
> > Can someone in a nutshell explain *the *Hbase use of log-structured
> > merge-tree (LSM-tree) as data storage architecture
> >
> > The idea of merging smaller files to larger files periodically to reduce
> > disk seeks,  is this similar concept to compaction in HDFS or Hive?
> >
> > Thanks
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn * https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> > OABUrV8Pw>*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> > On 21 October 2016 at 15:27, Mich Talebzadeh <mich.talebza...@gmail.com>
> > wrote:
> >
> > > Sorry that should read Hive not Spark here
> > >
> > > Say compared to Spark that is basically a SQL layer relying on
> different
> > > engines (mr, Tez, Spark) to execute the code
> > >
> > > Dr Mich Talebzadeh
> > >
> > >
> > >
> > > LinkedIn * https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > > <https://www.linkedin.com/profile/view?id=
> AAEWh2gBxianrbJd6zP6AcPCCd
> > OABUrV8Pw>*
> > >
> > >
> > >
> > > http://talebzadehmich.wordpress.com
> > >
> > >
> > > *Disclaimer:* Use it at your own risk. Any and all responsibility for
> any
> > > loss, damage or destruction of data or any other property which may
> arise
> > > from relying on this email's technical content is explicitly
> disclaimed.
> > > The author will in no case be liable for any monetary damages arising
> > from
> > > such loss, damage or destruction.
> > >
> > >
> > >
> > > On 21 October 2016 at 13:17, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > >> Mich:
> > >> Here is brief description of hbase architecture:
> > >> https://hbase.apache.org/book.html#arch.overview
> > >>
> > >> You can also get more details from Lars George's or Nick Dimiduk's
> > books.
> > >>
> > >> HBase doesn't support SQL directly. There is no cost based
> optimization.
> > >

Re: Hbase fast access

2016-10-21 Thread Mich Talebzadeh
Hi,

Can someone explain in a nutshell the Hbase use of a log-structured
merge-tree (LSM-tree) as its data storage architecture?

The idea of merging smaller files into larger files periodically to reduce
disk seeks: is this a similar concept to compaction in HDFS or Hive?

Thanks


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 21 October 2016 at 15:27, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Sorry that should read Hive not Spark here
>
> Say compared to Spark that is basically a SQL layer relying on different
> engines (mr, Tez, Spark) to execute the code
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 21 October 2016 at 13:17, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Mich:
>> Here is brief description of hbase architecture:
>> https://hbase.apache.org/book.html#arch.overview
>>
>> You can also get more details from Lars George's or Nick Dimiduk's books.
>>
>> HBase doesn't support SQL directly. There is no cost based optimization.
>>
>> Cheers
>>
>> > On Oct 21, 2016, at 1:43 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>> >
>> > Hi,
>> >
>> > This is a general question.
>> >
>> > Is Hbase fast because Hbase uses Hash tables and provides random access,
>> > and it stores the data in indexed HDFS files for faster lookups.
>> >
>> > Say compared to Spark that is basically a SQL layer relying on different
>> > engines (mr, Tez, Spark) to execute the code (although it has Cost Base
>> > Optimizer), how Hbase fares, beyond relying on these engines
>> >
>> > Thanks
>> >
>> >
>> > Dr Mich Talebzadeh
>> >
>> >
>> >
>> > LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJ
>> d6zP6AcPCCdOABUrV8Pw
>> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrb
>> Jd6zP6AcPCCdOABUrV8Pw>*
>> >
>> >
>> >
>> > http://talebzadehmich.wordpress.com
>> >
>> >
>> > *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any
>> > loss, damage or destruction of data or any other property which may
>> arise
>> > from relying on this email's technical content is explicitly disclaimed.
>> > The author will in no case be liable for any monetary damages arising
>> from
>> > such loss, damage or destruction.
>>
>
>


Re: Hbase fast access

2016-10-21 Thread Mich Talebzadeh
Sorry that should read Hive not Spark here

Say compared to Spark that is basically a SQL layer relying on different
engines (mr, Tez, Spark) to execute the code

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 21 October 2016 at 13:17, Ted Yu <yuzhih...@gmail.com> wrote:

> Mich:
> Here is brief description of hbase architecture:
> https://hbase.apache.org/book.html#arch.overview
>
> You can also get more details from Lars George's or Nick Dimiduk's books.
>
> HBase doesn't support SQL directly. There is no cost based optimization.
>
> Cheers
>
> > On Oct 21, 2016, at 1:43 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
> >
> > Hi,
> >
> > This is a general question.
> >
> > Is Hbase fast because Hbase uses Hash tables and provides random access,
> > and it stores the data in indexed HDFS files for faster lookups.
> >
> > Say compared to Spark that is basically a SQL layer relying on different
> > engines (mr, Tez, Spark) to execute the code (although it has Cost Base
> > Optimizer), how Hbase fares, beyond relying on these engines
> >
> > Thanks
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn * https://www.linkedin.com/profile/view?id=
> AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> OABUrV8Pw>*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
>


Hbase fast access

2016-10-21 Thread Mich Talebzadeh
Hi,

This is a general question.

Is Hbase fast because Hbase uses hash tables and provides random access,
and it stores the data in indexed HDFS files for faster lookups?

Say compared to Spark, which is basically a SQL layer relying on different
engines (mr, Tez, Spark) to execute the code (although it has a Cost Based
Optimizer), how does Hbase fare, beyond relying on these engines?

Thanks


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Re: Doing map-reduce with Hive external table on Hbase throws error

2016-10-20 Thread Mich Talebzadeh
Thanks Ted

hbase-1.2.3 worked!



Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 20 October 2016 at 17:09, Ted Yu <yuzhih...@gmail.com> wrote:

> I downloaded hive 2.0.1 source tar ball.
>
> In their pom.xml :
>
> 1.1.1
>
> Can you run against 1.1.1 or newer hbase release ?
>
> On Thu, Oct 20, 2016 at 8:58 AM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> wrote:
>
> > Hive 2.0.1
> > Hbase 0.98
> >
> > hive> select max(price) from test.marketdatahbase;
> >
> > Throws:
> >
> > Caused by: java.lang.NoSuchMethodError:
> > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$
> >
> >
> > I have both hbase-protocol-0.98.21-hadoop2.jar and
> protobuf-java-2.5.0.jar
> > in $HBASE_HOME/lib ditectory
> >
> > Full error as below
> >
> > Query ID = hduser_20161020164447_d283db5c-056d-4d40-8998-d2cca1e63f12
> > Total jobs = 1
> > Launching Job 1 out of 1
> > Number of reduce tasks determined at compile time: 1
> > In order to change the average load for a reducer (in bytes):
> >   set hive.exec.reducers.bytes.per.reducer=
> > In order to limit the maximum number of reducers:
> >   set hive.exec.reducers.max=
> > In order to set a constant number of reducers:
> >   set mapreduce.job.reduces=
> > Starting Job = job_1476869096162_0503, Tracking URL =
> > http://rhes564:8088/proxy/application_1476869096162_0503/
> > Kill Command = /home/hduser/hadoop-2.7.3/bin/hadoop job  -kill
> > job_1476869096162_0503
> > Hadoop job information for Stage-1: number of mappers: 2; number of
> > reducers: 1
> > 2016-10-20 16:45:01,146 Stage-1 map = 0%,  reduce = 0%
> > 2016-10-20 16:45:39,143 Stage-1 map = 100%,  reduce = 100%
> > Ended Job = job_1476869096162_0503 with errors
> > Error during job, obtaining debugging information...
> > Examining task ID: task_1476869096162_0503_m_00 (and more) from job
> > job_1476869096162_0503
> > Task with the most failures(4):
> > -
> > Task ID:
> >   task_1476869096162_0503_m_00
> > URL:
> >
> > http://rhes564:8088/taskdetails.jsp?jobid=job_
> > 1476869096162_0503=task_1476869096162_0503_m_00
> > -
> > Diagnostic Messages for this Task:
> > Error: java.io.IOException: java.io.IOException:
> > java.lang.reflect.InvocationTargetException
> > at
> > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.
> > handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
> > at
> > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.
> > handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
> > at
> > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(
> > HiveInputFormat.java:303)
> > at
> > org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(
> > CombineHiveInputFormat.java:662)
> > at
> > org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<
> > init>(MapTask.java:169)
> > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.
> java:432)
> > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:422)
> > at
> > org.apache.hadoop.security.UserGroupInformation.doAs(
> > UserGroupInformation.java:1698)
> > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> > Caused by: java.io.IOException: java.lang.reflect.
> > InvocationTargetException
> > at
> > org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(
> > ConnectionFactory.java:240)
> > at
> > org.apache.hadoop.hbase.client.ConnectionManager.createConnection(
> > ConnectionManager.java:420)
> > at
> > org.apache.hadoop.hbase.client.ConnectionManager.createConnection(
> > ConnectionManager.java:413)
> > at
> > org.apache.hado

Doing map-reduce with Hive external table on Hbase throws error

2016-10-20 Thread Mich Talebzadeh
ger$HConnectionImplementation.(ConnectionManager.java:635)
... 22 more










Thanks


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Re: Creating an external hive table on Hbase throws error.

2016-10-19 Thread Mich Talebzadeh
Sorted this one out

Need to put phoenix-4.8.0-HBase-0.98-client.jar in the $HBASE_HOME/lib directory,
even though the error does not say anything about Phoenix.

HTH

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 19 October 2016 at 15:55, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> This used to work before. I think it started playing up when I started
> region server all on the same host. In short I could create the table.
>
> Now I am getting
>
> hive> create external table marketDataHbase (key STRING, ticker STRING,
> timecreated STRING, price STRING)
> > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'WITH
> SERDEPROPERTIES ("hbase.columns.mapping" = 
> ":key,price_info:ticker,price_info:timecreated,
> price_info:price")
> > TBLPROPERTIES ("hbase.table.name" = "marketDataHbase");
>
>
>
>
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask.
> MetaException(message:MetaException(message:java.io.IOException:
> java.lang.reflect.InvocationTargetException
> at org.apache.hadoop.hbase.client.ConnectionFactory.
> createConnection(ConnectionFactory.java:240)
> at org.apache.hadoop.hbase.client.ConnectionManager.
> createConnection(ConnectionManager.java:420)
> at org.apache.hadoop.hbase.client.ConnectionManager.
> createConnection(ConnectionManager.java:413)
> at org.apache.hadoop.hbase.client.ConnectionManager.
> getConnectionInternal(ConnectionManager.java:291)
>
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(
> NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
> DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.hadoop.hbase.client.ConnectionFactory.
> createConnection(ConnectionFactory.java:238)
> ... 42 more
>
> Any ideas what can cause this?
>
> Thanks
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>


Creating an external hive table on Hbase throws error.

2016-10-19 Thread Mich Talebzadeh
This used to work before. I think it started playing up when I started running
the region servers all on the same host. In short, I used to be able to create the table.

Now I am getting

hive> create external table marketDataHbase (key STRING, ticker STRING,
timecreated STRING, price STRING)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'WITH
SERDEPROPERTIES ("hbase.columns.mapping" =
":key,price_info:ticker,price_info:timecreated, price_info:price")
> TBLPROPERTIES ("hbase.table.name" = "marketDataHbase");




FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask.
MetaException(message:MetaException(message:java.io.IOException:
java.lang.reflect.InvocationTargetException
at
org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
at
org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:420)
at
org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:413)
at
org.apache.hadoop.hbase.client.ConnectionManager.getConnectionInternal(ConnectionManager.java:291)

Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at
org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
... 42 more

Any ideas what can cause this?

Thanks

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Re: Accessing Hbase tables through Spark, this seems to work

2016-10-18 Thread Mich Talebzadeh
The design really needs to look at the rest of the stack as well.

If the visualisation layer is going to use Tableau then you cannot use
Spark functional programming. Only Spark SQL or anything that works with
SQL like Hive or Phoenix.

Tableau is not a real-time dashboard, so for analytics it maps tables in the
database as it sees them. It has an ODBC/JDBC connection to Hive (I don't know
about Phoenix).

So that is the advantage of Hive. As for caching, yes, you can cache some data
in the Tableau Server cache, but we all agree that it is finite. The same is
true for anything that relies on memory, whether Hive + LLAP or any in-memory
database (I tried Tableau on Oracle TimesTen): you can only cache a certain
amount of data, and no one is going to splash out on large memory for analytics.

Bear in mind that performance is a deployment issue and you are unlikely to
be able to create the same conditions as PROD in a test environment.



Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 18 October 2016 at 08:18, Jörn Franke <jornfra...@gmail.com> wrote:

> Careful Hbase with Phoenix is only in certain scenarios faster. When it is
> about processing small amounts out of a bigger amount of data (depends on
> node memory, the operation etc).  Hive+tez+orc can  be rather competitive,
> llap makes sense for interactive ad-hoc queries that are rather similar.
> Both Phoenix and hive follow different purposes with a different
> architecture and underlying data structure.
>
> On 18 Oct 2016, at 07:44, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> yes Hive external table is partitioned on a daily basis (datestamp below)
>
> CREATE EXTERNAL TABLE IF NOT EXISTS ${DATABASE}.externalMarketData (
>  KEY string
>, SECURITY string
>, TIMECREATED string
>, PRICE float
> )
> COMMENT 'From prices Kakfa delivered by Flume location by day'
> ROW FORMAT serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
> STORED AS TEXTFILE
> LOCATION 'hdfs://rhes564:9000/data/prices/'
> --TBLPROPERTIES ("skip.header.line.count"="1")
> ;
> ALTER TABLE ${DATABASE}.externalMarketData set location
> 'hdfs://rhes564:9000/data/prices/${TODAY}';
>
> and there is insert/overwrite into managed table every 15 minutes.
>
> INSERT OVERWRITE TABLE ${DATABASE}.marketData PARTITION (DateStamp =
> "${TODAY}")
> SELECT
>   KEY
> , SECURITY
> , TIMECREATED
> , PRICE
> , 1
> , CAST(from_unixtime(unix_timestamp()) AS timestamp)
> FROM ${DATABASE}.externalMarketData
>
> That works fine. However, Hbase is much faster for data retrieval with
> phoenix
>
> When we get Hive with LLAP, I gather Hive will replace Hbase.
>
> So in summary we have
>
>
>1. raw data delivered to HDFS
>2. data ingested into Hbase via cron
>3. HDFS directory is mapped to Hive external table
>4. There is Hive managed table with added optimisation/indexing (ORC)
>
>
> There are a number of ways of doing it as usual.
>
> Thanks
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 18 October 2016 at 00:48, ayan guha <guha.a...@gmail.com> wrote:
>
>> I do not see a rationale to have hbase in this scheme of thingsmay be
>> I am missing something?
>>
>> If data is delivered in HDFS, why not just add partition to an existing
>> Hive table?
>>
>> On Tue, Oct 18, 2016 at 8:23 AM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Thanks Mike,
>>>
>>> My test csv data comes as
>>>
>>> UUID, ticker,  timecreated,
>>> price
>>> a2c844ed-137f

Re: Accessing Hbase tables through Spark, this seems to work

2016-10-17 Thread Mich Talebzadeh
Yes, the Hive external table is partitioned on a daily basis (DateStamp below)

CREATE EXTERNAL TABLE IF NOT EXISTS ${DATABASE}.externalMarketData (
 KEY string
   , SECURITY string
   , TIMECREATED string
   , PRICE float
)
COMMENT 'From prices Kakfa delivered by Flume location by day'
ROW FORMAT serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION 'hdfs://rhes564:9000/data/prices/'
--TBLPROPERTIES ("skip.header.line.count"="1")
;
ALTER TABLE ${DATABASE}.externalMarketData set location
'hdfs://rhes564:9000/data/prices/${TODAY}';

and there is insert/overwrite into managed table every 15 minutes.

INSERT OVERWRITE TABLE ${DATABASE}.marketData PARTITION (DateStamp =
"${TODAY}")
SELECT
  KEY
, SECURITY
, TIMECREATED
, PRICE
, 1
, CAST(from_unixtime(unix_timestamp()) AS timestamp)
FROM ${DATABASE}.externalMarketData

That works fine. However, Hbase is much faster for data retrieval with
Phoenix.
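
For reference, a Phoenix view over the same HBase table would look roughly as
below. This is only a sketch: the column family and qualifiers follow the
marketDataHbase layout used earlier in this thread, everything is mapped as
VARCHAR because ImportTsv writes plain strings, and a Phoenix view created
over an existing HBase table is read-only:

CREATE VIEW "marketDataHbase" (
  pk VARCHAR PRIMARY KEY,
  "price_info"."ticker"      VARCHAR,
  "price_info"."timecreated" VARCHAR,
  "price_info"."price"       VARCHAR
);

-- highest price per ticker, casting the string values with TO_NUMBER
SELECT "ticker", MAX(TO_NUMBER("price")) AS max_price
FROM "marketDataHbase"
GROUP BY "ticker";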

When we get Hive with LLAP, I gather Hive will replace Hbase.

So in summary we have


   1. raw data delivered to HDFS
   2. data ingested into Hbase via cron
   3. HDFS directory is mapped to Hive external table
   4. There is Hive managed table with added optimisation/indexing (ORC)


There are a number of ways of doing it as usual.

Thanks



Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 18 October 2016 at 00:48, ayan guha <guha.a...@gmail.com> wrote:

> I do not see a rationale to have hbase in this scheme of thingsmay be
> I am missing something?
>
> If data is delivered in HDFS, why not just add partition to an existing
> Hive table?
>
> On Tue, Oct 18, 2016 at 8:23 AM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Thanks Mike,
>>
>> My test csv data comes as
>>
>> UUID, ticker,  timecreated,
>> price
>> a2c844ed-137f-4820-aa6e-c49739e46fa6, S01, 2016-10-17T22:02:09,
>> 53.36665625650533484995
>> a912b65e-b6bc-41d4-9e10-d6a44ea1a2b0, S02, 2016-10-17T22:02:09,
>> 86.31917515824627016510
>> 5f4e3a9d-05cc-41a2-98b3-40810685641e, S03, 2016-10-17T22:02:09,
>> 95.48298277703729129559
>>
>>
>> And this is my Hbase table with one column family
>>
>> create 'marketDataHbase', 'price_info'
>>
>> It is populated every 15 minutes from test.csv files delivered via Kafka
>> and Flume to HDFS
>>
>>
>>1. Create a fat csv file based on all small csv files for today -->
>>prices/2016-10-17
>>2. Populate data into Hbase table using 
>> org.apache.hadoop.hbase.mapreduce.ImportTsv
>>
>>3. This is pretty quick using MapReduce
>>
>>
>> That importTsv only appends new rows to Hbase table as the choice of UUID
>> as rowKey avoids any issues.
>>
>> So I only have 15 minutes lag in my batch Hbase table.
>>
>> I have both Hive ORC tables and Phoenix views on top of this Hbase
>> tables.
>>
>>
>>1. Phoenix offers the fastest response if used on top of Hbase.
>>unfortunately, Spark 2 with Phoenix is broken
>>2. Spark on Hive on Hbase looks OK. This works fine with Spark 2
>>3. Spark on Hbase tables directly using key, value DFs for each
>>column. Not as fast as 2 but works. I don't think a DF is a good choice 
>> for
>>a key, value pair?
>>
>> Now if I use Zeppelin to read from Hbase
>>
>>
>>1. I can use Phoenix JDBC. That looks very fast
>>2. I can use Spark csv directly on HDFS csv files.
>>3. I can use Spark on Hive tables
>>
>>
>> If I use Tableau on Hbase data then, only sql like code is useful.
>> Phoenix or Hive
>>
>> I don't want to change the design now. But admittedly Hive is the best
>> SQL on top of Hbase. Next release of Hive is going to have in-memory
>> database (LLAP) so we can cache Hive tables in memory. That will be faster.
>> Many people underestimate Hive but I still believe it has a lot to offer
>> besides serious ANSI compliant SQL.
>>
>> Regards
>>
>>  Mich
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>&

Re: Accessing Hbase tables through Spark, this seems to work

2016-10-17 Thread Mich Talebzadeh
Thanks Mike,

My test csv data comes as

UUID, ticker,  timecreated,  price
a2c844ed-137f-4820-aa6e-c49739e46fa6, S01, 2016-10-17T22:02:09,
53.36665625650533484995
a912b65e-b6bc-41d4-9e10-d6a44ea1a2b0, S02, 2016-10-17T22:02:09,
86.31917515824627016510
5f4e3a9d-05cc-41a2-98b3-40810685641e, S03, 2016-10-17T22:02:09,
95.48298277703729129559


And this is my Hbase table with one column family

create 'marketDataHbase', 'price_info'

It is populated every 15 minutes from test.csv files delivered via Kafka
and Flume to HDFS


   1. Create a fat csv file based on all small csv files for today -->
   prices/2016-10-17
   2. Populate data into Hbase table using
   org.apache.hadoop.hbase.mapreduce.ImportTsv
   3. This is pretty quick using MapReduce


That ImportTsv run only appends new rows to the Hbase table, as the choice of
UUID as the rowKey avoids any key collisions.

So I only have a 15-minute lag in my batch Hbase table.

I have both Hive ORC tables and Phoenix views on top of this Hbase table.


   1. Phoenix offers the fastest response if used on top of Hbase.
   unfortunately, Spark 2 with Phoenix is broken
   2. Spark on Hive on Hbase looks OK. This works fine with Spark 2
   3. Spark on Hbase tables directly using key, value DFs for each column.
   Not as fast as 2 but works. I don't think a DF is a good choice for a key,
   value pair?

Now if I use Zeppelin to read from Hbase


   1. I can use Phoenix JDBC. That looks very fast
   2. I can use Spark csv directly on HDFS csv files.
   3. I can use Spark on Hive tables


If I use Tableau on the Hbase data, then only SQL-like access is useful:
Phoenix or Hive.

I don't want to change the design now. But admittedly Hive is the best SQL
on top of Hbase. Next release of Hive is going to have in-memory database
(LLAP) so we can cache Hive tables in memory. That will be faster. Many
people underestimate Hive but I still believe it has a lot to offer besides
serious ANSI compliant SQL.

Regards

 Mich
















Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 17 October 2016 at 21:54, Michael Segel <msegel_had...@hotmail.com>
wrote:

> Mitch,
>
> Short answer… no, it doesn’t scale.
>
> Longer answer…
>
> You are using an UUID as the row key?  Why?  (My guess is that you want to
> avoid hot spotting)
>
> So you’re going to have to pull in all of the data… meaning a full table
> scan… and then perform a sort order transformation, dropping the UUID in
> the process.
>
> You would be better off not using HBase and storing the data in Parquet
> files in a directory partitioned on date.  Or rather the rowkey would be
> the max_ts - TS so that your data is in LIFO.
> Note: I’ve used the term epoch to describe the max value of a long (8
> bytes of ‘FF’ ) for the max_ts. This isn’t a good use of the term epoch,
> but if anyone has a better term, please let me know.
>
>
>
> Having said that… if you want to use HBase, you could do the same thing.
> If you want to avoid hot spotting, you could load the day’s transactions
> using a bulk loader so that you don’t have to worry about splits.
>
> But that’s just my $0.02 cents worth.
>
> HTH
>
> -Mike
>
> PS. If you wanted to capture the transactions… you could do the following
> schemea:
>
> 1) Rowkey = max_ts - TS
> 2) Rows contain the following:
> CUSIP (Transaction ID)
> Party 1 (Seller)
> Party 2 (Buyer)
> Symbol
> Qty
> Price
>
> This is a trade ticket.
>
>
>
> On Oct 16, 2016, at 1:37 PM, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> Hi,
>
> I have trade data stored in Hbase table. Data arrives in csv format to
> HDFS and then loaded into Hbase via periodic load with
> org.apache.hadoop.hbase.mapreduce.ImportTsv.
>
> The Hbase table has one Column family "trade_info" and three columns:
> ticker, timecreated, price.
>
> The RowKey is UUID. So each row has UUID, ticker, timecreated and price in
> the csv file
>
> Each row in Hbase is a key, value map. In my case, I have one Column
> Family and three columns. Without going into semantics I see Hbase as a
> column oriented database where column data stay together.
>
> So I thought of this way of accessing the data.
>
> I define an RDD for each column in the column family as b

Re: Accessing Hbase tables through Spark, this seems to work

2016-10-16 Thread Mich Talebzadeh
Thanks Ted.

I have seen that before, but it sounds like breaking a nut with a sledgehammer.

It should be simpler than that.

Regards


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 16 October 2016 at 19:45, Ted Yu <yuzhih...@gmail.com> wrote:

> Please take a look at
> http://hbase.apache.org/book.html#_language_integrated_query
>
> The above is based on hbase-spark module which is part of the upcoming
> hbase 2.0 release.
>
> Cheers
>
> On Sun, Oct 16, 2016 at 11:37 AM, Mich Talebzadeh <
> mich.talebza...@gmail.com
> > wrote:
>
> > Hi,
> >
> > I have trade data stored in Hbase table. Data arrives in csv format to
> HDFS
> > and then loaded into Hbase via periodic load with
> > org.apache.hadoop.hbase.mapreduce.ImportTsv.
> >
> > The Hbase table has one Column family "trade_info" and three columns:
> > ticker, timecreated, price.
> >
> > The RowKey is UUID. So each row has UUID, ticker, timecreated and price
> in
> > the csv file
> >
> > Each row in Hbase is a key, value map. In my case, I have one Column
> Family
> > and three columns. Without going into semantics I see Hbase as a column
> > oriented database where column data stay together.
> >
> > So I thought of this way of accessing the data.
> >
> > I define an RDD for each column in the column family as below. In this
> case
> > column trade_info:ticker
> >
> > //create rdd
> > val hBaseRDD = sc.newAPIHadoopRDD(conf,
> > classOf[TableInputFormat],classOf[org.apache.hadoop.hbase.io.
> > ImmutableBytesWritable],classOf[org.apache.hadoop.hbase.client.Result])
> > val rdd1 = hBaseRDD.map(tuple => tuple._2).map(result => (result.getRow,
> > result.getColumn("price_info".getBytes(), "ticker".getBytes()))).map(row =>
> > {
> > (
> >   row._1.map(_.toChar).mkString,
> >   row._2.asScala.reduceLeft {
> > (a, b) => if (a.getTimestamp > b.getTimestamp) a else b
> >   }.getValue.map(_.toChar).mkString
> > )
> > })
> > case class columns (key: String, ticker: String)
> > val dfticker = rdd1.toDF.map(p => columns(p(0).toString,p(1).toString))
> >
> > Note that the end result is a DataFrame with the RowKey -> key and column
> > -> ticker
> >
> > I use the same approach to create two other DataFrames, namely
> > dftimecreated
> > and dfprice for the two other columns.
> >
> > Note that if I don't need a column, then I do not create a DF for it. So
> a
> > DF with each column I use. I am not sure how this compares if I read the
> > full row through other methods if any.
> >
> > Anyway all I need to do after creating a DataFrame for each column is to
> > join them through the RowKey to slice and dice the data. Like below.
> >
> > Get me the latest prices ordered by timecreated and ticker (ticker is
> > stock)
> >
> > val rs =
> > dfticker.join(dftimecreated,"key").join(dfprice,"key").
> > orderBy('timecreated
> > desc, 'price desc).select('timecreated, 'ticker,
> > 'price.cast("Float").as("Latest price"))
> > rs.show(10)
> >
> > +-------------------+------+------------+
> > |        timecreated|ticker|Latest price|
> > +-------------------+------+------------+
> > |2016-10-16T18:44:57|   S16|   97.631966|
> > |2016-10-16T18:44:57|   S13|    92.11406|
> > |2016-10-16T18:44:57|   S19|    85.93021|
> > |2016-10-16T18:44:57|   S09|   85.714645|
> > |2016-10-16T18:44:57|   S15|    82.38932|
> > |2016-10-16T18:44:57|   S17|    80.77747|
> > |2016-10-16T18:44:57|   S06|    79.81854|
> > |2016-10-16T18:44:57|   S18|    74.10128|
> > |2016-10-16T18:44:57|   S07|    66.13622|
> > |2016-10-16T18:44:57|   S20|    60.35727|
> > +-------------------+------+------------+
> > only showing top 10 rows
> >
> > Is this a workable solution?
> >
> > Thanks
> >
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn * https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> > OABUrV8Pw>*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
>


Accessing Hbase tables through Spark, this seems to work

2016-10-16 Thread Mich Talebzadeh
Hi,

I have trade data stored in Hbase table. Data arrives in csv format to HDFS
and then loaded into Hbase via periodic load with
org.apache.hadoop.hbase.mapreduce.ImportTsv.

The Hbase table has one Column family "trade_info" and three columns:
ticker, timecreated, price.

The RowKey is UUID. So each row has UUID, ticker, timecreated and price in
the csv file

Each row in Hbase is a key, value map. In my case, I have one Column Family
and three columns. Without going into semantics I see Hbase as a column
oriented database where column data stay together.

So I thought of this way of accessing the data.

I define an RDD for each column in the column family as below. In this case
column trade_info:ticker

//create rdd
val hBaseRDD = sc.newAPIHadoopRDD(conf,
classOf[TableInputFormat],classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],classOf[org.apache.hadoop.hbase.client.Result])
val rdd1 = hBaseRDD.map(tuple => tuple._2).map(result => (result.getRow,
result.getColumn("price_info".getBytes(), "ticker".getBytes()))).map(row =>
{
(
  row._1.map(_.toChar).mkString,
  row._2.asScala.reduceLeft {
(a, b) => if (a.getTimestamp > b.getTimestamp) a else b
  }.getValue.map(_.toChar).mkString
)
})
case class columns (key: String, ticker: String)
val dfticker = rdd1.toDF.map(p => columns(p(0).toString,p(1).toString))

Note that the end result is a DataFrame with the RowKey -> key and column
-> ticker

I use the same approach to create two other DataFrames, namely dftimecreated
and dfprice for the two other columns.

Note that if I don't need a column, then I do not create a DF for it. So I
have a DF for each column I use. I am not sure how this compares with reading
the full row through other methods, if any.

Anyway, all I need to do after creating a DataFrame for each column is to
join them through the RowKey to slice and dice data, like below.

Get me the latest prices ordered by timecreated and ticker (ticker is stock)

val rs =
dfticker.join(dftimecreated,"key").join(dfprice,"key").orderBy('timecreated
desc, 'price desc).select('timecreated, 'ticker,
'price.cast("Float").as("Latest price"))
rs.show(10)

+-------------------+------+------------+
|        timecreated|ticker|Latest price|
+-------------------+------+------------+
|2016-10-16T18:44:57|   S16|   97.631966|
|2016-10-16T18:44:57|   S13|    92.11406|
|2016-10-16T18:44:57|   S19|    85.93021|
|2016-10-16T18:44:57|   S09|   85.714645|
|2016-10-16T18:44:57|   S15|    82.38932|
|2016-10-16T18:44:57|   S17|    80.77747|
|2016-10-16T18:44:57|   S06|    79.81854|
|2016-10-16T18:44:57|   S18|    74.10128|
|2016-10-16T18:44:57|   S07|    66.13622|
|2016-10-16T18:44:57|   S20|    60.35727|
+-------------------+------+------------+
only showing top 10 rows

Is this a workable solution?
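
For comparison, here is a minimal sketch of doing the same thing in a single
pass, pulling each qualifier with Result.getValue and building one DataFrame so
that no joins are needed (assuming, as in the code above, that the family is
"price_info"; the case class name TradeRow is purely illustrative):

import org.apache.hadoop.hbase.util.Bytes

case class TradeRow(key: String, ticker: String, timecreated: String, price: String)

val cf = Bytes.toBytes("price_info")

// Take the Result out of each (key, Result) tuple and read the three columns
// directly; getValue returns the newest cell for that family/qualifier.
val tradeDF = hBaseRDD.map(_._2).map { result =>
  TradeRow(
    Bytes.toString(result.getRow),
    Bytes.toString(result.getValue(cf, Bytes.toBytes("ticker"))),
    Bytes.toString(result.getValue(cf, Bytes.toBytes("timecreated"))),
    Bytes.toString(result.getValue(cf, Bytes.toBytes("price")))
  )
}.toDF()

tradeDF.orderBy('timecreated.desc, 'price.desc)
  .select('timecreated, 'ticker, 'price.cast("Float").as("Latest price"))
  .show(10)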

Thanks



Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Re: reading Hbase table in Spark

2016-10-10 Thread Mich Talebzadeh
I have already done it with Hive and Phoenix thanks

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 10 October 2016 at 22:58, Ted Yu <yuzhih...@gmail.com> wrote:

> In that case I suggest polling user@hive to see if someone has done this.
>
> Thanks
>
> On Mon, Oct 10, 2016 at 2:56 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> wrote:
>
> > Thanks I am on Spark 2 so may not be feasible.
> >
> > As a matter of interest, how about using Hive on top of an Hbase table?
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn * https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> > OABUrV8Pw>*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> > On 10 October 2016 at 22:49, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > In hbase master branch, there is hbase-spark module which would allow
> you
> > > to integrate with Spark seamlessly.
> > >
> > > Note: support for Spark 2.0 is pending. For details, see HBASE-16179
> > >
> > > Cheers
> > >
> > > On Mon, Oct 10, 2016 at 2:46 PM, Mich Talebzadeh <
> > > mich.talebza...@gmail.com>
> > > wrote:
> > >
> > > > Thanks Ted,
> > > >
> > > > So basically involves Java programming much like JDBC connection
> > > retrieval
> > > > etc.
> > > >
> > > > Writing to Hbase is pretty fast. Now I have both views in Phoenix and
> > > Hive
> > > > on the underlying Hbase tables.
> > > >
> > > > I am looking for flexibility here, so I gather I should use Spark on Hive
> > > > tables with a view on the Hbase table.
> > > >
> > > > Also I like tools like Zeppelin that work with both SQL and Spark
> > > > Functional programming.
> > > >
> > > > Sounds like reading data from Hbase table is best done through some
> > form
> > > of
> > > > SQL.
> > > >
> > > > What are your views on this approach?
> > > >
> > > >
> > > >
> > > > Dr Mich Talebzadeh
> > > >
> > > >
> > > >
> > > > LinkedIn * https://www.linkedin.com/profile/view?id=
> > > > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > > > <https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCd
> > > > OABUrV8Pw>*
> > > >
> > > >
> > > >
> > > > http://talebzadehmich.wordpress.com
> > > >
> > > >
> > > > *Disclaimer:* Use it at your own risk. Any and all responsibility for
> > any
> > > > loss, damage or destruction of data or any other property which may
> > arise
> > > > from relying on this email's technical content is explicitly
> > disclaimed.
> > > > The author will in no case be liable for any monetary damages arising
> > > from
> > > > such loss, damage or destruction.
> > > >
> > > >
> > > >
> > > > On 10 October 2016 at 22:13, Ted Yu <yuzhih...@gmail.com> wrote:
> > > >
> > > > > For org.apache.hadoop.hbase.client.Result, there is this method:
> > > > >
> > > > >   public byte[] getValue(byte [] family, byte [] qualifier) {
> > > > >
> > > > > which allows you to retrieve value for designated column.
> > > > >
> > > > >
> > > > > FYI
> > > > >
> > > > >

Re: reading Hbase table in Spark

2016-10-10 Thread Mich Talebzadeh
Thanks I am on Spark 2 so may not be feasible.

As a matter of interest, how about using Hive on top of an Hbase table?

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 10 October 2016 at 22:49, Ted Yu <yuzhih...@gmail.com> wrote:

> In hbase master branch, there is hbase-spark module which would allow you
> to integrate with Spark seamlessly.
>
> Note: support for Spark 2.0 is pending. For details, see HBASE-16179
>
> Cheers
>
> On Mon, Oct 10, 2016 at 2:46 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> wrote:
>
> > Thanks Ted,
> >
> > So basically involves Java programming much like JDBC connection
> retrieval
> > etc.
> >
> > Writing to Hbase is pretty fast. Now I have both views in Phoenix and
> Hive
> > on the underlying Hbase tables.
> >
> > I am looking for flexibility here, so I gather I should use Spark on Hive
> > tables with a view on the Hbase table.
> >
> > Also I like tools like Zeppelin that work with both SQL and Spark
> > Functional programming.
> >
> > Sounds like reading data from Hbase table is best done through some form
> of
> > SQL.
> >
> > What are your views on this approach?
> >
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn * https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> > OABUrV8Pw>*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> > On 10 October 2016 at 22:13, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > For org.apache.hadoop.hbase.client.Result, there is this method:
> > >
> > >   public byte[] getValue(byte [] family, byte [] qualifier) {
> > >
> > > which allows you to retrieve value for designated column.
> > >
> > >
> > > FYI
> > >
> > > On Mon, Oct 10, 2016 at 2:08 PM, Mich Talebzadeh <
> > > mich.talebza...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I am trying to do some operation on an Hbase table that is being
> > > populated
> > > > by Spark Streaming.
> > > >
> > > > Now this is just Spark on Hbase as opposed to Spark on Hive -> view
> on
> > > > Hbase etc. I also have Phoenix view on this Hbase table.
> > > >
> > > > This is sample code
> > > >
> > > > scala> val tableName = "marketDataHbase"
> > > > > val conf = HBaseConfiguration.create()
> > > > conf: org.apache.hadoop.conf.Configuration = Configuration:
> > > > core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
> > > > yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml,
> > > > hbase-default.xml, hbase-site.xml
> > > > scala> conf.set(TableInputFormat.INPUT_TABLE, tableName)
> > > > scala> //create rdd
> > > > scala>
> > > > val hBaseRDD = sc.newAPIHadoopRDD(conf,
> > > > classOf[TableInputFormat],classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],classOf[org.apache.hadoop.
> > > > hbase.client.Result])
> > > > hBaseRDD:
> > > > org.apache.spark.rdd.RDD[(org.apache.hadoop.hbase.io.
> > > > ImmutableBytesWritable,
> > > > org.apache.hadoop.hbase.client.Result)] = NewHadoopRDD[4] at
> > > > newAPIHadoopRDD at :64
> > > > scala> hBaseRDD.count
> > > > res11: Long = 22272
> > > >
> > > > scala> // transform (ImmutableBytesWritable, Result

Re: reading Hbase table in Spark

2016-10-10 Thread Mich Talebzadeh
Thanks Ted,

So basically it involves Java programming, much like JDBC connection
retrieval etc.

Writing to Hbase is pretty fast. Now I have both views in Phoenix and Hive
on the underlying Hbase tables.

I am looking for flexibility here, so I gather I should use Spark on Hive
tables with a view on the Hbase table.

Also I like tools like Zeppelin that work with both SQL and Spark
Functional programming.

Sounds like reading data from an Hbase table is best done through some form
of SQL.

What are your views on this approach?
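
For concreteness, a minimal sketch of the SQL route I have in mind, assuming a
Hive external table (called marketdata_hive here purely for illustration) has
already been mapped onto the Hbase table with the HBase storage handler, and
that the Spark 2 shell was started with Hive support:

// Query the Hive-on-HBase mapping through Spark SQL, with no direct HBase API calls.
val latest = spark.sql(
  """SELECT timecreated, ticker, CAST(price AS FLOAT) AS latest_price
    |FROM marketdata_hive
    |ORDER BY timecreated DESC, price DESC""".stripMargin)
latest.show(10)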



Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 10 October 2016 at 22:13, Ted Yu <yuzhih...@gmail.com> wrote:

> For org.apache.hadoop.hbase.client.Result, there is this method:
>
>   public byte[] getValue(byte [] family, byte [] qualifier) {
>
> which allows you to retrieve value for designated column.
>
>
> FYI
>
> On Mon, Oct 10, 2016 at 2:08 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I am trying to do some operation on an Hbase table that is being
> populated
> > by Spark Streaming.
> >
> > Now this is just Spark on Hbase as opposed to Spark on Hive -> view on
> > Hbase etc. I also have Phoenix view on this Hbase table.
> >
> > This is sample code
> >
> > scala> val tableName = "marketDataHbase"
> > > val conf = HBaseConfiguration.create()
> > conf: org.apache.hadoop.conf.Configuration = Configuration:
> > core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
> > yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml,
> > hbase-default.xml, hbase-site.xml
> > scala> conf.set(TableInputFormat.INPUT_TABLE, tableName)
> > scala> //create rdd
> > scala>
> > val hBaseRDD = sc.newAPIHadoopRDD(conf,
> > classOf[TableInputFormat],classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],classOf[org.apache.hadoop.
> > hbase.client.Result])
> > hBaseRDD:
> > org.apache.spark.rdd.RDD[(org.apache.hadoop.hbase.io.
> > ImmutableBytesWritable,
> > org.apache.hadoop.hbase.client.Result)] = NewHadoopRDD[4] at
> > newAPIHadoopRDD at :64
> > scala> hBaseRDD.count
> > res11: Long = 22272
> >
> > scala> // transform (ImmutableBytesWritable, Result) tuples into an
> RDD
> > of Result's
> > scala> val resultRDD = hBaseRDD.map(tuple => tuple._2)
> > resultRDD: org.apache.spark.rdd.RDD[org.apache.hadoop.hbase.client.
> Result]
> > = MapPartitionsRDD[8] at map at :41
> >
> > scala>  // transform into an RDD of (RowKey, ColumnValue)s  the RowKey
> has
> > the time removed
> >
> > scala> val keyValueRDD = resultRDD.map(result =>
> > (Bytes.toString(result.getRow()).split(" ")(0),
> > Bytes.toString(result.value)))
> > keyValueRDD: org.apache.spark.rdd.RDD[(String, String)] =
> > MapPartitionsRDD[9] at map at :43
> >
> > scala> keyValueRDD.take(2).foreach(kv => println(kv))
> > (55e2-63f1-4def-b625-e73f0ac36271,43.89760813529593664528)
> > (000151e9-ff27-493d-a5ca-288507d92f95,57.68882040742382868990)
> >
> > OK above I am only getting the rowkey (UUID above) and the last
> > attribute (price).
> > However, I have the rowkey and 3 more columns there in Hbase table!
> >
> > scan 'marketDataHbase', "LIMIT" => 1
> > ROW   COLUMN+CELL
> >  55e2-63f1-4def-b625-e73f0ac36271
> > column=price_info:price, timestamp=1476133232864,
> > value=43.89760813529593664528
> >  55e2-63f1-4def-b625-e73f0ac36271
> > column=price_info:ticker, timestamp=1476133232864, value=S08
> >  55e2-63f1-4def-b625-e73f0ac36271
> > column=price_info:timecreated, timestamp=1476133232864,
> > value=2016-10-10T17:12:22
> > 1 row(s) in 0.0100 seconds
> > So how can I get the other columns?
> >
> > Thanks
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn * https://www.linkedin.com/profile/view?id=
> > AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> > OABUrV8Pw>*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
>


reading Hbase table in Spark

2016-10-10 Thread Mich Talebzadeh
Hi,

I am trying to do some operation on an Hbase table that is being populated
by Spark Streaming.

Now this is just Spark on Hbase as opposed to Spark on Hive -> view on
Hbase etc. I also have Phoenix view on this Hbase table.

This is sample code

scala> val tableName = "marketDataHbase"
> val conf = HBaseConfiguration.create()
conf: org.apache.hadoop.conf.Configuration = Configuration:
core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml,
hbase-default.xml, hbase-site.xml
scala> conf.set(TableInputFormat.INPUT_TABLE, tableName)
scala> //create rdd
scala>
val hBaseRDD = sc.newAPIHadoopRDD(conf,
classOf[TableInputFormat],classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],classOf[org.apache.hadoop.hbase.client.Result])
hBaseRDD:
org.apache.spark.rdd.RDD[(org.apache.hadoop.hbase.io.ImmutableBytesWritable,
org.apache.hadoop.hbase.client.Result)] = NewHadoopRDD[4] at
newAPIHadoopRDD at <console>:64
scala> hBaseRDD.count
res11: Long = 22272

scala> // transform (ImmutableBytesWritable, Result) tuples into an RDD
of Result's
scala> val resultRDD = hBaseRDD.map(tuple => tuple._2)
resultRDD: org.apache.spark.rdd.RDD[org.apache.hadoop.hbase.client.Result]
= MapPartitionsRDD[8] at map at <console>:41

scala>  // transform into an RDD of (RowKey, ColumnValue)s  the RowKey has
the time removed

scala> val keyValueRDD = resultRDD.map(result =>
(Bytes.toString(result.getRow()).split(" ")(0),
Bytes.toString(result.value)))
keyValueRDD: org.apache.spark.rdd.RDD[(String, String)] =
MapPartitionsRDD[9] at map at <console>:43

scala> keyValueRDD.take(2).foreach(kv => println(kv))
(55e2-63f1-4def-b625-e73f0ac36271,43.89760813529593664528)
(000151e9-ff27-493d-a5ca-288507d92f95,57.68882040742382868990)

OK, above I am only getting the rowkey (the UUID) and the last
attribute (price).
However, I have the rowkey and 3 more columns in the Hbase table!

scan 'marketDataHbase', "LIMIT" => 1
ROW   COLUMN+CELL
 55e2-63f1-4def-b625-e73f0ac36271
column=price_info:price, timestamp=1476133232864,
value=43.89760813529593664528
 55e2-63f1-4def-b625-e73f0ac36271
column=price_info:ticker, timestamp=1476133232864, value=S08
 55e2-63f1-4def-b625-e73f0ac36271
column=price_info:timecreated, timestamp=1476133232864,
value=2016-10-10T17:12:22
1 row(s) in 0.0100 seconds
So how can I get the other columns?
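
What I am after is something along the lines of this sketch, if
Result.getValue(family, qualifier) is the right call for pulling each column
explicitly (result.value() appears to return just a single cell, which would
explain why only the price comes back above):

import org.apache.hadoop.hbase.util.Bytes

val cf = Bytes.toBytes("price_info")

// One (rowkey, ticker, timecreated, price) tuple per row, asking for each
// qualifier explicitly instead of relying on result.value().
val rowsRDD = resultRDD.map { result =>
  (Bytes.toString(result.getRow),
   Bytes.toString(result.getValue(cf, Bytes.toBytes("ticker"))),
   Bytes.toString(result.getValue(cf, Bytes.toBytes("timecreated"))),
   Bytes.toString(result.getValue(cf, Bytes.toBytes("price"))))
}
rowsRDD.take(2).foreach(println)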

Thanks


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Re: where clause on Phoenix view built on Hbase table throws error

2016-10-05 Thread Mich Talebzadeh
Thanks John.

0: jdbc:phoenix:rhes564:2181> select "Date","volume" from "tsco" where
"Date" = '1-Apr-08';
+-----------+-----------+
|   Date    |  volume   |
+-----------+-----------+
| 1-Apr-08  | 49664486  |
+-----------+-----------+
1 row selected (0.016 seconds)

BTW, I believe double quotes enclosing Phoenix column names are needed for
case sensitivity against Hbase?


Also, does Phoenix have type conversion from VARCHAR to integer etc.? Is
there such a document?
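
For example, is something along these lines the intended way? This is only a
sketch from the Spark/JDBC side, assuming Phoenix's built-in TO_NUMBER()
function (and TO_DATE() for dates); I have not verified the exact signatures:

import java.sql.DriverManager

// Convert the VARCHAR "volume" column to a number inside the Phoenix query.
val conn = DriverManager.getConnection("jdbc:phoenix:rhes564:2181")
val rs = conn.createStatement().executeQuery(
  """SELECT "Date", TO_NUMBER("volume") AS vol FROM "tsco" WHERE "Date" = '1-Apr-08'""")
while (rs.next()) println(rs.getString(1) + "  " + rs.getBigDecimal(2))
conn.close()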

Regards




Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 5 October 2016 at 15:24, John Leach <jle...@splicemachine.com> wrote:

>
> Remove the double quotes and try a single quote. Double quotes refer to an
> identifier…
>
> Cheers,
> John Leach
>
> > On Oct 5, 2016, at 9:21 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
> >
> > Hi,
> >
> > I have this Hbase table already populated
> >
> > create 'tsco','stock_daily'
> >
> > and populated using
> > $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv
> > -Dimporttsv.separator=',' -Dimporttsv.columns="HBASE_ROW_KEY,
> > stock_info:stock,stock_info:ticker,stock_daily:Date,stock_
> daily:open,stock_daily:high,stock_daily:low,stock_daily:
> close,stock_daily:volume"
> > tsco hdfs://rhes564:9000/data/stocks/tsco.csv
> > This works OK. In Hbase I have
> >
> > hbase(main):176:0> scan 'tsco', LIMIT => 1
> > ROW                COLUMN+CELL
> > TSCO-1-Apr-08
> > column=stock_daily:Date, timestamp=1475525222488, value=1-Apr-08
> > TSCO-1-Apr-08
> > column=stock_daily:close, timestamp=1475525222488, value=405.25
> > TSCO-1-Apr-08
> > column=stock_daily:high, timestamp=1475525222488, value=406.75
> > TSCO-1-Apr-08
> > column=stock_daily:low, timestamp=1475525222488, value=379.25
> > TSCO-1-Apr-08
> > column=stock_daily:open, timestamp=1475525222488, value=380.00
> > TSCO-1-Apr-08
> > column=stock_daily:stock, timestamp=1475525222488, value=TESCO PLC
> > TSCO-1-Apr-08
> > column=stock_daily:ticker, timestamp=1475525222488, value=TSCO
> > TSCO-1-Apr-08
> > column=stock_daily:volume, timestamp=1475525222488, value=49664486
> >
> > In Phoenix I have a view "tsco" created on Hbase table as follows:
> >
> > 0: jdbc:phoenix:rhes564:2181> create view "tsco" (PK VARCHAR PRIMARY KEY,
> > "stock_daily"."Date" VARCHAR, "stock_daily"."close" VARCHAR,
> > "stock_daily"."high" VARCHAR, "stock_daily"."low" VARCHAR,
> > "stock_daily"."open" VARCHAR, "stock_daily"."ticker" VARCHAR,
> > "stock_daily"."stock" VARCHAR, "stock_daily"."volume" VARCHAR)
> >
> > So all good.
> >
> > This works
> >
> > 0: jdbc:phoenix:rhes564:2181> select "Date","volume" from "tsco" limit 2;
> > +-----------+-----------+
> > |   Date    |  volume   |
> > +-----------+-----------+
> > | 1-Apr-08  | 49664486  |
> > | 1-Apr-09  | 24877341  |
> > +-----------+-----------+
> > 2 rows selected (0.011 seconds)
> >
> > However, I don't seem to be able to use where clause!
> >
> > 0: jdbc:phoenix:rhes564:2181> select "Date","volume" from "tsco" where
> > "Date" = "1-Apr-08";
> > Error: ERROR 504 (42703): Undefined column. columnName=1-Apr-08
> > (state=42703,code=504)
> > org.apache.phoenix.schema.ColumnNotFoundException: ERROR 504 (42703):
> > Undefined column. columnName=1-Apr-08
> >
> > Why does it think a predicate "1-Apr-08" is a column.
> >
> > Any ideas?
> >
> > Thanks
> >
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn * https://www.linkedin.com/profile/view?id=
> AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> OABUrV8Pw>*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
>
>


where clause on Phoenix view built on Hbase table throws error

2016-10-05 Thread Mich Talebzadeh
Hi,

I have this Hbase table already populated

 create 'tsco','stock_daily'

and populated using
$HBASE_HOME/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv
-Dimporttsv.separator=',' -Dimporttsv.columns="HBASE_ROW_KEY,
stock_info:stock,stock_info:ticker,stock_daily:Date,stock_daily:open,stock_daily:high,stock_daily:low,stock_daily:close,stock_daily:volume"
tsco hdfs://rhes564:9000/data/stocks/tsco.csv
This works OK. In Hbase I have

hbase(main):176:0> scan 'tsco', LIMIT => 1
ROW                COLUMN+CELL
 TSCO-1-Apr-08
column=stock_daily:Date, timestamp=1475525222488, value=1-Apr-08
 TSCO-1-Apr-08
column=stock_daily:close, timestamp=1475525222488, value=405.25
 TSCO-1-Apr-08
column=stock_daily:high, timestamp=1475525222488, value=406.75
 TSCO-1-Apr-08
column=stock_daily:low, timestamp=1475525222488, value=379.25
 TSCO-1-Apr-08
column=stock_daily:open, timestamp=1475525222488, value=380.00
 TSCO-1-Apr-08
column=stock_daily:stock, timestamp=1475525222488, value=TESCO PLC
 TSCO-1-Apr-08
column=stock_daily:ticker, timestamp=1475525222488, value=TSCO
 TSCO-1-Apr-08
column=stock_daily:volume, timestamp=1475525222488, value=49664486

In Phoenix I have a view "tsco" created on Hbase table as follows:

0: jdbc:phoenix:rhes564:2181> create view "tsco" (PK VARCHAR PRIMARY KEY,
"stock_daily"."Date" VARCHAR, "stock_daily"."close" VARCHAR,
"stock_daily"."high" VARCHAR, "stock_daily"."low" VARCHAR,
"stock_daily"."open" VARCHAR, "stock_daily"."ticker" VARCHAR,
"stock_daily"."stock" VARCHAR, "stock_daily"."volume" VARCHAR)

So all good.

This works

0: jdbc:phoenix:rhes564:2181> select "Date","volume" from "tsco" limit 2;
+-----------+-----------+
|   Date    |  volume   |
+-----------+-----------+
| 1-Apr-08  | 49664486  |
| 1-Apr-09  | 24877341  |
+-----------+-----------+
2 rows selected (0.011 seconds)

However, I don't seem to be able to use where clause!

0: jdbc:phoenix:rhes564:2181> select "Date","volume" from "tsco" where
"Date" = "1-Apr-08";
Error: ERROR 504 (42703): Undefined column. columnName=1-Apr-08
(state=42703,code=504)
org.apache.phoenix.schema.ColumnNotFoundException: ERROR 504 (42703):
Undefined column. columnName=1-Apr-08

Why does it think the predicate "1-Apr-08" is a column?

Any ideas?

Thanks



Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Re: Loading into hbase from csv file issue

2016-10-04 Thread Mich Talebzadeh
Thanks again.

If I wanted to store TSCO in a single row and not repeat it for the rest of
the rows, how would that work for the row key?

Currently this is the way table tsco is defined:

 create 'tsco','stock_daily'

and these are the attributes of the stock_daily column family:

hbase(main):144:0* scan 'tsco', LIMIT => 1
ROW                COLUMN+CELL
 TSCO-1-Apr-08
column=stock_daily:Date, timestamp=1475525222488, value=1-Apr-08
 TSCO-1-Apr-08
column=stock_daily:close, timestamp=1475525222488, value=405.25
 TSCO-1-Apr-08
column=stock_daily:high, timestamp=1475525222488, value=406.75
 TSCO-1-Apr-08
column=stock_daily:low, timestamp=1475525222488, value=379.25
 TSCO-1-Apr-08
column=stock_daily:open, timestamp=1475525222488, value=380.00
 TSCO-1-Apr-08
column=stock_daily:stock, timestamp=1475525222488, value=TESCO PLC
 TSCO-1-Apr-08
column=stock_daily:ticker, timestamp=1475525222488, value=TSCO
 TSCO-1-Apr-08
column=stock_daily:volume, timestamp=1475525222488, value=49664486

Note that column=stock_daily:stock and column=stock_daily:ticker are
repeated in every row. That may not be efficient?

Kindly suggest the best way of creating the row key and whether it is
necessary to store the above columns at all.
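
For illustration, a rough sketch of how I picture your suggestion (quoted
below): static stock details stored once under a bare "TSCO" row, and the
daily figures under "TSCO-<date>" rows. This assumes the table is recreated
with both 'stock_info' and 'stock_daily' families, using the plain client API
from Scala:

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
val table = conn.getTable(TableName.valueOf("tsco"))

// Static stock details once, under the bare "TSCO" row key.
val info = new Put(Bytes.toBytes("TSCO"))
info.addColumn(Bytes.toBytes("stock_info"), Bytes.toBytes("stock"), Bytes.toBytes("TESCO PLC"))
info.addColumn(Bytes.toBytes("stock_info"), Bytes.toBytes("ticker"), Bytes.toBytes("TSCO"))
table.put(info)

// Daily figures under "TSCO-<date>" rows, with no repeated stock/ticker columns.
val daily = new Put(Bytes.toBytes("TSCO-1-Apr-08"))
daily.addColumn(Bytes.toBytes("stock_daily"), Bytes.toBytes("close"), Bytes.toBytes("405.25"))
daily.addColumn(Bytes.toBytes("stock_daily"), Bytes.toBytes("volume"), Bytes.toBytes("49664486"))
table.put(daily)

table.close()
conn.close()

Is that roughly what you had in mind?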

regards












Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 4 October 2016 at 01:53, Jean-Marc Spaggiari <jean-m...@spaggiari.org>
wrote:

> Hi Mich,
>
> that's better already, but now you have to think about the read pattern.
> How do you want to read this data? Are you going to read just one column at
> a time? Like reading stock_daily:high without reading stock_daily:close? If
> so, fine, keep it that way. But if you mostly read all of them together,
> then why not just keep them together instead of separating them into
> different columns? That way you save the key overhead storage for each new
> column...
>
> Also, I suspect you will have one row per stock per day, right? Does it
> mean you will repeat the stock_info information again and again and again?
> If so, why not just also storing  it once for the row "TSCO" and not repeat
> it for "TSCO-DATE"? That way you store it just one, you have an easy way to
> retrieve it and you can safe one column family?
>
> HTH,
>
> JMS
>
> 2016-10-03 11:16 GMT-04:00 Mich Talebzadeh <mich.talebza...@gmail.com>:
>
> > Hi Jean-Marc
> >
> > I decided to create a composite key *ticker-date* from the csv file
> >
> > I just did some manipulation on CSV file
> >
> > export IFS=",";sed -i 1d tsco.csv; cat tsco.csv | while read a b c d e f;
> > do echo "TSCO-$a,TESCO PLC,TSCO,$a,$b,$c,$d,$e,$f"; done > temp; mv -f
> temp
> > tsco.csv
> >
> > Which basically takes the csv file, tells the shell that the field separator
> > is IFS=",", drops the header, reads every field in every line (a,b,c ..),
> > creates the composite key TSCO-$a, adds the stock name and ticker to the
> > csv file. The whole process can be automated and parameterised.
> >
> > Once the csv file is put into HDFS then, I run the following command
> >
> > $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv
> > -Dimporttsv.separator=',' -Dimporttsv.columns="HBASE_ROW
> > _KEY,stock_info:stock,stock_info:ticker,stock_daily:Date,sto
> > ck_daily:open,stock_daily:high,stock_daily:low,stock_daily:
> > close,stock_daily:volume" tsco hdfs://rhes564:9000/data/stocks/tsco.csv
> >
> > The Hbase table is created as below
> >
> > create 'tsco','stock_info','stock_daily'
> >
> > and this is the data (2 rows each 2 family and with 8 attributes)
> >
> > hbase(main):132:0> scan 'tsco', LIMIT => 2
> > ROW                COLUMN+CELL
> >  TSCO-1-Apr-08
> > column=stock_daily:Date, timestamp=1475507091676, value=1-Apr-08
> >  TSCO-1-Apr-08
> > column=stock_daily:close, timestamp=1475507091676, value=405.25
> >  TSCO-1-Apr-08
> > column=stock_daily:high, timestamp=1475507091676, value=406.75
> >  TSCO-1-Apr-08
> > column=stock_daily:low, timestamp=1475507091676, value=379.25
> >  TSCO-1-Apr-08
> > column=stock_daily:open, timestamp=1475507091676, value=380.0

Re: Loading into hbase from csv file issue

2016-10-03 Thread Mich Talebzadeh
Hi Jean-Marc

I decided to create a composite key *ticker-date* from the csv file

I just did some manipulation on CSV file

export IFS=",";sed -i 1d tsco.csv; cat tsco.csv | while read a b c d e f;
do echo "TSCO-$a,TESCO PLC,TSCO,$a,$b,$c,$d,$e,$f"; done > temp; mv -f temp
tsco.csv

Which basically takes the csv file, tells the shell that the field separator
is IFS=",", drops the header, reads every field in every line (a,b,c ..),
creates the composite key TSCO-$a, adds the stock name and ticker to the
csv file. The whole process can be automated and parameterised.
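
As an aside, the same reshaping could presumably be done in Spark rather than
sed; a rough sketch, assuming Spark 2 and that the raw file (called
tsco_raw.csv here purely for illustration) still has its
Date,open,high,low,close,volume header:

// Build the TSCO-<date> composite key and the static stock/ticker columns,
// then write the reshaped csv back to HDFS for ImportTsv to pick up.
val raw = spark.read.option("header", "true").csv("hdfs://rhes564:9000/data/stocks/tsco_raw.csv")
val shaped = raw.selectExpr(
  "concat('TSCO-', Date) AS rowkey",
  "'TESCO PLC' AS stock",
  "'TSCO' AS ticker",
  "Date", "open", "high", "low", "close", "volume")
// Note this writes a directory of part files; coalesce(1) first if a single file is needed.
shaped.write.csv("hdfs://rhes564:9000/data/stocks/tsco_shaped")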

Once the csv file is put into HDFS then, I run the following command

$HBASE_HOME/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv
-Dimporttsv.separator=',' -Dimporttsv.columns="HBASE_ROW
_KEY,stock_info:stock,stock_info:ticker,stock_daily:Date,sto
ck_daily:open,stock_daily:high,stock_daily:low,stock_daily:
close,stock_daily:volume" tsco hdfs://rhes564:9000/data/stocks/tsco.csv

The Hbase table is created as below

create 'tsco','stock_info','stock_daily'

and this is the data (2 rows each 2 family and with 8 attributes)

hbase(main):132:0> scan 'tsco', LIMIT => 2
ROW                COLUMN+CELL
 TSCO-1-Apr-08
column=stock_daily:Date, timestamp=1475507091676, value=1-Apr-08
 TSCO-1-Apr-08
column=stock_daily:close, timestamp=1475507091676, value=405.25
 TSCO-1-Apr-08
column=stock_daily:high, timestamp=1475507091676, value=406.75
 TSCO-1-Apr-08
column=stock_daily:low, timestamp=1475507091676, value=379.25
 TSCO-1-Apr-08
column=stock_daily:open, timestamp=1475507091676, value=380.00
 TSCO-1-Apr-08
column=stock_daily:volume, timestamp=1475507091676, value=49664486
 TSCO-1-Apr-08
column=stock_info:stock, timestamp=1475507091676, value=TESCO PLC
 TSCO-1-Apr-08
column=stock_info:ticker, timestamp=1475507091676, value=TSCO

 TSCO-1-Apr-09
column=stock_daily:Date, timestamp=1475507091676, value=1-Apr-09
 TSCO-1-Apr-09
column=stock_daily:close, timestamp=1475507091676, value=333.30
 TSCO-1-Apr-09
column=stock_daily:high, timestamp=1475507091676, value=334.60
 TSCO-1-Apr-09
column=stock_daily:low, timestamp=1475507091676, value=326.50
 TSCO-1-Apr-09
column=stock_daily:open, timestamp=1475507091676, value=331.10
 TSCO-1-Apr-09
column=stock_daily:volume, timestamp=1475507091676, value=24877341
 TSCO-1-Apr-09
column=stock_info:stock, timestamp=1475507091676, value=TESCO PLC
 TSCO-1-Apr-09
column=stock_info:ticker, timestamp=1475507091676, value=TSCO


What do you think?

Thanks

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 3 October 2016 at 15:10, Jean-Marc Spaggiari <jean-m...@spaggiari.org>
wrote:

> Hi Mich,
>
> As you said, it's most probably because it's all the same key... If you
> want to be 200% sure, just alter VERSIONS => '1' to be greater (like, 10)
> and scan all the versions of the cells. You should see the others.
>
> JMS
>
> 2016-10-03 3:41 GMT-04:00 Mich Talebzadeh <mich.talebza...@gmail.com>:
>
> > Hi,
> >
> > when I use the command line utility ImportTsv  to load a file into Hbase
> > with the following table format
> >
> > describe 'marketDataHbase'
> > Table marketDataHbase is ENABLED
> > marketDataHbase
> > COLUMN FAMILIES DESCRIPTION
> > {NAME => 'price_info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY
> =>
> > 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE',
> TTL
> > => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKC
> > ACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> > 1 row(s) in 0.0930 seconds
> >
> >
> > hbase org.apache.hadoop.hbase.mapreduce.ImportTsv
> > -Dimporttsv.separator=','
> > -Dimporttsv.columns="HBASE_ROW_KEY, stock_daily:ticker,
> > stock_daily:tradedate, stock_daily:open,stock_daily:
> > high,stock_daily:low,stock_daily:close,stock_daily:volume" tsco
> > hdfs://rhes564:9000/data/stocks/tsco.csv
> >
> > There are 1200 rows in the csv file, *but it only loads the first
> > row!*
> >
> > scan 'tsco'
> > ROW                COLUMN+CELL
> >  Tesco PLC
> > column=stock_daily:close, timestamp=1475447365118, value=325.25
> >  Tesco PLC

Loading into hbase from csv file issue

2016-10-03 Thread Mich Talebzadeh
Hi,

when I use the command line utility ImportTsv  to load a file into Hbase
with the following table format

describe 'marketDataHbase'
Table marketDataHbase is ENABLED
marketDataHbase
COLUMN FAMILIES DESCRIPTION
{NAME => 'price_info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL
=> 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKC
ACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.0930 seconds


hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=','
-Dimporttsv.columns="HBASE_ROW_KEY, stock_daily:ticker,
stock_daily:tradedate, stock_daily:open,stock_daily:
high,stock_daily:low,stock_daily:close,stock_daily:volume" tsco
hdfs://rhes564:9000/data/stocks/tsco.csv

There are 1200 rows in the csv file, *but it only loads the first row!*

scan 'tsco'
ROW                COLUMN+CELL
 Tesco PLC
column=stock_daily:close, timestamp=1475447365118, value=325.25
 Tesco PLC
column=stock_daily:high, timestamp=1475447365118, value=332.00
 Tesco PLC
column=stock_daily:low, timestamp=1475447365118, value=324.00
 Tesco PLC
column=stock_daily:open, timestamp=1475447365118, value=331.75
 Tesco PLC
column=stock_daily:ticker, timestamp=1475447365118, value=TSCO
 Tesco PLC
column=stock_daily:tradedate, timestamp=1475447365118, value= 3-Jan-06
 Tesco PLC
column=stock_daily:volume, timestamp=1475447365118, value=46935045
1 row(s) in 0.0390 seconds

Is this because the hbase_row_key --> Tesco PLC is the same for all? I
thought that the row key can be anything.

Thanks

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.