Re: Hive metadata on Hbase

2016-10-24 Thread Mich Talebzadeh
Thanks, Alan, for the detailed explanation.

Please bear in mind that any tool that needs to work with a repository
behaves the same way (Oracle TimesTen IMDB has its metastore on classic
Oracle, SAP Replication Server has its repository, the RSSD, on SAP ASE, and
so on): the first thing they do is cache those tables in the memory of the
host database and keep them there until shutdown. I reverse engineered the
Hive data model from the physical schema (on Oracle); there are around 194
tables in total, which can easily be cached.

Small and medium enterprises (SMEs) don't really have much data, so almost
anything will do, and they are the ones that use open source databases.
Bigger companies already pay serious money for Oracle and the like, and they
are the ones that would not touch an open source database (big data aside),
because in this capital-sensitive, risk-averse world they do not want to
expose themselves to unnecessary risk. So I am not sure they would take
something like HBase as a core product unless it were maintenance free.

Going back to your point:

".. but you have to pay for an expensive commercial license to make the
metadata really work well is a non-starter"

They already do, and they pay more if they have to. We will stick with Hive
metadata on Oracle, with the schema on SSD.

HTH









Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 24 October 2016 at 20:14, Alan Gates  wrote:

> Some thoughts on this:
>
> First, there’s no plan to remove the option to use an RDBMS such as Oracle
> as your backend.  Hive’s RawStore interface is built such that various
> implementations of the metadata storage can easily coexist.  Obviously
> different users will make different choices about what metadata store makes
> sense for them.
>
> As to why HBase:
> 1) We desperately need to get rid of the ORM layer.  It’s causing us
> performance problems, as evidenced by things like it taking several minutes
> to fetch all of the partition data for queries that span many partitions.
> HBase is a way to achieve this, not the only way.  See in particular
> Yahoo’s work on optimizing Oracle access:
> https://issues.apache.org/jira/browse/HIVE-14870. The question around
> this is whether we can
> optimize for Oracle, MySQL, Postgres, and SQLServer without creating a
> maintenance and testing nightmare for ourselves.  I’m skeptical, but others
> think it’s possible.  See comments on that JIRA.
>
> 2) We’d like to scale to much larger sizes, both in terms of data and
> access from nodes.  Not that we’re worried about the amount of metadata,
> but we’d like to be able to cache more stats, file splits, etc.  And we’d
> like to allow nodes in the cluster to contact the metastore, which we do
> not today since many RDBMSs don’t handle a thousand plus simultaneous
> connections well.  Obviously both data and connection scale can be met with
> high end commercial stores.  But saying that we have this great open source
> database but you have to pay for an expensive commercial license to make
> the metadata really work well is a non-starter.
>
> 3) By using tools within the Hadoop ecosystem like HBase we are helping to
> drive improvement in the system.
>
> To explain the HBase work a little more, it doesn’t use Phoenix, but works
> directly against HBase, with the help of a transaction manager (Omid).  In
> performance tests we’ve done so far it’s faster than Hive 1 with the ORM
> layer, but not yet to the 10x range that we’d like to see.  We haven’t yet
> done the work to put in co-processors and such that we expect would speed
> it up further.
>
> Alan.
>
> > On Oct 23, 2016, at 15:46, Mich Talebzadeh wrote:
> >
> >
> > A while back there were some notes on having the Hive metastore on HBase
> > as opposed to conventional RDBMSs.
> >
> > I am currently involved in some hefty work with HBase and Phoenix for
> > batch ingestion of trade data. As long as you define your HBase table
> > through Phoenix, with secondary Phoenix indexes on HBase, the speed is
> > impressive.
> >
> > I am not sure how much having HBase as the Hive metastore is going to
> > add to Hive performance. We use Oracle 12c as the Hive metastore and the
> > Hive database/schema is built on solid-state disks. We have never had
> > any issues with locks or concurrency.
> >
> > Therefore I am not sure what one would gain by having HBase as the Hive
> > metastore. I trust that we can still use our existing schemas on Oracle.
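
The Phoenix-defined table with a secondary index described above might look
like the following sketch. The table and column names are illustrative
assumptions, not taken from the original thread:

```sql
-- Hypothetical Phoenix DDL for the trade-ingestion pattern described above.
CREATE TABLE trades (
    trade_id VARCHAR NOT NULL PRIMARY KEY,  -- maps to the HBase row key
    symbol   VARCHAR,
    price    DECIMAL(18,4),
    trade_ts TIMESTAMP
);

-- Phoenix secondary index on a non-row-key column; Phoenix maintains the
-- index table in HBase so lookups by symbol avoid a full table scan.
CREATE INDEX trades_symbol_idx ON trades (symbol) INCLUDE (price);
```

Because the index includes the price column, a query such as
"SELECT price FROM trades WHERE symbol = 'XYZ'" can be served entirely from
the index table.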



Re: hiveserver2 java heap space

2016-10-24 Thread Patcharee Thongtra

It works on Hive cli

Patcharee

On 10/24/2016 11:51 AM, Mich Talebzadeh wrote:

does this work ok through Hive cli?




On 24 October 2016 at 10:43, Patcharee Thongtra wrote:


Hi,

I tried to query an ORC file from beeline and from a Java program using
JDBC ("select * from orcfileTable limit 1"). Both failed with "Caused
by: java.lang.OutOfMemoryError: Java heap space". The HiveServer2 heap
size is 1024m. I guess I need to increase the HiveServer2 heap size?
However, I wonder why I get this error at all, since the query touches
just ONE row. Any ideas?

Thanks,

Patcharee
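
A hedged sketch of two things that might address the OOM above. The
fetch-task setting is a real Hive knob, but whether either change resolves
this particular failure is an assumption:

```sql
-- Sketch, not a confirmed fix. By default HiveServer2 may run a simple
-- "select * ... limit 1" as a local fetch task inside its own JVM, so the
-- ORC read happens within the 1024m HiveServer2 heap. Forcing a cluster
-- job moves that work off HiveServer2:
SET hive.fetch.task.conversion=none;
SELECT * FROM orcfileTable LIMIT 1;

-- Alternatively, raise the HiveServer2 heap itself, e.g. in hive-env.sh
-- (path and value are deployment-specific assumptions):
--   export HADOOP_HEAPSIZE=4096
```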









Rest API Apache Hive

2016-10-24 Thread Hardika Catur S

Hi,

I am trying to create a REST API for Apache Hive, but I am finding it
difficult. I am working from the documentation at
https://www.npmjs.com/package/jshs2.

Has anyone tried building one of these, or is there other Node.js-based
documentation? Please help me find a solution.

Thanks,
Hardika CS.


Re: Hive metadata on Hbase

2016-10-24 Thread Mich Talebzadeh
Hi Furcy,

Thanks for the updates.

Transactional tables create an issue for us: when many updates are done,
they produce many delta files that require compaction.

This by itself is not a problem for Hive. However, Spark fails to read
these delta files, so the job crashes.

Regards,

Mich
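
The compaction referred to above can also be requested manually; a hedged
sketch follows, with the table and partition names as illustrative
assumptions:

```sql
-- Request a major compaction so the accumulated delta files are rewritten
-- into a single base file (adjust table/partition to your schema):
ALTER TABLE trades_acid PARTITION (trade_date='2016-10-24') COMPACT 'major';

-- Compactions run asynchronously; check queue and state with:
SHOW COMPACTIONS;
```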




On 24 October 2016 at 08:39, Furcy Pin  wrote:

> Hi Mich,
>
> The umbrella JIRA for this gives a few reasons:
> https://issues.apache.org/jira/browse/HIVE-9452
> (with even more details in the attached pdf https://issues.apache.org/
> jira/secure/attachment/12697601/HBaseMetastoreApproach.pdf)
>
> In my experience, Hive tables with a lot of partitions (> 10 000) can
> become really slow, especially with Spark. The latency induced by the
> metastore can be large compared to the duration of the whole query,
> because the driver needs to fetch a lot of partition information just to
> optimize the query, before even running it.
>
> I guess another advantage is that an RDBMS metastore is a SPOF unless you
> set up replication etc., while HBase would give HA for free.
>
>
>
> On Mon, Oct 24, 2016 at 9:06 AM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> @Per
>>
>> We run a fully transactional Hive metadata database on an Oracle DB.
>>
>> I don't have statistics to hand, but I can collect them from AWR reports,
>> no problem.
>>
>> @Jorn,
>>
>> The primary reason Oracle was chosen is that the company has global
>> licenses for Oracle + MSSQL + SAP, and these are classified as Enterprise
>> Grade databases.
>>
>> MySQL and the others are not classified as such, so they cannot be
>> deployed in production.
>>
>> Besides, keeping the Hive metadata on Oracle makes sense for us, as our
>> infrastructure team does all the support, HA etc. for it and has trained
>> DBAs looking after it 24x7.
>>
>> Admittedly we are now relying on HDFS itself plus Hbase as well for
>> persistent storage. So the situation might change.
>>
>> HTH
>>
>> On 24 October 2016 at 06:46, Per Ullberg  wrote:
>>
>>> I thought the main gain was to get ACID on Hive performant enough.
>>>
>>> @Mich: Do you run with ACID-enabled tables? How many
>>> Create/Update/Deletes do you do per second?
>>>
>>> best regards
>>> /Pelle
>>>
>>> On Mon, Oct 24, 2016 at 7:39 AM, Jörn Franke 
>>> wrote:
>>>
 I think the main gain is more about getting rid of a dedicated database
 including maintenance and potential license cost.
 For really large clusters and a lot of users this might be even more
 beneficial. You can avoid clustering the database etc.



Re: Hive metadata on Hbase

2016-10-24 Thread Mich Talebzadeh
Hive 2.0.1
Subversion git://reznor-mbp-2.local/Users/sergey/git/hivegit -r
e3cfeebcefe9a19c5055afdcbb00646908340694
Compiled by sergey on Tue May 3 21:03:11 PDT 2016
From source with checksum 5a49522e4b572555dbbe5dd4773bc7c2




On 24 October 2016 at 08:29, Per Ullberg  wrote:

> What version of hive are you running?
>
> /Pelle

Re: Hive metadata on Hbase

2016-10-24 Thread Per Ullberg
What version of hive are you running?

/Pelle


--

Per Ullberg
Data Vault Tech Lead
Odin Uppsala
+46 701612693

Klarna AB (publ)
Sveavägen 46, 111 34 Stockholm
Tel: +46 8 120 120 00
Reg no: 556737-0431
klarna.com

