Re: Hive Concurrency support

2015-08-24 Thread Alan Gates
Are you using a lock manager, and if so which one?  I believe the 
ZooKeeper lock manager does not allow simultaneous writes.  The lock 
manager that comes with the DbTxnManager does, but you can't use that 
without also using transactions.
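For reference, a minimal sketch of the configuration this implies, using documented property names from the Hive wiki (values and defaults may vary by version):

```sql
-- Hedged sketch: settings typically needed to use the DbTxnManager's
-- lock manager, which requires transactions to be enabled as well.
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
-- Metastore-side compaction settings the transactions wiki recommends
-- alongside the above:
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1;
```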


Alan.




Re: Hive Concurrency support

2015-08-24 Thread Suyog Parlikar
No, the table is not transactional.


Re: Hive Concurrency support

2015-08-23 Thread Elliot West
Is the table configured to be transactional?

https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-TableProperties
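The linked page describes the table-level requirements; a minimal hedged example (table and column names are illustrative):

```sql
-- Sketch of a transactional table per the Hive Transactions wiki:
-- ORC storage, bucketing, and the 'transactional' table property.
CREATE TABLE example_events (id INT, payload STRING)
PARTITIONED BY (dt STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');
```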




Re: Hive Concurrency support

2015-08-23 Thread Suyog Parlikar
Hello Elliot,

Thanks for the clarification.

But I am still not able to understand how Hive is behaving here.

My cluster has the following properties:

hive.txn.manager = DummyTxnManager

hive.support.concurrency = true

Actually I am trying to insert data into two different partitions of a
table at the same time.

When I check the locks present on the table, it shows a shared lock,
which does not allow writes on the table.

So I wanted to understand: does Hive execute these two insert operations
sequentially, or does it execute them in parallel?

Thanks,
Suyog
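For anyone following along, the locks can be inspected per table and per partition (table and partition names below are illustrative):

```sql
-- Hedged sketch: listing locks held on a table and on one partition.
SHOW LOCKS example_events;
SHOW LOCKS example_events PARTITION (dt='2015-08-24');
```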


Re: Hive Concurrency support

2015-08-23 Thread Dr Mich Talebzadeh

A correction to point 2 below:

2) You will have to coordinate concurrency via zookeeper for distributed
>transactions. Without zookeeper or equivalent product it will not work
>and you will end up with deadlocks in your metastore.

Should read

.. it will not work and you will end up with serialisation issues in your
metastore.



Re: Hive Concurrency support

2015-08-23 Thread Dr Mich Talebzadeh

Well, I have come across this in practice with real-time data movements
(DML inserts) using a replication server to deliver data from an RDBMS to
Hive. In general, if you have not met the conditions you will end up with
deadlocks.

To make this work you will need:

1) Your Hive metastore must allow concurrency. Insofar as I have found
out, a Hive metastore on Oracle provides the best concurrency support. For
that you will need to run the supplied concurrency script against your
metastore.
2) You will have to coordinate concurrency via ZooKeeper for distributed
transactions. Without ZooKeeper or an equivalent product it will not work
and you will end up with deadlocks in your metastore.

HTH,

Mich
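A hedged sketch of the ZooKeeper coordination mentioned in point 2, using documented property names (hostnames are illustrative):

```sql
-- Point Hive's legacy lock manager at a ZooKeeper quorum so that
-- concurrent clients coordinate their locks through it.
SET hive.support.concurrency=true;
SET hive.lock.manager=org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager;
SET hive.zookeeper.quorum=zk1.example.com,zk2.example.com,zk3.example.com;
```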



>


Hive Concurrency support

2015-08-23 Thread Elliot West
Shared locks are all that is required to insert data into transactional
tables. Multiple clients can hold a shared lock simultaneously. Each client
will write using uniquely assigned transaction ids so that their work is
isolated from one another. It should actually be possible for multiple
clients to insert into the same partition concurrently.

See slide 12 in:
http://www.slideshare.net/mobile/Hadoop_Summit/w-525210-comalley

Thanks - Elliot.
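In practice this means two clients can issue the same kind of statement at once (table names are illustrative); each transaction writes to its own delta directory, so the shared locks do not conflict:

```sql
-- Hedged sketch: client A and client B can both run an insert like this
-- concurrently against an ACID table; each write lands in a separate
-- delta_<txnid>_<txnid> directory keyed by its transaction id.
INSERT INTO TABLE example_events PARTITION (dt = '2015-08-21')
SELECT id, payload FROM staging_events;
```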



Re: Hive Concurrency support

2015-08-23 Thread Noam Hasson
If you are looking to support concurrency check this param:
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.support.concurrency

I believe it will allow you to run several different inserts to the same
partitions, but I don't know what kinds of corruption/collision scenarios
are possible.


-- 
This e-mail, as well as any attached document, may contain material which 
is confidential and privileged and may include trademark, copyright and 
other intellectual property rights that are proprietary to Kenshoo Ltd, 
 its subsidiaries or affiliates ("Kenshoo"). This e-mail and its 
attachments may be read, copied and used only by the addressee for the 
purpose(s) for which it was disclosed herein. If you have received it in 
error, please destroy the message and any attachment, and contact us 
immediately. If you are not the intended recipient, be aware that any 
review, reliance, disclosure, copying, distribution or use of the contents 
of this message without Kenshoo's express permission is strictly prohibited.


Re: Hive Concurrency support

2015-08-21 Thread Suyog Parlikar
Thanks, Elliot, for the immediate reply.

But as per the Hive locking mechanism, while inserting data into a
partition Hive acquires an exclusive lock on that partition and a shared
lock on the entire table.

How is it possible to insert data into a different partition of the same
table while holding a shared lock on the table, which does not allow
write operations?

Please correct me if my understanding is wrong.
(I am using HQL inserts only for these operations.)

Thanks,
Suyog


Re: Hive Concurrency support

2015-08-21 Thread Elliot West
I presume you mean "into different partitions of a table at the same
time"? This should be possible. It is certainly supported by the streaming
API, which is probably where you want to look if you need to insert large
volumes of data to multiple partitions concurrently. I can't see why it
would not also be possible with HQL INSERTs.
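A single HQL statement can also fan out to several partitions using Hive's multi-insert form (table names are illustrative):

```sql
-- Hedged sketch: one statement writing two partitions of the same table.
FROM staging_events s
INSERT INTO TABLE example_events PARTITION (dt = '2015-08-20')
  SELECT s.id, s.payload WHERE s.dt = '2015-08-20'
INSERT INTO TABLE example_events PARTITION (dt = '2015-08-21')
  SELECT s.id, s.payload WHERE s.dt = '2015-08-21';
```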



Hive Concurrency support

2015-08-21 Thread Suyog Parlikar
Can we insert data into different partitions of a table at the same time?

Waiting for inputs.

Thanks in advance.

- suyog