Re: Is Insert Overwrite table partition on s3 is an atomic operation ?

2021-01-11 Thread Austin Hackett
Hi Mark

It’s my understanding that when you do an INSERT OVERWRITE into a partition, 
Hive will take out an exclusive lock on the partition and a shared lock on the 
table itself. This blocks are read and write operations on the partition, and 
allows reads against the other partitions to proceed.

I am assuming that you have hive.txn.strict.locking.mode set to its default 
value of true, and the table is non-ACID. With 
hive.txn.strict.locking.mode=false and a non-ACID table, then the lock is 
shared, i.e. concurrent writes to the same table are allowed.

Hopefully that is this information you were looking for? Apologies if not.

Thanks

Austin

> On 11 Jan 2021, at 16:44, Mark Norkin  wrote:
> 
> Hello Hive users,
> 
> We are using AWS Glue as Hive compatible metastore when running queries on 
> EMR. For Hive external tables we are using AWS S3.
> 
> After looking at the docs we didn't find a conclusive answer on whether an 
> Insert Overwrite table partition is an atomic operation, maybe we've missed 
> it and it is documented somewhere, or maybe someone knows from their 
> experience?
> 
> If it's an atomic operation, is there any difference whether the table is 
> external or a managed one?
> 
> Thank you,
> 
> Mark



Re: Is Insert Overwrite table partition on s3 is an atomic operation ?

2021-01-11 Thread Mich Talebzadeh
Hi Mark,

By atomic operation I gather you mean INSERT/OVERWRITE affects that
partition only?

According to my somehow dated scripts yes you can do that. The idea being
that you only want to overwrite data for that partition ONLY.

--show create table marketData;
--Populate target table
select from_unixtime(unix_timestamp(), 'dd/MM/ HH:mm:ss.ss') AS
StartTime;
INSERT OVERWRITE TABLE ${DATABASE}.MARKETDATA PARTITION (DateStamp =
"${TODAY}")
SELECT
  KEY
, TICKER
, TIMECREATED
, PRICE
, 1
, CAST(from_unixtime(unix_timestamp()) AS timestamp)
FROM ${DATABASE}.EXTERNALMARKETDATA

So this basically loads data into the Hive partitioned table from an
external Hive table populated by Flume. It overwrites data for *today's
created partition regardles*s. Th external table has one partition added
daily

ALTER TABLE ${DATABASE}.EXTERNALMARKETDATA set location
'hdfs://rhes564:9000/data/prices/${TODAY}';

HTH




LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*





*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Mon, 11 Jan 2021 at 16:45, Mark Norkin  wrote:

> Hello Hive users,
>
> We are using AWS Glue as Hive compatible metastore when running queries on
> EMR. For Hive external tables we are using AWS S3.
>
> After looking at the docs we didn't find a conclusive answer on whether an
> Insert Overwrite table partition is an atomic operation, maybe we've missed
> it and it is documented somewhere, or maybe someone knows from their
> experience?
>
> If it's an atomic operation, is there any difference whether the table is
> external or a managed one?
>
> Thank you,
>
> Mark
>


Is Insert Overwrite table partition on s3 is an atomic operation ?

2021-01-11 Thread Mark Norkin
Hello Hive users,

We are using AWS Glue as Hive compatible metastore when running queries on
EMR. For Hive external tables we are using AWS S3.

After looking at the docs we didn't find a conclusive answer on whether an
Insert Overwrite table partition is an atomic operation, maybe we've missed
it and it is documented somewhere, or maybe someone knows from their
experience?

If it's an atomic operation, is there any difference whether the table is
external or a managed one?

Thank you,

Mark