Re: Locking in HIVE : How to use locking/unlocking features using hive java API ?

2012-12-10 Thread Manish Malhotra
Sending again, as I got no response.

Can somebody from the Hive dev group please review my approach and reply?

Cheers,
Manish


Re: Locking in HIVE : How to use locking/unlocking features using hive java API ?

2012-12-10 Thread Ruslan Al-Fakikh
Hi Manish!

Why do you need a metadata backup? Can't you just store all the table create
statements in an init file? If you care about partitions that have been
created dynamically, you can restore them from the data with RECOVER
PARTITIONS (if using Amazon EMR) or an analogous check command in a regular
Hadoop distribution (I don't remember its name).
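In Apache Hive, a command that does this is `MSCK REPAIR TABLE <table>`, which adds partition directories found under the table's location to the metastore; Amazon EMR's Hive accepts `ALTER TABLE <table> RECOVER PARTITIONS`. A minimal Java sketch that picks the statement, assuming the caller knows which distribution it talks to (`PartitionRecovery` and the `onEmr` flag are hypothetical, and sending the string to Hive, e.g. over JDBC, is not shown):

```java
// Hypothetical helper: chooses the partition-recovery statement per distribution.
final class PartitionRecovery {
    private PartitionRecovery() {}

    static String recoverStatement(String table, boolean onEmr) {
        if (table == null || table.isEmpty()) {
            throw new IllegalArgumentException("table name is required");
        }
        return onEmr
            ? "ALTER TABLE " + table + " RECOVER PARTITIONS" // Amazon EMR Hive
            : "MSCK REPAIR TABLE " + table;                  // Apache Hive
    }
}
```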

Ruslan


Re: Locking in HIVE : How to use locking/unlocking features using hive java API ?

2012-12-10 Thread Manish Malhotra
Thanks Ruslan,

Please see my inline comments below.

Why do you need a metadata backup? Can't you just store all the table create
statements in an init file?

MM: Because I don't want to depend on an init script that has to contain
entries for all the tables.
This backup tool should be independent of any application, and of any process
that has to be followed, such as maintaining all the tables in an init file.
Secondly, I want to combine the metadata and data backups, so that to restore
data a user can say "give me the User data for these dates".
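Keeping the metadata and data backups together could look like the manifest sketched here, assuming (hypothetically) that tables are partitioned by day; `BackupEntry`, `BackupManifest` and the paths are illustrative names, not part of any Hive API:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// One backed-up partition: the DDL needed to recreate it plus where the copy
// of its data lives. Assumes one partition per table per day (hypothetical).
final class BackupEntry {
    final String table;
    final LocalDate date;
    final String ddl;
    final String dataPath; // e.g. an HDFS, S3 or NFS location

    BackupEntry(String table, LocalDate date, String ddl, String dataPath) {
        this.table = table;
        this.date = date;
        this.ddl = ddl;
        this.dataPath = dataPath;
    }
}

// Keeps metadata and data backups together so a restore can be requested
// as "table X for dates A..B".
final class BackupManifest {
    private final List<BackupEntry> entries = new ArrayList<>();

    void add(BackupEntry entry) {
        entries.add(entry);
    }

    // Entries for the table whose date lies in [from, to], inclusive.
    List<BackupEntry> select(String table, LocalDate from, LocalDate to) {
        List<BackupEntry> result = new ArrayList<>();
        for (BackupEntry e : entries) {
            if (e.table.equals(table) && !e.date.isBefore(from) && !e.date.isAfter(to)) {
                result.add(e);
            }
        }
        return result;
    }
}
```

A request such as "User data for these dates" then maps to one `select` call, and each returned entry carries both the DDL to replay and the data location to copy back.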

If you care about partitions that have been created dynamically, you
can restore them from the data with RECOVER PARTITIONS (if using Amazon EMR) or
an analogous check command in a regular Hadoop distribution (I don't remember
its name).

MM: I don't want to go the EMR route; I will check the Hadoop/Hive-based way of
doing it.

Cheers,
Manish


Locking in HIVE : How to use locking/unlocking features using hive java API ?

2012-12-06 Thread Manish Malhotra
Hi,

I'm designing and building a backup-and-restore tool for Hive data for
disaster recovery scenarios.

I'm trying to understand the locking behavior of Hive, which currently
supports ZooKeeper for locking.

My thought process is as follows (early design):

1. Backing up the Hive metadata.
2. Backing up the data of Hive tables to S3, HDFS or NFS.
3. Restoring table(s):
   a. Data only
   b. Schema and data

To achieve the 1st task, this is the flow I have in mind:

a. Check whether there is any exclusive lock on the table whose metadata
needs to be backed up.
 If YES, don't do anything; wait and retry a configured number of times
 at a configured frequency.
 If NO, get the metadata of the table and create the Hive DDL
 statements, including the table, partitions, etc.
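The retry-until-unlocked flow for the metadata step could be sketched as follows. `ExclusiveLockCheck` is a hypothetical interface (a real tool might derive it from `SHOW LOCKS` output), and `ddlSource` stands in for whatever produces the DDL, for example reading the metastore or, in newer Hive releases, `SHOW CREATE TABLE`:

```java
import java.util.function.Supplier;

// Hypothetical: reports whether another client holds an EXCLUSIVE lock.
interface ExclusiveLockCheck {
    boolean isExclusivelyLocked(String table);
}

final class MetadataBackup {
    private final ExclusiveLockCheck locks;
    private final int maxAttempts;      // the "configured number" of tries
    private final long retryIntervalMs; // the "configured frequency"

    MetadataBackup(ExclusiveLockCheck locks, int maxAttempts, long retryIntervalMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.locks = locks;
        this.maxAttempts = maxAttempts;
        this.retryIntervalMs = retryIntervalMs;
    }

    // Returns the DDL from ddlSource once the table is not exclusively locked,
    // or throws if it stayed locked for every attempt.
    String backup(String table, Supplier<String> ddlSource) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (!locks.isExclusivelyLocked(table)) {
                return ddlSource.get();
            }
            if (attempt < maxAttempts) {
                sleep(retryIntervalMs);
            }
        }
        throw new IllegalStateException(
            table + " was still exclusively locked after " + maxAttempts + " attempts");
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }
}
```

Because the check and the metadata read are separate calls, a writer could take an exclusive lock in between; holding a shared lock around the read would close that window.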

For the 2nd task:

a. Check whether the table has any exclusive lock.
 If NOT, take a shared lock and start the copy; once done, release the
 shared lock.
 If YES, wait and retry.
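For the data copy, a `try`/`finally` around the copy keeps the shared lock from leaking when a copy fails. In this sketch `SharedLocks` is a hypothetical interface whose `tryLockShared` is assumed to fail while an exclusive lock is held, so the check and the acquire happen in one step:

```java
// Hypothetical lock API (not a Hive class).
interface SharedLocks {
    boolean tryLockShared(String table); // false while an EXCLUSIVE lock is held
    void unlockShared(String table);
}

final class DataBackup {
    private DataBackup() {}

    // Returns false when the lock could not be taken; the caller waits and
    // retries. The shared lock is released even if the copy throws.
    static boolean backup(SharedLocks locks, String table, Runnable copyFiles) {
        if (!locks.tryLockShared(table)) {
            return false;
        }
        try {
            copyFiles.run(); // e.g. copy the table's files to S3, HDFS or NFS
        } finally {
            locks.unlockShared(table);
        }
        return true;
    }
}
```

Returning `false` leaves the wait-and-retry policy to the scheduler or workflow engine that runs the job.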

For the 3rd task, restoring:

a. Data only: check whether there is any lock on the table.
 If NO, take the exclusive lock, insert the data into the table, and
 release the lock.
 If YES, wait and retry.

b. Schema and data:

Check whether there is any lock on the table/partition.
  If NO, drop and re-create the table/partitions.
  If YES, wait and retry.
 Once the schema is created:
  take the exclusive lock, insert the data, and release the lock.
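Both restore paths can be modeled against an in-memory lock table that assumes Hive-style compatibility (any number of shared holders, or a single exclusive holder). `LockTable` and `Restore` are hypothetical stand-ins, not Hive classes, and the `Runnable`s represent the real drop/create and insert work:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// In-memory stand-in for a lock manager: many SHARED holders or one EXCLUSIVE.
final class LockTable {
    private final Map<String, Integer> sharedCount = new HashMap<>();
    private final Set<String> exclusive = new HashSet<>();

    synchronized boolean anyLockHeld(String t) {
        return exclusive.contains(t) || sharedCount.containsKey(t);
    }

    synchronized boolean tryLockShared(String t) {
        if (exclusive.contains(t)) {
            return false;
        }
        sharedCount.merge(t, 1, Integer::sum);
        return true;
    }

    synchronized void unlockShared(String t) {
        // Removes the entry when the last shared holder leaves.
        sharedCount.computeIfPresent(t, (k, n) -> n > 1 ? n - 1 : null);
    }

    synchronized boolean tryLockExclusive(String t) {
        if (anyLockHeld(t)) {
            return false;
        }
        exclusive.add(t);
        return true;
    }

    synchronized void unlockExclusive(String t) {
        exclusive.remove(t);
    }
}

final class Restore {
    private Restore() {}

    // 3a. Data only: exclusive lock, insert, always release.
    static boolean dataOnly(LockTable locks, String table, Runnable insertData) {
        if (!locks.tryLockExclusive(table)) {
            return false; // some lock is held: caller waits and retries
        }
        try {
            insertData.run();
        } finally {
            locks.unlockExclusive(table);
        }
        return true;
    }

    // 3b. Schema and data: recreate the schema while no lock is held,
    // then load the data as in 3a.
    static boolean schemaAndData(LockTable locks, String table,
                                 Runnable dropAndCreate, Runnable insertData) {
        if (locks.anyLockHeld(table)) {
            return false;
        }
        dropAndCreate.run();
        return dataOnly(locks, table, insertData);
    }
}
```

In 3b the schema is recreated without holding a lock, so another client can take one before `tryLockExclusive`; `schemaAndData` then returns `false`, and a retry drops and recreates the table again.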


I'm going to run this kind of job from my scheduler / workflow engine.
I need input on the following questions:

a. Does this overall approach look good?
b. How can I explicitly take and release the different locks using the Hive API?
ref: https://cwiki.apache.org/confluence/display/Hive/Locking

If I understand correctly, according to that page Hive still doesn't support
explicit locking at the API level.
Is there any plan or patch to get this done?
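The Locking page linked above also documents explicit `LOCK TABLE`, `UNLOCK TABLE` and `SHOW LOCKS` statements, which take effect when `hive.support.concurrency` is enabled with a lock manager configured. One possible route from Java, offered as an assumption rather than a confirmed answer, is to send those statements through Hive's JDBC driver instead of calling lock-manager classes directly. This sketch only builds the statement text; `HiveLockStatements` is a hypothetical helper, and executing the strings (e.g. with `java.sql.Statement#execute`) is not shown:

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that renders the explicit-locking statements.
// Partition values are not escaped: use with trusted input only.
final class HiveLockStatements {
    enum Mode { SHARED, EXCLUSIVE }

    private HiveLockStatements() {}

    // Renders " PARTITION (k1='v1', k2='v2')", or "" when there is no spec.
    private static String partitionClause(Map<String, String> spec) {
        if (spec == null || spec.isEmpty()) {
            return "";
        }
        StringJoiner joiner = new StringJoiner(", ", " PARTITION (", ")");
        for (Map.Entry<String, String> e : spec.entrySet()) {
            joiner.add(e.getKey() + "='" + e.getValue() + "'");
        }
        return joiner.toString();
    }

    static String lock(String table, Map<String, String> spec, Mode mode) {
        return "LOCK TABLE " + table + partitionClause(spec) + " " + mode.name();
    }

    static String unlock(String table, Map<String, String> spec) {
        return "UNLOCK TABLE " + table + partitionClause(spec);
    }

    static String showLocks(String table) {
        return "SHOW LOCKS " + table;
    }
}
```

Whether a given Hive version honors these statements depends on the concurrency settings described on that page.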

I saw some classes such as *ZooKeeperHiveLock*, but I need to dig further to
see whether I can use them for locking.

Thanks for your time and effort.

Regards,
Manish