Re: Copying all Hive tables from Prod to UAT

2016-05-26 Thread Mich Talebzadeh
That is a good point, Jörn, with regard to JDBC and Hive data.

I believe you can use JDBC to get compressed data from an Oracle or
Sybase database, because decompression happens at the time of data access,
much like using an sqlplus or isql tool.

However, it is worth trying what happens when one accesses Hive data
through JDBC where the underlying table is compressed using bzip2, Snappy,
etc.
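
One way to try that is sketched below; the table name and compression
settings are made up for illustration, and the sketch assumes an ORC table
compressed with Snappy:

CREATE TABLE jdbc_compress_test (id INT, payload STRING)
STORED AS ORC
TBLPROPERTIES ('orc.compress'='SNAPPY');

INSERT INTO TABLE jdbc_compress_test VALUES (1, 'sample row');

-- Run the same query once through the Hive CLI and once through a JDBC
-- client such as beeline; the ORC/Snappy compression is handled on the
-- server side, so what travels over the JDBC connection is the
-- materialised result set.
SELECT * FROM jdbc_compress_test;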

If this is a one-off request, say copying all tables from a certain DB in Hive
in Prod to UAT, I am not sure replication will be suitable, as the request is
for a snapshot.

EXPORT/IMPORT through NAS or scp should be an option. NAS is better as it
saves the scp and copy across, with the target having enough external space
to receive the files.

A more useful approach would be to export the full Hive database in binary
format and import it on the target.

Cheers

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 26 May 2016 at 07:28, Elliot West  wrote:

> Hello,
>
> I've been looking at this recently for moving Hive tables from on-premise
> clusters to the cloud, but the principle should be the same for your
> use-case. If you wish to do this in an automated way, some tools worth
> considering are:
>
>- Hive's built in replication framework:
>https://cwiki.apache.org/confluence/display/Hive/Replication
>- Hive's IMPORT/EXPORT primitives:
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ImportExport
>- AirBnB's ReAir replication tool:
>
> https://medium.com/airbnb-engineering/reair-easy-to-use-tools-for-migrating-and-replicating-petabyte-scale-data-warehouses-5153f8a433da
>
> Elliot.
>
> On 8 April 2016 at 23:24, Ashok Kumar  wrote:
>
>> Hi,
>>
>> Anyone has suggestions how to create and copy Hive and Spark tables from
>> Production to UAT.
>>
>> One way would be to copy table data to external files and then move the
>> external files to a local target directory and populate the tables in
>> target Hive with data.
>>
>> Is there an easier way of doing so?
>>
>> thanks
>>
>>
>>
>


Re: Copying all Hive tables from Prod to UAT

2016-05-26 Thread Elliot West
Hello,

I've been looking at this recently for moving Hive tables from on-premise
clusters to the cloud, but the principle should be the same for your
use-case. If you wish to do this in an automated way, some tools worth
considering are:

   - Hive's built-in replication framework:
   https://cwiki.apache.org/confluence/display/Hive/Replication
   - Hive's IMPORT/EXPORT primitives:
   https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ImportExport
   - AirBnB's ReAir replication tool:
   
https://medium.com/airbnb-engineering/reair-easy-to-use-tools-for-migrating-and-replicating-petabyte-scale-data-warehouses-5153f8a433da

Elliot.

On 8 April 2016 at 23:24, Ashok Kumar  wrote:

> Hi,
>
> Anyone has suggestions how to create and copy Hive and Spark tables from
> Production to UAT.
>
> One way would be to copy table data to external files and then move the
> external files to a local target directory and populate the tables in
> target Hive with data.
>
> Is there an easier way of doing so?
>
> thanks
>
>
>


Re: Copying all Hive tables from Prod to UAT

2016-05-26 Thread Jörn Franke
Or use Falcon ...

I would try to avoid Spark over JDBC. JDBC is not designed for these big data
bulk operations: data has to be transferred uncompressed, and there is the
serialization/deserialization overhead of query result -> wire protocol -> Java
objects -> writing to the specific storage format, etc.
This costs more time than you may think.

> On 25 May 2016, at 18:05, Mich Talebzadeh  wrote:
> 
> There are multiple ways of doing this without relying on any vendor's release.
> 
> 1) Using  hive EXPORT/IMPORT utility
> 
> EXPORT TABLE table_or_partition TO hdfs_path;
> IMPORT [[EXTERNAL] TABLE table_or_partition] FROM hdfs_path [LOCATION 
> [table_location]];
> 2) This works for individual tables but you can easily write a generic script 
> to pick up name of tables for a given database from Hive metadata.
>  example
> 
> SELECT
>   t.owner AS Owner
> , d.NAME AS DBName
> , t.TBL_NAME AS Tablename
> , TBL_TYPE
> FROM tbls t, dbs d
> WHERE
>   t.DB_ID = d.DB_ID
> AND
>   TBL_TYPE IN ('MANAGED_TABLE','EXTERNAL_TABLE')
> ORDER BY 1,2
> 
> Then a Linux shell script will take 5 min max to create and you have full 
> control of the code. You can even do multiple EXPORT/IMPORT at the same time.
> 
> 3) Easier  to create a shared NFS mount between PROD and UAT so you can put 
> the tables data and metadata on this NFS
> 
> 4) Use a Spark shell script to get data via JDBC from the source database and 
> push schema and data into the new env. Again this is no different from 
> getting the underlying data from Oracle or Sybase database and putting in Hive
> 
> 5) Using a vendor's product to do the same. I am not sure vendors 
> parallelise this sort of thing.
> 
> HTH
> 
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  
> 
>> On 25 May 2016 at 14:50, Suresh Kumar Sethuramaswamy  
>> wrote:
>> Hi
>> 
>>   If you are using CDH, via CM , Backup->replications you could do inter 
>> cluster hive data transfer including metadata
>> 
>> Regards
>> Suresh
>> 
>> 
>>> On Wednesday, May 25, 2016, mahender bigdata  
>>> wrote:
>>> Any Document on it. 
>>> 
 On 4/8/2016 6:28 PM, Will Du wrote:
 did you try export and import statement in HQL?
 
> On Apr 8, 2016, at 6:24 PM, Ashok Kumar  wrote:
> 
> Hi,
> 
> Anyone has suggestions how to create and copy Hive and Spark tables from 
> Production to UAT.
> 
> One way would be to copy table data to external files and then move the 
> external files to a local target directory and populate the tables in 
> target Hive with data.
> 
> Is there an easier way of doing so?
> 
> thanks
> 


Re: Copying all Hive tables from Prod to UAT

2016-05-25 Thread Gopal Vijayaraghavan

> We are using HDP. Is there any feature in ambari

Apache Falcon handles data lifecycle management, not Ambari.
 
https://falcon.apache.org/0.8/HiveDR.html

Cheers,
Gopal




Re: Copying all Hive tables from Prod to UAT

2016-05-25 Thread mahender bigdata

Thanks Mich. I will look into it.


On 5/25/2016 9:05 AM, Mich Talebzadeh wrote:

There are multiple ways of doing this without relying on any vendor's release.

1) Using  hive EXPORT/IMPORT utility

EXPORT TABLE table_or_partition TO hdfs_path;
IMPORT [[EXTERNAL] TABLE table_or_partition] FROM hdfs_path
[LOCATION [table_location]];

2) This works for individual tables but you can easily write a generic 
script to pick up name of tables for a given database from Hive metadata.

 example

SELECT
  t.owner AS Owner
, d.NAME AS DBName
, t.TBL_NAME AS Tablename
, TBL_TYPE
FROM tbls t, dbs d
WHERE
  t.DB_ID = d.DB_ID
AND
  TBL_TYPE IN ('MANAGED_TABLE','EXTERNAL_TABLE')
ORDER BY 1,2

Then a Linux shell script will take 5 min max to create and you have 
full control of the code. You can even do multiple EXPORT/IMPORT at 
the same time.


3) Easier  to create a shared NFS mount between PROD and UAT so you 
can put the tables data and metadata on this NFS


4) Use a Spark shell script to get data via JDBC from the source 
database and push schema and data into the new env. Again this is no 
different from getting the underlying data from Oracle or Sybase 
database and putting in Hive


5) Using a vendor's product to do the same. I am not sure vendors 
parallelise this sort of thing.


HTH



Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw


http://talebzadehmich.wordpress.com 


On 25 May 2016 at 14:50, Suresh Kumar Sethuramaswamy wrote:


Hi

  If you are using CDH, via CM , Backup->replications you could do
inter cluster hive data transfer including metadata

Regards
Suresh


On Wednesday, May 25, 2016, mahender bigdata wrote:

Any Document on it.


On 4/8/2016 6:28 PM, Will Du wrote:

did you try export and import statement in HQL?


On Apr 8, 2016, at 6:24 PM, Ashok Kumar wrote:

Hi,

Anyone has suggestions how to create and copy Hive and Spark
tables from Production to UAT.

One way would be to copy table data to external files and
then move the external files to a local target directory and
populate the tables in target Hive with data.

Is there an easier way of doing so?

thanks











Re: Copying all Hive tables from Prod to UAT

2016-05-25 Thread mahender bigdata

We are using HDP. Is there any such feature in Ambari?


On 5/25/2016 6:50 AM, Suresh Kumar Sethuramaswamy wrote:

Hi

  If you are using CDH, via CM , Backup->replications you could do 
inter cluster hive data transfer including metadata


Regards
Suresh

On Wednesday, May 25, 2016, mahender bigdata wrote:


Any Document on it.


On 4/8/2016 6:28 PM, Will Du wrote:

did you try export and import statement in HQL?


On Apr 8, 2016, at 6:24 PM, Ashok Kumar wrote:

Hi,

Anyone has suggestions how to create and copy Hive and Spark
tables from Production to UAT.

One way would be to copy table data to external files and then
move the external files to a local target directory and populate
the tables in target Hive with data.

Is there an easier way of doing so?

thanks










Re: Copying all Hive tables from Prod to UAT

2016-05-25 Thread Mich Talebzadeh
There are multiple ways of doing this without relying on any vendor's release.

1) Using the Hive EXPORT/IMPORT utility

EXPORT TABLE table_or_partition TO hdfs_path;
IMPORT [[EXTERNAL] TABLE table_or_partition] FROM hdfs_path [LOCATION
[table_location]];
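
For instance, a minimal sketch of this on a single table; the table name
(sales) and the HDFS staging path below are made up for illustration:

-- on the PROD cluster
EXPORT TABLE sales TO '/staging/hive_export/sales';

-- copy /staging/hive_export/sales across to UAT (NAS, scp or similar),
-- then on the UAT cluster
IMPORT TABLE sales FROM '/staging/hive_export/sales';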

2) This works for individual tables, but you can easily write a generic
script to pick up the names of tables for a given database from the Hive
metastore. For example:

SELECT
  t.owner AS Owner
, d.NAME AS DBName
, t.TBL_NAME AS Tablename
, TBL_TYPE
FROM tbls t, dbs d
WHERE
  t.DB_ID = d.DB_ID
AND
  TBL_TYPE IN ('MANAGED_TABLE','EXTERNAL_TABLE')
ORDER BY 1,2

Then a Linux shell script will take 5 minutes max to create, and you have full
control of the code. You can even run multiple EXPORT/IMPORT operations at the
same time; a sketch of that idea follows below.
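
As a sketch of that idea, the metastore query above can be made to emit the
EXPORT statements directly, which the shell script then feeds to hive -e or
beeline. The database name (mydb) and staging path are made up, and the CONCAT
syntax assumes a MySQL-backed metastore:

SELECT CONCAT('EXPORT TABLE ', t.TBL_NAME,
              ' TO ''/staging/hive_export/', d.NAME, '/', t.TBL_NAME, ''';')
FROM tbls t, dbs d
WHERE
  t.DB_ID = d.DB_ID
AND
  d.NAME = 'mydb'
AND
  TBL_TYPE IN ('MANAGED_TABLE','EXTERNAL_TABLE');

The generated statements can then be run with the right database selected
(e.g. with hive --database mydb -e), one per table or several in parallel.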

3) It is easier to create a shared NFS mount between PROD and UAT so you can
put the tables' data and metadata on this NFS.

4) Use a Spark shell script to get data via JDBC from the source database and
push the schema and data into the new environment. Again, this is no different
from getting the underlying data from an Oracle or Sybase database and putting
it in Hive.

5) Using a vendor's product to do the same. I am not sure vendors parallelise
this sort of thing.

HTH



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 25 May 2016 at 14:50, Suresh Kumar Sethuramaswamy 
wrote:

> Hi
>
>   If you are using CDH, via CM , Backup->replications you could do inter
> cluster hive data transfer including metadata
>
> Regards
> Suresh
>
>
> On Wednesday, May 25, 2016, mahender bigdata 
> wrote:
>
>> Any Document on it.
>>
>> On 4/8/2016 6:28 PM, Will Du wrote:
>>
>> did you try export and import statement in HQL?
>>
>> On Apr 8, 2016, at 6:24 PM, Ashok Kumar  wrote:
>>
>> Hi,
>>
>> Anyone has suggestions how to create and copy Hive and Spark tables from
>> Production to UAT.
>>
>> One way would be to copy table data to external files and then move the
>> external files to a local target directory and populate the tables in
>> target Hive with data.
>>
>> Is there an easier way of doing so?
>>
>> thanks
>>
>>
>>
>>
>>


Re: Copying all Hive tables from Prod to UAT

2016-05-25 Thread Suresh Kumar Sethuramaswamy
Hi

  If you are using CDH, you could do inter-cluster Hive data transfer,
including metadata, via CM under Backup -> Replications.

Regards
Suresh

On Wednesday, May 25, 2016, mahender bigdata 
wrote:

> Any Document on it.
>
> On 4/8/2016 6:28 PM, Will Du wrote:
>
> did you try export and import statement in HQL?
>
> On Apr 8, 2016, at 6:24 PM, Ashok Kumar <ashok34...@yahoo.com> wrote:
>
> Hi,
>
> Anyone has suggestions how to create and copy Hive and Spark tables from
> Production to UAT.
>
> One way would be to copy table data to external files and then move the
> external files to a local target directory and populate the tables in
> target Hive with data.
>
> Is there an easier way of doing so?
>
> thanks
>
>
>
>
>


Re: Copying all Hive tables from Prod to UAT

2016-05-25 Thread mahender bigdata

Any document on it?


On 4/8/2016 6:28 PM, Will Du wrote:

did you try export and import statement in HQL?

On Apr 8, 2016, at 6:24 PM, Ashok Kumar wrote:


Hi,

Anyone has suggestions how to create and copy Hive and Spark tables 
from Production to UAT.


One way would be to copy table data to external files and then move 
the external files to a local target directory and populate the 
tables in target Hive with data.


Is there an easier way of doing so?

thanks








Re: Copying all Hive tables from Prod to UAT

2016-04-08 Thread Will Du
Did you try the EXPORT and IMPORT statements in HQL?

> On Apr 8, 2016, at 6:24 PM, Ashok Kumar  wrote:
> 
> Hi,
> 
> Anyone has suggestions how to create and copy Hive and Spark tables from 
> Production to UAT.
> 
> One way would be to copy table data to external files and then move the 
> external files to a local target directory and populate the tables in target 
> Hive with data.
> 
> Is there an easier way of doing so?
> 
> thanks
> 
> 



Copying all Hive tables from Prod to UAT

2016-04-08 Thread Ashok Kumar
Hi,
Does anyone have suggestions on how to create and copy Hive and Spark tables
from Production to UAT?
One way would be to copy table data to external files and then move the 
external files to a local target directory and populate the tables in target 
Hive with data.
Is there an easier way of doing so?
thanks