Re: Hive append

2014-03-06 Thread Nitin Pawar
Are you talking about adding new records to a table, or updating records in
an already existing table?


On Thu, Mar 6, 2014 at 1:59 PM, Raj hadoop raj.had...@gmail.com wrote:

 Query in Hive

 I tried a merge-style operation in Hive to retain the existing records and
 append the new records, instead of dropping the table and populating it
 again.

 If anyone can suggest another approach, or a way to perform the merge
 operation, that would be a great help.

-- 
Nitin Pawar


Re: Hive append

2014-03-06 Thread Raj hadoop
Hi Nitin,

Existing records should remain the same, and the new records should be
inserted into the table.


Re: Hive append

2014-03-06 Thread Nitin Pawar
You may want to look at partitioned tables and load the new data into
partitions. To me that seems like the easiest way.

If you do not have a defined partition column in your data, another
approach is to load the data into a temporary staging table and from there
load it into the partitioned table.
The catch with this approach is that it assumes the incoming data contains
no records that belong to older partitions.

I normally add an extra column to my tables, something like
data_load_date, which is my partition column. Then from the staging table I
load data into this table, with the partition set to the date on which I am
loading the new data.
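
A minimal sketch of the staging-table approach described above, written as a small Java helper that only builds the HiveQL text and does not connect to Hive. The table names (staging_events, events) and the column list are hypothetical; data_load_date is the partition column mentioned above. It assumes the target table was created with PARTITIONED BY (data_load_date STRING).

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;

/** Builds the HiveQL for appending one day's load as a new partition. */
final class AppendByLoadDate {

    /**
     * Returns an INSERT INTO statement that copies the listed columns of the
     * staging table into the partition of the target table for the given
     * load date. Other partitions are not touched, so old records remain.
     */
    static String appendStatement(String stagingTable, String targetTable,
                                  List<String> columns, LocalDate loadDate) {
        return "INSERT INTO TABLE " + targetTable
                + " PARTITION (data_load_date='" + loadDate + "')"
                + " SELECT " + String.join(", ", columns)
                + " FROM " + stagingTable;
    }

    public static void main(String[] args) {
        // Hypothetical table and column names, for illustration only.
        System.out.println(appendStatement(
                "staging_events", "events",
                Arrays.asList("id", "payload"), LocalDate.of(2014, 3, 6)));
    }
}
```

Because each load writes to its own partition, re-running a failed load only affects that day's partition.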


-- 
Nitin Pawar


RE: Setting | Verifying | Hive Query Parameters from Java

2014-03-06 Thread Garg, Rinku
Hi All,

Can anybody help me with the mail trail below?

Thanks
Rinku Garg

From: Garg, Rinku
Sent: Tuesday, March 04, 2014 5:14 PM
To: user@hive.apache.org
Subject: Setting | Verifying | Hive Query Parameters from Java

Hi All,

We have installed CDH4.2.0 and hive-0.10.0-cdh4.2.0. Both are working as 
desired.

We need to set Hive configuration parameters from Java while making a JDBC
connection.

We have written a Java program that executes queries on the Hive server and
sets some configuration properties dynamically. We are doing it as below:

CONNECTION_URL=jdbc:hive://master149:1/default

Next, we set the properties through Java as follows:

props.setProperty("hive.server2.async.exec.threads", "50");
props.setProperty("hive.server2.thrift.max.worker.threads", "500");
props.setProperty("hive.groupby.orderby.position.alias", "false");

and a Hive connection is made as given below:

hiveConnection = DriverManager.getConnection(connectionURL, props);

With the above steps a Hive connection is made using hive-jdbc, and we are
getting Hive query results as desired.

QUERY:


1.   Are we setting the Hive properties the right way? If yes, how can we
verify that?

2.   If the above is not the right way, how can we set Hive configuration
parameters from Java using JDBC?

Thanks
Rinku Garg

_
The information contained in this message is proprietary and/or confidential. 
If you are not the intended recipient, please: (i) delete the message and all 
copies; (ii) do not disclose, distribute or use the message in any manner; and 
(iii) notify the sender immediately. In addition, please be aware that any 
message addressed to our domain is subject to archiving and review by persons 
other than the intended recipient. Thank you.


Re: Data Modeling Tool

2014-03-06 Thread Joseph D Antoni
A data architect friend said the latest release of CA Erwin can handle Hive, 
but it doesn't support Postgres directly.

Thanks

Joey D'Antoni



On Wednesday, March 5, 2014 9:59 PM, Ronak Bhatt ronakb...@gmail.com wrote:
 
Hello Hive Experts

Is there any data modeling tool that you can suggest that can work with Hive 
and Postgres?

Objective: build and maintain entity definitions for Hive and Postgres
through this one tool, and build logical and physical models for the data
warehouse in the same tool.



Any pointers?


thanks, ronak

Hive unwanted location directory

2014-03-06 Thread Valluri, Sathish
We are creating an external table in Hive, and if the location path is not
present in HDFS, say /testdata (as shown below), Hive creates the '/testdata'
dummy folder.

Is there any option in Hive, or any other way, to stop it from creating dummy
directories when the location folder does not exist?

Our use case needs many temporary tables to be created dynamically, and we end
up creating many unwanted dummy directories when the data is not present on
HDFS.



CREATE EXTERNAL TABLE testTable
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.literal'='{ schema json literal')
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/testdata/';



Regards

Sathish Valluri





Re: Automatic replacement of partitions in hive

2014-03-06 Thread Bryan Jeffrey
Nitin,

#3 will not work.  msck repair table does not remove partitions if the
files associated with the partition do not exist.  We have successfully
applied #2 in our application.
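
For readers of the archive, option #2 from the quoted message (load the new partition, then drop the older one) could be sketched roughly as below. This is not the script used in our application. It is a small Java helper that only builds the two HiveQL statements. The partition column name dt, the table name and the HDFS path are hypothetical, and the retention period is a parameter.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;

/** Builds the HiveQL for a rolling window of daily partitions. */
final class RollingPartitions {

    /**
     * Returns two statements: one that loads the files under hdfsPath into
     * the partition for newDay, and one that drops the partition that is
     * retentionDays older than newDay (if it exists).
     */
    static List<String> rollStatements(String table, String hdfsPath,
                                       LocalDate newDay, int retentionDays) {
        LocalDate expired = newDay.minusDays(retentionDays);
        return Arrays.asList(
                "LOAD DATA INPATH '" + hdfsPath + "' INTO TABLE " + table
                        + " PARTITION (dt='" + newDay + "')",
                "ALTER TABLE " + table
                        + " DROP IF EXISTS PARTITION (dt='" + expired + "')");
    }

    public static void main(String[] args) {
        // Hypothetical table name and path, for illustration only.
        for (String s : rollStatements("sales", "/incoming/sales/2014-03-06",
                LocalDate.of(2014, 3, 6), 7)) {
            System.out.println(s + ";");
        }
    }
}
```

Dropping the partition through ALTER TABLE removes it from the metastore as well, which is the step that clearing the directories and running a table repair does not do.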

Regards,

Bryan Jeffrey


On Thu, Mar 6, 2014 at 5:37 AM, Nitin Pawar nitinpawar...@gmail.com wrote:

 There is no concept called automatic.

 Please wait for the Hive experts to reply before using any of my
 suggestions.

 A few options I can think of:
 1) INSERT OVERWRITE TABLE with dynamic partitions enabled, restricting
 the partition column values to the date range you want. The cost of this
 operation depends entirely on how big the table is when you are
 importing via Sqoop.
 2) Load the data into a new partition and drop the older partition using
 a Hive script. A little scripting effort is needed.
 3) Use the Hadoop command-line utilities to clear the partition
 directories from HDFS and then do a table repair. I have never heard of
 anyone using this to delete a partition; it is mostly used to recover lost
 partitions.

 On Thu, Mar 6, 2014 at 3:53 PM, Kasi Subrahmanyam
 kasisubbu...@gmail.com wrote:

 Hi,
 I have a table in Hive that holds three months of data. I have
 partitioned the data and got 90 partitions. Now, when I get the new data
 for the next day, I want to replace the one-week-old partition with the
 new one automatically.

 Can this partitioning and replacement be done using Sqoop at the same time?

 Thanks,
 Subbu




 --
 Nitin Pawar



Re: Automatic replacement of partitions in hive

2014-03-06 Thread Nitin Pawar
Thanks for clarifying that Bryan


-- 
Nitin Pawar


RE: Setting | Verifying | Hive Query Parameters from Java

2014-03-06 Thread java8964
If you want to set some Hive properties, just run the set command as it is
over your JDBC connection.
Any command sent through Hive JDBC goes to the server the same way as if you
ran set hive.server2.async.exec.threads=50; in a Hive session.
Run the command set hive.server2.async.exec.threads=50; as a SQL statement,
and it will adjust the value for your JDBC connection.
As for passing the settings as connection properties, I am not sure whether
that works in Hive JDBC. Hive JDBC is a limited JDBC implementation, so it
may not work, but I don't know for sure.
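
A minimal sketch of what "run the set command as a SQL statement" could look like in Java. The class and method names are hypothetical. Only the standard java.sql API is used. The trailing semicolon from the Hive shell is left out, on the assumption that the driver expects a single statement without a terminator.

```java
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Map;

/** Applies Hive settings by sending "set key=value" over a JDBC statement. */
final class HiveSessionSettings {

    /** Formats one setting the way it would be typed in a Hive session. */
    static String setCommand(String key, String value) {
        return "set " + key + "=" + value;
    }

    /** Sends each setting to the server as an ordinary statement, in map order. */
    static void apply(Statement stmt, Map<String, String> settings)
            throws SQLException {
        for (Map.Entry<String, String> entry : settings.entrySet()) {
            stmt.execute(setCommand(entry.getKey(), entry.getValue()));
        }
    }
}
```

With a live connection such as the hiveConnection from the original question, this would be called as HiveSessionSettings.apply(hiveConnection.createStatement(), settings) before running the queries that depend on the settings.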
Yong

  

Re: Partitions in Hive

2014-03-06 Thread Nitin Pawar
Partitioning in Hive is done on the whole column value, not on a
sub-portion of it.

If you want to separate data based on the first character, create
another column to store that value and partition on it.
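
A rough sketch of the extra-column idea, as a Java helper that only builds the HiveQL text. The table names (src, src_by_letter), the column names and the partition column name first_letter are all hypothetical. It assumes the target table was created with PARTITIONED BY (first_letter STRING) and that dynamic partitioning may be enabled for the session.

```java
import java.util.Arrays;
import java.util.List;

/** Builds HiveQL that partitions rows by the first character of a column. */
final class FirstLetterPartition {

    /**
     * Returns the session settings plus an INSERT that derives the partition
     * value from the first column in the list. In a dynamic-partition insert
     * the partition column must come last in the SELECT list.
     */
    static List<String> statements(String source, String target,
                                   List<String> columns) {
        String keyColumn = columns.get(0);
        return Arrays.asList(
                "set hive.exec.dynamic.partition=true",
                "set hive.exec.dynamic.partition.mode=nonstrict",
                "INSERT OVERWRITE TABLE " + target
                        + " PARTITION (first_letter)"
                        + " SELECT " + String.join(", ", columns)
                        + ", substr(" + keyColumn + ", 1, 1) AS first_letter"
                        + " FROM " + source);
    }

    public static void main(String[] args) {
        // Hypothetical table and column names, for illustration only.
        for (String s : statements("src", "src_by_letter",
                Arrays.asList("col1", "col2", "col3"))) {
            System.out.println(s + ";");
        }
    }
}
```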




On Thu, Mar 6, 2014 at 11:42 PM, nagarjuna kanamarlapudi 
nagarjuna.kanamarlap...@gmail.com wrote:

 Hi,

 I have a table with 3 columns in Hive.

 I want that table to be partitioned based on the first letter of column 1.
 How do we define such a partition condition in Hive?

 Regards,
 Nagarjuna K




-- 
Nitin Pawar


Re: Setting | Verifying | Hive Query Parameters from Java

2014-03-06 Thread Gordon Wang
The following two props are for HiveServer2. I don't think you can change
them in your JDBC session. I am wondering why you need to change them in
your JDBC connection.


props.setProperty("hive.server2.async.exec.threads", "50");

props.setProperty("hive.server2.thrift.max.worker.threads", "500");



You can set props in your JDBC connection with HQL like: set propA=valueA;


Also, your connection URL is for the original Hive server (HiveServer1); it
does not work for HiveServer2.

If you need to use HiveServer2, you have to use

jdbc:hive2://master149:1/default
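
As a small illustration of the URL difference, the sketch below picks the JDBC driver class from the URL scheme. The helper class and method names are hypothetical. The two driver class names are not taken from this thread; they are the ones shipped in the Hive JDBC jars for HiveServer1 and HiveServer2 respectively.

```java
/** Maps a Hive JDBC URL to the driver class that understands it. */
final class HiveJdbcDriver {

    /** Returns the driver class name for a jdbc:hive:// or jdbc:hive2:// URL. */
    static String driverClassFor(String url) {
        if (url.startsWith("jdbc:hive2://")) {
            return "org.apache.hive.jdbc.HiveDriver";        // HiveServer2
        }
        if (url.startsWith("jdbc:hive://")) {
            return "org.apache.hadoop.hive.jdbc.HiveDriver"; // HiveServer1
        }
        throw new IllegalArgumentException("Not a Hive JDBC URL: " + url);
    }

    public static void main(String[] args) {
        System.out.println(driverClassFor("jdbc:hive2://master149:1/default"));
    }
}
```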




-- 
Regards
Gordon Wang


RE: Setting | Verifying | Hive Query Parameters from Java

2014-03-06 Thread Garg, Rinku
Hi Gordon,

Thanks a lot for your reply.

The properties mentioned in the mail trail are just an example. The actual
properties that we want to set are given below:

set yarn.nodemanager.resource.memory-mb=16384;
set mapreduce.map.memory.mb=2048;
set mapreduce.reduce.memory.mb=2048;
set mapreduce.map.java.opts=-Xmx2048M;
set yarn.app.mapreduce.am.command-opts=-Xmx2048m;

Please suggest.

Thanks
Rinku Garg
