Re: Hive append
Are you talking about adding new records to a table, or updating records in an already existing table?

On Thu, Mar 6, 2014 at 1:59 PM, Raj hadoop <raj.had...@gmail.com> wrote:
> Query in Hive: I tried a merge-style operation in Hive to retain the existing records and append the new records, instead of dropping the table and repopulating it. Any help with an approach other than this, or with how to perform the merge operation, would be greatly appreciated.

-- Nitin Pawar
Re: Hive append
Hi Nitin,

The existing records should remain the same, and the new records should get inserted into the table.

On Thu, Mar 6, 2014 at 2:11 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
> Are you talking about adding new records to a table, or updating records in an already existing table?
>
> On Thu, Mar 6, 2014 at 1:59 PM, Raj hadoop <raj.had...@gmail.com> wrote:
>> Query in Hive: I tried a merge-style operation in Hive to retain the existing records and append the new records, instead of dropping the table and repopulating it. Any help with an approach other than this, or with how to perform the merge operation, would be greatly appreciated.
>
> -- Nitin Pawar
Re: Hive append
You may want to look at partitioned tables and load data into partitions; to me that seems like the easiest way. If you do not have a defined partition column in your data, then another approach is to load the data into a temporary staging table and from there load it into the partitioned table. The catch with this approach is that the incoming data must not contain data for older partitions. I normally add an extra column to my tables, something like data_load_date, which is my partition column. From the staging table I then load data into the main table, with the partition value being the date on which I am loading the new data.

On Thu, Mar 6, 2014 at 2:30 PM, Raj hadoop <raj.had...@gmail.com> wrote:
> Hi Nitin,
>
> The existing records should remain the same, and the new records should get inserted into the table.
>
> On Thu, Mar 6, 2014 at 2:11 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
>> [...]

-- Nitin Pawar
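A minimal sketch of the staging-table approach described above (the table and column names here are made up for illustration, not from the original mail):

```sql
-- Hypothetical target table, partitioned by the load date.
CREATE TABLE sales (id INT, amount DOUBLE)
PARTITIONED BY (data_load_date STRING);

-- Staging table holding only the newly arrived data.
CREATE TABLE sales_staging (id INT, amount DOUBLE);

-- Append the staged rows into a fresh partition;
-- existing partitions (and records) are left untouched.
INSERT INTO TABLE sales PARTITION (data_load_date = '2014-03-06')
SELECT id, amount FROM sales_staging;
```

Since each day's load lands in its own partition, rerunning a day's load only affects that one partition.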
RE: Setting | Verifying | Hive Query Parameters from Java
Hi All,

Can anybody help me with the mail trail below?

Thanks
Rinku Garg

From: Garg, Rinku
Sent: Tuesday, March 04, 2014 5:14 PM
To: user@hive.apache.org
Subject: Setting | Verifying | Hive Query Parameters from Java

Hi All,

We have installed CDH4.2.0 and hive-0.10.0-cdh4.2.0. Both are working as desired. We need to set Hive configuration parameters from Java while making a JDBC connection. We have written a Java program to execute queries on the Hive server with some configuration properties set dynamically. We are doing it as below:

CONNECTION_URL=jdbc:hive://master149:1/default

Next, we use the following method calls to set properties through Java:

props.setProperty("hive.server2.async.exec.threads", "50");
props.setProperty("hive.server2.thrift.max.worker.threads", "500");
props.setProperty("hive.groupby.orderby.position.alias", "false");

and a Hive connection is made as given below:

hiveConnection = DriverManager.getConnection(connectionURL, props);

With the above steps, a Hive connection is made using hive-jdbc and we get query results as desired.

QUERY:
1. Are we setting the Hive properties correctly, and if so, how can we verify that?
2. If the above is not the right way, how can we set Hive configuration parameters from Java using JDBC?

Thanks
Rinku Garg

_ The information contained in this message is proprietary and/or confidential. If you are not the intended recipient, please: (i) delete the message and all copies; (ii) do not disclose, distribute or use the message in any manner; and (iii) notify the sender immediately. In addition, please be aware that any message addressed to our domain is subject to archiving and review by persons other than the intended recipient. Thank you.
Re: Data Modeling Tool
A data architect friend said the latest release of CA ERwin can handle Hive, but it doesn't support Postgres directly.

Thanks
Joey D'Antoni

On Wednesday, March 5, 2014 9:59 PM, Ronak Bhatt <ronakb...@gmail.com> wrote:
> Hello Hive Experts,
>
> Is there any data modeling tool you can suggest that works with both Hive and Postgres? Objective: build and maintain entity definitions for Hive and Postgres through this one tool, and build logical and physical models for the data warehouse in the same tool. Any pointers?
>
> thanks, ronak
Hive unwanted location directory
We are creating an external table in Hive, and if the location path is not present in HDFS, say /testdata (as shown below), Hive creates the '/testdata' dummy folder. Is there any option in Hive, or any other way, to stop it from creating dummy directories when the location folder does not exist? Our use case requires many temporary tables to be created dynamically, and we end up with many unwanted dummy directories when the data is not present in HDFS.

CREATE EXTERNAL TABLE testTable
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.literal'='{ schema json literal')
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/testdata/';

Regards
Sathish Valluri
Re: Automatic replacement of partitions in hive
Nitin,

#3 will not work. 'msck repair table' does not remove partitions if the files associated with the partition do not exist. We have successfully applied #2 in our application.

Regards,
Bryan Jeffrey

On Thu, Mar 6, 2014 at 5:37 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
> There is no concept called "automatic". Please wait for expert Hive gurus to reply before using any of my suggestions. A few options I can think of:
>
> 1) Insert overwrite the table with dynamic partitions enabled, restricting the partition column values to the date range you want. The cost of this operation depends entirely on how big the table is when you are importing via sqoop.
> 2) Load the data into a new partition and drop the older partition using a Hive script; a little bit of scripting effort is needed.
> 3) Use Hadoop command line utilities to clear the partition directories from HDFS and then do a table repair. I have never heard of anyone using this to delete partitions; it is mostly used to recover lost partitions.
>
> On Thu, Mar 6, 2014 at 3:53 PM, Kasi Subrahmanyam <kasisubbu...@gmail.com> wrote:
>> Hi,
>>
>> I have a table in Hive with three months of data, partitioned into 90 partitions. When I get the new data the next day, I want to automatically replace the partition that is one week old with the new one. Can this partitioning and replacement be done with sqoop at the same time?
>>
>> Thanks, Subbu
>
> -- Nitin Pawar
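Option #2 above can be sketched in a short Hive script (the table and partition names are hypothetical, chosen only to illustrate the load-then-drop pattern):

```sql
-- Load the new day's data into its own partition of a
-- hypothetical table partitioned by day...
LOAD DATA INPATH '/incoming/2014-03-06'
INTO TABLE events PARTITION (day = '2014-03-06');

-- ...then drop the partition that is now more than a week old.
-- IF EXISTS keeps the script idempotent if it is re-run.
ALTER TABLE events DROP IF EXISTS PARTITION (day = '2014-02-27');
```

A daily cron job that substitutes the two dates before invoking the script would give the "automatic" replacement the original question asks about.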
Re: Automatic replacement of partitions in hive
Thanks for clarifying that, Bryan.

On Thu, Mar 6, 2014 at 7:55 PM, Bryan Jeffrey <bryan.jeff...@gmail.com> wrote:
> Nitin,
>
> #3 will not work. 'msck repair table' does not remove partitions if the files associated with the partition do not exist. We have successfully applied #2 in our application.
>
> Regards,
> Bryan Jeffrey
>
> On Thu, Mar 6, 2014 at 5:37 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
>> [...]

-- Nitin Pawar
RE: Setting | Verifying | Hive Query Parameters from Java
If you want to set some Hive properties, just run them as statements in your JDBC connection. Any command sent over Hive JDBC goes to the server exactly as if you ran "set hive.server2.async.exec.threads=50;" in a Hive session. Run "set hive.server2.async.exec.threads=50;" as a SQL statement and it will adjust the value for your JDBC connection. As for setting the properties through the connection's Properties object, I am not sure that works in Hive JDBC. Hive JDBC is a limited JDBC implementation, so it may not work, but I don't know for sure.

Yong

From: rinku.g...@fisglobal.com
To: user@hive.apache.org
Subject: RE: Setting | Verifying | Hive Query Parameters from Java
Date: Thu, 6 Mar 2014 11:12:52 +
> Hi All,
>
> Can anybody help me with the mail trail below?
> [...]
>
> Thanks
> Rinku Garg
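Yong's suggestion — issuing SET commands as ordinary statements over the JDBC connection — also answers the verification question, since SET with no value echoes the current setting back (shown here with one of the properties from the original mail):

```sql
-- Set the property for this JDBC session.
SET hive.groupby.orderby.position.alias=false;

-- Issue SET with no value to echo the current setting back,
-- confirming whether the property actually took effect.
SET hive.groupby.orderby.position.alias;
```

Each statement would be sent through its own Statement.execute() call on the JDBC connection.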
Re: Partitions in Hive
Partitioning in Hive is done on the full column value, not on a sub-portion of the value. If you want to separate data based on the first character, then create another column to store that value.

On Thu, Mar 6, 2014 at 11:42 PM, nagarjuna kanamarlapudi <nagarjuna.kanamarlap...@gmail.com> wrote:
> Hi,
>
> I have a table with 3 columns in Hive. I want that table to be partitioned based on the first letter of column 1. How do we define such a partition condition in Hive?
>
> Regards,
> Nagarjuna K

-- Nitin Pawar
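The extra-column approach can be sketched with a dynamic-partition insert that derives the first letter via substr() (table and column names here are hypothetical):

```sql
-- Allow dynamic partitioning for this session.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Hypothetical partitioned copy of the 3-column source table.
CREATE TABLE letters_part (col1 STRING, col2 STRING, col3 STRING)
PARTITIONED BY (first_letter STRING);

-- Populate it, deriving each row's partition value from the
-- first character of col1 (Hive's substr is 1-indexed).
INSERT OVERWRITE TABLE letters_part PARTITION (first_letter)
SELECT col1, col2, col3, substr(col1, 1, 1) AS first_letter
FROM letters_src;
```

Queries that filter on the first letter can then prune to a single partition via WHERE first_letter = 'a'.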
Re: Setting | Verifying | Hive Query Parameters from Java
The two following props are for HiveServer2; I don't think you can change them in your JDBC session, and I am wondering why you need to change them in your JDBC connection at all:

props.setProperty("hive.server2.async.exec.threads", "50");
props.setProperty("hive.server2.thrift.max.worker.threads", "500");

You can set props in your JDBC connection with HQL like "set propA=valueA;". Also, your connection URL is for the original Hive server; it does not work for HiveServer2. If you need to use HiveServer2, you have to use jdbc:hive2://master149:1/default.

On Tue, Mar 4, 2014 at 7:43 PM, Garg, Rinku <rinku.g...@fisglobal.com> wrote:
> Hi All,
>
> We have installed CDH4.2.0 and hive-0.10.0-cdh4.2.0. Both are working as desired. We need to set Hive configuration parameters from Java while making a JDBC connection.
> [...]

-- Regards
Gordon Wang
RE: Setting | Verifying | Hive Query Parameters from Java
Hi Gordon,

Thanks a lot for your reply. The properties mentioned in the mail trail are just an example. The actual properties we want to set are as given below:

set yarn.nodemanager.resource.memory-mb=16384;
set mapreduce.map.memory.mb=2048;
set mapreduce.reduce.memory.mb=2048;
set mapreduce.map.java.opts=-Xmx2048M;
set yarn.app.mapreduce.am.command-opts=-Xmx2048m;

Please suggest.

Thanks
Rinku Garg

From: Gordon Wang [mailto:gw...@gopivotal.com]
Sent: Friday, March 07, 2014 11:49 AM
To: user@hive.apache.org
Subject: Re: Setting | Verifying | Hive Query Parameters from Java
> The two following props are for HiveServer2; I don't think you can change them in your JDBC session.
> [...]
> -- Regards
> Gordon Wang
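Following the earlier replies in this thread, these can be issued as ordinary statements over the JDBC connection, one per execute() call. This is a sketch: whether every YARN/MapReduce property can actually be overridden per-session depends on the cluster configuration (some properties may be marked final on the cluster side).

```sql
-- Each line is sent as its own statement over the JDBC connection
-- and applies to the jobs launched by this session's queries.
set mapreduce.map.memory.mb=2048;
set mapreduce.reduce.memory.mb=2048;
set mapreduce.map.java.opts=-Xmx2048M;

-- Echo a property back (no value) to verify the setting took hold.
set mapreduce.map.memory.mb;
```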