How to configure Hive intermediate result file format?
Hi, is there any configuration that can change the Hive intermediate result file format (the format used for data passed between MR jobs)? I think the default is currently SequenceFile. I am not sure whether I can set it to something else, such as RCFile. The Hive version I am using is 0.9.0. Thanks.
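For reference, a sketch of the settings that look relevant (the property names exist in Hive releases of this era, but whether RCFile is an accepted value for intermediate results in 0.9.0 should be verified against your release's hive-default.xml):

```sql
-- Format used for a query's intermediate results (default SequenceFile in older releases).
SET hive.query.result.fileformat=RCFile;

-- Separately, intermediate data passed between MR jobs can be compressed:
SET hive.exec.compress.intermediate=true;
SET hive.intermediate.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
```

Note that compressing the intermediate data is often a bigger win than changing its container format, since the files are written once and read once.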
SerDe for Fixed Columns?
Hi! Sorry to ask such a basic question, but how do I import data from a text file with fixed-width columns (no delimiter)? I.e., column 1 is characters 1-6, column 2 is characters 7-11, column 3 is character 12, column 4 is character 13, and so forth. I'd be surprised if no one has written a SerDe for this.

The manual talks about a MetadataTypedColumnsetSerDe, a ThriftSerDe, a DynamicSerDe, and SerDes for JSON, Avro, and ORC. I also searched Bing for "site:mail-archives.apache.org/mod_mbox/hive-user/ serde fixed", but that didn't appear to turn up anything relevant.

FWIW, the dataset is the National Immunization Survey: http://www.cdc.gov/nchs/nis/data_files.htm

DISCLAIMER == This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
Using Cluster by to improve Group by Performance
Hi, I have a question: could I use the CLUSTER BY clause in a subquery to improve the performance of a GROUP BY query in Hive?

Let's say I have a table A with columns col1..col5 (all strings), and the table is not clustered. Now I'm trying to run the query below:

  select col1, col2, col3, col4, concat_ws(',', collect_set(col5))
  from A
  group by col1, col2, col3, col4

Would the following rewrite optimize the query above, and if not, what is the best practice for optimizing it? Assume only col1 and col2 are the uniquely identifying columns:

  select ct.col1, ct.col2, ct.col3, ct.col4, concat_ws(',', collect_set(ct.col5))
  from (
    select col1, col2, col3, col4, col5
    from A
    cluster by col1, col2
  ) ct
  group by ct.col1, ct.col2, ct.col3, ct.col4

Thanks for your responses.
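One way to check whether the CLUSTER BY rewrite actually changes anything is to compare the plans Hive produces for the two versions (a sketch using the table and column names from the question; table A is assumed to exist):

```sql
-- Run EXPLAIN on both the plain GROUP BY and the CLUSTER BY rewrite and
-- compare the stage plans. If the subquery merely adds an extra shuffle
-- stage before the aggregation, the rewrite is unlikely to help.
EXPLAIN
SELECT col1, col2, col3, col4, concat_ws(',', collect_set(col5))
FROM A
GROUP BY col1, col2, col3, col4;
```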
Re: SerDe for Fixed Columns?
Also, you could use the regex SerDe with a regexp like (.{10})(.{3})(.{4}), etc. (1st column: 10 characters, 2nd column: 3 characters, and so on).

On Mon, Oct 28, 2013 at 1:03 PM, Nitin Pawar nitinpawar...@gmail.com wrote:
I am not aware of any such SerDe. You can wait until others reply, or if you are in a hurry, load the data into a table as a single column and then use the substring method to select and load the data into the proper table.

On Tue, Oct 29, 2013 at 1:18 AM, P Reeder p_ree...@persistentsys.com wrote:
Hi! Sorry to ask such a basic question, but how do I import data from a text file with fixed columns (no delimiter)? ...

-- Nitin Pawar

-- Roberto Congiu - Data Engineer - OpenX
tel: +1 626 466 1141
Good judgement comes with experience. Experience comes with bad judgement.
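A sketch of the RegexSerDe approach for the fixed-width layout in the original question (table and column names here are made up; in Hive 0.x the contrib RegexSerDe ships in the hive-contrib jar and requires every column to be typed STRING, so cast later as needed):

```sql
-- Assumes the hive-contrib jar is available; adjust the path for your install.
ADD JAR /path/to/hive-contrib.jar;

CREATE TABLE nis_fixed (
  col1 STRING,  -- characters 1-6
  col2 STRING,  -- characters 7-11
  col3 STRING,  -- character 12
  col4 STRING   -- character 13
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "(.{6})(.{5})(.)(.)"
)
STORED AS TEXTFILE;
```

Each capturing group in input.regex maps to one column in declaration order, so the group widths must match the fixed-column layout exactly.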
request Hive wiki write access
Hi, I would like write access to the Hive wiki to add documentation on how to use vectorized queries. Can the owner please add me? I got no reply when I sent this to user@hive.apache.org before. Thanks, Eric

From: Eric Hanson (SQL SERVER) [mailto:eric.n.han...@microsoft.com]
Sent: Thursday, October 17, 2013 5:30 PM
To: user@hive.apache.org
Subject: request Hive wiki write access

Hi, Can I please have write access to the Hive wiki? I'm writing per the instructions here: https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki. Thanks, Eric
Re: [ANNOUNCE] New Hive PMC Members - Thejas Nair and Brock Noland
Congratulations Brock and Thejas!

On 10/24/13 3:10 PM, Carl Steinbach c...@apache.org wrote:
I am pleased to announce that Thejas Nair and Brock Noland have been elected to the Hive Project Management Committee. Please join me in congratulating Thejas and Brock! Thanks. Carl
Re: request Hive wiki write access
Hi Eric, I've added you as a contributor to the Hive wiki. Thanks, Ashutosh

On Mon, Oct 28, 2013 at 4:39 PM, Eric Hanson (SQL SERVER) eric.n.han...@microsoft.com wrote:
Hi, I would like write access to the Hive wiki to add documentation on how to use vectorized queries. Can the owner please add me? ...
Re: request Hive wiki write access
Could you please also add me? olorinb...@gmail.com. I wanted to add details about LDAP integration. -Mikhail

2013/10/28, Ashutosh Chauhan hashut...@apache.org:
Hi Eric, Added you as a contributor to Hive wiki. Thanks, Ashutosh ...

-- Thanks, Michael Antonov
Re: request Hive wiki write access
Hi Mikhail, sure, but first you need to create an account on https://cwiki.apache.org/confluence/display/Hive/Home. Once you have done that, let me know your cwiki id and I will add you as a contributor. Ashutosh

On Mon, Oct 28, 2013 at 5:55 PM, Mikhail Antonov olorinb...@gmail.com wrote:
olorinb...@gmail.com
Re: request Hive wiki write access
A third request as well: I would like to add information about HiveServer2 client libraries (Ruby, Node, Python). bradruder...@gmail.com Thanks, Brad

On Mon, Oct 28, 2013 at 5:55 PM, Mikhail Antonov olorinb...@gmail.com wrote:
Could you please also add me? olorinb...@gmail.com. I wanted to add details about LDAP integration. -Mikhail ...