How to configure Hive intermediate result file format?

2013-10-28 Thread Shangzhong zhu
Hi, Is there any configuration that can change the Hive intermediate result
file format (between MR jobs)?

I think the default right now is SequenceFile, and I am not sure whether it
can be set to something else, such as RCFile.

The Hive version I am using is 0.9.0

Thanks.


SerDe for Fixed Columns?

2013-10-28 Thread P Reeder
Hi! Sorry to ask such a basic question, but

How do I import data from a text file with fixed columns (no delimiter)? I.e.,
column 1 is characters 1-6, column 2 is characters 7-11, column 3 is character
12, column 4 is character 13, and so forth? I'd be surprised if no one has
written a SerDe for this.

The manual talks about a MetadataTypedColumnsetSerDe, a ThriftSerDe, a 
DynamicSerDe, and SerDes for JSON, Avro and ORC.  I also searched Bing for 
site:mail-archives.apache.org/mod_mbox/hive-user/ serde fixed, but that 
didn't appear to have anything relevant.

FWIW, the dataset is the National Immunization Survey: 
http://www.cdc.gov/nchs/nis/data_files.htm

DISCLAIMER
==
This e-mail may contain privileged and confidential information which is the 
property of Persistent Systems Ltd. It is intended only for the use of the 
individual or entity to which it is addressed. If you are not the intended 
recipient, you are not authorized to read, retain, copy, print, distribute or 
use this message. If you have received this communication in error, please 
notify the sender and delete all copies of this message. Persistent Systems 
Ltd. does not accept any liability for virus infected mails.



Using Cluster by to improve Group by Performance

2013-10-28 Thread KayVajj
Hi,

I have a question about whether I can use the CLUSTER BY clause in a subquery
to improve the performance of a GROUP BY query in Hive.

Let's say I have a table A with columns col1..col5 (all strings), and the
table is not clustered.

Now I'm trying to run the query below:

select
 col1,
 col2,
 col3,
 col4,
 concat_ws(',', collect_set(col5))
 from A
 group by
 col1,
 col2,
 col3,
 col4



Would the query below be faster than the one above? If not, what is the best
practice for optimizing this query, assuming only col1 and col2 are the
uniquely identifying columns?




select
 ct.col1,
 ct.col2,
 ct.col3,
 ct.col4,
 concat_ws(',', collect_set(ct.col5))
 from
 (
 select
 col1,
 col2,
 col3,
 col4,
 col5
 from A
 cluster by col1, col2
 ) ct
 group by
 ct.col1,
 ct.col2,
 ct.col3,
 ct.col4


Thanks for your responses.


Re: SerDe for Fixed Columns?

2013-10-28 Thread Roberto Congiu
Also, you could use the RegexSerDe with a regular expression like
(.{10})(.{3})(.{4}) etc. (1st column - 10 characters, 2nd column - 3
characters, and so on).
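
For the widths in the original question (columns of 6, 5, 1 and 1 characters), a table definition along these lines might work. This is only a sketch: the table name nis_fixed and the column names are made up for illustration, and every column has to be declared STRING. The class name used here is the built-in RegexSerDe available from Hive 0.10 onward; older releases ship it in hive-contrib as org.apache.hadoop.hive.contrib.serde2.RegexSerDe, which needs the contrib jar added to the session first.

```sql
-- Hypothetical table: the names are placeholders, not taken from the NIS codebook.
-- Capture groups map to columns in order:
--   (.{6}) -> col1 (characters 1-6)
--   (.{5}) -> col2 (characters 7-11)
--   (.)    -> col3 (character 12)
--   (.)    -> col4 (character 13)
-- The trailing .* absorbs the rest of the line, because the pattern has to
-- match the whole row.
CREATE TABLE nis_fixed (
  col1 STRING,
  col2 STRING,
  col3 STRING,
  col4 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "(.{6})(.{5})(.)(.).*"
)
STORED AS TEXTFILE;
```

Rows that do not match the pattern (for example, lines shorter than 13 characters) should come back with NULL columns rather than fail the query, so it is worth counting NULLs after loading.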


On Mon, Oct 28, 2013 at 1:03 PM, Nitin Pawar nitinpawar...@gmail.com wrote:

 I am not aware of any such serde.

 You can wait for others to reply, or, if you are in a hurry, load the data
 into a table as a single column and then use the substring function to
 select and load the data into the proper table.
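
The single-column route suggested above could look roughly like this. It is a sketch under assumptions: the table names, column names and input path are placeholders, and the widths are the ones given in the original question.

```sql
-- Sketch only: nis_lines, nis_parsed and the input path are hypothetical.
-- Stage each input line as one STRING column. With the default text format
-- this works as long as the data contains no Ctrl-A field delimiter characters.
CREATE TABLE nis_lines (line STRING);

LOAD DATA LOCAL INPATH '/tmp/nis_sample.dat' INTO TABLE nis_lines;

-- substr(str, start, length) uses 1-based start positions.
CREATE TABLE nis_parsed AS
SELECT
  substr(line, 1, 6)  AS col1,
  substr(line, 7, 5)  AS col2,
  substr(line, 12, 1) AS col3,
  substr(line, 13, 1) AS col4
FROM nis_lines;
```

Numeric fields can be wrapped in CAST(... AS INT) in the same SELECT if typed columns are wanted in the final table.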



 --
 Nitin Pawar




-- 
--
Good judgement comes with experience.
Experience comes with bad judgement.
--
Roberto Congiu - Data Engineer - OpenX
tel: +1 626 466 1141


request Hive wiki write access

2013-10-28 Thread Eric Hanson (SQL SERVER)
Hi,

I would like write access to the Hive wiki to add documentation on how to use
vectorized query execution. Can the owner please add me? I got no reply before
when sending to user@hive.apache.org.

Thanks,
Eric


From: Eric Hanson (SQL SERVER) [mailto:eric.n.han...@microsoft.com]
Sent: Thursday, October 17, 2013 5:30 PM
To: user@hive.apache.org
Subject: request Hive wiki write access

Hi,

Can I please have write access to the Hive wiki? I'm writing per the 
instructions here: 
https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki.

Thanks,
Eric


Re: [ANNOUNCE] New Hive PMC Members - Thejas Nair and Brock Noland

2013-10-28 Thread Chris Drome
Congratulations Brock and Thejas!

On 10/24/13 3:10 PM, Carl Steinbach c...@apache.org wrote:

I am pleased to announce that Thejas Nair and Brock Noland have been
elected to the Hive Project Management Committee. Please join me in
congratulating Thejas and Brock!

Thanks.

Carl



Re: request Hive wiki write access

2013-10-28 Thread Ashutosh Chauhan
Hi Eric,

Added you as a contributor to Hive wiki.

Thanks,
Ashutosh





Re: request Hive wiki write access

2013-10-28 Thread Mikhail Antonov
Could you please also add me? olorinb...@gmail.com

I wanted to add details about LDAP integration

-Mikhail


-- 
Thanks,
Michael Antonov


Re: request Hive wiki write access

2013-10-28 Thread Ashutosh Chauhan
Hi Mikhail,

Sure, but you first need to create an account on
https://cwiki.apache.org/confluence/display/Hive/Home. Once you have done
that, let me know your cwiki id and I will add you as a contributor.

Ashutosh
On Mon, Oct 28, 2013 at 5:55 PM, Mikhail Antonov olorinb...@gmail.com wrote:

 olorinb...@gmail.com



Re: request Hive wiki write access

2013-10-28 Thread Brad Ruderman
Third request here as well. I would like to add information about the
HiveServer2 client libraries (Ruby, Node, Python).

bradruder...@gmail.com

Thanks,
Brad

