Running Hive 0.12.0 with Hadoop 2.2

2013-11-13 Thread Bill Q
Hi, Does Hive 0.12 run on Hadoop 2.2? If it does, is there anything special about installing or running Hive? Will Hive start an M/R ApplicationMaster by itself, or should I start one for it? Many thanks. Bill

Re: Re: config hive authorization (hive with kerberos and remote metastore)

2013-11-13 Thread david1990...@163.com
Whatever I grant to user hadoop, I still cannot do 'select', even if I change the database or use beeline. Can anyone configure Hive authorization successfully with a remote metastore? From: Mikhail Antonov Date: 2013-11-14 13:57 To: user Subject: Re: config hive authorization (hive with kerberos and rem
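For context, "authorize" here presumably means the legacy Hive GRANT statements; a minimal sketch, assuming a hypothetical table name:

    -- Legacy (pre-SQL-standard) Hive authorization; some_table is a placeholder.
    GRANT SELECT ON TABLE some_table TO USER hadoop;
    -- Verify what the user actually holds:
    SHOW GRANT USER hadoop ON TABLE some_table;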

Re: Hive skewed tables

2013-11-13 Thread Nitin Pawar
How did you check that it's looking at all files inside the partition? If you want to further restrict which files are accessed, you can bucket them as well. That way you really don't have to worry about which data is skewed and can let the framework handle it. On Thu, Nov 14, 2013 at 11:16 AM, Rajesh B
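A minimal sketch of the bucketing idea, with hypothetical table and column names:

    -- Rows are hashed on id into 32 files per partition, so operations keyed
    -- on id only need to touch the matching buckets.
    CREATE TABLE addresses_bucketed (id BIGINT, address STRING)
    PARTITIONED BY (dt STRING)
    CLUSTERED BY (id) INTO 32 BUCKETS;

    -- In Hive of this vintage, inserts honor bucketing only with this set:
    SET hive.enforce.bucketing = true;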

Re: config hive authorization (hive with kerberos and remote metastore)

2013-11-13 Thread Mikhail Antonov
Did you try connecting from the beeline console? Also, that happens on the default database; what happens if you try to create a new database? -Mikhail 2013/11/13, david1990...@163.com : > Hive is configured with a remote metastore and kerberos, and it works fine. > > But now, I want to config hive a
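For reference, a Kerberized beeline connection typically looks something like this (host, port, and principal are placeholders):

    beeline -u "jdbc:hive2://hs2-host:10000/default;principal=hive/_HOST@EXAMPLE.COM"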

config hive authorization (hive with kerberos and remote metastore)

2013-11-13 Thread david1990...@163.com
Hive is configured with a remote metastore and Kerberos, and it works fine. But now I want to configure Hive authorization, and I modified hive-site.xml like this:

    <property>
      <name>hive.security.authorization.enabled</name>
      <value>true</value>
      <description>enable or disable the hive client authorization</description>
    </property>
    <property>
      <name>hive.security.authorization.createtable.owner.

Re: Hive skewed tables

2013-11-13 Thread Rajesh Balamohan
Thanks Nitin. I have only one partition in this table for testing. I thought that within the partition it would scan only certain files, based on the skewed fields. However, it is scanning the entire data within the partition. On Nov 14, 2013 9:38 AM, "Nitin Pawar" wrote: > In my understanding, > when

Re: ORC Tuning - Examples?

2013-11-13 Thread Yin Huai
Hi John, I have not played with the stride length. Based on my understanding of the code, since the stride length determines the number of rows between index entries, decreasing the stride length gives you more fine-grained indexes, which can potentially help you skip more unnecessary ro
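The stride is set per table through ORC table properties; a minimal sketch, with a hypothetical table name and a purely illustrative value (the default stride is 10,000 rows):

    -- Smaller row index stride: more index entries, finer-grained row skipping.
    CREATE TABLE events_orc (id BIGINT, payload STRING)
    STORED AS ORC
    TBLPROPERTIES ("orc.row.index.stride" = "5000");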

Re: Hive skewed tables

2013-11-13 Thread Nitin Pawar
In my understanding, when you say it is scanning the entire dataset, it is looking at all your partitions, because your data has been partitioned by the date column. A skewed table is a table where separate files are created for each of your skewed keys in all the partitions. So for your query i
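For reference, a skewed (list-bucketed) table is declared roughly like this; the names and skew values are hypothetical:

    -- Each listed id gets its own directory per partition (STORED AS
    -- DIRECTORIES), so a filter on one of those ids can prune the others.
    CREATE TABLE addresses_skewed (id BIGINT, address STRING)
    PARTITIONED BY (dt STRING)
    SKEWED BY (id) ON (1001, 1002)
    STORED AS DIRECTORIES;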

Hive skewed tables

2013-11-13 Thread Rajesh Balamohan
Hi All, I have the following skewed table "addresses_1":

    select id, count(*) c from addresses_1 group by id order by c desc limit 10;
    1426246531554806 198477395958492 102641838220181 138947865211331 156483436193429 96411677179771 210082076168033 800174765152421 1391

Re: ORC Tuning - Examples?

2013-11-13 Thread John Omernik
Yin - Fantastic! That is exactly the type of explanation of settings I'd like to see: more than just what a setting does, but the tradeoffs and how things are applied in the real world. Have you played with the stride length at all? On Wed, Nov 13, 2013 at 1:13 PM, Yin Huai wrote: > Hi John, > > He

Re: ORC Tuning - Examples?

2013-11-13 Thread Yin Huai
Hi John, Here is my experience with the stripe size. For a given table, when the stripe size is increased, the size of a column in a stripe increases, which means the ORC reader can read a column from disk more efficiently, because the reader can sequentially read more data (assuming the read
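Stripe size is likewise a per-table ORC property; a minimal sketch, with a hypothetical table name and a purely illustrative value:

    -- Larger stripes mean longer sequential reads per column at scan time.
    CREATE TABLE events_orc_wide (id BIGINT, payload STRING)
    STORED AS ORC
    TBLPROPERTIES ("orc.stripe.size" = "268435456");  -- 256 MB, in bytes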

Compression for a HDFS text file - Hive External Partition Table

2013-11-13 Thread Raj Hadoop
Hi,
1) My requirement is to load a file (a tar.gz file which has multiple tab-separated-values files; one of them is the main file, with huge data, about 10 GB per day) into an externally partitioned Hive table.
2) What I am doing: I have automated the process by extracting
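A rough sketch of the external-partition pattern being described; all paths, names, and the partition column are assumptions. Hive reads gzipped text files in a partition directory transparently (each .gz file is simply not splittable):

    -- Hypothetical external table over daily gzipped TSV drops.
    CREATE EXTERNAL TABLE daily_main (col1 STRING, col2 STRING)
    PARTITIONED BY (load_date STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE
    LOCATION '/data/daily_main';

    -- Register one day's directory (containing e.g. main.tsv.gz) as a partition.
    ALTER TABLE daily_main ADD PARTITION (load_date = '2013-11-13')
    LOCATION '/data/daily_main/2013-11-13';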

Converting array in Flume avro default output to string in Hive

2013-11-13 Thread Deepak Subhramanian
Hi, I am using Flume to generate events in the default Flume avro output format. Bytes in the avro schema are stored as array in Hive when I use the AvroSerDe. How do I convert the array to a string so I can read the Flume body data? I am using Hive version 0.10. CREATE external TABLE flume_avro_test ROW F
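A hedged sketch of the conversion, with two assumptions: that the body column surfaces as Hive BINARY (the mapping newer AvroSerDe versions use for Avro bytes), and that decode() is available, which is only true from Hive 0.12 onward. On 0.10, where the column is an array of tinyints, a custom UDF or an upgrade is the usual route:

    -- Assumes body is BINARY; decode() exists from Hive 0.12.
    SELECT decode(body, 'UTF-8') AS body_text
    FROM flume_avro_test;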