Job killed on HiveServer2 restart

2014-03-11 Thread Ashu Pachauri
We use Hive 0.12 and are planning to use HiveServer2 with Cloudera Hue. The
scenario is this: users frequently add Hive UDFs, which requires frequent Hive
deployments. To pick up these changes, we need to restart HiveServer2.
When we submit a query to HiveServer2, the job is killed whenever we
restart HiveServer2. So, basically, we find no way to keep a query alive
while picking up these Hive UDF changes. Is there a way to persist a query
across an HS2 restart, or to pick up new changes without restarting HS2?
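One possible workaround (a sketch; the jar path, class name, and table name below are hypothetical) is to register UDFs per session instead of baking them into the server deployment. `CREATE TEMPORARY FUNCTION` is session-scoped, so a new jar can be picked up without restarting HiveServer2:

```sql
-- Run at the start of each session (e.g. from Hue or a .hiverc file).
-- The jar path and class name here are made-up examples.
ADD JAR /path/to/my-udfs.jar;
CREATE TEMPORARY FUNCTION my_udf AS 'com.example.hive.MyUdf';

-- The function is now usable for the rest of this session only.
SELECT my_udf(col1) FROM some_table LIMIT 10;
```

Permanent functions (`CREATE FUNCTION ... USING JAR ...`) only arrived in Hive 0.13, so on 0.12 the session-scoped approach is the main alternative to redeploying.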


-- 
Thanks and Regards,
Ashu


BinarySortableSerde not working with TextInputFormat

2014-03-11 Thread Vaibhav Jain
Hi,

I have a table created by the following query

CREATE EXTERNAL TABLE IF NOT EXISTS partition_table (partkey STRING)
 ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveNullValueSequenceFileOutputFormat'
LOCATION  ''


Whenever I try to count the number of rows in the table (using
select count(*) from partition_table), I get the following exception:

java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
processing writable SEQ
$org.apache.hadoop.hive.ql.io.HiveKey!org.apache.hadoop.io.NullWritable
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime
Error while processing writable SEQ
$org.apache.hadoop.hive.ql.io.HiveKey!org.apache.hadoop.io.NullWritable
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
... 8 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot
be cast to org.apache.hadoop.io.BytesWritable
at
org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:172)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:523)




Please advise on what I should do. I am not sure why this query is failing.
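The stack trace points at the mismatch: `TextInputFormat` hands the SerDe `Text` records, while `BinarySortableSerDe.deserialize` expects `BytesWritable` (the value type produced by sequence-file-based writers such as `HiveNullValueSequenceFileOutputFormat`). A sketch of a consistent DDL, assuming the data was actually written as a sequence file (the location is left out, as in the original):

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS partition_table (partkey STRING)
  ROW FORMAT SERDE
    'org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe'
  STORED AS
    INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
    OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.HiveNullValueSequenceFileOutputFormat'
  LOCATION '';
```

Conversely, if the files are plain text, keeping `TextInputFormat` but switching to a text-oriented SerDe (e.g. `LazySimpleSerDe`) would avoid the cast.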

Thanks

-- 
Thanks
Vaibhav Jain


Hive Thrift Client - Java example

2014-03-11 Thread Prasan Samtani
I did some looking around on the wiki, and couldn't find any examples 
connecting to the Thrift server using Java. I understand that JDBC is the 
preferred way with Java, but it seems that the Thrift client has a few 
additions that I would like to use (for example getQueryPlan).

For the convenience of others, I've posted a gist with some examples of 
connecting to the Hive ThriftServer using Java here:
https://gist.github.com/ZenBowman/9492608

Could we add some examples to the wiki on using the Thrift Client?

Thanks,
Prasan

Re: Hive - Sorting on the Partition Column data type Int . Output is Alphabetic Sort

2014-03-11 Thread Stephen Sprague
That makes no sense. If the column is an int, it isn't going to sort like a
string. I smell a user error somewhere.


On Tue, Mar 11, 2014 at 6:21 AM, Arafat, Moiz wrote:

> Hi ,
>
> I have a table that has a partition column, partition_hr. The data type is
> int (partition_hr int). When I run a sort on this column, the output is
> like this.
>
> 0
> 1
> 10
> 11
> 12
> 13
> 14
> 15
> 16
> 17
> 18
> 19
> 2
> 20
> 21
> 22
> 23
> 3
> 4
> 5
> 6
> 7
> 8
> 9
>
> I expected the output like this  .
>
> 0
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> 10
> .
> .
> and so on.
>
> It works fine for non-partition columns. Please advise.
>
> Thanks,
> Moiz
>
>


RE: Using an UDF in the WHERE (IN) clause

2014-03-11 Thread java8964
What version of Hive are you using?
It would be good to know whether it works in a newer version.
Yong

Date: Tue, 11 Mar 2014 08:33:06 +0100
Subject: Re: Using an UDF in the WHERE (IN) clause
From: petter.von.dolw...@gmail.com
To: user@hive.apache.org

Hi Yong,

I must argue that partition pruning does actually work if I don't use the IN
clause. What I wanted to achieve in my original query was to specify a range of
partitions in a simple way. The same query can be expressed as


SELECT * FROM mytable WHERE partitionCol >= UDF("2014-03-10") and partitionCol 
<= UDF("2014-03-11");

This UDF returns an INT (rather than an INT array). Both this UDF and the
original one are annotated with @UDFType(deterministic = true) (if that has any
impact). This variant works fine and does partition pruning. Note that I don't
have another column as input to my UDF but a static value.


Thanks,
Petter




2014-03-11 0:16 GMT+01:00 java8964 :




I don't know, from a syntax point of view, whether Hive will allow "columnA IN
UDF(columnB)".
What I do know is that even if the above works, it won't do the partition
pruning.

Partition pruning in Hive is strictly static: any dynamic value provided for a
partition column won't enable partition pruning, even though it is a feature I
miss too.
Yong


Date: Mon, 10 Mar 2014 16:23:01 +0100
Subject: Using an UDF in the WHERE (IN) clause
From: petter.von.dolw...@gmail.com
To: user@hive.apache.org


Hi,

I'm trying to get the following query to work. The parser doesn't like it.
Is anybody aware of a workaround?

SELECT * FROM mytable WHERE partitionCol IN my_udf("2014-03-10");



partitionCol is my partition column of type INT, and I want to achieve early
pruning. I've tried returning an array of INTs from my_udf and also a plain
string in the format (1,2,3). It seems like the parser won't allow me to put a
UDF in this place.



Any help appreciated.

Thanks,
Petter
  

  

Hive - Sorting on the Partition Column data type Int . Output is Alphabetic Sort

2014-03-11 Thread Arafat, Moiz
Hi ,

I have a table that has a partition column, partition_hr. The data type is int
(partition_hr int). When I run a sort on this column, the output is like
this.

0
1
10
11
12
13
14
15
16
17
18
19
2
20
21
22
23
3
4
5
6
7
8
9

I expected the output like this  . 

0
1
2
3
4
5
6
7
8
9
10
.
.
and so on.

It works fine for non-partition columns. Please advise.

Thanks,
Moiz
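Partition values are stored as strings in the metastore and in the directory names, which has led to string-style comparisons on partition columns in some Hive versions. One way to check whether that is happening here is to force a numeric sort with an explicit cast. A sketch, assuming a hypothetical table name (the original was not given):

```sql
-- Force numeric ordering of the partition column.
SELECT partition_hr
FROM mytable            -- hypothetical table name
GROUP BY partition_hr
ORDER BY CAST(partition_hr AS INT);
```

If the cast fixes the ordering, the column is being compared as a string despite the declared int type.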



Re: How to add partition using hive by a specific date?

2014-03-11 Thread Sherine Brisillal
Thanks for your response!

hive> LOAD DATA INPATH 's3://test.com/' OVERWRITE INTO TABLE test PARTITION
(dt='2014-01-01');
FAILED: Error in semantic analysis: line 1:17 Invalid Path 's3://test.com/':
only "file" or "hdfs" file systems accepted. s3 file system is not
supported.

My data is partitioned as follows:

   DIR   s3://test.com/2014-03-01/
   DIR   s3://test.com/2014-03-02/
   DIR   s3://test.com/2014-03-03/
   DIR   s3://test.com/2014-03-04/
   DIR   s3://test.com/2014-03-05/

s3://test.com/2014-03-05/ip-foo-request-2014-03-05_04-20_00-49.log
s3://test.com/2014-03-05/ip-foo-request-2014-03-05_06-26_19-56.log
s3://test.com/2014-03-05/ip-foo-request-2014-03-05_15-20_12-53.log
s3://test.com/2014-03-05/ip-foo-request-2014-03-05_22-54_27-19.log


On Tue, Mar 11, 2014 at 3:03 PM, Chinna Rao Lalam <
lalamchinnara...@gmail.com> wrote:

> Hi,
>
>   Any exceptions in the log file?
>
>
>   Try this for loading into partions. In your scenario if it is possible
> with load query.
>   LOAD DATA INPATH '/user/myname/kv2.txt' OVERWRITE INTO TABLE invites
> PARTITION (ds='2008-08-15');
>
>
> Hope it helps,
> Chinna
>
>


-- 

*Regards*
*Sherine Brisillal*


Hadoop Hive: How to allow regular user continuously write data and create tables in warehouse directory?

2014-03-11 Thread anton ashanin
I am running Hadoop 2.2.0.2.0.6.0-101 on a single node.
I am trying to run a Java MRD program that writes data to an existing Hive
table from Eclipse under a regular user. I get the exception:

org.apache.hadoop.security.AccessControlException: Permission denied:
user=dev, access=WRITE,
inode="/apps/hive/warehouse/testids":hdfs:hdfs:drwxr-xr-x

This happens because the regular user has no write permission to the warehouse
directory; only the `hdfs` user does:

drwxr-xr-x   - hdfs hdfs  0 2014-03-06 16:08
/apps/hive/warehouse/testids
drwxr-xr-x   - hdfs hdfs  0 2014-03-05 12:07
/apps/hive/warehouse/test

To circumvent this I changed permissions on the warehouse directory, so
everybody now has write permission:

[hdfs@localhost wks]$ hadoop fs -chmod -R a+w /apps/hive/warehouse
[hdfs@localhost wks]$ hadoop fs -ls /apps/hive/warehouse
drwxrwxrwx   - hdfs hdfs  0 2014-03-06 16:08
/apps/hive/warehouse/testids
drwxrwxrwx   - hdfs hdfs  0 2014-03-05 12:07
/apps/hive/warehouse/test

This helps to some extent, and the MRD program can now write to the warehouse
directory as a regular user, but only once. When trying to write data into the
same table a second time I get:

ERROR security.UserGroupInformation: PriviledgedActionException as:dev
(auth:SIMPLE) cause:org.apache.hcatalog.common.HCatException : 2003 :
Non-partitioned table already contains data : default.testids

Now, if I delete the output table and create it anew in the `hive` shell, I
again get default permissions that do not allow a regular user to write data
into this table:

[hdfs@localhost wks]$ hadoop fs -ls /apps/hive/warehouse
drwxr-xr-x   - hdfs hdfs  0 2014-03-11 12:19
/apps/hive/warehouse/testids
drwxrwxrwx   - hdfs hdfs  0 2014-03-05 12:07
/apps/hive/warehouse/test

Please advise on the correct Hive configuration steps that will allow a
program run as a regular user to do the following operations in the Hive
warehouse:

- Programmatically create / delete / rename Hive tables?
- Programmatically read / write data from Hive tables?

Many thanks!
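A common approach (a sketch, not verified against this exact HDP build) is to make the warehouse group-writable once, put the developer accounts in that group, and let newly created tables inherit the permissions:

```shell
# As the hdfs superuser: group-writable warehouse, dev users in the 'hive' group
sudo -u hdfs hadoop fs -chgrp -R hive /apps/hive/warehouse
sudo -u hdfs hadoop fs -chmod -R g+w  /apps/hive/warehouse
```

and in hive-site.xml set `hive.warehouse.subdir.inherit.perms=true`, so directories created for new tables copy the parent's permissions instead of reverting to the 755 default. The 2003 `HCatException` is a separate issue: HCatalog refuses to append to a non-partitioned table that already contains data, so a second write needs either a partitioned table or a drop/recreate first.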


Re: How to add partition using hive by a specific date?

2014-03-11 Thread Chinna Rao Lalam
Hi,

  Any exceptions in the log file?


  Try this for loading into partitions, if a load query is possible in your
scenario:
  LOAD DATA INPATH '/user/myname/kv2.txt' OVERWRITE INTO TABLE invites
PARTITION (ds='2008-08-15');


Hope it helps,
Chinna



Views show different type for partition column compared to table

2014-03-11 Thread Petter von Dolwitz (Hem)
Hi,

I have a partitioned table where the partition column is of type INT. When
creating a view on this table, the partition column shows up as STRING. I
can still, however, issue queries against the view treating the partition
column as an INT, e.g. SELECT * FROM myview WHERE partitionCol=1;

What is the reason for this behaviour? It does not seem to be correct. I
am on Hive 0.10.

CREATE EXTERNAL TABLE IF NOT EXISTS mytable (
  col1 STRING)
  PARTITIONED BY (partition_col INT)
  ROW FORMAT SERDE 'mySerde'
  STORED AS INPUTFORMAT 'myInputFormat' OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
  LOCATION 'myLocation';

CREATE VIEW IF NOT EXISTS myView
  (col1, partition_col)
PARTITIONED ON (partition_col)
AS SELECT col1, partition_col
FROM mytable
WHERE col1='value1';

describe mytable;
OK
col1            string  from deserializer
partition_col   int     None

# Partition Information
# col_name  data_type   comment

partition_col   int     None


describe myview;
OK
col1            string  None
partition_col   string  None

# Partition Information
# col_name  data_type   comment

partition_col   string  None


Thanks,
Petter
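One possible workaround, until the cause is found, is to pin the type explicitly in the view definition with a cast. A sketch against the DDL above (whether `DESCRIBE myview` then reports int, and whether PARTITIONED ON accepts the cast expression on 0.10, would need testing):

```sql
CREATE VIEW IF NOT EXISTS myView
  (col1, partition_col)
PARTITIONED ON (partition_col)
AS SELECT col1, CAST(partition_col AS INT) AS partition_col
FROM mytable
WHERE col1='value1';
```

Note that queries like `partitionCol=1` work against the STRING column either way, because Hive coerces the operands in the comparison.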


Re: Using an UDF in the WHERE (IN) clause

2014-03-11 Thread Petter von Dolwitz (Hem)
Hi Navis,

I suspected that the parser only accepts an expression like (value1,
value2, value3, ...) as input. I guess one solution, as you say, would be to
allow an array as an argument to IN. I do not know if other SQL
dialects allow this. Another way would be to introduce a new type of UDF
that is expanded before it is sent to the parser (using a string as the return
type), much like variables are handled, I guess.

Thanks,
Petter


2014-03-11 2:32 GMT+01:00 Navis류승우 :

> (KW_IN expressions)
>-> ^(TOK_FUNCTION KW_IN $precedenceEqualExpression expressions)
>
> expressions
> :
> LPAREN expression (COMMA expression)* RPAREN -> expression*
> ;
>
> You should have the arguments of IN wrapped in parentheses. But it seems
> not possible to use an array-returning expression there (type mismatch in
> current Hive).
>
> We might extend the IN function to accept a single array as an argument.
>
>
> 2014-03-11 8:16 GMT+09:00 java8964 :
> > I don't know from syntax point of view, if Hive will allow to do
> "columnA IN
> > UDF(columnB)".
> >
> > What I do know that even let's say above work, it won't do the partition
> > pruning.
> >
> > The partition pruning in Hive is strict static, any dynamic values
> provided
> > to partition column won't enable partition pruning, even though it is a
> > feature I missed too.
> >
> > Yong
> >
> > 
> > Date: Mon, 10 Mar 2014 16:23:01 +0100
> > Subject: Using an UDF in the WHERE (IN) clause
> > From: petter.von.dolw...@gmail.com
> > To: user@hive.apache.org
> >
> >
> > Hi,
> >
> > I'm trying to get the following query to work. The parser don't like it.
> > Anybody aware of a workaround?
> >
> > SELECT * FROM mytable WHERE partitionCol IN my_udf("2014-03-10");
> >
> > partitionCol is my partition column of type INT and I want to achieve
> early
> > pruning. I've tried returning an array of INTs from my_udf and also a
> plain
> > string in the format (1,2,3). It seems like the parser wont allow me to
> put
> > an UDF in this place.
> >
> > Any help appreciated.
> >
> > Thanks,
> > Petter
>


Re: Using an UDF in the WHERE (IN) clause

2014-03-11 Thread Navis류승우
Then you should use BETWEEN, not IN. BETWEEN can be used for PPD (predicate
pushdown), afaik.
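Spelled out against the earlier query, the BETWEEN form would be (a sketch using the same hypothetical UDF and table names from the thread):

```sql
SELECT *
FROM mytable
WHERE partitionCol BETWEEN UDF('2014-03-10') AND UDF('2014-03-11');
```

This is equivalent to the pair of >= / <= predicates already shown to prune partitions.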

2014-03-11 16:33 GMT+09:00 Petter von Dolwitz (Hem)
:
> Hi Young,
>
> I must argue that the partition pruning do actually work if I don't use the
> IN clause. What I wanted to achieve in my original query was to specify a
> range of partitions in a simple way. The same query can be expressed as
>
> SELECT * FROM mytable WHERE partitionCol >= UDF("2014-03-10") and
> partitionCol <= UDF("2014-03-11");
>
> This UDF returns an INT (rather than an INT array). Both this UDF and the
> original one are annotated with @UDFType(deterministic = true) (if that has
> any impact) . This variant works fine and does partition pruning. Note that
> I don't have another column as input to my UDF but a static value.
>
> Thanks,
> Petter
>
>
>
>
> 2014-03-11 0:16 GMT+01:00 java8964 :
>
>> I don't know from syntax point of view, if Hive will allow to do "columnA
>> IN UDF(columnB)".
>>
>> What I do know that even let's say above work, it won't do the partition
>> pruning.
>>
>> The partition pruning in Hive is strict static, any dynamic values
>> provided to partition column won't enable partition pruning, even though it
>> is a feature I missed too.
>>
>> Yong
>>
>> 
>> Date: Mon, 10 Mar 2014 16:23:01 +0100
>> Subject: Using an UDF in the WHERE (IN) clause
>> From: petter.von.dolw...@gmail.com
>> To: user@hive.apache.org
>>
>>
>> Hi,
>>
>> I'm trying to get the following query to work. The parser don't like it.
>> Anybody aware of a workaround?
>>
>> SELECT * FROM mytable WHERE partitionCol IN my_udf("2014-03-10");
>>
>> partitionCol is my partition column of type INT and I want to achieve
>> early pruning. I've tried returning an array of INTs from my_udf and also a
>> plain string in the format (1,2,3). It seems like the parser wont allow me
>> to put an UDF in this place.
>>
>> Any help appreciated.
>>
>> Thanks,
>> Petter
>
>


Re: Using an UDF in the WHERE (IN) clause

2014-03-11 Thread Petter von Dolwitz (Hem)
Hi Yong,

I must argue that partition pruning does actually work if I don't use the
IN clause. What I wanted to achieve in my original query was to specify a
range of partitions in a simple way. The same query can be expressed as

SELECT * FROM mytable WHERE partitionCol >= UDF("2014-03-10") and
partitionCol <= UDF("2014-03-11");

This UDF returns an INT (rather than an INT array). Both this UDF and the
original one are annotated with @UDFType(deterministic = true) (if that has
any impact). This variant works fine and does partition pruning. Note that
I don't have another column as input to my UDF but a static value.

Thanks,
Petter




2014-03-11 0:16 GMT+01:00 java8964 :

> I don't know from syntax point of view, if Hive will allow to do "columnA
> IN UDF(columnB)".
>
> What I do know that even let's say above work, it won't do the partition
> pruning.
>
> The partition pruning in Hive is strict static, any dynamic values
> provided to partition column won't enable partition pruning, even though it
> is a feature I missed too.
>
> Yong
>
> --
> Date: Mon, 10 Mar 2014 16:23:01 +0100
> Subject: Using an UDF in the WHERE (IN) clause
> From: petter.von.dolw...@gmail.com
> To: user@hive.apache.org
>
>
> Hi,
>
> I'm trying to get the following query to work. The parser don't like it.
> Anybody aware of a workaround?
>
> SELECT * FROM mytable WHERE partitionCol IN my_udf("2014-03-10");
>
> partitionCol is my partition column of type INT and I want to achieve
> early pruning. I've tried returning an array of INTs from my_udf and also a
> plain string in the format (1,2,3). It seems like the parser wont allow me
> to put an UDF in this place.
>
> Any help appreciated.
>
> Thanks,
> Petter
>


How to add partition using hive by a specific date?

2014-03-11 Thread Sherine Brisillal
Hi,

I'm using hive (with external tables) to process data stored on amazon S3.

My data is partitioned as follows:

   DIR   s3://test.com/2014-03-01/
   DIR   s3://test.com/2014-03-02/
   DIR   s3://test.com/2014-03-03/
   DIR   s3://test.com/2014-03-04/
   DIR   s3://test.com/2014-03-05/

s3://test.com/2014-03-05/ip-foo-request-2014-03-05_04-20_00-49.log
s3://test.com/2014-03-05/ip-foo-request-2014-03-05_06-26_19-56.log
s3://test.com/2014-03-05/ip-foo-request-2014-03-05_15-20_12-53.log
s3://test.com/2014-03-05/ip-foo-request-2014-03-05_22-54_27-19.log

My table looks like:

CREATE EXTERNAL TABLE test (
foo string,
time string,
bar string
)  PARTITIONED BY (dt string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://test.com/';

I don't want to use the s3://test.com/dt=2014-03-05 naming convention; I
need to add all the partitions explicitly, as in:
ALTER TABLE test ADD PARTITION (dt='2014-03-09') LOCATION
's3://test.com/2014-03-09'. It's working
without any error, but it doesn't load any data from S3; the output is 0
bytes. Why isn't data loading from S3?

Could somebody answer this question? Thanks!
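Since `LOAD DATA` rejects the s3 scheme here, the `ALTER TABLE ... ADD PARTITION` route is the right one; a checklist when a partition shows 0 bytes might look like this (a sketch, assuming the table from the question):

```sql
-- One ADD PARTITION per date directory; the location must match exactly
ALTER TABLE test ADD PARTITION (dt='2014-03-05')
  LOCATION 's3://test.com/2014-03-05/';

-- Verify the partition is registered and where it points
SHOW PARTITIONS test;
DESCRIBE EXTENDED test PARTITION (dt='2014-03-05');

-- Query with the partition predicate, not a bare SELECT *
SELECT COUNT(*) FROM test WHERE dt='2014-03-05';
```

Common causes of the 0-byte symptom are a mismatched location path, or S3 credentials not configured for the cluster (`fs.s3.awsAccessKeyId` / `fs.s3.awsSecretAccessKey`, or the `fs.s3n.*` equivalents if the `s3n://` filesystem is the one actually in use), so those are worth checking first.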