RE: Using an UDF in the WHERE (IN) clause

2014-03-11 Thread java8964
What version of hive you are using?
It is good to know that if it works in newer version.
Yong

Date: Tue, 11 Mar 2014 08:33:06 +0100
Subject: Re: Using an UDF in the WHERE (IN) clause
From: petter.von.dolw...@gmail.com
To: user@hive.apache.org

Hi Young,

I must argue that the partition pruning do actually work if I don't use the IN 
clause. What I wanted to achieve in my original query was to specify a range of 
partitions in a simple way. The same query can be expressed as


SELECT * FROM mytable WHERE partitionCol >= UDF("2014-03-10") and partitionCol 
<= UDF("2014-03-11");

This UDF returns an INT (rather than an INT array). Both this UDF and the 
original one are annotated with @UDFType(deterministic = true) (if that has any 
impact) . This variant works fine and does partition pruning. Note that I don't 
have another column as input to my UDF but a static value.


Thanks,
Petter




2014-03-11 0:16 GMT+01:00 java8964 :




I don't know from syntax point of view, if Hive will allow to do "columnA IN 
UDF(columnB)".
What I do know that even let's say above work, it won't do the partition 
pruning.

The partition pruning in Hive is strict static, any dynamic values provided to 
partition column won't enable partition pruning, even though it is a feature I 
missed too.
Yong


Date: Mon, 10 Mar 2014 16:23:01 +0100
Subject: Using an UDF in the WHERE (IN) clause
From: petter.von.dolw...@gmail.com
To: user@hive.apache.org


Hi,

I'm trying to get the following query to work. The parser don't like it. 
Anybody aware of a workaround?

SELECT * FROM mytable WHERE partitionCol IN my_udf("2014-03-10");



partitionCol is my partition column of type INT and I want to achieve early 
pruning. I've tried returning an array of INTs from my_udf and also a plain 
string in the format (1,2,3). It seems like the parser wont allow me to put an 
UDF in this place.



Any help appreciated.

Thanks,
Petter
  

  

Re: Using an UDF in the WHERE (IN) clause

2014-03-11 Thread Petter von Dolwitz (Hem)
Hi Navis,

I suspected that the parser only accepted an expression like (value1,
value2, value3...) as input. I guess one solution as you say would be to
add an array as an allowed argument to IN. I do not know if other SQL
dialects allow this. Another way would be to introduce a new type of UDF
that is expanded before it is sent to the parser (use string as return
type). Much like variables are handled I guess.

Thanks,
Petter


2014-03-11 2:32 GMT+01:00 Navis류승우 :

> (KW_IN expressions)
>-> ^(TOK_FUNCTION KW_IN $precedenceEqualExpression expressions)
>
> expressions
> :
> LPAREN expression (COMMA expression)* RPAREN -> expression*
> ;
>
> You should have arguments of IN wrapped by parentheses. But It seemed
> not possible to use array returning expression in it (type mismatch in
> current hive).
>
> We might extend IN function to accept single array as a argument.
>
>
> 2014-03-11 8:16 GMT+09:00 java8964 :
> > I don't know from syntax point of view, if Hive will allow to do
> "columnA IN
> > UDF(columnB)".
> >
> > What I do know that even let's say above work, it won't do the partition
> > pruning.
> >
> > The partition pruning in Hive is strict static, any dynamic values
> provided
> > to partition column won't enable partition pruning, even though it is a
> > feature I missed too.
> >
> > Yong
> >
> > 
> > Date: Mon, 10 Mar 2014 16:23:01 +0100
> > Subject: Using an UDF in the WHERE (IN) clause
> > From: petter.von.dolw...@gmail.com
> > To: user@hive.apache.org
> >
> >
> > Hi,
> >
> > I'm trying to get the following query to work. The parser don't like it.
> > Anybody aware of a workaround?
> >
> > SELECT * FROM mytable WHERE partitionCol IN my_udf("2014-03-10");
> >
> > partitionCol is my partition column of type INT and I want to achieve
> early
> > pruning. I've tried returning an array of INTs from my_udf and also a
> plain
> > string in the format (1,2,3). It seems like the parser wont allow me to
> put
> > an UDF in this place.
> >
> > Any help appreciated.
> >
> > Thanks,
> > Petter
>


Re: Using an UDF in the WHERE (IN) clause

2014-03-11 Thread Navis류승우
Then you should use BETWEEN, not IN. BETWEEN can be used for PPD, afaik.

2014-03-11 16:33 GMT+09:00 Petter von Dolwitz (Hem)
:
> Hi Young,
>
> I must argue that the partition pruning do actually work if I don't use the
> IN clause. What I wanted to achieve in my original query was to specify a
> range of partitions in a simple way. The same query can be expressed as
>
> SELECT * FROM mytable WHERE partitionCol >= UDF("2014-03-10") and
> partitionCol <= UDF("2014-03-11");
>
> This UDF returns an INT (rather than an INT array). Both this UDF and the
> original one are annotated with @UDFType(deterministic = true) (if that has
> any impact) . This variant works fine and does partition pruning. Note that
> I don't have another column as input to my UDF but a static value.
>
> Thanks,
> Petter
>
>
>
>
> 2014-03-11 0:16 GMT+01:00 java8964 :
>
>> I don't know from syntax point of view, if Hive will allow to do "columnA
>> IN UDF(columnB)".
>>
>> What I do know that even let's say above work, it won't do the partition
>> pruning.
>>
>> The partition pruning in Hive is strict static, any dynamic values
>> provided to partition column won't enable partition pruning, even though it
>> is a feature I missed too.
>>
>> Yong
>>
>> 
>> Date: Mon, 10 Mar 2014 16:23:01 +0100
>> Subject: Using an UDF in the WHERE (IN) clause
>> From: petter.von.dolw...@gmail.com
>> To: user@hive.apache.org
>>
>>
>> Hi,
>>
>> I'm trying to get the following query to work. The parser don't like it.
>> Anybody aware of a workaround?
>>
>> SELECT * FROM mytable WHERE partitionCol IN my_udf("2014-03-10");
>>
>> partitionCol is my partition column of type INT and I want to achieve
>> early pruning. I've tried returning an array of INTs from my_udf and also a
>> plain string in the format (1,2,3). It seems like the parser wont allow me
>> to put an UDF in this place.
>>
>> Any help appreciated.
>>
>> Thanks,
>> Petter
>
>


Re: Using an UDF in the WHERE (IN) clause

2014-03-11 Thread Petter von Dolwitz (Hem)
Hi Young,

I must argue that the partition pruning do actually work if I don't use the
IN clause. What I wanted to achieve in my original query was to specify a
range of partitions in a simple way. The same query can be expressed as

SELECT * FROM mytable WHERE partitionCol >= UDF("2014-03-10") and
partitionCol <= UDF("2014-03-11");

This UDF returns an INT (rather than an INT array). Both this UDF and the
original one are annotated with @UDFType(deterministic = true) (if that has
any impact) . This variant works fine and does partition pruning. Note that
I don't have another column as input to my UDF but a static value.

Thanks,
Petter




2014-03-11 0:16 GMT+01:00 java8964 :

> I don't know from syntax point of view, if Hive will allow to do "columnA
> IN UDF(columnB)".
>
> What I do know that even let's say above work, it won't do the partition
> pruning.
>
> The partition pruning in Hive is strict static, any dynamic values
> provided to partition column won't enable partition pruning, even though it
> is a feature I missed too.
>
> Yong
>
> --
> Date: Mon, 10 Mar 2014 16:23:01 +0100
> Subject: Using an UDF in the WHERE (IN) clause
> From: petter.von.dolw...@gmail.com
> To: user@hive.apache.org
>
>
> Hi,
>
> I'm trying to get the following query to work. The parser don't like it.
> Anybody aware of a workaround?
>
> SELECT * FROM mytable WHERE partitionCol IN my_udf("2014-03-10");
>
> partitionCol is my partition column of type INT and I want to achieve
> early pruning. I've tried returning an array of INTs from my_udf and also a
> plain string in the format (1,2,3). It seems like the parser wont allow me
> to put an UDF in this place.
>
> Any help appreciated.
>
> Thanks,
> Petter
>


Re: Using an UDF in the WHERE (IN) clause

2014-03-10 Thread Navis류승우
(KW_IN expressions)
   -> ^(TOK_FUNCTION KW_IN $precedenceEqualExpression expressions)

expressions
:
LPAREN expression (COMMA expression)* RPAREN -> expression*
;

You should have arguments of IN wrapped by parentheses. But It seemed
not possible to use array returning expression in it (type mismatch in
current hive).

We might extend IN function to accept single array as a argument.


2014-03-11 8:16 GMT+09:00 java8964 :
> I don't know from syntax point of view, if Hive will allow to do "columnA IN
> UDF(columnB)".
>
> What I do know that even let's say above work, it won't do the partition
> pruning.
>
> The partition pruning in Hive is strict static, any dynamic values provided
> to partition column won't enable partition pruning, even though it is a
> feature I missed too.
>
> Yong
>
> 
> Date: Mon, 10 Mar 2014 16:23:01 +0100
> Subject: Using an UDF in the WHERE (IN) clause
> From: petter.von.dolw...@gmail.com
> To: user@hive.apache.org
>
>
> Hi,
>
> I'm trying to get the following query to work. The parser don't like it.
> Anybody aware of a workaround?
>
> SELECT * FROM mytable WHERE partitionCol IN my_udf("2014-03-10");
>
> partitionCol is my partition column of type INT and I want to achieve early
> pruning. I've tried returning an array of INTs from my_udf and also a plain
> string in the format (1,2,3). It seems like the parser wont allow me to put
> an UDF in this place.
>
> Any help appreciated.
>
> Thanks,
> Petter


RE: Using an UDF in the WHERE (IN) clause

2014-03-10 Thread java8964
I don't know from syntax point of view, if Hive will allow to do "columnA IN 
UDF(columnB)".
What I do know that even let's say above work, it won't do the partition 
pruning.
The partition pruning in Hive is strict static, any dynamic values provided to 
partition column won't enable partition pruning, even though it is a feature I 
missed too.
Yong

Date: Mon, 10 Mar 2014 16:23:01 +0100
Subject: Using an UDF in the WHERE (IN) clause
From: petter.von.dolw...@gmail.com
To: user@hive.apache.org

Hi,

I'm trying to get the following query to work. The parser don't like it. 
Anybody aware of a workaround?

SELECT * FROM mytable WHERE partitionCol IN my_udf("2014-03-10");


partitionCol is my partition column of type INT and I want to achieve early 
pruning. I've tried returning an array of INTs from my_udf and also a plain 
string in the format (1,2,3). It seems like the parser wont allow me to put an 
UDF in this place.


Any help appreciated.

Thanks,
Petter