RE: Using an UDF in the WHERE (IN) clause
What version of hive you are using? It is good to know that if it works in newer version. Yong Date: Tue, 11 Mar 2014 08:33:06 +0100 Subject: Re: Using an UDF in the WHERE (IN) clause From: petter.von.dolw...@gmail.com To: user@hive.apache.org Hi Young, I must argue that the partition pruning do actually work if I don't use the IN clause. What I wanted to achieve in my original query was to specify a range of partitions in a simple way. The same query can be expressed as SELECT * FROM mytable WHERE partitionCol >= UDF("2014-03-10") and partitionCol <= UDF("2014-03-11"); This UDF returns an INT (rather than an INT array). Both this UDF and the original one are annotated with @UDFType(deterministic = true) (if that has any impact) . This variant works fine and does partition pruning. Note that I don't have another column as input to my UDF but a static value. Thanks, Petter 2014-03-11 0:16 GMT+01:00 java8964 : I don't know from syntax point of view, if Hive will allow to do "columnA IN UDF(columnB)". What I do know that even let's say above work, it won't do the partition pruning. The partition pruning in Hive is strict static, any dynamic values provided to partition column won't enable partition pruning, even though it is a feature I missed too. Yong Date: Mon, 10 Mar 2014 16:23:01 +0100 Subject: Using an UDF in the WHERE (IN) clause From: petter.von.dolw...@gmail.com To: user@hive.apache.org Hi, I'm trying to get the following query to work. The parser don't like it. Anybody aware of a workaround? SELECT * FROM mytable WHERE partitionCol IN my_udf("2014-03-10"); partitionCol is my partition column of type INT and I want to achieve early pruning. I've tried returning an array of INTs from my_udf and also a plain string in the format (1,2,3). It seems like the parser wont allow me to put an UDF in this place. Any help appreciated. Thanks, Petter
Re: Using an UDF in the WHERE (IN) clause
Hi Navis, I suspected that the parser only accepted an expression like (value1, value2, value3...) as input. I guess one solution as you say would be to add an array as an allowed argument to IN. I do not know if other SQL dialects allow this. Another way would be to introduce a new type of UDF that is expanded before it is sent to the parser (use string as return type). Much like variables are handled I guess. Thanks, Petter 2014-03-11 2:32 GMT+01:00 Navis류승우 : > (KW_IN expressions) >-> ^(TOK_FUNCTION KW_IN $precedenceEqualExpression expressions) > > expressions > : > LPAREN expression (COMMA expression)* RPAREN -> expression* > ; > > You should have arguments of IN wrapped by parentheses. But It seemed > not possible to use array returning expression in it (type mismatch in > current hive). > > We might extend IN function to accept single array as a argument. > > > 2014-03-11 8:16 GMT+09:00 java8964 : > > I don't know from syntax point of view, if Hive will allow to do > "columnA IN > > UDF(columnB)". > > > > What I do know that even let's say above work, it won't do the partition > > pruning. > > > > The partition pruning in Hive is strict static, any dynamic values > provided > > to partition column won't enable partition pruning, even though it is a > > feature I missed too. > > > > Yong > > > > > > Date: Mon, 10 Mar 2014 16:23:01 +0100 > > Subject: Using an UDF in the WHERE (IN) clause > > From: petter.von.dolw...@gmail.com > > To: user@hive.apache.org > > > > > > Hi, > > > > I'm trying to get the following query to work. The parser don't like it. > > Anybody aware of a workaround? > > > > SELECT * FROM mytable WHERE partitionCol IN my_udf("2014-03-10"); > > > > partitionCol is my partition column of type INT and I want to achieve > early > > pruning. I've tried returning an array of INTs from my_udf and also a > plain > > string in the format (1,2,3). It seems like the parser wont allow me to > put > > an UDF in this place. > > > > Any help appreciated. > > > > Thanks, > > Petter >
Re: Using an UDF in the WHERE (IN) clause
Then you should use BETWEEN, not IN. BETWEEN can be used for PPD, afaik. 2014-03-11 16:33 GMT+09:00 Petter von Dolwitz (Hem) : > Hi Young, > > I must argue that the partition pruning do actually work if I don't use the > IN clause. What I wanted to achieve in my original query was to specify a > range of partitions in a simple way. The same query can be expressed as > > SELECT * FROM mytable WHERE partitionCol >= UDF("2014-03-10") and > partitionCol <= UDF("2014-03-11"); > > This UDF returns an INT (rather than an INT array). Both this UDF and the > original one are annotated with @UDFType(deterministic = true) (if that has > any impact) . This variant works fine and does partition pruning. Note that > I don't have another column as input to my UDF but a static value. > > Thanks, > Petter > > > > > 2014-03-11 0:16 GMT+01:00 java8964 : > >> I don't know from syntax point of view, if Hive will allow to do "columnA >> IN UDF(columnB)". >> >> What I do know that even let's say above work, it won't do the partition >> pruning. >> >> The partition pruning in Hive is strict static, any dynamic values >> provided to partition column won't enable partition pruning, even though it >> is a feature I missed too. >> >> Yong >> >> >> Date: Mon, 10 Mar 2014 16:23:01 +0100 >> Subject: Using an UDF in the WHERE (IN) clause >> From: petter.von.dolw...@gmail.com >> To: user@hive.apache.org >> >> >> Hi, >> >> I'm trying to get the following query to work. The parser don't like it. >> Anybody aware of a workaround? >> >> SELECT * FROM mytable WHERE partitionCol IN my_udf("2014-03-10"); >> >> partitionCol is my partition column of type INT and I want to achieve >> early pruning. I've tried returning an array of INTs from my_udf and also a >> plain string in the format (1,2,3). It seems like the parser wont allow me >> to put an UDF in this place. >> >> Any help appreciated. >> >> Thanks, >> Petter > >
Re: Using an UDF in the WHERE (IN) clause
Hi Young, I must argue that the partition pruning do actually work if I don't use the IN clause. What I wanted to achieve in my original query was to specify a range of partitions in a simple way. The same query can be expressed as SELECT * FROM mytable WHERE partitionCol >= UDF("2014-03-10") and partitionCol <= UDF("2014-03-11"); This UDF returns an INT (rather than an INT array). Both this UDF and the original one are annotated with @UDFType(deterministic = true) (if that has any impact) . This variant works fine and does partition pruning. Note that I don't have another column as input to my UDF but a static value. Thanks, Petter 2014-03-11 0:16 GMT+01:00 java8964 : > I don't know from syntax point of view, if Hive will allow to do "columnA > IN UDF(columnB)". > > What I do know that even let's say above work, it won't do the partition > pruning. > > The partition pruning in Hive is strict static, any dynamic values > provided to partition column won't enable partition pruning, even though it > is a feature I missed too. > > Yong > > -- > Date: Mon, 10 Mar 2014 16:23:01 +0100 > Subject: Using an UDF in the WHERE (IN) clause > From: petter.von.dolw...@gmail.com > To: user@hive.apache.org > > > Hi, > > I'm trying to get the following query to work. The parser don't like it. > Anybody aware of a workaround? > > SELECT * FROM mytable WHERE partitionCol IN my_udf("2014-03-10"); > > partitionCol is my partition column of type INT and I want to achieve > early pruning. I've tried returning an array of INTs from my_udf and also a > plain string in the format (1,2,3). It seems like the parser wont allow me > to put an UDF in this place. > > Any help appreciated. > > Thanks, > Petter >
Re: Using an UDF in the WHERE (IN) clause
(KW_IN expressions) -> ^(TOK_FUNCTION KW_IN $precedenceEqualExpression expressions) expressions : LPAREN expression (COMMA expression)* RPAREN -> expression* ; You should have arguments of IN wrapped by parentheses. But It seemed not possible to use array returning expression in it (type mismatch in current hive). We might extend IN function to accept single array as a argument. 2014-03-11 8:16 GMT+09:00 java8964 : > I don't know from syntax point of view, if Hive will allow to do "columnA IN > UDF(columnB)". > > What I do know that even let's say above work, it won't do the partition > pruning. > > The partition pruning in Hive is strict static, any dynamic values provided > to partition column won't enable partition pruning, even though it is a > feature I missed too. > > Yong > > > Date: Mon, 10 Mar 2014 16:23:01 +0100 > Subject: Using an UDF in the WHERE (IN) clause > From: petter.von.dolw...@gmail.com > To: user@hive.apache.org > > > Hi, > > I'm trying to get the following query to work. The parser don't like it. > Anybody aware of a workaround? > > SELECT * FROM mytable WHERE partitionCol IN my_udf("2014-03-10"); > > partitionCol is my partition column of type INT and I want to achieve early > pruning. I've tried returning an array of INTs from my_udf and also a plain > string in the format (1,2,3). It seems like the parser wont allow me to put > an UDF in this place. > > Any help appreciated. > > Thanks, > Petter
RE: Using an UDF in the WHERE (IN) clause
I don't know from syntax point of view, if Hive will allow to do "columnA IN UDF(columnB)". What I do know that even let's say above work, it won't do the partition pruning. The partition pruning in Hive is strict static, any dynamic values provided to partition column won't enable partition pruning, even though it is a feature I missed too. Yong Date: Mon, 10 Mar 2014 16:23:01 +0100 Subject: Using an UDF in the WHERE (IN) clause From: petter.von.dolw...@gmail.com To: user@hive.apache.org Hi, I'm trying to get the following query to work. The parser don't like it. Anybody aware of a workaround? SELECT * FROM mytable WHERE partitionCol IN my_udf("2014-03-10"); partitionCol is my partition column of type INT and I want to achieve early pruning. I've tried returning an array of INTs from my_udf and also a plain string in the format (1,2,3). It seems like the parser wont allow me to put an UDF in this place. Any help appreciated. Thanks, Petter