Re: GenericUDFRank UDF is not working as expected

2013-07-23 Thread Nitin Pawar
Try rank(columntoberanked, columntobegrouped).

In your case: rank(userId, City)


On Wed, Jul 24, 2013 at 3:47 AM, Shahar Glixman wrote:

> Hello,
>
> I'm trying to use GenericUDFRank, described in
> https://issues.apache.org/jira/browse/HIVE-2361. However, no matter which
> query I use, the result is not what I expect.
> Assume a user hive table with the format:
> Country, City, userId
>
> I'm running the following query:
>
> ADD JAR Rank.jar;
> CREATE TEMPORARY FUNCTION rank AS
> 'com.nexr.platform.analysis.udf.GenericUDFRank';
>
> SELECT
>   Country,
>   City,
>   rank(userId)
>
> FROM
>   myTable
>
> DISTRIBUTE BY
>   Country,
>   City
>
> SORT BY
>   Country,
>   City,
>   userId;
>
> For the following table:
> US NY 8
> US NY 12
> US NY 3
> US NJ 10
> US NJ 26
>
> I'm expecting the following result:
> US NY 1
> US NY 2
> US NY 3
> US NJ 1
> US NJ 2
>
> But I get:
> US NY 1
> US NY 1
> US NY 1
> US NJ 1
> US NJ 1
>
> I also tried a different rank implementation
> (http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/doing_rank_with_hive),
> but the results were similar. I guess I'm using the UDF the wrong way, but I can't find
> the correct way.
> Any help is appreciated.
>
> thanks
>
> The above terms reflect a potential business arrangement, are provided solely
> as a basis for further discussion, and are not intended to be and do not
> constitute a legally binding obligation. No legally binding obligations will
> be created, implied, or inferred until an agreement in final form is executed
> in writing by all parties involved.
>
> This email and any attachments hereto may be confidential or privileged.
>  If you received this communication by mistake, please don't forward it
> to anyone else, please erase all copies and attachments, and please let
> me know that it has gone to the wrong person. Thanks.
>



-- 
Nitin Pawar
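
A possible explanation for the all-1s output, offered only as a sketch: rank UDFs of this kind are often written to keep a counter that increments while the arguments equal those of the previous row and resets to 1 when they change. Whether GenericUDFRank works this way has not been checked here, so that behaviour is an assumption, and StatefulRankSketch below is a hypothetical plain-Java stand-in, not Hive code. Under that assumption, a column that is unique per row (userId) resets the counter on every row, while the grouping columns give a running rank inside each group.

```java
import java.util.Arrays;

// Hypothetical stand-in for a stateful rank UDF; NOT the GenericUDFRank source.
final class StatefulRankSketch {
    private Object[] previousKey; // arguments seen on the previous row
    private int counter;          // rank within the current run of equal keys

    // Returns 1 when the key differs from the previous row, otherwise previous rank + 1.
    int rank(Object... key) {
        if (previousKey != null && Arrays.equals(previousKey, key)) {
            counter++;
        } else {
            counter = 1;
            previousKey = key.clone();
        }
        return counter;
    }

    // Ranks every row in order, using the columns at the given indexes as the key.
    static int[] rankAll(Object[][] rows, int... keyColumns) {
        StatefulRankSketch udf = new StatefulRankSketch();
        int[] ranks = new int[rows.length];
        for (int i = 0; i < rows.length; i++) {
            Object[] key = new Object[keyColumns.length];
            for (int k = 0; k < keyColumns.length; k++) {
                key[k] = rows[i][keyColumns[k]];
            }
            ranks[i] = udf.rank(key);
        }
        return ranks;
    }

    public static void main(String[] args) {
        // Rows in the order they would arrive after sorting by Country, City, userId.
        Object[][] rows = {
            {"US", "NJ", 10}, {"US", "NJ", 26},
            {"US", "NY", 3}, {"US", "NY", 8}, {"US", "NY", 12}
        };
        System.out.println(Arrays.toString(rankAll(rows, 2)));    // key = userId: [1, 1, 1, 1, 1]
        System.out.println(Arrays.toString(rankAll(rows, 0, 1))); // key = Country, City: [1, 2, 1, 2, 3]
    }
}
```

If the real UDF behaves like this sketch, the arguments to rank act as the group key, so it is worth checking the UDF's own documentation for which columns it expects.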


Reminder: Bay Area Hive user meetup at LinkedIn tomorrow (7/24)

2013-07-23 Thread Mohammad Islam
Please join us at LinkedIn (2025 Stierlin Court, Mountain View, CA) tomorrow at 
6 PM for this month's Hive user meetup. We have a packed agenda:
* Hive at LinkedIn, Mohammad Islam, Mark Wagner, and Karthik Ramasamy
* Hive Server 2 at Yahoo!, Chris Drome
* Hive on Tez, Gunther Hagleitner
* Spatial Analytics with Hive, Carter Shaklin
* Cloudera presentation, Arvind and Shreepadma
The sessions (not the beer) will be streamed at: 
http://www.ustream.tv/linkedin-events
The recordings will be available shortly after the meet up.

More details are available at 
http://www.meetup.com/Hive-User-Group-Meeting/events/126986902/

We hope to see you there!

Re: Calling same UDF multiple times in a SELECT query

2013-07-23 Thread Navis류승우
It will be called 4 times, whatever annotation you put on the UDF, if you are
using a released version of Hive.

https://issues.apache.org/jira/browse/HIVE-4209, which will be
included in 0.12.0, will reduce that to a single UDF call by caching the result.
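
Until a release with that change is in use, one possible workaround is to cache results inside the UDF itself. The sketch below is plain Java, not Hive API: MemoizedCallSketch and the lambda standing in for the web-service call are hypothetical names, and it assumes the function is deterministic, as discussed in this thread. It is not a description of how HIVE-4209 is implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Hypothetical helper, not part of Hive: remembers results of a deterministic int function.
final class MemoizedCallSketch {
    private final IntUnaryOperator expensive;                    // stand-in for the web-service call
    private final Map<Integer, Integer> cache = new HashMap<>(); // input -> remembered result
    private int realCalls;                                       // how often the service was really hit

    MemoizedCallSketch(IntUnaryOperator expensive) {
        this.expensive = expensive;
    }

    // Returns the cached result if this input was seen before, otherwise calls through once.
    int apply(int a) {
        Integer cached = cache.get(a);
        if (cached != null) {
            return cached;
        }
        realCalls++;
        int result = expensive.applyAsInt(a);
        cache.put(a, result);
        return result;
    }

    int realCalls() {
        return realCalls;
    }

    public static void main(String[] args) {
        MemoizedCallSketch fooUdf = new MemoizedCallSketch(x -> x - 10);
        int a = 3;
        // Same shape as the query in this thread: the function appears four times.
        int b = fooUdf.apply(a) < -1 ? -1 : fooUdf.apply(a);
        int c = fooUdf.apply(a) < -1 ? fooUdf.apply(a) : 0;
        System.out.println(b + " " + c + " realCalls=" + fooUdf.realCalls());
    }
}
```

The cache here is unbounded and lives in a single JVM, so in a real job each task would presumably build its own copy, and a size limit would likely be needed for large inputs.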

2013/7/24 Sanjay Subramanian :
> Thanks Jan
>
> I will mod my UDF and test it out
>
> I want to make sure I understand your words here
> "The obvious condition is that it must always return the identical result
> when called with same parameters."
>
> If I can make sure that a call to the web service is successful it will
> always return same output for a given set of input
>
> F(x1, y1) -> will always equal -> z1
>
> that’s what u mean right ?
>
> sanjay
>
> From: Jan Dolinár 
> Reply-To: "user@hive.apache.org" 
> Date: Tuesday, July 23, 2013 12:35 PM
> To: user 
>
> Subject: Re: Calling same UDF multiple times in a SELECT query
>
> Hi,
>
> If you use this annotation, Hive should be able to optimize it to a single call:
>
>  @UDFType(deterministic = true)
>
> The obvious condition is that it must always return the identical result
> when called with same parameters.
>
> A little more on this can be found in Mark Grover's post at
> http://mark.thegrovers.ca/1/post/2012/06/how-to-write-a-hive-udf.html.
>
> Regards,
> Jan
>
>
> On Tue, Jul 23, 2013 at 9:25 PM, Nitin Pawar 
> wrote:
>>
>> Function return values are not stored for repeated use (as per my
>> understanding).
>>
>> I know you may have already thought about another approach, such as:
>>
>> select a, if(call < -1, -1, call) as b from (select a, fooudf(a) as call
>> from table) t
>>
>>
>>
>>
>> On Wed, Jul 24, 2013 at 12:42 AM, Sanjay Subramanian
>>  wrote:
>>>
>>> Hi
>>>
>>> V r using version hive-exec-0.9.0-cdh4.1.2 in production
>>>
>>> I need to check and use the output from a UDF in a query to assign values
>>> to 2 columns in a SELECT query
>>>
>>> Example
>>>
>>> SELECT
>>>  a,
>>>  IF(fooUdf(a) < -1  , -1, fooUdf(a)) as b,
>>>  IF(fooUdf(a) < -1  , fooUdf(a), 0) as c
>>> FROM
>>>  my_hive_table
>>>
>>>
>>> So will fooUdf be called 4 times ? Or once ?
>>>
>>> Why this is important is because in our case this UDF calls a web service
>>> and I don't want so many calls to the service.
>>>
>>> Thanks
>>>
>>> sanjay
>>>
>>>
>>>
>>> CONFIDENTIALITY NOTICE
>>> ==
>>> This email message and any attachments are for the exclusive use of the
>>> intended recipient(s) and may contain confidential and privileged
>>> information. Any unauthorized review, use, disclosure or distribution is
>>> prohibited. If you are not the intended recipient, please contact the sender
>>> by reply email and destroy all copies of the original message along with any
>>> attachments, from your computer system. If you are the intended recipient,
>>> please be advised that the content of this message is subject to access,
>>> review and disclosure by the sender's Email System Administrator.
>>
>>
>>
>>
>> --
>> Nitin Pawar
>
>
>


GenericUDFRank UDF is not working as expected

2013-07-23 Thread Shahar Glixman
Hello,

I'm trying to use GenericUDFRank, described in
https://issues.apache.org/jira/browse/HIVE-2361. However, no matter which
query I use, the result is not what I expect.
Assume a user hive table with the format:
Country, City, userId

I'm running the following query:

ADD JAR Rank.jar;
CREATE TEMPORARY FUNCTION rank AS
'com.nexr.platform.analysis.udf.GenericUDFRank';

SELECT
  Country,
  City,
  rank(userId)

FROM
  myTable

DISTRIBUTE BY
  Country,
  City

SORT BY
  Country,
  City,
  userId;

For the following table:
US NY 8
US NY 12
US NY 3
US NJ 10
US NJ 26

I'm expecting the following result:
US NY 1
US NY 2
US NY 3
US NJ 1
US NJ 2

But I get:
US NY 1
US NY 1
US NY 1
US NJ 1
US NJ 1

I also tried a different rank implementation
(http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/doing_rank_with_hive),
but the results were similar. I guess I'm using the UDF the wrong way, but I can't find the
correct way.
Any help is appreciated.

thanks



Re: Calling same UDF multiple times in a SELECT query

2013-07-23 Thread Sanjay Subramanian
Thanks Jan

I will mod my UDF and test it out

I want to make sure I understand your words here
"The obvious condition is that it must always return the identical result when 
called with same parameters."

If I can make sure that a call to the web service is successful it will always 
return same output for a given set of input

F(x1, y1) -> will always equal -> z1

that’s what u mean right ?

sanjay

From: Jan Dolinár <dolik@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Tuesday, July 23, 2013 12:35 PM
To: user <user@hive.apache.org>
Subject: Re: Calling same UDF multiple times in a SELECT query

Hi,

If you use this annotation, Hive should be able to optimize it to a single call:

 @UDFType(deterministic = true)

The obvious condition is that it must always return the identical result when 
called with same parameters.

A little more on this can be found in Mark Grover's post at 
http://mark.thegrovers.ca/1/post/2012/06/how-to-write-a-hive-udf.html.

Regards,
Jan


On Tue, Jul 23, 2013 at 9:25 PM, Nitin Pawar 
<nitinpawar...@gmail.com> wrote:
Function return values are not stored for repeated use (as per my 
understanding).

I know you may have already thought about another approach, such as:

select a, if(call < -1, -1, call) as b from (select a, fooudf(a) as call from 
table) t




On Wed, Jul 24, 2013 at 12:42 AM, Sanjay Subramanian 
<sanjay.subraman...@wizecommerce.com> wrote:
Hi

V r using version hive-exec-0.9.0-cdh4.1.2 in production

I need to check and use the output from a UDF in a query to assign values to 2 
columns in a SELECT query

Example

SELECT
 a,
 IF(fooUdf(a) < -1  , -1, fooUdf(a)) as b,
 IF(fooUdf(a) < -1  , fooUdf(a), 0) as c
FROM
 my_hive_table


So will fooUdf be called 4 times ? Or once ?

Why this is important is because in our case this UDF calls a web service and I 
don't want so many calls to the service.

Thanks

sanjay






--
Nitin Pawar




Re: Calling same UDF multiple times in a SELECT query

2013-07-23 Thread Sanjay Subramanian
Hi Nitin

Thanks
Yes, I did actually try a nested query, but it spawns reducers that I did not 
want. I wanted to keep it to one SELECT so that only mappers are called, and then 
I can invoke several mappers to call the web service.

Thanks

sanjay

From: Nitin Pawar <nitinpawar...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Tuesday, July 23, 2013 12:25 PM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: Calling same UDF multiple times in a SELECT query

Function return values are not stored for repeated use (as per my 
understanding).

I know you may have already thought about another approach, such as:

select a, if(call < -1, -1, call) as b from (select a, fooudf(a) as call from 
table) t




On Wed, Jul 24, 2013 at 12:42 AM, Sanjay Subramanian 
<sanjay.subraman...@wizecommerce.com> wrote:
Hi

V r using version hive-exec-0.9.0-cdh4.1.2 in production

I need to check and use the output from a UDF in a query to assign values to 2 
columns in a SELECT query

Example

SELECT
 a,
 IF(fooUdf(a) < -1  , -1, fooUdf(a)) as b,
 IF(fooUdf(a) < -1  , fooUdf(a), 0) as c
FROM
 my_hive_table


So will fooUdf be called 4 times ? Or once ?

Why this is important is because in our case this UDF calls a web service and I 
don't want so many calls to the service.

Thanks

sanjay






--
Nitin Pawar



Re: Calling same UDF multiple times in a SELECT query

2013-07-23 Thread Jan Dolinár
Hi,

If you use this annotation, Hive should be able to optimize it to a single call:

 @UDFType(deterministic = true)

The obvious condition is that it must always return the identical result
when called with same parameters.

A little more on this can be found in Mark Grover's post at
http://mark.thegrovers.ca/1/post/2012/06/how-to-write-a-hive-udf.html.

Regards,
Jan


On Tue, Jul 23, 2013 at 9:25 PM, Nitin Pawar wrote:

> Function return values are not stored for repeated use (as per my
> understanding).
>
> I know you may have already thought about another approach, such as:
>
> select a, if(call < -1, -1, call) as b from (select a, fooudf(a) as call
> from table) t
>
>
>
>
> On Wed, Jul 24, 2013 at 12:42 AM, Sanjay Subramanian <
> sanjay.subraman...@wizecommerce.com> wrote:
>
>>  Hi
>>
>>  V r using version hive-exec-0.9.0-cdh4.1.2 in production
>>
>>  I need to check and use the output from a UDF in a query to assign
>> values to 2 columns in a SELECT query
>>
>>  Example
>>
>>  SELECT
>>  a,
>>  IF(fooUdf(a) < -1  , -1, fooUdf(a)) as b,
>>  IF(fooUdf(a) < -1  , fooUdf(a), 0) as c
>> FROM
>>  my_hive_table
>>
>>
>>  So will fooUdf be called 4 times ? Or once ?
>>
>>  Why this is important is because in our case this UDF calls a web
>> service and I don't want so many calls to the service.
>>
>>  Thanks
>>
>>  sanjay
>>
>>
>>
>>
>
>
>
> --
> Nitin Pawar
>


Re: Calling same UDF multiple times in a SELECT query

2013-07-23 Thread Nitin Pawar
Function return values are not stored for repeated use (as per my
understanding).

I know you may have already thought about another approach, such as:

select a, if(call < -1, -1, call) as b from (select a, fooudf(a) as call
from table) t
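
To make the intent of that subquery concrete, here is the per-row logic it expresses, as a small plain-Java sketch: the UDF result is obtained once (the variable call) and both output columns are derived from it. DeriveOnceSketch and derive are hypothetical names, and the c column is taken from the query in the original question.

```java
// Hypothetical row-level view of the subquery approach: one call, two derived columns.
final class DeriveOnceSketch {
    // Mirrors IF(call < -1, -1, call) AS b and IF(call < -1, call, 0) AS c.
    static int[] derive(int call) {
        int b = call < -1 ? -1 : call;
        int c = call < -1 ? call : 0;
        return new int[] {b, c};
    }

    public static void main(String[] args) {
        int call = -7; // pretend this is the single fooUdf(a) result for one row
        int[] bc = derive(call);
        System.out.println("b=" + bc[0] + " c=" + bc[1]); // b=-1 c=-7
    }
}
```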




On Wed, Jul 24, 2013 at 12:42 AM, Sanjay Subramanian <
sanjay.subraman...@wizecommerce.com> wrote:

>  Hi
>
>  V r using version hive-exec-0.9.0-cdh4.1.2 in production
>
>  I need to check and use the output from a UDF in a query to assign
> values to 2 columns in a SELECT query
>
>  Example
>
>  SELECT
>  a,
>  IF(fooUdf(a) < -1  , -1, fooUdf(a)) as b,
>  IF(fooUdf(a) < -1  , fooUdf(a), 0) as c
> FROM
>  my_hive_table
>
>
>  So will fooUdf be called 4 times ? Or once ?
>
>  Why this is important is because in our case this UDF calls a web
> service and I don't want so many calls to the service.
>
>  Thanks
>
>  sanjay
>
>
>
>



-- 
Nitin Pawar


Calling same UDF multiple times in a SELECT query

2013-07-23 Thread Sanjay Subramanian
Hi

V r using version hive-exec-0.9.0-cdh4.1.2 in production

I need to check and use the output from a UDF in a query to assign values to 2 
columns in a SELECT query

Example

SELECT
 a,
 IF(fooUdf(a) < -1  , -1, fooUdf(a)) as b,
 IF(fooUdf(a) < -1  , fooUdf(a), 0) as c
FROM
 my_hive_table


So will fooUdf be called 4 times ? Or once ?

Why this is important is because in our case this UDF calls a web service and I 
don't want so many calls to the service.

Thanks

sanjay





Re: Hive-0.11.0 HCatalog configuration

2013-07-23 Thread Alan Gates

On Jul 23, 2013, at 1:03 AM, nabhajit wrote:

> Hi,
> 
> 
> I am trying to configure HCatalog, which is now part of Hive-0.11.0.
> 
> Do I have to make changes to the permission of the following files?
> 
> $HCAT_HOME/bin/hcat and $HCAT_HOME/sbin/webhcat-server.sh

Yes, this is a known issue and has been fixed in trunk.

> 
> as currently they do not have execute permission.
> 
> Also, does the webhcat-default.xml file need to be renamed to webhcat-site.xml?

No, you only need to create a webhcat-site.xml and populate it with values if 
you need to change the defaults.  You can put only values that are different 
from the defaults in webhcat-site.xml.  The format is the same as 
webhcat-default.xml.

Alan.
  
> 
> I am following  the below link for configuration:
> 
> http://hive.apache.org/docs/hcat_r0.5.0/configuration.html
> 
> 
> Thanks,
> 
> Nabhajit
> 
> 



Does HiveServer2 support delegation token?

2013-07-23 Thread Bing Li
Hi, all
HiveMetastore supports delegation tokens.
Does HiveServer2 support them as well? If not, is there a plan for this?

Besides, the Hive wiki says:
hive.server2.authentication - Authentication mode, default NONE. Options
are NONE, KERBEROS, LDAP and CUSTOM

Will HiveServer2 also support PAM, which could be configured to use multiple
authentication methods such as OS or LDAP?



Thanks,
- Bing


TimestampWritable in UDAF

2013-07-23 Thread Rouzbeh Safaie
Hi guys (hope this is the correct mailing list),

I am writing a custom UDAF in Hive. One of the fields I am trying to merge
is a Timestamp, and in the terminatePartial function I have something like:

((TimestampWritable) partialResult[0]).set(dateBuffer.logtime);

I have checked that the correct Timestamp is being written at this point
but when I get to the merge function it is not reading the value written in
the terminatePartial function, instead the Timestamp object it finds is the
same as what is in the current AggregationBuffer.

I am sure everything else is set up correctly, because if I convert the
Timestamp column to a long and use a LongWritable instead of a TimestampWritable,
everything works as expected (the merge function picks up the correct value
from the params passed in).

Could this be related to:

https://issues.apache.org/jira/browse/HIVE-4516

I don't have the patch applied in our environment at work and doing so
would be a bit of a pain, just wondered if someone could shed some light on
whether this is the problem as I am very new to all this.

Thanks,
Rouz
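
One thing that may be worth ruling out, purely as an assumption and not something confirmed in this thread or by HIVE-4516: java.sql.Timestamp is mutable, so if a partial result only holds a reference to the Timestamp in the aggregation buffer, later changes to the buffer show up in the partial result too. The plain-Java sketch below shows that effect and a defensive copy; TimestampCopySketch and copyOf are hypothetical names, and no Hive classes are involved.

```java
import java.sql.Timestamp;

// Hypothetical demo of reference sharing vs. copying with a mutable Timestamp.
final class TimestampCopySketch {
    // Returns an independent copy, so later mutation of the source is not visible in it.
    static Timestamp copyOf(Timestamp source) {
        Timestamp copy = new Timestamp(source.getTime());
        copy.setNanos(source.getNanos()); // keep sub-millisecond precision
        return copy;
    }

    public static void main(String[] args) {
        Timestamp buffer = new Timestamp(1000L); // value held in an aggregation buffer
        Timestamp shared = buffer;               // partial result that only holds a reference
        Timestamp copied = copyOf(buffer);       // partial result holding its own copy

        buffer.setTime(5000L);                   // buffer is reused for a later value

        System.out.println(shared.getTime());    // 5000: follows the buffer
        System.out.println(copied.getTime());    // 1000: keeps the earlier value
    }
}
```

If copying the value before handing it to the partial result changes the behaviour, that would point to shared state rather than to the serialization path.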


Hive-0.11.0 HCatalog configuration

2013-07-23 Thread nabhajit

Hi,


I am trying to configure HCatalog, which is now part of Hive-0.11.0.

Do I have to make changes to the permissions of the following files?

$HCAT_HOME/bin/hcat and $HCAT_HOME/sbin/webhcat-server.sh

as currently they do not have execute permission.

Also, does the webhcat-default.xml file need to be renamed to webhcat-site.xml?

I am following the link below for configuration:

http://hive.apache.org/docs/hcat_r0.5.0/configuration.html


Thanks,

Nabhajit