Re: Hive UDAF extending UDAF class: iterate or evaluate method
Sounds like the wikidoc needs some work. I'm open to suggestions. If Sanjay's simple UDF helps, I could put it in the wiki along with any advice you think would help. Does anyone else have use cases to contribute?

-- Lefty

On Mon, Aug 5, 2013 at 2:45 PM, Sanjay Subramanian sanjay.subraman...@wizecommerce.com wrote:

Hi Ritesh,

To help you get started, I am writing a simple HelloWorld-ish UDF that might help. If it doesn't, please ask for more clarification.

Good luck.

Thanks,
sanjay

ToUpperCase.java

    package com.sanjaysubramanian.utils.hive.udf;

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.hadoop.hive.ql.exec.UDF;

    public final class ToUpperCase extends UDF {
        protected final Log logger = LogFactory.getLog(ToUpperCase.class);

        // Hive calls evaluate() once per row; a null input yields a null output.
        public String evaluate(final String inputString) {
            if (inputString != null) {
                return inputString.toUpperCase();
            } else {
                return inputString;
            }
        }
    }

Usage in a Hive script

    hive -e "create temporary function toupper as 'com.sanjaysubramanian.utils.hive.udf.ToUpperCase'; SELECT first_name, toupper(first_name) FROM company_names"
Re: Hive UDAF extending UDAF class: iterate or evaluate method
Hi Sanjay, Lefty,

Thanks for the help, but none of the above responses directly answers my question (probably I am not asking clearly enough :-( ). Below are two different structures for a UDAF (aggregation function). My question is which one is the preferred/right approach:

http://pastebin.com/QCgd4Hxc : This version is based on what I could understand from the API docs about the UDAF class.
http://pastebin.com/Uctamtek : This version is based on the book Hadoop: The Definitive Guide. Notice that the function names are different from the first one.

I hope this clarifies my question.

Thanks,
Ritesh
Re: Hive UDAF extending UDAF class: iterate or evaluate method
Please follow the guidance for UDFs provided in the Programming Hive book by Wampler/Capriolo. That will work for you. I can say with confidence that their book was mighty helpful to me in my project from start to production. I would also recommend picking one approach, implementing it, and then fine-tuning; otherwise you will be stuck in analysis-paralysis mode. We are all on a path of discovery here...

Regards,
sanjay
Re: Hive UDAF extending UDAF class: iterate or evaluate method
Hi Lefty,

I used the wiki you sent to write my first version of the UDAF. However, I found it to be utterly complex, especially for storing partial results, as I am not very familiar with the Hive API. Then I found another example of a UDAF in the book Hadoop: The Definitive Guide, and it had much simpler code but used different methods. Instead of using iterate, it was using an evaluate method, so I am getting confused.

Ritesh
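On the point about storing partial results: in the simpler style, where an evaluator class exposes init, iterate, terminatePartial, merge and terminate, the partial result can be a small holder object that terminatePartial returns and merge accepts. Below is a minimal sketch of that idea for a mean. It is plain Java with no Hive imports so that it compiles on its own; the names MeanUdafSketch, PartialMean and MeanEvaluator are hypothetical, and in a real UDAF the evaluator would implement org.apache.hadoop.hive.ql.exec.UDAFEvaluator inside a class extending UDAF.

```java
// Standalone sketch (no Hive dependency): shows how a partial result can be
// carried between stages as a small state object. All names are hypothetical.
class MeanUdafSketch {

    // Partial result: enough information to combine two partial means exactly.
    static class PartialMean {
        long count;
        double sum;
    }

    static class MeanEvaluator {
        private final PartialMean state = new PartialMean();

        // Reset the aggregation state.
        public void init() {
            state.count = 0;
            state.sum = 0.0;
        }

        // Called once per input row; null values are skipped.
        public boolean iterate(Double value) {
            if (value != null) {
                state.count++;
                state.sum += value;
            }
            return true;
        }

        // Hand the partial state to the next stage (null if no rows were seen).
        public PartialMean terminatePartial() {
            return state.count == 0 ? null : state;
        }

        // Fold another evaluator's partial state into this one.
        public boolean merge(PartialMean other) {
            if (other != null) {
                state.count += other.count;
                state.sum += other.sum;
            }
            return true;
        }

        // Final result: the mean, or null if no rows were seen.
        public Double terminate() {
            return state.count == 0 ? null : state.sum / state.count;
        }
    }

    public static void main(String[] args) {
        // Two evaluators stand in for two map-side partial aggregations.
        MeanEvaluator first = new MeanEvaluator();
        first.init();
        first.iterate(1.0);
        first.iterate(2.0);

        MeanEvaluator second = new MeanEvaluator();
        second.init();
        second.iterate(6.0);

        // A third evaluator stands in for the reduce side.
        MeanEvaluator combined = new MeanEvaluator();
        combined.init();
        combined.merge(first.terminatePartial());
        combined.merge(second.terminatePartial());

        System.out.println(combined.terminate()); // prints 3.0
    }
}
```

Keeping count and sum rather than a running mean is the design choice that matters here: two partial means cannot be combined correctly without knowing how many rows each one covers.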
Hive UDAF extending UDAF class: iterate or evaluate method
Hi all,

I am trying to write a UDAF. I found an example that shows how to implement a UDAF in the book Hadoop: The Definitive Guide. However, I am a little confused. In the book, the author extends the UDAF class and implements the init, iterate, terminatePartial, merge and terminate functions. However, looking at the Hive docs (http://hive.apache.org/docs/r0.11.0/api/org/apache/hadoop/hive/ql/exec/UDAF.html), it seems I need to implement the init, aggregate, evaluatePartial, aggregatePartial and evaluate functions. Please let me know which are the right functions to implement.

Ritesh
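For reference, here is a minimal sketch of the book-style lifecycle named in this question (init, iterate, terminatePartial, merge, terminate). As far as I know, these are the method names Hive looks up by reflection on the evaluator of a class that extends UDAF. The sketch is plain Java with no Hive imports so that it compiles on its own: MaxUdafSketch and MaxEvaluator are hypothetical names, and Integer stands in for a Writable type. In a real UDAF the evaluator would implement org.apache.hadoop.hive.ql.exec.UDAFEvaluator.

```java
// Standalone sketch (no Hive dependency): mirrors the five lifecycle methods
// so the call order can be followed. All class names are hypothetical.
class MaxUdafSketch {

    static class MaxEvaluator {
        private Integer max; // null means "no rows seen yet"

        // Reset state so the evaluator can be reused.
        public void init() {
            max = null;
        }

        // Called once per input row; null values are ignored.
        public boolean iterate(Integer value) {
            if (value != null && (max == null || value > max)) {
                max = value;
            }
            return true;
        }

        // Partial result passed from one stage to the next.
        public Integer terminatePartial() {
            return max;
        }

        // Fold in a partial result produced by another evaluator.
        public boolean merge(Integer other) {
            return iterate(other);
        }

        // Final result of the aggregation.
        public Integer terminate() {
            return max;
        }
    }

    public static void main(String[] args) {
        // Two evaluators stand in for two map-side partial aggregations.
        MaxEvaluator mapperA = new MaxEvaluator();
        mapperA.init();
        for (int v : new int[] {3, 9, 4}) {
            mapperA.iterate(v);
        }

        MaxEvaluator mapperB = new MaxEvaluator();
        mapperB.init();
        for (int v : new int[] {7, 1}) {
            mapperB.iterate(v);
        }

        // A third evaluator stands in for the reduce side.
        MaxEvaluator reducer = new MaxEvaluator();
        reducer.init();
        reducer.merge(mapperA.terminatePartial());
        reducer.merge(mapperB.terminatePartial());

        System.out.println(reducer.terminate()); // prints 9
    }
}
```

For a maximum, the partial result and the final result have the same type, which is why merge can simply delegate to iterate.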
Re: Hive UDAF extending UDAF class: iterate or evaluate method
You might find this wikidoc useful: GenericUDAFCaseStudy (https://cwiki.apache.org/confluence/display/Hive/GenericUDAFCaseStudy).

The O'Reilly book Programming Hive also has a section called User-Defined Aggregate Functions in chapter 13 (Functions), pages 172 to 176.

-- Lefty