Re: Resolvers for UDAFs

Zheng Shao Wed, 03 Feb 2010 23:44:26 -0800

Yes, it should be:

SELECT customer_id, topx(2, product_id, product_count)
FROM products_bought
GROUP BY customer_id;
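
For the evaluator side, here is a minimal sketch of what the three-argument iterate could look like once the group-by column is moved out of the UDAF. It is plain Java so that it compiles without the Hive jars. The class name TopXSketch, the Entry holder, and the sort-and-trim approach are all assumptions for illustration, not the actual TopXPerGroup code. A real evaluator would also implement Hive's UDAFEvaluator interface and handle partial aggregation (terminatePartial and merge).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a Hive UDAF evaluator. It deliberately does not
// implement Hive's UDAFEvaluator so that it compiles on its own.
final class TopXSketch {
    // One (attribute, count) pair, e.g. (product_id, product_count).
    static final class Entry {
        final int attribute;
        final int count;

        Entry(int attribute, int count) {
            this.attribute = attribute;
            this.count = count;
        }
    }

    // Entries kept so far for the current group, highest count first.
    private final List<Entry> top = new ArrayList<>();

    // Reset the state at the start of each group.
    public void init() {
        top.clear();
    }

    // Called once per row of the current group. The group-by column is not
    // an argument: grouping is left to the GROUP BY clause.
    public boolean iterate(int max, int attribute, int count) {
        top.add(new Entry(attribute, count));
        // Stable sort by count, descending; ties keep insertion order.
        top.sort(Comparator.comparingInt((Entry e) -> e.count).reversed());
        if (top.size() > max) {
            top.remove(top.size() - 1); // drop the smallest entry
        }
        return true;
    }

    // Return a copy of the top entries for the current group.
    public List<Entry> terminate() {
        return new ArrayList<>(top);
    }
}
```

With GROUP BY customer_id, iterate would be called once per row of each customer, so the state only ever holds that customer's top entries.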




On Wed, Feb 3, 2010 at 11:31 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:
> Hi Zheng,
>
> Wouldn't the query you mentioned need a GROUP BY clause? I need the top x
> customers per product id. Sorry, can you please explain?
>
> Thanks and Regards,
> Sonal
>
>
> On Thu, Feb 4, 2010 at 12:07 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:
>>
>> Hi Zheng,
>>
>> Thanks for your email and your feedback. I will try to change the code as
>> suggested by you.
>>
>> Here is the output of describe:
>>
>> hive> describe products_bought;
>> OK
>> product_id    int
>> customer_id    int
>> product_count    int
>>
>>
>> My function was working fine earlier with this table and iterate(int, int,
>> int, int). Once I introduced the other iterate, it stopped working.
>>
>>
>> Thanks and Regards,
>> Sonal
>>
>>
>> On Thu, Feb 4, 2010 at 11:37 AM, Zheng Shao <zsh...@gmail.com> wrote:
>>>
>>> Hi Sonal,
>>>
>>> 1. We usually move the group_by column out of the UDAF - just like we
>>> do "SELECT key, sum(value) FROM table".
>>>
>>> I think you should write:
>>>
>>> SELECT customer_id, topx(2, product_id, product_count)
>>> FROM products_bought
>>>
>>> and in topx:
>>> public boolean iterate(int max, int attribute, int count).
>>>
>>>
>>> 2. Can you run "describe products_bought"? Does product_count column
>>> have type "int"?
>>>
>>> You might want to try removing the other iterate function to see
>>> whether that solves the problem.
>>>
>>>
>>> Zheng
>>>
>>>
>>> On Wed, Feb 3, 2010 at 9:58 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:
>>> > Hi Zheng,
>>> >
>>> > My query is:
>>> >
>>> > select a.myTable.key, a.myTable.attribute, a.myTable.count
>>> > from (select explode(t.pc) as myTable
>>> >       from (select topx(2, product_id, customer_id, product_count) as pc
>>> >             from (select product_id, customer_id, product_count
>>> >                   from products_bought
>>> >                   order by product_id, product_count desc) r) t) a;
>>> >
>>> > My overloaded iterators are:
>>> >
>>> > public boolean iterate(int max, int groupBy, int attribute, int count)
>>> >
>>> > public boolean iterate(int max, int groupBy, int attribute, double count)
>>> >
>>> > Before overloading, my query was running fine. My table products_bought
>>> > is:
>>> > product_id int, customer_id int, product_count int
>>> >
>>> > And I get:
>>> > FAILED: Error in semantic analysis: Ambiguous method for class
>>> > org.apache.hadoop.hive.udaf.TopXPerGroup with [int, int, int, int]
>>> >
>>> > The hive logs say:
>>> > 2010-02-03 11:18:15,721 ERROR processors.DeleteResourceProcessor (SessionState.java:printError(255)) - Usage: delete [FILE|JAR|ARCHIVE] <value> [<value>]*
>>> > 2010-02-03 11:22:14,663 ERROR ql.Driver (SessionState.java:printError(255)) - FAILED: Error in semantic analysis: Ambiguous method for class org.apache.hadoop.hive.udaf.TopXPerGroup with [int, int, int, int]
>>> > org.apache.hadoop.hive.ql.exec.AmbiguousMethodException: Ambiguous method for class org.apache.hadoop.hive.udaf.TopXPerGroup with [int, int, int, int]
>>> >         at org.apache.hadoop.hive.ql.exec.DefaultUDAFEvaluatorResolver.getEvaluatorClass(DefaultUDAFEvaluatorResolver.java:83)
>>> >         at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge.getEvaluator(GenericUDAFBridge.java:57)
>>> >         at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getGenericUDAFEvaluator(FunctionRegistry.java:594)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFEvaluator(SemanticAnalyzer.java:1882)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:2270)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:2821)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:4543)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5058)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:4999)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5020)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:4999)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5020)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5587)
>>> >         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:114)
>>> >         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:317)
>>> >         at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:370)
>>> >         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:362)
>>> >         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:140)
>>> >         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:200)
>>> >         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:311)
>>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> >         at java.lang.reflect.Method.invoke(Method.java:597)
>>> >         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>> >
>>> >
>>> >
>>> > Thanks and Regards,
>>> > Sonal
>>> >
>>> >
>>> > On Thu, Feb 4, 2010 at 12:12 AM, Zheng Shao <zsh...@gmail.com> wrote:
>>> >>
>>> >> Can you post the Hive query? What are the types of the parameters that
>>> >> you passed to the function?
>>> >>
>>> >> Zheng
>>> >>
>>> >> On Wed, Feb 3, 2010 at 3:23 AM, Sonal Goyal <sonalgoy...@gmail.com> wrote:
>>> >> > Hi,
>>> >> >
>>> >> > I am writing a UDAF which takes in 4 parameters. I have 2 cases: one
>>> >> > where all the parameters are ints, and a second where the last
>>> >> > parameter is a double. I wrote two evaluators for this, with iterate as
>>> >> >
>>> >> > public boolean iterate(int max, int groupBy, int attribute, int count)
>>> >> >
>>> >> > and
>>> >> >
>>> >> > public boolean iterate(int max, int groupBy, int attribute, double count)
>>> >> >
>>> >> > However, when I run a query, I get the exception:
>>> >> > org.apache.hadoop.hive.ql.exec.AmbiguousMethodException: Ambiguous method for class org.apache.hadoop.hive.udaf.TopXPerGroup with [int, int, int, int]
>>> >> >         at org.apache.hadoop.hive.ql.exec.DefaultUDAFEvaluatorResolver.getEvaluatorClass(DefaultUDAFEvaluatorResolver.java:83)
>>> >> >         at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge.getEvaluator(GenericUDAFBridge.java:57)
>>> >> >         at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getGenericUDAFEvaluator(FunctionRegistry.java:594)
>>> >> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFEvaluator(SemanticAnalyzer.java:1882)
>>> >> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:2270)
>>> >> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:2821)
>>> >> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:4543)
>>> >> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5058)
>>> >> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:4999)
>>> >> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5020)
>>> >> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:4999)
>>> >> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5020)
>>> >> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5587)
>>> >> >         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:114)
>>> >> >         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:317)
>>> >> >         at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:370)
>>> >> >         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:362)
>>> >> >         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:140)
>>> >> >         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:200)
>>> >> >         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:311)
>>> >> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> >> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>> >> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> >> >         at java.lang.reflect.Method.invoke(Method.java:597)
>>> >> >         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>> >> >
>>> >> > One option for me is to write a resolver, which I will do. But I just
>>> >> > wanted to know if this is a bug in Hive whereby it is not able to get
>>> >> > the right evaluator, or if this is a gap in my understanding.
>>> >> >
>>> >> > I look forward to hearing your views on this.
>>> >> >
>>> >> > Thanks and Regards,
>>> >> > Sonal
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Yours,
>>> >> Zheng
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Yours,
>>> Zheng
>>
>
>



-- 
Yours,
Zheng
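
On the "Ambiguous method ... with [int, int, int, int]" error quoted in the thread: one plausible reading is that the resolver treats an int argument as acceptable for a double parameter, so both iterate overloads qualify. The sketch below illustrates that idea and the kind of check a custom resolver could make. It uses plain Java reflection only. The class names, the accepts rule (int may widen to double), and the exact-match-first policy are assumptions for illustration, not the actual logic of Hive's DefaultUDAFEvaluatorResolver.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OverloadSketch {
    // Hypothetical evaluator carrying the two overloads from the thread.
    static final class Evaluator {
        public boolean iterate(int max, int groupBy, int attribute, int count) {
            return true;
        }

        public boolean iterate(int max, int groupBy, int attribute, double count) {
            return true;
        }
    }

    // Assumed conversion rule: a type matches itself, and int may widen to double.
    static boolean accepts(Class<?> param, Class<?> arg) {
        return param == arg || (param == double.class && arg == int.class);
    }

    // Collect every iterate overload whose parameters accept the argument types.
    static List<Method> candidates(Class<?> cls, Class<?>... args) {
        List<Method> out = new ArrayList<>();
        for (Method m : cls.getDeclaredMethods()) {
            Class<?>[] params = m.getParameterTypes();
            if (!m.getName().equals("iterate") || params.length != args.length) {
                continue;
            }
            boolean ok = true;
            for (int i = 0; i < args.length; i++) {
                ok &= accepts(params[i], args[i]);
            }
            if (ok) {
                out.add(m);
            }
        }
        return out;
    }

    // Prefer an exact signature match; fall back to a single convertible match.
    static Method resolve(Class<?> cls, Class<?>... args) {
        List<Method> found = candidates(cls, args);
        for (Method m : found) {
            if (Arrays.equals(m.getParameterTypes(), args)) {
                return m;
            }
        }
        if (found.size() == 1) {
            return found.get(0);
        }
        throw new IllegalStateException(
            "Ambiguous or missing method for " + Arrays.toString(args));
    }
}
```

Under these assumptions, four int arguments yield two candidates, which is where a resolver with no tie-breaking rule would have to give up. Preferring the exact match picks the all-int overload.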
