Yes, it should be:

SELECT customer_id, topx(2, product_id, product_count)
FROM products_bought
GROUP BY customer_id;
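For illustration, here is a minimal plain-Java sketch of what the evaluator could look like once the group-by column is dropped from iterate. It is not the actual TopXPerGroup code and it does not depend on Hive: the class name TopXSketch, the Entry helper, and topPerCustomer (which only simulates what GROUP BY customer_id does, one aggregation state per key) are all made up for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain Java, does not extend any Hive class.
final class TopXSketch {
    /** One (attribute, count) pair, e.g. (product_id, product_count). */
    static final class Entry {
        final int attribute;
        final int count;

        Entry(int attribute, int count) {
            this.attribute = attribute;
            this.count = count;
        }
    }

    // Aggregation state for a single group.
    private final List<Entry> top = new ArrayList<>();

    /** Same shape as iterate(int max, int attribute, int count): no group-by argument. */
    public boolean iterate(int max, int attribute, int count) {
        top.add(new Entry(attribute, count));
        // Highest count first; List.sort is stable, so ties keep insertion order.
        top.sort(Comparator.comparingInt((Entry e) -> e.count).reversed());
        if (top.size() > max) {
            top.remove(top.size() - 1); // drop the smallest count
        }
        return true;
    }

    /** Returns the retained attributes, highest count first. */
    public List<Integer> terminate() {
        List<Integer> out = new ArrayList<>();
        for (Entry e : top) {
            out.add(e.attribute);
        }
        return out;
    }

    /** Simulates GROUP BY customer_id: one aggregation state per distinct key. */
    static Map<Integer, List<Integer>> topPerCustomer(int max, int[][] rows) {
        Map<Integer, TopXSketch> groups = new LinkedHashMap<>();
        for (int[] row : rows) {
            // row = {product_id, customer_id, product_count}
            groups.computeIfAbsent(row[1], k -> new TopXSketch())
                  .iterate(max, row[0], row[2]);
        }
        Map<Integer, List<Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, TopXSketch> g : groups.entrySet()) {
            result.put(g.getKey(), g.getValue().terminate());
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] rows = {{10, 1, 5}, {11, 1, 9}, {12, 1, 7}, {10, 2, 3}};
        // Prints {1=[11, 12], 2=[10]}
        System.out.println(topPerCustomer(2, rows));
    }
}
```

If the goal is the top x customers per product, as asked in the quoted message below, the same shape would apply with the columns swapped: group by product_id and pass customer_id as the attribute.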
On Wed, Feb 3, 2010 at 11:31 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:
> Hi Zheng,
>
> Wouldn't the query you mentioned need a group by clause? I need the top x
> customers per product id. Sorry, can you please explain.
>
> Thanks and Regards,
> Sonal
>
>
> On Thu, Feb 4, 2010 at 12:07 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:
>>
>> Hi Zheng,
>>
>> Thanks for your email and your feedback. I will try to change the code as
>> suggested by you.
>>
>> Here is the output of describe:
>>
>> hive> describe products_bought;
>> OK
>> product_id int
>> customer_id int
>> product_count int
>>
>> My function was working fine earlier with this table and
>> iterate(int, int, int, int). Once I introduced the other iterate, it
>> stopped working.
>>
>> Thanks and Regards,
>> Sonal
>>
>>
>> On Thu, Feb 4, 2010 at 11:37 AM, Zheng Shao <zsh...@gmail.com> wrote:
>>>
>>> Hi Sonal,
>>>
>>> 1. We usually move the group_by column out of the UDAF - just like we
>>> do "SELECT key, sum(value) FROM table".
>>>
>>> I think you should write:
>>>
>>> SELECT customer_id, topx(2, product_id, product_count)
>>> FROM products_bought
>>>
>>> and in topx:
>>> public boolean iterate(int max, int attribute, int count).
>>>
>>>
>>> 2. Can you run "describe products_bought"? Does the product_count column
>>> have type "int"?
>>>
>>> You might want to try removing the other iterate function to see
>>> whether that solves the problem.
>>>
>>> Zheng
>>>
>>>
>>> On Wed, Feb 3, 2010 at 9:58 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:
>>> > Hi Zheng,
>>> >
>>> > My query is:
>>> >
>>> > select a.myTable.key, a.myTable.attribute, a.myTable.count from (select
>>> > explode (t.pc) as myTable from (select topx(2, product_id, customer_id,
>>> > product_count) as pc from (select product_id, customer_id, product_count
>>> > from products_bought order by product_id, product_count desc) r ) t )a;
>>> >
>>> > My overloaded iterators are:
>>> >
>>> > public boolean iterate(int max, int groupBy, int attribute, int count)
>>> >
>>> > public boolean iterate(int max, int groupBy, int attribute, double count)
>>> >
>>> > Before overloading, my query was running fine. My table products_bought is:
>>> > product_id int, customer_id int, product_count int
>>> >
>>> > And I get:
>>> > FAILED: Error in semantic analysis: Ambiguous method for class
>>> > org.apache.hadoop.hive.udaf.TopXPerGroup with [int, int, int, int]
>>> >
>>> > The hive logs say:
>>> > 2010-02-03 11:18:15,721 ERROR processors.DeleteResourceProcessor (SessionState.java:printError(255)) - Usage: delete [FILE|JAR|ARCHIVE] <value> [<value>]*
>>> > 2010-02-03 11:22:14,663 ERROR ql.Driver (SessionState.java:printError(255)) - FAILED: Error in semantic analysis: Ambiguous method for class org.apache.hadoop.hive.udaf.TopXPerGroup with [int, int, int, int]
>>> > org.apache.hadoop.hive.ql.exec.AmbiguousMethodException: Ambiguous method for class org.apache.hadoop.hive.udaf.TopXPerGroup with [int, int, int, int]
>>> >         at org.apache.hadoop.hive.ql.exec.DefaultUDAFEvaluatorResolver.getEvaluatorClass(DefaultUDAFEvaluatorResolver.java:83)
>>> >         at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge.getEvaluator(GenericUDAFBridge.java:57)
>>> >         at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getGenericUDAFEvaluator(FunctionRegistry.java:594)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFEvaluator(SemanticAnalyzer.java:1882)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:2270)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:2821)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:4543)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5058)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:4999)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5020)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:4999)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5020)
>>> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5587)
>>> >         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:114)
>>> >         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:317)
>>> >         at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:370)
>>> >         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:362)
>>> >         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:140)
>>> >         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:200)
>>> >         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:311)
>>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> >         at java.lang.reflect.Method.invoke(Method.java:597)
>>> >         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>> >
>>> > Thanks and Regards,
>>> > Sonal
>>> >
>>> >
>>> > On Thu, Feb 4, 2010 at 12:12 AM, Zheng Shao <zsh...@gmail.com> wrote:
>>> >>
>>> >> Can you post the Hive query? What are the types of the parameters that
>>> >> you passed to the function?
>>> >>
>>> >> Zheng
>>> >>
>>> >> On Wed, Feb 3, 2010 at 3:23 AM, Sonal Goyal <sonalgoy...@gmail.com> wrote:
>>> >> > Hi,
>>> >> >
>>> >> > I am writing a UDAF which takes in 4 parameters. I have 2 cases - one
>>> >> > where all the parameters are ints, and second where the last parameter
>>> >> > is double. I wrote two evaluators for this, with iterate as
>>> >> >
>>> >> > public boolean iterate(int max, int groupBy, int attribute, int count)
>>> >> >
>>> >> > and
>>> >> >
>>> >> > public boolean iterate(int max, int groupBy, int attribute, double count)
>>> >> >
>>> >> > However, when I run a query, I get the exception:
>>> >> > org.apache.hadoop.hive.ql.exec.AmbiguousMethodException: Ambiguous method for class org.apache.hadoop.hive.udaf.TopXPerGroup with [int, int, int, int]
>>> >> >         at org.apache.hadoop.hive.ql.exec.DefaultUDAFEvaluatorResolver.getEvaluatorClass(DefaultUDAFEvaluatorResolver.java:83)
>>> >> >         at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge.getEvaluator(GenericUDAFBridge.java:57)
>>> >> >         at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getGenericUDAFEvaluator(FunctionRegistry.java:594)
>>> >> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFEvaluator(SemanticAnalyzer.java:1882)
>>> >> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:2270)
>>> >> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:2821)
>>> >> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:4543)
>>> >> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5058)
>>> >> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:4999)
>>> >> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5020)
>>> >> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:4999)
>>> >> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5020)
>>> >> >         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5587)
>>> >> >         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:114)
>>> >> >         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:317)
>>> >> >         at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:370)
>>> >> >         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:362)
>>> >> >         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:140)
>>> >> >         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:200)
>>> >> >         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:311)
>>> >> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> >> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>> >> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> >> >         at java.lang.reflect.Method.invoke(Method.java:597)
>>> >> >         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>> >> >
>>> >> > One option for me is to write a resolver, which I will do. But I just
>>> >> > wanted to know if this is a bug in Hive whereby it is not able to get
>>> >> > the right evaluator, or if this is a gap in my understanding.
>>> >> >
>>> >> > I look forward to hearing your views on this.
>>> >> >
>>> >> > Thanks and Regards,
>>> >> > Sonal
>>> >> >
>>> >>
>>> >>
>>> >> --
>>> >> Yours,
>>> >> Zheng
>>> >
>>>
>>>
>>> --
>>> Yours,
>>> Zheng
>>
>

--
Yours,
Zheng
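On the original question at the bottom of the thread: one plausible reading of the error, not confirmed anywhere in this thread, is that the argument list [int, int, int, int] is accepted by both overloads once an int is allowed to convert to a double, so the resolver cannot choose between them. The plain-Java sketch below only illustrates that idea with reflection. OverloadSketch, Evaluators, accepts and candidates are hypothetical names, and the matching rule is an assumption, not the logic of Hive's DefaultUDAFEvaluatorResolver.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: plain Java reflection, no Hive classes involved.
final class OverloadSketch {
    /** Stand-in holding the two iterate signatures quoted in the thread. */
    static final class Evaluators {
        public boolean iterate(int max, int groupBy, int attribute, int count) {
            return true;
        }

        public boolean iterate(int max, int groupBy, int attribute, double count) {
            return true;
        }
    }

    /** True if an argument of type arg may be passed where param is declared. */
    static boolean accepts(Class<?> param, Class<?> arg, boolean allowWidening) {
        if (param == arg) {
            return true;
        }
        // Assumed conversion rule: only int -> double is modelled here.
        return allowWidening && param == double.class && arg == int.class;
    }

    /** Collects every iterate overload whose parameters accept the argument types. */
    static List<Method> candidates(Class<?> cls, boolean allowWidening, Class<?>... args) {
        List<Method> found = new ArrayList<>();
        for (Method m : cls.getDeclaredMethods()) {
            Class<?>[] params = m.getParameterTypes();
            if (!m.getName().equals("iterate") || params.length != args.length) {
                continue;
            }
            boolean ok = true;
            for (int i = 0; i < params.length; i++) {
                ok &= accepts(params[i], args[i], allowWidening);
            }
            if (ok) {
                found.add(m);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Class<?> i = int.class;
        // Two candidates when int -> double is allowed: the call is ambiguous.
        System.out.println(candidates(Evaluators.class, true, i, i, i, i).size());
        // One candidate when only exact matches count.
        System.out.println(candidates(Evaluators.class, false, i, i, i, i).size());
    }
}
```

A custom resolver of the kind mentioned above could apply an exact-match-first rule like the second call, falling back to conversions only when no exact match exists.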