There are limitations as to what can be passed between terminatePartial() and merge(). I'm not sure that you can pass Java arrays (i.e. your double[] c1;) through all the Hive reflection gubbins. Try using ArrayList<>s instead, but be warned: you need to make explicit deep copies of anything passed in to merge().
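A minimal sketch of that suggestion (class and field names hypothetical, not from the thread): keep the partial state in ArrayList<Double> fields, which Hive's reflection machinery can instantiate, and copy incoming lists inside merge() instead of holding references to them, since Hive may reuse the object it passes in:

```java
import java.util.ArrayList;

// Hypothetical, trimmed-down partial-result holder. ArrayList has a public
// no-arg constructor, so a reflection-based struct inspector can instantiate
// it, unlike a raw double[] field.
public class ListPartial {
    public ArrayList<Double> c1 = new ArrayList<Double>();

    // Explicit copy for use in merge(): Double elements are immutable, so
    // copying the list itself is enough; for nested lists (e.g.
    // ArrayList<ArrayList<Double>>), copy each inner list as well.
    public static ArrayList<Double> copyOf(ArrayList<Double> src) {
        return new ArrayList<Double>(src);
    }

    public static void main(String[] args) {
        ArrayList<Double> incoming = new ArrayList<Double>();
        incoming.add(1.0);
        ArrayList<Double> mine = copyOf(incoming);
        incoming.set(0, 99.0);           // caller mutates its list afterwards
        System.out.println(mine.get(0)); // the copy is unaffected: prints 1.0
    }
}
```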
Robin

From: Ritesh Agrawal <ragra...@netflix.com<mailto:ragra...@netflix.com>>
Reply-To: "user@hive.apache.org<mailto:user@hive.apache.org>" <user@hive.apache.org<mailto:user@hive.apache.org>>
Date: Monday, July 29, 2013 9:12 PM
To: "user@hive.apache.org<mailto:user@hive.apache.org>" <user@hive.apache.org<mailto:user@hive.apache.org>>
Subject: Re: UDAF terminatePartial structure

Hi Robin, Igor,

Thanks for the suggestions and links. Based on the examples I found, below is my UDAF. However, I am getting the following error when trying to run it. Not sure what the error means.

============= ERROR ====================
FAILED: Hive Internal Error: java.lang.RuntimeException(java.lang.NoSuchMethodException: [D.<init>())
java.lang.RuntimeException: java.lang.NoSuchMethodException: [D.<init>()
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
    at org.apache.hadoop.hive.serde2.objectinspector.ReflectionStructObjectInspector.create(ReflectionStructObjectInspector.java:170)
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.<init>(ObjectInspectorConverters.java:225)
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:127)
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.<init>(ObjectInspectorConverters.java:221)
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:127)

============= UDF CODE ==================
package com.netflix.hive.udaf;

import java.io.IOException;
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
import org.apache.hadoop.hive.serde2.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;

@Description(
    name = "MFFoldIn",
    value = "_FUNC_(expr, nb) - Computes latent features for a given item/user based on user/item vectors",
    extended = "Example:\n"
)
public class MFFoldIn extends UDAF {

    public static class MFFoldInEvaluator implements UDAFEvaluator {

        public static class PartialResult {
            double[] c1;
            double[][] c2;
            double[][] c3;
            double wm;
            double lambda;
            int itemCount;
            double[][] varco;
            Set<Long> observedShows;

            public int getDimensionsCount() throws Exception {
                if (c1 != null)
                    return c1.length;
                throw new Exception("Unknown dimension count");
            }
        }

        private UserVecBuilder builder;

        public void init() {
            builder = null;
        }

        public boolean iterate(DoubleWritable wm, DoubleWritable lambda, IntWritable itemCount,
                String itemSquaredFile, DoubleWritable weight, List<Double> lf, Long item)
                throws IOException {
            double[] lflist = new double[lf.size()];
            for (int i = 0; i < lf.size(); i++)
                lflist[i] = lf.get(i).doubleValue();
            if (builder == null)
                builder = new UserVecBuilder();
            if (!builder.isReady()) {
                builder.setW_m(wm.get());
                builder.setLambda(lambda.get());
                builder.setItemRowCount(itemCount.get());
                builder.readItemCovarianceMatFiles(itemSquaredFile, lflist.length);
            }
            builder.add(item, lflist, weight.get());
            return true;
        }

        public PartialResult terminatePartial() {
            PartialResult partial = new PartialResult();
            partial.c1 = builder.getComponent1();
            partial.c2 = builder.getComponent2();
            partial.c3 = builder.getComponent3();
            partial.wm = builder.getW_m();
            partial.lambda = builder.getLambda();
            partial.observedShows = builder.getObservedShows();
            partial.itemCount = builder.getItemRowCount();
            partial.varco = builder.getVarCovar();
            return partial;
        }

        public boolean merge(PartialResult other) {
            if (other == null)
                return true;
            if (builder == null)
                builder = new UserVecBuilder();
            if (!builder.isReady()) {
                builder.setW_m(other.wm);
                builder.setLambda(other.lambda);
                builder.setItemRowCount(other.itemCount);
                builder.setItemCovarianceMat(other.varco);
                builder.setComponent1(other.c1);
                builder.setComponent2(other.c2);
                builder.setComponent3(other.c3);
                builder.setObservedShows(other.observedShows);
            } else {
                builder.merge(other.c1, other.c2, other.c3, other.observedShows);
            }
            return true;
        }

        public double[] terminate() {
            if (builder == null)
                return null;
            return builder.build();
        }
    }
}
====================

On Jul 29, 2013, at 4:37 PM, Igor Tatarinov wrote:

I found this Cloudera example helpful:
http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.hadoop.hive/hive-contrib/0.7.0-cdh3u0/org/apache/hadoop/hive/contrib/udaf/example/UDAFExampleMaxMinNUtil.java#UDAFExampleMaxMinNUtil.Evaluator

igor
decide.com<http://decide.com/>

On Mon, Jul 29, 2013 at 4:32 PM, Ritesh Agrawal <ragra...@netflix.com<mailto:ragra...@netflix.com>> wrote:

Hi Robin,

Thanks for the suggestion. I did find such an example in the Hadoop: The Definitive Guide book. However, I am now totally confused. The book extends UDAF instead of AbstractGenericUDAFResolver. Which one is recommended? Also, the example in the book uses DoubleWritable as the return type of the terminate() function. However, I will be returning an ArrayList of doubles. Do I always need to return objects that are derived from WritableComponents?

Ritesh

On Jul 29, 2013, at 4:15 PM, Robin Morris wrote:

> I believe a map will be passed correctly from the terminatePartial to the
> merge functions. But it seems a bit of overkill.
>
> Why not define a class within your UDAF which has 4 public data members,
> and return instances of that class from terminatePartial()?
>
> Robin
>
>
> On 7/29/13 3:19 PM, "Ritesh Agrawal" <ragra...@netflix.com<mailto:ragra...@netflix.com>> wrote:
>
>> Hi all,
>>
>> I am writing my first UDAF. In my terminatePartial() function, I need to
>> store different data having different data types. Below is a list of
>> items that I need to store:
>> 1. C1: list of doubles
>> 2. C2: list of doubles
>> 3. C3: double
>> 4. Show: list of strings
>>
>> I am wondering: can I use a simple HashMap and store these different
>> objects in it? Will it automatically serialize, or will I need to write
>> my own serialization method? Also, is there an example of a UDAF that
>> shows how to use a map-type structure for storing partial results?
>>
>> Thanks
>>
>> Ritesh
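For what it's worth, the "NoSuchMethodException: [D.<init>()" earlier in the thread is plain Java reflection at work: [D is the JVM name for double[], and array classes declare no constructors, so a reflection-based struct inspector cannot instantiate a double[] field via a no-arg constructor. A standalone demonstration (outside Hive; class name hypothetical):

```java
// Demonstrates the root cause of "NoSuchMethodException: [D.<init>()":
// array classes such as double[] declare no constructors, so reflective
// instantiation of a struct field typed double[] must fail.
public class ArrayCtorDemo {
    public static String probe() {
        try {
            double[].class.getDeclaredConstructor(); // arrays have no <init>
            return "constructor found";
        } catch (NoSuchMethodException e) {
            return "NoSuchMethodException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Prints the same [D.<init>() message seen in the Hive stack trace.
        System.out.println(probe());
    }
}
```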