Well... converting from string to integers anyway... Too many 4th of July hot dogs. Going into nitrate overload. :)
I am pulling an array of string values from JSON data. The string values are actually integers. I am converting to integers and summing each array entry to the final tally.

On Sat, Jul 4, 2015 at 7:04 PM, Jim Bates wrote:

> Ted,
>
> Yes, I started out just getting a basic count to work. I am trying to keep
> the workflow as close to a basic user as possible. As such, I am building
> and using the MapR Apache Drill sandbox to test.
>
>    1. Always look at the drillbit.log file to see if Drill had any
>       issues loading your UDF. That was where I learned that all workspace
>       values needed to be holders:
>
>       WARN o.a.d.exec.expr.fn.FunctionConverter - Failure loading
>       function class
>       com.mapr.example.udfs.drill.MyDrillAggFunctions$MyLinearRegression1,
>       field xList. Aggregate function 'MyLinearRegression1' workspace
>       variable 'xList' is of type 'interface
>       org.apache.drill.exec.vector.complex.writer.BaseWriter$ComplexWriter'.
>       Please change it to Holder type.
>
>    2. Error messages:
>       - If you get an error in this format it means that Drill cannot
>         find your function, so it probably didn't load it. Back to step 1:
>
>         PARSE ERROR: From line 1, column 8 to line 1, column 44: No
>         match found for function signature MyFunctionName(<ANY>)
>
>       - If you get an error in this format it means that the function is
>         there but Drill could not find a signature that matched the param
>         types or param numbers you were passing it. The exact wording will
>         change but "Missing function implementation" is the key phrase to
>         look for:
>
>         Error: SYSTEM ERROR:
>         org.apache.drill.exec.exception.SchemaChangeException: Failure
>         while trying to materialize incoming schema. Errors:
>         Error in expression at index -1. Error: Missing function
>         implementation: [castBIGINT(VARCHAR-REPEATED)]. Full expression:
>         --UNKNOWN EXPRESSION--
>
>    3. In your function definition for aggregate functions you need to
>       set null processing to internal and your isRandom to false. Example
>       below:
>
>       @FunctionTemplate(name = "MyFunctionName",
>           scope = FunctionTemplate.FunctionScope.POINT_AGGREGATE,
>           nulls = FunctionTemplate.NullHandling.INTERNAL,
>           isRandom = false,
>           isBinaryCommutative = false,
>           costCategory = FunctionTemplate.FunctionCostCategory.COMPLEX)
>
> Below is an example from the Apache Drill tutorial data sets contained in
> the MapR Apache Drill sandbox. I am pulling an array of string values from
> json data. The string values are actually integers. I am converting to
> string and summing each array entry to the final tally. This in no way
> represents what this data was for but it did become a handy way for me to
> peck out the "correct" way to build an aggregation UDF function.
>
> @FunctionTemplate(name = "MyArraySum",
>     scope = FunctionTemplate.FunctionScope.POINT_AGGREGATE,
>     nulls = FunctionTemplate.NullHandling.INTERNAL,
>     isRandom = false,
>     isBinaryCommutative = false,
>     costCategory = FunctionTemplate.FunctionCostCategory.COMPLEX)
> public static class MyArraySum implements DrillAggFunc {
>
>   @Param RepeatedVarCharHolder listToSearch;
>   @Workspace NullableBigIntHolder count;
>   @Workspace NullableBigIntHolder sum;
>   @Workspace NullableVarCharHolder vc;
>   @Output BigIntHolder out;
>
>   @Override
>   public void setup() {
>     count.value = 0;
>     sum.value = 0;
>   }
>
>   @Override
>   public void add() {
>     int c = listToSearch.end - listToSearch.start;
>     int val = 0;
>     try {
>       for (int i = 0; i < c; i++) {
>         listToSearch.vector.getAccessor().get(i, vc);
>         String inputStr =
>             org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(
>                 vc.start, vc.end, vc.buffer);
>         val = Integer.parseInt(inputStr);
>         sum.value = sum.value + val;
>       }
>     } catch (Exception e) {
>       val = 0;
>     }
>     count.value = count.value + 1;
>   }
>
> Example select statement:
> SELECT MyArraySum(my_arrays) FROM (SELECT t.trans_info.prod_id as
> my_arrays FROM `dfs.clicks`.`./clicks/clicks.campaign.json` t limit 5);
>
> On Sat, Jul 4, 2015 at 6:22 PM, Ted Dunning wrote:
>
>> Jim,
>>
>> I think that you may be having trouble with aggregators in general.
>>
>> Have you been able to build *any* aggregator of anything? I haven't.
>>
>> When I try to build an aggregator of ints or doubles, I get a very
>> persistent problem with Drill even seeing my aggregates:
>>
>> 0: jdbc:drill:zk=local> *select sum_int(employee_id) from
>> cp.`employee.json`;*
>>
>> Jul 04, 2015 4:19:35 PM
>> org.apache.calcite.sql.validate.SqlValidatorException <init>
>>
>> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match
>> found for function signature sum_int(<ANY>)
>>
>> Jul 04, 2015 4:19:35 PM org.apache.calcite.runtime.CalciteException <init>
>>
>> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
>> column 8 to line 1, column 27: No match found for function signature
>> sum_int(<ANY>)
>>
>> *Error: PARSE ERROR: From line 1, column 8 to line 1, column 27: No match
>> found for function signature sum_int(<ANY>)*
>>
>> *[Error Id: 91b78fa6-6dd1-4214-a85f-c2bf2c393145 on 10.0.1.2:31010]
>> (state=,code=0)*
>>
>> 0: jdbc:drill:zk=local> *select sum_int(cast(employee_id as int)) from
>> cp.`employee.json`*;
>>
>> Jul 04, 2015 4:19:45 PM
>> org.apache.calcite.sql.validate.SqlValidatorException <init>
>>
>> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match
>> found for function signature sum_int(<NUMERIC>)
>>
>> Jul 04, 2015 4:19:45 PM org.apache.calcite.runtime.CalciteException <init>
>>
>> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
>> column 8 to line 1, column 40: No match found for function signature
>> sum_int(<NUMERIC>)
>>
>> *Error: PARSE ERROR: From line 1, column 8 to line 1, column 40: No match
>> found for function signature sum_int(<NUMERIC>)*
>>
>> *[Error Id:
>> f649fc85-6b6a-4468-9a4f-bfef0b23d06b on 10.0.1.2:31010] (state=,code=0)*
>>
>> 0: jdbc:drill:zk=local>
>>
>> It looks like there is some undocumented subtlety about how to register an
>> aggregator.
>>
>> On Sat, Jul 4, 2015 at 4:08 PM, Jim Bates wrote:
>>
>> > I'm working on the same thing. I want to aggregate a list of values. It
>> > has been a search and guess game for the most part. I'm still stuck in
>> > the process of getting the values all into a list. The writers look
>> > interesting but for aggregation functions it looks like the input is the
>> > param and output objects can't hold the aggregation steps. The
>> > Workspace is where that happens. If I try and use a Writer in a
>> > workspace it won't load and tells me to change it to Holders, which was
>> > why I was using them to start with. Maybe I'm missing the architecture
>> > of the agg function. It looked like it was....
>> >
>> > @Param comes in -> initialize @Workspace vars in setup -> process data
>> > through @Workspace vars in add -> finalize @Output in output.
>> >
>> > So I'm back to trying to figure out how to create a RepeatedBigIntHolder
>> > or a RepeatedVarCharHolder...
>> >
>> > On Sat, Jul 4, 2015 at 4:53 PM, Ted Dunning wrote:
>> >
>> > > I am working on trying to build any kind of list constructing
>> > > aggregator and having absolute fits.
>> > >
>> > > To simplify life, I decided to just build a generic list builder that
>> > > is a scalar function that returns a list containing its argument. Thus
>> > > zoop(3) => [3], zoop('abc') => ['abc'] and zoop([1,2,3]) => [[1,2,3]].
>> > >
>> > > The ComplexWriter looks like the place to go. As usual, the complete
>> > > lack of comments in most of Drill makes this very hard since I have to
>> > > guess what works and what doesn't.
>> > >
>> > > In my code, I note that ComplexWriter has a nice rootAsList() method.
>> > > I used this in zip and it works nicely to construct lists for output.
>> > > I note that the resulting ListWriter has a method
>> > > copyReader(FieldReader var1) which looks really good.
>> > >
>> > > Unfortunately, the only implementation of copyReader() is in
>> > > AbstractFieldWriter and it looks like this:
>> > >
>> > >   public void copyReader(FieldReader reader) {
>> > >     this.fail("Copy FieldReader");
>> > >   }
>> > >
>> > > I would like to formally say at this point "WTF"?
>> > >
>> > > In digging in further, I see other methods that look handy like
>> > >
>> > >   public void write(IntHolder holder) {
>> > >     this.fail("Int");
>> > >   }
>> > >
>> > > And then in looking at implementations, it looks like there is a
>> > > combinatorial explosion because every type seems to need a write
>> > > method for every other type.
>> > >
>> > > What is the thought here? How can I copy an arbitrary value into a
>> > > list?
>> > >
>> > > My next thought was to build code that dispatches on type. There is a
>> > > method called getType() on the FieldReader. Unfortunately, that drives
>> > > into code generated by protoc and I see no way to dispatch on the type
>> > > of an incoming value.
>> > >
>> > > How is this supposed to work?
>> > >
>> > > On Sat, Jul 4, 2015 at 2:14 PM, mehant baid wrote:
>> > >
>> > > > For a detailed example on using ComplexWriter interface you can take
>> > > > a look at the Mappify
>> > > > <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Mappify.java>
>> > > > (kvgen) function.
>> > > > The function itself is very simple; however, it makes use of the
>> > > > utility methods in MappifyUtility
>> > > > <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/MappifyUtility.java>
>> > > > and MapUtility
>> > > > <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapUtility.java>
>> > > > which perform most of the work.
>> > > >
>> > > > Currently we don't have a generic infrastructure to handle errors
>> > > > coming out of functions. However there is UserException, which when
>> > > > raised will make sure that Drill does not gobble up the error message
>> > > > in that exception. So you can probably throw a UserException with the
>> > > > failing input in your function to make sure it propagates to the user.
>> > > >
>> > > > Thanks
>> > > > Mehant
>> > > >
>> > > > On Sat, Jul 4, 2015 at 1:48 PM, Jacques Nadeau wrote:
>> > > >
>> > > > > *Holders are for both input and output. You can also use
>> > > > > ComplexWriter for output and FieldReader for input if you want to
>> > > > > write or read a complex value.
>> > > > >
>> > > > > I don't think we've provided a really clean way to construct a
>> > > > > Repeated*Holder for output purposes. You can probably do it by
>> > > > > reaching into a bunch of internal interfaces in Drill. However, I
>> > > > > would recommend using the ComplexWriter output pattern for now.
>> > > > > This will be a little less efficient but substantially less
>> > > > > brittle. I suggest you open up a jira for using a Repeated*Holder
>> > > > > as an output.
>> > > > >
>> > > > > On Sat, Jul 4, 2015 at 1:38 PM, Ted Dunning wrote:
>> > > > >
>> > > > > > Holders are for input, I think.
>> > > > > >
>> > > > > > Try the different kinds of writers.
>> > > > > >
>> > > > > > On Sat, Jul 4, 2015 at 12:49 PM, Jim Bates wrote:
>> > > > > >
>> > > > > > > Using a RepeatedHolder as a @Param I've got working. I was
>> > > > > > > working on a custom aggregator function using DrillAggFunc. In
>> > > > > > > this I can do simple things but if I want to build a list of
>> > > > > > > values and do something with it in the final output method I
>> > > > > > > think I need to use RepeatedHolders in the @Workspace. To do
>> > > > > > > that I need to create a new one in the setup method. I can't
>> > > > > > > get one built. They all require a BufferAllocator to be passed
>> > > > > > > in to build it. I have not found a way to get an allocator
>> > > > > > > yet. Any suggestions?
>> > > > > > >
>> > > > > > > On Sat, Jul 4, 2015 at 1:37 PM, Ted Dunning wrote:
>> > > > > > >
>> > > > > > > > If you look at the zip function in
>> > > > > > > > https://github.com/mapr-demos/simple-drill-functions you can
>> > > > > > > > have an example of building a structure.
>> > > > > > > >
>> > > > > > > > The basic idea is that your output is denoted as
>> > > > > > > >
>> > > > > > > > @Output
>> > > > > > > > BaseWriter.ComplexWriter writer;
>> > > > > > > >
>> > > > > > > > The pattern for building a list of lists of integers is like
>> > > > > > > > this:
>> > > > > > > >
>> > > > > > > > writer.setValueCount(n);
>> > > > > > > > ...
>> > > > > > > > BaseWriter.ListWriter outer = writer.rootAsList();
>> > > > > > > > outer.start(); // [ outer list
>> > > > > > > > ...
>> > > > > > > > // for each inner list
>> > > > > > > >   BaseWriter.ListWriter inner = outer.list();
>> > > > > > > >   inner.start();
>> > > > > > > >   // for each inner list element
>> > > > > > > >     inner.integer().writeInt(accessor.get(i));
>> > > > > > > >   }
>> > > > > > > >   inner.end(); // ] inner list
>> > > > > > > > }
>> > > > > > > > outer.end(); // ] outer list
>> > > > > > > >
>> > > > > > > > On Sat, Jul 4, 2015 at 10:29 AM, Jim Bates wrote:
>> > > > > > > >
>> > > > > > > > > I have working aggregation and simple UDFs. I've been
>> > > > > > > > > trying to document and understand each of the options
>> > > > > > > > > available in a Drill UDF. Understanding the different
>> > > > > > > > > FunctionScopes, the ones that are allowed, the ones that
>> > > > > > > > > are not. The impact of different cost categories. The
>> > > > > > > > > different steps needed to understand handling any of the
>> > > > > > > > > supported data types and structures in Drill.
>> > > > > > > > >
>> > > > > > > > > Here are a few of my current road blocks. Any pointers
>> > > > > > > > > would be greatly appreciated.
>> > > > > > > > >
>> > > > > > > > >    1. I've been trying to understand how to correctly use
>> > > > > > > > >       RepeatedHolders of whatever type. For this discussion
>> > > > > > > > >       let's start with a RepeatedBigIntHolder. I'm trying to
>> > > > > > > > >       figure out the best way to create a new one. I have
>> > > > > > > > >       not figured out where in the existing Drill code
>> > > > > > > > >       someone does this. If I use a RepeatedBigIntHolder as
>> > > > > > > > >       a Workspace object it is null to start with. I created
>> > > > > > > > >       a new one in the startup section of the UDF but the
>> > > > > > > > >       vector was null. I can find no reference in creating a
>> > > > > > > > >       new BigIntVector. There is a way to create a
>> > > > > > > > >       BigIntVector and I did find an example of creating a
>> > > > > > > > >       new VarCharVector but I can't do that using the Drill
>> > > > > > > > >       jar files from 1.0. The
>> > > > > > > > >       org.apache.drill.common.types.TypeProtos and the
>> > > > > > > > >       org.apache.drill.common.types.TypeProtos.MinorType
>> > > > > > > > >       classes do not appear to be accessible from the Drill
>> > > > > > > > >       jar files.
>> > > > > > > > >    2. What is the best way to close out a UDF in the event
>> > > > > > > > >       it generates an exception? Are there specific steps
>> > > > > > > > >       one should follow to make a clean exit in a catch
>> > > > > > > > >       block that are beneficial to Drill?
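
For anyone reading this thread later: the per-row logic of the MyArraySum add() method quoted near the top can be modelled in plain Java, with no Drill classes on the classpath. This is only a sketch under assumptions, not the Drill API. ArraySumSketch, utf8() and rowCount() are hypothetical names, a byte[][] stands in for the VarChar vector, and setup/add/output/reset merely imitate the aggregate lifecycle described in the thread (reset is included because DrillAggFunc declares it as well). The sketch assumes the intent is to read only the current row's slice (start up to end) and to skip an entry that does not parse instead of abandoning the rest of the row.

```java
import java.nio.charset.StandardCharsets;

/** Plain-Java model of the array-sum aggregate; no Drill types are used. */
final class ArraySumSketch {
    private long sum;    // plays the role of the @Workspace "sum" holder
    private long count;  // rows seen, like the @Workspace "count" holder

    /** Called once before the first row of a group. */
    public void setup() {
        sum = 0;
        count = 0;
    }

    /**
     * Called once per row. Only entries in [start, end) belong to this row,
     * mirroring the start/end offsets carried by a repeated holder.
     */
    public void add(byte[][] vector, int start, int end) {
        for (int i = start; i < end; i++) {
            String text = new String(vector[i], StandardCharsets.UTF_8);
            try {
                sum += Long.parseLong(text);
            } catch (NumberFormatException e) {
                // Skip only the bad entry; later entries are still summed.
            }
        }
        count++;
    }

    /** Final value for the group. */
    public long output() {
        return sum;
    }

    /** Clears the state so the instance can serve the next group. */
    public void reset() {
        setup();
    }

    public long rowCount() {
        return count;
    }

    /** Hypothetical helper: encodes strings the way a VarChar vector stores them. */
    static byte[][] utf8(String... values) {
        byte[][] out = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i].getBytes(StandardCharsets.UTF_8);
        }
        return out;
    }

    public static void main(String[] args) {
        ArraySumSketch agg = new ArraySumSketch();
        agg.setup();
        byte[][] vector = utf8("10", "20", "oops", "30", "5");
        agg.add(vector, 0, 3); // first row: 10 + 20, "oops" is skipped
        agg.add(vector, 3, 5); // second row: 30 + 5
        System.out.println(agg.output()); // prints 65
    }
}
```

In a real Drill UDF the same two choices would translate to indexing the accessor from listToSearch.start rather than from 0, and moving the try/catch inside the loop; whether that matches what the sandbox data needs is a judgment call.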
