I'm not sure, but I don't think you can/should create a BufferAllocator inside a UDF.
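For what it's worth, the setup -> add -> output flow discussed in the thread below can be sketched in plain Java with only fixed-size state, which is roughly what the MyArraySum example does with its scalar holders and which needs no allocator at all. This is only an illustration under stated assumptions: the class ArraySumSketch and its methods are hypothetical stand-ins, none of this is Drill's API, and a real DrillAggFunc would use holders and annotations instead.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java stand-in for the setup -> add -> output lifecycle; hypothetical, not Drill's API. */
public class ArraySumSketch {
    // Fixed-size state, playing the role of the scalar @Workspace holders.
    private long sum;
    private long count;

    /** Called once before any input, like setup(). */
    public void setup() {
        sum = 0;
        count = 0;
    }

    /** Called once per input record, like add(); each record is a list of numeric strings. */
    public void add(List<String> values) {
        for (String v : values) {
            try {
                sum += Long.parseLong(v.trim());
            } catch (NumberFormatException e) {
                // Skip only the malformed element and keep processing the rest of the list.
            }
        }
        count++;
    }

    /** Called once at the end, like output(). */
    public long output() {
        return sum;
    }

    /** Number of records seen so far. */
    public long recordCount() {
        return count;
    }

    public static void main(String[] args) {
        ArraySumSketch agg = new ArraySumSketch();
        agg.setup();
        agg.add(Arrays.asList("1", "2", "3"));
        agg.add(Arrays.asList("10", "oops", "20"));
        System.out.println(agg.output() + " over " + agg.recordCount() + " records");
    }
}
```

The try/catch sits inside the loop on purpose, so one bad element does not discard the remaining entries of that record. Whether that is the right policy for a real UDF is a design choice this sketch does not settle.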

On Sat, Jul 4, 2015 at 6:40 PM, Jim Bates <jba...@maprtech.com> wrote:

> I did get a new RepeatedBigIntHolder built and added a BigIntVector
> to it. I'll try it in the UDF tomorrow and see if there is a difference in
> the ways I found to get a BufferAllocator.
>
> .
> .
> .
> @Inject DrillBuf buffer;
> @Workspace RepeatedBigIntHolder yList;
> .
> .
> .
> @Override
> public void setup() {
> .
> .
> .
>   //org.apache.drill.exec.memory.BufferAllocator allocator = buffer.getAllocator();
>   org.apache.drill.exec.memory.BufferAllocator allocator =
>       new org.apache.drill.exec.memory.TopLevelAllocator();
>   yList = new RepeatedBigIntHolder();
>   yList.vector = new org.apache.drill.exec.vector.BigIntVector(
>       org.apache.drill.exec.record.MaterializedField.create(
>           new org.apache.drill.common.expression.SchemaPath("bigints",
>               org.apache.drill.common.expression.ExpressionPosition.UNKNOWN),
>           org.apache.drill.common.types.Types.optional(
>               org.apache.drill.common.types.TypeProtos.MinorType.BIGINT)),
>       allocator);
> .
> .
> .
> }
>
> On Sat, Jul 4, 2015 at 7:39 PM, Jim Bates <jba...@maprtech.com> wrote:
>
> > I still have issues finding the correct way to create and use a
> > RepeatedHolder, and Writers are a non-starter for Workspace values. I can
> > make do with creating a concatenated string in a VarCharHolder for small
> > data sets to get past this in the short term and finish testing the output
> > values I expect, but I won't be able to do anything at scale until I figure
> > out how to make a repeated list.
> >
> > On Sat, Jul 4, 2015 at 7:12 PM, Jim Bates <jba...@maprtech.com> wrote:
> >
> >> Well... Converting from string to integers anyway... Too many 4th of July
> >> hot dogs. Going into nitrate overload. :)
> >>
> >> I am pulling an array of string values from json data. The string values
> >> are actually integers. I am converting to integers and summing each
> >> array entry to the final tally.
> >>
> >> On Sat, Jul 4, 2015 at 7:04 PM, Jim Bates <jba...@maprtech.com> wrote:
> >>
> >>> Ted,
> >>>
> >>> Yes, I started out just getting a basic count to work. I am trying to
> >>> keep the workflow as close to a basic user as possible. As such, I am
> >>> building and using the MapR Apache Drill sandbox to test.
> >>>
> >>>    1. Always look at the drillbits.log file to see if Drill had any
> >>>       issues loading your UDF. That was where I learned that all
> >>>       workspace values needed to be holders:
> >>>       - WARN o.a.d.exec.expr.fn.FunctionConverter - Failure loading
> >>>         function class
> >>>         com.mapr.example.udfs.drill.MyDrillAggFunctions$MyLinearRegression1,
> >>>         field xList. Aggregate function 'MyLinearRegression1' workspace
> >>>         variable 'xList' is of type 'interface
> >>>         org.apache.drill.exec.vector.complex.writer.BaseWriter$ComplexWriter'.
> >>>         Please change it to Holder type.
> >>>    2. Error messages:
> >>>       - If you get an error in this format it means that Drill cannot
> >>>         find your function, so it probably didn't load it. Back to step 1:
> >>>         - PARSE ERROR: From line 1, column 8 to line 1, column 44: No
> >>>           match found for function signature MyFunctionName(<ANY>)
> >>>       - If you get an error in this format it means that the function
> >>>         is there but Drill could not find a signature that matched the
> >>>         param types or param numbers you were passing it. The exact
> >>>         wording will change, but "Missing function implementation" is
> >>>         the key phrase to look for:
> >>>         - Error: SYSTEM ERROR:
> >>>           org.apache.drill.exec.exception.SchemaChangeException: Failure
> >>>           while trying to materialize incoming schema. Errors:
> >>>           Error in expression at index -1. Error: Missing function
> >>>           implementation: [castBIGINT(VARCHAR-REPEATED)]. Full
> >>>           expression: --UNKNOWN EXPRESSION--
> >>>    3.
> >>>       In your function definition for aggregate functions you need
> >>>       to set null processing to INTERNAL and isRandom to false.
> >>>       Example below:
> >>>       - @FunctionTemplate(name = "MyFunctionName", scope =
> >>>         FunctionTemplate.FunctionScope.POINT_AGGREGATE, nulls =
> >>>         FunctionTemplate.NullHandling.INTERNAL, isRandom = false,
> >>>         isBinaryCommutative = false, costCategory =
> >>>         FunctionTemplate.FunctionCostCategory.COMPLEX)
> >>>
> >>> Below is an example from the Apache Drill tutorial data sets contained
> >>> in the MapR Apache Drill sandbox. I am pulling an array of string values
> >>> from json data. The string values are actually integers. I am converting
> >>> to string and summing each array entry to the final tally. This in no way
> >>> represents what this data was for, but it did become a handy way for me to
> >>> peck out the "correct" way to build an aggregation UDF function.
> >>>
> >>> @FunctionTemplate(name = "MyArraySum", scope =
> >>>     FunctionTemplate.FunctionScope.POINT_AGGREGATE, nulls =
> >>>     FunctionTemplate.NullHandling.INTERNAL, isRandom = false,
> >>>     isBinaryCommutative = false, costCategory =
> >>>     FunctionTemplate.FunctionCostCategory.COMPLEX)
> >>> public static class MyArraySum implements DrillAggFunc {
> >>>
> >>>   @Param RepeatedVarCharHolder listToSearch;
> >>>   @Workspace NullableBigIntHolder count;
> >>>   @Workspace NullableBigIntHolder sum;
> >>>   @Workspace NullableVarCharHolder vc;
> >>>   @Output BigIntHolder out;
> >>>
> >>>   @Override
> >>>   public void setup() {
> >>>     count.value = 0;
> >>>     sum.value = 0;
> >>>   }
> >>>
> >>>   @Override
> >>>   public void add() {
> >>>     int c = listToSearch.end - listToSearch.start;
> >>>     int val = 0;
> >>>     try {
> >>>       for (int i = 0; i < c; i++) {
> >>>         // start/end index into the shared vector, so offset by start
> >>>         listToSearch.vector.getAccessor().get(listToSearch.start + i, vc);
> >>>         String inputStr =
> >>>             org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(
> >>>                 vc.start, vc.end, vc.buffer);
> >>>         val = Integer.parseInt(inputStr);
> >>>         sum.value = sum.value + val;
> >>>       }
> >>>     } catch (Exception e) {
> >>>       val = 0;
> >>>     }
> >>>     count.value = count.value + 1;
> >>>   }
> >>>
> >>> Example select statement:
> >>> SELECT MyArraySum(my_arrays) FROM (SELECT t.trans_info.prod_id as
> >>> my_arrays FROM `dfs.clicks`.`./clicks/clicks.campaign.json` t limit 5);
> >>>
> >>> On Sat, Jul 4, 2015 at 6:22 PM, Ted Dunning <ted.dunn...@gmail.com>
> >>> wrote:
> >>>
> >>>> Jim,
> >>>>
> >>>> I think that you may be having trouble with aggregators in general.
> >>>>
> >>>> Have you been able to build *any* aggregator of anything? I haven't.
> >>>>
> >>>> When I try to build an aggregator of ints or doubles, I get a very
> >>>> persistent problem with Drill even seeing my aggregates:
> >>>>
> >>>> 0: jdbc:drill:zk=local> *select sum_int(employee_id) from
> >>>> cp.`employee.json`;*
> >>>>
> >>>> Jul 04, 2015 4:19:35 PM
> >>>> org.apache.calcite.sql.validate.SqlValidatorException <init>
> >>>>
> >>>> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match
> >>>> found for function signature sum_int(<ANY>)
> >>>>
> >>>> Jul 04, 2015 4:19:35 PM org.apache.calcite.runtime.CalciteException
> >>>> <init>
> >>>>
> >>>> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
> >>>> column 8 to line 1, column 27: No match found for function signature
> >>>> sum_int(<ANY>)
> >>>>
> >>>> *Error: PARSE ERROR: From line 1, column 8 to line 1, column 27: No match
> >>>> found for function signature sum_int(<ANY>)*
> >>>>
> >>>> *[Error Id: 91b78fa6-6dd1-4214-a85f-c2bf2c393145 on 10.0.1.2:31010
> >>>> <http://10.0.1.2:31010>] (state=,code=0)*
> >>>>
> >>>> 0: jdbc:drill:zk=local> *select sum_int(cast(employee_id as int)) from
> >>>> cp.`employee.json`*;
> >>>>
> >>>> Jul 04, 2015 4:19:45 PM
> >>>> org.apache.calcite.sql.validate.SqlValidatorException <init>
> >>>>
> >>>> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match
> >>>> found for function signature
> >>>> sum_int(<NUMERIC>)
> >>>>
> >>>> Jul 04, 2015 4:19:45 PM org.apache.calcite.runtime.CalciteException
> >>>> <init>
> >>>>
> >>>> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
> >>>> column 8 to line 1, column 40: No match found for function signature
> >>>> sum_int(<NUMERIC>)
> >>>>
> >>>> *Error: PARSE ERROR: From line 1, column 8 to line 1, column 40: No match
> >>>> found for function signature sum_int(<NUMERIC>)*
> >>>>
> >>>> *[Error Id: f649fc85-6b6a-4468-9a4f-bfef0b23d06b on 10.0.1.2:31010
> >>>> <http://10.0.1.2:31010>] (state=,code=0)*
> >>>>
> >>>> 0: jdbc:drill:zk=local>
> >>>>
> >>>> It looks like there is some undocumented subtlety about how to register
> >>>> an aggregator.
> >>>>
> >>>> On Sat, Jul 4, 2015 at 4:08 PM, Jim Bates <jba...@maprtech.com> wrote:
> >>>>
> >>>> > I'm working on the same thing. I want to aggregate a list of values.
> >>>> > It has been a search and guess game for the most part. I'm still stuck
> >>>> > in the process of getting the values all into a list. The writers look
> >>>> > interesting, but for aggregation functions it looks like the input is
> >>>> > the param and output objects can't hold the aggregation steps. The
> >>>> > Workspace is where that happens. If I try to use a Writer in a
> >>>> > workspace it won't load and tells me to change it to Holders, which was
> >>>> > why I was using them to start with. Maybe I'm missing the architecture
> >>>> > of the agg function. It looked like it was....
> >>>> >
> >>>> > @Param comes in -> initialize @Workspace vars in setup -> process data
> >>>> > through @Workspace vars in add -> finalize @Output in output.
> >>>> >
> >>>> > So I'm back to trying to figure out how to create a
> >>>> > RepeatedBigIntHolder or a RepeatedVarCharHolder...
> >>>> >
> >>>> > On Sat, Jul 4, 2015 at 4:53 PM, Ted Dunning <ted.dunn...@gmail.com>
> >>>> > wrote:
> >>>> >
> >>>> > > I am working on trying to build any kind of list-constructing
> >>>> > > aggregator and having absolute fits.
> >>>> > >
> >>>> > > To simplify life, I decided to just build a generic list builder
> >>>> > > that is a scalar function that returns a list containing its
> >>>> > > argument. Thus zoop(3) => [3], zoop('abc') => ['abc'] and
> >>>> > > zoop([1,2,3]) => [[1,2,3]].
> >>>> > >
> >>>> > > The ComplexWriter looks like the place to go. As usual, the complete
> >>>> > > lack of comments in most of Drill makes this very hard since I have
> >>>> > > to guess what works and what doesn't.
> >>>> > >
> >>>> > > In my code, I note that ComplexWriter has a nice rootAsList()
> >>>> > > method. I used this in zip and it works nicely to construct lists
> >>>> > > for output. I note that the resulting ListWriter has a method
> >>>> > > copyReader(FieldReader var1) which looks really good.
> >>>> > >
> >>>> > > Unfortunately, the only implementation of copyReader() is in
> >>>> > > AbstractFieldWriter and it looks like this:
> >>>> > >
> >>>> > >     public void copyReader(FieldReader reader) {
> >>>> > >         this.fail("Copy FieldReader");
> >>>> > >     }
> >>>> > >
> >>>> > > I would like to formally say at this point "WTF"?
> >>>> > >
> >>>> > > In digging in further, I see other methods that look handy, like
> >>>> > >
> >>>> > >     public void write(IntHolder holder) {
> >>>> > >         this.fail("Int");
> >>>> > >     }
> >>>> > >
> >>>> > > And then in looking at implementations, it looks like there is a
> >>>> > > combinatorial explosion because every type seems to need a write
> >>>> > > method for every other type.
> >>>> > >
> >>>> > > What is the thought here? How can I copy an arbitrary value into a
> >>>> > > list?
> >>>> > >
> >>>> > > My next thought was to build code that dispatches on type. There is
> >>>> > > a method called getType() on the FieldReader. Unfortunately, that
> >>>> > > drives into code generated by protoc and I see no way to dispatch on
> >>>> > > the type of an incoming value.
> >>>> > >
> >>>> > > How is this supposed to work?
> >>>> > >
> >>>> > > On Sat, Jul 4, 2015 at 2:14 PM, mehant baid <baid.meh...@gmail.com>
> >>>> > > wrote:
> >>>> > >
> >>>> > > > For a detailed example of using the ComplexWriter interface you
> >>>> > > > can take a look at the Mappify
> >>>> > > > <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Mappify.java>
> >>>> > > > (kvgen) function. The function itself is very simple; however, it
> >>>> > > > makes use of the utility methods in MappifyUtility
> >>>> > > > <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/MappifyUtility.java>
> >>>> > > > and MapUtility
> >>>> > > > <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapUtility.java>
> >>>> > > > which perform most of the work.
> >>>> > > >
> >>>> > > > Currently we don't have a generic infrastructure to handle errors
> >>>> > > > coming out of functions. However, there is UserException, which
> >>>> > > > when raised will make sure that Drill does not gobble up the error
> >>>> > > > message in that exception. So you can probably throw a
> >>>> > > > UserException with the failing input in your function to make sure
> >>>> > > > it propagates to the user.
> >>>> > > >
> >>>> > > > Thanks
> >>>> > > > Mehant
> >>>> > > >
> >>>> > > > On Sat, Jul 4, 2015 at 1:48 PM, Jacques Nadeau <jacq...@apache.org>
> >>>> > > > wrote:
> >>>> > > >
> >>>> > > > > *Holders are for both input and output. You can also use
> >>>> > > > > ComplexWriter for output and FieldReader for input if you want
> >>>> > > > > to write or read a complex value.
> >>>> > > > >
> >>>> > > > > I don't think we've provided a really clean way to construct a
> >>>> > > > > Repeated*Holder for output purposes. You can probably do it by
> >>>> > > > > reaching into a bunch of internal interfaces in Drill. However,
> >>>> > > > > I would recommend using the ComplexWriter output pattern for
> >>>> > > > > now. This will be a little less efficient but substantially
> >>>> > > > > less brittle. I suggest you open up a jira for using a
> >>>> > > > > Repeated*Holder as an output.
> >>>> > > > >
> >>>> > > > > On Sat, Jul 4, 2015 at 1:38 PM, Ted Dunning
> >>>> > > > > <ted.dunn...@gmail.com> wrote:
> >>>> > > > >
> >>>> > > > > > Holders are for input, I think.
> >>>> > > > > >
> >>>> > > > > > Try the different kinds of writers.
> >>>> > > > > >
> >>>> > > > > > On Sat, Jul 4, 2015 at 12:49 PM, Jim Bates
> >>>> > > > > > <jba...@maprtech.com> wrote:
> >>>> > > > > >
> >>>> > > > > > > I've got using a RepeatedHolder as a @Param working. I
> >>>> > > > > > > was working on a custom aggregator function using
> >>>> > > > > > > DrillAggFunc. In this I can do simple things, but if I
> >>>> > > > > > > want to build a list of values and do something with it
> >>>> > > > > > > in the final output method I think I need to use
> >>>> > > > > > > RepeatedHolders in the @Workspace.
> >>>> > > > > > > To do that I need to create a new one in the setup
> >>>> > > > > > > method. I can't get one built. They all require a
> >>>> > > > > > > BufferAllocator to be passed in to build it. I have not
> >>>> > > > > > > found a way to get an allocator yet. Any suggestions?
> >>>> > > > > > >
> >>>> > > > > > > On Sat, Jul 4, 2015 at 1:37 PM, Ted Dunning
> >>>> > > > > > > <ted.dunn...@gmail.com> wrote:
> >>>> > > > > > >
> >>>> > > > > > > > If you look at the zip function in
> >>>> > > > > > > > https://github.com/mapr-demos/simple-drill-functions you
> >>>> > > > > > > > can find an example of building a structure.
> >>>> > > > > > > >
> >>>> > > > > > > > The basic idea is that your output is denoted as
> >>>> > > > > > > >
> >>>> > > > > > > >     @Output
> >>>> > > > > > > >     BaseWriter.ComplexWriter writer;
> >>>> > > > > > > >
> >>>> > > > > > > > The pattern for building a list of lists of integers is
> >>>> > > > > > > > like this:
> >>>> > > > > > > >
> >>>> > > > > > > >     writer.setValueCount(n);
> >>>> > > > > > > >     ...
> >>>> > > > > > > >     BaseWriter.ListWriter outer = writer.rootAsList();
> >>>> > > > > > > >     outer.start(); // [ outer list
> >>>> > > > > > > >     ...
> >>>> > > > > > > >     // for each inner list
> >>>> > > > > > > >         BaseWriter.ListWriter inner = outer.list();
> >>>> > > > > > > >         inner.start(); // [ inner list
> >>>> > > > > > > >         // for each inner list element
> >>>> > > > > > > >             inner.integer().writeInt(accessor.get(i));
> >>>> > > > > > > >         }
> >>>> > > > > > > >         inner.end(); // ] inner list
> >>>> > > > > > > >     }
> >>>> > > > > > > >     outer.end(); // ] outer list
> >>>> > > > > > > >
> >>>> > > > > > > > On Sat, Jul 4, 2015 at 10:29 AM, Jim Bates
> >>>> > > > > > > > <jba...@maprtech.com> wrote:
> >>>> > > > > > > >
> >>>> > > > > > > > > I have working aggregation and simple UDFs. I've been
> >>>> > > > > > > > > trying to document and understand each of the options
> >>>> > > > > > > > > available in a Drill UDF: the different
> >>>> > > > > > > > > FunctionScopes, the ones that are allowed and the
> >>>> > > > > > > > > ones that are not; the impact of different cost
> >>>> > > > > > > > > categories; the different steps needed to handle any
> >>>> > > > > > > > > of the supported data types and structures in Drill.
> >>>> > > > > > > > >
> >>>> > > > > > > > > Here are a few of my current road blocks. Any
> >>>> > > > > > > > > pointers would be greatly appreciated.
> >>>> > > > > > > > >
> >>>> > > > > > > > >    1. I've been trying to understand how to correctly
> >>>> > > > > > > > >    use RepeatedHolders of whatever type. For this
> >>>> > > > > > > > >    discussion let's start with a
> >>>> > > > > > > > >    RepeatedBigIntHolder. I'm trying to figure out the
> >>>> > > > > > > > >    best way to create a new one.
> >>>> > > > > > > > >    I have not figured out where in the existing
> >>>> > > > > > > > >    Drill code someone does this. If I use a
> >>>> > > > > > > > >    RepeatedBigIntHolder as a Workspace object it is
> >>>> > > > > > > > >    null to start with. I created a new one in the
> >>>> > > > > > > > >    startup section of the UDF but the vector was
> >>>> > > > > > > > >    null. I can find no reference for creating a new
> >>>> > > > > > > > >    BigIntVector. There is a way to create a
> >>>> > > > > > > > >    BigIntVector, and I did find an example of
> >>>> > > > > > > > >    creating a new VarCharVector, but I can't do that
> >>>> > > > > > > > >    using the Drill jar files from 1.0. The
> >>>> > > > > > > > >    org.apache.drill.common.types.TypeProtos and
> >>>> > > > > > > > >    org.apache.drill.common.types.TypeProtos.MinorType
> >>>> > > > > > > > >    classes do not appear to be accessible from the
> >>>> > > > > > > > >    Drill jar files.
> >>>> > > > > > > > >    2. What is the best way to close out a UDF in the
> >>>> > > > > > > > >    event it generates an exception? Are there
> >>>> > > > > > > > >    specific steps one should follow to make a clean
> >>>> > > > > > > > >    exit in a catch block that are beneficial to
> >>>> > > > > > > > >    Drill?

-- 
Abdelhakim Deneche
Software Engineer
<http://www.mapr.com/>