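Just to close out this thread: I got my final UDFs to work. I ended up with two, one that builds an array of values (MyList) and one that calculates a simple linear regression over two such arrays (MyLinearRegression2). The test data set was a simple y = x line, so the prediction for an input of 22356 comes back as 22356.0: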
SELECT MyLinearRegression2(xValues, yValues, CAST(22356 as BIGINT)) as xPerdict
FROM (SELECT MyList(test_field1) as xValues, MyList(test_field2) as yValues
      FROM (SELECT test_field1, test_field2
            FROM `hive.default`.`my_hive_table` limit 10));

+-----------+
| xPerdict  |
+-----------+
| 22356.0   |
+-----------+

On Sun, Jul 5, 2015 at 4:10 PM, Jacques Nadeau <[email protected]> wrote:

> You're right. You're off the beaten path. I think everyone here would love
> to have more documentation and more comments. Of course, all of these take
> time.
>
> If you have time to volunteer to help improve these things, that would be
> great.
>
> With regards to the question about the jira, describe your use case and
> what functionality you couldn't find or make work. The active developers on
> the project can then do their best to help shape the Jira into better docs,
> javadocs and/or new functionality as time allows.
>
> On Jul 5, 2015 1:37 PM, "Ted Dunning" <[email protected]> wrote:
>
> > Uh... actually, I think that it isn't obvious because there is absolutely
> > no documentation and there are no comments in the code.
> >
> > And what should the JIRA say? We can't even tell what's missing, if
> > anything, because we can't tell how it is supposed to work.
> >
> > On Sun, Jul 5, 2015 at 11:50 AM, Jacques Nadeau <[email protected]>
> > wrote:
> >
> > > It isn't obvious because you shouldn't do it. Please file a JIRA to add
> > > real support for this type of output.
> > >
> > > Your current function would leak large amounts of memory that would
> > > ultimately crash the node.
> > >
> > > Realistically, there are very few internal Drill APIs that you should
> > > access via a UDF (injectables, holders, complexwriter, fieldreader and
> > > helpers). A post 1.0 goal was to provide a UDF interface JAR to ensure
> > > people don't accidentally reach into Drill's internals. (A later
> > > possibility is bytecode weaving to completely protect against it).
> > >
> > > J
> > >
> > > On Sun, Jul 5, 2015 at 11:36 AM, Ted Dunning <[email protected]>
> > > wrote:
> > >
> > > > That was impressively non-obvious.
> > > >
> > > > On Sat, Jul 4, 2015 at 6:40 PM, Jim Bates <[email protected]> wrote:
> > > >
> > > > > I did get a new RepeatedBigIntHolder built and added a BigIntVector
> > > > > added to it. I'll try it in the UDF tomorrow and see if there is a
> > > > > difference in the ways I found to get a BufferAllocator.
> > > > >
> > > > > .
> > > > > .
> > > > > .
> > > > > @Inject DrillBuf buffer;
> > > > > @Workspace RepeatedBigIntHolder yList;
> > > > > .
> > > > > .
> > > > > .
> > > > > @Override
> > > > > public void setup() {
> > > > > .
> > > > > .
> > > > > .
> > > > > //org.apache.drill.exec.memory.BufferAllocator allocator = buffer.getAllocator();
> > > > > org.apache.drill.exec.memory.BufferAllocator allocator = new org.apache.drill.exec.memory.TopLevelAllocator();
> > > > > yList = new RepeatedBigIntHolder();
> > > > > yList.vector = new org.apache.drill.exec.vector.BigIntVector(
> > > > >     org.apache.drill.exec.record.MaterializedField.create(
> > > > >         new org.apache.drill.common.expression.SchemaPath("bigints", org.apache.drill.common.expression.ExpressionPosition.UNKNOWN),
> > > > >         org.apache.drill.common.types.Types.optional(org.apache.drill.common.types.TypeProtos.MinorType.BIGINT)),
> > > > >     allocator);
> > > > > .
> > > > > .
> > > > > .
> > > > > }
> > > > >
> > > > > On Sat, Jul 4, 2015 at 7:39 PM, Jim Bates <[email protected]> wrote:
> > > > >
> > > > > > I still have issues finding the correct way to create and use a
> > > > > > RepeatedHolder and Writers are a non starter for Workspace values.
> > I > > > > can > > > > > > make do with creating a concatenated string in a VarCharHolder > for > > > > small > > > > > > data sets to get past this in the short term and finish testing > the > > > > > output > > > > > > values I expect but won't be able to do any scale till I figure > out > > > how > > > > > to > > > > > > make a repeated list. > > > > > > > > > > > > On Sat, Jul 4, 2015 at 7:12 PM, Jim Bates <[email protected]> > > > wrote: > > > > > > > > > > > >> Well... Converting from string to integers anyway... To many 4th > > of > > > > July > > > > > >> Hot Dogs. going into nitrate overload. :) > > > > > >> > > > > > >> I am pulling an array of string values from json data. The > string > > > > values > > > > > >> are actually integers. I am converting to integers and summing > > each > > > > > >> array entry to the final tally. > > > > > >> > > > > > >> On Sat, Jul 4, 2015 at 7:04 PM, Jim Bates <[email protected]> > > > > wrote: > > > > > >> > > > > > >>> Ted, > > > > > >>> > > > > > >>> Yes, I started out just getting a basic count to work. I am > > trying > > > to > > > > > >>> keep the workflow as close to a basic user as possible. As > such, > > I > > > am > > > > > >>> building and using the MapR Apache Drill sandbox to test. > > > > > >>> > > > > > >>> > > > > > >>> 1. Always look at the drillbits.log file to see if drill had > > any > > > > > >>> issues loading your UDF. That was where I learned that all > > > > > workspace values > > > > > >>> needed to be holders > > > > > >>> - > > > > > >>> - WARN o.a.d.exec.expr.fn.FunctionConverter - Failure > > > loading > > > > > >>> function class > > > > > >>> > > > > > > com.mapr.example.udfs.drill.MyDrillAggFunctions$MyLinearRegression1, > > > > field > > > > > >>> xList. Aggregate function 'MyLinearRegression1' workspace > > > > > variable 'xList' > > > > > >>> is of type 'interface > > > > > >>> > > > > > > > org.apache.drill.exec.vector.complex.writer.BaseWriter$ComplexWriter'. 
> > > > > >>> Please change it to Holder type. > > > > > >>> 2. Error messages: > > > > > >>> - If you get an error in this format it means that Drill > > can > > > > not > > > > > >>> find your function so it probably didn't load it. back to > > > step > > > > 1: > > > > > >>> - > > > > > >>> - PARSE ERROR: From line 1, column 8 to line 1, column > > 44: > > > > No > > > > > >>> match found for function signature > MyFunctionName(<ANY>) > > > > > >>> - If you get an error in this format it means that the > > > function > > > > > >>> is there but Drill could not find a signature that > matched > > > the > > > > > param types > > > > > >>> or param numbers you were passing it. The exact wording > > will > > > > > change but > > > > > >>> the Missing function implementation is the key phrase to > > look > > > > > for: > > > > > >>> - > > > > > >>> - Error: SYSTEM ERROR: > > > > > >>> org.apache.drill.exec.exception.SchemaChangeException: > > > > > Failure while trying > > > > > >>> to materialize incoming schema. Errors: > > > > > >>> - Error in expression at index -1. Error: Missing > > > function > > > > > >>> implementation: [castBIGINT(VARCHAR-REPEATED)]. Full > > > > > expression: --UNKNOWN > > > > > >>> EXPRESSION-- > > > > > >>> 3. In your function definition for aggregate functions > you > > > need > > > > > >>> to set null processing to internal and your isRandom to > false. > > > > > Example > > > > > >>> below: > > > > > >>> - > > > > > >>> - @FunctionTemplate(name = "MyFunctionName", scope = > > > > > >>> FunctionTemplate.FunctionScope.POINT_AGGREGATE, nulls = > > > > > >>> FunctionTemplate.NullHandling.INTERNAL, isRandom = false, > > > > > >>> isBinaryCommutative = false, costCategory = > > > > > >>> FunctionTemplate.FunctionCostCategory.COMPLEX) > > > > > >>> > > > > > >>> Below is an example from the Apache Drill tutorial data sets > > > > contained > > > > > >>> in the MapR Apache Drill sandbox. 
I am pulling an array if > string > > > > > values > > > > > >>> from json data. The string values are actually integers. I am > > > > > converting to > > > > > >>> string and summing each array entry to the final tally. This in > > no > > > > way > > > > > >>> represents what this data was for but it did become a handy way > > for > > > > me > > > > > to > > > > > >>> peck out the "correct" way to build an aggregation UDF function > > > > > >>> > > > > > >>> @FunctionTemplate(name = "MyArraySum", scope = > > > > > >>> FunctionTemplate.FunctionScope.POINT_AGGREGATE, nulls = > > > > > >>> FunctionTemplate.NullHandling.INTERNAL, isRandom = false, > > > > > >>> isBinaryCommutative = false, costCategory = > > > > > >>> FunctionTemplate.FunctionCostCategory.COMPLEX) > > > > > >>> public static class MyArraySum implements DrillAggFunc { > > > > > >>> > > > > > >>> @Param RepeatedVarCharHolder listToSearch; > > > > > >>> @Workspace NullableBigIntHolder count; > > > > > >>> @Workspace NullableBigIntHolder sum; > > > > > >>> @Workspace NullableVarCharHolder vc; > > > > > >>> @Output BigIntHolder out; > > > > > >>> > > > > > >>> @Override > > > > > >>> public void setup() { > > > > > >>> count.value=0; > > > > > >>> sum.value = 0; > > > > > >>> } > > > > > >>> > > > > > >>> @Override > > > > > >>> public void add() { > > > > > >>> int c = listToSearch.end - listToSearch.start; > > > > > >>> int val = 0; > > > > > >>> try { > > > > > >>> for(int i=0; i<c; i++){ > > > > > >>> listToSearch.vector.getAccessor().get(i, vc); > > > > > >>> String inputStr = > > > > > >>> > > > > > > > > > > > > > > > org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(vc.start, > > > > > >>> vc.end, vc.buffer); > > > > > >>> val = Integer.parseInt(inputStr); > > > > > >>> sum.value = sum.value + val; > > > > > >>> } > > > > > >>> } catch (Exception e) { > > > > > >>> val = 0; > > > > > >>> } > > > > > >>> count.value = count.value + 1; > > > > > >>> } > > > > > >>> > > > > > 
>>> Example select statement: > > > > > >>> SELECT MyArraySum(my_arrays) FROM (SELECT t.trans_info.prod_id > as > > > > > >>> my_arrays FROM `dfs.clicks`.`./clicks/clicks.campaign.json` t > > limit > > > > 5); > > > > > >>> > > > > > >>> On Sat, Jul 4, 2015 at 6:22 PM, Ted Dunning < > > [email protected] > > > > > > > > > >>> wrote: > > > > > >>> > > > > > >>>> Jim, > > > > > >>>> > > > > > >>>> I think that you may be having trouble with aggregators in > > > general. > > > > > >>>> > > > > > >>>> Have you been able to build *any* aggregator of anything? I > > > > haven't. > > > > > >>>> > > > > > >>>> When I try to build an aggregator of int's or doubles, I get a > > > very > > > > > >>>> persistent problem with Drill even seeing my aggregates: > > > > > >>>> > > > > > >>>> 0: jdbc:drill:zk=local> *select sum_int(employee_id) from > > > > > >>>> cp.`employee.json`;* > > > > > >>>> > > > > > >>>> Jul 04, 2015 4:19:35 PM > > > > > >>>> org.apache.calcite.sql.validate.SqlValidatorException <init> > > > > > >>>> > > > > > >>>> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: > > No > > > > > match > > > > > >>>> found for function signature sum_int(<ANY>) > > > > > >>>> > > > > > >>>> Jul 04, 2015 4:19:35 PM > > > org.apache.calcite.runtime.CalciteException > > > > > >>>> <init> > > > > > >>>> > > > > > >>>> SEVERE: org.apache.calcite.runtime.CalciteContextException: > From > > > > line > > > > > 1, > > > > > >>>> column 8 to line 1, column 27: No match found for function > > > signature > > > > > >>>> sum_int(<ANY>) > > > > > >>>> > > > > > >>>> *Error: PARSE ERROR: From line 1, column 8 to line 1, column > 27: > > > No > > > > > >>>> match > > > > > >>>> found for function signature sum_int(<ANY>)* > > > > > >>>> > > > > > >>>> *[Error Id: 91b78fa6-6dd1-4214-a85f-c2bf2c393145 on > > > 10.0.1.2:31010 > > > > > >>>> <http://10.0.1.2:31010>] (state=,code=0)* > > > > > >>>> > > > > > >>>> 0: jdbc:drill:zk=local> *select sum_int(cast(employee_id as > > 
int)) > > > > from > > > > > >>>> cp.`employee.json`*; > > > > > >>>> > > > > > >>>> Jul 04, 2015 4:19:45 PM > > > > > >>>> org.apache.calcite.sql.validate.SqlValidatorException <init> > > > > > >>>> > > > > > >>>> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: > > No > > > > > match > > > > > >>>> found for function signature sum_int(<NUMERIC>) > > > > > >>>> > > > > > >>>> Jul 04, 2015 4:19:45 PM > > > org.apache.calcite.runtime.CalciteException > > > > > >>>> <init> > > > > > >>>> > > > > > >>>> SEVERE: org.apache.calcite.runtime.CalciteContextException: > From > > > > line > > > > > 1, > > > > > >>>> column 8 to line 1, column 40: No match found for function > > > signature > > > > > >>>> sum_int(<NUMERIC>) > > > > > >>>> > > > > > >>>> *Error: PARSE ERROR: From line 1, column 8 to line 1, column > 40: > > > No > > > > > >>>> match > > > > > >>>> found for function signature sum_int(<NUMERIC>)* > > > > > >>>> > > > > > >>>> *[Error Id: f649fc85-6b6a-4468-9a4f-bfef0b23d06b on > > > 10.0.1.2:31010 > > > > > >>>> <http://10.0.1.2:31010>] (state=,code=0)* > > > > > >>>> > > > > > >>>> 0: jdbc:drill:zk=local> > > > > > >>>> > > > > > >>>> > > > > > >>>> It looks like there is some undocumented subtlety about how to > > > > > register > > > > > >>>> an > > > > > >>>> aggregator. > > > > > >>>> > > > > > >>>> On Sat, Jul 4, 2015 at 4:08 PM, Jim Bates < > [email protected]> > > > > > wrote: > > > > > >>>> > > > > > >>>> > I'm working on the same thing. I want to aggregate a list of > > > > values. > > > > > >>>> It has > > > > > >>>> > been a search and guess game for the most part. I'm still > > stuck > > > in > > > > > the > > > > > >>>> > process of getting the values all into a list. The writers > > look > > > > > >>>> interesting > > > > > >>>> > but for aggregation functions it looks like the input is > the > > > > param > > > > > >>>> and > > > > > >>>> > output objects can't hold the aggregations steps. 
The > > Workspace > > > is > > > > > >>>> where > > > > > >>>> > that happens. If I try and use a Writer in a workspace it > > won't > > > > load > > > > > >>>> and > > > > > >>>> > tells me to change it to Holders which was why I was using > > them > > > to > > > > > >>>> start > > > > > >>>> > with. Maybe I'm missing the architecture of the agg > function. > > It > > > > > >>>> looked > > > > > >>>> > like it was.... > > > > > >>>> > > > > > > >>>> > @Param comes in -> initialize @Workspace vars in setup -> > > > process > > > > > data > > > > > >>>> > through @Workspace vars in add -> finalize @Output in > output. > > > > > >>>> > > > > > > >>>> > So I'm back to trying to figure out how to create a > > > > > >>>> RepeatedBigIntHolder or > > > > > >>>> > a RepeatedVarCharHolder... > > > > > >>>> > > > > > > >>>> > > > > > > >>>> > > > > > > >>>> > On Sat, Jul 4, 2015 at 4:53 PM, Ted Dunning < > > > > [email protected]> > > > > > >>>> wrote: > > > > > >>>> > > > > > > >>>> > > I am working on trying to build any kind of list > > constructing > > > > > >>>> aggregator > > > > > >>>> > > and having absolute fits. > > > > > >>>> > > > > > > > >>>> > > To simplify life, I decided to just build a generic list > > > builder > > > > > >>>> that is > > > > > >>>> > a > > > > > >>>> > > scalar function that returns a list containing its > argument. > > > > Thus > > > > > >>>> > zoop(3) > > > > > >>>> > > => [3], zoop('abc') => 'abc' and zoop([1,2,3]) => > [[1,2,3]]. > > > > > >>>> > > > > > > > >>>> > > The ComplexWriter looks like the place to go. As usual, > the > > > > > >>>> complete lack > > > > > >>>> > > of comments in most of Drill makes this very hard since I > > have > > > > to > > > > > >>>> guess > > > > > >>>> > > what works and what doesn't. > > > > > >>>> > > > > > > > >>>> > > In my code, I note that ComplexWriter has a nice > > rootAsList() > > > > > >>>> method. 
I > > > > > >>>> > > used this in zip and it works nicely to construct lists > for > > > > > >>>> output. I > > > > > >>>> > note > > > > > >>>> > > that the resulting ListWriter has a method > > > > copyReader(FieldReader > > > > > >>>> var1) > > > > > >>>> > > which looks really good. > > > > > >>>> > > > > > > > >>>> > > Unfortunately, the only implementation of copyReader() is > in > > > > > >>>> > > AbstractFieldWriter and it looks this: > > > > > >>>> > > > > > > > >>>> > > public void copyReader(FieldReader reader) { > > > > > >>>> > > this.fail("Copy FieldReader"); > > > > > >>>> > > } > > > > > >>>> > > > > > > > >>>> > > I would like to formally say at this point "WTF"? > > > > > >>>> > > > > > > > >>>> > > In digging in further, I see other methods that look handy > > > like > > > > > >>>> > > > > > > > >>>> > > public void write(IntHolder holder) { > > > > > >>>> > > this.fail("Int"); > > > > > >>>> > > } > > > > > >>>> > > > > > > > >>>> > > And then in looking at implementations, it looks like > there > > > is a > > > > > >>>> > > combinatorial explosion because every type seems to need a > > > write > > > > > >>>> method > > > > > >>>> > for > > > > > >>>> > > every other type. > > > > > >>>> > > > > > > > >>>> > > What is the thought here? How can I copy an arbitrary > value > > > > into > > > > > a > > > > > >>>> list? > > > > > >>>> > > > > > > > >>>> > > My next thought was to build code that dispatches on type. > > > > There > > > > > >>>> is a > > > > > >>>> > > method called getType() on the FieldReader. > Unfortunately, > > > that > > > > > >>>> drives > > > > > >>>> > > into code generated by protoc and I see no way to dispatch > > on > > > > the > > > > > >>>> type of > > > > > >>>> > > an incoming value. > > > > > >>>> > > > > > > > >>>> > > > > > > > >>>> > > How is this supposed to work? 
> > > > > >>>> > > > > > > > >>>> > > > > > > > >>>> > > > > > > > >>>> > > > > > > > >>>> > > On Sat, Jul 4, 2015 at 2:14 PM, mehant baid < > > > > > [email protected]> > > > > > >>>> > wrote: > > > > > >>>> > > > > > > > >>>> > > > For a detailed example on using ComplexWriter interface > > you > > > > can > > > > > >>>> take a > > > > > >>>> > > look > > > > > >>>> > > > at the Mappify > > > > > >>>> > > > < > > > > > >>>> > > > > > > > > >>>> > > > > > > > >>>> > > > > > > >>>> > > > > > > > > > > > > > > > https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Mappify.java > > > > > >>>> > > > > > > > > > >>>> > > > (kvgen) function. The function itself is very simple > > however > > > > it > > > > > >>>> makes > > > > > >>>> > use > > > > > >>>> > > > of the utility methods in MappifyUtility > > > > > >>>> > > > < > > > > > >>>> > > > > > > > > >>>> > > > > > > > >>>> > > > > > > >>>> > > > > > > > > > > > > > > > https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/MappifyUtility.java > > > > > >>>> > > > > > > > > > >>>> > > > and MapUtility > > > > > >>>> > > > < > > > > > >>>> > > > > > > > > >>>> > > > > > > > >>>> > > > > > > >>>> > > > > > > > > > > > > > > > https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapUtility.java > > > > > >>>> > > > > > > > > > >>>> > > > which perform most of the work. > > > > > >>>> > > > > > > > > >>>> > > > Currently we don't have a generic infrastructure to > handle > > > > > errors > > > > > >>>> > coming > > > > > >>>> > > > out of functions. However there is UserException, which > > when > > > > > >>>> raised > > > > > >>>> > will > > > > > >>>> > > > make sure that Drill does not gobble up the error > message > > in > > > > > that > > > > > >>>> > > > exception. 
So you can probably throw a UserException > with > > > the > > > > > >>>> failing > > > > > >>>> > > input > > > > > >>>> > > > in your function to make sure it propagates to the user. > > > > > >>>> > > > > > > > > >>>> > > > Thanks > > > > > >>>> > > > Mehant > > > > > >>>> > > > > > > > > >>>> > > > On Sat, Jul 4, 2015 at 1:48 PM, Jacques Nadeau < > > > > > >>>> [email protected]> > > > > > >>>> > > wrote: > > > > > >>>> > > > > > > > > >>>> > > > > *Holders are for both input and output. You can also > > use > > > > > >>>> > CompleWriter > > > > > >>>> > > > for > > > > > >>>> > > > > output and FieldReader for input if you want to write > or > > > > read > > > > > a > > > > > >>>> > complex > > > > > >>>> > > > > value. > > > > > >>>> > > > > > > > > > >>>> > > > > I don't think we've provided a really clean way to > > > > construct a > > > > > >>>> > > > > Repeated*Holder for output purposes. You can probably > > do > > > it > > > > > by > > > > > >>>> > > reaching > > > > > >>>> > > > > into a bunch of internal interfaces in Drill. > However, > > I > > > > > would > > > > > >>>> > > recommend > > > > > >>>> > > > > using the ComplexWriter output pattern for now. This > > will > > > > be > > > > > a > > > > > >>>> > little > > > > > >>>> > > > less > > > > > >>>> > > > > efficient but substantially less brittle. I suggest > you > > > > open > > > > > >>>> up a > > > > > >>>> > jira > > > > > >>>> > > > for > > > > > >>>> > > > > using a Repeated*Holder as an output. > > > > > >>>> > > > > > > > > > >>>> > > > > On Sat, Jul 4, 2015 at 1:38 PM, Ted Dunning < > > > > > >>>> [email protected]> > > > > > >>>> > > > wrote: > > > > > >>>> > > > > > > > > > >>>> > > > > > Holders are for input, I think. > > > > > >>>> > > > > > > > > > > >>>> > > > > > Try the different kinds of writers. 
> > > > > >>>> > > > > > > > > > > >>>> > > > > > > > > > > >>>> > > > > > > > > > > >>>> > > > > > On Sat, Jul 4, 2015 at 12:49 PM, Jim Bates < > > > > > >>>> [email protected]> > > > > > >>>> > > > wrote: > > > > > >>>> > > > > > > > > > > >>>> > > > > > > Using a repeatedholder as a @param I've got > > working. I > > > > was > > > > > >>>> > working > > > > > >>>> > > > on a > > > > > >>>> > > > > > > custom aggregator function using DrillAggFunc. In > > > this I > > > > > >>>> can do > > > > > >>>> > > > simple > > > > > >>>> > > > > > > things but If I want to build a list values and do > > > > > >>>> something with > > > > > >>>> > > it > > > > > >>>> > > > in > > > > > >>>> > > > > > the > > > > > >>>> > > > > > > final output method I think I need to use > > > > RepeatedHolders > > > > > >>>> in the > > > > > >>>> > > > > > > @Workspace. To do that I need to create a new one > in > > > the > > > > > >>>> setup > > > > > >>>> > > > method. > > > > > >>>> > > > > I > > > > > >>>> > > > > > > can't get one built. They all require a > > > BufferAllocator > > > > to > > > > > >>>> be > > > > > >>>> > > passed > > > > > >>>> > > > in > > > > > >>>> > > > > > to > > > > > >>>> > > > > > > build it. I have not found a way to get an > allocator > > > > yet. > > > > > >>>> Any > > > > > >>>> > > > > > suggestions? > > > > > >>>> > > > > > > > > > > > >>>> > > > > > > On Sat, Jul 4, 2015 at 1:37 PM, Ted Dunning < > > > > > >>>> > [email protected] > > > > > >>>> > > > > > > > > >>>> > > > > > wrote: > > > > > >>>> > > > > > > > > > > > >>>> > > > > > > > If you look at the zip function in > > > > > >>>> > > > > > > > > > > https://github.com/mapr-demos/simple-drill-functions > > > > > you > > > > > >>>> can > > > > > >>>> > > have > > > > > >>>> > > > an > > > > > >>>> > > > > > > > example of building a structure. 
> > > > > >>>> > > > > > > > > > > > > >>>> > > > > > > > The basic idea is that your output is denoted as > > > > > >>>> > > > > > > > > > > > > >>>> > > > > > > > @Output > > > > > >>>> > > > > > > > BaseWriter.ComplexWriter writer; > > > > > >>>> > > > > > > > > > > > > >>>> > > > > > > > The pattern for building a list of lists of > > integers > > > > is > > > > > >>>> like > > > > > >>>> > > this: > > > > > >>>> > > > > > > > > > > > > >>>> > > > > > > > writer.setValueCount(n); > > > > > >>>> > > > > > > > ... > > > > > >>>> > > > > > > > BaseWriter.ListWriter outer = > > > > > writer.rootAsList(); > > > > > >>>> > > > > > > > outer.start(); // [ outer list > > > > > >>>> > > > > > > > ... > > > > > >>>> > > > > > > > // for each inner list > > > > > >>>> > > > > > > > BaseWriter.ListWriter inner = > > > > outer.list(); > > > > > >>>> > > > > > > > inner.start(); > > > > > >>>> > > > > > > > // for each inner list element > > > > > >>>> > > > > > > > > > > > > inner.integer().writeInt(accessor.get(i)); > > > > > >>>> > > > > > > > } > > > > > >>>> > > > > > > > inner.end(); // ] inner list > > > > > >>>> > > > > > > > } > > > > > >>>> > > > > > > > outer.end(); // ] outer list > > > > > >>>> > > > > > > > > > > > > >>>> > > > > > > > > > > > > >>>> > > > > > > > > > > > > >>>> > > > > > > > On Sat, Jul 4, 2015 at 10:29 AM, Jim Bates < > > > > > >>>> > [email protected]> > > > > > >>>> > > > > > wrote: > > > > > >>>> > > > > > > > > > > > > >>>> > > > > > > > > I have working aggregation and simple UDFs. > I've > > > > been > > > > > >>>> trying > > > > > >>>> > to > > > > > >>>> > > > > > > document > > > > > >>>> > > > > > > > > and understand each of the options available > in > > a > > > > > Drill > > > > > >>>> UDF. 
> > > > > >>>> > > > > > > > Understanding > > > > > >>>> > > > > > > > > the different FunctionScope's, the ones that > are > > > > > >>>> allowed, the > > > > > >>>> > > > ones > > > > > >>>> > > > > > that > > > > > >>>> > > > > > > > are > > > > > >>>> > > > > > > > > not. The impact of different cost categories. > > The > > > > > >>>> different > > > > > >>>> > > > steps > > > > > >>>> > > > > > > needed > > > > > >>>> > > > > > > > > to understand handling any of the supported > data > > > > types > > > > > >>>> and > > > > > >>>> > > > > > structures > > > > > >>>> > > > > > > in > > > > > >>>> > > > > > > > > drill. > > > > > >>>> > > > > > > > > > > > > > >>>> > > > > > > > > Here are a few of my current road blocks. Any > > > > pointers > > > > > >>>> would > > > > > >>>> > be > > > > > >>>> > > > > > greatly > > > > > >>>> > > > > > > > > appreciated. > > > > > >>>> > > > > > > > > > > > > > >>>> > > > > > > > > > > > > > >>>> > > > > > > > > 1. I've been trying to understand how to > > > > correctly > > > > > >>>> use > > > > > >>>> > > > > > > RepeatedHolders > > > > > >>>> > > > > > > > > of whatever type. For this discussion lets > > > start > > > > > >>>> with a > > > > > >>>> > > > > > > > > RepeatedBigIntHolder. I'm trying to figure > > out > > > > the > > > > > >>>> best > > > > > >>>> > way > > > > > >>>> > > to > > > > > >>>> > > > > > > create > > > > > >>>> > > > > > > > a > > > > > >>>> > > > > > > > > new > > > > > >>>> > > > > > > > > one. I have not figured out where in the > > > existing > > > > > >>>> drill > > > > > >>>> > code > > > > > >>>> > > > > > someone > > > > > >>>> > > > > > > > > does > > > > > >>>> > > > > > > > > this. If I use a RepeatedBigIntHolder as a > > > > > Workspace > > > > > >>>> > object > > > > > >>>> > > > is > > > > > >>>> > > > > is > > > > > >>>> > > > > > > > null > > > > > >>>> > > > > > > > > to > > > > > >>>> > > > > > > > > start with. 
I created a new one in the > > startup > > > > > >>>> section of > > > > > >>>> > > the > > > > > >>>> > > > > udf > > > > > >>>> > > > > > > but > > > > > >>>> > > > > > > > > the > > > > > >>>> > > > > > > > > vector was null. I can find no reference in > > > > > creating > > > > > >>>> a new > > > > > >>>> > > > > > > > BigIntVector. > > > > > >>>> > > > > > > > > There is a way to create a BigIntVector > and I > > > did > > > > > >>>> find an > > > > > >>>> > > > > example > > > > > >>>> > > > > > of > > > > > >>>> > > > > > > > > creating a new VarCharVector but I can't do > > > that > > > > > >>>> using the > > > > > >>>> > > > drill > > > > > >>>> > > > > > jar > > > > > >>>> > > > > > > > > files > > > > > >>>> > > > > > > > > from 1.0. The > > > > > >>>> org.apache.drill.common.types.TypeProtos and > > > > > >>>> > > > > > > > > the > > > > > >>>> org.apache.drill.common.types.TypeProtos.MinorType > > > > > >>>> > > classes > > > > > >>>> > > > > do > > > > > >>>> > > > > > > not > > > > > >>>> > > > > > > > > appear to be accessible from the drill jar > > > files. > > > > > >>>> > > > > > > > > 2. What is the best way to close out a UDF > in > > > the > > > > > >>>> event it > > > > > >>>> > > > > > generates > > > > > >>>> > > > > > > > an > > > > > >>>> > > > > > > > > exception? Are there specific steps one > > should > > > > > >>>> follow to > > > > > >>>> > > make > > > > > >>>> > > > a > > > > > >>>> > > > > > > clean > > > > > >>>> > > > > > > > > exit > > > > > >>>> > > > > > > > > in a catch block that are beneficial to > > Drill? > > > > > >>>> > > > > > > > > > > > > > >>>> > > > > > > > > > > > > >>>> > > > > > > > > > > > >>>> > > > > > > > > > > >>>> > > > > > > > > > >>>> > > > > > > > > >>>> > > > > > > > >>>> > > > > > > >>>> > > > > > >>> > > > > > >>> > > > > > >> > > > > > > > > > > > > > > > > > > > > >
