Well... after much guesswork I got it to work... at least as far as intellectual curiosity goes; real-world use is a different story. Thanks to those who attempted to assist.
Query:

SELECT MyList(test_field1) FROM (SELECT test_field1 FROM `hive.default`.`my_hive_table` limit 10);

+----------------------------------------------------------+
|                          EXPR$0                           |
+----------------------------------------------------------+
|  [18108,19719,15559,14152,18577,17170,13010,11603,16028]  |
+----------------------------------------------------------+

Function:

@FunctionTemplate(name = "MyList", scope = FunctionTemplate.FunctionScope.POINT_AGGREGATE,
    nulls = FunctionTemplate.NullHandling.INTERNAL, isRandom = false,
    isBinaryCommutative = false, costCategory = FunctionTemplate.FunctionCostCategory.COMPLEX)
public static class MyList implements DrillAggFunc {

  @Param NullableBigIntHolder xValue;
  @Inject DrillBuf buffer;
  @Workspace IntHolder count;
  @Workspace RepeatedBigIntHolder xList;
  @Output RepeatedBigIntHolder out;

  @Override
  public void setup() {
    count = new IntHolder();
    count.value = 0;
    // Build a BigIntVector by hand and hang it on the workspace holder.
    org.apache.drill.exec.memory.BufferAllocator allocator =
        new org.apache.drill.exec.memory.TopLevelAllocator();
    xList = new RepeatedBigIntHolder();
    xList.vector = new org.apache.drill.exec.vector.BigIntVector(
        org.apache.drill.exec.record.MaterializedField.create(
            new org.apache.drill.common.expression.SchemaPath("bigints",
                org.apache.drill.common.expression.ExpressionPosition.UNKNOWN),
            org.apache.drill.common.types.Types.optional(
                org.apache.drill.common.types.TypeProtos.MinorType.BIGINT)),
        allocator);
    org.apache.drill.exec.vector.AllocationHelper.allocate(xList.vector, 100, 50);
    xList.vector.getMutator().generateTestData(100);
    xList.vector.getMutator().setValueCount(100);
  }

  @Override
  public void add() {
    if (xValue == null) {
      return;
    }
    // Grow the value count if the incoming value won't fit in the current range.
    int size = xList.end - xList.start;
    if (count.value + 1 > size) {
      xList.vector.getMutator().setValueCount(count.value + 1);
    }
    xList.vector.getMutator().setSafe(count.value, xValue.value);
    xList.end = count.value;
    count.value = count.value + 1;
  }

  @Override
  public void output() {
    out.vector = xList.vector;
    out.start = xList.start;
    out.end = xList.end;
  }

  @Override
  public void reset() {
  }
}

On Sun, Jul 5, 2015 at 2:43 PM, Jim Bates <jba...@maprtech.com> wrote:

> I agree. I've gone way too deep into drill to try and get this done. While
> it became clear to me that this is most likely not the way to do it... it
> has been a good learning experience around aggregation UDFs. I should be
> able to put a lot back into the docs on this as soon as I figure out how to
> get access to contribute to the docs. I'll file a JIRA on this, but in the
> short term I would still like to finish off what I started. I have my
> RepeatedBigIntHolders that now no longer blow up when I try and push data
> into them. Now I'm trying to see if I can get anything back out.
>
> FunFunFun.
>
> On Sun, Jul 5, 2015 at 1:50 PM, Jacques Nadeau <jacq...@apache.org> wrote:
>
>> It isn't obvious because you shouldn't do it. Please file a JIRA to add
>> real support for this type of output.
>>
>> Your current function would leak large amounts of memory that would
>> ultimately crash the node.
>>
>> Realistically, there are very few internal Drill APIs that you should
>> access via a UDF (injectables, holders, complexwriter, fieldreader and
>> helpers). A post-1.0 goal was to provide a UDF interface JAR to ensure
>> people don't accidentally reach into Drill's internals. (A later
>> possibility is bytecode weaving to completely protect against it.)
>>
>> J
>>
>> On Sun, Jul 5, 2015 at 11:36 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>
>> > That was impressively non-obvious.
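For anyone who wants the list behaviour without reaching into the vector internals (and without the leak Jacques describes just above), here is a rough sketch of the concatenated-string workaround I mention further down the thread. Untested: the ObjectHolder-backed StringBuilder workspace and the buffer handling are assumptions on my part, not something verified against this build.

@FunctionTemplate(name = "MyListCsv", scope = FunctionTemplate.FunctionScope.POINT_AGGREGATE,
    nulls = FunctionTemplate.NullHandling.INTERNAL)
public static class MyListCsv implements DrillAggFunc {

  @Param NullableBigIntHolder xValue;
  @Workspace ObjectHolder csv;   // assumption: ObjectHolder is accepted as a workspace type; it wraps a StringBuilder
  @Inject DrillBuf buffer;
  @Output NullableVarCharHolder out;

  @Override
  public void setup() {
    csv = new ObjectHolder();
    csv.obj = new StringBuilder();
  }

  @Override
  public void add() {
    if (xValue.isSet == 0) {
      return;                                  // skip NULL inputs
    }
    StringBuilder sb = (StringBuilder) csv.obj;
    if (sb.length() > 0) {
      sb.append(',');
    }
    sb.append(xValue.value);
  }

  @Override
  public void output() {
    byte[] bytes = csv.obj.toString().getBytes();
    buffer = buffer.reallocIfNeeded(bytes.length);   // grow the injected buffer rather than allocating our own
    buffer.setBytes(0, bytes);
    out.isSet = 1;
    out.buffer = buffer;
    out.start = 0;
    out.end = bytes.length;
  }

  @Override
  public void reset() {
    csv.obj = new StringBuilder();
  }
}

This trades the real repeated output for a comma-separated VARCHAR, which is only sensible for small groups, but it keeps all allocation on the buffer Drill injects instead of a hand-built allocator.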
>> >
>> >
>> > On Sat, Jul 4, 2015 at 6:40 PM, Jim Bates <jba...@maprtech.com> wrote:
>> >
>> > > I did get a new RepeatedBigIntHolder built, with a BigIntVector added
>> > > to it. I'll try it in the UDF tomorrow and see if there is a difference
>> > > in the ways I found to get a BufferAllocator.
>> > >
>> > >     .
>> > >     .
>> > >     .
>> > >     @Inject DrillBuf buffer;
>> > >     @Workspace RepeatedBigIntHolder yList;
>> > >     .
>> > >     .
>> > >     .
>> > >     @Override
>> > >     public void setup() {
>> > >       .
>> > >       .
>> > >       .
>> > >       // org.apache.drill.exec.memory.BufferAllocator allocator = buffer.getAllocator();
>> > >       org.apache.drill.exec.memory.BufferAllocator allocator =
>> > >           new org.apache.drill.exec.memory.TopLevelAllocator();
>> > >       yList = new RepeatedBigIntHolder();
>> > >       yList.vector = new org.apache.drill.exec.vector.BigIntVector(
>> > >           org.apache.drill.exec.record.MaterializedField.create(
>> > >               new org.apache.drill.common.expression.SchemaPath("bigints",
>> > >                   org.apache.drill.common.expression.ExpressionPosition.UNKNOWN),
>> > >               org.apache.drill.common.types.Types.optional(
>> > >                   org.apache.drill.common.types.TypeProtos.MinorType.BIGINT)),
>> > >           allocator);
>> > >       .
>> > >       .
>> > >       .
>> > >     }
>> > >
>> > > On Sat, Jul 4, 2015 at 7:39 PM, Jim Bates <jba...@maprtech.com> wrote:
>> > >
>> > > > I still have issues finding the correct way to create and use a
>> > > > RepeatedHolder, and Writers are a non-starter for Workspace values. I
>> > > > can make do with creating a concatenated string in a VarCharHolder for
>> > > > small data sets to get past this in the short term and finish testing
>> > > > the output values I expect, but I won't be able to do anything at scale
>> > > > till I figure out how to make a repeated list.
>> > > >
>> > > > On Sat, Jul 4, 2015 at 7:12 PM, Jim Bates <jba...@maprtech.com> wrote:
>> > > >
>> > > >> Well... Converting from string to integers anyway... Too many 4th of
>> > > >> July hot dogs, going into nitrate overload. :)
>> > > >>
>> > > >> I am pulling an array of string values from json data. The string
>> > > >> values are actually integers. I am converting to integers and summing
>> > > >> each array entry into the final tally.
>> > > >>
>> > > >> On Sat, Jul 4, 2015 at 7:04 PM, Jim Bates <jba...@maprtech.com> wrote:
>> > > >>
>> > > >>> Ted,
>> > > >>>
>> > > >>> Yes, I started out just getting a basic count to work. I am trying to
>> > > >>> keep the workflow as close to a basic user as possible. As such, I am
>> > > >>> building and using the MapR Apache Drill sandbox to test.
>> > > >>>
>> > > >>> 1. Always look at the drillbits.log file to see if Drill had any
>> > > >>>    issues loading your UDF. That was where I learned that all
>> > > >>>    workspace values needed to be holders:
>> > > >>>
>> > > >>>    WARN o.a.d.exec.expr.fn.FunctionConverter - Failure loading function class
>> > > >>>    com.mapr.example.udfs.drill.MyDrillAggFunctions$MyLinearRegression1, field xList.
>> > > >>>    Aggregate function 'MyLinearRegression1' workspace variable 'xList' is of type
>> > > >>>    'interface org.apache.drill.exec.vector.complex.writer.BaseWriter$ComplexWriter'.
>> > > >>>    Please change it to Holder type.
>> > > >>>
>> > > >>> 2. Error messages:
>> > > >>>    - If you get an error in this format it means that Drill cannot
>> > > >>>      find your function, so it probably didn't load it;
>> > > >>>      go back to step 1:
>> > > >>>
>> > > >>>      PARSE ERROR: From line 1, column 8 to line 1, column 44: No match
>> > > >>>      found for function signature MyFunctionName(<ANY>)
>> > > >>>
>> > > >>>    - If you get an error in this format it means that the function is
>> > > >>>      there but Drill could not find a signature that matched the param
>> > > >>>      types or param numbers you were passing it. The exact wording will
>> > > >>>      change, but "Missing function implementation" is the key phrase to
>> > > >>>      look for:
>> > > >>>
>> > > >>>      Error: SYSTEM ERROR: org.apache.drill.exec.exception.SchemaChangeException:
>> > > >>>      Failure while trying to materialize incoming schema. Errors:
>> > > >>>      Error in expression at index -1. Error: Missing function implementation:
>> > > >>>      [castBIGINT(VARCHAR-REPEATED)]. Full expression: --UNKNOWN EXPRESSION--
>> > > >>>
>> > > >>> 3. In your function definition for aggregate functions you need to set
>> > > >>>    null processing to INTERNAL and isRandom to false. Example below:
>> > > >>>
>> > > >>>    @FunctionTemplate(name = "MyFunctionName", scope = FunctionTemplate.FunctionScope.POINT_AGGREGATE,
>> > > >>>        nulls = FunctionTemplate.NullHandling.INTERNAL, isRandom = false,
>> > > >>>        isBinaryCommutative = false, costCategory = FunctionTemplate.FunctionCostCategory.COMPLEX)
>> > > >>>
>> > > >>> Below is an example from the Apache Drill tutorial data sets contained
>> > > >>> in the MapR Apache Drill sandbox. I am pulling an array of string
>> > > >>> values from json data. The string values are actually integers. I am
>> > > >>> converting to integers and summing each array entry into the final
>> > > >>> tally.
>> > > >>> This in no way represents what this data was for, but it did become a
>> > > >>> handy way for me to peck out the "correct" way to build an aggregation
>> > > >>> UDF:
>> > > >>>
>> > > >>> @FunctionTemplate(name = "MyArraySum", scope = FunctionTemplate.FunctionScope.POINT_AGGREGATE,
>> > > >>>     nulls = FunctionTemplate.NullHandling.INTERNAL, isRandom = false,
>> > > >>>     isBinaryCommutative = false, costCategory = FunctionTemplate.FunctionCostCategory.COMPLEX)
>> > > >>> public static class MyArraySum implements DrillAggFunc {
>> > > >>>
>> > > >>>   @Param RepeatedVarCharHolder listToSearch;
>> > > >>>   @Workspace NullableBigIntHolder count;
>> > > >>>   @Workspace NullableBigIntHolder sum;
>> > > >>>   @Workspace NullableVarCharHolder vc;
>> > > >>>   @Output BigIntHolder out;
>> > > >>>
>> > > >>>   @Override
>> > > >>>   public void setup() {
>> > > >>>     count.value = 0;
>> > > >>>     sum.value = 0;
>> > > >>>   }
>> > > >>>
>> > > >>>   @Override
>> > > >>>   public void add() {
>> > > >>>     int c = listToSearch.end - listToSearch.start;
>> > > >>>     int val = 0;
>> > > >>>     try {
>> > > >>>       for (int i = 0; i < c; i++) {
>> > > >>>         listToSearch.vector.getAccessor().get(i, vc);
>> > > >>>         String inputStr = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers
>> > > >>>             .toStringFromUTF8(vc.start, vc.end, vc.buffer);
>> > > >>>         val = Integer.parseInt(inputStr);
>> > > >>>         sum.value = sum.value + val;
>> > > >>>       }
>> > > >>>     } catch (Exception e) {
>> > > >>>       val = 0;
>> > > >>>     }
>> > > >>>     count.value = count.value + 1;
>> > > >>>   }
>> > > >>>
>> > > >>> Example select statement:
>> > > >>>
>> > > >>> SELECT MyArraySum(my_arrays) FROM (SELECT t.trans_info.prod_id as
>> > > >>> my_arrays FROM `dfs.clicks`.`./clicks/clicks.campaign.json` t limit 5);
>> > > >>>
>> > > >>> On Sat, Jul 4, 2015 at 6:22 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>> > > >>>
>> > > >>>> Jim,
>> > > >>>>
>> > > >>>> I think that you may be having trouble with aggregators in general.
>> > > >>>>
>> > > >>>> Have you been able to build *any* aggregator of anything? I haven't.
>> > > >>>> When I try to build an aggregator of int's or doubles, I get a very
>> > > >>>> persistent problem with Drill even seeing my aggregates:
>> > > >>>>
>> > > >>>> 0: jdbc:drill:zk=local> select sum_int(employee_id) from cp.`employee.json`;
>> > > >>>>
>> > > >>>> Jul 04, 2015 4:19:35 PM org.apache.calcite.sql.validate.SqlValidatorException <init>
>> > > >>>> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match
>> > > >>>> found for function signature sum_int(<ANY>)
>> > > >>>> Jul 04, 2015 4:19:35 PM org.apache.calcite.runtime.CalciteException <init>
>> > > >>>> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
>> > > >>>> column 8 to line 1, column 27: No match found for function signature sum_int(<ANY>)
>> > > >>>>
>> > > >>>> Error: PARSE ERROR: From line 1, column 8 to line 1, column 27: No match
>> > > >>>> found for function signature sum_int(<ANY>)
>> > > >>>> [Error Id: 91b78fa6-6dd1-4214-a85f-c2bf2c393145 on 10.0.1.2:31010] (state=,code=0)
>> > > >>>>
>> > > >>>> 0: jdbc:drill:zk=local> select sum_int(cast(employee_id as int)) from cp.`employee.json`;
>> > > >>>>
>> > > >>>> Jul 04, 2015 4:19:45 PM org.apache.calcite.sql.validate.SqlValidatorException <init>
>> > > >>>> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match
>> > > >>>> found for function signature sum_int(<NUMERIC>)
>> > > >>>> Jul 04, 2015 4:19:45 PM org.apache.calcite.runtime.CalciteException <init>
>> > > >>>> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
>> > > >>>> column 8 to line 1, column 40: No match found for function signature sum_int(<NUMERIC>)
>> > > >>>>
>> > > >>>> Error: PARSE ERROR: From line 1, column 8 to line 1, column 40: No match
>> > > >>>> found for function signature sum_int(<NUMERIC>)
>> > > >>>> [Error Id: f649fc85-6b6a-4468-9a4f-bfef0b23d06b on 10.0.1.2:31010] (state=,code=0)
>> > > >>>>
>> > > >>>> 0: jdbc:drill:zk=local>
>> > > >>>>
>> > > >>>> It looks like there is some undocumented subtlety about how to register
>> > > >>>> an aggregator.
>> > > >>>>
>> > > >>>> On Sat, Jul 4, 2015 at 4:08 PM, Jim Bates <jba...@maprtech.com> wrote:
>> > > >>>>
>> > > >>>> > I'm working on the same thing. I want to aggregate a list of values.
>> > > >>>> > It has been a search-and-guess game for the most part. I'm still stuck
>> > > >>>> > in the process of getting the values all into a list. The writers look
>> > > >>>> > interesting, but for aggregation functions it looks like the input is
>> > > >>>> > the param and the output objects can't hold the aggregation steps. The
>> > > >>>> > Workspace is where that happens. If I try and use a Writer in a
>> > > >>>> > workspace it won't load and tells me to change it to Holders, which was
>> > > >>>> > why I was using them to start with. Maybe I'm missing the architecture
>> > > >>>> > of the agg function. It looked like it was...
>> > > >>>> >
>> > > >>>> > @Param comes in -> initialize @Workspace vars in setup -> process data
>> > > >>>> > through @Workspace vars in add -> finalize @Output in output.
>> > > >>>> >
>> > > >>>> > So I'm back to trying to figure out how to create a RepeatedBigIntHolder
>> > > >>>> > or a RepeatedVarCharHolder...
>> > > >>>> >
>> > > >>>> > On Sat, Jul 4, 2015 at 4:53 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>> > > >>>> >
>> > > >>>> > > I am working on trying to build any kind of list-constructing
>> > > >>>> > > aggregator and having absolute fits.
>> > > >>>> > >
>> > > >>>> > > To simplify life, I decided to just build a generic list builder that
>> > > >>>> > > is a scalar function that returns a list containing its argument. Thus
>> > > >>>> > > zoop(3) => [3], zoop('abc') => 'abc' and zoop([1,2,3]) => [[1,2,3]].
>> > > >>>> > >
>> > > >>>> > > The ComplexWriter looks like the place to go. As usual, the complete
>> > > >>>> > > lack of comments in most of Drill makes this very hard since I have to
>> > > >>>> > > guess what works and what doesn't.
>> > > >>>> > >
>> > > >>>> > > In my code, I note that ComplexWriter has a nice rootAsList() method. I
>> > > >>>> > > used this in zip and it works nicely to construct lists for output. I
>> > > >>>> > > note that the resulting ListWriter has a method copyReader(FieldReader var1)
>> > > >>>> > > which looks really good.
>> > > >>>> > >
>> > > >>>> > > Unfortunately, the only implementation of copyReader() is in
>> > > >>>> > > AbstractFieldWriter and it looks like this:
>> > > >>>> > >
>> > > >>>> > >     public void copyReader(FieldReader reader) {
>> > > >>>> > >         this.fail("Copy FieldReader");
>> > > >>>> > >     }
>> > > >>>> > >
>> > > >>>> > > I would like to formally say at this point "WTF"?
>> > > >>>> > >
>> > > >>>> > > In digging in further, I see other methods that look handy like
>> > > >>>> > >
>> > > >>>> > >     public void write(IntHolder holder) {
>> > > >>>> > >         this.fail("Int");
>> > > >>>> > >     }
>> > > >>>> > >
>> > > >>>> > > And then in looking at implementations, it looks like there is a
>> > > >>>> > > combinatorial explosion because every type seems to need a write method
>> > > >>>> > > for every other type.
>> > > >>>> > >
>> > > >>>> > > What is the thought here? How can I copy an arbitrary value into a list?
>> > > >>>> > >
>> > > >>>> > > My next thought was to build code that dispatches on type. There is a
>> > > >>>> > > method called getType() on the FieldReader. Unfortunately, that drives
>> > > >>>> > > into code generated by protoc and I see no way to dispatch on the type
>> > > >>>> > > of an incoming value.
>> > > >>>> > >
>> > > >>>> > > How is this supposed to work?
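The closest thing to an answer I found is the MapUtility class mehant points at just below; for a single type, the dispatch Ted is asking about boils down to something like the sketch here. Rough and untested: the helper name is mine, and the reader/writer calls reflect my reading of the 1.x complex-vector API, so treat the details as assumptions.

// Copy the value a FieldReader is currently positioned on into a ListWriter,
// dispatching on the reader's minor type (a stripped-down MapUtility).
public static void copyIntoList(
    org.apache.drill.exec.vector.complex.reader.FieldReader reader,
    org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter list) {
  switch (reader.getType().getMinorType()) {
    case BIGINT: {
      org.apache.drill.exec.expr.holders.NullableBigIntHolder h =
          new org.apache.drill.exec.expr.holders.NullableBigIntHolder();
      reader.read(h);
      if (h.isSet == 1) {
        list.bigInt().writeBigInt(h.value);
      }
      break;
    }
    // ... one case per supported MinorType, which is exactly the combinatorial
    // explosion described above ...
    default:
      throw new UnsupportedOperationException(
          "Unhandled type: " + reader.getType().getMinorType());
  }
}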
>> > > >>>> > >
>> > > >>>> > >
>> > > >>>> > > On Sat, Jul 4, 2015 at 2:14 PM, mehant baid <baid.meh...@gmail.com> wrote:
>> > > >>>> > >
>> > > >>>> > > > For a detailed example on using the ComplexWriter interface you can
>> > > >>>> > > > take a look at the Mappify (kvgen) function:
>> > > >>>> > > > https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Mappify.java
>> > > >>>> > > > The function itself is very simple; however, it makes use of the
>> > > >>>> > > > utility methods in MappifyUtility
>> > > >>>> > > > https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/MappifyUtility.java
>> > > >>>> > > > and MapUtility
>> > > >>>> > > > https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapUtility.java
>> > > >>>> > > > which perform most of the work.
>> > > >>>> > > >
>> > > >>>> > > > Currently we don't have a generic infrastructure to handle errors
>> > > >>>> > > > coming out of functions. However there is UserException, which when
>> > > >>>> > > > raised will make sure that Drill does not gobble up the error message
>> > > >>>> > > > in that exception. So you can probably throw a UserException with the
>> > > >>>> > > > failing input in your function to make sure it propagates to the user.
>> > > >>>> > > >
>> > > >>>> > > > Thanks
>> > > >>>> > > > Mehant
>> > > >>>> > > >
>> > > >>>> > > > On Sat, Jul 4, 2015 at 1:48 PM, Jacques Nadeau <jacq...@apache.org> wrote:
>> > > >>>> > > >
>> > > >>>> > > > > Holders are for both input and output. You can also use ComplexWriter
>> > > >>>> > > > > for output and FieldReader for input if you want to write or read a
>> > > >>>> > > > > complex value.
>> > > >>>> > > > >
>> > > >>>> > > > > I don't think we've provided a really clean way to construct a
>> > > >>>> > > > > Repeated*Holder for output purposes. You can probably do it by
>> > > >>>> > > > > reaching into a bunch of internal interfaces in Drill. However, I
>> > > >>>> > > > > would recommend using the ComplexWriter output pattern for now. This
>> > > >>>> > > > > will be a little less efficient but substantially less brittle. I
>> > > >>>> > > > > suggest you open up a jira for using a Repeated*Holder as an output.
>> > > >>>> > > > >
>> > > >>>> > > > > On Sat, Jul 4, 2015 at 1:38 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>> > > >>>> > > > >
>> > > >>>> > > > > > Holders are for input, I think.
>> > > >>>> > > > > >
>> > > >>>> > > > > > Try the different kinds of writers.
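Putting Jacques's ComplexWriter suggestion together with the zip pattern Ted sketches just below, the single-value "zoop" function would look roughly like the following. A sketch only, not something I have compiled or run; in particular startList()/endList() are my reading of the ListWriter interface (Ted's outline below writes them as start()/end()), and the holders are the same ones used in the examples above.

@FunctionTemplate(name = "zoop", scope = FunctionTemplate.FunctionScope.SIMPLE,
    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
public static class ZoopBigInt implements DrillSimpleFunc {

  @Param BigIntHolder in;
  @Output BaseWriter.ComplexWriter writer;

  @Override
  public void setup() {
  }

  @Override
  public void eval() {
    org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter list = writer.rootAsList();
    list.startList();                      // open the output list:  [
    list.bigInt().writeBigInt(in.value);   // zoop(3) => [3]
    list.endList();                        // close it:              ]
  }
}

The appeal of this route, per Jacques, is that everything stays on the writer surface instead of hand-built vectors and allocators.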
>> > > >>>> > > > > >
>> > > >>>> > > > > >
>> > > >>>> > > > > > On Sat, Jul 4, 2015 at 12:49 PM, Jim Bates <jba...@maprtech.com> wrote:
>> > > >>>> > > > > >
>> > > >>>> > > > > > > Using a repeatedholder as a @Param I've got working. I was working
>> > > >>>> > > > > > > on a custom aggregator function using DrillAggFunc. In this I can do
>> > > >>>> > > > > > > simple things, but if I want to build a list of values and do
>> > > >>>> > > > > > > something with it in the final output method I think I need to use
>> > > >>>> > > > > > > RepeatedHolders in the @Workspace. To do that I need to create a new
>> > > >>>> > > > > > > one in the setup method. I can't get one built. They all require a
>> > > >>>> > > > > > > BufferAllocator to be passed in to build it. I have not found a way
>> > > >>>> > > > > > > to get an allocator yet. Any suggestions?
>> > > >>>> > > > > > >
>> > > >>>> > > > > > > On Sat, Jul 4, 2015 at 1:37 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>> > > >>>> > > > > > >
>> > > >>>> > > > > > > > If you look at the zip function in
>> > > >>>> > > > > > > > https://github.com/mapr-demos/simple-drill-functions you can have an
>> > > >>>> > > > > > > > example of building a structure.
>> > > >>>> > > > > > > >
>> > > >>>> > > > > > > > The basic idea is that your output is denoted as
>> > > >>>> > > > > > > >
>> > > >>>> > > > > > > >     @Output
>> > > >>>> > > > > > > >     BaseWriter.ComplexWriter writer;
>> > > >>>> > > > > > > >
>> > > >>>> > > > > > > > The pattern for building a list of lists of integers is like this:
>> > > >>>> > > > > > > >
>> > > >>>> > > > > > > >     writer.setValueCount(n);
>> > > >>>> > > > > > > >     ...
>> > > >>>> > > > > > > >     BaseWriter.ListWriter outer = writer.rootAsList();
>> > > >>>> > > > > > > >     outer.start();    // [ outer list
>> > > >>>> > > > > > > >     ...
>> > > >>>> > > > > > > >     // for each inner list
>> > > >>>> > > > > > > >     BaseWriter.ListWriter inner = outer.list();
>> > > >>>> > > > > > > >     inner.start();
>> > > >>>> > > > > > > >         // for each inner list element
>> > > >>>> > > > > > > >         inner.integer().writeInt(accessor.get(i));
>> > > >>>> > > > > > > >     }
>> > > >>>> > > > > > > >     inner.end();      // ] inner list
>> > > >>>> > > > > > > >     }
>> > > >>>> > > > > > > >     outer.end();      // ] outer list
>> > > >>>> > > > > > > >
>> > > >>>> > > > > > > > On Sat, Jul 4, 2015 at 10:29 AM, Jim Bates <jba...@maprtech.com> wrote:
>> > > >>>> > > > > > > >
>> > > >>>> > > > > > > > > I have working aggregation and simple UDFs. I've been trying to
>> > > >>>> > > > > > > > > document and understand each of the options available in a Drill UDF.
>> > > >>>> > > > > > > > > Understanding the different FunctionScopes, the ones that are
>> > > >>>> > > > > > > > > allowed, the ones that are not. The impact of different cost
>> > > >>>> > > > > > > > > categories. The different steps needed to understand handling any of
>> > > >>>> > > > > > > > > the supported data types and structures in drill.
>> > > >>>> > > > > > > > >
>> > > >>>> > > > > > > > > Here are a few of my current road blocks. Any pointers would be
>> > > >>>> > > > > > > > > greatly appreciated.
>> > > >>>> > > > > > > > >
>> > > >>>> > > > > > > > > 1. I've been trying to understand how to correctly use
>> > > >>>> > > > > > > > >    RepeatedHolders of whatever type. For this discussion let's start
>> > > >>>> > > > > > > > >    with a RepeatedBigIntHolder. I'm trying to figure out the best way
>> > > >>>> > > > > > > > >    to create a new one. I have not figured out where in the existing
>> > > >>>> > > > > > > > >    drill code someone does this. If I use a RepeatedBigIntHolder as a
>> > > >>>> > > > > > > > >    Workspace object it is null to start with. I created a new one in
>> > > >>>> > > > > > > > >    the startup section of the udf, but the vector was null. I can find
>> > > >>>> > > > > > > > >    no reference for creating a new BigIntVector. There is a way to
>> > > >>>> > > > > > > > >    create a BigIntVector, and I did find an example of creating a new
>> > > >>>> > > > > > > > >    VarCharVector, but I can't do that using the drill jar files from
>> > > >>>> > > > > > > > >    1.0. The org.apache.drill.common.types.TypeProtos and the
>> > > >>>> > > > > > > > >    org.apache.drill.common.types.TypeProtos.MinorType classes do not
>> > > >>>> > > > > > > > >    appear to be accessible from the drill jar files.
>> > > >>>> > > > > > > > >
>> > > >>>> > > > > > > > > 2. What is the best way to close out a UDF in the event it generates
>> > > >>>> > > > > > > > >    an exception? Are there specific steps one should follow to make a
>> > > >>>> > > > > > > > >    clean exit in a catch block that are beneficial to Drill?
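On the second question above (cleaning up when a UDF hits an exception), the practical guidance in this thread is mehant's: don't swallow the error, let it surface so Drill can report it. Below is a rough, untested sketch of what that could look like in MyArraySum's add(); I've used DrillRuntimeException only because I'm not certain of the UserException builder signature in the 1.0 jars, and the start/end-based loop is my reading of how repeated holders are meant to be indexed.

@Override
public void add() {
  try {
    // Repeated holders expose a [start, end) window into the inner values vector.
    for (int i = listToSearch.start; i < listToSearch.end; i++) {
      listToSearch.vector.getAccessor().get(i, vc);
      String inputStr = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers
          .toStringFromUTF8(vc.start, vc.end, vc.buffer);
      sum.value = sum.value + Integer.parseInt(inputStr);
    }
  } catch (NumberFormatException e) {
    // Fail loudly with context instead of silently zeroing the value;
    // Drill propagates the message back to the client.
    throw new org.apache.drill.common.exceptions.DrillRuntimeException(
        "MyArraySum: array element is not an integer", e);
  }
  count.value = count.value + 1;
}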