Jim, I think that you may be having trouble with aggregators in general.
Have you been able to build *any* aggregator of anything? I haven't.

When I try to build an aggregator of ints or doubles, I get a very persistent problem with Drill not even seeing my aggregates:

    0: jdbc:drill:zk=local> select sum_int(employee_id) from cp.`employee.json`;
    Jul 04, 2015 4:19:35 PM org.apache.calcite.sql.validate.SqlValidatorException <init>
    SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match found for function signature sum_int(<ANY>)
    Jul 04, 2015 4:19:35 PM org.apache.calcite.runtime.CalciteException <init>
    SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 8 to line 1, column 27: No match found for function signature sum_int(<ANY>)
    Error: PARSE ERROR: From line 1, column 8 to line 1, column 27: No match found for function signature sum_int(<ANY>)
    [Error Id: 91b78fa6-6dd1-4214-a85f-c2bf2c393145 on 10.0.1.2:31010] (state=,code=0)

    0: jdbc:drill:zk=local> select sum_int(cast(employee_id as int)) from cp.`employee.json`;
    Jul 04, 2015 4:19:45 PM org.apache.calcite.sql.validate.SqlValidatorException <init>
    SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match found for function signature sum_int(<NUMERIC>)
    Jul 04, 2015 4:19:45 PM org.apache.calcite.runtime.CalciteException <init>
    SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 8 to line 1, column 40: No match found for function signature sum_int(<NUMERIC>)
    Error: PARSE ERROR: From line 1, column 8 to line 1, column 40: No match found for function signature sum_int(<NUMERIC>)
    [Error Id: f649fc85-6b6a-4468-9a4f-bfef0b23d06b on 10.0.1.2:31010] (state=,code=0)
    0: jdbc:drill:zk=local>

It looks like there is some undocumented subtlety about how to register an aggregator.

On Sat, Jul 4, 2015 at 4:08 PM, Jim Bates <[email protected]> wrote:

> I'm working on the same thing. I want to aggregate a list of values.
> It has been a search and guess game for the most part. I'm still stuck in the process of getting the values all into a list. The writers look interesting, but for aggregation functions it looks like the input is the param and output objects can't hold the aggregation steps. The Workspace is where that happens. If I try to use a Writer in a workspace it won't load and tells me to change it to Holders, which was why I was using them to start with. Maybe I'm missing the architecture of the agg function. It looked like it was....
>
> @Param comes in -> initialize @Workspace vars in setup -> process data through @Workspace vars in add -> finalize @Output in output.
>
> So I'm back to trying to figure out how to create a RepeatedBigIntHolder or a RepeatedVarCharHolder...
>
> On Sat, Jul 4, 2015 at 4:53 PM, Ted Dunning <[email protected]> wrote:
>
> > I am working on trying to build any kind of list-constructing aggregator and having absolute fits.
> >
> > To simplify life, I decided to just build a generic list builder that is a scalar function that returns a list containing its argument. Thus zoop(3) => [3], zoop('abc') => ['abc'] and zoop([1,2,3]) => [[1,2,3]].
> >
> > The ComplexWriter looks like the place to go. As usual, the complete lack of comments in most of Drill makes this very hard since I have to guess what works and what doesn't.
> >
> > In my code, I note that ComplexWriter has a nice rootAsList() method. I used this in zip and it works nicely to construct lists for output. I note that the resulting ListWriter has a method copyReader(FieldReader var1) which looks really good.
> >
> > Unfortunately, the only implementation of copyReader() is in AbstractFieldWriter and it looks like this:
> >
> >     public void copyReader(FieldReader reader) {
> >         this.fail("Copy FieldReader");
> >     }
> >
> > I would like to formally say at this point "WTF"?
> >
> > In digging in further, I see other methods that look handy like
> >
> >     public void write(IntHolder holder) {
> >         this.fail("Int");
> >     }
> >
> > And then in looking at implementations, it looks like there is a combinatorial explosion because every type seems to need a write method for every other type.
> >
> > What is the thought here? How can I copy an arbitrary value into a list?
> >
> > My next thought was to build code that dispatches on type. There is a method called getType() on the FieldReader. Unfortunately, that drives into code generated by protoc and I see no way to dispatch on the type of an incoming value.
> >
> > How is this supposed to work?
> >
> > On Sat, Jul 4, 2015 at 2:14 PM, mehant baid <[email protected]> wrote:
> >
> > > For a detailed example on using ComplexWriter interface you can take a look at the Mappify
> > > <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Mappify.java>
> > > (kvgen) function. The function itself is very simple however it makes use of the utility methods in MappifyUtility
> > > <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/MappifyUtility.java>
> > > and MapUtility
> > > <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapUtility.java>
> > > which perform most of the work.
> > >
> > > Currently we don't have a generic infrastructure to handle errors coming out of functions. However there is UserException, which when raised will make sure that Drill does not gobble up the error message in that exception. So you can probably throw a UserException with the failing input in your function to make sure it propagates to the user.
> > >
> > > Thanks
> > > Mehant
> > >
> > > On Sat, Jul 4, 2015 at 1:48 PM, Jacques Nadeau <[email protected]> wrote:
> > >
> > > > *Holders are for both input and output. You can also use ComplexWriter for output and FieldReader for input if you want to write or read a complex value.
> > > >
> > > > I don't think we've provided a really clean way to construct a Repeated*Holder for output purposes. You can probably do it by reaching into a bunch of internal interfaces in Drill. However, I would recommend using the ComplexWriter output pattern for now. This will be a little less efficient but substantially less brittle. I suggest you open up a JIRA for using a Repeated*Holder as an output.
> > > >
> > > > On Sat, Jul 4, 2015 at 1:38 PM, Ted Dunning <[email protected]> wrote:
> > > >
> > > > > Holders are for input, I think.
> > > > >
> > > > > Try the different kinds of writers.
> > > > >
> > > > > On Sat, Jul 4, 2015 at 12:49 PM, Jim Bates <[email protected]> wrote:
> > > > >
> > > > > > I've got a RepeatedHolder working as a @Param. I was working on a custom aggregator function using DrillAggFunc. In this I can do simple things, but if I want to build a list of values and do something with it in the final output method, I think I need to use RepeatedHolders in the @Workspace. To do that I need to create a new one in the setup method. I can't get one built. They all require a BufferAllocator to be passed in to build it. I have not found a way to get an allocator yet. Any suggestions?
> > > > > >
> > > > > > On Sat, Jul 4, 2015 at 1:37 PM, Ted Dunning <[email protected]> wrote:
> > > > > >
> > > > > > > If you look at the zip function in https://github.com/mapr-demos/simple-drill-functions you can find an example of building a structure.
> > > > > > >
> > > > > > > The basic idea is that your output is denoted as
> > > > > > >
> > > > > > >     @Output
> > > > > > >     BaseWriter.ComplexWriter writer;
> > > > > > >
> > > > > > > The pattern for building a list of lists of integers is like this:
> > > > > > >
> > > > > > >     writer.setValueCount(n);
> > > > > > >     ...
> > > > > > >     BaseWriter.ListWriter outer = writer.rootAsList();
> > > > > > >     outer.start(); // [ outer list
> > > > > > >     ...
> > > > > > >     // for each inner list {
> > > > > > >         BaseWriter.ListWriter inner = outer.list();
> > > > > > >         inner.start(); // [ inner list
> > > > > > >         // for each inner list element {
> > > > > > >             inner.integer().writeInt(accessor.get(i));
> > > > > > >         }
> > > > > > >         inner.end(); // ] inner list
> > > > > > >     }
> > > > > > >     outer.end(); // ] outer list
> > > > > > >
> > > > > > > On Sat, Jul 4, 2015 at 10:29 AM, Jim Bates <[email protected]> wrote:
> > > > > > >
> > > > > > > > I have working aggregation and simple UDFs. I've been trying to document and understand each of the options available in a Drill UDF: the different FunctionScopes, the ones that are allowed and the ones that are not; the impact of different cost categories; and the different steps needed to handle any of the supported data types and structures in Drill.
> > > > > > > >
> > > > > > > > Here are a few of my current road blocks. Any pointers would be greatly appreciated.
> > > > > > > >
> > > > > > > > 1. I've been trying to understand how to correctly use RepeatedHolders of whatever type. For this discussion let's start with a RepeatedBigIntHolder. I'm trying to figure out the best way to create a new one. I have not figured out where in the existing Drill code someone does this. If I use a RepeatedBigIntHolder as a Workspace object it is null to start with. I created a new one in the startup section of the UDF but the vector was null. I can find no reference on creating a new BigIntVector. There is a way to create a BigIntVector and I did find an example of creating a new VarCharVector, but I can't do that using the Drill jar files from 1.0. The org.apache.drill.common.types.TypeProtos and the org.apache.drill.common.types.TypeProtos.MinorType classes do not appear to be accessible from the Drill jar files.
> > > > > > > > 2. What is the best way to close out a UDF in the event it generates an exception? Are there specific steps one should follow to make a clean exit in a catch block that are beneficial to Drill?
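
For what it's worth, the lifecycle Jim describes earlier in the thread (@Param comes in, @Workspace is initialized in setup, data is processed in add, @Output is finalized in output) can be modelled in plain Java without touching any Drill classes. The sketch below is only that model. The class name ListAggSketch and the aggregate() driver are hypothetical, an ordinary ArrayList stands in for whatever Repeated*Holder or writer Drill would actually require, and none of this is Drill's API or a loadable UDF.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java model of the aggregate lifecycle discussed in this thread.
// Hypothetical names throughout; this is NOT a Drill UDF and uses no Drill API.
final class ListAggSketch {
    // Stands in for the @Workspace: state that survives across add() calls.
    private List<Long> workspace;

    // setup(): initialize the workspace once, before any rows arrive.
    void setup() {
        workspace = new ArrayList<>();
    }

    // add(): called once per input row; the argument plays the role of @Param.
    void add(long param) {
        workspace.add(param);
    }

    // output(): called once per group; the return value plays the role of @Output.
    // A copy is returned so a later reset() cannot change what was emitted.
    List<Long> output() {
        return Collections.unmodifiableList(new ArrayList<>(workspace));
    }

    // reset(): clear the workspace so the next group starts empty.
    void reset() {
        workspace.clear();
    }

    // Convenience driver: runs the phases in order over one group of values.
    static List<Long> aggregate(long... values) {
        ListAggSketch agg = new ListAggSketch();
        agg.setup();
        for (long v : values) {
            agg.add(v);
        }
        List<Long> result = agg.output();
        agg.reset();
        return result;
    }
}
```

Calling ListAggSketch.aggregate(1, 2, 3) walks the phases in order and hands back the collected values. The hard part raised in this thread, getting an equivalent growable structure into a real @Workspace, is exactly what this model sidesteps.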
