Jim,

I think that you may be having trouble with aggregators in general.

Have you been able to build *any* aggregator of anything?  I haven't.

When I try to build an aggregator of ints or doubles, I run into a very
persistent problem: Drill doesn't even see my aggregates.

0: jdbc:drill:zk=local> *select sum_int(employee_id) from
cp.`employee.json`;*

Jul 04, 2015 4:19:35 PM
org.apache.calcite.sql.validate.SqlValidatorException <init>

SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match
found for function signature sum_int(<ANY>)

Jul 04, 2015 4:19:35 PM org.apache.calcite.runtime.CalciteException <init>

SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
column 8 to line 1, column 27: No match found for function signature
sum_int(<ANY>)

*Error: PARSE ERROR: From line 1, column 8 to line 1, column 27: No match
found for function signature sum_int(<ANY>)*

*[Error Id: 91b78fa6-6dd1-4214-a85f-c2bf2c393145 on 10.0.1.2:31010]
(state=,code=0)*

0: jdbc:drill:zk=local> *select sum_int(cast(employee_id as int)) from
cp.`employee.json`*;

Jul 04, 2015 4:19:45 PM
org.apache.calcite.sql.validate.SqlValidatorException <init>

SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match
found for function signature sum_int(<NUMERIC>)

Jul 04, 2015 4:19:45 PM org.apache.calcite.runtime.CalciteException <init>

SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
column 8 to line 1, column 40: No match found for function signature
sum_int(<NUMERIC>)

*Error: PARSE ERROR: From line 1, column 8 to line 1, column 40: No match
found for function signature sum_int(<NUMERIC>)*

*[Error Id: f649fc85-6b6a-4468-9a4f-bfef0b23d06b on 10.0.1.2:31010]
(state=,code=0)*

0: jdbc:drill:zk=local>


It looks like there is some undocumented subtlety about how to register an
aggregator.
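
For reference, the lifecycle Jim describes in the quoted message (@Param comes in, @Workspace is initialized in setup, values are folded in by add, @Output is finalized in output) can be modeled in plain Java without any Drill classes. This is only a sketch of the control flow. AggModel, CollectLongs and aggregate are hypothetical names, not Drill APIs, and the reset step is included on the assumption that workspace state has to be cleared between groups. It is not something that would load into Drill, since the thread reports that Drill insists on Holders in the workspace.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the aggregate contract; this is NOT Drill's DrillAggFunc. */
interface AggModel {
    void setup();   // initialize workspace state once
    void add();     // fold the current input value into the workspace
    void output();  // publish the workspace into the output
    void reset();   // assumption: clear the workspace before the next group
}

/** Collects long values into a list, mimicking a list-building aggregate. */
class CollectLongs implements AggModel {
    long in;               // plays the role of the @Param holder
    List<Long> workspace;  // plays the role of the @Workspace state
    List<Long> out;        // plays the role of the @Output

    public void setup()  { workspace = new ArrayList<>(); }
    public void add()    { workspace.add(in); }
    public void output() { out = new ArrayList<>(workspace); } // copy, so reset() cannot clear it
    public void reset()  { workspace.clear(); }

    /** Drives the lifecycle over one group of values, the way an engine might. */
    static List<Long> aggregate(long[] group) {
        CollectLongs agg = new CollectLongs();
        agg.setup();
        for (long v : group) {
            agg.in = v;   // the engine loads the next input value
            agg.add();
        }
        agg.output();
        agg.reset();
        return agg.out;
    }
}
```

Calling CollectLongs.aggregate(new long[] {3L, 1L, 2L}) walks the four phases once and returns the collected values in input order. The open question in this thread is what Drill allows in place of the ArrayList.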

On Sat, Jul 4, 2015 at 4:08 PM, Jim Bates <[email protected]> wrote:

> I'm working on the same thing. I want to aggregate a list of values. It has
> been a search-and-guess game for the most part. I'm still stuck on getting
> the values all into a list. The writers look interesting, but for
> aggregation functions it looks like the input is the @Param, and the output
> objects can't hold the intermediate aggregation steps. The Workspace is
> where that happens. If I try to use a Writer in a workspace it won't load
> and tells me to change it to Holders, which is why I was using them to start
> with. Maybe I'm missing the architecture of the agg function. It looked
> like it was....
>
> @Param comes in -> initialize @Workspace vars in setup -> process data
> through @Workspace vars in add -> finalize @Output in output.
>
> So I'm back to trying to figure out how to create a RepeatedBigIntHolder or
> a RepeatedVarCharHolder...
>
>
>
> On Sat, Jul 4, 2015 at 4:53 PM, Ted Dunning <[email protected]> wrote:
>
> > I am working on trying to build any kind of list constructing aggregator
> > and having absolute fits.
> >
> > To simplify life, I decided to just build a generic list builder that is a
> > scalar function that returns a list containing its argument.  Thus zoop(3)
> > => [3], zoop('abc') => ['abc'] and zoop([1,2,3]) => [[1,2,3]].
> >
> > The ComplexWriter looks like the place to go. As usual, the complete lack
> > of comments in most of Drill makes this very hard since I have to guess
> > what works and what doesn't.
> >
> > In my code, I note that ComplexWriter has a nice rootAsList() method.  I
> > used this in zip and it works nicely to construct lists for output.  I
> note
> > that the resulting ListWriter has a method copyReader(FieldReader var1)
> > which looks really good.
> >
> > Unfortunately, the only implementation of copyReader() is in
> > AbstractFieldWriter and it looks like this:
> >
> > public void copyReader(FieldReader reader) {
> >     this.fail("Copy FieldReader");
> > }
> >
> > I would like to formally say at this point "WTF"?
> >
> > In digging in further, I see other methods that look handy like
> >
> > public void write(IntHolder holder) {
> >     this.fail("Int");
> > }
> >
> > And then in looking at implementations, it looks like there is a
> > combinatorial explosion because every type seems to need a write method
> for
> > every other type.
> >
> > What is the thought here?  How can I copy an arbitrary value into a list?
> >
> > My next thought was to build code that dispatches on type.  There is a
> > method called getType() on the FieldReader.  Unfortunately, that drives
> > into code generated by protoc and I see no way to dispatch on the type of
> > an incoming value.
> >
> >
> > How is this supposed to work?
> >
> >
> >
> >
> > On Sat, Jul 4, 2015 at 2:14 PM, mehant baid <[email protected]>
> wrote:
> >
> > > For a detailed example on using ComplexWriter interface you can take a
> > look
> > > at the Mappify
> > > <
> > >
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Mappify.java
> > > >
> > > (kvgen) function. The function itself is very simple however it makes
> use
> > > of the utility methods in MappifyUtility
> > > <
> > >
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/MappifyUtility.java
> > > >
> > > and MapUtility
> > > <
> > >
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapUtility.java
> > > >
> > > which perform most of the work.
> > >
> > > Currently we don't have a generic infrastructure to handle errors
> coming
> > > out of functions. However there is UserException, which when raised
> will
> > > make sure that Drill does not gobble up the error message in that
> > > exception. So you can probably throw a UserException with the failing
> > input
> > > in your function to make sure it propagates to the user.
> > >
> > > Thanks
> > > Mehant
> > >
> > > On Sat, Jul 4, 2015 at 1:48 PM, Jacques Nadeau <[email protected]>
> > wrote:
> > >
> > > > *Holders are for both input and output.  You can also use ComplexWriter
> > > > for output and FieldReader for input if you want to write or read a
> > > > complex value.
> > > >
> > > > I don't think we've provided a really clean way to construct a
> > > > Repeated*Holder for output purposes.  You can probably do it by
> > reaching
> > > > into a bunch of internal interfaces in Drill.  However, I would
> > recommend
> > > > using the ComplexWriter output pattern for now.  This will be a
> little
> > > less
> > > > efficient but substantially less brittle.  I suggest you open up a
> jira
> > > for
> > > > using a Repeated*Holder as an output.
> > > >
> > > > On Sat, Jul 4, 2015 at 1:38 PM, Ted Dunning <[email protected]>
> > > wrote:
> > > >
> > > > > Holders are for input, I think.
> > > > >
> > > > > Try the different kinds of writers.
> > > > >
> > > > >
> > > > >
> > > > > On Sat, Jul 4, 2015 at 12:49 PM, Jim Bates <[email protected]>
> > > wrote:
> > > > >
> > > > > > > I've got a RepeatedHolder working as a @Param. I was working on a
> > > > > > > custom aggregator function using DrillAggFunc. In this I can do
> > > > > > > simple things, but if I want to build a list of values and do
> > > > > > > something with it in the final output method, I think I need to use
> > > > > > > RepeatedHolders in the @Workspace. To do that I need to create a new
> > > > > > > one in the setup method. I can't get one built. They all require a
> > > > > > > BufferAllocator to be passed in to build them. I have not found a
> > > > > > > way to get an allocator yet. Any suggestions?
> > > > > >
> > > > > > On Sat, Jul 4, 2015 at 1:37 PM, Ted Dunning <
> [email protected]
> > >
> > > > > wrote:
> > > > > >
> > > > > > > > If you look at the zip function in
> > > > > > > > https://github.com/mapr-demos/simple-drill-functions you can find
> > > > > > > > an example of building a structure.
> > > > > > >
> > > > > > > The basic idea is that your output is denoted as
> > > > > > >
> > > > > > >         @Output
> > > > > > >         BaseWriter.ComplexWriter writer;
> > > > > > >
> > > > > > > The pattern for building a list of lists of integers is like
> > this:
> > > > > > >
> > > > > > > >         writer.setValueCount(n);
> > > > > > > >         ...
> > > > > > > >         BaseWriter.ListWriter outer = writer.rootAsList();
> > > > > > > >         outer.start(); // [ outer list
> > > > > > > >         ...
> > > > > > > >         for (...) { // each inner list
> > > > > > > >             BaseWriter.ListWriter inner = outer.list();
> > > > > > > >             inner.start(); // [ inner list
> > > > > > > >             for (...) { // each inner list element
> > > > > > > >                 inner.integer().writeInt(accessor.get(i));
> > > > > > > >             }
> > > > > > > >             inner.end();   // ] inner list
> > > > > > > >         }
> > > > > > > >         outer.end(); // ] outer list
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Sat, Jul 4, 2015 at 10:29 AM, Jim Bates <
> [email protected]>
> > > > > wrote:
> > > > > > >
> > > > > > > > > I have working aggregation and simple UDFs. I've been trying to
> > > > > > > > > document and understand each of the options available in a Drill
> > > > > > > > > UDF: the different FunctionScopes, which ones are allowed and which
> > > > > > > > > are not; the impact of the different cost categories; and the steps
> > > > > > > > > needed to handle any of the supported data types and structures in
> > > > > > > > > Drill.
> > > > > > > >
> > > > > > > > > Here are a few of my current roadblocks. Any pointers would be
> > > > > > > > > greatly appreciated.
> > > > > > > >
> > > > > > > >
> > > > > > > > >    1. I've been trying to understand how to correctly use
> > > > > > > > >    RepeatedHolders of whatever type. For this discussion let's start
> > > > > > > > >    with a RepeatedBigIntHolder. I'm trying to figure out the best way
> > > > > > > > >    to create a new one. I have not figured out where in the existing
> > > > > > > > >    Drill code someone does this. If I use a RepeatedBigIntHolder as a
> > > > > > > > >    Workspace object it is null to start with. I created a new one in
> > > > > > > > >    the startup section of the UDF but the vector was null. I can find
> > > > > > > > >    no reference on creating a new BigIntVector. There is a way to
> > > > > > > > >    create a BigIntVector and I did find an example of creating a new
> > > > > > > > >    VarCharVector, but I can't do that using the Drill jar files from
> > > > > > > > >    1.0. The org.apache.drill.common.types.TypeProtos and the
> > > > > > > > >    org.apache.drill.common.types.TypeProtos.MinorType classes do not
> > > > > > > > >    appear to be accessible from the Drill jar files.
> > > > > > > > >    2. What is the best way to close out a UDF in the event it
> > > > > > > > >    generates an exception? Are there specific steps one should follow
> > > > > > > > >    to make a clean exit in a catch block that are beneficial to
> > > > > > > > >    Drill?
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
