I am trying to build any kind of list-constructing aggregator and am
having absolute fits.
To simplify life, I decided to just build a generic list builder that is a
scalar function that returns a list containing its argument. Thus zoop(3)
=> [3], zoop('abc') => ['abc'] and zoop([1,2,3]) => [[1,2,3]].
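It helps to pin down the intended semantics before touching Drill's writers. Below is a minimal plain-Java model of zoop. The class name Zoop is hypothetical and no Drill types are involved. Every argument is wrapped in a one-element list, and that includes an argument that is already a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the intended zoop() semantics: wrap any single
// argument in a one-element list, even if the argument is itself a list.
// Not Drill UDF code; it only pins down the expected behavior.
final class Zoop {
    private Zoop() {}

    static <T> List<T> zoop(T value) {
        List<T> out = new ArrayList<>(1);
        out.add(value);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(zoop(3));                        // [3]
        System.out.println(zoop("abc"));                    // [abc]
        System.out.println(zoop(Arrays.asList(1, 2, 3)));   // [[1, 2, 3]]
    }
}
```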
The ComplexWriter looks like the place to go. As usual, the complete lack
of comments in most of Drill makes this very hard since I have to guess
what works and what doesn't.
In my code, I note that ComplexWriter has a nice rootAsList() method. I
used this in zip and it works nicely to construct lists for output. I note
that the resulting ListWriter has a method copyReader(FieldReader var1)
which looks really good.
Unfortunately, the only implementation of copyReader() is in
AbstractFieldWriter and it looks like this:
public void copyReader(FieldReader reader) {
    this.fail("Copy FieldReader");
}
I would like to formally say at this point "WTF"?
Digging in further, I see other methods that look handy, like
public void write(IntHolder holder) {
    this.fail("Int");
}
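Here is one plausible reading of these stubs. It is sketched with hypothetical class names, not Drill's generated code. The base class gives every write method a default that fails, and each concrete writer overrides only the method for the type it stores. If that reading is right, copyReader() failing in AbstractFieldWriter only means that the writer in hand never overrode it.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the pattern seen in AbstractFieldWriter: the base class
// supplies a failing default for every write method, and each concrete
// writer overrides only the method for the type it can store.
// All names here are hypothetical, not Drill's API.
abstract class StubWriter {
    protected void fail(String name) {
        throw new UnsupportedOperationException(
            getClass().getSimpleName() + " does not support " + name);
    }

    void writeInt(int value)        { fail("Int"); }
    void writeVarChar(String value) { fail("VarChar"); }
    void copyReader(Object reader)  { fail("Copy FieldReader"); }
}

// A writer for an INT column overrides only writeInt.
final class IntColumnWriter extends StubWriter {
    final List<Integer> values = new ArrayList<>();

    @Override
    void writeInt(int value) { values.add(value); }
}

final class StubWriterDemo {
    public static void main(String[] args) {
        IntColumnWriter w = new IntColumnWriter();
        w.writeInt(42);                          // accepted
        try {
            w.writeVarChar("abc");               // hits the failing default
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());  // IntColumnWriter does not support VarChar
        }
    }
}
```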
Looking at the implementations, it seems there is a combinatorial
explosion, because every type appears to need a write method for every
other type.
What is the thought here? How can I copy an arbitrary value into a list?
My next thought was to build code that dispatches on type. There is a
method called getType() on the FieldReader. Unfortunately, that drives
into code generated by protoc and I see no way to dispatch on the type of
an incoming value.
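Dispatching on type may still be possible if the type can be reduced to an enum. protoc generates ordinary Java enums, so a switch works on them. The sketch below assumes a MinorType-style enum, which stands in for TypeProtos.MinorType and is presumably reachable through something like reader.getType().getMinorType(). It also assumes a hypothetical ValueReader interface. With this approach you need one case per supported type, not one write method per pair of types.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a protoc-generated enum such as TypeProtos.MinorType.
// protoc emits ordinary Java enums, so they can be used in a switch.
enum MinorType { INT, BIGINT, VARCHAR, LIST }

// Hypothetical minimal reader: reports its type and exposes its value.
interface ValueReader {
    MinorType minorType();
    Object value();
}

final class TypeDispatch {
    private TypeDispatch() {}

    // One case per supported type instead of an N x N matrix of
    // source/target write methods.
    static void copyInto(ValueReader reader, List<Object> listOut) {
        switch (reader.minorType()) {
            case INT:
                listOut.add((Integer) reader.value());
                break;
            case BIGINT:
                listOut.add((Long) reader.value());
                break;
            case VARCHAR:
                listOut.add((String) reader.value());
                break;
            case LIST:
                // copy the nested list so later changes to the source don't leak in
                listOut.add(new ArrayList<Object>((List<?>) reader.value()));
                break;
            default:
                throw new IllegalArgumentException("Unsupported type: " + reader.minorType());
        }
    }

    // Convenience factory for a reader over an already-known value.
    static ValueReader of(MinorType t, Object v) {
        return new ValueReader() {
            public MinorType minorType() { return t; }
            public Object value() { return v; }
        };
    }

    public static void main(String[] args) {
        List<Object> out = new ArrayList<>();
        copyInto(of(MinorType.INT, 3), out);
        copyInto(of(MinorType.VARCHAR, "abc"), out);
        System.out.println(out); // [3, abc]
    }
}
```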
How is this supposed to work?
On Sat, Jul 4, 2015 at 2:14 PM, mehant baid wrote:
> For a detailed example of using the ComplexWriter interface, you can take
> a look at the Mappify (kvgen) function
> <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Mappify.java>.
> The function itself is very simple; however, it makes use of the utility
> methods in MappifyUtility
> <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/MappifyUtility.java>
> and MapUtility
> <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapUtility.java>,
> which perform most of the work.
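To get oriented, the shape of kvgen's result can be modeled in plain Java. KvgenModel is a hypothetical class, and the real function writes through ComplexWriter instead. Each entry of the input map becomes one element of the output list, and that element holds a key field and a value field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the shape kvgen produces: each entry of the input
// map becomes one list element, itself a map with "key" and "value".
// Models the result only; the real function writes via ComplexWriter.
final class KvgenModel {
    private KvgenModel() {}

    static List<Map<String, Object>> kvgen(Map<String, ?> input) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map.Entry<String, ?> e : input.entrySet()) {
            Map<String, Object> pair = new LinkedHashMap<>();
            pair.put("key", e.getKey());
            pair.put("value", e.getValue());
            out.add(pair);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> in = new LinkedHashMap<>();
        in.put("a", 1);
        in.put("b", 2);
        System.out.println(kvgen(in)); // [{key=a, value=1}, {key=b, value=2}]
    }
}
```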
>
> Currently we don't have a generic infrastructure to handle errors coming
> out of functions. However, there is UserException, which when raised will
> make sure that Drill does not gobble up the error message in that
> exception. So you can probably throw a UserException with the failing input
> in your function to make sure it propagates to the user.
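Here is a minimal sketch of that suggestion. It uses a hypothetical UdfInputException as a stand-in for UserException, because this thread does not show how UserException is actually constructed. The idea is to catch the low-level failure and rethrow it with the offending input in the message, keeping the original exception as the cause.

```java
// Hypothetical stand-in for Drill's UserException: an unchecked exception
// whose message carries the offending input so it reaches the user intact.
// Drill's real UserException is built through its own API, not shown here.
final class UdfInputException extends RuntimeException {
    UdfInputException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class ParseUdf {
    private ParseUdf() {}

    // Body of a hypothetical eval(): on bad input, rethrow with the raw
    // input in the message instead of letting the error be swallowed.
    static long eval(String input) {
        try {
            return Long.parseLong(input.trim());
        } catch (NumberFormatException e) {
            throw new UdfInputException(
                "parse_udf: cannot convert input '" + input + "' to BIGINT", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval(" 42 "));        // 42
        try {
            eval("forty-two");
        } catch (UdfInputException e) {
            System.out.println(e.getMessage());  // parse_udf: cannot convert input 'forty-two' to BIGINT
        }
    }
}
```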
>
> Thanks
> Mehant
>
> On Sat, Jul 4, 2015 at 1:48 PM, Jacques Nadeau wrote:
>
> > *Holders are for both input and output. You can also use ComplexWriter
> > for output and FieldReader for input if you want to write or read a
> > complex value.
> >
> > I don't think we've provided a really clean way to construct a
> > Repeated*Holder for output purposes. You can probably do it by reaching
> > into a bunch of internal interfaces in Drill. However, I would recommend
> > using the ComplexWriter output pattern for now. This will be a little
> > less efficient but substantially less brittle. I suggest you open a JIRA
> > for using a Repeated*Holder as an output.
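It helps to model what a Repeated*Holder actually holds, because that shows why one is awkward to create for output. The generated holders appear to expose start, end and vector fields. If that is right, a holder is a window onto a shared vector and not a container in its own right. The stand-in classes below are not Drill types. They read one row's list back through its [start, end) range. A holder created with new therefore has no vector behind it until something allocates one.

```java
// Plain-Java model of how a Repeated*Holder describes one row's list:
// it does not own the values, it records a [start, end) range into a
// backing vector. Field names mirror the holder's start/end/vector; the
// classes themselves are stand-ins, not Drill types.
final class LongVectorModel {
    final long[] data;
    LongVectorModel(long[] data) { this.data = data; }
    long get(int index) { return data[index]; }
}

final class RepeatedLongHolderModel {
    int start;               // first element of this row's list (inclusive)
    int end;                 // one past the last element (exclusive)
    LongVectorModel vector;  // shared storage; null until something allocates it

    int size() { return end - start; }

    long sum() {
        long total = 0;
        for (int i = start; i < end; i++) {
            total += vector.get(i);
        }
        return total;
    }
}

final class RepeatedHolderDemo {
    public static void main(String[] args) {
        // Three rows packed into one vector: [1, 2], [3], [4, 5, 6]
        LongVectorModel v = new LongVectorModel(new long[] {1, 2, 3, 4, 5, 6});
        RepeatedLongHolderModel row3 = new RepeatedLongHolderModel();
        row3.vector = v;
        row3.start = 3;
        row3.end = 6;
        System.out.println(row3.size() + " " + row3.sum()); // 3 15
    }
}
```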
> >
> > On Sat, Jul 4, 2015 at 1:38 PM, Ted Dunning wrote:
> >
> > > Holders are for input, I think.
> > >
> > > Try the different kinds of writers.
> > >
> > > On Sat, Jul 4, 2015 at 12:49 PM, Jim Bates wrote:
> > >
> > > > I've got a Repeated*Holder working as a @Param. I was working on a
> > > > custom aggregator function using DrillAggFunc. In it I can do simple
> > > > things, but if I want to build a list of values and do something with
> > > > it in the final output method, I think I need to use RepeatedHolders
> > > > in the @Workspace. To do that, I need to create a new one in the setup
> > > > method, and I can't get one built. They all require a BufferAllocator
> > > > to be passed in to build them, and I have not found a way to get an
> > > > allocator yet. Any suggestions?
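The lifecycle described here can be illustrated in plain Java with a list-collecting aggregate. It is written against a hypothetical interface that mirrors DrillAggFunc's method names (setup, add, output, reset). Its state lives in an ordinary ArrayList. This thread does not settle whether Drill accepts such a field in @Workspace, so treat the sketch as a model of the control flow only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface mirroring the DrillAggFunc lifecycle method
// names. Not Drill's interface: real aggregates receive inputs and
// outputs through annotated holder fields.
interface AggLifecycle<I, O> {
    void setup();
    void add(I input);
    O output();
    void reset();
}

// Collects every input of a group into a list and emits it at output time.
final class CollectListAgg implements AggLifecycle<Long, List<Long>> {
    private List<Long> workspace;  // plays the role of the @Workspace state

    @Override public void setup()        { workspace = new ArrayList<>(); }
    @Override public void add(Long v)    { workspace.add(v); }
    @Override public List<Long> output() { return new ArrayList<>(workspace); }
    @Override public void reset()        { workspace.clear(); }

    public static void main(String[] args) {
        CollectListAgg agg = new CollectListAgg();
        agg.setup();
        agg.add(1L);
        agg.add(2L);
        System.out.println(agg.output()); // [1, 2]
        agg.reset();                      // next group starts empty
        agg.add(3L);
        System.out.println(agg.output()); // [3]
    }
}
```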
> > > >
> > > > On Sat, Jul 4, 2015 at 1:37 PM, Ted Dunning wrote:
> > > >
> > > > > If you look at the zip function in
> > > > > https://github.com/mapr-demos/simple-drill-functions you can find an
> > > > > example of building a structure.
> > > > >
> > > > > The basic idea is that your output is denoted as
> > > > >
> > > > > @Output
> > > > > BaseWriter.ComplexWriter writer;
> > > > >
> > > > > The pattern for building a list of lists of integers is like this:
> > > > >
> > > > > writer.setValueCount(n);
> > > > > ...
> > > > > BaseWriter.ListWriter outer = writer.rootAsList();
> > > > > outer.start();                  // [ outer list
> > > > > for (...) {                     // for each inner list
> > > > >   BaseWriter.ListWriter inner = outer.list();
> > > > >   inner.start();                // [ inner list
> > > > >   for (...) {                   // for each inner list element
> > > > >     inner.integer().writeInt(accessor.get(i));
> > > > >   }
> > > > >   inner.end();                  // ] inner list
> > > > > }
> > > > > outer.end();                    // ] outer list
> > > > >
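The ListWriter pattern above can be checked for bracket discipline with a small stand-in. It makes the same calls (start, list, integer().writeInt, end) but builds plain Java lists. ListWriterModel is hypothetical and models only the call shape, not Drill's writer.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in that mimics the call shape of the ListWriter pattern
// (start, list, integer().writeInt, end) but builds plain Java lists.
// A model for checking bracket discipline, not Drill's writer.
final class ListWriterModel {
    final List<Object> items = new ArrayList<>();
    private boolean open;

    void start() { open = true; }
    void end()   { open = false; }

    // Opens a nested list inside this one; the child's items are linked in.
    ListWriterModel list() {
        requireOpen();
        ListWriterModel child = new ListWriterModel();
        items.add(child.items);
        return child;
    }

    // Returns a writer that appends ints to this list.
    IntWriterModel integer() {
        requireOpen();
        return v -> items.add(v);
    }

    private void requireOpen() {
        if (!open) throw new IllegalStateException("write outside start()/end()");
    }

    interface IntWriterModel { void writeInt(int v); }

    // Writes int[][] as a list of lists, following the pattern in the mail.
    static List<Object> writeListOfLists(int[][] rows) {
        ListWriterModel outer = new ListWriterModel();
        outer.start();                       // [ outer list
        for (int[] row : rows) {             // for each inner list
            ListWriterModel inner = outer.list();
            inner.start();                   // [ inner list
            for (int value : row) {          // for each inner list element
                inner.integer().writeInt(value);
            }
            inner.end();                     // ] inner list
        }
        outer.end();                         // ] outer list
        return outer.items;
    }

    public static void main(String[] args) {
        System.out.println(writeListOfLists(new int[][] {{1, 2}, {3}})); // [[1, 2], [3]]
    }
}
```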
> > > > > On Sat, Jul 4, 2015 at 10:29 AM, Jim Bates wrote:
> > > > >
> > > > > > I have working aggregation and simple UDFs. I've been trying to
> > > > > > document and understand each of the options available in a Drill
> > > > > > UDF: the different FunctionScopes, which ones are allowed and
> > > > > > which are not, the impact of the different cost categories, and
> > > > > > the steps needed to handle each of the supported data types and
> > > > > > structures in Drill.
> > > > > >
> > > > > > Here are a few of my current roadblocks. Any pointers would be
> > > > > > greatly appreciated.
> > > > > >
> > > > > >
> > > > > > 1. I've been trying to understand how to correctly use
> > > > > >    RepeatedHolders of whatever type. For this discussion, let's
> > > > > >    start with a RepeatedBigIntHolder. I'm trying to figure out the
> > > > > >    best way to create a new one, and I have not found where in the
> > > > > >    existing Drill code someone does this. If I use a
> > > > > >    RepeatedBigIntHolder as a Workspace object, it is null to start
> > > > > >    with. I created a new one in the startup section of the UDF, but
> > > > > >    the vector was null. I can find no reference to creating a new
> > > > > >    BigIntVector. There is a way to create a BigIntVector, and I did
> > > > > >    find an example of creating a new VarCharVector, but I can't do
> > > > > >    that using the Drill jar files from 1.0. The
> > > > > >    org.apache.drill.common.types.TypeProtos and
> > > > > >    org.apache.drill.common.types.TypeProtos.MinorType classes do
> > > > > >    not appear to be accessible from the Drill jar files.
> > > > > > 2. What is the best way to close out a UDF in the event it
> > > > > >    generates an exception? Are there specific steps one should
> > > > > >    follow in a catch block to make a clean exit that is beneficial
> > > > > >    to Drill?