Well... after much guesswork I got it to work... at least as far as intellectual curiosity goes; real-world use is a different story. Thanks to those who attempted to assist.
Query:

SELECT MyList(test_field1) FROM (SELECT test_field1 FROM `hive.default`.`my_hive_table` limit 10);

+----------------------------------------------------------+
|                          EXPR$0                           |
+----------------------------------------------------------+
|  [18108,19719,15559,14152,18577,17170,13010,11603,16028]  |
+----------------------------------------------------------+

Function:

@FunctionTemplate(name = "MyList", scope = FunctionTemplate.FunctionScope.POINT_AGGREGATE,
    nulls = FunctionTemplate.NullHandling.INTERNAL, isRandom = false,
    isBinaryCommutative = false, costCategory = FunctionTemplate.FunctionCostCategory.COMPLEX)
public static class MyList implements DrillAggFunc {

  @Param NullableBigIntHolder xValue;
  @Inject DrillBuf buffer;
  @Workspace IntHolder count;
  @Workspace RepeatedBigIntHolder xList;
  @Output RepeatedBigIntHolder out;

  @Override
  public void setup() {
    count = new IntHolder();
    count.value = 0;
    // Build a BigIntVector by hand and hang it on the workspace holder.
    org.apache.drill.exec.memory.BufferAllocator allocator =
        new org.apache.drill.exec.memory.TopLevelAllocator();
    xList = new RepeatedBigIntHolder();
    xList.vector = new org.apache.drill.exec.vector.BigIntVector(
        org.apache.drill.exec.record.MaterializedField.create(
            new org.apache.drill.common.expression.SchemaPath("bigints",
                org.apache.drill.common.expression.ExpressionPosition.UNKNOWN),
            org.apache.drill.common.types.Types.optional(
                org.apache.drill.common.types.TypeProtos.MinorType.BIGINT)),
        allocator);
    org.apache.drill.exec.vector.AllocationHelper.allocate(xList.vector, 100, 50);
    xList.vector.getMutator().generateTestData(100);
    xList.vector.getMutator().setValueCount(100);
  }

  @Override
  public void add() {
    if (xValue == null) {
      return;
    }
    // Grow the value count if the incoming value won't fit in the current range.
    int size = xList.end - xList.start;
    if (count.value + 1 > size) {
      xList.vector.getMutator().setValueCount(count.value + 1);
    }
    xList.vector.getMutator().setSafe(count.value, xValue.value);
    xList.end = count.value;
    count.value = count.value + 1;
  }

  @Override
  public void output() {
    out.vector = xList.vector;
    out.start = xList.start;
    out.end = xList.end;
  }

  @Override
  public void reset() {
  }
}

On Sun, Jul 5, 2015 at 2:43 PM, Jim Bates <jba...@maprtech.com> wrote:

> I agree. I've gone way too deep into drill to try and get this done. While
> it became clear to me that this is most likely not the way to do it... it
> has been a good learning experience around aggregation UDFs. I should be
> able to put a lot back into the docs on this as soon as I figure out how to
> get access to contribute to the docs. I'll file a JIRA on this, but in the
> short term I would still like to finish off what I started. I have my
> RepeatedBigIntHolders that now no longer blow up when I try and push data
> into them. Now I'm trying to see if I can get anything back out.
>
> FunFunFun.
>
> On Sun, Jul 5, 2015 at 1:50 PM, Jacques Nadeau <jacq...@apache.org> wrote:
>
>> It isn't obvious because you shouldn't do it. Please file a JIRA to add
>> real support for this type of output.
>>
>> Your current function would leak large amounts of memory that would
>> ultimately crash the node.
>>
>> Realistically, there are very few internal Drill APIs that you should
>> access via a UDF (injectables, holders, complexwriter, fieldreader and
>> helpers). A post-1.0 goal was to provide a UDF interface JAR to ensure
>> people don't accidentally reach into Drill's internals. (A later
>> possibility is bytecode weaving to completely protect against it.)
>>
>> J
>>
>> On Sun, Jul 5, 2015 at 11:36 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>
>> > That was impressively non-obvious.
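For anyone who wants the list behaviour without reaching into the vector internals (and without the leak Jacques describes just above), here is a rough sketch of the concatenated-string workaround I mention further down the thread. Untested: the ObjectHolder-backed StringBuilder workspace and the buffer handling are assumptions on my part, not something verified against this build.

@FunctionTemplate(name = "MyListCsv", scope = FunctionTemplate.FunctionScope.POINT_AGGREGATE,
    nulls = FunctionTemplate.NullHandling.INTERNAL)
public static class MyListCsv implements DrillAggFunc {

  @Param NullableBigIntHolder xValue;
  @Workspace ObjectHolder csv;   // assumption: ObjectHolder is accepted as a workspace type; it wraps a StringBuilder
  @Inject DrillBuf buffer;
  @Output NullableVarCharHolder out;

  @Override
  public void setup() {
    csv = new ObjectHolder();
    csv.obj = new StringBuilder();
  }

  @Override
  public void add() {
    if (xValue.isSet == 0) {
      return;                                  // skip NULL inputs
    }
    StringBuilder sb = (StringBuilder) csv.obj;
    if (sb.length() > 0) {
      sb.append(',');
    }
    sb.append(xValue.value);
  }

  @Override
  public void output() {
    byte[] bytes = csv.obj.toString().getBytes();
    buffer = buffer.reallocIfNeeded(bytes.length);   // grow the injected buffer rather than allocating our own
    buffer.setBytes(0, bytes);
    out.isSet = 1;
    out.buffer = buffer;
    out.start = 0;
    out.end = bytes.length;
  }

  @Override
  public void reset() {
    csv.obj = new StringBuilder();
  }
}

This trades the real repeated output for a comma-separated VARCHAR, which is only sensible for small groups, but it keeps all allocation on the buffer Drill injects instead of a hand-built allocator.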
>> >
>> >
>> > On Sat, Jul 4, 2015 at 6:40 PM, Jim Bates <jba...@maprtech.com> wrote:
>> >
>> > > I did get a new RepeatedBigIntHolder built, with a BigIntVector added
>> > > to it. I'll try it in the UDF tomorrow and see if there is a difference
>> > > in the ways I found to get a BufferAllocator.
>> > >
>> > >     .
>> > >     .
>> > >     .
>> > >     @Inject DrillBuf buffer;
>> > >     @Workspace RepeatedBigIntHolder yList;
>> > >     .
>> > >     .
>> > >     .
>> > >     @Override
>> > >     public void setup() {
>> > >       .
>> > >       .
>> > >       .
>> > >       // org.apache.drill.exec.memory.BufferAllocator allocator = buffer.getAllocator();
>> > >       org.apache.drill.exec.memory.BufferAllocator allocator =
>> > >           new org.apache.drill.exec.memory.TopLevelAllocator();
>> > >       yList = new RepeatedBigIntHolder();
>> > >       yList.vector = new org.apache.drill.exec.vector.BigIntVector(
>> > >           org.apache.drill.exec.record.MaterializedField.create(
>> > >               new org.apache.drill.common.expression.SchemaPath("bigints",
>> > >                   org.apache.drill.common.expression.ExpressionPosition.UNKNOWN),
>> > >               org.apache.drill.common.types.Types.optional(
>> > >                   org.apache.drill.common.types.TypeProtos.MinorType.BIGINT)),
>> > >           allocator);
>> > >       .
>> > >       .
>> > >       .
>> > >     }
>> > >
>> > > On Sat, Jul 4, 2015 at 7:39 PM, Jim Bates <jba...@maprtech.com> wrote:
>> > >
>> > > > I still have issues finding the correct way to create and use a
>> > > > RepeatedHolder, and Writers are a non-starter for Workspace values. I
>> > > > can make do with creating a concatenated string in a VarCharHolder for
>> > > > small data sets to get past this in the short term and finish testing
>> > > > the output values I expect, but I won't be able to do anything at scale
>> > > > till I figure out how to make a repeated list.
>> > > >
>> > > > On Sat, Jul 4, 2015 at 7:12 PM, Jim Bates <jba...@maprtech.com> wrote:
>> > > >
>> > > >> Well... Converting from string to integers anyway... Too many 4th of
>> > > >> July hot dogs, going into nitrate overload. :)
>> > > >>
>> > > >> I am pulling an array of string values from json data. The string
>> > > >> values are actually integers. I am converting to integers and summing
>> > > >> each array entry into the final tally.
>> > > >>
>> > > >> On Sat, Jul 4, 2015 at 7:04 PM, Jim Bates <jba...@maprtech.com> wrote:
>> > > >>
>> > > >>> Ted,
>> > > >>>
>> > > >>> Yes, I started out just getting a basic count to work. I am trying to
>> > > >>> keep the workflow as close to a basic user as possible. As such, I am
>> > > >>> building and using the MapR Apache Drill sandbox to test.
>> > > >>>
>> > > >>> 1. Always look at the drillbits.log file to see if Drill had any
>> > > >>>    issues loading your UDF. That was where I learned that all
>> > > >>>    workspace values needed to be holders:
>> > > >>>
>> > > >>>    WARN o.a.d.exec.expr.fn.FunctionConverter - Failure loading function class
>> > > >>>    com.mapr.example.udfs.drill.MyDrillAggFunctions$MyLinearRegression1, field xList.
>> > > >>>    Aggregate function 'MyLinearRegression1' workspace variable 'xList' is of type
>> > > >>>    'interface org.apache.drill.exec.vector.complex.writer.BaseWriter$ComplexWriter'.
>> > > >>>    Please change it to Holder type.
>> > > >>>
>> > > >>> 2. Error messages:
>> > > >>>    - If you get an error in this format it means that Drill cannot
>> > > >>>      find your function, so it probably didn't load it;
>> > > >>>      go back to step 1:
>> > > >>>
>> > > >>>      PARSE ERROR: From line 1, column 8 to line 1, column 44: No match
>> > > >>>      found for function signature MyFunctionName(<ANY>)
>> > > >>>
>> > > >>>    - If you get an error in this format it means that the function is
>> > > >>>      there but Drill could not find a signature that matched the param
>> > > >>>      types or param numbers you were passing it. The exact wording will
>> > > >>>      change, but "Missing function implementation" is the key phrase to
>> > > >>>      look for:
>> > > >>>
>> > > >>>      Error: SYSTEM ERROR: org.apache.drill.exec.exception.SchemaChangeException:
>> > > >>>      Failure while trying to materialize incoming schema. Errors:
>> > > >>>      Error in expression at index -1. Error: Missing function implementation:
>> > > >>>      [castBIGINT(VARCHAR-REPEATED)]. Full expression: --UNKNOWN EXPRESSION--
>> > > >>>
>> > > >>> 3. In your function definition for aggregate functions you need to set
>> > > >>>    null processing to INTERNAL and isRandom to false. Example below:
>> > > >>>
>> > > >>>    @FunctionTemplate(name = "MyFunctionName", scope = FunctionTemplate.FunctionScope.POINT_AGGREGATE,
>> > > >>>        nulls = FunctionTemplate.NullHandling.INTERNAL, isRandom = false,
>> > > >>>        isBinaryCommutative = false, costCategory = FunctionTemplate.FunctionCostCategory.COMPLEX)
>> > > >>>
>> > > >>> Below is an example from the Apache Drill tutorial data sets contained
>> > > >>> in the MapR Apache Drill sandbox. I am pulling an array of string
>> > > >>> values from json data. The string values are actually integers. I am
>> > > >>> converting to integers and summing each array entry into the final
>> > > >>> tally.
>> > > >>> This in no way represents what this data was for, but it did become a
>> > > >>> handy way for me to peck out the "correct" way to build an aggregation
>> > > >>> UDF:
>> > > >>>
>> > > >>> @FunctionTemplate(name = "MyArraySum", scope = FunctionTemplate.FunctionScope.POINT_AGGREGATE,
>> > > >>>     nulls = FunctionTemplate.NullHandling.INTERNAL, isRandom = false,
>> > > >>>     isBinaryCommutative = false, costCategory = FunctionTemplate.FunctionCostCategory.COMPLEX)
>> > > >>> public static class MyArraySum implements DrillAggFunc {
>> > > >>>
>> > > >>>   @Param RepeatedVarCharHolder listToSearch;
>> > > >>>   @Workspace NullableBigIntHolder count;
>> > > >>>   @Workspace NullableBigIntHolder sum;
>> > > >>>   @Workspace NullableVarCharHolder vc;
>> > > >>>   @Output BigIntHolder out;
>> > > >>>
>> > > >>>   @Override
>> > > >>>   public void setup() {
>> > > >>>     count.value = 0;
>> > > >>>     sum.value = 0;
>> > > >>>   }
>> > > >>>
>> > > >>>   @Override
>> > > >>>   public void add() {
>> > > >>>     int c = listToSearch.end - listToSearch.start;
>> > > >>>     int val = 0;
>> > > >>>     try {
>> > > >>>       for (int i = 0; i < c; i++) {
>> > > >>>         listToSearch.vector.getAccessor().get(i, vc);
>> > > >>>         String inputStr = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers
>> > > >>>             .toStringFromUTF8(vc.start, vc.end, vc.buffer);
>> > > >>>         val = Integer.parseInt(inputStr);
>> > > >>>         sum.value = sum.value + val;
>> > > >>>       }
>> > > >>>     } catch (Exception e) {
>> > > >>>       val = 0;
>> > > >>>     }
>> > > >>>     count.value = count.value + 1;
>> > > >>>   }
>> > > >>>
>> > > >>> Example select statement:
>> > > >>>
>> > > >>> SELECT MyArraySum(my_arrays) FROM (SELECT t.trans_info.prod_id as
>> > > >>> my_arrays FROM `dfs.clicks`.`./clicks/clicks.campaign.json` t limit 5);
>> > > >>>
>> > > >>> On Sat, Jul 4, 2015 at 6:22 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>> > > >>>
>> > > >>>> Jim,
>> > > >>>>
>> > > >>>> I think that you may be having trouble with aggregators in general.
>> > > >>>>
>> > > >>>> Have you been able to build *any* aggregator of anything? I haven't.
>> > > >>>> When I try to build an aggregator of int's or doubles, I get a very
>> > > >>>> persistent problem with Drill even seeing my aggregates:
>> > > >>>>
>> > > >>>> 0: jdbc:drill:zk=local> select sum_int(employee_id) from cp.`employee.json`;
>> > > >>>>
>> > > >>>> Jul 04, 2015 4:19:35 PM org.apache.calcite.sql.validate.SqlValidatorException <init>
>> > > >>>> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match
>> > > >>>> found for function signature sum_int(<ANY>)
>> > > >>>> Jul 04, 2015 4:19:35 PM org.apache.calcite.runtime.CalciteException <init>
>> > > >>>> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
>> > > >>>> column 8 to line 1, column 27: No match found for function signature sum_int(<ANY>)
>> > > >>>>
>> > > >>>> Error: PARSE ERROR: From line 1, column 8 to line 1, column 27: No match
>> > > >>>> found for function signature sum_int(<ANY>)
>> > > >>>> [Error Id: 91b78fa6-6dd1-4214-a85f-c2bf2c393145 on 10.0.1.2:31010] (state=,code=0)
>> > > >>>>
>> > > >>>> 0: jdbc:drill:zk=local> select sum_int(cast(employee_id as int)) from cp.`employee.json`;
>> > > >>>>
>> > > >>>> Jul 04, 2015 4:19:45 PM org.apache.calcite.sql.validate.SqlValidatorException <init>
>> > > >>>> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match
>> > > >>>> found for function signature sum_int(<NUMERIC>)
>> > > >>>> Jul 04, 2015 4:19:45 PM org.apache.calcite.runtime.CalciteException <init>
>> > > >>>> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
>> > > >>>> column 8 to line 1, column 40: No match found for function signature sum_int(<NUMERIC>)
>> > > >>>>
>> > > >>>> Error: PARSE ERROR: From line 1, column 8 to line 1, column 40: No match
>> > > >>>> found for function signature sum_int(<NUMERIC>)
>> > > >>>> [Error Id: f649fc85-6b6a-4468-9a4f-bfef0b23d06b on 10.0.1.2:31010] (state=,code=0)
>> > > >>>>
>> > > >>>> 0: jdbc:drill:zk=local>
>> > > >>>>
>> > > >>>> It looks like there is some undocumented subtlety about how to register
>> > > >>>> an aggregator.
>> > > >>>>
>> > > >>>> On Sat, Jul 4, 2015 at 4:08 PM, Jim Bates <jba...@maprtech.com> wrote:
>> > > >>>>
>> > > >>>> > I'm working on the same thing. I want to aggregate a list of values.
>> > > >>>> > It has been a search-and-guess game for the most part. I'm still stuck
>> > > >>>> > in the process of getting the values all into a list. The writers look
>> > > >>>> > interesting, but for aggregation functions it looks like the input is
>> > > >>>> > the param and the output objects can't hold the aggregation steps. The
>> > > >>>> > Workspace is where that happens. If I try and use a Writer in a
>> > > >>>> > workspace it won't load and tells me to change it to Holders, which was
>> > > >>>> > why I was using them to start with. Maybe I'm missing the architecture
>> > > >>>> > of the agg function. It looked like it was...
>> > > >>>> >
>> > > >>>> > @Param comes in -> initialize @Workspace vars in setup -> process data
>> > > >>>> > through @Workspace vars in add -> finalize @Output in output.
>> > > >>>> >
>> > > >>>> > So I'm back to trying to figure out how to create a RepeatedBigIntHolder
>> > > >>>> > or a RepeatedVarCharHolder...
>> > > >>>> >
>> > > >>>> > On Sat, Jul 4, 2015 at 4:53 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>> > > >>>> >
>> > > >>>> > > I am working on trying to build any kind of list-constructing
>> > > >>>> > > aggregator and having absolute fits.
>> > > >>>> > >
>> > > >>>> > > To simplify life, I decided to just build a generic list builder that
>> > > >>>> > > is a scalar function that returns a list containing its argument. Thus
>> > > >>>> > > zoop(3) => [3], zoop('abc') => 'abc' and zoop([1,2,3]) => [[1,2,3]].
>> > > >>>> > >
>> > > >>>> > > The ComplexWriter looks like the place to go. As usual, the complete
>> > > >>>> > > lack of comments in most of Drill makes this very hard since I have to
>> > > >>>> > > guess what works and what doesn't.
>> > > >>>> > >
>> > > >>>> > > In my code, I note that ComplexWriter has a nice rootAsList() method. I
>> > > >>>> > > used this in zip and it works nicely to construct lists for output. I
>> > > >>>> > > note that the resulting ListWriter has a method copyReader(FieldReader var1)
>> > > >>>> > > which looks really good.
>> > > >>>> > >
>> > > >>>> > > Unfortunately, the only implementation of copyReader() is in
>> > > >>>> > > AbstractFieldWriter and it looks like this:
>> > > >>>> > >
>> > > >>>> > >     public void copyReader(FieldReader reader) {
>> > > >>>> > >         this.fail("Copy FieldReader");
>> > > >>>> > >     }
>> > > >>>> > >
>> > > >>>> > > I would like to formally say at this point "WTF"?
>> > > >>>> > >
>> > > >>>> > > In digging in further, I see other methods that look handy like
>> > > >>>> > >
>> > > >>>> > >     public void write(IntHolder holder) {
>> > > >>>> > >         this.fail("Int");
>> > > >>>> > >     }
>> > > >>>> > >
>> > > >>>> > > And then in looking at implementations, it looks like there is a
>> > > >>>> > > combinatorial explosion because every type seems to need a write method
>> > > >>>> > > for every other type.
>> > > >>>> > >
>> > > >>>> > > What is the thought here? How can I copy an arbitrary value into a list?
>> > > >>>> > >
>> > > >>>> > > My next thought was to build code that dispatches on type. There is a
>> > > >>>> > > method called getType() on the FieldReader. Unfortunately, that drives
>> > > >>>> > > into code generated by protoc and I see no way to dispatch on the type
>> > > >>>> > > of an incoming value.
>> > > >>>> > >
>> > > >>>> > > How is this supposed to work?
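The closest thing to an answer I found is the MapUtility class mehant points at just below; for a single type, the dispatch Ted is asking about boils down to something like the sketch here. Rough and untested: the helper name is mine, and the reader/writer calls reflect my reading of the 1.x complex-vector API, so treat the details as assumptions.

// Copy the value a FieldReader is currently positioned on into a ListWriter,
// dispatching on the reader's minor type (a stripped-down MapUtility).
public static void copyIntoList(
    org.apache.drill.exec.vector.complex.reader.FieldReader reader,
    org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter list) {
  switch (reader.getType().getMinorType()) {
    case BIGINT: {
      org.apache.drill.exec.expr.holders.NullableBigIntHolder h =
          new org.apache.drill.exec.expr.holders.NullableBigIntHolder();
      reader.read(h);
      if (h.isSet == 1) {
        list.bigInt().writeBigInt(h.value);
      }
      break;
    }
    // ... one case per supported MinorType, which is exactly the combinatorial
    // explosion described above ...
    default:
      throw new UnsupportedOperationException(
          "Unhandled type: " + reader.getType().getMinorType());
  }
}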
>> > > >>>> > >
>> > > >>>> > >
>> > > >>>> > > On Sat, Jul 4, 2015 at 2:14 PM, mehant baid <baid.meh...@gmail.com> wrote:
>> > > >>>> > >
>> > > >>>> > > > For a detailed example on using the ComplexWriter interface you can
>> > > >>>> > > > take a look at the Mappify (kvgen) function:
>> > > >>>> > > > https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Mappify.java
>> > > >>>> > > > The function itself is very simple; however, it makes use of the
>> > > >>>> > > > utility methods in MappifyUtility
>> > > >>>> > > > https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/MappifyUtility.java
>> > > >>>> > > > and MapUtility
>> > > >>>> > > > https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapUtility.java
>> > > >>>> > > > which perform most of the work.
>> > > >>>> > > >
>> > > >>>> > > > Currently we don't have a generic infrastructure to handle errors
>> > > >>>> > > > coming out of functions. However there is UserException, which when
>> > > >>>> > > > raised will make sure that Drill does not gobble up the error message
>> > > >>>> > > > in that exception. So you can probably throw a UserException with the
>> > > >>>> > > > failing input in your function to make sure it propagates to the user.
>> > > >>>> > > >
>> > > >>>> > > > Thanks
>> > > >>>> > > > Mehant
>> > > >>>> > > >
>> > > >>>> > > > On Sat, Jul 4, 2015 at 1:48 PM, Jacques Nadeau <jacq...@apache.org> wrote:
>> > > >>>> > > >
>> > > >>>> > > > > Holders are for both input and output. You can also use ComplexWriter
>> > > >>>> > > > > for output and FieldReader for input if you want to write or read a
>> > > >>>> > > > > complex value.
>> > > >>>> > > > >
>> > > >>>> > > > > I don't think we've provided a really clean way to construct a
>> > > >>>> > > > > Repeated*Holder for output purposes. You can probably do it by
>> > > >>>> > > > > reaching into a bunch of internal interfaces in Drill. However, I
>> > > >>>> > > > > would recommend using the ComplexWriter output pattern for now. This
>> > > >>>> > > > > will be a little less efficient but substantially less brittle. I
>> > > >>>> > > > > suggest you open up a jira for using a Repeated*Holder as an output.
>> > > >>>> > > > >
>> > > >>>> > > > > On Sat, Jul 4, 2015 at 1:38 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>> > > >>>> > > > >
>> > > >>>> > > > > > Holders are for input, I think.
>> > > >>>> > > > > >
>> > > >>>> > > > > > Try the different kinds of writers.
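Putting Jacques's ComplexWriter suggestion together with the zip pattern Ted sketches just below, the single-value "zoop" function would look roughly like the following. A sketch only, not something I have compiled or run; in particular startList()/endList() are my reading of the ListWriter interface (Ted's outline below writes them as start()/end()), and the holders are the same ones used in the examples above.

@FunctionTemplate(name = "zoop", scope = FunctionTemplate.FunctionScope.SIMPLE,
    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
public static class ZoopBigInt implements DrillSimpleFunc {

  @Param BigIntHolder in;
  @Output BaseWriter.ComplexWriter writer;

  @Override
  public void setup() {
  }

  @Override
  public void eval() {
    org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter list = writer.rootAsList();
    list.startList();                      // open the output list:  [
    list.bigInt().writeBigInt(in.value);   // zoop(3) => [3]
    list.endList();                        // close it:              ]
  }
}

The appeal of this route, per Jacques, is that everything stays on the writer surface instead of hand-built vectors and allocators.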
>> > > >>>> > > > > >
>> > > >>>> > > > > >
>> > > >>>> > > > > > On Sat, Jul 4, 2015 at 12:49 PM, Jim Bates <jba...@maprtech.com> wrote:
>> > > >>>> > > > > >
>> > > >>>> > > > > > > Using a repeatedholder as a @Param I've got working. I was working
>> > > >>>> > > > > > > on a custom aggregator function using DrillAggFunc. In this I can do
>> > > >>>> > > > > > > simple things, but if I want to build a list of values and do
>> > > >>>> > > > > > > something with it in the final output method I think I need to use
>> > > >>>> > > > > > > RepeatedHolders in the @Workspace. To do that I need to create a new
>> > > >>>> > > > > > > one in the setup method. I can't get one built. They all require a
>> > > >>>> > > > > > > BufferAllocator to be passed in to build it. I have not found a way
>> > > >>>> > > > > > > to get an allocator yet. Any suggestions?
>> > > >>>> > > > > > >
>> > > >>>> > > > > > > On Sat, Jul 4, 2015 at 1:37 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>> > > >>>> > > > > > >
>> > > >>>> > > > > > > > If you look at the zip function in
>> > > >>>> > > > > > > > https://github.com/mapr-demos/simple-drill-functions you can have an
>> > > >>>> > > > > > > > example of building a structure.
>> > > >>>> > > > > > > >
>> > > >>>> > > > > > > > The basic idea is that your output is denoted as
>> > > >>>> > > > > > > >
>> > > >>>> > > > > > > >     @Output
>> > > >>>> > > > > > > >     BaseWriter.ComplexWriter writer;
>> > > >>>> > > > > > > >
>> > > >>>> > > > > > > > The pattern for building a list of lists of integers is like this:
>> > > >>>> > > > > > > >
>> > > >>>> > > > > > > >     writer.setValueCount(n);
>> > > >>>> > > > > > > >     ...
>> > > >>>> > > > > > > >     BaseWriter.ListWriter outer = writer.rootAsList();
>> > > >>>> > > > > > > >     outer.start();    // [ outer list
>> > > >>>> > > > > > > >     ...
>> > > >>>> > > > > > > >     // for each inner list
>> > > >>>> > > > > > > >     BaseWriter.ListWriter inner = outer.list();
>> > > >>>> > > > > > > >     inner.start();
>> > > >>>> > > > > > > >         // for each inner list element
>> > > >>>> > > > > > > >         inner.integer().writeInt(accessor.get(i));
>> > > >>>> > > > > > > >     }
>> > > >>>> > > > > > > >     inner.end();      // ] inner list
>> > > >>>> > > > > > > >     }
>> > > >>>> > > > > > > >     outer.end();      // ] outer list
>> > > >>>> > > > > > > >
>> > > >>>> > > > > > > > On Sat, Jul 4, 2015 at 10:29 AM, Jim Bates <jba...@maprtech.com> wrote:
>> > > >>>> > > > > > > >
>> > > >>>> > > > > > > > > I have working aggregation and simple UDFs. I've been trying to
>> > > >>>> > > > > > > > > document and understand each of the options available in a Drill UDF.
>> > > >>>> > > > > > > > > Understanding the different FunctionScopes, the ones that are
>> > > >>>> > > > > > > > > allowed, the ones that are not. The impact of different cost
>> > > >>>> > > > > > > > > categories. The different steps needed to understand handling any of
>> > > >>>> > > > > > > > > the supported data types and structures in drill.
>> > > >>>> > > > > > > > >
>> > > >>>> > > > > > > > > Here are a few of my current road blocks. Any pointers would be
>> > > >>>> > > > > > > > > greatly appreciated.
>> > > >>>> > > > > > > > >
>> > > >>>> > > > > > > > > 1. I've been trying to understand how to correctly use
>> > > >>>> > > > > > > > >    RepeatedHolders of whatever type. For this discussion let's start
>> > > >>>> > > > > > > > >    with a RepeatedBigIntHolder. I'm trying to figure out the best way
>> > > >>>> > > > > > > > >    to create a new one. I have not figured out where in the existing
>> > > >>>> > > > > > > > >    drill code someone does this. If I use a RepeatedBigIntHolder as a
>> > > >>>> > > > > > > > >    Workspace object it is null to start with. I created a new one in
>> > > >>>> > > > > > > > >    the startup section of the udf, but the vector was null. I can find
>> > > >>>> > > > > > > > >    no reference for creating a new BigIntVector. There is a way to
>> > > >>>> > > > > > > > >    create a BigIntVector, and I did find an example of creating a new
>> > > >>>> > > > > > > > >    VarCharVector, but I can't do that using the drill jar files from
>> > > >>>> > > > > > > > >    1.0. The org.apache.drill.common.types.TypeProtos and the
>> > > >>>> > > > > > > > >    org.apache.drill.common.types.TypeProtos.MinorType classes do not
>> > > >>>> > > > > > > > >    appear to be accessible from the drill jar files.
>> > > >>>> > > > > > > > >
>> > > >>>> > > > > > > > > 2. What is the best way to close out a UDF in the event it generates
>> > > >>>> > > > > > > > >    an exception? Are there specific steps one should follow to make a
>> > > >>>> > > > > > > > >    clean exit in a catch block that are beneficial to Drill?
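On the second question above (cleaning up when a UDF hits an exception), the practical guidance in this thread is mehant's: don't swallow the error, let it surface so Drill can report it. Below is a rough, untested sketch of what that could look like in MyArraySum's add(); I've used DrillRuntimeException only because I'm not certain of the UserException builder signature in the 1.0 jars, and the start/end-based loop is my reading of how repeated holders are meant to be indexed.

@Override
public void add() {
  try {
    // Repeated holders expose a [start, end) window into the inner values vector.
    for (int i = listToSearch.start; i < listToSearch.end; i++) {
      listToSearch.vector.getAccessor().get(i, vc);
      String inputStr = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers
          .toStringFromUTF8(vc.start, vc.end, vc.buffer);
      sum.value = sum.value + Integer.parseInt(inputStr);
    }
  } catch (NumberFormatException e) {
    // Fail loudly with context instead of silently zeroing the value;
    // Drill propagates the message back to the client.
    throw new org.apache.drill.common.exceptions.DrillRuntimeException(
        "MyArraySum: array element is not an integer", e);
  }
  count.value = count.value + 1;
}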