Re: Some questions on UDFs

2015-07-04 Thread Jim Bates
I've got a repeated holder working as a @Param. I was working on a custom
aggregator function using DrillAggFunc. In this I can do simple things,
but if I want to build a list of values and do something with it in the
final output method, I think I need to use RepeatedHolders in the
@Workspace. To do that I need to create a new one in the setup method. I
can't get one built: they all require a BufferAllocator to be passed in to
build them, and I have not found a way to get an allocator yet. Any
suggestions?
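
For reference, the one memory hook Drill does expose to a UDF is an injected
DrillBuf, which is the route Jim ends up using later in this thread via
buffer.getAllocator(). A minimal, hedged sketch of that injection (names
illustrative):

import javax.inject.Inject;
import io.netty.buffer.DrillBuf;

// Inside a UDF class body (sketch only):
@Inject DrillBuf buffer;   // Drill injects a working buffer for the function

public void setup() {
    // The buffer can be grown as needed; reallocIfNeeded returns the
    // (possibly re-allocated) buffer, so reassign it.
    buffer = buffer.reallocIfNeeded(256);
}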

On Sat, Jul 4, 2015 at 1:37 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 If you look at the zip function in
 https://github.com/mapr-demos/simple-drill-functions you can have an
 example of building a structure.

 The basic idea is that your output is denoted as

 @Output
 BaseWriter.ComplexWriter writer;

 The pattern for building a list of lists of integers is like this:

 writer.setValueCount(n);
 ...
 BaseWriter.ListWriter outer = writer.rootAsList();
 outer.start(); // [ outer list
 ...
 for (each inner list) {
     BaseWriter.ListWriter inner = outer.list();
     inner.start();
     for (each inner list element i) {
         inner.integer().writeInt(accessor.get(i));
     }
     inner.end();   // ] inner list
 }
 outer.end(); // ] outer list



 On Sat, Jul 4, 2015 at 10:29 AM, Jim Bates jba...@maprtech.com wrote:

  I have working aggregation and simple UDFs. I've been trying to document
  and understand each of the options available in a Drill UDF: understanding
  the different FunctionScopes (the ones that are allowed and the ones that
  are not), the impact of different cost categories, and the steps needed to
  handle any of the supported data types and structures in Drill.
 
  Here are a few of my current roadblocks. Any pointers would be greatly
  appreciated.
 
     1. I've been trying to understand how to correctly use RepeatedHolders
     of whatever type. For this discussion let's start with a
     RepeatedBigIntHolder. I'm trying to figure out the best way to create a
     new one. I have not figured out where in the existing Drill code someone
     does this. If I use a RepeatedBigIntHolder as a Workspace object it is
     null to start with. I created a new one in the setup method of the UDF
     but the vector was null. I can find no reference on creating a new
     BigIntVector. There is a way to create a BigIntVector, and I did find an
     example of creating a new VarCharVector, but I can't do that using the
     Drill jar files from 1.0. The org.apache.drill.common.types.TypeProtos
     and org.apache.drill.common.types.TypeProtos.MinorType classes do not
     appear to be accessible from the drill jar files.
     2. What is the best way to close out a UDF in the event it generates an
     exception? Are there specific steps one should follow to make a clean
     exit in a catch block that are beneficial to Drill?
 



Some questions on UDFs

2015-07-04 Thread Jim Bates
I have working aggregation and simple UDFs. I've been trying to document
and understand each of the options available in a Drill UDF: understanding
the different FunctionScopes (the ones that are allowed and the ones that
are not), the impact of different cost categories, and the steps needed to
handle any of the supported data types and structures in Drill.

Here are a few of my current roadblocks. Any pointers would be greatly
appreciated.


   1. I've been trying to understand how to correctly use RepeatedHolders
   of whatever type. For this discussion let's start with a
   RepeatedBigIntHolder. I'm trying to figure out the best way to create a new
   one. I have not figured out where in the existing Drill code someone does
   this. If I use a RepeatedBigIntHolder as a Workspace object it is null to
   start with. I created a new one in the setup method of the UDF but the
   vector was null. I can find no reference on creating a new BigIntVector.
   There is a way to create a BigIntVector, and I did find an example of
   creating a new VarCharVector, but I can't do that using the Drill jar files
   from 1.0. The org.apache.drill.common.types.TypeProtos and
   org.apache.drill.common.types.TypeProtos.MinorType classes do not
   appear to be accessible from the drill jar files.
   2. What is the best way to close out a UDF in the event it generates an
   exception? Are there specific steps one should follow to make a clean exit
   in a catch block that are beneficial to Drill?


Re: Some questions on UDFs

2015-07-04 Thread Jacques Nadeau
*Holders are for both input and output.  You can also use ComplexWriter for
output and FieldReader for input if you want to write or read a complex
value.

I don't think we've provided a really clean way to construct a
Repeated*Holder for output purposes.  You can probably do it by reaching
into a bunch of internal interfaces in Drill.  However, I would recommend
using the ComplexWriter output pattern for now.  This will be a little less
efficient but substantially less brittle.  I suggest you open a JIRA for
using a Repeated*Holder as an output.
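
As a concrete illustration of that split (FieldReader for complex input,
ComplexWriter for complex output), a minimal, hedged declaration sketch
against the 1.x annotations; the function name and empty bodies are
illustrative only:

import org.apache.drill.exec.expr.DrillSimpleFunc;
import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.vector.complex.reader.FieldReader;
import org.apache.drill.exec.vector.complex.writer.BaseWriter;

@FunctionTemplate(name = "copy_complex",                      // illustrative name
    scope = FunctionTemplate.FunctionScope.SIMPLE,
    nulls = FunctionTemplate.NullHandling.INTERNAL)
public class CopyComplex implements DrillSimpleFunc {

    @Param  FieldReader in;                   // read side: works for simple or complex values
    @Output BaseWriter.ComplexWriter writer;  // write side: build complex output with writers

    public void setup() { }

    public void eval() {
        // build the output here via writer.rootAsList() / writer.rootAsMap(),
        // as in the list-of-lists pattern quoted below
    }
}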

On Sat, Jul 4, 2015 at 1:38 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 Holders are for input, I think.

 Try the different kinds of writers.



 On Sat, Jul 4, 2015 at 12:49 PM, Jim Bates jba...@maprtech.com wrote:

  Using a repeatedholder as a @param I've got working. I was working on a
  custom aggregator function using DrillAggFunc. In this I can do simple
  things but If I want to build a list values and do something with it in
 the
  final output method I think I need to use RepeatedHolders in the
  @Workspace. To do that I need to create a new one in the setup method. I
  can't get one built. They all require a BufferAllocator to be passed in
 to
  build it. I have not found a way to get an allocator yet. Any
 suggestions?
 
  On Sat, Jul 4, 2015 at 1:37 PM, Ted Dunning ted.dunn...@gmail.com
 wrote:
 
   If you look at the zip function in
   https://github.com/mapr-demos/simple-drill-functions you can have an
   example of building a structure.
  
   The basic idea is that your output is denoted as
  
   @Output
   BaseWriter.ComplexWriter writer;
  
   The pattern for building a list of lists of integers is like this:
  
   writer.setValueCount(n);
   ...
   BaseWriter.ListWriter outer = writer.rootAsList();
   outer.start(); // [ outer list
   ...
   // for each inner list
   BaseWriter.ListWriter inner = outer.list();
   inner.start();
   // for each inner list element
   inner.integer().writeInt(accessor.get(i));
   }
   inner.end();   // ] inner list
   }
   outer.end(); // ] outer list
  
  
  
   On Sat, Jul 4, 2015 at 10:29 AM, Jim Bates jba...@maprtech.com
 wrote:
  
I have working aggregation and simple UDFs. I've been trying to
  document
and understand each of the options available in a Drill UDF.
   Understanding
the different FunctionScope's, the ones that are allowed, the ones
 that
   are
not. The impact of different cost categories. The different  steps
  needed
to understand handling any of the supported data types  and
 structures
  in
drill.
   
Here are a few of my current road blocks. Any pointers would be
 greatly
appreciated.
   
   
   1. I've been trying to understand how to correctly use
  RepeatedHolders
   of whatever type. For this discussion lets start with a
   RepeatedBigIntHolder. I'm trying to figure out the best way to
  create
   a
new
   one. I have not figured out where in the existing drill code
 someone
does
   this. If I use a  RepeatedBigIntHolder as a Workspace object is is
   null
to
   start with. I created a new one in the startup section of the udf
  but
the
   vector was null. I can find no reference in creating a new
   BigIntVector.
   There is a way to create a BigIntVector and I did find an example
 of
   creating a new VarCharVector but I can't do that using the drill
 jar
files
   from 1.0. The org.apache.drill.common.types.TypeProtos and
   the org.apache.drill.common.types.TypeProtos.MinorType classes do
  not
   appear to be accessible from the drill jar files.
   2. What is the best way to close out a UDF in the event it
 generates
   an
   exception? Are there specific steps one should follow to make a
  clean
exit
   in a catch block that are beneficial to Drill?
   
  
 



Re: Some questions on UDFs

2015-07-04 Thread Ted Dunning
If you look at the zip function in
https://github.com/mapr-demos/simple-drill-functions you can have an
example of building a structure.

The basic idea is that your output is denoted as

@Output
BaseWriter.ComplexWriter writer;

The pattern for building a list of lists of integers is like this:

writer.setValueCount(n);
...
BaseWriter.ListWriter outer = writer.rootAsList();
outer.start(); // [ outer list
...
for (each inner list) {
    BaseWriter.ListWriter inner = outer.list();
    inner.start();
    for (each inner list element i) {
        inner.integer().writeInt(accessor.get(i));
    }
    inner.end();   // ] inner list
}
outer.end(); // ] outer list
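
Filling in the elided loops, a hedged sketch of a complete simple function
that copies a repeated BIGINT input into a list output. The holder field
access mirrors Jim's MyArraySum example later in the thread; the function
name is illustrative and the code has not been verified against the 1.1
sources:

import org.apache.drill.exec.expr.DrillSimpleFunc;
import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.expr.holders.RepeatedBigIntHolder;
import org.apache.drill.exec.vector.complex.writer.BaseWriter;

@FunctionTemplate(name = "as_list",                           // illustrative name
    scope = FunctionTemplate.FunctionScope.SIMPLE,
    nulls = FunctionTemplate.NullHandling.INTERNAL)
public class AsList implements DrillSimpleFunc {

    @Param  RepeatedBigIntHolder in;          // repeated input arrives as a holder
    @Output BaseWriter.ComplexWriter writer;  // complex output is built with writers

    public void setup() { }

    public void eval() {
        // Fully qualified inside the body because Drill inlines function
        // bodies into generated code and does not carry the imports over.
        org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter out =
            writer.rootAsList();
        out.start();                                          // [ list
        for (int i = in.start; i < in.end; i++) {
            // assumption: the repeated holder exposes its value vector directly
            out.bigInt().writeBigInt(in.vector.getAccessor().get(i));
        }
        out.end();                                            // ] list
    }
}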



On Sat, Jul 4, 2015 at 10:29 AM, Jim Bates jba...@maprtech.com wrote:

 I have working aggregation and simple UDFs. I've been trying to document
 and understand each of the options available in a Drill UDF. Understanding
 the different FunctionScope's, the ones that are allowed, the ones that are
 not. The impact of different cost categories. The different  steps needed
 to understand handling any of the supported data types  and structures in
 drill.

 Here are a few of my current road blocks. Any pointers would be greatly
 appreciated.


1. I've been trying to understand how to correctly use RepeatedHolders
of whatever type. For this discussion lets start with a
RepeatedBigIntHolder. I'm trying to figure out the best way to create a
 new
one. I have not figured out where in the existing drill code someone
 does
this. If I use a  RepeatedBigIntHolder as a Workspace object is is null
 to
start with. I created a new one in the startup section of the udf but
 the
vector was null. I can find no reference in creating a new BigIntVector.
There is a way to create a BigIntVector and I did find an example of
creating a new VarCharVector but I can't do that using the drill jar
 files
from 1.0. The org.apache.drill.common.types.TypeProtos and
the org.apache.drill.common.types.TypeProtos.MinorType classes do not
appear to be accessible from the drill jar files.
2. What is the best way to close out a UDF in the event it generates an
exception? Are there specific steps one should follow to make a clean
 exit
in a catch block that are beneficial to Drill?



Re: Some questions on UDFs

2015-07-04 Thread mehant baid
For a detailed example on using ComplexWriter interface you can take a look
at the Mappify
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Mappify.java
(kvgen) function. The function itself is very simple however it makes use
of the utility methods in MappifyUtility
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/MappifyUtility.java
and MapUtility
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapUtility.java
which perform most of the work.

Currently we don't have a generic infrastructure to handle errors coming
out of functions. However there is UserException, which when raised will
make sure that Drill does not gobble up the error message in that
exception. So you can probably throw a UserException with the failing input
in your function to make sure it propagates to the user.
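
A hedged sketch of what that might look like inside a function body; the
UserException builder chain shown (functionError, message, build) is recalled
from the 1.x API and should be treated as an assumption, as should the
surrounding names:

// Illustrative fragment for a UDF's add()/eval() body; classes are fully
// qualified because Drill inlines function bodies into generated code.
String inputStr = "not-a-number";   // stand-in for the value read from a holder
long total = 0;
try {
    total += Integer.parseInt(inputStr);
} catch (NumberFormatException e) {
    // Assumption: functionError(Throwable), message(fmt, args) and build(Logger)
    // are the builder methods UserException offers in 1.x.
    throw org.apache.drill.common.exceptions.UserException
        .functionError(e)
        .message("my_array_sum: could not parse '%s' as an integer", inputStr)
        .build(org.slf4j.LoggerFactory.getLogger("my_array_sum"));
}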

Thanks
Mehant

On Sat, Jul 4, 2015 at 1:48 PM, Jacques Nadeau jacq...@apache.org wrote:

 *Holders are for both input and output.  You can also use CompleWriter for
 output and FieldReader for input if you want to write or read a complex
 value.

 I don't think we've provided a really clean way to construct a
 Repeated*Holder for output purposes.  You can probably do it by reaching
 into a bunch of internal interfaces in Drill.  However, I would recommend
 using the ComplexWriter output pattern for now.  This will be a little less
 efficient but substantially less brittle.  I suggest you open up a jira for
 using a Repeated*Holder as an output.

 On Sat, Jul 4, 2015 at 1:38 PM, Ted Dunning ted.dunn...@gmail.com wrote:

  Holders are for input, I think.
 
  Try the different kinds of writers.
 
 
 
  On Sat, Jul 4, 2015 at 12:49 PM, Jim Bates jba...@maprtech.com wrote:
 
   Using a repeatedholder as a @param I've got working. I was working on a
   custom aggregator function using DrillAggFunc. In this I can do simple
   things but If I want to build a list values and do something with it in
  the
   final output method I think I need to use RepeatedHolders in the
   @Workspace. To do that I need to create a new one in the setup method.
 I
   can't get one built. They all require a BufferAllocator to be passed in
  to
   build it. I have not found a way to get an allocator yet. Any
  suggestions?
  
   On Sat, Jul 4, 2015 at 1:37 PM, Ted Dunning ted.dunn...@gmail.com
  wrote:
  
If you look at the zip function in
https://github.com/mapr-demos/simple-drill-functions you can have an
example of building a structure.
   
The basic idea is that your output is denoted as
   
@Output
BaseWriter.ComplexWriter writer;
   
The pattern for building a list of lists of integers is like this:
   
writer.setValueCount(n);
...
BaseWriter.ListWriter outer = writer.rootAsList();
outer.start(); // [ outer list
...
// for each inner list
BaseWriter.ListWriter inner = outer.list();
inner.start();
// for each inner list element
inner.integer().writeInt(accessor.get(i));
}
inner.end();   // ] inner list
}
outer.end(); // ] outer list
   
   
   
On Sat, Jul 4, 2015 at 10:29 AM, Jim Bates jba...@maprtech.com
  wrote:
   
 I have working aggregation and simple UDFs. I've been trying to
   document
 and understand each of the options available in a Drill UDF.
Understanding
 the different FunctionScope's, the ones that are allowed, the ones
  that
are
 not. The impact of different cost categories. The different  steps
   needed
 to understand handling any of the supported data types  and
  structures
   in
 drill.

 Here are a few of my current road blocks. Any pointers would be
  greatly
 appreciated.


1. I've been trying to understand how to correctly use
   RepeatedHolders
of whatever type. For this discussion lets start with a
RepeatedBigIntHolder. I'm trying to figure out the best way to
   create
a
 new
one. I have not figured out where in the existing drill code
  someone
 does
this. If I use a  RepeatedBigIntHolder as a Workspace object is
 is
null
 to
start with. I created a new one in the startup section of the
 udf
   but
 the
vector was null. I can find no reference in creating a new
BigIntVector.
There is a way to create a BigIntVector and I did find an
 example
  of
creating a new VarCharVector but I can't do that using the drill
  jar
 files
from 1.0. The org.apache.drill.common.types.TypeProtos and
the org.apache.drill.common.types.TypeProtos.MinorType classes
 do
   not
appear to be accessible from the drill jar files.
2. What is the 

Re: Some questions on UDFs

2015-07-04 Thread Jim Bates
Found the TypeProtos in the drill-protocol jar.

On Sat, Jul 4, 2015 at 12:29 PM, Jim Bates jba...@maprtech.com wrote:

 I have working aggregation and simple UDFs. I've been trying to document
 and understand each of the options available in a Drill UDF. Understanding
 the different FunctionScope's, the ones that are allowed, the ones that are
 not. The impact of different cost categories. The different  steps needed
 to understand handling any of the supported data types  and structures in
 drill.

 Here are a few of my current road blocks. Any pointers would be greatly
 appreciated.


1. I've been trying to understand how to correctly use RepeatedHolders
of whatever type. For this discussion lets start with a
RepeatedBigIntHolder. I'm trying to figure out the best way to create a new
one. I have not figured out where in the existing drill code someone does
this. If I use a  RepeatedBigIntHolder as a Workspace object is is null to
start with. I created a new one in the startup section of the udf but the
vector was null. I can find no reference in creating a new BigIntVector.
There is a way to create a BigIntVector and I did find an example of
creating a new VarCharVector but I can't do that using the drill jar files
from 1.0. The org.apache.drill.common.types.TypeProtos and
the org.apache.drill.common.types.TypeProtos.MinorType classes do not
appear to be accessible from the drill jar files.
2. What is the best way to close out a UDF in the event it generates
an exception? Are there specific steps one should follow to make a clean
exit in a catch block that are beneficial to Drill?




Re: Some questions on UDFs

2015-07-04 Thread Ted Dunning
I am working on trying to build any kind of list constructing aggregator
and having absolute fits.

To simplify life, I decided to just build a generic list builder that is a
scalar function that returns a list containing its argument. Thus zoop(3)
= [3], zoop('abc') = ['abc'], and zoop([1,2,3]) = [[1,2,3]].

The ComplexWriter looks like the place to go. As usual, the complete lack
of comments in most of Drill makes this very hard since I have to guess
what works and what doesn't.

In my code, I note that ComplexWriter has a nice rootAsList() method.  I
used this in zip and it works nicely to construct lists for output.  I note
that the resulting ListWriter has a method copyReader(FieldReader var1)
which looks really good.

Unfortunately, the only implementation of copyReader() is in
AbstractFieldWriter and it looks like this:

public void copyReader(FieldReader reader) {
    this.fail("Copy FieldReader");
}

I would like to formally say at this point WTF?

In digging in further, I see other methods that look handy like

public void write(IntHolder holder) {
    this.fail("Int");
}

And then in looking at implementations, it looks like there is a
combinatorial explosion because every type seems to need a write method for
every other type.

What is the thought here?  How can I copy an arbitrary value into a list?

My next thought was to build code that dispatches on type.  There is a
method called getType() on the FieldReader.  Unfortunately, that drives
into code generated by protoc and I see no way to dispatch on the type of
an incoming value.


How is this supposed to work?
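
One workaround, offered as a rough sketch rather than an answer: dispatch
manually on the reader's minor type. This assumes FieldReader.getType().getMinorType()
and the per-type read methods (readLong(), readInteger()) behave the way their
names suggest; neither has been checked against the 1.1 sources.

// Sketch only: copy a scalar from a FieldReader into a ListWriter by
// switching on its minor type; extend with more cases as needed.
void copyScalarIntoList(org.apache.drill.exec.vector.complex.reader.FieldReader reader,
                        org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter list) {
    switch (reader.getType().getMinorType()) {
        case BIGINT:
            list.bigInt().writeBigInt(reader.readLong());     // assumed accessor
            break;
        case INT:
            list.integer().writeInt(reader.readInteger());    // assumed accessor
            break;
        default:
            throw new UnsupportedOperationException(
                "no copy rule yet for " + reader.getType().getMinorType());
    }
}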




On Sat, Jul 4, 2015 at 2:14 PM, mehant baid baid.meh...@gmail.com wrote:

 For a detailed example on using ComplexWriter interface you can take a look
 at the Mappify
 
 https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Mappify.java
 
 (kvgen) function. The function itself is very simple however it makes use
 of the utility methods in MappifyUtility
 
 https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/MappifyUtility.java
 
 and MapUtility
 
 https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapUtility.java
 
 which perform most of the work.

 Currently we don't have a generic infrastructure to handle errors coming
 out of functions. However there is UserException, which when raised will
 make sure that Drill does not gobble up the error message in that
 exception. So you can probably throw a UserException with the failing input
 in your function to make sure it propagates to the user.

 Thanks
 Mehant

 On Sat, Jul 4, 2015 at 1:48 PM, Jacques Nadeau jacq...@apache.org wrote:

  *Holders are for both input and output.  You can also use CompleWriter
 for
  output and FieldReader for input if you want to write or read a complex
  value.
 
  I don't think we've provided a really clean way to construct a
  Repeated*Holder for output purposes.  You can probably do it by reaching
  into a bunch of internal interfaces in Drill.  However, I would recommend
  using the ComplexWriter output pattern for now.  This will be a little
 less
  efficient but substantially less brittle.  I suggest you open up a jira
 for
  using a Repeated*Holder as an output.
 
  On Sat, Jul 4, 2015 at 1:38 PM, Ted Dunning ted.dunn...@gmail.com
 wrote:
 
   Holders are for input, I think.
  
   Try the different kinds of writers.
  
  
  
   On Sat, Jul 4, 2015 at 12:49 PM, Jim Bates jba...@maprtech.com
 wrote:
  
Using a repeatedholder as a @param I've got working. I was working
 on a
custom aggregator function using DrillAggFunc. In this I can do
 simple
things but If I want to build a list values and do something with it
 in
   the
final output method I think I need to use RepeatedHolders in the
@Workspace. To do that I need to create a new one in the setup
 method.
  I
can't get one built. They all require a BufferAllocator to be passed
 in
   to
build it. I have not found a way to get an allocator yet. Any
   suggestions?
   
On Sat, Jul 4, 2015 at 1:37 PM, Ted Dunning ted.dunn...@gmail.com
   wrote:
   
 If you look at the zip function in
 https://github.com/mapr-demos/simple-drill-functions you can have
 an
 example of building a structure.

 The basic idea is that your output is denoted as

 @Output
 BaseWriter.ComplexWriter writer;

 The pattern for building a list of lists of integers is like this:

 writer.setValueCount(n);
 ...
 BaseWriter.ListWriter outer = writer.rootAsList();
 outer.start(); // [ outer list
 ...
 // for each inner list
 BaseWriter.ListWriter inner = outer.list();
 inner.start();
 // for each inner list element
 

[VOTE][RESULT] Release Apache Drill 1.1.0 (rc0)

2015-07-04 Thread Jacques Nadeau
Looks like we have a release.

I'll upload to dist and send out an annc tomorrow.

Happy Fourth Everyone!

Final Tally:

7 x binding +1
Jinfeng, Parth, Hanifi, Mehant, Jason, Aman, Jacques

6 x non-binding +1
Hakim, Norris, Hsuan, Rahul, Sudheesh, Chun



On Fri, Jul 3, 2015 at 8:43 AM, Chun Chang cch...@maprtech.com wrote:

 72 hour longevity test looks good.

 +1 (non-binding)

 On Thu, Jul 2, 2015 at 8:39 PM, Aman Sinha asi...@maprtech.com wrote:

  (Followup to my previous email). I ran several queries against  TPCH  SF1
  on my Mac and did not find any issues, apart from the version # shown in
  sqlline (which I think is a non-blocker).
 
  +1  (binding)
 
  Aman
 
  On Thu, Jul 2, 2015 at 8:36 PM, Hanifi GUNES hanifigu...@gmail.com
  wrote:
 
   * Jinfeng*
  
   *-  Verified checksum for both the source and binary tar files.*
  
   * Hanifi, Sudheesh*
  
   *- manually inspected maven repo- built a query submitter importing
   jdbc-all artifact from the repo at [jacques:3]*
  
   Is there a guideline on verifying maven artifacts besides inspecting
   published POMs or trying to use them? I could do that if someone points
  me.
  
  
   Thanks.
   -Hanifi
  
  
   2015-07-02 20:09 GMT-07:00 Ted Dunning ted.dunn...@gmail.com:
  
I haven't seen that anybody is checking signatures and the maven
   artifacts.
   
Is anybody doing that?  If not, the release should be held back until
   that
is done.
   
(I can't do it due to time pressure)
   
   
   
On Thu, Jul 2, 2015 at 6:58 PM, Aman Sinha asi...@maprtech.com
  wrote:
   
 Downloaded the binary tar-ball.  Installed on my macbook.  Started
sqlline
 in embedded mode. Saw that sqlline is showing version 1.0.0 instead
  of
 1.1.0, although 'select * from sys.version'  is showing the right
   commit.
 Anyone else sees this ?

 /sqlline -u jdbc:drill:zk=local -n admin -p admin --maxWidth=10
 ...
 apache drill 1.0.0
 just drill it



 On Thu, Jul 2, 2015 at 6:01 PM, Jason Altekruse 
altekruseja...@gmail.com
 wrote:

  +1 binding
 
  - downloaded and built the source tarball, all tests passed (on
 MAC
osx)
  - started sqlline, issued a few queries
  - tried a basic update of storage plugin from the web UI and
 looked
over
 a
  few query profiles
 
 
  On Thu, Jul 2, 2015 at 5:42 PM, Mehant Baid 
 baid.meh...@gmail.com
  
 wrote:
 
   +1 (binding)
  
   * Downloaded src tar-ball, was able to build and run unit tests
   successfully.
   * Brought up DrillBit in embedded and distributed mode.
   * Ran some TPC-H queries via Sqlline and the web UI.
   * Checked the UI for profiles
  
   Looks good.
  
   Thanks
   Mehant
  
  
  
   On 7/2/15 5:36 PM, Sudheesh Katkam wrote:
  
   +1 (non-binding)
  
   * downloaded binary tar-ball
   * ran queries (including cancellations) in embedded mode on
 Mac;
  verified
   states in web UI
  
   * downloaded and built from source tar-ball; ran unit tests on
  Mac
   * ran queries (including cancellations) on a 3 node cluster;
verified
   states in web UI
  
   * built a Java query submitter that uses the maven artifacts
  
   Thanks,
   Sudheesh
  
On Jul 2, 2015, at 4:06 PM, Hanifi Gunes 
 hgu...@maprtech.com
 wrote:
  
   - fully built and tested Drill from source on CentOS
   - deployed on 3 nodes
   - ran concurrent queries
   - manually inspected maven repo
   - built a Scala query submitter importing jdbc-all artifact
  from
the
  repo
   at [jacques:3]
  
   overall, great job!
  
   +1 (binding)
  
   On Thu, Jul 2, 2015 at 3:16 PM, rahul challapalli 
   challapallira...@gmail.com wrote:
  
+1 (non-binding)
  
   Tested the new CTAS auto partition feature
   Published jdbc-all artifact looks good as well
  
   I am able to add the staged jdbc-all package as a dependency
  to
   my
   sample
   JDBC app's pom file and I was able to connect to my drill
cluster. I
   think
   this is a sufficient test for the published artifact.
  
   Part of the pom file below
  
    <repositories>
      <repository>
        <id>staged-releases</id>
        <url>http://repository.apache.org/content/repositories/orgapachedrill-1001</url>
      </repository>
    </repositories>
    <dependencies>
      <dependency>
        <groupId>org.apache.drill.exec</groupId>
        <artifactId>drill-jdbc-all</artifactId>
        <version>1.1.0</version>
      </dependency>
    </dependencies>
  
   - Rahul
  
   On Thu, Jul 2, 2015 at 2:02 PM, Parth Chandra 
 pchan...@maprtech.com
   wrote:
  
+1 (binding)
  
   Release looks good.
   Built 

Re: Some questions on UDFs

2015-07-04 Thread Ted Dunning
Holders are for input, I think.

Try the different kinds of writers.



On Sat, Jul 4, 2015 at 12:49 PM, Jim Bates jba...@maprtech.com wrote:

 Using a repeatedholder as a @param I've got working. I was working on a
 custom aggregator function using DrillAggFunc. In this I can do simple
 things but If I want to build a list values and do something with it in the
 final output method I think I need to use RepeatedHolders in the
 @Workspace. To do that I need to create a new one in the setup method. I
 can't get one built. They all require a BufferAllocator to be passed in to
 build it. I have not found a way to get an allocator yet. Any suggestions?

 On Sat, Jul 4, 2015 at 1:37 PM, Ted Dunning ted.dunn...@gmail.com wrote:

  If you look at the zip function in
  https://github.com/mapr-demos/simple-drill-functions you can have an
  example of building a structure.
 
  The basic idea is that your output is denoted as
 
  @Output
  BaseWriter.ComplexWriter writer;
 
  The pattern for building a list of lists of integers is like this:
 
  writer.setValueCount(n);
  ...
  BaseWriter.ListWriter outer = writer.rootAsList();
  outer.start(); // [ outer list
  ...
  // for each inner list
  BaseWriter.ListWriter inner = outer.list();
  inner.start();
  // for each inner list element
  inner.integer().writeInt(accessor.get(i));
  }
  inner.end();   // ] inner list
  }
  outer.end(); // ] outer list
 
 
 
  On Sat, Jul 4, 2015 at 10:29 AM, Jim Bates jba...@maprtech.com wrote:
 
   I have working aggregation and simple UDFs. I've been trying to
 document
   and understand each of the options available in a Drill UDF.
  Understanding
   the different FunctionScope's, the ones that are allowed, the ones that
  are
   not. The impact of different cost categories. The different  steps
 needed
   to understand handling any of the supported data types  and structures
 in
   drill.
  
   Here are a few of my current road blocks. Any pointers would be greatly
   appreciated.
  
  
  1. I've been trying to understand how to correctly use
 RepeatedHolders
  of whatever type. For this discussion lets start with a
  RepeatedBigIntHolder. I'm trying to figure out the best way to
 create
  a
   new
  one. I have not figured out where in the existing drill code someone
   does
  this. If I use a  RepeatedBigIntHolder as a Workspace object is is
  null
   to
  start with. I created a new one in the startup section of the udf
 but
   the
  vector was null. I can find no reference in creating a new
  BigIntVector.
  There is a way to create a BigIntVector and I did find an example of
  creating a new VarCharVector but I can't do that using the drill jar
   files
  from 1.0. The org.apache.drill.common.types.TypeProtos and
  the org.apache.drill.common.types.TypeProtos.MinorType classes do
 not
  appear to be accessible from the drill jar files.
  2. What is the best way to close out a UDF in the event it generates
  an
  exception? Are there specific steps one should follow to make a
 clean
   exit
  in a catch block that are beneficial to Drill?
  
 



Re: Some questions on UDFs

2015-07-04 Thread Jim Bates
Well... Converting from string to integers anyway... Too many 4th of July
Hot Dogs; going into nitrate overload. :)

I am pulling an array of string values from json data. The string values
are actually integers. I am converting to integers and summing each array
entry to the final tally.

On Sat, Jul 4, 2015 at 7:04 PM, Jim Bates jba...@maprtech.com wrote:

 Ted,

 Yes, I started out just getting a basic count to work. I am trying to keep
 the workflow as close to a basic user as possible. As such, I am building
 and using the MapR Apache Drill sandbox to test.


1. Always look at the drillbits.log file to see if drill had any
issues loading your UDF. That was where I learned that all workspace values
needed to be holders
   -
   - WARN  o.a.d.exec.expr.fn.FunctionConverter - Failure loading
   function class
   com.mapr.example.udfs.drill.MyDrillAggFunctions$MyLinearRegression1, 
 field
   xList. Aggregate function 'MyLinearRegression1' workspace variable 
 'xList'
   is of type 'interface
   org.apache.drill.exec.vector.complex.writer.BaseWriter$ComplexWriter'.
   Please change it to Holder type.
2. Error messages:
   - If you get an error in this format it means that Drill can not
   find your function so it probably didn't load it. back to step 1:
  -
  - PARSE ERROR: From line 1, column 8 to line 1, column 44: No
  match found for function signature MyFunctionName(ANY)
   - If you get an error in this format it means that the function is
   there but Drill could not find a signature that matched the param types 
 or
   param numbers you were passing it. The exact wording will change but
   the Missing function implementation is the key phrase to look for:
  -
  - Error: SYSTEM ERROR:
  org.apache.drill.exec.exception.SchemaChangeException: Failure while 
 trying
  to materialize incoming schema.  Errors:
  - Error in expression at index -1.  Error: Missing function
  implementation: [castBIGINT(VARCHAR-REPEATED)].  Full expression: 
 --UNKNOWN
  EXPRESSION--
   3. In your function definition for aggregate functions you need to
set null processing to internal and your isRandom to false. Example below:
   -
   - @FunctionTemplate(name = MyFunctionName, scope =
   FunctionTemplate.FunctionScope.POINT_AGGREGATE, nulls =
   FunctionTemplate.NullHandling.INTERNAL, isRandom = false,
   isBinaryCommutative = false, costCategory =
   FunctionTemplate.FunctionCostCategory.COMPLEX)

 Below is an example from the Apache Drill tutorial data sets contained in
 the MapR Apache Drill sandbox. I am pulling an array if string values from
 json data. The string values are actually integers. I am converting to
 string and summing each array entry to the final tally. This in no way
 represents what this data was for but it did become a handy way for me to
 peck out the correct way to build an aggregation UDF function

 @FunctionTemplate(name = MyArraySum, scope =
 FunctionTemplate.FunctionScope.POINT_AGGREGATE, nulls =
 FunctionTemplate.NullHandling.INTERNAL, isRandom = false,
 isBinaryCommutative = false, costCategory =
 FunctionTemplate.FunctionCostCategory.COMPLEX)
 public static class MyArraySum implements DrillAggFunc {

 @Param RepeatedVarCharHolder listToSearch;
 @Workspace NullableBigIntHolder count;
 @Workspace NullableBigIntHolder sum;
 @Workspace NullableVarCharHolder vc;
 @Output BigIntHolder out;

 @Override
 public void setup() {
 count.value=0;
 sum.value = 0;
 }

 @Override
 public void add() {
 int c = listToSearch.end - listToSearch.start;
 int val = 0;
 try {
 for(int i=0; ic; i++){
 listToSearch.vector.getAccessor().get(i, vc);
 String inputStr =
 org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(vc.start,
 vc.end, vc.buffer);
 val = Integer.parseInt(inputStr);
 sum.value = sum.value + val;
 }
 } catch (Exception e) {
 val = 0;
 }
 count.value = count.value + 1;
 }

 Example select statement:
 SELECT MyArraySum(my_arrays) FROM (SELECT t.trans_info.prod_id as
 my_arrays FROM `dfs.clicks`.`./clicks/clicks.campaign.json` t limit 5);

 On Sat, Jul 4, 2015 at 6:22 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 Jim,

 I think that you may be having trouble with aggregators in general.

 Have you been able to build *any* aggregator of anything?  I haven't.

 When I try to build an aggregator of int's or doubles, I get a very
 persistent problem with Drill even seeing my aggregates:

 0: jdbc:drill:zk=local *select sum_int(employee_id) from
 cp.`employee.json`;*

 Jul 04, 2015 4:19:35 PM
 org.apache.calcite.sql.validate.SqlValidatorException init

 SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match
 found for function signature sum_int(ANY)

 Jul 04, 2015 4:19:35 PM org.apache.calcite.runtime.CalciteException init

 SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 

Re: Some questions on UDFs

2015-07-04 Thread Jim Bates
I still have issues finding the correct way to create and use a
RepeatedHolder, and Writers are a non-starter for Workspace values. I can
make do with creating a concatenated string in a VarCharHolder for small
data sets to get past this in the short term and finish testing the output
values I expect, but I won't be able to run at any scale until I figure out
how to make a repeated list.

On Sat, Jul 4, 2015 at 7:12 PM, Jim Bates jba...@maprtech.com wrote:

 Well... Converting from string to integers anyway... To many 4th of July
 Hot Dogs. going into nitrate overload. :)

 I am pulling an array of string values from json data. The string values
 are actually integers. I am converting to integers and summing each array
 entry to the final tally.

 On Sat, Jul 4, 2015 at 7:04 PM, Jim Bates jba...@maprtech.com wrote:

 Ted,

 Yes, I started out just getting a basic count to work. I am trying to
 keep the workflow as close to a basic user as possible. As such, I am
 building and using the MapR Apache Drill sandbox to test.


1. Always look at the drillbits.log file to see if drill had any
issues loading your UDF. That was where I learned that all workspace 
 values
needed to be holders
   -
   - WARN  o.a.d.exec.expr.fn.FunctionConverter - Failure loading
   function class
   com.mapr.example.udfs.drill.MyDrillAggFunctions$MyLinearRegression1, 
 field
   xList. Aggregate function 'MyLinearRegression1' workspace variable 
 'xList'
   is of type 'interface
   org.apache.drill.exec.vector.complex.writer.BaseWriter$ComplexWriter'.
   Please change it to Holder type.
2. Error messages:
   - If you get an error in this format it means that Drill can not
   find your function so it probably didn't load it. back to step 1:
  -
  - PARSE ERROR: From line 1, column 8 to line 1, column 44: No
  match found for function signature MyFunctionName(ANY)
   - If you get an error in this format it means that the function is
   there but Drill could not find a signature that matched the param 
 types or
   param numbers you were passing it. The exact wording will change but
   the Missing function implementation is the key phrase to look for:
  -
  - Error: SYSTEM ERROR:
  org.apache.drill.exec.exception.SchemaChangeException: Failure 
 while trying
  to materialize incoming schema.  Errors:
  - Error in expression at index -1.  Error: Missing function
  implementation: [castBIGINT(VARCHAR-REPEATED)].  Full expression: 
 --UNKNOWN
  EXPRESSION--
   3. In your function definition for aggregate functions you need to
set null processing to internal and your isRandom to false. Example below:
   -
   - @FunctionTemplate(name = MyFunctionName, scope =
   FunctionTemplate.FunctionScope.POINT_AGGREGATE, nulls =
   FunctionTemplate.NullHandling.INTERNAL, isRandom = false,
   isBinaryCommutative = false, costCategory =
   FunctionTemplate.FunctionCostCategory.COMPLEX)

 Below is an example from the Apache Drill tutorial data sets contained in
 the MapR Apache Drill sandbox. I am pulling an array if string values from
 json data. The string values are actually integers. I am converting to
 string and summing each array entry to the final tally. This in no way
 represents what this data was for but it did become a handy way for me to
 peck out the correct way to build an aggregation UDF function

 @FunctionTemplate(name = MyArraySum, scope =
 FunctionTemplate.FunctionScope.POINT_AGGREGATE, nulls =
 FunctionTemplate.NullHandling.INTERNAL, isRandom = false,
 isBinaryCommutative = false, costCategory =
 FunctionTemplate.FunctionCostCategory.COMPLEX)
 public static class MyArraySum implements DrillAggFunc {

 @Param RepeatedVarCharHolder listToSearch;
 @Workspace NullableBigIntHolder count;
 @Workspace NullableBigIntHolder sum;
 @Workspace NullableVarCharHolder vc;
 @Output BigIntHolder out;

 @Override
 public void setup() {
 count.value=0;
 sum.value = 0;
 }

 @Override
 public void add() {
 int c = listToSearch.end - listToSearch.start;
 int val = 0;
 try {
 for(int i=0; ic; i++){
 listToSearch.vector.getAccessor().get(i, vc);
 String inputStr =
 org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(vc.start,
 vc.end, vc.buffer);
 val = Integer.parseInt(inputStr);
 sum.value = sum.value + val;
 }
 } catch (Exception e) {
 val = 0;
 }
 count.value = count.value + 1;
 }

 Example select statement:
 SELECT MyArraySum(my_arrays) FROM (SELECT t.trans_info.prod_id as
 my_arrays FROM `dfs.clicks`.`./clicks/clicks.campaign.json` t limit 5);

 On Sat, Jul 4, 2015 at 6:22 PM, Ted Dunning ted.dunn...@gmail.com
 wrote:

 Jim,

 I think that you may be having trouble with aggregators in general.

 Have you been able to build *any* aggregator of anything?  I haven't.

 When I try to build an aggregator of int's or doubles, I get a very
 persistent problem with 

[jira] [Resolved] (DRILL-3329) Place the Drill JDBC Driver in a Public Maven Repository

2015-07-04 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved DRILL-3329.
---
Resolution: Fixed

This has been resolved for release 1.1.0

You can reference the driver using:

<dependency>
  <groupId>org.apache.drill.exec</groupId>
  <artifactId>drill-jdbc-all</artifactId>
  <version>1.1.0</version>
</dependency>

It is available in the Apache repo and should propagate to Maven central 
shortly.

 Place the Drill JDBC Driver in a Public Maven Repository
 

 Key: DRILL-3329
 URL: https://issues.apache.org/jira/browse/DRILL-3329
 Project: Apache Drill
  Issue Type: Improvement
  Components: Client - JDBC
Affects Versions: 1.0.0
Reporter: Paul Curtis
Assignee: Daniel Barclay (Drill)
Priority: Minor
  Labels: maven
 Fix For: 1.1.0


 Building Java projects utilizing Drill would be greatly enhanced if the Drill 
 JDBC driver was available in a public Maven repository. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Some questions on UDFs

2015-07-04 Thread Jim Bates
I did get a new RepeatedBigIntHolder built and added a BigIntVector to it.
I'll try it in the UDF tomorrow and see if there is a difference between the
two ways I found to get a BufferAllocator.

...
@Inject DrillBuf buffer;
@Workspace RepeatedBigIntHolder yList;
...

@Override
public void setup() {
    ...
    // org.apache.drill.exec.memory.BufferAllocator allocator = buffer.getAllocator();
    org.apache.drill.exec.memory.BufferAllocator allocator =
        new org.apache.drill.exec.memory.TopLevelAllocator();
    yList = new RepeatedBigIntHolder();
    yList.vector = new org.apache.drill.exec.vector.BigIntVector(
        org.apache.drill.exec.record.MaterializedField.create(
            new org.apache.drill.common.expression.SchemaPath("bigints",
                org.apache.drill.common.expression.ExpressionPosition.UNKNOWN),
            org.apache.drill.common.types.Types.optional(
                org.apache.drill.common.types.TypeProtos.MinorType.BIGINT)),
        allocator);
    ...
}



On Sat, Jul 4, 2015 at 7:39 PM, Jim Bates jba...@maprtech.com wrote:

 I still have issues finding the correct way to create and use a
 RepeatedHolder and Writers are a non starter for Workspace values. I can
 make do with creating a concatenated string in a VarCharHolder for small
 data sets to get past this in the short term and finish testing the output
 values I expect but won't be able to do any scale till I figure out how to
 make a repeated list.

 On Sat, Jul 4, 2015 at 7:12 PM, Jim Bates jba...@maprtech.com wrote:

 Well... Converting from string to integers anyway... To many 4th of July
 Hot Dogs. going into nitrate overload. :)

 I am pulling an array of string values from json data. The string values
 are actually integers. I am converting to integers and summing each
 array entry to the final tally.

 On Sat, Jul 4, 2015 at 7:04 PM, Jim Bates jba...@maprtech.com wrote:

 Ted,

 Yes, I started out just getting a basic count to work. I am trying to
 keep the workflow as close to a basic user as possible. As such, I am
 building and using the MapR Apache Drill sandbox to test.


1. Always look at the drillbits.log file to see if drill had any
issues loading your UDF. That was where I learned that all workspace 
 values
needed to be holders
   -
   - WARN  o.a.d.exec.expr.fn.FunctionConverter - Failure loading
   function class
   com.mapr.example.udfs.drill.MyDrillAggFunctions$MyLinearRegression1, 
 field
   xList. Aggregate function 'MyLinearRegression1' workspace variable 
 'xList'
   is of type 'interface
   org.apache.drill.exec.vector.complex.writer.BaseWriter$ComplexWriter'.
   Please change it to Holder type.
2. Error messages:
   - If you get an error in this format it means that Drill can not
   find your function so it probably didn't load it. back to step 1:
  -
  - PARSE ERROR: From line 1, column 8 to line 1, column 44: No
  match found for function signature MyFunctionName(ANY)
   - If you get an error in this format it means that the function
   is there but Drill could not find a signature that matched the param 
 types
   or param numbers you were passing it. The exact wording will change 
 but
   the Missing function implementation is the key phrase to look for:
  -
  - Error: SYSTEM ERROR:
  org.apache.drill.exec.exception.SchemaChangeException: Failure 
 while trying
  to materialize incoming schema.  Errors:
  - Error in expression at index -1.  Error: Missing function
  implementation: [castBIGINT(VARCHAR-REPEATED)].  Full expression: 
 --UNKNOWN
  EXPRESSION--
   3. In your function definition for aggregate functions you need
to set null processing to internal and your isRandom to false. Example
below:
   -
   - @FunctionTemplate(name = MyFunctionName, scope =
   FunctionTemplate.FunctionScope.POINT_AGGREGATE, nulls =
   FunctionTemplate.NullHandling.INTERNAL, isRandom = false,
   isBinaryCommutative = false, costCategory =
   FunctionTemplate.FunctionCostCategory.COMPLEX)

 Below is an example from the Apache Drill tutorial data sets contained
 in the MapR Apache Drill sandbox. I am pulling an array if string values
 from json data. The string values are actually integers. I am converting to
 string and summing each array entry to the final tally. This in no way
 represents what this data was for but it did become a handy way for me to
 peck out the correct way to build an aggregation UDF function

 @FunctionTemplate(name = MyArraySum, scope =
 FunctionTemplate.FunctionScope.POINT_AGGREGATE, nulls =
 FunctionTemplate.NullHandling.INTERNAL, isRandom = false,
 isBinaryCommutative = false, costCategory =
 FunctionTemplate.FunctionCostCategory.COMPLEX)
 public static class MyArraySum implements DrillAggFunc {

 @Param RepeatedVarCharHolder listToSearch;
 @Workspace NullableBigIntHolder count;
 @Workspace NullableBigIntHolder sum;
 @Workspace NullableVarCharHolder vc;
 @Output BigIntHolder out;

 @Override
 

Re: Some questions on UDFs

2015-07-04 Thread Jim Bates
I'm working on the same thing. I want to aggregate a list of values. It has
been a search-and-guess game for the most part. I'm still stuck in the
process of getting the values all into a list. The writers look interesting,
but for aggregation functions it looks like the input is the param, and the
output objects can't hold the aggregation steps; the Workspace is where that
happens. If I try to use a Writer in a workspace it won't load and tells me
to change it to Holders, which is why I was using them to start with. Maybe
I'm missing the architecture of the agg function. It looked like it was:

@Param comes in -> initialize @Workspace vars in setup() -> process data
through @Workspace vars in add() -> finalize @Output in output().
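
That is the lifecycle the interface implies. A minimal, hedged skeleton of
that shape, written like Jim's examples as a static inner class and following
the same holder-only workspace rule (the name and types are illustrative):

@FunctionTemplate(name = "my_sum",                            // illustrative name
    scope = FunctionTemplate.FunctionScope.POINT_AGGREGATE,
    nulls = FunctionTemplate.NullHandling.INTERNAL)
public static class MySum implements DrillAggFunc {

    @Param     BigIntHolder in;     // @Param comes in
    @Workspace BigIntHolder sum;    // accumulated across add() calls
    @Output    BigIntHolder out;    // finalized in output()

    public void setup()  { sum.value = 0; }           // initialize @Workspace vars
    public void add()    { sum.value += in.value; }   // process data through @Workspace vars
    public void output() { out.value = sum.value; }   // finalize @Output
    public void reset()  { sum.value = 0; }           // called between aggregation groups
}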

So I'm back to trying to figure out how to create a RepeatedBigIntHolder or
a RepeatedVarCharHolder...



On Sat, Jul 4, 2015 at 4:53 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 I am working on trying to build any kind of list constructing aggregator
 and having absolute fits.

 To simplify life, I decided to just build a generic list builder that is a
 scalar function that returns a list containing its argument.  Thus zoop(3)
 = [3], zoop('abc') = 'abc' and zoop([1,2,3]) = [[1,2,3]].

 The ComplexWriter looks like the place to go. As usual, the complete lack
 of comments in most of Drill makes this very hard since I have to guess
 what works and what doesn't.

 In my code, I note that ComplexWriter has a nice rootAsList() method.  I
 used this in zip and it works nicely to construct lists for output.  I note
 that the resulting ListWriter has a method copyReader(FieldReader var1)
 which looks really good.

 Unfortunately, the only implementation of copyReader() is in
 AbstractFieldWriter and it looks this:

 public void copyReader(FieldReader reader) {
 this.fail(Copy FieldReader);
 }

 I would like to formally say at this point WTF?

 In digging in further, I see other methods that look handy like

 public void write(IntHolder holder) {
 this.fail(Int);
 }

 And then in looking at implementations, it looks like there is a
 combinatorial explosion because every type seems to need a write method for
 every other type.

 What is the thought here?  How can I copy an arbitrary value into a list?

 My next thought was to build code that dispatches on type.  There is a
 method called getType() on the FieldReader.  Unfortunately, that drives
 into code generated by protoc and I see no way to dispatch on the type of
 an incoming value.


 How is this supposed to work?




 On Sat, Jul 4, 2015 at 2:14 PM, mehant baid baid.meh...@gmail.com wrote:

  For a detailed example on using ComplexWriter interface you can take a
 look
  at the Mappify
  
 
 https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Mappify.java
  
  (kvgen) function. The function itself is very simple however it makes use
  of the utility methods in MappifyUtility
  
 
 https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/MappifyUtility.java
  
  and MapUtility
  
 
 https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapUtility.java
  
  which perform most of the work.
 
  Currently we don't have a generic infrastructure to handle errors coming
  out of functions. However there is UserException, which when raised will
  make sure that Drill does not gobble up the error message in that
  exception. So you can probably throw a UserException with the failing
 input
  in your function to make sure it propagates to the user.
 
  Thanks
  Mehant
 
  On Sat, Jul 4, 2015 at 1:48 PM, Jacques Nadeau jacq...@apache.org
 wrote:
 
   *Holders are for both input and output.  You can also use CompleWriter
  for
   output and FieldReader for input if you want to write or read a complex
   value.
  
   I don't think we've provided a really clean way to construct a
   Repeated*Holder for output purposes.  You can probably do it by
 reaching
   into a bunch of internal interfaces in Drill.  However, I would
 recommend
   using the ComplexWriter output pattern for now.  This will be a little
  less
   efficient but substantially less brittle.  I suggest you open up a jira
  for
   using a Repeated*Holder as an output.
  
   On Sat, Jul 4, 2015 at 1:38 PM, Ted Dunning ted.dunn...@gmail.com
  wrote:
  
Holders are for input, I think.
   
Try the different kinds of writers.
   
   
   
On Sat, Jul 4, 2015 at 12:49 PM, Jim Bates jba...@maprtech.com
  wrote:
   
 Using a repeatedholder as a @param I've got working. I was working
  on a
 custom aggregator function using DrillAggFunc. In this I can do
  simple
 things but If I want to build a list values and do something with
 it
  in
the
 final output method I think I need to use RepeatedHolders in the
 @Workspace. To do that I need to create a new one in the setup
  method.
   I
 can't 

Re: Some questions on UDFs

2015-07-04 Thread Jim Bates
Ted,

Yes, I started out just getting a basic count to work. I am trying to keep
the workflow as close to a basic user as possible. As such, I am building
and using the MapR Apache Drill sandbox to test.


   1. Always look at the drillbits.log file to see if Drill had any issues
   loading your UDF. That was where I learned that all workspace values needed
   to be holders:

      WARN  o.a.d.exec.expr.fn.FunctionConverter - Failure loading function class
      com.mapr.example.udfs.drill.MyDrillAggFunctions$MyLinearRegression1, field
      xList. Aggregate function 'MyLinearRegression1' workspace variable 'xList'
      is of type 'interface
      org.apache.drill.exec.vector.complex.writer.BaseWriter$ComplexWriter'.
      Please change it to Holder type.

   2. Error messages:
      - If you get an error in this format, it means that Drill cannot find
      your function, so it probably didn't load it; go back to step 1:

        PARSE ERROR: From line 1, column 8 to line 1, column 44: No match
        found for function signature MyFunctionName(ANY)

      - If you get an error in this format, it means that the function is
      there but Drill could not find a signature that matched the param types
      or param numbers you were passing it. The exact wording will change, but
      "Missing function implementation" is the key phrase to look for:

        Error: SYSTEM ERROR: org.apache.drill.exec.exception.SchemaChangeException:
        Failure while trying to materialize incoming schema.  Errors:
        Error in expression at index -1.  Error: Missing function implementation:
        [castBIGINT(VARCHAR-REPEATED)].  Full expression: --UNKNOWN EXPRESSION--

   3. In your function definition for aggregate functions you need to set
   null processing to INTERNAL and isRandom to false. Example below:

      @FunctionTemplate(name = "MyFunctionName", scope =
      FunctionTemplate.FunctionScope.POINT_AGGREGATE, nulls =
      FunctionTemplate.NullHandling.INTERNAL, isRandom = false,
      isBinaryCommutative = false, costCategory =
      FunctionTemplate.FunctionCostCategory.COMPLEX)

Below is an example from the Apache Drill tutorial data sets contained in
the MapR Apache Drill sandbox. I am pulling an array of string values from
json data. The string values are actually integers. I am converting to
string and summing each array entry into the final tally. This in no way
represents what this data was for, but it did become a handy way for me to
peck out the correct way to build an aggregation UDF.

@FunctionTemplate(name = "MyArraySum", scope =
FunctionTemplate.FunctionScope.POINT_AGGREGATE, nulls =
FunctionTemplate.NullHandling.INTERNAL, isRandom = false,
isBinaryCommutative = false, costCategory =
FunctionTemplate.FunctionCostCategory.COMPLEX)
public static class MyArraySum implements DrillAggFunc {

    @Param RepeatedVarCharHolder listToSearch;
    @Workspace NullableBigIntHolder count;
    @Workspace NullableBigIntHolder sum;
    @Workspace NullableVarCharHolder vc;
    @Output BigIntHolder out;

    @Override
    public void setup() {
        count.value = 0;
        sum.value = 0;
    }

    @Override
    public void add() {
        int c = listToSearch.end - listToSearch.start;
        int val = 0;
        try {
            for (int i = 0; i < c; i++) {
                listToSearch.vector.getAccessor().get(i, vc);
                String inputStr =
                    org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(
                        vc.start, vc.end, vc.buffer);
                val = Integer.parseInt(inputStr);
                sum.value = sum.value + val;
            }
        } catch (Exception e) {
            val = 0;
        }
        count.value = count.value + 1;
    }

Example select statement:
SELECT MyArraySum(my_arrays) FROM (SELECT t.trans_info.prod_id as my_arrays
FROM `dfs.clicks`.`./clicks/clicks.campaign.json` t limit 5);
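
The example above stops after add(); for completeness, a hedged sketch of the
output() and reset() methods (and the closing brace) that DrillAggFunc also
requires. These lines are not part of the original message:

@Override
public void output() {
    out.value = sum.value;    // emit the accumulated total
}

@Override
public void reset() {
    count.value = 0;
    sum.value = 0;
}
}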

On Sat, Jul 4, 2015 at 6:22 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 Jim,

 I think that you may be having trouble with aggregators in general.

 Have you been able to build *any* aggregator of anything?  I haven't.

 When I try to build an aggregator of int's or doubles, I get a very
 persistent problem with Drill even seeing my aggregates:

 0: jdbc:drill:zk=local *select sum_int(employee_id) from
 cp.`employee.json`;*

 Jul 04, 2015 4:19:35 PM
 org.apache.calcite.sql.validate.SqlValidatorException init

 SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match
 found for function signature sum_int(ANY)

 Jul 04, 2015 4:19:35 PM org.apache.calcite.runtime.CalciteException init

 SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
 column 8 to line 1, column 27: No match found for function signature
 sum_int(ANY)

 *Error: PARSE ERROR: From line 1, column 8 to line 1, column 27: No match
 found for function signature sum_int(ANY)*

 *[Error Id: 91b78fa6-6dd1-4214-a85f-c2bf2c393145 on 10.0.1.2:31010
 http://10.0.1.2:31010] (state=,code=0)*

 0: jdbc:drill:zk=local *select sum_int(cast(employee_id as int)) from
 cp.`employee.json`*;

 Jul 04, 2015 4:19:45 PM
 

Re: Some questions on UDFs

2015-07-04 Thread Ted Dunning
Jim,

I think that you may be having trouble with aggregators in general.

Have you been able to build *any* aggregator of anything?  I haven't.

When I try to build an aggregator of int's or doubles, I get a very
persistent problem with Drill even seeing my aggregates:

0: jdbc:drill:zk=local *select sum_int(employee_id) from
cp.`employee.json`;*

Jul 04, 2015 4:19:35 PM
org.apache.calcite.sql.validate.SqlValidatorException init

SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match
found for function signature sum_int(ANY)

Jul 04, 2015 4:19:35 PM org.apache.calcite.runtime.CalciteException init

SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
column 8 to line 1, column 27: No match found for function signature
sum_int(ANY)

*Error: PARSE ERROR: From line 1, column 8 to line 1, column 27: No match
found for function signature sum_int(ANY)*

*[Error Id: 91b78fa6-6dd1-4214-a85f-c2bf2c393145 on 10.0.1.2:31010
http://10.0.1.2:31010] (state=,code=0)*

0: jdbc:drill:zk=local *select sum_int(cast(employee_id as int)) from
cp.`employee.json`*;

Jul 04, 2015 4:19:45 PM
org.apache.calcite.sql.validate.SqlValidatorException init

SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match
found for function signature sum_int(NUMERIC)

Jul 04, 2015 4:19:45 PM org.apache.calcite.runtime.CalciteException init

SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
column 8 to line 1, column 40: No match found for function signature
sum_int(NUMERIC)

*Error: PARSE ERROR: From line 1, column 8 to line 1, column 40: No match
found for function signature sum_int(NUMERIC)*

*[Error Id: f649fc85-6b6a-4468-9a4f-bfef0b23d06b on 10.0.1.2:31010
http://10.0.1.2:31010] (state=,code=0)*

0: jdbc:drill:zk=local


It looks like there is some undocumented subtlety about how to register an
aggregator.

On Sat, Jul 4, 2015 at 4:08 PM, Jim Bates jba...@maprtech.com wrote:

 I'm working on the same thing. I want to aggregate a list of values. It has
 been a search and guess game for the most part. I'm still stuck in the
 process of getting the values all into a list. The writers look interesting
 but for aggregation functions  it looks like the input is the param and
 output objects can't hold the aggregations steps. The Workspace is where
 that happens. If I try and use a Writer in a workspace it won't load and
 tells me to change it to Holders which was why I was using them to start
 with. Maybe I'm missing the architecture of the agg function. It looked
 like it was

 @Param comes in - initialize @Workspace vars in setup - process data
 through @Workspace vars in add - finalize @Output in output.

 So I'm back to trying to figure out how to create a RepeatedBigIntHolder or
 a RepeatedVarCharHolder...



 On Sat, Jul 4, 2015 at 4:53 PM, Ted Dunning ted.dunn...@gmail.com wrote:

  I am working on trying to build any kind of list constructing aggregator
  and having absolute fits.
 
  To simplify life, I decided to just build a generic list builder that is
 a
  scalar function that returns a list containing its argument.  Thus
 zoop(3)
  = [3], zoop('abc') = 'abc' and zoop([1,2,3]) = [[1,2,3]].
 
  The ComplexWriter looks like the place to go. As usual, the complete lack
  of comments in most of Drill makes this very hard since I have to guess
  what works and what doesn't.
 
  In my code, I note that ComplexWriter has a nice rootAsList() method.  I
  used this in zip and it works nicely to construct lists for output.  I
 note
  that the resulting ListWriter has a method copyReader(FieldReader var1)
  which looks really good.
 
  Unfortunately, the only implementation of copyReader() is in
  AbstractFieldWriter and it looks this:
 
  public void copyReader(FieldReader reader) {
  this.fail(Copy FieldReader);
  }
 
  I would like to formally say at this point WTF?
 
  In digging in further, I see other methods that look handy like
 
  public void write(IntHolder holder) {
  this.fail(Int);
  }
 
  And then in looking at implementations, it looks like there is a
  combinatorial explosion because every type seems to need a write method
 for
  every other type.
 
  What is the thought here?  How can I copy an arbitrary value into a list?
 
  My next thought was to build code that dispatches on type.  There is a
  method called getType() on the FieldReader.  Unfortunately, that drives
  into code generated by protoc and I see no way to dispatch on the type of
  an incoming value.
 
 
  How is this supposed to work?
 
 
 
 
  On Sat, Jul 4, 2015 at 2:14 PM, mehant baid baid.meh...@gmail.com
 wrote:
 
   For a detailed example on using ComplexWriter interface you can take a
  look
   at the Mappify
   
  
 
 https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Mappify.java
   
   (kvgen) function. The function itself is very simple however it makes
 use
   of the utility methods in MappifyUtility