Re: Some questions on UDFs
Using a RepeatedHolder as a @Param I've got working. I was working on a custom aggregator function using DrillAggFunc. There I can do simple things, but if I want to build a list of values and do something with it in the final output method, I think I need to use RepeatedHolders in the @Workspace. To do that I need to create a new one in the setup method, and I can't get one built. They all require a BufferAllocator to be passed in to build them, and I have not found a way to get an allocator yet. Any suggestions?

On Sat, Jul 4, 2015 at 1:37 PM, Ted Dunning ted.dunn...@gmail.com wrote: If you look at the zip function in https://github.com/mapr-demos/simple-drill-functions [...]

On Sat, Jul 4, 2015 at 10:29 AM, Jim Bates jba...@maprtech.com wrote: I have working aggregation and simple UDFs. [...]
Some questions on UDFs
I have working aggregation and simple UDFs. I've been trying to document and understand each of the options available in a Drill UDF: the different FunctionScopes, which ones are allowed and which are not; the impact of the different cost categories; and the steps needed to handle any of the supported data types and structures in Drill. Here are a few of my current roadblocks. Any pointers would be greatly appreciated.

1. I've been trying to understand how to correctly use RepeatedHolders of whatever type. For this discussion let's start with a RepeatedBigIntHolder. I'm trying to figure out the best way to create a new one, and I have not figured out where in the existing Drill code someone does this. If I use a RepeatedBigIntHolder as a Workspace object, it is null to start with. I created a new one in the startup section of the UDF, but the vector was null. I can find no reference on creating a new BigIntVector. There is a way to create a BigIntVector, and I did find an example of creating a new VarCharVector, but I can't do that using the Drill jar files from 1.0. The org.apache.drill.common.types.TypeProtos and org.apache.drill.common.types.TypeProtos.MinorType classes do not appear to be accessible from the Drill jar files.

2. What is the best way to close out a UDF in the event it generates an exception? Are there specific steps one should follow to make a clean exit in a catch block that are beneficial to Drill?
Re: Some questions on UDFs
*Holders are for both input and output. You can also use ComplexWriter for output and FieldReader for input if you want to write or read a complex value. I don't think we've provided a really clean way to construct a Repeated*Holder for output purposes. You can probably do it by reaching into a bunch of internal interfaces in Drill. However, I would recommend using the ComplexWriter output pattern for now. This will be a little less efficient but substantially less brittle. I suggest you open a JIRA for using a Repeated*Holder as an output.

On Sat, Jul 4, 2015 at 1:38 PM, Ted Dunning ted.dunn...@gmail.com wrote: Holders are for input, I think. [...]

On Sat, Jul 4, 2015 at 12:49 PM, Jim Bates jba...@maprtech.com wrote: Using a RepeatedHolder as a @Param I've got working. [...]

On Sat, Jul 4, 2015 at 1:37 PM, Ted Dunning ted.dunn...@gmail.com wrote: If you look at the zip function in https://github.com/mapr-demos/simple-drill-functions [...]

On Sat, Jul 4, 2015 at 10:29 AM, Jim Bates jba...@maprtech.com wrote: I have working aggregation and simple UDFs. [...]
Re: Some questions on UDFs
If you look at the zip function in https://github.com/mapr-demos/simple-drill-functions you can see an example of building a structure. The basic idea is that your output is denoted as

    @Output BaseWriter.ComplexWriter writer;

The pattern for building a list of lists of integers is like this:

    writer.setValueCount(n);
    ...
    BaseWriter.ListWriter outer = writer.rootAsList();
    outer.start();                       // [ outer list
    ...
    // for each inner list {
        BaseWriter.ListWriter inner = outer.list();
        inner.start();
        // for each inner list element {
            inner.integer().writeInt(accessor.get(i));
        }
        inner.end();                     // ] inner list
    }
    outer.end();                         // ] outer list

On Sat, Jul 4, 2015 at 10:29 AM, Jim Bates jba...@maprtech.com wrote: I have working aggregation and simple UDFs. [...]
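The start()/end() bracketing in the list-of-lists pattern can be tried without a Drill classpath. What follows is a minimal plain-Java sketch, assuming a hypothetical ListSketch class that stands in for BaseWriter.ListWriter. It is not Drill's API; it only mimics the call order (start, then write elements or open inner lists, then end) and rejects writes outside an open list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a list writer; NOT Drill's BaseWriter.ListWriter.
final class ListSketch {
    private final List<Object> items = new ArrayList<>();
    private boolean open = false;
    private boolean closed = false;

    // Opens the list; corresponds to the "[" in the pattern.
    void start() {
        if (open || closed) {
            throw new IllegalStateException("start() may only be called once");
        }
        open = true;
    }

    // Appends one integer element to an open list.
    void writeInt(int value) {
        requireOpen();
        items.add(value);
    }

    // Creates an inner list as the next element of this list.
    ListSketch list() {
        requireOpen();
        ListSketch inner = new ListSketch();
        items.add(inner);
        return inner;
    }

    // Closes the list; corresponds to the "]" in the pattern.
    void end() {
        requireOpen();
        open = false;
        closed = true;
    }

    private void requireOpen() {
        if (!open) {
            throw new IllegalStateException("list is not open");
        }
    }

    @Override
    public String toString() {
        return items.toString();
    }

    // Builds a list of lists of integers using the same call order as the pattern.
    static ListSketch listOfLists(int[][] rows) {
        ListSketch outer = new ListSketch();
        outer.start();                      // [ outer list
        for (int[] row : rows) {
            ListSketch inner = outer.list();
            inner.start();                  // [ inner list
            for (int v : row) {
                inner.writeInt(v);
            }
            inner.end();                    // ] inner list
        }
        outer.end();                        // ] outer list
        return outer;
    }

    public static void main(String[] args) {
        System.out.println(listOfLists(new int[][] {{1, 2}, {3}}));
    }
}
```

The point of the sketch is only that every start() needs a matching end() at the same nesting level; how Drill's writers behave when that is violated is not something this stand-in can show.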
Re: Some questions on UDFs
For a detailed example of using the ComplexWriter interface you can take a look at the Mappify https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Mappify.java (kvgen) function. The function itself is very simple; however, it makes use of the utility methods in MappifyUtility https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/MappifyUtility.java and MapUtility https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapUtility.java which perform most of the work.

Currently we don't have a generic infrastructure to handle errors coming out of functions. However, there is UserException, which when raised will make sure that Drill does not gobble up the error message in that exception. So you can probably throw a UserException with the failing input in your function to make sure it propagates to the user.

Thanks
Mehant

On Sat, Jul 4, 2015 at 1:48 PM, Jacques Nadeau jacq...@apache.org wrote: *Holders are for both input and output. [...]

On Sat, Jul 4, 2015 at 1:38 PM, Ted Dunning ted.dunn...@gmail.com wrote: Holders are for input, I think. [...]

On Sat, Jul 4, 2015 at 12:49 PM, Jim Bates jba...@maprtech.com wrote: Using a RepeatedHolder as a @Param I've got working. [...]

On Sat, Jul 4, 2015 at 1:37 PM, Ted Dunning ted.dunn...@gmail.com wrote: If you look at the zip function in https://github.com/mapr-demos/simple-drill-functions [...]

On Sat, Jul 4, 2015 at 10:29 AM, Jim Bates jba...@maprtech.com wrote: I have working aggregation and simple UDFs. [...]
Re: Some questions on UDFs
Found the TypeProtos in the drill-protocol jar.

On Sat, Jul 4, 2015 at 12:29 PM, Jim Bates jba...@maprtech.com wrote: I have working aggregation and simple UDFs. [...]
Re: Some questions on UDFs
I am trying to build any kind of list-constructing aggregator and having absolute fits.

To simplify life, I decided to just build a generic list builder: a scalar function that returns a list containing its argument. Thus zoop(3) = [3], zoop('abc') = ['abc'] and zoop([1,2,3]) = [[1,2,3]].

The ComplexWriter looks like the place to go. As usual, the complete lack of comments in most of Drill makes this very hard, since I have to guess what works and what doesn't.

In my code, I note that ComplexWriter has a nice rootAsList() method. I used this in zip and it works nicely to construct lists for output. I note that the resulting ListWriter has a method copyReader(FieldReader var1) which looks really good. Unfortunately, the only implementation of copyReader() is in AbstractFieldWriter and it looks like this:

    public void copyReader(FieldReader reader) {
        this.fail("Copy FieldReader");
    }

I would like to formally say at this point: WTF?

In digging in further, I see other methods that look handy, like

    public void write(IntHolder holder) {
        this.fail("Int");
    }

And then, looking at implementations, it looks like there is a combinatorial explosion, because every type seems to need a write method for every other type. What is the thought here? How can I copy an arbitrary value into a list?

My next thought was to build code that dispatches on type. There is a method called getType() on the FieldReader. Unfortunately, that drives into code generated by protoc, and I see no way to dispatch on the type of an incoming value. How is this supposed to work?

On Sat, Jul 4, 2015 at 2:14 PM, mehant baid baid.meh...@gmail.com wrote: For a detailed example of using the ComplexWriter interface you can take a look at the Mappify (kvgen) function. [...]

On Sat, Jul 4, 2015 at 1:48 PM, Jacques Nadeau jacq...@apache.org wrote: *Holders are for both input and output. [...]

On Sat, Jul 4, 2015 at 1:38 PM, Ted Dunning ted.dunn...@gmail.com wrote: Holders are for input, I think. [...]

On Sat, Jul 4, 2015 at 12:49 PM, Jim Bates jba...@maprtech.com wrote: Using a RepeatedHolder as a @Param I've got working. [...]

On Sat, Jul 4, 2015 at 1:37 PM, Ted Dunning ted.dunn...@gmail.com wrote: If you look at the zip function in https://github.com/mapr-demos/simple-drill-functions [...]
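The "dispatch on type" idea from the zoop message can be sketched in plain Java. This is only an illustration under stated assumptions: the Kind enum and ZoopSketch class are hypothetical and stand in for whatever type tag FieldReader.getType() ultimately exposes; nothing here is Drill's API. Each case of the switch plays the role of one type-specific write method, which is why one branch per type is needed rather than one per pair of types.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of type dispatch; NOT Drill's FieldReader or ListWriter.
final class ZoopSketch {
    // Hypothetical type tags, standing in for a reader's minor type.
    enum Kind { INT, VARCHAR, LIST }

    // Maps a Java value to its tag; a real reader would report its own type.
    static Kind kindOf(Object value) {
        if (value instanceof Integer) {
            return Kind.INT;
        }
        if (value instanceof String) {
            return Kind.VARCHAR;
        }
        if (value instanceof List) {
            return Kind.LIST;
        }
        throw new IllegalArgumentException("unsupported value: " + value);
    }

    // Writes one value by dispatching on its tag; lists recurse per element.
    static void write(Object value, StringBuilder out) {
        switch (kindOf(value)) {
            case INT:
                out.append(((Integer) value).intValue());
                break;
            case VARCHAR:
                out.append('\'').append((String) value).append('\'');
                break;
            case LIST:
                out.append('[');
                boolean first = true;
                for (Object element : (List<?>) value) {
                    if (!first) {
                        out.append(',');
                    }
                    write(element, out);
                    first = false;
                }
                out.append(']');
                break;
        }
    }

    // zoop(x) renders a one-element list containing x.
    static String zoop(Object value) {
        StringBuilder out = new StringBuilder("[");
        write(value, out);
        return out.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(zoop(3));
        System.out.println(zoop("abc"));
        System.out.println(zoop(Arrays.asList(1, 2, 3)));
    }
}
```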
[VOTE][RESULT] Release Apache Drill 1.1.0 (rc0)
Looks like we have a release. I'll upload to dist and send out an annc tomorrow. Happy Fourth Everyone!

Final Tally:
7 x binding +1: Jinfeng, Parth, Hanifi, Mehant, Jason, Aman, Jacques
6 x non-binding +1: Hakim, Norris, Hsuan, Rahul, Sudheesh, Chun

On Fri, Jul 3, 2015 at 8:43 AM, Chun Chang cch...@maprtech.com wrote:
72 hour longevity test looks good. +1 (non-binding)

On Thu, Jul 2, 2015 at 8:39 PM, Aman Sinha asi...@maprtech.com wrote:
(Followup to my previous email). I ran several queries against TPCH SF1 on my Mac and did not find any issues, apart from the version # shown in sqlline (which I think is a non-blocker). +1 (binding)
Aman

On Thu, Jul 2, 2015 at 8:36 PM, Hanifi GUNES hanifigu...@gmail.com wrote:
Jinfeng
- Verified checksum for both the source and binary tar files.
Hanifi, Sudheesh
- manually inspected maven repo
- built a query submitter importing jdbc-all artifact from the repo at [jacques:3]
Is there a guideline on verifying maven artifacts besides inspecting published POMs or trying to use them? I could do that if someone points me. Thanks.
-Hanifi

2015-07-02 20:09 GMT-07:00 Ted Dunning ted.dunn...@gmail.com:
I haven't seen that anybody is checking signatures and the maven artifacts. Is anybody doing that? If not, the release should be held back until that is done. (I can't do it due to time pressure)

On Thu, Jul 2, 2015 at 6:58 PM, Aman Sinha asi...@maprtech.com wrote:
Downloaded the binary tar-ball. Installed on my macbook. Started sqlline in embedded mode. Saw that sqlline is showing version 1.0.0 instead of 1.1.0, although 'select * from sys.version' is showing the right commit. Anyone else sees this ?
/sqlline -u jdbc:drill:zk=local -n admin -p admin --maxWidth=10 ...
apache drill 1.0.0 just drill it

On Thu, Jul 2, 2015 at 6:01 PM, Jason Altekruse altekruseja...@gmail.com wrote:
+1 binding
- downloaded and built the source tarball, all tests passed (on MAC osx)
- started sqlline, issued a few queries
- tried a basic update of storage plugin from the web UI and looked over a few query profiles

On Thu, Jul 2, 2015 at 5:42 PM, Mehant Baid baid.meh...@gmail.com wrote:
+1 (binding)
* Downloaded src tar-ball, was able to build and run unit tests successfully.
* Brought up DrillBit in embedded and distributed mode.
* Ran some TPC-H queries via Sqlline and the web UI.
* Checked the UI for profiles
Looks good.
Thanks
Mehant

On 7/2/15 5:36 PM, Sudheesh Katkam wrote:
+1 (non-binding)
* downloaded binary tar-ball
* ran queries (including cancellations) in embedded mode on Mac; verified states in web UI
* downloaded and built from source tar-ball; ran unit tests on Mac
* ran queries (including cancellations) on a 3 node cluster; verified states in web UI
* built a Java query submitter that uses the maven artifacts
Thanks,
Sudheesh

On Jul 2, 2015, at 4:06 PM, Hanifi Gunes hgu...@maprtech.com wrote:
- fully built and tested Drill from source on CentOS
- deployed on 3 nodes
- ran concurrent queries
- manually inspected maven repo
- built a Scala query submitter importing jdbc-all artifact from the repo at [jacques:3]
overall, great job! +1 (binding)

On Thu, Jul 2, 2015 at 3:16 PM, rahul challapalli challapallira...@gmail.com wrote:
+1 (non-binding)
Tested the new CTAS auto partition feature. Published jdbc-all artifact looks good as well. I am able to add the staged jdbc-all package as a dependency to my sample JDBC app's pom file and I was able to connect to my drill cluster. I think this is a sufficient test for the published artifact.
Part of the pom file below:

    <repositories>
      <repository>
        <id>staged-releases</id>
        <url>http://repository.apache.org/content/repositories/orgapachedrill-1001</url>
      </repository>
    </repositories>
    <dependencies>
      <dependency>
        <groupId>org.apache.drill.exec</groupId>
        <artifactId>drill-jdbc-all</artifactId>
        <version>1.1.0</version>
      </dependency>
    </dependencies>

- Rahul

On Thu, Jul 2, 2015 at 2:02 PM, Parth Chandra pchan...@maprtech.com wrote: +1 (binding) Release looks good. Built
Re: Some questions on UDFs
Holders are for input, I think. Try the different kinds of writers.

On Sat, Jul 4, 2015 at 12:49 PM, Jim Bates jba...@maprtech.com wrote: Using a RepeatedHolder as a @Param I've got working. [...]

On Sat, Jul 4, 2015 at 1:37 PM, Ted Dunning ted.dunn...@gmail.com wrote: If you look at the zip function in https://github.com/mapr-demos/simple-drill-functions [...]

On Sat, Jul 4, 2015 at 10:29 AM, Jim Bates jba...@maprtech.com wrote: I have working aggregation and simple UDFs. [...]
Re: Some questions on UDFs
Well... converting from strings to integers anyway... Too many 4th of July hot dogs. Going into nitrate overload. :) I am pulling an array of string values from JSON data. The string values are actually integers. I am converting them to integers and adding each array entry to the final tally.

On Sat, Jul 4, 2015 at 7:04 PM, Jim Bates jba...@maprtech.com wrote:

Ted,

Yes, I started out just getting a basic count to work. I am trying to keep the workflow as close to a basic user's as possible. As such, I am building and using the MapR Apache Drill sandbox to test.

1. Always look at the drillbit.log file to see if Drill had any issues loading your UDF. That was where I learned that all workspace values needed to be holders:

    WARN o.a.d.exec.expr.fn.FunctionConverter - Failure loading function class com.mapr.example.udfs.drill.MyDrillAggFunctions$MyLinearRegression1, field xList. Aggregate function 'MyLinearRegression1' workspace variable 'xList' is of type 'interface org.apache.drill.exec.vector.complex.writer.BaseWriter$ComplexWriter'. Please change it to Holder type.

2. Error messages:

- If you get an error in this format, it means that Drill cannot find your function, so it probably didn't load it. Back to step 1:

    PARSE ERROR: From line 1, column 8 to line 1, column 44: No match found for function signature MyFunctionName(ANY)

- If you get an error in this format, it means that the function is there but Drill could not find a signature that matched the param types or the number of params you were passing it. The exact wording will change, but "Missing function implementation" is the key phrase to look for:

    Error: SYSTEM ERROR: org.apache.drill.exec.exception.SchemaChangeException: Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [castBIGINT(VARCHAR-REPEATED)]. Full expression: --UNKNOWN EXPRESSION--

3. In your function definition for aggregate functions you need to set null processing to internal and isRandom to false. Example below:

    @FunctionTemplate(name = "MyFunctionName",
        scope = FunctionTemplate.FunctionScope.POINT_AGGREGATE,
        nulls = FunctionTemplate.NullHandling.INTERNAL,
        isRandom = false,
        isBinaryCommutative = false,
        costCategory = FunctionTemplate.FunctionCostCategory.COMPLEX)

Below is an example from the Apache Drill tutorial data sets contained in the MapR Apache Drill sandbox. I am pulling an array of string values from JSON data. The string values are actually integers. I am converting to string and summing each array entry to the final tally. This in no way represents what this data was for, but it did become a handy way for me to peck out the correct way to build an aggregation UDF function.

    @FunctionTemplate(name = "MyArraySum",
        scope = FunctionTemplate.FunctionScope.POINT_AGGREGATE,
        nulls = FunctionTemplate.NullHandling.INTERNAL,
        isRandom = false,
        isBinaryCommutative = false,
        costCategory = FunctionTemplate.FunctionCostCategory.COMPLEX)
    public static class MyArraySum implements DrillAggFunc {
      @Param RepeatedVarCharHolder listToSearch;
      @Workspace NullableBigIntHolder count;
      @Workspace NullableBigIntHolder sum;
      @Workspace NullableVarCharHolder vc;
      @Output BigIntHolder out;

      @Override
      public void setup() {
        count.value = 0;
        sum.value = 0;
      }

      @Override
      public void add() {
        int c = listToSearch.end - listToSearch.start;
        int val = 0;
        try {
          for (int i = 0; i < c; i++) {
            // Index relative to this row's start offset in the underlying vector.
            listToSearch.vector.getAccessor().get(listToSearch.start + i, vc);
            String inputStr = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(vc.start, vc.end, vc.buffer);
            val = Integer.parseInt(inputStr);
            sum.value = sum.value + val;
          }
        } catch (Exception e) {
          // A bad entry ends processing of this row's remaining entries.
          val = 0;
        }
        count.value = count.value + 1;
      }
      // output() and reset() are not shown in this message.
    }

Example select statement:

    SELECT MyArraySum(my_arrays) FROM (SELECT t.trans_info.prod_id as my_arrays FROM `dfs.clicks`.`./clicks/clicks.campaign.json` t limit 5);

On Sat, Jul 4, 2015 at 6:22 PM, Ted Dunning ted.dunn...@gmail.com wrote:

Jim, I think that you may be having trouble with aggregators in general. Have you been able to build *any* aggregator of anything? I haven't. When I try to build an aggregator of ints or doubles, I get a very persistent problem with Drill even seeing my aggregates:

    0: jdbc:drill:zk=local> select sum_int(employee_id) from cp.`employee.json`;
    Jul 04, 2015 4:19:35 PM org.apache.calcite.sql.validate.SqlValidatorException <init>
    SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match found for function signature sum_int(ANY)
    Jul 04, 2015 4:19:35 PM org.apache.calcite.runtime.CalciteException <init>
    SEVERE: org.apache.calcite.runtime.CalciteContextException: From line
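The flow of the MyArraySum example (setup once, add once per row, parse each string entry, accumulate) can be exercised outside Drill. The following is a minimal plain-Java sketch, not Drill code: ArraySumSketch, its method names and the offsets array are assumptions made for illustration, with a byte buffer plus offsets standing in for the repeated VarChar vector and start/end marking one row's entries. The output() and reset() steps are guesses at the obvious completion, since the original message does not show them.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for an aggregate function's lifecycle; NOT Drill's DrillAggFunc.
final class ArraySumSketch {
    private long count;   // rows seen
    private long sum;     // running total of parsed entries

    void setup() {
        count = 0;
        sum = 0;
    }

    // Called once per row. Entry i occupies buffer[offsets[i] .. offsets[i + 1]),
    // and this row owns entries start (inclusive) to end (exclusive).
    void add(byte[] buffer, int[] offsets, int start, int end) {
        for (int i = start; i < end; i++) {
            int length = offsets[i + 1] - offsets[i];
            String text = new String(buffer, offsets[i], length, StandardCharsets.UTF_8);
            try {
                sum += Integer.parseInt(text.trim());
            } catch (NumberFormatException e) {
                // Skip only the bad entry and keep going with the rest of the row.
            }
        }
        count++;
    }

    // Assumed completion: emit the running total.
    long output() {
        return sum;
    }

    // Assumed completion: clear state before the next group.
    void reset() {
        setup();
    }

    // Packs rows into one buffer with offsets, then runs setup/add/output.
    static long sumRows(String[][] rows) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        List<Integer> offsetList = new ArrayList<>();
        offsetList.add(0);
        for (String[] row : rows) {
            for (String entry : row) {
                byte[] encoded = entry.getBytes(StandardCharsets.UTF_8);
                bytes.write(encoded, 0, encoded.length);
                offsetList.add(bytes.size());
            }
        }
        int[] offsets = offsetList.stream().mapToInt(Integer::intValue).toArray();
        byte[] buffer = bytes.toByteArray();

        ArraySumSketch agg = new ArraySumSketch();
        agg.setup();
        int start = 0;
        for (String[] row : rows) {
            int end = start + row.length;
            agg.add(buffer, offsets, start, end);
            start = end;
        }
        return agg.output();
    }

    public static void main(String[] args) {
        System.out.println(sumRows(new String[][] {{"1", "2"}, {"x", "3"}}));
    }
}
```

One deliberate difference from the quoted UDF: the try/catch here wraps a single entry, so one unparsable string does not discard the remaining entries of that row. Whether that is the desired behaviour is a design choice, not something the thread settles.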
Re: Some questions on UDFs
I still have issues finding the correct way to create and use a RepeatedHolder, and Writers are a non-starter for Workspace values. I can make do with creating a concatenated string in a VarCharHolder for small data sets to get past this in the short term and finish testing the output values I expect, but I won't be able to do anything at scale until I figure out how to make a repeated list.

On Sat, Jul 4, 2015 at 7:12 PM, Jim Bates jba...@maprtech.com wrote:

Well... Converting from string to integers anyway... Too many 4th of July hot dogs. Going into nitrate overload. :) I am pulling an array of string values from JSON data. The string values are actually integers. I am converting to integers and summing each array entry to the final tally.
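The short-term workaround mentioned in this message (packing values into one concatenated string and unpacking them at the end) can be sketched in plain Java. This is only an illustration of the idea under stated assumptions, not Drill code: the class name PackedListSketch, the comma delimiter, and the method names are all hypothetical, and a StringBuilder stands in for the VarCharHolder workspace value.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of the "pack values into one string" workaround; not Drill code. */
final class PackedListSketch {
    // Hypothetical delimiter; any character that cannot appear in a number works.
    private static final char SEP = ',';

    // Stands in for the VarCharHolder workspace value.
    private final StringBuilder packed = new StringBuilder();

    /** Called once per incoming value, in the role add() plays in an aggregate. */
    void add(long value) {
        if (packed.length() > 0) {
            packed.append(SEP);
        }
        packed.append(value);
    }

    /** Called once at the end, in the role output() plays: unpack the list. */
    List<Long> values() {
        List<Long> result = new ArrayList<>();
        if (packed.length() == 0) {
            return result;
        }
        for (String part : packed.toString().split(String.valueOf(SEP))) {
            result.add(Long.parseLong(part));
        }
        return result;
    }

    public static void main(String[] args) {
        PackedListSketch sketch = new PackedListSketch();
        sketch.add(3);
        sketch.add(-7);
        sketch.add(42);
        System.out.println(sketch.values()); // prints [3, -7, 42]
    }
}
```

Every value is formatted on the way in and parsed again on the way out, which is part of why this approach only suits small data sets.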
[jira] [Resolved] (DRILL-3329) Place the Drill JDBC Driver in a Public Maven Repository
[ https://issues.apache.org/jira/browse/DRILL-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau resolved DRILL-3329.
---
Resolution: Fixed

This has been resolved for release 1.1.0. You can reference the driver using:

    <dependency>
      <groupId>org.apache.drill.exec</groupId>
      <artifactId>drill-jdbc-all</artifactId>
      <version>1.1.0</version>
    </dependency>

It is available in the Apache repo and should propagate to Maven Central shortly.

Place the Drill JDBC Driver in a Public Maven Repository

Key: DRILL-3329
URL: https://issues.apache.org/jira/browse/DRILL-3329
Project: Apache Drill
Issue Type: Improvement
Components: Client - JDBC
Affects Versions: 1.0.0
Reporter: Paul Curtis
Assignee: Daniel Barclay (Drill)
Priority: Minor
Labels: maven
Fix For: 1.1.0

Building Java projects utilizing Drill would be greatly enhanced if the Drill JDBC driver was available in a public Maven repository.
Re: Some questions on UDFs
I did get a new RepeatedBigIntHolder built and a BigIntVector added to it. I'll try it in the UDF tomorrow and see if there is a difference between the ways I found to get a BufferAllocator.

    . . .
    @Inject DrillBuf buffer;
    @Workspace RepeatedBigIntHolder yList;
    . . .
    @Override
    public void setup() {
        . . .
        //org.apache.drill.exec.memory.BufferAllocator allocator = buffer.getAllocator();
        org.apache.drill.exec.memory.BufferAllocator allocator = new org.apache.drill.exec.memory.TopLevelAllocator();
        yList = new RepeatedBigIntHolder();
        yList.vector = new org.apache.drill.exec.vector.BigIntVector(
            org.apache.drill.exec.record.MaterializedField.create(
                new org.apache.drill.common.expression.SchemaPath("bigints", org.apache.drill.common.expression.ExpressionPosition.UNKNOWN),
                org.apache.drill.common.types.Types.optional(org.apache.drill.common.types.TypeProtos.MinorType.BIGINT)),
            allocator);
        . . .
    }
Re: Some questions on UDFs
I'm working on the same thing. I want to aggregate a list of values. It has been a search-and-guess game for the most part. I'm still stuck in the process of getting the values all into a list. The writers look interesting, but for aggregation functions it looks like the input is the param and output objects can't hold the aggregation steps. The Workspace is where that happens. If I try to use a Writer in a workspace it won't load and tells me to change it to Holders, which was why I was using them to start with.

Maybe I'm missing the architecture of the agg function. It looked like it was: @Param comes in -> initialize @Workspace vars in setup -> process data through @Workspace vars in add -> finalize @Output in output.

So I'm back to trying to figure out how to create a RepeatedBigIntHolder or a RepeatedVarCharHolder...

On Sat, Jul 4, 2015 at 4:53 PM, Ted Dunning ted.dunn...@gmail.com wrote:

I am working on trying to build any kind of list-constructing aggregator and having absolute fits. To simplify life, I decided to just build a generic list builder that is a scalar function that returns a list containing its argument. Thus zoop(3) = [3], zoop('abc') = ['abc'] and zoop([1,2,3]) = [[1,2,3]].

The ComplexWriter looks like the place to go. As usual, the complete lack of comments in most of Drill makes this very hard since I have to guess what works and what doesn't.

In my code, I note that ComplexWriter has a nice rootAsList() method. I used this in zip and it works nicely to construct lists for output. I note that the resulting ListWriter has a method copyReader(FieldReader var1) which looks really good. Unfortunately, the only implementation of copyReader() is in AbstractFieldWriter and it looks like this:

    public void copyReader(FieldReader reader) {
        this.fail("Copy FieldReader");
    }

I would like to formally say at this point "WTF"? In digging in further, I see other methods that look handy, like

    public void write(IntHolder holder) {
        this.fail("Int");
    }

And then in looking at implementations, it looks like there is a combinatorial explosion because every type seems to need a write method for every other type. What is the thought here? How can I copy an arbitrary value into a list?

My next thought was to build code that dispatches on type. There is a method called getType() on the FieldReader. Unfortunately, that drives into code generated by protoc and I see no way to dispatch on the type of an incoming value. How is this supposed to work?

On Sat, Jul 4, 2015 at 2:14 PM, mehant baid baid.meh...@gmail.com wrote:

For a detailed example on using the ComplexWriter interface you can take a look at the Mappify https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Mappify.java (kvgen) function. The function itself is very simple; however, it makes use of the utility methods in MappifyUtility https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/MappifyUtility.java and MapUtility https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapUtility.java which perform most of the work.

Currently we don't have a generic infrastructure to handle errors coming out of functions. However, there is UserException, which when raised will make sure that Drill does not gobble up the error message in that exception. So you can probably throw a UserException with the failing input in your function to make sure it propagates to the user.

Thanks
Mehant

On Sat, Jul 4, 2015 at 1:48 PM, Jacques Nadeau jacq...@apache.org wrote:

*Holders are for both input and output. You can also use ComplexWriter for output and FieldReader for input if you want to write or read a complex value.

I don't think we've provided a really clean way to construct a Repeated*Holder for output purposes. You can probably do it by reaching into a bunch of internal interfaces in Drill. However, I would recommend using the ComplexWriter output pattern for now. This will be a little less efficient but substantially less brittle. I suggest you open up a jira for using a Repeated*Holder as an output.

On Sat, Jul 4, 2015 at 1:38 PM, Ted Dunning ted.dunn...@gmail.com wrote:

Holders are for input, I think. Try the different kinds of writers.

On Sat, Jul 4, 2015 at 12:49 PM, Jim Bates jba...@maprtech.com wrote:

Using a repeatedholder as a @param I've got working. I was working on a custom aggregator function using DrillAggFunc. In this I can do simple things but If I want to build a list values and do something with it in the final output method I think I need to use RepeatedHolders in the @Workspace. To do that I need to create a new one in the setup method. I can't
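The lifecycle guessed at in this message (initialize the workspace in setup, fold each row in through add, emit the result in output) can be modeled in a few lines of plain Java. This is a sketch under stated assumptions, not Drill code: the class ArraySumSketch and the helper rowsSeen are hypothetical, the method names setup/add/output/reset simply mirror those of DrillAggFunc, and the [start, end) slice over a shared backing array is an assumption modeled on the start and end fields of the repeated holders discussed in the thread.

```java
/** Plain-Java model of the aggregate lifecycle described above; not Drill code. */
final class ArraySumSketch {
    // Stand-ins for the @Workspace holders.
    private long sum;
    private long count;

    /** Like setup(): initialize the workspace before any rows arrive. */
    void setup() {
        sum = 0;
        count = 0;
    }

    /**
     * Like add(): called once per row. The row's array is assumed to occupy
     * the slice [start, end) of a shared backing array.
     */
    void add(String[] backing, int start, int end) {
        for (int i = start; i < end; i++) {
            try {
                sum += Integer.parseInt(backing[i]);
            } catch (NumberFormatException e) {
                // Skip entries that are not integers and keep going.
            }
        }
        count++;
    }

    /** Like output(): publish the final value. */
    long output() {
        return sum;
    }

    /** Like reset(): clear the workspace for the next group. */
    void reset() {
        setup();
    }

    /** Hypothetical helper exposing how many rows were folded in. */
    long rowsSeen() {
        return count;
    }

    public static void main(String[] args) {
        String[] backing = {"1", "2", "x", "10", "20"};
        ArraySumSketch agg = new ArraySumSketch();
        agg.setup();
        agg.add(backing, 0, 3); // first row: ["1", "2", "x"]
        agg.add(backing, 3, 5); // second row: ["10", "20"]
        System.out.println(agg.output()); // prints 33
    }
}
```

One deliberate difference from the MyArraySum example in the thread: the try block here sits inside the loop, so a single unparseable entry is skipped rather than ending the scan of that row.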
Re: Some questions on UDFs
Ted,

Yes, I started out just getting a basic count to work. I am trying to keep the workflow as close to a basic user as possible. As such, I am building and using the MapR Apache Drill sandbox to test.

1. Always look at the drillbit.log file to see if Drill had any issues loading your UDF. That was where I learned that all workspace values needed to be holders:

    WARN o.a.d.exec.expr.fn.FunctionConverter - Failure loading function class com.mapr.example.udfs.drill.MyDrillAggFunctions$MyLinearRegression1, field xList. Aggregate function 'MyLinearRegression1' workspace variable 'xList' is of type 'interface org.apache.drill.exec.vector.complex.writer.BaseWriter$ComplexWriter'. Please change it to Holder type.

2. Error messages:

- If you get an error in this format it means that Drill cannot find your function, so it probably didn't load it. Back to step 1:

    PARSE ERROR: From line 1, column 8 to line 1, column 44: No match found for function signature MyFunctionName(ANY)

- If you get an error in this format it means that the function is there but Drill could not find a signature that matched the param types or param numbers you were passing it. The exact wording will change, but "Missing function implementation" is the key phrase to look for:

    Error: SYSTEM ERROR: org.apache.drill.exec.exception.SchemaChangeException: Failure while trying to materialize incoming schema. Errors: - Error in expression at index -1. Error: Missing function implementation: [castBIGINT(VARCHAR-REPEATED)]. Full expression: --UNKNOWN EXPRESSION--

3. In your function definition for aggregate functions you need to set null processing to internal and isRandom to false. Example below:

    @FunctionTemplate(name = "MyFunctionName",
        scope = FunctionTemplate.FunctionScope.POINT_AGGREGATE,
        nulls = FunctionTemplate.NullHandling.INTERNAL,
        isRandom = false,
        isBinaryCommutative = false,
        costCategory = FunctionTemplate.FunctionCostCategory.COMPLEX)

Below is an example from the Apache Drill tutorial data sets contained in the MapR Apache Drill sandbox. I am pulling an array of string values from JSON data. The string values are actually integers. I am converting to string and summing each array entry to the final tally. This in no way represents what this data was for, but it did become a handy way for me to peck out the correct way to build an aggregation UDF.

    @FunctionTemplate(name = "MyArraySum",
        scope = FunctionTemplate.FunctionScope.POINT_AGGREGATE,
        nulls = FunctionTemplate.NullHandling.INTERNAL,
        isRandom = false,
        isBinaryCommutative = false,
        costCategory = FunctionTemplate.FunctionCostCategory.COMPLEX)
    public static class MyArraySum implements DrillAggFunc {
        @Param RepeatedVarCharHolder listToSearch;
        @Workspace NullableBigIntHolder count;
        @Workspace NullableBigIntHolder sum;
        @Workspace NullableVarCharHolder vc;
        @Output BigIntHolder out;

        @Override
        public void setup() {
            count.value = 0;
            sum.value = 0;
        }

        @Override
        public void add() {
            // Number of entries in the incoming array
            int c = listToSearch.end - listToSearch.start;
            int val = 0;
            try {
                for (int i = 0; i < c; i++) {
                    listToSearch.vector.getAccessor().get(i, vc);
                    String inputStr = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(vc.start, vc.end, vc.buffer);
                    val = Integer.parseInt(inputStr);
                    sum.value = sum.value + val;
                }
            } catch (Exception e) {
                val = 0;
            }
            count.value = count.value + 1;
        }

Example select statement:

    SELECT MyArraySum(my_arrays) FROM (SELECT t.trans_info.prod_id as my_arrays FROM `dfs.clicks`.`./clicks/clicks.campaign.json` t limit 5);
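The two error patterns described in this message can be told apart mechanically by their key phrases. The sketch below is a hypothetical helper in plain Java, not part of Drill: the class UdfErrorHints, the enum Hint, and the method classify are names invented for illustration, and the only facts it relies on are the two phrases quoted in the message ("No match found for function signature" and "Missing function implementation").

```java
/** Hypothetical helper that maps a Drill error message to a likely cause. */
final class UdfErrorHints {
    enum Hint {
        NOT_LOADED,          // Drill cannot find the function; check the log for load failures
        SIGNATURE_MISMATCH,  // the function exists but no signature fits the arguments
        UNKNOWN
    }

    /** Classify an error message by the key phrases quoted in the thread. */
    static Hint classify(String errorText) {
        if (errorText == null) {
            return Hint.UNKNOWN;
        }
        if (errorText.contains("No match found for function signature")) {
            return Hint.NOT_LOADED;
        }
        if (errorText.contains("Missing function implementation")) {
            return Hint.SIGNATURE_MISMATCH;
        }
        return Hint.UNKNOWN;
    }

    public static void main(String[] args) {
        String parseError = "PARSE ERROR: From line 1, column 8 to line 1, column 44: "
                + "No match found for function signature MyFunctionName(ANY)";
        System.out.println(classify(parseError)); // prints NOT_LOADED
    }
}
```

Matching on a substring rather than the full message follows the advice in the message that the exact wording will change.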
Re: Some questions on UDFs
Jim,

I think that you may be having trouble with aggregators in general. Have you been able to build *any* aggregator of anything? I haven't.

When I try to build an aggregator of ints or doubles, I get a very persistent problem with Drill even seeing my aggregates:

    0: jdbc:drill:zk=local> select sum_int(employee_id) from cp.`employee.json`;
    Jul 04, 2015 4:19:35 PM org.apache.calcite.sql.validate.SqlValidatorException <init>
    SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match found for function signature sum_int(<ANY>)
    Jul 04, 2015 4:19:35 PM org.apache.calcite.runtime.CalciteException <init>
    SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 8 to line 1, column 27: No match found for function signature sum_int(<ANY>)
    Error: PARSE ERROR: From line 1, column 8 to line 1, column 27: No match found for function signature sum_int(<ANY>)
    [Error Id: 91b78fa6-6dd1-4214-a85f-c2bf2c393145 on 10.0.1.2:31010] (state=,code=0)

    0: jdbc:drill:zk=local> select sum_int(cast(employee_id as int)) from cp.`employee.json`;
    Jul 04, 2015 4:19:45 PM org.apache.calcite.sql.validate.SqlValidatorException <init>
    SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: No match found for function signature sum_int(<NUMERIC>)
    Jul 04, 2015 4:19:45 PM org.apache.calcite.runtime.CalciteException <init>
    SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 8 to line 1, column 40: No match found for function signature sum_int(<NUMERIC>)
    Error: PARSE ERROR: From line 1, column 8 to line 1, column 40: No match found for function signature sum_int(<NUMERIC>)
    [Error Id: f649fc85-6b6a-4468-9a4f-bfef0b23d06b on 10.0.1.2:31010] (state=,code=0)
    0: jdbc:drill:zk=local>

It looks like there is some undocumented subtlety about how to register an aggregator.