[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252816#comment-16252816
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/984


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> The work for this PR has had several other PRs batched together with it. The 
> full description of work is the following:
> DRILL-5783
> * A unit test is created for the priority queue in the TopN operator
> * The code generation classes passed around a completely unused function 
> registry reference in some places so I removed it.
> * The priority queue had unused parameters for some of its methods so I 
> removed them.
> DRILL-5841
> * There were many many ways in which temporary folders were created in unit 
> tests. I have unified the way these folders are created with the 
> DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. All the unit tests 
> have been updated to use these. The test watchers create temp directories in 
> ./target//. So all the files generated and used in the context of a test can 
> easily be found in the same consistent location.
> * This change should fix the sporadic hashagg test failures, as well as 
> failures caused by stray files in /tmp
> DRILL-5894
> * dfs_test is used as a storage plugin throughout the unit tests. This is 
> highly confusing and we can just use dfs instead.
> *Misc*
> * General code cleanup.
> * There are many places where String.format is used unnecessarily. The test 
> builder methods already use String.format for you when you pass them args. I 
> cleaned some of these up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252799#comment-16252799
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/984
  
Discussed in person. Conflicts are resolved. We have a plan for merging 
with other in-flight PRs.
+1 (again)


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> The work for this PR has had several other PRs batched together with it. The 
> full description of work is the following:
> DRILL-5783
> * A unit test is created for the priority queue in the TopN operator
> * The code generation classes passed around a completely unused function 
> registry reference in some places so I removed it.
> * The priority queue had unused parameters for some of its methods so I 
> removed them.
> DRILL-5841
> * There were many many ways in which temporary folders were created in unit 
> tests. I have unified the way these folders are created with the 
> DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. All the unit tests 
> have been updated to use these. The test watchers create temp directories in 
> ./target//. So all the files generated and used in the context of a test can 
> easily be found in the same consistent location.
> * This change should fix the sporadic hashagg test failures, as well as 
> failures caused by stray files in /tmp
> DRILL-5894
> * dfs_test is used as a storage plugin throughout the unit tests. This is 
> highly confusing and we can just use dfs instead.
> *Misc*
> * General code cleanup.
> * There are many places where String.format is used unnecessarily. The test 
> builder methods already use String.format for you when you pass them args. I 
> cleaned some of these up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252747#comment-16252747
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/984
  
Resolved conflicts.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> The work for this PR has had several other PRs batched together with it. The 
> full description of work is the following:
> DRILL-5783
> * A unit test is created for the priority queue in the TopN operator
> * The code generation classes passed around a completely unused function 
> registry reference in some places so I removed it.
> * The priority queue had unused parameters for some of its methods so I 
> removed them.
> DRILL-5841
> * There were many many ways in which temporary folders were created in unit 
> tests. I have unified the way these folders are created with the 
> DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. All the unit tests 
> have been updated to use these. The test watchers create temp directories in 
> ./target//. So all the files generated and used in the context of a test can 
> easily be found in the same consistent location.
> * This change should fix the sporadic hashagg test failures, as well as 
> failures caused by stray files in /tmp
> DRILL-5894
> * dfs_test is used as a storage plugin throughout the unit tests. This is 
> highly confusing and we can just use dfs instead.
> *Misc*
> * General code cleanup.
> * There are many places where String.format is used unnecessarily. The test 
> builder methods already use String.format for you when you pass them args. I 
> cleaned some of these up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251341#comment-16251341
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/984
  
@ilooner as far as know you are doing presentation today connected with 
this pull request. If overall approach will be accepted, then we'll try to 
merge this pull request. Please resolve the conflicts as well.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> The work for this PR has had several other PRs batched together with it. The 
> full description of work is the following:
> DRILL-5783
> * A unit test is created for the priority queue in the TopN operator
> * The code generation classes passed around a completely unused function 
> registry reference in some places so I removed it.
> * The priority queue had unused parameters for some of its methods so I 
> removed them.
> DRILL-5841
> * There were many many ways in which temporary folders were created in unit 
> tests. I have unified the way these folders are created with the 
> DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. All the unit tests 
> have been updated to use these. The test watchers create temp directories in 
> ./target//. So all the files generated and used in the context of a test can 
> easily be found in the same consistent location.
> * This change should fix the sporadic hashagg test failures, as well as 
> failures caused by stray files in /tmp
> DRILL-5894
> * dfs_test is used as a storage plugin throughout the unit tests. This is 
> highly confusing and we can just use dfs instead.
> *Misc*
> * General code cleanup.
> * There are many places where String.format is used unnecessarily. The test 
> builder methods already use String.format for you when you pass them args. I 
> cleaned some of these up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249959#comment-16249959
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/984
  
@arina-ielchiieva I can put **DRILL-5783** in a separate PR **DRILL-5841** 
and **DRILL-5894** are however tightly coupled and must be in the same PR. 
Unfortunately **DRILL-5841** and **DRILL-5894** touched many files because all 
the unit tests were updated so the PR will still remain large. 


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> The work for this PR has had several other PRs batched together with it. The 
> full description of work is the following:
> DRILL-5783
> * A unit test is created for the priority queue in the TopN operator
> * The code generation classes passed around a completely unused function 
> registry reference in some places so I removed it.
> * The priority queue had unused parameters for some of its methods so I 
> removed them.
> DRILL-5841
> * There were many many ways in which temporary folders were created in unit 
> tests. I have unified the way these folders are created with the 
> DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. All the unit tests 
> have been updated to use these. The test watchers create temp directories in 
> ./target//. So all the files generated and used in the context of a test can 
> easily be found in the same consistent location.
> * This change should fix the sporadic hashagg test failures, as well as 
> failures caused by stray files in /tmp
> DRILL-5894
> * dfs_test is used as a storage plugin throughout the unit tests. This is 
> highly confusing and we can just use dfs instead.
> *Misc*
> * General code cleanup.
> * There are many places where String.format is used unnecessarily. The test 
> builder methods already use String.format for you when you pass them args. I 
> cleaned some of these up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249500#comment-16249500
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/984
  
@ilooner changes in 361 files seems to be too much :) I suggest you split 
this PR at least at three according to the addressed issues. So we try to merge 
smaller chunks.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> The work for this PR has had several other PRs batched together with it. The 
> full description of work is the following:
> DRILL-5783
> * A unit test is created for the priority queue in the TopN operator
> * The code generation classes passed around a completely unused function 
> registry reference in some places so I removed it.
> * The priority queue had unused parameters for some of its methods so I 
> removed them.
> DRILL-5841
> * There were many many ways in which temporary folders were created in unit 
> tests. I have unified the way these folders are created with the 
> DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. All the unit tests 
> have been updated to use these. The test watchers create temp directories in 
> ./target//. So all the files generated and used in the context of a test can 
> easily be found in the same consistent location.
> * This change should fix the sporadic hashagg test failures, as well as 
> failures caused by stray files in /tmp
> DRILL-5894
> * dfs_test is used as a storage plugin throughout the unit tests. This is 
> highly confusing and we can just use dfs instead.
> *Misc*
> * General code cleanup.
> * There are many places where String.format is used unnecessarily. The test 
> builder methods already use String.format for you when you pass them args. I 
> cleaned some of these up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248288#comment-16248288
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/984
  
After squashing the commits and rebasing I noticed the windows functional 
tests were failing. The issue was caused by replacing the '/' constant in 
FileUtils (now renamed to DrillFileUtils) in **ClassTransformer** and 
**FunctionInitializer** with File.separator. This broke the build because 
File.separator is '\' on windows and in the context of **ClassTransformer** and 
**FunctionInitializer** the separator is used to look things up in the 
classpath. Looking things up in the classpath requires '/' on both windows and 
linux, hence I added back the '/' constant to DrillFileUtils along with a 
comment explaining its purpose and used it in **ClassTransformer** and 
**FunctionInitializer**. *Note:* this was a minor fix that impacted only a few 
lines.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> The work for this PR has had several other PRs batched together with it. The 
> full description of work is the following:
> DRILL-5783
> * A unit test is created for the priority queue in the TopN operator
> * The code generation classes passed around a completely unused function 
> registry reference in some places so I removed it.
> * The priority queue had unused parameters for some of its methods so I 
> removed them.
> DRILL-5841
> * There were many many ways in which temporary folders were created in unit 
> tests. I have unified the way these folders are created with the 
> DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. All the unit tests 
> have been updated to use these. The test watchers create temp directories in 
> ./target//. So all the files generated and used in the context of a test can 
> easily be found in the same consistent location.
> * This change should fix the sporadic hashagg test failures, as well as 
> failures caused by stray files in /tmp
> DRILL-5894
> * dfs_test is used as a storage plugin throughout the unit tests. This is 
> highly confusing and we can just use dfs instead.
> *Misc*
> * General code cleanup.
> * There are many places where String.format is used unnecessarily. The test 
> builder methods already use String.format for you when you pass them args. I 
> cleaned some of these up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246617#comment-16246617
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r150097140
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSet.java ---
@@ -85,8 +85,7 @@
* new row set with the updated columns, then merge the new
* and old row sets to create a new immutable row set.
*/
-
-  public interface RowSetWriter extends TupleWriter {
+  interface RowSetWriter extends TupleWriter {
--- End diff --

Ah, forgot that the file defines an interface, not a class. (The situation 
I described was an interface nested inside a class.) So, you're good.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> The work for this PR has had several other PRs batched together with it. The 
> full description of work is the following:
> DRILL-5783
> * A unit test is created for the priority queue in the TopN operator
> * The code generation classes passed around a completely unused function 
> registry reference in some places so I removed it.
> * The priority queue had unused parameters for some of its methods so I 
> removed them.
> DRILL-5841
> * There were many many ways in which temporary folders were created in unit 
> tests. I have unified the way these folders are created with the 
> DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. All the unit tests 
> have been updated to use these. The test watchers create temp directories in 
> ./target//. So all the files generated and used in the context of a test can 
> easily be found in the same consistent location.
> * This change should fix the sporadic hashagg test failures, as well as 
> failures caused by stray files in /tmp
> DRILL-5894
> * dfs_test is used as a storage plugin throughout the unit tests. This is 
> highly confusing and we can just use dfs instead.
> *Misc*
> * General code cleanup.
> * There are many places where String.format is used unnecessarily. The test 
> builder methods already use String.format for you when you pass them args. I 
> cleaned some of these up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246608#comment-16246608
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r150096261
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSet.java ---
@@ -85,8 +85,7 @@
* new row set with the updated columns, then merge the new
* and old row sets to create a new immutable row set.
*/
-
-  public interface RowSetWriter extends TupleWriter {
+  interface RowSetWriter extends TupleWriter {
--- End diff --

IntelliJ gave a warning that the modifier is redundant. Also an interface 
nested inside another interface is public by default.

https://beginnersbook.com/2016/03/nested-or-inner-interfaces-in-java/


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> The work for this PR has had several other PRs batched together with it. The 
> full description of work is the following:
> DRILL-5783
> * A unit test is created for the priority queue in the TopN operator
> * The code generation classes passed around a completely unused function 
> registry reference in some places so I removed it.
> * The priority queue had unused parameters for some of its methods so I 
> removed them.
> DRILL-5841
> * There were many many ways in which temporary folders were created in unit 
> tests. I have unified the way these folders are created with the 
> DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. All the unit tests 
> have been updated to use these. The test watchers create temp directories in 
> ./target//. So all the files generated and used in the context of a test can 
> easily be found in the same consistent location.
> * This change should fix the sporadic hashagg test failures, as well as 
> failures caused by stray files in /tmp
> DRILL-5894
> * dfs_test is used as a storage plugin throughout the unit tests. This is 
> highly confusing and we can just use dfs instead.
> *Misc*
> * General code cleanup.
> * There are many places where String.format is used unnecessarily. The test 
> builder methods already use String.format for you when you pass them args. I 
> cleaned some of these up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246639#comment-16246639
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r15009
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetComparison.java 
---
@@ -255,4 +257,39 @@ private void verifyArray(String colLabel, ArrayReader 
ea,
   }
 }
   }
+
+  // TODO make a native RowSetComparison comparator
+  public static class ObjectComparator implements Comparator {
--- End diff --

This is used in the DrillTestWrapper to verify the ordering of results. I 
agree this is not suitable for equality tests, but it's intended to be used 
only for ordering tests. I didn't add support for all the supported RowSet 
types because we would first have to move DrillTestWrapper to use RowSets 
(currently it uses Maps and Lists to represent data). Currently it is not used 
by RowSets, but the intention is to move DrillTestWrapper to use RowSets and 
then make this comparator operate on RowSets, but that will be an incremental 
process.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> The work for this PR has had several other PRs batched together with it. The 
> full description of work is the following:
> DRILL-5783
> * A unit test is created for the priority queue in the TopN operator
> * The code generation classes passed around a completely unused function 
> registry reference in some places so I removed it.
> * The priority queue had unused parameters for some of its methods so I 
> removed them.
> DRILL-5841
> * There were many many ways in which temporary folders were created in unit 
> tests. I have unified the way these folders are created with the 
> DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. All the unit tests 
> have been updated to use these. The test watchers create temp directories in 
> ./target//. So all the files generated and used in the context of a test can 
> easily be found in the same consistent location.
> * This change should fix the sporadic hashagg test failures, as well as 
> failures caused by stray files in /tmp
> DRILL-5894
> * dfs_test is used as a storage plugin throughout the unit tests. This is 
> highly confusing and we can just use dfs instead.
> *Misc*
> * General code cleanup.
> * There are many places where String.format is used unnecessarily. The test 
> builder methods already use String.format for you when you pass them args. I 
> cleaned some of these up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246609#comment-16246609
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r150096444
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/file/JsonFileBuilder.java
 ---
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test.rowSet.file;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.accessor.ColumnAccessor;
+import org.apache.drill.exec.vector.accessor.ColumnReader;
+import org.apache.drill.test.rowSet.RowSet;
+
+import java.io.BufferedOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+public class JsonFileBuilder
+{
+  public static final String DEFAULT_DOUBLE_FORMATTER = "%f";
+  public static final String DEFAULT_INTEGER_FORMATTER = "%d";
+  public static final String DEFAULT_LONG_FORMATTER = "%d";
+  public static final String DEFAULT_STRING_FORMATTER = "\"%s\"";
+  public static final String DEFAULT_DECIMAL_FORMATTER = "%s";
+  public static final String DEFAULT_PERIOD_FORMATTER = "%s";
+
+  public static final Map DEFAULT_FORMATTERS = new 
ImmutableMap.Builder()
+.put(ColumnAccessor.ValueType.DOUBLE, DEFAULT_DOUBLE_FORMATTER)
+.put(ColumnAccessor.ValueType.INTEGER, DEFAULT_INTEGER_FORMATTER)
+.put(ColumnAccessor.ValueType.LONG, DEFAULT_LONG_FORMATTER)
+.put(ColumnAccessor.ValueType.STRING, DEFAULT_STRING_FORMATTER)
+.put(ColumnAccessor.ValueType.DECIMAL, DEFAULT_DECIMAL_FORMATTER)
+.put(ColumnAccessor.ValueType.PERIOD, DEFAULT_PERIOD_FORMATTER)
+.build();
+
+  private final RowSet rowSet;
+  private final Map customFormatters = Maps.newHashMap();
+
+  public JsonFileBuilder(RowSet rowSet) {
+this.rowSet = Preconditions.checkNotNull(rowSet);
+Preconditions.checkArgument(rowSet.rowCount() > 0, "The given rowset 
is empty.");
+  }
+
+  public JsonFileBuilder setCustomFormatter(final String columnName, final 
String columnFormatter) {
+Preconditions.checkNotNull(columnName);
+Preconditions.checkNotNull(columnFormatter);
+
+Iterator fields = rowSet
+  .schema()
+  .batch()
+  .iterator();
+
+boolean hasColumn = false;
+
+while (!hasColumn && fields.hasNext()) {
+  hasColumn = fields.next()
+.getName()
+.equals(columnName);
+}
+
+final String message = String.format("(%s) is not a valid column", 
columnName);
+Preconditions.checkArgument(hasColumn, message);
+
+customFormatters.put(columnName, columnFormatter);
+
+return this;
+  }
+
+  public void build(File tableFile) throws IOException {
--- End diff --

Sounds Good


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> The work for this PR has had several other PRs batched together with it. The 
> full description of work is the following:
> DRILL-5783
> * A unit test is created for the priority queue in the TopN operator
> * The code generation classes passed around a 

[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246442#comment-16246442
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r150072992
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/file/JsonFileBuilder.java
 ---
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test.rowSet.file;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.accessor.ColumnAccessor;
+import org.apache.drill.exec.vector.accessor.ColumnReader;
+import org.apache.drill.test.rowSet.RowSet;
+
+import java.io.BufferedOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+public class JsonFileBuilder
+{
+  public static final String DEFAULT_DOUBLE_FORMATTER = "%f";
+  public static final String DEFAULT_INTEGER_FORMATTER = "%d";
+  public static final String DEFAULT_LONG_FORMATTER = "%d";
+  public static final String DEFAULT_STRING_FORMATTER = "\"%s\"";
+  public static final String DEFAULT_DECIMAL_FORMATTER = "%s";
+  public static final String DEFAULT_PERIOD_FORMATTER = "%s";
+
+  public static final Map DEFAULT_FORMATTERS = new 
ImmutableMap.Builder()
+.put(ColumnAccessor.ValueType.DOUBLE, DEFAULT_DOUBLE_FORMATTER)
+.put(ColumnAccessor.ValueType.INTEGER, DEFAULT_INTEGER_FORMATTER)
+.put(ColumnAccessor.ValueType.LONG, DEFAULT_LONG_FORMATTER)
+.put(ColumnAccessor.ValueType.STRING, DEFAULT_STRING_FORMATTER)
+.put(ColumnAccessor.ValueType.DECIMAL, DEFAULT_DECIMAL_FORMATTER)
+.put(ColumnAccessor.ValueType.PERIOD, DEFAULT_PERIOD_FORMATTER)
+.build();
+
+  private final RowSet rowSet;
+  private final Map customFormatters = Maps.newHashMap();
+
+  public JsonFileBuilder(RowSet rowSet) {
+this.rowSet = Preconditions.checkNotNull(rowSet);
+Preconditions.checkArgument(rowSet.rowCount() > 0, "The given rowset 
is empty.");
+  }
+
+  public JsonFileBuilder setCustomFormatter(final String columnName, final 
String columnFormatter) {
+Preconditions.checkNotNull(columnName);
+Preconditions.checkNotNull(columnFormatter);
+
+Iterator fields = rowSet
+  .schema()
+  .batch()
+  .iterator();
+
+boolean hasColumn = false;
+
+while (!hasColumn && fields.hasNext()) {
+  hasColumn = fields.next()
+.getName()
+.equals(columnName);
+}
+
+final String message = String.format("(%s) is not a valid column", 
columnName);
+Preconditions.checkArgument(hasColumn, message);
+
+customFormatters.put(columnName, columnFormatter);
+
+return this;
+  }
+
+  public void build(File tableFile) throws IOException {
--- End diff --

Great! This does not yet handle nested tuples or arrays; in part because 
the row set work for that is still sitting in PR #914. You can update this to 
be aware of maps and map arrays once that PR is committed.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> The work for this PR has had several other PRs 

[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246441#comment-16246441
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r150073673
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetComparison.java 
---
@@ -255,4 +257,39 @@ private void verifyArray(String colLabel, ArrayReader 
ea,
   }
 }
   }
+
+  // TODO make a native RowSetComparison comparator
+  public static class ObjectComparator implements Comparator {
--- End diff --

Defined here, but not used in this file. Does not include all types that 
Drill supports (via the RowSet): Date, byte arrays, BigDecimal, etc. Does not 
allow for ranges for floats & doubles as does JUnit. (Two floats are seldom 
exactly equal.)


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> The work for this PR has had several other PRs batched together with it. The 
> full description of work is the following:
> DRILL-5783
> * A unit test is created for the priority queue in the TopN operator
> * The code generation classes passed around a completely unused function 
> registry reference in some places so I removed it.
> * The priority queue had unused parameters for some of its methods so I 
> removed them.
> DRILL-5841
> * There were many many ways in which temporary folders were created in unit 
> tests. I have unified the way these folders are created with the 
> DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. All the unit tests 
> have been updated to use these. The test watchers create temp directories in 
> ./target//. So all the files generated and used in the context of a test can 
> easily be found in the same consistent location.
> * This change should fix the sporadic hashagg test failures, as well as 
> failures caused by stray files in /tmp
> DRILL-5894
> * dfs_test is used as a storage plugin throughout the unit tests. This is 
> highly confusing and we can just use dfs instead.
> *Misc*
> * General code cleanup.
> * There are many places where String.format is used unnecessarily. The test 
> builder methods already use String.format for you when you pass them args. I 
> cleaned some of these up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246443#comment-16246443
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r150073945
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSet.java ---
@@ -85,8 +85,7 @@
* new row set with the updated columns, then merge the new
* and old row sets to create a new immutable row set.
*/
-
-  public interface RowSetWriter extends TupleWriter {
+  interface RowSetWriter extends TupleWriter {
--- End diff --

Aren't nested interfaces `protected` by default? Just had to change one 
from default to `public` so I could use it in another package...


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> The work for this PR has had several other PRs batched together with it. The 
> full description of work is the following:
> DRILL-5783
> * A unit test is created for the priority queue in the TopN operator
> * The code generation classes passed around a completely unused function 
> registry reference in some places so I removed it.
> * The priority queue had unused parameters for some of its methods so I 
> removed them.
> DRILL-5841
> * There were many many ways in which temporary folders were created in unit 
> tests. I have unified the way these folders are created with the 
> DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. All the unit tests 
> have been updated to use these. The test watchers create temp directories in 
> ./target//. So all the files generated and used in the context of a test can 
> easily be found in the same consistent location.
> * This change should fix the sporadic hashagg test failures, as well as 
> failures caused by stray files in /tmp
> DRILL-5894
> * dfs_test is used as a storage plugin throughout the unit tests. This is 
> highly confusing and we can just use dfs instead.
> *Misc*
> * General code cleanup.
> * There are many places where String.format is used unnecessarily. The test 
> builder methods already use String.format for you when you pass them args. I 
> cleaned some of these up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235043#comment-16235043
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/984
  
@paul-rogers This is ready for another round of review. I have also 
responded to / addressed your comments.



> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234814#comment-16234814
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r148396417
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/TableFileBuilder.java ---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+import java.io.BufferedOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.List;
+
+public class TableFileBuilder
+{
+  private final List columnNames;
+  private final List columnFormatters;
+  private final List rows = Lists.newArrayList();
+  private final String rowString;
+
+  public TableFileBuilder(List columnNames, List 
columnFormatters) {
--- End diff --

I've switched this around a bit. The general flow is now the following:

  1. A **RowSet** is constructed with a **RowSetBuilder**
  1. A **JsonFileBuilder** is created and configured to use the **RowSet** 
that was just created.
  1. The **JsonFileBuilder** reads the **RowSet** and writes it out to a 
file in a json format.



> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234805#comment-16234805
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r148395420
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestCorruptParquetDateCorrection.java
 ---
@@ -377,21 +386,21 @@ public void 
testReadOldMetadataCacheFileOverrideCorrection() throws Exception {
 
   @Test
   public void testReadNewMetadataCacheFileOverOldAndNewFiles() throws 
Exception {
-String table = format("dfs.`%s`", new 
Path(getDfsTestTmpSchemaLocation(), 
MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER));
-copyMetaDataCacheToTempReplacingInternalPaths(
+File meta = dirTestWatcher.copyResourceToRoot(
 
"parquet/4203_corrupt_dates/mixed_version_partitioned_metadata.requires_replace.txt",
-MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER, 
Metadata.METADATA_FILENAME);
+Paths.get(MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER, 
Metadata.METADATA_FILENAME).toString());
--- End diff --

I've cleaned this up a bit. Of the two patterns for building paths and 
files I prefer pattern **A** because it is cleaner.

**Pattern A:**
```
myPath.resolve("subDir1")
  .resolve("subDir2")
  .toFile();
```

**Pattern B:**
```
new File(new File(myFile, "subDir1"), "subDir2")
```


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234797#comment-16234797
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r148394268
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/TestExternalSort.java
 ---
@@ -138,34 +141,34 @@ public void testNewColumnsManaged() throws Exception {
 testNewColumns(false);
   }
 
-
   @Test
   public void testNewColumnsLegacy() throws Exception {
 testNewColumns(true);
   }
 
   private void testNewColumns(boolean testLegacy) throws Exception {
 final int record_count = 1;
-String dfs_temp = getDfsTestTmpSchemaLocation();
-System.out.println(dfs_temp);
-File table_dir = new File(dfs_temp, "newColumns");
-table_dir.mkdir();
-try (BufferedOutputStream os = new BufferedOutputStream(new 
FileOutputStream(new File(table_dir, "a.json" {
-  String format = "{ a : %d, b : %d }%n";
-  for (int i = 0; i <= record_count; i += 2) {
-os.write(String.format(format, i, i).getBytes());
-  }
+final String tableDirName = "newColumns";
+
+TableFileBuilder tableA = new TableFileBuilder(Lists.newArrayList("a", 
"b"), Lists.newArrayList("%d", "%d"));
--- End diff --

I've made the JsonFileBuilder, which takes a RowSet and writes it out to a 
file as json.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234796#comment-16234796
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r148393893
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BatchUtils.java ---
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.exec.vector.ValueVector;
+import org.junit.Assert;
+
+import java.io.UnsupportedEncodingException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+
+public class BatchUtils {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchUtils.class);
+
+  public static Map 
containerToObjects(VectorContainer vectorContainer) {
--- End diff --

Removed BatchUtils


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227798#comment-16227798
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r148157170
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BatchUtils.java ---
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.exec.vector.ValueVector;
+import org.junit.Assert;
+
+import java.io.UnsupportedEncodingException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+
+public class BatchUtils {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchUtils.class);
+
+  public static Map 
containerToObjects(VectorContainer vectorContainer) {
+Map rows = Maps.newHashMap();
+int numCols = vectorContainer.getNumberOfColumns();
+int recordCount = vectorContainer.getRecordCount();
+
+for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
+  String columnName = 
vectorContainer.getSchema().getColumn(columnIndex).getName();
+  List data = Lists.newArrayList();
+
+  ValueVector.Accessor valueVectorAccessor = vectorContainer
+.getValueVector(columnIndex)
+.getValueVector()
+.getAccessor();
+
+  for (int recordIndex = 0; recordIndex < recordCount; recordIndex++) {
+data.add(valueVectorAccessor.getObject(recordIndex));
+  }
+
+  rows.put(columnName, data);
+}
+
+return rows;
+  }
+
+  public static Map 
hyperBatchAndSelectorToObjects(VectorContainer vectorContainer, 
SelectionVector4 selectionVector4) {
+Map rows = Maps.newHashMap();
+int numCols = vectorContainer.getNumberOfColumns();
+int numIndices = selectionVector4.getCount();
+
+for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
+  String columnName = 
vectorContainer.getSchema().getColumn(columnIndex).getName();
+  List data = Lists.newArrayList();
+
+  VectorWrapper vectorWrapper = 
vectorContainer.getValueVector(columnIndex);
+
+  for (int indexIndex = 0; indexIndex < numIndices; indexIndex++) {
+int sv4Index = selectionVector4.get(indexIndex);
+int batchIndex = SelectionVector4.getBatchIndex(sv4Index);
+int recordIndex = SelectionVector4.getRecordIndex(sv4Index);
+
+ValueVector valueVector = 
vectorWrapper.getValueVectors()[batchIndex];
+Object columnValue = 
valueVector.getAccessor().getObject(recordIndex);
+data.add(columnValue);
+  }
+
+  rows.put(columnName, data);
+}
+
+return rows;
+  }
+
+  public static String toString(Map table) {
+if (table.isEmpty()) {
+  return "[ empty table ]";
+}
+
+List columnNames = Lists.newArrayList(table.keySet());
+Collections.sort(columnNames);
+int numRecords = table.get(columnNames.get(0)).size();
+
+StringBuilder sb = new StringBuilder();
+
+{
+  sb.append("[ ");
+  String separator = "";
+
+  for (String columnName : columnNames) {
+sb.append(separator);
+separator = ", ";
+sb.append(columnName);
+  }
+
+  sb.append(" ]\n");
+}
+
+for (int 

[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227797#comment-16227797
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r148157140
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BatchUtils.java ---
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.exec.vector.ValueVector;
+import org.junit.Assert;
+
+import java.io.UnsupportedEncodingException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+
+public class BatchUtils {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchUtils.class);
+
+  public static Map 
containerToObjects(VectorContainer vectorContainer) {
+Map rows = Maps.newHashMap();
+int numCols = vectorContainer.getNumberOfColumns();
+int recordCount = vectorContainer.getRecordCount();
+
+for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
+  String columnName = 
vectorContainer.getSchema().getColumn(columnIndex).getName();
+  List data = Lists.newArrayList();
+
+  ValueVector.Accessor valueVectorAccessor = vectorContainer
+.getValueVector(columnIndex)
+.getValueVector()
+.getAccessor();
+
+  for (int recordIndex = 0; recordIndex < recordCount; recordIndex++) {
+data.add(valueVectorAccessor.getObject(recordIndex));
+  }
+
+  rows.put(columnName, data);
+}
+
+return rows;
+  }
+
+  public static Map 
hyperBatchAndSelectorToObjects(VectorContainer vectorContainer, 
SelectionVector4 selectionVector4) {
--- End diff --

Done


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227708#comment-16227708
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r148144235
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BaseDirTestWatcher.java ---
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.base.Charsets;
+import org.apache.commons.io.FileUtils;
+import org.apache.drill.common.util.TestTools;
+import org.apache.drill.exec.util.TestUtilities;
+import org.junit.runner.Description;
+
+import java.io.File;
+import java.io.IOException;
+
+public class BaseDirTestWatcher extends DirTestWatcher {
+  public enum DirType {
+ROOT,
+TEST_TMP;
+  }
+
+  private File tmpDir;
+  private File storeDir;
+  private File dfsTestTmpParentDir;
+  private File dfsTestTmpDir;
+  private File rootDir;
+
+  public BaseDirTestWatcher() {
+super();
+  }
+
+  @Override
+  protected void starting(Description description) {
+super.starting(description);
+
+rootDir = makeSubDir("root");
+tmpDir = new File(rootDir, "tmp");
+storeDir = new File(rootDir, "store");
+dfsTestTmpParentDir = new File(rootDir, "dfsTestTmp");
+
+tmpDir.mkdirs();
+storeDir.mkdirs();
+dfsTestTmpParentDir.mkdirs();
+  }
+
+  public File getTmpDir() {
+return tmpDir;
+  }
+
+  public File getStoreDir() {
+return storeDir;
+  }
+
+  public File getDfsTestTmpParentDir() {
+return dfsTestTmpParentDir;
+  }
+
+  public File getDfsTestTmpDir() {
+return dfsTestTmpDir;
+  }
+
+  public String getDfsTestTmpDirPath() {
+return dfsTestTmpDir.getAbsolutePath().replaceAll("/\\./", "/");
+  }
+
+  public File getRootDir() {
+return rootDir;
+  }
+
+  public String getRootDirPath() {
+return rootDir.getAbsolutePath().replaceAll("/\\./", "/");
+  }
+
+  public void newDfsTestTmpDir() {
+dfsTestTmpDir = 
TestUtilities.createTempDir(BaseTestQuery.dirTestWatcher.getDfsTestTmpParentDir());
+  }
+
+  private File getDir(DirType type) {
+switch (type) {
+  case ROOT:
+return rootDir;
+  case TEST_TMP:
+return dfsTestTmpDir;
+  default:
+throw new IllegalArgumentException(String.format("Unsupported type 
%s", type));
+}
+  }
+
+  public File makeRootSubDir(String relPath) {
+return makeSubDir(relPath, DirType.ROOT);
+  }
+
+  public File makeTestTmpSubDir(String relPath) {
+return makeSubDir(relPath, DirType.TEST_TMP);
+  }
+
+  private File makeSubDir(String relPath, DirType type) {
+File subDir = new File(getDir(type), relPath);
+subDir.mkdirs();
+return subDir;
+  }
+
+  /**
+   * This preserves the relative path of the directory in root
+   * @param relPath
+   * @return
+   */
+  public File copyResourceToRoot(String relPath) {
+return copyTo(relPath, relPath, TestTools.DataType.RESOURCE, 
DirType.ROOT);
+  }
+
+  public File copyFileToRoot(String relPath) {
+return copyTo(relPath, relPath, TestTools.DataType.PROJECT, 
DirType.ROOT);
+  }
+
+  public File copyResourceToRoot(String relPath, String destPath) {
+return copyTo(relPath, destPath, TestTools.DataType.RESOURCE, 
DirType.ROOT);
+  }
+
+  public File copyResourceToTestTmp(String relPath, String destPath) {
+return copyTo(relPath, destPath, TestTools.DataType.RESOURCE, 
DirType.TEST_TMP);
+  }
+
+  private File copyTo(String relPath, String destPath, TestTools.DataType 

[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221221#comment-16221221
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147262655
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/ExampleTest.java ---
@@ -59,6 +59,9 @@
 @Ignore
 public class ExampleTest {
 
+  @Rule
+  public final DirTestWatcher dirTestWatcher = new DirTestWatcher();
--- End diff --

I added java doc, and added example of how it's used in secondTest


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221210#comment-16221210
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147261203
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java ---
@@ -226,43 +217,16 @@ private void createConfig(FixtureBuilder builder) 
throws Exception {
 
   serviceSet = null;
   usesZk = true;
-  isLocal = false;
 } else {
   // Embedded Drillbit.
 
   serviceSet = RemoteServiceSet.getLocalServiceSet();
-  isLocal = true;
 }
   }
 
-  private void startDrillbits(FixtureBuilder builder) throws Exception {
-//// Ensure that Drill uses the log directory determined here rather 
than
-//// it's hard-coded defaults. WIP: seems to be needed some times but
-//// not others.
-//
-//String logDir = null;
-//if (builder.tempDir != null) {
-//  logDir = builder.tempDir.getAbsolutePath();
-//}
-//if (logDir == null) {
-//  logDir = config.getString(ExecConstants.DRILL_TMP_DIR);
-//  if (logDir != null) {
-//logDir += "/drill/log";
-//  }
-//}
-//if (logDir == null) {
-//  logDir = "/tmp/drill";
-//}
-//new File(logDir).mkdirs();
-//System.setProperty("drill.log-dir", logDir);
-
-dfsTestTempDir = makeTempDir("dfs-test");
-
-// Clean up any files that may have been left from the
-// last run.
-
-preserveLocalFiles = builder.preserveLocalFiles;
-removeLocalFiles();
+  private void startDrillbits() throws Exception {
+dfsTestTempDir = new File(getRootDir(), "dfs-test");
+dfsTestTempDir.mkdirs();
--- End diff --

This is possible through the BaseDirTestWatcher. It is configured through 
the BaseDireTestWatcher's constructor. If you pass it false it will not delete 
your directories at the end of each test.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221180#comment-16221180
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147258230
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java ---
@@ -213,7 +204,7 @@ private void configureZk(FixtureBuilder builder) {
 }
   }
 
-  private void createConfig(FixtureBuilder builder) throws Exception {
+  private void createConfig() throws Exception {
--- End diff --

They should be. The only change I made is that I didn't require passing the 
FixtureBuilder as an arg, because the class is already holding a reference to 
it.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221167#comment-16221167
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147257244
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BaseDirTestWatcher.java ---
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.base.Charsets;
+import org.apache.commons.io.FileUtils;
+import org.apache.drill.common.util.TestTools;
+import org.apache.drill.exec.util.TestUtilities;
+import org.junit.runner.Description;
+
+import java.io.File;
+import java.io.IOException;
+
+public class BaseDirTestWatcher extends DirTestWatcher {
+  public enum DirType {
+ROOT,
+TEST_TMP;
+  }
+
+  private File tmpDir;
+  private File storeDir;
+  private File dfsTestTmpParentDir;
+  private File dfsTestTmpDir;
+  private File rootDir;
+
+  public BaseDirTestWatcher() {
+super();
+  }
+
+  @Override
+  protected void starting(Description description) {
+super.starting(description);
+
+rootDir = makeSubDir("root");
+tmpDir = new File(rootDir, "tmp");
+storeDir = new File(rootDir, "store");
+dfsTestTmpParentDir = new File(rootDir, "dfsTestTmp");
+
+tmpDir.mkdirs();
+storeDir.mkdirs();
+dfsTestTmpParentDir.mkdirs();
+  }
+
+  public File getTmpDir() {
+return tmpDir;
+  }
+
+  public File getStoreDir() {
+return storeDir;
+  }
+
+  public File getDfsTestTmpParentDir() {
+return dfsTestTmpParentDir;
+  }
+
+  public File getDfsTestTmpDir() {
+return dfsTestTmpDir;
+  }
+
+  public String getDfsTestTmpDirPath() {
+return dfsTestTmpDir.getAbsolutePath().replaceAll("/\\./", "/");
+  }
+
+  public File getRootDir() {
+return rootDir;
+  }
+
+  public String getRootDirPath() {
+return rootDir.getAbsolutePath().replaceAll("/\\./", "/");
+  }
+
+  public void newDfsTestTmpDir() {
+dfsTestTmpDir = 
TestUtilities.createTempDir(BaseTestQuery.dirTestWatcher.getDfsTestTmpParentDir());
+  }
+
+  private File getDir(DirType type) {
+switch (type) {
+  case ROOT:
+return rootDir;
+  case TEST_TMP:
+return dfsTestTmpDir;
+  default:
+throw new IllegalArgumentException(String.format("Unsupported type 
%s", type));
+}
+  }
+
+  public File makeRootSubDir(String relPath) {
--- End diff --

Fair enough. I would prefer to use File objects to represent concrete files 
or directories. Java.nio.Path objects are useful because they can represent 
relative paths, and provide a bunch of methods for manipulating paths. I'll 
update the methods I introduced to use jva.io.Files and java.nio.Paths


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221157#comment-16221157
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147256621
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/TestSimpleExternalSort.java
 ---
@@ -147,7 +152,7 @@ public void sortOneKeyDescendingExternalSortLegacy() 
throws Throwable {
   }
 
   private void sortOneKeyDescendingExternalSort(boolean testLegacy) throws 
Throwable {
-FixtureBuilder builder = ClusterFixture.builder()
+FixtureBuilder builder = ClusterFixture.builder(dirTestWatcher)
--- End diff --

Is there a case where the tmp directories are optional? The Drillbit will 
always have to be configured with a drill.tmp-dir at minimum, so I think 
it should be required to use a BaseDirTestWatcher with a ClusterFixture. I 
could move passing the dirTestWatcher to the build method instead of the 
constructor. Let me know what you think.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221146#comment-16221146
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147255211
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestAltSortQueries.java ---
@@ -19,24 +19,33 @@
 
 import org.apache.drill.categories.OperatorTest;
 import org.apache.drill.categories.SqlTest;
+import org.apache.drill.test.BaseTestQuery;
+import org.junit.BeforeClass;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
 
 @Category({SqlTest.class, OperatorTest.class})
-public class TestAltSortQueries extends BaseTestQuery{
+public class TestAltSortQueries extends BaseTestQuery {
   static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(TestAltSortQueries.class);
 
+  @BeforeClass
+  public static void setupTestFiles() {
+dirTestWatcher.copyFileToRoot("sample-data/region.parquet");
+dirTestWatcher.copyFileToRoot("sample-data/regionsSF");
+dirTestWatcher.copyFileToRoot("sample-data/nation.parquet");
+  }
+
   @Test
   public void testOrderBy() throws Exception{
 test("select R_REGIONKEY " +
- "from dfs_test.`[WORKING_PATH]/../../sample-data/region.parquet` 
" +
+ "from dfs_test.`/sample-data/region.parquet` " +
--- End diff --

Updated package-info.java and ExampleTest.java



> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220966#comment-16220966
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147229478
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/TestBugFixes.java 
---
@@ -33,8 +35,11 @@
 @Category(UnlikelyTest.class)
 public class TestBugFixes extends BaseTestQuery {
   private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(TestBugFixes.class);
-  private static final String WORKING_PATH = TestTools.getWorkingPath();
-  private static final String TEST_RES_PATH = WORKING_PATH + 
"/src/test/resources";
+
+  @BeforeClass
+  public static void setupTestFiles() {
+dirTestWatcher.copyResourceToRoot("/bugs/DRILL-4192");
--- End diff --

I have made all the path references relative in the tests now



> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220968#comment-16220968
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147229671
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestCorruptParquetDateCorrection.java
 ---
@@ -76,43 +78,50 @@
   //- detecting corrupt values must be deferred to actual data 
page reading
   //- one from 1.4, where there is a proper created-by, but the 
corruption is present
   private static final String MIXED_CORRUPTED_AND_CORRECT_DATES_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions";
+  "/parquet/4203_corrupt_dates/mixed_drill_versions";
   // partitioned with 1.2.0, no certain metadata that these were written 
with Drill
   // the value will be checked to see that they look corrupt and they will 
be corrected
   // by default. Users can use the format plugin option 
autoCorrectCorruptDates to disable
   // this behavior if they have foreign parquet files with valid rare date 
values that are
   // in the similar range as Drill's corrupt values
+  private static final String PARTITIONED_1_2_FOLDER = 
"partitioned_with_corruption_4203_1_2";
   private static final String CORRUPTED_PARTITIONED_DATES_1_2_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203_1_2";
+  Paths.get("/parquet/4203_corrupt_dates", 
PARTITIONED_1_2_FOLDER).toString();
   // partitioned with 1.4.0, no certain metadata regarding the date 
corruption status.
   // The same detection approach of the corrupt date values as for the 
files partitioned with 1.2.0
+  private static final String PARTITIONED_1_4_FOLDER = 
"partitioned_with_corruption_4203";
   private static final String CORRUPTED_PARTITIONED_DATES_1_4_0_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203";
+  Paths.get("/parquet/4203_corrupt_dates", 
PARTITIONED_1_4_FOLDER).toString();
   private static final String PARQUET_DATE_FILE_WITH_NULL_FILLED_COLS =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/null_date_cols_with_corruption_4203.parquet";
+  
"/parquet/4203_corrupt_dates/null_date_cols_with_corruption_4203.parquet";
+  private static final String PARTITIONED_1_9_FOLDER = 
"1_9_0_partitioned_no_corruption";
   private static final String CORRECT_PARTITIONED_DATES_1_9_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/1_9_0_partitioned_no_corruption";
+  Paths.get("/parquet/4203_corrupt_dates", 
PARTITIONED_1_9_FOLDER).toString();
   private static final String VARCHAR_PARTITIONED =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition";
+  "/parquet/4203_corrupt_dates/fewtypes_varcharpartition";
   private static final String DATE_PARTITIONED =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition";
+  "/parquet/4203_corrupt_dates/fewtypes_datepartition";
   private static final String EXCEPTION_WHILE_PARSING_CREATED_BY_META =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/hive1dot2_fewtypes_null";
+  "/parquet/4203_corrupt_dates/hive1dot2_fewtypes_null";
   private static final String CORRECT_DATES_1_6_0_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/correct_dates_and_old_drill_parquet_writer.parquet";
-  private static final String PARTITIONED_1_2_FOLDER = 
"partitioned_with_corruption_4203_1_2";
+  
"/parquet/4203_corrupt_dates/correct_dates_and_old_drill_parquet_writer.parquet";
   private static final String 
MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER = "mixed_partitioned";
 
-
   @BeforeClass
   public static void initFs() throws Exception {
+dirTestWatcher.copyResourceToRoot("/parquet/4203_corrupt_dates");
--- End diff --

fixed


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220967#comment-16220967
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147229627
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestCorruptParquetDateCorrection.java
 ---
@@ -76,43 +78,50 @@
   //- detecting corrupt values must be deferred to actual data 
page reading
   //- one from 1.4, where there is a proper created-by, but the 
corruption is present
   private static final String MIXED_CORRUPTED_AND_CORRECT_DATES_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions";
+  "/parquet/4203_corrupt_dates/mixed_drill_versions";
   // partitioned with 1.2.0, no certain metadata that these were written 
with Drill
   // the value will be checked to see that they look corrupt and they will 
be corrected
   // by default. Users can use the format plugin option 
autoCorrectCorruptDates to disable
   // this behavior if they have foreign parquet files with valid rare date 
values that are
   // in the similar range as Drill's corrupt values
+  private static final String PARTITIONED_1_2_FOLDER = 
"partitioned_with_corruption_4203_1_2";
   private static final String CORRUPTED_PARTITIONED_DATES_1_2_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203_1_2";
+  Paths.get("/parquet/4203_corrupt_dates", 
PARTITIONED_1_2_FOLDER).toString();
--- End diff --

Updated all paths to be relative in the tests


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220964#comment-16220964
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147229235
  
--- Diff: common/src/test/java/org/apache/drill/test/SubDirTestWatcher.java 
---
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+import org.junit.rules.TestWatcher;
+import org.junit.runner.Description;
+
+import java.io.File;
+import java.util.List;
+
+public class SubDirTestWatcher extends TestWatcher {
--- End diff --

Done


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219696#comment-16219696
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147009092
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TopN/TopNBatchTest.java
 ---
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.TopN;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.Random;
+
+import com.google.common.collect.Lists;
+import org.apache.drill.test.TestBuilder;
+import org.apache.drill.categories.OperatorTest;
+import org.apache.drill.common.config.DrillConfig;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.logical.data.Order;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.compile.ClassBuilder;
+import org.apache.drill.exec.compile.CodeCompiler;
+import org.apache.drill.exec.expr.fn.FunctionImplementationRegistry;
+import org.apache.drill.exec.memory.RootAllocator;
+import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
+import org.apache.drill.exec.pop.PopUnitTestBase;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.ExpandableHyperContainer;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.server.options.OptionSet;
+import org.apache.drill.test.ClientFixture;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.FixtureBuilder;
+import org.apache.drill.test.OperatorFixture;
+import org.apache.drill.test.BatchUtils;
+import org.apache.drill.test.DirTestWatcher;
+import org.apache.drill.test.rowSet.RowSetBuilder;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+@Category(OperatorTest.class)
+public class TopNBatchTest extends PopUnitTestBase {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(TopNBatchTest.class);
+
+  // Allows you to look at generated code after tests execute
+  @Rule
+  public DirTestWatcher dirTestWatcher = new DirTestWatcher(false);
+
+  /**
+   * Priority queue unit test.
+   * @throws Exception
+   */
+  @Test
+  public void priorityQueueOrderingTest() throws Exception {
+Properties properties = new Properties();
+properties.setProperty(ClassBuilder.CODE_DIR_OPTION, 
dirTestWatcher.getDirPath());
+
+DrillConfig drillConfig = DrillConfig.create(properties);
+OptionSet optionSet = new OperatorFixture.TestOptionSet();
+
+FieldReference expr = FieldReference.getWithQuotedRef("colA");
+Order.Ordering ordering = new 
Order.Ordering(Order.Ordering.ORDER_DESC, expr, Order.Ordering.NULLS_FIRST);
+List orderings = Lists.newArrayList(ordering);
+
+MaterializedField colA = MaterializedField.create("colA", 
Types.required(TypeProtos.MinorType.INT));
+MaterializedField colB = MaterializedField.create("colB", 
Types.required(TypeProtos.MinorType.INT));
+
+List cols = Lists.newArrayList(colA, colB);
+BatchSchema batchSchema = new 
BatchSchema(BatchSchema.SelectionVectorMode.NONE, cols);
+
+try (RootAllocator allocator = new RootAllocator(100_000_000)) {
+  VectorContainer expectedVectors = new RowSetBuilder(allocator, 
batchSchema)
+.add(110, 10)
+.add(109, 9)
+.add(108, 8)
+.add(107, 7)
+.add(106, 6)
+.add(105, 5)
+.add(104, 4)
+.add(103, 3)
+.add(102, 2)
+

[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219693#comment-16219693
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147007945
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestAltSortQueries.java ---
@@ -64,9 +73,9 @@ public void testJoinWithLimit() throws Exception{
 "  nations.N_NAME,\n" +
 "  regions.R_NAME\n" +
 "FROM\n" +
-"  dfs_test.`[WORKING_PATH]/../../sample-data/nation.parquet` 
nations\n" +
+"  dfs.`/sample-data/nation.parquet` nations\n" +
--- End diff --

Just mentioned this above but will repeat here.I have now removed 
**dfs_test** completely. There was no reason for it to be added and it was 
inconsistently being mixed with **dfs**. If you want to query a file on the 
local filesystem that is not on the classpath just using **dfs** will be 
sufficient now.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219691#comment-16219691
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147007663
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestAltSortQueries.java ---
@@ -19,24 +19,33 @@
 
 import org.apache.drill.categories.OperatorTest;
 import org.apache.drill.categories.SqlTest;
+import org.apache.drill.test.BaseTestQuery;
+import org.junit.BeforeClass;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
 
 @Category({SqlTest.class, OperatorTest.class})
-public class TestAltSortQueries extends BaseTestQuery{
+public class TestAltSortQueries extends BaseTestQuery {
   static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(TestAltSortQueries.class);
 
+  @BeforeClass
+  public static void setupTestFiles() {
+dirTestWatcher.copyFileToRoot("sample-data/region.parquet");
+dirTestWatcher.copyFileToRoot("sample-data/regionsSF");
+dirTestWatcher.copyFileToRoot("sample-data/nation.parquet");
+  }
+
   @Test
   public void testOrderBy() throws Exception{
 test("select R_REGIONKEY " +
- "from dfs_test.`[WORKING_PATH]/../../sample-data/region.parquet` 
" +
+ "from dfs_test.`/sample-data/region.parquet` " +
--- End diff --

I have now removed **dfs_test** completely. There was no reason for it to 
be added and it was inconsistently being mixed with **dfs**. The **dfs** 
workspaces are automatically mapped to the correct temp directories for you 
provided that you use **BaseTestQuery** or the **ClusterFixture**. I will 
update **org.apache.drill.test.package-info.java** with the theory of how this 
works and will add a simple example to **ExampleTest.java**


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211757#comment-16211757
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145827440
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestAltSortQueries.java ---
@@ -64,9 +73,9 @@ public void testJoinWithLimit() throws Exception{
 "  nations.N_NAME,\n" +
 "  regions.R_NAME\n" +
 "FROM\n" +
-"  dfs_test.`[WORKING_PATH]/../../sample-data/nation.parquet` 
nations\n" +
+"  dfs.`/sample-data/nation.parquet` nations\n" +
--- End diff --

I accidentally changed it to dfs, will switch it back to dfs_test for 
consistency. dfs and dfs_test>/b> are configured to point to the same 
temp directories so they are in fact interchangeable. This configuration 
happens in the openClient method of BaseTestQuery and in the 
configureStoragePlugins method of the ClusterFixture. 

Throughout making this change I was scratching my head as to why 
dfs_test is used instead of dfs. After going through the code I 
don't see any reason as to why dfs_test was created in the first place. I think 
we should actually remove dfs_test and just use dfs everywhere 
for uniformity. I'll go ahead and do that now.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211693#comment-16211693
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145815670
  
--- Diff: exec/java-exec/src/test/resources/topN/one_key_sort.json ---
@@ -12,11 +12,11 @@
 pop:"mock-scan",
 url: "http://apache.org;,
 entries:[
-{records: 1000, types: [
+{records: 10, types: [
--- End diff --

The original test was checking to make sure the the Top N results returned 
are ordered. It was never really necessary to generate that much data. The 
benefit of reducing the amount of data generated is that the test runs faster.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211533#comment-16211533
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145790506
  
--- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
@@ -908,6 +908,15 @@ public void generateTestData(int count) {
 }
 
   <#else> <#-- type.width <= 8 -->
+@Override
+public void add(Object value) {
+  int index = accessor.getValueCount();
+  int valueCount = index + 1;
+  setValueCount(valueCount);
+
+  set(index, (${type.javaType}) value);
--- End diff --

All the issues with this make sense. I should have deleted this when I 
switched to using the RowSet classes, but forgot too :). Removing this now.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210368#comment-16210368
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145575975
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestAggNullable.java ---
@@ -21,29 +21,25 @@
 
 import org.apache.drill.categories.OperatorTest;
 import org.apache.drill.common.util.TestTools;
+import org.apache.drill.test.BaseTestQuery;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
 
 @Category(OperatorTest.class)
-public class TestAggNullable extends BaseTestQuery{
+public class TestAggNullable extends BaseTestQuery {
   static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(TestAggNullable.class);
 
-  static final String WORKING_PATH = TestTools.getWorkingPath();
-  static final String TEST_RES_PATH = WORKING_PATH + "/src/test/resources";
-
   private static void enableAggr(boolean ha, boolean sa) throws Exception {
 
-test(String.format("alter session set `planner.enable_hashagg` = %s", 
ha ? "true":"false"));
-test(String.format("alter session set `planner.enable_streamagg` = 
%s", sa ? "true":"false"));
+test("alter session set `planner.enable_hashagg` = %s", ha);
+test("alter session set `planner.enable_streamagg` = %s", sa);
--- End diff --

We can update these after the other PR goes in then.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210367#comment-16210367
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145575781
  
--- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
@@ -247,7 +248,7 @@ drill.exec: {
 }
   },
   sort: {
-purge.threshold : 10,
+purge.threshold : 1000,
--- End diff --

Oh my. Thanks for catching that!


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210365#comment-16210365
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145575588
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/selection/SelectionVector4.java
 ---
@@ -116,6 +116,14 @@ public void clear() {
 }
   }
 
+  public static int getBatchIndex(int sv4Index) {
+return (sv4Index >> 16) & 0x;
+  }
+
+  public static int getRecordIndex(int sv4Index) {
+return (sv4Index) & 0x;
+  }
--- End diff --

Didn't do any performance testing. I mainly intended to use these methods 
for testing purposes. For example I use them in 
org.apache.drill.test.BatchUtils .


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210360#comment-16210360
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145575310
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java
 ---
@@ -335,20 +336,42 @@ private void purge() throws SchemaChangeException {
 logger.debug("Took {} us to purge", 
watch.elapsed(TimeUnit.MICROSECONDS));
   }
 
-  public PriorityQueue createNewPriorityQueue(FragmentContext context, 
List orderings,
- VectorAccessible 
batch, MappingSet mainMapping, MappingSet leftMapping, MappingSet rightMapping)
-  throws ClassTransformationException, IOException, 
SchemaChangeException{
-CodeGenerator cg = 
CodeGenerator.get(PriorityQueue.TEMPLATE_DEFINITION, 
context.getFunctionRegistry(), context.getOptions());
+  private PriorityQueue createNewPriorityQueue(VectorAccessible batch, int 
limit)
+throws SchemaChangeException, ClassTransformationException, 
IOException {
+return createNewPriorityQueue(
+  mainMapping, leftMapping, rightMapping, context.getOptionSet(), 
context.getFunctionRegistry(), context.getDrillbitContext().getCompiler(),
+  config.getOrderings(), batch, unionTypeEnabled, codegenDump, limit, 
oContext.getAllocator(), schema.getSelectionVectorMode());
+  }
+
+  public static MappingSet createMainMappingSet() {
+return new MappingSet((String) null, null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+  }
+
+  public static MappingSet createLeftMappingSet() {
+return new MappingSet("leftIndex", null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+  }
+
+  public static MappingSet createRightMappingSet() {
+return new MappingSet("rightIndex", null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+  }
+
+  public static PriorityQueue createNewPriorityQueue(
+MappingSet mainMapping, MappingSet leftMapping, MappingSet 
rightMapping,
--- End diff --

Agreed. Pritesh has asked me to look at Codegen improvements in the long 
term, so I will work on this down the line.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210359#comment-16210359
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145574915
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java
 ---
@@ -335,20 +336,42 @@ private void purge() throws SchemaChangeException {
 logger.debug("Took {} us to purge", 
watch.elapsed(TimeUnit.MICROSECONDS));
   }
 
-  public PriorityQueue createNewPriorityQueue(FragmentContext context, 
List orderings,
- VectorAccessible 
batch, MappingSet mainMapping, MappingSet leftMapping, MappingSet rightMapping)
-  throws ClassTransformationException, IOException, 
SchemaChangeException{
-CodeGenerator cg = 
CodeGenerator.get(PriorityQueue.TEMPLATE_DEFINITION, 
context.getFunctionRegistry(), context.getOptions());
+  private PriorityQueue createNewPriorityQueue(VectorAccessible batch, int 
limit)
+throws SchemaChangeException, ClassTransformationException, 
IOException {
+return createNewPriorityQueue(
+  mainMapping, leftMapping, rightMapping, context.getOptionSet(), 
context.getFunctionRegistry(), context.getDrillbitContext().getCompiler(),
+  config.getOrderings(), batch, unionTypeEnabled, codegenDump, limit, 
oContext.getAllocator(), schema.getSelectionVectorMode());
+  }
+
+  public static MappingSet createMainMappingSet() {
+return new MappingSet((String) null, null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+  }
+
+  public static MappingSet createLeftMappingSet() {
+return new MappingSet("leftIndex", null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+  }
+
+  public static MappingSet createRightMappingSet() {
+return new MappingSet("rightIndex", null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+  }
+
+  public static PriorityQueue createNewPriorityQueue(
+MappingSet mainMapping, MappingSet leftMapping, MappingSet 
rightMapping,
--- End diff --

We cannot declare the MappingSets as constants because they contain some 
internal state (the mappingIndex field). If they were constants the state could 
become corrupt in the case where multiple queries run and create PriorityQueues 
on the same node simultaneously, so we have to create new MappingSets each time 
we create a new priority queue.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210350#comment-16210350
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145574026
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/PriorityQueue.java
 ---
@@ -20,22 +20,58 @@
 import org.apache.drill.exec.compile.TemplateClassDefinition;
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 
 public interface PriorityQueue {
-  public void add(FragmentContext context, RecordBatchData batch) throws 
SchemaChangeException;
-  public void init(int limit, FragmentContext context, BufferAllocator 
allocator, boolean hasSv2) throws SchemaChangeException;
-  public void generate() throws SchemaChangeException;
-  public VectorContainer getHyperBatch();
-  public SelectionVector4 getHeapSv4();
-  public SelectionVector4 getFinalSv4();
-  public boolean validate();
-  public void resetQueue(VectorContainer container, SelectionVector4 
vector4) throws SchemaChangeException;
-  public void cleanup();
-
-  public static TemplateClassDefinition TEMPLATE_DEFINITION 
= new TemplateClassDefinition(PriorityQueue.class, 
PriorityQueueTemplate.class);
+  /**
+   * The elements in the given batch are added to the priority queue. Note 
that the priority queue
+   * only retains the top elements that fit within the size specified by 
the {@link #init(int, BufferAllocator, boolean)}
+   * method.
+   * @param batch The batch containing elements we want to add.
+   * @throws SchemaChangeException
+   */
+  void add(RecordBatchData batch) throws SchemaChangeException;
 
+  /**
+   * Initializes the priority queue. This method must be called before any 
other methods on the priority
+   * queue are called.
+   * @param limit The size of the priority queue.
+   * @param allocator The {@link BufferAllocator} to use when creating the 
priority queue.
+   * @param hasSv2 True when incoming batches have v2 selection vectors. 
False otherwise.
+   * @throws SchemaChangeException
+   */
+  void init(int limit, BufferAllocator allocator, boolean hasSv2) throws 
SchemaChangeException;
+
+  /**
+   * This method must be called before fetching the final heap hyper batch 
and final Sv4 vector.
+   * @throws SchemaChangeException
+   */
+  void generate() throws SchemaChangeException;
+
+  /**
+   * Retrieves the final priority queue HyperBatch containing the results. 
Note: this should be called
+   * after {@link #generate()}.
+   * @return The final priority queue HyperBatch containing the results.
+   */
+  VectorContainer getHyperBatch();
+
+  SelectionVector4 getSv4();
+
+  /**
+   * Retrieves the selection vector used to select the elements in the 
priority queue from the hyper batch
+   * provided by the {@link #getHyperBatch()} method. Note: this 
should be called after {@link #generate()}.
+   * @return The selection vector used to select the elements in the 
priority queue.
+   */
+  SelectionVector4 getFinalSv4();
--- End diff --

The code already works that way. There is a config option 
drill.exec.sort.purge.threshold which controls the maximum number of 
batches allowed in the hyper batch. Once the threshold is exceeded the top N 
values are consolidated into a single batch and the process is repeated. There 
is an issue in the case where the limit is large. Ex. 100,000,000 . In this 
case the operator will keep all the records in memory and die. There is a jira 
created to address this issue: 
[https://issues.apache.org/jira/browse/DRILL-5823]


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210341#comment-16210341
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145572887
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/PriorityQueue.java
 ---
@@ -20,22 +20,58 @@
 import org.apache.drill.exec.compile.TemplateClassDefinition;
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 
 public interface PriorityQueue {
-  public void add(FragmentContext context, RecordBatchData batch) throws 
SchemaChangeException;
-  public void init(int limit, FragmentContext context, BufferAllocator 
allocator, boolean hasSv2) throws SchemaChangeException;
-  public void generate() throws SchemaChangeException;
-  public VectorContainer getHyperBatch();
-  public SelectionVector4 getHeapSv4();
-  public SelectionVector4 getFinalSv4();
-  public boolean validate();
-  public void resetQueue(VectorContainer container, SelectionVector4 
vector4) throws SchemaChangeException;
-  public void cleanup();
-
-  public static TemplateClassDefinition TEMPLATE_DEFINITION 
= new TemplateClassDefinition(PriorityQueue.class, 
PriorityQueueTemplate.class);
+  /**
+   * The elements in the given batch are added to the priority queue. Note 
that the priority queue
+   * only retains the top elements that fit within the size specified by 
the {@link #init(int, BufferAllocator, boolean)}
+   * method.
+   * @param batch The batch containing elements we want to add.
+   * @throws SchemaChangeException
+   */
+  void add(RecordBatchData batch) throws SchemaChangeException;
 
+  /**
+   * Initializes the priority queue. This method must be called before any 
other methods on the priority
+   * queue are called.
+   * @param limit The size of the priority queue.
+   * @param allocator The {@link BufferAllocator} to use when creating the 
priority queue.
+   * @param hasSv2 True when incoming batches have v2 selection vectors. 
False otherwise.
--- End diff --

fixed


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210335#comment-16210335
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner-mapr commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145572262
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/PriorityQueue.java
 ---
@@ -20,22 +20,58 @@
 import org.apache.drill.exec.compile.TemplateClassDefinition;
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 
 public interface PriorityQueue {
-  public void add(FragmentContext context, RecordBatchData batch) throws 
SchemaChangeException;
-  public void init(int limit, FragmentContext context, BufferAllocator 
allocator, boolean hasSv2) throws SchemaChangeException;
-  public void generate() throws SchemaChangeException;
-  public VectorContainer getHyperBatch();
-  public SelectionVector4 getHeapSv4();
-  public SelectionVector4 getFinalSv4();
-  public boolean validate();
-  public void resetQueue(VectorContainer container, SelectionVector4 
vector4) throws SchemaChangeException;
-  public void cleanup();
-
-  public static TemplateClassDefinition TEMPLATE_DEFINITION 
= new TemplateClassDefinition(PriorityQueue.class, 
PriorityQueueTemplate.class);
+  /**
--- End diff --

I think the priority queue is generic and not specific to TopN. I agree we 
should make an effort to consolidate the code generated data structures.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210314#comment-16210314
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner-mapr commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145571158
  
--- Diff: common/src/test/java/org/apache/drill/test/DirTestWatcher.java ---
@@ -32,23 +32,50 @@
 public class DirTestWatcher extends TestWatcher {
   private String dirPath;
   private File dir;
+  private boolean deleteDirAtEnd = true;
--- End diff --

Done


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210210#comment-16210210
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user julianhyde commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145561518
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java ---
@@ -584,11 +492,14 @@ public static void defineWorkspace(Drillbit drillbit, 
String pluginName,
   public static final String EXPLAIN_PLAN_TEXT = "text";
   public static final String EXPLAIN_PLAN_JSON = "json";
 
-  public static FixtureBuilder builder() {
-FixtureBuilder builder = new FixtureBuilder()
+  public static FixtureBuilder builder(DirTestWatcher dirTestWatcher) {
--- End diff --

Jeez Paul, please start a review rather than making single review comments. 
I just got 39 emails from you, and so did everyone else on dev@drill.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210209#comment-16210209
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner-mapr commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145561373
  
--- Diff: common/src/main/java/org/apache/drill/common/util/TestTools.java 
---
@@ -17,15 +17,28 @@
  */
 package org.apache.drill.common.util;
 
+import java.io.File;
+import java.io.IOException;
 import java.nio.file.Paths;
 
+import org.apache.commons.io.FileUtils;
 import org.junit.rules.TestName;
 import org.junit.rules.TestRule;
 import org.junit.rules.Timeout;
 
+import static org.apache.drill.common.util.TestTools.DataType.PROJECT;
+import static org.apache.drill.common.util.TestTools.DataType.RESOURCE;
+
 public class TestTools {
   // private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(TestTools.class);
 
+  public enum DataType {
--- End diff --

Done changed to FileSource


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210168#comment-16210168
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145298677
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/agg/TestHashAggrSpill.java
 ---
@@ -43,146 +44,122 @@
 import static org.junit.Assert.assertTrue;
 
 /**
- *  Test spilling for the Hash Aggr operator (using the mock reader)
+ * Test spilling for the Hash Aggr operator (using the mock reader)
  */
 @Category({SlowTest.class, OperatorTest.class})
 public class TestHashAggrSpill {
 
-private void runAndDump(ClientFixture client, String sql, long 
expectedRows, long spillCycle, long fromSpilledPartitions, long 
toSpilledPartitions) throws Exception {
-String plan = client.queryBuilder().sql(sql).explainJson();
-
-QueryBuilder.QuerySummary summary = 
client.queryBuilder().sql(sql).run();
-if ( expectedRows > 0 ) {
-assertEquals(expectedRows, summary.recordCount());
-}
-// System.out.println(String.format(" \n Results: %,d 
records, %d batches, %,d ms\n ", summary.recordCount(), 
summary.batchCount(), summary.runTimeMs() ) );
-
-//System.out.println("Query ID: " + summary.queryIdString());
-ProfileParser profile = 
client.parseProfile(summary.queryIdString());
-//profile.print();
-List ops = 
profile.getOpsOfType(UserBitShared.CoreOperatorType.HASH_AGGREGATE_VALUE);
-
-assertTrue( ! ops.isEmpty() );
-// check for the first op only
-ProfileParser.OperatorProfile hag0 = ops.get(0);
-long opCycle = 
hag0.getMetric(HashAggTemplate.Metric.SPILL_CYCLE.ordinal());
-assertEquals(spillCycle, opCycle);
-long op_spilled_partitions = 
hag0.getMetric(HashAggTemplate.Metric.SPILLED_PARTITIONS.ordinal());
-assertTrue( op_spilled_partitions >= fromSpilledPartitions && 
op_spilled_partitions <= toSpilledPartitions );
-/* assertEquals(3, ops.size());
-for ( int i = 0; i < ops.size(); i++ ) {
-ProfileParser.OperatorProfile hag = ops.get(i);
-long cycle = 
hag.getMetric(HashAggTemplate.Metric.SPILL_CYCLE.ordinal());
-long num_partitions = 
hag.getMetric(HashAggTemplate.Metric.NUM_PARTITIONS.ordinal());
-long spilled_partitions = 
hag.getMetric(HashAggTemplate.Metric.SPILLED_PARTITIONS.ordinal());
-long mb_spilled = 
hag.getMetric(HashAggTemplate.Metric.SPILL_MB.ordinal());
-System.out.println(String.format("(%d) Spill cycle: %d, num 
partitions: %d, spilled partitions: %d, MB spilled: %d", i,cycle, 
num_partitions, spilled_partitions,
-mb_spilled));
-} */
-}
+  @Rule
+  public final DirTestWatcher dirTestWatcher = new DirTestWatcher();
 
-/**
- *  A template for Hash Aggr spilling tests
- *
- * @throws Exception
- */
-private void testSpill(long maxMem, long numPartitions, long 
minBatches, int maxParallel, boolean fallback ,boolean predict,
-   String sql, long expectedRows, int cycle, int 
fromPart, int toPart) throws Exception {
-LogFixture.LogFixtureBuilder logBuilder = LogFixture.builder()
-  .toConsole()
-  //.logger("org.apache.drill.exec.physical.impl.aggregate", 
Level.INFO)
-  .logger("org.apache.drill", Level.WARN)
-  ;
-
-FixtureBuilder builder = ClusterFixture.builder()
-  .sessionOption(ExecConstants.HASHAGG_MAX_MEMORY_KEY,maxMem)
-  
.sessionOption(ExecConstants.HASHAGG_NUM_PARTITIONS_KEY,numPartitions)
-  
.sessionOption(ExecConstants.HASHAGG_MIN_BATCHES_PER_PARTITION_KEY,minBatches)
-  
.configProperty(ExecConstants.SYS_STORE_PROVIDER_LOCAL_ENABLE_WRITE, false)
-  .sessionOption(PlannerSettings.FORCE_2PHASE_AGGR_KEY,true)
-  // .sessionOption(PlannerSettings.EXCHANGE.getOptionName(), true)
-  .sessionOption(ExecConstants.HASHAGG_FALLBACK_ENABLED_KEY, 
fallback)
-  
.sessionOption(ExecConstants.HASHAGG_USE_MEMORY_PREDICTION_KEY,predict)
-
-  .maxParallelization(maxParallel)
-  .saveProfiles()
-  //.keepLocalFiles()
-  ;
-String sqlStr = sql != null ? sql :  // if null then use this 
default query
-  "SELECT empid_s17, dept_i, branch_i, AVG(salary_i) FROM 
`mock`.`employee_1200K` GROUP BY empid_s17, dept_i, branch_i";
-
-try (LogFixture 

[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210150#comment-16210150
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144936263
  
--- Diff: common/src/test/java/org/apache/drill/test/SubDirTestWatcher.java 
---
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+import org.junit.rules.TestWatcher;
+import org.junit.runner.Description;
+
+import java.io.File;
+import java.util.List;
+
+public class SubDirTestWatcher extends TestWatcher {
--- End diff --

Maybe a Javadoc comment to explain what this does? From the name, it must 
have something to do with a temporary sub dir... But, how does this relate to 
the DirTestWatcher?


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210157#comment-16210157
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145292056
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java
 ---
@@ -335,20 +336,42 @@ private void purge() throws SchemaChangeException {
 logger.debug("Took {} us to purge", 
watch.elapsed(TimeUnit.MICROSECONDS));
   }
 
-  public PriorityQueue createNewPriorityQueue(FragmentContext context, 
List orderings,
- VectorAccessible 
batch, MappingSet mainMapping, MappingSet leftMapping, MappingSet rightMapping)
-  throws ClassTransformationException, IOException, 
SchemaChangeException{
-CodeGenerator cg = 
CodeGenerator.get(PriorityQueue.TEMPLATE_DEFINITION, 
context.getFunctionRegistry(), context.getOptions());
+  private PriorityQueue createNewPriorityQueue(VectorAccessible batch, int 
limit)
+throws SchemaChangeException, ClassTransformationException, 
IOException {
+return createNewPriorityQueue(
+  mainMapping, leftMapping, rightMapping, context.getOptionSet(), 
context.getFunctionRegistry(), context.getDrillbitContext().getCompiler(),
+  config.getOrderings(), batch, unionTypeEnabled, codegenDump, limit, 
oContext.getAllocator(), schema.getSelectionVectorMode());
+  }
+
+  public static MappingSet createMainMappingSet() {
+return new MappingSet((String) null, null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+  }
+
+  public static MappingSet createLeftMappingSet() {
+return new MappingSet("leftIndex", null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+  }
+
+  public static MappingSet createRightMappingSet() {
+return new MappingSet("rightIndex", null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+  }
+
+  public static PriorityQueue createNewPriorityQueue(
+MappingSet mainMapping, MappingSet leftMapping, MappingSet 
rightMapping,
--- End diff --

One thought worth sharing here. Today, our code cache uses source code as 
the key to look for an existing compiled version of a class. But, this is 
inefficient: we have to generate the source code, transform it, and store two 
copies to act as keys.

Better would be to have a "class definition" object that holds all 
parameters needed to generate the code. Store these in the cache. Likely, these 
will be much smaller than the generated code. Using these avoids the need to 
generate the code to see if we already have a copy of that code.

The class definition idea works as long as the definition is a closed set: 
everything needed to generate the code is in that one definition object.

Your function here is close: it passes in a closed set of items. The open 
bits are the expression and system/session options. For options, we can copy 
the few values we need into our definition. The expression might require a bit 
more thought to capture.

So, if we can evolve your idea further, we can get a potential large 
per-schema performance improvement by avoiding code generation when doing a 
repeat query.

Too much to fix here; but something to think about moving forward.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210169#comment-16210169
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145505524
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestCorruptParquetDateCorrection.java
 ---
@@ -76,43 +78,50 @@
   //- detecting corrupt values must be deferred to actual data 
page reading
   //- one from 1.4, where there is a proper created-by, but the 
corruption is present
   private static final String MIXED_CORRUPTED_AND_CORRECT_DATES_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions";
+  "/parquet/4203_corrupt_dates/mixed_drill_versions";
   // partitioned with 1.2.0, no certain metadata that these were written 
with Drill
   // the value will be checked to see that they look corrupt and they will 
be corrected
   // by default. Users can use the format plugin option 
autoCorrectCorruptDates to disable
   // this behavior if they have foreign parquet files with valid rare date 
values that are
   // in the similar range as Drill's corrupt values
+  private static final String PARTITIONED_1_2_FOLDER = 
"partitioned_with_corruption_4203_1_2";
   private static final String CORRUPTED_PARTITIONED_DATES_1_2_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203_1_2";
+  Paths.get("/parquet/4203_corrupt_dates", 
PARTITIONED_1_2_FOLDER).toString();
--- End diff --

Again, do we want an absolute-looking path when we actually have a relative 
path?


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210176#comment-16210176
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145516858
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BatchUtils.java ---
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.exec.vector.ValueVector;
+import org.junit.Assert;
+
+import java.io.UnsupportedEncodingException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+
+public class BatchUtils {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchUtils.class);
+
+  public static Map 
containerToObjects(VectorContainer vectorContainer) {
+Map rows = Maps.newHashMap();
+int numCols = vectorContainer.getNumberOfColumns();
+int recordCount = vectorContainer.getRecordCount();
+
+for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
+  String columnName = 
vectorContainer.getSchema().getColumn(columnIndex).getName();
+  List data = Lists.newArrayList();
+
+  ValueVector.Accessor valueVectorAccessor = vectorContainer
+.getValueVector(columnIndex)
+.getValueVector()
+.getAccessor();
+
+  for (int recordIndex = 0; recordIndex < recordCount; recordIndex++) {
+data.add(valueVectorAccessor.getObject(recordIndex));
+  }
+
+  rows.put(columnName, data);
+}
+
+return rows;
+  }
+
+  public static Map 
hyperBatchAndSelectorToObjects(VectorContainer vectorContainer, 
SelectionVector4 selectionVector4) {
--- End diff --

See `HyperRowSet`


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210158#comment-16210158
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145292655
  
--- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
@@ -247,7 +248,7 @@ drill.exec: {
 }
   },
   sort: {
-purge.threshold : 10,
+purge.threshold : 1000,
--- End diff --

Are you reverting your own change here?


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210162#comment-16210162
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145292505
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/selection/SelectionVector4.java
 ---
@@ -116,6 +116,14 @@ public void clear() {
 }
   }
 
+  public static int getBatchIndex(int sv4Index) {
+return (sv4Index >> 16) & 0x;
+  }
+
+  public static int getRecordIndex(int sv4Index) {
+return (sv4Index) & 0x;
+  }
--- End diff --

I see you gave into the temptation to wrap these magic expressions in 
methods rather than repeating them over and over in generated code. The worry 
probably was that the overhead of a method call per value would be expensive. 
Did you do a bit of performance testing to demonstrate that Java inlines these 
methods so we get the same performance with this code as the original?


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210155#comment-16210155
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144945872
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/PriorityQueue.java
 ---
@@ -20,22 +20,58 @@
 import org.apache.drill.exec.compile.TemplateClassDefinition;
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 
 public interface PriorityQueue {
-  public void add(FragmentContext context, RecordBatchData batch) throws 
SchemaChangeException;
-  public void init(int limit, FragmentContext context, BufferAllocator 
allocator, boolean hasSv2) throws SchemaChangeException;
-  public void generate() throws SchemaChangeException;
-  public VectorContainer getHyperBatch();
-  public SelectionVector4 getHeapSv4();
-  public SelectionVector4 getFinalSv4();
-  public boolean validate();
-  public void resetQueue(VectorContainer container, SelectionVector4 
vector4) throws SchemaChangeException;
-  public void cleanup();
-
-  public static TemplateClassDefinition TEMPLATE_DEFINITION 
= new TemplateClassDefinition(PriorityQueue.class, 
PriorityQueueTemplate.class);
+  /**
--- End diff --

TopN has a priority queue. So does the sort. Are there others? Should we 
have a single, shared implementation that we can optimize and test once, but 
benefit from multiple times? Or, is the TopN version different enough from the 
sort version that we need two versions? If we need two, should improvements 
here (especially for code gen) be applied to the sort version?


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210149#comment-16210149
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144935089
  
--- Diff: common/src/main/java/org/apache/drill/common/util/TestTools.java 
---
@@ -17,15 +17,28 @@
  */
 package org.apache.drill.common.util;
 
+import java.io.File;
+import java.io.IOException;
 import java.nio.file.Paths;
 
+import org.apache.commons.io.FileUtils;
 import org.junit.rules.TestName;
 import org.junit.rules.TestRule;
 import org.junit.rules.Timeout;
 
+import static org.apache.drill.common.util.TestTools.DataType.PROJECT;
+import static org.apache.drill.common.util.TestTools.DataType.RESOURCE;
+
 public class TestTools {
   // private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(TestTools.class);
 
+  public enum DataType {
--- End diff --

`ResourceType`? `FileType`? `FileSource`? `ResourceScope`?

This is a symbolic reference to a resource root, not really a kind of a 
data item...


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210185#comment-16210185
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145523267
  
--- Diff: exec/java-exec/src/test/resources/topN/one_key_sort.json ---
@@ -12,11 +12,11 @@
 pop:"mock-scan",
 url: "http://apache.org;,
 entries:[
-{records: 1000, types: [
+{records: 10, types: [
--- End diff --

This change reduces the record count by two orders of magnitude. Do we know 
if this results in testing the same functionality as the original test?


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210159#comment-16210159
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145292776
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestAggNullable.java ---
@@ -21,29 +21,25 @@
 
 import org.apache.drill.categories.OperatorTest;
 import org.apache.drill.common.util.TestTools;
+import org.apache.drill.test.BaseTestQuery;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
 
 @Category(OperatorTest.class)
-public class TestAggNullable extends BaseTestQuery{
+public class TestAggNullable extends BaseTestQuery {
   static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(TestAggNullable.class);
 
-  static final String WORKING_PATH = TestTools.getWorkingPath();
-  static final String TEST_RES_PATH = WORKING_PATH + "/src/test/resources";
-
   private static void enableAggr(boolean ha, boolean sa) throws Exception {
 
-test(String.format("alter session set `planner.enable_hashagg` = %s", 
ha ? "true":"false"));
-test(String.format("alter session set `planner.enable_streamagg` = 
%s", sa ? "true":"false"));
+test("alter session set `planner.enable_hashagg` = %s", ha);
+test("alter session set `planner.enable_streamagg` = %s", sa);
--- End diff --

I was about to suggest using the functions created for this purpose. But, 
then I realized those changes are in a PR that has not yet been reviewed...


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210180#comment-16210180
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145519174
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java ---
@@ -213,7 +204,7 @@ private void configureZk(FixtureBuilder builder) {
 }
   }
 
-  private void createConfig(FixtureBuilder builder) throws Exception {
+  private void createConfig() throws Exception {
--- End diff --

There are actually three ZK cases:

* Does not use ZK at all.
* Uses ZK, but it is local (mini ZK started within test JDK)
* Uses ZK, but ZK is remote. (Seldom used in tests, needed for an embedded 
Drillbit.)

Are these semantics preserved with the present changes?


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210154#comment-16210154
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144943191
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/CodeGenerator.java ---
@@ -90,12 +89,11 @@
   private String generatedCode;
   private String generifiedCode;
 
-  CodeGenerator(TemplateClassDefinition definition, 
FunctionImplementationRegistry funcRegistry, OptionSet optionManager) {
-this(ClassGenerator.getDefaultMapping(), definition, funcRegistry, 
optionManager);
+  CodeGenerator(TemplateClassDefinition definition, OptionSet 
optionManager) {
+this(ClassGenerator.getDefaultMapping(), definition, optionManager);
--- End diff --

Good analysis. It was a real pain to lug the function registry around. 
Presumably the registry is used elsewhere in code gen; perhaps in the 
expression materializer.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210161#comment-16210161
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145293197
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestAggNullable.java ---
@@ -61,10 +57,8 @@ public void testHashAggNullableColumns() throws 
Exception {
 
   @Test  // StreamingAgg on nullable columns
   public void testStreamAggNullableColumns() throws Exception {
-String query1 = String.format("select t2.b2 from 
dfs_test.`%s/jsoninput/nullable2.json` t2 " +
-" group by t2.b2", TEST_RES_PATH);
-String query2 = String.format("select t2.a2, t2.b2 from 
dfs_test.`%s/jsoninput/nullable2.json` t2 " +
-" group by t2.a2, t2.b2", TEST_RES_PATH);
+String query1 = "select t2.b2 from cp.`/jsoninput/nullable2.json` t2 
group by t2.b2";
+String query2 = "select t2.a2, t2.b2 from 
cp.`/jsoninput/nullable2.json` t2 group by t2.a2, t2.b2";
--- End diff --

Very nice improvement; this has been bugging me for months. Why doctor up 
the query string when we have perfectly fine workspace mechanism.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210184#comment-16210184
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145521268
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java ---
@@ -584,11 +492,14 @@ public static void defineWorkspace(Drillbit drillbit, 
String pluginName,
   public static final String EXPLAIN_PLAN_TEXT = "text";
   public static final String EXPLAIN_PLAN_JSON = "json";
 
-  public static FixtureBuilder builder() {
-FixtureBuilder builder = new FixtureBuilder()
+  public static FixtureBuilder builder(DirTestWatcher dirTestWatcher) {
--- End diff --

See note above about passing in the test watcher via a method instead of 
the constructor.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210152#comment-16210152
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144945381
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/PriorityQueue.java
 ---
@@ -20,22 +20,58 @@
 import org.apache.drill.exec.compile.TemplateClassDefinition;
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 
 public interface PriorityQueue {
-  public void add(FragmentContext context, RecordBatchData batch) throws 
SchemaChangeException;
-  public void init(int limit, FragmentContext context, BufferAllocator 
allocator, boolean hasSv2) throws SchemaChangeException;
-  public void generate() throws SchemaChangeException;
-  public VectorContainer getHyperBatch();
-  public SelectionVector4 getHeapSv4();
-  public SelectionVector4 getFinalSv4();
-  public boolean validate();
-  public void resetQueue(VectorContainer container, SelectionVector4 
vector4) throws SchemaChangeException;
-  public void cleanup();
-
-  public static TemplateClassDefinition TEMPLATE_DEFINITION 
= new TemplateClassDefinition(PriorityQueue.class, 
PriorityQueueTemplate.class);
+  /**
+   * The elements in the given batch are added to the priority queue. Note 
that the priority queue
+   * only retains the top elements that fit within the size specified by 
the {@link #init(int, BufferAllocator, boolean)}
+   * method.
+   * @param batch The batch containing elements we want to add.
+   * @throws SchemaChangeException
+   */
+  void add(RecordBatchData batch) throws SchemaChangeException;
 
+  /**
+   * Initializes the priority queue. This method must be called before any 
other methods on the priority
+   * queue are called.
+   * @param limit The size of the priority queue.
+   * @param allocator The {@link BufferAllocator} to use when creating the 
priority queue.
+   * @param hasSv2 True when incoming batches have v2 selection vectors. 
False otherwise.
+   * @throws SchemaChangeException
+   */
+  void init(int limit, BufferAllocator allocator, boolean hasSv2) throws 
SchemaChangeException;
+
+  /**
+   * This method must be called before fetching the final heap hyper batch 
and final Sv4 vector.
+   * @throws SchemaChangeException
+   */
+  void generate() throws SchemaChangeException;
+
+  /**
+   * Retrieves the final priority queue HyperBatch containing the results. 
Note: this should be called
+   * after {@link #generate()}.
+   * @return The final priority queue HyperBatch containing the results.
+   */
+  VectorContainer getHyperBatch();
+
+  SelectionVector4 getSv4();
+
+  /**
+   * Retrieves the selection vector used to select the elements in the 
priority queue from the hyper batch
+   * provided by the {@link #getHyperBatch()} method. Note: this 
should be called after {@link #generate()}.
+   * @return The selection vector used to select the elements in the 
priority queue.
+   */
+  SelectionVector4 getFinalSv4();
--- End diff --

Taking a step back... Suppose I do a top 100. Also, suppose my batches are 
big, say 100 MB each. Also, suppose I have fiendishly organized by data so that 
the top 100 values are the first values of 100 batches.

If the priority queue uses an SV4, then it means that the queue holds onto 
the entire batch that holds the set of top values. In my case, since the top 
100 values occur across 100 batches, I'm holding onto 100 of batches of 100 MB 
each for a total of 10 GB of memory.

Is this how it works?

Do we need to think about compaction at some point? Once we have buffered, 
say, five batches, we copy the values to a new, much smaller, batch and discard 
the inputs. We repeat this over and over to keep the number of buffered batches 
under control.

Once we control batch size (in bytes; we already control records), we'll 
have the means to set a memory budget for TopN, then use consolidation to stay 
within that budget.

Or, is this how the code already works?


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: 

[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210183#comment-16210183
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145522320
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/ExampleTest.java ---
@@ -59,6 +59,9 @@
 @Ignore
 public class ExampleTest {
 
+  @Rule
+  public final DirTestWatcher dirTestWatcher = new DirTestWatcher();
--- End diff --

This is meant as an example, so perhaps include some Javadoc to explain 
what this is and how to use it. Maybe explain how to use the temp directories, 
and include a new example test below that illustrates the main points. People 
are busy, they don't often don't have time to read documentation, but they 
often will copy & paste a good example...


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210164#comment-16210164
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145294152
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestAltSortQueries.java ---
@@ -64,9 +73,9 @@ public void testJoinWithLimit() throws Exception{
 "  nations.N_NAME,\n" +
 "  regions.R_NAME\n" +
 "FROM\n" +
-"  dfs_test.`[WORKING_PATH]/../../sample-data/nation.parquet` 
nations\n" +
+"  dfs.`/sample-data/nation.parquet` nations\n" +
--- End diff --

Does this work? Did we remap `dfs` to point the per-query temp directory? 
Do we want to do that? Or, should we use `dfs_test` here as well?


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210171#comment-16210171
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145515947
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BaseDirTestWatcher.java ---
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.base.Charsets;
+import org.apache.commons.io.FileUtils;
+import org.apache.drill.common.util.TestTools;
+import org.apache.drill.exec.util.TestUtilities;
+import org.junit.runner.Description;
+
+import java.io.File;
+import java.io.IOException;
+
+public class BaseDirTestWatcher extends DirTestWatcher {
+  public enum DirType {
+ROOT,
+TEST_TMP;
+  }
+
+  private File tmpDir;
+  private File storeDir;
+  private File dfsTestTmpParentDir;
+  private File dfsTestTmpDir;
+  private File rootDir;
+
+  public BaseDirTestWatcher() {
+super();
+  }
+
+  @Override
+  protected void starting(Description description) {
+super.starting(description);
+
+rootDir = makeSubDir("root");
+tmpDir = new File(rootDir, "tmp");
+storeDir = new File(rootDir, "store");
+dfsTestTmpParentDir = new File(rootDir, "dfsTestTmp");
+
+tmpDir.mkdirs();
+storeDir.mkdirs();
+dfsTestTmpParentDir.mkdirs();
+  }
+
+  public File getTmpDir() {
+return tmpDir;
+  }
+
+  public File getStoreDir() {
+return storeDir;
+  }
+
+  public File getDfsTestTmpParentDir() {
+return dfsTestTmpParentDir;
+  }
+
+  public File getDfsTestTmpDir() {
+return dfsTestTmpDir;
+  }
+
+  public String getDfsTestTmpDirPath() {
+return dfsTestTmpDir.getAbsolutePath().replaceAll("/\\./", "/");
+  }
+
+  public File getRootDir() {
+return rootDir;
+  }
+
+  public String getRootDirPath() {
+return rootDir.getAbsolutePath().replaceAll("/\\./", "/");
+  }
+
+  public void newDfsTestTmpDir() {
+dfsTestTmpDir = 
TestUtilities.createTempDir(BaseTestQuery.dirTestWatcher.getDfsTestTmpParentDir());
+  }
+
+  private File getDir(DirType type) {
+switch (type) {
+  case ROOT:
+return rootDir;
+  case TEST_TMP:
+return dfsTestTmpDir;
+  default:
+throw new IllegalArgumentException(String.format("Unsupported type 
%s", type));
+}
+  }
+
+  public File makeRootSubDir(String relPath) {
--- End diff --

Here is where we might have a bit of a discussion. Should we represent 
directory paths as:

* Strings (as done here)
* Java NIO `Path` objects (which Java seems to want to be the new standard)
* Hadoop `Path` objects (since that is what the Drill file system uses)
* Good old fashioned Java `File` objects (which, while old, works well in 
common cases)

I would suggest we use `File` for references to file system paths:

* The Java `Path` has the same name as the Hadoop `Path`, causing endless 
confusion.
* Hadoop `Path` is overkill for simple uses in tests.
* `String` encourages people to write their own code to join paths by 
concatenating.

The use of `File` keeps things simple and standard. Concatenation is simple.

Other solutions are, of course, possible. We could use `String` and 
replicate the methods on `File` or either of the `Path` classes, say. The point 
is, let's pick a standard and use it everywhere.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: 

[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210156#comment-16210156
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144946834
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java
 ---
@@ -335,20 +336,42 @@ private void purge() throws SchemaChangeException {
 logger.debug("Took {} us to purge", 
watch.elapsed(TimeUnit.MICROSECONDS));
   }
 
-  public PriorityQueue createNewPriorityQueue(FragmentContext context, 
List orderings,
- VectorAccessible 
batch, MappingSet mainMapping, MappingSet leftMapping, MappingSet rightMapping)
-  throws ClassTransformationException, IOException, 
SchemaChangeException{
-CodeGenerator cg = 
CodeGenerator.get(PriorityQueue.TEMPLATE_DEFINITION, 
context.getFunctionRegistry(), context.getOptions());
+  private PriorityQueue createNewPriorityQueue(VectorAccessible batch, int 
limit)
+throws SchemaChangeException, ClassTransformationException, 
IOException {
+return createNewPriorityQueue(
+  mainMapping, leftMapping, rightMapping, context.getOptionSet(), 
context.getFunctionRegistry(), context.getDrillbitContext().getCompiler(),
+  config.getOrderings(), batch, unionTypeEnabled, codegenDump, limit, 
oContext.getAllocator(), schema.getSelectionVectorMode());
+  }
+
+  public static MappingSet createMainMappingSet() {
+return new MappingSet((String) null, null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+  }
+
+  public static MappingSet createLeftMappingSet() {
+return new MappingSet("leftIndex", null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+  }
+
+  public static MappingSet createRightMappingSet() {
+return new MappingSet("rightIndex", null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+  }
+
+  public static PriorityQueue createNewPriorityQueue(
+MappingSet mainMapping, MappingSet leftMapping, MappingSet 
rightMapping,
--- End diff --

Do we ever use more than one mapping set? If not, can we just refer to the 
constants inside this function?

I see that a simpler form exists. Just curious why we also need the complex 
form. For better testing?


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210174#comment-16210174
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145508960
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/TestExternalSort.java
 ---
@@ -138,34 +141,34 @@ public void testNewColumnsManaged() throws Exception {
 testNewColumns(false);
   }
 
-
   @Test
   public void testNewColumnsLegacy() throws Exception {
 testNewColumns(true);
   }
 
   private void testNewColumns(boolean testLegacy) throws Exception {
 final int record_count = 1;
-String dfs_temp = getDfsTestTmpSchemaLocation();
-System.out.println(dfs_temp);
-File table_dir = new File(dfs_temp, "newColumns");
-table_dir.mkdir();
-try (BufferedOutputStream os = new BufferedOutputStream(new 
FileOutputStream(new File(table_dir, "a.json" {
-  String format = "{ a : %d, b : %d }%n";
-  for (int i = 0; i <= record_count; i += 2) {
-os.write(String.format(format, i, i).getBytes());
-  }
+final String tableDirName = "newColumns";
+
+TableFileBuilder tableA = new TableFileBuilder(Lists.newArrayList("a", 
"b"), Lists.newArrayList("%d", "%d"));
--- End diff --

Clever -- but how do we handle types? The original code created JSON of the 
form:

```
{ a : 10, b : 20 }
```

Aside from the fact that the labels are not true JSON (not quoted), the 
mechanism does not work for strings, which need to be quoted. The mechanism 
here assumes numbers. But, of course, if we assume numbers, we don't need the 
second argument, the "%d".

If you look in the `ClusterFixture` code, you'll see a method 
`ClusterFixture.stringify()` that converts an Object to a SQL-compatible 
string. We could create a similar one for JSON.

But, if we take a step back, we are creating JSON. Perfectly fine JSON 
builder classes exist that can be used to build up type-aware JSON and render 
the result as a string. The one limitation is that these classes are for single 
documents; they can't handle the non-standard list-of-objects format that Drill 
users. Still, we can use the per-object classes to build up the list of objects.

Yet another solution is to use the `RowSet` classes which are type aware. 
Yes, they build value vectors, but that is not important here. What is 
important is that they use the full schema power of Drill: including data type, 
repeated, nullable, etc. Then, we just create a RowSet-to-JSON converter. In 
fact, I may have something like that in the "Jig" project I did earlier; I'll 
rummage around and see if I can find it. 


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210163#comment-16210163
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145294064
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestAltSortQueries.java ---
@@ -19,24 +19,33 @@
 
 import org.apache.drill.categories.OperatorTest;
 import org.apache.drill.categories.SqlTest;
+import org.apache.drill.test.BaseTestQuery;
+import org.junit.BeforeClass;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
 
 @Category({SqlTest.class, OperatorTest.class})
-public class TestAltSortQueries extends BaseTestQuery{
+public class TestAltSortQueries extends BaseTestQuery {
   static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(TestAltSortQueries.class);
 
+  @BeforeClass
+  public static void setupTestFiles() {
+dirTestWatcher.copyFileToRoot("sample-data/region.parquet");
+dirTestWatcher.copyFileToRoot("sample-data/regionsSF");
+dirTestWatcher.copyFileToRoot("sample-data/nation.parquet");
+  }
+
   @Test
   public void testOrderBy() throws Exception{
 test("select R_REGIONKEY " +
- "from dfs_test.`[WORKING_PATH]/../../sample-data/region.parquet` 
" +
+ "from dfs_test.`/sample-data/region.parquet` " +
--- End diff --

I like this improvement. Here we created files just for this one test. You 
are using a workspace to point to those files. Do so avoids the need to doctor 
up the strings.

Is `dfs_test` setup to point to the per-query root directory? Seems to be.

Since I have to ask these questions, would be good to explain in the 
`...drill.test` `package-info.java` file how to set up temporary files and the 
mapping of workspaces to these directories. That will allow others to readily 
use the mechanism. Otherwise, people will guess, and will go back to using what 
is familiar. See `ExampleTest.java` for an example of a "starter" file to help 
people get started.

In fact, maybe we would add a new example test to that file that shows how 
to create a test-specific data file and query that file. (That would sure help 
me...)


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210186#comment-16210186
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145556819
  
--- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
@@ -908,6 +908,15 @@ public void generateTestData(int count) {
 }
 
   <#else> <#-- type.width <= 8 -->
+@Override
+public void add(Object value) {
+  int index = accessor.getValueCount();
+  int valueCount = index + 1;
+  setValueCount(valueCount);
+
+  set(index, (${type.javaType}) value);
--- End diff --

I don't think this will work as you hope it will -- for obscure reasons 
that we've all learned the hard way...

* The accessor does not keep an ongoing count of values. Rather, the value 
count comes from the writer index which is set only at the end of a write.
* This method sets the value count, but doing so is, in general, expensive.
* Vectors in a batch must be kept in sync. This adds a value to one vector, 
but does not coordinate across a set of vectors.
* Not all values can be set by casting. Works for some numeric values, but 
does not handle, say, `add(10)` when the item is a byte or a short because of 
the need for a two-step cast: `(byte)(int) value`.
* Does not handle conversions for date or decimal.

Yes, this is all a colossal pain in the neck. But, it is the reason that 
the `RowSet` classes were created.

We can talk in person about how to move this idea forward effectively.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210173#comment-16210173
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145505747
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestCorruptParquetDateCorrection.java
 ---
@@ -76,43 +78,50 @@
   //- detecting corrupt values must be deferred to actual data 
page reading
   //- one from 1.4, where there is a proper created-by, but the 
corruption is present
   private static final String MIXED_CORRUPTED_AND_CORRECT_DATES_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/mixed_drill_versions";
+  "/parquet/4203_corrupt_dates/mixed_drill_versions";
   // partitioned with 1.2.0, no certain metadata that these were written 
with Drill
   // the value will be checked to see that they look corrupt and they will 
be corrected
   // by default. Users can use the format plugin option 
autoCorrectCorruptDates to disable
   // this behavior if they have foreign parquet files with valid rare date 
values that are
   // in the similar range as Drill's corrupt values
+  private static final String PARTITIONED_1_2_FOLDER = 
"partitioned_with_corruption_4203_1_2";
   private static final String CORRUPTED_PARTITIONED_DATES_1_2_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203_1_2";
+  Paths.get("/parquet/4203_corrupt_dates", 
PARTITIONED_1_2_FOLDER).toString();
   // partitioned with 1.4.0, no certain metadata regarding the date 
corruption status.
   // The same detection approach of the corrupt date values as for the 
files partitioned with 1.2.0
+  private static final String PARTITIONED_1_4_FOLDER = 
"partitioned_with_corruption_4203";
   private static final String CORRUPTED_PARTITIONED_DATES_1_4_0_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/partitioned_with_corruption_4203";
+  Paths.get("/parquet/4203_corrupt_dates", 
PARTITIONED_1_4_FOLDER).toString();
   private static final String PARQUET_DATE_FILE_WITH_NULL_FILLED_COLS =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/null_date_cols_with_corruption_4203.parquet";
+  
"/parquet/4203_corrupt_dates/null_date_cols_with_corruption_4203.parquet";
+  private static final String PARTITIONED_1_9_FOLDER = 
"1_9_0_partitioned_no_corruption";
   private static final String CORRECT_PARTITIONED_DATES_1_9_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/1_9_0_partitioned_no_corruption";
+  Paths.get("/parquet/4203_corrupt_dates", 
PARTITIONED_1_9_FOLDER).toString();
   private static final String VARCHAR_PARTITIONED =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/fewtypes_varcharpartition";
+  "/parquet/4203_corrupt_dates/fewtypes_varcharpartition";
   private static final String DATE_PARTITIONED =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/fewtypes_datepartition";
+  "/parquet/4203_corrupt_dates/fewtypes_datepartition";
   private static final String EXCEPTION_WHILE_PARSING_CREATED_BY_META =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/hive1dot2_fewtypes_null";
+  "/parquet/4203_corrupt_dates/hive1dot2_fewtypes_null";
   private static final String CORRECT_DATES_1_6_0_PATH =
-  
"[WORKING_PATH]/src/test/resources/parquet/4203_corrupt_dates/correct_dates_and_old_drill_parquet_writer.parquet";
-  private static final String PARTITIONED_1_2_FOLDER = 
"partitioned_with_corruption_4203_1_2";
+  
"/parquet/4203_corrupt_dates/correct_dates_and_old_drill_parquet_writer.parquet";
   private static final String 
MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER = "mixed_partitioned";
 
-
   @BeforeClass
   public static void initFs() throws Exception {
+dirTestWatcher.copyResourceToRoot("/parquet/4203_corrupt_dates");
--- End diff --

Each time I see this I have to remember that this is an absolute path 
relative to some unstated root. That is, although it looks absolute, it is 
actually relative...


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210160#comment-16210160
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145294641
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/TestBugFixes.java 
---
@@ -33,8 +35,11 @@
 @Category(UnlikelyTest.class)
 public class TestBugFixes extends BaseTestQuery {
   private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(TestBugFixes.class);
-  private static final String WORKING_PATH = TestTools.getWorkingPath();
-  private static final String TEST_RES_PATH = WORKING_PATH + 
"/src/test/resources";
+
+  @BeforeClass
+  public static void setupTestFiles() {
+dirTestWatcher.copyResourceToRoot("/bugs/DRILL-4192");
--- End diff --

Small nit. Presumably `/bugs/DRILL-4192` is a relative path. Should we 
express it in the normal Unix fashion as `bugs/DRILL-4192`? That way, in 
time-honored fashion, a leading slash tells us the path is absolute.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210177#comment-16210177
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145516665
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BatchUtils.java ---
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.exec.vector.ValueVector;
+import org.junit.Assert;
+
+import java.io.UnsupportedEncodingException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+
+public class BatchUtils {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchUtils.class);
+
+  public static Map 
containerToObjects(VectorContainer vectorContainer) {
--- End diff --

Do we really want to create yet another way to muck with data? Any reason 
we can't use the `RowSet` concepts here? I fear that if we have multiple ways 
to do the same thing, we won't achieve critical mass on any of them and we'll 
have a bunch of half-baked solutions. I'd rather have one fully-baked solution 
we use over and over. Allows new tests to get done easily because the required 
tools already exist. This, in turn, encourages testing.

What are we doing here that `RowSet` can't (yet) do?


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210179#comment-16210179
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145518566
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BatchUtils.java ---
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.exec.vector.ValueVector;
+import org.junit.Assert;
+
+import java.io.UnsupportedEncodingException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+
+public class BatchUtils {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchUtils.class);
+
+  public static Map 
containerToObjects(VectorContainer vectorContainer) {
+Map rows = Maps.newHashMap();
+int numCols = vectorContainer.getNumberOfColumns();
+int recordCount = vectorContainer.getRecordCount();
+
+for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
+  String columnName = 
vectorContainer.getSchema().getColumn(columnIndex).getName();
+  List data = Lists.newArrayList();
+
+  ValueVector.Accessor valueVectorAccessor = vectorContainer
+.getValueVector(columnIndex)
+.getValueVector()
+.getAccessor();
+
+  for (int recordIndex = 0; recordIndex < recordCount; recordIndex++) {
+data.add(valueVectorAccessor.getObject(recordIndex));
+  }
+
+  rows.put(columnName, data);
+}
+
+return rows;
+  }
+
+  public static Map 
hyperBatchAndSelectorToObjects(VectorContainer vectorContainer, 
SelectionVector4 selectionVector4) {
+Map rows = Maps.newHashMap();
+int numCols = vectorContainer.getNumberOfColumns();
+int numIndices = selectionVector4.getCount();
+
+for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
+  String columnName = 
vectorContainer.getSchema().getColumn(columnIndex).getName();
+  List data = Lists.newArrayList();
+
+  VectorWrapper vectorWrapper = 
vectorContainer.getValueVector(columnIndex);
+
+  for (int indexIndex = 0; indexIndex < numIndices; indexIndex++) {
+int sv4Index = selectionVector4.get(indexIndex);
+int batchIndex = SelectionVector4.getBatchIndex(sv4Index);
+int recordIndex = SelectionVector4.getRecordIndex(sv4Index);
+
+ValueVector valueVector = 
vectorWrapper.getValueVectors()[batchIndex];
+Object columnValue = 
valueVector.getAccessor().getObject(recordIndex);
+data.add(columnValue);
+  }
+
+  rows.put(columnName, data);
+}
+
+return rows;
+  }
+
+  public static String toString(Map table) {
+if (table.isEmpty()) {
+  return "[ empty table ]";
+}
+
+List columnNames = Lists.newArrayList(table.keySet());
+Collections.sort(columnNames);
+int numRecords = table.get(columnNames.get(0)).size();
+
+StringBuilder sb = new StringBuilder();
+
+{
+  sb.append("[ ");
+  String separator = "";
+
+  for (String columnName : columnNames) {
+sb.append(separator);
+separator = ", ";
+sb.append(columnName);
+  }
+
+  sb.append(" ]\n");
+}
+
+for (int 

[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210172#comment-16210172
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145509549
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/TestSimpleExternalSort.java
 ---
@@ -147,7 +152,7 @@ public void sortOneKeyDescendingExternalSortLegacy() 
throws Throwable {
   }
 
   private void sortOneKeyDescendingExternalSort(boolean testLegacy) throws 
Throwable {
-FixtureBuilder builder = ClusterFixture.builder()
+FixtureBuilder builder = ClusterFixture.builder(dirTestWatcher)
--- End diff --

Nice, but I might move this to a method:

```
FixtureBuilder builder = ClusterFixture.builder()
.withTempDir(dirTestWatcher)
.configProperty(...) ...
```

Reason: if this has to go into the constructor, then every constructor use 
must change, and tests must create a temp directory even when not needed. But, 
if passed in using a builder method, then we can use it regardless of how we 
create the builder object, and the directory is optional.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210181#comment-16210181
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145520129
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/ClusterFixture.java ---
@@ -226,43 +217,16 @@ private void createConfig(FixtureBuilder builder) 
throws Exception {
 
   serviceSet = null;
   usesZk = true;
-  isLocal = false;
 } else {
   // Embedded Drillbit.
 
   serviceSet = RemoteServiceSet.getLocalServiceSet();
-  isLocal = true;
 }
   }
 
-  private void startDrillbits(FixtureBuilder builder) throws Exception {
-//// Ensure that Drill uses the log directory determined here rather 
than
-//// it's hard-coded defaults. WIP: seems to be needed some times but
-//// not others.
-//
-//String logDir = null;
-//if (builder.tempDir != null) {
-//  logDir = builder.tempDir.getAbsolutePath();
-//}
-//if (logDir == null) {
-//  logDir = config.getString(ExecConstants.DRILL_TMP_DIR);
-//  if (logDir != null) {
-//logDir += "/drill/log";
-//  }
-//}
-//if (logDir == null) {
-//  logDir = "/tmp/drill";
-//}
-//new File(logDir).mkdirs();
-//System.setProperty("drill.log-dir", logDir);
-
-dfsTestTempDir = makeTempDir("dfs-test");
-
-// Clean up any files that may have been left from the
-// last run.
-
-preserveLocalFiles = builder.preserveLocalFiles;
-removeLocalFiles();
+  private void startDrillbits() throws Exception {
+dfsTestTempDir = new File(getRootDir(), "dfs-test");
+dfsTestTempDir.mkdirs();
--- End diff --

Is the "preserve local files" behavior maintained? Let me explain.

In automated tests, files are created, used, and discarded.

But, we also use this framework for ad-hoc tests when debugging. In this 
case, we create files, end the test, and examine the files by hand. Examples: 
logs, output files, query profiles, etc.

So, for this test fixture (but not for `BaseTestQuery`), we want two modes:

* Normal mode: as you have implemented.
* "Preserve local files" mode: files are written to `/tmp/drill` (or some 
other fixed, well-known location, can be within the `target` folder) and 
preserved past test exit.

A method on the `FixtureBuilder` identifies which mode is desired. We turn 
on the "preserve" mode for ad-hoc tests.

For more info on this framework, see [this 
description](https://github.com/paul-rogers/drill/wiki/Cluster-Fixture-Framework).


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210178#comment-16210178
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145517411
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BatchUtils.java ---
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.exec.vector.ValueVector;
+import org.junit.Assert;
+
+import java.io.UnsupportedEncodingException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+
+public class BatchUtils {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchUtils.class);
+
+  public static Map 
containerToObjects(VectorContainer vectorContainer) {
+Map rows = Maps.newHashMap();
+int numCols = vectorContainer.getNumberOfColumns();
+int recordCount = vectorContainer.getRecordCount();
+
+for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
+  String columnName = 
vectorContainer.getSchema().getColumn(columnIndex).getName();
+  List data = Lists.newArrayList();
+
+  ValueVector.Accessor valueVectorAccessor = vectorContainer
+.getValueVector(columnIndex)
+.getValueVector()
+.getAccessor();
+
+  for (int recordIndex = 0; recordIndex < recordCount; recordIndex++) {
+data.add(valueVectorAccessor.getObject(recordIndex));
+  }
+
+  rows.put(columnName, data);
+}
+
+return rows;
+  }
+
+  public static Map 
hyperBatchAndSelectorToObjects(VectorContainer vectorContainer, 
SelectionVector4 selectionVector4) {
+Map rows = Maps.newHashMap();
+int numCols = vectorContainer.getNumberOfColumns();
+int numIndices = selectionVector4.getCount();
+
+for (int columnIndex = 0; columnIndex < numCols; columnIndex++) {
+  String columnName = 
vectorContainer.getSchema().getColumn(columnIndex).getName();
+  List data = Lists.newArrayList();
+
+  VectorWrapper vectorWrapper = 
vectorContainer.getValueVector(columnIndex);
+
+  for (int indexIndex = 0; indexIndex < numIndices; indexIndex++) {
+int sv4Index = selectionVector4.get(indexIndex);
+int batchIndex = SelectionVector4.getBatchIndex(sv4Index);
+int recordIndex = SelectionVector4.getRecordIndex(sv4Index);
+
+ValueVector valueVector = 
vectorWrapper.getValueVectors()[batchIndex];
+Object columnValue = 
valueVector.getAccessor().getObject(recordIndex);
+data.add(columnValue);
+  }
+
+  rows.put(columnName, data);
+}
+
+return rows;
+  }
+
+  public static String toString(Map table) {
+if (table.isEmpty()) {
+  return "[ empty table ]";
+}
+
+List columnNames = Lists.newArrayList(table.keySet());
+Collections.sort(columnNames);
+int numRecords = table.get(columnNames.get(0)).size();
+
+StringBuilder sb = new StringBuilder();
+
+{
+  sb.append("[ ");
+  String separator = "";
+
+  for (String columnName : columnNames) {
+sb.append(separator);
+separator = ", ";
+sb.append(columnName);
+  }
+
+  sb.append(" ]\n");
+}
+
+for (int 

[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210153#comment-16210153
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144944018
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/PriorityQueue.java
 ---
@@ -20,22 +20,58 @@
 import org.apache.drill.exec.compile.TemplateClassDefinition;
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 
 public interface PriorityQueue {
-  public void add(FragmentContext context, RecordBatchData batch) throws 
SchemaChangeException;
-  public void init(int limit, FragmentContext context, BufferAllocator 
allocator, boolean hasSv2) throws SchemaChangeException;
-  public void generate() throws SchemaChangeException;
-  public VectorContainer getHyperBatch();
-  public SelectionVector4 getHeapSv4();
-  public SelectionVector4 getFinalSv4();
-  public boolean validate();
-  public void resetQueue(VectorContainer container, SelectionVector4 
vector4) throws SchemaChangeException;
-  public void cleanup();
-
-  public static TemplateClassDefinition TEMPLATE_DEFINITION 
= new TemplateClassDefinition(PriorityQueue.class, 
PriorityQueueTemplate.class);
+  /**
+   * The elements in the given batch are added to the priority queue. Note 
that the priority queue
+   * only retains the top elements that fit within the size specified by 
the {@link #init(int, BufferAllocator, boolean)}
+   * method.
+   * @param batch The batch containing elements we want to add.
+   * @throws SchemaChangeException
+   */
+  void add(RecordBatchData batch) throws SchemaChangeException;
 
+  /**
+   * Initializes the priority queue. This method must be called before any 
other methods on the priority
+   * queue are called.
+   * @param limit The size of the priority queue.
+   * @param allocator The {@link BufferAllocator} to use when creating the 
priority queue.
+   * @param hasSv2 True when incoming batches have v2 selection vectors. 
False otherwise.
+   * @throws SchemaChangeException
+   */
+  void init(int limit, BufferAllocator allocator, boolean hasSv2) throws 
SchemaChangeException;
+
+  /**
+   * This method must be called before fetching the final heap hyper batch 
and final Sv4 vector.
--- End diff --

Ah, I see. Thanks.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210170#comment-16210170
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145297931
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TopN/TopNBatchTest.java
 ---
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.TopN;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.Random;
+
+import com.google.common.collect.Lists;
+import org.apache.drill.test.TestBuilder;
+import org.apache.drill.categories.OperatorTest;
+import org.apache.drill.common.config.DrillConfig;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.logical.data.Order;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.compile.ClassBuilder;
+import org.apache.drill.exec.compile.CodeCompiler;
+import org.apache.drill.exec.expr.fn.FunctionImplementationRegistry;
+import org.apache.drill.exec.memory.RootAllocator;
+import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
+import org.apache.drill.exec.pop.PopUnitTestBase;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.ExpandableHyperContainer;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.server.options.OptionSet;
+import org.apache.drill.test.ClientFixture;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.FixtureBuilder;
+import org.apache.drill.test.OperatorFixture;
+import org.apache.drill.test.BatchUtils;
+import org.apache.drill.test.DirTestWatcher;
+import org.apache.drill.test.rowSet.RowSetBuilder;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+@Category(OperatorTest.class)
+public class TopNBatchTest extends PopUnitTestBase {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(TopNBatchTest.class);
+
+  // Allows you to look at generated code after tests execute
+  @Rule
+  public DirTestWatcher dirTestWatcher = new DirTestWatcher(false);
+
+  /**
+   * Priority queue unit test.
+   * @throws Exception
+   */
+  @Test
+  public void priorityQueueOrderingTest() throws Exception {
+Properties properties = new Properties();
+properties.setProperty(ClassBuilder.CODE_DIR_OPTION, 
dirTestWatcher.getDirPath());
+
+DrillConfig drillConfig = DrillConfig.create(properties);
+OptionSet optionSet = new OperatorFixture.TestOptionSet();
+
+FieldReference expr = FieldReference.getWithQuotedRef("colA");
+Order.Ordering ordering = new 
Order.Ordering(Order.Ordering.ORDER_DESC, expr, Order.Ordering.NULLS_FIRST);
+List orderings = Lists.newArrayList(ordering);
+
+MaterializedField colA = MaterializedField.create("colA", 
Types.required(TypeProtos.MinorType.INT));
+MaterializedField colB = MaterializedField.create("colB", 
Types.required(TypeProtos.MinorType.INT));
+
+List cols = Lists.newArrayList(colA, colB);
+BatchSchema batchSchema = new 
BatchSchema(BatchSchema.SelectionVectorMode.NONE, cols);
+
+try (RootAllocator allocator = new RootAllocator(100_000_000)) {
+  VectorContainer expectedVectors = new RowSetBuilder(allocator, 
batchSchema)
+.add(110, 10)
+.add(109, 9)
+.add(108, 8)
+.add(107, 7)
+.add(106, 6)
+.add(105, 5)
+.add(104, 4)
+.add(103, 3)
+.add(102, 2)
+  

[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210175#comment-16210175
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145516134
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/BaseDirTestWatcher.java ---
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test;
+
+import com.google.common.base.Charsets;
+import org.apache.commons.io.FileUtils;
+import org.apache.drill.common.util.TestTools;
+import org.apache.drill.exec.util.TestUtilities;
+import org.junit.runner.Description;
+
+import java.io.File;
+import java.io.IOException;
+
+public class BaseDirTestWatcher extends DirTestWatcher {
+  public enum DirType {
+ROOT,
+TEST_TMP;
+  }
+
+  private File tmpDir;
+  private File storeDir;
+  private File dfsTestTmpParentDir;
+  private File dfsTestTmpDir;
+  private File rootDir;
+
+  public BaseDirTestWatcher() {
+super();
+  }
+
+  @Override
+  protected void starting(Description description) {
+super.starting(description);
+
+rootDir = makeSubDir("root");
+tmpDir = new File(rootDir, "tmp");
+storeDir = new File(rootDir, "store");
+dfsTestTmpParentDir = new File(rootDir, "dfsTestTmp");
+
+tmpDir.mkdirs();
+storeDir.mkdirs();
+dfsTestTmpParentDir.mkdirs();
+  }
+
+  public File getTmpDir() {
+return tmpDir;
+  }
+
+  public File getStoreDir() {
+return storeDir;
+  }
+
+  public File getDfsTestTmpParentDir() {
+return dfsTestTmpParentDir;
+  }
+
+  public File getDfsTestTmpDir() {
+return dfsTestTmpDir;
+  }
+
+  public String getDfsTestTmpDirPath() {
+return dfsTestTmpDir.getAbsolutePath().replaceAll("/\\./", "/");
+  }
+
+  public File getRootDir() {
+return rootDir;
+  }
+
+  public String getRootDirPath() {
+return rootDir.getAbsolutePath().replaceAll("/\\./", "/");
+  }
+
+  public void newDfsTestTmpDir() {
+dfsTestTmpDir = 
TestUtilities.createTempDir(BaseTestQuery.dirTestWatcher.getDfsTestTmpParentDir());
+  }
+
+  private File getDir(DirType type) {
+switch (type) {
+  case ROOT:
+return rootDir;
+  case TEST_TMP:
+return dfsTestTmpDir;
+  default:
+throw new IllegalArgumentException(String.format("Unsupported type 
%s", type));
+}
+  }
+
+  public File makeRootSubDir(String relPath) {
+return makeSubDir(relPath, DirType.ROOT);
+  }
+
+  public File makeTestTmpSubDir(String relPath) {
+return makeSubDir(relPath, DirType.TEST_TMP);
+  }
+
+  private File makeSubDir(String relPath, DirType type) {
+File subDir = new File(getDir(type), relPath);
+subDir.mkdirs();
+return subDir;
+  }
+
+  /**
+   * This preserves the relative path of the directory in root
+   * @param relPath
+   * @return
+   */
+  public File copyResourceToRoot(String relPath) {
+return copyTo(relPath, relPath, TestTools.DataType.RESOURCE, 
DirType.ROOT);
+  }
+
+  public File copyFileToRoot(String relPath) {
+return copyTo(relPath, relPath, TestTools.DataType.PROJECT, 
DirType.ROOT);
+  }
+
+  public File copyResourceToRoot(String relPath, String destPath) {
+return copyTo(relPath, destPath, TestTools.DataType.RESOURCE, 
DirType.ROOT);
+  }
+
+  public File copyResourceToTestTmp(String relPath, String destPath) {
+return copyTo(relPath, destPath, TestTools.DataType.RESOURCE, 
DirType.TEST_TMP);
+  }
+
+  private File copyTo(String relPath, String destPath, TestTools.DataType 

[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210167#comment-16210167
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145506779
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestCorruptParquetDateCorrection.java
 ---
@@ -377,21 +386,21 @@ public void 
testReadOldMetadataCacheFileOverrideCorrection() throws Exception {
 
   @Test
   public void testReadNewMetadataCacheFileOverOldAndNewFiles() throws 
Exception {
-String table = format("dfs.`%s`", new 
Path(getDfsTestTmpSchemaLocation(), 
MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER));
-copyMetaDataCacheToTempReplacingInternalPaths(
+File meta = dirTestWatcher.copyResourceToRoot(
 
"parquet/4203_corrupt_dates/mixed_version_partitioned_metadata.requires_replace.txt",
-MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER, 
Metadata.METADATA_FILENAME);
+Paths.get(MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER, 
Metadata.METADATA_FILENAME).toString());
--- End diff --

Seems awkward. We use a path to concatenate strings, then convert back to a 
string, then to a file. Further, for some crazy reason, Hadoop also has a 
`Path` class that many tests reference.

Can we instead do:

`File meta = ...toRoot(new File(new 
File(MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER), 
Metadata.METADATA_FILENAME));
```

Or, better, since seem to use this pattern over and over:

```
File meta = ...toRoot(MIXED_CORRUPTED_AND_CORRECT_PARTITIONED_FOLDER, 
Metadata.METADATA_FILENAME);
```

That is, the `copyResourceToRoot` has a form that takes a variable number 
of arguments to build up the target path. Each should be assumed to be relative 
"a/b", "c", "d.csv", say.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210166#comment-16210166
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145294835
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestCTASPartitionFilter.java ---
@@ -59,48 +58,48 @@ public void withDistribution() throws Exception {
 test("alter session set `planner.slice_target` = 1");
 test("alter session set `store.partition.hash_distribute` = true");
 test("use dfs_test.tmp");
-test(String.format("create table orders_distribution partition by 
(o_orderpriority) as select * from dfs_test.`%s/multilevel/parquet`", 
TEST_RES_PATH));
+test("create table orders_distribution partition by (o_orderpriority) 
as select * from dfs_test.`/multilevel/parquet`");
 String query = "select * from orders_distribution where 
o_orderpriority = '1-URGENT'";
-testExcludeFilter(query, 1, "Filter", 24);
+testExcludeFilter(query, 1, "Filter\\(", 24);
--- End diff --

Why the extra characters?


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210165#comment-16210165
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r145297229
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/TestWindowFunctions.java ---
@@ -559,7 +563,7 @@ public void testRankWithGroupBy() throws Exception {
 
   @Test // DRILL-3404
   public void testWindowSumAggIsNotNull() throws Exception {
-String query = String.format("select count(*) cnt from (select sum ( 
c1 ) over ( partition by c2 order by c1 asc nulls first ) w_sum from 
dfs.`%s/window/table_with_nulls.parquet` ) sub_query where w_sum is not null", 
TEST_RES_PATH);
+String query = "select count(*) cnt from (select sum ( c1 ) over ( 
partition by c2 order by c1 asc nulls first ) w_sum from 
dfs.`/window/table_with_nulls.parquet` ) sub_query where w_sum is not null";
--- End diff --

Right about now I have to give you a huge THANKS! This is a huge amount of 
clean-up you've done. These are really good improvements. Really appreciate the 
tedious work to get this all done.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210148#comment-16210148
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144936994
  
--- Diff: common/src/test/java/org/apache/drill/test/DirTestWatcher.java ---
@@ -32,23 +32,50 @@
 public class DirTestWatcher extends TestWatcher {
   private String dirPath;
   private File dir;
+  private boolean deleteDirAtEnd = true;
--- End diff --

Thanks for the Javadoc added previously. Perhaps add a section on how to 
use this class. If I need a test directory, do I create an instance of this 
class? Or, does JUnit do it for me? If done automagically, how do I get a copy 
of the directory?

If this is explained in JUnit, perhaps just reference that information. 
Even better would be a short example:

> To get a test directory:
```
// Do something here
File myDir = // do something
```


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201840#comment-16201840
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/984
  
@paul-rogers Applied / responding to comments. Also removed 
RecordBatchBuilder and used RowSetBuilder.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201535#comment-16201535
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144207089
  
--- Diff: 
common/src/test/java/org/apache/drill/testutils/SubDirTestWatcher.java ---
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.testutils;
--- End diff --

done


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201524#comment-16201524
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144205146
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java
 ---
@@ -335,20 +333,32 @@ private void purge() throws SchemaChangeException {
 logger.debug("Took {} us to purge", 
watch.elapsed(TimeUnit.MICROSECONDS));
   }
 
-  public PriorityQueue createNewPriorityQueue(FragmentContext context, 
List orderings,
- VectorAccessible 
batch, MappingSet mainMapping, MappingSet leftMapping, MappingSet rightMapping)
-  throws ClassTransformationException, IOException, 
SchemaChangeException{
-CodeGenerator cg = 
CodeGenerator.get(PriorityQueue.TEMPLATE_DEFINITION, 
context.getFunctionRegistry(), context.getOptions());
+  private PriorityQueue createNewPriorityQueue(VectorAccessible batch, int 
limit)
+throws SchemaChangeException, ClassTransformationException, 
IOException {
+return createNewPriorityQueue(context.getOptionSet(), 
context.getFunctionRegistry(), context.getDrillbitContext().getCompiler(),
+  config.getOrderings(), batch, unionTypeEnabled, codegenDump, limit, 
oContext.getAllocator(), schema.getSelectionVectorMode());
+  }
+
+  public static PriorityQueue createNewPriorityQueue(
+OptionSet optionSet, FunctionLookupContext functionLookupContext, 
CodeCompiler codeCompiler,
+List orderings, VectorAccessible batch, boolean 
unionTypeEnabled, boolean codegenDump,
+int limit, BufferAllocator allocator, SelectionVectorMode mode)
+  throws ClassTransformationException, IOException, 
SchemaChangeException {
+final MappingSet mainMapping = new MappingSet((String) null, null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+final MappingSet leftMapping = new MappingSet("leftIndex", null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+final MappingSet rightMapping = new MappingSet("rightIndex", null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
--- End diff --

Reverted it back to the way it was


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201449#comment-16201449
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144195098
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java
 ---
@@ -95,7 +91,9 @@ public TopNBatch(TopN popConfig, FragmentContext context, 
RecordBatch incoming)
 super(popConfig, context);
 this.incoming = incoming;
 this.config = popConfig;
-batchPurgeThreshold = 
context.getConfig().getInt(ExecConstants.BATCH_PURGE_THRESHOLD);
+DrillConfig drillConfig = context.getConfig();
+batchPurgeThreshold = 
drillConfig.getInt(ExecConstants.BATCH_PURGE_THRESHOLD);
+codegenDump = 
drillConfig.getBoolean(CodeCompiler.ENABLE_SAVE_CODE_FOR_DEBUG);
--- End diff --

I will rename the option to drill.debug.codegen.TopN


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201447#comment-16201447
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144194940
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/compile/CodeCompiler.java ---
@@ -110,6 +109,11 @@ public CodeGenCompiler(final DrillConfig config, final 
OptionSet optionManager)
   public static final String DISABLE_CACHE_CONFIG = COMPILE_BASE + 
".disable_cache";
 
   /**
+   * Enables saving generated code for debugging
+   */
+  public static final String ENABLE_SAVE_CODE_FOR_DEBUG = COMPILE_BASE + 
".codegen.dump";
--- End diff --

This is the name of the option which holds the true/false flag to enable 
saving code for debugging. The destination directory is governed by the 
ClassBuilder.CODE_DIR_OPTION property.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201443#comment-16201443
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144194617
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/PriorityQueue.java
 ---
@@ -20,22 +20,58 @@
 import org.apache.drill.exec.compile.TemplateClassDefinition;
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 
 public interface PriorityQueue {
-  public void add(FragmentContext context, RecordBatchData batch) throws 
SchemaChangeException;
-  public void init(int limit, FragmentContext context, BufferAllocator 
allocator, boolean hasSv2) throws SchemaChangeException;
-  public void generate() throws SchemaChangeException;
-  public VectorContainer getHyperBatch();
-  public SelectionVector4 getHeapSv4();
-  public SelectionVector4 getFinalSv4();
-  public boolean validate();
-  public void resetQueue(VectorContainer container, SelectionVector4 
vector4) throws SchemaChangeException;
-  public void cleanup();
-
-  public static TemplateClassDefinition TEMPLATE_DEFINITION 
= new TemplateClassDefinition(PriorityQueue.class, 
PriorityQueueTemplate.class);
+  /**
+   * The elements in the given batch are added to the priority queue. Note 
that the priority queue
+   * only retains the top elements that fit within the size specified by 
the {@link #init(int, BufferAllocator, boolean)}
+   * method.
+   * @param batch The batch containing elements we want to add.
+   * @throws SchemaChangeException
+   */
+  void add(RecordBatchData batch) throws SchemaChangeException;
 
+  /**
+   * Initializes the priority queue. This method must be called before any 
other methods on the priority
+   * queue are called.
+   * @param limit The size of the priority queue.
+   * @param allocator The {@link BufferAllocator} to use when creating the 
priority queue.
+   * @param hasSv2 True when incoming batches have v2 selection vectors. 
False otherwise.
+   * @throws SchemaChangeException
+   */
+  void init(int limit, BufferAllocator allocator, boolean hasSv2) throws 
SchemaChangeException;
+
+  /**
+   * This method must be called before fetching the final heap hyper batch 
and final Sv4 vector.
--- End diff --

Old habit, I'm used to PriorityQueues being called Heaps. I agree it's 
confusing in this context I will fix it.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201442#comment-16201442
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144194241
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/CodeGenerator.java ---
@@ -90,12 +89,11 @@
   private String generatedCode;
   private String generifiedCode;
 
-  CodeGenerator(TemplateClassDefinition definition, 
FunctionImplementationRegistry funcRegistry, OptionSet optionManager) {
-this(ClassGenerator.getDefaultMapping(), definition, funcRegistry, 
optionManager);
+  CodeGenerator(TemplateClassDefinition definition, OptionSet 
optionManager) {
+this(ClassGenerator.getDefaultMapping(), definition, optionManager);
--- End diff --

I traced through the code and asked IntelliJ where the variable is used. 
This is what I found:

1. CodeGenerator accepts it as an argument to its constructor
1. It is then passed to the constructor of the EvaluationVisitor
1. A private field is set with the value in the constructor of 
EvaluationVisitor
1. This private field of EvaluationVisitor is then unused.

If your not convinced let's walk through the code when I'm in the office 
next week.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201430#comment-16201430
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144193392
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/compile/CodeCompiler.java ---
@@ -110,6 +109,11 @@ public CodeGenCompiler(final DrillConfig config, final 
OptionSet optionManager)
   public static final String DISABLE_CACHE_CONFIG = COMPILE_BASE + 
".disable_cache";
 
   /**
+   * Enables saving generated code for debugging
+   */
+  public static final String ENABLE_SAVE_CODE_FOR_DEBUG = COMPILE_BASE + 
".codegen.dump";
--- End diff --

The directory that generated code is dumped into is determined by the 
ClassBuilder.CODE_DIR_OPTION property. The DirTestWatcher createw a directory 
of the following form:

```
/target//
```

So if the ClassBuilder.CODE_DIR_OPTION property is configured to use the 
directory generated by the DirTestWatcher, the code is dumped into the same 
well defined location every test run. Also the generated class file has names 
like:

```
Gen.java
```


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201223#comment-16201223
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144132366
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/CodeGenerator.java ---
@@ -90,12 +89,11 @@
   private String generatedCode;
   private String generifiedCode;
 
-  CodeGenerator(TemplateClassDefinition definition, 
FunctionImplementationRegistry funcRegistry, OptionSet optionManager) {
-this(ClassGenerator.getDefaultMapping(), definition, funcRegistry, 
optionManager);
+  CodeGenerator(TemplateClassDefinition definition, OptionSet 
optionManager) {
+this(ClassGenerator.getDefaultMapping(), definition, optionManager);
--- End diff --

Does the code actually work without this item? Isn't the registry where we 
store UDF functions? And, doesn't the expression materializer make use of this 
info to copy UDFs inline into the generated code?

If we can remove this, that is a big convenience. But, I'm skeptical that 
it is, in fact, not needed.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201219#comment-16201219
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144131729
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/compile/CodeCompiler.java ---
@@ -110,6 +109,11 @@ public CodeGenCompiler(final DrillConfig config, final 
OptionSet optionManager)
   public static final String DISABLE_CACHE_CONFIG = COMPILE_BASE + 
".disable_cache";
 
   /**
+   * Enables saving generated code for debugging
+   */
+  public static final String ENABLE_SAVE_CODE_FOR_DEBUG = COMPILE_BASE + 
".codegen.dump";
--- End diff --

When saving generated code, I have to point to the directory from Eclipse. 
This means two things:

1. It works best if the same directory is used from one run to the next. 
(That is, no randomly generated temp directory.)
2. The directory has to be visible to the Eclipse UI. (That is, no hidden 
directories starting with dots.)

Can we make the directory in a known location, and can we use a nice simple 
name like "codegen"?


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201224#comment-16201224
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144147334
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java
 ---
@@ -95,7 +91,9 @@ public TopNBatch(TopN popConfig, FragmentContext context, 
RecordBatch incoming)
 super(popConfig, context);
 this.incoming = incoming;
 this.config = popConfig;
-batchPurgeThreshold = 
context.getConfig().getInt(ExecConstants.BATCH_PURGE_THRESHOLD);
+DrillConfig drillConfig = context.getConfig();
+batchPurgeThreshold = 
drillConfig.getInt(ExecConstants.BATCH_PURGE_THRESHOLD);
+codegenDump = 
drillConfig.getBoolean(CodeCompiler.ENABLE_SAVE_CODE_FOR_DEBUG);
--- End diff --

Not sure we want to enable this in this way. This would globally enable all 
code to be dumped. In practice, we want to enable module by module depending on 
what we are debugging.

I've been doing this the crude-but-effective way: just uncommenting the 
magic line of code. This usually works because I have to be in the debugger 
anyway to step through the code. But, if we wanted to be fancier, we could have 
config settings for each package. So, here, maybe you would enable 
`drill.debug.codegen.org.apache.drill.exec.physical.impl.TopN`. Or, we could be 
a bit lighter weight and just assign names: `drill.debug.codegen.TopN`.

Because a typical run will generate zillions of files, reducing the number 
of saved files to those of interest will help to keep things tidy.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201222#comment-16201222
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144142548
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/PriorityQueue.java
 ---
@@ -20,22 +20,58 @@
 import org.apache.drill.exec.compile.TemplateClassDefinition;
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 
 public interface PriorityQueue {
-  public void add(FragmentContext context, RecordBatchData batch) throws 
SchemaChangeException;
-  public void init(int limit, FragmentContext context, BufferAllocator 
allocator, boolean hasSv2) throws SchemaChangeException;
-  public void generate() throws SchemaChangeException;
-  public VectorContainer getHyperBatch();
-  public SelectionVector4 getHeapSv4();
-  public SelectionVector4 getFinalSv4();
-  public boolean validate();
-  public void resetQueue(VectorContainer container, SelectionVector4 
vector4) throws SchemaChangeException;
-  public void cleanup();
-
-  public static TemplateClassDefinition TEMPLATE_DEFINITION 
= new TemplateClassDefinition(PriorityQueue.class, 
PriorityQueueTemplate.class);
+  /**
+   * The elements in the given batch are added to the priority queue. Note 
that the priority queue
+   * only retains the top elements that fit within the size specified by 
the {@link #init(int, BufferAllocator, boolean)}
+   * method.
+   * @param batch The batch containing elements we want to add.
+   * @throws SchemaChangeException
+   */
+  void add(RecordBatchData batch) throws SchemaChangeException;
 
+  /**
+   * Initializes the priority queue. This method must be called before any 
other methods on the priority
+   * queue are called.
+   * @param limit The size of the priority queue.
+   * @param allocator The {@link BufferAllocator} to use when creating the 
priority queue.
+   * @param hasSv2 True when incoming batches have v2 selection vectors. 
False otherwise.
+   * @throws SchemaChangeException
+   */
+  void init(int limit, BufferAllocator allocator, boolean hasSv2) throws 
SchemaChangeException;
+
+  /**
+   * This method must be called before fetching the final heap hyper batch 
and final Sv4 vector.
--- End diff --

"heap hyper batch" seems an oxymoron. All batches, including hyper-batches, 
exist in direct memory. What is meant by "heap"?


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201221#comment-16201221
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144147623
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/TopN/TopNBatch.java
 ---
@@ -335,20 +333,32 @@ private void purge() throws SchemaChangeException {
 logger.debug("Took {} us to purge", 
watch.elapsed(TimeUnit.MICROSECONDS));
   }
 
-  public PriorityQueue createNewPriorityQueue(FragmentContext context, 
List orderings,
- VectorAccessible 
batch, MappingSet mainMapping, MappingSet leftMapping, MappingSet rightMapping)
-  throws ClassTransformationException, IOException, 
SchemaChangeException{
-CodeGenerator cg = 
CodeGenerator.get(PriorityQueue.TEMPLATE_DEFINITION, 
context.getFunctionRegistry(), context.getOptions());
+  private PriorityQueue createNewPriorityQueue(VectorAccessible batch, int 
limit)
+throws SchemaChangeException, ClassTransformationException, 
IOException {
+return createNewPriorityQueue(context.getOptionSet(), 
context.getFunctionRegistry(), context.getDrillbitContext().getCompiler(),
+  config.getOrderings(), batch, unionTypeEnabled, codegenDump, limit, 
oContext.getAllocator(), schema.getSelectionVectorMode());
+  }
+
+  public static PriorityQueue createNewPriorityQueue(
+OptionSet optionSet, FunctionLookupContext functionLookupContext, 
CodeCompiler codeCompiler,
+List orderings, VectorAccessible batch, boolean 
unionTypeEnabled, boolean codegenDump,
+int limit, BufferAllocator allocator, SelectionVectorMode mode)
+  throws ClassTransformationException, IOException, 
SchemaChangeException {
+final MappingSet mainMapping = new MappingSet((String) null, null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+final MappingSet leftMapping = new MappingSet("leftIndex", null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
+final MappingSet rightMapping = new MappingSet("rightIndex", null, 
ClassGenerator.DEFAULT_SCALAR_MAP, ClassGenerator.DEFAULT_SCALAR_MAP);
--- End diff --

Not sure these should be inside the method. Can we still grab hold of the 
variable for testing or debugging? Might we want to leave this in the class 
name space?


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201225#comment-16201225
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r144131909
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/compile/CodeCompiler.java ---
@@ -110,6 +109,11 @@ public CodeGenCompiler(final DrillConfig config, final 
OptionSet optionManager)
   public static final String DISABLE_CACHE_CONFIG = COMPILE_BASE + 
".disable_cache";
 
   /**
+   * Enables saving generated code for debugging
+   */
+  public static final String ENABLE_SAVE_CODE_FOR_DEBUG = COMPILE_BASE + 
".codegen.dump";
--- End diff --

Also, how does a static string enable code generation? Isn't this just the 
destination folder once saving code is enabled?


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


  1   2   >