[GitHub] drill issue #850: DRILL-5541: C++ Client Crashes During Simple "Man in the M...

2017-06-06 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/850
  
+1




[GitHub] drill issue #849: DRILL-5568: Include hadoop-common jars inside drill-jdbc-a...

2017-06-06 Thread sohami
Github user sohami commented on the issue:

https://github.com/apache/drill/pull/849
  
+ @paul-rogers - To help review this PR in case JIRA didn't send out the email.




[GitHub] drill issue #808: DRILL-5325: Unit tests for the managed sort

2017-06-06 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/808
  
Rebased onto latest master and resolved merge conflicts.




[jira] [Created] (DRILL-5572) Redundant user name property in AbstractBase, EasySubScan

2017-06-06 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5572:
--

 Summary: Redundant user name property in AbstractBase, EasySubScan
 Key: DRILL-5572
 URL: https://issues.apache.org/jira/browse/DRILL-5572
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.8.0
Reporter: Paul Rogers
Priority: Minor


In Drill physical operators:







[GitHub] drill pull request #808: DRILL-5325: Unit tests for the managed sort

2017-06-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/808#discussion_r120498921
  
--- Diff: 
exec/memory/base/src/main/java/org/apache/drill/exec/memory/BaseAllocator.java 
---
@@ -793,20 +797,20 @@ public static boolean isDebug() {
 return DEBUG;
   }
 
-  /**
-   * Disk I/O buffer used for all reads and writes of DrillBufs.
-   */
-
-  private byte ioBuffer[];
-
   public byte[] getIOBuffer() {
 if (ioBuffer == null) {
+  // Length chosen to the smallest size that maximizes
+  // disk I/O performance. Smaller sizes slow I/O. Larger
+  // sizes provide no increase in performance.
+  // Revisit from time to time.
--- End diff --

Fixed. It is a constant because tuning the value from outside Drill gives no way to 
detect its effect on performance; arriving at the number required intense, 
focused testing.
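
A minimal sketch of the lazily allocated, reused I/O buffer pattern described above; the 32 KB size here is an assumption standing in for the empirically chosen constant, and this is not the actual `BaseAllocator` code:

```java
public class IoBufferHolder {
  // Assumed value; the real constant was chosen through focused I/O benchmarks.
  private static final int IO_BUFFER_SIZE = 32 * 1024;

  private byte[] ioBuffer;

  // Allocate the buffer once, on first use, then reuse it for every
  // disk read and write so no per-operation allocation occurs.
  public byte[] getIOBuffer() {
    if (ioBuffer == null) {
      ioBuffer = new byte[IO_BUFFER_SIZE];
    }
    return ioBuffer;
  }
}
```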




[GitHub] drill pull request #808: DRILL-5325: Unit tests for the managed sort

2017-06-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/808#discussion_r120497872
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/SortMemoryManager.java
 ---
@@ -0,0 +1,506 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.xsort.managed;
+
+public class SortMemoryManager {
+
+  /**
+   * Maximum memory this operator may use. Usually comes from the
+   * operator definition, but may be overridden by a configuration
+   * parameter for unit testing.
+   */
+
+  private final long memoryLimit;
+
+  /**
+   * Estimated size of the records for this query, updated on each
+   * new batch received from upstream.
+   */
+
+  private int estimatedRowWidth;
+
+  /**
+   * Size of the merge batches that this operator produces. Generally
+   * the same as the merge batch size, unless low memory forces a smaller
+   * value.
+   */
+
+  private int expectedMergeBatchSize;
+
+  /**
+   * Estimate of the input batch size based on the largest batch seen
+   * thus far.
+   */
+  private int estimatedInputBatchSize;
+
+  /**
+   * Maximum memory level before spilling occurs. That is, we can buffer 
input
+   * batches in memory until we reach the level given by the buffer memory 
pool.
+   */
+
+  private long bufferMemoryLimit;
+
+  /**
+   * Maximum memory that can hold batches during the merge
+   * phase.
+   */
+
+  private long mergeMemoryLimit;
+
+  /**
+   * The target size for merge batches sent downstream.
+   */
+
+  private int preferredMergeBatchSize;
+
+  /**
+   * The configured size for each spill batch.
+   */
+  private int preferredSpillBatchSize;
+
+  /**
+   * Estimated number of rows that fit into a single spill batch.
+   */
+
+  private int spillBatchRowCount;
+
+  /**
+   * The estimated actual spill batch size which depends on the
+   * details of the data rows for any particular query.
+   */
+
+  private int expectedSpillBatchSize;
+
+  /**
+   * The number of records to add to each output batch sent to the
+   * downstream operator or spilled to disk.
+   */
+
+  private int mergeBatchRowCount;
+
+  private SortConfig config;
+
+//  private long spillPoint;
+
+//  private long minMergeMemory;
+
+  private int estimatedInputSize;
+
+  private boolean potentialOverflow;
+
+  public SortMemoryManager(SortConfig config, long memoryLimit) {
+this.config = config;
+
+// The maximum memory this operator can use as set by the
+// operator definition (propagated to the allocator.)
+
+if (config.maxMemory() > 0) {
+  this.memoryLimit = Math.min(memoryLimit, config.maxMemory());
+} else {
+  this.memoryLimit = memoryLimit;
+}
+
+preferredSpillBatchSize = config.spillBatchSize();;
+preferredMergeBatchSize = config.mergeBatchSize();
+  }
+
+  /**
+   * Update the data-driven memory use numbers including:
+   * 
+   * The average size of incoming records.
+   * The estimated spill and output batch size.
+   * The estimated number of average-size records per
+   * spill and output batch.
+   * The amount of memory set aside to hold the incoming
+   * batches before spilling starts.
+   * 
+   * 
+   * Under normal circumstances, the amount of memory available is much
+   * larger than the input, spill or merge batch sizes. The primary 
question
+   * is to determine how many input batches we can buffer during the load
+   * phase, and how many spill batches we can merge during the merge
+   * phase.
+   *
+   * @param batchSize the overall size of the current batch received from
+   * upstream
+   * @param batchRo

[GitHub] drill pull request #808: DRILL-5325: Unit tests for the managed sort

2017-06-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/808#discussion_r120496924
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/SortImpl.java
 ---
@@ -0,0 +1,495 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.xsort.managed;
+
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.ops.OperExecContext;
+import org.apache.drill.exec.physical.impl.spill.RecordBatchSizer;
+import org.apache.drill.exec.physical.impl.xsort.MSortTemplate;
+import 
org.apache.drill.exec.physical.impl.xsort.managed.BatchGroup.InputBatch;
+import 
org.apache.drill.exec.physical.impl.xsort.managed.SortMemoryManager.MergeTask;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.VectorAccessible;
+import org.apache.drill.exec.record.VectorAccessibleUtilities;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.selection.SelectionVector2;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+
+/**
+ * Implementation of the external sort which is wrapped into the Drill
+ * "next" protocol by the {@link ExternalSortBatch} class.
+ * 
+ * Accepts incoming batches. Sorts each and will spill to disk as needed.
+ * When all input is delivered, can either do an in-memory merge or a
+ * merge from disk. If runs spilled, may have to do one or more 
"consolidation"
+ * passes to reduce the number of runs to the level that will fit in 
memory.
+ */
+
+public class SortImpl {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ExternalSortBatch.class);
+
+  /**
+   * Iterates over the final sorted results. Implemented differently
+   * depending on whether the results are in-memory or spilled to
+   * disk.
+   */
+
+  public interface SortResults {
+/**
+ * Container into which results are delivered. May the
+ * the original operator container, or may be a different
+ * one. This is the container that should be sent
+ * downstream. This is a fixed value for all returned
+ * results.
+ * @return
+ */
+VectorContainer getContainer();
+boolean next();
+void close();
+int getBatchCount();
+int getRecordCount();
+SelectionVector2 getSv2();
+SelectionVector4 getSv4();
+  }
+
+  private final SortConfig config;
+  private final SortMetrics metrics;
+  private final SortMemoryManager memManager;
+  private VectorContainer outputBatch;
+  private OperExecContext context;
+
+  /**
+   * Memory allocator for this operator itself. Incoming batches are
+   * transferred into this allocator. Intermediate batches used during
+   * merge also reside here.
+   */
+
+  private final BufferAllocator allocator;
+
+  private final SpilledRuns spilledRuns;
+
+  private final BufferedBatches bufferedBatches;
+
+  public SortImpl(OperExecContext opContext, SortConfig sortConfig, 
SpilledRuns spilledRuns, VectorContainer batch) {
+this.context = opContext;
+outputBatch = batch;
+this.spilledRuns = spilledRuns;
+allocator = opContext.getAllocator();
+config = sortConfig;
+memManager = new SortMemoryManager(config, allocator.getLimit());
+metrics = new SortMetrics(opContext.getStats());
+bufferedBatches = new BufferedBatches(opContext);
+
+// Reset the allocator to allow a 10% safety margin. This is done 
because
+// the memory manager will enforce the original limit. Changing the 
hard
+// limit will reduce the probability that random chance causes the 
allocator
+// to kill the query because of a small, spurious over-allocation.
+
+allocator.setLim

[GitHub] drill pull request #808: DRILL-5325: Unit tests for the managed sort

2017-06-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/808#discussion_r120497990
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/SortMemoryManager.java
 ---
@@ -0,0 +1,506 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.xsort.managed;
+
+public class SortMemoryManager {
+
+  /**
+   * Maximum memory this operator may use. Usually comes from the
+   * operator definition, but may be overridden by a configuration
+   * parameter for unit testing.
+   */
+
+  private final long memoryLimit;
+
+  /**
+   * Estimated size of the records for this query, updated on each
+   * new batch received from upstream.
+   */
+
+  private int estimatedRowWidth;
+
+  /**
+   * Size of the merge batches that this operator produces. Generally
+   * the same as the merge batch size, unless low memory forces a smaller
+   * value.
+   */
+
+  private int expectedMergeBatchSize;
+
+  /**
+   * Estimate of the input batch size based on the largest batch seen
+   * thus far.
+   */
+  private int estimatedInputBatchSize;
+
+  /**
+   * Maximum memory level before spilling occurs. That is, we can buffer 
input
+   * batches in memory until we reach the level given by the buffer memory 
pool.
+   */
+
+  private long bufferMemoryLimit;
+
+  /**
+   * Maximum memory that can hold batches during the merge
+   * phase.
+   */
+
+  private long mergeMemoryLimit;
+
+  /**
+   * The target size for merge batches sent downstream.
+   */
+
+  private int preferredMergeBatchSize;
+
+  /**
+   * The configured size for each spill batch.
+   */
+  private int preferredSpillBatchSize;
+
+  /**
+   * Estimated number of rows that fit into a single spill batch.
+   */
+
+  private int spillBatchRowCount;
+
+  /**
+   * The estimated actual spill batch size which depends on the
+   * details of the data rows for any particular query.
+   */
+
+  private int expectedSpillBatchSize;
+
+  /**
+   * The number of records to add to each output batch sent to the
+   * downstream operator or spilled to disk.
+   */
+
+  private int mergeBatchRowCount;
+
+  private SortConfig config;
+
+//  private long spillPoint;
+
+//  private long minMergeMemory;
+
+  private int estimatedInputSize;
+
+  private boolean potentialOverflow;
+
+  public SortMemoryManager(SortConfig config, long memoryLimit) {
+this.config = config;
+
+// The maximum memory this operator can use as set by the
+// operator definition (propagated to the allocator.)
+
+if (config.maxMemory() > 0) {
+  this.memoryLimit = Math.min(memoryLimit, config.maxMemory());
+} else {
+  this.memoryLimit = memoryLimit;
+}
+
+preferredSpillBatchSize = config.spillBatchSize();;
+preferredMergeBatchSize = config.mergeBatchSize();
+  }
+
+  /**
+   * Update the data-driven memory use numbers including:
+   * 
+   * The average size of incoming records.
+   * The estimated spill and output batch size.
+   * The estimated number of average-size records per
+   * spill and output batch.
+   * The amount of memory set aside to hold the incoming
+   * batches before spilling starts.
+   * 
+   * 
+   * Under normal circumstances, the amount of memory available is much
+   * larger than the input, spill or merge batch sizes. The primary 
question
+   * is to determine how many input batches we can buffer during the load
+   * phase, and how many spill batches we can merge during the merge
+   * phase.
+   *
+   * @param batchSize the overall size of the current batch received from
+   * upstream
+   * @param batchRo

[GitHub] drill pull request #808: DRILL-5325: Unit tests for the managed sort

2017-06-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/808#discussion_r120485024
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ---
@@ -346,9 +346,7 @@ public IterOutcome innerNext() {
       if (unionTypeEnabled) {
         this.schema = SchemaUtil.mergeSchemas(schema, incoming.getSchema());
       } else {
    -    throw SchemaChangeException.schemaChanged("Schema changes not supported in External Sort. Please enable Union type",
    -        schema,
    -        incoming.getSchema());
    +    throw new SchemaChangeException("Schema changes not supported in External Sort. Please enable Union type");
--- End diff --

Fixed. Converted to a UserException; otherwise the fragment executor wraps 
the schema change exception in a generic message.
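
A rough sketch of that conversion; the exact builder method used in the patch is an assumption, but Drill's `UserException` exposes factory builders of this general shape:

```java
import org.apache.drill.common.exceptions.UserException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SchemaChangeErrors {
  private static final Logger logger = LoggerFactory.getLogger(SchemaChangeErrors.class);

  // A UserException carries the message to the user as-is, instead of
  // letting the fragment executor wrap it in a generic error.
  static UserException schemaChangeNotSupported() {
    return UserException.unsupportedError()
        .message("Schema changes not supported in External Sort. Please enable Union type")
        .build(logger);
  }
}
```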




[GitHub] drill pull request #808: DRILL-5325: Unit tests for the managed sort

2017-06-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/808#discussion_r120485596
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/SortImpl.java
 ---
@@ -0,0 +1,495 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.xsort.managed;
+
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.ops.OperExecContext;
+import org.apache.drill.exec.physical.impl.spill.RecordBatchSizer;
+import org.apache.drill.exec.physical.impl.xsort.MSortTemplate;
+import 
org.apache.drill.exec.physical.impl.xsort.managed.BatchGroup.InputBatch;
+import 
org.apache.drill.exec.physical.impl.xsort.managed.SortMemoryManager.MergeTask;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.VectorAccessible;
+import org.apache.drill.exec.record.VectorAccessibleUtilities;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.selection.SelectionVector2;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+
+/**
+ * Implementation of the external sort which is wrapped into the Drill
+ * "next" protocol by the {@link ExternalSortBatch} class.
+ * 
+ * Accepts incoming batches. Sorts each and will spill to disk as needed.
+ * When all input is delivered, can either do an in-memory merge or a
+ * merge from disk. If runs spilled, may have to do one or more 
"consolidation"
+ * passes to reduce the number of runs to the level that will fit in 
memory.
+ */
+
+public class SortImpl {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ExternalSortBatch.class);
+
+  /**
+   * Iterates over the final sorted results. Implemented differently
+   * depending on whether the results are in-memory or spilled to
+   * disk.
+   */
+
+  public interface SortResults {
+/**
+ * Container into which results are delivered. May the
+ * the original operator container, or may be a different
+ * one. This is the container that should be sent
+ * downstream. This is a fixed value for all returned
+ * results.
+ * @return
+ */
+VectorContainer getContainer();
+boolean next();
+void close();
+int getBatchCount();
+int getRecordCount();
+SelectionVector2 getSv2();
+SelectionVector4 getSv4();
+  }
+
+  private final SortConfig config;
+  private final SortMetrics metrics;
+  private final SortMemoryManager memManager;
+  private VectorContainer outputBatch;
+  private OperExecContext context;
+
+  /**
+   * Memory allocator for this operator itself. Incoming batches are
+   * transferred into this allocator. Intermediate batches used during
+   * merge also reside here.
+   */
+
+  private final BufferAllocator allocator;
+
+  private final SpilledRuns spilledRuns;
+
+  private final BufferedBatches bufferedBatches;
+
+  public SortImpl(OperExecContext opContext, SortConfig sortConfig, 
SpilledRuns spilledRuns, VectorContainer batch) {
+this.context = opContext;
+outputBatch = batch;
+this.spilledRuns = spilledRuns;
+allocator = opContext.getAllocator();
+config = sortConfig;
+memManager = new SortMemoryManager(config, allocator.getLimit());
+metrics = new SortMetrics(opContext.getStats());
+bufferedBatches = new BufferedBatches(opContext);
+
+// Reset the allocator to allow a 10% safety margin. This is done 
because
+// the memory manager will enforce the original limit. Changing the 
hard
+// limit will reduce the probability that random chance causes the 
allocator
+// to kill the query because of a small, spurious over-allocation.
+
+allocator.setLim

[GitHub] drill pull request #808: DRILL-5325: Unit tests for the managed sort

2017-06-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/808#discussion_r119226233
  
--- Diff: common/src/main/java/org/apache/drill/test/SecondaryTest.java ---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.test;
+
+/**
+ * Label for Drill secondary tests. A secondary test is one that is 
omitted from
+ * the normal Drill build because:
+ * 
+ * It is slow
+ * It tests particular functionality which need not be tested on every
+ * build.
+ * It is old, but still worth running some times.
+ * It requires specialized setup and/or runs on specific 
platforms.
+ * 
+ *
+ * To mark a test as secondary, do either:
+ * {@literal @}Category(SecondaryTest.class)
+ * class MyTest {
+ *...
+ * }
+ * Or:
+ * class MyTest {
+ *   {@literal @}Category(SecondaryTest.class)
+ *   public void slowTest() { ... }
+ * }
+ * Maven is configured as follows:
+ *
+ *  maven-surefire-plugin
+ *  ...
+ *  
+ *...
+ *
org.apache.drill.test.SecondaryTest
+ *  
+ *  ...
+ *
+ *  To run the secondary tests (preliminary):
+ *  > mvn surefire:test -Dgroups=org.apache.drill.test.SecondaryTest 
-DexcludedGroups=
+ *  The above says to run the secondary test and exclude nothing. The 
exclusion
+ *  is required to override the default exclusions: skip that parameter 
and Maven will
+ *  blindly try to run all tests annotated with the SecondaryTest category 
except
+ *  for those annotated with the SecondaryTest category, which is not very 
helpful...
+ *  
+ *  Note that java-exec (only) provides a named execution to run 
large tests:
+ *  
+ *  mvn surefire:test@include-large-tests
+ *  
+ *  However, the above does not work. Nor did it work to include the 
category in
+ *  a profile as described earlier. At present, there is no known way to 
run just
--- End diff --

Sorry, I'll make the comment clearer. I'm trying to say that we claim to have 
a way to run secondary (slower) tests separately from the bulk of our unit 
tests. But when I tried to use it, it turned out that the solution does not 
work. As a result, I was forced to include certain slow tests in the bulk of 
our unit tests.
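
For reference, marking a test with the category uses the standard JUnit annotation; a minimal example with a hypothetical test class:

```java
import org.junit.Test;
import org.junit.experimental.categories.Category;
import org.apache.drill.test.SecondaryTest;

// Runs only when the surefire "groups" parameter includes SecondaryTest.
@Category(SecondaryTest.class)
public class ExampleSlowTest {
  @Test
  public void slowCase() {
    // long-running checks go here
  }
}
```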




[GitHub] drill pull request #808: DRILL-5325: Unit tests for the managed sort

2017-06-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/808#discussion_r120498177
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/SortMemoryManager.java
 ---
@@ -0,0 +1,506 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.xsort.managed;
+
+public class SortMemoryManager {
+
+  /**
+   * Maximum memory this operator may use. Usually comes from the
+   * operator definition, but may be overridden by a configuration
+   * parameter for unit testing.
+   */
+
+  private final long memoryLimit;
+
+  /**
+   * Estimated size of the records for this query, updated on each
+   * new batch received from upstream.
+   */
+
+  private int estimatedRowWidth;
+
+  /**
+   * Size of the merge batches that this operator produces. Generally
+   * the same as the merge batch size, unless low memory forces a smaller
+   * value.
+   */
+
+  private int expectedMergeBatchSize;
+
+  /**
+   * Estimate of the input batch size based on the largest batch seen
+   * thus far.
+   */
+  private int estimatedInputBatchSize;
+
+  /**
+   * Maximum memory level before spilling occurs. That is, we can buffer 
input
+   * batches in memory until we reach the level given by the buffer memory 
pool.
+   */
+
+  private long bufferMemoryLimit;
+
+  /**
+   * Maximum memory that can hold batches during the merge
+   * phase.
+   */
+
+  private long mergeMemoryLimit;
+
+  /**
+   * The target size for merge batches sent downstream.
+   */
+
+  private int preferredMergeBatchSize;
+
+  /**
+   * The configured size for each spill batch.
+   */
+  private int preferredSpillBatchSize;
+
+  /**
+   * Estimated number of rows that fit into a single spill batch.
+   */
+
+  private int spillBatchRowCount;
+
+  /**
+   * The estimated actual spill batch size which depends on the
+   * details of the data rows for any particular query.
+   */
+
+  private int expectedSpillBatchSize;
+
+  /**
+   * The number of records to add to each output batch sent to the
+   * downstream operator or spilled to disk.
+   */
+
+  private int mergeBatchRowCount;
+
+  private SortConfig config;
+
+//  private long spillPoint;
+
+//  private long minMergeMemory;
+
+  private int estimatedInputSize;
+
+  private boolean potentialOverflow;
+
+  public SortMemoryManager(SortConfig config, long memoryLimit) {
+this.config = config;
+
+// The maximum memory this operator can use as set by the
+// operator definition (propagated to the allocator.)
+
+if (config.maxMemory() > 0) {
+  this.memoryLimit = Math.min(memoryLimit, config.maxMemory());
+} else {
+  this.memoryLimit = memoryLimit;
+}
+
+preferredSpillBatchSize = config.spillBatchSize();;
+preferredMergeBatchSize = config.mergeBatchSize();
+  }
+
+  /**
+   * Update the data-driven memory use numbers including:
+   * 
+   * The average size of incoming records.
+   * The estimated spill and output batch size.
+   * The estimated number of average-size records per
+   * spill and output batch.
+   * The amount of memory set aside to hold the incoming
+   * batches before spilling starts.
+   * 
+   * 
+   * Under normal circumstances, the amount of memory available is much
+   * larger than the input, spill or merge batch sizes. The primary 
question
+   * is to determine how many input batches we can buffer during the load
+   * phase, and how many spill batches we can merge during the merge
+   * phase.
+   *
+   * @param batchSize the overall size of the current batch received from
+   * upstream
+   * @param batchRo

[GitHub] drill pull request #808: DRILL-5325: Unit tests for the managed sort

2017-06-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/808#discussion_r120496720
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/SortImpl.java
 ---
@@ -0,0 +1,495 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.xsort.managed;
+
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.ops.OperExecContext;
+import org.apache.drill.exec.physical.impl.spill.RecordBatchSizer;
+import org.apache.drill.exec.physical.impl.xsort.MSortTemplate;
+import 
org.apache.drill.exec.physical.impl.xsort.managed.BatchGroup.InputBatch;
+import 
org.apache.drill.exec.physical.impl.xsort.managed.SortMemoryManager.MergeTask;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.VectorAccessible;
+import org.apache.drill.exec.record.VectorAccessibleUtilities;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.selection.SelectionVector2;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+
+/**
+ * Implementation of the external sort which is wrapped into the Drill
+ * "next" protocol by the {@link ExternalSortBatch} class.
+ * 
+ * Accepts incoming batches. Sorts each and will spill to disk as needed.
+ * When all input is delivered, can either do an in-memory merge or a
+ * merge from disk. If runs spilled, may have to do one or more 
"consolidation"
+ * passes to reduce the number of runs to the level that will fit in 
memory.
+ */
+
+public class SortImpl {
+  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ExternalSortBatch.class);
+
+  /**
+   * Iterates over the final sorted results. Implemented differently
+   * depending on whether the results are in-memory or spilled to
+   * disk.
+   */
+
+  public interface SortResults {
+/**
+ * Container into which results are delivered. May the
+ * the original operator container, or may be a different
+ * one. This is the container that should be sent
+ * downstream. This is a fixed value for all returned
+ * results.
+ * @return
+ */
+VectorContainer getContainer();
+boolean next();
+void close();
+int getBatchCount();
+int getRecordCount();
+SelectionVector2 getSv2();
+SelectionVector4 getSv4();
+  }
+
+  private final SortConfig config;
+  private final SortMetrics metrics;
+  private final SortMemoryManager memManager;
+  private VectorContainer outputBatch;
+  private OperExecContext context;
+
+  /**
+   * Memory allocator for this operator itself. Incoming batches are
+   * transferred into this allocator. Intermediate batches used during
+   * merge also reside here.
+   */
+
+  private final BufferAllocator allocator;
+
+  private final SpilledRuns spilledRuns;
+
+  private final BufferedBatches bufferedBatches;
+
+  public SortImpl(OperExecContext opContext, SortConfig sortConfig, 
SpilledRuns spilledRuns, VectorContainer batch) {
+this.context = opContext;
+outputBatch = batch;
+this.spilledRuns = spilledRuns;
+allocator = opContext.getAllocator();
+config = sortConfig;
+memManager = new SortMemoryManager(config, allocator.getLimit());
+metrics = new SortMetrics(opContext.getStats());
+bufferedBatches = new BufferedBatches(opContext);
+
+// Reset the allocator to allow a 10% safety margin. This is done 
because
+// the memory manager will enforce the original limit. Changing the 
hard
+// limit will reduce the probability that random chance causes the 
allocator
+// to kill the query because of a small, spurious over-allocation.
+
+allocator.setLim

[GitHub] drill pull request #808: DRILL-5325: Unit tests for the managed sort

2017-06-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/808#discussion_r120497564
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/SortMemoryManager.java
 ---
@@ -0,0 +1,506 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.xsort.managed;
+
+public class SortMemoryManager {
--- End diff --

Not sure how transferable the logic here is. This is, essentially, a 
mathematical model of how memory is used by the sort/merge algorithm. Each 
operator needs its own model; otherwise we're back to made-up logic that does 
not map to the actual code behavior.

For the most part, the numbers and algorithms are no longer "fuzzy." 
Instead, they reflect actual code behavior as determined first by inspection, 
then by relentless QA testing. Any places where the estimates don't match reality 
have resulted in bugs filed against the sort.
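
A minimal sketch, with hypothetical names and illustrative numbers, of the kind of arithmetic such a model performs; the real SortMemoryManager tracks many more quantities:

```java
public class SortSizingSketch {
  public static void main(String[] args) {
    int estimatedRowWidth = 120;            // bytes per row, from batch inspection
    int preferredSpillBatchSize = 1 << 20;  // 1 MB target spill batch

    // Rows that fit in a spill batch, and the batch size those rows imply.
    int spillBatchRowCount = Math.max(1, preferredSpillBatchSize / estimatedRowWidth);
    long expectedSpillBatchSize = (long) spillBatchRowCount * estimatedRowWidth;

    System.out.println(spillBatchRowCount + " rows, ~"
        + expectedSpillBatchSize + " bytes per spill batch");
  }
}
```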




[GitHub] drill pull request #851: DRILL-5518: Test framework enhancements

2017-06-06 Thread paul-rogers
GitHub user paul-rogers opened a pull request:

https://github.com/apache/drill/pull/851

DRILL-5518: Test framework enhancements

* Create a SubOperatorTest base class to do routine setup and shutdown.
* Additional methods to simplify creating complex schemas with field
widths.
* Define a test workspace with plugin-specific options (as for the CSV
storage plugin)
* When verifying row sets, add methods to verify and release just the
"actual" batch in addition to the existing method that verifies and frees
both the actual and expected batches.
* Allow reading of row set values as objects for generic comparisons.
* "Column builder" within the schema builder to simplify building a single
MaterializedField for tests.
* Misc. code cleanup.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/paul-rogers/drill DRILL-5518

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/851.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #851


commit e95ed08915fa04dfdc1977ce899ef34114c96ff1
Author: Paul Rogers 
Date:   2017-05-16T22:55:41Z

DRILL-5518: Test framework enhancements

* Create a SubOperatorTest base class to do routine setup and shutdown.
* Additional methods to simplify creating complex schemas with field
widths.
* Define a test workspace with plugin-specific options (as for the CSV
storage plugin)
* When verifying row sets, add methods to verify and release just the
"actual" batch in addition to the existing method that verifies and frees
both the actual and expected batches.
* Allow reading of row set values as objects for generic comparisons.
* "Column builder" within the schema builder to simplify building a single
MaterializedField for tests.
* Misc. code cleanup.






[GitHub] drill pull request #847: Drill 5545: Update POM to add support for running f...

2017-06-06 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/847#discussion_r120474915
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/AsyncPageReader.java
 ---
@@ -462,7 +431,10 @@ public Void call() throws IOException {
   readStatus.setDiskScanTime(timeToRead);
   assert (totalValuesRead <= totalValuesCount);
 }
-synchronized (queue) {
+// FindBugs reports this is a possible bug, but it is not. You do 
need the synchronized block
--- End diff --

I'd intended to suppress the warning initially, but then got rid of it by 
using a different object to synchronize on. Fixed the comment to not refer to 
FindBugs though.
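
A minimal sketch, with illustrative names, of the change described: synchronize on a dedicated lock object rather than on the queue itself, which also avoids the FindBugs warning without a suppression annotation:

```java
import java.util.LinkedList;
import java.util.Queue;

public class PageQueue {
  private final Queue<Object> pageQueue = new LinkedList<>();
  // Dedicated monitor: coordination no longer synchronizes on the queue itself.
  private final Object queueLock = new Object();

  void put(Object readStatus) {
    synchronized (queueLock) {
      pageQueue.add(readStatus);
      queueLock.notifyAll(); // wake readers waiting for the next page
    }
  }

  Object take() throws InterruptedException {
    synchronized (queueLock) {
      while (pageQueue.isEmpty()) {
        queueLock.wait();
      }
      return pageQueue.remove();
    }
  }
}
```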




[GitHub] drill pull request #846: DRILL-5544: Out of heap running CTAS against text d...

2017-06-06 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/846#discussion_r120453820
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/parquet/hadoop/ParquetColumnChunkPageWriteStore.java
 ---
@@ -0,0 +1,269 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.hadoop;
+
+import static 
org.apache.parquet.column.statistics.Statistics.getStatsBasedOnType;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+import 
org.apache.drill.exec.store.parquet.ParquetDirectByteBufferAllocator;
+import org.apache.parquet.bytes.BytesInput;
+import org.apache.parquet.bytes.CapacityByteArrayOutputStream;
+import org.apache.parquet.column.ColumnDescriptor;
+import org.apache.parquet.column.Encoding;
+import org.apache.parquet.column.page.DictionaryPage;
+import org.apache.parquet.column.page.PageWriteStore;
+import org.apache.parquet.column.page.PageWriter;
+import org.apache.parquet.column.statistics.Statistics;
+import org.apache.parquet.format.converter.ParquetMetadataConverter;
+import org.apache.parquet.hadoop.CodecFactory.BytesCompressor;
+import org.apache.parquet.io.ParquetEncodingException;
+import org.apache.parquet.schema.MessageType;
+import org.apache.parquet.bytes.ByteBufferAllocator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * This is a copy of ColumnChunkPageWriteStore from parquet library except 
of OutputStream that is used here.
+ * Using of CapacityByteArrayOutputStream allows to use different 
ByteBuffer allocators.
--- End diff --

I'm not sure that 20 GB is "OK" to write to a file: whether on the heap or 
otherwise. But, if we accept the poor design that needs this much, then I agree 
it is better to have uncontrolled use of direct memory than the heap. But, in 
either case, uncontrolled memory use is a very bad idea.

Note that the only buffering needed is 4K: the disk block size. Experiments 
with spilling show that there is zero benefit to buffering above 32K.

So, for 10 runs * 32K = 320K. This is FAR less than the 20 GB proposed, and 
plenty for heap, to avoid the cost of copying data into, then out of, direct 
memory.

Further, what is wrong with Java's own `BufferedOutputStream`? Was this 
tried? It works well for simple cases, not sure about more complex cases.

Is the problem that Parquet must somehow buffer an entire page so it can 
"backpatch" the first bytes after writing all the rest? If so, that part of 
the Parquet design is incompatible with a write-once file system such as HDFS: it 
forces unnecessary load on the memory system. (Note, however, that an updatable 
file system, such as Linux or MFS, would allow us to write data to disk, then 
go back and write the header, without buffering.)
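
A minimal sketch of the plain-Java alternative raised above, assuming a 32 KB buffer and an illustrative output path:

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class SmallBufferSpill {
  public static void main(String[] args) throws IOException {
    // Records accumulate in the 32 KB buffer and reach disk in buffer-sized
    // chunks; nothing close to a full Parquet page is held in memory.
    try (OutputStream out =
             new BufferedOutputStream(new FileOutputStream("/tmp/spill.bin"), 32 * 1024)) {
      byte[] record = new byte[256];
      for (int i = 0; i < 10_000; i++) {
        out.write(record);
      }
    }
  }
}
```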




[jira] [Created] (DRILL-5571) Unable to cancel running queries from Web UI

2017-06-06 Thread Kedar Sankar Behera (JIRA)
Kedar Sankar Behera created DRILL-5571:
--

 Summary: Unable to cancel running queries from Web UI
 Key: DRILL-5571
 URL: https://issues.apache.org/jira/browse/DRILL-5571
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - HTTP
Affects Versions: 1.11.0
Reporter: Kedar Sankar Behera


We are unable to access profiles of some running queries. Hit the following 
error on the Web UI:
{code}
{
  "errorMessage" : "VALIDATION ERROR: No profile with given query id 
'26c90b95-928b-15e3-bedc-bfb4a046cc8b' exists. Please verify the query 
id.\n\n\n[Error Id: e6896a23-6932-469d-9968-d315fdd06dd4 ]"
}
{code}

And we cannot cancel the running queries whose profile page can be accessed:
{code}
Failure attempting to cancel query 26c90b33-cf7e-0495-8f76-55220f71f809.  
Unable to find information about where query is actively running.
{code}





[GitHub] drill pull request #846: DRILL-5544: Out of heap running CTAS against text d...

2017-06-06 Thread vdiravka
Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/846#discussion_r119834619
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/parquet/hadoop/ParquetColumnChunkPageWriteStore.java
 ---
@@ -0,0 +1,269 @@
+/*
--- End diff --

See my answers below regarding the Parquet library's use of memory.




[GitHub] drill pull request #846: DRILL-5544: Out of heap running CTAS against text d...

2017-06-06 Thread vdiravka
Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/846#discussion_r119835458
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/parquet/hadoop/ParquetColumnChunkPageWriteStore.java
 ---
@@ -0,0 +1,269 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.hadoop;
+
+import static 
org.apache.parquet.column.statistics.Statistics.getStatsBasedOnType;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+import 
org.apache.drill.exec.store.parquet.ParquetDirectByteBufferAllocator;
+import org.apache.parquet.bytes.BytesInput;
+import org.apache.parquet.bytes.CapacityByteArrayOutputStream;
+import org.apache.parquet.column.ColumnDescriptor;
+import org.apache.parquet.column.Encoding;
+import org.apache.parquet.column.page.DictionaryPage;
+import org.apache.parquet.column.page.PageWriteStore;
+import org.apache.parquet.column.page.PageWriter;
+import org.apache.parquet.column.statistics.Statistics;
+import org.apache.parquet.format.converter.ParquetMetadataConverter;
+import org.apache.parquet.hadoop.CodecFactory.BytesCompressor;
+import org.apache.parquet.io.ParquetEncodingException;
+import org.apache.parquet.schema.MessageType;
+import org.apache.parquet.bytes.ByteBufferAllocator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * This is a copy of ColumnChunkPageWriteStore from parquet library except 
of OutputStream that is used here.
+ * Using of CapacityByteArrayOutputStream allows to use different 
ByteBuffer allocators.
--- End diff --

To avoid frequent I/O writes to disk, the Parquet library buffers every page 
in an `OutputStream` and writes the data only when the `ColumnWriteStore` size 
reaches the block size 
[[link](https://github.com/apache/drill/blob/874bf6296dcd1a42c7cf7f097c1a6b5458010cbb/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java#L288)].
 With a large block size and a large number of runs, a lot of memory can be 
consumed before the data is flushed. 
Using ByteBuffers allows choosing which type of memory to use. 

`ColumnChunkPageWriter` used `CapacityByteArrayOutputStream` instead of 
`ByteArrayOutputStream` and `ConcatenatingByteArrayCollector` before the 
[PARQUET-160 
fix](https://github.com/apache/parquet-mr/pull/98/commits/8b54667650873c03ea66721d0f06bfad0b968f19).
 In fact, `ByteBufferAllocator` was introduced in PARQUET-77 but was never 
used in `ColumnChunkPageWriter`.

See the explanation of why direct memory is more suitable for buffering 
than heap space in the next comment.
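
To illustrate the choice the allocator abstraction gives, a minimal sketch using plain `java.nio` (not the Parquet API): the same buffering logic can be backed by heap or by direct memory.

```java
import java.nio.ByteBuffer;

public class BufferChoice {
  // The allocator decides where page bytes live: on the Java heap or off-heap.
  static ByteBuffer newPageBuffer(int capacity, boolean direct) {
    return direct ? ByteBuffer.allocateDirect(capacity)
                  : ByteBuffer.allocate(capacity);
  }

  public static void main(String[] args) {
    ByteBuffer buf = newPageBuffer(8 * 1024 * 1024, true); // illustrative 8 MB slab
    System.out.println("direct buffer? " + buf.isDirect());
  }
}
```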




[GitHub] drill pull request #846: DRILL-5544: Out of heap running CTAS against text d...

2017-06-06 Thread vdiravka
Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/846#discussion_r119836892
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/parquet/hadoop/ParquetColumnChunkPageWriteStore.java
 ---
@@ -0,0 +1,269 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.hadoop;
+
+import static 
org.apache.parquet.column.statistics.Statistics.getStatsBasedOnType;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+import 
org.apache.drill.exec.store.parquet.ParquetDirectByteBufferAllocator;
+import org.apache.parquet.bytes.BytesInput;
+import org.apache.parquet.bytes.CapacityByteArrayOutputStream;
+import org.apache.parquet.column.ColumnDescriptor;
+import org.apache.parquet.column.Encoding;
+import org.apache.parquet.column.page.DictionaryPage;
+import org.apache.parquet.column.page.PageWriteStore;
+import org.apache.parquet.column.page.PageWriter;
+import org.apache.parquet.column.statistics.Statistics;
+import org.apache.parquet.format.converter.ParquetMetadataConverter;
+import org.apache.parquet.hadoop.CodecFactory.BytesCompressor;
+import org.apache.parquet.io.ParquetEncodingException;
+import org.apache.parquet.schema.MessageType;
+import org.apache.parquet.bytes.ByteBufferAllocator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * This is a copy of ColumnChunkPageWriteStore from parquet library except 
of OutputStream that is used here.
+ * Using of CapacityByteArrayOutputStream allows to use different 
ByteBuffer allocators.
+ * It will be no need in this class once PARQUET-1006 is resolved.
+ */
+public class ParquetColumnChunkPageWriteStore implements PageWriteStore, 
Closeable {
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ParquetDirectByteBufferAllocator.class);
+
+  private static ParquetMetadataConverter parquetMetadataConverter = new 
ParquetMetadataConverter();
+
+  private final Map writers = 
Maps.newHashMap();
+  private final MessageType schema;
+
+  public ParquetColumnChunkPageWriteStore(BytesCompressor compressor,
+  MessageType schema,
+  int initialSlabSize,
+  int maxCapacityHint,
+  ByteBufferAllocator allocator) {
+this.schema = schema;
+for (ColumnDescriptor path : schema.getColumns()) {
+  writers.put(path,  new ColumnChunkPageWriter(path, compressor, 
initialSlabSize, maxCapacityHint, allocator));
+}
+  }
+
+  @Override
+  public PageWriter getPageWriter(ColumnDescriptor path) {
+return writers.get(path);
+  }
+
+  public void flushToFileWriter(ParquetFileWriter writer) throws 
IOException {
+for (ColumnDescriptor path : schema.getColumns()) {
+  ColumnChunkPageWriter pageWriter = writers.get(path);
+  pageWriter.writeToFileWriter(writer);
+}
+  }
+
+  @Override
+  public void close() {
+for (ColumnChunkPageWriter pageWriter : writers.values()) {
+  pageWriter.close();
+}
+  }
+
+  private static final class ColumnChunkPageWriter implements PageWriter, 
Closeable {
+
+private final ColumnDescriptor path;
+private final BytesCompressor compressor;
+
+private final CapacityByteArrayOutputStream buf;
+private DictionaryPage dictionaryPage;
+
+private long uncompressedLength;
+private long compressedLength;
+private long totalValueCount;
+private int pageCount;
+
+// repetition and definition level encodings are used only for v1 
pages a

[GitHub] drill pull request #846: DRILL-5544: Out of heap running CTAS against text d...

2017-06-06 Thread vdiravka
Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/846#discussion_r119836279
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/parquet/hadoop/ParquetColumnChunkPageWriteStore.java
 ---
@@ -0,0 +1,269 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.hadoop;
+
+import static 
org.apache.parquet.column.statistics.Statistics.getStatsBasedOnType;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+import org.apache.drill.exec.store.parquet.ParquetDirectByteBufferAllocator;
+import org.apache.parquet.bytes.BytesInput;
+import org.apache.parquet.bytes.CapacityByteArrayOutputStream;
+import org.apache.parquet.column.ColumnDescriptor;
+import org.apache.parquet.column.Encoding;
+import org.apache.parquet.column.page.DictionaryPage;
+import org.apache.parquet.column.page.PageWriteStore;
+import org.apache.parquet.column.page.PageWriter;
+import org.apache.parquet.column.statistics.Statistics;
+import org.apache.parquet.format.converter.ParquetMetadataConverter;
+import org.apache.parquet.hadoop.CodecFactory.BytesCompressor;
+import org.apache.parquet.io.ParquetEncodingException;
+import org.apache.parquet.schema.MessageType;
+import org.apache.parquet.bytes.ByteBufferAllocator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * This is a copy of ColumnChunkPageWriteStore from the parquet library, differing only in the OutputStream used here.
+ * Using CapacityByteArrayOutputStream allows different ByteBuffer allocators to be used.
--- End diff --

Your first case is correct: with 10 runs * 512 MB (the default Parquet block size in Drill) we need 5 GB of memory, or 20 GB for 40 runs. That is a lot for heap memory, but acceptable for direct memory. We can also control how much data is buffered before writing to disk with the `store.parquet.block-size` option; see the example below.
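
For reference, the option can be set per session or system-wide; the 256 MB value below is only illustrative, and 536870912 bytes restores the 512 MB default mentioned above:

    ALTER SESSION SET `store.parquet.block-size` = 268435456;
    ALTER SYSTEM SET `store.parquet.block-size` = 536870912;

A smaller block size reduces how much each writer must buffer, at the cost of producing more, smaller row groups in the output files.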


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #846: DRILL-5544: Out of heap running CTAS against text d...

2017-06-06 Thread vdiravka
Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/846#discussion_r119833975
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java
 ---
@@ -268,8 +278,7 @@ private void flush() throws IOException {
 }
 
 store.close();
-// TODO(jaltekruse) - review this close method should no longer be necessary
-//ColumnChunkPageWriteStoreExposer.close(pageStore);
+pageStore.close();
--- End diff --

Agree. Fixed
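
For readers following the diff, a minimal sketch of the resulting ordering in flush(); the field names store, pageStore, and parquetFileWriter are taken from or assumed based on the hunks above and are not a verbatim copy of ParquetRecordWriter:

    store.close();                                  // finish the column write store first
    pageStore.flushToFileWriter(parquetFileWriter); // copy the buffered column chunks into the Parquet file
    pageStore.close();                              // then release the buffers held by each page writer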


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (DRILL-5569) NullPointerException

2017-06-06 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-5569:
-

 Summary: NullPointerException
 Key: DRILL-5569
 URL: https://issues.apache.org/jira/browse/DRILL-5569
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.11.0
Reporter: Khurram Faraaz


The exception below was seen when TPC-DS query 4 was executed against Drill 1.11.0.

Drill 1.11.0 git commit ID: d11aba2
[root@centos-01 mapr]# cat MapRBuildVersion
5.2.1.42646.GA

Stack trace from drillbit.log
{noformat}
2017-06-06 07:46:43,160 [Drillbit-ShutdownHook#0] WARN  
o.apache.drill.exec.work.WorkManager - Closing WorkManager but there are 80 
running fragments.
2017-06-06 07:46:43,207 [Drillbit-ShutdownHook#0] INFO  
o.a.drill.exec.compile.CodeCompiler - Stats: code gen count: 959, cache miss 
count: 12, hit rate: 99%
2017-06-06 07:46:43,504 [scan-3] ERROR o.a.d.e.u.f.BufferedDirectBufInputStream 
- Error reading from stream 1_1_0.parquet. Error was : Error reading out of an 
FSDataInputStream using the Hadoop 2 ByteBuffer based read method.
2017-06-06 07:46:43,510 [scan-8] ERROR o.a.d.e.u.f.BufferedDirectBufInputStream 
- Error reading from stream 1_1_0.parquet. Error was : Error reading out of an 
FSDataInputStream using the Hadoop 2 ByteBuffer based read method.
2017-06-06 07:46:43,514 [scan-8] INFO  o.a.d.e.s.p.c.AsyncPageReader - User 
Error Occurred: Exception occurred while reading from disk. 
(java.io.IOException: Error reading out of an FSDataInputStream using the 
Hadoop 2 ByteBuffer based read method.)
org.apache.drill.common.exceptions.UserException: DATA_READ ERROR: Exception 
occurred while reading from disk.

File:  /drill/testdata/tpcds_sf1/parquet/store_sales/1_1_0.parquet
Column:  ss_ext_list_price
Row Group Start:  75660513

[Error Id: 3a758095-fcc4-4364-a50b-33a027c1beb6 ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
 ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.handleAndThrowException(AsyncPageReader.java:199)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.access$600(AsyncPageReader.java:81)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$AsyncPageReaderTask.call(AsyncPageReader.java:483)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$AsyncPageReaderTask.call(AsyncPageReader.java:392)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
[na:1.8.0_65]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_65]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_65]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
Caused by: java.io.IOException: java.io.IOException: Error reading out of an 
FSDataInputStream using the Hadoop 2 ByteBuffer based read method.
at 
org.apache.drill.exec.util.filereader.BufferedDirectBufInputStream.getNextBlock(BufferedDirectBufInputStream.java:185)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.util.filereader.BufferedDirectBufInputStream.readInternal(BufferedDirectBufInputStream.java:212)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.util.filereader.BufferedDirectBufInputStream.read(BufferedDirectBufInputStream.java:277)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.util.filereader.DirectBufInputStream.getNext(DirectBufInputStream.java:111)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$AsyncPageReaderTask.call(AsyncPageReader.java:437)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
... 5 common frames omitted
Caused by: java.io.IOException: Error reading out of an FSDataInputStream using 
the Hadoop 2 ByteBuffer based read method.
at 
org.apache.parquet.hadoop.util.CompatibilityUtil.getBuf(CompatibilityUtil.java:99)
 ~[parquet-hadoop-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at 
org.apache.drill.exec.util.filereader.BufferedDirectBufInputStream.getNextBlock(BufferedDirectBufInputStream.java:182)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
... 9 common frames omitted
Caused by: java.lang.NullPointerException: null
at 
com.mapr.fs.MapRFsInStream.readIntoDirectByteBuffer(MapRFsInStream.java:219) 
~[maprfs-5.2.1-mapr.jar:5.2.1-mapr]
at com.mapr.fs.MapRFsInStream.read(MapRFsInStream.java:333) 
~[maprfs-5.2.1-mapr.jar:5.

[jira] [Created] (DRILL-5570) InterruptedException: null

2017-06-06 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-5570:
-

 Summary: InterruptedException: null
 Key: DRILL-5570
 URL: https://issues.apache.org/jira/browse/DRILL-5570
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.11.0
 Environment: 3 node CentOS cluster
Reporter: Khurram Faraaz


When TPC-DS query 11 was executed concurrently and one of the non-foreman Drillbits was stopped (./bin/drillbit.sh stop), the system error InterruptedException shown below appeared in the drillbit.log of the non-foreman node.

Drill 1.11.0 git commit ID: d11aba2
[root@centos-01 mapr]# cat MapRBuildVersion
5.2.1.42646.GA

{noformat}
2017-06-06 07:46:44,288 [26c9a242-dfa1-35be-b5f1-ff6b4fa66086:frag:11:0] ERROR 
o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: InterruptedException

Fragment 11:0

[Error Id: 40723399-8983-4777-a2bb-dc9d55ae338e on centos-02.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
InterruptedException

Fragment 11:0

[Error Id: 40723399-8983-4777-a2bb-dc9d55ae338e on centos-02.qa.lab:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
 ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:295)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:264)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_65]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_65]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: 
Interrupted but context.shouldContinue() is true
at 
org.apache.drill.exec.work.batch.BaseRawBatchBuffer.getNext(BaseRawBatchBuffer.java:178)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.getNextBatch(UnorderedReceiverBatch.java:141)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.next(UnorderedReceiverBatch.java:159)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105) 
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:144)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95) 
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at java.security.AccessController.doPrivileged(Native Method) 
~[na:1.8.0_65]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[na:1.8.0_65]
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
 ~[hadoop-common-2.7.0-mapr-1607.jar:na]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
... 4 common frames omitted
Caused by: java.lang.InterruptedException: null
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
 ~[na:1.8.0_65]
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
 ~[na:1.8.0_65]
at 
java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
 ~[na:1.8.0_65]
at 
java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680) 
~[na:1.8.0_65]
at 
org.apache.drill.exec.work.batch.UnlimitedRawBatchBuffer$UnlimitedBufferQueue.take(Un

[jira] [Resolved] (DRILL-5164) Equi-join query results in CompileException when inputs have large number of columns

2017-06-06 Thread Volodymyr Vysotskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi resolved DRILL-5164.

Resolution: Fixed

Fixed in 
[b14e30b|https://github.com/apache/drill/commit/b14e30b3df9803fef418d083837c91d57d7a5fe3]

> Equi-join query results in CompileException when inputs have large number of 
> columns
> 
>
> Key: DRILL-5164
> URL: https://issues.apache.org/jira/browse/DRILL-5164
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Affects Versions: 1.9.0
>Reporter: Khurram Faraaz
>Assignee: Volodymyr Vysotskyi
>Priority: Critical
> Fix For: 1.11.0
>
> Attachments: manyColsInJson.json
>
>
> Drill 1.9.0 
> git commit ID : 4c1b420b
> 4 node CentOS cluster
> JSON file has 4095 keys (columns)
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select * from `manyColsInJson.json` t1, 
> `manyColsInJson.json` t2 where t1.key2000 = t2.key2000;
> Error: SYSTEM ERROR: CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject[HashJoinProbeGen294.java]',
>  Line 16397, Column 17: HashJoinProbeGen294.java:16397: error: code too large
> public void doSetup(FragmentContext context, VectorContainer buildBatch, 
> RecordBatch probeBatch, RecordBatch outgoing)
> ^ (compiler.err.limit.code)
> Fragment 0:0
> [Error Id: 7d0efa7e-e183-4c40-939a-4908699f94bf on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2016-12-26 09:52:11,321 [279f17fd-c8f0-5d18-1124-76099f0a5cc8:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject[HashJoinProbeGen294.java]',
>  Line 16397, Column 17: HashJoinProbeGen294.java:16397: error: code too large
> public void doSetup(FragmentContext context, VectorContainer buildBatch, 
> RecordBatch probeBatch, RecordBatch outgoing)
> ^ (compiler.err.limit.code)
> Fragment 0:0
> [Error Id: 7d0efa7e-e183-4c40-939a-4908699f94bf on centos-01.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject[HashJoinProbeGen294.java]',
>  Line 16397, Column 17: HashJoinProbeGen294.java:16397: error: code too large
> public void doSetup(FragmentContext context, VectorContainer buildBatch, 
> RecordBatch probeBatch, RecordBatch outgoing)
> ^ (compiler.err.limit.code)
> Fragment 0:0
> [Error Id: 7d0efa7e-e183-4c40-939a-4908699f94bf on centos-01.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:293)
>  [drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
>  [drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:262)
>  [drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.9.0.jar:1.9.0]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_91]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_91]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: 
> org.apache.drill.exec.exception.SchemaChangeException: 
> org.apache.drill.exec.exception.ClassTransformationException: 
> java.util.concurrent.ExecutionException: 
> org.apache.drill.exec.exception.ClassTransformationException: Failure 
> generating transformation classes for value:
> package org.apache.drill.exec.test.generated;
> ...
> public class HashJoinProbeGen294 {
> NullableVarCharVector[] vv0;
> NullableVarCharVector vv3;
> NullableVarCharVector[] vv6;
> ...
> vv49137 .copyFromSafe((probeIndex), (outIndex), vv49134);
> vv49143 .copyFromSafe((probeIndex), (outIndex), vv49140);
> vv49149 .copyFromSafe((probeIndex), (outIndex), vv49146);
> }
> }
> 
> public void __DRILL_INIT__()
> throws SchemaChangeException
> {
> }
> }
> at 
> org.apache.drill.exec.compile.ClassTransformer.getImplementationClass(ClassTransformer.java:302)
>  ~[drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.compile.CodeCompiler$Loader.load(CodeCompiler.java:78) 
> ~[drill-java-ex