[GitHub] drill issue #1023: DRILL-5922 Fixed Child Allocator Leak. DRILL-5926 Stabali...

2018-01-11 Thread ilooner
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1023
  
Travis build is flakey. Rerunning build.


---


[GitHub] drill pull request #1023: DRILL-5922 Fixed Child Allocator Leak. DRILL-5926 ...

2018-01-11 Thread ilooner
GitHub user ilooner reopened a pull request:

https://github.com/apache/drill/pull/1023

DRILL-5922 Fixed Child Allocator Leak. DRILL-5926 Stabalize TestValueVector 
tests.

## DRILL-5922

 - QueryContext was never closed when the Foreman finished. So the query 
child allocator was never closed.
 - The PlanSplitter created a QueryContext temporarily to construct an RPC 
message but never closed it.
 - The waitForExit method was error prone. Changed it to use the standard 
condition variable pattern.

## DRILL-5926

The TestValueVector tests would run out of memory. I simple increased the 
MaxDirectMemorySize for the forked test processes in the pom to avoid this.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ilooner/drill DRILL-5922

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1023.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1023


commit 005a478238ff8a0ec085e051c099a109a73d3e8a
Author: Timothy Farkas 
Date:   2017-11-10T19:13:18Z

DRILL-5922:
 - The QueryContext was never closed when the Foreman finished, so it's 
child allocator was never closed. Now it is.
 - The PlanSplitter created a QueryContext temporarily to construct an RPC 
message but never closed it. Now the temp QueryContext is closed.
 - The waitForExit method was error prone. Changed it to use the standard 
condition variable pattern.
 - Fixed timeouts in graceful shutdown tests

commit 551c54eca5ed9a69b990daa2a300c850e11e9376
Author: Timothy Farkas 
Date:   2017-11-10T20:59:42Z

DRILL-5926: The TestValueVector tests would run out of memory. Increased 
the MaxDirectMemorySize for the forked test processes in the pom to avoid this.

commit 31175885bcd60abde5ec88ff3fcece589619265f
Author: Timothy Farkas 
Date:   2018-01-09T09:20:09Z

DRILL-6003: Fixed sporadic test failures caused by the creation of 
duplicate clusters in TestDynamicUDFSupport.




---


[GitHub] drill pull request #1023: DRILL-5922 Fixed Child Allocator Leak. DRILL-5926 ...

2018-01-11 Thread ilooner
Github user ilooner closed the pull request at:

https://github.com/apache/drill/pull/1023


---


[GitHub] drill pull request #1073: DRILL-5967: Fixed memory leak in OrderedPartitionS...

2018-01-11 Thread ilooner
Github user ilooner closed the pull request at:

https://github.com/apache/drill/pull/1073


---


[GitHub] drill pull request #1073: DRILL-5967: Fixed memory leak in OrderedPartitionS...

2018-01-11 Thread ilooner
GitHub user ilooner reopened a pull request:

https://github.com/apache/drill/pull/1073

DRILL-5967: Fixed memory leak in OrderedPartitionSender

The OrderedPartitionSender was leaking memory every time it was created 
because it created a wrapper RecordBatch which allocated memory but was never 
closed.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ilooner/drill DRILL-5967

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1073.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1073


commit 13cd2fe7e8dccb2ac7546de508fbfbed8e19b48b
Author: Timothy Farkas 
Date:   2017-12-14T18:48:27Z

DRILL-5967: Fixed memory leak in OrderedPartitionSender

commit e48e274649be7b9a14746c200d0ec5f5b3740664
Author: Timothy Farkas 
Date:   2017-12-18T20:01:30Z

 - Applied review comments

commit 2233e6b32c1dd9f1b89fe4d6fa557e8bd894592e
Author: Timothy Farkas 
Date:   2017-12-18T21:54:01Z

 - Applied more review comments

commit f3aef285fc168cad07b32e5e99cd2cd7811d765c
Author: Timothy Farkas 
Date:   2017-12-18T21:56:13Z

 - Applied review comments

commit 0cdb2f3c4944f6253c72d318276603fc329546b5
Author: Timothy Farkas 
Date:   2017-12-19T19:27:19Z

 - Removed unnecessary creation of a list in OrderedPartitionSenderCreator

commit f6d4aec39aad7326a735b883647772b28e2b86ab
Author: Timothy Farkas 
Date:   2018-01-11T22:19:18Z

 - Added comment documenting the workaround.




---


[jira] [Created] (DRILL-6083) RestClientFixture does not connect to the correct webserver port

2018-01-11 Thread Karthikeyan Manivannan (JIRA)
Karthikeyan Manivannan created DRILL-6083:
-

 Summary: RestClientFixture does not connect to the correct 
webserver port
 Key: DRILL-6083
 URL: https://issues.apache.org/jira/browse/DRILL-6083
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Affects Versions: Future
Reporter: Karthikeyan Manivannan
Assignee: Karthikeyan Manivannan
 Fix For: 1.13.0


RestClientFixture always connects to the default http port (8047) instead of 
connecting to the webserver-port of the cluster. The cluster's webserver port 
won't be 8047 if there are other Drillbits running when the cluster is launched.





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] drill issue #1073: DRILL-5967: Fixed memory leak in OrderedPartitionSender

2018-01-11 Thread ilooner
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1073
  
@paul-rogers Added the comment. Please take a look and let me know if you'd 
like any other changes.


---


[GitHub] drill pull request #1066: DRILL-3993: Changes to support Calcite 1.15

2018-01-11 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1066#discussion_r161077516
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/AggPruleBase.java
 ---
@@ -82,4 +83,19 @@ protected boolean create2PhasePlan(RelOptRuleCall call, 
DrillAggregateRel aggreg
 }
 return true;
   }
+
+  /**
+   * Returns group-by keys with the remapped arguments for specified 
aggregate.
+   *
+   * @param groupSet ImmutableBitSet of aggregate rel node, whose group-by 
keys should be remapped.
+   * @return {@link ImmutableBitSet} instance with remapped keys.
+   */
+  public static ImmutableBitSet remapGroupSet(ImmutableBitSet groupSet) {
--- End diff --

what is the reason we are going this remap with new calcite? 
And if the result is only depended on size of groupSet, we don't really 
need to iterate through the groupSet.


---


[GitHub] drill pull request #570: DRILL-4834 decimal implementation is vulnerable to ...

2018-01-11 Thread daveoshinsky
Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r161069079
  
--- Diff: 
common/src/main/java/org/apache/drill/common/util/CoreDecimalUtility.java ---
@@ -32,6 +32,14 @@ public static long getDecimal18FromBigDecimal(BigDecimal 
input, int scale, int p
   }
 
   public static int getMaxPrecision(TypeProtos.MinorType decimalType) {
+/*
--- End diff --

The two DAO comments are removed in my local merge environment.


---


[GitHub] drill pull request #570: DRILL-4834 decimal implementation is vulnerable to ...

2018-01-11 Thread daveoshinsky
Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r161067313
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/Decimal/CastDecimalVarchar.java ---
@@ -150,6 +150,14 @@ public void setup() {
 
 public void eval() {
 
+<#if type.from.contains("VarDecimal")>
+java.math.BigDecimal bigDecimal = 
org.apache.drill.exec.util.DecimalUtility.getBigDecimalFromDrillBuf(in.buffer, 
in.start, in.end - in.start, in.scale);
+String str = bigDecimal.toString();
+out.buffer = buffer;
+out.start = 0;
+out.end = Math.min((int)len.value, str.length());
+out.buffer.setBytes(0, str.getBytes());
--- End diff --

I take that back.  Your change ensures it doesn't copy too many bytes.  I'm 
including it in the remerged file.


---


[GitHub] drill pull request #570: DRILL-4834 decimal implementation is vulnerable to ...

2018-01-11 Thread daveoshinsky
Github user daveoshinsky commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r161066659
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/Decimal/CastDecimalVarchar.java ---
@@ -150,6 +150,14 @@ public void setup() {
 
 public void eval() {
 
+<#if type.from.contains("VarDecimal")>
+java.math.BigDecimal bigDecimal = 
org.apache.drill.exec.util.DecimalUtility.getBigDecimalFromDrillBuf(in.buffer, 
in.start, in.end - in.start, in.scale);
+String str = bigDecimal.toString();
+out.buffer = buffer;
+out.start = 0;
+out.end = Math.min((int)len.value, str.length());
+out.buffer.setBytes(0, str.getBytes());
--- End diff --

The Math.min use here ensures that we don't overrun the output buffer.  It 
the string doesn't fit, it's truncated to the length available in the output 
buffer.  Your change makes the code harder to understand than what I had, and 
your change is not necessary (the Math.min again).  I am forced to manually 
remerge quite a few of my changes because of the way git rebase didn't merge to 
master the way I had wished, so am now working on this file.  Leaving the line 
as I had it for now...


---


[GitHub] drill pull request #1060: DRILL-5846: Improve parquet performance for Flat D...

2018-01-11 Thread sachouche
Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1060#discussion_r161065237
  
--- Diff: protocol/src/main/protobuf/UserBitShared.proto ---
@@ -148,6 +148,8 @@ message SerializedField {
   optional int32 value_count = 4;
   optional int32 var_byte_length = 5;
   optional int32 buffer_length = 7;
+  optional bool is_dup = 8;
+  optional int32 logical_value_count = 9;
--- End diff --

Just had an offline conversation with Paul and Kunal. The agreement is that 
we should introduce a versioning mechanism at the protocol level so that 
clients could advertise their version identifier; this way the server can use 
this information to turn on / off features based on the client capabilities. 
The task is only to create the preliminary versioning infrastructure; more 
sophistication can be added later on. 


---


[GitHub] drill pull request #1060: DRILL-5846: Improve parquet performance for Flat D...

2018-01-11 Thread sachouche
Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1060#discussion_r161040070
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java 
---
@@ -309,7 +314,7 @@ public void setInitialCapacity(final int valueCount) {
 if (size > MAX_ALLOCATION_SIZE) {
   throw new OversizedAllocationException("Requested amount of memory 
is more than max allowed allocation size");
 }
-allocationSizeInBytes = (int) size;
+allocationSizeInBytes = (int)size;
--- End diff --

Fixed.


---


[GitHub] drill pull request #1060: DRILL-5846: Improve parquet performance for Flat D...

2018-01-11 Thread sachouche
Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1060#discussion_r161039122
  
--- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
@@ -874,6 +880,46 @@ public void setSafe(int index, BigDecimal value) {
   set(index, value);
 }
 
+/**
+ * Copies the bulk input into this value vector and extends its 
capacity if necessary.
+ * @param input bulk input
+ */
+public  void setSafe(VLBulkInput input) {
+  setSafe(input, null);
+}
+
+/**
+ * Copies the bulk input into this value vector and extends its 
capacity if necessary. The callback
+ * mechanism allows decoration as caller is invoked for each bulk 
entry.
+ *
+ * @param input bulk input
+ * @param callback a bulk input callback object (optional)
+ */
+public  void setSafe(VLBulkInput input, 
VLBulkInput.BulkInputCallback callback) {
--- End diff --

This code is not Parquet specific. Instead, it can be triggered by any 
Reader which desires to load data in a bulk fashion. Vectors currently expose 
Mutator APIs for loading single values; I see no good reason which prevent us 
from passing bulk values instead of a single one at a time which prevent us 
from code optimization. Look at ByBuffer APIs they allow you to pass single 
byte values but also byte arrays.


---


[GitHub] drill pull request #1060: DRILL-5846: Improve parquet performance for Flat D...

2018-01-11 Thread sachouche
Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1060#discussion_r161036145
  
--- Diff: 
exec/memory/base/src/main/java/org/apache/drill/exec/util/MemoryUtils.java ---
@@ -0,0 +1,186 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.util;
+
+import java.lang.reflect.Field;
+import java.nio.ByteOrder;
+
+import sun.misc.Unsafe;
+
+/** Exposes advanced Memory Access APIs for Little-Endian / Unaligned 
platforms */
+@SuppressWarnings("restriction")
+public final class MemoryUtils {
+
+  // Ensure this is a little-endian hardware */
+  static {
+if (ByteOrder.nativeOrder() != ByteOrder.LITTLE_ENDIAN) {
+  throw new IllegalStateException("Drill only runs on LittleEndian 
systems.");
+}
+  }
+
+  /** Java's unsafe object */
+  private static Unsafe UNSAFE;
+
+  static {
+try {
+  Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
+  theUnsafe.setAccessible(true);
+  UNSAFE = (Unsafe) theUnsafe.get(null);
+}
+catch (Exception e) {
+  throw new RuntimeException(e);
+}
+  }
+
+  /** Byte arrays offset */
+  private static final long BYTE_ARRAY_OFFSET = 
UNSAFE.arrayBaseOffset(byte[].class);
+
+  /** Number of bytes in a long */
+  public static final int LONG_NUM_BYTES  = 8;
+  /** Number of bytes in an int */
+  public static final int INT_NUM_BYTES   = 4;
+  /** Number of bytes in a short */
+  public static final int SHORT_NUM_BYTES = 2;
+

+//
+// APIs

+//
+
+  /**
+   * @param data source byte array
+   * @param index index within the byte array
+   * @return short value starting at data+index
+   */
+  public static short getShort(byte[] data, int index) {
--- End diff --

Please refer to my reply to Parth. I personally think the memory package 
should be our main abstraction (of course along with guidelines). Putting all 
memory access APIs within the DrillBuf class will only clutter that class.. I 
rather have a centralized memory utility class for advanced processing which do 
not involve a DrillBuf.


---


[GitHub] drill pull request #1060: DRILL-5846: Improve parquet performance for Flat D...

2018-01-11 Thread sachouche
Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1060#discussion_r161045748
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java 
---
@@ -386,7 +391,7 @@ public void reAlloc() {
   throw new OversizedAllocationException("Unable to expand the buffer. 
Max allowed buffer size is reached.");
 }
 
-logger.trace("Reallocating VarChar, new size {}", newAllocationSize);
+logger.trace("Reallocating VarChar, new size {}",newAllocationSize);
--- End diff --

Fixed.


---


[GitHub] drill pull request #1060: DRILL-5846: Improve parquet performance for Flat D...

2018-01-11 Thread sachouche
Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1060#discussion_r161030963
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
 ---
@@ -449,7 +451,7 @@ public void close() throws Exception {
 // been created. Gracefully handle that case.
 
 if (options != null) {
-  options.close();
-}
+options.close();
   }
 }
+}
--- End diff --

The issue was with the close() method which was not properly indented. 
Fixed it.


---


[GitHub] drill pull request #1060: DRILL-5846: Improve parquet performance for Flat D...

2018-01-11 Thread sachouche
Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1060#discussion_r161032779
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableColumnReader.java
 ---
@@ -165,17 +181,133 @@
   + "Run Length: {} \t Null Run Length: {} \t readCount: {} \t 
writeCount: {} \t "
   + "recordsReadInThisIteration: {} \t 
valuesReadInCurrentPass: {} \t "
   + "totalValuesRead: {} \t readStartInBytes: {} \t 
readLength: {} \t pageReader.byteLength: {} \t "
-  + "definitionLevelsRead: {} \t pageReader.currentPageCount: 
{}",
+  + "currPageValuesProcessed: {} \t 
pageReader.currentPageCount: {}",
   recordsToReadInThisPass, runLength, nullRunLength, readCount,
   writeCount, recordsReadInThisIteration, valuesReadInCurrentPass,
   totalValuesRead, readStartInBytes, readLength, 
pageReader.byteLength,
-  definitionLevelsRead, pageReader.currentPageCount);
+  currPageValuesProcessed, pageReader.currentPageCount);
 
 }
 
 valueVec.getMutator().setValueCount(valuesReadInCurrentPass);
   }
 
+  private final void processPagesBulk(long recordsToReadInThisPass) throws 
IOException {
+readStartInBytes   = 0;
+readLength = 0;
+readLengthInBits   = 0;
+recordsReadInThisIteration = 0;
+vectorData = castedBaseVector.getBuffer();
+
+// values need to be spaced out where nulls appear in the column
+// leaving blank space for nulls allows for random access to values
+// to optimize copying data out of the buffered disk stream, runs of 
defined values
+// are located and copied together, rather than copying individual 
values
+
+int valueCount   = 0;
+final int maxValuesToProcess = Math.min((int) recordsToReadInThisPass, 
valueVec.getValueCapacity());
+
+// To handle the case where the page has been already loaded
+if (pageReader.definitionLevels != null && currPageValuesProcessed == 
0) {
+  definitionLevelWrapper.set(pageReader.definitionLevels, 
pageReader.currentPageCount);
+}
+
+while (valueCount < maxValuesToProcess) {
+
+  // read a page if needed
+  if (!pageReader.hasPage() || (currPageValuesProcessed == 
pageReader.currentPageCount)) {
+if (!pageReader.next()) {
+  break;
+}
+
+//New page. Reset the definition level.
+currPageValuesProcessed= 0;
+recordsReadInThisIteration = 0;
+readStartInBytes   = 0;
+
+// Update the Definition Level reader
+definitionLevelWrapper.set(pageReader.definitionLevels, 
pageReader.currentPageCount);
+  }
+
+  definitionLevelWrapper.readFirstIntegerIfNeeded();
+
+  int numNullValues   = 0;
+  int numNonNullValues= 0;
+  final int remaining = maxValuesToProcess - valueCount;
+  int currBatchSz = Math.min(remaining, 
(pageReader.currentPageCount - currPageValuesProcessed));
+  assert currBatchSz > 0;
+
+  // Let's skip the next run of nulls if any ...
+  int idx;
+  for (idx = 0; idx < currBatchSz; ++idx) {
+if (definitionLevelWrapper.readCurrInteger() == 1) {
+  break; // non-value encountered
+}
+definitionLevelWrapper.nextIntegerIfNotEOF();
+  }
+  numNullValues += idx;
+
+  // Write the nulls if any
--- End diff --

This is the original logic for processing fixed precision columns (except 
more efficient); the intent is to figure out how many null values there are 
before processing non-null values. We could have avoided this for-loop if we 
controlled the Parquet ValuesReader API; I was tempted to do that but decided 
to prioritize my tasks. The good news is that I recorded all such improvement 
opportunities and will try to tackle them at some point. 


---


[jira] [Created] (DRILL-6082) RpcExceptionHandler log doesn't print "cause" for exception

2018-01-11 Thread Sorabh Hamirwasia (JIRA)
Sorabh Hamirwasia created DRILL-6082:


 Summary: RpcExceptionHandler log doesn't print "cause" for 
exception
 Key: DRILL-6082
 URL: https://issues.apache.org/jira/browse/DRILL-6082
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - RPC
Affects Versions: 1.12.0
Reporter: Sorabh Hamirwasia
Assignee: Sorabh Hamirwasia
 Fix For: 1.13.0


In 
[RpcExceptionHandler|https://github.com/apache/drill/blob/master/exec/rpc/src/main/java/org/apache/drill/exec/rpc/RpcExceptionHandler.java#L35]
 we are printing the warning log with cause but in wrong format. This is 
resulting in loss of actual "cause" in the log file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] drill pull request #1060: DRILL-5846: Improve parquet performance for Flat D...

2018-01-11 Thread sachouche
Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1060#discussion_r161028252
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java 
---
@@ -471,8 +519,8 @@ public void close() throws Exception {
 container.clear();
 mutator.clear();
 if (currentReader != null) {
-  currentReader.close();
-}
+currentReader.close();
+  }
--- End diff --

done.


---


[GitHub] drill pull request #570: DRILL-4834 decimal implementation is vulnerable to ...

2018-01-11 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r161015751
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java 
---
@@ -418,6 +422,16 @@ public void get(int index, 
Nullable${minor.class}Holder holder){
 
 
 <#switch minor.class>
+<#case "VarDecimal">
+@Override
+public ${friendlyType} getObject(int index) {
+  byte[] b = get(index);
+  BigInteger bi = new BigInteger(b);
+  BigDecimal bd = new BigDecimal(bi, getField().getScale());
+  //System.out.println("VarDecimal getObject " + index + " scale " + 
getField().getScale() + " len " + b.length + " bd " + bd);
--- End diff --

Please remove this comment


---


[GitHub] drill pull request #570: DRILL-4834 decimal implementation is vulnerable to ...

2018-01-11 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r161006318
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/Decimal/CastVarCharDecimal.java ---
@@ -85,6 +85,30 @@ public void eval() {
 
 // Assign the scale and precision
 out.scale = (int) scale.value;
+
+<#if type.to.endsWith("VarDecimal")>
+
+// VarDecimal gets its own cast logic
+int readIndex = in.start;
+int endIndex  = in.end;
+StringBuffer sb = new StringBuffer();
--- End diff --

I think it would be easier to receive string using this code:
```
byte[] buf = new byte[in.end - in.start];
buffer.getBytes(in.start, buf, 0, in.end - in.start);
String s = new String(buf, Charsets.UTF_8);
```


---


[GitHub] drill pull request #570: DRILL-4834 decimal implementation is vulnerable to ...

2018-01-11 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r161004122
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/Decimal/CastIntDecimal.java ---
@@ -68,15 +68,31 @@ public void setup() {
 
 public void eval() {
 out.scale = (int) scale.value;
+
+<#if !type.to.endsWith("VarDecimal")>
 out.precision = (int) precision.value;
+
 
-<#if type.to == "Decimal9" || type.to == "Decimal18">
+<#if type.to.endsWith("VarDecimal")>
+out.start = 0;
+out.buffer = buffer;
+String s = Long.toString((long)in.value);
+for (int i = 0; i < out.scale; ++i) {  // add 0's to get unscaled 
integer
+s += "0";
+}
+java.math.BigInteger bi = new java.math.BigInteger(s);
+java.math.BigDecimal bd = new java.math.BigDecimal(bi, out.scale);
+//System.out.println("CastIntDecimal in " + in.value + " scale " + 
out.scale + " string " + s + " bd " + bd);
--- End diff --

Please remove this comment.


---


[GitHub] drill pull request #570: DRILL-4834 decimal implementation is vulnerable to ...

2018-01-11 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160963518
  
--- Diff: exec/vector/src/main/codegen/templates/NullableValueVectors.java 
---
@@ -327,13 +327,17 @@ public Mutator getMutator(){
 return v;
   }
 
-  public void copyFrom(int fromIndex, int thisIndex, 
Nullable${minor.class}Vector from){
+  protected void copyFromUnsafe(int fromIndex, int thisIndex, 
Nullable${minor.class}Vector from){
--- End diff --

Since we already have `copyFromSafe()` method in this class, `copyFrom()` 
is supposed to be unsafe.


---


[GitHub] drill pull request #570: DRILL-4834 decimal implementation is vulnerable to ...

2018-01-11 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160941530
  
--- Diff: exec/java-exec/src/main/codegen/templates/SqlAccessors.java ---
@@ -127,6 +127,25 @@ public String getString(int index) {
 }
   <#break>
 
+<#case "VarDecimal">
+
+@Override
+public String getString(int index){
+<#if mode=="Nullable">
+if(ac.isNull(index)){
+  return null;
+}
+
+try {
--- End diff --

I think it would be better just throw the exception. Then it is easier to 
detect a case when data was stored in the vector incorrectly, or if it has come 
from the stored data, inform a user that data was corrupted.


---


[GitHub] drill pull request #570: DRILL-4834 decimal implementation is vulnerable to ...

2018-01-11 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160980556
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java 
---
@@ -539,7 +553,12 @@ public void setValueLengthSafe(int index, int length) {
 }
 
 
-public void setSafe(int index, int start, int end, DrillBuf buffer){
+<#if type.minor == "VarDecimal">
--- End diff --

Sorry, perhaps I made mistake, yes, it looks OK. 
But do we need to pass `scale` for `VarDecimal` even if we don't use it in 
this method? 
I suppose should be made a change in `ComplexWriters.java` to avoid the 
usage of this method with the additional argument.


---


[GitHub] drill pull request #570: DRILL-4834 decimal implementation is vulnerable to ...

2018-01-11 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160987506
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/Decimal/CastDecimalVarchar.java ---
@@ -150,6 +150,14 @@ public void setup() {
 
 public void eval() {
 
+<#if type.from.contains("VarDecimal")>
+java.math.BigDecimal bigDecimal = 
org.apache.drill.exec.util.DecimalUtility.getBigDecimalFromDrillBuf(in.buffer, 
in.start, in.end - in.start, in.scale);
+String str = bigDecimal.toString();
+out.buffer = buffer;
+out.start = 0;
+out.end = Math.min((int)len.value, str.length());
+out.buffer.setBytes(0, str.getBytes());
--- End diff --

I guess we should do the same thing as for other decimals:
```
out.buffer.setBytes(0, String.valueOf(str.substring(0, 
out.end)).getBytes());
```


---


[GitHub] drill pull request #570: DRILL-4834 decimal implementation is vulnerable to ...

2018-01-11 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160939410
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/Decimal/DecimalFunctions.java ---
@@ -102,7 +111,578 @@
 <#-- For each DECIMAL... type (in DecimalTypes.tdd) ... -->
 <#list comparisonTypesDecimal.decimalTypes as type>
 
-<#if type.name.endsWith("Sparse")>
+<#if type.name.endsWith("VarDecimal")>
+
+<@pp.changeOutputFile 
name="/org/apache/drill/exec/expr/fn/impl/${type.name}Functions.java" />
+
+<#include "/@includes/license.ftl" />
+
+package org.apache.drill.exec.expr.fn.impl;
+
+<#include "/@includes/vv_imports.ftl" />
+
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import 
org.apache.drill.exec.expr.annotations.FunctionTemplate.NullHandling;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.annotations.Workspace;
+import org.apache.drill.exec.expr.fn.FunctionGenerationHelper;
+import org.apache.drill.exec.expr.holders.*;
+import org.apache.drill.exec.record.RecordBatch;
+
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.DrillBuf;
+
+import java.nio.ByteBuffer;
+
+@SuppressWarnings("unused")
+public class ${type.name}Functions {
+private static void initBuffer(DrillBuf buffer) {
+// for VarDecimal, this method of setting initial size is actually 
only a very rough heuristic.
+int size = (${type.storage} * 
(org.apache.drill.exec.util.DecimalUtility.INTEGER_SIZE));
+buffer = buffer.reallocIfNeeded(size);
+ }
+
+@FunctionTemplate(name = "subtract", scope = 
FunctionTemplate.FunctionScope.DECIMAL_ADD_SCALE, nulls = 
NullHandling.NULL_IF_NULL)
--- End diff --

By default `checkPrecision` is false, so it is not required to specify it 
explicitly in this case. 


---


[GitHub] drill pull request #570: DRILL-4834 decimal implementation is vulnerable to ...

2018-01-11 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160958360
  
--- Diff: exec/vector/src/main/codegen/templates/ComplexWriters.java ---
@@ -99,7 +99,7 @@ public void write(Nullable${minor.class?cap_first}Holder 
h) {
 
   <#if !(minor.class == "Decimal9" || minor.class == "Decimal18" || 
minor.class == "Decimal28Sparse" || minor.class == "Decimal38Sparse" || 
minor.class == "Decimal28Dense" || minor.class == "Decimal38Dense")>
   public void write${minor.class}(<#list fields as field>${field.type} 
${field.name}<#if field_has_next>, ) {
-mutator.addSafe(idx(), <#list fields as field>${field.name}<#if 
field_has_next>, );
+mutator.addSafe(idx(), <#list fields as field><#if field.name == 
"scale"><#break>${field.name}<#if field_has_next && 
fields[field_index+1].name != "scale" >, );
--- End diff --

I meant to replace this line by something like this: 
```
<#if minor.class == "VarDecimal">
mutator.addSafe(idx()<#list fields as field><#if field.include!true>, 
${field.name}); // or something better
<#else>
mutator.addSafe(idx(), <#list fields as field>${field.name}<#if 
field_has_next>, );

```
Since `field.include` is used only for Decimal types, we can replace all 
this code by this line:
```
mutator.addSafe(idx()<#list fields as field><#if field.include!true>, 
${field.name});
```
But do we use this method for VarDecimal somewhere? If not, it would be 
better to add VarDecimal `minor.class` to the if condition with other decimal 
types above.

BTW, please modify similar changes below in this class and in 
`RepeatedValueVectors.java`


---


[GitHub] drill pull request #570: DRILL-4834 decimal implementation is vulnerable to ...

2018-01-11 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r161002913
  
--- Diff: 
common/src/main/java/org/apache/drill/common/util/CoreDecimalUtility.java ---
@@ -32,6 +32,14 @@ public static long getDecimal18FromBigDecimal(BigDecimal 
input, int scale, int p
   }
 
   public static int getMaxPrecision(TypeProtos.MinorType decimalType) {
+/*
--- End diff --

This method is used in `TypeCastRules.getCost()` and in 
`CoreDecimalUtility.getPrecisionRange()`.
Regarding its usage in `TypeCastRules.getCost()`, we may return 
`RelDataTypeSystem.getMaxNumericPrecision()` for VarDecimal. 
But regarding its usage in `CoreDecimalUtility.getPrecisionRange()` method, 
`getPrecisionRange()` is used in DecimalScalePrecisionFunction classes for 
calculating resulting precision. I suppose these methods should be rewritten in 
the same way as it is implemented in Calcite `RelDataTypeFactoryImpl`. But I 
think it would be better to make these changes in separate Jira.

But now please remove these comments. 


---


[GitHub] drill issue #1074: DRILL-6025: Display execution time of a query in RUNNING ...

2018-01-11 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1074
  
+1, LGTM.


---


[GitHub] drill pull request #1074: DRILL-6025: Display execution time of a query in R...

2018-01-11 Thread prasadns14
Github user prasadns14 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1074#discussion_r161012925
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileWrapper.java
 ---
@@ -211,16 +211,25 @@ public String getExecutionDuration() {
   return NOT_AVAILABLE_LABEL;
 }
 
+long queryEndTime;
+
+// Check if State is RUNNING, set end time to current time
+if (profile.getState() == QueryState.RUNNING) {
--- End diff --

@arina-ielchiieva, thank you for catching the mistake.
Fixed it


---


Odd Error on Login with 1.13-SNAPSHOT

2018-01-11 Thread John Omernik
I am probably missing something minor here, but I am working with Ted
Dunning on some PCAP plugin stuff, so I built his 1.13 SNAPSHOT, and when I
try to login I see

{
  "errorMessage" : "No configuration setting found for key
'drill.exec.http.auth'"
}



I am guessing that something was added that I need to fill out in my
config? Is there a JIRA or something that can guide me on this?


Thanks


[GitHub] drill pull request #1088: DRILL-6081: Set query end time before writing fina...

2018-01-11 Thread arina-ielchiieva
GitHub user arina-ielchiieva opened a pull request:

https://github.com/apache/drill/pull/1088

DRILL-6081: Set query end time before writing final profile



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arina-ielchiieva/drill DRILL-6081

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1088.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1088


commit e60baf9dc194853090173c13ac959f40b938032a
Author: Arina Ielchiieva 
Date:   2018-01-11T15:40:04Z

DRILL-6081: Set query end time before writing final profile




---


[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

2018-01-11 Thread vdiravka
Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/1083#discussion_r160996060
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java 
---
@@ -424,6 +429,23 @@ public MetadataContext getMetaContext() {
 return metaContext;
   }
 
+  /**
+   * @return true if this file selectionRoot points to an empty directory, 
false otherwise
+   */
+  public boolean isEmptyDirectory() {
+return emptyDirectory;
+  }
+
+  /**
+   * Setting this as true allows to identify this as empty directory file 
selection
+   *
+   * @param emptyDirectory empty directory flag
+   */
+  public void setEmptyDirectory(boolean emptyDirectory) {
--- End diff --

Done


---


[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

2018-01-11 Thread vdiravka
Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/1083#discussion_r160997779
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java
 ---
@@ -250,20 +250,12 @@ private boolean metaDataFileExists(FileSystem fs, 
FileStatus dir) throws IOExcep
 }
 
 boolean isDirReadable(DrillFileSystem fs, FileStatus dir) {
-  Path p = new Path(dir.getPath(), 
ParquetFileWriter.PARQUET_METADATA_FILE);
   try {
-if (fs.exists(p)) {
-  return true;
-} else {
-
-  if (metaDataFileExists(fs, dir)) {
-return true;
-  }
-  List statuses = DrillFileSystemUtil.listFiles(fs, 
dir.getPath(), false);
-  return !statuses.isEmpty() && super.isFileReadable(fs, 
statuses.get(0));
-}
+// There should be at least one file, which is readable by Drill
+List statuses = DrillFileSystemUtil.listFiles(fs, 
dir.getPath(), false);
+return !statuses.isEmpty() && super.isFileReadable(fs, 
statuses.get(0));
--- End diff --

I have reverted my changes here. SchemalessScan will be created if 
ParquetGroupScan has empty entries and can't be used. The info that parquet 
metadata files are invalid will be logged.


---


[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

2018-01-11 Thread vdiravka
Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/1083#discussion_r160936248
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/SchemalessScan.java
 ---
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.base;
+
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.base.Preconditions;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.PhysicalOperatorSetupException;
+import org.apache.drill.exec.planner.logical.DynamicDrillTable;
+import org.apache.drill.exec.proto.CoordinationProtos;
+
+import java.util.List;
+
+/**
+ *  The type of scan operator, which allows to scan schemaless tables 
({@link DynamicDrillTable} with null selection)
+ */
+@JsonTypeName("schemaless-scan")
+public class SchemalessScan extends AbstractGroupScan implements SubScan {
+
+  private String selectionRoot;
--- End diff --

Done


---


[jira] [Created] (DRILL-6081) Duration of completed queries is continuously increasing.

2018-01-11 Thread Anton Gozhiy (JIRA)
Anton Gozhiy created DRILL-6081:
---

 Summary: Duration of completed queries is continuously increasing.
 Key: DRILL-6081
 URL: https://issues.apache.org/jira/browse/DRILL-6081
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.13.0
Reporter: Anton Gozhiy


Steps:
1. Execute some query (for example "select * from sys.version")
2. Go to the profiles page: http://node1:8047/profiles
3. Open the details page for the query
4. Expand the duration section

Expected result:
The duration should be adequate and shouldn't be changed after the page reload

Actual result:
The duration is continuously increasing (You should reload page to notice that)
||Planning||Queued||Execution||Total||
|0.092 sec|0.012 sec|59 min 36.487 sec  |59 min 36.591 sec|

Workaround: none

Note:
The issue introduced after the following fix:
https://issues.apache.org/jira/browse/DRILL-5963



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] drill pull request #1074: DRILL-6025: Display execution time of a query in R...

2018-01-11 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1074#discussion_r160982037
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileWrapper.java
 ---
@@ -211,16 +211,25 @@ public String getExecutionDuration() {
   return NOT_AVAILABLE_LABEL;
 }
 
+long queryEndTime;
+
+// Check if State is RUNNING, set end time to current time
+if (profile.getState() == QueryState.RUNNING) {
--- End diff --

@prasadns14  I think that after rebase you have lost one of the initial 
changes. Since you did not remove check in previous if statement for 
`profile.getState() == QueryState.RUNNING`, this if statement will never be 
executed.


---


[GitHub] drill issue #1078: DRILL-6054: don't try to split the filter when it is not ...

2018-01-11 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1078
  
+1, LGTM.


---


[GitHub] drill issue #1074: DRILL-6025: Display execution time of a query in RUNNING ...

2018-01-11 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1074
  
Thanks, +1.


---