[jira] [Created] (DRILL-5255) Drill requires that dfs.tmp be set up at start time, even if CTTAS not used

2017-02-10 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5255:
--

 Summary: Drill requires that dfs.tmp be set up at start time, even if CTTAS not used
 Key: DRILL-5255
 URL: https://issues.apache.org/jira/browse/DRILL-5255
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.10
Reporter: Paul Rogers
Assignee: Arina Ielchiieva
 Fix For: 1.10


Drill can operate in embedded mode. In this mode, no storage plugin definitions 
other than the defaults may be present. In particular, when using the Drill 
test framework, only those storage plugins defined in the Drill code are 
available.

Yet, Drill checks for the existence of the dfs.tmp plugin definition (as named 
by the {{drill.exec.default_temporary_workspace}} parameter). Because this 
plugin is not defined, an exception occurs:

{code}
org.apache.drill.common.exceptions.UserException: PARSE ERROR: Unable to create 
or drop tables/views. Schema [dfs.tmp] is immutable.

[Error Id: 792d4e5d-3f31-4f38-8bb4-d108f1a808f6 ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
at 
org.apache.drill.exec.planner.sql.SchemaUtilites.resolveToMutableDrillSchema(SchemaUtilites.java:184)
at 
org.apache.drill.exec.planner.sql.SchemaUtilites.getTemporaryWorkspace(SchemaUtilites.java:201)
at 
org.apache.drill.exec.server.Drillbit.validateTemporaryWorkspace(Drillbit.java:264)
at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:135)
at 
org.apache.drill.test.ClusterFixture.startDrillbits(ClusterFixture.java:207)
...
{code}

Expected that either a configuration would exist that would use the default 
/tmp/drill location, or that the check for {{dfs.tmp}} would be deferred 
until it is actually required (such as when executing a CTTAS statement).

As it is, the test framework must be altered to work around this problem by 
defining the necessary properties. But users of the embedded Drillbit may not 
know that this configuration is needed.
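A deferred check along the lines suggested above might look roughly like this (a hypothetical sketch; the class and method names are invented, not Drill's actual code):

```java
// Hypothetical sketch: validate the temporary workspace lazily, on first
// CTTAS use, instead of at Drillbit startup.
public class TempWorkspace {

    private final String schemaName;
    private boolean validated;

    public TempWorkspace(String schemaName) {
        this.schemaName = schemaName;
    }

    // Called from the CTTAS code path rather than from startup, so an
    // embedded Drillbit without a dfs.tmp definition can still boot.
    public String resolve() {
        if (!validated) {
            if (schemaName == null || schemaName.isEmpty()) {
                throw new IllegalStateException(
                    "Temporary workspace is not configured");
            }
            validated = true;
        }
        return schemaName;
    }
}
```

With this shape, an embedded Drillbit (or the test framework) would need no dfs.tmp configuration unless a query actually issues CTTAS.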



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] drill pull request #717: DRILL-5080: Memory-managed version of external sort

2017-02-10 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/717#discussion_r100657733
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/RecordBatchSizer.java
 ---
@@ -0,0 +1,293 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.spill;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.memory.BaseAllocator;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.VectorAccessible;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.selection.SelectionVector2;
+import org.apache.drill.exec.vector.BaseDataValueVector;
+import org.apache.drill.exec.vector.FixedWidthVector;
+import org.apache.drill.exec.vector.NullableVarCharVector;
+import org.apache.drill.exec.vector.NullableVector;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.VarCharVector;
+
+import io.netty.buffer.DrillBuf;
+
+/**
+ * Given a record batch or vector container, determines the actual memory
+ * consumed by each column, the average row, and the entire record batch.
+ */
+
+public class RecordBatchSizer {
+  private static final org.slf4j.Logger logger =
+      org.slf4j.LoggerFactory.getLogger(RecordBatchSizer.class);
+
+  /**
+   * Column size information.
+   */
+  public static class ColumnSize {
+public final MaterializedField metadata;
+
+/**
+ * Assumed size from Drill metadata.
+ */
+public int stdSize;
+/**
+ * Actual memory consumed by all the vectors associated with this column.
+ */
+public int totalSize;
+/**
+ * Actual average column width as determined from actual memory use. This
+ * size is larger than the actual data size since this size includes
+ * per-column overhead such as any unused vector space, etc.
+ */
+public int estSize;
+
+/**
+ * The size of the data vector backing the column. Useful for detecting
+ * cases of possible direct memory fragmentation.
+ */
+public int dataVectorSize;
+public int capacity;
+public int density;
+public int dataSize;
+
+@SuppressWarnings("resource")
+public ColumnSize(VectorWrapper vw) {
+  metadata = vw.getField();
+  stdSize = TypeHelper.getSize(metadata.getType());
+
+  // Can't get size estimates if this is an empty batch.
+
+  ValueVector v = vw.getValueVector();
+  int rowCount = v.getAccessor().getValueCount();
+  if (rowCount == 0) {
+return;
+  }
+  DrillBuf[] bufs = v.getBuffers(false);
+  for (DrillBuf buf : bufs) {
+totalSize += buf.capacity();
+  }
+
+  // Capacity is the number of values that the vector could
+  // contain. This is useful only for fixed-length vectors.
+
+  capacity = v.getValueCapacity();
+
+  // Crude way to get the size of the buffer underlying simple (scalar)
+  // values. Ignores maps, lists and other esoterica. Uses a crude way to
+  // subtract out the null "bit" (really byte) buffer size for nullable
+  // vectors.
+
+  if (v instanceof BaseDataValueVector) {
+dataVectorSize = totalSize;
+if (v instanceof NullableVector) {
+  dataVectorSize -= bufs[0].getActualMemoryConsumed();
+}
+  }
+
+  // Determine "density" the number of rows compared to potential
+  // capacity. Low-density batches occur at block boundaries, ends
+  // of files and so on. Low-density batches throw off our estimates
+  // for Varchar columns because we don't know the actual numb
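The comment is cut off above, but the density idea it describes can be sketched roughly as follows (a hypothetical illustration, not the code from this patch):

```java
// Hypothetical sketch: "density" as the percentage of allocated vector
// memory that is actually occupied by data. Low density indicates a
// mostly-empty batch, e.g. one produced at a file or block boundary.
public class DensityEstimate {

    public static int density(int dataSize, int totalSize) {
        if (totalSize == 0) {
            return 0;  // empty batch: no meaningful density
        }
        return (int) Math.round(100.0 * dataSize / totalSize);
    }

    public static void main(String[] args) {
        // A batch holding 4 KB of data in 64 KB of vector memory is only
        // about 6% dense, so per-row size estimates drawn from it would be
        // misleadingly large.
        System.out.println(density(4096, 65536));
    }
}
```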

[jira] [Created] (DRILL-5254) Enhance default reduction factors in optimizer

2017-02-10 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5254:
--

 Summary: Enhance default reduction factors in optimizer
 Key: DRILL-5254
 URL: https://issues.apache.org/jira/browse/DRILL-5254
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.9.0
Reporter: Paul Rogers
Assignee: Paul Rogers
 Fix For: 1.10


Drill uses Calcite for query parsing and optimization. Drill uses Calcite's 
default selectivity (reduction factor) rules to compute the number of rows 
removed by a filter.

The default rules appear to be overly aggressive in estimating reductions. In a 
production use case, an input with 4 billion rows was estimated to return just 
40K rows from a filter. That is, the filter was estimated to reduce the row 
count by a factor of roughly 100,000. As it turns out, the actual reduction was 
closer to 1/2.

The result was that the planner compared the expected 40K rows against another 
input of 2.5 million rows, and decided the 40K rows would be best on the build 
side of a hash join. When confronted with the actual 3 billion rows, the hash 
join ran out of memory.

The moral of the story is that, in Drill, it is worth being conservative when 
planning for memory-intensive operations.

The (sanitized) filter is the following, annotated with (a guess at) the 
default reduction factors in each term:

{code}
col1_s20 in ('Value1','Value2','Value3','Value4',
 'Value5','Value6','Value7','Value8','Value9') -- 25%
AND col2_i <=3 -- 25%
AND col3_s1 = 'Y' -- 15%
AND col4_s1 = 'Y' -- 15%
AND col5_s6 not like '%str1%' -- 25%
AND col5_s6 not like '%str2%' -- 25%
AND col5_s6 not like '%str3%' -- 25%
AND col5_s6 not like '%str4%' -- 25%
{code}

Total reduction is:

{code}
.25 * .25 * .15^2 * .25^4 ≈ 0.0000055
{code}

Filter estimation is a known hard problem. In general, one needs statistics and 
other data, and even then the estimates are just guesses.

Still, it is possible to ensure that the defaults are at least unbiased. That 
is, if we assume that the probability of A LIKE B is 25%, then the probability 
of A NOT LIKE B should be 75%, not also 25%.

This JIRA suggests creating an experimental set of defaults based on the "core" 
Calcite defaults, but with other reduction factors derived using the laws of 
probability. In particular:

|| Operator || Revised || Explanation || Calcite Default ||
| = | 0.15 | Default in Calcite | 0.15 |
| <> | 0.85 | 1 - p(=) | 0.5 |
| < | 0.425 | p(<>) / 2 | 0.5 |
| > | 0.425 | p(<>) / 2 | 0.5 |
| <= | 0.575 | p(<) + p(=) | 0.5 |
| >= | 0.575 | p(>) + p(=) | 0.5 |
| LIKE | 0.25 | Default in Calcite | 0.25 |
| NOT LIKE | 0.75 | 1 - p(LIKE) | 0.25 |
| NOT NULL | 0.90 | Default in Calcite | 0.90 |
| IS NULL | 0.10 | 1 - p(NOT NULL) | 0.25 |
| IS TRUE | 0.5 | 1 / 2 | 0.25 |
| IS FALSE | 0.5 | 1 / 2 | 0.25 |
| IS NOT TRUE | 0.55 | 1 - p(IS TRUE) * p(NOT NULL) | 0.25 |
| IS NOT FALSE | 0.55 | 1 - p(IS FALSE) * p(NOT NULL) | 0.25 |
| A OR B | Varies | min(p(A) + p(B) - p(A ^ B), 0.5) | 0.5 |
| IN (a) | 0.15 | p(=) | 0.5 |
| x IN (a, b, c, ...) | Varies | p(x = a v x = b v x = c v ...) | 0.5 |

The Calcite defaults should be taken as approximate.

The probability of the IS NOT TRUE statement assumes the presence of nulls, 
while IS TRUE does not. The rule for OR caps the reduction factor at 0.5 per 
standard practice.
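As a sanity check, the derived factors and the OR/IN rules in the table can be reproduced in a few lines (an illustration only; the class and method names are invented, and this is not Drill's planner code):

```java
// Hypothetical sketch of the revised reduction factors proposed above.
public class RevisedSelectivity {

    public static final double EQ = 0.15;            // Calcite default for '='
    public static final double NE = 1 - EQ;          // 0.85
    public static final double LT = NE / 2;          // 0.425
    public static final double LE = LT + EQ;         // 0.575
    public static final double LIKE = 0.25;          // Calcite default
    public static final double NOT_LIKE = 1 - LIKE;  // 0.75

    // OR rule: p(A) + p(B) - p(A and B), capped at 0.5; the joint term
    // assumes independence of A and B.
    public static double or(double pA, double pB) {
        return Math.min(pA + pB - pA * pB, 0.5);
    }

    // IN (a, b, c, ...) treated as a chain of OR'ed equality tests.
    public static double in(int valueCount) {
        double p = 0;
        for (int i = 0; i < valueCount; i++) {
            p = or(p, EQ);
        }
        return p;
    }

    public static void main(String[] args) {
        // The example WHERE clause: a 9-value IN, one <=, two =, and four
        // NOT LIKE terms.
        double total = in(9) * LE * EQ * EQ * Math.pow(NOT_LIKE, 4);
        System.out.printf("total reduction = %.4f%n", total);
    }
}
```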

With the revised rules, the example WHERE reduction becomes:

{code}
col1_s20 in ('Value1','Value2','Value3','Value4',
 'Value5','Value6','Value7','Value8','Value9') -- 50%
AND col2_i <= 3 -- 57.5%
AND col3_s1 = 'Y' -- 15%
AND col4_s1 = 'Y' -- 15%
AND col5_s6 not like '%str1%' -- 75%
AND col5_s6 not like '%str2%' -- 75%
AND col5_s6 not like '%str3%' -- 75%
AND col5_s6 not like '%str4%' -- 75%

.5 * .575 * .15^2 * .75^4 ≈ 0.002
{code}

The new rules are not a panacea: they are still just guesses. However, they are 
unbiased guesses, based on the rules of probability, and they produce more 
conservative filter reductions. The result may be better plans for queries 
with large conjunctions (a large number of expressions AND'ed together).





[GitHub] drill pull request #744: DRILL-5040: Parquet writer unable to delete table f...

2017-02-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/744


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #581: DRILL-4864: Add ANSI format for date/time functions

2017-02-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/581




[GitHub] drill issue #744: DRILL-5040: Parquet writer unable to delete table folder o...

2017-02-10 Thread amansinha100
Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/744
  
+1




[GitHub] drill issue #581: DRILL-4864: Add ANSI format for date/time functions

2017-02-10 Thread amansinha100
Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/581
  
+1




[GitHub] drill pull request #745: DRILL-5252: Fix a condition that always returns tru...

2017-02-10 Thread lifove
GitHub user lifove opened a pull request:

https://github.com/apache/drill/pull/745

DRILL-5252: Fix a condition that always returns true

Fix for https://issues.apache.org/jira/browse/DRILL-5252

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lifove/drill DRILL-5252

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/745.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #745


commit e9ac4c189dd25b224fe45cfa6ebe155ce45f3bf2
Author: JC 
Date:   2017-02-11T02:08:00Z

DRILL-5252: Fix a condition that always returns true






[GitHub] drill pull request #594: DRILL-4842: SELECT * on JSON data results in Number...

2017-02-10 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/594#discussion_r100653436
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java
 ---
@@ -492,10 +537,18 @@ private void writeDataAllText(MapWriter map, FieldSelection selection,
   }
 }
 map.end();
-
   }
 
   /**
+   * Puts copy of field path list to fieldPathWriter map.
+   * @param fieldName
+   */
+  private void putFieldPath(String fieldName, MapWriter map) {
+List fieldPath = Lists.newArrayList(path);
--- End diff --

Before allocating the fieldPath list, should we first check if it is 
already present in the fieldPathWriter? This function is called for every null 
value, which makes it quite expensive (as shown by your performance 
experiments). If there are 1000 records of type {'x': null}, ideally we want 
to call this method only once for field 'x'.
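The suggested guard might look like this (hypothetical names, simplified to plain collections rather than Drill's writer classes, and not the actual patch):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the reviewer's suggestion: consult the cache
// before copying the current path list for a field.
public class FieldPathCache {

    private final Map<String, List<String>> fieldPathMap = new HashMap<>();
    private final List<String> path = new ArrayList<>();

    public void putFieldPath(String fieldName) {
        // Skip the list copy entirely when the field was already recorded,
        // so repeated null values for the same field cost only a map lookup.
        if (fieldPathMap.containsKey(fieldName)) {
            return;
        }
        fieldPathMap.put(fieldName, new ArrayList<>(path));
    }

    public int size() {
        return fieldPathMap.size();
    }

    public static void main(String[] args) {
        FieldPathCache cache = new FieldPathCache();
        for (int i = 0; i < 1000; i++) {
            cache.putFieldPath("x");  // 1000 null records, one cached copy
        }
        System.out.println(cache.size());  // prints 1
    }
}
```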




[jira] [Created] (DRILL-5253) External sort fails with OOM error (Fails to allocate sv2)

2017-02-10 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5253:


 Summary: External sort fails with OOM error (Fails to allocate sv2)
 Key: DRILL-5253
 URL: https://issues.apache.org/jira/browse/DRILL-5253
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=2af709f

The data set used in the query below has the same value for every column in 
every row. The query fails with an OOM because it exceeds the allocated memory.

{code}
 select count(*) from (select * from identical order by col1, col2, col3, col4, 
col5, col6, col7, col8, col9, col10);
Error: RESOURCE ERROR: One or more nodes ran out of memory while executing the 
query.

org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate sv2 
buffer after repeated attempts
Fragment 2:0

[Error Id: aed43fa1-fd8b-4440-9426-0f35d055aabb on qa-node190.qa.lab:31010] 
(state=,code=0)
{code}

Exception from the logs
{code}
org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
nodes ran out of memory while executing the query.

org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate sv2 
buffer after repeated attempts

[Error Id: aed43fa1-fd8b-4440-9426-0f35d055aabb ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
 ~[drill-common-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:242)
 [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_111]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_111]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
Caused by: org.apache.drill.exec.exception.OutOfMemoryException: 
org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate sv2 
buffer after repeated attempts
at 
org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:371)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) 
~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:92)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) 
~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:232)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:226)
 ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
at java.security.AccessController.doPrivileged(Native Method) 
~[na:1.7.0_111]
at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_111]
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
 ~[hadoop-common-2.7.0-mapr-1607.jar:na]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:226)
 [drill-java-exec-1.10.0-SNAP

[GitHub] drill issue #717: DRILL-5080: Memory-managed version of external sort

2017-02-10 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/717
  
Rebased and squashed commits to prepare for pulling into master.

Revised the code to estimate batch size. @Ben-Zvi, can you take a quick 
look?




[jira] [Created] (DRILL-5252) A condition always returns true

2017-02-10 Thread JC (JIRA)
JC created DRILL-5252:
-

 Summary: A condition always returns true
 Key: DRILL-5252
 URL: https://issues.apache.org/jira/browse/DRILL-5252
 Project: Apache Drill
  Issue Type: Bug
Reporter: JC
Priority: Minor


I've found the following code smell in a recent GitHub snapshot.
Path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/EqualityVisitor.java

{code:java}
287 
288   @Override
289   public Boolean visitNullConstant(TypedNullConstant e, LogicalExpression 
value) throws RuntimeException {
290 if (!(value instanceof TypedNullConstant)) {
291   return false;
292 }
293 return e.getMajorType().equals(e.getMajorType());
294   }
295
{code}

Should it be like this?
{code:java}
292 }
293 return value.getMajorType().equals(e.getMajorType());
294   }
{code}





[GitHub] drill issue #739: DRILL-5230: Translation of millisecond duration into hours...

2017-02-10 Thread kkhatua
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/739
  
@paul-rogers Created a new SimpleDurationFormat class. We can expand it to 
support more formats, or reimplement it along the lines of SimpleDateFormat by 
passing in format strings in the future. Hope this helps.
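For reference, a millisecond-to-duration rendering of the kind DRILL-5230 calls for can be sketched as follows (an illustration only; this is not the SimpleDurationFormat class from the PR):

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: render a millisecond duration as h:mm:ss.SSS.
public class DurationFormatDemo {

    public static String format(long millis) {
        long hours = TimeUnit.MILLISECONDS.toHours(millis);
        long minutes = TimeUnit.MILLISECONDS.toMinutes(millis) % 60;
        long seconds = TimeUnit.MILLISECONDS.toSeconds(millis) % 60;
        long ms = millis % 1000;
        return String.format("%d:%02d:%02d.%03d", hours, minutes, seconds, ms);
    }

    public static void main(String[] args) {
        System.out.println(format(3723456L));  // prints 1:02:03.456
    }
}
```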




[GitHub] drill pull request #742: DRILL-5242: The UI breaks when rendering profiles h...

2017-02-10 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/742#discussion_r100612470
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/OperatorWrapper.java
 ---
@@ -163,11 +165,18 @@ public String getMetricsTable() {
   null);
 
   final Number[] values = new Number[metricNames.length];
+  //Track new/Unknown Metrics
+  final Set<Integer> unknownMetrics = new TreeSet<Integer>();
   for (final MetricValue metric : op.getMetricList()) {
-if (metric.hasLongValue()) {
-  values[metric.getMetricId()] = metric.getLongValue();
-} else if (metric.hasDoubleValue()) {
-  values[metric.getMetricId()] = metric.getDoubleValue();
+if (metric.getMetricId() < metricNames.length) {
+  if (metric.hasLongValue()) {
+values[metric.getMetricId()] = metric.getLongValue();
+  } else if (metric.hasDoubleValue()) {
+values[metric.getMetricId()] = metric.getDoubleValue();
+  }
+} else {
+  //Tracking unknown metric IDs
+  unknownMetrics.add(metric.getMetricId());
--- End diff --

@paul-rogers Can we bless this? :)




Time zone

2017-02-10 Thread Julian Hyde
Can someone please clarify the timezone behavior of Drill’s TIMESTAMP data 
type. According to the SQL standard, there is no timezone stored in a TIMESTAMP 
value, nor is there an implicit time zone (such as UTC or the server or 
session’s time zone).

Under the standard model, TIMESTAMP ‘2017-02-10 10:46:00’ does not represent 
‘2017-02-10 10:46:00 PST’ or ‘2017-02-10 10:46:00 UTC’ or ‘2017-02-10 18:46:00 
UTC’, nor does it represent a particular instant in time. It just represents a 
clock on the wall reading ‘2017-02-10 10:46:00’.

I was under the impression that Drill implements standard behavior, but 
https://drill.apache.org/docs/data-type-conversion/#time-zone-limitation and 
http://www.openkb.info/2015/05/understanding-drills-timestamp-and.html#.VUzhotpVhHw 
make me doubt.

Julian
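In java.time terms (an illustration, not Drill code), the standard model described above corresponds to LocalDateTime: a wall-clock reading that only becomes an instant once a zone or offset is attached.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// A SQL-standard TIMESTAMP is a wall-clock reading: the same reading
// names different instants depending on the zone you attach to it.
public class WallClockDemo {

    public static Instant asUtc(LocalDateTime wallClock) {
        return wallClock.toInstant(ZoneOffset.UTC);
    }

    public static Instant inZone(LocalDateTime wallClock, String zone) {
        return wallClock.atZone(ZoneId.of(zone)).toInstant();
    }

    public static void main(String[] args) {
        LocalDateTime wallClock = LocalDateTime.parse("2017-02-10T10:46:00");
        // Interpreted as UTC: the instant 2017-02-10 10:46:00 UTC.
        System.out.println(asUtc(wallClock));
        // Interpreted as Pacific time: the instant 2017-02-10 18:46:00 UTC.
        System.out.println(inZone(wallClock, "America/Los_Angeles"));
    }
}
```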



[jira] [Created] (DRILL-5251) Fields called 'Name' causing IndexOutOfBounds exception in Joins

2017-02-10 Thread Charles Givre (JIRA)
Charles Givre created DRILL-5251:


 Summary: Fields called 'Name' causing IndexOutOfBounds exception 
in Joins
 Key: DRILL-5251
 URL: https://issues.apache.org/jira/browse/DRILL-5251
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.9.0
 Environment: Mac OS 10.11
Reporter: Charles Givre


In working on a training class for Drill I discovered that if you have fields 
called 'Name' in your data, it will cause IndexOutOfBounds exceptions when you 
use that field in a join.  

This is the query I used:
```
SELECT data2016.`EmpName`, data2016.`JobTitle`, data2016.`AnnualSalary` AS 
salary_2016, data2015.`AnnualSalary` AS salary_2015
FROM dfs.drillclass.`baltimore_salaries_2016.csvh` AS data2016
INNER JOIN dfs.drillclass.`baltimore_salaries_2015.csvh` AS data2015
ON data2016.`EmpName` = data2015.`EmpName`
```
I renamed the Name field to EmpName and the query worked fine. If you want to 
try this out, the data is available here:
https://data.baltimorecity.gov/City-Governm…






[jira] [Created] (DRILL-5250) Date is stored wrongly in HIVE generated JSON

2017-02-10 Thread Ravan (JIRA)
Ravan created DRILL-5250:


 Summary: Date is stored wrongly in HIVE generated JSON
 Key: DRILL-5250
 URL: https://issues.apache.org/jira/browse/DRILL-5250
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - JSON
Affects Versions: 1.8.0
Reporter: Ravan


Query1: Connected to Oracle Database

select a1.empno,a1.HIREDATE as HIREDATE,a1.SAL as SAL from orcl.xx.EMP a1;

Output:-

EMPNO   HIREDATE                        SAL
7369.0  1980-12-17T00:00:00.000+05:30   800.0
7499.0  1981-02-20T00:00:00.000+05:30   1600.0
7521.0  1981-02-22T00:00:00.000+05:30   1250.0
7566.0  1981-04-02T00:00:00.000+05:30   2975.0

Query2: Creating a hive table with above query and storage format is JSON

create table intermediate.HIVE.test_16 as  select a1.empno as EMPNO,a1.HIREDATE 
as HIREDATE,a1.SAL as SAL from orcl.xx.EMP a1;

Query 3: 

SELECT * FROM intermediate.HIVE.test_16

Output:

EMPNO   HIREDATE                SAL
7369.0  1980-12-16 18:30:00.000 800.0
7499.0  1981-02-19 18:30:00.000 1600.0
7521.0  1981-02-21 18:30:00.000 1250.0
7566.0  1981-04-01 18:30:00.000 2975.0
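The 5:30 shift in the output above is consistent with the +05:30 session offset being folded into UTC when the value is written; a quick illustration (not Drill or Hive code):

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Midnight at offset +05:30 is 18:30 UTC on the previous day, matching the
// values seen in the Hive-generated JSON above.
public class OffsetShiftDemo {

    public static LocalDateTime toUtcWallClock(String timestamp) {
        return OffsetDateTime.parse(timestamp)
            .withOffsetSameInstant(ZoneOffset.UTC)
            .toLocalDateTime();
    }

    public static void main(String[] args) {
        System.out.println(toUtcWallClock("1980-12-17T00:00:00+05:30"));
        // prints 1980-12-16T18:30
    }
}
```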


