[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179526#comment-15179526 ] ASF GitHub Bot commented on DRILL-4281: --- Github user yufeldman commented on the pull request: https://github.com/apache/drill/pull/400#issuecomment-192171659 A couple of general comments: 1. Since you are using the Hadoop UGI, it probably makes sense to be more compliant with the Hadoop auth definitions, which are: a "superuser" can proxy for user(s), group(s) and host(s). Maybe adding a group that can proxy is OK, but it is not what is done in the Hadoop world today. - hadoop.proxyuser.superuser.hosts: comma-separated hosts from which superuser access is allowed for impersonation. * means wildcard. - hadoop.proxyuser.superuser.groups: comma-separated groups to which users impersonated by the superuser belong. * means wildcard. 2. I think what we call here delegate/delegator is true impersonation; what we call "chained impersonation" is kind of the opposite of impersonation, as it increases privileges rather than restricting them. > Drill should support inbound impersonation > -- > > Key: DRILL-4281 > URL: https://issues.apache.org/jira/browse/DRILL-4281 > Project: Apache Drill > Issue Type: Improvement >Reporter: Keys Botzum >Assignee: Sudheesh Katkam > Labels: doc-impacting, security > > Today Drill supports impersonation *to* external sources. For example, I can > authenticate to Drill as myself and then Drill will access HDFS using > impersonation. > In many scenarios we also need impersonation to Drill. For example, I might > use some front-end tool (such as Tableau) and authenticate to it as myself. > That tool (server version) then needs to access Drill to perform queries, and > I want those queries to run as myself, not as the Tableau user. While in > theory the intermediate tool could store the userid & password for every user > to Drill, this isn't a scalable or very secure solution. 
> Note that HS2 today does support inbound impersonation as described here: > https://issues.apache.org/jira/browse/HIVE-5155 > The above is not the best approach, as it is tied to the connection object, > which is very coarse-grained and potentially expensive. It would be better if > there were a call on the ODBC/JDBC driver to switch the identity on an existing > connection. Most modern SQL databases (Oracle, DB2) support such a function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
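The Hadoop proxyuser rules referenced in the comment above (hosts and groups lists, with * as a wildcard) can be sketched in plain Java. This is an illustrative evaluation of the rule semantics only, not Hadoop's actual implementation; the class name ProxyUserRules is hypothetical:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of Hadoop-style proxyuser rules:
 *  hadoop.proxyuser.<superuser>.hosts and .groups, where "*" is a wildcard. */
public class ProxyUserRules {
    private final Set<String> allowedHosts;
    private final Set<String> allowedGroups;

    public ProxyUserRules(String hostsCsv, String groupsCsv) {
        // Each property is a comma-separated list; trim surrounding whitespace.
        this.allowedHosts = new HashSet<>(Arrays.asList(hostsCsv.split("\\s*,\\s*")));
        this.allowedGroups = new HashSet<>(Arrays.asList(groupsCsv.split("\\s*,\\s*")));
    }

    /** True if a request from requestHost, impersonating a user whose groups
     *  are userGroups, is allowed under these rules. */
    public boolean isAllowed(String requestHost, Set<String> userGroups) {
        boolean hostOk = allowedHosts.contains("*") || allowedHosts.contains(requestHost);
        boolean groupOk = allowedGroups.contains("*")
                || userGroups.stream().anyMatch(allowedGroups::contains);
        return hostOk && groupOk;
    }

    public static void main(String[] args) {
        ProxyUserRules rules = new ProxyUserRules("host1,host2", "analysts");
        System.out.println(rules.isAllowed("host1", Set.of("analysts"))); // true
        System.out.println(rules.isAllowed("host3", Set.of("analysts"))); // false
    }
}
```

Both conditions must hold: the originating host and the impersonated user's group membership are checked independently.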
[jira] [Commented] (DRILL-4465) Refactor Parsing and Planning to canonicalize planning and parsing
[ https://issues.apache.org/jira/browse/DRILL-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179450#comment-15179450 ] ASF GitHub Bot commented on DRILL-4465: --- Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/401 > Refactor Parsing and Planning to canonicalize planning and parsing > -- > > Key: DRILL-4465 > URL: https://issues.apache.org/jira/browse/DRILL-4465 > Project: Apache Drill > Issue Type: Sub-task > Components: Query Planning & Optimization >Reporter: Jacques Nadeau >Assignee: Jinfeng Ni > Fix For: 1.6.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4467) Invalid projection created using PrelUtil.getColumns
[ https://issues.apache.org/jira/browse/DRILL-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacques Nadeau updated DRILL-4467: -- Fix Version/s: 1.6.0 > Invalid projection created using PrelUtil.getColumns > > > Key: DRILL-4467 > URL: https://issues.apache.org/jira/browse/DRILL-4467 > Project: Apache Drill > Issue Type: Bug >Reporter: Laurent Goujon >Assignee: Laurent Goujon >Priority: Critical > Fix For: 1.6.0 > > > In {{DrillPushProjIntoScan}}, a new scan and a new projection are created > using {{PrelUtil#getColumns(RelDataType, List<RexNode>)}}. > The returned {{ProjectPushInfo}} instance has several fields, one of them > {{desiredFields}}, which is the list of projected fields. There is one instance > per {{RexNode}}, but because the instances were initially added to a set, they > might not be in the order in which they were created. > The issue happens in the following code: > {code:java} > List<RexNode> newProjects = Lists.newArrayList(); > for (RexNode n : proj.getChildExps()) { > newProjects.add(n.accept(columnInfo.getInputRewriter())); > } > {code} > This code creates a new list of projects out of the initial ones by mapping > the indices from the old projects to the new projects, but the indices of the > new RexNode instances might be out of order (because of the ordering of > desiredFields). And if the indices are out of order, the check > {{ProjectRemoveRule.isTrivial(newProj)}} will fail. > My guess is that the desiredFields ordering should be preserved when instances > are added, to satisfy the condition above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4467) Invalid projection created using PrelUtil.getColumns
[ https://issues.apache.org/jira/browse/DRILL-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179446#comment-15179446 ] Jacques Nadeau commented on DRILL-4467: --- Yes, agree that desiredFields should be a newLinkedHashSet. > Invalid projection created using PrelUtil.getColumns > > > Key: DRILL-4467 > URL: https://issues.apache.org/jira/browse/DRILL-4467 > Project: Apache Drill > Issue Type: Bug >Reporter: Laurent Goujon >Assignee: Laurent Goujon >Priority: Critical > Fix For: 1.6.0 > > > In {{DrillPushProjIntoScan}}, a new scan and a new projection are created > using {{PrelUtil#getColumns(RelDataType, List<RexNode>)}}. > The returned {{ProjectPushInfo}} instance has several fields, one of them > {{desiredFields}}, which is the list of projected fields. There is one instance > per {{RexNode}}, but because the instances were initially added to a set, they > might not be in the order in which they were created. > The issue happens in the following code: > {code:java} > List<RexNode> newProjects = Lists.newArrayList(); > for (RexNode n : proj.getChildExps()) { > newProjects.add(n.accept(columnInfo.getInputRewriter())); > } > {code} > This code creates a new list of projects out of the initial ones by mapping > the indices from the old projects to the new projects, but the indices of the > new RexNode instances might be out of order (because of the ordering of > desiredFields). And if the indices are out of order, the check > {{ProjectRemoveRule.isTrivial(newProj)}} will fail. > My guess is that the desiredFields ordering should be preserved when instances > are added, to satisfy the condition above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
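The fix agreed on above, keeping desiredFields in a LinkedHashSet so that insertion order is preserved, comes down to a standard JDK collection property, illustrated here with plain JDK classes (not Drill's own types):

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class InsertionOrderDemo {
    public static void main(String[] args) {
        // HashSet makes no ordering guarantee: iteration order depends on hash codes.
        Set<String> hashed = new HashSet<>();
        // LinkedHashSet iterates in insertion order, which is what a field list
        // built while visiting expressions needs to keep indices stable.
        Set<String> linked = new LinkedHashSet<>();
        for (String f : List.of("c", "a", "b")) {
            hashed.add(f);
            linked.add(f);
        }
        System.out.println(linked); // [c, a, b] -- insertion order preserved
        System.out.println(hashed); // some hash-dependent order
    }
}
```

Both sets still deduplicate; only the iteration order differs, which is exactly what the index-mapping code above depends on.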
[jira] [Updated] (DRILL-4467) Invalid projection created using PrelUtil.getColumns
[ https://issues.apache.org/jira/browse/DRILL-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacques Nadeau updated DRILL-4467: -- Priority: Critical (was: Major) > Invalid projection created using PrelUtil.getColumns > > > Key: DRILL-4467 > URL: https://issues.apache.org/jira/browse/DRILL-4467 > Project: Apache Drill > Issue Type: Bug >Reporter: Laurent Goujon >Assignee: Laurent Goujon >Priority: Critical > > In {{DrillPushProjIntoScan}}, a new scan and a new projection are created > using {{PrelUtil#getColumns(RelDataType, List<RexNode>)}}. > The returned {{ProjectPushInfo}} instance has several fields, one of them > {{desiredFields}}, which is the list of projected fields. There is one instance > per {{RexNode}}, but because the instances were initially added to a set, they > might not be in the order in which they were created. > The issue happens in the following code: > {code:java} > List<RexNode> newProjects = Lists.newArrayList(); > for (RexNode n : proj.getChildExps()) { > newProjects.add(n.accept(columnInfo.getInputRewriter())); > } > {code} > This code creates a new list of projects out of the initial ones by mapping > the indices from the old projects to the new projects, but the indices of the > new RexNode instances might be out of order (because of the ordering of > desiredFields). And if the indices are out of order, the check > {{ProjectRemoveRule.isTrivial(newProj)}} will fail. > My guess is that the desiredFields ordering should be preserved when instances > are added, to satisfy the condition above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4437) Implement framework for testing operators in isolation
[ https://issues.apache.org/jira/browse/DRILL-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179369#comment-15179369 ] ASF GitHub Bot commented on DRILL-4437: --- Github user parthchandra commented on the pull request: https://github.com/apache/drill/pull/394#issuecomment-192114749 +1. Great to have this framework; nicely done. > Implement framework for testing operators in isolation > -- > > Key: DRILL-4437 > URL: https://issues.apache.org/jira/browse/DRILL-4437 > Project: Apache Drill > Issue Type: Test > Components: Tools, Build & Test >Reporter: Jason Altekruse >Assignee: Jason Altekruse > Fix For: 1.6.0 > > > Most of the tests written for Drill are end-to-end. We spin up a full > instance of the server, submit one or more SQL queries, and check the results. > While integration tests like this are useful for ensuring that features > don't break end-user functionality, overuse of this approach > has caused a number of pain points. > Overall, the tests end up running a lot of the exact same code, parsing and > planning many similar queries. > Creating consistent reproductions of issues, especially edge cases found in > clustered environments, can be extremely difficult. Even the simpler case of > testing whether operators can handle a particular series of > incoming batches of records has required hacks like generating files large > enough that the scanners happen to break them up into separate batches. > These tests are brittle, as they make assumptions about how the scanners will > work in the future. As an example of when this could break, we might do a perf > evaluation and find that we should be producing larger batches in some cases. > Existing tests that try to test multiple batches by producing a few > more records than the current threshold for batch size would no longer be > testing the same code paths. 
> We need to make more parts of the system testable without initializing the > entire Drill server, as well as making the different internal settings and > state of the server configurable for tests. > This is a first effort to enable testing the physical operators in Drill by > mocking the components of the system necessary to enable operators to > initialize and execute. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179333#comment-15179333 ] ASF GitHub Bot commented on DRILL-4281: --- Github user jacques-n commented on a diff in the pull request: https://github.com/apache/drill/pull/400#discussion_r54990639 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/util/UserDelegationUtil.java --- @@ -0,0 +1,147 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.util; + +import com.fasterxml.jackson.core.JsonGenerator; +import com.fasterxml.jackson.core.JsonParser; +import com.fasterxml.jackson.core.type.TypeReference; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.collect.Sets; +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.exec.server.options.OptionValue; +import org.apache.drill.exec.server.options.TypeValidators; +import org.apache.hadoop.security.UserGroupInformation; + +import java.io.IOException; +import java.util.List; +import java.util.Set; + +/** + * Utilities for user delegation purpose. 
+ */ +public class UserDelegationUtil { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(UserDelegationUtil.class); + + private static final String STAR = "*"; + + private static final ObjectMapper delegationDefinitionsMapper = new ObjectMapper(); + + static { + delegationDefinitionsMapper.configure(JsonGenerator.Feature.QUOTE_FIELD_NAMES, false); + delegationDefinitionsMapper.configure(JsonParser.Feature.ALLOW_UNQUOTED_FIELD_NAMES, true); + } + + private static class DelegationDefinition { +public UserGroupDefinition delegates = new UserGroupDefinition(); +public UserGroupDefinition delegators = new UserGroupDefinition(); + } + + private static class UserGroupDefinition { +public Set<String> users = Sets.newHashSet(); +public Set<String> groups = Sets.newHashSet(); + } + + /** + * Deserialize delegation definitions string to a list of delegation definition objects. + * + * @param delegationDefinitions delegation definitions as a string + * @return delegation definitions as a list of objects + * @throws IOException + */ + public static List<DelegationDefinition> deserializeDelegationDefinitions(final String delegationDefinitions) + throws IOException { +return delegationDefinitionsMapper.readValue(delegationDefinitions, +new TypeReference<List<DelegationDefinition>>() {}); + } + + /** + * Validator for delegation definitions. 
+ */ + public static class DelegationDefinitionsValidator extends TypeValidators.AdminOptionValidator { + +public DelegationDefinitionsValidator(String name, String def) { + super(name, def); +} + +@Override +public void validate(OptionValue v) { + super.validate(v); + + final List definitions; + try { +definitions = deserializeDelegationDefinitions(v.string_val); + } catch (final IOException e) { +throw UserException.validationError() +.message("Invalid delegation definition.\nDetails: %s", e.getMessage()) +.build(logger); + } + + for (final DelegationDefinition definition : definitions) { +if (definition.delegates.users.contains(STAR) || +definition.delegates.groups.contains(STAR)) { + throw UserException.validationError() + .message("No wildcard delegates allowed.") + .build(logger); +} + } +} + } + + /** + * Check if the given delegate is authorized to delegate for the delegator based on the delegation definitions. + * + * @param delegateName delegate name + * @param delegatorName delegator
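The javadoc quoted above is cut off before the body of the authorization check. A minimal sketch of such a check, with hypothetical names and under the assumption that each definition lists delegate users plus delegator users and groups (matching the DelegationDefinition fields in the diff), might look like this:

```java
import java.util.Set;

/** Hypothetical sketch of the delegate/delegator authorization check
 *  described in the truncated javadoc above. A delegate may act as a
 *  delegator if the definition names the delegate, and the delegator's
 *  name or one of the delegator's groups (or "*") is listed. */
public class DelegationCheck {
    public static boolean canDelegate(String delegateName,
                                      String delegatorName,
                                      Set<String> delegatorGroups,
                                      Set<String> definedDelegates,
                                      Set<String> definedDelegators,
                                      Set<String> definedDelegatorGroups) {
        // Wildcard delegates are rejected at validation time (see the
        // validator above), so only an exact delegate match is checked here.
        if (!definedDelegates.contains(delegateName)) {
            return false;
        }
        if (definedDelegators.contains("*") || definedDelegatorGroups.contains("*")) {
            return true;
        }
        if (definedDelegators.contains(delegatorName)) {
            return true;
        }
        for (String group : delegatorGroups) {
            if (definedDelegatorGroups.contains(group)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(canDelegate("svc", "alice", Set.of("analysts"),
                Set.of("svc"), Set.of("alice"), Set.of())); // true
    }
}
```

Note the asymmetry the review comments discuss: the wildcard is allowed on the delegator side but forbidden on the delegate side, since a wildcard delegate would let any user impersonate anyone.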
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179261#comment-15179261 ] ASF GitHub Bot commented on DRILL-4281: --- Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/400#discussion_r54987453 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/util/UserDelegationUtil.java --- @@ -0,0 +1,147 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.util; + +import com.fasterxml.jackson.core.JsonGenerator; +import com.fasterxml.jackson.core.JsonParser; +import com.fasterxml.jackson.core.type.TypeReference; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.collect.Sets; +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.exec.server.options.OptionValue; +import org.apache.drill.exec.server.options.TypeValidators; +import org.apache.hadoop.security.UserGroupInformation; + +import java.io.IOException; +import java.util.List; +import java.util.Set; + +/** + * Utilities for user delegation purpose. 
+ */ +public class UserDelegationUtil { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(UserDelegationUtil.class); + + private static final String STAR = "*"; + + private static final ObjectMapper delegationDefinitionsMapper = new ObjectMapper(); + + static { + delegationDefinitionsMapper.configure(JsonGenerator.Feature.QUOTE_FIELD_NAMES, false); + delegationDefinitionsMapper.configure(JsonParser.Feature.ALLOW_UNQUOTED_FIELD_NAMES, true); + } + + private static class DelegationDefinition { +public UserGroupDefinition delegates = new UserGroupDefinition(); +public UserGroupDefinition delegators = new UserGroupDefinition(); + } + + private static class UserGroupDefinition { +public Set<String> users = Sets.newHashSet(); +public Set<String> groups = Sets.newHashSet(); + } + + /** + * Deserialize delegation definitions string to a list of delegation definition objects. + * + * @param delegationDefinitions delegation definitions as a string + * @return delegation definitions as a list of objects + * @throws IOException + */ + public static List<DelegationDefinition> deserializeDelegationDefinitions(final String delegationDefinitions) + throws IOException { +return delegationDefinitionsMapper.readValue(delegationDefinitions, +new TypeReference<List<DelegationDefinition>>() {}); + } + + /** + * Validator for delegation definitions. 
+ */ + public static class DelegationDefinitionsValidator extends TypeValidators.AdminOptionValidator { + +public DelegationDefinitionsValidator(String name, String def) { + super(name, def); +} + +@Override +public void validate(OptionValue v) { + super.validate(v); + + final List definitions; + try { +definitions = deserializeDelegationDefinitions(v.string_val); + } catch (final IOException e) { +throw UserException.validationError() +.message("Invalid delegation definition.\nDetails: %s", e.getMessage()) +.build(logger); + } + + for (final DelegationDefinition definition : definitions) { +if (definition.delegates.users.contains(STAR) || +definition.delegates.groups.contains(STAR)) { + throw UserException.validationError() + .message("No wildcard delegates allowed.") + .build(logger); +} + } +} + } + + /** + * Check if the given delegate is authorized to delegate for the delegator based on the delegation definitions. + * + * @param delegateName delegate name + * @param delegatorName
[jira] [Commented] (DRILL-4416) Quote path separator for windows
[ https://issues.apache.org/jira/browse/DRILL-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179210#comment-15179210 ] ASF GitHub Bot commented on DRILL-4416: --- Github user hnfgns commented on the pull request: https://github.com/apache/drill/pull/385#issuecomment-192074959 This patch causes a random leak. I am backing it off for a while. > Quote path separator for windows > > > Key: DRILL-4416 > URL: https://issues.apache.org/jira/browse/DRILL-4416 > Project: Apache Drill > Issue Type: Bug >Reporter: Hanifi Gunes >Assignee: Hanifi Gunes > Fix For: 1.7.0 > > > Windows uses backslash as its path separator. We need to do string > manipulation using the separator, during which the separator must be quoted. > This issue proposes (i) creating a global static path separator variable in > common, (ii) removing all others, and (iii) using the quoted separator where > needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4416) Quote path separator for windows
[ https://issues.apache.org/jira/browse/DRILL-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanifi Gunes updated DRILL-4416: Fix Version/s: 1.7.0 > Quote path separator for windows > > > Key: DRILL-4416 > URL: https://issues.apache.org/jira/browse/DRILL-4416 > Project: Apache Drill > Issue Type: Bug >Reporter: Hanifi Gunes >Assignee: Hanifi Gunes > Fix For: 1.7.0 > > > Windows uses backslash as its path separator. We need to do string > manipulation using the separator, during which the separator must be quoted. > This issue proposes (i) creating a global static path separator variable in > common, (ii) removing all others, and (iii) using the quoted separator where > needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
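Quoting the separator matters because String.split takes a regular expression, and a lone backslash is an invalid regex. The JDK's Pattern.quote shows the behavior the issue describes; this is an illustrative sketch of the problem, not Drill's patch:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuotedSeparatorDemo {
    public static void main(String[] args) {
        String windowsPath = "C:\\data\\drill\\file.parquet";
        String separator = "\\"; // what File.separator is on Windows

        // Passed unquoted to split(), "\\" is a dangling regex escape and
        // throws PatternSyntaxException. Pattern.quote makes it a literal.
        String[] parts = windowsPath.split(Pattern.quote(separator));
        System.out.println(Arrays.toString(parts));
        // [C:, data, drill, file.parquet]
    }
}
```

On Unix the separator "/" happens to be regex-safe, which is why this class of bug only shows up on Windows.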
[jira] [Commented] (DRILL-4325) ForemanException: One or more nodes lost connectivity during query
[ https://issues.apache.org/jira/browse/DRILL-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179143#comment-15179143 ] Jacques Nadeau commented on DRILL-4325: --- [~vicky], would you be willing to run your same oversaturation test and see if our stability is better with this simple patch? https://github.com/jacques-n/drill/tree/DRILL-4466b I'd like to see if we can help the kernel scheduler enough that it runs work at a larger quantum. This won't solve the gross over-parallelization issue directly, but it may help the system context switch less. In reality, a change in scheduling won't actually impact the core problem of too many simultaneous tasks. No matter the threading model, having 4000 tasks competing for ~40 logical cores is going to mean slow progress. Clearly we need to increase the switch quantum in these cases so we make forward progress (hopefully helped by my small patch). However, if we target a quantum of 100ms, that means tasks would wait 10s between each 100ms of work. In other words, we can't schedule this many tasks and expect speedy forward progress. We need to enable inbound controls as well as ensure that we reduce the parallelization behavior on a heavily loaded node. > ForemanException: One or more nodes lost connectivity during query > -- > > Key: DRILL-4325 > URL: https://issues.apache.org/jira/browse/DRILL-4325 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.5.0 >Reporter: Victoria Markman > Attachments: drillbit.log.133, drillbit.log.134, drillbit.log.135, > drillbit.log.136, stats.133.tar, stats.134.tar, stats.135.tar, stats.136.tar, > zookeeper.log > > > The picture pretty much looks like this: a bunch of queries are running > (usually something more involved than just simple functional tests), usually > tpch or tpcds with lots of major fragments, like query74 from tpcds. 
> Zookeeper decides that a particular node is dead, and queries that were running > at the time of the connection loss are failed by Drill (which is correct > behavior, I think). > It seems that I can reliably reproduce this issue when I bump up the number of > concurrently running queries and make all of them go to the same foreman node > (I don't really imply here that planning is to blame, it just seems to > reproduce more easily). > On my 4-node cluster I can pretty much reproduce this problem reliably by > running: > run.sh -s Advanced/tpcds/tpcds_sf100/original -g smoke -t 600 -n 10 > {code} > 2016-01-28 16:30:20,146 [29554d63-b478-6bae-f0f6-435d9f33ffdf:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 29554d63-b478-6bae-f0f6-435d9f33ffdf: select * from sys.version > 2016-01-28 16:30:22,844 [29554d61-2789-babb-54e5-22b701bf2f64:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 29554d61-2789-babb-54e5-22b701bf2f64: select * from sys.drillbits > 2016-01-28 16:30:23,281 [29554d60-5bbd-dae1-c38d-21708ad37fbe:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 29554d60-5bbd-dae1-c38d-21708ad37fbe: alter system set > `planner.enable_decimal_data_type` = true > 2016-01-28 16:30:24,889 [29554d5e-d243-6299-3103-58b180135854:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 29554d5e-d243-6299-3103-58b180135854: use `dfs.tpcds_sf100_parquet_views` > 2016-01-28 16:30:24,931 [29554d5e-b395-14aa-42a4-f6f248059363:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 29554d5e-b395-14aa-42a4-f6f248059363: use `dfs.tpcds_sf100_parquet_views` > 2016-01-28 16:30:24,964 [29554d5f-24ac-cf00-714c-7419d3894af0:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 29554d5f-24ac-cf00-714c-7419d3894af0: use `dfs.tpcds_sf100_parquet_views` > 2016-01-28 16:30:24,998 [29554d5e-ae92-6306-3495-be5cb7f98139:foreman] INFO > 
o.a.drill.exec.work.foreman.Foreman - Query text for query id > 29554d5e-ae92-6306-3495-be5cb7f98139: use `dfs.tpcds_sf100_parquet_views` > 2016-01-28 16:30:25,040 [29554d5e-1a20-3d6d-143b-0ee3bcd4aa11:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 29554d5e-1a20-3d6d-143b-0ee3bcd4aa11: use `dfs.tpcds_sf100_parquet_views` > 2016-01-28 16:30:25,073 [29554d5d-e7b4-c61c-9735-ce37938aa47d:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 29554d5d-e7b4-c61c-9735-ce37938aa47d: use `dfs.tpcds_sf100_parquet_views` > 2016-01-28 16:30:25,106 [29554d5d-823b-0536-e4df-4c6cef64b3e4:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 29554d5d-823b-0536-e4df-4c6cef64b3e4: use `dfs.tpcds_sf100_parquet_views` > 2016-01-28 16:30:25,131
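The back-of-the-envelope numbers in the comment above (4000 tasks competing for ~40 logical cores at a 100 ms quantum) check out arithmetically; this simply restates the comment's figures, it is not a measurement:

```java
public class QuantumMath {
    public static void main(String[] args) {
        int tasks = 4000;
        int cores = 40;
        double quantumMs = 100.0;

        // Under round-robin scheduling, each core cycles through tasks/cores tasks.
        int tasksPerCore = tasks / cores;                 // 100 tasks share each core
        double waitMs = (tasksPerCore - 1) * quantumMs;   // wait per 100 ms slice of work

        System.out.println(tasksPerCore);   // 100
        System.out.println(waitMs / 1000);  // 9.9 -- roughly the 10 s quoted above
    }
}
```

Hence the conclusion in the comment: a larger quantum helps forward progress per slice, but only admission control and reduced parallelization fix the wait itself.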
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179140#comment-15179140 ] ASF GitHub Bot commented on DRILL-4281: --- Github user vkorukanti commented on a diff in the pull request: https://github.com/apache/drill/pull/400#discussion_r54980440 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/util/UserDelegationUtil.java --- @@ -0,0 +1,147 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.util; + +import com.fasterxml.jackson.core.JsonGenerator; +import com.fasterxml.jackson.core.JsonParser; +import com.fasterxml.jackson.core.type.TypeReference; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.collect.Sets; +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.exec.server.options.OptionValue; +import org.apache.drill.exec.server.options.TypeValidators; +import org.apache.hadoop.security.UserGroupInformation; + +import java.io.IOException; +import java.util.List; +import java.util.Set; + +/** + * Utilities for user delegation purpose. 
+ */ +public class UserDelegationUtil { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(UserDelegationUtil.class); + + private static final String STAR = "*"; + + private static final ObjectMapper delegationDefinitionsMapper = new ObjectMapper(); + + static { + delegationDefinitionsMapper.configure(JsonGenerator.Feature.QUOTE_FIELD_NAMES, false); + delegationDefinitionsMapper.configure(JsonParser.Feature.ALLOW_UNQUOTED_FIELD_NAMES, true); + } + + private static class DelegationDefinition { +public UserGroupDefinition delegates = new UserGroupDefinition(); +public UserGroupDefinition delegators = new UserGroupDefinition(); + } + + private static class UserGroupDefinition { +public Set<String> users = Sets.newHashSet(); +public Set<String> groups = Sets.newHashSet(); + } + + /** + * Deserialize delegation definitions string to a list of delegation definition objects. + * + * @param delegationDefinitions delegation definitions as a string + * @return delegation definitions as a list of objects + * @throws IOException + */ + public static List<DelegationDefinition> deserializeDelegationDefinitions(final String delegationDefinitions) + throws IOException { +return delegationDefinitionsMapper.readValue(delegationDefinitions, +new TypeReference<List<DelegationDefinition>>() {}); + } + + /** + * Validator for delegation definitions. 
+ */ + public static class DelegationDefinitionsValidator extends TypeValidators.AdminOptionValidator { + +public DelegationDefinitionsValidator(String name, String def) { + super(name, def); +} + +@Override +public void validate(OptionValue v) { + super.validate(v); + + final List definitions; + try { +definitions = deserializeDelegationDefinitions(v.string_val); + } catch (final IOException e) { +throw UserException.validationError() +.message("Invalid delegation definition.\nDetails: %s", e.getMessage()) +.build(logger); + } + + for (final DelegationDefinition definition : definitions) { +if (definition.delegates.users.contains(STAR) || +definition.delegates.groups.contains(STAR)) { + throw UserException.validationError() + .message("No wildcard delegates allowed.") + .build(logger); +} + } +} + } + + /** + * Check if the given delegate is authorized to delegate for the delegator based on the delegation definitions. + * + * @param delegateName delegate name + * @param delegatorName
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179126#comment-15179126 ] ASF GitHub Bot commented on DRILL-4281: --- Github user vkorukanti commented on a diff in the pull request: https://github.com/apache/drill/pull/400#discussion_r54979630 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/rpc/user/security/testing/UserAuthenticatorToTestDelegation.java --- @@ -0,0 +1,72 @@ +package org.apache.drill.exec.rpc.user.security.testing; +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +import org.apache.drill.common.config.DrillConfig; +import org.apache.drill.exec.exception.DrillbitStartupException; +import org.apache.drill.exec.rpc.user.security.UserAuthenticationException; +import org.apache.drill.exec.rpc.user.security.UserAuthenticator; +import org.apache.drill.exec.rpc.user.security.UserAuthenticatorTemplate; +import org.apache.drill.exec.util.ImpersonationUtil; + +import java.io.IOException; + +import static org.apache.drill.exec.delegation.TestUserDelegation.OWNER; +import static org.apache.drill.exec.delegation.TestUserDelegation.OWNER_PASSWORD; +import static org.apache.drill.exec.delegation.TestUserDelegation.DELEGATOR_NAME; +import static org.apache.drill.exec.delegation.TestUserDelegation.DELEGATOR_PASSWORD; +import static org.apache.drill.exec.delegation.TestUserDelegation.DELEGATE_NAME; +import static org.apache.drill.exec.delegation.TestUserDelegation.DELEGATE_PASSWORD; + +/** + * Used by {@link org.apache.drill.exec.delegation.TestUserDelegation}. + * + * Needs to be in this package. + */ +@UserAuthenticatorTemplate(type = UserAuthenticatorToTestDelegation.TYPE) +public class UserAuthenticatorToTestDelegation implements UserAuthenticator { --- End diff -- Can you add the new users to the existing test authenticator impl, UserAuthenticatorTestImpl.class? > Drill should support inbound impersonation > -- > > Key: DRILL-4281 > URL: https://issues.apache.org/jira/browse/DRILL-4281 > Project: Apache Drill > Issue Type: Improvement >Reporter: Keys Botzum >Assignee: Sudheesh Katkam > Labels: doc-impacting, security > > Today Drill supports impersonation *to* external sources. For example, I can > authenticate to Drill as myself and then Drill will access HDFS using > impersonation. > In many scenarios we also need impersonation to Drill. For example, I might > use some front-end tool (such as Tableau) and authenticate to it as myself. 
> That tool (server version) then needs to access Drill to perform queries and > I want those queries to run as myself, not as the Tableau user. While in > theory the intermediate tool could store the userid & password for every user > to Drill, this isn't a scalable or very secure solution. > Note that HS2 today does support inbound impersonation as described here: > https://issues.apache.org/jira/browse/HIVE-5155 > The above is not the best approach as it is tied to the connection object, > which is very coarse grained and potentially expensive. It would be better if > there was a call on the ODBC/JDBC driver to switch the identity on an existing > connection. Most modern SQL databases (Oracle, DB2) support such a function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179096#comment-15179096 ] ASF GitHub Bot commented on DRILL-4281: --- Github user jacques-n commented on the pull request: https://github.com/apache/drill/pull/400#issuecomment-192046477 Generally looks good. +1 with the few small items above addressed. Updating the names to something else would be good. Since this is also impersonation (just client impersonation instead of storage plugin impersonation) I'm not sure I would shy away from using the term. The main goal for me is clear directionality. I think "principals" works well for the first piece. Ideas for the second: "can_execute_as", "can_impersonate", "can_act_as", ?
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179091#comment-15179091 ] ASF GitHub Bot commented on DRILL-4281: --- Github user jacques-n commented on a diff in the pull request: https://github.com/apache/drill/pull/400#discussion_r54977484 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/delegation/TestUserDelegation.java --- @@ -0,0 +1,124 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.delegation; + +import com.google.common.collect.Maps; +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.exec.dotdrill.DotDrillType; +import org.apache.drill.exec.impersonation.BaseTestImpersonation; +import org.apache.drill.exec.rpc.user.UserSession; +import org.apache.drill.exec.rpc.user.security.testing.UserAuthenticatorToTestDelegation; +import org.apache.drill.exec.store.dfs.WorkspaceConfig; +import org.apache.hadoop.fs.FileStatus; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.fs.permission.FsPermission; +import org.junit.BeforeClass; +import org.junit.Test; + +import java.util.Map; +import java.util.Properties; + +import static org.junit.Assert.assertEquals; + +public class TestUserDelegation extends BaseTestImpersonation { --- End diff -- Can you also add some negative tests that confirm nice error messages? (User tries to delegate to disallowed user, group, etc)
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179086#comment-15179086 ] ASF GitHub Bot commented on DRILL-4281: --- Github user jacques-n commented on a diff in the pull request: https://github.com/apache/drill/pull/400#discussion_r54977306 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/delegation/TestDelegationPrivileges.java --- @@ -0,0 +1,137 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.delegation; + +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.exec.impersonation.BaseTestImpersonation; +import org.apache.drill.exec.server.options.OptionValue; +import org.apache.drill.exec.util.UserDelegationUtil; +import org.junit.Test; + +import static junit.framework.Assert.assertEquals; + +public class TestDelegationPrivileges extends BaseTestImpersonation { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestDelegationPrivileges.class); + + // definitions on which the tests are based + private static final String DELEGATION_DEFINITIONS = "[" + + "{ delegates : { users : [\"user0_1\"] }," + --- End diff -- Might be nice to put this in a file so we can have people refer to an example set of settings in the codebase (without having to filter out Java escaping).
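The reviewer's suggestion above is to keep the delegation definitions in a standalone file rather than an escaped Java string. A hypothetical definitions file might look like the following, based on the shape visible in the quoted test constant (the user and group names here are invented for illustration; note the quoted utility configures its ObjectMapper with ALLOW_UNQUOTED_FIELD_NAMES, so unquoted keys would also parse):

```json
[
  {
    "delegates":  { "users": ["app_server"], "groups": [] },
    "delegators": { "users": ["alice"], "groups": ["analysts"] }
  }
]
```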
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179083#comment-15179083 ] ASF GitHub Bot commented on DRILL-4281: --- Github user jacques-n commented on a diff in the pull request: https://github.com/apache/drill/pull/400#discussion_r54977178 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/util/UserDelegationUtil.java --- @@ -0,0 +1,147 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.util; + +import com.fasterxml.jackson.core.JsonGenerator; +import com.fasterxml.jackson.core.JsonParser; +import com.fasterxml.jackson.core.type.TypeReference; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.collect.Sets; +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.exec.server.options.OptionValue; +import org.apache.drill.exec.server.options.TypeValidators; +import org.apache.hadoop.security.UserGroupInformation; + +import java.io.IOException; +import java.util.List; +import java.util.Set; + +/** + * Utilities for user delegation purpose. 
+ */
+public class UserDelegationUtil {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(UserDelegationUtil.class);
+
+  private static final String STAR = "*";
+
+  private static final ObjectMapper delegationDefinitionsMapper = new ObjectMapper();
+
+  static {
+    delegationDefinitionsMapper.configure(JsonGenerator.Feature.QUOTE_FIELD_NAMES, false);
+    delegationDefinitionsMapper.configure(JsonParser.Feature.ALLOW_UNQUOTED_FIELD_NAMES, true);
+  }
+
+  private static class DelegationDefinition {
+    public UserGroupDefinition delegates = new UserGroupDefinition();
+    public UserGroupDefinition delegators = new UserGroupDefinition();
+  }
+
+  private static class UserGroupDefinition {
+    public Set<String> users = Sets.newHashSet();
+    public Set<String> groups = Sets.newHashSet();
+  }
+
+  /**
+   * Deserialize delegation definitions string to a list of delegation definition objects.
+   *
+   * @param delegationDefinitions delegation definitions as a string
+   * @return delegation definitions as a list of objects
+   * @throws IOException
+   */
+  public static List<DelegationDefinition> deserializeDelegationDefinitions(final String delegationDefinitions)
+      throws IOException {
+    return delegationDefinitionsMapper.readValue(delegationDefinitions,
+        new TypeReference<List<DelegationDefinition>>() {});
+  }
+
+  /**
+   * Validator for delegation definitions.
+   */
+  public static class DelegationDefinitionsValidator extends TypeValidators.AdminOptionValidator {
+
+    public DelegationDefinitionsValidator(String name, String def) {
+      super(name, def);
+    }
+
+    @Override
+    public void validate(OptionValue v) {
+      super.validate(v);
+
+      final List<DelegationDefinition> definitions;
+      try {
+        definitions = deserializeDelegationDefinitions(v.string_val);
+      } catch (final IOException e) {
+        throw UserException.validationError()
+            .message("Invalid delegation definition.\nDetails: %s", e.getMessage())
+            .build(logger);
+      }
+
+      for (final DelegationDefinition definition : definitions) {
+        if (definition.delegates.users.contains(STAR) ||
+            definition.delegates.groups.contains(STAR)) {
+          throw UserException.validationError()
+              .message("No wildcard delegates allowed.")
+              .build(logger);
+        }
+      }
+    }
+  }
+
+  /**
+   * Check if the given delegate is authorized to delegate for the delegator based on the delegation definitions.
+   *
+   * @param delegateName delegate name
+   * @param delegatorName delegator
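The quoted diff is cut off before the body of that authorization check, so its exact logic is not shown here. The following dependency-free sketch assumes a plausible shape based on the javadoc and the wildcard handling above; the class name, field names, and user/group values are invented for illustration:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class DelegationCheckSketch {

  private static final String STAR = "*";

  // Hypothetical mirror of one definition's "delegators" block: the users and
  // groups the delegate may act on behalf of (names invented for illustration).
  private static final Set<String> ALLOWED_USERS = new HashSet<>(Arrays.asList("alice"));
  private static final Set<String> ALLOWED_GROUPS = new HashSet<>(Arrays.asList("analysts"));

  // Assumed shape of the truncated check: a delegator is accepted if listed by
  // name, if one of its groups is listed, or if a wildcard is configured.
  public static boolean isAuthorized(final String delegatorName, final Set<String> delegatorGroups) {
    if (ALLOWED_USERS.contains(STAR) || ALLOWED_USERS.contains(delegatorName)) {
      return true;
    }
    for (final String group : delegatorGroups) {
      if (ALLOWED_GROUPS.contains(STAR) || ALLOWED_GROUPS.contains(group)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(isAuthorized("alice", new HashSet<String>()));                   // true: listed user
    System.out.println(isAuthorized("bob", new HashSet<>(Arrays.asList("analysts")))); // true: listed group
    System.out.println(isAuthorized("eve", new HashSet<String>()));                    // false: not authorized
  }
}
```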
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179082#comment-15179082 ] ASF GitHub Bot commented on DRILL-4281: --- Github user jacques-n commented on a diff in the pull request: https://github.com/apache/drill/pull/400#discussion_r54977126 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/util/UserDelegationUtil.java ---
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179077#comment-15179077 ] ASF GitHub Bot commented on DRILL-4281: --- Github user jacques-n commented on a diff in the pull request: https://github.com/apache/drill/pull/400#discussion_r54976899 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserSession.java --- @@ -116,14 +137,38 @@ public OptionManager getOptions() { return sessionOptions; } - public DrillUser getUser() { -return user; - } - public UserCredentials getCredentials() { return credentials; } + /** + * Replace current user credentials with the given user's credentials, if authorized. + * + * @param delegatorName delegator name + * @throws DrillRuntimeException if credentials cannot be replaced + */ + public void replaceUserCredentials(final String delegatorName) { +assert enableDelegation; --- End diff -- No need for assert, preconditions makes sure we get Exception instead of typically uncaptured Error subclass and this isn't perf sensitive.
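The reviewer's point is that a precondition check throws an Exception unconditionally, while a bare `assert` is skipped unless the JVM runs with `-ea` and raises an Error subclass when it does fire. A sketch of the difference follows; a hand-rolled `checkState` stands in for `com.google.common.base.Preconditions` to keep the example self-contained, and the method body beyond the check is assumed:

```java
public class PreconditionsSketch {

  // Minimal stand-in for Guava's Preconditions.checkState, so the sketch has
  // no dependencies; Drill itself would use com.google.common.base.Preconditions.
  public static void checkState(final boolean expression, final String message) {
    if (!expression) {
      throw new IllegalStateException(message);
    }
  }

  private final boolean enableDelegation;

  public PreconditionsSketch(final boolean enableDelegation) {
    this.enableDelegation = enableDelegation;
  }

  // Hypothetical shape of replaceUserCredentials: unlike `assert`, the check
  // fires even when the JVM runs without -ea, and raises an Exception rather
  // than an uncaught Error subclass.
  public void replaceUserCredentials(final String delegatorName) {
    checkState(enableDelegation, "User delegation is not enabled.");
    // ... authorization check and credential swap would follow ...
  }

  public static void main(String[] args) {
    try {
      new PreconditionsSketch(false).replaceUserCredentials("alice");
      System.out.println("no exception");
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```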
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179073#comment-15179073 ] ASF GitHub Bot commented on DRILL-4281: --- Github user jacques-n commented on a diff in the pull request: https://github.com/apache/drill/pull/400#discussion_r54976762 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java --- @@ -88,6 +89,7 @@ String USER_AUTHENTICATION_ENABLED = "drill.exec.security.user.auth.enabled"; String USER_AUTHENTICATOR_IMPL = "drill.exec.security.user.auth.impl"; String PAM_AUTHENTICATOR_PROFILES = "drill.exec.security.user.auth.pam_profiles"; + String USER_DELEGATION_ENABLED = "drill.exec.delegation.enabled"; --- End diff -- Isn't an empty delegation block enough? Any reason to have a second kill switch?
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179026#comment-15179026 ] ASF GitHub Bot commented on DRILL-4281: --- Github user sudheeshkatkam commented on the pull request: https://github.com/apache/drill/pull/400#issuecomment-192036063 I don't think they are common. How about "principals" and "can_delegate_for"? I am not strongly against "can_impersonate", but I want to avoid confusion with user impersonation. Does everything else look good?
[jira] [Commented] (DRILL-3623) Limit 0 should avoid execution when querying a known schema
[ https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178659#comment-15178659 ] ASF GitHub Bot commented on DRILL-3623: --- Github user sudheeshkatkam commented on the pull request: https://github.com/apache/drill/pull/193#issuecomment-191973058 Moving to #405. > Limit 0 should avoid execution when querying a known schema > --- > > Key: DRILL-3623 > URL: https://issues.apache.org/jira/browse/DRILL-3623 > Project: Apache Drill > Issue Type: Sub-task > Components: Storage - Hive >Affects Versions: 1.1.0 > Environment: MapR cluster >Reporter: Andries Engelbrecht >Assignee: Sudheesh Katkam > Labels: doc-impacting > Fix For: Future > > > Running a select * from hive.table limit 0 does not return (hangs). > Select * from hive.table limit 1 works fine > Hive table is about 6GB with 330 files with parquet using snappy compression. > Data types are int, bigint, string and double. > Querying directory with parquet files through the DFS plugin works fine > select * from dfs.root.`/user/hive/warehouse/database/table` limit 0; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3623) Limit 0 should avoid execution when querying a known schema
[ https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178660#comment-15178660 ] ASF GitHub Bot commented on DRILL-3623: --- Github user sudheeshkatkam closed the pull request at: https://github.com/apache/drill/pull/193
[jira] [Commented] (DRILL-3623) Limit 0 should avoid execution when querying a known schema
[ https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178655#comment-15178655 ] ASF GitHub Bot commented on DRILL-3623: --- GitHub user sudheeshkatkam opened a pull request: https://github.com/apache/drill/pull/405 DRILL-3623: For limit 0 queries, use a shorter path when result column types are known Moving from #193 to here. + There is a pull request open for first commit (DRILL-4372: #397). + Second commit has a "nice to have" check: ensuring planning and execution types match. + My changes are in the third commit (e4cfdfa). Please review this. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sudheeshkatkam/drill DRILL-3623-pr Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/405.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #405 commit c553365e39947ba6c95d645cc971cf4d696ee758 Author: Sudheesh Katkam Date: 2015-12-22T04:38:59Z DRILL-4372: Expose the functions return type to Drill - Drill-Calcite version update: This commit needs to have Calcite's patch (CALCITE-1062) to plug in customized SqlOperator. - FunctionTemplate Add FunctionArgumentNumber annotation. This annotation element tells if the number of argument(s) is fixed or arbitrary (e.g., String concatenation function). Due to this modification, there are some minor changes in DrillFuncHolder, DrillFunctionRegistry and FunctionAttributes. - Checker Add a new Checker (which Calcite uses to validate the legitimacy of the number of argument(s) for a function) to allow functions with arbitrary arguments to pass Calcite's validation - Type conversion between Drill and Calcite DrillConstExecutor is given a static method getDrillTypeFromCalcite() to convert Calcite types to Drill's. 
- Extract function's return type inference Unlike other functions, Extract function's return type can be determined solely based on the first argument. Logic is added to allow this inference to happen - DrillCalcite wrapper: From the aspects of return type inference and argument type checks, Calcite's mechanism is very different from Drill's. In addition, currently, there is no straightforward way for Drill to plug in customized mechanisms to Calcite. Thus, wrappers are provided to serve the objective. Except for the mechanisms of type inference and argument type checks, these wrappers just forward any method calls to the wrapped SqlOperator, SqlFunction or SqlAggFunction to respond. An interface DrillCalciteSqlWrapper is also added for the callers of the three wrappers to get the wrapped objects easier. Due to these wrappers, UnsupportedOperatorsVisitor is modified in a minor manner. - Calcite's SqlOperator, SqlFunction or SqlAggFunction are wrapped in DrillOperatorTable Instead of returning Calcite's native SqlOperator, SqlFunction or SqlAggFunction, return the wrapped ones to ensure customized behaviors can be adopted. - Type inference mechanism This mechanism is used across all SqlOperator, SqlFunction or SqlAggFunction. 
Thus, it is factored out as its own method in TypeInferenceUtils - Upgrade Drill-Calcite Bump version number to 1.4.0-drill-test-r16 - Implement two argument version of lpad, rpad - Implement one argument version of ltrim, rtrim, btrim commit c3f0649e3ebb45d54e747f099d6699150bfa9869 Author: Hsuan-Yi Chu Date: 2016-02-03T05:17:50Z DRILL-4372: Part 2: Optionally ensure planning and execution types match commit e4cfdfa9b0562d52ac07f6d80860a82fa8baba40 Author: Sudheesh Katkam Date: 2016-03-03T21:25:39Z DRILL-3623: For limit 0 queries, use a shorter path when result column types are known
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178627#comment-15178627 ] ASF GitHub Bot commented on DRILL-4281: --- Github user jacques-n commented on the pull request: https://github.com/apache/drill/pull/400#issuecomment-191963668 Are delegate and delegator commonly used terms for these things? Frankly, I would have to look these up to confirm which is which and could see making a mistake reversing them. Any way we can make them something with clearer directionality? (If everybody else thinks that this distinction and directionality is super clear, nevermind.) > Drill should support inbound impersonation > -- > > Key: DRILL-4281 > URL: https://issues.apache.org/jira/browse/DRILL-4281 > Project: Apache Drill > Issue Type: Improvement >Reporter: Keys Botzum >Assignee: Sudheesh Katkam > Labels: doc-impacting, security > > Today Drill supports impersonation *to* external sources. For example I can > authenticate to Drill as myself and then Drill will access HDFS using > impersonation. > In many scenarios we also need impersonation to Drill. For example I might > use some front end tool (such as Tableau) and authenticate to it as myself. > That tool (server version) then needs to access Drill to perform queries and > I want those queries to run as myself, not as the Tableau user. While in > theory the intermediate tool could store the userid & password for every user > to Drill, this isn't a scalable or very secure solution. > Note that HS2 today does support inbound impersonation as described here: > https://issues.apache.org/jira/browse/HIVE-5155 > The above is not the best approach as it is tied to the connection object > which is very coarse grained and potentially expensive. It would be better if > there was a call on the ODBC/JDBC driver to switch the identity on an existing > connection. Most modern SQL databases (Oracle, DB2) support such a function. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
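The comment thread above is about naming the two sides of inbound impersonation; whatever the names, enforcement boils down to a policy lookup similar to Hadoop's hadoop.proxyuser.* properties: a service account (the Tableau-style intermediary) may only run queries as end users it is explicitly authorized for. A hypothetical sketch, not Drill's actual implementation; the class and account names are made up:

```java
import java.util.Map;
import java.util.Set;

// Illustrative inbound-impersonation policy check: which end users may a given
// proxy (service) account run queries as? "*" acts as a wildcard, mirroring
// the Hadoop proxy-user convention mentioned in the discussion.
public class ImpersonationPolicy {
    private final Map<String, Set<String>> allowed; // proxy user -> permitted target users

    public ImpersonationPolicy(Map<String, Set<String>> allowed) {
        this.allowed = allowed;
    }

    public boolean canImpersonate(String proxyUser, String targetUser) {
        Set<String> targets = allowed.get(proxyUser);
        return targets != null && (targets.contains("*") || targets.contains(targetUser));
    }

    public static void main(String[] args) {
        ImpersonationPolicy p = new ImpersonationPolicy(
            Map.of("tableau_svc", Set.of("alice", "bob")));
        System.out.println(p.canImpersonate("tableau_svc", "alice"));   // true
        System.out.println(p.canImpersonate("tableau_svc", "mallory")); // false
    }
}
```

Checking this per query (rather than binding identity to the connection, as the HS2 approach does) is what allows identity switching on an existing connection.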
[jira] [Commented] (DRILL-4384) Query profile is missing important information on WebUi
[ https://issues.apache.org/jira/browse/DRILL-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178324#comment-15178324 ] Jason Altekruse commented on DRILL-4384: Fixed in c95b5432301fe487d64a1fc06e765228469fc3a2 > Query profile is missing important information on WebUi > --- > > Key: DRILL-4384 > URL: https://issues.apache.org/jira/browse/DRILL-4384 > Project: Apache Drill > Issue Type: Bug >Reporter: Jinfeng Ni >Assignee: Jacques Nadeau >Priority: Blocker > Fix For: 1.6.0 > > Attachments: DRILL-4384.patch > > > Built drill from master branch (0a2518d7cf01a92a27a82e29edac5424bedf31d5) and > started in embedded mode. Then, > run a query and checked the query profile through WebUI. However, > seems that the fragment profiles , operator profiles and visualized > plan sections are all empty. Tried both Mac and CentOS and hit the same > problem. > After doing a binary search over recent commits, seems the patch of > "DRILL-3581: Upgrade HPPC to 0.7.1" is the cause of broken query > profiles [1]. The query profile on the commits before DRILL-3581 > looks fine. > [1] > https://github.com/apache/drill/commit/d27127c94d5c08306697a5627a1bac5f144abb22 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4471) Add unit test for the Drill Web UI
Jason Altekruse created DRILL-4471: -- Summary: Add unit test for the Drill Web UI Key: DRILL-4471 URL: https://issues.apache.org/jira/browse/DRILL-4471 Project: Apache Drill Issue Type: Test Reporter: Jason Altekruse Assignee: Jason Altekruse While the Web UI isn't being very actively developed, a few times changes to the Drill build or internal parts of the server have broken parts of the Web UI. As the web UI is a primary interface for viewing cluster information, cancelling queries, configuring storage and other tasks, we really should add automated tests for it.
[jira] [Commented] (DRILL-4384) Query profile is missing important information on WebUi
[ https://issues.apache.org/jira/browse/DRILL-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178321#comment-15178321 ] Jason Altekruse commented on DRILL-4384: [~jni] Venki merged this yesterday along with some other outstanding patches. I do agree with you about the automated tests for the UI. I have opened DRILL-4471 to track this task. > Query profile is missing important information on WebUi > --- > > Key: DRILL-4384 > URL: https://issues.apache.org/jira/browse/DRILL-4384 > Project: Apache Drill > Issue Type: Bug >Reporter: Jinfeng Ni >Assignee: Jacques Nadeau >Priority: Blocker > Fix For: 1.6.0 > > Attachments: DRILL-4384.patch > > > Built drill from master branch (0a2518d7cf01a92a27a82e29edac5424bedf31d5) and > started in embedded mode. Then, > run a query and checked the query profile through WebUI. However, > seems that the fragment profiles , operator profiles and visualized > plan sections are all empty. Tried both Mac and CentOS and hit the same > problem. > After doing a binary search over recent commits, seems the patch of > "DRILL-3581: Upgrade HPPC to 0.7.1" is the cause of broken query > profiles [1]. The query profile on the commits before DRILL-3581 > looks fine. > [1] > https://github.com/apache/drill/commit/d27127c94d5c08306697a5627a1bac5f144abb22 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4465) Refactor Parsing and Planning to canonicalize planning and parsing
[ https://issues.apache.org/jira/browse/DRILL-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178268#comment-15178268 ] ASF GitHub Bot commented on DRILL-4465: --- Github user jacques-n commented on the pull request: https://github.com/apache/drill/pull/401#issuecomment-191890837 @jinfengni: I've addressed your review comments. Let me know any additional feedback. thanks! > Refactor Parsing and Planning to canonicalize planning and parsing > -- > > Key: DRILL-4465 > URL: https://issues.apache.org/jira/browse/DRILL-4465 > Project: Apache Drill > Issue Type: Sub-task > Components: Query Planning & Optimization >Reporter: Jacques Nadeau >Assignee: Jinfeng Ni > Fix For: 1.6.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4465) Refactor Parsing and Planning to canonicalize planning and parsing
[ https://issues.apache.org/jira/browse/DRILL-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178153#comment-15178153 ] ASF GitHub Bot commented on DRILL-4465: --- Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/401#discussion_r54912350 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlParser.java --- @@ -0,0 +1,349 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.planner.sql; + +import java.util.Arrays; +import java.util.List; + +import org.apache.calcite.adapter.java.JavaTypeFactory; +import org.apache.calcite.avatica.util.Casing; +import org.apache.calcite.avatica.util.Quoting; +import org.apache.calcite.jdbc.CalciteSchemaImpl; +import org.apache.calcite.jdbc.JavaTypeFactoryImpl; +import org.apache.calcite.plan.ConventionTraitDef; +import org.apache.calcite.plan.RelOptCluster; +import org.apache.calcite.plan.RelOptCostFactory; +import org.apache.calcite.plan.RelOptTable; +import org.apache.calcite.plan.volcano.VolcanoPlanner; +import org.apache.calcite.prepare.CalciteCatalogReader; +import org.apache.calcite.rel.RelCollationTraitDef; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.type.RelDataType; +import org.apache.calcite.rel.type.RelDataTypeFactory; +import org.apache.calcite.rel.type.RelDataTypeSystemImpl; +import org.apache.calcite.rex.RexBuilder; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.calcite.sql.SqlNode; +import org.apache.calcite.sql.SqlOperatorTable; +import org.apache.calcite.sql.parser.SqlParseException; +import org.apache.calcite.sql.parser.SqlParser; +import org.apache.calcite.sql.parser.SqlParserImplFactory; +import org.apache.calcite.sql.parser.SqlParserPos; +import org.apache.calcite.sql.type.SqlTypeName; +import org.apache.calcite.sql.util.ChainedSqlOperatorTable; +import org.apache.calcite.sql.validate.SqlConformance; +import org.apache.calcite.sql.validate.SqlValidatorCatalogReader; +import org.apache.calcite.sql.validate.SqlValidatorImpl; +import org.apache.calcite.sql2rel.RelDecorrelator; +import org.apache.calcite.sql2rel.SqlToRelConverter; +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.exec.expr.fn.FunctionImplementationRegistry; +import org.apache.drill.exec.ops.UdfUtilities; +import org.apache.drill.exec.planner.cost.DrillCostBase; +import 
org.apache.drill.exec.planner.logical.DrillConstExecutor; +import org.apache.drill.exec.planner.physical.DrillDistributionTraitDef; +import org.apache.drill.exec.planner.physical.PlannerSettings; +import org.apache.drill.exec.planner.sql.parser.impl.DrillParserWithCompoundIdConverter; + +/** + * Class responsible for managing parsing, validation and toRel conversion for sql statements. + */ +public class DrillSqlParser { --- End diff -- SqlConverter seems fine to me. > Refactor Parsing and Planning to canonicalize planning and parsing > -- > > Key: DRILL-4465 > URL: https://issues.apache.org/jira/browse/DRILL-4465 > Project: Apache Drill > Issue Type: Sub-task > Components: Query Planning & Optimization >Reporter: Jacques Nadeau >Assignee: Jinfeng Ni > Fix For: 1.6.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4465) Refactor Parsing and Planning to canonicalize planning and parsing
[ https://issues.apache.org/jira/browse/DRILL-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178160#comment-15178160 ] ASF GitHub Bot commented on DRILL-4465: --- Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/401#discussion_r54912629 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlParser.java --- @@ -0,0 +1,349 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.planner.sql; + +import java.util.Arrays; +import java.util.List; + +import org.apache.calcite.adapter.java.JavaTypeFactory; +import org.apache.calcite.avatica.util.Casing; +import org.apache.calcite.avatica.util.Quoting; +import org.apache.calcite.jdbc.CalciteSchemaImpl; +import org.apache.calcite.jdbc.JavaTypeFactoryImpl; +import org.apache.calcite.plan.ConventionTraitDef; +import org.apache.calcite.plan.RelOptCluster; +import org.apache.calcite.plan.RelOptCostFactory; +import org.apache.calcite.plan.RelOptTable; +import org.apache.calcite.plan.volcano.VolcanoPlanner; +import org.apache.calcite.prepare.CalciteCatalogReader; +import org.apache.calcite.rel.RelCollationTraitDef; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.type.RelDataType; +import org.apache.calcite.rel.type.RelDataTypeFactory; +import org.apache.calcite.rel.type.RelDataTypeSystemImpl; +import org.apache.calcite.rex.RexBuilder; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.calcite.sql.SqlNode; +import org.apache.calcite.sql.SqlOperatorTable; +import org.apache.calcite.sql.parser.SqlParseException; +import org.apache.calcite.sql.parser.SqlParser; +import org.apache.calcite.sql.parser.SqlParserImplFactory; +import org.apache.calcite.sql.parser.SqlParserPos; +import org.apache.calcite.sql.type.SqlTypeName; +import org.apache.calcite.sql.util.ChainedSqlOperatorTable; +import org.apache.calcite.sql.validate.SqlConformance; +import org.apache.calcite.sql.validate.SqlValidatorCatalogReader; +import org.apache.calcite.sql.validate.SqlValidatorImpl; +import org.apache.calcite.sql2rel.RelDecorrelator; +import org.apache.calcite.sql2rel.SqlToRelConverter; +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.exec.expr.fn.FunctionImplementationRegistry; +import org.apache.drill.exec.ops.UdfUtilities; +import org.apache.drill.exec.planner.cost.DrillCostBase; +import 
org.apache.drill.exec.planner.logical.DrillConstExecutor; +import org.apache.drill.exec.planner.physical.DrillDistributionTraitDef; +import org.apache.drill.exec.planner.physical.PlannerSettings; +import org.apache.drill.exec.planner.sql.parser.impl.DrillParserWithCompoundIdConverter; + +/** + * Class responsible for managing parsing, validation and toRel conversion for sql statements. + */ +public class DrillSqlParser { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DrillSqlParser.class); + + private static DrillTypeSystem DRILL_TYPE_SYSTEM = new DrillTypeSystem(); + + private final JavaTypeFactory typeFactory; + private final SqlParser.Config parserConfig; + private final CalciteCatalogReader catalog; + private final PlannerSettings settings; + private final SchemaPlus rootSchema; + private final SchemaPlus defaultSchema; + private final SqlOperatorTable opTab; + private final RelOptCostFactory costFactory; + private final DrillValidator validator; + private final boolean isInnerQuery; + private final UdfUtilities util; + private final FunctionImplementationRegistry functions; + + private String sql; + private VolcanoPlanner planner; + + + public DrillSqlParser(PlannerSettings settings, SchemaPlus defaultSchema, + final SqlOperatorTable operatorTable, UdfUtilities util,
[jira] [Commented] (DRILL-4465) Refactor Parsing and Planning to canonicalize planning and parsing
[ https://issues.apache.org/jira/browse/DRILL-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178092#comment-15178092 ] ASF GitHub Bot commented on DRILL-4465: --- Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/401#discussion_r54907758 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java --- @@ -273,12 +282,90 @@ public RelNode visit(RelNode other) { } + /** + * Transform RelNode to a new RelNode without changing any traits. Also will log the outcome. + * + * @param plannerType + * The type of Planner to use. + * @param phase + * The transformation phase we're running. + * @param input + * The original RelNode + * @return The transformed relnode. + */ + private RelNode transform(PlannerType plannerType, PlannerPhase phase, RelNode input) { +return transform(plannerType, phase, input, input.getTraitSet()); + } + + /** + * Transform RelNode to a new RelNode, targeting the provided set of traits. Also will log the outcome. + * + * @param plannerType + * The type of Planner to use. + * @param phase + * The transformation phase we're running. + * @param input + * The original RelNode + * @param targetTraits + * The traits we are targeting for output. + * @return The transformed relnode. 
+ */ + protected RelNode transform(PlannerType plannerType, PlannerPhase phase, RelNode input, RelTraitSet targetTraits) { +final Stopwatch watch = Stopwatch.createStarted(); +final RuleSet rules = config.getRules(phase); +final RelTraitSet toTraits = targetTraits.simplify(); + +final RelNode output; +switch (plannerType) { +case HEP_BOTTOM_UP: +case HEP: { + final HepProgramBuilder hepPgmBldr = new HepProgramBuilder(); + if (plannerType == PlannerType.HEP_BOTTOM_UP) { +hepPgmBldr.addMatchOrder(HepMatchOrder.BOTTOM_UP); + } + for (RelOptRule rule : rules) { +hepPgmBldr.addRuleInstance(rule); + } + + final HepPlanner planner = new HepPlanner(hepPgmBldr.build(), context.getPlannerSettings()); + + final List<RelMetadataProvider> list = Lists.newArrayList(); + list.add(DrillDefaultRelMetadataProvider.INSTANCE); + planner.registerMetadataProviders(list); + final RelMetadataProvider cachingMetaDataProvider = new CachingRelMetadataProvider( + ChainedRelMetadataProvider.of(list), planner); + + // Modify RelMetaProvider for every RelNode in the SQL operator Rel tree. + input.accept(new MetaDataProviderModifier(cachingMetaDataProvider)); + planner.setRoot(input); + if (!input.getTraitSet().equals(targetTraits)) { +planner.changeTraits(input, toTraits); + } + output = planner.findBestExp(); + break; +} +case VOLCANO: +default: { + // as weird as it seems, the cluster's only planner is the volcano planner. + final RelOptPlanner planner = input.getCluster().getPlanner(); + final Program program = Programs.of(rules); + output = program.run(planner, input, toTraits); + + break; +} +} + +log(plannerType.name() + ":" + phase.description, output, logger, watch); --- End diff -- Sorry for the confusion. You are right that there is no impact when debug is disabled. The reason for the performance difference is that the IDE enables debug mode, which causes the unit test to run longer. As long as debug is disabled, we would not see a difference. 
> Refactor Parsing and Planning to canonicalize planning and parsing > -- > > Key: DRILL-4465 > URL: https://issues.apache.org/jira/browse/DRILL-4465 > Project: Apache Drill > Issue Type: Sub-task > Components: Query Planning & Optimization >Reporter: Jacques Nadeau >Assignee: Jinfeng Ni > Fix For: 1.6.0 > >
[jira] [Commented] (DRILL-4441) IN operator does not work with Avro reader
[ https://issues.apache.org/jira/browse/DRILL-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178020#comment-15178020 ] Jason Altekruse commented on DRILL-4441: I need to fix this for varbinary too, but to make the test case work I had to fix casts from varchar to varbinary, as it does not appear we support binary literals. > IN operator does not work with Avro reader > -- > > Key: DRILL-4441 > URL: https://issues.apache.org/jira/browse/DRILL-4441 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Avro >Affects Versions: 1.5.0 > Environment: Ubuntu >Reporter: Stefán Baxter >Assignee: Jason Altekruse >Priority: Critical > Fix For: 1.6.0 > > > IN operator simply does not work. > (And I find it interesting that Storage-Avro is not available here in Jira as > a Storage component) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-4441) IN operator does not work with Avro reader
[ https://issues.apache.org/jira/browse/DRILL-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse reassigned DRILL-4441: -- Assignee: Jason Altekruse > IN operator does not work with Avro reader > -- > > Key: DRILL-4441 > URL: https://issues.apache.org/jira/browse/DRILL-4441 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Avro >Affects Versions: 1.5.0 > Environment: Ubuntu >Reporter: Stefán Baxter >Assignee: Jason Altekruse >Priority: Critical > Fix For: 1.6.0 > > > IN operator simply does not work. > (And I find it interesting that Storage-Avro is not available here in Jira as > a Storage component)
[jira] [Comment Edited] (DRILL-2048) Malformed drill storage config stored in zookeeper will prevent Drill from starting
[ https://issues.apache.org/jira/browse/DRILL-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177961#comment-15177961 ] james norris edited comment on DRILL-2048 at 3/3/16 3:32 PM: - This also happens when a HIVE storage plugin is configured to point to a thrift server, and that thrift server is then unavailable (i.e. turned off). This renders Drill unusable. was (Author: norri...@gmail.com): This also happens when a HIVE storage plugin is configured to point to a thrift server, and that thrift server is then unavailable. This renders Drill unusable. > Malformed drill storage config stored in zookeeper will prevent Drill from > starting > -- > > Key: DRILL-2048 > URL: https://issues.apache.org/jira/browse/DRILL-2048 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Other >Reporter: Jason Altekruse > Fix For: Future > > > We noticed this problem while trying to test dev builds on a common cluster. > When applying changes that added a field to the configuration of a storage > plugin, the new format of the configuration would be persisted in zookeeper. > When a different dev build that did not include the change set tried to be > deployed on the same cluster the config stored in zookeeper would fail to > parse and the drillbit would not be able to start. This is not system > critical configuration so the drillbit should be able to still start with the > plugin disabled. > This fix could also include changing the jackson mapper to allow ignoring > unexpected fields in the configuration. This would give a little better > chance for interoperability between future versions of Drill as we add new > configuration options as necessary.
[jira] [Commented] (DRILL-2048) Malformed drill storage config stored in zookeeper will prevent Drill from starting
[ https://issues.apache.org/jira/browse/DRILL-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177961#comment-15177961 ] james norris commented on DRILL-2048: - This also happens when a HIVE storage plugin is configured to point to a thrift server, and that thrift server is then unavailable. This renders Drill unusable. > Malformed drill storage config stored in zookeeper will prevent Drill from > starting > -- > > Key: DRILL-2048 > URL: https://issues.apache.org/jira/browse/DRILL-2048 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Other >Reporter: Jason Altekruse > Fix For: Future > > > We noticed this problem while trying to test dev builds on a common cluster. > When applying changes that added a field to the configuration of a storage > plugin, the new format of the configuration would be persisted in zookeeper. > When a different dev build that did not include the change set tried to be > deployed on the same cluster the config stored in zookeeper would fail to > parse and the drillbit would not be able to start. This is not system > critical configuration so the drillbit should be able to still start with the > plugin disabled. > This fix could also include changing the jackson mapper to allow ignoring > unexpected fields in the configuration. This would give a little better > chance for interoperability between future versions of Drill as we add new > configuration options as necessary.
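The last paragraph of DRILL-2048 suggests making the config deserializer tolerate unknown fields (for Jackson this would mean not failing on unknown properties) so an older build can still start against config written by a newer one. The toy key=value parser below illustrates the strict-versus-tolerant distinction; the format and field names are made up for the example and are not Drill's storage-plugin schema.

```java
import java.util.*;

// Toy config parser contrasting strict parsing (fail on unknown fields, as the
// bug describes) with tolerant parsing (skip fields this build does not know).
public class TolerantConfigParser {
    // Fields this (hypothetical) build of the software understands.
    static final Set<String> KNOWN = Set.of("type", "enabled", "connection");

    public static Map<String, String> parse(String config, boolean strict) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String entry : config.split(",")) {
            String[] kv = entry.split("=", 2);
            if (!KNOWN.contains(kv[0])) {
                if (strict) throw new IllegalArgumentException("unknown field: " + kv[0]);
                continue; // tolerant mode: ignore fields written by a newer build
            }
            out.put(kv[0], kv[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        // "newField" simulates a field added by a newer dev build.
        System.out.println(parse("type=hive,enabled=true,newField=x", false));
    }
}
```

In tolerant mode the unknown field is dropped and startup can proceed with the rest of the config intact, which is the interoperability behavior the issue asks for.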
[jira] [Created] (DRILL-4470) TPC-H dataset 404
Michael Mior created DRILL-4470: --- Summary: TPC-H dataset 404 Key: DRILL-4470 URL: https://issues.apache.org/jira/browse/DRILL-4470 Project: Apache Drill Issue Type: Bug Reporter: Michael Mior The URL for the TPC-H sample data is returning 404 which breaks the build. http://apache-drill.s3.amazonaws.com/files//sf-0.01_tpc-h_parquet_typed.tgz
[jira] [Commented] (DRILL-4469) SUM window query returns incorrect results over integer data
[ https://issues.apache.org/jira/browse/DRILL-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177764#comment-15177764 ] Khurram Faraaz commented on DRILL-4469: --- Query plan for the query that returns wrong results. {noformat} 0: jdbc:drill:schema=dfs.tmp> explain plan for SELECT SUM(c1) OVER w FROM (select * from dfs.tmp.`t_alltype`) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING); +--+--+ | text | json | +--+--+ | 00-00Screen 00-01 Project(EXPR$0=[$0]) 00-02Project(w0$o0=[$2]) 00-03 Window(window#0=[window(partition {0} order by [0] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1)])]) 00-04SelectionVectorRemover 00-05 Sort(sort0=[$0], sort1=[$0], dir0=[ASC], dir1=[ASC]) 00-06Project(T6¦¦*=[$0], $1=[ITEM($0, 'c1')]) 00-07 Project(T6¦¦*=[$0]) 00-08Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tmp/t_alltype]], selectionRoot=maprfs:/tmp/t_alltype, numFiles=1, usedMetadataFile=false, columns=[`*`]]]) {noformat} > SUM window query returns incorrect results over integer data > > > Key: DRILL-4469 > URL: https://issues.apache.org/jira/browse/DRILL-4469 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.6.0 > Environment: 4 node CentOS cluster >Reporter: Khurram Faraaz >Priority: Critical > Labels: window_function > Attachments: t_alltype.csv, t_alltype.parquet > > > SUM window query returns incorrect results as compared to Postgres, with or > without the frame clause in the window definition. Note that there is a sub > query involved and data in column c1 is sorted integer data with no nulls. 
> Drill 1.6.0 commit ID: 6d5f4983 > Results from Drill 1.6.0 > {noformat} > 0: jdbc:drill:schema=dfs.tmp> SELECT SUM(c1) OVER w FROM (select * from > dfs.tmp.`t_alltype`) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1 RANGE > BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING); > +-+ > | EXPR$0 | > +-+ > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > ... > | 10585 | > | 10585 | > | 10585 | > ++ > 145 rows selected (0.257 seconds) > {noformat} > results from Postgres 9.3 > {noformat} > postgres=# SELECT SUM(c1) OVER w FROM (select * from t_alltype) subQry WINDOW > w AS (PARTITION BY c8 ORDER BY c1 RANGE BETWEEN UNBOUNDED PRECEDING AND > UNBOUNDED FOLLOWING); > sum > -- > 4499 > 4499 > 4499 > 4499 > 4499 > 4499 > ... > 5613 > 5613 > 5613 > 473 > 473 > 473 > 473 > 473 > (145 rows) > {noformat} > Removing the frame clause from window definition, still results in completely > different results on Postgres vs Drill > Results from Drill 1.6.0 > {noformat} > 0: jdbc:drill:schema=dfs.tmp>SELECT SUM(c1) OVER w FROM (select * from > t_alltype) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1); > +-+ > | EXPR$0 | > +-+ > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > ... > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > ++ > 145 rows selected (0.28 seconds) > {noformat} > Results from Postgres > {noformat} > postgres=# SELECT SUM(c1) OVER w FROM (select * from t_alltype) subQry WINDOW > w AS (PARTITION BY c8 ORDER BY c1); > sum > -- > 5 >12 >21 >33 >47 >62 >78 >96 > 115 > 135 > 158 > 182 > 207 > 233 > 260 > 289 > ... > 4914 > 5051 > 5189 > 5328 > 5470 > 5613 > 8 >70 > 198 > 332 > 473 > (145 rows) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4469) SUM window query returns incorrect results over integer data
[ https://issues.apache.org/jira/browse/DRILL-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Khurram Faraaz updated DRILL-4469: -- Attachment: t_alltype.csv t_alltype.parquet Attached data files here. > SUM window query returns incorrect results over integer data > > > Key: DRILL-4469 > URL: https://issues.apache.org/jira/browse/DRILL-4469 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.6.0 > Environment: 4 node CentOS cluster >Reporter: Khurram Faraaz >Priority: Critical > Labels: window_function > Attachments: t_alltype.csv, t_alltype.parquet > > > SUM window query returns incorrect results as compared to Postgres, with or > without the frame clause in the window definition. Note that there is a sub > query involved and data in column c1 is sorted integer data with no nulls. > Drill 1.6.0 commit ID: 6d5f4983 > Results from Drill 1.6.0 > {noformat} > 0: jdbc:drill:schema=dfs.tmp> SELECT SUM(c1) OVER w FROM (select * from > dfs.tmp.`t_alltype`) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1 RANGE > BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING); > +-+ > | EXPR$0 | > +-+ > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > ... > | 10585 | > | 10585 | > | 10585 | > ++ > 145 rows selected (0.257 seconds) > {noformat} > results from Postgres 9.3 > {noformat} > postgres=# SELECT SUM(c1) OVER w FROM (select * from t_alltype) subQry WINDOW > w AS (PARTITION BY c8 ORDER BY c1 RANGE BETWEEN UNBOUNDED PRECEDING AND > UNBOUNDED FOLLOWING); > sum > -- > 4499 > 4499 > 4499 > 4499 > 4499 > 4499 > ... 
> 5613 > 5613 > 5613 > 473 > 473 > 473 > 473 > 473 > (145 rows) > {noformat} > Removing the frame clause from window definition, still results in completely > different results on Postgres vs Drill > Results from Drill 1.6.0 > {noformat} > 0: jdbc:drill:schema=dfs.tmp>SELECT SUM(c1) OVER w FROM (select * from > t_alltype) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1); > +-+ > | EXPR$0 | > +-+ > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > ... > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > ++ > 145 rows selected (0.28 seconds) > {noformat} > Results from Postgres > {noformat} > postgres=# SELECT SUM(c1) OVER w FROM (select * from t_alltype) subQry WINDOW > w AS (PARTITION BY c8 ORDER BY c1); > sum > -- > 5 >12 >21 >33 >47 >62 >78 >96 > 115 > 135 > 158 > 182 > 207 > 233 > 260 > 289 > ... > 4914 > 5051 > 5189 > 5328 > 5470 > 5613 > 8 >70 > 198 > 332 > 473 > (145 rows) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4469) SUM window query returns incorrect results over integer data
Khurram Faraaz created DRILL-4469:
-------------------------------------

             Summary: SUM window query returns incorrect results over integer data
                 Key: DRILL-4469
                 URL: https://issues.apache.org/jira/browse/DRILL-4469
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Flow
    Affects Versions: 1.6.0
         Environment: 4 node CentOS cluster
            Reporter: Khurram Faraaz
            Priority: Critical


A SUM window query returns incorrect results compared to Postgres, with or without a frame clause in the window definition. Note that a subquery is involved and the data in column c1 is sorted integer data with no nulls.

Drill 1.6.0 commit ID: 6d5f4983

Results from Drill 1.6.0
{noformat}
0: jdbc:drill:schema=dfs.tmp> SELECT SUM(c1) OVER w FROM (select * from dfs.tmp.`t_alltype`) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
+---------+
| EXPR$0  |
+---------+
| 10585   |
| 10585   |
| 10585   |
| 10585   |
| 10585   |
| 10585   |
...
| 10585   |
| 10585   |
| 10585   |
+---------+
145 rows selected (0.257 seconds)
{noformat}

Results from Postgres 9.3
{noformat}
postgres=# SELECT SUM(c1) OVER w FROM (select * from t_alltype) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
 sum
------
 4499
 4499
 4499
 4499
 4499
 4499
...
 5613
 5613
 5613
  473
  473
  473
  473
  473
(145 rows)
{noformat}

Removing the frame clause from the window definition still produces completely different results on Postgres vs Drill.

Results from Drill 1.6.0
{noformat}
0: jdbc:drill:schema=dfs.tmp> SELECT SUM(c1) OVER w FROM (select * from t_alltype) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1);
+---------+
| EXPR$0  |
+---------+
| 10585   |
| 10585   |
| 10585   |
| 10585   |
| 10585   |
| 10585   |
| 10585   |
| 10585   |
| 10585   |
...
| 10585   |
| 10585   |
| 10585   |
| 10585   |
| 10585   |
+---------+
145 rows selected (0.28 seconds)
{noformat}

Results from Postgres
{noformat}
postgres=# SELECT SUM(c1) OVER w FROM (select * from t_alltype) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1);
 sum
------
    5
   12
   21
   33
   47
   62
   78
   96
  115
  135
  158
  182
  207
  233
  260
  289
...
 4914
 5051
 5189
 5328
 5470
 5613
    8
   70
  198
  332
  473
(145 rows)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
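The Postgres output above follows the SQL-standard default frame: when a window has an ORDER BY but no explicit frame clause, the frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so each row receives a running sum and peer rows (equal ORDER BY keys) share a value, while Drill is returning the full-partition total for every row. A minimal Python sketch of the default-frame semantics (illustrative only, not Drill's or Postgres's implementation; the accessor-based API is hypothetical):

```python
from itertools import groupby

def windowed_sum(rows, part_key, order_key, val):
    """SUM(...) OVER (PARTITION BY ... ORDER BY ...) with the SQL-standard
    default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
    Peer rows (equal order_key) share one cumulative value."""
    out = []
    ordered = sorted(rows, key=lambda r: (part_key(r), order_key(r)))
    for _, part in groupby(ordered, key=part_key):
        running = 0
        # RANGE semantics: the frame ends at the last peer of the current
        # row, so the whole peer group is summed before values are emitted.
        for _, peers in groupby(part, key=order_key):
            peers = list(peers)
            running += sum(val(r) for r in peers)
            out.extend([running] * len(peers))
    return out
```

On data like the report's, this yields a per-partition running sum (as in the Postgres column) rather than one repeated partition total.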
[jira] [Commented] (DRILL-3688) Drill should honor "skip.header.line.count" and "skip.footer.line.count" attributes of Hive table
[ https://issues.apache.org/jira/browse/DRILL-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177527#comment-15177527 ]

ASF GitHub Bot commented on DRILL-3688:
---------------------------------------

Github user arina-ielchiieva closed the pull request at:

    https://github.com/apache/drill/pull/382


> Drill should honor "skip.header.line.count" and "skip.footer.line.count"
> attributes of Hive table
> -------------------------------------------------------------------------
>
>                 Key: DRILL-3688
>                 URL: https://issues.apache.org/jira/browse/DRILL-3688
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Hive
>    Affects Versions: 1.1.0
>         Environment: 1.1
>            Reporter: Hao Zhu
>            Assignee: Arina Ielchiieva
>              Labels: doc-impacting
>             Fix For: 1.6.0
>
>
> Currently Drill does not honor the "skip.header.line.count" attribute of a Hive table.
> It may cause other format conversion issues.
> Reproduce:
> 1. Create a Hive table:
> {code}
> create table h1db.testheader(col0 string)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
> STORED AS TEXTFILE
> tblproperties("skip.header.line.count"="1");
> {code}
> 2. Prepare sample data:
> {code}
> # cat test.data
> col0
> 2015-01-01
> {code}
> 3. Load the sample data into Hive:
> {code}
> LOAD DATA LOCAL INPATH '/xxx/test.data' OVERWRITE INTO TABLE h1db.testheader;
> {code}
> 4. Hive:
> {code}
> hive> select * from h1db.testheader ;
> OK
> 2015-01-01
> Time taken: 0.254 seconds, Fetched: 1 row(s)
> {code}
> 5. Drill:
> {code}
> > select * from hive.h1db.testheader ;
> +-------------+
> |    col0     |
> +-------------+
> | col0        |
> | 2015-01-01  |
> +-------------+
> 2 rows selected (0.257 seconds)
> > select cast(col0 as date) from hive.h1db.testheader ;
> Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must be in the range [1,12]
> Fragment 0:0
> [Error Id: 34353702-ca27-440b-a4f4-0c9f79fc8ccd on h1.poc.com:31010]
>   (org.joda.time.IllegalFieldValueException) Value 0 for monthOfYear must be in the range [1,12]
>     org.joda.time.field.FieldUtils.verifyValueBounds():236
>     org.joda.time.chrono.BasicChronology.getDateMidnightMillis():613
>     org.joda.time.chrono.BasicChronology.getDateTimeMillis():159
>     org.joda.time.chrono.AssembledChronology.getDateTimeMillis():120
>     org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.memGetDate():261
>     org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getDate():218
>     org.apache.drill.exec.test.generated.ProjectorGen0.doEval():67
>     org.apache.drill.exec.test.generated.ProjectorGen0.projectRecords():62
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():172
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
>     org.apache.drill.exec.record.AbstractRecordBatch.next():147
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():83
>     org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():79
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():73
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():261
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():255
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1566
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():255
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>     java.lang.Thread.run():745 (state=,code=0)
> {code}
> Also "skip.footer.line.count" should be taken into account.
> If "skip.header.line.count" or "skip.footer.line.count" has an incorrect value in Hive, Drill should throw an appropriate exception, e.g.:
> Hive table property skip.header.line.count value 'someValue' is non-numeric
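The behavior requested above can be sketched outside Hive/Drill: the two table properties simply drop the first and last N lines of each data file before parsing, and a non-numeric property value should fail fast with a clear message. A minimal Python sketch under those assumptions (function names are hypothetical, not Drill's API):

```python
def parse_skip_count(prop_name, value):
    """Validate a skip.header/footer.line.count table property,
    raising the kind of error the issue asks Drill to produce."""
    try:
        count = int(value)
    except (TypeError, ValueError):
        raise ValueError(
            "Hive table property %s value '%s' is non-numeric" % (prop_name, value))
    if count < 0:
        raise ValueError(
            "Hive table property %s value '%s' is negative" % (prop_name, value))
    return count

def read_skipping(lines, skip_header=0, skip_footer=0):
    """Drop the first skip_header and last skip_footer lines of one file,
    mirroring Hive's skip.header.line.count / skip.footer.line.count."""
    end = len(lines) - skip_footer
    return lines[skip_header:end] if end > skip_header else []
```

With the issue's sample file, skipping one header line leaves only the data row, so the later CAST to DATE would no longer see the literal string "col0".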
[jira] [Comment Edited] (DRILL-4464) Apache Drill cannot read parquet generated outside Drill: Reading past RLE/BitPacking stream
[ https://issues.apache.org/jira/browse/DRILL-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175566#comment-15175566 ]

Miroslav Holubec edited comment on DRILL-4464 at 3/3/16 9:01 AM:
-----------------------------------------------------------------

output from parquet-tools meta. TS column is causing an issue:
{noformat}
$ java -jar c:\devel\parquet-mr\parquet-tools\target\parquet-tools-1.8.1.jar meta tmp.gz.parquet
file:        file:/tmp/tmp.gz.parquet
creator:     parquet-mr version 1.8.1 (build 4aba4dae7bb0d4edbcf7923ae1339f28fd3f7fcf)

file schema: nat
--------------------------------------------------------------------------------
ts:          REQUIRED INT64 R:0 D:0
dr:          REQUIRED INT32 R:0 D:0
ui:          OPTIONAL BINARY O:UTF8 R:0 D:1
up:          OPTIONAL INT32 R:0 D:1
ri:          OPTIONAL BINARY O:UTF8 R:0 D:1
rp:          OPTIONAL INT32 R:0 D:1
di:          OPTIONAL BINARY O:UTF8 R:0 D:1
dp:          OPTIONAL INT32 R:0 D:1
pr:          REQUIRED INT32 R:0 D:0
ob:          OPTIONAL INT64 R:0 D:1
ib:          OPTIONAL INT64 R:0 D:1

row group 1: RC:2418197 TS:30601003 OFFSET:4
--------------------------------------------------------------------------------
ts:          INT64 GZIP DO:0 FPO:4 SZ:2630987/19172128/7.29 VC:2418197 ENC:BIT_PACKED,PLAIN,PLAIN_DICTIONARY
dr:          INT32 GZIP DO:0 FPO:2630991 SZ:333876/1197646/3.59 VC:2418197 ENC:BIT_PACKED,PLAIN_DICTIONARY
ui:          BINARY GZIP DO:0 FPO:2964867 SZ:2088/1565/0.75 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
up:          INT32 GZIP DO:0 FPO:2966955 SZ:4514663/4652474/1.03 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
ri:          BINARY GZIP DO:0 FPO:7481618 SZ:2088/1565/0.75 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
rp:          INT32 GZIP DO:0 FPO:7483706 SZ:4511485/4652474/1.03 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
di:          BINARY GZIP DO:0 FPO:11995191 SZ:56/36/0.64 VC:2418197 ENC:BIT_PACKED,PLAIN,RLE
dp:          INT32 GZIP DO:0 FPO:11995247 SZ:56/36/0.64 VC:2418197 ENC:BIT_PACKED,PLAIN,RLE
pr:          INT32 GZIP DO:0 FPO:11995303 SZ:627/407/0.65 VC:2418197 ENC:BIT_PACKED,PLAIN_DICTIONARY
ob:          INT64 GZIP DO:0 FPO:11995930 SZ:3597/3998/1.11 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
ib:          INT64 GZIP DO:0 FPO:11999527 SZ:292939/918674/3.14 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
{noformat}


was (Author: myroch):
output from MR-tools meta. TS column is causing an issue:
{noformat}
java -jar c:\devel\parquet-mr\parquet-tools\target\parquet-tools-1.8.1.jar meta tmp.gz.parquet
file:        file:/C:/smaz/tmp.gz.parquet
creator:     parquet-mr version 1.8.1 (build 4aba4dae7bb0d4edbcf7923ae1339f28fd3f7fcf)

file schema: nat
--------------------------------------------------------------------------------
ts:          REQUIRED INT64 R:0 D:0
dr:          REQUIRED INT32 R:0 D:0
ui:          OPTIONAL BINARY O:UTF8 R:0 D:1
up:          OPTIONAL INT32 R:0 D:1
ri:          OPTIONAL BINARY O:UTF8 R:0 D:1
rp:          OPTIONAL INT32 R:0 D:1
di:          OPTIONAL BINARY O:UTF8 R:0 D:1
dp:          OPTIONAL INT32 R:0 D:1
pr:          REQUIRED INT32 R:0 D:0
ob:          OPTIONAL INT64 R:0 D:1
ib:          OPTIONAL INT64 R:0 D:1

row group 1: RC:2418197 TS:30601003 OFFSET:4
--------------------------------------------------------------------------------
ts:          INT64 GZIP DO:0 FPO:4 SZ:2630987/19172128/7.29 VC:2418197 ENC:BIT_PACKED,PLAIN,PLAIN_DICTIONARY
dr:          INT32 GZIP DO:0 FPO:2630991 SZ:333876/1197646/3.59 VC:2418197 ENC:BIT_PACKED,PLAIN_DICTIONARY
ui:          BINARY GZIP DO:0 FPO:2964867 SZ:2088/1565/0.75 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
up:          INT32 GZIP DO:0 FPO:2966955 SZ:4514663/4652474/1.03 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
ri:          BINARY GZIP DO:0 FPO:7481618 SZ:2088/1565/0.75 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
rp:          INT32 GZIP DO:0 FPO:7483706 SZ:4511485/4652474/1.03 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
di:          BINARY GZIP DO:0 FPO:11995191 SZ:56/36/0.64 VC:2418197 ENC:BIT_PACKED,PLAIN,RLE
dp:          INT32 GZIP DO:0 FPO:11995247 SZ:56/36/0.64 VC:2418197 ENC:BIT_PACKED,PLAIN,RLE
pr:          INT32 GZIP DO:0 FPO:11995303 SZ:627/407/0.65 VC:2418197 ENC:BIT_PACKED,PLAIN_DICTIONARY
ob:          INT64 GZIP DO:0 FPO:11995930 SZ:3597/3998/1.11 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
ib:          INT64 GZIP DO:0 FPO:11999527 SZ:292939/918674/3.14 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
{noformat}

> Apache Drill cannot read parquet generated outside Drill: Reading past
> RLE/BitPacking stream
> ---------------------------------------------------------------------
>
>                 Key: DRILL-4464
>                 URL: https://issues.apache.org/jira/browse/DRILL-4464
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.4.0, 1.5.0
>
[jira] [Updated] (DRILL-4464) Apache Drill cannot read parquet generated outside Drill: Reading past RLE/BitPacking stream
[ https://issues.apache.org/jira/browse/DRILL-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miroslav Holubec updated DRILL-4464:
------------------------------------
    Affects Version/s: 1.4.0

Description:

When I generate a file using MapReduce and parquet 1.8.1 (or 1.8.1-drill-r0) that contains a REQUIRED INT64 field, I am not able to read this column in Drill, but I am able to read the full content using parquet-tools cat/dump. This doesn't happen every time; it is input-data dependent (so probably a different encoding is chosen by parquet for the given column?).

Error reported by Drill:
{noformat}
2016-03-02 03:01:16,354 [29296305-abe2-f4bd-ded0-27bb53f631f0:frag:3:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalArgumentException: Reading past RLE/BitPacking stream.

Fragment 3:0

[Error Id: e2d02152-1b67-4c9f-9cb1-bd2b9ff302d8 on drssc9a4:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalArgumentException: Reading past RLE/BitPacking stream.

Fragment 3:0

[Error Id: e2d02152-1b67-4c9f-9cb1-bd2b9ff302d8 on drssc9a4:31010]
	at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) ~[drill-common-1.4.0.jar:1.4.0]
	at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:321) [drill-java-exec-1.4.0.jar:1.4.0]
	at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184) [drill-java-exec-1.4.0.jar:1.4.0]
	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290) [drill-java-exec-1.4.0.jar:1.4.0]
	at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.4.0.jar:1.4.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_40]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet record reader.
Message: Hadoop path: /tmp/tmp.gz.parquet
Total records read: 131070
Mock records read: 0
Records to read: 21845
Row group index: 0
Records in row group: 2418197
Parquet Metadata: ParquetMetaData{FileMetaData{schema: message nat { required int64 ts; required int32 dr; optional binary ui (UTF8); optional int32 up; optional binary ri (UTF8); optional int32 rp; optional binary di (UTF8); optional int32 dp; required int32 pr; optional int64 ob; optional int64 ib; }, metadata: {}}, blocks: [BlockMetaData{2418197, 30601003 [ColumnMetaData{GZIP [ts] INT64 [PLAIN_DICTIONARY, BIT_PACKED, PLAIN], 4}, ColumnMetaData{GZIP [dr] INT32 [PLAIN_DICTIONARY, BIT_PACKED], 2630991}, ColumnMetaData{GZIP [ui] BINARY [PLAIN_DICTIONARY, RLE, BIT_PACKED], 2964867}, ColumnMetaData{GZIP [up] INT32 [PLAIN_DICTIONARY, RLE, BIT_PACKED], 2966955}, ColumnMetaData{GZIP [ri] BINARY [PLAIN_DICTIONARY, RLE, BIT_PACKED], 7481618}, ColumnMetaData{GZIP [rp] INT32 [PLAIN_DICTIONARY, RLE, BIT_PACKED], 7483706}, ColumnMetaData{GZIP [di] BINARY [RLE, BIT_PACKED, PLAIN], 11995191}, ColumnMetaData{GZIP [dp] INT32 [RLE, BIT_PACKED, PLAIN], 11995247}, ColumnMetaData{GZIP [pr] INT32 [PLAIN_DICTIONARY, BIT_PACKED], 11995303}, ColumnMetaData{GZIP [ob] INT64 [PLAIN_DICTIONARY, RLE, BIT_PACKED], 11995930}, ColumnMetaData{GZIP [ib] INT64 [PLAIN_DICTIONARY, RLE, BIT_PACKED], 11999527}]}]}
	at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise(ParquetRecordReader.java:345) ~[drill-java-exec-1.4.0.jar:1.4.0]
	at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:447) ~[drill-java-exec-1.4.0.jar:1.4.0]
	at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:191) ~[drill-java-exec-1.4.0.jar:1.4.0]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.4.0.jar:1.4.0]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.4.0.jar:1.4.0]
	at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.4.0.jar:1.4.0]
	at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:132) ~[drill-java-exec-1.4.0.jar:1.4.0]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.4.0.jar:1.4.0]
	at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) ~[drill-java-exec-1.4.0.jar:1.4.0]
	at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93) ~[drill-java-exec-1.4.0.jar:1.4.0]
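For context on the "Reading past RLE/BitPacking stream" message: Parquet stores definition levels and dictionary indices in an RLE/bit-packed hybrid encoding, where each run begins with a ULEB128 varint header whose low bit selects the run type (0 = RLE, 1 = bit-packed). A decoder asked for more values than the stream actually encodes hits exactly this condition. Below is a minimal Python sketch of the RLE path only, as an illustration of the format rather than Drill's reader:

```python
def decode_rle_runs(buf, count, byte_width=1):
    """Decode `count` values from a Parquet RLE/bit-packed hybrid stream,
    handling RLE runs only (header low bit == 0). Raises when asked to
    read past the end of the stream, the condition Drill reports."""
    out, pos = [], 0
    while len(out) < count:
        if pos >= len(buf):
            raise ValueError("Reading past RLE/BitPacking stream.")
        # ULEB128 varint run header
        header, shift = 0, 0
        while True:
            if pos >= len(buf):
                raise ValueError("Reading past RLE/BitPacking stream.")
            b = buf[pos]
            pos += 1
            header |= (b & 0x7F) << shift
            if not (b & 0x80):
                break
            shift += 7
        if header & 1:
            # Bit-packed groups are omitted from this sketch.
            raise NotImplementedError("bit-packed run not handled here")
        run_len = header >> 1
        # RLE run: one fixed-width little-endian value, repeated run_len times.
        value = int.from_bytes(buf[pos:pos + byte_width], "little")
        pos += byte_width
        out.extend([value] * run_len)
    return out[:count]
```

A stream encoding five copies of the value 7 is the two bytes `0x0A 0x07` (header 5<<1, then the value); asking such a decoder for a sixth value triggers the error seen in the stack trace.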