[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179526#comment-15179526 ] ASF GitHub Bot commented on DRILL-4281: --- Github user yufeldman commented on the pull request: https://github.com/apache/drill/pull/400#issuecomment-192171659 A couple of general comments: 1. Since you are using the Hadoop UGI, it probably makes sense to be more compliant with the Hadoop auth definitions, which are: a "superuser" can proxy for user(s), group(s) and host(s). Maybe adding a group that can proxy is OK, but it is not what is done in the Hadoop world today. - hadoop.proxyuser.superuser.hosts: comma-separated hosts from which superuser access is allowed for impersonation. * means wildcard. - hadoop.proxyuser.superuser.groups: comma-separated groups to which users impersonated by the superuser belong. * means wildcard. 2. I think what we call here delegate/delegator is true impersonation; what we call "chained impersonation" is kind of the opposite of impersonation, as it increases privileges rather than restricting them. > Drill should support inbound impersonation > -- > > Key: DRILL-4281 > URL: https://issues.apache.org/jira/browse/DRILL-4281 > Project: Apache Drill > Issue Type: Improvement >Reporter: Keys Botzum >Assignee: Sudheesh Katkam > Labels: doc-impacting, security > > Today Drill supports impersonation *to* external sources. For example, I can > authenticate to Drill as myself and then Drill will access HDFS using > impersonation. > In many scenarios we also need impersonation to Drill. For example, I might > use some front-end tool (such as Tableau) and authenticate to it as myself. > That tool (server version) then needs to access Drill to perform queries, and > I want those queries to run as myself, not as the Tableau user. While in > theory the intermediate tool could store the userid & password for every user > to Drill, this isn't a scalable or very secure solution. 
> Note that HS2 today does support inbound impersonation as described here: > https://issues.apache.org/jira/browse/HIVE-5155 > The above is not the best approach, as it is tied to the connection object, > which is very coarse-grained and potentially expensive. It would be better if > there were a call on the ODBC/JDBC driver to switch the identity on an existing > connection. Most modern SQL databases (Oracle, DB2) support such a function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
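The Hadoop proxyuser rules referenced in the comment above (hosts and groups lists, with * as a wildcard) can be sketched in plain Java. This is an illustrative evaluation of the rule semantics only, not Hadoop's actual implementation; the class name ProxyUserRules is hypothetical:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of Hadoop-style proxyuser rules:
 *  hadoop.proxyuser.<superuser>.hosts and .groups, where "*" is a wildcard. */
public class ProxyUserRules {
    private final Set<String> allowedHosts;
    private final Set<String> allowedGroups;

    public ProxyUserRules(String hostsCsv, String groupsCsv) {
        // Each property is a comma-separated list; trim surrounding whitespace.
        this.allowedHosts = new HashSet<>(Arrays.asList(hostsCsv.split("\\s*,\\s*")));
        this.allowedGroups = new HashSet<>(Arrays.asList(groupsCsv.split("\\s*,\\s*")));
    }

    /** True if a request from requestHost, impersonating a user whose groups
     *  are userGroups, is allowed under these rules. */
    public boolean isAllowed(String requestHost, Set<String> userGroups) {
        boolean hostOk = allowedHosts.contains("*") || allowedHosts.contains(requestHost);
        boolean groupOk = allowedGroups.contains("*")
                || userGroups.stream().anyMatch(allowedGroups::contains);
        return hostOk && groupOk;
    }

    public static void main(String[] args) {
        ProxyUserRules rules = new ProxyUserRules("host1,host2", "analysts");
        System.out.println(rules.isAllowed("host1", Set.of("analysts"))); // true
        System.out.println(rules.isAllowed("host3", Set.of("analysts"))); // false
    }
}
```

Both conditions must hold: the originating host and the impersonated user's group membership are checked independently.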
[jira] [Commented] (DRILL-4465) Refactor Parsing and Planning to canonicalize planning and parsing
[ https://issues.apache.org/jira/browse/DRILL-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179450#comment-15179450 ] ASF GitHub Bot commented on DRILL-4465: --- Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/401 > Refactor Parsing and Planning to canonicalize planning and parsing > -- > > Key: DRILL-4465 > URL: https://issues.apache.org/jira/browse/DRILL-4465 > Project: Apache Drill > Issue Type: Sub-task > Components: Query Planning & Optimization >Reporter: Jacques Nadeau >Assignee: Jinfeng Ni > Fix For: 1.6.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4467) Invalid projection created using PrelUtil.getColumns
[ https://issues.apache.org/jira/browse/DRILL-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacques Nadeau updated DRILL-4467: -- Fix Version/s: 1.6.0 > Invalid projection created using PrelUtil.getColumns > > > Key: DRILL-4467 > URL: https://issues.apache.org/jira/browse/DRILL-4467 > Project: Apache Drill > Issue Type: Bug >Reporter: Laurent Goujon >Assignee: Laurent Goujon >Priority: Critical > Fix For: 1.6.0 > > > In {{DrillPushProjIntoScan}}, a new scan and a new projection are created > using {{PrelUtil#getColumns(RelDataType, List<RexNode>)}}. > The returned {{ProjectPushInfo}} instance has several fields, one of them > {{desiredFields}}, which is the list of projected fields. There is one instance > per {{RexNode}}, but because the instances were initially added to a set, they > might not be in the order in which they were created. > The issue happens in the following code: > {code:java} > List<RexNode> newProjects = Lists.newArrayList(); > for (RexNode n : proj.getChildExps()) { > newProjects.add(n.accept(columnInfo.getInputRewriter())); > } > {code} > This code creates a new list of projects out of the initial ones by mapping > the indices from the old projects to the new projects, but the indices of the > new RexNode instances might be out of order (because of the ordering of > desiredFields). And if the indices are out of order, the check > {{ProjectRemoveRule.isTrivial(newProj)}} will fail. > My guess is that the desiredFields ordering should be preserved when instances > are added, to satisfy the condition above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4467) Invalid projection created using PrelUtil.getColumns
[ https://issues.apache.org/jira/browse/DRILL-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179446#comment-15179446 ] Jacques Nadeau commented on DRILL-4467: --- Yes, agree that desiredFields should be a newLinkedHashSet. > Invalid projection created using PrelUtil.getColumns > > > Key: DRILL-4467 > URL: https://issues.apache.org/jira/browse/DRILL-4467 > Project: Apache Drill > Issue Type: Bug >Reporter: Laurent Goujon >Assignee: Laurent Goujon >Priority: Critical > Fix For: 1.6.0 > > > In {{DrillPushProjIntoScan}}, a new scan and a new projection are created > using {{PrelUtil#getColumns(RelDataType, List<RexNode>)}}. > The returned {{ProjectPushInfo}} instance has several fields, one of them > {{desiredFields}}, which is the list of projected fields. There is one instance > per {{RexNode}}, but because the instances were initially added to a set, they > might not be in the order in which they were created. > The issue happens in the following code: > {code:java} > List<RexNode> newProjects = Lists.newArrayList(); > for (RexNode n : proj.getChildExps()) { > newProjects.add(n.accept(columnInfo.getInputRewriter())); > } > {code} > This code creates a new list of projects out of the initial ones by mapping > the indices from the old projects to the new projects, but the indices of the > new RexNode instances might be out of order (because of the ordering of > desiredFields). And if the indices are out of order, the check > {{ProjectRemoveRule.isTrivial(newProj)}} will fail. > My guess is that the desiredFields ordering should be preserved when instances > are added, to satisfy the condition above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
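The fix agreed on above, keeping desiredFields in a LinkedHashSet so that insertion order is preserved, comes down to a standard JDK collection property, illustrated here with plain JDK classes (not Drill's own types):

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class InsertionOrderDemo {
    public static void main(String[] args) {
        // HashSet makes no ordering guarantee: iteration order depends on hash codes.
        Set<String> hashed = new HashSet<>();
        // LinkedHashSet iterates in insertion order, which is what a field list
        // built while visiting expressions needs to keep indices stable.
        Set<String> linked = new LinkedHashSet<>();
        for (String f : List.of("c", "a", "b")) {
            hashed.add(f);
            linked.add(f);
        }
        System.out.println(linked); // [c, a, b] -- insertion order preserved
        System.out.println(hashed); // some hash-dependent order
    }
}
```

Both sets still deduplicate; only the iteration order differs, which is exactly what the index-mapping code above depends on.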
[jira] [Updated] (DRILL-4467) Invalid projection created using PrelUtil.getColumns
[ https://issues.apache.org/jira/browse/DRILL-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacques Nadeau updated DRILL-4467: -- Priority: Critical (was: Major) > Invalid projection created using PrelUtil.getColumns > > > Key: DRILL-4467 > URL: https://issues.apache.org/jira/browse/DRILL-4467 > Project: Apache Drill > Issue Type: Bug >Reporter: Laurent Goujon >Assignee: Laurent Goujon >Priority: Critical > > In {{DrillPushProjIntoScan}}, a new scan and a new projection are created > using {{PrelUtil#getColumns(RelDataType, List<RexNode>)}}. > The returned {{ProjectPushInfo}} instance has several fields, one of them > {{desiredFields}}, which is the list of projected fields. There is one instance > per {{RexNode}}, but because the instances were initially added to a set, they > might not be in the order in which they were created. > The issue happens in the following code: > {code:java} > List<RexNode> newProjects = Lists.newArrayList(); > for (RexNode n : proj.getChildExps()) { > newProjects.add(n.accept(columnInfo.getInputRewriter())); > } > {code} > This code creates a new list of projects out of the initial ones by mapping > the indices from the old projects to the new projects, but the indices of the > new RexNode instances might be out of order (because of the ordering of > desiredFields). And if the indices are out of order, the check > {{ProjectRemoveRule.isTrivial(newProj)}} will fail. > My guess is that the desiredFields ordering should be preserved when instances > are added, to satisfy the condition above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4437) Implement framework for testing operators in isolation
[ https://issues.apache.org/jira/browse/DRILL-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179369#comment-15179369 ] ASF GitHub Bot commented on DRILL-4437: --- Github user parthchandra commented on the pull request: https://github.com/apache/drill/pull/394#issuecomment-192114749 +1. Great to have this framework; nicely done. > Implement framework for testing operators in isolation > -- > > Key: DRILL-4437 > URL: https://issues.apache.org/jira/browse/DRILL-4437 > Project: Apache Drill > Issue Type: Test > Components: Tools, Build & Test >Reporter: Jason Altekruse >Assignee: Jason Altekruse > Fix For: 1.6.0 > > > Most of the tests written for Drill are end-to-end. We spin up a full > instance of the server, submit one or more SQL queries, and check the results. > While integration tests like this are useful for ensuring that features > don't break end-user functionality, overuse of this approach > has caused a number of pain points. > Overall, the tests end up running a lot of the exact same code, parsing and > planning many similar queries. > Creating consistent reproductions of issues, especially edge cases found in > clustered environments, can be extremely difficult. Even the simpler case of > testing whether operators can handle a particular series of > incoming batches of records has required hacks like generating files large > enough that the scanners happen to break them up into separate batches. > These tests are brittle, as they make assumptions about how the scanners will > work in the future. As an example of when this could break, we might do a perf > evaluation and find that we should be producing larger batches in some cases. > Existing tests that try to test multiple batches by producing a few > more records than the current threshold for batch size would no longer be > testing the same code paths. 
> We need to make more parts of the system testable without initializing the > entire Drill server, as well as making the different internal settings and > state of the server configurable for tests. > This is a first effort to enable testing the physical operators in Drill by > mocking the components of the system necessary to enable operators to > initialize and execute. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179333#comment-15179333 ] ASF GitHub Bot commented on DRILL-4281: --- Github user jacques-n commented on a diff in the pull request: https://github.com/apache/drill/pull/400#discussion_r54990639 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/util/UserDelegationUtil.java --- @@ -0,0 +1,147 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.util; + +import com.fasterxml.jackson.core.JsonGenerator; +import com.fasterxml.jackson.core.JsonParser; +import com.fasterxml.jackson.core.type.TypeReference; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.collect.Sets; +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.exec.server.options.OptionValue; +import org.apache.drill.exec.server.options.TypeValidators; +import org.apache.hadoop.security.UserGroupInformation; + +import java.io.IOException; +import java.util.List; +import java.util.Set; + +/** + * Utilities for user delegation purpose. 
+ */ +public class UserDelegationUtil { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(UserDelegationUtil.class); + + private static final String STAR = "*"; + + private static final ObjectMapper delegationDefinitionsMapper = new ObjectMapper(); + + static { + delegationDefinitionsMapper.configure(JsonGenerator.Feature.QUOTE_FIELD_NAMES, false); + delegationDefinitionsMapper.configure(JsonParser.Feature.ALLOW_UNQUOTED_FIELD_NAMES, true); + } + + private static class DelegationDefinition { +public UserGroupDefinition delegates = new UserGroupDefinition(); +public UserGroupDefinition delegators = new UserGroupDefinition(); + } + + private static class UserGroupDefinition { +public Set<String> users = Sets.newHashSet(); +public Set<String> groups = Sets.newHashSet(); + } + + /** + * Deserialize delegation definitions string to a list of delegation definition objects. + * + * @param delegationDefinitions delegation definitions as a string + * @return delegation definitions as a list of objects + * @throws IOException + */ + public static List<DelegationDefinition> deserializeDelegationDefinitions(final String delegationDefinitions) + throws IOException { +return delegationDefinitionsMapper.readValue(delegationDefinitions, +new TypeReference<List<DelegationDefinition>>() {}); + } + + /** + * Validator for delegation definitions. 
+ */ + public static class DelegationDefinitionsValidator extends TypeValidators.AdminOptionValidator { + +public DelegationDefinitionsValidator(String name, String def) { + super(name, def); +} + +@Override +public void validate(OptionValue v) { + super.validate(v); + + final List definitions; + try { +definitions = deserializeDelegationDefinitions(v.string_val); + } catch (final IOException e) { +throw UserException.validationError() +.message("Invalid delegation definition.\nDetails: %s", e.getMessage()) +.build(logger); + } + + for (final DelegationDefinition definition : definitions) { +if (definition.delegates.users.contains(STAR) || +definition.delegates.groups.contains(STAR)) { + throw UserException.validationError() + .message("No wildcard delegates allowed.") + .build(logger); +} + } +} + } + + /** + * Check if the given delegate is authorized to delegate for the delegator based on the delegation definitions. + * + * @param delegateName delegate name + * @param delegatorName delegator
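The javadoc quoted above is cut off before the body of the authorization check. A minimal sketch of such a check, with hypothetical names and under the assumption that each definition lists delegate users plus delegator users and groups (matching the DelegationDefinition fields in the diff), might look like this:

```java
import java.util.Set;

/** Hypothetical sketch of the delegate/delegator authorization check
 *  described in the truncated javadoc above. A delegate may act as a
 *  delegator if the definition names the delegate, and the delegator's
 *  name or one of the delegator's groups (or "*") is listed. */
public class DelegationCheck {
    public static boolean canDelegate(String delegateName,
                                      String delegatorName,
                                      Set<String> delegatorGroups,
                                      Set<String> definedDelegates,
                                      Set<String> definedDelegators,
                                      Set<String> definedDelegatorGroups) {
        // Wildcard delegates are rejected at validation time (see the
        // validator above), so only an exact delegate match is checked here.
        if (!definedDelegates.contains(delegateName)) {
            return false;
        }
        if (definedDelegators.contains("*") || definedDelegatorGroups.contains("*")) {
            return true;
        }
        if (definedDelegators.contains(delegatorName)) {
            return true;
        }
        for (String group : delegatorGroups) {
            if (definedDelegatorGroups.contains(group)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(canDelegate("svc", "alice", Set.of("analysts"),
                Set.of("svc"), Set.of("alice"), Set.of())); // true
    }
}
```

Note the asymmetry the review comments discuss: the wildcard is allowed on the delegator side but forbidden on the delegate side, since a wildcard delegate would let any user impersonate anyone.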
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179261#comment-15179261 ] ASF GitHub Bot commented on DRILL-4281: --- Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/400#discussion_r54987453 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/util/UserDelegationUtil.java --- @@ -0,0 +1,147 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.util; + +import com.fasterxml.jackson.core.JsonGenerator; +import com.fasterxml.jackson.core.JsonParser; +import com.fasterxml.jackson.core.type.TypeReference; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.collect.Sets; +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.exec.server.options.OptionValue; +import org.apache.drill.exec.server.options.TypeValidators; +import org.apache.hadoop.security.UserGroupInformation; + +import java.io.IOException; +import java.util.List; +import java.util.Set; + +/** + * Utilities for user delegation purpose. 
+ */ +public class UserDelegationUtil { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(UserDelegationUtil.class); + + private static final String STAR = "*"; + + private static final ObjectMapper delegationDefinitionsMapper = new ObjectMapper(); + + static { + delegationDefinitionsMapper.configure(JsonGenerator.Feature.QUOTE_FIELD_NAMES, false); + delegationDefinitionsMapper.configure(JsonParser.Feature.ALLOW_UNQUOTED_FIELD_NAMES, true); + } + + private static class DelegationDefinition { +public UserGroupDefinition delegates = new UserGroupDefinition(); +public UserGroupDefinition delegators = new UserGroupDefinition(); + } + + private static class UserGroupDefinition { +public Set<String> users = Sets.newHashSet(); +public Set<String> groups = Sets.newHashSet(); + } + + /** + * Deserialize delegation definitions string to a list of delegation definition objects. + * + * @param delegationDefinitions delegation definitions as a string + * @return delegation definitions as a list of objects + * @throws IOException + */ + public static List<DelegationDefinition> deserializeDelegationDefinitions(final String delegationDefinitions) + throws IOException { +return delegationDefinitionsMapper.readValue(delegationDefinitions, +new TypeReference<List<DelegationDefinition>>() {}); + } + + /** + * Validator for delegation definitions. 
+ */ + public static class DelegationDefinitionsValidator extends TypeValidators.AdminOptionValidator { + +public DelegationDefinitionsValidator(String name, String def) { + super(name, def); +} + +@Override +public void validate(OptionValue v) { + super.validate(v); + + final List definitions; + try { +definitions = deserializeDelegationDefinitions(v.string_val); + } catch (final IOException e) { +throw UserException.validationError() +.message("Invalid delegation definition.\nDetails: %s", e.getMessage()) +.build(logger); + } + + for (final DelegationDefinition definition : definitions) { +if (definition.delegates.users.contains(STAR) || +definition.delegates.groups.contains(STAR)) { + throw UserException.validationError() + .message("No wildcard delegates allowed.") + .build(logger); +} + } +} + } + + /** + * Check if the given delegate is authorized to delegate for the delegator based on the delegation definitions. + * + * @param delegateName delegate name + * @param delegatorName
[jira] [Commented] (DRILL-4416) Quote path separator for windows
[ https://issues.apache.org/jira/browse/DRILL-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179210#comment-15179210 ] ASF GitHub Bot commented on DRILL-4416: --- Github user hnfgns commented on the pull request: https://github.com/apache/drill/pull/385#issuecomment-192074959 This patch causes a random leak. I am backing it off for a while. > Quote path separator for windows > > > Key: DRILL-4416 > URL: https://issues.apache.org/jira/browse/DRILL-4416 > Project: Apache Drill > Issue Type: Bug >Reporter: Hanifi Gunes >Assignee: Hanifi Gunes > Fix For: 1.7.0 > > > Windows uses backslash as its path separator. We need to do string > manipulation using the separator, during which the separator must be quoted. > This issue proposes (i) creating a global static path separator variable in > common, (ii) removing all others, and (iii) using the quoted separator where > needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4416) Quote path separator for windows
[ https://issues.apache.org/jira/browse/DRILL-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanifi Gunes updated DRILL-4416: Fix Version/s: 1.7.0 > Quote path separator for windows > > > Key: DRILL-4416 > URL: https://issues.apache.org/jira/browse/DRILL-4416 > Project: Apache Drill > Issue Type: Bug >Reporter: Hanifi Gunes >Assignee: Hanifi Gunes > Fix For: 1.7.0 > > > Windows uses backslash as its path separator. We need to do string > manipulation using the separator, during which the separator must be quoted. > This issue proposes (i) creating a global static path separator variable in > common, (ii) removing all others, and (iii) using the quoted separator where > needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
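Quoting the separator matters because String.split takes a regular expression, and a lone backslash is an invalid regex. The JDK's Pattern.quote shows the behavior the issue describes; this is an illustrative sketch of the problem, not Drill's patch:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuotedSeparatorDemo {
    public static void main(String[] args) {
        String windowsPath = "C:\\data\\drill\\file.parquet";
        String separator = "\\"; // what File.separator is on Windows

        // Passed unquoted to split(), "\\" is a dangling regex escape and
        // throws PatternSyntaxException. Pattern.quote makes it a literal.
        String[] parts = windowsPath.split(Pattern.quote(separator));
        System.out.println(Arrays.toString(parts));
        // [C:, data, drill, file.parquet]
    }
}
```

On Unix the separator "/" happens to be regex-safe, which is why this class of bug only shows up on Windows.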
[jira] [Commented] (DRILL-4325) ForemanException: One or more nodes lost connectivity during query
[ https://issues.apache.org/jira/browse/DRILL-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179143#comment-15179143 ] Jacques Nadeau commented on DRILL-4325: --- [~vicky], would you be willing to run your same oversaturation test and see if our stability is better with this simple patch? https://github.com/jacques-n/drill/tree/DRILL-4466b I'd like to see if we can help the kernel scheduler enough that it runs work at a larger quantum. This won't solve the gross over-parallelization issue directly, but it may help the system context switch less. In reality, a change in scheduling won't actually impact the core problem of too many simultaneous tasks. No matter the threading model, having 4000 tasks competing for ~40 logical cores is going to mean slow progress. Clearly we need to increase the switch quantum in these cases so we make forward progress (hopefully helped by my small patch). However, if we target a quantum of 100ms, that means tasks would wait 10s between each 100ms of work. In other words, we can't schedule this many tasks and expect speedy forward progress. We need to enable inbound controls as well as ensure that we reduce the parallelization behavior on a heavily loaded node. > ForemanException: One or more nodes lost connectivity during query > -- > > Key: DRILL-4325 > URL: https://issues.apache.org/jira/browse/DRILL-4325 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.5.0 >Reporter: Victoria Markman > Attachments: drillbit.log.133, drillbit.log.134, drillbit.log.135, > drillbit.log.136, stats.133.tar, stats.134.tar, stats.135.tar, stats.136.tar, > zookeeper.log > > > The picture pretty much looks like this: a bunch of queries are running > (usually something more involved than just simple functional tests), usually > tpch or tpcds with lots of major fragments, like query74 from tpcds. 
> Zookeeper decides that a particular node is dead, and queries that were running > at the time of the connection loss are failed by Drill (which is correct > behavior, I think). > It seems that I can reliably reproduce this issue when I bump up the number of > concurrently running queries and make all of them go to the same foreman node > (I don't really imply here that planning is to blame, it just seems to > reproduce more easily). > On my 4-node cluster I can pretty much reproduce this problem reliably by > running: > run.sh -s Advanced/tpcds/tpcds_sf100/original -g smoke -t 600 -n 10 > {code} > 2016-01-28 16:30:20,146 [29554d63-b478-6bae-f0f6-435d9f33ffdf:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 29554d63-b478-6bae-f0f6-435d9f33ffdf: select * from sys.version > 2016-01-28 16:30:22,844 [29554d61-2789-babb-54e5-22b701bf2f64:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 29554d61-2789-babb-54e5-22b701bf2f64: select * from sys.drillbits > 2016-01-28 16:30:23,281 [29554d60-5bbd-dae1-c38d-21708ad37fbe:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 29554d60-5bbd-dae1-c38d-21708ad37fbe: alter system set > `planner.enable_decimal_data_type` = true > 2016-01-28 16:30:24,889 [29554d5e-d243-6299-3103-58b180135854:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 29554d5e-d243-6299-3103-58b180135854: use `dfs.tpcds_sf100_parquet_views` > 2016-01-28 16:30:24,931 [29554d5e-b395-14aa-42a4-f6f248059363:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 29554d5e-b395-14aa-42a4-f6f248059363: use `dfs.tpcds_sf100_parquet_views` > 2016-01-28 16:30:24,964 [29554d5f-24ac-cf00-714c-7419d3894af0:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 29554d5f-24ac-cf00-714c-7419d3894af0: use `dfs.tpcds_sf100_parquet_views` > 2016-01-28 16:30:24,998 [29554d5e-ae92-6306-3495-be5cb7f98139:foreman] INFO > 
o.a.drill.exec.work.foreman.Foreman - Query text for query id > 29554d5e-ae92-6306-3495-be5cb7f98139: use `dfs.tpcds_sf100_parquet_views` > 2016-01-28 16:30:25,040 [29554d5e-1a20-3d6d-143b-0ee3bcd4aa11:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 29554d5e-1a20-3d6d-143b-0ee3bcd4aa11: use `dfs.tpcds_sf100_parquet_views` > 2016-01-28 16:30:25,073 [29554d5d-e7b4-c61c-9735-ce37938aa47d:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 29554d5d-e7b4-c61c-9735-ce37938aa47d: use `dfs.tpcds_sf100_parquet_views` > 2016-01-28 16:30:25,106 [29554d5d-823b-0536-e4df-4c6cef64b3e4:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 29554d5d-823b-0536-e4df-4c6cef64b3e4: use `dfs.tpcds_sf100_parquet_views` > 2016-01-28 16:30:25,131
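The back-of-the-envelope numbers in the comment above (4000 tasks competing for ~40 logical cores at a 100 ms quantum) check out arithmetically; this simply restates the comment's figures, it is not a measurement:

```java
public class QuantumMath {
    public static void main(String[] args) {
        int tasks = 4000;
        int cores = 40;
        double quantumMs = 100.0;

        // Under round-robin scheduling, each core cycles through tasks/cores tasks.
        int tasksPerCore = tasks / cores;                 // 100 tasks share each core
        double waitMs = (tasksPerCore - 1) * quantumMs;   // wait per 100 ms slice of work

        System.out.println(tasksPerCore);   // 100
        System.out.println(waitMs / 1000);  // 9.9 -- roughly the 10 s quoted above
    }
}
```

Hence the conclusion in the comment: a larger quantum helps forward progress per slice, but only admission control and reduced parallelization fix the wait itself.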
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179140#comment-15179140 ] ASF GitHub Bot commented on DRILL-4281: --- Github user vkorukanti commented on a diff in the pull request: https://github.com/apache/drill/pull/400#discussion_r54980440 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/util/UserDelegationUtil.java --- @@ -0,0 +1,147 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.util; + +import com.fasterxml.jackson.core.JsonGenerator; +import com.fasterxml.jackson.core.JsonParser; +import com.fasterxml.jackson.core.type.TypeReference; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.collect.Sets; +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.exec.server.options.OptionValue; +import org.apache.drill.exec.server.options.TypeValidators; +import org.apache.hadoop.security.UserGroupInformation; + +import java.io.IOException; +import java.util.List; +import java.util.Set; + +/** + * Utilities for user delegation purpose. 
+ */ +public class UserDelegationUtil { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(UserDelegationUtil.class); + + private static final String STAR = "*"; + + private static final ObjectMapper delegationDefinitionsMapper = new ObjectMapper(); + + static { + delegationDefinitionsMapper.configure(JsonGenerator.Feature.QUOTE_FIELD_NAMES, false); + delegationDefinitionsMapper.configure(JsonParser.Feature.ALLOW_UNQUOTED_FIELD_NAMES, true); + } + + private static class DelegationDefinition { +public UserGroupDefinition delegates = new UserGroupDefinition(); +public UserGroupDefinition delegators = new UserGroupDefinition(); + } + + private static class UserGroupDefinition { +public Set<String> users = Sets.newHashSet(); +public Set<String> groups = Sets.newHashSet(); + } + + /** + * Deserialize delegation definitions string to a list of delegation definition objects. + * + * @param delegationDefinitions delegation definitions as a string + * @return delegation definitions as a list of objects + * @throws IOException + */ + public static List<DelegationDefinition> deserializeDelegationDefinitions(final String delegationDefinitions) + throws IOException { +return delegationDefinitionsMapper.readValue(delegationDefinitions, +new TypeReference<List<DelegationDefinition>>() {}); + } + + /** + * Validator for delegation definitions. 
+ */ + public static class DelegationDefinitionsValidator extends TypeValidators.AdminOptionValidator { + +public DelegationDefinitionsValidator(String name, String def) { + super(name, def); +} + +@Override +public void validate(OptionValue v) { + super.validate(v); + + final List definitions; + try { +definitions = deserializeDelegationDefinitions(v.string_val); + } catch (final IOException e) { +throw UserException.validationError() +.message("Invalid delegation definition.\nDetails: %s", e.getMessage()) +.build(logger); + } + + for (final DelegationDefinition definition : definitions) { +if (definition.delegates.users.contains(STAR) || +definition.delegates.groups.contains(STAR)) { + throw UserException.validationError() + .message("No wildcard delegates allowed.") + .build(logger); +} + } +} + } + + /** + * Check if the given delegate is authorized to delegate for the delegator based on the delegation definitions. + * + * @param delegateName delegate name + * @param delegatorName
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179126#comment-15179126 ] ASF GitHub Bot commented on DRILL-4281: --- Github user vkorukanti commented on a diff in the pull request: https://github.com/apache/drill/pull/400#discussion_r54979630 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/rpc/user/security/testing/UserAuthenticatorToTestDelegation.java --- @@ -0,0 +1,72 @@ +package org.apache.drill.exec.rpc.user.security.testing; +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +import org.apache.drill.common.config.DrillConfig; +import org.apache.drill.exec.exception.DrillbitStartupException; +import org.apache.drill.exec.rpc.user.security.UserAuthenticationException; +import org.apache.drill.exec.rpc.user.security.UserAuthenticator; +import org.apache.drill.exec.rpc.user.security.UserAuthenticatorTemplate; +import org.apache.drill.exec.util.ImpersonationUtil; + +import java.io.IOException; + +import static org.apache.drill.exec.delegation.TestUserDelegation.OWNER; +import static org.apache.drill.exec.delegation.TestUserDelegation.OWNER_PASSWORD; +import static org.apache.drill.exec.delegation.TestUserDelegation.DELEGATOR_NAME; +import static org.apache.drill.exec.delegation.TestUserDelegation.DELEGATOR_PASSWORD; +import static org.apache.drill.exec.delegation.TestUserDelegation.DELEGATE_NAME; +import static org.apache.drill.exec.delegation.TestUserDelegation.DELEGATE_PASSWORD; + +/** + * Used by {@link org.apache.drill.exec.delegation.TestUserDelegation}. + * + * Needs to be in this package. + */ +@UserAuthenticatorTemplate(type = UserAuthenticatorToTestDelegation.TYPE) +public class UserAuthenticatorToTestDelegation implements UserAuthenticator { --- End diff -- Can you add the new users to the existing test authenticator impl, UserAuthenticatorTestImpl.class? > Drill should support inbound impersonation > -- > > Key: DRILL-4281 > URL: https://issues.apache.org/jira/browse/DRILL-4281 > Project: Apache Drill > Issue Type: Improvement >Reporter: Keys Botzum >Assignee: Sudheesh Katkam > Labels: doc-impacting, security > > Today Drill supports impersonation *to* external sources. For example, I can > authenticate to Drill as myself and then Drill will access HDFS using > impersonation. > In many scenarios we also need impersonation to Drill. For example, I might > use some front-end tool (such as Tableau) and authenticate to it as myself. 
> That tool (server version) then needs to access Drill to perform queries and > I want those queries to run as myself, not as the Tableau user. While in > theory the intermediate tool could store the userid & password for every user > to Drill, this isn't a scalable or very secure solution. > Note that HS2 today does support inbound impersonation as described here: > https://issues.apache.org/jira/browse/HIVE-5155 > The above is not the best approach as it is tied to the connection object, > which is very coarse grained and potentially expensive. It would be better if > there was a call on the ODBC/JDBC driver to switch the identity on an existing > connection. Most modern SQL databases (Oracle, DB2) support such a function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179096#comment-15179096 ] ASF GitHub Bot commented on DRILL-4281: --- Github user jacques-n commented on the pull request: https://github.com/apache/drill/pull/400#issuecomment-192046477 Generally looks good. +1 with the few small items above addressed. Updating the names to something else would be good. Since this is also impersonation (just client impersonation instead of storage plugin impersonation) I'm not sure I would shy away from using the term. The main goal for me is clear directionality. I think "principals" works well for the first piece. Ideas for the second: "can_execute_as", "can_impersonate", "can_act_as", ?
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179091#comment-15179091 ] ASF GitHub Bot commented on DRILL-4281: --- Github user jacques-n commented on a diff in the pull request: https://github.com/apache/drill/pull/400#discussion_r54977484 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/delegation/TestUserDelegation.java --- @@ -0,0 +1,124 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.delegation; + +import com.google.common.collect.Maps; +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.exec.dotdrill.DotDrillType; +import org.apache.drill.exec.impersonation.BaseTestImpersonation; +import org.apache.drill.exec.rpc.user.UserSession; +import org.apache.drill.exec.rpc.user.security.testing.UserAuthenticatorToTestDelegation; +import org.apache.drill.exec.store.dfs.WorkspaceConfig; +import org.apache.hadoop.fs.FileStatus; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.fs.permission.FsPermission; +import org.junit.BeforeClass; +import org.junit.Test; + +import java.util.Map; +import java.util.Properties; + +import static org.junit.Assert.assertEquals; + +public class TestUserDelegation extends BaseTestImpersonation { --- End diff -- Can you also add some negative tests that confirm nice error messages? (User tries to delegate to disallowed user, group, etc)
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179086#comment-15179086 ] ASF GitHub Bot commented on DRILL-4281: --- Github user jacques-n commented on a diff in the pull request: https://github.com/apache/drill/pull/400#discussion_r54977306 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/delegation/TestDelegationPrivileges.java --- @@ -0,0 +1,137 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.delegation; + +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.exec.impersonation.BaseTestImpersonation; +import org.apache.drill.exec.server.options.OptionValue; +import org.apache.drill.exec.util.UserDelegationUtil; +import org.junit.Test; + +import static junit.framework.Assert.assertEquals; + +public class TestDelegationPrivileges extends BaseTestImpersonation { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestDelegationPrivileges.class); + + // definitions on which the tests are based + private static final String DELEGATION_DEFINITIONS = "[" + + "{ delegates : { users : [\"user0_1\"] }," + --- End diff -- Might be nice to put this in a file so we can have people refer to an example set of settings in the codebase (without having to filter out Java escaping).
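The reviewer's suggestion above is to keep the delegation definitions in a standalone file rather than an escaped Java string. A hypothetical definitions file might look like the following, based on the shape visible in the quoted test constant (the user and group names here are invented for illustration; note the quoted utility configures its ObjectMapper with ALLOW_UNQUOTED_FIELD_NAMES, so unquoted keys would also parse):

```json
[
  {
    "delegates":  { "users": ["app_server"], "groups": [] },
    "delegators": { "users": ["alice"], "groups": ["analysts"] }
  }
]
```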
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179083#comment-15179083 ] ASF GitHub Bot commented on DRILL-4281: --- Github user jacques-n commented on a diff in the pull request: https://github.com/apache/drill/pull/400#discussion_r54977178 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/util/UserDelegationUtil.java --- @@ -0,0 +1,147 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.util; + +import com.fasterxml.jackson.core.JsonGenerator; +import com.fasterxml.jackson.core.JsonParser; +import com.fasterxml.jackson.core.type.TypeReference; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.collect.Sets; +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.exec.server.options.OptionValue; +import org.apache.drill.exec.server.options.TypeValidators; +import org.apache.hadoop.security.UserGroupInformation; + +import java.io.IOException; +import java.util.List; +import java.util.Set; + +/** + * Utilities for user delegation purpose. 
+ */
+public class UserDelegationUtil {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(UserDelegationUtil.class);
+
+  private static final String STAR = "*";
+
+  private static final ObjectMapper delegationDefinitionsMapper = new ObjectMapper();
+
+  static {
+    delegationDefinitionsMapper.configure(JsonGenerator.Feature.QUOTE_FIELD_NAMES, false);
+    delegationDefinitionsMapper.configure(JsonParser.Feature.ALLOW_UNQUOTED_FIELD_NAMES, true);
+  }
+
+  private static class DelegationDefinition {
+    public UserGroupDefinition delegates = new UserGroupDefinition();
+    public UserGroupDefinition delegators = new UserGroupDefinition();
+  }
+
+  private static class UserGroupDefinition {
+    public Set<String> users = Sets.newHashSet();
+    public Set<String> groups = Sets.newHashSet();
+  }
+
+  /**
+   * Deserialize delegation definitions string to a list of delegation definition objects.
+   *
+   * @param delegationDefinitions delegation definitions as a string
+   * @return delegation definitions as a list of objects
+   * @throws IOException
+   */
+  public static List<DelegationDefinition> deserializeDelegationDefinitions(final String delegationDefinitions)
+      throws IOException {
+    return delegationDefinitionsMapper.readValue(delegationDefinitions,
+        new TypeReference<List<DelegationDefinition>>() {});
+  }
+
+  /**
+   * Validator for delegation definitions.
+   */
+  public static class DelegationDefinitionsValidator extends TypeValidators.AdminOptionValidator {
+
+    public DelegationDefinitionsValidator(String name, String def) {
+      super(name, def);
+    }
+
+    @Override
+    public void validate(OptionValue v) {
+      super.validate(v);
+
+      final List<DelegationDefinition> definitions;
+      try {
+        definitions = deserializeDelegationDefinitions(v.string_val);
+      } catch (final IOException e) {
+        throw UserException.validationError()
+            .message("Invalid delegation definition.\nDetails: %s", e.getMessage())
+            .build(logger);
+      }
+
+      for (final DelegationDefinition definition : definitions) {
+        if (definition.delegates.users.contains(STAR) ||
+            definition.delegates.groups.contains(STAR)) {
+          throw UserException.validationError()
+              .message("No wildcard delegates allowed.")
+              .build(logger);
+        }
+      }
+    }
+  }
+
+  /**
+   * Check if the given delegate is authorized to delegate for the delegator based on the delegation definitions.
+   *
+   * @param delegateName delegate name
+   * @param delegatorName delegator
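The quoted diff is cut off before the body of that authorization check, so its exact logic is not shown here. The following dependency-free sketch assumes a plausible shape based on the javadoc and the wildcard handling above; the class name, field names, and user/group values are invented for illustration:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class DelegationCheckSketch {

  private static final String STAR = "*";

  // Hypothetical mirror of one definition's "delegators" block: the users and
  // groups the delegate may act on behalf of (names invented for illustration).
  private static final Set<String> ALLOWED_USERS = new HashSet<>(Arrays.asList("alice"));
  private static final Set<String> ALLOWED_GROUPS = new HashSet<>(Arrays.asList("analysts"));

  // Assumed shape of the truncated check: a delegator is accepted if listed by
  // name, if one of its groups is listed, or if a wildcard is configured.
  public static boolean isAuthorized(final String delegatorName, final Set<String> delegatorGroups) {
    if (ALLOWED_USERS.contains(STAR) || ALLOWED_USERS.contains(delegatorName)) {
      return true;
    }
    for (final String group : delegatorGroups) {
      if (ALLOWED_GROUPS.contains(STAR) || ALLOWED_GROUPS.contains(group)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(isAuthorized("alice", new HashSet<String>()));                   // true: listed user
    System.out.println(isAuthorized("bob", new HashSet<>(Arrays.asList("analysts")))); // true: listed group
    System.out.println(isAuthorized("eve", new HashSet<String>()));                    // false: not authorized
  }
}
```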
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179082#comment-15179082 ] ASF GitHub Bot commented on DRILL-4281: --- Github user jacques-n commented on a diff in the pull request: https://github.com/apache/drill/pull/400#discussion_r54977126 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/util/UserDelegationUtil.java ---
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179077#comment-15179077 ] ASF GitHub Bot commented on DRILL-4281: --- Github user jacques-n commented on a diff in the pull request: https://github.com/apache/drill/pull/400#discussion_r54976899 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserSession.java --- @@ -116,14 +137,38 @@ public OptionManager getOptions() { return sessionOptions; } - public DrillUser getUser() { -return user; - } - public UserCredentials getCredentials() { return credentials; } + /** + * Replace current user credentials with the given user's credentials, if authorized. + * + * @param delegatorName delegator name + * @throws DrillRuntimeException if credentials cannot be replaced + */ + public void replaceUserCredentials(final String delegatorName) { +assert enableDelegation; --- End diff -- No need for assert, preconditions makes sure we get Exception instead of typically uncaptured Error subclass and this isn't perf sensitive.
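The reviewer's point is that a precondition check throws an Exception unconditionally, while a bare `assert` is skipped unless the JVM runs with `-ea` and raises an Error subclass when it does fire. A sketch of the difference follows; a hand-rolled `checkState` stands in for `com.google.common.base.Preconditions` to keep the example self-contained, and the method body beyond the check is assumed:

```java
public class PreconditionsSketch {

  // Minimal stand-in for Guava's Preconditions.checkState, so the sketch has
  // no dependencies; Drill itself would use com.google.common.base.Preconditions.
  public static void checkState(final boolean expression, final String message) {
    if (!expression) {
      throw new IllegalStateException(message);
    }
  }

  private final boolean enableDelegation;

  public PreconditionsSketch(final boolean enableDelegation) {
    this.enableDelegation = enableDelegation;
  }

  // Hypothetical shape of replaceUserCredentials: unlike `assert`, the check
  // fires even when the JVM runs without -ea, and raises an Exception rather
  // than an uncaught Error subclass.
  public void replaceUserCredentials(final String delegatorName) {
    checkState(enableDelegation, "User delegation is not enabled.");
    // ... authorization check and credential swap would follow ...
  }

  public static void main(String[] args) {
    try {
      new PreconditionsSketch(false).replaceUserCredentials("alice");
      System.out.println("no exception");
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```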
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179073#comment-15179073 ] ASF GitHub Bot commented on DRILL-4281: --- Github user jacques-n commented on a diff in the pull request: https://github.com/apache/drill/pull/400#discussion_r54976762 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java --- @@ -88,6 +89,7 @@ String USER_AUTHENTICATION_ENABLED = "drill.exec.security.user.auth.enabled"; String USER_AUTHENTICATOR_IMPL = "drill.exec.security.user.auth.impl"; String PAM_AUTHENTICATOR_PROFILES = "drill.exec.security.user.auth.pam_profiles"; + String USER_DELEGATION_ENABLED = "drill.exec.delegation.enabled"; --- End diff -- Isn't an empty delegation block enough? Any reason to have a second kill switch?
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179026#comment-15179026 ] ASF GitHub Bot commented on DRILL-4281: --- Github user sudheeshkatkam commented on the pull request: https://github.com/apache/drill/pull/400#issuecomment-192036063 I don't think they are common. How about "principals" and "can_delegate_for"? I am not strongly against "can_impersonate", but I want to avoid confusion with user impersonation. Does everything else look good?
[jira] [Commented] (DRILL-3623) Limit 0 should avoid execution when querying a known schema
[ https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178659#comment-15178659 ] ASF GitHub Bot commented on DRILL-3623: --- Github user sudheeshkatkam commented on the pull request: https://github.com/apache/drill/pull/193#issuecomment-191973058 Moving to #405. > Limit 0 should avoid execution when querying a known schema > --- > > Key: DRILL-3623 > URL: https://issues.apache.org/jira/browse/DRILL-3623 > Project: Apache Drill > Issue Type: Sub-task > Components: Storage - Hive >Affects Versions: 1.1.0 > Environment: MapR cluster >Reporter: Andries Engelbrecht >Assignee: Sudheesh Katkam > Labels: doc-impacting > Fix For: Future > > > Running a select * from hive.table limit 0 does not return (hangs). > Select * from hive.table limit 1 works fine > Hive table is about 6GB with 330 files with parquet using snappy compression. > Data types are int, bigint, string and double. > Querying directory with parquet files through the DFS plugin works fine > select * from dfs.root.`/user/hive/warehouse/database/table` limit 0; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3623) Limit 0 should avoid execution when querying a known schema
[ https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178660#comment-15178660 ] ASF GitHub Bot commented on DRILL-3623: --- Github user sudheeshkatkam closed the pull request at: https://github.com/apache/drill/pull/193
[jira] [Commented] (DRILL-3623) Limit 0 should avoid execution when querying a known schema
[ https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178655#comment-15178655 ] ASF GitHub Bot commented on DRILL-3623: --- GitHub user sudheeshkatkam opened a pull request: https://github.com/apache/drill/pull/405 DRILL-3623: For limit 0 queries, use a shorter path when result column types are known Moving from #193 to here. + There is a pull request open for first commit (DRILL-4372: #397). + Second commit has a "nice to have" check: ensuring planning and execution types match. + My changes are in the third commit (e4cfdfa). Please review this. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sudheeshkatkam/drill DRILL-3623-pr Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/405.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #405 commit c553365e39947ba6c95d645cc971cf4d696ee758 Author: Sudheesh Katkam Date: 2015-12-22T04:38:59Z DRILL-4372: Expose the functions return type to Drill - Drill-Calcite version update: This commit needs to have Calcite's patch (CALCITE-1062) to plug in customized SqlOperator. - FunctionTemplate Add FunctionArgumentNumber annotation. This annotation element tells if the number of argument(s) is fixed or arbitrary (e.g., String concatenation function). Due to this modification, there are some minor changes in DrillFuncHolder, DrillFunctionRegistry and FunctionAttributes. - Checker Add a new Checker (which Calcite uses to validate the legitimacy of the number of argument(s) for a function) to allow functions with arbitrary arguments to pass Calcite's validation - Type conversion between Drill and Calcite DrillConstExecutor is given a static method getDrillTypeFromCalcite() to convert Calcite types to Drill's. 
- Extract function's return type inference Unlike other functions, Extract function's return type can be determined solely based on the first argument. Logic is added to allow this inference to happen - DrillCalcite wrapper: From the aspects of return type inference and argument type checks, Calcite's mechanism is very different from Drill's. In addition, currently, there is no straightforward way for Drill to plug in customized mechanisms to Calcite. Thus, wrappers are provided to serve the objective. Except for the mechanisms of type inference and argument type checks, these wrappers just forward any method calls to the wrapped SqlOperator, SqlFunction or SqlAggFunction to respond. An interface DrillCalciteSqlWrapper is also added for the callers of the three wrappers to get the wrapped objects easier. Due to these wrappers, UnsupportedOperatorsVisitor is modified in a minor manner. - Calcite's SqlOperator, SqlFunction or SqlAggFunction are wrapped in DrillOperatorTable Instead of returning Calcite's native SqlOperator, SqlFunction or SqlAggFunction, return the wrapped ones to ensure customized behaviors can be adopted. - Type inference mechanism This mechanism is used across all SqlOperator, SqlFunction or SqlAggFunction. 
Thus, it is factored out as its own method in TypeInferenceUtils - Upgrade Drill-Calcite Bump version number to 1.4.0-drill-test-r16 - Implement two argument version of lpad, rpad - Implement one argument version of ltrim, rtrim, btrim commit c3f0649e3ebb45d54e747f099d6699150bfa9869 Author: Hsuan-Yi Chu Date: 2016-02-03T05:17:50Z DRILL-4372: Part 2: Optionally ensure planning and execution types match commit e4cfdfa9b0562d52ac07f6d80860a82fa8baba40 Author: Sudheesh Katkam Date: 2016-03-03T21:25:39Z DRILL-3623: For limit 0 queries, use a shorter path when result column types are known
[jira] [Commented] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178627#comment-15178627 ] ASF GitHub Bot commented on DRILL-4281: --- Github user jacques-n commented on the pull request: https://github.com/apache/drill/pull/400#issuecomment-191963668 Are delegate and delegator commonly used terms for these things? Frankly, I would have to look these up to confirm which is which and could see making a mistake reversing them. Any way we can make them something with clearer directionality? (If everybody else thinks that this distinction and directionality is super clear, nevermind.) > Drill should support inbound impersonation > -- > > Key: DRILL-4281 > URL: https://issues.apache.org/jira/browse/DRILL-4281 > Project: Apache Drill > Issue Type: Improvement >Reporter: Keys Botzum >Assignee: Sudheesh Katkam > Labels: doc-impacting, security > > Today Drill supports impersonation *to* external sources. For example I can > authenticate to Drill as myself and then Drill will access HDFS using > impersonation. > In many scenarios we also need impersonation to Drill. For example I might > use some front end tool (such as Tableau) and authenticate to it as myself. > That tool (server version) then needs to access Drill to perform queries and > I want those queries to run as myself, not as the Tableau user. While in > theory the intermediate tool could store the userid & password for every user > to Drill, this isn't a scalable or very secure solution. > Note that HS2 today does support inbound impersonation as described here: > https://issues.apache.org/jira/browse/HIVE-5155 > The above is not the best approach as it is tied to the connection object > which is very coarse grained and potentially expensive. It would be better if > there was a call on the ODBC/JDBC driver to switch the identity on an existing > connection. Most modern SQL databases (Oracle, DB2) support such a function. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
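The comment thread above is about naming the two sides of inbound impersonation; whatever the names, enforcement boils down to a policy lookup similar to Hadoop's hadoop.proxyuser.* properties: a service account (the Tableau-style intermediary) may only run queries as end users it is explicitly authorized for. A hypothetical sketch, not Drill's actual implementation; the class and account names are made up:

```java
import java.util.Map;
import java.util.Set;

// Illustrative inbound-impersonation policy check: which end users may a given
// proxy (service) account run queries as? "*" acts as a wildcard, mirroring
// the Hadoop proxy-user convention mentioned in the discussion.
public class ImpersonationPolicy {
    private final Map<String, Set<String>> allowed; // proxy user -> permitted target users

    public ImpersonationPolicy(Map<String, Set<String>> allowed) {
        this.allowed = allowed;
    }

    public boolean canImpersonate(String proxyUser, String targetUser) {
        Set<String> targets = allowed.get(proxyUser);
        return targets != null && (targets.contains("*") || targets.contains(targetUser));
    }

    public static void main(String[] args) {
        ImpersonationPolicy p = new ImpersonationPolicy(
            Map.of("tableau_svc", Set.of("alice", "bob")));
        System.out.println(p.canImpersonate("tableau_svc", "alice"));   // true
        System.out.println(p.canImpersonate("tableau_svc", "mallory")); // false
    }
}
```

Checking this per query (rather than binding identity to the connection, as the HS2 approach does) is what allows identity switching on an existing connection.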
[jira] [Commented] (DRILL-4384) Query profile is missing important information on WebUi
[ https://issues.apache.org/jira/browse/DRILL-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178324#comment-15178324 ] Jason Altekruse commented on DRILL-4384: Fixed in c95b5432301fe487d64a1fc06e765228469fc3a2 > Query profile is missing important information on WebUi > --- > > Key: DRILL-4384 > URL: https://issues.apache.org/jira/browse/DRILL-4384 > Project: Apache Drill > Issue Type: Bug >Reporter: Jinfeng Ni >Assignee: Jacques Nadeau >Priority: Blocker > Fix For: 1.6.0 > > Attachments: DRILL-4384.patch > > > Built drill from master branch (0a2518d7cf01a92a27a82e29edac5424bedf31d5) and > started in embedded mode. Then, > run a query and checked the query profile through WebUI. However, > seems that the fragment profiles , operator profiles and visualized > plan sections are all empty. Tried both Mac and CentOS and hit the same > problem. > After doing a binary search over recent commits, seems the patch of > "DRILL-3581: Upgrade HPPC to 0.7.1" is the cause of broken query > profiles [1]. The query profile on the commits before DRILL-3581 > looks fine. > [1] > https://github.com/apache/drill/commit/d27127c94d5c08306697a5627a1bac5f144abb22 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4471) Add unit test for the Drill Web UI
Jason Altekruse created DRILL-4471: -- Summary: Add unit test for the Drill Web UI Key: DRILL-4471 URL: https://issues.apache.org/jira/browse/DRILL-4471 Project: Apache Drill Issue Type: Test Reporter: Jason Altekruse Assignee: Jason Altekruse While the Web UI isn't being very actively developed, a few times changes to the Drill build or internal parts of the server have broken parts of the Web UI. As the web UI is a primary interface for viewing cluster information, cancelling queries, configuring storage and other tasks, we really should add automated tests for it.
[jira] [Commented] (DRILL-4384) Query profile is missing important information on WebUi
[ https://issues.apache.org/jira/browse/DRILL-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178321#comment-15178321 ] Jason Altekruse commented on DRILL-4384: [~jni] Venki merged this yesterday along with some other outstanding patches. I do agree with you about the automated tests for the UI. I have opened DRILL-4471 to track this task. > Query profile is missing important information on WebUi > --- > > Key: DRILL-4384 > URL: https://issues.apache.org/jira/browse/DRILL-4384 > Project: Apache Drill > Issue Type: Bug >Reporter: Jinfeng Ni >Assignee: Jacques Nadeau >Priority: Blocker > Fix For: 1.6.0 > > Attachments: DRILL-4384.patch > > > Built drill from master branch (0a2518d7cf01a92a27a82e29edac5424bedf31d5) and > started in embedded mode. Then, > run a query and checked the query profile through WebUI. However, > seems that the fragment profiles , operator profiles and visualized > plan sections are all empty. Tried both Mac and CentOS and hit the same > problem. > After doing a binary search over recent commits, seems the patch of > "DRILL-3581: Upgrade HPPC to 0.7.1" is the cause of broken query > profiles [1]. The query profile on the commits before DRILL-3581 > looks fine. > [1] > https://github.com/apache/drill/commit/d27127c94d5c08306697a5627a1bac5f144abb22 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4465) Refactor Parsing and Planning to canonicalize planning and parsing
[ https://issues.apache.org/jira/browse/DRILL-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178268#comment-15178268 ] ASF GitHub Bot commented on DRILL-4465: --- Github user jacques-n commented on the pull request: https://github.com/apache/drill/pull/401#issuecomment-191890837 @jinfengni: I've addressed your review comments. Let me know any additional feedback. thanks! > Refactor Parsing and Planning to canonicalize planning and parsing > -- > > Key: DRILL-4465 > URL: https://issues.apache.org/jira/browse/DRILL-4465 > Project: Apache Drill > Issue Type: Sub-task > Components: Query Planning & Optimization >Reporter: Jacques Nadeau >Assignee: Jinfeng Ni > Fix For: 1.6.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4465) Refactor Parsing and Planning to canonicalize planning and parsing
[ https://issues.apache.org/jira/browse/DRILL-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178153#comment-15178153 ] ASF GitHub Bot commented on DRILL-4465: --- Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/401#discussion_r54912350 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlParser.java --- @@ -0,0 +1,349 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.planner.sql; + +import java.util.Arrays; +import java.util.List; + +import org.apache.calcite.adapter.java.JavaTypeFactory; +import org.apache.calcite.avatica.util.Casing; +import org.apache.calcite.avatica.util.Quoting; +import org.apache.calcite.jdbc.CalciteSchemaImpl; +import org.apache.calcite.jdbc.JavaTypeFactoryImpl; +import org.apache.calcite.plan.ConventionTraitDef; +import org.apache.calcite.plan.RelOptCluster; +import org.apache.calcite.plan.RelOptCostFactory; +import org.apache.calcite.plan.RelOptTable; +import org.apache.calcite.plan.volcano.VolcanoPlanner; +import org.apache.calcite.prepare.CalciteCatalogReader; +import org.apache.calcite.rel.RelCollationTraitDef; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.type.RelDataType; +import org.apache.calcite.rel.type.RelDataTypeFactory; +import org.apache.calcite.rel.type.RelDataTypeSystemImpl; +import org.apache.calcite.rex.RexBuilder; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.calcite.sql.SqlNode; +import org.apache.calcite.sql.SqlOperatorTable; +import org.apache.calcite.sql.parser.SqlParseException; +import org.apache.calcite.sql.parser.SqlParser; +import org.apache.calcite.sql.parser.SqlParserImplFactory; +import org.apache.calcite.sql.parser.SqlParserPos; +import org.apache.calcite.sql.type.SqlTypeName; +import org.apache.calcite.sql.util.ChainedSqlOperatorTable; +import org.apache.calcite.sql.validate.SqlConformance; +import org.apache.calcite.sql.validate.SqlValidatorCatalogReader; +import org.apache.calcite.sql.validate.SqlValidatorImpl; +import org.apache.calcite.sql2rel.RelDecorrelator; +import org.apache.calcite.sql2rel.SqlToRelConverter; +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.exec.expr.fn.FunctionImplementationRegistry; +import org.apache.drill.exec.ops.UdfUtilities; +import org.apache.drill.exec.planner.cost.DrillCostBase; +import 
org.apache.drill.exec.planner.logical.DrillConstExecutor; +import org.apache.drill.exec.planner.physical.DrillDistributionTraitDef; +import org.apache.drill.exec.planner.physical.PlannerSettings; +import org.apache.drill.exec.planner.sql.parser.impl.DrillParserWithCompoundIdConverter; + +/** + * Class responsible for managing parsing, validation and toRel conversion for sql statements. + */ +public class DrillSqlParser { --- End diff -- SqlConverter seems fine to me. > Refactor Parsing and Planning to canonicalize planning and parsing > -- > > Key: DRILL-4465 > URL: https://issues.apache.org/jira/browse/DRILL-4465 > Project: Apache Drill > Issue Type: Sub-task > Components: Query Planning & Optimization >Reporter: Jacques Nadeau >Assignee: Jinfeng Ni > Fix For: 1.6.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4465) Refactor Parsing and Planning to canonicalize planning and parsing
[ https://issues.apache.org/jira/browse/DRILL-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178160#comment-15178160 ] ASF GitHub Bot commented on DRILL-4465: --- Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/401#discussion_r54912629 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlParser.java --- @@ -0,0 +1,349 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.planner.sql; + +import java.util.Arrays; +import java.util.List; + +import org.apache.calcite.adapter.java.JavaTypeFactory; +import org.apache.calcite.avatica.util.Casing; +import org.apache.calcite.avatica.util.Quoting; +import org.apache.calcite.jdbc.CalciteSchemaImpl; +import org.apache.calcite.jdbc.JavaTypeFactoryImpl; +import org.apache.calcite.plan.ConventionTraitDef; +import org.apache.calcite.plan.RelOptCluster; +import org.apache.calcite.plan.RelOptCostFactory; +import org.apache.calcite.plan.RelOptTable; +import org.apache.calcite.plan.volcano.VolcanoPlanner; +import org.apache.calcite.prepare.CalciteCatalogReader; +import org.apache.calcite.rel.RelCollationTraitDef; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.type.RelDataType; +import org.apache.calcite.rel.type.RelDataTypeFactory; +import org.apache.calcite.rel.type.RelDataTypeSystemImpl; +import org.apache.calcite.rex.RexBuilder; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.calcite.sql.SqlNode; +import org.apache.calcite.sql.SqlOperatorTable; +import org.apache.calcite.sql.parser.SqlParseException; +import org.apache.calcite.sql.parser.SqlParser; +import org.apache.calcite.sql.parser.SqlParserImplFactory; +import org.apache.calcite.sql.parser.SqlParserPos; +import org.apache.calcite.sql.type.SqlTypeName; +import org.apache.calcite.sql.util.ChainedSqlOperatorTable; +import org.apache.calcite.sql.validate.SqlConformance; +import org.apache.calcite.sql.validate.SqlValidatorCatalogReader; +import org.apache.calcite.sql.validate.SqlValidatorImpl; +import org.apache.calcite.sql2rel.RelDecorrelator; +import org.apache.calcite.sql2rel.SqlToRelConverter; +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.exec.expr.fn.FunctionImplementationRegistry; +import org.apache.drill.exec.ops.UdfUtilities; +import org.apache.drill.exec.planner.cost.DrillCostBase; +import 
org.apache.drill.exec.planner.logical.DrillConstExecutor; +import org.apache.drill.exec.planner.physical.DrillDistributionTraitDef; +import org.apache.drill.exec.planner.physical.PlannerSettings; +import org.apache.drill.exec.planner.sql.parser.impl.DrillParserWithCompoundIdConverter; + +/** + * Class responsible for managing parsing, validation and toRel conversion for sql statements. + */ +public class DrillSqlParser { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DrillSqlParser.class); + + private static DrillTypeSystem DRILL_TYPE_SYSTEM = new DrillTypeSystem(); + + private final JavaTypeFactory typeFactory; + private final SqlParser.Config parserConfig; + private final CalciteCatalogReader catalog; + private final PlannerSettings settings; + private final SchemaPlus rootSchema; + private final SchemaPlus defaultSchema; + private final SqlOperatorTable opTab; + private final RelOptCostFactory costFactory; + private final DrillValidator validator; + private final boolean isInnerQuery; + private final UdfUtilities util; + private final FunctionImplementationRegistry functions; + + private String sql; + private VolcanoPlanner planner; + + + public DrillSqlParser(PlannerSettings settings, SchemaPlus defaultSchema, + final SqlOperatorTable operatorTable, UdfUtilities util,
[jira] [Commented] (DRILL-4465) Refactor Parsing and Planning to canonicalize planning and parsing
[ https://issues.apache.org/jira/browse/DRILL-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178092#comment-15178092 ] ASF GitHub Bot commented on DRILL-4465: --- Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/401#discussion_r54907758 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java --- @@ -273,12 +282,90 @@ public RelNode visit(RelNode other) { } + /** + * Transform RelNode to a new RelNode without changing any traits. Also will log the outcome. + * + * @param plannerType + * The type of Planner to use. + * @param phase + * The transformation phase we're running. + * @param input + * The original RelNode + * @return The transformed relnode. + */ + private RelNode transform(PlannerType plannerType, PlannerPhase phase, RelNode input) { +return transform(plannerType, phase, input, input.getTraitSet()); + } + + /** + * Transform RelNode to a new RelNode, targeting the provided set of traits. Also will log the outcome. + * + * @param plannerType + * The type of Planner to use. + * @param phase + * The transformation phase we're running. + * @param input + * The original RelNode + * @param targetTraits + * The traits we are targeting for output. + * @return The transformed relnode. 
+ */ + protected RelNode transform(PlannerType plannerType, PlannerPhase phase, RelNode input, RelTraitSet targetTraits) { +final Stopwatch watch = Stopwatch.createStarted(); +final RuleSet rules = config.getRules(phase); +final RelTraitSet toTraits = targetTraits.simplify(); + +final RelNode output; +switch (plannerType) { +case HEP_BOTTOM_UP: +case HEP: { + final HepProgramBuilder hepPgmBldr = new HepProgramBuilder(); + if (plannerType == PlannerType.HEP_BOTTOM_UP) { +hepPgmBldr.addMatchOrder(HepMatchOrder.BOTTOM_UP); + } + for (RelOptRule rule : rules) { +hepPgmBldr.addRuleInstance(rule); + } + + final HepPlanner planner = new HepPlanner(hepPgmBldr.build(), context.getPlannerSettings()); + + final List<RelMetadataProvider> list = Lists.newArrayList(); + list.add(DrillDefaultRelMetadataProvider.INSTANCE); + planner.registerMetadataProviders(list); + final RelMetadataProvider cachingMetaDataProvider = new CachingRelMetadataProvider( + ChainedRelMetadataProvider.of(list), planner); + + // Modify RelMetaProvider for every RelNode in the SQL operator Rel tree. + input.accept(new MetaDataProviderModifier(cachingMetaDataProvider)); + planner.setRoot(input); + if (!input.getTraitSet().equals(targetTraits)) { +planner.changeTraits(input, toTraits); + } + output = planner.findBestExp(); + break; +} +case VOLCANO: +default: { + // as weird as it seems, the cluster's only planner is the volcano planner. + final RelOptPlanner planner = input.getCluster().getPlanner(); + final Program program = Programs.of(rules); + output = program.run(planner, input, toTraits); + + break; +} +} + +log(plannerType.name() + ":" + phase.description, output, logger, watch); --- End diff -- Sorry for the confusion. You are right that there is no impact when debug is disabled. The reason for the performance difference is that the IDE enables debug mode, which causes the unit test to run longer. As long as debug is disabled, we would not see a difference. 
> Refactor Parsing and Planning to canonicalize planning and parsing > -- > > Key: DRILL-4465 > URL: https://issues.apache.org/jira/browse/DRILL-4465 > Project: Apache Drill > Issue Type: Sub-task > Components: Query Planning & Optimization >Reporter: Jacques Nadeau >Assignee: Jinfeng Ni > Fix For: 1.6.0 > >
[jira] [Commented] (DRILL-4441) IN operator does not work with Avro reader
[ https://issues.apache.org/jira/browse/DRILL-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178020#comment-15178020 ] Jason Altekruse commented on DRILL-4441: I need to fix this for varbinary too, but to make the test case work I had to fix casts from varchar to varbinary, as it does not appear we support binary literals. > IN operator does not work with Avro reader > -- > > Key: DRILL-4441 > URL: https://issues.apache.org/jira/browse/DRILL-4441 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Avro >Affects Versions: 1.5.0 > Environment: Ubuntu >Reporter: Stefán Baxter >Assignee: Jason Altekruse >Priority: Critical > Fix For: 1.6.0 > > > IN operator simply does not work. > (And I find it interesting that Storage-Avro is not available here in Jira as > a Storage component) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-4441) IN operator does not work with Avro reader
[ https://issues.apache.org/jira/browse/DRILL-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse reassigned DRILL-4441: -- Assignee: Jason Altekruse > IN operator does not work with Avro reader > -- > > Key: DRILL-4441 > URL: https://issues.apache.org/jira/browse/DRILL-4441 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Avro >Affects Versions: 1.5.0 > Environment: Ubuntu >Reporter: Stefán Baxter >Assignee: Jason Altekruse >Priority: Critical > Fix For: 1.6.0 > > > IN operator simply does not work. > (And I find it interesting that Storage-Avro is not available here in Jira as > a Storage component)
[jira] [Comment Edited] (DRILL-2048) Malformed drill storage config stored in zookeeper will prevent Drill from starting
[ https://issues.apache.org/jira/browse/DRILL-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177961#comment-15177961 ] james norris edited comment on DRILL-2048 at 3/3/16 3:32 PM: - This also happens when a HIVE storage plugin is configured to point to a thrift server, and that thrift server is then unavailable (i.e. turned off). This renders Drill unusable. was (Author: norri...@gmail.com): This also happens when a HIVE storage plugin is configured to point to a thrift server, and that thrift server is then unavailable. This renders Drill unusable. > Malformed drill storage config stored in zookeeper will prevent Drill from > starting > -- > > Key: DRILL-2048 > URL: https://issues.apache.org/jira/browse/DRILL-2048 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Other >Reporter: Jason Altekruse > Fix For: Future > > > We noticed this problem while trying to test dev builds on a common cluster. > When applying changes that added a field to the configuration of a storage > plugin, the new format of the configuration would be persisted in zookeeper. > When a different dev build that did not include the change set tried to be > deployed on the same cluster the config stored in zookeeper would fail to > parse and the drillbit would not be able to start. This is not system > critical configuration so the drillbit should be able to still start with the > plugin disabled. > This fix could also include changing the jackson mapper to allow ignoring > unexpected fields in the configuration. This would give a little better > chance for interoperability between future versions of Drill as we add new > configuration options as necessary.
[jira] [Commented] (DRILL-2048) Malformed drill storage config stored in zookeeper will prevent Drill from starting
[ https://issues.apache.org/jira/browse/DRILL-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177961#comment-15177961 ] james norris commented on DRILL-2048: - This also happens when a HIVE storage plugin is configured to point to a thrift server, and that thrift server is then unavailable. This renders Drill unusable. > Malformed drill storage config stored in zookeeper will prevent Drill from > starting > -- > > Key: DRILL-2048 > URL: https://issues.apache.org/jira/browse/DRILL-2048 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Other >Reporter: Jason Altekruse > Fix For: Future > > > We noticed this problem while trying to test dev builds on a common cluster. > When applying changes that added a field to the configuration of a storage > plugin, the new format of the configuration would be persisted in zookeeper. > When a different dev build that did not include the change set tried to be > deployed on the same cluster the config stored in zookeeper would fail to > parse and the drillbit would not be able to start. This is not system > critical configuration so the drillbit should be able to still start with the > plugin disabled. > This fix could also include changing the jackson mapper to allow ignoring > unexpected fields in the configuration. This would give a little better > chance for interoperability between future versions of Drill as we add new > configuration options as necessary.
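The last paragraph of DRILL-2048 suggests making the config deserializer tolerate unknown fields (for Jackson this would mean not failing on unknown properties) so an older build can still start against config written by a newer one. The toy key=value parser below illustrates the strict-versus-tolerant distinction; the format and field names are made up for the example and are not Drill's storage-plugin schema.

```java
import java.util.*;

// Toy config parser contrasting strict parsing (fail on unknown fields, as the
// bug describes) with tolerant parsing (skip fields this build does not know).
public class TolerantConfigParser {
    // Fields this (hypothetical) build of the software understands.
    static final Set<String> KNOWN = Set.of("type", "enabled", "connection");

    public static Map<String, String> parse(String config, boolean strict) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String entry : config.split(",")) {
            String[] kv = entry.split("=", 2);
            if (!KNOWN.contains(kv[0])) {
                if (strict) throw new IllegalArgumentException("unknown field: " + kv[0]);
                continue; // tolerant mode: ignore fields written by a newer build
            }
            out.put(kv[0], kv[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        // "newField" simulates a field added by a newer dev build.
        System.out.println(parse("type=hive,enabled=true,newField=x", false));
    }
}
```

In tolerant mode the unknown field is dropped and startup can proceed with the rest of the config intact, which is the interoperability behavior the issue asks for.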
[jira] [Created] (DRILL-4470) TPC-H dataset 404
Michael Mior created DRILL-4470: --- Summary: TPC-H dataset 404 Key: DRILL-4470 URL: https://issues.apache.org/jira/browse/DRILL-4470 Project: Apache Drill Issue Type: Bug Reporter: Michael Mior The URL for the TPC-H sample data is returning 404 which breaks the build. http://apache-drill.s3.amazonaws.com/files//sf-0.01_tpc-h_parquet_typed.tgz
[jira] [Commented] (DRILL-4469) SUM window query returns incorrect results over integer data
[ https://issues.apache.org/jira/browse/DRILL-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177764#comment-15177764 ] Khurram Faraaz commented on DRILL-4469: --- Query plan for the query that returns wrong results. {noformat} 0: jdbc:drill:schema=dfs.tmp> explain plan for SELECT SUM(c1) OVER w FROM (select * from dfs.tmp.`t_alltype`) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING); +--+--+ | text | json | +--+--+ | 00-00Screen 00-01 Project(EXPR$0=[$0]) 00-02Project(w0$o0=[$2]) 00-03 Window(window#0=[window(partition {0} order by [0] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1)])]) 00-04SelectionVectorRemover 00-05 Sort(sort0=[$0], sort1=[$0], dir0=[ASC], dir1=[ASC]) 00-06Project(T6¦¦*=[$0], $1=[ITEM($0, 'c1')]) 00-07 Project(T6¦¦*=[$0]) 00-08Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tmp/t_alltype]], selectionRoot=maprfs:/tmp/t_alltype, numFiles=1, usedMetadataFile=false, columns=[`*`]]]) {noformat} > SUM window query returns incorrect results over integer data > > > Key: DRILL-4469 > URL: https://issues.apache.org/jira/browse/DRILL-4469 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.6.0 > Environment: 4 node CentOS cluster >Reporter: Khurram Faraaz >Priority: Critical > Labels: window_function > Attachments: t_alltype.csv, t_alltype.parquet > > > SUM window query returns incorrect results as compared to Postgres, with or > without the frame clause in the window definition. Note that there is a sub > query involved and data in column c1 is sorted integer data with no nulls. 
> Drill 1.6.0 commit ID: 6d5f4983 > Results from Drill 1.6.0 > {noformat} > 0: jdbc:drill:schema=dfs.tmp> SELECT SUM(c1) OVER w FROM (select * from > dfs.tmp.`t_alltype`) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1 RANGE > BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING); > +-+ > | EXPR$0 | > +-+ > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > ... > | 10585 | > | 10585 | > | 10585 | > ++ > 145 rows selected (0.257 seconds) > {noformat} > results from Postgres 9.3 > {noformat} > postgres=# SELECT SUM(c1) OVER w FROM (select * from t_alltype) subQry WINDOW > w AS (PARTITION BY c8 ORDER BY c1 RANGE BETWEEN UNBOUNDED PRECEDING AND > UNBOUNDED FOLLOWING); > sum > -- > 4499 > 4499 > 4499 > 4499 > 4499 > 4499 > ... > 5613 > 5613 > 5613 > 473 > 473 > 473 > 473 > 473 > (145 rows) > {noformat} > Removing the frame clause from window definition, still results in completely > different results on Postgres vs Drill > Results from Drill 1.6.0 > {noformat} > 0: jdbc:drill:schema=dfs.tmp>SELECT SUM(c1) OVER w FROM (select * from > t_alltype) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1); > +-+ > | EXPR$0 | > +-+ > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > ... > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > ++ > 145 rows selected (0.28 seconds) > {noformat} > Results from Postgres > {noformat} > postgres=# SELECT SUM(c1) OVER w FROM (select * from t_alltype) subQry WINDOW > w AS (PARTITION BY c8 ORDER BY c1); > sum > -- > 5 >12 >21 >33 >47 >62 >78 >96 > 115 > 135 > 158 > 182 > 207 > 233 > 260 > 289 > ... > 4914 > 5051 > 5189 > 5328 > 5470 > 5613 > 8 >70 > 198 > 332 > 473 > (145 rows) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4469) SUM window query returns incorrect results over integer data
[ https://issues.apache.org/jira/browse/DRILL-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Khurram Faraaz updated DRILL-4469: -- Attachment: t_alltype.csv t_alltype.parquet Attached data files here. > SUM window query returns incorrect results over integer data > > > Key: DRILL-4469 > URL: https://issues.apache.org/jira/browse/DRILL-4469 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.6.0 > Environment: 4 node CentOS cluster >Reporter: Khurram Faraaz >Priority: Critical > Labels: window_function > Attachments: t_alltype.csv, t_alltype.parquet > > > SUM window query returns incorrect results as compared to Postgres, with or > without the frame clause in the window definition. Note that there is a sub > query involved and data in column c1 is sorted integer data with no nulls. > Drill 1.6.0 commit ID: 6d5f4983 > Results from Drill 1.6.0 > {noformat} > 0: jdbc:drill:schema=dfs.tmp> SELECT SUM(c1) OVER w FROM (select * from > dfs.tmp.`t_alltype`) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1 RANGE > BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING); > +-+ > | EXPR$0 | > +-+ > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > ... > | 10585 | > | 10585 | > | 10585 | > ++ > 145 rows selected (0.257 seconds) > {noformat} > results from Postgres 9.3 > {noformat} > postgres=# SELECT SUM(c1) OVER w FROM (select * from t_alltype) subQry WINDOW > w AS (PARTITION BY c8 ORDER BY c1 RANGE BETWEEN UNBOUNDED PRECEDING AND > UNBOUNDED FOLLOWING); > sum > -- > 4499 > 4499 > 4499 > 4499 > 4499 > 4499 > ... 
> 5613 > 5613 > 5613 > 473 > 473 > 473 > 473 > 473 > (145 rows) > {noformat} > Removing the frame clause from window definition, still results in completely > different results on Postgres vs Drill > Results from Drill 1.6.0 > {noformat} > 0: jdbc:drill:schema=dfs.tmp>SELECT SUM(c1) OVER w FROM (select * from > t_alltype) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1); > +-+ > | EXPR$0 | > +-+ > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > ... > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > ++ > 145 rows selected (0.28 seconds) > {noformat} > Results from Postgres > {noformat} > postgres=# SELECT SUM(c1) OVER w FROM (select * from t_alltype) subQry WINDOW > w AS (PARTITION BY c8 ORDER BY c1); > sum > -- > 5 >12 >21 >33 >47 >62 >78 >96 > 115 > 135 > 158 > 182 > 207 > 233 > 260 > 289 > ... > 4914 > 5051 > 5189 > 5328 > 5470 > 5613 > 8 >70 > 198 > 332 > 473 > (145 rows) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4469) SUM window query returns incorrect results over integer data
Khurram Faraaz created DRILL-4469:
-------------------------------------

             Summary: SUM window query returns incorrect results over integer data
                 Key: DRILL-4469
                 URL: https://issues.apache.org/jira/browse/DRILL-4469
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Flow
    Affects Versions: 1.6.0
         Environment: 4 node CentOS cluster
            Reporter: Khurram Faraaz
            Priority: Critical


A SUM window query returns incorrect results compared to Postgres, with or without a frame clause in the window definition. Note that a subquery is involved and the data in column c1 is sorted integer data with no nulls.

Drill 1.6.0 commit ID: 6d5f4983

Results from Drill 1.6.0
{noformat}
0: jdbc:drill:schema=dfs.tmp> SELECT SUM(c1) OVER w FROM (select * from dfs.tmp.`t_alltype`) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
+---------+
| EXPR$0  |
+---------+
| 10585   |
| 10585   |
| 10585   |
| 10585   |
| 10585   |
| 10585   |
...
| 10585   |
| 10585   |
| 10585   |
+---------+
145 rows selected (0.257 seconds)
{noformat}

Results from Postgres 9.3
{noformat}
postgres=# SELECT SUM(c1) OVER w FROM (select * from t_alltype) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
 sum
------
 4499
 4499
 4499
 4499
 4499
 4499
...
 5613
 5613
 5613
  473
  473
  473
  473
  473
(145 rows)
{noformat}

Removing the frame clause from the window definition still produces completely different results on Postgres vs Drill.

Results from Drill 1.6.0
{noformat}
0: jdbc:drill:schema=dfs.tmp> SELECT SUM(c1) OVER w FROM (select * from t_alltype) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1);
+---------+
| EXPR$0  |
+---------+
| 10585   |
| 10585   |
| 10585   |
| 10585   |
| 10585   |
| 10585   |
| 10585   |
| 10585   |
| 10585   |
...
| 10585   |
| 10585   |
| 10585   |
| 10585   |
| 10585   |
+---------+
145 rows selected (0.28 seconds)
{noformat}

Results from Postgres
{noformat}
postgres=# SELECT SUM(c1) OVER w FROM (select * from t_alltype) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1);
 sum
------
    5
   12
   21
   33
   47
   62
   78
   96
  115
  135
  158
  182
  207
  233
  260
  289
...
 4914
 5051
 5189
 5328
 5470
 5613
    8
   70
  198
  332
  473
(145 rows)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
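The Postgres output above follows the SQL-standard default frame: when a window has an ORDER BY but no explicit frame clause, the frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so each row receives a running sum and peer rows (equal ORDER BY keys) share a value, while Drill is returning the full-partition total for every row. A minimal Python sketch of the default-frame semantics (illustrative only, not Drill's or Postgres's implementation; the accessor-based API is hypothetical):

```python
from itertools import groupby

def windowed_sum(rows, part_key, order_key, val):
    """SUM(...) OVER (PARTITION BY ... ORDER BY ...) with the SQL-standard
    default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
    Peer rows (equal order_key) share one cumulative value."""
    out = []
    ordered = sorted(rows, key=lambda r: (part_key(r), order_key(r)))
    for _, part in groupby(ordered, key=part_key):
        running = 0
        # RANGE semantics: the frame ends at the last peer of the current
        # row, so the whole peer group is summed before values are emitted.
        for _, peers in groupby(part, key=order_key):
            peers = list(peers)
            running += sum(val(r) for r in peers)
            out.extend([running] * len(peers))
    return out
```

On data like the report's, this yields a per-partition running sum (as in the Postgres column) rather than one repeated partition total.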
[jira] [Commented] (DRILL-3688) Drill should honor "skip.header.line.count" and "skip.footer.line.count" attributes of Hive table
[ https://issues.apache.org/jira/browse/DRILL-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177527#comment-15177527 ]

ASF GitHub Bot commented on DRILL-3688:
---------------------------------------

Github user arina-ielchiieva closed the pull request at:

    https://github.com/apache/drill/pull/382


> Drill should honor "skip.header.line.count" and "skip.footer.line.count"
> attributes of Hive table
> -------------------------------------------------------------------------
>
>                 Key: DRILL-3688
>                 URL: https://issues.apache.org/jira/browse/DRILL-3688
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Hive
>    Affects Versions: 1.1.0
>         Environment: 1.1
>            Reporter: Hao Zhu
>            Assignee: Arina Ielchiieva
>              Labels: doc-impacting
>             Fix For: 1.6.0
>
>
> Currently Drill does not honor the "skip.header.line.count" attribute of a Hive table.
> It may cause other format conversion issues.
> Reproduce:
> 1. Create a Hive table:
> {code}
> create table h1db.testheader(col0 string)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
> STORED AS TEXTFILE
> tblproperties("skip.header.line.count"="1");
> {code}
> 2. Prepare sample data:
> {code}
> # cat test.data
> col0
> 2015-01-01
> {code}
> 3. Load the sample data into Hive:
> {code}
> LOAD DATA LOCAL INPATH '/xxx/test.data' OVERWRITE INTO TABLE h1db.testheader;
> {code}
> 4. Hive:
> {code}
> hive> select * from h1db.testheader ;
> OK
> 2015-01-01
> Time taken: 0.254 seconds, Fetched: 1 row(s)
> {code}
> 5. Drill:
> {code}
> > select * from hive.h1db.testheader ;
> +-------------+
> |    col0     |
> +-------------+
> | col0        |
> | 2015-01-01  |
> +-------------+
> 2 rows selected (0.257 seconds)
> > select cast(col0 as date) from hive.h1db.testheader ;
> Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must be in the range [1,12]
> Fragment 0:0
> [Error Id: 34353702-ca27-440b-a4f4-0c9f79fc8ccd on h1.poc.com:31010]
>   (org.joda.time.IllegalFieldValueException) Value 0 for monthOfYear must be in the range [1,12]
>     org.joda.time.field.FieldUtils.verifyValueBounds():236
>     org.joda.time.chrono.BasicChronology.getDateMidnightMillis():613
>     org.joda.time.chrono.BasicChronology.getDateTimeMillis():159
>     org.joda.time.chrono.AssembledChronology.getDateTimeMillis():120
>     org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.memGetDate():261
>     org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getDate():218
>     org.apache.drill.exec.test.generated.ProjectorGen0.doEval():67
>     org.apache.drill.exec.test.generated.ProjectorGen0.projectRecords():62
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():172
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
>     org.apache.drill.exec.record.AbstractRecordBatch.next():147
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():83
>     org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():79
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():73
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():261
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():255
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1566
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():255
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>     java.lang.Thread.run():745 (state=,code=0)
> {code}
> Also "skip.footer.line.count" should be taken into account.
> If "skip.header.line.count" or "skip.footer.line.count" has an incorrect value in Hive, Drill should throw an appropriate exception, e.g.:
> Hive table property skip.header.line.count value 'someValue' is non-numeric
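The behavior requested above can be sketched outside Hive/Drill: the two table properties simply drop the first and last N lines of each data file before parsing, and a non-numeric property value should fail fast with a clear message. A minimal Python sketch under those assumptions (function names are hypothetical, not Drill's API):

```python
def parse_skip_count(prop_name, value):
    """Validate a skip.header/footer.line.count table property,
    raising the kind of error the issue asks Drill to produce."""
    try:
        count = int(value)
    except (TypeError, ValueError):
        raise ValueError(
            "Hive table property %s value '%s' is non-numeric" % (prop_name, value))
    if count < 0:
        raise ValueError(
            "Hive table property %s value '%s' is negative" % (prop_name, value))
    return count

def read_skipping(lines, skip_header=0, skip_footer=0):
    """Drop the first skip_header and last skip_footer lines of one file,
    mirroring Hive's skip.header.line.count / skip.footer.line.count."""
    end = len(lines) - skip_footer
    return lines[skip_header:end] if end > skip_header else []
```

With the issue's sample file, skipping one header line leaves only the data row, so the later CAST to DATE would no longer see the literal string "col0".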
[jira] [Comment Edited] (DRILL-4464) Apache Drill cannot read parquet generated outside Drill: Reading past RLE/BitPacking stream
[ https://issues.apache.org/jira/browse/DRILL-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175566#comment-15175566 ]

Miroslav Holubec edited comment on DRILL-4464 at 3/3/16 9:01 AM:
-----------------------------------------------------------------

output from parquet-tools meta. TS column is causing an issue:
{noformat}
$ java -jar c:\devel\parquet-mr\parquet-tools\target\parquet-tools-1.8.1.jar meta tmp.gz.parquet
file:        file:/tmp/tmp.gz.parquet
creator:     parquet-mr version 1.8.1 (build 4aba4dae7bb0d4edbcf7923ae1339f28fd3f7fcf)

file schema: nat
--------------------------------------------------------------------------------
ts:          REQUIRED INT64 R:0 D:0
dr:          REQUIRED INT32 R:0 D:0
ui:          OPTIONAL BINARY O:UTF8 R:0 D:1
up:          OPTIONAL INT32 R:0 D:1
ri:          OPTIONAL BINARY O:UTF8 R:0 D:1
rp:          OPTIONAL INT32 R:0 D:1
di:          OPTIONAL BINARY O:UTF8 R:0 D:1
dp:          OPTIONAL INT32 R:0 D:1
pr:          REQUIRED INT32 R:0 D:0
ob:          OPTIONAL INT64 R:0 D:1
ib:          OPTIONAL INT64 R:0 D:1

row group 1: RC:2418197 TS:30601003 OFFSET:4
--------------------------------------------------------------------------------
ts:          INT64 GZIP DO:0 FPO:4 SZ:2630987/19172128/7.29 VC:2418197 ENC:BIT_PACKED,PLAIN,PLAIN_DICTIONARY
dr:          INT32 GZIP DO:0 FPO:2630991 SZ:333876/1197646/3.59 VC:2418197 ENC:BIT_PACKED,PLAIN_DICTIONARY
ui:          BINARY GZIP DO:0 FPO:2964867 SZ:2088/1565/0.75 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
up:          INT32 GZIP DO:0 FPO:2966955 SZ:4514663/4652474/1.03 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
ri:          BINARY GZIP DO:0 FPO:7481618 SZ:2088/1565/0.75 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
rp:          INT32 GZIP DO:0 FPO:7483706 SZ:4511485/4652474/1.03 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
di:          BINARY GZIP DO:0 FPO:11995191 SZ:56/36/0.64 VC:2418197 ENC:BIT_PACKED,PLAIN,RLE
dp:          INT32 GZIP DO:0 FPO:11995247 SZ:56/36/0.64 VC:2418197 ENC:BIT_PACKED,PLAIN,RLE
pr:          INT32 GZIP DO:0 FPO:11995303 SZ:627/407/0.65 VC:2418197 ENC:BIT_PACKED,PLAIN_DICTIONARY
ob:          INT64 GZIP DO:0 FPO:11995930 SZ:3597/3998/1.11 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
ib:          INT64 GZIP DO:0 FPO:11999527 SZ:292939/918674/3.14 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
{noformat}


was (Author: myroch):
output from MR-tools meta. TS column is causing an issue:
{noformat}
java -jar c:\devel\parquet-mr\parquet-tools\target\parquet-tools-1.8.1.jar meta tmp.gz.parquet
file:        file:/C:/smaz/tmp.gz.parquet
creator:     parquet-mr version 1.8.1 (build 4aba4dae7bb0d4edbcf7923ae1339f28fd3f7fcf)

file schema: nat
--------------------------------------------------------------------------------
ts:          REQUIRED INT64 R:0 D:0
dr:          REQUIRED INT32 R:0 D:0
ui:          OPTIONAL BINARY O:UTF8 R:0 D:1
up:          OPTIONAL INT32 R:0 D:1
ri:          OPTIONAL BINARY O:UTF8 R:0 D:1
rp:          OPTIONAL INT32 R:0 D:1
di:          OPTIONAL BINARY O:UTF8 R:0 D:1
dp:          OPTIONAL INT32 R:0 D:1
pr:          REQUIRED INT32 R:0 D:0
ob:          OPTIONAL INT64 R:0 D:1
ib:          OPTIONAL INT64 R:0 D:1

row group 1: RC:2418197 TS:30601003 OFFSET:4
--------------------------------------------------------------------------------
ts:          INT64 GZIP DO:0 FPO:4 SZ:2630987/19172128/7.29 VC:2418197 ENC:BIT_PACKED,PLAIN,PLAIN_DICTIONARY
dr:          INT32 GZIP DO:0 FPO:2630991 SZ:333876/1197646/3.59 VC:2418197 ENC:BIT_PACKED,PLAIN_DICTIONARY
ui:          BINARY GZIP DO:0 FPO:2964867 SZ:2088/1565/0.75 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
up:          INT32 GZIP DO:0 FPO:2966955 SZ:4514663/4652474/1.03 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
ri:          BINARY GZIP DO:0 FPO:7481618 SZ:2088/1565/0.75 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
rp:          INT32 GZIP DO:0 FPO:7483706 SZ:4511485/4652474/1.03 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
di:          BINARY GZIP DO:0 FPO:11995191 SZ:56/36/0.64 VC:2418197 ENC:BIT_PACKED,PLAIN,RLE
dp:          INT32 GZIP DO:0 FPO:11995247 SZ:56/36/0.64 VC:2418197 ENC:BIT_PACKED,PLAIN,RLE
pr:          INT32 GZIP DO:0 FPO:11995303 SZ:627/407/0.65 VC:2418197 ENC:BIT_PACKED,PLAIN_DICTIONARY
ob:          INT64 GZIP DO:0 FPO:11995930 SZ:3597/3998/1.11 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
ib:          INT64 GZIP DO:0 FPO:11999527 SZ:292939/918674/3.14 VC:2418197 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
{noformat}

> Apache Drill cannot read parquet generated outside Drill: Reading past
> RLE/BitPacking stream
> ---------------------------------------------------------------------
>
>                 Key: DRILL-4464
>                 URL: https://issues.apache.org/jira/browse/DRILL-4464
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.4.0, 1.5.0
>
[jira] [Updated] (DRILL-4464) Apache Drill cannot read parquet generated outside Drill: Reading past RLE/BitPacking stream
[ https://issues.apache.org/jira/browse/DRILL-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miroslav Holubec updated DRILL-4464:
------------------------------------
    Affects Version/s: 1.4.0

Description:

When I generate a file using MapReduce and parquet 1.8.1 (or 1.8.1-drill-r0) that contains a REQUIRED INT64 field, I am not able to read this column in Drill, but I am able to read the full content using parquet-tools cat/dump. This doesn't happen every time; it is input-data dependent (so probably a different encoding is chosen by parquet for the given column?).

Error reported by Drill:
{noformat}
2016-03-02 03:01:16,354 [29296305-abe2-f4bd-ded0-27bb53f631f0:frag:3:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalArgumentException: Reading past RLE/BitPacking stream.

Fragment 3:0

[Error Id: e2d02152-1b67-4c9f-9cb1-bd2b9ff302d8 on drssc9a4:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalArgumentException: Reading past RLE/BitPacking stream.

Fragment 3:0

[Error Id: e2d02152-1b67-4c9f-9cb1-bd2b9ff302d8 on drssc9a4:31010]
	at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) ~[drill-common-1.4.0.jar:1.4.0]
	at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:321) [drill-java-exec-1.4.0.jar:1.4.0]
	at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184) [drill-java-exec-1.4.0.jar:1.4.0]
	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290) [drill-java-exec-1.4.0.jar:1.4.0]
	at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.4.0.jar:1.4.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_40]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet record reader.
Message: Hadoop path: /tmp/tmp.gz.parquet
Total records read: 131070
Mock records read: 0
Records to read: 21845
Row group index: 0
Records in row group: 2418197
Parquet Metadata: ParquetMetaData{FileMetaData{schema: message nat { required int64 ts; required int32 dr; optional binary ui (UTF8); optional int32 up; optional binary ri (UTF8); optional int32 rp; optional binary di (UTF8); optional int32 dp; required int32 pr; optional int64 ob; optional int64 ib; }, metadata: {}}, blocks: [BlockMetaData{2418197, 30601003 [ColumnMetaData{GZIP [ts] INT64 [PLAIN_DICTIONARY, BIT_PACKED, PLAIN], 4}, ColumnMetaData{GZIP [dr] INT32 [PLAIN_DICTIONARY, BIT_PACKED], 2630991}, ColumnMetaData{GZIP [ui] BINARY [PLAIN_DICTIONARY, RLE, BIT_PACKED], 2964867}, ColumnMetaData{GZIP [up] INT32 [PLAIN_DICTIONARY, RLE, BIT_PACKED], 2966955}, ColumnMetaData{GZIP [ri] BINARY [PLAIN_DICTIONARY, RLE, BIT_PACKED], 7481618}, ColumnMetaData{GZIP [rp] INT32 [PLAIN_DICTIONARY, RLE, BIT_PACKED], 7483706}, ColumnMetaData{GZIP [di] BINARY [RLE, BIT_PACKED, PLAIN], 11995191}, ColumnMetaData{GZIP [dp] INT32 [RLE, BIT_PACKED, PLAIN], 11995247}, ColumnMetaData{GZIP [pr] INT32 [PLAIN_DICTIONARY, BIT_PACKED], 11995303}, ColumnMetaData{GZIP [ob] INT64 [PLAIN_DICTIONARY, RLE, BIT_PACKED], 11995930}, ColumnMetaData{GZIP [ib] INT64 [PLAIN_DICTIONARY, RLE, BIT_PACKED], 11999527}]}]}
	at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise(ParquetRecordReader.java:345) ~[drill-java-exec-1.4.0.jar:1.4.0]
	at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:447) ~[drill-java-exec-1.4.0.jar:1.4.0]
	at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:191) ~[drill-java-exec-1.4.0.jar:1.4.0]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.4.0.jar:1.4.0]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.4.0.jar:1.4.0]
	at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.4.0.jar:1.4.0]
	at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:132) ~[drill-java-exec-1.4.0.jar:1.4.0]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.4.0.jar:1.4.0]
	at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) ~[drill-java-exec-1.4.0.jar:1.4.0]
	at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93) ~[drill-java-exec-1.4.0.jar:1.4.0]
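For context on the "Reading past RLE/BitPacking stream" message: Parquet stores definition levels and dictionary indices in an RLE/bit-packed hybrid encoding, where each run begins with a ULEB128 varint header whose low bit selects the run type (0 = RLE, 1 = bit-packed). A decoder asked for more values than the stream actually encodes hits exactly this condition. Below is a minimal Python sketch of the RLE path only, as an illustration of the format rather than Drill's reader:

```python
def decode_rle_runs(buf, count, byte_width=1):
    """Decode `count` values from a Parquet RLE/bit-packed hybrid stream,
    handling RLE runs only (header low bit == 0). Raises when asked to
    read past the end of the stream, the condition Drill reports."""
    out, pos = [], 0
    while len(out) < count:
        if pos >= len(buf):
            raise ValueError("Reading past RLE/BitPacking stream.")
        # ULEB128 varint run header
        header, shift = 0, 0
        while True:
            if pos >= len(buf):
                raise ValueError("Reading past RLE/BitPacking stream.")
            b = buf[pos]
            pos += 1
            header |= (b & 0x7F) << shift
            if not (b & 0x80):
                break
            shift += 7
        if header & 1:
            # Bit-packed groups are omitted from this sketch.
            raise NotImplementedError("bit-packed run not handled here")
        run_len = header >> 1
        # RLE run: one fixed-width little-endian value, repeated run_len times.
        value = int.from_bytes(buf[pos:pos + byte_width], "little")
        pos += byte_width
        out.extend([value] * run_len)
    return out[:count]
```

A stream encoding five copies of the value 7 is the two bytes `0x0A 0x07` (header 5<<1, then the value); asking such a decoder for a sixth value triggers the error seen in the stack trace.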