[jira] [Created] (HIVE-19356) TestNegativeMinimrCliDriver (cluster_tasklog_retrieval, mapreduce_stack_trace, mapreduce_stack_trace_turnoff, and minimr_broken_pipe) are failing
Deepak Jaiswal created HIVE-19356:

Summary: TestNegativeMinimrCliDriver (cluster_tasklog_retrieval, mapreduce_stack_trace, mapreduce_stack_trace_turnoff, and minimr_broken_pipe) are failing
Key: HIVE-19356
URL: https://issues.apache.org/jira/browse/HIVE-19356
Project: Hive
Issue Type: Sub-task
Reporter: Deepak Jaiswal

All these tests fail the same way.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19355) Test TestJdbcWithDBTokenStoreNoDoAs has multiple failures
Deepak Jaiswal created HIVE-19355:

Summary: Test TestJdbcWithDBTokenStoreNoDoAs has multiple failures
Key: HIVE-19355
URL: https://issues.apache.org/jira/browse/HIVE-19355
Project: Hive
Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Deepak Jaiswal

Test TestJdbcWithDBTokenStoreNoDoAs has many subtests which are failing.
[jira] [Created] (HIVE-19354) from_utc_timestamp returns incorrect results for datetime values with timezone
Bruce Robbins created HIVE-19354:

Summary: from_utc_timestamp returns incorrect results for datetime values with timezone
Key: HIVE-19354
URL: https://issues.apache.org/jira/browse/HIVE-19354
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 3.1.0
Reporter: Bruce Robbins

On the master branch, from_utc_timestamp returns incorrect results for datetime strings that contain a timezone:

{noformat}
hive> select from_utc_timestamp('2000-10-10 00:00:00+00:00', 'America/Los_Angeles');
OK
2000-10-09 10:00:00
Time taken: 0.294 seconds, Fetched: 1 row(s)
hive> select from_utc_timestamp('2000-10-10 00:00:00', 'America/Los_Angeles');
OK
2000-10-09 17:00:00
Time taken: 0.121 seconds, Fetched: 1 row(s)
hive>
{noformat}

Both inputs are 2000-10-10 00:00:00 in UTC time, but I got two different results.

In version 2.3.3, from_utc_timestamp doesn't accept timezones in its input strings, so it does not have this bug:

{noformat}
hive> select from_utc_timestamp('2000-10-10 00:00:00+00:00', 'America/Los_Angeles');
OK
NULL
Time taken: 5.152 seconds, Fetched: 1 row(s)
hive> select from_utc_timestamp('2000-10-10 00:00:00', 'America/Los_Angeles');
OK
2000-10-09 17:00:00
Time taken: 0.069 seconds, Fetched: 1 row(s)
hive>
{noformat}

Since the function is expecting a UTC datetime value, it probably should continue to reject input that contains a timezone component.
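As a cross-check of the report above, here is a minimal Python sketch that treats both input strings as the same UTC instant and converts that instant to America/Los_Angeles. It is a reference calculation only, not Hive's implementation; the function name `utc_text_to_zone` and the fixed-offset fallback are assumptions made for this illustration.

```python
from datetime import datetime, timedelta, timezone

try:
    from zoneinfo import ZoneInfo  # standard library since Python 3.9
    LOS_ANGELES = ZoneInfo("America/Los_Angeles")
except (ImportError, KeyError):
    # Assumption: no tz database is available. UTC-7 (PDT) is valid for the
    # October 2000 example from the report, not for dates in general.
    LOS_ANGELES = timezone(timedelta(hours=-7))


def utc_text_to_zone(text, zone=LOS_ANGELES):
    """Hypothetical helper: read `text` as a UTC datetime and render it in `zone`.

    A string without an offset is taken to be UTC; a string with an explicit
    offset keeps the instant that offset describes.
    """
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(zone).strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    # Both strings name the same instant, so both lines print the same value.
    print(utc_text_to_zone("2000-10-10 00:00:00"))        # 2000-10-09 17:00:00
    print(utc_text_to_zone("2000-10-10 00:00:00+00:00"))  # 2000-10-09 17:00:00
```

Under this reading, 2000-10-09 17:00:00 is the only consistent answer for either input, which matches the value both Hive versions return for the string without an offset. Whether Hive should convert or reject the string with the offset is the open question in the report.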
Re: Proposal: Apply SQL based authorization functions in the metastore.
Hi Elliot,

One scenario where Storage based authorization doesn't work is the case of object stores such as S3. In those scenarios, the tool/platform that is accessing the data won't have any restrictions on data access either. I am not sure how the data access would be secured in such cases, even if metastore access is controlled.

Overall, the metastore api is a much lower level API, and as a result it is difficult to enforce higher level restrictions at that level. (More on that below).

I agree that O/JDBC via HS2 is not something distributed tools can use (at least with standard API). I think the ideal way to enforce security is having tools/platforms read via a 'table server' (and not give them direct file system access). At Hortonworks, we have been using this to provide security for Spark, by allowing it to read in parallel from LLAP daemons -
https://www.slideshare.net/Hadoop_Summit/security-updates-more-seamless-access-controls-with-apache-spark-and-apache-ranger
https://github.com/hortonworks-spark/spark-llap/wiki/1.-Goal-and-features
(You can replace Ranger with SQL auth as well in above examples). The next phase of that work would likely make use of Apache Arrow for the data exchange (there are some hive jiras created recently around it).

I had considered having the authorization at metastore level, but realized that is not the right place to enforce the RDBMS/SQL style policies. Here are some notes I wrote a while back about it -
http://hadoop-pig-hive-thejas.blogspot.com/2014/03/hive-sql-standard-authorization-why-not.html

Quoting from there -
The advantage of doing it at the metastore api level would have been that pig and MR would also be covered under this authorization model. But this works only if the SQL actions always need some metastore api calls, and access control on the calls they need to make can be used to enforce the SQL level authorization. Take for example the INSERT privilege in SQL: you can grant INSERT without granting SELECT privilege. But when processing insert queries for the user, we need to be able to do a getTable() and read the schema of the table. But if you look at it from metastore api perspective, you should not be able to do a getTable() without having SELECT privileges on the table. Similar issues happen with DELETE and UPDATE privileges, which you can grant without SELECT. Another example is URIs in the SQL statement: you don't need to make any metastore api calls before accessing URIs. So URI access control can't be implemented using metastore api calls. Another use case is anything that you want to allow the ADMIN to do but the action does not involve specific metastore api calls that can be used to control the action.

Thanks,
Thejas

On Fri, Apr 20, 2018 at 6:30 AM, Elliot West wrote:
> Hello,
>
> I’d like to propose that SQL based authorization (or something similar) be
> applied and enforced also in the metastore service as part of the initiative
> to extract HMS as an independent project. While any such implementation
> cannot be ’system complete’ like HiveServer2 (HS2) (HMS has no scope to
> intercept operations applied to table data, only metadata), it would be a
> significant step forward for controlling the operations that can be actioned
> by the many non-HS2 clients in the Hive ecosystem.
>
> I believe this is a good time to consider this option as there is currently
> much discussion in the Hive community on the future directions of HMS and
> greater recognition that HMS is now seen as general data platform
> infrastructure and not simply an internal Hive component.
>
> Further details are below. I’d be grateful for any feedback, thoughts, and
> suggestions on how this could move forward.
>
> Problem
> At this time, Hive’s SQL based authorization feature is the recommended
> approach for controlling which operations may be performed on what by whom.
> This feature is applied in the HS2 component. However, a large number of
> platforms that integrate with Hive do not do so via HS2, instead talking to
> the metastore service directly and so bypassing authorization. They can
> perform destructive operations such as a table drop even though the
> permissions declared in the metastore may explicitly forbid it as they are
> able to circumvent the authorization logic in HS2.
>
> In short, there seems to be a lack of encapsulation with authorization in
> the metastore; HMS owns the metadata, is responsible for performing actions
> on metadata, for maintaining permissions on what actions are permissible by
> whom, and yet has no means to use the information it has to protect the data
> it owns.
>
> Workarounds
> Common workarounds to this deficiency include falling back to storage based
> authorization or running read only metastore instances. However, both of
> these approaches have significant drawbacks:
>
> File based auth does not function when using object stores such as S3 and so
> is not usable in cloud deployments of Hive - a
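The INSERT-without-SELECT point in Thejas's reply above can be illustrated with a toy model. This is only a sketch under stated assumptions: the grant table, the function names, and the rule that reading table metadata requires SELECT are all hypothetical stand-ins, not Hive or metastore APIs.

```python
# Hypothetical toy model; none of these names are Hive or metastore APIs.
GRANTS = {("alice", "sales"): {"INSERT"}}  # INSERT granted without SELECT


def _read_schema(table):
    """Stand-in for the metadata lookup that any insert query needs."""
    return {"name": table, "columns": ["id", "amount"]}


def metastore_get_table(user, table):
    """Metastore-level enforcement: reading table metadata requires SELECT."""
    if "SELECT" not in GRANTS.get((user, table), set()):
        raise PermissionError(f"{user} lacks SELECT on {table}")
    return _read_schema(table)


def insert_with_metastore_auth(user, table, row):
    """Authorization enforced only inside the metastore API call."""
    schema = metastore_get_table(user, table)  # rejects INSERT-only users
    return dict(zip(schema["columns"], row))


def insert_with_sql_auth(user, table, row):
    """Authorization enforced at the SQL layer, which then reads the schema itself."""
    if "INSERT" not in GRANTS.get((user, table), set()):
        raise PermissionError(f"{user} lacks INSERT on {table}")
    schema = _read_schema(table)
    return dict(zip(schema["columns"], row))


if __name__ == "__main__":
    print(insert_with_sql_auth("alice", "sales", (1, 9.5)))
    try:
        insert_with_metastore_auth("alice", "sales", (1, 9.5))
    except PermissionError as err:
        print("metastore-level check rejects the insert:", err)
```

In this model the SQL-layer check lets the INSERT-only user through, while the metastore-level check blocks the same statement at the schema lookup, which is the mismatch the quoted blog post describes.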
[jira] [Created] (HIVE-19353) Vectorization: ConstantVectorExpression --> RuntimeException: Unexpected column vector type LIST
Matt McCline created HIVE-19353:

Summary: Vectorization: ConstantVectorExpression --> RuntimeException: Unexpected column vector type LIST
Key: HIVE-19353
URL: https://issues.apache.org/jira/browse/HIVE-19353
Project: Hive
Issue Type: Bug
Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline

Found by enabling vectorization for org.apache.hive.jdbc.TestJdbcDriver2.testResultSetMetaData

{noformat}
Caused by: java.lang.RuntimeException: Unexpected column vector type LIST
 at org.apache.hadoop.hive.ql.exec.vector.expressions.ConstantVectorExpression.evaluate(ConstantVectorExpression.java:237) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:955) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:928) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.closeOp(VectorMapOperator.java:984) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:722) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:193) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
{noformat}
[jira] [Created] (HIVE-19352) Vectorization: Disable vectorization for org.apache.hive.jdbc.TestJdbcDriver2.testResultSetMetaData
Matt McCline created HIVE-19352:

Summary: Vectorization: Disable vectorization for org.apache.hive.jdbc.TestJdbcDriver2.testResultSetMetaData
Key: HIVE-19352
URL: https://issues.apache.org/jira/browse/HIVE-19352
Project: Hive
Issue Type: Bug
Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline

Turning vectorization on triggers a bug - see Jira .
[jira] [Created] (HIVE-19351) Vectorization: Followup on why operator numbers are unstable in User EXPLAIN for explainuser_1.q / spark_explainuser_1
Matt McCline created HIVE-19351:

Summary: Vectorization: Followup on why operator numbers are unstable in User EXPLAIN for explainuser_1.q / spark_explainuser_1
Key: HIVE-19351
URL: https://issues.apache.org/jira/browse/HIVE-19351
Project: Hive
Issue Type: Bug
Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline

Why were the operator numbers unstable for:

TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1]
TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1]

when vectorization was enabled?
[jira] [Created] (HIVE-19350) Vectorization: Turn off vectorization for explainuser_1.q / spark_explainuser_1
Matt McCline created HIVE-19350:

Summary: Vectorization: Turn off vectorization for explainuser_1.q / spark_explainuser_1
Key: HIVE-19350
URL: https://issues.apache.org/jira/browse/HIVE-19350
Project: Hive
Issue Type: Bug
Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline

This seems to me like the operator number instability issue that Pengcheng Xiong said could occur with vectorization.

For now, turning off vectorization for:

TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1]
TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1]

Follow up Jira is
[jira] [Created] (HIVE-19349) TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] failing
Vineet Garg created HIVE-19349:

Summary: TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] failing
Key: HIVE-19349
URL: https://issues.apache.org/jira/browse/HIVE-19349
Project: Hive
Issue Type: Sub-task
Reporter: Vineet Garg

Related to HIVE-19326
Re: Review Request 66805: HIVE-19311 : Partition and bucketing support for “load data” statement
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66805/
---

(Updated April 28, 2018, 10:26 p.m.)

Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jesús Camacho Rodríguez, and Vineet Garg.

Changes
---

Removed some tests and updated one.

Bugs: HIVE-19311
 https://issues.apache.org/jira/browse/HIVE-19311

Repository: hive-git

Description
---

Currently, the "load data" statement is very limited. It errors out if any of the information is missing, such as partitioning info if the table is partitioned or appropriate names when the table is bucketed. It should be able to launch an insert job to load the data instead.

Diffs (updated)
---

 data/files/load_data_job/bucketing.txt PRE-CREATION
 data/files/load_data_job/load_data_1_partition.txt PRE-CREATION
 data/files/load_data_job/partitions/load_data_1_partition.txt PRE-CREATION
 data/files/load_data_job/partitions/load_data_2_partitions.txt PRE-CREATION
 itests/src/test/resources/testconfiguration.properties 1a346593fd
 ql/src/java/org/apache/hadoop/hive/ql/Context.java 0fedf0e76e
 ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 7d33fa3892
 ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java a51fdd322f
 ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java c07991d434
 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1dccf969ff
 ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 2f3b07f4af
 ql/src/test/org/apache/hadoop/hive/ql/TestTxnLoadData.java ec8c1507ec
 ql/src/test/queries/clientnegative/load_part_nospec.q 81517991b2
 ql/src/test/queries/clientnegative/nopart_load.q 966982fd5c
 ql/src/test/queries/clientpositive/load_data_using_job.q PRE-CREATION
 ql/src/test/results/clientnegative/load_part_nospec.q.out bebaf92311
 ql/src/test/results/clientnegative/nopart_load.q.out 881514640c
 ql/src/test/results/clientpositive/llap/load_data_using_job.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/66805/diff/4/

Changes: https://reviews.apache.org/r/66805/diff/3-4/

Testing
---

Added a unit test.

Thanks,

Deepak Jaiswal
[jira] [Created] (HIVE-19348) org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp are failing
Vineet Garg created HIVE-19348:

Summary: org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp are failing
Key: HIVE-19348
URL: https://issues.apache.org/jira/browse/HIVE-19348
Project: Hive
Issue Type: Sub-task
Reporter: Vineet Garg
Assignee: Zoltan Haindrich

{noformat}
Error Message
expected:<1> but was:<2>
{noformat}
[jira] [Created] (HIVE-19347) TestTriggersWorkloadManager tests are failing consistently
Vineet Garg created HIVE-19347: -- Summary: TestTriggersWorkloadManager tests are failing consistently Key: HIVE-19347 URL: https://issues.apache.org/jira/browse/HIVE-19347 Project: Hive Issue Type: Sub-task Reporter: Vineet Garg {noformat} Error Message Expected query to succeed expected null, but was:
[jira] [Created] (HIVE-19346) TestMiniLlapLocalCliDriver.testCliDriver[materialized_view_create_rewrite_5] failing
Vineet Garg created HIVE-19346:

Summary: TestMiniLlapLocalCliDriver.testCliDriver[materialized_view_create_rewrite_5] failing
Key: HIVE-19346
URL: https://issues.apache.org/jira/browse/HIVE-19346
Project: Hive
Issue Type: Sub-task
Reporter: Vineet Garg
Assignee: Jesus Camacho Rodriguez

{noformat}
Error Message
Client Execution succeeded but contained differences (error code = 1) after executing materialized_view_create_rewrite_5.q
402c402
< totalSize1053
---
> totalSize1055
{noformat}
Re: Review Request 66805: HIVE-19311 : Partition and bucketing support for “load data” statement
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66805/
---

(Updated April 28, 2018, 6:26 a.m.)

Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jesús Camacho Rodríguez, and Vineet Garg.

Changes
---

Modified the tests to use correct data files.

Bugs: HIVE-19311
 https://issues.apache.org/jira/browse/HIVE-19311

Repository: hive-git

Description
---

Currently, the "load data" statement is very limited. It errors out if any of the information is missing, such as partitioning info if the table is partitioned or appropriate names when the table is bucketed. It should be able to launch an insert job to load the data instead.

Diffs (updated)
---

 data/files/load_data_job/bucketing.txt PRE-CREATION
 data/files/load_data_job/load_data_1_partition.txt PRE-CREATION
 data/files/load_data_job/partitions/load_data_1_partition.txt PRE-CREATION
 data/files/load_data_job/partitions/load_data_2_partitions.txt PRE-CREATION
 itests/src/test/resources/testconfiguration.properties 1a346593fd
 ql/src/java/org/apache/hadoop/hive/ql/Context.java 0fedf0e76e
 ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 7d33fa3892
 ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java a51fdd322f
 ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java c07991d434
 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1dccf969ff
 ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 2f3b07f4af
 ql/src/test/queries/clientpositive/load_data_using_job.q PRE-CREATION
 ql/src/test/results/clientpositive/llap/load_data_using_job.q.out PRE-CREATION
 ql/src/test/results/clientpositive/spark/load_data_using_job.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/66805/diff/3/

Changes: https://reviews.apache.org/r/66805/diff/2-3/

Testing
---

Added a unit test.

Thanks,

Deepak Jaiswal