[jira] [Created] (HIVE-19356) TestNegativeMinimrCliDriver (cluster_tasklog_retrieval, mapreduce_stack_trace, mapreduce_stack_trace_turnoff, and minimr_broken_pipe) are failing

2018-04-28 Thread Deepak Jaiswal (JIRA)
Deepak Jaiswal created HIVE-19356:
-

 Summary: TestNegativeMinimrCliDriver (cluster_tasklog_retrieval, 
mapreduce_stack_trace, mapreduce_stack_trace_turnoff, and minimr_broken_pipe) 
are failing
 Key: HIVE-19356
 URL: https://issues.apache.org/jira/browse/HIVE-19356
 Project: Hive
  Issue Type: Sub-task
Reporter: Deepak Jaiswal


All these tests fail the same way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19355) Test TestJdbcWithDBTokenStoreNoDoAs has multiple failures

2018-04-28 Thread Deepak Jaiswal (JIRA)
Deepak Jaiswal created HIVE-19355:
-

 Summary: Test TestJdbcWithDBTokenStoreNoDoAs has multiple failures
 Key: HIVE-19355
 URL: https://issues.apache.org/jira/browse/HIVE-19355
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Deepak Jaiswal


Test TestJdbcWithDBTokenStoreNoDoAs has many subtests which are failing.





[jira] [Created] (HIVE-19354) from_utc_timestamp returns incorrect results for datetime values with timezone

2018-04-28 Thread Bruce Robbins (JIRA)
Bruce Robbins created HIVE-19354:


 Summary: from_utc_timestamp returns incorrect results for datetime 
values with timezone
 Key: HIVE-19354
 URL: https://issues.apache.org/jira/browse/HIVE-19354
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.1.0
Reporter: Bruce Robbins


On the master branch, from_utc_timestamp returns incorrect results for datetime 
strings that contain a timezone:
{noformat}
hive> select from_utc_timestamp('2000-10-10 00:00:00+00:00', 
'America/Los_Angeles');
OK
2000-10-09 10:00:00
Time taken: 0.294 seconds, Fetched: 1 row(s)
hive> select from_utc_timestamp('2000-10-10 00:00:00', 'America/Los_Angeles');
OK
2000-10-09 17:00:00
Time taken: 0.121 seconds, Fetched: 1 row(s)
hive> 
{noformat}
Both inputs are 2000-10-10 00:00:00 in UTC time, but I got two different 
results.

In version 2.3.3, from_utc_timestamp doesn't accept timezones in its input 
strings, so it does not have this bug:
{noformat}
hive> select from_utc_timestamp('2000-10-10 00:00:00+00:00', 
'America/Los_Angeles');
OK
NULL
Time taken: 5.152 seconds, Fetched: 1 row(s)
hive> select from_utc_timestamp('2000-10-10 00:00:00', 'America/Los_Angeles');
OK
2000-10-09 17:00:00
Time taken: 0.069 seconds, Fetched: 1 row(s)
hive> 
{noformat}
Since the function is expecting a UTC datetime value, it probably should 
continue to reject input that contains a timezone component.
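
The arithmetic behind the two results can be checked with a minimal sketch. It assumes America/Los_Angeles was on PDT (UTC-7) on that date, which matches the 17:00 result above; `from_utc` is a hypothetical helper, not Hive code. The second computation only shows that applying the same shift twice lands on the observed 10:00. It is an observation about the numbers, not a confirmed root cause.

```python
from datetime import datetime, timedelta, timezone

# Assumption: America/Los_Angeles was on PDT (UTC-7) on 2000-10-10.
PDT = timezone(timedelta(hours=-7), "PDT")


def from_utc(naive_utc: datetime, tz: timezone) -> datetime:
    """Treat a naive datetime as UTC and return the naive wall-clock time in tz."""
    return naive_utc.replace(tzinfo=timezone.utc).astimezone(tz).replace(tzinfo=None)


utc_input = datetime(2000, 10, 10, 0, 0, 0)

# Single conversion: the result expected for both queries.
expected = from_utc(utc_input, PDT)

# Applying the same conversion a second time reproduces the odd result.
double_shifted = from_utc(expected, PDT)

print(expected)        # 2000-10-09 17:00:00
print(double_shifted)  # 2000-10-09 10:00:00
```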





Re: Proposal: Apply SQL based authorization functions in the metastore.

2018-04-28 Thread Thejas Nair
Hi Elliot,

One scenario where Storage based authorization doesn't work is the
case of object stores such as S3. In those scenarios, the
tool/platform that is accessing the data won't have any restrictions
on data access either. I am not sure how the data access would be
secured in such cases, even if metastore access is controlled.

Overall, the metastore api is a much lower level API, and as a result
it is difficult to enforce higher level restrictions at that level.
(More on that below).

I agree that O/JDBC via HS2 is not something distributed tools can use
(at least with standard API).
I think the ideal way to enforce security is having tools/platforms
read via a 'table server' (and not give them direct file system
access).
At Hortonworks, we have been using this to provide security for Spark,
by allowing it to read in parallel from LLAP daemons -
https://www.slideshare.net/Hadoop_Summit/security-updates-more-seamless-access-controls-with-apache-spark-and-apache-ranger
https://github.com/hortonworks-spark/spark-llap/wiki/1.-Goal-and-features
(You can replace Ranger with SQL auth as well in above examples).

The next phase of that work would likely make use of Apache Arrow for the
data exchange (there are some hive jiras created recently around it).

I had considered having the authorization at metastore level, but
realized that is not the right place to enforce the RDBMS/SQL style
policies. Here are some notes I wrote a while back about it -
http://hadoop-pig-hive-thejas.blogspot.com/2014/03/hive-sql-standard-authorization-why-not.html

Quoting from there -
The advantage of doing it at the metastore api level would have been
that pig and MR would also be covered under this authorization model.
But this works only if the SQL actions always need some metastore api
calls, and access control on the calls they need to make can be used
to enforce the SQL level authorization.

Take for example INSERT privilege in SQL, you can grant INSERT without
granting SELECT privilege. But when processing insert queries for the
user, we need to be able to do a getTable() and read the schema of the
table. But if you look at it from metastore api perspective, you
should not be able to do a getTable() without having SELECT privileges
on the table.
Similar issues happen with DELETE and UPDATE privileges, which you can
grant without SELECT.
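
The INSERT-without-SELECT conflict can be illustrated with a toy model. Everything in it (the `GRANTS` table, `metastore_get_table`, `sql_insert`) is hypothetical and not the Hive or metastore API. It only sketches what happens if a metastore-level getTable() is gated on SELECT while the SQL layer has granted INSERT alone.

```python
# Hypothetical grants table: (user, table) -> set of SQL privileges.
GRANTS = {("alice", "sales"): {"INSERT"}}


def has_privilege(user, table, privilege):
    """Return True if the user holds the privilege on the table."""
    return privilege in GRANTS.get((user, table), set())


def metastore_get_table(user, table):
    """Metastore-level check: reading table metadata is gated on SELECT."""
    if not has_privilege(user, table, "SELECT"):
        raise PermissionError(f"{user} lacks SELECT on {table}")
    return {"name": table, "columns": ["id", "amount"]}


def sql_insert(user, table):
    """SQL-level check passes on INSERT, but planning still needs the schema."""
    if not has_privilege(user, table, "INSERT"):
        raise PermissionError(f"{user} lacks INSERT on {table}")
    schema = metastore_get_table(user, table)
    return f"insert planned against columns {schema['columns']}"


try:
    outcome = sql_insert("alice", "sales")
except PermissionError as err:
    outcome = str(err)

print(outcome)  # alice lacks SELECT on sales
```

In this sketch the insert is authorized at the SQL level and still fails, because the schema lookup it depends on is refused one layer down.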

Another example is URIs in the SQL statement: you don't need to make
any metastore api calls before accessing URIs. So URI access control can't
be implemented using metastore api calls.

Another use case is anything that you want to allow the ADMIN to do
but the action does not involve specific metastore api calls that can
be used to control the action.

Thanks,
Thejas


On Fri, Apr 20, 2018 at 6:30 AM, Elliot West  wrote:
> Hello,
>
> I’d like to propose that SQL based authorization (or something similar) be
> applied and enforced also in the metastore service as part of the initiative
> to extract HMS as an independent project. While any such implementation
> cannot be ’system complete’ like HiveServer2 (HS2) (HMS has no scope to
> intercept operations applied to table data, only metadata), it would be a
> significant step forward for controlling the operations that can be actioned
> by the many non-HS2 clients in the Hive ecosystem.
>
> I believe this is a good time to consider this option as there is currently
> much discussion in the Hive community on the future directions of HMS and
> greater recognition that HMS is now seen as general data platform
> infrastructure and not simply an internal Hive component.
>
> Further details are below. I’d be grateful for any feedback, thoughts, and
> suggestions on how this could move forward.
>
> Problem
> At this time, Hive’s SQL based authorization feature is the recommended
> approach for controlling which operations may be performed on what by whom.
> This feature is applied in the HS2 component. However, a large number of
> platforms that integrate with Hive do not do so via HS2, instead talking to
> the metastore service directly and so bypassing authorization. They can
> perform destructive operations such as a table drop even though the
> permissions declared in the metastore may explicitly forbid it as they are
> able to circumvent the authorization logic in HS2.
>
> In short, there seems to be a lack of encapsulation with authorization in
> the metastore; HMS owns the metadata, is responsible for performing actions
> on metadata, for maintaining permissions on what actions are permissible by
> whom, and yet has no means to use the information it has to protect the data
> it owns.
>
> Workarounds
> Common workarounds to this deficiency include falling back to storage based
> authorization or running read only metastore instances. However, both of
> these approaches have significant drawbacks:
>
> File based auth does not function when using object stores such as S3 and so
> is not usable in cloud deployments of Hive - a 

[jira] [Created] (HIVE-19353) Vectorization: ConstantVectorExpression --> RuntimeException: Unexpected column vector type LIST

2018-04-28 Thread Matt McCline (JIRA)
Matt McCline created HIVE-19353:
---

 Summary: Vectorization: ConstantVectorExpression  --> 
RuntimeException: Unexpected column vector type LIST
 Key: HIVE-19353
 URL: https://issues.apache.org/jira/browse/HIVE-19353
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Found by enabling vectorization for 
org.apache.hive.jdbc.TestJdbcDriver2.testResultSetMetaData
{noformat}
Caused by: java.lang.RuntimeException: Unexpected column vector type LIST
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.ConstantVectorExpression.evaluate(ConstantVectorExpression.java:237)
 ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
 ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:955) 
~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:928) 
~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
 ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.closeOp(VectorMapOperator.java:984)
 ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:722) 
~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:193) 
~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]{noformat}





[jira] [Created] (HIVE-19352) Vectorization: Disable vectorization for org.apache.hive.jdbc.TestJdbcDriver2.testResultSetMetaData

2018-04-28 Thread Matt McCline (JIRA)
Matt McCline created HIVE-19352:
---

 Summary: Vectorization: Disable vectorization for 
org.apache.hive.jdbc.TestJdbcDriver2.testResultSetMetaData
 Key: HIVE-19352
 URL: https://issues.apache.org/jira/browse/HIVE-19352
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Turning vectorization on triggers a bug - see Jira .





[jira] [Created] (HIVE-19351) Vectorization: Followup on why operator numbers are unstable in User EXPLAIN for explainuser_1.q / spark_explainuser_1

2018-04-28 Thread Matt McCline (JIRA)
Matt McCline created HIVE-19351:
---

 Summary: Vectorization: Followup on why operator numbers are 
unstable in User EXPLAIN for explainuser_1.q / spark_explainuser_1
 Key: HIVE-19351
 URL: https://issues.apache.org/jira/browse/HIVE-19351
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Why were the operator numbers unstable for:

TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1]

TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] 

when vectorization was enabled?





[jira] [Created] (HIVE-19350) Vectorization: Turn off vectorization for explainuser_1.q / spark_explainuser_1

2018-04-28 Thread Matt McCline (JIRA)
Matt McCline created HIVE-19350:
---

 Summary: Vectorization: Turn off vectorization for explainuser_1.q 
/ spark_explainuser_1
 Key: HIVE-19350
 URL: https://issues.apache.org/jira/browse/HIVE-19350
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


This looks to me like the operator number instability issue that Pengcheng 
Xiong said could occur with vectorization.

For now, turning off vectorization for:

TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1]

TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] 

Follow up Jira is 





[jira] [Created] (HIVE-19349) TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] failing

2018-04-28 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-19349:
--

 Summary: 
TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] failing
 Key: HIVE-19349
 URL: https://issues.apache.org/jira/browse/HIVE-19349
 Project: Hive
  Issue Type: Sub-task
Reporter: Vineet Garg


Related to HIVE-19326





Re: Review Request 66805: HIVE-19311 : Partition and bucketing support for “load data” statement

2018-04-28 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66805/
---

(Updated April 28, 2018, 10:26 p.m.)


Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jesús Camacho 
Rodríguez, and Vineet Garg.


Changes
---

Removed some tests and updated one.


Bugs: HIVE-19311
https://issues.apache.org/jira/browse/HIVE-19311


Repository: hive-git


Description
---

Currently, the "load data" statement is very limited. It errors out if any 
required information is missing, such as partitioning info when the table is 
partitioned or appropriate names when the table is bucketed.
It should be able to launch an insert job to load the data instead.
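
The fallback described above can be sketched as a small decision function. This is an illustration only and not the logic in the patch: `Table`, `plan_load`, the `bucket_file_names_ok` flag, and the `"insert_job"` / `"move_files"` labels are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Table:
    """Hypothetical stand-in for table metadata."""
    name: str
    partition_keys: List[str] = field(default_factory=list)
    num_buckets: int = 0


def plan_load(table: Table,
              partition_spec: Optional[Dict[str, str]] = None,
              bucket_file_names_ok: bool = True) -> str:
    """Pick a load strategy instead of erroring out on missing information."""
    missing_partition_info = bool(table.partition_keys) and not partition_spec
    bad_bucket_names = table.num_buckets > 0 and not bucket_file_names_ok
    if missing_partition_info or bad_bucket_names:
        # Let an insert job work out partitions and buckets from the data.
        return "insert_job"
    # Everything needed is present, so a plain file move is enough.
    return "move_files"


partitioned = Table("sales", partition_keys=["ds"])
print(plan_load(partitioned))                        # insert_job
print(plan_load(partitioned, {"ds": "2018-04-28"}))  # move_files
```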


Diffs (updated)
-

  data/files/load_data_job/bucketing.txt PRE-CREATION 
  data/files/load_data_job/load_data_1_partition.txt PRE-CREATION 
  data/files/load_data_job/partitions/load_data_1_partition.txt PRE-CREATION 
  data/files/load_data_job/partitions/load_data_2_partitions.txt PRE-CREATION 
  itests/src/test/resources/testconfiguration.properties 1a346593fd 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 0fedf0e76e 
  ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 7d33fa3892 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java a51fdd322f 
  ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 
c07991d434 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1dccf969ff 
  ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 
2f3b07f4af 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnLoadData.java ec8c1507ec 
  ql/src/test/queries/clientnegative/load_part_nospec.q 81517991b2 
  ql/src/test/queries/clientnegative/nopart_load.q 966982fd5c 
  ql/src/test/queries/clientpositive/load_data_using_job.q PRE-CREATION 
  ql/src/test/results/clientnegative/load_part_nospec.q.out bebaf92311 
  ql/src/test/results/clientnegative/nopart_load.q.out 881514640c 
  ql/src/test/results/clientpositive/llap/load_data_using_job.q.out 
PRE-CREATION 


Diff: https://reviews.apache.org/r/66805/diff/4/

Changes: https://reviews.apache.org/r/66805/diff/3-4/


Testing
---

Added a unit test.


Thanks,

Deepak Jaiswal



[jira] [Created] (HIVE-19348) org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp are failing

2018-04-28 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-19348:
--

 Summary:  org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp 
are failing
 Key: HIVE-19348
 URL: https://issues.apache.org/jira/browse/HIVE-19348
 Project: Hive
  Issue Type: Sub-task
Reporter: Vineet Garg
Assignee: Zoltan Haindrich


{noformat}
Error Message
expected:<1> but was:<2>
{noformat}






[jira] [Created] (HIVE-19347) TestTriggersWorkloadManager tests are failing consistently

2018-04-28 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-19347:
--

 Summary: TestTriggersWorkloadManager tests are failing consistently
 Key: HIVE-19347
 URL: https://issues.apache.org/jira/browse/HIVE-19347
 Project: Hive
  Issue Type: Sub-task
Reporter: Vineet Garg


{noformat}
Error Message
Expected query to succeed expected null, but was:

[jira] [Created] (HIVE-19346) TestMiniLlapLocalCliDriver.testCliDriver[materialized_view_create_rewrite_5] failing

2018-04-28 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-19346:
--

 Summary: 
TestMiniLlapLocalCliDriver.testCliDriver[materialized_view_create_rewrite_5] 
failing
 Key: HIVE-19346
 URL: https://issues.apache.org/jira/browse/HIVE-19346
 Project: Hive
  Issue Type: Sub-task
Reporter: Vineet Garg
Assignee: Jesus Camacho Rodriguez


{noformat}
Error Message
Client Execution succeeded but contained differences (error code = 1) after 
executing materialized_view_create_rewrite_5.q 
402c402
<  totalSize1053
---
>  totalSize1055
{noformat}





Re: Review Request 66805: HIVE-19311 : Partition and bucketing support for “load data” statement

2018-04-28 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66805/
---

(Updated April 28, 2018, 6:26 a.m.)


Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jesús Camacho 
Rodríguez, and Vineet Garg.


Changes
---

Modified the tests to use correct data files.


Bugs: HIVE-19311
https://issues.apache.org/jira/browse/HIVE-19311


Repository: hive-git


Description
---

Currently, the "load data" statement is very limited. It errors out if any 
required information is missing, such as partitioning info when the table is 
partitioned or appropriate names when the table is bucketed.
It should be able to launch an insert job to load the data instead.


Diffs (updated)
-

  data/files/load_data_job/bucketing.txt PRE-CREATION 
  data/files/load_data_job/load_data_1_partition.txt PRE-CREATION 
  data/files/load_data_job/partitions/load_data_1_partition.txt PRE-CREATION 
  data/files/load_data_job/partitions/load_data_2_partitions.txt PRE-CREATION 
  itests/src/test/resources/testconfiguration.properties 1a346593fd 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 0fedf0e76e 
  ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 7d33fa3892 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java a51fdd322f 
  ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 
c07991d434 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1dccf969ff 
  ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 
2f3b07f4af 
  ql/src/test/queries/clientpositive/load_data_using_job.q PRE-CREATION 
  ql/src/test/results/clientpositive/llap/load_data_using_job.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/load_data_using_job.q.out 
PRE-CREATION 


Diff: https://reviews.apache.org/r/66805/diff/3/

Changes: https://reviews.apache.org/r/66805/diff/2-3/


Testing
---

Added a unit test.


Thanks,

Deepak Jaiswal