Re: Ranger support for s3 (RANGER-1300)

Abhishek Somani Sun, 05 Feb 2017 22:08:19 -0800

Hi Bosco,

Thank you for your reply. I have a few specific questions related to S3
authorization (with a use case in mind) with HiveServer2 and would be much
obliged if you could spare a few moments to answer them.

Please consider the following points on which I base my questions. I might
be wrong about these premises and would be grateful for any corrections:
1. Ranger with HiveServer2 will take care of all authorization, and with
doAs=false, all S3 credentials used to access data would be those of the
HiveServer2 service, not the user.
2. According to this wiki page,
https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization,
Hive (with its native authorization) is supposed to perform a storage-level
check (a URI check) when a user creates an external table against a
location (URI).
3. Hive performs this URI-level check (RWX privilege + ownership) for an
HDFS URI by checking the user's privileges against the filesystem. If the
user has the required privileges, the CREATE TABLE is allowed to proceed.
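To make premise 3 concrete, here is a rough local-filesystem analogy of that
check. It is only a sketch under stated assumptions: the function name
uri_check is hypothetical, it reads POSIX metadata via os.stat rather than
going through the Hadoop FileSystem API that Hive actually uses for HDFS
URIs, and it assumes the check means "the user owns the location and the
owner permission bits grant read, write and execute".

```python
import os
import stat


def uri_check(path: str, uid: int) -> bool:
    """Hypothetical analogy of the URI check (RWX privilege + ownership).

    Assumption: uses local POSIX metadata, not Hive's real HDFS code path.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        # A location that does not exist cannot pass the check here.
        return False
    # Ownership: the requesting user must own the location.
    if st.st_uid != uid:
        return False
    # RWX: the owner permission bits must include read, write and execute.
    required = stat.S_IRUSR | stat.S_IWUSR | stat.S_IXUSR
    return (st.st_mode & required) == required


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as location:
        os.chmod(location, 0o700)
        print(uri_check(location, os.getuid()))      # → True (owner, rwx)
        print(uri_check(location, os.getuid() + 1))  # → False (not owner)
        os.chmod(location, 0o500)
        print(uri_check(location, os.getuid()))      # → False (no write bit)
        os.chmod(location, 0o700)  # restore permissions before cleanup
```

The point of the sketch is that such a check depends on owner and permission
metadata the filesystem exposes for the path; whether and how Ranger does
something comparable for S3 is exactly what my questions are about.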

My questions are:
1. Does Ranger also do this URI check?
2. Does it do this check for S3 as well? If so, how does it check a user's
S3 storage-level permissions?
3. If it does not do such a check today, is this also a requirement that we
plan to fulfil via RANGER-1300?

Thanks and much obliged.
PS: I have not taken you up on your suggestion of discussing this on the
JIRA because I fear that, in my ignorance, I might divert it from its
intent. Please let me know if you think this discussion is worth a JIRA
comment, and I will add it there.

Thanks,
Abhishek

On Sat, Feb 4, 2017 at 1:09 AM, Don Bosco Durai <[email protected]> wrote:

> Abhishek
>
> If you are using Ranger for Hive authorization, then we recommend that you
> allow users to access Hive only through HiveServer2 (beeline or JDBC). This
> way, you can set access permissions at the table, column, or even row level
> from Ranger, and they will be enforced by HiveServer2.
>
> On the HiveServer2 service, you need to set doAs=false
> (hive.server2.enable.doAs), which means HiveServer2 uses the (hive) service
> user's credentials to access the underlying store (HDFS or S3). This has
> multiple advantages, including limiting the permissions you need to grant
> at the HDFS or S3 layer.
>
> RANGER-1300 is primarily meant to enable authorization for tools that
> access the data layer directly, without any intermediate process, e.g.
> Apache Spark with LLAP. In such cases, the only logical enforcement point
> is HDFS or S3.
>
> S3 is a shared service hosted by AWS, and it doesn't provide any hooks for
> third-party extensions. This makes it difficult for Ranger to embed its
> plugin within S3. Currently, the only option is for Ranger to manage the S3
> ACLs, which would require some work on the Ranger side.
>
> If you have any suggestions for managing or enforcing permissions in S3,
> let's discuss them in RANGER-1300. That would be very helpful for everyone.
>
> Thanks
>
> Bosco
>
> On 2/2/17, 11:28 PM, "Abhishek Somani" <[email protected]> wrote:
>
>     Hi,
>
>     I am currently evaluating Ranger for Hive authorization of tables
>     with data residing in S3. With reference to
>     https://issues.apache.org/jira/browse/RANGER-1300, can someone please
>     explain what the current S3 support in Ranger is? Does Ranger (primarily
>     focused on Hive authorization) work at all for tables backed by data in
>     S3? I am sorry, but in my few searches I have not been able to find
>     relevant documentation.
>
>
>     Thanks,
>     Abhishek
