Hi Bosco,

Thank you for your reply. I have a few specific questions related to S3 authorization (with a use case in mind) with HiveServer2 and would be much obliged if you could spare a few moments to answer them.

Please consider the following points on which I base my questions. I might be wrong on these premises and would be grateful for any corrections:

1. Ranger with HiveServer2 will take care of all authorization, and with doAs=false, all S3 credentials used to access data would be those of the HiveServer2 service and not of the user.
2. According to this wiki page, https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization, Hive (in its native authorization) is supposed to do a storage-level (URI) check when a user creates an external table against a location (URI).
3. Hive does this URI-level check (RWX privilege + ownership) for an HDFS URI by checking privileges against the filesystem. If the user has the required privileges, the create table is allowed to pass.

My questions are:

1. Does Ranger also do this URI check?
2. Does it do this check in the case of S3 as well? If yes, how does it manage to check the S3 storage-level permissions of a user?
3. If it does not do such a check today, is this also a requirement that we plan to fulfil via RANGER-1300?

Thanks and much obliged.

PS: I have not taken you up on your suggestion of discussing this on the JIRA because I fear I might divert the intent of the JIRA in my ignorance. Please take a call on whether this discussion is worthy of a JIRA comment, and I will add it there.

Thanks,
Abhishek

On Sat, Feb 4, 2017 at 1:09 AM, Don Bosco Durai <[email protected]> wrote:
> Abhishek
>
> If you are using Ranger for Hive authorization, then we recommend that you
> allow users to access Hive using HiveServer2 (beeline or JDBC) only. In
> this way, you can set the access permissions on tables, columns, or even
> row level from Ranger and it would be enforced at the HiveServer2 server.
>
> On the HiveServer2 service, you need to set the configuration for
> doAs=false, which essentially means it would use the (hive) service user
> credentials to access the underlying store (HDFS or S3). This has multiple
> advantages, including limiting the level of permissions you want to give
> the HDFS or S3 layer.
>
> The JIRA RANGER-1300 is primarily to enable authorization for tools which
> access the data layer directly, without any intermediate process. E.g.
> Apache Spark with LLAP. In this, the only logical enforcement point is at
> HDFS or S3.
>
> S3 is a shared service and hosted by AWS. And it doesn't provide any hook
> for 3rd party extension. This makes it difficult for Ranger to embed its
> plugin within S3. Currently, the only option open is for Ranger to manage
> the S3 ACLs. This would require some work to be done on the Ranger side.
>
> If you have any suggestions for managing or enforcing permissions in S3,
> then let's discuss in RANGER-1300. It will be very helpful for everyone.
>
> Thanks
>
> Bosco
>
> On 2/2/17, 11:28 PM, "Abhishek Somani" <[email protected]> wrote:
>
>     Hi,
>
>     I am currently evaluating using Ranger for hive authorization for tables
>     with data residing in s3. With reference to
>     https://issues.apache.org/jira/browse/RANGER-1300, can someone please
>     explain what is the current support for s3 in Ranger. Does Ranger
>     (primarily focused on Hive Authorization) work at all for tables backed
>     with data in s3? I am sorry but in my few searches, I have not been able
>     to find relevant documentation.
>
>     Thanks,
>     Abhishek
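
For reference, the doAs=false setting discussed in this thread is normally expressed through the hive.server2.enable.doAs property. The fragment below is a minimal sketch, assuming a standard HiveServer2 deployment configured through hive-site.xml; the thread itself does not say where or how the property was set, and any S3 credential configuration for the service user is left out.

```xml
<!-- hive-site.xml (fragment), assumed location for this setting -->
<!-- With doAs disabled, HiveServer2 accesses the underlying store (HDFS or S3)
     as the hive service user rather than as the connecting end user. -->
<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>
```

With this in place, the storage layer only needs to grant access to the service user, while per-user table, column, and row-level permissions are enforced at HiveServer2.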
