Re: Ranger-like Security on Spark

2015-09-03 Thread Matei Zaharia
If you run on YARN, you can use Kerberos, be authenticated as the right user, etc., in the same way as MapReduce jobs.

Matei

> On Sep 3, 2015, at 1:37 PM, Daniel Schulz wrote:
>
> Hi,
>
> I really enjoy using Spark. An obstacle to sell it to our clients

Re: Ranger-like Security on Spark

2015-09-03 Thread Daniel Schulz
Hi Matei, Thanks for your answer. My question concerns Spark-on-YARN with simple authentication only, without Kerberos. So when I run Spark on YARN and HDFS, will Spark pass through my HDFS user and only be able to access files I am entitled to read/write? Will it enforce HDFS ACLs and Ranger

Ranger-like Security on Spark

2015-09-03 Thread Daniel Schulz
Hi, I really enjoy using Spark. An obstacle to selling it to our clients currently is the missing Kerberos-like security on a Hadoop cluster with simple authentication. Are there plans, a proposal, or a project to deliver a Ranger plugin or something similar for Spark? The target is to differentiate users

Re: Ranger-like Security on Spark

2015-09-03 Thread Matei Zaharia
Even simple Spark-on-YARN should run as the user that submitted the job, yes, so HDFS ACLs should be enforced. Not sure how it plays with the rest of Ranger.

Matei

> On Sep 3, 2015, at 4:57 PM, Jörn Franke wrote:
>
> Well if it needs to read from hdfs then it will adhere

Re: Ranger-like Security on Spark

2015-09-03 Thread Marcelo Vanzin
On Thu, Sep 3, 2015 at 5:15 PM, Matei Zaharia wrote:
> Even simple Spark-on-YARN should run as the user that submitted the job,
> yes, so HDFS ACLs should be enforced. Not sure how it plays with the rest of
> Ranger.

It's slightly more complicated than that (without

Re: Ranger-like Security on Spark

2015-09-03 Thread Ruslan Dautkhanov
You could define access in Sentry and enable permissions sync with HDFS, so you could just grant access in Hive on a per-database or per-table basis. It should work for Spark too, as Sentry will propagate "grants" to HDFS ACLs.

Re: Ranger-like Security on Spark

2015-09-03 Thread Jörn Franke
Well, if it needs to read from HDFS, then it will adhere to the permissions defined there and/or in Ranger. However, I am not aware that you can protect DataFrames, tables, or streams in general in Spark.

On Thu, Sep 3, 2015 at 21:47, Daniel Schulz wrote:
> Hi