Re: Ranger-like Security on Spark

2015-09-03 Thread Matei Zaharia
If you run on YARN, you can use Kerberos, be authenticated as the right user,
etc., in the same way as MapReduce jobs.

Matei

> On Sep 3, 2015, at 1:37 PM, Daniel Schulz wrote:
> 
> Hi,
> 
> I really enjoy using Spark. An obstacle to selling it to our clients is
> currently the missing Kerberos-like security on a Hadoop cluster with simple
> authentication. Are there plans, a proposal, or a project to deliver a Ranger
> plugin or something similar for Spark? The goal is to differentiate users and
> their privileges when reading and writing data to HDFS. Is Kerberos my only
> option then?
> 
> Kind regards, Daniel.
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> 





Re: Ranger-like Security on Spark

2015-09-03 Thread Daniel Schulz
Hi Matei,

Thanks for your answer.

My question concerns Spark on YARN with simple authentication only, without
Kerberos. When I run Spark on YARN and HDFS, will Spark pass through my HDFS
user and only be able to access files I am entitled to read or write? Will it
also enforce HDFS ACLs and Ranger policies?

Best regards, Daniel.




Ranger-like Security on Spark

2015-09-03 Thread Daniel Schulz
Hi,

I really enjoy using Spark. An obstacle to selling it to our clients is
currently the missing Kerberos-like security on a Hadoop cluster with simple
authentication. Are there plans, a proposal, or a project to deliver a Ranger
plugin or something similar for Spark? The goal is to differentiate users and
their privileges when reading and writing data to HDFS. Is Kerberos my only
option then?

Kind regards, Daniel.



Re: Ranger-like Security on Spark

2015-09-03 Thread Matei Zaharia
Even simple Spark-on-YARN should run as the user that submitted the job, yes, 
so HDFS ACLs should be enforced. Not sure how it plays with the rest of Ranger.

Matei

> On Sep 3, 2015, at 4:57 PM, Jörn Franke wrote:
> 
> Well, if it needs to read from HDFS, then it will adhere to the permissions
> defined there and/or in Ranger. However, I am not aware that you can protect
> DataFrames, tables, or streams in general in Spark.
> 



Re: Ranger-like Security on Spark

2015-09-03 Thread Marcelo Vanzin
On Thu, Sep 3, 2015 at 5:15 PM, Matei Zaharia  wrote:
> Even simple Spark-on-YARN should run as the user that submitted the job,
> yes, so HDFS ACLs should be enforced. Not sure how it plays with the rest of
> Ranger.

It's slightly more complicated than that: without Kerberos, the
underlying process runs as the same user that runs the YARN daemons, but
the connections to HDFS and other Hadoop services identify as the user
who submitted the application. The end effect is what Matei describes,
though. I also do not know how Ranger enforces things.

Also note that "simple authentication" is not secure at all. You're
basically just asking your users to be nice instead of actually
enforcing anything. Any user can tell YARN that they are actually someone
else when starting the application, and YARN will believe them. Just
set "HADOOP_USER_NAME=somebodyelse" and you're good to go!
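
To make that concrete, here is a rough Python sketch of the idea. It is a
toy model, not Hadoop code: the function names and the owner-only permission
check are made up for illustration. It assumes that, under simple
authentication, the client takes its identity from HADOOP_USER_NAME when that
variable is set and otherwise falls back to the OS login name, and that
nothing on the cluster side verifies the claim.

```python
from typing import Mapping


def resolve_simple_auth_user(env: Mapping[str, str], os_user: str) -> str:
    """Hypothetical model of identity selection under simple authentication.

    Assumption: a non-empty HADOOP_USER_NAME overrides the OS login name.
    The name is only claimed, never verified.
    """
    claimed = env.get("HADOOP_USER_NAME")
    return claimed if claimed else os_user


def can_read(file_owner: str, requesting_user: str) -> bool:
    """Toy owner-only check standing in for real HDFS permissions/ACLs."""
    return requesting_user == file_owner


if __name__ == "__main__":
    os_user = "mallory"  # the account that actually launches the job

    honest = resolve_simple_auth_user({}, os_user)
    spoofed = resolve_simple_auth_user({"HADOOP_USER_NAME": "alice"}, os_user)

    # The permission check itself works, but it trusts the claimed name.
    print(can_read("alice", honest))   # False
    print(can_read("alice", spoofed))  # True
```

So the permission checks still run, but they are applied to whatever name the
client chose to present, which is why they only stop well-behaved users.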



-- 
Marcelo




Re: Ranger-like Security on Spark

2015-09-03 Thread Ruslan Dautkhanov
You could define access in Sentry and enable permissions sync with HDFS, so
you could just grant access in Hive on a per-database or per-table basis. It
should work for Spark too, as Sentry will propagate "grants" to HDFS ACLs.

http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/sg_hdfs_sentry_sync.html




-- 
Ruslan Dautkhanov



Re: Ranger-like Security on Spark

2015-09-03 Thread Jörn Franke
Well, if it needs to read from HDFS, then it will adhere to the permissions
defined there and/or in Ranger. However, I am not aware that you can
protect DataFrames, tables, or streams in general in Spark.
