Re: Securing Spark Job on Cluster

2017-04-28 Thread Mark Hamstra
spark.local.dir

http://spark.apache.org/docs/latest/configuration.html
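For example, the spill location can be pointed at a dedicated (encrypted) mount
via spark-defaults.conf — a minimal sketch; the property is the one documented on
the configuration page above, and the path is a placeholder:

```properties
# spark-defaults.conf (sketch): send shuffle/spill scratch data to an
# encrypted mount instead of the default /tmp. "/encrypted/scratch" is a
# placeholder; multiple disks can be given as a comma-separated list.
spark.local.dir  /encrypted/scratch/spark
```

Note that per the documentation this is overridden by the SPARK_LOCAL_DIRS
environment variable (standalone/Mesos) and by yarn.nodemanager.local-dirs on
YARN.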



Re: Securing Spark Job on Cluster

2017-04-28 Thread Shashi Vishwakarma
Yes, I am using HDFS. Just trying to understand a couple of points.

There are two kinds of encryption required:

1. Data in Motion - This can be achieved by enabling SSL:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_spark-component-guide/content/spark-encryption.html

2. Data at Rest - HDFS encryption can be applied.

Apart from this, when Spark executes a job, every disk available on every
node would need to be encrypted.

I can have multiple disks on each node, and encrypting all of them could be
a costly operation. Therefore I was trying to identify which folders Spark
can spill data to during job execution.

Once those locations are identified, only the specific disks backing them
need to be encrypted.

Thanks
Shashi
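(For reference, a sketch based on the Spark 2.x documentation — the path is a
placeholder — of the properties that control where this scratch data lands,
plus the newer option of letting Spark encrypt those temporary files itself
rather than encrypting whole disks:)

```properties
# spark-defaults.conf (sketch)
# Scratch space for shuffle files, spills and disk-cached blocks
# (standalone/Mesos; on YARN, yarn.nodemanager.local-dirs applies instead)
spark.local.dir                /encrypted/scratch/spark
# Spark 2.1+ can encrypt these temporary shuffle/spill files itself;
# typically used together with spark.authenticate
spark.io.encryption.enabled    true
spark.authenticate             true
```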






Re: Securing Spark Job on Cluster

2017-04-28 Thread Jörn Franke
Why don't you use whole disk encryption?
Are you using HDFS?



Re: Securing Spark Job on Cluster

2017-04-28 Thread Shashi Vishwakarma
Agreed, Jörn. Disk encryption is one option that will help to secure data,
but how do I know to which locations Spark spills temp files, shuffle data,
and application data?

Thanks
Shashi
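(For reference, Spark's local-directory resolution can be sketched as a simple
precedence: the SPARK_LOCAL_DIRS environment variable, then spark.local.dir,
then /tmp — a simplified model assuming standalone/Mesos mode; on YARN,
yarn.nodemanager.local-dirs takes over. The function name below is ours, not
Spark's:)

```python
import os

def spark_scratch_dirs(spark_local_dir=None):
    """Approximate Spark's local-dir resolution (standalone/Mesos sketch).

    Precedence: SPARK_LOCAL_DIRS env var > spark.local.dir > /tmp.
    These are the directories that would need to sit on encrypted mounts.
    """
    env = os.environ.get("SPARK_LOCAL_DIRS")
    if env:
        return [d.strip() for d in env.split(",")]
    if spark_local_dir:
        return [d.strip() for d in spark_local_dir.split(",")]
    return ["/tmp"]

print(spark_scratch_dirs("/encrypted/a,/encrypted/b"))  # ['/encrypted/a', '/encrypted/b']
```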



Re: Securing Spark Job on Cluster

2017-04-28 Thread Jörn Franke
You can use disk encryption as provided by the operating system.
Additionally, you may consider shredding disks once they are no longer in
use.

> On 28. Apr 2017, at 14:45, Shashi Vishwakarma  
> wrote:
> 
> Hi All
> 
> I was dealing with a Spark requirement here where a client (e.g. a banking 
> client, where security is a major concern) needs all Spark processing to 
> happen securely.
> 
> For example, all communication between the Spark client and server (driver 
> & executor communication) should be over a secure channel. Even when Spark 
> spills to disk based on storage level (Mem+Disk), the data should not be 
> written to local disk unencrypted, or there should be some workaround to 
> prevent the spill.
> 
> I did some research but could not find any concrete solution. Let me know 
> if someone has done this.
> 
> Any guidance would be a great help.
> 
> Thanks
> Shashi

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Securing Spark Job on Cluster

2017-04-28 Thread Shashi Vishwakarma
Kerberos is not an Apache project. Kerberos provides a way to do
authentication, but it does not by itself provide data security
(encryption).

>


Re: Securing Spark Job on Cluster

2017-04-28 Thread veera satya nv Dantuluri
Hi Shashi,

Based on your requirement for securing data, we could use Apache Kerberos,
or we could use the security features in Spark.
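(If going the Spark-native route, the built-in knobs look roughly like the
sketch below — property names are from the Spark 2.x security documentation,
and the secret value is a placeholder:)

```properties
# spark-defaults.conf (sketch)
spark.authenticate         true
spark.authenticate.secret  <shared-secret-placeholder>
# TLS for Spark's communication channels (see the spark.ssl.* namespace)
spark.ssl.enabled          true
```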



