Re: Securing Spark Job on Cluster
Spark spills to the directories configured by spark.local.dir:
http://spark.apache.org/docs/latest/configuration.html

> On 28. Apr 2017, at 14:45, Shashi Vishwakarma <shashi.vish...@gmail.com> wrote:
>
> Hi All
>
> I was dealing with a Spark requirement where the client (a banking client, where security is a major concern) needs all Spark processing to happen securely.
>
> For example, all communication between the Spark client and server (driver and executor communication) should be over a secure channel. Even when Spark spills to disk based on the storage level (Mem+Disk), the data should not be written in unencrypted form on the local disk, or there should be some workaround to prevent the spill.
>
> I did some research but could not find any concrete solution. Let me know if someone has done this.
>
> Any guidance would be a great help.
>
> Thanks
> Shashi
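Pointing spark.local.dir at an encrypted mount can be combined with Spark's own local-disk I/O encryption (available from Spark 2.1). A spark-defaults.conf sketch; the mount path is an example, and the property names are taken from the Spark configuration page linked above:

```properties
# spark-defaults.conf (sketch; /encrypted/spark is a hypothetical mount point)
spark.local.dir                  /encrypted/spark/local
# Encrypt temp/shuffle data Spark itself writes to local disk (Spark 2.1+)
spark.io.encryption.enabled      true
spark.io.encryption.keySizeBits  128
```

With spark.io.encryption.enabled, spilled blocks are encrypted even if the underlying disk is not.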
Re: Securing Spark Job on Cluster
Yes, I am using HDFS. Just trying to understand a couple of points.

There are two kinds of encryption required:

1. Data in motion - this can be achieved by enabling SSL:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_spark-component-guide/content/spark-encryption.html

2. Data at rest - HDFS encryption can be applied.

Apart from this, when Spark executes a job, every disk available on every node would need to be encrypted. I can have multiple disks on each node, and encrypting all of them could be a costly operation. Therefore I was trying to identify which folders Spark can spill data to during job execution. Once those folders are identified, only the specific disks behind them need to be encrypted.

Thanks
Shashi
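For the data-at-rest half, HDFS transparent encryption is configured per directory (an "encryption zone") rather than per disk. A sketch of the usual steps, assuming a Hadoop KMS is already set up; the key and path names are examples:

```shell
# Create a key in the KMS, then mark an HDFS directory as an encryption zone
hadoop key create sparkKey
hdfs dfs -mkdir -p /user/spark/secure
hdfs crypto -createZone -keyName sparkKey -path /user/spark/secure
hdfs crypto -listZones    # verify the zone was created
```

Anything Spark writes under that path is then encrypted at rest by HDFS, independently of the local-disk spill question.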
Re: Securing Spark Job on Cluster
Why don't you use whole-disk encryption? Are you using HDFS?
Re: Securing Spark Job on Cluster
Agreed, Jörn. Disk encryption is one option that will help to secure the data, but how do I know at which locations Spark spills temp files, shuffle data, and application data?

Thanks
Shashi
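The locations Spark spills to are resolved in a fixed precedence order: a cluster-manager-provided env var (LOCAL_DIRS on YARN) overrides SPARK_LOCAL_DIRS, which overrides spark.local.dir, falling back to /tmp. A small Python sketch of that precedence (the helper is hypothetical and mirrors the documented behaviour, not Spark's actual code):

```python
def spark_local_dirs(conf: dict, env: dict) -> list:
    """Approximate the precedence Spark uses to pick its scratch dirs.

    Hypothetical helper mirroring the documented behaviour of
    spark.local.dir and its environment-variable overrides.
    """
    raw = (
        env.get("LOCAL_DIRS")            # set by YARN for each container
        or env.get("SPARK_LOCAL_DIRS")   # standalone/Mesos env override
        or conf.get("spark.local.dir")   # spark-defaults.conf / SparkConf
        or "/tmp"                        # JVM java.io.tmpdir default
    )
    # spark.local.dir accepts a comma-separated list of directories
    return [d.strip() for d in raw.split(",") if d.strip()]

# Only the resolved directories need encrypted disks behind them:
print(spark_local_dirs({"spark.local.dir": "/encrypted/spark"}, {}))
# → ['/encrypted/spark']
```

Note that on YARN, LOCAL_DIRS wins, so the NodeManager's yarn.nodemanager.local-dirs must also point at the encrypted disks.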
Re: Securing Spark Job on Cluster
You can use disk encryption as provided by the operating system. Additionally, you may think about shredding disks once they are no longer used.
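A minimal sketch of the OS-level approach on Linux, using LUKS/dm-crypt; the device and mount point are hypothetical, and this would be run once on each worker node before pointing spark.local.dir at the mount:

```shell
# WARNING: luksFormat destroys any existing data on the device
cryptsetup luksFormat /dev/sdb1
cryptsetup open /dev/sdb1 spark_scratch
mkfs.ext4 /dev/mapper/spark_scratch
mkdir -p /encrypted/spark
mount /dev/mapper/spark_scratch /encrypted/spark
```

This encrypts everything written to that disk, including Spark spill files, at the cost of managing the LUKS passphrase or key on every node.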
Re: Securing Spark Job on Cluster
Kerberos is not an Apache project. Kerberos provides a way to do authentication, but it does not provide data security.
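To make that distinction concrete: authentication (who may connect) and wire encryption are switched on by separate Spark properties. A hedged spark-defaults.conf sketch using properties from the Spark security documentation; note spark.network.crypto.enabled requires Spark 2.2+:

```properties
# Authentication between Spark processes (shared secret; managed by YARN
# in yarn-cluster deployments)
spark.authenticate                       true
# AES-based encryption of RPC traffic between driver and executors (2.2+)
spark.network.crypto.enabled             true
# Older SASL-based wire encryption, the pre-2.2 alternative
spark.authenticate.enableSaslEncryption  true
```

Kerberos fits underneath this as the mechanism that authenticates users to YARN and HDFS; the properties above cover the Spark-internal channels.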
Re: Securing Spark Job on Cluster
Hi Shashi,

Based on your requirement for securing data, we can use Apache Kerberos, or we could use the security features in Spark.