Re: Spark Security
Hello,

My hard drive has about 80 GB of space left on it, and the RAM is about 12 GB. I am not sure of the size of the .tsv file, but it will most likely be around 30 GB.

Thanks,

Wilbert Seoane

On Fri, May 29, 2020 at 5:03 PM Anwar AliKhan wrote:

> What is the size of your .tsv file sir?
> What is the size of your local hard drive sir?
>
> Regards
>
> Wali Ahaad
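One way to answer the two sizing questions before loading anything is to compare the file size with the free disk space. The following is a minimal sketch using only the Python standard library; the function name `fits_on_disk`, its parameters, and the 2x headroom factor are illustrative assumptions, not something Spark or sparklyr prescribes.

```python
import os
import shutil


def fits_on_disk(data_path, work_dir=".", headroom=2.0):
    """Compare a file's size with the free space under work_dir.

    headroom is a hypothetical safety factor: the check passes only if
    headroom times the file size still fits in the free space.
    Returns (file_bytes, free_bytes, ok).
    """
    file_bytes = os.path.getsize(data_path)          # size of the data file
    free_bytes = shutil.disk_usage(work_dir).free    # free space on that volume
    ok = file_bytes * headroom <= free_bytes
    return file_bytes, free_bytes, ok
```

Calling `fits_on_disk("client_data.tsv")` (a hypothetical file name) returns the two sizes in bytes along with a pass/fail flag, which covers both numbers asked for in the thread.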
Re: Spark Security
What is the size of your .tsv file sir?
What is the size of your local hard drive sir?

Regards

Wali Ahaad
Re: Spark Security
If you load a file on your computer, that is unrelated to Spark. Whatever you load via Spark APIs will at some point live in memory on the Spark cluster, or in the storage you back it with if you store it. Whether the cluster and storage are secure (e.g. ACLs / auth enabled) is up to whoever runs the cluster.

On Fri, May 29, 2020 at 1:54 PM wrote:

> Hi Sean
>
> I mean that I won’t be opening up my client to any data breaches or anything like that by connecting to Spark and loading in their data using sparklyr in RStudio.
>
> Connecting with Spark and loading in a .tsv file on my local computer is secure, correct?
>
> Thanks
>
> Wilbert J. Seoane
Re: Spark Security
What do you mean by secure here?
Spark Security
Hello,

I plan to load in a local .tsv file from my hard drive using sparklyr (an R package). I have figured out how to do this already on small files.

When I decide to receive my client’s large .tsv file, can I be confident that loading in data this way will be secure? I know that this creates a Spark connection to help process the data more quickly, but I want to verify that the data will be secure after loading it with the Spark connection and sparklyr.

Thanks,

Wilbert J. Seoane

Sent from iPhone
-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: Spark security
On 13 Oct 2016, at 14:40, Mendelson, Assaf wrote:

> Hi,
> We have a spark cluster and we wanted to add some security for it. I was looking at the documentation (in http://spark.apache.org/docs/latest/security.html) and had some questions.
>
> 1. Do all executors listen on the same blockManager port? For example, in yarn there are multiple executors per node, do they all listen to the same port?

On YARN the executors will come up on their own ports.

> 2. Are ports defined in earlier versions (e.g. http://spark.apache.org/docs/1.6.1/security.html) and removed in the latest (such as spark.executor.port and spark.fileserver.port) gone, and can they be blocked?
>
> 3. If I define multiple workers per node in spark standalone mode, how do I set the different ports for each worker (there is only one spark.worker.ui.port / SPARK_WORKER_WEBUI_PORT definition. Do I have to start each worker separately to configure a port?) The same is true for the worker port (SPARK_WORKER_PORT)
>
> 4. Is it possible to encrypt the logs instead of just limiting access to the log directory with permissions?

If writing to HDFS on a Hadoop 2.7+ cluster, you can use HDFS Encryption At Rest to encrypt the data on the disks. If you are talking to S3 with the Hadoop 2.8+ libraries (not officially shipping), you can use S3 server-side encryption with AWS-managed keys too.

> 5. Is the communication between the servers encrypted (e.g. using ssh?)

You can enable this:

https://spark.apache.org/docs/latest/security.html
https://spark.apache.org/docs/latest/configuration.html#security

spark.network.sasl.serverAlwaysEncrypt true
spark.authenticate.enableSaslEncryption true

I *believe* that encrypted shuffle comes with 2.1: https://issues.apache.org/jira/browse/SPARK-5682

As usual, look in the source to really understand.

There are various ways to interact with Spark and within it; you need to make sure they are all secured against malicious users:

- web UI. On YARN, you can use SPNEGO to Kerberos-auth the YARN RM proxy; the Spark UI will 302 all direct requests to its web UI back to that proxy. Communications behind the scenes between the RM and the Spark UI will not, AFAIK, be encrypted/authed.
- spark-driver executor comms
- bulk data exchange between drivers
- shuffle service in executor, or hosted inside YARN node managers
- spark-filesystem communications
- spark to other data source communications (Kafka, etc.)

You're going to have to go through them all and do the checklist.

As is usual in an open source project, documentation improvements are always welcome. There is a good security doc in the Spark source, but I'm sure extra contributions will be welcome.

> 6. Are there any additional best practices beyond what is written in the documentation?
> Thanks,

In a YARN cluster, Kerberos is mandatory if you want any form of security. Sorry.
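To make the two SASL settings from the reply concrete, here is a minimal Python sketch that renders them either as spark-defaults.conf lines or as spark-submit `--conf` arguments. The helper names are hypothetical, and `spark.authenticate` is added as an assumption, since the SASL encryption options only take effect when authentication is enabled. These are the option names from the documentation of that era; check the security page for your Spark version before relying on them.

```python
# Settings named in the reply, plus spark.authenticate, which is an
# assumption added here because SASL encryption requires authentication.
SECURITY_CONF = {
    "spark.authenticate": "true",
    "spark.authenticate.enableSaslEncryption": "true",
    "spark.network.sasl.serverAlwaysEncrypt": "true",
}


def to_spark_defaults(conf):
    """Render settings as spark-defaults.conf lines: key, space, value."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


def to_submit_args(conf):
    """Render settings as a flat list of --conf key=value arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args
```

`print(to_spark_defaults(SECURITY_CONF))` gives text that can be appended to conf/spark-defaults.conf, while `to_submit_args(SECURITY_CONF)` can be spliced into a spark-submit command line. Keeping the settings in one dictionary makes it harder for the two forms to drift apart.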
RE: Spark security
Can anyone assist with this?

From: Mendelson, Assaf [mailto:assaf.mendel...@rsa.com]
Sent: Thursday, October 13, 2016 3:41 PM
To: user@spark.apache.org
Subject: Spark security
Spark security
Hi,

We have a spark cluster and we wanted to add some security for it. I was looking at the documentation (in http://spark.apache.org/docs/latest/security.html) and had some questions.

1. Do all executors listen on the same blockManager port? For example, in yarn there are multiple executors per node, do they all listen to the same port?
2. Are ports defined in earlier versions (e.g. http://spark.apache.org/docs/1.6.1/security.html) and removed in the latest (such as spark.executor.port and spark.fileserver.port) gone, and can they be blocked?
3. If I define multiple workers per node in spark standalone mode, how do I set the different ports for each worker (there is only one spark.worker.ui.port / SPARK_WORKER_WEBUI_PORT definition. Do I have to start each worker separately to configure a port?) The same is true for the worker port (SPARK_WORKER_PORT)
4. Is it possible to encrypt the logs instead of just limiting access to the log directory with permissions?
5. Is the communication between the servers encrypted (e.g. using ssh?)
6. Are there any additional best practices beyond what is written in the documentation?

Thanks,
Assaf.