Re: Spark and S3 server side encryption
"fs.s3a.server-side-encryption-algorithm" is honored by the s3a support in Hadoop 2.6.0+ as well.

Cheers

On Thu, Jan 29, 2015 at 6:51 AM, Danny wrote:
> On Spark 1.2.0 you have the "s3a" library to work with S3, and there is a
> config param named "fs.s3a.server-side-encryption-algorithm":
>
> https://github.com/Aloisius/hadoop-s3a
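[Editor's sketch] A minimal illustration of what this could look like at submit time, reusing the master URL, class, and jar names from this thread. It assumes the cluster runs Hadoop 2.6.0+ with s3a on the classpath, and relies on Spark copying "spark.hadoop.*" properties into the Hadoop Configuration:

```shell
# Hypothetical sketch: request SSE-S3 (AES256) for s3a output on a per-job basis.
# Spark copies any "spark.hadoop.*" property into the Hadoop Configuration
# used by the job, so no cluster-wide core-site.xml change is needed.
./bin/spark-submit \
  --master spark://master-amazonaws.com:7077 \
  --conf "spark.hadoop.fs.s3a.server-side-encryption-algorithm=AES256" \
  --class com.elsevier.spark.SparkSync \
  lib/app.jar
```

Anything the job then writes to an s3a:// path should carry the AES256 server-side-encryption header, subject to the s3a fixes noted elsewhere in this thread.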
Re: Spark and S3 server side encryption
On Spark 1.2.0 you have the "s3a" library to work with S3, and there is a config param named "fs.s3a.server-side-encryption-algorithm":

https://github.com/Aloisius/hadoop-s3a
Re: Spark and S3 server side encryption
I have been trying to work around a similar problem with my Typesafe config *.conf files seemingly not appearing on the executors. (Though now that I think about it, it's not because the files are absent from the JAR, but because the -Dconf.resource system property I pass to the master obviously doesn't get relayed to the workers.)

What happens if you do something like this?

nohup ./bin/spark-submit --verbose --jars lib/app.jar \
  --master spark://master-amazonaws.com:7077 \
  --class com.elsevier.spark.SparkSync \
  --conf "spark.executor.extraJavaOptions=-Ds3service.server-side-encryption=AES256" \
  lib/app.jar > out.log &

(I bet this will fix my problem too.)

On Wed, Jan 28, 2015 at 10:17:09 AM, Kohler, Curt E (ELS-STL) <c.koh...@elsevier.com> wrote:
> So, following up on your suggestion, I'm still having some problems
> getting the configuration changes recognized when my job runs.
>
> I've added jets3t.properties to the root of my application jar file that I
> submit to Spark (via spark-submit), and I've verified that it is at the
> root of the jar by executing jar tf app.jar.
>
> In the mainline of app.jar I load jets3t.properties off the classpath and
> log it, and I can see that the copy I provided is found because it outputs:
> s3service.server-side-encryption=AES256
>
> It's almost as if the hadoop/jets3t piece has already been initialized and
> is ignoring my jets3t.properties.
>
> I can get this all working inside of Eclipse by including the folder
> containing my jets3t.properties, but I can't get things working when
> trying to submit this to a Spark stand-alone cluster.
>
> Any insights would be appreciated.
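[Editor's sketch] A variant of the command above, on the assumption that the jets3t code also runs in the driver JVM and therefore needs the property there too. spark.driver.extraJavaOptions is the driver-side counterpart of the executor setting; everything else is unchanged from the command in this thread:

```shell
# Hypothetical sketch: pass the jets3t SSE system property to BOTH the
# executor JVMs and the driver JVM, since either side may open the S3
# connection that performs the write.
nohup ./bin/spark-submit --verbose --jars lib/app.jar \
  --master spark://master-amazonaws.com:7077 \
  --class com.elsevier.spark.SparkSync \
  --conf "spark.executor.extraJavaOptions=-Ds3service.server-side-encryption=AES256" \
  --conf "spark.driver.extraJavaOptions=-Ds3service.server-side-encryption=AES256" \
  lib/app.jar > out.log &
```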
Re: Spark and S3 server side encryption
So, following up on your suggestion, I'm still having some problems getting the configuration changes recognized when my job runs.

I've added jets3t.properties to the root of my application jar file that I submit to Spark (via spark-submit). I've verified that my jets3t.properties is at the root of my application jar by executing jar tf app.jar.

I submit my job to the cluster with the following command:

nohup ./bin/spark-submit --verbose --jars lib/app.jar \
  --master spark://master-amazonaws.com:7077 \
  --class com.elsevier.spark.SparkSync \
  lib/app.jar > out.log &

In the mainline of app.jar, I also added the following code:

log.info(System.getProperty("java.class.path"));
InputStream in = SparkSync.class.getClassLoader().getResourceAsStream("jets3t.properties");
log.info(getStringFromInputStream(in));

And I can see that the jets3t.properties I provided is found because it outputs:

s3service.server-side-encryption=AES256

It's almost as if the hadoop/jets3t piece has already been initialized and is ignoring my jets3t.properties.

I can get this all working inside of Eclipse by including the folder containing my jets3t.properties. But I can't get things working when trying to submit this to a Spark stand-alone cluster.

Any insights would be appreciated.

From: Thomas Demoor
Sent: Tuesday, January 27, 2015 4:41 AM
To: Kohler, Curt E (ELS-STL)
Cc: user@spark.apache.org
Subject: Re: Spark and S3 server side encryption

Spark uses the Hadoop filesystems.

I assume you are trying to use s3n://, which, under the hood, uses the third-party jets3t library. It is configured through the jets3t.properties file (google "hadoop s3n jets3t"), which you should put on Spark's classpath. The setting you are looking for is s3service.server-side-encryption.

The latest version of Hadoop (2.6) introduces a new and improved s3a:// filesystem which has the official SDK from Amazon under the hood.

On Mon, Jan 26, 2015 at 10:01 PM, curtkohler <c.koh...@elsevier.com> wrote:
> We are trying to create a Spark job that writes out a file to S3 that
> leverages S3's server-side encryption for sensitive data. Typically this is
> accomplished by setting the appropriate header on the put request, but it
> isn't clear whether this capability is exposed in the Spark/Hadoop APIs.
> Does anyone have any suggestions?
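[Editor's sketch] Given the symptom above (the file is visibly in the jar yet apparently ignored at runtime), one quick sanity check from the shell, using the app.jar name and property value from this thread. This only confirms what is packaged; it cannot tell you which copy of jets3t.properties the classloader resolves first on the cluster:

```shell
# Dump the jets3t.properties packaged at the jar root, if any.
unzip -p lib/app.jar jets3t.properties

# The line reported in this thread would be:
#   s3service.server-side-encryption=AES256

# Also worth checking: does any other jar on Spark's classpath ship its own
# jets3t.properties that shadows yours? (paths here are hypothetical)
for j in lib/*.jar; do
  unzip -l "$j" 2>/dev/null | grep -q 'jets3t.properties' && echo "$j"
done
```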
Re: Spark and S3 server side encryption
Adding on to what Thomas said: there have been a few bug fixes for s3a since Hadoop 2.6.0 was released. One example is HADOOP-11446. The fixes will be in Hadoop 2.7.0.

Cheers

On Jan 27, 2015, at 1:41 AM, Thomas Demoor wrote:
> Spark uses the Hadoop filesystems.
>
> I assume you are trying to use s3n://, which, under the hood, uses the
> third-party jets3t library. It is configured through the jets3t.properties
> file (google "hadoop s3n jets3t"), which you should put on Spark's
> classpath. The setting you are looking for is s3service.server-side-encryption.
>
> The latest version of Hadoop (2.6) introduces a new and improved s3a://
> filesystem which has the official SDK from Amazon under the hood.
Re: Spark and S3 server side encryption
Spark uses the Hadoop filesystems.

I assume you are trying to use s3n://, which, under the hood, uses the third-party jets3t library. It is configured through the jets3t.properties file (google "hadoop s3n jets3t"), which you should put on Spark's classpath. The setting you are looking for is s3service.server-side-encryption.

The latest version of Hadoop (2.6) introduces a new and improved s3a:// filesystem which has the official SDK from Amazon under the hood.

On Mon, Jan 26, 2015 at 10:01 PM, curtkohler wrote:
> We are trying to create a Spark job that writes out a file to S3 that
> leverages S3's server-side encryption for sensitive data. Typically this is
> accomplished by setting the appropriate header on the put request, but it
> isn't clear whether this capability is exposed in the Spark/Hadoop APIs.
> Does anyone have any suggestions?
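[Editor's sketch] Putting the two pieces of advice above together, a minimal way to create the jets3t.properties file and place it on Spark's classpath for a standalone cluster. The conf/ directory name is hypothetical, and this assumes the same directory exists at the same path on every worker (spark.executor.extraClassPath is the executor-side classpath setting):

```shell
# jets3t.properties with the one setting this thread is after.
mkdir -p conf
echo 's3service.server-side-encryption=AES256' > conf/jets3t.properties

# Put the directory on both the driver and executor classpaths so jets3t
# finds the file before it initializes, rather than only inside app.jar.
./bin/spark-submit \
  --master spark://master-amazonaws.com:7077 \
  --driver-class-path conf \
  --conf "spark.executor.extraClassPath=conf" \
  --class com.elsevier.spark.SparkSync \
  lib/app.jar
```

Prepending the directory this way may sidestep the "already initialized" behavior reported earlier in the thread, since the file is visible on the base classpath instead of only inside the application jar.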