Re: Spark and S3 server side encryption

Kohler, Curt E (ELS-STL) Wed, 28 Jan 2015 07:17:49 -0800

So, following up on your suggestion, I'm still having some problems getting the 
configuration changes recognized when my job runs.



I've added jets3t.properties to the root of my application jar file that I
submit to Spark (via spark-submit).

I've verified that my jets3t.properties is at the root of my application
jar by executing jar tf app.jar.

I submit my job to the cluster with the following command.

nohup ./bin/spark-submit --verbose --jars lib/app.jar --master
spark://master-amazonaws.com:7077  --class com.elsevier.spark.SparkSync
lib/app.jar > out.log &



In my mainline of app.jar, I also added the following code:


log.info(System.getProperty("java.class.path"));
InputStream in =
SparkSync.class.getClassLoader().getResourceAsStream("jets3t.properties");
log.info(getStringFromInputStream(in));

And I can see that the jets3t.properties I provided is found because it
outputs:

s3service.server-side-encryption=AES256

It's almost as if the hadoop/jets3t piece has already been initialized and
is ignoring my jets3t.properties.
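
One way to narrow this down might be to list every copy of jets3t.properties
that each class loader can see, rather than only the first match. The sketch
below uses only JDK calls; the class name ResourceProbe and the helper locate
are hypothetical, and it assumes (without having confirmed it) that the
hadoop/jets3t code resolves the file through a different loader than the
application class does.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper, not part of Spark, Hadoop, or jets3t.
public class ResourceProbe {

    /** Returns every location at which the given loader can see the named resource. */
    public static List<URL> locate(ClassLoader loader, String name) {
        // A null loader stands for the bootstrap loader; fall back to the system loader.
        ClassLoader effective = (loader != null) ? loader : ClassLoader.getSystemClassLoader();
        try {
            return new ArrayList<>(Collections.list(effective.getResources(name)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = "jets3t.properties";
        // Compare what the three usual loaders can see; more than one URL means
        // several copies are on the classpath and the first one listed wins.
        System.out.println("context loader: "
                + locate(Thread.currentThread().getContextClassLoader(), name));
        System.out.println("class loader:   "
                + locate(ResourceProbe.class.getClassLoader(), name));
        System.out.println("system loader:  "
                + locate(ClassLoader.getSystemClassLoader(), name));
    }
}
```

If the system loader prints an empty list while the class loader finds the
file, the properties file is only visible through the application jar's
loader, which would be consistent with the behaviour described above.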

I can get this all working inside of Eclipse by including the folder
containing my jets3t.properties.  But, I can't get things working when
trying to submit this to a spark stand-alone cluster.

Any insights would be appreciated.

________________________________
From: Thomas Demoor <thomas.dem...@amplidata.com>
Sent: Tuesday, January 27, 2015 4:41 AM
To: Kohler, Curt E (ELS-STL)
Cc: user@spark.apache.org
Subject: Re: Spark and S3 server side encryption

Spark uses the Hadoop filesystems.

I assume you are trying to use s3n://, which, under the hood, uses the 
third-party jets3t library. It is configured through the jets3t.properties 
file (google "hadoop s3n jets3t"), which you should put on Spark's classpath. 
The setting you are looking for is s3service.server-side-encryption.

The latest version of Hadoop (2.6) introduces a new and improved s3a:// 
filesystem, which has the official SDK from Amazon under the hood.


On Mon, Jan 26, 2015 at 10:01 PM, curtkohler 
<c.koh...@elsevier.com> wrote:
We are trying to create a Spark job that writes out a file to S3 and
leverages S3's server-side encryption for sensitive data. Typically this is
accomplished by setting the appropriate header on the PUT request, but it
isn't clear whether this capability is exposed in the Spark/Hadoop APIs.
Does anyone have any suggestions?





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-and-S3-server-side-encryption-tp21377.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

