[ https://issues.apache.org/jira/browse/SPARK-13979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200183#comment-15200183 ]

Mitesh commented on SPARK-13979:
--------------------------------

I'm seeing this too. It's really annoying because I set the S3 access and secret 
keys in all of the places the docs specify:

{noformat}
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", ...)
sparkConf.set("spark.hadoop.fs.s3n.awsAccessKeyId", ...)
sparkConf.set("spark.hadoop.cloneConf", "true")
<spark conf dir>/core-site.xml   <property><name>fs.s3n.awsAccessKeyId</name><value>...</value></property>
<spark conf dir>/spark-env.sh    export AWS_ACCESS_KEY_ID=...
{noformat}

None of that seems to work. If I kill a running executor, it comes back up and 
doesn't have the keys anymore.
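
To check what each executor actually sees, a throwaway probe along the lines of the 
sketch below can help. This is only a sketch: {{SparkHadoopUtil}} is a developer API, 
and the property name assumes the s3n connector. A freshly respawned executor would 
be expected to print "<missing>" if the keys were lost:

{noformat}
// Hypothetical probe: ask each executor what it sees for the s3n access key.
// SparkHadoopUtil.get.conf is the same configuration that classes like
// DataSourceStrategy end up reading on the executor side.
import org.apache.spark.deploy.SparkHadoopUtil

sc.parallelize(1 to sc.defaultParallelism, sc.defaultParallelism)
  .mapPartitions { _ =>
    val conf = SparkHadoopUtil.get.conf
    Iterator(conf.get("fs.s3n.awsAccessKeyId", "<missing>"))
  }
  .collect()
  .foreach(println)
{noformat}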

> Killed executor is respawned without AWS keys in standalone spark cluster
> -------------------------------------------------------------------------
>
>                 Key: SPARK-13979
>                 URL: https://issues.apache.org/jira/browse/SPARK-13979
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.5.2
>         Environment: I'm using Spark 1.5.2 with Hadoop 2.7 and running 
> experiments on a simple standalone cluster:
> 1 master
> 2 workers
> All Ubuntu 14.04 with Java 8/Scala 2.10
>            Reporter: Allen George
>
> I'm having a problem where respawning a failed executor during a job that 
> reads/writes parquet on S3 causes subsequent tasks to fail because of missing 
> AWS keys.
> h4. Setup:
> I'm using Spark 1.5.2 with Hadoop 2.7 and running experiments on a simple 
> standalone cluster:
> 1 master
> 2 workers
> My application is co-located on the master machine, while the two workers are 
> on two other machines (one worker per machine). All machines are running in 
> EC2. I've configured my setup so that my application executes its task on two 
> executors (one executor per worker).
> h4. Application:
> My application reads and writes parquet files on S3. I set the AWS keys on 
> the SparkContext by doing:
> {noformat}
> val sc = new SparkContext()
> val hadoopConf = sc.hadoopConfiguration
> hadoopConf.set("fs.s3n.awsAccessKeyId", "SOME_KEY")
> hadoopConf.set("fs.s3n.awsSecretAccessKey", "SOME_SECRET")
> {noformat}
> At this point I'm done, and I go ahead and use "sc".
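> (For concreteness, the Parquet usage is roughly the sketch below; the bucket and 
> paths are placeholders, and {{SQLContext}} is the Spark 1.5-era entry point.)
> {noformat}
> // Rough sketch of the Parquet-on-S3 read/write; bucket and paths are hypothetical.
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> val df = sqlContext.read.parquet("s3n://some-bucket/input/")
> df.write.parquet("s3n://some-bucket/output/")
> {noformat}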
> h4. Issue:
> I can read and write parquet files without a problem with this setup. *BUT* 
> if an executor dies during a job and is respawned by a worker, tasks fail 
> with the following error:
> "Caused by: java.lang.IllegalArgumentException: AWS Access Key ID and Secret 
> Access Key must be specified as the username or password (respectively) of a 
> s3n URL, or by setting the {{fs.s3n.awsAccessKeyId}} or 
> {{fs.s3n.awsSecretAccessKey}} properties (respectively)."
> h4. Basic analysis
> I think I've traced this to the following:
> {{SparkHadoopUtil}} is initialized with an empty {{SparkConf}}. Later, classes 
> like {{DataSourceStrategy}} simply call {{SparkHadoopUtil.get.conf}} and access 
> the Hadoop {{Configuration}} built from that empty {{SparkConf}} object, which is 
> missing various properties (including the AWS keys). It's unclear to me why this 
> is done, and it seems that the code as written would produce broken results any 
> time callers use {{SparkHadoopUtil.get.conf}} directly.
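> To make the failure mode concrete: a Hadoop {{Configuration}} built from a 
> {{SparkConf}} only picks up the {{spark.hadoop.*}} entries carried on the 
> {{SparkConf}} itself, so keys set directly on {{sc.hadoopConfiguration}} after 
> the context is created never reach it. A minimal sketch of that copy step (not 
> the actual Spark source, just the assumed shape):
> {noformat}
> import org.apache.hadoop.conf.Configuration
> import org.apache.spark.SparkConf
> 
> // Sketch: build a Hadoop Configuration from a SparkConf by copying only the
> // spark.hadoop.*-prefixed entries. An empty SparkConf yields a Configuration
> // with no AWS keys, which is what a respawned executor would end up seeing.
> def newHadoopConf(sparkConf: SparkConf): Configuration = {
>   val hadoopConf = new Configuration()
>   sparkConf.getAll.foreach { case (key, value) =>
>     if (key.startsWith("spark.hadoop.")) {
>       hadoopConf.set(key.stripPrefix("spark.hadoop."), value)
>     }
>   }
>   hadoopConf
> }
> {noformat}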



