[
https://issues.apache.org/jira/browse/SPARK-13979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200183#comment-15200183
]
Mitesh commented on SPARK-13979:
--------------------------------
I'm seeing this too. It's really annoying because I set the S3 access and secret
keys in all the places that the docs specify:
{noformat}
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", ..)
sparkConf.set("spark.hadoop.fs.s3n.awsAccessKeyId", ..)
sparkConf.set("spark.hadoop.cloneConf", "true")
<spark conf dir>/core-site.xml <property><name>fs.s3n.awsAccessKeyId</name>
<spark conf dir>/spark-env.sh export AWS_ACCESS_KEY_ID=...
{noformat}
None of that seems to work. If I kill a running executor, it comes back up and
doesn't have the keys anymore.
> Killed executor is respawned without AWS keys in standalone spark cluster
> -------------------------------------------------------------------------
>
> Key: SPARK-13979
> URL: https://issues.apache.org/jira/browse/SPARK-13979
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.5.2
> Environment: I'm using Spark 1.5.2 with Hadoop 2.7 and running
> experiments on a simple standalone cluster:
> 1 master
> 2 workers
> All Ubuntu 14.04 with Java 8/Scala 2.10
> Reporter: Allen George
>
> I'm having a problem where respawning a failed executor during a job that
> reads/writes parquet on S3 causes subsequent tasks to fail because of missing
> AWS keys.
> h4. Setup:
> I'm using Spark 1.5.2 with Hadoop 2.7 and running experiments on a simple
> standalone cluster:
> 1 master
> 2 workers
> My application is co-located on the master machine, while the two workers are
> on two other machines (one worker per machine). All machines are running in
> EC2. I've configured my setup so that my application executes its tasks on two
> executors (one executor per worker).
> h4. Application:
> My application reads and writes parquet files on S3. I set the AWS keys on
> the SparkContext by doing:
> {noformat}
> val sc = new SparkContext()
> val hadoopConf = sc.hadoopConfiguration
> hadoopConf.set("fs.s3n.awsAccessKeyId", "SOME_KEY")
> hadoopConf.set("fs.s3n.awsSecretAccessKey", "SOME_SECRET")
> {noformat}
> At this point I'm done, and I go ahead and use "sc".
> h4. Issue:
> I can read and write parquet files without a problem with this setup. *BUT*
> if an executor dies during a job and is respawned by a worker, tasks fail
> with the following error:
> "Caused by: java.lang.IllegalArgumentException: AWS Access Key ID and Secret
> Access Key must be specified as the username or password (respectively) of a
> s3n URL, or by setting the {{fs.s3n.awsAccessKeyId}} or
> {{fs.s3n.awsSecretAccessKey}} properties (respectively)."
> h4. Basic analysis:
> I think I've traced this down to the following:
> {{SparkHadoopUtil}} is initialized with an empty {{SparkConf}}. Later, classes
> like {{DataSourceStrategy}} simply call {{SparkHadoopUtil.get.conf}} and
> access the (now invalid; missing various properties) {{HadoopConfiguration}}
> that's built from this empty {{SparkConf}} object. It's unclear to me why
> this is done, and it seems that the code as written would cause broken
> results anytime callers use {{SparkHadoopUtil.get.conf}} directly.
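
To make the analysis quoted above concrete, here is a minimal sketch of the failure mode as I understand it. It is a simplified model, not the actual Spark source: plain Scala maps stand in for {{SparkConf}} and the Hadoop configuration, and the names {{HadoopConfSketch}} and {{newConfiguration}}, as well as the "parquet-on-s3" app name, are made up for illustration. The only behavior it assumes is that "spark.hadoop.*" entries are copied into the Hadoop configuration with the prefix stripped.

```scala
// Simplified model of the behavior described in the issue. Plain Maps stand in
// for SparkConf and the Hadoop Configuration; this is NOT the Spark source.
object HadoopConfSketch {
  private val Prefix = "spark.hadoop."

  // Copy every "spark.hadoop.<key>" entry into the Hadoop-side map as "<key>".
  // Entries without the prefix are ignored.
  def newConfiguration(sparkConf: Map[String, String]): Map[String, String] =
    sparkConf.collect {
      case (key, value) if key.startsWith(Prefix) =>
        key.stripPrefix(Prefix) -> value
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical driver-side settings, using the placeholder values from the issue.
    val driverConf = Map(
      "spark.app.name" -> "parquet-on-s3",
      "spark.hadoop.fs.s3n.awsAccessKeyId" -> "SOME_KEY",
      "spark.hadoop.fs.s3n.awsSecretAccessKey" -> "SOME_SECRET")

    // Built from the populated conf: the S3 keys are present.
    val fromDriverConf = newConfiguration(driverConf)
    // Built from an empty conf, as in the analysis: the S3 keys are missing.
    val fromEmptyConf = newConfiguration(Map.empty)

    println(fromDriverConf.get("fs.s3n.awsAccessKeyId")) // prints Some(SOME_KEY)
    println(fromEmptyConf.get("fs.s3n.awsAccessKeyId"))  // prints None
  }
}
```

If the respawned executor really does build its Hadoop configuration from an empty conf, the second lookup is what the S3 filesystem would see, which would match the "must be specified" error in the description.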