Hi Marcelo,

I tried setting the properties via SparkConf before instantiating the
SparkContext, and that works fine. The code I originally had reads its
Hadoop configuration from hdfs-site.xml, which also works perfectly fine.
Can I therefore conclude that sparkContext.hadoopConfiguration.set("key",
"value") does not propagate to all SQL jobs within the same SparkContext?
I haven't tried it with Spark Core, so I cannot tell.
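
For reference, this is roughly what the working setup looks like (just a
sketch; AWSAccessKeyId and AWSSecretKey are the same placeholder values as
in the snippet quoted further down):

    import org.apache.spark.{SparkConf, SparkContext}

    // Any "spark.hadoop.*" property set on the SparkConf is copied into
    // the Hadoop Configuration when the SparkContext is created.
    val conf = new SparkConf()
      .setAppName("s3a-example") // placeholder app name
      .set("spark.hadoop.fs.s3a.access.key", AWSAccessKeyId)
      .set("spark.hadoop.fs.s3a.secret.key", AWSSecretKey)

    val sc = new SparkContext(conf)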

Is there a workaround, given that this seems to be broken? I need to set
these properties programmatically after the SparkContext is instantiated,
not before...

Best Regards,

Jerry

On Tue, Oct 27, 2015 at 2:30 PM, Marcelo Vanzin <van...@cloudera.com> wrote:

> If setting the values in SparkConf works, there's probably some bug in
> the SQL code; e.g. creating a new Configuration object instead of
> using the one in SparkContext. But I'm not really familiar with that
> code.
>
> On Tue, Oct 27, 2015 at 11:22 AM, Jerry Lam <chiling...@gmail.com> wrote:
> > Hi Marcelo,
> >
> > Thanks for the advice. I understand that we could set the configurations
> > before creating the SparkContext. My question is that
> > SparkContext.hadoopConfiguration.set("key", "value") doesn't seem to
> > propagate to all subsequent SQLContext jobs. Note that I mentioned I can
> > load the parquet file but I cannot perform a count on it because of the
> > AmazonClientException. It means that the credential is used during the
> > loading of the parquet file but not when we are processing it. How can
> > this happen?
> >
> > Best Regards,
> >
> > Jerry
> >
> >
> > On Tue, Oct 27, 2015 at 2:05 PM, Marcelo Vanzin <van...@cloudera.com>
> wrote:
> >>
> >> On Tue, Oct 27, 2015 at 10:43 AM, Jerry Lam <chiling...@gmail.com>
> wrote:
> >> > Has anyone experienced issues setting Hadoop configurations after the
> >> > SparkContext is initialized? I'm using Spark 1.5.1.
> >> >
> >> > I'm trying to use s3a, which requires the access and secret keys to be
> >> > set in the Hadoop configuration. I tried to set the properties in the
> >> > Hadoop configuration from the SparkContext:
> >> >
> >> > sc.hadoopConfiguration.set("fs.s3a.access.key", AWSAccessKeyId)
> >> > sc.hadoopConfiguration.set("fs.s3a.secret.key", AWSSecretKey)
> >>
> >> Try setting "spark.hadoop.fs.s3a.access.key" and
> >> "spark.hadoop.fs.s3a.secret.key" in your SparkConf before creating the
> >> SparkContext.
> >>
> >> --
> >> Marcelo
> >
> >
>
>
>
> --
> Marcelo
>
