Hi Jerry, I've filed a bug in JIRA, along with a fix:

https://issues.apache.org/jira/browse/SPARK-11364

It would be greatly appreciated if you could verify the PR against your case.

Thanks,
Hao

From: Cheng, Hao [mailto:hao.ch...@intel.com]
Sent: Wednesday, October 28, 2015 8:51 AM
To: Jerry Lam; Marcelo Vanzin
Cc: user@spark.apache.org
Subject: RE: [Spark-SQL]: Unable to propagate hadoop configuration after 
SparkContext is initialized

After a quick glance, this seems to be a bug in Spark SQL. Would you mind 
creating a JIRA for it? Then I can start fixing it.

Thanks,
Hao

From: Jerry Lam [mailto:chiling...@gmail.com]
Sent: Wednesday, October 28, 2015 3:13 AM
To: Marcelo Vanzin
Cc: user@spark.apache.org
Subject: Re: [Spark-SQL]: Unable to propagate hadoop configuration after 
SparkContext is initialized

Hi Marcelo,

I tried setting the properties via SparkConf before instantiating the 
SparkContext, and it works fine.
Originally, my code read the Hadoop configuration from hdfs-site.xml, which 
also works perfectly fine.
Can I therefore conclude that sparkContext.hadoopConfiguration.set("key", 
"value") does not propagate to all SQL jobs within the same SparkContext? 
I haven't tried this with Spark Core, so I cannot tell.
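
Concretely, this is the pattern I am asking about; a minimal sketch where the 
bucket, path, and key values are placeholders:

sc.hadoopConfiguration.set("fs.s3a.access.key", AWSAccessKeyId)
sc.hadoopConfiguration.set("fs.s3a.secret.key", AWSSecretKey)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// Do the settings above reach the jobs triggered by this read?
val df = sqlContext.read.parquet("s3a://my-bucket/some/path")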

Is there a workaround, given that this seems to be broken? I need to do this 
programmatically after the SparkContext is instantiated, not before...

Best Regards,

Jerry

On Tue, Oct 27, 2015 at 2:30 PM, Marcelo Vanzin 
<van...@cloudera.com> wrote:
If setting the values in SparkConf works, there's probably some bug in
the SQL code; e.g. creating a new Configuration object instead of
using the one in SparkContext. But I'm not really familiar with that
code.
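
Purely as a hypothetical illustration of what I mean, not the actual Spark SQL 
code; something in the SQL path that builds a fresh Hadoop Configuration 
instead of reusing the context's:

// Hypothetical buggy pattern: a fresh Configuration never sees values
// added to sc.hadoopConfiguration after startup
val freshConf = new org.apache.hadoop.conf.Configuration()

// Reusing the context's configuration would pick those values up
val sharedConf = sc.hadoopConfiguration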

On Tue, Oct 27, 2015 at 11:22 AM, Jerry Lam 
<chiling...@gmail.com> wrote:
> Hi Marcelo,
>
> Thanks for the advice. I understand that we could set the configurations
> before creating the SparkContext. My question is why
> SparkContext.hadoopConfiguration.set("key", "value") doesn't seem to
> propagate to all subsequent SQLContext jobs. Note that, as I mentioned, I can
> load the parquet file but cannot perform a count on it because of an
> AmazonClientException. It means the credentials are used when loading the
> parquet file but not when we are processing it. How can this happen?
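>
> Roughly what I am running, as a sketch; the bucket and path are placeholders:
>
> val df = sqlContext.read.parquet("s3a://my-bucket/some/path")  // loads fine
> df.count()  // fails with AmazonClientException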
>
> Best Regards,
>
> Jerry
>
>
> On Tue, Oct 27, 2015 at 2:05 PM, Marcelo Vanzin 
> <van...@cloudera.com> wrote:
>>
>> On Tue, Oct 27, 2015 at 10:43 AM, Jerry Lam 
>> <chiling...@gmail.com> wrote:
>> > Has anyone experienced issues setting Hadoop configurations after the
>> > SparkContext is initialized? I'm using Spark 1.5.1.
>> >
>> > I'm trying to use s3a, which requires the access and secret keys to be set
>> > in the Hadoop configuration. I tried to set the properties in the Hadoop
>> > configuration from the SparkContext:
>> >
>> > sc.hadoopConfiguration.set("fs.s3a.access.key", AWSAccessKeyId)
>> > sc.hadoopConfiguration.set("fs.s3a.secret.key", AWSSecretKey)
>>
>> Try setting "spark.hadoop.fs.s3a.access.key" and
>> "spark.hadoop.fs.s3a.secret.key" in your SparkConf before creating the
>> SparkContext.
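>>
>> Something along these lines, as an untested sketch (your own key values go
>> in place of the placeholders):
>>
>> import org.apache.spark.{SparkConf, SparkContext}
>>
>> val conf = new SparkConf()
>>   .setAppName("s3a-example")  // placeholder app name
>>   .set("spark.hadoop.fs.s3a.access.key", AWSAccessKeyId)
>>   .set("spark.hadoop.fs.s3a.secret.key", AWSSecretKey)
>> val sc = new SparkContext(conf)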
>>
>> --
>> Marcelo
>
>

--
Marcelo
