Re: SparkSQL integration issue with AWS S3a

2016-01-06 Thread Kostiantyn Kudriavtsev
Hi guys, there is one big issue with this approach: spark.hadoop.s3a.access.key is now visible everywhere - in logs, in the Spark web UI - and is not secured at all...

Re: SparkSQL integration issue with AWS S3a

2016-01-06 Thread Jerry Lam
Hi Kostiantyn, Yes. If security is a concern then this approach cannot satisfy it; the keys are visible in the properties files. If the goal is to hide them, you might be able to go a bit further with this approach. Have you looked at the Spark security page? Best Regards, Jerry Sent from my iPhone

Re: SparkSQL integration issue with AWS S3a

2016-01-02 Thread KOSTIANTYN Kudriavtsev
Thanks Jerry, it works! Really appreciate your help. Thank you, Konstantin Kudryavtsev

Re: SparkSQL integration issue with AWS S3a

2016-01-01 Thread Jerry Lam
Hi Kostiantyn, You should be able to use spark.conf to specify s3a keys. I don't remember exactly, but you can add Hadoop properties by prefixing them with spark.hadoop.*, where * is the s3a property name. For instance, spark.hadoop.s3a.access.key wudjgdueyhsj Of course, you need to make sure the property key is
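
Roughly what Jerry is suggesting would look like this in spark-defaults.conf, or in a custom properties file passed at submit time (assuming the standard Hadoop fs.s3a.* property names, so the prefixed forms carry the fs. segment; the key values below are placeholders, not real credentials):

    # spark-defaults.conf, or a custom properties file -- placeholder values
    spark.hadoop.fs.s3a.impl        org.apache.hadoop.fs.s3a.S3AFileSystem
    spark.hadoop.fs.s3a.access.key  AKIAPLACEHOLDERKEY
    spark.hadoop.fs.s3a.secret.key  placeholderSecretKey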

Re: SparkSQL integration issue with AWS S3a

2015-12-31 Thread Steve Loughran
On 30 Dec 2015, at 19:31, KOSTIANTYN Kudriavtsev wrote: > Hi Jerry, I want to run different jobs on different S3 buckets - different AWS creds - on the same instances. Could you shed some light if it's possible to achieve with hdfs-site?

Re: SparkSQL integration issue with AWS S3a

2015-12-31 Thread Brian London
Since you're running in standalone mode, can you try it using Spark 1.5.1 please?

Re: SparkSQL integration issue with AWS S3a

2015-12-31 Thread KOSTIANTYN Kudriavtsev
Hi Jerry, thanks for the hint. Could you please be more specific: how can I pass a different spark-{usr}.conf per user during job submit, and which property can I use to specify a custom hdfs-site.xml? I tried to google it but didn't find anything. Thank you, Konstantin Kudryavtsev

Re: SparkSQL integration issue with AWS S3a

2015-12-31 Thread KOSTIANTYN Kudriavtsev
Hi Jerry, what you suggested seems to be working (I put hdfs-site.xml into the $SPARK_HOME/conf folder), but could you shed some light on how it can be federated per user? Thanks in advance! Thank you, Konstantin Kudryavtsev

SparkSQL integration issue with AWS S3a

2015-12-30 Thread KOSTIANTYN Kudriavtsev
Dear Spark community, I faced the following issue when trying to access data on S3a; my code is the following: val sparkConf = new SparkConf() val sc = new SparkContext(sparkConf) sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
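
The code in this message is truncated by the archive; a minimal sketch of the setup it describes, assuming the standard fs.s3a.* credential properties and placeholder key values, would be:

    import org.apache.spark.{SparkConf, SparkContext}

    val sparkConf = new SparkConf()
    val sc = new SparkContext(sparkConf)
    // Register the S3A filesystem and supply credentials through the
    // Hadoop configuration (placeholder values, not real keys).
    sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    sc.hadoopConfiguration.set("fs.s3a.access.key", "PLACEHOLDER_ACCESS_KEY")
    sc.hadoopConfiguration.set("fs.s3a.secret.key", "PLACEHOLDER_SECRET_KEY")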

Re: SparkSQL integration issue with AWS S3a

2015-12-30 Thread Chris Fregly
Are the credentials visible from each Worker node to all the Executor JVMs on each Worker?

Re: SparkSQL integration issue with AWS S3a

2015-12-30 Thread KOSTIANTYN Kudriavtsev
Chris, good question. As you can see from the code, I set them up on the driver, so I expect they will be propagated to all nodes, won't they? Thank you, Konstantin Kudryavtsev

Re: SparkSQL integration issue with AWS S3a

2015-12-30 Thread Blaž Šnuderl
Try setting the s3 credentials using the keys specified here: https://github.com/Aloisius/hadoop-s3a/blob/master/README.md Blaz

Re: SparkSQL integration issue with AWS S3a

2015-12-30 Thread KOSTIANTYN Kudriavtsev
Hi Blaz, I did - same result. Thank you, Konstantin Kudryavtsev

Re: SparkSQL integration issue with AWS S3a

2015-12-30 Thread Chris Fregly
Couple things: 1) switch to IAM roles if at all possible - explicitly passing AWS credentials is a long and lonely road in the end; 2) one really bad workaround/hack is to run a job that hits every worker and writes the credentials to the proper location (~/.awscredentials or whatever)
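
A rough Scala sketch of hack (2), purely illustrative: it assumes a spark-shell session (sc in scope) and that fanning out enough tasks touches every worker; the file name and contents are placeholders.

    import java.io.{File, PrintWriter}

    // Fan out far more tasks than cores so every worker is likely hit,
    // then write a credentials file on whatever machine runs each task.
    val numTasks = sc.defaultParallelism * 4
    sc.parallelize(1 to numTasks, numTasks).foreachPartition { _ =>
      val out = new PrintWriter(new File(sys.props("user.home"), ".awscredentials"))
      try {
        out.println("aws_access_key_id=PLACEHOLDER")
        out.println("aws_secret_access_key=PLACEHOLDER")
      } finally out.close()
    }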

Re: SparkSQL integration issue with AWS S3a

2015-12-30 Thread KOSTIANTYN Kudriavtsev
Chris, thanks for the hint with IAM roles, but in my case I need to run different jobs with different S3 permissions on the same cluster, so this approach doesn't work for me, as far as I understood it. Thank you, Konstantin Kudryavtsev

Re: SparkSQL integration issue with AWS S3a

2015-12-30 Thread Jerry Lam
Hi Kostiantyn, Can you define those properties in hdfs-site.xml and make sure it is visible on the classpath when you spark-submit? It looks like a conf sourcing issue to me. Cheers, Sent from my iPhone
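
For reference, the hdfs-site.xml entries Jerry means would look roughly like this (standard s3a property names; the values are placeholders):

    <configuration>
      <property>
        <name>fs.s3a.access.key</name>
        <value>PLACEHOLDER_ACCESS_KEY</value>
      </property>
      <property>
        <name>fs.s3a.secret.key</name>
        <value>PLACEHOLDER_SECRET_KEY</value>
      </property>
    </configuration>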

Re: SparkSQL integration issue with AWS S3a

2015-12-30 Thread KOSTIANTYN Kudriavtsev
Hi Jerry, I want to run different jobs on different S3 buckets - with different AWS creds - on the same instances. Could you shed some light on whether it's possible to achieve this with hdfs-site? Thank you, Konstantin Kudryavtsev

Re: SparkSQL integration issue with AWS S3a

2015-12-30 Thread Jerry Lam
Hi Kostiantyn, I want to confirm that it works first by using hdfs-site.xml. If yes, you could define different spark-{user-x}.conf files and source them during spark-submit. Let us know if hdfs-site.xml works first. It should. Best Regards, Jerry Sent from my iPhone
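
One plausible way to source a per-user conf at submit time is spark-submit's --properties-file flag, which substitutes for the default conf/spark-defaults.conf for that submission; the file name, class, and jar below are hypothetical:

    # Hypothetical per-user submission using a dedicated properties file
    spark-submit \
      --properties-file /path/to/spark-user-x.conf \
      --class com.example.S3aJob \
      my-app.jar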