Re: s3a file system and spark deployment mode

2015-10-17 Thread Raghavendra Pandey
You can add classpath info in the hadoop env file... Add the following line to your $HADOOP_HOME/etc/hadoop/hadoop-env.sh:

    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/*

Add the following line to $SPARK_HOME/conf/spark-env.sh: export
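The spark-env.sh line is cut off in the archive; as a minimal sketch of the pattern being described (the SPARK_CLASSPATH variable below is an assumption based on Spark 1.x conventions, not part of the original message):

    # $HADOOP_HOME/etc/hadoop/hadoop-env.sh
    # put the Hadoop tools jars (hadoop-aws, aws-java-sdk, ...) on the Hadoop classpath
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/*

    # $SPARK_HOME/conf/spark-env.sh
    # assumption: expose the same tools jars to the Spark processes
    export SPARK_CLASSPATH=$SPARK_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/*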

Re: s3a file system and spark deployment mode

2015-10-16 Thread Steve Loughran
> On 15 Oct 2015, at 19:04, Scott Reynolds wrote:
>
> List,
>
> Right now we build our spark jobs with the s3a hadoop client. We do this
> because our machines are only allowed to use IAM access to the s3 store. We
> can build our jars with the s3a filesystem and the
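As an aside, a common alternative to baking the s3a classes into the job jar is to pull them in at submit time; a sketch, assuming a cluster built against Hadoop 2.7.x (the version below is illustrative and must match the cluster's Hadoop build, and my-job.jar is a hypothetical application jar):

    # fetch hadoop-aws and its aws-java-sdk dependency from Maven at submit time
    spark-submit \
      --packages org.apache.hadoop:hadoop-aws:2.7.1 \
      --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
      my-job.jar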

Re: s3a file system and spark deployment mode

2015-10-16 Thread Scott Reynolds
Hmm, I tried using --jars, but that gets passed to MasterArguments, and that doesn't work :-(
https://github.com/apache/spark/blob/branch-1.5/core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala
Same with Worker:
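--jars is a spark-submit option for applications; the standalone Master and Worker launch scripts parse their own argument lists (MasterArguments/WorkerArguments) and reject it. A sketch of the usual workaround for the driver and executor classpath, assuming the extra jars sit under the Hadoop tools dir on every node (the path is illustrative):

    # $SPARK_HOME/conf/spark-defaults.conf
    # assumption: hadoop-aws and aws-java-sdk jars live at this location on all nodes
    spark.driver.extraClassPath    /opt/hadoop/share/hadoop/tools/lib/*
    spark.executor.extraClassPath  /opt/hadoop/share/hadoop/tools/lib/*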

Re: s3a file system and spark deployment mode

2015-10-15 Thread Raghavendra Pandey
You can use the Spark 1.5.1 "without Hadoop" build with Hadoop 2.7.1. Hadoop 2.7.1 is more mature for s3a access. You also need to add the Hadoop tools dir to the Hadoop classpath... Raghav

On Oct 16, 2015 1:09 AM, "Scott Reynolds" wrote:
> We do not use EMR. This is deployed on Amazon VMs
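For the "without Hadoop" build, the hook Spark provides for a user-supplied Hadoop is SPARK_DIST_CLASSPATH; a minimal sketch of wiring it to a separate Hadoop 2.7.1 install (the install path is illustrative):

    # $SPARK_HOME/conf/spark-env.sh
    export HADOOP_HOME=/opt/hadoop-2.7.1   # illustrative install location
    # 'hadoop classpath' prints the full Hadoop classpath; it includes the tools
    # dir once hadoop-env.sh adds it via HADOOP_CLASSPATH as suggested above
    export SPARK_DIST_CLASSPATH=$("$HADOOP_HOME/bin/hadoop" classpath)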

Re: s3a file system and spark deployment mode

2015-10-15 Thread Spark Newbie
Are you using EMR? You can install Hadoop-2.6.0 along with Spark-1.5.1 in your EMR cluster. That brings the s3a jars to the worker nodes, making them available to your application.

On Thu, Oct 15, 2015 at 11:04 AM, Scott Reynolds wrote:
> List,
>
> Right now we build our
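A sketch of provisioning such a cluster with the AWS CLI; the release label is illustrative and should be whichever EMR release bundles the desired Spark/Hadoop versions:

    aws emr create-cluster \
      --name "spark-s3a" \
      --release-label emr-4.1.0 \
      --applications Name=Hadoop Name=Spark \
      --instance-type m3.xlarge --instance-count 3 \
      --use-default-roles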

Re: s3a file system and spark deployment mode

2015-10-15 Thread Scott Reynolds
We do not use EMR. This is deployed on Amazon VMs. We build Spark with Hadoop-2.6.0, but that does not include the s3a filesystem or the Amazon AWS SDK.

On Thu, Oct 15, 2015 at 12:26 PM, Spark Newbie wrote:
> Are you using EMR?
> You can install Hadoop-2.6.0 along with
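For reference, a stock Apache Hadoop 2.6.0 tarball does ship the two missing pieces under its tools dir, so one sketch of a workaround is to point the driver and executors at them explicitly (jar versions match the 2.6.0 distribution; my-job.jar is hypothetical):

    TOOLS=$HADOOP_HOME/share/hadoop/tools/lib
    # hadoop-aws-2.6.0.jar    -> the s3a filesystem implementation
    # aws-java-sdk-1.7.4.jar  -> the AWS SDK it depends on
    spark-submit \
      --conf spark.driver.extraClassPath=$TOOLS/hadoop-aws-2.6.0.jar:$TOOLS/aws-java-sdk-1.7.4.jar \
      --conf spark.executor.extraClassPath=$TOOLS/hadoop-aws-2.6.0.jar:$TOOLS/aws-java-sdk-1.7.4.jar \
      my-job.jar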

s3a file system and spark deployment mode

2015-10-15 Thread Scott Reynolds
List,

Right now we build our spark jobs with the s3a hadoop client. We do this because our machines are only allowed to use IAM access to the s3 store. We can build our jars with the s3a filesystem and the aws sdk just fine, and these jars run great in *client mode*. We would like to move from
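For context, the working client-mode setup described here needs no static AWS keys: when none are configured, the s3a client falls back to the EC2 instance profile credentials, which matches the IAM-only restriction. A sketch (master URL, bucket, and app jar are illustrative):

    spark-submit \
      --master spark://master:7077 \
      --deploy-mode client \
      --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
      my-job.jar s3a://my-bucket/input   # hypothetical jar and bucket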