Re: Issue using S3 bucket from Spark 1.2.1 with hadoop 2.4

2015-03-03 Thread Ted Yu
If you can use the Hadoop 2.6.0 binary, you can use s3a.

s3a is being polished in the upcoming 2.7.0 release:
https://issues.apache.org/jira/browse/HADOOP-11571
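
A minimal sketch of what the switch looks like (untested; assumes the
hadoop-aws 2.6.0 jar and its matching aws-java-sdk dependency are on the
classpath, and the bucket/path below is hypothetical):

// Untested sketch: point the s3a scheme at S3AFileSystem and supply credentials.
// Assumes hadoop-aws-2.6.0.jar and its aws-java-sdk dependency are on the classpath.
sc.hadoopConfiguration().set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
sc.hadoopConfiguration().set("fs.s3a.access.key", "someKey");
sc.hadoopConfiguration().set("fs.s3a.secret.key", "someSecret");

// Reads then go through the s3a:// scheme instead of s3n://
// ("some-bucket/input" is a hypothetical path):
sc.textFile("s3a://some-bucket/input").count();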

Cheers

On Tue, Mar 3, 2015 at 9:44 AM, Ankur Srivastava ankur.srivast...@gmail.com wrote:

 Hi,

 We recently upgraded to the Spark 1.2.1 - Hadoop 2.4 binary. We have no
 other dependency on the Hadoop jars, except for reading our source files
 from S3.

 Since upgrading to the latest version, our reads from S3 have slowed down
 considerably. For some jobs the read from S3 stalls for a long time before
 it starts.

 Is there a known issue with S3, or do we need to update any settings? The
 only settings we are using are:

 sc.hadoopConfiguration().set("fs.s3n.impl",
 "org.apache.hadoop.fs.s3native.NativeS3FileSystem");

 sc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "someKey");

 sc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "someSecret");
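
 With those settings the reads just go through the s3n scheme, e.g.
 (hypothetical bucket and path, only to show the URI form):

 // Hypothetical path; s3n:// URIs resolve through the NativeS3FileSystem
 // configured above.
 sc.textFile("s3n://some-bucket/input").count();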


 Thanks for the help!!

 - Ankur



Re: Issue using S3 bucket from Spark 1.2.1 with hadoop 2.4

2015-03-03 Thread Ankur Srivastava
Thanks a lot Ted!!

On Tue, Mar 3, 2015 at 9:53 AM, Ted Yu yuzhih...@gmail.com wrote:

 If you can use the Hadoop 2.6.0 binary, you can use s3a.

 s3a is being polished in the upcoming 2.7.0 release:
 https://issues.apache.org/jira/browse/HADOOP-11571

 Cheers
