parallelism of task executor worker threads during s3 reads

2016-05-12 Thread sanusha
I am using a Spark cluster on Amazon (launched using the spark-1.6-prebuilt-with-hadoop-2.6 spark-ec2 script) to run a Scala driver application that reads S3 object content in parallel. I have tried “s3n://bucket” with sc.textFile, and have also set up an RDD with the S3 keys and then used the Java AWS SDK

spark-ec2 hitting yum install issues

2016-04-14 Thread sanusha
I am using spark-1.6.1-prebuilt-with-hadoop-2.6 on a Mac. I am using the spark-ec2 script to launch a cluster in an Amazon VPC. The setup.sh script [run first thing on the master after launch] uses pssh and tries to install it via 'yum install -y pssh'. This step always fails on the master AMI that the script