We use the spark-ec2 script to create AWS clusters as needed (we do not use
AWS EMR).

1. Will we get better performance if we copy data to HDFS before we run,
instead of reading directly from S3?
2. What is a good way to move results from HDFS to S3?


It seems like there are many ways to bulk copy to S3. Many of them require
that we explicitly put the credentials in the URL, as in
AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@/yasemindeneme/deneme.txt. This
seems like a bad idea.
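
For reference, one pattern we have seen suggested (a sketch, not something we have settled on) is to keep the keys in the Hadoop configuration instead of the URL, using the standard fs.s3n.* properties:

```xml
<!-- core-site.xml: keep AWS credentials out of S3 URLs (s3n connector) -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```

With that in place, a bulk copy from HDFS to S3 would presumably look like `hadoop distcp hdfs:///results s3n://my-bucket/results` (my-bucket and the paths are placeholders), with no credentials in the URL.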

What would you recommend?

Thanks

Andy



