We use the spark-ec2 script to create AWS clusters as needed (we do not use AWS EMR).

1. Will we get better performance if we copy the data to HDFS before running, instead of reading directly from S3?
2. What is a good way to move results from HDFS to S3?
It seems like there are many ways to bulk copy to S3. Many of them require embedding the credentials directly in the URL, e.g. s3n://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@yasemindeneme/deneme.txt. This seems like a bad idea. What would you recommend?

Thanks,
Andy
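For context, one common alternative to putting keys in the URL is to set them in the Hadoop configuration that Spark picks up. A minimal sketch, assuming the older s3n filesystem that spark-ec2-era clusters used (the property names below are Hadoop's standard s3n credential keys; the file location may differ on your cluster):

```xml
<!-- core-site.xml on the cluster (or set the same keys via
     sc.hadoopConfiguration.set(...) in the driver) -->
<configuration>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>YOUR_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>YOUR_SECRET_ACCESS_KEY</value>
  </property>
</configuration>
```

With the credentials configured this way, a bulk copy such as `hadoop distcp hdfs:///path/to/results s3n://your-bucket/results` (bucket name hypothetical) can run without keys appearing in the URL or in shell history.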