I have a Spark Streaming app that saves JSON files to s3://. It works fine.

Now I need to calculate some basic summary stats and am running into
horrible performance problems.

I want to run a test to see if reading from HDFS instead of S3 makes a
difference. I am able to quickly copy the data from S3 to a machine in my
cluster, however hadoop fs -put is painfully slow. Is there a better way to
copy large data to HDFS?
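
For reference, this is roughly what I am doing now (the bucket name and
paths below are just placeholders):

  # copy from S3 to local disk on one of the cluster machines -- this part is fast
  aws s3 cp s3://my-bucket/stream-output/ /data/stream-output/ --recursive

  # then push the local copy into HDFS -- this is the painfully slow part
  hadoop fs -put /data/stream-output /user/andy/stream-output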

I should mention I am not using EMR, i.e. according to AWS support there is
no way to have 'aws s3' copy a directory to hdfs://

Hadoop distcp cannot copy files from the local file system.
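
To be clear, the kind of invocation I mean is something like the following
(namenode address and paths are made up), which as far as I can tell is not
supported:

  hadoop distcp file:///data/stream-output hdfs://namenode:8020/user/andy/stream-output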

Thanks in advance

Andy





