For lack of a better solution I am using 'aws s3 cp' to copy my files
locally and 'hadoop fs -put ./tmp/*' to transfer them to HDFS. In general, put
works much better with a smaller number of big files than with a large number
of small files.
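Roughly, the two-step copy looks like this (the bucket name and HDFS target
path below are just placeholders, not the actual ones from my job):

    # pull the JSON output from S3 onto local disk on a cluster node
    aws s3 cp s3://my-bucket/streaming-output/ ./tmp/ --recursive

    # create the target directory in HDFS and push the local copies up
    hadoop fs -mkdir -p /user/andy/streaming-output
    hadoop fs -put ./tmp/* /user/andy/streaming-output/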

Your mileage may vary

Andy

From:  Andrew Davidson <a...@santacruzintegration.com>
Date:  Wednesday, July 27, 2016 at 4:25 PM
To:  "user @spark" <user@spark.apache.org>
Subject:  how to copy local files to hdfs quickly?

> I have a Spark Streaming app that saves JSON files to s3://. It works fine.
> 
> Now I need to calculate some basic summary stats and am running into horrible
> performance problems.
> 
> I want to run a test to see if reading from hdfs instead of s3 makes a
> difference. I am able to quickly copy the data from s3 to a machine in my
> cluster, however hadoop fs -put is painfully slow. Is there a better way to
> copy large data to hdfs?
> 
> I should mention I am not using EMR, i.e. according to AWS support there is
> no way to have '$ aws s3' copy a directory to hdfs://
> 
> Hadoop distcp cannot copy files from the local file system
> 
> Thanks in advance
> 
> Andy
