load large number of files from s3
Hi,

We have 30 million small files (about 100 KB each) on S3. I would like to know how bad it is to load them directly from S3 (e.g. in terms of driver memory, I/O, executor memory, and S3 reliability) before we merge or distcp them. Does anybody have experience with this?

Thanks in advance!

Regards,
Shawn

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/load-large-number-of-files-from-s3-tp28062.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
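For a sense of scale, here is a rough back-of-the-envelope sketch of the workload described above. It is only an estimate under stated assumptions: "100k" is read as roughly 100 KB, the 128 MiB target size for merged output is a hypothetical choice, and the helper name `estimate` is made up for illustration. The one S3 detail it relies on is that a list request returns at most 1000 keys per page.

```python
# Back-of-the-envelope numbers for 30 million small files on S3.
NUM_FILES = 30_000_000                       # from the post
FILE_SIZE_BYTES = 100 * 1000                 # assumption: "100k" means ~100 KB
LIST_PAGE_SIZE = 1000                        # S3 list calls return at most 1000 keys per page
TARGET_PARTITION_BYTES = 128 * 1024 * 1024   # assumption: ~128 MiB per merged file


def estimate(num_files, file_size, page_size, target_partition):
    """Return rough totals for data volume, S3 requests, and merged file count."""
    total_bytes = num_files * file_size
    # Ceiling division: number of list pages needed to enumerate every key.
    list_requests = -(-num_files // page_size)
    # Each small file needs its own GET request when read directly.
    get_requests = num_files
    # How many small files fit in one merged file of the target size.
    files_per_partition = max(1, target_partition // file_size)
    merged_files = -(-num_files // files_per_partition)
    return {
        "total_bytes": total_bytes,
        "list_requests": list_requests,
        "get_requests": get_requests,
        "merged_files": merged_files,
    }


result = estimate(NUM_FILES, FILE_SIZE_BYTES, LIST_PAGE_SIZE, TARGET_PARTITION_BYTES)
for name, value in result.items():
    print(f"{name}: {value:,}")
```

Under these assumptions the data volume itself is modest (about 3 TB), so the concern is mostly the per-file overhead: tens of thousands of list pages and one GET per object, compared with a few tens of thousands of files after merging.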