Once again I am trying to read a directory tree of binary files with binaryFiles.

My directory tree has a root dir ROOTDIR and subdirs in which the files are located, i.e.:

ROOTDIR/1
ROOTDIR/2
ROOTDIR/...
ROOTDIR/100

A total of 1 million files split across 100 subdirs.

Calling binaryFiles on the whole tree requires too much memory on the driver. I've also tried building a binaryFiles RDD per subdir, unioning them with ++, and then calling rdd.saveAsObjectFile("outputDir"); that just shifts the problem, requiring a lot of memory on the executors instead.
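
For reference, this is roughly what I've been running (a sketch only; ROOTDIR, outputDir, and the 1-to-100 subdir names stand in for my real paths):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.input.PortableDataStream
import org.apache.spark.rdd.RDD

val sc = new SparkContext(new SparkConf().setAppName("readBinaryTree"))

// Attempt 1: one binaryFiles call over the whole tree. The driver has to
// list and plan splits for all ~1M files, which is where it runs out of memory.
val all: RDD[(String, PortableDataStream)] = sc.binaryFiles("ROOTDIR/*")

// Attempt 2: one binaryFiles RDD per subdir, unioned with ++, then written
// out as an object file. This moves the memory pressure onto the executors.
val perDir = (1 to 100).map(i => sc.binaryFiles(s"ROOTDIR/$i"))
val unioned = perDir.reduce(_ ++ _)
unioned.saveAsObjectFile("outputDir")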

What is the proper way to use binaryFiles with this number of files?

Thanks
