Small files

ayan guha Mon, 12 Sep 2016 03:39:53 -0700

Hi

I have a general question. I have 1.6 million small files, about 200 GB in
total, that I want to put on HDFS for Spark processing.
I know sequence files are the way to go, because storing many small files
on HDFS is bad practice. I can also write code to consolidate the small
files into sequence files locally.
My question: is there any way to do this in parallel, for example using
Spark, MapReduce, or anything else?
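
To make the question concrete, here is a minimal PySpark sketch of what such a parallel consolidation might look like. It is only an illustration under stated assumptions: the input files sit on a filesystem that every executor can read (a shared mount, for example), and the names `to_record`, `consolidate`, `input_glob`, `output_dir` and `num_output_files` are hypothetical. It relies on `SparkContext.binaryFiles` to read each file as a (path, bytes) pair and on `RDD.saveAsSequenceFile` to write the pairs out.

```python
def to_record(path_and_bytes):
    """Turn one (path, bytes) pair into a (key, value) SequenceFile record.

    The full path is kept as the key so that files with the same base name
    in different directories do not collide. The content is converted to a
    bytearray, which PySpark maps to a Hadoop BytesWritable.
    """
    path, content = path_and_bytes
    return path, bytearray(content)


def consolidate(input_glob, output_dir, num_output_files):
    """Pack many small files into a few SequenceFile parts, in parallel.

    All three parameters are hypothetical names for this sketch:
    input_glob        -- e.g. "file:///shared/small_files/*" (assumed readable
                         from every executor)
    output_dir        -- HDFS directory for the output; must not exist yet
    num_output_files  -- number of part files to produce
    """
    # Imported lazily so the helper above can be used without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    (sc.binaryFiles(input_glob)      # RDD of (path, file content as bytes)
       .map(to_record)               # (path, bytearray) records
       .coalesce(num_output_files)   # fewer, larger output part files
       .saveAsSequenceFile(output_dir))
```

Each whole file becomes one record held in memory, so this only fits the case where every individual file is small, as it is here. The value chosen for `num_output_files` would need tuning so that each part file ends up at a reasonable multiple of the HDFS block size.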


Thanks
Ayan
