Hi

I have a general question: I have 1.6 million small files, about 200 GB in
total. I want to put them on HDFS for Spark processing.
I know sequence files are the way to go, because storing many small files on
HDFS is bad practice. I can also write code to consolidate the small files
into sequence files locally.
My question: is there any way to do this in parallel, for example using
Spark, MapReduce, or anything else?
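
To make the consolidation idea concrete: 200 GB over 1.6 million files is
roughly 125 KB per file, so one way to parallelize is to split the file list
into independent batches and let each worker turn one batch into one larger
file. Below is a minimal sketch of only that batching step. The function name
plan_batches and the 128 MB target are assumptions for illustration, and
writing a batch out as an actual sequence file is not shown.

```python
from typing import Iterable, List, Tuple


def plan_batches(files: Iterable[Tuple[str, int]],
                 target_bytes: int) -> List[List[str]]:
    """Greedily group (path, size) pairs into batches of at most target_bytes.

    Files are taken in the order given. A single file larger than
    target_bytes ends up in a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for path, size in files:
        # Close the current batch if adding this file would overflow it.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Hypothetical target: 128 MB per consolidated file.
TARGET_BYTES = 128 * 1024 * 1024
```

With files of about 125 KB and a 128 MB target, each batch would hold on the
order of a thousand files. Since the batches do not overlap, each one could be
handed to a separate process or task.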

Thanks
Ayan
