Hi, I have a general question. I have 1.6 million small files, about 200 GB in total, that I want to put on HDFS for Spark processing. I know sequence files are the way to go, because storing many small files on HDFS is bad practice. I can also write code to consolidate the small files into sequence files locally. My question: is there any way to do this consolidation in parallel, for example using Spark, MapReduce, or anything else?
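To make the question concrete, here is a minimal sketch of the batching step that any parallel approach would probably need: splitting the file list into groups of roughly equal total size, so that each group can be consolidated into one sequence file by an independent worker (a Spark task, a mapper, or a local process). The function name `plan_batches` and the parameter `target_bytes` are made up for illustration, and the step that actually writes the sequence files is not shown.

```python
from typing import Iterable, List, Tuple


def plan_batches(files: Iterable[Tuple[str, int]], target_bytes: int) -> List[List[str]]:
    """Greedily group (path, size_in_bytes) pairs into batches.

    A batch is closed as soon as adding the next file would push it past
    target_bytes. A single file larger than target_bytes gets a batch of
    its own. Input order is preserved.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for path, size in files:
        # Close the current batch if this file would overflow it.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    # Flush the last, possibly partial, batch.
    if current:
        batches.append(current)
    return batches


# Hypothetical toy input: (path, size) pairs with a target of 100 bytes per batch.
example = [("a", 60), ("b", 50), ("c", 40), ("d", 100)]
print(plan_batches(example, 100))  # [['a'], ['b', 'c'], ['d']]
```

Each resulting batch is independent of the others, so the batches could in principle be processed in parallel, with one output file per batch.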
Thanks, Ayan