Hi all,

I am working with Spark 1.6 and Scala, and have a large dataset split across many small files. Right now the read operation takes a very long time and often produces RDD warnings. Is there a way to read the files in parallel, so that all nodes/workers read at the same time?

Many thanks,
Junjie
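For context, a minimal sketch of the two standard approaches in the Spark 1.6 RDD API: `textFile` on a directory or glob (its partitions are read concurrently by the executors), and `wholeTextFiles` for the many-small-files case. The object name, default partition count, and any paths used are illustrative assumptions, not from the thread.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object ParallelRead {
  // textFile accepts a directory or glob; the resulting RDD's partitions
  // are read concurrently by the executors, so all workers participate.
  // minPartitions raises the parallelism when the input files are small.
  def readLines(sc: SparkContext, path: String, minPartitions: Int = 64): RDD[String] =
    sc.textFile(path, minPartitions)

  // For many small files, wholeTextFiles returns (path, contents) pairs
  // and packs small files into fewer, larger partitions, which reduces
  // per-file scheduling overhead.
  def readFiles(sc: SparkContext, path: String): RDD[(String, String)] =
    sc.wholeTextFiles(path)
}
```

Usage would be e.g. `ParallelRead.readLines(sc, "hdfs:///some/input/dir/*")` (the path here is hypothetical); no extra work is needed to make the read parallel, since partition reads are distributed across the cluster by default.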
- How to parallel read files in a directory Junjie Qian
- Re: How to parallel read files in a directory Jakob Odersky
- Re: How to parallel read files in a directory Arkadiusz Bicz
- Re: How to parallel read files in a directory Jörn Franke