Hi Junjie,

How do you access the files currently? Have you considered using HDFS? It's designed to be distributed across a cluster, and Spark has built-in support for it.
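For what it's worth, here is a minimal sketch of reading many small files from HDFS in Spark 1.6 with Scala. The path `hdfs:///data/input/*.txt` is a made-up placeholder; `sc.textFile` with a glob picks up every matching file, and each file (or HDFS block) becomes a partition that the workers read in parallel:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ReadSmallFiles {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ReadSmallFiles")
    val sc = new SparkContext(conf)

    // A glob matches all the small files at once; Spark creates at least
    // one partition per file, so the workers read them concurrently.
    val lines = sc.textFile("hdfs:///data/input/*.txt")

    // Many tiny files mean many tiny partitions; coalescing reduces
    // per-partition overhead in the later stages.
    val compacted = lines.coalesce(sc.defaultParallelism)

    println(compacted.count())
    sc.stop()
  }
}
```

If the files are small enough that whole-file records make sense, `sc.wholeTextFiles("hdfs:///data/input")` is an alternative that yields (filename, content) pairs.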
Best,
--Jakob

On Feb 11, 2016 9:33 AM, "Junjie Qian" <qian.jun...@outlook.com> wrote:
> Hi all,
>
> I am working with Spark 1.6 and Scala, and have a big dataset divided into
> several small files.
>
> My question is: right now the read operation takes a really long time and
> often produces RDD warnings. Is there a way I can read the files in parallel,
> so that all nodes or workers read the files at the same time?
>
> Many thanks
> Junjie
>