Hi Junjie,

How do you access the files currently? Have you considered using HDFS? It's designed to be distributed across a cluster, and Spark has built-in support for it.
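For what it's worth, here is a minimal sketch of reading many small files from HDFS in Spark 1.6 with Scala. The path `hdfs:///data/input/*.txt` is a made-up placeholder; `sc.textFile` with a glob picks up every matching file, and each file (or HDFS block) becomes a partition that the workers read in parallel:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ReadSmallFiles {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ReadSmallFiles")
    val sc = new SparkContext(conf)

    // A glob matches all the small files at once; Spark creates at least
    // one partition per file, so the workers read them concurrently.
    val lines = sc.textFile("hdfs:///data/input/*.txt")

    // Many tiny files mean many tiny partitions; coalescing reduces
    // per-partition overhead in the later stages.
    val compacted = lines.coalesce(sc.defaultParallelism)

    println(compacted.count())
    sc.stop()
  }
}
```

If the files are small enough that whole-file records make sense, `sc.wholeTextFiles("hdfs:///data/input")` is an alternative that yields (filename, content) pairs.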
Best,
--Jakob

On Feb 11, 2016 9:33 AM, "Junjie Qian" <qian.jun...@outlook.com> wrote:
> Hi all,
>
> I am working with Spark 1.6 and Scala, and have a big dataset divided into
> several small files.
>
> My question is: right now the read operation takes a really long time and
> often produces RDD warnings. Is there a way I can read the files in parallel,
> so that all nodes or workers read the files at the same time?
>
> Many thanks
> Junjie
>