Reads can't be parallelized by default in a Spark job, and doing your own multi-threaded programming inside a Spark program isn't a good idea. Adding fast disk I/O and more RAM may speed things up, but it won't help with parallelization. You may have to be more creative here. One option: if each file or group of files can be processed independently, you can write a script or program on the client side that spawns multiple jobs and achieves the parallelism that way. A rough sketch of that idea follows.
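For example, a minimal client-side sketch in Python (the input path, the group count, and the job.py application name are all placeholders for illustration, not anything from your setup) might look like this:

    # Split the input files into groups and launch one spark-submit per
    # group, running the submissions concurrently from the client machine.
    import glob
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    files = sorted(glob.glob("/data/input/*.parquet"))  # placeholder path
    n_groups = 4
    groups = [files[i::n_groups] for i in range(n_groups)]

    def run_group(file_group):
        # job.py stands in for your own Spark application, assumed here to
        # take a comma-separated list of input paths as its first argument.
        subprocess.run(["spark-submit", "job.py", ",".join(file_group)],
                       check=True)

    with ThreadPoolExecutor(max_workers=n_groups) as pool:
        list(pool.map(run_group, groups))

Each submission is an independent Spark application, so the cluster manager (YARN, Kubernetes, or standalone) decides how many of them actually run concurrently based on the resources available.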

On 10/3/22 7:29 PM, Henrik Pang wrote:
You may need a large amount of cluster memory and fast disk I/O.


Sachit Murarka wrote:
Can anyone please suggest whether there is any property to improve parallel reads? I am reading more than 25,000 files.


