Re: Reading too many files

2022-10-05 Thread Enrico Minack
rquet") takes forever? How long do individual tasks take? How many tasks are there for this line? Where are the Parquet files stored? Where does the Spark job run? Enrico Am 03.10.22 um 18:22 schrieb Sachit Murarka: Hello, I am reading too many files in Spark 3.2(Parquet) . It is not

Re: Reading too many files

2022-10-04 Thread Artemis User
Reads by default can't be parallelized in a Spark job, and doing your own multi-threaded programming in a Spark program isn't a good idea. Adding fast disk I/O and increasing RAM may speed things up, but won't help with parallelization. You may have to be more creative here. One option would…
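One set of knobs worth experimenting with are Spark's file-listing and partition-sizing settings, which govern how the driver discovers many input paths and how the scan is split into tasks. A sketch with illustrative values, not recommendations:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("tune-parquet-read")
  // Listing of input paths switches to a distributed Spark job once the
  // number of paths exceeds this threshold (default 32).
  .config("spark.sql.sources.parallelPartitionDiscovery.threshold", "32")
  // Upper bound on the parallelism of that listing job (default 10000).
  .config("spark.sql.sources.parallelPartitionDiscovery.parallelism", "100")
  // Target size per read partition; smaller values yield more, smaller tasks.
  .config("spark.sql.files.maxPartitionBytes", "64m")
  .getOrCreate()
```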

Re: Reading too many files

2022-10-03 Thread Henrik Pang
You may need a lot of cluster memory and fast disk I/O. Sachit Murarka wrote: Can anyone please suggest if there is any property to improve the parallel reads? I am reading more than 25,000 files.
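On the "property to improve the parallel reads" question: with tens of thousands of files, the listing step on the driver often dominates before any read task runs. A hedged comparison, with made-up paths, of enumerating every file versus pointing at their parent directory:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("listing-sketch").getOrCreate()

// Hypothetical: a Seq of 25000 individual file paths.
val paths: Seq[String] =
  (0 until 25000).map(i => s"hdfs:///data/events/part-$i.parquet")

// Enumerating every file forces the driver to resolve each path individually.
val byFiles = spark.read.parquet(paths: _*)

// Pointing at the common parent directory lets Spark do the listing itself
// (in parallel once past the discovery threshold), which is usually cheaper.
val byDir = spark.read.parquet("hdfs:///data/events/")
```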

Re: Reading too many files

2022-10-03 Thread Sid
Are you trying to run in the cloud? On Mon, 3 Oct 2022, 21:55 Sachit Murarka wrote: > Hello, > I am reading too many files in Spark 3.2 (Parquet). It is not giving any > error in the logs, but after spark.read.parquet it is not able to proceed > further. > Can any…

Reading too many files

2022-10-03 Thread Sachit Murarka
Hello, I am reading too many files in Spark 3.2 (Parquet). It is not giving any error in the logs, but after spark.read.parquet it is not able to proceed further. Can anyone please suggest if there is any property to improve the parallel reads? I am reading more than 25,000 files. Kind Regards
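Since the logs show nothing, one guess (an assumption, not a diagnosis) is that the time goes into file listing or Parquet schema handling on the driver before any tasks start. A minimal Scala sketch that explicitly rules out schema merging; the path is hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("many-parquet-files").getOrCreate()

// "mergeSchema" is off by default, but setting it explicitly rules out
// reconciling 25000 file footers during planning.
val df = spark.read
  .option("mergeSchema", "false")
  .parquet("s3a://my-bucket/events/")

// If this prints, the scan was planned and the hang was in listing/planning.
df.printSchema()
```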