rquet")
takes forever?
How long do individual tasks take? How many tasks are there for this line?
Where are the Parquet files stored? Where does the Spark job run?
Enrico
On 03.10.22 at 18:22, Sachit Murarka wrote:
Hello,
I am reading a large number of Parquet files in Spark 3.2. It is not
giving any error in the logs, but after spark.read.parquet it is not
able to proceed further.
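
A minimal PySpark sketch of the kind of check these questions point at;
the path below is only a placeholder:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("parquet-read-check").getOrCreate()

    # Placeholder location; point this at the real directory of Parquet files.
    df = spark.read.parquet("s3a://my-bucket/events/")

    # The number of input partitions roughly matches the number of read tasks
    # Spark will schedule; compare it with the executor cores in the Spark UI.
    print(df.rdd.getNumPartitions())
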
The read can't be parallelized by default in a Spark job, and doing your
own multi-threaded programming in a Spark program isn't a good idea.
Adding fast disk I/O and increasing RAM may speed things up, but won't
help with parallelization. You may have to be more creative here. One
option would ...
You may need large cluster memory and fast disk I/O.
Sachit Murarka wrote:
Can anyone please suggest if there is any property to improve the
parallel reads? I am reading more than 25,000 files.
--
Simple Mail
https://simplemail.co.in/
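
For reference, the properties usually discussed for reads over very many
files are the file-listing and split-size settings. A sketch with
illustrative values only (the defaults may already be adequate) and a
placeholder path:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("many-parquet-files")
        # Once the number of paths crosses this threshold, Spark lists the
        # files with a distributed job instead of only on the driver
        # (32 is the usual default).
        .config("spark.sql.sources.parallelPartitionDiscovery.threshold", "32")
        # Upper bound on how much data a single read task receives
        # (illustrative value: 128 MB).
        .config("spark.sql.files.maxPartitionBytes", str(128 * 1024 * 1024))
        .getOrCreate()
    )

    df = spark.read.parquet("s3a://my-bucket/events/")  # placeholder path
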
Are you trying to run on the cloud?
On Mon, 3 Oct 2022, 21:55 Sachit Murarka wrote:
> Hello,
>
> I am reading a large number of Parquet files in Spark 3.2. It is not
> giving any error in the logs, but after spark.read.parquet it is not
> able to proceed further.
> Can anyone please suggest if there is any property to improve the
> parallel reads? I am reading more than 25,000 files.
Hello,
I am reading a large number of Parquet files in Spark 3.2. It is not
giving any error in the logs, but after spark.read.parquet it is not
able to proceed further.
Can anyone please suggest if there is any property to improve the
parallel reads? I am reading more than 25,000 files.
Kind Regards
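
One thing sometimes worth trying here, sketched under the assumption that
the schema is known up front (the schema and path below are made up):
supply the schema explicitly, so spark.read.parquet does not have to
infer it from the Parquet footers before the job can proceed.

    from pyspark.sql import SparkSession
    from pyspark.sql.types import LongType, StringType, StructField, StructType

    spark = SparkSession.builder.appName("explicit-schema-read").getOrCreate()

    # Hypothetical schema; supplying it lets Spark skip schema inference
    # over the Parquet footers at read time.
    schema = StructType([
        StructField("id", LongType()),
        StructField("payload", StringType()),
    ])

    df = spark.read.schema(schema).parquet("s3a://my-bucket/events/")  # made-up path
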