Hi,
I have input data consisting of many very small files, each containing one JSON document.
For performance reasons (I use PySpark) I have to do repartitioning; currently I
do:
sc.textFile(files).coalesce(100)
The problem is that I have to guess the number of partitions in such a way that
it's as fast as possible while staying on the safe side with RAM.
So this is quite difficult.
For this reason I would like to ask whether there is some way to replace
coalesce(100) with something that creates N partitions of a given size. I went
through the documentation, but I was not able to find a way to do that.
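As far as I know there is no built-in "coalesce to partitions of size X" in the RDD API, but one workaround is to estimate the partition count from the total input size before calling coalesce. A rough sketch (the helper name, the target size, and the assumption that the files are on a locally readable filesystem are all mine, not from Spark):

```python
import glob
import os


def estimate_num_partitions(paths, target_partition_bytes=128 * 1024 * 1024):
    """Return a partition count so that each partition holds roughly
    `target_partition_bytes` of input data (default 128 MiB)."""
    total_bytes = sum(os.path.getsize(p) for p in paths)
    # Ceiling division, with at least one partition.
    return max(1, -(-total_bytes // target_partition_bytes))
```

Then something like:

rdd = sc.textFile(files).coalesce(estimate_num_partitions(glob.glob(files)))

For files on HDFS/S3 you would need to get the sizes from the filesystem API instead of os.path.getsize, but the idea is the same.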
Thank you in advance for any help or advice.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org