Hi,

I have input data consisting of many very small files, each containing one JSON document. For performance reasons (I use PySpark) I have to repartition; currently I do:

sc.textFile(files).coalesce(100)  # merge the many per-file partitions down to 100
 
The problem is that I have to guess the number of partitions such that the job runs as fast as possible while I still stay on the safe side with RAM, which is quite difficult.

For this reason I would like to ask: is there a way to replace coalesce(100) with something that creates N partitions of a given size, so that N is derived from the data rather than guessed? I went through the documentation, but I was not able to find a way to do that.
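
To make the idea concrete, here is a rough sketch of what I imagine (assuming the input sits on the local filesystem and a hypothetical target of 128 MB per partition; on HDFS the file sizes would have to come from the Hadoop FileSystem API instead):

import glob
import math
import os

from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# Hypothetical target size per partition (tune to the RAM budget).
TARGET_PARTITION_BYTES = 128 * 1024 * 1024

# Placeholder glob pattern for the many small JSON files.
files = "/data/input/*.json"

# Sum the input sizes up front (works for local files only).
total_bytes = sum(os.path.getsize(p) for p in glob.glob(files))

# Derive the partition count from the total size instead of guessing it.
num_partitions = max(1, math.ceil(total_bytes / TARGET_PARTITION_BYTES))

# textFile yields roughly one partition per small file; coalesce
# merges them down to the computed count without a full shuffle.
rdd = sc.textFile(files).coalesce(num_partitions)

If there is something built in that does this sizing automatically, so the sizes do not have to be computed by hand, that would be ideal.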

Thank you in advance for any help or advice.
 
