Hello, I have 50,000 items parallelized into an RDD with 10 partitions, and I would like to split the items evenly across the partitions: 50,000 / 10 = 5,000 items in each partition.
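To make the target concrete, here is a minimal sketch in plain Python (no Spark involved) of the split I am hoping for. The helper name `even_partition_sizes` is made up for illustration and is not part of any Spark API.

```python
def even_partition_sizes(total_items, num_partitions):
    """Return the item count each partition would hold under an even split."""
    base, remainder = divmod(total_items, num_partitions)
    # When the total does not divide evenly, the first `remainder`
    # partitions each take one extra item.
    return [base + 1 if i < remainder else base for i in range(num_partitions)]


expected = even_partition_sizes(50_000, 10)
print(expected)  # ten entries, each 5000
```

With 50,000 items and 10 partitions the remainder is zero, so every partition should end up with exactly 5,000 items.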
What I get instead is the following, as (partition index, item count) pairs:

[(0, 4096), (1, 5120), (2, 5120), (3, 5120), (4, 5120), (5, 5120), (6, 5120), (7, 5120), (8, 5120), (9, 4944)]

The total is correct (4096 + 4944 + 8 * 5120 = 50,000), but the partitions are imbalanced. Is there a way to get exactly 5,000 items in each partition?

Thank you,
Ayman