You can use mapValues to ensure partitioning is not lost.
From: Brian London <brianmlon...@gmail.com>
Date: Monday, February 22, 2016 at 1:21 PM
To: user <user@spark.apache.org>
Subject: map operation c
The problem is that your new mapped values may be in the wrong
partition, according to your partitioner. Look for methods that have a
preservesPartitioning flag, which is a way to indicate that you know
the partitioning remains correct. (Like, you partition by keys and
didn't change the keys in
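
To illustrate the point above about mapped values ending up in the wrong partition, here is a minimal toy model in plain Python. It is not the Spark API: `partition_of`, `partition_by`, `map_in_place` and `is_correctly_partitioned` are hypothetical helpers invented for this sketch, and the modulo-based partitioner is only an assumption standing in for whatever custom partitioner is in use. The sketch shows that a non-shuffle map which touches only values leaves every pair where the partitioner expects it, while one that changes keys may not.

```python
# Toy model in plain Python (not Spark): a "partitioner" maps a key to a partition index.
NUM_PARTITIONS = 3


def partition_of(key):
    """Hypothetical partitioner: place integer keys by key modulo partition count."""
    return key % NUM_PARTITIONS


def partition_by(pairs):
    """Distribute (key, value) pairs into partitions according to partition_of."""
    parts = [[] for _ in range(NUM_PARTITIONS)]
    for k, v in pairs:
        parts[partition_of(k)].append((k, v))
    return parts


def is_correctly_partitioned(parts):
    """True if every pair sits in the partition the partitioner would choose for its key."""
    return all(
        partition_of(k) == i
        for i, part in enumerate(parts)
        for k, _ in part
    )


def map_in_place(parts, f):
    """Apply f to each pair without moving data between partitions (no shuffle)."""
    return [[f(k, v) for k, v in part] for part in parts]


parts = partition_by([(0, "a"), (1, "b"), (2, "c"), (3, "d"), (4, "e")])

# Changing only the values leaves every key where the partitioner expects it.
values_only = map_in_place(parts, lambda k, v: (k, v.upper()))

# Changing the keys can leave pairs in partitions the partitioner would not pick.
keys_changed = map_in_place(parts, lambda k, v: (k + 1, v))

print(is_correctly_partitioned(values_only))
print(is_correctly_partitioned(keys_changed))
```

In this model a general map gives no guarantee that the keys are unchanged, so the only safe default is to treat the result as unpartitioned. A values-only operation, or an explicit promise from the caller that the keys were not changed, removes that uncertainty.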
It appears that when a custom partitioner is applied in a groupBy
operation, it is not propagated through subsequent non-shuffle operations.
Is this intentional? Is there any way to carry custom partitioning through
maps?
I've uploaded a gist that exhibits the behavior.