Hello everyone, Here is an issue that i am facing in partitioning dtafarame.
I have a dataframe which called data_df. It is look like: Group_Id | Object_Id | Trajectory 1 | obj1 | Traj1 2 | obj2 | Traj2 1 | obj3 | Traj3 3 | obj4 | Traj4 2 | obj5 | Traj5 This dataframe has 5045 rows where each row has value in Group_Id from 1 to 7, and the number of rows per group_id is arbitrary. I want to split the rdd which produced by from this dataframe in 7 partitions one for each group_id and then apply mapPartitions() where i call function custom_func(). How can i create these partitions from this dataframe? Should i first apply group by (create the grouped_df) in order to create a dataframe with 7 rows and then call partitioned_rdd=grouped_df.rdd.mapPartitions()? Which is the optimal way to do it? Thank you in advance