Caching the partitioned_df <- this one. But you have to do the partitioning with something like sql("SELECT * FROM ... CLUSTER BY a"), since no such operation is exposed on DataFrames.
2) Here is the JIRA: https://issues.apache.org/jira/browse/SPARK-5354