When I call rdd() on a DataFrame, the current stage ends and a new one starts that does nothing but map the DataFrame to an RDD. It doesn't seem to do a shuffle (which is good and expected), but then why is there a separate stage?
I thought a stage only ends at a shuffle boundary, or when the job finishes with the action that triggered it. Thanks.