When I call rdd() on a DataFrame, it ends the current stage and starts a
new one that just maps the DataFrame to an RDD and nothing else. It doesn't
seem to do a shuffle (which is good and expected), but then why is there a
separate stage?

I also thought that stages only end when there's a shuffle, or when the job
ends with the action that triggered it.

Thanks.
