I tried this on the users mailing list but didn't get traction. It's probably more appropriate here anyway.
I've noticed that Dataset.sqlContext is public in Scala, but the equivalent in PySpark (DataFrame._sc) is named as if it should be treated as private. Is this intentional? If so, what's the rationale? If not, it feels like a bug, and DataFrame should have some form of public access back to the context/session (a rough sketch of what I mean is below). I'm happy to log the bug, but thought I would ask here first. Thanks!
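For reference, here's the kind of access I mean. Just a sketch assuming a local session; df._sc is the attribute as it currently appears in the PySpark source, not something I'm proposing to add:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("ctx-access").getOrCreate()
    df = spark.range(5)

    # Scala: df.sqlContext is a public accessor on Dataset.
    # PySpark: the leading underscore says "treat this as private",
    # so getting back to the context means reaching into an internal.
    sc = df._sc  # SparkContext, but named as if it were internal

    spark.stop()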