Re: Scala vs PySpark Inconsistency: SQLContext/SparkSession access from DataFrame/DataSet
Ah, yes, I clearly didn't read closely enough! It was right there in front of me... Thanks!

On Wed, Mar 18, 2020, 12:36 PM Maciej Szymkiewicz wrote:

> Hi Ben,
>
> Please note that `_sc` is not a SQLContext. It is a SparkContext, which
> is used primarily for internal calls.
>
> SQLContext is exposed through `sql_ctx`
> (https://github.com/apache/spark/blob/8bfaa62f2fcc942dd99a63b20366167277bce2a1/python/pyspark/sql/dataframe.py#L80)
>
> On 3/17/20 5:53 PM, Ben Roling wrote:
> > I tried this on the users mailing list but didn't get traction. It's
> > probably more appropriate here anyway.
> >
> > I've noticed that DataSet.sqlContext is public in Scala but the
> > equivalent (DataFrame._sc) in PySpark is named as if it should be
> > treated as private.
> >
> > Is this intentional? If so, what's the rationale? If not, then it
> > feels like a bug and DataFrame should have some form of public access
> > back to the context/session. I'm happy to log the bug but thought I
> > would ask here first. Thanks!
>
> --
> Best regards,
> Maciej Szymkiewicz
>
> Web: https://zero323.net
> Keybase: https://keybase.io/zero323
> Gigs: https://www.codementor.io/@zero323
> PGP: C095AA7F33E6123A
Re: Scala vs PySpark Inconsistency: SQLContext/SparkSession access from DataFrame/DataSet
Hi Ben,

Please note that `_sc` is not a SQLContext. It is a SparkContext, which is used primarily for internal calls.

SQLContext is exposed through `sql_ctx`
(https://github.com/apache/spark/blob/8bfaa62f2fcc942dd99a63b20366167277bce2a1/python/pyspark/sql/dataframe.py#L80)

On 3/17/20 5:53 PM, Ben Roling wrote:
> I tried this on the users mailing list but didn't get traction. It's
> probably more appropriate here anyway.
>
> I've noticed that DataSet.sqlContext is public in Scala but the
> equivalent (DataFrame._sc) in PySpark is named as if it should be
> treated as private.
>
> Is this intentional? If so, what's the rationale? If not, then it
> feels like a bug and DataFrame should have some form of public access
> back to the context/session. I'm happy to log the bug but thought I
> would ask here first. Thanks!

--
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
Keybase: https://keybase.io/zero323
Gigs: https://www.codementor.io/@zero323
PGP: C095AA7F33E6123A
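
For readers of the archive, the distinction above can be sketched roughly as follows. This is an illustration only, assuming a PySpark version matching the linked commit, where a DataFrame keeps its SQLContext in the public `sql_ctx` attribute and its SparkContext in the underscore-prefixed `_sc`. The helper names `sql_context_of` and `demo` are hypothetical and not part of PySpark.

```python
def sql_context_of(df):
    """Return the SQLContext behind a PySpark DataFrame.

    Reads the public `sql_ctx` attribute rather than `_sc`, which holds
    the SparkContext and is named as an internal attribute.
    """
    return df.sql_ctx


def demo():
    """Print the types behind `sql_ctx` and `_sc` (needs a local PySpark install)."""
    # Imported lazily so the helper above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("sql-ctx-demo")  # hypothetical app name
        .getOrCreate()
    )
    try:
        df = spark.range(3)
        ctx = sql_context_of(df)
        # Public route back to the SQL context from a DataFrame.
        print(type(ctx).__name__)
        # Assumption: the SQLContext also carries its session as `sparkSession`.
        print(type(ctx.sparkSession).__name__)
        # Internal attribute, shown only to contrast it with `sql_ctx`.
        print(type(df._sc).__name__)
    finally:
        spark.stop()
```

Calling `demo()` on a machine with PySpark and a JVM available should show that `sql_ctx` and `_sc` are different kinds of objects, which is the point made above.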
Scala vs PySpark Inconsistency: SQLContext/SparkSession access from DataFrame/DataSet
I tried this on the users mailing list but didn't get traction. It's probably more appropriate here anyway.

I've noticed that DataSet.sqlContext is public in Scala but the equivalent (DataFrame._sc) in PySpark is named as if it should be treated as private.

Is this intentional? If so, what's the rationale? If not, then it feels like a bug and DataFrame should have some form of public access back to the context/session. I'm happy to log the bug but thought I would ask here first.

Thanks!