Re: Scala vs PySpark Inconsistency: SQLContext/SparkSession access from DataFrame/DataSet

2020-03-20 Thread Ben Roling
Ah, yes, I clearly didn't read closely enough!  It was right there in front
of me...

Thanks!

On Wed, Mar 18, 2020, 12:36 PM Maciej Szymkiewicz wrote:

> Hi Ben,
>
> Please note that `_sc` is not a SQLContext. It is a SparkContext, which
> is used primarily for internal calls.
>
> SQLContext is exposed through `sql_ctx`
> (https://github.com/apache/spark/blob/8bfaa62f2fcc942dd99a63b20366167277bce2a1/python/pyspark/sql/dataframe.py#L80)
>
> On 3/17/20 5:53 PM, Ben Roling wrote:
> > I tried this on the users mailing list but didn't get traction.  It's
> > probably more appropriate here anyway.
> >
> > I've noticed that DataSet.sqlContext is public in Scala but the
> > equivalent (DataFrame._sc) in PySpark is named as if it should be
> > treated as private.
> >
> > Is this intentional?  If so, what's the rationale?  If not, then it
> > feels like a bug and DataFrame should have some form of public access
> > back to the context/session.  I'm happy to log the bug but thought I
> > would ask here first.  Thanks!
>
> --
> Best regards,
> Maciej Szymkiewicz
>
> Web: https://zero323.net
> Keybase: https://keybase.io/zero323
> Gigs: https://www.codementor.io/@zero323
> PGP: C095AA7F33E6123A


Re: Scala vs PySpark Inconsistency: SQLContext/SparkSession access from DataFrame/DataSet

2020-03-18 Thread Maciej Szymkiewicz
Hi Ben,

Please note that `_sc` is not a SQLContext. It is a SparkContext, which
is used primarily for internal calls.

SQLContext is exposed through `sql_ctx`
(https://github.com/apache/spark/blob/8bfaa62f2fcc942dd99a63b20366167277bce2a1/python/pyspark/sql/dataframe.py#L80)
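
For illustration, the distinction above can be sketched as a few small helpers. This is a minimal sketch, not part of PySpark: the helper names (`sql_context_of`, `spark_session_of`, `spark_context_of`) are hypothetical, and it assumes the attribute layout of the linked revision, where a DataFrame carries a public `sql_ctx`, the SQLContext exposes `sparkSession`, and the session exposes `sparkContext`. The stand-in objects at the bottom exist only so the snippet runs without a Spark installation.

```python
from types import SimpleNamespace


def sql_context_of(df):
    """Return the SQLContext via the public `sql_ctx` attribute (not `_sc`)."""
    return df.sql_ctx


def spark_session_of(df):
    """Return the SparkSession, assuming the SQLContext exposes `sparkSession`."""
    return df.sql_ctx.sparkSession


def spark_context_of(df):
    """Return the SparkContext through public attributes only."""
    return df.sql_ctx.sparkSession.sparkContext


# Stand-in objects (assumption: they only mimic the attribute names used
# above) so the helpers can be exercised without starting Spark.
_context = SimpleNamespace(appName="demo")
_session = SimpleNamespace(sparkContext=_context)
_sql_ctx = SimpleNamespace(sparkSession=_session)
_df = SimpleNamespace(sql_ctx=_sql_ctx)

print(spark_session_of(_df) is _session)
print(spark_context_of(_df) is _context)
```

With a real DataFrame `df`, `spark_session_of(df)` would be used the same way; whether these attributes keep their names in other Spark releases should be checked against the version in use.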

On 3/17/20 5:53 PM, Ben Roling wrote:
> I tried this on the users mailing list but didn't get traction.  It's
> probably more appropriate here anyway.
>
> I've noticed that DataSet.sqlContext is public in Scala but the
> equivalent (DataFrame._sc) in PySpark is named as if it should be
> treated as private.
>
> Is this intentional?  If so, what's the rationale?  If not, then it
> feels like a bug and DataFrame should have some form of public access
> back to the context/session.  I'm happy to log the bug but thought I
> would ask here first.  Thanks!

-- 
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
Keybase: https://keybase.io/zero323
Gigs: https://www.codementor.io/@zero323
PGP: C095AA7F33E6123A




Scala vs PySpark Inconsistency: SQLContext/SparkSession access from DataFrame/DataSet

2020-03-17 Thread Ben Roling
I tried this on the users mailing list but didn't get traction.  It's
probably more appropriate here anyway.

I've noticed that Dataset.sqlContext is public in Scala but the equivalent
(DataFrame._sc) in PySpark is named as if it should be treated as private.

Is this intentional?  If so, what's the rationale?  If not, then it feels
like a bug and DataFrame should have some form of public access back to the
context/session.  I'm happy to log the bug but thought I would ask here
first.  Thanks!