[jira] [Commented] (LIVY-457) PySpark `sqlContext.sparkSession` incorrect on Spark 2.x

Saisai Shao (JIRA) Mon, 16 Apr 2018 23:19:13 -0700

    [ https://issues.apache.org/jira/browse/LIVY-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16440460#comment-16440460 ]


Saisai Shao commented on LIVY-457:
----------------------------------

I opened a PR for this issue (https://github.com/apache/incubator-livy/pull/86).

> PySpark `sqlContext.sparkSession` incorrect on Spark 2.x
> --------------------------------------------------------
>
>                 Key: LIVY-457
>                 URL: https://issues.apache.org/jira/browse/LIVY-457
>             Project: Livy
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>         Environment: RHEL6, Spark 2.1.2.1
>            Reporter: Dan Fike
>            Priority: Major
>
> It looks like the {{SQLContext}} we create in {{PySpark}} sessions isn't 
> constructed correctly. Compare how the behavior has changed between Livy 
> 0.4.0 and what is currently on {{master}} (0.6.0).
> Livy 0.4.0
> {code}
> $ curl --silent -X POST --data '{"kind": "pyspark"}' \
>     -H "Content-Type: application/json" localhost:8998/sessions | python -m json.tool
> $ curl --silent localhost:8998/sessions/1/statements -X POST \
>     -H 'Content-Type: application/json' \
>     -d '{"code":"sqlContext.sparkSession"}' | python -m json.tool
> $ curl --silent localhost:8998/sessions/1/statements/0 | python -m json.tool
> {
>     "id": 0,
>     "state": "available",
>     "output": {
>         "status": "ok",
>         "execution_count": 0,
>         "data": {
>             "text/plain": "<pyspark.sql.session.SparkSession object at 0x15a26d0>"
>         }
>     },
>     "progress": 1.0
> }
> {code}
> Livy 0.6.0
> {code}
> $ curl --silent -X POST --data '{"kind": "pyspark"}' \
>     -H "Content-Type: application/json" localhost:8998/sessions | python -m json.tool
> $ curl --silent localhost:8998/sessions/0/statements -X POST \
>     -H 'Content-Type: application/json' \
>     -d '{"code":"sqlContext.sparkSession"}' | python -m json.tool
> $ curl --silent localhost:8998/sessions/0/statements/0 | python -m json.tool
> {
>     "id": 0,
>     "code": "sqlContext.sparkSession",
>     "state": "available",
>     "output": {
>         "status": "ok",
>         "execution_count": 0,
>         "data": {
>             "text/plain": "JavaObject id=o4"
>         }
>     },
>     "progress": 1.0
> }
> $ curl --silent localhost:8998/sessions/0/statements -X POST \
>     -H 'Content-Type: application/json' \
>     -d '{"code":"sqlContext.sparkSession.toString()"}' | python -m json.tool
> $ curl --silent localhost:8998/sessions/0/statements/1 | python -m json.tool
> {
>     "id": 1,
>     "code": "sqlContext.sparkSession.toString()",
>     "state": "available",
>     "output": {
>         "status": "ok",
>         "execution_count": 1,
>         "data": {
>             "text/plain": "'org.apache.spark.sql.hive.HiveContext@200334d0'"
>         }
>     },
>     "progress": 1.0
> }
> {code}
> Notice how the value of {{sqlContext.sparkSession}} went from a 
> {{pyspark.sql.session.SparkSession}} to a Java object ({{JavaObject id=o4}}) 
> whose {{toString()}} reports an {{org.apache.spark.sql.hive.HiveContext}}.
> I suspect this is caused by the change at 
> https://github.com/apache/incubator-livy/commit/c1aafeb6cb87f2bd7f4cb7cf538822b59fb34a9c#diff-c58e3946d3530f54014129c268988e01R563
> which passes {{jsqlc}} as the second positional parameter to {{SQLContext}}, 
> whereas the diff at 
> https://github.com/apache/spark/commit/89addd40abdacd65cc03ac8aa5f9cf3dd4a4c19b#diff-74ba016ef40c1cb268e14aee817d71bdR50
> suggests it should be the _third_ positional parameter.
> I'd wager the fix is simply to pass that parameter explicitly as a keyword 
> argument instead.
> {code}
> sqlc = SQLContext(sc, jsqlContext=jsqlc)
> {code}
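
The positional-versus-keyword mix-up described in the issue can be reproduced without Spark. The sketch below is only an illustration: {{FakeSQLContext}} is a hypothetical stand-in, not the real {{pyspark.sql.SQLContext}}, and it assumes the Spark 2.x parameter order (sparkContext, sparkSession, jsqlContext) that the linked Spark diff suggests. The strings assigned to sc and jsqlc are placeholders for the real objects.

```python
class FakeSQLContext(object):
    """Hypothetical stand-in; mirrors only the assumed Spark 2.x parameter order."""

    def __init__(self, sparkContext, sparkSession=None, jsqlContext=None):
        self.sparkContext = sparkContext
        self.sparkSession = sparkSession
        self.jsqlContext = jsqlContext


sc = "sc"        # placeholder for the SparkContext
jsqlc = "jsqlc"  # placeholder for the JVM-side HiveContext handle

# Second positional argument: jsqlc lands in the sparkSession slot,
# which matches the symptom reported in the issue.
positional = FakeSQLContext(sc, jsqlc)

# Keyword argument: jsqlc lands in the jsqlContext slot and
# sparkSession is left unset.
keyword = FakeSQLContext(sc, jsqlContext=jsqlc)

print(positional.sparkSession, positional.jsqlContext)
print(keyword.sparkSession, keyword.jsqlContext)
```

Passing the argument by keyword binds it by name rather than by position, so the call no longer depends on where the parameter sits in the signature.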



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
