Adam Bronte created LIVY-504:
--------------------------------

             Summary: Pyspark sqlContext behavior does not match my spark shell
                 Key: LIVY-504
                 URL: https://issues.apache.org/jira/browse/LIVY-504
             Project: Livy
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.5.0
         Environment: AWS EMR 5.16.0
            Reporter: Adam Bronte
On 0.5.0 I'm seeing inconsistent behavior through Livy regarding the spark context and sqlContext compared to the pyspark shell.

For example, running this through the pyspark shell works:

{code:java}
[root@ip-10-0-0-32 ~]# pyspark
Python 2.7.14 (default, May 2 2018, 18:31:34)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
18/08/28 18:50:37 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/

Using Python version 2.7.14 (default, May 2 2018 18:31:34)
SparkSession available as 'spark'.
>>> from pyspark.sql import SQLContext
>>> my_sql_context = SQLContext.getOrCreate(sc)
>>> df = my_sql_context.read.parquet('s3://my-bucket/mydata.parquet')
>>> print(df.count())
67556724
{code}

But through Livy, the same code throws an exception:

{code:java}
from pyspark.sql import SQLContext
my_sql_context = SQLContext.getOrCreate(sc)
df = my_sql_context.read.parquet('s3://my-bucket/mydata.parquet')

'JavaMember' object has no attribute 'read'
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 433, in read
    return DataFrameReader(self)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 70, in __init__
    self._jreader = spark._ssql_ctx.read()
AttributeError: 'JavaMember' object has no attribute 'read'
{code}

Trying to use the default initialized sqlContext also throws the same error:

{code:java}
df = sqlContext.read.parquet('s3://my-bucket/mydata.parquet')

'JavaMember' object has no attribute 'read'
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 433, in read
    return DataFrameReader(self)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 70, in __init__
    self._jreader = spark._ssql_ctx.read()
AttributeError: 'JavaMember' object has no attribute 'read'
{code}

In both the pyspark shell and Livy, the objects look the same.

pyspark shell:

{code:java}
>>> print(sc)
<SparkContext master=yarn appName=PySparkShell>
>>> print(sqlContext)
<pyspark.sql.context.SQLContext object at 0x7fd15dfc3450>
>>> print(my_sql_context)
<pyspark.sql.context.SQLContext object at 0x7fd15dfc3450>
{code}

Livy:

{code:java}
print(sc)
<SparkContext master=yarn appName=livy-session-1>
print(sqlContext)
<pyspark.sql.context.SQLContext object at 0x7f478c06b850>
print(my_sql_context)
<pyspark.sql.context.SQLContext object at 0x7f478c06b850>
{code}

I'm running this through sparkmagic, but I've also confirmed the same behavior when calling the API directly.
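Based on the traceback, it looks like the sqlContext Livy injects holds a py4j JavaMember (an uncalled Java method reference) where a JVM SQLContext object should be, which is why _ssql_ctx.read() fails. A minimal diagnostic sketch, assuming only pyspark's bundled py4j and the spark/sqlContext variables Livy provides in a pyspark session:

{code:java}
# Minimal sketch, run inside a Livy pyspark session.
# If _ssql_ctx is a py4j JavaMember (a Java method reference) rather than a
# JavaObject, DataFrameReader's call to _ssql_ctx.read() raises exactly the
# AttributeError shown above.
from py4j.java_gateway import JavaMember, JavaObject

print(type(sqlContext._ssql_ctx))                    # JavaObject in a healthy shell
print(isinstance(sqlContext._ssql_ctx, JavaMember))  # True would confirm the bug

# Possible workaround (an assumption, not a verified fix): bypass the cached
# SQLContext entirely and read through the SparkSession instead.
df = spark.read.parquet('s3://my-bucket/mydata.parquet')
print(df.count())
{code}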
{code:java}
curl --silent -X POST --data '{"kind": "pyspark"}' -H "Content-Type: application/json" localhost:8998/sessions | python -m json.tool
{
    "appId": null,
    "appInfo": {
        "driverLogUrl": null,
        "sparkUiUrl": null
    },
    "id": 3,
    "kind": "pyspark",
    "log": [
        "stdout: ",
        "\nstderr: ",
        "\nYARN Diagnostics: "
    ],
    "owner": null,
    "proxyUser": null,
    "state": "starting"
}
{code}

{code:java}
curl --silent localhost:8998/sessions/3/statements -X POST -H 'Content-Type: application/json' -d '{"code":"df = sqlContext.read.parquet(\"s3://my-bucket/mydata.parquet\")"}' | python -m json.tool
{
    "code": "df = sqlContext.read.parquet(\"s3://my-bucket/mydata.parquet\")",
    "id": 1,
    "output": null,
    "progress": 0.0,
    "state": "running"
}
{code}
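The POST above only reports the statement as "running"; a short polling sketch like the one below (assuming the requests package, plus the session id 3 and statement id 1 carried over from the curl calls above) waits for a terminal state so the AttributeError becomes visible in the JSON output:

{code:java}
# Sketch: poll Livy's statement endpoint until the statement finishes,
# then print its output (which carries the traceback on failure).
# Assumes the requests package and the ids from the curl calls above.
import time
import requests

url = "http://localhost:8998/sessions/3/statements/1"
while True:
    stmt = requests.get(url).json()
    if stmt["state"] in ("available", "error", "cancelled"):
        break
    time.sleep(1)

print(stmt["output"])
{code}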