[jira] [Updated] (LIVY-504) Livy pyspark sqlContext behavior does not match pyspark shell
[ https://issues.apache.org/jira/browse/LIVY-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Bronte updated LIVY-504:
-----------------------------
    Summary: Livy pyspark sqlContext behavior does not match pyspark shell  (was: Pyspark sqlContext behavior does not match pyspark shell)

                 Key: LIVY-504
                 URL: https://issues.apache.org/jira/browse/LIVY-504
             Project: Livy
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.5.0
        Environment: AWS EMR 5.16.0
            Reporter: Adam Bronte
            Priority: Major

On 0.5.0 I'm seeing inconsistent behavior through Livy regarding the Spark context and sqlContext compared to the pyspark shell.

For example, running this through the pyspark shell works:

{code:java}
[root@ip-10-0-0-32 ~]# pyspark
Python 2.7.14 (default, May 2 2018, 18:31:34)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
18/08/28 18:50:37 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/

Using Python version 2.7.14 (default, May 2 2018 18:31:34)
SparkSession available as 'spark'.
>>> from pyspark.sql import SQLContext
>>> my_sql_context = SQLContext.getOrCreate(sc)
>>> df = my_sql_context.read.parquet('s3://my-bucket/mydata.parquet')
>>> print(df.count())
67556724
{code}

But through Livy, the same code throws an exception:

{code:java}
from pyspark.sql import SQLContext
my_sql_context = SQLContext.getOrCreate(sc)
df = my_sql_context.read.parquet('s3://my-bucket/mydata.parquet')

'JavaMember' object has no attribute 'read'
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 433, in read
    return DataFrameReader(self)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 70, in __init__
    self._jreader = spark._ssql_ctx.read()
AttributeError: 'JavaMember' object has no attribute 'read'
{code}

Trying to use the default initialized sqlContext throws the same error:

{code:java}
df = sqlContext.read.parquet('s3://my-bucket/mydata.parquet')

'JavaMember' object has no attribute 'read'
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 433, in read
    return DataFrameReader(self)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 70, in __init__
    self._jreader = spark._ssql_ctx.read()
AttributeError: 'JavaMember' object has no attribute 'read'
{code}

In both the pyspark shell and the Livy versions, the objects look the same.

pyspark shell:

{code:java}
>>> print(sc)
>>> print(sqlContext)
>>> print(my_sql_context)
{code}

Livy:

{code:java}
print(sc)
print(sqlContext)
print(my_sql_context)
{code}

I'm running this through sparkmagic, but I have also confirmed the same behavior when calling the API directly.
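The `AttributeError` above suggests that in the Livy session `spark._ssql_ctx` resolves to a py4j `JavaMember` (a callable method wrapper) rather than to the JVM SQLContext instance itself, so attribute lookup on it fails. A minimal pure-Python sketch of that failure mode, using a hypothetical stand-in class rather than real Spark or py4j objects:

```python
class JavaMember:
    """Stand-in for py4j's JavaMember: a callable wrapper around a JVM method."""
    def __call__(self):
        return "jvm SQLContext"

# What Livy 0.5.0 appears to hand back where a JVM SQLContext is expected.
spark_ssql_ctx = JavaMember()

try:
    # Mirrors `self._jreader = spark._ssql_ctx.read()` from the traceback:
    # the wrapper is callable, but it has no `read` attribute of its own.
    spark_ssql_ctx.read()
except AttributeError as e:
    print(e)  # 'JavaMember' object has no attribute 'read'
```

This is only an illustration of the error shape; the actual wiring of `_ssql_ctx` inside Livy's PySpark interpreter is what would need inspecting.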
{code:java}
curl --silent -X POST --data '{"kind": "pyspark"}' -H "Content-Type: application/json" localhost:8998/sessions | python -m json.tool
{
    "appId": null,
    "appInfo": {
        "driverLogUrl": null,
        "sparkUiUrl": null
    },
    "id": 3,
    "kind": "pyspark",
    "log": [
        "stdout: ",
        "\nstderr: ",
        "\nYARN Diagnostics: "
    ],
    "owner": null,
    "proxyUser": null,
    "state": "starting"
}
{code}

{code:java}
curl --silent localhost:8998/sessions/3/statements -X POST -H 'Content-Type: application/json' -d '{"code":"df = sqlContext.read.parquet(\"s3://my-bucket/mydata.parquet\")"}' | python -m json.tool
{
    "code": "df = sqlContext.read.parquet(\"s3://my-bucket/mydata.parquet\")",
    "id": 1,
    "output": null,
    "progress": 0.0,
    "state": "running"
}
{code}

When running on 0.4.0, both the pyspark shell and Livy versions worked.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
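When scripting those REST calls instead of using curl, building the statement body with `json.dumps` avoids the hand-escaped quotes shown above. A small sketch using only the Python standard library; the statement code and bucket name are the illustrative values from this report, not real resources:

```python
import json

# Statement code as a plain Python string; json.dumps produces the
# backslash-escaped form that the curl example writes by hand.
code = 'df = sqlContext.read.parquet("s3://my-bucket/mydata.parquet")'
payload = json.dumps({"code": code})

# POST this body to http://localhost:8998/sessions/<id>/statements with
# Content-Type: application/json (e.g. via urllib.request).
print(payload)
```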