[jira] [Created] (LIVY-505) sparkR.session failed with "invalid jobj 1" error in Spark 2.3

2018-08-28 Thread shanyu zhao (JIRA)
shanyu zhao created LIVY-505:


 Summary: sparkR.session failed with "invalid jobj 1" error in 
Spark 2.3
 Key: LIVY-505
 URL: https://issues.apache.org/jira/browse/LIVY-505
 Project: Livy
  Issue Type: Bug
  Components: Interpreter
Affects Versions: 0.5.0, 0.5.1
Reporter: shanyu zhao


In a Spark 2.3 cluster, use Zeppelin with the livy2 interpreter and type:
{code:java}
%sparkr
sparkR.session(){code}
You will see the error:

[1] "Error in writeJobj(con, object): invalid jobj 1"

In a successful case with older Livy and Spark versions, we see something like 
this:

Java ref type org.apache.spark.sql.SparkSession id 1

This indicates that the isValidJobj() function in the Spark code returned FALSE for 
the SparkSession object. For reference, this is the isValidJobj() function in the 
Spark 2.3 code:
{code:java}
isValidJobj <- function(jobj) {
  if (exists(".scStartTime", envir = .sparkREnv)) {
    jobj$appId == get(".scStartTime", envir = .sparkREnv)
  } else {
    FALSE
  }
}{code}
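
For illustration only (this is not Livy or SparkR source code; the jobj contents and 
timestamps below are made up), a minimal standalone R sketch of the two ways this 
check can come back FALSE and surface as the writeJobj() error above:
{code:java}
# Standalone sketch: .sparkREnv here is a throwaway environment and the jobj is
# a fake stand-in for the SparkSession reference returned by the backend.
.sparkREnv <- new.env()

isValidJobj <- function(jobj) {
  if (exists(".scStartTime", envir = .sparkREnv)) {
    jobj$appId == get(".scStartTime", envir = .sparkREnv)
  } else {
    FALSE
  }
}

jobj <- list(id = 1, appId = 1535400000000)   # hypothetical backend reference

isValidJobj(jobj)   # FALSE: .scStartTime was never set in this environment

assign(".scStartTime", 1535499999999, envir = .sparkREnv)
isValidJobj(jobj)   # FALSE: the jobj's appId does not match .scStartTime
{code}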





[jira] [Updated] (LIVY-504) Livy pyspark sqlContext behavior does not match pyspark shell

2018-08-28 Thread Adam Bronte (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Bronte updated LIVY-504:
-
Summary: Livy pyspark sqlContext behavior does not match pyspark shell  
(was: Pyspark sqlContext behavior does not match pyspark shell)

> Livy pyspark sqlContext behavior does not match pyspark shell
> -
>
> Key: LIVY-504
> URL: https://issues.apache.org/jira/browse/LIVY-504
> Project: Livy
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 0.5.0
> Environment: AWS EMR 5.16.0
>Reporter: Adam Bronte
>Priority: Major
>
> On 0.5.0 I'm seeing inconsistent behavior through Livy regarding the spark 
> context and sqlContext compared to the pyspark shell.
> For example running this through the pyspark shell works:
> {code:java}
> [root@ip-10-0-0-32 ~]# pyspark
> Python 2.7.14 (default, May 2 2018, 18:31:34)
> [GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> 18/08/28 18:50:37 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive 
> is set, falling back to uploading libraries under SPARK_HOME.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 2.3.1
>       /_/
> Using Python version 2.7.14 (default, May 2 2018 18:31:34)
> SparkSession available as 'spark'.
> >>> from pyspark.sql import SQLContext
> >>> my_sql_context = SQLContext.getOrCreate(sc)
> >>> df = my_sql_context.read.parquet('s3://my-bucket/mydata.parquet')
> >>> print(df.count())
> 67556724
> {code}
> But through Livy, the same code throws an exception
> {code:java}
> from pyspark.sql import SQLContext
> my_sql_context = SQLContext.getOrCreate(sc)
> df = my_sql_context.read.parquet('s3://my-bucket/mydata.parquet')
> 'JavaMember' object has no attribute 'read'
> Traceback (most recent call last):
>   File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 
> 433, in read
> return DataFrameReader(self)
>   File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", 
> line 70, in __init__
> self._jreader = spark._ssql_ctx.read()
> AttributeError: 'JavaMember' object has no attribute 'read'{code}
> Also trying to use the default initialized sqlContext throws the same error
> {code:java}
> df = sqlContext.read.parquet('s3://my-bucket/mydata.parquet')
> 'JavaMember' object has no attribute 'read'
> Traceback (most recent call last):
>   File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 
> 433, in read
> return DataFrameReader(self)
>   File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", 
> line 70, in __init__
> self._jreader = spark._ssql_ctx.read()
> AttributeError: 'JavaMember' object has no attribute 'read'{code}
> In both the spark shell and the livy versions, the objects look the same.
> pyspark shell:
> {code:java}
> >>> print(sc)
> 
> >>> print(sqlContext)
> 
> >>> print(my_sql_context)
> {code}
> livy:
> {code:java}
> print(sc)
> 
> print(sqlContext)
> 
> print(my_sql_context)
> {code}
> I'm running this through sparkmagic but also have confirmed this is the same 
> behavior when calling the api directly.
> {code:java}
> curl --silent -X POST --data '{"kind": "pyspark"}' -H "Content-Type: 
> application/json" localhost:8998/sessions | python -m json.tool
> {
> "appId": null,
> "appInfo": {
> "driverLogUrl": null,
> "sparkUiUrl": null
> },
> "id": 3,
> "kind": "pyspark",
> "log": [
> "stdout: ",
> "\nstderr: ",
> "\nYARN Diagnostics: "
> ],
> "owner": null,
> "proxyUser": null,
> "state": "starting"
> }
> {code}
> {code:java}
> curl --silent localhost:8998/sessions/3/statements -X POST -H 'Content-Type: 
> application/json' -d '{"code":"df = 
> sqlContext.read.parquet(\"s3://my-bucket/mydata.parquet\")"}' | python -m 
> json.tool
> {
> "code": "df = sqlContext.read.parquet(\"s3://my-bucket/mydata.parquet\")",
> "id": 1,
> "output": null,
> "progress": 0.0,
> "state": "running"
> }
> {code}
> When running on 0.4.0 both pyspark shell and livy versions worked.





[jira] [Updated] (LIVY-504) Pyspark sqlContext behavior does not match pyspark shell

2018-08-28 Thread Adam Bronte (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Bronte updated LIVY-504:
-
Summary: Pyspark sqlContext behavior does not match pyspark shell  (was: 
Pyspark sqlContext behavior does not my spark shell)

> Pyspark sqlContext behavior does not match pyspark shell
> 
>
> Key: LIVY-504
> URL: https://issues.apache.org/jira/browse/LIVY-504
> Project: Livy
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 0.5.0
> Environment: AWS EMR 5.16.0
>Reporter: Adam Bronte
>Priority: Major
>
> On 0.5.0 I'm seeing inconsistent behavior through Livy regarding the spark 
> context and sqlContext compared to the pyspark shell.
> For example running this through the pyspark shell works:
> {code:java}
> [root@ip-10-0-0-32 ~]# pyspark
> Python 2.7.14 (default, May 2 2018, 18:31:34)
> [GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> 18/08/28 18:50:37 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive 
> is set, falling back to uploading libraries under SPARK_HOME.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 2.3.1
>       /_/
> Using Python version 2.7.14 (default, May 2 2018 18:31:34)
> SparkSession available as 'spark'.
> >>> from pyspark.sql import SQLContext
> >>> my_sql_context = SQLContext.getOrCreate(sc)
> >>> df = my_sql_context.read.parquet('s3://my-bucket/mydata.parquet')
> >>> print(df.count())
> 67556724
> {code}
> But through Livy, the same code throws an exception
> {code:java}
> from pyspark.sql import SQLContext
> my_sql_context = SQLContext.getOrCreate(sc)
> df = my_sql_context.read.parquet('s3://my-bucket/mydata.parquet')
> 'JavaMember' object has no attribute 'read'
> Traceback (most recent call last):
>   File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 
> 433, in read
> return DataFrameReader(self)
>   File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", 
> line 70, in __init__
> self._jreader = spark._ssql_ctx.read()
> AttributeError: 'JavaMember' object has no attribute 'read'{code}
> Also trying to use the default initialized sqlContext throws the same error
> {code:java}
> df = sqlContext.read.parquet('s3://my-bucket/mydata.parquet')
> 'JavaMember' object has no attribute 'read'
> Traceback (most recent call last):
>   File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 
> 433, in read
> return DataFrameReader(self)
>   File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", 
> line 70, in __init__
> self._jreader = spark._ssql_ctx.read()
> AttributeError: 'JavaMember' object has no attribute 'read'{code}
> In both the spark shell and the livy versions, the objects look the same.
> pyspark shell:
> {code:java}
> >>> print(sc)
> 
> >>> print(sqlContext)
> 
> >>> print(my_sql_context)
> {code}
> livy:
> {code:java}
> print(sc)
> 
> print(sqlContext)
> 
> print(my_sql_context)
> {code}
> I'm running this through sparkmagic but also have confirmed this is the same 
> behavior when calling the api directly.
> {code:java}
> curl --silent -X POST --data '{"kind": "pyspark"}' -H "Content-Type: 
> application/json" localhost:8998/sessions | python -m json.tool
> {
> "appId": null,
> "appInfo": {
> "driverLogUrl": null,
> "sparkUiUrl": null
> },
> "id": 3,
> "kind": "pyspark",
> "log": [
> "stdout: ",
> "\nstderr: ",
> "\nYARN Diagnostics: "
> ],
> "owner": null,
> "proxyUser": null,
> "state": "starting"
> }
> {code}
> {code:java}
> curl --silent localhost:8998/sessions/3/statements -X POST -H 'Content-Type: 
> application/json' -d '{"code":"df = 
> sqlContext.read.parquet(\"s3://my-bucket/mydata.parquet\")"}' | python -m 
> json.tool
> {
> "code": "df = sqlContext.read.parquet(\"s3://my-bucket/mydata.parquet\")",
> "id": 1,
> "output": null,
> "progress": 0.0,
> "state": "running"
> }
> {code}
> When running on 0.4.0 both pyspark shell and livy versions worked.





[jira] [Updated] (LIVY-504) Pyspark sqlContext behavior does not my spark shell

2018-08-28 Thread Adam Bronte (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Bronte updated LIVY-504:
-
Description: 
On 0.5.0 I'm seeing inconsistent behavior through Livy regarding the spark 
context and sqlContext compared to the pyspark shell.

For example running this through the pyspark shell works:
{code:java}
[root@ip-10-0-0-32 ~]# pyspark
Python 2.7.14 (default, May 2 2018, 18:31:34)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
18/08/28 18:50:37 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive 
is set, falling back to uploading libraries under SPARK_HOME.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/

Using Python version 2.7.14 (default, May 2 2018 18:31:34)
SparkSession available as 'spark'.
>>> from pyspark.sql import SQLContext
>>> my_sql_context = SQLContext.getOrCreate(sc)
>>> df = my_sql_context.read.parquet('s3://my-bucket/mydata.parquet')
>>> print(df.count())
67556724
{code}
But through Livy, the same code throws an exception
{code:java}
from pyspark.sql import SQLContext
my_sql_context = SQLContext.getOrCreate(sc)
df = my_sql_context.read.parquet('s3://my-bucket/mydata.parquet')

'JavaMember' object has no attribute 'read'
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 
433, in read
return DataFrameReader(self)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 
70, in __init__
self._jreader = spark._ssql_ctx.read()
AttributeError: 'JavaMember' object has no attribute 'read'{code}
Also trying to use the default initialized sqlContext throws the same error
{code:java}
df = sqlContext.read.parquet('s3://my-bucket/mydata.parquet')

'JavaMember' object has no attribute 'read'
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 
433, in read
return DataFrameReader(self)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 
70, in __init__
self._jreader = spark._ssql_ctx.read()
AttributeError: 'JavaMember' object has no attribute 'read'{code}
In both the spark shell and the livy versions, the objects look the same.

pyspark shell:
{code:java}
>>> print(sc)

>>> print(sqlContext)

>>> print(my_sql_context)
{code}
livy:
{code:java}
print(sc)


print(sqlContext)


print(my_sql_context)
{code}
I'm running this through sparkmagic but also have confirmed this is the same 
behavior when calling the api directly.
{code:java}
curl --silent -X POST --data '{"kind": "pyspark"}' -H "Content-Type: 
application/json" localhost:8998/sessions | python -m json.tool
{
"appId": null,
"appInfo": {
"driverLogUrl": null,
"sparkUiUrl": null
},
"id": 3,
"kind": "pyspark",
"log": [
"stdout: ",
"\nstderr: ",
"\nYARN Diagnostics: "
],
"owner": null,
"proxyUser": null,
"state": "starting"
}
{code}
{code:java}
curl --silent localhost:8998/sessions/3/statements -X POST -H 'Content-Type: 
application/json' -d '{"code":"df = 
sqlContext.read.parquet(\"s3://my-bucket/mydata.parquet\")"}' | python -m 
json.tool
{
"code": "df = sqlContext.read.parquet(\"s3://my-bucket/mydata.parquet\")",
"id": 1,
"output": null,
"progress": 0.0,
"state": "running"
}
{code}
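
For anyone reproducing this without sparkmagic, here is a small Python sketch of the 
same two REST calls plus polling for the statement result. This is illustration only: 
it assumes the third-party requests library is installed and reuses the host, port, 
and S3 path from the example above; the /sessions and /statements endpoints are the 
standard Livy REST API.
{code:java}
import json
import time

import requests  # third-party HTTP client, assumed installed

LIVY = "http://localhost:8998"
CODE = 'df = sqlContext.read.parquet("s3://my-bucket/mydata.parquet")'

# 1. Create a pyspark session (equivalent to the first curl call above).
session = requests.post(LIVY + "/sessions", json={"kind": "pyspark"}).json()
session_url = "{0}/sessions/{1}".format(LIVY, session["id"])

# 2. Wait until the session leaves "starting" and becomes idle.
while requests.get(session_url).json()["state"] != "idle":
    time.sleep(5)

# 3. Submit the failing statement (equivalent to the second curl call above).
stmt = requests.post(session_url + "/statements", json={"code": CODE}).json()
stmt_url = "{0}/statements/{1}".format(session_url, stmt["id"])

# 4. Poll until the statement finishes, then print its output, which is where
#    the 'JavaMember' AttributeError from this report shows up.
while True:
    result = requests.get(stmt_url).json()
    if result["state"] in ("available", "error", "cancelled"):
        print(json.dumps(result["output"], indent=2))
        break
    time.sleep(2)
{code}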

When running on 0.4.0 both pyspark shell and livy versions worked.

  was:
On 0.5.0 I'm seeing inconsistent behavior through Livy regarding the spark 
context and sqlContext compared to the pyspark shell.

For example running this through the pyspark shell works:
{code:java}
[root@ip-10-0-0-32 ~]# pyspark
Python 2.7.14 (default, May 2 2018, 18:31:34)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
18/08/28 18:50:37 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive 
is set, falling back to uploading libraries under SPARK_HOME.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/

Using Python version 2.7.14 (default, May 2 2018 18:31:34)
SparkSession available as 'spark'.
>>> from pyspark.sql import SQLContext
>>> my_sql_context = SQLContext.getOrCreate(sc)
>>> df = my_sql_context.read.parquet('s3://my-bucket/mydata.parquet')
>>> print(df.count())
67556724
{code}
But through Livy, the same code throws an exception
{code:java}
from pyspark.sql import SQLContext
my_sql_context = SQLContext.getOrCreate(sc)
df = my_sql_context.read.parquet('s3://my-bucket/mydata.parquet')

'JavaMember' object has no attribute 'read'
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 
433, in read
return DataFrameReader(self)
  File 

[jira] [Updated] (LIVY-504) Pyspark sqlContext behavior does not my spark shell

2018-08-28 Thread Adam Bronte (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Bronte updated LIVY-504:
-
Description: 
On 0.5.0 I'm seeing inconsistent behavior through Livy regarding the spark 
context and sqlContext compared to the pyspark shell.

For example running this through the pyspark shell works:
{code:java}
[root@ip-10-0-0-32 ~]# pyspark
Python 2.7.14 (default, May 2 2018, 18:31:34)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
18/08/28 18:50:37 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive 
is set, falling back to uploading libraries under SPARK_HOME.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/

Using Python version 2.7.14 (default, May 2 2018 18:31:34)
SparkSession available as 'spark'.
>>> from pyspark.sql import SQLContext
>>> my_sql_context = SQLContext.getOrCreate(sc)
>>> df = my_sql_context.read.parquet('s3://my-bucket/mydata.parquet')
>>> print(df.count())
67556724
{code}
But through Livy, the same code throws an exception
{code:java}
from pyspark.sql import SQLContext
my_sql_context = SQLContext.getOrCreate(sc)
df = my_sql_context.read.parquet('s3://my-bucket/mydata.parquet')

'JavaMember' object has no attribute 'read'
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 
433, in read
return DataFrameReader(self)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 
70, in __init__
self._jreader = spark._ssql_ctx.read()
AttributeError: 'JavaMember' object has no attribute 'read'{code}
Also trying to use the default initialized sqlContext throws the same error
{code:java}
df = sqlContext.read.parquet('s3://my-bucket/mydata.parquet')

'JavaMember' object has no attribute 'read'
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 
433, in read
return DataFrameReader(self)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 
70, in __init__
self._jreader = spark._ssql_ctx.read()
AttributeError: 'JavaMember' object has no attribute 'read'{code}
In both the spark shell and the livy versions, the objects look the same.

pyspark shell:
{code:java}
>>> print(sc)

>>> print(sqlContext)

>>> print(my_sql_context)
{code}
livy:
{code:java}
print(sc)


print(sqlContext)


print(my_sql_context)
{code}
I'm running this through sparkmagic but also have confirmed this is the same 
behavior when calling the api directly.
{code:java}
curl --silent -X POST --data '{"kind": "pyspark"}' -H "Content-Type: 
application/json" localhost:8998/sessions | python -m json.tool
{
"appId": null,
"appInfo": {
"driverLogUrl": null,
"sparkUiUrl": null
},
"id": 3,
"kind": "pyspark",
"log": [
"stdout: ",
"\nstderr: ",
"\nYARN Diagnostics: "
],
"owner": null,
"proxyUser": null,
"state": "starting"
}
{code}
{code:java}
curl --silent localhost:8998/sessions/3/statements -X POST -H 'Content-Type: 
application/json' -d '{"code":"df = 
sqlContext.read.parquet(\"s3://my-bucket/mydata.parquet\")"}' | python -m 
json.tool
{
"code": "df = sqlContext.read.parquet(\"s3://my-bucket/mydata.parquet\")",
"id": 1,
"output": null,
"progress": 0.0,
"state": "running"
}
{code}

  was:
On 0.5.0 I'm seeing inconsistent behavior through Livy regarding the spark 
context and sqlContext compared to the pyspark shell.

For example running this through the pyspark shell works:
{code:java}
[root@ip-10-0-0-32 ~]# pyspark
Python 2.7.14 (default, May 2 2018, 18:31:34)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
18/08/28 18:50:37 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive 
is set, falling back to uploading libraries under SPARK_HOME.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/

Using Python version 2.7.14 (default, May 2 2018 18:31:34)
SparkSession available as 'spark'.
>>> from pyspark.sql import SQLContext
>>> my_sql_context = SQLContext.getOrCreate(sc)
>>> df = my_sql_context.read.parquet('s3://my-bucket/mydata.parquet')
>>> print(df.count())
67556724
{code}
But through Livy, the same code throws an exception
{code:java}
from pyspark.sql import SQLContext
my_sql_context = SQLContext.getOrCreate(sc)
df = my_sql_context.read.parquet('s3://my-bucket/mydata.parquet')

'JavaMember' object has no attribute 'read'
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 
433, in read
return DataFrameReader(self)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 
70, in __init__
self._jreader = 

[jira] [Created] (LIVY-504) Pyspark sqlContext behavior does not my spark shell

2018-08-28 Thread Adam Bronte (JIRA)
Adam Bronte created LIVY-504:


 Summary: Pyspark sqlContext behavior does not my spark shell
 Key: LIVY-504
 URL: https://issues.apache.org/jira/browse/LIVY-504
 Project: Livy
  Issue Type: Bug
  Components: Core
Affects Versions: 0.5.0
 Environment: AWS EMR 5.16.0
Reporter: Adam Bronte


On 0.5.0 I'm seeing inconsistent behavior through Livy regarding the spark 
context and sqlContext compared to the pyspark shell.

For example running this through the pyspark shell works:
{code:java}
[root@ip-10-0-0-32 ~]# pyspark
Python 2.7.14 (default, May 2 2018, 18:31:34)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
18/08/28 18:50:37 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive 
is set, falling back to uploading libraries under SPARK_HOME.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/

Using Python version 2.7.14 (default, May 2 2018 18:31:34)
SparkSession available as 'spark'.
>>> from pyspark.sql import SQLContext
>>> my_sql_context = SQLContext.getOrCreate(sc)
>>> df = my_sql_context.read.parquet('s3://my-bucket/mydata.parquet')
>>> print(df.count())
67556724
{code}
But through Livy, the same code throws an exception
{code:java}
from pyspark.sql import SQLContext
my_sql_context = SQLContext.getOrCreate(sc)
df = my_sql_context.read.parquet('s3://my-bucket/mydata.parquet')

'JavaMember' object has no attribute 'read'
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 
433, in read
return DataFrameReader(self)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 
70, in __init__
self._jreader = spark._ssql_ctx.read()
AttributeError: 'JavaMember' object has no attribute 'read'{code}
Also trying to use the default initialized sqlContext throws the same error
{code:java}
df = my_sql_context.read.parquet('s3://my-bucket/mydata.parquet')

'JavaMember' object has no attribute 'read'
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 
433, in read
return DataFrameReader(self)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 
70, in __init__
self._jreader = spark._ssql_ctx.read()
AttributeError: 'JavaMember' object has no attribute 'read'{code}
In both the spark shell and the livy versions, the objects look the same.

pyspark shell:
{code:java}
>>> print(sc)

>>> print(sqlContext)

>>> print(my_sql_context)
{code}
livy:
{code:java}
print(sc)


print(sqlContext)


print(my_sql_context)
{code}
I'm running this through sparkmagic but also have confirmed this is the same 
behavior when calling the api directly.
{code:java}
curl --silent -X POST --data '{"kind": "pyspark"}' -H "Content-Type: 
application/json" localhost:8998/sessions | python -m json.tool
{
"appId": null,
"appInfo": {
"driverLogUrl": null,
"sparkUiUrl": null
},
"id": 3,
"kind": "pyspark",
"log": [
"stdout: ",
"\nstderr: ",
"\nYARN Diagnostics: "
],
"owner": null,
"proxyUser": null,
"state": "starting"
}
{code}
{code:java}
curl --silent localhost:8998/sessions/3/statements -X POST -H 'Content-Type: 
application/json' -d '{"code":"df = 
sqlContext.read.parquet(\"s3://my-bucket/mydata.parquet\")"}' | python -m 
json.tool
{
"code": "df = sqlContext.read.parquet(\"s3://my-bucket/mydata.parquet\")",
"id": 1,
"output": null,
"progress": 0.0,
"state": "running"
}
{code}
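
Not a fix for Livy itself, but for completeness, a hedged Python sketch of things 
worth trying inside the broken Livy session, plus a quick check of the object the 
traceback points at. The S3 path is the example one from this report, and whether 
these workarounds actually help on 0.5.0 is untested.
{code:java}
from pyspark.sql import SQLContext, SparkSession

# The traceback comes from sqlContext._ssql_ctx.read(); printing the type shows
# whether _ssql_ctx is a real JVM object or an uncalled py4j JavaMember.
print(type(sqlContext._ssql_ctx))

# Attempt 1: go through the SparkSession entry point instead of sqlContext.
# getOrCreate() reuses the already-running SparkContext (`sc`).
spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("s3://my-bucket/mydata.parquet")
print(df.count())

# Attempt 2: build a fresh SQLContext on top of that session rather than using
# the sqlContext object injected into the Livy session.
my_sql_context = SQLContext(sparkContext=sc, sparkSession=spark)
df2 = my_sql_context.read.parquet("s3://my-bucket/mydata.parquet")
{code}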





[jira] [Created] (LIVY-503) Move RPC classes used in thriftserver to a separate module

2018-08-28 Thread Marco Gaido (JIRA)
Marco Gaido created LIVY-503:


 Summary: Move RPC classes used in thriftserver to a separate module
 Key: LIVY-503
 URL: https://issues.apache.org/jira/browse/LIVY-503
 Project: Livy
  Issue Type: Sub-task
Reporter: Marco Gaido


As suggested in the discussion on the original PR 
(https://github.com/apache/incubator-livy/pull/104#discussion_r212806490), we should 
move the RPC classes that need to be uploaded to the Spark session into a separate 
module, so that we upload as few classes as possible and avoid any unintended 
interaction with the created Spark session.





[jira] [Created] (LIVY-502) Cleanup Hive dependencies

2018-08-28 Thread Marco Gaido (JIRA)
Marco Gaido created LIVY-502:


 Summary: Cleanup Hive dependencies
 Key: LIVY-502
 URL: https://issues.apache.org/jira/browse/LIVY-502
 Project: Livy
  Issue Type: Sub-task
Reporter: Marco Gaido


In the initial implementation we rely on, and delegate some of the work to, the Hive 
classes used in HiveServer2. This helped simplify the first implementation, since it 
saved us from writing a lot of code, but it also introduced a dependency on the 
{{hive-exec}} package and compelled us to slightly modify some of the existing Hive 
classes.

This JIRA tracks removing these workarounds by re-implementing the same logic in Livy, 
so that we get rid of all Hive dependencies other than the RPC and service layers.


