[jira] [Updated] (LIVY-750) Livy uploads local pyspark archives to Yarn distributed cache

2020-03-16 Thread shanyu zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/LIVY-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated LIVY-750:
-
Description: 
On the Livy Server, even if we set the pyspark archives to use local files:
{code:bash}
export 
PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip
{code}

Livy still uploads these local pyspark archives to the Yarn distributed cache:
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,026 INFO 
yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> 
hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,392 INFO 
yarn.Client: Uploading resource file:/opt/spark/python/lib/py4j-0.10.7-src.zip 
-> 
hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/py4j-0.10.7-src.zip

Note that this is after we fixed Spark code in SPARK-30845 to not always upload 
local archives.

The root cause is that Livy adds the pyspark archives to "spark.submit.pyFiles", 
which Spark then adds to the Yarn distributed cache. Since spark-submit already 
takes care of finding and uploading pyspark archives when they are not local, 
there is no need for Livy to do so redundantly.
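
For illustration only, a minimal sketch of the implied change (the object and 
method names below are made up, not Livy's actual API): archives with the 
"local:" scheme are left out of "spark.submit.pyFiles", while everything else is 
still handed to spark-submit as before.
{code:java}
// Hypothetical sketch -- names are illustrative, not Livy's real code.
object PySparkArchives {
  private val PyFilesKey = "spark.submit.pyFiles"

  // Merge pyspark archives into the Spark conf, but skip "local:" archives:
  // they already exist on every node, so putting them into spark.submit.pyFiles
  // would only make the YARN client upload them to the staging directory again.
  def mergePyArchives(conf: Map[String, String], archives: Seq[String]): Map[String, String] = {
    val needsUpload = archives.filterNot(_.startsWith("local:"))
    if (needsUpload.isEmpty) {
      conf
    } else {
      val existing = conf.get(PyFilesKey).toSeq.flatMap(_.split(",")).filter(_.nonEmpty)
      conf + (PyFilesKey -> (existing ++ needsUpload).distinct.mkString(","))
    }
  }
}
{code}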

  was:
On the Livy Server, even if we set the pyspark archives to use local files:
{code:bash}
export 
PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip
{code}

Livy still uploads these local pyspark archives to the Yarn distributed cache:
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,026 INFO 
yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> 
hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,392 INFO 
yarn.Client: Uploading resource file:/opt/spark/python/lib/py4j-0.10.7-src.zip 
-> 
hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/py4j-0.10.7-src.zip

Note that this is after we fixed Spark code in SPARK-30845 to not always upload 
local archives.

The root cause is that Livy adds the pyspark archives to "spark.submit.pyFiles", 
which Spark then adds to the Yarn distributed cache. Since spark-submit already 
takes care of uploading pyspark archives, there is no need for Livy to do so 
redundantly.


> Livy uploads local pyspark archives to Yarn distributed cache
> -
>
> Key: LIVY-750
> URL: https://issues.apache.org/jira/browse/LIVY-750
> Project: Livy
>  Issue Type: Bug
>  Components: Server
>Affects Versions: 0.6.0, 0.7.0
>Reporter: shanyu zhao
>Priority: Major
> Attachments: image-2020-02-16-13-19-40-645.png, 
> image-2020-02-16-13-19-59-591.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> On the Livy Server, even if we set the pyspark archives to use local files:
> {code:bash}
> export 
> PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip
> {code}
> Livy still uploads these local pyspark archives to the Yarn distributed cache:
> 20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,026 INFO 
> yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> 
> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip
> 20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,392 INFO 
> yarn.Client: Uploading resource 
> file:/opt/spark/python/lib/py4j-0.10.7-src.zip -> 
> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/py4j-0.10.7-src.zip
> Note that this is after we fixed Spark code in SPARK-30845 to not always 
> upload local archives.
> The root cause is that Livy adds the pyspark archives to "spark.submit.pyFiles", 
> which Spark then adds to the Yarn distributed cache. Since spark-submit already 
> takes care of finding and uploading pyspark archives when they are not local, 
> there is no need for Livy to do so redundantly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (LIVY-576) Unknown YARN state RUNNING for app with final status SUCCEEDED

2020-02-24 Thread shanyu zhao (Jira)


[ 
https://issues.apache.org/jira/browse/LIVY-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043920#comment-17043920
 ] 

shanyu zhao commented on LIVY-576:
--

Thanks [~andrasbeni], yes LIVY-642 is basically a duplicate of this JIRA with 
exactly the same fix. We can close this JIRA now that LIVY-642 is checked in.

On the other hand, [~jerryshao], maybe we should be inspecting the JIRAs more 
often. This JIRA has been sitting there for almost 1 year now without anyone 
reviewing it. And a duplicate JIRA was reviewed and checked in after about 6 
months.

> Unknown YARN state RUNNING for app with final status SUCCEEDED
> --
>
> Key: LIVY-576
> URL: https://issues.apache.org/jira/browse/LIVY-576
> Project: Livy
>  Issue Type: Bug
>  Components: Server
>Affects Versions: 0.5.0
>Reporter: shanyu zhao
>Priority: Major
> Attachments: LIVY-576.patch
>
>
> With Livy running Spark 2.3/2.4 on Yarn 2.9, there is a chance for Yarn to 
> return application reports with Yarn state RUNNING and final Yarn status 
> SUCCEEDED; this means the Yarn application is finishing up and about to 
> succeed. Livy's mapYarnState() method does not have a valid mapping for this 
> combination and therefore it renders the session 'dead'.
> I saw this in Livy server log:
> 19/03/25 20:04:28 ERROR utils.SparkYarnApp: Unknown YARN state RUNNING for 
> app application_1553542555261_0063 with final status SUCCEEDED.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (LIVY-750) Livy uploads local pyspark archives to Yarn distributed cache

2020-02-16 Thread shanyu zhao (Jira)
shanyu zhao created LIVY-750:


 Summary: Livy uploads local pyspark archives to Yarn distributed 
cache
 Key: LIVY-750
 URL: https://issues.apache.org/jira/browse/LIVY-750
 Project: Livy
  Issue Type: Bug
  Components: Server
Affects Versions: 0.7.0, 0.6.0
Reporter: shanyu zhao
 Attachments: image-2020-02-16-13-19-40-645.png, 
image-2020-02-16-13-19-59-591.png

On the Livy Server, even if we set the pyspark archives to use local files:
{code:bash}
export 
PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip
{code}

Livy still uploads these local pyspark archives to the Yarn distributed cache:
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,026 INFO 
yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> 
hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,392 INFO 
yarn.Client: Uploading resource file:/opt/spark/python/lib/py4j-0.10.7-src.zip 
-> 
hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/py4j-0.10.7-src.zip

Note that this is after we fixed Spark code in SPARK-30845 to not always upload 
local archives.

The root cause is that Livy adds the pyspark archives to "spark.submit.pyFiles", 
which Spark then adds to the Yarn distributed cache. Since spark-submit already 
takes care of uploading pyspark archives, there is no need for Livy to do so 
redundantly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (LIVY-749) Datanucleus jars are uploaded to hdfs unnecessarily when starting a livy session

2020-02-14 Thread shanyu zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/LIVY-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated LIVY-749:
-
Issue Type: Bug  (was: New Feature)

> Datanucleus jars are uploaded to hdfs unnecessarily when starting a livy 
> session
> 
>
> Key: LIVY-749
> URL: https://issues.apache.org/jira/browse/LIVY-749
> Project: Livy
>  Issue Type: Bug
>  Components: Server
>Affects Versions: 0.6.0, 0.7.0
>Reporter: shanyu zhao
>Priority: Major
>
> If we start any Livy session with hive support 
> (livy.repl.enable-hive-context=true), we see that 3 datanucleus jars are 
> uploaded to HDFS and downloaded to drivers/executors:
> Uploading resource file:/opt/spark/jars/datanucleus-api-jdo-3.2.6.jar -> 
> hdfs://namenode/user/test1/.sparkStaging/application_1581024490249_0002/datanucleus-api-jdo-3.2.6.jar
> ...
> These 3 datanucleus jars are not needed because they are already included in 
> the Spark 2.x jars folder.
> The reason is that in InteractiveSession.scala's mergeHiveSiteAndHiveDeps() 
> method, we merge the datanucleus jars into the spark.jars list via 
> datanucleusJars(). We should remove the datanucleusJars() function.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (LIVY-749) Datanucleus jars are uploaded to hdfs unnecessarily when starting a livy session

2020-02-14 Thread shanyu zhao (Jira)
shanyu zhao created LIVY-749:


 Summary: Datanucleus jars are uploaded to hdfs unnecessarily when 
starting a livy session
 Key: LIVY-749
 URL: https://issues.apache.org/jira/browse/LIVY-749
 Project: Livy
  Issue Type: New Feature
  Components: Server
Affects Versions: 0.7.0, 0.6.0
Reporter: shanyu zhao


If we start any Livy session with hive support 
(livy.repl.enable-hive-context=true), we see that 3 datanucleus jars are 
uploaded to HDFS and downloaded to drivers/executors:

Uploading resource file:/opt/spark/jars/datanucleus-api-jdo-3.2.6.jar -> 
hdfs://namenode/user/test1/.sparkStaging/application_1581024490249_0002/datanucleus-api-jdo-3.2.6.jar
...

These 3 datanucleus jars are not needed because they are already included in 
the Spark 2.x jars folder.

The reason is that in InteractiveSession.scala's mergeHiveSiteAndHiveDeps() 
method, we merge the datanucleus jars into the spark.jars list via 
datanucleusJars(). We should remove the datanucleusJars() function.
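
As a rough, hypothetical sketch of the proposed change (the helper name and the 
use of "spark.files" for hive-site.xml below are assumptions made for 
illustration; the real method in Livy is mergeHiveSiteAndHiveDeps()):
{code:java}
// Illustrative only: drop the datanucleusJars() merge, because Spark 2.x already
// ships datanucleus-*.jar in its own jars/ folder on every node.
def mergeHiveDeps(conf: Map[String, String], hiveSiteFile: Option[String]): Map[String, String] = {
  // Before: spark.jars was extended with datanucleusJars(...), which made the
  // YARN client upload /opt/spark/jars/datanucleus-*.jar to .sparkStaging.
  // After: only hive-site.xml (if present) still needs to be distributed.
  hiveSiteFile match {
    case Some(path) =>
      val files = conf.get("spark.files").toSeq.flatMap(_.split(",")).filter(_.nonEmpty) :+ path
      conf + ("spark.files" -> files.distinct.mkString(","))
    case None => conf
  }
}
{code}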






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (LIVY-718) Support multi-active high availability in Livy

2020-01-13 Thread shanyu zhao (Jira)


[ 
https://issues.apache.org/jira/browse/LIVY-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014723#comment-17014723
 ] 

shanyu zhao commented on LIVY-718:
--

[~jerryshao] the active-standby HA for the Livy server is meant to solve the 
problem of hardware/networking failures and upgrade scenarios on the active Livy 
server. When the active Livy server goes offline, the standby Livy server becomes 
active, reads the state from Zookeeper, and starts serving requests. This aims at 
high availability rather than scalability.

The active-active proposal in this PR seems to be geared more towards 
scalability. The designated-server proposal by [~yihengw] is simpler and more 
realistic to implement. As far as I know, the HiveServer2 HA also uses the 
designated-server approach. The stateless proposal by [~bikassaha] is more 
desirable but much harder to implement: there are many in-memory states, such as 
access times, that need to be moved to a persistent store, and some variables may 
need locks.

I think it is beneficial to first have active-standby HA (LIVY-11) checked in 
while this PR is being worked on, especially since it satisfies users who need 
HA rather than scalability.

> Support multi-active high availability in Livy
> --
>
> Key: LIVY-718
> URL: https://issues.apache.org/jira/browse/LIVY-718
> Project: Livy
>  Issue Type: Epic
>  Components: RSC, Server
>Reporter: Yiheng Wang
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In this JIRA we want to discuss how to implement multi-active high 
> availability in Livy.
> Currently, Livy only supports single node recovery. This is not sufficient in 
> some production environments. In our scenario, the Livy server serves many 
> notebook and JDBC services. We want to make Livy service more fault-tolerant 
> and scalable.
> There are already some proposals in the community for high availability, but 
> they are either incomplete or only cover active-standby high availability. So 
> we propose a multi-active high availability design to achieve the following 
> goals:
> # One or more servers will serve the client requests at the same time.
> # Sessions are allocated among different servers.
> # When one node crashes, the affected sessions will be moved to other active 
> services.
> Here's our design document, please review and comment:
> https://docs.google.com/document/d/1bD3qYZpw14_NuCcSGUOfqQ0pqvSbCQsOLFuZp26Ohjc/edit?usp=sharing
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (LIVY-743) Support Livy metrics based on dropwizard

2020-01-13 Thread shanyu zhao (Jira)
shanyu zhao created LIVY-743:


 Summary: Support Livy metrics based on dropwizard
 Key: LIVY-743
 URL: https://issues.apache.org/jira/browse/LIVY-743
 Project: Livy
  Issue Type: New Feature
  Components: Server
Affects Versions: 0.6.0
Reporter: shanyu zhao


The Livy server should support metrics, ideally based on Dropwizard (the library 
used by Spark). With this we could use a metrics sink to output session/batch 
metrics to InfluxDB for telemetry and diagnostics.
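
To make the idea concrete, here is a hedged sketch of what such wiring could look 
like with the Dropwizard (Codahale) metrics library; the metric names, the 
LivyMetrics object, and the console reporter are placeholders (an InfluxDB 
reporter would be registered the same way), not a committed design.
{code:java}
import java.util.concurrent.TimeUnit

import com.codahale.metrics.{ConsoleReporter, MetricRegistry}

// Placeholder metric names; a real integration would increment these counters
// from the session/batch lifecycle code in the Livy server.
object LivyMetrics {
  val registry = new MetricRegistry()
  val sessionsCreated = registry.counter("livy.server.sessions.created")
  val activeBatches = registry.counter("livy.server.batches.active")

  // Report every 30 seconds; swap ConsoleReporter for an InfluxDB reporter as needed.
  def startReporter(): ConsoleReporter = {
    val reporter = ConsoleReporter.forRegistry(registry)
      .convertRatesTo(TimeUnit.SECONDS)
      .convertDurationsTo(TimeUnit.MILLISECONDS)
      .build()
    reporter.start(30, TimeUnit.SECONDS)
    reporter
  }
}
{code}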



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (LIVY-718) Support multi-active high availability in Livy

2020-01-08 Thread shanyu zhao (Jira)


[ 
https://issues.apache.org/jira/browse/LIVY-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010965#comment-17010965
 ] 

shanyu zhao commented on LIVY-718:
--

LIVY-11 aims at active-standby HA; users should be able to choose between 
active-active and active-standby HA through configuration.

> Support multi-active high availability in Livy
> --
>
> Key: LIVY-718
> URL: https://issues.apache.org/jira/browse/LIVY-718
> Project: Livy
>  Issue Type: Epic
>  Components: RSC, Server
>Reporter: Yiheng Wang
>Priority: Major
>
> In this JIRA we want to discuss how to implement multi-active high 
> availability in Livy.
> Currently, Livy only supports single node recovery. This is not sufficient in 
> some production environments. In our scenario, the Livy server serves many 
> notebook and JDBC services. We want to make Livy service more fault-tolerant 
> and scalable.
> There are already some proposals in the community for high availability, but 
> they are either incomplete or only cover active-standby high availability. So 
> we propose a multi-active high availability design to achieve the following 
> goals:
> # One or more servers will serve the client requests at the same time.
> # Sessions are allocated among different servers.
> # When one node crashes, the affected sessions will be moved to other active 
> services.
> Here's our design document, please review and comment:
> https://docs.google.com/document/d/1bD3qYZpw14_NuCcSGUOfqQ0pqvSbCQsOLFuZp26Ohjc/edit?usp=sharing
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (LIVY-547) Livy kills session after livy.server.session.timeout even if the session is active

2019-07-31 Thread shanyu zhao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897494#comment-16897494
 ] 

shanyu zhao commented on LIVY-547:
--

Created a new PR and added a new configuration to change the behavior of the 
timeout check; the default value preserves backward compatibility.

# Whether or not to skip timeout check for a busy session
# livy.server.session.timeout-check.skip-busy = false
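
A minimal sketch of the intended behavior, assuming a hypothetical SessionLike 
trait (not Livy's real session class) and the configuration key quoted above:
{code:java}
// Hypothetical types for illustration; in Livy the real check lives in the
// session manager's garbage-collection loop.
trait SessionLike {
  def lastActivity: Long // epoch millis of the last client interaction
  def isBusy: Boolean    // e.g. statements still running
}

// With livy.server.session.timeout-check.skip-busy = true, a busy session is
// never reclaimed, even when its last activity is older than the timeout.
def shouldTimeOut(session: SessionLike, now: Long, timeoutMs: Long, skipBusy: Boolean): Boolean = {
  val idleTooLong = now - session.lastActivity > timeoutMs
  idleTooLong && !(skipBusy && session.isBusy)
}
{code}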

> Livy kills session after livy.server.session.timeout even if the session is 
> active
> --
>
> Key: LIVY-547
> URL: https://issues.apache.org/jira/browse/LIVY-547
> Project: Livy
>  Issue Type: Bug
>  Components: Server
>Reporter: Sandeep Nemuri
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Livy kills session after {{livy.server.session.timeout}} even if the session 
> is active.
> Code that runs for longer than {{livy.server.session.timeout}}, with 
> intermediate sleeps.
> {noformat}
> %pyspark 
> import time 
> import datetime 
> import random
> def inside(p):
>     x, y = random.random(), random.random()
>     return x*x + y*y < 1
> NUM_SAMPLES=10
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 100 s") 
> time.sleep(100) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 200 s") 
> time.sleep(200) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 300 s1") 
> time.sleep(300)
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 300 s2") 
> time.sleep(300) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 300 s3") 
> time.sleep(300) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 300 s4") 
> time.sleep(300) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> {noformat}
> Livy log:
> {noformat}
> 19/01/07 17:38:59 INFO InteractiveSession: Interactive session 14 created 
> [appid: application_1546711709239_0002, owner: zeppelin-hwc327, proxyUser: 
> Some(admin), state: idle, kind: shared, info: 
> {driverLogUrl=http://hwc327-node3.hogwarts-labs.com:8042/node/containerlogs/container_e18_1546711709239_0002_01_01/admin,
>  
> sparkUiUrl=http://hwc327-node2.hogwarts-labs.com:8088/proxy/application_1546711709239_0002/}]
> 19/01/07 17:52:46 INFO InteractiveSession: Stopping InteractiveSession 14...
> 19/01/07 17:52:56 WARN RSCClient: Exception while waiting for end session 
> reply.
> java.util.concurrent.TimeoutException
> at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49)
> at org.apache.livy.rsc.RSCClient.stop(RSCClient.java:223)
> at 
> org.apache.livy.server.interactive.InteractiveSession$$anonfun$stopSession$1.apply(InteractiveSession.scala:471)
> at 
> org.apache.livy.server.interactive.InteractiveSession$$anonfun$stopSession$1.apply(InteractiveSession.scala:471)
> at scala.Option.foreach(Option.scala:236)
> at 
> org.apache.livy.server.interactive.InteractiveSession.stopSession(InteractiveSession.scala:471)
> at 
> org.apache.livy.sessions.Session$$anonfun$stop$1.apply$mcV$sp(Session.scala:174)
> at 
> org.apache.livy.sessions.Session$$anonfun$stop$1.apply(Session.scala:171)
> at 
> org.apache.livy.sessions.Session$$anonfun$stop$1.apply(Session.scala:171)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at 
> scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
> at 
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 19/01/07 17:52:56 WARN RpcDispatcher: [ClientProtocol] Closing RPC channel 
> with 1 outstanding RPCs.
> 19/01/07 17:52:56 WARN Interacti

[jira] [Created] (LIVY-617) Livy session leak on Yarn when creating session duplicated names

2019-07-30 Thread shanyu zhao (JIRA)
shanyu zhao created LIVY-617:


 Summary: Livy session leak on Yarn when creating session 
duplicated names
 Key: LIVY-617
 URL: https://issues.apache.org/jira/browse/LIVY-617
 Project: Livy
  Issue Type: Bug
  Components: Server
Affects Versions: 0.6.0
Reporter: shanyu zhao


When running Livy on Yarn and trying to create a session with a duplicate name, 
the Livy server sends the response "Duplicate session name: xxx" to the client 
but it doesn't stop the session. The session creation fails; however, the Yarn 
application has already been started and keeps running forever.

This is because in Livy's session register method an IllegalArgumentException 
is thrown without stopping the session:
{code:java}
def register(session: S): S = {
  info(s"Registering new session ${session.id}")
  synchronized {
    session.name.foreach { sessionName =>
      if (sessionsByName.contains(sessionName)) {
        throw new IllegalArgumentException(s"Duplicate session name: ${session.name}")
      } else {
        sessionsByName.put(sessionName, session)
      }
    }
    sessions.put(session.id, session)
    session.start()
  }
  session
}{code}
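
One possible shape of a fix, sketched under the assumption that session.stop() 
releases whatever resources (for example the Yarn application) the session has 
already acquired; this only illustrates stopping the session before the 
duplicate-name exception is thrown, it is not a committed patch.
{code:java}
def register(session: S): S = {
  info(s"Registering new session ${session.id}")
  synchronized {
    session.name.foreach { sessionName =>
      if (sessionsByName.contains(sessionName)) {
        // Assumed cleanup hook: tear down the orphaned session (and its Yarn app)
        // before rejecting the duplicate name.
        session.stop()
        throw new IllegalArgumentException(s"Duplicate session name: ${session.name}")
      } else {
        sessionsByName.put(sessionName, session)
      }
    }
    sessions.put(session.id, session)
    session.start()
  }
  session
}
{code}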
 

Reproduction script:

curl -s -k -u username:password -X POST --data '{"name": "duplicatedname", 
"kind": "pyspark"}' -H "Content-Type: application/json" 
'https://myserver/livy/v1/sessions'



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (LIVY-505) sparkR.session failed with "invalid jobj 1" error in Spark 2.3

2019-03-27 Thread shanyu zhao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803315#comment-16803315
 ] 

shanyu zhao commented on LIVY-505:
--

Root cause:

SparkR uses the SparkR env variable ".sparkRjsc" as a singleton for the existing 
Spark context. However, in Livy's SparkRInterpreter.scala, we create the Spark 
context and assign it to the SparkR env variable ".sc":

sendRequest("""assign(".sc", 
SparkR:::callJStatic("org.apache.livy.repl.SparkRInterpreter", 
"getSparkContext"), envir = SparkR:::.sparkREnv)""")

This means that if we execute sparkR.session() in a SparkR notebook, SparkR will 
overwrite the SparkR env variable ".scStartTime" (which was set in Livy's 
SparkRInterpreter.scala) and render previously created jobjs invalid, because 
jobj$appId differs from the SparkR env variable ".scStartTime".

Please see the fix in LIVY-505.patch.
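
A hedged sketch of the idea (whether assigning ".sparkRjsc" alone is sufficient 
is an assumption; see LIVY-505.patch for the actual change): besides ".sc", also 
populate the singleton SparkR checks for an existing context, so that a later 
sparkR.session() call attaches to the Livy-created context instead of 
re-initializing and overwriting ".scStartTime".
{code:java}
// Illustrative only -- additional assignment in SparkRInterpreter.scala.
sendRequest("""assign(".sparkRjsc", get(".sc", envir = SparkR:::.sparkREnv),
  envir = SparkR:::.sparkREnv)""")
{code}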

> sparkR.session failed with "invalid jobj 1" error in Spark 2.3
> --
>
> Key: LIVY-505
> URL: https://issues.apache.org/jira/browse/LIVY-505
> Project: Livy
>  Issue Type: Bug
>  Components: Interpreter
>Affects Versions: 0.5.0, 0.5.1
>Reporter: shanyu zhao
>Priority: Major
> Attachments: LIVY-505.patch
>
>
> In a Spark 2.3 cluster, use Zeppelin with the livy2 interpreter and type:
> {code:java}
> %sparkr
> sparkR.session(){code}
> You will see error:
> [1] "Error in writeJobj(con, object): invalid jobj 1"
> In a successful case with older Livy and Spark versions, we see something 
> like this:
> Java ref type org.apache.spark.sql.SparkSession id 1
> This indicates the isValidJobj() function in Spark returned false for the 
> SparkSession object. This is the isValidJobj() function in Spark 2.3 code, FYI:
> {code:java}
> isValidJobj <- function(jobj) {
>   if (exists(".scStartTime", envir = .sparkREnv)) {
>     jobj$appId == get(".scStartTime", envir = .sparkREnv)
>   } else {
>     FALSE
>   }
> }{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (LIVY-505) sparkR.session failed with "invalid jobj 1" error in Spark 2.3

2019-03-27 Thread shanyu zhao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated LIVY-505:
-
Attachment: LIVY-505.patch

> sparkR.session failed with "invalid jobj 1" error in Spark 2.3
> --
>
> Key: LIVY-505
> URL: https://issues.apache.org/jira/browse/LIVY-505
> Project: Livy
>  Issue Type: Bug
>  Components: Interpreter
>Affects Versions: 0.5.0, 0.5.1
>Reporter: shanyu zhao
>Priority: Major
> Attachments: LIVY-505.patch
>
>
> In a Spark 2.3 cluster, use Zeppelin with the livy2 interpreter and type:
> {code:java}
> %sparkr
> sparkR.session(){code}
> You will see error:
> [1] "Error in writeJobj(con, object): invalid jobj 1"
> In a successful case with older Livy and Spark versions, we see something 
> like this:
> Java ref type org.apache.spark.sql.SparkSession id 1
> This indicates the isValidJobj() function in Spark returned false for the 
> SparkSession object. This is the isValidJobj() function in Spark 2.3 code, FYI:
> {code:java}
> isValidJobj <- function(jobj) {
>   if (exists(".scStartTime", envir = .sparkREnv)) {
>     jobj$appId == get(".scStartTime", envir = .sparkREnv)
>   } else {
>     FALSE
>   }
> }{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (LIVY-576) Unknown YARN state RUNNING for app with final status SUCCEEDED

2019-03-25 Thread shanyu zhao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated LIVY-576:
-
Attachment: LIVY-576.patch

> Unknown YARN state RUNNING for app with final status SUCCEEDED
> --
>
> Key: LIVY-576
> URL: https://issues.apache.org/jira/browse/LIVY-576
> Project: Livy
>  Issue Type: Bug
>  Components: Server
>Affects Versions: 0.5.0
>Reporter: shanyu zhao
>Priority: Major
> Attachments: LIVY-576.patch
>
>
> With Livy running Spark 2.3/2.4 on Yarn 2.9, there is a chance for Yarn to 
> return application reports with Yarn state RUNNING and final Yarn status 
> SUCCEEDED; this means the Yarn application is finishing up and about to 
> succeed. Livy's mapYarnState() method does not have a valid mapping for this 
> combination and therefore it renders the session 'dead'.
> I saw this in Livy server log:
> 19/03/25 20:04:28 ERROR utils.SparkYarnApp: Unknown YARN state RUNNING for 
> app application_1553542555261_0063 with final status SUCCEEDED.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (LIVY-576) Unknown YARN state RUNNING for app with final status SUCCEEDED

2019-03-25 Thread shanyu zhao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated LIVY-576:
-
Summary: Unknown YARN state RUNNING for app with final status SUCCEEDED  
(was: Unknown YARN state RUNNING FOR app with final status SUCCEEDED)

> Unknown YARN state RUNNING for app with final status SUCCEEDED
> --
>
> Key: LIVY-576
> URL: https://issues.apache.org/jira/browse/LIVY-576
> Project: Livy
>  Issue Type: Bug
>  Components: Server
>Affects Versions: 0.5.0
>Reporter: shanyu zhao
>Priority: Major
>
> With Livy running Spark 2.3/2.4 on Yarn 2.9, there is a chance for Yarn to 
> return application reports with Yarn state RUNNING and final Yarn status 
> SUCCEEDED; this means the Yarn application is finishing up and about to 
> succeed. Livy's mapYarnState() method does not have a valid mapping for this 
> combination and therefore it renders the session 'dead'.
> I saw this in Livy server log:
> 19/03/25 20:04:28 ERROR utils.SparkYarnApp: Unknown YARN state RUNNING for 
> app application_1553542555261_0063 with final status SUCCEEDED.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (LIVY-576) Unknown YARN state RUNNING FOR app with final status SUCCEEDED

2019-03-25 Thread shanyu zhao (JIRA)
shanyu zhao created LIVY-576:


 Summary: Unknown YARN state RUNNING FOR app with final status 
SUCCEEDED
 Key: LIVY-576
 URL: https://issues.apache.org/jira/browse/LIVY-576
 Project: Livy
  Issue Type: Bug
  Components: Server
Affects Versions: 0.5.0
Reporter: shanyu zhao


With Livy running Spark 2.3/2.4 on Yarn 2.9, there is a chance for Yarn to 
return application reports with Yarn state RUNNING and final Yarn status 
SUCCEEDED; this means the Yarn application is finishing up and about to 
succeed. Livy's mapYarnState() method does not have a valid mapping for this 
combination and therefore it renders the session 'dead'.

I saw this in Livy server log:

19/03/25 20:04:28 ERROR utils.SparkYarnApp: Unknown YARN state RUNNING for app 
application_1553542555261_0063 with final status SUCCEEDED.
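
A minimal sketch of the missing case, using an illustrative app-state type (not 
Livy's actual internal state type) and collapsing all other combinations for 
brevity; the point is simply that RUNNING with final status SUCCEEDED should map 
to a running application rather than an unknown state.
{code:java}
import org.apache.hadoop.yarn.api.records.{FinalApplicationStatus, YarnApplicationState}

// Illustrative state type, not Livy's real one.
sealed trait AppState
case object AppRunning extends AppState
case object AppFinished extends AppState
case object AppFailed extends AppState

def mapYarnState(state: YarnApplicationState, finalStatus: FinalApplicationStatus): AppState =
  (state, finalStatus) match {
    case (YarnApplicationState.RUNNING, FinalApplicationStatus.UNDEFINED) => AppRunning
    // The combination reported in this issue: the app is finishing up and about to succeed.
    case (YarnApplicationState.RUNNING, FinalApplicationStatus.SUCCEEDED) => AppRunning
    case (YarnApplicationState.FINISHED, FinalApplicationStatus.SUCCEEDED) => AppFinished
    case _ => AppFailed // all other combinations collapsed for this sketch
  }
{code}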



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (LIVY-547) Livy kills session after livy.server.session.timeout even if the session is active

2019-02-27 Thread shanyu zhao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780032#comment-16780032
 ] 

shanyu zhao commented on LIVY-547:
--

[~Jassy] are you still working on this issue?

> Livy kills session after livy.server.session.timeout even if the session is 
> active
> --
>
> Key: LIVY-547
> URL: https://issues.apache.org/jira/browse/LIVY-547
> Project: Livy
>  Issue Type: Bug
>  Components: Server
>Reporter: Sandeep Nemuri
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Livy kills session after {{livy.server.session.timeout}} even if the session 
> is active.
> Code that runs for longer than {{livy.server.session.timeout}}, with 
> intermediate sleeps.
> {noformat}
> %pyspark 
> import time 
> import datetime 
> import random
> def inside(p):
>     x, y = random.random(), random.random()
>     return x*x + y*y < 1
> NUM_SAMPLES=10
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 100 s") 
> time.sleep(100) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 200 s") 
> time.sleep(200) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 300 s1") 
> time.sleep(300)
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 300 s2") 
> time.sleep(300) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 300 s3") 
> time.sleep(300) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 300 s4") 
> time.sleep(300) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> {noformat}
> Livy log:
> {noformat}
> 19/01/07 17:38:59 INFO InteractiveSession: Interactive session 14 created 
> [appid: application_1546711709239_0002, owner: zeppelin-hwc327, proxyUser: 
> Some(admin), state: idle, kind: shared, info: 
> {driverLogUrl=http://hwc327-node3.hogwarts-labs.com:8042/node/containerlogs/container_e18_1546711709239_0002_01_01/admin,
>  
> sparkUiUrl=http://hwc327-node2.hogwarts-labs.com:8088/proxy/application_1546711709239_0002/}]
> 19/01/07 17:52:46 INFO InteractiveSession: Stopping InteractiveSession 14...
> 19/01/07 17:52:56 WARN RSCClient: Exception while waiting for end session 
> reply.
> java.util.concurrent.TimeoutException
> at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49)
> at org.apache.livy.rsc.RSCClient.stop(RSCClient.java:223)
> at 
> org.apache.livy.server.interactive.InteractiveSession$$anonfun$stopSession$1.apply(InteractiveSession.scala:471)
> at 
> org.apache.livy.server.interactive.InteractiveSession$$anonfun$stopSession$1.apply(InteractiveSession.scala:471)
> at scala.Option.foreach(Option.scala:236)
> at 
> org.apache.livy.server.interactive.InteractiveSession.stopSession(InteractiveSession.scala:471)
> at 
> org.apache.livy.sessions.Session$$anonfun$stop$1.apply$mcV$sp(Session.scala:174)
> at 
> org.apache.livy.sessions.Session$$anonfun$stop$1.apply(Session.scala:171)
> at 
> org.apache.livy.sessions.Session$$anonfun$stop$1.apply(Session.scala:171)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at 
> scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
> at 
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 19/01/07 17:52:56 WARN RpcDispatcher: [ClientProtocol] Closing RPC channel 
> with 1 outstanding RPCs.
> 19/01/07 17:52:56 WARN InteractiveSession: Failed to stop RSCDriver. Killing 
> it...
> 19/01/07 17:52:56 INFO YarnClientImpl: Killed application 
> application_1546711709239_0002
> 19/01/07 17:52:56 INFO InteractiveSession: Stopped 

[jira] [Commented] (LIVY-266) Livy sessions/batches are not secured. Any user can stop another user session/batch

2018-11-29 Thread shanyu zhao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16704312#comment-16704312
 ] 

shanyu zhao commented on LIVY-266:
--

[~tc0312] are you saying that all the requests are actually using SPNEGO with 
the identity "prabhu", and therefore it all works fine? If a different user tries 
to post statements or kill sessions owned by user "prabhu", will Livy deny that 
request?

If Knox is used to access Livy in a kerberized cluster, and the Knox user is 
configured in "livy.superusers", then the proxyUser field is enforced for the 
POST /sessions request. However, a POST statements request to any session sent 
from the Knox server to the Livy server will always succeed, because the caller 
identity is the Knox user, not the end user. How does Livy find out who is 
making the request?

> Livy sessions/batches are not secured. Any user can stop another user 
> session/batch
> ---
>
> Key: LIVY-266
> URL: https://issues.apache.org/jira/browse/LIVY-266
> Project: Livy
>  Issue Type: Task
>  Components: Core
>Affects Versions: 0.3
>Reporter: Prabhu Kasinathan
>Priority: Major
>
> Dev,
> Livy sessions and batches are not currently secured, i.e. User A can start a 
> session or batch and User B can submit code to a session started by User A or 
> even stop that session. This is a critical issue on a secured cluster: when 
> User A has access to sensitive data, there is a chance that User B can access 
> those sensitive datasets through User A's session.
> Here, is an example from our secured cluster.
> # Starting session from user "prabhu"
> curl --silent --negotiate -u:prabhu localhost:8998/sessions -X POST -H 
> 'Content-Type: application/json' -d '{
>   "kind":"scala",
>   "proxyUser":"prabhu",
>   "name":"Testing"
> }' | python -m json.tool
> {
> "id": 371,
> "appId": null,
> "owner": "prabhu",
> "proxyUser": "prabhu",
> "state": "starting",
> "kind": "spark",
> "appInfo": {
> "driverLogUrl": null,
> "sparkUiUrl": null
> },
> "log": []
> }
> # Executing code to above session by some other user "don"
> curl --silent --negotiate -u:don localhost:8998/sessions/371/statements -X 
> POST -H 'Content-Type: application/json' -d '{
>   "code":"sc.applicationId"
> }' | python -m json.tool
> {
> "id": 0,
> "state": "available",
> "output": {
> "status": "ok",
> "execution_count": 0,
> "data": {
> "text/plain": "res0: String = application_1476926173701_398436"
> }
> }
> }
> # Stopping above session by different user "john" this time
> curl --silent --negotiate -u:john localhost:8998/sessions/371 -X DELETE | 
> python -m json.tool
> {
> "msg": "deleted"
> }



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (LIVY-231) Session recovery for batch sessions with multiple Livy servers

2018-11-14 Thread shanyu zhao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687299#comment-16687299
 ] 

shanyu zhao commented on LIVY-231:
--

Any update on this feature? Or should we implement an active-standby multi-node 
HA mode?

> Session recovery for batch sessions with multiple Livy servers
> --
>
> Key: LIVY-231
> URL: https://issues.apache.org/jira/browse/LIVY-231
> Project: Livy
>  Issue Type: New Feature
>  Components: Batch, Core, RSC, Server
>Affects Versions: 0.2
>Reporter: Meisam
>Priority: Minor
>
> HA with one Livy server node means that Livy will be unavailable during the 
> recovery. Having more than one node makes Livy available even if one of the 
> server nodes goes down.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (LIVY-505) sparkR.session failed with "invalid jobj 1" error in Spark 2.3

2018-08-28 Thread shanyu zhao (JIRA)
shanyu zhao created LIVY-505:


 Summary: sparkR.session failed with "invalid jobj 1" error in 
Spark 2.3
 Key: LIVY-505
 URL: https://issues.apache.org/jira/browse/LIVY-505
 Project: Livy
  Issue Type: Bug
  Components: Interpreter
Affects Versions: 0.5.0, 0.5.1
Reporter: shanyu zhao


In a Spark 2.3 cluster, use Zeppelin with the livy2 interpreter and type:
{code:java}
%sparkr
sparkR.session(){code}
You will see error:

[1] "Error in writeJobj(con, object): invalid jobj 1"

In a successful case with older Livy and Spark versions, we see something like 
this:

Java ref type org.apache.spark.sql.SparkSession id 1

This indicates the isValidJobj() function in Spark returned false for the 
SparkSession object. This is the isValidJobj() function in Spark 2.3 code, FYI:
{code:java}
isValidJobj <- function(jobj) {
  if (exists(".scStartTime", envir = .sparkREnv)) {
    jobj$appId == get(".scStartTime", envir = .sparkREnv)
  } else {
    FALSE
  }
}{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (LIVY-501) Support Spark 2.3 SparkR backend

2018-08-27 Thread shanyu zhao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594481#comment-16594481
 ] 

shanyu zhao commented on LIVY-501:
--

Thanks [~jerryshao]. Where does Livy document its compatibility with Spark 
versions? I checked the release notes of Livy 0.5, which didn't say which 
versions of Spark it supports.

In [https://github.com/apache/incubator-livy] it is mentioned that:

"Livy requires at least Spark 1.6 and supports both Scala 2.10 and 2.11 builds 
of Spark, Livy will automatically pick repl dependencies through detecting the 
Scala version of Spark.

Livy also supports Spark 2.0+ for both interactive and batch submission, you 
could seamlessly switch to different versions of Spark through {{SPARK_HOME}} 
configuration, without needing to rebuild Livy."

> Support Spark 2.3 SparkR backend
> 
>
> Key: LIVY-501
> URL: https://issues.apache.org/jira/browse/LIVY-501
> Project: Livy
>  Issue Type: Bug
>  Components: Interpreter
>Affects Versions: 0.5.0, 0.5.1
>Reporter: shanyu zhao
>Priority: Major
> Attachments: LIVY-501.patch
>
>
> Spark 2.3 modified the RBackend.init() method to return a Tuple2[Int, 
> RAuthHelper] instead of just an Int.
> [https://github.com/apache/spark/commit/628c7b517969c4a7ccb26ea67ab3dd61266073ca]
> Livy does not work with this version of RBackend because it still expects a 
> single Int in the return value from init().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (LIVY-501) Support Spark 2.3 SparkR backend

2018-08-27 Thread shanyu zhao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594336#comment-16594336
 ] 

shanyu zhao commented on LIVY-501:
--

Spark 2.3 works except for SparkR. I just attached the patch on top of 
branch-0.5.

> Support Spark 2.3 SparkR backend
> 
>
> Key: LIVY-501
> URL: https://issues.apache.org/jira/browse/LIVY-501
> Project: Livy
>  Issue Type: Bug
>  Components: Interpreter
>Affects Versions: 0.5.0, 0.5.1
>Reporter: shanyu zhao
>Priority: Major
> Attachments: LIVY-501.patch
>
>
> Spark 2.3 modified the RBackend.init() method to return a Tuple2[Int, 
> RAuthHelper] instead of just an Int.
> [https://github.com/apache/spark/commit/628c7b517969c4a7ccb26ea67ab3dd61266073ca]
> Livy does not work with this version of RBackend because it still expects a 
> single Int in the return value from init().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (LIVY-501) Support Spark 2.3 SparkR backend

2018-08-27 Thread shanyu zhao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated LIVY-501:
-
Attachment: LIVY-501.patch

> Support Spark 2.3 SparkR backend
> 
>
> Key: LIVY-501
> URL: https://issues.apache.org/jira/browse/LIVY-501
> Project: Livy
>  Issue Type: Bug
>  Components: Interpreter
>Affects Versions: 0.5.0, 0.5.1
>Reporter: shanyu zhao
>Priority: Major
> Attachments: LIVY-501.patch
>
>
> Spark 2.3 modified the RBackend.init() method to return a Tuple2[Int, 
> RAuthHelper] instead of just an Int.
> [https://github.com/apache/spark/commit/628c7b517969c4a7ccb26ea67ab3dd61266073ca]
> Livy does not work with this version of RBackend because it still expects a 
> single Int in the return value from init().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (LIVY-501) Support Spark 2.3 SparkR backend

2018-08-27 Thread shanyu zhao (JIRA)
shanyu zhao created LIVY-501:


 Summary: Support Spark 2.3 SparkR backend
 Key: LIVY-501
 URL: https://issues.apache.org/jira/browse/LIVY-501
 Project: Livy
  Issue Type: Bug
  Components: Interpreter
Affects Versions: 0.5.0, 0.5.1
Reporter: shanyu zhao


Spark 2.3 modified the RBackend.init() method to return a Tuple2[Int, 
RAuthHelper] instead of just an Int.

[https://github.com/apache/spark/commit/628c7b517969c4a7ccb26ea67ab3dd61266073ca]

Livy does not work with this version of RBackend because it still expects a 
single Int in the return value from init().
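
For illustration, one way (not necessarily the attached LIVY-501.patch) to 
tolerate both init() signatures is to inspect the return value reflectively; the 
helper below assumes init() is reachable through reflection.
{code:java}
// Sketch only: handle Int (Spark <= 2.2) and (Int, RAuthHelper) (Spark 2.3+).
def rBackendInitPort(backend: AnyRef): Int = {
  val result = backend.getClass.getMethod("init").invoke(backend)
  result match {
    case port: java.lang.Integer => port.intValue()            // pre-2.3: just the port
    case tuple: Product2[_, _]   => tuple._1.asInstanceOf[Int] // 2.3+: (port, RAuthHelper)
    case other =>
      throw new IllegalStateException(s"Unexpected RBackend.init() result: $other")
  }
}
{code}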

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)