[jira] [Updated] (LIVY-750) Livy uploads local pyspark archives to Yarn distributed cache
[ https://issues.apache.org/jira/browse/LIVY-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shanyu zhao updated LIVY-750: - Description: On Livy Server, even if we set pyspark archives to use local files: {code:bash} export PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip {code} Livy still uploads these local pyspark archives to Yarn distributed cache: 20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,026 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip 20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,392 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/py4j-0.10.7-src.zip Note that this is after we fixed Spark code in SPARK-30845 to not always upload local archives. The root cause is that Livy adds pyspark archives to "spark.submit.pyFiles", which will be added to Yarn distributed cache by Spark. Since spark-submit already takes care of finding and uploading pyspark archives if they are not local, there is no need for Livy to redundantly do so. was: On Livy Server, even if we set pyspark archives to use local files: {code:bash} export PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip {code} Livy still uploads these local pyspark archives to Yarn distributed cache: 20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,026 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip 20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,392 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/py4j-0.10.7-src.zip Note that this is after we fixed Spark code in SPARK-30845 to not always upload local archives. The root cause is that Livy adds pyspark archives to "spark.submit.pyFiles", which will be added to Yarn distributed cache by Spark. Since spark-submit already takes care of uploading pyspark archives, there is no need for Livy to redundantly do so.
> Livy uploads local pyspark archives to Yarn distributed cache > - > > Key: LIVY-750 > URL: https://issues.apache.org/jira/browse/LIVY-750 > Project: Livy > Issue Type: Bug > Components: Server >Affects Versions: 0.6.0, 0.7.0 >Reporter: shanyu zhao >Priority: Major > Attachments: image-2020-02-16-13-19-40-645.png, > image-2020-02-16-13-19-59-591.png > > Time Spent: 10m > Remaining Estimate: 0h > > On Livy Server, even if we set pyspark archives to use local files: > {code:bash} > export > PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip > {code} > Livy still uploads these local pyspark archives to Yarn distributed cache: > 20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,026 INFO > yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> > hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip > 20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,392 INFO > yarn.Client: Uploading resource > file:/opt/spark/python/lib/py4j-0.10.7-src.zip -> > hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/py4j-0.10.7-src.zip > Note that this is after we fixed Spark code in SPARK-30845 to not always > upload local archives. > The root cause is that Livy adds pyspark archives to "spark.submit.pyFiles", > which will be added to Yarn distributed cache by Spark. Since spark-submit > already takes care of finding and uploading pyspark archives if they are not > local, there is no need for Livy to redundantly do so. -- This message was sent by Atlassian Jira (v8.3.4#803005)
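For illustration only, here is a minimal Scala sketch of the behavior this issue asks for: leave "spark.submit.pyFiles" untouched when the pyspark archives are local: URIs, so spark-submit resolves them itself instead of Livy forcing an upload to the Yarn distributed cache. This is not the actual Livy patch; the object and method names below are placeholders.
{code:scala}
// Hypothetical helper, not the committed fix. It assumes Livy receives the pyspark
// archive list (e.g. from PYSPARK_ARCHIVES_PATH) and the Spark conf as plain collections.
object PySparkArchiveFilter {
  private val PyFilesKey = "spark.submit.pyFiles"

  def mergePyFiles(conf: Map[String, String], pySparkArchives: Seq[String]): Map[String, String] = {
    // Archives on a local: URI are already present on every node, and spark-submit can
    // locate non-local pyspark archives on its own, so only genuinely extra files are merged.
    val needsUpload = pySparkArchives.filterNot(_.startsWith("local:"))
    if (needsUpload.isEmpty) {
      conf // nothing added to spark.submit.pyFiles -> nothing staged on HDFS
    } else {
      val existing = conf.get(PyFilesKey).toSeq.flatMap(_.split(",")).filter(_.nonEmpty)
      conf + (PyFilesKey -> (existing ++ needsUpload).mkString(","))
    }
  }
}
{code}
With this sketch, mergePyFiles(Map.empty, Seq("local:/opt/spark/python/lib/pyspark.zip", "local:/opt/spark/python/lib/py4j-0.10.7-src.zip")) returns the conf unchanged, which matches the expectation in the description above.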
[jira] [Commented] (LIVY-576) Unknown YARN state RUNNING for app with final status SUCCEEDED
[ https://issues.apache.org/jira/browse/LIVY-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043920#comment-17043920 ] shanyu zhao commented on LIVY-576: -- Thanks [~andrasbeni], yes LIVY-642 is basically a duplicate of this JIRA with exactly the same fix. We can close this JIRA now that LIVY-642 is checked in. On the other hand, [~jerryshao], maybe we should be inspecting the JIRAs more often. This JIRA has been sitting there for almost 1 year now without anyone reviewing it. And a duplicate JIRA was reviewed and checked in after about 6 months. > Unknown YARN state RUNNING for app with final status SUCCEEDED > -- > > Key: LIVY-576 > URL: https://issues.apache.org/jira/browse/LIVY-576 > Project: Livy > Issue Type: Bug > Components: Server >Affects Versions: 0.5.0 >Reporter: shanyu zhao >Priority: Major > Attachments: LIVY-576.patch > > > With Livy running Spark 2.3/2.4 on Yarn 2.9, there is a chance for Yarn to return > application reports with Yarn state RUNNING and final Yarn status SUCCEEDED, > which means the Yarn application is finishing up and about to be successful. > Livy's mapYarnState() method does not have a valid mapping for this > combination and therefore it renders the session 'dead'. > I saw this in the Livy server log: > 19/03/25 20:04:28 ERROR utils.SparkYarnApp: Unknown YARN state RUNNING for > app application_1553542555261_0063 with final status SUCCEEDED. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (LIVY-750) Livy uploads local pyspark archives to Yarn distributed cache
shanyu zhao created LIVY-750: Summary: Livy uploads local pyspark archives to Yarn distributed cache Key: LIVY-750 URL: https://issues.apache.org/jira/browse/LIVY-750 Project: Livy Issue Type: Bug Components: Server Affects Versions: 0.7.0, 0.6.0 Reporter: shanyu zhao Attachments: image-2020-02-16-13-19-40-645.png, image-2020-02-16-13-19-59-591.png On Livy Server, even if we set pyspark archives to use local files: {code:bash} export PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip {code} Livy still uploads these local pyspark archives to Yarn distributed cache: 20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,026 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip 20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,392 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/py4j-0.10.7-src.zip Note that this is after we fixed Spark code in SPARK-30845 to not always upload local archives. The root cause is that Livy adds pyspark archives to "spark.submit.pyFiles", which will be added to Yarn distributed cache by Spark. Since spark-submit already takes care of uploading pyspark archives, there is no need for Livy to redundantly do so. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (LIVY-749) Datanucleus jars are uploaded to hdfs unnecessarily when starting a livy session
[ https://issues.apache.org/jira/browse/LIVY-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shanyu zhao updated LIVY-749: - Issue Type: Bug (was: New Feature) > Datanucleus jars are uploaded to hdfs unnecessarily when starting a livy > session > > > Key: LIVY-749 > URL: https://issues.apache.org/jira/browse/LIVY-749 > Project: Livy > Issue Type: Bug > Components: Server >Affects Versions: 0.6.0, 0.7.0 >Reporter: shanyu zhao >Priority: Major > > If we start any Livy session with hive support > (livy.repl.enable-hive-context=true), we see that 3 datanucleus jars are > uploaded to HDFS and downloaded to drivers/executors: > Uploading resource file:/opt/spark/jars/datanucleus-api-jdo-3.2.6.jar -> > hdfs://namenode/user/test1/.sparkStaging/application_1581024490249_0002/datanucleus-api-jdo-3.2.6.jar > ... > These 3 datanucleus jars are not needed because they are already included in > the Spark 2.x jars folder. > The reason is that in InteractiveSession.scala, the method > mergeHiveSiteAndHiveDeps() merges datanucleus jars into the spark.jars list > with the datanucleusJars() method. We should remove the datanucleusJars() function. -- This message was sent by Atlassian Jira (v8.3.4#803005)
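As a hedged illustration of the intended end state (the JIRA itself simply proposes deleting the datanucleusJars() helper), here is a Scala sketch with placeholder names; it is not the actual InteractiveSession.scala code.
{code:scala}
object HiveDepsSketch {
  // Spark 2.x already ships the datanucleus-* jars under $SPARK_HOME/jars, so they
  // should never be appended to spark.jars by Livy (per the report above).
  private def isBundledWithSpark(jar: String): Boolean = jar.contains("datanucleus-")

  def mergeHiveDeps(conf: Map[String, String], candidateJars: Seq[String]): Map[String, String] = {
    val extraJars = candidateJars.filterNot(isBundledWithSpark)
    if (extraJars.isEmpty) conf // nothing merged -> nothing uploaded to .sparkStaging
    else {
      val existing = conf.get("spark.jars").toSeq.flatMap(_.split(",")).filter(_.nonEmpty)
      conf + ("spark.jars" -> (existing ++ extraJars).mkString(","))
    }
  }
}
{code}
Removing the datanucleusJars() call entirely, as proposed in the description, achieves the same effect with less code.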
[jira] [Created] (LIVY-749) Datanucleus jars are uploaded to hdfs unnecessarily when starting a livy session
shanyu zhao created LIVY-749: Summary: Datanucleus jars are uploaded to hdfs unnecessarily when starting a livy session Key: LIVY-749 URL: https://issues.apache.org/jira/browse/LIVY-749 Project: Livy Issue Type: New Feature Components: Server Affects Versions: 0.7.0, 0.6.0 Reporter: shanyu zhao If we start any Livy session with hive support (livy.repl.enable-hive-context=true), we see that 3 datanucleus jars are uploaded to HDFS and downloaded to drivers/executors: Uploading resource file:/opt/spark/jars/datanucleus-api-jdo-3.2.6.jar -> hdfs://namenode/user/test1/.sparkStaging/application_1581024490249_0002/datanucleus-api-jdo-3.2.6.jar ... These 3 datanucleus jars are not needed because they are already included in the Spark 2.x jars folder. The reason is that in InteractiveSession.scala, the method mergeHiveSiteAndHiveDeps() merges datanucleus jars into the spark.jars list with the datanucleusJars() method. We should remove the datanucleusJars() function. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (LIVY-718) Support multi-active high availability in Livy
[ https://issues.apache.org/jira/browse/LIVY-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014723#comment-17014723 ] shanyu zhao commented on LIVY-718: -- [~jerryshao] the active-standby HA for Livy server is to solve the problem of hardware/networking failures and upgrade scenarios on the active Livy server. When the active Livy server is offline, the standby Livy server becomes active, reads the states from Zookeeper, and starts to serve requests. This aims at High Availability rather than scalability. The active-active proposal in this PR seems to be more geared towards scalability. The designated server proposal by [~yihengw] is simpler and more realistic to implement. As far as I know, the HiveServer2 HA is also using the designated server approach. The stateless proposal by [~bikassaha] is more desirable but much harder to implement. There are many in-memory states, like access times, that need to be moved to a persistent store, and some variables may need locks. I think it is beneficial to first have active-standby HA (LIVY-11) checked in while this PR is being worked on, especially since it satisfies users who need HA rather than scalability. > Support multi-active high availability in Livy > -- > > Key: LIVY-718 > URL: https://issues.apache.org/jira/browse/LIVY-718 > Project: Livy > Issue Type: Epic > Components: RSC, Server >Reporter: Yiheng Wang >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In this JIRA we want to discuss how to implement multi-active high > availability in Livy. > Currently, Livy only supports single node recovery. This is not sufficient in > some production environments. In our scenario, the Livy server serves many > notebook and JDBC services. We want to make Livy service more fault-tolerant > and scalable. > There're already some proposals in the community for high availability. But > they're not so complete or just for active-standby high availability. So we > propose a multi-active high availability design to achieve the following > goals: > # One or more servers will serve the client requests at the same time. > # Sessions are allocated among different servers. > # When one node crashes, the affected sessions will be moved to other active > services. > Here's our design document, please review and comment: > https://docs.google.com/document/d/1bD3qYZpw14_NuCcSGUOfqQ0pqvSbCQsOLFuZp26Ohjc/edit?usp=sharing > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (LIVY-743) Support Livy metrics based on dropwizard
shanyu zhao created LIVY-743: Summary: Support Livy metrics based on dropwizard Key: LIVY-743 URL: https://issues.apache.org/jira/browse/LIVY-743 Project: Livy Issue Type: New Feature Components: Server Affects Versions: 0.6.0 Reporter: shanyu zhao Livy server should support metrics, ideally based on dropwizard (the one used by Spark). With this we could use a metrics sink to output session/batch metrics to InfluxDB for telemetry and diagnostics. -- This message was sent by Atlassian Jira (v8.3.4#803005)
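A hedged sketch of what dropwizard-based metrics could look like on the Livy server side; none of this is an existing Livy API, and the metric names are made up. It assumes io.dropwizard.metrics:metrics-core is on the classpath (Spark bundles the same library).
{code:scala}
import java.util.concurrent.TimeUnit
import com.codahale.metrics.{ConsoleReporter, MetricRegistry}

object LivyMetricsSketch {
  val registry = new MetricRegistry()

  // Hypothetical instruments a Livy server might expose.
  val activeSessions = registry.counter("livy.server.sessions.active")
  val batchesSubmitted = registry.meter("livy.server.batches.submitted")

  def main(args: Array[String]): Unit = {
    activeSessions.inc()
    batchesSubmitted.mark()
    // A real deployment would configure a Graphite/InfluxDB reporter as the sink instead
    // of printing to the console; ConsoleReporter keeps this sketch self-contained.
    val reporter = ConsoleReporter.forRegistry(registry)
      .convertRatesTo(TimeUnit.SECONDS)
      .convertDurationsTo(TimeUnit.MILLISECONDS)
      .build()
    reporter.report()
  }
}
{code}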
[jira] [Commented] (LIVY-718) Support multi-active high availability in Livy
[ https://issues.apache.org/jira/browse/LIVY-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010965#comment-17010965 ] shanyu zhao commented on LIVY-718: -- LIVY-11 aims at active-standby HA; users should be able to choose between active-active and active-standby HA through configuration. > Support multi-active high availability in Livy > -- > > Key: LIVY-718 > URL: https://issues.apache.org/jira/browse/LIVY-718 > Project: Livy > Issue Type: Epic > Components: RSC, Server >Reporter: Yiheng Wang >Priority: Major > > In this JIRA we want to discuss how to implement multi-active high > availability in Livy. > Currently, Livy only supports single node recovery. This is not sufficient in > some production environments. In our scenario, the Livy server serves many > notebook and JDBC services. We want to make Livy service more fault-tolerant > and scalable. > There're already some proposals in the community for high availability. But > they're not so complete or just for active-standby high availability. So we > propose a multi-active high availability design to achieve the following > goals: > # One or more servers will serve the client requests at the same time. > # Sessions are allocated among different servers. > # When one node crashes, the affected sessions will be moved to other active > services. > Here's our design document, please review and comment: > https://docs.google.com/document/d/1bD3qYZpw14_NuCcSGUOfqQ0pqvSbCQsOLFuZp26Ohjc/edit?usp=sharing > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (LIVY-547) Livy kills session after livy.server.session.timeout even if the session is active
[ https://issues.apache.org/jira/browse/LIVY-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897494#comment-16897494 ] shanyu zhao commented on LIVY-547: -- Created a new PR and added a new configuration to change the behavior of the timeout check; the default value keeps backward compatibility. # Whether or not to skip the timeout check for a busy session # livy.server.session.timeout-check.skip-busy = false > Livy kills session after livy.server.session.timeout even if the session is > active > -- > > Key: LIVY-547 > URL: https://issues.apache.org/jira/browse/LIVY-547 > Project: Livy > Issue Type: Bug > Components: Server >Reporter: Sandeep Nemuri >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Livy kills session after {{livy.server.session.timeout}} even if the session > is active. > Code that runs more than the {{livy.server.session.timeout}} with > intermediate sleeps. > {noformat} > %pyspark > import time > import datetime > import random > def inside(p): > x, y = random.random(), random.random() > return x*x + y*y < 1 > NUM_SAMPLES=10 > count = sc.parallelize(xrange(0, NUM_SAMPLES)) \ > .filter(inside).count() > print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES) > print("waiting for 100 s") > time.sleep(100) > count = sc.parallelize(xrange(0, NUM_SAMPLES)) \ > .filter(inside).count() > print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES) > print("waiting for 200 s") > time.sleep(200) > count = sc.parallelize(xrange(0, NUM_SAMPLES)) \ > .filter(inside).count() > print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES) > print("waiting for 300 s1") > time.sleep(300) > count = sc.parallelize(xrange(0, NUM_SAMPLES)) \ > .filter(inside).count() > print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES) > print("waiting for 300 s2") > time.sleep(300) > count = sc.parallelize(xrange(0, NUM_SAMPLES)) \ > .filter(inside).count() > print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES) > print("waiting for 300 s3") > time.sleep(300) > count = sc.parallelize(xrange(0, NUM_SAMPLES)) \ > .filter(inside).count() > print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES) > print("waiting for 300 s4") > time.sleep(300) > count = sc.parallelize(xrange(0, NUM_SAMPLES)) \ > .filter(inside).count() > print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES) > {noformat} > Livy log: > {noformat} > 19/01/07 17:38:59 INFO InteractiveSession: Interactive session 14 created > [appid: application_1546711709239_0002, owner: zeppelin-hwc327, proxyUser: > Some(admin), state: idle, kind: shared, info: > {driverLogUrl=http://hwc327-node3.hogwarts-labs.com:8042/node/containerlogs/container_e18_1546711709239_0002_01_01/admin, > > sparkUiUrl=http://hwc327-node2.hogwarts-labs.com:8088/proxy/application_1546711709239_0002/}] > 19/01/07 17:52:46 INFO InteractiveSession: Stopping InteractiveSession 14... > 19/01/07 17:52:56 WARN RSCClient: Exception while waiting for end session > reply.
> java.util.concurrent.TimeoutException > at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) > at org.apache.livy.rsc.RSCClient.stop(RSCClient.java:223) > at > org.apache.livy.server.interactive.InteractiveSession$$anonfun$stopSession$1.apply(InteractiveSession.scala:471) > at > org.apache.livy.server.interactive.InteractiveSession$$anonfun$stopSession$1.apply(InteractiveSession.scala:471) > at scala.Option.foreach(Option.scala:236) > at > org.apache.livy.server.interactive.InteractiveSession.stopSession(InteractiveSession.scala:471) > at > org.apache.livy.sessions.Session$$anonfun$stop$1.apply$mcV$sp(Session.scala:174) > at > org.apache.livy.sessions.Session$$anonfun$stop$1.apply(Session.scala:171) > at > org.apache.livy.sessions.Session$$anonfun$stop$1.apply(Session.scala:171) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at > scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107) > at > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > 19/01/07 17:52:56 WARN RpcDispatcher: [ClientProtocol] Closing RPC channel > with 1 outstanding RPCs. > 19/01/07 17:52:56 WARN Interacti
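To make the proposed livy.server.session.timeout-check.skip-busy behavior concrete, here is a minimal self-contained sketch in Scala (not the merged Livy code; the class names and the `busy` flag are placeholders).
{code:scala}
case class SessionInfo(id: Int, busy: Boolean, lastActivityNanos: Long)

class TimeoutChecker(timeoutNanos: Long, skipBusy: Boolean) {
  // With skipBusy = false (the default) the behavior is unchanged: any session whose
  // last activity is older than the timeout is garbage-collected, busy or not.
  // With skipBusy = true, a session that is still executing statements is never reaped.
  def shouldGc(s: SessionInfo, nowNanos: Long): Boolean = {
    val expired = nowNanos - s.lastActivityNanos > timeoutNanos
    expired && !(skipBusy && s.busy)
  }
}
{code}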
[jira] [Created] (LIVY-617) Livy session leak on Yarn when creating sessions with duplicated names
shanyu zhao created LIVY-617: Summary: Livy session leak on Yarn when creating sessions with duplicated names Key: LIVY-617 URL: https://issues.apache.org/jira/browse/LIVY-617 Project: Livy Issue Type: Bug Components: Server Affects Versions: 0.6.0 Reporter: shanyu zhao When running Livy on Yarn and trying to create a session with a duplicated name, the Livy server sends the response "Duplicate session name: xxx" to the client but it doesn't stop the session. The session creation fails; however, the Yarn application gets started and keeps running forever. This is because in the Livy session register method, an "IllegalArgumentException" is thrown without stopping the session: {code:java}
def register(session: S): S = {
  info(s"Registering new session ${session.id}")
  synchronized {
    session.name.foreach { sessionName =>
      if (sessionsByName.contains(sessionName)) {
        throw new IllegalArgumentException(s"Duplicate session name: ${session.name}")
      } else {
        sessionsByName.put(sessionName, session)
      }
    }
    sessions.put(session.id, session)
    session.start()
  }
  session
}{code} Reproduction script: curl -s -k -u username:password -X POST --data '{"name": "duplicatedname", "kind": "pyspark"}' -H "Content-Type: application/json" 'https://myserver/livy/v1/sessions' -- This message was sent by Atlassian JIRA (v7.6.14#76016)
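One possible shape of a fix, sketched here for illustration (it is not the committed patch): stop the freshly created session before rejecting the duplicate name, so no orphaned Yarn application is left running. `Session`, `sessions`, and `sessionsByName` are simplified stand-ins for the fields quoted above.
{code:scala}
import scala.collection.mutable

trait Session {
  def id: Int
  def name: Option[String]
  def start(): Unit
  def stop(): Unit
}

class SessionManagerSketch[S <: Session] {
  private val sessions = mutable.Map[Int, S]()
  private val sessionsByName = mutable.Map[String, S]()

  def register(session: S): S = {
    synchronized {
      session.name.foreach { sessionName =>
        if (sessionsByName.contains(sessionName)) {
          // Stop the session first; otherwise its Yarn application keeps running forever.
          session.stop()
          throw new IllegalArgumentException(s"Duplicate session name: $sessionName")
        } else {
          sessionsByName.put(sessionName, session)
        }
      }
      sessions.put(session.id, session)
      session.start()
    }
    session
  }
}
{code}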
[jira] [Commented] (LIVY-505) sparkR.session failed with "invalid jobj 1" error in Spark 2.3
[ https://issues.apache.org/jira/browse/LIVY-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803315#comment-16803315 ] shanyu zhao commented on LIVY-505: -- Root cause: SparkR uses the SparkR env variable ".sparkRjsc" as a singleton for the existing spark context. However, in Livy's SparkRInterpreter.scala, we created the spark context and assigned it to the SparkR env variable ".sc": sendRequest("""assign(".sc", SparkR:::callJStatic("org.apache.livy.repl.SparkRInterpreter", "getSparkContext"), envir = SparkR:::.sparkREnv)""") This means that if we execute sparkR.session() in a SparkR notebook, SparkR will overwrite the SparkR env variable ".scStartTime" (which was set in Livy's SparkRInterpreter.scala) and render previously created jobjs invalid, because their jobj$appId differs from the SparkR env variable ".scStartTime". Please see the fix in LIVY-505.patch > sparkR.session failed with "invalid jobj 1" error in Spark 2.3 > -- > > Key: LIVY-505 > URL: https://issues.apache.org/jira/browse/LIVY-505 > Project: Livy > Issue Type: Bug > Components: Interpreter >Affects Versions: 0.5.0, 0.5.1 >Reporter: shanyu zhao >Priority: Major > Attachments: LIVY-505.patch > > > In Spark 2.3 cluster, use Zeppelin with livy2 interpreter, and type: > {code:java} > %sparkr > sparkR.session(){code} > You will see error: > [1] "Error in writeJobj(con, object): invalid jobj 1" > In a successful case with older livy and spark versions, we see something > like this: > Java ref type org.apache.spark.sql.SparkSession id 1 > This indicates isValidJobj() function in Spark code returned false for > SparkSession obj. This is isValidJobj() function in Spark 2.3 code FYI: > {code:java} > isValidJobj <- function(jobj) { > if (exists(".scStartTime", envir = .sparkREnv)) { > jobj$appId == get(".scStartTime", envir = .sparkREnv) > } else { > FALSE > } > }{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
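The authoritative change is the attached LIVY-505.patch; purely to illustrate the direction the comment above describes, here is a hedged Scala sketch that also publishes the context under the ".sparkRjsc" singleton name so a later sparkR.session() call reuses it instead of resetting ".scStartTime". `sendRequest` stands in for the SparkRInterpreter helper.
{code:scala}
// Illustrative only -- assumptions: sendRequest() ships one line of R code to the R
// process, and registering ".sparkRjsc" is enough for sparkR.session() to reuse the
// existing context (this mirrors the root-cause analysis above, not the actual patch).
def registerSparkContext(sendRequest: String => Unit): Unit = {
  sendRequest("""assign(".sc", SparkR:::callJStatic("org.apache.livy.repl.SparkRInterpreter", "getSparkContext"), envir = SparkR:::.sparkREnv)""")
  // Also expose the same context under the name SparkR treats as its singleton.
  sendRequest("""assign(".sparkRjsc", get(".sc", envir = SparkR:::.sparkREnv), envir = SparkR:::.sparkREnv)""")
}
{code}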
[jira] [Updated] (LIVY-505) sparkR.session failed with "invalid jobj 1" error in Spark 2.3
[ https://issues.apache.org/jira/browse/LIVY-505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shanyu zhao updated LIVY-505: - Attachment: LIVY-505.patch > sparkR.session failed with "invalid jobj 1" error in Spark 2.3 > -- > > Key: LIVY-505 > URL: https://issues.apache.org/jira/browse/LIVY-505 > Project: Livy > Issue Type: Bug > Components: Interpreter >Affects Versions: 0.5.0, 0.5.1 >Reporter: shanyu zhao >Priority: Major > Attachments: LIVY-505.patch > > > In Spark 2.3 cluster, use Zeppelin with livy2 interpreter, and type: > {code:java} > %sparkr > sparkR.session(){code} > You will see error: > [1] "Error in writeJobj(con, object): invalid jobj 1" > In a successful case with older livy and spark versions, we see something > like this: > Java ref type org.apache.spark.sql.SparkSession id 1 > This indicates isValidJobj() function in Spark code returned false for > SparkSession obj. This is isValidJobj() function in Spark 2.3 code FYI: > {code:java} > isValidJobj <- function(jobj) { > if (exists(".scStartTime", envir = .sparkREnv)) { > jobj$appId == get(".scStartTime", envir = .sparkREnv) > } else { > FALSE > } > }{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (LIVY-576) Unknown YARN state RUNNING for app with final status SUCCEEDED
[ https://issues.apache.org/jira/browse/LIVY-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shanyu zhao updated LIVY-576: - Attachment: LIVY-576.patch > Unknown YARN state RUNNING for app with final status SUCCEEDED > -- > > Key: LIVY-576 > URL: https://issues.apache.org/jira/browse/LIVY-576 > Project: Livy > Issue Type: Bug > Components: Server >Affects Versions: 0.5.0 >Reporter: shanyu zhao >Priority: Major > Attachments: LIVY-576.patch > > > With Livy running Spark 2.3/2.4 on Yarn 2.9, there is a chance for Yarn to return > application reports with Yarn state RUNNING and final Yarn status SUCCEEDED, > which means the Yarn application is finishing up and about to be successful. > Livy's mapYarnState() method does not have a valid mapping for this > combination and therefore it renders the session 'dead'. > I saw this in the Livy server log: > 19/03/25 20:04:28 ERROR utils.SparkYarnApp: Unknown YARN state RUNNING for > app application_1553542555261_0063 with final status SUCCEEDED. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (LIVY-576) Unknown YARN state RUNNING for app with final status SUCCEEDED
[ https://issues.apache.org/jira/browse/LIVY-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shanyu zhao updated LIVY-576: - Summary: Unknown YARN state RUNNING for app with final status SUCCEEDED (was: Unknown YARN state RUNNING FOR app with final status SUCCEEDED) > Unknown YARN state RUNNING for app with final status SUCCEEDED > -- > > Key: LIVY-576 > URL: https://issues.apache.org/jira/browse/LIVY-576 > Project: Livy > Issue Type: Bug > Components: Server >Affects Versions: 0.5.0 >Reporter: shanyu zhao >Priority: Major > > With Livy running Spark 2.3/2.4 on Yarn 2.9, there is a chance for Yarn to return > application reports with Yarn state RUNNING and final Yarn status SUCCEEDED, > which means the Yarn application is finishing up and about to be successful. > Livy's mapYarnState() method does not have a valid mapping for this > combination and therefore it renders the session 'dead'. > I saw this in the Livy server log: > 19/03/25 20:04:28 ERROR utils.SparkYarnApp: Unknown YARN state RUNNING for > app application_1553542555261_0063 with final status SUCCEEDED. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (LIVY-576) Unknown YARN state RUNNING FOR app with final status SUCCEEDED
shanyu zhao created LIVY-576: Summary: Unknown YARN state RUNNING FOR app with final status SUCCEEDED Key: LIVY-576 URL: https://issues.apache.org/jira/browse/LIVY-576 Project: Livy Issue Type: Bug Components: Server Affects Versions: 0.5.0 Reporter: shanyu zhao With Livy running Spark 2.3/2.4 on Yarn 2.9, there is a chance for Yarn to return application reports with Yarn state RUNNING and final Yarn status SUCCEEDED, which means the Yarn application is finishing up and about to be successful. Livy's mapYarnState() method does not have a valid mapping for this combination and therefore it renders the session 'dead'. I saw this in the Livy server log: 19/03/25 20:04:28 ERROR utils.SparkYarnApp: Unknown YARN state RUNNING for app application_1553542555261_0063 with final status SUCCEEDED. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
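For clarity, here is a hedged sketch of the missing mapping; the applied change is in the attached LIVY-576.patch, and the simplified AppState values below are placeholders for whatever the mapYarnState() method referenced above actually returns. It assumes hadoop-yarn-api is on the classpath.
{code:scala}
import org.apache.hadoop.yarn.api.records.{FinalApplicationStatus, YarnApplicationState}

object YarnStateSketch {
  sealed trait AppState
  case object Running extends AppState
  case object Finished extends AppState
  case object Failed extends AppState

  def mapYarnState(state: YarnApplicationState, finalStatus: FinalApplicationStatus): AppState =
    (state, finalStatus) match {
      case (YarnApplicationState.RUNNING, FinalApplicationStatus.UNDEFINED) => Running
      // The combination reported in this issue: the app is wrapping up successfully,
      // so it should still be treated as running rather than unknown/dead.
      case (YarnApplicationState.RUNNING, FinalApplicationStatus.SUCCEEDED) => Running
      case (YarnApplicationState.FINISHED, FinalApplicationStatus.SUCCEEDED) => Finished
      case _ => Failed // collapsed for brevity; the real mapping has many more cases
    }
}
{code}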
[jira] [Commented] (LIVY-547) Livy kills session after livy.server.session.timeout even if the session is active
[ https://issues.apache.org/jira/browse/LIVY-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780032#comment-16780032 ] shanyu zhao commented on LIVY-547: -- [~Jassy] are you still working on this issue? > Livy kills session after livy.server.session.timeout even if the session is > active > -- > > Key: LIVY-547 > URL: https://issues.apache.org/jira/browse/LIVY-547 > Project: Livy > Issue Type: Bug > Components: Server >Reporter: Sandeep Nemuri >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Livy kills session after {{livy.server.session.timeout}} even if the session > is active. > Code that runs more than the {{livy.server.session.timeout}} with > intermediate sleeps. > {noformat} > %pyspark > import time > import datetime > import random > def inside(p): > x, y = random.random(), random.random() > return x*x + y*y < 1 > NUM_SAMPLES=10 > count = sc.parallelize(xrange(0, NUM_SAMPLES)) \ > .filter(inside).count() > print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES) > print("waiting for 100 s") > time.sleep(100) > count = sc.parallelize(xrange(0, NUM_SAMPLES)) \ > .filter(inside).count() > print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES) > print("waiting for 200 s") > time.sleep(200) > count = sc.parallelize(xrange(0, NUM_SAMPLES)) \ > .filter(inside).count() > print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES) > print("waiting for 300 s1") > time.sleep(300) > count = sc.parallelize(xrange(0, NUM_SAMPLES)) \ > .filter(inside).count() > print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES) > print("waiting for 300 s2") > time.sleep(300) > count = sc.parallelize(xrange(0, NUM_SAMPLES)) \ > .filter(inside).count() > print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES) > print("waiting for 300 s3") > time.sleep(300) > count = sc.parallelize(xrange(0, NUM_SAMPLES)) \ > .filter(inside).count() > print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES) > print("waiting for 300 s4") > time.sleep(300) > count = sc.parallelize(xrange(0, NUM_SAMPLES)) \ > .filter(inside).count() > print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES) > {noformat} > Livy log: > {noformat} > 19/01/07 17:38:59 INFO InteractiveSession: Interactive session 14 created > [appid: application_1546711709239_0002, owner: zeppelin-hwc327, proxyUser: > Some(admin), state: idle, kind: shared, info: > {driverLogUrl=http://hwc327-node3.hogwarts-labs.com:8042/node/containerlogs/container_e18_1546711709239_0002_01_01/admin, > > sparkUiUrl=http://hwc327-node2.hogwarts-labs.com:8088/proxy/application_1546711709239_0002/}] > 19/01/07 17:52:46 INFO InteractiveSession: Stopping InteractiveSession 14... > 19/01/07 17:52:56 WARN RSCClient: Exception while waiting for end session > reply. 
> java.util.concurrent.TimeoutException > at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) > at org.apache.livy.rsc.RSCClient.stop(RSCClient.java:223) > at > org.apache.livy.server.interactive.InteractiveSession$$anonfun$stopSession$1.apply(InteractiveSession.scala:471) > at > org.apache.livy.server.interactive.InteractiveSession$$anonfun$stopSession$1.apply(InteractiveSession.scala:471) > at scala.Option.foreach(Option.scala:236) > at > org.apache.livy.server.interactive.InteractiveSession.stopSession(InteractiveSession.scala:471) > at > org.apache.livy.sessions.Session$$anonfun$stop$1.apply$mcV$sp(Session.scala:174) > at > org.apache.livy.sessions.Session$$anonfun$stop$1.apply(Session.scala:171) > at > org.apache.livy.sessions.Session$$anonfun$stop$1.apply(Session.scala:171) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at > scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107) > at > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > 19/01/07 17:52:56 WARN RpcDispatcher: [ClientProtocol] Closing RPC channel > with 1 outstanding RPCs. > 19/01/07 17:52:56 WARN InteractiveSession: Failed to stop RSCDriver. Killing > it... > 19/01/07 17:52:56 INFO YarnClientImpl: Killed application > application_1546711709239_0002 > 19/01/07 17:52:56 INFO InteractiveSession: Stopped
[jira] [Commented] (LIVY-266) Livy sessions/batches are not secured. Any user can stop another user session/batch
[ https://issues.apache.org/jira/browse/LIVY-266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16704312#comment-16704312 ] shanyu zhao commented on LIVY-266: -- [~tc0312] are you saying that all the requests are actually using SPNEGO with identity "prabhu", and therefore it all works fine? If a different user tries to post statements or kill sessions owned by user "prabhu", will Livy deny that request? If Knox is used to access Livy in a kerberized cluster, and the Knox user is configured in "livy.superusers", then the proxyUser field is enforced for the POST /sessions request. However, a POST statements request to any session from the Knox server to the Livy server will always succeed because the caller identity is the Knox user, not the end user. How does Livy find out who is making the request to it? > Livy sessions/batches are not secured. Any user can stop another user > session/batch > --- > > Key: LIVY-266 > URL: https://issues.apache.org/jira/browse/LIVY-266 > Project: Livy > Issue Type: Task > Components: Core >Affects Versions: 0.3 >Reporter: Prabhu Kasinathan >Priority: Major > > Dev, > Livy session or batches are not currently secured. i.e. User A can start a > session or batch and User B can submit code to session started by User A or > even stop that session. This is critical issue on secured cluster, when User > A is having sensitive data access, there may be a chance User B can access > those sensitive datasets through User-A Session. > Here, is an example from our secured cluster. > # Starting session from user "prabhu" > curl --silent --negotiate -u:prabhu localhost:8998/sessions -X POST -H > 'Content-Type: application/json' -d '{ > "kind":"scala", > "proxyUser":"prabhu", > "name":"Testing" > }' | python -m json.tool > { > "id": 371, > "appId": null, > "owner": "prabhu", > "proxyUser": "prabhu", > "state": "starting", > "kind": "spark", > "appInfo": { > "driverLogUrl": null, > "sparkUiUrl": null > }, > "log": [] > } > # Executing code to above session by some other user "don" > curl --silent --negotiate -u:don localhost:8998/sessions/371/statements -X > POST -H 'Content-Type: application/json' -d '{ > "code":"sc.applicationId" > }' | python -m json.tool > { > "id": 0, > "state": "available", > "output": { > "status": "ok", > "execution_count": 0, > "data": { > "text/plain": "res0: String = application_1476926173701_398436" > } > } > } > # Stopping above session by different user "john" this time > curl --silent --negotiate -u:john localhost:8998/sessions/371 -X DELETE | > python -m json.tool > { > "msg": "deleted" > } -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (LIVY-231) Session recovery for batch sessions with multiple Livy servers
[ https://issues.apache.org/jira/browse/LIVY-231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687299#comment-16687299 ] shanyu zhao commented on LIVY-231: -- Any update for this feature? Or should we implement an active-standby mode multi-node HA? > Session recovery for batch sessions with multiple Livy servers > -- > > Key: LIVY-231 > URL: https://issues.apache.org/jira/browse/LIVY-231 > Project: Livy > Issue Type: New Feature > Components: Batch, Core, RSC, Server >Affects Versions: 0.2 >Reporter: Meisam >Priority: Minor > > HA with one Livy server node means that Livy will be unavailable during the > recovery. Having more than one node makes Livy available even if one of the > server nodes goes down. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (LIVY-505) sparkR.session failed with "invalid jobj 1" error in Spark 2.3
shanyu zhao created LIVY-505: Summary: sparkR.session failed with "invalid jobj 1" error in Spark 2.3 Key: LIVY-505 URL: https://issues.apache.org/jira/browse/LIVY-505 Project: Livy Issue Type: Bug Components: Interpreter Affects Versions: 0.5.0, 0.5.1 Reporter: shanyu zhao In Spark 2.3 cluster, use Zeppelin with livy2 interpreter, and type: {code:java} %sparkr sparkR.session(){code} You will see error: [1] "Error in writeJobj(con, object): invalid jobj 1" In a successful case with older livy and spark versions, we see something like this: Java ref type org.apache.spark.sql.SparkSession id 1 This indicates isValidJobj() function in Spark code returned false for SparkSession obj. This is isValidJobj() function in Spark 2.3 code FYI: {code:java} isValidJobj <- function(jobj) { if (exists(".scStartTime", envir = .sparkREnv)) { jobj$appId == get(".scStartTime", envir = .sparkREnv) } else { FALSE } }{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (LIVY-501) Support Spark 2.3 SparkR backend
[ https://issues.apache.org/jira/browse/LIVY-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594481#comment-16594481 ] shanyu zhao commented on LIVY-501: -- Thanks [~jerryshao]. Where does Livy mention the compatibility with Spark versions? I checked the release notes of Livy 0.5, which didn't say which versions of Spark it supports. In [https://github.com/apache/incubator-livy], it is mentioned that: "Livy requires at least Spark 1.6 and supports both Scala 2.10 and 2.11 builds of Spark, Livy will automatically pick repl dependencies through detecting the Scala version of Spark. Livy also supports Spark 2.0+ for both interactive and batch submission, you could seamlessly switch to different versions of Spark through {{SPARK_HOME}} configuration, without needing to rebuild Livy." > Support Spark 2.3 SparkR backend > > > Key: LIVY-501 > URL: https://issues.apache.org/jira/browse/LIVY-501 > Project: Livy > Issue Type: Bug > Components: Interpreter >Affects Versions: 0.5.0, 0.5.1 >Reporter: shanyu zhao >Priority: Major > Attachments: LIVY-501.patch > > > Spark 2.3 modified RBackend:init() method to return Tuple2[Int, RAuthHelper] > instead of just an Int. > [https://github.com/apache/spark/commit/628c7b517969c4a7ccb26ea67ab3dd61266073ca] > Livy does not work with this version of RBackend because it is still expecting a > single Int in the return value from init(). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (LIVY-501) Support Spark 2.3 SparkR backend
[ https://issues.apache.org/jira/browse/LIVY-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594336#comment-16594336 ] shanyu zhao commented on LIVY-501: -- Spark 2.3 works except for SparkR. I just attached the patch on top of branch-0.5. > Support Spark 2.3 SparkR backend > > > Key: LIVY-501 > URL: https://issues.apache.org/jira/browse/LIVY-501 > Project: Livy > Issue Type: Bug > Components: Interpreter >Affects Versions: 0.5.0, 0.5.1 >Reporter: shanyu zhao >Priority: Major > Attachments: LIVY-501.patch > > > Spark 2.3 modified RBackend:init() method to return Tuple2[Int, RAuthHelper] > instead of just an Int. > [https://github.com/apache/spark/commit/628c7b517969c4a7ccb26ea67ab3dd61266073ca] > Livy does not work with this version of RBackend because it is still expecting a > single Int in the return value from init(). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (LIVY-501) Support Spark 2.3 SparkR backend
[ https://issues.apache.org/jira/browse/LIVY-501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shanyu zhao updated LIVY-501: - Attachment: LIVY-501.patch > Support Spark 2.3 SparkR backend > > > Key: LIVY-501 > URL: https://issues.apache.org/jira/browse/LIVY-501 > Project: Livy > Issue Type: Bug > Components: Interpreter >Affects Versions: 0.5.0, 0.5.1 >Reporter: shanyu zhao >Priority: Major > Attachments: LIVY-501.patch > > > Spark 2.3 modified RBackend:init() method to return Tuple2[Int, RAuthHelper] > instead of just an Int. > [https://github.com/apache/spark/commit/628c7b517969c4a7ccb26ea67ab3dd61266073ca] > Livy does not work with this version of RBackend because it is still expecting a > single Int in the return value from init(). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (LIVY-501) Support Spark 2.3 SparkR backend
shanyu zhao created LIVY-501: Summary: Support Spark 2.3 SparkR backend Key: LIVY-501 URL: https://issues.apache.org/jira/browse/LIVY-501 Project: Livy Issue Type: Bug Components: Interpreter Affects Versions: 0.5.0, 0.5.1 Reporter: shanyu zhao Spark 2.3 modified RBackend:init() method to return Tuple2[Int, RAuthHelper] instead of just an Int. [https://github.com/apache/spark/commit/628c7b517969c4a7ccb26ea67ab3dd61266073ca] Livy does not work with this version of RBackend because it is still expecting a single Int in the return value from init(). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
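One way to stay compatible with both RBackend.init() signatures, sketched in Scala for illustration (this is not the attached LIVY-501.patch). It assumes init() still takes no arguments and is invoked reflectively on an already constructed RBackend instance.
{code:scala}
// Hedged sketch: pre-2.3 init() returns the port as an Int; 2.3+ returns (port, RAuthHelper).
def initRBackend(backend: AnyRef): Int = {
  val result: Any = backend.getClass.getMethod("init").invoke(backend)
  result match {
    case port: Int => port      // Spark <= 2.2
    case (port: Int, _) => port // Spark 2.3+: second element is the RAuthHelper
    case other =>
      throw new IllegalStateException(s"Unexpected RBackend.init() return value: $other")
  }
}
{code}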