[jira] [Commented] (SPARK-13573) Open SparkR APIs (R package) to allow better 3rd party usage
[ https://issues.apache.org/jira/browse/SPARK-13573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15447710#comment-15447710 ]

Sun Rui commented on SPARK-13573:
---------------------------------

[~chipsenkbeil], we have made public the methods for creating Java objects and invoking object methods: sparkR.callJMethod(), sparkR.callJStatic(), and sparkR.newJObject(). Please refer to SPARK-16581 and https://github.com/apache/spark/blob/master/R/pkg/R/jvm.R

> Open SparkR APIs (R package) to allow better 3rd party usage
> ------------------------------------------------------------
>
>                 Key: SPARK-13573
>                 URL: https://issues.apache.org/jira/browse/SPARK-13573
>             Project: Spark
>          Issue Type: Improvement
>          Components: SparkR
>            Reporter: Chip Senkbeil
>
> Currently, SparkR's R package does not expose enough of its APIs to be used
> flexibly. As far as I am aware, SparkR still requires you to create a new
> SparkContext by invoking the sparkR.init method (so you cannot connect to a
> running one), and there is no way to invoke custom Java methods using the
> exposed SparkR API (unlike PySpark).
> We currently maintain a fork of SparkR that is used to power the R
> implementation of Apache Toree, which is a gateway to use Apache Spark. This
> fork provides a connect method (to use an existing SparkContext), exposes
> needed methods like invokeJava (to be able to communicate with our JVM to
> retrieve code to run, etc.), and uses reflection to access
> org.apache.spark.api.r.RBackend.
> Here is the documentation I recorded regarding changes we need to enable
> SparkR as an option for Apache Toree:
> https://github.com/apache/incubator-toree/tree/master/sparkr-interpreter/src/main/resources

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
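For anyone landing on this thread, a minimal sketch of what these now-public calls look like. This assumes a Spark 2.x session started via sparkR.session(); the Math.min example follows the documentation in jvm.R:

{code}
library(SparkR)
sparkR.session()

# Call a static JVM method: Math.min(5, 2)
sparkR.callJStatic("java.lang.Math", "min", 5L, 2L)

# Create a JVM object and invoke instance methods on it
arrayList <- sparkR.newJObject("java.util.ArrayList")
sparkR.callJMethod(arrayList, "add", "some value")
sparkR.callJMethod(arrayList, "size")
{code}

Note these return either a serializable R value or a "jobj" reference that can be passed back into further callJMethod() calls.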
[ https://issues.apache.org/jira/browse/SPARK-13573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184912#comment-15184912 ]

Chip Senkbeil commented on SPARK-13573:
---------------------------------------

[~sunrui], it's definitely a start toward removing the fork, but there is always a concern that the functionality will be removed in the future without considering the applications that benefit from using it. So, I want to push the Spark community to either adopt these APIs as public or form some reasonable subset of APIs that would allow Toree to realistically perform the same tasks it does now. What are your thoughts?
[ https://issues.apache.org/jira/browse/SPARK-13573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184806#comment-15184806 ]

Sun Rui commented on SPARK-13573:
---------------------------------

[~chipsenkbeil] It seems that Toree runs the SparkR kernel and the RBackend in the same JVM, which is an interesting architecture that the SparkR design has not considered. I am not sure the SparkR community can be convinced to adopt the proposed changes, as the R-JVM bridge and RBackend APIs are intended for internal use. If we made them public, we would have to keep the APIs stable, which may limit their future evolution.

One possible solution for Toree could be:
1. Use the SparkR::: prefix to access all private methods;
2. Move SparkR.connect() to your sparkr_runner.R;
3. Keep using the existing ReflectiveRBackend to access RBackend.

Then you generally don't need to maintain a fork of SparkR. [~shivaram], any comments?
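To make the suggested workaround concrete, a rough sketch of what the runner-side R code might look like. Everything reached via SparkR::: is internal and may change between releases, and connectBackend() here is an assumption based on the current private helpers in the package's client code:

{code}
# Connect to an already-running RBackend instead of calling sparkR.init()
# (the port is published by the JVM side via an environment variable)
port <- as.integer(Sys.getenv("EXISTING_SPARKR_BACKEND_PORT"))
SparkR:::connectBackend("localhost", port)

# Private JVM-bridge helpers are reachable with the ::: prefix
bridge <- SparkR:::callJStatic(
  "org.apache.toree.kernel.interpreter.sparkr.SparkRBridge",
  "sparkRBridge"
)
{code}

This avoids forking the package, at the cost of depending on private APIs that carry no stability guarantee.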
[ https://issues.apache.org/jira/browse/SPARK-13573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175093#comment-15175093 ]

Chip Senkbeil commented on SPARK-13573:
---------------------------------------

I'd gladly create a PR with the changes if needed. We haven't synced with Spark 1.6.0+ yet, so it would take me a little while to get up to speed. Other than the one new method to enable connecting without creating a SparkContext, it's just a matter of exporting functions and making the RBackend class public.
[ https://issues.apache.org/jira/browse/SPARK-13573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175089#comment-15175089 ]

Chip Senkbeil commented on SPARK-13573:
---------------------------------------

In terms of the JVM class whose methods we invoke, the majority can be found here: https://github.com/apache/incubator-toree/blob/master/kernel-api/src/main/scala/org/apache/toree/interpreter/broker/BrokerState.scala#L90

We basically maintain an object that acts as a code queue: the SparkR process pulls off code to evaluate and then sends the results back as strings. We also had to write a wrapper for the RBackend, since it is package protected: https://github.com/apache/incubator-toree/blob/master/sparkr-interpreter/src/main/scala/org/apache/toree/kernel/interpreter/sparkr/ReflectiveRBackend.scala
[ https://issues.apache.org/jira/browse/SPARK-13573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175079#comment-15175079 ]

Chip Senkbeil commented on SPARK-13573:
---------------------------------------

[~sunrui], IIRC, Toree supported SparkR on 1.4.x and 1.5.x. It is just a bit of a pain to keep in sync. The process Toree uses to interact with SparkR is as follows:

# We added a SparkR.connect method (https://github.com/apache/incubator-toree/blob/master/sparkr-interpreter/src/main/resources/R/pkg/R/sparkR.R#L220) that uses EXISTING_SPARKR_BACKEND_PORT to connect to an R backend but does not attempt to initialize the SparkContext.
# We use the exposed callJStatic to acquire a reference to a Java (well, Scala) object that has additional variables, like the SparkContext, hanging off of it (https://github.com/apache/incubator-toree/blob/master/sparkr-interpreter/src/main/resources/kernelR/sparkr_runner.R#L50):
{code}
# Retrieve the bridge used to perform actions on the JVM
bridge <- callJStatic(
  "org.apache.toree.kernel.interpreter.sparkr.SparkRBridge",
  "sparkRBridge"
)

# Retrieve the state used to pull code off the JVM and push results back
state <- callJMethod(bridge, "state")

# Acquire the kernel API instance to expose
kernel <- callJMethod(bridge, "kernel")
assign("kernel", kernel, .runnerEnv)
{code}
# We then invoke methods using callJMethod to get the next string of R code to evaluate:
{code}
# Load the container of the code
codeContainer <- callJMethod(state, "nextCode")

# If not a valid result, wait 1 second and try again
if (!class(codeContainer) == "jobj") {
  Sys.sleep(1)
  next()
}

# Retrieve the code id (for the response) and the code itself
codeId <- callJMethod(codeContainer, "codeId")
code <- callJMethod(codeContainer, "code")
{code}
# Finally, we evaluate the acquired code string and send the results back to our running JVM (which represents a Jupyter kernel):
{code}
# Parse the code into an expression to be evaluated
codeExpr <- parse(text = code)
print(paste("Code expr", codeExpr))

tryCatch({
  # Evaluate the code provided and capture the result as a string
  result <- capture.output(eval(codeExpr, envir = .runnerEnv))
  print(paste("Result type", class(result), length(result)))
  print(paste("Success", codeId, result))

  # Mark the execution as a success and send back the result
  # If output is null/empty, ensure that we can send it (otherwise fails)
  if (is.null(result) || length(result) <= 0) {
    print("Marking success with no output")
    callJMethod(state, "markSuccess", codeId)
  } else {
    # Clean the result before sending it back
    cleanedResult <- trimws(flatten(result, shouldTrim = FALSE))
    print(paste("Marking success with output:", cleanedResult))
    callJMethod(state, "markSuccess", codeId, cleanedResult)
  }
}, error = function(ex) {
  # Mark the execution as a failure and send back the error
  print(paste("Failure", codeId, toString(ex)))
  callJMethod(state, "markFailure", codeId, toString(ex))
})
{code}
[ https://issues.apache.org/jira/browse/SPARK-13573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174726#comment-15174726 ]

Sun Rui commented on SPARK-13573:
---------------------------------

[~chipsenkbeil], glad to know Toree is going to support SparkR. I tried it but couldn't figure out how to interact with SparkR. Could you describe how Toree uses these methods to provide interaction with SparkR?