[jira] [Commented] (SPARK-13573) Open SparkR APIs (R package) to allow better 3rd party usage

2016-08-29 Thread Sun Rui (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-13573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447710#comment-15447710 ]

Sun Rui commented on SPARK-13573:
-

[~chipsenkbeil], we have made public the methods for creating Java objects and 
invoking object methods: sparkR.callJMethod(), sparkR.callJStatic(), and 
sparkR.newJObject(). Please refer to SPARK-16581 and 
https://github.com/apache/spark/blob/master/R/pkg/R/jvm.R
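For example, a minimal sketch of how these wrappers can be used (assumes a Spark 2.0+ 
session; the StringBuilder usage below is just illustrative):

{code}
library(SparkR)
sparkR.session()

# Call a static method on a JVM class
now <- sparkR.callJStatic("java.lang.System", "currentTimeMillis")

# Create a JVM object and invoke instance methods on it
sb <- sparkR.newJObject("java.lang.StringBuilder")
sparkR.callJMethod(sb, "append", "hello")
greeting <- sparkR.callJMethod(sb, "toString")
{code}

Calls that produce JVM objects return R references (jobj), while primitive results come 
back as plain R values.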

> Open SparkR APIs (R package) to allow better 3rd party usage
> 
>
> Key: SPARK-13573
> URL: https://issues.apache.org/jira/browse/SPARK-13573
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Chip Senkbeil
>
> Currently, SparkR's R package does not expose enough of its APIs to be used 
> flexibly. As far as I am aware, SparkR still requires you to create a new 
> SparkContext by invoking the sparkR.init method (so you cannot connect to a 
> running one), and there is no way to invoke custom Java methods using the 
> exposed SparkR API (unlike PySpark).
> We currently maintain a fork of SparkR that is used to power the R 
> implementation of Apache Toree, which is a gateway to use Apache Spark. This 
> fork provides a connect method (to use an existing Spark Context), exposes 
> needed methods like invokeJava (to be able to communicate with our JVM to 
> retrieve code to run, etc), and uses reflection to access 
> org.apache.spark.api.r.RBackend.
> Here is the documentation I recorded regarding changes we need to enable 
> SparkR as an option for Apache Toree: 
> https://github.com/apache/incubator-toree/tree/master/sparkr-interpreter/src/main/resources



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13573) Open SparkR APIs (R package) to allow better 3rd party usage

2016-03-08 Thread Chip Senkbeil (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-13573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184912#comment-15184912 ]

Chip Senkbeil commented on SPARK-13573:
---

[~sunrui], it's definitely a start toward removing the fork, but there is always a 
concern that the functionality will be removed in the future without 
considering applications that would benefit from using it. So I want to push 
the Spark community to either adopt these APIs as public or define some 
reasonable subset of APIs that would allow Toree to realistically perform the 
same tasks it does now. What are your thoughts?




[jira] [Commented] (SPARK-13573) Open SparkR APIs (R package) to allow better 3rd party usage

2016-03-08 Thread Sun Rui (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-13573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184806#comment-15184806 ]

Sun Rui commented on SPARK-13573:
-

[~chipsenkbeil] It seems that Toree runs the SparkR kernel and the RBackend in the 
same JVM, which is an interesting architecture that the SparkR design has not 
considered. I am not sure the SparkR community can be convinced to adopt the 
proposed changes, as the R-JVM bridge and RBackend APIs are intended for 
internal use. If we made them public, we would have to keep those APIs stable, 
which could limit their future evolution.

One possible solution for Toree could be:
1. Use the SparkR::: prefix to access all private methods;
2. Move SparkR.connect() to your sparkr_runner.R;
3. Keep using the existing ReflectiveRBackend to access RBackend.

Then you generally don't need to maintain a fork of SparkR.
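A rough sketch of what that could look like on the R side (the triple-colon calls 
reach into SparkR internals and are illustrative only; SparkRBridge is Toree's own 
class):

{code}
# Connect to an already-running RBackend instead of launching a new one
port <- as.integer(Sys.getenv("EXISTING_SPARKR_BACKEND_PORT"))
SparkR:::connectBackend("localhost", port)

# Use the private bridge functions to reach Toree's JVM-side objects
bridge <- SparkR:::callJStatic(
  "org.apache.toree.kernel.interpreter.sparkr.SparkRBridge", "sparkRBridge"
)
state <- SparkR:::callJMethod(bridge, "state")
{code}

The downside is that internal names like connectBackend can change between Spark 
releases without notice.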

[~shivaram] any comments?





[jira] [Commented] (SPARK-13573) Open SparkR APIs (R package) to allow better 3rd party usage

2016-03-01 Thread Chip Senkbeil (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-13573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175093#comment-15175093 ]

Chip Senkbeil commented on SPARK-13573:
---

I'd gladly create a PR with the changes if needed. We haven't synced with Spark 
1.6.0+ yet, so it'd just take me a little while to get up to speed. Other than 
the one new method that enables connecting without creating a SparkContext, it's 
just a matter of exporting functions and making the RBackend class public.




[jira] [Commented] (SPARK-13573) Open SparkR APIs (R package) to allow better 3rd party usage

2016-03-01 Thread Chip Senkbeil (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-13573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175089#comment-15175089 ]

Chip Senkbeil commented on SPARK-13573:
---

In terms of the JVM class whose methods we are invoking, the majority can be 
found here: 
https://github.com/apache/incubator-toree/blob/master/kernel-api/src/main/scala/org/apache/toree/interpreter/broker/BrokerState.scala#L90

We basically maintain an object that acts as a code queue: the SparkR process 
pulls code off of it to evaluate and then sends the results back as strings.

We also had to write a wrapper for the RBackend since it was package-private: 
https://github.com/apache/incubator-toree/blob/master/sparkr-interpreter/src/main/scala/org/apache/toree/kernel/interpreter/sparkr/ReflectiveRBackend.scala




[jira] [Commented] (SPARK-13573) Open SparkR APIs (R package) to allow better 3rd party usage

2016-03-01 Thread Chip Senkbeil (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-13573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175079#comment-15175079 ]

Chip Senkbeil commented on SPARK-13573:
---

[~sunrui], IIRC, Toree supported SparkR against 1.4.x and 1.5.x. It's just a bit 
of a pain to keep in sync.

So, the process Toree uses to interact with SparkR is as follows:

# We added a SparkR.connect method 
(https://github.com/apache/incubator-toree/blob/master/sparkr-interpreter/src/main/resources/R/pkg/R/sparkR.R#L220)
 that uses the EXISTING_SPARKR_BACKEND_PORT environment variable to connect to an 
existing R backend but does not attempt to initialize the SparkContext
# We use the exposed callJStatic to acquire a reference to a Java (well, Scala) 
object that has additional variables, like the SparkContext, hanging off of it 
(https://github.com/apache/incubator-toree/blob/master/sparkr-interpreter/src/main/resources/kernelR/sparkr_runner.R#L50):
{code}
# Retrieve the bridge used to perform actions on the JVM
bridge <- callJStatic(
  "org.apache.toree.kernel.interpreter.sparkr.SparkRBridge", "sparkRBridge"
)

# Retrieve the state used to pull code off the JVM and push results back
state <- callJMethod(bridge, "state")

# Acquire the kernel API instance to expose
kernel <- callJMethod(bridge, "kernel")
assign("kernel", kernel, .runnerEnv)
{code}
# We then invoke methods using callJMethod to get the next string of R code to 
evaluate:
{code}
  # Load the container of the code
  codeContainer <- callJMethod(state, "nextCode")

  # If not valid result, wait 1 second and try again
  if (!class(codeContainer) == "jobj") {
Sys.sleep(1)
next()
  }

  # Retrieve the code id (for response) and code
  codeId <- callJMethod(codeContainer, "codeId")
  code <- callJMethod(codeContainer, "code")
{code}
# Finally, we evaluate the acquired code string and send the results back to 
our running JVM (which represents a Jupyter kernel):
{code}
  # Parse the code into an expression to be evaluated
  codeExpr <- parse(text = code)
  print(paste("Code expr", codeExpr))

  tryCatch({
# Evaluate the code provided and capture the result as a string
result <- capture.output(eval(codeExpr, envir = .runnerEnv))
print(paste("Result type", class(result), length(result)))
print(paste("Success", codeId, result))

# Mark the execution as a success and send back the result
# If output is null/empty, ensure that we can send it (otherwise fails)
if (is.null(result) || length(result) <= 0) {
  print("Marking success with no output")
  callJMethod(state, "markSuccess", codeId)
} else {
  # Clean the result before sending it back
  cleanedResult <- trimws(flatten(result, shouldTrim = FALSE))

  print(paste("Marking success with output:", cleanedResult))
  callJMethod(state, "markSuccess", codeId, cleanedResult)
}
  }, error = function(ex) {
# Mark the execution as a failure and send back the error
print(paste("Failure", codeId, toString(ex)))
callJMethod(state, "markFailure", codeId, toString(ex))
  })
{code}




[jira] [Commented] (SPARK-13573) Open SparkR APIs (R package) to allow better 3rd party usage

2016-03-01 Thread Sun Rui (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-13573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174726#comment-15174726 ]

Sun Rui commented on SPARK-13573:
-

[~chipsenkbeil] Glad to know that Toree is going to support SparkR. I tried it 
but couldn't figure out how to interact with SparkR. Could you describe how Toree 
uses these methods to provide interaction with SparkR?
