[GitHub] spark pull request #14433: [SPARK-16829][SparkR]:sparkR sc.setLogLevel doesn...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14433#discussion_r77205301 --- Diff: core/src/main/scala/org/apache/spark/internal/Logging.scala --- @@ -135,7 +136,12 @@ private[spark] trait Logging { val replLevel = Option(replLogger.getLevel()).getOrElse(Level.WARN) if (replLevel != rootLogger.getEffectiveLevel()) { System.err.printf("Setting default log level to \"%s\".\n", replLevel) - System.err.println("To adjust logging level use sc.setLogLevel(newLevel).") + if (SparkSubmit.isRShell) { --- End diff -- +1 -- I personally find it odd that we would import `deploy.SparkSubmit` into `Logging` which is a base class used throughout the code base. We could also change the log message to be more vague. Something like `To adjust logging level please call setLogLevel(newLevel) using SparkContext` might work for all languages ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
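The language-neutral wording suggested above can be sketched without pulling `deploy.SparkSubmit` into `Logging`. This is a minimal illustration only — the helper name `logLevelHint` and the exact final wording are assumptions, not the code that was merged:

```scala
object LogHint {
  // Hypothetical helper: one message that reads correctly from the Scala,
  // Python and R shells, so Logging needs no knowledge of which front end
  // started the JVM (and hence no import of deploy.SparkSubmit).
  def logLevelHint(replLevel: String): String =
    s"""Setting default log level to "$replLevel".
       |To adjust logging level please call setLogLevel(newLevel) using SparkContext.""".stripMargin

  def main(args: Array[String]): Unit =
    println(logLevelHint("WARN"))
}
```

Because the message no longer names `sc`, it stays accurate whatever variable name a given shell binds its SparkContext to.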
[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14783 @sun-rui Any other comments ?
[GitHub] spark issue #14856: [SPARK-17241][SparkR][MLlib] SparkR spark.glm should hav...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14856 Thanks @keypointt for the PR and @junyangq @felixcheung for reviewing. Merging this into master
[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14783 Jenkins, retest this please
[GitHub] spark issue #14903: [SparkR][Minor] Fix windowPartitionBy example
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14903 Merging this into master and branch-2.0
[GitHub] spark issue #14903: [SparkR][Minor] Fix windowPartitionBy example
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14903 Jenkins, retest this please
[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14783 Sorry, I think this was a breakage that I just fixed in #14904. Jenkins, retest this please.
[GitHub] spark issue #14904: [SPARKR][SPARK-16581] Fix JVM API tests in SparkR
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14904 Thanks - merging this to master, branch-2.0
[GitHub] spark issue #14904: [SPARKR][SPARK-16581] Fix JVM API tests in SparkR
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14904 cc @junyangq @felixcheung
[GitHub] spark issue #14903: [SparkR][Minor] Fix windowPartitionBy example
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14903 The fix is in #14904
[GitHub] spark pull request #14904: [SPARKR][SPARK-16581] Fix JVM API tests in SparkR
GitHub user shivaram opened a pull request: https://github.com/apache/spark/pull/14904 [SPARKR][SPARK-16581] Fix JVM API tests in SparkR

## What changes were proposed in this pull request?

Remove cleanup.jobj test. Use JVM wrapper API for other test cases.

## How was this patch tested?

Run R unit tests with testthat 1.0

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shivaram/spark-1 sparkr-jvm-tests-fix

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14904.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14904
[GitHub] spark issue #14903: [SparkR][Minor] Fix windowPartitionBy example
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14903 @junyangq @felixcheung The test error is due to an unrelated issue caused when we upgraded testthat on Jenkins. I'm sending a fix for it now
[GitHub] spark pull request #14784: [SPARK-17210][SPARKR] sparkr.zip is not distribut...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14784#discussion_r77073459 --- Diff: R/pkg/R/sparkR.R --- @@ -365,6 +365,10 @@ sparkR.session <- function( } overrideEnvs(sparkConfigMap, paramMap) } + if (nzchar(master)) { +assign("spark.master", master, envir = sparkConfigMap) --- End diff -- That just seems like an ordering issue ? For example ``` > val a = SparkSession.builder().config("spark.master", "local[3]").master("yarn").getOrCreate() > a.conf.get("spark.master") res1: String = yarn ```
[GitHub] spark issue #14889: [SPARK-17326][SPARKR] Fix tests with HiveContext in Spar...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14889 Yeah, let's open a separate JIRA to update testthat on the Jenkins boxes. LGTM. Merging this to master and branch-2.0
[GitHub] spark pull request #14784: [SPARK-17210][SPARKR] sparkr.zip is not distribut...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14784#discussion_r77031075 --- Diff: R/pkg/R/sparkR.R --- @@ -365,6 +365,10 @@ sparkR.session <- function( } overrideEnvs(sparkConfigMap, paramMap) } + if (nzchar(master)) { +assign("spark.master", master, envir = sparkConfigMap) --- End diff -- Yeah this is a tough one - I think preferring `master=` makes sense as it follows the convention that explicit arguments take precedence over more implicit ones passed as a list or read from a file. Also I think mimicking what `bin/spark-shell` does is probably good for consistency. The other option is to just throw an error and let the user fix this ?
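The precedence rule being argued for can be sketched as a tiny resolution function. Assume `resolveMaster` is a hypothetical stand-in for the logic inside `sparkR.session`, not actual SparkR code; it just encodes "explicit `master=` argument wins over a `spark.master` entry in the config map":

```scala
object MasterPrecedence {
  // Explicit master argument overrides any "spark.master" supplied via the
  // config map, mirroring what bin/spark-shell does and matching the
  // SparkSession.builder().config(...).master(...) example above.
  def resolveMaster(explicitMaster: String,
                    conf: Map[String, String]): Map[String, String] =
    if (explicitMaster.nonEmpty) conf + ("spark.master" -> explicitMaster)
    else conf

  def main(args: Array[String]): Unit =
    // With both set, the explicit argument wins.
    println(resolveMaster("yarn", Map("spark.master" -> "local[3]"))("spark.master"))
}
```

The alternative raised at the end of the comment — throwing an error when both are set — would replace the `if` branch with a require-style check instead of a silent override.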
[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automate bui...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14859 Thanks @HyukjinKwon - Some comments:

- I think we can filter commits by `[SPARKR]` in the PR title. Changes to core/ or sql/ could of course break the SparkR tests but it'll give us some coverage for R PRs.
- The periodic master branch build sounds good to me. Something like nightly might be good enough ?
- Could you convert your comment describing how to setup / use AppVeyor into a markdown file in `dev/` ?
- I created a JIRA to track the failing tests at https://issues.apache.org/jira/browse/SPARK-17339
[GitHub] spark issue #14889: [SPARK-17326][SPARKR] Fix tests with HiveContext in Spar...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14889 I just ran this on one of the Jenkins machines ``` > packageVersion("testthat") [1] ‘0.11.0’ ```
[GitHub] spark issue #14890: [MINOR][SPARKR] Verbose build comment in WINDOWS.md rath...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14890 LGTM. Thanks @HyukjinKwon - Merging this to master
[GitHub] spark issue #14889: [SPARK-17326][SPARKR] Fix tests with HiveContext in Spar...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14889 Thanks @HyukjinKwon - This is a great catch. LGTM pending tests.
[GitHub] spark issue #14744: [SPARK-17178][SPARKR][SPARKSUBMIT] Allow to set sparkr s...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14744 Jenkins, retest this please
[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SparkR] Automate bui...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14859 Good point on only running this on the master branch. We could even run it periodically (say nightly) instead of on every commit.
[GitHub] spark issue #14775: [SPARK-16581][SPARKR] Make JVM backend calling functions...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14775 Thanks @felixcheung - Addressed both the comments. Let me know if this looks good.
[GitHub] spark pull request #14775: [SPARK-16581][SPARKR] Make JVM backend calling fu...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14775#discussion_r76654300 --- Diff: R/pkg/R/backend.R --- @@ -25,9 +25,23 @@ isInstanceOf <- function(jobj, className) { callJMethod(cls, "isInstance", jobj) } -# Call a Java method named methodName on the object -# specified by objId. objId should be a "jobj" returned -# from the SparkRBackend. +#' Call Java Methods +#' +#' Call a Java method in the JVM running the Spark driver. +#' +#' @param objId object to invoke the method on. Should be a "jobj" created by newJObject. +#' @param methodName method name to call. +#' @param ... parameters to pass to the Java method. +#' @export +#' @seealso callJStatic, newJObject +#' @examples --- End diff -- Added `rdname` for all 3 methods now.
[GitHub] spark pull request #14775: [SPARK-16581][SPARKR] Make JVM backend calling fu...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14775#discussion_r76654243 --- Diff: R/pkg/R/backend.R --- @@ -37,12 +51,42 @@ callJMethod <- function(objId, methodName, ...) { invokeJava(isStatic = FALSE, objId$id, methodName, ...) } -# Call a static method on a specified className +#' Call Static Java Methods +#' +#' Call a static method in the JVM running the Spark driver. +#' +#' @param className class containing the static method to invoke. +#' @param methodName name of static method to invoke. +#' @param ... parameters to pass to the Java method. +#' @export +#' @seealso callJMethod, newJObject +#' @examples +#' \dontrun{ +#' sparkR.session() # Need to have a Spark JVM running before calling callJStatic +#' callJStatic("java.lang.System", "currentTimeMillis") +#' callJStatic("java.lang.System", "getProperty", "java.home") +#' } callJStatic <- function(className, methodName, ...) { invokeJava(isStatic = TRUE, className, methodName, ...) } -# Create a new object of the specified class name +#' Create Java Objects +#' +#' Create a new Java object in the JVM running the Spark driver. +#' +#' @param className name of the class to create --- End diff -- I added some comments in a `Details` section. Let me know if it reads ok. I agree that having a md file might be useful in the long run.
[GitHub] spark pull request #14775: [SPARK-16581][SPARKR] Make JVM backend calling fu...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14775#discussion_r76645788 --- Diff: R/pkg/DESCRIPTION --- @@ -11,7 +11,7 @@ Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), email = "felixche...@apache.org"), person(family = "The Apache Software Foundation", role = c("aut", "cph"))) URL: http://www.apache.org/ http://spark.apache.org/ -BugReports: https://issues.apache.org/jira/secure/CreateIssueDetails!init.jspa?pid=12315420=12325400=4 +BugReports: http://issues.apache.org/jira/browse/SPARK --- End diff -- I updated the wiki - Let me know if it looks better or if you have other suggestions.
[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SparkR] Automate bui...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14859 Thanks @HyukjinKwon - I for one would at least like the SparkR client to work on Windows as I think there are quite a few R users who are on Windows. @HyukjinKwon Could you add some documentation on how this build works, where we can check its status and what we can do to restart a build etc ?
[GitHub] spark pull request #14775: [SPARK-16581][SPARKR] Make JVM backend calling fu...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14775#discussion_r76526127 --- Diff: R/pkg/DESCRIPTION --- @@ -11,7 +11,7 @@ Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), email = "felixche...@apache.org"), person(family = "The Apache Software Foundation", role = c("aut", "cph"))) URL: http://www.apache.org/ http://spark.apache.org/ -BugReports: https://issues.apache.org/jira/secure/CreateIssueDetails!init.jspa?pid=12315420=12325400=4 +BugReports: http://issues.apache.org/jira/browse/SPARK --- End diff -- Other than being long, that link also doesn't seem to work if you are not logged in (would be good if you can also check this). The other thing we could do is to just link to the wiki page at https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingBugReports -- Do you think that is better ?
[GitHub] spark issue #14775: [SPARK-16581][SPARKR] Make JVM backend calling functions...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14775 @felixcheung Good point about having a wrapper -- That will make it easier to update the methods going forward. I added a new file `jvm.R` with the wrapper functions.
[GitHub] spark pull request #14775: [SPARK-16581][SPARKR] Make JVM backend calling fu...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14775#discussion_r76523010 --- Diff: R/pkg/R/jobj.R --- @@ -82,7 +82,20 @@ getClassName.jobj <- function(x) { callJMethod(cls, "getName") } -cleanup.jobj <- function(jobj) { +#' Garbage collect Java Objects +#' +#' Garbage collect an object allocated on Spark driver JVM heap. +#' +#' cleanup.jobj is a low level method that lets developers manually garbage collect objects +#' allocated using newJObject. This is only to be used for advanced use cases as objects allocated +#' on the JVM heap are automatically garbage collected when the corresponding R reference goes out +#' of scope. +#' +#' @param x the Java object that should be garbage collected. +#' @note cleanup.jobj since 2.0.1 +#' @export +cleanup.jobj <- function(x) { + jobj <- x if (isValidJobj(jobj)) { --- End diff -- I guess since this is back to an internal method we don't need to worry about it. But I guess I can add the check if we still want it ?
[GitHub] spark pull request #14775: [SPARK-16581][SPARKR] Make JVM backend calling fu...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14775#discussion_r76522998 --- Diff: R/pkg/inst/tests/testthat/test_jvm_api.R --- @@ -0,0 +1,41 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +context("JVM API") + +sparkSession <- sparkR.session(enableHiveSupport = FALSE) + +test_that("Create and call methods on object", { + jarr <- newJObject("java.util.ArrayList") + # Add an element to the array + callJMethod(jarr, "add", 1L) + # Check if get returns the same element + expect_equal(callJMethod(jarr, "get", 0L), 1L) +}) + +test_that("Call static methods", { + # Convert a boolean to a string + strTrue <- callJStatic("java.lang.String", "valueOf", TRUE) + expect_equal(strTrue, "true") +}) + +test_that("Manually garbage collect objects", { + jarr <- newJObject("java.util.ArrayList") + cleanup.jobj(jarr) + # Using a jobj after GC should throw an error + expect_error(print(jarr), "Error in invokeJava.*") +}) --- End diff -- Done
[GitHub] spark pull request #14775: [SPARK-16581][SPARKR] Make JVM backend calling fu...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14775#discussion_r76522995 --- Diff: R/pkg/R/jobj.R --- @@ -82,7 +82,20 @@ getClassName.jobj <- function(x) { callJMethod(cls, "getName") } -cleanup.jobj <- function(jobj) { +#' Garbage collect Java Objects +#' +#' Garbage collect an object allocated on Spark driver JVM heap. +#' +#' cleanup.jobj is a low level method that lets developers manually garbage collect objects +#' allocated using newJObject. This is only to be used for advanced use cases as objects allocated +#' on the JVM heap are automatically garbage collected when the corresponding R reference goes out +#' of scope. +#' +#' @param x the Java object that should be garbage collected. +#' @note cleanup.jobj since 2.0.1 +#' @export +cleanup.jobj <- function(x) { --- End diff -- Yeah that's a good point - but I've removed this as we don't need to export this for now
[GitHub] spark pull request #14775: [SPARK-16581][SPARKR] Make JVM backend calling fu...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14775#discussion_r76522987 --- Diff: R/pkg/R/jobj.R --- @@ -82,7 +82,20 @@ getClassName.jobj <- function(x) { callJMethod(cls, "getName") } -cleanup.jobj <- function(jobj) { +#' Garbage collect Java Objects +#' +#' Garbage collect an object allocated on Spark driver JVM heap. +#' +#' cleanup.jobj is a low level method that lets developers manually garbage collect objects +#' allocated using newJObject. This is only to be used for advanced use cases as objects allocated +#' on the JVM heap are automatically garbage collected when the corresponding R reference goes out +#' of scope. +#' +#' @param x the Java object that should be garbage collected. +#' @note cleanup.jobj since 2.0.1 --- End diff -- This is removed now
[GitHub] spark pull request #14775: [SPARK-16581][SPARKR] Make JVM backend calling fu...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14775#discussion_r76522986

--- Diff: R/pkg/R/backend.R ---
@@ -37,12 +51,42 @@ callJMethod <- function(objId, methodName, ...) {
  invokeJava(isStatic = FALSE, objId$id, methodName, ...)
 }
-# Call a static method on a specified className
+#' Call Static Java Methods
+#'
+#' Call a static method in the JVM running the Spark driver.
+#'
+#' @param className class containing the static method to invoke.
+#' @param methodName name of static method to invoke.
+#' @param ... parameters to pass to the Java method.
+#' @export
+#' @seealso callJMethod, newJObject
+#' @examples
+#' \dontrun{
+#' sparkR.session() # Need to have a Spark JVM running before calling callJStatic
+#' callJStatic("java.lang.System", "currentTimeMillis")
+#' callJStatic("java.lang.System", "getProperty", "java.home")
+#' }
 callJStatic <- function(className, methodName, ...) {
  invokeJava(isStatic = TRUE, className, methodName, ...)
 }
-# Create a new object of the specified class name
+#' Create Java Objects
+#'
+#' Create a new Java object in the JVM running the Spark driver.
+#'
+#' @param className name of the class to create
+#' @param ... arguments to be passed to the constructor
+#' @export
+#' @seealso callJMethod, callJStatic
--- End diff ---

Done the `seealso` and `return` - let's discuss the `alias`, `rdname` in the other thread.
[GitHub] spark pull request #14775: [SPARK-16581][SPARKR] Make JVM backend calling fu...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14775#discussion_r76522977

--- Diff: R/pkg/R/backend.R (same hunk as quoted above; this comment is anchored on the `#' @param className name of the class to create` line) ---

I tried to qualify this in the comment on what gets returned. The trouble is that we don't have an externally visible documentation of what types will get converted vs. what will not. Also I think this is bound to change with versions (like the SQL decimal change for example). However this is a very low level API that is only for advanced developers, so I wonder if we should just leave a pointer to the source file?
[GitHub] spark pull request #14775: [SPARK-16581][SPARKR] Make JVM backend calling fu...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14775#discussion_r76522953

--- Diff: R/pkg/R/backend.R (same hunk as quoted above; this comment is anchored on the `#' @param className class containing the static method to invoke.` line) ---

Done
[GitHub] spark pull request #14775: [SPARK-16581][SPARKR] Make JVM backend calling fu...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14775#discussion_r76522957

--- Diff: R/pkg/R/backend.R (same hunk as quoted above; this comment is anchored on the `#' @seealso callJMethod, newJObject` line) ---

Done
[GitHub] spark pull request #14775: [SPARK-16581][SPARKR] Make JVM backend calling fu...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14775#discussion_r76522952

--- Diff: R/pkg/R/backend.R ---
@@ -25,9 +25,23 @@ isInstanceOf <- function(jobj, className) {
  callJMethod(cls, "isInstance", jobj)
 }
-# Call a Java method named methodName on the object
-# specified by objId. objId should be a "jobj" returned
-# from the SparkRBackend.
+#' Call Java Methods
+#'
+#' Call a Java method in the JVM running the Spark driver.
+#'
+#' @param objId object to invoke the method on. Should be a "jobj" created by newJObject.
+#' @param methodName method name to call.
+#' @param ... parameters to pass to the Java method.
+#' @export
+#' @seealso callJStatic, newJObject
+#' @examples
--- End diff ---

Add the `@note` and `@return` to all the methods. I am not sure we need the `@aliases` or `@rdname` because these are S3 methods? The rdnames come out as `sparkR.callJMethod.rd` etc. and look fine to me.
[GitHub] spark pull request #14775: [SPARK-16581][SPARKR] Make JVM backend calling fu...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14775#discussion_r76522940

--- Diff: R/pkg/R/backend.R (same hunk as quoted above; this comment is anchored on the `#' @param objId object to invoke the method on. Should be a "jobj" created by newJObject.` line) ---

Done
[GitHub] spark pull request #14775: [SPARK-16581][SPARKR] Make JVM backend calling fu...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14775#discussion_r76522943

--- Diff: R/pkg/R/backend.R (same hunk as quoted above; this comment is anchored on the `#' @seealso callJStatic, newJObject` line) ---

Done
[GitHub] spark pull request #14775: [SPARK-16581][SPARKR] Make JVM backend calling fu...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14775#discussion_r76444282

--- Diff: R/pkg/NAMESPACE ---
@@ -363,4 +363,9 @@ S3method(structField, jobj)
 S3method(structType, jobj)
 S3method(structType, structField)
+export("newJObject")
+export("callJMethod")
+export("callJStatic")
+export("cleanup.jobj")
--- End diff ---

Thanks @sun-rui and @aloknsingh - I'll update the PR today.
[GitHub] spark issue #13584: [SPARK-15509][ML][SparkR] R MLlib algorithms should supp...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/13584

Yeah I was going to say that we need to handle cases where `labels_output` is also used. We could just add a numeric suffix, maybe?
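The numeric-suffix idea above could be sketched as a small base-R helper; the function name and call are purely illustrative, not code from the PR:

```r
# Hypothetical helper: given a desired column name and the existing column
# names, append _1, _2, ... until the name no longer collides.
uniqueColumnName <- function(base, existing) {
  name <- base
  suffix <- 0L
  while (name %in% existing) {
    suffix <- suffix + 1L
    name <- paste0(base, "_", suffix)
  }
  name
}

uniqueColumnName("labels_output", c("features", "labels_output"))
# "labels_output_1"
```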
[GitHub] spark issue #14774: [SparkR][BUILD]:ignore cran-check.out under R folder
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14774

LGTM. Thanks @wangmiao1981
[GitHub] spark issue #13584: [SPARK-15509][ML][SparkR] R MLlib algorithms should supp...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/13584

Hmm - the problem still seems to be relevant. @mengxr @junyangq Would one of you be able to look at this?
[GitHub] spark issue #13584: [SPARK-15509][ML][SparkR] R MLlib algorithms should supp...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/13584

@keypointt Is this PR still relevant?
[GitHub] spark issue #14046: [SPARK-16366][SPARKR] Fix time comparison failures in Sp...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14046

@sun-rui Is this PR still active? Can we close it if it's not relevant?
[GitHub] spark issue #14744: [SPARK-17178][SPARKR][SPARKSUBMIT] Allow to set sparkr s...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14744

LGTM. I agree that using SparkConf is preferable over environment variables -- but it would be good to make the documentation clear on when each option is used.
[GitHub] spark pull request #14744: [SPARK-17178][SPARKR][SPARKSUBMIT] Allow to set s...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14744#discussion_r76083596

--- Diff: docs/configuration.md ---
@@ -1752,6 +1752,13 @@
  Executable for executing R scripts in client modes for driver. Ignored in cluster modes.
+ spark.r.shell.command
+ R
+ Executable for executing R shell.
--- End diff ---

+1 would be good to explain how this is different from `spark.r.shell.command`
[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14783

Jenkins, ok to test
[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14783

Thanks @clarkfitzg -- I'll take a look at this tomorrow.
[GitHub] spark issue #14775: [SPARK-16581][SPARKR] Make JVM backend calling functions...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14775

cc @felixcheung @olarayej
[GitHub] spark pull request #14775: [SPARK-16581][SPARKR] Make JVM backend calling fu...
GitHub user shivaram opened a pull request: https://github.com/apache/spark/pull/14775

[SPARK-16581][SPARKR] Make JVM backend calling functions public

## What changes were proposed in this pull request?

This change exposes a public API in SparkR to create objects and call methods on the Spark driver JVM.

## How was this patch tested?

Unit tests, CRAN checks

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shivaram/spark-1 sparkr-java-api

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14775.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14775

commit 472072ae12c146bd1a7a844337c4496bbdfdbd2d
Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>
Date: 2016-08-22T19:37:38Z
First cut of making JVM functions public

commit d772285ab1d6ed7facd5db05b772ba752428a30f
Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>
Date: 2016-08-23T17:29:43Z
Improve documentation for JVM methods

commit 5f4b53ad03f6c6b5d4d643a8e6161856b0452704
Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>
Date: 2016-08-23T17:29:59Z
Merge branch 'master' of https://github.com/apache/spark into sparkr-java-api

commit 5f2936165e7c9fa00bd589563a6b0a324b1e7989
Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>
Date: 2016-08-23T18:30:52Z
Merge branch 'master' of https://github.com/apache/spark into sparkr-java-api

commit d267f2f271aa4e3f7bd8542fa1772ecc9788
Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>
Date: 2016-08-23T19:40:39Z
Add unit tests for JVM API
[GitHub] spark issue #14741: [SPARK-6832][SPARKR][WIP]Handle partial reads in SparkR
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14741

The value of `what` shouldn't change if we are retrying the read. If it was `integer` it should remain `integer`. There is a corner case of what happens if we, say, read the first byte of the integer and not the remaining 3 bytes -- one way to handle that would be to read everything into a raw byte stream and do the cast to integer later. But I think we can postpone this to a later PR.

Regarding your earlier question, the easiest way to retry is to run a while loop around `readBin` while there are more bytes to read. We can also have a fixed upper bound and throw an error after that many retries. Also I don't think SIGKILL is the appropriate signal to send - you might want to send SIGUSR1 (the code is 10 according to http://linuxandfriends.com/linux-signals/)
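The bounded retry loop described above might look like the following base-R sketch; the helper name, argument list and retry cap are illustrative, not code from the PR, and (as noted in the comment) a torn read inside a single integer is deliberately not handled here:

```r
# Hypothetical helper: keep calling readBin until `n` items of type `what`
# have arrived, giving up after `maxRetries` consecutive empty reads.
readBinFully <- function(con, what, n, size = NA_integer_, maxRetries = 100L) {
  result <- c()
  retries <- 0L
  while (length(result) < n && retries < maxRetries) {
    chunk <- readBin(con, what, n = n - length(result), size = size)
    if (length(chunk) == 0L) {
      retries <- retries + 1L   # nothing available yet; count this attempt
    } else {
      result <- c(result, chunk)
      retries <- 0L             # progress was made; reset the counter
    }
  }
  if (length(result) < n) {
    stop("readBinFully: gave up after ", maxRetries, " retries")
  }
  result
}
```

Reading everything as `raw` and casting with `readBin(rawConnection(...))` afterwards, as the comment suggests, would also cover the partial-integer corner case.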
[GitHub] spark issue #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptron Class...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14447

@keypointt The error is happening because of the `href` line in the `spark.mlp` documentation. If you move the opening brace to the previous line the error goes away, i.e. the code should look like:

#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html}{
#' Multilayer Perceptron}
[GitHub] spark issue #14759: [SPARK-16577][SPARKR] Add CRAN documentation checks to r...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14759

Merging this to master, branch-2.0
[GitHub] spark issue #14758: [SPARKR][MINOR] Add Xiangrui and Felix to maintainers
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14758

Hmm that's a good catch. Unfortunately it looks like CRAN only wants one person / one email address in the Maintainer field. This is also documented at http://r-pkgs.had.co.nz/description.html#author in the 'On CRAN' section.

I don't mind the `Authors@R` format -- I think it will generate a `Maintainers:` and `Authors:` automatically from it. One thing that would be good is to add the ASF website, JIRA and developer mailing lists. I think we can add them to the `URL` and `BugReports` fields? @felixcheung Do you want to send a PR for this?
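The `Authors@R` form mentioned above could look like the following DESCRIPTION fragment; the person entries, email and URLs here are illustrative placeholders, not what was actually committed:

```
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
                    email = "maintainer@example.com"),
             person(family = "The Apache Software Foundation",
                    role = c("aut", "cph")))
URL: http://www.apache.org/ http://spark.apache.org/
BugReports: https://issues.apache.org/jira/browse/SPARK
```

With this form, R generates the single `Maintainer:` field from the `cre` role automatically, which is what CRAN requires.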
[GitHub] spark pull request #14759: [SPARK-16577][SPARKR] Add CRAN documentation chec...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14759#discussion_r75781286

--- Diff: R/run-tests.sh ---
@@ -26,14 +26,35 @@
 rm -f $LOGFILE
 SPARK_TESTING=1 $FWDIR/../bin/spark-submit --driver-java-options "-Dlog4j.configuration=file:$FWDIR/log4j.properties" --conf spark.hadoop.fs.default.name="file:///" $FWDIR/pkg/tests/run-all.R 2>&1 | tee -a $LOGFILE
 FAILED=$((PIPESTATUS[0]||$FAILED))
+# Also run the documentation tests for CRAN
+CRAN_CHECK_LOG_FILE=$FWDIR/cran-check.out
+rm -f $CRAN_CHECK_LOG_FILE
+
+NO_TESTS=1 NO_MANUAL=1 $FWDIR/check-cran.sh 2>&1 | tee -a $CRAN_CHECK_LOG_FILE
+FAILED=$((PIPESTATUS[0]||$FAILED))
+
+NUM_CRAN_WARNING="$(grep -c WARNING$ $CRAN_CHECK_LOG_FILE)"
+NUM_CRAN_ERROR="$(grep -c ERROR$ $CRAN_CHECK_LOG_FILE)"
+NUM_CRAN_NOTES="$(grep -c NOTE$ $CRAN_CHECK_LOG_FILE)"
+
 if [[ $FAILED != 0 ]]; then
   cat $LOGFILE
   echo -en "\033[31m" # Red
   echo "Had test failures; see logs."
   echo -en "\033[0m" # No color
   exit -1
 else
-  echo -en "\033[32m" # Green
-  echo "Tests passed."
-  echo -en "\033[0m" # No color
+  # We have 2 existing NOTEs for new maintainer, attach()
+  # We have one more NOTE in Jenkins due to "No repository set"
+  if [[ $NUM_CRAN_WARNING != 0 || $NUM_CRAN_ERROR != 0 || $NUM_CRAN_NOTES -gt 3 ]]; then
--- End diff ---

I think we can leave this as `3` for now to satisfy Jenkins -- it means that if a PR is tested locally it might pass despite a new note being added, but Jenkins should anyway catch this during code review.
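The `NUM_CRAN_*` counters in this hunk rely on `grep -c` with an end-of-line anchor. The pattern can be exercised in isolation with a made-up log file; the check lines below are illustrative, not real `R CMD check` output:

```shell
# Build a fake check log, then count lines ending in WARNING / ERROR / NOTE,
# mirroring the NUM_CRAN_* counters in run-tests.sh.
CRAN_CHECK_LOG_FILE=$(mktemp)
cat > "$CRAN_CHECK_LOG_FILE" <<'EOF'
* checking CRAN incoming feasibility ... NOTE
* checking R code for possible problems ... WARNING
* checking Rd cross-references ... NOTE
* checking examples ... OK
EOF

# grep -c prints the match count; the $ anchor ignores e.g. "NOTE" mid-line.
NUM_CRAN_WARNING="$(grep -c 'WARNING$' "$CRAN_CHECK_LOG_FILE")"
NUM_CRAN_ERROR="$(grep -c 'ERROR$' "$CRAN_CHECK_LOG_FILE")"
NUM_CRAN_NOTES="$(grep -c 'NOTE$' "$CRAN_CHECK_LOG_FILE")"
echo "warnings=$NUM_CRAN_WARNING errors=$NUM_CRAN_ERROR notes=$NUM_CRAN_NOTES"
```

Note that `grep -c` exits non-zero when it counts zero matches (as for ERROR above), which is harmless inside a command substitution but matters under `set -e`.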
[GitHub] spark issue #14734: [SPARK-16508][SPARKR] doc updates and more CRAN check fi...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14734

It was mostly manual -- I ran `dev/merge_spark_pr` and it prompted me to say the cherry-pick has conflicts. I then opened another terminal and found that the conflicts were in 3 files. I manually fixed those three files, ran `git add` and then `git cherry-pick --continue` to finish the commit. Then I came back to the script window to do the push.
[GitHub] spark issue #14734: [SPARK-16508][SPARKR] doc updates and more CRAN check fi...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14734

I just did the merge in https://github.com/apache/spark/commit/b65b041af8b64413c7d460d4ea110b2044d6f36e -- Will be great if you can run CRAN checks using this and make sure I didn't miss anything.
[GitHub] spark pull request #14759: [SPARK-16577][SPARKR] Add CRAN documentation chec...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14759#discussion_r75776464

--- Diff: R/run-tests.sh (same hunk as quoted above; this comment is anchored on the `if [[ $NUM_CRAN_WARNING != 0 || $NUM_CRAN_ERROR != 0 || $NUM_CRAN_NOTES -gt 3 ]]` line) ---

Yeah I think it's a Jenkins-specific problem as there is no default CRAN repository set in the R config of that machine.
[GitHub] spark issue #14734: [SPARK-16508][SPARKR] doc updates and more CRAN check fi...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14734

Hmm - let me give it a shot. If not I will open a fresh PR for `branch-2.0` after doing a manual edit per file.
[GitHub] spark issue #14759: [SPARK-16577][SPARKR] Add CRAN documentation checks to r...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14759

So running `R/check-cran.sh` out of the box on master branch will fail. However if you run `NO_TESTS=1 R/check-cran.sh` it should pass. Basically we can't use check-cran.sh to run unit tests on the master branch (or on any unreleased branch as it is).
[GitHub] spark pull request #14734: [SPARK-16508][SPARKR] doc updates and more CRAN c...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14734#discussion_r75772452

--- Diff: R/pkg/NAMESPACE ---
@@ -1,5 +1,9 @@
 # Imports from base R
-importFrom(methods, setGeneric, setMethod, setOldClass)
+# Do not include stats:: "rpois", "runif" - causes error at runtime
+importFrom("methods", "setGeneric", "setMethod", "setOldClass")
+importFrom("methods", "is", "new", "signature", "show")
--- End diff --

It might have to do with the R version used. I am using R 3.2.1 on my machine, while this is from R 3.3.0 -- using a later R version is obviously better.
[GitHub] spark issue #14759: [SPARK-16577][SPARKR] Add CRAN documentation checks to r...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14759

The new check I added doesn't run unit tests using `R CMD check` -- it only checks the documentation etc., and thus works on the master branch as well.
[GitHub] spark issue #14759: [SPARK-16577][SPARKR] Add CRAN documentation checks to r...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14759

@junyangq I guess we could -- but it should be pretty simple to figure out what the problem is given the entire log file? FWIW the error we have right now will be fixed when we merge #14734
[GitHub] spark pull request #14759: [SPARK-16577][SPARKR] Add CRAN documentation chec...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14759#discussion_r75766425

--- Diff: R/run-tests.sh ---
@@ -26,14 +26,30 @@ rm -f $LOGFILE
 SPARK_TESTING=1 $FWDIR/../bin/spark-submit --driver-java-options "-Dlog4j.configuration=file:$FWDIR/log4j.properties" --conf spark.hadoop.fs.default.name="file:///" $FWDIR/pkg/tests/run-all.R 2>&1 | tee -a $LOGFILE
 FAILED=$((PIPESTATUS[0]||$FAILED))
+# Also run the documentation tests for CRAN
+CRAN_CHECK_LOG_FILE=$FWDIR/cran-check.out
+rm -f $CRAN_CHECK_LOG_FILE
+CRAN_CHECK_OPTIONS="--no-tests --no-manual" $FWDIR/check-cran.sh 2>&1 | tee -a $CRAN_CHECK_LOG_FILE
--- End diff --

Yeah - I think it will get generated by CRAN online (we submit a source package to them, I think) - we will also do a check before we release a package.
[GitHub] spark pull request #14759: [SPARK-16577][SPARKR] Add CRAN documentation chec...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14759#discussion_r75758514

--- Diff: R/run-tests.sh ---
@@ -26,14 +26,30 @@ rm -f $LOGFILE
 SPARK_TESTING=1 $FWDIR/../bin/spark-submit --driver-java-options "-Dlog4j.configuration=file:$FWDIR/log4j.properties" --conf spark.hadoop.fs.default.name="file:///" $FWDIR/pkg/tests/run-all.R 2>&1 | tee -a $LOGFILE
 FAILED=$((PIPESTATUS[0]||$FAILED))
+# Also run the documentation tests for CRAN
+CRAN_CHECK_LOG_FILE=$FWDIR/cran-check.out
+rm -f $CRAN_CHECK_LOG_FILE
+CRAN_CHECK_OPTIONS="--no-tests --no-manual" $FWDIR/check-cran.sh 2>&1 | tee -a $CRAN_CHECK_LOG_FILE
--- End diff --

Yeah, I've turned it off because of the lack of LaTeX packages on Jenkins. I don't think this matters for now, as the PDF manual isn't something we use. But we can revisit this later if required.
[GitHub] spark pull request #14759: [SPARK-16577][SPARKR] Add CRAN documentation chec...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14759#discussion_r75758122

--- Diff: R/run-tests.sh ---
@@ -26,14 +26,30 @@ rm -f $LOGFILE
 SPARK_TESTING=1 $FWDIR/../bin/spark-submit --driver-java-options "-Dlog4j.configuration=file:$FWDIR/log4j.properties" --conf spark.hadoop.fs.default.name="file:///" $FWDIR/pkg/tests/run-all.R 2>&1 | tee -a $LOGFILE
 FAILED=$((PIPESTATUS[0]||$FAILED))
+# Also run the documentation tests for CRAN
+CRAN_CHECK_LOG_FILE=$FWDIR/cran-check.out
+rm -f $CRAN_CHECK_LOG_FILE
+CRAN_CHECK_OPTIONS="--no-tests --no-manual" $FWDIR/check-cran.sh 2>&1 | tee -a $CRAN_CHECK_LOG_FILE
+FAILED=$((PIPESTATUS[0]||$FAILED))
+
+HAS_CRAN_WARNING="$(grep -c WARNING $CRAN_CHECK_LOG_FILE)"
--- End diff --

Good point - added that.
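The `FAILED=$((PIPESTATUS[0]||$FAILED))` line that recurs in these diffs exists because piping through `tee` hides the real exit status: `$?` after `cmd | tee log` reports tee's status, not cmd's. A small bash-specific sketch (PIPESTATUS is a bash array, not POSIX sh):

```shell
#!/usr/bin/env bash
# Sketch: preserve the exit status of the left side of a pipeline.
# `false | tee ...` exits 0 in $? because tee succeeds, so the script
# reads ${PIPESTATUS[0]} on the very next line instead.
FAILED=0
false | tee /dev/null                 # left side fails, tee succeeds
FAILED=$((PIPESTATUS[0] || FAILED))   # record the failure
true | tee /dev/null
FAILED=$((PIPESTATUS[0] || FAILED))   # a later success does not clear it
echo "$FAILED"                        # 1
```

The arithmetic `||` accumulates failures across steps, so one failing stage anywhere in the script makes the final exit non-zero.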
[GitHub] spark issue #14758: [SPARKR][MINOR] Add Xiangrui and Felix to maintainers
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14758

Merging to master, branch-2.0
[GitHub] spark issue #14759: [SPARK-16577][SPARKR] Add CRAN documentation checks to r...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14759

cc @felixcheung @junyangq
[GitHub] spark pull request #14759: [SPARK-16577][SPARKR] Add CRAN documentation chec...
GitHub user shivaram opened a pull request: https://github.com/apache/spark/pull/14759 [SPARK-16577][SPARKR] Add CRAN documentation checks to run-tests.sh

## What changes were proposed in this pull request?
This change adds CRAN documentation checks to be run as a part of `R/run-tests.sh`. As this script is also used by Jenkins, this means that we will get documentation checks on every PR going forward.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/shivaram/spark-1 sparkr-cran-jenkins
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14759.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14759

commit 349d95d0ce933d6670d5326ab560ccef420b814e
Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>
Date: 2016-08-21T20:43:15Z
Add CRAN documentation checks to run-tests.sh
[GitHub] spark issue #14735: [SPARK-17173][SPARKR] R MLlib refactor, cleanup, reforma...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14735

Leaving it out of branch-2.0 sounds good to me.
[GitHub] spark issue #14758: [SPARKR][MINOR] Add Xiangrui and Felix to maintainers
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14758

cc @mengxr @felixcheung FYI - This is mostly to ensure that we can have more maintainers who can update the CRAN submissions. This shouldn't affect anything else on the development side.
[GitHub] spark pull request #14758: [SPARKR][MINOR] Add Xiangrui and Felix to maintai...
GitHub user shivaram opened a pull request: https://github.com/apache/spark/pull/14758 [SPARKR][MINOR] Add Xiangrui and Felix to maintainers

## What changes were proposed in this pull request?
This change adds Xiangrui Meng and Felix Cheung to the maintainers field in the package description.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/shivaram/spark-1 sparkr-maintainers
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14758.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14758

commit 3ab82a0d3828faa084b7bf77aebb62c7d89db775
Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>
Date: 2016-08-22T18:05:13Z
Add Xiangrui and Felix to maintainers
[GitHub] spark issue #14735: [SPARK-17173][SPARKR] R MLlib refactor, cleanup, reforma...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14735

@felixcheung I didn't look at the code very closely, but will this change be required in `branch-2.0` as well? If so, the merge might be hard to
[GitHub] spark pull request #14734: [SPARK-16508][SPARKR] doc updates and more CRAN c...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14734#discussion_r75720021

--- Diff: R/pkg/R/DataFrame.R ---
@@ -3058,7 +3057,7 @@ setMethod("str",
 #' @note drop since 2.0.0
 setMethod("drop",
 signature(x = "SparkDataFrame"),
- function(x, col, ...) {
--- End diff --

Thanks - that sounds good
[GitHub] spark issue #14734: [SPARK-16508][SPARKR] doc updates and more CRAN check fi...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14734

LGTM. I had a couple of minor comments inline.
[GitHub] spark pull request #14734: [SPARK-16508][SPARKR] doc updates and more CRAN c...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14734#discussion_r75718809

--- Diff: R/pkg/R/generics.R ---
@@ -1339,7 +1339,6 @@ setGeneric("spark.naiveBayes", function(data, formula, ...) { standardGeneric("s
 setGeneric("spark.survreg", function(data, formula) { standardGeneric("spark.survreg") })
 #' @rdname spark.lda
-#' @param ... Additional parameters to tune LDA.
--- End diff --

Never mind - I see that this is moved to mllib.R
[GitHub] spark pull request #14734: [SPARK-16508][SPARKR] doc updates and more CRAN c...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14734#discussion_r75718694

--- Diff: R/pkg/R/generics.R ---
@@ -1339,7 +1339,6 @@ setGeneric("spark.naiveBayes", function(data, formula, ...) { standardGeneric("s
 setGeneric("spark.survreg", function(data, formula) { standardGeneric("spark.survreg") })
 #' @rdname spark.lda
-#' @param ... Additional parameters to tune LDA.
--- End diff --

Just checking - removing `...` here is intentional?
[GitHub] spark pull request #14734: [SPARK-16508][SPARKR] doc updates and more CRAN c...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14734#discussion_r75718243

--- Diff: R/pkg/R/DataFrame.R ---
@@ -3058,7 +3057,7 @@ setMethod("str",
 #' @note drop since 2.0.0
 setMethod("drop",
 signature(x = "SparkDataFrame"),
- function(x, col, ...) {
--- End diff --

Just to clarify, removing `...` is intentional? Just wondering, as we have the `@param` documentation above
[GitHub] spark pull request #14734: [SPARK-16508][SPARKR] doc updates and more CRAN c...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14734#discussion_r75717898

--- Diff: R/pkg/NAMESPACE ---
@@ -1,5 +1,9 @@
 # Imports from base R
-importFrom(methods, setGeneric, setMethod, setOldClass)
+# Do not include stats:: "rpois", "runif" - causes error at runtime
+importFrom("methods", "setGeneric", "setMethod", "setOldClass")
+importFrom("methods", "is", "new", "signature", "show")
--- End diff --

Do these things show up as CRAN warnings? I don't see them on my machine
[GitHub] spark issue #14734: [SPARK-16508][SPARKR] doc updates and more CRAN check fi...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14734

@junyangq Could you take one more look? I will also do a pass now
[GitHub] spark issue #14743: [SparkR][Minor] Fix Cache Folder Path in Windows
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14743

BTW LGTM. Merging this PR into master, branch-2.0
[GitHub] spark issue #14743: [SparkR][Minor] Fix Cache Folder Path in Windows
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14743

Thanks @HyukjinKwon -- this is a bit surprising, as it was only recently that you fixed the Windows tests in https://github.com/apache/spark/commit/1c403733b89258e57daf7b8b0a2011981ad7ed8a Let's file a separate JIRA for these test failures -- and I don't think we have Windows infrastructure in the AMPLab Jenkins cluster. If we can set up a Travis one that runs something like nightly / weekly, that will be great
[GitHub] spark issue #14743: [SparkR][Minor] Fix Cache Folder Path in Windows
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14743

@HyukjinKwon can you test this on Windows?
[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14705

I cherry-picked this PR into `branch-2.0` in https://github.com/apache/spark/commit/0297896119e11f23da4b14f62f50ec72b5fac57f -- The merge was a little awkward, but I think I got it to work correctly. The CRAN checks look fine to me locally, but it would be good if somebody else can also verify this.
[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14705

Yeah, so we can do a couple of things. One is we try to cherry-pick this PR to branch-2.0 and then fix all the merge conflicts that are thrown. I think that should handle cases where the method doesn't exist in 2.0, etc. The other option is to create a new PR that is targeted at branch-2.0 (i.e. the cherry-pick / merge can be done as a part of development) and then we can review and merge it. Let me know if you or @junyangq want to try the second option -- if not, I can try the first one and see how many conflicts there are.
[GitHub] spark issue #14735: [MINOR][SPARKR] R MLlib refactor, cleanup, reformat, fix...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14735

This seems like a big enough change that it might be good to have a JIRA for it?
[GitHub] spark issue #14489: [MINOR][SparkR] R API documentation for "coltypes" is co...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14489

The warning I got in `branch-2.0` before this PR was
```
Duplicated \argument entries in documentation object 'coltypes':
  'x'
```
With the backport this warning goes away, so nothing needs to be fixed.
[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14705

Yeah, I think we will be more careful about adding new algorithms to override existing R methods -- but given that `glm` is already exposed I'd think we can make an exception for just this one. @mengxr Any thoughts on this?
[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14705

Thanks @junyangq -- I just ran the check on `branch-2.0` with this PR, and in addition to `glm` there were two warnings for `...` in `first` and `unpersist` that we can fix in this PR. For `glm` it's kind of unfortunate, but I think we need to just go ahead and add those args and say that those are not used by SparkR. Is there any other workaround you have in mind? We can do that in a separate PR if required.
```
Undocumented arguments in documentation object 'first'
  '...'

Undocumented arguments in documentation object 'glm'
  'weights' 'subset' 'na.action' 'start' 'etastart' 'mustart' 'offset'
  'control' 'model' 'method' 'x' 'y' 'contrasts' '...'

Undocumented arguments in documentation object 'unpersist'
  '...'
```
[GitHub] spark issue #14489: [MINOR][SparkR] R API documentation for "coltypes" is co...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14489

@felixcheung FYI I'm also backporting this to branch-2.0, as it gets rid of one of the CRAN warnings.
[GitHub] spark pull request #14639: [SPARK-17054][SPARKR] SparkR can not run in yarn-...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14639#discussion_r75344423

--- Diff: R/pkg/R/sparkR.R ---
@@ -344,6 +344,7 @@ sparkRHive.init <- function(jsc = NULL) {
 #' @note sparkR.session since 2.0.0
 sparkR.session <- function(
 master = "",
+ deployMode = "",
--- End diff --

We should move this to the end of the param list, as @felixcheung says. On a different note, is there a reason this should be a parameter to `sparkR.session`? It would be good if the users didn't have to think of deploy mode and we inferred it from the spark-submit options.
[GitHub] spark issue #14639: [SPARK-17054][SPARKR] SparkR can not run in yarn-cluster...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14639

@zjffdu Thanks for clarifying -- I now remember that in the YARN cluster mode there is no `SPARK_HOME` set. However, in this case the JVM comes up first and the R process then connects to it. So in such cases we should never have to download Spark, as Spark is already running. Thus I think @zjffdu's change of not calling install.spark in cluster mode is the right fix.
[GitHub] spark issue #14626: [SPARK-16519][SPARKR] Handle SparkR RDD generics that cr...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14626

LGTM. Thanks @felixcheung - Merging this to master and branch-2.0
[GitHub] spark issue #14641: [Minor] [SparkR] spark.glm weightCol should in the signa...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14641

Thanks - Merging this to master
[GitHub] spark pull request #14626: [SPARK-16519][SPARKR] Handle SparkR RDD generics ...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14626#discussion_r74866185

--- Diff: R/pkg/R/generics.R ---
@@ -152,9 +146,9 @@
 setGeneric("getNumPartitions", function(x) { standardGeneric("getNumPartitions") })

 # @export
 setGeneric("numPartitions", function(x) { standardGeneric("numPartitions") })

-# @rdname persist
-# @export
-setGeneric("persist", function(x, newLevel) { standardGeneric("persist") })
+setGeneric("partitionByRDD", function(x, ...) { standardGeneric("partitionByRDD") })
--- End diff --

Ah yes - you are right. We do have a `partitionBy` in windowSpec. Renaming the RDD one is fine -- we might need to move the `partitionBy` generic to the right place in `generics.R`?
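[Editor's illustration] The rename avoids having two incompatible S4 generics registered under the same name; a minimal sketch of the resulting layout (simplified signatures, not the full generics.R):

```r
# The RDD variant gets an internal suffix so it no longer competes with the
# user-facing partitionBy generic used by the WindowSpec API.
setGeneric("partitionByRDD", function(x, ...) { standardGeneric("partitionByRDD") })

# The public generic keeps the short name; both can now coexist because each
# name resolves to exactly one S4 generic with a single signature.
setGeneric("partitionBy", function(x, ...) { standardGeneric("partitionBy") })
```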
[GitHub] spark issue #14182: [SPARK-16444][SparkR]: Isotonic Regression wrapper in Sp...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14182

This looks fine to me - @felixcheung feel free to merge this when you think it's good to go
[GitHub] spark issue #14522: [Spark-16508][SparkR] Split docs for arrange and orderBy...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14522

Yeah LGTM. Merging this to master, branch-2.0 -- Thanks @junyangq
[GitHub] spark issue #14626: [SPARK-16519][SPARKR] Handle SparkR RDD generics that cr...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14626

I had one minor question about partitionBy -- otherwise change LGTM. Thanks @felixcheung
[GitHub] spark pull request #14626: [SPARK-16519][SPARKR] Handle SparkR RDD generics ...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14626#discussion_r74803446

--- Diff: R/pkg/R/generics.R ---
@@ -152,9 +146,9 @@
 setGeneric("getNumPartitions", function(x) { standardGeneric("getNumPartitions") })

 # @export
 setGeneric("numPartitions", function(x) { standardGeneric("numPartitions") })

-# @rdname persist
-# @export
-setGeneric("persist", function(x, newLevel) { standardGeneric("persist") })
+setGeneric("partitionByRDD", function(x, ...) { standardGeneric("partitionByRDD") })
--- End diff --

Is `partitionBy` a problem that the CRAN checks report? From what I can see we only have a `windowPartitionBy` function which shouldn't collide.