[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16330 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16330#discussion_r106794245

--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2909,6 +2910,30 @@ test_that("Collect on DataFrame when NAs exists at the top of a timestamp column
   expect_equal(class(ldf3$col3), c("POSIXct", "POSIXt"))
 })
+
+compare_list <- function(list1, list2) {
+  # get testthat to show the diff by first making the 2 lists equal in length
+  expect_equal(length(list1), length(list2))
+  l <- max(length(list1), length(list2))
+  length(list1) <- l
+  length(list2) <- l
+  expect_equal(sort(list1, na.last = TRUE), sort(list2, na.last = TRUE))
+}
+
+# This should always be the last test in this test file.
+test_that("No extra files are created in SPARK_HOME by starting session and making calls", {
+  # Check that it is not creating any extra file.
+  # Does not check the tempdir which would be cleaned up after.
+  filesAfter <- list.files(path = sparkRDir, all.files = TRUE)
+
+  expect_true(length(sparkRFilesBefore) > 0)
+  # first, ensure derby.log is not there
+  expect_false("derby.log" %in% filesAfter)
+  # second, ensure only spark-warehouse is created when calling SparkSession, enableHiveSupport = F
--- End diff --

agreed. updated, hope it's better now.
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/16330#discussion_r106789593

--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2909,6 +2910,30 @@ test_that("Collect on DataFrame when NAs exists at the top of a timestamp column
   expect_equal(class(ldf3$col3), c("POSIXct", "POSIXt"))
 })
+
+compare_list <- function(list1, list2) {
+  # get testthat to show the diff by first making the 2 lists equal in length
+  expect_equal(length(list1), length(list2))
+  l <- max(length(list1), length(list2))
+  length(list1) <- l
+  length(list2) <- l
+  expect_equal(sort(list1, na.last = TRUE), sort(list2, na.last = TRUE))
+}
+
+# This should always be the last test in this test file.
+test_that("No extra files are created in SPARK_HOME by starting session and making calls", {
+  # Check that it is not creating any extra file.
+  # Does not check the tempdir which would be cleaned up after.
+  filesAfter <- list.files(path = sparkRDir, all.files = TRUE)
+
+  expect_true(length(sparkRFilesBefore) > 0)
+  # first, ensure derby.log is not there
+  expect_false("derby.log" %in% filesAfter)
+  # second, ensure only spark-warehouse is created when calling SparkSession, enableHiveSupport = F
--- End diff --

I'm a little confused how these two setdiff commands map to with or without hive support. Can we make this a bit easier to understand?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16330#discussion_r106359526

--- Diff: core/src/main/scala/org/apache/spark/api/r/RRDD.scala ---
@@ -127,6 +127,13 @@ private[r] object RRDD {
   sparkConf.setExecutorEnv(name.toString, value.toString)
 }
+if (sparkEnvirMap.containsKey("spark.r.sql.default.derby.dir") &&
--- End diff --

setting `derby.stream.error.file` is all we need to move `derby.log` - I'd like to proceed with this change to make the cut for the 2.1.1 release, unless we have serious concerns?
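For reference, a minimal standalone sketch of the approach described above. `derby.stream.error.file` is the documented Derby property; the temp-dir target path and class name here are illustrative, not the actual Spark code.

```java
// Sketch: redirect only derby.log via the standard Derby property
// "derby.stream.error.file", leaving metastore_db placement untouched.
// This must happen before the embedded Derby driver is loaded.
public class DerbyLogToTmp {
    public static void main(String[] args) {
        if (System.getProperty("derby.stream.error.file") == null) {
            // illustrative target: the JVM temp dir
            String target = System.getProperty("java.io.tmpdir")
                + java.io.File.separator + "derby.log";
            System.setProperty("derby.stream.error.file", target);
        }
        System.out.println(System.getProperty("derby.stream.error.file"));
    }
}
```

The `== null` guard mirrors the PR's behavior of not overriding a value the user has already set.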
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16330#discussion_r105565957

--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2897,6 +2898,27 @@ test_that("Collect on DataFrame when NAs exists at the top of a timestamp column
   expect_equal(class(ldf3$col3), c("POSIXct", "POSIXt"))
 })
+
+compare_list <- function(list1, list2) {
+  # get testthat to show the diff by first making the 2 lists equal in length
+  expect_equal(length(list1), length(list2))
+  l <- max(length(list1), length(list2))
+  length(list1) <- l
+  length(list2) <- l
+  expect_equal(sort(list1, na.last = TRUE), sort(list2, na.last = TRUE))
+}
+
+# This should always be the last test in this test file.
+test_that("No extra files are created in SPARK_HOME by starting session and making calls", {
+  # Check that it is not creating any extra file.
+  # Does not check the tempdir which would be cleaned up after.
+  filesAfter <- list.files(path = file.path(Sys.getenv("SPARK_HOME"), "R"), all.files = TRUE)
+
+  expect_true(length(sparkHomeFileBefore) > 0)
+  compare_list(sparkHomeFileBefore, filesBefore)
--- End diff --

updated.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16330#discussion_r105565947

--- Diff: core/src/main/scala/org/apache/spark/api/r/RRDD.scala ---
@@ -127,6 +127,13 @@ private[r] object RRDD {
   sparkConf.setExecutorEnv(name.toString, value.toString)
 }
+if (sparkEnvirMap.containsKey("spark.r.sql.default.derby.dir") &&
--- End diff --

in further testing, I found that `derby.system.home` affects metastore_db too, so I'm changing to set `derby.stream.error.file` instead
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16330#discussion_r105562562

--- Diff: core/src/main/scala/org/apache/spark/api/r/RRDD.scala ---
@@ -127,6 +127,13 @@ private[r] object RRDD {
   sparkConf.setExecutorEnv(name.toString, value.toString)
 }
+if (sparkEnvirMap.containsKey("spark.r.sql.default.derby.dir") &&
--- End diff --

Introduce "spark.sql.derby.system.home" in SQLConf as an internal config? Set the default to `System.getProperty("derby.system.home")`? Then, in R, we can set "derby.system.home" to `tempdir()`. Does that sound OK? @cloud-fan @rxin
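A rough sketch of the proposal above: an internal config key whose default falls back to the JVM property, with the caller (e.g. SparkR) able to point it at a temp dir. The key name "spark.sql.derby.system.home" is the one suggested in the comment; the plain map standing in for SQLConf and the class name are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class DerbyHomeConf {
    // key name taken from the review comment above
    static final String KEY = "spark.sql.derby.system.home";

    // An explicit config setting wins; otherwise fall back to the
    // current JVM system property (which may be null).
    static String resolve(Map<String, String> conf) {
        String v = conf.get(KEY);
        return v != null ? v : System.getProperty("derby.system.home");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // e.g. SparkR would put tempdir() here
        conf.put(KEY, System.getProperty("java.io.tmpdir"));
        // must be applied before the SparkContext / Derby driver starts
        System.setProperty("derby.system.home", resolve(conf));
        System.out.println(System.getProperty("derby.system.home"));
    }
}
```

Note the caveat raised later in the thread: `derby.system.home` relocates metastore_db as well as derby.log, which is why the PR ultimately used `derby.stream.error.file`.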
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/16330#discussion_r105548957

--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2897,6 +2898,27 @@ test_that("Collect on DataFrame when NAs exists at the top of a timestamp column
   expect_equal(class(ldf3$col3), c("POSIXct", "POSIXt"))
 })
+
+compare_list <- function(list1, list2) {
+  # get testthat to show the diff by first making the 2 lists equal in length
+  expect_equal(length(list1), length(list2))
+  l <- max(length(list1), length(list2))
--- End diff --

Got it - that sounds good
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16330#discussion_r105546348

--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2897,6 +2898,27 @@ test_that("Collect on DataFrame when NAs exists at the top of a timestamp column
   expect_equal(class(ldf3$col3), c("POSIXct", "POSIXt"))
 })
+
+compare_list <- function(list1, list2) {
+  # get testthat to show the diff by first making the 2 lists equal in length
+  expect_equal(length(list1), length(list2))
+  l <- max(length(list1), length(list2))
--- End diff --

here's what it looks like

```
1. Failure: No extra files are created in SPARK_HOME by starting session and making calls (@test_sparkSQL.R#2917)
length(list1) not equal to length(list2).
1/1 mismatches
[1] 22 - 23 == -1

2. Failure: No extra files are created in SPARK_HOME by starting session and making calls (@test_sparkSQL.R#2917)
sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE).
3/23 mismatches
x[21]: "unit-tests.out"
y[21]: "spark-warehouse"

x[22]: "WINDOWS.md"
y[22]: "unit-tests.out"

x[23]: NA
y[23]: "WINDOWS.md"
```
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16330#discussion_r105545762

--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2897,6 +2898,27 @@ test_that("Collect on DataFrame when NAs exists at the top of a timestamp column
   expect_equal(class(ldf3$col3), c("POSIXct", "POSIXt"))
 })
+
+compare_list <- function(list1, list2) {
+  # get testthat to show the diff by first making the 2 lists equal in length
+  expect_equal(length(list1), length(list2))
+  l <- max(length(list1), length(list2))
+  length(list1) <- l
+  length(list2) <- l
+  expect_equal(sort(list1, na.last = TRUE), sort(list2, na.last = TRUE))
+}
+
+# This should always be the last test in this test file.
+test_that("No extra files are created in SPARK_HOME by starting session and making calls", {
+  # Check that it is not creating any extra file.
+  # Does not check the tempdir which would be cleaned up after.
+  filesAfter <- list.files(path = file.path(Sys.getenv("SPARK_HOME"), "R"), all.files = TRUE)
+
+  expect_true(length(sparkHomeFileBefore) > 0)
+  compare_list(sparkHomeFileBefore, filesBefore)
--- End diff --

I'm trying to catch a few things with this - will add some comments. For instance:
1) what's created by calling `sparkR.session(enableHiveSupport = F)` (every test except test_sparkSQL.R)
2) what's created by calling `sparkR.session(enableHiveSupport = T)` (test_sparkSQL.R)

Unfortunately this doesn't quite work as expected - it should have failed instead of passing - because the Scala tests run first and cause spark-warehouse and metastore_db to be created before any R code runs. Reworking that now.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16330#discussion_r105545724

--- Diff: core/src/main/scala/org/apache/spark/api/r/RRDD.scala ---
@@ -127,6 +127,13 @@ private[r] object RRDD {
   sparkConf.setExecutorEnv(name.toString, value.toString)
 }
+if (sparkEnvirMap.containsKey("spark.r.sql.default.derby.dir") &&
--- End diff --

well, in revisiting this I thought it would be easier to minimize the impact by making this R-only. It would be much simpler if we made the derby log always go to tmp for all language bindings.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16330#discussion_r105545712

--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2897,6 +2898,27 @@ test_that("Collect on DataFrame when NAs exists at the top of a timestamp column
   expect_equal(class(ldf3$col3), c("POSIXct", "POSIXt"))
 })
+
+compare_list <- function(list1, list2) {
+  # get testthat to show the diff by first making the 2 lists equal in length
+  expect_equal(length(list1), length(list2))
+  l <- max(length(list1), length(list2))
--- End diff --

the idea is to show enough information in the log without having to rerun the check manually. The first check shows the numeric lengths, but it doesn't say how exactly they differ. The next check (moved into compare_list() here) gets testthat to dump the delta too, but first it must make the 2 lists the same size. In fact, all of this is well tested in the "Check masked functions" test in test_context.R; it's just duplicated here.
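The padding trick explained above (make both lists the same length, sort with missing values last, then compare element-wise so the diff tool reports exactly which entries differ) can be sketched outside R too. This Java version is purely illustrative; the file names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

public class CompareLists {
    // Pad to the target length with nulls, then sort with nulls last,
    // mirroring `length(list) <- l` and `sort(..., na.last = TRUE)` in the R test.
    static List<String> padded(List<String> in, int len) {
        List<String> out = new ArrayList<>(in);
        while (out.size() < len) {
            out.add(null);
        }
        out.sort(Comparator.nullsLast(Comparator.naturalOrder()));
        return out;
    }

    public static void main(String[] args) {
        // hypothetical before/after file listings
        List<String> before = Arrays.asList("DESCRIPTION", "NAMESPACE", "R");
        List<String> after = Arrays.asList("DESCRIPTION", "NAMESPACE", "R", "derby.log");
        int l = Math.max(before.size(), after.size());
        List<String> a = padded(before, l);
        List<String> b = padded(after, l);
        // element-wise comparison now lines up, so only real differences print
        for (int i = 0; i < l; i++) {
            if (!Objects.equals(a.get(i), b.get(i))) {
                System.out.println("mismatch at " + i + ": " + a.get(i) + " vs " + b.get(i));
            }
        }
    }
}
```

Without the padding, a single extra file would shift every later element and make the whole tail look different; equalizing the lengths first confines the report to the genuinely new entries.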
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/16330#discussion_r105545530

--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2897,6 +2898,27 @@ test_that("Collect on DataFrame when NAs exists at the top of a timestamp column
   expect_equal(class(ldf3$col3), c("POSIXct", "POSIXt"))
 })
+
+compare_list <- function(list1, list2) {
+  # get testthat to show the diff by first making the 2 lists equal in length
+  expect_equal(length(list1), length(list2))
+  l <- max(length(list1), length(list2))
--- End diff --

The lengths should be equal if we get to this line? Or am I missing something?
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/16330#discussion_r105545521

--- Diff: core/src/main/scala/org/apache/spark/api/r/RRDD.scala ---
@@ -127,6 +127,13 @@ private[r] object RRDD {
   sparkConf.setExecutorEnv(name.toString, value.toString)
 }
+if (sparkEnvirMap.containsKey("spark.r.sql.default.derby.dir") &&
--- End diff --

It's a little awkward that this is set in RRDD. Is there a more general place we can set this across languages/runtimes (i.e. for Python/Scala as well)? @cloud-fan @gatorsmile Any thoughts on this?
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/16330#discussion_r105545556

--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2897,6 +2898,27 @@ test_that("Collect on DataFrame when NAs exists at the top of a timestamp column
   expect_equal(class(ldf3$col3), c("POSIXct", "POSIXt"))
 })
+
+compare_list <- function(list1, list2) {
+  # get testthat to show the diff by first making the 2 lists equal in length
+  expect_equal(length(list1), length(list2))
+  l <- max(length(list1), length(list2))
+  length(list1) <- l
+  length(list2) <- l
+  expect_equal(sort(list1, na.last = TRUE), sort(list2, na.last = TRUE))
+}
+
+# This should always be the last test in this test file.
+test_that("No extra files are created in SPARK_HOME by starting session and making calls", {
+  # Check that it is not creating any extra file.
+  # Does not check the tempdir which would be cleaned up after.
+  filesAfter <- list.files(path = file.path(Sys.getenv("SPARK_HOME"), "R"), all.files = TRUE)
+
+  expect_true(length(sparkHomeFileBefore) > 0)
+  compare_list(sparkHomeFileBefore, filesBefore)
--- End diff --

I'm not sure what we are checking by having both `sparkHomeFilesBefore` and `filesBefore` -- wouldn't just one of them do the job, and if not, can we add a comment here?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16330#discussion_r93351538

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ---
@@ -104,6 +104,12 @@ class SparkHadoopUtil extends Logging {
   }
   val bufferSize = conf.get("spark.buffer.size", "65536")
   hadoopConf.set("io.file.buffer.size", bufferSize)
+
+  if (conf.contains("spark.sql.default.derby.dir")) {
--- End diff --

@yhuai Spark uses Derby for the metastore by default, and metastore_db and derby.log generally get created in the current directory. This creates a problem for more restrictive environments, such as running as an R package, where the guideline is not to write anything to the user's space (unless under tempdir). Checking just now, this also seems to be the case when running the pyspark shell, and it looks like this is new behavior since 2.0.0. Would it make sense to always default derby/metastore to tempdir, unless we are running in an application directory that would be cleaned out when the job is done (e.g. a YARN cluster)?
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16330#discussion_r93176308

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ---
@@ -104,6 +104,12 @@ class SparkHadoopUtil extends Logging {
   }
   val bufferSize = conf.get("spark.buffer.size", "65536")
   hadoopConf.set("io.file.buffer.size", bufferSize)
+
+  if (conf.contains("spark.sql.default.derby.dir")) {
--- End diff --

Why do we need to introduce this flag?
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16330#discussion_r92984064

--- Diff: R/pkg/R/sparkR.R ---
@@ -381,6 +381,10 @@ sparkR.session <- function(
   deployMode <- sparkConfigMap[["spark.submit.deployMode"]]
 }
+
+if (!exists("spark.sql.default.derby.dir", envir = sparkConfigMap)) {
--- End diff --

I see. This PR is just to fix the default behavior for releasing SparkR on CRAN. We need to add code comments to explain this for future code maintainers/readers.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16330#discussion_r92955799

--- Diff: R/pkg/R/sparkR.R ---
@@ -381,6 +381,10 @@ sparkR.session <- function(
   deployMode <- sparkConfigMap[["spark.submit.deployMode"]]
 }
+
+if (!exists("spark.sql.default.derby.dir", envir = sparkConfigMap)) {
--- End diff --

The metastore might not always be Derby, right?
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16330#discussion_r92955787

--- Diff: R/pkg/R/sparkR.R ---
@@ -381,6 +381,10 @@ sparkR.session <- function(
   deployMode <- sparkConfigMap[["spark.submit.deployMode"]]
 }
+
+if (!exists("spark.sql.default.derby.dir", envir = sparkConfigMap)) {
--- End diff --

What is `spark.sql.default.derby.dir`?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16330#discussion_r92953159

--- Diff: core/src/main/scala/org/apache/spark/api/r/RRDD.scala ---
@@ -127,6 +127,13 @@ private[r] object RRDD {
   sparkConf.setExecutorEnv(name.toString, value.toString)
 }
+if (sparkEnvirMap.containsKey("spark.sql.default.derby.dir") &&
+    System.getProperty("derby.system.home") == null) {
+  // This must be set before SparkContext is instantiated.
+  System.setProperty("derby.system.home",
+    sparkEnvirMap.get("spark.sql.default.derby.dir").toString)
--- End diff --

Is there a better, more general (than R) place for this, in SparkConf or SQL?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16330#discussion_r92953125

--- Diff: R/pkg/inst/tests/testthat/test_aFirstTest.R ---
@@ -0,0 +1,39 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# With testthat, tests are run in alphabetic order on the test filenames.
+# Name this test starting with an `a` to make sure it is run first
+context("a first test for package")
+
+filesBefore <- list.files(path = file.path(Sys.getenv("SPARK_HOME"), "R"), all.files = TRUE)
+
+sparkSession <- sparkR.session(enableHiveSupport = FALSE)
+
+test_that("No extra files are created in SPARK_HOME", {
+  # Check that it is not creating any extra file.
+  # Does not check the tempdir which would be cleaned up after.
+  filesAfter <- list.files(path = file.path(Sys.getenv("SPARK_HOME"), "R"), all.files = TRUE)
+
+  # get testthat to show the diff by first making the 2 lists equal in length
+  expect_equal(length(filesBefore), length(filesAfter))
+  l <- max(length(filesBefore), length(filesAfter))
+  length(filesBefore) <- l
+  length(filesAfter) <- l
+  expect_equal(sort(filesBefore, na.last = TRUE), sort(filesAfter, na.last = TRUE))
--- End diff --

this will fail until merged with #16290
GitHub user felixcheung opened a pull request: https://github.com/apache/spark/pull/16330

[SPARK-18817][SPARKR][SQL] change derby log output and metastore to temp dir

## What changes were proposed in this pull request?

Passes R `tempdir()` (the R session temp dir, shared with other temp files/dirs) to the JVM; overrides the Hadoop (Hive) conf for JDO if not set (e.g. hive-site.xml not loaded) so metastore_db is relocated; and sets the system property for the Derby home dir to move derby.log.

## How was this patch tested?

Manually, and with unit tests.

With this, these are relocated to under /tmp:

```
/tmp# ls RtmprNffBx/
derby.log  metastore_db
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/felixcheung/spark rderby

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16330.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16330

commit 338b3c46ed2435daf2e780eab1ed04be3ed0b3ea
Author: Felix Cheung
Date: 2016-12-18T22:44:51Z

    set derby dir