[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...

2017-03-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16330


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...

2017-03-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16330#discussion_r106794245
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2909,6 +2910,30 @@ test_that("Collect on DataFrame when NAs exists at the top of a timestamp column
   expect_equal(class(ldf3$col3), c("POSIXct", "POSIXt"))
 })
 
+compare_list <- function(list1, list2) {
+  # get testthat to show the diff by first making the 2 lists equal in length
+  expect_equal(length(list1), length(list2))
+  l <- max(length(list1), length(list2))
+  length(list1) <- l
+  length(list2) <- l
+  expect_equal(sort(list1, na.last = TRUE), sort(list2, na.last = TRUE))
+}
+
+# This should always be the last test in this test file.
+test_that("No extra files are created in SPARK_HOME by starting session and making calls", {
+  # Check that it is not creating any extra file.
+  # Does not check the tempdir which would be cleaned up after.
+  filesAfter <- list.files(path = sparkRDir, all.files = TRUE)
+
+  expect_true(length(sparkRFilesBefore) > 0)
+  # first, ensure derby.log is not there
+  expect_false("derby.log" %in% filesAfter)
+  # second, ensure only spark-warehouse is created when calling SparkSession, enableHiveSupport = F
--- End diff --

agreed. 
updated, hope it's better now.





[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...

2017-03-18 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/16330#discussion_r106789593
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2909,6 +2910,30 @@ test_that("Collect on DataFrame when NAs exists at the top of a timestamp column
   expect_equal(class(ldf3$col3), c("POSIXct", "POSIXt"))
 })
 
+compare_list <- function(list1, list2) {
+  # get testthat to show the diff by first making the 2 lists equal in length
+  expect_equal(length(list1), length(list2))
+  l <- max(length(list1), length(list2))
+  length(list1) <- l
+  length(list2) <- l
+  expect_equal(sort(list1, na.last = TRUE), sort(list2, na.last = TRUE))
+}
+
+# This should always be the last test in this test file.
+test_that("No extra files are created in SPARK_HOME by starting session and making calls", {
+  # Check that it is not creating any extra file.
+  # Does not check the tempdir which would be cleaned up after.
+  filesAfter <- list.files(path = sparkRDir, all.files = TRUE)
+
+  expect_true(length(sparkRFilesBefore) > 0)
+  # first, ensure derby.log is not there
+  expect_false("derby.log" %in% filesAfter)
+  # second, ensure only spark-warehouse is created when calling SparkSession, enableHiveSupport = F
--- End diff --

I'm a little confused how these two setdiff commands map to with or without Hive support. Can we make this a bit easier to understand?





[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...

2017-03-16 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16330#discussion_r106359526
  
--- Diff: core/src/main/scala/org/apache/spark/api/r/RRDD.scala ---
@@ -127,6 +127,13 @@ private[r] object RRDD {
   sparkConf.setExecutorEnv(name.toString, value.toString)
 }
 
+if (sparkEnvirMap.containsKey("spark.r.sql.default.derby.dir") &&
--- End diff --

setting `derby.stream.error.file` is all we need to move `derby.log` - I'd like to proceed with this change to make the cut for the 2.1.1 release unless there are serious concerns?
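For reference, the approach being discussed can be sketched as follows. The helper object and target directory are illustrative, not the actual Spark change, but `derby.stream.error.file` is the standard Derby property for relocating `derby.log`:

```scala
// Hedged sketch: redirect derby.log by setting the standard Derby JVM
// property before the embedded metastore first starts. The helper name
// and target path are illustrative.
object DerbyLogRedirect {
  def redirect(targetDir: String): Unit = {
    // Respect an explicit user setting; only fill in a default.
    if (System.getProperty("derby.stream.error.file") == null) {
      val log = new java.io.File(targetDir, "derby.log")
      System.setProperty("derby.stream.error.file", log.getAbsolutePath)
    }
  }
}
```

Because Derby reads the property when it boots a database, this only has an effect if it runs before the first Derby-backed connection is opened.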






[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...

2017-03-12 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16330#discussion_r105565957
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2897,6 +2898,27 @@ test_that("Collect on DataFrame when NAs exists at the top of a timestamp column
   expect_equal(class(ldf3$col3), c("POSIXct", "POSIXt"))
 })
 
+compare_list <- function(list1, list2) {
+  # get testthat to show the diff by first making the 2 lists equal in length
+  expect_equal(length(list1), length(list2))
+  l <- max(length(list1), length(list2))
+  length(list1) <- l
+  length(list2) <- l
+  expect_equal(sort(list1, na.last = TRUE), sort(list2, na.last = TRUE))
+}
+
+# This should always be the last test in this test file.
+test_that("No extra files are created in SPARK_HOME by starting session and making calls", {
+  # Check that it is not creating any extra file.
+  # Does not check the tempdir which would be cleaned up after.
+  filesAfter <- list.files(path = file.path(Sys.getenv("SPARK_HOME"), "R"), all.files = TRUE)
+
+  expect_true(length(sparkHomeFileBefore) > 0)
+  compare_list(sparkHomeFileBefore, filesBefore)
--- End diff --

updated.





[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...

2017-03-12 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16330#discussion_r105565947
  
--- Diff: core/src/main/scala/org/apache/spark/api/r/RRDD.scala ---
@@ -127,6 +127,13 @@ private[r] object RRDD {
   sparkConf.setExecutorEnv(name.toString, value.toString)
 }
 
+if (sparkEnvirMap.containsKey("spark.r.sql.default.derby.dir") &&
--- End diff --

in further testing, I found that `derby.system.home` affects metastore_db 
too, but I'm changing to set `derby.stream.error.file` instead





[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...

2017-03-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16330#discussion_r105562562
  
--- Diff: core/src/main/scala/org/apache/spark/api/r/RRDD.scala ---
@@ -127,6 +127,13 @@ private[r] object RRDD {
   sparkConf.setExecutorEnv(name.toString, value.toString)
 }
 
+if (sparkEnvirMap.containsKey("spark.r.sql.default.derby.dir") &&
--- End diff --

Introduce "spark.sql.derby.system.home" in SQLConf as an internal config? Set the default to `System.getProperty("derby.system.home")`? Then, in R, we can set "derby.system.home" to `tempdir()`?

Does it sound ok? @cloud-fan @rxin 





[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...

2017-03-11 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/16330#discussion_r105548957
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2897,6 +2898,27 @@ test_that("Collect on DataFrame when NAs exists at the top of a timestamp column
   expect_equal(class(ldf3$col3), c("POSIXct", "POSIXt"))
 })
 
+compare_list <- function(list1, list2) {
+  # get testthat to show the diff by first making the 2 lists equal in length
+  expect_equal(length(list1), length(list2))
+  l <- max(length(list1), length(list2))
--- End diff --

Got it - that sounds good





[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...

2017-03-11 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16330#discussion_r105546348
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2897,6 +2898,27 @@ test_that("Collect on DataFrame when NAs exists at the top of a timestamp column
   expect_equal(class(ldf3$col3), c("POSIXct", "POSIXt"))
 })
 
+compare_list <- function(list1, list2) {
+  # get testthat to show the diff by first making the 2 lists equal in length
+  expect_equal(length(list1), length(list2))
+  l <- max(length(list1), length(list2))
--- End diff --

here's what it looks like
```
1. Failure: No extra files are created in SPARK_HOME by starting session and making calls (@test_sparkSQL.R#2917)
length(list1) not equal to length(list2).
1/1 mismatches
[1] 22 - 23 == -1


2. Failure: No extra files are created in SPARK_HOME by starting session and making calls (@test_sparkSQL.R#2917)
sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE).
3/23 mismatches
x[21]: "unit-tests.out"
y[21]: "spark-warehouse"

x[22]: "WINDOWS.md"
y[22]: "unit-tests.out"

x[23]: NA
y[23]: "WINDOWS.md"
```





[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...

2017-03-11 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16330#discussion_r105545762
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2897,6 +2898,27 @@ test_that("Collect on DataFrame when NAs exists at the top of a timestamp column
   expect_equal(class(ldf3$col3), c("POSIXct", "POSIXt"))
 })
 
+compare_list <- function(list1, list2) {
+  # get testthat to show the diff by first making the 2 lists equal in length
+  expect_equal(length(list1), length(list2))
+  l <- max(length(list1), length(list2))
+  length(list1) <- l
+  length(list2) <- l
+  expect_equal(sort(list1, na.last = TRUE), sort(list2, na.last = TRUE))
+}
+
+# This should always be the last test in this test file.
+test_that("No extra files are created in SPARK_HOME by starting session and making calls", {
+  # Check that it is not creating any extra file.
+  # Does not check the tempdir which would be cleaned up after.
+  filesAfter <- list.files(path = file.path(Sys.getenv("SPARK_HOME"), "R"), all.files = TRUE)
+
+  expect_true(length(sparkHomeFileBefore) > 0)
+  compare_list(sparkHomeFileBefore, filesBefore)
--- End diff --

I'm trying to catch a few things with this - will add some comments.
For instance:
1) what's created by calling `sparkR.session(enableHiveSupport = F)` (every test except test_sparkSQL.R)
2) what's created by calling `sparkR.session(enableHiveSupport = T)` (test_sparkSQL.R)

Unfortunately this doesn't quite work as expected - it should have failed instead of passing, because the Scala tests run before this and have already caused spark-warehouse and metastore_db to be created before any R code runs.

reworking that now.





[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...

2017-03-11 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16330#discussion_r105545724
  
--- Diff: core/src/main/scala/org/apache/spark/api/r/RRDD.scala ---
@@ -127,6 +127,13 @@ private[r] object RRDD {
   sparkConf.setExecutorEnv(name.toString, value.toString)
 }
 
+if (sparkEnvirMap.containsKey("spark.r.sql.default.derby.dir") &&
--- End diff --

well, in revisiting this I thought it would be easier to minimize the impact by making this R only.
it would be much simpler if we made the derby log always go to tmp for all language bindings.





[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...

2017-03-11 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16330#discussion_r105545712
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2897,6 +2898,27 @@ test_that("Collect on DataFrame when NAs exists at 
the top of a timestamp column
   expect_equal(class(ldf3$col3), c("POSIXct", "POSIXt"))
 })
 
+compare_list <- function(list1, list2) {
+  # get testthat to show the diff by first making the 2 lists equal in length
+  expect_equal(length(list1), length(list2))
+  l <- max(length(list1), length(list2))
--- End diff --

the idea is to show enough information in the log without having to rerun the check manually.
the first check will show the numeric values, but it wouldn't say how exactly they are different.
the next check (moved into compare_list() here) gets testthat to dump the delta too, but first it must make the 2 lists the same size.

in fact, all of these are well tested in the "Check masked functions" test in test_context.R, just duplicated here.
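The pad-then-compare idea described above can be sketched outside testthat too (Scala here; the function name is illustrative): sorting both lists and padding the shorter one to a common length makes an element-wise comparison line up, so a failure report can show the actual differing entries rather than only a length mismatch.

```scala
// Sketch of the pad-then-compare trick: sort both lists, pad the shorter
// one with None, then zip so mismatching entries line up for a readable diff.
def listDiff(a: Seq[String], b: Seq[String]): Seq[(Option[String], Option[String])] = {
  val l = math.max(a.length, b.length)
  val pa = a.sorted.map(Option(_)).padTo(l, None)
  val pb = b.sorted.map(Option(_)).padTo(l, None)
  pa.zip(pb).filter { case (x, y) => x != y }  // keep only the mismatches
}
```

For example, comparing `Seq("a", "b")` with `Seq("a", "b", "c")` yields one mismatched pair, `(None, Some("c"))`, pointing directly at the extra file name.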






[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...

2017-03-11 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/16330#discussion_r105545530
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2897,6 +2898,27 @@ test_that("Collect on DataFrame when NAs exists at the top of a timestamp column
   expect_equal(class(ldf3$col3), c("POSIXct", "POSIXt"))
 })
 
+compare_list <- function(list1, list2) {
+  # get testthat to show the diff by first making the 2 lists equal in length
+  expect_equal(length(list1), length(list2))
+  l <- max(length(list1), length(list2))
--- End diff --

The lengths should be equal if we get to this line? Or am I missing something?





[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...

2017-03-11 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/16330#discussion_r105545521
  
--- Diff: core/src/main/scala/org/apache/spark/api/r/RRDD.scala ---
@@ -127,6 +127,13 @@ private[r] object RRDD {
   sparkConf.setExecutorEnv(name.toString, value.toString)
 }
 
+if (sparkEnvirMap.containsKey("spark.r.sql.default.derby.dir") &&
--- End diff --

It's a little awkward that this is set in RRDD. Is there a more general place we can set this across languages / runtimes (i.e. for Python / Scala as well)?

@cloud-fan @gatorsmile Any thoughts on this ?





[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...

2017-03-11 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/16330#discussion_r105545556
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2897,6 +2898,27 @@ test_that("Collect on DataFrame when NAs exists at the top of a timestamp column
   expect_equal(class(ldf3$col3), c("POSIXct", "POSIXt"))
 })
 
+compare_list <- function(list1, list2) {
+  # get testthat to show the diff by first making the 2 lists equal in length
+  expect_equal(length(list1), length(list2))
+  l <- max(length(list1), length(list2))
+  length(list1) <- l
+  length(list2) <- l
+  expect_equal(sort(list1, na.last = TRUE), sort(list2, na.last = TRUE))
+}
+
+# This should always be the last test in this test file.
+test_that("No extra files are created in SPARK_HOME by starting session and making calls", {
+  # Check that it is not creating any extra file.
+  # Does not check the tempdir which would be cleaned up after.
+  filesAfter <- list.files(path = file.path(Sys.getenv("SPARK_HOME"), "R"), all.files = TRUE)
+
+  expect_true(length(sparkHomeFileBefore) > 0)
+  compare_list(sparkHomeFileBefore, filesBefore)
--- End diff --

I'm not sure what we are checking by having both `sparkHomeFilesBefore` and `filesBefore` -- wouldn't just one of them do the job? And if not, can we add a comment here?





[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...

2016-12-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16330#discussion_r93351538
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ---
@@ -104,6 +104,12 @@ class SparkHadoopUtil extends Logging {
   }
   val bufferSize = conf.get("spark.buffer.size", "65536")
   hadoopConf.set("io.file.buffer.size", bufferSize)
+
+  if (conf.contains("spark.sql.default.derby.dir")) {
--- End diff --

@yhuai 
Spark uses Derby for the metastore by default. Generally metastore_db and derby.log get created in the current directory. This creates a problem in more restrictive environments, such as when running as an R package, where the guideline is not to write anything to the user's space (except under tempdir).

Just checking now, it also seems to be the case when running the pyspark shell.

It looks like this is new behavior since 2.0.0. Would it make sense to always default derby/metastore to tempdir, unless it is running in an application directory that would be cleaned up when the job is done (eg. a YARN cluster)?
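For context on the tempdir suggestion: embedded Derby resolves relative database paths such as `metastore_db` against the `derby.system.home` JVM property. A hedged sketch of the "default to tempdir" idea (the function name is illustrative, and it must run before the SparkContext / metastore is instantiated):

```scala
// Hedged sketch: point embedded Derby's working area at a temp dir so
// metastore_db is not created in the current directory. Must run before
// anything opens a Derby database; the function name is illustrative.
def defaultDerbyHome(tmpDir: String): Unit = {
  // Only provide a default; never override an explicit user setting.
  if (System.getProperty("derby.system.home") == null) {
    System.setProperty("derby.system.home", tmpDir)
  }
}
```

Note that as discussed later in this thread, `derby.system.home` moves metastore_db as well as derby.log, which is why the final change sets `derby.stream.error.file` when only the log needs relocating.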






[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...

2016-12-19 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/16330#discussion_r93176308
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ---
@@ -104,6 +104,12 @@ class SparkHadoopUtil extends Logging {
   }
   val bufferSize = conf.get("spark.buffer.size", "65536")
   hadoopConf.set("io.file.buffer.size", bufferSize)
+
+  if (conf.contains("spark.sql.default.derby.dir")) {
--- End diff --

Why do we need to introduce this flag?





[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...

2016-12-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16330#discussion_r92984064
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -381,6 +381,10 @@ sparkR.session <- function(
 deployMode <- sparkConfigMap[["spark.submit.deployMode"]]
   }
 
+  if (!exists("spark.sql.default.derby.dir", envir = sparkConfigMap)) {
--- End diff --

I see. This PR is just to fix the default behavior for releasing SparkR on CRAN.

We need to add code comments to explain what this is for, for future code maintainers/readers.





[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...

2016-12-18 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16330#discussion_r92955799
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -381,6 +381,10 @@ sparkR.session <- function(
 deployMode <- sparkConfigMap[["spark.submit.deployMode"]]
   }
 
+  if (!exists("spark.sql.default.derby.dir", envir = sparkConfigMap)) {
--- End diff --

The metastore might not always be Derby, right?





[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...

2016-12-18 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16330#discussion_r92955787
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -381,6 +381,10 @@ sparkR.session <- function(
 deployMode <- sparkConfigMap[["spark.submit.deployMode"]]
   }
 
+  if (!exists("spark.sql.default.derby.dir", envir = sparkConfigMap)) {
--- End diff --

What is `spark.sql.default.derby.dir`?





[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...

2016-12-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16330#discussion_r92953159
  
--- Diff: core/src/main/scala/org/apache/spark/api/r/RRDD.scala ---
@@ -127,6 +127,13 @@ private[r] object RRDD {
   sparkConf.setExecutorEnv(name.toString, value.toString)
 }
 
+if (sparkEnvirMap.containsKey("spark.sql.default.derby.dir") &&
+System.getProperty("derby.system.home") == null) {
+  // This must be set before SparkContext is instantiated.
+  System.setProperty("derby.system.home",
+      sparkEnvirMap.get("spark.sql.default.derby.dir").toString)
--- End diff --

Is there a better, more general (than R) place for this, in SparkConf or SQL?





[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...

2016-12-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16330#discussion_r92953125
  
--- Diff: R/pkg/inst/tests/testthat/test_aFirstTest.R ---
@@ -0,0 +1,39 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# With testthat, tests are run in alphabetic order on the test filenames.
+# Name this test starting with an `a` to make sure it is run first
+context("a first test for package")
+
+filesBefore <- list.files(path = file.path(Sys.getenv("SPARK_HOME"), "R"), all.files = TRUE)
+
+sparkSession <- sparkR.session(enableHiveSupport = FALSE)
+
+test_that("No extra files are created in SPARK_HOME", {
+  # Check that it is not creating any extra file.
+  # Does not check the tempdir which would be cleaned up after.
+  filesAfter <- list.files(path = file.path(Sys.getenv("SPARK_HOME"), "R"), all.files = TRUE)
+
+  # get testthat to show the diff by first making the 2 lists equal in length
+  expect_equal(length(filesBefore), length(filesAfter))
+  l <- max(length(filesBefore), length(filesAfter))
+  length(filesBefore) <- l
+  length(filesAfter) <- l
+  expect_equal(sort(filesBefore, na.last = TRUE), sort(filesAfter, na.last = TRUE))
--- End diff --

this will fail until merging with #16290
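The snapshot-before / compare-after pattern this test file relies on is general. A minimal Scala analogue (helper names are illustrative):

```scala
// Sketch of the snapshot-before / compare-after pattern: list a directory's
// contents, run the code under test, list again, and report only the names
// that appeared in between.
def listAll(dir: java.io.File): Set[String] =
  Option(dir.listFiles()).map(_.map(_.getName).toSet).getOrElse(Set.empty)

def extraFilesCreatedBy(dir: java.io.File)(body: => Unit): Set[String] = {
  val before = listAll(dir)
  body                        // run the code under test
  listAll(dir) -- before      // file names created while `body` ran
}
```

Running the body in the middle (rather than comparing two independent listings) keeps the check self-contained, which is why the R version has to rely on test-file ordering instead.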





[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...

2016-12-18 Thread felixcheung
GitHub user felixcheung opened a pull request:

https://github.com/apache/spark/pull/16330

[SPARK-18817][SPARKR][SQL] change derby log output and metastore to temp dir

## What changes were proposed in this pull request?

Passes R `tempdir()` (this is the R session temp dir, shared with other temp files/dirs) to the JVM, and overrides the Hadoop (Hive) conf for JDO if not set (eg. hive-site.xml not loaded), so that metastore_db is relocated.

Sets a System property for the Derby home dir to move derby.log.

## How was this patch tested?

Manually, unit tests

With this, these are relocated to under /tmp
```
/tmp# ls RtmprNffBx/
derby.log
metastore_db
```


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/felixcheung/spark rderby

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16330.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16330


commit 338b3c46ed2435daf2e780eab1ed04be3ed0b3ea
Author: Felix Cheung 
Date:   2016-12-18T22:44:51Z

set derby dir



