[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2017-03-10 Thread shivaram
Github user shivaram closed the pull request at:

https://github.com/apache/spark/pull/16290


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2017-02-27 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r103276097
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -376,6 +377,12 @@ sparkR.session <- function(
 overrideEnvs(sparkConfigMap, paramMap)
   }
 
+  # NOTE(shivaram): Set default warehouse dir to tmpdir to meet CRAN requirements
+  # See SPARK-18817 for more details
+  if (!exists("spark.sql.default.warehouse.dir", envir = sparkConfigMap)) {
--- End diff --

Ah, I see - I will try to use `SessionState` and see if it can avoid having to create a new option.





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2017-02-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r102373053
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -376,6 +377,12 @@ sparkR.session <- function(
 overrideEnvs(sparkConfigMap, paramMap)
   }
 
+  # NOTE(shivaram): Set default warehouse dir to tmpdir to meet CRAN requirements
+  # See SPARK-18817 for more details
+  if (!exists("spark.sql.default.warehouse.dir", envir = sparkConfigMap)) {
--- End diff --

Actually we can; `SessionState.conf.settings` contains all the user-set entries.





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2017-02-17 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r101883075
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -376,6 +377,12 @@ sparkR.session <- function(
 overrideEnvs(sparkConfigMap, paramMap)
   }
 
+  # NOTE(shivaram): Set default warehouse dir to tmpdir to meet CRAN requirements
+  # See SPARK-18817 for more details
+  if (!exists("spark.sql.default.warehouse.dir", envir = sparkConfigMap)) {
--- End diff --

After rethinking it, we might not need to add an extra SQL conf. We just need to know whether the value of `spark.sql.warehouse.dir` came from the user or is the original default. If it is the default, R can simply change it.

Maybe it is a good-to-have feature for users to know whether a SQLConf value comes from the user or from the default. cc @cloud-fan
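The distinction above - a built-in default versus a user-supplied value - can be sketched in plain R, using a base-R environment as a stand-in for the session config. The helper name `setDefaultIfUnset` is hypothetical, not SparkR API:

```r
# Hypothetical helper: apply a default only when the user has not set the key.
setDefaultIfUnset <- function(conf, key, value) {
  if (!exists(key, envir = conf)) {
    assign(key, value, envir = conf)
  }
}

# No user value: the tempdir() default is applied.
conf <- new.env()
setDefaultIfUnset(conf, "spark.sql.warehouse.dir", tempdir())

# A user-supplied value is left untouched.
userConf <- new.env()
assign("spark.sql.warehouse.dir", "/data/warehouse", envir = userConf)
setDefaultIfUnset(userConf, "spark.sql.warehouse.dir", tempdir())
```

With this shape no second config key is needed; the only question is whether `spark.sql.warehouse.dir` was set explicitly.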





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92971642
  
--- Diff: R/pkg/inst/tests/testthat/test_context.R ---
@@ -72,6 +72,20 @@ test_that("repeatedly starting and stopping SparkSession", {
   }
 })
 
+test_that("Default warehouse dir should be set to tempdir", {
+  sparkR.session.stop()
+  sparkR.session(enableHiveSupport = FALSE)
+
+  # Create a temporary table
+  sql("CREATE TABLE people_warehouse_test")
+  # spark-warehouse should be written only to tempdir() and not the current working directory
+  res <- list.files(path = ".", pattern = ".*spark-warehouse.*",
--- End diff --

That I agree completely





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-18 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92968560
  
--- Diff: R/pkg/inst/tests/testthat/test_context.R ---
@@ -72,6 +72,20 @@ test_that("repeatedly starting and stopping SparkSession", {
   }
 })
 
+test_that("Default warehouse dir should be set to tempdir", {
+  sparkR.session.stop()
+  sparkR.session(enableHiveSupport = FALSE)
+
+  # Create a temporary table
+  sql("CREATE TABLE people_warehouse_test")
+  # spark-warehouse should be written only to tempdir() and not the current working directory
+  res <- list.files(path = ".", pattern = ".*spark-warehouse.*",
--- End diff --

I think my bigger concern is that tests are usually run all at once - i.e. core, sql, hive and then python, R. And there are no guarantees that other modules' tests won't create files inside `SPARK_HOME`, AFAIK. So while we can check some basic things with our test, I don't think verifying a global property is always possible.





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-18 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92958552
  
--- Diff: R/pkg/inst/tests/testthat/test_context.R ---
@@ -72,6 +72,20 @@ test_that("repeatedly starting and stopping SparkSession", {
   }
 })
 
+test_that("Default warehouse dir should be set to tempdir", {
+  sparkR.session.stop()
+  sparkR.session(enableHiveSupport = FALSE)
+
+  # Create a temporary table
+  sql("CREATE TABLE people_warehouse_test")
+  # spark-warehouse should be written only to tempdir() and not the current working directory
+  res <- list.files(path = ".", pattern = ".*spark-warehouse.*",
--- End diff --

I'm not sure why it needs to run first? Because the default warehouse dir is in `tempdir`, even if `sparkR.session` is called before it, it shouldn't create any warehouse dir in `SPARK_HOME/`?





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92953413
  
--- Diff: R/pkg/inst/tests/testthat/test_context.R ---
@@ -72,6 +72,20 @@ test_that("repeatedly starting and stopping SparkSession", {
   }
 })
 
+test_that("Default warehouse dir should be set to tempdir", {
+  sparkR.session.stop()
+  sparkR.session(enableHiveSupport = FALSE)
+
+  # Create a temporary table
+  sql("CREATE TABLE people_warehouse_test")
+  # spark-warehouse should be written only to tempdir() and not the current working directory
+  res <- list.files(path = ".", pattern = ".*spark-warehouse.*",
--- End diff --

I think a couple of other test files would have called `sparkR.session` already, so I'd propose adding a new test file explicitly named so that it runs first, i.e.
https://github.com/apache/spark/pull/16330/files#diff-5ff1ba5d1751f3b1cc96a567e9ab25ffR18





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-18 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92951374
  
--- Diff: R/pkg/inst/tests/testthat/test_context.R ---
@@ -72,6 +72,20 @@ test_that("repeatedly starting and stopping SparkSession", {
   }
 })
 
+test_that("Default warehouse dir should be set to tempdir", {
+  sparkR.session.stop()
+  sparkR.session(enableHiveSupport = FALSE)
+
+  # Create a temporary table
+  sql("CREATE TABLE people_warehouse_test")
+  # spark-warehouse should be written only to tempdir() and not the current working directory
+  res <- list.files(path = ".", pattern = ".*spark-warehouse.*",
--- End diff --

Ah, I see - I will check this today. I think if `SPARK_HOME` is accessible I can just call `list.files` with that as `path`.
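A minimal sketch of that check, assuming `SPARK_HOME` is exposed as an environment variable (falling back to the current directory otherwise):

```r
# Resolve the root directory to scan: SPARK_HOME if set, "." otherwise.
sparkHome <- Sys.getenv("SPARK_HOME", unset = ".")

# Recursively look for stray spark-warehouse directories under that root.
res <- list.files(path = sparkHome, pattern = ".*spark-warehouse.*",
                  recursive = TRUE, include.dirs = TRUE)

# An empty character vector means nothing leaked outside tempdir().
length(res)
```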





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92949262
  
--- Diff: R/pkg/inst/tests/testthat/test_context.R ---
@@ -72,6 +72,20 @@ test_that("repeatedly starting and stopping SparkSession", {
   }
 })
 
+test_that("Default warehouse dir should be set to tempdir", {
+  sparkR.session.stop()
+  sparkR.session(enableHiveSupport = FALSE)
+
+  # Create a temporary table
+  sql("CREATE TABLE people_warehouse_test")
+  # spark-warehouse should be written only to tempdir() and not the current working directory
+  res <- list.files(path = ".", pattern = ".*spark-warehouse.*",
--- End diff --

I was testing this, and it looked like the current directory (`.`) was `SPARK_HOME/R/pkg/inst/tests/testthat`, while `spark-warehouse` would have been in `SPARK_HOME/R`.





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-18 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92935954
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -964,10 +970,16 @@ object StaticSQLConf {
 }
   }
 
+  val DEFAULT_WAREHOUSE_PATH = buildConf("spark.sql.default.warehouse.dir")
--- End diff --

For the internal configuration, it will not be printed out. For example, 
you can try something like
```Scala
spark.sql("SET -v").show(numRows = 200, truncate = false)
```





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-17 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92932104
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -964,10 +970,16 @@ object StaticSQLConf {
 }
   }
 
+  val DEFAULT_WAREHOUSE_PATH = buildConf("spark.sql.default.warehouse.dir")
--- End diff --

I am not familiar with this part of the code base - what are the consequences of making it internal? Is it just in terms of what shows up in the documentation, or does it affect how users can use it?





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-17 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92932105
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala ---
@@ -221,6 +221,19 @@ class SQLConfSuite extends QueryTest with SharedSQLContext {
   .sessionState.conf.warehousePath.stripSuffix("/"))
   }
 
+  test("changing default value of warehouse path") {
+try {
+  val newWarehouseDefault = "spark-warehouse2"
+  val newWarehouseDefaultPath = new Path(Utils.resolveURI(newWarehouseDefault)).toString
+  sparkContext.conf.set("spark.sql.default.warehouse.dir", newWarehouseDefaultPath)
+  val spark = new SparkSession(sparkContext)
+  assert(newWarehouseDefaultPath.stripSuffix("/") === spark
+.sessionState.conf.warehousePath.stripSuffix("/"))
--- End diff --

Done





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-17 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92932097
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -819,7 +819,13 @@ private[sql] class SQLConf extends Serializable with CatalystConf with Logging {
 
   def variableSubstituteDepth: Int = getConf(VARIABLE_SUBSTITUTE_DEPTH)
 
-  def warehousePath: String = new Path(getConf(StaticSQLConf.WAREHOUSE_PATH)).toString
+  def warehousePath: String = {
+if (contains(StaticSQLConf.WAREHOUSE_PATH.key)) {
--- End diff --

Nice catch - Added the same check here as well





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-17 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92932095
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala ---
@@ -221,6 +221,19 @@ class SQLConfSuite extends QueryTest with SharedSQLContext {
   .sessionState.conf.warehousePath.stripSuffix("/"))
   }
 
+  test("changing default value of warehouse path") {
--- End diff --

Good point. Added tests for all 4 cases now





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-17 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92931894
  
--- Diff: R/pkg/inst/tests/testthat/test_context.R ---
@@ -72,6 +72,20 @@ test_that("repeatedly starting and stopping SparkSession", {
   }
 })
 
+test_that("Default warehouse dir should be set to tempdir", {
+  sparkR.session.stop()
+  sparkR.session(enableHiveSupport = FALSE)
+
+  # Create a temporary table
+  sql("CREATE TABLE people_warehouse_test")
+  # spark-warehouse should be written only to tempdir() and not the current working directory
+  res <- list.files(path = ".", pattern = ".*spark-warehouse.*",
--- End diff --

Well - given that this PR is only changing the warehouse dir, I think it's only fair to test for that. Or, in other words, adding such a test would fail now because of `derby.log` etc. (per our JIRA discussion)?





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-16 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92916658
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -819,7 +819,13 @@ private[sql] class SQLConf extends Serializable with CatalystConf with Logging {
 
   def variableSubstituteDepth: Int = getConf(VARIABLE_SUBSTITUTE_DEPTH)
 
-  def warehousePath: String = new Path(getConf(StaticSQLConf.WAREHOUSE_PATH)).toString
+  def warehousePath: String = {
+if (contains(StaticSQLConf.WAREHOUSE_PATH.key)) {
--- End diff --

What is the reason we are not doing the same check, as shown in [another place](https://github.com/shivaram/spark-1/blob/6eec97d463b027c93b48621b55e3bd4005dc0f7e/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala#L59)?





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-16 Thread bdwyer2
Github user bdwyer2 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92916155
  
--- Diff: R/pkg/inst/tests/testthat/test_context.R ---
@@ -72,6 +72,20 @@ test_that("repeatedly starting and stopping SparkSession", {
   }
 })
 
+test_that("Default warehouse dir should be set to tempdir", {
+  sparkR.session.stop()
+  sparkR.session(enableHiveSupport = FALSE)
+
+  # Create a temporary table
+  sql("CREATE TABLE people_warehouse_test")
+  # spark-warehouse should be written only to tempdir() and not the current working directory
+  res <- list.files(path = ".", pattern = ".*spark-warehouse.*",
--- End diff --

Should we test to make sure that no files are created during this process, instead of only checking for `spark-warehouse`?





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-16 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92877278
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
@@ -55,14 +55,19 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
 s"is set. Setting ${WAREHOUSE_PATH.key} to the value of " +
 s"hive.metastore.warehouse.dir ('$hiveWarehouseDir').")
   hiveWarehouseDir
-} else {
+} else if (sparkContext.conf.contains(WAREHOUSE_PATH.key) &&
+   sparkContext.conf.get(WAREHOUSE_PATH).isDefined) {
--- End diff --

Indented 4 spaces now





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-16 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92873918
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -964,11 +970,17 @@ object StaticSQLConf {
 }
   }
 
-  val WAREHOUSE_PATH = buildConf("spark.sql.warehouse.dir")
-.doc("The default location for managed databases and tables.")
+  val DEFAULT_WAREHOUSE_PATH = buildConf("spark.sql.default.warehouse.dir")
+.doc("The default location for managed databases and tables. " +
+ "Used if spark.sql.warehouse.dir is not set")
 .stringConf
 .createWithDefault(Utils.resolveURI("spark-warehouse").toString)
 
+  val WAREHOUSE_PATH = buildConf("spark.sql.warehouse.dir")
+.doc("The location for managed databases and tables.")
--- End diff --

That's a good point. I misunderstood the meaning of `default` there. I will fix this now.





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92761572
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -964,11 +970,17 @@ object StaticSQLConf {
 }
   }
 
-  val WAREHOUSE_PATH = buildConf("spark.sql.warehouse.dir")
-.doc("The default location for managed databases and tables.")
+  val DEFAULT_WAREHOUSE_PATH = buildConf("spark.sql.default.warehouse.dir")
+.doc("The default location for managed databases and tables. " +
+ "Used if spark.sql.warehouse.dir is not set")
 .stringConf
 .createWithDefault(Utils.resolveURI("spark-warehouse").toString)
 
+  val WAREHOUSE_PATH = buildConf("spark.sql.warehouse.dir")
+.doc("The location for managed databases and tables.")
--- End diff --

The description is not right. `spark.sql.warehouse.dir` is still the default location when we create a database/table without providing a location value.





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-14 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92553628
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2165,6 +2165,14 @@ test_that("SQL error message is returned from JVM", {
   expect_equal(grepl("blah", retError), TRUE)
 })
 
+test_that("Default warehouse dir should be set to tempdir", {
+  # nothing should be written outside tempdir() without explicit user 
permission
+  inital_working_directory_files <- list.files()
--- End diff --

From my test, the `spark-warehouse` directory is created when I run `a <- createDataFrame(iris)`.





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-14 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92553387
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2165,6 +2165,14 @@ test_that("SQL error message is returned from JVM", {
   expect_equal(grepl("blah", retError), TRUE)
 })
 
+test_that("Default warehouse dir should be set to tempdir", {
+  # nothing should be written outside tempdir() without explicit user permission
+  inital_working_directory_files <- list.files()
--- End diff --

I'm referring to other tests in this test file, test_sparkSQL, that call APIs that might already initialize the warehouse dir.

`sparkR.session()` is called at the top. Does this `createOrReplaceTempView` cause the warehouse dir to be created?


https://github.com/shivaram/spark-1/blob/25834109588e8e545deafb1da162958766a057e2/R/pkg/inst/tests/testthat/test_sparkSQL.R#L570






[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-14 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92548697
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
@@ -55,14 +55,19 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
 s"is set. Setting ${WAREHOUSE_PATH.key} to the value of " +
 s"hive.metastore.warehouse.dir ('$hiveWarehouseDir').")
   hiveWarehouseDir
-} else {
+} else if (sparkContext.conf.contains(WAREHOUSE_PATH.key) &&
+   sparkContext.conf.get(WAREHOUSE_PATH).isDefined) {
--- End diff --

Nit: indent is not right.





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-14 Thread bdwyer2
Github user bdwyer2 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92533173
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2165,6 +2165,14 @@ test_that("SQL error message is returned from JVM", {
   expect_equal(grepl("blah", retError), TRUE)
 })
 
+test_that("Default warehouse dir should be set to tempdir", {
+  # nothing should be written outside tempdir() without explicit user permission
+  inital_working_directory_files <- list.files()
--- End diff --

Does Jenkins start with a new workspace every time it runs a test?





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-14 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92531777
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -363,6 +363,13 @@ sparkR.session <- function(
   ...) {
 
   sparkConfigMap <- convertNamedListToEnv(sparkConfig)
+
+  # NOTE(shivaram): Set default warehouse dir to tmpdir to meet CRAN requirements
+  # See SPARK-18817 for more details
+  if (!exists("spark.sql.default.warehouse.dir", envir = sparkConfigMap)) {
+    assign("spark.sql.default.warehouse.dir", tempdir(), envir = sparkConfigMap)
--- End diff --

I think we should move this after L383 "overrideEnvs(sparkConfigMap, paramMap)" in case it is passed in named param
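The ordering being suggested can be sketched with plain R environments (the loop below is a stand-in for SparkR's internal `overrideEnvs`; the config key follows the diff): apply user-supplied named parameters first, then fill in the default only if the key is still absent.

```r
sparkConfigMap <- new.env()
paramMap <- new.env()
assign("spark.sql.default.warehouse.dir", "/user/choice", envir = paramMap)

# stand-in for overrideEnvs(sparkConfigMap, paramMap): copy user params over
for (key in ls(paramMap)) {
  assign(key, get(key, envir = paramMap), envir = sparkConfigMap)
}

# the default is applied afterwards, and only when the key is still unset,
# so a user-supplied named parameter is never clobbered
if (!exists("spark.sql.default.warehouse.dir", envir = sparkConfigMap)) {
  assign("spark.sql.default.warehouse.dir", tempdir(), envir = sparkConfigMap)
}

get("spark.sql.default.warehouse.dir", envir = sparkConfigMap)  # "/user/choice"
```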





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-14 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16290#discussion_r92531445
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -363,6 +363,13 @@ sparkR.session <- function(
   ...) {
 
   sparkConfigMap <- convertNamedListToEnv(sparkConfig)
+
+  # NOTE(shivaram): Set default warehouse dir to tmpdir to meet CRAN requirements
+  # See SPARK-18817 for more details
+  if (!exists("spark.sql.default.warehouse.dir", envir = sparkConfigMap)) {
+    assign("spark.sql.default.warehouse.dir", tempdir(), envir = sparkConfigMap)
--- End diff --

I think we could just `sparkConfigMap[["spark.sql.warehouse.default.dir"]] <- tempdir()`
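For reference, R environments support `[[<-` directly, so that one-liner is equivalent to the `assign()` call in the diff. Note that neither form checks for an existing binding, which is why the `exists()` guard still matters:

```r
# [[<- on an environment sets a binding just like assign(); both overwrite
# unconditionally, so guard with exists() to preserve a user-supplied value.
e <- new.env()
e[["spark.sql.default.warehouse.dir"]] <- tempdir()

identical(e[["spark.sql.default.warehouse.dir"]],
          get("spark.sql.default.warehouse.dir", envir = e))  # TRUE
```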





[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...

2016-12-14 Thread shivaram
GitHub user shivaram opened a pull request:

https://github.com/apache/spark/pull/16290

[SPARK-18817] [SPARKR] [SQL] Set default warehouse dir to tempdir

## What changes were proposed in this pull request?
This PR sets the default warehouse dir to a temporary directory in SparkR 
to avoid creating directories in the working directory (see JIRA for more 
details).

To do this we introduce a new SQL config that controls this default. For all
other frontends, existing behavior is maintained.

## How was this patch tested?
Running unit tests locally, plus manual testing with the SparkR shell.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shivaram/spark-1 spark-warehouse-default

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16290.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16290


commit ac0508985154e600bb8a313f2d151a035790c9f1
Author: Shivaram Venkataraman 
Date:   2016-12-15T01:47:34Z

Set default warehouse dir to tempdir
To do this we introduce a new SQL config that is set to tempdir
from SparkR.



