[ https://issues.apache.org/jira/browse/SPARK-18817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757998#comment-15757998 ]
Felix Cheung edited comment on SPARK-18817 at 12/18/16 2:04 AM:
----------------------------------------------------------------

Aside from changing the existing shipped behavior, there are a few mentions of this behavior in the documentation that would become wrong and would need to be updated.

More importantly, IMO, we would still have a feature that can be turned on (as documented or suggested in the documentation) that causes files to be written without the user explicitly agreeing to it (or understanding it). To me that doesn't address the root of the issue fully; it merely side-steps it.

I've managed to track down the fix to move metastore_db and derby.log, though. There are two separate switches to set, and it is doable from pure R (I have tested that), but I'd recommend doing it in https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L116 in order to respect any existing value from hive-site.xml, if one is given. A sketch of the pure-R approach follows at the end of this message.

How about we introduce something like spark.sql.default.derby.dir and fix it that way?


> Ensure nothing is written outside R's tempdir() by default
> ----------------------------------------------------------
>
>                 Key: SPARK-18817
>                 URL: https://issues.apache.org/jira/browse/SPARK-18817
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>            Reporter: Brendan Dwyer
>            Priority: Critical
>
> Per CRAN policies
> https://cran.r-project.org/web/packages/policies.html
> {quote}
> - Packages should not write in the users’ home filespace, nor anywhere else on the file system apart from the R session’s temporary directory (or during installation in the location pointed to by TMPDIR: and such usage should be cleaned up). Installing into the system’s R installation (e.g., scripts to its bin directory) is not allowed.
> Limited exceptions may be allowed in interactive sessions if the package obtains confirmation from the user.
> - Packages should not modify the global environment (user’s workspace).
> {quote}
> Currently "spark-warehouse" gets created in the working directory when sparkR.session() is called.
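For reference, a minimal sketch of the pure-R route. This assumes the two switches are the Hive metastore connection URL (forwarded with the spark.hadoop. prefix) for metastore_db, and Derby's derby.stream.error.file JVM system property for derby.log; treat the exact property names and the extraJavaOptions route as illustrative rather than a confirmed recipe:

{code}
library(SparkR)

# Point both Derby artifacts at R's tempdir() before the driver JVM starts.
derbyDir <- file.path(tempdir(), "derby")
dir.create(derbyDir, recursive = TRUE, showWarnings = FALSE)

sparkR.session(sparkConfig = list(
  # metastore_db: embed the target path in the Derby JDBC URL; the
  # spark.hadoop. prefix copies this into the Hadoop/Hive configuration.
  spark.hadoop.javax.jdo.option.ConnectionURL = paste0(
    "jdbc:derby:;databaseName=", file.path(derbyDir, "metastore_db"),
    ";create=true"),
  # derby.log: Derby reads derby.stream.error.file as a JVM system
  # property, so it must be in place when the driver JVM launches.
  spark.driver.extraJavaOptions = paste0(
    "-Dderby.stream.error.file=", file.path(derbyDir, "derby.log"))
))
{code}

Doing the equivalent in SparkHadoopUtil would have the added advantage that Spark sets these defaults only when hive-site.xml has not already supplied a value.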