[ https://issues.apache.org/jira/browse/SPARK-18817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757998#comment-15757998 ]
Felix Cheung edited comment on SPARK-18817 at 12/18/16 2:04 AM:
----------------------------------------------------------------

Aside from changing the existing shipped behavior, there are a few mentions of this behavior in the documentation that would become wrong and would need to be updated.

More importantly, IMO, we would still have a feature that can be turned on (as documented or suggested in the documentation) that causes files to be written without the user explicitly agreeing to it (or understanding it). To me that doesn't address the root of the issue fully; it merely side-steps it.

I've managed to track down the fix to move metastore_db and derby.log, though. There are two separate switches to set, and it is doable from pure R (I have tested that), but I'd recommend doing it in https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L116 in order to respect any existing value from hive-site.xml, if one is given. A sketch of the pure-R approach follows at the end of this message.

How about we introduce something like spark.sql.default.derby.dir and fix it that way?


> Ensure nothing is written outside R's tempdir() by default
> ----------------------------------------------------------
>
>                 Key: SPARK-18817
>                 URL: https://issues.apache.org/jira/browse/SPARK-18817
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>            Reporter: Brendan Dwyer
>            Priority: Critical
>
> Per CRAN policies
> https://cran.r-project.org/web/packages/policies.html
> {quote}
> - Packages should not write in the users’ home filespace, nor anywhere else on the file system apart from the R session’s temporary directory (or during installation in the location pointed to by TMPDIR: and such usage should be cleaned up). Installing into the system’s R installation (e.g., scripts to its bin directory) is not allowed.
> Limited exceptions may be allowed in interactive sessions if the package obtains confirmation from the user.
> - Packages should not modify the global environment (user’s workspace).
> {quote}
> Currently "spark-warehouse" gets created in the working directory when sparkR.session() is called.
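For reference, a minimal sketch of the pure-R route. This assumes the two switches are the Hive metastore connection URL (forwarded with the spark.hadoop. prefix) for metastore_db, and Derby's derby.stream.error.file JVM system property for derby.log; treat the exact property names and the extraJavaOptions route as illustrative rather than a confirmed recipe:

{code}
library(SparkR)

# Point both Derby artifacts at R's tempdir() before the driver JVM starts.
derbyDir <- file.path(tempdir(), "derby")
dir.create(derbyDir, recursive = TRUE, showWarnings = FALSE)

sparkR.session(sparkConfig = list(
  # metastore_db: embed the target path in the Derby JDBC URL; the
  # spark.hadoop. prefix copies this into the Hadoop/Hive configuration.
  spark.hadoop.javax.jdo.option.ConnectionURL = paste0(
    "jdbc:derby:;databaseName=", file.path(derbyDir, "metastore_db"),
    ";create=true"),
  # derby.log: Derby reads derby.stream.error.file as a JVM system
  # property, so it must be in place when the driver JVM launches.
  spark.driver.extraJavaOptions = paste0(
    "-Dderby.stream.error.file=", file.path(derbyDir, "derby.log"))
))
{code}

Doing the equivalent in SparkHadoopUtil would have the added advantage that Spark sets these defaults only when hive-site.xml has not already supplied a value.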