[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624335#comment-14624335
 ] 

Apache Spark commented on SPARK-8596:
-

User 'koaning' has created a pull request for this issue:
https://github.com/apache/spark/pull/7366

 Install and configure RStudio server on Spark EC2
 -

 Key: SPARK-8596
 URL: https://issues.apache.org/jira/browse/SPARK-8596
 Project: Spark
  Issue Type: Improvement
  Components: EC2, SparkR
Reporter: Shivaram Venkataraman

 This will make it convenient for R users to use SparkR from their browsers 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-08 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618863#comment-14618863
 ] 

Shivaram Venkataraman commented on SPARK-8596:
--

Thanks for the PR. Will review this today. We don't have anything like this 
open for IPython as far as I know; you can open a new JIRA to discuss it.




[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-08 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618064#comment-14618064
 ] 

Vincent Warmerdam commented on SPARK-8596:
--

I can confirm it works, but I can't confirm that I'm not breaking anything 
else, so I was wondering whether there is some sort of test script to check 
that this provisioning script works.

Anyway: https://github.com/mesos/spark-ec2/pull/129

I do have a final question: do we have something like this open for the 
IPython/Jupyter notebook as well?




[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-07 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616924#comment-14616924
 ] 

Shivaram Venkataraman commented on SPARK-8596:
--

You can test this by launching a new cluster with a command that looks like 

{code}
./spark-ec2 -s 2 -t r3.xlarge -i pem -k key --spark-ec2-git-repo 
https://github.com/koaning/spark-ec2 --spark-ec2-git-branch rstudio-install 
launch rstudio-test
{code}

This cluster setup will use the spark-ec2 scripts from your repo while 
setting things up. Once you think it's good, you can open a PR on 
github.com/mesos/spark-ec2




[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-07 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616616#comment-14616616
 ] 

Vincent Warmerdam commented on SPARK-8596:
--

Made the changes and confirmed that RStudio now works out of the box with the 
startup script.

https://github.com/koaning/spark-ec2/tree/rstudio-install




[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-06 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614968#comment-14614968
 ] 

Vincent Warmerdam commented on SPARK-8596:
--

Done and done.

This task feels like it is getting ready for merge. I am only wondering about 
these few lines of code in the `init.sh` script here: 
https://github.com/koaning/spark-ec2/blob/rstudio-install/rstudio/init.sh#L36-38

```
sed -e 's/^ulimit/#ulimit/g' /root/spark/conf/spark-env.sh > /root/spark/conf/spark-env2.sh
mv /root/spark/conf/spark-env2.sh /root/spark/conf/spark-env.sh
ulimit -n 100
```

The ulimit can only be set by the root user, but it doesn't feel right to 
remove that line from the `init.sh` script in the rstudio folder. [~shivaram], 
thoughts?




[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-06 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615675#comment-14615675
 ] 

Shivaram Venkataraman commented on SPARK-8596:
--

I think you can move the /etc/security/limits.conf change to `setup-slave.sh` 
-- that gets run on every machine. Regarding editing spark-env.sh, can you put 
that `ulimit` call within an `if user_is_root` check?




[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-06 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615264#comment-14615264
 ] 

Shivaram Venkataraman commented on SPARK-8596:
--

Hmm, the problem is I'm not sure the ulimit applies across shells, which is 
why we were doing it in spark-env.sh. Here are a couple of options:

1. We could change it in the AMI in /etc/security/limits.conf
2. Does leaving it in there lead to an error, or does it just print a warning? 
If it's just a warning, I'd say let's leave it in there. The ulimit used to be 
needed only for really large shuffles, so some of the RStudio use cases might 
work even without it.
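Option 1 would mean baking something like the following into /etc/security/limits.conf on the AMI (a sketch; the limit value is an assumption, not taken from the spark-ec2 AMI):

```
# raise the open-file limit for all users, not just root
*    soft    nofile    1000000
*    hard    nofile    1000000
```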




[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-06 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615523#comment-14615523
 ] 

Vincent Warmerdam commented on SPARK-8596:
--

1. True. Made a commit: 
https://github.com/koaning/spark-ec2/blob/rstudio-install/rstudio/init.sh#L35-37. 
Do we want this code here, or somewhere else? It seems like this is something 
system-wide and not RStudio-specific... what about /spark/init.sh instead of 
/rstudio/init.sh?

2. It gives you a breaking error:

```
> sc <- sparkR.init(spark_link)
Launching java with spark-submit command /root/spark/bin/spark-submit sparkr-shell /tmp/RtmpSaSV2q/backend_port53744f8e9f59
/root/spark/conf/spark-env.sh: line 30: ulimit: open files: cannot modify limit: Operation not permitted
```

The ulimit can only be raised by root, and the rstudio user isn't root.

3. Can I remove this line: 
https://github.com/koaning/spark-ec2/blob/branch-1.4/templates/root/spark/conf/spark-env.sh#L30

Not all users will want to use RStudio, and I'm not sure how this might break 
things (I'm assuming pyspark might use this script as well?). Perhaps we can 
move the new `/etc/security/limits.conf` parameters in here?






[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-05 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614294#comment-14614294
 ] 

Vincent Warmerdam commented on SPARK-8596:
--

OK. How about I create a `startSpark.R` file in the home folder of the rstudio 
user? That way we won't have any interference with the 
`/root/spark/bin/sparkR` script, and the R user will have a good way to get a 
quick start.

To get a new user able to run jobs, it still seems like I need to run 
`chmod a+w /mnt/spark` to solve the previously mentioned error:

```
15/06/30 21:38:50 ERROR util.Utils: Failed to create local root dir in /mnt/spark. Ignoring this directory.
15/06/30 21:38:50 ERROR util.Utils: Failed to create local root dir in /mnt2/spark. Ignoring this directory.
```

Do we want to do this, or force the user to do it manually and leave them with 
a proper tutorial? [~RedOakMark], did you find an alternative solution?




[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-05 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614331#comment-14614331
 ] 

Shivaram Venkataraman commented on SPARK-8596:
--

I think it's fine to make /mnt/spark and /mnt2/spark writable by all users by 
default. These are just directories used to store tmp files for a Spark job.
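A sketch of how a setup script could make those scratch directories world-writable; the sticky bit (as on /tmp) is my addition, not something the spark-ec2 scripts are confirmed to do:

```shell
# make_scratch_writable: mark Spark's local scratch dirs writable by all
# users, with the sticky bit so users cannot delete each other's files.
make_scratch_writable() {
  for d in "$@"; do
    mkdir -p "$d"
    chmod 1777 "$d"   # rwxrwxrwt, like /tmp
  done
}

# On a spark-ec2 cluster this would be run as root:
# make_scratch_writable /mnt/spark /mnt2/spark
```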




[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-03 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613273#comment-14613273
 ] 

Shivaram Venkataraman commented on SPARK-8596:
--

Some points that might be helpful:

1. `start-all.sh` should be automatically run on cluster startup; users don't 
need to run it manually.
2. The Spark master URL is present in the file `/root/spark-ec2/cluster-url`. 
You can just read that file to get the value (no need to get the hostname from 
EC2, etc.)
3. I think it should be fine to put this in a profile that gets picked up by 
RStudio. However, I'd say we should not use .Rprofile, as that may interfere 
with users using the /root/spark/bin/sparkR script.
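Point 2 can be sketched as follows (the file path is from the comment above; the fallback message is my assumption):

```shell
# Read the Spark master URL that the spark-ec2 scripts write at launch,
# instead of querying EC2 instance metadata for the hostname.
cluster_url_file="/root/spark-ec2/cluster-url"
if [ -r "$cluster_url_file" ]; then
  MASTER=$(cat "$cluster_url_file")
  echo "Spark master: $MASTER"
else
  echo "cluster-url not found; are you on a spark-ec2 master node?" >&2
fi
```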





[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-03 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613018#comment-14613018
 ] 

Vincent Warmerdam commented on SPARK-8596:
--

I now have a more elegant way to get any R shell connected to Spark. If you 
have restarted the cluster with:

```
/root/spark/sbin/stop-all.sh
/root/spark/sbin/start-all.sh
```

then this snippet will collect all the data you need automatically (the 
tutorial involved manual steps):

```
region_ip <- system("curl http://169.254.169.254/latest/meta-data/public-hostname", intern=TRUE)
spark_link <- paste0('spark://', region_ip, ':7077')

.libPaths(c(.libPaths(), '/root/spark/R/lib'))
Sys.setenv(SPARK_HOME = '/root/spark')
Sys.setenv(PATH = paste(Sys.getenv(c("PATH")), '/root/spark/bin', sep=':'))
library(SparkR)

sc <- sparkR.init(spark_link)
sqlContext <- sparkRSQL.init(sc)
```

This snippet can be made part of the '.Rprofile', which will allow any user of 
RStudio to be connected to Spark automatically. This will only work if 
`/root/spark/sbin/start-all.sh` has been run. Do we want this? A possible 
downside is that errors will be thrown if the R user doesn't understand that 
`start-all.sh` needs to be run first.




[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-03 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613100#comment-14613100
 ] 

Vincent Warmerdam commented on SPARK-8596:
--

We just found that `chmod a+w /mnt/spark` solves the problem, but it is not 
very elegant.




[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-02 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611747#comment-14611747
 ] 

Vincent Warmerdam commented on SPARK-8596:
--

Cool, would love to hear your end of the story. It seems to be the only 
obstacle to getting the script to work.

On a slightly different subject: I'm not just a frequent R user, I do a lot of 
Python as well. Is there a similar ticket for the IPython (Jupyter) notebook? 
It seems like the most appropriate GUI for Python.




[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-02 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611832#comment-14611832
 ] 

Vincent Warmerdam commented on SPARK-8596:
--

By the way, I now have scripts that do install RStudio (just ran and 
confirmed).

The code is here:

https://github.com/koaning/spark-ec2/tree/rstudio-install
https://github.com/koaning/spark/tree/rstudio-install

When initializing with this command:

./spark-ec2 --key-pair=spark-df --identity-file=/Users/code/Downloads/spark-df.pem --region=eu-west-1 -s 1 --instance-type=c3.2xlarge --spark-ec2-git-repo=https://github.com/koaning/spark-ec2 --spark-ec2-git-branch=rstudio-install launch mysparkr

I can confirm that RStudio is installed and that a correct user is added. 
There are two concerns:

- Should we not force the user to supply the password themselves? Setting a 
standard password seems like a security vulnerability.
- I am not sure if this gets installed on all the slave nodes. I added this 
module 
(https://github.com/koaning/spark-ec2/blob/rstudio-install/rstudio/init.sh) 
and we only need it on the master node. I wonder what the best way is to 
ensure this.




[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-02 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612178#comment-14612178
 ] 

Shivaram Venkataraman commented on SPARK-8596:
--

Thanks [~cantdutchthis] -- to answer your questions:

1. Regarding the password, we could set a default password and at the end of 
spark_ec2.py add 'Please change this password'. Or, if we wanted to be a bit 
more secure, we could generate a random password in spark_ec2.py, set it, and 
then print it out at the end saying 'please use RStudio with this password'.

2. The init.sh by default only runs on the master. If we want it to run on the 
slaves we need to `ssh` to all the slaves and do it. But for RStudio what you 
have should be fine.

I'll test this out soon and get back on this JIRA.
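The more secure variant of the password idea could be sketched like this (a shell illustration of what spark_ec2.py would do; the user name `rstudio`, the password length, and the `chpasswd` step are all assumptions):

```shell
# Generate a random throwaway password, to be set for the RStudio login
# user and printed once at the end of cluster setup.
RSTUDIO_PW=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
echo "Please use RStudio with this password: $RSTUDIO_PW"
# As root on the master, one would then apply it with something like:
# echo "rstudio:$RSTUDIO_PW" | chpasswd
```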




[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-07-01 Thread Mark Stephenson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610555#comment-14610555
 ] 

Mark Stephenson commented on SPARK-8596:


[~cantdutchthis]: we have been getting the same error, and it's definitely a 
user-permissions issue. Even when giving the new RStudio user ownership rights 
to the ./spark folder, there are additional classpath errors.

We are working on a solution today that logs in to RStudio as the 'hadoop' 
user to start with, just to make sure the proof of concept works, and then 
we'll work out a longer-term solution with some potential bootstrap code. 
Will advise once we have it solved.




[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-06-30 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609127#comment-14609127
 ] 

Vincent Warmerdam commented on SPARK-8596:
--

Hmm, I seem to have stumbled on another issue with adding a new user for 
RStudio.

Here is a link to the tutorial that I recently made (which I would like to 
push to the RStudio blog once this issue is fixed): 
https://gist.github.com/koaning/5a896eb5c773c24091c2.

The odd thing is that the tutorial works fine if you do not run the 
`/root/spark/bin/sparkR` command and move on to installing RStudio instead. If 
you run the sparkR shell, then you later get this error after RStudio has been 
provisioned:

```
> sc <- sparkR.init('spark://ec2-52-18-7-11.eu-west-1.compute.amazonaws.com:7077')
Launching java with spark-submit command /root/spark/bin/spark-submit sparkr-shell /tmp/RtmpxBIfkg/backend_port104b15f47402
15/06/30 21:38:49 INFO spark.SparkContext: Running Spark version 1.4.0
15/06/30 21:38:49 INFO spark.SecurityManager: Changing view acls to: analyst
15/06/30 21:38:49 INFO spark.SecurityManager: Changing modify acls to: analyst
15/06/30 21:38:49 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(analyst); users with modify permissions: Set(analyst)
15/06/30 21:38:49 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/06/30 21:38:49 INFO Remoting: Starting remoting
15/06/30 21:38:50 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@172.31.6.135:58940]
15/06/30 21:38:50 INFO util.Utils: Successfully started service 'sparkDriver' on port 58940.
15/06/30 21:38:50 INFO spark.SparkEnv: Registering MapOutputTracker
15/06/30 21:38:50 INFO spark.SparkEnv: Registering BlockManagerMaster
15/06/30 21:38:50 ERROR util.Utils: Failed to create local root dir in /mnt/spark. Ignoring this directory.
15/06/30 21:38:50 ERROR util.Utils: Failed to create local root dir in /mnt2/spark. Ignoring this directory.
15/06/30 21:38:50 ERROR storage.DiskBlockManager: Failed to create any local dir.
15/06/30 21:38:50 INFO util.Utils: Shutdown hook called
Error in readTypedObject(con, type) :
  Unsupported type for deserialization
```

I get the impression this error is caused by the fact that we create another 
user that doesn't have full root access and can therefore not create a local 
dir. What might be the best way of dealing with this? What assumptions does 
Spark make in terms of permissions? Can any user submit Spark jobs via the 
master URL, or are there some filesystem permissions involved before one can 
do this?




[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-06-30 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609148#comment-14609148
 ] 

Shivaram Venkataraman commented on SPARK-8596:
--

I think the assumption is that the root user is running the scripts in 
/root/spark/bin -- no other use cases have been tested AFAIK. On the other 
hand, the Spark master (i.e. the service running at 
spark://master_host_name:7077) doesn't do any authentication as far as I know. 
So we should be able to submit jobs from other user accounts, but you might 
need to copy Spark to that user's account before running things.




[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-06-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604588#comment-14604588
 ] 

Apache Spark commented on SPARK-8596:
-

User 'koaning' has created a pull request for this issue:
https://github.com/apache/spark/pull/7068




[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-06-28 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604587#comment-14604587
 ] 

Vincent Warmerdam commented on SPARK-8596:
--

1. On it.






[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-06-28 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604897#comment-14604897
 ] 

Shivaram Venkataraman commented on SPARK-8596:
--

Merged https://github.com/apache/spark/pull/7068 to open the RStudio port. I'm 
keeping the JIRA open till we fix the second part of this issue too.




[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-06-27 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604053#comment-14604053
 ] 

Vincent Warmerdam commented on SPARK-8596:
--

I'm writing a small tutorial to get up and running with RStudio on AWS. It 
works. The main issue seems to be that spark-ec2 currently installs an old 
version of R (3.1), while most packages like ggplot2 require a newer version 
(3.2). I'm going to share the tutorial with the RStudio folks soon.

My approach is to run `spark/sbin/start-all.sh` on the master node and then 
run the following commands in RStudio on the master node:

.libPaths( c( .libPaths(), '/root/spark/R/lib') )
Sys.setenv(SPARK_HOME = '/root/spark')
Sys.setenv(PATH = paste(Sys.getenv(c("PATH")), '/root/spark/bin', sep=':'))
library(SparkR)
sc <- sparkR.init('SPARK MASTER ADR')
sqlContext <- sparkRSQL.init(sc)

This works on my end, and I've been able to use the DataFrame API with a JSON 
blob on S3 with this sqlContext.




[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-06-27 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604055#comment-14604055
 ] 

Vincent Warmerdam commented on SPARK-8596:
--

Can we open an issue for the R version? It seems like something that could be 
fixed by using a different standard AMI. 




[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-06-27 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604307#comment-14604307
 ] 

Shivaram Venkataraman commented on SPARK-8596:
--

[~cantdutchthis] Thanks for taking a look at this. One thing many users have 
run into is that installing RStudio Server on the EC2 master node doesn't 
work, as RStudio requires a root user's password. Is there some configuration 
you did to overcome this?

Regarding the R version, let's open a new JIRA for it. I can describe how the 
Spark EC2 AMIs are built, and we can see if we can install R from some other 
YUM repo, etc.




[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-06-27 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604328#comment-14604328
 ] 

Vincent Warmerdam commented on SPARK-8596:
--

RStudio shouldn't be run as the root user in general. RStudio Server doesn't 
have HTTPS out of the box, so port sniffing becomes a security risk if 
somebody could use the web UI to gain root access. 

Instead, you can just add a new user: 

$ useradd analyst 
$ passwd analyst

This user will then be able to log in. Note that in order to reach RStudio, 
you will also need to edit the security group for the master node to allow 
TCP connections on its port. 

I'd love to help out and spend some time on these issues, by the way. I've got 
a small tutorial .md file ready; can I share it via JIRA? I'd like to 
double-check it with you because I may be doing a dirty trick: for RStudio not 
to give errors, I remove a line of code in a shell script (because the new 
user is not root, it cannot run `ulimit` commands). 

Dirty trick (run as root): 

sed -e 's/^ulimit/#ulimit/g' /root/spark/conf/spark-env.sh > /root/spark/conf/spark-env2.sh
mv /root/spark/conf/spark-env2.sh /root/spark/conf/spark-env.sh
ulimit -n 100
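The same edit can be sketched in Python, mirroring the `sed` substitution above (the paths and the `ulimit` line come from this thread; this is an illustration, not part of any merged change):

```python
# Mirror of `sed -e 's/^ulimit/#ulimit/g'`: comment out any line that
# begins with "ulimit" so a non-root RStudio user never executes it.
def comment_out_ulimit(script_text):
    fixed = []
    for line in script_text.splitlines():
        if line.startswith("ulimit"):
            fixed.append("#" + line)  # disable the privileged call
        else:
            fixed.append(line)
    return "\n".join(fixed)
```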

Installing the right R version has always been a bit tricky. I will follow the 
other JIRA ticket as well. 





[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-06-27 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604440#comment-14604440
 ] 

Shivaram Venkataraman commented on SPARK-8596:
--

Thanks! These are very useful instructions. We can break this JIRA up into a 
bunch of smaller issues.

1. Opening the RStudio port in the EC2 cluster. For this we need to add the 
right port number to the Spark EC2 script at 
https://github.com/apache/spark/blob/0b5abbf5f96a5f6bfd15a65e8788cf3fa96fe54c/ec2/spark_ec2.py#L507. 
This should be a pretty simple change -- would you like to open a PR for it? 
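As a rough sketch, the rule for step 1 would look something like this (RStudio Server's default port is 8787; the helper name and the exact call site in spark_ec2.py are assumptions):

```python
# RStudio Server listens on port 8787 by default.
RSTUDIO_PORT = 8787

def rstudio_ingress_rule(cidr="0.0.0.0/0"):
    """Build the (protocol, from_port, to_port, cidr) ingress rule to pass
    to the master security group, matching the pattern spark_ec2.py uses
    for the other Spark ports."""
    return ("tcp", RSTUDIO_PORT, RSTUDIO_PORT, cidr)
```

In the boto-based script this would end up as something like `master_group.authorize(*rstudio_ingress_rule())`.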

2. We need to add code to install RStudio and add a new user (let's say 
username rstudio, password rstudio). To do this we will need to modify scripts 
in the spark-ec2 repo at https://github.com/mesos/spark-ec2. At a high level, 
these scripts are run on the master node after the cluster is launched, and 
they install Spark, Hadoop, etc. on the AMI. So we can just add a new module 
to spark-ec2 called rstudio, and then in rstudio/setup.sh we can add code to 
set up the new users as well. 
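A hypothetical sketch of what such a module's setup script might run, expressed here in Python for illustration (the RPM placeholder and every command are assumptions, not the actual spark-ec2 module):

```python
# Commands a hypothetical spark-ec2 "rstudio" module could run on the
# master node; the rstudio/rstudio credentials follow the suggestion above.
SETUP_STEPS = [
    "yum install -y --nogpgcheck RSTUDIO_SERVER_RPM",  # placeholder package
    "useradd -m rstudio",                 # non-root login user for RStudio
    "echo 'rstudio:rstudio' | chpasswd",  # set the suggested password
    "rstudio-server start",               # serve on the default port 8787
]

def render_setup_script(steps=SETUP_STEPS):
    """Render the steps as a rstudio/setup.sh body."""
    return "#!/bin/bash\n" + "\n".join(steps) + "\n"
```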

Let me know if you want to take a shot at the second one as well.





[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-06-24 Thread Guorong Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600164#comment-14600164
 ] 

Guorong Xu commented on SPARK-8596:
---

Right now I am using another way to solve this issue: I did not install 
RStudio on the head node; instead, I installed RStudio on the instance which 
launches the cluster. I then use the command below to initiate a Spark context:

sc <- sparkR.init(master='spark://[Remote_head_node]:7077', 
sparkEnvir=list(spark.executor.memory='1g'))




[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-06-24 Thread Guorong Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14599739#comment-14599739
 ] 

Guorong Xu commented on SPARK-8596:
---

When I install Spark on EC2 using the ec2 script, I assume Spark is installed 
on the driver node. If I install Spark in /home/rstudio on the driver node 
again, then I will have two copies of the Spark installation on the driver 
node. Will RStudio submit jobs to the right Spark and compute across all 
worker nodes?




[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2

2015-06-24 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14599940#comment-14599940
 ] 

Shivaram Venkataraman commented on SPARK-8596:
--

I think it should technically work: if you submit jobs to the Spark master URL 
that is already set up by the EC2 scripts, you should be able to use the 
cluster. This should work since we don't do any user authentication / 
permissions in Spark. But I haven't tried it before, so let us know what 
happens.
