Re: Spark stand-alone mode

2023-10-17 Thread Ilango
Hi all, Thanks a lot for your suggestions and knowledge sharing. I would like to let you know that I have finished setting up the stand-alone cluster, and a couple of data science users have already been using it for the last two weeks. The performance is really good. Almost 10X performance improvement

Re: Spark stand-alone mode

2023-09-19 Thread Patrick Tucci
Multiple applications can run at once, but you need to either configure Spark or your applications to allow that. In stand-alone mode, each application attempts to take all resources available by default. This section of the documentation has more details:
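As a rough illustration of the per-application limit described above, the sketch below builds a `spark-submit` command line that caps a single application using the `spark.cores.max` and `spark.executor.memory` properties. The helper name, master URL, application path, and numbers are hypothetical placeholders, not values from this thread.

```python
def capped_submit_command(master_url, app_path, max_cores, executor_memory):
    """Build a spark-submit argument list that caps one application's resources."""
    if max_cores < 1:
        raise ValueError("max_cores must be at least 1")
    return [
        "spark-submit",
        "--master", master_url,
        # Without this cap, a stand-alone application tries to claim every core.
        "--conf", f"spark.cores.max={max_cores}",
        "--conf", f"spark.executor.memory={executor_memory}",
        app_path,
    ]


# Hypothetical master URL and application file, for illustration only.
cmd = capped_submit_command("spark://master-host:7077", "my_app.py", 8, "16g")
print(" ".join(cmd))
```

With a cap like this on each application, a second application can be granted the remaining cores. A cluster-wide default can instead be set on the master with `spark.deploy.defaultCores`.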

Re: Spark stand-alone mode

2023-09-18 Thread Ilango
Thanks all for your suggestions. Noted with thanks. Just wanted to share a few more details about the environment: 1. We use NFS for data storage, and the data is in Parquet format. 2. All HPC nodes are connected and already work as a cluster for Studio workbench. I can set up passwordless SSH if it does not exist

Re: Spark stand-alone mode

2023-09-15 Thread Bjørn Jørgensen
You need to set up SSH without a password; use a key instead. How to connect without password using SSH (passwordless) On Fri, 15 Sep 2023 at 20:55, Mich Talebzadeh <
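One way to confirm that key-based login is in place before starting Spark workers over SSH is sketched below. It relies on the OpenSSH option `BatchMode=yes`, which disables password prompts so the connection fails instead of waiting for input. The function names and the host name are hypothetical, and this is only a check under those assumptions, not a setup procedure.

```python
import subprocess


def ssh_key_check_command(host, user=None):
    """Build an ssh command that succeeds only if non-interactive login works."""
    target = f"{user}@{host}" if user else host
    return [
        "ssh",
        "-o", "BatchMode=yes",      # never prompt for a password
        "-o", "ConnectTimeout=5",   # give up quickly on unreachable hosts
        target,
        "true",                     # trivial remote command
    ]


def key_login_works(host, user=None):
    """Return True if the remote command ran without any password prompt."""
    result = subprocess.run(ssh_key_check_command(host, user), capture_output=True)
    return result.returncode == 0


# Hypothetical host name, for illustration only.
check = ssh_key_check_command("node2", user="spark")
print(" ".join(check))
```

Calling `key_login_works("node2")` for each worker host from the machine that will launch the workers shows which hosts still need a key installed.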

Re: Spark stand-alone mode

2023-09-15 Thread Mich Talebzadeh
Hi, Can these 4 nodes talk to each other through ssh as trusted hosts (on top of the network that Sean already mentioned)? Otherwise you need to set it up. You can install a LAN if you have another free port at the back of your HPC nodes. They should You ought to try to set up a Hadoop cluster

Re: Spark stand-alone mode

2023-09-15 Thread Sean Owen
Yes, should work fine, just set up according to the docs. There needs to be network connectivity between whatever the driver node is and these 4 nodes. On Thu, Sep 14, 2023 at 11:57 PM Ilango wrote: > > Hi all, > > We have 4 HPC nodes and installed spark individually in all nodes. > > Spark is

Re: Spark stand-alone mode

2023-09-15 Thread Patrick Tucci
I use Spark in standalone mode. It works well, and the instructions on the site are accurate for the most part. The only thing that didn't work for me was the start-all.sh script. Instead, I use a simple script that starts the master node, then uses SSH to connect to the worker machines and start
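A minimal sketch of the kind of launcher described above (not the poster's actual script) could look like the following. It assumes Spark is installed at the same path on every node, that passwordless SSH already works, and that the release ships `sbin/start-worker.sh` (older releases named it `start-slave.sh`). The install path and host names are hypothetical.

```python
import shlex


def cluster_start_commands(spark_home, master_host, workers, port=7077):
    """Return the commands to start a master locally and one worker per host."""
    master_url = f"spark://{master_host}:{port}"
    # First command runs on the master machine itself.
    commands = [[f"{spark_home}/sbin/start-master.sh"]]
    for host in workers:
        # Each worker is started over SSH and pointed at the master URL.
        remote = (
            f"{shlex.quote(spark_home + '/sbin/start-worker.sh')} "
            f"{shlex.quote(master_url)}"
        )
        commands.append(["ssh", host, remote])
    return commands


# Hypothetical install path and host names, for illustration only.
commands = cluster_start_commands("/opt/spark", "node1", ["node2", "node3", "node4"])
for command in commands:
    print(" ".join(command))
```

Each entry can be passed to `subprocess.run(command, check=True)` in order, so that a failure on any host stops the launch.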

Spark stand-alone mode

2023-09-14 Thread Ilango
Hi all, We have 4 HPC nodes and have installed Spark individually on all of them. Spark is used in local mode (each driver/executor will have 8 cores and 65 GB) from sparklyr/PySpark using RStudio/Posit Workbench. Slurm is used as the scheduler. As this is local mode, we are facing performance issues (as only

Spark Stand-alone mode job not starting (akka Connection refused)

2014-05-28 Thread T.J. Alumbaugh
I've been trying for several days now to get a Spark application running in stand-alone mode, as described here: http://spark.apache.org/docs/latest/spark-standalone.html I'm using pyspark, so I've been following the example here: