Yes, that should work fine; just set it up according to the docs. The only requirement is network connectivity between whatever node runs the driver and these 4 nodes.
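For reference, a minimal sketch of what the standalone setup looks like. Hostnames such as node1..node4 are placeholders, and the exact script names depend on your Spark version (older releases use start-slaves.sh and conf/slaves instead of start-workers.sh and conf/workers):

```shell
# On the node you choose as the master (say node1):
$SPARK_HOME/sbin/start-master.sh
# master listens on spark://node1:7077, web UI on http://node1:8080

# On the master, list the worker hostnames in $SPARK_HOME/conf/workers,
# one per line (node1, node2, node3, node4), then, assuming passwordless
# SSH between the nodes, start all workers at once:
$SPARK_HOME/sbin/start-workers.sh

# Point the driver (pyspark here; sparklyr uses the same master URL) at the
# cluster instead of local mode; sizes below are illustrative, not prescriptive:
pyspark --master spark://node1:7077 \
        --executor-memory 60g \
        --total-executor-cores 32
```

From sparklyr you would pass the same URL, e.g. `spark_connect(master = "spark://node1:7077")`.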
On Thu, Sep 14, 2023 at 11:57 PM Ilango <elango...@gmail.com> wrote:
>
> Hi all,
>
> We have 4 HPC nodes and have installed Spark individually on each node.
>
> Spark is used in local mode (each driver/executor has 8 cores and 65 GB)
> via sparklyr/pyspark from RStudio/Posit Workbench. Slurm is used as the
> scheduler.
>
> Because this is local mode, we are facing performance issues (only one
> executor) when dealing with large datasets.
>
> Can I convert these 4 nodes into a Spark standalone cluster? We don't have
> Hadoop, so YARN mode is out of scope.
>
> Shall I follow the official documentation for setting up a standalone
> cluster? Will it work? Is there anything else I need to be aware of?
> Can you please share your thoughts?
>
> Thanks,
> Elango
>