monit with spark
We want to monitor the Spark master and Spark slaves using monit, but we want to use the sbin scripts to do so. The scripts start the Spark master and slave processes detached from the script itself, so monit would not know the PID of the started process to watch. Is this correct? Should we watch the ports instead? How should we configure monit to start and monitor Spark standalone processes? -- Thanks, Mike
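P.S. From what I can tell, the sbin scripts daemonize through spark-daemon.sh but do write a PID file under SPARK_PID_DIR (/tmp by default), so monit could track that file rather than only the port. A sketch of the configuration I have in mind -- the install path /opt/spark, the user spark, and the master host name are my own assumptions, and the PID file names follow spark-daemon.sh's spark-&lt;user&gt;-&lt;class&gt;-&lt;instance&gt;.pid pattern:

```
# Hypothetical /etc/monit/conf.d/spark -- paths and user are assumptions.
check process spark-master
    with pidfile /tmp/spark-spark-org.apache.spark.deploy.master.Master-1.pid
    start program = "/opt/spark/sbin/start-master.sh" as uid spark
    stop program  = "/opt/spark/sbin/stop-master.sh"  as uid spark
    # Belt and braces: also restart if the master port stops answering.
    if failed port 7077 type tcp then restart

check process spark-worker
    with pidfile /tmp/spark-spark-org.apache.spark.deploy.worker.Worker-1.pid
    start program = "/opt/spark/sbin/start-slave.sh spark://master-host:7077" as uid spark
    stop program  = "/opt/spark/sbin/stop-slave.sh"  as uid spark
```

It probably also makes sense to point SPARK_PID_DIR somewhere other than /tmp, since tmp cleaners can delete the PID files out from under monit.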
Strategy to automatically configure Spark worker env params in standalone mode
We are planning to use servers with varying specs (32 GB, 64 GB, 244 GB of RAM or even higher, and varying core counts) for a standalone deployment of Spark, but we do not know the spec of a server ahead of time. We need to script some logic that runs on the server at boot and automatically sets the following parameters based on the cores and memory it reads from the OS: SPARK_WORKER_CORES, SPARK_WORKER_MEMORY, SPARK_WORKER_INSTANCES. What should the script's logic be, given the memory size and number of cores it sees? In other words, what are the recommended rules of thumb for dividing up a server (especially one with a lot of RAM) without knowing the Spark application and data sizes ahead of time? Thanks, Mike
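P.S. To make the question concrete, here is a sketch of the kind of boot-time logic I have in mind for spark-env.sh. The OS reservation (1 GB plus 10% of RAM) and the roughly 40 GB cap per worker JVM heap are my own guesses at reasonable rules of thumb, not anything taken from the Spark docs:

```shell
#!/bin/sh
# Hypothetical boot-time sizing logic for spark-env.sh. Assumptions:
# reserve 1 GB plus 10% of RAM for the OS and daemons, and cap each
# worker JVM heap near 40 GB to keep garbage-collection pauses manageable.

# compute_worker_env TOTAL_MB TOTAL_CORES
# Prints: "<instances> <memory_mb_per_worker> <cores_per_worker>"
compute_worker_env() {
    total_mb=$1; cores=$2
    # Memory left for Spark after the OS reservation.
    usable_mb=$(( total_mb - 1024 - total_mb / 10 ))
    if [ "$usable_mb" -le 40960 ]; then
        instances=1
    else
        # Round up so no single worker exceeds ~40 GB.
        instances=$(( (usable_mb + 40959) / 40960 ))
    fi
    echo "$instances $(( usable_mb / instances )) $(( cores / instances ))"
}

# Read the actual machine spec from the OS (Linux).
total_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
cores=$(nproc)

set -- $(compute_worker_env "$total_mb" "$cores")
export SPARK_WORKER_INSTANCES=$1
export SPARK_WORKER_MEMORY="${2}m"
export SPARK_WORKER_CORES=$3
```

On a 32 GB / 8-core box this yields one worker with roughly 28 GB, while a 244 GB box gets split into several workers of under 40 GB each. Is that the right shape of logic, and are the thresholds sensible?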
single worker vs multiple workers on each machine
Hi there, I am new to Spark, and I was wondering: when each machine in the cluster has a lot of memory, is it better to run multiple workers with limited memory on each machine, or a single worker with access to the majority of the machine's memory? If the answer is "it depends", would you please elaborate? Thanks, Mike
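P.S. To make the two layouts concrete, here is how I understand they would look in conf/spark-env.sh for a hypothetical 128 GB, 32-core machine (the numbers are only illustrative):

```shell
# Option A: one worker owning most of the machine. Simple, but a single
# ~100 GB JVM heap may suffer long garbage-collection pauses.
SPARK_WORKER_INSTANCES=1
SPARK_WORKER_MEMORY=100g
SPARK_WORKER_CORES=32

# Option B: several smaller workers. More JVMs to run, but each heap stays
# in a range where GC is usually better behaved.
SPARK_WORKER_INSTANCES=4
SPARK_WORKER_MEMORY=25g
SPARK_WORKER_CORES=8
```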