monit with spark

2015-02-15 Thread Mike Sam
We want to monitor the Spark master and Spark slaves using monit, but we
want to use the sbin scripts to start and stop them. The scripts launch
the master and slave processes detached from themselves, so monit would
not know the PID of the started processes to watch. Is this correct?
Should we watch the ports instead?

How should we configure monit to run and monitor the Spark standalone
processes?

-- 
Thanks,
Mike
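
For what it's worth, the sbin scripts (via sbin/spark-daemon.sh) do write
PID files, by default under /tmp as
spark-<user>-<daemon class>-<instance>.pid, so monit can watch those
rather than the ports alone. A minimal monitrc sketch follows; it assumes
Spark is installed under /opt/spark, the daemons run as user "spark" (the
PID file name embeds that user), and the default ports are in use (7077
for the master, 8081 for the worker web UI). All of these, including the
master-host name, are assumptions to adjust for a real install.

  check process spark-master with pidfile /tmp/spark-spark-org.apache.spark.deploy.master.Master-1.pid
    start program = "/opt/spark/sbin/start-master.sh"
    stop program  = "/opt/spark/sbin/stop-master.sh"
    if failed host 127.0.0.1 port 7077 then restart

  check process spark-worker with pidfile /tmp/spark-spark-org.apache.spark.deploy.worker.Worker-1.pid
    # Driving spark-daemon.sh directly; the start-slave.sh wrapper's
    # argument form varies between Spark versions.
    start program = "/opt/spark/sbin/spark-daemon.sh start org.apache.spark.deploy.worker.Worker 1 spark://master-host:7077"
    stop program  = "/opt/spark/sbin/spark-daemon.sh stop org.apache.spark.deploy.worker.Worker 1"
    if failed host 127.0.0.1 port 8081 then restart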


Strategy to automatically configure spark workers env params in standalone mode

2015-02-14 Thread Mike Sam
We are planning to use servers of varying specs (32 GB, 64 GB, 244 GB of
RAM or even higher, and varying core counts) for a standalone deployment
of Spark, but we do not know the spec of a given server ahead of time. We
need to script some logic that runs on the server at boot and
automatically sets the following params based on what it reads from the
OS about cores and memory:

SPARK_WORKER_CORES
SPARK_WORKER_MEMORY
SPARK_WORKER_INSTANCES

What logic could such a script use, based on the memory size and number
of cores it sees? In other words, what are the recommended rules of thumb
for dividing up a server (especially one with a lot of RAM) without
knowing the Spark application and data size ahead of time?

Thanks,
Mike
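
As a sketch of what such boot logic could look like in conf/spark-env.sh
(which the standalone scripts source on startup): reserve a slice of RAM
for the OS, cap per-worker memory so individual executor JVM heaps stay
small enough for reasonable garbage-collection behavior, and derive the
instance count from that cap. The 10% reserve and the 64 GB cap below are
illustrative rules of thumb to tune, not Spark defaults.

  # Derive worker settings from the machine at boot (Linux).
  TOTAL_CORES=$(nproc)
  TOTAL_MEM_GB=$(awk '/MemTotal/ {printf "%d", $2/1024/1024}' /proc/meminfo)

  # Reserve ~10% of RAM (at least 1 GB) for the OS and other daemons.
  OS_RESERVE_GB=$(( TOTAL_MEM_GB / 10 ))
  [ "$OS_RESERVE_GB" -lt 1 ] && OS_RESERVE_GB=1
  USABLE_MEM_GB=$(( TOTAL_MEM_GB - OS_RESERVE_GB ))

  # Cap each worker at ~64 GB and add instances instead, so no single
  # executor JVM ends up with an enormous heap.
  MAX_WORKER_MEM_GB=64
  INSTANCES=$(( (USABLE_MEM_GB + MAX_WORKER_MEM_GB - 1) / MAX_WORKER_MEM_GB ))

  export SPARK_WORKER_INSTANCES=$INSTANCES
  export SPARK_WORKER_MEMORY="$(( USABLE_MEM_GB / INSTANCES ))g"
  export SPARK_WORKER_CORES=$(( TOTAL_CORES / INSTANCES ))

On a 244 GB, 32-core box this yields 4 workers of about 55 GB and 8 cores
each; on a 32 GB box, a single worker with everything but the reserve.
Note that SPARK_WORKER_MEMORY is per instance, so the instances together
consume the usable total.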


single worker vs multiple workers on each machine

2014-09-11 Thread Mike Sam
Hi There,

I am new to Spark. When there is a lot of memory on each machine of the
cluster, is it better to run multiple workers with limited memory on each
machine, or a single worker with access to the majority of the machine's
memory? If the answer is "it depends," would you please elaborate?

Thanks,
Mike
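
One common way to frame the trade-off, with hypothetical numbers for a
128 GB, 32-core machine (SPARK_WORKER_MEMORY is per worker instance, so
both layouts below offer the same total): a single large worker lets an
application get one very large executor JVM on that machine, and very
large JVM heaps tend to suffer long garbage-collection pauses; several
smaller workers keep each executor heap modest, at the cost of a few more
JVMs and some per-worker overhead.

  # Layout A: one worker owning most of the machine.
  export SPARK_WORKER_INSTANCES=1
  export SPARK_WORKER_MEMORY=120g
  export SPARK_WORKER_CORES=32

  # Layout B: four smaller workers, 30 GB and 8 cores each (same totals).
  # Each executor JVM then stays at or below 30 GB, which generally keeps
  # GC pauses shorter.
  # export SPARK_WORKER_INSTANCES=4
  # export SPARK_WORKER_MEMORY=30g
  # export SPARK_WORKER_CORES=8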