Trying to keep the answer short and simple...

On Wed, Jun 22, 2016 at 1:19 PM, Michael Segel <msegel_had...@hotmail.com> wrote:
> But this gets to the question… what are the real differences between client
> and cluster modes?
> What are the pros/cons and use cases where one has advantages over the
> other?

- client mode requires that the process that launched the app remain alive. That means the host where it lives has to stay up, and it may not be super-friendly to ssh sessions dying, for example, unless you use nohup.
- client mode driver logs are printed to stderr by default. Yes, you can change that, but in cluster mode they're all collected by YARN without any user intervention.
- if your edge node (from where the app is launched) isn't really part of the cluster (e.g., it lives in an outside network with firewalls or higher latency), you may run into issues.
- in cluster mode, your driver's CPU / memory usage is accounted for in YARN; this matters if your edge node is part of the cluster (and could be running YARN containers), since in client mode your driver will potentially use a lot of memory / CPU.
- finally, in cluster mode YARN can restart your application without user intervention. This is useful for things that need to stay up (think of a long-running streaming job, for example).

--
Marcelo
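
For reference, here is a rough sketch of how the points above might map onto launch commands. It only builds the argument lists and does not run anything. The helper name `spark_submit_command` and the app name `my_app.py` are made up for illustration. The flags `--master yarn` and `--deploy-mode`, and the `spark.yarn.maxAppAttempts` setting, are standard spark-submit / Spark-on-YARN options, but check them against the docs for your version before relying on this.

```python
def spark_submit_command(app, deploy_mode, max_app_attempts=None):
    """Build a spark-submit argument list for YARN (hypothetical helper).

    app              -- path to the application (placeholder value in the demo)
    deploy_mode      -- "client" or "cluster"
    max_app_attempts -- optional cap on YARN application attempts; only
                        added in cluster mode, where YARN hosts the driver
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")

    cmd = ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode]

    if deploy_mode == "cluster":
        # The driver runs inside YARN, so YARN can retry the application.
        if max_app_attempts is not None:
            cmd += ["--conf", f"spark.yarn.maxAppAttempts={max_app_attempts}"]
    else:
        # The driver lives in the launching process; nohup keeps it
        # alive if the ssh session that started it goes away.
        cmd = ["nohup"] + cmd

    cmd.append(app)
    return cmd


if __name__ == "__main__":
    # "my_app.py" is a placeholder application name.
    print(" ".join(spark_submit_command("my_app.py", "client")))
    print(" ".join(spark_submit_command("my_app.py", "cluster", max_app_attempts=4)))
```

In client mode you would still need to redirect stderr yourself if you want the driver logs kept anywhere; in cluster mode they end up in YARN's collected logs.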