We want to run multiple instances of the Spark SQL CLI on our YARN cluster, with
each instance used by a different user. This would be non-optimal if each user
brings up a separate CLI, given how Spark works on YARN: executor processes run
(and hence consume resources) on worker nodes for the lifetime of the
application. Imagine each user trying to cache a table in memory when there is
limited memory across the cluster. The right approach seems to be sharing the
same SparkContext across the different users and running just one Spark SQL
application.
Is my understanding correct about resource usage on YARN for Spark SQL? Is
there currently a way to share a SparkContext like this? It seems like it would
need some kind of Thrift interface hooked into the CLI driver.
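
To make the idea concrete, here is a minimal sketch of what client access could look like if such a Thrift (HiveServer2-style) endpoint were exposed by the one shared Spark SQL application. The host, port, user, and table name are placeholders, and it assumes the Hive JDBC driver (org.apache.hive:hive-jdbc) is on the classpath; this is not a description of an existing Spark feature, just the access pattern I have in mind:

    // Sketch only: each user opens a JDBC session against the shared endpoint,
    // but all queries execute inside the single long-lived Spark application,
    // so executors (and cached data) are not duplicated per user.
    import java.sql.DriverManager

    object SharedSqlClient {
      def main(args: Array[String]): Unit = {
        Class.forName("org.apache.hive.jdbc.HiveDriver")
        // "sql-gateway-host:10000" is a hypothetical endpoint for illustration.
        val conn = DriverManager.getConnection(
          "jdbc:hive2://sql-gateway-host:10000/default", "user1", "")
        try {
          val stmt = conn.createStatement()
          // The table would be cached in the shared application's memory,
          // rather than once per user as with separate CLI instances.
          stmt.execute("CACHE TABLE events")
          val rs = stmt.executeQuery("SELECT COUNT(*) FROM events")
          while (rs.next()) println(s"rows = ${rs.getLong(1)}")
        } finally {
          conn.close()
        }
      }
    }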

*Apologies if you have already seen this on the user group.*
