We want to run multiple instances of the Spark SQL CLI on our YARN cluster, with each instance used by a different user. This would be non-optimal if each user brings up a separate CLI, given how Spark works on YARN: executor processes run (and hence consume resources) on worker nodes for the lifetime of the application. Imagine each user trying to cache a table in memory when memory across the cluster is limited.

The right approach seems to be to share a single SparkContext across multiple sessions and run just one Spark SQL application. Is my understanding of resource usage on YARN for spark-sql correct? And is there currently a way to share the Spark context? It seems like it would need some kind of Thrift interface hooked into the CLI driver.
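For context, the kind of setup I am imagining looks roughly like the following, assuming the HiveServer2-compatible Thrift JDBC server that ships with Spark (`sbin/start-thriftserver.sh`) and its `beeline` client; the host name and resource flags below are placeholders, and exact options may differ by Spark version:

```shell
# Start one long-running Spark SQL application on YARN; it holds the
# single shared SparkContext (and its executors) for all users.
./sbin/start-thriftserver.sh \
  --master yarn \
  --num-executors 8 \
  --executor-memory 4g

# Each user then connects with a lightweight JDBC client instead of
# launching a private spark-sql CLI (and a private set of executors).
# "thriftserver-host" is a placeholder; 10000 is the default port.
./bin/beeline -u jdbc:hive2://thriftserver-host:10000

# A table cached from one session lives in the shared executors,
# so it is not duplicated per user:
#   0: jdbc:hive2://thriftserver-host:10000> CACHE TABLE my_table;
```

With this layout, only one application's worth of executors is ever held on the cluster, regardless of how many users connect.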
*Apologies if you have already seen this on the user group.*