Re: Sharing Spark Drivers
Hi John,

This would be a potential application for the Spark Kernel project (https://github.com/ibm-et/spark-kernel). The Spark Kernel serves as your driver application, allowing you to feed it snippets of Scala code (or load entire jars via magics) to execute against a Spark cluster. Although not technically supported, you can connect multiple applications to the same Spark Kernel instance so that they use the same resources (both on the cluster and on the driver).

If you're curious, you can find a getting-started section here: https://github.com/ibm-et/spark-kernel/wiki/Getting-Started-with-the-Spark-Kernel

Signed,
Chip Senkbeil

On Tue Feb 24 2015 at 8:04:08 AM John Omernik j...@omernik.com wrote:
[...]

- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
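Outside of Spark itself, the multiplexing Chip describes, several clients feeding code snippets to one long-lived process that owns the shared state, can be sketched in plain Python. The names and structure here are illustrative only, not the Spark Kernel's actual API; the point is that a single worker thread serializes execution against one shared namespace, the way one shared driver would own one SparkContext:

```python
import threading
import queue

class SharedKernel:
    """One long-lived 'driver' thread that owns a shared namespace and
    executes code snippets submitted by any number of clients."""

    def __init__(self):
        self.namespace = {}           # shared state, standing in for a shared SparkContext
        self.requests = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        while True:
            snippet, reply = self.requests.get()
            if snippet is None:       # shutdown sentinel
                break
            try:
                # Execute against the shared namespace; funneling every
                # snippet through one queue/thread avoids races.
                exec(snippet, self.namespace)
                reply.put(("ok", None))
            except Exception as e:
                reply.put(("error", e))

    def submit(self, snippet):
        reply = queue.Queue()
        self.requests.put((snippet, reply))
        return reply.get()

    def shutdown(self):
        self.requests.put((None, None))
        self.worker.join()

# Two "applications" sharing the same kernel (and thus the same state):
kernel = SharedKernel()
kernel.submit("total = 0")
kernel.submit("total += 40")   # client A
kernel.submit("total += 2")    # client B builds on A's state
result = kernel.namespace["total"]
kernel.shutdown()
```

The design choice worth noting is that the clients never touch the shared state directly; they only submit work, which is what makes attaching a second application to a running kernel safe in principle.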
Sharing Spark Drivers
I have been posting on the Mesos list, as I am looking to see whether it's possible to share Spark drivers. Obviously, in standalone cluster mode, the Master handles requests, and you can instantiate a new SparkContext against a currently running master. However, in Mesos (and perhaps YARN) I don't see how this is possible, and I am curious why.

It could make quite a bit of sense to have one driver act as a master, running as a certain user (ideally running out in the Mesos cluster, which I believe Tim Chen is working on). That driver could belong to a user and serve as a long-term, resource-controlled instance that the user could use for ad hoc queries. Running many little drivers out on the cluster seems like a waste of driver resources, as each driver would be using the same resources, and rarely would many be used at once (if they were for a user's ad hoc environment). Additionally, the advantages of a shared driver seem to grow as a user comes back to the environment over and over again.

Does this make sense? I really want to understand how looking at it this way is wrong, either from a Spark paradigm perspective or a technological perspective. I will grant that I am coming from a traditional background, so some older ideas for how to set things up may be creeping into my thinking, but if that's the case, I'd love to understand better.

Thanks!
John
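Even when attaching to a running standalone master, Spark allows only one active SparkContext per driver process, so "sharing a driver" in practice usually means guarding the context behind a get-or-create singleton that many ad hoc callers reuse. A minimal sketch of that guard, where `DriverContext` is a hypothetical stand-in for a heavyweight context and the master URL is illustrative:

```python
import threading

class DriverContext:
    """Stand-in for an expensive driver-side context (think: a SparkContext
    pointed at spark://master:7077). Costly to create, so share one."""

    _instance = None
    _lock = threading.Lock()

    def __init__(self, master_url):
        self.master_url = master_url

    @classmethod
    def get_or_create(cls, master_url):
        # Double-checked locking: many ad hoc callers, one shared context.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = cls(master_url)
        return cls._instance

# Two ad hoc queries from the same user reuse one long-lived "driver":
a = DriverContext.get_or_create("spark://master:7077")
b = DriverContext.get_or_create("spark://master:7077")
```

This only shares within one process; the harder problem raised in the thread, sharing one driver across separate applications on Mesos or YARN, needs an out-of-process service in front of the context, which is what projects like the Spark Kernel provide.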
Re: Sharing Spark Drivers
I am aware of that, but two things are working against me here with spark-kernel: Python is our language, and we are really looking for a supported way to approach this for the enterprise. I like the concept; it just doesn't work for us given our constraints.

This does raise an interesting point, though: if side projects are spinning up to support this, why not make it a feature of the main project? Or is it just so esoteric that it's not important for the main project to look into?

On Tue, Feb 24, 2015 at 9:25 AM, Chip Senkbeil chip.senkb...@gmail.com wrote:
[...]