Re: Sharing Spark Drivers

2015-02-24 Thread Chip Senkbeil
Hi John,

This would be a potential application for the Spark Kernel project (
https://github.com/ibm-et/spark-kernel). The Spark Kernel serves as your
driver application, allowing you to feed it snippets of Scala code (or load
entire jars via magics) to execute against a Spark cluster.
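
For example, here is a rough sketch of the kind of snippet you could send
to the kernel for evaluation. It assumes the kernel has already bound a
SparkContext for you as sc, and the input path is just a placeholder for
illustration:

    // Hypothetical word count evaluated by the kernel against the cluster;
    // `sc` is assumed to be the SparkContext the kernel provides.
    val counts = sc.textFile("hdfs:///data/sample.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.take(10).foreach(println)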

Although not officially supported, you can connect multiple applications
to the same Spark Kernel instance so that they share the same resources
(both on the cluster and on the driver).

If you're curious, you can find a getting started section here:
https://github.com/ibm-et/spark-kernel/wiki/Getting-Started-with-the-Spark-Kernel

Signed,
Chip Senkbeil





Sharing Spark Drivers

2015-02-24 Thread John Omernik
I have been posting on the Mesos list, as I am trying to find out whether
it's possible to share Spark drivers.  Obviously, in standalone cluster
mode the master handles requests, and you can point a new SparkContext at
a currently running master. However, in Mesos (and perhaps YARN) I don't
see how this is possible.
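
For concreteness, here is a rough sketch of what I mean in standalone
mode: each application points its own SparkContext at the already-running
master (the host and port below are placeholders), but each application
still carries its own driver:

    import org.apache.spark.{SparkConf, SparkContext}

    // Each application creates its own SparkContext against the same
    // standalone master; the cluster is shared, the driver is not.
    val conf = new SparkConf()
      .setMaster("spark://master-host:7077")  // placeholder master URL
      .setAppName("adhoc-app")

    val sc = new SparkContext(conf)
    // ... run ad hoc jobs ...
    sc.stop()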

I guess I am curious as to why not. It could make quite a bit of sense to
have one driver act as a master, running as a certain user (ideally
running out in the Mesos cluster, which I believe Tim Chen is working
on). That driver could belong to a user and serve as a long-term,
resource-controlled instance that the user could use for ad hoc queries.
Running many little drivers out on the cluster seems like a waste of
resources: each driver would consume the same resources, and rarely would
many be in use at once if they only served a user's ad hoc environment.
Additionally, the advantages of a shared driver would keep paying off as
the user comes back to the environment over and over again.
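
To make the idea concrete, here is a rough sketch of the kind of
long-lived, per-user driver I have in mind: one process holding one
SparkContext that gets reused for many ad hoc jobs (the names and master
URL are made up for illustration):

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: a single long-running driver owned by one user, with one
    // SparkContext reused for every ad hoc query that user submits.
    object SharedUserDriver {
      private val conf = new SparkConf()
        .setMaster("mesos://zk://zk-host:2181/mesos")  // placeholder
        .setAppName("adhoc-driver-for-some-user")

      lazy val sc = new SparkContext(conf)

      // Each ad hoc query is just a function handed the shared context.
      def runAdhoc[T](job: SparkContext => T): T = job(sc)

      def shutdown(): Unit = sc.stop()
    }

    // e.g. SharedUserDriver.runAdhoc(sc => sc.textFile("hdfs:///logs").count())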

Does this make sense? I really want to understand where looking at it
this way goes wrong, whether from a Spark paradigm perspective or a
technological perspective.  I will grant that I am coming from a
traditional background, so some older ideas about how to set things up
may be creeping into my thinking, but if that's the case, I'd love to
understand better.

Thanks!

John




Re: Sharing Spark Drivers

2015-02-24 Thread John Omernik
I am aware of that, but two things are working against me here with
spark-kernel: Python is our language, and we are really looking for a
supported, enterprise-ready way to approach this.  I like the concept;
it just doesn't work for us given our constraints.

This does raise an interesting point, though: if side projects are
spinning up to support this, why not make it a feature of the main
project? Or is it just so esoteric that it isn't important for the main
project to look into?


