Re: Sharing Spark Drivers

2015-02-24 Thread Chip Senkbeil
Hi John,

This would be a potential application for the Spark Kernel project (
https://github.com/ibm-et/spark-kernel). The Spark Kernel serves as your
driver application, allowing you to feed it snippets of Scala code (or load
entire jars via magics) to execute against a Spark cluster.

Although not technically supported, you can connect multiple applications
to the same Spark Kernel instance to use the same resources (both on the
cluster and on the driver).
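
Within a single shared driver (whether the Spark Kernel or your own
application), concurrent jobs from different users can also be isolated with
Spark's fair scheduler pools. The following is only a rough Scala sketch
against the standard Spark API, not Spark Kernel code; the app name, pool
naming, and per-user method are assumptions for illustration:

    import org.apache.spark.{SparkConf, SparkContext}

    // One long-lived driver; FAIR scheduling lets concurrent jobs share the
    // resources this driver holds instead of starving one another.
    val conf = new SparkConf()
      .setAppName("shared-driver")
      .set("spark.scheduler.mode", "FAIR")
    val sc = new SparkContext(conf)

    // Each user/session submits work under its own pool (pool names made up).
    def runForUser(user: String): Long = {
      sc.setLocalProperty("spark.scheduler.pool", s"pool-$user")
      try sc.parallelize(1 to 1000000).filter(_ % 2 == 0).count()
      finally sc.setLocalProperty("spark.scheduler.pool", null)
    }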

If you're curious, you can find a getting started section here:
https://github.com/ibm-et/spark-kernel/wiki/Getting-Started-with-the-Spark-Kernel

Signed,
Chip Senkbeil

On Tue Feb 24 2015 at 8:04:08 AM John Omernik j...@omernik.com wrote:

 I have been posting on the Mesos list, as I am looking to see whether
 it's possible to share Spark drivers. Obviously, in standalone cluster
 mode, the Master handles requests, and you can instantiate a new
 SparkContext against a currently running Master. However, in Mesos (and
 perhaps YARN) I don't see how this is possible.

 I guess I am curious as to why. It could make quite a bit of sense to
 have one driver act as a master, running as a certain user (ideally
 running out in the Mesos cluster, which I believe Tim Chen is working
 on). That driver could belong to a user and serve as a long-term,
 resource-controlled instance for that user's ad hoc queries. Running
 many little drivers out on the cluster seems like a waste of driver
 resources, since each driver would be using the same resources and
 rarely would many be in use at once (if they were for a user's ad hoc
 environment). Additionally, the advantages of a shared driver seem to
 pay off as a user comes back to the environment over and over again.

 Does this make sense? I really want to understand how looking at it
 this way is wrong, either from a Spark paradigm perspective or a
 technological perspective. I will grant that I am coming from a
 traditional background, so some of the older ideas for how to set
 things up may be creeping into my thinking, but if that's the case,
 I'd love to understand better.

 Thanks!

 John





Re: spark driver behind firewall

2015-02-06 Thread Chip Senkbeil
Hi,

You can use the Spark Kernel project (https://github.com/ibm-et/spark-kernel)
as a workaround of sorts. The Spark Kernel provides a generic solution to
dynamically interact with an Apache Spark cluster (think of a remote Spark
Shell). It serves as the driver application with which you can send Scala
code to interact with Apache Spark. You would still need to expose the
Spark Kernel outside the firewall (similar to Kostas' suggestion about the
jobserver), of course.

Signed,
Chip Senkbeil

On Thu Feb 05 2015 at 11:07:28 PM Kostas Sakellis kos...@cloudera.com
wrote:

 Yes, the driver has to be able to accept incoming connections. All the
 executors connect back to the driver to send heartbeats, map statuses, and
 metrics. It is critical, and I don't know of a way around it. You could look
 into using something like
 https://github.com/spark-jobserver/spark-jobserver, which could run outside
 the firewall. Then from inside the firewall you can make REST calls to the
 server.
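
 If the firewall can be opened for a small, known set of inbound ports (rather
 than none at all), another partial workaround is to pin the ports the driver
 listens on so the rules only need to cover those. A rough Scala sketch,
 assuming Spark 1.x configuration keys (spark.driver.host, spark.driver.port,
 spark.blockManager.port; check the configuration docs for your version) and
 placeholder host/port values:

    import org.apache.spark.{SparkConf, SparkContext}

    // Pin the ports executors connect back to, so firewall rules can allow
    // just these inbound ports on the submitting machine (values are placeholders).
    val conf = new SparkConf()
      .setAppName("firewalled-driver")
      .set("spark.driver.host", "driver.example.com") // address reachable from the cluster
      .set("spark.driver.port", "51000")              // RPC port executors connect to
      .set("spark.blockManager.port", "51001")        // block manager traffic
    val sc = new SparkContext(conf)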

 On Thu, Feb 5, 2015 at 5:03 PM, Kane Kim kane.ist...@gmail.com wrote:

 I submit Spark jobs from a machine behind a firewall and can't open any
 incoming connections to that box. Does the driver absolutely need to accept
 incoming connections? Is there any workaround for that case?

 Thanks.





Re: How to design a long-lived Spark application

2015-02-05 Thread Chip Senkbeil
Hi,

You can also check out the Spark Kernel project:
https://github.com/ibm-et/spark-kernel

It can plug into the upcoming IPython 3.0 notebook (providing a Scala/Spark
language interface) and provides an API to submit code snippets (like the
Spark Shell) and get results directly back, rather than having to write out
your results elsewhere. A client library (
https://github.com/ibm-et/spark-kernel/wiki/Guide-for-the-Spark-Kernel-Client)
is available in Scala so you can create applications that can interactively
communicate with Apache Spark.

You can find a getting started section here:
https://github.com/ibm-et/spark-kernel/wiki/Getting-Started-with-the-Spark-Kernel

If you have any more questions about the project, feel free to email me!

Signed,
Chip Senkbeil

On Thu Feb 05 2015 at 10:58:01 AM Corey Nolet cjno...@gmail.com wrote:

 Here's another lightweight example of running a SparkContext in a common
 java servlet container: https://github.com/calrissian/spark-jetty-server

 On Thu, Feb 5, 2015 at 11:46 AM, Charles Feduke charles.fed...@gmail.com
 wrote:

 If you want to design something like the Spark shell, have a look at:

 http://zeppelin-project.org/

 It's open source and may already do what you need. If not, its source code
 will be helpful in answering your questions about how to integrate with
 long-running jobs.


 On Thu Feb 05 2015 at 11:42:56 AM Boromir Widas vcsub...@gmail.com
 wrote:

 You can check out https://github.com/spark-jobserver/spark-jobserver -
 this allows several users to upload their jars and run jobs through a REST
 interface.

 However, if all users are using the same functionality, you can write a
 simple spray server that acts as the driver and hosts the Spark
 context and RDDs, launched in client mode.
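
 To make that concrete, here is a minimal Scala sketch of the driver-side core
 of such a server, with the HTTP layer left out; the object name, data path,
 and method are placeholders, not a specific framework's API:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD

    // One long-lived SparkContext plus a cached RDD that request handlers reuse.
    object SharedSpark {
      val sc = new SparkContext(new SparkConf().setAppName("long-lived-driver"))

      // Loaded and cached once; subsequent jobs read it from memory.
      val baseData: RDD[String] = sc.textFile("/data/events").cache()

      // Called per incoming request, e.g. from an HTTP handler.
      def countMatching(keyword: String): Long =
        baseData.filter(_.contains(keyword)).count()
    }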

 On Thu, Feb 5, 2015 at 10:25 AM, Shuai Zheng szheng.c...@gmail.com
 wrote:

 Hi All,



 I want to develop a server-side application:

 User submits a request -> the server runs a Spark job and returns the result
 (this might take a few seconds).

 So I want to host the server and keep a long-lived context; I don't know
 whether this is reasonable or not.

 Basically, I try to have a global JavaSparkContext instance, keep it there,
 and initialize some RDDs. Then my Java application uses it to submit jobs.

 So now I have some questions:

 1. If I don't close it, is there any timeout I need to configure on the
 Spark server?

 2. In theory I want to design something similar to the Spark shell (which
 also hosts a default sc), just not shell-based.

 Any suggestions? I think this is a very common requirement for application
 development; surely someone has done it before?

 Regards,

 Shawn






Re: SVD in pyspark ?

2015-01-25 Thread Chip Senkbeil
Hi Andreas,

With regard to the notebook interface, you can use the Spark Kernel (
https://github.com/ibm-et/spark-kernel) as the backend for an IPython 3.0
notebook. The kernel is designed to be the foundation for interactive
applications connecting to Apache Spark, and it communicates using version
5.0 of the IPython message protocol - the version used by IPython 3.0.

See the getting started section here:
https://github.com/ibm-et/spark-kernel/wiki/Getting-Started-with-the-Spark-Kernel

It discusses getting IPython connected to a Spark Kernel. If you have any
more questions, feel free to ask!

Signed,
Chip Senkbeil
IBM Emerging Technologies Software Engineer

On Sun Jan 25 2015 at 1:12:32 PM Andreas Rhode m.a.rh...@gmail.com wrote:

 Is the distributed SVD functionality exposed to Python yet?

 It seems to be available only in Scala or Java, unless I am missing something;
 I'm looking for a pyspark equivalent to
 org.apache.spark.mllib.linalg.SingularValueDecomposition.

 In case it's not there yet, is there a way to write a wrapper to call from
 Python into the corresponding Java/Scala code? The reason for using Python
 instead of Scala directly is that I'd like to take advantage of the
 notebook interface for visualization.
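
 (For reference, the Scala side that such a wrapper would need to reach is
 RowMatrix.computeSVD, which returns a SingularValueDecomposition. A minimal
 sketch, assuming an existing SparkContext named sc and toy data:

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    // Build a distributed row matrix from an RDD of vectors (toy data).
    val rows = sc.parallelize(Seq(
      Vectors.dense(1.0, 2.0, 3.0),
      Vectors.dense(4.0, 5.0, 6.0),
      Vectors.dense(7.0, 8.0, 9.0)))
    val mat = new RowMatrix(rows)

    // Top-2 singular values/vectors; computeU = true also materializes U.
    val svd = mat.computeSVD(2, computeU = true)
    val s = svd.s // local Vector of singular values
    val V = svd.V // local dense Matrix of right singular vectors
    val U = svd.U // distributed RowMatrix of left singular vectors

 A pyspark wrapper would have to expose these calls through the JVM gateway.)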

 As an aside, is there an IPython-notebook-like interface for the Scala-based REPL?

 Thanks

 Andreas



