Re: Sharing Spark Drivers
Hi John,

This would be a potential application for the Spark Kernel project (https://github.com/ibm-et/spark-kernel). The Spark Kernel serves as your driver application, allowing you to feed it snippets of Scala code (or load entire jars via magics) to execute against a Spark cluster. Although not technically supported, you can connect multiple applications to the same Spark Kernel instance so that they share the same resources (both on the cluster and on the driver).

If you're curious, you can find a getting started section here: https://github.com/ibm-et/spark-kernel/wiki/Getting-Started-with-the-Spark-Kernel

Signed,
Chip Senkbeil

On Tue Feb 24 2015 at 8:04:08 AM John Omernik j...@omernik.com wrote:

I have been posting on the Mesos list, as I am looking to see whether it's possible to share Spark drivers. Obviously, in standalone cluster mode, the master handles requests, and you can instantiate a new SparkContext against a currently running master (a sketch of that standalone pattern follows below). However, in Mesos (and perhaps YARN) I don't see how this is possible, and I am curious why.

It could make quite a bit of sense to have one driver act as a master, running as a certain user (ideally running out in the Mesos cluster, which I believe Tim Chen is working on). That driver could belong to a user and serve as a long-term, resource-controlled instance the user could use for ad hoc queries. Running many little drivers out on the cluster seems like a waste of driver resources, since each driver would consume the same resources and rarely would many be in use at once (if they were for a user's ad hoc environment). Additionally, the advantages of a shared driver compound as a user comes back to the environment over and over again.

Does this make sense? I really want to understand how looking at it this way is wrong, either from a Spark paradigm perspective or a technological perspective. I will grant that I am coming from a traditional background, so some older ideas about how to set things up may be creeping into my thinking, but if that's the case, I'd love to understand better.

Thanks!
John
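For contrast with the Mesos case, here is a minimal sketch of the standalone-mode pattern John describes. The master URL and app name are placeholders, and note that each such application still runs its own driver process; only the cluster resources behind the master are shared:

    import org.apache.spark.{SparkConf, SparkContext}

    // Each application builds its own SparkContext against the shared
    // standalone master: the master arbitrates cluster resources, but
    // every application still runs its own driver process.
    object AdhocSession {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setMaster("spark://master-host:7077") // placeholder master URL
          .setAppName("adhoc-user-session")
        val sc = new SparkContext(conf)

        val evens = sc.parallelize(1 to 1000).filter(_ % 2 == 0).count()
        println(s"Even numbers: $evens")

        sc.stop() // the driver and its resources go away with the app
      }
    }

What the Spark Kernel adds on top of this is keeping one such driver alive and letting multiple clients feed it code, so the per-application driver cost is paid once.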
Re: spark driver behind firewall
Hi,

You can use the Spark Kernel project (https://github.com/ibm-et/spark-kernel) as a workaround of sorts. The Spark Kernel provides a generic solution for dynamically interacting with an Apache Spark cluster (think of a remote Spark Shell). It serves as the driver application, and you send it Scala code to interact with Apache Spark. You would still need to expose the Spark Kernel outside the firewall (similar to Kostas' suggestion about the jobserver), of course.

Signed,
Chip Senkbeil

On Thu Feb 05 2015 at 11:07:28 PM Kostas Sakellis kos...@cloudera.com wrote:

Yes, the driver has to be able to accept incoming connections. All the executors connect back to the driver, sending heartbeats, map status, and metrics. It is critical, and I don't know of a way around it. You could look into using something like https://github.com/spark-jobserver/spark-jobserver, which could run outside the firewall. Then from inside the firewall you can make REST calls to the server (a rough sketch follows below).

On Thu, Feb 5, 2015 at 5:03 PM, Kane Kim kane.ist...@gmail.com wrote:

I submit Spark jobs from a machine behind a firewall, and I can't open any incoming connections to that box. Does the driver absolutely need to accept incoming connections? Is there any workaround for that case? Thanks.
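To illustrate Kostas' jobserver suggestion, here is a rough sketch of the inside-the-firewall side. The endpoint shape and the appName/classPath parameters follow the spark-jobserver README as I recall it, so treat them as assumptions; the host, app name, and job class are placeholders:

    import java.net.{HttpURLConnection, URL}
    import scala.io.Source

    // Submit a job to a spark-jobserver running *outside* the firewall;
    // only an outbound HTTP connection from this box is needed.
    object JobClient {
      def main(args: Array[String]): Unit = {
        // appName refers to a jar previously uploaded to the jobserver;
        // classPath names the job class inside it (placeholders here).
        val endpoint = "http://jobserver-host:8090/jobs" +
          "?appName=myApp&classPath=com.example.MyJob"

        val conn = new URL(endpoint).openConnection().asInstanceOf[HttpURLConnection]
        conn.setRequestMethod("POST")
        conn.setDoOutput(true)

        // Job input goes in the body as Typesafe Config text.
        val out = conn.getOutputStream
        out.write("input.string = hello world".getBytes("UTF-8"))
        out.close()

        // Response is JSON describing the submitted job (status, job ID).
        println(Source.fromInputStream(conn.getInputStream).mkString)
        conn.disconnect()
      }
    }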
Re: How to design a long live spark application
Hi,

You can also check out the Spark Kernel project: https://github.com/ibm-et/spark-kernel

It can plug into the upcoming IPython 3.0 notebook (providing a Scala/Spark language interface) and provides an API to submit code snippets (like the Spark Shell) and get results directly back, rather than having to write your results out elsewhere. A client library (https://github.com/ibm-et/spark-kernel/wiki/Guide-for-the-Spark-Kernel-Client) is available in Scala, so you can create applications that communicate interactively with Apache Spark.

You can find a getting started section here: https://github.com/ibm-et/spark-kernel/wiki/Getting-Started-with-the-Spark-Kernel

If you have any more questions about the project, feel free to email me!

Signed,
Chip Senkbeil

On Thu Feb 05 2015 at 10:58:01 AM Corey Nolet cjno...@gmail.com wrote:

Here's another lightweight example of running a SparkContext in a common Java servlet container: https://github.com/calrissian/spark-jetty-server

On Thu, Feb 5, 2015 at 11:46 AM, Charles Feduke charles.fed...@gmail.com wrote:

If you want to design something like the Spark shell, have a look at http://zeppelin-project.org/. It's open source and may already do what you need. If not, its source code will be helpful in answering your questions about how to integrate with long-running jobs.

On Thu Feb 05 2015 at 11:42:56 AM Boromir Widas vcsub...@gmail.com wrote:

You can check out https://github.com/spark-jobserver/spark-jobserver - this allows several users to upload their jars and run jobs through a REST interface. However, if all users need the same functionality, you can write a simple Spray server that acts as the driver and hosts the Spark context and RDDs, launched in client mode (a sketch of this pattern follows below).

On Thu, Feb 5, 2015 at 10:25 AM, Shuai Zheng szheng.c...@gmail.com wrote:

Hi All,

I want to develop a server-side application: a user submits a request, the server runs a Spark application and returns the result (this might take a few seconds). So I want the server to keep a long-lived context; I don't know whether this is reasonable or not. Basically I try to have a global JavaSparkContext instance and keep it there, along with some initialized RDDs. Then my Java application will use it to submit jobs. So now I have some questions:

1. If I don't close the context, is there any timeout I need to configure on the Spark server?
2. In theory I want to design something similar to the Spark shell (which also hosts a default sc), just not shell-based. Any suggestions?

I think my requirement is very common for application development; surely someone has done this before?

Regards,
Shawn
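As a sketch of the long-lived context Shuai describes (and the server-hosted-driver pattern Boromir mentions), something along these lines might work. The object and method names here are hypothetical; only the SparkContext and RDD calls are standard API, and the master URL is a placeholder:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD

    // Hypothetical process-wide holder: one SparkContext for the life of
    // the server. There is no idle timeout on the context itself; it runs
    // until stop() is called (though executors may be reclaimed if the
    // cluster uses dynamic allocation).
    object SharedSpark {
      lazy val sc: SparkContext = new SparkContext(
        new SparkConf()
          .setMaster("spark://master-host:7077") // placeholder master URL
          .setAppName("long-lived-server"))

      // RDD built once and cached, then reused by every request.
      lazy val data: RDD[Int] = sc.parallelize(1 to 1000000).cache()

      // Called from the server's request handler; reuses the shared
      // context instead of creating one per request.
      def handleRequest(threshold: Int): Long =
        data.filter(_ > threshold).count()
    }

The design point is that SparkContext creation is expensive and only one active context per JVM is supported, so the server should own exactly one and serialize or queue jobs onto it rather than constructing contexts per request.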
Re: SVD in pyspark ?
Hi Andreas,

With regard to the notebook interface, you can use the Spark Kernel (https://github.com/ibm-et/spark-kernel) as the backend for an IPython 3.0 notebook. The kernel is designed to be the foundation for interactive applications connecting to Apache Spark, and it communicates using version 5.0 of the IPython message protocol, the version used by IPython 3.0.

See the getting started section here: https://github.com/ibm-et/spark-kernel/wiki/Getting-Started-with-the-Spark-Kernel

It discusses getting IPython connected to a Spark Kernel. If you have any more questions, feel free to ask!

Signed,
Chip Senkbeil
IBM Emerging Technologies Software Engineer

On Sun Jan 25 2015 at 1:12:32 PM Andreas Rhode m.a.rh...@gmail.com wrote:

Is the distributed SVD functionality exposed to Python yet? It seems to be available only from Scala or Java (see the Scala example below), unless I am missing something; I'm looking for a pyspark equivalent to org.apache.spark.mllib.linalg.SingularValueDecomposition.

In case it's not there yet, is there a way to write a wrapper to call from Python into the corresponding Java/Scala code? The reason for using Python instead of Scala directly is that I'd like to take advantage of the notebook interface for visualization. As an aside, is there an IPython-notebook-like interface for the Scala-based REPL?

Thanks
Andreas
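For reference, here is a minimal sketch of the Scala API Andreas is asking about, using RowMatrix.computeSVD from MLlib; it should run as-is in the Spark shell, where a SparkContext named sc already exists:

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    // Build a small distributed matrix from rows of dense vectors.
    val rows = sc.parallelize(Seq(
      Vectors.dense(1.0, 2.0, 3.0),
      Vectors.dense(4.0, 5.0, 6.0),
      Vectors.dense(7.0, 8.0, 9.0)))
    val mat = new RowMatrix(rows)

    // Compute the top-2 singular values/vectors; computeU = true also
    // materializes U as a distributed RowMatrix.
    val svd = mat.computeSVD(2, computeU = true)
    println(svd.s) // singular values (local vector)
    println(svd.V) // right singular vectors (local matrix)
    svd.U.rows.take(3).foreach(println) // left singular vectors (distributed)

Pairing this with the Spark Kernel behind an IPython 3.0 notebook gives a notebook-driven Scala workflow, which addresses the visualization motivation even before a pyspark wrapper exists.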