Re: is Mesos falling out of favor?
By the look of your config, I think there's something wrong with your setup. One of the key elements of Mesos is that you are abstracted from where your task actually executes. SPARK_EXECUTOR_URI tells Mesos where to find the 'framework' (in Mesos jargon) required to execute a job. (Strictly speaking, it tells the Spark driver to tell Mesos where to download the framework from.) Your config looks like you are running some mix of a Spark standalone cluster with Mesos.

This is an example of a Spark job set up to run on Mesos.

Driver:

    ADD_JARS=/.../job-jar-with-dependencies.jar SPARK_LOCAL_IP=IP \
      java -cp /.../spark-assembly.jar:/.../job-jar-with-dependencies.jar \
      -Dconfig.file=job-config.conf com.example.jobs.SparkJob

Config: job-config.conf contains this info on Mesos (note that the Mesos master URI is constructed from this config):

    #
    # Mesos configuration
    #
    mesos {
      zookeeper = {zookeeper.ip}
      executorUri = hdfs://${hdfs.nameNode.host}:${hdfs.nameNode.port}/spark/spark-0.9.0.1-bin.tar.gz
      master {
        host = {mesos-ip}
        port = 5050
      }
    }

This can probably still be improved, as it's the result of some trial-error-repeat, but it's working for us. (A sketch of how the driver side might consume this config appears after the quoted message below.)

-greetz, Gerard

On Wed, May 7, 2014 at 7:43 PM, deric barton.to...@gmail.com wrote:

> I'm running the 1.0.0 branch; finally I've managed to make it work. I'm using a Debian package which is distributed to all slave nodes. So, I've removed `SPARK_EXECUTOR_URI` and it works. spark-env.sh looks like this:
>
>     export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
>     export SCALA_HOME=/usr
>     export SCALA_LIBRARY_PATH=/usr/share/java
>     export MASTER=mesos://zk://192.168.1.1:2181/mesos
>     export SPARK_HOME=/usr/share/spark
>     export SPARK_LOCAL_IP=192.168.1.2
>     export SPARK_PRINT_LAUNCH_COMMAND=1
>     export CLASSPATH=$CLASSPATH:$SPARK_HOME/lib/
>
> Scripts for the Debian package are here (I'll try to add some documentation): https://github.com/deric/spark-deb-packaging
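[Editor's note] A minimal sketch of how a driver like Gerard's might wire that job-config.conf into its SparkContext, assuming a Spark 0.9.x-era API and the Typesafe Config library implied by -Dconfig.file. The object name, app name, and config keys mirror the example above; everything else is an illustrative assumption, not Gerard's actual code:

    // Hypothetical driver skeleton: reads the mesos.* keys from job-config.conf
    // (loaded via -Dconfig.file) and hands them to Spark. Spark 0.9.x-era API.
    import com.typesafe.config.ConfigFactory
    import org.apache.spark.{SparkConf, SparkContext}

    object SparkJob {
      def main(args: Array[String]): Unit = {
        val cfg = ConfigFactory.load() // picks up -Dconfig.file=job-config.conf

        // Build the Mesos master URL from the config, e.g. mesos://10.0.0.1:5050
        val host = cfg.getString("mesos.master.host")
        val port = cfg.getInt("mesos.master.port")
        val mesosMaster = s"mesos://$host:$port"

        val conf = new SparkConf()
          .setMaster(mesosMaster)
          .setAppName("SparkJob")
          // Where Mesos slaves should download the Spark distribution from
          .set("spark.executor.uri", cfg.getString("mesos.executorUri"))

        val sc = new SparkContext(conf)
        try {
          // ... actual job logic goes here ...
        } finally {
          sc.stop()
        }
      }
    }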
Re: is Mesos falling out of favor?
Paco, that's a great video reference, thanks. To be fair to our friends at Yahoo, who have done a tremendous amount to help advance the cause of the BDAS stack, it's not FUD coming from them, certainly not in any organized or intentional manner. In vacuo we prefer Mesos ourselves, but we also can't ignore the fact that in the larger market, many enterprise technology stack decisions are made based on existing vendor support relationships. And on the Mesos front, super happy to see Mesosphere growing!

Sent while mobile. Pls excuse typos etc.

> That's FUD. Tracking the Mesos and Spark use cases, there are very large production deployments of these together. Some are rather private, but others are being surfaced. IMHO, one of the most amazing case studies is from Christina Delimitrou: http://youtu.be/YpmElyi94AA
>
> For a tutorial, use the following, but upgrade it to the latest production release of Spark. There was a related O'Reilly webcast and Strata tutorial as well: http://mesosphere.io/learn/run-spark-on-mesos/
>
> FWIW, I teach Intro to Spark with sections on CM4, YARN, Mesos, etc. Based on lots of student experiences, Mesos is clearly the shortest path to deploying a Spark cluster if you want to leverage the robustness, multi-tenancy for mixed workloads, lower ops overhead, etc., that show up repeatedly in the use case analyses.
>
> My opinion only and not that of any of my clients: don't believe the FUD from YHOO unless you really want to be stuck in 2009.
>
> On Wed, May 7, 2014 at 8:30 AM, deric barton.to...@gmail.com wrote: [...]
Re: is Mesos falling out of favor?
I'm also using SPARK_EXECUTOR_URI right now, though I would prefer distributing Spark as a binary package.

Running examples with `./bin/run-example ...` works fine; however, tasks from spark-shell are getting lost with this error:

    Error: Could not find or load main class org.apache.spark.executor.MesosExecutorBackend

which looks more like a problem with sbin/spark-executor and missing paths to the jar. Has anyone encountered this error before?

I guess Yahoo invested quite a lot of effort into YARN and Spark integration (moreover, with Mahout migrating to Spark, there's much more interest in Hadoop and Spark integration). If there were a Mesos company working on Spark-Mesos integration, it could be at least at the same level. I don't see any other reason why YARN would be better than Mesos; personally I like the latter, but I haven't checked YARN for a while, maybe they've made significant progress. I think Mesos is more universal and flexible than YARN.
Re: is Mesos falling out of favor?
Curious what the bug is and what it breaks? I have Spark 0.9.0 running on Mesos 0.17.0 and it seems to work correctly.
Re: is Mesos falling out of favor?
For what it is worth, our team here at MediaCrossing (http://mediacrossing.com) has been using the Spark/Mesos combination since last summer with much success (low operations overhead, high developer productivity). IMO, Hadoop is overcomplicated from both a development and an operations perspective, so I am looking to lower our dependencies on it, not increase them. Our stack currently includes:

- Spark 0.9.1
- Mesos 0.17
- Chronos
- HDFS (CDH 5.0-mr1)
- Flume 1.4.0
- ZooKeeper
- Cassandra 2.0 (key-value store alternative to HBase)
- Storm 0.9 (which we currently prefer to Spark Streaming)

We've used Shark in the past as well, but since most of us prefer the Spark shell we have not been maintaining it.

Using Mesos to run Spark allows us to optimize our available resources (CPU + RAM, currently) between Spark, Chronos and a number of other services. I see YARN as being heavily focused on MR2, but the reality is we are using Spark in large part because writing MapReduce jobs is verbose, hard to maintain and not performant (compared to Spark). We have the advantage of not having any real legacy Map/Reduce jobs to maintain, so that consideration does not come into play.

Finally, I am a believer that for the long-term direction of our company, the Berkeley stack (https://amplab.cs.berkeley.edu/software/) will serve us best. Leveraging Mesos and Spark from the onset paves the way for this.

On Sun, May 11, 2014 at 1:28 PM, Paco Nathan cet...@gmail.com wrote: [...]
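[Editor's note] Regarding the point above about sharing CPU and RAM between Spark, Chronos and other frameworks: a minimal sketch of how a single Spark application on Mesos can be capped so other frameworks keep headroom, assuming Spark 0.9.x-era properties. The ZooKeeper address, app name, and resource numbers are placeholders, not MediaCrossing's actual settings:

    // Hedged sketch: cap what one Spark app claims from a shared Mesos cluster
    // so Chronos and other frameworks keep headroom. Spark 0.9.x-era settings.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("mesos://zk://zk1:2181/mesos") // placeholder ZooKeeper address
      .setAppName("shared-cluster-job")
      .set("spark.mesos.coarse", "true")   // hold executors for the app's lifetime
      .set("spark.cores.max", "8")         // upper bound on cores taken from Mesos
      .set("spark.executor.memory", "4g")  // memory reserved per executor

    val sc = new SparkContext(conf)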
Re: is Mesos falling out of favor?
I guess it's due to missing documentation and a rather complicated setup. Continuous integration would be nice!

Btw, is it possible to use Spark as a shared library and not fetch the Spark tarball for each task? Do you point SPARK_EXECUTOR_URI to an HDFS URL?
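[Editor's note] On the two questions above, a minimal sketch of the two options as I understand them for Spark 0.9.x on Mesos; property names are from that era's docs, and the URLs and paths are placeholders:

    // Hedged sketch of the two ways Mesos executors can obtain Spark
    // (Spark 0.9.x-era behaviour; host names and paths are placeholders).
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("mesos://zk://192.168.1.1:2181/mesos")
      .setAppName("example")
      // Option 1: each executor downloads and unpacks a tarball; an HDFS URL
      // (as in Gerard's executorUri above) works here.
      .set("spark.executor.uri", "hdfs://namenode:8020/spark/spark-0.9.1-bin.tar.gz")

    // Option 2: omit spark.executor.uri entirely and install Spark at the same
    // SPARK_HOME on every slave (e.g. via a Debian package); the Mesos backend
    // then launches sbin/spark-executor from that local installation, so no
    // tarball download is needed.

    val sc = new SparkContext(conf)

The second option matches the Debian-package setup described elsewhere in this thread.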