Re: Spark on Mesos 0.20
Hello,

Sorry for the delay (again); we were busy upgrading our cluster from MapR 3.0.x to MapR 3.1.1.26113.GA. I updated my builds to reference the native Hadoop libraries shipped with this distribution, and installed Snappy (I no longer see the "unable to load native hadoop libraries" warning, and I can see that the Snappy library loads).

I ran the example against a directory of Apache log files totaling about 4.4 GB, and things seem to work fine:

time MASTER=mesos://*:5050 /opt/spark/current/bin/run-example LogQuery maprfs:///user/hive/warehouse/apachelog/dt=20141017/16

14/10/18 05:23:21 INFO scheduler.DAGScheduler: Stage 0 (collect at LogQuery.scala:80) finished in 1.704 s
14/10/18 05:23:21 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool default
14/10/18 05:23:21 INFO spark.SparkContext: Job finished: collect at LogQuery.scala:80, took 40.533904277 s
(null,null,null) bytes=0 n=16682940

real 0m51.393s
user 0m19.130s
sys 0m4.120s

So this combination of software works fine for me:

Spark 1.1.0
Mesos 0.20.1
MapR 3.1.1.26113.GA (spark-1.1.0-bin-mapr3.tgz)

Note: one thing you might try is increasing your spark.executor.memory setting; mine was set to 8 GB in the spark-defaults.conf file.

Hope this helps,
Fairiz "Fi" Azizi

On Thu, Oct 9, 2014 at 11:35 PM, Gurvinder Singh gurvinder.si...@uninett.no wrote:

On 10/10/2014 06:11 AM, Fairiz Azizi wrote:

Hello, Sorry for the late reply. When I tried the LogQuery example this time, things now seem to be fine! ...
14/10/10 04:01:21 INFO scheduler.DAGScheduler: Stage 0 (collect at LogQuery.scala:80) finished in 0.429 s
14/10/10 04:01:21 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool defa
14/10/10 04:01:21 INFO spark.SparkContext: Job finished: collect at LogQuery.scala:80, took 12.802743914 s
(10.10.10.10,FRED,GET http://images.com/2013/Generic.jpg HTTP/1.1) bytes=621 n=2

Not sure if this is the correct response for that example. Our mesos/spark builds have been updated since I last wrote; possibly the JDK version was updated to 1.7.0_67. If you are using an older JDK, maybe try updating that?

I have tested on current JDK 7 and I am now running JDK 8; the problem still exists. Can you run LogQuery on data of, say, 100+ GB, so that you have more map tasks? We start to see the issue on larger tasks.

- Gurvinder

- Fi
Fairiz "Fi" Azizi

On Wed, Oct 8, 2014 at 7:54 AM, RJ Nowling rnowl...@gmail.com wrote:

Yep! That's the example I was talking about. Is an error message printed when it hangs? I get:

14/09/30 13:23:14 ERROR BlockManagerMasterActor: Got two different block manager registrations on 20140930-131734-1723727882-5050-1895-1

On Tue, Oct 7, 2014 at 8:36 PM, Fairiz Azizi code...@gmail.com wrote:

Sure, could you point me to the example? The only thing I could find was
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/LogQuery.scala

So do you mean running it like:

MASTER=mesos://xxx:5050 ./run-example LogQuery

I tried that, and I can see the job run and the tasks complete on the slave nodes, but the client process seems to hang forever; that is probably a different problem. BTW, only a dozen or so tasks kick off. I actually haven't done much with Scala and Spark (it's been all Python).
Fi
Fairiz "Fi" Azizi

On Tue, Oct 7, 2014 at 6:29 AM, RJ Nowling rnowl...@gmail.com wrote:

I was able to reproduce it on a small 4-node cluster (1 Mesos master and 3 Mesos slaves) with relatively low-end specs. As I said, I just ran the log query examples with the fine-grained Mesos mode. Spark 1.1.0 and Mesos 0.20.1.

Fairiz, could you try running the LogQuery example included with Spark and see what you get? Thanks!

On Mon, Oct 6, 2014 at 8:07 PM, Fairiz Azizi code...@gmail.com wrote:

That's what's great about Spark, the community is so active! :)

I compiled Mesos 0.20.1 from the source tarball. I am using the MapR3 Spark 1.1.0 distribution from the Spark downloads page (spark-1.1.0-bin-mapr3.tgz). I see no problems for the workloads we are trying. However, the cluster is small (fewer than 100 cores across 3 nodes). The workloads read in just a few gigabytes from HDFS, via an ipython notebook spark shell.
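[Editor's note: Fi's message above suggests raising spark.executor.memory in spark-defaults.conf. A minimal sketch of that change follows; the /tmp/spark-demo path is a stand-in for a real Spark home such as /opt/spark/current, and 8g is the value Fi mentions.]

```shell
# Stand-in Spark home for illustration only (a real install would be
# something like /opt/spark/current).
SPARK_HOME=/tmp/spark-demo
mkdir -p "$SPARK_HOME/conf"

# spark-defaults.conf is read by spark-submit and bin/run-example at launch.
cat >> "$SPARK_HOME/conf/spark-defaults.conf" <<'EOF'
spark.executor.memory   8g
EOF

# Show the resulting setting.
grep 'spark.executor.memory' "$SPARK_HOME/conf/spark-defaults.conf"
```

With the setting in place, the example would be relaunched the same way as in the thread (MASTER=mesos://<master>:5050 bin/run-example LogQuery <path>); the same property can also be passed per job via spark-submit's --conf flag.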
Re: Spark on Mesos 0.20
Hello,

Sorry for the late reply. When I tried the LogQuery example this time, things now seem to be fine! ...

14/10/10 04:01:21 INFO scheduler.DAGScheduler: Stage 0 (collect at LogQuery.scala:80) finished in 0.429 s
14/10/10 04:01:21 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool defa
14/10/10 04:01:21 INFO spark.SparkContext: Job finished: collect at LogQuery.scala:80, took 12.802743914 s
(10.10.10.10,FRED,GET http://images.com/2013/Generic.jpg HTTP/1.1) bytes=621 n=2

Not sure if this is the correct response for that example. Our mesos/spark builds have been updated since I last wrote; possibly the JDK version was updated to 1.7.0_67. If you are using an older JDK, maybe try updating that?

- Fi
Fairiz "Fi" Azizi

On Wed, Oct 8, 2014 at 7:54 AM, RJ Nowling rnowl...@gmail.com wrote:

Yep! That's the example I was talking about. Is an error message printed when it hangs? I get:

14/09/30 13:23:14 ERROR BlockManagerMasterActor: Got two different block manager registrations on 20140930-131734-1723727882-5050-1895-1

On Tue, Oct 7, 2014 at 8:36 PM, Fairiz Azizi code...@gmail.com wrote:

Sure, could you point me to the example? The only thing I could find was
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/LogQuery.scala

So do you mean running it like:

MASTER=mesos://xxx:5050 ./run-example LogQuery

I tried that, and I can see the job run and the tasks complete on the slave nodes, but the client process seems to hang forever; that is probably a different problem. BTW, only a dozen or so tasks kick off. I actually haven't done much with Scala and Spark (it's been all Python).

Fi
Fairiz "Fi" Azizi

On Tue, Oct 7, 2014 at 6:29 AM, RJ Nowling rnowl...@gmail.com wrote:

I was able to reproduce it on a small 4-node cluster (1 Mesos master and 3 Mesos slaves) with relatively low-end specs. As I said, I just ran the log query examples with the fine-grained Mesos mode.
Spark 1.1.0 and Mesos 0.20.1.

Fairiz, could you try running the LogQuery example included with Spark and see what you get? Thanks!

On Mon, Oct 6, 2014 at 8:07 PM, Fairiz Azizi code...@gmail.com wrote:

That's what's great about Spark, the community is so active! :)

I compiled Mesos 0.20.1 from the source tarball. I am using the MapR3 Spark 1.1.0 distribution from the Spark downloads page (spark-1.1.0-bin-mapr3.tgz). I see no problems for the workloads we are trying. However, the cluster is small (fewer than 100 cores across 3 nodes). The workloads read in just a few gigabytes from HDFS, via an ipython notebook spark shell.

thanks,
Fairiz "Fi" Azizi

On Mon, Oct 6, 2014 at 9:20 AM, Timothy Chen tnac...@gmail.com wrote:

Ok, I created SPARK-3817 to track this; I will try to repro it as well.

Tim

On Mon, Oct 6, 2014 at 6:08 AM, RJ Nowling rnowl...@gmail.com wrote:

I've recently run into this issue as well. I get it from running Spark examples such as log query. Maybe that'll help reproduce the issue.

On Monday, October 6, 2014, Gurvinder Singh gurvinder.si...@uninett.no wrote:

The issue does not occur if the task at hand has a small number of map tasks. I have a task which has 978 map tasks, and I see this error:

14/10/06 09:34:40 ERROR BlockManagerMasterActor: Got two different block manager registrations on 20140711-081617-711206558-5050-2543-5

Here is the log from the mesos-slave where this container was running:
http://pastebin.com/Q1Cuzm6Q

If you look at the code where Spark produces this error, you will see that it simply exits, saying in the comments "this should never happen, lets just quit" :-)

- Gurvinder

On 10/06/2014 09:30 AM, Timothy Chen wrote:

(Hit enter too soon...)

What is your setup and steps to repro this?

Tim

On Mon, Oct 6, 2014 at 12:30 AM, Timothy Chen tnac...@gmail.com wrote:

Hi Gurvinder,

I tried fine-grained mode before and didn't get into that problem.
On Sun, Oct 5, 2014 at 11:44 PM, Gurvinder Singh gurvinder.si...@uninett.no wrote:

On 10/06/2014 08:19 AM, Fairiz Azizi wrote:

The Spark online docs indicate that Spark is compatible with Mesos 0.18.1. I've gotten it to work just fine on 0.18.1 and 0.18.2. Has anyone tried Spark on a newer version of Mesos, i.e. Mesos v0.20.0?

-Fi

Yeah, we are using Spark 1.1.0 with Mesos 0.20.1. It runs fine in coarse-grained mode; in fine-grained mode there is an issue with BlockManager name conflicts. I have been waiting for it to be fixed, but it is still there.

-Gurvinder

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
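[Editor's note: Gurvinder's observation that coarse-grained mode runs fine suggests a workaround while the fine-grained BlockManager conflict is open: switch modes via the documented spark.mesos.coarse property. A sketch follows; /tmp/spark-demo is a stand-in for a real Spark home.]

```shell
# Stand-in Spark home for illustration (assumption, not a real install path).
SPARK_HOME=/tmp/spark-demo
mkdir -p "$SPARK_HOME/conf"

# spark.mesos.coarse=true makes Spark hold one long-lived Mesos task per
# slave instead of one Mesos task per Spark task (fine-grained mode).
cat >> "$SPARK_HOME/conf/spark-defaults.conf" <<'EOF'
spark.mesos.coarse   true
EOF

grep 'spark.mesos.coarse' "$SPARK_HOME/conf/spark-defaults.conf"
```

Coarse-grained mode trades the fine-grained mode's dynamic per-task resource sharing for fixed reservations, so it avoids the repeated executor registrations implicated in the error above.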
Re: Spark on Mesos 0.20
Sure, could you point me to the example? The only thing I could find was
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/LogQuery.scala

So do you mean running it like:

MASTER=mesos://xxx:5050 ./run-example LogQuery

I tried that, and I can see the job run and the tasks complete on the slave nodes, but the client process seems to hang forever; that is probably a different problem. BTW, only a dozen or so tasks kick off. I actually haven't done much with Scala and Spark (it's been all Python).

Fi
Fairiz "Fi" Azizi

On Tue, Oct 7, 2014 at 6:29 AM, RJ Nowling rnowl...@gmail.com wrote:

I was able to reproduce it on a small 4-node cluster (1 Mesos master and 3 Mesos slaves) with relatively low-end specs. As I said, I just ran the log query examples with the fine-grained Mesos mode. Spark 1.1.0 and Mesos 0.20.1.

Fairiz, could you try running the LogQuery example included with Spark and see what you get? Thanks!

On Mon, Oct 6, 2014 at 8:07 PM, Fairiz Azizi code...@gmail.com wrote:

That's what's great about Spark, the community is so active! :)

I compiled Mesos 0.20.1 from the source tarball. I am using the MapR3 Spark 1.1.0 distribution from the Spark downloads page (spark-1.1.0-bin-mapr3.tgz). I see no problems for the workloads we are trying. However, the cluster is small (fewer than 100 cores across 3 nodes). The workloads read in just a few gigabytes from HDFS, via an ipython notebook spark shell.

thanks,
Fairiz "Fi" Azizi

On Mon, Oct 6, 2014 at 9:20 AM, Timothy Chen tnac...@gmail.com wrote:

Ok, I created SPARK-3817 to track this; I will try to repro it as well.

Tim

On Mon, Oct 6, 2014 at 6:08 AM, RJ Nowling rnowl...@gmail.com wrote:

I've recently run into this issue as well. I get it from running Spark examples such as log query. Maybe that'll help reproduce the issue.

On Monday, October 6, 2014, Gurvinder Singh gurvinder.si...@uninett.no wrote:

The issue does not occur if the task at hand has a small number of map tasks.
I have a task which has 978 map tasks, and I see this error:

14/10/06 09:34:40 ERROR BlockManagerMasterActor: Got two different block manager registrations on 20140711-081617-711206558-5050-2543-5

Here is the log from the mesos-slave where this container was running:
http://pastebin.com/Q1Cuzm6Q

If you look at the code where Spark produces this error, you will see that it simply exits, saying in the comments "this should never happen, lets just quit" :-)

- Gurvinder

On 10/06/2014 09:30 AM, Timothy Chen wrote:

(Hit enter too soon...)

What is your setup and steps to repro this?

Tim

On Mon, Oct 6, 2014 at 12:30 AM, Timothy Chen tnac...@gmail.com wrote:

Hi Gurvinder,

I tried fine-grained mode before and didn't get into that problem.

On Sun, Oct 5, 2014 at 11:44 PM, Gurvinder Singh gurvinder.si...@uninett.no wrote:

On 10/06/2014 08:19 AM, Fairiz Azizi wrote:

The Spark online docs indicate that Spark is compatible with Mesos 0.18.1. I've gotten it to work just fine on 0.18.1 and 0.18.2. Has anyone tried Spark on a newer version of Mesos, i.e. Mesos v0.20.0?

-Fi

Yeah, we are using Spark 1.1.0 with Mesos 0.20.1. It runs fine in coarse-grained mode; in fine-grained mode there is an issue with BlockManager name conflicts. I have been waiting for it to be fixed, but it is still there.

-Gurvinder

--
em rnowl...@gmail.com
c 954.496.2314
Spark on Mesos 0.20
The Spark online docs indicate that Spark is compatible with Mesos 0.18.1. I've gotten it to work just fine on 0.18.1 and 0.18.2. Has anyone tried Spark on a newer version of Mesos, i.e. Mesos v0.20.0?

-Fi
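[Editor's note: whatever the Mesos version, the wiring Spark 1.x needs to talk to a Mesos master is the same per the Spark docs: spark-env.sh points at the Mesos native library, and slaves fetch a Spark build from a shared location. A sketch follows; /tmp/spark-demo, the libmesos.so path, and the HDFS URL are all illustrative stand-ins that vary by install.]

```shell
# Stand-in Spark home; a real deployment would edit its own conf/spark-env.sh.
SPARK_HOME=/tmp/spark-demo
mkdir -p "$SPARK_HOME/conf"

cat >> "$SPARK_HOME/conf/spark-env.sh" <<'EOF'
# Location of the Mesos native library on each node (install-specific).
export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
# A Spark build every slave can download to launch executors.
export SPARK_EXECUTOR_URI=hdfs://namenode/tmp/spark-1.1.0-bin-mapr3.tgz
EOF

grep 'export' "$SPARK_HOME/conf/spark-env.sh"
```

Jobs are then submitted with a mesos://<master-host>:5050 master URL, as in the commands quoted elsewhere in this thread.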
Re: Spark on Mesos 0.20
That's what's great about Spark, the community is so active! :)

I compiled Mesos 0.20.1 from the source tarball. I am using the MapR3 Spark 1.1.0 distribution from the Spark downloads page (spark-1.1.0-bin-mapr3.tgz). I see no problems for the workloads we are trying. However, the cluster is small (fewer than 100 cores across 3 nodes). The workloads read in just a few gigabytes from HDFS, via an ipython notebook spark shell.

thanks,
Fairiz "Fi" Azizi

On Mon, Oct 6, 2014 at 9:20 AM, Timothy Chen tnac...@gmail.com wrote:

Ok, I created SPARK-3817 to track this; I will try to repro it as well.

Tim

On Mon, Oct 6, 2014 at 6:08 AM, RJ Nowling rnowl...@gmail.com wrote:

I've recently run into this issue as well. I get it from running Spark examples such as log query. Maybe that'll help reproduce the issue.

On Monday, October 6, 2014, Gurvinder Singh gurvinder.si...@uninett.no wrote:

The issue does not occur if the task at hand has a small number of map tasks. I have a task which has 978 map tasks, and I see this error:

14/10/06 09:34:40 ERROR BlockManagerMasterActor: Got two different block manager registrations on 20140711-081617-711206558-5050-2543-5

Here is the log from the mesos-slave where this container was running:
http://pastebin.com/Q1Cuzm6Q

If you look at the code where Spark produces this error, you will see that it simply exits, saying in the comments "this should never happen, lets just quit" :-)

- Gurvinder

On 10/06/2014 09:30 AM, Timothy Chen wrote:

(Hit enter too soon...)

What is your setup and steps to repro this?

Tim

On Mon, Oct 6, 2014 at 12:30 AM, Timothy Chen tnac...@gmail.com wrote:

Hi Gurvinder,

I tried fine-grained mode before and didn't get into that problem.
On Sun, Oct 5, 2014 at 11:44 PM, Gurvinder Singh gurvinder.si...@uninett.no wrote:

On 10/06/2014 08:19 AM, Fairiz Azizi wrote:

The Spark online docs indicate that Spark is compatible with Mesos 0.18.1. I've gotten it to work just fine on 0.18.1 and 0.18.2. Has anyone tried Spark on a newer version of Mesos, i.e. Mesos v0.20.0?

-Fi

Yeah, we are using Spark 1.1.0 with Mesos 0.20.1. It runs fine in coarse-grained mode; in fine-grained mode there is an issue with BlockManager name conflicts. I have been waiting for it to be fixed, but it is still there.

-Gurvinder