Re: Spark on Mesos 0.20

2014-10-17 Thread Fairiz Azizi
Hello,

Sorry for the delay (again); we were busy upgrading our cluster from MapR
3.0.x to MapR 3.1.1.26113.GA.

I updated my builds to reference the native Hadoop libraries shipped with
this distribution, and installed Snappy (I no longer see the 'unable to
load native hadoop libraries' warning, and I can see that the Snappy
library is loaded).

I ran the example against a directory of Apache log files containing about
4.4 GB, and things seem to work fine.

time MASTER=mesos://xxx:5050 /opt/spark/current/bin/run-example \
  LogQuery maprfs:///user/hive/warehouse/apachelog/dt=20141017/16

14/10/18 05:23:21 INFO scheduler.DAGScheduler: Stage 0 (collect at
LogQuery.scala:80) finished in 1.704 s
14/10/18 05:23:21 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0,
whose tasks have all completed, from pool default
14/10/18 05:23:21 INFO spark.SparkContext: Job finished: collect at
LogQuery.scala:80, took 40.533904277 s
(null,null,null) bytes=0 n=16682940

real 0m51.393s
user 0m19.130s
sys 0m4.120s
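For what it's worth, a back-of-the-envelope check on the task count for a run like this: the number of map tasks should roughly track the number of input splits. A quick sketch, assuming MapR's default 256 MB chunk size (an assumption on my part; check your cluster config):

```python
import math

def expected_map_tasks(total_bytes: int, split_bytes: int) -> int:
    # One map task per input split; a partial last split still gets a task.
    return math.ceil(total_bytes / split_bytes)

GB, MB = 1024 ** 3, 1024 ** 2
# ~4.4 GB of Apache logs with 256 MB splits -> a dozen or two map tasks
print(expected_map_tasks(int(4.4 * GB), 256 * MB))
```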

So this combination of software seems to work fine for me!

Spark 1.1.0
Mesos 0.20.1
MAPR 3.1.1.26113.GA
spark-1.1.0-bin-mapr3.tgz

Note: one thing you might try is increasing your spark.executor.memory
setting; mine was set to 8GB in the spark-defaults.conf file.
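For reference, the setting goes in conf/spark-defaults.conf; a minimal fragment matching what I used (paths and values are of course site-specific):

```
# conf/spark-defaults.conf
spark.executor.memory   8g
```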

Hope this helps,
Fi


Fairiz Fi Azizi

On Thu, Oct 9, 2014 at 11:35 PM, Gurvinder Singh gurvinder.si...@uninett.no
 wrote:

 On 10/10/2014 06:11 AM, Fairiz Azizi wrote:
  Hello,
 
  Sorry for the late reply.
 
  When I tried the LogQuery example this time, things now seem to be fine!
 
  ...
 
  14/10/10 04:01:21 INFO scheduler.DAGScheduler: Stage 0 (collect at
  LogQuery.scala:80) finished in 0.429 s
 
  14/10/10 04:01:21 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0,
  whose tasks have all completed, from pool defa
 
  14/10/10 04:01:21 INFO spark.SparkContext: Job finished: collect at
  LogQuery.scala:80, took 12.802743914 s
 
  (10.10.10.10,FRED,GET http://images.com/2013/Generic.jpg HTTP/1.1)
  bytes=621   n=2
 
 
  Not sure if this is the correct response for that example.
 
  Our Mesos/Spark builds have been updated since I last wrote.
 
  Possibly, the JDK version was updated to 1.7.0_67
 
  If you are using an older JDK, maybe try updating that?
 I have tested on the current JDK 7 and am now running JDK 8; the problem
 still exists. Can you run LogQuery on data of, say, 100+ GB, so that you
 have more map tasks? We start to see the issue with larger jobs.

 - Gurvinder
 
 
  - Fi
 
 
 
  Fairiz Fi Azizi
 
  On Wed, Oct 8, 2014 at 7:54 AM, RJ Nowling rnowl...@gmail.com wrote:
 
  Yep!  That's the example I was talking about.
 
  Is an error message printed when it hangs? I get :
 
  14/09/30 13:23:14 ERROR BlockManagerMasterActor: Got two different
 block manager registrations on 20140930-131734-1723727882-5050-1895-1
 
 
 
  On Tue, Oct 7, 2014 at 8:36 PM, Fairiz Azizi code...@gmail.com wrote:
 
  Sure, could you point me to the example?
 
  The only thing I could find was
 
 https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/LogQuery.scala
 
  So do you mean running it like:
 MASTER=mesos://xxx:5050 ./run-example LogQuery
 
  I tried that and I can see the job run and the tasks complete on
  the slave nodes, but the client process seems to hang forever,
  it's probably a different problem. BTW, only a dozen or so tasks
  kick off.
 
  I actually haven't done much with Scala and Spark (it's been all
  python).
 
  Fi
 
 
 
  Fairiz Fi Azizi
 
  On Tue, Oct 7, 2014 at 6:29 AM, RJ Nowling rnowl...@gmail.com wrote:
 
  I was able to reproduce it on a small 4 node cluster (1
  mesos master and 3 mesos slaves) with relatively low-end
  specs.  As I said, I just ran the log query examples with
  the fine-grained mesos mode.
 
  Spark 1.1.0 and mesos 0.20.1.
 
  Fairiz, could you try running the logquery example included
  with Spark and see what you get?
 
  Thanks!
 
  On Mon, Oct 6, 2014 at 8:07 PM, Fairiz Azizi code...@gmail.com wrote:
 
   That's what's great about Spark, the community is so
  active! :)
 
  I compiled Mesos 0.20.1 from the source tarball.
 
   Using the MapR3 Spark 1.1.0 distribution from the Spark
  downloads page  (spark-1.1.0-bin-mapr3.tgz).
 
  I see no problems for the workloads we are trying.
 
  However, the cluster is small (less than 100 cores
  across 3 nodes).
 
   The workload reads in just a few gigabytes from HDFS,
  via an ipython

Re: Spark on Mesos 0.20

2014-10-09 Thread Fairiz Azizi
Hello,

Sorry for the late reply.

When I tried the LogQuery example this time, things now seem to be fine!

...

14/10/10 04:01:21 INFO scheduler.DAGScheduler: Stage 0 (collect at
LogQuery.scala:80) finished in 0.429 s

14/10/10 04:01:21 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0,
whose tasks have all completed, from pool defa

14/10/10 04:01:21 INFO spark.SparkContext: Job finished: collect at
LogQuery.scala:80, took 12.802743914 s

(10.10.10.10,FRED,GET http://images.com/2013/Generic.jpg HTTP/1.1)
bytes=621   n=2


Not sure if this is the correct response for that example.
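As far as I can tell from the example source, LogQuery keys each log line by (ip, user, query) and merges a (bytes, count) pair per key, so a line like the one above is one such aggregate. A rough Python model of that reduction (illustrative only, not the actual Scala code):

```python
from collections import defaultdict

def aggregate(records):
    # (ip, user, query) -> (total bytes, hit count), merged per key
    stats = defaultdict(lambda: (0, 0))
    for ip, user, query, nbytes in records:
        b, n = stats[(ip, user, query)]
        stats[(ip, user, query)] = (b + nbytes, n + 1)
    return dict(stats)

logs = [
    ("10.10.10.10", "FRED", "GET http://images.com/2013/Generic.jpg HTTP/1.1", 621),
    ("10.10.10.10", "FRED", "GET http://images.com/2013/Generic.jpg HTTP/1.1", 0),
]
# One key, with bytes=621 and n=2 -- the shape of the line above
print(aggregate(logs))
```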

Our Mesos/Spark builds have been updated since I last wrote.

Possibly, the JDK version was updated to 1.7.0_67

If you are using an older JDK, maybe try updating that?


- Fi



Fairiz Fi Azizi

On Wed, Oct 8, 2014 at 7:54 AM, RJ Nowling rnowl...@gmail.com wrote:

 Yep!  That's the example I was talking about.

 Is an error message printed when it hangs? I get :

 14/09/30 13:23:14 ERROR BlockManagerMasterActor: Got two different block 
 manager registrations on 20140930-131734-1723727882-5050-1895-1



 On Tue, Oct 7, 2014 at 8:36 PM, Fairiz Azizi code...@gmail.com wrote:

 Sure, could you point me to the example?

 The only thing I could find was

 https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/LogQuery.scala

 So do you mean running it like:
 MASTER=mesos://xxx:5050 ./run-example LogQuery

 I tried that and I can see the job run and the tasks complete on the
 slave nodes, but the client process seems to hang forever, it's probably a
 different problem. BTW, only a dozen or so tasks kick off.

 I actually haven't done much with Scala and Spark (it's been all python).

 Fi



 Fairiz Fi Azizi

 On Tue, Oct 7, 2014 at 6:29 AM, RJ Nowling rnowl...@gmail.com wrote:

 I was able to reproduce it on a small 4 node cluster (1 mesos master and
 3 mesos slaves) with relatively low-end specs.  As I said, I just ran the
 log query examples with the fine-grained mesos mode.

 Spark 1.1.0 and mesos 0.20.1.

 Fairiz, could you try running the logquery example included with Spark
 and see what you get?

 Thanks!

 On Mon, Oct 6, 2014 at 8:07 PM, Fairiz Azizi code...@gmail.com wrote:

 That's what's great about Spark, the community is so active! :)

 I compiled Mesos 0.20.1 from the source tarball.

 Using the MapR3 Spark 1.1.0 distribution from the Spark downloads page
  (spark-1.1.0-bin-mapr3.tgz).

 I see no problems for the workloads we are trying.

 However, the cluster is small (less than 100 cores across 3 nodes).

 The workload reads in just a few gigabytes from HDFS, via an ipython
 notebook spark shell.

 thanks,
 Fi



 Fairiz Fi Azizi

 On Mon, Oct 6, 2014 at 9:20 AM, Timothy Chen tnac...@gmail.com wrote:

 Ok I created SPARK-3817 to track this, will try to repro it as well.

 Tim

 On Mon, Oct 6, 2014 at 6:08 AM, RJ Nowling rnowl...@gmail.com wrote:
  I've recently run into this issue as well. I get it from running
 Spark
  examples such as log query.  Maybe that'll help reproduce the issue.
 
 
  On Monday, October 6, 2014, Gurvinder Singh 
 gurvinder.si...@uninett.no
  wrote:
 
  The issue does not occur if the job at hand has a small number of map
  tasks. I have a job with 978 map tasks, and I see this error:
 
  14/10/06 09:34:40 ERROR BlockManagerMasterActor: Got two different
 block
  manager registrations on 20140711-081617-711206558-5050-2543-5
 
  Here is the log from the mesos-slave where this container was
 running.
 
  http://pastebin.com/Q1Cuzm6Q
 
  If you look at the Spark code where this error is produced, you will
  see that it simply exits, saying in the comments that this should never
  happen, let's just quit :-)
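In other words, the check amounts to a registry that refuses a second, different registration under the same id. A toy sketch of that behavior (illustrative Python, not Spark's actual implementation):

```python
class BlockManagerRegistry:
    """Toy model of the duplicate-registration check; illustrative only."""

    def __init__(self):
        self._registered = {}  # block manager id -> reported location

    def register(self, bm_id: str, location: str) -> None:
        existing = self._registered.get(bm_id)
        if existing is not None and existing != location:
            # Two different locations claim the same id; Spark treats this
            # as unrecoverable and shuts the driver down.
            raise RuntimeError(
                "Got two different block manager registrations on " + bm_id)
        self._registered[bm_id] = location

reg = BlockManagerRegistry()
reg.register("20140711-081617-711206558-5050-2543-5", "slave-a:7077")
try:
    reg.register("20140711-081617-711206558-5050-2543-5", "slave-b:7077")
except RuntimeError as err:
    print(err)
```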
 
  - Gurvinder
  On 10/06/2014 09:30 AM, Timothy Chen wrote:
   (Hit enter too soon...)
  
   What is your setup and steps to repro this?
  
   Tim
  
   On Mon, Oct 6, 2014 at 12:30 AM, Timothy Chen tnac...@gmail.com
 wrote:
   Hi Gurvinder,
  
   I tried fine grain mode before and didn't get into that problem.
  
  
   On Sun, Oct 5, 2014 at 11:44 PM, Gurvinder Singh
   gurvinder.si...@uninett.no wrote:
   On 10/06/2014 08:19 AM, Fairiz Azizi wrote:
   The Spark online docs indicate that Spark is compatible with
 Mesos
   0.18.1
  
   I've gotten it to work just fine on 0.18.1 and 0.18.2
  
   Has anyone tried Spark on a newer version of Mesos, i.e. Mesos
   v0.20.0?
  
   -Fi
  
   Yeah, we are using Spark 1.1.0 with Mesos 0.20.1. It runs fine in
   coarse-grained mode; in fine-grained mode there is an issue with
   conflicting block manager names. I have been waiting for it to be
   fixed, but it is still there.
  
   -Gurvinder
  
  
 -
   To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
   For additional commands, e-mail: dev-h...@spark.apache.org
  
 
 
 

Re: Spark on Mesos 0.20

2014-10-07 Thread Fairiz Azizi
Sure, could you point me to the example?

The only thing I could find was
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/LogQuery.scala

So do you mean running it like:
   MASTER=mesos://xxx:5050 ./run-example LogQuery

I tried that and I can see the job run and the tasks complete on the slave
nodes, but the client process seems to hang forever, it's probably a
different problem. BTW, only a dozen or so tasks kick off.

I actually haven't done much with Scala and Spark (it's been all python).

Fi



Fairiz Fi Azizi

On Tue, Oct 7, 2014 at 6:29 AM, RJ Nowling rnowl...@gmail.com wrote:

 I was able to reproduce it on a small 4 node cluster (1 mesos master and 3
 mesos slaves) with relatively low-end specs.  As I said, I just ran the log
 query examples with the fine-grained mesos mode.

 Spark 1.1.0 and mesos 0.20.1.

 Fairiz, could you try running the logquery example included with Spark and
 see what you get?

 Thanks!

 On Mon, Oct 6, 2014 at 8:07 PM, Fairiz Azizi code...@gmail.com wrote:

 That's what's great about Spark, the community is so active! :)

 I compiled Mesos 0.20.1 from the source tarball.

 Using the MapR3 Spark 1.1.0 distribution from the Spark downloads page
  (spark-1.1.0-bin-mapr3.tgz).

 I see no problems for the workloads we are trying.

 However, the cluster is small (less than 100 cores across 3 nodes).

 The workload reads in just a few gigabytes from HDFS, via an ipython
 notebook spark shell.

 thanks,
 Fi



 Fairiz Fi Azizi

 On Mon, Oct 6, 2014 at 9:20 AM, Timothy Chen tnac...@gmail.com wrote:

 Ok I created SPARK-3817 to track this, will try to repro it as well.

 Tim

 On Mon, Oct 6, 2014 at 6:08 AM, RJ Nowling rnowl...@gmail.com wrote:
  I've recently run into this issue as well. I get it from running Spark
  examples such as log query.  Maybe that'll help reproduce the issue.
 
 
  On Monday, October 6, 2014, Gurvinder Singh 
 gurvinder.si...@uninett.no
  wrote:
 
  The issue does not occur if the job at hand has a small number of map
  tasks. I have a job with 978 map tasks, and I see this error:
 
  14/10/06 09:34:40 ERROR BlockManagerMasterActor: Got two different
 block
  manager registrations on 20140711-081617-711206558-5050-2543-5
 
  Here is the log from the mesos-slave where this container was running.
 
  http://pastebin.com/Q1Cuzm6Q
 
  If you look at the Spark code where this error is produced, you will
  see that it simply exits, saying in the comments that this should never
  happen, let's just quit :-)
 
  - Gurvinder
  On 10/06/2014 09:30 AM, Timothy Chen wrote:
   (Hit enter too soon...)
  
   What is your setup and steps to repro this?
  
   Tim
  
   On Mon, Oct 6, 2014 at 12:30 AM, Timothy Chen tnac...@gmail.com
 wrote:
   Hi Gurvinder,
  
   I tried fine grain mode before and didn't get into that problem.
  
  
   On Sun, Oct 5, 2014 at 11:44 PM, Gurvinder Singh
   gurvinder.si...@uninett.no wrote:
   On 10/06/2014 08:19 AM, Fairiz Azizi wrote:
   The Spark online docs indicate that Spark is compatible with
 Mesos
   0.18.1
  
   I've gotten it to work just fine on 0.18.1 and 0.18.2
  
   Has anyone tried Spark on a newer version of Mesos, i.e. Mesos
   v0.20.0?
  
   -Fi
  
   Yeah, we are using Spark 1.1.0 with Mesos 0.20.1. It runs fine in
   coarse-grained mode; in fine-grained mode there is an issue with
   conflicting block manager names. I have been waiting for it to be
   fixed, but it is still there.
  
   -Gurvinder
  
  
 -
   To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
   For additional commands, e-mail: dev-h...@spark.apache.org
  
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 
 
  --
  em rnowl...@gmail.com
  c 954.496.2314





 --
 em rnowl...@gmail.com
 c 954.496.2314



Spark on Mesos 0.20

2014-10-06 Thread Fairiz Azizi
The Spark online docs indicate that Spark is compatible with Mesos 0.18.1

I've gotten it to work just fine on 0.18.1 and 0.18.2

Has anyone tried Spark on a newer version of Mesos, i.e. Mesos v0.20.0?

-Fi


Re: Spark on Mesos 0.20

2014-10-06 Thread Fairiz Azizi
That's what's great about Spark, the community is so active! :)

I compiled Mesos 0.20.1 from the source tarball.

Using the MapR3 Spark 1.1.0 distribution from the Spark downloads page
 (spark-1.1.0-bin-mapr3.tgz).

I see no problems for the workloads we are trying.

However, the cluster is small (less than 100 cores across 3 nodes).

The workload reads in just a few gigabytes from HDFS, via an ipython
notebook spark shell.

thanks,
Fi



Fairiz Fi Azizi

On Mon, Oct 6, 2014 at 9:20 AM, Timothy Chen tnac...@gmail.com wrote:

 Ok I created SPARK-3817 to track this, will try to repro it as well.

 Tim

 On Mon, Oct 6, 2014 at 6:08 AM, RJ Nowling rnowl...@gmail.com wrote:
  I've recently run into this issue as well. I get it from running Spark
  examples such as log query.  Maybe that'll help reproduce the issue.
 
 
  On Monday, October 6, 2014, Gurvinder Singh gurvinder.si...@uninett.no
  wrote:
 
  The issue does not occur if the job at hand has a small number of map
  tasks. I have a job with 978 map tasks, and I see this error:
 
  14/10/06 09:34:40 ERROR BlockManagerMasterActor: Got two different block
  manager registrations on 20140711-081617-711206558-5050-2543-5
 
  Here is the log from the mesos-slave where this container was running.
 
  http://pastebin.com/Q1Cuzm6Q
 
  If you look at the Spark code where this error is produced, you will
  see that it simply exits, saying in the comments that this should never
  happen, let's just quit :-)
 
  - Gurvinder
  On 10/06/2014 09:30 AM, Timothy Chen wrote:
   (Hit enter too soon...)
  
   What is your setup and steps to repro this?
  
   Tim
  
   On Mon, Oct 6, 2014 at 12:30 AM, Timothy Chen tnac...@gmail.com
 wrote:
   Hi Gurvinder,
  
   I tried fine grain mode before and didn't get into that problem.
  
  
   On Sun, Oct 5, 2014 at 11:44 PM, Gurvinder Singh
   gurvinder.si...@uninett.no wrote:
   On 10/06/2014 08:19 AM, Fairiz Azizi wrote:
   The Spark online docs indicate that Spark is compatible with Mesos
   0.18.1
  
   I've gotten it to work just fine on 0.18.1 and 0.18.2
  
   Has anyone tried Spark on a newer version of Mesos, i.e. Mesos
   v0.20.0?
  
   -Fi
  
   Yeah, we are using Spark 1.1.0 with Mesos 0.20.1. It runs fine in
   coarse-grained mode; in fine-grained mode there is an issue with
   conflicting block manager names. I have been waiting for it to be
   fixed, but it is still there.
  
   -Gurvinder
  
  
 -
   To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
   For additional commands, e-mail: dev-h...@spark.apache.org
  
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 
 
  --
  em rnowl...@gmail.com
  c 954.496.2314