To be honest, I don't know what the problem was. I never managed to get my
Spark jobs working on the Mesos cluster running across two virtual machines.
It only worked when I ran the Spark jobs on my local machine, with both the
Mesos master and the slaves also running on that machine.

I suspect something is wrong with the way VirtualBox assigns network
interfaces to the virtual machines, but I can't spend any more time on the
issue.
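
For future reference, this is more or less how I was launching the shell from
the host in my last attempts. It is only a sketch of my setup, not a confirmed
fix: the 192.168.33.1 address is an assumption about the host side of the
VirtualBox host-only network, and the ZooKeeper endpoint is the one that shows
up in the logs further down.

    # Make the scheduler's libprocess bind to an address the VMs can reach,
    # instead of the loopback interface the Spark console warns about.
    export LIBPROCESS_IP=192.168.33.1    # host IP on the host-only network (assumed)
    export SPARK_LOCAL_IP=192.168.33.1   # keep Spark's own services off 127.0.0.1 as well
    ./bin/spark-shell --master mesos://zk://10.141.141.10:2181/mesos

In the master log quoted below you can see the symptom: the "Spark shell"
framework registers but shows up at a 127.0.0.1 address and is disconnected
immediately with 0ns to failover, which is why I suspect the interface
assignment rather than Spark or Mesos themselves.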

Thank you again for your help!

2015-05-28 19:28 GMT+02:00 Alex Rukletsov <[email protected]>:

> Great! Mind sharing with the list what the problem was (for future
> reference)?
>
> On Thu, May 28, 2015 at 5:25 PM, Alberto Rodriguez <[email protected]>
> wrote:
>
> > Hi Alex,
> >
> > I managed to make it work!! I finally ended up running both the Mesos
> > master and slave on my laptop, picking up the Spark jar from an HDFS
> > installed in a VM. I've just launched a Spark job and it's working fine!
> >
> > Thank you very much for your help
> >
> > 2015-05-28 16:20 GMT+02:00 Alberto Rodriguez <[email protected]>:
> >
> > > Hi Alex,
> > >
> > > please find below an extract of the Chronos log (not sure whether this
> > > is the log you were talking about):
> > >
> > > 2015-05-28_14:18:28.49322 [2015-05-28 14:18:28,491] INFO No tasks scheduled! Declining offers (com.airbnb.scheduler.mesos.MesosJobFramework:106)
> > > 2015-05-28_14:18:34.49896 [2015-05-28 14:18:34,497] INFO Received resource offers
> > > 2015-05-28_14:18:34.49903 (com.airbnb.scheduler.mesos.MesosJobFramework:87)
> > > 2015-05-28_14:18:34.50036 [2015-05-28 14:18:34,498] INFO No tasks scheduled! Declining offers (com.airbnb.scheduler.mesos.MesosJobFramework:106)
> > > 2015-05-28_14:18:40.50442 [2015-05-28 14:18:40,503] INFO Received resource offers
> > > 2015-05-28_14:18:40.50444 (com.airbnb.scheduler.mesos.MesosJobFramework:87)
> > > 2015-05-28_14:18:40.50506 [2015-05-28 14:18:40,503] INFO No tasks scheduled! Declining offers (com.airbnb.scheduler.mesos.MesosJobFramework:106)
> > >
> > > I'm using 0.20.1 because I'm using this vagrant machine:
> > > https://github.com/Banno/vagrant-mesos
> > >
> > > Kind regards and thank you again for your help
> > >
> > > 2015-05-28 14:09 GMT+02:00 Alex Rukletsov <[email protected]>:
> > >
> > >> Alberto,
> > >>
> > >> it looks like the Spark scheduler disconnects right after establishing
> > >> the connection. Would you mind sharing the scheduler logs as well? Also,
> > >> I see that you haven't specified failover_timeout; try setting this
> > >> value to something meaningful (several hours for test purposes).
> > >>
> > >> And by the way, any reason you're still on Mesos 0.20.1?
> > >>
> > >> On Wed, May 27, 2015 at 5:32 PM, Alberto Rodriguez <[email protected]> wrote:
> > >>
> > >> > Hi Alex,
> > >> >
> > >> > I don't know what's going on; now I'm unable to access the Spark
> > >> > console again, and it hangs at the same point as before. Please find
> > >> > the master logs below:
> > >> >
> > >> > 2015-05-27_15:30:53.68764 I0527 15:30:53.687494   944 master.cpp:3760] Sending 1 offers to framework 20150527-100126-169978048-5050-1851-0001 (chronos-2.3.0_mesos-0.20.1-SNAPSHOT) at scheduler-be29901f-39ab-4bdf[email protected]:32768
> > >> > 2015-05-27_15:30:53.69032 I0527 15:30:53.690196   942 master.cpp:2273] Processing ACCEPT call for offers: [ 20150527-152023-169978048-5050-876-O241 ] on slave 20150527-152023-169978048-5050-876-S0 at slave(1)@192.168.33.11:5051 (mesos-slave1) for framework 20150527-100126-169978048-5050-1851-0001 (chronos-2.3.0_mesos-0.20.1-SNAPSHOT) at [email protected]:32768
> > >> > 2015-05-27_15:30:53.69038 I0527 15:30:53.690300   942 hierarchical.hpp:648] Recovered mem(*):1024; cpus(*):2; disk(*):33375; ports(*):[31000-32000] (total allocatable: mem(*):1024; cpus(*):2; disk(*):33375; ports(*):[31000-32000]) on slave 20150527-152023-169978048-5050-876-S0 from framework 20150527-100126-169978048-5050-1851-0001
> > >> > 2015-05-27_15:30:54.00952 I0527 15:30:54.009363   937 master.cpp:1574] Received registration request for framework 'Spark shell' at [email protected]:55562
> > >> > 2015-05-27_15:30:54.00957 I0527 15:30:54.009461   937 master.cpp:1638] Registering framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at [email protected]:55562
> > >> > 2015-05-27_15:30:54.00994 I0527 15:30:54.009703   937 hierarchical.hpp:321] Added framework 20150527-152023-169978048-5050-876-0026
> > >> > 2015-05-27_15:30:54.00996 I0527 15:30:54.009826   937 master.cpp:3760] Sending 1 offers to framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at [email protected]:55562
> > >> > 2015-05-27_15:30:54.01035 I0527 15:30:54.010267   944 master.cpp:878] Framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at [email protected]:55562 disconnected
> > >> > 2015-05-27_15:30:54.01037 I0527 15:30:54.010308   944 master.cpp:1948] Disconnecting framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at [email protected]:55562
> > >> > 2015-05-27_15:30:54.01038 I0527 15:30:54.010326   944 master.cpp:1964] Deactivating framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at [email protected]:55562
> > >> > 2015-05-27_15:30:54.01053 I0527 15:30:54.010447   939 hierarchical.hpp:400] Deactivated framework 20150527-152023-169978048-5050-876-0026
> > >> > 2015-05-27_15:30:54.01055 I0527 15:30:54.010459   944 master.cpp:900] Giving framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at [email protected]:55562 0ns to failover
> > >> >
> > >> >
> > >> > Kind regards and thank you very much for your help!!
> > >> >
> > >> >
> > >> >
> > >> > 2015-05-27 16:28 GMT+02:00 Alex Rukletsov <[email protected]>:
> > >> >
> > >> > > Alberto,
> > >> > >
> > >> > > would you mind providing slave and master logs (or appropriate
> > >> > > parts of them)? Have you specified the --work_dir flag for your
> > >> > > Mesos Workers?
> > >> > >
> > >> > > On Wed, May 27, 2015 at 3:56 PM, Alberto Rodriguez <[email protected]> wrote:
> > >> > >
> > >> > > > Hi Alex,
> > >> > > >
> > >> > > > Thank you for replying. I managed to fix the first problem, but
> > >> > > > now when I launch a Spark job through my console, Mesos is losing
> > >> > > > all the tasks. I can see them all in my Mesos slave, but their
> > >> > > > status is LOST. The stderr & stdout files of the tasks are both
> > >> > > > empty.
> > >> > > >
> > >> > > > Any ideas?
> > >> > > >
> > >> > > > 2015-05-26 17:35 GMT+02:00 Alex Rukletsov <[email protected]>:
> > >> > > >
> > >> > > > > Alberto,
> > >> > > > >
> > >> > > > > What may be happening in your case is that the Master is not
> > >> > > > > able to talk to your scheduler. When responding to a scheduler,
> > >> > > > > the Mesos Master doesn't use the IP from which the request came,
> > >> > > > > but rather the IP set in the "Libprocess-from" field. That's
> > >> > > > > exactly what you specify in the LIBPROCESS_IP env var before
> > >> > > > > starting your scheduler. Could you please double check that it
> > >> > > > > is set up correctly and that the IP is reachable from the Mesos
> > >> > > > > Master?
> > >> > > > >
> > >> > > > > In case you are not able to solve the problem, please provide
> > >> > scheduler
> > >> > > > and
> > >> > > > > Master logs together with master, zookeeper, and scheduler
> > >> > > > configurations.
> > >> > > > >
> > >> > > > >
> > >> > > > > On Mon, May 25, 2015 at 6:30 PM, Alberto Rodriguez <[email protected]> wrote:
> > >> > > > >
> > >> > > > > > Hi all,
> > >> > > > > >
> > >> > > > > > I managed to get a Mesos cluster up & running on an Ubuntu VM.
> > >> > > > > > I've also been able to run and connect a spark-shell from this
> > >> > > > > > machine, and it works properly.
> > >> > > > > >
> > >> > > > > > Unfortunately, when I try to connect from the host machine
> > >> > > > > > where the VM is running in order to launch Spark jobs, I
> > >> > > > > > cannot.
> > >> > > > > >
> > >> > > > > > See below the spark console output:
> > >> > > > > >
> > >> > > > > > Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_75)
> > >> > > > > > Type in expressions to have them evaluated.
> > >> > > > > > Type :help for more information.
> > >> > > > > > 15/05/25 18:13:00 INFO SecurityManager: Changing view acls to: arodriguez
> > >> > > > > > 15/05/25 18:13:00 INFO SecurityManager: Changing modify acls to: arodriguez
> > >> > > > > > 15/05/25 18:13:00 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(arodriguez); users with modify permissions: Set(arodriguez)
> > >> > > > > > 15/05/25 18:13:01 INFO Slf4jLogger: Slf4jLogger started
> > >> > > > > > 15/05/25 18:13:01 INFO Remoting: Starting remoting
> > >> > > > > > 15/05/25 18:13:01 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:47229]
> > >> > > > > > 15/05/25 18:13:01 INFO Utils: Successfully started service 'sparkDriver' on port 47229.
> > >> > > > > > 15/05/25 18:13:01 INFO SparkEnv: Registering MapOutputTracker
> > >> > > > > > 15/05/25 18:13:01 INFO SparkEnv: Registering BlockManagerMaster
> > >> > > > > > 15/05/25 18:13:01 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150525181301-7fa8
> > >> > > > > > 15/05/25 18:13:01 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
> > >> > > > > > 15/05/25 18:13:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > >> > > > > > 15/05/25 18:13:01 INFO HttpFileServer: HTTP File server directory is /tmp/spark-1249c23f-adc8-4fcd-a044-b65a80f40e16
> > >> > > > > > 15/05/25 18:13:01 INFO HttpServer: Starting HTTP Server
> > >> > > > > > 15/05/25 18:13:01 INFO Utils: Successfully started service 'HTTP file server' on port 51659.
> > >> > > > > > 15/05/25 18:13:01 INFO Utils: Successfully started service 'SparkUI' on port 4040.
> > >> > > > > > 15/05/25 18:13:01 INFO SparkUI: Started SparkUI at http://localhost.localdomain:4040
> > >> > > > > > WARNING: Logging before InitGoogleLogging() is written to STDERR
> > >> > > > > > W0525 18:13:01.749449 10908 sched.cpp:1323]
> > >> > > > > > **************************************************
> > >> > > > > > Scheduler driver bound to loopback interface! Cannot communicate with remote master(s). You might want to set 'LIBPROCESS_IP' environment variable to use a routable IP address.
> > >> > > > > > **************************************************
> > >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.6
> > >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@716: Client environment:host.name=localhost.localdomain
> > >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
> > >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@724: Client environment:os.arch=3.19.7-200.fc21.x86_64
> > >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Thu May 7 22:00:21 UTC 2015
> > >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@733: Client environment:user.name=arodriguez
> > >> > > > > > I0525 18:13:01.749791 10908 sched.cpp:157] Version: 0.22.1
> > >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@741: Client environment:user.home=/home/arodriguez
> > >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@753: Client environment:user.dir=/home/arodriguez/dev/spark-1.2.0-bin-hadoop2.4/bin
> > >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=10.141.141.10:2181 sessionTimeout=10000 watcher=0x7fd4c2f0d5b0 sessionId=0 sessionPasswd=<null> context=0x7fd3d40063c0 flags=0
> > >> > > > > > 2015-05-25 18:13:01,750:10746(0x7fd4ab7fe700):ZOO_INFO@check_events@1705: initiated connection to server [10.141.141.10:2181]
> > >> > > > > > 2015-05-25 18:13:01,752:10746(0x7fd4ab7fe700):ZOO_INFO@check_events@1752: session establishment complete on server [10.141.141.10:2181], sessionId=0x14d8babef360022, negotiated timeout=10000
> > >> > > > > > I0525 18:13:01.752760 10913 group.cpp:313] Group process (group(1)@127.0.0.1:48557) connected to ZooKeeper
> > >> > > > > > I0525 18:13:01.752787 10913 group.cpp:790] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
> > >> > > > > > I0525 18:13:01.752807 10913 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
> > >> > > > > > I0525 18:13:01.754317 10909 detector.cpp:138] Detected a new leader: (id='16')
> > >> > > > > > I0525 18:13:01.754408 10913 group.cpp:659] Trying to get '/mesos/info_0000000016' in ZooKeeper
> > >> > > > > > I0525 18:13:01.755056 10913 detector.cpp:452] A new leading master ([email protected]:5050) is detected
> > >> > > > > > I0525 18:13:01.755113 10911 sched.cpp:254] New master detected at [email protected]:5050
> > >> > > > > > I0525 18:13:01.755345 10911 sched.cpp:264] No credentials provided. Attempting to register without authentication
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > It hangs on that last line.
> > >> > > > > >
> > >> > > > > > I've tried setting the LIBPROCESS_IP env variable, with no luck.
> > >> > > > > >
> > >> > > > > > Any advice?
> > >> > > > > >
> > >> > > > > > Thank you in advance.
> > >> > > > > >
> > >> > > > > > Kind regards,
> > >> > > > > >
> > >> > > > > > Alberto
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
