Apologies in advance if you already know all this and are an expert on vbox & networking - but hopefully this either helps or at least points you in the right direction!

The problem is most likely that your laptop (or whatever box you're running vbox on) has a hostname that is not DNS-resolvable - and most probably neither are your VMs' hostnames.
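If that's the case, the cheapest fix is to give every box a name that all the others can resolve, e.g. via /etc/hosts entries on the host and on each VM. A sketch only - the names below are purely illustrative and the IPs are just the ones that show up in your logs (use whatever addresses your VMs actually get):

  # add to /etc/hosts on the host *and* on every VM
  10.141.141.10   mesos-master
  192.168.33.11   mesos-slave1

That way components that advertise themselves by hostname stay reachable from the other machines.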
Further, by default VBox configures the VMs' NICs in 'NAT' mode, which means that you can 'net out' (eg, ping google.com from the VM) but not get in (eg, run a server inside the VM that is reachable from outside it). Mesos master/slave need to be able to talk to each other, bi-directionally, which is possibly what was causing the issue in the first place. Forwarding individual ports through NAT probably won't work either - you won't know in advance which port the Slave will be listening on (I think!).

One option is to configure vbox's VMs to be on their own subnet (I forget the exact terminology, it's been almost a year now since I fiddled with it: I think it's the Host-Only option <https://www.virtualbox.org/manual/ch06.html#network_hostonly>). Essentially vbox will create a subnet and act as a router - the host machine will also have a virtual NIC in that subnet, so you'll be able to route requests to/from the VMs.
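From memory (so please double-check against the manual page above), setting that up from the command line is roughly:

  # create a host-only adapter on the host; usually comes up as vboxnet0
  VBoxManage hostonlyif create
  # give the host's end of that subnet an IP
  VBoxManage hostonlyif ipconfig vboxnet0 --ip 192.168.56.1
  # with the VM powered off, attach its first NIC to the adapter (repeat per VM)
  VBoxManage modifyvm "mesos-slave1" --nic1 hostonly --hostonlyadapter1 vboxnet0

The adapter name (vboxnet0), the 192.168.56.x subnet and the VM name are just the usual defaults / examples - substitute whatever your setup uses. The same can also be done from the GUI in each VM's network settings.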
There's also the fact that the Spark driver (pyspark, or spark-submit) will need to be able to talk to the worker nodes, but that should "just work" once you get Mesos working.
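One more thing worth calling out: the console output quoted below warns "Scheduler driver bound to loopback interface!", which lines up with what Alex said about the "Libprocess-from" field. Once the host has a routable IP on the VMs' subnet, I'd export it explicitly before starting the driver. A rough sketch, assuming the host-only IP from the example above (192.168.56.1 - substitute your own) and the ZK address from your logs:

  export LIBPROCESS_IP=192.168.56.1   # address the Mesos master will reply to
  export SPARK_LOCAL_IP=192.168.56.1  # same, for the Spark driver itself
  ./bin/spark-shell --master mesos://zk://10.141.141.10:2181/mesos

I know you mentioned trying LIBPROCESS_IP already with no luck - it can only help once that IP is actually reachable from inside the VMs, which is why I'd sort out the vbox networking first.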
HTH,

*Marco Massenzio*
*Distributed Systems Engineer*

On Thu, May 28, 2015 at 11:13 PM, Alberto Rodriguez <[email protected]> wrote:

> To be honest I don't know what the problem was. I didn't manage to make my Spark jobs work on the mesos cluster running on two virtual machines. I managed to make it work when I run my Spark jobs on my local machine and both master and mesos slaves are also running on my machine.
>
> I guess something is not working properly in the way that virtualbox is assigning their network interfaces to the virtual machines but I can't waste more time on the issue.
>
> Thank you again for your help!
>
> 2015-05-28 19:28 GMT+02:00 Alex Rukletsov <[email protected]>:
>
> > Great! Mind sharing with the list what the problem was (for future reference)?
> >
> > On Thu, May 28, 2015 at 5:25 PM, Alberto Rodriguez <[email protected]> wrote:
> >
> > > Hi Alex,
> > >
> > > I managed to make it work!! Finally I'm running both mesos master and slave in my laptop and picking up the spark jar from a hdfs installed in a VM. I've just launched a spark job and it's working fine!
> > >
> > > Thank you very much for your help
> > >
> > > 2015-05-28 16:20 GMT+02:00 Alberto Rodriguez <[email protected]>:
> > >
> > > > Hi Alex,
> > > >
> > > > see following an extract of the chronos log (not sure whether this is the log you were talking about):
> > > >
> > > > 2015-05-28_14:18:28.49322 [2015-05-28 14:18:28,491] INFO No tasks scheduled! Declining offers (com.airbnb.scheduler.mesos.MesosJobFramework:106)
> > > > 2015-05-28_14:18:34.49896 [2015-05-28 14:18:34,497] INFO Received resource offers
> > > > 2015-05-28_14:18:34.49903 (com.airbnb.scheduler.mesos.MesosJobFramework:87)
> > > > 2015-05-28_14:18:34.50036 [2015-05-28 14:18:34,498] INFO No tasks scheduled! Declining offers (com.airbnb.scheduler.mesos.MesosJobFramework:106)
> > > > 2015-05-28_14:18:40.50442 [2015-05-28 14:18:40,503] INFO Received resource offers
> > > > 2015-05-28_14:18:40.50444 (com.airbnb.scheduler.mesos.MesosJobFramework:87)
> > > > 2015-05-28_14:18:40.50506 [2015-05-28 14:18:40,503] INFO No tasks scheduled! Declining offers (com.airbnb.scheduler.mesos.MesosJobFramework:106)
> > > >
> > > > I'm using 0.20.1 because I'm using this vagrant machine: https://github.com/Banno/vagrant-mesos
> > > >
> > > > Kind regards and thank you again for your help
> > > >
> > > > 2015-05-28 14:09 GMT+02:00 Alex Rukletsov <[email protected]>:
> > > >
> > > >> Alberto,
> > > >>
> > > >> it looks like the Spark scheduler disconnects right after establishing the connection. Would you mind sharing scheduler logs as well? Also I see that you haven't specified the failover_timeout; try setting this value to something meaningful (several hours for test purposes).
> > > >>
> > > >> And by the way, any reason you're still on Mesos 0.20.1?
> > > >>
> > > >> On Wed, May 27, 2015 at 5:32 PM, Alberto Rodriguez <[email protected]> wrote:
> > > >>
> > > >> > Hi Alex,
> > > >> >
> > > >> > I do not know what's going on, now I'm unable to access the spark console again, it's hanging up at the same point as before. See following the master logs:
> > > >> >
> > > >> > 2015-05-27_15:30:53.68764 I0527 15:30:53.687494 944 master.cpp:3760] Sending 1 offers to framework 20150527-100126-169978048-5050-1851-0001 (chronos-2.3.0_mesos-0.20.1-SNAPSHOT) at [email protected]:32768
> > > >> > 2015-05-27_15:30:53.69032 I0527 15:30:53.690196 942 master.cpp:2273] Processing ACCEPT call for offers: [ 20150527-152023-169978048-5050-876-O241 ] on slave 20150527-152023-169978048-5050-876-S0 at slave(1)@192.168.33.11:5051 (mesos-slave1) for framework 20150527-100126-169978048-5050-1851-0001 (chronos-2.3.0_mesos-0.20.1-SNAPSHOT) at [email protected]:32768
> > > >> > 2015-05-27_15:30:53.69038 I0527 15:30:53.690300 942 hierarchical.hpp:648] Recovered mem(*):1024; cpus(*):2; disk(*):33375; ports(*):[31000-32000] (total allocatable: mem(*):1024; cpus(*):2; disk(*):33375; ports(*):[31000-32000]) on slave 20150527-152023-169978048-5050-876-S0 from framework 20150527-100126-169978048-5050-1851-0001
> > > >> > 2015-05-27_15:30:54.00952 I0527 15:30:54.009363 937 master.cpp:1574] Received registration request for framework 'Spark shell' at [email protected]:55562
> > > >> > 2015-05-27_15:30:54.00957 I0527 15:30:54.009461 937 master.cpp:1638] Registering framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at [email protected]:55562
> > > >> > 2015-05-27_15:30:54.00994 I0527 15:30:54.009703 937 hierarchical.hpp:321] Added framework 20150527-152023-169978048-5050-876-0026
> > > >> > 2015-05-27_15:30:54.00996 I0527 15:30:54.009826 937 master.cpp:3760] Sending 1 offers to framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at [email protected]:55562
> > > >> > 2015-05-27_15:30:54.01035 I0527 15:30:54.010267 944 master.cpp:878] Framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at [email protected]:55562 disconnected
> > > >> > 2015-05-27_15:30:54.01037 I0527 15:30:54.010308 944 master.cpp:1948] Disconnecting framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at [email protected]:55562
> > > >> > 2015-05-27_15:30:54.01038 I0527 15:30:54.010326 944 master.cpp:1964] Deactivating framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at [email protected]:55562
> > > >> > 2015-05-27_15:30:54.01053 I0527 15:30:54.010447 939 hierarchical.hpp:400] Deactivated framework 20150527-152023-169978048-5050-876-0026
> > > >> > 2015-05-27_15:30:54.01055 I0527 15:30:54.010459 944 master.cpp:900] Giving framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at [email protected]:55562 0ns to failover
> > > >> >
> > > >> > Kind regards and thank you very much for your help!!
> > > >> >
> > > >> > 2015-05-27 16:28 GMT+02:00 Alex Rukletsov <[email protected]>:
> > > >> >
> > > >> > > Alberto,
> > > >> > >
> > > >> > > would you mind providing slave and master logs (or appropriate parts of them)? Have you specified the --work_dir flag for your Mesos Workers?
> > > >> > >
> > > >> > > On Wed, May 27, 2015 at 3:56 PM, Alberto Rodriguez <[email protected]> wrote:
> > > >> > >
> > > >> > > > Hi Alex,
> > > >> > > >
> > > >> > > > Thank you for replying. I managed to fix the first problem but now when I launch a spark job through my console mesos is losing all the tasks. I can see them all in my mesos slave but their status is LOST. The stderr & stdout files of the tasks are both empty.
> > > >> > > >
> > > >> > > > Any ideas?
> > > >> > > >
> > > >> > > > 2015-05-26 17:35 GMT+02:00 Alex Rukletsov <[email protected]>:
> > > >> > > >
> > > >> > > > > Alberto,
> > > >> > > > >
> > > >> > > > > What may be happening in your case is that the Master is not able to talk to your scheduler. When responding to a scheduler, the Mesos Master doesn't use the IP from which a request came, but rather the IP set in the "Libprocess-from" field. That's exactly what you specify in the LIBPROCESS_IP env var prior to starting your scheduler. Could you please double check that it is set up correctly and that the IP is reachable for the Mesos Master?
> > > >> > > > >
> > > >> > > > > In case you are not able to solve the problem, please provide scheduler and Master logs together with master, zookeeper, and scheduler configurations.
> > > >> > > > >
> > > >> > > > > On Mon, May 25, 2015 at 6:30 PM, Alberto Rodriguez <[email protected]> wrote:
> > > >> > > > >
> > > >> > > > > > Hi all,
> > > >> > > > > >
> > > >> > > > > > I managed to get a mesos cluster up & running on a Ubuntu VM. I've also been able to run and connect a spark-shell from this machine and it works properly.
> > > >> > > > > >
> > > >> > > > > > Unfortunately, when I try to connect from the host machine where the VM is running to launch spark jobs, I cannot.
> > > >> > > > > >
> > > >> > > > > > See below the spark console output:
> > > >> > > > > >
> > > >> > > > > > Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_75)
> > > >> > > > > > Type in expressions to have them evaluated.
> > > >> > > > > > Type :help for more information.
> > > >> > > > > > 15/05/25 18:13:00 INFO SecurityManager: Changing view acls to: arodriguez
> > > >> > > > > > 15/05/25 18:13:00 INFO SecurityManager: Changing modify acls to: arodriguez
> > > >> > > > > > 15/05/25 18:13:00 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(arodriguez); users with modify permissions: Set(arodriguez)
> > > >> > > > > > 15/05/25 18:13:01 INFO Slf4jLogger: Slf4jLogger started
> > > >> > > > > > 15/05/25 18:13:01 INFO Remoting: Starting remoting
> > > >> > > > > > 15/05/25 18:13:01 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:47229]
> > > >> > > > > > 15/05/25 18:13:01 INFO Utils: Successfully started service 'sparkDriver' on port 47229.
> > > >> > > > > > 15/05/25 18:13:01 INFO SparkEnv: Registering MapOutputTracker
> > > >> > > > > > 15/05/25 18:13:01 INFO SparkEnv: Registering BlockManagerMaster
> > > >> > > > > > 15/05/25 18:13:01 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150525181301-7fa8
> > > >> > > > > > 15/05/25 18:13:01 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
> > > >> > > > > > 15/05/25 18:13:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > > >> > > > > > 15/05/25 18:13:01 INFO HttpFileServer: HTTP File server directory is /tmp/spark-1249c23f-adc8-4fcd-a044-b65a80f40e16
> > > >> > > > > > 15/05/25 18:13:01 INFO HttpServer: Starting HTTP Server
> > > >> > > > > > 15/05/25 18:13:01 INFO Utils: Successfully started service 'HTTP file server' on port 51659.
> > > >> > > > > > 15/05/25 18:13:01 INFO Utils: Successfully started service 'SparkUI' on port 4040.
> > > >> > > > > > 15/05/25 18:13:01 INFO SparkUI: Started SparkUI at http://localhost.localdomain:4040
> > > >> > > > > > WARNING: Logging before InitGoogleLogging() is written to STDERR
> > > >> > > > > > W0525 18:13:01.749449 10908 sched.cpp:1323]
> > > >> > > > > > **************************************************
> > > >> > > > > > Scheduler driver bound to loopback interface! Cannot communicate with remote master(s). You might want to set 'LIBPROCESS_IP' environment variable to use a routable IP address.
> > > >> > > > > > **************************************************
> > > >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.6
> > > >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@716: Client environment:host.name=localhost.localdomain
> > > >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
> > > >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@724: Client environment:os.arch=3.19.7-200.fc21.x86_64
> > > >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Thu May 7 22:00:21 UTC 2015
> > > >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@733: Client environment:user.name=arodriguez
> > > >> > > > > > I0525 18:13:01.749791 10908 sched.cpp:157] Version: 0.22.1
> > > >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@741: Client environment:user.home=/home/arodriguez
> > > >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@753: Client environment:user.dir=/home/arodriguez/dev/spark-1.2.0-bin-hadoop2.4/bin
> > > >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=10.141.141.10:2181 sessionTimeout=10000 watcher=0x7fd4c2f0d5b0 sessionId=0 sessionPasswd=<null> context=0x7fd3d40063c0 flags=0
> > > >> > > > > > 2015-05-25 18:13:01,750:10746(0x7fd4ab7fe700):ZOO_INFO@check_events@1705: initiated connection to server [10.141.141.10:2181]
> > > >> > > > > > 2015-05-25 18:13:01,752:10746(0x7fd4ab7fe700):ZOO_INFO@check_events@1752: session establishment complete on server [10.141.141.10:2181], sessionId=0x14d8babef360022, negotiated timeout=10000
> > > >> > > > > > I0525 18:13:01.752760 10913 group.cpp:313] Group process (group(1)@127.0.0.1:48557) connected to ZooKeeper
> > > >> > > > > > I0525 18:13:01.752787 10913 group.cpp:790] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
> > > >> > > > > > I0525 18:13:01.752807 10913 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
> > > >> > > > > > I0525 18:13:01.754317 10909 detector.cpp:138] Detected a new leader: (id='16')
> > > >> > > > > > I0525 18:13:01.754408 10913 group.cpp:659] Trying to get '/mesos/info_0000000016' in ZooKeeper
> > > >> > > > > > I0525 18:13:01.755056 10913 detector.cpp:452] A new leading master ([email protected]:5050) is detected
> > > >> > > > > > I0525 18:13:01.755113 10911 sched.cpp:254] New master detected at [email protected]:5050
> > > >> > > > > > I0525 18:13:01.755345 10911 sched.cpp:264] No credentials provided. Attempting to register without authentication
> > > >> > > > > >
> > > >> > > > > > It hangs up at the last line.
> > > >> > > > > >
> > > >> > > > > > I've tried to set the LIBPROCESS_IP env variable with no luck.
> > > >> > > > > >
> > > >> > > > > > Any advice?
> > > >> > > > > >
> > > >> > > > > > Thank you in advance.
> > > >> > > > > >
> > > >> > > > > > Kind regards,
> > > >> > > > > >
> > > >> > > > > > Alberto
