From the logs, the slave never got the 'registered' message from the master. The master removes/disconnects a slave after a timeout when the slave doesn't respond to its health checks.
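
If it helps to narrow this down, here is a minimal sketch of a plain TCP reachability check that could be run on the master host against the slave's address. It only shows whether a TCP connection can be opened at all, not whether Mesos messages get through. The function name `can_connect` is made up for illustration, and the port in the commented usage (5051) is an assumption about which port the slave listens on; substitute whatever port your slave actually uses.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Hypothetical usage on the master host (the port is an assumption):
#   can_connect("10.96.130.119", 5051)
```

A False result from the master host would be consistent with the master being unable to reach the slave's internal address, though it could also just mean the assumed port is wrong.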
Did you try to start the slave with --ip=<public ip> as suggested earlier? I'm not familiar with AWS networking semantics, but I suspect you cannot connect from 107.22.185.93 --> 10.96.130.119?

@vinodkone

On Fri, Nov 2, 2012 at 12:36 PM, Jim Donahue <[email protected]> wrote:

> Ben,
>
> Complete logs are attached. Note that the master log ends long before the
> slave -- seems like the master has decided to go autistic.
>
> The master is using an AWS elastic IP address, which the slave uses to
> connect. The master has a "slaves" file in its deploy directory with an
> entry giving the AWS internal IP address of the slave (and the address in
> the file matches the internal IP address in the AWS management console).
> And it looks like they did rendezvous for a moment -- when I (briefly) got
> the webUI up everything looked right.
>
> Thanks,
>
> Jim
>
> -----Original Message-----
> From: Benjamin Mahler [mailto:[email protected]]
> Sent: Friday, November 02, 2012 11:59 AM
> To: [email protected]
> Subject: Re: WebUI problems
>
> "But I can't connect to the webUI on the slave." -- right, slaves do not
> have their own webuis anymore, the master collects slave information and
> displays it in its webui.
>
> Do you run in an environment where you have public and private IPs? It
> looks like the slave cannot receive messages from the master. It looks like
> you may want to try --ip=<public_slave_ip> when you start your slave.
>
> Can you provide the full master / slave logs for this?
> Can you also provide the commands you're using to start the master / slave?
>
> On Fri, Nov 2, 2012 at 11:35 AM, Jim Donahue <[email protected]> wrote:
>
> > Now I'm seeing the master and slave go autistic.
> >
> > Using port 5050, I was able to get the webUI up exactly once and then
> > everything looks like it dies. The log on the master shows a bunch of
> > "slave already registered, resending ack" messages, followed by the slave
> > disconnecting and reconnecting on the same port. Finally, the INFO log
> > ends with an "adding slave" message and then just stops.
> >
> > As far as I can tell, the master is still running. But I can't connect to
> > it again through the webUI.
> >
> > Looking at the slave log, the slave detected the master and then shows
> > periodic reporting of its current disk usage and "allowed age" -- there's
> > no indication of any disconnect in the slave log. But I can't connect to
> > the webUI on the slave.
> >
> > Thanks,
> >
> > Jim
> >
> > -----Original Message-----
> > From: Benjamin Mahler [mailto:[email protected]]
> > Sent: Friday, November 02, 2012 10:29 AM
> > To: [email protected]
> > Subject: Re: WebUI problems
> >
> > We've recently killed the old webui: https://reviews.apache.org/r/7708/
> >
> > In the process, the --webui_port flag was removed as it was no longer
> > applicable. I was under the assumption our flag system would not allow
> > extraneous flags to be provided, but perhaps that's not the case.
> >
> > The new webui runs on 5050 as Erich indicated. Please report any issues
> > you find!
> >
> > On Fri, Nov 2, 2012 at 10:21 AM, Erich Nachbar <[email protected]>
> > wrote:
> >
> > > Had the same problem. Try using port 5050 instead of the old 8080. The
> > > webui_port option was ignored when I tried it.
> > >
> > > On Fri, Nov 2, 2012 at 10:17 AM, Jim Donahue <[email protected]> wrote:
> > >
> > > > Yesterday I built a new AMI using the latest Mesos and now I can't
> > > > connect to the web UI (which used to work). Logging into the
> > > > instances (a master and a slave), all looks well -- the master sees
> > > > the slave and the slave sees the master. Both master and slave were
> > > > started with the option
> > > >
> > > > --webui_port=5051
> > > >
> > > > But no luck connecting to them with a browser. Has something changed
> > > > recently that I missed? I noticed that I did have to change the build
> > > > recipe for my AMI to install some new libraries, but I didn't see any
> > > > errors in the build and the tests all ran, except for the cgroup ones.
> > > >
> > > > The other thing I noticed is that the logs on both master and slave
> > > > have names of the form:
> > > >
> > > > ...invalid-user.log.INFO....
> > > >
> > > > Is this something I should worry about?
> > > >
> > > > Thanks,
> > > >
> > > > Jim Donahue
> > > > Adobe Systems
> > >
> > > --
> > > Erich Nachbar
> > > CTO | Quantifind <http://quantifind.com/>| 650-430-5500
