From the logs, the slave never got the 'registered' message from the
master. The master removes/disconnects a slave after a timeout when the
slave doesn't respond to its health checks.

Did you try to start the slave with --ip=<public ip> as suggested earlier?

I'm not familiar with AWS networking semantics, but I suspect you cannot
connect from 107.22.185.93 --> 10.96.130.119?
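
One way to test that from the master host is a small TCP probe. This is
only a sketch: can_connect is a hypothetical helper name, and the port you
pass in is an assumption (use whatever port your slave actually listens on).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

Running can_connect("10.96.130.119", <slave port>) on the master should
return True if the master can reach the slave's internal address. False
would be consistent with the slave never seeing the 'registered' message.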

@vinodkone


On Fri, Nov 2, 2012 at 12:36 PM, Jim Donahue <[email protected]> wrote:

> Ben,
>
> Complete logs are attached.  Note that the master log ends long before the
> slave -- seems like the master has decided to go autistic.
>
> The master is using an AWS elastic IP address, which the slave uses to
> connect.  The master has a "slaves" file in its deploy directory with an
> entry giving the AWS internal IP address of the slave (and the address in
> the file matches the internal IP address in the AWS management console).
> And it looks like they did rendezvous for a moment -- when I (briefly) got
> the webUI up everything looked right.
>
> Thanks,
>
> Jim
>
> -----Original Message-----
> From: Benjamin Mahler [mailto:[email protected]]
> Sent: Friday, November 02, 2012 11:59 AM
> To: [email protected]
> Subject: Re: WebUI problems
>
> "But I can't connect to the webUI on the slave." -- right, slaves do not
> have their own webuis anymore, the master collects slave information and
> displays it in its webui.
>
> Do you run in an environment where you have public and private IPs? It
> looks like the slave cannot receive messages from the master. It looks like
> you may want to try --ip=<public_slave_ip> when you start your slave.
>
> Can you provide the full master / slave logs for this?
> Can you also provide the commands you're using to start the master / slave?
>
> On Fri, Nov 2, 2012 at 11:35 AM, Jim Donahue <[email protected]> wrote:
>
> > Now I'm seeing the master and slave go autistic.
> >
> > Using port 5050, I was able to get the webUI up exactly once and then
> > everything looks like it dies.  The log on the master shows a bunch of
> > "slave already registered, resending ack" messages, followed by the slave
> > disconnecting and reconnecting on the same port.  Finally, the INFO log
> > ends with an "adding slave" message and then just stops.
> >
> > As far as I can tell, the master is still running. But I can't connect to
> > it again through the webUI.
> >
> > Looking at the slave log, the slave detected the master and then shows
> > periodic reporting of its current disk usage and "allowed age" -- there's
> > no indication of any disconnect in the slave log.  But I can't connect to
> > the webUI on the slave.
> >
> >
> > Thanks,
> >
> >
> > Jim
> >
> > -----Original Message-----
> > From: Benjamin Mahler [mailto:[email protected]]
> > Sent: Friday, November 02, 2012 10:29 AM
> > To: [email protected]
> > Subject: Re: WebUI problems
> >
> > We've recently killed the old webui: https://reviews.apache.org/r/7708/
> >
> > In the process, the --webui_port flag was removed as it was no longer
> > applicable. I was under the assumption our flag system would not allow
> > extraneous flags to be provided, but perhaps that's not the case.
> >
> > The new webui runs on 5050 as Erich indicated. Please report any issues
> > you find!
> >
> > On Fri, Nov 2, 2012 at 10:21 AM, Erich Nachbar <[email protected]>
> > wrote:
> >
> > > Had the same problem. Try using port 5050 instead of the old 8080. The
> > > webui_port option was ignored when I tried it.
> > >
> > >
> > > On Fri, Nov 2, 2012 at 10:17 AM, Jim Donahue <[email protected]> wrote:
> > >
> > > > Yesterday I built a new AMI using the latest Mesos and now I can't connect
> > > > to the web UI (which used to work).  Logging into the instances (a master
> > > > and a slave), all looks well -- the master sees the slave and the slave
> > > > sees the master.  Both master and slave were started with the option
> > > >
> > > >         --webui_port=5051
> > > >
> > > > But no luck connecting to them with a browser.  Has something changed
> > > > recently that I missed?  I noticed that I did have to change the build
> > > > recipe for my AMI to install some new libraries, but I didn't see any
> > > > errors in the build and the tests all ran, except for the cgroup ones.
> > > >
> > > > The other thing I noticed is that the logs on both master and slave have
> > > > names of the form:
> > > >
> > > >         ...invalid-user.log.INFO....
> > > >
> > > > Is this something I should worry about?
> > > >
> > > > Thanks,
> > > >
> > > > Jim Donahue
> > > > Adobe Systems
> > > >
> > >
> > >
> > >
> > > --
> > > Erich Nachbar
> > > CTO | Quantifind <http://quantifind.com/>| 650-430-5500
> > >
> >
>
