On 29 November 2013 14:42, Peter Waller <pe...@scraperwiki.com> wrote:
> Hi Roger et al,
>
> I've restarted jujud-machine-0 and rsyslog. juju status hangs (seemingly
> indefinitely) and I'm seeing rapid log growth still. In machine-0.log I'm
> seeing the below. In all-machines.log I'm seeing what looks like the same
> behaviour for all of the other machines.

Hi Peter,

Have you verified that disk space has actually been freed up?
Assuming so, have you tried restarting juju-db?
If you have, perhaps you could try the following command
(on the bootstrap node) to verify whether mongo is still alive:

   mongo --ssl -u admin -p <admin-secret> localhost:37017/juju

where <admin-secret> is the admin secret configured in your
environments.yaml file.
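
A quicker first check is whether anything is listening on that port at
all. The following is only a minimal sketch, assuming bash (it relies on
bash's /dev/tcp redirection); the function name port_open is made up for
illustration and is not part of juju or mongo:

```shell
# Requires bash: /dev/tcp/HOST/PORT is a bash redirection feature.
# port_open HOST PORT: succeed if a TCP connection can be opened.
# (Hypothetical helper name, for illustration only.)
port_open() {
    local host=$1 port=$2
    # Open fd 3 on the socket inside a subshell; the fd is closed
    # again when the subshell exits. Errors are discarded.
    (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}

if port_open localhost 37017; then
    echo "something is listening on 37017"
else
    echo "nothing is listening on 37017 (mongod is probably down)"
fi
```

If nothing is listening, the mongo command above cannot succeed either,
which would match the "connection refused" lines in machine-0.log.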

I'm thinking this will probably fail because otherwise your
status would be succeeding, but it will be good to be sure.
If it does fail, it will be interesting to know what's going
on with mongo - it's quite possible that it has got its knickers
in a twist with the lack of disk space.
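
To check that space really has been freed, something along these lines
might do. It is a sketch only: free_kb is a made-up helper, the 1 GB
threshold is an arbitrary assumption, and the two directories are simply
the juju paths that appear elsewhere in this thread.

```shell
# free_kb DIR: print the free space, in kilobytes, on the filesystem
# that holds DIR. (Hypothetical helper name, for illustration only.)
free_kb() {
    # -P gives the portable one-line-per-filesystem format;
    # -k reports sizes in 1024-byte units. Column 4 is "Available".
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

min_kb=1048576   # arbitrary threshold: 1 GB
for dir in /var/lib/juju /var/log/juju; do
    # Skip directories that do not exist on this machine.
    [ -d "$dir" ] || continue
    avail=$(free_kb "$dir")
    if [ "$avail" -lt "$min_kb" ]; then
        echo "$dir: only ${avail} KB free"
    else
        echo "$dir: ${avail} KB free"
    fi
done
```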

  cheers,
    rog.

>
> 2013-11-29 14:40:04 DEBUG juju.state open.go:88 connection failed, will
> retry: dial tcp 127.0.0.1:37017: connection refused
> 2013-11-29 14:40:05 INFO juju runner.go:253 worker: start "api"
> 2013-11-29 14:40:05 INFO juju apiclient.go:106 state/api: dialing
> "wss://localhost:17070/"
> 2013-11-29 14:40:05 ERROR juju apiclient.go:111 state/api: websocket.Dial
> wss://localhost:17070/: dial tcp 127.0.0.1:17070: connection refused
> 2013-11-29 14:40:05 ERROR juju runner.go:211 worker: exited "api":
> websocket.Dial wss://localhost:17070/: dial tcp 127.0.0.1:17070: connection
> refused
> 2013-11-29 14:40:05 INFO juju runner.go:245 worker: restarting "api" in 3s
> 2013-11-29 14:40:05 DEBUG juju.state open.go:88 connection failed, will
> retry: dial tcp 127.0.0.1:37017: connection refused
> 2013-11-29 14:40:05 DEBUG juju.state open.go:88 connection failed, will
> retry: dial tcp 127.0.0.1:37017: connection refused
>
>
>
> On 29 November 2013 12:40, Peter Waller <pe...@scraperwiki.com> wrote:
>>
>> I've replied off-list to Roger with URLs to the logs. I'm happy for them
>> to be shared internally between juju developers and to share them with
>> anyone who is interested.
>>
>>
>> On 29 November 2013 12:30, roger peppe <rogpe...@gmail.com> wrote:
>>>
>>> On 29 November 2013 11:44, Peter Waller <pe...@scraperwiki.com> wrote:
>>> > The pids appear to be constant since I last reported them. Your theory
>>> > about the machine being out of disk is correct.
>>> >
>>> > Indeed the log files are 1.4 and 1.6 GB for all-machines and
>>> > machine-0.log. I'll try xz'ing them and then sending them along to you.
>>> > Is it okay if I e-mail them directly to you and anyone else who is
>>> > interested (please mail me personally) assuming they compress down to
>>> > ~megabytes?
>>>
>>> Yes please. Putting them somewhere like google drive might work
>>> better than sending an attachment. I imagine they'll compress
>>> very nicely though.
>>>
>>> BTW, just removing the log files will not work, as the files will still
>>> be held open. You probably want to do something like (having first
>>> removed enough files elsewhere to ensure some spare disk space):
>>>
>>> cd /var/log/juju
>>> mv machine-0.log old-machine-0.log
>>> restart jujud-machine-0
>>>
>>> That will cause the machine agent to create a new log file, leaving
>>> you free to archive the old one and remove it to save space.
>>>
>>> You can do a similar thing with all-machines.log, except you'll need
>>> to restart rsyslog.
>>>
>>> I hope that when you've done that, mongo will start working again
>>> (you might need to restart the juju-db service). If you find that
>>> you can run juju status, it would be good to know what is
>>> the value of the "agent-version" field in your environment
>>> config (i.e. the output of "juju get-environment | grep agent-version").
>>>
>>>   cheers,
>>>     rog.
>>>
>>> >
>>> > On 29 November 2013 11:26, roger peppe <rogpe...@gmail.com> wrote:
>>> >>
>>> >> On 29 November 2013 10:26, Peter Waller <pe...@scraperwiki.com> wrote:
>>> >> > On 28 November 2013 17:44, Peter Waller <pe...@scraperwiki.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> I'm still having the problem of it spinning, every few seconds all
>>> >> >> of the machines are still spewing into the logs, despite my attempt
>>> >> >> at asking it to "upgrade" to a different version.
>>> >> >
>>> >> >
>>> >> > Last night I left it spinning absent any better ideas. It didn't
>>> >> > seem to be causing any obvious harm that I could find.
>>> >> >
>>> >> > This morning `juju status` doesn't work and nor does juju debug-log.
>>> >> >
>>> >> > On the bootstrap node the following is running for `ps aux | grep juju`
>>> >> >
>>> >> > root     23973  0.0  0.0   4440   624 ?        Ss   Aug29   0:00 /bin/sh -e -c /var/lib/juju/tools/machine-0/jujud machine --log-file /var/log/juju/machine-0.log --data-dir '/var/lib/juju' --machine-id 0 --debug >> /var/log/juju/machine-0.log 2>&1 /bin/sh
>>> >> > root     23974  0.2  1.4 309440 24340 ?        Sl   Aug29 346:48 /var/lib/juju/tools/machine-0/jujud machine --log-file /var/log/juju/machine-0.log --data-dir /var/lib/juju --machine-id 0 --debug
>>> >> >
>>> >> > $ juju debug-log | grep --line-buffered -v jsoncodec
>>> >> > ERROR no reachable servers
>>> >> >
>>> >> > $ juju status
>>> >> > <hangs seemingly indefinitely>
>>> >> >
>>> >> > Any advice or documentation someone can point me at to get this
>>> >> > system back into a working state?
>>> >>
>>> >> If you do the above ps, then wait 5 seconds and try again, have the
>>> >> process ids changed?
>>> >> If so, that means it's probably continually bouncing for some reason.
>>> >>
>>> >> It would be useful to see the logs, in particular
>>> >> /var/log/juju/machine-0.log.
>>> >> Firstly how big are they? It is possible that the machine is out of
>>> >> disk space and that's caused the mongo database storage to stop working.
>>> >> You could try deleting the log files (perhaps save them first for
>>> >> later diagnosis).
>>> >>
>>> >> Secondly, perhaps you could paste somewhere the last few thousand
>>> >> lines of the logs, that might give us more of an idea of what's
>>> >> happening currently.
>>> >>
>>> >> Please feel free to ping us on IRC (#juju-dev or #juju on
>>> >> freenode.net; my nickname is rogpeppe there) and we could help more
>>> >> directly.
>>> >>
>>> >>   cheers,
>>> >>     rog.
>>> >
>>> >
>>
>>
>

-- 
Juju mailing list
Juju@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju
