A few hours ago we hit the same problem, also on Ubuntu 20.04, and it
coincided in time with the latest Docker update, which was triggered
by Puppet. In the end, all the containers came back up without a
reboot. Thanks for the hint.
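
For anyone who wants to confirm the same coincidence on their own hosts,
here is a rough sketch. It assumes the default apt history log at
/var/log/apt/history.log with its usual Start-Date/Upgrade stanzas;
docker_upgrade_times is a made-up helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: print the Start-Date of every apt transaction that
# installed or upgraded the given package (default: docker.io).
# Assumes the usual stanza layout of /var/log/apt/history.log.
docker_upgrade_times() {
    log="${1:-/var/log/apt/history.log}"
    pkg="${2:-docker.io}"
    awk -v pkg="$pkg" '
        # Remember the timestamp that opens each transaction stanza.
        /^Start-Date:/ { start = $2 " " $3 }
        # Print it when the stanza installs or upgrades "<pkg>:<arch>".
        /^(Install|Upgrade):/ && index($0, pkg ":") { print start }
    ' "$log"
}

# Usage (not run here): docker_upgrade_times
```

The printed timestamps can then be compared with the time the containers
went down, for example in the output of journalctl -u docker.service.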

Note to myself: change the 'ensure' parameter of the Puppet package
resource for the Ubuntu package 'docker.io' from 'latest' to
'installed', so that Puppet stops upgrading Docker on its own.

On Sat, 7 Aug 2021 at 11:05, Andrew Walker-Brown
<andrew_jbr...@hotmail.com> wrote:
>
> Thanks David,
>
> Spent some more time digging in the logs and on Google.  Also had a further
> 2 nodes fail this morning (different nodes).
>
> Looks like it’s related to apt auto-updates on Ubuntu 20.04, although we
> don’t run unattended upgrades.  Docker appears to get a terminate signal,
> which shuts down/restarts all the containers, but some don’t come back
> cleanly.  There were also some legacy unused interfaces/bonds in the netplan
> config.
>
> Anyway, cleaned all that up...so hopefully it’s resolved.
>
> Cheers,
>
> A.
>
>
> From: David Caro<mailto:dc...@wikimedia.org>
> Sent: 06 August 2021 09:20
> To: Andrew Walker-Brown<mailto:andrew_jbr...@hotmail.com>
> Cc: Marc<mailto:m...@f1-outsourcing.eu>; 
> ceph-users@ceph.io<mailto:ceph-users@ceph.io>
> Subject: Re: [ceph-users] Re: All OSDs on one host down
>
> On 08/06 07:59, Andrew Walker-Brown wrote:
> > Hi Marc,
> >
> > Yes, I’m probably doing just that.
> >
> > The ceph admin guides aren’t exactly helpful on this.  The cluster was 
> > deployed using cephadm and it’s been running perfectly until now.
> >
> > Wouldn’t running “journalctl -u ceph-osd@5” on host ceph-004 show me the 
> > logs for osd.5 on that host?
>
> On my containerized setup, the services that cephadm created are:
>
> dcaro@node1:~ $ sudo systemctl list-units | grep ceph
>   ceph-d49b287a-b680-11eb-95d4-e45f010c03a8@crash.node1.service         loaded active running   Ceph crash.node1 for d49b287a-b680-11eb-95d4-e45f010c03a8
>   ceph-d49b287a-b680-11eb-95d4-e45f010c03a8@mgr.node1.mhqltg.service    loaded active running   Ceph mgr.node1.mhqltg for d49b287a-b680-11eb-95d4-e45f010c03a8
>   ceph-d49b287a-b680-11eb-95d4-e45f010c03a8@mon.node1.service           loaded active running   Ceph mon.node1 for d49b287a-b680-11eb-95d4-e45f010c03a8
>   ceph-d49b287a-b680-11eb-95d4-e45f010c03a8@osd.3.service               loaded active running   Ceph osd.3 for d49b287a-b680-11eb-95d4-e45f010c03a8
>   ceph-d49b287a-b680-11eb-95d4-e45f010c03a8@osd.7.service               loaded active running   Ceph osd.7 for d49b287a-b680-11eb-95d4-e45f010c03a8
>   system-ceph\x2dd49b287a\x2db680\x2d11eb\x2d95d4\x2de45f010c03a8.slice loaded active active    system-ceph\x2dd49b287a\x2db680\x2d11eb\x2d95d4\x2de45f010c03a8.slice
>   ceph-d49b287a-b680-11eb-95d4-e45f010c03a8.target                      loaded active active    Ceph cluster d49b287a-b680-11eb-95d4-e45f010c03a8
>   ceph.target                                                           loaded active active    All Ceph clusters and services
>
> where the string after 'ceph-' is the fsid of the cluster.
> Hope that helps (you can also use systemctl list-units to find the
> specific ones on your hosts).
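
Going by the naming pattern in that listing, the unit for a given daemon
could be built as sketched here. This is only an illustration: ceph_unit
is a made-up helper name, and the fsid is the one from David's cluster, so
substitute your own (for example as printed by `ceph fsid`).

```shell
#!/bin/sh
# Hypothetical helper: build the systemd unit name that cephadm uses for a
# daemon, following the pattern in the listing: ceph-<fsid>@<daemon>.service
ceph_unit() {
    fsid="$1"      # cluster fsid
    daemon="$2"    # daemon name such as osd.3
    printf 'ceph-%s@%s.service\n' "$fsid" "$daemon"
}

# Example with the fsid and an OSD from the listing above.
# The journalctl command is only printed, not executed.
unit=$(ceph_unit d49b287a-b680-11eb-95d4-e45f010c03a8 osd.3)
echo "journalctl -u $unit"
```

Running the printed command on the host that carries the daemon should show
its logs. As far as I know, `cephadm logs --name osd.3` wraps the same
journalctl call, so it may be the simpler route.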
>
>
> >
> > Cheers,
> > A
> >
> > From: Marc<mailto:m...@f1-outsourcing.eu>
> > Sent: 06 August 2021 08:54
> > To: Andrew Walker-Brown<mailto:andrew_jbr...@hotmail.com>; 
> > ceph-users@ceph.io<mailto:ceph-users@ceph.io>
> > Subject: RE: All OSDs on one host down
> >
> > >
> > > I’ve tried restarting one of the OSDs but that fails; journalctl shows
> > > "osd not found"... not convinced I’ve got the systemctl command right.
> > >
> >
> > Are you not mixing 'non-container commands' with 'container commands'? As
> > in, if you execute this journalctl outside of the container, it will of
> > course not find anything.
> >
> >
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
> --
> David Caro
> SRE - Cloud Services
> Wikimedia Foundation <https://wikimediafoundation.org/>
> PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3
>
> "Imagine a world in which every single human being can freely share in the
> sum of all knowledge. That's our commitment."
