[ceph-users] Ubuntu 17.10 or Debian 9.3 + Luminous = random OS hang ?
One month ago, when I first started evaluating Ceph, I chose Debian 9.3 as the operating system. I saw random OS hangs, so I gave up and switched to Ubuntu 16.04. Everything works well on Ubuntu 16.04.

Yesterday I tried Ubuntu 17.10 and again saw random OS hangs, regardless of whether the host runs mon, mgr, osd, or rgw. When a host hangs, the console does not respond to keyboard input and the host is unreachable over the network.

This is the OS vs. kernel version list:
- Ubuntu 16.04 -> kernel 4.4
- Debian 9.3 -> kernel 4.9
- Ubuntu 17.10 -> kernel 4.13

Just wondering if anyone has seen the same issue, or if it's just me.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ubuntu 17.10 or Debian 9.3 + Luminous = random OS hang ?
The freeze is likely a kernel panic. Try testing different versions of the kernel.

On Fri, Jan 19, 2018, 8:46 AM Youzhong Yang wrote:
> [...]
Re: [ceph-users] Ubuntu 17.10 or Debian 9.3 + Luminous = random OS hang ?
Hi,

On 01/19/18 14:46, Youzhong Yang wrote:
> Just wondering if anyone has seen the same issue, or it's just me.

We're using Debian with our own backported kernels and Ceph, and it works rock solid.

What you're describing sounds more like a hardware issue to me. If you don't fully trust your hardware (and your logs don't reveal anything), I'd recommend running some burn-in tests (memtest, cpuburn, etc.) for 24 hours per machine to rule out CPU/RAM issues.

Regards,
Daniel
Re: [ceph-users] Ubuntu 17.10 or Debian 9.3 + Luminous = random OS hang ?
I don't think it's a hardware issue. All the hosts are VMs. By the way, using the same set of VMware hypervisors, I switched back to Ubuntu 16.04 last night; so far so good, no freezes.

On Fri, Jan 19, 2018 at 8:50 AM, Daniel Baumann wrote:
> [...]
Re: [ceph-users] Ubuntu 17.10 or Debian 9.3 + Luminous = random OS hang ?
On Fri, Jan 19, 2018 at 11:54 PM, Youzhong Yang wrote:
> I don't think it's hardware issue. All the hosts are VMs. By the way, using
> the same set of VMWare hypervisors, I switched back to Ubuntu 16.04 last
> night, so far so good, no freeze.

Too little information to make any sort of assessment, I'm afraid, but at this stage this doesn't sound like a Ceph issue.

-- 
Cheers,
Brad
Re: [ceph-users] Ubuntu 17.10 or Debian 9.3 + Luminous = random OS hang ?
As someone suggested, I installed the linux-generic-hwe-16.04 package on Ubuntu 16.04 to get the Ubuntu 17.10 kernel, and then rebooted all VMs. Here is what I observed:

- The ceph monitor node froze upon reboot; in another case it froze after a few minutes.
- The ceph OSD hosts froze easily.
- The ceph admin node (which runs no ceph service, only ceph-deploy) has never frozen.
- The ceph rgw nodes and ceph mgr have been fine so far.

Here are two images I captured:

https://drive.google.com/file/d/11hMJwhCF6Tj8LD3nlpokG0CB_oZqI506/view?usp=sharing
https://drive.google.com/file/d/1tzDQ3DYTnfDHh_hTQb0ISZZ4WZdRxHLv/view?usp=sharing

Thanks.

On Sat, Jan 20, 2018 at 7:03 PM, Brad Hubbard wrote:
> [...]
Re: [ceph-users] Ubuntu 17.10 or Debian 9.3 + Luminous = random OS hang ?
How up to date is your VM environment? We saw something very similar last year with Linux VMs running newish kernels. It turned out that newer kernels supported a new vmxnet3 adapter feature that was buggy in ESXi. The fix was released sometime last year in ESXi 6.5 U1; alternatively, the workaround was to set an option in the VM config.

https://kb.vmware.com/s/article/2151480

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Youzhong Yang
Sent: 21 January 2018 19:50
To: Brad Hubbard
Cc: ceph-users
Subject: Re: [ceph-users] Ubuntu 17.10 or Debian 9.3 + Luminous = random OS hang ?

[...]
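As a rough triage aid, the sketch below flags a Linux guest that matches the pattern described above: a kernel newer than the 4.4 that behaved well in this thread, plus a NIC bound to the vmxnet3 driver. The 4.4 cutoff is an assumption drawn only from the observations in this thread (4.4 fine, 4.9 and 4.13 hanging); it is not the exact kernel release that introduced the vmxnet3 feature. The helper names (`parse_kernel_version`, `vmxnet3_interfaces`, `possibly_affected`) are hypothetical. The driver name is read from the sysfs symlink `/sys/class/net/<iface>/device/driver`.

```python
import os
import platform
import re

# Assumption: in this thread kernel 4.4 never hung, while 4.9 and 4.13 did,
# so anything newer than 4.4 is treated as "possibly affected". This is an
# illustrative cutoff, not the exact kernel that introduced the feature.
LAST_KNOWN_GOOD = (4, 4)


def parse_kernel_version(release):
    """Return (major, minor) from a release string such as '4.13.0-26-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def vmxnet3_interfaces(sys_net="/sys/class/net"):
    """List interfaces whose bound driver (per the sysfs symlink) is vmxnet3."""
    found = []
    if not os.path.isdir(sys_net):
        return found  # not Linux, or sysfs not mounted
    for iface in sorted(os.listdir(sys_net)):
        driver_link = os.path.join(sys_net, iface, "device", "driver")
        # Virtual interfaces such as "lo" have no device/driver link.
        if os.path.islink(driver_link):
            if os.path.basename(os.readlink(driver_link)) == "vmxnet3":
                found.append(iface)
    return found


def possibly_affected(release, interfaces):
    """True if the kernel is newer than LAST_KNOWN_GOOD and a vmxnet3 NIC exists."""
    return parse_kernel_version(release) > LAST_KNOWN_GOOD and bool(interfaces)


if __name__ == "__main__":
    release = platform.release()
    nics = vmxnet3_interfaces()
    print("kernel %s, vmxnet3 NICs: %s" % (release, nics or "none"))
    if possibly_affected(release, nics):
        print("worth checking the ESXi build against VMware KB 2151480")
```

A positive result only means the guest is worth checking against the ESXi build and the KB article; it does not prove that the vmxnet3 bug is the cause of a given hang.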
Re: [ceph-users] Ubuntu 17.10 or Debian 9.3 + Luminous = random OS hang ?
Thanks. I applied the workaround to the .vmx files and rebooted all VMs. No more freezes!

On Sun, Jan 21, 2018 at 3:43 PM, Nick Fisk wrote:
> [...]
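The thread does not name the exact .vmx option used, so the following is only a generic, hypothetical helper for applying such a setting across many VMs: it sets a `key = "value"` line in a .vmx file idempotently and keeps a `.bak` copy. The key `example.option` and value `FALSE` are placeholders; the real option name and value should come from VMware KB 2151480. It assumes the simple one-setting-per-line `key = "value"` layout, matches keys case-sensitively, and normalises line endings. As a general precaution, it is advisable to edit a .vmx only while the VM is powered off.

```python
import re


def set_vmx_option(vmx_text, key, value):
    """Return vmx_text with the line `key = "value"` set.

    Replaces the first existing line for `key` (exact, case-sensitive match)
    or appends a new line if the key is absent. Line endings are normalised
    to newline characters.
    """
    new_line = '%s = "%s"' % (key, value)
    # Match "key =" at the start of a line, allowing surrounding whitespace,
    # so that e.g. "a.bc" is not mistaken for "a.b".
    pattern = re.compile(r"^\s*%s\s*=" % re.escape(key))
    lines = vmx_text.splitlines()
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = new_line
            break
    else:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


def patch_vmx_file(path, key, value):
    """Apply set_vmx_option to a .vmx file, keeping a .bak copy; return True if changed."""
    with open(path) as fh:
        original = fh.read()
    updated = set_vmx_option(original, key, value)
    if updated == original:
        return False  # already set; nothing to write
    with open(path + ".bak", "w") as fh:
        fh.write(original)
    with open(path, "w") as fh:
        fh.write(updated)
    return True


if __name__ == "__main__":
    # Placeholder key/value: substitute the exact option from the VMware KB article.
    sample = 'displayName = "osd1"\nethernet0.virtualDev = "vmxnet3"\n'
    print(set_vmx_option(sample, "example.option", "FALSE"), end="")
```

Because the helper is idempotent, running it a second time against an already patched file reports no change and leaves the backup from the first run untouched.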