I am running Kubuntu 18.10 with kernel 4.18.0-11-generic on an AMD Ryzen 2700X CPU. I initially believed I had the Ryzen soft lockup issue, and I posted about it on the AMD community forums:
https://community.amd.com/thread/225795#

I later realized that the AMD soft lockup issue is one that requires the motherboard reset button to get out of. My issue is usually not that bad: most of the time SSH, networking and the VIRTUAL MACHINES on my server still work. I can use the following command via SSH to bring the desktop back:

    sudo systemctl restart sddm

I am now inclined to suspect that the Linux kernel scheduler caused some of my threads to freeze, along with the X.org console (mouse and keyboard stuck). My latest discovery, on or right after Christmas 2018, was that all CPU cores, logical and physical, were still running as seen in the ksysguard graphs and in top, while some threads, typically my late-night crontab backup jobs, HANG FOR HOURS at random and, after hours, RESUME BY THEMSELVES. The backup was apparently all done, but after delays of up to 12 hours! I have also seen a frozen X.org screen refresh a little after 45 minutes, but I could not wait any longer, so I restarted sddm over SSH as mentioned above.

I copy my post dated Dec. 27, 2018 on the AMD community forum below:

Dear All,

Today my new discovery indicated that we may be heading in the wrong direction with regard to CPU core voltage and power states. It has got to be something else.

[image: 265px-Ksysguard1.png]

I used the well-known Linux top command and ksysguard (image above) and set up an AMBUSH, waiting to catch a frozen process in the act. My chance came today: I caught my virtual machine backup cron jobs frozen at VMware's vmrun suspend command. Info:
https://docs.vmware.com/en/VMware-Fusion/11/com.vmware.fusion.using.doc/GUID-24F54E24-EFB0-4E94-8A07-2AD791F0E497.html

My cron jobs put each virtual machine into suspend mode and back it up to a hard disk. I got a clue a few days ago when I checked through my backups: the folder timestamps suggested that the usual backup jobs, which should normally all be done within 30 minutes, had on 2 occasions taken several hours!
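
Folder timestamps only bracket the whole job, so they cannot show which step stalled. As a minimal sketch (assuming Python 3 is available on the server; the function names `run_timed` and `slow_steps` are made up for illustration and are not part of the actual backup script), each step could be wrapped so that the log records its start, end and duration:

```python
import subprocess
import time


def run_timed(name, argv, log=print):
    """Run one backup step and log how long it took (wall-clock seconds)."""
    start = time.monotonic()
    log(f"{time.strftime('%Y-%m-%d %H:%M:%S')} START {name}")
    result = subprocess.run(argv)  # blocks until the command returns
    elapsed = time.monotonic() - start
    log(f"{time.strftime('%Y-%m-%d %H:%M:%S')} END   {name} "
        f"rc={result.returncode} elapsed={elapsed:.1f}s")
    return result.returncode, elapsed


def slow_steps(timings, limit_seconds):
    """Return the names of steps whose elapsed time exceeded the limit."""
    return [name for name, elapsed in timings if elapsed > limit_seconds]
```

A call such as `run_timed("suspend vm1", ["vmrun", "suspend", "/path/to/vm1.vmx"])` (the .vmx path is a placeholder) would then leave a START line when the step begins and an END line only when it finally returns, so a stall of several hours shows up as a gap between the two.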
There was nothing else wrong besides the long time spent on the late-night backup; the data seems to be completely backed up. That means the lockup or freeze can unfreeze by itself, and the job then proceeds to a long-delayed completion.

So I SSHed into this Ryzen machine at my cron job hour today, forwarded X, and ran ksysguard and top on the remote desktop. Yes, the cron job froze and the backup was not happening. I also used ps -aux | grep crontab and similar commands, which confirmed that the cron job was waiting for vmrun to suspend the VM, and that this command had simply frozen. It stayed frozen for almost 2 hours and then completed after this long delay! My script then went on to back up another virtual machine; after backing up, it is supposed to do vmrun resume, but again the resume froze and took more than 1 hour. After this even my ssh -X session died, and I could not reconnect.

During these hours, top and ksysguard showed me that other processes and threads were running. ALL my 16 logical (8 physical) CPUs were RUNNING! None of the CPU cores were frozen in C6 or any other power state while the thread hung for hours. Because of Hyperthreading, every 2 logical CPUs belong to 1 physical CPU core, so if any core had locked up in a deep sleep state during these hours, the graphs of both of its logical CPUs would have to die (ZERO % usage). If 2 physical cores had locked up, then the graphs of 4 logical CPUs would have to die.

I am very sure of my observations. They were repeated twice during my AMBUSH mission today. I am very sure of how my scripts work and how vmrun works: this same setup and script have worked for more than 10 years and were used on older AMD and Intel machines. This Ryzen is a recent replacement for the retired old server.

I am now not inclined to believe that CPU cores were frozen in deep sleep power states, nor that it was the Typical Current Idle issue. Not for my Ryzen machine anyway.
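
The graph-reading argument above (a locked-up physical core means both of its logical CPUs show zero usage) can also be checked from numbers instead of by eye. What follows is only a sketch under stated assumptions: it compares two snapshots of /proc/stat and groups logical CPUs by the core_id that sysfs reports, which is unique only within one socket, so it assumes a single-socket machine. The function names are made up for illustration. A zero delta is just a hint, since a healthy but completely idle core could in principle show the same.

```python
from collections import defaultdict


def parse_proc_stat(text):
    """Map logical CPU number -> (busy_jiffies, total_jiffies) from /proc/stat text."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-CPU lines look like "cpu3 user nice system idle iowait irq ...";
        # the aggregate "cpu" line and all non-CPU lines are skipped.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:]]
        # Only the first 8 fields are summed: guest time is already
        # included in the user and nice counters.
        total = sum(values[:8])
        idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
        counters[int(fields[0][3:])] = (total - idle, total)
    return counters


def read_core_map(cpus):
    """Map logical CPU number -> physical core id using sysfs (Linux only)."""
    core_of = {}
    for cpu in cpus:
        with open(f"/sys/devices/system/cpu/cpu{cpu}/topology/core_id") as fh:
            core_of[cpu] = int(fh.read())
    return core_of


def stalled_cores(before, after, core_of):
    """Return core ids whose logical CPUs did no work at all between snapshots."""
    busy_by_core = defaultdict(int)
    for cpu, core in core_of.items():
        busy_by_core[core] += after[cpu][0] - before[cpu][0]
    return sorted(core for core, busy in busy_by_core.items() if busy == 0)
```

In use, one would read /proc/stat twice, some seconds apart, while a job is hung, pass both texts through `parse_proc_stat`, and call `stalled_cores` with the map from `read_core_map`. An empty result would agree with the observation that no physical core was stuck.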
It has to be something else that RANDOMLY LOCKS UP and RANDOMLY UNLOCKS ITSELF, affecting processes or threads that also appear to be random. I checked the PIDs of these locked-up jobs; top said they were in the idle state. While they were locked I went into various /proc folders and files to sniff for clues, but did not find anything very useful except that they were idle:

    /proc/[PID]/status
    /proc/[PID]/task/[PID]/status

My favorite soft reset, systemctl restart sddm, has worked many times, nearly without fail, I think because it flushes out and kills the hanging threads: this command kills X and everything else running on X, which is quite a large number of processes, and restarts the KDE display manager.

I am hoping for a further breakthrough to find out what causes the threads to LOCK UP and UNLOCK themselves.

Cheers.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1798961

Title:
  Random unrecoverable freezes on Ubuntu 18.10

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Triaged
Status in linux source package in Cosmic:
  Triaged
Status in linux source package in Disco:
  Triaged

Bug description:
  First thing I notice is that the mouse cursor freezes as I'm using it, then I hit the CAPS LOCK key and the LED indicator doesn't respond. Then I try the "REISUB" command, but it doesn't do anything either. Only a hard reset works, pressing down the power button for a few seconds.

  How to reproduce? I couldn't figure out a consistent method. It is still random to me.

  Version: Ubuntu 4.18.0-10.11-generic 4.18.12
  System information attached.

  Also happens under Arch Linux and Fedora. I've talked to another user on IRC who seems to be having the same freezes.

  ProblemType: Bug
  DistroRelease: Ubuntu 18.10
  Package: linux-image-4.18.0-10-generic 4.18.0-10.11
  ProcVersionSignature: Ubuntu 4.18.0-10.11-generic 4.18.12
  Uname: Linux 4.18.0-10-generic x86_64
  ApportVersion: 2.20.10-0ubuntu13
  Architecture: amd64
  AudioDevicesInUse:
   USER        PID ACCESS COMMAND
   /dev/snd/controlC1:  dsilva     1213 F.... pulseaudio
   /dev/snd/controlC0:  dsilva     1213 F.... pulseaudio
  CurrentDesktop: XFCE
  Date: Sat Oct 20 09:54:50 2018
  InstallationDate: Installed on 2018-10-20 (0 days ago)
  InstallationMedia: Xubuntu 18.10 "Cosmic Cuttlefish" - Release amd64 (20181017.2)
  MachineType: Dell Inc. Inspiron 5458
  ProcFB: 0 inteldrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.18.0-10-generic root=/dev/mapper/xubuntu--vg-root ro quiet splash vt.handoff=1
  RelatedPackageVersions:
   linux-restricted-modules-4.18.0-10-generic N/A
   linux-backports-modules-4.18.0-10-generic  N/A
   linux-firmware                             1.175
  RfKill:
   0: phy0: Wireless LAN
    Soft blocked: no
    Hard blocked: no
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 02/02/2018
  dmi.bios.vendor: Dell Inc.
  dmi.bios.version: A15
  dmi.board.name: 09WGNT
  dmi.board.vendor: Dell Inc.
  dmi.board.version: A00
  dmi.chassis.type: 9
  dmi.chassis.vendor: Dell Inc.
  dmi.modalias: dmi:bvnDellInc.:bvrA15:bd02/02/2018:svnDellInc.:pnInspiron5458:pvr01:rvnDellInc.:rn09WGNT:rvrA00:cvnDellInc.:ct9:cvr:
  dmi.product.name: Inspiron 5458
  dmi.product.sku: Inspiron 5458
  dmi.product.version: 01
  dmi.sys.vendor: Dell Inc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1798961/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp