------- Comment From mranw...@us.ibm.com 2018-12-06 02:22 EDT-------
I recreated the problem where I could see the errors in dmesg (and the console) and then added the firmware to /lib/firmware/nvidia/gx100. After that:

mranweil@ltc-wspoon5:~$ dmesg|grep -i nouv
[ 6.632529] nouveau 0004:04:00.0: enabling device (0140 -> 0142)
[ 6.632613] nouveau 0004:04:00.0: Using 32-bit DMA via iommu
[ 6.632721] nouveau 0004:04:00.0: NVIDIA GV100 (140000a1)
<snip>
[ 7.061963] nouveau 0035:03:00.0: DRM: Pointer to TMDS table invalid
[ 7.061966] nouveau 0035:03:00.0: DRM: DCB version 4.1
[ 7.063141] nouveau 0035:03:00.0: DRM: MM: using COPY for buffer copies
[ 7.063154] [drm] Initialized nouveau 1.3.1 20120801 for 0035:03:00.0 on minor 2
mranweil@ltc-wspoon5:~$

So it looks like the firmware from the current git tree addresses the error messages. I didn't do anything further with the driver.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1794055

Title:
[Witherspoon-DD2.2][Ubu 18.10] [4.18.0-7-generic ] OS booting thrown with nouveau errors; OS booted successfully

Status in The Ubuntu-power-systems project: Incomplete
Status in linux package in Ubuntu: Incomplete
Status in linux source package in Cosmic: Incomplete

Bug description:

== Comment: #0 - Kalpana Shetty <kalsh...@in.ibm.com> - 2018-09-15 23:55:13 ==
---Problem Description---
[Witherspoon-DD2.2][Ubu 18.10] [4.18.0-7-generic ] OS booting thrown with nouveau errors

Contact Information = kalsh...@in.ibm.com, preeti.tha...@in.ibm.com

---uname output---
root@ltc-wcwsp3:~# uname -a
Linux ltc-wcwsp3 4.18.0-7-generic #8-Ubuntu SMP Tue Aug 28 18:20:56 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = Witherspoon DD2.2 LC

Steps:
1. Netinstall Ubu 18.10 on Witherspoon-LC-DD2.2 6GPU system ------> PASS
2. Boot the OS ---> PASS, but errors related to the open source NVIDIA driver are thrown on the console.

[Disk: sdb2 / c0302064-c5a3-49a7-8bd4-402283e6fcbe]
Ubuntu, with Linux 4.18.0-7-generic (recovery mode)
Ubuntu, with Linux 4.18.0-7-generic
Ubuntu
[Disk: nvme0n1p2 / c5d042f1-812e-49e0-94b2-ade477084061]
Ubuntu, with Linux 4.18.0-7-generic (recovery mode)
* Ubuntu, with Linux 4.18.0-7-generic
Ubuntu
System information
System configuration
System status log
Language
Rescan devices
Retrieve config from URL
Plugins (0)
Exit to shell
Enter=accept, e=edit, n=new, x=exit, l=language, g=log, h=help
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
[ 57.513329] kexec_core: Starting new kernel
[ 149.358703978,5] OPAL: Switch to big-endian OS
[ 153.355498935,5] OPAL: Switch to little-endian OS
[ 2.943735] integrity: Unable to open file: /etc/keys/x509_ima.der (-2)
[ 2.943738] integrity: Unable to open file: /etc/keys/x509_evm.der (-2)
[ 3.132733] vio vio: uevent: failed to send synthetic uevent
[ 4.058698] nouveau 0004:04:00.0: gr: failed to load gr/sw_nonctx
[ 4.129215] nouveau 0004:04:00.0: DRM: failed to create kernel channel, -22
[ 19.126509] nouveau 0004:04:00.0: DRM: failed to idle channel 0 [DRM]
[ 19.281450] nouveau 0004:05:00.0: gr: failed to load gr/sw_nonctx
[ 19.351322] nouveau 0004:05:00.0: DRM: failed to create kernel channel, -22
[ 34.350509] nouveau 0004:05:00.0: DRM: failed to idle channel 0 [DRM]
[ 34.502063] nouveau 0004:06:00.0: gr: failed to load gr/sw_nonctx
[ 34.572144] nouveau 0004:06:00.0: DRM: failed to create kernel channel, -22
[ 49.570509] nouveau 0004:06:00.0: DRM: failed to idle channel 0 [DRM]
[ 49.734754] nouveau 0035:03:00.0: gr: failed to load gr/sw_nonctx
[ 49.805057] nouveau 0035:03:00.0: DRM: failed to create kernel channel, -22
[ 64.802510] nouveau 0035:03:00.0: DRM: failed to idle channel 0 [DRM]
[ 64.955442] nouveau 0035:04:00.0: gr: failed to load gr/sw_nonctx
[ 65.025537] nouveau 0035:04:00.0: DRM: failed to create kernel channel, -22
[ 80.022509] nouveau 0035:04:00.0: DRM: failed to idle channel 0 [DRM]
[ 80.181169] nouveau 0035:05:00.0: gr: failed to load gr/sw_nonctx
[ 80.251481] nouveau 0035:05:00.0: DRM: failed to create kernel channel, -22
[ 95.250509] nouveau 0035:05:00.0: DRM: failed to idle channel 0 [DRM]
/dev/nvme0n1p2: recovering journal
/dev/nvme0n1p2: clean, 72569/97681408 files, 7384418/390701312 blocks
-.mount
kmod-static-nodes.service
dev-hugepages.mount
dev-mqueue.mount
sys-kernel-debug.mount
ufw.service
lvm2-lvmetad.service
systemd-remount-fs.service
systemd-random-seed.service
systemd-sysusers.service
keyboard-setup.service
systemd-tmpfiles-setup-dev.service
lvm2-monitor.service
finalrd.service
console-setup.service
swapfile.swap
ebtables.service
systemd-udevd.service
systemd-journald.service
systemd-journal-flush.service
systemd-tmpfiles-setup.service
systemd-update-utmp.service
[ 100.997765] vio vio: uevent: failed to send synthetic uevent
systemd-udev-trigger.service
systemd-timesyncd.service
apparmor.service
lvm2-pvscan@8:3.service
systemd-modules-load.service
sys-kernel-config.mount
sys-fs-fuse-connections.mount
systemd-sysctl.service
ondemand.service
dbus.service
irqbalance.service
opal-prd.service
lxcfs.service
atd.service
cron.service
iprdump.service
iprinit.service
systemd-logind.service
iprupdate.service
systemd-networkd.service
rsyslog.service
polkit.service
accounts-daemon.service
lxd-containers.service
networkd-dispatcher.service
var-lib-lxcfs.mount
tmp-selftest\x2dmountpoint\x2d039055037.mount
snapd.service
snapd.seeded.service
systemd-resolved.service
systemd-networkd-wait-online.service
blk-availability.service
systemd-user-sessions.service
apport.service

Ubuntu Cosmic Cuttlefish (development branch) ltc-wcwsp3 hvc0

ltc-wcwsp3 login:

== Comment: #2 - Kalpana Shetty <kalsh...@in.ibm.com> - 2018-09-16 00:07:26 ==
sosreport -> http://9.114.13.132/repo/bugs/ubu/sosreport-BZ171506.171506-20180915235600.tar.xz

== Comment: #3 - Kalpana Shetty <kalsh...@in.ibm.com> - 2018-09-16 00:33:02 ==

== Comment: #4 - Praveen K. Pandey <praveen.pan...@in.ibm.com> - 2018-09-19 05:52:23 ==
Facing a nouveau-related error on a POWER8 system as well:

[ 4.764818] nouveau 0002:01:00.0: fifo: fault 00 [READ] at 0000000000020000 engine 0c [HOST6] client 06 [GPC0/L1_2] reason 02 [PTE] on channel 0 [03ffb18000 DRM]
[ 4.942169] nouveau 000a:01:00.0: fifo: fault 00 [READ] at 0000000000020000 engine 0c [HOST6] client 06 [GPC0/L1_2] reason 02 [PTE] on channel 0 [03ffb18000 DRM]
/dev/sdb2: clean, 132397/61054976 files, 5995714/244188416 blocks
[ 11.206278] vio vio: uevent: failed to send synthetic uevent
[ OK ] Started Show Plymouth Boot Screen.
[ OK ] Reached target Local Encrypted Volumes.
[ OK ] Started Forward Password Requests to Plymouth Directory Watch.
plymouth-start.service
[ OK ] Started ebtables ruleset management.

== Comment: #5 - Chandni Verma <chand...@in.ibm.com> - 2018-09-20 16:41:49 ==
--- screening ---

From the provided dmesg, I notice:

1294 [ 19.281478] nouveau 0004:05:00.0: bios: version 88.00.13.00.02
1295 [ 19.282753] nouveau 0004:05:00.0: Direct firmware load for nvidia/gv100/gr/sw_nonctx.bin failed with error -2
1296 [ 19.282755] nouveau 0004:05:00.0: gr: failed to load gr/sw_nonctx
1297 [ 19.282813] nouveau 0004:05:00.0: Using 32-bit DMA via iommu

..

1322 [ 34.367713] nouveau 0004:06:00.0: NVIDIA GV100 (140000a1)
1323 [ 34.497152] nouveau 0004:06:00.0: bios: version 88.00.13.00.02
1324 [ 34.502736] nouveau 0004:06:00.0: Direct firmware load for nvidia/gv100/gr/sw_nonctx.bin failed with error -2
1325 [ 34.502738] nouveau 0004:06:00.0: gr: failed to load gr/sw_nonctx
1326 [ 34.502797] nouveau 0004:06:00.0: Using 32-bit DMA via iommu

..

up to 6 instances of the above...

Looks like an NVIDIA firmware issue.

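For anyone triaging similar logs, here is a rough sketch (not part of the original report) of how the "Direct firmware load ... failed" lines could be pulled out of a saved dmesg and turned into the file paths the kernel was probably looking for. Error -2 is ENOENT ("No such file or directory"). The function name `missing_firmware`, the regular expression, and the `/lib/firmware` default root are assumptions made for illustration; they do not come from the nouveau driver or any Ubuntu tooling.

```python
import errno
import os
import posixpath
import re

# Assumed message layout, modelled on the dmesg excerpt in comment #5.
FW_FAIL = re.compile(
    r"nouveau (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]): "
    r"Direct firmware load for (?P<path>\S+) failed with error (?P<err>-?\d+)"
)


def missing_firmware(dmesg_text, firmware_root="/lib/firmware"):
    """Return sorted, de-duplicated (device, absolute path, errno name) tuples.

    firmware_root is an assumption: the usual firmware directory on the host.
    """
    found = set()
    for match in FW_FAIL.finditer(dmesg_text):
        code = abs(int(match.group("err")))
        # Map the numeric code to its symbolic name, e.g. 2 -> ENOENT.
        name = errno.errorcode.get(code, str(code))
        path = posixpath.join(firmware_root, match.group("path"))
        found.add((match.group("dev"), path, name))
    return sorted(found)


# Two lines copied from the dmesg excerpt in comment #5.
sample = """\
[ 19.282753] nouveau 0004:05:00.0: Direct firmware load for nvidia/gv100/gr/sw_nonctx.bin failed with error -2
[ 34.502736] nouveau 0004:06:00.0: Direct firmware load for nvidia/gv100/gr/sw_nonctx.bin failed with error -2
"""

for dev, path, err in missing_firmware(sample):
    # os.path.exists reports whether the file is present on the machine running this script.
    print(f"{dev}: {path} ({err}); present on this host: {os.path.exists(path)}")
```

Running it against the full output of `dmesg` on the affected machine, rather than the two-line sample, would list each GPU and the firmware file it could not load.
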
== Comment: #6 - Luciano Chavez <cha...@us.ibm.com> - 2018-09-20 17:03:31 ==
(In reply to comment #5)
> --- screening ---
>
> From provided dmesg, I notice:
>
>
> 1294 [ 19.281478] nouveau 0004:05:00.0: bios: version 88.00.13.00.02
> 1295 [ 19.282753] nouveau 0004:05:00.0: Direct firmware load for nvidia/gv100/gr/sw_nonctx.bin failed with error -2
> 1296 [ 19.282755] nouveau 0004:05:00.0: gr: failed to load gr/sw_nonctx
> 1297 [ 19.282813] nouveau 0004:05:00.0: Using 32-bit DMA via iommu
>
> ..
>
> 1322 [ 34.367713] nouveau 0004:06:00.0: NVIDIA GV100 (140000a1)
> 1323 [ 34.497152] nouveau 0004:06:00.0: bios: version 88.00.13.00.02
> 1324 [ 34.502736] nouveau 0004:06:00.0: Direct firmware load for nvidia/gv100/gr/sw_nonctx.bin failed with error -2
> 1325 [ 34.502738] nouveau 0004:06:00.0: gr: failed to load gr/sw_nonctx
> 1326 [ 34.502797] nouveau 0004:06:00.0: Using 32-bit DMA via iommu
>
> ..
>
> up to 6 instances of the above...
>
>
> Looks like an NVIDIA firmware issue.

Well, I think those messages mean that the nouveau module can't find the firmware file, as opposed to it being a FW issue. It might be a packaging issue if this is not actually causing any real problems. Probably best to mirror this to Canonical for their comment.

== Comment: #10 - Chandni Verma <chand...@in.ibm.com> - 2018-09-24 03:25:35 ==

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1794055/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp