[Kernel-packages] [Bug 1879707] Comment bridged from LTC Bugzilla
--- Comment From jan.hoepp...@de.ibm.com 2020-10-15 13:47 EDT---
(In reply to comment #34)
> So I took the time to re-test this again.
> My z/VM guest has 4 CPUs (but SMT on), and 4 DASD FBA devices that equally
> split a 64GB zFCP/SCSI LUN in 4 16GB FBA chunks.
>
> I've tested (in comment #8) with 2GB RAM where things worked and I wasn't
> able to recreate the error situation.
> I then moved to 6GB RAM and things still worked for me.
> Then 8GB - where everything was still fine.
> And finally 10GB - still don't see the issue.
>
> $ grep -i 'error\|crash\|crit\|panic\|I\/O\|erp\|sense\|fba' /var/log/syslog
> Jul 28 10:05:23 hwe0005 systemd[1]: Stopping LSB: automatic crash report
> generation...
> Jul 28 10:05:23 hwe0005 systemd[1]: Stopping Configure dump on panic for
> System z...
> Jul 28 10:07:36 hwe0005 systemd-udevd[514]: dasd-fba:
> /etc/udev/rules.d/41-generic-ccw-0.0.0009.rules:7 Failed to write
> ATTR{/sys/devices/css0/0.0.0007/0.0.0009/online}, ignoring: Invalid argument
> Jul 28 10:07:36 hwe0005 systemd-udevd[511]: 0.0.0102:
> /etc/udev/rules.d/41-dasd-fba-0.0.0102.rules:7 Failed to write
> ATTR{/sys/devices/css0/0.0.0001/0.0.0102/online}, ignoring: Invalid argument
> Jul 28 10:07:36 hwe0005 systemd-udevd[522]: 0.0.0101:
> /etc/udev/rules.d/41-dasd-fba-0.0.0101.rules:7 Failed to write
> ATTR{/sys/devices/css0/0.0./0.0.0101/online}, ignoring: Invalid argument
> Jul 28 10:07:36 hwe0005 systemd-udevd[522]: 0.0.0103:
> /etc/udev/rules.d/41-dasd-fba-0.0.0103.rules:7 Failed to write
> ATTR{/sys/devices/css0/0.0.0002/0.0.0103/online}, ignoring: Invalid argument
> Jul 28 10:07:36 hwe0005 systemd-udevd[505]: 0.0.0104:
> /etc/udev/rules.d/41-dasd-fba-0.0.0104.rules:7 Failed to write
> ATTR{/sys/devices/css0/0.0.0003/0.0.0104/online}, ignoring: Invalid argument
> Jul 28 10:07:36 hwe0005 kernel: [4.983272] dasd-fba.f36f2f: 0.0.0101:
> New FBA DASD 9336/10 (CU 6310/80) with 16383 MB and 512 B/blk
> Jul 28 10:07:36 hwe0005 kernel: [4.988020] dasd-fba.f36f2f: 0.0.0102:
> New FBA DASD 9336/10 (CU 6310/80) with 16383 MB and 512 B/blk
> Jul 28 10:07:36 hwe0005 kernel: [4.990317] dasd-fba.f36f2f: 0.0.0103:
> New FBA DASD 9336/10 (CU 6310/80) with 16383 MB and 512 B/blk
> Jul 28 10:07:36 hwe0005 kernel: [4.992370] dasd-fba.f36f2f: 0.0.0104:
> New FBA DASD 9336/10 (CU 6310/80) with 16384 MB and 512 B/blk
> Jul 28 10:07:36 hwe0005 systemd[1]: Condition check resulted in Process
> error reports when automatic reporting is enabled (file watch) being skipped.
> Jul 28 10:07:36 hwe0005 systemd[1]: Condition check resulted in Unix socket
> for apport crash forwarding being skipped.
> Jul 28 10:07:36 hwe0005 systemd[1]: Starting LSB: automatic crash report
> generation...
> Jul 28 10:07:36 hwe0005 systemd[1]: Starting Configure dump on panic for
> System z...
> Jul 28 10:07:36 hwe0005 apport[764]: * Starting automatic crash report
> generation: apport
> Jul 28 10:07:36 hwe0005 dumpconf[770]: stop on panic configured.
> Jul 28 10:07:36 hwe0005 systemd[1]: Finished Configure dump on panic for
> System z.
> Jul 28 10:07:36 hwe0005 systemd[1]: Started LSB: automatic crash report
> generation.
>
> I'm wondering a bit about the systemd msgs and the sysfs device tree. But
> other than that no ERP, sense, or panics so far ...
>
> $ dmesg | grep -i 'error\|fail\|crash\|warn\|crit\|panic\|erp\|fba'
> [4.983272] dasd-fba.f36f2f: 0.0.0101: New FBA DASD 9336/10 (CU 6310/80)
> with 16383 MB and 512 B/blk
> [4.988020] dasd-fba.f36f2f: 0.0.0102: New FBA DASD 9336/10 (CU 6310/80)
> with 16383 MB and 512 B/blk
> [4.990317] dasd-fba.f36f2f: 0.0.0103: New FBA DASD 9336/10 (CU 6310/80)
> with 16383 MB and 512 B/blk
> [4.992370] dasd-fba.f36f2f: 0.0.0104: New FBA DASD 9336/10 (CU 6310/80)
> with 16384 MB and 512 B/blk
> [5.075981] random: 7 urandom warning(s) missed due to ratelimiting
>
> I always did a quick check of the partition data:
>
> ubuntu@hwe0005:~$ sudo fdisk -l /dev/dasde1
> Disk /dev/dasde1: 15.102 GiB, 17178902528 bytes, 33552544 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
>
> And then created a ext3 file system using -F on all 4 FBA devices one after
> the other:
>
> ubuntu@hwe0005:~$ sudo mkfs.ext3 -F /dev/dasde1
> mke2fs 1.45.5 (07-Jan-2020)
> /dev/dasde1 contains a ext3 file system
> created on Tue Jul 28 09:45:37 2020
> Discarding device blocks: done
> Creating filesystem with 4194068 4k blocks and 1048576 inodes
> Filesystem UUID: c34e7583-1dc9-4b8a-8494-7a100338a7e6
> Superblock backups stored on blocks:
> 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
> 4096000
>
> Allocating group tables: done
> Writing inode tables: done
> Creating journal (16384 blocks): done
> Writing superblocks and filesystem accounting information: done
>
> Does it have a dependency on a certain z/VM version:
>
> And I'm running this z/VM version:
> 00: CP Q CPLEVEL
[Kernel-packages] [Bug 1879707] Comment bridged from LTC Bugzilla
--- Comment From jan.hoepp...@de.ibm.com 2020-07-24 06:10 EDT---
(In reply to comment #29)
> I just wanted to try the kernel that I've built yesterday (see comment #6)
> and as a first step I wanted to re-create the described error
> situation on an up to date 20.04 system:
> $ lsb_release -a
> No LSB modules are available.
> Distributor ID: Ubuntu
> Description: Ubuntu 20.04 LTS
> Release: 20.04
> Codename: focal
> $ uname -a
> Linux zlin42 5.4.0-40-generic #44-Ubuntu SMP Mon Jun 22 23:57:33 UTC 2020
> s390x s390x s390x GNU/Linux
> $ apt-cache policy linux-generic
> linux-generic:
> Installed: 5.4.0.40.43
> Candidate: 5.4.0.40.43
> Version table:
> *** 5.4.0.40.43 500
> 500 http://ports.ubuntu.com/ubuntu-ports focal-updates/main s390x Packages
> 500 http://ports.ubuntu.com/ubuntu-ports focal-security/main s390x Packages
> 100 /var/lib/dpkg/status
> 5.4.0.26.32 500
> 500 http://ports.ubuntu.com/ubuntu-ports focal/main s390x Packages
> $
>
> But to my surprise I didn't run into any problems!
>
> I was able to flawlessly make use of FBA devices in the following three
> different ways:
> 1) (re-)used a FBA device that was previously in use, without wiping out any
> data
> 2) used a FBA device that I wiped out using wipefs
> 3) used a FBA device that I entirely wiped out (and zeroed) using dd
> (see the attached doc for more details)
>
> Looks like a fix for this problem came in with the kernels that were rolled
> out in between 5.4.0-29-generic and 5.4.0-40-generic.
> (Again I didn't install the patched kernel, I just used the latest
> official one.)
>
> Message to the initial bug reporter:
> Please can you retry on an up-to-date 20.04 system (or after having updated
> your existing one, like for example with "sudo apt -y -q update && sudo apt
> -y -q full-upgrade" plus reboot in case the kernel got updated, which I assume
> will happen)?

The issue is more likely to occur when the system has > 2GB of memory.
I can't see in your attachment what system configuration you're running, but maybe that's the reason you're not running into the problem. This was also my mistake when I tried to reproduce the problem. Once I had changed my configuration to 8GB of memory, I ran into the error immediately.

The problem really is that the ZERO_PAGE allocation for the discard I/O can't be used on systems with > 2GB of memory. The fix I proposed actually solves the problem and is currently in review. I'll post the commit ID once the fix has gone upstream.

Jan

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1879707

Title:
  [UBUNTU 20.04] mke2fs dasd(fba),Failing CCW,default ERP has run out of retries and failed

Status in Ubuntu on IBM z Systems:
  Incomplete
Status in linux package in Ubuntu:
  New

Bug description:
  mke2fs,dasd(fba) guest edevices FBA,default ERP has run out of retries and failed,Failing CCW

  ---uname output---
  xx - 5.4.0-29-generic #33-Ubuntu SMP Wed Apr 29 14:27:18 UTC 2020 s390x s390x s390x GNU/Linux

  Machine Type = IBM 3906

  ---Debugger---
  A debugger is not configured

  ---Steps to Reproduce---
  mke2fs to dasd(fba) devices

  Stack trace output: no
  Oops output: no
  System Dump Info: The system is not configured to capture a system dump.

  -Post a private note with access information to the machine that the bug is occurring on.
  -Attach sysctl -a output to the bug.

  dasd(fba),Failing CCW,default ERP has run out of retries and failed
  between the following syslog events, mke2fs running, before mounting and starting IO to dasd(fba) devices

  May 14 14:33:32 ilabg13 root: ILAB_IO_FROM_MSDI_START
  May 14 14:48:34 ilabg13 root: ILAB_IO_FROM_MSDI_RUNNING

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1879707/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
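
Since Jan's comment above ties the failure to guests with more than 2GB of memory, it may help to record the memory size alongside any reproduction attempt. The following is only a rough sketch in Python: the helper names `mem_total_kib` and `above_2gib` are made up for illustration, and comparing `MemTotal` against exactly 2 GiB is an assumption about where the boundary lies, not something the kernel or the proposed fix defines.

```python
import re

# Assumed threshold: 2 GiB expressed in KiB, matching the "> 2GB" wording
# in the comment. The exact boundary used here is an assumption.
TWO_GIB_IN_KIB = 2 * 1024 * 1024


def mem_total_kib(meminfo_text):
    """Return the MemTotal value (in KiB) from /proc/meminfo-style text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB$", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no MemTotal line found")
    return int(match.group(1))


def above_2gib(meminfo_text):
    """True if the reported memory exceeds the assumed 2 GiB threshold."""
    return mem_total_kib(meminfo_text) > TWO_GIB_IN_KIB
```

On the guest itself, the text would typically come from `open("/proc/meminfo").read()`. Note that `MemTotal` is somewhat smaller than the memory defined for the guest, so a guest defined with exactly 2GB reports below the threshold.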
[Kernel-packages] [Bug 1879707] Comment bridged from LTC Bugzilla
--- Comment From heinz-werner_se...@de.ibm.com 2020-07-22 03:20 EDT---
@Canonical: Thanks for providing the test build. Can that fix also be tested in your environment, the one the other BZ relates to?
Many thx

--
https://bugs.launchpad.net/bugs/1879707

Title:
  [UBUNTU 20.04] mke2fs dasd(fba),Failing CCW,default ERP has run out of retries and failed

Status in Ubuntu on IBM z Systems:
  In Progress
Status in linux package in Ubuntu:
  New