This bug was fixed in the package linux - 4.8.0-49.52
---------------
linux (4.8.0-49.52) yakkety; urgency=low
* linux: 4.8.0-49.52 -proposed tracker (LP: #1684427)
* [Hyper-V] hv: util: move waiting for release to hv_utils_transport itself
(LP: #1682561)
- Drivers: hv: util: move waiting for release to hv_utils_transport itself
linux (4.8.0-48.51) yakkety; urgency=low
* linux: 4.8.0-48.51 -proposed tracker (LP: #1682034)
* [Hyper-V] hv: vmbus: Raise retry/wait limits in vmbus_post_msg()
(LP: #1681893)
- Drivers: hv: vmbus: Raise retry/wait limits in vmbus_post_msg()
linux (4.8.0-47.50) yakkety; urgency=low
* linux: 4.8.0-47.50 -proposed tracker (LP: #1679678)
* CVE-2017-6353
- sctp: deny peeloff operation on asocs with threads sleeping on it
* CVE-2017-5986
- sctp: avoid BUG_ON on sctp_wait_for_sndbuf
* vfat: missing iso8859-1 charset (LP: #1677230)
- [Config] NLS_ISO8859_1=y
* [Hyper-V] pci-hyperv: Use device serial number as PCI domain (LP: #1667527)
- net/mlx4_core: Use cq quota in SRIOV when creating completion EQs
* Regression: KVM modules should be on main kernel package (LP: #1678099)
- [Config] powerpc: Add kvm-hv and kvm-pr to the generic inclusion list
* linux-lts-xenial 4.4.0-63.84~14.04.2 ADT test failure with linux-lts-xenial
4.4.0-63.84~14.04.2 (LP: #1664912)
- SAUCE: apparmor: fix link auditing failure due to, uninitialized var
* regression tests failing after stackprofile test is run (LP: #1661030)
- SAUCE: fix regression with domain change in complain mode
* Permission denied and inconsistent behavior in complain mode with 'ip netns
list' command (LP: #1648903)
- SAUCE: fix regression with domain change in complain mode
* unexpected errno=13 and disconnected path when trying to open /proc/1/ns/mnt
from a unshared mount namespace (LP: #1656121)
- SAUCE: apparmor: null profiles should inherit parent control flags
* apparmor refcount leak of profile namespace when removing profiles
(LP: #1660849)
- SAUCE: apparmor: fix ns ref count link when removing profiles from policy
* tor in lxd: apparmor="DENIED" operation="change_onexec"
namespace="root//CONTAINERNAME_<var-lib-lxd>" profile="unconfined"
name="system_tor" (LP: #1648143)
- SAUCE: apparmor: Fix no_new_privs blocking change_onexec when using stacked
namespaces
* apparmor oops in bind_mnt when dev_path lookup fails (LP: #1660840)
- SAUCE: apparmor: fix oops in bind_mnt when dev_path lookup fails
* apparmor auditing denied access of special apparmor .null file
(LP: #1660836)
- SAUCE: apparmor: Don't audit denied access of special apparmor .null file
* apparmor label leak when new label is unused (LP: #1660834)
- SAUCE: apparmor: fix label leak when new label is unused
* apparmor reference count bug in label_merge_insert() (LP: #1660833)
- SAUCE: apparmor: fix reference count bug in label_merge_insert()
* apparmor's raw_data file in securityfs is sometimes truncated (LP: #1638996)
- SAUCE: apparmor: fix replacement race in reading rawdata
* unix domain socket cross permission check failing with nested namespaces
(LP: #1660832)
- SAUCE: apparmor: fix cross ns perm of unix domain sockets
* [Hyper-V][Mellanox] net/mlx4_core: Avoid delays during VF driver device
shutdown (LP: #1672785)
- Revert "net/mlx4_en: Avoid unregister_netdev at shutdown flow"
- net/mlx4_core: Avoid delays during VF driver device shutdown
* Update ENA driver to 1.1.2 from net-next (LP: #1664312)
- net: ena: Remove unnecessary pci_set_drvdata()
- net: ena: Fix error return code in ena_device_init()
- net: ena: change the return type of ena_set_push_mode() to be void.
- net: ena: use setup_timer() and mod_timer()
- net/ena: remove ntuple filter support from device feature list
- net/ena: fix queues number calculation
- net/ena: fix ethtool RSS flow configuration
- net/ena: fix RSS default hash configuration
- net/ena: fix NULL dereference when removing the driver after device reset
failed
- net/ena: refactor ena_get_stats64 to be atomic context safe
- net/ena: fix potential access to freed memory during device reset
- net/ena: use READ_ONCE to access completion descriptors
- net/ena: reduce the severity of ena printouts
- net/ena: change driver's default timeouts
- net/ena: change condition for host attribute configuration
- net/ena: update driver version to 1.1.2
* ISST-LTE:pVM:roselp4:ubuntu16.04.2: number of numa_miss and numa_foreign
wrong in numastat (LP: #1672953)
- mm: fix remote numa hits statistics
- mm: get rid of __GFP_OTHER_NODE
* Using an NVMe drive causes huge power drain (LP: #1664602)
- nvme/scsi: Remove power management support
- nvme: Pass pointers, not dma addresses, to nvme_get/set_features()
- nvme: introduce struct nvme_request
- nvme: Add a quirk mechanism that uses identify_ctrl
- nvme: Enable autonomous power state transitions
* POWER9: Additional patches for TTY and CPU_IDLE (LP: #1674325)
- tty: Fix ldisc crash on reopened tty
- SAUCE: powerpc/powernv/cpuidle: Pass correct drv->cpumask for registration
* Ubuntu 16.10: Network checksum fixes needed for IPoIB for Mellanox CX4/CX5
card (LP: #1670247)
- Revert "powerpc: port 64 bits pgtable_cache to 32 bits"
- powerpc/Makefile: Drop CONFIG_WORD_SIZE for BITS
- powerpc: port 64 bits pgtable_cache to 32 bits
- [Config] CONFIG_WORD_SIZE disappeared
- powerpc/64: Fix checksum folding in csum_tcpudp_nofold and
ip_fast_csum_nofold
- powerpc/64: Use optimized checksum routines on little-endian
- CONFIG_GENERIC_CSUM=n for ppc64el
- powerpc/64: Fix checksum folding in csum_add()
* [Hyper-V] Rebase Hyper-V to the upstream 4.10 kernel (LP: #1670544)
- PCI: hv: Use device serial number as PCI domain
- PCI: hv: Fix wslot_to_devfn() to fix warnings on device removal
- PCI: hv: Use the correct buffer size in new_pcichild_device()
- scsi: storvsc: Payload buffer incorrectly sized for 32 bit kernels.
- hv_netvsc: remove excessive logging on MTU change
- net: centralize net_device min/max MTU checking
- net: deprecate eth_change_mtu, remove usage
- net: use core MTU range checking in virt drivers
- hv_netvsc: fix a race between netvsc_send() and netvsc_init_buf()
- net: use core MTU range checking in virt drivers
- tools: hv: fix a compile warning in snprintf
- tools: hv: remove unnecessary header files and netlink related code
- vmbus: add support for dynamic device id's
- Drivers: hv: utils: reduce HV_UTIL_NEGO_TIMEOUT timeout
- Drivers: hv: utils: Fix the mapping between host version and protocol to
use
- Drivers: hv: vss: Improve log messages.
- hv: change clockevents unbind tactics
- Drivers: hv: balloon: Disable hot add when CONFIG_MEMORY_HOTPLUG is not
set
- Drivers: hv: balloon: Fix info request to show max page count
- Drivers: hv: balloon: Add logging for dynamic memory operations
- [Config] CONFIG_UIO_HV_GENERIC=m
- uio-hv-generic: new userspace i/o driver for VMBus
- hyperv: Fix spelling of HV_UNKOWN
- Drivers: hv: ring_buffer: count on wrap around mappings in
get_next_pkt_raw() (v2)
- ethernet: use net core MTU range checking in more drivers
* Kernel linux-image-4.4.0-67-generic prevent the boot on Microsoft Hyper-v
2012r2 Gen2 VM (LP: #1674635)
- scsi: storvsc: Workaround for virtual DVD SCSI version
* Enable lspcon on i915 (LP: #1676747)
- drm: Helper for lspcon in drm_dp_dual_mode
- drm/i915: Add lspcon support for I915 driver
- drm/i915: Parse VBT data for lspcon
- drm/i915: Enable lspcon initialization
- drm/i915: Add lspcon resume function
* stress_smoke_test passing and exiting rc=9 (linux 4.9.0-12.13 ADT test
failure with linux 4.9.0-12.13) (LP: #1658633)
- ext4: lock the xattr block before checksuming it
* ip_rcv_finish() NULL pointer kernel panic (LP: #1672470)
- (upstream) bridge: drop netfilter fake rtable unconditionally
* dm-queue-length module is not included in installer/initramfs (LP: #1673350)
- d-i: Also add dm-queue-length to multipath modules
* Broadcom bluetooth modules sometimes fail to initialize (LP: #1483101)
- Bluetooth: btbcm: Add a delay for module reset
* Need support of Broadcom bluetooth device [413c:8143] (LP: #1166113)
- Bluetooth: btusb: Add support for 413c:8143
* Unable to Connect Third HDD via USB Hub (LP: #1663991)
- mm/slub.c: fix random_seq offset destruction
* POWER9 : Enable Stop 0-2 with ESL=EC=0 (LP: #1666197)
- powernv:idle: Add IDLE_STATE_ENTER_SEQ_NORET macro
- powernv:stop: Rename pnv_arch300_idle_init to pnv_power9_idle_init
- cpuidle:powernv: Add helper function to populate powernv idle states.
- powernv: Pass PSSCR value and mask to power9_idle_stop
- Documentation:powerpc: Add device-tree bindings for power-mgt
- powerpc/powernv: Fix bug due to labeling ambiguity in power_enter_stop
* Nvlink2: Additional patches (LP: #1667081)
- mm: enable CONFIG_MOVABLE_NODE on non-x86 arches
- of/fdt: mark hotpluggable memory
- dt: add documentation of "hotpluggable" memory property
- powerpc/mm: Fix memory hotplug BUG() on radix
- powerpc/powernv: Initialise nest mmu
- powerpc/powernv: Use OPAL call for TCE kill on NVLink2
- powerpc/mm: refactor radix physical page mapping
- powerpc/mm: add radix__create_section_mapping()
- powerpc/mm: add radix__remove_section_mapping()
- powerpc/mm: unstub radix__vmemmap_remove_mapping()
- [Config] Update CONFIG_MOVABLE_NODE values and annotations
- [Config] CONFIG_MOVABLE_NODE=n for s390x
* FC Adapter (LPe32000-based) prints "iotag out of range", goes offline, and
delays boot a lot (Ubuntu17.04/Emulex/lpfc)) (LP: #1670490)
- scsi: lpfc: Correct WQ creation for pagesize
- scsi: lpfc: Add missing memory barrier
* CIFS: Call echo service immediately after socket reconnect (LP: #1669941)
- Call echo service immediately after socket reconnect
* Kernel: Fix Transactional memory config typo (LP: #1669023)
- powerpc/process: Fix CONFIG_ALIVEC typo in restore_tm_state()
* h-prod does not function across cores (LP: #1670726)
- KVM: PPC: Book3S HV: Fix H_PROD to actually wake the target vcpu
* [Hyper-V] Missing PCI patches breaking SR-IOV hot remove (LP: #1670518)
- PCI: hv: Fix hv_pci_remove() for hot-remove
- PCI: hv: Delete the device earlier from hbus->children for hot-remove
- PCI: hv: Make unnecessarily global IRQ masking functions static
- PCI: hv: Allocate physically contiguous hypercall params buffer
* move aufs.ko from -extra to linux-image package (LP: #1673498)
- [config] aufs.ko moved to linux-image package
* POWER9: Improve CAS negotiation (LP: #1671169)
- powerpc: Parse the command line before calling CAS
- powerpc: Add missing error check to prom_find_boot_cpu()
- powerpc/pseries: Advertise HPT resizing support via CAS
- powerpc/64: Disable use of radix under a hypervisor
- powerpc/pseries: Advertise Hot Plug Event support to firmware
- powerpc: Update to new option-vector-5 format for CAS
* Power9 kernel: add virtualization patches (LP: #1670800)
- powerpc/fadump: Set core e_flags using kernel's ELF ABI version
- powerpc/sparse: Add more assembler prototypes
- powerpc/pasemi: Fix Nemo SB600 i8259 interrupts.
- powerpc/pasemi: Fix device_type of Nemo SB600 node.
- powerpc/pseries: Use H_CLEAR_HPT to clear MMU hash table during kexec
- powerpc/pseries: Move CMO code from plapr_wrappers.h to platforms/pseries
- powerpc: Fix old style declaration GCC warnings
- powerpc/pseries: add definitions for new H_SIGNAL_SYS_RESET hcall
- powerpc/prom: Define structs for client architecture vectors
- powerpc/prom: Switch to using structs for ibm_architecture_vec
- tracing: Have the reg function allow to fail
- powerpc: port 64 bits pgtable_cache to 32 bits
- powerpc/64: Don't try to use radix MMU under a hypervisor
- powerpc/pseries: Fixes for the "ibm,architecture-vec-5" options
- powerpc/64: Enable use of radix MMU under hypervisor on POWER9
* lsattr 32bit does not work on 64bit kernel (Inappropriate ioctl error)
(LP: #1619918)
- btrfs: fix btrfs_compat_ioctl failures on non-compat ioctls
* linux-tools-common should Depends: lsb-release (LP: #1667571)
- [Config] linux-tools-common depends on lsb-release
* CAPI:Ubuntu: Kernel panic while rebooting (LP: #1667599)
- pci/hotplug/pnv-php: Remove WARN_ON() in pnv_php_put_slot()
* Add Use-After-Free Patch for Ubuntu16.10 - EEH on BELL3 adapter fails to
recover (serial/tty) (LP: #1669153)
- 8250_pci: Fix potential use-after-free in error path
* Request to backport cxlflash patches to Xenial SRU stream (LP: #1623750)
- scsi: cxlflash: Scan host only after the port is ready for I/O
- scsi: cxlflash: Fix to avoid EEH and host reset collisions
- scsi: cxlflash: Improve EEH recovery time
* FlashGT Integration and Setup: fsbmc30: After 17th reboot of soft bootme,
HTX & Linux errors seen with 256 virtual LUNs (LP: #1667239)
- cxl: Fix coredump generation when cxl_get_fd() is used
* POWER9: Additional patches for 17.04 and 16.04.2 (LP: #1667116)
- powerpc/mm: Update PROTFAULT handling in the page fault path
- powerpc/mm/radix: Update pte update sequence for pte clear case
- powerpc/mm/radix: Use ptep_get_and_clear_full when clearing pte for full
mm
- powerpc/mm/radix: Skip ptesync in pte update helpers
- SAUCE: powerpc/mm/hash: Always clear UPRT and Host Radix bits when setting
up CPU
* [Hyper-V] Ubuntu 14.04.2 LTS Generation 2 SCSI Errors on VSS Based Backups
(LP: #1470250)
- Drivers: hv: vss: Operation timeouts should match host expectation
- SAUCE: Tools: hv: vss: Thaw the filesystem and continue after freeze fails
* PowerNV: No rate limit for kernel error "KVM can't copy data from"
(LP: #1667416)
- SAUCE: KVM: PPC: Book3S: Ratelimit copy data failure error messages
* kernel 4.4.0-63 with USB WLAN RTL8192CU freezes desktop (LP: #1666421)
- rtlwifi: rtl_usb: Fix missing entry in USB driver's private data
* Export symbol "dev_pm_qos_update_user_latency_tolerance" (LP: #1666401)
- PM / QoS: Export dev_pm_qos_update_user_latency_tolerance
* Linux ZFS port doesn't respect RLIMIT_FSIZE (LP: #1656259)
- SAUCE: (noup) Update zfs to 0.6.5.8-0ubuntu4.2
-- Stefan Bader <[email protected]> Thu, 20 Apr 2017 09:38:36 +0200
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1667239
Title:
FlashGT Integration and Setup: fsbmc30: After 17th reboot of soft
bootme, HTX & Linux errors seen with 256 virtual LUNs
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Xenial:
Fix Released
Status in linux source package in Yakkety:
Fix Released
Bug description:
== Comment: #1 - Application Cdeadmin <[email protected]> - 2016-06-02 15:28:27 ==
==== State: Open by: anitrap on 01 June 2016 17:36:39 ====
Contact: Anitra Powell ([email protected])
Backup: Dion Bell ([email protected])
Primary BMC (1603G):
=====================================================
# cat /proc/ractrends/Helper/FwInfo
FW_VERSION=2.13.91819
FW_DATE=Mar 10 2016
FW_BUILDTIME=10:59:31 CDT
FW_DESC=8335 SRC BUILD RR9 03102016
FW_PRODUCTID=1
FW_RELEASEID=RR9
FW_CODEBASEVERSION=2.X
#
PNOR (1603G):
========================
# ipmitool -H 127.0.0.1 -I lanplus -U ADMIN -P admin fru list 47
Product Name : OpenPOWER Firmware
Product Version : IBM-firestone-ibm-OP8_v1.7_1.62
Product Extra : hostboot-bc98d0b-1a29dff
Product Extra : occ-0362706-16fdfa7
Product Extra : skiboot-5.1.13
Product Extra : hostboot-binaries-43d5a59
Product Extra : firestone-xml-e7b4fa2-c302f0e
Product Extra : capp-ucode-105cb8f
Partition Info:
=================
ver 1.5.4.3 - OS, HTX, Firmware and Machine details
OS: GNU/Linux
OS Version: Ubuntu 16.04 LTS \n \l
Kernel Version: 4.4.8c0ffee0+
HTX Version: htxubuntu-396
Host Name: fsbmc30p1
Machine Serial No: 210995A
Machine Type/Model: 8335-GCA
root@fsbmc30p1:~# uname -a
Linux fsbmc30p1 4.4.8c0ffee0+ #2 SMP Tue May 24 10:50:26 CDT 2016 ppc64le
ppc64le ppc64le GNU/Linux
FlashGT NVMe setup:
===================
1 FlashGT card in slot 1 running in superpipe mode with 128 LUNs per port
(total of 256 LUNs).
lsscsi
[0:0:0:0] disk ATA ST1000NX0313 BE33 /dev/sda
[1:0:0:0] disk ATA ST1000NX0313 BE33 /dev/sdb
[4:0:0:0] disk NVMe SAMSUNG MZ1LV960 3011 /dev/sdc
[4:1:0:0] disk NVMe SAMSUNG MZ1LV960 3011 /dev/sdd
[5:0:0:0] cd/dvd AMI Virtual CDROM0 1.00 /dev/sr0
[5:0:0:1] cd/dvd AMI Virtual CDROM1 1.00 /dev/sr1
[5:0:0:2] cd/dvd AMI Virtual CDROM2 1.00 /dev/sr2
[5:0:0:3] cd/dvd AMI Virtual CDROM3 1.00 /dev/sr3
[6:0:0:0] disk AMI Virtual Floppy0 1.00 /dev/sde
[6:0:0:1] disk AMI Virtual Floppy1 1.00 /dev/sdf
[6:0:0:2] disk AMI Virtual Floppy2 1.00 /dev/sdg
[6:0:0:3] disk AMI Virtual Floppy3 1.00 /dev/sdh
[7:0:0:0] disk AMI Virtual HDisk0 1.00 /dev/sdi
[7:0:0:1] disk AMI Virtual HDisk1 1.00 /dev/sdj
[7:0:0:2] disk AMI Virtual HDisk2 1.00 /dev/sdk
[7:0:0:3] disk AMI Virtual HDisk3 1.00 /dev/sdl
[7:0:0:4] disk AMI Virtual HDisk4 1.00 /dev/sdm
lspci | grep -i acc
0004:01:00.0 Processing accelerators: IBM Device 0601 (rev 01)
ls -l /sys/class/cxl
total 0
lrwxrwxrwx 1 root root 0 May 31 13:27 afu0.0 ->
../../devices/pci0004:00/0004:00:00.0/0004:01:00.0/cxl/card0/afu0.0
lrwxrwxrwx 1 root root 0 May 31 13:27 afu0.0m ->
../../devices/pci0004:00/0004:00:00.0/0004:01:00.0/cxl/card0/afu0.0/afu0.0m
lrwxrwxrwx 1 root root 0 May 31 13:27 afu0.0s ->
../../devices/pci0004:00/0004:00:00.0/0004:01:00.0/cxl/card0/afu0.0/afu0.0s
lrwxrwxrwx 1 root root 0 May 31 13:27 card0 ->
../../devices/pci0004:00/0004:00:00.0/0004:01:00.0/cxl/card0
lscfg | grep afu
+ afu0.0 Slot1/card0/afu0.0
+ afu0.0m Slot1/card0/afu0.0/afu0.0m
+ afu0.0s Slot1/card0/afu0.0/afu0.0s
/opt/ibm/capikv/bin/cxlfstatus
CXL Flash Device Status
Found 0601 0004:01:00.0 Slot1
Device: SCSI Block Mode LUN WWID
sg2: 4:0:0:0, sdc, superpipe, 60025380025382463300046000000000
sg3: 4:1:0:0, sdd, superpipe, 60025380025382463300052000000000
dpkg -l | grep capi
4el no description given 3.0-1970-3042652
ppc6
4el no description given 3.0-1970-3042652
ppc6
root@fsbmc30p1:/tmp# dpkg -l | grep afu
ii afuimage 3.0-1970-3042652
all no description given
cat /opt/ibm/capikv/version.txt
1970-3042652
/opt/ibm/capikv/afu/cxl_afu_dump /dev/cxl/afu0.0m -v
AFU Version = 160525N1
NVMe0 Version = BTV73011
NVMe0 NEXT = BTV73011
NVMe0 STATUS = 0x702
NVMe1 Version = BTV73011
NVMe1 NEXT = BTV73011
NVMe1 STATUS = 0x702
cat /tmp/test_lun_mode
128
Problem:
===========
While running soft bootme (shutdown -r from the OS every hour), I noticed HTX
errors after the 9th and 17th reboots of the partition. At this point they seem
like different issues, so I am opening two separate defects. I've already
opened defect SW354759 for the first set of HTX errors and assigned it to
htx_screen.
This defect is for the issue that happened after the 17th reboot (Jun 1 @
6am). On the 18th reboot (Jun 1 @ 7am), the shutdown -r command
failed, and I had to power down the system manually.
I will open this to surelock_screen first, since it seems similar to
the one Dion opened while running 128 virtual LUNs per port (defect
http://w3.rchland.ibm.com/projects/bestquest/?defect=SW353881). For
this failure, other exercisers eventually failed as well.
Test Info:
============
- running Soft bootme (shutdown -r every hour)
- mdt.bu + hxecom (GPUs were running). I copied a modified mdt.bu to another
mdt file so I would not see any errors in htx after reboot.
Sample of HTX errors (for this defect)
==============================
/dev/sg2.53 Jun 1 06:26:53 2016 err=00000010 sev=4 hxesurelock
READCMP5 numopers= 20000 loop= 4956 blk=0x4eee
len= 4096 offset=0 Seed Values= 37882, 44181, 50758
Data Pattern Seed Values = 37882, 44182, 50758 LBA Fencepost = 0xb94a
cblk_read error - Device or resource busy
/dev/sg2.18 Jun 1 06:26:53 2016 err=00000010 sev=4 hxesurelock
READCMP9 numopers= 20000 loop= 1501 blk=0x93f1
len= 4096 offset=0 Seed Values= 37847, 44740, 50780
Data Pattern Seed Values = 37847, 44741, 50780 LBA Fencepost = 0xb275
cblk_read error - Device or resource busy
/dev/sg2.98 Jun 1 06:26:53 2016 err=00000010 sev=4 hxesurelock
READCMP5 numopers= 20000 loop= 10365 blk=0x86d5
len= 4096 offset=0 Seed Values= 37927, 41320, 50710
Data Pattern Seed Values = 37927, 41321, 50710 LBA Fencepost = 0xbc7c
cblk_read error - Device or resource busy
/dev/sg2.116 Jun 1 06:30:45 2016 err=00000005 sev=4 hxesurelock
RDCMP10 numopers= 20000 loop= 6383 blk=0xc33d
len= 4096 offset=0 Seed Values= 37945, 49039, 50726
Data Pattern Seed Values = 37945, 49040, 50726 LBA Fencepost = 0xd0b0
cblk_read error - Input/output error
/dev/fpu17 Jun 1 06:30:51 2016 err=0000000b sev=1 hxefpu64
pthread_create call failed with rc: 11, errno: 11, Resource temporarily
unavailable
/dev/fpu17 Jun 1 06:30:51 2016 err=0000000b sev=1 hxefpu64
Hardware Exerciser stopped on an error
/dev/sctu43 Jun 1 06:30:51 2016 err=0000000b sev=1 hxesctu
pthread_create call failed with rc: 11, errno: 11, Resource temporarily
unavailable
/dev/sctu43 Jun 1 06:30:51 2016 err=0000000b sev=1 hxesctu
Hardware Exerciser stopped on an error
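Note on the sample above: the err= fields look like hexadecimal errno values, because each one matches the message printed with it (0x10 with "Device or resource busy", 0x05 with "Input/output error", 0x0b with "rc: 11, errno: 11"). That reading is inferred from this log only and is not confirmed by HTX documentation. A minimal Python sketch of the decode, assuming Linux errno numbering; the helper name decode_htx_err is hypothetical:

```python
import errno
import os


def decode_htx_err(field):
    """Interpret an HTX 'err=' field as a hexadecimal errno.

    Treating the field as an errno is an assumption drawn from the
    log messages, not from a documented HTX format.
    """
    number = int(field, 16)
    name = errno.errorcode.get(number, "UNKNOWN")
    return number, name, os.strerror(number)


# The three distinct err= values that appear in the HTX sample.
for field in ("00000010", "00000005", "0000000b"):
    number, name, text = decode_htx_err(field)
    print(f"err={field} -> errno {number} ({name}): {text}")
```

The symbolic names and message strings come from the platform running the script, so they should be read on a Linux host to compare against the log.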
Logs:
======
/gsa/ausgsa/home/a/n/anitrap/web/public/fsbmc30/softbootme_fail_1
/gsa/ausgsa/home/a/n/anitrap/web/public/fsbmc30/softbootme_fail_1/htxerr
/gsa/ausgsa/home/a/n/anitrap/web/public/fsbmc30/softbootme_fail_1/syslog
/gsa/ausgsa/home/a/n/anitrap/web/public/fsbmc30/softbootme_fail_1/kern.log
/gsa/ausgsa/home/a/n/anitrap/web/public/fsbmc30/softbootme_fail_1/bootme.log
sample of syslog during first htx error:
================================================
Jun 1 06:19:20 fsbmc30p1 systemd[1]: Started Cleanup of Temporary
Directories.
Jun 1 06:25:01 fsbmc30p1 rsyslogd-2007: action 'action 10' suspended, next
retry is Wed Jun 1 06:25:31 2016 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
Jun 1 06:25:01 fsbmc30p1 CRON[99327]: (root) CMD (test -x /usr/sbin/anacron
|| ( cd / && run-parts --report /etc/cron.daily ))
Jun 1 06:26:53 fsbmc30p1 CXLBLK[37882]:
cflash_block_kern_mc.c,cblk_notify_mc_err,5504,LOG_EVENT reason 7 error_num =
0x607,for chunk->dev_name = /dev/sg2, chunk index = 0
Jun 1 06:26:53 fsbmc30p1 rsyslogd-2007: action 'action 10' suspended, next
retry is Wed Jun 1 06:27:23 2016 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
Jun 1 06:26:53 fsbmc30p1 CXLBLK[37847]:
cflash_block_kern_mc.c,cblk_notify_mc_err,5504,LOG_EVENT reason 7 error_num =
0x607,for chunk->dev_name = /dev/sg2, chunk index = 0
Jun 1 06:26:53 fsbmc30p1 CXLBLK[37927]:
cflash_block_kern_mc.c,cblk_notify_mc_err,5504,LOG_EVENT reason 7 error_num =
0x607,for chunk->dev_name = /dev/sg2, chunk index = 0
Jun 1 06:26:59 fsbmc30p1 CXLBLK[37961]:
cflash_block_kern_mc.c,cblk_notify_mc_err,5504,LOG_EVENT reason 7 error_num =
0x607,for chunk->dev_name = /dev/sg3, chunk index = 0
Jun 1 06:26:59 fsbmc30p1 CXLBLK[37954]:
cflash_block_kern_mc.c,cblk_notify_mc_err,5504,LOG_EVENT reason 7 error_num =
0x607,for chunk->dev_name = /dev/sg2, chunk index = 0
Jun 1 06:26:59 fsbmc30p1 CXLBLK[37887]:
cflash_block_kern_mc.c,cblk_notify_mc_err,5504,LOG_EVENT reason 7 error_num =
0x607,for chunk->dev_name = /dev/sg2, chunk index = 0
Jun 1 06:26:59 fsbmc30p1 kernel: [ 1378.248405] hrtimer: interrupt took
200250 ns
sample from kern.log during fail:
=================================
Jun 1 06:08:11 fsbmc30p1 kernel: [ 250.251041] nvidia-uvm: Loaded the UVM
driver in lite mode, major device number 241
Jun 1 06:26:59 fsbmc30p1 kernel: [ 1378.248405] hrtimer: interrupt took
200250 ns
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.764382] hxesurelock[40392]:
unhandled signal 11 at 0000000000000024 nip 00003fff84602978 lr
00003fff84602974 code 30001
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.868242] Unable to handle kernel
paging request for data at address 0x0000000c
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.868599] Faulting instruction
address: 0xc00000000035e2b0
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.868865] Oops: Kernel access of bad
area, sig: 11 [#1]
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.868928] SMP NR_CPUS=2048 NUMA PowerNV
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.868992] Modules linked in:
nvidia_uvm(POE) iptable_filter ip_tables x_tables nvidia(POE) ipmi_devintf
joydev input_leds mac_hid opal_prd ofpart cmdlinepart powernv_flash mtd at24
ipmi_powernv ipmi_msghandler uio_pdrv_genirq uio ibmpowernv powernv_rng
binfmt_misc nfsd ib_iser auth_rpcgss rdma_cm iw_cm ib_cm nfs_acl ib_sa ib_mad
lockd ib_core grace ib_addr sunrpc iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov
async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath
linear mlx4_en hid_generic usbhid hid uas usb_storage cxlflash ast bnx2x
i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
drm cxl vxlan mlx4_core ahci ip6_udp_tunnel udp_tunnel libahci mdio libcrc32c
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870299] CPU: 80 PID: 40392 Comm:
hxesurelock Tainted: P OE 4.4.8c0ffee0+ #2
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870379] task: c000007935fe23a0 ti:
c000007910810000 task.ti: c000007910810000
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870476] NIP: c00000000035e2b0 LR:
c00000000035e280 CTR: 0000000000000000
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870552] REGS: c0000079108135e0 TRAP:
0300 Tainted: P OE (4.4.8c0ffee0+)
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870642] MSR: 9000000100009033
<SF,HV,EE,ME,IR,DR,RI,LE> CR: 28053988 XER: 00000000
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] CFAR: c000000000008468 DAR:
000000000000000c DSISR: 40000000 SOFTE: 1
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR00: c00000000035e280
c000007910813860 c000000001594600 0000000000000000
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR04: c000007823192400
000000000002574f 0000000000000001 0000000000000000
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR08: c0000079241b8a00
0000000000000000 00000000000044fb 65776f702f62696c
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR12: 2d656c3436637072
c00000000fb6f800 00000000464c457f 0000000000010c78
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR16: 0000000000000000
0000000000000039 d000000034fa04c5 0000000000010000
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR20: 00000000000000cd
0000000000000550 0000000000010000 00000000039e0000
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR24: 00003fffffffffff
c000007910813af8 c000007823192600 c00000793f57b980
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR28: c00000793f573e80
00003fffffffffff 000000000000001f c000007926f29790
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872149] NIP [c00000000035e2b0]
elf_core_dump+0xd60/0x1300
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872277] LR [c00000000035e280]
elf_core_dump+0xd30/0x1300
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872351] Call Trace:
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872407] [c000007910813860]
[c00000000035e280] elf_core_dump+0xd30/0x1300 (unreliable)
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872527] [c000007910813a60]
[c00000000036898c] do_coredump+0xcec/0x11e0
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872625] [c000007910813c20]
[c0000000000ce7a0] get_signal+0x540/0x7b0
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872705] [c000007910813d10]
[c000000000017344] do_signal+0x54/0x2b0
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872785] [c000007910813e00]
[c00000000001776c] do_notify_resume+0xbc/0xd0
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872877] [c000007910813e30]
[c000000000009838] ret_from_except_lite+0x64/0x68
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872963] Instruction dump:
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.873004] 60000000 2fa30000 409effa8
e95f0050 39200000 794737e3 4082ffa4 e91f00a0
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.873148] 2fa80000 419e002c e92800f8
e9290000 <8129000c> 79279fe3 41820018 7948efe3
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.884655] ---[ end trace
f8abb6e0d0322daa ]---
gsave info:
==============
GSA Location:
/gsa/ausgsa/projects/s/sift/hst/trial_data/Surelock/Ubuntu/flashgt/fsbmc30p1_ubuntu1604_FlashGT_bootme_test5/FAIL201606011024
<===== This is from RTC side description =====>
See the Discussion field for the initial comments from CQ.
</===== This is from RTC side description =====>
==== State: Open by: mpvageli on 02 June 2016 14:20:06 ====
Oops: Kernel access of bad area, sig: 11 [#1]
# ipmitool -H 127.0.0.1 -I lanplus -U ADMIN -P admin fru list 47
Product Name : OpenPOWER Firmware
Product Version : IBM-firestone-ibm-OP8_v1.7_1.62
Product Extra : hostboot-bc98d0b-1a29dff
Product Extra : occ-0362706-16fdfa7
Product Extra : skiboot-5.1.13
Product Extra : hostboot-binaries-43d5a59
Product Extra : firestone-xml-e7b4fa2-c302f0e
Product Extra : capp-ucode-105cb8f
== Comment: #9 - VIPIN K. PARASHAR <[email protected]> - 2016-06-07 12:04:49 ==
root@fsbmc30p1:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04 LTS
Release: 16.04
Codename: xenial
root@fsbmc30p1:~# cat /etc/*release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04 LTS"
NAME="Ubuntu"
VERSION="16.04 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
UBUNTU_CODENAME=xenial
root@fsbmc30p1:~# uname -a
Linux fsbmc30p1 4.4.8c0ffee0+ #2 SMP Tue May 24 10:50:26 CDT 2016 ppc64le
ppc64le ppc64le GNU/Linux
root@fsbmc30p1:~#
== Comment: #24 - VIPIN K. PARASHAR <[email protected]> - 2016-07-07 07:14:05 ==
From kernel logs
===========
[ 7087.918089] device enP3p5s0f2 left promiscuous mode
[ 8801.190528] cxlflash 0007:00:00.0: send_tmf: TMF timed out!
[ 8806.190383] cxlflash 0007:00:00.0: send_tmf: TMF timed out!
[ 8816.507485] hxesurelock[14180]: unhandled signal 11 at 0000000000000024
nip 00003fff852c2ee8 lr 00003fff852c2938 code 30001
[ 8816.511368] hxesurelock[13501]: unhandled signal 11 at 0000000000000024
nip 00003fff890b2ee8 lr 00003fff890b2938 code 30001
[ 8816.526807] Unable to handle kernel paging request for data at address
0x0000000c
[ 8816.526928] Faulting instruction address: 0xc00000000035e2b0
[ 8816.530233] Unable to handle kernel paging request for data at address
0x0000000c
[ 8816.530596] Faulting instruction address: 0xc00000000035e2b0
3f:mon> t
[c000000686a13a60] c00000000036898c do_coredump+0xcec/0x11e0
[c000000686a13c20] c0000000000ce7a0 get_signal+0x540/0x7b0
[c000000686a13d10] c000000000017344 do_signal+0x54/0x2b0
[c000000686a13e00] c00000000001776c do_notify_resume+0xbc/0xd0
[c000000686a13e30] c000000000009838 ret_from_except_lite+0x64/0x68
--- Exception: 300 (Data Access) at 00003fff890b2ee8
SP (3fff83c2c490) is in userspace
3f:mon> r
R00 = c00000000035e280 R16 = 0000000000000000
R01 = c000000686a13860 R17 = 0000000000000042
R02 = c000000001594600 R18 = d000000021b104fa
R03 = 0000000000000000 R19 = 0000000000010000
R04 = c000002fb7463400 R20 = 00000000000000cd
R05 = 00000000000001bf R21 = 0000000000000628
R06 = 0000000000000001 R22 = 0000000000010000
R07 = 0000000000000000 R23 = 0000000000250000
R08 = c00000281af21500 R24 = 00003fffffffffff
R09 = 0000000000000000 R25 = c000000686a13af8
R10 = 00000000000044fb R26 = c000002fb7463800
R11 = 6c2d656c34366370 R27 = c000002ff0e05cc0
R12 = 756e672d78756e69 R28 = c000002ff0e05c40
R13 = c00000000fb65680 R29 = 00003fffffffffff
R14 = 00000000464c457f R30 = 0000000000000016
R15 = 0000000000010e70 R31 = c000002fb94bd3b8
pc = c00000000035e2b0 elf_core_dump+0xd60/0x1300
cfar= c000000000008468 slb_miss_realmode+0x50/0x78
lr = c00000000035e280 elf_core_dump+0xd30/0x1300
msr = 9000000100009033 cr = 28053828
ctr = 0000000000000000 xer = 0000000000000000 trap = 300
dar = 000000000000000c dsisr = 40000000
3f:mon>
The hxesurelock process segfaulted, and the kernel then crashed while
dumping its core.
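The dumps are consistent with a NULL pointer dereference inside elf_core_dump(): the word marked <8129000c> in the instruction dump of the first oops (same faulting address, 0xc00000000035e2b0) decodes as a load from offset 12 of r9, r9 is zero in both register dumps, and 0 + 12 gives the reported DAR of 0x0000000c. A small Python sketch of that decode, assuming the standard PowerPC D-form layout (6-bit primary opcode, RT, RA, 16-bit signed displacement); the function name is illustrative only:

```python
LWZ_OPCODE = 32  # PowerPC primary opcode for lwz (load word and zero)


def decode_dform(word):
    """Split a 32-bit PowerPC D-form instruction into its fields."""
    opcode = (word >> 26) & 0x3F
    rt = (word >> 21) & 0x1F
    ra = (word >> 16) & 0x1F
    disp = word & 0xFFFF
    if disp & 0x8000:  # sign-extend the 16-bit displacement
        disp -= 0x10000
    return opcode, rt, ra, disp


# Faulting word taken from the "Instruction dump" of the first oops.
opcode, rt, ra, disp = decode_dform(0x8129000C)

# GPR09 / R09 reads as zero in both register dumps in this report.
gpr = {9: 0x0}
effective_address = gpr[ra] + disp

mnemonic = "lwz" if opcode == LWZ_OPCODE else f"opcode {opcode}"
print(f"{mnemonic} r{rt},{disp}(r{ra}) -> address {effective_address:#x}")
```

This only explains why the fault address is 0xc; which kernel structure pointer was NULL is not visible from the dump alone.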
== Comment: #87 - Frederic Barrat <[email protected]> - 2017-02-21 11:50:40 ==
Fix is in kernel v4.10:
bdecf76e319a29735d828575f4a9269f0e17c547
"cxl: Fix coredump generation when cxl_get_fd() is used"
We'd like to have it backported to 16.10 and 16.04 LTS.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1667239/+subscriptions