[Group.of.nepali.translators] [Bug 1824864] Re: CONFIG_LOG_BUF_SHIFT set to 14 is too low on arm64

2019-08-14 Thread Terry Rudd
** Changed in: linux (Ubuntu Cosmic)
   Status: Fix Committed => Invalid

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1824864

Title:
  CONFIG_LOG_BUF_SHIFT set to 14 is too low on arm64

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Bionic:
  Fix Released
Status in linux source package in Cosmic:
  Invalid
Status in linux source package in Disco:
  Fix Released
Status in linux source package in Eoan:
  Fix Released

Bug description:
  [Impact]

   * A dmesg kernel ring buffer that is too small loses early boot
  kernel messages which are logged before journald starts slurping them
  up and storing them on disk. This results in messages similar to this
  one on boot: "missed NN kernel messages on boot". This is especially
  pronounced on arm64, as the default setting there is far lower than on
  any other 32bit or 64bit architecture we ship. Also, amd64 has the
  highest setting, 18, among all architectures we ship. The best course
  of action is to bump all 64bit arches to 18, and keep all 32bit arches
  at the current & upstream default of 17.

  [Test Case]

   * $ cat /boot/config-`uname -r` | grep CONFIG_LOG_BUF_SHIFT

  on 64bit arches result should be: CONFIG_LOG_BUF_SHIFT=18
  on 32bit arches result should be: CONFIG_LOG_BUF_SHIFT=17

   * run systemd adt test, the boot-and-services test case should not
  fail journald tests with "missed kernel messages" visible in the error
  logs.
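
The first check above can also be scripted. The following is a minimal sketch, not part of the original report: the function names and the set of 64bit machine strings (as reported by `uname -m`) are assumptions made for illustration, and the expected values 18 and 17 are the ones stated in this test case.

```python
import os
import platform
import re

# Assumption: `uname -m` style names for the 64bit arches discussed here.
SIXTY_FOUR_BIT = {"x86_64", "aarch64", "ppc64le", "s390x"}


def parse_log_buf_shift(config_text):
    """Return CONFIG_LOG_BUF_SHIFT as an int, or None if it is not set."""
    # Anchoring at line start avoids matching CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT.
    match = re.search(r"^CONFIG_LOG_BUF_SHIFT=(\d+)$", config_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def expected_shift(machine):
    """Expected value from the test case: 18 on 64bit arches, 17 on 32bit."""
    return 18 if machine in SIXTY_FOUR_BIT else 17


def check(config_text, machine):
    """True if the config carries the value this test case expects."""
    return parse_log_buf_shift(config_text) == expected_shift(machine)


if __name__ == "__main__":
    path = "/boot/config-" + platform.release()
    if os.path.exists(path):
        with open(path) as handle:
            ok = check(handle.read(), platform.machine())
        print(path, "OK" if ok else "unexpected CONFIG_LOG_BUF_SHIFT")
    else:
        print("no kernel config found at", path)
```

Run on the machine under test, this reads the same file as the `grep` command and reports whether the value matches the architecture.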

  [Regression Potential]

   * Increasing the size of the log_buf will increase kernel memory
  usage which cannot be reclaimed. It will now become 256 KB on arm64,
  ppc64el and s390x instead of 16 KB/128 KB/128 KB respectively (a shift
  of 14 is 16 KB, per the Kconfig help text quoted below). 32bit arches
  remain unchanged at 128 KB.
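
The memory cost follows directly from the shift values, since the buffer is 2^shift bytes. The sketch below is illustrative only: the "before" values are copied from the policy listing in this report, the "after" values follow the proposal (18 on 64bit, 17 on 32bit), and the helper names are hypothetical.

```python
# CONFIG_LOG_BUF_SHIFT per architecture before the change, as listed in the
# policy block of this report.
OLD_SHIFT = {
    "amd64": 18,
    "arm64": 14,
    "armhf": 17,
    "i386": 17,
    "ppc64el": 17,
    "s390x": 17,
}

# Architectures treated as 64bit in the proposal.
ARCHES_64BIT = {"amd64", "arm64", "ppc64el", "s390x"}


def log_buf_kib(shift):
    """Minimal log buffer size in KiB for a given shift (2**shift bytes)."""
    return (1 << shift) // 1024


def proposed_shift(arch):
    """Proposed setting: 18 on 64bit arches, 17 on 32bit arches."""
    return 18 if arch in ARCHES_64BIT else 17


def size_table():
    """Map each arch to (size before, size after) in KiB."""
    return {
        arch: (log_buf_kib(old), log_buf_kib(proposed_shift(arch)))
        for arch, old in OLD_SHIFT.items()
    }


if __name__ == "__main__":
    for arch, (before, after) in sorted(size_table().items()):
        print(f"{arch:8s} {before:4d} KiB -> {after:4d} KiB")
```

These are minimum sizes only; as the Kconfig help text notes, the final size can also be affected by LOG_CPU_MAX_BUF_SHIFT and the "log_buf_len" boot parameter.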

  [Other Info]
   
   * Original bug report

  CONFIG_LOG_BUF_SHIFT
  policy<{
  'amd64'  : '18',
  'arm64'  : '14',
  'armhf'  : '17',
  'i386'   : '17',
  'ppc64el': '17',
  's390x'  : '17'}>

  Please set CONFIG_LOG_BUF_SHIFT to at least 17 on arm64.

  Potentially bump all 64-bit arches to 18 (or higher!) as was done on
  amd64, meaning set 18 on arm64 s390x ppc64el.

  I have a systemd autopkgtest test that asserts that we see the Linux
  kernel command line in dmesg (journalctl -k -b). It is consistently
  failing on arm64 scalingstack KVM EFI machines with messages of
  "missing 81 kernel messages".

  config LOG_BUF_SHIFT
  int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
  range 12 25
  default 17
  depends on PRINTK
  help
    Select the minimal kernel log buffer size as a power of 2.
    The final size is affected by LOG_CPU_MAX_BUF_SHIFT config
    parameter, see below. Any higher size also might be forced
    by "log_buf_len" boot parameter.

    Examples:
   17 => 128 KB
   16 => 64 KB
   15 => 32 KB
   14 => 16 KB
   13 =>  8 KB
   12 =>  4 KB

  14 sounds ridiculously low for arm64, given that 17 is the default
  across 32-bit arches and 18 is the default on amd64.

  On a related note, we have CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT
  policy<{'amd64': '13', 'arm64': '13', 'armhf': '13', 'i386': '13',
  'ppc64el': '13', 's390x': '13'}>
  I'm not sure if we want to bump these up to LOG_BUF_SHIFT size or not.

  Please backport this to xenial and up.

  === systemd ===

  The systemd boot-and-services test case can bump the ring buffer
  before running the tests.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1824864/+subscriptions

___
Mailing list: https://launchpad.net/~group.of.nepali.translators
Post to : group.of.nepali.translators@lists.launchpad.net
Unsubscribe : https://launchpad.net/~group.of.nepali.translators
More help   : https://help.launchpad.net/ListHelp


[Group.of.nepali.translators] [Bug 1824687] Re: 4.4.0-145-generic Kernel Panic ip6_expire_frag_queue

2019-08-14 Thread Terry Rudd
** Changed in: linux (Ubuntu Cosmic)
   Status: Incomplete => Invalid

-- 
https://bugs.launchpad.net/bugs/1824687

Title:
  4.4.0-145-generic Kernel Panic  ip6_expire_frag_queue

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Cosmic:
  Invalid
Status in linux source package in Disco:
  Triaged

Bug description:
  [SRU Justification]

  == Impact ==

  Since 05c0b86b96 "ipv6: frags: rewrite ip6_expire_frag_queue()" the
  16.04/4.4 kernel crashes whenever that function gets called (on busy
  systems this can be every 3-4 hours). While this potentially affects
  Cosmic and later, too, the fix differs on later kernels (Bionic is not
  yet affected as it does not yet carry updates to the frags handling).

  == Fix ==

  For Xenial and Cosmic, the proposed fix would be additional changes to
  ip6_expire_frag_queue(), taken from follow-up changes to ip_expire().
  For Disco, I would hold back because we have a backlog of stable
  patches there, and depending on what got backported to 5.0.y there
  would be a simpler fix.
  For current development kernels, one just needs to ensure that the
  following upstream change is included: 47d3d7fdb10a "ip6: fix skb leak
  in ip6frag_expire_frag_queue()".

  == Testcase ==

  Unfortunately this could not be re-created locally. But a test kernel
  which had the proposed fix applied tested well in production (see
  comments #37 and #38).

  == Risk of Regression ==

  The modified function is only called in rare cases and the positive
  testing in production would cover this. So I would consider it low.

  ---

  Description:  Ubuntu 16.04.6 LTS
  Release:  16.04

  After upgrading our server to this kernel we experience frequent kernel
  panics (see attachment), about every 3 hours.
  Our machine has a throughput of about 600 Mbit/s.
  The panics occur around ip6_expire_frag_queue.

    __pskb_pull_tail
    ip6_dst_lookup_tail
    _decode_session6
    __xfrm_decode_session
    icmpv6_route_lookup
    icmp6_send

  It seems similar to this bug report in Debian:
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=922488

  According to the reporter of the above bug, it also occurred after
  moving to a kernel with the change
  "rewrite ip6_expire_frag_queue()".

  As an intermediate solution, we disabled IPv6 on this machine to avoid
  further panics.
  Please let me know what information is missing. The ubuntu-bug linux
  report was sent, and I hope it is attached to this report.

  ProblemType: Bug
  DistroRelease: Ubuntu 16.04
  Package: linux-image-4.4.0-145-generic 4.4.0-145.171
  ProcVersionSignature: Ubuntu 4.4.0-145.171-generic 4.4.176
  Uname: Linux 4.4.0-145-generic x86_64
  ApportVersion: 2.20.1-0ubuntu2.18
  Architecture: amd64
  Date: Sun Apr 14 11:40:11 2019
  InstallationDate: Installed on 2018-03-18 (391 days ago)
  InstallationMedia: Ubuntu-Server 16.04.4 LTS "Xenial Xerus" - Release amd64 (20180228)
  ProcEnviron:
   LANGUAGE=en_GB:en
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=en_GB.UTF-8
   SHELL=/bin/bash
  SourcePackage: linux-signed
  UpgradeStatus: Upgraded to xenial on 2018-10-21 (174 days ago)
  ---
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Apr 12 21:04 seq
   crw-rw 1 root audio 116, 33 Apr 12 21:04 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.20.1-0ubuntu2.18
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 16.04
  HibernationDevice: RESUME=/dev/mapper/tor3--vg-swap_1
  InstallationDate: Installed on 2018-03-18 (393 days ago)
  InstallationMedia: Ubuntu-Server 16.04.4 LTS "Xenial Xerus" - Release amd64 (20180228)
  IwConfig: Error: [Errno 2] No such file or directory
  Lsusb:
   Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
   Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
   Bus 001 Device 003: ID 0557:2221 ATEN International Co., Ltd Winbond Hermon
   Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  MachineType: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F
  Package: linux (not installed)
  PciMultimedia:

  ProcEnviron:
   LANGUAGE=en_GB:en
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=en_GB.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 VESA VGA
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-145-generic root=/dev/mapper/hostname--vg-root ro
  ProcVersionSignature: Ubuntu 4.4.0-145.171-generic 4.4.176
  RelatedPackageVersions:
   linux-restricted-modules-4.4.0-145-generic N/A
   linux-backports-modules-4.4.0-145-generic  N/A
   linux-firmware

[Group.of.nepali.translators] [Bug 1835322] Re: [linux-azure] panic in ext4_resize_fs() found during storage testing

2019-08-08 Thread Terry Rudd
** Changed in: linux-azure (Ubuntu Cosmic)
   Status: Fix Committed => Invalid

-- 
https://bugs.launchpad.net/bugs/1835322

Title:
  [linux-azure] panic in ext4_resize_fs()  found during storage testing

Status in linux-azure package in Ubuntu:
  Fix Released
Status in linux-azure source package in Xenial:
  Fix Released
Status in linux-azure source package in Cosmic:
  Invalid

Bug description:
  A panic was observed during file system testing.  The trace is the
  following:

  [ 8783.243586] kernel BUG at /build/linux-azure-3iFJ9j/linux-azure-4.18.0/fs/ext4/resize.c:266!
  [ 8783.252751] invalid opcode:  [#1] SMP PTI
  [ 8783.256735] CPU: 7 PID: 39476 Comm: resize2fs Not tainted 4.18.0-1023-azure #24~18.04.1-Ubuntu
  [ 8783.256735] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
  [ 8783.256735] RIP: 0010:ext4_resize_fs+0x73b/0xf10
  [ 8783.256735] Code: 50 ff ff ff 41 8b 75 10 4d 8b 65 00 85 f6 0f 94 c0 4d 85 e4 0f 94 c1 09 c8 83 bd 5c ff ff ff 01 7e 48 84 c0 0f 84 43 06 00 00 <0f> 0b 48 c7 c2 68 a7 8d 8f 48 c7 c6 00 fb 88 8f 4c 89 f7 e8 0d f8
  [ 8783.256735] RSP: 0018:984e8dce7cb0 EFLAGS: 00010202
  [ 8783.256735] RAX: 00205c01 RBX: 001f RCX:
  [ 8783.256735] RDX: 8b1dbe1367d0 RSI:  RDI:
  [ 8783.256735] RBP: 984e8dce7d88 R08: 984e8dce7d4c R09: 984e8dce7d54
  [ 8783.256735] R10: 0120 R11: 0001 R12: 8b1dbe136800
  [ 8783.256735] R13: 8b1d74aefe80 R14: 8b1dbdeb9000 R15:
  [ 8783.256735] FS:  7f213fed30c0() GS:8b1ded7c() knlGS:
  [ 8783.256735] CS:  0010 DS:  ES:  CR0: 80050033
  [ 8783.256735] CR2: 556aa08ae9b8 CR3: 001b8e324005 CR4: 003606e0
  [ 8783.256735] DR0:  DR1:  DR2:
  [ 8783.256735] DR3:  DR6: fffe0ff0 DR7: 0400
  [ 8783.256735] Call Trace:
  [ 8783.256735]  ? security_capable+0x3c/0x60
  [ 8783.256735]  ext4_ioctl+0xf91/0x14d0
  [ 8783.256735]  ? audit_filter_rules.constprop.14+0x325/0xf90
  [ 8783.256735]  ? audit_filter_rules.constprop.14+0x24b/0xf90
  [ 8783.256735]  do_vfs_ioctl+0xa8/0x630
  [ 8783.256735]  ksys_ioctl+0x75/0x80
  [ 8783.256735]  __x64_sys_ioctl+0x1a/0x20
  [ 8783.256735]  do_syscall_64+0x6a/0x1a0
  [ 8783.256735]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 8783.256735] RIP: 0033:0x7f213f3825d7
  [ 8783.256735] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
  [ 8783.256735] RSP: 002b:7ffe8effd688 EFLAGS: 0246 ORIG_RAX: 0010
  [ 8783.256735] RAX: ffda RBX: 556aa08aa980 RCX: 7f213f3825d7
  [ 8783.256735] RDX: 7ffe8effd7d0 RSI: 40086610 RDI: 0004
  [ 8783.256735] RBP: 0004 R08:  R09:
  [ 8783.256735] R10:  R11: 0246 R12: 556aa08ac980
  [ 8783.256735] R13: 7ffe8effd7d0 R14: 556aa08a92d0 R15:

  
  This issue is resolved by the following upstream commit:
  f96c3ac8dfc2 ("ext4: fix crash during online resizing")

  
  Commit f96c3ac8dfc2 is in mainline as of v5.1-rc1. This commit was
  requested for the upstream stable kernels. However, the kernel versions
  used by these Ubuntu kernels are EOL upstream, so they will not receive
  it through stable updates. Please include this commit in the 16.04 and
  18.04 linux-azure kernels.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1835322/+subscriptions



[Group.of.nepali.translators] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

2019-02-21 Thread Terry Rudd
** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
   Status: New

-- 
https://bugs.launchpad.net/bugs/1814095

Title:
  bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  New

Bug description:
  [Impact]

  The bnxt_en_bpo driver experienced tx timeouts causing the system to
  experience network stalls and fail to send data and heartbeat packets.

  The following 25Gb Broadcom NIC error was seen on Xenial
  running the 4.4.0-141-generic kernel on an amd64 host
  seeing moderate-heavy network traffic (just once):

  * The bnxt_en_bpo driver froze on a "TX timed out" error
    and triggered the Netdev Watchdog timer under load.

  * From kernel log:
    "NETDEV WATCHDOG: eno2d1 (bnxt_en_bpo): transmit queue 0 timed out"
    See attached kern.log excerpt file for full excerpt of error log.

  * Release = Xenial
    Kernel = 4.4.0-141-generic #167
    eno2d1 = Product Name: Broadcom Adv. Dual 25Gb Ethernet

  * This caused the driver to reset in order to recover:

    "bnxt_en_bpo :19:00.1 eno2d1: TX timeout detected, starting reset task!"

    driver: bnxt_en_bpo
    version: 1.8.1
    source: ubuntu/bnxt/bnxt.c: bnxt_tx_timeout()

  * The loss of connectivity and softirq stall caused other failures
    on the system.

  * The bnxt_en_bpo driver is the imported Broadcom driver
    pulled in to support newer Broadcom HW (specific boards)
    while the bnxt_en module continues to support the older
    HW. The current Linux upstream driver does not compile
    easily with the 4.4 kernel (too many changes).

  * This upstream bnxt_en driver fix is a likely solution:
     "bnxt_en: Fix TX timeout during netpoll"
     commit: 73f21c653f930f438d53eed29b5e4c65c8a0f906

    This fix has not been applied to the bnxt_en_bpo driver
    version, but review of the code indicates that it is
    susceptible to the bug, and the fix would be reasonable.

  [Test Case]

  * Unfortunately, this is not easy to reproduce. Also, it is only seen
  on 4.4 kernels with newer Broadcom NICs supported by the bnxt_en_bpo
  driver.

  [Regression Potential]

  * The patch is restricted to the bpo driver, with very constrained
  scope - just the newest Broadcom NICs being used by the Xenial 4.4
  kernel (as opposed to the hwe 4.15 etc. kernels, which would have the
  in-tree fixed driver).

  * The patch is very small and backport is fairly minimal and simple.

  * The fix has been running in the in-tree driver in upstream mainline
  as well as in the Ubuntu Linux in-tree driver. Although the Broadcom
  driver has a lot of lower-level code that is different, this piece is
  still the same.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1814095/+subscriptions
