[Bug 1940723] Re: GRUB (re)installation failing due to stale grub-{pc, efi}/install_devices

2024-10-02 Thread Trent Lloyd
See also: https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/2083176

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1940723

Title:
  GRUB (re)installation failing due to stale
  grub-{pc,efi}/install_devices

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1940723/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 2083176] Re: grub-efi/install_devices becoming stale due to by-id/nvme-eui.* symlinks disappearing

2024-10-02 Thread Trent Lloyd
I looked into this a few months ago for slightly different reasons
(juju/maas getting confused and failing to identify a disk, due to differing
kernels used for install vs boot). I can confirm that at the time I found
the nvme by-id symlinks change due to the backporting of the
NVME_QUIRK_BOGUS_NID quirk.

Unfortunately, backports of this quirk for various SSD models have been
landing regularly in upstream linux -stable kernels. I ran out of time to
follow up on this, but this practice probably needs to be raised with the
upstream kernel developers and either stopped, or some solution for the
symlinks needs to be found. I never got as far as fully understanding why a
bogus NID matters, what it breaks, or what the quirk actually fixes.

There are a couple of other open bugs related to this issue, e.g. where it also 
breaks on upgrade:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/2039108
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1940723

In my juju/maas case this was happening with VirtIO SCSI devices too, not a
real SSD, as those were also quirked. That may be a way to reproduce the
issue without one of the affected SSDs.
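
For reference, a rough way to spot the staleness on an affected machine (an
untested sketch; package and device names are examples):

# What GRUB has recorded vs. what currently exists in /dev/disk/by-id
sudo debconf-show grub-pc grub-efi-amd64 2>/dev/null | grep install_devices
ls -l /dev/disk/by-id/nvme-* 2>/dev/null
# If the recorded by-id symlink no longer exists, re-selecting the install
# device (e.g. "sudo dpkg-reconfigure grub-pc", or the grub-efi-amd64
# equivalent) refreshes the stored value.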

Possibly also related links I collected:
https://lore.kernel.org/all/20220606064055.ga2...@lst.de/T/#madf46b0ae9d07405bad2e324cb782c477e7518b2:
https://bugs.launchpad.net/curtin/+bug/2015100
https://bugzilla.redhat.com/show_bug.cgi?id=2031810
https://bugzilla.kernel.org/show_bug.cgi?id=217981
https://www.truenas.com/community/threads/bluefin-to-cobia-rc1-drive-now-fails-with-duplicate-ids.113205/

** Bug watch added: Red Hat Bugzilla #2031810
   https://bugzilla.redhat.com/show_bug.cgi?id=2031810

** Bug watch added: Linux Kernel Bug Tracker #217981
   https://bugzilla.kernel.org/show_bug.cgi?id=217981

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2083176

Title:
  grub-efi/install_devices becoming stale due to by-id/nvme-eui.*
  symlinks disappearing

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/2083176/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 2064717] Re: ceph-volume needs "packaging" and "ceph" modules

2024-08-28 Thread Trent Lloyd
Note: This issue is more impactful than I initially realised. I was
thinking it was mainly an issue on initial deploy, but if you upgrade
your deployment to 18.2.4 and then reboot a node, the OSDs won't start,
because the ceph-volume tool is needed to activate the OSDs.
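
As a quick check on an affected node (a rough sketch; the proper fix is the
package update, not this):

# ceph-volume needs the python "packaging" module, so this fails on a broken
# install:
python3 -c "import packaging; print(packaging.__version__)"
# The charm's tactical workaround was simply to install the missing dependency:
sudo apt install python3-packaging
sudo ceph-volume lvm list   # should now run instead of tracebacking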

** Changed in: cloud-archive/bobcat
   Importance: Undecided => High

** Changed in: cloud-archive/bobcat
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2064717

Title:
  ceph-volume needs "packaging" and "ceph" modules

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-ceph-osd/+bug/2064717/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 2064717] Re: ceph-volume needs "packaging" and "ceph" modules

2024-08-28 Thread Trent Lloyd
OK, we have now learnt that only upgrading (rather than also doing a fresh
deployment), and only running the ceph-mon tests, is not enough. Agreed,
let's work on a more concrete and complete test plan. I have some strong
thoughts on that, so I will discuss with you and Utkarsh, etc.

Luciano: In the meantime, can you please prioritise a ceph 18.2.4 SRU to fix
this regression (not the charm fix)? Ideally this week. We have customers
actually using this and affected by it.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2064717

Title:
  ceph-volume needs "packaging" and "ceph" modules

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-ceph-osd/+bug/2064717/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 2064717] Re: ceph-volume needs "packaging" and "ceph" modules

2024-08-27 Thread Trent Lloyd
I suspect the reason this was not picked up in the SRU test is possibly
that code from the Squid charm was used in the test instead of the Reef
charm.

The squid charm merged a "tactical fix" to manually install
python3-packaging in this change:
https://review.opendev.org/c/openstack/charm-ceph-osd/+/918992

The same fix was not merged for Reef; it was originally proposed but
abandoned when we thought it wasn't needed there:
https://review.opendev.org/c/openstack/charm-ceph-osd/+/919794

The Squid charm supports running/installing reef because you're expected
to upgrade the charm before Ceph itself, to orchestrate an upgrade. So
both the Reef and Squid charm branches have a test for Reef
(tests/bundles/jammy-bobcat).

IMHO merging this charm change was a bad idea and it should be reverted
once all the packages are fixed. The package should simply have been
fixed immediately in the first instance.

While I can appreciate this might have been done as a stop-gap to get
the charm CI working while the issue was not yet fixed in an SRU, the
problem is that we are using the charm tests to verify the SRU of the
Ubuntu package which is potentially (and actually, even in the cloud-
archive) used by people without the charms, so this is likely to hide
such an issue as it did here. It also means we don't have a functional
test to actually test that the issue is fixed, both in the Reef and
Squid SRUs.

I can't quite figure out exactly how this test was done though. The
original message said it was tested with the ceph-osd charm tests, but
the zaza.openstack.charm_tests.ceph.tests.CephPrometheusTest test listed
in the output only exists in charm-ceph-mon. 

Those tests also all use the reef branch of the charm. I am guessing that,
since we had to test with bobcat-proposed, the squid branch was used with
openstack-origin overridden to bobcat-proposed, or something along those
lines?

Luciano: It would be great if you could clarify/reverse engineer exactly how
you ran this so we can learn for next time. I also wonder if we'd be better
off using charmed-openstack-tester or something like that, instead of purely
the charm-ceph-mon tests, for validating SRUs?

A few possible lessons for future SRU verification:
- We need to ensure we verify SRUs with all git/charmhub branches of the
charm that support a release. Generally that would be both the matching and
the newer version; it's not sufficient to check with only one of those.
- Thinking more about charm users, who are the majority, ideally we also
need to run both the charmhub stable AND candidate channels for both of
those releases. Currently the test bundles use the '/edge' channel (which
maps to candidate), so they only test the candidate charm and won't show a
problem if we're about to release a package that is broken with the stable
charms. Especially for the latest release of Ceph, due to the Solutions QA
process, the stable channel sometimes lags the edge channel by weeks or even
months, so this is not unlikely.
- Using the charm tests to verify the Ubuntu package has some limits in
general, in that it may miss scenarios that would still affect non-charm
users. I am not proposing we stop using it, but we should be aware of that.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2064717

Title:
  ceph-volume needs "packaging" and "ceph" modules

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-ceph-osd/+bug/2064717/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 2064717] Re: ceph-volume needs "packaging" and "ceph" modules

2024-08-26 Thread Trent Lloyd
I discovered this issue myself (for Reef, 18.2.4) today when running the
zaza integration test for charm-glance-simplestreams-sync against jammy-
bobcat.

According to the SRU, the charm-ceph-osd tests were run, and the package
version was verified. The question is, why did those tests not catch
this?

When I run the zaza test for charm-ceph-osd in the stable/reef branch, it also 
fails with the issue. I see the exact same version installed as reported in the 
SRU bug:
juju ssh ceph-osd/0 sudo ceph -v
ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)

So I am really curious to understand why the test passed previously.

** Also affects: cloud-archive
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/bobcat
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2064717

Title:
  ceph-volume needs "packaging" and "ceph" modules

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-ceph-osd/+bug/2064717/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 2062927] Re: Ambiguity in mdns configuration

2024-05-08 Thread Trent Lloyd
It's not possible to correctly run two mDNS stacks at the same time, as
while multicast udp packets can be received by multiple programs, only
one program will receive unicasted port 5353 mDNS replies, even if both
daemons allow multiple-binding to port 5353.

While actually using that feature is not so commonly used intentionally,
it is used sortof by accident by many enterprise wireless network
vendors when they "convert" multicast to unicast as a network
optimisation (because multicast packets are truly multicasted, but at a
"base" network rate much slower than the normal rate of the clients,
which uses up more airtime than sending them all individually at a
higher speed).

Hence, we cannot really enable the independent systemd-resolved support
at the same time as actually using Avahi to do proper service discovery,
and you should use the avahi/nss-mdns support instead if you want any
actual mDNS service discovery support.


Ideally resolved would add a backend to use avahi when it exists/is
installed so we could drop the extra nss-mdns step. But no one has
written that code so far.

But I am not sure why you say you cannot disable the systemd-resolved mDNS
support. It's disabled in resolved by default out of the box, and when
disabled it doesn't bind to the port, so Avahi works fine and nss-mdns works
fine alongside systemd-resolved. Many people use this configuration all the
time.
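
For reference, a quick way to confirm which stack owns mDNS on a given
machine (a rough sketch using the stock tools):

resolvectl status | grep -i mdns    # "-mDNS" in the protocol lines = resolved mDNS off
grep -i '^MulticastDNS' /etc/systemd/resolved.conf    # normally commented out (off by default)
grep '^hosts:' /etc/nsswitch.conf   # "mdns4_minimal [NOTFOUND=return]" = nss-mdns/Avahi handles .local
sudo ss -lunp 'sport = :5353'       # which daemons are bound to the mDNS port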

So I am curious: in what specific scenario and configuration are you seeing
it enabled and hitting the port conflict?

On an out-of-the-box install, if you run "resolvectl status" you'll see
-mDNS on all the interfaces. Can you detail your configuration more
precisely?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2062927

Title:
  Ambiguity in mdns configuration

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/avahi/+bug/2062927/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1811255] Re: perf archive missing

2024-03-04 Thread Trent Lloyd
*** This bug is a duplicate of bug 1823281 ***
https://bugs.launchpad.net/bugs/1823281

** This bug has been marked a duplicate of bug 1823281
   perf-archive is not shipped in the linux-tools package

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1811255

Title:
  perf archive missing

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1811255/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1977669] Re: Metadata broken for SR-IOV external ports

2022-06-05 Thread Trent Lloyd
** Tags added: sts

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1977669

Title:
  Metadata broken for SR-IOV external ports

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1977669/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1977669] Re: Metadata broken for SR-IOV external ports

2022-06-05 Thread Trent Lloyd
** Description changed:

  OpenStack Usurri/OVN SR-IOV instances are unable to connect to the
  metadata service despite DHCP and normal traffic work.
  
  The 169.254.169.254 metadata route is directed at the DHCP port IP, and
  no arp reply is received by the VM for this IP. Diagnosis finds that the
  ARP reply returns from the ovnmeta namespace on the chassis hosting the
  external port but is dropped inside OVS.
  
  20.03.2-0ubuntu0.20.04.2 backported the following patch:
  Do not forward traffic from localport to localnet ports (LP: #1943266)
  (d/p/lp-1943266-physical-do-not-forward-traffic-from-localport-to-a-.patch)
  
- This patch broke metadata for SR-IOV external prots and was fixed in 
1148580290d0ace803f20aeaa0241dd51c100630 "Don't suppress localport traffic 
directed to external port":
+ This patch broke metadata for SR-IOV external ports and was fixed in 
1148580290d0ace803f20aeaa0241dd51c100630 "Don't suppress localport traffic 
directed to external port":
  https://github.com/ovn-org/ovn/commit/1148580290d0ace803f20aeaa0241dd51c100630

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1977669

Title:
  Metadata broken for SR-IOV external ports

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1977669/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1977669] [NEW] Metadata broken for SR-IOV external ports

2022-06-04 Thread Trent Lloyd
Public bug reported:

OpenStack Ussuri/OVN SR-IOV instances are unable to connect to the
metadata service despite DHCP and normal traffic working.

The 169.254.169.254 metadata route is directed at the DHCP port IP, and
no arp reply is received by the VM for this IP. Diagnosis finds that the
ARP reply returns from the ovnmeta namespace on the chassis hosting the
external port but is dropped inside OVS.

20.03.2-0ubuntu0.20.04.2 backported the following patch:
Do not forward traffic from localport to localnet ports (LP: #1943266)
(d/p/lp-1943266-physical-do-not-forward-traffic-from-localport-to-a-.patch)

This patch broke metadata for SR-IOV external ports and was fixed in 
1148580290d0ace803f20aeaa0241dd51c100630 "Don't suppress localport traffic 
directed to external port":
https://github.com/ovn-org/ovn/commit/1148580290d0ace803f20aeaa0241dd51c100630

** Affects: ovn (Ubuntu)
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1977669

Title:
  Metadata broken for SR-IOV external ports

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1977669/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1970453] Re: DMAR: ERROR: DMA PTE for vPFN 0x7bf32 already set

2022-05-11 Thread Trent Lloyd
With regards to the patch here:
https://lists.linuxfoundation.org/pipermail/iommu/2021-October/060115.html

It is mentioned that this issue can occur if you are passing through a PCI
device to a virtual machine guest. That patch seems never to have made it
into the kernel. So I am curious whether you are using any virtual machines
on this host, and whether any of them are passing PCI devices through from
the host.
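
If libvirt is in use, something like this would show it (a rough sketch; the
domain name is an example):

virsh list --all
virsh dumpxml some-guest | grep -A4 hostdev
# Any <hostdev mode='subsystem' type='pci'> entries mean a PCI device from
# the host is being passed through to that guest.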

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1970453

Title:
  DMAR: ERROR: DMA PTE for vPFN 0x7bf32 already set

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1970453/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1964445] [NEW] Incorrectly identifies processes inside LXD container on jammy/cgroupsv2

2022-03-09 Thread Trent Lloyd
Public bug reported:

Processes inside of LXD containers are incorrectly identified as needing
a restart on jammy. The cause is that needrestart does not correctly
parse cgroups v2.

Since needrestart is installed in a default install, this is problematic:
it prompts you to restart, and then actually restarts, the host's version of
a container's processes unnecessarily.

I have sent an upstream pull request to fix this here, it's a simple fix to the 
regex:
https://github.com/liske/needrestart/pull/238

Upstream also already has a fix to the same for Docker:
https://github.com/liske/needrestart/pull/234

We should patch both of these into Jammy before release. I can also send
this patch to Debian; however, as Debian does not currently use cgroups v2
by default, it is not directly affected in a default configuration (but
would be if you enable them). Since we are also close to release, this may
need to be expedited.


= Test Case =

- Install Jammy Server with needrestart installed (the server ISO installs it
by default; cloud/VM/LXD images do not)
- Launch an LXD focal container
- (slightly harder) Inside the focal container, upgrade a commonly used
library such as libc6. To do this you may need to first downgrade libc6,
restart avahi-daemon, then upgrade it again.
- Run "needrestart" on the host and see that the container's avahi-daemon is
flagged for restart (but it will restart the host's process, and the next
invocation will prompt to restart again)
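
For reference, the misattribution can be seen directly from the cgroup paths
(a rough sketch; the PID and container name are examples, and the exact
layout depends on the LXD version):

# A process running inside an LXD container carries an lxc.payload.<name>
# component in its cgroup v2 path, e.g.:
cat /proc/<pid-of-container-process>/cgroup
# 0::/lxc.payload.focal-test/system.slice/avahi-daemon.service
# needrestart's regex needs to treat such processes as container-owned rather
# than matching them against the host's avahi-daemon.service.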

** Affects: needrestart (Ubuntu)
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1964445

Title:
  Incorrectly identifies processes inside LXD container on
  jammy/cgroupsv2

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/needrestart/+bug/1964445/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1958148] Re: mkinitramfs is too slow

2022-02-28 Thread Trent Lloyd
Where is the discussion happening?

I ran the same benchmarks for my i7-6770HQ 4-core system. This really
needs revising.

While disk space usage in /boot is a concern, in this example at least,
-10 would use only 8MB (about 10%) more space and cut the time taken from
2m1s to 13s.

zstd.0 84M 0m2.150s
zstd.1 96M 0m1.236s
zstd.2 90M 0m1.350s
zstd.3 84M 0m2.235s
zstd.4 84M 0m3.355s
zstd.5 81M 0m5.679s
zstd.6 81M 0m7.416s
zstd.7 78M 0m8.857s
zstd.8 77M 0m10.134s
zstd.9 77M 0m11.238s
zstd.10 72M 0m13.232s
zstd.11 72M 0m14.897s
zstd.12 72M 0m19.343s
zstd.13 72M 0m26.327s
zstd.14 72M 0m30.948s
zstd.15 72M 0m40.913s
zstd.16 70M 0m59.517s
zstd.17 66M 1m15.854s
zstd.18 64M 1m36.227s
zstd.19 64M 2m1.417s
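
For reference, a minimal sketch of how numbers like these can be generated
(assuming an uncompressed initramfs cpio archive at ./initrd.cpio; file
names are illustrative):

for level in $(seq 1 19); do
    /usr/bin/time -f "zstd.$level  %es" \
        zstd -q -f -$level -T0 -o initrd.img.zstd.$level initrd.cpio
    ls -lh initrd.img.zstd.$level
done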

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1958148

Title:
  mkinitramfs is too slow

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/1958148/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1906476] Re: PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed

2022-01-12 Thread Trent Lloyd
Re-installing from scratch should resolve the issue. I suspect that in most
cases, if you install with the 21.10 installer (even though it has the old
kernel) and install updates during the install, this issue probably won't
hit you. It mostly seems to occur after a reboot, when data is being loaded
back from disk again.

As per some of the other comments, you'll have a bit of a hard time copying
data off the old broken install: you need to work out which files/folders
are corrupt, reboot, and then exclude those from the next rsync.
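
Something along these lines works for the incremental copy (a rough sketch;
paths are examples):

# Copy everything except the directory holding known-bad files; re-run after
# each reboot, adding newly found bad paths to the exclude list:
rsync -aHAX --progress --exclude=/broken/ /mnt/oldpool/ /mnt/newdisk/
# When rsync (or any reader) hangs, check dmesg for the PANIC, note the file,
# move it into /broken on the old filesystem, reboot, and run rsync again.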


You could use the 22.04 daily build; it will eventually upgrade into the
final release. However, that's not usually recommended, as there may be bugs
or other problems in the daily images, and it's not uncommon for the
development release to break at some point during the development cycle.
Most of the time it works, but breakage is much more likely than with 21.10.

I'd try a re-install with 21.10 as I described. Obviously you'll need to
back up all of your data from the existing install first.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1906476

Title:
  PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 ==
  sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED,
  &zp->z_sa_hdl)) failed

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1906476/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1077796] Re: /bin/kill no longer works with negative PID

2021-12-16 Thread Trent Lloyd
Most shells (including bash and zsh) have a built-in for kill, so it's
handled internally. Some shells don't, so they execute /bin/kill instead,
which has this issue.

One comment noted this was fixed at some point in 2013 in version 3.3.4
but it apparently broke again at some point and is broken at least in
20.04 Focal's v3.3.16.
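
A quick way to see which one you are invoking and to reproduce the
difference (the process group ID here is just an example):

type -a kill                 # bash reports "kill is a shell builtin" before /usr/bin/kill
kill -TERM -- -12345         # shell builtin: accepts a negative PID (process group)
/bin/kill -TERM -- -12345    # procps kill: rejects the negative PID on affected versions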

This was recently fixed again upstream here:
https://gitlab.com/procps-ng/procps/-/merge_requests/77

Upstream v3.3.16 (in 20.04 Focal and 21.04 Hirsute) was released in Dec
2019 without this fix. The fix was submitted upstream 3 years ago but only
merged 11 months ago, and was included in the v3.3.17 release made in Feb
2021, so it is not in 20.04 Focal. 3.3.17 with the fix is already in 21.10
Impish.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1077796

Title:
  /bin/kill no longer works with negative PID

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/procps/+bug/1077796/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1952496] Re: ubuntu 20.04 LTS network problem

2021-11-29 Thread Trent Lloyd
Thanks for the data. I can see you queried 'steven-ubuntu.local' and
that looks like the hostname of the local machine. Can you also query
the hostname of the AFP server you are trying to connect to (using both
getent hosts and avahi-resolve-host-name).

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1952496

Title:
  ubuntu 20.04 LTS network problem

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/avahi/+bug/1952496/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1952496] Re: ubuntu 20.04 LTS network problem

2021-11-28 Thread Trent Lloyd
As a side note, it may be time to switch to a different protocol, as even
Apple has dropped support for sharing over AFP in its last few releases and
is deprecating its usage. You can use Samba to serve SMB, including the
extra Apple-specific bits if you need Time Machine support etc. on your NAS.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1952496

Title:
  ubuntu 20.04 LTS network problem

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/avahi/+bug/1952496/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1952496] Re: ubuntu 20.04 LTS network problem

2021-11-28 Thread Trent Lloyd
To assist with this can you get the following outputs from the broken
system:

# Change 'hostname.local' to the hostname expected to work

cat /etc/nsswitch.conf

systemctl status avahi-daemon

journalctl -u avahi-daemon --boot

avahi-resolve-host-name hostname.local

getent hosts hostname.local

** Changed in: avahi (Ubuntu)
   Status: New => Incomplete

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1952496

Title:
  ubuntu 20.04 LTS network problem

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/avahi/+bug/1952496/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1339518] Re: sudo config file specifies group "admin" that doesn't exist in system

2021-11-17 Thread Trent Lloyd
Subscribing Marc as he seems to be largely maintaining this and made the
original changes and has been keeping the delta. Hopefully he can
provide some insight.

It seems this is a delta against Debian that has been kept intentionally
for a long time; it's noted in the changelog even in the most recent Debian
merge.

I'd have thought that if we kept this in by default we probably should have
also shipped a default 'admin' group with no members, but it's a bit late
for that at this point.

- debian/sudoers:
 + also grant admin group sudo access

Also seems this change was originally made in 2014:

sudo (1.8.9p5-1ubuntu3) vivid; urgency=medium

  * debian/patches/also_check_sudo_group.diff: also check the sudo group
in plugins/sudoers/sudoers.c to create the admin flag file. Leave the
admin group check for backwards compatibility. (LP: #1387347)

 -- Marc Deslauriers   Wed, 29 Oct 2014
15:55:34 -0400

sudo (1.8.9p5-1ubuntu2) utopic; urgency=medium

  * debian/sudo_root.8: mention sudo group instead of deprecated group
admin (LP: #1130643)

 -- Andrey Bondarenko   Sat, 23 Aug
2014 01:18:05 +0600

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1339518

Title:
  sudo config file specifies group "admin" that doesn't exist in system

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/sudo/+bug/1339518/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1339518] Re: sudo config file specifies group "admin" that doesn't exist in system

2021-11-17 Thread Trent Lloyd
Just noticed this today; it's still the same on Ubuntu 20.04. The default
sudoers file grants the admin group sudo privileges, but the group doesn't
exist by default.

While it doesn't have out-of-the-box security implications, I think this is
a security concern: someone could add an 'admin' user, with a matching
'admin' group created for them by default, and not expect that user to get
sudo access.

For example, downstream products like web hosting or control-panel style
tools that create users with a user-provided name: since neither the user
nor the group 'admin' exists by default, they could be tricked into creating
a privilege escalation path.
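
For reference, a quick way to see this on a stock install (a rough sketch;
the username is an example):

grep -E '^%(admin|sudo)' /etc/sudoers   # the %admin rule ships even though the group doesn't exist
getent group admin                      # no output on a default install
# Creating a user named "admin" also creates a matching "admin" group, which
# then satisfies the %admin sudoers rule:
sudo adduser --disabled-password --gecos "" admin
sudo -l -U admin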

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1339518

Title:
  sudo config file specifies group "admin" that doesn't exist in system

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/sudo/+bug/1339518/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1931660] Re: PANIC at zfs_znode.c:339:zfs_znode_sa_init()

2021-10-15 Thread Trent Lloyd
This looks like a duplicate of this:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1906476

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1931660

Title:
  PANIC at zfs_znode.c:339:zfs_znode_sa_init()

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1931660/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1906476] Re: PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed

2021-09-27 Thread Trent Lloyd
In a related way, say you wanted to recover a system from a boot disk and
copy all the data off to another disk. If you use a sequential file copy
such as tar/cp in verbose mode and watch it, eventually it will hang on the
file triggering the issue (watch dmesg/kern.log). Once that happens, move
that file into a directory like /broken which you exclude from tar/cp,
reboot to get back into a working state, then start the copy again. That's
basically what I did incrementally to find all the broken files. Fortunately
for me they were mostly inside Chrome or Electron app dirs.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1906476

Title:
  PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 ==
  sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED,
  &zp->z_sa_hdl)) failed

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1906476/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1906476] Re: PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed

2021-09-27 Thread Trent Lloyd
So to be clear, this patch revert stops the issue from occurring anew, but
if it has already happened on your filesystem it will continue to occur,
because the exception is reporting corruption on disk. I don't currently
have a good fix for this other than moving the affected files to a directory
you don't use (though it's sometimes tricky to figure out which files are
the cause).

For the hanging dkms status, you could try checking "ls -la
/proc/$(pidof dkms)/fd" to see what file it has open, or strace it, to work
out what file it is on when it hangs. Then move that file or directory out
of the way and replace it.
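
Something like this (the PID lookup is illustrative):

sudo ls -la /proc/$(pidof dkms)/fd
sudo strace -f -p $(pidof dkms) -e trace=openat,read -s 128
# The last path seen before the hang is usually the corrupted file/directory.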

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1906476

Title:
  PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 ==
  sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED,
  &zp->z_sa_hdl)) failed

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1906476/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1906476] Re: PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed

2021-09-26 Thread Trent Lloyd
I have created a 100% reliable reproducer test case and also determined
that the Ubuntu-specific patch 4701-enable-ARC-FILL-LOCKED-flag.patch, added
to fix Bug #1900889, is likely the cause.

[Test Case]

The important parts are:
- Use encryption
- rsync the zfs git tree
- Use parallel I/O from silversearcher-ag to access it after a reboot. A simple 
"find ." or "find . -exec cat {} > /dev/null \;" does not reproduce the issue.

Reproduction done using a libvirt VM installed from the Ubuntu Impish
daily livecd using a normal ext4 root but with a second 4GB /dev/vdb
disk for zfs later

= Preparation
apt install silversearcher-ag git zfs-dkms zfsutils-linux
echo -n testkey2 > /root/testkey
git clone https://github.com/openzfs/zfs /root/zfs

= Test Execution
zpool create test /dev/vdb
zfs create test/test -o encryption=on -o keyformat=passphrase -o 
keylocation=file:///root/testkey
rsync -va --progress -HAX /root/zfs/ /test/test/zfs/

# If you access the data now it works fine.
reboot

zfs load-key test/test
zfs mount -a
cd /test/test/zfs/
ag DISKS= 

= Test Result
ag hangs, "sudo dmesg" shows an exception

[Analysis]
I rebuilt the zfs-linux 2.0.6-1ubuntu1 package from ppa:colin-king/zfs-impish
without the Ubuntu-specific patch ubuntu/4701-enable-ARC-FILL-LOCKED-flag.patch
which fixed Bug #1900889. With this patch disabled the issue does not
reproduce; re-enabling the patch, it reproduces reliably every time.

It seems this patch was never sent upstream. No upstream code changes setting
the ARC_FILL_IN_PLACE flag appear to have been added since, as far as I can
see. Interestingly, the code for the ARC_FILL_IN_PLACE handling was added to
fix a similar sounding issue, "Raw receive fix and encrypted objset security
fix", in
https://github.com/openzfs/zfs/commit/69830602de2d836013a91bd42cc8d36bbebb3aae
That first shipped in zfs 0.8.0, and the original bug was filed against 0.8.3.

I also have found the same issue as the original Launchpad bug reported 
upstream without any fixes and a lot of discussion (and quite a few duplicates 
linking back to 11679):
https://github.com/openzfs/zfs/issues/11679
https://github.com/openzfs/zfs/issues/12014

Without fully understanding the ZFS code in relation to this flag, the
code at
https://github.com/openzfs/zfs/blob/ce2bdcedf549b2d83ae9df23a3fa0188b33327b7/module/zfs/arc.c#L2026
describes the flag as being about decrypting blocks in the ARC and doing so
'in place'. It therefore makes some sense that I need encryption to
reproduce it, that it reproduces best after a reboot (which flushes the
ARC), and that I can still read the data in the test case before rebooting,
after which it fails.

This patch was added in 0.8.4-1ubuntu15 and I first experienced the
issue somewhere between 0.8.4-1ubuntu11 and 0.8.4-1ubuntu16.

So it all adds up and I suggest that this patch should be reverted.

** Bug watch added: github.com/openzfs/zfs/issues #11679
   https://github.com/openzfs/zfs/issues/11679

** Bug watch added: github.com/openzfs/zfs/issues #12014
   https://github.com/openzfs/zfs/issues/12014

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1906476

Title:
  PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 ==
  sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED,
  &zp->z_sa_hdl)) failed

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1906476/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1906476] Re: PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed

2021-09-26 Thread Trent Lloyd
While trying to set up a reproducer that would exercise Chrome or Wine or
something similar, I stumbled across the following reproducer, which worked
twice in a row in a libvirt VM on my machine today.

The general gist is to
(1) Create a zfs filesystem with "-o encryption=aes-256-gcm -o compression=zstd 
-o atime=off -o keyformat=passphrase"
(2) rsync a copy of the openzfs git tree into it
(3) Reboot
(4) Use silversearcher-ag to search the directory for "DISKS="

Precise steps:
mkdir src
cd src
git clone https://github.com/openzfs/zfs
sudo apt install zfsutils-linux zfs-initramfs
sudo zpool create tank /dev/vdb
sudo zfs create tank/lathiat2 -o encryption=aes-256-gcm -o compression=zstd  -o 
atime=off  -o keyformat=passphrase
rsync -va --progress -HAX /etc/skel /tank/lathiat2/; chown -R lathiat:lathiat 
/tank/lathiat2; rsync -va --progress /home/lathiat/src/ /tank/lathiat2/src/; 
chown -R lathiat:lathiat /tank/lathiat2/src/
# reboot
sudo zfs load-key tank/lathiat2
sudo zfs mount -a
cd /tank/lathiat2/src/zfs/
ag DISKS=

Hit on the exact same crash:
[   61.377929] VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, 
SA_HDL_SHARED, &zp->z_sa_hdl)) failed
[   61.377930] PANIC at zfs_znode.c:339:zfs_znode_sa_init()

Now I will test this out on the beta 2.0.6 package, and also see whether
the standard zfs test suite triggers it, mostly out of curiosity.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1906476

Title:
  PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 ==
  sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED,
  &zp->z_sa_hdl)) failed

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1906476/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1906476] Re: PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed

2021-09-23 Thread Trent Lloyd
There are 34 more user reports on the upstream bug from people hitting it on
Ubuntu 5.13.0:
https://github.com/openzfs/zfs/issues/10971

I think this needs some priority. It doesn't seem to be hitting upstream;
for some reason it is only really hitting on Ubuntu.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1906476

Title:
  PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 ==
  sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED,
  &zp->z_sa_hdl)) failed

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1906476/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1906476] Re: PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed

2021-09-05 Thread Trent Lloyd
@Colin To be clear, this is the same bug I originally hit and opened this
Launchpad bug for; it just doesn't quite match what most people saw in the
upstream bugs. But it seemed to get fixed anyway for a while, and has
regressed again somehow.

Same exception as from the original description and others reporting:
2021 May 16 21:19:09 laptop VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, 
zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed

The upstream bug mostly reported slightly different errors though
similar symptoms (files get stuck and can't be accessed).

I also tried to use 'zdb' to check whether incorrect file modes were saved;
unfortunately it seems zdb does not work for encrypted datasets. It only
dumps the unencrypted block info and doesn't dump info about file modes etc.
from the encrypted part, so I can't check that.

I've reverted back to 5.11.0-25 for now and it's stable again.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1906476

Title:
  PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 ==
  sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED,
  &zp->z_sa_hdl)) failed

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1906476/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1783184] Re: neutron-ovs-cleanup can have unintended side effects

2021-08-31 Thread Trent Lloyd
There is a systemd option that I think will solve this issue.

https://www.freedesktop.org/software/systemd/man/systemd.unit.html#RefuseManualStart=

RefuseManualStart=, RefuseManualStop=
Takes a boolean argument. If true, this unit can only be activated or 
deactivated indirectly. In this case, explicit start-up or termination 
requested by the user is denied, however if it is started or stopped as a 
dependency of another unit, start-up or termination will succeed. This is 
mostly a safety feature to ensure that the user does not accidentally activate 
units that are not intended to be activated explicitly, and not accidentally 
deactivate units that are not intended to be deactivated. These options default 
to false.

As far as I am aware there is rarely/never a good reason to run this 
intentionally. If someone *really* wants to run it, the command is somewhat 
straightforward to run directly:
ExecStart=/usr/bin/neutron-ovs-cleanup --config-file 
/usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf 
--config-file /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini 
--log-file /var/log/neutron/ovs-cleanup.log

There are 2 such services:
neutron-ovs-cleanup.service
neutron-linuxbridge-cleanup.service
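
A minimal sketch of what that could look like as a drop-in (untested; the
paths follow the usual systemd conventions and are not a packaged change):

# /etc/systemd/system/neutron-ovs-cleanup.service.d/override.conf
# (and the equivalent for neutron-linuxbridge-cleanup.service)
[Unit]
RefuseManualStart=yes
RefuseManualStop=yes
# then: sudo systemctl daemon-reload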

See also:
https://bugs.launchpad.net/ubuntu/+source/neutron/+bug/1885264
(recent work to stop it being run on package upgrade by accident)

And while we're at it, Red Hat had a bug where the cleanup script could
take 1-2 minutes on some busy/large hosts, and they added "TimeoutSec=0" to
avoid issues related to that.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1783184

Title:
  neutron-ovs-cleanup can have unintended side effects

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/neutron/+bug/1783184/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1892242] Re: Curtin doesn't handle type:mount entries without 'path' element

2021-08-27 Thread Trent Lloyd
In terms of understanding when this was fixed for which users/versions:
assuming that MAAS copies the curtin version from the server to the
deployed client, which I think is the case, you need to get an updated
curtin onto the MAAS server.

The bug was fix released into curtin 20.1-20-g1304d3ea-0ubuntu1 in
August 2020.

Curtin has not been updated in bionic itself since May 2020
(20.1-2-g42a9667f-0ubuntu1~18.04.1). So no fix there.

MAAS 2.7 PPA (https://launchpad.net/~maas/+archive/ubuntu/2.7) - No fix

MAAS 2.8 PPA (https://launchpad.net/~maas/+archive/ubuntu/2.8) - Fixed
in 21.2-0ubuntu1~18.04.1 uploaded 1st March 2021 - first and only curtin
upload

MAAS 2.9 PPA (https://launchpad.net/~maas/+archive/ubuntu/2.9) - FIxed
in 21.2-0ubuntu1~20.04.1 uploaded 16th February 2021 - first and only
curtin upload

MAAS 2.8 was released 24 June 2020
MAAS 2.9 was released December 2020
MAAS 3.0 was released 6 July 2021 [Note: only supports 20.04]


So it seems there was a gap from August 2020 to December 2020 where the fix
possibly wasn't available to MAAS users at all, and then until March 2021 it
wasn't available to MAAS 2.8 users.

However, I don't know which versions of MAAS, if any, consume curtin as a
snap, nor whether the above applies to both deb and snap installations of
those versions.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1892242

Title:
  Curtin doesn't handle type:mount entries without 'path' element

To manage notifications about this bug go to:
https://bugs.launchpad.net/curtin/+bug/1892242/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1906476] Re: PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed

2021-08-24 Thread Trent Lloyd
I traced the call failure. I found the failing code is in
sa.c:1291#sa_build_index()

if (BSWAP_32(sa_hdr_phys->sa_magic) != SA_MAGIC) {

This code prints debug info to /proc/spl/kstat/zfs/dbgmsg, which for me is:
1629791353   sa.c:1293:sa_build_index(): Buffer Header: cb872954 != 
SA_MAGIC:2f505a object=0x45175e

So in this case it seems the data is somehow corrupted, since this is
supposed to be a magic value that is always correct and never changes. It's
not entirely clear how this plays into the original bug, so it may be that
this is really a different bug. Hrm.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1906476

Title:
  PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 ==
  sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED,
  &zp->z_sa_hdl)) failed

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1906476/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1906476] Re: PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed

2021-08-23 Thread Trent Lloyd
This has reappeared for me today after upgrading to 5.13.0-14 on Impish.
Same call stack, and the same Chrome-based applications (Mattermost was hit
first) are affected.

Not currently running DKMS, so:

Today:
5.13.0-14-lowlat Tue Aug 24 10:59   still running (zfs module is 2.0.3-8ubuntu6)

Yesterday:
5.11.0-25-lowlat Mon Aug 23 12:52 - 08:05  (19:13) (zfs module is 
2.0.2-1ubuntu5)

I am a bit confused because the patched line "newmode = zp->z_mode;"
still seems present in the package.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1906476

Title:
  PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 ==
  sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED,
  &zp->z_sa_hdl)) failed

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1906476/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1906476] Re: PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed

2021-08-11 Thread Trent Lloyd
Try the zfs_recover step from Colin's comment above, then look for invalid
files and try to move them out of the way.
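
For reference, the module parameter can be set like this (a rough sketch;
whether it takes effect at runtime or only at module load may vary):

echo 1 | sudo tee /sys/module/zfs/parameters/zfs_recover
# or persistently, so it applies when the module loads at boot:
echo "options zfs zfs_recover=1" | sudo tee /etc/modprobe.d/zfs-recover.conf
sudo update-initramfs -u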

I'm not aware of encrypted pools being specifically implicated (there's no
such mention in the bug and it doesn't seem like it); having said that, I am
using encryption on the dataset where I was hitting this.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1906476

Title:
  PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 ==
  sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED,
  &zp->z_sa_hdl)) failed

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1906476/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1827264] Re: ovs-vswitchd thread consuming 100% CPU

2021-06-27 Thread Trent Lloyd
It seems there is a good chance that at least some of the people commenting
on or affected by this bug are actually hitting a duplicate of Bug #1839592:
essentially a libc6 bug that meant threads weren't woken up when they should
have been. It was fixed by the libc6 upgrade to 2.27-3ubuntu1.3 in bionic.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1827264

Title:
  ovs-vswitchd thread consuming 100% CPU

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1827264/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1906476] Re: PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed

2021-05-20 Thread Trent Lloyd
Are you confident that the issue is a new occurrence? As best I can tell,
the corruption can occur and then still appear on a fixed system if it is
reading corruption created in the past, which unfortunately scrub doesn't
seem to detect.

I've still had no recurrence here after a few weeks on Hirsute with
2.0.2-1ubuntu5 (which includes the
https://github.com/openzfs/zfs/issues/11474 fix), but that was from a fresh
install.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1906476

Title:
  PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 ==
  sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED,
  &zp->z_sa_hdl)) failed

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1906476/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1920640] Re: EXPKEYSIG C8CAB6595FDFF622 Ubuntu Debug Symbol Archive Automatic Signing Key (2016)

2021-03-21 Thread Trent Lloyd
** Changed in: ubuntu-keyring (Ubuntu)
   Importance: Undecided => Critical

** Changed in: ubuntu-keyring (Ubuntu)
   Importance: Critical => High

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1920640

Title:
  EXPKEYSIG C8CAB6595FDFF622 Ubuntu Debug Symbol Archive Automatic
  Signing Key (2016) 

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ubuntu-keyring/+bug/1920640/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1920640] Re: EXPKEYSIG C8CAB6595FDFF622 Ubuntu Debug Symbol Archive Automatic Signing Key (2016)

2021-03-21 Thread Trent Lloyd
Updated the following wiki pages:
https://wiki.ubuntu.com/Debug%20Symbol%20Packages
https://wiki.ubuntu.com/DebuggingProgramCrash

With the note:
Note: The GPG key expired on 2021-03-21 and may need updating by either 
upgrading the ubuntu-dbgsym-keyring package or re-running the apt-key command. 
Please see Bug #1920640 for workaround details if that does not work.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1920640

Title:
  EXPKEYSIG C8CAB6595FDFF622 Ubuntu Debug Symbol Archive Automatic
  Signing Key (2016) 

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ubuntu-keyring/+bug/1920640/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1920640] Re: EXPKEYSIG C8CAB6595FDFF622 Ubuntu Debug Symbol Archive Automatic Signing Key (2016)

2021-03-21 Thread Trent Lloyd
Just to make the current status clear from what I can gather:

- The GPG key was extended by 1 year to 2022-03-21

- On Ubuntu Bionic (18.04) and newer, the GPG key is normally installed
by the ubuntu-dbgsym-keyring package. That package is not yet updated; an
update to it is required and still pending.

- On Ubuntu Xenial (16.04) users typically imported the key from
keyserver.ubuntu.com. As that is not yet updated, you will need to
import the key over HTTP using the workaround below, which works as a
temporary workaround on all Ubuntu releases. Once keyserver.ubuntu.com
is updated, you could also use "sudo apt-key adv --keyserver
keyserver.ubuntu.com --recv-keys
F2EDC64DC5AEE1F6B9C621F0C8CAB6595FDFF622"

- The updated GPG key is not currently published to keyserver.ubuntu.com

- The updated GPG key is available at http://ddebs.ubuntu.com/dbgsym-
release-key.asc

- As a workaround you can import that key to apt using "wget -O -
http://ddebs.ubuntu.com/dbgsym-release-key.asc | sudo apt-key add -"
(note: you need a space between the -O and -, contrary to the previously
pasted comment)

- I believe that the key likely needs to be extended longer and
published to all resources including the ubuntu-dbgsym-keyring package
and keyserver.ubuntu.com

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1920640

Title:
  EXPKEYSIG C8CAB6595FDFF622 Ubuntu Debug Symbol Archive Automatic
  Signing Key (2016) 

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ubuntu-keyring/+bug/1920640/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1906476] Re: PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed

2021-03-20 Thread Trent Lloyd
I got another couple of days out of it without issue - so I think it's
likely fixed.

This issue looks very similar to the following upstream bugs: same behaviour
but a different error, so I wonder if it was ultimately the same bug. It
looks like this patch from 2.0.3 was pulled into the package?
https://github.com/openzfs/zfs/issues/11621
https://github.com/openzfs/zfs/issues/11474
https://github.com/openzfs/zfs/pull/11576

Further testing has been hampered because zsys deleted all of my home
datasets entirely (including all snapshots) - tracked in
https://github.com/ubuntu/zsys/issues/196 - and I am using a non-ZFS boot
until I finish recovering from that. It still seems likely fixed though, as
I was hitting it most days before.


** Bug watch added: github.com/openzfs/zfs/issues #11621
   https://github.com/openzfs/zfs/issues/11621

** Bug watch added: github.com/openzfs/zfs/issues #11474
   https://github.com/openzfs/zfs/issues/11474

** Bug watch added: github.com/ubuntu/zsys/issues #196
   https://github.com/ubuntu/zsys/issues/196

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1906476

Title:
  PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 ==
  sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED,
  &zp->z_sa_hdl)) failed

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1906476/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1869808] Re: reboot neutron-ovs-agent introduces a short interrupt of vlan traffic

2021-03-19 Thread Trent Lloyd
I have specifically verified that this bug (VLAN traffic interruption
during restart when rabbitmq is down) is fixed by the package in
bionic-proposed. I followed my reproduction steps per the Test Case: all
traffic to instances stops on 12.1.1-0ubuntu3 and does not stop on
12.1.1-0ubuntu4.

I am not marking verification complete yet, as we need to perform more
general regression testing on the package.
-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1869808

Title:
  reboot neutron-ovs-agent introduces a short interrupt of vlan traffic

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1869808/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1894843] Re: [dvr_snat] Router update deletes rfp interface from qrouter even when VM port is present on this host

2021-03-16 Thread Trent Lloyd
When using DVR-SNAT, a simple neutron-l3-agent gateway restart triggers
this issue.

Reproduction Note: Nodes with an ACTIVE or BACKUP (in the case of L3HA)
router for the network are not affected by this issue, so a small 1-6
node environment may make this difficult to reproduce or only affect
half of the nodes (e.g. 3/6 nodes if you have L3HA).

Workaround: for each compute node, you need to create a new VM on each
affected network (see the sketch below). Registering the new VM port causes
the missing fpr/rfp interface pair to be created and paired. It does not
seem possible to fix it any other way, such as stopping/starting the
existing VM, rebooting the host, etc.
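
A rough sketch of that workaround (the image, flavor and network names are
placeholders, and the zone:host form of --availability-zone assumes admin
credentials - adjust for the actual environment):

  # run once per affected compute node, for each affected network
  openstack server create --image <image> --flavor <small-flavor> \
      --network <affected-network> \
      --availability-zone nova:<compute-host> \
      probe-<compute-host>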

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1894843

Title:
  [dvr_snat] Router update deletes rfp interface from qrouter even when
  VM port is present on this host

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1894843/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1869808] Re: reboot neutron-ovs-agent introduces a short interrupt of vlan traffic

2021-03-15 Thread Trent Lloyd
Looking to get this approved so that we can verify it, as we ideally need
this released by the weekend of March 27th for some maintenance activity.
Is something holding back the approval?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1869808

Title:
  reboot neutron-ovs-agent introduces a short interrupt of vlan traffic

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1869808/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1906476] Re: PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed

2021-03-08 Thread Trent Lloyd
It's worth noting that, as best I can understand, the patches won't fix an
already broken filesystem. You have to remove all of the affected files,
and it's difficult to know exactly which files are affected. I try to guess
based on which files show a ??? mark in "ls -la", but sometimes the "ls"
itself hangs.

I've been running zfs-dkms 2.0.2-1ubuntu2 for 24 hours now and so far so
good. I won't call it conclusive, but I'm hoping this has solved it. Though
I am thoroughly confused as to which patch solved it; nothing *seems*
relevant, which is frustrating.

I will try to update in a few days as to whether it has definitely not hit
again; most of the time I hit it within a day, but not strictly 100% of the
time.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1906476

Title:
  PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 ==
  sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED,
  &zp->z_sa_hdl)) failed

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1906476/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1916708] Re: udpif_revalidator crash in ofpbuf_resize__

2021-02-23 Thread Trent Lloyd
E-mailed upstream for assistance:
https://mail.openvswitch.org/pipermail/ovs-discuss/2021-February/050963.html


** Tags added: sts

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1916708

Title:
  udpif_revalidator crash in ofpbuf_resize__

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1916708/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1916708] [NEW] udpif_revalidator crash in ofpbuf_resize__

2021-02-23 Thread Trent Lloyd
Public bug reported:

The udpif_revalidator thread crashed in ofpbuf_resize__ on openvswitch
2.9.2-0ubuntu0.18.04.3~cloud0 (on 16.04 from the xenial-queens cloud
archive, backported from the 18.04 release of the same version). Kernel
version was 4.4.0-159-generic.

The issue is suspected to still exist in upstream master as of Feb
2021/v2.15.0 but has not been completely understood. Opening this bug to
track future occurrences.

The general issue appears to be that the udpif_revalidator thread tried to
expand a stack-allocated ofpbuf to fit a netlink reply of size 3204, but
the buffer is only 2048 bytes. This intentionally raises an assertion, as
we can't expand memory that lives on the stack.

The crash in ofpbuf_resize__ appears to be due to OVS_NOT_REACHED() being
called because b->source == OFPBUF_STACK (the line number indicates it's
the default: case, but this appears to be an optimiser quirk; b->source is
OFPBUF_STACK). We can't realloc() the buffer memory if it's allocated on
the stack.

This buffer is provided in #7 nl_sock_transact_multiple__ during the call
to nl_sock_recv__, specified as buf_txn->reply. In this specific case it
seems we found transactions[0] available and so we used that rather than
tmp_txn.
The original source of transactions (it's passed through most of the
function calls) appears to be op_auxdata allocated on the stack at the top
of the dpif_netlink_operate__ function (dpif-netlink.c:1875).

The size of this particular message was 3204, so 2048 went into the buffer
and 1156 went into the tail iovec set up inside nl_sock_recv__, which then
tried to expand the ofpbuf to hold it. Various nl_sock_* functions have
comments about the buffer ideally being the right size for optimal
performance (I guess to avoid the reallocation), but it seems like a
possible oversight in the dpif_netlink_operate__ workflow that the
nl_sock_* functions may ultimately want to try to expand that buffer and
then fail because of the stack allocation.

The relevant source tree can be found here:
git clone -b applied/2.9.2-0ubuntu0.18.04.3
https://git.launchpad.net/ubuntu/+source/openvswitch
https://git.launchpad.net/ubuntu/+source/openvswitch/tree/?h=applied/2.9.2-0ubuntu0.18.04.3

Thread 1 (Thread 0x7f3e0700 (LWP 1539131)):
#0  0x7f3ed30c8428 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:54
#1  0x7f3ed30ca02a in __GI_abort () at abort.c:89
#2  0x004e5035 in ofpbuf_resize__ (b=b@entry=0x7f3e0fffb050, 
new_headroom=, new_tailroom=new_tailroom@entry=1156) at 
../lib/ofpbuf.c:262
#3  0x004e5338 in ofpbuf_prealloc_tailroom (b=b@entry=0x7f3e0fffb050, 
size=size@entry=1156) at ../lib/ofpbuf.c:291
#4  0x004e54e5 in ofpbuf_put_uninit (size=size@entry=1156, 
b=b@entry=0x7f3e0fffb050) at ../lib/ofpbuf.c:365
#5  ofpbuf_put (b=b@entry=0x7f3e0fffb050, p=p@entry=0x7f3e0ffcf0a0, 
size=size@entry=1156) at ../lib/ofpbuf.c:388
#6  0x005392a6 in nl_sock_recv__ (sock=sock@entry=0x7f3e50009150, 
buf=0x7f3e0fffb050, wait=wait@entry=false) at ../lib/netlink-socket.c:705
#7  0x00539474 in nl_sock_transact_multiple__ 
(sock=sock@entry=0x7f3e50009150, 
transactions=transactions@entry=0x7f3e0ffdff20, n=1, 
done=done@entry=0x7f3e0ffdfe10) at ../lib/netlink-socket.c:824
#8  0x0053980a in nl_sock_transact_multiple (sock=0x7f3e50009150, 
transactions=transactions@entry=0x7f3e0ffdff20, n=n@entry=1) at 
../lib/netlink-socket.c:1009
#9  0x0053aa1b in nl_sock_transact_multiple (n=1, 
transactions=0x7f3e0ffdff20, sock=) at 
../lib/netlink-socket.c:1765
#10 nl_transact_multiple (protocol=protocol@entry=16, 
transactions=transactions@entry=0x7f3e0ffdff20, n=n@entry=1) at 
../lib/netlink-socket.c:1764
#11 0x00528b01 in dpif_netlink_operate__ (dpif=dpif@entry=0x25a6150, 
ops=ops@entry=0x7f3e0fffaf28, n_ops=n_ops@entry=1) at ../lib/dpif-netlink.c:1964
#12 0x00529956 in dpif_netlink_operate_chunks (n_ops=1, 
ops=0x7f3e0fffaf28, dpif=) at ../lib/dpif-netlink.c:2243
#13 dpif_netlink_operate (dpif_=0x25a6150, ops=, 
n_ops=) at ../lib/dpif-netlink.c:2279
#14 0x004756de in dpif_operate (dpif=0x25a6150, ops=, 
ops@entry=0x7f3e0fffaf28, n_ops=n_ops@entry=1) at ../lib/dpif.c:1359
#15 0x004758e7 in dpif_flow_get (dpif=, key=, key_len=, ufid=, pmd_id=, 
buf=buf@entry=0x7f3e0fffb050, flow=) at ../lib/dpif.c:1014
#16 0x0043f662 in ukey_create_from_dpif_flow (udpif=0x229cbf0, 
udpif=0x229cbf0, ukey=, flow=0x7f3e0fffc790) at 
../ofproto/ofproto-dpif-upcall.c:1709
#17 ukey_acquire (error=, result=, 
flow=0x7f3e0fffc790, udpif=0x229cbf0) at ../ofproto/ofproto-dpif-upcall.c:1914
#18 revalidate (revalidator=0x250eaa8) at ../ofproto/ofproto-dpif-upcall.c:2473
#19 0x0043f816 in udpif_revalidator (arg=0x250eaa8) at 
../ofproto/ofproto-dpif-upcall.c:913
#20 0x004ea4b4 in ovsthread_wrapper (aux_=) at 
../lib/ovs-thread.c:348
#21 0x7f3ed39756ba in start_thread (arg=0x7f3e0700) at 
pthread_create.c:333
#22 0x7f3ed319a41d in clon

[Bug 1869808] Re: reboot neutron-ovs-agent introduces a short interrupt of vlan traffic

2021-02-17 Thread Trent Lloyd
Attaching revised SRU patch for Ubuntu Bionic, no code content changes
but fixed the changelog to list all 3 bug numbers correctly.

** Patch added: "neutron SRU patch for Ubuntu Bionic (new version)"
   
https://bugs.launchpad.net/neutron/+bug/1869808/+attachment/5464699/+files/lp1869808-bionic.debdiff

** Patch removed: "debdiff for ubuntu cloud archive (queens)"
   
https://bugs.launchpad.net/neutron/+bug/1869808/+attachment/5464416/+files/lp1869808-queens.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1869808

Title:
  reboot neutron-ovs-agent introduces a short interrupt of vlan traffic

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1869808/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1869808] Re: reboot neutron-ovs-agent introduces a short interrupt of vlan traffic

2021-02-17 Thread Trent Lloyd
Ubuntu SRU Justification

[Impact]

- When there is a RabbitMQ or neutron-api outage, the neutron-
openvswitch-agent undergoes a "resync" process and temporarily blocks
all VM traffic. This always happens for a short time period (maybe <1
second) but in some high scale environments this lasts for minutes. If
RabbitMQ is down again during the re-sync, traffic will also be blocked
until it can connect which may be for a long period. This also affects
situations where neutron-openvswitch-agent is intentionally restarted
while RabbitMQ is down. Bug #1869808 addresses this issue and Bug
#1887148 is a fix for that fix to prevent network loops during DVR
startup.

- In the same situation, the neutron-l3-agent can delete the L3 router
(Bug #1871850)


[Test Case]

(1) Deploy OpenStack Bionic-Queens with DVR and a *VLAN* tenant network
(VXLAN or FLAT will not reproduce the issue). With a standard deployment,
simply enabling DHCP on the ext_net subnet will allow VMs to be booted
directly on the ext_net provider network: "openstack subnet set --dhcp
ext_net" and then deploy the VM directly to ext_net.

(2) Deploy a VM to the VLAN network

(3) Start pinging the VM from an external network

(4) Stop all RabbitMQ servers (steps 4-7 are sketched as commands after
this list)

(5) Restart neutron-openvswitch-agent

(6) Ping traffic should cease and not recover

(7) Start all RabbitMQ servers

(8) Ping traffic will recover after 30-60 seconds
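
A rough command sketch for steps (4)-(7); the service names assume the
standard Ubuntu packages, and a juju deployment would run the equivalent on
each unit:

  # (4) on every rabbitmq unit
  sudo systemctl stop rabbitmq-server
  # (5) on the compute host running the test VM
  sudo systemctl restart neutron-openvswitch-agent
  # (6) the ping started in step (3) should now stop responding
  # (7) on every rabbitmq unit
  sudo systemctl start rabbitmq-server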


[Where problems could occur]

These patches are all cherry-picked from the upstream stable branches and
have existed upstream, including on the stable/queens branch, for many
months. In Ubuntu, all supported subsequent releases (Stein onwards) have
also had these patches for many months, with the exception of Queens.

There is a chance that not installing these drop flows during startup could
send traffic somewhere unexpected while the network is only partially set
up. This was the case for DVR: in setups where more than one DVR external
network port existed, a network loop could temporarily be created. That was
already addressed with the included patch for the network-loop issue (Bug
#1887148). I checked and could not locate any other merged changes to this
drop_port logic that also need to be backported.

[Other Info]

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1869808

Title:
  reboot neutron-ovs-agent introduces a short interrupt of vlan traffic

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1869808/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1869808] Re: reboot neutron-ovs-agent introduces a short interrupt of vlan traffic

2021-02-17 Thread Trent Lloyd
SRU proposed for Ubuntu Bionic + Cloud Archive (Queens) for the following 3 
bugs:
Bug #1869808 reboot neutron-ovs-agent introduces a short interrupt of vlan 
traffic
Bug #1887148 Network loop between physical networks with DVR (Fix for fix to 
Bug #1869808)
Bug #1871850 [L3] existing router resources are partial deleted unexpectedly 
when MQ is gone

SRU is only required for Bionic + Queens Cloud Archive, all other
releases already have these patches.

==
reboot neutron-ovs-agent introduces a short interrupt of vlan traffic
https://bugs.launchpad.net/neutron/+bug/1869808

pike    1f4f888ad34d54ec968d9c9f9f80c388f3ca0d12    stable/pike [EOL]
queens  131bbc9a53411033cf27664d8f1fd7afc72c57bf    stable/queens [Needed]
rocky   cc48edf85cf66277423b0eb52ae6353f8028d2a6    stable/rocky [EOL]
stein   6dfc35680fcc885d9ad449ca2b39225fb1bca898    14.3.0 [Already done]
train   4f501f405d1c44e00784df8450cbe83129da1ea7    15.2.0 [Already done]
ussuri  88e70a520acaca37db645c3ef1124df8c7d778d5    16.1.0 [Already done]
master  90212b12cdf62e92d811997ebba699cab431d696    17.0.0 [Already done]

==
[L3] existing router resources are partial deleted unexpectedly when MQ is gone
https://bugs.launchpad.net/neutron/+bug/1871850

queens  ec6c98060d78c97edf6382ede977209f007fdb81    stable/queens [Needed]
rocky   5ee377952badd94d08425aab41853916092acd07    stable/rocky [EOL]
stein   71f22834f2240834ca591e27a920f9444bac9689    14.4.0 [Already done]
train   a96ad52c7e57664c63e3675b64718c5a288946fb    15.3.0 [Already done]
ussuri  5eeb98cdb51dc0dadd43128d1d0ed7d497606ded    16.2.0 [Already done]
master  12b9149e20665d80c11f1ef3d2283e1fa6f3b693    17.0.0 [Already done]

==
Network loop between physical networks with DVR (Fix for 1869808)
https://bugs.launchpad.net/neutron/+bug/1887148

pike    00466f41d690ca7c7a918bfd861878ef620bbec9    stable/pike [EOL]
queens  8a173ec29ac1819c3d28c191814cd1402d272bb9    stable/queens [Needed]
rocky   47ec363f5faefd85dfa33223c0087fafb5b9        stable/rocky [EOL]
stein   8181c5dbfe799ac6c832ab67b7eab3bcef4098b9    14.3.1 [Already done]
train   17eded13595b18ab60af5256e0f63c57c3702296    15.2.0 [Already done]
ussuri  143fe8ff89ba776618ed6291af9d5e28e4662bdb    16.1.0 [Already done]
master  c1a77ef8b74bb9b5abbc5cb03fb3201383122eb8    17.0.0 [Already done]

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1869808

Title:
  reboot neutron-ovs-agent introduces a short interrupt of vlan traffic

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1869808/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1906476] Re: PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed

2021-02-02 Thread Trent Lloyd
I can confirm 100% this bug is still happening with 2.0.1 from hirsute-
proposed, even with a brand new install, on a different disk (SATA SSD
instead of NVMe Intel Optane 900p SSD), using 2.0.1 inside the installer
and from first boot. I can reproduce it reliably within about 2 hours just
by using the desktop with Google Chrome (after restoring my Google Chrome
sync, so a common set of data and extensions). It always seems to trigger
first on an access from Google Chrome for some reason - that part is very
reliable - but other files can become corrupt or lose access too, including
git trees and the like.

So I am at a loss to explain the cause, given no one outside of Ubuntu
seems to be hitting this. For whatever reason it also always causes my
Tampermonkey and LastPass extension files to show as corrupt - but not
other extensions - that very reliably happens every time.

The only notable change from default is I am using encryption=on with
passphrase for /home/user. I have not tested with encryption off.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1906476

Title:
  PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 ==
  sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED,
  &zp->z_sa_hdl)) failed

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1906476/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1906476] Re: PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed

2021-01-23 Thread Trent Lloyd
Using 2.0.1 from hirsute-proposed it seems like I'm still hitting this. I
moved and replaced .config/google-chrome, and it seems that after using it
for a day, shutting down and booting up, the same issue appears again.

Going to see if I can somehow try to reproduce this on a different disk
or in a VM with xfstests or something.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1906476

Title:
  PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 ==
  sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED,
  &zp->z_sa_hdl)) failed

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1906476/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1906476] Re: PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed

2021-01-17 Thread Trent Lloyd
This issue seems to have appeared somewhere between zfs-linux
0.8.4-1ubuntu11 (last known working version) and 0.8.4-1ubuntu16.

When the issue first hit, I had zfs-dkms installed, which was on
0.8.4-1ubuntu16, whereas the kernel build had 0.8.4-1ubuntu11. I removed
zfs-dkms to go back to the kernel-built version and it was working OK.
linux-image-5.8.0-36-generic is now released on Hirsute with
0.8.4-1ubuntu16, so the out-of-the-box kernel is now also broken and I am
regularly having problems with this.

linux-image-5.8.0-29-generic: working
linux-image-5.8.0-36-generic: broken

`
lathiat@optane ~/src/zfs[zfs-2.0-release]$ sudo modinfo /lib/modules/5.8.0-29-generic/kernel/zfs/zfs.ko | grep version
version: 0.8.4-1ubuntu11

lathiat@optane ~/src/zfs[zfs-2.0-release]$ sudo modinfo /lib/modules/5.8.0-36-generic/kernel/zfs/zfs.ko | grep version
version: 0.8.4-1ubuntu16
`

I don't have a good quick/easy reproducer, but just using my desktop for a
day or two it seems I am likely to hit the issue after a while.

I tried to install the upstream zfs-dkms package for 2.0 to see if I can
bisect the issue on upstream versions, but it breaks my boot for some weird
systemd reason I cannot quite figure out yet.

Looking at the Ubuntu changelog I'd say the fix for
https://bugs.launchpad.net/bugs/1899826 that landed in 0.8.4-1ubuntu13 to
backport the 5.9 and 5.10 compatibility patches is a prime suspect, but it
could also be any other version. I'm going to try and 'bisect'
0.8.4-1ubuntu11 through 0.8.4-1ubuntu16 to figure out which version
actually introduced it.

Since the default kernel is now hitting this, there have been 2 more user
reports of the same thing in the upstream bug in the past few days since
that kernel landed, and I am regularly getting inaccessible files - not
just from Chrome but even a Linux git tree, among other things. I am going
to raise the priority on this bug to Critical, as you lose access to files
and it therefore has data loss potential. I have not yet determined whether
you can somehow get the data back; so far it has only affected files I can
replace, such as cache/git files. It seems like snapshots might be OK
(which would make sense) - see the sketch below.
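
A rough sketch for checking whether an affected file is still readable from
a snapshot (the dataset, snapshot and file names here are placeholders for
whatever "zfs list" shows on the system):

  zfs list -t snapshot -o name <pool>/USERDATA/<user-dataset>
  ls -la /home/<user>/.zfs/snapshot/<snapshot>/.cache/google-chrome/Default/Cache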

** Changed in: zfs-linux (Ubuntu)
   Importance: High => Critical

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1906476

Title:
  PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 ==
  sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED,
  &zp->z_sa_hdl)) failed

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1906476/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1899826] Re: backport upstream fixes for 5.9 Linux support

2021-01-17 Thread Trent Lloyd
Accidentally posted the above comment in the wrong bug, sorry, was meant
for https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1906476 -
where I suspect this bug as having caused a regression.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899826

Title:
  backport upstream fixes for 5.9 Linux support

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1899826/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1899826] Re: backport upstream fixes for 5.9 Linux support

2021-01-17 Thread Trent Lloyd
This issue seems to have appeared somewhere between zfs-linux
0.8.4-1ubuntu11 (last known working version) and 0.8.4-1ubuntu16.

When the issue first hit, I had zfs-dkms installed, which was on
0.8.4-1ubuntu16, whereas the kernel build had 0.8.4-1ubuntu11. I removed
zfs-dkms to go back to the kernel-built version and it was working OK.
linux-image-5.8.0-36-generic is now released on Hirsute with
0.8.4-1ubuntu16, so the out-of-the-box kernel is now also broken, and I am
regularly having problems with this.

linux-image-5.8.0-29-generic: working
linux-image-5.8.0-36-generic: broken

`
lathiat@optane ~/src/zfs[zfs-2.0-release]$ sudo modinfo /lib/modules/5.8.0-29-generic/kernel/zfs/zfs.ko | grep version
version: 0.8.4-1ubuntu11

lathiat@optane ~/src/zfs[zfs-2.0-release]$ sudo modinfo /lib/modules/5.8.0-36-generic/kernel/zfs/zfs.ko | grep version
version: 0.8.4-1ubuntu16
`

I don't have a good quick/easy reproducer, but just using my desktop for a
day or two it seems I am likely to hit the issue after a while.

I tried to install the upstream zfs-dkms package for 2.0 to see if I can
bisect the issue on upstream versions, but it breaks my boot for some
reason I cannot quite figure out.

Looking at the Ubuntu changelog I'd say the fix for
https://bugs.launchpad.net/bugs/1899826 that landed in 0.8.4-1ubuntu13 to
backport the 5.9 and 5.10 compatibility patches is a prime suspect, but it
could also be any other version. I'm going to try and 'bisect'
0.8.4-1ubuntu11 through 0.8.4-1ubuntu16 to figure out which version
actually introduced it.

Since the default kernel is now hitting this, there have been 2 more user
reports of the same thing in the upstream bug in the past few days since
that kernel landed, and I am regularly getting inaccessible files - not
just from Chrome but even a Linux git tree, among other things. I am going
to raise the priority on this bug to Critical, as you lose access to files
and it therefore has data loss potential. I have not yet determined whether
you can somehow get the data back; so far it has only affected files I can
replace, such as cache/git files. It seems like snapshots might be OK
(which would make sense).

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899826

Title:
  backport upstream fixes for 5.9 Linux support

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1899826/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1906476] Re: PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed

2021-01-14 Thread Trent Lloyd
Another user report here:
https://github.com/openzfs/zfs/issues/10971

Curiously, I found a similar report from 2016(??) here:
https://bbs.archlinux.org/viewtopic.php?id=217204

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1906476

Title:
  PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 ==
  sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED,
  &zp->z_sa_hdl)) failed

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1906476/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1906476] Re: PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed

2021-01-14 Thread Trent Lloyd
I hit this problem again today, but now without zfs-dkms. After upgrading
my kernel from 5.8.0-29-generic to 5.8.0-36-generic my Google Chrome Cache
directory is broken again; I had to rename it and then reboot to get out of
the problem.

** Changed in: zfs-linux (Ubuntu)
   Importance: Undecided => High

** Bug watch added: github.com/openzfs/zfs/issues #10971
   https://github.com/openzfs/zfs/issues/10971

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1906476

Title:
  PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 ==
  sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED,
  &zp->z_sa_hdl)) failed

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1906476/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1874939] Re: ceph-osd can't connect after upgrade to focal

2020-12-09 Thread Trent Lloyd
This issue appears to be documented here:
https://docs.ceph.com/en/latest/releases/nautilus/#instructions

Complete the upgrade by disallowing pre-Nautilus OSDs and enabling all
new Nautilus-only functionality:

# ceph osd require-osd-release nautilus
Important: This step is mandatory. Failure to execute this step will make
it impossible for OSDs to communicate after msgrv2 is enabled.
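
A quick way to check whether the flag has already been applied (a sketch,
assuming the standard ceph CLI on a mon/admin node; not part of the quoted
documentation above):

  ceph osd dump | grep require_osd_release
  # expect "require_osd_release nautilus" once the step above has been run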

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1874939

Title:
  ceph-osd can't connect after upgrade to focal

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1874939/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-09 Thread Trent Lloyd
** Attachment added: "blktrace-lp1907262.tar.gz"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/+attachment/5442212/+files/blktrace-lp1907262.tar.gz

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1907262

Title:
  raid10: discard leads to corrupted file system

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-09 Thread Trent Lloyd
I can reproduce this on a Google Cloud n1-standard-16 using 2x local NVMe
disks: partition nvme0n1 and nvme0n2 with only an 8GB partition each, then
format directly with ext4 (skip LVM).

In this setup each 'check' takes <1 min, which speeds up testing
considerably. Example details below - the pre-emptible instance cost for
this seems to be about $0.292/hour (~$7/day).

gcloud compute instances create raid10-test --project=juju2-157804 \
--zone=us-west1-b \
--machine-type=n1-standard-16 \
--subnet=default \
--network-tier=STANDARD \
--no-restart-on-failure \
--maintenance-policy=TERMINATE \
--preemptible \
--boot-disk-size=32GB \
--boot-disk-type=pd-ssd \
--image=ubuntu-1804-bionic-v20201116 --image-project=ubuntu-os-cloud \
--local-ssd=interface=NVME  --local-ssd=interface=NVME

# apt install linux-image-virtual
# apt-get remove linux-image-gcp linux-image-5.4.0-1029-gcp linux-image-unsigned-5.4.0-1029-gcp --purge
# reboot

sgdisk -n 0:0:+8G /dev/nvme0n1
sgdisk -n 0:0:+8G /dev/nvme0n2
mdadm -C -v -l10 -n2 -N "lv-raid" -R /dev/md0 /dev/nvme0n1p2 /dev/nvme1n1p2
mkfs.ext4 /dev/md0
mount /dev/md0 /mnt
dd if=/dev/zero of=/mnt/data.raw bs=4K count=1M; sync; rm /mnt/data.raw
echo check >/sys/block/md0/md/sync_action; watch 'grep . /proc/mdstat /sys/block/md0/md/mismatch_cnt' # no mismatch
fstrim -v /mnt
echo check >/sys/block/md0/md/sync_action; watch 'grep . /proc/mdstat /sys/block/md0/md/mismatch_cnt' # mismatch=256

I ran "blktrace /dev/md0 /dev/nvme0n1 /dev/nvme0n2" and will upload the
results; I haven't had time to try to understand them yet.

Some thoughts:
 - It was asserted that the first disk 'appears' fine.
 - So I wondered: can we reliably repair by asking mdadm to do a 'repair'
   or 'resync'? (See the sketch after this list.)
 - It seems that reads are at least sometimes balanced (maybe by PID) to
   different disks since this post:
   https://www.spinics.net/lists/raid/msg62762.html - it is unclear if the
   same selection impacts writes (not that it would help performance).
 - So it's unclear we can reliably say only a 'passive mirror' is being
   corrupted; it's possible application reads may or may not be corrupted.
   More testing/understanding of the code is required.
 - This area of RAID10 and RAID1 seems quite under-documented: "man md"
   doesn't say much about how, or from which disk, data is used to repair
   the other if there is a mismatch (unlike RAID5, where the parity gives
   us some assurance as to which data is wrong).
 - We should try writes from different PIDs, with known different data, and
   compare the data on both disks with the known data to see if we can
   knowingly get the wrong data on both disks or only one. And try that
   with 4 disks instead of 2.
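
A rough sketch of that repair experiment (same md0 as in the reproduction
above; note it is exactly the open question which copy a RAID10 'repair'
treats as the good source):

  echo repair > /sys/block/md0/md/sync_action
  watch 'grep . /proc/mdstat /sys/block/md0/md/mismatch_cnt'   # wait for it to finish
  echo check > /sys/block/md0/md/sync_action
  watch 'grep . /proc/mdstat /sys/block/md0/md/mismatch_cnt'   # mismatch_cnt should drop back to 0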

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1907262

Title:
  raid10: discard leads to corrupted file system

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1906476] Re: PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed

2020-12-01 Thread Trent Lloyd
I should mention that Chrome itself always showed "waiting for cache",
which backs up the story around the cache files.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1906476

Title:
  PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 ==
  sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED,
  &zp->z_sa_hdl)) failed

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1906476/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1906476] [NEW] PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed

2020-12-01 Thread Trent Lloyd
Public bug reported:

Since today, while running Ubuntu 21.04 Hirsute, I have started getting a
ZFS panic in the kernel log which is also hanging disk I/O for all
Chrome/Electron apps.

I have narrowed down a few important notes:
- It does not happen with module version 0.8.4-1ubuntu11 built and included 
with 5.8.0-29-generic

- It was happening when using zfs-dkms 0.8.4-1ubuntu16 built with DKMS
on the same kernel and also on 5.8.18-acso (a custom kernel).

- For whatever reason multiple Chrome/Electron apps were affected,
specifically Discord, Chrome and Mattermost. In all cases they seemed (I
was unable to strace the processes, so it was a bit hard to confirm 100%,
but by deduction from /proc/PID/fd and the hanging ls) to be hung trying to
open files in their 'Cache' directory, e.g.
~/.cache/google-chrome/Default/Cache and ~/.config/Mattermost/Cache. While
the issue was going on I could not list those directories either; "ls"
would just hang.

- Once I removed zfs-dkms only to revert to the kernel built-in version
it immediately worked without changing anything, removing files, etc.

- It happened every time over multiple reboots and kernels; all my Chrome
apps stopped working, but for whatever reason nothing else seemed
affected.

- It would log a series of spl_panic dumps into kern.log that look like this:
Dec  2 12:36:42 optane kernel: [   72.857033] VERIFY(0 == 
sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) 
failed
Dec  2 12:36:42 optane kernel: [   72.857036] PANIC at 
zfs_znode.c:335:zfs_znode_sa_init()

I could only find one other google reference to this issue, with 2 other users 
reporting the same error but on 20.04 here:
https://github.com/openzfs/zfs/issues/10971

- I was not experiencing the issue on 0.8.4-1ubuntu14, and I'm fairly sure
it was working on 0.8.4-1ubuntu15 but broken after the upgrade to
0.8.4-1ubuntu16. I will reinstall those zfs-dkms versions to verify
that.

There were a few originating call stacks, but the first one I hit was:

Call Trace:
 dump_stack+0x74/0x95
 spl_dumpstack+0x29/0x2b [spl]
 spl_panic+0xd4/0xfc [spl]
 ? sa_cache_constructor+0x27/0x50 [zfs]
 ? _cond_resched+0x19/0x40
 ? mutex_lock+0x12/0x40
 ? dmu_buf_set_user_ie+0x54/0x80 [zfs]
 zfs_znode_sa_init+0xe0/0xf0 [zfs]
 zfs_znode_alloc+0x101/0x700 [zfs]
 ? arc_buf_fill+0x270/0xd30 [zfs]
 ? __cv_init+0x42/0x60 [spl]
 ? dnode_cons+0x28f/0x2a0 [zfs]
 ? _cond_resched+0x19/0x40
 ? _cond_resched+0x19/0x40
 ? mutex_lock+0x12/0x40
 ? aggsum_add+0x153/0x170 [zfs]
 ? spl_kmem_alloc_impl+0xd8/0x110 [spl]
 ? arc_space_consume+0x54/0xe0 [zfs]
 ? dbuf_read+0x4a0/0xb50 [zfs]
 ? _cond_resched+0x19/0x40
 ? mutex_lock+0x12/0x40
 ? dnode_rele_and_unlock+0x5a/0xc0 [zfs]
 ? _cond_resched+0x19/0x40
 ? mutex_lock+0x12/0x40
 ? dmu_object_info_from_dnode+0x84/0xb0 [zfs]
 zfs_zget+0x1c3/0x270 [zfs]
 ? dmu_buf_rele+0x3a/0x40 [zfs]
 zfs_dirent_lock+0x349/0x680 [zfs]
 zfs_dirlook+0x90/0x2a0 [zfs]
 ? zfs_zaccess+0x10c/0x480 [zfs]
 zfs_lookup+0x202/0x3b0 [zfs]
 zpl_lookup+0xca/0x1e0 [zfs]
 path_openat+0x6a2/0xfe0
 do_filp_open+0x9b/0x110
 ? __check_object_size+0xdb/0x1b0
 ? __alloc_fd+0x46/0x170
 do_sys_openat2+0x217/0x2d0
 ? do_sys_openat2+0x217/0x2d0
 do_sys_open+0x59/0x80
 __x64_sys_openat+0x20/0x30

** Affects: zfs-linux (Ubuntu)
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1906476

Title:
  PANIC at zfs_znode.c:335:zfs_znode_sa_init() // VERIFY(0 ==
  sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED,
  &zp->z_sa_hdl)) failed

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1906476/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1847361] Re: Upgrade of qemu binaries causes running instances not able to dynamically load modules

2020-11-30 Thread Trent Lloyd
Note: This patch has related regressions in Hirsute due to the version number 
containing a space:
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1906245
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1905377

It seems the patch has been temporarily dropped; we will need to ensure we
don't totally lose the fix.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1847361

Title:
  Upgrade of qemu binaries causes running instances not able to
  dynamically load modules

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1847361/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1902351] Re: forgets touchpad settings

2020-11-24 Thread Trent Lloyd
I am experiencing this as well: it worked on 20.04 Focal and is broken on
20.10 Groovy and 21.04 Hirsute as of today with the latest Hirsute
packages.

I am using GNOME with a Logitech T650 touchpad. If I unplug and replug the
receiver it forgets the settings again. I then have to toggle both natural
scrolling (Settings -> Touchpad) and "mouse click emulation" (Tweaks) for
it to work again.

Given this is apparently common across GNOME and KDE, perhaps it is somehow
related to libinput rather than gnome-shell/settings/etc?

** Also affects: libinput (Ubuntu)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1902351

Title:
  forgets touchpad settings

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/kcm-touchpad/+bug/1902351/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1903745] Re: pacemaker left stopped after unattended-upgrade of pacemaker (1.1.14-2ubuntu1.8 -> 1.1.14-2ubuntu1.9)

2020-11-16 Thread Trent Lloyd
For clarity my findings so far are that:
 - The package upgrade stops pacemaker

 - After 30 seconds (customised down from 30min by charm-hacluster), the
stop times out and pretends to have finished, but leaves pacemaker
running (due to SendSIGKILL=no in the .service intentionally set
upstream to prevent fencing)

 - Pacemaker is started again, but fails to start because the old copy is
still running, so it exits and the systemd service is left 'stopped'

 - The original "unmanaged" pacemaker copy eventually exits sometime later
(usually once the resources have all transitioned away), leaving no
running pacemaker at all

Compounding this issue is that:
 - Pacemaker won't stop until it confirms all local services have stopped
and transitioned away to other nodes (and possibly also that it won't
destroy quorum by going down, but I am not sure about that bit). In some
cases this just takes more than 30 seconds; in other cases the cluster may
be in such a state that it will never happen, e.g. another node was already
down or trying to shut down.

 - All unattended-upgrades happen within a randomized 60 minute window
(apt-daily-upgrade.timer), and they all just try to stop pacemaker
without regard to whether that is possible or likely to succeed - after
a while all 3 will be attempting to stop, so none of them can succeed.

Current Thoughts:
 - Adjust the charm-hacluster StopTimeout=30 back to some value (possibly
the default) after testing that this does not break the charm's
deploy/scale-up/scale-down [as noted in the previous bugs where it was
originally added, but the original case was supposedly fixed by adding the
cluster_count option].

 - Consider whether we need to override SendSIGKILL in the charm -
changing it as a global package default seems like a bad idea

 - Research an improvement to the pacemaker dpkg scripts to do something
smarter than just running stop, for example the preinst script could ask
for a transition away without actually running stop on pacemaker and/or
abort the upgrade if it is obvious that that transition will fail.

 - As a related note, the patch to set BindsTo=corosync on
pacemaker.service was removed in Groovy due to debate with Debian over
this change (but it still exists in Xenial-Focal). This is something that
will need to be dealt with for the next LTS. This override should probably
be added to charm-hacluster at a minimum (a minimal sketch follows below).
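
A minimal sketch of re-adding that relationship via a systemd drop-in (the
path, and the idea of shipping it from the charm, are assumptions rather
than something the charm does today; a "systemctl daemon-reload" would be
needed afterwards):

$ cat /etc/systemd/system/pacemaker.service.d/bindsto.conf
[Unit]
BindsTo=corosync.service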

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1903745

Title:
  pacemaker left stopped after unattended-upgrade of pacemaker
  (1.1.14-2ubuntu1.8 -> 1.1.14-2ubuntu1.9)

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-hacluster/+bug/1903745/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1903745] Re: pacemaker left stopped after unattended-upgrade of pacemaker (1.1.14-2ubuntu1.8 -> 1.1.14-2ubuntu1.9)

2020-11-16 Thread Trent Lloyd
With regards to Billy's Comment #18: my analysis for that bionic sosreport
is in Comment #8, where I found that specific sosreport didn't experience
this issue - but most likely that node was suffering from the issue
occurring on the MySQL nodes it was connected to, and the service couldn't
connect to MySQL as a result. We'd need the full logs (sosreport
--all-logs) from all related keystone nodes and mysql nodes in the
environment to be sure, but I am 95% sure that is the case there.

I think there is some argument to be made for improving the package restart
process for the pacemaker package itself; however, I am finding, based on
the logs here and in a couple of environments I analysed, that the primary
problem is specifically related to the reduced StopTimeout set by
charm-hacluster. So I think we should focus on that issue here, and if we
decide it makes sense to make improvements to the pacemaker package process
itself, that should be opened as a separate bug, as I haven't seen any
evidence of that issue in the logs here so far.

For anyone else experiencing this bug, please take a *full* copy of
/var/log (or sosreport --all-logs) from -all- nodes in that specific
pacemaker cluster and upload them and I am happy to analyse them - if
you need a non-public location to share the files feel free to e-mail
them to me. It would be great to receive that from any nodes already
recovered so we can ensure we fully understand all the cases that
happened.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1903745

Title:
  pacemaker left stopped after unattended-upgrade of pacemaker
  (1.1.14-2ubuntu1.8 -> 1.1.14-2ubuntu1.9)

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-hacluster/+bug/1903745/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1903745] Re: upgrade from 1.1.14-2ubuntu1.8 to 1.1.14-2ubuntu1.9 breaks clusters

2020-11-12 Thread Trent Lloyd
** Changed in: charm-hacluster
   Status: New => Confirmed

** Changed in: pacemaker (Ubuntu)
   Status: Confirmed => Invalid

** Summary changed:

- upgrade from 1.1.14-2ubuntu1.8 to 1.1.14-2ubuntu1.9 breaks clusters
+ pacemaker left stopped after unattended-upgrade of pacemaker 
(1.1.14-2ubuntu1.8 -> 1.1.14-2ubuntu1.9)

** Description changed:

  On several machines running pacemaker with corosync, after the package was 
upgraded by unattended-upgrades, the VIPs were gone.
  Restarting pacemaker and corosync didn't help, because some processes (lrmd) 
remained after the stop.
  Manually killing them allowed to restart in a good shape.
  
- This is on Ubuntu xenial.
+ This is on Ubuntu xenial (EDIT: and bionic)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1903745

Title:
  pacemaker left stopped after unattended-upgrade of pacemaker
  (1.1.14-2ubuntu1.8 -> 1.1.14-2ubuntu1.9)

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-hacluster/+bug/1903745/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1903745] Re: upgrade from 1.1.14-2ubuntu1.8 to 1.1.14-2ubuntu1.9 breaks clusters

2020-11-12 Thread Trent Lloyd
For the fix to Bug #1654403, charm-hacluster sets TimeoutStartSec and
TimeoutStopSec for both corosync and pacemaker, to the same value.

system-wide default (xenial, bionic): TimeoutStopSec=90s TimeoutStartSec=90s
corosync package default: system-wide default (no changes)
pacemaker package default: TimeoutStopSec=30min TimeoutStartSec=60s

charm-hacluster corosync+pacemaker override: TimeoutStopSec=60s
TimeoutStartSec=180s

effective changes:
corosync:  TimeoutStopSec=90s -> 60s    TimeoutStartSec=90s -> 180s
pacemaker: TimeoutStopSec=30min -> 60s  TimeoutStartSec=60s -> 180s

The original bug description was "On corosync restart, corosync may take
longer than a minute to come up. The systemd start script times out too
soon. Then pacemaker which is dependent on corosync is immediatly
started and fails as corosync is still in the process of starting."

So the TimeoutStartSec increase from 60/90 -> 180 was the only thing
needed. I believe the TimeoutStopSec change for pacemaker is in error at
least as the bug is described.

Having said that, I can imagine charm failures during deployment or
reconfiguration where it tries to stop pacemaker for various reasons and it
fails to stop fast enough because the resources won't migrate away
(possibly because all the nodes are trying to stop at the same time, as
charm-hacluster doesn't seem to have a staggered change setup), and it
currently restarts corosync to effect changes to the ring. So this may well
have fixed other charm-related problems not accurately described in the
previous bug - though that bug does specifically mention cases where the
expected cluster_count is not set; in that case it tries to set up
corosync/pacemaker before all 3 nodes are up, which might get into this
scenario. So before we go ahead and change the stop_timeout back to 30min,
we probably need to validate various scenarios for that issue.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1903745

Title:
  pacemaker left stopped after unattended-upgrade of pacemaker
  (1.1.14-2ubuntu1.8 -> 1.1.14-2ubuntu1.9)

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-hacluster/+bug/1903745/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1903745] Re: upgrade from 1.1.14-2ubuntu1.8 to 1.1.14-2ubuntu1.9 breaks clusters

2020-11-12 Thread Trent Lloyd
I misread: the systemd unit is native, and it already sets the following
settings:
SendSIGKILL=no
TimeoutStopSec=30min
TimeoutStartSec=60s

The problem is that most of these failures have been experienced on juju
hacluster charm installations, which override these values:

$ cat ./systemd/system/pacemaker.service.d/overrides.conf
[Service]
TimeoutStartSec=180
TimeoutStopSec=60

This was apparently done to fix the following bug:
https://bugs.launchpad.net/charms/+source/hacluster/+bug/1654403

FWIW these values are configurable via charm config options. It seems this
bug needs to be revisited, and/or this bug may need to be retargeted, at
least in part, to charm-hacluster.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1903745

Title:
  upgrade from 1.1.14-2ubuntu1.8 to 1.1.14-2ubuntu1.9 breaks clusters

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1903745/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1903745] Re: upgrade from 1.1.14-2ubuntu1.8 to 1.1.14-2ubuntu1.9 breaks clusters

2020-11-11 Thread Trent Lloyd
I analysed the logs for an occurrence of this. The problem appears to be
that pacemaker doesn't stop within 1 minute, so systemd gives up and just
starts a new instance anyway, noting that all of the existing processes are
left behind.

I am awaiting the extra rotated logs to confirm, but from what I can see
the new pacemaker basically fails to start because the old one is still
running, and then the old one eventually exits, leaving you with no
instance of pacemaker at all (which is the state we found it in - pacemaker
was stopped).

06:13:44 systemd[1]: pacemaker.service: State 'stop-sigterm' timed out. Skipping SIGKILL.
06:13:44 pacemakerd[427]:   notice: Caught 'Terminated' signal
06:14:44 systemd[1]: pacemaker.service: State 'stop-final-sigterm' timed out. Skipping SIGKILL. Entering failed mode.
06:14:44 systemd[1]: pacemaker.service: Failed with result 'timeout'.
06:14:44 systemd[1]: Stopped Pacemaker High Availability Cluster Manager.
06:14:45 systemd[1]: pacemaker.service: Found left-over process 445 (cib) in control group while starting unit. Ignoring.
06:14:45 systemd[1]: pacemaker.service: Found left-over process 449 (attrd) in control group while starting unit. Ignoring.
06:14:45 systemd[1]: pacemaker.service: Found left-over process 450 (pengine) in control group while starting unit. Ignoring.
06:14:45 systemd[1]: pacemaker.service: Found left-over process 451 (crmd) in control group while starting unit. Ignoring.
06:14:45 systemd[1]: pacemaker.service: Found left-over process 427 (pacemakerd) in control group while starting unit. Ignoring.
06:14:45 systemd[1]: pacemaker.service: Found left-over process 447 (stonithd) in control group while starting unit. Ignoring.
06:14:45 systemd[1]: pacemaker.service: Found left-over process 448 (lrmd) in control group while starting unit. Ignoring.
06:14:45 systemd[1]: pacemaker.service: Failed to reset devices.list: Operation not permitted
06:14:45 systemd[1]: Started Pacemaker High Availability Cluster Manager.

Likely the solution here is some combination of tweaking the systemd config
to wait longer, force-kill if necessary, and possibly reap all processes if
it does force a restart. It's not a native systemd unit, though some of
this can be tweaked by comments. I'll look a little further at that.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1903745

Title:
  upgrade from 1.1.14-2ubuntu1.8 to 1.1.14-2ubuntu1.9 breaks clusters

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1903745/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1903745] Re: upgrade from 1.1.14-2ubuntu1.8 to 1.1.14-2ubuntu1.9 breaks clusters

2020-11-10 Thread Trent Lloyd
I reviewed the sosreports and provide some general analysis below.


[sosreport-juju-machine-2-lxc-1-2020-11-10-tayyude]

I don't see any sign in this log of package upgrades or VIP stop/starts;
I suspect this host may be unrelated.

[sosreport-juju-caae6f-19-lxd-6-20201110230352.tar.xz]

This is a charm-keystone node

Looking at this sosreport my general finding is that everything worked
correctly on this specific host.

unattended-upgrades.log:
We can see the upgrade starts at 2020-11-10 06:17:03 and finishes at 
"2020-11-10 06:17:48"

syslog.1:
Nov 10 06:17:41 juju-caae6f-19-lxd-6 crmd[41203]:   notice: Result of probe operation for res_ks_680cfdf_vip on juju-caae6f-19-lxd-6: 7 (not running)
Nov 10 06:19:44 juju-caae6f-19-lxd-6 crmd[41203]:   notice: Result of start operation for res_ks_680cfdf_vip on juju-caae6f-19-lxd-6: 0 (ok)

We also see that the VIP moved around to different hosts a few times,
likely as a result of each host successively upgrading, which makes sense.
I don't see any sign in this log of the mentioned lrmd issue.

[mysql issue]

What we do see, however, is "Too many connections" errors from MySQL in the
keystone logs. This generally happens because when the VIP moves from one
host to another, all the old connections are left behind and just go stale:
because the VIP was removed, the traffic for these connections simply
disappears (it is sent to the new VIP owner, which doesn't have those TCP
connections), and they sit there until wait_timeout is reached (typically
either 180s/3 min or 3600s/1 hour in our deployments), as the node will
never get the TCP reset the remote end sends. The problem happens when the
VIP fails *back* to a host it already failed away from: many of the
connection slots are still used by the stale connections, and you run out
of connections if your max_connections limit is not at least double your
normal connection count. This problem will eventually self-resolve once the
connections time out, but that may take an hour.

Note that this sosreport is from a keystone node that *also* has charm-
hacluster/corosync/pacemaker but the above discussed mysql issue would
have occurred on the percona mysql nodes. To analyse the number of
failovers we would need to get sosreports from the mysql node(s).

[summary]

I think we have likely 2 potential issues here from what I can see
described so far.

Firstly, the networkd issue is likely not related to this specific case, as
that happens specifically when systemd is upgraded and thus networkd is
restarted; that shouldn't have happened here.

(Issue 1) The first is that we hit max_connections due to the multiple
successive MySQL VIP failovers, where max_connections is not at least 2x
the steady-state connection count. It also seems possible in some cases
that the VIP may shift back to the same host a 3rd time by chance, and you
may end up needing 3x. I think we could potentially improve that by
modifying the pacemaker resource scripts to kill active connections when
the VIP departs, or by ensuring that max_connections is 2-3x the
steady-state active connection count. That should go into a new bug, likely
against charm-percona-cluster, as it ships its own resource agent. We could
also potentially add a configurable nagios check for active connections in
excess of 50% of max_connections (a quick manual check is sketched below).
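
A quick manual check of connection usage against the limit (a sketch; run
against the local percona/mysql instance with appropriate credentials):

  mysql -e "SHOW GLOBAL STATUS LIKE 'Threads_connected'; SHOW GLOBAL VARIABLES LIKE 'max_connections';"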

(Issue 2) It was described that pacemaker got into a bad state during the
restart: the lrmd didn't exit, and things didn't work correctly until it
was manually killed and restarted. I think we need to get more
logs/sosreports from the nodes that had that specific issue; it sounds like
something that may be a bug specific to a certain scenario, or perhaps to
the older xenial version [this USN-4623-1 update happened for all LTS
releases: 16.04/18.04/20.04].

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1903745

Title:
  upgrade from 1.1.14-2ubuntu1.8 to 1.1.14-2ubuntu1.9 breaks clusters

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1903745/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1848497] Re: virtio-balloon change breaks migration from qemu prior to 4.0

2020-10-26 Thread Trent Lloyd
I have verified the package for this specific virtio-balloon issue
discussed in this bug only.


Migrating from 3.1+dfsg-2ubuntu3.2~cloud0:
- To the latest released version (3.1+dfsg-2ubuntu3.7~cloud0): fails due to
the balloon setup

2020-10-26T07:40:30.157066Z qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0
2020-10-26T07:40:30.157431Z qemu-system-x86_64: Failed to load PCIDevice:config
2020-10-26T07:40:30.157443Z qemu-system-x86_64: Failed to load virtio-balloon:virtio
2020-10-26T07:40:30.157448Z qemu-system-x86_64: error while loading state for instance 0x0 of device ':00:04.0/virtio-balloon'
2020-10-26T07:40:30.159527Z qemu-system-x86_64: load of migration failed: Invalid argument
2020-10-26 07:40:30.223+: shutting down, reason=failed

- To the proposed version (3.1+dfsg-2ubuntu3.7~cloud1): works as
expected
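For context, the verification was a plain live migration between hosts
running the two package versions; a minimal sketch of such a test with
libvirt, using a hypothetical guest name and destination host, would
be:

# Live-migrate a running guest to the host carrying the other version
$ virsh migrate --live --verbose instance-0001 qemu+ssh://destination-host/system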

Marking as verification completed.

** Tags removed: verification-stein-needed
** Tags added: verification-stein-done

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1848497

Title:
  virtio-balloon change breaks migration from qemu prior to 4.0

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1848497/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1897483] Re: With hardware offloading enabled, OVS logs are spammed with netdev_offload_tc ERR messages

2020-10-10 Thread Trent Lloyd
There is an indication in the RHBZ below that this can actually prevent
openvswitch from working properly, as it loses too much CPU time to
this processing in large environments (100s or 1000s of ports):

https://bugzilla.redhat.com/show_bug.cgi?id=1737982

Seems to be a rejected upstream patch here; unclear if one was later accepted, 
we should check for it:
https://lists.linuxfoundation.org/pipermail/ovs-dev/2019-March/357348.html

And potentially prioritise a fix for this.
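For anyone triaging this, a quick way to confirm hardware offload is
enabled and to gauge the volume of these messages (standard OVS paths
assumed; the log location may differ on your deployment):

# Confirm hardware offload is actually enabled
$ sudo ovs-vsctl get Open_vSwitch . other_config:hw-offload

# Gauge how much of the log is taken up by these errors
$ sudo grep -c 'netdev_offload_tc' /var/log/openvswitch/ovs-vswitchd.log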

** Tags added: sts

** Bug watch added: Red Hat Bugzilla #1737982
   https://bugzilla.redhat.com/show_bug.cgi?id=1737982

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1897483

Title:
  With hardware offloading enabled, OVS logs are spammed with
  netdev_offload_tc ERR messages

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1897483/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1896734] Re: A privsep daemon spawned by neutron-openvswitch-agent hangs when debug logging is enabled (large number of registered NICs) - an RPC response is too large for msgpack

2020-10-08 Thread Trent Lloyd
** Tags added: seg

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1896734

Title:
  A privsep daemon spawned by neutron-openvswitch-agent hangs when debug
  logging is enabled (large number of registered NICs) - an RPC response
  is too large for msgpack

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1896734/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1887779] Re: recurrent uncaught exception

2020-09-29 Thread Trent Lloyd
I hit this too; after restarting to fix it I also lost all my stored
metrics from the last few days. So I am going to triage this as High.

** Changed in: graphite-carbon (Ubuntu)
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1887779

Title:
  recurrent uncaught exception

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/graphite-carbon/+bug/1887779/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1882416] Re: virtio-balloon change breaks rocky -> stein live migrate

2020-09-25 Thread Trent Lloyd
I think the issue here is that Stein's qemu comes from Disco which was
EOL before Bug #1848497 was fixed and so the change wasn't backported.

While Stein is EOL next month, the problem is that this makes live
migrations fail, and those are often wanted during OpenStack upgrades
to actually get through Stein onto Train. So I think we'll need to
backport the fix.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1882416

Title:
  virtio-balloon change breaks rocky -> stein live migrate

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1882416/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1882416] Re: virtio-balloon change breaks rocky -> stein live migrate

2020-09-25 Thread Trent Lloyd
** Tags added: seg

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1882416

Title:
  virtio-balloon change breaks rocky -> stein live migrate

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1882416/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1893889] Re: unattended-upgrade of nova-common failure due to conffile prompt

2020-09-15 Thread Trent Lloyd
Right, the systems are running 1.1ubuntu1.18.04.11. In my original
query to you I was trying to figure out whether the patches in .12 or
.13 were likely to have caused this specific situation; you weren't
sure, hence this bug report with more details.

** Changed in: unattended-upgrades (Ubuntu)
   Status: Incomplete => New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1893889

Title:
  unattended-upgrade of nova-common failure due to conffile prompt

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/unattended-upgrades/+bug/1893889/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1894453] Re: Building Ceph packages with RelWithDebInfo

2020-09-08 Thread Trent Lloyd
Are we sure it's actually building as Debug?

At least 15.2.3 on focal seems to build with RelWithDebInfo: I see -O2.
Only do_cmake.sh had logic for this (it would set Debug if a .git
directory exists), but the debian rules file doesn't seem to use that
script; it calls cmake directly.
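A rough way to double-check, assuming deb-src entries are enabled, is
to look at what build type the packaging and do_cmake.sh would pass to
cmake:

# Fetch the source package and inspect the build type handling
$ apt-get source ceph
$ grep -Rn 'CMAKE_BUILD_TYPE' ceph-*/debian/rules ceph-*/do_cmake.sh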

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1894453

Title:
  Building Ceph packages with RelWithDebInfo

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1894453/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1893889] Re: unattended-upgrade of nova-common failure due to conffile prompt

2020-09-01 Thread Trent Lloyd
** Attachment added: "dpkg.log.6"
   
https://bugs.launchpad.net/ubuntu/+source/unattended-upgrades/+bug/1893889/+attachment/5406809/+files/dpkg.log.6

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1893889

Title:
  unattended-upgrade of nova-common failure due to conffile prompt

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/unattended-upgrades/+bug/1893889/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1893889] Re: unattended-upgrade of nova-common failure due to conffile prompt

2020-09-01 Thread Trent Lloyd
Uploaded all historical log files in lp1893889-logs.tar.gz
Uploaded dpkg_-l 

For convenient access also uploaded unattended-upgrades.log.4,
unattended-upgrades-dpkg.log.4 and dpkg.log.6 which have the lines from
the first instance of hitting the error

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1893889

Title:
  unattended-upgrade of nova-common failure due to conffile prompt

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/unattended-upgrades/+bug/1893889/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1893889] Re: unattended-upgrade of nova-common failure due to conffile prompt

2020-09-01 Thread Trent Lloyd
** Attachment added: "unattended-upgrades-dpkg.log.4"
   
https://bugs.launchpad.net/ubuntu/+source/unattended-upgrades/+bug/1893889/+attachment/5406808/+files/unattended-upgrades-dpkg.log.4

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1893889

Title:
  unattended-upgrade of nova-common failure due to conffile prompt

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/unattended-upgrades/+bug/1893889/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1893889] Re: unattended-upgrade of nova-common failure due to conffile prompt

2020-09-01 Thread Trent Lloyd
** Attachment added: "dpkg_-l"
   
https://bugs.launchpad.net/ubuntu/+source/unattended-upgrades/+bug/1893889/+attachment/5406810/+files/dpkg_-l

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1893889

Title:
  unattended-upgrade of nova-common failure due to conffile prompt

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/unattended-upgrades/+bug/1893889/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1893889] Re: unattended-upgrade of nova-common failure due to conffile prompt

2020-09-01 Thread Trent Lloyd
** Attachment added: "unattended-upgrades.log.4"
   
https://bugs.launchpad.net/ubuntu/+source/unattended-upgrades/+bug/1893889/+attachment/5406807/+files/unattended-upgrades.log.4

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1893889

Title:
  unattended-upgrade of nova-common failure due to conffile prompt

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/unattended-upgrades/+bug/1893889/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1893889] Re: unattended-upgrade of nova-common failure due to conffile prompt

2020-09-01 Thread Trent Lloyd
** Attachment added: "all unattended-upgrades and dpkg logs"
   
https://bugs.launchpad.net/ubuntu/+source/unattended-upgrades/+bug/1893889/+attachment/5406806/+files/lp1893889-logs.tar.gz

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1893889

Title:
  unattended-upgrade of nova-common failure due to conffile prompt

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/unattended-upgrades/+bug/1893889/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1893889] [NEW] unattended-upgrade of nova-common failure due to conffile prompt

2020-09-01 Thread Trent Lloyd
Public bug reported:

unattended-upgrades attempted to upgrade nova from 2:17.0.9-0ubuntu1 to
2:17.0.10-0ubuntu2.1 (bionic-security), however nova-common contains a
modified conffile (/etc/nova/nova.conf) which prompts during upgrade and
leaves apt/dpkg in a permanent error state requiring manual
intervention. It also prevents other automated apt install operations
from working while in this state.

I understand that this conffile prompt is a generally known problem and
that unattended-upgrades specifically attempts to skip upgrades that
have such a conffile prompt; however, that did not work in this case. I
am filing this bug to try to identify and resolve the cause; this
affected multiple systems in an Ubuntu OpenStack deployment.

rbalint advised that this is very likely a more complex interaction with the 
exact upgrades that were being staged at the time and hence more logs would be 
needed. Indeed, attempting to reproduce this very simply with a downgrade of 
the nova packages to 2:17.0.0-0ubuntu1 results in it being skipped, as expected:
root@juju-c21ec6-bionic-nova-7:/home/ubuntu# unattended-upgrade
Package nova-common has conffile prompt and needs to be upgraded manually
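For reference, a simple way to confirm that /etc/nova/nova.conf is the
locally modified conffile triggering the prompt is to compare the
packaged md5sum against the file on disk:

# Packaged md5sums for the conffiles dpkg tracks for nova-common
$ dpkg-query -W -f='${Conffiles}\n' nova-common

# Compare with the file on disk; a differing sum means a conffile
# prompt on upgrade
$ md5sum /etc/nova/nova.conf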

And from the unattended-upgrades log we can see that 179 packages in
total were scheduled to upgrade together during this run.

Attaching the following logs files:
/var/log/unattended-upgrades/*
/var/log/dpkg*
dpkg_-l (as at 2020-04-27 16:22, the same time period as the 
unattended-upgrades logs; the dpkg.log* files were taken later but also 
cover the full time period from before 2019-12-28 to after 2020-04-27).

The first instance of the failure is in unattended-upgrades.log.4.gz Line 161
"2019-12-28 06:15:29,837 Packages that will be upgraded: amd64-microcode... 
[truncated, 179 packages total]"

That relates to the output in unattended-upgrades-dpkg.log.4.gz Line 791
"Log started: 2019-12-28  06:25:56"

Which relates to the output of dpkg.log.6.gz Line 392
"2019-12-28 06:25:56 upgrade nova-compute-kvm:all 2:17.0.9-0ubuntu1 
2:17.0.10-0ubuntu2.1"

It fails many times after that, since any time you attempt to install a
package it attempts to configure nova.conf again and exits with an
error again, but that is the original failure. Note that various
package upgrades happened via unattended-upgrades (and possibly other
sources) in the intervening 4 months, so I guess reproducing the
situation may require reverse engineering the original package list
from the dpkg logs. I have not yet attempted to do that, in the hope
that intimate knowledge of the unattended-upgrades code and logs will
make that process faster.
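Should someone attempt that, a rough starting point for pulling the
package set out of the dpkg log for that run might be the following
(date and time window taken from the excerpt above, and it may need
widening):

# Packages upgraded in the 2019-12-28 06:25 run, with old and new versions
$ zgrep '^2019-12-28 06:2[5-9].* upgrade ' dpkg.log.6.gz | awk '{print $4, $5, "->", $6}'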

A full sosreport from the system is available if more information is
required that will include other log files, and various other command
outputs. It is not uploaded initially for privacy.

** Affects: unattended-upgrades (Ubuntu)
 Importance: Undecided
 Status: New


** Tags: sts

** Tags added: sts

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1893889

Title:
  unattended-upgrade of nova-common failure due to conffile prompt

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/unattended-upgrades/+bug/1893889/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1891269] Re: perf is not built with python script support

2020-08-11 Thread Trent Lloyd
Logs are not required for this issue

** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1891269

Title:
  perf is not built with python script support

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1891269/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1891269] [NEW] perf is not built with python script support

2020-08-11 Thread Trent Lloyd
Public bug reported:

The "perf" tool supports python scripting to process events, this
support is currently not enabled.

$ sudo perf script -g python
Python scripting not supported.  Install libpython and rebuild perf to enable 
it.
For example:
  # apt-get install python-dev (ubuntu)
  # yum install python-devel (Fedora)
  etc.

The expected behaviour is that the script creates a template python file
for you to modify to process the events.

From what I can see enabling this requires a few items
- We need to Build-Depend on python3-dev
- We would ship the perf-script-python binary
- There are various python modules (under tools/perf/scripts/python) needed for 
these to work
- There are also a number of upstream scripts (e.g. 'net_dropmonitor') we could 
ship, normally you can see those by running 'perf script -l' but we get 
"open(/usr/libexec/perf-core/scripts) failed. Check "PERF_EXEC_PATH" env to set 
scripts dir.". Expected output can be seen by running 
"PERF_EXEC_PATH=LINUX_SOURCE_PATH/tools/perf ./perf script -l"


While not important to me personally, perf also doesn't have perl
scripting support, which could be fixed in a similar way in case we
want to fix that at the same time. It doesn't have as many pre-existing
scripts, though, and seems less likely to be as useful compared to the
Python version.

$ sudo perf script -g perl
Perl scripting not supported.  Install libperl and rebuild perf to enable it.
For example:
  # apt-get install libperl-dev (ubuntu)
  # yum install 'perl(ExtUtils::Embed)' (Fedora)
  etc.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New


** Tags: seg

** Tags added: seg

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1891269

Title:
  perf is not built with python script support

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1891269/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1888047] Re: libnss-mdns slow response

2020-07-22 Thread Trent Lloyd
This output is generally quite confusing.

Can you try removing the "search www.tendawifi.com" line and see how it
differs?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1888047

Title:
  libnss-mdns slow response

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nss-mdns/+bug/1888047/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1888047] Re: libnss-mdns slow response

2020-07-22 Thread Trent Lloyd
Ideally using mdns4_minimal specifically (or I guess both, but it is
generally not recommended to use mdns4 in most cases).

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1888047

Title:
  libnss-mdns slow response

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nss-mdns/+bug/1888047/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1888047] Re: libnss-mdns slow response

2020-07-21 Thread Trent Lloyd
Can you please confirm
(1) The timing of "getent hosts indigosky.local", "host indigosky.local", 
"nslookup indigosky.local" and "nslookup indigosky.local 192.168.235.1" all 
done at the same time (mainly adding the direct lookup through the server, 
wondering if nslookup is doing something weird in focal).
(2) The timings for the same if you switch mdns4 back to mdns4_minimal (but 
remove everything else)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1888047

Title:
  libnss-mdns slow response

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nss-mdns/+bug/1888047/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 80900] Re: Avahi daemon prevents resolution of FQDNs ending in ".local" due to false negatives in the detection of ".local" networks

2020-07-19 Thread Trent Lloyd
This is fixed in Ubuntu 20.04 with nss-mdns 0.14 and later which does
proper split horizon handling.

** Changed in: avahi (Ubuntu)
   Status: Triaged => Fix Released

** Changed in: nss-mdns (Ubuntu)
   Status: Confirmed => Fix Released

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/80900

Title:
  Avahi daemon prevents resolution of FQDNs ending in ".local" due to
  false negatives in the detection of ".local" networks

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/avahi/+bug/80900/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1888047] Re: libnss-mdns slow response

2020-07-19 Thread Trent Lloyd
Rumen,

When you use 'nslookup' it should go directly to the DNS server
(127.0.0.53, which is systemd-resolved). That typically bypasses
libnss-mdns and also typically doesn't have this 5 second delay (which
avahi can have in some configurations), so it seems most likely the 5
second delay is coming from inside systemd-resolved for some reason.

The best way to test with "NSS" is to use "getent hosts DOMAIN"

Could you please confirm the output of the following commands:

lsb_release -a

dpkg -l libnss-mdns

systemctl status avahi-daemon

time getent hosts sirius.local

time nslookup sirius.local # just to verify the problem still exists at
the same time we do the above test

systemd-resolve --status --no-pager

- attach the file /etc/systemd/resolved.conf

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1888047

Title:
  libnss-mdns slow response

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nss-mdns/+bug/1888047/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1886809] Re: Pulse connect VPN exists because unwanted avahi network starts

2020-07-08 Thread Trent Lloyd
I'm not sure it makes sense to just universally skip "tun*" interfaces
(at least yet) but we may need to review the scenarios in which
/etc/network/if-up.d/avahi-autoipd is executing.

Helio: Can you provide a reproducer scenario? e.g. is this Ubuntu
server or Ubuntu desktop, what are the contents of
/etc/network/interfaces, /etc/network/interfaces.d/*, /etc/netplan/*,
and whether Network Manager is in use or not. And lastly, exactly how
Pulse VPN is installed and configured, and how that interface is
started/connected?

Additionally, you may find this issue goes away with netplan versus the
older-style interfaces files. In any case, with as much info as
possible for a reproducer, I can check your exact scenario.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1886809

Title:
  Pulse connect VPN exists because unwanted avahi network starts

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/avahi/+bug/1886809/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1871685] Re: [SRU] vagrant spits out ruby deprecation warnings on every call

2020-04-30 Thread Trent Lloyd
Hi Lucas,

Thanks for the patch updates. When I first submitted this we could have
snuck through before release without an SRU but the patch backport now
makes sense.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1871685

Title:
  [SRU] vagrant spits out ruby deprecation warnings on every call

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/vagrant/+bug/1871685/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1874021] Re: avahi-browse -av is empty, mdns not working in arduino ide

2020-04-22 Thread Trent Lloyd
OK thanks for the updates. So I can see a lot of mDNS packets in the
lp1874021.pcap capture from various sources. I can see some printers,
google cast, sonoff, etc. Curiously though when you do the avahi cache
dump it isn't seeing any of these.

Wireshark is showing malformed packets for many of the responses
strangely, the IP and UDP headers indicate a different length to that of
the actual dta. Not sure if this is an issue with wireshark, the
wireless driver or whatever mDNS implementations are replying. May need
further looking at.

It's curious, though, that avahi is showing absolutely no cached
services, and that it works on ethernet, given that lp1874021.pcap
seems to show plenty of actual mDNS packets coming and going.


Could you try starting Avahi (on wireless) using --debug:

(1) override the systemd config with this command:
systemctl edit avahi-daemon.service

Once the editor opens, add the following 3 lines, then save and quit:

[Service]
ExecStart=
ExecStart=/usr/sbin/avahi-daemon -s --debug

Then restart avahi-daemon: sudo systemctl restart avahi-daemon.service

Lastly run "avahi-browse -av", wait a minute or two, then upload a copy
of the "journalctl -u avahi-daemon" output again.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1874021

Title:
  avahi-browse -av is empty, mdns not working in arduino ide

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/avahi/+bug/1874021/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1874192] [NEW] Remove avahi .local notification support (no longer needed)

2020-04-21 Thread Trent Lloyd
Public bug reported:

As of nss-mdns 0.14 (which is now shipping in Focal 20.04) Avahi no
longer requires to be stopped when a unicast .local domain is present,
nss-mdns now has logic to make this work correctly when Avahi is running
for both multicast and unicast.

We dropped the script that performs this check from Avahi; the relevant
logic in update-notifier that shows this notification should also be
removed. It should no longer function and is now just dead code.

e.g. /usr/lib/systemd/user/unicast-local-avahi.path, etc.
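For whoever picks this up, the leftover files should be findable with
something like the following, assuming they are shipped by the
update-notifier package as the path above suggests:

# List the leftover unicast-local-avahi units shipped by update-notifier
$ dpkg -L update-notifier | grep -i unicast-local-avahi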

** Affects: update-notifier (Ubuntu)
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1874192

Title:
  Remove avahi .local notification support (no longer needed)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/update-notifier/+bug/1874192/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1874021] Re: avahi-browse -av is empty, mdns not working in arduino ide

2020-04-21 Thread Trent Lloyd
Looking at jctl.txt, things look normal: the server starts up, reaches
"server startup complete" and then adds the appropriate IP for wlan0.
The config file looks normal.

Can you please try the following to collect extra debug info

(1) Start a tcpdump and leaving it running - tcpdump --no-promiscuous-mode -w 
lp1874021.pcap -i wlp1s0 port 5353 and udp 
(2) Restart avahi; sudo systemctl restart avahi-daemon
(3) Wait 10 seconds, then try run "avahi-browse -av | tee -a 
lp1874021-browse.txt" 
(4) Wait another 10 seconds
(5) Run: sudo killall -USR1 avahi-daemon # this dumps the avahi cache into the 
journal
(6) Quit avahi-browse
(7) Quit tcpdump
(8) Please then upload the lp1874021-browse.txt (copied output from 
avahi-browse), lp1874021.pcap (raw packet capture of mdns packets) and a copy 
of the output of "journalctl -u avahi-daemon"

As an extra test, after having done the above, you can try putting the
interface in promiscuous mode and see if that fixes the problem. This
can make Avahi work with bad network (usually wifi) drivers that do not
correctly implement multicast.

(9) sudo tcpdump -w lp1874021-promisc.pcap -i wlp1s0 port 5353 and udp 
(10) sudo systemctl restart avahi-daemon
(11) avahi-browse -av
(12) If the service still hasn't shown up, also consider restarting whatever 
device is advertising the service you want to connect to, and note whether it 
then appears after doing that.
(13) If you have the option, try to then plug in either or both devices via 
ethernet instead of WiFi.

If the services do start appearing at some point be sure to note which
step you were at when that happened.

Please note that all of these files will contain information about mDNS
services on your local network; typically this information is
relatively OK to be public, since it would be broadcast if you were on
a public WiFi network, but it can include names, MAC addresses, etc. If
that is a privacy concern for you, feel free to either attempt to
sanitize the data (though that is difficult for the pcap file) or set
the bug to private, although we much prefer not to set bugs to private
if possible.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1874021

Title:
  avahi-browse -av is empty, mdns not working in arduino ide

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/avahi/+bug/1874021/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 327362] Re: Some ISPs have .local domain which disables avahi-daemon

2020-04-21 Thread Trent Lloyd
For anyone looking at this in 2020, this is fixed in nss-mdns 0.14 which
is in Ubuntu Focal 20.04 - it will now correctly pass through unicast
.local lookups.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/327362

Title:
  Some ISPs have .local domain which disables avahi-daemon

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-release-notes/+bug/327362/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1871685] Re: vagrant spits out ruby deprecation warnings on every call

2020-04-19 Thread Trent Lloyd
** Patch added: "full merge debdiff from old ubuntu version to new ubuntu 
version"
   
https://bugs.launchpad.net/ubuntu/+source/vagrant/+bug/1871685/+attachment/5356998/+files/lp1871685_complete-merge_2.2.6+dfsg-2ubuntu1_2.2.7+dfsg-1ubuntu1.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1871685

Title:
  vagrant spits out ruby deprecation warnings on every call

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/vagrant/+bug/1871685/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1871685] Re: vagrant spits out ruby deprecation warnings on every call

2020-04-19 Thread Trent Lloyd
** Patch added: "partial merge debdiff showing only the delta to current debian 
version"
   
https://bugs.launchpad.net/ubuntu/+source/vagrant/+bug/1871685/+attachment/5356999/+files/lp1871685_merge-only_2.2.7+dfsg-1_2.2.7+dfsg-1ubuntu1.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1871685

Title:
  vagrant spits out ruby deprecation warnings on every call

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/vagrant/+bug/1871685/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1871685] Re: vagrant spits out ruby deprecation warnings on every call

2020-04-19 Thread Trent Lloyd
Please sponsor this upload of a merge of Vagrant 2.2.7+dfsg-1 from
Debian. It is a minor upstream version bump (2.2.6 -> 2.2.7) plus
contains new patches from Debian to fix multiple Ruby 2.7 deprecation
warnings on every command invocation.

Two debdiffs attached:
partial merge debdiff showing only the delta to current debian version 
(lp1871685_merge-only_2.2.7+dfsg-1_2.2.7+dfsg-1ubuntu1.debdiff)
full merge debdiff from old ubuntu version to new ubuntu version 
(lp1871685_complete-merge_2.2.6+dfsg-2ubuntu1_2.2.7+dfsg-1ubuntu1.debdiff)

This is a direct merge of the previous merge, whose only change is to
disable the autopkgtest as it has long been known to be flaky on Ubuntu
infrastructure.

It would be ideal to get this merge through ahead of Focal release to
continue having no delta to Debian upstream. This package is in
universe.

** Changed in: vagrant (Ubuntu)
   Status: New => Confirmed

** Changed in: vagrant (Ubuntu)
   Importance: Undecided => Low

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1871685

Title:
  vagrant spits out ruby deprecation warnings on every call

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/vagrant/+bug/1871685/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
