[PATCH v2 1/1] dm-delay: fix hung task introduced by kthread mode

2024-05-06 Thread Joel Colledge
triggering by creating the thread with kthread_run() instead of using kthread_create() directly. Fixes: 70bbeb29fab0 ("dm delay: for short delays, use kthread instead of timers and wq") Signed-off-by: Joel Colledge --- drivers/md/dm-delay.c | 3 +-- 1 file changed, 1 insertion(+), 2

[PATCH v2 0/1] dm-delay: fix hung task issue

2024-05-06 Thread Joel Colledge
ng patch fixes the issue. Thanks Christian and Benjamin for the comments on v1! Changes from v1: - Use kthread_run() instead of wake_up_process() Joel Colledge (1): dm-delay: fix hung task introduced by kthread mode drivers/md/dm-delay.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) -- 2.34.1

Re: [PATCH 1/1] dm-delay: fix hung task introduced by kthread mode

2024-04-30 Thread Joel Colledge
On Tue, 30 Apr 2024 at 17:27, Christian Loehle wrote: > On 30/04/2024 15:44, Joel Colledge wrote: > > On Tue, 30 Apr 2024 at 16:28, Christian Loehle > > wrote: > >> Is this an issue for delay > 0 too somehow? > > > > I believe it is. If th

Re: [PATCH 1/1] dm-delay: fix hung task introduced by kthread mode

2024-04-30 Thread Joel Colledge
On Tue, 30 Apr 2024 at 16:28, Christian Loehle wrote: > Is this an issue for delay > 0 too somehow? I believe it is. If there is simply no IO to the delay device, then nothing will wake the new thread and the same issue will occur. I haven't yet reproduced this case, because the system I am

[PATCH 1/1] dm-delay: fix hung task introduced by kthread mode

2024-04-26 Thread Joel Colledge
the newly minted worker in delay_ctr(). Fixes: 70bbeb29fab0 ("dm delay: for short delays, use kthread instead of timers and wq") Signed-off-by: Joel Colledge --- drivers/md/dm-delay.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-del

[PATCH 0/1] dm-delay: fix hung task issue

2024-04-26 Thread Joel Colledge
wing patch fixes the issue. Best regards, Joel Joel Colledge (1): dm-delay: fix hung task introduced by kthread mode drivers/md/dm-delay.c | 1 + 1 file changed, 1 insertion(+) -- 2.34.1

Re: Kernel Panic with 9.2.8

2024-04-12 Thread Joel Colledge
Hallo Aleksandr, Thanks for the report. We are looking into this issue and have an idea of what may be causing it. The issue appears to be related to Kubernetes. In particular, to some unusual actions taken by the CSI driver. There is a workaround available in the CSI driver that avoids these

Re: Usynced blocks if replication is interrupted during initial sync

2024-03-20 Thread Joel Colledge
> We are still seeing the issue as described but perhaps I am not putting the > invalidate > at the right spot > > Note - I've added it at step 6 below, but I'm wondering if it should be after > the additional node is configured and adjusted (in which case I would need to > unmount as apparently

Re: Usynced blocks if replication is interrupted during initial sync

2024-03-19 Thread Joel Colledge
Hi Tim, Thanks for the report and your previous one with the subject line "Blocks fail to sync if connection lost during initial sync". This does look like a bug. I can reproduce the issue. It appears to be a regression introduced in drbd-9.1.16 / drbd-9.2.5. Specifically, with the following

Re: [DRBD-user] Resources outdated but Current UUIDs match and Bitmap UUIDs are clean

2023-06-20 Thread Joel Colledge
Hi Andrei, Which DRBD version is running here? Which version was running previously, where this issue was not observed? Best regards, Joel

Re: [DRBD-user] drbd9.2 resync stuck with drbd_set_in_sync: sector=<...>s size=<...> nonsense!

2022-10-25 Thread Joel Colledge
Dear Nils, > The third resource however did sync about 65% of the outdated data and > then stalled (no more sync traffic, no progress in drbdmon) > > The kernel message that seems to be relevant here is this: > > drbd vm-101-disk-1/0 drbd1001: drbd_set_in_sync: sector=73703424s > size=134479872

Re: [DRBD-user] corrupted resource can't be fixed be rolling back to old snapshot

2022-08-02 Thread Joel Colledge
Hi Michael, Are you using the most recent version of drbd-utils? There have been a few fixes over the years which might be related. Perhaps the hardware problems affected the metadata long ago and now the corrupted metadata is present in all the snapshots. If that is not the case, this looks to

[Bug 1891169] Re: fio crashes when using --offset with a device

2021-09-23 Thread Joel Colledge
I have also observed strange behavior with the fio 3.16 package on Focal. It occurs with the combination of "size" and "offset", and is affected by the presence of "time_based". For example with this command: fio --name=test --filename=/dev/loop9 --ioengine=libaio --rw=write --direct=1

Re: [DRBD-user] Adding a second volume to a resource and making it UpToDate

2021-09-01 Thread Joel Colledge
> Thank you for your reply, Joel! I'm a little confused. If I understand > you correctly, what I should have done is: > > pcs property set maintenance-mode=true > pcs cluster standby storage1 > > How would the combination of putting Pacemaker in maintenance mode and > then trying to standby a

Re: [DRBD-user] Adding a second volume to a resource and making it UpToDate

2021-08-31 Thread Joel Colledge
Hi Bryan, > In the very last post in this thread there is this: > > "DRBD requires some intervention to enable the volume. The > simplest method to get the new volume working would be to demote > the resource to the "Secondary" role and then promote it to the > "Primary" role again, using drbdadm

Re: [DRBD-user] drbc 9.1.1 whole cluster blocked

2021-05-27 Thread Joel Colledge
> No ko-count set, so apparently something different... ko-count is enabled by default (with value "7"). Have you explicitly disabled it? Your description does sound very similar to the issue that has been fixed as Rene mentioned. Regards, Joel

Re: [DRBD-user] 300% latency difference between protocol A and B with NVMe

2020-11-24 Thread Joel Colledge
Hi Wido, These results are not too surprising. Consider the steps involved in a protocol C write. Note that tcp_lat is one way latency, so we get: Send data to peer: 13.3 us (perhaps more, if qperf was testing with a size less than 4K) Write on peer: 1s / 32200 == 31.1 us Confirmation of write
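The arithmetic in this breakdown can be checked with a short Python sketch. The variable names are hypothetical, and only the two steps whose values survive in the excerpt are included; the confirmation step is cut off in the excerpt and is deliberately left out rather than guessed.

```python
# Values taken from the excerpt above; the names are illustrative only.
TCP_LAT_US = 13.3          # one-way latency reported by qperf (tcp_lat)
PEER_WRITE_IOPS = 32200    # write rate on the peer

# Time for one write on the peer, in microseconds (1 s / 32200).
peer_write_us = 1_000_000 / PEER_WRITE_IOPS

# Sum of the two steps quoted above (send + write on peer). The
# confirmation step is truncated in the excerpt, so it is not added.
partial_us = TCP_LAT_US + peer_write_us

print(f"write on peer: {peer_write_us:.1f} us")
print(f"send + write:  {partial_us:.1f} us")
```

Any further step would simply be added to `partial_us` in the same way.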

Re: Network update disrupts network usage

2020-09-16 Thread Joel Colledge
> The networkUpdate() method in libvirt source will recreate firewall > rules if any DHCP hosts change. This is because the firewall rules > differ when there is zero vs non-zero number of DHCP hosts present. > > This could be optimized to only recreate when going from zero to > non-zero or

Network update disrupts network usage

2020-09-09 Thread Joel Colledge
Dear libvirt users, I am encountering problems with network connections from VMs while running net-update on the host. I would be very grateful for suggestions of fixes or workarounds. I am using libvirt in the context of an automated test system which creates and destroys VMs fairly rapidly,

Re: Public "generic" cloud images for stretch

2020-01-21 Thread Joel Colledge
Hi Thomas, > You can still use the OpenStack image, built with > openstack-debian-images, which kind of match the generic image: > > http://cdimage.debian.org/cdimage/openstack/ Thanks for the suggestion. I had already tried that image, but it had failed to boot. It seems that the OpenStack

Public "generic" cloud images for stretch

2020-01-20 Thread Joel Colledge
Hi, I am experimenting with cloud-init enabled images for use in local development and test infrastructure. For buster I found the "generic" image, which is very useful: https://cloud.debian.org/images/cloud/buster/daily/20200119-143/debian-10-generic-amd64-daily-20200119-143.qcow2 However, I

Re: [DRBD-user] Upgrade DRBD 9.0.19-1 to 9.0.20-1 on Debian 9

2019-10-31 Thread Joel Colledge
Hi Anthony, On Thu, Oct 31, 2019 at 7:18 AM Anthony Frnog wrote: > When I upgrade DRBD from 9.0.19-1 to 9.0.20-1 on Debian 9, the DRBD cluster > seems "break" Yes, we are aware of this issue and have prepared a solution to it internally. It only affects kernels in the 4.8 and 4.9 series.

[PATCH v3] scripts/gdb: fix lx-dmesg when CONFIG_PRINTK_CALLER is set

2019-10-11 Thread Joel Colledge
-dmesg Python Exception embedded null character: Error occurred in Python command: embedded null character The read_u* utility functions now take an offset argument to make them easier to use. Signed-off-by: Joel Colledge --- Changes in v3: - fix some overlong lines and generally make the code

Re: [PATCH v2] scripts/gdb: fix lx-dmesg when CONFIG_PRINTK_CALLER is set

2019-10-11 Thread Joel Colledge
On Fri, Oct 11, 2019 at 2:47 PM Leonard Crestez wrote: > This struct printk_log is quite small, I wonder if it's possible to do a > single read for each log entry? This might make lx-dmesg faster because > of fewer roundtrips to gdbserver and jtag (or whatever backend you're > using). I think

Re: [PATCH v2] scripts/gdb: fix lx-dmesg when CONFIG_PRINTK_CALLER is set

2019-10-11 Thread Joel Colledge
On Fri, Oct 11, 2019 at 2:38 PM Jan Kiszka wrote: > Does bitpos really use a non-int type? Otherwise, plain '/' suffices. bitpos uses gdb.Field. When I use '/' I get an error: Error occurred in Python command: slice indices must be integers or None or have an __index__ method I'm guessing
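The quoted error is what Python 3 raises whenever a slice index is a float. A minimal sketch, assuming Python 3 and using a made-up bit offset in place of a real `bitpos` value from gdb, illustrates why `//` avoids it:

```python
# Hypothetical stand-in for a field's bit offset; not read from gdb.
bitpos = 128
buf = bytes(range(32))

# In Python 3, '/' always produces a float, even for two ints.
float_offset = bitpos / 8
try:
    buf[float_offset:]
    error = None
except TypeError as exc:
    error = str(exc)

# '//' keeps the result an int, so it is usable as a slice index.
int_offset = bitpos // 8
tail = buf[int_offset:]
```

This only demonstrates the general Python behaviour; it does not show how the patch itself computes offsets.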

[PATCH v2] scripts/gdb: fix lx-dmesg when CONFIG_PRINTK_CALLER is set

2019-10-11 Thread Joel Colledge
-dmesg Python Exception embedded null character: Error occurred in Python command: embedded null character Signed-off-by: Joel Colledge --- Changes in v2: - use type information from gdb instead of hardcoded offsets Thanks for the idea about using the struct layout info from gdb, Leonard. I

Re: [PATCH] scripts/gdb: fix lx-dmesg when CONFIG_PRINTK_CALLER is set

2019-10-10 Thread Joel Colledge
Hi Jan and Kieran, maintainers of scripts/gdb/, CC: Leonard, most recent contributor to scripts/gdb/linux/dmesg.py Could someone look at this fix please? Is there anything I should improve in the code or the format of the contribution? Thanks.

[PATCH] scripts/gdb: fix lx-dmesg when CONFIG_PRINTK_CALLER is set

2019-09-25 Thread Joel Colledge
character Signed-off-by: Joel Colledge --- scripts/gdb/linux/constants.py.in | 1 + scripts/gdb/linux/dmesg.py | 4 +++- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/scripts/gdb/linux/constants.py.in b/scripts/gdb/linux/constants.py.in index 2efbec6b6b8d..3c9794a0bf55 100644

[DRBD-user] drbd-10.0.0a1

2019-08-05 Thread Joel Colledge
Hi, The first drbd-10.0 alpha release is out. We are working on some major new features in DRBD, so we have created a drbd-9.0 stable branch and upcoming releases from master will belong to the drbd-10.0 series. The main changes for drbd-10.0 so far are: * Reduced lock contention. The

Re: [DRBD-user] drbd-9.0.18-1 : BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0

2019-06-12 Thread Joel Colledge
x > > Regards, > Rob > > > On 6/12/19 9:50 AM, Joel Colledge wrote: > > Hi Rob, > > This is strange, since the filesystem DAX access uses essentially the same > checks as DRBD does. You can get more detail about the failure by doing the > mount test again after en

Re: [DRBD-user] drbd-9.0.18-1 : BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0

2019-06-12 Thread Joel Colledge
g > off DAX. > [518270.835599] EXT4-fs (dm-16): mounted filesystem with ordered data > mode. Opts: dax > > Regards, > Rob > > > On 6/11/19 3:57 PM, Joel Colledge wrote: > > Hi Rob, > > This is strange. It seems that your LV is reporting that it supports > PM

Re: [DRBD-user] drbd-9.0.18-1 : BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0

2019-06-11 Thread Joel Colledge
Hi Rob, This is strange. It seems that your LV is reporting that it supports PMEM/DAX. I suggest that you check that this issue also occurs without DRBD. For example, create a filesystem and try to mount it with DAX enabled: mkfs.ext4 -b 4K /dev/ mount -o dax /dev/ Then check the syslog to see

Re: [DRBD-user] drbd-9.0.18-1 : BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0

2019-06-11 Thread Joel Colledge
m? What backing device are you using for DRBD? In case you have external metadata - what backing device are you using for the DRBD metadata? Is this a PMEM device? Just before the crash, you should see "meta-data IO uses: ..." in your kernel log. Please provide this log line. Joel Colle

[Wireshark-dev] Wiki access for DRBD documentation

2019-03-12 Thread Joel Colledge
Hi, I would like to add a DRBD pcap to the wiki as requested in https://code.wireshark.org/review/#/c/32332/ Could someone give my wiki user JoelColledge the appropriate permissions? Thanks, Joel

Re: [DRBD-user] LINSTOR snapshots problem

2018-09-21 Thread Joel Colledge
es volume definitions matching those at the time of the snapshot. Best regards, -- Joel Colledge