Hello Jason, thanks for your response. See my inline comments.

On 31.07.19 at 14:43, Jason Dillaman wrote:
> On Wed, Jul 31, 2019 at 6:20 AM Marc Schöchlin <m...@256bit.org> wrote:
>>
>> The problem does not seem to be related to kernel releases, filesystem types or
>> the ceph and network setup.
>> Release 12.2.5 seems to work properly, and at least releases >= 12.2.10 seem
>> to have the described problem.
> ...
>
> It's basically just a log message tweak and some changes to how the
> process is daemonized. If you could re-test w/ each release after
> 12.2.5 and pin-point where the issue starts occurring, we would have
> something more to investigate.

Are there changes related to https://tracker.ceph.com/issues/23891?

You showed me the very small number of changes in rbd-nbd. What about librbd, librados, ...?

What else can we do to find a detailed reason for the crash?
Do you think it is useful to activate coredump creation for that process?

>> What's next? Is it a good idea to do a binary search between 12.2.12 and
>> 12.2.5?
>>

Due to the absence of a coworker I had almost no capacity to run deeper tests on this problem.
But I can say that I also reproduced the problem with release 12.2.12.
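
To make the binary search idea concrete, here is a rough sketch of how the releases between 12.2.5 (known good) and 12.2.12 (known bad) could be bisected. It assumes that every release after the first bad one is also bad. The function `first_bad_release` and the callback `simulated_test` are hypothetical names, and the regression point inside `simulated_test` is invented purely so the example runs; in practice the callback would be the real multi-hour rbd-nbd test run.

```python
from typing import Callable, Sequence


def first_bad_release(releases: Sequence[str],
                      is_bad: Callable[[str], bool]) -> str:
    """Return the earliest release for which is_bad() returns True.

    Assumes releases[0] is known good, releases[-1] is known bad, and
    that all releases after the first bad one are bad as well.
    """
    lo, hi = 0, len(releases) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(releases[mid]):
            hi = mid  # failure reproduced, look at earlier releases
        else:
            lo = mid  # test passed, look at later releases
    return releases[hi]


# 12.2.5 .. 12.2.12, the range discussed in this thread
releases = [f"12.2.{n}" for n in range(5, 13)]


def simulated_test(release: str) -> bool:
    # Stand-in for the real test run. The threshold 8 is hypothetical
    # and NOT a finding from this thread.
    return int(release.rsplit(".", 1)[1]) >= 8


print(first_bad_release(releases, simulated_test))  # -> 12.2.8
```

With eight releases in the range this needs three test runs instead of up to six for testing each intermediate release in order.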

The new (updated) list:

- SUCCESSFUL: kernel 4.15, ceph 12.2.5, 1TB ec-volume, ext4 file system, 120s device timeout
  -> 18 hour testrun was successful, no dmesg output
- FAILED: kernel 4.4, ceph 12.2.11, 2TB ec-volume, xfs file system, 120s device timeout
  -> failed after < 1 hour, rbd-nbd map/device is gone, mount throws io errors, map/mount can be re-created without reboot
  -> parallel krbd device usage with 99% io usage worked without a problem while running the test
- FAILED: kernel 4.15, ceph 12.2.11, 2TB ec-volume, xfs file system, 120s device timeout
  -> failed after < 1 hour, rbd-nbd map/device is gone, mount throws io errors, map/mount can be re-created
  -> parallel krbd device usage with 99% io usage worked without a problem while running the test
- FAILED: kernel 4.4, ceph 12.2.11, 2TB ec-volume, xfs file system, no timeout
  -> failed after < 10 minutes
  -> system runs in a high system load, system is almost unusable, unable to shutdown the system, hard reset of vm necessary, manual exclusive lock removal is necessary before remapping the device
- FAILED: kernel 4.4, ceph 12.2.11, 2TB 3-replica-volume, xfs file system, 120s device timeout
  -> failed after < 1 hour, rbd-nbd map/device is gone, mount throws io errors, map/mount can be re-created
- *FAILED: kernel 5.0, ceph 12.2.12, 2TB ec-volume, ext4 file system, 120s device timeout*
  *-> failed after < 1 hour, rbd-nbd map/device is gone, mount throws io errors, map/mount can be re-created*

Regards
Marc