More notekeeping: I added

    udev_dbg(udev_monitor->udev, "udev_monitor_receive_device: start\n");

at the very beginning of udev_monitor_receive_device(). In theory, every "start" should be matched either by a "success" at the end or by an "unable to receive message: Resource temporarily unavailable". In a successful test run this is true. In a failed run I get 14 "start"s and 3 "Resource temporarily unavailable", which leaves exactly 11 "start"s that ought to succeed. But I only get 9 successes, matching the "actual: 9 expected: 11" failure.

Indeed I see two blocks

    libudev: udev_monitor_receive_device: udev_monitor_receive_device: start
    libudev: udev_device_new_from_syspath: device 0x7f2394001be0 has devpath '/devices/card2'
    libudev: udev_device_new_from_syspath: device 0x7f2394001be0 has devpath '/devices/card2'

without either a success or an "unable to receive message", which means that somewhere there is an exit path of udev_monitor_receive_device() which eats the event.

I then added udev_dbg()s to all exit paths which didn't yet have one. This revealed that the event was correctly read from the netlink socket, but then discarded here:

        /* skip device, if it does not pass the current filter */
        if (!passes_filter(udev_monitor, udev_device)) {
                struct pollfd pfd[1];
                int rc;

                udev_device_unref(udev_device);

src/platform/udev_wrapper.cpp has wrappers for udev_monitor_filter_add_match_subsystem_devtype(), and apparently this is what gets applied for this test:

    src/platform/graphics/mesa/display.cpp:    monitor.filter_by_subsystem_and_type("drm", "drm_minor");

Mir does not use tag based filtering.

Adding further udev_dbg()s to passes_filter() reveals that for the received uevent device, udev_device_get_devtype() == NULL. That is the reason for discarding it, as the monitor filter only watches for devtype "drm_minor".

My current suspicion is that this is a race on the fake /sys "uevent" file for the device: it gets read before it has been completely written, or rather synced to disk. Thus the "DEVTYPE=" property would be missing.
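To make the suspected failure mode concrete, here is a minimal sketch in plain C. It is not the libudev code: uevent_get() and passes_devtype_filter() are hypothetical helpers invented for illustration, and the second one only mimics the devtype part of passes_filter(), on the assumption that a device without a DEVTYPE property can never match a devtype filter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libudev: look up KEY=VALUE in
 * uevent-style text ("KEY=VALUE\n" per line) and copy VALUE into out.
 * Returns false if no complete "KEY=" line is present. */
bool uevent_get(const char *uevent, const char *key,
                char *out, size_t out_len)
{
    assert(uevent != NULL && key != NULL && out != NULL && out_len > 0);

    size_t key_len = strlen(key);
    const char *line = uevent;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t line_len = eol ? (size_t)(eol - line) : strlen(line);

        /* The line must be longer than the key and have '=' right after it. */
        if (line_len > key_len && line[key_len] == '=' &&
            strncmp(line, key, key_len) == 0) {
            size_t val_len = line_len - key_len - 1;
            if (val_len >= out_len)
                val_len = out_len - 1;   /* truncate to fit the buffer */
            memcpy(out, line + key_len + 1, val_len);
            out[val_len] = '\0';
            return true;
        }

        line += line_len;
        if (*line == '\n')
            line++;
    }
    return false;
}

/* Simplified stand-in for the devtype check (assumption, see above):
 * a missing DEVTYPE plays the role of udev_device_get_devtype() == NULL. */
bool passes_devtype_filter(const char *uevent, const char *want_devtype)
{
    char devtype[64];

    if (!uevent_get(uevent, "DEVTYPE", devtype, sizeof devtype))
        return false;
    return strcmp(devtype, want_devtype) == 0;
}
```

With the complete file content, passes_devtype_filter(content, "drm_minor") returns true. If the reader only sees the content up to some point before the end of the "DEVTYPE=drm_minor" line, it returns false, and an event that was received correctly would be dropped silently, which is the pattern in the logs.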
A race like this is the kind of thing that would be aggravated by high system load. Also, I ran the test case with just this patch:

       // sleeping between calls to fake_devices hides race conditions
    -  std::this_thread::sleep_for(std::chrono::microseconds{500});
    +  //std::this_thread::sleep_for(std::chrono::microseconds{500});
    +  sync();

... and it is now through 500 iterations without failure. This doesn't prove that this is the bug, as the sync() delays the iterations quite a bit (each test now takes ~ 200 ms), but it's currently the only plausible explanation that I have.

I'll create a similar test for that in umockdev's test suite, which will make this slightly easier to debug and ensure it stays fixed.

** Also affects: umockdev (Ubuntu)
   Importance: Undecided
       Status: New

** Summary changed:

- Intermittent mir_unit_tests.MesaDisplayTest.drm_device_change_event_triggers_handler test failure
+ Intermittent mir_unit_tests.MesaDisplayTest.drm_device_change_event_triggers_handler test failure: device DEVTYPE is sometimes NULL

-- 
https://bugs.launchpad.net/bugs/1336671

Title:
  Intermittent mir_unit_tests.MesaDisplayTest.drm_device_change_event_triggers_handler test failure: device DEVTYPE is sometimes NULL

Status in Mir:
  In Progress
Status in “umockdev” package in Ubuntu:
  New

Bug description:
  As seen in:
  http://s-jenkins.ubuntu-ci:8080/job/mir-clang-utopic-amd64-build/799/console

  To reproduce locally:

  bzr branch lp:mir/devel mir-devel && cd mir-devel
  mkdir build && cd build && cmake .. && make -j4
  umockdev-wrapper bin/mir_unit_tests --gtest_filter=MesaDisplayTest.drm_device_change_event_triggers_handler --gtest_repeat=-1 --gtest_break_on_failure

  (Ignore the segfault when the test fails, it's a side effect of --gtest_break_on_failure)

  Running with strace or on a system with high load increases the chances that we hit the problem.
  For example, running make -j4 in another mir branch while running the tests does the trick for mir.