More notekeeping: I added
 
  udev_dbg(udev_monitor->udev, "udev_monitor_receive_device: start\n");

at the very beginning of udev_monitor_receive_device(). In theory, every
"start" should be matched either by a "success" at the end or by an
"unable to receive message: Resource temporarily unavailable". In a
successful test run this holds, but in a failed run I get 14 "start"s
and 3 "Resource temporarily unavailable", which leaves exactly 11
calls that ought to succeed. But I only get 9 successes, which matches
the "actual: 9 expected: 11" failure.
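
For reference, the bookkeeping above can be automated over a captured debug
log. This is only a rough sketch: the marker strings ": start" and
": success" are my assumption about how the added udev_dbg() lines appear
in the log, and sketch_count() / sketch_unaccounted() are made-up helper
names, not part of libudev.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of needle in text. */
size_t sketch_count(const char *text, const char *needle)
{
    size_t n = 0;
    size_t len;
    const char *p;

    assert(text != NULL && needle != NULL);
    len = strlen(needle);
    assert(len > 0);

    for (p = strstr(text, needle); p != NULL; p = strstr(p + len, needle))
        n++;
    return n;
}

/* Number of udev_monitor_receive_device() calls that logged a start but
 * neither of the two expected outcomes. The marker strings are assumed. */
long sketch_unaccounted(const char *log)
{
    long starts = (long)sketch_count(log, ": start");
    long eagain = (long)sketch_count(log, "Resource temporarily unavailable");
    long ok = (long)sketch_count(log, ": success");

    return starts - eagain - ok;
}
```

With the numbers from the failed run (14 starts, 3 EAGAIN, 9 successes)
this would come out as 2, i.e. the two swallowed events.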

Indeed I see two blocks

libudev: udev_monitor_receive_device: udev_monitor_receive_device: start
libudev: udev_device_new_from_syspath: device 0x7f2394001be0 has devpath '/devices/card2'
libudev: udev_device_new_from_syspath: device 0x7f2394001be0 has devpath '/devices/card2'

without either a "success" or an "unable to receive message", which means
that some exit path of udev_monitor_receive_device() silently swallows
the event.

I now added udev_dbg()s to all exit paths which didn't yet have one.
This revealed that the event was correctly read from the netlink socket,
but then discarded here:

        /* skip device, if it does not pass the current filter */
        if (!passes_filter(udev_monitor, udev_device)) {
                struct pollfd pfd[1];
                int rc;

                udev_device_unref(udev_device);

src/platform/udev_wrapper.cpp wraps
udev_monitor_filter_add_match_subsystem_devtype(), and apparently this
test applies the following filter:

src/platform/graphics/mesa/display.cpp:
monitor.filter_by_subsystem_and_type("drm", "drm_minor");

Mir does not use tag based filtering.

Adding further udev_dbg() calls to passes_filter() reveals that for the
received uevent device udev_device_get_devtype() returns NULL. That is
the reason for discarding it, as the monitor filter only watches for
devtype "drm_minor".
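
To make the discard condition concrete, here is a simplified sketch of a
subsystem/devtype match. It is not the libudev source: struct filter_entry,
struct fake_device and sketch_passes_filter() are hypothetical stand-ins,
and tag matching is left out since Mir does not use it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for libudev's internal types. */
struct filter_entry {
    const char *subsystem;  /* required subsystem, e.g. "drm" */
    const char *devtype;    /* required devtype, or NULL to accept any */
};

struct fake_device {
    const char *subsystem;  /* what udev_device_get_subsystem() would give */
    const char *devtype;    /* what udev_device_get_devtype() would give */
};

/* Return 1 if the device matches at least one filter entry, 0 otherwise.
 * An empty filter list accepts every device. */
int sketch_passes_filter(const struct filter_entry *filters, size_t n_filters,
                         const struct fake_device *dev)
{
    size_t i;

    assert(dev != NULL);
    assert(filters != NULL || n_filters == 0);

    if (n_filters == 0)
        return 1;

    for (i = 0; i < n_filters; i++) {
        if (dev->subsystem == NULL ||
            strcmp(dev->subsystem, filters[i].subsystem) != 0)
            continue;
        if (filters[i].devtype == NULL)
            return 1;   /* subsystem-only filter */
        if (dev->devtype == NULL)
            continue;   /* the case seen here: no DEVTYPE, so no match */
        if (strcmp(dev->devtype, filters[i].devtype) == 0)
            return 1;
    }
    return 0;
}
```

With a single ("drm", "drm_minor") entry, a "drm" device whose devtype is
NULL falls through to the final return 0, which is the path that drops the
event.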

My current suspicion is that this is a race on the fake /sys "uevent"
file for the device -- it gets read before it has been completely
written, or rather synced to disk. In that case the "DEVTYPE=" property
would be missing. This is the kind of thing that gets aggravated under
high system load. Also, I ran the test case with just this patch:

                 // sleeping between calls to fake_devices hides race conditions
-                std::this_thread::sleep_for(std::chrono::microseconds{500});
+                //std::this_thread::sleep_for(std::chrono::microseconds{500});
+                sync();

... and it has now run through 500 iterations without failure. This
doesn't prove that this is the bug, as the sync() slows the iterations
down quite a bit (each test now takes ~ 200 ms), but it's currently the
only plausible explanation that I have.
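
To illustrate the suspected effect (this is a sketch, not the code libudev
or umockdev actually use): a reader that looks up "DEVTYPE" in uevent-style
"KEY=value" text finds nothing if it only sees the first part of the file.
sketch_uevent_lookup() is a made-up helper, and the sample contents in the
note below are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the value of key from "KEY=value\n" lines in text into out.
 * Returns 1 if the key was found, 0 otherwise. Hypothetical helper. */
int sketch_uevent_lookup(const char *text, const char *key,
                         char *out, size_t out_size)
{
    size_t key_len;
    const char *line;

    assert(text != NULL && key != NULL && out != NULL && out_size > 0);
    key_len = strlen(key);

    for (line = text; *line != '\0'; ) {
        const char *eol = strchr(line, '\n');
        size_t line_len = eol ? (size_t)(eol - line) : strlen(line);

        if (line_len > key_len &&
            strncmp(line, key, key_len) == 0 && line[key_len] == '=') {
            size_t val_len = line_len - key_len - 1;

            if (val_len >= out_size)
                val_len = out_size - 1;   /* truncate to fit the buffer */
            memcpy(out, line + key_len + 1, val_len);
            out[val_len] = '\0';
            return 1;
        }
        line += line_len;
        if (*line == '\n')
            line++;                       /* skip the newline itself */
    }
    return 0;
}
```

Fed a complete file such as "MAJOR=226\nMINOR=2\nDEVTYPE=drm_minor\n" the
lookup finds "drm_minor"; fed only "MAJOR=226\nMINOR=2\n" it returns 0,
which would correspond to the NULL devtype seen in passes_filter().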

I'll create a similar test for that in umockdev's test suite, which will
make this slightly easier to debug and ensure it stays fixed.

** Also affects: umockdev (Ubuntu)
   Importance: Undecided
       Status: New

** Summary changed:

- Intermittent mir_unit_tests.MesaDisplayTest.drm_device_change_event_triggers_handler test failure
+ Intermittent mir_unit_tests.MesaDisplayTest.drm_device_change_event_triggers_handler test failure: device DEVTYPE is sometimes NULL

-- 
You received this bug notification because you are a member of Desktop
Packages, which is subscribed to umockdev in Ubuntu.
https://bugs.launchpad.net/bugs/1336671

Title:
  Intermittent
  mir_unit_tests.MesaDisplayTest.drm_device_change_event_triggers_handler
  test failure: device DEVTYPE is sometimes NULL

Status in Mir:
  In Progress
Status in “umockdev” package in Ubuntu:
  New

Bug description:
  As seen in:
  http://s-jenkins.ubuntu-ci:8080/job/mir-clang-utopic-amd64-build/799/console

  To reproduce locally:

  bzr branch lp:mir/devel mir-devel && cd mir-devel
  mkdir build && cd build && cmake .. && make -j4
  umockdev-wrapper bin/mir_unit_tests \
    --gtest_filter=MesaDisplayTest.drm_device_change_event_triggers_handler \
    --gtest_repeat=-1 --gtest_break_on_failure

  (Ignore the segfault when the test fails, it's a side effect of
  --gtest_break_on_failure)

  Running with strace or on a system with high load increases the chances
  that we hit the problem. For example, running make -j4 in another mir
  branch while running the tests does the trick for mir.
