Hi,
This regression is fairly complicated and it's high impact, as mptsas is
being used to drive fairly popular controllers, including the
entry-level ones in several generations of Dell PowerEdge servers.
We've been debugging this for a while now over at Ubuntu's Launchpad[1],
and the issue has subsequently been raised on both the linux-scsi[2] and
systemd[3] mailing lists.
In essence, there are four different behaviors/bugs here:
1) The kthread_create() semantics changed in 3.13 with commit 786235ee,
which made kthread_create() killable. Not a bug on its own, but it's a
"breaks previously working userspace configuration" kind of regression.
Ubuntu has reverted this patch for trusty as a workaround.
2) mptsas, to probe the SAS bus, spawns a kthread that takes more than
30s to complete. The consensus on the list AIUI is that it's a bug and
it should not take that long.
3) systemd-udev by default sends SIGKILL to kthreads that have been
running for more than 30s. systemd developers do not consider this a bug
but an intended behavior and refuse to fix this issue. Adding
OPTIONS+="event_timeout=120" to the udev config would probably work
around this.
4) Unrelated to the bug at hand, mptsas is buggy in the error handling
codepath, when the kthread spawning fails. It tries to clean up by
dereferencing a NULL pointer and hence the kernel oopses, while
otherwise it'd just continue running, just without any mptsas devices
present. I've written up an analysis of the buggy codepath in comment #27
on the LP bug[1]. This has always been a bug; it's just that that
codepath was untested until now.
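
To make the interaction between (2) and (3) concrete, here is a minimal
userspace sketch of a supervisor that SIGKILLs a worker once it has run
past a timeout. It is only an analogy: run_worker_with_timeout() and
sleep_ms() are hypothetical names, the sleeping child stands in for the
slow bus probe, and none of this is udev or kernel code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Sleep for ms milliseconds, resuming if a signal interrupts the sleep. */
static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ; /* retry with the remaining time */
}

/*
 * Hypothetical supervisor: fork a worker that needs work_ms to finish and
 * SIGKILL it if it is still running after roughly timeout_ms.
 * Returns 1 if the worker was killed, 0 if it finished, -1 on error.
 */
int run_worker_with_timeout(long work_ms, long timeout_ms)
{
    const long step_ms = 10; /* polling interval */
    long waited_ms = 0;
    int status;
    pid_t pid;

    assert(work_ms >= 0 && timeout_ms >= 0);

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        sleep_ms(work_ms); /* stands in for the slow probe */
        _exit(0);
    }

    /* Poll until the worker exits or the timeout elapses. */
    while (waited_ms < timeout_ms) {
        pid_t r = waitpid(pid, &status, WNOHANG);

        if (r == pid)
            return WIFEXITED(status) ? 0 : -1;
        if (r < 0)
            return -1;
        sleep_ms(step_ms);
        waited_ms += step_ms;
    }

    /* Timeout reached: kill the worker unconditionally and reap it. */
    kill(pid, SIGKILL);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL) ? 1 : 0;
}
```

With a generous timeout the worker completes; with a timeout shorter than
the work, it is killed partway through, which is the situation the
killable kthread_create() in (1) then has to cope with.
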
The end result is that this regression is somewhere in the limbo land
between kernel/systemd: the two features (1)/(3) are valid on their own
but reveal a regression in combination with (2) and each other.
Issue (2) seems like a real bug and the root cause here, but one that
probably can't be easily fixed in a point release -- I don't think it
has even been fixed in master yet.
Issue (4) is easily fixable but it's orthogonal and not going to solve
the real problem here. It will just downgrade this from an oops to
"just" a system with no disk drives but an otherwise working kernel.
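
As a rough illustration of the kind of fix (4) calls for, here is a
userspace sketch of a cleanup path that tolerates a failed task creation
instead of dereferencing a NULL pointer. All names (struct adapter,
struct probe_task, task_create(), adapter_probe(), adapter_cleanup()) are
hypothetical stand-ins; this is not the actual mptsas code, and the real
driver's structures and failure values may well differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the real mptsas structures. */
struct probe_task {
    int running;
};

struct adapter {
    struct probe_task *task; /* NULL if the task was never created */
    int devices;
};

/* Simulated task creation that can fail, like an interrupted spawn. */
static struct probe_task *task_create(int should_fail)
{
    struct probe_task *t;

    if (should_fail)
        return NULL;
    t = calloc(1, sizeof(*t));
    if (t != NULL)
        t->running = 1;
    return t;
}

/* Cleanup that is safe to call on a partially initialized adapter. */
void adapter_cleanup(struct adapter *a)
{
    if (a == NULL)
        return;
    if (a->task != NULL) { /* only touch the task if it exists */
        a->task->running = 0;
        free(a->task);
        a->task = NULL;
    }
    a->devices = 0;
}

/* Returns 0 on success, -1 if the probe task could not be created. */
int adapter_probe(struct adapter *a, int should_fail)
{
    assert(a != NULL);

    a->task = task_create(should_fail);
    if (a->task == NULL) {
        adapter_cleanup(a); /* no oops: cleanup checks for NULL */
        return -1;
    }
    a->devices = 1;
    return 0;
}
```

On failure the caller gets an error and an adapter with no devices, which
matches the "working kernel, no disks" outcome described above.
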
Regards,
Faidon
1: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705
2: https://lkml.org/lkml/2014/3/23/42
3: http://lists.freedesktop.org/archives/systemd-devel/2014-March/018007.html