On Tue, 5 Dec 2017, Matthew Wilcox wrote:

On Tue, Dec 05, 2017 at 07:11:00AM +1100, NeilBrown wrote:
As we don't seem to be pursuing this possibility is probably isn't very
important, but I'd like to point out that the original fix isn't a true
fix.
It just sorts a shared group_info early.  This does not stop corruption.
Every time a thread calls set_groups() on that group_info it will be
sorted again.
The sort algorithm used is the heap sort, and a heap sort always moves
elements in the array around - it does not leave a sorted array
untouched (unlike e.g. the quick sort which doesn't move anything in a
sorted array).
So it is still possible for two calls to groups_sort() to race.
We *need* to move groups_sort() out of set_groups().

Hum, makes sense. I've applied it to the most recent Fedora kernel (that uses heapsort) and I didn't see the problem again. I should run a few more repetitions to be sure.

It must be relatively common to sort an already-sorted array.  I wonder
if something like this patch would be worthwhile?

I have deliberately broken this patch so it can't be applied.  I haven't
tested it, and for all I know, I got the sign of cmp_func wrong.

diff --git a/lib/sort.c b/lib/sort.c
index d6b7a202b0b6..2b527fde6dad 100644
--- a/lib/sort.c
+++ b/lib/sort.c
@@ -75,7 +75,14 @@ void sort(void *base, size_t num, size_t size,
                        swap_func = generic_swap;
        }

-       /* heapify */
+       /* Do not sort an already-sorted array */
+       for (c = 0; c < (n - size); c += size) {
+               if (cmp_func(base + c, base + c + size) < 0)
+                       goto heapify;
+       }
+       return;
+
+heapify:
        for ( ; i >= 0; i -= size) {
                for (r = i; r * 2 + size < n; r  = c) {
                        c = r * 2 + size;

The bug happens when two threads enter sort_groups for the same group info in parallel, and one thread starts overwriting values that another thread may already have "heapified" or sorted.

Thread A                  Thread B
Enter groups_sort
                          Enter groups_sort
.
.
.
Return from groups_sort
                          .
                          .
                          .
                          Return from groups_sort

Wouldn't this patch just make both threads see the structure as unsorted and sort them?

Thanks,
trbecker

Reply via email to