Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

Torsten Kaiser Mon, 19 Nov 2007 11:35:44 -0800

On Nov 19, 2007 8:56 AM, Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> * Torsten Kaiser <[EMAIL PROTECTED]> wrote:
>
> > Trying the last NFSv4 patch (but that patch is only the cause, why I
> > had lockdep enabled) I got this:
> > [   64.550203]
> > [   64.550205] =========================
> > [   64.552213] [ BUG: held lock freed! ]
> > [   64.553633] -------------------------
> > [   64.555055] kcryptd/1022 is freeing memory
> > FFFF81011EBEFB00-FFFF81011EBEFB3F, with a lock still held there!
>
> so kcryptd frees a live, still in use bio? That could be a receipe for
> data corruption. Does SLUB_DEBUG (or SLAB_DEBUG) catch anything?


I have SLUB_DEBUG=y, but not SLUB_DEBUG_ON.
But apart from this message I did not the anything in the syslog.
It seems to not be onetime event, as one the third boot it happend
again. Stacktrace was identical.
Sadly trying 3 boots with slub_debug=FZP and another one with only F
did not trigger it.

But I don't think kcryptd is freeing a bio at that point.
The message said about the freed lock: (kcryptd){--..}, at: [<ffffffff80247dd9>]
(gdb) list *0xffffffff80247dd9
0xffffffff80247dd9 is in run_workqueue (include/asm/bitops_64.h:69).
64       * you should call smp_mb__before_clear_bit() and/or
smp_mb__after_clear_bit()
65       * in order to ensure changes are visible on other processors.
66       */
67      static inline void clear_bit(int nr, volatile void *addr)
68      {
69              __asm__ __volatile__( LOCK_PREFIX
70                      "btrl %1,%0"
71                      :ADDR
72                      :"dIr" (nr));
73      }
increasing the addr a little bit shows:
(gdb) list *0xffffffff80247ddf
0xffffffff80247ddf is in run_workqueue (kernel/workqueue.c:275).
270                     list_del_init(cwq->worklist.next);
271                     spin_unlock_irq(&cwq->lock);
272
273                     BUG_ON(get_wq_data(work) != cwq);
274                     work_clear_pending(work);
275                     lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0,
2, _THIS_IP_);
276                     lock_acquire(&lockdep_map, 0, 0, 0, 2, _THIS_IP_);
277                     f(work);
278                     lock_release(&lockdep_map, 1, _THIS_IP_);
279                     lock_release(&cwq->wq->lockdep_map, 1, _THIS_IP_);


Above this acquire/release sequence is the following comment:
#ifdef CONFIG_LOCKDEP
                /*
                 * It is permissible to free the struct work_struct
                 * from inside the function that is called from it,
                 * this we need to take into account for lockdep too.
                 * To avoid bogus "held lock freed" warnings as well
                 * as problems when looking into work->lockdep_map,
                 * make a copy and use that here.
                 */
                struct lockdep_map lockdep_map = work->lockdep_map;
#endif

Did something trigger this anyway?

Anything I could try, apart from more boots with slub_debug=F?

Torsten
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

Reply via email to