On Nov 19, 2007 10:00 PM, Milan Broz <[EMAIL PROTECTED]> wrote:
> Torsten Kaiser wrote:
> > On Nov 19, 2007 8:56 AM, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> >> * Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> ...
> > Above this acquire/release sequence is the following comment:
> > #ifdef CONFIG_LOCKDEP
> > 	/*
> > 	 * It is permissible to free the struct work_struct
> > 	 * from inside the function that is called from it,
> > 	 * this we need to take into account for lockdep too.
> > 	 * To avoid bogus "held lock freed" warnings as well
> > 	 * as problems when looking into work->lockdep_map,
> > 	 * make a copy and use that here.
> > 	 */
> > 	struct lockdep_map lockdep_map = work->lockdep_map;
> > #endif
> >
> > Did something trigger this anyway?
> >
> > Anything I could try, apart from more boots with slub_debug=F?
>
> Please could you try which patch from the dm-crypt series cause this ?
> (agk-dm-dm-crypt* names.)
>
> I suspect agk-dm-dm-crypt-move-bio-submission-to-thread.patch because
> there is one work struct used subsequently in two threads...
> (io thread already started while crypt thread is processing lockdep_map
> after calling f(work)...)
>
> (btw these patches prepare dm-crypt for next patchset introducing
> async cryptoapi, so there should be no functional changes yet.)
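To make sure I read that comment correctly, here is a small self-contained
userspace model of the pattern it describes. The names are made up for
illustration, this is not the real run_workqueue() code, only the
"copy before calling f(work)" idea:

/*
 * Simplified userspace model (illustrative names, not kernel code):
 * the callback is allowed to free its own work item, so anything the
 * dispatcher still needs afterwards (here a stand-in for
 * work->lockdep_map) must be copied out before calling f(work).
 */
#include <stdio.h>
#include <stdlib.h>

struct fake_lockdep_map {
	const char *name;
};

struct work_item {
	void (*func)(struct work_item *);
	struct fake_lockdep_map map;
};

/* A work function that frees its own work item, which is permitted. */
static void work_fn_frees_itself(struct work_item *w)
{
	printf("running work \"%s\"\n", w->map.name);
	free(w);
}

/* Dispatcher: copy the map first and never touch *w after the callback. */
static void run_one(struct work_item *w)
{
	struct fake_lockdep_map map = w->map;	/* the copy the comment talks about */

	w->func(w);				/* w may already be freed here */
	printf("\"release\" via the copy: %s\n", map.name);
}

int main(void)
{
	struct work_item *w = malloc(sizeof(*w));

	if (!w)
		return 1;
	w->func = work_fn_frees_itself;
	w->map.name = "io->work";
	run_one(w);
	return 0;
}

As long as the dispatcher only touches its private copy after the callback
returns, the work function freeing its own work_struct is harmless, which is
exactly what the quoted comment promises.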
I looked at all of these agk-* patches, because the error is not really
bisectable: it triggers unreliably.

The one that looks suspicious is agk-dm-dm-crypt-tidy-io-ref-counting.patch.
This one does make a functional change, as there is now an additional ref on
io->pending: instead of only increasing io->pending if there really is more
than one clone bio, it now takes an additional ref in
crypt_write_io_process().

I certainly agree with the cleanup, but it introduces the following change:
before the cleanup, *all* calls to crypt_dec_pending() went via
crypt_endio(). Now there is an additional call to crypt_dec_pending() to
balance the additional ref taken in crypt_write_io_process(). And that call
does not happen in whatever context/thread cleans up after
generic_make_request(), but directly in the context/thread of the caller of
crypt_write_io_process(), and that is kcryptd.

So it is now possible (if all requests finish before
crypt_write_io_process() returns) that kcryptd itself releases the bio,
while the workqueue infrastructure still seems to hold a lock on it. (There
is a small userspace model of this flow at the end of this mail.) But as the
comment in run_workqueue() says, this should be legal, and I can't figure
out what would make the lockdep copy mechanism fail.

Especially if the trigger really was a WRITE request: with
agk-dm-dm-crypt-move-bio-submission-to-thread.patch reverted, this should
never use the kcrypt_io workqueue, so there should not even be the problem
of using INIT_WORK twice on the same work_struct.

... or I just don't see the bug.

Torsten
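The model mentioned above: a self-contained userspace sketch of the changed
ref flow. The function names mirror dm-crypt, but the bodies are only
illustrative, this is not the actual patch.

/*
 * The point of the sketch: the submitter now holds its own reference and
 * drops it after queuing the clones, so if all clones have already
 * completed, the final put (and the free) happens in the submitter's
 * (kcryptd's) context instead of in the endio path.
 */
#include <stdio.h>
#include <stdlib.h>

struct io_model {
	int pending;			/* stand-in for atomic_t io->pending */
};

static void crypt_dec_pending_model(struct io_model *io)
{
	if (--io->pending == 0) {
		printf("last ref dropped: freeing io\n");
		free(io);
	}
}

/* Per-clone completion, normally run from the block layer's endio path. */
static void clone_endio_model(struct io_model *io)
{
	crypt_dec_pending_model(io);
}

static void crypt_write_io_process_model(struct io_model *io, int nr_clones)
{
	int i;

	io->pending++;			/* the extra ref added by the cleanup */

	for (i = 0; i < nr_clones; i++) {
		io->pending++;		/* one ref per clone bio */
		clone_endio_model(io);	/* pretend the clone completes at once */
	}

	/*
	 * If every clone already finished, this put is the last one and
	 * frees the io right here, in the caller's (kcryptd's) context.
	 */
	crypt_dec_pending_model(io);
}

int main(void)
{
	struct io_model *io = calloc(1, sizeof(*io));

	if (!io)
		return 1;
	crypt_write_io_process_model(io, 2);
	return 0;
}

Running it prints the free from inside crypt_write_io_process_model(), i.e.
from the caller's context, which is the situation I described above.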