Re: [PATCH v4 00/18] Replace btrfs_workers with kernel workqueue based btrfs_workqueue

Qu Wenruo Tue, 07 Jan 2014 22:25:01 -0800

On Mon, 23 Dec 2013 10:35:04 +0800
, Qu Wenruo wrote:

On fri, 20 Dec 2013 05:30:48 -0800, Josef Bacik wrote:
On 12/19/2013 07:08 PM, Qu Wenruo wrote:
I'm sorry but I failed to reproduce the problem.
Btrfs/012 in xfstests has been run for serveral hours but nothinghappened.
Would you please give me some more details about the environment orthe panic backtrace?
Ok so it wasn't that test, it was just ./check -g auto. It wouldsometimes die at btrfs/012, other times it would make it as far asgeneric/083 before it keeled over. I bisected it down to
btrfs: Replace fs_info->workers with btrfs_workqueue
which of course is just the first patch that you start using the newcode which isn't helpful. My dmesg from one of my runs is here
http://ur1.ca/g867b
All of the initial panics looked exactly the same. I'm just going tokick the series out for now while you track down the problem. Thanks,
Josef
--
To unsubscribe from this list: send the line "unsubscribelinux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Thanks for your dmesg a lot.
It's quite sad that I still failed to reproduce the same bug on all mytest environment during the weekend.
But according to the dmesg, which is expired now though, I made thefollowing assumption:
There is a possibility that out of order execution made the "work->wq= wq" sentence executed behind the "queue_work()" call,and the "normal_work_helper" is executed in another CPU instantly,which caused the problem.
------
static inline void __btrfs_queue_work(struct __btrfs_workqueue *wq,
struct btrfs_work *work)
{
unsigned long flags;

work->wq = wq;
/* There is no memory barrier ensure the work->wq = wq is executed
before queue_work */
thresh_queue_hook(wq);
if (work->ordered_func) {
spin_lock_irqsave(&wq->list_lock, flags);
list_add_tail(&work->ordered_list, &wq->ordered_list);
spin_unlock_irqrestore(&wq->list_lock, flags);
}
queue_work(wq->normal_wq, &work->normal_work);
}
------
The memory order problem involves compiler or even CPU hardware (AMDCPUS seems to have a weaker memory order than Intel),so to simulate the bug, I used the following codes tried to reproducethe bug, and the result is pretty much the same as your dmesg.
------
static inline void __btrfs_queue_work(struct __btrfs_workqueue *wq,
struct btrfs_work *work)
{
unsigned long flags;
int in_order = get_random_int() % 1000

if (in_order)
work->wq = wq;
thresh_queue_hook(wq);
if (work->ordered_func) {
spin_lock_irqsave(&wq->list_lock, flags);
list_add_tail(&work->ordered_list, &wq->ordered_list);
spin_unlock_irqrestore(&wq->list_lock, flags);
}
queue_work(wq->normal_wq, &work->normal_work);

/* Try make the codes out of order since the Intel CPU
seems obey the memory order quite well */
if (!in_order)
work->wq = wq;
}
------
The fix is as easy as just adding a smp_wmb() behind the "work->wq =wq" sentence, but since I failed to reproduce the bug,
I can't confirm whether the fix is good or not.
I know it's impolite but would you please first try the smp_wmb()first to confirm the fix?
If the fix is good I'll send the formal V5 patchset.

Thanks

Qu

Happy new year!

Very sorry for disturbing but ithas been some time before my last replay
and no feedback since then (mainly because of the new year holiday).

If itis convenient for you,would you please give some feedback about thesmp_wmb() fix ?


Also incase that smp_wmb() doesn't work, it would be very nice of you to
also try smp_mb() fix.

Thanks
Qu
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 00/18] Replace btrfs_workers with kernel workqueue based btrfs_workqueue

Reply via email to