On 5/3/18 11:52 AM, Nikolay Borisov wrote: > > > On 3.05.2018 16:39, Jeff Mahoney wrote: >> On 5/3/18 3:24 AM, Nikolay Borisov wrote: >>> >>> >>> On 3.05.2018 00:11, je...@suse.com wrote: >>>> From: Jeff Mahoney <je...@suse.com> >>>> >>>> Commit 8d9eddad194 (Btrfs: fix qgroup rescan worker initialization) >>>> fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but >>>> ended up reintroducing the hang-on-unmount bug that the commit it >>>> intended to fix addressed. >>>> >>>> The race this time is between qgroup_rescan_init setting >>>> ->qgroup_rescan_running = true and the worker starting. There are >>>> many scenarios where we initialize the worker and never start it. The >>>> completion btrfs_ioctl_quota_rescan_wait waits for will never come. >>>> This can happen even without involving error handling, since mounting >>>> the file system read-only returns between initializing the worker and >>>> queueing it. >>>> >>>> The right place to do it is when we're queuing the worker. The flag >>>> really just means that btrfs_ioctl_quota_rescan_wait should wait for >>>> a completion. >>>> >>>> Since the BTRFS_QGROUP_STATUS_FLAG_RESCAN flag is overloaded to >>>> refer to both runtime behavior and on-disk state, we introduce a new >>>> fs_info->qgroup_rescan_ready to indicate that we're initialized and >>>> waiting to start. >>> >>> Am I correct in my understanding that this qgroup_rescan_ready flag is >>> used to avoid qgroup_rescan_init being called AFTER it has already been >>> called but BEFORE queue_rescan_worker ? Why wasn't the initial version >>> of this patch without this flag sufficient? >> >> No, the race is between clearing the BTRFS_QGROUP_STATUS_FLAG_RESCAN >> flag near the end of the worker and clearing the running flag. The >> rescan lock is dropped in between, so btrfs_rescan_init will let a new >> rescan request in while we update the status item on disk. We wouldn't >> have queued another worker since that's what the warning catches, but if >> there were already tasks waiting for completion, they wouldn't have been >> woken since the wait queue list would be reinitialized. There's no way >> to reorder clearing the flag without changing how we handle >> ->qgroup_flags. I plan on doing that separately. This was just meant >> to be the simple fix. > > Great, I think some of this information should go into the change log, > in explaining what the symptoms of the race condition are.
You're right. I was treating as a race that my patch introduced but it didn't. It just complained about it. -Jeff -- Jeff Mahoney SUSE Labs
signature.asc
Description: OpenPGP digital signature