On Fri, 9 Nov 2012 22:45:16 +0800, Liu Bo wrote: > On Fri, Nov 09, 2012 at 11:19:17AM +0100, Stefan Behrens wrote: >> On Fri, 9 Nov 2012 08:44:01 +0800, Liu Bo wrote: >>> On Thu, Nov 08, 2012 at 06:24:36PM +0100, Stefan Behrens wrote: >>>> On Thu, 8 Nov 2012 22:50:47 +0800, Liu Bo wrote: >>>>> On Tue, Nov 06, 2012 at 05:38:33PM +0100, Stefan Behrens wrote: >>>>>> + trans = btrfs_start_transaction(root, 0); >>>>> >>>>> why a start_transaction here? Any reasons? >>>>> (same question also for some other places) >>>>> >>>> >>>> Without this transaction, there is outstanding I/O which is not flushed. >>>> Pending writes that go only to the old disk need to be flushed before >>>> the mode is switched to write all live data to the source disk and to >>>> the target disk as well. The copy operation that is part of the scrub >>>> code works on the commit root for performance reasons. Every write >>>> request that is performed after the commit root is established needs to >>>> go to both disks. Those requests that already have the bdev assigned >>>> (i.e., btrfs_map_bio() was already called) cannot be duplicated anymore >>>> to write to the new disk as well. >>>> >>>> btrfs_dev_replace_finishing() looks similar and goes through a >>>> transaction commit between the steps where the bdev in the mapping tree >>>> is swapped and the step when the old bdev is freed. Otherwise the bdev >>>> would be accessed after being freed. >>>> >>> >>> I see, if you're only about to flush metadata, why not join a transaction? >> >> btrfs_join_transaction() would delay the current transaction and enforce >> that the current transaction is used and not a new one. >> btrfs_start_transaction() would use either the current transaction, or a >> new one. It is less interfering. > > hmm...btrfs_start_transaction() would not use the current transaction unless > you're still in the same task, ie. current->journal_info remains unchanged, > otherwise it will be blocked by the current transaction(wait_current_trans()). > > If there are several btrfs_start_transaction() being blocked, after the > current > one's commit, one of them will allocate a new transaction, and the rest can > join it. > > But btrfs_join_transaction will join the current as much as possible. > > And since here we don't do any reservation and seems to just update > chunk/device > tree(which will use global block rsv directly), I perfer > btrfs_join_transaction(). >
I am still not sure, which one is worse or better: a) to delay a commit by calling btrfs_join_transaction() which joins and thereby delays a transaction, or b) to go through one additional transaction. Here is the log message of the commit that added btrfs_join_transaction(). For me, it sounds like one should use btrfs_join_transaction() only when it is _required_ to join a transaction, e.g. when a low level function is required to join the transaction that some higher level function has started: commit f9295749388f82c8d2f485e99c72cd7c7876a99b Author: Chris Mason <chris.ma...@oracle.com> Date: Thu Jul 17 12:54:14 2008 -0400 btrfs_start_transaction: wait for commits in progress to finish btrfs_commit_transaction has to loop waiting for any writers in the transaction to finish before it can proceed. btrfs_start_transaction should be polite and not join a transaction that is in the process of being finished off. There are a few places that can't wait, basically the ones doing IO that might be needed to finish the transaction. For them, btrfs_join_transaction is added. >> >> Since in dev-replace.c it is not required to enforce that a current >> transaction is joined, btrfs_start_transaction() is the one to choose >> here, as I understood it. >> >> But that's an interesting topic and I would appreciate to get a definite >> rule which one to choose when. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html