On 04/25/2016 08:35 PM, Qu Wenruo wrote:


Josef Bacik wrote on 2016/04/25 10:24 -0400:
On 04/24/2016 08:56 PM, Qu Wenruo wrote:


Josef Bacik wrote on 2016/04/22 14:23 -0400:
On 04/22/2016 02:21 PM, Mark Fasheh wrote:
On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote:
On 04/15/2016 05:08 AM, Qu Wenruo wrote:
+    /*
+     * Force parent root to be updated, as we recorded it before so
its
+     * last_trans == cur_transid.
+     * Or it won't be committed again onto disk after later
+     * insert_dir_item()
+     */
+    if (!ret)
+        record_root_in_trans(trans, parent, 1);
+    return ret;
+}

NACK, holy shit we aren't adding a special transaction commit only
for qgroup snapshots.  Figure out a different way.  Thanks,

Yeah I saw that. To be fair, we run a whole lot of the transaction
stuff
multiple times (at least from my reading) so I'm really unclear on
what the
performance impact is.

Do you have any suggestion though? We've been banging our heads
against this
for a while now and as slow as this patch might be, it actually works
where
nothing else has so far.

I'm less concerned about committing another transaction and more
concerned about the fact that it is an special variant of the
transaction commit.  If this goes wrong, or at some point in the future
we fail to update it along with btrfs_transaction_commit we suddenly
are
corrupting metadata.  If we have to commit a transaction then call
btrfs_commit_transaction(), don't open code a stripped down version,
here be dragons.  Thanks,

Josef



Yes, I also don't like the dirty hack.

Although the problem is, we have no other good choice.

If we can call commit_transaction() that's the best case, but the
problem is, in create_pending_snapshots(), we are already inside
commit_transaction().

Or commit_transaction() can be called inside commit_transaction()?


No, figure out a different way.  IIRC I dealt with this with the
no_quota flag for inc_ref/dec_ref since the copy root stuff does strange
things with the reference counts, but all this code is gone now.  I
looked around to see if I could figure out how the refs are ending up
this way but it doesn't make sense to me and there isn't enough
information in your changelog for me to be able to figure it out. You've
created this mess, clean it up without making it messier.  Thanks,

Josef


Unfortunately, your original no_quota flag just hide the bug, and hide
it in a bad method.

Originally, no_quota flag is used for case like this, to skip quota at
snapshot creation, and use quota_inherit() to hack the quota accounting.
It seems work, but in fact, if the DIR_ITEM insert need to create a new
cousin leaf, then quota is messed up.


No, and this is the problem, you fundamentally didn't understand what I wrote, and instead of trying to understand it and fix the bug you just threw it all away. The no_quota stuff was not a hack, it was put in place to deal with refs we already knew where accounted for, such as when we converted to mixed refs or other such operations.

There were bugs in my rework, but now the situation is untenable. What we now have is something that holds delayed refs in memory for the entire transaction, which is a non-starter for anybody who wants to use this in production. On our gluster machines we get millions of delayed refs per transaction, and then there are multiple file systems. Then you have to post-process all of that during the critical section of the commit? So now suddenly I'm walking millions of delayed refs doing accounting all at once, that is going to cause commit latencies in the seconds which is completely unacceptable.

Anyway I spent some time this morning reading through the new stuff to figure out how it works now and I've got a patch to fix this problem that doesn't involve screwing with the transaction commit stuff at all. I sent it along separately

Btrfs: fix qgroup accounting when snapshotting

This fixes the basic case that was described originally. I haven't tested it other than that but I'm pretty sure it is correct. Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to