On 5/1/15 12:37 PM, Christian Schwede wrote:
On 01.05.15 20:33, Samuel Merritt wrote:
On 5/1/15 7:55 AM, Uwe Sauter wrote:


On 01.05.2015 at 02:21, Samuel Merritt wrote:

It seems like 1430268763.41931.data would be in the same allocation
group as
objects/757/a94/bd77129a1cae9e32381776e322efca94, and
bd77129a1cae9e32381776e322efca94 would be in the same allocation
group as objects/757/a94, and so on. Thus, everything would be in the
same allocation group as the root directory.

This can't be the case, or else there'd be no point to allocation
groups. What am I missing here?


Hi,

I think what you're missing is that inodes stay in the allocation
group where they were first created. Moving a file around in the
filesystem changes its path but not its allocation group. So first
creating a temporary file and then moving it into the hash folder
leaves the file associated with the temp folder's allocation group.
That allocation group grows bigger and bigger, and searching it takes
more and more time.
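A minimal sketch of the temp-file-then-rename pattern described above, using only the Python standard library. The directory names (`tmp`, `objects/...`) are illustrative stand-ins for a Swift device layout, and `write_then_rename` is a hypothetical helper, not Swift's actual code. The allocation group itself isn't visible from userspace here; the sketch only shows that a rename within one filesystem keeps the same inode number, i.e. no new inode (and so no new AG placement) happens at move time.

```python
import os
import tempfile


def write_then_rename(root: str, final_rel: str, data: bytes) -> tuple[int, int]:
    """Write data to a temp file under root/tmp, then rename it into place.

    Returns (inode_before_rename, inode_after_rename).
    """
    tmp_dir = os.path.join(root, "tmp")
    os.makedirs(tmp_dir, exist_ok=True)
    final_path = os.path.join(root, final_rel)
    os.makedirs(os.path.dirname(final_path), exist_ok=True)

    # The inode is allocated here, when the file is created inside tmp/.
    fd, tmp_path = tempfile.mkstemp(dir=tmp_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    ino_before = os.stat(tmp_path).st_ino

    # rename() only rewrites directory entries; no new inode is allocated.
    os.rename(tmp_path, final_path)
    ino_after = os.stat(final_path).st_ino
    return ino_before, ino_after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        before, after = write_then_rename(
            root,
            "objects/757/a94/bd77129a1cae9e32381776e322efca94/1430268763.41931.data",
            b"hello",
        )
        print(before == after)  # True: the moved file keeps its inode
```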

That doesn't really answer the question, though. We have this message
<http://www.spinics.net/lists/xfs/msg32868.html> which says that "...the
locality of a new inode is determined by the
parent inode, and so if all new inodes are created in the same
directory, then they are all created in the same AG."

Let's say we start out with a freshly formatted disk, so there's only
one inode, and it's for the root directory.

Then, Swift goes and starts making its directory structure on disk, and
calls mkdir('objects'). Since a new inode is created in the same AG as
its parent, the inode for '/objects' is in the same AG as the inode for
'/'.

Swift makes another dir: mkdir('objects/757')

The inode for '/objects/757' is in the same AG as its parent '/objects',
which is the same as the AG for '/'.

Keep going a while, and you get

/
/objects
/objects/757
/objects/757/a94
/objects/757/a94/bd77129a1cae9e32381776e322efca94
/objects/757/a94/bd77129a1cae9e32381776e322efca94/1430268763.41931.data
/tmp

and they're all in the same AG.
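The premise above can be written as a toy model: every new inode, file or directory, lands in its parent's AG. This is only the hypothesis being questioned, not real XFS behaviour; `naive_ag_assignment` is a hypothetical name, and placing the root in AG 0 is an assumption.

```python
def naive_ag_assignment(paths: list[str]) -> dict[str, int]:
    """Toy model of the premise: each new inode inherits its parent's AG.

    paths must list parents before children; "/" is assumed to be in AG 0.
    """
    ag = {"/": 0}
    for path in paths:
        parent = path.rsplit("/", 1)[0] or "/"
        ag[path] = ag[parent]
    return ag


paths = [
    "/objects",
    "/objects/757",
    "/objects/757/a94",
    "/objects/757/a94/bd77129a1cae9e32381776e322efca94",
    "/objects/757/a94/bd77129a1cae9e32381776e322efca94/1430268763.41931.data",
    "/tmp",
]
ag = naive_ag_assignment(paths)
print(sorted(set(ag.values())))  # [0]: every inode ends up in the root's AG
```

Under this rule the whole tree collapses into one AG, which is exactly the contradiction that prompts the question below.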

Now, the XFS developers are not stupid, so what I typed up there can't
possibly be true, or else every inode on a filesystem would be in the
same AG.

So, my question is this: what, of the things I typed above, is false?
Equivalently, how is an inode created in a *different* AG than its parent?

Hmm, reading the docs at

http://xfs.org/docs/xfsdocs-xml-dev/XFS_Filesystem_Structure/tmp/en-US/html/AG_Free_Space_Management.html

I would assume a different AG is selected when another AG has more
free space.

After far too much time spent searching, I found some docs that explain this. On XFS, a *file* is created in the same AG as its parent directory [1], but a *directory* is not [2]. Instead, directories are spread across all AGs.

Thus, in the example above, the .data file and its parent directory share an AG, but the other directories could each be in any AG.
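The corrected rule can be sketched as a toy model: a new file inherits its parent directory's AG, while each new directory takes the next AG from a round-robin rotor. The rotor is a simplifying assumption for illustration; the docs above only say directories are spread across AGs, and real XFS placement also depends on factors such as free space. `spread_ag_assignment` and `agcount=4` are hypothetical.

```python
from itertools import count


def spread_ag_assignment(entries, agcount=4):
    """Toy model: files inherit the parent directory's AG; each new
    directory takes the next AG from a hypothetical round-robin rotor.

    entries: (path, is_dir) pairs, parents listed before children.
    """
    ag = {"/": 0}     # assume the root directory sits in AG 0
    rotor = count(1)  # hypothetical rotor; AG 0 already holds "/"
    for path, is_dir in entries:
        parent = path.rsplit("/", 1)[0] or "/"
        if is_dir:
            ag[path] = next(rotor) % agcount
        else:
            ag[path] = ag[parent]
    return ag


HASH_DIR = "/objects/757/a94/bd77129a1cae9e32381776e322efca94"
entries = [
    ("/objects", True),
    ("/objects/757", True),
    ("/objects/757/a94", True),
    (HASH_DIR, True),
    (HASH_DIR + "/1430268763.41931.data", False),
    ("/tmp", True),
]
ag = spread_ag_assignment(entries)
for path, _ in entries:
    print(ag[path], path)
```

Under this model, the temp-file-then-rename pattern discussed earlier in the thread would put every .data file in the AG of /tmp, because each file is created there and keeps its AG when renamed.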

[1] http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide//tmp/en-US/html/ch06s03.html

[2] http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide//tmp/en-US/html/ch06s02.html

But, and that might be one of the problems here, there seems to be a
default of only 4 allocation groups; at least that's what I see when
running xfs_info on various disks.

In fact, after looking into the mkfs.xfs sources, I found this default
for disks up to 4TB in size:

http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfsprogs.git;a=blob;f=mkfs/xfs_mkfs.c;h=5084d755;hb=HEAD#l688
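To check the AG count across many disks before benchmarking, the `agcount=` field that xfs_info prints on its meta-data line can be extracted. A minimal sketch, assuming `xfs_info` is on the PATH and is given the mount point of an XFS filesystem; `parse_agcount` and `agcount_for` are hypothetical helper names.

```python
import re
import subprocess

# xfs_info prints the AG count as "agcount=N," on its meta-data line.
_AGCOUNT_RE = re.compile(r"\bagcount=(\d+)")


def parse_agcount(xfs_info_output: str) -> int:
    """Extract the allocation-group count from xfs_info's text output."""
    match = _AGCOUNT_RE.search(xfs_info_output)
    if match is None:
        raise ValueError("no agcount= field in xfs_info output")
    return int(match.group(1))


def agcount_for(mountpoint: str) -> int:
    """Run xfs_info on a mounted XFS filesystem and return its AG count."""
    result = subprocess.run(
        ["xfs_info", mountpoint], capture_output=True, text=True, check=True
    )
    return parse_agcount(result.stdout)
```

For a benchmark, test disks could be formatted with different values via `mkfs.xfs -d agcount=N`, then verified with something like `agcount_for("/srv/node/sdb1")` (the mount point is a hypothetical example).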

Might be a good idea to do some benchmarking with different AG numbers?

Could be useful, but we should first get Swift to not dump everything in the same AG. Otherwise, the benchmarks will be pretty predictable. ;)


_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
