On 4/29/15 4:08 PM, Shrinand Javadekar wrote:
Hi,

I have been investigating a pretty serious Swift performance problem
for a while now. I have a single-node Swift instance with 16 cores,
64GB memory and 8 MDs of 3TB each. I only write 256KB objects into
this Swift instance, with high concurrency: 256 parallel object PUTs.
The objects are sharded equally across 32 containers.

On a completely clean system, we were getting ~375 object PUTs per
second. But throughput dropped quickly, and by the time we had 600GB
of data in Swift it was down to ~100 objects per second.

We used sysdig to get a trace of what's happening in the system and
found that the open system calls were taking much longer than before:
several hundred milliseconds, sometimes even 1 second.

Investigating this further revealed a problem in the way Swift writes
the objects on XFS. Swift's object server creates a temp directory
under the mount point /srv/node/r0. It creates a file under this temp
directory first (say /srv/node/r0/tmp/tmpASDF) and eventually renames
this file to its final destination.

rename /srv/node/r0/tmp/tmpASDF ->
/srv/node/r0/objects/312/eef/deadbeef/33453453454323424.data.
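As a rough illustration of the write-then-rename pattern described above (a minimal sketch, not Swift's actual object server code), the following writes a payload to a temp file under a "tmp" directory and renames it into place. The function name put_object and the scratch root directory are hypothetical; the relative path is the example from this message. It also records the inode number before and after the rename.

```python
import os
import tempfile


def put_object(root, final_rel_path, payload):
    """Write payload to root/tmp/<random>, then rename it to root/final_rel_path.

    Returns (inode_before_rename, inode_after_rename).
    Hypothetical helper for illustration only.
    """
    tmp_dir = os.path.join(root, "tmp")
    os.makedirs(tmp_dir, exist_ok=True)

    # The inode is allocated here, while the file still lives under tmp/.
    fd, tmp_path = tempfile.mkstemp(dir=tmp_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        inode_before = os.stat(tmp_path).st_ino

        final_path = os.path.join(root, final_rel_path)
        os.makedirs(os.path.dirname(final_path), exist_ok=True)

        # Within one filesystem, rename only changes directory entries.
        os.rename(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return inode_before, os.stat(final_path).st_ino


# Usage: a scratch directory stands in for the mount point /srv/node/r0.
root = tempfile.mkdtemp()
rel_path = "objects/312/eef/deadbeef/33453453454323424.data"
before, after = put_object(root, rel_path, b"x" * 256 * 1024)
print("inode unchanged by rename:", before == after)
```

Because the rename stays within one filesystem, the file keeps the inode it was given when it was created under tmp/.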

XFS creates an inode in the same allocation group as its parent. So,
when the temp file tmpASDF is created, it goes in the same allocation
group as "tmp". When the rename happens, only the filesystem metadata
gets modified. The allocation groups of the inodes don't change.
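One way to check which allocation group an inode lives in is to decode its inode number: on XFS the high bits of the inode number are the allocation group number, and the shift is the sum of the superblock fields agblklog and inopblog. The sketch below assumes you have read those two values from the filesystem yourself (for example with xfs_db); the geometry numbers used here are hypothetical, and the helper names are mine.

```python
import os


def inode_to_ag(ino, agblklog, inopblog):
    """Return the allocation group number encoded in an XFS inode number.

    agblklog and inopblog are superblock fields (log2 of blocks per AG,
    rounded up, and log2 of inodes per block). Callers must supply the
    values for their own filesystem.
    """
    return ino >> (agblklog + inopblog)


def same_ag(path_a, path_b, agblklog, inopblog):
    """True if both paths have inodes in the same allocation group.

    Only meaningful when both paths are on the same XFS filesystem.
    """
    ag_a = inode_to_ag(os.stat(path_a).st_ino, agblklog, inopblog)
    ag_b = inode_to_ag(os.stat(path_b).st_ino, agblklog, inopblog)
    return ag_a == ag_b


# Hypothetical geometry, for illustration only.
AGBLKLOG = 25
INOPBLOG = 3

# An inode number built by hand: AG 3, low 28 bits arbitrary.
example_ino = (3 << (AGBLKLOG + INOPBLOG)) | 12345
print(inode_to_ag(example_ino, AGBLKLOG, INOPBLOG))
```

Comparing the result for a .data file against the result for the tmp directory would show whether the file stayed in the allocation group of "tmp" after the rename.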

This part confuses me. If an inode is in the same allocation group as its parent, then let's say we have a path on the FS of:

objects/757/a94/bd77129a1cae9e32381776e322efca94/1430268763.41931.data

It seems like 1430268763.41931.data would be in the same allocation group as objects/757/a94/bd77129a1cae9e32381776e322efca94, and bd77129a1cae9e32381776e322efca94 would be in the same allocation group as objects/757/a94, and so on. Thus, everything would be in the same allocation group as the root directory.

This can't be the case, or else there'd be no point to allocation groups. What am I missing here?

_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
