Re: Ignore O_SYNC for rbd cache
On Wed, Oct 10, 2012 at 9:23 AM, Sage Weil <s...@inktank.com> wrote:
> I certainly wouldn't recommend it, but there are probably use cases where
> it makes sense (i.e., the data isn't as important as the performance).

This would make a lot of sense for, e.g., service orchestration-style setups where you run an elastic pool of webapps. The persistent storage is the database, not the local disk, but you might still, e.g., spool uploads to local disk first, or keep a local cache a la Varnish. Crashing a machine in such a setup tends to mean deleting the image, not trying to recover it.

Also, for anyone running virtualized MapReduce worker nodes: CephFS plugged in as the filesystem, the compute jobs wanting local storage for temporary files, and a crash just meaning the task is restarted elsewhere.

(Now, in both of the above, you might ask: why not use a local disk for this, then - why use RBD? Because a lot of people are interested in running diskless compute servers, or ones booting off a minimal SSD/SD card with just the base OS and no VM images stored locally. That helps tremendously with density, especially on low-power platforms like ARM.)
Ignore O_SYNC for rbd cache
Hi,

Recent tests on my test rack with a 20G IB interconnect (IPoIB, 64k MTU, default CUBIC, CFQ, LSI SAS 2108 w/ writeback cache) show quite fantastic performance: on both reads and writes Ceph completely utilizes the disk bandwidth, reaching as high as 0.9 of the theoretical limit (the sum of all disk bandwidths, taking the replication level into account). The only thing that may bring down overall performance is O_SYNC|O_DIRECT writes, which will be issued by almost every database server in its default setup. Assuming that the database config may be untouchable, and that somehow I can build a very reliable hardware setup which will never fail on power, should Ceph have an option to ignore these flags? Maybe there are other real-world cases for including such an option, or I am very wrong to even think about fooling the client application in this way.

Thank you for any suggestion!
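(For context, a minimal sketch - not from the original mail - of the kind of synchronous write path a database typically uses for its journal; the file name, sizes, and the choice of O_SYNC alone rather than O_SYNC|O_DIRECT are arbitrary here. These are the flags the question is about bypassing on the rbd cache side:)

    /* Sketch of a database-style journal write: open with O_SYNC so each
     * write() returns only after the data has reached stable storage.
     * On a VM disk backed by librbd with rbd cache enabled, that sync
     * semantic is what turns into flush requests toward the OSDs. */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("wal.log", O_WRONLY | O_CREAT | O_SYNC, 0600);
        if (fd < 0)
            return 1;

        char *buf;
        /* 4096-byte alignment would also satisfy O_DIRECT, if added. */
        if (posix_memalign((void **)&buf, 4096, 4096))
            return 1;
        memset(buf, 'x', 4096);

        ssize_t n = write(fd, buf, 4096);  /* blocks until durable */

        free(buf);
        close(fd);
        return n == 4096 ? 0 : 1;
    }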
Re: Ignore O_SYNC for rbd cache
On Wed, 10 Oct 2012, Andrey Korolyov wrote:
> Hi,
>
> Recent tests on my test rack with a 20G IB interconnect (IPoIB, 64k MTU,
> default CUBIC, CFQ, LSI SAS 2108 w/ writeback cache) show quite fantastic
> performance: on both reads and writes Ceph completely utilizes the disk
> bandwidth, reaching as high as 0.9 of the theoretical limit (the sum of
> all disk bandwidths, taking the replication level into account). The only
> thing that may bring down overall performance is O_SYNC|O_DIRECT writes,
> which will be issued by almost every database server in its default
> setup. Assuming that the database config may be untouchable, and that
> somehow I can build a very reliable hardware setup which will never fail
> on power, should Ceph have an option to ignore these flags? Maybe there
> are other real-world cases for including such an option, or I am very
> wrong to even think about fooling the client application in this way.

I certainly wouldn't recommend it, but there are probably use cases where it makes sense (i.e., the data isn't as important as the performance).

Any such option would probably be called

    rbd async flush danger danger = true

and would trigger a flush but not wait for it, or perhaps

    rbd ignore flush danger danger = true

which would not honor flush at all.

This would jeopardize the integrity of the file system living on the RBD image; file systems rely on flush to order their commits, and playing fast and loose with that can lead to any number of corruptions. The only silver lining is that in the not-so-distant past (3-4 years ago) this was poorly supported by the block layer and file systems alike, and ext3 didn't crash and burn quite as often as you might have expected.

Anyway, not something I would recommend, certainly not for a generic VM platform. Maybe if you have a specific performance-sensitive application you can afford to let crash and burn...

sage
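(To make the shape of the proposal concrete, a hypothetical ceph.conf client section might look like the following; `rbd cache = true` is a real option, while the two "danger danger" names are only the placeholder names suggested above and do not exist:)

    [client]
        rbd cache = true
        # hypothetical placeholders from the message above - not real options:
        # rbd async flush danger danger = true   # issue the flush but do not wait for it
        # rbd ignore flush danger danger = true  # drop flushes entirely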
Re: Ignore O_SYNC for rbd cache
On 10/10/2012 09:23 AM, Sage Weil wrote:
> On Wed, 10 Oct 2012, Andrey Korolyov wrote:
>> [...]
>> Assuming that the database config may be untouchable, and that somehow
>> I can build a very reliable hardware setup which will never fail on
>> power, should Ceph have an option to ignore these flags?
>
> I certainly wouldn't recommend it, but there are probably use cases where
> it makes sense (i.e., the data isn't as important as the performance).
>
> Any such option would probably be called
>
>     rbd async flush danger danger = true
>
> and would trigger a flush but not wait for it, or perhaps
>
>     rbd ignore flush danger danger = true
>
> which would not honor flush at all.

qemu already has a cache=unsafe option which does exactly that.

> This would jeopardize the integrity of the file system living on the RBD
> image; file systems rely on flush to order their commits, and playing
> fast and loose with that can lead to any number of corruptions. The only
> silver lining is that in the not-so-distant past (3-4 years ago) this was
> poorly supported by the block layer and file systems alike, and ext3
> didn't crash and burn quite as often as you might have expected.
>
> Anyway, not something I would recommend, certainly not for a generic VM
> platform. Maybe if you have a specific performance-sensitive application
> you can afford to let crash and burn...
>
> sage
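(For reference, a sketch of how that qemu option is typically passed for an RBD-backed drive; the pool, image, and user names below are made up. cache=unsafe makes qemu acknowledge guest flushes without passing them down, whereas cache=writeback keeps the rbd cache but still honors flushes:)

    qemu-system-x86_64 \
        -drive file=rbd:rbd/myimage:id=admin,format=raw,if=virtio,cache=unsafe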