Re: Ignore O_SYNC for rbd cache

2012-10-12 Thread Tommi Virtanen
On Wed, Oct 10, 2012 at 9:23 AM, Sage Weil s...@inktank.com wrote:
 I certainly wouldn't recommend it, but there are probably use cases where
 it makes sense (i.e., the data isn't as important as the performance).

This would make a lot of sense for e.g. service orchestration-style
setups where you run an elastic pool of webapps. The persistent
storage is the database, not the local disk, but you might still e.g.
spool uploads to local disk first, or have a local cache a la Varnish.
Crashing a machine in such a setup tends to mean deleting the image,
not trying to recover it.

Also, for anyone running virtualized MapReduce worker nodes: CephFS
plugged in as the FS, compute wanting local storage for the temporary
files, but crashes just mean the task is restarted elsewhere.

(Now, in both of the above, you might ask, why not use a local disk
for this then, why use RBD? Because a lot of people are interested in
running diskless compute servers, or ones booting off of a minimal
SSD/SD-card, with just the base OS, no VM images stored locally.
Tremendously helps with density, especially on low-power platforms
like ARM.)


Ignore O_SYNC for rbd cache

2012-10-10 Thread Andrey Korolyov
Hi,

Recent tests on my test rack with a 20G IB interconnect (IPoIB, 64k MTU,
default CUBIC, CFQ, LSI SAS 2108 w/ writeback cache) show quite
fantastic performance - on both reads and writes Ceph fully utilizes
the disk bandwidth, reaching about 0.9 of the theoretical limit of the
sum of all disk bandwidths once the replication level is taken into
account. The only thing that may bring overall performance down is
O_SYNC|O_DIRECT writes, which almost every database server issues in
its default setup. Assuming the database config may be untouchable and
that I can somehow build a very reliable hardware setup that will never
lose power, should Ceph have an option to ignore these flags? Maybe
there are other real-world cases for including such an option, or
perhaps I am very wrong to even think of fooling the client application
in this way.
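
To be concrete, the write pattern I mean is the classic direct,
synchronous one; a minimal sketch (the file name and sizes are made up,
not taken from any particular database):

    /* Minimal sketch of an O_SYNC|O_DIRECT write, the pattern database
     * servers use for their journal/WAL files.  Hypothetical path. */
    #define _GNU_SOURCE             /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/var/lib/db/journal",
                      O_WRONLY | O_CREAT | O_SYNC | O_DIRECT, 0600);
        if (fd < 0)
            return 1;

        /* O_DIRECT requires buffers aligned to the logical block size. */
        void *buf;
        if (posix_memalign(&buf, 4096, 4096))
            return 1;
        memset(buf, 0, 4096);

        /* With O_SYNC the write does not return until the data is
         * reported stable, so on RBD it must wait for the OSDs to ack
         * the commit; a client-side cache cannot hide that latency. */
        ssize_t n = pwrite(fd, buf, 4096, 0);

        free(buf);
        close(fd);
        return n == 4096 ? 0 : 1;
    }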

Thank you for any suggestion!


Re: Ignore O_SYNC for rbd cache

2012-10-10 Thread Sage Weil
On Wed, 10 Oct 2012, Andrey Korolyov wrote:
 Hi,
 
 Recent tests on my test rack with a 20G IB interconnect (IPoIB, 64k MTU,
 default CUBIC, CFQ, LSI SAS 2108 w/ writeback cache) show quite
 fantastic performance - on both reads and writes Ceph fully utilizes
 the disk bandwidth, reaching about 0.9 of the theoretical limit of the
 sum of all disk bandwidths once the replication level is taken into
 account. The only thing that may bring overall performance down is
 O_SYNC|O_DIRECT writes, which almost every database server issues in
 its default setup. Assuming the database config may be untouchable and
 that I can somehow build a very reliable hardware setup that will never
 lose power, should Ceph have an option to ignore these flags? Maybe
 there are other real-world cases for including such an option, or
 perhaps I am very wrong to even think of fooling the client application
 in this way.

I certainly wouldn't recommend it, but there are probably use cases where 
it makes sense (i.e., the data isn't as important as the performance).  
Any such option would probably be called

 rbd async flush danger danger = true

and would trigger a flush but not wait for it, or perhaps

 rbd ignore flush danger danger = true

which would not honor flush at all. 
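
Purely to make the shape of that concrete, such an option would
presumably sit in ceph.conf next to the existing rbd cache settings,
roughly like this (the danger-danger line is only the proposal above,
not an option that exists today; the numeric value is just an example):

  [client]
      rbd cache = true                        # existing option
      rbd cache max dirty = 25165824          # existing option (bytes)
      # hypothetical, only as proposed above - not a real option:
      #rbd ignore flush danger danger = true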

This would jeopardize the integrity of the file system living on the RBD 
image; file systems rely on flush to order their commits, and playing fast 
and loose with that can lead to any number of corruptions.  The only 
silver lining is that in the not-so-distant past (3-4 years ago) this was 
poorly supported by the block layer and file systems alike, and ext3 didn't 
crash and burn quite as often as you might have expected.

Anyway, not something I would recommend, certainly not for a generic VM 
platform.  Maybe if you have a specific performance-sensitive application 
you can afford to let crash and burn...

sage


Re: Ignore O_SYNC for rbd cache

2012-10-10 Thread Josh Durgin

On 10/10/2012 09:23 AM, Sage Weil wrote:

I certainly wouldn't recommend it, but there are probably use cases where
it makes sense (i.e., the data isn't as important as the performance).
Any such option would probably be called

  rbd async flush danger danger = true

and would trigger a flush but not wait for it, or perhaps

  rbd ignore flush danger danger = true

which would not honor flush at all.


QEMU already has a cache=unsafe option which does exactly that.
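
For an RBD-backed guest that would be something along these lines
(pool and image names are placeholders):

  qemu-system-x86_64 -m 1024 \
      -drive file=rbd:mypool/myimage,format=raw,cache=unsafe

cache=unsafe behaves like writeback but turns guest flushes into
no-ops; the libvirt equivalent is cache='unsafe' on the disk's
<driver> element.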


