Re: [ceph-users] Local SSD cache for ceph on each compute node.

2016-03-27 Thread Daniel Niasoff
Hi Ric,

But you would still have to set up a dm-cache device per rbd volume, which 
makes it difficult to manage (a rough sketch of that per-volume plumbing is 
below).
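
To illustrate what that per-volume plumbing looks like (a sketch only; the 
pool, image, VG and device names are made up), with lvmcache on a compute 
node it is roughly:

    # map the rbd image locally
    rbd map volumes/vm1-disk                  # -> /dev/rbd0

    # one volume group per rbd volume, spanning it and a slice of local SSD
    pvcreate /dev/rbd0 /dev/nvme0n1p1
    vgcreate vg-vm1 /dev/rbd0 /dev/nvme0n1p1

    # origin LV on the rbd device, cache pool + metadata on the SSD
    lvcreate -n origin -l 100%PVS vg-vm1 /dev/rbd0
    lvcreate -n cpool -L 20G vg-vm1 /dev/nvme0n1p1
    lvcreate -n cpool_meta -L 1G vg-vm1 /dev/nvme0n1p1
    lvconvert --type cache-pool --poolmetadata vg-vm1/cpool_meta vg-vm1/cpool
    lvconvert --type cache --cachemode writeback --cachepool vg-vm1/cpool vg-vm1/origin

All of that, repeated for every volume on every compute node, is the 
management burden I mean.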

There needs to be a global setting, either within kvm or ceph, that caches 
reads/writes before they hit the rbd device.
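
As far as I know, the closest thing to a global knob today is librbd's 
client-side cache, but that is RAM-based rather than an SSD tier. Something 
like this in ceph.conf on each compute node (sizes are only examples):

    [client]
    rbd cache = true
    rbd cache size = 134217728                 # 128 MB per volume, in RAM
    rbd cache max dirty = 100663296            # dirty bytes allowed before writeback
    rbd cache writethrough until flush = true  # stay writethrough until the guest flushes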

Thanks

Daniel

-Original Message-
From: Ric Wheeler [mailto:rwhee...@redhat.com] 
Sent: 27 March 2016 09:00
To: Van Leeuwen, Robert ; Daniel Niasoff 
; Jason Dillaman 
Cc: ceph-users@lists.ceph.com; Mike Snitzer ; Joe Thornber 

Subject: Re: [ceph-users] Local SSD cache for ceph on each compute node.

On 03/16/2016 12:15 PM, Van Leeuwen, Robert wrote:
>> My understanding of how a writeback cache should work is that it should only 
>> take a few seconds for writes to be streamed onto the network, and that it is 
>> focused on resolving the speed issue of small sync writes. The writes would 
>> be bundled into larger writes that are not time sensitive.
>>
>> So there is potential for a few seconds' data loss, but compared to the 
>> current trend of using ephemeral storage to solve this issue, it's a major 
>> improvement.
> I think it is a bit worse than just a few seconds of data:
> As mentioned in the blueprint for ceph, you would need some kind of ordered 
> write-back cache that maintains checkpoints internally.
>
> I am not that familiar with the internals of dm-cache, but I do not think it 
> guarantees any write ordering.
> E.g. by default it will bypass the cache for sequential IO (see the tunable 
> sketch after this quote).
>
> So I think it is very likely that "few seconds of data loss" in this case 
> means the filesystem is corrupt and you could lose the whole thing.
> At the very least you will need to run fsck on it and hope it can sort out 
> all of the errors with minimal data loss.
>
>
> So, for me, it seems contradictory to use persistent storage and then hope 
> your volumes survive a power outage.
>
> If you can survive missing that data, you are probably better off running 
> fully from ephemeral storage in the first place.
>
> Cheers,
> Robert van Leeuwen
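
(On the sequential-IO bypass Robert mentions above: with the older mq cache 
policy this behaviour was tunable per device; the newer smq policy dropped 
these knobs. A rough sketch only, with a hypothetical device name and 
threshold units as described in the kernel's cache-policies documentation:)

    # show the active cache policy and hit/miss counters
    dmsetup status vg--vm1-origin

    # mq policy only: adjust the point at which IO is classed as
    # sequential and made to bypass the cache
    dmsetup message vg--vm1-origin 0 sequential_threshold 1024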

Hi Robert,

I might be misunderstanding your point above, but dm-cache provides persistent 
storage. It will be there when you reboot and look for data on that same box. 
dm-cache is also power failure safe and tested to survive this kind of outage.

If you try to look at the rbd device under dm-cache from another host, of 
course any data that was cached on the dm-cache layer will be missing since the 
dm-cache device itself is local to the host you wrote the data from originally.

In a similar way, using dm-cache for write caching (or any write cache local to 
a client) will also mean that your data has a single point of failure since 
that data will not be replicated out to the backing store until it is destaged 
from cache.
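
To put numbers on that window, you can watch how much dirty data is sitting 
in the cache, and close the window entirely by switching modes or flushing 
(a sketch, reusing the hypothetical names from earlier in the thread):

    # blocks not yet destaged to the rbd-backed origin
    lvs -o name,cache_total_blocks,cache_dirty_blocks vg-vm1

    # remove the single point of failure: go writethrough, or flush
    # the dirty blocks and detach the cache altogether
    lvconvert --cachemode writethrough vg-vm1/origin
    lvconvert --splitcache vg-vm1/origin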

I would note that this is exactly the kind of write cache that is popular these 
days in front of enterprise storage arrays on clients, so this is not really 
uncommon.

Regards,

Ric


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

