Okay, I get your point; it's much safer without a cache at all.

I am speaking from total ignorance here, so please correct me if I say
something wrong.

What I don't really understand is how poorly the DB space is used.

1- When an OSD is new, the DB might be almost empty and isn't used for
storing any actual data at all, so writes and reads could be sped up by
using that free space.

2- When an OSD is full, you may have tons of cold metadata that is never
used taking up all the space on the SSD. So maybe it wouldn't be a bad idea
to push that metadata to a cold DB on the HDD and try to bring the actual
hot data onto the fast device, so reads (or maybe writes) could be improved.
Having a hot ratio on the metadata could be interesting; I know metadata is
far more important than the actual data, but if the metadata is freezing
cold, I don't see the value of it occupying SSD space.

I know BlueStore also has a system cache, as I mentioned in this email:

https://www.spinics.net/lists/ceph-users/msg39426.html

but that cache doesn't include any data at all either, so it's hard for me
to understand how BlueStore can be so fast if it's limited to HDD speed.

If someone knows how the RocksDB-on-SSD arrangement actually works, how
BlueStore keeps its speed up, or why the whole metadata set should be on a
separate SSD, please tell me :) I am really trying to understand this topic.
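As I understand the docs being discussed, the recommended layout puts the
whole DB on the SSD. Provisioning it would look roughly like this (device
names are placeholders, and this is a sketch on my part, not a tested
command line):

```shell
# Hedged sketch: a BlueStore OSD with object data on the HDD and the
# RocksDB DB (metadata/k-v) on a ~30GB SSD partition.
# /dev/sdb and /dev/nvme0n1p1 are placeholder device names.
ceph-volume lvm create \
    --bluestore \
    --data /dev/sdb \
    --block.db /dev/nvme0n1p1
```

With this split, object data goes straight to the HDD and only the metadata
commits touch the SSD, which, as I understand it, is part of where BlueStore
gets its speed despite the HDD.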

On 14/10/2017 at 2:39, Kjetil Joergensen wrote:
> Generally on bcache & for that matter lvmcache & dmwriteboost.
>
> We did extensive "power off" testing with all of them and reliably
> managed to break it on our hardware setup.
>
> while true; boot box; start writing & stress metadata updates (i.e.
> make piles of files and unlink them, or you could find something else
> that's picky about write ordering); let it run for a bit; yank power;
> power on;
>
> This never survived for more than a night without badly corrupting
> some xfs filesystem. We did the same testing without caching and could
> not reproduce.
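Kjetil's torture loop above can't be fully scripted (the power yank is
physical), but the metadata-stress step he describes could be sketched like
this (directory and file count invented for illustration):

```shell
# Hedged sketch of the "make piles of files and unlink them" step.
# In the real test this would run in a loop until the power is yanked.
dir=$(mktemp -d)
for i in $(seq 1 1000); do
    echo "x" > "$dir/f$i"        # create a pile of small files
done
sync                             # push metadata updates toward disk
rm -f "$dir"/f*                  # unlink the whole pile again
echo "files remaining: $(ls "$dir" | wc -l)"   # should report 0
```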
>
> This may have been a quirk resulting from our particular setup, I get
> the impression that others use it and sleep well at night, but I'd
> recommend testing it under the most unforgivable circumstances you can
> think of before proceeding.
>
> -KJ
>
> On Thu, Oct 12, 2017 at 4:54 PM, Jorge Pinilla López <jorp...@unizar.es> 
> wrote:
>> Well, I wouldn't use bcache on FileStore at all.
>> First, there are the problems you have described, and second, and more
>> important, you got double writes (in FileStore, data was written to the
>> journal and to the storage disk at the same time); if the journal and the
>> data disk were the same device, throughput was cut in half, giving really
>> bad performance.
>>
>> In BlueStore things change quite a lot. First, there are no double writes:
>> there is no "journal" (well, there is something called a WAL, but it's not
>> used in the same way); data goes directly onto the data disk and you only
>> write a little metadata and make a commit into the DB. Rebalancing and
>> scrub go through RocksDB rather than a file system, making them much
>> simpler and more effective, so you aren't supposed to have all the
>> problems you had with FileStore.
>>
>> In addition, cache tiering has been deprecated in Red Hat Ceph Storage,
>> so I personally wouldn't use something deprecated by the developers and
>> support.
>>
>>
>> -------- Original Message --------
>> From: Marek Grzybowski <marek.grzybow...@gmail.com>
>> Date: 13/10/17 12:22 AM (GMT+01:00)
>> To: Jorge Pinilla López <jorp...@unizar.es>, ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] using Bcache on blueStore
>>
>> On 12.10.2017 20:28, Jorge Pinilla López wrote:
>>> Hey all!
>>> I have a Ceph cluster with multiple HDDs and one really fast SSD (30GB
>>> per OSD) per host.
>>>
>>> I have been thinking, and all the docs say that I should give all the
>>> SSD space to RocksDB, so I would have the HDD for data and a 30GB
>>> partition for RocksDB.
>>>
>>> But it came to my mind that if the OSD isn't full, maybe I am not using
>>> all the space on the SSD; or maybe I would prefer having a really small
>>> amount of hot k/v and metadata plus the data itself on a really fast
>>> device, rather than just storing all the cold metadata there.
>>>
>>> So I thought about using bcache to make the SSD a cache; as metadata and
>>> k/v are usually hot, they should end up in the cache. But this doesn't
>>> guarantee me that the k/v and metadata are actually always on the SSD,
>>> because under heavy cache load they can be pushed out (e.g. by really
>>> big data files).
>>>
>>> So I came up with the idea of setting aside a small 5-10GB partition for
>>> the hot RocksDB and using the rest as a cache. That way I make sure the
>>> really hot metadata is always on the SSD, and the colder metadata should
>>> also stay on the SSD (via bcache) as long as it is not really freezing;
>>> in that case it would be pushed to the HDD. It also doesn't make any
>>> sense to have metadata that you never use taking up space on the SSD; I
>>> would rather use that space to store hotter data.
>>>
>>> This would also make writes faster, and in BlueStore we don't have the
>>> double-write problem, so it should work fine.
>>>
>>> What do you think about this? Does it have any downside? Is there any
>>> other way?
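For concreteness, the split I am proposing (a small DB partition plus bcache
over the HDD) would be set up roughly like this with bcache's standard
tools; device names are placeholders and I have not tested this:

```shell
# Hedged sketch: /dev/sdb is the backing HDD, /dev/nvme0n1p2 is the SSD
# space left over after carving out the 5-10GB RocksDB partition.
make-bcache -B /dev/sdb          # format the HDD as a backing device
make-bcache -C /dev/nvme0n1p2    # format the leftover SSD as a cache set
# attach the cache set to the backing device (UUID from bcache-super-show)
echo <cset-uuid> > /sys/block/bcache0/bcache/attach
# the OSD would then use /dev/bcache0 for data and the small SSD
# partition for the DB
```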
>> Hi Jorge
>>   I was inexperienced and tried bcache on an old FileStore once. It was
>> bad, mostly because bcache does not have any typical disk-scheduling
>> algorithm, so when a scrub or rebalance was running, latency on that
>> storage was very high and unpredictable. The OSD daemon could not set any
>> ioprio for disk reads or writes, and additionally the bcache cache was
>> poisoned by scrub/rebalance.
>>
>> Fortunately for me, it is very easy to do a rolling replacement of the
>> OSDs. I now use some SSD partitions for journals and what is left for
>> pure SSD storage. This works really great.
>>
>> If I ever need a cache, I will use cache tiering instead.
>>
>>
>> --
>>   Kind Regards
>>     Marek Grzybowski
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>

-- 
------------------------------------------------------------------------
*Jorge Pinilla López*
jorp...@unizar.es
Computer engineering student
Systems area intern (SICUZ)
Universidad de Zaragoza
PGP-KeyID: A34331932EBC715A
<http://pgp.rediris.es:11371/pks/lookup?op=get&search=0xA34331932EBC715A>
------------------------------------------------------------------------