Take a look at these, which should answer at least some of your questions:

http://ceph.com/community/new-luminous-bluestore/
http://ceph.com/planet/understanding-bluestore-cephs-new-storage-backend/
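
As a quick sanity check on where the DB and WAL of an existing bluestore OSD actually ended up, looking at the OSD's data directory should tell you (this assumes the usual mount location, with N standing in for the OSD id):

    # block, block.db and block.wal are symlinks to the devices in use;
    # no separate block.wal symlink means the WAL lives on the DB device
    # (or on the main block device, if there is no block.db either)
    ls -l /var/lib/ceph/osd/ceph-N/ | grep block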

On Mon, Sep 11, 2017 at 8:45 PM, Richard Hesketh <richard.hesk...@rd.bbc.co.uk> wrote:
> On 08/09/17 11:44, Richard Hesketh wrote:
>> Hi,
>>
>> Reading the ceph-users list I'm obviously seeing a lot of people talking about using bluestore now that Luminous has been released. I note that many users seem to be under the impression that they need separate block devices for the bluestore data block, the DB, and the WAL... even when they are going to put the DB and the WAL on the same device!
>>
>> As per the docs at http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/ this is nonsense:
>>
>>> If there is only a small amount of fast storage available (e.g., less than a gigabyte), we recommend using it as a WAL device. If there is more, provisioning a DB device makes more sense. The BlueStore journal will always be placed on the fastest device available, so using a DB device will provide the same benefit that the WAL device would while also allowing additional metadata to be stored there (if it will fix). [sic, I assume that should be "fit"]
>>
>> I understand that if you've got three speeds of storage available, there may be some sense in dividing these. For instance, if you've got lots of HDD, a bit of SSD, and a tiny NVMe available in the same host, data on HDD, DB on SSD and WAL on NVMe may be a sensible division of data. That's not the case for most of the examples I'm reading; they're talking about putting DB and WAL on the same block device, but in different partitions. There's even one example of someone suggesting partitioning a single SSD to put data/DB/WAL all in separate partitions!
>>
>> Are the docs wrong and/or am I missing something about optimal bluestore setup, or do people simply have the wrong end of the stick? I ask because I'm just going through switching all my OSDs over to Bluestore now and I've just been reusing the partitions I set up for journals on my SSDs as DB devices for Bluestore HDDs, without specifying anything to do with the WAL, and I'd like to know sooner rather than later if I'm making some sort of horrible mistake.
>>
>> Rich
>
> Having had no explanatory reply so far, I'll ask further.
>
> I have been continuing to update my OSDs, and so far the performance offered by bluestore has been somewhat underwhelming. Recovery operations after replacing the Filestore OSDs with Bluestore equivalents have been much slower than expected, not even half the speed of recovery ops when I was upgrading Filestore OSDs with larger disks a few months ago. This contributes to my sense that I am doing something wrong.
>
> I've found that if I allow ceph-disk to partition my DB SSDs rather than reusing the rather large journal partitions I originally created for Filestore, it only creates very small 1GB partitions. Searching for bluestore configuration parameters has pointed me towards the bluestore_block_db_size and bluestore_block_wal_size config settings. Unfortunately these settings are completely undocumented, so I'm not sure what their functional purpose is.
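>
> My best guess is that these are sizes in bytes which ceph-disk consults when it carves up a device, in which case presumably setting something like the following in ceph.conf before running "ceph-disk prepare" would get me a larger DB partition (the 30GB figure is purely an illustrative guess on my part, not a recommendation I've found anywhere):
>
>     [osd]
>     # apparently a size in bytes: 32212254720 = 30GB
>     bluestore_block_db_size = 32212254720
>
> But with the settings undocumented I can't tell whether that is actually how they're meant to be used.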
>
> In any event, in my running config I seem to have the following default values:
>
> # ceph-conf --show-config | grep bluestore
> ...
> bluestore_block_create = true
> bluestore_block_db_create = false
> bluestore_block_db_path =
> bluestore_block_db_size = 0
> bluestore_block_path =
> bluestore_block_preallocate_file = false
> bluestore_block_size = 10737418240
> bluestore_block_wal_create = false
> bluestore_block_wal_path =
> bluestore_block_wal_size = 100663296
> ...
>
> I have been creating bluestore osds by:
>
> ceph-disk prepare --bluestore /dev/sdX --block.db /dev/sdY1 --osd-id Z  # re-using existing partitions for DB
>
> or
>
> ceph-disk prepare --bluestore /dev/sdX --block.db /dev/sdY --osd-id Z  # letting ceph-disk partition DB, after zapping original partitions
>
> Are these sane values? What does it mean that block_db_size is 0 - is it just using the entire block device specified, or not actually using it at all? Is the WAL actually being placed on the DB block device? And is that 1GB default really a sensible size for the DB partition?
>
> Rich

-- 
Cheers,

Brad

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com