[ceph-users] Is it possible to stripe rados object?

2022-01-26 Thread lin yunfan
Hi,
I know that with RBD and CephFS there is a striping setting to stripe data
across multiple RADOS objects.
Is it possible to use the librados API to stripe a large object into many
small ones?
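
As far as I know, libradosstriper (a separate C/C++ library shipped with
Ceph) can do this transparently on top of librados; with plain librados you
can also stripe client-side. A minimal sketch using the Python rados
bindings follows; the pool name, conf path, and 4 MiB stripe size are
illustrative assumptions, not a recommendation:

# Minimal client-side striping sketch using the Python rados bindings.
# Pool name, conf path, and stripe size are illustrative assumptions.
import rados

STRIPE_SIZE = 4 * 1024 * 1024  # 4 MiB per stripe object (assumed)

def write_striped(ioctx, name, data):
    """Split data into fixed-size chunks, one RADOS object per chunk."""
    for i, off in enumerate(range(0, len(data), STRIPE_SIZE)):
        ioctx.write_full("%s.%08d" % (name, i), data[off:off + STRIPE_SIZE])

def read_striped(ioctx, name, total_size):
    """Read the chunks back in order and reassemble the original payload."""
    parts = []
    for i in range((total_size + STRIPE_SIZE - 1) // STRIPE_SIZE):
        parts.append(ioctx.read("%s.%08d" % (name, i), STRIPE_SIZE))
    return b"".join(parts)

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("testpool")          # assumed pool name
payload = b"x" * (10 * STRIPE_SIZE + 123)
write_striped(ioctx, "bigobject", payload)
assert read_striped(ioctx, "bigobject", len(payload)) == payload
ioctx.close()
cluster.shutdown()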

linyunfan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW resharding

2020-05-25 Thread lin yunfan
I think the recommended limit is roughly 100K objects per shard.
If you have many objects but they are spread across many
buckets/containers, and each bucket/container holds fewer than 1.6M
objects (with max_shards=16), then you should be OK.
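
As a quick back-of-the-envelope check (the object count is taken from the
pool stats quoted below; the 100K-per-shard figure is the rule of thumb
above):

# Rough bucket-index shard arithmetic for the rule of thumb above.
import math

objects_per_shard = 100_000      # recommended ceiling per shard
max_shards = 16                  # rgw_override_bucket_index_max_shards below

# With 16 shards, a single bucket stays comfortable up to about:
per_bucket_capacity = objects_per_shard * max_shards
print(per_bucket_capacity)       # 1600000 objects

# If all 78M objects quoted below lived in ONE bucket, it would need roughly:
objects_total = 78_067_284
print(math.ceil(objects_total / objects_per_shard))   # ~781 shards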

linyunfan

Adrian Nicolae wrote on Mon, May 25, 2020 at 3:04 PM:
>
> I'm using only Swift , not S3.  We have a container for every customer.
> Right now there are thousands of containers.
>
>
>
> On 5/25/2020 9:02 AM, lin yunfan wrote:
> > Can you store your data in different buckets?
> >
> > linyunfan
> >
> > Adrian Nicolae wrote on Tue, May 19, 2020 at 3:32 PM:
> >> Hi,
> >>
> >> I have the following Ceph Mimic setup :
> >>
> >> - a bunch of old servers with 3-4 SATA drives each (74 OSDs in total)
> >>
> >> - index/leveldb is stored on each OSD (so no SSD drives, just SATA)
> >>
> >> - the current usage  is :
> >>
> >> GLOBAL:
> >>     SIZE      AVAIL     RAW USED   %RAW USED
> >>     542 TiB   105 TiB   437 TiB    80.67
> >> POOLS:
> >>     NAME                        ID   USED      %USED   MAX AVAIL   OBJECTS
> >>     .rgw.root                   1    1.1 KiB   0       26 TiB      4
> >>     default.rgw.control         2    0 B       0       26 TiB      8
> >>     default.rgw.meta            3    20 MiB    0       26 TiB      75357
> >>     default.rgw.log             4    0 B       0       26 TiB      4271
> >>     default.rgw.buckets.data    5    290 TiB   85.05   51 TiB      78067284
> >>     default.rgw.buckets.non-ec  6    0 B       0       26 TiB      0
> >>     default.rgw.buckets.index   7    0 B       0       26 TiB      603008
> >>
> >> - rgw_override_bucket_index_max_shards = 16.   Clients are accessing RGW
> >> via Swift, not S3.
> >>
> >> - the replication schema is EC 4+2.
> >>
> >> We are using this Ceph cluster as  a secondary storage for another
> >> storage infrastructure (which is more expensive) and we are offloading
> >> cold data (big files with a low number of downloads/reads from our
> >> customer). This way we can lower the TCO .  So most of the files are big
> >> ( a few GB at least).
> >>
> >>    So far Ceph is doing well considering that I don't have big
> >> expectations from the current hardware. I'm a bit worried, however, that we
> >> have 78M objects with max_shards=16 and we will probably reach 100M in
> >> the next few months. Do I need to increase the max shards to ensure the
> >> stability of the cluster? I read that storing more than 1M objects
> >> in a single bucket can lead to OSDs flapping or having IO timeouts
> >> during deep-scrub, or even to OSD failures due to leveldb
> >> compacting all the time if we have a large number of DELETEs.
> >>
> >> Any advice would be appreciated.
> >>
> >>
> >> Thank you,
> >>
> >> Adrian Nicolae
> >>
> >>
> >>
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW resharding

2020-05-24 Thread lin yunfan
Can you store your data in different buckets?

linyunfan

Adrian Nicolae wrote on Tue, May 19, 2020 at 3:32 PM:
>
> Hi,
>
> I have the following Ceph Mimic setup :
>
> - a bunch of old servers with 3-4 SATA drives each (74 OSDs in total)
>
> - index/leveldb is stored on each OSD (so no SSD drives, just SATA)
>
> - the current usage  is :
>
> GLOBAL:
>     SIZE      AVAIL     RAW USED   %RAW USED
>     542 TiB   105 TiB   437 TiB    80.67
> POOLS:
>     NAME                        ID   USED      %USED   MAX AVAIL   OBJECTS
>     .rgw.root                   1    1.1 KiB   0       26 TiB      4
>     default.rgw.control         2    0 B       0       26 TiB      8
>     default.rgw.meta            3    20 MiB    0       26 TiB      75357
>     default.rgw.log             4    0 B       0       26 TiB      4271
>     default.rgw.buckets.data    5    290 TiB   85.05   51 TiB      78067284
>     default.rgw.buckets.non-ec  6    0 B       0       26 TiB      0
>     default.rgw.buckets.index   7    0 B       0       26 TiB      603008
>
> - rgw_override_bucket_index_max_shards = 16.   Clients are accessing RGW
> via Swift, not S3.
>
> - the replication schema is EC 4+2.
>
> We are using this Ceph cluster as  a secondary storage for another
> storage infrastructure (which is more expensive) and we are offloading
> cold data (big files with a low number of downloads/reads from our
> customer). This way we can lower the TCO .  So most of the files are big
> ( a few GB at least).
>
>    So far Ceph is doing well considering that I don't have big
> expectations from the current hardware. I'm a bit worried, however, that we
> have 78M objects with max_shards=16 and we will probably reach 100M in
> the next few months. Do I need to increase the max shards to ensure the
> stability of the cluster? I read that storing more than 1M objects
> in a single bucket can lead to OSDs flapping or having IO timeouts
> during deep-scrub, or even to OSD failures due to leveldb
> compacting all the time if we have a large number of DELETEs.
>
> Any advice would be appreciated.
>
>
> Thank you,
>
> Adrian Nicolae
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cluster network and public network

2020-05-14 Thread lin yunfan
That is correct. I didn't explain it clearly. I said that because in
some write-only scenarios the public network and the cluster network will
both be saturated at the same time.
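
As a rough illustration of that relationship (following the
(replication factor - 1) rule Janne describes below; the client throughput
figure is purely illustrative):

# Back-of-the-envelope front/back network load for replicated client writes.
# Client throughput and replication factor are illustrative assumptions.
client_write_MBps = 1000                 # aggregate client writes to primary OSDs
replication_factor = 3

public_net_MBps = client_write_MBps                              # clients -> primaries
cluster_net_MBps = client_write_MBps * (replication_factor - 1)  # primaries -> replicas

print(public_net_MBps, cluster_net_MBps)   # 1000 2000
# With 2x replication the two networks carry roughly the same load, which is
# the "almost the same bandwidth" case mentioned above.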
linyunfan

Janne Johansson wrote on Thu, May 14, 2020 at 3:42 PM:
>
> On Thu, May 14, 2020 at 08:42, lin yunfan wrote:
>>
>> Besides the recovery scenario, in a write-only scenario the cluster
>> network will use almost the same bandwidth as the public network.
>
>
> That would depend on the replication factor. If it is high, I would assume 
> every MB from the client network would make (repl-factor - 1) times the data 
> on the private network to send replication requests to the other OSD hosts 
> with the same amount of data.
>
> --
> May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cluster network and public network

2020-05-13 Thread lin yunfan
Besides the recovery scenario, in a write-only scenario the cluster
network will use almost the same bandwidth as the public network.

linyunfan

Anthony D'Atri wrote on Sat, May 9, 2020 at 4:32 PM:
>
>
> > Hi,
> >
> > I deployed few clusters with two networks as well as only one network.
> > There has little impact between them for my experience.
> >
> > I did a performance test on nautilus cluster with two networks last week.
> > What I found is that the cluster network has low bandwidth usage
>
> During steady-state, sure.  Heartbeats go over that, as do replication ops 
> when clients write data.
>
> During heavy recovery or backfill, including healing from failures, 
> balancing, adding/removing drives, much more will be used.
>
> Conventional wisdom has been not to let that traffic DoS the clients, or the clients
> DoS the heartbeats.
>
> But this I think dates to a time when 1Gb/s networks were common.  If one’s 
> using modern multiple/bonded 25Gb/s or 40Gb/s links ….
>
> > while public network bandwidth is nearly full.
>
> If your public network is saturated, that actually is a problem, last thing 
> you want is to add recovery traffic, or to slow down heartbeats.  For most 
> people, it isn’t saturated.
>
> How do you define “full” ?  TOR uplinks?  TORs to individual nodes?  Switch 
> backplanes?  Are you using bonding with the wrong hash policy?
>
> > As a result, I don't think the cluster network is necessary.
>
> For an increasing percentage of folks deploying production-quality clusters, 
> agreed.
>
> >
> >
> > Willi Schiegel wrote on Fri, May 8, 2020 at 6:14 PM:
> >
> >> Hello Nghia,
> >>
> >> I once asked a similar question about network architecture and got the
> >> same answer as Martin wrote from Wido den Hollander:
> >>
> >> There is no need to have a public and cluster network with Ceph. Working
> >> as a Ceph consultant I've deployed multi-PB Ceph clusters with a single
> >> public network without any problems. Each node has a single IP-address,
> >> nothing more, nothing less.
> >>
> >> In the current Ceph manual you can read
> >>
> >> It is possible to run a Ceph Storage Cluster with two networks: a public
> >> (front-side) network and a cluster (back-side) network. However, this
> >> approach complicates network configuration (both hardware and software)
> >> and does not usually have a significant impact on overall performance.
> >> For this reason, we generally recommend that dual-NIC systems either be
> >> configured with two IPs on the same network, or bonded.
> >>
> >> I followed the advice from Wido "One system, one IP address" and
> >> everything works fine. So, you should be fine with one interface for
> >> MONs, MGRs, and OSDs.
> >>
> >> Best
> >> Willi
> >>
> >> On 5/8/20 11:57 AM, Nghia Viet Tran wrote:
> >>> Hi Martin,
> >>>
> >>> Thanks for your response. You mean one network interface for only the MON
> >>> hosts, or for the whole cluster including the OSD hosts? I'm confused now
> >>> because there are some projects that only use one public network for the
> >>> whole cluster. That means the rebalancing, object replication and
> >>> heartbeats from the OSD hosts would affect the performance of Ceph clients.
> >>>
> >>> *From: *Martin Verges 
> >>> *Date: *Friday, May 8, 2020 at 16:20
> >>> *To: *Nghia Viet Tran 
> >>> *Cc: *"ceph-users@ceph.io" 
> >>> *Subject: *Re: [ceph-users] Cluster network and public network
> >>>
> >>> Hello Nghia,
> >>>
> >>> just use one network interface card and use frontend and backend traffic
> >>> on the same. No problem with that.
> >>>
> >>> If you have a dual port card, use both ports as an LACP channel and
> >>> maybe separate it using VLANs if you want to, but not required as well.
> >>>
> >>>
> >>> --
> >>>
> >>> Martin Verges
> >>> Managing director
> >>>
> >>> Mobile: +49 174 9335695
> >>> E-Mail: martin.ver...@croit.io 
> >>> Chat: https://t.me/MartinVerges
> >>>
> >>> croit GmbH, Freseniusstr. 31h, 81247 Munich
> >>> CEO: Martin Verges - VAT-ID: DE310638492
> >>> Com. register: Amtsgericht Munich HRB 231263
> >>>
> >>> Web: https://croit.io
> >>> YouTube: https://goo.gl/PGE1Bx
> >>>
> >>> On Fri, May 8, 2020 at 09:29, Nghia Viet Tran
> >>> (nghia.viet.t...@mgm-tp.com) wrote:
> >>>
> >>>Hi everyone,
> >>>
> >>>I have a question about the network setup. From the document, It’s
> >>>recommended to have 2 NICs per hosts as described in below picture
> >>>
> >>>Diagram
> >>>
> >>>In the picture, OSD hosts will connect to the Cluster network for
> >>>replicate and heartbeat between OSDs, therefore, we definitely need
> >>>2 NICs for it. But seems there are no connections between Ceph MON
> >>>and Cluster network. Can we install 1 NIC on Ceph MON then?
> >>>
> >>>I appreciated any comments!
> >>>
> >>>Thank you!
> >>>
> >>>--
> >>>
> >>>Nghia Viet Tran (Mr)
> >>>
> >>>___
> >>>ceph-users mailing list -- ceph-users@ceph.io
> >>>

[ceph-users] Re: Bluestore - How to review config?

2020-05-05 Thread lin yunfan
Is there a way to get the block, block.db, and block.wal paths and sizes?
What if all of them, or some of them, are colocated on one disk?

I can get the info from an OSD with colocated WAL/DB/block like below:

ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
{
"/var/lib/ceph/osd/ceph-0//block": {
"osd_uuid": "199f2445-af9e-4172-8231-6d98858684e8",
"size": 107268255744,
"btime": "2020-02-15 21:53:42.972004",
"description": "main",
"bluefs": "1",
"ceph_fsid": "83a73817-3566-4044-91b6-22cee6753515",
"kv_backend": "rocksdb",
"magic": "ceph osd volume v026",
"mkfs_done": "yes",
"ready": "ready",
"require_osd_release": "\u000c",
"whoami": "0"
}
}
but there is no path for the block device.

ceph osd metadata 0
{
"id": 0,
"arch": "x86_64",
"back_addr": "172.18.2.178:6801/12605",
"back_iface": "ens160",
"bluefs": "1",
"bluefs_db_access_mode": "blk",
"bluefs_db_block_size": "4096",
"bluefs_db_dev": "8:16",
"bluefs_db_dev_node": "sdb",
"bluefs_db_driver": "KernelDevice",
"bluefs_db_model": "Virtual disk",
"bluefs_db_partition_path": "/dev/sdb2",
"bluefs_db_rotational": "1",
"bluefs_db_size": "107268255744",
"bluefs_db_type": "hdd",
"bluefs_single_shared_device": "1",
"bluestore_bdev_access_mode": "blk",
"bluestore_bdev_block_size": "4096",
"bluestore_bdev_dev": "8:16",
"bluestore_bdev_dev_node": "sdb",
"bluestore_bdev_driver": "KernelDevice",
"bluestore_bdev_model": "Virtual disk",
"bluestore_bdev_partition_path": "/dev/sdb2",
"bluestore_bdev_rotational": "1",
"bluestore_bdev_size": "107268255744",
"bluestore_bdev_type": "hdd",
"ceph_version": "ceph version 12.2.13
(584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)",
"cpu": "Intel(R) Xeon(R) CPU E7-4820 v3 @ 1.90GHz",
"default_device_class": "hdd",
"distro": "ubuntu",
"distro_description": "Ubuntu 18.04.3 LTS",
"distro_version": "18.04",
"front_addr": "172.18.2.178:6800/12605",
"front_iface": "ens160",
"hb_back_addr": "172.18.2.178:6802/12605",
"hb_front_addr": "172.18.2.178:6803/12605",
"hostname": "ceph",
"journal_rotational": "1",
"kernel_description": "#92-Ubuntu SMP Fri Feb 28 11:09:48 UTC 2020",
"kernel_version": "4.15.0-91-generic",
"mem_swap_kb": "4194300",
"mem_total_kb": "8168160",
"os": "Linux",
"osd_data": "/var/lib/ceph/osd/ceph-0",
"osd_objectstore": "bluestore",
"rotational": "1"
}
There are paths and sizes here. Does bdev refer to the block device in ceph-bluestore-tool?


From ceph daemon osd.0 config show:

block
"bluestore_block_path": "",
"bluestore_block_size": "10737418240",

db
"bluestore_block_db_create": "false",
"bluestore_block_db_path": "",
"bluestore_block_db_size": "0",

wal
"bluestore_block_wal_create": "false",
"bluestore_block_wal_path": "",
"bluestore_block_wal_size": "100663296",

There is no path info here, and it only has the WAL size.

What is the best way to get the path and size information for
block, block.db, and block.wal?
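
For what it's worth, a small sketch that pulls the device paths and sizes
out of ceph osd metadata. The key names follow the output shown above; the
bluefs_wal_* keys for a separate WAL device are an assumption by analogy to
the bluefs_db_* keys:

# Sketch: report block / db / wal device paths and sizes for one OSD,
# based on the `ceph osd metadata` keys shown above.
import json
import subprocess

osd_id = "0"   # illustrative OSD id
meta = json.loads(subprocess.check_output(
    ["ceph", "osd", "metadata", osd_id, "--format", "json"]))

def report(label, path_key, size_key):
    path = meta.get(path_key)
    size = int(meta.get(size_key, 0))
    print("%-6s %-20s %d bytes" % (label, path or "(colocated/none)", size))

report("block", "bluestore_bdev_partition_path", "bluestore_bdev_size")
report("db",    "bluefs_db_partition_path",      "bluefs_db_size")
report("wal",   "bluefs_wal_partition_path",     "bluefs_wal_size")

# bluefs_single_shared_device == "1" (as in the output above) means DB/WAL
# share the main block device rather than living on their own partitions.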




linyunfan

Igor Fedotov wrote on Tue, May 5, 2020 at 10:47 PM:
>
> Hi Dave,
>
> wouldn't this help (particularly "Viewing runtime settings" section):
>
> https://docs.ceph.com/docs/nautilus/rados/configuration/ceph-conf/
>
>
> Thanks,
>
> Igor
>
> On 5/5/2020 2:52 AM, Dave Hall wrote:
> > Hello,
> >
> > Sorry if this has been asked before...
> >
> > A few months ago I deployed a small Nautilus cluster using
> > ceph-ansible.  The OSD nodes have multiple spinning drives and a PCI
> > NVMe.   Now that the cluster has been stable for a while it's time to
> > start optimizing performance.
> >
> > While I can tell that there is a part of the NVMe associated with each
> > OSD, I'm trying to verify which BlueStore components are using the
> > NVMe - WAL, DB, Cache - and whether the configuration generated by
> > ceph-ansible (and my settings in osds.yml) is optimal for my hardware.
> >
> > I've searched around a bit and, while I have found documentation on
> > how to configure, reconfigure, and repair a BlueStore OSD, I haven't
> > found anything on how to query the current configuration.
> >
> > Could anybody point me to a command or link to documentation on this?
> >
> > Thanks.
> >
> > -Dave
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Dear Abby: Why Is Architecting CEPH So Hard?

2020-04-23 Thread lin yunfan
Hi Martin,
How is the performance of the D120-C21 HDD cluster? Can it utilize the
full performance of the 16 HDDs?
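
A rough sanity check on that question (the dual 10G SFP+ figure comes from
the spec quoted below; the per-HDD throughput is an assumption):

# Can dual 10G roughly keep up with 16 streaming HDDs? A back-of-the-envelope
# sketch; per-HDD sequential throughput is an assumed figure.
hdd_count = 16
hdd_seq_MBps = 180                    # assumed per-HDD sequential rate

disk_aggregate_MBps = hdd_count * hdd_seq_MBps     # ~2880 MB/s of spindles
nic_MBps = 2 * 10 * 1000 / 8                       # dual 10G SFP+ ~2500 MB/s

print(disk_aggregate_MBps, nic_MBps)
# The NICs sit just below the aggregate spindle rate, so for large sequential
# writes the network (plus EC/replication overhead) is the first thing to watch.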


linyunfan

Martin Verges wrote on Thu, Apr 23, 2020 at 6:12 PM:
>
> Hello,
>
> simpler systems tend to be cheaper to buy per TB storage, not on a
> theoretical but practical quote.
>
> For example 1U Gigabyte 16bay D120-C21 systems with a density of 64 disks
> per 4U are quite ok for most users. On 40 Nodes per rack + 2 switches you
> have 10PB raw space for around 350k€.
> They come with everything you need from dual 10G SFP+ to acceptable 8c/16t
> 45W TDP CPU. It comes with a M.2 slot if you want a db/wal or other
> additional disk.
> Such systems equipped with 16x16TB have a price point of below 8k€ or ~31 €
> per TB RAW storage.
>
> For me this is just an example of a quite cheap but capable HDD node. I
> never saw a better offer for big fat systems on a price per TB and TCO.
>
> Please remember, there is no best node for everyone, this node is not the
> best or fastest out on the market and just an example ;)
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
>
> On Thu, Apr 23, 2020 at 11:21, Darren Soothill
> (darren.sooth...@suse.com) wrote:
>
> > I can think of one vendor who has made some of the compromises that you talk
> > of, although memory and CPU are not among them; they are limited on slots and
> > NVMe capacity.
> >
> > But there are plenty of other vendors out there who use the same model of
> > motherboard across the whole chassis range so there isn’t a compromise in
> > terms of slots and CPU.
> >
> > The compromise may come with the size of the chassis in that a lot of
> > these bigger chassis can also be deeper to get rid of the compromises.
> >
> > The reality with an OSD node is you don't need that many slots or network
> > ports.
> >
> >
> >
> > From: Janne Johansson 
> > Date: Thursday, 23 April 2020 at 08:08
> > To: Darren Soothill 
> > Cc: ceph-users@ceph.io 
> > Subject: Re: [ceph-users] Re: Dear Abby: Why Is Architecting CEPH So Hard?
> > On Thu, Apr 23, 2020 at 08:49, Darren Soothill
> > (darren.sooth...@suse.com) wrote:
> > If you want the lowest cost per TB then you will be going with larger
> > nodes in your cluster but it does mean you minimum cluster size is going to
> > be many PB’s in size.
> > Now the question is what is the tax that a particular chassis vendor is
> > charging you. I know from the configs we do on a regular basis that a 60
> > drive chassis will give you the lowest cost per TB. BUT it has
> > implications. Your cluster size needs to be up in the order of 10PB
> > minimum. 60 x 18TB gives you around 1PB per node.  Oh did you notice here
> > we are going for the bigger disk drives. Why because the more data you can
> > spread your fixed costs across the lower the overall cost per GB.
> >
> > I don't know all models, but the computers I've looked at with 60 drive
> > slots will have a small and "crappy" motherboard, with few options, not
> > many buses/slots/network ports and low amounts of cores, DIMM sockets and
> > so on, counting on you to make almost a passive storage node on it. I have
> > a hard time thinking the 60*18TB OSD recovery requirements in cpu and ram
> > would be covered in any way by the kinds of 60-slot boxes I've seen. Not
> > that I focus on that area, but it seems like a common tradeoff, Heavy
> > Duty(tm) motherboards or tons of drives.
> >
> > --
> > May the most significant bit of your life be positive.
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Dear Abby: Why Is Architecting CEPH So Hard?

2020-04-22 Thread lin.yunfan
Big nodes are mostly for HDD clusters, and with a 40G or 100G NIC I don't
think the network would be the bottleneck.
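
A rough sanity check on that claim for a dense HDD node (the 60-bay figure
comes from the chassis discussed earlier in this thread; the per-HDD
throughput is an assumption):

# Does a ~60-bay HDD node need 40G/100G networking? A rough sketch; the
# per-HDD sequential rate is an assumed figure.
hdd_count = 60
hdd_seq_MBps = 180

disk_aggregate_gbit = hdd_count * hdd_seq_MBps * 8 / 1000
print(disk_aggregate_gbit)   # ~86 Gbit/s of raw spindle bandwidth
# A single 40G link would cap well below the spindles; 100G (or bonded links)
# is in the right ballpark for a node this dense.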

lin.yunfan
lin.yun...@gmail.com

On 4/23/2020 11:20, Jarett DeAngelis wrote:

Well, for starters, "more network" = "faster cluster."

On Wed, Apr 22, 2020 at 11:18 PM lin.yunfan wrote:

I have seen a lot of people saying not to go with big nodes. What is the
exact reason for that? I can understand that if the cluster is not big
enough then the total node count could be too small to withstand a node
failure, but if the cluster is big enough, wouldn't the big node be more
cost effective?

lin.yunfan
lin.yun...@gmail.com

On 4/23/2020 06:33, Brian Topping wrote:

Great set of suggestions, thanks! One to consider:

On Apr 22, 2020, at 4:14 PM, Jack wrote:

I use 32GB flash-based satadom devices for root device. They are basically
SSD, and do not take front slots. As they are never burning up, we never
replace them. Ergo, the need to "open" the server is not an issue.

This is probably the wrong forum to understand how you are not burning them
out. Any kind of logs or monitor databases on a small SATADOM will cook them
quick, especially an MLC. There is no extra space for wear leveling and the
like. I tried making it work with fancy systemd logging to memory and having
those logs pulled by a log scraper storing to the actual data drives, but
there was no place for the monitor DB. No monitor DB means Ceph doesn't load,
and if a monitor DB gets corrupted, it's perilous for the cluster and instant
death if the monitors aren't replicated.

My node chassis have two motherboards and each is hard limited to four SSDs.
On each node, `/boot` is mirrored (RAID1) on partition 1, `/` is
stripe/mirrored (RAID10) on p2, then used whatever was left for ceph data on
partition 3 of each disk. This way any disk could fail and I could still
boot. Merging the volumes (ie no SATADOM), wear leveling was statistically
more effective. And I don't have to get into crazy system configurations
that nobody would want to maintain or document.

$0.02…

Brian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Dear Abby: Why Is Architecting CEPH So Hard?

2020-04-22 Thread lin.yunfan
I have seen a lot of people saying not to go with big nodes. What is the
exact reason for that? I can understand that if the cluster is not big
enough then the total node count could be too small to withstand a node
failure, but if the cluster is big enough, wouldn't the big node be more
cost effective?

lin.yunfan
lin.yun...@gmail.com

On 4/23/2020 06:33, Brian Topping wrote:

Great set of suggestions, thanks! One to consider:

On Apr 22, 2020, at 4:14 PM, Jack wrote:

I use 32GB flash-based satadom devices for root device. They are basically
SSD, and do not take front slots. As they are never burning up, we never
replace them. Ergo, the need to "open" the server is not an issue.

This is probably the wrong forum to understand how you are not burning them
out. Any kind of logs or monitor databases on a small SATADOM will cook them
quick, especially an MLC. There is no extra space for wear leveling and the
like. I tried making it work with fancy systemd logging to memory and having
those logs pulled by a log scraper storing to the actual data drives, but
there was no place for the monitor DB. No monitor DB means Ceph doesn't load,
and if a monitor DB gets corrupted, it's perilous for the cluster and instant
death if the monitors aren't replicated.

My node chassis have two motherboards and each is hard limited to four SSDs.
On each node, `/boot` is mirrored (RAID1) on partition 1, `/` is
stripe/mirrored (RAID10) on p2, then used whatever was left for ceph data on
partition 3 of each disk. This way any disk could fail and I could still
boot. Merging the volumes (ie no SATADOM), wear leveling was statistically
more effective. And I don't have to get into crazy system configurations
that nobody would want to maintain or document.

$0.02…

Brian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io