[ceph-users] Re: recommendation for barebones server with 8-12 direct attach NVMe?
On 14/1/2024 1:57 pm, Anthony D'Atri wrote:
> The OP is asking about new servers, I think.

I was looking at his statement below, relating to using hardware lying around, and just putting out there some options which worked for us:

> So we were going to replace a Ceph cluster with some hardware we had lying around using SATA HBAs, but I was told that the only right way to build Ceph in 2023 is with direct attach NVMe.

Cheers

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: recommendation for barebones server with 8-12 direct attach NVMe?
Because it's almost impossible to purchase the equipment required to convert old drive bays to U.2 etc. The M.2s we purchased are enterprise class.

Mike

On 14/1/2024 12:53 pm, Anthony D'Atri wrote:
> Why use such a card and M.2 drives that I suspect aren't enterprise-class, instead of U.2, E1.S, or E3.S?
>
>> On Jan 13, 2024, at 5:10 AM, Mike O'Connor wrote:
>>
>> On 13/1/2024 1:02 am, Drew Weaver wrote:
>>> Hello,
>>>
>>> So we were going to replace a Ceph cluster with some hardware we had lying around using SATA HBAs, but I was told that the only right way to build Ceph in 2023 is with direct attach NVMe.
>>>
>>> Does anyone have any recommendation for a 1U barebones server (we just drop in RAM, disks, and CPUs) with 8-10 2.5" NVMe bays that are direct attached to the motherboard without a bridge or HBA, for Ceph specifically?
>>>
>>> Thanks,
>>> -Drew
>>
>> Hi
>>
>> You need to use a PCIe card with a PCIe switch; cards with 4 x M.2 NVMe are cheap enough, around USD $180 from AliExpress. There are companies with cards which have many more M.2 ports, but the cost goes up greatly.
>>
>> We just built a 3 x 1RU HP G9 cluster with 4 x 2T M.2 NVMe each, using dual 40G Ethernet ports and dual 10G Ethernet, and a second-hand Arista 16-port 40G switch. It works really well.
>>
>> Cheers
>> Mike
[ceph-users] Re: recommendation for barebones server with 8-12 direct attach NVMe?
On 13/1/2024 1:02 am, Drew Weaver wrote:
> Hello,
>
> So we were going to replace a Ceph cluster with some hardware we had lying around using SATA HBAs, but I was told that the only right way to build Ceph in 2023 is with direct attach NVMe.
>
> Does anyone have any recommendation for a 1U barebones server (we just drop in RAM, disks, and CPUs) with 8-10 2.5" NVMe bays that are direct attached to the motherboard without a bridge or HBA, for Ceph specifically?
>
> Thanks,
> -Drew

Hi

You need to use a PCIe card with a PCIe switch; cards with 4 x M.2 NVMe are cheap enough, around USD $180 from AliExpress. There are companies with cards which have many more M.2 ports, but the cost goes up greatly.

We just built a 3 x 1RU HP G9 cluster with 4 x 2T M.2 NVMe each, using dual 40G Ethernet ports and dual 10G Ethernet, and a second-hand Arista 16-port 40G switch. It works really well.

Cheers
Mike
[ceph-users] Re: CEPH Cluster Backup - Options on my solution
> [SNIP script]
>
> Hi Mike
>
> When looking for backup solutions, did you come across benji [1][2] and the original backy2 [3][4] solutions? I have been running benji for a while now, and it seems solid. I use a second cluster as storage, but it does support S3 and encryption as well.
>
> Just wondering if you had any experience you could share that excluded these options, and wanting to make you aware of them if you did not.

Hi Ronny

I installed backy2, but it uses local storage, at least for a period of time. When I found it was taking up all my root disk space, I had to stop it. I have not tried benji, as I assumed it used the same methods.

Cheers
Mike
[ceph-users] CEPH Cluster Backup - Options on my solution
Hi All

I've been worried that I did not have a good backup of my cluster, and having looked around I could not find anything which did not require local storage. I found Rhian's script while looking for a backup solution before a major version upgrade, and found that it worked very well. I'm looking for opinions on the method below. It uses stdio to pipe the data directly from rbd to an S3 host, encrypting the data on the way.

I have a few worries:

1. The snapshots: I think I should be deleting the older ones and only keeping the last one or two.
2. Having only one full backup and diffs from there seems wrong to me, but the amount of data would seem to preclude creating a new full backup on a regular basis.

I'm sure that others will pipe up with more issues.

Note: the sum.c program is needed because S3 needs to be told about files which are larger than 5 GiB, and the standard tools I tested all had a 32-bit limit when printing out the size in bytes.

Cheers
Mike

#!/bin/bash
###
# Original Author: Rhian Resnick - Updated a lot by Mike O'Connor
# Purpose: Backup CEPH RBD using snapshots. The files that are created should
# be stored off the Ceph cluster, but you can use Ceph storage during the
# process of backing them up.
###

export AWS_PROFILE=wasabi
PUBKEY='PublicKey'

pool=$1
if [ "$pool" == "" ]
then
    echo "Usage: $0 pool"
    exit 1
fi

rbd ls $pool | while read vol
do
    if [ $vol == ISO ]
    then
        continue
    fi

    # Look up the latest backup file
    echo BACKUP ${vol}
    LASTSNAP=`aws s3 ls s3://dcbackup/$vol/ | sort | tail -n 1 | awk '{print $4}' | cut -d "." -f 1`
    echo "Last Snap: $vol/$LASTSNAP"

    # Create a snapshot; we need this to do the diff
    NEWSNAP=`date +%y%m%d%H%M`
    echo "New Snap: $NEWSNAP"
    echo rbd snap create $pool/$vol@$NEWSNAP
    rbd snap create $pool/$vol@$NEWSNAP

    if [ "$LASTSNAP" == "" ]
    then
        # No previous backup: take a full export
        RBD_SIZE=`rbd diff $pool/$vol | ./sum`
        echo "rbd export-diff $pool/$vol@$NEWSNAP - | seccure-encrypt ${PUBKEY} | aws s3 cp --expected-size ${RBD_SIZE} - s3://dcbackup/$vol/$NEWSNAP.diff"
        rbd export-diff $pool/$vol@$NEWSNAP - | seccure-encrypt ${PUBKEY} | aws s3 cp --expected-size ${RBD_SIZE} - s3://dcbackup/$vol/$NEWSNAP.diff
    else
        # Incremental export from the last uploaded snapshot
        RBD_SIZE=`rbd diff --from-snap $LASTSNAP $pool/$vol | ./sum`
        echo "rbd export-diff --from-snap $LASTSNAP $pool/$vol@$NEWSNAP - | seccure-encrypt ${PUBKEY} | aws s3 cp --expected-size ${RBD_SIZE} - s3://dcbackup/$vol/$NEWSNAP.diff"
        rbd export-diff --from-snap $LASTSNAP $pool/$vol@$NEWSNAP - | seccure-encrypt ${PUBKEY} | aws s3 cp --expected-size ${RBD_SIZE} - s3://dcbackup/$vol/$NEWSNAP.diff
    fi
    echo
    echo
done

/* sum.c - sum column 2 of input with lines like: number number word,
 * skipping the first (header) row.
 *
 * usage: sum [-v] file
 *
 * if -v is specified, a count of rows processed (including the header)
 * is also output
 */
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
    uint64_t sum = 0;
    size_t n = 100;
    unsigned int count = 1;
    char *buf = malloc(n);

    getline(&buf, &n, stdin);  /* skip the header row */
    while (1) {
        uint64_t offset, bytes;
        int fields = fscanf(stdin, "%" SCNu64 " %" SCNu64 " %s\n",
                            &offset, &bytes, buf);
        if (fields != 3)
            break;
        ++count;
        sum += bytes;
    }
    if (argc > 1 && strcmp(argv[1], "-v") == 0)
        printf("%u %" PRIu64 "\n", count, sum);
    else
        printf("%" PRIu64 "\n", sum);
    return 0;
}
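On worry 1 (pruning older snapshots), one possible approach is a small helper that picks the snapshots to delete. This is a minimal sketch, not part of the script above: `prune_candidates` is a hypothetical function, and it assumes snapshots are named with `date +%y%m%d%H%M` as above, so a plain lexical sort is also a chronological sort (it also assumes GNU `head`).

```shell
#!/bin/bash
# Sketch only: select all but the newest N snapshots for deletion.
# Assumes snapshot names in %y%m%d%H%M format, so lexical order is
# chronological order; requires GNU head for "head -n -N".

prune_candidates() {
    # read snapshot names on stdin, print every name except the newest $1
    sort | head -n -"$1"
}

# Illustrative use (not run here): delete all but the two newest snapshots
# rbd snap ls $pool/$vol | awk 'NR>1 {print $2}' | prune_candidates 2 | \
#     while read snap; do rbd snap rm $pool/$vol@$snap; done
```

Note that keeping at least the most recent snapshot is essential here, since the next incremental run needs it as the `--from-snap` base.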
[ceph-users] Re: Mapped rbd is very slow
This probably muddies the water.

Note: this is an active cluster with around 22 read/write IOPS and 200 kB/s of read/write traffic. CephFS is mounted, with 3 hosts and 6 OSDs per host, 8G public and 10G private networking for Ceph. No SSDs, and mostly WD Red 1T 2.5" drives; some are HGST 1T 7200.

root@blade7:~# fio -ioengine=libaio -name=test -bs=4k -iodepth=32 -rw=randwrite -direct=1 -runtime=60 -filename=/mnt/pve/cephfs/test
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
test: you need to specify size=
fio: pid=0, err=22/file:filesetup.c:952, func=total_file_size, error=Invalid argument

Run status group 0 (all jobs):

root@blade7:~# fio -ioengine=libaio -name=test -bs=4k -iodepth=32 -rw=randwrite -direct=1 -runtime=60 -size=10G -filename=/mnt/pve/cephfs/test
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
test: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=0): [f(1)][100.0%][w=580KiB/s][w=145 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=3561674: Sat Aug 17 09:20:22 2019
  write: IOPS=2262, BW=9051KiB/s (9268kB/s)(538MiB/60845msec); 0 zone resets
    slat (usec): min=8, max=35648, avg=40.01, stdev=97.51
    clat (usec): min=954, max=2854.3k, avg=14090.15, stdev=100194.83
     lat (usec): min=994, max=2854.3k, avg=14130.65, stdev=100195.40
    clat percentiles (usec):
     |  1.00th=[    1254],  5.00th=[    1450], 10.00th=[    1582],
     | 20.00th=[    1795], 30.00th=[    2008], 40.00th=[    2245],
     | 50.00th=[    2540], 60.00th=[    2933], 70.00th=[    3392],
     | 80.00th=[    4228], 90.00th=[    7767], 95.00th=[   35914],
     | 99.00th=[  254804], 99.50th=[  616563], 99.90th=[1652556],
     | 99.95th=[2122318], 99.99th=[2600469]
   bw (  KiB/s): min=   48, max=44408, per=100.00%, avg=10387.54, stdev=10384.94, samples=106
   iops        : min=   12, max=11102, avg=2596.88, stdev=2596.23, samples=106
  lat (usec)   : 1000=0.01%
  lat (msec)   : 2=29.82%, 4=47.95%, 10=14.23%, 20=2.43%, 50=1.34%
  lat (msec)   : 100=2.68%, 250=0.53%, 500=0.40%, 750=0.20%, 1000=0.14%
  cpu          : usr=1.45%, sys=6.36%, ctx=151946, majf=0, minf=280
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,137674,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=9051KiB/s (9268kB/s), 9051KiB/s-9051KiB/s (9268kB/s-9268kB/s), io=538MiB (564MB), run=60845-60845msec

This is on the same system with an RBD-mapped file system:

root@blade7:/mnt# fio -ioengine=libaio -name=test -bs=4k -iodepth=32 -rw=randwrite -direct=1 -runtime=60 -size=10G -filename=/mnt/image0/test
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
test: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [w(1)][4.5%][w=4KiB/s][w=1 IOPS][eta 21m:30s]
test: (groupid=0, jobs=1): err= 0: pid=3567399: Sat Aug 17 09:38:55 2019
  write: IOPS=1935, BW=7744KiB/s (7930kB/s)(462MiB/61143msec); 0 zone resets
    slat (usec): min=9, max=700161, avg=65.17, stdev=2092.54
    clat (usec): min=954, max=2578.6k, avg=16457.67, stdev=109995.03
     lat (usec): min=1021, max=2578.6k, avg=16523.42, stdev=110014.91
    clat percentiles (usec):
     |  1.00th=[    1254],  5.00th=[    1434], 10.00th=[    1549],
     | 20.00th=[    1745], 30.00th=[    1909], 40.00th=[    2114],
     | 50.00th=[    2376], 60.00th=[    2704], 70.00th=[    3228],
     | 80.00th=[    4080], 90.00th=[    8717], 95.00th=[   53216],
     | 99.00th=[  291505], 99.50th=[  675283], 99.90th=[1669333],
     | 99.95th=[2231370], 99.99th=[2365588]
   bw (  KiB/s): min=    8, max=35968, per=100.00%, avg=9015.64, stdev=8402.84, samples=105
   iops        : min=    2, max= 8992, avg=2253.90, stdev=2100.72, samples=105
  lat (usec)   : 1000=0.01%
  lat (msec)   : 2=34.85%, 4=44.49%, 10=11.54%, 20=1.84%, 50=1.81%
  lat (msec)   : 100=3.27%, 250=1.13%, 500=0.42%, 750=0.19%, 1000=0.08%
  cpu          : usr=1.42%, sys=6.63%, ctx=123309, majf=0, minf=283
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,118371,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=7744KiB/s (7930kB/s), 7744KiB/s-7744KiB/s (7930kB/s-7930kB/s), io=462MiB (485MB), run=61143-61143msec

Disk stats (read/write):
  rbd0: ios=0/118670, merge=0/9674, ticks=0/1894238, in_queue=1651008, util=33.33%

On 17/8/19 8:46 am, Olivier
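As a quick sanity check, the headline numbers in a fio summary line are just total I/Os divided by run time. A small shell sketch, with the values copied from the CephFS run above (137674 issued writes over 60845 msec at 4 KiB each):

```shell
#!/bin/bash
# Recompute fio's summary numbers from its raw counters.
ios=137674    # issued write I/Os (from "issued rwts")
ms=60845      # run time in milliseconds
bs=4096       # block size in bytes

iops=$(( ios * 1000 / ms ))                 # matches reported IOPS=2262
bw_kib=$(( ios * bs / 1024 * 1000 / ms ))   # close to reported BW=9051KiB/s
echo "$iops IOPS, $bw_kib KiB/s"
```

The same arithmetic on the RBD run (118371 I/Os over 61143 msec) gives the reported ~1935 IOPS, so the two runs really are within about 15% of each other.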