[ceph-users] Re: recommendation for barebones server with 8-12 direct attach NVMe?

2024-01-13 Thread Mike O'Connor

On 14/1/2024 1:57 pm, Anthony D'Atri wrote:

The OP is asking about new servers I think.
I was looking at his statement below about using hardware lying around, and
just putting out there some options which worked for us.
  
So we were going to replace a Ceph cluster with some hardware we had laying
around using SATA HBAs but I was told that the only right way to build Ceph
in 2023 is with direct attach NVMe.


Cheers

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: recommendation for barebones server with 8-12 direct attach NVMe?

2024-01-13 Thread Mike O'Connor
Because it's almost impossible to purchase the equipment required to 
convert old drive bays to U.2 etc.


The M.2s we purchased are enterprise-class.

Mike


On 14/1/2024 12:53 pm, Anthony D'Atri wrote:

Why use such a card and M.2 drives that I suspect aren't enterprise-class? 
Instead of U.2, E1.S, or E3.S?


On Jan 13, 2024, at 5:10 AM, Mike O'Connor  wrote:

On 13/1/2024 1:02 am, Drew Weaver wrote:

Hello,

So we were going to replace a Ceph cluster with some hardware we had laying 
around using SATA HBAs but I was told that the only right way to build Ceph in 
2023 is with direct attach NVMe.

Does anyone have any recommendation for a 1U barebones server (we just drop in 
RAM, disks and CPUs) with 8-10 2.5" NVMe bays that are direct attached to the 
motherboard without a bridge or HBA for Ceph specifically?

Thanks,
-Drew

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

Hi

You need to use a PCIe card with a PCIe switch; cards with 4 x M.2 NVMe are cheap 
enough, around US$180 from AliExpress.

There are companies whose cards have many more M.2 ports, but the cost goes 
up greatly.

We just built a 3 x 1RU HP G9 cluster with 4 x 2TB M.2 NVMe, dual 40G 
Ethernet ports, dual 10G Ethernet, and a second-hand Arista 16-port 40G switch.

It works really well.

Cheers

Mike
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: recommendation for barebones server with 8-12 direct attach NVMe?

2024-01-13 Thread Mike O'Connor

On 13/1/2024 1:02 am, Drew Weaver wrote:

Hello,

So we were going to replace a Ceph cluster with some hardware we had laying 
around using SATA HBAs but I was told that the only right way to build Ceph in 
2023 is with direct attach NVMe.

Does anyone have any recommendation for a 1U barebones server (we just drop in 
RAM, disks and CPUs) with 8-10 2.5" NVMe bays that are direct attached to the 
motherboard without a bridge or HBA for Ceph specifically?

Thanks,
-Drew

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


Hi

You need to use a PCIe card with a PCIe switch; cards with 4 x M.2 NVMe 
are cheap enough, around US$180 from AliExpress.

There are companies whose cards have many more M.2 ports, but the 
cost goes up greatly.

We just built a 3 x 1RU HP G9 cluster with 4 x 2TB M.2 NVMe, dual 40G 
Ethernet ports, dual 10G Ethernet, and a second-hand Arista 16-port 40G 
switch.


It works really well.
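
For anyone going the same route, a quick sanity check that the switch card
and all four M.2 drives actually show up (generic commands, nothing
vendor-specific):

# the PCIe switch appears as a set of bridges, each drive as its own NVMe controller
lspci | grep -i -e 'pci bridge' -e 'non-volatile memory'
# nvme-cli then lists the drives and namespaces
nvme list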

Cheers

Mike
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CEPH Cluster Backup - Options on my solution

2019-08-17 Thread Mike O'Connor


> [SNIP script]
>
> Hi Mike
>
> When looking for backup solutions, did you come across benji [1][2]
> and the original backy2 [3][4] solutions?
> I have been running benji for a while now, and it seems solid. I use a
> second cluster as storage, but it does support S3 and encryption as well.
>
> Just wondering if you had any experience you could share that excluded
> these options, and to make you aware of them if you did not.
Hi Ronny

I installed backy2, but it uses local storage, at least for a period of time.

When I found it was taking up all my root disk space, I had to stop it.

I have not tried benji, as I assumed it used the same methods.

Cheers

Mike
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] CEPH Cluster Backup - Options on my solution

2019-08-16 Thread Mike O'Connor
Hi All

I've been worried that I did not have a good backup of my cluster, and
having looked around I could not find anything which did not require
local storage.

I found Rhian's script while looking for a backup solution before a major
version upgrade and found that it worked very well.

I'm looking for opinions on this method below.

It uses stdin/stdout pipes to send the data directly from rbd to an S3 host,
encrypting it along the way.

I have a few worries.

1. The snapshots: I think I should be deleting the older ones and only
keeping the last one or two (see the pruning sketch after this list).
2. Having only one full backup and diffs from there seems wrong to me, but
the amount of data would seem to preclude creating a new full backup on
a regular basis.
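
For worry 1, something like the sketch below could prune old snapshots. It
is untested, assumes the %y%m%d%H%M snapshot names the script further down
creates (so a plain sort is chronological), and always keeps the newest
snapshot, which is needed as the base for the next incremental diff.

#!/bin/bash
# prune-snaps.sh (name is just an example): keep only the two newest snapshots per image
pool=$1
rbd ls $pool | while read vol
do
    # snapshot names are in column 2 of `rbd snap ls`; drop the header row,
    # sort oldest-first, and select everything except the two newest for removal
    rbd snap ls $pool/$vol | awk 'NR>1 {print $2}' | sort | head -n -2 | while read snap
    do
        echo "pruning $pool/$vol@$snap"
        rbd snap rm $pool/$vol@$snap
    done
done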

I'm sure that others will pipe up with more issues.

Note: the sum.c program is needed because S3 needs to be told about
files which are larger than 5 GiB (hence --expected-size), and the standard
tools I tested all had a 32-bit limit when printing out the size in bytes.
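
For anyone curious, sum just adds up the second column of the plain rbd diff
output, which looks roughly like this (illustrative values):

Offset   Length   Type
0        4194304  data
8388608  4194304  data
16777216 4194304  zero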

Cheers

Mike


#!/bin/bash
###
# Original Author: Rhian Resnick - Updated a lot by Mike O'Connor
# Purpose: Backup CEPH RBD using snapshots, the files that are created
# should be stored off the ceph cluster, but you can use ceph storage
# during the process of backing them up.
###


export AWS_PROFILE=wasabi
PUBKEY='PublicKey'

pool=$1
if [ "$pool" == "" ]
then
    echo Usage: $0 pool
    exit 1
fi
rbd ls $pool | while read vol
do
    if [ $vol == ISO ]
    then
  continue
    fi
    # Look up latest backup file

    echo BACKUP ${vol}
    LASTSNAP=`aws s3 ls s3://dcbackup/$vol/ | sort | tail -n 1 | awk '{print $4}' | cut -d "." -f 1`
    echo "Last Snap: $vol/$LASTSNAP"

    # Create a snap, we need this to do the diff
    NEWSNAP=`date +%y%m%d%H%M`
    echo "New Snap: $NEWSNAP"
    echo rbd snap create $pool/$vol@$NEWSNAP
    rbd snap create $pool/$vol@$NEWSNAP

    if [ "$LASTSNAP" == "" ]
    then
    RBD_SIZE=`rbd diff $pool/$vol | ./sum`
    echo "rbd export-diff $pool/$vol@$NEWSNAP - | seccure-encrypt
${PUBKEY} | aws s3 cp --expected-size ${RBD_SIZE} -
s3://dcbackup/$vol/$NEWSNAP.diff"
    rbd export-diff $pool/$vol@$NEWSNAP - | seccure-encrypt
${PUBKEY} | aws s3 cp --expected-size ${RBD_SIZE} -
s3://dcbackup/$vol/$NEWSNAP.diff
    else
    RBD_SIZE=`rbd diff --from-snap $LASTSNAP $pool/$vol | ./sum`
    echo "rbd export-diff --from-snap $LASTSNAP $pool/$vol@$NEWSNAP
- | seccure-encrypt ${PUBKEY} | aws s3 cp --expected-size ${RBD_SIZE}  -
s3://dcbackup/$vol/$NEWSNAP.diff"
    rbd export-diff --from-snap $LASTSNAP $pool/$vol@$NEWSNAP - |
seccure-encrypt ${PUBKEY} | aws s3 cp  - s3://dcbackup/$vol/$NEWSNAP.diff
    fi
    echo
    echo
done


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>

/* sum column 2 of input with lines like:  number number word
 * skipping the first (header) row
 *
 * usage: ... | sum [-v]    (reads from stdin)
 *
 * if -v is specified a count of rows processed (including the header)
is also output
 */

int main(int argc, char *argv[]) {
    uint64_t sum = 0L;
    size_t n = 100;
    unsigned int count=1;
    char *buf = (char*)malloc(n);

    getline(&buf, &n, stdin);    /* skip the header row */
    while (1) {
    uint64_t offset, bytes;
    n = fscanf(stdin, "%lld %lld %s\n", &offset, &bytes, buf);
    if (n != 3) break;
    ++count;
    sum += bytes;
    }
    if (argc>1 && strcmp(argv[1],"-v")==0)
    printf("%d %lld\n", count, sum);
    else
    printf("%lld\n", sum);
}
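
To build and run the above (a sketch; the file names backup-rbd.sh and sum.c,
the image name vm-100-disk-0 and the snapshot name below are just examples,
and it assumes the wasabi AWS CLI profile and the seccure key pair already
exist):

# build the helper that sums the byte column of rbd diff output
gcc -O2 -o sum sum.c

# back up every image in the given pool (the script skips the ISO image)
./backup-rbd.sh rbd

# restoring is roughly the reverse pipeline (untested); the target image must
# already exist, and later diffs are applied in order on top of the first one
aws s3 cp s3://dcbackup/vm-100-disk-0/1908161200.diff - \
  | seccure-decrypt \
  | rbd import-diff - rbd/vm-100-disk-0
# seccure-decrypt asks for the private key; see seccure(1) for the details
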
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mapped rbd is very slow

2019-08-16 Thread Mike O'Connor
This probably muddies the water. Note: this is an active cluster with around
22 read/write IOPS and 200 kB/s of read/write traffic.

CephFS is mounted from a cluster with 3 hosts, 6 OSDs per host, with 8G
public and 10G private networking for Ceph.
No SSDs; mostly WD Red 1TB 2.5" drives, some HGST 1TB 7200 RPM.

root@blade7:~# fio -ioengine=libaio -name=test -bs=4k -iodepth=32
-rw=randwrite -direct=1 -runtime=60 -filename=/mnt/pve/cephfs/test
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T)
4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
test: you need to specify size=
fio: pid=0, err=22/file:filesetup.c:952, func=total_file_size,
error=Invalid argument

Run status group 0 (all jobs):
root@blade7:~# fio -ioengine=libaio -name=test -bs=4k -iodepth=32
-rw=randwrite -direct=1 -runtime=60 -size=10G -filename=/mnt/pve/cephfs/test
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T)
4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
test: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=0): [f(1)][100.0%][w=580KiB/s][w=145 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=3561674: Sat Aug 17 09:20:22 2019
  write: IOPS=2262, BW=9051KiB/s (9268kB/s)(538MiB/60845msec); 0 zone resets
    slat (usec): min=8, max=35648, avg=40.01, stdev=97.51
    clat (usec): min=954, max=2854.3k, avg=14090.15, stdev=100194.83
 lat (usec): min=994, max=2854.3k, avg=14130.65, stdev=100195.40
    clat percentiles (usec):
 |  1.00th=[   1254],  5.00th=[   1450], 10.00th=[   1582],
 | 20.00th=[   1795], 30.00th=[   2008], 40.00th=[   2245],
 | 50.00th=[   2540], 60.00th=[   2933], 70.00th=[   3392],
 | 80.00th=[   4228], 90.00th=[   7767], 95.00th=[  35914],
 | 99.00th=[ 254804], 99.50th=[ 616563], 99.90th=[1652556],
 | 99.95th=[2122318], 99.99th=[2600469]
   bw (  KiB/s): min=   48, max=44408, per=100.00%, avg=10387.54,
stdev=10384.94, samples=106
   iops    : min=   12, max=11102, avg=2596.88, stdev=2596.23,
samples=106
  lat (usec)   : 1000=0.01%
  lat (msec)   : 2=29.82%, 4=47.95%, 10=14.23%, 20=2.43%, 50=1.34%
  lat (msec)   : 100=2.68%, 250=0.53%, 500=0.40%, 750=0.20%, 1000=0.14%
  cpu  : usr=1.45%, sys=6.36%, ctx=151946, majf=0, minf=280
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%,
>=64=0.0%
 submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%,
>=64=0.0%
 issued rwts: total=0,137674,0,0 short=0,0,0,0 dropped=0,0,0,0
 latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=9051KiB/s (9268kB/s), 9051KiB/s-9051KiB/s
(9268kB/s-9268kB/s), io=538MiB (564MB), run=60845-60845msec

This is on the same system with an RBD-mapped filesystem

root@blade7:/mnt# fio -ioengine=libaio -name=test -bs=4k -iodepth=32
-rw=randwrite -direct=1 -runtime=60 -size=10G -filename=/mnt/image0/test
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T)
4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
test: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [w(1)][4.5%][w=4KiB/s][w=1 IOPS][eta 21m:30s]
test: (groupid=0, jobs=1): err= 0: pid=3567399: Sat Aug 17 09:38:55 2019
  write: IOPS=1935, BW=7744KiB/s (7930kB/s)(462MiB/61143msec); 0 zone resets
    slat (usec): min=9, max=700161, avg=65.17, stdev=2092.54
    clat (usec): min=954, max=2578.6k, avg=16457.67, stdev=109995.03
 lat (usec): min=1021, max=2578.6k, avg=16523.42, stdev=110014.91
    clat percentiles (usec):
 |  1.00th=[   1254],  5.00th=[   1434], 10.00th=[   1549],
 | 20.00th=[   1745], 30.00th=[   1909], 40.00th=[   2114],
 | 50.00th=[   2376], 60.00th=[   2704], 70.00th=[   3228],
 | 80.00th=[   4080], 90.00th=[   8717], 95.00th=[  53216],
 | 99.00th=[ 291505], 99.50th=[ 675283], 99.90th=[1669333],
 | 99.95th=[2231370], 99.99th=[2365588]
   bw (  KiB/s): min=    8, max=35968, per=100.00%, avg=9015.64,
stdev=8402.84, samples=105
   iops    : min=    2, max= 8992, avg=2253.90, stdev=2100.72,
samples=105
  lat (usec)   : 1000=0.01%
  lat (msec)   : 2=34.85%, 4=44.49%, 10=11.54%, 20=1.84%, 50=1.81%
  lat (msec)   : 100=3.27%, 250=1.13%, 500=0.42%, 750=0.19%, 1000=0.08%
  cpu  : usr=1.42%, sys=6.63%, ctx=123309, majf=0, minf=283
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%,
>=64=0.0%
 submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%,
>=64=0.0%
 issued rwts: total=0,118371,0,0 short=0,0,0,0 dropped=0,0,0,0
 latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=7744KiB/s (7930kB/s), 7744KiB/s-7744KiB/s
(7930kB/s-7930kB/s), io=462MiB (485MB), run=61143-61143msec

Disk stats (read/write):
  rbd0: ios=0/118670, merge=0/9674, ticks=0/1894238, in_queue=1651008,
util=33.33%
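
To take the filesystem layer out of the picture, the image can also be
exercised directly with rbd bench (a sketch; rbd/image0 is assumed to be the
image mapped at /mnt/image0):

rbd bench --io-type write --io-size 4K --io-threads 16 --io-total 1G --io-pattern rand rbd/image0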


On 17/8/19 8:46 am, Olivier