Re: [ceph-users] SSD Sizing for DB/WAL: 4% for large drives?

2019-06-03 Thread Mattia Belluco
Hi Jake,

I would definitely go for the "leave the rest unused" solution.
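
One simple way to do that with LVM, as a rough sketch (device and VG/LV names
are just examples): create only the 60 GB logical volumes and leave the rest
of the volume group unallocated, so the SSD effectively gets extra
over-provisioning.

  pvcreate /dev/nvme0n1
  vgcreate ceph-db /dev/nvme0n1
  for i in 0 1 2 3 4 5; do lvcreate -L 60G -n db-$i ceph-db; done
  vgs ceph-db   # the remaining free extents simply stay unused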

Regards,
Mattia

On 5/29/19 4:25 PM, Jake Grimmett wrote:
> Thank you for a lot of detailed and useful information :)
> 
> I'm tempted to ask a related question on SSD endurance...
> 
> If 60GB is the sweet spot for each DB/WAL partition, the SSD will have
> spare capacity (for example, I'd budgeted 266GB per DB/WAL).
> 
> Would it then be better to make 60GB "sweet spot" sized DB/WALs and
> leave the remaining SSD unused, as this would maximise the lifespan of
> the SSD and speed up garbage collection?
> 
> many thanks
> 
> Jake
> 
> 
> 
> On 5/29/19 9:56 AM, Mattia Belluco wrote:
>> On 5/29/19 5:40 AM, Konstantin Shalygin wrote:
>>> block.db should be 30 GB or 300 GB - anything in between is pointless. The
>>> reasons are described here:
>>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-February/033286.html
>>
>> Following some discussions we had at the past Cephalocon I beg to differ
>> on this point: when RocksDB needs to compact a level it rewrites it
>> *before* deleting the old data; if you'd like to be sure your db does not
>> spill over to the spindle, you should allocate twice the size of the
>> biggest level to allow for compaction. I guess ~60 GB would be the sweet
>> spot, assuming you don't plan to mess with the size and multiplier of the
>> RocksDB levels and don't want to go all the way to 600 GB (300 GB x2).
>>
>> regards,
>> Mattia
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Mattia Belluco
S3IT Services and Support for Science IT
Office Y11 F 52
University of Zürich
Winterthurerstrasse 190, CH-8057 Zürich (Switzerland)
Tel: +41 44 635 42 22
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Sizing for DB/WAL: 4% for large drives?

2019-05-30 Thread Paul Emmerich
The ~4% recommendation in the docs is misleading.

How much you need really depends on how you use it; for CephFS that means:
are you going to put lots of small files on it, or mainly big files?
If you expect lots of small files, go for a DB that's > ~300 GB. For mostly
large files you are probably fine with a 60 GB DB.

As pointed out by others: a 266 GB DB is effectively the same as a 60 GB one.

I expect the new Nautilus warning for spillover to bite a lot of people who
didn't know about the undocumented magic numbers for sizes ;)
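
A quick way to check whether you are affected, as a rough sketch (run the
health command on a mon node and the perf dump on the host that carries the
OSD; osd.0 is just an example id):

  ceph health detail | grep -i spillover
  ceph daemon osd.0 perf dump | grep -E '"(db|slow)_used_bytes"'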

Paul


-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Tue, May 28, 2019 at 3:13 PM Jake Grimmett  wrote:

> Dear All,
>
> Quick question regarding SSD sizing for a DB/WAL...
>
> I understand 4% is generally recommended for a DB/WAL.
>
> Does this 4% continue for "large" 12TB drives, or can we  economise and
> use a smaller DB/WAL?
>
> Ideally I'd fit a smaller drive providing a 266GB DB/WAL per 12TB OSD,
> rather than 480GB. i.e. 2.2% rather than 4%.
>
> Will "bad things" happen as the OSD fills with a smaller DB/WAL?
>
> By the way the cluster will mainly be providing CephFS, fairly large
> files, and will use erasure encoding.
>
> many thanks for any advice,
>
> Jake
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Sizing for DB/WAL: 4% for large drives?

2019-05-29 Thread Jake Grimmett
Thank you for a lot of detailed and useful information :)

I'm tempted to ask a related question on SSD endurance...

If 60GB is the sweet spot for each DB/WAL partition, the SSD will have
spare capacity (for example, I'd budgeted 266GB per DB/WAL).

Would it then be better to make 60GB "sweet spot" sized DB/WALs and
leave the remaining SSD unused, as this would maximise the lifespan of
the SSD and speed up garbage collection?

many thanks

Jake



On 5/29/19 9:56 AM, Mattia Belluco wrote:
> On 5/29/19 5:40 AM, Konstantin Shalygin wrote:
>> block.db should be 30Gb or 300Gb - anything between is pointless. There
>> is described why:
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-February/033286.html
> 
> Following some discussions we had at the past Cephalocon I beg to differ
> on this point: when RocksDB needs to compact a layer it rewrites it
> *before* deleting the old data; if you'd like to be sure you db does not
> spill over to the spindle you should allocate twice the size of the
> biggest layer to allow for compaction. I guess ~60 GB would be the sweet
> spot assuming you don't plan to mess with size and multiplier of the
> rocksDB layers and don't want to go all the way to 600 GB (300 GB x2)
> 
> regards,
> Mattia
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Sizing for DB/WAL: 4% for large drives?

2019-05-29 Thread Mattia Belluco
On 5/29/19 5:40 AM, Konstantin Shalygin wrote:
> block.db should be 30Gb or 300Gb - anything between is pointless. There
> is described why:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-February/033286.html

Following some discussions we had at the past Cephalocon I beg to differ
on this point: when RocksDB needs to compact a level it rewrites it
*before* deleting the old data; if you'd like to be sure your db does not
spill over to the spindle, you should allocate twice the size of the
biggest level to allow for compaction. I guess ~60 GB would be the sweet
spot, assuming you don't plan to mess with the size and multiplier of the
RocksDB levels and don't want to go all the way to 600 GB (300 GB x2).
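
To make the arithmetic explicit, here is a rough sketch, assuming Ceph's
default RocksDB settings (256 MB for L1 and a 10x multiplier per level) and
ignoring the WAL/L0 on top:

  L1=$((256*1024*1024)); L2=$((10*L1)); L3=$((10*L2)); L4=$((10*L3))
  echo "L1-L3  : $(( (L1+L2+L3) / 1000000000 )) GB"         # ~30 GB
  echo "x2 room: $(( 2*(L1+L2+L3) / 1000000000 )) GB"       # ~60 GB sweet spot
  echo "L1-L4  : $(( (L1+L2+L3+L4) / 1000000000 )) GB"      # ~300 GB
  echo "x2 room: $(( 2*(L1+L2+L3+L4) / 1000000000 )) GB"    # ~600 GB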

regards,
Mattia


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Sizing for DB/WAL: 4% for large drives?

2019-05-29 Thread Burkhard Linke

Hi,

On 5/29/19 8:25 AM, Konstantin Shalygin wrote:



We have a similar setup, but 24 disks and 2x P4800X. And the 375GB NVME
drives are _not_ large enough:

*snipsnap*



Your block.db is 29 GiB; it should be 30 GiB to prevent spillover to the slow
backend.




Well, it's the usual gigabyte vs. gibibyte fuck-up.


The drive has exactly 366292584 KiB, which is ~350 GiB (i.e. "GB" in the
computer scientist's sense, 1024^3 bytes). Since RocksDB also seems to be
written by computer scientists, we are about 10 GiB short for a working setup...
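
In numbers, a quick sketch of why the 12-way split doesn't quite fit (assuming
the ~30 GiB-per-DB figure quoted above):

  echo "drive  : $(( 375083606016 / (1024*1024*1024) )) GiB"   # ~349 GiB
  echo "needed : $(( 12 * 30 )) GiB"                           # 360 GiB -> ~11 GiB short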



There are options to reduce the level sizes in RocksDB. Does anyone have
experience with changing them, and what are sane values (e.g. powers of 2)?



Regards,

Burkhard


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Sizing for DB/WAL: 4% for large drives?

2019-05-29 Thread Konstantin Shalygin

We have a similar setup, but 24 disks and 2x P4800X. And the 375GB NVME
drives are _not_ large enough:


2019-05-29 07:00:00.000108 mon.bcf-03 [WRN] overall HEALTH_WARN BlueFS
spillover detected on 22 OSD(s)

root@bcf-10:~# parted /dev/nvme0n1 print
Model: NVMe Device (nvme)
Disk /dev/nvme0n1: 375GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End Size    File system  Name  Flags
   1  1049kB  31.1GB  31.1GB
   2  31.1GB  62.3GB  31.1GB
   3  62.3GB  93.4GB  31.1GB
   4  93.4GB  125GB   31.1GB
   5  125GB   156GB   31.1GB
   6  156GB   187GB   31.1GB
   7  187GB   218GB   31.1GB
   8  218GB   249GB   31.1GB
   9  249GB   280GB   31.1GB
10  280GB   311GB   31.1GB
11  311GB   343GB   31.1GB
12  343GB   375GB   32.6GB


The second NVME has the same partition layout. The twelfth partition is
actually large enough to hold all the data, but the other 11 partitions
on this drive are a little bit too small. I'm still trying to calculate
the exact sweet spot.


With 24 OSDs and two of them having a just-large-enough db partition, I
end up with 22 OSDs not fully using their db partitions and spilling over
into the slow disk... exactly as reported by ceph.

Details for one of the affected OSDs:

      "bluefs": {
      "gift_bytes": 0,
      "reclaim_bytes": 0,
      "db_total_bytes": 31138504704,
      "db_used_bytes": 2782912512,
      "wal_total_bytes": 0,
      "wal_used_bytes": 0,
      "slow_total_bytes": 320062095360,
      "slow_used_bytes": 5838471168,
      "num_files": 135,
      "log_bytes": 13295616,
      "log_compactions": 9,
      "logged_bytes": 338104320,
      "files_written_wal": 2,
      "files_written_sst": 5066,
      "bytes_written_wal": 375879721287,
      "bytes_written_sst": 227201938586,
      "bytes_written_slow": 6516224,
      "max_bytes_wal": 0,
      "max_bytes_db": 5265940480,
      "max_bytes_slow": 7540310016
      },

Maybe it's just a matter of shifting some megabytes. We are about to
deploy more of these nodes, so I would be grateful if anyone can comment
on the correct size of the DB partitions. Otherwise I'll have to use a
RAID-0 for two drives.


Regards,




Your block.db is 29 GiB; it should be 30 GiB to prevent spillover to the slow backend.



k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Sizing for DB/WAL: 4% for large drives?

2019-05-29 Thread Burkhard Linke

Hi,

On 5/29/19 5:23 AM, Frank Yu wrote:

Hi Jake,

I have same question about size of DB/WAL for OSD。My situations:  12 
osd per OSD nodes, 8 TB(maybe 12TB later) per OSD, Intel NVMe SSD 
(optane P4800x) 375G per OSD nodes, which means DB/WAL can use about 
30GB per OSD(8TB), I mainly use CephFS to serve the HPC cluster for ML.
(plan to separate CephFS metadata to pool based on NVMe SSD, BTW, does 
this improve the performance a lot? any compares?)



We have a similar setup, but 24 disks and 2x P4800X. And the 375GB NVME 
drives are _not_ large enough:



2019-05-29 07:00:00.000108 mon.bcf-03 [WRN] overall HEALTH_WARN BlueFS 
spillover detected on 22 OSD(s)


root@bcf-10:~# parted /dev/nvme0n1 print
Model: NVMe Device (nvme)
Disk /dev/nvme0n1: 375GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End Size    File system  Name  Flags
 1  1049kB  31.1GB  31.1GB
 2  31.1GB  62.3GB  31.1GB
 3  62.3GB  93.4GB  31.1GB
 4  93.4GB  125GB   31.1GB
 5  125GB   156GB   31.1GB
 6  156GB   187GB   31.1GB
 7  187GB   218GB   31.1GB
 8  218GB   249GB   31.1GB
 9  249GB   280GB   31.1GB
10  280GB   311GB   31.1GB
11  311GB   343GB   31.1GB
12  343GB   375GB   32.6GB


The second NVME has the same partition layout. The twelfth partition is 
actually large enough to hold all the data, but the other 11 partitions 
on this drive are a little bit too small. I'm still trying to calculate 
the exact sweet spot.



With 24 OSDs and two of them having a just-large-enough db partition, I
end up with 22 OSDs not fully using their db partitions and spilling over
into the slow disk... exactly as reported by ceph.


Details for one of the affected OSDs:

    "bluefs": {
    "gift_bytes": 0,
    "reclaim_bytes": 0,
    "db_total_bytes": 31138504704,
    "db_used_bytes": 2782912512,
    "wal_total_bytes": 0,
    "wal_used_bytes": 0,
    "slow_total_bytes": 320062095360,
    "slow_used_bytes": 5838471168,
    "num_files": 135,
    "log_bytes": 13295616,
    "log_compactions": 9,
    "logged_bytes": 338104320,
    "files_written_wal": 2,
    "files_written_sst": 5066,
    "bytes_written_wal": 375879721287,
    "bytes_written_sst": 227201938586,
    "bytes_written_slow": 6516224,
    "max_bytes_wal": 0,
    "max_bytes_db": 5265940480,
    "max_bytes_slow": 7540310016
    },

Maybe it's just a matter of shifting some megabytes. We are about to
deploy more of these nodes, so I would be grateful if anyone can comment 
on the correct size of the DB partitions. Otherwise I'll have to use a 
RAID-0 for two drives.
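
For anyone who wants to check their own OSDs for this, a rough, untested
sketch (it assumes the OSD admin sockets live in /var/run/ceph and that jq
is installed; adjust paths as needed):

  for sock in /var/run/ceph/ceph-osd.*.asok; do
      id=$(basename "$sock" .asok | cut -d. -f2)
      ceph daemon "$sock" perf dump | jq -r --arg id "$id" \
          '.bluefs | "osd.\($id)  db_used=\(.db_used_bytes)  slow_used=\(.slow_used_bytes)"'
  done

A non-zero slow_used_bytes is what the spillover warning is about.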



Regards,

Burkhard


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Sizing for DB/WAL: 4% for large drives?

2019-05-28 Thread Konstantin Shalygin

Dear All,

Quick question regarding SSD sizing for a DB/WAL...

I understand 4% is generally recommended for a DB/WAL.

Does this 4% continue for "large" 12TB drives, or can we  economise and
use a smaller DB/WAL?

Ideally I'd fit a smaller drive providing a 266GB DB/WAL per 12TB OSD,
rather than 480GB. i.e. 2.2% rather than 4%.

Will "bad things" happen as the OSD fills with a smaller DB/WAL?

By the way the cluster will mainly be providing CephFS, fairly large
files, and will use erasure encoding.

many thanks for any advice,

Jake



block.db should be 30 GB or 300 GB - anything in between is pointless. The
reasons are described here:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-February/033286.html


This "4%" mean nothing actually.



k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Sizing for DB/WAL: 4% for large drives?

2019-05-28 Thread Frank Yu
Hi Jake,

I have the same question about the size of DB/WAL for an OSD. My situation: 12
OSDs per OSD node, 8 TB (maybe 12 TB later) per OSD, one Intel NVMe SSD (Optane
P4800X, 375 GB) per OSD node, which means the DB/WAL can use about 30 GB per
8 TB OSD. I mainly use CephFS to serve an HPC cluster for ML.
(I plan to move the CephFS metadata to a pool backed by NVMe SSD; BTW, does this
improve performance a lot? Any comparisons?)


On Wed, May 29, 2019 at 12:29 AM Jake Grimmett 
wrote:

> Hi Martin,
>
> thanks for your reply :)
>
> We already have a separate NVMe SSD pool for cephfs metadata.
>
> I agree it's much simpler & more robust not using a separate DB/WAL, but
> as we have enough money for a 1.6TB SSD for every 6 HDD, so it's
> tempting to go down that route. If people think a 2.2% DB/WAL is a bad
> idea, we will definitely have a re-think.
>
> Perhaps I'm being greedy for more performance; we have a 250 node HPC
> cluster, and it would be nice to see how cephfs compares to our beegfs
> scratch.
>
> best regards,
>
> Jake
>
>
> On 5/28/19 3:14 PM, Martin Verges wrote:
> > Hello Jake,
> >
> > do you have any latency requirements that you do require the DB/WAL at
> all?
> > If not, CephFS with EC on SATA HDD works quite well as long as you have
> > the metadata on a separate ssd pool.
> >
> > --
> > Martin Verges
> > Managing director
> >
> > Mobile: +49 174 9335695
> > E-Mail: martin.ver...@croit.io 
> > Chat: https://t.me/MartinVerges
> >
> > croit GmbH, Freseniusstr. 31h, 81247 Munich
> > CEO: Martin Verges - VAT-ID: DE310638492
> > Com. register: Amtsgericht Munich HRB 231263
> >
> > Web: https://croit.io
> > YouTube: https://goo.gl/PGE1Bx
> >
> >
> > Am Di., 28. Mai 2019 um 15:13 Uhr schrieb Jake Grimmett
> > mailto:j...@mrc-lmb.cam.ac.uk>>:
> >
> > Dear All,
> >
> > Quick question regarding SSD sizing for a DB/WAL...
> >
> > I understand 4% is generally recommended for a DB/WAL.
> >
> > Does this 4% continue for "large" 12TB drives, or can we  economise
> and
> > use a smaller DB/WAL?
> >
> > Ideally I'd fit a smaller drive providing a 266GB DB/WAL per 12TB
> OSD,
> > rather than 480GB. i.e. 2.2% rather than 4%.
> >
> > Will "bad things" happen as the OSD fills with a smaller DB/WAL?
> >
> > By the way the cluster will mainly be providing CephFS, fairly large
> > files, and will use erasure encoding.
> >
> > many thanks for any advice,
> >
> > Jake
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com 
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 
Regards
Frank Yu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Sizing for DB/WAL: 4% for large drives?

2019-05-28 Thread Benjeman Meekhof
I suggest having a look at this thread, which suggests that sizes 'in
between' the requirements of the different RocksDB levels have no net
effect, and sizing accordingly.

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030740.html

My impression is that 28GB is good (enough for L0 through L3), or 280 GB is
good (adding L4), or whatever size is required to also hold L5 is good, but
anything in between will probably not get used.  I've seen this somewhat
borne out on our oldest storage nodes, which only have enough NVMe space to
provide 24GB per OSD.  Though only ~3GiB of the 24GiB of available DB space
is in use, 1GiB of 'slow db' is used:

"db_total_bytes": 26671570944,
"db_used_bytes": 2801795072,
"slow_used_bytes": 1102053376
(Mimic 13.2.5)
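
In the same units as the level sums, that db_total_bytes works out to (a
sketch, just a unit conversion):

  # db_total_bytes from above, in GiB:
  echo "$(( 26671570944 / (1024*1024*1024) )) GiB"
  # -> ~24 GiB, below the ~28 GiB needed to hold everything through L3,
  #    so part of L3 ends up on the slow device.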

thanks,
Ben


On Tue, May 28, 2019 at 12:55 PM Igor Fedotov  wrote:
>
> Hi Jake,
>
> just my 2 cents - I'd suggest to use LVM for DB/WAL to be able
> seamlessly extend their sizes if needed.
>
> Once you've configured this way and if you're able to add more NVMe
> later you're almost free to select any size at the initial stage.
>
>
> Thanks,
>
> Igor
>
>
> On 5/28/2019 4:13 PM, Jake Grimmett wrote:
> > Dear All,
> >
> > Quick question regarding SSD sizing for a DB/WAL...
> >
> > I understand 4% is generally recommended for a DB/WAL.
> >
> > Does this 4% continue for "large" 12TB drives, or can we  economise and
> > use a smaller DB/WAL?
> >
> > Ideally I'd fit a smaller drive providing a 266GB DB/WAL per 12TB OSD,
> > rather than 480GB. i.e. 2.2% rather than 4%.
> >
> > Will "bad things" happen as the OSD fills with a smaller DB/WAL?
> >
> > By the way the cluster will mainly be providing CephFS, fairly large
> > files, and will use erasure encoding.
> >
> > many thanks for any advice,
> >
> > Jake
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Sizing for DB/WAL: 4% for large drives?

2019-05-28 Thread Igor Fedotov

Hi Jake,

just my 2 cents - I'd suggest using LVM for the DB/WAL so you can
seamlessly extend its size later if needed.


Once you've configured it this way, and if you're able to add more NVMe
later, you're almost free to select any size at the initial stage.
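
A rough, untested sketch of what that could look like (device, VG/LV names
and the OSD id are just examples; the expand step assumes your release ships
ceph-bluestore-tool with bluefs-bdev-expand):

  # initial deployment: block.db on a dedicated LV
  lvcreate -L 60G -n db-sdb ceph-db
  ceph-volume lvm create --bluestore --data /dev/sdb --block.db ceph-db/db-sdb

  # later, when more NVMe space is available: grow the LV, then let BlueFS use it
  lvextend -L +60G /dev/ceph-db/db-sdb
  systemctl stop ceph-osd@0
  ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-0
  systemctl start ceph-osd@0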



Thanks,

Igor


On 5/28/2019 4:13 PM, Jake Grimmett wrote:

Dear All,

Quick question regarding SSD sizing for a DB/WAL...

I understand 4% is generally recommended for a DB/WAL.

Does this 4% continue for "large" 12TB drives, or can we  economise and
use a smaller DB/WAL?

Ideally I'd fit a smaller drive providing a 266GB DB/WAL per 12TB OSD,
rather than 480GB. i.e. 2.2% rather than 4%.

Will "bad things" happen as the OSD fills with a smaller DB/WAL?

By the way the cluster will mainly be providing CephFS, fairly large
files, and will use erasure encoding.

many thanks for any advice,

Jake


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Sizing for DB/WAL: 4% for large drives?

2019-05-28 Thread Martin Verges
Hello Jake,

you can use 2.2% as well, and performance will be better most of the time than
without a DB/WAL at all. However, if the DB/WAL fills up, a spillover to the
regular drive occurs and performance will drop as if you didn't have a DB/WAL
drive in the first place.

I believe you could use "ceph daemon osd.X perf dump" and look for
"db_used_bytes" and "wal_used_bytes", but no guarantee from my side.
As far as I know, it would be OK to choose values from 2-4% depending on your
usage and configuration.
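
For example, something along these lines (a sketch; osd.0 is just an example
id, and jq is only there for readability):

  ceph daemon osd.0 perf dump | \
      jq '.bluefs | {db_total_bytes, db_used_bytes, wal_used_bytes, slow_used_bytes}'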

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


On Tue, 28 May 2019 at 18:28, Jake Grimmett <
j...@mrc-lmb.cam.ac.uk>:

> Hi Martin,
>
> thanks for your reply :)
>
> We already have a separate NVMe SSD pool for cephfs metadata.
>
> I agree it's much simpler & more robust not using a separate DB/WAL, but
> as we have enough money for a 1.6TB SSD for every 6 HDD, so it's
> tempting to go down that route. If people think a 2.2% DB/WAL is a bad
> idea, we will definitely have a re-think.
>
> Perhaps I'm being greedy for more performance; we have a 250 node HPC
> cluster, and it would be nice to see how cephfs compares to our beegfs
> scratch.
>
> best regards,
>
> Jake
>
>
> On 5/28/19 3:14 PM, Martin Verges wrote:
> > Hello Jake,
> >
> > do you have any latency requirements that you do require the DB/WAL at
> all?
> > If not, CephFS with EC on SATA HDD works quite well as long as you have
> > the metadata on a separate ssd pool.
> >
> > --
> > Martin Verges
> > Managing director
> >
> > Mobile: +49 174 9335695
> > E-Mail: martin.ver...@croit.io 
> > Chat: https://t.me/MartinVerges
> >
> > croit GmbH, Freseniusstr. 31h, 81247 Munich
> > CEO: Martin Verges - VAT-ID: DE310638492
> > Com. register: Amtsgericht Munich HRB 231263
> >
> > Web: https://croit.io
> > YouTube: https://goo.gl/PGE1Bx
> >
> >
> > Am Di., 28. Mai 2019 um 15:13 Uhr schrieb Jake Grimmett
> > mailto:j...@mrc-lmb.cam.ac.uk>>:
> >
> > Dear All,
> >
> > Quick question regarding SSD sizing for a DB/WAL...
> >
> > I understand 4% is generally recommended for a DB/WAL.
> >
> > Does this 4% continue for "large" 12TB drives, or can we  economise
> and
> > use a smaller DB/WAL?
> >
> > Ideally I'd fit a smaller drive providing a 266GB DB/WAL per 12TB
> OSD,
> > rather than 480GB. i.e. 2.2% rather than 4%.
> >
> > Will "bad things" happen as the OSD fills with a smaller DB/WAL?
> >
> > By the way the cluster will mainly be providing CephFS, fairly large
> > files, and will use erasure encoding.
> >
> > many thanks for any advice,
> >
> > Jake
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com 
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Sizing for DB/WAL: 4% for large drives?

2019-05-28 Thread Jake Grimmett
Hi Martin,

thanks for your reply :)

We already have a separate NVMe SSD pool for cephfs metadata.

I agree it's much simpler & more robust not using a separate DB/WAL, but
as we have enough money for a 1.6TB SSD for every 6 HDDs, it's
tempting to go down that route. If people think a 2.2% DB/WAL is a bad
idea, we will definitely have a re-think.

Perhaps I'm being greedy for more performance; we have a 250 node HPC
cluster, and it would be nice to see how cephfs compares to our beegfs
scratch.

best regards,

Jake


On 5/28/19 3:14 PM, Martin Verges wrote:
> Hello Jake,
> 
> do you have any latency requirements that you do require the DB/WAL at all?
> If not, CephFS with EC on SATA HDD works quite well as long as you have
> the metadata on a separate ssd pool.
> 
> --
> Martin Verges
> Managing director
> 
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io 
> Chat: https://t.me/MartinVerges
> 
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> 
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
> 
> 
> Am Di., 28. Mai 2019 um 15:13 Uhr schrieb Jake Grimmett
> mailto:j...@mrc-lmb.cam.ac.uk>>:
> 
> Dear All,
> 
> Quick question regarding SSD sizing for a DB/WAL...
> 
> I understand 4% is generally recommended for a DB/WAL.
> 
> Does this 4% continue for "large" 12TB drives, or can we  economise and
> use a smaller DB/WAL?
> 
> Ideally I'd fit a smaller drive providing a 266GB DB/WAL per 12TB OSD,
> rather than 480GB. i.e. 2.2% rather than 4%.
> 
> Will "bad things" happen as the OSD fills with a smaller DB/WAL?
> 
> By the way the cluster will mainly be providing CephFS, fairly large
> files, and will use erasure encoding.
> 
> many thanks for any advice,
> 
> Jake
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Sizing for DB/WAL: 4% for large drives?

2019-05-28 Thread Martin Verges
Hello Jake,

do you have any latency requirements that actually require a DB/WAL at all?
If not, CephFS with EC on SATA HDDs works quite well as long as you have the
metadata on a separate SSD pool.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


On Tue, 28 May 2019 at 15:13, Jake Grimmett <
j...@mrc-lmb.cam.ac.uk>:

> Dear All,
>
> Quick question regarding SSD sizing for a DB/WAL...
>
> I understand 4% is generally recommended for a DB/WAL.
>
> Does this 4% continue for "large" 12TB drives, or can we  economise and
> use a smaller DB/WAL?
>
> Ideally I'd fit a smaller drive providing a 266GB DB/WAL per 12TB OSD,
> rather than 480GB. i.e. 2.2% rather than 4%.
>
> Will "bad things" happen as the OSD fills with a smaller DB/WAL?
>
> By the way the cluster will mainly be providing CephFS, fairly large
> files, and will use erasure encoding.
>
> many thanks for any advice,
>
> Jake
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com