[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2020-09-14 Thread Seena Fallah
Yes, I'm planning to use only 60-70% of my disks, and pools like
buckets.index don't grow much and don't need much space! I'm just
trying to make this pool faster because I see it sometimes needs 1 million
IOPS, and I think NVMe is a good option for this pool. But finding a good
data-center NVMe drive at a low capacity is really hard :(
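
For reference, a rough sketch of how one might pin the bucket index pool to
NVMe-class OSDs via CRUSH device classes (the OSD ids and the pool name
default.rgw.buckets.index below are placeholders, adjust to your cluster):

ceph osd crush rm-device-class osd.10 osd.11 osd.12
ceph osd crush set-device-class nvme osd.10 osd.11 osd.12             # tag the small NVMe OSDs
ceph osd crush rule create-replicated nvme-index default host nvme    # replicated rule limited to that class
ceph osd pool set default.rgw.buckets.index crush_rule nvme-index     # move the index pool onto it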

On Mon, Sep 14, 2020 at 7:32 PM Martin Verges 
wrote:

> Hello,
>
> Please keep in mind that you can have significant operational problems if
> you choose too small OSDs. Sometimes your OSDs require >40G for
> osdmaps/pgmaps/... and the smaller you OSD, the more likely it will be a
> problem as Ceph is totally unable to deal with full disks and break apart.
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
>
> Am Mo., 14. Sept. 2020 um 15:58 Uhr schrieb :
>
>> https://www.kingston.com/unitedkingdom/en/ssd/dc1000b-data-center-boot-ssd
>>
>> look good for your purpose.
>>
>>
>>
>> - Original Message -
>> From: "Seena Fallah" 
>> To: "Виталий Филиппов" 
>> Cc: "Anthony D'Atri" , "ceph-users" <
>> ceph-users@ceph.io>
>> Sent: Monday, September 14, 2020 2:47:14 PM
>> Subject: [ceph-users] Re: Choosing suitable SSD for Ceph cluster
>>
>> Thanks for the sheet. I need a low space disk for my use case (around
>> 240GB). Do you have any suggestions with M.2 and capacitors?
>>
>> On Mon, Sep 14, 2020 at 6:11 PM  wrote:
>>
>> > There's also Micron 7300 Pro/Max. Please benchmark it like described
>> here
>> >
>> https://docs.google.com/spreadsheets/d/1E9-eXjzsKboiCCX-0u0r5fAjjufLKayaut_FOPxYZjc/edit
>> > and send me the results if you get one :))
>> >
>> > Samsung PM983 M.2
>> >
>> > I want to have a separate disk for buckets index pool and all of my
>> server
>> > bays are full and I should use m2 storage devices. Also the bucket index
>> > doesn't need much space so I plan to have a 6x device with replica 3 for
>> > it. Each disk could be 240GB to not waste space but there is no
>> enterprise
>> > nvme disk in this space! Do you have any recommendations?
>> > On Sun, Sep 13, 2020 at 10:17 PM Виталий Филиппов 
>> > wrote:
>> >
>> > Easy, 883 has capacitors and 970 evo doesn't
>> > 13 сентября 2020 г. 0:57:43 GMT+03:00, Seena Fallah <
>> seenafal...@gmail.com>
>> > пишет:
>> >
>> > Hi. How do you say 883DCT is faster than 970 EVO? I saw the
>> specifications and 970 EVO has higher IOPS than 883DCT! Can you please tell
>> why 970 EVO act lower than 883DCT?
>> >
>> > --
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>> >
>> > --
>> > With best regards,
>> > Vitaliy Filippov
>> >
>> >
>> >
>> >
>> >
>> >
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2020-09-14 Thread Martin Verges
Hello,

Please keep in mind that you can run into significant operational problems if
you choose OSDs that are too small. Sometimes your OSDs require >40G for
osdmaps/pgmaps/..., and the smaller your OSD, the more likely this becomes a
problem, as Ceph is totally unable to deal with full disks and breaks apart.
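
As a rough sketch, two standard ceph CLI checks that help keep an eye on this
with small OSDs (output columns vary a bit by release):

ceph osd df tree             # per-OSD usage; newer releases break out OMAP/META consumption
ceph osd dump | grep ratio   # the nearfull / backfillfull / full thresholds small OSDs hit first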

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


On Mon, 14 Sept 2020 at 15:58,  wrote:

> https://www.kingston.com/unitedkingdom/en/ssd/dc1000b-data-center-boot-ssd
>
> look good for your purpose.
>
>
>
> - Original Message -
> From: "Seena Fallah" 
> To: "Виталий Филиппов" 
> Cc: "Anthony D'Atri" , "ceph-users" <
> ceph-users@ceph.io>
> Sent: Monday, September 14, 2020 2:47:14 PM
> Subject: [ceph-users] Re: Choosing suitable SSD for Ceph cluster
>
> Thanks for the sheet. I need a low space disk for my use case (around
> 240GB). Do you have any suggestions with M.2 and capacitors?
>
> On Mon, Sep 14, 2020 at 6:11 PM  wrote:
>
> > There's also Micron 7300 Pro/Max. Please benchmark it like described here
> >
> https://docs.google.com/spreadsheets/d/1E9-eXjzsKboiCCX-0u0r5fAjjufLKayaut_FOPxYZjc/edit
> > and send me the results if you get one :))
> >
> > Samsung PM983 M.2
> >
> > I want to have a separate disk for buckets index pool and all of my
> server
> > bays are full and I should use m2 storage devices. Also the bucket index
> > doesn't need much space so I plan to have a 6x device with replica 3 for
> > it. Each disk could be 240GB to not waste space but there is no
> enterprise
> > nvme disk in this space! Do you have any recommendations?
> > On Sun, Sep 13, 2020 at 10:17 PM Виталий Филиппов 
> > wrote:
> >
> > Easy, 883 has capacitors and 970 evo doesn't
> > 13 сентября 2020 г. 0:57:43 GMT+03:00, Seena Fallah <
> seenafal...@gmail.com>
> > пишет:
> >
> > Hi. How do you say 883DCT is faster than 970 EVO? I saw the
> specifications and 970 EVO has higher IOPS than 883DCT! Can you please tell
> why 970 EVO act lower than 883DCT?
> >
> > --
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> > --
> > With best regards,
> > Vitaliy Filippov
> >
> >
> >
> >
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2020-09-14 Thread response
https://www.kingston.com/unitedkingdom/en/ssd/dc1000b-data-center-boot-ssd

looks good for your purpose.



- Original Message -
From: "Seena Fallah" 
To: "Виталий Филиппов" 
Cc: "Anthony D'Atri" , "ceph-users" 

Sent: Monday, September 14, 2020 2:47:14 PM
Subject: [ceph-users] Re: Choosing suitable SSD for Ceph cluster

Thanks for the sheet. I need a low space disk for my use case (around
240GB). Do you have any suggestions with M.2 and capacitors?

On Mon, Sep 14, 2020 at 6:11 PM  wrote:

> There's also Micron 7300 Pro/Max. Please benchmark it like described here
> https://docs.google.com/spreadsheets/d/1E9-eXjzsKboiCCX-0u0r5fAjjufLKayaut_FOPxYZjc/edit
> and send me the results if you get one :))
>
> Samsung PM983 M.2
>
> I want to have a separate disk for buckets index pool and all of my server
> bays are full and I should use m2 storage devices. Also the bucket index
> doesn't need much space so I plan to have a 6x device with replica 3 for
> it. Each disk could be 240GB to not waste space but there is no enterprise
> nvme disk in this space! Do you have any recommendations?
> On Sun, Sep 13, 2020 at 10:17 PM Виталий Филиппов 
> wrote:
>
> Easy, 883 has capacitors and 970 evo doesn't
> 13 сентября 2020 г. 0:57:43 GMT+03:00, Seena Fallah 
> пишет:
>
> Hi. How do you say 883DCT is faster than 970 EVO? I saw the specifications 
> and 970 EVO has higher IOPS than 883DCT! Can you please tell why 970 EVO act 
> lower than 883DCT?
>
> --
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> --
> With best regards,
> Vitaliy Filippov
>
>
>
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2020-09-14 Thread Seena Fallah
Thanks for the sheet. I need a low-capacity disk for my use case (around
240GB). Do you have any suggestions for M.2 drives with capacitors?

On Mon, Sep 14, 2020 at 6:11 PM  wrote:

> There's also Micron 7300 Pro/Max. Please benchmark it like described here
> https://docs.google.com/spreadsheets/d/1E9-eXjzsKboiCCX-0u0r5fAjjufLKayaut_FOPxYZjc/edit
> and send me the results if you get one :))
>
> Samsung PM983 M.2
>
> I want to have a separate disk for buckets index pool and all of my server
> bays are full and I should use m2 storage devices. Also the bucket index
> doesn't need much space so I plan to have a 6x device with replica 3 for
> it. Each disk could be 240GB to not waste space but there is no enterprise
> nvme disk in this space! Do you have any recommendations?
> On Sun, Sep 13, 2020 at 10:17 PM Виталий Филиппов 
> wrote:
>
> Easy, 883 has capacitors and 970 evo doesn't
> 13 сентября 2020 г. 0:57:43 GMT+03:00, Seena Fallah 
> пишет:
>
> Hi. How do you say 883DCT is faster than 970 EVO? I saw the specifications 
> and 970 EVO has higher IOPS than 883DCT! Can you please tell why 970 EVO act 
> lower than 883DCT?
>
> --
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> --
> With best regards,
> Vitaliy Filippov
>
>
>
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2020-09-14 Thread vitalif
There's also the Micron 7300 Pro/Max. Please benchmark it as described here:
https://docs.google.com/spreadsheets/d/1E9-eXjzsKboiCCX-0u0r5fAjjufLKayaut_FOPxYZjc/edit
and send me the results if you get one :))
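
For reference, the single-client sync-write case from that sheet boils down
to roughly the following fio invocation (a sketch only; /dev/nvme0n1 is a
placeholder for the device under test, and the run is destructive):

fio --ioengine=libaio --filename=/dev/nvme0n1 --direct=1 --fsync=1 --rw=randwrite --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=sync-4k-randwrite
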
Samsung PM983 M.2
 I want to have a separate disk for buckets index pool and all of my server 
bays are full and I should use m2 storage devices. Also the bucket index 
doesn't need much space so I plan to have a 6x device with replica 3 for it. 
Each disk could be 240GB to not waste space but there is no enterprise nvme 
disk in this space! Do you have any recommendations? 
On Sun, Sep 13, 2020 at 10:17 PM Виталий Филиппов <vita...@yourcmc.ru> wrote:
Easy, 883 has capacitors and 970 evo doesn't
On 13 September 2020 at 0:57:43 GMT+03:00, Seena Fallah <seenafal...@gmail.com> wrote:

Hi. How do you say 883DCT is faster than 970 EVO? I saw the 
specifications and 970 EVO has higher IOPS than 883DCT! Can you please tell why 
970 EVO act lower than 883DCT? 

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
--
With best regards,
Vitaliy Filippov
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2020-09-14 Thread vitalif
Samsung PM983 M.2
 I want to have a separate disk for buckets index pool and all of my server 
bays are full and I should use m2 storage devices. Also the bucket index 
doesn't need much space so I plan to have a 6x device with replica 3 for it. 
Each disk could be 240GB to not waste space but there is no enterprise nvme 
disk in this space! Do you have any recommendations? 
On Sun, Sep 13, 2020 at 10:17 PM Виталий Филиппов <vita...@yourcmc.ru> wrote:
Easy, 883 has capacitors and 970 evo doesn't
On 13 September 2020 at 0:57:43 GMT+03:00, Seena Fallah <seenafal...@gmail.com> wrote:

Hi. How do you say 883DCT is faster than 970 EVO? I saw the 
specifications and 970 EVO has higher IOPS than 883DCT! Can you please tell why 
970 EVO act lower than 883DCT? 

ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
--
With best regards,
Vitaliy Filippov
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2020-09-13 Thread Seena Fallah
I want to have a separate disk for the bucket index pool, and since all of my
server bays are full I have to use M.2 storage devices. The bucket index also
doesn't need much space, so I plan to use six devices with replica 3 for it.
Each disk could be 240GB so as not to waste space, but there are no enterprise
NVMe disks at that capacity! Do you have any recommendations?
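
Back of the envelope: six 240GB devices with replica 3 give about
6 x 240GB / 3 ≈ 480GB of usable index capacity, so even at a 60-70% fill
target roughly 290-340GB remains for omap data, which fits the point that the
index pool doesn't need much space.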

On Sun, Sep 13, 2020 at 10:17 PM Виталий Филиппов 
wrote:

> Easy, 883 has capacitors and 970 evo doesn't
>
> 13 сентября 2020 г. 0:57:43 GMT+03:00, Seena Fallah 
> пишет:
>>
>> Hi. How do you say 883DCT is faster than 970 EVO?
>> I saw the specifications and 970 EVO has higher IOPS than 883DCT!
>> Can you please tell why 970 EVO act lower than 883DCT?
>> --
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
> --
> With best regards,
> Vitaliy Filippov
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2020-09-13 Thread Виталий Филиппов
Easy, 883 has capacitors and 970 evo doesn't

On 13 September 2020 at 0:57:43 GMT+03:00, Seena Fallah wrote:
>Hi. How do you say 883DCT is faster than 970 EVO?
>I saw the specifications and 970 EVO has higher IOPS than 883DCT!
>Can you please tell why 970 EVO act lower than 883DCT?
>___
>ceph-users mailing list -- ceph-users@ceph.io
>To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
With best regards,
  Vitaliy Filippov
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2020-09-12 Thread Anthony D'Atri
Is this a reply to Paul’s message from 11 months ago?

https://bit.ly/32oZGlR

The PM1725b is interesting in that it has explicitly configurable durability vs 
capacity, which may be even more effective than user-level short-stroking / 
underprovisioning.


> 
> Hi. How do you say 883DCT is faster than 970 EVO?
> I saw the specifications and 970 EVO has higher IOPS than 883DCT!
> Can you please tell why 970 EVO act lower than 883DCT?

The thread above explains that.  Basically it's not as simple as “faster”. IOPS 
describe behavior along one axis under a certain workload for a certain length 
of time.  Subtle factors:

* With increasing block size, queue depth, operation rate / duration, some 
less-robust drives will exhibit cliffing where their performance falls off 
dramatically


——
|
|
—

(that may or may not render usefully, your MUA may vary)

Or they may lose your data when there’s a power event.

* Is IOPS what you’re really concerned with?  As your OSD nodes are 
increasingly saturated by parallel requests (or if you’re overly aggressive 
with your PG ratio), you may see more IOPS / throughput, at the risk of 
latency going down the drain.  This may be reasonably acceptable for RGW 
bucket data, but maybe not indexes and for sure not for RBD volumes.

* The nature of the workload can dramatically affect performance

** block size
** queue depth
** r/w mix
** sync
** phoon
** etc

This is one thing that (hopefully) distinguishes “enterprise” drives from 
“consumer” drives.  There’s one “enterprise” drive (now EOL) that turned out to 
develop UREs and dramatically increased latency when presented with an actual 
enterprise Ceph — vs desktop — workload. I fought that for a year and found 
that older drives actually fared better than newer, though the vendor denied an 
engineering or process change.  Consider the total cost of saving a few bucks 
on cheap drives that appear *on paper* to have attractive marketing specs, vs 
the nightmares you will face and the other things you won’t have time to work 
on if you’re consumed with pandemic drive failures.

Look up the performance firmware update history of the various 840/860 EVO even 
when used on desktops, which is not to say that the 970 does or doesn’t exhibit 
the same or similar issues.  Consider if you want to risk your 
corporate/production data, applications, and users on desktop-engineered drives.

In the end, you really need to buy or borrow eval drives and measure how they 
perform under both benchmarks and real workloads.  And Ceph mon / OSD service 
is *not* the same as any FIO or other benchmark tool load.

https://github.com/louwrentius/fio-plot

is a delightfully visual tool that shows the IOPS / BW / latency tradeoffs

Ideally one would compare FIO benchmarks across drives and also provision 
multiple models on a given system, slap OSDs on them, throw your real workload 
at them, and after at least a month gather drive/OSD iops/latency/bw metrics 
for each and compare them.  I’m not aware of a simple tool to manage this 
process, though I’d love one.
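
As a crude sketch of the benchmarking half (device path, block size, and
queue-depth range are placeholders), one can sweep queue depths with fio and
feed the JSON output to fio-plot:

for qd in 1 2 4 8 16 32; do
  fio --name=randwrite-qd$qd --filename=/dev/nvme0n1 --ioengine=libaio \
      --direct=1 --rw=randwrite --bs=4k --iodepth=$qd --numjobs=1 \
      --runtime=60 --time_based --output-format=json \
      --output=randwrite-qd$qd.json    # one JSON result file per queue depth
done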

ymmocv
— aad


> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2020-09-12 Thread Seena Fallah
Hi. Why do you say the 883DCT is faster than the 970 EVO?
I looked at the specifications and the 970 EVO has higher IOPS than the 883DCT!
Can you please explain why the 970 EVO performs worse than the 883DCT?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2019-10-25 Thread Drew Weaver
Not related to the original topic but the Micron case in that article is 
fascinating and a little surprising.

With pretty much best in class hardware in a lab environment:

Potential 25,899,072 4KiB random write IOPS goes to 477K
Potential 23,826,216 4KiB random read IOPS goes to 2,000,000

477K write IOPS and 2M read IOPS isn't terrible, especially given that there is 
replication, but the overhead when you look at the numbers is still staggering.
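
(Back of the envelope: 477K / 25,899,072 ≈ 1.8% of the theoretical write IOPS
and 2,000,000 / 23,826,216 ≈ 8.4% of the theoretical read IOPS survive the
full Ceph and replication stack.)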

Thanks for sharing this article.
-Drew

-Original Message-
From: Vitaliy Filippov  
Sent: Thursday, October 24, 2019 6:32 PM
To: ceph-us...@ceph.com; Hermann Himmelbauer 
Subject: [ceph-users] Re: Choosing suitable SSD for Ceph cluster

It's easy:

https://yourcmc.ru/wiki/Ceph_performance

> Hi,
> I am running a nice ceph (proxmox 4 / debian-8 / ceph 0.94.3) cluster 
> on
> 3 nodes (supermicro X8DTT-HIBQF), 2 OSD each (2TB SATA harddisks), 
> interconnected via Infiniband 40.
>
> Problem is that the ceph performance is quite bad (approx. 30MiB/s 
> reading, 3-4 MiB/s writing ), so I thought about plugging into each 
> node a PCIe to NVMe/M.2 adapter and install SSD harddisks. The idea is 
> to have a faster ceph storage and also some storage extension.
>
> The question is now which SSDs I should use. If I understand it right, 
> not every SSD is suitable for ceph, as is denoted at the links below:
>
> https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-
> ssd-is-suitable-as-a-journal-device/
> or here:
> https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark
>
> In the first link, the Samsung SSD 950 PRO 512GB NVMe is listed as a 
> fast SSD for ceph. As the 950 is not available anymore, I ordered a 
> Samsung 970 1TB for testing, unfortunately, the "EVO" instead of PRO.
>
> Before equipping all nodes with these SSDs, I did some tests with "fio"
> as recommended, e.g. like this:
>
> fio --filename=/dev/DEVICE --direct=1 --sync=1 --rw=write --bs=4k
> --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting 
> --name=journal-test
>
> The results are as the following:
>
> ---
> 1) Samsung 970 EVO NVMe M.2 mit PCIe Adapter
> Jobs: 1:
> read : io=26706MB, bw=445MiB/s, iops=113945, runt= 60001msec
> write: io=252576KB, bw=4.1MiB/s, iops=1052, runt= 60001msec
>
> Jobs: 4:
> read : io=21805MB, bw=432.7MiB/s, iops=93034, runt= 60001msec
> write: io=422204KB, bw=6.8MiB/s, iops=1759, runt= 60002msec
>
> Jobs: 10:
> read : io=26921MB, bw=448MiB/s, iops=114859, runt= 60001msec
> write: io=435644KB, bw=7MiB/s, iops=1815, runt= 60004msec
> ---
>
> So the read speed is impressive, but the write speed is really bad.
>
> Therefore I ordered the Samsung 970 PRO (1TB) as it has faster NAND 
> chips (MLC instead of TLC). The results are, however even worse for
> writing:
>
> ---
> Samsung 970 PRO NVMe M.2 mit PCIe Adapter
> Jobs: 1:
> read : io=15570MB, bw=259.4MiB/s, iops=66430, runt= 60001msec
> write: io=199436KB, bw=3.2MiB/s, iops=830, runt= 60001msec
>
> Jobs: 4:
> read : io=48982MB, bw=816.3MiB/s, iops=208986, runt= 60001msec
> write: io=327800KB, bw=5.3MiB/s, iops=1365, runt= 60002msec
>
> Jobs: 10:
> read : io=91753MB, bw=1529.3MiB/s, iops=391474, runt= 60001msec
> write: io=343368KB, bw=5.6MiB/s, iops=1430, runt= 60005msec
> ---
>
> I did some research and found out, that the "--sync" flag sets the 
> flag "O_DSYNC" which seems to disable the SSD cache which leads to 
> these horrid write speeds.
>
> It seems that this relates to the fact that the write cache is only 
> not disabled for SSDs which implement some kind of battery buffer that 
> guarantees a data flush to the flash in case of a powerloss.
>
> However, It seems impossible to find out which SSDs do have this 
> powerloss protection, moreover, these enterprise SSDs are crazy 
> expensive compared to the SSDs above - moreover it's unclear if 
> powerloss protection is even available in the NVMe form factor. So 
> building a 1 or 2 TB cluster seems not really affordable/viable.
>
> So, can please anyone give me hints what to do? Is it possible to 
> ensure that the write cache is not disabled in some way (my server is 
> situated in a data center, so there will probably never be loss of power).
>
> Or is the link above already outdated as newer ceph releases somehow 
> deal with this problem? Or maybe a later Debian release (10) will 
> handle the O_DSYNC flag differently?
>
> Perhaps I should simply invest in faster (and bigger) harddisks and 
> forget the SSD-cluster idea?
>
> Thank you in advance for any help,
>
> Best Regards,
> Hermann


--
With best regards,
   Vitaliy Filippov
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2019-10-25 Thread Paul Emmerich
Disabling write cache helps with the 970 Pro, but it still sucks. I've
worked on a setup with heavy metadata requirements (gigantic S3
buckets being listed) that unfortunately had all of that stored on 970
Pros and that never really worked out.

Just get a proper SSD like the 883, 983, or 1725. The (tiny) price
difference vs. the consumer disks just isn't worth the hassle and the
problems you are going to run into.

Paul

On Thu, Oct 24, 2019 at 9:08 PM Hermann Himmelbauer  wrote:
>
> Hi,
> I am running a nice ceph (proxmox 4 / debian-8 / ceph 0.94.3) cluster on
> 3 nodes (supermicro X8DTT-HIBQF), 2 OSD each (2TB SATA harddisks),
> interconnected via Infiniband 40.
>
> Problem is that the ceph performance is quite bad (approx. 30MiB/s
> reading, 3-4 MiB/s writing ), so I thought about plugging into each node
> a PCIe to NVMe/M.2 adapter and install SSD harddisks. The idea is to
> have a faster ceph storage and also some storage extension.
>
> The question is now which SSDs I should use. If I understand it right,
> not every SSD is suitable for ceph, as is denoted at the links below:
>
> https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
> or here:
> https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark
>
> In the first link, the Samsung SSD 950 PRO 512GB NVMe is listed as a
> fast SSD for ceph. As the 950 is not available anymore, I ordered a
> Samsung 970 1TB for testing, unfortunately, the "EVO" instead of PRO.
>
> Before equipping all nodes with these SSDs, I did some tests with "fio"
> as recommended, e.g. like this:
>
> fio --filename=/dev/DEVICE --direct=1 --sync=1 --rw=write --bs=4k
> --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting
> --name=journal-test
>
> The results are as the following:
>
> ---
> 1) Samsung 970 EVO NVMe M.2 mit PCIe Adapter
> Jobs: 1:
> read : io=26706MB, bw=445MiB/s, iops=113945, runt= 60001msec
> write: io=252576KB, bw=4.1MiB/s, iops=1052, runt= 60001msec
>
> Jobs: 4:
> read : io=21805MB, bw=432.7MiB/s, iops=93034, runt= 60001msec
> write: io=422204KB, bw=6.8MiB/s, iops=1759, runt= 60002msec
>
> Jobs: 10:
> read : io=26921MB, bw=448MiB/s, iops=114859, runt= 60001msec
> write: io=435644KB, bw=7MiB/s, iops=1815, runt= 60004msec
> ---
>
> So the read speed is impressive, but the write speed is really bad.
>
> Therefore I ordered the Samsung 970 PRO (1TB) as it has faster NAND
> chips (MLC instead of TLC). The results are, however even worse for writing:
>
> ---
> Samsung 970 PRO NVMe M.2 mit PCIe Adapter
> Jobs: 1:
> read : io=15570MB, bw=259.4MiB/s, iops=66430, runt= 60001msec
> write: io=199436KB, bw=3.2MiB/s, iops=830, runt= 60001msec
>
> Jobs: 4:
> read : io=48982MB, bw=816.3MiB/s, iops=208986, runt= 60001msec
> write: io=327800KB, bw=5.3MiB/s, iops=1365, runt= 60002msec
>
> Jobs: 10:
> read : io=91753MB, bw=1529.3MiB/s, iops=391474, runt= 60001msec
> write: io=343368KB, bw=5.6MiB/s, iops=1430, runt= 60005msec
> ---
>
> I did some research and found out, that the "--sync" flag sets the flag
> "O_DSYNC" which seems to disable the SSD cache which leads to these
> horrid write speeds.
>
> It seems that this relates to the fact that the write cache is only not
> disabled for SSDs which implement some kind of battery buffer that
> guarantees a data flush to the flash in case of a powerloss.
>
> However, It seems impossible to find out which SSDs do have this
> powerloss protection, moreover, these enterprise SSDs are crazy
> expensive compared to the SSDs above - moreover it's unclear if
> powerloss protection is even available in the NVMe form factor. So
> building a 1 or 2 TB cluster seems not really affordable/viable.
>
> So, can please anyone give me hints what to do? Is it possible to ensure
> that the write cache is not disabled in some way (my server is situated
> in a data center, so there will probably never be loss of power).
>
> Or is the link above already outdated as newer ceph releases somehow
> deal with this problem? Or maybe a later Debian release (10) will handle
> the O_DSYNC flag differently?
>
> Perhaps I should simply invest in faster (and bigger) harddisks and
> forget the SSD-cluster idea?
>
> Thank you in advance for any help,
>
> Best Regards,
> Hermann
>
>
> --
> herm...@qwer.tk
> PGP/GPG: 299893C7 (on keyservers)
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2019-10-25 Thread Vitaliy Filippov
Hi, sorry for interjecting, but please try the first test also with -fsync=1;  
NVMe drives sometimes ignore -sync=1 (BlueStore uses fsync).
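
That is, keeping the rest of the original command line unchanged, something
like:

fio --filename=/dev/DEVICE --direct=1 --sync=1 --fsync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test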



the Samsung PM1725b is definitely a good choice when it comes to "lower"
price enterprise SSDs. They cost pretty much the same as the Samsung Pro
SSDs but offer way higher DWPD and power loss protection.

My benchmarks of the 3.2TB version in a PCIe 2.0 slot (the card is 3.0!)

fio --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4k
--numjobs=10 --iodepth=1 --runtime=60 --time_based --group_reporting
--name=journal-test
write: IOPS=154k, BW=601MiB/s (630MB/s)(35.2GiB/60003msec)

fio --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4M
--numjobs=5 --iodepth=1 --runtime=60 --time_based --group_reporting
--name=journal-test
write: IOPS=679, BW=2717MiB/s (2849MB/s)(159GiB/60005msec)


--
With best regards,
  Vitaliy Filippov
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2019-10-25 Thread Georg Fleig

Hi,

the Samsung PM1725b is definitely a good choice when it comes to "lower" 
price enterprise SSDs. They cost pretty much the same as the Samsung Pro 
SSDs but offer way higher DWPD and power loss protection.


My benchmarks of the 3.2TB version in a PCIe 2.0 slot (the card is 3.0!)

fio --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4k 
--numjobs=10 --iodepth=1 --runtime=60 --time_based --group_reporting 
--name=journal-test

write: IOPS=154k, BW=601MiB/s (630MB/s)(35.2GiB/60003msec)

fio --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4M 
--numjobs=5 --iodepth=1 --runtime=60 --time_based --group_reporting 
--name=journal-test

write: IOPS=679, BW=2717MiB/s (2849MB/s)(159GiB/60005msec)


Regards,
Georg

On 24.10.19 21:21, Martin Verges wrote:

Hello,

think about migrating to a way faster and better Ceph version and 
towards bluestore to increase the performance with the existing hardware.


If you want to go with PCIe card, the Samsung PM1725b can provide 
quite good speeds but at much higher costs then the EVO. If you want 
to check drives, take a look at the uncached write latency. The lower 
the value is, the better will be the drive.


--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io 
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


On Thu, 24 Oct 2019 at 21:09, Hermann Himmelbauer <herm...@qwer.tk> wrote:


Hi,
I am running a nice ceph (proxmox 4 / debian-8 / ceph 0.94.3)
cluster on
3 nodes (supermicro X8DTT-HIBQF), 2 OSD each (2TB SATA harddisks),
interconnected via Infiniband 40.

Problem is that the ceph performance is quite bad (approx. 30MiB/s
reading, 3-4 MiB/s writing ), so I thought about plugging into
each node
a PCIe to NVMe/M.2 adapter and install SSD harddisks. The idea is to
have a faster ceph storage and also some storage extension.

The question is now which SSDs I should use. If I understand it right,
not every SSD is suitable for ceph, as is denoted at the links below:


https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
or here:
https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark

In the first link, the Samsung SSD 950 PRO 512GB NVMe is listed as a
fast SSD for ceph. As the 950 is not available anymore, I ordered a
Samsung 970 1TB for testing, unfortunately, the "EVO" instead of PRO.

Before equipping all nodes with these SSDs, I did some tests with
"fio"
as recommended, e.g. like this:

fio --filename=/dev/DEVICE --direct=1 --sync=1 --rw=write --bs=4k
--numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting
--name=journal-test

The results are as the following:

---
1) Samsung 970 EVO NVMe M.2 mit PCIe Adapter
Jobs: 1:
read : io=26706MB, bw=445MiB/s, iops=113945, runt= 60001msec
write: io=252576KB, bw=4.1MiB/s, iops=1052, runt= 60001msec

Jobs: 4:
read : io=21805MB, bw=432.7MiB/s, iops=93034, runt= 60001msec
write: io=422204KB, bw=6.8MiB/s, iops=1759, runt= 60002msec

Jobs: 10:
read : io=26921MB, bw=448MiB/s, iops=114859, runt= 60001msec
write: io=435644KB, bw=7MiB/s, iops=1815, runt= 60004msec
---

So the read speed is impressive, but the write speed is really bad.

Therefore I ordered the Samsung 970 PRO (1TB) as it has faster NAND
chips (MLC instead of TLC). The results are, however even worse
for writing:

---
Samsung 970 PRO NVMe M.2 mit PCIe Adapter
Jobs: 1:
read : io=15570MB, bw=259.4MiB/s, iops=66430, runt= 60001msec
write: io=199436KB, bw=3.2MiB/s, iops=830, runt= 60001msec

Jobs: 4:
read : io=48982MB, bw=816.3MiB/s, iops=208986, runt= 60001msec
write: io=327800KB, bw=5.3MiB/s, iops=1365, runt= 60002msec

Jobs: 10:
read : io=91753MB, bw=1529.3MiB/s, iops=391474, runt= 60001msec
write: io=343368KB, bw=5.6MiB/s, iops=1430, runt= 60005msec
---

I did some research and found out, that the "--sync" flag sets the
flag
"O_DSYNC" which seems to disable the SSD cache which leads to these
horrid write speeds.

It seems that this relates to the fact that the write cache is
only not
disabled for SSDs which implement some kind of battery buffer that
guarantees a data flush to the flash in case of a powerloss.

However, It seems impossible to find out which SSDs do have this
powerloss protection, moreover, these enterprise SSDs are crazy
expensive compared to the SSDs above - moreover it's unclear if
powerloss protection is even available in the NVMe form factor. So
building a 1 or 2 TB

[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2019-10-24 Thread Vitaliy Filippov
Especially https://yourcmc.ru/wiki/Ceph_performance#CAPACITORS.21, but I  
recommend reading the whole article


--
With best regards,
  Vitaliy Filippov
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2019-10-24 Thread Vitaliy Filippov

It's easy:

https://yourcmc.ru/wiki/Ceph_performance


Hi,
I am running a nice ceph (proxmox 4 / debian-8 / ceph 0.94.3) cluster on
3 nodes (supermicro X8DTT-HIBQF), 2 OSD each (2TB SATA harddisks),
interconnected via Infiniband 40.

Problem is that the ceph performance is quite bad (approx. 30MiB/s
reading, 3-4 MiB/s writing ), so I thought about plugging into each node
a PCIe to NVMe/M.2 adapter and install SSD harddisks. The idea is to
have a faster ceph storage and also some storage extension.

The question is now which SSDs I should use. If I understand it right,
not every SSD is suitable for ceph, as is denoted at the links below:

https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
or here:
https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark

In the first link, the Samsung SSD 950 PRO 512GB NVMe is listed as a
fast SSD for ceph. As the 950 is not available anymore, I ordered a
Samsung 970 1TB for testing, unfortunately, the "EVO" instead of PRO.

Before equipping all nodes with these SSDs, I did some tests with "fio"
as recommended, e.g. like this:

fio --filename=/dev/DEVICE --direct=1 --sync=1 --rw=write --bs=4k
--numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting
--name=journal-test

The results are as the following:

---
1) Samsung 970 EVO NVMe M.2 mit PCIe Adapter
Jobs: 1:
read : io=26706MB, bw=445MiB/s, iops=113945, runt= 60001msec
write: io=252576KB, bw=4.1MiB/s, iops=1052, runt= 60001msec

Jobs: 4:
read : io=21805MB, bw=432.7MiB/s, iops=93034, runt= 60001msec
write: io=422204KB, bw=6.8MiB/s, iops=1759, runt= 60002msec

Jobs: 10:
read : io=26921MB, bw=448MiB/s, iops=114859, runt= 60001msec
write: io=435644KB, bw=7MiB/s, iops=1815, runt= 60004msec
---

So the read speed is impressive, but the write speed is really bad.

Therefore I ordered the Samsung 970 PRO (1TB) as it has faster NAND
chips (MLC instead of TLC). The results are, however even worse for  
writing:


---
Samsung 970 PRO NVMe M.2 mit PCIe Adapter
Jobs: 1:
read : io=15570MB, bw=259.4MiB/s, iops=66430, runt= 60001msec
write: io=199436KB, bw=3.2MiB/s, iops=830, runt= 60001msec

Jobs: 4:
read : io=48982MB, bw=816.3MiB/s, iops=208986, runt= 60001msec
write: io=327800KB, bw=5.3MiB/s, iops=1365, runt= 60002msec

Jobs: 10:
read : io=91753MB, bw=1529.3MiB/s, iops=391474, runt= 60001msec
write: io=343368KB, bw=5.6MiB/s, iops=1430, runt= 60005msec
---

I did some research and found out, that the "--sync" flag sets the flag
"O_DSYNC" which seems to disable the SSD cache which leads to these
horrid write speeds.

It seems that this relates to the fact that the write cache is only not
disabled for SSDs which implement some kind of battery buffer that
guarantees a data flush to the flash in case of a powerloss.

However, It seems impossible to find out which SSDs do have this
powerloss protection, moreover, these enterprise SSDs are crazy
expensive compared to the SSDs above - moreover it's unclear if
powerloss protection is even available in the NVMe form factor. So
building a 1 or 2 TB cluster seems not really affordable/viable.

So, can please anyone give me hints what to do? Is it possible to ensure
that the write cache is not disabled in some way (my server is situated
in a data center, so there will probably never be loss of power).

Or is the link above already outdated as newer ceph releases somehow
deal with this problem? Or maybe a later Debian release (10) will handle
the O_DSYNC flag differently?

Perhaps I should simply invest in faster (and bigger) harddisks and
forget the SSD-cluster idea?

Thank you in advance for any help,

Best Regards,
Hermann



--
With best regards,
  Vitaliy Filippov
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2019-10-24 Thread Frank Schilder
Dear Hermann,

try your tests again with the volatile write cache disabled ([s/h]dparm -W 0 
DEVICE). If your disks have super capacitors, you should then see spec 
performance (possibly starting with iodepth=2 or 4) with your fio test. A good 
article is this one here: 
https://yourcmc.ru/wiki/index.php?title=Ceph_performance .
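
A minimal sketch of that check (sdX is a placeholder; SAS drives use sdparm
instead, and NVMe devices usually need vendor tooling rather than hdparm):

hdparm -W 0 /dev/sdX   # disable the volatile write cache
hdparm -W /dev/sdX     # verify: should now report write-caching = 0 (off)
fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test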

The feature you are looking for is called "power loss protection". I would 
expect Samsung PRO disks to have it.

The fio test with iodepth=1 will give you an indication of what you can expect 
from a single OSD deployed on the disk. When choosing disks, also look for 
DWPD >= 1.

In addition, as Martin writes, consider upgrading and deploy all new disks with 
bluestore.

Best regards,

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Martin Verges 
Sent: 24 October 2019 21:21
To: Hermann Himmelbauer
Cc: ceph-users
Subject: [ceph-users] Re: Choosing suitable SSD for Ceph cluster

Hello,

think about migrating to a way faster and better Ceph version and towards 
bluestore to increase the performance with the existing hardware.

If you want to go with PCIe card, the Samsung PM1725b can provide quite good 
speeds but at much higher costs then the EVO. If you want to check drives, take 
a look at the uncached write latency. The lower the value is, the better will 
be the drive.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


On Thu, 24 Oct 2019 at 21:09, Hermann Himmelbauer <herm...@qwer.tk> wrote:
Hi,
I am running a nice ceph (proxmox 4 / debian-8 / ceph 0.94.3) cluster on
3 nodes (supermicro X8DTT-HIBQF), 2 OSD each (2TB SATA harddisks),
interconnected via Infiniband 40.

Problem is that the ceph performance is quite bad (approx. 30MiB/s
reading, 3-4 MiB/s writing ), so I thought about plugging into each node
a PCIe to NVMe/M.2 adapter and install SSD harddisks. The idea is to
have a faster ceph storage and also some storage extension.

The question is now which SSDs I should use. If I understand it right,
not every SSD is suitable for ceph, as is denoted at the links below:

https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
or here:
https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark

In the first link, the Samsung SSD 950 PRO 512GB NVMe is listed as a
fast SSD for ceph. As the 950 is not available anymore, I ordered a
Samsung 970 1TB for testing, unfortunately, the "EVO" instead of PRO.

Before equipping all nodes with these SSDs, I did some tests with "fio"
as recommended, e.g. like this:

fio --filename=/dev/DEVICE --direct=1 --sync=1 --rw=write --bs=4k
--numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting
--name=journal-test

The results are as the following:

---
1) Samsung 970 EVO NVMe M.2 mit PCIe Adapter
Jobs: 1:
read : io=26706MB, bw=445MiB/s, iops=113945, runt= 60001msec
write: io=252576KB, bw=4.1MiB/s, iops=1052, runt= 60001msec

Jobs: 4:
read : io=21805MB, bw=432.7MiB/s, iops=93034, runt= 60001msec
write: io=422204KB, bw=6.8MiB/s, iops=1759, runt= 60002msec

Jobs: 10:
read : io=26921MB, bw=448MiB/s, iops=114859, runt= 60001msec
write: io=435644KB, bw=7MiB/s, iops=1815, runt= 60004msec
---

So the read speed is impressive, but the write speed is really bad.

Therefore I ordered the Samsung 970 PRO (1TB) as it has faster NAND
chips (MLC instead of TLC). The results are, however even worse for writing:

---
Samsung 970 PRO NVMe M.2 mit PCIe Adapter
Jobs: 1:
read : io=15570MB, bw=259.4MiB/s, iops=66430, runt= 60001msec
write: io=199436KB, bw=3.2MiB/s, iops=830, runt= 60001msec

Jobs: 4:
read : io=48982MB, bw=816.3MiB/s, iops=208986, runt= 60001msec
write: io=327800KB, bw=5.3MiB/s, iops=1365, runt= 60002msec

Jobs: 10:
read : io=91753MB, bw=1529.3MiB/s, iops=391474, runt= 60001msec
write: io=343368KB, bw=5.6MiB/s, iops=1430, runt= 60005msec
---

I did some research and found out, that the "--sync" flag sets the flag
"O_DSYNC" which seems to disable the SSD cache which leads to these
horrid write speeds.

It seems that this relates to the fact that the write cache is only not
disabled for SSDs which implement some kind of battery buffer that
guarantees a data flush to the flash in case of a powerloss.

However, It seems impossible to find out which SSDs do have this
powerloss protection, moreover, these enterprise SSDs are crazy
expensive compared to the SSDs above - moreover it's unclear if
powerloss protection is even available in the NVMe form factor. So
building a 1 or 2

[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2019-10-24 Thread Martin Verges
Hello,

think about migrating to a much faster and better Ceph version, and towards
BlueStore, to increase the performance with the existing hardware.

If you want to go with a PCIe card, the Samsung PM1725b can provide quite
good speeds, but at a much higher cost than the EVO. If you want to check
drives, take a look at the uncached write latency: the lower the value is,
the better the drive will be.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


On Thu, 24 Oct 2019 at 21:09, Hermann Himmelbauer <herm...@qwer.tk> wrote:

> Hi,
> I am running a nice ceph (proxmox 4 / debian-8 / ceph 0.94.3) cluster on
> 3 nodes (supermicro X8DTT-HIBQF), 2 OSD each (2TB SATA harddisks),
> interconnected via Infiniband 40.
>
> Problem is that the ceph performance is quite bad (approx. 30MiB/s
> reading, 3-4 MiB/s writing ), so I thought about plugging into each node
> a PCIe to NVMe/M.2 adapter and install SSD harddisks. The idea is to
> have a faster ceph storage and also some storage extension.
>
> The question is now which SSDs I should use. If I understand it right,
> not every SSD is suitable for ceph, as is denoted at the links below:
>
>
> https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
> or here:
> https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark
>
> In the first link, the Samsung SSD 950 PRO 512GB NVMe is listed as a
> fast SSD for ceph. As the 950 is not available anymore, I ordered a
> Samsung 970 1TB for testing, unfortunately, the "EVO" instead of PRO.
>
> Before equipping all nodes with these SSDs, I did some tests with "fio"
> as recommended, e.g. like this:
>
> fio --filename=/dev/DEVICE --direct=1 --sync=1 --rw=write --bs=4k
> --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting
> --name=journal-test
>
> The results are as the following:
>
> ---
> 1) Samsung 970 EVO NVMe M.2 mit PCIe Adapter
> Jobs: 1:
> read : io=26706MB, bw=445MiB/s, iops=113945, runt= 60001msec
> write: io=252576KB, bw=4.1MiB/s, iops=1052, runt= 60001msec
>
> Jobs: 4:
> read : io=21805MB, bw=432.7MiB/s, iops=93034, runt= 60001msec
> write: io=422204KB, bw=6.8MiB/s, iops=1759, runt= 60002msec
>
> Jobs: 10:
> read : io=26921MB, bw=448MiB/s, iops=114859, runt= 60001msec
> write: io=435644KB, bw=7MiB/s, iops=1815, runt= 60004msec
> ---
>
> So the read speed is impressive, but the write speed is really bad.
>
> Therefore I ordered the Samsung 970 PRO (1TB) as it has faster NAND
> chips (MLC instead of TLC). The results are, however even worse for
> writing:
>
> ---
> Samsung 970 PRO NVMe M.2 mit PCIe Adapter
> Jobs: 1:
> read : io=15570MB, bw=259.4MiB/s, iops=66430, runt= 60001msec
> write: io=199436KB, bw=3.2MiB/s, iops=830, runt= 60001msec
>
> Jobs: 4:
> read : io=48982MB, bw=816.3MiB/s, iops=208986, runt= 60001msec
> write: io=327800KB, bw=5.3MiB/s, iops=1365, runt= 60002msec
>
> Jobs: 10:
> read : io=91753MB, bw=1529.3MiB/s, iops=391474, runt= 60001msec
> write: io=343368KB, bw=5.6MiB/s, iops=1430, runt= 60005msec
> ---
>
> I did some research and found out, that the "--sync" flag sets the flag
> "O_DSYNC" which seems to disable the SSD cache which leads to these
> horrid write speeds.
>
> It seems that this relates to the fact that the write cache is only not
> disabled for SSDs which implement some kind of battery buffer that
> guarantees a data flush to the flash in case of a powerloss.
>
> However, It seems impossible to find out which SSDs do have this
> powerloss protection, moreover, these enterprise SSDs are crazy
> expensive compared to the SSDs above - moreover it's unclear if
> powerloss protection is even available in the NVMe form factor. So
> building a 1 or 2 TB cluster seems not really affordable/viable.
>
> So, can please anyone give me hints what to do? Is it possible to ensure
> that the write cache is not disabled in some way (my server is situated
> in a data center, so there will probably never be loss of power).
>
> Or is the link above already outdated as newer ceph releases somehow
> deal with this problem? Or maybe a later Debian release (10) will handle
> the O_DSYNC flag differently?
>
> Perhaps I should simply invest in faster (and bigger) harddisks and
> forget the SSD-cluster idea?
>
> Thank you in advance for any help,
>
> Best Regards,
> Hermann
>
>
> --
> herm...@qwer.tk
> PGP/GPG: 299893C7 (on keyservers)
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an