[ceph-users] Re: Repurposing some Dell R750s for Ceph

2024-07-12 Thread Drew Weaver
Okay, it seems like we don't really have a definitive answer on whether it's OK 
to use a RAID controller or not, and in what capacity.

Passthrough meaning:

Are you saying that it's OK to use a raid controller where the disks are in 
non-RAID mode?
Are you saying that it's OK to use a raid controller where each disk is in its 
own RAID-0 volume?

I'm just trying to clarify a little bit. You can imagine that nobody wants to 
be that user that does this against the documentation's guidelines and then 
something goes terribly wrong.

Thanks again,
-Drew


-Original Message-
From: Anthony D'Atri  
Sent: Thursday, July 11, 2024 7:24 PM
To: Drew Weaver 
Cc: John Jasen ; ceph-users@ceph.io
Subject: Re: [ceph-users] Repurposing some Dell R750s for Ceph



> 
> Isn’t the supported/recommended configuration to use an HBA if you have to 
> but never use a RAID controller?

That may be something I added to the docs.  My contempt for RAID HBAs knows no 
bounds ;)

Ceph doesn’t care.  Passthrough should work fine, I’ve done that for tens of 
thousands of OSDs, albeit on different LSI HBA SKUs.
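
For anyone wanting to double-check that passthrough really is passthrough before 
building OSDs on it, a quick hedged sanity check (device names are placeholders):

lsblk -d -o NAME,TRAN,TYPE,SIZE,MODEL   # drives should show up as plain sd*/nvme* disks, not a "MegaRAID" virtual-disk model
ceph-volume inventory                   # on an OSD host; "available: True" means ceph-volume can build an OSD on the device
smartctl -i /dev/sdX                    # SMART reachable without -d megaraid,N is another sign the drive isn't hidden behind a VD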

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Repurposing some Dell R750s for Ceph

2024-07-11 Thread Drew Weaver
Hi,

>I don't think the motherboard has enough PCIe lanes to natively connect all 
>the drives: the RAID controller effectively functioned as an expander, so you 
>needed fewer PCIe lanes on the motherboard.
>As the quickest way forward: look for passthrough / single-disk / RAID0 
>options, in that order, in the controller management tools (perccli etc).
>I haven't used the N variant at all, and since it's NVME presented as 
>SCSI/SAS, I don't want to trust the solution of reflashing the controller for 
>IT (passthrough) mode.

Reviewing the diagrams of the system with the H755N and without the H755N, it 
looks like the PCIe lanes are limited by the cables they use when you order 
them with these RAID controllers (which I guess is why, in order to put 16 
drives in a system, you need two controllers).

As far as passthrough, RAID-0 disks, etc., I was told that is not a valid 
configuration and wouldn't be supported.

Thanks for the information.
-Drew
 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Repurposing some Dell R750s for Ceph

2024-07-11 Thread Drew Weaver
Hi,

Isn’t the supported/recommended configuration to use an HBA if you have to but 
never use a RAID controller?

The backplane is already NVMe as the drives installed in the system currently 
are already NVMe.

Also, I was looking through some diagrams of the R750, and it appears that if you 
order them with the RAID controller(s), the bandwidth between the backplane and 
the system is hamstrung to some degree because of the cables they use, so 
even if I could configure them in non-RAID mode it would still be suboptimal.

Thanks for the information.
-Drew


From: John Jasen 
Sent: Thursday, July 11, 2024 10:06 AM
To: Drew Weaver 
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Repurposing some Dell R750s for Ceph

Retrofitting the guts of a Dell PE R7xx server is not straightforward. You 
could be looking at replacing the motherboard, the backplane, and so forth.

You can probably convert the H755N card to present the drives to the OS, so you 
can use them for Ceph. This may be AHCI mode, pass-through mode, non-RAID 
device, or some other magic words in the raid configuration utility.

This should be in the raid documentation, somewhere.
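
For the PERC family the management tool is perccli (same grammar as Broadcom's 
storcli); a hedged sketch of where to look -- the exact passthrough keyword 
(JBOD vs. non-RAID) varies by controller generation, so treat this as a starting 
point rather than gospel:

perccli /c0 show all            # controller capabilities; look for JBOD / non-RAID support
perccli /c0/eall/sall show      # per-drive state (Onln, UGood, JBOD, ...)
perccli /c0/e32/s0 set jbod     # placeholder enclosure/slot; some PERC generations only expose "Convert to Non-RAID" via the BIOS or racadm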



On Thu, Jul 11, 2024 at 9:17 AM Drew Weaver <drew.wea...@thenap.com> wrote:
Hello,

We would like to repurpose some Dell PowerEdge R750s for a Ceph cluster.

Currently the servers have one H755N RAID controller for each 8 drives. (2 
total)

I have been asking their technical support what needs to happen in order for us 
to just rip out those raid controllers and cable the backplane directly to the 
motherboard/PCIe lanes and they haven't been super enthusiastic about helping 
me. I get it just buy another 50 servers, right? No big deal.

I have the diagrams that show how each configuration should be connected, I 
think I just need the right cable(s), my question is has anyone done this work 
before and was it worth it?

Also bonus if anyone has an R750 that has the drives directly connected to the 
backplane and can find the part number of the cable that connects the backplane 
to the motherboard I would greatly appreciate that part number. My sales guys 
are "having a hard time locating it".

Thanks,
-Drew

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Repurposing some Dell R750s for Ceph

2024-07-11 Thread Drew Weaver
Hi,

I'm a bit confused by your question; the 'drive bays' or backplane is the same 
for an NVMe system, it's either a SATA/SAS/NVMe backplane or an NVMe-only backplane.

I don't understand why you believe that my configuration has to be 3.5" as it 
isn't. It's a 16x2.5" chassis with two H755N controllers (one for each set of 8 
drives).

The H755N controller is a hardware raid adapter for NVMe.

I hope this clarifies the confusion.

Thank you,
-Drew


-Original Message-
From: Frank Schilder  
Sent: Thursday, July 11, 2024 9:57 AM
To: Drew Weaver ; 'ceph-users@ceph.io' 

Subject: Re: Repurposing some Dell R750s for Ceph

Hi Drew,

as far as I know Dell's drive bays for RAID controllers are not the same as the 
drive bays for CPU attached disks. In particular, I don't think they have that 
config for 3.5" drive bays and your description sounds a lot like that's what 
you have. Are you trying to go from 16x2.5" HDD to something like 24xNVMe?

Maybe you could provide a bit more information here, like (links to) the wiring 
diagrams you mentioned? From the description I cannot entirely deduce what 
exactly you have and where you want to go to.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________
From: Drew Weaver 
Sent: Thursday, July 11, 2024 3:16 PM
To: 'ceph-users@ceph.io'
Subject: [ceph-users] Repurposing some Dell R750s for Ceph

Hello,

We would like to repurpose some Dell PowerEdge R750s for a Ceph cluster.

Currently the servers have one H755N RAID controller for each 8 drives. (2 
total)

I have been asking their technical support what needs to happen in order for us 
to just rip out those raid controllers and cable the backplane directly to the 
motherboard/PCIe lanes and they haven't been super enthusiastic about helping 
me. I get it just buy another 50 servers, right? No big deal.

I have the diagrams that show how each configuration should be connected, I 
think I just need the right cable(s), my question is has anyone done this work 
before and was it worth it?

Also bonus if anyone has an R750 that has the drives directly connected to the 
backplane and can find the part number of the cable that connects the backplane 
to the motherboard I would greatly appreciate that part number. My sales guys 
are "having a hard time locating it".

Thanks,
-Drew

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Repurposing some Dell R750s for Ceph

2024-07-11 Thread Drew Weaver
Hello,

We would like to repurpose some Dell PowerEdge R750s for a Ceph cluster.

Currently the servers have one H755N RAID controller for each 8 drives. (2 
total)

I have been asking their technical support what needs to happen in order for us 
to just rip out those raid controllers and cable the backplane directly to the 
motherboard/PCIe lanes and they haven't been super enthusiastic about helping 
me. I get it just buy another 50 servers, right? No big deal.

I have the diagrams that show how each configuration should be connected, I 
think I just need the right cable(s), my question is has anyone done this work 
before and was it worth it?

Also bonus if anyone has an R750 that has the drives directly connected to the 
backplane and can find the part number of the cable that connects the backplane 
to the motherboard I would greatly appreciate that part number. My sales guys 
are "having a hard time locating it".

Thanks,
-Drew

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Viability of NVMeOF/TCP for VMWare

2024-06-27 Thread Drew Weaver
Howdy,

I recently saw that Ceph has a gateway which allows VMWare ESXi to connect to 
RBD.

We had another gateway like this a while back: the iSCSI gateway.

The iSCSI gateway ended up being... let's say problematic.

Is there any reason to believe that NVMeOF will also end up on the floor, and 
has anyone that uses VMWare extensively evaluated its viability?

Just curious!

Thanks,
-Drew

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: recommendation for barebones server with 8-12 direct attach NVMe?

2024-01-16 Thread Drew Weaver
>Groovy.  Channel drives are IMHO a pain, though in the case of certain 
>manufacturers it can be the only way to get firmware updates.  Channel drives 
>often only have a 3 year warranty, vs 5 for generic drives.

Yes, we have run into this with Kioxia as far as being able to find new 
firmware. Which MFG tends to be the most responsible in this regard in your 
view? Not looking for a vendor rec or anything just specifically for this one 
particular issue.

Thanks!
-Drew

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: recommendation for barebones server with 8-12 direct attach NVMe?

2024-01-16 Thread Drew Weaver
> By HBA I suspect you mean a non-RAID HBA?

Yes, something like the HBA355.

> NVMe SSDs shouldn’t cost significantly more than SATA SSDs.  Hint:  certain 
> tier-one chassis manufacturers mark both the fsck up.  You can get a better 
> warranty and pricing by buying drives from a VAR.

We stopped buying “Vendor FW” drives a long time ago. Although when the 
PowerEdge R750 originally came out they removed the ability for the DRAC to 
monitor the endurance of the non-vendor SSDs to penalize us; it took about 6 
months of arguing to get them to put that back in.

> It’s a trap!  Which is to say, that the $/GB really isn’t far away, and in 
> fact once you step back to TCO from the unit economics of the drive in 
> isolation, the HDDs often turn out to be *more* expensive.

I suppose that depends on what DWPD/endurance you are assuming on the SSDs, 
but also, in my very specific case, we have PBs of HDDs in inventory so that 
costs us…no additional money. My comment about there being more economical 
NVMe disks available was simply that if we are all changing over to NVMe, but 
we don’t right now need to be able to move 7GB/s per drive, it would be cool 
to just stop buying anything with SATA in it and then just change out the 
drives later. Which was kind of the vibe with SATA when SSDs were first 
introduced. Everyone disagrees with me on this point, but it doesn’t really 
make sense that you have to choose between SATA or NVMe on a system with a 
backplane.

But yes, I see all of your points; if I were trying to build a Ceph cluster as 
primary storage and had a budget for this project, that would indeed change 
everything about my algebra.

Thanks for your time and consideration, I appreciate it.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: recommendation for barebones server with 8-12 direct attach NVMe?

2024-01-15 Thread Drew Weaver
Oh, well, what I was going to do was just use SATA HBAs on PowerEdge R740s 
because we don't really care about performance; this is just used as a copy 
point for backups/archival. But the current Ceph cluster we have [which is 
based on HDDs attached to Dell RAID controllers with each disk in RAID-0 and 
works just fine for us] is on EL7, and that is going to be EOL soon. So I 
thought it might be better on the new cluster to use HBAs instead of having 
the OSDs just be single-disk RAID-0 volumes, because I am pretty sure that's 
the least good scenario, whether or not it has been working for us for like 8 
years now.

So I asked on the list for recommendations and also read on the website and it 
really sounds like the only "right way" to run Ceph is by directly attaching 
disks to a motherboard. I had thought that HBAs were okay before but I am 
probably confusing that with ZFS/BSD or some other equally hyperspecific 
requirement. The other note was about how using NVMe seems to be the only right 
way now too.

I would've rather just stuck to SATA but I figured if I was going to have to 
buy all new servers that direct attach the SATA ports right off the 
motherboards to a backplane I may as well do it with NVMe (even though the 
price of the media will be a lot higher).

It would be cool if someone made NVMe drives that were cost competitive and had 
similar performance to hard drives (meaning, not super expensive but not 
lightning fast either) because the $/GB on datacenter NVMe drives like Kioxia, 
etc is still pretty far away from what it is for HDDs (obviously).

Anyway thanks.
-Drew





-Original Message-
From: Robin H. Johnson  
Sent: Sunday, January 14, 2024 5:00 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: recommendation for barebones server with 8-12 direct 
attach NVMe?

On Fri, Jan 12, 2024 at 02:32:12PM +0000, Drew Weaver wrote:
> Hello,
> 
> So we were going to replace a Ceph cluster with some hardware we had 
> laying around using SATA HBAs but I was told that the only right way 
> to build Ceph in 2023 is with direct attach NVMe.
> 
> Does anyone have any recommendation for a 1U barebones server (we just 
> drop in ram disks and cpus) with 8-10 2.5" NVMe bays that are direct 
> attached to the motherboard without a bridge or HBA for Ceph 
> specifically?
If you're buying new, Supermicro would be my first choice for vendor based on 
experience.
https://www.supermicro.com/en/products/nvme

You said 2.5" bays, which makes me think you have existing drives.
There are models to fit that, but if you're also considering new drives, you 
can get further density in E1/E3.

The only caveat is that you will absolutely want to put a better NIC in these 
systems, because 2x10G is easy to saturate with a pile of NVME.
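
Rough arithmetic to back that up, assuming ~3 GB/s of sequential read per Gen4 
NVMe drive: 8 drives is on the order of 24 GB/s of raw bandwidth, while 2x10GbE 
tops out around 2.5 GB/s, so even 2x25G or 2x100G gets filled surprisingly fast.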

--
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation President & Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] recommendation for barebones server with 8-12 direct attach NVMe?

2024-01-12 Thread Drew Weaver
Hello,

So we were going to replace a Ceph cluster with some hardware we had laying 
around using SATA HBAs but I was told that the only right way to build Ceph in 
2023 is with direct attach NVMe.

Does anyone have any recommendation for a 1U barebones server (we just drop in 
ram disks and cpus) with 8-10 2.5" NVMe bays that are direct attached to the 
motherboard without a bridge or HBA for Ceph specifically?

Thanks,
-Drew

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Building new cluster had a couple of questions

2023-12-26 Thread Drew Weaver
Okay so NVMe is the only path forward?

I was simply going to replace the PERC H750s with some HBA350s but if that will 
not work I will just wait until I have a pile of NVMe servers that we aren't 
using in a few years, I guess.

Thanks,
-Drew



From: Anthony D'Atri 
Sent: Friday, December 22, 2023 12:33 PM
To: Drew Weaver 
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Building new cluster had a couple of questions




Sorry I thought of one more thing.

I was actually re-reading the hardware recommendations for Ceph and it seems to 
imply that both RAID controllers as well as HBAs are bad ideas.

Advice I added most likely ;)   "RAID controllers" *are* a subset of HBAs BTW.  
The nomenclature can be confusing and there's this guy on Reddit 


I believe I remember knowing that RAID controllers are sub optimal but I guess 
I don't understand how you would actually build a cluster with many disks 12-14 
each per server without any HBAs in the servers.

NVMe

Are there certain HBAs that are worse than others? Sorry I am just confused.

For liability and professionalism I won't name names, especially in serverland 
there's one manufacturer who dominates.

There are three main points, informed by years of wrangling the things.  I 
posted a litany of my experiences to this very list a few years back, including 
data-losing firmware / utility bugs and operationally expensive ECOs.

* RoC HBAs aka "RAID controllers" are IMHO a throwback to the days when x86 / 
x64 servers didn't have good software RAID.  In the land of the Sun we had VxVM 
(at $$$) that worked well, and Sun's SDS/ODS that ... got better over time.  I 
dunno if the Microsoft world has bootable software RAID now or not.  They are 
in my experience flaky and a pain to monitor.  Granted they offer the potential 
for a system to boot without intervention if the first boot drive is horqued, 
but IMHO that doesn't happen nearly often enough to justify the hassle.

* These things can have significant cost, especially if one shells out for 
cache RAM, BBU/FBWC, etc.  Today I have a handful of legacy systems that were 
purchased with a tri-mode HBA that in 2018 had a list price of USD 2000.  The  
*only* thing it's doing is mirroring two SATA boot drives.  That money would be 
better spent on SSDs, either with a non-RAID aka JBOD HBA, or better NVMe.

* RAID HBAs confound observability.  Many models today have a JBOD / 
passthrough mode -- in which case why pay for all the RAID-fu?  Some, 
bizarrely, still don't, and one must set up a single-drive RAID0 volume around 
every drive for the system to see it.  This makes iostat even less useful than 
it already is, and one has to jump through hoops to get SMART info.  Hoops 
that, for example, the very promising upstream smartctl_exporter doesn't have.
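
As a concrete example of those hoops: with every drive wrapped in a single-drive 
RAID0, SMART has to be fetched through the controller rather than from the 
visible block device, roughly like this (drive indices are placeholders; 
enumerate them with storcli/perccli):

smartctl -a -d megaraid,0 /dev/sda     # physical drive DID 0 behind the virtual disk
smartctl -a -d megaraid,1 /dev/sda     # ...and so on for each DID the controller reports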

There's a certain current M.2 boot drive module like this, the OS cannot see 
the drives AT ALL unless they're in a virtual drive.  Like Chong said, there's 
too much recession.

When using SATA or SAS, you can get a non-RAID HBA for much less money than a 
RAID HBA.  But the nuance here is that unless you have pre-existing gear, SATA 
and especially SAS *do not save money*.  This is heterodox to conventional 
wisdom.

An NVMe-only chassis does not need an HBA of any kind.  NVMe *is* PCI-e.  It 
especially doesn't need an astronomically expensive NVMe-capable RAID 
controller, at least not for uses like Ceph.  If one has an unusual use-case 
that absolutely requires a single volume, if LVM doesn't cut it for some reason 
-- maybe.  And there are things like Phison and Scaleflux that are out of 
scope, we're talking about Ceph here.

Some chassis vendors try hard to stuff an RoC HBA down your throat, with rather 
high markups.  Others may offer a basic SATA HBA built into the motherboard if 
you need it for some reason.

So when you don't have to spend USD hundreds to a thousand on an RoC HBA + 
BBU/cache/FBWC and jump through hoops to have one more thing to monitor, and an 
NVMe SSD doesn't cost significantly more than a SATA SSD, an all-NVMe system 
can easily be *less* expensive than SATA or especially SAS.  SAS is very, very 
much in its last days in the marketplace; SATA is right behind it.  In 5-10 
years you'll be hard-pressed to find enterprise SAS/SATA SSDs, or if you can, 
they might only be from a single manufacturer -- which is an Akbarian trap.

This calculator can help show the false economy of SATA / SAS, and especially 
of HDDs.  Yes, in the enterprise, HDDs are *not* less expensive unless you're a 
slave to $chassisvendor.

Total Cost of Ownership (TCO) Model for Storage:
https://www.snia.org/forums/cmsi/programs/TCOcalc

[ceph-users] Re: Building new cluster had a couple of questions

2023-12-22 Thread Drew Weaver
Sorry I thought of one more thing.

I was actually re-reading the hardware recommendations for Ceph and it seems to 
imply that both RAID controllers as well as HBAs are bad ideas.

I believe I remember knowing that RAID controllers are sub optimal but I guess 
I don't understand how you would actually build a cluster with many disks 12-14 
each per server without any HBAs in the servers.

Are there certain HBAs that are worse than others? Sorry I am just confused.

Thanks,
-Drew

-Original Message-
From: Drew Weaver  
Sent: Thursday, December 21, 2023 8:51 AM
To: 'ceph-users@ceph.io' 
Subject: [ceph-users] Building new cluster had a couple of questions

Howdy,

I am going to be replacing an old cluster pretty soon and I am looking for a 
few suggestions.

#1 cephadm or ceph-ansible for management?
#2 Since the whole... CentOS thing... what distro appears to be the most 
straightforward to use with Ceph?  I was going to try and deploy it on Rocky 9.

That is all I have.

Thanks,
-Drew

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Building new cluster had a couple of questions

2023-12-21 Thread Drew Weaver
Howdy,

I am going to be replacing an old cluster pretty soon and I am looking for a 
few suggestions.

#1 cephadm or ceph-ansible for management?
#2 Since the whole... CentOS thing... what distro appears to be the most 
straightforward to use with Ceph?  I was going to try and deploy it on Rocky 9.

That is all I have.

Thanks,
-Drew

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: iDRAC 9 version 6.10 shows 0% for write endurance on non-dell drives, work around? [EXT]

2023-02-14 Thread Drew Weaver
That is pretty awesome, I will look into doing it that way. All of our 
monitoring is integrated with the very, very expensive DRAC enterprise license 
we pay for (my fault for trusting Dell).

We are looking for a new hardware vendor but this will likely work for the 
mistake we already made.

Thanks,
-Drew





-Original Message-
From: Dave Holland  
Sent: Tuesday, February 14, 2023 11:39 AM
To: Drew Weaver 
Cc: 'ceph-users@ceph.io' 
Subject: Re: [ceph-users] iDRAC 9 version 6.10 shows 0% for write endurance on 
non-dell drives, work around? [EXT]

On Tue, Feb 14, 2023 at 04:00:30PM +, Drew Weaver wrote:
> What are you folks using to monitor your write endurance on your SSDs that 
> you couldn't buy from Dell because they had a 16 week lead time while the MFG 
> could deliver the drives in 3 days?

Our Ceph servers are SuperMicro not Dell but this approach is portable. We 
wrote a little shell script to parse the output of "nvme"
and/or "smartctl" every hour and send the data to a Graphite server.
We have a Grafana dashboard to display the all-important graphs. After
~5 years life, our most worn NVMe (used for journal/db only -- data is on HDD) 
is showing 89% life remaining.
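
A minimal sketch of that kind of hourly collector, assuming nvme-cli and a 
Graphite plaintext listener on port 2003 (hostnames and metric prefix are 
placeholders):

#!/bin/sh
# Push NVMe wear (the percentage_used field from the SMART log) to Graphite; run from cron.
GRAPHITE=graphite.example.com
NOW=$(date +%s)
HOST=$(hostname -s)
for dev in /dev/nvme?n1; do
    used=$(nvme smart-log "$dev" | awk -F: '/percentage_used/ {gsub(/[ \t%]/, "", $2); print $2}')
    echo "ceph.$HOST.$(basename "$dev").percentage_used $used $NOW"
done | nc -w 2 "$GRAPHITE" 2003    # netcat flags vary by variant; -w just bounds the connection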

Dave
-- 
**   Dave Holland   ** Systems Support -- Informatics Systems Group **
** d...@sanger.ac.uk **Wellcome Sanger Institute, Hinxton, UK**


--
 The Wellcome Sanger Institute is operated by Genome Research  Limited, a 
charity registered in England with number 1021457 and a  company registered in 
England with number 2742969, whose registered  office is 215 Euston Road, 
London, NW1 2BE. 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] iDRAC 9 version 6.10 shows 0% for write endurance on non-dell drives, work around?

2023-02-14 Thread Drew Weaver
Hello,

After upgrading a lot of iDRAC9 modules to version 6.10 in servers that are 
involved in a Ceph cluster we noticed that the iDRAC9 shows the write endurance 
as 0% on any non-certified disk.

OMSA still shows the correct remaining write endurance but I am assuming that 
they are working feverishly to eliminate that too.

I opened a support ticket with Dell once this was brought to my attention and 
they basically told me that I was lucky that it ever worked at all, which I 
thought was an odd response given that the iDRAC enterprise licenses cost 
several hundred dollars each.

I know that the old Intel Datacenter Tool used to be able to reach through a 
MegaRAID controller and read the remaining write endurance but that tool is 
essentially defunct now.

What are you folks using to monitor your write endurance on your SSDs that you 
couldn't buy from Dell because they had a 16 week lead time while the MFG could 
deliver the drives in 3 days?

Thanks,
-Drew

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Recommended SSDs for Ceph

2022-09-29 Thread Drew Weaver
Hello,

We had been using Intel SSD D3 S4610/20 SSDs but Solidigm is... having 
problems. Bottom line is they haven't shipped an order in a year.

Does anyone have any recommendations on SATA SSDs that have a fairly good mix 
of performance/endurance/cost?

I know that they should all just work because they are just block devices, but 
I'm specifically wondering about better endurance.

Thanks,
-Drew

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW performance as a Veeam capacity tier

2021-09-30 Thread Drew Weaver
Just an update for anyone that sees this: it looks like Veeam doesn't index its 
content very well, so when it offloads data the result is random IO, which means 
the IOPS and throughput are not great and you really need to overbuild your 
volumes (RAID) on your Veeam server to get any kind of performance out of it. 
So on a 4-disk RAID-10 you get about 30MB/s when offloading.
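
If anyone wants to size their local repository volume before pointing Veeam at 
it, a hedged fio approximation of that offload pattern (path, block size and 
queue depth are assumptions, not measured Veeam behaviour):

fio --name=veeam-offload-sim --filename=/mnt/veeam-repo/fio.tmp --size=20G \
    --ioengine=libaio --direct=1 --rw=randwrite --bs=512k --iodepth=16 \
    --runtime=120 --time_based --group_reporting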



-Original Message-
From: Konstantin Shalygin  
Sent: Saturday, July 10, 2021 10:28 AM
To: Nathan Fish 
Cc: Drew Weaver ; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: RGW performance as a Veeam capacity tier

Veeam normally produced 2-4Gbit/s to S3 in our case


k

Sent from my iPhone

> On 10 Jul 2021, at 08:36, Nathan Fish  wrote:
> 
> No, that's pretty slow, you should get at least 10x that for 
> sequential writes. Sounds like Veeam is doing a lot of sync random 
> writes. If you are able to add a bit of SSD (preferably NVMe) for 
> journaling, that can help random IO a lot. Alternatively, look into IO 
> settings for Veeam.
> 
> For reference, we have ~100 drives with size=3, and get ~3 GiB/s 
> sequential with the right benchmark tuning.
> 
>> On Fri, Jul 9, 2021 at 1:59 PM Drew Weaver  wrote:
>> 
>> Greetings.
>> 
>> I've begun testing using Ceph 14.2.9 as a capacity tier for a scale out 
>> backup repository in Veeam 11.
>> 
>> The backup host and the RGW server are connected directly at 10Gbps.
>> 
>> It would appear that the maximum throughput that Veeam is able to achieve 
>> while archiving data to this cluster is about 24MB/sec.
>> 
>> client:   156 KiB/s rd, 24 MiB/s wr, 156 op/s rd, 385 op/s wr
>> 
>> The cluster has 6 OSD hosts with a total of 48 4TB SATA drives.
>> 
>> Does that performance sound about right for 48 4TB SATA drives /w 10G 
>> networking?
>> 
>> Thanks,
>> -Drew
>> 
>> 
>> 
>> 
>> 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
>> email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
> email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Migrating CEPH OS looking for suggestions

2021-09-30 Thread Drew Weaver
Hi,

I am going to migrate our ceph cluster to a new OS and I am trying to choose 
the right one so that I won't have to replace it again when python4 becomes a 
requirement mid-cycle [or whatever].

Has anyone seen any recommendations from the devs as to what distro they are 
targeting for, let's say, the next 5 years?

Thanks,
-Drew

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RGW performance as a Veeam capacity tier

2021-07-09 Thread Drew Weaver
Greetings.

I've begun testing using Ceph 14.2.9 as a capacity tier for a scale out backup 
repository in Veeam 11.

The backup host and the RGW server are connected directly at 10Gbps.

It would appear that the maximum throughput that Veeam is able to achieve while 
archiving data to this cluster is about 24MB/sec.

client:   156 KiB/s rd, 24 MiB/s wr, 156 op/s rd, 385 op/s wr

The cluster has 6 OSD hosts with a total of 48 4TB SATA drives.

Does that performance sound about right for 48 4TB SATA drives /w 10G 
networking?
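
One hedged way to separate Veeam/RGW behaviour from raw cluster capability is a 
short rados bench baseline (pool name assumed to be the default RGW data pool; 
ideally use a scratch pool instead of the live one):

rados bench -p default.rgw.buckets.data 30 write -b 4194304 -t 16 --no-cleanup
rados bench -p default.rgw.buckets.data 30 seq -t 16
rados -p default.rgw.buckets.data cleanup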

Thanks,
-Drew





___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Rolling upgrade model to new OS

2021-06-04 Thread Drew Weaver
Hello,

I need to upgrade the OS that our Ceph cluster is running on to support new 
versions of Ceph.

Has anyone devised a model for how you handle this?

Do you just:

Install some new nodes with the new OS
Install the old version of Ceph on the new nodes
Add those nodes/osds to the cluster
Remove the old nodes
Upgrade Ceph on the new nodes
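
A rough, hedged sketch of the drain-and-remove half of that list (OSD id and 
hostname are placeholders; assumes Luminous or newer for `osd purge`):

ceph osd crush reweight osd.7 0              # drain the old OSD gradually
# wait until all PGs are active+clean, then:
ceph osd out 7
systemctl stop ceph-osd@7                    # on the old node
ceph osd purge 7 --yes-i-really-mean-it
ceph osd crush remove <old-hostname>         # once the host bucket is empty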

Are there any specific OS that Ceph has said that will have longer future 
version support? Would like to only touch the OS every 3-4 years if possible.

Thanks,
-Drew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Replacing disk with xfs on it, documentation?

2021-03-09 Thread Drew Weaver
Hello,

I haven't needed to replace a disk in a while and it seems that I have misplaced 
my quick little guide on how to do it.

When searching the docs, the recommendation is now to use ceph-volume to create 
OSDs; when doing that it creates LVs:

Disk /dev/sde: 4000.2 GB, 4000225165312 bytes, 7812939776 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/mapper/ceph--34b5a0a9--f84a--416f--8b74--fb1e05161f80-osd--block--4581caf4--eef0--42e1--b237--c114dfde3d15: 4000.2 GB, 4000220971008 bytes, 7812931584 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Cool but all of my other OSDs look like this and appear to just be XFS.

Disk /dev/sdd: 4000.2 GB, 4000225165312 bytes, 7812939776 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt
Disk identifier: D0195044-10B5-4113-8210-A5CFCB9213A2


# Start  EndSize  TypeName
1 2048   206847100M  Ceph OSDceph data
2   206848   78129397423.7T  unknown ceph block

35075 ?S< 0:00 [xfs-buf/sdd1]
  35076 ?S< 0:00 [xfs-data/sdd1]
  35077 ?S< 0:00 [xfs-conv/sdd1]
  35078 ?S< 0:00 [xfs-cil/sdd1]
  35079 ?S< 0:00 [xfs-reclaim/sdd]
  35080 ?S< 0:00 [xfs-log/sdd1]
  35082 ?S  0:00 [xfsaild/sdd1]

I can't seem to find the instructions in order to create an OSD like the 
original ones anymore.

I'm pretty sure that when this cluster was set up it was done using the:

ceph-deploy osd prepare
ceph-deploy osd create
ceph-deploy osd activate

commands, but it seems as though the prepare and activate commands have since 
been removed from ceph-deploy, so I am really confused. =)

Does anyone by chance have the instructions for replacing a failed drive when 
it meets the above criteria?
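
For what it's worth, the prepare/activate steps were folded into ceph-volume on 
the hosts themselves; a hedged sketch of a like-for-like replacement with 
ceph-volume (OSD id and device are placeholders):

ceph osd out 12
systemctl stop ceph-osd@12
ceph osd purge 12 --yes-i-really-mean-it     # or "ceph osd destroy 12" if you want to re-use the id
ceph-volume lvm zap /dev/sde --destroy       # wipe the replacement disk
ceph-volume lvm create --data /dev/sde       # builds the LV, the BlueStore OSD, and starts it
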
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Resolving LARGE_OMAP_OBJECTS

2021-03-05 Thread Drew Weaver
Sorry for multi-reply, I got that command to run:

for obj in $(rados -p default.rgw.buckets.index ls | grep 
2b67ef7c-2015-4ca0-bf50-b7595d01e46e.74194.637); do printf "%-60s %7d\n" $obj 
$(rados -p default.rgw.buckets.index listomapkeys $obj | wc -l); done;
.dir.2b67ef7c-2015-4ca0-bf50-b7595d01e46e.74194.637.0  10423
.dir.2b67ef7c-2015-4ca0-bf50-b7595d01e46e.74194.637.15 10445
.dir.2b67ef7c-2015-4ca0-bf50-b7595d01e46e.74194.637.3  10542
.dir.2b67ef7c-2015-4ca0-bf50-b7595d01e46e.74194.637.6  10511
.dir.2b67ef7c-2015-4ca0-bf50-b7595d01e46e.74194.637.13 10414
.dir.2b67ef7c-2015-4ca0-bf50-b7595d01e46e.74194.637.12 10479
.dir.2b67ef7c-2015-4ca0-bf50-b7595d01e46e.74194.637.2  10486
.dir.2b67ef7c-2015-4ca0-bf50-b7595d01e46e.74194.637.5  10448
.dir.2b67ef7c-2015-4ca0-bf50-b7595d01e46e.74194.637.4  10470
.dir.2b67ef7c-2015-4ca0-bf50-b7595d01e46e.74194.637.8  10474
.dir.2b67ef7c-2015-4ca0-bf50-b7595d01e46e.74194.637.1  10470
.dir.2b67ef7c-2015-4ca0-bf50-b7595d01e46e.74194.637.10 10411
.dir.2b67ef7c-2015-4ca0-bf50-b7595d01e46e.74194.637.7  10445
.dir.2b67ef7c-2015-4ca0-bf50-b7595d01e46e.74194.637.14 10413
.dir.2b67ef7c-2015-4ca0-bf50-b7595d01e46e.74194.637.9  10356
.dir.2b67ef7c-2015-4ca0-bf50-b7595d01e46e.74194.637.11 10410

-Original Message-
From: Benoît Knecht  
Sent: Friday, March 5, 2021 12:00 PM
To: Drew Weaver 
Cc: 'ceph-users@ceph.io' 
Subject: RE: [ceph-users] Resolving LARGE_OMAP_OBJECTS

On Friday, March 5th, 2021 at 15:20, Drew Weaver  wrote:
> Sorry to sound clueless but no matter what I search for on El Goog I can't 
> figure out how to answer the question as to whether dynamic sharding is 
> enabled in our environment.
>
> It's not configured as true in the config files, but it is the default.
>
> Is there a radosgw-admin command to determine whether or not it's enabled in 
> the running environment?

If `rgw_dynamic_resharding` is not explicitly set to `false` in your 
environment, I think we can assume dynamic resharding is enabled. And if any of 
your buckets have more than one shard and you didn't reshard them manually, 
you'll know for sure dynamic resharding is working; you can check the number of 
shards on a bucket with `radosgw-admin bucket stats --bucket=`, there's a 
`num_shards` field. You can also check with `radosgw-admin bucket limit check` 
if any of your buckets are about to be resharded.
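
Concretely, those checks look something like this (bucket name is a placeholder, 
and the daemon-socket query is an assumption about where your RGW admin socket lives):

radosgw-admin bucket stats --bucket=mybucket | grep -E '"id"|"num_shards"'
radosgw-admin bucket limit check
ceph daemon client.rgw.<name> config get rgw_dynamic_resharding   # run on the RGW host; answers the "is it on?" question directly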

Assuming dynamic resharding is enabled and none of your buckets are about to be 
resharded, I would then find out which object has too many OMAP keys by 
grepping the logs. The name of the object will contain the bucket ID (also 
found in the output of `radosgw-admin bucket stats`), so you'll know which 
bucket is causing the issue. And you can check how many OMAP keys are in each 
shard of that bucket index using

```
for obj in $(rados -p default.rgw.buckets.index ls | grep 
eaf0ece5-9f4a-4aa8-9d67-8c6698f7919b.88726492.4); do
  printf "%-60s %7d\n" $obj $(rados -p default.rgw.buckets.index listomapkeys 
$obj | wc -l) done ```

(where `eaf0ece5-9f4a-4aa8-9d67-8c6698f7919b.88726492.4` is your bucket ID). If 
the number of keys is very uneven amongst the shards, there's probably an 
issue that needs to be addressed. If they are relatively even but slightly 
above the warning threshold, it's probably a versioned bucket, and it should be 
safe to simply increase the threshold.
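
If it does turn out to be the even-but-slightly-over case, the threshold is (as 
far as I know) `osd_deep_scrub_large_omap_object_key_threshold`, default 200000 
keys; raising it would look roughly like:

ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 300000
ceph config get osd osd_deep_scrub_large_omap_object_key_threshold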

Cheers,

--
Ben
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Resolving LARGE_OMAP_OBJECTS

2021-03-05 Thread Drew Weaver
Hi,

Only 2 of the buckets are really used:

"buckets": [
{
"bucket": "test",
"tenant": "",
"num_objects": 968107,
"num_shards": 16,
"objects_per_shard": 60506,
"fill_status": "OK"
}
]

"buckets": [
{
"bucket": "prototype",
"tenant": "",
"num_objects": 4633533,
"num_shards": 98,
"objects_per_shard": 47280,
"fill_status": "OK"
}
]

In the log it shows this:

cluster [WRN] Large omap object found. Object: 
7:3019f91b:::.dir.2b67ef7c-2015-4ca0-bf50-b7595d01e46e.74194.213.0:head PG: 
7.d89f980c (7.c) Key count: 206795 Size (bytes): 46499042

pool id 7 is default.rgw.buckets.index

The entry in the log doesn't match any of the bucket ids but this one exists: 
"id": "2b67ef7c-2015-4ca0-bf50-b7595d01e46e.74194.637",

Thanks to a RedHat KB article that mentioned checking for stale instances I ran 
this:

# radosgw-admin reshard stale-instances list | wc -l
713
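
(Hedged side note: on a single-site setup those stale instances can reportedly 
be cleaned up with the companion command below; the docs warn against running 
it on multisite deployments.)

radosgw-admin reshard stale-instances rm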

This command:

```
for obj in $(rados -p default.rgw.buckets.index ls | grep 2b67ef7c-2015-4ca0-bf50-b7595d01e46e.74194.637); do
  printf "%-60s %7d\n" $obj $(rados -p default.rgw.buckets.index listomapkeys $obj | wc -l)
done
```

returns this:

-bash: command substitution: line 4: syntax error: unexpected end of file

I figured perhaps you were using ``` to denote code so I tried running it 
without that and also on one line and neither of those did the trick.

Is that just bash?

Thanks so much for all of your help thus far.

-Drew

-----Original Message-
From: Benoît Knecht  
Sent: Friday, March 5, 2021 12:00 PM
To: Drew Weaver 
Cc: 'ceph-users@ceph.io' 
Subject: RE: [ceph-users] Resolving LARGE_OMAP_OBJECTS

On Friday, March 5th, 2021 at 15:20, Drew Weaver  wrote:
> Sorry to sound clueless but no matter what I search for on El Goog I can't 
> figure out how to answer the question as to whether dynamic sharding is 
> enabled in our environment.
>
> It's not configured as true in the config files, but it is the default.
>
> Is there a radosgw-admin command to determine whether or not it's enabled in 
> the running environment?

If `rgw_dynamic_resharding` is not explicitly set to `false` in your 
environment, I think we can assume dynamic resharding is enabled. And if any of 
your buckets have more than one shard and you didn't reshard them manually, 
you'll know for sure dynamic resharding is working; you can check the number of 
shards on a bucket with `radosgw-admin bucket stats --bucket=`, there's a 
`num_shards` field. You can also check with `radosgw-admin bucket limit check` 
if any of your buckets are about to be resharded.

Assuming dynamic resharding is enabled and none of your buckets are about to be 
resharded, I would then find out which object has too many OMAP keys by 
grepping the logs. The name of the object will contain the bucket ID (also 
found in the output of `radosgw-admin bucket stats`), so you'll know which 
bucket is causing the issue. And you can check how many OMAP keys are in each 
shard of that bucket index using

```
for obj in $(rados -p default.rgw.buckets.index ls | grep eaf0ece5-9f4a-4aa8-9d67-8c6698f7919b.88726492.4); do
  printf "%-60s %7d\n" $obj $(rados -p default.rgw.buckets.index listomapkeys $obj | wc -l)
done
```

(where `eaf0ece5-9f4a-4aa8-9d67-8c6698f7919b.88726492.4` is your bucket ID). If 
the number of keys is very uneven amongst the shards, there's probably an 
issue that needs to be addressed. If they are relatively even but slightly 
above the warning threshold, it's probably a versioned bucket, and it should be 
safe to simply increase the threshold.

Cheers,

--
Ben
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Resolving LARGE_OMAP_OBJECTS

2021-03-05 Thread Drew Weaver
Sorry to sound clueless but no matter what I search for on El Goog I can't 
figure out how to answer the question as to whether dynamic sharding is enabled 
in our environment.

It's not configured as true in the config files, but it is the default.

Is there a radosgw-admin command to determine whether or not it's enabled in 
the running environment?

Thanks,
-Drew

-Original Message-
From: Benoît Knecht  
Sent: Thursday, March 4, 2021 11:46 AM
To: Drew Weaver 
Cc: 'ceph-users@ceph.io' 
Subject: Re: [ceph-users] Resolving LARGE_OMAP_OBJECTS

Hi Drew,

On Thursday, March 4th, 2021 at 15:18, Drew Weaver  
wrote:
> Howdy, the dashboard on our cluster keeps showing LARGE_OMAP_OBJECTS.
>
> I went through this document
>
> https://www.suse.com/support/kb/doc/?id=19698
>
> I've found that we have a total of 5 buckets, each one is owned by a 
> different user.

Do you have dynamic sharding enabled? If so, hitting the large OMAP object 
threshold is a bit suspicious, as resharding should keep each shard below the 
threshold.

Did you look into your logs to find out which object is affected, and whether 
it's the number of keys or its size triggering the warning (you can grep for 
"Large omap object found" in /var/log/ceph/ceph.log)?

From the name of the object, you can figure out which bucket is affected, and 
you can look for it in the default.rgw.buckets.index pool and see if the index 
keys are evenly distributed among the shards.

--
Ben
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Resolving LARGE_OMAP_OBJECTS

2021-03-04 Thread Drew Weaver
Howdy, the dashboard on our cluster keeps showing LARGE_OMAP_OBJECTS.

I went through this document

https://www.suse.com/support/kb/doc/?id=19698

I've found that we have a total of 5 buckets, each one is owned by a different 
user.

From what I have read on this issue, it seems to flip-flop between "this is an 
actual problem that will cause real-world issues" and "we just raised the limit 
in the next version".

Does anyone have any expertise on whether this is an actual problem or if we 
should just tune the numbers and how do you determine that?

One other quick question: is there a way to add usage information for buckets 
into mgr for version 14?
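
On the bucket-usage-in-mgr question: in the Nautilus dashboard, bucket details 
show up once the dashboard is handed RGW API credentials; roughly the following, 
though the exact command names should be checked against your minor release:

radosgw-admin user create --uid=dashboard --display-name=dashboard --system
ceph dashboard set-rgw-api-access-key <access_key_from_the_user_above>
ceph dashboard set-rgw-api-secret-key <secret_key_from_the_user_above>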

Thanks,
-Drew





___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Questions RE: Ceph/CentOS/IBM

2021-03-03 Thread Drew Weaver
> As I understand it right now Ceph 14 is the last version that will run on 
> CentOS/EL7 but CentOS8 was "killed off".

>This is wrong. Ceph 15 runs on CentOS 7 just fine, but without the dashboard.

Oh, what I should have said is that I want it to be fully functional.




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Questions RE: Ceph/CentOS/IBM

2021-03-03 Thread Drew Weaver
Howdy,

After the IBM acquisition of RedHat the landscape for CentOS quickly changed.

As I understand it right now Ceph 14 is the last version that will run on 
CentOS/EL7 but CentOS8 was "killed off".

So given that, if you were going to build a Ceph cluster today would you even 
bother doing it using a non-commercial distribution or would you just use RHEL 
8 (or even their commercial Ceph product).

Secondly, are we expecting IBM to "kill off" Ceph as well?

Thanks,
-Drew

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Trying to upgrade to octopus removes current version of ceph release and tries to install older version...

2020-06-08 Thread Drew Weaver
Never mind, I didn't see that Octopus isn't really supported on C7, so I'll just 
stick with what I have until I want to upgrade to C8.

Thanks,
-Drew

-Original Message-
From: Drew Weaver  
Sent: Monday, June 8, 2020 1:38 PM
To: 'ceph-users@ceph.io' 
Subject: [ceph-users] Trying to upgrade to octopus removes current version of 
ceph release and tries to install older version...

Hi, the cluster is version 14.2.9.

ceph-deploy v2.0.1
using command ceph-deploy install --release octopus mon0 mon1 mon2

result is this command being run:
sudo yum remove -y ceph-release
which removes this package:
ceph-release    noarch    1-1.el7    @/ceph-release-1-0.el7.noarch
Then it tries to install this older version:
Running command: sudo yum install -y 
https://download.ceph.com/rpm-octopus/el7/noarch/ceph-release-1-0.el7.noarch.rpm

Which does not actually exist.

Am I doing this wrong?



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Trying to upgrade to octopus removes current version of ceph release and tries to install older version...

2020-06-08 Thread Drew Weaver
Hi, the cluster is version 14.2.9.

ceph-deploy v2.0.1
using command ceph-deploy install --release octopus mon0 mon1 mon2

result is this command being run:
sudo yum remove -y ceph-release
which removes this package:
ceph-release    noarch    1-1.el7    @/ceph-release-1-0.el7.noarch
Then it tries to install this older version:
Running command: sudo yum install -y 
https://download.ceph.com/rpm-octopus/el7/noarch/ceph-release-1-0.el7.noarch.rpm

Which does not actually exist.

Am I doing this wrong?



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Choosing suitable SSD for Ceph cluster

2019-10-25 Thread Drew Weaver
Not related to the original topic but the Micron case in that article is 
fascinating and a little surprising.

With pretty much best in class hardware in a lab environment:

Potential 25,899,072 4KiB random write IOPS goes to 477K
Potential 23,826,216 4KiB random read IOPS goes to 2,000,000

477K write IOPS and 2M read IOPS isn't terrible, especially given there is 
replication, but the overhead when you look at the numbers is still staggering.

Thanks for sharing this article.
-Drew

-Original Message-
From: Vitaliy Filippov  
Sent: Thursday, October 24, 2019 6:32 PM
To: ceph-us...@ceph.com; Hermann Himmelbauer 
Subject: [ceph-users] Re: Choosing suitable SSD for Ceph cluster

It's easy:

https://yourcmc.ru/wiki/Ceph_performance

> Hi,
> I am running a nice ceph (proxmox 4 / debian-8 / ceph 0.94.3) cluster 
> on
> 3 nodes (supermicro X8DTT-HIBQF), 2 OSD each (2TB SATA harddisks), 
> interconnected via Infiniband 40.
>
> Problem is that the ceph performance is quite bad (approx. 30MiB/s 
> reading, 3-4 MiB/s writing ), so I thought about plugging into each 
> node a PCIe to NVMe/M.2 adapter and install SSD harddisks. The idea is 
> to have a faster ceph storage and also some storage extension.
>
> The question is now which SSDs I should use. If I understand it right, 
> not every SSD is suitable for ceph, as is denoted at the links below:
>
> https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-
> ssd-is-suitable-as-a-journal-device/
> or here:
> https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark
>
> In the first link, the Samsung SSD 950 PRO 512GB NVMe is listed as a 
> fast SSD for ceph. As the 950 is not available anymore, I ordered a 
> Samsung 970 1TB for testing, unfortunately, the "EVO" instead of PRO.
>
> Before equipping all nodes with these SSDs, I did some tests with "fio"
> as recommended, e.g. like this:
>
> fio --filename=/dev/DEVICE --direct=1 --sync=1 --rw=write --bs=4k
> --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting 
> --name=journal-test
>
> The results are as the following:
>
> ---
> 1) Samsung 970 EVO NVMe M.2 mit PCIe Adapter
> Jobs: 1:
> read : io=26706MB, bw=445MiB/s, iops=113945, runt= 60001msec
> write: io=252576KB, bw=4.1MiB/s, iops=1052, runt= 60001msec
>
> Jobs: 4:
> read : io=21805MB, bw=432.7MiB/s, iops=93034, runt= 60001msec
> write: io=422204KB, bw=6.8MiB/s, iops=1759, runt= 60002msec
>
> Jobs: 10:
> read : io=26921MB, bw=448MiB/s, iops=114859, runt= 60001msec
> write: io=435644KB, bw=7MiB/s, iops=1815, runt= 60004msec
> ---
>
> So the read speed is impressive, but the write speed is really bad.
>
> Therefore I ordered the Samsung 970 PRO (1TB) as it has faster NAND 
> chips (MLC instead of TLC). The results are, however even worse for
> writing:
>
> ---
> Samsung 970 PRO NVMe M.2 mit PCIe Adapter
> Jobs: 1:
> read : io=15570MB, bw=259.4MiB/s, iops=66430, runt= 60001msec
> write: io=199436KB, bw=3.2MiB/s, iops=830, runt= 60001msec
>
> Jobs: 4:
> read : io=48982MB, bw=816.3MiB/s, iops=208986, runt= 60001msec
> write: io=327800KB, bw=5.3MiB/s, iops=1365, runt= 60002msec
>
> Jobs: 10:
> read : io=91753MB, bw=1529.3MiB/s, iops=391474, runt= 60001msec
> write: io=343368KB, bw=5.6MiB/s, iops=1430, runt= 60005msec
> ---
>
> I did some research and found out, that the "--sync" flag sets the 
> flag "O_DSYNC" which seems to disable the SSD cache which leads to 
> these horrid write speeds.
>
> It seems that this relates to the fact that the write cache is only 
> not disabled for SSDs which implement some kind of battery buffer that 
> guarantees a data flush to the flash in case of a powerloss.
>
> However, It seems impossible to find out which SSDs do have this 
> powerloss protection, moreover, these enterprise SSDs are crazy 
> expensive compared to the SSDs above - moreover it's unclear if 
> powerloss protection is even available in the NVMe form factor. So 
> building a 1 or 2 TB cluster seems not really affordable/viable.
>
> So, can please anyone give me hints what to do? Is it possible to 
> ensure that the write cache is not disabled in some way (my server is 
> situated in a data center, so there will probably never be loss of power).
>
> Or is the link above already outdated as newer ceph releases somehow 
> deal with this problem? Or maybe a later Debian release (10) will 
> handle the O_DSYNC flag differently?
>
> Perhaps I should simply invest in faster (and bigger) harddisks and 
> forget the SSD-cluster idea?
>
> Thank you in advance for any help,
>
> Best Regards,
> Hermann


--
With best regards,
   Vitaliy Filippov
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: iSCSI write performance

2019-10-24 Thread Drew Weaver
I was told by someone at Red Hat that iSCSI performance is still several orders 
of magnitude behind using the client/driver.

Thanks,
-Drew


-Original Message-
From: Nathan Fish  
Sent: Thursday, October 24, 2019 1:27 PM
To: Ryan 
Cc: ceph-users 
Subject: [ceph-users] Re: iSCSI write performance

Are you using Erasure Coding or replication? What is your crush rule?
What SSDs and CPUs? Does each OSD use 100% of a core or more when writing?

On Thu, Oct 24, 2019 at 1:22 PM Ryan  wrote:
>
> I'm in the process of testing the iscsi target feature of ceph. The cluster 
> is running ceph 14.2.4 and ceph-iscsi 3.3. It consists of 5 hosts with 12 SSD 
> OSDs per host. Some basic testing moving VMs to a ceph backed datastore is 
> only showing 60MB/s transfers. However moving these back off the datastore is 
> fast at 200-300MB/s.
>
> What should I be looking at to track down the write performance issue? In 
> comparison with the Nimble Storage arrays I can see 200-300MB/s in both 
> directions.
>
> Thanks,
> Ryan
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
> email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io