> Samsung EVO...
> Which exact model, I presume this is not a DC one?
>
> If you had put your journals on those, you would already be pulling your hair
> out due to abysmal performance.
>
> Also with Evo ones, I'd be worried about endurance.

No, I am using the P3700DCs for journals.  The Samsungs are the 850 EVO 2TB 
(MZ-75E2T0BW), chosen primarily on price.  We already built a system using the 
1TB models with Solaris+ZFS and I have little faith in them; their write 
performance is certainly erratic and not ideal.  We have other vendor options 
for what they call "Enterprise Value" SSDs, but those are still 4x the price.  
I would prefer a higher-grade drive, but unfortunately cost is being driven 
from above me.
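
If it helps, the quick check I've seen recommended for journal suitability is a 
single-job O_DSYNC write test with fio, which approximates the journal write 
pattern. A rough sketch (the device path is just a placeholder, and it will 
overwrite whatever is on that device):

    # 4k sequential writes at queue depth 1, with O_DIRECT + O_DSYNC so every
    # write has to hit media. DC-class drives tend to stay flat here, while
    # consumer drives tend to show exactly the erratic behaviour described above.
    # WARNING: destructive - point it at a scratch device only.
    fio --name=journal-test --filename=/dev/sdX \
        --direct=1 --sync=1 --rw=write --bs=4k \
        --numjobs=1 --iodepth=1 --runtime=60 --time_based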

> > On the ceph side each disk in the OSD servers are setup as an individual
> > OSD, with a 12G journal created on the flash mirror.   I setup the SSD
> > servers into one root, and the SATA servers into another and created
> > pools using hosts as fault boundaries, with the pools set for 2
> > copies.
> Risky. If you have very reliable and well monitored SSDs you can get away
> with 2 (I do so), but with HDDs and the combination of their reliability and
> recovery time it's asking for trouble.
> I realize that this is a testbed, but if your production has a replication of
> 3 you will be disappointed by the additional latency.

Again, cost - the end goal is to build metro-based dual-site pools with 2+2 
replication.  I am aware of the risks, but even now, presenting numbers based 
on buying 4x the disk we are actually able to use gets questioned hard.
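
For reference, the split described above amounts to something like the 
following (host, root and pool names here are illustrative only, not the real 
ones):

    # Separate CRUSH roots for the SSD and SATA hosts
    ceph osd crush add-bucket ssd-root root
    ceph osd crush add-bucket sata-root root
    ceph osd crush move ssd-host1 root=ssd-root
    ceph osd crush move sata-host1 root=sata-root

    # Rules under each root with host as the failure domain
    ceph osd crush rule create-simple ssd-rule ssd-root host
    ceph osd crush rule create-simple sata-rule sata-root host

    # Pools on each rule, set to 2 copies
    ceph osd pool create ssd-pool 1024 1024 replicated ssd-rule
    ceph osd pool set ssd-pool size 2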

> This smells like garbage collection on your SSDs, especially since it matches
> time wise what you saw on them below.

I concur.  I am just not sure why that impacts the client, since from the 
client's perspective the journal should hide it.  If the journal were 
struggling to keep up and had to flush constantly then perhaps, but at the 
steady-state IO rate I am currently testing with I don't think the journal 
should be that saturated.
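
One thing I still need to do is confirm whether the journal is actually the 
choke point. A rough way to check (osd.0 is just an example), plus the knobs 
that bound how much the journal can absorb before the filestore sync starts 
stalling client IO - values shown are purely illustrative, not recommendations:

    # Watch journal-related counters on an OSD while the spikes occur
    ceph daemon osd.0 perf dump | grep -i journal

    # Relevant [osd] settings in ceph.conf (illustrative values only):
    #   filestore_min_sync_interval = 0.1
    #   filestore_max_sync_interval = 10
    #   journal_max_write_bytes     = 1073741824
    #   journal_max_write_entries   = 10000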

> Have you tried the HDD based pool and did you see similar, consistent
> interval, spikes?

To be honest I have been focusing on the SSD numbers, but that would be a good 
comparison.
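
The simplest apples-to-apples check would probably be a timed rados bench run 
against both pools, watching for the same periodic latency spikes (pool names 
are placeholders):

    # Small writes for 5 minutes against each pool; compare the per-second
    # latency output for periodic spikes.
    rados bench -p ssd-pool 300 write -b 4096 -t 16 --no-cleanup
    rados bench -p sata-pool 300 write -b 4096 -t 16 --no-cleanup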

> Or alternatively, configured 2 of your NVMEs as OSDs?

That was what I was thinking of doing - move the NVMes to the frontends, make 
them OSDs and configure them as a read-forward cache tier for the other pools, 
and just have the SSD and SATA OSDs journal on a first partition by default.
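
If I go that way, my understanding is the tiering itself boils down to 
something like the following (pool names are hypothetical, and the hit set and 
sizing values would need real thought):

    # Put the NVMe pool in front of the SSD pool as a read-forward cache tier
    ceph osd tier add ssd-pool nvme-cache
    ceph osd tier cache-mode nvme-cache readforward
    ceph osd tier set-overlay ssd-pool nvme-cache

    # The cache tier needs a hit set and size limits before it behaves sanely
    ceph osd pool set nvme-cache hit_set_type bloom
    ceph osd pool set nvme-cache hit_set_count 4
    ceph osd pool set nvme-cache hit_set_period 1200
    ceph osd pool set nvme-cache target_max_bytes 200000000000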

> No, not really. The journal can only buffer so much.
> There are several threads about this in the archives.
>
> You could tune it, but that will only go so far if your backing storage can't
> keep up.
>
> Regards,
>
> Christian


Agreed - Thanks for your help.