Re: [ceph-users] RFC Bluestore-Cluster of SAMSUNG PM863a

2018-02-02 Thread Kevin Olbrich
2018-02-02 12:44 GMT+01:00 Richard Hesketh:

> On 02/02/18 08:33, Kevin Olbrich wrote:
> > Hi!
> >
> > I am planning a new flash-based cluster. In the past we used SAMSUNG
> > PM863a 480G as journal drives in our HDD cluster.
> > After a lot of tests with Luminous and BlueStore on HDD clusters, we
> > plan to re-deploy our whole RBD pool (OpenNebula cloud) using these disks.
> >
> > As far as I understand, it would be best to skip journaling / WAL and
> > just deploy every OSD 1-by-1. This would have the following pros (correct
> > me if I am wrong):
> > - maximum performance, as the journal is spread across all devices
> > - a lost drive does not affect any other drive
> >
> > Currently we are on CentOS 7 with an ELRepo 4.4.x kernel. We plan to
> > migrate to Ubuntu 16.04.3 with HWE (kernel 4.10).
> > Clients will be Fedora 27 + OpenNebula.
> >
> > Any comments?
> >
> > Thank you.
> >
> > Kind regards,
> > Kevin
>
> There is only a real advantage to separating the DB/WAL from the main data
> if they're going to be hosted on a device which is appreciably faster than
> the main storage. Since you're going all SSD, it makes sense to deploy each
> OSD all-in-one; as you say, you don't bottleneck on any one disk, and it
> also offers you more maintenance flexibility as you will be able to easily
> move OSDs between hosts if required. If you wanted to start pushing
> performance more, you'd be looking at putting NVMe disks in your hosts for
> DB/WAL.
>

We do have some Intel P3700 NVMe (PCIe) disks, but each host will be serving 10
OSDs, and the combined sync write speed of the Samsungs was better than that of
the single NVMe (we only ran some short fio benchmarks, no real Ceph test, so
this could look different now).
If the performance gain is only slight, sticking to a single-OSD failure domain
is better for maintenance, as this new cluster will not be monitored 24/7 by
our staff while the migration is in progress.
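
By "sync speed" I mean a single-threaded O_DSYNC 4k write test, roughly along
these lines (an illustrative sketch only, not our exact job file; /dev/sdX is a
placeholder and the test overwrites the device):

  # WARNING: writes directly to the raw device and destroys its data
  fio --name=sync-write --filename=/dev/sdX --direct=1 --sync=1 \
      --rw=write --bs=4k --numjobs=1 --iodepth=1 \
      --runtime=60 --time_based --group_reporting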


> FYI, the 16.04 HWE kernel has currently rolled on over to 4.13.
>

Has anyone tested this kernel branch with Ceph? Any performance impact? If I
understood the docs correctly, Ubuntu is a well-tested platform for Ceph, so
this should already have been tested (?).


> May I ask why you are using the ELRepo kernel with CentOS?
> AFAIK, Red Hat is backporting all Ceph features to its 3.10 kernels. Am I
> wrong?
>

Before we moved from OpenStack to OpenNebula in early 2017, we had some
problems with krbd / fuse (missing features, etc.).
We then decided to move from 3.10 to 4.4, which solved all of these problems,
and we also noticed a small performance improvement.
Maybe these problems have been fixed in the meantime; we hit them when we
rolled out Mitaka.
We have not changed our deployment scripts since then, which is why we are
still on kernel-ml.
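
For anyone hitting the same thing: older krbd simply refuses to map images that
use features it does not support. The usual workaround, as far as I know, is to
disable those features on the image, e.g. (pool/image names are placeholders,
and which features need to go depends on the kernel):

  rbd feature disable rbd/myimage object-map fast-diff deep-flatten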

Kind regards,
Kevin


Re: [ceph-users] RFC Bluestore-Cluster of SAMSUNG PM863a

2018-02-02 Thread Serkan Çoban
May I ask why you are using the ELRepo kernel with CentOS?
AFAIK, Red Hat is backporting all Ceph features to its 3.10 kernels. Am I wrong?



Re: [ceph-users] RFC Bluestore-Cluster of SAMSUNG PM863a

2018-02-02 Thread Richard Hesketh
On 02/02/18 08:33, Kevin Olbrich wrote:
> Hi!
> 
> I am planning a new flash-based cluster. In the past we used SAMSUNG PM863a
> 480G as journal drives in our HDD cluster.
> After a lot of tests with Luminous and BlueStore on HDD clusters, we plan to
> re-deploy our whole RBD pool (OpenNebula cloud) using these disks.
>
> As far as I understand, it would be best to skip journaling / WAL and just
> deploy every OSD 1-by-1. This would have the following pros (correct me if
> I am wrong):
> - maximum performance, as the journal is spread across all devices
> - a lost drive does not affect any other drive
>
> Currently we are on CentOS 7 with an ELRepo 4.4.x kernel. We plan to migrate to
> Ubuntu 16.04.3 with HWE (kernel 4.10).
> Clients will be Fedora 27 + OpenNebula.
> 
> Any comments?
> 
> Thank you.
> 
> Kind regards,
> Kevin

There is only a real advantage to separating the DB/WAL from the main data if 
they're going to be hosted on a device which is appreciably faster than the 
main storage. Since you're going all SSD, it makes sense to deploy each OSD 
all-in-one; as you say, you don't bottleneck on any one disk, and it also 
offers you more maintenance flexibility as you will be able to easily move OSDs 
between hosts if required. If you wanted to start pushing performance more, 
you'd be looking at putting NVMe disks in your hosts for DB/WAL.
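
If you did go down that route later, a rough sketch of what it looks like per
OSD with Luminous ceph-volume (device names are placeholders, and the NVMe
would need to be partitioned or split into LVs first) would be something like:

  ceph-volume lvm create --bluestore --data /dev/sdX --block.db /dev/nvme0n1pY

You can also split out the WAL with --block.wal, but if DB and WAL would share
the same fast device there is usually no need to.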

FYI, the 16.04 HWE kernel has currently rolled on over to 4.13.

Rich





[ceph-users] RFC Bluestore-Cluster of SAMSUNG PM863a

2018-02-02 Thread Kevin Olbrich
Hi!

I am planning a new flash-based cluster. In the past we used SAMSUNG PM863a
480G as journal drives in our HDD cluster.
After a lot of tests with Luminous and BlueStore on HDD clusters, we plan
to re-deploy our whole RBD pool (OpenNebula cloud) using these disks.

As far as I understand, it would be best to skip journaling / WAL and just
deploy every OSD 1-by-1. This would have the following pros (correct me
if I am wrong):
- maximum performance, as the journal is spread across all devices
- a lost drive does not affect any other drive
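
Per disk I have something like this Luminous ceph-volume call in mind for the
all-in-one layout (just a sketch, /dev/sdX is a placeholder):

  ceph-volume lvm create --bluestore --data /dev/sdX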

Currently we are on CentOS 7 with an ELRepo 4.4.x kernel. We plan to migrate
to Ubuntu 16.04.3 with HWE (kernel 4.10).
Clients will be Fedora 27 + OpenNebula.

Any comments?

Thank you.

Kind regards,
Kevin