Re: [ceph-users] To backport or not to backport

2019-07-05 Thread Robert LeBlanc
On Thu, Jul 4, 2019 at 8:00 AM Stefan Kooman wrote:

> Hi,
>
> Now that the release cadence has been set, it's time for another discussion
> :-).
>
> During Ceph Day NL we had a panel Q&A [1]. One of the things that was
> discussed was backports. Occasionally users will ask for backports of
> functionality in newer releases to older releases (that are still in
> support).
>
> Ceph is quite a unique project in the sense that new functionality gets
> backported to older releases. Sometimes functionality even gets changed
> during the lifetime of a release. I recall the "ceph-volume" change to LVM
> at the beginning of the Luminous release. While backports can enrich the
> user experience of a Ceph operator, they are not without risks. There have
> been several issues with "incomplete" backports and/or unforeseen
> circumstances that had the reverse effect: downtime of (part of) Ceph
> services. The ones that come to my mind are:
>
> - MDS (cephfs damaged)  mimic backport (13.2.2)
> - RADOS (pg log hard limit) luminous / mimic backport (12.2.8 / 13.2.2)
>
> I would like to define a simple rule of when to backport:
>
> - Only backport fixes that do not introduce new functionality, but address
>   (impaired) functionality already present in the release.
>
> An example of a backport that, IMHO, matches the backport criteria is the
> "bitmap_allocator" fix. It fixed a real problem, not some corner case.
> Don't get me wrong here, it is important to catch corner cases, but fixing
> them should not put the majority of clusters at risk.
>
> The time and effort that might be saved with this approach can indeed be
> spent on one of the new focus areas Sage mentioned during his keynote
> talk at Cephalocon Barcelona: quality. Quality of the backports that are
> needed, improved testing, especially for upgrades to newer releases. If
> upgrades are seamless, people are more willing to upgrade, because hey,
> it just works(tm). Upgrades should be boring.
>
> How many clusters (not nautilus ;-)) are running with "bitmap_allocator" or
> with the pglog_hardlimit enabled? If a new feature is not enabled by
> default and it's unclear how "stable" it is to use, operators tend to not
> enable it, defeating the purpose of the backport.
>
> Backporting fixes to older releases can be considered a "business
> opportunity" for the likes of Red Hat, SUSE, Fujitsu, etc. Especially
> for users that want a system that "keeps on running forever" and never
> needs "dangerous" updates.
>
> This is my view on the matter, please let me know what you think of
> this.
>
> Gr. Stefan
>
> P.s. Just to make things clear: this thread is in _no way_ intended to
> pick on anybody.
>
>
> [1]: https://pad.ceph.com/p/ceph-day-nl-2019-panel


I prefer a released version to be fairly static and not have new features
introduced, only bug fixes. For one, I'd prefer not to have to read the
release notes to figure out how dangerous a "bug-fix" release might be.
The fixes in a released version should be tested extremely well so it "Just
Works".

By not backporting new features, I think we give the features more time to
bake in the new version and free up the developers to focus on the
forward direction of the product. If I want a new feature, then the burden
is on me (or my vendor), not the developers, to test a new version and
verify that it works in my environment.

I wholeheartedly support only bug fixes and security fixes going into
released versions.

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] To backport or not to backport

2019-07-04 Thread Daniel Baumann
Hi,

On 7/4/19 3:00 PM, Stefan Kooman wrote:
> - Only backport fixes that do not introduce new functionality, but address
>   (impaired) functionality already present in the release.

ack, and you also have my full agreement/support for everything else you
wrote, thanks.

reading in the changelogs about backported features (in particular the
one release that BlueStore was backported to) left me quite scared of
upgrading our cluster.

Regards,
Daniel


[ceph-users] To backport or not to backport

2019-07-04 Thread Stefan Kooman
Hi,

Now that the release cadence has been set, it's time for another discussion
:-).

During Ceph Day NL we had a panel Q&A [1]. One of the things that was
discussed was backports. Occasionally users will ask for backports of
functionality in newer releases to older releases (that are still in
support).

Ceph is quite a unique project in the sense that new functionality gets
backported to older releases. Sometimes functionality even gets changed
during the lifetime of a release. I recall the "ceph-volume" change to LVM
at the beginning of the Luminous release. While backports can enrich the
user experience of a Ceph operator, they are not without risks. There have
been several issues with "incomplete" backports and/or unforeseen
circumstances that had the reverse effect: downtime of (part of) Ceph
services. The ones that come to my mind are:

- MDS (cephfs damaged)  mimic backport (13.2.2)
- RADOS (pg log hard limit) luminous / mimic backport (12.2.8 / 13.2.2)

I would like to define a simple rule of when to backport:

- Only backport fixes that do not introduce new functionality, but address
  (impaired) functionality already present in the release.

An example of a backport that, IMHO, matches the backport criteria is the
"bitmap_allocator" fix. It fixed a real problem, not some corner case.
Don't get me wrong here, it is important to catch corner cases, but fixing
them should not put the majority of clusters at risk.

The time and effort that might be saved with this approach can indeed be
spent on one of the new focus areas Sage mentioned during his keynote
talk at Cephalocon Barcelona: quality. Quality of the backports that are
needed, improved testing, especially for upgrades to newer releases. If
upgrades are seamless, people are more willing to upgrade, because hey,
it just works(tm). Upgrades should be boring.

How many clusters (not nautilus ;-)) are running with "bitmap_allocator" or
with the pglog_hardlimit enabled? If a new feature is not enabled by
default and it's unclear how "stable" it is to use, operators tend to not
enable it, defeating the purpose of the backport.

Backporting fixes to older releases can be considered a "business
opportunity" for the likes of Red Hat, SUSE, Fujitsu, etc. Especially
for users that want a system that "keeps on running forever" and never
needs "dangerous" updates.

This is my view on the matter, please let me know what you think of
this.

Gr. Stefan

P.s. Just to make things clear: this thread is in _no way_ intended to pick on
anybody.


[1]: https://pad.ceph.com/p/ceph-day-nl-2019-panel

-- 
| BIT BV  https://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl