Re: [ceph-users] Proxmox/ceph upgrade and addition of a new node/OSDs

2018-09-21 Thread Fabian Grünbichler
On Fri, Sep 21, 2018 at 09:03:15AM +0200, Hervé Ballans wrote:
> Hi MJ (and all),
> 
> So we upgraded our Proxmox/Ceph cluster, and if we have to summarize the
> operation in a few words: overall, everything went well :)
> The most critical operation of all is 'osd crush tunables optimal'; I
> talk about it in more detail below...
> 
> The Proxmox documentation is really well written and accurate; normally,
> following the documentation step by step is almost sufficient!

Glad to hear that everything worked well.

> 
> * first step : upgrade Ceph Jewel to Luminous :
> https://pve.proxmox.com/wiki/Ceph_Jewel_to_Luminous
> (Note here : OSDs remain in FileStore backend, no BlueStore migration)
> 
> * second step : upgrade Proxmox version 4 to 5 :
> https://pve.proxmox.com/wiki/Upgrade_from_4.x_to_5.0
> 
> Just some numbers, observations and tips (based on our own experience -
> I'm not an expert!):
> 
> * Before migration, make sure you are on the latest version of Proxmox 4
> (4.4-24) and Ceph Jewel (10.2.11)
> 
> * We don't use the pve repository for ceph packages but the official one
> (download.ceph.com). Thus, during the upgrade of Proxmox PVE, we didn't
> replace the ceph.com repository with the proxmox.com Ceph repository...

This is not recommended (and for a reason) - our packages are almost
identical to the upstream/official ones. But we do include the
occasional bug fix much faster than the official packages do, including
reverting breakage. Furthermore, when using our repository, you know
that the packages went through our own testing to ensure compatibility
with our stack (e.g., issues like JSON output changing from one minor
release to the next breaking our integration/GUI). Also, this natural
delay between upstream releases and availability in our repository has
saved our users from lots of "serious bug noticed one day after release"
issues since we switched to providing Ceph via our own repositories.
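For reference, pointing a node at our repository is a one-line change (a
sketch - the URL is our public Ceph repository, but the exact
suite/component here is an assumption, check the wiki for your release):

    # /etc/apt/sources.list.d/ceph.list (assuming Luminous on Stretch)
    deb http://download.proxmox.com/debian/ceph-luminous stretch main

followed by the usual 'apt update && apt full-upgrade'.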

> * When you upgrade Ceph to Luminous (without tunables optimal), there is no
> impact on Proxmox 4. VMs are still running normally.
> The side effect (non-blocking for the functioning of VMs) is located in the
> GUI, on the Ceph menu: it can't report the status of the ceph cluster as it
> hits a JSON formatting error (indeed the output of the command 'ceph -s' is
> completely different, and much more readable on Luminous)

Yes, this is to be expected. Backporting all of that just for the short
time window of "upgrade in progress" is too much work for too little
gain.

> 
> * A little step is missing in section 8 "Create Manager instances" of the
> Ceph upgrade documentation. As the Ceph manager daemon is new since
> Luminous, the package doesn't exist on Jewel. So you have to install the
> ceph-mgr package on each node first, before running 'pveceph createmgr'

Actually, it is not ;) ceph-mgr is pulled in by ceph on upgrades from
Jewel to Luminous - unless you manually removed that package at some
point.

> Otherwise :
> - verify that all your VMs have recently been backed up to external storage
> (in case you need your disaster recovery plan!)

Good idea in general :D

> - if you can, stop all your non-critical VMs (in order to limit client io
> operations)
> - if any, wait for the end of current backups then disable datacenter backup
> (in order to limit client io operations). !! do not forget to re-enable it
> when all is over !!
> - if any and if no longer needed, delete your snapshots, it removes many
> useless objects !
> - start the tunables operation outside of major activity periods (night,
> weekend, ...) and take into account that it can be very slow...

Scheduling and carefully planning rebalancing operations is always
needed on a production cluster. Note that the upgrade docs state that
switching to "tunables optimal" is recommended, but "will cause a
massive rebalance".
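If the rebalance has to run while clients are active, the standard
recovery throttles can help. A minimal sketch (these are stock OSD
options; the values are conservative examples, not tuned recommendations):

    # throttle backfill/recovery before switching tunables
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
    ceph osd crush tunables optimal
    ceph -s    # watch the rebalance progress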

> There are probably some options to configure in ceph to avoid 'pgs stuck'
> states, but on our side, as we had previously moved our critical VMs'
> disks, we didn't care about that!
> 
> * Anyway, the upgrade step of Proxmox PVE is done easily and quickly (just
> follow the documentation). Note that you can upgrade Proxmox PVE before
> doing the 'tunables optimal' operation.
> 
> Hoping that you will find this information useful, good luck with your
> upcoming migration!

Thank you for the detailed report and feedback!



Re: [ceph-users] [Ceph-maintainers] download.ceph.com repository changes

2018-08-03 Thread Fabian Grünbichler
On Mon, Jul 30, 2018 at 11:36:55AM -0600, Ken Dreyer wrote:
> On Fri, Jul 27, 2018 at 1:28 AM, Fabian Grünbichler
>  wrote:
> > On Tue, Jul 24, 2018 at 10:38:43AM -0400, Alfredo Deza wrote:
> >> Hi all,
> >>
> >> After the 12.2.6 release went out, we've been thinking on better ways
> >> to remove a version from our repositories to prevent users from
> >> upgrading/installing a known bad release.
> >>
> >> The way our repos are structured today means every single version of
> >> the release is included in the repository. That is, for Luminous,
> >> every 12.x.x version of the binaries is in the same repo. This is true
> >> for both RPM and DEB repositories.
> >>
> >> However, the DEB repos don't allow pinning to a given version because
> >> our tooling (namely reprepro) doesn't construct the repositories in a
> >> way that this is allowed. For RPM repos this is fine, and version
> >> pinning works.
> >
> > If you mean that reprepro does not support referencing multiple versions
> > of packages in the Packages file, there is a patched fork that does
> > (that seems well-supported):
> >
> > https://github.com/profitbricks/reprepro
> 
> Thanks for this link. That's great to know someone's working on this.
> 
> What's the status of merging that back into the main reprepro code, or
> else shipping that fork as the new reprepro package in Debian /
> Ubuntu? The Ceph project could end up responsible for maintaining that
> reprepro fork if the main Ubuntu community does not pick it up :) The
> fork is several years old, and the latest update on
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=570623 was over a
> year ago.

I don't know anything more than what is publicly available about either
merging back into the original reprepro or shipping in Debian/Ubuntu. We
are using our own custom repo software built around lower level tools, I
was just aware of the fork for unrelated reasons :)



Re: [ceph-users] [Ceph-maintainers] download.ceph.com repository changes

2018-07-27 Thread Fabian Grünbichler
On Tue, Jul 24, 2018 at 10:38:43AM -0400, Alfredo Deza wrote:
> Hi all,
> 
> After the 12.2.6 release went out, we've been thinking on better ways
> to remove a version from our repositories to prevent users from
> upgrading/installing a known bad release.
> 
> The way our repos are structured today means every single version of
> the release is included in the repository. That is, for Luminous,
> every 12.x.x version of the binaries is in the same repo. This is true
> for both RPM and DEB repositories.
> 
> However, the DEB repos don't allow pinning to a given version because
> our tooling (namely reprepro) doesn't construct the repositories in a
> way that this is allowed. For RPM repos this is fine, and version
> pinning works.

If you mean that reprepro does not support referencing multiple versions
of packages in the Packages file, there is a patched fork that does
(that seems well-supported):

https://github.com/profitbricks/reprepro

> 
> To remove a bad version we have two proposals (and would like to hear
> ideas on other possibilities), one that would involve symlinks and the
> other one which purges the known bad version from our repos.
> 
> *Symlinking*
> When releasing we would have a "previous" and "latest" symlink that
> would get updated as versions move forward. It would require
> separation of versions at the URL level (all versions would no longer
> be available in one repo).
> 
> The URL structure would then look like:
> 
> debian/luminous/12.2.3/
> debian/luminous/previous/  (points to 12.2.5)
> debian/luminous/latest/   (points to 12.2.7)
> 
> Caveats: the url structure would change from debian-luminous/ to
> prevent breakage, and the versions would be split. For RPMs it would
> mean a regression if someone is used to pinning, for example pinning
> to 12.2.2 wouldn't be possible using the same url.
> 
> Pros: Faster release times, less need to move packages around, and
> easier to remove a bad version
> 
> 
> *Single version removal*
> Our tooling would need to go and remove the known bad version from the
> repository, which would require to rebuild the repository again, so
> that the metadata is updated with the difference in the binaries.
> 
> Caveats: time intensive process, almost like cutting a new release
> which takes about a day (and sometimes longer). Error prone since the
> process wouldn't be the same (one off, just when a version needs to be
> removed)

I am not involved in this process, but it sounds like something is
wrong somewhere. You keep all the binary debs on the public mirror, so
"retracting" a broken latest one should just consist of:

- deleting the .deb files of the broken release
- regenerating the Packages*, Content* and *Release* metadata files

The former should be quasi-instant, the latter takes a bit (ceph
packages are quite big, especially the ones containing debug symbols,
and they need to be hashed multiple times), but nowhere near a day.

If you keep the "old" metadata files around, both steps should be almost
instant:
- delete broken .deb files
- revert (expensive) metadata files to previous snapshot
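To illustrate, a minimal sketch of such a retraction using stock Debian
tooling (apt-ftparchive is just for illustration - it is not what the
ceph.com tooling actually uses, and the paths/versions here are
hypothetical):

    # drop the broken debs
    rm pool/main/c/ceph/*12.2.6*.deb
    # regenerate the package index and the hashed Release file
    apt-ftparchive packages pool > dists/luminous/main/binary-amd64/Packages
    gzip -kf dists/luminous/main/binary-amd64/Packages
    apt-ftparchive release dists/luminous > dists/luminous/Release
    # re-sign the metadata
    gpg --clearsign -o dists/luminous/InRelease dists/luminous/Release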

> Pros: all urls for download.ceph.com and its structure are kept the same.

that is quite a big pro, IMHO.



Re: [ceph-users] Ceph Mimic on Debian 9 Stretch

2018-06-19 Thread Fabian Grünbichler
On Mon, Jun 18, 2018 at 07:15:49PM +, Sage Weil wrote:
> On Mon, 18 Jun 2018, Fabian Grünbichler wrote:
> > it's of course within your purview as upstream project (lead) to define
> > certain platforms/architectures/distros as fully supported, and others
> > as best-effort/community-driven/... . there was no clear public
> > communication (AFAICT, only the one thread on ceph-maintainers, which is
> > rather low visibility) that Debian moves from somewhere in the middle[2]
> > to the latter category with Mimic, and has now (at least for the time
> > being) effectively joined FreeBSD (which has at least one community
> > member pouring in enormous amounts of work) and the various community
> > Linux distros like Arch, Gentoo, ... (where I frankly have no idea about
> > the status quo). there is also no mention in the docs or the release
> > notes about the lack of Debian packages (and the state of Xenial
> > packages) for Mimic. all of which gives off more of an "unintended
> > consequence" vibe, rather than "conscious decision to drop Debian".
> 
> This is a fair assessment, and it's good to hear that there is some path 
> forward.
> 
> It looks like the Buster release date is Feb '19 (give or take), which 
> corresponds to Nautilus, so it should be possible for Debian users to skip 
> mimic and upgrade directly from luminous to nautilus as long as we build 
> some buster packages for luminous as well right around the end of its 
> lifetime (and/or mimic point releases once buster gets closer to stable).
> 
> Is this reasonable?

yes. like I already indicated, this is our "Plan B" in case Mimic on
Stretch is not feasible. we'll likely skip Mimic entirely (except for
some internal testing to catch and fix issues before Nautilus) in that
case, and jump straight to Nautilus with Buster and keep Luminous on
"life support" for Stretch and upgrading.

>   https://github.com/ceph/ceph/pull/22602

LGTM. still would like to see some note about the Xenial
toolchain stability issues, but that is more for the sake of others (I
am not an Ubuntu user).



Re: [ceph-users] Ceph Mimic on Debian 9 Stretch

2018-06-18 Thread Fabian Grünbichler
On Wed, Jun 13, 2018 at 12:36:50PM +, Sage Weil wrote:
> Hi Fabian,

thanks for your quick, and sorry for my delayed response (only having
1.5 usable arms atm).

> 
> On Wed, 13 Jun 2018, Fabian Grünbichler wrote:
> > On Mon, Jun 04, 2018 at 06:39:08PM +, Sage Weil wrote:
> > > [adding ceph-maintainers]
> > 
> > [and ceph-devel]
> > > On Mon, 4 Jun 2018, Charles Alva wrote:
> > > > Hi Guys,
> > > > 
> > > > When will the Ceph Mimic packages for Debian Stretch be released? I
> > > > could not find the packages even after changing the sources.list.
> > > 
> > > The problem is that we're now using c++17, which requires a newer gcc
> > > than stretch or jessie provide, and Debian does not provide backports of
> > > the newer gcc packages.  We currently can't build the latest Ceph for
> > > those releases.
> > 
> > IMHO this is backwards. if you want to support distro X you should take
> > care to not need toolchain features that are not included in distro X.
> 
> Well, I thought we did.  When we were making the C++17 decision we
> verified that we could do builds on Ubuntu and CentOS using a newer
> compiler toolchain.  My assumption was that since both of these distros
> had backports that pretty much everyone did.  Clearly I was wrong.

just to make this explicit: I did not mean to imply any malicious intent
on your part. I am aware this whole issue is more a question of
priorities and capacities than anything else (on all sides, not just
yours). I do appreciate all the work that you, the rest of the main Ceph
developers, and the community as a whole have done and continue to do,
and we see ourselves very much as part of this community!

> 
> > [...]
> > effectively this means the current Xenial builds are about as safe and
> > production-ready as doing your own gcc backports for Stretch - i.e., not
> > very.
> 
> I missed this nuance as well.

just as another point, the install-deps.sh script from ceph.git will
upgrade(!) the following packages on Xenial:

- libatomic1
- libcc1-0
- libcilkrts5
- libgcc1
- libgomp1
- libitm1
- liblsan0
- libquadmath0
- libstdc++6
- libtsan0
- libubsan0

some of them will even be upgraded to versions built from gcc-8. so not
only is this backport not very production-grade from a testing and
support POV, it is also not self-contained (in contrast to the old
Wheezy Mozilla GCC backport, or RedHat's DTS, at least from my
understanding?). IMHO both issues should be mentioned in the appropriate
places (regular docs for the former, dev docs for the latter?).
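as an aside, a quick way to check where such a library actually came from
on an affected system is the stock apt tooling:

    apt-cache policy libstdc++6    # shows the installed version and its origin repository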

> > > We'd love to build for stretch, but until there is a newer gcc for that
> > > distro it's not possible.  We could build packages for 'testing', but I'm
> > > not sure if those will be usable on stretch.
> > 
> > saying you'd love to build for a distro, while effectively making sure a
> > build according to that distro's release policies is impossible without
> > major effort by someone else does strike me as a bit of a hollow
> > statement. in the end this is a further nail in the coffin of upstream
> > support for the Debian(-based) distros, with only the latest (1.5 months
> > old!) Ubuntu LTS being properly supported.
> 
> I think we need to be clear about the use of the term "support" here.  I
> was careful to say we'd like to *build* for Debian, but I'm not sure what
> organizations out there are offering formal *support* for any of the
> ceph.com packages (in the sense of providing technical support for bug
> escalations or any guarantees around stability etc).  This incident is
> perhaps an indication that those organizations should become more involved
> in the upstream development and decision-making process.

I am aware that there is no formal support (in the sense of commercial
agreements, etc.) for the packages on download.ceph.com, and that most of
the testing the Debian packages get is just a side-effect of you testing
Ubuntu Xenial and now Bionic. we are already
rolling our own .deb packages for Proxmox VE (based on the upstream
debian/ directory) because we have been bitten in the past by issues not
caught in the upstream CI infrastructure.

we try to stay involved in the community, e.g. by opening or forwarding
bugs after initial triaging, and contributing fixes when possible. we do
spread the "gospel of Ceph" and promote it quite heaviliy, we do develop
integration and management layers that probably allow end users to setup
and use Ceph that would otherwise not dare to because of the complexity
involved.

but in the end, we (as in Ceph the upstream project, and Proxmox as
downstream distro) both face a similar dilemma - given limited developer

Re: [ceph-users] Ceph Mimic on Debian 9 Stretch

2018-06-13 Thread Fabian Grünbichler
On Mon, Jun 04, 2018 at 06:39:08PM +, Sage Weil wrote:
> [adding ceph-maintainers]

[and ceph-devel]

> 
> On Mon, 4 Jun 2018, Charles Alva wrote:
> > Hi Guys,
> > 
> > When will the Ceph Mimic packages for Debian Stretch be released? I could not
> > find the packages even after changing the sources.list.
> 
> The problem is that we're now using c++17, which requires a newer gcc 
> than stretch or jessie provide, and Debian does not provide backports of 
> the newer gcc packages.  We currently can't build the latest Ceph for 
> those releases.

IMHO this is backwards. if you want to support distro X you should take
care to not need toolchain features that are not included in distro X.

Debian only provides one toolchain backport, and that is for Firefox,
which has a stable update exception because it is such an important
component for desktop systems and cannot be supported otherwise[1]. This
package is also not intended as general purpose toolchain, but built
solely for enabling a Firefox backport.

> We raised this with the Debian package maintainers about a month ago[1][2] 
> when the first release candidate was built and didn't get any response 
> (beyond a "yes, we there are not gcc package backports").

this is not how Debian works, as you most likely know ;)

> Both ubuntu and fedora/rhel/centos (and I presume sles/opensuse) provide
> compiler backports, so we did not anticipate this being a problem.

this is also not very accurate. it is true that Canonical provides a
toolchain PPA[2] which the Ceph build for Xenial seems to use, but there
is (AFAICT) no official guarantee for the level of support, security or
otherwise[3]. in fact, the PPA description states that it contains
"Toolchain test builds", which seem to mean pretty automatic backports
of whatever is in the current Ubuntu dev release, with a very short
delay between upload to Cosmic and the PPA for Xenial. e.g., for the
currently contained gcc-7 packages, there was less than a week between
hitting Cosmic and Xenial. Cosmic at the current point in the release
cycle is already not really exposed to public testing scrutiny in
general, and for sure not to the level something like the core toolchain
would require.

effectively this means the current Xenial builds are about as safe and
production-ready as doing your own gcc backports for Stretch - i.e., not
very.

> We'd love to build for stretch, but until there is a newer gcc for that 
> distro it's not possible.  We could build packages for 'testing', but I'm 
> not sure if those will be usable on stretch.

saying you'd love to build for a distro, while effectively making sure a
build according to that distro's release policies is impossible without
major effort by someone else does strike me as a bit of a hollow
statement. in the end this is a further nail in the coffin of upstream
support for the Debian(-based) distros, with only the latest (1.5 months
old!) Ubuntu LTS being properly supported.

I hope we find some way to support Mimic+ for Stretch without requiring
a backport of gcc-7+, although it unfortunately seems unlikely at this
point.

1: https://tracker.debian.org/pkg/gcc-mozilla
2: https://launchpad.net/~ubuntu-toolchain-r/+archive/ubuntu/test
3: https://wiki.ubuntu.com/ToolChain#Toolchain_Updates




Re: [ceph-users] No more Luminous packages for Debian Jessie ??

2018-03-07 Thread Fabian Grünbichler
On Wed, Mar 07, 2018 at 02:04:52PM +0100, Fabian Grünbichler wrote:
> On Wed, Feb 28, 2018 at 10:24:50AM +0100, Florent B wrote:
> > Hi,
> > 
> > Since yesterday, the "ceph-luminous" repository does not contain any
> > package for Debian Jessie.
> > 
> > Is it expected ?
> 
> AFAICT the packages are all there[2], but the Packages file only
> references the ceph-deploy package so apt does not find the rest.
> 
> IMHO this looks like something went wrong when generating the repository
> metadata files - so maybe it's just a question of getting the people who
> maintain the repository to notice this thread ;)
> 

and as alfredo just pointed out on IRC, it has already been fixed!



Re: [ceph-users] No more Luminous packages for Debian Jessie ??

2018-03-07 Thread Fabian Grünbichler
On Wed, Feb 28, 2018 at 10:24:50AM +0100, Florent B wrote:
> Hi,
> 
> Since yesterday, the "ceph-luminous" repository does not contain any
> package for Debian Jessie.
> 
> Is it expected ?

AFAICT the packages are all there[2], but the Packages file only
references the ceph-deploy package so apt does not find the rest.

IMHO this looks like something went wrong when generating the repository
metadata files - so maybe it's just a question of getting the people who
maintain the repository to notice this thread ;)

2: http://download.ceph.com/debian-luminous/pool/main/c/ceph/



Re: [ceph-users] ceph-volume lvm deactivate/destroy/zap

2018-01-09 Thread Fabian Grünbichler
On Tue, Jan 09, 2018 at 02:14:51PM -0500, Alfredo Deza wrote:
> On Tue, Jan 9, 2018 at 1:35 PM, Reed Dier  wrote:
> > I would just like to mirror Dan van der Ster’s sentiments.
> >
> > As someone attempting to move an OSD to bluestore, with limited/no LVM
> > experience, it is a completely different beast and complexity level compared
> > to the ceph-disk/filestore days.
> >
> > ceph-deploy was a very simple tool that did exactly what I was looking to
> > do, but now we have deprecated ceph-disk halfway into a release, ceph-deploy
> > doesn’t appear to fully support ceph-volume, which is now the official way
> > to manage OSDs moving forward.
> 
> ceph-deploy now fully supports ceph-volume, we should get a release soon
> 
> >
> > My ceph-volume create statement ‘succeeded’ but the OSD doesn’t start, so
> > now I am trying to zap the disk to try to recreate the OSD, and the zap is
> > failing as Dan’s did.
> 
> I would encourage you to open a ticket in the tracker so that we can
> improve on what failed for you
> 
> http://tracker.ceph.com/projects/ceph-volume/issues/new
> 
> ceph-volume keeps thorough logs in /var/log/ceph/ceph-volume.log and
> /var/log/ceph/ceph-volume-systemd.log
> 
> If you create a ticket, please make sure to add all the output and
> steps that you can
> >
> > And yes, I was able to get it zapped using the lvremove, vgremove, pvremove
> > commands, but that is not obvious to someone who hasn’t used LVM extensively
> > for storage management before.
> >
> > I also want to mirror Dan’s sentiments about the unnecessary complexity
> > imposed on what I expect is the default use case of an entire disk being
> > used. I can’t see anything more than the ‘entire disk’ method being the
> > largest use case for users of ceph, especially the smaller clusters trying
> > to maximize hardware/spend.
> 
> We don't take lightly the introduction of LVM here. The new tool is
> addressing several insurmountable issues with how ceph-disk operated.
> 
> Although using an entire disk might be easier in the use case you are
> in, it is certainly not the only thing we have to support, so then
> again, we can't
> reliably decide what strategy would be best to destroy that volume, or
> group, or if the PV should be destroyed as well.

wouldn't it be possible to detect on creation that it is a full physical
disk that gets initialized completely by ceph-volume, store that in the
metadata somewhere and clean up accordingly when destroying the OSD?
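ceph-volume already records its metadata as LVM tags on the LVs it
manages, so such a marker would have a natural home there. the tags can be
inspected with stock LVM tooling (the 'whole disk' flag itself is
hypothetical; the ceph.* tags are what ceph-volume actually sets):

    # list the metadata tags ceph-volume sets on its logical volumes
    lvs -o lv_name,vg_name,lv_tags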

> 
> The 'zap' sub-command will allow that lv to be reused for an OSD and
> that should work. Again, if it isn't sufficient, we really do need
> more information and a
> ticket in the tracker is the best way.
> 



Re: [ceph-users] Hangs with qemu/libvirt/rbd when one host disappears

2017-12-07 Thread Fabian Grünbichler
On Thu, Dec 07, 2017 at 09:59:43AM +0100, Marcus Priesch wrote:
> Hello Brad,
> 
> thanks for your answer !
> 
> >> at least the point of all is that a single host should be allowed to
> >> fail and the vm's continue running ... ;)
> > 
> > You don't really have six MONs do you (although I know the answer to
> > this question)? I think you need to take another look at some of the
> > docs about monitors.
> 
> however i dont get the point here ...
> 
> because its an even number ?
> 
> i read docs ... but dont get any hints on the number of mons ... i would
> assume, the more the better ... is this wrong ?

an even number is always bad for quorum based systems (6 is no better
than 5, as you can only tolerate a loss of 2 before losing quorum).

in Ceph, additional monitors require additional resources AND generate
additional overhead (more mons -> more communication). the rule of thumb
is 3 for small to mid-sized clusters. the next step up performance-wise
would be to move the 3 mons to their own stand-alone nodes, and only
once that starts to bottleneck, you increase the number to 5 and/or
upgrade the HW to become faster. for really big clusters, you can then
start splitting out the mgr instances to reduce the load further.
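the underlying arithmetic is simple - a quorum is a strict majority of the
mon count, so (a quick sketch in shell):

    mons=6
    quorum=$(( mons / 2 + 1 ))       # 4 of 6 must be up
    tolerated=$(( mons - quorum ))   # 2 - exactly the same as with 5 mons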



Re: [ceph-users] Increasing mon_pg_warn_max_per_osd in v12.2.2

2017-12-04 Thread Fabian Grünbichler
On Mon, Dec 04, 2017 at 11:21:42AM +0100, SOLTECSIS - Victor Rodriguez Cortes 
wrote:
> 
> > Why are you OK with this? A high amount of PGs can cause serious peering 
> > issues. OSDs might eat up a lot of memory and CPU after a reboot or such.
> >
> > Wido
> 
> Mainly because there was no warning at all in v12.2.1 and it just
> appeared after upgrading to v12.2.2. Besides, it's not a "too high" number
> of PGs for this environment and no CPU/peering issues have been detected
> yet.
> 
> I'll plan a way to create new OSD's/new CephFS and move files to it, but
> in the mean time I would like to just increase that variable, which is
> supposed to be supported and easy.
> 
> Thanks

the option is now called 'mon_max_pg_per_osd'.

this was originally slated for v12.2.1 where it was erroneously
mentioned in the release notes[1] despite not being part of the
release (I remember asking for updated/fixed release notes after 12.2.1,
seems like that never happened?). now it was applied as part of v12.2.2,
but is not mentioned at all in the release notes[2]...

1: http://docs.ceph.com/docs/master/release-notes/#v12-2-1-luminous
2: http://docs.ceph.com/docs/master/release-notes/#v12-2-2-luminous
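for completeness, raising the limit looks like this (using the option name
above; 300 is just an example value):

    # persistent, in ceph.conf on the monitor nodes
    [global]
    mon_max_pg_per_osd = 300

    # or injected at runtime
    ceph tell mon.* injectargs '--mon_max_pg_per_osd 300'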



Re: [ceph-users] ceph-disk removal roadmap (was ceph-disk is now deprecated)

2017-12-01 Thread Fabian Grünbichler
On Thu, Nov 30, 2017 at 11:25:03AM -0500, Alfredo Deza wrote:
> Thanks all for your feedback on deprecating ceph-disk, we are very
> excited to be able to move forwards on a much more robust tool and
> process for deploying and handling activation of OSDs, removing the
> dependency on UDEV which has been a tremendous source of constant
> issues.
> 
> Initially (see "killing ceph-disk" thread [0]) we planned for removal in
> Mimic, but we didn't want to introduce the deprecation warnings up
> until we had an out for those who had OSDs deployed in previous
> releases with ceph-disk (we are now able to handle those as well).
> That is the reason ceph-volume, although present since the first
> Luminous release, hasn't been pushed forward much.
> 
> Now that we feel like we can cover almost all cases, we would really
> like to see a wider usage so that we can improve on issues/experience.
> 
> Given that 12.2.2 is already in the process of getting released, we
> can't undo the deprecation warnings for that version, but we will
> remove them for 12.2.3, add them back again in Mimic, which will mean
> ceph-disk will be kept around a bit longer, and finally fully removed
> by N.
> 
> To recap:
> 
> * ceph-disk deprecation warnings will stay for 12.2.2
> * deprecation warnings will be removed in 12.2.3 (and from all later
> Luminous releases)
> * deprecation warnings will be added again in ceph-disk for all Mimic releases
> * ceph-disk will no longer be available for the 'N' release, along
> with the UDEV rules
> 
> I believe these four points address most of the concerns voiced in
> this thread, and should give enough time to port clusters over to
> ceph-volume.
> 
> [0] 
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021358.html

Thank you for listening to the feedback - I think most of us know the
balance that needs to be struck between moving a project forward and
decrufting a code base versus providing a stable enough interface for
users is not always easy to find.

I think the above roadmap is a good compromise for all involved parties,
and I hope we can use the remainder of Luminous to prepare for a
seamless and painless transition to ceph-volume in time for the Mimic
release, and then finally retire ceph-disk for good!



Re: [ceph-users] ceph-disk is now deprecated

2017-11-30 Thread Fabian Grünbichler
On Thu, Nov 30, 2017 at 07:04:33AM -0500, Alfredo Deza wrote:
> On Thu, Nov 30, 2017 at 6:31 AM, Fabian Grünbichler
> <f.gruenbich...@proxmox.com> wrote:
> > On Tue, Nov 28, 2017 at 10:39:31AM -0800, Vasu Kulkarni wrote:
> >> On Tue, Nov 28, 2017 at 9:22 AM, David Turner <drakonst...@gmail.com> 
> >> wrote:
> >> > Isn't marking something as deprecated meaning that there is a better 
> >> > option
> >> > that we want you to use and you should switch to it sooner than later? I
> >> > don't understand how this is ready to be marked as such if ceph-volume 
> >> > can't
> >> > be switched to for all supported use cases. If ZFS, encryption, FreeBSD, 
> >> > etc
> >> > are all going to be supported under ceph-volume, then how can ceph-disk 
> >> > be
> >> > deprecated before ceph-volume can support them? I can imagine many Ceph
> >> > admins wasting time chasing an erroneous deprecated warning because it 
> >> > came
> >> > out before the new solution was mature enough to replace the existing
> >> > solution.
> >>
> >> There is no need to worry about this deprecation. It's mostly for
> >> admins to be prepared for the changes coming ahead, and it's mostly
> >> for *new* installations that can plan on using ceph-volume, which
> >> provides great flexibility compared to ceph-disk.
> >
> > changing existing installations to output deprecation warnings from one
> > minor release to the next means it is not just for new installations
> > though, no matter how you spin it. a mention in the release notes and
> > docs would be enough to get admins to test and use ceph-volume on new
> > installations.
> >
> > I am pretty sure many admins will be bothered by all nodes running OSDs
> > spamming the logs and their terminals with huge deprecation warnings on
> > each OSD activation[1] or other actions involving ceph-disk, and having
> > this state for the remainder of Luminous unless they switch to a new
> > (and as of yet not battle-tested) way of activating their OSDs seems
> > crazy to me.
> >
> > I know our users will be, and given the short notice and huge impact
> > this would have we will likely have to remove the deprecation warnings
> > altogether in our (downstream) packages until we have completed testing
> > of and implementing support for ceph-volume..
> >
> >>
> >> a) many dont use ceph-disk or ceph-volume directly, so the tool you
> >> have right now eg: ceph-deploy or ceph-ansible
> >> will still support the ceph-disk, the previous ceph-deploy release is
> >> still available from pypi
> >>   https://pypi.python.org/pypi/ceph-deploy
> >
> > we have >> 10k (user / customer managed!) installations on Ceph Luminous
> > alone, all using our wrapper around ceph-disk - changing something like
> > this in the middle of a release causes huge headaches for downstreams
> > like us, and is not how a stable project is supposed to be run.
> 
> If you are using a wrapper around ceph-disk, then silencing the
> deprecation warnings should be easy to do.
> 
> These are plain Python warnings, and can be silenced within Python or
> via environment variables. There are some details
> on how to do that here https://github.com/ceph/ceph/pull/18989

the problem is not how to get rid of the warnings, but having to when
upgrading from one bug fix release to the next.
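for reference, the environment variable route is the standard Python one
(assuming ceph-disk emits these via the stock 'warnings' module, as the
linked PR suggests):

    # silence deprecation warnings for a single invocation
    PYTHONWARNINGS=ignore ceph-disk activate /dev/sda5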

> >
> >>
> >> b) also the current push will help anyone who is using ceph-deploy or
> >> ceph-disk in scripts/chef/etc
> >>to have time to think about using newer cli based on ceph-volume
> >
> > a regular deprecation at the beginning of the release cycle where the
> > replacement is deemed stable, plus removal in the next release cycle,
> > would be adequate for this purpose.
> >
> > I don't understand the rush to shoe-horn ceph-volume into existing
> > supposedly stable Ceph installations at all - especially given the
> > current state of ceph-volume (we'll file bugs once we are done writing
> > them up, but a quick rudimentary test already showed stuff like choking
> > on valid ceph.conf files because they contain leading whitespace and
> > incomplete error handling leading to crush map entries for failed OSD
> > creation attempts).
> 
> Any ceph-volume bugs are welcomed as soon as you can get them to us.
> Waiting to get them reported is a problem, since ceph-volume
> is tied to Ceph releases, it means that these will now have to wait
> for another point release.

Re: [ceph-users] ceph-disk is now deprecated

2017-11-30 Thread Fabian Grünbichler
On Tue, Nov 28, 2017 at 10:39:31AM -0800, Vasu Kulkarni wrote:
> On Tue, Nov 28, 2017 at 9:22 AM, David Turner  wrote:
> > Isn't marking something as deprecated meaning that there is a better option
> > that we want you to use and you should switch to it sooner than later? I
> > don't understand how this is ready to be marked as such if ceph-volume can't
> > be switched to for all supported use cases. If ZFS, encryption, FreeBSD, etc
> > are all going to be supported under ceph-volume, then how can ceph-disk be
> > deprecated before ceph-volume can support them? I can imagine many Ceph
> > admins wasting time chasing an erroneous deprecated warning because it came
> > out before the new solution was mature enough to replace the existing
> > solution.
> 
> There is no need to worry about this deprecation. It's mostly for
> admins to be prepared for the changes coming ahead, and it's mostly
> for *new* installations that can plan on using ceph-volume, which
> provides great flexibility compared to ceph-disk.

changing existing installations to output deprecation warnings from one
minor release to the next means it is not just for new installations
though, no matter how you spin it. a mention in the release notes and
docs would be enough to get admins to test and use ceph-volume on new
installations.

I am pretty sure many admins will be bothered by all nodes running OSDs
spamming the logs and their terminals with huge deprecation warnings on
each OSD activation[1] or other actions involving ceph-disk, and having
this state for the remainder of Luminous unless they switch to a new
(and as of yet not battle-tested) way of activating their OSDs seems
crazy to me.

I know our users will be, and given the short notice and huge impact
this would have we will likely have to remove the deprecation warnings
altogether in our (downstream) packages until we have completed testing
of and implementing support for ceph-volume..

> 
> a) many dont use ceph-disk or ceph-volume directly, so the tool you
> have right now eg: ceph-deploy or ceph-ansible
> will still support the ceph-disk, the previous ceph-deploy release is
> still available from pypi
>   https://pypi.python.org/pypi/ceph-deploy

we have >> 10k (user / customer managed!) installations on Ceph Luminous
alone, all using our wrapper around ceph-disk - changing something like
this in the middle of a release causes huge headaches for downstreams
like us, and is not how a stable project is supposed to be run.

> 
> b) also the current push will help anyone who is using ceph-deploy or
> ceph-disk in scripts/chef/etc
>to have time to think about using newer cli based on ceph-volume

a regular deprecation at the beginning of the release cycle where the
replacement is deemed stable, plus removal in the next release cycle,
would be adequate for this purpose.

I don't understand the rush to shoe-horn ceph-volume into existing
supposedly stable Ceph installations at all - especially given the
current state of ceph-volume (we'll file bugs once we are done writing
them up, but a quick rudimentary test already showed stuff like choking
on valid ceph.conf files because they contain leading whitespace and
incomplete error handling leading to crush map entries for failed OSD
creation attempts).

I DO understand the motivation behind ceph-volume and the desire to get
rid of the udev-based trigger mess, but the solution is not to scare
users into switching in the middle of a release by introducing
deprecation warnings for a core piece of the deployment stack.

IMHO the only reason to push or force such a switch in this manner would
be a (grave) security or data corruption bug, which is not the case at
all here..

1: have you looked at the journal / boot logs of a mid-sized OSD node
using ceph-disk for activation with the deprecation warning active?  if
my boot log is suddenly filled with 20% warnings, my first reaction will
be that something is very wrong.. my likely second reaction when
realizing what is going on is probably not fit for posting to a public
mailing list ;)



Re: [ceph-users] [Ceph-announce] Luminous v12.2.1 released

2017-10-02 Thread Fabian Grünbichler
On Thu, Sep 28, 2017 at 05:46:30PM +0200, Abhishek wrote:
> This is the first bugfix release of Luminous v12.2.x long term stable
> release series. It contains a range of bug fixes and a few features
> across CephFS, RBD & RGW. We recommend that all users of the 12.2.x
> series update.
> 
> For more details, refer to the release notes entry at the official
> blog[1] and the complete changelog[2]
> 
> Notable Changes
> ---
> 
> [ snip ]
> 
> * The maximum number of PGs per OSD before the monitor issues a
>   warning has been reduced from 300 to 200 PGs.  200 is still twice
>   the generally recommended target of 100 PGs per OSD.  This limit can
>   be adjusted via the ``mon_max_pg_per_osd`` option on the
>   monitors.  The older ``mon_pg_warn_max_per_osd`` option has been
>   removed.
> 
> * Creating pools or adjusting pg_num will now fail if the change would
>   make the number of PGs per OSD exceed the configured
>   ``mon_max_pg_per_osd`` limit.  The option can be adjusted if it
>   is really necessary to create a pool with more PGs.
> 
> [ snip ]
> 
> Getting Ceph
> 
> 
> [ snip ]
> 
> [1]: http://ceph.com/releases/v12-2-1-luminous-released/
> [2]: https://github.com/ceph/ceph/blob/master/doc/changelog/v12.2.1.txt
> 

those release notes should be corrected - [1] apparently did not make the
cut for 12.2.1, yet makes up a third of the notable changes...

1: https://github.com/ceph/ceph/pull/17814




Re: [ceph-users] Ceph packages for Debian Stretch?

2017-06-21 Thread Fabian Grünbichler
On Wed, Jun 21, 2017 at 05:30:02PM +0900, Christian Balzer wrote:
> 
> Hello,
> 
> On Wed, 21 Jun 2017 09:47:08 +0200 (CEST) Alexandre DERUMIER wrote:
> 
> > Hi,
> > 
> > Proxmox is maintening a ceph-luminous repo for stretch
> > 
> > http://download.proxmox.com/debian/ceph-luminous/
> > 
> > 
> > git is here, with patches and modifications to get it work
> > https://git.proxmox.com/?p=ceph.git;a=summary
> >
> While this is probably helpful for the changes needed, my quest is for
> Jewel (really all supported builds) for Stretch.
> And not whenever Luminous gets released, but within the next 10 days.

I think you should be able to just backport the needed commits from
http://tracker.ceph.com/issues/19884 on top of v10.2.7, bump the version
in debian/changelog and use dpkg-buildpackage (or wrapper of your
choice) to rebuild the packages. Building takes a while though ;)
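A rough sketch of that rebuild (the commits to cherry-pick are the ones
referenced in the tracker issue above; 'dch' comes from devscripts):

    git clone -b v10.2.7 https://github.com/ceph/ceph.git && cd ceph
    git submodule update --init --recursive
    git cherry-pick <commit>            # the fixes from issue #19884
    dch -l +stretch 'rebuild for Debian Stretch'
    dpkg-buildpackage -us -uc -b        # this is the part that takes a while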

Alternatively use the slightly outdated stock Debian packages (based on
10.2.5 with slightly deviating packaging and the patches in [1]) and
switch over to the official ones when they are available.

1: 
https://anonscm.debian.org/cgit/pkg-ceph/ceph.git/tree/debian/patches?id=7e85745cc7aece92e8f2e505285d451ec2210afa

> 
> Though clearly that's not going to happen, oh well.

Mismatched schedules between yourself and upstream can be cumbersome -
but at least in the case of FLOSS you can always take matters into your own
hands and roll your own if the need is big enough ;)

> 
> Christian



[ceph-users] sortbitwise warning broken on Ceph Jewel?

2017-05-16 Thread Fabian Grünbichler
The Kraken release notes[1] contain the following note about the
sortbitwise flag and upgrading from <= Jewel to > Jewel:

The sortbitwise flag must be set on the Jewel cluster before upgrading
to Kraken. The latest Jewel (10.2.4+) releases issue a health warning if
the flag is not set, so this is probably already set. If it is not,
Kraken OSDs will refuse to start and will print an error message in
their log.

I think this refers to the warning introduced by d3dbd8581 [2], which
is triggered if
- a mon config key is set to true (default, not there in master anymore)
- the sortbitwise flag is not set (default for clusters upgrading from
  hammer, not default for new jewel clusters)
- the OSDs support sortbitwise (I assume this is the default for Jewel
  OSDs? I am not sure how to get this information from a running OSD?)

I have not been able to trigger this warning for either an upgraded
Hammer cluster (all nodes upgraded from latest Hammer to latest Jewel
and rebooted) which does not have sortbitwise set, nor for a freshly
installed Jewel cluster where I manually unset sortbitwise and rebooted
afterwards. Am I doing something wrong, or is the check somehow broken?
If the latter is the case, the release notes are very misleading (as
users will probably rely on "no health warning -> safe to upgrade").

I also see one follow-up fix[3] which was only included in Kraken so
far, but AFAICT this should only possibly affect the second test with a
manually unset sortbitwise on Jewel, and not the Hammer -> Jewel ->
Kraken/Luminous upgrade path.

1: http://docs.ceph.com/docs/master/release-notes/#upgrading-from-jewel
2: https://github.com/ceph/ceph/commit/d3dbd8581bd39572dc55d4953b5d8c49255426d7
3: https://github.com/ceph/ceph/pull/12682
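For anyone who would rather check directly than rely on the (possibly
broken) health warning, the relevant commands are the stock mon ones:

    ceph osd dump | grep flags    # 'sortbitwise' should be listed before upgrading
    ceph osd set sortbitwise      # set it on an all-Jewel cluster if missing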



Re: [ceph-users] Automatic OSD start on Jewel

2017-01-04 Thread Fabian Grünbichler
On Wed, Jan 04, 2017 at 12:55:56PM +0100, Florent B wrote:
> On 01/04/2017 12:18 PM, Fabian Grünbichler wrote:
> > On Wed, Jan 04, 2017 at 12:03:39PM +0100, Florent B wrote:
> >> Hi everyone,
> >>
> >> I have a problem with automatic start of OSDs on Debian Jessie with Ceph
> >> Jewel.
> >>
> >> My osd.0 is using /dev/sda5 for data and /dev/sda2 for journal, it is
> >> listed in ceph-disk list :
> >>
> >> /dev/sda :
> >>  /dev/sda1 other, 21686148-6449-6e6f-744e-656564454649
> >>  /dev/sda3 other, linux_raid_member
> >>  /dev/sda4 other, linux_raid_member
> >>  /dev/sda2 ceph journal, for /dev/sda5
> >>  /dev/sda5 ceph data, active, cluster ceph, osd.0, journal /dev/sda2
> >>
> >> It was created with ceph-disk prepare.
> >>
> >> When I run "ceph-disk activate /dev/sda5", it is mounted and started.
> >>
> >> If I run "systemctl start ceph-disk@/dev/sda5", the same, it's OK. But
> >> this is a service that can't be "enabled"!!
> >>
> >> But on reboot, nothing happens. The only thing which tries to start is
> >> the ceph-osd@0 service (enabled by ceph-disk, not me), and of course it
> >> fails because its data is not mounted.
> >>
> >> I think udev rules should do this, but it does not seem to.
> >>
> >>
> >> root@host102:~# sgdisk -i 2 /dev/sda
> >> Partition GUID code: 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 (Unknown)
> >> Partition unique GUID: D0F4F00F-723D-4DAD-BA2E-93D52EB564C1
> >> First sector: 2048 (at 1024.0 KiB)
> >> Last sector: 9765887 (at 4.7 GiB)
> >> Partition size: 9763840 sectors (4.7 GiB)
> >> Attribute flags: 
> >> Partition name: 'ceph journal'
> >>
> >> root@host102:~# sgdisk -i 5 /dev/sda
> >> Partition GUID code: 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D (Unknown)
> >> Partition unique GUID: 5AB4F732-AFBE-4DEA-A4C6-AD290C1302D9
> >> First sector: 123047424 (at 58.7 GiB)
> >> Last sector: 1953459199 (at 931.5 GiB)
> >> Partition size: 1830411776 sectors (872.8 GiB)
> >> Attribute flags: 
> >> Partition name: 'ceph data'
> >>
> >>
> >> Does someone have an idea of what's going on ?
> >>
> >> Thank you.
> >>
> >> Florent
> > are you using the packages from ceph.com? if so, you might be affected
> > by http://tracker.ceph.com/issues/18305 (and
> > http://tracker.ceph.com/issues/17889)
> >
> > did you mask the ceph.service unit generated from the ceph init script?
> >
> > what does "systemctl status '*ceph*'" show? what does "journalctl -b
> > '*ceph*'" show?
> >
> > what happens if you run "ceph-disk activate-all"? (this is what is
> > called last in the init script and will probably trigger mounting of the
> > OSD disk/partition and starting of the ceph-osd@..  service)
> >
> 
> Thank you, that was the problem: I disabled the ceph.service unit because
> I thought it was an "old" thing; I didn't know it is still used.
> Re-enabling it did the trick.
> 
> Isn't it an "old way" of doing things ?
> 

I am not sure if the init script was left on purpose or if nobody
realized that the existing systemd units don't cover all the activation
paths because the init script was forgotten and hides this fact quite
well. I assume the latter ;)

IMHO the current situation is wrong, which is why I filed the bug
(including a proposed fix). especially since the init script actually
starts monitors using systemd-run as transient units instead of via
ceph-mon@XYZ, so on monitor nodes the startup situation can get quite
confusing and racy. so far there hasn't been any feedback - maybe this
thread will help and get some more eyes to look at it...



Re: [ceph-users] Automatic OSD start on Jewel

2017-01-04 Thread Fabian Grünbichler
On Wed, Jan 04, 2017 at 12:03:39PM +0100, Florent B wrote:
> Hi everyone,
> 
> I have a problem with automatic start of OSDs on Debian Jessie with Ceph
> Jewel.
> 
> My osd.0 is using /dev/sda5 for data and /dev/sda2 for journal, it is
> listed in ceph-disk list :
> 
> /dev/sda :
>  /dev/sda1 other, 21686148-6449-6e6f-744e-656564454649
>  /dev/sda3 other, linux_raid_member
>  /dev/sda4 other, linux_raid_member
>  /dev/sda2 ceph journal, for /dev/sda5
>  /dev/sda5 ceph data, active, cluster ceph, osd.0, journal /dev/sda2
> 
> It was created with ceph-disk prepare.
> 
> When I run "ceph-disk activate /dev/sda5", it is mounted and started.
> 
> If I run "systemctl start ceph-disk@/dev/sda5", the same, it's OK. But
> this is a service that can't be "enabled"!!
> 
> But on reboot, nothing happens. The only thing which tries to start is
> the ceph-osd@0 service (enabled by ceph-disk, not me), and of course it
> fails because its data is not mounted.
> 
> I think udev rules should do this, but it does not seem to.
> 
> 
> root@host102:~# sgdisk -i 2 /dev/sda
> Partition GUID code: 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 (Unknown)
> Partition unique GUID: D0F4F00F-723D-4DAD-BA2E-93D52EB564C1
> First sector: 2048 (at 1024.0 KiB)
> Last sector: 9765887 (at 4.7 GiB)
> Partition size: 9763840 sectors (4.7 GiB)
> Attribute flags: 
> Partition name: 'ceph journal'
> 
> root@host102:~# sgdisk -i 5 /dev/sda
> Partition GUID code: 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D (Unknown)
> Partition unique GUID: 5AB4F732-AFBE-4DEA-A4C6-AD290C1302D9
> First sector: 123047424 (at 58.7 GiB)
> Last sector: 1953459199 (at 931.5 GiB)
> Partition size: 1830411776 sectors (872.8 GiB)
> Attribute flags: 
> Partition name: 'ceph data'
> 
> 
> Does someone have an idea of what's going on ?
> 
> Thank you.
> 
> Florent

are you using the packages from ceph.com? if so, you might be affected
by http://tracker.ceph.com/issues/18305 (and
http://tracker.ceph.com/issues/17889)

did you mask the ceph.service unit generated from the ceph init script?

what does "systemctl status '*ceph*'" show? what does "journalctl -b
'*ceph*'" show?

what happens if you run "ceph-disk activate-all"? (this is what is
called last in the init script and will probably trigger mounting of the
OSD disk/partition and starting of the ceph-osd@..  service)
