[ceph-users] cephbot - a Slack bot for Ceph has been added to the github.com/ceph project
cephbot [1] is a project that I've been working on and using for years now, and it has been added to the github.com/ceph project to increase visibility for other people who would like to implement slack-ops for their Ceph clusters. The instructions show how to set it up so that only read-only operations can be performed from Slack, for security purposes, but there are settings that can lock down who is allowed to communicate with cephbot, which could make it relatively secure to run administrative tasks as well.

Ask here or in the Ceph Slack instance if you have any questions about its uses or implementation, or if you would like to contribute. I hope you find it as useful as I have.

David Turner
Sony Interactive Entertainment

[1] https://github.com/ceph/cephbot-slack

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] CephFS health warnings after deleting millions of files
A rogue process wrote 38M files into a single CephFS directory, and it took about a month to delete them. We had to increase MDS cache sizes to handle the increased file volume, but we've since been able to reduce all of our settings back to default. The Ceph cluster is 15.2.11. CephFS clients are ceph-fuse, either version 14.2.16 or 15.2.11 depending on whether they've been upgraded yet. Nothing has changed in the last ~6 months in regards to client versions or the cluster version. Now that things seem to be cleaned up, we are dealing with 2 issues:

1. MDSs report slow requests. [1] Dumping the blocked requests gives the same output for all of them. They seemingly get stuck AFTER the event succeeds in acquiring locks. I can't find any information about what happens after this point or why things are getting stuck here.

2. Clients failing to advance oldest client/flush tid. There are 2 clients that are the worst offenders for this, but a few other clients are having the same issue. All of the clients having this issue are on 14.2.16, but we also have a hundred clients on the same version that don't have this issue at all. [2] The logs make it look like the clients just have a bad integer/pointer somehow. We can clear the error by remounting the filesystem or rebooting the server, but these 2 clients in particular keep ending up in this state. No other repeat offenders yet, but we've had 4 other servers in this state over the last couple of weeks.

Are there any ideas what the next steps might be for diagnosing either of these issues? Thank you.
-David Turner

[1] $ sudo ceph daemon mds.mon1 dump_blocked_ops
{
    "ops": [
        {
            "description": "client_request(client.17709580:39254 open #0x10001c99cd4 2022-02-22T16:25:40.231547+ caller_uid=0, caller_gid=0{})",
            "initiated_at": "2022-04-19T19:07:10.663552+",
            "age": 90.920778446,
            "duration": 90.92080624405,
            "type_data": {
                "flag_point": "acquired locks",
                "reqid": "client.17709580:39254",
                "op_type": "client_request",
                "client_info": {
                    "client": "client.17709580",
                    "tid": 39254
                },
                "events": [
                    {
                        "time": "2022-04-19T19:07:10.663552+",
                        "event": "initiated"
                    },
                    {
                        "time": "2022-04-19T19:07:10.663549+",
                        "event": "throttled"
                    },
                    {
                        "time": "2022-04-19T19:07:10.663552+",
                        "event": "header_read"
                    },
                    {
                        "time": "2022-04-19T19:07:10.663555+",
                        "event": "all_read"
                    },
                    {
                        "time": "2022-04-19T19:07:10.665744+",
                        "event": "dispatched"
                    },
                    {
                        "time": "2022-04-19T19:07:10.773894+",
                        "event": "failed to xlock, waiting"
                    },
                    {
                        "time": "2022-04-19T19:07:10.807249+",
                        "event": "acquired locks"
                    }
                ]
            }
        },

[2] 2022-04-19 06:15:36.108 7fb28b7fe700 0 client.30095002 handle_cap_flush_ack mds.1 got unexpected flush ack tid 338611 expected is 0
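For anyone triaging a similar backlog: once there are hundreds of blocked ops, the dump_blocked_ops JSON is easier to read aggregated. Below is a minimal sketch (mine, not from the original post) that counts ops by the last flag_point they reached, which makes a pile-up at "acquired locks" obvious at a glance. The `sample` dict is a trimmed-down stand-in for real `ceph daemon mds.<id> dump_blocked_ops` output:

```python
from collections import Counter

def summarize_blocked_ops(dump: dict) -> Counter:
    """Count blocked ops by the last flag_point they reached."""
    return Counter(op["type_data"]["flag_point"] for op in dump["ops"])

# Trimmed-down stand-in shaped like `ceph daemon mds.<id> dump_blocked_ops`.
sample = {
    "ops": [
        {"type_data": {"flag_point": "acquired locks"}},
        {"type_data": {"flag_point": "acquired locks"}},
        {"type_data": {"flag_point": "failed to xlock, waiting"}},
    ]
}

if __name__ == "__main__":
    # Print the most common stuck points first.
    for flag_point, count in summarize_blocked_ops(sample).most_common():
        print(f"{count:4d}  {flag_point}")
```

In practice you would pipe the admin-socket output into this instead of the inline sample.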
[ceph-users] Ceph-mds using a lot of buffer_anon memory
We just upgraded a CephFS cluster from 12.2.12 to 14.2.11. Our next step is to upgrade to 14.2.16 to troubleshoot this issue, but I thought I'd reach out here first in case anyone has any ideas. The clients are still running an older version of ceph-fuse, 12.2.4, and it's very difficult to remount all of them; it would probably take a team of us a couple of days to restart them all.

I've looked around online and through the release notes, and all of the known memory leaks I've been able to find were fixed prior to version 14.2.11, so this would be an unknown memory leak. All of the memory is in use in [1] buffer_anon. If left unchecked it will use up over 700GB of memory within 24 hours. On an identical cluster with an equivalent workload still running 12.2.12, the [2] buffer_anon numbers are much healthier.

Without any other options or ideas, our plan is to upgrade the cluster to 14.2.16 first and then upgrade the clients. Has anyone else come across high buffer_anon usage?

[1] "buffer_anon": {
        "items": 33756758,
        "bytes": 135025912897
    },

[2] "buffer_anon": {
        "items": 636,
        "bytes": 273118
    },
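As a side note for anyone comparing clusters the same way: the two buffer_anon snippets differ not just in totals but in bytes per item, which is a quick way to distinguish "many small buffers" from "a few huge ones". A minimal sketch (mine, not from the post) that reduces `ceph daemon mds.<id> dump_mempools`-style output to those numbers, using the figures quoted above:

```python
def buffer_anon_stats(mempool: dict) -> tuple[int, int, float]:
    """Return (items, bytes, average bytes per item) for the buffer_anon pool."""
    pool = mempool["buffer_anon"]
    items, nbytes = pool["items"], pool["bytes"]
    return items, nbytes, (nbytes / items if items else 0.0)

# The two buffer_anon snippets quoted in the post above.
leaky = {"buffer_anon": {"items": 33756758, "bytes": 135025912897}}
healthy = {"buffer_anon": {"items": 636, "bytes": 273118}}

if __name__ == "__main__":
    for name, pool in (("14.2.11 mds", leaky), ("12.2.12 mds", healthy)):
        items, nbytes, avg = buffer_anon_stats(pool)
        print(f"{name}: {items} items, {nbytes / 2**30:.1f} GiB, {avg:.0f} B/item")
```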
[ceph-users] Re: OSD corruption and down PGs
Do you have access to another Ceph cluster with enough available space to create RBDs that you can dd these failing disks into? That's what I'm doing right now with some failing disks; I've recovered 2 out of 6 OSDs that failed in this way. I would recommend against using the same cluster for this, but a staging cluster or something similar would be great.

On Tue, May 12, 2020, 7:36 PM Kári Bertilsson wrote:
> Hi Paul
>
> I was able to mount both OSDs I need data from successfully using
> "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-92 --op fuse
> --mountpoint /osd92/"
>
> I see the PG slices that are missing in the mounted folder:
> "41.b3s3_head", "41.ccs5_head", etc. And I can copy any data from inside the
> mounted folder and that works fine.
>
> But when I try to export, it fails. I get the same error when trying to
> list.
>
> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-92 --op list --debug
> Output @ https://pastebin.com/nXScEL6L
>
> Any ideas?
>
> On Tue, May 12, 2020 at 12:17 PM Paul Emmerich wrote:
> > First thing I'd try is to use objectstore-tool to scrape the
> > inactive/broken PGs from the dead OSDs using its PG export feature.
> > Then import these PGs into any other OSD, which will automatically
> > recover them.
> >
> > Paul
> >
> > --
> > Paul Emmerich
> >
> > Looking for help with your Ceph cluster? Contact us at https://croit.io
> >
> > croit GmbH
> > Freseniusstr. 31h
> > 81247 München
> > www.croit.io
> > Tel: +49 89 1896585 90
> >
> > On Tue, May 12, 2020 at 2:07 PM Kári Bertilsson wrote:
> > > Yes, ceph osd df tree and ceph -s are at https://pastebin.com/By6b1ps1
> > >
> > > On Tue, May 12, 2020 at 10:39 AM Eugen Block wrote:
> > > > Can you share your osd tree and the current ceph status?
> > > >
> > > > Zitat von Kári Bertilsson:
> > > > > Hello
> > > > >
> > > > > I had an incident where 3 OSDs crashed at once completely and won't
> > > > > power up. During recovery, 3 OSDs in another host have somehow become
> > > > > corrupted. I am running erasure coding with an 8+2 setup, using a crush
> > > > > map which takes 2 OSDs per host, and after losing the other 2 OSDs I
> > > > > have a few PGs down. Unfortunately these PGs seem to overlap almost all
> > > > > data on the pool, so I believe the entire pool is mostly lost after only
> > > > > these 2% of PGs down.
> > > > >
> > > > > I am running ceph 14.2.9.
> > > > >
> > > > > OSD 92 log https://pastebin.com/5aq8SyCW
> > > > > OSD 97 log https://pastebin.com/uJELZxwr
> > > > >
> > > > > ceph-bluestore-tool repair without --deep showed "success", but the OSDs
> > > > > still fail with the log above.
> > > > >
> > > > > Log from trying ceph-bluestore-tool repair --deep, which is still
> > > > > running; not sure if it will actually fix anything, and the log looks
> > > > > pretty bad. https://pastebin.com/gkqTZpY3
> > > > >
> > > > > Trying "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-97 --op
> > > > > list" gave me an input/output error. But everything in SMART looks OK,
> > > > > and I see no indication of a hardware read error in any logs. Same for
> > > > > both OSDs.
> > > > >
> > > > > The OSDs with corruption have absolutely no bad sectors and likely have
> > > > > only minor corruption, but at important locations.
> > > > >
> > > > > Any ideas on how to recover from this kind of scenario? Any tips would
> > > > > be highly appreciated.
> > > > >
> > > > > Best regards,
> > > > > Kári Bertilsson
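For readers landing on this thread later: the export/import round trip Paul describes is done with ceph-objectstore-tool against stopped OSDs. The sketch below only builds the command strings for a list of PG shards — the data paths and destination OSD are illustrative (loosely based on the ones named in the thread), and the real commands should only ever be run with the source and destination OSD daemons stopped:

```python
def export_import_commands(pgid: str, src_osd_path: str, dst_osd_path: str,
                           dump_dir: str = "/tmp") -> tuple[str, str]:
    """Build the ceph-objectstore-tool command pair that exports a PG from a
    dead OSD's data path and imports it into another (stopped) OSD."""
    dump_file = f"{dump_dir}/{pgid}.export"
    export_cmd = (f"ceph-objectstore-tool --data-path {src_osd_path} "
                  f"--pgid {pgid} --op export --file {dump_file}")
    import_cmd = (f"ceph-objectstore-tool --data-path {dst_osd_path} "
                  f"--op import --file {dump_file}")
    return export_cmd, import_cmd

if __name__ == "__main__":
    # PG shard names taken from the thread; OSD paths are illustrative.
    for pg in ("41.b3s3", "41.ccs5"):
        for cmd in export_import_commands(pg, "/var/lib/ceph/osd/ceph-92",
                                          "/var/lib/ceph/osd/ceph-100"):
            print(cmd)
```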
[ceph-users] Re: Re: OSDs continuously restarting under load
badblocks has found over 50 bad sectors so far and is still running. xfs_repair stopped twice with the message "Killed", likely indicating that it hit a bus error similar to the one ceph-osd is running into. This seems like a fairly simple case of failing disks. I just hope I can get through it without data loss.

On Thu, Apr 30, 2020 at 10:14 PM David Turner wrote:
> I have 2 filestore OSDs in a cluster facing "Caught signal (Bus error)" as
> well and can't find anything about it. Ceph 12.2.12. The disks are less
> than 50% full and basic writes have been successful. Both disks are on
> different nodes. The other 14 disks on each node are unaffected.
>
> Restarting the node doesn't change the behavior. The affected OSD still
> crashes and the other 14 start fine (which likely rules out the controller
> and other shared components along those lines).
>
> I've attempted [1] these commands on the OSDs to see how much of the disk
> I could access cleanly. The first just flushes the journal to disk, and it
> crashed out with the same error. The second command compacts the DB, which
> also crashed with the same error. On one of the OSDs I was able to make it
> a fair bit into compacting the DB before it crashed the first time, and now
> it crashes instantly.
>
> That leads me to think that it might have gotten to a specific part of the
> disk and/or filesystem that is having problems. I'm currently running [2]
> xfs_repair on one of the disks to see if it might be the filesystem. On the
> other disk I'm running [3] badblocks to check for problems with the
> underlying sectors.
>
> I'm assuming that if it's a bad block on the disk that is preventing the
> OSD from starting, there's really nothing I can do to recover the OSD, and
> I'll just need to export any PGs on the disks that aren't active. Here's
> hoping I make it through this without data loss. Since I started this data
> migration I've already lost a couple of disks (completely unreadable by the
> OS, so I can't get copies of the PGs off of them). Luckily these ones seem
> like I might be able to access that part of the data at least. As well, I
> only have some unfound objects at the moment, but all of my PGs are active,
> which is an improvement.
>
> [1] sudo -u ceph ceph-osd -i 285 --flush-journal
>     sudo -u ceph ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-285/current/omap compact
> [2] xfs_repair -n /dev/sdi1
> [3] badblocks -b 4096 -v /dev/sdn
>
> On Thu, Mar 19, 2020 at 9:04 AM huxia...@horebdata.cn wrote:
> > Hi, Igor,
> >
> > Thanks for the tip. dmesg does not show anything suspicious.
> >
> > I will investigate whether the hardware has any problem or not.
> >
> > best regards,
> >
> > samuel
> >
> > huxia...@horebdata.cn
> >
> > From: Igor Fedotov
> > Sent: 2020-03-19 12:07
> > To: huxia...@horebdata.cn; ceph-users; ceph-users
> > Subject: Re: [ceph-users] OSDs continuously restarting under load
> >
> > Hi, Samuel,
> >
> > I've never seen that sort of signal in real life:
> >
> > 2020-03-18 18:39:26.426584 201e35fdb40 -1 *** Caught signal (Bus error) **
> >
> > I suppose this has some hardware roots. Have you checked the dmesg output?
> >
> > Just in case, here is some info on the "Bus error" signal; maybe it will
> > provide some insight: https://en.wikipedia.org/wiki/Bus_error
> >
> > Thanks,
> >
> > Igor
> >
> > On 3/18/2020 5:06 PM, huxia...@horebdata.cn wrote:
> > > Hello, folks,
> > >
> > > I am trying to add a ceph node into an existing ceph cluster. Once the
> > > reweight of a newly-added OSD on the new node exceeds 0.4 or so, the OSD
> > > becomes unresponsive and keeps restarting, eventually going down.
> > >
> > > What could be the problem? Any suggestion would be highly appreciated.
> > >
> > > best regards,
> > >
> > > samuel
> > >
> > > root@node81:/var/log/ceph# ceph osd df
> > > ID CLASS  WEIGHT REWEIGHT SIZE    USE     AVAIL   %USE VAR  PGS
> > > 12 hybrid 1.0    1.0      3.81TiB 38.3GiB 3.77TiB 0.98 1.32 316
> > > 13 hybrid 1.0    1.0      3.81TiB 37.6GiB 3.77TiB 0.96 1.29 308
> > > 14 hybrid 1.0    1.0      3.81TiB 36.9GiB 3.77TiB 0.95 1.27 301
> > > 15 hybrid 1.0    1.0      3.81TiB 37.1GiB 3.77TiB 0.95 1.28 297
[ceph-users] Re: Re: OSDs continuously restarting under load
I have 2 filestore OSDs in a cluster facing "Caught signal (Bus error)" as well and can't find anything about it. Ceph 12.2.12. The disks are less than 50% full and basic writes have been successful. Both disks are on different nodes. The other 14 disks on each node are unaffected. Restarting the node doesn't change the behavior. The affected OSD still crashes and the other 14 start fine (which likely rules out the controller and other shared components along those lines).

I've attempted [1] these commands on the OSDs to see how much of the disk I could access cleanly. The first just flushes the journal to disk, and it crashed out with the same error. The second command compacts the DB, which also crashed with the same error. On one of the OSDs I was able to make it a fair bit into compacting the DB before it crashed the first time, and now it crashes instantly.

That leads me to think that it might have gotten to a specific part of the disk and/or filesystem that is having problems. I'm currently running [2] xfs_repair on one of the disks to see if it might be the filesystem. On the other disk I'm running [3] badblocks to check for problems with the underlying sectors.

I'm assuming that if it's a bad block on the disk that is preventing the OSD from starting, there's really nothing I can do to recover the OSD, and I'll just need to export any PGs on the disks that aren't active. Here's hoping I make it through this without data loss. Since I started this data migration I've already lost a couple of disks (completely unreadable by the OS, so I can't get copies of the PGs off of them). Luckily these ones seem like I might be able to access that part of the data at least. As well, I only have some unfound objects at the moment, but all of my PGs are active, which is an improvement.
[1] sudo -u ceph ceph-osd -i 285 --flush-journal
    sudo -u ceph ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-285/current/omap compact

[2] xfs_repair -n /dev/sdi1

[3] badblocks -b 4096 -v /dev/sdn

On Thu, Mar 19, 2020 at 9:04 AM huxia...@horebdata.cn wrote:
> Hi, Igor,
>
> Thanks for the tip. dmesg does not show anything suspicious.
>
> I will investigate whether the hardware has any problem or not.
>
> best regards,
>
> samuel
>
> huxia...@horebdata.cn
>
> From: Igor Fedotov
> Sent: 2020-03-19 12:07
> To: huxia...@horebdata.cn; ceph-users; ceph-users
> Subject: Re: [ceph-users] OSDs continuously restarting under load
>
> Hi, Samuel,
>
> I've never seen that sort of signal in real life:
>
> 2020-03-18 18:39:26.426584 201e35fdb40 -1 *** Caught signal (Bus error) **
>
> I suppose this has some hardware roots. Have you checked the dmesg output?
>
> Just in case, here is some info on the "Bus error" signal; maybe it will
> provide some insight: https://en.wikipedia.org/wiki/Bus_error
>
> Thanks,
>
> Igor
>
> On 3/18/2020 5:06 PM, huxia...@horebdata.cn wrote:
> > Hello, folks,
> >
> > I am trying to add a ceph node into an existing ceph cluster. Once the
> > reweight of a newly-added OSD on the new node exceeds 0.4 or so, the OSD
> > becomes unresponsive and keeps restarting, eventually going down.
> >
> > What could be the problem? Any suggestion would be highly appreciated.
> > best regards,
> >
> > samuel
> >
> > root@node81:/var/log/ceph# ceph osd df
> > ID CLASS  WEIGHT REWEIGHT SIZE    USE     AVAIL   %USE VAR  PGS
> > 12 hybrid 1.0    1.0      3.81TiB 38.3GiB 3.77TiB 0.98 1.32 316
> > 13 hybrid 1.0    1.0      3.81TiB 37.6GiB 3.77TiB 0.96 1.29 308
> > 14 hybrid 1.0    1.0      3.81TiB 36.9GiB 3.77TiB 0.95 1.27 301
> > 15 hybrid 1.0    1.0      3.81TiB 37.1GiB 3.77TiB 0.95 1.28 297
> >  0 hybrid 1.0    1.0      3.81TiB 37.6GiB 3.77TiB 0.96 1.29 305
> >  1 hybrid 1.0    1.0      3.81TiB 38.2GiB 3.77TiB 0.98 1.31 309
> >  2 hybrid 1.0    1.0      3.81TiB 37.4GiB 3.77TiB 0.96 1.29 296
> >  3 hybrid 1.0    1.0      3.81TiB 37.9GiB 3.77TiB 0.97 1.30 303
> >  4 hdd    0.2    1.0      3.42TiB 10.5GiB 3.41TiB 0.30 0.40   0
> >  5 hdd    0.2    1.0      3.42TiB 9.63GiB 3.41TiB 0.28 0.37  87
> >  6 hdd    0.2    1.0      3.42TiB 1.91GiB 3.42TiB 0.05 0.07   0
> >  7 hdd    0.2    1.0      3.42TiB 11.3GiB 3.41TiB 0.32 0.43  83
> > 16 hdd    0.3    1.0      1.79TiB 16.3GiB 1.78TiB 0.89 1.19 142
> >    TOTAL                  45.9TiB 351GiB  45.6TiB 0.75
> >
> > Log:
> >
> > root@node81:/var/log/ceph# cat ceph-osd.6.log | grep load_pgs
> > 2020-03-18 18:33:57.808747 2000b556000 0 osd.6 0 load_pgs
> > 2020-03-18 18:33:57.808763 2000b556000 0 osd.6 0 load_pgs opened 0 pgs
> > -1324> 2020-03-18 18:33:57.808747 2000b556000 0 osd.6 0 load_pgs
> > -1323> 2020-03-18 18:33:57.808763 2000b556000 0 osd.6 0 load_pgs opened 0 pgs
> > 2020-03-18 18:35:04.363341 2000327
[ceph-users] Re: increasing PG count - limiting disruption
There are a few factors to consider. I've gone from 16k PGs to 32k PGs before and learned some lessons.

The first and most imminent is the peering that happens when you increase the PG count. I like to increase the pg_num and pgp_num values slowly to mitigate this. Something like [1] this should do the trick: it increases your PG count in steps and waits for all peering and such to finish before continuing. It will also wait out a few other statuses during which you shouldn't be doing maintenance like this.

The second is that mons do not compact their databases while a PG is in a non-"clean" state. That means that while your cluster is creating these new PGs and moving data around, your mon stores will grow with new maps until everything is healthy again. This is desired behavior to keep everything healthy in Ceph in the face of failures, BUT it means you need to be aware of how much space you have on your mons for the mon store to grow into. When I was increasing from 16k to 32k PGs, we could only create 4k PGs at a time; in that cluster, each batch would take about 2 weeks to finish. When we tried to do more than that, our mons ran out of space and we had to add disks to the mons to move the mon stores onto so that the mons could continue to run.

Finally, know that this is just going to take a while (depending on how much data is in your cluster and how full it is). Be patient. Either you increase max_backfills, lower the backfill sleep, and so on to make the backfilling go faster (at the cost of IOPS the clients then can't use), or you keep these throttled to impact clients less. Keep a good balance, though, as putting off finishing the recovery for too long leaves your cluster in a riskier position for that much longer. Good luck.

[1] *Note that I typed this in gmail and did not copy it from a script. Please test before using.
ceph osd set nobackfill
ceph osd set norebalance

function healthy_wait() {
    while ceph health | grep -q 'peering\|inactive\|activating\|creating\|down\|inconsistent\|stale'; do
        echo waiting for ceph to be healthier
        sleep 10
    done
}

for count in {2048..4096..256}; do
    healthy_wait
    ceph osd pool set $pool pg_num $count
    healthy_wait
    ceph osd pool set $pool pgp_num $count
done
healthy_wait

ceph osd unset nobackfill
ceph osd unset norebalance

On Thu, Nov 14, 2019 at 11:19 AM Frank R wrote:
> Hi all,
>
> When increasing the number of placement groups for a pool by a large
> amount (say 2048 to 4096), is it better to go in small steps or all at once?
>
> This is a filestore cluster.
>
> Thanks,
> Frank
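For what it's worth, the {2048..4096..256} brace expansion in the script above is just a fixed-step ramp, and it can help to compute the schedule up front to sanity-check it before touching the pool. A small sketch (the helper name is mine; unlike the bash expansion it skips the redundant first value, which is the pool's current pg_num):

```python
def pg_ramp(current: int, target: int, step: int) -> list[int]:
    """Intermediate pg_num values from current up to target, inclusive, in
    increments of `step`. Mirrors bash's {current..target..step} except that
    the no-op first value (the current pg_num) is skipped."""
    if target <= current:
        return []
    values = list(range(current + step, target, step))
    values.append(target)  # always land exactly on the target
    return values

if __name__ == "__main__":
    # 2048 -> 4096 in steps of 256, as in the thread.
    print(pg_ramp(2048, 4096, 256))
```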
[ceph-users] Re: mix ceph-disk and ceph-volume
Yes, there is nothing wrong with this, and it has been a common scenario for people during their migration from filestore to bluestore.

On Tue, Oct 22, 2019, 9:46 PM Frank R wrote:
> Is it ok to create a new OSD using ceph-volume on a server where the other
> OSDs were created with ceph-disk?
>
> thx
> Frank
[ceph-users] Re: minimum osd size?
I did a set of 30GB OSDs before, carved out of extra disk space on my SSDs, for the metadata pool on CephFS, and my entire cluster locked up about 3 weeks later. Some metadata operation was happening, it filled some of the 30GB disks to 100%, and all IO was blocked in the cluster. I did some trickery of deleting 1 copy of a few PGs on each OSD, such that I still had at least 2 copies of each PG, and was able to backfill the pool back onto my HDDs and restore cluster functionality.

I would say that trying to use that space is definitely not worth it. In one of my production clusters I occasionally get a warning state that an omap object is too large in my buckets.index pool. I could very easily imagine that stalling the entire cluster if my index pool were on such small OSDs.

On Tue, Oct 22, 2019, 6:55 PM Frank R wrote:
> Hi all,
>
> I have 40 nvme drives with about 20G free space each.
>
> Would creating a 10GB partition/lvm on each of the nvmes for an rgw index
> pool be a bad idea?
>
> RGW has about 5 million objects.
>
> I don't think space will be an issue, but I am worried about the 10G size;
> is it just too small for a bluestore OSD?
>
> thx
> Frank
[ceph-users] Re: download.ceph.com repository changes
Regarding a testing/cutting-edge repo: the non-LTS versions of Ceph were removed because very few people ever used or tested them. The majority of people using a testing repo would be people needing a bug fix ASAP. Very few people would actually use it regularly, and its effectiveness in preventing problems from slipping through would be almost zero.

At work I haven't had a problem with which version of Ceph is being installed, because we always keep local mirrors of the repo that we only sync from the upstream repos when we're ready to test a new version in our QA environments, long before we promote the version for production use. That said, I've been bitten by this multiple times in my home environment, where I've accidentally updated or reinstalled a server and needed to upgrade my Ceph cluster before I could finish, because it installed a newer version of Ceph. I have had to download the entire copy of a version from online, put it into a folder on disk, and set up a repo feeding from that local folder to install a specific version. It would be very handy to just use the ability in apt or yum to specify a different version of a package in the repo.

Problem releases have become more problematic than necessary because the packages were left as the defaults after a bug was known; there was no way to remove them from the repo. People continue to see the upgrade and grab it, not realizing it's a busted release. I've only seen that happen on the ML here, but I personally will not touch a new release for at least 2 weeks after it comes out, even in my testing clusters.

On Tue, Sep 24, 2019 at 4:06 PM Ken Dreyer wrote:
> On Tue, Sep 17, 2019 at 8:03 AM Sasha Litvak wrote:
> >
> > * I am bothered by the quality of the releases of a very complex system
> > that can bring down a whole house and keep it down for a while. While I
> > wish the QA would be perfect, I wonder if it would be practical to release
> > new packages to a testing repo before moving them to the main one. There
> > is a chance then that someone will detect a problem before it becomes a
> > production issue. Let it sit for a couple of days or weeks in testing.
> > People who need the new update right away or just want to test will
> > install it and report the problems. Others will not be affected.
>
> I think it would be a good step forward to have a separate "testing"
> repository. This repository would be a little more cutting-edge, and we'd
> copy all the binaries over to the "main" repository location after 48 hours
> or something.
>
> This would let us all publicly test the candidate GPG-signed packages, for
> example.
>
> - Ken
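In the meantime, the "specify a different version" approach mentioned above can be approximated by pinning in the package manager itself. A sketch of an apt preferences entry — the version string here is purely illustrative, and yum users would reach for the versionlock plugin instead:

```text
# /etc/apt/preferences.d/ceph.pref  (hypothetical pin; adjust the version)
Package: ceph*
Pin: version 14.2.16-*
Pin-Priority: 1001
```

With a pin like this in place, a routine `apt-get upgrade` will not drag the cluster's packages forward to whatever happens to be newest in the repo.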