Re: [ceph-users] dropping trusty
On Fri, Dec 1, 2017 at 1:55 AM, David Galloway wrote:
> On 11/30/2017 12:21 PM, Sage Weil wrote:
>> We're talking about dropping trusty support for mimic due to the old
>> compiler (incomplete C++11), the hassle of using an updated toolchain, a
>> general desire to stop supporting old stuff, and the lack of user
>> objections to dropping it in the next release.
>>
>> We would continue to build trusty packages for luminous and older
>> releases, just not for mimic going forward.
>>
>> My question is whether we should drop all of the trusty installs on
>> smithi and focus testing on xenial and centos. I haven't seen any
>> trusty-related failures in half a year. There were some kernel-related
>> issues 6+ months ago that are resolved, and there is a valgrind issue
>> with xenial that is making us do valgrind only on centos, but otherwise
>> I don't recall any other problems. I think the likelihood of a
>> trusty-specific regression on luminous/jewel is low. Note that we can
>> still do install and smoke testing on VMs to ensure the packages work;
>> we just wouldn't stress test.
>>
>> Does this seem reasonable? If so, we could reimage the trusty hosts
>> immediately, right?
>>
>> Am I missing anything?
>>
> Someone would need to prune through the qa dir and make sure nothing
> relies on trusty for tests.

David, thanks for pointing out the direction. I removed the references to
trusty and updated the related bits in
https://github.com/ceph/ceph/pull/19307.

> We've gotten into a bind recently with the testing of FOG [1] where jobs
> are stuck in Waiting for a long time (tying up workers) because jobs are
> requesting Trusty. We got close to having zero Trusty testnodes since
> the wip-fog branch has been reimaging baremetal testnodes on every job.
>
> But other than that, yes, I can reimage the Trusty testnodes. Once FOG
> is merged into teuthology master, we won't have to worry about this
> anymore since jobs will automatically reimage machines based on what
> distro they require.

Since https://github.com/ceph/teuthology/pull/1126 is merged, could you
help reimage the trusty test nodes?

> [1] https://github.com/ceph/teuthology/compare/wip-fog

--
Regards
Kefu Chai
Re: [ceph-users] Is the 12.2.1 really stable? Anybody have production cluster with Luminous Bluestore?
Hi,

We're running 12.2.1 in production and facing some memory & CPU issues:
http://tracker.ceph.com/issues/4?next_issue_id=3_issue_id=5
http://tracker.ceph.com/issues/21933

Try 12.2.2: http://ceph.com/releases/v12-2-2-luminous-released/
Re: [ceph-users] PG::peek_map_epoch assertion fail
A debug log captured when this happens, with debug_osd set to at least 15,
should tell us.

On Sun, Dec 3, 2017 at 10:54 PM, Gonzalo Aguilar Delgado wrote:
> Hello,
>
> What can make this assertion fail?
>
> int r = store->omap_get_values(coll, pgmeta_oid, keys, &values);
> if (r == 0) {
>   assert(values.size() == 2);
>
> --
>
>      0> 2017-12-03 13:39:29.497091 7f467ba0b8c0 -1 osd/PG.cc: In function
> 'static int PG::peek_map_epoch(ObjectStore*, spg_t, epoch_t*,
> ceph::bufferlist*)' thread 7f467ba0b8c0 time 2017-12-03 13:39:29.495311
> osd/PG.cc: 3025: FAILED assert(values.size() == 2)
>
> It seems that's the cause of all the troubles I'm finding.

--
Cheers,
Brad
[ceph-users] PG::peek_map_epoch assertion fail
Hello,

What can make this assertion fail?

int r = store->omap_get_values(coll, pgmeta_oid, keys, &values);
if (r == 0) {
  assert(values.size() == 2);

--

     0> 2017-12-03 13:39:29.497091 7f467ba0b8c0 -1 osd/PG.cc: In function
'static int PG::peek_map_epoch(ObjectStore*, spg_t, epoch_t*,
ceph::bufferlist*)' thread 7f467ba0b8c0 time 2017-12-03 13:39:29.495311
osd/PG.cc: 3025: FAILED assert(values.size() == 2)

It seems that's the cause of all the troubles I'm finding.
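For context, here is a minimal self-contained C++ sketch of what that assert
guards, reconstructed from the snippet above. The FakeStore type and the
"_infover"/"_epoch" key names are illustrative assumptions, not the verbatim
jewel source. The point is that omap_get_values() reports success even when
some requested keys are absent, so the assert is really checking that both
pgmeta omap keys still exist on disk:

  // Toy model (assumed key names, simplified types), not the jewel source.
  #include <cassert>
  #include <map>
  #include <set>
  #include <string>

  using bufferlist = std::string;  // stand-in for ceph::bufferlist

  struct FakeStore {
    std::map<std::string, bufferlist> omap;  // the pgmeta object's omap
    int omap_get_values(const std::set<std::string>& keys,
                        std::map<std::string, bufferlist>* out) {
      for (const auto& k : keys) {
        auto it = omap.find(k);
        if (it != omap.end())
          (*out)[k] = it->second;  // missing keys are silently skipped
      }
      return 0;                    // "success" even if some keys were absent
    }
  };

  int main() {
    FakeStore store;
    store.omap["_infover"] = "8";  // "_epoch" is gone: simulated corruption
    std::set<std::string> keys = {"_infover", "_epoch"};
    std::map<std::string, bufferlist> values;
    int r = store.omap_get_values(keys, &values);
    if (r == 0)
      assert(values.size() == 2);  // aborts, just like the OSD does
  }

If a debug_osd=15 log confirms that path, it would point at the pgmeta object
having lost part of its omap on that OSD (on-disk damage) rather than a logic
bug in the OSD code.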
Re: [ceph-users] Another OSD broken today. How can I recover it?
Hi,

Yes, nice. Until all your OSDs fail and you don't know what else to try.
Looking at the failure rates, it will happen very soon. I want to recover
them. I'm writing in another mail what I tried; let's see if someone can
help me.

I'm not doing anything. Just looking at my cluster from time to time to find
that something else failed. I will try hard to recover from this situation.

Thank you.

On 26/11/17 16:13, Marc Roos wrote:
>
> If I am not mistaken, the whole idea with the 3 replicas is that you
> have enough copies to recover from a failed OSD. In my tests this seems
> to go fine automatically. Are you doing something that is not advised?
>
>
> -----Original Message-----
> From: Gonzalo Aguilar Delgado [mailto:gagui...@aguilardelgado.com]
> Sent: zaterdag 25 november 2017 20:44
> To: 'ceph-users'
> Subject: [ceph-users] Another OSD broken today. How can I recover it?
>
> Hello,
>
> I had another blackout with ceph today. It seems that ceph OSDs fail
> from time to time and are unable to recover. I have 3 OSDs down now:
> 1 removed from the cluster and 2 down because I'm unable to recover
> them.
>
> We really need a recovery tool. It's not normal that an OSD breaks and
> there's no way to recover. Is there any way to do it?
>
> The last one shows this:
>
> ] enter Reset
>    -12> 2017-11-25 20:34:19.548891 7f6e5dc158c0  5 osd.4 pg_epoch: 9686
> pg[0.34(unlocked)] enter Initial
>    -11> 2017-11-25 20:34:19.548983 7f6e5dc158c0  5 osd.4 pg_epoch: 9686
> pg[0.34( empty local-les=9685 n=0 ec=404 les/c/f 9685/9685/0
> 9684/9684/9684) [4,0] r=0 lpr=0 crt=0'0 mlcod 0'0 inactive NIBBLEWISE]
> exit Initial 0.91 0 0.00
>    -10> 2017-11-25 20:34:19.548994 7f6e5dc158c0  5 osd.4 pg_epoch: 9686
> pg[0.34( empty local-les=9685 n=0 ec=404 les/c/f 9685/9685/0
> 9684/9684/9684) [4,0] r=0 lpr=0 crt=0'0 mlcod 0'0 inactive NIBBLEWISE]
> enter Reset
>     -9> 2017-11-25 20:34:19.549166 7f6e5dc158c0  5 osd.4 pg_epoch: 9686
> pg[10.36(unlocked)] enter Initial
>     -8> 2017-11-25 20:34:19.566781 7f6e5dc158c0  5 osd.4 pg_epoch: 9686
> pg[10.36( v 9686'7301894 (9686'7298879,9686'7301894] local-les=9685
> n=534 ec=419 les/c/f 9685/9686/0 9684/9684/9684) [4,0] r=0 lpr=0
> crt=9686'7301894 lcod 0'0 mlcod 0'0 inactive NIBBLEWISE] exit Initial
> 0.017614 0 0.00
>     -7> 2017-11-25 20:34:19.566811 7f6e5dc158c0  5 osd.4 pg_epoch: 9686
> pg[10.36( v 9686'7301894 (9686'7298879,9686'7301894] local-les=9685
> n=534 ec=419 les/c/f 9685/9686/0 9684/9684/9684) [4,0] r=0 lpr=0
> crt=9686'7301894 lcod 0'0 mlcod 0'0 inactive NIBBLEWISE] enter Reset
>     -6> 2017-11-25 20:34:19.585411 7f6e5dc158c0  5 osd.4 pg_epoch: 9686
> pg[8.5c(unlocked)] enter Initial
>     -5> 2017-11-25 20:34:19.602888 7f6e5dc158c0  5 osd.4 pg_epoch: 9686
> pg[8.5c( empty local-les=9685 n=0 ec=348 les/c/f 9685/9685/0
> 9684/9684/9684) [4,0] r=0 lpr=0 crt=0'0 mlcod 0'0 inactive NIBBLEWISE]
> exit Initial 0.017478 0 0.00
>     -4> 2017-11-25 20:34:19.602912 7f6e5dc158c0  5 osd.4 pg_epoch: 9686
> pg[8.5c( empty local-les=9685 n=0 ec=348 les/c/f 9685/9685/0
> 9684/9684/9684) [4,0] r=0 lpr=0 crt=0'0 mlcod 0'0 inactive NIBBLEWISE]
> enter Reset
>     -3> 2017-11-25 20:34:19.603082 7f6e5dc158c0  5 osd.4 pg_epoch: 9686
> pg[9.10(unlocked)] enter Initial
>     -2> 2017-11-25 20:34:19.615456 7f6e5dc158c0  5 osd.4 pg_epoch: 9686
> pg[9.10( v 9686'2322547 (9031'2319518,9686'2322547] local-les=9685 n=261
> ec=417 les/c/f 9685/9685/0 9684/9684/9684) [4,0] r=0 lpr=0
> crt=9686'2322547 lcod 0'0 mlcod 0'0 inactive NIBBLEWISE] exit Initial
> 0.012373 0 0.00
>     -1> 2017-11-25 20:34:19.615481 7f6e5dc158c0  5 osd.4 pg_epoch: 9686
> pg[9.10( v 9686'2322547 (9031'2319518,9686'2322547] local-les=9685 n=261
> ec=417 les/c/f 9685/9685/0 9684/9684/9684) [4,0] r=0 lpr=0
> crt=9686'2322547 lcod 0'0 mlcod 0'0 inactive NIBBLEWISE] enter Reset
>      0> 2017-11-25 20:34:19.617400 7f6e5dc158c0 -1 osd/PG.cc: In
> function 'static int PG::peek_map_epoch(ObjectStore*, spg_t, epoch_t*,
> ceph::bufferlist*)' thread 7f6e5dc158c0 time 2017-11-25 20:34:19.615633
> osd/PG.cc: 3025: FAILED assert(values.size() == 2)
>
> ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x80) [0x5562d318d790]
> 2: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*,
> ceph::buffer::list*)+0x661) [0x5562d2b4b601]
> 3: (OSD::load_pgs()+0x75a) [0x5562d2a9f8aa]
> 4: (OSD::init()+0x2026) [0x5562d2aaaca6]
> 5: (main()+0x2ef1) [0x5562d2a1c301]
> 6: (__libc_start_main()+0xf0) [0x7f6e5aa75830]
> 7: (_start()+0x29) [0x5562d2a5db09]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
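Reading the backtrace bottom-up (_start -> main -> OSD::init -> OSD::load_pgs
-> PG::peek_map_epoch), the abort happens while the OSD enumerates its PGs
during startup, which is why one damaged pgmeta object keeps the entire OSD
down even though the other PGs on that disk are fine. A toy C++ sketch of that
control flow; the pgmeta_intact() helper and the PG numbering are entirely
hypothetical, for illustration only:

  #include <cassert>
  #include <cstdio>
  #include <vector>

  // Hypothetical stand-in for "does this PG's pgmeta omap still hold both
  // expected keys?" -- pretend PG 2 is the damaged one.
  bool pgmeta_intact(int pg) { return pg != 2; }

  int peek_map_epoch(int pg) {            // frame 2 of the backtrace
    assert(pgmeta_intact(pg));            // FAILED assert(values.size() == 2)
    return 9686;
  }

  void load_pgs(const std::vector<int>& pgs) {  // frame 3
    for (int pg : pgs) {
      std::printf("loading pg %d\n", pg);
      peek_map_epoch(pg);
    }
  }

  int main() {                            // frames 4-7: init/main/libc/_start
    load_pgs({0, 1, 2, 3});
    std::printf("osd.4 up\n");            // never reached while PG 2 is bad
  }

Every restart replays the same load_pgs() loop and trips over the same PG, so
the OSD can never come back up on its own, which matches the "breaks and
there's no way to recover" behaviour described above.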
Re: [ceph-users] ceph-disk removal roadmap (was ceph-disk is now deprecated)
Quoting Alfredo Deza (ad...@redhat.com):
>
> Looks like there is a tag in there that broke it. Let's follow up on a
> tracker issue so that we don't hijack this thread?
>
> http://tracker.ceph.com/projects/ceph-volume/issues/new

Issue 22305 made for this: http://tracker.ceph.com/issues/22305

You are right, sorry for hijacking this thread.

Gr. Stefan

P.s. Co-worker of Dennis Lijnsveld

--
| BIT BV  http://www.bit.nl/  Kamer van Koophandel 09090351
| GPG: 0xD14839C6  +31 318 648 688 / i...@bit.nl