Re: [ceph-users] dropping trusty

2017-12-03 Thread kefu chai
On Fri, Dec 1, 2017 at 1:55 AM, David Galloway  wrote:
> On 11/30/2017 12:21 PM, Sage Weil wrote:
>> We're talking about dropping trusty support for mimic due to the old
>> compiler (incomplete C++11), hassle of using an updated toolchain, general
>> desire to stop supporting old stuff, and lack of user objections to
>> dropping it in the next release.
>>
>> We would continue to build trusty packages for luminous and older
>> releases, just not mimic going forward.
>>
>> My question is whether we should drop all of the trusty installs on smithi
>> and focus testing on xenial and centos.  I haven't seen any trusty related
>> failures in half a year.  There were some kernel-related issues 6+ months
>> ago that are resolved, and there is a valgrind issue with xenial that is
>> making us do valgrind only on centos, but otherwise I don't recall any
>> other problems.  I think the likelihood of a trusty-specific regression on
>> luminous/jewel is low.  Note that we can still do install and smoke
>> testing on VMs to ensure the packages work; we just wouldn't stress test.
>>
>> Does this seem reasonable?  If so, we could reimage the trusty hosts
>> immediately, right?
>>
>> Am I missing anything?
>>
>
> Someone would need to prune through the qa dir and make sure nothing
> relies on trusty for tests.  We've gotten into a bind recently with the

David, thanks for pointing out the direction. I removed the references to trusty
and updated the related bits in https://github.com/ceph/ceph/pull/19307.
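
In case anyone wants to double-check that nothing under qa/ still pins trusty,
something like this should surface any leftovers (run from a ceph.git checkout;
the exact distro yaml names may differ between branches):

    git grep -n -i trusty -- qa/
    git grep -n ubuntu_14.04 -- qa/

Anything still matching after the PR would need a follow-up.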

> testing of FOG [1] where jobs are stuck in Waiting for a long time
> (tying up workers) because jobs are requesting Trusty.  We got close to
> having zero Trusty testnodes since the wip-fog branch has been reimaging
> baremetal testnodes on every job.
>
> But other than that, yes, I can reimage the Trusty testnodes.  Once FOG
> is merged into teuthology master, we won't have to worry about this
> anymore since jobs will automatically reimage machines based on what
> distro they require.

Since https://github.com/ceph/teuthology/pull/1126 is merged, could you help
reimage the trusty test nodes?

>
> [1] https://github.com/ceph/teuthology/compare/wip-fog



-- 
Regards
Kefu Chai


Re: [ceph-users] Is the 12.2.1 really stable? Anybody have production cluster with Luminous Bluestore?

2017-12-03 Thread Konstantin Shalygin

> Hi,
>
> We're running 12.2.1 on production and facing some memory & cpu issues -->
>
> http://tracker.ceph.com/issues/4?next_issue_id=3_issue_id=5
>
> http://tracker.ceph.com/issues/21933

Try 12.2.2 http://ceph.com/releases/v12-2-2-luminous-released/
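
A rough sketch of checking what you actually run and of the upgrade itself
(Debian/Ubuntu commands as an example, not tested on your setup; adjust for
your distro, and upgrade the mons before the osds, one node at a time):

    ceph versions                       # per-daemon versions (Luminous and later)

    apt-get update
    apt-get install --only-upgrade ceph ceph-common ceph-mon ceph-osd ceph-mgr
    systemctl restart ceph-mon.target   # on monitor nodes first
    systemctl restart ceph-osd.target   # then on OSD nodes
    ceph -s                             # wait for HEALTH_OK before the next node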



Re: [ceph-users] PG::peek_map_epoch assertion fail

2017-12-03 Thread Brad Hubbard
A debug log captured when this happens with debug_osd set to at least
15 should tell us.
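
If it helps, a rough sketch of how to capture that (replace 4 with the id of
the OSD that hits the assert; paths assume a default install):

    # in /etc/ceph/ceph.conf on the node with the failing OSD
    [osd.4]
        debug osd = 15

    # start it again however you normally do and grab the log
    systemctl start ceph-osd@4
    less /var/log/ceph/ceph-osd.4.log

    # or run it once in the foreground with logging to stderr
    ceph-osd -i 4 -d --debug-osd 15 2>&1 | tee /tmp/ceph-osd.4.debug.log

The interesting part will be the output just before the FAILED assert line.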

On Sun, Dec 3, 2017 at 10:54 PM, Gonzalo Aguilar Delgado
 wrote:
> Hello,
>
> What can make this assertion fail?
>
>
>   int r = store->omap_get_values(coll, pgmeta_oid, keys, &values);
>   if (r == 0) {
>     assert(values.size() == 2); --
>
>  0> 2017-12-03 13:39:29.497091 7f467ba0b8c0 -1 osd/PG.cc: In function
> 'static int PG::peek_map_epoch(ObjectStore*, spg_t, epoch_t*,
> ceph::bufferlist*)' thread 7f467ba0b8c0 time 2017-12-03 13:39:29.495311
> osd/PG.cc: 3025: FAILED assert(values.size() == 2)
>
> It seems that's the cause of all the troubles I'm finding.
>
>



-- 
Cheers,
Brad


[ceph-users] PG::peek_map_epoch assertion fail

2017-12-03 Thread Gonzalo Aguilar Delgado
Hello,

What can make this assertion fail?


  int r = store->omap_get_values(coll, pgmeta_oid, keys, &values);
  if (r == 0) {
    assert(values.size() == 2); --

 0> 2017-12-03 13:39:29.497091 7f467ba0b8c0 -1 osd/PG.cc: In
function 'static int PG::peek_map_epoch(ObjectStore*, spg_t, epoch_t*,
ceph::bufferlist*)' thread 7f467ba0b8c0 time 2017-12-03 13:39:29.495311
osd/PG.cc: 3025: FAILED assert(values.size() == 2)

It seems that's the cause of all the troubles I'm finding.
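
In case it helps, this is roughly how I am trying to look at the pgmeta object
of an affected PG (OSD stopped first, and using osd.4 / pg 0.34 from my other
mail as the example; I pieced the commands together from the
ceph-objectstore-tool help, so they may need adjusting):

    # safety copy of the PG before touching anything
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 \
        --journal-path /var/lib/ceph/osd/ceph-4/journal \
        --pgid 0.34 --op export --file /root/pg0.34.export

    # list the objects in the PG; the pgmeta object should be the entry
    # with an empty oid
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 \
        --journal-path /var/lib/ceph/osd/ceph-4/journal \
        --pgid 0.34 --op list

    # dump its omap keys; if I read PG.cc right, peek_map_epoch expects
    # both _infover and _epoch to be present
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 \
        --journal-path /var/lib/ceph/osd/ceph-4/journal \
        '<json object from the list output>' list-omap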



Re: [ceph-users] Another OSD broken today. How can I recover it?

2017-12-03 Thread Gonzalo Aguilar Delgado
Hi,

Yes, nice. Until all your OSDs fail and you don't know what else to try.
Looking at the failure rate, that will happen very soon.

I want to recover them. I'm describing what I tried in another mail. Let's
see if someone can help me.

I'm not doing anything special, just looking at my cluster from time to time
and finding that something else has failed. It will be hard to recover from
this situation.
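
If nobody has a better idea, the sequence I am considering for rebuilding each
dead OSD, assuming the surviving replicas still hold all the data (please tell
me if any step is wrong):

    ceph osd out 4                  # drain it and let the cluster rebalance
    ceph -s                         # wait until all PGs are active+clean again
    systemctl stop ceph-osd@4
    ceph osd crush remove osd.4
    ceph auth del osd.4
    ceph osd rm 4
    # then wipe the disk and re-create the OSD, e.g. with ceph-disk prepare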

Thank you.


On 26/11/17 16:13, Marc Roos wrote:
>  
> If I am not mistaken, the whole idea with the 3 replicas is that you
> have enough copies to recover from a failed OSD. In my tests this seems
> to go fine automatically. Are you doing something that is not advised?
>
>
>
>
> -Original Message-
> From: Gonzalo Aguilar Delgado [mailto:gagui...@aguilardelgado.com] 
> Sent: Saturday, 25 November 2017 20:44
> To: 'ceph-users'
> Subject: [ceph-users] Another OSD broken today. How can I recover it?
>
> Hello, 
>
>
> I had another blackout with ceph today. It seems that Ceph OSDs fail
> from time to time and are unable to recover. I have 3 OSDs down now:
> 1 removed from the cluster and 2 down because I'm unable to recover
> them.
>
>
> We really need a recovery tool. It's not normal that an OSD breaks and 
> there's no way to recover. Is there any way to do it?
>
>
> Last one shows this:
>
>
>
>
> ] enter Reset
>-12> 2017-11-25 20:34:19.548891 7f6e5dc158c0  5 osd.4 pg_epoch: 9686 
> pg[0.34(unlocked)] enter Initial
>-11> 2017-11-25 20:34:19.548983 7f6e5dc158c0  5 osd.4 pg_epoch: 9686 
> pg[0.34( empty local-les=9685 n=0 ec=404 les/c/f 9685/9685/0 
> 9684/9684/9684) [4,0] r=0 lpr=0 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] 
> exit Initial 0.91 0 0.00
>-10> 2017-11-25 20:34:19.548994 7f6e5dc158c0  5 osd.4 pg_epoch: 9686 
> pg[0.34( empty local-les=9685 n=0 ec=404 les/c/f 9685/9685/0 
> 9684/9684/9684) [4,0] r=0 lpr=0 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] 
> enter Reset
> -9> 2017-11-25 20:34:19.549166 7f6e5dc158c0  5 osd.4 pg_epoch: 9686 
> pg[10.36(unlocked)] enter Initial
> -8> 2017-11-25 20:34:19.566781 7f6e5dc158c0  5 osd.4 pg_epoch: 9686 
> pg[10.36( v 9686'7301894 (9686'7298879,9686'7301894] local-les=9685 
> n=534 ec=419 les/c/f 9685/9686/0 9684/9684/9684) [4,0] r=0 lpr=0 
> crt=9686'7301894 lcod 0'0 mlcod 0'0 inactive NIBBLEWISE] exit Initial 
> 0.017614 0 0.00
> -7> 2017-11-25 20:34:19.566811 7f6e5dc158c0  5 osd.4 pg_epoch: 9686 
> pg[10.36( v 9686'7301894 (9686'7298879,9686'7301894] local-les=9685 
> n=534 ec=419 les/c/f 9685/9686/0 9684/9684/9684) [4,0] r=0 lpr=0 
> crt=9686'7301894 lcod 0'0 mlcod 0'0 inactive NIBBLEWISE] enter Reset
> -6> 2017-11-25 20:34:19.585411 7f6e5dc158c0  5 osd.4 pg_epoch: 9686 
> pg[8.5c(unlocked)] enter Initial
> -5> 2017-11-25 20:34:19.602888 7f6e5dc158c0  5 osd.4 pg_epoch: 9686 
> pg[8.5c( empty local-les=9685 n=0 ec=348 les/c/f 9685/9685/0 
> 9684/9684/9684) [4,0] r=0 lpr=0 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] 
> exit Initial 0.017478 0 0.00
> -4> 2017-11-25 20:34:19.602912 7f6e5dc158c0  5 osd.4 pg_epoch: 9686 
> pg[8.5c( empty local-les=9685 n=0 ec=348 les/c/f 9685/9685/0 
> 9684/9684/9684) [4,0] r=0 lpr=0 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] 
> enter Reset
> -3> 2017-11-25 20:34:19.603082 7f6e5dc158c0  5 osd.4 pg_epoch: 9686 
> pg[9.10(unlocked)] enter Initial
> -2> 2017-11-25 20:34:19.615456 7f6e5dc158c0  5 osd.4 pg_epoch: 9686 
> pg[9.10( v 9686'2322547 (9031'2319518,9686'2322547] local-les=9685 n=261 
> ec=417 les/c/f 9685/9685/0 9684/9684/9684) [4,0] r=0 lpr=0 
> crt=9686'2322547 lcod 0'0 mlcod 0'0 inactive NIBBLEWISE] exit Initial 
> 0.012373 0 0.00
> -1> 2017-11-25 20:34:19.615481 7f6e5dc158c0  5 osd.4 pg_epoch: 9686 
> pg[9.10( v 9686'2322547 (9031'2319518,9686'2322547] local-les=9685 n=261 
> ec=417 les/c/f 9685/9685/0 9684/9684/9684) [4,0] r=0 lpr=0 
> crt=9686'2322547 lcod 0'0 mlcod 0'0 inactive NIBBLEWISE] enter Reset
>  0> 2017-11-25 20:34:19.617400 7f6e5dc158c0 -1 osd/PG.cc: In 
> function 'static int PG::peek_map_epoch(ObjectStore*, spg_t, epoch_t*, 
> ceph::bufferlist*)' thread 7f6e5dc158c0 time 2017-11-25 20:34:19.615633
> osd/PG.cc: 3025: FAILED assert(values.size() == 2)
>
>  ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x80) [0x5562d318d790]
>  2: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*, 
> ceph::buffer::list*)+0x661) [0x5562d2b4b601]
>  3: (OSD::load_pgs()+0x75a) [0x5562d2a9f8aa]
>  4: (OSD::init()+0x2026) [0x5562d2aaaca6]
>  5: (main()+0x2ef1) [0x5562d2a1c301]
>  6: (__libc_start_main()+0xf0) [0x7f6e5aa75830]
>  7: (_start()+0x29) [0x5562d2a5db09]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> --- logging levels ---
>0/ 5 none
>0/ 1 lockdep
>0/ 1 context
>1/ 1 crush
>1/ 5 mds
>1/ 5 mds_balancer
>1/ 5 mds_locker
>1/ 5 mds_log
>1/ 5 mds_log_expire
>1/ 5 mds_migrator
>0/ 1 buffer
>

Re: [ceph-users] ceph-disk removal roadmap (was ceph-disk is now deprecated)

2017-12-03 Thread Stefan Kooman
Quoting Alfredo Deza (ad...@redhat.com):
> 
> Looks like there is a tag in there that broke it. Let's follow up on a
> tracker issue so that we don't hijack this thread?
> 
> http://tracker.ceph.com/projects/ceph-volume/issues/new

Issue 22305 created for this: http://tracker.ceph.com/issues/22305

You are right, sorry for hijacking this thread.

Gr. Stefan

P.s. co-worker of Dennis Lijnsveld

-- 
| BIT BV   http://www.bit.nl/          Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl