Re: [ceph-users] Help needed ! cluster unstable after upgrade from Hammer to Jewel

2016-11-16 Thread Udo Lembke
Hi,


On 16.11.2016 19:01, Vincent Godin wrote:
> Hello,
>
> We now have a full cluster (MONs, OSDs & clients) on Jewel 10.2.2
> (initially Hammer 0.94.5), but we still have some big problems in our
> production environment:
>
>   * some Ceph OSD filesystems are not mounted at startup, and we have to
> mount them by hand with "/bin/sh -c 'flock /var/lock/ceph-disk
> /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync /dev/vdX1'"
>
vdX1? That sounds like you are running Ceph inside a virtualized system?
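
If so, it would help to know how the disks are presented inside the guest. A
quick check (just a suggestion; the paths assume the usual virtio-blk layout):

    # for virtio disks the resolved sysfs path contains "virtioN"
    readlink -f /sys/block/vd*
    lsblk -d -o NAME,SIZE,TYPE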

Udo


Re: [ceph-users] Help needed ! cluster unstable after upgrade from Hammer to Jewel

2016-11-16 Thread Nick Fisk
 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Vincent Godin
Sent: 16 November 2016 18:02
To: ceph-users <ceph-users@lists.ceph.com>
Subject: [ceph-users] Help needed ! cluster unstable after upgrade from Hammer 
to Jewel

 

Hello,

We now have a full cluster (MONs, OSDs & clients) on Jewel 10.2.2 (initially 
Hammer 0.94.5), but we still have some big problems in our production 
environment:

*   some Ceph OSD filesystems are not mounted at startup, and we have to mount 
them by hand with "/bin/sh -c 'flock /var/lock/ceph-disk /usr/sbin/ceph-disk 
--verbose --log-stdout trigger --sync /dev/vdX1'"
*   some OSDs start but hit timeouts as soon as they start, and for a pretty 
long time (more than 5 min):

*   2016-11-15 01:46:26.625945 7f79db91e800  0 osd.32 191438 done with init, 
starting boot process
2016-11-15 01:47:28.344996 7f79d61f7700  1 heartbeat_map is_healthy 
'FileStore::op_tp thread 0x7f79c5c91700' had timed out after 60
2016-11-15 01:47:33.345098 7f79d61f7700  1 heartbeat_map is_healthy 
'FileStore::op_tp thread 0x7f79c5c91700' had timed out after 60
...

*   these OSDs take a very long time to stop

*   we just lost one OSD and the cluster is unable to stabilize: some 
OSDs go up and down. The cluster is in ERR state and cannot serve the 
production environment

*   we are on Jewel 10.2.2, CentOS 7.2, kernel 3.10.0-327.36.3.el7.x86_64

Help would be appreciated!

Vincent

Can you see anything that might indicate why the OSDs are taking a long time 
to start up? For example, are there any errors in the kernel log, or do the 
disks look like they are working very hard when the OSD tries to start?
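
For example, something along these lines on the OSD host while one of the slow 
OSDs is starting (rough sketch; iostat comes from the sysstat package):

    # recent I/O errors, resets or timeouts in the kernel log
    journalctl -k --since "1 hour ago" | grep -iE 'error|reset|timeout'

    # per-disk utilisation and latency, 2-second samples
    iostat -x 2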

Also, a quick Google of “heartbeat_map is_healthy 'FileStore::op_tp thread” 
brings up several past threads; it might be worth seeing if any of them had a 
solution.
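
If it is mainly the disks being hammered while the OSDs catch up, a temporary 
measure you could try while investigating (just a suggestion, not a root-cause 
fix) is to stop the cluster marking the flapping OSDs down/out, and to give the 
FileStore op threads more headroom than the default 60 seconds that matches 
your log message:

    # remember to unset these once the cluster is stable again
    ceph osd set noout
    ceph osd set nodown

    # in the [osd] section of ceph.conf, then restart the affected OSDs
    #   filestore op thread timeout = 180
    #   filestore op thread suicide timeout = 600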



[ceph-users] Help needed ! cluster unstable after upgrade from Hammer to Jewel

2016-11-16 Thread Vincent Godin
Hello,

We now have a full cluster (MONs, OSDs & clients) on Jewel 10.2.2 (initially
Hammer 0.94.5), but we still have some big problems in our production
environment:

   - some Ceph OSD filesystems are not mounted at startup, and we have to mount
   them by hand with "/bin/sh -c 'flock /var/lock/ceph-disk /usr/sbin/ceph-disk
   --verbose --log-stdout trigger --sync /dev/vdX1'" (a sketch of this
   workaround is shown after this list)

   - some OSDs start but hit timeouts as soon as they start, and for a pretty
   long time (more than 5 min):
  - 2016-11-15 01:46:26.625945 7f79db91e800  0 osd.32 191438 done with
  init, starting boot process
  2016-11-15 01:47:28.344996 7f79d61f7700  1 heartbeat_map is_healthy
  'FileStore::op_tp thread 0x7f79c5c91700' had timed out after 60
  2016-11-15 01:47:33.345098 7f79d61f7700  1 heartbeat_map is_healthy
  'FileStore::op_tp thread 0x7f79c5c91700' had timed out after 60
  ...

  - these OSDs take a very long time to stop


   - we just lost one OSD and the cluster is unable to stabilize: some
   OSDs go up and down. The cluster is in ERR state and cannot serve the
   production environment


   - we are on Jewel 10.2.2, CentOS 7.2, kernel 3.10.0-327.36.3.el7.x86_64
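
Expanded, the workaround from the first point looks roughly like this (the
/dev/vd?1 glob is a placeholder for our actual OSD data partitions):

    for part in /dev/vd?1; do
        flock /var/lock/ceph-disk /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync "${part}"
    done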

Help would be appreciated!

Vincent
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com