On Wed, 23 Sep 2015, Handzik, Joe wrote:
> Ok. When configuring with ceph-disk, it does something nifty: it
> actually gives the OSD the uuid of the disk's partition as its fsid. I
> bootstrap off that to get an argument to pass into the function you
> have identified as the bottleneck. I ran it by Sage and we both
> realized there would be cases where it wouldn't work... I'm sure
> neither of us realized the failure would take three minutes, though.
>
> In the short term, it makes sense to create an option to disable or
> short-circuit the blkid code. I would prefer that the default be left
> with the code enabled, but I'm open to default-disabled if others
> think this will be a widespread problem. For now, you could also make
> sure your OSD fsids are set to match your disk partition uuids, if
> that's a faster workaround for you (it'll get rid of the failure).
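For illustration, a minimal sketch of the kind of config-gated
short-circuit being proposed. The option name and wrapper function are
hypothetical, not actual Ceph code; only the libblkid calls are the
real library API:

// Hypothetical sketch: gate the device lookup behind a config option so
// it can be skipped when the probe is slow or unreliable. Illustrative
// only; not the actual Ceph implementation.
#include <blkid/blkid.h>
#include <string>

static bool filestore_blkid_lookup = true;  // hypothetical config option

// Resolve a uuid to a device path; returns "" when disabled or not found.
std::string device_path_for_uuid(const std::string& uuid)
{
  if (!filestore_blkid_lookup)
    return "";  // short-circuit: skip the potentially slow lookup entirely

  blkid_cache cache = nullptr;
  if (blkid_get_cache(&cache, nullptr) < 0)
    return "";

  // "PARTUUID" assumes the fsid was derived from the partition uuid, as
  // ceph-disk does; "UUID" would match a filesystem uuid instead.
  blkid_dev dev = blkid_find_dev_with_tag(cache, "PARTUUID", uuid.c_str());
  std::string path = dev ? blkid_dev_devname(dev) : "";
  blkid_put_cache(cache);
  return path;
}

With the option off, collect_metadata would simply record no device
path, which is consistent with Sam's note below that the value is only
used for metadata.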
I think we should try to figure out where it is hanging. Can you strace
the blkid process to see what it is up to? I opened
http://tracker.ceph.com/issues/13219

I think as long as it behaves reliably with ceph-disk OSDs then we can
have it on by default.

sage

> Joe
>
> > On Sep 23, 2015, at 6:26 PM, Somnath Roy <somnath....@sandisk.com> wrote:
> >
> > <<inline
> >
> > -----Original Message-----
> > From: Handzik, Joe [mailto:joseph.t.hand...@hpe.com]
> > Sent: Wednesday, September 23, 2015 4:20 PM
> > To: Samuel Just
> > Cc: Somnath Roy; Samuel Just (sam.j...@inktank.com); Sage Weil (s...@newdream.net); ceph-devel
> > Subject: Re: Very slow recovery/peering with latest master
> >
> > I added that; there is code up the stack in calamari that consumes
> > the provided path, which is intended to facilitate disk monitoring
> > and management in the future.
> >
> > [Somnath] Ok
> >
> > Somnath, what does your disk configuration look like (filesystem,
> > SSD/HDD, anything else you think could be relevant)? Did you
> > configure your disks with ceph-disk, or by hand? I never saw this
> > while testing my code; has anyone else heard of this behavior on
> > master? The code has been in master for 2-3 months now, I believe.
> >
> > [Somnath] All SSD. I use mkcephfs to create the cluster; I
> > partitioned the disks with fdisk beforehand. I am using XFS. Are you
> > trying with an Ubuntu 3.16.* kernel? It could be Linux
> > distribution/kernel specific.
> >
> > It would be nice to not need to disable this, but if this behavior
> > exists and can't be explained by a misconfiguration or something
> > else, I'll need to figure out a different implementation.
> >
> > Joe
> >
> >> On Sep 23, 2015, at 6:07 PM, Samuel Just <sj...@redhat.com> wrote:
> >>
> >> Wow. Why would that take so long? I think you are correct that it's
> >> only used for metadata; we could just add a config value to disable
> >> it.
> >> -Sam
> >>
> >>> On Wed, Sep 23, 2015 at 3:48 PM, Somnath Roy <somnath....@sandisk.com> wrote:
> >>> Sam/Sage,
> >>> I debugged it down and found that the
> >>> get_device_by_uuid->blkid_find_dev_with_tag() call within
> >>> FileStore::collect_metadata() hangs for ~3 minutes before returning
> >>> EINVAL. I saw this portion was newly added after Hammer.
> >>> Commenting it out resolves the issue. BTW, I saw this value is
> >>> stored as metadata but not used anywhere; am I missing anything?
> >>> Here are my Linux details:
> >>>
> >>> root@emsnode5:~/wip-write-path-optimization/src# uname -a
> >>> Linux emsnode5 3.16.0-38-generic #52~14.04.1-Ubuntu SMP Fri May 8 09:43:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> >>>
> >>> root@emsnode5:~/wip-write-path-optimization/src# lsb_release -a
> >>> No LSB modules are available.
> >>> Distributor ID: Ubuntu
> >>> Description:    Ubuntu 14.04.2 LTS
> >>> Release:        14.04
> >>> Codename:       trusty
> >>>
> >>> Thanks & Regards
> >>> Somnath
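Since the lookup happens in-process via libblkid rather than in a
separate blkid process, the thing to strace is the OSD itself;
strace -f -T -p <osd-pid> will show per-syscall latency during the
stall. A small standalone harness along these lines (illustrative, not
Ceph code) can also time the same libblkid call outside the OSD:

// Illustrative standalone harness, not Ceph code: time a single libblkid
// tag lookup to reproduce the reported ~3 minute stall outside the OSD.
// Build with: g++ -o blkid_probe blkid_probe.cc -lblkid
#include <blkid/blkid.h>
#include <chrono>
#include <cstdio>

int main(int argc, char** argv)
{
  if (argc != 3) {
    std::fprintf(stderr, "usage: %s <TAGNAME> <value>  e.g. %s PARTUUID <uuid>\n",
                 argv[0], argv[0]);
    return 1;
  }

  blkid_cache cache = nullptr;
  if (blkid_get_cache(&cache, nullptr) < 0) {
    std::fprintf(stderr, "blkid_get_cache failed\n");
    return 1;
  }

  auto t0 = std::chrono::steady_clock::now();
  // If the tag is not already in the blkid cache, libblkid falls back
  // to probing the block devices it knows about to refresh the cache.
  blkid_dev dev = blkid_find_dev_with_tag(cache, argv[1], argv[2]);
  auto t1 = std::chrono::steady_clock::now();

  auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0);
  std::printf("lookup took %lld ms, result: %s\n",
              static_cast<long long>(ms.count()),
              dev ? blkid_dev_devname(dev) : "(not found)");
  blkid_put_cache(cache);
  return 0;
}

When the tag is absent from the cache, that fallback probe walks every
block device, which seems the most plausible place for a multi-minute
stall that ends in EINVAL.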
> >>> -----Original Message-----
> >>> From: Somnath Roy
> >>> Sent: Wednesday, September 16, 2015 2:20 PM
> >>> To: 'Gregory Farnum'
> >>> Cc: 'ceph-devel'
> >>> Subject: RE: Very slow recovery/peering with latest master
> >>>
> >>> Sage/Greg,
> >>>
> >>> Yeah, as we expected, it is probably not the recovery settings. I
> >>> reverted them back in my ceph.conf, but I am still seeing this
> >>> problem.
> >>>
> >>> Some observations:
> >>> ----------------------
> >>>
> >>> 1. First of all, I don't think it is something related to my
> >>> environment. I recreated the cluster with Hammer and this problem
> >>> is not there.
> >>>
> >>> 2. I have enabled the messenger/monclient log (couldn't attach
> >>> here) on one of the OSDs and found the monitor is taking a long
> >>> time to detect the up OSDs. If you see the log, I started the OSD
> >>> at 2015-09-16 16:13:07.042463, but there is no communication (only
> >>> getting KEEP_ALIVE) till 2015-09-16 16:16:07.180482, so 3 minutes!
> >>>
> >>> 3. During this period, I saw monclient trying to communicate with
> >>> the monitor but probably not able to. It sends osd_boot only at
> >>> 2015-09-16 16:16:07.180482:
> >>>
> >>> 2015-09-16 16:16:07.180450 7f65377fe700 10 monclient: _send_mon_message to mon.a at 10.60.194.10:6789/0
> >>> 2015-09-16 16:16:07.180482 7f65377fe700 1 -- 10.60.194.10:6820/20102 --> 10.60.194.10:6789/0 -- osd_boot(osd.10 booted 0 features 72057594037927935 v45) v6 -- ?+0 0x7f6523c19100 con 0x7f6542045680
> >>> 2015-09-16 16:16:07.180496 7f65377fe700 20 -- 10.60.194.10:6820/20102 submit_message osd_boot(osd.10 booted 0 features 72057594037927935 v45) v6 remote, 10.60.194.10:6789/0, have pipe.
> >>>
> >>> 4. BTW, the osd-down scenario is detected very quickly (ceph -w
> >>> output); the problem is during coming up, I guess.
> >>>
> >>> So, is something related to mon communication getting slower?
> >>> Let me know if more verbose logging is required and how I should
> >>> share the log.
> >>>
> >>> Thanks & Regards
> >>> Somnath
> >>>
> >>> -----Original Message-----
> >>> From: Gregory Farnum [mailto:gfar...@redhat.com]
> >>> Sent: Wednesday, September 16, 2015 11:35 AM
> >>> To: Somnath Roy
> >>> Cc: ceph-devel
> >>> Subject: Re: Very slow recovery/peering with latest master
> >>>
> >>>> On Tue, Sep 15, 2015 at 8:04 PM, Somnath Roy <somnath....@sandisk.com> wrote:
> >>>> Hi,
> >>>> I am seeing very slow recovery when I am adding OSDs with the
> >>>> latest master.
> >>>> Also, if I just restart all the OSDs (no IO going on in the
> >>>> cluster), the cluster takes a significant amount of time to reach
> >>>> the active+clean state (and even to detect all the up OSDs).
> >>>>
> >>>> I saw the recovery/backfill default parameters have been changed
> >>>> (to lower values); this probably explains the recovery scenario,
> >>>> but will it affect peering time during OSD startup as well?
> >>>
> >>> I don't think these values should impact peering time, but you
> >>> could configure them back to the old defaults and see if it
> >>> changes.
> >>> -Greg
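If disabling the lookup outright proves too coarse, another option is
to bound it with a timeout. A minimal sketch, assuming it is acceptable
to abandon the lookup and leave the device-path metadata blank
(hypothetical, not what Ceph does; libblkid itself exposes no timeout
or cancellation):

// Hypothetical mitigation sketch, not actual Ceph code: bound the blkid
// lookup with a timeout so a slow device probe cannot stall OSD startup
// for minutes. The lookup runs on a detached thread; on timeout the
// caller returns immediately, while the probe may still finish in the
// background (libblkid offers no way to cancel it).
#include <blkid/blkid.h>
#include <chrono>
#include <future>
#include <memory>
#include <string>
#include <thread>

std::string find_dev_by_uuid_bounded(const std::string& uuid,
                                     std::chrono::seconds timeout)
{
  auto prom = std::make_shared<std::promise<std::string>>();
  std::future<std::string> fut = prom->get_future();

  std::thread([prom, uuid] {
    blkid_cache cache = nullptr;
    std::string path;
    if (blkid_get_cache(&cache, nullptr) == 0) {
      blkid_dev dev = blkid_find_dev_with_tag(cache, "PARTUUID", uuid.c_str());
      if (dev)
        path = blkid_dev_devname(dev);
      blkid_put_cache(cache);
    }
    prom->set_value(path);
  }).detach();

  if (fut.wait_for(timeout) != std::future_status::ready)
    return "";  // timed out: leave the device-path metadata empty
  return fut.get();
}

std::async is deliberately avoided here because its future blocks in
the destructor, which would reintroduce the very stall being avoided.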