Thanks Sam, I will go ahead and open a tracker for this.

Thanks,
Guang


----------------------------------------
> Date: Tue, 18 Aug 2015 08:42:04 -0700
> Subject: Re: OSD::do_mon_report - do we need holding osd_lock
> From: sj...@redhat.com
> To: yguan...@outlook.com
> CC: ceph-devel@vger.kernel.org
>
> Probably! A quick glance at do_mon_report doesn't seem to turn up
> anything I'd expect to be really hard to refactor. You do need to
> break out the required data (into OSDService, I'd think) so that the
> lock is not necessary.
> -Sam
>
> On Mon, Aug 17, 2015 at 6:10 PM, GuangYang <yguan...@outlook.com> wrote:
>> Hi Sam,
>> Today I noticed a scenario in which the monitor marked an OSD down because 
>> it did not receive PG stats from that OSD. Further investigation showed 
>> that the OSD did not report its stats because it failed to acquire the 
>> osd_lock. What happened was:
>> 1. One PG was undergoing long-running peering (searching for missing 
>> objects).
>> 2. An op held the osd_lock and tried to acquire the PG lock, which was 
>> held by (1).
>> 3. The OSD tick thread could not acquire the osd_lock and was stuck for 10 
>> minutes, so it failed to report its stats to the monitor.
>> 4. The monitor marked the OSD down.
>>
>> After looking at the code, we found several assertions around 
>> OSD::do_mon_report that the osd_lock is held. Is holding it actually 
>> required? Is there any chance to overcome the problem described above by 
>> refactoring the locking there?
>>
>> Thanks,
>> Guang