Re: [ceph-users] size of inc_osdmap vs osdmap

Gregory Farnum Wed, 02 Jan 2019 15:14:24 -0800

On Thu, Dec 27, 2018 at 1:20 PM Sergey Dolgov <palz...@gmail.com> wrote:


> We investigated the issue and set debug_mon to 20. During a small osdmap
> change we get many messages like these for every PG of every pool (across
> the whole cluster):
>
>> 2018-12-25 19:28:42.426776 7f075af7d700 20 mon.1@0(leader).osd e1373789
>> prime_pg_tempnext_up === next_acting now, clear pg_temp
>> 2018-12-25 19:28:42.426776 7f075a77c700 20 mon.1@0(leader).osd e1373789
>> prime_pg_tempnext_up === next_acting now, clear pg_temp
>> 2018-12-25 19:28:42.426777 7f075977a700 20 mon.1@0(leader).osd e1373789
>> prime_pg_tempnext_up === next_acting now, clear pg_temp
>> 2018-12-25 19:28:42.426779 7f075af7d700 20 mon.1@0(leader).osd e1373789
>> prime_pg_temp 3.1000 [97,812,841]/[] -> [97,812,841]/[97,812,841], priming
>> []
>> 2018-12-25 19:28:42.426780 7f075a77c700 20 mon.1@0(leader).osd e1373789
>> prime_pg_temp 3.0 [84,370,847]/[] -> [84,370,847]/[84,370,847], priming []
>> 2018-12-25 19:28:42.426781 7f075977a700 20 mon.1@0(leader).osd e1373789
>> prime_pg_temp 4.0 [404,857,11]/[] -> [404,857,11]/[404,857,11], priming []
>
> although no pg_temps are created as a result (not a single backfill).
>
> We suspect this behavior changed in commit
> https://github.com/ceph/ceph/pull/16530/commits/ea723fbb88c69bd00fefd32a3ee94bf5ce53569c
> because before it the function *OSDMonitor::prime_pg_temp* would have
> returned early at
> https://github.com/ceph/ceph/blob/luminous/src/mon/OSDMonitor.cc#L1009
> as it does in jewel:
> https://github.com/ceph/ceph/blob/jewel/src/mon/OSDMonitor.cc#L1214
>
> I accept that we may be mistaken.
>

Well, those commits made some changes, but I'm not sure which part of them
you're saying is wrong?

What would probably be most helpful is if you can dump out one of those
over-large incremental osdmaps and see what's using up all the space. (You
may be able to do it through the normal Ceph CLI by querying the monitor?
Otherwise if it's something very weird you may need to get the
ceph-dencoder tool and look at it with that.)
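
If you do end up with a JSON dump of the incremental, something like the
following might help find the heavy part. This is only a minimal sketch: it
assumes you already produced the JSON (I believe an invocation along the
lines of `ceph-dencoder type OSDMap::Incremental import <file> decode
dump_json` does that, but please check it against your version), the names
rank_sections and print_report are made up for illustration, and the JSON
length of a field is only a rough proxy for its encoded size.

```python
import json


def rank_sections(dump):
    """Return (field, json_bytes) pairs for each top-level field, largest first.

    `dump` is the parsed JSON object of a decoded map. The byte counts are
    the compact JSON length of each field, which only approximates how much
    space the field takes in the binary encoding.
    """
    sizes = {
        key: len(json.dumps(value, separators=(",", ":")).encode("utf-8"))
        for key, value in dump.items()
    }
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)


def print_report(path, top=10):
    """Load a JSON dump from `path` (hypothetical file) and print the biggest fields."""
    with open(path) as f:
        dump = json.load(f)
    for key, size in rank_sections(dump)[:top]:
        print(f"{size:>10}  {key}")
```

Calling print_report() on the dump of the 1.2MB incremental and on the dump
of a normal-sized one should make it fairly obvious which field differs.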
-Greg


>
>
> On Wed, Dec 12, 2018 at 10:53 PM Gregory Farnum <gfar...@redhat.com>
> wrote:
>
>> Hmm that does seem odd. How are you looking at those sizes?
>>
>> On Wed, Dec 12, 2018 at 4:38 AM Sergey Dolgov <palz...@gmail.com> wrote:
>>
>>> Greg, for example, for our cluster of ~1000 OSDs:
>>>
>>> size osdmap.1357881__0_F7FE779D__none = 363KB (crush_version 9860,
>>> modified 2018-12-12 04:00:17.661731)
>>> size osdmap.1357882__0_F7FE772D__none = 363KB
>>> size osdmap.1357883__0_F7FE74FD__none = 363KB (crush_version 9861,
>>> modified 2018-12-12 04:00:27.385702)
>>> size inc_osdmap.1357882__0_B783A4EA__none = 1.2MB
>>>
>>> The difference between epochs 1357881 and 1357883: the crush weight of
>>> one OSD was increased by 0.01, so we get 5 new pg_temp entries in
>>> osdmap.1357883, yet the inc_osdmap is this huge.
>>>
>>> On Thu, Dec 6, 2018 at 06:20, Gregory Farnum <gfar...@redhat.com> wrote:
>>> >
>>> > On Wed, Dec 5, 2018 at 3:32 PM Sergey Dolgov <palz...@gmail.com> wrote:
>>> >>
>>> >> Hi guys
>>> >>
>>> >> I ran into strange behavior when changing the crushmap. When I change
>>> >> the crush weight of an OSD, I sometimes get an incremental osdmap
>>> >> (1.2MB) that is significantly bigger than the full osdmap (0.4MB).
>>> >
>>> >
>>> > This is probably because when CRUSH changes, the new primary OSDs for
>>> > a PG will tend to set a "pg temp" value (in the OSDMap) that temporarily
>>> > reassigns it to the old acting set, so the data can be accessed while the
>>> > new OSDs get backfilled. Depending on the size of your cluster, the number
>>> > of PGs on it, and the size of the CRUSH change, this can easily be larger
>>> > than the rest of the map because it is data with size linear in the number
>>> > of PGs affected, instead of being more normally proportional to the number
>>> > of OSDs.
>>> > -Greg
>>> >
>>> >>
>>> >> I use luminous 12.2.8. The cluster was installed a long time ago; I
>>> >> believe it was initially firefly.
>>> >> How can I view the content of an incremental osdmap, or can you give
>>> >> me your opinion on this problem? I think the traffic spikes right
>>> >> after a crushmap change are related to this behavior.
>>> >> _______________________________________________
>>> >> ceph-users mailing list
>>> >> ceph-users@lists.ceph.com
>>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>>
>>> --
>>> Best regards, Sergey Dolgov
>>>
>>
>
> --
> Best regards, Sergey Dolgov
>

