Re: [ceph-users] Jewel to Kraken OSD upgrade issues

2017-02-23 Thread Gregory Farnum
On Thu, Feb 23, 2017 at 2:34 PM, Benjeman Meekhof  wrote:
> Hi Greg,
>
> Appreciate you looking into it.  I'm concerned about CPU power per
> daemon as well...though we never had this issue when restarting our
> dense nodes under Jewel.  Is the rapid rate of OSDmap generation a
> one-time condition particular to post-update processing or to Kraken
> in general?

I'm not aware of anything that would have made this change in Kraken,
but it's possible. Sorry I don't have more detail on this.
-Greg

>
> We did eventually get all the OSD back up either by doing so in small
> batches or setting nodown and waiting for the host to churn
> through...a day or so later all the OSD pop up.  Now that we're in a
> stable non-degraded state I have to do more tests to see what happens
> under Kraken when we kill a node or several nodes.
>
> I have to give ceph a lot of credit here.  Following my email on the 16th,
> while we were in a marginal state with kraken OSD churning to come up,
> we lost a data center for a minute.  Subsequently we had our remaining
> 2 mons refuse to stay in quorum long enough to serve cluster sessions
> (constant back and forth elections).  I believe the issue was timeouts
> caused by explosive leveldb growth in combination with other activity
> but eventually we got them to come back by increasing db lease time in
> ceph settings.  We had some unfound objects at this point but after
> waiting out all the OSD coming online with nodown/noout set everything
> was fine.  I should have been more careful in applying the update but
> as one of our team put it we definitely found out that Ceph is
> resilient to admins as well as other disasters.
>
> thanks,
> Ben
>
> On Thu, Feb 23, 2017 at 5:10 PM, Gregory Farnum  wrote:
>> On Thu, Feb 16, 2017 at 9:19 AM, Benjeman Meekhof  wrote:
>>> I tried starting up just a couple OSD with debug_osd = 20 and
>>> debug_filestore = 20.
>>>
>>> I pasted a sample of the ongoing log here.  To my eyes it doesn't look
>>> unusual but maybe someone else sees something in here that is a
>>> problem:  http://pastebin.com/uy8S7hps
>>>
>>> As this log is rolling on, our OSD has still not been marked up and is
>>> occupying 100% of a CPU core.  I've done this a couple times and in a
>>> matter of some hours it will be marked up and CPU will drop.  If more
>>> kraken OSD on another host are brought up the existing kraken OSD go
>>> back into max CPU usage again while pg recover.  The trend scales
>>> upward as OSD are started until the system is completely saturated.
>>>
>>> I was reading the docs on async messenger settings at
>>> http://docs.ceph.com/docs/master/rados/configuration/ms-ref/ and saw
>>> that under 'ms async max op threads' there is a note about one or more
>>> CPUs constantly on 100% load.  As an experiment I set max op threads
>>> to 20 and that is the setting during the period of the pasted log.  It
>>> seems to make no difference.
>>>
>>> Appreciate any thoughts on troubleshooting this.  For the time being
>>> I've aborted our kraken update and will probably re-initialize any
>>> already updated OSD to revert to Jewel except perhaps one host to
>>> continue testing.
>>
>> Ah, that log looks like you're just generating OSDMaps so quickly that
>> rebooting 60 at a time leaves you with a ludicrous number to churn
>> through, and that takes a while. It would have been exacerbated by
>> having 60 daemons fight for the CPU to process them, leading to
>> flapping.
>>
>> You might try restarting daemons sequentially on the node instead of
>> all at once. Depending on your needs it would be even cheaper if you
>> set the nodown flag, though obviously that will impede IO while it
>> happens.
>>
>> I'd be concerned that this demonstrates you don't have enough CPU
>> power per daemon, though.
>> -Greg
>>
>>>
>>> thanks,
>>> Ben
>>>
>>> On Tue, Feb 14, 2017 at 3:55 PM, Gregory Farnum  wrote:
 On Tue, Feb 14, 2017 at 11:38 AM, Benjeman Meekhof  
 wrote:
> Hi all,
>
> We encountered an issue updating our OSD from Jewel (10.2.5) to Kraken
> (11.2.0).  OS was RHEL derivative.  Prior to this we updated all the
> mons to Kraken.
>
> After updating ceph packages I restarted the 60 OSD on the box with
> 'systemctl restart ceph-osd.target'.  Very soon after the system cpu
> load flat-lines at 100% with top showing all of that being system load
> from ceph-osd processes.  Not long after we get OSD flapping due to
> the load on the system (noout was set to start this, but perhaps
> too-quickly unset post restart).
>
> This is causing problems in the cluster, and we reboot the box.  The
> OSD don't start up/mount automatically - not a new problem on this
> setup.  We run 'ceph-disk activate $disk' on a list of all the
> /dev/dm-X devices as output by ceph-disk list.  Everything activates
> and the CPU gradually climbs to once again be a solid 100%.  No OSD
> have joined cluster so it isn't causing issues.

Re: [ceph-users] Jewel to Kraken OSD upgrade issues

2017-02-23 Thread Benjeman Meekhof
Hi Greg,

Appreciate you looking into it.  I'm concerned about CPU power per
daemon as well...though we never had this issue when restarting our
dense nodes under Jewel.  Is the rapid rate of OSDmap generation a
one-time condition particular to post-update processing or to Kraken
in general?

We did eventually get all the OSD back up either by doing so in small
batches or setting nodown and waiting for the host to churn
through...a day or so later all the OSD pop up.  Now that we're in a
stable non-degraded state I have to do more tests to see what happens
under Kraken when we kill a node or several nodes.

I have to give ceph a lot of credit here.  Following my email on the 16th,
while we were in a marginal state with kraken OSD churning to come up,
we lost a data center for a minute.  Subsequently we had our remaining
2 mons refuse to stay in quorum long enough to serve cluster sessions
(constant back and forth elections).  I believe the issue was timeouts
caused by explosive leveldb growth in combination with other activity
but eventually we got them to come back by increasing db lease time in
ceph settings.  We had some unfound objects at this point but after
waiting out all the OSD coming online with nodown/noout set everything
was fine.  I should have been more careful in applying the update but
as one of our team put it we definitely found out that Ceph is
resilient to admins as well as other disasters.
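
For the record, the lease change was something along these lines in
ceph.conf on the mons plus a rolling mon restart -- going from memory,
the exact option names and values we used may have differed:

  [mon]
  # give paxos more slack so slow leveldb commits don't trigger re-elections
  mon lease = 15
  mon election timeout = 10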

thanks,
Ben

On Thu, Feb 23, 2017 at 5:10 PM, Gregory Farnum  wrote:
> On Thu, Feb 16, 2017 at 9:19 AM, Benjeman Meekhof  wrote:
>> I tried starting up just a couple OSD with debug_osd = 20 and
>> debug_filestore = 20.
>>
>> I pasted a sample of the ongoing log here.  To my eyes it doesn't look
>> unusual but maybe someone else sees something in here that is a
>> problem:  http://pastebin.com/uy8S7hps
>>
>> As this log is rolling on, our OSD has still not been marked up and is
>> occupying 100% of a CPU core.  I've done this a couple times and in a
>> matter of some hours it will be marked up and CPU will drop.  If more
>> kraken OSD on another host are brought up the existing kraken OSD go
>> back into max CPU usage again while pg recover.  The trend scales
>> upward as OSD are started until the system is completely saturated.
>>
>> I was reading the docs on async messenger settings at
>> http://docs.ceph.com/docs/master/rados/configuration/ms-ref/ and saw
>> that under 'ms async max op threads' there is a note about one or more
>> CPUs constantly on 100% load.  As an experiment I set max op threads
>> to 20 and that is the setting during the period of the pasted log.  It
>> seems to make no difference.
>>
>> Appreciate any thoughts on troubleshooting this.  For the time being
>> I've aborted our kraken update and will probably re-initialize any
>> already updated OSD to revert to Jewel except perhaps one host to
>> continue testing.
>
> Ah, that log looks like you're just generating OSDMaps so quickly that
> rebooting 60 at a time leaves you with a ludicrous number to churn
> through, and that takes a while. It would have been exacerbated by
> having 60 daemons fight for the CPU to process them, leading to
> flapping.
>
> You might try restarting daemons sequentially on the node instead of
> all at once. Depending on your needs it would be even cheaper if you
> set the nodown flag, though obviously that will impede IO while it
> happens.
>
> I'd be concerned that this demonstrates you don't have enough CPU
> power per daemon, though.
> -Greg
>
>>
>> thanks,
>> Ben
>>
>> On Tue, Feb 14, 2017 at 3:55 PM, Gregory Farnum  wrote:
>>> On Tue, Feb 14, 2017 at 11:38 AM, Benjeman Meekhof  
>>> wrote:
 Hi all,

 We encountered an issue updating our OSD from Jewel (10.2.5) to Kraken
 (11.2.0).  OS was RHEL derivative.  Prior to this we updated all the
 mons to Kraken.

 After updating ceph packages I restarted the 60 OSD on the box with
 'systemctl restart ceph-osd.target'.  Very soon after the system cpu
 load flat-lines at 100% with top showing all of that being system load
 from ceph-osd processes.  Not long after we get OSD flapping due to
 the load on the system (noout was set to start this, but perhaps
 too-quickly unset post restart).

 This is causing problems in the cluster, and we reboot the box.  The
 OSD don't start up/mount automatically - not a new problem on this
 setup.  We run 'ceph-disk activate $disk' on a list of all the
 /dev/dm-X devices as output by ceph-disk list.  Everything activates
 and the CPU gradually climbs to once again be a solid 100%.  No OSD
 have joined cluster so it isn't causing issues.

 I leave the box overnight...by the time I leave I see that 1-2 OSD on
 this box are marked up/in.   By morning all are in, CPU is fine,
 cluster is still fine.

 This is not a show-stopping issue now that I know what happens though
 it means upgrades are a several hour or overnight affair

Re: [ceph-users] Jewel to Kraken OSD upgrade issues

2017-02-23 Thread Gregory Farnum
On Thu, Feb 16, 2017 at 9:19 AM, Benjeman Meekhof  wrote:
> I tried starting up just a couple OSD with debug_osd = 20 and
> debug_filestore = 20.
>
> I pasted a sample of the ongoing log here.  To my eyes it doesn't look
> unusual but maybe someone else sees something in here that is a
> problem:  http://pastebin.com/uy8S7hps
>
> As this log is rolling on, our OSD has still not been marked up and is
> occupying 100% of a CPU core.  I've done this a couple times and in a
> matter of some hours it will be marked up and CPU will drop.  If more
> kraken OSD on another host are brought up the existing kraken OSD go
> back into max CPU usage again while pg recover.  The trend scales
> upward as OSD are started until the system is completely saturated.
>
> I was reading the docs on async messenger settings at
> http://docs.ceph.com/docs/master/rados/configuration/ms-ref/ and saw
> that under 'ms async max op threads' there is a note about one or more
> CPUs constantly on 100% load.  As an experiment I set max op threads
> to 20 and that is the setting during the period of the pasted log.  It
> seems to make no difference.
>
> Appreciate any thoughts on troubleshooting this.  For the time being
> I've aborted our kraken update and will probably re-initialize any
> already updated OSD to revert to Jewel except perhaps one host to
> continue testing.

Ah, that log looks like you're just generating OSDMaps so quickly that
rebooting 60 at a time leaves you with a ludicrous number to churn
through, and that takes a while. It would have been exacerbated by
having 60 daemons fight for the CPU to process them, leading to
flapping.

You might try restarting daemons sequentially on the node instead of
all at once. Depending on your needs it would be even cheaper if you
set the nodown flag, though obviously that will impede IO while it
happens.
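
Roughly, something like this on the OSD host -- just a sketch, assuming the
stock systemd ceph-osd@<id> instances and that the flags are acceptable for
your workload:

  # keep OSDs from being marked down/out while they churn through maps
  ceph osd set nodown
  ceph osd set noout

  # restart one OSD at a time instead of the whole ceph-osd.target
  for id in $(ls /var/lib/ceph/osd | sed 's/^ceph-//'); do
      systemctl restart ceph-osd@"$id"
      sleep 60   # crude pause; better would be waiting for the OSD to report up
  done

  ceph osd unset nodown
  ceph osd unset noout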

I'd be concerned that this demonstrates you don't have enough CPU
power per daemon, though.
-Greg

>
> thanks,
> Ben
>
> On Tue, Feb 14, 2017 at 3:55 PM, Gregory Farnum  wrote:
>> On Tue, Feb 14, 2017 at 11:38 AM, Benjeman Meekhof  
>> wrote:
>>> Hi all,
>>>
>>> We encountered an issue updating our OSD from Jewel (10.2.5) to Kraken
>>> (11.2.0).  OS was RHEL derivative.  Prior to this we updated all the
>>> mons to Kraken.
>>>
>>> After updating ceph packages I restarted the 60 OSD on the box with
>>> 'systemctl restart ceph-osd.target'.  Very soon after the system cpu
>>> load flat-lines at 100% with top showing all of that being system load
>>> from ceph-osd processes.  Not long after we get OSD flapping due to
>>> the load on the system (noout was set to start this, but perhaps
>>> too-quickly unset post restart).
>>>
>>> This is causing problems in the cluster, and we reboot the box.  The
>>> OSD don't start up/mount automatically - not a new problem on this
>>> setup.  We run 'ceph-disk activate $disk' on a list of all the
>>> /dev/dm-X devices as output by ceph-disk list.  Everything activates
>>> and the CPU gradually climbs to once again be a solid 100%.  No OSD
>>> have joined cluster so it isn't causing issues.
>>>
>>> I leave the box overnight...by the time I leave I see that 1-2 OSD on
>>> this box are marked up/in.   By morning all are in, CPU is fine,
>>> cluster is still fine.
>>>
>>> This is not a show-stopping issue now that I know what happens though
>>> it means upgrades are a several hour or overnight affair.  Next box I
>>> will just mark all the OSD out before updating and restarting them or
>>> try leaving them up but being sure to set noout to avoid flapping
>>> while they churn.
>>>
>>> Here's a log snippet from one currently spinning in the startup
>>> process since 11am.  This is the second box we did, the first
>>> experience being as detailed above.  Could this have anything to do
>>> with the 'PGs are upgrading' message?
>>
>> It doesn't seem likely — there's a fixed per-PG overhead that doesn't
>> scale with the object count. I could be missing something but I don't
>> see anything in the upgrade notes that should be doing this either.
>> Try running an upgrade with "debug osd = 20" and "debug filestore =
>> 20" set and see what the log spits out.
>> -Greg
>>
>>>
>>> 2017-02-14 11:04:07.028311 7fd7a0372940  0 _get_class not permitted to load 
>>> lua
>>> 2017-02-14 11:04:07.077304 7fd7a0372940  0 osd.585 135493 crush map
>>> has features 288514119978713088, adjusting msgr requires for clients
>>> 2017-02-14 11:04:07.077318 7fd7a0372940  0 osd.585 135493 crush map
>>> has features 288514394856620032 was 8705, adjusting msgr requires for
>>> mons
>>> 2017-02-14 11:04:07.077324 7fd7a0372940  0 osd.585 135493 crush map
>>> has features 288514394856620032, adjusting msgr requires for osds
>>> 2017-02-14 11:04:09.446832 7fd7a0372940  0 osd.585 135493 load_pgs
>>> 2017-02-14 11:04:09.522249 7fd7a0372940 -1 osd.585 135493 PGs are upgrading
>>> 2017-02-14 11:04:10.246166 7fd7a0372940  0 osd.585 135493 load_pgs
>>> opened 148 pgs
>>> 2017-02-14 11:04:10.246249 7f

Re: [ceph-users] Jewel to Kraken OSD upgrade issues

2017-02-16 Thread Benjeman Meekhof
Sure, looks as follows:

ceph -s
cluster 24b3bf92-299d-426c-ae56-48a995014f04
 health HEALTH_ERR
775 pgs are stuck inactive for more than 300 seconds
1954 pgs backfill_wait
7764 pgs degraded
517 pgs down
3 pgs inconsistent
504 pgs peering
31 pgs recovering
5661 pgs recovery_wait
6976 pgs stuck degraded
775 pgs stuck inactive
8362 pgs stuck unclean
1851 pgs stuck undersized
1880 pgs undersized
110 requests are blocked > 32 sec
recovery 2788277/17810399 objects degraded (15.655%)
recovery 1846569/17810399 objects misplaced (10.368%)
recovery 11442/5635366 unfound (0.203%)
76 scrub errors
 monmap e4: 3 mons at
{msu-mon01=207.73.217.13:6789/0,um-mon01=141.211.169.13:6789/0,wsu-mon01=204.39.195.13:6789/0}
election epoch 23402, quorum 0,1,2 um-mon01,wsu-mon01,msu-mon01
  fsmap e1074: 1/1/1 up {0=wsu-mds01=up:active}, 1
up:standby-replay, 1 up:standby
mgr active: um-mon01
 osdmap e152705: 627 osds: 475 up, 475 in; 2434 remapped pgs
flags sortbitwise,require_jewel_osds
  pgmap v13524534: 20864 pgs, 25 pools, 21246 GB data, 5503 kobjects
59604 GB used, 3397 TB / 3455 TB avail
2788277/17810399 objects degraded (15.655%)
1846569/17810399 objects misplaced (10.368%)
11442/5635366 unfound (0.203%)
12501 active+clean
5638 active+recovery_wait+degraded
1617 active+undersized+degraded+remapped+backfill_wait
 436 down+remapped+peering
 258 undersized+degraded+remapped+backfill_wait+peered
 189 active+degraded
  79 active+remapped+backfill_wait
  68 down+peering
  31 active+recovering+degraded
  20 active+recovery_wait+degraded+remapped
  10 down
   4 active+degraded+remapped
   3 down+remapped
   3 active+undersized+degraded+remapped
   2 active+recovery_wait+undersized+degraded+remapped
   2 active+remapped
   1 active+clean+inconsistent
   1 active+recovery_wait+degraded+inconsistent
   1 active+degraded+inconsistent

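For the unfound objects, the usual starting points would be something like
this (the pg id is just a placeholder):

  ceph health detail | grep unfound
  ceph pg 2.4 list_missing
  ceph pg 2.4 query
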
On Thu, Feb 16, 2017 at 5:08 PM, Shinobu Kinjo  wrote:
> Would you simply do?
>
>  * ceph -s
>
> On Fri, Feb 17, 2017 at 6:26 AM, Benjeman Meekhof  wrote:
>> As I'm looking at logs on the OSD mentioned in previous email at this
>> point, I mostly see this message repeating...is this normal or
>> indicating a problem?  This osd is marked up in the cluster.
>>
>> 2017-02-16 16:23:35.550102 7fc66fce3700 20 osd.564 152609
>> share_map_peer 0x7fc6887a3000 already has epoch 152609
>> 2017-02-16 16:23:35.556208 7fc66f4e2700 20 osd.564 152609
>> share_map_peer 0x7fc689e35000 already has epoch 152609
>> 2017-02-16 16:23:35.556233 7fc66f4e2700 20 osd.564 152609
>> share_map_peer 0x7fc689e35000 already has epoch 152609
>> 2017-02-16 16:23:35.577324 7fc66fce3700 20 osd.564 152609
>> share_map_peer 0x7fc68f4c1000 already has epoch 152609
>> 2017-02-16 16:23:35.577356 7fc6704e4700 20 osd.564 152609
>> share_map_peer 0x7fc68f4c1000 already has epoch 152609
>>
>> thanks,
>> Ben
>>
>> On Thu, Feb 16, 2017 at 12:19 PM, Benjeman Meekhof  
>> wrote:
>>> I tried starting up just a couple OSD with debug_osd = 20 and
>>> debug_filestore = 20.
>>>
>>> I pasted a sample of the ongoing log here.  To my eyes it doesn't look
>>> unusual but maybe someone else sees something in here that is a
>>> problem:  http://pastebin.com/uy8S7hps
>>>
>>> As this log is rolling on, our OSD has still not been marked up and is
>>> occupying 100% of a CPU core.  I've done this a couple times and in a
>>> matter of some hours it will be marked up and CPU will drop.  If more
>>> kraken OSD on another host are brought up the existing kraken OSD go
>>> back into max CPU usage again while pg recover.  The trend scales
>>> upward as OSD are started until the system is completely saturated.
>>>
>>> I was reading the docs on async messenger settings at
>>> http://docs.ceph.com/docs/master/rados/configuration/ms-ref/ and saw
>>> that under 'ms async max op threads' there is a note about one or more
>>> CPUs constantly on 100% load.  As an experiment I set max op threads
>>> to 20 and that is the setting during the period of the pasted log.  It
>>> seems to make no difference.
>>>
>>> Appreciate any thoughts on troubleshooting this.  For the time being
>>> I've aborted our kraken update and will probably re-initialize any
>>> already updated OSD to revert to Jewel except perhaps one host to
>>> continue testing.
>>>
>>> thanks,
>>> Ben
>>>
>>> On Tue, Feb 14, 2017 at 3:55 PM, Gregory Farnum  wrote:
 On Tue, Feb 14, 2017 at 11:38 AM, Benjeman Meek

Re: [ceph-users] Jewel to Kraken OSD upgrade issues

2017-02-16 Thread Shinobu Kinjo
Would you simply do?

 * ceph -s

On Fri, Feb 17, 2017 at 6:26 AM, Benjeman Meekhof  wrote:
> As I'm looking at logs on the OSD mentioned in previous email at this
> point, I mostly see this message repeating...is this normal or
> indicating a problem?  This osd is marked up in the cluster.
>
> 2017-02-16 16:23:35.550102 7fc66fce3700 20 osd.564 152609
> share_map_peer 0x7fc6887a3000 already has epoch 152609
> 2017-02-16 16:23:35.556208 7fc66f4e2700 20 osd.564 152609
> share_map_peer 0x7fc689e35000 already has epoch 152609
> 2017-02-16 16:23:35.556233 7fc66f4e2700 20 osd.564 152609
> share_map_peer 0x7fc689e35000 already has epoch 152609
> 2017-02-16 16:23:35.577324 7fc66fce3700 20 osd.564 152609
> share_map_peer 0x7fc68f4c1000 already has epoch 152609
> 2017-02-16 16:23:35.577356 7fc6704e4700 20 osd.564 152609
> share_map_peer 0x7fc68f4c1000 already has epoch 152609
>
> thanks,
> Ben
>
> On Thu, Feb 16, 2017 at 12:19 PM, Benjeman Meekhof  wrote:
>> I tried starting up just a couple OSD with debug_osd = 20 and
>> debug_filestore = 20.
>>
>> I pasted a sample of the ongoing log here.  To my eyes it doesn't look
>> unusual but maybe someone else sees something in here that is a
>> problem:  http://pastebin.com/uy8S7hps
>>
>> As this log is rolling on, our OSD has still not been marked up and is
>> occupying 100% of a CPU core.  I've done this a couple times and in a
>> matter of some hours it will be marked up and CPU will drop.  If more
>> kraken OSD on another host are brought up the existing kraken OSD go
>> back into max CPU usage again while pg recover.  The trend scales
>> upward as OSD are started until the system is completely saturated.
>>
>> I was reading the docs on async messenger settings at
>> http://docs.ceph.com/docs/master/rados/configuration/ms-ref/ and saw
>> that under 'ms async max op threads' there is a note about one or more
>> CPUs constantly on 100% load.  As an experiment I set max op threads
>> to 20 and that is the setting during the period of the pasted log.  It
>> seems to make no difference.
>>
>> Appreciate any thoughts on troubleshooting this.  For the time being
>> I've aborted our kraken update and will probably re-initialize any
>> already updated OSD to revert to Jewel except perhaps one host to
>> continue testing.
>>
>> thanks,
>> Ben
>>
>> On Tue, Feb 14, 2017 at 3:55 PM, Gregory Farnum  wrote:
>>> On Tue, Feb 14, 2017 at 11:38 AM, Benjeman Meekhof  
>>> wrote:
 Hi all,

 We encountered an issue updating our OSD from Jewel (10.2.5) to Kraken
 (11.2.0).  OS was RHEL derivative.  Prior to this we updated all the
 mons to Kraken.

 After updating ceph packages I restarted the 60 OSD on the box with
 'systemctl restart ceph-osd.target'.  Very soon after the system cpu
 load flat-lines at 100% with top showing all of that being system load
 from ceph-osd processes.  Not long after we get OSD flapping due to
 the load on the system (noout was set to start this, but perhaps
 too-quickly unset post restart).

 This is causing problems in the cluster, and we reboot the box.  The
 OSD don't start up/mount automatically - not a new problem on this
 setup.  We run 'ceph-disk activate $disk' on a list of all the
 /dev/dm-X devices as output by ceph-disk list.  Everything activates
 and the CPU gradually climbs to once again be a solid 100%.  No OSD
 have joined cluster so it isn't causing issues.

 I leave the box overnight...by the time I leave I see that 1-2 OSD on
 this box are marked up/in.   By morning all are in, CPU is fine,
 cluster is still fine.

 This is not a show-stopping issue now that I know what happens though
 it means upgrades are a several hour or overnight affair.  Next box I
 will just mark all the OSD out before updating and restarting them or
 try leaving them up but being sure to set noout to avoid flapping
 while they churn.

 Here's a log snippet from one currently spinning in the startup
 process since 11am.  This is the second box we did, the first
 experience being as detailed above.  Could this have anything to do
 with the 'PGs are upgrading' message?
>>>
>>> It doesn't seem likely — there's a fixed per-PG overhead that doesn't
>>> scale with the object count. I could be missing something but I don't
>>> see anything in the upgrade notes that should be doing this either.
>>> Try running an upgrade with "debug osd = 20" and "debug filestore =
>>> 20" set and see what the log spits out.
>>> -Greg
>>>

 2017-02-14 11:04:07.028311 7fd7a0372940  0 _get_class not permitted to 
 load lua
 2017-02-14 11:04:07.077304 7fd7a0372940  0 osd.585 135493 crush map
 has features 288514119978713088, adjusting msgr requires for clients
 2017-02-14 11:04:07.077318 7fd7a0372940  0 osd.585 135493 crush map
 has features 288514394856620032 was 8705, adjusting msgr requires for
 mons
 2017-02-14 11:04:07.07

Re: [ceph-users] Jewel to Kraken OSD upgrade issues

2017-02-16 Thread Benjeman Meekhof
As I'm looking at logs on the OSD mentioned in previous email at this
point, I mostly see this message repeating...is this normal or
indicating a problem?  This osd is marked up in the cluster.

2017-02-16 16:23:35.550102 7fc66fce3700 20 osd.564 152609
share_map_peer 0x7fc6887a3000 already has epoch 152609
2017-02-16 16:23:35.556208 7fc66f4e2700 20 osd.564 152609
share_map_peer 0x7fc689e35000 already has epoch 152609
2017-02-16 16:23:35.556233 7fc66f4e2700 20 osd.564 152609
share_map_peer 0x7fc689e35000 already has epoch 152609
2017-02-16 16:23:35.577324 7fc66fce3700 20 osd.564 152609
share_map_peer 0x7fc68f4c1000 already has epoch 152609
2017-02-16 16:23:35.577356 7fc6704e4700 20 osd.564 152609
share_map_peer 0x7fc68f4c1000 already has epoch 152609
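
If it helps, one way to check whether the OSD is still chewing through maps
is to compare epochs, e.g.:

  ceph daemon osd.564 status   # on the OSD host: shows oldest_map / newest_map
  ceph osd stat                # current cluster osdmap epoch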

thanks,
Ben

On Thu, Feb 16, 2017 at 12:19 PM, Benjeman Meekhof  wrote:
> I tried starting up just a couple OSD with debug_osd = 20 and
> debug_filestore = 20.
>
> I pasted a sample of the ongoing log here.  To my eyes it doesn't look
> unusual but maybe someone else sees something in here that is a
> problem:  http://pastebin.com/uy8S7hps
>
> As this log is rolling on, our OSD has still not been marked up and is
> occupying 100% of a CPU core.  I've done this a couple times and in a
> matter of some hours it will be marked up and CPU will drop.  If more
> kraken OSD on another host are brought up the existing kraken OSD go
> back into max CPU usage again while pg recover.  The trend scales
> upward as OSD are started until the system is completely saturated.
>
> I was reading the docs on async messenger settings at
> http://docs.ceph.com/docs/master/rados/configuration/ms-ref/ and saw
> that under 'ms async max op threads' there is a note about one or more
> CPUs constantly on 100% load.  As an experiment I set max op threads
> to 20 and that is the setting during the period of the pasted log.  It
> seems to make no difference.
>
> Appreciate any thoughts on troubleshooting this.  For the time being
> I've aborted our kraken update and will probably re-initialize any
> already updated OSD to revert to Jewel except perhaps one host to
> continue testing.
>
> thanks,
> Ben
>
> On Tue, Feb 14, 2017 at 3:55 PM, Gregory Farnum  wrote:
>> On Tue, Feb 14, 2017 at 11:38 AM, Benjeman Meekhof  
>> wrote:
>>> Hi all,
>>>
>>> We encountered an issue updating our OSD from Jewel (10.2.5) to Kraken
>>> (11.2.0).  OS was RHEL derivative.  Prior to this we updated all the
>>> mons to Kraken.
>>>
>>> After updating ceph packages I restarted the 60 OSD on the box with
>>> 'systemctl restart ceph-osd.target'.  Very soon after the system cpu
>>> load flat-lines at 100% with top showing all of that being system load
>>> from ceph-osd processes.  Not long after we get OSD flapping due to
>>> the load on the system (noout was set to start this, but perhaps
>>> too-quickly unset post restart).
>>>
>>> This is causing problems in the cluster, and we reboot the box.  The
>>> OSD don't start up/mount automatically - not a new problem on this
>>> setup.  We run 'ceph-disk activate $disk' on a list of all the
>>> /dev/dm-X devices as output by ceph-disk list.  Everything activates
>>> and the CPU gradually climbs to once again be a solid 100%.  No OSD
>>> have joined cluster so it isn't causing issues.
>>>
>>> I leave the box overnight...by the time I leave I see that 1-2 OSD on
>>> this box are marked up/in.   By morning all are in, CPU is fine,
>>> cluster is still fine.
>>>
>>> This is not a show-stopping issue now that I know what happens though
>>> it means upgrades are a several hour or overnight affair.  Next box I
>>> will just mark all the OSD out before updating and restarting them or
>>> try leaving them up but being sure to set noout to avoid flapping
>>> while they churn.
>>>
>>> Here's a log snippet from one currently spinning in the startup
>>> process since 11am.  This is the second box we did, the first
>>> experience being as detailed above.  Could this have anything to do
>>> with the 'PGs are upgrading' message?
>>
>> It doesn't seem likely — there's a fixed per-PG overhead that doesn't
>> scale with the object count. I could be missing something but I don't
>> see anything in the upgrade notes that should be doing this either.
>> Try running an upgrade with "debug osd = 20" and "debug filestore =
>> 20" set and see what the log spits out.
>> -Greg
>>
>>>
>>> 2017-02-14 11:04:07.028311 7fd7a0372940  0 _get_class not permitted to load 
>>> lua
>>> 2017-02-14 11:04:07.077304 7fd7a0372940  0 osd.585 135493 crush map
>>> has features 288514119978713088, adjusting msgr requires for clients
>>> 2017-02-14 11:04:07.077318 7fd7a0372940  0 osd.585 135493 crush map
>>> has features 288514394856620032 was 8705, adjusting msgr requires for
>>> mons
>>> 2017-02-14 11:04:07.077324 7fd7a0372940  0 osd.585 135493 crush map
>>> has features 288514394856620032, adjusting msgr requires for osds
>>> 2017-02-14 11:04:09.446832 7fd7a0372940  0 osd.585 135493 load_pgs
>>> 2017-02-14 11:04:09

Re: [ceph-users] Jewel to Kraken OSD upgrade issues

2017-02-16 Thread Benjeman Meekhof
I tried starting up just a couple OSD with debug_osd = 20 and
debug_filestore = 20.

I pasted a sample of the ongoing log here.  To my eyes it doesn't look
unusual but maybe someone else sees something in here that is a
problem:  http://pastebin.com/uy8S7hps

As this log is rolling on, our OSD has still not been marked up and is
occupying 100% of a CPU core.  I've done this a couple times and in a
matter of some hours it will be marked up and CPU will drop.  If more
kraken OSD on another host are brought up the existing kraken OSD go
back into max CPU usage again while pg recover.  The trend scales
upward as OSD are started until the system is completely saturated.

I was reading the docs on async messenger settings at
http://docs.ceph.com/docs/master/rados/configuration/ms-ref/ and saw
that under 'ms async max op threads' there is a note about one or more
CPUs constantly on 100% load.  As an experiment I set max op threads
to 20 and that is the setting during the period of the pasted log.  It
seems to make no difference.
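
For reference, "set max op threads to 20" means something like this in
ceph.conf on the OSD host before restarting the daemons, 20 being an
arbitrary value:

  [osd]
  ms async max op threads = 20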

Appreciate any thoughts on troubleshooting this.  For the time being
I've aborted our kraken update and will probably re-initialize any
already updated OSD to revert to Jewel except perhaps one host to
continue testing.
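
The revert would be the standard remove-and-recreate dance per OSD -- sketch
only, device paths are placeholders:

  ceph osd out osd.585
  systemctl stop ceph-osd@585
  ceph osd crush remove osd.585
  ceph auth del osd.585
  ceph osd rm 585
  # downgrade packages, then wipe and re-create the OSD under Jewel
  ceph-disk zap /dev/sdX
  ceph-disk prepare /dev/sdX
  ceph-disk activate /dev/sdX1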

thanks,
Ben

On Tue, Feb 14, 2017 at 3:55 PM, Gregory Farnum  wrote:
> On Tue, Feb 14, 2017 at 11:38 AM, Benjeman Meekhof  wrote:
>> Hi all,
>>
>> We encountered an issue updating our OSD from Jewel (10.2.5) to Kraken
>> (11.2.0).  OS was RHEL derivative.  Prior to this we updated all the
>> mons to Kraken.
>>
>> After updating ceph packages I restarted the 60 OSD on the box with
>> 'systemctl restart ceph-osd.target'.  Very soon after the system cpu
>> load flat-lines at 100% with top showing all of that being system load
>> from ceph-osd processes.  Not long after we get OSD flapping due to
>> the load on the system (noout was set to start this, but perhaps
>> too-quickly unset post restart).
>>
>> This is causing problems in the cluster, and we reboot the box.  The
>> OSD don't start up/mount automatically - not a new problem on this
>> setup.  We run 'ceph-disk activate $disk' on a list of all the
>> /dev/dm-X devices as output by ceph-disk list.  Everything activates
>> and the CPU gradually climbs to once again be a solid 100%.  No OSD
>> have joined cluster so it isn't causing issues.
>>
>> I leave the box overnight...by the time I leave I see that 1-2 OSD on
>> this box are marked up/in.   By morning all are in, CPU is fine,
>> cluster is still fine.
>>
>> This is not a show-stopping issue now that I know what happens though
>> it means upgrades are a several hour or overnight affair.  Next box I
>> will just mark all the OSD out before updating and restarting them or
>> try leaving them up but being sure to set noout to avoid flapping
>> while they churn.
>>
>> Here's a log snippet from one currently spinning in the startup
>> process since 11am.  This is the second box we did, the first
>> experience being as detailed above.  Could this have anything to do
>> with the 'PGs are upgrading' message?
>
> It doesn't seem likely — there's a fixed per-PG overhead that doesn't
> scale with the object count. I could be missing something but I don't
> see anything in the upgrade notes that should be doing this either.
> Try running an upgrade with "debug osd = 20" and "debug filestore =
> 20" set and see what the log spits out.
> -Greg
>
>>
>> 2017-02-14 11:04:07.028311 7fd7a0372940  0 _get_class not permitted to load 
>> lua
>> 2017-02-14 11:04:07.077304 7fd7a0372940  0 osd.585 135493 crush map
>> has features 288514119978713088, adjusting msgr requires for clients
>> 2017-02-14 11:04:07.077318 7fd7a0372940  0 osd.585 135493 crush map
>> has features 288514394856620032 was 8705, adjusting msgr requires for
>> mons
>> 2017-02-14 11:04:07.077324 7fd7a0372940  0 osd.585 135493 crush map
>> has features 288514394856620032, adjusting msgr requires for osds
>> 2017-02-14 11:04:09.446832 7fd7a0372940  0 osd.585 135493 load_pgs
>> 2017-02-14 11:04:09.522249 7fd7a0372940 -1 osd.585 135493 PGs are upgrading
>> 2017-02-14 11:04:10.246166 7fd7a0372940  0 osd.585 135493 load_pgs
>> opened 148 pgs
>> 2017-02-14 11:04:10.246249 7fd7a0372940  0 osd.585 135493 using 1 op
>> queue with priority op cut off at 64.
>> 2017-02-14 11:04:10.256299 7fd7a0372940 -1 osd.585 135493
>> log_to_monitors {default=true}
>> 2017-02-14 11:04:12.473450 7fd7a0372940  0 osd.585 135493 done with
>> init, starting boot process
>> (logs stop here, cpu spinning)
>>
>>
>> regards,
>> Ben


Re: [ceph-users] Jewel to Kraken OSD upgrade issues

2017-02-14 Thread Gregory Farnum
On Tue, Feb 14, 2017 at 11:38 AM, Benjeman Meekhof  wrote:
> Hi all,
>
> We encountered an issue updating our OSD from Jewel (10.2.5) to Kraken
> (11.2.0).  OS was RHEL derivative.  Prior to this we updated all the
> mons to Kraken.
>
> After updating ceph packages I restarted the 60 OSD on the box with
> 'systemctl restart ceph-osd.target'.  Very soon after the system cpu
> load flat-lines at 100% with top showing all of that being system load
> from ceph-osd processes.  Not long after we get OSD flapping due to
> the load on the system (noout was set to start this, but perhaps
> too-quickly unset post restart).
>
> This is causing problems in the cluster, and we reboot the box.  The
> OSD don't start up/mount automatically - not a new problem on this
> setup.  We run 'ceph-disk activate $disk' on a list of all the
> /dev/dm-X devices as output by ceph-disk list.  Everything activates
> and the CPU gradually climbs to once again be a solid 100%.  No OSD
> have joined cluster so it isn't causing issues.
>
> I leave the box overnight...by the time I leave I see that 1-2 OSD on
> this box are marked up/in.   By morning all are in, CPU is fine,
> cluster is still fine.
>
> This is not a show-stopping issue now that I know what happens though
> it means upgrades are a several hour or overnight affair.  Next box I
> will just mark all the OSD out before updating and restarting them or
> try leaving them up but being sure to set noout to avoid flapping
> while they churn.
>
> Here's a log snippet from one currently spinning in the startup
> process since 11am.  This is the second box we did, the first
> experience being as detailed above.  Could this have anything to do
> with the 'PGs are upgrading' message?

It doesn't seem likely — there's a fixed per-PG overhead that doesn't
scale with the object count. I could be missing something but I don't
see anything in the upgrade notes that should be doing this either.
Try running an upgrade with "debug osd = 20" and "debug filestore =
20" set and see what the log spits out.
-Greg

>
> 2017-02-14 11:04:07.028311 7fd7a0372940  0 _get_class not permitted to load 
> lua
> 2017-02-14 11:04:07.077304 7fd7a0372940  0 osd.585 135493 crush map
> has features 288514119978713088, adjusting msgr requires for clients
> 2017-02-14 11:04:07.077318 7fd7a0372940  0 osd.585 135493 crush map
> has features 288514394856620032 was 8705, adjusting msgr requires for
> mons
> 2017-02-14 11:04:07.077324 7fd7a0372940  0 osd.585 135493 crush map
> has features 288514394856620032, adjusting msgr requires for osds
> 2017-02-14 11:04:09.446832 7fd7a0372940  0 osd.585 135493 load_pgs
> 2017-02-14 11:04:09.522249 7fd7a0372940 -1 osd.585 135493 PGs are upgrading
> 2017-02-14 11:04:10.246166 7fd7a0372940  0 osd.585 135493 load_pgs
> opened 148 pgs
> 2017-02-14 11:04:10.246249 7fd7a0372940  0 osd.585 135493 using 1 op
> queue with priority op cut off at 64.
> 2017-02-14 11:04:10.256299 7fd7a0372940 -1 osd.585 135493
> log_to_monitors {default=true}
> 2017-02-14 11:04:12.473450 7fd7a0372940  0 osd.585 135493 done with
> init, starting boot process
> (logs stop here, cpu spinning)
>
>
> regards,
> Ben