Re: [ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-06-08 Thread nokia ceph
Thanks Jake. Can you confirm which ceph version you are testing on when
you see the out-of-memory kills? There is already a memory leak issue
reported in kraken v11.2.0, which is addressed in this tracker:
http://tracker.ceph.com/issues/18924

#ceph -v

OK, so you are mounting/mapping Ceph as an RBD device and writing into it.

We are discussing a luminous v12.0.3 issue here, so I think we are all on
the same page.

Thanks
Jayaram


On Thu, Jun 8, 2017 at 8:13 PM, Jake Grimmett  wrote:

> Hi Mark / Jayaram,
>
> After running the cluster last night, I noticed lots of
> "Out Of Memory" errors in /var/log/messages, many of these correlate to
> dead OSD's. If this is the problem, this might now be another case of
> the high memory use issues reported in Kraken.
>
> e.g. my script logs:
> Thu 8 Jun 08:26:37 BST 2017  restart OSD  1
>
> and /var/log/messages states...
>
> Jun  8 08:26:35 ceph1 kernel: Out of memory: Kill process 7899
> (ceph-osd) score 113 or sacrifice child
> Jun  8 08:26:35 ceph1 kernel: Killed process 7899 (ceph-osd)
> total-vm:8569516kB, anon-rss:7518836kB, file-rss:0kB, shmem-rss:0kB
> Jun  8 08:26:36 ceph1 systemd: ceph-osd@1.service: main process exited,
> code=killed, status=9/KILL
> Jun  8 08:26:36 ceph1 systemd: Unit ceph-osd@1.service entered failed
> state.
>
> The OSD nodes have 64GB RAM, presumably enough RAM for 10 OSD's doing
> 4+1 EC ?
>
> I've added "bluestore_cache_size = 104857600" to ceph.conf, and am
> retesting. I will see if OSD problems occur, and report back.
>
> As to loading the cluster, I run an rsync job on each node, pulling data
> from an NFS mounted Isilon. A single node pulls ~200MB/s, with all 7
> nodes running, the ceph -w reports between 700 > 1500MB/s writes.
>
> as requested, here is my "restart_OSD_and_log-this.sh" script:
>
> 
> #!/bin/bash
> # catches single failed OSDs, log and restart
> while : ; do
> OSD=`ceph osd tree 2> /dev/null | grep down | \
> awk '{ print $3}' | awk -F "." '{print $2 }'`
> if [ "$OSD" != "" ] ; then
> DATE=`date`
> echo $DATE " restart OSD " $OSD  >> /root/osd_restart_log
> echo "OSD" $OSD "is down, restarting.."
> OSDHOST=`ceph osd find $OSD | grep host | awk -F '"' '{print $4}'`
> ssh $OSDHOST systemctl restart ceph-osd@$OSD
> sleep 30
> else
> echo -ne "\r\033[K"
> echo -ne "all OSD OK"
> fi
> sleep 1
> done
> 
>
> thanks again,
>
> Jake
>
> On 08/06/17 12:08, nokia ceph wrote:
> > Hello Mark,
> >
> > Raised tracker for the issue  -- http://tracker.ceph.com/issues/20222
> >
> > Jake can you share the restart_OSD_and_log-this.sh script
> >
> > Thanks
> > Jayaram
> >
> > On Wed, Jun 7, 2017 at 9:40 PM, Jake Grimmett wrote:
> >
> > Hi Mark & List,
> >
> > Unfortunately, even when using yesterdays master version of ceph,
> > I'm still seeing OSDs go down, same error as before:
> >
> > OSD log shows lots of entries like this:
> >
> > (osd38)
> > 2017-06-07 16:48:46.070564 7f90b58c3700  1 heartbeat_map is_healthy
> > 'tp_osd_tp thread tp_osd_tp' had timed out after 60
> >
> > (osd3)
> > 2017-06-07 17:01:25.391075 7f62de6c3700  1 heartbeat_map is_healthy
> > 'tp_osd_tp thread tp_osd_tp' had timed out after 60
> > 2017-06-07 17:01:26.276881 7f62dbe86700 -1 osd.3 6165
> heartbeat_check:
> > no reply from 10.1.0.86:6811  osd.2 since
> > back 2017-06-07 17:00:19.640002
> > front 2017-06-07 17:01:21.950160 (cutoff 2017-06-07 17:01:06.276881)
> >
> >
> > [root@ceph4 ceph]# ceph -v
> > ceph version 12.0.2-2399-ge38ca14
> > (e38ca14914340d65ea8001c7bd6e0ff769f3eb2e) luminous (dev)
> >
> >
> > I'll continue running the cluster with my
> "restart_OSD_and_log-this.sh"
> > workaround...
> >
> > thanks again for your help,
> >
> > Jake
> >
> > On 06/06/17 15:52, Jake Grimmett wrote:
> > > Hi Mark,
> > >
> > > OK, I'll upgrade to the current master and retest...
> > >
> > > best,
> > >
> > > Jake
> > >
> > > On 06/06/17 15:46, Mark Nelson wrote:
> > >> Hi Jake,
> > >>
> > >> I just happened to notice this was on 12.0.3.  Would it be
> > possible to
> > >> test this out with current master and see if it still is a
> problem?
> > >>
> > >> Mark
> > >>
> > >> On 06/06/2017 09:10 AM, Mark Nelson wrote:
> > >>> Hi Jake,
> > >>>
> > >>> Thanks much.  I'm guessing at this point this is probably a
> > bug.  Would
> > >>> you (or nokiauser) mind creating a bug in the tracker with a
> short
> > >>> description of what's going on and the collectl sample showing
> > this is
> > >>> not IOs backing up on the disk?
> > >>>
> > >>> If you 

Re: [ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-06-08 Thread Jake Grimmett
Hi Mark / Jayaram,

After running the cluster last night, I noticed lots of
"Out Of Memory" errors in /var/log/messages; many of these correlate with
dead OSDs. If this is the problem, it might now be another case of
the high memory use issues reported in Kraken.

e.g. my script logs:
Thu 8 Jun 08:26:37 BST 2017  restart OSD  1

and /var/log/messages states...

Jun  8 08:26:35 ceph1 kernel: Out of memory: Kill process 7899
(ceph-osd) score 113 or sacrifice child
Jun  8 08:26:35 ceph1 kernel: Killed process 7899 (ceph-osd)
total-vm:8569516kB, anon-rss:7518836kB, file-rss:0kB, shmem-rss:0kB
Jun  8 08:26:36 ceph1 systemd: ceph-osd@1.service: main process exited,
code=killed, status=9/KILL
Jun  8 08:26:36 ceph1 systemd: Unit ceph-osd@1.service entered failed state.

The OSD nodes have 64GB RAM, presumably enough for 10 OSDs doing
4+1 EC?
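
For reference, a rough way to watch per-OSD memory on a node while the
load runs - just generic ps usage, nothing Ceph-specific:

# show RSS (kB) of each ceph-osd process, largest first
ps -C ceph-osd -o pid=,rss=,cmd= --sort=-rss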

I've added "bluestore_cache_size = 104857600" to ceph.conf, and am
retesting. I will see if OSD problems occur, and report back.
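
For anyone following along, that setting goes under [osd] in ceph.conf
(104857600 bytes is ~100MB per OSD; the section placement shown is the
usual convention, adjust as needed):

[osd]
# cap BlueStore's in-memory cache at ~100MB per OSD
bluestore_cache_size = 104857600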

As to loading the cluster, I run an rsync job on each node, pulling data
from an NFS-mounted Isilon. A single node pulls ~200MB/s; with all 7
nodes running, ceph -w reports between 700 and 1500MB/s of writes.

As requested, here is my "restart_OSD_and_log-this.sh" script:


#!/bin/bash
# catches single failed OSDs, log and restart
while : ; do
OSD=`ceph osd tree 2> /dev/null | grep down | \
awk '{ print $3}' | awk -F "." '{print $2 }'`
if [ "$OSD" != "" ] ; then
DATE=`date`
echo $DATE " restart OSD " $OSD  >> /root/osd_restart_log
echo "OSD" $OSD "is down, restarting.."
OSDHOST=`ceph osd find $OSD | grep host | awk -F '"' '{print $4}'`
ssh $OSDHOST systemctl restart ceph-osd@$OSD
sleep 30
else
echo -ne "\r\033[K"
echo -ne "all OSD OK"
fi
sleep 1
done

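
(It can be left running in the background on a node with admin keys,
e.g. "nohup ./restart_OSD_and_log-this.sh >/dev/null 2>&1 &" - adjust
to taste.)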

thanks again,

Jake

On 08/06/17 12:08, nokia ceph wrote:
> Hello Mark,
> 
> Raised tracker for the issue  -- http://tracker.ceph.com/issues/20222
> 
> Jake can you share the restart_OSD_and_log-this.sh script 
> 
> Thanks
> Jayaram
> 
> On Wed, Jun 7, 2017 at 9:40 PM, Jake Grimmett wrote:
> 
> Hi Mark & List,
> 
> Unfortunately, even when using yesterdays master version of ceph,
> I'm still seeing OSDs go down, same error as before:
> 
> OSD log shows lots of entries like this:
> 
> (osd38)
> 2017-06-07 16:48:46.070564 7f90b58c3700  1 heartbeat_map is_healthy
> 'tp_osd_tp thread tp_osd_tp' had timed out after 60
> 
> (osd3)
> 2017-06-07 17:01:25.391075 7f62de6c3700  1 heartbeat_map is_healthy
> 'tp_osd_tp thread tp_osd_tp' had timed out after 60
> 2017-06-07 17:01:26.276881 7f62dbe86700 -1 osd.3 6165 heartbeat_check:
> no reply from 10.1.0.86:6811  osd.2 since
> back 2017-06-07 17:00:19.640002
> front 2017-06-07 17:01:21.950160 (cutoff 2017-06-07 17:01:06.276881)
> 
> 
> [root@ceph4 ceph]# ceph -v
> ceph version 12.0.2-2399-ge38ca14
> (e38ca14914340d65ea8001c7bd6e0ff769f3eb2e) luminous (dev)
> 
> 
> I'll continue running the cluster with my "restart_OSD_and_log-this.sh"
> workaround...
> 
> thanks again for your help,
> 
> Jake
> 
> On 06/06/17 15:52, Jake Grimmett wrote:
> > Hi Mark,
> >
> > OK, I'll upgrade to the current master and retest...
> >
> > best,
> >
> > Jake
> >
> > On 06/06/17 15:46, Mark Nelson wrote:
> >> Hi Jake,
> >>
> >> I just happened to notice this was on 12.0.3.  Would it be
> possible to
> >> test this out with current master and see if it still is a problem?
> >>
> >> Mark
> >>
> >> On 06/06/2017 09:10 AM, Mark Nelson wrote:
> >>> Hi Jake,
> >>>
> >>> Thanks much.  I'm guessing at this point this is probably a
> bug.  Would
> >>> you (or nokiauser) mind creating a bug in the tracker with a short
> >>> description of what's going on and the collectl sample showing
> this is
> >>> not IOs backing up on the disk?
> >>>
> >>> If you want to try it, we have a gdb based wallclock profiler
> that might
> >>> be interesting to run while it's in the process of timing out. 
> It tries
> >>> to grab 2000 samples from the osd process which typically takes
> about 10
> >>> minutes or so.  You'll need to either change the number of
> samples to be
> >>> lower in the python code (maybe like 50-100), or change the
> timeout to
> >>> be something longer.
> >>>
> >>> You can find the code here:
> >>>
> >>> https://github.com/markhpc/gdbprof
> 
> >>>
> >>> and invoke it like:
> >>>
> > >>> sudo gdb -ex 'set pagination off' -ex 'attach 27962' -ex 'source
> >>> 

Re: [ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-06-08 Thread nokia ceph
Hello Mark,

As this issue is only seen while writing via librados (the C API), the
same can't be reproduced with the rados user-space utility.
Ref: http://docs.ceph.com/docs/master/rados/api/librados/

Jake, I guess you are also creating the load via librados.
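
For comparison, a typical way to drive sustained writes with the stock
rados CLI would be a bench run (the pool name below is a placeholder);
as noted, that path has not reproduced the problem for us:

# sustained 4MiB writes for 300s with 16 concurrent ops
rados bench -p <testpool> 300 write -b 4194304 -t 16 --no-cleanup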

Thanks
Jayaram

On Thu, Jun 8, 2017 at 5:46 PM, Mark Nelson  wrote:

> Hi Jayaram,
>
> Thanks for creating a tracker entry! Any chance you could add a note about
> how you are generating the 200MB/s client workload?  I've not seen this
> problem in the lab, but any details you could give that would help us
> reproduce the problem would be much appreciated!
>
> Mark
>
> On 06/08/2017 06:08 AM, nokia ceph wrote:
>
>> Hello Mark,
>>
>> Raised tracker for the issue  -- http://tracker.ceph.com/issues/20222
>>
>> Jake can you share the restart_OSD_and_log-this.sh script
>>
>> Thanks
>> Jayaram
>>
>> On Wed, Jun 7, 2017 at 9:40 PM, Jake Grimmett wrote:
>>
>> Hi Mark & List,
>>
>> Unfortunately, even when using yesterdays master version of ceph,
>> I'm still seeing OSDs go down, same error as before:
>>
>> OSD log shows lots of entries like this:
>>
>> (osd38)
>> 2017-06-07 16:48:46.070564 7f90b58c3700  1 heartbeat_map is_healthy
>> 'tp_osd_tp thread tp_osd_tp' had timed out after 60
>>
>> (osd3)
>> 2017-06-07 17:01:25.391075 7f62de6c3700  1 heartbeat_map is_healthy
>> 'tp_osd_tp thread tp_osd_tp' had timed out after 60
>> 2017-06-07 17:01:26.276881 7f62dbe86700 -1 osd.3 6165 heartbeat_check:
>> no reply from 10.1.0.86:6811  osd.2 since
>>
>> back 2017-06-07 17:00:19.640002
>> front 2017-06-07 17:01:21.950160 (cutoff 2017-06-07 17:01:06.276881)
>>
>>
>> [root@ceph4 ceph]# ceph -v
>> ceph version 12.0.2-2399-ge38ca14
>> (e38ca14914340d65ea8001c7bd6e0ff769f3eb2e) luminous (dev)
>>
>>
>> I'll continue running the cluster with my
>> "restart_OSD_and_log-this.sh"
>> workaround...
>>
>> thanks again for your help,
>>
>> Jake
>>
>> On 06/06/17 15:52, Jake Grimmett wrote:
>> > Hi Mark,
>> >
>> > OK, I'll upgrade to the current master and retest...
>> >
>> > best,
>> >
>> > Jake
>> >
>> > On 06/06/17 15:46, Mark Nelson wrote:
>> >> Hi Jake,
>> >>
>> >> I just happened to notice this was on 12.0.3.  Would it be
>> possible to
>> >> test this out with current master and see if it still is a problem?
>> >>
>> >> Mark
>> >>
>> >> On 06/06/2017 09:10 AM, Mark Nelson wrote:
>> >>> Hi Jake,
>> >>>
>> >>> Thanks much.  I'm guessing at this point this is probably a
>> bug.  Would
>> >>> you (or nokiauser) mind creating a bug in the tracker with a short
>> >>> description of what's going on and the collectl sample showing
>> this is
>> >>> not IOs backing up on the disk?
>> >>>
>> >>> If you want to try it, we have a gdb based wallclock profiler
>> that might
>> >>> be interesting to run while it's in the process of timing out.
>> It tries
>> >>> to grab 2000 samples from the osd process which typically takes
>> about 10
>> >>> minutes or so.  You'll need to either change the number of
>> samples to be
>> >>> lower in the python code (maybe like 50-100), or change the
>> timeout to
>> >>> be something longer.
>> >>>
>> >>> You can find the code here:
>> >>>
>> >>> https://github.com/markhpc/gdbprof
>> 
>> >>>
>> >>> and invoke it like:
>> >>>
>> >>> sudo gdb -ex 'set pagination off' -ex 'attach 27962' -ex 'source
>> >>> ./gdbprof.py' -ex 'profile begin' -ex 'quit'
>> >>>
>> >>> where 27962 in this case is the PID of the ceph-osd process.
>> You'll
>> >>> need gdb with the python bindings and the ceph debug symbols for
>> it to
>> >>> work.
>> >>>
>> >>> This might tell us over time if the tp_osd_tp processes are just
>> sitting
>> >>> on pg::locks.
>> >>>
>> >>> Mark
>> >>>
>> >>> On 06/06/2017 05:34 AM, Jake Grimmett wrote:
>>  Hi Mark,
>> 
>>  Thanks again for looking into this problem.
>> 
>>  I ran the cluster overnight, with a script checking for dead
>> OSDs every
>>  second, and restarting them.
>> 
>>  40 OSD failures occurred in 12 hours, some OSDs failed multiple
>> times,
>>  (there are 50 OSDs in the EC tier).
>> 
>>  Unfortunately, the output of collectl doesn't appear to show any
>>  increase in disk queue depth and service times before the OSDs
>> die.
>> 
>>  I've put a couple of examples of collectl output for the disks
>>  associated with the OSDs here:
>> 
>>  https://hastebin.com/icuvotemot.scala
>> 
>> 
>> 

Re: [ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-06-08 Thread Mark Nelson

Hi Jayaram,

Thanks for creating a tracker entry! Any chance you could add a note 
about how you are generating the 200MB/s client workload?  I've not seen 
this problem in the lab, but any details you could give that would help 
us reproduce the problem would be much appreciated!


Mark

On 06/08/2017 06:08 AM, nokia ceph wrote:

Hello Mark,

Raised tracker for the issue  -- http://tracker.ceph.com/issues/20222

Jake can you share the restart_OSD_and_log-this.sh script

Thanks
Jayaram

On Wed, Jun 7, 2017 at 9:40 PM, Jake Grimmett wrote:

Hi Mark & List,

Unfortunately, even when using yesterdays master version of ceph,
I'm still seeing OSDs go down, same error as before:

OSD log shows lots of entries like this:

(osd38)
2017-06-07 16:48:46.070564 7f90b58c3700  1 heartbeat_map is_healthy
'tp_osd_tp thread tp_osd_tp' had timed out after 60

(osd3)
2017-06-07 17:01:25.391075 7f62de6c3700  1 heartbeat_map is_healthy
'tp_osd_tp thread tp_osd_tp' had timed out after 60
2017-06-07 17:01:26.276881 7f62dbe86700 -1 osd.3 6165 heartbeat_check:
no reply from 10.1.0.86:6811  osd.2 since
back 2017-06-07 17:00:19.640002
front 2017-06-07 17:01:21.950160 (cutoff 2017-06-07 17:01:06.276881)


[root@ceph4 ceph]# ceph -v
ceph version 12.0.2-2399-ge38ca14
(e38ca14914340d65ea8001c7bd6e0ff769f3eb2e) luminous (dev)


I'll continue running the cluster with my "restart_OSD_and_log-this.sh"
workaround...

thanks again for your help,

Jake

On 06/06/17 15:52, Jake Grimmett wrote:
> Hi Mark,
>
> OK, I'll upgrade to the current master and retest...
>
> best,
>
> Jake
>
> On 06/06/17 15:46, Mark Nelson wrote:
>> Hi Jake,
>>
>> I just happened to notice this was on 12.0.3.  Would it be
possible to
>> test this out with current master and see if it still is a problem?
>>
>> Mark
>>
>> On 06/06/2017 09:10 AM, Mark Nelson wrote:
>>> Hi Jake,
>>>
>>> Thanks much.  I'm guessing at this point this is probably a
bug.  Would
>>> you (or nokiauser) mind creating a bug in the tracker with a short
>>> description of what's going on and the collectl sample showing
this is
>>> not IOs backing up on the disk?
>>>
>>> If you want to try it, we have a gdb based wallclock profiler
that might
>>> be interesting to run while it's in the process of timing out.
It tries
>>> to grab 2000 samples from the osd process which typically takes
about 10
>>> minutes or so.  You'll need to either change the number of
samples to be
>>> lower in the python code (maybe like 50-100), or change the
timeout to
>>> be something longer.
>>>
>>> You can find the code here:
>>>
>>> https://github.com/markhpc/gdbprof

>>>
>>> and invoke it like:
>>>
>>> sudo gdb -ex 'set pagination off' -ex 'attach 27962' -ex 'source
>>> ./gdbprof.py' -ex 'profile begin' -ex 'quit'
>>>
>>> where 27962 in this case is the PID of the ceph-osd process.  You'll
>>> need gdb with the python bindings and the ceph debug symbols for
it to
>>> work.
>>>
>>> This might tell us over time if the tp_osd_tp processes are just
sitting
>>> on pg::locks.
>>>
>>> Mark
>>>
>>> On 06/06/2017 05:34 AM, Jake Grimmett wrote:
 Hi Mark,

 Thanks again for looking into this problem.

 I ran the cluster overnight, with a script checking for dead
OSDs every
 second, and restarting them.

 40 OSD failures occurred in 12 hours, some OSDs failed multiple
times,
 (there are 50 OSDs in the EC tier).

 Unfortunately, the output of collectl doesn't appear to show any
 increase in disk queue depth and service times before the OSDs die.

 I've put a couple of examples of collectl output for the disks
 associated with the OSDs here:

 https://hastebin.com/icuvotemot.scala


 please let me know if you need more info...

 best regards,

 Jake


>


Re: [ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-06-08 Thread nokia ceph
Hello Mark,

Raised a tracker for the issue: http://tracker.ceph.com/issues/20222

Jake, can you share the restart_OSD_and_log-this.sh script?

Thanks
Jayaram

On Wed, Jun 7, 2017 at 9:40 PM, Jake Grimmett  wrote:

> Hi Mark & List,
>
> Unfortunately, even when using yesterdays master version of ceph,
> I'm still seeing OSDs go down, same error as before:
>
> OSD log shows lots of entries like this:
>
> (osd38)
> 2017-06-07 16:48:46.070564 7f90b58c3700  1 heartbeat_map is_healthy
> 'tp_osd_tp thread tp_osd_tp' had timed out after 60
>
> (osd3)
> 2017-06-07 17:01:25.391075 7f62de6c3700  1 heartbeat_map is_healthy
> 'tp_osd_tp thread tp_osd_tp' had timed out after 60
> 2017-06-07 17:01:26.276881 7f62dbe86700 -1 osd.3 6165 heartbeat_check:
> no reply from 10.1.0.86:6811 osd.2 since back 2017-06-07 17:00:19.640002
> front 2017-06-07 17:01:21.950160 (cutoff 2017-06-07 17:01:06.276881)
>
>
> [root@ceph4 ceph]# ceph -v
> ceph version 12.0.2-2399-ge38ca14
> (e38ca14914340d65ea8001c7bd6e0ff769f3eb2e) luminous (dev)
>
>
> I'll continue running the cluster with my "restart_OSD_and_log-this.sh"
> workaround...
>
> thanks again for your help,
>
> Jake
>
> On 06/06/17 15:52, Jake Grimmett wrote:
> > Hi Mark,
> >
> > OK, I'll upgrade to the current master and retest...
> >
> > best,
> >
> > Jake
> >
> > On 06/06/17 15:46, Mark Nelson wrote:
> >> Hi Jake,
> >>
> >> I just happened to notice this was on 12.0.3.  Would it be possible to
> >> test this out with current master and see if it still is a problem?
> >>
> >> Mark
> >>
> >> On 06/06/2017 09:10 AM, Mark Nelson wrote:
> >>> Hi Jake,
> >>>
> >>> Thanks much.  I'm guessing at this point this is probably a bug.  Would
> >>> you (or nokiauser) mind creating a bug in the tracker with a short
> >>> description of what's going on and the collectl sample showing this is
> >>> not IOs backing up on the disk?
> >>>
> >>> If you want to try it, we have a gdb based wallclock profiler that
> might
> >>> be interesting to run while it's in the process of timing out.  It
> tries
> >>> to grab 2000 samples from the osd process which typically takes about
> 10
> >>> minutes or so.  You'll need to either change the number of samples to
> be
> >>> lower in the python code (maybe like 50-100), or change the timeout to
> >>> be something longer.
> >>>
> >>> You can find the code here:
> >>>
> >>> https://github.com/markhpc/gdbprof
> >>>
> >>> and invoke it like:
> >>>
> >>> sudo gdb -ex 'set pagination off' -ex 'attach 27962' -ex 'source
> >>> ./gdbprof.py' -ex 'profile begin' -ex 'quit'
> >>>
> >>> where 27962 in this case is the PID of the ceph-osd process.  You'll
> >>> need gdb with the python bindings and the ceph debug symbols for it to
> >>> work.
> >>>
> >>> This might tell us over time if the tp_osd_tp processes are just
> sitting
> >>> on pg::locks.
> >>>
> >>> Mark
> >>>
> >>> On 06/06/2017 05:34 AM, Jake Grimmett wrote:
>  Hi Mark,
> 
>  Thanks again for looking into this problem.
> 
>  I ran the cluster overnight, with a script checking for dead OSDs
> every
>  second, and restarting them.
> 
>  40 OSD failures occurred in 12 hours, some OSDs failed multiple times,
>  (there are 50 OSDs in the EC tier).
> 
>  Unfortunately, the output of collectl doesn't appear to show any
>  increase in disk queue depth and service times before the OSDs die.
> 
>  I've put a couple of examples of collectl output for the disks
>  associated with the OSDs here:
> 
>  https://hastebin.com/icuvotemot.scala
> 
>  please let me know if you need more info...
> 
>  best regards,
> 
>  Jake
> 
> 
> >


Re: [ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-06-07 Thread Jake Grimmett
Hi Mark & List,

Unfortunately, even when using yesterday's master version of ceph,
I'm still seeing OSDs go down, same error as before:

OSD log shows lots of entries like this:

(osd38)
2017-06-07 16:48:46.070564 7f90b58c3700  1 heartbeat_map is_healthy
'tp_osd_tp thread tp_osd_tp' had timed out after 60

(osd3)
2017-06-07 17:01:25.391075 7f62de6c3700  1 heartbeat_map is_healthy
'tp_osd_tp thread tp_osd_tp' had timed out after 60
2017-06-07 17:01:26.276881 7f62dbe86700 -1 osd.3 6165 heartbeat_check:
no reply from 10.1.0.86:6811 osd.2 since back 2017-06-07 17:00:19.640002
front 2017-06-07 17:01:21.950160 (cutoff 2017-06-07 17:01:06.276881)


[root@ceph4 ceph]# ceph -v
ceph version 12.0.2-2399-ge38ca14
(e38ca14914340d65ea8001c7bd6e0ff769f3eb2e) luminous (dev)


I'll continue running the cluster with my "restart_OSD_and_log-this.sh"
workaround...

thanks again for your help,

Jake

On 06/06/17 15:52, Jake Grimmett wrote:
> Hi Mark,
> 
> OK, I'll upgrade to the current master and retest...
> 
> best,
> 
> Jake
> 
> On 06/06/17 15:46, Mark Nelson wrote:
>> Hi Jake,
>>
>> I just happened to notice this was on 12.0.3.  Would it be possible to
>> test this out with current master and see if it still is a problem?
>>
>> Mark
>>
>> On 06/06/2017 09:10 AM, Mark Nelson wrote:
>>> Hi Jake,
>>>
>>> Thanks much.  I'm guessing at this point this is probably a bug.  Would
>>> you (or nokiauser) mind creating a bug in the tracker with a short
>>> description of what's going on and the collectl sample showing this is
>>> not IOs backing up on the disk?
>>>
>>> If you want to try it, we have a gdb based wallclock profiler that might
>>> be interesting to run while it's in the process of timing out.  It tries
>>> to grab 2000 samples from the osd process which typically takes about 10
>>> minutes or so.  You'll need to either change the number of samples to be
>>> lower in the python code (maybe like 50-100), or change the timeout to
>>> be something longer.
>>>
>>> You can find the code here:
>>>
>>> https://github.com/markhpc/gdbprof
>>>
>>> and invoke it like:
>>>
>>> sudo gdb -ex 'set pagination off' -ex 'attach 27962' -ex 'source
>>> ./gdbprof.py' -ex 'profile begin' -ex 'quit'
>>>
>>> where 27962 in this case is the PID of the ceph-osd process.  You'll
>>> need gdb with the python bindings and the ceph debug symbols for it to
>>> work.
>>>
>>> This might tell us over time if the tp_osd_tp processes are just sitting
>>> on pg::locks.
>>>
>>> Mark
>>>
>>> On 06/06/2017 05:34 AM, Jake Grimmett wrote:
 Hi Mark,

 Thanks again for looking into this problem.

 I ran the cluster overnight, with a script checking for dead OSDs every
 second, and restarting them.

 40 OSD failures occurred in 12 hours, some OSDs failed multiple times,
 (there are 50 OSDs in the EC tier).

 Unfortunately, the output of collectl doesn't appear to show any
 increase in disk queue depth and service times before the OSDs die.

 I've put a couple of examples of collectl output for the disks
 associated with the OSDs here:

 https://hastebin.com/icuvotemot.scala

 please let me know if you need more info...

 best regards,

 Jake


> 


Re: [ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-06-06 Thread Jake Grimmett
Hi Mark,

OK, I'll upgrade to the current master and retest...

best,

Jake

On 06/06/17 15:46, Mark Nelson wrote:
> Hi Jake,
> 
> I just happened to notice this was on 12.0.3.  Would it be possible to
> test this out with current master and see if it still is a problem?
> 
> Mark
> 
> On 06/06/2017 09:10 AM, Mark Nelson wrote:
>> Hi Jake,
>>
>> Thanks much.  I'm guessing at this point this is probably a bug.  Would
>> you (or nokiauser) mind creating a bug in the tracker with a short
>> description of what's going on and the collectl sample showing this is
>> not IOs backing up on the disk?
>>
>> If you want to try it, we have a gdb based wallclock profiler that might
>> be interesting to run while it's in the process of timing out.  It tries
>> to grab 2000 samples from the osd process which typically takes about 10
>> minutes or so.  You'll need to either change the number of samples to be
>> lower in the python code (maybe like 50-100), or change the timeout to
>> be something longer.
>>
>> You can find the code here:
>>
>> https://github.com/markhpc/gdbprof
>>
>> and invoke it like:
>>
>> sudo gdb -ex 'set pagination off' -ex 'attach 27962' -ex 'source
>> ./gdbprof.py' -ex 'profile begin' -ex 'quit'
>>
>> where 27962 in this case is the PID of the ceph-osd process.  You'll
>> need gdb with the python bindings and the ceph debug symbols for it to
>> work.
>>
>> This might tell us over time if the tp_osd_tp processes are just sitting
>> on pg::locks.
>>
>> Mark
>>
>> On 06/06/2017 05:34 AM, Jake Grimmett wrote:
>>> Hi Mark,
>>>
>>> Thanks again for looking into this problem.
>>>
>>> I ran the cluster overnight, with a script checking for dead OSDs every
>>> second, and restarting them.
>>>
>>> 40 OSD failures occurred in 12 hours, some OSDs failed multiple times,
>>> (there are 50 OSDs in the EC tier).
>>>
>>> Unfortunately, the output of collectl doesn't appear to show any
>>> increase in disk queue depth and service times before the OSDs die.
>>>
>>> I've put a couple of examples of collectl output for the disks
>>> associated with the OSDs here:
>>>
>>> https://hastebin.com/icuvotemot.scala
>>>
>>> please let me know if you need more info...
>>>
>>> best regards,
>>>
>>> Jake
>>>
>>>




Re: [ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-06-06 Thread Mark Nelson

Hi Jake,

I just happened to notice this was on 12.0.3.  Would it be possible to 
test this out with current master and see if it still is a problem?


Mark

On 06/06/2017 09:10 AM, Mark Nelson wrote:

Hi Jake,

Thanks much.  I'm guessing at this point this is probably a bug.  Would
you (or nokiauser) mind creating a bug in the tracker with a short
description of what's going on and the collectl sample showing this is
not IOs backing up on the disk?

If you want to try it, we have a gdb based wallclock profiler that might
be interesting to run while it's in the process of timing out.  It tries
to grab 2000 samples from the osd process which typically takes about 10
minutes or so.  You'll need to either change the number of samples to be
lower in the python code (maybe like 50-100), or change the timeout to
be something longer.

You can find the code here:

https://github.com/markhpc/gdbprof

and invoke it like:

sudo gdb -ex 'set pagination off' -ex 'attach 27962' -ex 'source
./gdbprof.py' -ex 'profile begin' -ex 'quit'

where 27962 in this case is the PID of the ceph-osd process.  You'll
need gdb with the python bindings and the ceph debug symbols for it to
work.

This might tell us over time if the tp_osd_tp processes are just sitting
on pg::locks.

Mark

On 06/06/2017 05:34 AM, Jake Grimmett wrote:

Hi Mark,

Thanks again for looking into this problem.

I ran the cluster overnight, with a script checking for dead OSDs every
second, and restarting them.

40 OSD failures occurred in 12 hours, some OSDs failed multiple times,
(there are 50 OSDs in the EC tier).

Unfortunately, the output of collectl doesn't appear to show any
increase in disk queue depth and service times before the OSDs die.

I've put a couple of examples of collectl output for the disks
associated with the OSDs here:

https://hastebin.com/icuvotemot.scala

please let me know if you need more info...

best regards,

Jake





Re: [ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-06-06 Thread Mark Nelson

Hi Jake,

Thanks much.  I'm guessing at this point this is probably a bug.  Would 
you (or nokiauser) mind creating a bug in the tracker with a short 
description of what's going on and the collectl sample showing this is 
not IOs backing up on the disk?


If you want to try it, we have a gdb based wallclock profiler that might 
be interesting to run while it's in the process of timing out.  It tries 
to grab 2000 samples from the osd process which typically takes about 10 
minutes or so.  You'll need to either change the number of samples to be 
lower in the python code (maybe like 50-100), or change the timeout to 
be something longer.


You can find the code here:

https://github.com/markhpc/gdbprof

and invoke it like:

sudo gdb -ex 'set pagination off' -ex 'attach 27962' -ex 'source
./gdbprof.py' -ex 'profile begin' -ex 'quit'


where 27962 in this case is the PID of the ceph-osd process.  You'll 
need gdb with the python bindings and the ceph debug symbols for it to work.


This might tell us over time if the tp_osd_tp processes are just sitting 
on pg::locks.


Mark

On 06/06/2017 05:34 AM, Jake Grimmett wrote:

Hi Mark,

Thanks again for looking into this problem.

I ran the cluster overnight, with a script checking for dead OSDs every
second, and restarting them.

40 OSD failures occurred in 12 hours, some OSDs failed multiple times,
(there are 50 OSDs in the EC tier).

Unfortunately, the output of collectl doesn't appear to show any
increase in disk queue depth and service times before the OSDs die.

I've put a couple of examples of collectl output for the disks
associated with the OSDs here:

https://hastebin.com/icuvotemot.scala

please let me know if you need more info...

best regards,

Jake





Re: [ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-06-06 Thread Jake Grimmett
Hi Mark,

Thanks again for looking into this problem.

I ran the cluster overnight, with a script checking for dead OSDs every
second, and restarting them.

40 OSD failures occurred in 12 hours, some OSDs failed multiple times,
(there are 50 OSDs in the EC tier).

Unfortunately, the output of collectl doesn't appear to show any
increase in disk queue depth and service times before the OSDs die.

I've put a couple of examples of collectl output for the disks
associated with the OSDs here:

https://hastebin.com/icuvotemot.scala

please let me know if you need more info...

best regards,

Jake




Re: [ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-06-02 Thread Mark Nelson
I got a chance to run this by Josh and he had a good thought.  Just to 
make sure that it's not IO backing up on the device, it probably makes 
sense to repeat the test and watch what the queue depth and service 
times look like.  I like using collectl for it:


"collectl -sD -oT"

The queue depth and service times should be on the right hand side.  If 
you run that while the test is running and can catch a log before the 
OSD goes down, we can see if the service times on the device are going 
crazy.  If not, that would probably indicate a bluestore bug.
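
(If collectl isn't installed, sysstat's iostat gives a roughly
equivalent view - queue depth in avgqu-sz, latency in await:

iostat -dxt 1

but collectl's output is what I'm used to reading.)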


Mark

On 06/01/2017 05:03 PM, Mark Nelson wrote:

Looking at this gdb output, it looks like all of the tp_osd_tp threads
are idling around except for three that are all waiting on a PG lock.  I
bet those sit there for 60s and eventually time out.  The kv_sync_thread
looks idle so I don't think that's it.  Thread 16 is doing
OSD::trim_maps but I don't really know what the locking semantics are
there.  Maybe Sage or Josh can chime in.

In any event, here are the tp_osd_tp threads waiting on a pg lock:

Thread 64 (Thread 0x7f5ed184e700 (LWP 3545048)):
#0  __lll_lock_wait () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x7f5ef68f3d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x7f5ef68f3c08 in __GI___pthread_mutex_lock
(mutex=0x7f5f50baf4e8) at pthread_mutex_lock.c:64
#3  0x7f5ef9c95e48 in Mutex::Lock (this=this@entry=0x7f5f50baf4d8,
no_lockdep=no_lockdep@entry=false)
at /usr/src/debug/ceph-12.0.3/src/common/Mutex.cc:110
#4  0x7f5ef97f4d63 in PG::lock (this=0x7f5f50baf000,
no_lockdep=no_lockdep@entry=false) at
/usr/src/debug/ceph-12.0.3/src/osd/PG.cc:363
#5  0x7f5ef9796751 in OSD::ShardedOpWQ::_process
(this=0x7f5f04437198, thread_index=, hb=0x7f5f05b27100)
at /usr/src/debug/ceph-12.0.3/src/osd/OSD.cc:9706
#6  0x7f5ef9cbb235 in ShardedThreadPool::shardedthreadpool_worker
(this=0x7f5f04436958, thread_index=)
at /usr/src/debug/ceph-12.0.3/src/common/WorkQueue.cc:354
#7  0x7f5ef9cbd390 in ShardedThreadPool::WorkThreadSharded::entry
(this=)
at /usr/src/debug/ceph-12.0.3/src/common/WorkQueue.h:685
#8  0x7f5ef68f1dc5 in start_thread (arg=0x7f5ed184e700) at
pthread_create.c:308
#9  0x7f5ef57e173d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 61 (Thread 0x7f5ed3051700 (LWP 3545041)):
#0  __lll_lock_wait () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x7f5ef68f3d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x7f5ef68f3c08 in __GI___pthread_mutex_lock
(mutex=0x7f5f07f844e8) at pthread_mutex_lock.c:64
#3  0x7f5ef9c95e48 in Mutex::Lock (this=this@entry=0x7f5f07f844d8,
no_lockdep=no_lockdep@entry=false)
at /usr/src/debug/ceph-12.0.3/src/common/Mutex.cc:110
#4  0x7f5ef97f4d63 in PG::lock (this=0x7f5f07f84000,
no_lockdep=no_lockdep@entry=false) at
/usr/src/debug/ceph-12.0.3/src/osd/PG.cc:363
#5  0x7f5ef9796751 in OSD::ShardedOpWQ::_process
(this=0x7f5f04437198, thread_index=, hb=0x7f5f05b26d80)
at /usr/src/debug/ceph-12.0.3/src/osd/OSD.cc:9706
#6  0x7f5ef9cbb235 in ShardedThreadPool::shardedthreadpool_worker
(this=0x7f5f04436958, thread_index=)
at /usr/src/debug/ceph-12.0.3/src/common/WorkQueue.cc:354
#7  0x7f5ef9cbd390 in ShardedThreadPool::WorkThreadSharded::entry
(this=)
at /usr/src/debug/ceph-12.0.3/src/common/WorkQueue.h:685
#8  0x7f5ef68f1dc5 in start_thread (arg=0x7f5ed3051700) at
pthread_create.c:308
#9  0x7f5ef57e173d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 57 (Thread 0x7f5ed5055700 (LWP 3545035)):
#0  __lll_lock_wait () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x7f5ef68f3d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x7f5ef68f3c08 in __GI___pthread_mutex_lock
(mutex=0x7f5f4f7674e8) at pthread_mutex_lock.c:64
#3  0x7f5ef9c95e48 in Mutex::Lock (this=this@entry=0x7f5f4f7674d8,
no_lockdep=no_lockdep@entry=false)
at /usr/src/debug/ceph-12.0.3/src/common/Mutex.cc:110
#4  0x7f5ef97f4d63 in PG::lock (this=0x7f5f4f767000,
no_lockdep=no_lockdep@entry=false) at
/usr/src/debug/ceph-12.0.3/src/osd/PG.cc:363
#5  0x7f5ef9796751 in OSD::ShardedOpWQ::_process
(this=0x7f5f04437198, thread_index=, hb=0x7f5f05b26900)
at /usr/src/debug/ceph-12.0.3/src/osd/OSD.cc:9706
#6  0x7f5ef9cbb235 in ShardedThreadPool::shardedthreadpool_worker
(this=0x7f5f04436958, thread_index=)
at /usr/src/debug/ceph-12.0.3/src/common/WorkQueue.cc:354
#7  0x7f5ef9cbd390 in ShardedThreadPool::WorkThreadSharded::entry
(this=)
at /usr/src/debug/ceph-12.0.3/src/common/WorkQueue.h:685
#8  0x7f5ef68f1dc5 in start_thread (arg=0x7f5ed5055700) at
pthread_create.c:308
#9  0x7f5ef57e173d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Mark

On 06/01/2017 01:31 AM, nokia ceph wrote:


Hello Mark,

Please find  the attached gdb.txt file which having 'thread apply all
bt' result.

Thanks
Jayaram






On Wed, May 31, 

Re: [ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-06-01 Thread Jake Grimmett
Hi Mark,

Firstly, many thanks for looking into this.

Jayaram appears to have a similar config to mine:

v12.0.3, EC 4+1 bluestore
SciLin 7.3 - 3.10.0-514.21.1.el7.x86_64

I have 5 EC nodes (10 x 8TB ironwolf each) plus 2 nodes with replicated
NVMe (Cephfs hot tier)

I now think the Highpoint r750 rocket cards are not at fault; I swapped
the r750 on one node for LSI cards, but still had OSD errors occurring
on this node.

The OSD logs recovered are huge, typically exceeding 500MB. I've trimmed
one down to just over 500K and pasted it here:

http://pasted.co/f0c49591

Hopefully this has some useful info.


One other thing, probably a red herring, is that when trying to run
"ceph pg repair" or "ceph pg deep-scrub" I get this...
"Error EACCES: access denied"

thanks again for your help,

Jake


Re: [ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-05-31 Thread Mark Nelson



On 05/31/2017 05:21 AM, nokia ceph wrote:

+ ceph-devel ..

$ps -ef | grep 294
ceph 3539720   1 14 08:04 ?00:16:35 /usr/bin/ceph-osd -f
--cluster ceph --id 294 --setuser ceph --setgroup ceph

$gcore -o coredump-osd  3539720


$(gdb) bt
#0  0x7f5ef68f56d5 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x7f5ef9cc45ab in ceph::logging::Log::entry() ()
#2  0x7f5ef68f1dc5 in start_thread () from /lib64/libpthread.so.0
#3  0x7f5ef57e173d in clone () from /lib64/libc.so.6


Can you do a thread apply all bt instead so we can see what the 
tp_osd_tp threads were doing?  That might be the bigger clue.
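
Something along these lines should capture it non-interactively
(standard gdb options; substitute the real OSD pid, and the debug
symbols need to be installed for the output to be useful):

sudo gdb -p <osd-pid> --batch -ex 'set pagination off' -ex 'thread apply all bt' > osd-threads.txt 2>&1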


Mark




*2017-05-31 10:11:51.064495 7f5ef383b700  1 heartbeat_map is_healthy
'tp_osd_tp thread tp_osd_tp' had timed out after 60*
*2017-05-31 10:11:51.064496 7f5ef383b700  1 heartbeat_map is_healthy
'tp_osd_tp thread tp_osd_tp' had timed out after 60*
*2017-05-31 10:11:51.064497 7f5ef383b700  1 heartbeat_map is_healthy
'tp_osd_tp thread tp_osd_tp' had timed out after 60*
*2017-05-31 10:11:51.064497 7f5ef383b700  1 heartbeat_map is_healthy
'tp_osd_tp thread tp_osd_tp' had timed out after 60*
*2017-05-31 10:11:51.064498 7f5ef383b700  1 heartbeat_map is_healthy
'tp_osd_tp thread tp_osd_tp' had timed out after 60*
2017-05-31 10:11:51.957092 7f5eef88a700 -1 osd.294 1004 heartbeat_check:
no reply from 10.50.62.154:6858  osd.64 since
back 2017-05-31 09:58:53.016145 front 2017-05-31 09:58:53.016145 (cutoff
2017-05-31 10:11:31.957089)
2017-05-31 10:11:51.957114 7f5eef88a700 -1 osd.294 1004 heartbeat_check:
no reply from 10.50.62.154:6931  osd.82 since
back 2017-05-31 10:04:48.204500 front 2017-05-31 10:04:48.204500 (cutoff
2017-05-31 10:11:31.957089)
2017-05-31 10:11:51.957121 7f5eef88a700 -1 osd.294 1004 heartbeat_check:
no reply from 10.50.62.152:6929  osd.162 since
back 2017-05-31 09:57:37.451595 front 2017-05-31 09:57:37.451595 (cutoff
2017-05-31 10:11:31.957089)



Thanks
Jayaram

On Tue, May 30, 2017 at 7:33 PM, nokia ceph wrote:

Hello Mark,

Yes this issue happens once the test/write started after 60 secs
which correspond config value -- "threadpool_default_timeout = 60 "
. Do you require the down OSD coredump to analyse tp_osd_tp state. .
Please be specify which process dump you would require to analyse.

Like ,
#gcore 

or using wallclock profiler, I'm not much aware how to use this tool.

Thanks
Jayaram

On Tue, May 30, 2017 at 6:57 PM, Mark Nelson wrote:

On 05/30/2017 05:07 AM, nokia ceph wrote:

Hello Mark,

I can able to reproduce this problem everytime.


Ok, next question, does it happen 60s after starting the 200MB/s
load, or does it take a while?  Sounds like it's pretty random
across OSDs? I'm thinking we want to figure out what state the
tp_osd_tp threads are in when this is happening (maybe via a gdb
bt or using the wallclock profiler to gather several samples)
and then figure out via the logs what lead to the chain of
events that put it in that state.

Mark


Env:-- 5 node, v12.0.3, EC 4+1 bluestore , RHEL 7.3 -
3.10.0-514.el7.x86_64

Tested with debug bluestore = 20...

From ceph watch
===

2017-05-30 08:57:33.510794 mon.0 [INF] pgmap v15649: 8192
pgs: 8192
active+clean; 774 GB data, 1359 GB used, 1854 TB / 1855 TB
avail; 35167
B/s rd, 77279 kB/s wr, 726 op/s
2017-05-30 08:57:35.134604 mon.0 [INF] pgmap v15650: 8192
pgs: 8192
active+clean; 774 GB data, 1359 GB used, 1854 TB / 1855 TB
avail; 19206
B/s rd, 63835 kB/s wr, 579 op/s
2017-05-30 08:57:36.295086 mon.0 [INF] pgmap v15651: 8192
pgs: 8192
active+clean; 774 GB data, 1359 GB used, 1854 TB / 1855 TB
avail; 10999
B/s rd, 44093 kB/s wr, 379 op/s
2017-05-30 08:57:37.228422 mon.0 [INF] osd.243
10.50.62.154:6895/1842853 
> failed (4 reporters from
different
host after 20.062159 >= grace 20.00)
2017-05-30 08:57:37.234308 mon.0 [INF] osd.19
10.50.62.152:6856/1940715 
> failed (4 reporters from
different

host after 20.000234 >= grace 20.00)
2017-05-30 08:57:37.368342 mon.0 [INF] pgmap v15652: 8192
pgs: 8192
active+clean; 774 GB data, 1360 GB 

Re: [ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-05-31 Thread nokia ceph
+ ceph-devel ..

$ps -ef | grep 294
ceph 3539720   1 14 08:04 ?00:16:35 /usr/bin/ceph-osd -f
--cluster ceph --id 294 --setuser ceph --setgroup ceph

$gcore -o coredump-osd  3539720


$(gdb) bt
#0  0x7f5ef68f56d5 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x7f5ef9cc45ab in ceph::logging::Log::entry() ()
#2  0x7f5ef68f1dc5 in start_thread () from /lib64/libpthread.so.0
#3  0x7f5ef57e173d in clone () from /lib64/libc.so.6


*2017-05-31 10:11:51.064495 7f5ef383b700  1 heartbeat_map is_healthy
'tp_osd_tp thread tp_osd_tp' had timed out after 60*
*2017-05-31 10:11:51.064496 7f5ef383b700  1 heartbeat_map is_healthy
'tp_osd_tp thread tp_osd_tp' had timed out after 60*
*2017-05-31 10:11:51.064497 7f5ef383b700  1 heartbeat_map is_healthy
'tp_osd_tp thread tp_osd_tp' had timed out after 60*
*2017-05-31 10:11:51.064497 7f5ef383b700  1 heartbeat_map is_healthy
'tp_osd_tp thread tp_osd_tp' had timed out after 60*
*2017-05-31 10:11:51.064498 7f5ef383b700  1 heartbeat_map is_healthy
'tp_osd_tp thread tp_osd_tp' had timed out after 60*
2017-05-31 10:11:51.957092 7f5eef88a700 -1 osd.294 1004 heartbeat_check: no
reply from 10.50.62.154:6858 osd.64 since back 2017-05-31 09:58:53.016145
front 2017-05-31 09:58:53.016145 (cutoff 2017-05-31 10:11:31.957089)
2017-05-31 10:11:51.957114 7f5eef88a700 -1 osd.294 1004 heartbeat_check: no
reply from 10.50.62.154:6931 osd.82 since back 2017-05-31 10:04:48.204500
front 2017-05-31 10:04:48.204500 (cutoff 2017-05-31 10:11:31.957089)
2017-05-31 10:11:51.957121 7f5eef88a700 -1 osd.294 1004 heartbeat_check: no
reply from 10.50.62.152:6929 osd.162 since back 2017-05-31 09:57:37.451595
front 2017-05-31 09:57:37.451595 (cutoff 2017-05-31 10:11:31.957089)



Thanks
Jayaram

On Tue, May 30, 2017 at 7:33 PM, nokia ceph 
wrote:

> Hello Mark,
>
> Yes this issue happens once the test/write started after 60 secs which
> correspond config value -- "threadpool_default_timeout = 60 " . Do you
> require the down OSD coredump to analyse tp_osd_tp state. . Please be
> specify which process dump you would require to analyse.
>
> Like ,
> #gcore 
>
> or using wallclock profiler, I'm not much aware how to use this tool.
>
> Thanks
> Jayaram
>
> On Tue, May 30, 2017 at 6:57 PM, Mark Nelson  wrote:
>
>> On 05/30/2017 05:07 AM, nokia ceph wrote:
>>
>>> Hello Mark,
>>>
>>> I can able to reproduce this problem everytime.
>>>
>>
>> Ok, next question, does it happen 60s after starting the 200MB/s load, or
>> does it take a while?  Sounds like it's pretty random across OSDs? I'm
>> thinking we want to figure out what state the tp_osd_tp threads are in when
>> this is happening (maybe via a gdb bt or using the wallclock profiler to
>> gather several samples) and then figure out via the logs what lead to the
>> chain of events that put it in that state.
>>
>> Mark
>>
>>
>>> Env:-- 5 node, v12.0.3, EC 4+1 bluestore , RHEL 7.3 -
>>> 3.10.0-514.el7.x86_64
>>>
>>> Tested with debug bluestore = 20...
>>>
>>> From ceph watch
>>> ===
>>>
>>> 2017-05-30 08:57:33.510794 mon.0 [INF] pgmap v15649: 8192 pgs: 8192
>>> active+clean; 774 GB data, 1359 GB used, 1854 TB / 1855 TB avail; 35167
>>> B/s rd, 77279 kB/s wr, 726 op/s
>>> 2017-05-30 08:57:35.134604 mon.0 [INF] pgmap v15650: 8192 pgs: 8192
>>> active+clean; 774 GB data, 1359 GB used, 1854 TB / 1855 TB avail; 19206
>>> B/s rd, 63835 kB/s wr, 579 op/s
>>> 2017-05-30 08:57:36.295086 mon.0 [INF] pgmap v15651: 8192 pgs: 8192
>>> active+clean; 774 GB data, 1359 GB used, 1854 TB / 1855 TB avail; 10999
>>> B/s rd, 44093 kB/s wr, 379 op/s
>>> 2017-05-30 08:57:37.228422 mon.0 [INF] osd.243 10.50.62.154:6895/1842853
>>>  failed (4 reporters from different
>>> host after 20.062159 >= grace 20.00)
>>> 2017-05-30 08:57:37.234308 mon.0 [INF] osd.19 10.50.62.152:6856/1940715
>>>  failed (4 reporters from different
>>>
>>> host after 20.000234 >= grace 20.00)
>>> 2017-05-30 08:57:37.368342 mon.0 [INF] pgmap v15652: 8192 pgs: 8192
>>> active+clean; 774 GB data, 1360 GB used, 1854 TB / 1855 TB avail; 12628
>>> B/s rd, 50471 kB/s wr, 444 op/s
>>> 2017-05-30 08:57:37.506743 mon.0 [INF] osdmap e848: 340 osds: 338 up,
>>> 340 in
>>>
>>>
>>> From failed osd.19 log
>>> =
>>>
>>> ===
>>> 2017-05-30 08:57:04.155836 7f9d6c723700 10
>>> bluestore(/var/lib/ceph/osd/ceph-19) _omap_setkeys 1.1e85s4_head
>>> 4#1:a178head# = 0
>>> 2017-05-30 08:57:04.155840 7f9d6c723700 10
>>> bluestore(/var/lib/ceph/osd/ceph-19) _txc_calc_cost 0x7f9da2d1f180 cost
>>> 3542664 (2 ios * 150 + 542664 bytes)
>>> 2017-05-30 08:57:04.155841 7f9d6c723700 20
>>> bluestore(/var/lib/ceph/osd/ceph-19) _txc_write_nodes txc 0x7f9da2d1f180
>>> onodes 0x7f9da6651840 shared_blobs
>>> 2017-05-30 08:57:04.155843 7f9d6c723700 20
>>> bluestore.extentmap(0x7f9da6651930) update
>>> 4#1:a179feb3:::%2fc20%2fvx033%2fpot%2fchannel1%2fhls%2fv
>>> 

Re: [ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-05-30 Thread nokia ceph
Hello Mark,

Yes, this issue happens about 60 secs after the test/write is started,
which corresponds to the config value "threadpool_default_timeout = 60".
Do you require a coredump of the down OSD to analyse the tp_osd_tp state?
Please specify which process dump you would need for the analysis.

Like,
#gcore 

or using the wallclock profiler - I'm not very familiar with how to use
that tool.

Thanks
Jayaram

On Tue, May 30, 2017 at 6:57 PM, Mark Nelson  wrote:

> On 05/30/2017 05:07 AM, nokia ceph wrote:
>
>> Hello Mark,
>>
>> I can able to reproduce this problem everytime.
>>
>
> Ok, next question, does it happen 60s after starting the 200MB/s load, or
> does it take a while?  Sounds like it's pretty random across OSDs? I'm
> thinking we want to figure out what state the tp_osd_tp threads are in when
> this is happening (maybe via a gdb bt or using the wallclock profiler to
> gather several samples) and then figure out via the logs what lead to the
> chain of events that put it in that state.
>
> Mark
>
>
>> Env:-- 5 node, v12.0.3, EC 4+1 bluestore , RHEL 7.3 -
>> 3.10.0-514.el7.x86_64
>>
>> Tested with debug bluestore = 20...
>>
>> From ceph watch
>> ===
>>
>> 2017-05-30 08:57:33.510794 mon.0 [INF] pgmap v15649: 8192 pgs: 8192
>> active+clean; 774 GB data, 1359 GB used, 1854 TB / 1855 TB avail; 35167
>> B/s rd, 77279 kB/s wr, 726 op/s
>> 2017-05-30 08:57:35.134604 mon.0 [INF] pgmap v15650: 8192 pgs: 8192
>> active+clean; 774 GB data, 1359 GB used, 1854 TB / 1855 TB avail; 19206
>> B/s rd, 63835 kB/s wr, 579 op/s
>> 2017-05-30 08:57:36.295086 mon.0 [INF] pgmap v15651: 8192 pgs: 8192
>> active+clean; 774 GB data, 1359 GB used, 1854 TB / 1855 TB avail; 10999
>> B/s rd, 44093 kB/s wr, 379 op/s
>> 2017-05-30 08:57:37.228422 mon.0 [INF] osd.243 10.50.62.154:6895/1842853
>>  failed (4 reporters from different
>> host after 20.062159 >= grace 20.00)
>> 2017-05-30 08:57:37.234308 mon.0 [INF] osd.19 10.50.62.152:6856/1940715
>>  failed (4 reporters from different
>>
>> host after 20.000234 >= grace 20.00)
>> 2017-05-30 08:57:37.368342 mon.0 [INF] pgmap v15652: 8192 pgs: 8192
>> active+clean; 774 GB data, 1360 GB used, 1854 TB / 1855 TB avail; 12628
>> B/s rd, 50471 kB/s wr, 444 op/s
>> 2017-05-30 08:57:37.506743 mon.0 [INF] osdmap e848: 340 osds: 338 up, 340
>> in
>>
>>
>> From failed osd.19 log
>> =
>>
>> ===
>> 2017-05-30 08:57:04.155836 7f9d6c723700 10
>> bluestore(/var/lib/ceph/osd/ceph-19) _omap_setkeys 1.1e85s4_head
>> 4#1:a178head# = 0
>> 2017-05-30 08:57:04.155840 7f9d6c723700 10
>> bluestore(/var/lib/ceph/osd/ceph-19) _txc_calc_cost 0x7f9da2d1f180 cost
>> 3542664 (2 ios * 150 + 542664 bytes)
>> 2017-05-30 08:57:04.155841 7f9d6c723700 20
>> bluestore(/var/lib/ceph/osd/ceph-19) _txc_write_nodes txc 0x7f9da2d1f180
>> onodes 0x7f9da6651840 shared_blobs
>> 2017-05-30 08:57:04.155843 7f9d6c723700 20
>> bluestore.extentmap(0x7f9da6651930) update
>> 4#1:a179feb3:::%2fc20%2fvx033%2fpot%2fchannel1%2fhls%
>> 2fvar266%2f0%2fseg0015.ts:head#
>> 2017-05-30 08:57:04.155852 7f9d6c723700 20
>> bluestore.extentmap(0x7f9da6651930) update  inline shard 607 bytes from
>> 2 extents
>> 2017-05-30 08:57:04.155858 7f9d6c723700 20
>> bluestore(/var/lib/ceph/osd/ceph-19)   onode
>> 4#1:a179feb3:::%2fc20%2fvx033%2fpot%2fchannel1%2fhls%
>> 2fvar266%2f0%2fseg0015.ts:head#
>> is 1348 (735 bytes onode + 2 bytes spanning blobs + 611 bytes inline
>> extents)
>> 2017-05-30 08:57:04.155876 7f9d6c723700 20
>> bluestore(/var/lib/ceph/osd/ceph-19) _txc_finalize_kv txc 0x7f9da2d1f180
>> allocated 0x[388af1~9] released 0x[]
>> 2017-05-30 08:57:04.155880 7f9d6c723700 10 freelist allocate
>> 0x388af1~9
>> 2017-05-30 08:57:04.155882 7f9d6c723700 20 freelist _xor first_key
>> 0x388af0 last_key 0x388af8
>> 2017-05-30 08:57:07.698556 7f9d6a71f700 20
>> bluestore(/var/lib/ceph/osd/ceph-19)
>> statfsstore_statfs(0x574506f/0x5753b99, stored
>> 0xa389a531/0xa70e, compress 0x0/0x0/0x0)
>> 2017-05-30 08:57:08.201089 7f9d6a71f700 20
>> bluestore(/var/lib/ceph/osd/ceph-19)
>> statfsstore_statfs(0x574506f/0x5753b99, stored
>> 0xa389a531/0xa70e, compress 0x0/0x0/0x0)
>> 2017-05-30 08:57:09.903290 7f9d6a71f700 20
>> bluestore(/var/lib/ceph/osd/ceph-19)
>> statfsstore_statfs(0x574506f/0x5753b99, stored
>> 0xa389a531/0xa70e, compress 0x0/0x0/0x0)
>> 2017-05-30 08:57:11.605590 7f9d6a71f700 20
>> bluestore(/var/lib/ceph/osd/ceph-19)
>> statfsstore_statfs(0x574506f/0x5753b99, stored
>> 0xa389a531/0xa70e, compress 0x0/0x0/0x0)
>> 2017-05-30 08:57:15.707908 7f9d6a71f700 20
>> bluestore(/var/lib/ceph/osd/ceph-19)
>> statfsstore_statfs(0x574506f/0x5753b99, stored
>> 0xa389a531/0xa70e, compress 0x0/0x0/0x0)
>> 2017-05-30 08:57:17.410365 7f9d6a71f700 20
>> bluestore(/var/lib/ceph/osd/ceph-19)
>> statfsstore_statfs(0x574506f/0x5753b99, stored
>> 

Re: [ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-05-30 Thread Mark Nelson

On 05/30/2017 05:07 AM, nokia ceph wrote:

Hello Mark,

I can able to reproduce this problem everytime.


Ok, next question, does it happen 60s after starting the 200MB/s load, 
or does it take a while?  Sounds like it's pretty random across OSDs? 
I'm thinking we want to figure out what state the tp_osd_tp threads are 
in when this is happening (maybe via a gdb bt or using the wallclock 
profiler to gather several samples) and then figure out via the logs 
what led to the chain of events that put it in that state.


Mark



Env:-- 5 node, v12.0.3, EC 4+1 bluestore , RHEL 7.3 - 3.10.0-514.el7.x86_64

Tested with debug bluestore = 20...

From ceph watch
===

2017-05-30 08:57:33.510794 mon.0 [INF] pgmap v15649: 8192 pgs: 8192
active+clean; 774 GB data, 1359 GB used, 1854 TB / 1855 TB avail; 35167
B/s rd, 77279 kB/s wr, 726 op/s
2017-05-30 08:57:35.134604 mon.0 [INF] pgmap v15650: 8192 pgs: 8192
active+clean; 774 GB data, 1359 GB used, 1854 TB / 1855 TB avail; 19206
B/s rd, 63835 kB/s wr, 579 op/s
Re: [ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-05-30 Thread nokia ceph
Hello Mark,

I am able to reproduce this problem every time.

Env: 5 nodes, v12.0.3, EC 4+1 bluestore, RHEL 7.3 (kernel 3.10.0-514.el7.x86_64)

Tested with debug bluestore = 20...
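
For anyone wanting to reproduce this, one way to get that debug level is
sketched below (option name as used in this era of Ceph; verify it against
your build):

  # ceph.conf on the OSD nodes, then restart the OSDs
  [osd]
      debug bluestore = 20/20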

From ceph watch
===

2017-05-30 08:57:33.510794 mon.0 [INF] pgmap v15649: 8192 pgs: 8192
active+clean; 774 GB data, 1359 GB used, 1854 TB / 1855 TB avail; 35167 B/s
rd, 77279 kB/s wr, 726 op/s
2017-05-30 08:57:35.134604 mon.0 [INF] pgmap v15650: 8192 pgs: 8192
active+clean; 774 GB data, 1359 GB used, 1854 TB / 1855 TB avail; 19206 B/s
rd, 63835 kB/s wr, 579 op/s
2017-05-30 08:57:36.295086 mon.0 [INF] pgmap v15651: 8192 pgs: 8192
active+clean; 774 GB data, 1359 GB used, 1854 TB / 1855 TB avail; 10999 B/s
rd, 44093 kB/s wr, 379 op/s
2017-05-30 08:57:37.228422 mon.0 [INF] osd.243 10.50.62.154:6895/1842853
failed (4 reporters from different host after 20.062159 >= grace 20.00)
2017-05-30 08:57:37.234308 mon.0 [INF] osd.19 10.50.62.152:6856/1940715
failed (4 reporters from different host after 20.000234 >= grace 20.00)
2017-05-30 08:57:37.368342 mon.0 [INF] pgmap v15652: 8192 pgs: 8192
active+clean; 774 GB data, 1360 GB used, 1854 TB / 1855 TB avail; 12628 B/s
rd, 50471 kB/s wr, 444 op/s
2017-05-30 08:57:37.506743 mon.0 [INF] osdmap e848: 340 osds: 338 up, 340 in


From failed osd.19 log
===

2017-05-30 08:57:04.155836 7f9d6c723700 10
bluestore(/var/lib/ceph/osd/ceph-19) _omap_setkeys 1.1e85s4_head
4#1:a178head# = 0
2017-05-30 08:57:04.155840 7f9d6c723700 10
bluestore(/var/lib/ceph/osd/ceph-19) _txc_calc_cost 0x7f9da2d1f180 cost
3542664 (2 ios * 150 + 542664 bytes)
2017-05-30 08:57:04.155841 7f9d6c723700 20
bluestore(/var/lib/ceph/osd/ceph-19) _txc_write_nodes txc 0x7f9da2d1f180
onodes 0x7f9da6651840 shared_blobs
2017-05-30 08:57:04.155843 7f9d6c723700 20
bluestore.extentmap(0x7f9da6651930) update
4#1:a179feb3:::%2fc20%2fvx033%2fpot%2fchannel1%2fhls%2fvar266%2f0%2fseg0015.ts:head#
2017-05-30 08:57:04.155852 7f9d6c723700 20
bluestore.extentmap(0x7f9da6651930) update  inline shard 607 bytes from 2
extents
2017-05-30 08:57:04.155858 7f9d6c723700 20
bluestore(/var/lib/ceph/osd/ceph-19)   onode
4#1:a179feb3:::%2fc20%2fvx033%2fpot%2fchannel1%2fhls%2fvar266%2f0%2fseg0015.ts:head#
is 1348 (735 bytes onode + 2 bytes spanning blobs + 611 bytes inline
extents)
2017-05-30 08:57:04.155876 7f9d6c723700 20
bluestore(/var/lib/ceph/osd/ceph-19) _txc_finalize_kv txc 0x7f9da2d1f180
allocated 0x[388af1~9] released 0x[]
2017-05-30 08:57:04.155880 7f9d6c723700 10 freelist allocate
0x388af1~9
2017-05-30 08:57:04.155882 7f9d6c723700 20 freelist _xor first_key
0x388af0 last_key 0x388af8
2017-05-30 08:57:07.698556 7f9d6a71f700 20
bluestore(/var/lib/ceph/osd/ceph-19)
statfsstore_statfs(0x574506f/0x5753b99, stored
0xa389a531/0xa70e, compress 0x0/0x0/0x0)
2017-05-30 08:57:08.201089 7f9d6a71f700 20
bluestore(/var/lib/ceph/osd/ceph-19)
statfsstore_statfs(0x574506f/0x5753b99, stored
0xa389a531/0xa70e, compress 0x0/0x0/0x0)
2017-05-30 08:57:09.903290 7f9d6a71f700 20
bluestore(/var/lib/ceph/osd/ceph-19)
statfsstore_statfs(0x574506f/0x5753b99, stored
0xa389a531/0xa70e, compress 0x0/0x0/0x0)
2017-05-30 08:57:11.605590 7f9d6a71f700 20
bluestore(/var/lib/ceph/osd/ceph-19)
statfsstore_statfs(0x574506f/0x5753b99, stored
0xa389a531/0xa70e, compress 0x0/0x0/0x0)
2017-05-30 08:57:15.707908 7f9d6a71f700 20
bluestore(/var/lib/ceph/osd/ceph-19)
statfsstore_statfs(0x574506f/0x5753b99, stored
0xa389a531/0xa70e, compress 0x0/0x0/0x0)
2017-05-30 08:57:17.410365 7f9d6a71f700 20
bluestore(/var/lib/ceph/osd/ceph-19)
statfsstore_statfs(0x574506f/0x5753b99, stored
0xa389a531/0xa70e, compress 0x0/0x0/0x0)
2017-05-30 08:57:18.512773 7f9d6a71f700 20
bluestore(/var/lib/ceph/osd/ceph-19)
statfsstore_statfs(0x574506f/0x5753b99, stored
0xa389a531/0xa70e, compress 0x0/0x0/0x0)
2017-05-30 08:57:20.005812 7f9d8f712700  1 heartbeat_map is_healthy
'tp_osd_tp thread tp_osd_tp' had timed out after 60
2017-05-30 08:57:20.006648 7f9d8ff13700  1 heartbeat_map is_healthy
'tp_osd_tp thread tp_osd_tp' had timed out after 60
2017-05-30 08:57:20.015585 7f9d8f712700  1 heartbeat_map is_healthy
'tp_osd_tp thread tp_osd_tp' had timed out after 60
2017-05-30 08:57:20.015677 7f9d8f712700  1 heartbeat_map is_healthy
'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-05-30 08:57:37.506457 7f9d875fd700 20
bluestore(/var/lib/ceph/osd/ceph-19) _do_write perform garbage collection,
expected benefit = 0 AUs
2017-05-30 08:57:37.506459 7f9d875fd700 20
bluestore.extentmap(0x7f9da0ed50b0) dirty_range mark inline shard dirty
2017-05-30 08:57:37.506461 7f9d875fd700 10
bluestore(/var/lib/ceph/osd/ceph-19) _write meta
#-1:7b3f43c4:::osd_superblock:0# 0x0~1f8 = 0
2017-05-30 08:57:37.506463 7f9d875fd700 10
bluestore(/var/lib/ceph/osd/ceph-19) _txc_calc_cost 0x7f9da2d22680 cost
3104677 (2 ios * 150 + 104677 bytes)
2017-05-30 08:57:37.506479 
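
A side note for anyone trying to correlate the "timed out after 60" and the
"grace 20.00" messages above with settings: the relevant timeouts can be read
back from the failed OSD's admin socket. A rough sketch, assuming the admin
socket for osd.19 is reachable on its host:

  # dump the running config and pick out the timeout-related options
  ceph daemon osd.19 config show | grep -E 'thread_timeout|suicide_timeout|heartbeat_grace'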

Re: [ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-05-28 Thread Mark Nelson

Hi Jake,

200MB/s is pretty low load across 5 servers. I wouldn't expect the
tp_osd_tp threads to be so heavily loaded that they stop responding for
60s. Sounds like a bug. Can you reproduce it? It might be worth
trying it with debug bluestore = 20.
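
If it is easier, the debug level can also be raised at runtime rather than in
ceph.conf; a rough sketch (osd.46 is used only because that is the log quoted
below, and the syntax is as I recall it for this vintage of the CLI):

  # all OSDs at once
  ceph tell osd.* injectargs '--debug_bluestore 20/20'
  # or a single daemon, through its admin socket
  ceph daemon osd.46 config set debug_bluestore 20/20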


Mark

On 05/27/2017 05:02 AM, Jake Grimmett wrote:

Dear All,

I wonder if anyone can give advice regarding bluestore OSDs going down on
Luminous 12.0.3 when the cluster is under moderate (200MB/s) load.

OSDs seem to go down randomly across the 5 OSD servers. When one OSD is
down, the load decreases, so no further OSDs drop until I restart the OSD;
then another fails.

There are no obvious disk errors seen in /var/log/messages.

Here, though, is part of ceph-osd.46.log...

2017-05-27 10:42:28.781821 7f7c503b4700  0 log_channel(cluster) log [WRN]
: slow request 120.247733 seconds old, received at 2017-05-27
10:40:28.534048: osd_op(osd.52.1251:8732 1.3ffs0 1.a8ec73ff (undecoded)
ondisk+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected
e1342) currently queued_for_pg
2017-05-27 10:42:33.748109 7f7c52bf1700  1 heartbeat_map is_healthy
'tp_osd_tp thread tp_osd_tp' had timed out after 60

- these two errors repeat, with more of the 'tp_osd_tp thread tp_osd_tp'
errors.

Hopefully this is not due to the highpoint rocket r750 cards in my OSD
servers (the OSD servers are all 45drive.com storinators)

Other info - each node has 64GB RAM, 10 x 8TB Ironwolf drives, a 10Gb Intel
NIC, and a single E5-2620 v4.

Any advice gratefully received!

thanks,

Jake




[ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-05-27 Thread Jake Grimmett
Dear All,

I wonder if anyone can give advice regarding bluestore OSDs going down on
Luminous 12.0.3 when the cluster is under moderate (200MB/s) load.

OSDs seem to go down randomly across the 5 OSD servers. When one OSD is
down, the load decreases, so no further OSDs drop until I restart the OSD;
then another fails.

There are no obvious disk errors seen in /var/log/messages.

Here, though, is part of ceph-osd.46.log...

2017-05-27 10:42:28.781821 7f7c503b4700  0 log_channel(cluster) log [WRN]
: slow request 120.247733 seconds old, received at 2017-05-27
10:40:28.534048: osd_op(osd.52.1251:8732 1.3ffs0 1.a8ec73ff (undecoded)
ondisk+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected
e1342) currently queued_for_pg
2017-05-27 10:42:33.748109 7f7c52bf1700  1 heartbeat_map is_healthy
'tp_osd_tp thread tp_osd_tp' had timed out after 60

- these two errors repeat, with more of the 'tp_osd_tp thread tp_osd_tp'
errors.
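
For reference, the stuck OSD's in-flight and recent slow ops can also be
dumped over its admin socket, roughly as below (osd.46 used only because that
is the log above):

  ceph daemon osd.46 dump_ops_in_flight
  ceph daemon osd.46 dump_historic_ops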

Hopefully this is not due to the highpoint rocket r750 cards in my OSD
servers (the OSD servers are all 45drive.com storinators)

Other info - each node has 64GB RAM, 10 x 8TB Ironwolf drives, a 10Gb Intel
NIC, and a single E5-2620 v4.

Any advice gratefully received!

thanks,

Jake