Re: osd crash during resync

2012-01-26 Thread Martin Mailand

Hi Sage,
I uploaded the osd.0 log as well.

http://85.214.49.87/ceph/20120124/osd.0.log.bz2

-martin

On 25.01.2012 23:08, Sage Weil wrote:

Hi Martin,

On Tue, 24 Jan 2012, Martin Mailand wrote:

Hi,
today I tried the btrfs patch mentioned on the btrfs ML. To do so, I rebooted
osd.0 with a new kernel, created a new btrfs on osd.0, and then took osd.0
back into the cluster. During the resync of osd.0, osd.2 and osd.3 crashed.
I am not sure whether the crashes happened because I played with osd.0, or
whether they are bugs.


osd.2
-rw-------  1 root root 1.1G 2012-01-24 12:19
core-ceph-osd-1000-1327403927-s-brick-002

log:
2012-01-24 12:15:45.563135 7f1fdd42c700 log [INF] : 2.a restarting backfill on
osd.0 from (185'113859,185'113859] 0//0 to 196'114038
osd/PG.cc: In function 'void PG::finish_recovery_op(const hobject_t&, bool)',
in thread '7f1fdab26700'
osd/PG.cc: 1553: FAILED assert(recovery_ops_active > 0)
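
(For context: this assert checks that finish_recovery_op() is only called
while at least one recovery op is still accounted as active. A minimal
sketch of that invariant, with simplified names rather than the actual
Ceph source:

  // Sketch only: the PG keeps a count of in-flight recovery ops; every
  // finish_recovery_op() must pair with an earlier start_recovery_op().
  #include <cassert>

  struct PG {
    int recovery_ops_active = 0;

    void start_recovery_op()  { ++recovery_ops_active; }

    void finish_recovery_op() {
      // Fires if a finish arrives without a matching start, e.g. if a
      // restarted backfill reset the bookkeeping while ops were in flight.
      assert(recovery_ops_active > 0);
      --recovery_ops_active;
    }
  };

  int main() {
    PG pg;
    pg.start_recovery_op();
    pg.finish_recovery_op();    // ok: paired with the start above
    // pg.finish_recovery_op(); // unpaired: would trip the assert
  }

The "restarting backfill" line just above makes that kind of unpaired
finish plausible.)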

-rw-------  1 root root 758M 2012-01-24 15:58
core-ceph-osd-20755-1327417128-s-brick-002


Can you post the log for osd.0 too?

Thanks!
sage





log:
2012-01-24 15:58:48.356892 7fe26acbf700 osd.2 379 pg[2.ff( v 379'286211 lc
202'286160 (185'285159,379'286211] n=112 ec=1 les/c 379/310 373/376/376) [2,1]
r=0 lpr=376 rops=1 mlcod 202'286160 active m=6]  * oi->watcher: client.4478
cookie=1
osd/ReplicatedPG.cc: In function 'void
ReplicatedPG::populate_obc_watchers(ReplicatedPG::ObjectContext*)', in thread
'7fe26fdca700'
osd/ReplicatedPG.cc: 3199: FAILED assert(obc->watchers.size() == 0)
osd/ReplicatedPG.cc: In function 'void
ReplicatedPG::populate_obc_watchers(ReplicatedPG::ObjectContext*)', in thread
'7fe26fdca700'

http://85.214.49.87/ceph/20120124/osd.2.log.bz2
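
(The watcher assert above says populate_obc_watchers() found an object
context that already had watchers attached; it expects to rebuild the
in-memory watcher list only on a freshly created, empty context. Roughly,
with simplified types rather than the actual Ceph source:

  #include <cassert>
  #include <cstdint>
  #include <map>

  struct ObjectContext {
    std::map<uint64_t, int> watchers;  // cookie -> watching client (simplified)
  };

  void populate_obc_watchers(ObjectContext* obc,
                             const std::map<uint64_t, int>& on_disk) {
    // A non-empty list means the context is being populated twice, which
    // would register the same watch again; this is what the osd.2 assert
    // caught.
    assert(obc->watchers.size() == 0);
    obc->watchers = on_disk;
  }

  int main() {
    ObjectContext obc;
    populate_obc_watchers(&obc, {{1, 4478}});    // fresh context: ok
    // populate_obc_watchers(&obc, {{1, 4478}}); // second call would assert
  }

The client.4478/cookie=1 pair in the log would correspond to one such
watcher entry.)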



osd.3
-rw-------  1 root root 986M 2012-01-24 12:24
core-ceph-osd-962-1327404263-s-brick-003

log:
2012-01-24 12:15:50.241321 7f30c8fde700 log [INF] : 2.2e restarting backfill
on osd.0 from (185'338312,185'338312] 0//0 to 196'339910
2012-01-24 12:21:48.420242 7f30c5ed7700 log [INF] : 2.9d scrub ok
osd/PG.cc: In function 'void PG::activate(ObjectStore::Transaction&,
std::list<Context*>&, std::map<int, std::map<pg_t, PG::Query> >&,
std::map<int, MOSDPGInfo*>*)', in thread '7f30c8fde700'

http://85.214.49.87/ceph/20120124/osd.3.log.bz2



-martin




Re: osd crash during resync

2012-01-25 Thread Sage Weil
Hi Martin,

On Tue, 24 Jan 2012, Martin Mailand wrote:
 Hi,
 today I tried the btrfs patch mentioned on the btrfs ML. To do so, I rebooted
 osd.0 with a new kernel, created a new btrfs on osd.0, and then took osd.0
 back into the cluster. During the resync of osd.0, osd.2 and osd.3 crashed.
 I am not sure whether the crashes happened because I played with osd.0, or
 whether they are bugs.
 
 
 osd.2
 -rw-------  1 root root 1.1G 2012-01-24 12:19
 core-ceph-osd-1000-1327403927-s-brick-002
 
 log:
 2012-01-24 12:15:45.563135 7f1fdd42c700 log [INF] : 2.a restarting backfill on
 osd.0 from (185'113859,185'113859] 0//0 to 196'114038
 osd/PG.cc: In function 'void PG::finish_recovery_op(const hobject_t&, bool)',
 in thread '7f1fdab26700'
 osd/PG.cc: 1553: FAILED assert(recovery_ops_active > 0)
 
 -rw-------  1 root root 758M 2012-01-24 15:58
 core-ceph-osd-20755-1327417128-s-brick-002

Can you post the log for osd.0 too?

Thanks!
sage



 
 log:
 2012-01-24 15:58:48.356892 7fe26acbf700 osd.2 379 pg[2.ff( v 379'286211 lc
 202'286160 (185'285159,379'286211] n=112 ec=1 les/c 379/310 373/376/376) [2,1]
 r=0 lpr=376 rops=1 mlcod 202'286160 active m=6]  * oi->watcher: client.4478
 cookie=1
 osd/ReplicatedPG.cc: In function 'void
 ReplicatedPG::populate_obc_watchers(ReplicatedPG::ObjectContext*)', in thread
 '7fe26fdca700'
 osd/ReplicatedPG.cc: 3199: FAILED assert(obc->watchers.size() == 0)
 osd/ReplicatedPG.cc: In function 'void
 ReplicatedPG::populate_obc_watchers(ReplicatedPG::ObjectContext*)', in thread
 '7fe26fdca700'
 
 http://85.214.49.87/ceph/20120124/osd.2.log.bz2
 
 
 
 osd.3
 -rw-------  1 root root 986M 2012-01-24 12:24
 core-ceph-osd-962-1327404263-s-brick-003
 
 log:
 2012-01-24 12:15:50.241321 7f30c8fde700 log [INF] : 2.2e restarting backfill
 on osd.0 from (185'338312,185'338312] 0//0 to 196'339910
 2012-01-24 12:21:48.420242 7f30c5ed7700 log [INF] : 2.9d scrub ok
 osd/PG.cc: In function 'void PG::activate(ObjectStore::Transaction&,
 std::list<Context*>&, std::map<int, std::map<pg_t, PG::Query> >&,
 std::map<int, MOSDPGInfo*>*)', in thread '7f30c8fde700'
 
 http://85.214.49.87/ceph/20120124/osd.3.log.bz2
 
 
 
 -martin
 
 


Re: osd crash during resync

2012-01-24 Thread Gregory Farnum
On Tue, Jan 24, 2012 at 10:48 AM, Martin Mailand mar...@tuxadero.com wrote:
 Hi,
 today I tried the btrfs patch mentioned on the btrfs ML. To do so, I
 rebooted osd.0 with a new kernel, created a new btrfs on osd.0, and then
 took osd.0 back into the cluster. During the resync of osd.0, osd.2 and
 osd.3 crashed.
 I am not sure whether the crashes happened because I played with osd.0, or
 whether they are bugs.

These are OSD-level issues not caused by btrfs, so your new kernel
definitely didn't do it. It's probably fallout from the backfill
changes that were merged last week. I created new bugs to track
them: http://tracker.newdream.net/issues/1982 (1983, 1984). Sam and
Josh are busy with some other issues that we've turned up, so these
have been added to the queue and will be picked up as soon as somebody
qualified can get to them. :)
-Greg


Re: osd crash during resync

2012-01-24 Thread Martin Mailand

Hi Greg,
ok, do you guys still need the core files, or could I delete them?

-martin

On 24.01.2012 22:13, Gregory Farnum wrote:

On Tue, Jan 24, 2012 at 10:48 AM, Martin Mailand mar...@tuxadero.com wrote:

Hi,
today I tried the btrfs patch mentioned on the btrfs ML. To do so, I
rebooted osd.0 with a new kernel, created a new btrfs on osd.0, and then
took osd.0 back into the cluster. During the resync of osd.0, osd.2 and
osd.3 crashed.
I am not sure whether the crashes happened because I played with osd.0, or
whether they are bugs.


These are OSD-level issues not caused by btrfs, so your new kernel
definitely didn't do it. It's probably fallout from the backfill
changes that were merged last week. I created new bugs to track
them: http://tracker.newdream.net/issues/1982 (1983, 1984). Sam and
Josh are busy with some other issues that we've turned up, so these
have been added to the queue and will be picked up as soon as somebody
qualified can get to them. :)
-Greg


Re: osd crash during resync

2012-01-24 Thread Gregory Farnum
On Tue, Jan 24, 2012 at 1:22 PM, Martin Mailand mar...@tuxadero.com wrote:
 Hi Greg,
 ok, do you guys still need the core files, or could I delete them?

Sam thinks probably not since we have the backtraces and the
logs...thanks for asking, though! :)
-Greg