Re: [ceph-users] mds isn't working anymore after osd's running full

2014-11-12 Thread Jasper Siero
Hello Greg,

The specific PG was always deep scrubbing (ceph pg dump all showed that the last 
deep scrub of this PG was in August), but now when I look at it again the deep 
scrub is finished and everything is healthy. Maybe it is solved because the mds 
is running fine now and it unlocked something.

The problem is solved now :)

Thanks!

Jasper

From: Gregory Farnum [g...@gregs42.com]
Sent: Tuesday, November 11, 2014 19:19
To: Jasper Siero
CC: ceph-users
Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

On Tue, Nov 11, 2014 at 5:06 AM, Jasper Siero
jasper.si...@target-holding.nl wrote:
 No problem, thanks for helping.
 I don't want to disable the deep scrubbing process itself because it's very 
 useful, but one placement group (3.30) is continuously deep scrubbing; it 
 should finish after some time, but it won't.

Hmm, how are you determining that this one PG won't stop scrubbing?
This doesn't sound like any issues familiar to me.
-Greg
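
For reference, the scrub state of a PG can be inspected directly; a minimal
sketch (pg 3.30 is the one from this thread, and the output details vary by
release):

  ceph pg dump | grep scrubbing   # list PGs currently in a scrubbing state
  ceph pg 3.30 query              # per-PG details, including last scrub stamps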


Re: [ceph-users] mds isn't working anymore after osd's running full

2014-11-11 Thread Jasper Siero
No problem, thanks for helping.
I don't want to disable the deep scrubbing process itself because it's very 
useful, but one placement group (3.30) is continuously deep scrubbing; it 
should finish after some time, but it won't.

Jasper

From: Gregory Farnum [g...@gregs42.com]
Sent: Monday, November 10, 2014 18:24
To: Jasper Siero
CC: ceph-users; John Spray
Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

It's supposed to do that; deep scrubbing is an ongoing
consistency-check mechanism. If you really want to disable it you can
set an osdmap flag to prevent it, but you'll have to check the docs
for exactly what that is as I can't recall.
Glad things are working for you; sorry it took so long!
-Greg
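
For reference, the osdmap flag Greg is thinking of is nodeep-scrub (there is
also noscrub for regular scrubs); toggling it is a one-liner, though as he
says, check the docs for your release:

  ceph osd set nodeep-scrub     # stop scheduling new deep scrubs
  ceph osd unset nodeep-scrub   # enable them again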

On Mon, Nov 10, 2014 at 8:49 AM, Jasper Siero
jasper.si...@target-holding.nl wrote:
 Hello John and Greg,

 I used the new patch and now the undump succeeded; the mds is working fine 
 and I can mount cephfs again!

 I still have one placement group which keeps deep scrubbing even after 
 restarting the ceph cluster:
 dumped all in format plain
 3.30    0   0   0   0   0   0   0   active+clean+scrubbing+deep
 2014-11-10 17:21:15.866965   0'0   2414:418   [1,9]   1   [1,9]   1
 631'3463   2014-08-21 15:14:45.430926   602'3131   2014-08-18 15:14:37.494913

 Is there a way to solve this?

 Kind regards,

 Jasper
 
 From: Gregory Farnum [g...@gregs42.com]
 Sent: Friday, November 7, 2014 22:42
 To: Jasper Siero
 CC: ceph-users; John Spray
 Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

 On Thu, Nov 6, 2014 at 11:49 AM, John Spray john.sp...@redhat.com wrote:
 This is still an issue on master, so a fix will be coming soon.
 Follow the ticket for updates:
 http://tracker.ceph.com/issues/10025

 Thanks for finding the bug!

John is off on vacation, but he pushed a branch wip-10025-firefly;
if you install that (similar address to the other one) it should
work for you. You'll need to reset and undump again (I presume you
still have the journal-as-a-file). I'll be merging them into the
stable branches pretty shortly as well.
 -Greg


 John

 On Thu, Nov 6, 2014 at 6:21 PM, John Spray john.sp...@redhat.com wrote:
 Jasper,

 Thanks for this -- I've reproduced this issue in a development
 environment.  We'll see if this is also an issue on giant, and
 backport a fix if appropriate.  I'll update this thread soon.

 Cheers,
 John

 On Mon, Nov 3, 2014 at 8:49 AM, Jasper Siero
 jasper.si...@target-holding.nl wrote:
 Hello Greg,

 I saw that the site behind the previous log link uses a very short 
 expiry time, so I uploaded the log to another one:

 http://www.mediafire.com/download/gikiy7cqs42cllt/ceph-mds.th1-mon001.log.tar.gz

 Thanks,

 Jasper

 
 From: gregory.far...@inktank.com [gregory.far...@inktank.com] on behalf of 
 Gregory Farnum [gfar...@redhat.com]
 Sent: Thursday, October 30, 2014 1:03
 To: Jasper Siero
 CC: John Spray; ceph-users
 Subject: Re: [ceph-users] mds isn't working anymore after osd's running 
 full

 On Wed, Oct 29, 2014 at 7:51 AM, Jasper Siero
 jasper.si...@target-holding.nl wrote:
 Hello Greg,

 I added the debug options which you mentioned and started the process 
 again:

 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
 /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
 --reset-journal 0
 old journal was 9483323613~134233517
 new journal start will be 9621733376 (4176246 bytes past old end)
 writing journal head
 writing EResetJournal entry
 done
 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 -c 
 /etc/ceph/ceph.conf --cluster ceph --undump-journal 0 
 journaldumptgho-mon001
 undump journaldumptgho-mon001
 start 9483323613 len 134213311
 writing header 200.
  writing 9483323613~1048576
  [... identical 1 MB journal writes trimmed; the complete log appears in the
  2014-10-29 message later in this thread ...]
  writing 9520023773

Re: [ceph-users] mds isn't working anymore after osd's running full

2014-11-10 Thread Jasper Siero
Hello Greg and John,

Thanks for solving the bug. I will compile the patch and make new rpm packages 
and test it on the Ceph cluster. I will let you know what the results are.

Kind regards,

Jasper

From: Gregory Farnum [g...@gregs42.com]
Sent: Friday, November 7, 2014 22:42
To: Jasper Siero
CC: ceph-users; John Spray
Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

On Thu, Nov 6, 2014 at 11:49 AM, John Spray john.sp...@redhat.com wrote:
 This is still an issue on master, so a fix will be coming soon.
 Follow the ticket for updates:
 http://tracker.ceph.com/issues/10025

 Thanks for finding the bug!

John is off on vacation, but he pushed a branch wip-10025-firefly;
if you install that (similar address to the other one) it should
work for you. You'll need to reset and undump again (I presume you
still have the journal-as-a-file). I'll be merging them into the
stable branches pretty shortly as well.
-Greg


 John

 On Thu, Nov 6, 2014 at 6:21 PM, John Spray john.sp...@redhat.com wrote:
 Jasper,

 Thanks for this -- I've reproduced this issue in a development
 environment.  We'll see if this is also an issue on giant, and
 backport a fix if appropriate.  I'll update this thread soon.

 Cheers,
 John

 On Mon, Nov 3, 2014 at 8:49 AM, Jasper Siero
 jasper.si...@target-holding.nl wrote:
 Hello Greg,

 I saw that the site behind the previous log link uses a very short 
 expiry time, so I uploaded the log to another one:

 http://www.mediafire.com/download/gikiy7cqs42cllt/ceph-mds.th1-mon001.log.tar.gz

 Thanks,

 Jasper

 
 From: gregory.far...@inktank.com [gregory.far...@inktank.com] on behalf of Gregory 
 Farnum [gfar...@redhat.com]
 Sent: Thursday, October 30, 2014 1:03
 To: Jasper Siero
 CC: John Spray; ceph-users
 Subject: Re: [ceph-users] mds isn't working anymore after osd's running 
 full

 On Wed, Oct 29, 2014 at 7:51 AM, Jasper Siero
 jasper.si...@target-holding.nl wrote:
 Hello Greg,

 I added the debug options which you mentioned and started the process 
 again:

 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
 /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
 --reset-journal 0
 old journal was 9483323613~134233517
 new journal start will be 9621733376 (4176246 bytes past old end)
 writing journal head
 writing EResetJournal entry
 done
 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 -c 
 /etc/ceph/ceph.conf --cluster ceph --undump-journal 0 
 journaldumptgho-mon001
 undump journaldumptgho-mon001
 start 9483323613 len 134213311
 writing header 200.
  writing 9483323613~1048576
  [... identical 1 MB journal writes trimmed; the complete log appears in the
  2014-10-29 message later in this thread ...]
  writing 9566161117~1048576

Re: [ceph-users] mds isn't working anymore after osd's running full

2014-11-10 Thread Jasper Siero
Hello John and Greg,

I used the new patch and now the undump succeeded; the mds is working fine 
and I can mount cephfs again!

I still have one placement group which keeps deep scrubbing even after 
restarting the ceph cluster:
dumped all in format plain
3.30    0   0   0   0   0   0   0   active+clean+scrubbing+deep
2014-11-10 17:21:15.866965   0'0   2414:418   [1,9]   1   [1,9]   1
631'3463   2014-08-21 15:14:45.430926   602'3131   2014-08-18 15:14:37.494913

Is there a way to solve this?

Kind regards,

Jasper

From: Gregory Farnum [g...@gregs42.com]
Sent: Friday, November 7, 2014 22:42
To: Jasper Siero
CC: ceph-users; John Spray
Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

On Thu, Nov 6, 2014 at 11:49 AM, John Spray john.sp...@redhat.com wrote:
 This is still an issue on master, so a fix will be coming soon.
 Follow the ticket for updates:
 http://tracker.ceph.com/issues/10025

 Thanks for finding the bug!

John is off on vacation, but he pushed a branch wip-10025-firefly;
if you install that (similar address to the other one) it should
work for you. You'll need to reset and undump again (I presume you
still have the journal-as-a-file). I'll be merging them into the
stable branches pretty shortly as well.
-Greg


 John

 On Thu, Nov 6, 2014 at 6:21 PM, John Spray john.sp...@redhat.com wrote:
 Jasper,

 Thanks for this -- I've reproduced this issue in a development
 environment.  We'll see if this is also an issue on giant, and
 backport a fix if appropriate.  I'll update this thread soon.

 Cheers,
 John

 On Mon, Nov 3, 2014 at 8:49 AM, Jasper Siero
 jasper.si...@target-holding.nl wrote:
 Hello Greg,

 I saw that the site behind the previous log link uses a very short 
 expiry time, so I uploaded the log to another one:

 http://www.mediafire.com/download/gikiy7cqs42cllt/ceph-mds.th1-mon001.log.tar.gz

 Thanks,

 Jasper

 
 From: gregory.far...@inktank.com [gregory.far...@inktank.com] on behalf of Gregory 
 Farnum [gfar...@redhat.com]
 Sent: Thursday, October 30, 2014 1:03
 To: Jasper Siero
 CC: John Spray; ceph-users
 Subject: Re: [ceph-users] mds isn't working anymore after osd's running 
 full

 On Wed, Oct 29, 2014 at 7:51 AM, Jasper Siero
 jasper.si...@target-holding.nl wrote:
 Hello Greg,

 I added the debug options which you mentioned and started the process 
 again:

 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
 /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
 --reset-journal 0
 old journal was 9483323613~134233517
 new journal start will be 9621733376 (4176246 bytes past old end)
 writing journal head
 writing EResetJournal entry
 done
 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 -c 
 /etc/ceph/ceph.conf --cluster ceph --undump-journal 0 
 journaldumptgho-mon001
 undump journaldumptgho-mon001
 start 9483323613 len 134213311
 writing header 200.
  writing 9483323613~1048576
  [... identical 1 MB journal writes trimmed; the complete log appears in the
  2014-10-29 message later in this thread ...]
  writing 9551481053~1048576

Re: [ceph-users] mds isn't working anymore after osd's running full

2014-11-03 Thread Jasper Siero
Hello Greg,

I saw that the site behind the previous log link uses a very short expiry 
time, so I uploaded the log to another one:

http://www.mediafire.com/download/gikiy7cqs42cllt/ceph-mds.th1-mon001.log.tar.gz

Thanks,

Jasper


From: gregory.far...@inktank.com [gregory.far...@inktank.com] on behalf of Gregory 
Farnum [gfar...@redhat.com]
Sent: Thursday, October 30, 2014 1:03
To: Jasper Siero
CC: John Spray; ceph-users
Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

On Wed, Oct 29, 2014 at 7:51 AM, Jasper Siero
jasper.si...@target-holding.nl wrote:
 Hello Greg,

 I added the debug options which you mentioned and started the process again:

 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
 /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
 --reset-journal 0
 old journal was 9483323613~134233517
 new journal start will be 9621733376 (4176246 bytes past old end)
 writing journal head
 writing EResetJournal entry
 done
 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 -c /etc/ceph/ceph.conf 
 --cluster ceph --undump-journal 0 journaldumptgho-mon001
 undump journaldumptgho-mon001
 start 9483323613 len 134213311
 writing header 200.
  writing 9483323613~1048576
  [... identical 1 MB journal writes trimmed; the complete log appears in the
  2014-10-29 message later in this thread ...]
  writing 9616492765~1044159
 done.
 [root

Re: [ceph-users] mds isn't working anymore after osd's running full

2014-10-30 Thread Jasper Siero
Hello Greg,

You are right, I missed a comment before [mds] in ceph.conf. :-)
The new log file can be downloaded below because it's too big to send:

http://expirebox.com/download/1bdbc2c1b71c784da2bcd0a28e3cdf97.html

Thanks,

Jasper

From: gregory.far...@inktank.com [gregory.far...@inktank.com] on behalf of Gregory 
Farnum [gfar...@redhat.com]
Sent: Thursday, October 30, 2014 1:03
To: Jasper Siero
CC: John Spray; ceph-users
Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

On Wed, Oct 29, 2014 at 7:51 AM, Jasper Siero
jasper.si...@target-holding.nl wrote:
 Hello Greg,

 I added the debug options which you mentioned and started the process again:

 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
 /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
 --reset-journal 0
 old journal was 9483323613~134233517
 new journal start will be 9621733376 (4176246 bytes past old end)
 writing journal head
 writing EResetJournal entry
 done
 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 -c /etc/ceph/ceph.conf 
 --cluster ceph --undump-journal 0 journaldumptgho-mon001
 undump journaldumptgho-mon001
 start 9483323613 len 134213311
 writing header 200.
  writing 9483323613~1048576
  [... identical 1 MB journal writes trimmed; the complete log appears in the
  2014-10-29 message later in this thread ...]
  writing 9616492765~1044159
 done.
 [root

Re: [ceph-users] mds isn't working anymore after osd's running full

2014-10-29 Thread Jasper Siero
Hello Greg,

I added the debug options which you mentioned and started the process again:

[root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
/var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
--reset-journal 0
old journal was 9483323613~134233517
new journal start will be 9621733376 (4176246 bytes past old end)
writing journal head
writing EResetJournal entry
done
[root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 -c /etc/ceph/ceph.conf 
--cluster ceph --undump-journal 0 journaldumptgho-mon001 
undump journaldumptgho-mon001
start 9483323613 len 134213311
writing header 200.
 writing 9483323613~1048576
 writing 9484372189~1048576
 writing 9485420765~1048576
 writing 9486469341~1048576
 writing 9487517917~1048576
 writing 9488566493~1048576
 writing 9489615069~1048576
 writing 9490663645~1048576
 writing 9491712221~1048576
 writing 9492760797~1048576
 writing 9493809373~1048576
 writing 9494857949~1048576
 writing 9495906525~1048576
 writing 9496955101~1048576
 writing 9498003677~1048576
 writing 9499052253~1048576
 writing 9500100829~1048576
 writing 9501149405~1048576
 writing 9502197981~1048576
 writing 9503246557~1048576
 writing 9504295133~1048576
 writing 9505343709~1048576
 writing 9506392285~1048576
 writing 9507440861~1048576
 writing 9508489437~1048576
 writing 9509538013~1048576
 writing 9510586589~1048576
 writing 9511635165~1048576
 writing 9512683741~1048576
 writing 9513732317~1048576
 writing 9514780893~1048576
 writing 9515829469~1048576
 writing 9516878045~1048576
 writing 9517926621~1048576
 writing 9518975197~1048576
 writing 9520023773~1048576
 writing 9521072349~1048576
 writing 9522120925~1048576
 writing 9523169501~1048576
 writing 9524218077~1048576
 writing 9525266653~1048576
 writing 9526315229~1048576
 writing 9527363805~1048576
 writing 9528412381~1048576
 writing 9529460957~1048576
 writing 9530509533~1048576
 writing 9531558109~1048576
 writing 9532606685~1048576
 writing 9533655261~1048576
 writing 9534703837~1048576
 writing 9535752413~1048576
 writing 9536800989~1048576
 writing 9537849565~1048576
 writing 9538898141~1048576
 writing 9539946717~1048576
 writing 9540995293~1048576
 writing 9542043869~1048576
 writing 9543092445~1048576
 writing 9544141021~1048576
 writing 9545189597~1048576
 writing 9546238173~1048576
 writing 9547286749~1048576
 writing 9548335325~1048576
 writing 9549383901~1048576
 writing 9550432477~1048576
 writing 9551481053~1048576
 writing 9552529629~1048576
 writing 9553578205~1048576
 writing 9554626781~1048576
 writing 9555675357~1048576
 writing 9556723933~1048576
 writing 9557772509~1048576
 writing 9558821085~1048576
 writing 9559869661~1048576
 writing 9560918237~1048576
 writing 9561966813~1048576
 writing 9563015389~1048576
 writing 9564063965~1048576
 writing 9565112541~1048576
 writing 9566161117~1048576
 writing 9567209693~1048576
 writing 9568258269~1048576
 writing 9569306845~1048576
 writing 9570355421~1048576
 writing 9571403997~1048576
 writing 9572452573~1048576
 writing 9573501149~1048576
 writing 9574549725~1048576
 writing 9575598301~1048576
 writing 9576646877~1048576
 writing 9577695453~1048576
 writing 9578744029~1048576
 writing 9579792605~1048576
 writing 9580841181~1048576
 writing 9581889757~1048576
 writing 9582938333~1048576
 writing 9583986909~1048576
 writing 9585035485~1048576
 writing 9586084061~1048576
 writing 9587132637~1048576
 writing 9588181213~1048576
 writing 9589229789~1048576
 writing 9590278365~1048576
 writing 9591326941~1048576
 writing 9592375517~1048576
 writing 9593424093~1048576
 writing 9594472669~1048576
 writing 9595521245~1048576
 writing 9596569821~1048576
 writing 9597618397~1048576
 writing 9598666973~1048576
 writing 9599715549~1048576
 writing 9600764125~1048576
 writing 9601812701~1048576
 writing 9602861277~1048576
 writing 9603909853~1048576
 writing 9604958429~1048576
 writing 9606007005~1048576
 writing 9607055581~1048576
 writing 9608104157~1048576
 writing 9609152733~1048576
 writing 9610201309~1048576
 writing 9611249885~1048576
 writing 9612298461~1048576
 writing 9613347037~1048576
 writing 9614395613~1048576
 writing 9615444189~1048576
 writing 9616492765~1044159
done.
[root@th1-mon001 ~]# service ceph start mds
=== mds.th1-mon001 === 
Starting Ceph mds.th1-mon001 on th1-mon001...
starting mds.th1-mon001 at :/0


The new logs:
http://pastebin.com/wqqjuEpy


Kind regards,

Jasper


From: gregory.far...@inktank.com [gregory.far...@inktank.com] on behalf of Gregory 
Farnum [gfar...@redhat.com]
Sent: Tuesday, October 28, 2014 19:26
To: Jasper Siero
CC: John Spray; ceph-users
Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

You'll need to gather a log with the offsets visible; you can do this
with debug ms = 1; debug mds = 20; debug journaler = 20.
-Greg
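
In ceph.conf form those settings go under the [mds] section (as comes up
elsewhere in this thread):

  [mds]
  debug ms = 1
  debug mds = 20
  debug journaler = 20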

On Fri, Oct 24, 2014 at 7:03 AM, Jasper Siero
jasper.si...@target-holding.nl wrote:
 Hello Greg and John,

 I used

Re: [ceph-users] mds isn't working anymore after osd's running full

2014-10-24 Thread Jasper Siero
Hello Greg and John,

I used the patch on the ceph cluster and tried it again:
 /usr/bin/ceph-mds -i th1-mon001 -c /etc/ceph/ceph.conf --cluster ceph 
--undump-journal 0 journaldumptgho-mon001
undump journaldumptgho-mon001
start 9483323613 len 134213311
writing header 200.
writing 9483323613~1048576
writing 9484372189~1048576


writing 9614395613~1048576
writing 9615444189~1048576
writing 9616492765~1044159
done.

It went well, without errors, and after that I restarted the mds.
The status went from up:replay to up:reconnect to up:rejoin (lagged or crashed).

In the log there is an error about trim_to < trimming_pos, and it is like Greg 
mentioned: maybe the dump file needs to be truncated to the proper length, 
followed by resetting and undumping again.

How can I truncate the dumped file to the correct length?
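
One possibility, assuming the dump really is the sparse file described
elsewhere in this thread (the journal data sits at offset 'start', so the
proper total size would be start + len): coreutils truncate can cut a file
to an exact size. A sketch with the numbers from this thread:

  # start 9483323613 len 134213311, per the undump output above
  truncate -s $((9483323613 + 134213311)) journaldumptgho-mon001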

The mds log during the undumping and starting the mds:
http://pastebin.com/y14pSvM0

Kind Regards,

Jasper

From: john.sp...@inktank.com [john.sp...@inktank.com] on behalf of John Spray 
[john.sp...@redhat.com]
Sent: Thursday, October 16, 2014 12:23
To: Jasper Siero
CC: Gregory Farnum; ceph-users
Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

Following up: firefly fix for undump is: https://github.com/ceph/ceph/pull/2734

Jasper: if you still need to try undumping on this existing firefly
cluster, then you can download ceph-mds packages from this
wip-firefly-undump branch from
http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/

Cheers,
John

On Wed, Oct 15, 2014 at 8:15 PM, John Spray john.sp...@redhat.com wrote:
 Sadly undump has been broken for quite some time (it was fixed in
 giant as part of creating cephfs-journal-tool).  If there's a one-line
 fix for this then it's probably worth putting in firefly, since it's a
 long-term supported branch -- I'll do that now.

 John

 On Wed, Oct 15, 2014 at 8:23 AM, Jasper Siero
 jasper.si...@target-holding.nl wrote:
 Hello Greg,

 The dump and reset of the journal were successful:

 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
 /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
 --dump-journal 0 journaldumptgho-mon001
 journal is 9483323613~134215459
 read 134213311 bytes at offset 9483323613
 wrote 134213311 bytes at offset 9483323613 to journaldumptgho-mon001
 NOTE: this is a _sparse_ file; you can
 $ tar cSzf journaldumptgho-mon001.tgz journaldumptgho-mon001
   to efficiently compress it while preserving sparseness.

 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
 /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
 --reset-journal 0
 old journal was 9483323613~134215459
 new journal start will be 9621733376 (4194304 bytes past old end)
 writing journal head
 writing EResetJournal entry
 done


 Undumping the journal was not successful; looking into the error, 
 client_lock.is_locked() shows up several times. The mds is not running 
 when I start the undumping, so maybe I have forgotten something?

 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
 /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
 --undump-journal 0 journaldumptgho-mon001
 undump journaldumptgho-mon001
 start 9483323613 len 134213311
 writing header 200.
 osdc/Objecter.cc: In function 'ceph_tid_t 
 Objecter::op_submit(Objecter::Op*)' thread 7fec3e5ad7a0 time 2014-10-15 
 09:09:32.020287
 osdc/Objecter.cc: 1225: FAILED assert(client_lock.is_locked())
  ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
  1: /usr/bin/ceph-mds() [0x80f15e]
  2: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
  3: (main()+0x1632) [0x569c62]
  4: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
  5: /usr/bin/ceph-mds() [0x567d99]
  NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
 interpret this.
 2014-10-15 09:09:32.021313 7fec3e5ad7a0 -1 osdc/Objecter.cc: In function 
 'ceph_tid_t Objecter::op_submit(Objecter::Op*)' thread 7fec3e5ad7a0 time 
 2014-10-15 09:09:32.020287
 osdc/Objecter.cc: 1225: FAILED assert(client_lock.is_locked())

  ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
  1: /usr/bin/ceph-mds() [0x80f15e]
  2: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
  3: (main()+0x1632) [0x569c62]
  4: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
  5: /usr/bin/ceph-mds() [0x567d99]
  NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
 interpret this.

  0 2014-10-15 09:09:32.021313 7fec3e5ad7a0 -1 osdc/Objecter.cc: In 
 function 'ceph_tid_t Objecter::op_submit(Objecter::Op*)' thread 7fec3e5ad7a0 
 time 2014-10-15 09:09:32.020287
 osdc/Objecter.cc: 1225: FAILED assert(client_lock.is_locked())

  ceph version 0.80.5 (38b73c67d375a2552d8ed67843c
 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --p8a65c2c0feba6)
  1: /usr/bin/ceph-mds() [0x80f15e]
  2: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
  3: (main

Re: [ceph-users] mds isn't working anymore after osd's running full

2014-10-16 Thread Jasper Siero
Hi John,

Thanks, I will look into it. Is there already a new Giant release date?

Jasper

From: john.sp...@inktank.com [john.sp...@inktank.com] on behalf of John Spray 
[john.sp...@redhat.com]
Sent: Thursday, October 16, 2014 12:23
To: Jasper Siero
CC: Gregory Farnum; ceph-users
Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

Following up: firefly fix for undump is: https://github.com/ceph/ceph/pull/2734

Jasper: if you still need to try undumping on this existing firefly
cluster, then you can download ceph-mds packages from this
wip-firefly-undump branch from
http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/

Cheers,
John

On Wed, Oct 15, 2014 at 8:15 PM, John Spray john.sp...@redhat.com wrote:
 Sadly undump has been broken for quite some time (it was fixed in
 giant as part of creating cephfs-journal-tool).  If there's a one-line
 fix for this then it's probably worth putting in firefly, since it's a
 long-term supported branch -- I'll do that now.

 John

 On Wed, Oct 15, 2014 at 8:23 AM, Jasper Siero
 jasper.si...@target-holding.nl wrote:
 Hello Greg,

 The dump and reset of the journal were successful:

 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
 /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
 --dump-journal 0 journaldumptgho-mon001
 journal is 9483323613~134215459
 read 134213311 bytes at offset 9483323613
 wrote 134213311 bytes at offset 9483323613 to journaldumptgho-mon001
 NOTE: this is a _sparse_ file; you can
 $ tar cSzf journaldumptgho-mon001.tgz journaldumptgho-mon001
   to efficiently compress it while preserving sparseness.

 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
 /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
 --reset-journal 0
 old journal was 9483323613~134215459
 new journal start will be 9621733376 (4194304 bytes past old end)
 writing journal head
 writing EResetJournal entry
 done


 Undumping the journal was not successful; looking into the error, 
 client_lock.is_locked() shows up several times. The mds is not running 
 when I start the undumping, so maybe I have forgotten something?

 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
 /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
 --undump-journal 0 journaldumptgho-mon001
 undump journaldumptgho-mon001
 start 9483323613 len 134213311
 writing header 200.
 osdc/Objecter.cc: In function 'ceph_tid_t 
 Objecter::op_submit(Objecter::Op*)' thread 7fec3e5ad7a0 time 2014-10-15 
 09:09:32.020287
 osdc/Objecter.cc: 1225: FAILED assert(client_lock.is_locked())
  ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
  1: /usr/bin/ceph-mds() [0x80f15e]
  2: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
  3: (main()+0x1632) [0x569c62]
  4: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
  5: /usr/bin/ceph-mds() [0x567d99]
  NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
 interpret this.
 2014-10-15 09:09:32.021313 7fec3e5ad7a0 -1 osdc/Objecter.cc: In function 
 'ceph_tid_t Objecter::op_submit(Objecter::Op*)' thread 7fec3e5ad7a0 time 
 2014-10-15 09:09:32.020287
 osdc/Objecter.cc: 1225: FAILED assert(client_lock.is_locked())

  ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
  1: /usr/bin/ceph-mds() [0x80f15e]
  2: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
  3: (main()+0x1632) [0x569c62]
  4: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
  5: /usr/bin/ceph-mds() [0x567d99]
  NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
 interpret this.

  0 2014-10-15 09:09:32.021313 7fec3e5ad7a0 -1 osdc/Objecter.cc: In 
 function 'ceph_tid_t Objecter::op_submit(Objecter::Op*)' thread 7fec3e5ad7a0 
 time 2014-10-15 09:09:32.020287
 osdc/Objecter.cc: 1225: FAILED assert(client_lock.is_locked())

  ceph version 0.80.5 (38b73c67d375a2552d8ed67843c
 [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --p8a65c2c0feba6)
  1: /usr/bin/ceph-mds() [0x80f15e]
  2: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
  3: (main()+0x1632) [0x569c62]
  4: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
  5: /usr/bin/ceph-mds() [0x567d99]
  NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
 interpret this.

 terminate called after throwing an instance of 'ceph::FailedAssertion'
 *** Caught signal (Aborted) **
  in thread 7fec3e5ad7a0
  ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
  1: /usr/bin/ceph-mds() [0x82ef61]
  2: (()+0xf710) [0x7fec3d9a6710]
  3: (gsignal()+0x35) [0x7fec3ca7c635]
  4: (abort()+0x175) [0x7fec3ca7de15]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fec3d336a5d]
  6: (()+0xbcbe6) [0x7fec3d334be6]
  7: (()+0xbcc13) [0x7fec3d334c13]
  8: (()+0xbcd0e) [0x7fec3d334d0e]
  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
 const*)+0x7f2) [0x94b812]
  10: /usr/bin/ceph-mds() [0x80f15e

Re: [ceph-users] mds isn't working anymore after osd's running full

2014-10-15 Thread Jasper Siero
/bin/ceph-mds() [0x82ef61]
 2: (()+0xf710) [0x7fec3d9a6710]
 3: (gsignal()+0x35) [0x7fec3ca7c635]
 4: (abort()+0x175) [0x7fec3ca7de15]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fec3d336a5d]
 6: (()+0xbcbe6) [0x7fec3d334be6]
 7: (()+0xbcc13) [0x7fec3d334c13]
 8: (()+0xbcd0e) [0x7fec3d334d0e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x7f2) [0x94b812]
 10: /usr/bin/ceph-mds() [0x80f15e]
 11: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
 12: (main()+0x1632) [0x569c62]
 13: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
 14: /usr/bin/ceph-mds() [0x567d99]
 NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
interpret this.

Aborted

Jasper

From: Gregory Farnum [g...@inktank.com]
Sent: Tuesday, October 14, 2014 23:40
To: Jasper Siero
CC: ceph-users
Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

ceph-mds --undump-journal rank journal-file
Looks like it accidentally (or on purpose? you can break things with
it) got left out of the help text.
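
Putting that together with the dump and reset invocations used elsewhere in
this thread, the full round trip looks like this (a sketch reusing the node
and file names from this thread):

  ceph-mds -i th1-mon001 -c /etc/ceph/ceph.conf --cluster ceph --dump-journal 0 journaldumptgho-mon001
  ceph-mds -i th1-mon001 -c /etc/ceph/ceph.conf --cluster ceph --reset-journal 0
  ceph-mds -i th1-mon001 -c /etc/ceph/ceph.conf --cluster ceph --undump-journal 0 journaldumptgho-mon001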

On Tue, Oct 14, 2014 at 8:19 AM, Jasper Siero
jasper.si...@target-holding.nl wrote:
 Hello Greg,

 I dumped the journal successfully to a file:

 journal is 9483323613~134215459
 read 134213311 bytes at offset 9483323613
 wrote 134213311 bytes at offset 9483323613 to journaldumptgho
 NOTE: this is a _sparse_ file; you can
 $ tar cSzf journaldumptgho.tgz journaldumptgho
   to efficiently compress it while preserving sparseness.

 I see the option for resetting the mds journal but I can't find the option 
 for undumping/importing the journal:

  usage: ceph-mds -i name [flags] [[--journal_check 
 rank]|[--hot-standby][rank]]
   -m monitorip:port
 connect to monitor at given address
   --debug_mds n
 debug MDS level (e.g. 10)
   --dump-journal rank filename
 dump the MDS journal (binary) for rank.
   --dump-journal-entries rank filename
 dump the MDS journal (JSON) for rank.
   --journal-check rank
 replay the journal for rank, then exit
   --hot-standby rank
 start up as a hot standby for rank
   --reset-journal rank
 discard the MDS journal for rank, and replace it with a single
 event that updates/resets inotable and sessionmap on replay.

 Do you know how to undump the journal back into ceph?

 Jasper

 
 From: Gregory Farnum [g...@inktank.com]
 Sent: Friday, October 10, 2014 23:45
 To: Jasper Siero
 CC: ceph-users
 Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

 Ugh, debug journaler, not debug journaled.

 That said, the filer output tells me that you're missing an object out
 of the MDS log. (200.08f5) I think this issue should be resolved
 if you dump the journal to a file, reset it, and then undump it.
 (These are commands you can invoke from ceph-mds.)
 I haven't done this myself in a long time, so there may be some hard
 edges around it. In particular, I'm not sure if the dumped journal
 file will stop when the data stops, or if it will be a little too
 long. If so, we can fix that by truncating the dumped file to the
 proper length and resetting and undumping again.
 (And just to harp on it, this journal manipulation is a lot simpler in
 Giant... ;) )
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com

 On Wed, Oct 8, 2014 at 7:11 AM, Jasper Siero
 jasper.si...@target-holding.nl wrote:
 Hello Greg,

 No problem, thanks for looking into the log. I attached the log to this email.
 I'm looking forward to the new release because it would be nice to have 
 more ways to diagnose problems.

 Kind regards,

 Jasper Siero
 
 From: Gregory Farnum [g...@inktank.com]
 Sent: Tuesday, October 7, 2014 19:45
 To: Jasper Siero
 CC: ceph-users
 Subject: Re: [ceph-users] mds isn't working anymore after osd's running 
 full

 Sorry; I guess this fell off my radar.

 The issue here is not that it's waiting for an osdmap; it got the
 requested map and went into replay mode almost immediately. In fact
 the log looks good except that it seems to finish replaying the log
 and then simply fail to transition into active. Generate a new one,
 adding in debug journaled = 20 and debug filer = 20, and we can
 probably figure out how to fix it.
 (This diagnosis is much easier in the upcoming Giant!)
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com


 On Tue, Oct 7, 2014 at 7:55 AM, Jasper Siero
 jasper.si...@target-holding.nl wrote:
 Hello Gregory,

 We still have the same problems with our test ceph cluster and didn't 
 receive a reply from you after I sent you the requested log files. Do you 
 know if it's possible to get our cephfs filesystem working again or is it 
 better to give up the files on cephfs and start over again?

 We restarted the cluster several times but it's still degraded:
 [root@th1-mon001

Re: [ceph-users] mds isn't working anymore after osd's running full

2014-10-14 Thread Jasper Siero
Hello Greg,

I dumped the journal successfully to a file:

journal is 9483323613~134215459
read 134213311 bytes at offset 9483323613
wrote 134213311 bytes at offset 9483323613 to journaldumptgho
NOTE: this is a _sparse_ file; you can
$ tar cSzf journaldumptgho.tgz journaldumptgho
  to efficiently compress it while preserving sparseness.

I see the option for resetting the mds journal but I can't find the option for 
undumping/importing the journal:

 usage: ceph-mds -i name [flags] [[--journal_check rank]|[--hot-standby][rank]]
  -m monitorip:port
connect to monitor at given address
  --debug_mds n
debug MDS level (e.g. 10)
  --dump-journal rank filename
dump the MDS journal (binary) for rank.
  --dump-journal-entries rank filename
dump the MDS journal (JSON) for rank.
  --journal-check rank
replay the journal for rank, then exit
  --hot-standby rank
start up as a hot standby for rank
  --reset-journal rank
discard the MDS journal for rank, and replace it with a single
event that updates/resets inotable and sessionmap on replay.

Do you know how to undump the journal back into ceph?

Jasper


From: Gregory Farnum [g...@inktank.com]
Sent: Friday, October 10, 2014 23:45
To: Jasper Siero
CC: ceph-users
Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

Ugh, debug journaler, not debug journaled.

That said, the filer output tells me that you're missing an object out
of the MDS log. (200.08f5) I think this issue should be resolved
if you dump the journal to a file, reset it, and then undump it.
(These are commands you can invoke from ceph-mds.)
I haven't done this myself in a long time, so there may be some hard
edges around it. In particular, I'm not sure if the dumped journal
file will stop when the data stops, or if it will be a little too
long. If so, we can fix that by truncating the dumped file to the
proper length and resetting and undumping again.
(And just to harp on it, this journal manipulation is a lot simpler in
Giant... ;) )
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Wed, Oct 8, 2014 at 7:11 AM, Jasper Siero
jasper.si...@target-holding.nl wrote:
 Hello Greg,

 No problem, thanks for looking into the log. I attached the log to this email.
 I'm looking forward to the new release because it would be nice to have more 
 ways to diagnose problems.

 Kind regards,

 Jasper Siero
 
 From: Gregory Farnum [g...@inktank.com]
 Sent: Tuesday, October 7, 2014 19:45
 To: Jasper Siero
 CC: ceph-users
 Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

 Sorry; I guess this fell off my radar.

 The issue here is not that it's waiting for an osdmap; it got the
 requested map and went into replay mode almost immediately. In fact
 the log looks good except that it seems to finish replaying the log
 and then simply fail to transition into active. Generate a new one,
 adding in debug journaled = 20 and debug filer = 20, and we can
 probably figure out how to fix it.
 (This diagnosis is much easier in the upcoming Giant!)
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com


 On Tue, Oct 7, 2014 at 7:55 AM, Jasper Siero
 jasper.si...@target-holding.nl wrote:
 Hello Gregory,

 We still have the same problems with our test ceph cluster and didn't 
 receive a reply from you after I sent you the requested log files. Do you 
 know if it's possible to get our cephfs filesystem working again or is it 
 better to give up the files on cephfs and start over again?

 We restarted the cluster several times but it's still degraded:
 [root@th1-mon001 ~]# ceph -w
 cluster c78209f5-55ea-4c70-8968-2231d2b05560
  health HEALTH_WARN mds cluster is degraded
  monmap e3: 3 mons at 
 {th1-mon001=10.1.2.21:6789/0,th1-mon002=10.1.2.22:6789/0,th1-mon003=10.1.2.23:6789/0},
  election epoch 432, quorum 0,1,2 th1-mon001,th1-mon002,th1-mon003
  mdsmap e190: 1/1/1 up {0=th1-mon001=up:replay}, 1 up:standby
  osdmap e2248: 12 osds: 12 up, 12 in
   pgmap v197548: 492 pgs, 4 pools, 60297 MB data, 470 kobjects
 124 GB used, 175 GB / 299 GB avail
  491 active+clean
1 active+clean+scrubbing+deep

 One placement group stays in the deep scrubbing phase.

 Kind regards,

 Jasper Siero


 
 From: Jasper Siero
 Sent: Thursday, August 21, 2014 16:43
 To: Gregory Farnum
 Subject: RE: [ceph-users] mds isn't working anymore after osd's running 
 full

 I did restart it, and you are right that the epoch number has changed, 
 but the situation looks the same.
 2014-08-21 16:33:06.032366 7f9b5f3cd700  1 mds.0.27  need osdmap epoch 1994, 
 have 1993
 2014-08-21 16:33:06.032368 7f9b5f3cd700  1 mds.0.27  waiting for osdmap 1994 
 (which blacklists
 prior instance)
 I started

Re: [ceph-users] mds isn't working anymore after osd's running full

2014-08-20 Thread Jasper Siero
Unfortunately that doesn't help. I restarted both the active and standby mds, 
but that doesn't change the state of the mds. Is there a way to force the mds 
to look at the 1832 epoch (or earlier) instead of 1833 (need osdmap epoch 1833, 
have 1832)?

Thanks,

Jasper

From: Gregory Farnum [g...@inktank.com]
Sent: Tuesday, August 19, 2014 19:49
To: Jasper Siero
CC: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

On Mon, Aug 18, 2014 at 6:56 AM, Jasper Siero
jasper.si...@target-holding.nl wrote:
 Hi all,

 We have a small ceph cluster running version 0.80.1 with cephfs on five
 nodes.
 Last week some osd's were full and shut themselves down. To help the osd's
 start again I added some extra osd's and moved some placement group
 directories on the full osd's (which have a copy on another osd) to another
 place on the node (as mentioned in
 http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/)
 After clearing some space on the full osd's I started them again. After a
 lot of deep scrubbing and two pg inconsistencies which needed to be repaired,
 everything looked fine except the mds, which is still in the replay state
 and stays that way.
 The log below says that the mds needs osdmap epoch 1833 but has 1832.

 2014-08-18 12:29:22.268248 7fa786182700  1 mds.-1.0 handle_mds_map standby
 2014-08-18 12:29:22.273995 7fa786182700  1 mds.0.25 handle_mds_map i am now
 mds.0.25
 2014-08-18 12:29:22.273998 7fa786182700  1 mds.0.25 handle_mds_map state
 change up:standby --> up:replay
 2014-08-18 12:29:22.274000 7fa786182700  1 mds.0.25 replay_start
 2014-08-18 12:29:22.274014 7fa786182700  1 mds.0.25  recovery set is
 2014-08-18 12:29:22.274016 7fa786182700  1 mds.0.25  need osdmap epoch 1833,
 have 1832
 2014-08-18 12:29:22.274017 7fa786182700  1 mds.0.25  waiting for osdmap 1833
 (which blacklists prior instance)

  # ceph status
 cluster c78209f5-55ea-4c70-8968-2231d2b05560
  health HEALTH_WARN mds cluster is degraded
  monmap e3: 3 mons at
 {th1-mon001=10.1.2.21:6789/0,th1-mon002=10.1.2.22:6789/0,th1-mon003=10.1.2.23:6789/0},
 election epoch 362, quorum 0,1,2 th1-mon001,th1-mon002,th1-mon003
  mdsmap e154: 1/1/1 up {0=th1-mon001=up:replay}, 1 up:standby
  osdmap e1951: 12 osds: 12 up, 12 in
   pgmap v193685: 492 pgs, 4 pools, 60297 MB data, 470 kobjects
 124 GB used, 175 GB / 299 GB avail
  492 active+clean

 # ceph osd tree
 # id    weight    type name            up/down  reweight
 -1      0.2399    root default
 -2      0.05997       host th1-osd001
 0       0.01999           osd.0        up       1
 1       0.01999           osd.1        up       1
 2       0.01999           osd.2        up       1
 -3      0.05997       host th1-osd002
 3       0.01999           osd.3        up       1
 4       0.01999           osd.4        up       1
 5       0.01999           osd.5        up       1
 -4      0.05997       host th1-mon003
 6       0.01999           osd.6        up       1
 7       0.01999           osd.7        up       1
 8       0.01999           osd.8        up       1
 -5      0.05997       host th1-mon002
 9       0.01999           osd.9        up       1
 10      0.01999           osd.10       up       1
 11      0.01999           osd.11       up       1

 What is the way to get the mds up and running again?

 I still have all the placement group directories which I moved from the full
 osds that were down, to create disk space.

Try just restarting the MDS daemon. This sounds a little familiar, so I
think it's a known bug which may be fixed in a later dev or point
release on the MDS, but it's a soft-state rather than a disk-state
issue.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


[ceph-users] mds isn't working anymore after osd's running full

2014-08-18 Thread Jasper Siero
Hi all,

We have a small ceph cluster running version 0.80.1 with cephfs on five nodes.
Last week some osd's were full and shut themselves down. To help the osd's start 
again I added some extra osd's and moved some placement group directories on 
the full osd's (which have a copy on another osd) to another place on the node 
(as mentioned in 
http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/)
After clearing some space on the full osd's I started them again. After a lot 
of deep scrubbing and two pg inconsistencies which needed to be repaired, 
everything looked fine except the mds, which is still in the replay state and 
stays that way.
The log below says that the mds needs osdmap epoch 1833 but has 1832.

2014-08-18 12:29:22.268248 7fa786182700  1 mds.-1.0 handle_mds_map standby
2014-08-18 12:29:22.273995 7fa786182700  1 mds.0.25 handle_mds_map i am now 
mds.0.25
2014-08-18 12:29:22.273998 7fa786182700  1 mds.0.25 handle_mds_map state change 
up:standby --> up:replay
2014-08-18 12:29:22.274000 7fa786182700  1 mds.0.25 replay_start
2014-08-18 12:29:22.274014 7fa786182700  1 mds.0.25  recovery set is
2014-08-18 12:29:22.274016 7fa786182700  1 mds.0.25  need osdmap epoch 1833, 
have 1832
2014-08-18 12:29:22.274017 7fa786182700  1 mds.0.25  waiting for osdmap 1833 
(which blacklists prior instance)
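
The epoch the cluster is actually at can be compared against the one the mds
is waiting for; ceph osd stat prints the live osdmap epoch:

  ceph osd stat   # e.g. 'osdmap e1951: 12 osds: 12 up, 12 in'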

 # ceph status
cluster c78209f5-55ea-4c70-8968-2231d2b05560
 health HEALTH_WARN mds cluster is degraded
 monmap e3: 3 mons at 
{th1-mon001=10.1.2.21:6789/0,th1-mon002=10.1.2.22:6789/0,th1-mon003=10.1.2.23:6789/0},
 election epoch 362, quorum 0,1,2 th1-mon001,th1-mon002,th1-mon003
 mdsmap e154: 1/1/1 up {0=th1-mon001=up:replay}, 1 up:standby
 osdmap e1951: 12 osds: 12 up, 12 in
  pgmap v193685: 492 pgs, 4 pools, 60297 MB data, 470 kobjects
124 GB used, 175 GB / 299 GB avail
 492 active+clean

# ceph osd tree
# id    weight    type name            up/down  reweight
-1      0.2399    root default
-2      0.05997       host th1-osd001
0       0.01999           osd.0        up       1
1       0.01999           osd.1        up       1
2       0.01999           osd.2        up       1
-3      0.05997       host th1-osd002
3       0.01999           osd.3        up       1
4       0.01999           osd.4        up       1
5       0.01999           osd.5        up       1
-4      0.05997       host th1-mon003
6       0.01999           osd.6        up       1
7       0.01999           osd.7        up       1
8       0.01999           osd.8        up       1
-5      0.05997       host th1-mon002
9       0.01999           osd.9        up       1
10      0.01999           osd.10       up       1
11      0.01999           osd.11       up       1

What is the way to get the mds up and running again?

I still have all the placement group directories which I moved from the full 
osds that were down, to create disk space.
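
For anyone hitting the same full-osd situation: full or near-full osds are
listed in the health output, and free space can be checked on the osd host
(default data path assumed; adjust the osd id):

  ceph health detail | grep -i full   # e.g. 'osd.3 is full at 95%'
  df -h /var/lib/ceph/osd/ceph-3      # default osd data directory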



Kind regards,

Jasper Siero
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com