Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)

2015-06-08 Thread Mark Kirkwood
Right - I see from the 0.80.8 notes that we merged a fix for #9073.
However (unfortunately) there were a number of patches that we
experimented with on this issue - and this looks like one of the earlier
ones (i.e. not what we merged into master at the time), which is a bit
confusing (maybe it was to avoid a more invasive patch...). Maybe
Somnath or Jianpeng know why?

Cheers

Mark


On 08/06/15 20:08, Christian Balzer wrote:
 
 Mark,
 
 one would hope you can't with 0.80.9 as per the release notes, while
 0.80.7 definitely was susceptible. 
 
 Christian
 
 On Mon, 08 Jun 2015 20:05:20 +1200 Mark Kirkwood wrote:
 
 Trying out some tests on my pet VMs with 0.80.9 does not elicit any 
 journal failures...However ISTR that running on the bare metal was the 
 most reliable way to reproduce...(proceeding - currently cannot get 
 ceph-deploy to install this configuration...I'll investigate further 
 tomorrow)!

 Cheers

 Mark

 On 06/06/15 18:04, Mark Kirkwood wrote:
 Righty - I'll see if I can replicate what you see if I setup an 0.80.9
 cluster using the same workstation hardware (WD Raptors and Intel 520s)
 that showed up the issue previously at 0.83 (I wonder if I never tried
 a fresh install using the 0.80.* tree)...

 May be a few days...

 On 05/06/15 16:49, Christian Balzer wrote:

 Hello,

 On Fri, 05 Jun 2015 16:33:46 +1200 Mark Kirkwood wrote:

 Well, whatever it is, I appear to not be the only one after all:
 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=773361

 Looking quickly at the relevant code:

 FileJournal::stop_writer() in src/os/FileJournal.cc

 I see that we didn't start seeing the (original) issue until changes
 in 0.83, which suggests that 0.80 tree might not be doing the same
 thing. *However* I note that I'm not happy with the placement of the
 two thread join operations in there - it *looks* to me like 0.80
 could in fact be vulnerable to the same journal corrupting problem,
 so if it occurs again might be interesting to apply the gist of
 https://github.com/ceph/ceph/pull/2764 and see if it helps (ahem - of
 course would be best if this was on a test system)!

 Alas this is neither a test cluster, nor do I have things set up to
 compile
 from source here ATM.

 Christian




 
 



Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)

2015-06-08 Thread Mark Kirkwood
Trying out some tests on my pet VMs with 0.80.9 does not elicit any 
journal failures...However ISTR that running on the bare metal was the 
most reliable way to reproduce...(proceeding - currently cannot get 
ceph-deploy to install this configuration...I'll investigate further 
tomorrow)!


Cheers

Mark

On 06/06/15 18:04, Mark Kirkwood wrote:

Righty - I'll see if I can replicate what you see if I setup an 0.80.9
cluster using the same workstation hardware (WD Raptors and Intel 520s)
that showed up the issue previously at 0.83 (I wonder if I never tried a
fresh install using the 0.80.* tree)...

May be a few days...

On 05/06/15 16:49, Christian Balzer wrote:


Hello,

On Fri, 05 Jun 2015 16:33:46 +1200 Mark Kirkwood wrote:

Well, whatever it is, I appear to not be the only one after all:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=773361


Looking quickly at the relevant code:

FileJournal::stop_writer() in src/os/FileJournal.cc

I see that we didn't start seeing the (original) issue until changes in
0.83, which suggests that 0.80 tree might not be doing the same thing.
*However* I note that I'm not happy with the placement of the two thread
join operations in there - it *looks* to me like 0.80 could in fact be
vulnerable to the same journal corrupting problem, so if it occurs again
might be interesting to apply the gist of
https://github.com/ceph/ceph/pull/2764 and see if it helps (ahem - of
course would be best if this was on a test system)!


Alas this is neither a test cluster, nor do I have things set up to
compile
from source here ATM.

Christian







Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)

2015-06-08 Thread Christian Balzer

Mark,

one would hope you can't with 0.80.9 as per the release notes, while
0.80.7 definitely was susceptible. 

Christian

On Mon, 08 Jun 2015 20:05:20 +1200 Mark Kirkwood wrote:

 Trying out some tests on my pet VMs with 0.80.9 does not elicit any 
 journal failures...However ISTR that running on the bare metal was the 
 most reliable way to reproduce...(proceeding - currently cannot get 
 ceph-deploy to install this configuration...I'll investigate further 
 tomorrow)!
 
 Cheers
 
 Mark
 
 On 06/06/15 18:04, Mark Kirkwood wrote:
  Righty - I'll see if I can replicate what you see if I setup an 0.80.9
  cluster using the same workstation hardware (WD Raptors and Intel 520s)
  that showed up the issue previously at 0.83 (I wonder if I never tried
  a fresh install using the 0.80.* tree)...
 
  May be a few days...
 
  On 05/06/15 16:49, Christian Balzer wrote:
 
  Hello,
 
  On Fri, 05 Jun 2015 16:33:46 +1200 Mark Kirkwood wrote:
 
  Well, whatever it is, I appear to not be the only one after all:
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=773361
 
  Looking quickly at the relevant code:
 
  FileJournal::stop_writer() in src/os/FileJournal.cc
 
  I see that we didn't start seeing the (original) issue until changes
  in 0.83, which suggests that 0.80 tree might not be doing the same
  thing. *However* I note that I'm not happy with the placement of the
  two thread join operations in there - it *looks* to me like 0.80
  could in fact be vulnerable to the same journal corrupting problem,
  so if it occurs again might be interesting to apply the gist of
  https://github.com/ceph/ceph/pull/2764 and see if it helps (ahem - of
  course would be best if this was on a test system)!
 
  Alas this is neither a test cluster, nor do I have things set up to
  compile
  from source here ATM.
 
  Christian
 
 
 
 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/


Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)

2015-06-06 Thread Mark Kirkwood
Righty - I'll see if I can replicate what you see if I setup an 0.80.9 
cluster using the same workstation hardware (WD Raptors and Intel 520s) 
that showed up the issue previously at 0.83 (I wonder if I never tried a 
fresh install using the 0.80.* tree)...


May be a few days...

On 05/06/15 16:49, Christian Balzer wrote:


Hello,

On Fri, 05 Jun 2015 16:33:46 +1200 Mark Kirkwood wrote:

Well, whatever it is, I appear to not be the only one after all:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=773361


Looking quickly at the relevant code:

FileJournal::stop_writer() in src/os/FileJournal.cc

I see that we didn't start seeing the (original) issue until changes in
0.83, which suggests that 0.80 tree might not be doing the same thing.
*However* I note that I'm not happy with the placement of the two thread
join operations in there - it *looks* to me like 0.80 could in fact be
vulnerable to the same journal corrupting problem, so if it occurs again
might be interesting to apply the gist of
https://github.com/ceph/ceph/pull/2764 and see if it helps (ahem - of
course would be best if this was on a test system)!


Alas this is neither a test cluster, nor do I have things set up to compile
from source here ATM.

Christian





Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)

2015-06-04 Thread Christian Balzer

Hello,

On Fri, 05 Jun 2015 16:33:46 +1200 Mark Kirkwood wrote:

Well, whatever it is, I appear to not be the only one after all:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=773361

 Looking quickly at the relevant code:
 
 FileJournal::stop_writer() in src/os/FileJournal.cc
 
 I see that we didn't start seeing the (original) issue until changes in 
 0.83, which suggests that 0.80 tree might not be doing the same thing. 
 *However* I note that I'm not happy with the placement of the two thread 
 join operations in there - it *looks* to me like 0.80 could in fact be 
 vulnerable to the same journal corrupting problem, so if it occurs again 
 might be interesting to apply the gist of
 https://github.com/ceph/ceph/pull/2764 and see if it helps (ahem - of 
 course would be best if this was on a test system)!

Alas this is neither a test cluster, nor do I have things set up to compile
from source here ATM.
 
Christian

 Cheers
 
 Mark
 
 On 05/06/15 15:28, Christian Balzer wrote:
 
  Hello Mark,
 
  On Thu, 04 Jun 2015 20:34:55 +1200 Mark Kirkwood wrote:
 
  Sorry Christian,
 
  I did briefly wonder, then thought, oh yeah, that fix is already
  merged in...However - on reflection, perhaps *not* in the 0.80
  tree...doh!
 
  No worries, I'm just happy to hear that you think it's the same thing
  as well.
 
  I upgraded to 0.80.9 (fun fact, NO predicted and actual data movement
  after setting straw_calc_version 1 and doing a reweight all) today.
 
  Should it happen again, I know who and where to poke. ^^
 
  Christian
 
  On 04/06/15 18:57, Christian Balzer wrote:
 
  Hello,
 
  Actually after going through the changelogs with a fine comb and the
  ole Mark I eyeball I think I might be seeing this:
  ---
  osd: fix journal direct-io shutdown (#9073 Mark Kirkwood, Ma Jianpeng, Somnath Roy)
  ---
 
  The details in the various related bug reports certainly make it look
  related.
  Funny that nobody involved in those bug reports noticed the
  similarity.
 
  Now I wouldn't have installed 0.80.8 due to the regression speed bug
  anyway, but now that 0.80.9 has made it into Jessie backports I shall
  install that tomorrow and hopefully never see that problem again.
 
  Christian
 
  On Thu, 28 May 2015 07:01:15 -0700 Gregory Farnum wrote:
 
  On Thu, May 28, 2015 at 12:22 AM, Christian Balzer ch...@gol.com
  wrote:
 
  Hello Greg,
 
  On Wed, 27 May 2015 22:53:43 -0700 Gregory Farnum wrote:
 
  The description of the logging abruptly ending and the journal
  being bad really sounds like part of the disk is going back in
  time. I'm not sure if XFS internally is set up in such a way that
  something like losing part of its journal would allow that?
 
  I'm special. ^o^
  No XFS, EXT4. As stated in the original thread, below.
  And the (OSD) journal is a raw partition on a DC S3700.
 
  And since there was at least a 30 seconds pause between the
  completion of the /etc/init.d/ceph stop and issuing of the
  shutdown command, the logging abruptly ending seems to be unlikely
  related to the shutdown at all.
 
  Oh, sorry...
  I happened to read this article last night:
  http://lwn.net/SubscriberLink/645720/01149aa7c58954eb/
 
  Depending on configuration (I think you'd need to have a
  journal-as-file) you could be experiencing that. And again, not many
  people use ext4 so who knows what other ways there are of things
  being broken that nobody else has seen yet.
 
 
  If any of the OSD developers have the time it's conceivable a copy
  of the OSD journal would be enlightening (if e.g. the header
  offsets are wrong but there are a bunch of valid journal entries),
  but this is two reports of this issue from you and none very
  similar from anybody else. I'm still betting on something in the
  software or hardware stack misbehaving. (There aren't that many
  people running Debian; there are lots of people running Ubuntu and
  we find bad XFS kernels there not infrequently; I think you're
  hitting something like that.)
 
  There should be no file system involved with the raw partition SSD
  journal, n'est-ce pas?
 
  ...and I guess probably you aren't since you are using partitions.
 
 
  The hardware is vastly different, the previous case was on an AMD
  system with onboard SATA (SP5100), this one is a SM storage goat
  with LSI 3008.
 
  The only thing they have in common is the Ceph version 0.80.7 (via
  the Debian repository, not Ceph) and Debian Jessie as OS with
  kernel 3.16 (though there were minor updates on that between those
  incidents, backported fixes)
 
  A copy of the journal would consist of the entire 10GB partition,
   since we don't know where in the loop it was at the time, right?
 
  Yeah.
 
 
 
 
 
 
 
 
 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/

Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)

2015-06-04 Thread Mark Kirkwood

Looking quickly at the relevant code:

FileJournal::stop_writer() in src/os/FileJournal.cc

I see that we didn't start seeing the (original) issue until changes in 
0.83, which suggests that 0.80 tree might not be doing the same thing. 
*However* I note that I'm not happy with the placement of the two thread 
join operations in there - it *looks* to me like 0.80 could in fact be 
vulnerable to the same journal corrupting problem, so if it occurs again 
might be interesting to apply the gist of
https://github.com/ceph/ceph/pull/2764 and see if it helps (ahem - of 
course would be best if this was on a test system)!


Cheers

Mark

On 05/06/15 15:28, Christian Balzer wrote:


Hello Mark,

On Thu, 04 Jun 2015 20:34:55 +1200 Mark Kirkwood wrote:


Sorry Christian,

I did briefly wonder, then thought, oh yeah, that fix is already merged
in...However - on reflection, perhaps *not* in the 0.80 tree...doh!


No worries, I'm just happy to hear that you think it's the same thing as
well.

I upgraded to 0.80.9 (fun fact, NO predicted and actual data movement
after setting straw_calc_version 1 and doing a reweight all) today.

Should it happen again, I know who and where to poke. ^^

Christian


On 04/06/15 18:57, Christian Balzer wrote:


Hello,

Actually after going through the changelogs with a fine comb and the
ole Mark I eyeball I think I might be seeing this:
---
osd: fix journal direct-io shutdown (#9073 Mark Kirkwood, Ma Jianpeng, Somnath Roy)
---

The details in the various related bug reports certainly make it look
related.
Funny that nobody involved in those bug reports noticed the similarity.

Now I wouldn't have installed 0.80.8 due to the regression speed bug
anyway, but now that 0.80.9 has made it into Jessie backports I shall
install that tomorrow and hopefully never see that problem again.

Christian

On Thu, 28 May 2015 07:01:15 -0700 Gregory Farnum wrote:


On Thu, May 28, 2015 at 12:22 AM, Christian Balzer ch...@gol.com
wrote:


Hello Greg,

On Wed, 27 May 2015 22:53:43 -0700 Gregory Farnum wrote:


The description of the logging abruptly ending and the journal being
bad really sounds like part of the disk is going back in time. I'm
not sure if XFS internally is set up in such a way that something
like losing part of its journal would allow that?


I'm special. ^o^
No XFS, EXT4. As stated in the original thread, below.
And the (OSD) journal is a raw partition on a DC S3700.

And since there was at least a 30 seconds pause between the
completion of the /etc/init.d/ceph stop and issuing of the
shutdown command, the logging abruptly ending seems to be unlikely
related to the shutdown at all.


Oh, sorry...
I happened to read this article last night:
http://lwn.net/SubscriberLink/645720/01149aa7c58954eb/

Depending on configuration (I think you'd need to have a
journal-as-file) you could be experiencing that. And again, not many
people use ext4 so who knows what other ways there are of things being
broken that nobody else has seen yet.




If any of the OSD developers have the time it's conceivable a copy
of the OSD journal would be enlightening (if e.g. the header
offsets are wrong but there are a bunch of valid journal entries),
but this is two reports of this issue from you and none very
similar from anybody else. I'm still betting on something in the
software or hardware stack misbehaving. (There aren't that many
people running Debian; there are lots of people running Ubuntu and
we find bad XFS kernels there not infrequently; I think you're
hitting something like that.)


There should be no file system involved with the raw partition SSD
journal, n'est-ce pas?


...and I guess probably you aren't since you are using partitions.



The hardware is vastly different, the previous case was on an AMD
system with onboard SATA (SP5100), this one is a SM storage goat with
LSI 3008.

The only thing they have in common is the Ceph version 0.80.7 (via
the Debian repository, not Ceph) and Debian Jessie as OS with kernel
3.16 (though there were minor updates on that between those
incidents, backported fixes)

A copy of the journal would consist of the entire 10GB partition,
since we don't know where in the loop it was at the time, right?


Yeah.














Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)

2015-06-04 Thread Christian Balzer

Hello Mark,

On Thu, 04 Jun 2015 20:34:55 +1200 Mark Kirkwood wrote:

 Sorry Christian,
 
 I did briefly wonder, then thought, oh yeah, that fix is already merged 
 in...However - on reflection, perhaps *not* in the 0.80 tree...doh!

No worries, I'm just happy to hear that you think it's the same thing as
well.

I upgraded to 0.80.9 (fun fact, NO predicted and actual data movement
after setting straw_calc_version 1 and doing a reweight all) today.
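
(For anyone wanting to repeat that tunable change: a hedged sketch of the
firefly-era route via crushtool; the file names are illustrative, and
"ceph osd crush reweight-all" is one reading of the "reweight all" step
mentioned above.)

# fetch and decompile the current CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# add "tunable straw_calc_version 1" to the tunables block at the top,
# then recompile and inject the edited map
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new
# recompute the straw bucket weights under the new algorithm
ceph osd crush reweight-all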

Should it happen again, I know who and where to poke. ^^
  
Christian

 On 04/06/15 18:57, Christian Balzer wrote:
 
  Hello,
 
  Actually after going through the changelogs with a fine comb and the
  ole Mark I eyeball I think I might be seeing this:
  ---
  osd: fix journal direct-io shutdown (#9073 Mark Kirkwood, Ma Jianpeng, Somnath Roy)
  ---
 
  The details in the various related bug reports certainly make it look
  related.
  Funny that nobody involved in those bug reports noticed the similarity.
 
  Now I wouldn't have installed 0.80.8 due to the regression speed bug
  anyway, but now that 0.80.9 has made it into Jessie backports I shall
  install that tomorrow and hopefully never see that problem again.
 
  Christian
 
  On Thu, 28 May 2015 07:01:15 -0700 Gregory Farnum wrote:
 
  On Thu, May 28, 2015 at 12:22 AM, Christian Balzer ch...@gol.com
  wrote:
 
  Hello Greg,
 
  On Wed, 27 May 2015 22:53:43 -0700 Gregory Farnum wrote:
 
  The description of the logging abruptly ending and the journal being
  bad really sounds like part of the disk is going back in time. I'm
  not sure if XFS internally is set up in such a way that something
  like losing part of its journal would allow that?
 
  I'm special. ^o^
  No XFS, EXT4. As stated in the original thread, below.
  And the (OSD) journal is a raw partition on a DC S3700.
 
  And since there was at least a 30 seconds pause between the
  completion of the /etc/init.d/ceph stop and issuing of the
  shutdown command, the logging abruptly ending seems to be unlikely
  related to the shutdown at all.
 
  Oh, sorry...
  I happened to read this article last night:
  http://lwn.net/SubscriberLink/645720/01149aa7c58954eb/
 
  Depending on configuration (I think you'd need to have a
  journal-as-file) you could be experiencing that. And again, not many
  people use ext4 so who knows what other ways there are of things being
  broken that nobody else has seen yet.
 
 
  If any of the OSD developers have the time it's conceivable a copy
  of the OSD journal would be enlightening (if e.g. the header
  offsets are wrong but there are a bunch of valid journal entries),
  but this is two reports of this issue from you and none very
  similar from anybody else. I'm still betting on something in the
  software or hardware stack misbehaving. (There aren't that many
  people running Debian; there are lots of people running Ubuntu and
  we find bad XFS kernels there not infrequently; I think you're
  hitting something like that.)
 
  There should be no file system involved with the raw partition SSD
  journal, n'est-ce pas?
 
  ...and I guess probably you aren't since you are using partitions.
 
 
  The hardware is vastly different, the previous case was on an AMD
  system with onboard SATA (SP5100), this one is a SM storage goat with
  LSI 3008.
 
  The only thing they have in common is the Ceph version 0.80.7 (via
  the Debian repository, not Ceph) and Debian Jessie as OS with kernel
  3.16 (though there were minor updates on that between those
  incidents, backported fixes)
 
  A copy of the journal would consist of the entire 10GB partition,
  since we don't know where in the loop it was at the time, right?
 
  Yeah.
 
 
 
 
 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/


Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)

2015-06-04 Thread Christian Balzer

Hello,

Actually after going through the changelogs with a fine comb and the ole
Mark I eyeball I think I might be seeing this:
---
osd: fix journal direct-io shutdown (#9073 Mark Kirkwood, Ma Jianpeng, Somnath 
Roy)
---

The details in the various related bug reports certainly make it look
related. 
Funny that nobody involved in those bug reports noticed the similarity. 

Now I wouldn't have installed 0.80.8 due to the regression speed bug
anyway, but now that 0.80.9 has made it into Jessie backports I shall
install that tomorrow and hopefully never see that problem again.

Christian

On Thu, 28 May 2015 07:01:15 -0700 Gregory Farnum wrote:

 On Thu, May 28, 2015 at 12:22 AM, Christian Balzer ch...@gol.com wrote:
 
  Hello Greg,
 
  On Wed, 27 May 2015 22:53:43 -0700 Gregory Farnum wrote:
 
  The description of the logging abruptly ending and the journal being
  bad really sounds like part of the disk is going back in time. I'm not
  sure if XFS internally is set up in such a way that something like
  losing part of its journal would allow that?
 
  I'm special. ^o^
  No XFS, EXT4. As stated in the original thread, below.
  And the (OSD) journal is a raw partition on a DC S3700.
 
  And since there was at least a 30 seconds pause between the completion
  of the /etc/init.d/ceph stop and issuing of the shutdown command, the
  logging abruptly ending seems to be unlikely related to the shutdown at
  all.
 
 Oh, sorry...
 I happened to read this article last night:
 http://lwn.net/SubscriberLink/645720/01149aa7c58954eb/
 
 Depending on configuration (I think you'd need to have a
 journal-as-file) you could be experiencing that. And again, not many
 people use ext4 so who knows what other ways there are of things being
 broken that nobody else has seen yet.
 
 
  If any of the OSD developers have the time it's conceivable a copy of
  the OSD journal would be enlightening (if e.g. the header offsets are
  wrong but there are a bunch of valid journal entries), but this is two
  reports of this issue from you and none very similar from anybody
  else. I'm still betting on something in the software or hardware stack
  misbehaving. (There aren't that many people running Debian; there are
  lots of people running Ubuntu and we find bad XFS kernels there not
  infrequently; I think you're hitting something like that.)
 
  There should be no file system involved with the raw partition SSD
  journal, n'est-ce pas?
 
 ...and I guess probably you aren't since you are using partitions.
 
 
  The hardware is vastly different, the previous case was on an AMD
  system with onboard SATA (SP5100), this one is a SM storage goat with
  LSI 3008.
 
  The only thing they have in common is the Ceph version 0.80.7 (via the
  Debian repository, not Ceph) and Debian Jessie as OS with kernel 3.16
  (though there were minor updates on that between those incidents,
  backported fixes)
 
  A copy of the journal would consist of the entire 10GB partition,
 since we don't know where in the loop it was at the time, right?
 
 Yeah.
 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/


Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)

2015-06-04 Thread Mark Kirkwood

Sorry Christian,

I did briefly wonder, then thought, oh yeah, that fix is already merged 
in...However - on reflection, perhaps *not* in the 0.80 tree...doh!


On 04/06/15 18:57, Christian Balzer wrote:


Hello,

Actually after going through the changelogs with a fine comb and the ole
Mark I eyeball I think I might be seeing this:
---
osd: fix journal direct-io shutdown (#9073 Mark Kirkwood, Ma Jianpeng, Somnath 
Roy)
---

The details in the various related bug reports certainly make it look
related.
Funny that nobody involved in those bug reports noticed the similarity.

Now I wouldn't have installed 0.80.8 due to the regression speed bug
anyway, but now that 0.80.9 has made it into Jessie backports I shall
install that tomorrow and hopefully never see that problem again.

Christian

On Thu, 28 May 2015 07:01:15 -0700 Gregory Farnum wrote:


On Thu, May 28, 2015 at 12:22 AM, Christian Balzer ch...@gol.com wrote:


Hello Greg,

On Wed, 27 May 2015 22:53:43 -0700 Gregory Farnum wrote:


The description of the logging abruptly ending and the journal being
bad really sounds like part of the disk is going back in time. I'm not
sure if XFS internally is set up in such a way that something like
losing part of its journal would allow that?


I'm special. ^o^
No XFS, EXT4. As stated in the original thread, below.
And the (OSD) journal is a raw partition on a DC S3700.

And since there was at least a 30 seconds pause between the completion
of the /etc/init.d/ceph stop and issuing of the shutdown command, the
logging abruptly ending seems to be unlikely related to the shutdown at
all.


Oh, sorry...
I happened to read this article last night:
http://lwn.net/SubscriberLink/645720/01149aa7c58954eb/

Depending on configuration (I think you'd need to have a
journal-as-file) you could be experiencing that. And again, not many
people use ext4 so who knows what other ways there are of things being
broken that nobody else has seen yet.




If any of the OSD developers have the time it's conceivable a copy of
the OSD journal would be enlightening (if e.g. the header offsets are
wrong but there are a bunch of valid journal entries), but this is two
reports of this issue from you and none very similar from anybody
else. I'm still betting on something in the software or hardware stack
misbehaving. (There aren't that many people running Debian; there are
lots of people running Ubuntu and we find bad XFS kernels there not
infrequently; I think you're hitting something like that.)


There should be no file system involved with the raw partition SSD
journal, n'est-ce pas?


...and I guess probably you aren't since you are using partitions.



The hardware is vastly different, the previous case was on an AMD
system with onboard SATA (SP5100), this one is a SM storage goat with
LSI 3008.

The only thing they have in common is the Ceph version 0.80.7 (via the
Debian repository, not Ceph) and Debian Jessie as OS with kernel 3.16
(though there were minor updates on that between those incidents,
backported fixes)

A copy of the journal would consist of the entire 10GB partition,
since we don't know where in the loop it was at the time, right?


Yeah.








Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)

2015-05-28 Thread Christian Balzer

Hello Greg,

On Wed, 27 May 2015 22:53:43 -0700 Gregory Farnum wrote:

 The description of the logging abruptly ending and the journal being
 bad really sounds like part of the disk is going back in time. I'm not
 sure if XFS internally is set up in such a way that something like
 losing part of its journal would allow that?
 
I'm special. ^o^
No XFS, EXT4. As stated in the original thread, below.
And the (OSD) journal is a raw partition on a DC S3700.

And since there was at least a 30 second pause between the completion of
the /etc/init.d/ceph stop and the issuing of the shutdown command, the
logging abruptly ending seems unlikely to be related to the shutdown at
all.

 If any of the OSD developers have the time it's conceivable a copy of
 the OSD journal would be enlightening (if e.g. the header offsets are
 wrong but there are a bunch of valid journal entries), but this is two
 reports of this issue from you and none very similar from anybody
 else. I'm still betting on something in the software or hardware stack
 misbehaving. (There aren't that many people running Debian; there are
 lots of people running Ubuntu and we find bad XFS kernels there not
 infrequently; I think you're hitting something like that.)
 
There should be no file system involved with the raw partition SSD
journal, n'est-ce pas?

The hardware is vastly different, the previous case was on an AMD
system with onboard SATA (SP5100), this one is a SM storage goat with LSI
3008.

The only thing they have in common is the Ceph version 0.80.7 (via the
Debian repository, not Ceph) and Debian Jessie as OS with kernel 3.16
(though there were minor updates on that between those incidents,
backported fixes)
 
A copy of the journal would consist of the entire 10GB partition, since we
don't know where in the loop it was at the time, right?

Christian
 
 On Sun, May 24, 2015 at 7:26 PM, Christian Balzer ch...@gol.com wrote:
 
  Hello again (marvel at my elephantine memory and thread necromancy)
 
  Firstly, this happened again, details below.
  Secondly, as I changed things to sysv-init AND did a /etc/init.d/ceph
  stop which dutifully listed all OSDs as being killed/stopped BEFORE
  rebooting the node.
 
  This is a completely new node with significantly different HW than the
  example below.
  But the same SW versions as before (Debian Jessie, Ceph 0.80.7).
  And just like below/before the logs for that OSD have nothing in them
  indicating it did shut down properly (no journal flush done) and when
  coming back on reboot we get the dreaded:
  ---
  2015-05-25 10:32:55.439492 7f568aa157c0  1 journal _open /var/lib/ceph/osd/ceph-30/journal fd 23: 1269312 bytes, block size 4096 bytes, directio = 1, aio = 1
  2015-05-25 10:32:55.439859 7f568aa157c0 -1 journal read_header error decoding journal header
  2015-05-25 10:32:55.439905 7f568aa157c0 -1 filestore(/var/lib/ceph/osd/ceph-30) mount failed to open journal /var/lib/ceph/osd/ceph-30/journal: (22) Invalid argument
  2015-05-25 10:32:55.936975 7f568aa157c0 -1 osd.30 0 OSD:init: unable to mount object store
  ---
 
  I see nothing in the changelogs for 0.80.8 and .9 that seems related to
  this, never mind that from the looks of it the repository at Ceph has
  only Wheezy (bpo70) packages and Debian Jessie is still stuck at
  0.80.7 (Sid just went to .9 last week)
 
  I'm preserving the state of things as they are for a few days, so if
  any developer would like a peek or more details, speak up now.
 
  I'd open an issue, but I don't have a reliable way to reproduce this
  and even less desire to do so on this production cluster. ^_-
 
  Christian
 
  On Sat, 6 Dec 2014 12:48:25 +0900 Christian Balzer wrote:
 
  On Fri, 5 Dec 2014 11:23:19 -0800 Gregory Farnum wrote:
 
   On Thu, Dec 4, 2014 at 7:03 PM, Christian Balzer ch...@gol.com
   wrote:
   
Hello,
   
This morning I decided to reboot a storage node (Debian Jessie,
thus 3.16 kernel and Ceph 0.80.7, HDD OSDs with SSD journals)
after applying some changes.
   
It came back up one OSD short, the last log lines before the
reboot are:
---
2014-12-05 09:35:27.700330 7f87e789c700  2 -- 10.0.8.21:6823/29520 >> 10.0.8.22:0/5161 pipe(0x7f881b772580 sd=247 :6823 s=2 pgs=21 cs=1 l=1 c=0x7f881f469020).fault (0) Success
2014-12-05 09:35:27.700350 7f87f011d700 10 osd.4 pg_epoch: 293 pg[3.316( v 289'1347 (0'0,289'1347] local-les=289 n=8 ec=5 les/c 289/289 288/288/288) [8,4,16] r=1 lpr=288 pi=276-287/1 luod=0'0 crt=289'1345 lcod 289'1346 active] cancel_copy_ops
---
   
Quite obviously it didn't complete its shutdown, so unsurprisingly we get:
---
2014-12-05 09:37:40.278128 7f218a7037c0  1 journal _open /var/lib/ceph/osd/ceph-4/journal fd 24: 1269312 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-12-05 09:37:40.278427 7f218a7037c0 -1 journal read_header error decoding journal header
2014-12-05 09:37:40.278479 7f218a7037c0 -1

Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)

2015-05-28 Thread Christian Balzer
On Thu, 28 May 2015 10:32:18 +0200 Jan Schermer wrote:

 Can you check the capacitor reading on the S3700 with smartctl ? 

I suppose you mean this?
---
175 Power_Loss_Cap_Test 0x0033   100   100   010    Pre-fail  Always       -       648 (2 2862)
---

Never mind that these are brand new.

This
 drive has non-volatile cache which *should* get flushed when power is
 lost, depending on what hardware does on reboot it might get flushed
 even when rebooting. 

That would probably trigger an increase in the unsafe shutdown count
SMART value. 
I will have to test that from a known starting point, since the current
values are likely from earlier tests and actual shutdowns. 
I'd be surprised if a reboot would drop power to the drives, but it is a
possibility of course.
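
(A quick sketch for checking both counters discussed here with smartctl;
the device path is illustrative, and attribute names/numbers vary by
vendor and smartctl version.)

# full vendor attribute table
smartctl -A /dev/sdb
# just the capacitor self-test and unsafe/unclean shutdown counters
smartctl -A /dev/sdb | egrep 'Power_Loss_Cap_Test|Unsafe_Shutdown'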

However I'm VERY unconvinced that this could result in data loss, with the
SSDs in perfect CAPS health. 

I just got this drive for testing yesterday and
 it’s a beast, but some things were peculiar - for example my fio
 benchmark slowed down (35K IOPS -> 5K IOPS) after several GB (random -
 5-40) written, and then it would creep back up over time even under
 load. Disabling write cache helps, no idea why.
 
I haven't seen that behavior with DC S3700s, but with 5xx ones and
some Samsung, yes.

Christian

 Z.
 
 
  On 28 May 2015, at 09:22, Christian Balzer ch...@gol.com wrote:
  
  
  Hello Greg,
  
  On Wed, 27 May 2015 22:53:43 -0700 Gregory Farnum wrote:
  
  The description of the logging abruptly ending and the journal being
  bad really sounds like part of the disk is going back in time. I'm not
  sure if XFS internally is set up in such a way that something like
  losing part of its journal would allow that?
  
  I'm special. ^o^
  No XFS, EXT4. As stated in the original thread, below.
  And the (OSD) journal is a raw partition on a DC S3700.
  
  And since there was at least a 30 seconds pause between the completion
  of the /etc/init.d/ceph stop and issuing of the shutdown command, the
  logging abruptly ending seems to be unlikely related to the shutdown at
  all.
  
  If any of the OSD developers have the time it's conceivable a copy of
  the OSD journal would be enlightening (if e.g. the header offsets are
  wrong but there are a bunch of valid journal entries), but this is two
  reports of this issue from you and none very similar from anybody
  else. I'm still betting on something in the software or hardware stack
  misbehaving. (There aren't that many people running Debian; there are
  lots of people running Ubuntu and we find bad XFS kernels there not
  infrequently; I think you're hitting something like that.)
  
  There should be no file system involved with the raw partition SSD
  journal, n'est-ce pas?
  
  The hardware is vastly different, the previous case was on an AMD
  system with onboard SATA (SP5100), this one is a SM storage goat with
  LSI 3008.
  
  The only thing they have in common is the Ceph version 0.80.7 (via the
  Debian repository, not Ceph) and Debian Jessie as OS with kernel 3.16
  (though there were minor updates on that between those incidents,
  backported fixes)
  
  A copy of the journal would consist of the entire 10GB partition,
  since we don't know where in the loop it was at the time, right?
  
  Christian
  
  On Sun, May 24, 2015 at 7:26 PM, Christian Balzer ch...@gol.com
  wrote:
  
  Hello again (marvel at my elephantine memory and thread necromancy)
  
  Firstly, this happened again, details below.
  Secondly, as I changed things to sysv-init AND did a
  /etc/init.d/ceph stop which dutifully listed all OSDs as being
  killed/stopped BEFORE rebooting the node.
  
  This is a completely new node with significantly different HW than the
  example below.
  But the same SW versions as before (Debian Jessie, Ceph 0.80.7).
  And just like below/before the logs for that OSD have nothing in them
  indicating it did shut down properly (no journal flush done) and
  when coming back on reboot we get the dreaded:
  ---
  2015-05-25 10:32:55.439492 7f568aa157c0  1 journal _open /var/lib/ceph/osd/ceph-30/journal fd 23: 1269312 bytes, block size 4096 bytes, directio = 1, aio = 1
  2015-05-25 10:32:55.439859 7f568aa157c0 -1 journal read_header error decoding journal header
  2015-05-25 10:32:55.439905 7f568aa157c0 -1 filestore(/var/lib/ceph/osd/ceph-30) mount failed to open journal /var/lib/ceph/osd/ceph-30/journal: (22) Invalid argument
  2015-05-25 10:32:55.936975 7f568aa157c0 -1 osd.30 0 OSD:init: unable to mount object store
  ---
  
  I see nothing in the changelogs for 0.80.8 and .9 that seems related
  to this, never mind that from the looks of it the repository at Ceph
  has only Wheezy (bpo70) packages and Debian Jessie is still stuck at
  0.80.7 (Sid just went to .9 last week)
  
  I'm preserving the state of things as they are for a few days, so if
  any developer would like a peek or more details, speak up now.
  
  I'd open an issue, but I don't have a reliable way to reproduce this
  

Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)

2015-05-28 Thread Jan Schermer
Can you check the capacitor reading on the S3700 with smartctl ? This drive has 
non-volatile cache which *should* get flushed when power is lost, depending on 
what hardware does on reboot it might get flushed even when rebooting.
I just got this drive for testing yesterday and it’s a beast, but some things 
were peculiar - for example my fio benchmark slowed down (35K IOPS -> 5K IOPS) 
after several GB (random - 5-40) written, and then it would creep back up over 
time even under load. Disabling write cache helps, no idea why.

Z.


 On 28 May 2015, at 09:22, Christian Balzer ch...@gol.com wrote:
 
 
 Hello Greg,
 
 On Wed, 27 May 2015 22:53:43 -0700 Gregory Farnum wrote:
 
 The description of the logging abruptly ending and the journal being
 bad really sounds like part of the disk is going back in time. I'm not
 sure if XFS internally is set up in such a way that something like
 losing part of its journal would allow that?
 
 I'm special. ^o^
 No XFS, EXT4. As stated in the original thread, below.
 And the (OSD) journal is a raw partition on a DC S3700.
 
 And since there was at least a 30 seconds pause between the completion of
 the /etc/init.d/ceph stop and issuing of the shutdown command, the
 logging abruptly ending seems to be unlikely related to the shutdown at
 all.
 
 If any of the OSD developers have the time it's conceivable a copy of
 the OSD journal would be enlightening (if e.g. the header offsets are
 wrong but there are a bunch of valid journal entries), but this is two
 reports of this issue from you and none very similar from anybody
 else. I'm still betting on something in the software or hardware stack
 misbehaving. (There aren't that many people running Debian; there are
 lots of people running Ubuntu and we find bad XFS kernels there not
 infrequently; I think you're hitting something like that.)
 
 There should be no file system involved with the raw partition SSD
 journal, n'est-ce pas?
 
 The hardware is vastly different, the previous case was on an AMD
 system with onboard SATA (SP5100), this one is a SM storage goat with LSI
 3008.
 
 The only thing they have in common is the Ceph version 0.80.7 (via the
 Debian repository, not Ceph) and Debian Jessie as OS with kernel 3.16
 (though there were minor updates on that between those incidents,
 backported fixes)
 
 A copy of the journal would consist of the entire 10GB partition, since we
 don't know where in the loop it was at the time, right?
 
 Christian
 
 On Sun, May 24, 2015 at 7:26 PM, Christian Balzer ch...@gol.com wrote:
 
 Hello again (marvel at my elephantine memory and thread necromancy)
 
 Firstly, this happened again, details below.
 Secondly, as I changed things to sysv-init AND did a /etc/init.d/ceph
 stop which dutifully listed all OSDs as being killed/stopped BEFORE
 rebooting the node.
 
 This is a completely new node with significantly different HW than the
 example below.
 But the same SW versions as before (Debian Jessie, Ceph 0.80.7).
 And just like below/before the logs for that OSD have nothing in them
 indicating it did shut down properly (no journal flush done) and when
 coming back on reboot we get the dreaded:
 ---
 2015-05-25 10:32:55.439492 7f568aa157c0  1 journal _open /var/lib/ceph/osd/ceph-30/journal fd 23: 1269312 bytes, block size 4096 bytes, directio = 1, aio = 1
 2015-05-25 10:32:55.439859 7f568aa157c0 -1 journal read_header error decoding journal header
 2015-05-25 10:32:55.439905 7f568aa157c0 -1 filestore(/var/lib/ceph/osd/ceph-30) mount failed to open journal /var/lib/ceph/osd/ceph-30/journal: (22) Invalid argument
 2015-05-25 10:32:55.936975 7f568aa157c0 -1 osd.30 0 OSD:init: unable to mount object store
 ---
 
 I see nothing in the changelogs for 0.80.8 and .9 that seems related to
 this, never mind that from the looks of it the repository at Ceph has
 only Wheezy (bpo70) packages and Debian Jessie is still stuck at
 0.80.7 (Sid just went to .9 last week)
 
 I'm preserving the state of things as they are for a few days, so if
 any developer would like a peek or more details, speak up now.
 
 I'd open an issue, but I don't have a reliable way to reproduce this
 and even less desire to do so on this production cluster. ^_-
 
 Christian
 
 On Sat, 6 Dec 2014 12:48:25 +0900 Christian Balzer wrote:
 
 On Fri, 5 Dec 2014 11:23:19 -0800 Gregory Farnum wrote:
 
 On Thu, Dec 4, 2014 at 7:03 PM, Christian Balzer ch...@gol.com
 wrote:
 
 Hello,
 
 This morning I decided to reboot a storage node (Debian Jessie,
 thus 3.16 kernel and Ceph 0.80.7, HDD OSDs with SSD journals)
 after applying some changes.
 
 It came back up one OSD short, the last log lines before the
 reboot are:
 ---
 2014-12-05 09:35:27.700330 7f87e789c700  2 -- 10.0.8.21:6823/29520 >> 10.0.8.22:0/5161 pipe(0x7f881b772580 sd=247 :6823 s=2 pgs=21 cs=1 l=1 c=0x7f881f469020).fault (0) Success
 2014-12-05 09:35:27.700350 7f87f011d700 10 osd.4 pg_epoch: 293 pg[3.316( v 289'1347 (0'0,289'1347] local-les=289 n=8 ec=5 les/c 289/289 

Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)

2015-05-28 Thread Jan Schermer

 On 28 May 2015, at 10:56, Christian Balzer ch...@gol.com wrote:
 
 On Thu, 28 May 2015 10:32:18 +0200 Jan Schermer wrote:
 
 Can you check the capacitor reading on the S3700 with smartctl ? 
 
 I suppose you mean this?
 ---
 175 Power_Loss_Cap_Test 0x0033   100   100   010    Pre-fail  Always       -       648 (2 2862)
 ---
 
 Never mind that these are brand new.
 

Most of the failures occur on either very new or very old hardware :-)

 This
 drive has non-volatile cache which *should* get flushed when power is
 lost, depending on what hardware does on reboot it might get flushed
 even when rebooting. 
 
 That would probably trigger an increase in the unsafe shutdown count
 SMART value. 
 I will have to test that from a known starting point, since the current
 values are likely from earlier tests and actual shutdowns. 
 I'd be surprised if a reboot would drop power to the drives, but it is a
 possibility of course.
 
 However I'm VERY unconvinced that this could result in data loss, with the
 SSDs in perfect CAPS health. 
 

You are right, it shouldn’t happen, but stuff happens.

 I just got this drive for testing yesterday and
 it’s a beast, but some things were peculiar - for example my fio
 benchmark slowed down (35K IOPS -> 5K IOPS) after several GB (random -
 5-40) written, and then it would creep back up over time even under
 load. Disabling write cache helps, no idea why.
 
 I haven't seen that behavior with DC S3700s, but with 5xx ones and
 some Samsung, yes.


Try this simple test

fio --filename=/dev/$device --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 
--iodepth=1 --runtime=60 --time_based --name=journal-test --size=10M
(play with iodepth, if I remember correctly then the highest gain was with 
iodepth=1, higher depths reach almost the max without disabling write cache)
first run with WC enabled
hdparm -W1 /dev/$device
then with WC disabled
hdparm -W0 /dev/$device
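
(Side note: hdparm -W with no value just reports the current setting, which
is handy for confirming the cache state between runs.)

hdparm -W /dev/$device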

I get much higher IOPS with cache disabled on all SSDs I tested - Kingston, 
Samsung, Intel. I think it disables compression on those drives that use it 
internally, and it probably causes the SSD not to wait for other IOs to 
coalesce it with. This might have a very bad effect on the drive longevity in 
the long run, though...

Jan

 
 Christian
 
 Z.
 
 
 On 28 May 2015, at 09:22, Christian Balzer ch...@gol.com wrote:
 
 
 Hello Greg,
 
 On Wed, 27 May 2015 22:53:43 -0700 Gregory Farnum wrote:
 
 The description of the logging abruptly ending and the journal being
 bad really sounds like part of the disk is going back in time. I'm not
 sure if XFS internally is set up in such a way that something like
 losing part of its journal would allow that?
 
 I'm special. ^o^
 No XFS, EXT4. As stated in the original thread, below.
 And the (OSD) journal is a raw partition on a DC S3700.
 
 And since there was at least a 30 seconds pause between the completion
 of the /etc/init.d/ceph stop and issuing of the shutdown command, the
 logging abruptly ending seems to be unlikely related to the shutdown at
 all.
 
 If any of the OSD developers have the time it's conceivable a copy of
 the OSD journal would be enlightening (if e.g. the header offsets are
 wrong but there are a bunch of valid journal entries), but this is two
 reports of this issue from you and none very similar from anybody
 else. I'm still betting on something in the software or hardware stack
 misbehaving. (There aren't that many people running Debian; there are
 lots of people running Ubuntu and we find bad XFS kernels there not
 infrequently; I think you're hitting something like that.)
 
 There should be no file system involved with the raw partition SSD
 journal, n'est-ce pas?
 
 The hardware is vastly different, the previous case was on an AMD
 system with onboard SATA (SP5100), this one is a SM storage goat with
 LSI 3008.
 
 The only thing they have in common is the Ceph version 0.80.7 (via the
 Debian repository, not Ceph) and Debian Jessie as OS with kernel 3.16
 (though there were minor updates on that between those incidents,
 backported fixes)
 
 A copy of the journal would consist of the entire 10GB partition,
  since we don't know where in the loop it was at the time, right?
 
 Christian
 
 On Sun, May 24, 2015 at 7:26 PM, Christian Balzer ch...@gol.com
 wrote:
 
 Hello again (marvel at my elephantine memory and thread necromancy)
 
 Firstly, this happened again, details below.
 Secondly, as I changed things to sysv-init AND did a
 /etc/init.d/ceph stop which dutifully listed all OSDs as being
 killed/stopped BEFORE rebooting the node.
 
  This is a completely new node with significantly different HW than the
 example below.
 But the same SW versions as before (Debian Jessie, Ceph 0.80.7).
 And just like below/before the logs for that OSD have nothing in them
 indicating it did shut down properly (no journal flush done) and
 when coming back on reboot we get the dreaded:
 ---
 2015-05-25 10:32:55.439492 7f568aa157c0  1 journal
 _open 

Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)

2015-05-28 Thread Gregory Farnum
On Thu, May 28, 2015 at 12:22 AM, Christian Balzer ch...@gol.com wrote:

 Hello Greg,

 On Wed, 27 May 2015 22:53:43 -0700 Gregory Farnum wrote:

 The description of the logging abruptly ending and the journal being
 bad really sounds like part of the disk is going back in time. I'm not
 sure if XFS internally is set up in such a way that something like
 losing part of its journal would allow that?

 I'm special. ^o^
 No XFS, EXT4. As stated in the original thread, below.
 And the (OSD) journal is a raw partition on a DC S3700.

 And since there was at least a 30 seconds pause between the completion of
 the /etc/init.d/ceph stop and issuing of the shutdown command, the
 logging abruptly ending seems to be unlikely related to the shutdown at
 all.

Oh, sorry...
I happened to read this article last night:
http://lwn.net/SubscriberLink/645720/01149aa7c58954eb/

Depending on configuration (I think you'd need to have a
journal-as-file) you could be experiencing that. And again, not many
people use ext4 so who knows what other ways there are of things being
broken that nobody else has seen yet.


 If any of the OSD developers have the time it's conceivable a copy of
 the OSD journal would be enlightening (if e.g. the header offsets are
 wrong but there are a bunch of valid journal entries), but this is two
 reports of this issue from you and none very similar from anybody
 else. I'm still betting on something in the software or hardware stack
 misbehaving. (There aren't that many people running Debian; there are
 lots of people running Ubuntu and we find bad XFS kernels there not
 infrequently; I think you're hitting something like that.)

 There should be no file system involved with the raw partition SSD
 journal, n'est-ce pas?

...and I guess probably you aren't since you are using partitions.


 The hardware is vastly different, the previous case was on an AMD
 system with onboard SATA (SP5100), this one is a SM storage goat with LSI
 3008.

 The only thing they have in common is the Ceph version 0.80.7 (via the
 Debian repository, not Ceph) and Debian Jessie as OS with kernel 3.16
 (though there were minor updates on that between those incidents,
 backported fixes)

 A copy of the journal would consist of the entire 10GB partition, since we
 don't know where in the loop it was at the time, right?

Yeah.
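
(For completeness, a minimal sketch of capturing such a journal for offline
analysis; the partition path is an assumption, the journal being a raw
10GB partition per the thread.)

# image the whole raw journal partition, since we don't know where in the
# loop the write pointer was
dd if=/dev/sdb2 of=/tmp/osd-journal.img bs=4M conv=noerror
# compress before shipping it to a developer
gzip /tmp/osd-journal.img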


Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)

2015-05-27 Thread Gregory Farnum
The description of the logging abruptly ending and the journal being
bad really sounds like part of the disk is going back in time. I'm not
sure if XFS internally is set up in such a way that something like
losing part of its journal would allow that?

If any of the OSD developers have the time it's conceivable a copy of
the OSD journal would be enlightening (if e.g. the header offsets are
wrong but there are a bunch of valid journal entries), but this is two
reports of this issue from you and none very similar from anybody
else. I'm still betting on something in the software or hardware stack
misbehaving. (There aren't that many people running Debian; there are
lots of people running Ubuntu and we find bad XFS kernels there not
infrequently; I think you're hitting something like that.)
-Greg

On Sun, May 24, 2015 at 7:26 PM, Christian Balzer ch...@gol.com wrote:

 Hello again (marvel at my elephantine memory and thread necromancy)

 Firstly, this happened again, details below.
 Secondly, as I changed things to sysv-init AND did a /etc/init.d/ceph
 stop which dutifully listed all OSDs as being killed/stopped BEFORE
 rebooting the node.

 This is a completely new node with significantly different HW than the
 example below.
 But the same SW versions as before (Debian Jessie, Ceph 0.80.7).
 And just like below/before the logs for that OSD have nothing in them
 indicating it did shut down properly (no journal flush done) and when
 coming back on reboot we get the dreaded:
 ---
 2015-05-25 10:32:55.439492 7f568aa157c0  1 journal _open 
 /var/lib/ceph/osd/ceph-30/journal fd 23: 1269312 bytes, block size 4096 
 bytes, directio = 1, aio = 1
 2015-05-25 10:32:55.439859 7f568aa157c0 -1 journal read_header error decoding 
 journal header
 2015-05-25 10:32:55.439905 7f568aa157c0 -1 
 filestore(/var/lib/ceph/osd/ceph-30) mount failed to open journal 
 /var/lib/ceph/osd/ceph-30/journal: (22) Invalid argument
 2015-05-25 10:32:55.936975 7f568aa157c0 -1 osd.30 0 OSD:init: unable to mount 
 object store
 ---

 I see nothing in the changelogs for 0.80.8 and .9 that seems related to
 this, never mind that from the looks of it the repository at Ceph has only
 Wheezy (bpo70) packages and Debian Jessie is still stuck at 0.80.7 (Sid
 just went to .9 last week)

 I'm preserving the state of things as they are for a few days, so if any
 developer would like a peek or more details, speak up now.

 I'd open an issue, but I don't have a reliable way to reproduce this and
 even less desire to do so on this production cluster. ^_-

 Christian

 On Sat, 6 Dec 2014 12:48:25 +0900 Christian Balzer wrote:

 On Fri, 5 Dec 2014 11:23:19 -0800 Gregory Farnum wrote:

  On Thu, Dec 4, 2014 at 7:03 PM, Christian Balzer ch...@gol.com wrote:
  
   Hello,
  
   This morning I decided to reboot a storage node (Debian Jessie, thus
   3.16 kernel and Ceph 0.80.7, HDD OSDs with SSD journals) after
   applying some changes.
  
   It came back up one OSD short, the last log lines before the reboot
   are:
   ---
   2014-12-05 09:35:27.700330 7f87e789c700  2 -- 10.0.8.21:6823/29520 >> 10.0.8.22:0/5161 pipe(0x7f881b772580 sd=247 :6823 s=2 pgs=21 cs=1 l=1 c=0x7f881f469020).fault (0) Success
   2014-12-05 09:35:27.700350 7f87f011d700 10 osd.4 pg_epoch: 293 pg[3.316( v 289'1347 (0'0,289'1347] local-les=289 n=8 ec=5 les/c 289/289 288/288/288) [8,4,16] r=1 lpr=288 pi=276-287/1 luod=0'0 crt=289'1345 lcod 289'1346 active] cancel_copy_ops
   ---
  
   Quite obviously it didn't complete its shutdown, so unsurprisingly we
   get:
   ---
   2014-12-05 09:37:40.278128 7f218a7037c0  1 journal _open /var/lib/ceph/osd/ceph-4/journal fd 24: 1269312 bytes, block size 4096 bytes, directio = 1, aio = 1
   2014-12-05 09:37:40.278427 7f218a7037c0 -1 journal read_header error decoding journal header
   2014-12-05 09:37:40.278479 7f218a7037c0 -1 filestore(/var/lib/ceph/osd/ceph-4) mount failed to open journal /var/lib/ceph/osd/ceph-4/journal: (22) Invalid argument
   2014-12-05 09:37:40.776203 7f218a7037c0 -1 osd.4 0 OSD:init: unable to mount object store
   2014-12-05 09:37:40.776223 7f218a7037c0 -1 ** ERROR: osd init failed: (22) Invalid argument
   ---
  
   Thankfully this isn't production yet and I was eventually able to
   recover the OSD by re-creating the journal (ceph-osd -i 4
   --mkjournal), but it leaves me with a rather bad taste in my mouth.
  
   So the pertinent questions would be:
  
   1. What caused this?
   My bet is on the evil systemd just pulling the plug before the poor
   OSD had finished its shutdown job.
  
   2. How to prevent it from happening again?
   Is there something the Ceph developers can do with regards to init
   scripts? Or is this something to be brought up with the Debian
    maintainer? Debian is transitioning from sysv-init to systemd (booo!)
    with Jessie, but the OSDs still have a sysvinit magic file in their
    top directory. Could this have an effect on things?
  
   3. Is it really that easy to 
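
(Regarding question 2 above, pending a proper fix: a defensive pre-reboot
check, sketched with illustrative paths, is to confirm the OSD processes
are really gone and that each log ends with its journal flush before
issuing the reboot.)

/etc/init.d/ceph stop
# prints survivors; no output means all OSDs are down
pgrep -l ceph-osd && echo "WARNING: OSDs still running"
# a clean shutdown should leave a journal flush message at the log tail
grep -i 'journal flush' /var/log/ceph/ceph-osd.*.log | tail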

Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)

2015-05-24 Thread Christian Balzer

Hello again (marvel at my elephantine memory and thread necromancy)

Firstly, this happened again, details below.
Secondly, as I changed things to sysv-init AND did a /etc/init.d/ceph
stop which dutifully listed all OSDs as being killed/stopped BEFORE
rebooting the node.

This is a completely new node with significantly different HW than the
example below. 
But the same SW versions as before (Debian Jessie, Ceph 0.80.7).
And just like below/before, the logs for that OSD have nothing in them
indicating it shut down properly (no journal flush done), and when it
came back up after the reboot we got the dreaded:
---
2015-05-25 10:32:55.439492 7f568aa157c0  1 journal _open 
/var/lib/ceph/osd/ceph-30/journal fd 23: 1269312 bytes, block size 4096 
bytes, directio = 1, aio = 1
2015-05-25 10:32:55.439859 7f568aa157c0 -1 journal read_header error decoding 
journal header
2015-05-25 10:32:55.439905 7f568aa157c0 -1 filestore(/var/lib/ceph/osd/ceph-30) 
mount failed to open journal /var/lib/ceph/osd/ceph-30/journal: (22) Invalid 
argument
2015-05-25 10:32:55.936975 7f568aa157c0 -1 osd.30 0 OSD:init: unable to mount 
object store
---
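
Out of curiosity one can peek at the on-disk header that ceph-osd fails to
decode (a sketch; the journal path is the one from the log above, and this
needs root):
---
dd if=/var/lib/ceph/osd/ceph-30/journal bs=4096 count=1 2>/dev/null | hexdump -C | head
---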

I see nothing in the changelogs for 0.80.8 and .9 that seems related to
this; never mind that, from the looks of it, the Ceph repository has only
Wheezy (bpo70) packages and Debian Jessie is still stuck at 0.80.7 (Sid
just went to .9 last week).

I'm preserving the state of things as they are for a few days, so if any
developer would like a peek or more details, speak up now.

I'd open an issue, but I don't have a reliable way to reproduce this and
even less desire to do so on this production cluster. ^_-

Christian

On Sat, 6 Dec 2014 12:48:25 +0900 Christian Balzer wrote:

 On Fri, 5 Dec 2014 11:23:19 -0800 Gregory Farnum wrote:
 
  On Thu, Dec 4, 2014 at 7:03 PM, Christian Balzer ch...@gol.com wrote:
  
   Hello,
  
   This morning I decided to reboot a storage node (Debian Jessie, thus
   3.16 kernel and Ceph 0.80.7, HDD OSDs with SSD journals) after
   applying some changes.
  
   It came back up one OSD short, the last log lines before the reboot
   are: ---
   2014-12-05 09:35:27.700330 7f87e789c700  2 -- 10.0.8.21:6823/29520 >>
   10.0.8.22:0/5161 pipe(0x7f881b772580 sd=247 :6823 s=2 pgs=21 cs=1 l=1
   c=0x7f881f469020).fault (0) Success 2014-12-05 09:35:27.700350
   7f87f011d700 10 osd.4 pg_epoch: 293 pg[3.316( v 289'1347
   (0'0,289'1347] local-les=289 n=8 ec=5 les/c 289/289 288/288/288)
   [8,4,16] r=1 lpr=288 pi=276-287/1 luod=0'0 crt=289'1345 lcod 289'1346
   active] cancel_copy_ops ---
  
   Quite obviously it didn't complete its shutdown, so unsurprisingly we
   get: ---
   2014-12-05 09:37:40.278128 7f218a7037c0  1 journal
   _open /var/lib/ceph/osd/ceph-4/journal fd 24: 1269312 bytes,
   block size 4096 bytes, directio = 1, aio = 1 2014-12-05
   09:37:40.278427 7f218a7037c0 -1 journal read_header error decoding
   journal header 2014-12-05 09:37:40.278479 7f218a7037c0 -1
   filestore(/var/lib/ceph/osd/ceph-4) mount failed to open
   journal /var/lib/ceph/osd/ceph-4/journal: (22) Invalid argument
   2014-12-05 09:37:40.776203 7f218a7037c0 -1 osd.4 0 OSD:init: unable
   to mount object store 2014-12-05 09:37:40.776223 7f218a7037c0 -1
   ESC[0;31m ** ERROR: osd init failed: (22) Invalid argument ESC[0m ---
  
   Thankfully this isn't production yet and I was eventually able to
   recover the OSD by re-creating the journal (ceph-osd -i 4
   --mkjournal), but it leaves me with a rather bad taste in my mouth.
  
   So the pertinent questions would be:
  
   1. What caused this?
   My bet is on the evil systemd just pulling the plug before the poor
   OSD had finished its shutdown job.
  
   2. How to prevent it from happening again?
   Is there something the Ceph developers can do with regards to init
   scripts? Or is this something to be brought up with the Debian
   maintainer? Debian is transitioning from sysv-init to systemd (booo!)
   with Jessie, but the OSDs still have a sysvinit magic file in their
   top directory. Could this have an effect on things?
  
   3. Is it really that easy to trash your OSDs?
   In case a storage node crashes, am I to expect most if not all
   OSDs or at least their journals to require manual loving?
  
  So this can't happen. 
 
 Good thing you quoted that, as it clearly did. ^o^
 
 Now the question of how exactly remains to be answered.
 
  Being force killed definitely can't kill the
  OSD's disk state; that's the whole point of the journaling. 
 
 The other OSDs got to the point where they logged 'journal flush done';
 this one didn't. Coincidence? I think not.
 
 Totally agree about the point of journaling being to prevent this kind of
 situation of course.
 
  The error
  message indicates that the header written on disk is nonsense to the
  OSD, which means that the local filesystem or disk lost something
  somehow (assuming you haven't done something silly like downgrading
  the software version it's running) and doesn't know it (if there 

Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)

2014-12-05 Thread Gregory Farnum
On Thu, Dec 4, 2014 at 7:03 PM, Christian Balzer ch...@gol.com wrote:

 Hello,

 This morning I decided to reboot a storage node (Debian Jessie, thus 3.16
 kernel and Ceph 0.80.7, HDD OSDs with SSD journals) after applying some
 changes.

 It came back up one OSD short, the last log lines before the reboot are:
 ---
 2014-12-05 09:35:27.700330 7f87e789c700  2 -- 10.0.8.21:6823/29520 >>
 10.0.8.22:0/5161 pipe(0x7f881b772580 sd=247 :6823 s=2 pgs=21 cs=1 l=1 
 c=0x7f881f469020).fault (0) Success
 2014-12-05 09:35:27.700350 7f87f011d700 10 osd.4 pg_epoch: 293 pg[3.316( v 
 289'1347 (0'0,289'1347] local-les=289 n=8 ec=5 les/c 289/289 288/288/288) 
 [8,4,16] r=1 lpr=288 pi=276-287/1 luod=0'0 crt=289'1345 lcod 289'1346 active] 
 cancel_copy_ops
 ---

 Quite obviously it didn't complete its shutdown, so unsurprisingly we get:
 ---
 2014-12-05 09:37:40.278128 7f218a7037c0  1 journal _open 
 /var/lib/ceph/osd/ceph-4/journal fd 24: 1269312 bytes, block size 4096 
 bytes, directio = 1, aio = 1
 2014-12-05 09:37:40.278427 7f218a7037c0 -1 journal read_header error decoding 
 journal header
 2014-12-05 09:37:40.278479 7f218a7037c0 -1 
 filestore(/var/lib/ceph/osd/ceph-4) mount failed to open journal 
 /var/lib/ceph/osd/ceph-4/journal: (22) Invalid argument
 2014-12-05 09:37:40.776203 7f218a7037c0 -1 osd.4 0 OSD:init: unable to mount 
 object store
 2014-12-05 09:37:40.776223 7f218a7037c0 -1 ESC[0;31m ** ERROR: osd init 
 failed: (22) Invalid argument
 ESC[0m
 ---

 Thankfully this isn't production yet and I was eventually able to recover
 the OSD by re-creating the journal (ceph-osd -i 4 --mkjournal), but it
 leaves me with a rather bad taste in my mouth.

 So the pertinent questions would be:

 1. What caused this?
 My bet is on the evil systemd just pulling the plug before the poor OSD
 had finished its shutdown job.

 2. How to prevent it from happening again?
 Is there something the Ceph developers can do with regards to init scripts?
 Or is this something to be brought up with the Debian maintainer?
 Debian is transitioning from sysv-init to systemd (booo!) with Jessie, but
 the OSDs still have a sysvinit magic file in their top directory. Could
 this have an effect on things?

 3. Is it really that easy to trash your OSDs?
 In case a storage node crashes, am I to expect most if not all OSDs or
 at least their journals to require manual loving?

So this can't happen. Being force killed definitely can't kill the
OSD's disk state; that's the whole point of the journaling. The error
message indicates that the header written on disk is nonsense to the
OSD, which means that the local filesystem or disk lost something
somehow (assuming you haven't done something silly like downgrading
the software version it's running) and doesn't know it (if there had
been a read error the output would be different). I'd double-check
your disk settings etc. just to be sure, and check for known issues
with xfs on Jessie.
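
For instance, a quick sanity check of the journal device might look like
this (just a sketch; /dev/sdX is a placeholder for the actual SSD):
---
# is the volatile write cache enabled? hdparm reports the current setting
hdparm -W /dev/sdX
# any media errors logged by the drive itself?
smartctl -l error /dev/sdX
---
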
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)

2014-12-05 Thread Christian Balzer
On Fri, 5 Dec 2014 11:23:19 -0800 Gregory Farnum wrote:

 On Thu, Dec 4, 2014 at 7:03 PM, Christian Balzer ch...@gol.com wrote:
 
  Hello,
 
  This morning I decided to reboot a storage node (Debian Jessie, thus
  3.16 kernel and Ceph 0.80.7, HDD OSDs with SSD journals) after
  applying some changes.
 
  It came back up one OSD short, the last log lines before the reboot
  are: ---
  2014-12-05 09:35:27.700330 7f87e789c700  2 -- 10.0.8.21:6823/29520 >>
  10.0.8.22:0/5161 pipe(0x7f881b772580 sd=247 :6823 s=2 pgs=21 cs=1 l=1
  c=0x7f881f469020).fault (0) Success 2014-12-05 09:35:27.700350
  7f87f011d700 10 osd.4 pg_epoch: 293 pg[3.316( v 289'1347
  (0'0,289'1347] local-les=289 n=8 ec=5 les/c 289/289 288/288/288)
  [8,4,16] r=1 lpr=288 pi=276-287/1 luod=0'0 crt=289'1345 lcod 289'1346
  active] cancel_copy_ops ---
 
  Quite obviously it didn't complete its shutdown, so unsurprisingly we
  get: ---
  2014-12-05 09:37:40.278128 7f218a7037c0  1 journal
  _open /var/lib/ceph/osd/ceph-4/journal fd 24: 1269312 bytes, block
  size 4096 bytes, directio = 1, aio = 1 2014-12-05 09:37:40.278427
  7f218a7037c0 -1 journal read_header error decoding journal header
  2014-12-05 09:37:40.278479 7f218a7037c0 -1
  filestore(/var/lib/ceph/osd/ceph-4) mount failed to open
  journal /var/lib/ceph/osd/ceph-4/journal: (22) Invalid argument
  2014-12-05 09:37:40.776203 7f218a7037c0 -1 osd.4 0 OSD:init: unable to
  mount object store 2014-12-05 09:37:40.776223 7f218a7037c0 -1
  ESC[0;31m ** ERROR: osd init failed: (22) Invalid argument ESC[0m ---
 
  Thankfully this isn't production yet and I was eventually able to
  recover the OSD by re-creating the journal (ceph-osd -i 4
  --mkjournal), but it leaves me with a rather bad taste in my mouth.
 
  So the pertinent questions would be:
 
  1. What caused this?
  My bet is on the evil systemd just pulling the plug before the poor OSD
  had finished its shutdown job.
 
  2. How to prevent it from happening again?
  Is there something the Ceph developers can do with regards to init
  scripts? Or is this something to be brought up with the Debian
  maintainer? Debian is transitioning from sysv-init to systemd (booo!)
  with Jessie, but the OSDs still have a sysvinit magic file in their
  top directory. Could this have an effect on things?
 
  3. Is it really that easy to trash your OSDs?
  In case a storage node crashes, am I to expect most if not all
  OSDs or at least their journals to require manual loving?
 
 So this can't happen. 

Good thing you quoted that, as it clearly did. ^o^

Now the question of how exactly remains to be answered.

 Being force killed definitely can't kill the
 OSD's disk state; that's the whole point of the journaling. 

The other OSDs got to the point where they logged 'journal flush done';
this one didn't. Coincidence? I think not.

Totally agree about the point of journaling being to prevent this kind of
situation of course.

 The error
 message indicates that the header written on disk is nonsense to the
 OSD, which means that the local filesystem or disk lost something
 somehow (assuming you haven't done something silly like downgrading
 the software version it's running) and doesn't know it (if there had
 been a read error the output would be different). 

The journal is on an SSD, as stated. 
And before you ask, it's on an Intel DC S3700.

This was created on 0.80.7 just a day before, so no version games.

 I'd double-check
 your disk settings etc. just to be sure, and check for known issues
 with xfs on Jessie.
 
I'm using ext4, but that shouldn't be an issue here to begin with, as the
journal is a raw SSD partition.
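
For what it's worth, the journal path under the OSD data dir is just a
symlink to that raw partition; easy enough to double-check (assuming the
default layout):
---
ls -l /var/lib/ceph/osd/ceph-4/journal
# expect a symlink pointing at the SSD partition
---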

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)

2014-12-04 Thread Christian Balzer

Hello,

This morning I decided to reboot a storage node (Debian Jessie, thus 3.16
kernel and Ceph 0.80.7, HDD OSDs with SSD journals) after applying some
changes. 

It came back up one OSD short, the last log lines before the reboot are:
---
2014-12-05 09:35:27.700330 7f87e789c700  2 -- 10.0.8.21:6823/29520 >>
10.0.8.22:0/5161 pipe(0x7f881b772580 sd=247 :6823 s=2 pgs=21 cs=1 l=1 
c=0x7f881f469020).fault (0) Success
2014-12-05 09:35:27.700350 7f87f011d700 10 osd.4 pg_epoch: 293 pg[3.316( v 
289'1347 (0'0,289'1347] local-les=289 n=8 ec=5 les/c 289/289 288/288/288) 
[8,4,16] r=1 lpr=288 pi=276-287/1 luod=0'0 crt=289'1345 lcod 289'1346 active] 
cancel_copy_ops
---

Quite obviously it didn't complete its shutdown, so unsurprisingly we get:
---
2014-12-05 09:37:40.278128 7f218a7037c0  1 journal _open 
/var/lib/ceph/osd/ceph-4/journal fd 24: 1269312 bytes, block size 4096 
bytes, directio = 1, aio = 1
2014-12-05 09:37:40.278427 7f218a7037c0 -1 journal read_header error decoding 
journal header
2014-12-05 09:37:40.278479 7f218a7037c0 -1 filestore(/var/lib/ceph/osd/ceph-4) 
mount failed to open journal /var/lib/ceph/osd/ceph-4/journal: (22) Invalid 
argument
2014-12-05 09:37:40.776203 7f218a7037c0 -1 osd.4 0 OSD:init: unable to mount 
object store
2014-12-05 09:37:40.776223 7f218a7037c0 -1 ESC[0;31m ** ERROR: osd init failed: 
(22) Invalid argument
ESC[0m
---

Thankfully this isn't production yet and I was eventually able to recover
the OSD by re-creating the journal (ceph-osd -i 4 --mkjournal), but it
leaves me with a rather bad taste in my mouth.
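
For the record, the recovery amounted to something like this (a sketch;
assumes the per-daemon sysvinit syntax works on your install):
---
/etc/init.d/ceph stop osd.4    # make sure the daemon is really down
ceph-osd -i 4 --mkjournal      # re-create the journal header on the partition
/etc/init.d/ceph start osd.4   # bring the OSD back up
---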

So the pertinent questions would be:

1. What caused this? 
My bet is on the evil systemd just pulling the plug before the poor OSD
had finished its shutdown job. 

2. How to prevent it from happening again?
Is there something the Ceph developers can do with regards to init scripts?
Or is this something to be brought up with the Debian maintainer?
Debian is transitioning from sysv-init to systemd (booo!) with Jessie, but
the OSDs still have a sysvinit magic file in their top directory. Could
this have an effect on things? (One possible mitigation is sketched below.)

3. Is it really that easy to trash your OSDs?
In case a storage node crashes, am I to expect most if not all OSDs or
at least their journals to require manual loving?
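
As for question 2: if systemd really is pulling the plug before the flush
completes, one mitigation might be a unit drop-in that extends the stop
timeout (a sketch; 'ceph.service' is the name the sysv-generator would
produce for the init script, which may differ on your system):
---
# /etc/systemd/system/ceph.service.d/timeout.conf
[Service]
TimeoutStopSec=300
---
followed by a systemctl daemon-reload to pick it up.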


Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com