Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)
Right - I see from the 0.80.8 notes that we merged a fix for #9073. However (unfortunately) there were a number of patches that we experimented with on this issue - and this looks like one of the earlier ones (i.e. not what we merged into master at the time), which is a bit confusing (maybe it was to avoid a more invasive patch...). Maybe Somnath or Jianpeng know why?

Cheers

Mark

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)
Trying out some tests on my pet VMs with 0.80.9 does not elicit any journal failures... However ISTR that running on the bare metal was the most reliable way to reproduce... (proceeding - currently cannot get ceph-deploy to install this configuration... I'll investigate further tomorrow)!

Cheers

Mark
Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)
Mark, one would hope you can't with 0.80.9 as per the release notes, while 0.80.7 definitely was susceptible.

Christian

--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)
Righty - I'll see if I can replicate what you see if I set up an 0.80.9 cluster using the same workstation hardware (WD Raptors and Intel 520s) that showed up the issue previously at 0.83 (I wonder if I never tried a fresh install using the 0.80.* tree)... May be a few days...
Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)
Hello,

On Fri, 05 Jun 2015 16:33:46 +1200 Mark Kirkwood wrote:

Well, whatever it is, I appear to not be the only one after all:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=773361

Alas this is neither a test cluster, nor do I have things set up to compile from source here ATM.

Christian
Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)
Looking quickly at the relevant code - FileJournal::stop_writer() in src/os/FileJournal.cc - I see that we didn't start seeing the (original) issue until changes in 0.83, which suggests that the 0.80 tree might not be doing the same thing.

*However* I note that I'm not happy with the placement of the two thread join operations in there - it *looks* to me like 0.80 could in fact be vulnerable to the same journal-corrupting problem, so if it occurs again it might be interesting to apply the gist of https://github.com/ceph/ceph/pull/2764 and see if it helps (ahem - of course it would be best if this was on a test system)!

Cheers

Mark
Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)
Hello Mark,

No worries, I'm just happy to hear that you think it's the same thing as well.

I upgraded to 0.80.9 (fun fact: NO predicted and actual data movement after setting straw_calc_version 1 and doing a reweight all) today. Should it happen again, I know who and where to poke. ^^

Christian
Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)
Hello,

Actually, after going through the changelogs with a fine comb and the ole Mark I eyeball, I think I might be seeing this:

---
osd: fix journal direct-io shutdown (#9073 Mark Kirkwood, Ma Jianpeng, Somnath Roy)
---

The details in the various related bug reports certainly make it look related. Funny that nobody involved in those bug reports noticed the similarity.

Now I wouldn't have installed 0.80.8 due to the speed regression bug anyway, but now that 0.80.9 has made it into Jessie backports I shall install that tomorrow and hopefully never see that problem again.

Christian

On Thu, 28 May 2015 07:01:15 -0700 Gregory Farnum wrote:

On Thu, May 28, 2015 at 12:22 AM, Christian Balzer ch...@gol.com wrote:

And since there was at least a 30-second pause between the completion of the /etc/init.d/ceph stop and the issuing of the shutdown command, the logging abruptly ending seems unlikely to be related to the shutdown at all.

Oh, sorry... I happened to read this article last night: http://lwn.net/SubscriberLink/645720/01149aa7c58954eb/ Depending on configuration (I think you'd need to have a journal-as-file) you could be experiencing that. And again, not many people use ext4, so who knows what other ways there are of things being broken that nobody else has seen yet.

There should be no file system involved with the raw-partition SSD journal, n'est-ce pas?

...and I guess probably you aren't since you are using partitions.

A copy of the journal would consist of the entire 10GB partition, since we don't know where in the loop it was at the time, right?

Yeah.
Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)
Sorry Christian, I did briefly wonder, then thought, oh yeah, that fix is already merged in... However - on reflection, perhaps *not* in the 0.80 tree... doh!
Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)
Hello Greg,

On Wed, 27 May 2015 22:53:43 -0700 Gregory Farnum wrote:

The description of the logging abruptly ending and the journal being bad really sounds like part of the disk is going back in time. I'm not sure if XFS internally is set up in such a way that something like losing part of its journal would allow that?

I'm special. ^o^ No XFS, EXT4. As stated in the original thread, below. And the (OSD) journal is a raw partition on a DC S3700. And since there was at least a 30-second pause between the completion of the /etc/init.d/ceph stop and the issuing of the shutdown command, the logging abruptly ending seems unlikely to be related to the shutdown at all.

If any of the OSD developers have the time it's conceivable a copy of the OSD journal would be enlightening (if e.g. the header offsets are wrong but there are a bunch of valid journal entries), but this is two reports of this issue from you and none very similar from anybody else. I'm still betting on something in the software or hardware stack misbehaving. (There aren't that many people running Debian; there are lots of people running Ubuntu and we find bad XFS kernels there not infrequently; I think you're hitting something like that.)

There should be no file system involved with the raw-partition SSD journal, n'est-ce pas?

The hardware is vastly different; the previous case was on an AMD system with onboard SATA (SP5100), this one is a SM storage goat with LSI 3008. The only thing they have in common is the Ceph version, 0.80.7 (via the Debian repository, not Ceph), and Debian Jessie as OS with kernel 3.16 (though there were minor updates on that between those incidents, backported fixes).

A copy of the journal would consist of the entire 10GB partition, since we don't know where in the loop it was at the time, right?

Christian

On Sun, May 24, 2015 at 7:26 PM, Christian Balzer ch...@gol.com wrote:

Hello again (marvel at my elephantine memory and thread necromancy)

Firstly, this happened again, details below.
Secondly, this happened even though I changed things to sysv-init AND did a /etc/init.d/ceph stop, which dutifully listed all OSDs as being killed/stopped, BEFORE rebooting the node. This is a completely new node with significantly different HW than the example below, but the same SW versions as before (Debian Jessie, Ceph 0.80.7).

And just like below/before, the logs for that OSD have nothing in them indicating it did shut down properly (no "journal flush done") and when coming back on reboot we get the dreaded:

---
2015-05-25 10:32:55.439492 7f568aa157c0  1 journal _open /var/lib/ceph/osd/ceph-30/journal fd 23: 1269312 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-05-25 10:32:55.439859 7f568aa157c0 -1 journal read_header error decoding journal header
2015-05-25 10:32:55.439905 7f568aa157c0 -1 filestore(/var/lib/ceph/osd/ceph-30) mount failed to open journal /var/lib/ceph/osd/ceph-30/journal: (22) Invalid argument
2015-05-25 10:32:55.936975 7f568aa157c0 -1 osd.30 0 OSD:init: unable to mount object store
---

I see nothing in the changelogs for 0.80.8 and .9 that seems related to this, never mind that from the looks of it the repository at Ceph has only Wheezy (bpo70) packages and Debian Jessie is still stuck at 0.80.7 (Sid just went to .9 last week).

I'm preserving the state of things as they are for a few days, so if any developer would like a peek or more details, speak up now. I'd open an issue, but I don't have a reliable way to reproduce this and even less desire to do so on this production cluster. ^_-

Christian

On Sat, 6 Dec 2014 12:48:25 +0900 Christian Balzer wrote:

On Fri, 5 Dec 2014 11:23:19 -0800 Gregory Farnum wrote:

On Thu, Dec 4, 2014 at 7:03 PM, Christian Balzer ch...@gol.com wrote:

Hello,

This morning I decided to reboot a storage node (Debian Jessie, thus 3.16 kernel and Ceph 0.80.7, HDD OSDs with SSD journals) after applying some changes.

It came back up one OSD short; the last log lines before the reboot are:

---
2014-12-05 09:35:27.700330 7f87e789c700  2 -- 10.0.8.21:6823/29520 10.0.8.22:0/5161 pipe(0x7f881b772580 sd=247 :6823 s=2 pgs=21 cs=1 l=1 c=0x7f881f469020).fault (0) Success
2014-12-05 09:35:27.700350 7f87f011d700 10 osd.4 pg_epoch: 293 pg[3.316( v 289'1347 (0'0,289'1347] local-les=289 n=8 ec=5 les/c 289/289 288/288/288) [8,4,16] r=1 lpr=288 pi=276-287/1 luod=0'0 crt=289'1345 lcod 289'1346 active] cancel_copy_ops
---

Quite obviously it didn't complete its shutdown, so unsurprisingly we get:

---
2014-12-05 09:37:40.278128 7f218a7037c0  1 journal _open /var/lib/ceph/osd/ceph-4/journal fd 24: 1269312 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-12-05 09:37:40.278427 7f218a7037c0 -1 journal read_header error decoding journal header
2014-12-05 09:37:40.278479 7f218a7037c0 -1
Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)
On Thu, 28 May 2015 10:32:18 +0200 Jan Schermer wrote:

Can you check the capacitor reading on the S3700 with smartctl?

I suppose you mean this?
---
175 Power_Loss_Cap_Test 0x0033 100 100 010 Pre-fail Always - 648 (2 2862)
---
Never mind that these are brand new.

This drive has non-volatile cache which *should* get flushed when power is lost; depending on what the hardware does on reboot it might get flushed even when rebooting. That would probably trigger an increase in the unsafe shutdown count SMART value.

I will have to test that from a known starting point, since the current values are likely from earlier tests and actual shutdowns. I'd be surprised if a reboot would drop power to the drives, but it is a possibility of course. However I'm VERY unconvinced that this could result in data loss, with the SSDs in perfect CAPS health.

I just got this drive for testing yesterday and it's a beast, but some things were peculiar - for example my fio benchmark slowed down (35K IOPS down to 5K IOPS) after several GB (random, 5-40) written, and then it would creep back up over time even under load. Disabling write cache helps, no idea why.

I haven't seen that behavior with DC S3700s, but with 5xx ones and some Samsung, yes.

Christian

Z.

On 28 May 2015, at 09:22, Christian Balzer ch...@gol.com wrote:

Hello Greg,

On Wed, 27 May 2015 22:53:43 -0700 Gregory Farnum wrote:

The description of the logging abruptly ending and the journal being bad really sounds like part of the disk is going back in time. I'm not sure if XFS internally is set up in such a way that something like losing part of its journal would allow that?

I'm special. ^o^ No XFS, EXT4. As stated in the original thread, below. And the (OSD) journal is a raw partition on a DC S3700. And since there was at least a 30-second pause between the completion of the /etc/init.d/ceph stop and the issuing of the shutdown command, the logging abruptly ending seems unlikely to be related to the shutdown at all.
If any of the OSD developers have the time, it's conceivable a copy of the OSD journal would be enlightening (if e.g. the header offsets are wrong but there are a bunch of valid journal entries), but this is two reports of this issue from you and none very similar from anybody else. I'm still betting on something in the software or hardware stack misbehaving. (There aren't that many people running Debian; there are lots of people running Ubuntu and we find bad XFS kernels there not infrequently; I think you're hitting something like that.)

There should be no file system involved with the raw partition SSD journal, n'est-ce pas?

The hardware is vastly different: the previous case was on an AMD system with onboard SATA (SP5100), this one is a SM storage goat with LSI 3008. The only thing they have in common is the Ceph version 0.80.7 (via the Debian repository, not Ceph) and Debian Jessie as OS with kernel 3.16 (though there were minor updates on that between those incidents, backported fixes).

A copy of the journal would consist of the entire 10GB partition, since we don't know where in the loop it was at the time, right?

Christian

On Sun, May 24, 2015 at 7:26 PM, Christian Balzer ch...@gol.com wrote:

Hello again (marvel at my elephantine memory and thread necromancy)

Firstly, this happened again, details below.

Secondly, as I changed things to sysv-init AND did a /etc/init.d/ceph stop which dutifully listed all OSDs as being killed/stopped BEFORE rebooting the node. This is a completely new node with significantly different HW than the example below, but the same SW versions as before (Debian Jessie, Ceph 0.80.7).
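The capacitor reading Jan asks about can be pulled straight from the SMART attribute table. A minimal sketch, assuming the `smartctl -A` table layout quoted in this thread (attribute 175, raw value in the tenth column); the device path in the usage comment is a placeholder, not from the thread:

```shell
# Extract the raw value of SMART attribute 175 (Power_Loss_Cap_Test)
# from `smartctl -A` output on stdin. Column 10 is the raw value in
# the table layout shown in this thread; adjust if your smartctl
# prints a different format.
parse_cap_test() {
    awk '$1 == 175 { print $10 }'
}

# Typical use (needs root; /dev/sdb is a placeholder):
#   smartctl -A /dev/sdb | parse_cap_test
```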
Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)
Can you check the capacitor reading on the S3700 with smartctl?

This drive has non-volatile cache which *should* get flushed when power is lost; depending on what the hardware does on reboot it might get flushed even when rebooting.

I just got this drive for testing yesterday and it's a beast, but some things were peculiar - for example my fio benchmark slowed down (35K IOPS down to 5K IOPS) after several GB (random, 5-40) written, and then it would creep back up over time even under load. Disabling write cache helps, no idea why.

Z.

On 28 May 2015, at 09:22, Christian Balzer ch...@gol.com wrote:
Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)
On 28 May 2015, at 10:56, Christian Balzer ch...@gol.com wrote:
On Thu, 28 May 2015 10:32:18 +0200 Jan Schermer wrote:

Can you check the capacitor reading on the S3700 with smartctl?

I suppose you mean this?
---
175 Power_Loss_Cap_Test 0x0033 100 100 010 Pre-fail Always - 648 (2 2862)
---
Never mind that these are brand new.

Most of the failures occur on either very new or very old hardware :-)

This drive has non-volatile cache which *should* get flushed when power is lost; depending on what the hardware does on reboot it might get flushed even when rebooting. That would probably trigger an increase in the unsafe shutdown count SMART value.

I will have to test that from a known starting point, since the current values are likely from earlier tests and actual shutdowns. I'd be surprised if a reboot would drop power to the drives, but it is a possibility of course. However I'm VERY unconvinced that this could result in data loss, with the SSDs in perfect CAPS health.

You are right, it shouldn't happen, but stuff happens.

I just got this drive for testing yesterday and it's a beast, but some things were peculiar - for example my fio benchmark slowed down (35K IOPS down to 5K IOPS) after several GB (random, 5-40) written, and then it would creep back up over time even under load. Disabling write cache helps, no idea why.

I haven't seen that behavior with DC S3700s, but with 5xx ones and some Samsung, yes.

Try this simple test:

fio --filename=/dev/$device --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=journal-test --size=10M

(play with iodepth; if I remember correctly the highest gain was with iodepth=1, higher depths reach almost the max without disabling write cache)

First run with WC enabled:
hdparm -W1 /dev/$device
then with WCE disabled:
hdparm -W0 /dev/$device

I get much higher IOPS with cache disabled on all SSDs I tested - Kingston, Samsung, Intel.
I think it disables compression on those drives that use it internally, and it probably causes the SSD not to wait for other IOs to coalesce it with. This might have a very bad effect on the drive longevity in the long run, though...

Jan

On 28 May 2015, at 09:22, Christian Balzer ch...@gol.com wrote:
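Jan's one-liner can also be kept as a fio job file so the identical job is replayed with the write cache on and off. A sketch under the assumption that the same 4k sync-write job is wanted for both runs; the device path is a placeholder, and the job writes to the target directly, so point it only at a scratch disk:

```shell
# Write the 4k sync-write test from this thread as a fio job file,
# so the same job can be run with the write cache enabled and
# disabled and the IOPS compared.
make_job() {  # $1 = target device/file, $2 = job file to write
    cat > "$2" <<EOF
[journal-test]
filename=$1
direct=1
sync=1
rw=write
bs=4k
numjobs=1
iodepth=1
runtime=60
time_based
size=10M
EOF
}

# Usage (root, destructive to the target device):
#   make_job /dev/sdX wc.fio
#   hdparm -W1 /dev/sdX && fio wc.fio   # volatile write cache on
#   hdparm -W0 /dev/sdX && fio wc.fio   # volatile write cache off
```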
Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)
On Thu, May 28, 2015 at 12:22 AM, Christian Balzer ch...@gol.com wrote:

Hello Greg,

On Wed, 27 May 2015 22:53:43 -0700 Gregory Farnum wrote:

The description of the logging abruptly ending and the journal being bad really sounds like part of the disk is going back in time. I'm not sure if XFS internally is set up in such a way that something like losing part of its journal would allow that?

I'm special. ^o^ No XFS, EXT4. As stated in the original thread, below. And the (OSD) journal is a raw partition on a DC S3700. And since there was at least a 30-second pause between the completion of the /etc/init.d/ceph stop and the issuing of the shutdown command, the logging abruptly ending seems unlikely to be related to the shutdown at all.

Oh, sorry... I happened to read this article last night: http://lwn.net/SubscriberLink/645720/01149aa7c58954eb/
Depending on configuration (I think you'd need to have a journal-as-file) you could be experiencing that. And again, not many people use ext4, so who knows what other ways there are of things being broken that nobody else has seen yet.

If any of the OSD developers have the time, it's conceivable a copy of the OSD journal would be enlightening (if e.g. the header offsets are wrong but there are a bunch of valid journal entries), but this is two reports of this issue from you and none very similar from anybody else. I'm still betting on something in the software or hardware stack misbehaving. (There aren't that many people running Debian; there are lots of people running Ubuntu and we find bad XFS kernels there not infrequently; I think you're hitting something like that.)

There should be no file system involved with the raw partition SSD journal, n'est-ce pas?

...and I guess probably you aren't, since you are using partitions.

The hardware is vastly different: the previous case was on an AMD system with onboard SATA (SP5100), this one is a SM storage goat with LSI 3008. The only thing they have in common is the Ceph version 0.80.7 (via the Debian repository, not Ceph) and Debian Jessie as OS with kernel 3.16 (though there were minor updates on that between those incidents, backported fixes).

A copy of the journal would consist of the entire 10GB partition, since we don't know where in the loop it was at the time, right?

Yeah.
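Capturing the journal for the developers, as discussed in this exchange, amounts to imaging the whole partition, since the ring position at crash time is unknown. A sketch; the partition and output paths in the example are placeholders:

```shell
# Copy an OSD journal partition to a compressed image file for
# offline inspection. Read-only on the source; works on any block
# device or regular file.
capture_journal() {  # $1 = journal partition (or file), $2 = output image
    dd if="$1" of="$2" bs=4M status=none && gzip -f "$2"
}

# e.g. (placeholder paths; result lands in /tmp/osd30-journal.img.gz):
#   capture_journal /dev/sdb2 /tmp/osd30-journal.img
```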
Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)
The description of the logging abruptly ending and the journal being bad really sounds like part of the disk is going back in time. I'm not sure if XFS internally is set up in such a way that something like losing part of its journal would allow that?

If any of the OSD developers have the time, it's conceivable a copy of the OSD journal would be enlightening (if e.g. the header offsets are wrong but there are a bunch of valid journal entries), but this is two reports of this issue from you and none very similar from anybody else. I'm still betting on something in the software or hardware stack misbehaving. (There aren't that many people running Debian; there are lots of people running Ubuntu and we find bad XFS kernels there not infrequently; I think you're hitting something like that.)

-Greg

On Sun, May 24, 2015 at 7:26 PM, Christian Balzer ch...@gol.com wrote:

Hello again (marvel at my elephantine memory and thread necromancy)

Firstly, this happened again, details below.

Secondly, as I changed things to sysv-init AND did a /etc/init.d/ceph stop which dutifully listed all OSDs as being killed/stopped BEFORE rebooting the node. This is a completely new node with significantly different HW than the example below, but the same SW versions as before (Debian Jessie, Ceph 0.80.7).
Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)
Hello again (marvel at my elephantine memory and thread necromancy)

Firstly, this happened again, details below.

Secondly, as I changed things to sysv-init AND did a /etc/init.d/ceph stop, which dutifully listed all OSDs as being killed/stopped BEFORE rebooting the node. This is a completely new node with significantly different HW than the example below, but the same SW versions as before (Debian Jessie, Ceph 0.80.7).

And just like below/before, the logs for that OSD have nothing in them indicating it shut down properly (no "journal flush done") and when coming back on reboot we get the dreaded:
---
2015-05-25 10:32:55.439492 7f568aa157c0 1 journal _open /var/lib/ceph/osd/ceph-30/journal fd 23: 1269312 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-05-25 10:32:55.439859 7f568aa157c0 -1 journal read_header error decoding journal header
2015-05-25 10:32:55.439905 7f568aa157c0 -1 filestore(/var/lib/ceph/osd/ceph-30) mount failed to open journal /var/lib/ceph/osd/ceph-30/journal: (22) Invalid argument
2015-05-25 10:32:55.936975 7f568aa157c0 -1 osd.30 0 OSD:init: unable to mount object store
---
I see nothing in the changelogs for 0.80.8 and .9 that seems related to this, never mind that from the looks of it the repository at Ceph has only Wheezy (bpo70) packages and Debian Jessie is still stuck at 0.80.7 (Sid just went to .9 last week).

I'm preserving the state of things as they are for a few days, so if any developer would like a peek or more details, speak up now. I'd open an issue, but I don't have a reliable way to reproduce this and even less desire to do so on this production cluster. ^_-

Christian

On Sat, 6 Dec 2014 12:48:25 +0900 Christian Balzer wrote:
On Fri, 5 Dec 2014 11:23:19 -0800 Gregory Farnum wrote:
On Thu, Dec 4, 2014 at 7:03 PM, Christian Balzer ch...@gol.com wrote:

Hello,

This morning I decided to reboot a storage node (Debian Jessie, thus 3.16 kernel and Ceph 0.80.7, HDD OSDs with SSD journals) after applying some changes.
Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)
On Thu, Dec 4, 2014 at 7:03 PM, Christian Balzer ch...@gol.com wrote:

Hello,

This morning I decided to reboot a storage node (Debian Jessie, thus 3.16 kernel and Ceph 0.80.7, HDD OSDs with SSD journals) after applying some changes. It came back up one OSD short; the last log lines before the reboot are:
---
2014-12-05 09:35:27.700330 7f87e789c700 2 -- 10.0.8.21:6823/29520 10.0.8.22:0/5161 pipe(0x7f881b772580 sd=247 :6823 s=2 pgs=21 cs=1 l=1 c=0x7f881f469020).fault (0) Success
2014-12-05 09:35:27.700350 7f87f011d700 10 osd.4 pg_epoch: 293 pg[3.316( v 289'1347 (0'0,289'1347] local-les=289 n=8 ec=5 les/c 289/289 288/288/288) [8,4,16] r=1 lpr=288 pi=276-287/1 luod=0'0 crt=289'1345 lcod 289'1346 active] cancel_copy_ops
---
Quite obviously it didn't complete its shutdown, so unsurprisingly we get:
---
2014-12-05 09:37:40.278128 7f218a7037c0 1 journal _open /var/lib/ceph/osd/ceph-4/journal fd 24: 1269312 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-12-05 09:37:40.278427 7f218a7037c0 -1 journal read_header error decoding journal header
2014-12-05 09:37:40.278479 7f218a7037c0 -1 filestore(/var/lib/ceph/osd/ceph-4) mount failed to open journal /var/lib/ceph/osd/ceph-4/journal: (22) Invalid argument
2014-12-05 09:37:40.776203 7f218a7037c0 -1 osd.4 0 OSD:init: unable to mount object store
2014-12-05 09:37:40.776223 7f218a7037c0 -1 ESC[0;31m ** ERROR: osd init failed: (22) Invalid argument ESC[0m
---
Thankfully this isn't production yet and I was eventually able to recover the OSD by re-creating the journal (ceph-osd -i 4 --mkjournal), but it leaves me with a rather bad taste in my mouth.

So the pertinent questions would be:

1. What caused this? My bet is on the evil systemd just pulling the plug before the poor OSD had finished its shutdown job.

2. How to prevent it from happening again? Is there something the Ceph developers can do with regards to init scripts? Or is this something to be brought up with the Debian maintainer? Debian is transitioning from sysv-init to systemd (booo!) with Jessie, but the OSDs still have a sysvinit magic file in their top directory. Could this have an effect on things?

3. Is it really that easy to trash your OSDs? In the case a storage node crashes, am I to expect most if not all OSDs or at least their journals to require manual loving?

So this can't happen. Being force killed definitely can't kill the OSD's disk state; that's the whole point of the journaling. The error message indicates that the header written on disk is nonsense to the OSD, which means that the local filesystem or disk lost something somehow (assuming you haven't done something silly like downgrading the software version it's running) and doesn't know it (if there had been a read error the output would be different). I'd double-check your disk settings etc. just to be sure, and check for known issues with xfs on Jessie.

-Greg
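Whether the on-disk header is zeroed, stale, or plain garbage (the "disk lost something somehow and doesn't know it" case) can be eyeballed by dumping the journal's first block. A read-only sketch; the 4096-byte block size matches the block size in the logs above, and the example path is the journal symlink quoted in this thread:

```shell
# Hex-dump the first 4 KiB of a journal device or file for a quick
# sanity check of the on-disk header. Read-only; never writes.
dump_header_block() {  # $1 = journal partition or file
    dd if="$1" bs=4096 count=1 status=none | od -An -tx1 | head -n 20
}

# e.g. dump_header_block /var/lib/ceph/osd/ceph-4/journal
```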
Re: [ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)
On Fri, 5 Dec 2014 11:23:19 -0800 Gregory Farnum wrote:

So this can't happen.

Good thing you quoted that, as it clearly did. ^o^ Now the question of how exactly remains to be answered.

Being force killed definitely can't kill the OSD's disk state; that's the whole point of the journaling.

The other OSDs got to the point where they logged "journal flush done", this one didn't. Coincidence? I think not. Totally agree about the point of journaling being to prevent this kind of situation, of course.

The error message indicates that the header written on disk is nonsense to the OSD, which means that the local filesystem or disk lost something somehow (assuming you haven't done something silly like downgrading the software version it's running) and doesn't know it (if there had been a read error the output would be different).

The journal is on an SSD, as stated. And before you ask, it's on an Intel DC S3700. This was created on 0.80.7 just a day before, so no version games.

I'd double-check your disk settings etc. just to be sure, and check for known issues with xfs on Jessie.

I'm using ext4, but that shouldn't be an issue here to begin with, as the journal is a raw SSD partition.

Christian

--
Christian Balzer    Network/Systems Engineer
ch...@gol.com       Global OnLine Japan/Fusion Communications
http://www.gol.com/
[ceph-users] OSD trashed by simple reboot (Debian Jessie, systemd?)
Hello,

This morning I decided to reboot a storage node (Debian Jessie, thus 3.16
kernel and Ceph 0.80.7, HDD OSDs with SSD journals) after applying some
changes. It came back up one OSD short; the last log lines before the
reboot are:
---
2014-12-05 09:35:27.700330 7f87e789c700 2 -- 10.0.8.21:6823/29520 10.0.8.22:0/5161 pipe(0x7f881b772580 sd=247 :6823 s=2 pgs=21 cs=1 l=1 c=0x7f881f469020).fault (0) Success
2014-12-05 09:35:27.700350 7f87f011d700 10 osd.4 pg_epoch: 293 pg[3.316( v 289'1347 (0'0,289'1347] local-les=289 n=8 ec=5 les/c 289/289 288/288/288) [8,4,16] r=1 lpr=288 pi=276-287/1 luod=0'0 crt=289'1345 lcod 289'1346 active] cancel_copy_ops
---
Quite obviously it didn't complete its shutdown, so unsurprisingly we get:
---
2014-12-05 09:37:40.278128 7f218a7037c0 1 journal _open /var/lib/ceph/osd/ceph-4/journal fd 24: 1269312 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-12-05 09:37:40.278427 7f218a7037c0 -1 journal read_header error decoding journal header
2014-12-05 09:37:40.278479 7f218a7037c0 -1 filestore(/var/lib/ceph/osd/ceph-4) mount failed to open journal /var/lib/ceph/osd/ceph-4/journal: (22) Invalid argument
2014-12-05 09:37:40.776203 7f218a7037c0 -1 osd.4 0 OSD:init: unable to mount object store
2014-12-05 09:37:40.776223 7f218a7037c0 -1 ** ERROR: osd init failed: (22) Invalid argument
---
Thankfully this isn't production yet and I was eventually able to recover
the OSD by re-creating the journal (ceph-osd -i 4 --mkjournal), but it
leaves me with a rather bad taste in my mouth.

So the pertinent questions would be:

1. What caused this? My bet is on the evil systemd just pulling the plug
before the poor OSD had finished its shutdown job.

2. How to prevent it from happening again? Is there something the Ceph
developers can do with regards to init scripts? Or is this something to
be brought up with the Debian maintainer? Debian is transitioning from
sysv-init to systemd (booo!) with Jessie, but the OSDs still have a
sysvinit magic file in their top directory. Could this have an effect on
things?

3. Is it really that easy to trash your OSDs? In the case a storage node
crashes, am I to expect most if not all OSDs or at least their journals
to require manual loving?

Regards,

Christian
--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
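On question 2, one mitigation on hosts where systemd is the one stopping the daemons is to make sure the OSD is given enough time to flush its journal before systemd escalates to SIGKILL. A minimal sketch of a drop-in follows, assuming the OSD runs under a `ceph-osd@.service` template unit -- an assumption, since 0.80 on Jessie still ships the sysvinit script, which systemd only wraps through its LSB compatibility layer:

```ini
# /etc/systemd/system/ceph-osd@.service.d/stop-timeout.conf
# Hypothetical drop-in: give each OSD time to log "journal flush done"
# on shutdown instead of being SIGKILLed at the default stop timeout.
[Service]
TimeoutStopSec=10min
```

After creating the drop-in, `systemctl daemon-reload` picks it up. Whether this actually helps depends on whether the OSD really is being killed early, which the truncated shutdown log above suggests but does not prove.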