Okay, so far since switching back it looks more stable. I have around 2 GB/s
and 100k IOPS flowing with fio at the moment as a test.
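
For reference, the test is roughly this shape; the device path, block sizes,
and queue depths below are placeholders rather than the exact jobs I'm
running:

    # large sequential job for the bandwidth number (placeholder values)
    fio --name=seq-read --filename=/dev/rbd0 --ioengine=libaio --direct=1 \
        --rw=read --bs=1M --iodepth=32 --numjobs=4 --runtime=300 --time_based

    # small random job for the IOPS number (placeholder values)
    fio --name=rand-rw --filename=/dev/rbd0 --ioengine=libaio --direct=1 \
        --rw=randrw --bs=4k --iodepth=64 --numjobs=8 --runtime=300 --time_based
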
_____________________________________________



On Mon, Aug 27, 2018 at 11:06 PM Adam Tygart <mo...@ksu.edu> wrote:

> This issue was related to using jemalloc. jemalloc is not as well
> tested with BlueStore and led to lots of segfaults. We moved back to
> the default of tcmalloc with BlueStore and the segfaults stopped.
>
> Check /etc/sysconfig/ceph on RHEL-based distros.
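>
> If memory serves, the jemalloc bit in that file is just an LD_PRELOAD
> line; leaving it commented out keeps the default tcmalloc. Roughly like
> this (treat it as a sketch: the exact library path depends on which
> jemalloc package you have installed):
>
>     # /etc/sysconfig/ceph (sketch)
>     # Increase tcmalloc cache size
>     TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
>
>     ## Uncomment to use jemalloc instead of tcmalloc (library path varies)
>     #LD_PRELOAD=/usr/lib64/libjemalloc.so.1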
>
> --
> Adam
> On Mon, Aug 27, 2018 at 9:51 PM Tyler Bishop
> <tyler.bis...@beyondhosting.net> wrote:
> >
> > Did you solve this? I'm having a similar issue.
> > _____________________________________________
> >
> >
> > On Wed, Feb 28, 2018 at 3:46 PM Kyle Hutson <kylehut...@ksu.edu> wrote:
> >>
> >> I'm following up from a while ago. I don't think this is the same bug.
> >> The bug referenced shows "abort: Corruption: block checksum mismatch",
> >> and I'm not seeing that on mine.
> >>
> >> Now I've had 8 OSDs down on this one server for a couple of weeks, and
> >> I just tried to start it back up. Here's a link to the log of that OSD
> >> (which segfaulted right after starting up):
> >> http://people.beocat.ksu.edu/~kylehutson/ceph-osd.414.log
> >>
> >> To me, it looks like the logs are providing surprisingly few hints as
> >> to where the problem lies. Is there a way I can turn up logging to see
> >> if I can get any more info as to why this is happening?
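> >>
> >> My guess is something like the following in ceph.conf on that host and
> >> then restarting the OSD (or the equivalent via injectargs), but I'm not
> >> sure which debug subsystems are actually the useful ones here:
> >>
> >>     [osd]
> >>     debug osd = 20
> >>     debug bluestore = 20
> >>     debug bluefs = 20
> >>     debug rocksdb = 5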
> >>
> >> On Thu, Feb 8, 2018 at 3:02 AM, Mike O'Connor <m...@oeg.com.au> wrote:
> >>>
> >>> On 7/02/2018 8:23 AM, Kyle Hutson wrote:
> >>> > We had a 26-node production Ceph cluster which we upgraded to
> >>> > Luminous a little over a month ago. I added a 27th node with
> >>> > BlueStore and didn't have any issues, so I began converting the
> >>> > others, one at a time. The first two went off pretty smoothly, but
> >>> > the 3rd is doing something strange.
> >>> >
> >>> > Initially, all the OSDs came up fine, but then some started to
> >>> > segfault. Out of curiosity more than anything else, I did reboot the
> >>> > server to see if it would get better or worse, and it pretty much
> >>> > stayed the same - 12 of the 18 OSDs did not properly come up. Of
> >>> > those, 3 segfaulted again.
> >>> >
> >>> > I picked one that didn't properly come up and copied the log to where
> >>> > anybody can view it:
> >>> > http://people.beocat.ksu.edu/~kylehutson/ceph-osd.426.log
> >>> >
> >>> > You can contrast that with one that is up:
> >>> > http://people.beocat.ksu.edu/~kylehutson/ceph-osd.428.log
> >>> >
> >>> > (which is still showing segfaults in the logs, but seems to be
> >>> > recovering from them OK?)
> >>> >
> >>> > Any ideas?
> >>> Ideas? Yes.
> >>>
> >>> There is a bug which is hitting a small number of systems, and at this
> >>> time there is no solution. Issue details are at
> >>> http://tracker.ceph.com/issues/22102.
> >>>
> >>> Please submit more details of your problem on the ticket.
> >>>
> >>> Mike
> >>>
> >>
> >
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
