Okay, so far since switching back it looks more stable. I have around 2 GB/s and 100k IOPS flowing with fio at the moment to test.
_____________________________________________
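For reference, a fio job in roughly the following shape produces that kind of sustained mixed load. This is only an illustrative sketch: the target device, block size, queue depth, and job count below are guesses, not the actual job that produced the numbers above.

```ini
; hypothetical fio job file -- parameters are illustrative, not the real test
[global]
ioengine=libaio       ; async I/O
direct=1              ; bypass the page cache
time_based=1
runtime=300           ; run for 5 minutes

[mixed-load]
rw=randrw             ; mixed random read/write
rwmixread=70          ; 70% reads, 30% writes
bs=16k                ; block size; throughput = bs * IOPS
iodepth=32
numjobs=8
filename=/dev/rbd0    ; placeholder target device
```

Run with `fio mixed.fio`; the aggregate bandwidth and IOPS lines in its summary output are what the figures above refer to.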
On Mon, Aug 27, 2018 at 11:06 PM Adam Tygart <mo...@ksu.edu> wrote:
> This issue was related to using jemalloc. Jemalloc is not as well
> tested with BlueStore and led to lots of segfaults. We moved back to
> the default of tcmalloc with BlueStore and these stopped.
>
> Check /etc/sysconfig/ceph under RHEL-based distros.
>
> --
> Adam
>
> On Mon, Aug 27, 2018 at 9:51 PM Tyler Bishop
> <tyler.bis...@beyondhosting.net> wrote:
> >
> > Did you solve this? Similar issue.
> > _____________________________________________
> >
> > On Wed, Feb 28, 2018 at 3:46 PM Kyle Hutson <kylehut...@ksu.edu> wrote:
> >>
> >> I'm following up from a while ago. I don't think this is the same
> >> bug. The bug referenced shows "abort: Corruption: block checksum
> >> mismatch", and I'm not seeing that on mine.
> >>
> >> Now I've had 8 OSDs down on this one server for a couple of weeks,
> >> and I just tried to start them back up. Here's a link to the log of
> >> one of those OSDs (which segfaulted right after starting up):
> >> http://people.beocat.ksu.edu/~kylehutson/ceph-osd.414.log
> >>
> >> To me, the logs provide surprisingly few hints as to where the
> >> problem lies. Is there a way I can turn up logging to see if I can
> >> get any more info as to why this is happening?
> >>
> >> On Thu, Feb 8, 2018 at 3:02 AM, Mike O'Connor <m...@oeg.com.au> wrote:
> >>>
> >>> On 7/02/2018 8:23 AM, Kyle Hutson wrote:
> >>> > We had a 26-node production Ceph cluster which we upgraded to
> >>> > Luminous a little over a month ago. I added a 27th node with
> >>> > BlueStore and didn't have any issues, so I began converting the
> >>> > others, one at a time. The first two went off pretty smoothly,
> >>> > but the 3rd is doing something strange.
> >>> >
> >>> > Initially, all the OSDs came up fine, but then some started to
> >>> > segfault. Out of curiosity more than anything else, I rebooted
> >>> > the server to see if it would get better or worse, and it pretty
> >>> > much stayed the same: 12 of the 18 OSDs did not properly come
> >>> > up. Of those, 3 again segfaulted.
> >>> >
> >>> > I picked one that didn't properly come up and copied the log to
> >>> > where anybody can view it:
> >>> > http://people.beocat.ksu.edu/~kylehutson/ceph-osd.426.log
> >>> >
> >>> > You can contrast that with one that is up:
> >>> > http://people.beocat.ksu.edu/~kylehutson/ceph-osd.428.log
> >>> >
> >>> > (That one is still showing segfaults in the logs, but seems to
> >>> > be recovering from them OK.)
> >>> >
> >>> > Any ideas?
> >>> Ideas? Yes.
> >>>
> >>> There is a bug which is hitting a small number of systems, and at
> >>> this time there is no solution. Issue details are at
> >>> http://tracker.ceph.com/issues/22102.
> >>>
> >>> Please submit more details of your problem on the ticket.
> >>>
> >>> Mike
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com