Have you tried running a Luminous OSD with filestore instead of BlueStore? As BlueStore is all new code and uses a lot of optimizations and tricks for fast and efficient use of memory, some 64-bit assumptions may have snuck in there. I'm not sure how much interest there is in making sure that works on 32-bit systems at this point, but narrowing it down to a specific component would certainly help.
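This is a hypothetical Python sketch of what a "64-bit assumption" can look like. It is not code from BlueStore, and `store_in_32_bits` and `survives_32_bit_build` are made-up names. It mimics assigning a value to a 32-bit unsigned C type such as `size_t` or `unsigned long` on a 32-bit Linux build, where only the low 32 bits survive:

```python
# Hypothetical illustration only: not taken from BlueStore.
# On a 32-bit Linux build, C types such as `unsigned long` and `size_t`
# are 32 bits wide, so a larger value assigned to them keeps only the
# low 32 bits (unsigned conversion is modulo 2**32).

MASK_32 = (1 << 32) - 1  # largest value a 32-bit unsigned type can hold


def store_in_32_bits(value: int) -> int:
    """Simulate assigning a non-negative integer to a 32-bit unsigned C type."""
    return value & MASK_32


def survives_32_bit_build(value: int) -> bool:
    """True if the value round-trips through a 32-bit unsigned type unchanged."""
    return store_in_32_bits(value) == value


if __name__ == "__main__":
    small_offset = 512 * 1024 * 1024       # 512 MiB: fits in 32 bits
    large_offset = 5 * 1024 * 1024 * 1024  # 5 GiB: does not fit
    for offset in (small_offset, large_offset):
        print(offset, "->", store_in_32_bits(offset),
              "ok" if survives_32_bit_build(offset) else "TRUNCATED")
```

Running it reports the 512 MiB value unchanged and the 5 GiB value wrapped to 1073741824 (1 GiB). In C the same wrap happens silently, which is one way such a bug could show up as memory corruption rather than as a clean error.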
On Fri, Sep 22, 2017 at 8:57 PM Dyweni - Ceph-Users <6exbab4fy...@dyweni.com> wrote:
> It crashes with SimpleMessenger as well (ms_type = simple).
>
> I've also tried with and without these two settings, but it still crashes:
> bluestore cache size = 536870912
> bluestore cache kv max = 268435456
>
> When using SimpleMessenger, it tells me it is crashing (Segmentation
> Fault) in 'thread_name:ms_pipe_write'. This is common to all crashes under
> SimpleMessenger, just like 'msgr-worker-<n>' was common
> under AsyncMessenger.
>
> The node I'm testing this on is running a 32-bit kernel (4.12.5) and has
> 8 GB RAM (free -m).
>
> Per 'ps aux', VSZ and RSS never get much above 1196392 and 544024
> respectively. (One time they didn't get past 999536 and 329712
> respectively.)
>
> Also, under SimpleMessenger, gdb is reporting stack corruption in the
> backtraces.
>
> What other memory tuning options should I try?
>
> On 2017-09-11 08:05, Gregory Farnum wrote:
>
> You could try setting it to run with SimpleMessenger instead of
> AsyncMessenger -- the default changed across those releases.
> I imagine the root of the problem, though, is that with BlueStore the OSD is
> using a lot more memory than it used to, and so we're overflowing the 32-bit
> address space...which means a more permanent solution might require turning
> down the memory tuning options. Sage has discussed those in various places.
> On Sun, Sep 10, 2017 at 11:52 PM Dyweni - Ceph-Users <6exbab4fy...@dyweni.com> wrote:
>
>> Hi,
>>
>> Is anyone running Ceph Luminous (12.2.0) on 32-bit Linux? Have you seen
>> any problems?
>>
>> My setup has been 1 MON and 7 OSDs (no MDS, RGW, etc.), all running Jewel
>> (10.2.1), on 32-bit, with no issues at all.
>>
>> I've upgraded everything to the latest version of Jewel (10.2.9) and still
>> no issues.
>>
>> Next I upgraded my MON to Luminous (12.2.0) and added MGR to it. Still
>> no issues.
>>
>> Next I removed one node from the cluster, wiped it clean, upgraded it to
>> Luminous (12.2.), and created a new BlueStore data area. Now this node
>> crashes with a segmentation fault, usually within a few minutes of starting
>> up. I've loaded symbols and used GDB to examine backtraces. From what
>> I can tell, the segfaults are happening randomly, and the stack is
>> corrupted, so traces from GDB are unusable (even with all symbols
>> installed for all packages on the system). However, in all cases, the
>> segfault is occurring in the 'msgr-worker-<n>' thread.
>>
>> My data is fine; I would just like to get Ceph 12.2.0 running stably on
>> this node, so I can upgrade the remaining nodes and switch everything
>> over to BlueStore.
>>
>> Thanks,
>> Dyweni
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
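This is a rough Python check of the address-space theory that uses only the figures reported in the thread. `ps` reports VSZ and RSS in KiB, and the two BlueStore cache settings are in bytes. The 3 GiB user-space limit is an assumption: it is the common 3G/1G split on 32-bit x86 Linux kernels, but other splits exist, and fragmentation of the virtual address space is not modelled.

```python
# Rough arithmetic on the memory figures reported in this thread.
# Assumption: 3 GiB of user address space per process, as with the
# common 3G/1G split on 32-bit x86 Linux; other kernels may differ.

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

USER_ADDRESS_SPACE = 3 * GIB  # assumed, see comment above

# `ps aux` reports VSZ and RSS in KiB.
peak_vsz = 1196392 * KIB
peak_rss = 544024 * KIB

# The two tuning values tried in the thread, in bytes.
bluestore_cache_size = 536870912
bluestore_cache_kv_max = 268435456


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to MiB."""
    return n_bytes / MIB


def headroom(used: int, limit: int = USER_ADDRESS_SPACE) -> float:
    """Fraction of the assumed user address space still unused."""
    return 1 - used / limit


if __name__ == "__main__":
    print(f"bluestore cache size:   {to_mib(bluestore_cache_size):.0f} MiB")
    print(f"bluestore cache kv max: {to_mib(bluestore_cache_kv_max):.0f} MiB")
    print(f"peak VSZ: {to_mib(peak_vsz):.1f} MiB, "
          f"peak RSS: {to_mib(peak_rss):.1f} MiB")
    print(f"unused address space at peak VSZ: {headroom(peak_vsz):.0%}")
```

With these inputs the cache settings come to 512 MiB and 256 MiB, and the peak VSZ is about 1168 MiB. That leaves roughly 62% of the assumed 3 GiB unused. If the assumption holds, the process was not close to exhausting its address space at the moments sampled, although `ps` snapshots can miss short spikes.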