On Fri, 16 Aug 2013, James Harper wrote:
> > > > Of course, the old standby is to just crank up the logging detail and
> > > > try to narrow down where the crash happens. Have you tried that yet?
> > >
> > > I haven't touched the rbd code. Is increased logging a compile-time
> > > option or a config option?
> >
> > That is probably the first thing you should try, then. In the [client]
> > section of ceph.conf on the node where tapdisk is running, add something
> > like
> >
> >   [client]
> >       debug rbd = 20
> >       debug rados = 20
> >       debug ms = 1
> >       log file = /var/log/ceph/client.$name.$pid.log
> >
> > and make sure the log directory is writable.
> >
>
> Excellent. How noisy are those levels likely to be?
>
> Is it the consumer of librbd that reads those values? I mean, all I need
> to do is restart the tapdisk process and the logging should happen, right?
That should do it, yeah.

> > > > There is a probable issue with aio_flush and caching enabled that Mike
> > > > Dawson is trying to reproduce. Are you running with caching on or off?
> > >
> > > I have not enabled caching, and I believe it's disabled by default.
> >
> > There is a fix for an aio hang that just hit the cuttlefish branch today
> > that could conceivably be the issue. It causes a hang on qemu, but maybe
> > tapdisk is more sensitive? I'd make sure you're running with that in any
> > case to rule it out.
> >
>
> I switched to dumpling in the last few days to see if the problem existed
> there. Is the fix you mention in dumpling? I'm not yet running
> mission-critical production code on ceph, just a secondary Windows domain
> controller, a secondary spam filter, and a few other machines that don't
> affect production if they crash.

The fix is in the dumpling branch, but not in v0.67. It will be in v0.67.1.

> I'm also testing valgrind at the moment, just basic memtest, but suddenly
> everything is quite stable even though it's under reasonable load right now.
> Stupid heisenbugs.

Valgrind makes things run very slowly (~10x?), which can have a huge effect
on timing. Sometimes that exposes new races; other times it hides existing
ones. :/

sage