On Sat, 19 Apr 2014, Dan McGee wrote:

> On Sat, Apr 19, 2014 at 1:45 PM, dormando <dorma...@rydia.net> wrote:
> > > On Sat, Apr 19, 2014 at 12:43 PM, dormando <dorma...@rydia.net> wrote:
> > > > Well, that learns me for trying to write software without the 10+ VM
> > > > buildbots...
> > > >
> > > > The i386 one, can you include the output of "stats settings", and also
> > > > manually run: "lru_crawler enable" (or start with -o lru_crawler) then run
> > > > "stats settings" again please? Really weird that it fails there, but not
> > > > the lines before it looking for the "OK" while enabling it.
> > >
> > > As soon as I type "lru_crawler enable", memcached crashes. I see this in dmesg.
> > >
> > > [189571.108397] traps: memcached-debug[31776] general protection ip:f7749988 sp:f47ff2d8 error:0 in libpthread-2.19.so[f7739000+18000]
> > > [189969.840918] traps: memcached-debug[2600] general protection ip:7f976510a1c8 sp:7f976254aed8 error:0 in libpthread-2.19.so[7f97650f9000+18000]
> > > [195892.554754] traps: memcached-debug[31871] general protection ip:f76f0988 sp:f46ff2d8 error:0 in libpthread-2.19.so[f76e0000+18000]
> > >
> > > Starting with "-o lru_crawler" also crashes.
> > >
> > > [195977.276379] traps: memcached-debug[2182] general protection ip:f7738988 sp:f75782d8 error:0 in libpthread-2.19.so[f7728000+18000]
> > >
> > > This is running both 32 bit and 64 bit executables on the same build box;
> > > note in the above dmesg output that two of them appear to be from 32-bit
> > > processes, and we also see a crash in what looks a lot like a 64 bit
> > > pointer address, if I'm reading this right...
> >
> > Uhh... is your cross compile goofed?
> >
> > Any chance you could start the memcached-debug binary under gdb and then
> > crash it the same way? Get a full stack trace.
> >
> > Thinking if I even have a 32bit host left somewhere to test with... will
> > have to spin up the VM's later, but a stacktrace might be enlightening
> > anyway.
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xf7dbfb40 (LWP 7)]
> 0xf7f7f988 in __lll_unlock_elision () from /usr/lib/libpthread.so.0
> (gdb) bt
> #0 0xf7f7f988 in __lll_unlock_elision () from /usr/lib/libpthread.so.0
> #1 0xf7f790e0 in __pthread_mutex_unlock_usercnt () from /usr/lib/libpthread.so.0
> #2 0xf7f79bff in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
> #3 0x08061bfe in item_crawler_thread ()
> #4 0xf7f75f20 in start_thread () from /usr/lib/libpthread.so.0
> #5 0xf7ead94e in clone () from /usr/lib/libc.so.6

Holy crap lock elision. I have one machine with a Haswell chip here, but
I'll have to USB boot. Is getting an Arch liveimage especially time
consuming?

https://github.com/dormando/memcached/tree/crawler_fix

Can you try this? The lock elision might've made my "undefined behavior"
mistake of not holding a lock before initially waiting on the condition
fatal.

A further fix might be required, as it's possible someone could kill the
do_etc flag before the thread fully starts and it'd drop out with the lock
held. That would be an incredible feat though.

> > Thanks!
> >
> > > > On the 64bit host, can you try increasing the sleep on t/lru-crawler.t:39
> > > > from 3 to 8 and try again? I was trying to be clever but that may not be
> > > > working out.
> > >
> > > Didn't change anything, same two failures with the same output listed.
> >
> > I feel like something's a bit different between your two tests. In the
> > first set, it's definitely not crashing for the 64bit test, but not
> > working either. Is something weird going on with the second set of tests?
> > You noted it seems to be running a 32bit binary still.
> >
> > I'm willing to ignore the 64-bit failures for now until we figure out the
> > 32-bit ones.
>
> In any case, I wouldn't blame the cross-compile or toolchain, these have all
> been built in very clean, single architecture systemd-nspawn chroots.

Thanks, I'm just trying to reason why it's failing in two different ways.
The initial failure of finding 90 items when it expected 60 is a timing
glitch, the other ones are this thread crashing the daemon.
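
For anyone following the thread, here is a minimal sketch of the condition-wait pattern discussed above: the worker takes the mutex before its first pthread_cond_wait() and releases it on every exit path. POSIX leaves waiting on a default mutex that the caller has not locked undefined, which fits the backtrace, where pthread_cond_wait() unlocks the mutex internally. All names here (crawler_lock, crawler_cond, run_crawler, pending, handled, run_crawler_demo) are hypothetical and are not memcached's actual symbols; the real change lives in the crawler_fix branch and may look quite different.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; not memcached's symbols. */
static pthread_mutex_t crawler_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  crawler_cond = PTHREAD_COND_INITIALIZER;
static bool run_crawler = true; /* stand-in for the thread's run flag */
static int  pending = 0;        /* work requests not yet consumed */
static int  handled = 0;        /* work requests consumed so far */

static void *crawler_thread(void *arg) {
    (void)arg;
    /* Lock BEFORE the first wait: pthread_cond_wait() unlocks the mutex
     * internally, so the calling thread must already own it. */
    pthread_mutex_lock(&crawler_lock);
    for (;;) {
        /* Re-check the predicate in a loop to cope with spurious wakeups. */
        while (pending == 0 && run_crawler)
            pthread_cond_wait(&crawler_cond, &crawler_lock);
        if (pending == 0)
            break; /* asked to stop and nothing left to do */
        pending--;
        handled++;
    }
    /* Single exit path, so the thread never leaves with the lock held,
     * even if the run flag was cleared before it got going. */
    pthread_mutex_unlock(&crawler_lock);
    return NULL;
}

/* Start the worker, queue `requests` wakeups, stop it, and return how
 * many requests it handled (or -1 if the thread could not be created). */
int run_crawler_demo(int requests) {
    pthread_t tid;

    pthread_mutex_lock(&crawler_lock);
    run_crawler = true;
    pending = 0;
    handled = 0;
    pthread_mutex_unlock(&crawler_lock);

    if (pthread_create(&tid, NULL, crawler_thread, NULL) != 0)
        return -1;

    for (int i = 0; i < requests; i++) {
        pthread_mutex_lock(&crawler_lock);
        pending++;
        pthread_cond_signal(&crawler_cond);
        pthread_mutex_unlock(&crawler_lock);
    }

    /* Clear the run flag under the same mutex the waiter uses. */
    pthread_mutex_lock(&crawler_lock);
    run_crawler = false;
    pthread_cond_signal(&crawler_cond);
    pthread_mutex_unlock(&crawler_lock);

    int rc = pthread_join(tid, NULL);
    assert(rc == 0);
    (void)rc;
    return handled;
}
```

Calling run_crawler_demo(5) from a small main() should return 5 once the worker has drained the queue. Because both the flag and the counter are only touched under crawler_lock, stopping the thread before it has reached its first wait is also safe in this sketch.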