Re: 1.4.18

dormando Sat, 19 Apr 2014 12:18:23 -0700

On Sat, 19 Apr 2014, Dan McGee wrote:

> On Sat, Apr 19, 2014 at 1:45 PM, dormando <dorma...@rydia.net> wrote:
>       > On Sat, Apr 19, 2014 at 12:43 PM, dormando <dorma...@rydia.net> wrote:
>       >       Well, that learns me for trying to write software without the 10+ VM
>       >       buildbots...
>       >
>       >       The i386 one, can you include the output of "stats settings", and also
>       >       manually run: "lru_crawler enable" (or start with -o lru_crawler) then run
>       >       "stats settings" again please? Really weird that it fails there, but not
>       >       the lines before it looking for the "OK" while enabling it.
>       >
>       >
>       > As soon as I type "lru_crawler enable", memcached crashes. I see this in dmesg.
>       >
>       > [189571.108397] traps: memcached-debug[31776] general protection ip:f7749988 sp:f47ff2d8 error:0 in libpthread-2.19.so[f7739000+18000]
>       > [189969.840918] traps: memcached-debug[2600] general protection ip:7f976510a1c8 sp:7f976254aed8 error:0 in libpthread-2.19.so[7f97650f9000+18000]
>       > [195892.554754] traps: memcached-debug[31871] general protection ip:f76f0988 sp:f46ff2d8 error:0 in libpthread-2.19.so[f76e0000+18000]
>       >
>       > Starting with "-o lru_crawler" also crashes.
>       >
>       > [195977.276379] traps: memcached-debug[2182] general protection ip:f7738988 sp:f75782d8 error:0 in libpthread-2.19.so[f7728000+18000]
>       >
>       > This is running both 32 bit and 64 bit executables on the same build box; note in the above dmesg output that two of them appear to be from 32-bit
>       > processes, and we also see a crash in what looks a lot like a 64 bit pointer address, if I'm reading this right...
>
> Uhh... is your cross compile goofed?
>
> Any chance you could start the memcached-debug binary under gdb and then
> crash it the same way? Get a full stack trace.
>
> Thinking if I even have a 32bit host left somewhere to test with... will
> have to spin up the VM's later, but a stacktrace might be enlightening
> anyway.
>
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xf7dbfb40 (LWP 7)]
> 0xf7f7f988 in __lll_unlock_elision () from /usr/lib/libpthread.so.0
> (gdb) bt
> #0  0xf7f7f988 in __lll_unlock_elision () from /usr/lib/libpthread.so.0
> #1  0xf7f790e0 in __pthread_mutex_unlock_usercnt () from /usr/lib/libpthread.so.0
> #2  0xf7f79bff in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
> #3  0x08061bfe in item_crawler_thread ()
> #4  0xf7f75f20 in start_thread () from /usr/lib/libpthread.so.0
> #5  0xf7ead94e in clone () from /usr/lib/libc.so.6


Holy crap lock elision. I have one machine with a haswell chip here, but
I'll have to USB boot. Is getting an Arch liveimage especially time
consuming?

https://github.com/dormando/memcached/tree/crawler_fix

Can you try this? Lock elision might've turned my "undefined behavior"
mistake (not holding the lock before initially waiting on the condition)
into a fatal one.

A further fix might be required, as it's possible someone could kill the
do_etc flag before the thread fully starts and it'd drop out with the lock
held. That would be an incredible feat though.
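
For reference, a minimal sketch of the locking pattern being described: the
thread takes the mutex before its first pthread_cond_wait() and releases it
on every exit path, including the case where the run flag is cleared before
the thread gets going. This is not the code in the crawler_fix branch; every
name here (crawler_lock, crawler_cond, run_crawler, pending, passes_done,
run_crawler_demo) is made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not memcached's identifiers. */
static pthread_mutex_t crawler_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  crawler_cond = PTHREAD_COND_INITIALIZER;
static bool run_crawler = false;  /* "keep running" flag, guarded by crawler_lock */
static int  pending     = 0;      /* queued crawl requests, guarded by crawler_lock */
static int  passes_done = 0;      /* completed requests, guarded by crawler_lock */

static void *crawler_thread(void *arg) {
    (void)arg;
    /* Hold the mutex BEFORE the first wait. Calling pthread_cond_wait()
     * on a mutex this thread has not locked is undefined behavior. */
    pthread_mutex_lock(&crawler_lock);
    for (;;) {
        /* Re-check the predicate in a loop to cope with spurious wakeups. */
        while (run_crawler && pending == 0)
            pthread_cond_wait(&crawler_cond, &crawler_lock);
        if (pending == 0)
            break;                /* told to stop and nothing left to do */
        pending--;
        passes_done++;            /* stand-in for one crawl pass */
    }
    /* Single exit path, so the thread never returns with the lock held,
     * even if run_crawler was cleared before the loop started. */
    pthread_mutex_unlock(&crawler_lock);
    return NULL;
}

/* Starts the thread, queues `requests` items, stops it, and returns the
 * number of passes the thread completed. */
int run_crawler_demo(int requests) {
    pthread_t tid;
    int rc;

    pthread_mutex_lock(&crawler_lock);
    run_crawler = true;
    pending = 0;
    passes_done = 0;
    pthread_mutex_unlock(&crawler_lock);

    rc = pthread_create(&tid, NULL, crawler_thread, NULL);
    assert(rc == 0);

    for (int i = 0; i < requests; i++) {
        pthread_mutex_lock(&crawler_lock);
        pending++;
        pthread_cond_signal(&crawler_cond);
        pthread_mutex_unlock(&crawler_lock);
    }

    /* Clear the flag under the lock and wake the thread so it can exit. */
    pthread_mutex_lock(&crawler_lock);
    run_crawler = false;
    pthread_cond_signal(&crawler_cond);
    pthread_mutex_unlock(&crawler_lock);

    rc = pthread_join(tid, NULL);
    assert(rc == 0);
    return passes_done;           /* safe to read: the thread has been joined */
}
```

Calling run_crawler_demo(3) from a main() built with -pthread should return
3. Since the flag and the queue are only touched under crawler_lock, the stop
request cannot slip in between the predicate check and the wait.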
  
>
>       Thanks!
>
>       >
>       >       On the 64bit host, can you try increasing the sleep on t/lru-crawler.t:39
>       >       from 3 to 8 and try again? I was trying to be clever but that may not be
>       >       working out.
>       >
>       >
>       > Didn't change anything, same two failures with the same output listed.
>
> I feel like something's a bit different between your two tests. In the
> first set, it's definitely not crashing for the 64bit test, but not
> working either. Is something weird going on with the second set of tests?
> You noted it seems to be running a 32bit binary still.
>
> I'm willing to ignore the 64-bit failures for now until we figure out the 32-bit ones.
>
> In any case, I wouldn't blame the cross-compile or toolchain, these have all been built in very clean, single architecture systemd-nspawn chroots.

Thanks, I'm just trying to reason about why it's failing in two different
ways. The initial failure of finding 90 items when it expected 60 is a
timing glitch; the other ones are this thread crashing the daemon.
