On Fri, Mar 22, 2019 at 10:55 AM Robert Johnstone <r.w.johnst...@gmail.com> wrote:
>
> I don't see any memory barriers in your assembly.  If you are modifying the
> backing array while it is being scanned by the GC, there could be some
> interaction.  I don't know enough about the GC internals to say more than
> that.  If you look at when memory barriers are inserted by the Go compiler,
> it might provide more guidance.

If it's just []uint64 that shouldn't be an issue, as write barriers are
not required for uint64.  You are certainly correct if the assembly is
manipulating slices that contain pointers.

Ian

> On Friday, 22 March 2019 00:39:34 UTC-4, Tom wrote:
>>
>> I've been stuck on this for a few days so thought I would ask the brains
>> trust.
>>
>> TL;DR: When I have native amd64 instructions mutating a slice (updating
>> the len + values of a []uint64), I experience spurious & random memory
>> corruption when under heavy load (# runnable goroutines > GOMAXPROCS, doing
>> the same thing continuously), and only when the GC is enabled. Any debugging
>> ideas or things I should look into?
>>
>> Background:
>>
>> I'm calling into Go assembly with a few pointers to slices (*[]uint64), and
>> that assembly is mutating them (reading/writing values, updating len within
>> capacity). I'm experiencing random memory corruption, but I can only trigger
>> it in the following scenarios:
>>
>> Heavy load - Doing a zillion things at once (specifically running all my
>> test cases in parallel) and maxing out my machine.
>> Parallelism - A panic due to memory corruption happens faster if --parallel
>> is set higher, and never if not in parallel.
>> GC - The panic never happens if the GC is disabled (of course, the test
>> process eventually runs out of memory).
>>
>> The memory corruption varies, but usually results in an element of an
>> unrelated slice being zeroed, the len of an unrelated slice being zeroed, or
>> (less likely) a segfault.
>>
>> Tested on go1.11.2 and go1.12.1. I can only trigger this if I run all my
>> test cases at once (with --count at 8000 or so & using t.Parallel()).
>> Running things serially or individually yields the correct behaviour.
>>
>> The assembly in question looks like this:
>>
>> TEXT ·jitcall(SB),NOSPLIT|NOFRAME,$0-24
>>         GO_ARGS
>>         MOVQ asm+0(FP),     AX  // Load the address of the assembly section.
>>         MOVQ stack+8(FP),   R10 // Load the address of the 1st slice.
>>         MOVQ locals+16(FP), R11 // Load the address of the 2nd slice.
>>         MOVQ 0(AX),         AX  // Dereference pointer to native code.
>>         JMP AX                  // Jump to native code.
>>
>> And slice manipulation like this (this is a 'pop'):
>>
>>         MOVQ r13, [r10+8]       // Load the length of the slice.
>>         DECQ r13                // Decrement the len (I can guarantee this will never underflow).
>>         MOVQ r12, [r10]         // Load the 0th element address.
>>         LEAQ r12, [r12 + r13*8] // Compute the address of the last element.
>>         MOVQ reg, [r12]         // Load the element to reg.
>>         MOVQ [r10+8], r13       // Write the len back.
>>
>> or 'push' like this (note: cap is always large enough for any pushes) ...
>>
>>         MOVQ r12, [r10]         // Load the 0th element address.
>>         MOVQ r13, [r10+8]       // Load the len.
>>         LEAQ r12, [r12 + r13*8] // Compute the address of the last element + 1.
>>         INCQ r13                // Increment the len.
>>         MOVQ [r10+8], r13       // Save the len.
>>         MOVQ [r12], reg         // Write the new element.
>>
>>
>> I acknowledge that calling into code like this is unsupported, but I
>> struggle to understand how such corruption can happen, and having stared at
>> it for a few days, I am frankly stumped. I mean, even if non-cooperative
>> preemption was in these versions of Go I would expect the GC to abort when
>> it can't find the stack maps for my RIP value. With no GC safe points in my
>> native assembly, I don't see how the GC could interfere (yet the issue
>> disappears with the GC off??).
>>
>> Questions:
>>
>> Any ideas what I'm doing wrong?
>> Any ideas how I can trace this from the application side and also the
>> runtime side? I've tried schedtrace and the like, but the output didn't
>> appear useful or correlated to the crashes.
>> Any suggestions for assumptions I might have missed and should write tests /
>> guards for?
>>
>> Thanks,
>> Tom
>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
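
On the question of tests / guards: one option is a pure-Go reference model of the quoted 'pop' and 'push' sequences, so the slice state left behind by the native code can be compared against it. The sketch below is only an illustration of the intended semantics as described in the thread (len updated in place, never growing past cap, elements are plain uint64 so no write barrier is involved). The names push and pop and the explicit bounds checks are assumptions of this sketch, not part of Tom's code.

```go
package main

import "fmt"

// push mirrors the quoted 'push' sequence: it writes v at index len and
// bumps len by one. Unlike the native code, it panics if cap would be
// exceeded instead of assuming that never happens (hypothetical guard).
func push(s *[]uint64, v uint64) {
	n := len(*s)
	if n >= cap(*s) {
		panic("push: capacity exceeded")
	}
	*s = (*s)[:n+1] // reslice within cap: same backing array, len+1
	(*s)[n] = v
}

// pop mirrors the quoted 'pop' sequence: it decrements len and returns the
// element that was last. The underflow check is a hypothetical guard.
func pop(s *[]uint64) uint64 {
	n := len(*s)
	if n == 0 {
		panic("pop: empty slice")
	}
	v := (*s)[n-1]
	*s = (*s)[:n-1] // len-1, cap and backing array unchanged
	return v
}

func main() {
	stack := make([]uint64, 0, 4)
	push(&stack, 10)
	push(&stack, 20)
	fmt.Println(pop(&stack), len(stack), cap(stack)) // prints "20 1 4"
}
```

Running the same operation sequence through this model and through the native code, then comparing len, cap and contents after each step, may help narrow down whether the corruption originates in the slice manipulation itself or elsewhere.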