Re: [go-nuts] Accessing *[]uint64 from assembly - strange memory corruption under heavy load - any ideas?

2019-03-22 Thread jsonp via golang-nuts
I'm not making any function calls in the assembly; I'm just writing to the 
memory addresses that hold the elements and the len of the slice. I've also 
tried using LockOSThread() to see whether that made any difference, but alas 
it does not.
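
For reference, a pinning attempt of this kind looks roughly as follows (a 
minimal sketch; the wrapped call stands in for the real jitcall site):

    import "runtime"

    // runPinned pins the calling goroutine to its OS thread for the duration
    // of the native call, so the scheduler cannot migrate it mid-execution.
    func runPinned(call func()) {
        runtime.LockOSThread()
        defer runtime.UnlockOSThread()
        call() // e.g. the jitcall into the generated native code
    }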

On Friday, March 22, 2019 at 4:59:30 AM UTC-7, Robert Engels wrote:
>
> Are you making any calls while modifying the len that would allow a GC to 
> occur, or the stack to be resized? You might need to pin the goroutine so 
> that the operation you are performing is “atomic” with respect to those. 
>
> This also sounds very scary if the Go runtime ever had a compacting 
> collector. 
>
> On Mar 22, 2019, at 12:27 AM, Tom wrote:
>
> The allocation is in Go, and assembly never modifies the size of the 
> backing array. Assembly only ever modifies len, which is the len of the 
> slice and not the backing array.

Re: [go-nuts] Accessing *[]uint64 from assembly - strange memory corruption under heavy load - any ideas?

2019-03-22 Thread jsonp via golang-nuts
The assembly should never write to a position, or update the len, beyond the 
backing array: the assembly is generated from code whose 'max stack depth' has 
been computed and validated, and the slice's capacity is set to that size.
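
In Go terms, that setup presumably amounts to something like the following 
sketch (maxDepth and numLocals are placeholders for whatever the validation 
pass computes):

    // newFrames allocates the operand stack with the validated maximum depth
    // as its capacity, so pushes in assembly never outgrow the backing array
    // Go allocated; locals are fixed-size and never resized.
    func newFrames(maxDepth, numLocals int) (stack, locals []uint64) {
        stack = make([]uint64, 0, maxDepth)
        locals = make([]uint64, numLocals)
        return stack, locals
    }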

On Friday, March 22, 2019 at 5:39:36 AM UTC-7, Howard C. Shaw III wrote:
>
> On Friday, March 22, 2019 at 12:27:37 AM UTC-5, Tom wrote:
>>
>> The allocation is in Go, and assembly never modifies the size of the 
>> backing array. Assembly only ever modifies len, which is the len of the 
>> slice and not the backing array.
>>
>>
> Can the assembly ever modify len to a size greater than the length of the 
> backing array? When that happens within Go, a new, larger array gets 
> allocated and the backing array gets copied to it. What happens if it does 
> so in your assembly?
>
> Howard
>



Re: [go-nuts] Accessing *[]uint64 from assembly - strange memory corruption under heavy load - any ideas?

2019-03-22 Thread howardcshaw
On Friday, March 22, 2019 at 12:27:37 AM UTC-5, Tom wrote:
>
> The allocation is in Go, and assembly never modifies the size of the 
> backing array. Assembly only ever modifies len, which is the len of the 
> slice and not the backing array.
>
>
Can the assembly ever modify len to a size greater than the length of the 
backing array? When that happens within Go, a new, larger array gets 
allocated and the backing array gets copied to it. What happens if it does 
so in your assembly?

Howard
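
A cheap guard for this case (a sketch; the name and wantCap parameter are 
illustrative, not from the thread) is to validate the header every time the 
assembly returns:

    import "fmt"

    // checkHeader panics if the assembly ever left len larger than cap, or
    // changed cap, instead of letting a later append or GC cycle corrupt
    // neighbouring heap memory.
    func checkHeader(name string, s []uint64, wantCap int) {
        if len(s) > cap(s) || cap(s) != wantCap {
            panic(fmt.Sprintf("%s: suspicious slice header: len=%d cap=%d (allocated cap %d)",
                name, len(s), cap(s), wantCap))
        }
    }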



Re: [go-nuts] Accessing *[]uint64 from assembly - strange memory corruption under heavy load - any ideas?

2019-03-22 Thread Robert Engels
Are you making any calls while modifying the len that would allow a GC to occur, 
or the stack to be resized? You might need to pin the goroutine so that the 
operation you are performing is “atomic” with respect to those. 

This also sounds very scary if the Go runtime ever had a compacting collector. 

> On Mar 22, 2019, at 12:27 AM, Tom  wrote:
> 
> The allocation is in Go, and assembly never modifies the size of the backing 
> array. Assembly only ever modifies len, which is the len of the slice and not 
> the backing array.

Re: [go-nuts] Accessing *[]uint64 from assembly - strange memory corruption under heavy load - any ideas?

2019-03-21 Thread Tom
The allocation is in Go, and assembly never modifies the size of the 
backing array. Assembly only ever modifies len, which is the len of the 
slice and not the backing array.

On Thursday, 21 March 2019 22:18:29 UTC-7, Tamás Gulácsi wrote:
>
> On Friday, March 22, 2019 at 6:06:06 AM UTC+1, Tom wrote:
>>
>> Still errors I'm afraid :/
>>
> Do the allocation in Go, and don't modify the slice's backing array's length 
> outside of Go: the runtime won't know about it and will happily allocate over 
> the grown slice. 
>  
>



Re: [go-nuts] Accessing *[]uint64 from assembly - strange memory corruption under heavy load - any ideas?

2019-03-21 Thread Tamás Gulácsi
On Friday, March 22, 2019 at 6:06:06 AM UTC+1, Tom wrote:
>
> Still errors I'm afraid :/
>
Do the allocation in Go, and don't modify the slice's backing array's length 
outside of Go: the runtime won't know about it and will happily allocate over 
the grown slice. 
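
One way to follow that advice, sketched under the assumption that the 
generated code can hand back its final depth (jitcallDepth is a hypothetical 
variant, not a function from this thread):

    // jitcallDepth is a hypothetical trampoline that returns the final stack
    // depth instead of writing the slice's len field from assembly.
    func jitcallDepth(asm *uintptr, buf *[]uint64, locals *[]uint64) (depth int)

    // run keeps every slice-header update in Go: the assembly sees the
    // full-capacity buffer and writes only element values, and the len change
    // happens here, where the runtime can see it.
    func run(asm *uintptr, stack *[]uint64, locals *[]uint64) {
        buf := (*stack)[:cap(*stack)] // expose the whole backing array
        depth := jitcallDepth(asm, &buf, locals)
        *stack = buf[:depth] // reslice in Go rather than poking len in assembly
    }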
 



Re: [go-nuts] Accessing *[]uint64 from assembly - strange memory corruption under heavy load - any ideas?

2019-03-21 Thread Tom
Still errors I'm afraid :/

On Thursday, 21 March 2019 21:54:59 UTC-7, Ian Lance Taylor wrote:
>
> See whether it helps to add runtime.KeepAlive calls for the slices and 
> any other pointers that you pass to the assembly code.  If that fixes 
> the problem, then it's a liveness problem. 
>
> Ian 
>



Re: [go-nuts] Accessing *[]uint64 from assembly - strange memory corruption under heavy load - any ideas?

2019-03-21 Thread Ian Lance Taylor
On Thu, Mar 21, 2019 at 9:39 PM Tom  wrote:

See whether it helps to add runtime.KeepAlive calls for the slices and
any other pointers that you pass to the assembly code.  If that fixes
the problem, then it's a liveness problem.

Ian
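
A minimal sketch of that suggestion, assuming the call is wrapped like this 
(the jitcall declaration is a reconstruction from the 24-byte argument frame, 
not code taken from the post):

    import "runtime"

    // jitcall is the assembly trampoline; the parameter types are assumed
    // from its 24-byte argument frame.
    func jitcall(asm *uintptr, stack, locals *[]uint64)

    // run keeps the code pointer and both slice headers reachable until the
    // native code has returned; if the corruption stops, it was a liveness
    // problem.
    func run(asm *uintptr, stack, locals *[]uint64) {
        jitcall(asm, stack, locals)
        runtime.KeepAlive(asm)
        runtime.KeepAlive(stack)
        runtime.KeepAlive(locals)
    }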



[go-nuts] Accessing *[]uint64 from assembly - strange memory corruption under heavy load - any ideas?

2019-03-21 Thread Tom
I've been stuck on this for a few days, so I thought I would ask the brains 
trust.

*TL;DR:* When I have native amd64 instructions mutating a slice (updating the 
len and values of a []uint64), I experience spurious, random memory 
corruption under heavy load (# runnable goroutines > GOMAXPROCS, doing 
the same thing continuously), and only when the GC is enabled. Any 
debugging ideas or things I should look into?

*Background:*

I'm calling into Go assembly with a few pointers to slices (*[]uint64), and 
that assembly is mutating them (reading/writing values, updating len within 
capacity). I'm experiencing random memory corruption, but I can only 
trigger it in the following scenarios:

   1. Heavy load - Doing a zillion things at once (specifically running all 
   my test cases in parallel) and maxing out my machine.
   2. Parallelism - A panic due to memory corruption happens faster if 
   --parallel is set higher, and never if not in parallel.
   3. GC - The panic never happens if the GC is disabled (of course, the 
   test process eventually runs out of memory).

The memory corruption varies, but usually results in an element of an 
unrelated slice being zeroed, the len of an unrelated slice being zeroed, 
or (less likely) a segfault.

Tested on go1.11.2 and go1.12.1. I can only trigger this if I run all my 
test cases at once (with --count at 8000 or so & using t.Parallel()). 
Running things serially or individually yields the correct behaviour.

The assembly in question looks like this:

TEXT ·jitcall(SB),NOSPLIT|NOFRAME,$0-24
GO_ARGS
MOVQ asm+0(FP), AX  // Load the address of the assembly section.
MOVQ stack+8(FP),   R10 // Load the address of the 1st slice.
MOVQ locals+16(FP), R11 // Load the address of the 2nd slice.
MOVQ 0(AX), AX  // Dereference pointer to native code.
JMP AX  // Jump to native code.
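
(The matching Go-side declaration is not shown in the post; given the 24-byte 
argument frame it is presumably three pointer-sized arguments, along these 
lines:)

    // Assumed declaration; the body is the assembly above.
    func jitcall(asm *uintptr, stack *[]uint64, locals *[]uint64)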

And slice manipulation like this (this is a 'pop'):

 MOVQ r13, [r10+8]       // Load the length of the slice.
 DECQ r13                // Decrements the len (I can guarantee this will never underflow).
 MOVQ r12, [r10]         // Load the 0th element address.
 LEAQ r12, [r12 + r13*8] // Compute the address of the last element.
 MOVQ reg, [r12]         // Load the element to reg.
 MOVQ [r10+8], r13       // Write the len back.

or 'push' like this (note: cap is always large enough for any pushes) ...

 MOVQ r12, [r10]          // Load the 0th element address.
 MOVQ r13, [r10+8]        // Load the len.
 LEAQ r12, [r12 + r13*8]  // Compute the address of the last element + 1.
 INCQ r13                 // Increment the len.
 MOVQ [r10+8], r13        // Save the len.
 MOVQ [r12],   reg        // Write the new element.
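
For readers following along, the intent of those two sequences corresponds 
roughly to this Go (a paraphrase, not the poster's code):

    func pop(stack *[]uint64) uint64 {
        s := *stack
        v := s[len(s)-1]      // read the top element
        *stack = s[:len(s)-1] // shrink len only; the backing array is untouched
        return v
    }

    func push(stack *[]uint64, v uint64) {
        *stack = append(*stack, v) // cap is pre-validated, so this never reallocates
    }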


I acknowledge that calling into code like this is unsupported, but I 
struggle to understand how such corruption can happen, and having stared at 
it for a few days, I am frankly stumped. I mean, even if non-cooperative 
preemption were in these versions of Go, I would expect the GC to abort when 
it can't find the stack maps for my RIP value. With no GC safe points in my 
native assembly, I don't see how the GC could interfere (yet the issue 
disappears with the GC off??).

*Questions:*

   1. Any ideas what I'm doing wrong?
   2. Any ideas how I can trace this from the application side and also the 
   runtime side? I've tried schedtrace and the like, but the output didn't 
   appear useful or correlated to the crashes.
   3. Any suggestions for assumptions I might have missed and should write 
   tests / guards for? (One possible guard is sketched below.)
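
Regarding question 3, one possible guard (a sketch; the names are illustrative) 
is to snapshot each slice header before the call and confirm that only len 
changed afterwards:

    // base returns the address of element 0 of the full backing array.
    func base(s []uint64) *uint64 {
        if cap(s) == 0 {
            return nil
        }
        return &s[:cap(s)][0]
    }

    // checkStable panics if the backing array moved or cap changed across the
    // native call; only len is expected to differ.
    func checkStable(name string, before, after []uint64) {
        if cap(before) != cap(after) || base(before) != base(after) {
            panic(name + ": backing array moved or cap changed across the native call")
        }
    }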

Thanks,
Tom
