Re: [go-nuts] weird "index out of range" in strconv.formatBits

2023-09-01 Thread metronome


We waited for two weeks, but the panic never resurfaced. We will provide 
further updates if it recurs or as soon as we have more information.

Thanks Dan and Kurtis for looking into it.

On Friday, August 18, 2023 at 3:18:10 PM UTC+8 Dan Kortschak wrote:

> On Thu, 2023-08-17 at 23:32 -0700, metronome wrote:
> > > > Have you built with CGO_ENABLED=0?
> > Building with CGO_ENABLED=0 succeeded. Does that mean the binary's
> > runtime behavior has nothing to do with CGO? Deploying a
> > CGO_ENABLED=0 binary in production is not an option for now either
> > (we are trying, but we're not sure we can make it happen).
>
> Do you get the same behaviour with CGO_ENABLED=0?
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/55c5fc8d-aa5e-4a7e-8ea6-812c1462fd36n%40googlegroups.com.


Re: [go-nuts] weird "index out of range" in strconv.formatBits

2023-08-18 Thread metronome
Thanks.

>> Have you eliminated the possibility of races?
We're trying to extract a smaller reproducer (perhaps impractical, since the 
panic occurs very rarely) and then run it under the race detector. Enabling the 
race detector in production is sadly not an option right now.

>> Have you built with CGO_ENABLED=0?
Building with CGO_ENABLED=0 succeeded. Does that mean the binary's runtime 
behavior has nothing to do with CGO? Deploying a CGO_ENABLED=0 binary in 
production is not an option for now either (we are trying, but we're not sure 
we can make it happen).

On Thursday, August 17, 2023 at 2:46:16 PM UTC+8 Dan Kortschak wrote:

> Have you eliminated the possibility of races? Have you built with
> CGO_ENABLED=0?
>
>



Re: [go-nuts] weird "index out of range" in strconv.formatBits

2023-08-17 Thread metronome
Thanks for commenting. Here are a few additional details.

*# 1. Version of Go?*
We observed the issue with both go1.20.5 and go1.18.10 on linux/amd64 
(centos)

*# 2. Context?*
All panics we observed so far are from either 
 *strconv.FormatInt -> strconv.formatBits* chain, or
 *strconv.FormatUint -> strconv.formatBits* chain
where the base is always 10.

// typical call site, toId is an "*int64".
if com_count > 1 {
	com_string = anchor + "," + strconv.FormatInt(*toId, 10)
}
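For readers who want to exercise the quoted call site in isolation, here is a minimal, self-contained sketch. The wrapper `buildComString` and its parameters are hypothetical stand-ins for the thread's `com_string`, `anchor`, `com_count` and `toId`; only the `strconv.FormatInt(*toId, 10)` call comes from the post. It also makes one Go rule explicit: `*toId` is evaluated before the call, so `FormatInt` formats its own copy of the value.

```go
package main

import (
	"fmt"
	"strconv"
)

// buildComString is a hypothetical wrapper around the quoted call site.
// comString is the current value of com_string; it is replaced only when
// comCount > 1, exactly as in the snippet.
func buildComString(comString, anchor string, comCount int, toId *int64) string {
	if comCount > 1 {
		// *toId is read once, here, and passed to FormatInt by value.
		comString = anchor + "," + strconv.FormatInt(*toId, 10)
	}
	return comString
}

func main() {
	id := int64(-9223372036854775808) // math.MinInt64
	fmt.Println(buildComString("", "anchor", 2, &id))
}
```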


*# 3. Is your program using pure Go (statically linked) or Cgo?*

The binary was built with CGO_ENABLED=1 and -buildmode=exe.
All panic call sites we have observed so far are pure Go, that is, none of 
them are on a path where C calls into Go.


*# 4. panic stack trace*
panic: runtime error: index out of range [18446744073708732603] with length 200

goroutine 1 [running]:
strconv.formatBits({0x0?, 0x0?, 0x0?}, 0xc09e00b750?, 0x1?, 0x1?, 0x0?)
	/usr/lib/go-1.20/src/strconv/itoa.go:140 +0x4b9
strconv.FormatInt(0x0?, 0xc07393df80?)
	/usr/lib/go-1.20/src/strconv/itoa.go:29 +0xa5
  ...


On Thursday, August 17, 2023 at 12:56:17 PM UTC+8 Kurtis Rader wrote:

> Insufficient information. Version of Go? Since formatBits is private we 
> need to see the actual code of a call to a public API that resulted in the 
> call to formatBits that failed. Also, show us the literal panic stack. 
> Showing us the assembly code with no context is not useful. Is your program 
> using pure Go (statically linked) or Cgo? Wild guesses, what I used to call 
> SWAGS (silly wild ass guesses) as a Unix support engineer, are seldom 
> useful. If your guess is scientifically informed that is a different matter 
> but you should be able to articulate why you think your guess is more 
> likely to be true than a random coin flip.
>
>
> -- 
> Kurtis Rader
> Caretaker of the exceptional canines Junior and Hank
>



[go-nuts] weird "index out of range" in strconv.formatBits

2023-08-16 Thread metronome
Hi,

We ran into a weird *out of range* panic in *strconv.formatBits* and hope 
someone can shed some light on what the root cause could be. Any comment is 
highly appreciated.

Problem description:
*  random out of range at code 
<https://github.com/golang/go/blob/release-branch.go1.20/src/strconv/itoa.go#L140>; 
most of the time the index is a huge integer, but we observed at least one 
exception (#2).*

*  #1: runtime error: index out of range [18446744073709449339] with 
length 200*
* #2: runtime error: index out of range [102511] with length 200*
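The reported indexes are easier to read once reinterpreted as signed 64-bit values. The sketch below assumes, based on the assembly listing that follows (`lea 0x1(%rax),%rax` then `cmp $0xc8,%rax`), that the failing check is `smallsString[is+1]` with `is` equal to `us%100*2`, which is our reading of itoa.go in Go 1.20. Under that assumption, a healthy computation always yields an index between 1 and 199. `signedIndex` and `impliedRemainder` are hypothetical helper names.

```go
package main

import "fmt"

// signedIndex reinterprets a reported panic index as a signed 64-bit value.
func signedIndex(idx uint64) int64 {
	return int64(idx)
}

// impliedRemainder solves idx == 2*r + 1 for r, assuming (hypothetically)
// that the failing access is smallsString[is+1] with is = us%100*2.
// A correct computation always gives r in [0, 99].
func impliedRemainder(idx uint64) int64 {
	return (int64(idx) - 1) / 2
}

func main() {
	for _, idx := range []uint64{
		18446744073709449339, // #1
		102511,               // #2
		18446744073708732603, // index from the stack trace posted later
	} {
		fmt.Printf("index %d = %d signed, implied us%%100 = %d\n",
			idx, signedIndex(idx), impliedRemainder(idx))
	}
}
```

For #1 this gives -102277 (implied remainder -51139), and for the stack-trace index -819013 (implied remainder -409507). Neither is a possible value of us%100.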

Wild guesses:
1. Does the machine code suggest that a data race or memory corruption is 
unlikely? But if relevant registers, like R10, had been saved and restored, 
could it be due to stack corruption?
Given that R12 is a scratch register, is it possible that R12 is clobbered 
somehow, say, by signal handling?

===
   0x00495b0a <+810>:  mov    %rdi,%r10
   0x00495b0d <+813>:  shr    %rdi
   0x00495b10 <+816>:  mov    %rax,%rsi
   0x00495b13 <+819>:  movabs $0xa3d70a3d70a3d70b,%rax
   0x00495b1d <+829>:  mov    %rdx,%r11
   0x00495b20 <+832>:  mul    %rdi
   0x00495b23 <+835>:  shr    $0x5,%rdx
*  0x00495b27 <+839>:  imul   $0x64,%rdx,%r12*
*  0x00495b2b <+843>:  sub    %r12,%r10*
*  0x00495b2e <+846>:  lea    (%r10,%r10,1),%rax*
   0x00495b32 <+850>:  lea    0x1(%rax),%rax
   0x00495b36 <+854>:  nopw   0x0(%rax,%rax,1)
   0x00495b3f <+863>:  nop
   0x00495b40 <+864>:  cmp    $0xc8,%rax
   0x00495b46 <+870>:  jae    0x495c8f




Re: [go-nuts] epoll: why write 1 byte to the pipe in netpollBreak() and read up to 16 bytes in netpoll()?

2023-08-14 Thread metronome
Hi Ian,

Thanks for clarifying. Yes, there's no harm in leaving the code untouched.

On Tuesday, August 15, 2023 at 3:21:57 AM UTC+8 Ian Lance Taylor wrote:

> On Mon, Aug 14, 2023 at 11:28 AM metronome  wrote:
> >
> > >> If several different goroutines decide to wake up the polling
> > >> goroutine before the polling goroutine wakes up, they will each write
> > >> a single byte
> >
> > I'm wondering: with the introduction of "netpollWakeSig", does this still
> > hold true? Thanks.
>
> Good point, I think you're right. With netpollWakeSig we shouldn't
> expect to see more than a single byte written to the pipe.
>
> Doesn't hurt to try to read more bytes, though.
>
> Ian
>
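The effect Ian describes can be sketched in user space. The type below is a hypothetical analogue of the runtime's break pipe, not its actual code: a compare-and-swap on a flag, modeled loosely on `netpollWakeSig`, lets only the first of several concurrent wakers write a byte, and the reader clears the flag after draining. It assumes Go 1.19+ for `atomic.Uint32`.

```go
package main

import (
	"fmt"
	"os"
	"sync"
	"sync/atomic"
)

// breaker is a hypothetical analogue of the netpoll break pipe.
type breaker struct {
	rd, wr  *os.File
	wakeSig atomic.Uint32 // 1 while a wakeup byte is in flight
}

func newBreaker() (*breaker, error) {
	rd, wr, err := os.Pipe()
	if err != nil {
		return nil, err
	}
	return &breaker{rd: rd, wr: wr}, nil
}

// wake writes one byte, but only if no wakeup is already pending.
func (b *breaker) wake() error {
	if !b.wakeSig.CompareAndSwap(0, 1) {
		return nil // a wakeup is already in flight
	}
	_, err := b.wr.Write([]byte{0})
	return err
}

// drain blocks until a byte is available, reads up to 16 bytes,
// then re-arms wake by clearing the flag.
func (b *breaker) drain() (int, error) {
	var buf [16]byte
	n, err := b.rd.Read(buf[:])
	b.wakeSig.Store(0)
	return n, err
}

func main() {
	b, err := newBreaker()
	if err != nil {
		panic(err)
	}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if err := b.wake(); err != nil {
				panic(err)
			}
		}()
	}
	wg.Wait() // all wakeups happen before the poller drains
	n, err := b.drain()
	if err != nil {
		panic(err)
	}
	fmt.Println("bytes drained:", n)
}
```

With the guard in place, the eight wakeups leave a single byte in the pipe, so reading up to 16 bytes is harmless but no longer needed.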



Re: [go-nuts] epoll: why write 1 byte to the pipe in netpollBreak() and read up to 16 bytes in netpoll()?

2023-08-14 Thread metronome
>> If several different goroutines decide to wake up the polling
>> goroutine before the polling goroutine wakes up, they will each write
>> a single byte

I'm wondering: with the introduction of "netpollWakeSig", does this still hold 
true? Thanks.

On Tuesday, July 11, 2023 at 9:00:36 AM UTC+8 Ian Lance Taylor wrote:

> On Mon, Jul 10, 2023 at 6:10 AM shaouai  wrote:
> >
> > In the implementation of the Go netpoller, `netpollBreak()` attempts to 
> write 1 byte to `netpollBreakWr`, whereas `netpoll()` reads up to 16 bytes 
> from `netpollBreakRd`, why 16 bytes rather than 1 byte?
> >
> > write up to 1 byte: 
> https://cs.opensource.google/go/go/+/refs/tags/go1.20.5:src/runtime/netpoll_epoll.go;l=77;drc=c7cc2b94c63af610a29b1b48cfbfb87cb8abf05b
> >
> > read up to 16 bytes: 
> https://cs.opensource.google/go/go/+/refs/tags/go1.20.5:src/runtime/netpoll_epoll.go;l=146;drc=c7cc2b94c63af610a29b1b48cfbfb87cb8abf05b
>
> A single byte will wake up a goroutine sleeping in netpoll, so there
> is no reason to write more than one byte.
>
> If several different goroutines decide to wake up the polling
> goroutine before the polling goroutine wakes up, they will each write
> a single byte, and they will all be satisfied by a single wakeup.
> And, if we don't read all those bytes, there will still be bytes in
> the pipe and we'll wake up the next time around the poll loop even if
> we don't have to. So we try to read all of their wakeup bytes at
> once. The number 16 is arbitrary, based on the assumption that it's
> not all that likely that more than 16 goroutines will try to wake up
> the poller simultaneously.
>
> Ian
>
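The coalescing described above can also be shown without any guard flag. The sketch below is a hypothetical user-space analogue built on `os.Pipe`, not the runtime's code: every waker writes one byte, and the reader consumes up to 16 pending bytes per read, so any surplus is left for the next pass around the loop.

```go
package main

import (
	"fmt"
	"os"
)

// wakePipe is a hypothetical analogue of the netpoll break pipe
// without the netpollWakeSig guard.
type wakePipe struct {
	rd, wr *os.File
}

func newWakePipe() (*wakePipe, error) {
	rd, wr, err := os.Pipe()
	if err != nil {
		return nil, err
	}
	return &wakePipe{rd: rd, wr: wr}, nil
}

// wake writes a single byte; one byte is enough to wake a blocked reader.
func (p *wakePipe) wake() error {
	_, err := p.wr.Write([]byte{0})
	return err
}

// drain blocks until at least one byte is available, then consumes
// up to 16 pending wakeup bytes in a single read.
func (p *wakePipe) drain() (int, error) {
	var buf [16]byte
	return p.rd.Read(buf[:])
}

func main() {
	p, err := newWakePipe()
	if err != nil {
		panic(err)
	}
	// 20 wakeups arrive before the poller gets to run.
	for i := 0; i < 20; i++ {
		if err := p.wake(); err != nil {
			panic(err)
		}
	}
	first, err := p.drain()
	if err != nil {
		panic(err)
	}
	second, err := p.drain()
	if err != nil {
		panic(err)
	}
	fmt.Println(first, second)
}
```

With 20 wakeups queued, the first read should return 16 bytes and the second the remaining 4: the leftover bytes cause one extra, unnecessary wakeup, which is why reading more than one byte at a time helps.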



[go-nuts] about "hybrid barrier"

2023-07-28 Thread metronome
Hi,

I came across two questions regarding the hybrid barrier that I am hoping 
someone can help with.

1.
Change https://go-review.googlesource.com/c/go/+/31765 says:

*"It's unconditional for now because barriers on channel operations require 
checking both the source and destination stacks and we don't have a way to 
funnel this information into the write barrier at the moment."*

Can anyone help me understand this statement, perhaps with an example? Isn't 
the case of "channel operations involving both stacks" already covered by 
runtime.sendDirect?

2. The comments in mbarrier.go say:

// The insertion part of the barrier
// is necessary while the calling goroutine's stack is grey. In
// pseudocode, the barrier is:
//
//     writePointer(slot, ptr):
//         shade(*slot)
//         if current stack is grey:
//             shade(ptr)
//         *slot = ptr

What does "grey stack" mean? Is a stack considered 'grey' right after its 
goroutine gets suspended, scanned and resumed to execution?
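To make the quoted pseudocode concrete, here is a toy, single-threaded model of it. Everything in it is hypothetical: objects carry an explicit color, `shade` greys white objects, and "the current stack is grey" is a boolean parameter, assuming the conventional tri-color meaning that the writing goroutine's stack has not yet been scanned in the current cycle. Per the CL quoted above, the runtime at the time shaded `ptr` unconditionally; the model keeps the condition only to mirror the comment.

```go
package main

import "fmt"

// color is the tri-color state of a toy heap object.
type color int

const (
	white color = iota // not yet reached in this GC cycle
	grey               // reached, pointers not yet scanned
	black              // reached and scanned
)

func (c color) String() string {
	return [...]string{"white", "grey", "black"}[c]
}

// object is a hypothetical heap object with a single pointer slot.
type object struct {
	name  string
	color color
	slot  *object
}

// shade turns a white object grey; nil, grey and black are left alone.
func shade(o *object) {
	if o != nil && o.color == white {
		o.color = grey
	}
}

// writePointer follows the mbarrier.go pseudocode: always shade the value
// being overwritten (deletion part), and shade the new value (insertion
// part) only while the writing goroutine's stack is grey.
func writePointer(slot **object, ptr *object, stackIsGrey bool) {
	shade(*slot)
	if stackIsGrey {
		shade(ptr)
	}
	*slot = ptr
}

func main() {
	oldObj := &object{name: "old"}
	newObj := &object{name: "new"}
	holder := &object{name: "holder", color: black, slot: oldObj}

	writePointer(&holder.slot, newObj, true)
	for _, o := range []*object{holder, oldObj, newObj} {
		fmt.Println(o.name, o.color)
	}
}
```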

Thanks a lot.


