I think the allocation triggers the GC, but it is stuck in start because it is 
waiting for all other Go routines to come to the safe point. I would grab a 
trace (but I’m not sure you can capture the trace if things are “stuck”) and 
review that. Or maybe use an external sampler like ‘Instruments’ on osx, or 
similar. It certainly seems like some routine is spinning - prevent GC from 
happening and pausing everything else.

> On Dec 12, 2018, at 5:20 PM, Eric Hamilton <e...@kasten.io> wrote:
> 
> Thanks for the various possibilities.  It’s helpful to know things to look 
> out for.
> 
> I don’t think there’s a tight polling loop in the code— at least I haven’t 
> found it yet.
> 
> We’re using some open source packages, notably Kanister 
> (https://github.com/kanisterio <https://github.com/kanisterio>) which Kasten 
> contributes, Stow (https://github.com/graymeta/stow 
> <https://github.com/graymeta/stow>) and in this case some Google cloud APIs 
> through Stow.
> 
> On more than one occasion it has looked like the application goroutine gets 
> stuck allocating an 8MB byte slice.  (8MB is the default chunk size in one of 
> the Google cloud storage APIs and Stow lets it default.)
> 
> I don’t know whether an 8MB malloc is a problem per se or perhaps that a few 
> of those trigger GC and take the shouldhelpgc path.
> 
> It’s most likely that the problem is in our code somewhere or settings that 
> match our usage pattern of growing and shrinking the live heap.  I have 
> wondered whether we might be hitting a corner case in the runtime (missed 
> wakeup?).  Given the wide usage of Go across a variety of different types of 
> programs, I doubt it, but something sure looks funny with long StopTheWorld 
> times.
> 
> Any additional thought on diagnosis or experiments?  If I can’t figure out 
> exactly what’s happening I’ll see about a workaround such as explicit calls 
> to runtme.GC() at safe spots or periodically, etc.
> 
> 
> Eric
> 
> 
> p.s. Here’s the top of a stack trace triggered with a panic in SIGTERM 
> handler.  It shows the make([]byte, 0, 8 * 1024 *1024) triggering gcStart().
> 
> 
> goroutine 15 [runnable]:
> runtime.gcStart(0x0, 0x1, 0x0, 0x0)
>       /usr/local/go/src/runtime/mgc.go:1236 +0x583 fp=0xc000cd7540 
> sp=0xc000cd7538 pc=0x419e63
> runtime.mallocgc(0x800000, 0x1c04f80, 0x1, 0x600)
>       /usr/local/go/src/runtime/malloc.go:996 +0x40e fp=0xc000cd75e0 
> sp=0xc000cd7540 pc=0x40bb0e
> runtime.makeslice(0x1c04f80, 0x0, 0x800000, 0xc00022ad50, 0xc000cd7678, 
> 0x15cfcdc)
>       /usr/local/go/src/runtime/slice.go:70 +0x77 fp=0xc000cd7610 
> sp=0xc000cd75e0 pc=0x4414f7
> vendor/google.golang.org/api/gensupport.NewMediaBuffer 
> <http://google.golang.org/api/gensupport.NewMediaBuffer>(...)
>       /repo/go/src/vendor/google.golang.org/api/gensupport/buffer.go:26 
> <http://google.golang.org/api/gensupport/buffer.go:26>
> vendor/google.golang.org/api/gensupport.PrepareUpload 
> <http://google.golang.org/api/gensupport.PrepareUpload>(0x23aa900, 
> 0xc000039b30, 0x800000, 0x0, 0x23aa900, 0xc000039b30)
>       /repo/go/src/vendor/google.golang.org/api/gensupport/media.go:188 
> <http://google.golang.org/api/gensupport/media.go:188> +0x50 fp=0xc000cd7688 
> sp=0xc000cd7610 pc=0x15d03e0
> vendor/google.golang.org/api/storage/v1 
> <http://google.golang.org/api/storage/v1>.(*ObjectsInsertCall).Media(0xc000b2e780,
>  0x23a5a80, 0xc00022ad50, 0x0, 0x0, 0x0, 0xe)
>       
> /repo/go/src/vendor/google.golang.org/api/storage/v1/storage-gen.go:8622 
> <http://google.golang.org/api/storage/v1/storage-gen.go:8622> +0xb8 
> fp=0xc000cd7718 sp=0xc000cd7688 pc=0x195c188
> vendor/github.com/graymeta/stow/google 
> <http://github.com/graymeta/stow/google>.(*Container).Put(0xc000de4be0, 
> 0xc00058a871, 0x86, 0x23a5a80, 0xc00022ad50, 0x45, 0xc00022ad80, 0x50, 
> 0xc000cd7840, 0x40c278, ...)
>       /repo/go/src/vendor/github.com/graymeta/stow/google/container.go:144 
> <http://github.com/graymeta/stow/google/container.go:144> +0x186 
> fp=0xc000cd77d0 sp=0xc000cd7718 pc=0x1969d36
> github.com/kanisterio/kanister/pkg/objectstore 
> <http://github.com/kanisterio/kanister/pkg/objectstore>.(*directory).Put(0xc000de5b20,
>  0x23c9740, 0xc00090b200, 0x20fdcf9, 0xe, 0x23a5a80, 0xc00022ad50, 0x45, 0x0, 
> 0xc000171ad0, ...)
>       
> /repo/go/src/github.com/kanisterio/kanister/pkg/objectstore/directory.go:206 
> <http://github.com/kanisterio/kanister/pkg/objectstore/directory.go:206> 
> +0xe9 fp=0xc000cd7850 sp=0xc000cd77d0 pc=0x1973769
> github.com/kanisterio/kanister/pkg/objectstore 
> <http://github.com/kanisterio/kanister/pkg/objectstore>.(*directory).PutBytes(0xc000de5b20,
>  0x23c9740, 0xc00090b200, 0x20fdcf9, 0xe, 0xc000e4e410, 0x45, 0x50, 0x0, 
> 0x9d895a, ...)
>       
> /repo/go/src/github.com/kanisterio/kanister/pkg/objectstore/directory.go:212 
> <http://github.com/kanisterio/kanister/pkg/objectstore/directory.go:212> 
> +0xd5 fp=0xc000cd78b8 sp=0xc000cd7850 pc=0x19739a5
> 
> 
> 
> 
>> On Dec 11, 2018, at 8:21 PM, robert engels <reng...@ix.netcom.com 
>> <mailto:reng...@ix.netcom.com>> wrote:
>> 
>> You might want to review this https://github.com/golang/go/issues/10958 
>> <https://github.com/golang/go/issues/10958>
>> 
>>> On Dec 11, 2018, at 10:16 PM, robert engels <reng...@ix.netcom.com 
>>> <mailto:reng...@ix.netcom.com>> wrote:
>>> 
>>> Reviewing the code, the 5s of cpu time is technically the stop-the-world 
>>> (STW) sweep termination
>>> 
>>> So, I think the cause of your problem is that you have a tight / “infinite” 
>>> loop that is not calling runtime.Gosched(), so it is taking a very long 
>>> time for the sweep termination to complete.
>>> 
>>> 
>>>> On Dec 11, 2018, at 9:43 PM, robert engels <reng...@ix.netcom.com 
>>>> <mailto:reng...@ix.netcom.com>> wrote:
>>>> 
>>>> Well, your pause is clearly related to the GC - the first phase, the mark, 
>>>> is 5s in #17. Are you certain you don’t have an incorrect highly recursive 
>>>> loop that is causing the stack marking to take a really long time… ?
>>>> 
>>>> 
>>>>> On Dec 11, 2018, at 8:45 PM, Eric Hamilton <e...@kasten.io 
>>>>> <mailto:e...@kasten.io>> wrote:
>>>>> 
>>>>> Of course.  (I forgot about that option and I'd collected those traces at 
>>>>> a time when I thought I'd ruled out GC-- probably misread MemStats 
>>>>> values).
>>>>> 
>>>>> Here's the gctrace output for a repro with a 5.6 second delay ending with 
>>>>> finish of gc17 and a 30+ second delay ending in SIGTERM (which coincides 
>>>>> with completion of gc18):
>>>>> 
>>>>> gc 1 @0.022s 1%: 0.94+10+0.19 ms clock, 0.94+0.65/0.60/5.3+0.19 ms cpu, 
>>>>> 4->4->1 MB, 5 MB goal, 4 P
>>>>> gc 2 @0.048s 3%: 0.007+5.9+0.17 ms clock, 0.007+0.24/5.7/1.0+0.17 ms cpu, 
>>>>> 4->4->1 MB, 5 MB goal, 4 P
>>>>> gc 3 @0.062s 3%: 0.004+3.7+0.18 ms clock, 0.004+0.19/0.78/2.8+0.18 ms 
>>>>> cpu, 4->4->1 MB, 5 MB goal, 4 P
>>>>> gc 4 @0.075s 13%: 0.003+3.4+69 ms clock, 0.003+0.069/0.99/2.3+69 ms cpu, 
>>>>> 4->4->2 MB, 5 MB goal, 4 P
>>>>> gc 5 @0.154s 12%: 0.007+7.4+0.032 ms clock, 0.007+0.15/0/5.1+0.032 ms 
>>>>> cpu, 4->5->3 MB, 5 MB goal, 4 P
>>>>> gc 6 @0.234s 10%: 0.004+34+0.025 ms clock, 0.004+0/33/6.8+0.025 ms cpu, 
>>>>> 6->7->5 MB, 7 MB goal, 4 P
>>>>> gc 7 @0.296s 8%: 0.005+28+0.031 ms clock, 0.005+0/0.99/50+0.031 ms cpu, 
>>>>> 10->10->5 MB, 11 MB goal, 4 P
>>>>> gc 8 @0.416s 6%: 0.007+49+1.1 ms clock, 0.007+0.13/4.5/32+1.1 ms cpu, 
>>>>> 11->11->6 MB, 12 MB goal, 4 P
>>>>> gc 9 @0.591s 5%: 0.006+23+0.032 ms clock, 0.006+0/6.5/31+0.032 ms cpu, 
>>>>> 12->12->7 MB, 13 MB goal, 4 P
>>>>> gc 10 @0.640s 5%: 0.006+10+0.032 ms clock, 0.006+0/3.6/12+0.032 ms cpu, 
>>>>> 13->13->8 MB, 14 MB goal, 4 P
>>>>> gc 11 @120.657s 0%: 0.012+11+0.030 ms clock, 0.012+0/1.7/23+0.030 ms cpu, 
>>>>> 13->13->9 MB, 16 MB goal, 4 P
>>>>> gc 12 @240.674s 0%: 0.011+12+0.037 ms clock, 0.011+0/6.1/27+0.037 ms cpu, 
>>>>> 10->10->9 MB, 18 MB goal, 4 P
>>>>> gc 13 @360.691s 0%: 0.011+10+0.022 ms clock, 0.011+0/2.7/25+0.022 ms cpu, 
>>>>> 10->10->9 MB, 18 MB goal, 4 P
>>>>> gc 14 @480.711s 0%: 0.018+11+0.021 ms clock, 0.018+0/4.1/27+0.021 ms cpu, 
>>>>> 10->10->9 MB, 18 MB goal, 4 P
>>>>> gc 15 @600.727s 0%: 0.016+14+0.030 ms clock, 0.016+0/7.0/30+0.030 ms cpu, 
>>>>> 11->11->9 MB, 18 MB goal, 4 P
>>>>> gc 16 @720.746s 0%: 0.016+13+0.023 ms clock, 0.016+0/5.7/27+0.023 ms cpu, 
>>>>> 10->10->9 MB, 18 MB goal, 4 P
>>>>> gc 17 @816.508s 0%: 5560+26+0.067 ms clock, 5560+7.3/26/22+0.067 ms cpu, 
>>>>> 18->19->11 MB, 19 MB goal, 4 P
>>>>> gc 18 @824.133s 1%: 40566+16+0.035 ms clock, 40566+6.8/15/10+0.035 ms 
>>>>> cpu, 23->24->18 MB, 24 MB goal, 4 P
>>>>> 
>>>>> I've attached a file that also includes the schedtrace=1000, health check 
>>>>> log messages, key app logs, and the runtime.MemStats collected at the end 
>>>>> of the same run.
>>>>> 
>>>>> Thanks again for any insights,
>>>>> Eric
>>>>> 
>>>>> 
>>>>> On Tuesday, December 11, 2018 at 4:32:33 PM UTC-8, robert engels wrote:
>>>>> I think it would be more helpful if you used gctrace=1 to report on the 
>>>>> GC activity.
>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google Groups 
>>>>> "golang-nuts" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>>>> email to golang-nuts+unsubscr...@googlegroups.com 
>>>>> <mailto:golang-nuts+unsubscr...@googlegroups.com>.
>>>>> For more options, visit https://groups.google.com/d/optout 
>>>>> <https://groups.google.com/d/optout>.
>>>>> <trace.medium.txt>
>>>> 
>>>> 
>>>> -- 
>>>> You received this message because you are subscribed to the Google Groups 
>>>> "golang-nuts" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>>> email to golang-nuts+unsubscr...@googlegroups.com 
>>>> <mailto:golang-nuts+unsubscr...@googlegroups.com>.
>>>> For more options, visit https://groups.google.com/d/optout 
>>>> <https://groups.google.com/d/optout>.
>>> 
>>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "golang-nuts" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to golang-nuts+unsubscr...@googlegroups.com 
>>> <mailto:golang-nuts+unsubscr...@googlegroups.com>.
>>> For more options, visit https://groups.google.com/d/optout 
>>> <https://groups.google.com/d/optout>.
>> 
> 

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to