Re: [go-nuts] Memory limits

2018-11-21 Thread rlh
If this is important and wasn't fixed in 1.11 or at tip then please file a 
bug report with a reproducer at https://github.com/golang/go/issues. An 
issue number will result in the Go team investigating.

On Friday, November 16, 2018 at 1:12:58 PM UTC-5, Robert Engels wrote:
>
> This article 
> https://syslog.ravelin.com/further-dangers-of-large-heaps-in-go-7a267b57d487 
> would 
> imply it is far less than that. 
>
> I responded that something didn’t feel right, but maybe Go’s lack of 
> generational GC is a real stumbling block here. 
>
> On Nov 16, 2018, at 11:58 AM, Matthew Zimmerman  > wrote:
>
> What are the current memory limits? I know it used to be 512GB. Is there 
> a difference between the platforms? I couldn't find documentation on this.
>
> Thanks!
>



[go-nuts] Re: Persistent memory support for Go

2019-04-06 Thread rlh
Out of curiosity what HW/OS is this being developed on? I need new HW and 
might as well get the same since it will make playing around with this 
smoother.

On Wednesday, April 3, 2019 at 6:35:13 PM UTC-4, Jerrin Shaji George wrote:
>
> Hi,
>
>  
>
> I am part of a small team at VMware working on projects related to 
> persistent
>
> memory (others in CC). We have recently been working on adding persistent 
> memory
>
> support to the Go programming language, and I wanted to spread the word 
> about
>
> a couple of these projects.
>
>  
>
> 1) Go-pmem-transaction
>
> The go-pmem-transaction project introduces a new programming model for 
>
> developing applications in Go for persistent memory. It consists of two 
> packages
>
> - pmem and transaction. 
>
>  
>
> The pmem package provides methods to initialize persistent memory and an
>
> interface to set and retrieve objects in persistent memory. The transaction
>
> package provides undo and redo transaction logging APIs to support
>
> crash-consistent updates to persistent memory data. 
>
>  
>
> Project page - https://github.com/vmware/go-pmem-transaction
>
>  
>
> 2) Go-pmem
>
> The Go-pmem project adds native persistent memory support to Go. 
>
> Some of the features of the persistent memory support added to Go are:
>
> * Support for persistent memory allocations
>
> * Garbage collector now collects objects from persistent 
> heap and volatile
>
> heap
>
> * Runtime automatically swizzles pointers if the memory 
> mapping address
>
> changes on an application restart
>
> * The persistent memory heap is dynamically sized and 
> supports automatic
>
> heap growth depending on memory demand
>
>  
>
> Project page - https://github.com/jerrinsg/go-pmem
>
>  
>
> The project pages contain links to further documentation. We welcome the
>
> community to try out these projects and send any feedback our way!
>
>  
>
> Also see the blog post at 
> https://blogs.vmware.com/opensource/2019/04/03/persistent-memory-with-go/
>
>  
>
> Thanks,
>
> Jerrin
>



[go-nuts] Re: How Go GC perform with 128GB+ heap

2016-08-01 Thread rlh
I think the high bit here is that the Go community is very aggressive about 
GC latency. Go has large
users with large heaps, lots of goroutines, and SLOs similar to those being 
discussed here. When 
they run into GC related latency problems the Go team works with them to 
root cause and address
the problem. It has been a very successful collaboration. 

Stepping back, work on large heaps is being motivated by the fact that RAM 
hardware, due to its
thermal characteristics, is still doubling byte/$ every 2 years or so. As 
heaps grow GC latency needs
to be independent of heap size if Go is going to continue to scale over the 
next decade. The Go 
team is well aware of this, is motivated by it, and continues to design a 
GC to address this trend. 

On Sunday, July 31, 2016 at 9:26:13 AM UTC-4, almeida@gmail.com wrote:
>
> I'm starting a proof-of-concept project for the company I work for. The 
> project is an HTTP proxy with a smart layer of cache (Varnish, Nginx and 
> others don't work because we have business rules on cache invalidation) for 
> a very big microservice architecture (300+ services).
>
> We have 2x128GB machines available today for this project. 
> I don't have any doubt that Go has amazing performance, used in other 
> projects, and they are rock solid, very fast and consuming very little 
> memory.
> But I'm afraid to use Go for this project because of the GC. I'm planning 
> to use all the available memory for cache. Won't all this memory on the heap 
> be a problem?
>
> It's a new area to me, storing tons of GB in a GC'd language.
> What are my options? Use a []byte and/or mmap to stay out of the GC?
> That means lots and lots of code to reimplement these data structures on top 
> of slices just to avoid the GC, not counting all the encoding/decoding to 
> get/set the values.
>
> Stick with the raw slices?
> I haven't used Cgo before, but is it a viable option? 
> Or should I go 100% off-heap with something like Rust or C?
>
> I hope to add as little overhead as possible.
>
>
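
A minimal sketch of the "big []byte plus offsets" cache the poster describes 
(my illustration, not anything from the thread; offheapCache, put, and get 
are made-up names). Because neither the index map's keys and values nor the 
data slice contain pointers, the GC has essentially nothing to scan here:

    package main

    // offheapCache is a sketch of the idea above: every value's bytes live in
    // one large slice and the index records only {offset, length} pairs, so
    // the GC sees a few pointer-free objects instead of millions of small ones.
    type offheapCache struct {
        data  []byte            // one big append-only arena of value bytes
        index map[uint64][2]int // key -> {offset, length} into data
    }

    func newOffheapCache() *offheapCache {
        return &offheapCache{index: make(map[uint64][2]int)}
    }

    func (c *offheapCache) put(key uint64, val []byte) {
        off := len(c.data)
        c.data = append(c.data, val...)
        c.index[key] = [2]int{off, len(val)}
    }

    func (c *offheapCache) get(key uint64) ([]byte, bool) {
        loc, ok := c.index[key]
        if !ok {
            return nil, false
        }
        return c.data[loc[0] : loc[0]+loc[1]], true
    }

    func main() {
        c := newOffheapCache()
        c.put(42, []byte("value bytes"))
        if v, ok := c.get(42); ok {
            println(string(v))
        }
        // Note: deletes and overwrites leave dead bytes behind; a real version
        // needs its own compaction, which is exactly the extra code the poster
        // is worried about having to write.
    }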



[go-nuts] Re: GC direct bitmap allocation and Intel Cache Allocation Technology

2016-08-01 Thread rlh
Thanks for pointing this out. While it isn't clear how this is applicable 
to the sweep-free alloc work, it does seem relevant to the mark worker's heap 
tracing, which can charitably be described as a cache-smashing machine. The 
mark worker loads an object from a random location in memory, scans it, drops 
pointers to other objects needing to be scanned into a buffer, and then does 
not visit the object's cache lines again. The loads evict cache lines being 
used by goroutines and, while a victim cache may help ameliorate the problem, 
any cost we can move from a goroutine to the GC worker is a win.

Segregating the GC's cache lines from those of the goroutines should result 
in the goroutines' cache lines not being evicted, with the GC instead 
evicting cache lines it recently used itself. That seems like a good 
replacement policy, and it certainly seems like a promising avenue. 
It's a build and measure. (and build and measure)


 

On Sunday, July 31, 2016 at 5:47:44 AM UTC-4, EduRam wrote:
>
> Hi!
>
> I am catching up on some summer reading ... and recently came across this 
> article about Intel Cache Allocation Technology (CAT) on Haswell processors
> (http://lwn.net/Articles/694800/)
>
>
> It appears it will be possible to allocate partial CPU L3 cache to 
> processes.
>
>
> I remember reading about a new experiment on Go runtime GC allocation, 
> that would use more "cache friendly" bitmap.
> (
> https://github.com/golang/proposal/blob/master/design/12800-sweep-free-alloc.md
> )
>
>
> Just out of curiosity ... could this CAT mechanism help further the GC 
> mechanism?
> I must confess this is outside my knowledge domain. Just reading for the fun 
> of it.
>
> Thanks and great holidays,
>
> Edu
>
>
>



Re: [go-nuts] Excessive garbage collection

2016-10-18 Thread rlh
From the trace (4->4->0) it looks like the app is allocating about 4MB 
every 10ms. The app also has little (0 rounded) reachable data, sometimes 
called heap ballast. Since there is little ballast the GC is attempting to 
keep the heap from growing beyond 5MB. The GC is using about 2% of the CPU 
resources to do its job. 

All of this seems perfectly reasonable from the GC's perspective.
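
To relate that reading of the gctrace line to a running program, here is a 
minimal sketch (mine, not from the thread) that prints the same quantities 
via runtime.ReadMemStats: the current heap, the goal the pacer is working 
toward, and the GC's share of CPU.

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        // HeapAlloc is the currently allocated heap, NextGC is the heap size at
        // which the next collection triggers (derived from the live set and GOGC),
        // and GCCPUFraction is the GC's share of CPU since the program started.
        fmt.Printf("heap: %d MB, next GC goal: %d MB, GC CPU: %.2f%%\n",
            m.HeapAlloc>>20, m.NextGC>>20, m.GCCPUFraction*100)
    }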


On Tuesday, October 18, 2016 at 12:32:47 AM UTC-4, Jiří Šimša wrote:
>
> go version go1.7.1 darwin/amd64
>
> --
> Jiří Šimša
>
> On Mon, Oct 17, 2016 at 8:02 PM, Ian Lance Taylor  > wrote:
>
>> On Mon, Oct 17, 2016 at 6:20 PM,  > 
>> wrote:
>> >
>> > The backend of my web server (written in Go) have recently started 
>> consuming
>> > large amounts of CPU. AFAICT, the CPU seems to be consumed by the 
>> garbage
>> > collector and I would appreciate any information that would help me 
>> track
>> > down the root cause.
>>
>> What version of Go?  What platform?
>>
>> Ian
>>
>
>



[go-nuts] Re: Why does a "concurrent" Go GC phase appear to be stop-the-world?

2016-10-19 Thread rlh
This is likely issue 23540.


On Wednesday, October 19, 2016 at 8:32:18 AM UTC-4, Will Sewell wrote:
>
> Hey, I previously posted this on StackOverflow, but I was told this 
> mailing list would be a better forum for discussion.
>
> I am attempting to benchmark the maximum STW GC pause time for different 
> numbers of heap objects. To do this I have written a simple benchmark that 
> pushes and pops messages from a map:
>
> package main
>
> type message []byte
>
> type channel map[int]message
>
> const (
> windowSize = 20
> msgCount = 100
> )
>
> func mkMessage(n int) message {
> m := make(message, 1024)
> for i := range m {
> m[i] = byte(n)
> }
> return m
> }
>
>
> func pushMsg(c *channel, highID int) {
> lowID := highID - windowSize
> m := mkMessage(highID)
> (*c)[highID] = m
> if lowID >= 0 {
> delete(*c, lowID)
> }
> }
>
>
> func main() {
> c := make(channel)
> for i := 0; i < msgCount; i++ {
> pushMsg(&c, i)
> }
> }
>
> I ran this with GODEBUG=gctrace=1, and 
> on my machine the output is:
>
> gc 1 @0.004s 2%: 0.007+0.44+0.032 ms clock, 0.029+0.22/0.20/0.28+0.12 ms 
> cpu, 4->4->3 MB, 5 MB goal, 4 P
> gc 2 @0.009s 3%: 0.007+0.64+0.042 ms clock, 0.030+0/0.53/0.18+0.17 ms cpu, 
> 7->7->7 MB, 8 MB goal, 4 P
> gc 3 @0.019s 1%: 0.007+0.99+0.037 ms clock, 0.031+0/0.13/1.0+0.14 ms cpu, 
> 13->13->13 MB, 14 MB goal, 4 P
> gc 4 @0.044s 2%: 0.009+2.3+0.032 ms clock, 0.039+0/2.3/0.30+0.13 ms cpu, 
> 25->25->25 MB, 26 MB goal, 4 P
> gc 5 @0.081s 1%: 0.009+9.2+0.082 ms clock, 0.039+0/0.32/9.7+0.32 ms cpu, 
> 49->49->48 MB, 50 MB goal, 4 P
> gc 6 @0.162s 0%: 0.020+10+0.078 ms clock, 0.082+0/0.28/11+0.31 ms cpu, 93
> ->93->91 MB, 96 MB goal, 4 P
> gc 7 @0.289s 0%: 0.020+27+0.092 ms clock, 0.080+0/0.95/28+0.37 ms cpu, 178
> ->178->173 MB, 182 MB goal, 4 P
> gc 8 @0.557s 1%: 0.023+38+0.086 ms clock, 0.092+0/38/10+0.34 ms cpu, 337->
> 339->209 MB, 346 MB goal, 4 P
> gc 9 @0.844s 1%: 0.008+40+0.077 ms clock, 0.032+0/5.6/46+0.30 ms cpu, 407
> ->409->211 MB, 418 MB goal, 4 P
> gc 10 @1.100s 1%: 0.009+43+0.047 ms clock, 0.036+0/6.6/50+0.19 ms cpu, 411
> ->414->212 MB, 422 MB goal, 4 P
> gc 11 @1.378s 1%: 0.008+45+0.093 ms clock, 0.033+0/6.5/52+0.37 ms cpu, 414
> ->417->213 MB, 425 MB goal, 4 P
>
> My version of Go is:
>
> $ go version
> go version go1.7.1 darwin/amd64
>
> From the above results, the longest wall clock STW pause time is 0.093ms. 
> Great!
>
> However as a sanity check I also manually timed how long it took to create 
> a new message by wrapping mkMessage with
>
> start := time.Now()
> m := mkMessage(highID)
> elapsed := time.Since(start)
>
> and printed the slowest `elapsed` time. The time I get for this was 
> 38.573036ms!
>
> I was instantly suspicious because this correlated strongly with the wall 
> clock times in the supposedly concurrent mark/scan phase, and in particular 
> with "idle GC time".
>
> *My question is: why does this supposedly concurrent phase of the GC 
> appear to block the mutator?*
>
> If I force the GC to run at regular intervals, my manually calculated 
> pause times go way down to <1ms, so it appears to be hitting some kind of 
> limit of non-live heap objects. If so, I'm not sure what that limit is, and 
> why it would cause a concurrent phase of the GC to appear to block the 
> mutator.
>
> Thanks!
>
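
One extra cross-check worth mentioning (my suggestion, not something proposed 
in the thread): rather than timing mkMessage, read the runtime's own record of 
stop-the-world pauses, so assist or idle-worker time is not mistaken for STW 
time. A hedged sketch:

    package main

    import (
        "fmt"
        "runtime/debug"
    )

    func main() {
        // ... run the map push/pop workload here first ...
        var s debug.GCStats
        debug.ReadGCStats(&s)
        // s.Pause holds recent stop-the-world pause durations, most recent first.
        if len(s.Pause) > 0 {
            fmt.Printf("GCs: %d, most recent STW pause: %v, total STW: %v\n",
                s.NumGC, s.Pause[0], s.PauseTotal)
        }
    }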



[go-nuts] Re: Is ROC part of 1.8?

2016-10-28 Thread rlh
We are still actively working on ROC. Unfortunately it will not be part of 
1.8. 

On Thursday, October 27, 2016 at 1:17:44 PM UTC-4, Chandra Sekar S wrote:
>
> Is the request-oriented GC slated to be included in the 1.8 release?
>
> --
> Chandra Sekar.S
>



[go-nuts] Re: 10x latency spikes during GC alloc assist phase

2017-07-26 Thread rlh
I would add to issue 14812. The report should include the environment 
variables, HW, and RAM, and should indicate whether any environment variables 
are not set to their defaults, such as GOGC (SetGCPercent).

On Monday, July 24, 2017 at 8:44:10 PM UTC-4, stb...@laserlike.com wrote:
>
> Hi,
>
> We are experiencing a problem that I believe may be related to issue 14812, 
> but I wanted to ask here before adding to that case or filing a new issue. 
> Of course, we’d also greatly appreciate any advice about how to make our 
> program performant.
>
> Here is what we observe: at Laserlike one of our core user-facing services 
> (the “leaf”) typically responds to a particular rpc in <400ms.  During GC 
> we see spikes in latency to >5s on some simple requests.  The stop the 
> world periods are short, so the GC spikes appear to happen at other times.
>
> We have been unable to replicate this in a simple program, but we did run 
> our code in a test mode that repros it.  In our test environment the server 
> loads ~10 GB of persistent data (never deleted so doesn’t really need to be 
> GCed), and we ask for 8 processors.  We are running go version 1.8.3 on 
> kubernetes on a GCP machine of type n1-highmem-64. To create the problem we 
> send the server a single request with >500 tasks.
>
>
> This Google Drive folder has leaf-logs-full.redacted.txt as well as other 
> performance tooling files. A snippet from that log shows normal query 
> responses and timing:
>
> I0719 22:50:22.467367 leaf.go:363] Worker #5 done search for '[redacted]', 
> took 0.013 seconds
>
> I0719 22:50:22.467406 leaf.go:225] Worker #5 starting search for 
> '[redacted]'
>
> I0719 22:50:22.467486 leaf.go:363] Worker #6 done search for '[redacted]', 
> took 0.001 seconds
>
> I0719 22:50:22.467520 leaf.go:225] Worker #6 starting search for 
> '[redacted]'
>
> I0719 22:50:22.468050 leaf.go:363] Worker #9 done search for '[redacted]', 
> took 0.005 seconds
>
>
> We have observed that if a GC cycle happens to start while serving traffic 
> (which is often) AND there is a large amount of time spent in assist, then 
> our serving latency skyrockets by 10x.  In the log the slowdown commences 
> roughly when pacer assist starts at I0719 22:50:31.079283 and then reverts 
> to normal latencies shortly after the gc cycle completes at I0719 
> 22:50:36.806085.
>
> Below I copy parts of the log where we see latencies of up to 729ms on 
> tasks.  I also bold the line that shows 32929ms spent on alloc gc assist.
>
> We have captured an attached cpu profile during this time which seems to 
> confirm a large amount of time spent in runtime.gcAssistAlloc.func1.
>
>
> Pardon our ignorance about GC in golang, but our hypothesis about what may 
> be going wrong is that our large in-memory data structures are causing gc 
> to often go into assist mode, and that for reasons we don’t understand 
> malloc becomes expensive in that mode.  Since we also create ~100k new data 
> objects when processing user requests, we are guessing those allocs become 
> very slow.  Another piece of evidence for this hypothesis is that we have 
> another (prototype) implementation of this kind of service that makes more 
> use of object pools and doesn’t seem to have as much of a slowdown during GC.
>
> Note on large in-memory data-structures:
>
> The principal data structures can be thought of as:
>
> Map[uint64][]byte (about 10M map entries, the slice lengths between 5K to 
> 50K) (around ~10G total memory usage) 
>
> Map[uint64][]uint64 (about 50M map entries, the slice lengths vary between 
> 10 and 100K, in a zipfian distribution, about 3G total memory usage)
>
> These data structures mostly stay as is over the life of the program.
>
> We are trying to figure out how to solve this so would appreciate any 
> advice. An engineer on our team wrote up the following ideas, none of which 
> are that great:
>
>    1. Figure out a simple way to prevent our goroutines slowing down during 
>    GC. I had some hopes LockOSThread() could be made to work, but it didn't 
>    seem to help in my experiments. I'm not ruling this solution out entirely, 
>    but if it's the write barriers that are the main problem, I don't have 
>    much hope.
>    2. Run at least 2 replicas of all our servers. Manage their GC cycles 
>    ourselves, synchronized so that at most one replica is in GC at any given 
>    time. The clients should either send all requests to both replicas (and 
>    cancel when one replies), or use some more complicated Kubernetes and 
>    client logic so a GCing replica is never sent r
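
The object-pool approach mentioned above (the prototype that "makes more use 
of object pools") is commonly done with sync.Pool. The sketch below is my 
illustration of that general technique, not the poster's code; bufPool and 
handleTask are made-up names:

    package main

    import "sync"

    // bufPool reuses request-scoped scratch buffers so each task allocates less,
    // which lowers the allocation rate that drives GC assist.
    var bufPool = sync.Pool{
        New: func() interface{} { return make([]byte, 0, 64<<10) },
    }

    func handleTask(work func(scratch []byte)) {
        scratch := bufPool.Get().([]byte)
        work(scratch[:0])    // hand the task an empty buffer with reusable capacity
        bufPool.Put(scratch) // return it for the next task
    }

    func main() {
        handleTask(func(scratch []byte) {
            scratch = append(scratch, "per-request result"...)
            _ = scratch
        })
    }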

Re: [go-nuts] Re: GC SW times on Heroku (Beta metrics)

2017-12-05 Thread rlh
The wall clock is the first set of numbers, the second set is CPU. So 8P 
running for 8ms wall clock will result in 64ms CPU. The word "wall" was 
dropped to keep the line short.

There will be a beta out in the proverbial next few days that could help 
reduce even these STW times. The original post talked about 20 second and 
400 and 900 ms pauses. From what I'm seeing here it is hard to attribute 
them to GC STW pauses.

Also the GC is taking up (a rounded) 0% of the CPU, which is pretty good 
(insert fancy new emoji). It is also doing it with a budget of 10 or 11 
MBytes on a machine that likely has 8 GB of RAM. To further test whether 
this is a GC issue or not, try increasing GOGC until the MB goal on the 
gctrace line is 10x or 100x larger. This will reduce GC frequency by 10x or 
100x, and if your tail latency is a GC problem the 99th percentile latency 
numbers will become 99.9th or 99.99th percentile numbers.
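
A minimal sketch of that GOGC experiment (my own illustration, assuming the 
app can set this at startup rather than via the environment variable):

    package main

    import "runtime/debug"

    func main() {
        // Equivalent to running the process with GOGC=1000: each cycle lets the
        // heap grow to roughly 11x the live set instead of the default ~2x, so
        // collections run about 10x less often (at the cost of more memory).
        debug.SetGCPercent(1000)

        // ... start the server here and compare the 99th vs 99.9th percentile
        // latencies against a run with the default GOGC=100 ...
    }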

On Tuesday, December 5, 2017 at 2:39:53 AM UTC-5, Henrik Johansson wrote:
>
> I am watching with childlike fascination...
> This is interesting perhaps:
>
> gc 130 @2834.158s 0%: 0.056+3.4+2.9 ms clock, 0.45+2.8/5.6/0+23 ms cpu, 
> 8->8->4 MB, 9 MB goal, 8 P 
> gc 131 @2834.178s 0%: 0.023+7.3+0.12 ms clock, 0.18+1.2/5.4/9.2+1.0 ms 
> cpu, 9->9->5 MB, 10 MB goal, 8 P 
>
> ---> gc 132 @2836.882s 0%: 3.5+34+8.0 ms clock, 28+1.6/3.8/27+64 ms cpu, 
> 10->11->4 MB, 11 MB goal, 8 P 
>
> gc 133 @2836.961s 0%: 0.022+14+1.0 ms clock, 0.18+2.1/12/0+8.4 ms cpu, 
> 8->9->5 MB, 9 MB goal, 8 P 
> gc 134 @2837.010s 0%: 7.0+18+0.16 ms clock, 56+14/21/1.6+1.2 ms cpu, 
> 9->10->5 MB, 10 MB goal, 8 P 
>
> 28 + 64 ms SW (if I understand this correctly) to collect what 6-7 MB?
>
>
>
> On Tue, 5 Dec 2017 at 08:25, Dave Cheney wrote:
>
>> Oh yeah, I forgot someone added that a while back. That should work.
>>
>> On Tue, Dec 5, 2017 at 6:23 PM, Henrik Johansson > > wrote:
>> > So it has to run the program? I thought I saw "logfile" scenario in the
>> > examples?
>> >
>> > GODEBUG=gctrace=1 godoc -index -http=:6060 2> stderr.log
>> > cat stderr.log | gcvis
>> >
>> > I have shuffled the Heroku logs into Papertrail so I should be able to
>> > extract the log lines from there.
>> >
>> >
>> > On Tue, 5 Dec 2017 at 08:10, Dave Cheney wrote:
>> >>
>> >> Probably not for your scenario, gcviz assumes it can run your program
>> >> as a child.
>> >>
>> >> On Tue, Dec 5, 2017 at 6:07 PM, Henrik Johansson > >
>> >> wrote:
>> >> > I found https://github.com/davecheney/gcvis from +Dave Cheney is it 
>> a
>> >> > good
>> >> > choice for inspecting the gc logs?
>> >> >
>> >> > On Tue, 5 Dec 2017 at 07:57, Henrik Johansson wrote:
>> >> >>
>> >> >> I have just added the gc tracing and it looks like this more or less
>> >> >> all
>> >> >> the time:
>> >> >>
>> >> >> gc 78 @253.095s 0%: 0.032+3.3+0.46 ms clock, 0.26+0.24/2.6/2.4+3.6 ms 
>> >> >> cpu, 11->12->4 MB, 12 MB goal, 8 P
>> >> >> gc 79 @253.109s 0%: 0.021+2.1+0.17 ms clock, 0.16+0.19/3.6/1.2+1.3 ms 
>> >> >> cpu, 9->9->4 MB, 10 MB goal, 8 P
>> >> >> gc 80 @253.120s 0%: 0.022+2.8+2.2 ms clock, 0.17+0.27/4.8/0.006+18 ms 
>> >> >> cpu, 8->8->4 MB, 9 MB goal, 8 P
>> >> >> gc 81 @253.138s 0%: 0.019+2.3+0.10 ms clock, 0.15+0.73/3.9/3.1+0.81 ms 
>> >> >> cpu, 9->9->5 MB, 10 MB goal, 8 P
>> >> >>
>> >> >> Heroku already reports a SW of 343 ms but I can't find it by manual
>> >> >> inspection. I will download the logs later today and try to generate
>> >> >> realistic load.
>> >> >> What is the overhead of running like this, aside from the obvious 
>> extra
>> >> >> logging?
>> >> >> Are there any automatic tools to analyze these logs?
>> >> >>
>> >> >>> On Sat, 2 Dec 2017 at 22:36, Henrik Johansson wrote:
>> >> >>>
>> >> >>> I am sorry, I was unclear. The app uses very little ram but the
>> >> >>> provisioned available memory is 512 MB.
>> >> >>>
>> >>

[go-nuts] Re: Strange, intermittent panic issues

2016-12-06 Thread rlh

0xb01dfacedebac1e is a poison pill that usually indicates misuse of 
unsafe.Pointer. 
If there is any use of unsafe.Pointer or CGO in the program that would be a 
good
place to start looking.

You can google "0xb01dfacedebac1e" for more details.
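
For readers wondering what such a misuse looks like, here is a hedged 
illustration (mine, not taken from the terragrunt code) of the classic 
mistake: keeping a heap pointer only as a uintptr, which hides the object 
from the GC.

    package main

    import "unsafe"

    // badKeepAlive reconstructs a pointer from a bare uintptr. Between the two
    // conversions nothing keeps the object alive, so the runtime is free to
    // reclaim it; go vet's unsafeptr check flags this pattern.
    func badKeepAlive() *int {
        x := new(int)
        addr := uintptr(unsafe.Pointer(x))  // just a number from the GC's point of view
        return (*int)(unsafe.Pointer(addr)) // invalid: may point at reclaimed memory
    }

    func main() {
        _ = badKeepAlive()
    }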



On Tuesday, December 6, 2016 at 5:00:37 PM UTC-5, brik...@gmail.com wrote:
>
> Hi folks,
>
> We're using Go to write an open source CLI tool called terragrunt. We 
> cross-compile binaries for the tool for multiple OSes. Some of the users are 
> reporting 
> intermittent crashes due to mysterious panic errors. Here are two examples:
>
>    - https://github.com/gruntwork-io/terragrunt/issues/41 (crash log) 
>    - https://github.com/gruntwork-io/terragrunt/issues/68 (crash log)
>
> The panics seem to happen in strange places. For example, one happens deep 
> in a call to the Printf method of a logger. Another in malloc. 
>
> We're at a loss for how to debug these as this doesn't seem to be caused 
> by the usual culprits (e.g. a nil in the code) and can only be reproduced 
> intermittently. Any suggestions?
>
> Thanks,
> Jim
>



[go-nuts] Re: too many runtime.gcBgMarkStartWorkers ?

2016-12-30 Thread rlh
The default is GOMAXPROCS == numCPU and the runtime is optimized and tested 
for this. There are use cases involving co-tenancy where setting GOMAXPROCS 
< numCPU tells the OS to limit HW allocation and improves overall 
throughput when several programs are running concurrently. 

Setting GOMAXPROCS > numCPU seems to indicate that the Go scheduler and the 
OS scheduler are out of sync. Perhaps the delay between the OS knowing a 
call is blocked and the Go scheduler knowing it is blocked is the root 
cause. Any insight into why setting GOMAXPROCS > numCPU is a win might lead 
to improving the Go scheduler or perhaps Go / OS interaction.
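
For completeness, a tiny illustrative sketch (not from the thread) of the knob 
under discussion; GOMAXPROCS can be set with the environment variable or 
queried and capped at runtime:

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        // GOMAXPROCS(0) only queries; the default is runtime.NumCPU().
        fmt.Println("NumCPU:", runtime.NumCPU(), "GOMAXPROCS:", runtime.GOMAXPROCS(0))

        // Cap the value, e.g. for the co-tenancy case described above.
        runtime.GOMAXPROCS(runtime.NumCPU() / 2)
        fmt.Println("GOMAXPROCS now:", runtime.GOMAXPROCS(0))
    }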

On Wednesday, December 28, 2016 at 3:41:31 PM UTC-5, Dave Cheney wrote:
>
> There may be a bug here. IMO the runtime should never try to start more 
> than numCPU background workers regardless of the size of GOMAXPROCS, as 
> that'll just cause contention during the GC cycle. 
>
>



Re: [go-nuts] Re: too many runtime.gcBgMarkStartWorkers ?

2017-01-01 Thread rlh
Will making tight loops preemptible (CL 10958) resolve this use case?

Is it true that some of the goroutines are compute bound while others have 
to respond to some sort of stimulus? What response time do these goroutines 
require? 1ms, 10ms, 100ms, 1 second?

Thanks

On Friday, December 30, 2016 at 11:07:46 AM UTC-5, John Souvestre wrote:
>
> Ø  Any insight into why setting GOMAXPROCS > numCPU is a win might lead 
> to improving the Go scheduler or perhaps Go / OS interaction.
>
>  
>
> I’ve found it useful if you happen to have a number of compute-bound 
> goroutines which don’t reschedule.  Setting more processes than CPUs causes 
> the kernel to do some scheduling for them.
>
>  
>
> John
>
> John Souvestre - New Orleans LA
>
>  
>
> *From:* golan...@googlegroups.com  [mailto:
> golan...@googlegroups.com ] *On Behalf Of *r...@golang.org 
> 
> *Sent:* 2016 December 30, Fri 09:53
> *To:* golang-nuts
> *Subject:* [go-nuts] Re: too many runtime.gcBgMarkStartWorkers ?
>
>  
>
> The default is GOMAXPROCS == numCPU and the runtime is optimized and 
> tested for this. There are use cases involving co-tenancy where setting 
> GOMAXPROCS < numCPU tells the OS to limit HW allocation and improves 
> overall throughput when several programs are running concurrently. 
>
>  
>
> Setting GOMAXPROCS > numCPU seems to indicate that the Go scheduler and 
> the OS scheduler are out of sync. Perhaps the delay between the OS knowing 
> a call is blocked and the Go scheduler knowing it is blocked is the root 
> cause. Any insight into why setting GOMAXPROCS > numCPU is a win might lead 
> to improving the Go scheduler or perhaps Go / OS interaction.
>
>  
>
> On Wednesday, December 28, 2016 at 3:41:31 PM UTC-5, Dave Cheney wrote:
>
> There may be a bug here. IMO the runtime should never try to start more 
> than numCPU background workers regardless of the size of GOMAXPROCS, as 
> that'll just cause contention during the GC cycle. 
>



[go-nuts] Re: Large GC pauses with large map

2017-04-21 Thread rlh
How did you generate the GC pause graphs? Could you also provide the output 
from "GODEBUG=gctrace=1 yourApp"? It would help confirm that it is a GC 
pause problem. Also, some insight into the number of cores / HW threads and 
the value of GOMAXPROCS could eliminate some possibilities.
A reproducer would be great.
Thanks in advance. 

On Thursday, April 20, 2017 at 9:49:49 AM UTC-4, Lee Armstrong wrote:
>
> See attached graph which shows the GC pauses of an application we have.
>
> I am frequently seeing pauses of 1-1.5 seconds. This is using Go 1.8.1 and 
> have a large map that is frequently accessed and items are removed and 
> added to it.  These can be of some size.
>
> Is there a way to get these pauses down at all?  Would forcing a GC() 
> after removing a batch of elements help at all?
>
> Alongside the pauses I see some substantial CPU usage showing up in traces 
> for the GC scan.
>
> Thanks in advance!
>



[go-nuts] Re: Large GC pauses with large map

2017-04-21 Thread rlh
Lee,  
As far as I can tell this is resolved. Thanks for the discussion and for 
working with stackimpact to fix the root cause.


On Friday, April 21, 2017 at 3:52:55 PM UTC-4, Keith Randall wrote:
>
> It is almost never a good idea to call runtime.GC explicitly.
> It does block until a garbage collection completes.  This behavior is 
> sometimes useful in tests, but almost never otherwise.  If it weren't for 
> go1 compatibility, we'd rename this function to something that more clearly 
> spells out its blocking behavior.
>
> On Friday, April 21, 2017 at 11:51:17 AM UTC-7, Lee Armstrong wrote:
>>
>> Interestingly stackimpact.com just updated their code to remove the 
>> runtime.GC() calls.
>>
>> It has made a HUGE difference to the GC pauses.
>>
>> The code was updated just before 19:30.
>>
>> Interesting that the manual call had such an impact!
>>
>>
>> 
>>
>>
>> On Thursday, April 20, 2017 at 2:49:49 PM UTC+1, Lee Armstrong wrote:
>>>
>>> See attached graph which shows the GC pauses of an application we have.
>>>
>>> I am frequently seeing pauses of 1-1.5 seconds. This is using Go 1.8.1 
>>> and have a large map that is frequently accessed and items are removed and 
>>> added to it.  These can be of some size.
>>>
>>> Is there a way to get these pauses down at all?  Would forcing a GC() 
>>> after removing a batch of elements help at all?
>>>
>>> Alongside the pauses I see some substantial CPU usage showing up in 
>>> traces for the GC scan.
>>>
>>> Thanks in advance!
>>>
>>



Re: [go-nuts] ROC (Request-Oriented Collector)

2018-05-15 Thread rlh via golang-nuts
The current plan is to polish and publish our learnings by the end of next 
month (June 2018). 

On Sunday, May 13, 2018 at 11:48:45 PM UTC-4, Ian Lance Taylor wrote:
>
> [ +rlh, austin] 
>
> On Sun, May 13, 2018 at 11:24 AM, Tanya Borisova  > wrote: 
> > Hi! 
> > 
> > Is Golang team still working on Request-Oriented Collector? Is there any 
> > update on it? Quick search in golang-nuts and golang-dev didn't yield 
> any 
> > new updates. 
> > 
> > If not, it would be very interesting to hear why not and what Golang GC 
> team 
> > is working on next. 
> > 
> > Thanks, 
> > Tanya 
> > 
>



Re: [go-nuts] Latency spike during GC

2017-06-12 Thread rlh via golang-nuts
 allows it to use 25% of the CPU and some 
>>>>>> assists from high allocating goroutines that is at most proportional to 
>>>>>> the 
>>>>>> goroutine's allocations. Beyond that the intent is that the GC only 
>>>>>> enlist 
>>>>>> otherwise idle Ps.  If it is the first 25% that is backing up the 
>>>>>> goroutines then things are working as expected. If on the other hand it 
>>>>>> is 
>>>>>> the enlistment of idle Ps that are backing up the goroutines then we 
>>>>>> need 
>>>>>> to understand what is happening and adjust the scheduling based on what 
>>>>>> we 
>>>>>> learn. I am hoping that the experiment I proposed above will shed some 
>>>>>> light on what is causing the backup.
>>>>>>
>>>>>>
>>>>>> On Thu, Jun 1, 2017 at 12:48 PM, Xun Liu >>>>> > wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 1, 2017 at 6:06 AM, Rick Hudson >>>>>> > wrote:
>>>>>>>
>>>>>>>> Yes, the GC seems to finish early but that is the result of having 
>>>>>>>> to be very conservative about the amount of work left to do. Add to 
>>>>>>>> this a 
>>>>>>>> lot of Ps with nothing scheduled that are enlisted to do GC work. It 
>>>>>>>> seems 
>>>>>>>> counter intuitive to leave Ps idle so the approach will need numbers 
>>>>>>>> to 
>>>>>>>> support it. There is no easy way to change the assist and idle 
>>>>>>>> settings.
>>>>>>>>
>>>>>>>
>>>>>>> I would argue that the fact that we saw runnable goroutines piling 
>>>>>>> up from time to time during gc (which almost never happens outside gc) 
>>>>>>> suggests go gc is too aggressive in enlisting help to a point that can 
>>>>>>> starve user goroutines -- unless you have other explanation for the 
>>>>>>> goroutine queue-up.
>>>>>>>
>>>>>>>
>>>>>>>  
>>>>>>>
>>>>>>>>
>>>>>>>> In an attempt to get some sort of an upper bound on potential 
>>>>>>>> latency improvements that might result from making the GC less 
>>>>>>>> aggressive 
>>>>>>>> could you run the following experiments. 
>>>>>>>>
>>>>>>>> GOMAXPROCS=24 yourApp
>>>>>>>> and report what, if any, change you see on your P50, P90 and P99 
>>>>>>>> numbers when GC is not running.
>>>>>>>> This should give us an upper bound on what can be expected if the 
>>>>>>>> GC runs with only 25% of the CPU. The experiment is not interested in 
>>>>>>>> the 
>>>>>>>> latency when the GC is running. This is trying to simulate what 
>>>>>>>> happens if 
>>>>>>>> we have a minimally aggressive GC.
>>>>>>>>
>>>>>>>> GOMAXPROCS=16 yourApp
>>>>>>>> This should give us an idea of what to expect if the GC never 
>>>>>>>> schedules half the processors regardless if they are idle.
>>>>>>>>
>>>>>>>> GOMAXPROCS=8 yourApp
>>>>>>>> This should give us an idea of what to expect if the GC never 
>>>>>>>> schedules 25% of the processors regardless if they are idle.
>>>>>>>>
>>>>>>>> Thanks, looking forward to seeing the percentile numbers and 
>>>>>>>> graphs. 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, May 31, 2017 at 3:01 PM, Xun Liu >>>>>>> > wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, May 31, 2017 at 11:35 AM, Rick Hudson >>>>>>>> > wrote:
>>>>>>>>>
>>>>>>>>>> gc 347 @6564.164s 0%: 0.89+518+1.

[go-nuts] Re: Running Go binary on a 56 core VM

2017-09-03 Thread rlh via golang-nuts
Without building and measuring it is impossible to know which of these 
approaches, or even a third one where you simply run a single instance, is 
best for your application.

Each approach has upsides and downsides. The GC believes GOMAXPROCS and 
will use as much CPU as it believes is available. This can take CPU away 
from the other 3 instances. If more CPU is available but the GC doesn't 
know about it then those cycles will go unused.

Build, measure, and figure out why the application is seeing different 
performance characteristics. Please report back, 56 HW threads and 
256Gbytes RAM is of considerable interest to many of us.


On Thursday, August 31, 2017 at 9:59:59 AM UTC-4, Pradeep Singh wrote:
>
> Hi Guys,
>
> So, we wrote a Go service which does some heavy network IO over ZMQ (using 
> cgo calls).
>
> Now, we have to put this service on a VM in private cloud which has 56 
> cores and 256GB of physical memory.
>
> I am guessing it is mostly a dual core NUMA Intel Xeon machine with Xen 
> installed on it.
>
> We want to horizontally scale the application by launching 4 instances of 
> this service in this VM.
>
> We have tested the code for 30K+ QPS on a 16 core EC2 AMI.
>
> There are two ways we can do it.
>
> 1. Run 4 instances of the application as it is without changing any 
> defaults except configuration files and output data folders.
>
> 2. Run 4 instances of the application after modifying GOMAXPROCS.
> - GOMAXPROCS=16 ./run-my-unoptimized-app
>
> Which of these 2 scenarios would benefit us more in terms of performance?
>
> Does it makes sense to run all with default GOMAXPROCS value, which would 
> be 56 for all the 4 instances?
>
> Or it would be wise to follow option 2 with possibly pinning each to a 
> range of 16 cores using taskset.
>
> Would pinning help in second scenario?
>
> Thanks,
>



[go-nuts] Re: GC SW times on Heroku (Beta metrics)

2017-12-02 Thread rlh via golang-nuts
Hard to tell what is going on. 35MB, even for 1 CPU, seems very small. Most 
modern systems provision more than 1GB per HW thread, though I've seen some 
provision as little as 512MB. GOGC (SetGCPercent) can be adjusted so that the 
application uses more of the available RAM. Running with GODEBUG=gctrace=1 
will give you a sense of the GC's view of the application.

In any case these kinds of numbers, seen on real systems and reproducible at 
tip, are worth filing an issue about.

On Saturday, December 2, 2017 at 3:02:30 AM UTC-5, Henrik Johansson wrote:
>
> Hi,
>
> I am befuddled by GC SW times of several seconds (I've seen 20s once) in the 
> metrics page for our app. There are several things that are strange, but 
> perhaps I am misreading it. The same metrics page reports Max Total 35 MB, 
> out of which 1 MB is swap and the rest RSS. The response times on the 
> service have a 99th percentile of ~400 ms, which is not good, but the 95th 
> percentile is usually ~120 ms. 
> The app reloads an in-memory cache as needed, using atomic.Value as a 
> holder, and the size is no more than a few thousand entries at any given 
> time. Basically a map with pointers to simple structs and lists with 
> pointers to the same structs to allow for some simple access scenarios.
>
> Now, I haven't profiled the app yet, but even in a very pathological case it 
> seems as though the GC would be able to keep up easily with such a small 
> amount of memory being used. Granted, this is a Standard 1x dyno, but even 
> so, once the machine is stopped the GC should be able to complete its work 
> in a very short time given the low memory in use.
>
> Has anyone seen this as well? Could the Go metrics on Heroku simply report 
> erroneously? Perhaps by a couple of orders of magnitude?
>
> Cheers,
>
>



Re: [go-nuts] Why golang garbage-collector not implement Generational and Compact gc?

2017-05-16 Thread rlh via golang-nuts
The Johnstone / Wilson paper "The memory fragmentation problem: solved?" 
[1] is the original source.

Modern malloc systems, including Google's TCMalloc, Hoard [2], and Intel's 
Scalable Malloc (aka McRT-Malloc [3]), all owe much to that paper and, along 
with other memory managers, segregate objects by size. Many languages, most 
notably C/C++, use these fragmentation-avoiding memory managers to build 
large systems without the need for copying compaction.

[1] Mark S. Johnstone and Paul R. Wilson. 1998. The memory fragmentation 
problem: solved?. In Proceedings of the 1st international symposium on 
Memory management (ISMM '98). ACM, New York, NY, USA, 26-36. 
DOI=http://dx.doi.org/10.1145/286860.286864

[2] Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. 
Wilson. 2000. Hoard: a scalable memory allocator for multithreaded 
applications. SIGPLAN Not. 35, 11 (November 2000), 117-128. 
DOI=http://dx.doi.org/10.1145/356989.357000

[3] Richard L. Hudson, Bratin Saha, Ali-Reza Adl-Tabatabai, and Benjamin C. 
Hertzberg. 2006. McRT-Malloc: a scalable transactional memory allocator. In 
Proceedings of the 5th international symposium on Memory management (ISMM 
'06). ACM, New York, NY, USA, 74-83. 
DOI=http://dx.doi.org/10.1145/1133956.1133967
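
As a small, hedged illustration (mine, not part of the references above) that 
Go's allocator segregates objects by size in exactly this spirit, 
runtime.MemStats exposes per-size-class allocation counts:

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        // Each entry is one of the allocator's small-object size classes; objects
        // of similar size share spans, which is what keeps fragmentation low.
        for _, c := range m.BySize {
            if c.Mallocs > 0 {
                fmt.Printf("size class %5d B: %d mallocs, %d frees\n", c.Size, c.Mallocs, c.Frees)
            }
        }
    }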

On Tuesday, May 16, 2017 at 12:48:38 PM UTC-4, Zellyn wrote:
>
> Thanks for the enlightening and interesting reply, Ian.
>
> One quick question: do you have a link or a short description of why 
> “modern memory allocation algorithms, like the tcmalloc-based approach used 
> by the Go runtime, have essentially no fragmentation issues”?
>
> I was curious, but a quick search for [tcmalloc fragmentation] yielded 
> mostly people struggling with fragmentation issues when using tcmalloc.
>
> Zellyn
>
>
