Re: [go-nuts] Re: Low memory utilization when using soft memory limit with GOGC=off

2024-01-10 Thread Zhihui Jiang


On Monday, January 8, 2024 at 7:24:27 AM UTC-8 Michael Knyszek wrote:

On Sun, Jan 7, 2024 at 9:06 PM 'Zhihui Jiang' via golang-nuts <
golan...@googlegroups.com> wrote:

Hi Michael,

Sorry about the delayed response! Please see my replies inline below.

On Wednesday, December 6, 2023 at 8:22:34 PM UTC-8 Michael Knyszek wrote:

On Tuesday, December 5, 2023 at 1:06:19 AM UTC-5 Zhihui Jiang wrote:

Hi there, 

We are running a large-scale recommendation system written in Go, and we have 
been working on some GC-related improvements recently. One of the changes we 
are trying to apply is to use the soft memory limit with GOGC=off, as suggested 
here: https://github.com/golang/go/issues/48409.

But during our testing, the memory usage never reaches the memory limit we 
set. For example, we have 100GB of memory available and we set the memory 
limit to 90GB, but the actual memory usage stays much lower, below 50GB. We 
also observed that GC runs very frequently, which costs a lot of CPU time.

That's odd. Are you positive GOGC=off is set correctly? GOGC=0 is not the 
same, and will in fact trigger the garbage collector constantly.

Can you collect the stderr of your program running with GODEBUG=gctrace=1 
set? That'll help identify what's going on. This behavior isn't what I 
would expect, so if you suspect a bug, please file an issue on GitHub.

Yes, I'm pretty sure GOGC is set to OFF when the issue happened. Here is a 
sample GC trace log when we set GOMEMLIMIT=165GB:

   gc 42787 @354529.311s 4%: 0.43+680+0.11 ms clock, 24+6.4/9387/23913+6.2 ms cpu, 
   54417->55384->32404 MB, 66903 MB goal, 0 MB stacks, 1 MB globals, 56 P

All the other GC trace logs are quite similar, with heap goal numbers ranging 
from 65GB to 70GB, which is much smaller than 165GB.


We also ran another test with GOGC=100; it turned out the GC behavior and 
memory usage were quite similar to GOGC=OFF.

I agree, this just looks like GOGC=100.

This may be nothing, but are you setting exactly `GOGC=OFF`, that is, with 
all letters capitalized? I'm pretty sure it needs to be `GOGC=off`. The 
code in the runtime only matches the lowercase form: 
https://cs.opensource.google/go/go/+/master:src/runtime/mgcpacer.go;l=1279?q=GOGC=go%2Fgo
 
Otherwise, it just silently uses the default value, which is 100.

I tried GOGC=off and it works. The issue turned out to be the uppercase OFF. 
Thanks a lot, Michael!


Something else to try is to call `runtime/debug.SetGCPercent(-1)` from your 
application.
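For example, a minimal programmatic equivalent of both settings looks roughly 
like this (the 90 GiB figure is only a placeholder for whatever budget applies):

    package main

    import "runtime/debug"

    func init() {
        // GOGC=off equivalent: disable the proportional GC trigger. The GC
        // still runs as needed to stay under the memory limit set below.
        debug.SetGCPercent(-1)
        // GOMEMLIMIT equivalent: a 90 GiB soft limit, in bytes.
        debug.SetMemoryLimit(90 << 30)
    }

    func main() {} // the real service would start here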

If neither of those things work, then please file a bug at 
github.com/golang/go/issues and we'll continue there. :)


Also, I should note that we have a big in-memory cache of around 30GB which 
will always stay in memory; I'm not sure whether that affects how the GC heap 
goal is decided in this case.

The GC just sees your cache as live memory.


We also tried setting GOGC to a high value like 200, and then the memory 
usage came quite close to the memory limit.


I have two questions here:

   1. Is it expected behavior that the memory usage is much lower than the 
   memory limit when GOGC is set to off? The official doc 
   (https://github.com/golang/go/issues/48409) claims that "by setting 
   GOGC=off, the Go runtime will always grow the heap to the full memory 
   limit".

No, I would expect the runtime's total memory use to always be roughly at 
the soft limit in that case.


   2. How is the GC frequency decided when using the soft memory limit with 
   GOGC=off? Is there some internal default value for GOGC in this case that 
   decides when to GC?

The frequency isn't decided directly. The heap goal (the total target heap 
size) is determined entirely from the memory limit. Approximately: heap 
goal = memory limit - all other memory the runtime is using. The frequency 
then depends on how big your live heap is and how fast your application 
allocates, growing the heap from the live heap size up to that target heap 
size.
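For reference, the pacer's current heap goal can be observed directly at run 
time via runtime/metrics; a small sketch (metric names as documented in the 
runtime/metrics package):

    package main

    import (
        "fmt"
        "runtime/metrics"
    )

    func main() {
        samples := []metrics.Sample{
            {Name: "/gc/heap/goal:bytes"},         // the pacer's current heap goal
            {Name: "/memory/classes/total:bytes"}, // all memory mapped by the runtime
        }
        metrics.Read(samples)
        fmt.Printf("heap goal: %d MiB\n", samples[0].Value.Uint64()>>20)
        fmt.Printf("runtime-mapped memory: %d MiB\n", samples[1].Value.Uint64()>>20)
    }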


Thanks!



[go-nuts] Re: Low memory utilization when using soft memory limit with GOGC=off

2024-01-07 Thread 'Zhihui Jiang' via golang-nuts
Hi Michael,

Sorry about the delayed response! Please see my replies inline below.

On Wednesday, December 6, 2023 at 8:22:34 PM UTC-8 Michael Knyszek wrote:

On Tuesday, December 5, 2023 at 1:06:19 AM UTC-5 Zhihui Jiang wrote:

Hi there, 

We are running a large-scale recommendation system written in Go, and we have 
been working on some GC-related improvements recently. One of the changes we 
are trying to apply is to use the soft memory limit with GOGC=off, as suggested 
here: https://github.com/golang/go/issues/48409.

But during our testing, the memory usage never reaches the memory limit we 
set. For example, we have 100GB of memory available and we set the memory 
limit to 90GB, but the actual memory usage stays much lower, below 50GB. We 
also observed that GC runs very frequently, which costs a lot of CPU time.

That's odd. Are you positive GOGC=off is set correctly? GOGC=0 is not the 
same, and will in fact trigger the garbage collector constantly.

Can you collect the stderr of your program running with GODEBUG=gctrace=1 
set? That'll help identify what's going on. This behavior isn't what I 
would expect, so if you suspect a bug, please file an issue on GitHub.

Yes, I'm pretty sure GOGC is set to OFF when the issue happened. Here is a 
sample GC trace log when we set GOMEMLIMIT=165GB:

   gc 42787 @354529.311s 4%: 0.43+680+0.11 ms clock, 24+6.4/9387/23913+6.2 ms cpu, 
   54417->55384->32404 MB, 66903 MB goal, 0 MB stacks, 1 MB globals, 56 P

All the other GC trace logs are quite similar, with heap goal numbers ranging 
from 65GB to 70GB, which is much smaller than 165GB.

We also ran another test with GOGC=100; it turned out the GC behavior and 
memory usage were quite similar to GOGC=OFF.

Also, I should note that we have a big in-memory cache of around 30GB which 
will always stay in memory; I'm not sure whether that affects how the GC heap 
goal is decided in this case.


We also tried setting GOGC to a high value like 200, and then the memory 
usage came quite close to the memory limit.


I have two questions here:

   1. Is it expected behavior that the memory usage is much lower than the 
   memory limit when GOGC is set to off? The official doc 
   (https://github.com/golang/go/issues/48409) claims that "by setting 
   GOGC=off, the Go runtime will always grow the heap to the full memory 
   limit".

No, I would expect the runtime's total memory use to always be roughly at 
the soft limit in that case.


   2. How is the GC frequency decided when using the soft memory limit with 
   GOGC=off? Is there some internal default value for GOGC in this case that 
   decides when to GC?

The frequency isn't decided directly. The heap goal (the total target heap 
size) is determined entirely from the memory limit. Approximately: heap 
goal = memory limit - all other memory the runtime is using. The frequency 
then depends on how big your live heap is and how fast your application 
allocates, growing the heap from the live heap size up to that target heap 
size.


Thanks!



Re: [go-nuts] Low memory utilization when using soft memory limit with GOGC=off

2023-12-05 Thread Zhihui Jiang
Hi Andrew,

Can you be more specific? We use GOMEMLIMIT to set the soft memory limit.

On Tuesday, December 5, 2023 at 4:30:49 AM UTC-8 Harris, Andrew wrote:

> Might be worth looking into GOMEMLIMIT.
> --
> From: golan...@googlegroups.com on behalf of Zhihui Jiang
> Sent: Monday, December 4, 2023 10:05:52 PM
> To: golang-nuts
> Subject: [go-nuts] Low memory utilization when using soft memory limit 
> with GOGC=off 
>  
> Hi there,  
>
> We are running a large-scale recommendation system written in Go, and we have 
> been working on some GC-related improvements recently. One of the changes we 
> are trying to apply is to use the soft memory limit with GOGC=off, as suggested 
> here: https://github.com/golang/go/issues/48409.
>
> But during our testing, the memory usage never reaches the memory limit we 
> set. For example, we have 100GB of memory available and we set the memory 
> limit to 90GB, but the actual memory usage stays much lower, below 50GB. We 
> also observed that GC runs very frequently, which costs a lot of CPU time.
>
> We also tried setting GOGC to a high value like 200, and then the memory 
> usage came quite close to the memory limit.
>
> I have two questions here:
>
>    1. Is it expected behavior that the memory usage is much lower than the 
>    memory limit when GOGC is set to off? The official doc 
>    (https://github.com/golang/go/issues/48409) claims that "by setting 
>    GOGC=off, the Go runtime will always grow the heap to the full memory 
>    limit".
>    2. How is the GC frequency decided when using the soft memory limit with 
>    GOGC=off? Is there some internal default value for GOGC in this case that 
>    decides when to GC?
>
>
> Thanks!
>



[go-nuts] Low memory utilization when using soft memory limit with GOGC=off

2023-12-04 Thread Zhihui Jiang
Hi there, 

We are running a large-scale recommendation system written in Go, and we have 
been working on some GC-related improvements recently. One of the changes we 
are trying to apply is to use the soft memory limit with GOGC=off, as suggested 
here: https://github.com/golang/go/issues/48409.

But during our testing, the memory usage never reaches the memory limit we 
set. For example, we have 100GB of memory available and we set the memory 
limit to 90GB, but the actual memory usage stays much lower, below 50GB. We 
also observed that GC runs very frequently, which costs a lot of CPU time.

We also tried setting GOGC to a high value like 200, and then the memory 
usage came quite close to the memory limit.

I have two questions here:

   1. Is it expected behavior that the memory usage is much lower than the 
   memory limit when GOGC is set to off? The official doc 
   (https://github.com/golang/go/issues/48409) claims that "by setting 
   GOGC=off, the Go runtime will always grow the heap to the full memory 
   limit".
   2. How is the GC frequency decided when using the soft memory limit with 
   GOGC=off? Is there some internal default value for GOGC in this case that 
   decides when to GC?


Thanks!



Re: [go-nuts] Suggestions on optimizing Go GC

2023-10-30 Thread Zhihui Jiang


On Monday, October 30, 2023 at 10:12:08 PM UTC-7 Robert Engels wrote:

What is the average wall time of a request?

The average latency is a few hundred milliseconds.  


Based on what you wrote, it appears that handling a single request generates 
a lot of garbage (a high allocation rate), and for this to be significant I 
suspect the per-request runtime is also significant, which implies to me that 
a spawn-and-destroy request handler is your best bet.

I actually didn't quite get your earlier suggestion. We are using gRPC, and I 
think each request is already handled by its own goroutine. Can you explain a 
little bit more about the spawn-and-destroy request handler?


On Oct 30, 2023, at 11:56 PM, Zhihui Jiang  wrote:

Hi Michael, Jason and Robert, thanks a lot for the replies and suggestions!


I did some profiling today; here are some specific findings:
1. CPU used for GC is around 35% after we enabled the soft memory limit, and 
it was 45%+ before. I don't have much experience with how much CPU we should 
ideally spend on GC, but my intuition is that 35% is pretty high.
2. Within GC, most of the CPU is spent in runtime.scanObject, which I guess 
depends on how many objects we allocate and how fast.
3. Looking further at the heap profile, it turns out most of the objects 
(70%+) allocated come from the complex protobuf messages we use for 
communication between services, which can be big and can have deeply nested 
submessages.
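(For reference, allocation hot spots like these can be inspected from the heap 
profile's allocation sample indexes; e.g. `go tool pprof 
-sample_index=alloc_objects heap.pprof` followed by `top` shows which call 
sites allocate the most objects. heap.pprof is just a placeholder for however 
the profile was collected.)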

On Monday, October 30, 2023 at 2:19:23 PM UTC-7 Michael Knyszek wrote:

I second Jason's message, and +1 to off-heap memory as a last resort. 

Yes, indeed. One of the advantages of using Go is that we don't need to manage 
memory ourselves. I will try the other options first and see how much we can 
improve.

Here are a few more details:

For a starting point on how to reduce memory allocations directly, see 
https://go.dev/doc/gc-guide#Optimization_guide. Note that this may require 
restructuring your program in places. (e.g. passing byte slices to 
functions to be filled instead of returning byte slices; that sort of 
thing.)
RE: pooling memory, take a look at sync.Pool (
https://pkg.go.dev/sync#Pool). A sync.Pool can be really effective at 
reducing the number of memory allocations that are made in the steady state.

Object pooling is actually one of the most promising options, and we are 
trying to implement it right now. One quick question: is sync.Pool also 
feasible for complex protobuf messages? Are there any pitfalls we should take 
into consideration?
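A minimal sketch of what sync.Pool reuse could look like here, with Candidate 
as a hypothetical stand-in for a generated protobuf message (real generated 
messages already have a Reset() method):

    package main

    import "sync"

    // Candidate is a hypothetical stand-in for a generated protobuf message.
    type Candidate struct {
        ID    int64
        Score float64
    }

    // Reset clears the value so it can be safely reused.
    func (c *Candidate) Reset() { *c = Candidate{} }

    var candidatePool = sync.Pool{
        New: func() any { return new(Candidate) },
    }

    func handleRequest() {
        c := candidatePool.Get().(*Candidate)
        defer func() {
            // Pitfall: always Reset before Put, and never keep a reference
            // to c after returning it; the pool may hand it to another goroutine.
            c.Reset()
            candidatePool.Put(c)
        }()
        // ... fill and use c for this request only ...
        _ = c
    }

    func main() { handleRequest() }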


On Monday, October 30, 2023 at 2:33:21 PM UTC-4 Jason E. Aten wrote:

Similar to Robert's suggestion, you could just use non-GC-ed memory within 
the process.

https://github.com/glycerine/offheap provides an example. 

The central idea is that the Go GC will never touch memory that you have 
requested
yourself from the OS. So you can make your own Arenas. 
https://en.wikipedia.org/wiki/Region-based_memory_management

But I would save these as last resorts of course. Before that:

a) can you reduce the objects allocated per request?  
b) can you allocate everything else on the stack? There are flags to see 
why things are escaping to the heap; use those in your analysis.
(This is by far the simplest and fastest thing. Since the stack is 
automatically unwound when the user request finishes, typically there is 
no GC to do.)

Will try this out and let you know if we have interesting findings here. 
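(The flags in question: building with `go build -gcflags='-m'` prints the 
compiler's escape-analysis decisions for each package, and `-gcflags='-m -m'` 
adds more detail about why each value escapes to the heap.)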

c) can you allocate a pool of objects that is just reused instead of 
allocating for each new user request?
d) Is there anything that can be effectively cached and re-used instead of 
allocated?

Good point! We actually have an in-memory cache which already has a very 
high cache hit ratio of 95%+. It seems there is not much headroom here to 
further reduce the CPU spent on GC.


Use the profiler pprof to figure out what is going on.

Thanks! pprof is indeed a very helpful tool, and the problem we are facing 
seems to boil down to too many large/complex protobuf messages being passed 
around between services, which allocates too many objects during proto 
unmarshaling.
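A minimal way to expose those profiles from a long-running service looks 
roughly like this (port 6060 is just a conventional choice, not something 
from this thread):

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
    )

    func main() {
        // Serve the profiling endpoints on a side port; the real gRPC/HTTP
        // service would run elsewhere in the program.
        log.Fatal(http.ListenAndServe("localhost:6060", nil))
    }

Heap and CPU profiles can then be pulled with, e.g., 
`go tool pprof http://localhost:6060/debug/pprof/heap`.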




Re: [go-nuts] Suggestions on optimizing Go GC

2023-10-30 Thread Zhihui Jiang
Hi Michael, Jason and Robert, thanks a lot for the replies and suggestions!

I did some profiling today; here are some specific findings:
1. CPU used for GC is around 35% after we enabled the soft memory limit, and 
it was 45%+ before. I don't have much experience with how much CPU we should 
ideally spend on GC, but my intuition is that 35% is pretty high.
2. Within GC, most of the CPU is spent in runtime.scanObject, which I guess 
depends on how many objects we allocate and how fast.
3. Looking further at the heap profile, it turns out most of the objects 
(70%+) allocated come from the complex protobuf messages we use for 
communication between services, which can be big and can have deeply nested 
submessages.

On Monday, October 30, 2023 at 2:19:23 PM UTC-7 Michael Knyszek wrote:

I second Jason's message, and +1 to off-heap memory as a last resort. 

Yes, indeed. One of the advantages of using Go is that we don't need to manage 
memory ourselves. I will try the other options first and see how much we can 
improve.

Here are a few more details:

For a starting point on how to reduce memory allocations directly, see 
https://go.dev/doc/gc-guide#Optimization_guide. Note that this may require 
restructuring your program in places. (e.g. passing byte slices to 
functions to be filled instead of returning byte slices; that sort of 
thing.)
RE: pooling memory, take a look at sync.Pool (
https://pkg.go.dev/sync#Pool). A sync.Pool can be really effective at 
reducing the number of memory allocations that are made in the steady state.

Object pooling is actually one of the most promising options, and we are 
trying to implement it right now. One quick question: is sync.Pool also 
feasible for complex protobuf messages? Are there any pitfalls we should take 
into consideration?


On Monday, October 30, 2023 at 2:33:21 PM UTC-4 Jason E. Aten wrote:

Similar to Robert's suggestion, you could just use non-GC-ed memory within 
the process.

https://github.com/glycerine/offheap provides an example. 

The central idea is that the Go GC will never touch memory that you have 
requested
yourself from the OS. So you can make your own Arenas. 
https://en.wikipedia.org/wiki/Region-based_memory_management

But I would save these as last resorts of course. Before that:

a) can you reduce the objects allocated per request?  
b) can you allocate everything else on the stack? There are flags to see 
why things are escaping to the heap; use those in your analysis.
(This is by far the simplest and fastest thing. Since the stack is 
automatically unwound when the user request finishes, typically there is 
no GC to do.)

Will try this out and let you know if we have interesting findings here. 

c) can you allocate a pool of objects that is just reused instead of 
allocating for each new user request?
d) Is there anything that can be effectively cached and re-used instead of 
allocated?

Good point! We actually have an in-memory cache which already has a very 
high cache hit ratio of 95%+. It seems there is not much headroom here to 
further reduce the CPU spent on GC.


Use the profiler pprof to figure out what is going on.

Thanks! pprof is indeed a very helpful tool, and the problem we are facing 
seems to boil down to too many large/complex protobuf messages being passed 
around between services, which allocates too many objects during proto 
unmarshaling.



[go-nuts] Suggestions on optimizing Go GC

2023-10-29 Thread Zhihui Jiang
Hi there,

We have a large-scale recommendation system serving millions of users, built 
in Go. It has worked well until recently, when we tried to enlarge our index 
(candidate pool) by 10X, in which case the number of candidate objects created 
to serve each user request can also increase by 5~10X. That huge number of 
objects created on the heap causes a big jump in the CPU used by GC itself and 
thus significantly reduces system throughput.

We have tried different ways to reduce GC cost, like using the soft memory 
limit and dynamically tuning the value of GOGC similar to what is described 
here. Those indeed helped, but they won't reduce the intrinsic cost of GC, 
because the huge number of objects on the heap still has to be recycled anyway.
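A rough sketch of the kind of dynamic GOGC tuning meant above (the 100 GiB 
budget, the 10-second interval, and the floor of 50 are all made-up 
placeholders, not values from this system):

    package main

    import (
        "runtime"
        "runtime/debug"
        "time"
    )

    // tuneGOGC periodically recomputes GOGC so that the next collection is
    // expected to trigger near the given memory budget (in bytes).
    func tuneGOGC(budget uint64) {
        var ms runtime.MemStats
        for range time.Tick(10 * time.Second) {
            runtime.ReadMemStats(&ms) // brief stop-the-world; fine at this interval
            live := ms.HeapAlloc      // rough proxy for the live heap
            gogc := 50                // floor, so the GC is never starved entirely
            if live > 0 && budget > live {
                // The next GC triggers around live*(1+GOGC/100), so solve for GOGC.
                if g := int(float64(budget-live) / float64(live) * 100); g > gogc {
                    gogc = g
                }
            }
            debug.SetGCPercent(gogc)
        }
    }

    func main() {
        go tuneGOGC(100 << 30) // hypothetical 100 GiB budget
        select {}              // stand-in for the real server
    }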

I'm wondering if you have any suggestions about how to reduce object 
allocations during request serving?

Thanks!
Best
