Re: [go-nuts] Suggestions on optimizing Go GC

2023-10-31 Thread Robert Engels
A few hundred milliseconds is pretty short. Are you certain you don’t have a 
memory leak?

Still, just run N request handler PROCESSES each with 1/N the heap you have 
allocated now. 

If each of these continues to grow to an unmanageable size, simply kill it and
spawn a new request process, with a load balancer directing any request to the
available ones.

You are going to spend extra CPU on GC overall, but you can limit the GC in
each process since you’re going to kill it once it reaches a certain heap size.
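
A rough sketch of such a self-recycling worker (purely illustrative: the
threshold, the GOGC value and serve() are placeholders, not your actual code):

    // Let the heap grow cheaply, then exit once it passes a threshold;
    // a supervisor or load balancer restarts the process.
    package main

    import (
        "os"
        "runtime"
        "runtime/debug"
        "time"
    )

    const maxHeapBytes = 8 << 30 // illustrative 8 GiB cutoff

    func serve() { select {} } // placeholder for the real request loop

    func main() {
        debug.SetGCPercent(400) // relax GC; the process is recycled anyway

        go func() {
            var ms runtime.MemStats
            for range time.Tick(5 * time.Second) {
                runtime.ReadMemStats(&ms)
                if ms.HeapAlloc > maxHeapBytes {
                    os.Exit(0) // drain in-flight requests first in a real server
                }
            }
        }()

        serve()
    }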

If that won’t work then there is really only one choice - the off heap memory 
that others have mentioned. 

You may have an allocation pattern that needs a generational collector, which
Go doesn’t offer.

> On Oct 31, 2023, at 12:47 AM, Zhihui Jiang  wrote:
> 
> 
> 
> On Monday, October 30, 2023 at 10:12:08 PM UTC-7 Robert Engels wrote:
> What is the average wall time of a request?
> The average latency is a few hundred milliseconds.  
> 
> Based on what you wrote it appears that handling a single request generates a 
> lot of garbage - high allocation rate - and for this to be significant I 
> suspect the runtime is also significant - which implies to me a spawn and 
> destroy request handler is your best bet. 
> I actually didn't quite get your suggestion earlier. We are using gRPC, and I
> think for each request we already have a separate goroutine to handle it. Can
> you explain a little bit more about the spawn-and-destroy request handler?
> 
>> On Oct 30, 2023, at 11:56 PM, Zhihui Jiang  wrote:
>> 
>> Hi Michael, Jason and Robert, thanks a lot for the replies and suggestions!
> 
>> 
>> I did some profiling today, here are some specific findings:
>> 1, CPU used for GC is around 35% after we enabled the soft memory limit, and
>> it was 45%+ before. I don't have much experience with how much CPU we should
>> ideally spend on GC, but my intuition is that 35% is pretty high.
>> 2, For GC, most of the CPU is spent in runtime.scanObject, which I guess
>> depends on how many objects we allocate and how fast that is.
>> 3, With a further look at the heap profile, it turns out most of the objects
>> (70%+) allocated are due to complex protobuf messages we use for
>> communication between services, which can be big and might have deeply
>> nested submessages.
>> 
>> On Monday, October 30, 2023 at 2:19:23 PM UTC-7 Michael Knyszek wrote:
>> I second Jason's message, and +1 to off-heap memory as a last resort.
>> Yes, indeed. One of the advantages of using Go is that we don't need to
>> manage memory ourselves. I will try other options first and see how much we
>> can improve.
>> Here are a few more details:
>> 
>> For a starting point on how to reduce memory allocations directly, see 
>> https://go.dev/doc/gc-guide#Optimization_guide. Note that this may require 
>> restructuring your program in places. (e.g. passing byte slices to functions 
>> to be filled instead of returning byte slices; that sort of thing.)
>> RE: pooling memory, take a look at sync.Pool 
>> (https://pkg.go.dev/sync#Pool). A sync.Pool can be really effective at 
>> reducing the number of memory allocations that are made in the steady-state.
>> Object pooling is actually one of the most promising options we are trying
>> to implement right now. One quick question: is sync.Pool also feasible for
>> complex protobuf messages? Any pitfalls we should take into consideration?
>>> 
>>> On Monday, October 30, 2023 at 2:33:21 PM UTC-4 Jason E. Aten wrote:
>>> Similar to Robert's suggestion, you could just use non-GC-ed memory within 
>>> the process.
>>> 
>>> https://github.com/glycerine/offheap provides an example. 
>>> 
>>> The central idea is that the Go GC will never touch memory that you have 
>>> requested
>>> yourself from the OS. So you can make your own Arenas. 
>>> https://en.wikipedia.org/wiki/Region-based_memory_management
>>> 
>>> But I would save these as last resorts of course. Before that:
>>> 
>>> a) can you reduce the objects allocated per request?  
>>> b) can you allocate everything else on the stack? There are flags to see 
>>> why things are escaping to the heap, use those in your analysis.
>>> (This is by far the simplest and fastest thing. Since the stack is 
>>> automatically unwound when the user request finishes, typically, there is 
>>> no GC to do.)
>>> Will try this out and let you know if we have interesting findings here. 
>>> c) can you allocate a pool of objects that is just reused instead of 
>>> allocating for each new user request?
>>> d) Is there anything that can be effectively cached and re-used instead of 
>>> allocated?
>>> Good point! We actually have an in-memory cache which already has a very
>>> high cache hit ratio of 95%+. There doesn't seem to be much headroom here to
>>> further reduce the CPU spent on GC.
>>> 
>>> Use the profiler pprof to figure out what is going on.
>>> Thanks! pprof is indeed a very helpful tool, and the problem we are facing
>>> seems to boil down to too many large/complex protobuf messages passed around
>>> between different services, which allocate too many objects during proto
>>> unmarshaling.

Re: [go-nuts] Suggestions on optimizing Go GC

2023-10-30 Thread Zhihui Jiang


On Monday, October 30, 2023 at 10:12:08 PM UTC-7 Robert Engels wrote:

What is the average wall time of a request?

The average latency is a few hundred milliseconds.  


Based on what you wrote it appears that handling a single request generates 
a lot of garbage - high allocation rate - and for this to be significant I 
suspect the runtime is also significant - which implies to me a spawn and 
destroy request handler is your best bet. 

I actually didn't quite get your suggestion earlier. We are using gRPC, and
I think for each request we already have a separate goroutine to handle it.
Can you explain a little bit more about the spawn-and-destroy request handler?


On Oct 30, 2023, at 11:56 PM, Zhihui Jiang  wrote:

Hi Michael, Jason and Robert, thanks a lot for the replies and suggestions!


I did some profiling today, here are some specific findings:
1, CPU used for GC is around 35% after we enabled the soft memory limit, and
it was 45%+ before. I don't have much experience with how much CPU we should
ideally spend on GC, but my intuition is that 35% is pretty high.
2, For GC, most of the CPU is spent in *runtime.scanObject*, which I guess
depends on how many objects we allocate and how fast that is.
3, With a further look at the heap profile, it turns out most of the objects
(70%+) allocated are due to complex protobuf messages we use for communication
between services, which can be big and might have deeply nested submessages.

On Monday, October 30, 2023 at 2:19:23 PM UTC-7 Michael Knyszek wrote:

I second Jason's message, and +1 to off-heap memory as a last resort. 

Yes, indeed. One of the advantages of using Go is that we don't need to
manage memory ourselves. I will try other options first and see how much we
can improve.

Here are a few more details:

For a starting point on how to reduce memory allocations directly, see 
https://go.dev/doc/gc-guide#Optimization_guide. Note that this may require 
restructuring your program in places. (e.g. passing byte slices to 
functions to be filled instead of returning byte slices; that sort of 
thing.)
RE: pooling memory, take a look at sync.Pool (
https://pkg.go.dev/sync#Pool). A sync.Pool can be really effective at 
reducing the number of memory allocations that are made in the steady-state.

Object pooling is actually one of the most promising options we are trying
to implement right now. One quick question: is sync.Pool also feasible for
complex protobuf messages? Any pitfalls we should take into consideration?


On Monday, October 30, 2023 at 2:33:21 PM UTC-4 Jason E. Aten wrote:

Similar to Robert's suggestion, you could just use non-GC-ed memory within 
the process.

https://github.com/glycerine/offheap provides an example. 

The central idea is that the Go GC will never touch memory that you have 
requested
yourself from the OS. So you can make your own Arenas. 
https://en.wikipedia.org/wiki/Region-based_memory_management

But I would save these as last resorts of course. Before that:

a) can you reduce the objects allocated per request?  
b) can you allocate everything else on the stack? There are flags to see 
why things are escaping to the heap, use those in your analysis.
(This is by far the simplest and fastest thing. Since the stack is 
automatically unwound when the user request finishes, typically, there is 
no GC to do.)

Will try this out and let you know if we have interesting findings here. 

c) can you allocate a pool of objects that is just reused instead of 
allocating for each new user request?
d) Is there anything that can be effectively cached and re-used instead of 
allocated?

Good point! We actually have an in-memory cache which already has a very
high cache hit ratio of 95%+. There doesn't seem to be much headroom here to
further reduce the CPU spent on GC.


Use the profiler pprof to figure out what is going on.

Thanks! pprof is indeed a very helpful tool, and the problem we are facing
seems to boil down to too many large/complex protobuf messages passed around
between different services, which allocate too many objects during proto
unmarshaling.



Re: [go-nuts] Suggestions on optimizing Go GC

2023-10-30 Thread Robert Engels
What is the average wall time of a request?

Based on what you wrote it appears that handling a single request generates a 
lot of garbage - high allocation rate - and for this to be significant I 
suspect the runtime is also significant - which implies to me a spawn and 
destroy request handler is your best bet. 

> On Oct 30, 2023, at 11:56 PM, Zhihui Jiang  wrote:
> 
> Hi Michael, Jason and Robert, thanks a lot for the replies and suggestions!
> 
> I did some profiling today, here are some specific findings:
> 1, CPU used for GC is around 35% after we enabled the soft memory limit, and
> it was 45%+ before. I don't have much experience with how much CPU we should
> ideally spend on GC, but my intuition is that 35% is pretty high.
> 2, For GC, most of the CPU is spent in runtime.scanObject, which I guess
> depends on how many objects we allocate and how fast that is.
> 3, With a further look at the heap profile, it turns out most of the objects
> (70%+) allocated are due to complex protobuf messages we use for communication
> between services, which can be big and might have deeply nested submessages.
> 
> On Monday, October 30, 2023 at 2:19:23 PM UTC-7 Michael Knyszek wrote:
> I second Jason's message, and +1 to off-heap memory as a last resort.
> Yes, indeed. One of the advantages of using Go is that we don't need to manage
> memory ourselves. I will try other options first and see how much we can
> improve.
> Here are a few more details:
> 
> For a starting point on how to reduce memory allocations directly, see 
> https://go.dev/doc/gc-guide#Optimization_guide. Note that this may require 
> restructuring your program in places. (e.g. passing byte slices to functions 
> to be filled instead of returning byte slices; that sort of thing.)
> RE: pooling memory, take a look at sync.Pool 
> (https://pkg.go.dev/sync#Pool). A sync.Pool can be really effective at 
> reducing the number of memory allocations that are made in the steady-state.
> Object pooling is actually one of the most promising options we are trying to
> implement right now. One quick question: is sync.Pool also feasible for
> complex protobuf messages? Any pitfalls we should take into consideration?
> 
> On Monday, October 30, 2023 at 2:33:21 PM UTC-4 Jason E. Aten wrote:
> Similar to Robert's suggestion, you could just use non-GC-ed memory within 
> the process.
> 
> https://github.com/glycerine/offheap provides an example. 
> 
> The central idea is that the Go GC will never touch memory that you have 
> requested
> yourself from the OS. So you can make your own Arenas. 
> https://en.wikipedia.org/wiki/Region-based_memory_management
> 
> But I would save these as last resorts of course. Before that:
> 
> a) can you reduce the objects allocated per request?  
> b) can you allocate everything else on the stack? There are flags to see why 
> things are escaping to the heap, use those in your analysis.
> (This is by far the simplest and fastest thing. Since the stack is 
> automatically unwound when the user request finishes, typically, there is no 
> GC to do.)
> Will try this out and let you know if we have interesting findings here. 
> c) can you allocate a pool of objects that is just reused instead of 
> allocating for each new user request?
> d) Is there anything that can be effectively cached and re-used instead of 
> allocated?
> Good point! We actually have an in-memory cache which already has a very high
> cache hit ratio of 95%+. There doesn't seem to be much headroom here to further
> reduce the CPU spent on GC.
> 
> Use the profiler pprof to figure out what is going on.
> Thanks! pprof is indeed a very helpful tool, and the problem we are facing
> seems to boil down to too many large/complex protobuf messages passed around
> between different services, which allocate too many objects during proto
> unmarshaling.


Re: [go-nuts] Suggestions on optimizing Go GC

2023-10-30 Thread Zhihui Jiang
Hi Michael, Jason and Robert, thanks a lot for the replies and suggestions!

I did some profiling today, here are some specific findings:
1, CPU used for GC is around 35% after we enabled the soft memory limit, and
it was 45%+ before. I don't have much experience with how much CPU we should
ideally spend on GC, but my intuition is that 35% is pretty high.
2, For GC, most of the CPU is spent in *runtime.scanObject*, which I guess
depends on how many objects we allocate and how fast that is.
3, With a further look at the heap profile, it turns out most of the objects
(70%+) allocated are due to complex protobuf messages we use for communication
between services, which can be big and might have deeply nested submessages.
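
As a cross-check of that 35% figure, the GC's CPU share can also be read
in-process; a minimal sketch using the standard runtime API (running with
GODEBUG=gctrace=1 gives the per-cycle view alongside pprof):

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        var ms runtime.MemStats
        runtime.ReadMemStats(&ms)
        // Fraction of the program's available CPU time used by the GC
        // since the program started.
        fmt.Printf("GC CPU fraction: %.1f%%\n", ms.GCCPUFraction*100)
        fmt.Printf("live heap objects: %d\n", ms.HeapObjects)
    }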

On Monday, October 30, 2023 at 2:19:23 PM UTC-7 Michael Knyszek wrote:

I second Jason's message, and +1 to off-heap memory as a last resort. 

Yes, indeed. One of the advantages of using Go is that we don't need to
manage memory ourselves. I will try other options first and see how much we
can improve.

Here are a few more details:

For a starting point on how to reduce memory allocations directly, see 
https://go.dev/doc/gc-guide#Optimization_guide. Note that this may require 
restructuring your program in places. (e.g. passing byte slices to 
functions to be filled instead of returning byte slices; that sort of 
thing.)
RE: pooling memory, take a look at sync.Pool (
https://pkg.go.dev/sync#Pool). A sync.Pool can be really effective at 
reducing the number of memory allocations that are made in the steady-state.

Object pooling is actually one of the most promising options we are trying
to implement right now. One quick question: is sync.Pool also feasible for
complex protobuf messages? Any pitfalls we should take into consideration?


On Monday, October 30, 2023 at 2:33:21 PM UTC-4 Jason E. Aten wrote:

Similar to Robert's suggestion, you could just use non-GC-ed memory within 
the process.

https://github.com/glycerine/offheap provides an example. 

The central idea is that the Go GC will never touch memory that you have 
requested
yourself from the OS. So you can make your own Arenas. 
https://en.wikipedia.org/wiki/Region-based_memory_management
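
For a sense of what that looks like in practice, a minimal sketch of
OS-allocated memory the GC never scans (assumes a Unix-like system; the offheap
package wraps this more carefully, and you must not store Go pointers in such a
region):

    //go:build unix

    package offheapdemo

    import "syscall"

    // AllocOffHeap returns n bytes of anonymous mmap'd memory. The Go GC
    // neither scans nor frees it; the caller must call FreeOffHeap.
    func AllocOffHeap(n int) ([]byte, error) {
        return syscall.Mmap(-1, 0, n,
            syscall.PROT_READ|syscall.PROT_WRITE,
            syscall.MAP_ANON|syscall.MAP_PRIVATE)
    }

    func FreeOffHeap(b []byte) error {
        return syscall.Munmap(b)
    }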

But I would save these as last resorts of course. Before that:

a) can you reduce the objects allocated per request?  
b) can you allocate everything else on the stack? There are flags to see 
why things are escaping to the heap, use those in your analysis.
(This is by far the simplest and fastest thing. Since the stack is 
automatically unwound when the user request finishes, typically, there is 
no GC to do.)

Will try this out and let you know if we have interesting findings here (see
the escape-analysis sketch after this list). 

c) can you allocate a pool of objects that is just reused instead of 
allocating for each new user request?
d) Is there anything that can be effectively cached and re-used instead of 
allocated?

Good point! We actually have an in-memory cache which already has a very
high cache hit ratio of 95%+. There doesn't seem to be much headroom here to
further reduce the CPU spent on GC.
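
Regarding (b) above, for reference the flags in question are the compiler's
escape-analysis diagnostics, e.g. go build -gcflags='-m' ./... ; a tiny
illustrative sketch of the pattern they flag (Candidate is a made-up type, not
from our code):

    package scoring

    type Candidate struct{ Score float64 }

    // Returning a pointer to a local value forces it to escape to the heap;
    // -m reports this.
    func newCandidate(score float64) *Candidate {
        return &Candidate{Score: score}
    }

    // Filling a value the caller owns lets the caller's Candidate stay on
    // the stack at many call sites.
    func fillCandidate(c *Candidate, score float64) {
        c.Score = score
    }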


Use the profiler pprof to figure out what is going on.

Thanks! pprof is indeed a very helpful tool, and the problem we are facing
seems to boil down to too many large/complex protobuf messages passed around
between different services, which allocate too many objects during proto
unmarshaling.



Re: [go-nuts] Suggestions on optimizing Go GC

2023-10-30 Thread 'Michael Knyszek' via golang-nuts
I second Jason's message, and +1 to off-heap memory as a last resort. Here 
are a few more details:

For a starting point on how to reduce memory allocations directly, see 
https://go.dev/doc/gc-guide#Optimization_guide. Note that this may require 
restructuring your program in places. (e.g. passing byte slices to 
functions to be filled instead of returning byte slices; that sort of 
thing.)
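
A minimal sketch of that byte-slice pattern (Item and the encoding are made up
for illustration; the point is that the caller keeps and reuses the buffer):

    package encodingdemo

    import (
        "encoding/binary"
        "math"
    )

    type Item struct {
        ID    uint64
        Score float64
    }

    // Encode allocates a fresh slice on every call; under high QPS this is a
    // steady source of garbage.
    func Encode(it *Item) []byte {
        return AppendEncode(make([]byte, 0, 16), it)
    }

    // AppendEncode appends into a caller-provided buffer, which the caller
    // can keep (or pool) and reuse across requests.
    func AppendEncode(dst []byte, it *Item) []byte {
        var tmp [8]byte
        binary.BigEndian.PutUint64(tmp[:], it.ID)
        dst = append(dst, tmp[:]...)
        binary.BigEndian.PutUint64(tmp[:], math.Float64bits(it.Score))
        return append(dst, tmp[:]...)
    }
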
RE: pooling memory, take a look at sync.Pool (
https://pkg.go.dev/sync#Pool). A sync.Pool can be really effective at 
reducing the number of memory allocations that are made in the steady-state.
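
A minimal sketch of that pattern for a protobuf message (pb.Candidate and the
import path are placeholders for one of your generated types; the main pitfalls
are forgetting to reset the message before reuse and keeping references to the
message or its subfields after Put):

    package server

    import (
        "sync"

        "google.golang.org/protobuf/proto"

        pb "example.com/yourservice/proto" // placeholder import path
    )

    var candidatePool = sync.Pool{
        New: func() any { return new(pb.Candidate) },
    }

    func handle(raw []byte) error {
        msg := candidatePool.Get().(*pb.Candidate)
        defer func() {
            proto.Reset(msg) // clear every field so stale data never leaks
            candidatePool.Put(msg)
        }()

        if err := proto.Unmarshal(raw, msg); err != nil {
            return err
        }
        // ... use msg here; do not retain msg or its subfields ...
        return nil
    }

Note that pooling only the top-level message may help less than expected if
Unmarshal still allocates fresh submessages each time, so it is worth measuring
before and after.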

On Monday, October 30, 2023 at 2:33:21 PM UTC-4 Jason E. Aten wrote:

> Similar to Robert's suggestion, you could just use non-GC-ed memory within 
> the process.
>
> https://github.com/glycerine/offheap provides an example. 
>
> The central idea is that the Go GC will never touch memory that you have 
> requested
> yourself from the OS. So you can make your own Arenas. 
> https://en.wikipedia.org/wiki/Region-based_memory_management
>
> But I would save these as last resorts of course. Before that:
>
> a) can you reduce the objects allocated per request?  
> b) can you allocate everything else on the stack? There are flags to see 
> why things are escaping to the heap, use those in your analysis.
> (This is by far the simplest and fastest thing. Since the stack is 
> automatically unwound when the user request finishes, typically, there is 
> no GC to do.)
> c) can you allocate a pool of objects that is just reused instead of 
> allocating for each new user request?
> d) Is there anything that can be effectively cached and re-used instead of 
> allocated?
>
> Use the profiler pprof to figure out what is going on.
>



Re: [go-nuts] Suggestions on optimizing Go GC

2023-10-29 Thread Robert Engels
If the objects are created and discarded you might be able to spawn a new 
process to handle the request - or maybe a bunch - then just kill it and start 
a new process. Possibly use shared memory for constant data. 

This is the poor man’s generational garbage collector. 

> On Oct 29, 2023, at 9:43 PM, Zhihui Jiang  wrote:
> 
> Hi there,
> 
> We have a large-scale recommendation system serving millions of users, built
> using Golang. It has worked well until recently, when we tried to enlarge our
> index or candidate pool by 10X, in which case the number of candidate objects
> created to serve each user request can also increase by 5~10X. That huge
> number of objects created on the heap causes a big jump in the CPU used for GC
> itself and thus significantly reduces system throughput.
> 
> We have tried different ways to reduce GC cost, like using a soft memory limit
> and dynamically tuning the value of GOGC similar to what is described here.
> Those indeed helped, but they won't reduce the intrinsic cost of GC because
> the huge number of objects in the heap have to be recycled anyway. 
> 
> I'm wondering if you have any suggestions about how to reduce object 
> allocations during request serving?
> 
> Thanks!
> Best


[go-nuts] Suggestions on optimizing Go GC

2023-10-29 Thread Zhihui Jiang
Hi there,

We have a large-scale recommendation system serving millions of users, built
using Golang. It has worked well until recently, when we tried to enlarge our
index or candidate pool by 10X, in which case the number of candidate objects
created to serve each user request can also increase by 5~10X. That huge number
of objects created on the heap causes a big jump in the CPU used for GC itself
and thus significantly reduces system throughput.

We have tried different ways to reduce GC cost, like using a soft memory limit
and dynamically tuning the value of GOGC similar to what is described here.
Those indeed helped, but they won't reduce the intrinsic cost of GC because
the huge number of objects in the heap have to be recycled anyway.
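
For the record, the knobs being tuned are the standard ones; a minimal sketch
(the values are illustrative, not our production settings):

    package main

    import "runtime/debug"

    func main() {
        // Soft memory limit (Go 1.19+): the GC works harder as the live heap
        // approaches this limit; setting GOMEMLIMIT=16GiB is equivalent.
        debug.SetMemoryLimit(16 << 30) // bytes

        // GOGC: higher values trade memory for fewer GC cycles; this can be
        // adjusted at runtime based on observed headroom.
        debug.SetGCPercent(200) // equivalent to GOGC=200

        // ... start the server ...
    }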

I'm wondering if you have any suggestions about how to reduce object 
allocations during request serving?

Thanks!
Best
