Re: [go-nuts] [generics] Data structures and the garbage collector

2020-07-11 Thread t hepudds
Hello Markus,

> "But apparently (see Ian's answer) this is not the case, and the GC 
always needs to trace all elements, keys and values."

I think Ian said something a bit different from what you said there.

As far as I understand, the GC does not scan the keys and values of a map 
if they are not pointers and do not contain pointers. See for example 
https://go-review.googlesource.com/c/go/+/3288

> "so the payload type is interface{} now."

For your case, it sounds like with generics you could substantially reduce 
the total count of pointers used by your data structure by storing the 
value directly as Ian said, rather than using interface{}.

Separately, a great overview of the GC system is in this golang.org blog 
post by Rick Hudson:

https://blog.golang.org/ismmkeynote

Also, the code itself is very readable, including there are some large 
comments that give overviews, such as:

https://github.com/golang/go/blob/master/src/runtime/mgc.go#L5
  
Those are worth a quick read if you are interested in more details of how 
the GC works.

Regards,
thepudds

On Friday, July 10, 2020 at 12:48:05 PM UTC-4, Markus Heukelom wrote:
>
>
> Many data structures are implemented using pointers. How does this perform 
>>> wrt garbage collecting? For example take any data structure that uses some 
>>> tree of nodes, or even simpler just a singly linked list: there is one 
>>> pointer from the left node to the next. Wouldn't this incur huge GC 
>>> (unavoidable) cycles for large linked lists that will look at all pointers? 
>>> I can imagine that a large AVL tree would then cause potentially large 
>>> hick-ups in the GC cycle? 
>>>
>>>
>> Is your worry latency in the form of pause times while doing GC, or 
>> bandwidth in the form of CPU cycles when doing GC? It isn't entirely clear.
>>
>>
> My "worry" is about CPU cycles that would not be (necessarily) needed in 
> let's say C++ or C. The fact that Go has a concurrent GC resulting in no 
> "stop the world"-hick-ups is really awesome, but was not my main point. 
>
> (btw I don't really "worry" about it for any real applications; it was 
> more a question out of interest)
>
> The reason I asked is because I use a custom tree data-structure that 
> manages parent-child, and leaf node prev-next, relationships using 
> pointers. I use the tree for multiple value types, so the payload type is 
> interface{} now. It would be a great place for me to use generics; this 
> would make my code more clear and safer. 
>
> In any case, you are right that these things require attention in a 
>> garbage collector, both of them. But Go has addressed this for many years 
>> now.
>>
>> As an example, imagine you have a binary AVL or Red/Black tree taking up 
>> 128 Gigabyte of memory space. You now delete the root node of said tree.
>>
>> Example 1: in C++ we have reference counted the tree. Deleting the root 
>> node sets its childrens refcounts to 0, so they get deleted, and so on 
>> recursively down the tree. It might take some time to delete 128 Gigabyte 
>> of data this way. So C++ isn't exempt from memory reclamation pauses either.
>>
>>
> In C/C++ you could better use an arena-allocator and not refcounts. 
> Clearing the tree would then use only (nodes / (nodes per region)) 
> deallocation calls. As the region is typically large, this would probably 
> be orders of magnitude quicker. I believe compilers use arena allocators 
> for this reason. (Deleting individual nodes would be cheaper maybe as 
> well.) 
>
> Refcounting I think is best reserved for situations where the lifetime of 
> an object is not determined by, or escapes, the package (class internals in 
> C++), or for reasons  of simplicity of implementation of course.
>
> I was under the (false) impression that one of the reasons of having a 
> built-in array, slice and map type in Go (besides it just being really 
> handy) is that for example a map[string]int would relieve the GC needing to 
> trace the keys and values (in contrast to for example map[int]*string). But 
> apparently (see Ian's answer) this is not the case, and the GC always needs 
> to trace all elements, keys and values. In that case there is (luckily) no 
> reason to choose build-in map over a custom map written with generics.
>  
>
>> Example 2: Imagine a language using Cheney's copying collection as the 
>> main garbage collection scheme. It will use 256 gigabyte of memory for its 
>> two copying spaces, but collection time will be 0, since the tree's data 
>> isn't live. You are paying with memory for faster collection in this case.
>>
>> Example 3: Concurrent collectors, like the one Go uses nowadays, will 
>> collect the tree while the program is running and doing productive work. 
>> While doing so incurs a bandwidth loss, latency pause times do not suffer. 
>> If you have spare CPU cores, they'll be used to handle the collection while 
>> the remaining cores run the program (more or less, this is a gross 
>> simplification).
>>
>> As

Re: [go-nuts] [generics] Data structures and the garbage collector

2020-07-10 Thread Markus Heukelom


> Many data structures are implemented using pointers. How does this perform 
>> wrt garbage collecting? For example take any data structure that uses some 
>> tree of nodes, or even simpler just a singly linked list: there is one 
>> pointer from the left node to the next. Wouldn't this incur huge GC 
>> (unavoidable) cycles for large linked lists that will look at all pointers? 
>> I can imagine that a large AVL tree would then cause potentially large 
>> hick-ups in the GC cycle? 
>>
>>
> Is your worry latency in the form of pause times while doing GC, or 
> bandwidth in the form of CPU cycles when doing GC? It isn't entirely clear.
>
>
My "worry" is about CPU cycles that would not be (necessarily) needed in 
let's say C++ or C. The fact that Go has a concurrent GC resulting in no 
"stop the world"-hick-ups is really awesome, but was not my main point. 

(btw I don't really "worry" about it for any real applications; it was more 
a question out of interest)

The reason I asked is because I use a custom tree data-structure that 
manages parent-child, and leaf node prev-next, relationships using 
pointers. I use the tree for multiple value types, so the payload type is 
interface{} now. It would be a great place for me to use generics; this 
would make my code more clear and safer. 

In any case, you are right that these things require attention in a garbage 
> collector, both of them. But Go has addressed this for many years now.
>
> As an example, imagine you have a binary AVL or Red/Black tree taking up 
> 128 Gigabyte of memory space. You now delete the root node of said tree.
>
> Example 1: in C++ we have reference counted the tree. Deleting the root 
> node sets its childrens refcounts to 0, so they get deleted, and so on 
> recursively down the tree. It might take some time to delete 128 Gigabyte 
> of data this way. So C++ isn't exempt from memory reclamation pauses either.
>
>
In C/C++ you could better use an arena-allocator and not refcounts. 
Clearing the tree would then use only (nodes / (nodes per region)) 
deallocation calls. As the region is typically large, this would probably 
be orders of magnitude quicker. I believe compilers use arena allocators 
for this reason. (Deleting individual nodes would be cheaper maybe as 
well.) 

Refcounting I think is best reserved for situations where the lifetime of 
an object is not determined by, or escapes, the package (class internals in 
C++), or for reasons  of simplicity of implementation of course.

I was under the (false) impression that one of the reasons of having a 
built-in array, slice and map type in Go (besides it just being really 
handy) is that for example a map[string]int would relieve the GC needing to 
trace the keys and values (in contrast to for example map[int]*string). But 
apparently (see Ian's answer) this is not the case, and the GC always needs 
to trace all elements, keys and values. In that case there is (luckily) no 
reason to choose build-in map over a custom map written with generics.
 

> Example 2: Imagine a language using Cheney's copying collection as the 
> main garbage collection scheme. It will use 256 gigabyte of memory for its 
> two copying spaces, but collection time will be 0, since the tree's data 
> isn't live. You are paying with memory for faster collection in this case.
>
> Example 3: Concurrent collectors, like the one Go uses nowadays, will 
> collect the tree while the program is running and doing productive work. 
> While doing so incurs a bandwidth loss, latency pause times do not suffer. 
> If you have spare CPU cores, they'll be used to handle the collection while 
> the remaining cores run the program (more or less, this is a gross 
> simplification).
>
> As for generics:
>
> When you use a generic, you substitute in a type to generate a 
> special-case of a function for that type. If the implementation uses 
> monomorphization, it will generate a new piece of code specifically for 
> that type invocation. This might mean other optimizations trigger for the 
> code path, notably stripping pointers away.
>
> As for general worry:
>
> In some cases it is beneficial to construct data structures such that they 
> pack data well and don't use excessive amounts of pointers. One reason is 
> that collecting pointers takes time in any language. Another reason is that 
> pointer indirections cost memory loads, which can be expensive. A third 
> reason is that packed data has cache locality, so access to data close 
> together is several orders of magnitude faster than following yet another 
> pointer indirection. And finally, it might avoid memory pressure in copying 
> data around.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/db652e

Re: [go-nuts] [generics] Data structures and the garbage collector

2020-07-10 Thread Jesper Louis Andersen
On Thu, Jul 9, 2020 at 6:40 PM Markus Heukelom 
wrote:

> Many data structures are implemented using pointers. How does this perform
> wrt garbage collecting? For example take any data structure that uses some
> tree of nodes, or even simpler just a singly linked list: there is one
> pointer from the left node to the next. Wouldn't this incur huge GC
> (unavoidable) cycles for large linked lists that will look at all pointers?
> I can imagine that a large AVL tree would then cause potentially large
> hick-ups in the GC cycle?
>
>
Is your worry latency in the form of pause times while doing GC, or
bandwidth in the form of CPU cycles when doing GC? It isn't entirely clear.

In any case, you are right that these things require attention in a garbage
collector, both of them. But Go has addressed this for many years now.

As an example, imagine you have a binary AVL or Red/Black tree taking up
128 Gigabyte of memory space. You now delete the root node of said tree.

Example 1: in C++ we have reference counted the tree. Deleting the root
node sets its childrens refcounts to 0, so they get deleted, and so on
recursively down the tree. It might take some time to delete 128 Gigabyte
of data this way. So C++ isn't exempt from memory reclamation pauses either.

Example 2: Imagine a language using Cheney's copying collection as the main
garbage collection scheme. It will use 256 gigabyte of memory for its two
copying spaces, but collection time will be 0, since the tree's data isn't
live. You are paying with memory for faster collection in this case.

Example 3: Concurrent collectors, like the one Go uses nowadays, will
collect the tree while the program is running and doing productive work.
While doing so incurs a bandwidth loss, latency pause times do not suffer.
If you have spare CPU cores, they'll be used to handle the collection while
the remaining cores run the program (more or less, this is a gross
simplification).

As for generics:

When you use a generic, you substitute in a type to generate a special-case
of a function for that type. If the implementation uses monomorphization,
it will generate a new piece of code specifically for that type invocation.
This might mean other optimizations trigger for the code path, notably
stripping pointers away.

As for general worry:

In some cases it is beneficial to construct data structures such that they
pack data well and don't use excessive amounts of pointers. One reason is
that collecting pointers takes time in any language. Another reason is that
pointer indirections cost memory loads, which can be expensive. A third
reason is that packed data has cache locality, so access to data close
together is several orders of magnitude faster than following yet another
pointer indirection. And finally, it might avoid memory pressure in copying
data around.

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAGrdgiWh%2By%2BN92M0%2Bas650hQ4ZLJXAnDq5vz1RqJXPZooNDi%3DA%40mail.gmail.com.


Re: [go-nuts] [generics] Data structures and the garbage collector

2020-07-09 Thread Ian Lance Taylor
On Thu, Jul 9, 2020 at 9:40 AM Markus Heukelom  wrote:
>
> Many data structures are implemented using pointers. How does this perform 
> wrt garbage collecting? For example take any data structure that uses some 
> tree of nodes, or even simpler just a singly linked list: there is one 
> pointer from the left node to the next. Wouldn't this incur huge GC 
> (unavoidable) cycles for large linked lists that will look at all pointers? I 
> can imagine that a large AVL tree would then cause potentially large hick-ups 
> in the GC cycle?
>
> Of course, luckily we have map where you can just store the values. But if 
> people start using generics for their own map types, or trees, wouldn't this 
> potentially be a bigger problem? Wouldn't this defy the use of generics for 
> data structures aimed to manage a lot of data?
>
> I don't know how Java does this for example, or if it is no practical issue 
> at all. In C++ you can you ref counting for the nodes, in C you would use a 
> arena allocator probably for the nodes etc. I was wondering how this relates 
> to Go.

There is no special connection between the builtin map type and the
garbage collector.  The builtin map type uses pointers too.  The
garbage collector has to trace those pointers.

Go uses a concurrent garbage collector, so a large linked list won't
cause any sort of program pause.  The garbage collector does use CPU
time, and it will use more CPU time for data structures that use more
pointers.  But I don't see that as being particularly tied to
generics.  If a program needs to build data structures with lots of
pointers, it will build those data structures whether or not the
language has generics.

In fact, there is a good argument that generics will reduce the number
of pointers overall.  Without generics, a general purpose data
structure will use interface types, and in the current implementation
interface types always contain two pointers.  With generics, a general
purpose data structure will store the value directly; if the value
does not contain any pointers, then the overall effect will be fewer
pointers and less GC work.

Ian

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAOyqgcUQLF38t_R1LYkqk2TGDM3d6vtwE6oZZQGb1Fw6-%3DLUhg%40mail.gmail.com.


[go-nuts] [generics] Data structures and the garbage collector

2020-07-09 Thread Markus Heukelom
Hi,

Many data structures are implemented using pointers. How does this perform 
wrt garbage collecting? For example take any data structure that uses some 
tree of nodes, or even simpler just a singly linked list: there is one 
pointer from the left node to the next. Wouldn't this incur huge GC 
(unavoidable) cycles for large linked lists that will look at all pointers? 
I can imagine that a large AVL tree would then cause potentially large 
hick-ups in the GC cycle? 

Of course, luckily we have map where you can just store the values. But if 
people start using generics for their own map types, or trees, wouldn't 
this potentially be a bigger problem? Wouldn't this defy the use of 
generics for data structures aimed to manage a lot of data?

I don't know how Java does this for example, or if it is no practical issue 
at all. In C++ you can you ref counting for the nodes, in C you would use a 
arena allocator probably for the nodes etc. I was wondering how this 
relates to Go.

-Markus






-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/942ebe05-2bd8-4fc2-bed7-b7d90518fd6fn%40googlegroups.com.