Re: [go-nuts] [generics] Data structures and the garbage collector
Hello Markus, > "But apparently (see Ian's answer) this is not the case, and the GC always needs to trace all elements, keys and values." I think Ian said something a bit different from what you said there. As far as I understand, the GC does not scan the keys and values of a map if they are not pointers and do not contain pointers. See for example https://go-review.googlesource.com/c/go/+/3288 > "so the payload type is interface{} now." For your case, it sounds like with generics you could substantially reduce the total count of pointers used by your data structure by storing the value directly as Ian said, rather than using interface{}. Separately, a great overview of the GC system is in this golang.org blog post by Rick Hudson: https://blog.golang.org/ismmkeynote Also, the code itself is very readable, including there are some large comments that give overviews, such as: https://github.com/golang/go/blob/master/src/runtime/mgc.go#L5 Those are worth a quick read if you are interested in more details of how the GC works. Regards, thepudds On Friday, July 10, 2020 at 12:48:05 PM UTC-4, Markus Heukelom wrote: > > > Many data structures are implemented using pointers. How does this perform >>> wrt garbage collecting? For example take any data structure that uses some >>> tree of nodes, or even simpler just a singly linked list: there is one >>> pointer from the left node to the next. Wouldn't this incur huge GC >>> (unavoidable) cycles for large linked lists that will look at all pointers? >>> I can imagine that a large AVL tree would then cause potentially large >>> hick-ups in the GC cycle? >>> >>> >> Is your worry latency in the form of pause times while doing GC, or >> bandwidth in the form of CPU cycles when doing GC? It isn't entirely clear. >> >> > My "worry" is about CPU cycles that would not be (necessarily) needed in > let's say C++ or C. The fact that Go has a concurrent GC resulting in no > "stop the world"-hick-ups is really awesome, but was not my main point. > > (btw I don't really "worry" about it for any real applications; it was > more a question out of interest) > > The reason I asked is because I use a custom tree data-structure that > manages parent-child, and leaf node prev-next, relationships using > pointers. I use the tree for multiple value types, so the payload type is > interface{} now. It would be a great place for me to use generics; this > would make my code more clear and safer. > > In any case, you are right that these things require attention in a >> garbage collector, both of them. But Go has addressed this for many years >> now. >> >> As an example, imagine you have a binary AVL or Red/Black tree taking up >> 128 Gigabyte of memory space. You now delete the root node of said tree. >> >> Example 1: in C++ we have reference counted the tree. Deleting the root >> node sets its childrens refcounts to 0, so they get deleted, and so on >> recursively down the tree. It might take some time to delete 128 Gigabyte >> of data this way. So C++ isn't exempt from memory reclamation pauses either. >> >> > In C/C++ you could better use an arena-allocator and not refcounts. > Clearing the tree would then use only (nodes / (nodes per region)) > deallocation calls. As the region is typically large, this would probably > be orders of magnitude quicker. I believe compilers use arena allocators > for this reason. (Deleting individual nodes would be cheaper maybe as > well.) > > Refcounting I think is best reserved for situations where the lifetime of > an object is not determined by, or escapes, the package (class internals in > C++), or for reasons of simplicity of implementation of course. > > I was under the (false) impression that one of the reasons of having a > built-in array, slice and map type in Go (besides it just being really > handy) is that for example a map[string]int would relieve the GC needing to > trace the keys and values (in contrast to for example map[int]*string). But > apparently (see Ian's answer) this is not the case, and the GC always needs > to trace all elements, keys and values. In that case there is (luckily) no > reason to choose build-in map over a custom map written with generics. > > >> Example 2: Imagine a language using Cheney's copying collection as the >> main garbage collection scheme. It will use 256 gigabyte of memory for its >> two copying spaces, but collection time will be 0, since the tree's data >> isn't live. You are paying with memory for faster collection in this case. >> >> Example 3: Concurrent collectors, like the one Go uses nowadays, will >> collect the tree while the program is running and doing productive work. >> While doing so incurs a bandwidth loss, latency pause times do not suffer. >> If you have spare CPU cores, they'll be used to handle the collection while >> the remaining cores run the program (more or less, this is a gross >> simplification). >> >> As
Re: [go-nuts] [generics] Data structures and the garbage collector
> Many data structures are implemented using pointers. How does this perform >> wrt garbage collecting? For example take any data structure that uses some >> tree of nodes, or even simpler just a singly linked list: there is one >> pointer from the left node to the next. Wouldn't this incur huge GC >> (unavoidable) cycles for large linked lists that will look at all pointers? >> I can imagine that a large AVL tree would then cause potentially large >> hick-ups in the GC cycle? >> >> > Is your worry latency in the form of pause times while doing GC, or > bandwidth in the form of CPU cycles when doing GC? It isn't entirely clear. > > My "worry" is about CPU cycles that would not be (necessarily) needed in let's say C++ or C. The fact that Go has a concurrent GC resulting in no "stop the world"-hick-ups is really awesome, but was not my main point. (btw I don't really "worry" about it for any real applications; it was more a question out of interest) The reason I asked is because I use a custom tree data-structure that manages parent-child, and leaf node prev-next, relationships using pointers. I use the tree for multiple value types, so the payload type is interface{} now. It would be a great place for me to use generics; this would make my code more clear and safer. In any case, you are right that these things require attention in a garbage > collector, both of them. But Go has addressed this for many years now. > > As an example, imagine you have a binary AVL or Red/Black tree taking up > 128 Gigabyte of memory space. You now delete the root node of said tree. > > Example 1: in C++ we have reference counted the tree. Deleting the root > node sets its childrens refcounts to 0, so they get deleted, and so on > recursively down the tree. It might take some time to delete 128 Gigabyte > of data this way. So C++ isn't exempt from memory reclamation pauses either. > > In C/C++ you could better use an arena-allocator and not refcounts. Clearing the tree would then use only (nodes / (nodes per region)) deallocation calls. As the region is typically large, this would probably be orders of magnitude quicker. I believe compilers use arena allocators for this reason. (Deleting individual nodes would be cheaper maybe as well.) Refcounting I think is best reserved for situations where the lifetime of an object is not determined by, or escapes, the package (class internals in C++), or for reasons of simplicity of implementation of course. I was under the (false) impression that one of the reasons of having a built-in array, slice and map type in Go (besides it just being really handy) is that for example a map[string]int would relieve the GC needing to trace the keys and values (in contrast to for example map[int]*string). But apparently (see Ian's answer) this is not the case, and the GC always needs to trace all elements, keys and values. In that case there is (luckily) no reason to choose build-in map over a custom map written with generics. > Example 2: Imagine a language using Cheney's copying collection as the > main garbage collection scheme. It will use 256 gigabyte of memory for its > two copying spaces, but collection time will be 0, since the tree's data > isn't live. You are paying with memory for faster collection in this case. > > Example 3: Concurrent collectors, like the one Go uses nowadays, will > collect the tree while the program is running and doing productive work. > While doing so incurs a bandwidth loss, latency pause times do not suffer. > If you have spare CPU cores, they'll be used to handle the collection while > the remaining cores run the program (more or less, this is a gross > simplification). > > As for generics: > > When you use a generic, you substitute in a type to generate a > special-case of a function for that type. If the implementation uses > monomorphization, it will generate a new piece of code specifically for > that type invocation. This might mean other optimizations trigger for the > code path, notably stripping pointers away. > > As for general worry: > > In some cases it is beneficial to construct data structures such that they > pack data well and don't use excessive amounts of pointers. One reason is > that collecting pointers takes time in any language. Another reason is that > pointer indirections cost memory loads, which can be expensive. A third > reason is that packed data has cache locality, so access to data close > together is several orders of magnitude faster than following yet another > pointer indirection. And finally, it might avoid memory pressure in copying > data around. > > -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/db652e
Re: [go-nuts] [generics] Data structures and the garbage collector
On Thu, Jul 9, 2020 at 6:40 PM Markus Heukelom wrote: > Many data structures are implemented using pointers. How does this perform > wrt garbage collecting? For example take any data structure that uses some > tree of nodes, or even simpler just a singly linked list: there is one > pointer from the left node to the next. Wouldn't this incur huge GC > (unavoidable) cycles for large linked lists that will look at all pointers? > I can imagine that a large AVL tree would then cause potentially large > hick-ups in the GC cycle? > > Is your worry latency in the form of pause times while doing GC, or bandwidth in the form of CPU cycles when doing GC? It isn't entirely clear. In any case, you are right that these things require attention in a garbage collector, both of them. But Go has addressed this for many years now. As an example, imagine you have a binary AVL or Red/Black tree taking up 128 Gigabyte of memory space. You now delete the root node of said tree. Example 1: in C++ we have reference counted the tree. Deleting the root node sets its childrens refcounts to 0, so they get deleted, and so on recursively down the tree. It might take some time to delete 128 Gigabyte of data this way. So C++ isn't exempt from memory reclamation pauses either. Example 2: Imagine a language using Cheney's copying collection as the main garbage collection scheme. It will use 256 gigabyte of memory for its two copying spaces, but collection time will be 0, since the tree's data isn't live. You are paying with memory for faster collection in this case. Example 3: Concurrent collectors, like the one Go uses nowadays, will collect the tree while the program is running and doing productive work. While doing so incurs a bandwidth loss, latency pause times do not suffer. If you have spare CPU cores, they'll be used to handle the collection while the remaining cores run the program (more or less, this is a gross simplification). As for generics: When you use a generic, you substitute in a type to generate a special-case of a function for that type. If the implementation uses monomorphization, it will generate a new piece of code specifically for that type invocation. This might mean other optimizations trigger for the code path, notably stripping pointers away. As for general worry: In some cases it is beneficial to construct data structures such that they pack data well and don't use excessive amounts of pointers. One reason is that collecting pointers takes time in any language. Another reason is that pointer indirections cost memory loads, which can be expensive. A third reason is that packed data has cache locality, so access to data close together is several orders of magnitude faster than following yet another pointer indirection. And finally, it might avoid memory pressure in copying data around. -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CAGrdgiWh%2By%2BN92M0%2Bas650hQ4ZLJXAnDq5vz1RqJXPZooNDi%3DA%40mail.gmail.com.
Re: [go-nuts] [generics] Data structures and the garbage collector
On Thu, Jul 9, 2020 at 9:40 AM Markus Heukelom wrote: > > Many data structures are implemented using pointers. How does this perform > wrt garbage collecting? For example take any data structure that uses some > tree of nodes, or even simpler just a singly linked list: there is one > pointer from the left node to the next. Wouldn't this incur huge GC > (unavoidable) cycles for large linked lists that will look at all pointers? I > can imagine that a large AVL tree would then cause potentially large hick-ups > in the GC cycle? > > Of course, luckily we have map where you can just store the values. But if > people start using generics for their own map types, or trees, wouldn't this > potentially be a bigger problem? Wouldn't this defy the use of generics for > data structures aimed to manage a lot of data? > > I don't know how Java does this for example, or if it is no practical issue > at all. In C++ you can you ref counting for the nodes, in C you would use a > arena allocator probably for the nodes etc. I was wondering how this relates > to Go. There is no special connection between the builtin map type and the garbage collector. The builtin map type uses pointers too. The garbage collector has to trace those pointers. Go uses a concurrent garbage collector, so a large linked list won't cause any sort of program pause. The garbage collector does use CPU time, and it will use more CPU time for data structures that use more pointers. But I don't see that as being particularly tied to generics. If a program needs to build data structures with lots of pointers, it will build those data structures whether or not the language has generics. In fact, there is a good argument that generics will reduce the number of pointers overall. Without generics, a general purpose data structure will use interface types, and in the current implementation interface types always contain two pointers. With generics, a general purpose data structure will store the value directly; if the value does not contain any pointers, then the overall effect will be fewer pointers and less GC work. Ian -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CAOyqgcUQLF38t_R1LYkqk2TGDM3d6vtwE6oZZQGb1Fw6-%3DLUhg%40mail.gmail.com.
[go-nuts] [generics] Data structures and the garbage collector
Hi, Many data structures are implemented using pointers. How does this perform wrt garbage collecting? For example take any data structure that uses some tree of nodes, or even simpler just a singly linked list: there is one pointer from the left node to the next. Wouldn't this incur huge GC (unavoidable) cycles for large linked lists that will look at all pointers? I can imagine that a large AVL tree would then cause potentially large hick-ups in the GC cycle? Of course, luckily we have map where you can just store the values. But if people start using generics for their own map types, or trees, wouldn't this potentially be a bigger problem? Wouldn't this defy the use of generics for data structures aimed to manage a lot of data? I don't know how Java does this for example, or if it is no practical issue at all. In C++ you can you ref counting for the nodes, in C you would use a arena allocator probably for the nodes etc. I was wondering how this relates to Go. -Markus -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/942ebe05-2bd8-4fc2-bed7-b7d90518fd6fn%40googlegroups.com.