I looked at the code as is, and Packets themselves aren't ever inherited by anything, and they maintain a stack (I think) of sender state objects which doesn't affect the size of the packet. The sender state is inherited in a bunch of places since that is often customized. As far as compressed packets, what would the benefit be? It seems like there would be a lot of overhead to compress and decompress them, and I don't think they ever get all that big. I'm also hesitant to add "final" in there since I think it would be potentially limiting, but sometimes sensible limits are useful :-).
The reason exactly how big Request/Packets are is a concern is that I think you'd get a meaningful performance benefit if you could use extra knowledge to do things faster than just using new. For instance, if you knew that what you were allocating was always X bytes, then you could just have a bunch of blocks of X bytes laying around in a list and return one when asked. You wouldn't have to try to keep things compacted like new would, or keep track of (or even check) how big things were. That marginal bit of savings could make a big difference if allocations happened a lot, particularly if the type of allocations were ones new (and/or malloc if it's built on it) was bad at. My brief Googling says malloc is bad at lots of small allocations, for what that's worth. Also, a custom allocator could make some types of debugging marginally harder. For instance, if you used valgrind, it would replace malloc/new, and that wouldn't be used in the typical way for those types. They would likely be mostly batch allocated at the start of simulation, and then just returned by the various mechanisms later. That would make information about where a particular Request object came from less useful, although you would still likely be able to see that it was accessed in an illegal way, and where that access happened. I agree that taking an incremental approach and doing some experiments are worth while, and that wrapping request allocation in a small function would be a good first step. I can't promise a timeline, but hopefully I'll have something to share at some point in the future. Gabe On Thu, Nov 29, 2018 at 9:21 AM Nikos Nikoleris <[email protected]> wrote: > Hey all, > > The bigger opportunity is probably around packets as sometimes we are > allocating a number of them per request. > > A custom allocator would be a great contribution if it would make gem5 > substantially faster. But if it proved make things only marginally > better, I agree with Jason, we would need to factor in the fact that we > will be making the code more fragile. Would debugging/profiling be > harder if we used a custom deallocator? > > If the size of the Packet is a concern, we will probably have to deal > with members that are not fixed in size such as the senderState. It > might also be worth looking into something similar for the pointers from > packets such as data. > > Packet and Request memory management is something that we have been > concerned about as well. Giacomo, recently, changed all naked Request > pointers for shared_ptr and now we don't have to manually deallocate > them. But when we tried to do this for Packet pointers (figuring out > when to de-allocate a packet is not trivial) there was a substantial > overhead. > > Nikos > > > On 29/11/2018 16:40, Jason Lowe-Power wrote: > > Hey Gabe, > > > > Are you thinking that a custom allocator would make a difference in terms > > of memory footprint or in terms of performance (or both)? > > > > A couple of thoughts: > > - I'm hesitant to put the final keyword on Packet. I think we could see > > some code cleanup by making Packet a polymorphic object (e.g., Daniel has > > been talking about implementing packets with compressed data). Obviously, > > using polymorphism isn't required, and I don't have any plans to make > this > > change in the near future. But it's something that I've been thinking > about. > > - I like the idea of requiring the use of a small function to allocate > > requests and packets. In many cases, for Packets, we use > > Packet::createRead()/createWrite(), which is a step in that direction. I > > think this would be a relatively non-controversial change moving in the > > direction of custom allocators. It may also allow us to do some testing > on > > this more easily. > > > > Another argument against custom allocators is that we'll be adding a > > significant amount of error-prone code. I like the idea of using standard > > libraries as much as possible to aid in understanding the code and > reducing > > the bug surface. > > > > That said, if we get a 20% speedup or a reduction in memory footprint > then > > this change would likely be worth it! > > > > Cheers, > > Jason > > > > On Wed, Nov 28, 2018 at 7:22 PM Gabe Black <[email protected]> wrote: > > > >> I was just wondering whether it would make sense to have custom > allocators > >> defined for Request and Packet types which would keep around a pool of > them > >> rather than defaulting to the normal allocator. I suspect since both > types > >> of objects are allocated very frequently this could save a lot of heap > >> overhead. > >> > >> I think there are two complications to doing this. First, we'd want to > make > >> sure Request and Packet objects are truly those objects and are of the > >> appropriate size. Subclasses could add new members and make the types > >> bigger. I think to ensure that, we'd need to add the "final" keyword > (added > >> in C++11) to ensure those types can't be subclassed and have > unpredictable > >> sizes. > >> > >> Second, we use make_shared a lot to build Requests, and that actually > >> allocates a larger amount of memory to hold the object and reference > count > >> information. It allocates that larger object with new, and it looks like > >> you're supposed to use allocate_shared if you want to use a custom > >> allocator. Writing a custom allocator looks relatively complicated, but > it > >> might make a big difference if it works correctly. > >> > >> Also we'd probably want to put the Request allocator incantation in a > small > >> function rather than calling make_shared or allocate_shared directly to > >> avoid a lot of boilerplate and churn if things need to change. > >> > >> Thoughts? Like I said I suspect this may make a significant difference, > but > >> it might not be easy to implement and may not actually make much of a > >> difference at all. > >> > >> Gabe > >> _______________________________________________ > >> gem5-dev mailing list > >> [email protected] > >> http://m5sim.org/mailman/listinfo/gem5-dev > > _______________________________________________ > > gem5-dev mailing list > > [email protected] > > http://m5sim.org/mailman/listinfo/gem5-dev > > > IMPORTANT NOTICE: The contents of this email and any attachments are > confidential and may also be privileged. If you are not the intended > recipient, please notify the sender immediately and do not disclose the > contents to any other person, use it for any purpose, or store or copy the > information in any medium. Thank you. > _______________________________________________ > gem5-dev mailing list > [email protected] > http://m5sim.org/mailman/listinfo/gem5-dev _______________________________________________ gem5-dev mailing list [email protected] http://m5sim.org/mailman/listinfo/gem5-dev
