https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91356
Bug ID: 91356
Summary: Poor optimization of calls involving std::unique_ptr
Product: gcc
Version: 8.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: nisse at lysator dot liu.se
Target Milestone: ---

The naïve understanding of unique_ptr is that it is handled the same way as a raw pointer, with just

  * additional compile time safety checks, and
  * automatic runtime calls to delete whenever a non-null unique_ptr goes out of scope.

However, the calling convention for unique_ptr implies a *lot* more overhead than passing a raw pointer. For a start, a unique_ptr is not passed in a register, but by "invisible reference". To make things worse, the invisible reference refers to a temporary object that the caller is responsible for destroying.

Consider a function just passing on a unique_ptr:

  void bar(std::unique_ptr<int> p);

  void baz(std::unique_ptr<int> p) {
    bar(std::move(p));
  }

This compiles (with g++-8 -O3 -fno-exceptions, on gnu/linux x86_64) to

  _Z3bazSt10unique_ptrIiSt14default_deleteIiEE:
          subq    $24, %rsp
          movq    (%rdi), %rax
          movq    $0, (%rdi)
          leaq    8(%rsp), %rdi
          movq    %rax, 8(%rsp)
          call    _Z3barSt10unique_ptrIiSt14default_deleteIiEE@PLT
          movq    8(%rsp), %rdi
          testq   %rdi, %rdi
          je      .L6
          movl    $4, %esi
          call    _ZdlPvm@PLT
  .L6:
          addq    $24, %rsp
          ret

As I read this, the steps are

  1. Allocate a new temporary unique_ptr on the stack.
  2. Move-construct it from the input argument (pointed to by %rdi), leaving the input argument null.
  3. Put the address of the temporary in %rdi, and invoke the bar function.
  4. Destroy the temporary object: a null test, a branch, and, if the pointer is non-null, a call to the sized operator delete (_ZdlPvm) to free the int.

This can be compared to the raw pointer version,

  void bar(int* p);

  void baz(int* p) {
    bar(p);
  }

which compiles to a single jump instruction:

  _Z3bazPi:
          jmp     _Z3barPi@PLT

As far as I understand, this cannot really be fixed in the compiler or library alone; it is also an ABI issue.
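To illustrate why the two versions are treated so differently, here is a minimal sketch of the type properties involved. It assumes the Itanium C++ ABI rule that a class type with a non-trivial copy constructor, move constructor, or destructor is passed by invisible reference; the helper names looks_register_passable and same_size_as_raw_pointer are hypothetical, and the trait check is only a rough approximation of the ABI wording.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Rough approximation (an assumption, not the ABI's exact wording) of
// "trivial for the purposes of calls": copy, move and destruction are all trivial.
template <typename T>
constexpr bool looks_register_passable() {
    return std::is_trivially_copy_constructible<T>::value &&
           std::is_trivially_move_constructible<T>::value &&
           std::is_trivially_destructible<T>::value;
}

// With the default deleter, common standard library implementations store
// nothing but the raw pointer, so size alone would not prevent register passing.
constexpr bool same_size_as_raw_pointer() {
    return sizeof(std::unique_ptr<int>) == sizeof(int*);
}

// Runtime checks of the two observations above.
void run_checks() {
    assert(looks_register_passable<int*>());
    assert(!looks_register_passable<std::unique_ptr<int>>());
    assert(same_size_as_raw_pointer());
}
```

If these checks hold, the extra cost comes from the non-trivial move constructor and destructor of unique_ptr rather than from its size.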
I see two somewhat independent things needed to make the calling convention for unique_ptr more efficient:

  1. Move responsibility for destroying the temporary object from caller to callee. This is particularly nice for unique_ptr, since the callee often knows statically that the unique_ptr is null when it goes out of scope, and then both the null test and the destructor call can be optimized away completely. I don't fully understand the C++ rules on destruction order, but I've been told that callee-destruction is allowed by the language specification (and used in the i386-pc-win32 ABI). It's less clear whether a forwarding function like baz(std::unique_ptr<int> p) can delegate responsibility further.

  2. Make it possible to pass small objects in registers, even if they have a non-trivial destructor or copy constructor. In particular, invoke the unique_ptr destructor with the object to be destroyed in a register. The callee may then need to move the object to memory if it for any reason needs a pointer to it. To allow that move, one may need something like a "relocatable" property, https://quuxplusone.github.io/draft/d1144-object-relocation.html, or https://en.cppreference.com/w/cpp/language/attributes/no_unique_address
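The effect of the two points above can be approximated by hand in today's C++, which may help make the proposal concrete. The sketch below is only an emulation under stated assumptions: ownership crosses the call boundary as a raw pointer (which fits in a register on x86_64) and the callee re-wraps and destroys it. The names Tracked, bar_owning, baz_owning, call_site and demo are all hypothetical, and the pattern is only safe if nothing can throw between release() and the re-wrap.

```cpp
#include <cassert>
#include <memory>

// Hypothetical payload that counts destructions so the behaviour is observable.
struct Tracked {
    static int destroyed;
    ~Tracked() { ++destroyed; }
};
int Tracked::destroyed = 0;

// Callee takes ownership of the raw pointer and is responsible for
// destroying the object (point 1: callee-side destruction).
void bar_owning(Tracked* raw) {
    std::unique_ptr<Tracked> p(raw);  // re-wrapped; deleted when p goes out of scope
}

// Forwarding function: it creates no temporary and has nothing to clean up
// after the call, so the compiler is free to emit a tail call.
void baz_owning(Tracked* raw) {
    bar_owning(raw);
}

// Caller gives up ownership explicitly; p is null afterwards.
void call_site(std::unique_ptr<Tracked> p) {
    baz_owning(p.release());
    assert(!p);
}

// Returns how many Tracked objects were destroyed by one owning call
// followed by one call with a null pointer.
int demo() {
    Tracked::destroyed = 0;
    call_site(std::unique_ptr<Tracked>(new Tracked));
    call_site(std::unique_ptr<Tracked>());  // null: nothing to destroy
    return Tracked::destroyed;
}
```

This gives up the compile time safety that unique_ptr provides at the function boundary, which is exactly what an ABI-level change would avoid.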