http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54075
--- Comment #33 from François Dumont <fdumont at gcc dot gnu.org> 2012-11-03 15:28:30 UTC --- I fear that this performance issue is a normal drawback of the major enhancement for PR 41975. Before this evolution the hashtable data model was like a std::vector<std::forward_list>. During the insert operation it was easy to find out if the inserted element was already in the targetted bucket by simply checking for equality with each element of the bucket forwward_list. The code was something like: for (auto it = bucket.begin(); it != bucket.end(); ++it) if (comp(*it, element)) // The element already exists return false; // Ok, no equivalent element return true; After the evolution the data model became a combination of a std::forward_list containing all elements and a std::vector<std::forward_list<>::iterator> representing buckets. Now to check if an element is already in a bucket we still need to compare for equality with each element of the bucket but the element of the bucket are harder to identified. There is no more bucket end so we must also check that the element we are about to compare is still in the bucket. The code became something like: // Pre: bucket not empty. for (auto it = buckets[n];) { if (comp(*it, element)) // The element already exists return false; ++it; if (it == end() || bucket(it) != n) // We are not in bucket n anymore return false; } // Ok, no equivalent element return true; The new design helps to iterate through the container because even if it is almost empty we simply need to iterate within the forward_list. In the previous implementation we needed to iterate within a bucket forward_list and then jump above all empty buckets which could be very expensive. Try to run testsuite/performance/23_containers/insert_erase/41975.cc, you can easily tweak it to make it run with the tr1 implementation and you will notice the difference. This test also show the insert overhead even if with a perfect hasher like the one use in this test the impact is very limited. To make bucket(it) not too expensive the new implementation caches most of the time the hash code which explain the additional memory. You can try to ask for it not to be cache but you might have to qualified your hasher with noexcept, static assertions will tell you so. So for the moment I can see a way to restore tr1 insert performance without impacting erase performance. In my opinion, a Standard implementation needs to behave correctly in all use cases and not to be perform very well in one use case but very bad in an other.