[Bug libstdc++/54075] [4.7.1] unordered_map insert 3x slower than 4.6.2

fdumont at gcc dot gnu.org Sat, 03 Nov 2012 08:28:54 -0700

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54075


--- Comment #33 from François Dumont <fdumont at gcc dot gnu.org> 2012-11-03 
15:28:30 UTC ---
I fear that this performance issue is a normal drawback of the major
enhancement for PR 41975. Before this evolution the hashtable data model was
like a std::vector<std::forward_list>. During the insert operation it was easy
to find out if the inserted element was already in the targetted bucket by
simply checking for equality with each element of the bucket forwward_list. The
code was something like:

for (auto it = bucket.begin(); it != bucket.end(); ++it)
  if (comp(*it, element))
    // The element already exists
    return false;
// Ok, no equivalent element
return true;

  After the evolution the data model became a combination of a
std::forward_list containing all elements and a
std::vector<std::forward_list<>::iterator> representing buckets. Now to check
if an element is already in a bucket we still need to compare for equality with
each element of the bucket but the element of the bucket are harder to
identified. There is no more bucket end so we must also check that the element
we are about to compare is still in the bucket. The code became something like:

// Pre: bucket not empty.
for (auto it = buckets[n];)
{
  if (comp(*it, element))
    // The element already exists
    return false;
  ++it;
  if (it == end() || bucket(it) != n)
    // We are not in bucket n anymore
    return false;
}
// Ok, no equivalent element
return true;

The new design helps to iterate through the container because even if it is
almost empty we simply need to iterate within the forward_list. In the previous
implementation we needed to iterate within a bucket forward_list and then jump
above all empty buckets which could be very expensive. Try to run
testsuite/performance/23_containers/insert_erase/41975.cc, you can easily tweak
it to make it run with the tr1 implementation and you will notice the
difference. This test also show the insert overhead even if with a perfect
hasher like the one use in this test the impact is very limited.

To make bucket(it) not too expensive the new implementation caches most of the
time the hash code which explain the additional memory. You can try to ask for
it not to be cache but you might have to qualified your hasher with noexcept,
static assertions will tell you so.

So for the moment I can see a way to restore tr1 insert performance without
impacting erase performance. In my opinion, a Standard implementation needs to
behave correctly in all use cases and not to be perform very well in one use
case but very bad in an other.

[Bug libstdc++/54075] [4.7.1] unordered_map insert 3x slower than 4.6.2

Reply via email to