[Issue 13410] Performance problem with associative array byKey/byValue

via Digitalmars-d-bugs Mon, 01 Sep 2014 05:36:01 -0700

https://issues.dlang.org/show_bug.cgi?id=13410


--- Comment #6 from Ketmar Dark <ket...@ketmar.no-ip.org> ---
(In reply to bearophile_hugs from comment #5)
> For me the C++ version of the code runs in about 0.20 seconds, so there's
> still a significant performance difference.
yes, AA still needs to search for the "first item" many times, caching just
makes it faster in some cases. this can't be changed without complete rewrite
of AA (and with performance hit for other use cases).

"get first item" will always be costly for AAs, that's the way AAs works. there
are alot of 'buckets' for items, many of which are empty. to get *known* item
from AA we can choose the necessary bucket with simple '%' operation, but to
find "first available" item we must loop thru all buckets until we find the
used one.

my patch does caching of "first used bucket" index (and tries to keep that in
sync when AA contents changes), but sometimes i have to just "reset" cache, or
get/remove operations will become unnecessarely slow.

besides, alot of adding/removing between 'byKey' calls will update cache for no
reason at all, doing unnecessary calculations.

i tried to "amortise" that bad cases by not updating cache when there was no
calls to byKey/byValue for the given AA, and by resetting cache on AA
rehashing. this allows us to has almost no slowdowns when we using AAs without
iterating and we can just call aa.rehash before adding/deleting big amount of
data to turn off caching (besides, adding alot of data will eventually hit
rehashing too).

i can't think out anything better for now. trying to speed up our use case will
lead only to code bloating.

--

[Issue 13410] Performance problem with associative array byKey/byValue

Reply via email to