On Wed, Jan 27, 2021 at 01:28:33AM +0000, Paul Backus via Digitalmars-d-learn wrote:
> On Tuesday, 26 January 2021 at 23:57:43 UTC, methonash wrote:
> > > Using AA's may not necessarily improve performance. It depends on
> > > what your code does with it. Because AA's require random access
> > > to memory, it's not friendly to the CPU cache hierarchy, whereas
> > > traversing linear arrays is more cache-friendly and in some cases
> > > will out-perform AA's.
> >
> > I figured a built-in AA might be an efficient path to performing
> > unique string de-duplication. If there's a more performant method
> > available, I'll certainly try it.
>
> You could try sorting the array first, and then using `uniq` [1] to
> discard duplicate elements. There's an example in the docs that shows
> how to do this in-place (without allocating additional memory).
>
> [1] http://phobos.dpldocs.info/std.algorithm.iteration.uniq.html
Yes, definitely try this. It will completely eliminate the overhead of
using an AA, which has to allocate memory (at least) once per entry
added. And since the data has to be sorted eventually anyway, you might
as well sort first and then use the sortedness as a convenient property
for fast de-duplication. Because .uniq traverses the range linearly,
this will be cache-friendly, and along with eliminating GC load it
should give you a speed boost.


T

-- 
Nearly all men can stand adversity, but if you want to test a man's
character, give him power. -- Abraham Lincoln
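For illustration, a minimal sketch of the sort-then-uniq approach (the
array name `words` and the sample data are made up; .uniq only removes
*adjacent* duplicates, which is exactly what sorting guarantees):

```d
import std.algorithm.iteration : uniq;
import std.algorithm.sorting : sort;
import std.array : array;
import std.stdio : writeln;

void main()
{
    string[] words = ["beta", "alpha", "beta", "gamma", "alpha"];

    // Sort in place so duplicates become adjacent, then lazily skip
    // consecutive duplicates with uniq. No AA, no per-entry allocation;
    // only the final .array call allocates, once, for the result.
    auto deduped = words.sort.uniq.array;

    writeln(deduped); // ["alpha", "beta", "gamma"]
}
```

If even that one allocation matters, the uniq docs linked above show how
to compact the unique elements into the front of the existing array
instead of materializing a new one.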