Ok. I realize I also never told @Serge (who never replied) about needing to do `wordFrequencies.sort`. So, as penance, I did this series of **_six_** successively faster versions: first @gemath's, then the no-deps variant, then one using the heapqueue I mentioned, then `Table[string, int]` instead of `CountTable[string]`, then (for a big 2x time drop) `cligen`'s `mfile`/`mslice`, then pre-sizing the table with a good guess for the average bytes per unique word. First comes the code, then the timings on an i7-6700k @ 4.7GHz with hopefully easy-to-reproduce input data. To keep things simple, I only include the `iterator` definition once.

```nim
# whist00
import strutils, tables, zero_functional

proc main() =
  var histo = initCountTable[string]()
  for line in lines "/dev/stdin":
    for word in line.split(" "):
      if word.len > 0: histo.inc(word)
  histo.sort
  histo.pairs() --> take(10).foreach(echo it[0] & " " & $it[1])

main()

# whist0i: same as above, but the post-build part is iterative, not functional
  histo.sort                # maybe the 'real' answer to @Serge's question
  var i = 10
  for word, cnt in histo:
    echo cnt, " ", word
    i.dec
    if i == 0: break

# whist1: now introduce `iterator topN`
import strutils, tables, heapqueue

iterator topN[T](h: CountTable[T]|Table[T, int], n = 10): tuple[cnt: int; key: T] =
  var q = initHeapQueue[tuple[cnt: int; key: T]]()
  for key, cnt in h:
    if q.len < n:
      q.push((cnt, key))
    elif cnt > q[0].cnt:        # retain 1st seen on tied cnt
      discard q.replace((cnt, key))
  while q.len > 0:              # q now has top n entries
    yield q.pop                 # print in ascending order

proc main() =
  var histo = initCountTable[string]()
  for line in lines "/dev/stdin":
    for word in line.split(" "):
      if word.len > 0: histo.inc(word)
  for tup in histo.topN:
    echo tup[0], " ", tup[1]

# whist2: move from `CountTable[string]` to `Table[string, int]`
proc main() =
  var histo = initTable[string, int]()
  for line in lines "/dev/stdin":
    for word in line.split(" "):
      if word.len > 0: histo.mgetOrPut(word, 0).inc
  for tup in histo.topN:
    echo tup[0], " ", tup[1]

# whist3: move to memory-mapped IO and from `string` to slices
import cligen/[mfile, mslice]

proc main() =
  let mf = mopen("/dev/stdin")
  var histo = initTable[MSlice, int]()
  for line in mf.mSlices:
    for word in line.mSlices(' '):
      if word.len > 0: histo.mgetOrPut(word, 0).inc
  for tup in histo.topN:
    echo tup[0], " ", tup[1]

# whist4: presize the table; everything the same except this line
  var histo = initTable[MSlice, int](rightSize(mf.len div 45))
```
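In case anyone wants to poke at the heap logic without piping in a file, here is a tiny self-contained sanity check of the same top-n idea (the `topN` body is repeated so the snippet compiles on its own; the toy word list is just for illustration):

```nim
import tables, heapqueue

iterator topN[T](h: CountTable[T]|Table[T, int], n = 10): tuple[cnt: int; key: T] =
  var q = initHeapQueue[tuple[cnt: int; key: T]]()
  for key, cnt in h:
    if q.len < n:
      q.push((cnt, key))
    elif cnt > q[0].cnt:        # retain 1st seen on tied cnt
      discard q.replace((cnt, key))
  while q.len > 0:              # q now has the top n entries
    yield q.pop                 # smallest retained count first => ascending

var h = initCountTable[string]()
for w in ["a", "b", "b", "c", "c", "c"]:
  h.inc w
for tup in h.topN(2):
  echo tup.cnt, " ", tup.key    # prints "2 b" then "3 c"
```

The heap holds only `n` entries, so this is O(uniqueWords * log n) rather than a full O(uniqueWords * log uniqueWords) sort.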
Then we run these six versions against a text file holding the last 50,000 lines of the alphabetically-recursively globbed source in the Nim git repository. There are Zsh-isms here (`setopt extendedglob numericglobsort`), and I use a little `utime` program I have that times the execution of its argument like `/usr/bin/time`, but to nanoseconds, plus a little program to emit Parzen-interpolated quantiles, with quantile 0 being the sample minimum.

```
nim-repo..908bbb250f$ cat **.nim | tail -n50000 > /tmp/j
$ # a bunch of PGO compiles with my nim-pgo script; panics:on -d:useMalloc --gc:arc, Linux
$ for p in whist[0-4]*(x.); echo $p $((repeat 100 utime ./$p < j >/dev/null)|& quantiles 0 .05 .1)
whist00 0.02391167 0.02400840 0.02403021
whist0i 0.02335627 0.02340394 0.02345088
whist1  0.02189838 0.02192395 0.02201866
whist2  0.01853665 0.01885521 0.01904099
whist3  0.00922274 0.00943208 0.00952293
whist4  0.00760648 0.00772719 0.00777402
```

My `/tmp` is a /dev/shm filesystem, so there is no device IO here. To see ratios relative to the first row as we optimize, we can pipe that through an `awk`:

```
$ awk '{print $1,$2/0.02391167,$3/0.02400840,$4/0.02403021}'
whist00 1 1 1
whist0i 0.976773 0.974823 0.975892
whist1  0.915803 0.913178 0.916291
whist2  0.775214 0.785359 0.792377
whist3  0.3857   0.392866 0.39629
whist4  0.318107 0.321854 0.32351
```

Not bad. We eventually do over 3x better than the initial version, going from ~24 ms down to ~7.6 ms. As @gemath alludes to, the hit for `zero_functional` is only about 2.5%. The `topN` iterator surprised me with how little it saved (at that still-slow stage), only another ~6%. Then regular `Table` is about 1.2x faster than `CountTable`, `MSlice` with memory-mapped IO is about 2.0x faster than using `string`, and presizing gets you another ~1.2x speed-up.
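`utime` and `quantiles` are just my own little tools, so for reproduction purposes here is a rough stand-in sketch for the timing half (the name `timeCmd` and the whole proc are mine, not a standard API): wall-clock a child command via `std/monotimes`, which has nanosecond resolution.

```nim
# Rough stand-in for my `utime`: time one shell command to ns resolution.
import std/[monotimes, times, osproc, os, strutils]

proc timeCmd(cmd: string): float =
  ## Run `cmd` through the shell; return elapsed wall time in seconds.
  let t0 = getMonoTime()
  discard execCmd(cmd)
  result = (getMonoTime() - t0).inNanoseconds.float / 1e9

when isMainModule:
  # e.g.  ./timecmd './whist4 < /tmp/j > /dev/null'
  echo timeCmd(commandLineParams().join(" "))
```

Repeating the measurement (like the `repeat 100` above) and looking at low quantiles rather than the mean is what filters out scheduler noise.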
As a cross check on my input data, this is the output of just `whist4`:

```
 1201 of
 1296 echo
 1317 """
 1389 let
 1934 doAssert
 2180 proc
 2543 #
 2574 var
 3441 ==
11863 =
```

The outputs are the same across all the programs, except that whist0* prints those 10 lines in reverse order. It is obviously easy and fast to use `algorithm.reverse` to flip those if desired.
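That flip could look something like this minimal sketch: collect the pairs `topN` yields (ascending) into a seq, then reverse in place. The `rows` literal here just stands in for what `topN` would yield.

```nim
import algorithm

var rows = @[(2, "b"), (3, "c")]     # pairs as topN yields them: ascending
rows.reverse                         # algorithm.reverse: in-place flip
for r in rows:
  echo r[0], " ", r[1]               # now descending: "3 c" then "2 b"
```

(`algorithm.reversed` returns a flipped copy instead, if mutation is unwanted.)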