This is sort of a first step along the lines I was suggesting:
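import tables

type
  Strings = object
    strings*: string                 # all words back to back, each NUL-terminated
    str2ix*: Table[string, int32]    # word -> byte offset into `strings`

proc put*(s: var Strings, x: string) =
  let n = s.strings.len.int32        # offset a new word would get
  if s.str2ix.mgetOrPut(x, n) == n:  # only append if `x` was new
    s.strings.add x & "\0"

proc c_strlen(s: cstring): csize_t {.
  importc: "strlen", header: "<string.h>" .}

proc str*(s: var Strings, i: int32): string =
  # explicit cast and conversion, in case the compiler rejects the implicit ones
  let n = c_strlen(cast[cstring](s.strings[i].addr)).int
  result.setLen n   # this does the zero byte for us
  if n > 0:         # result[0] would be out of bounds for an empty word
    copyMem result[0].addr, s.strings[i].addr, n

when isMainModule:
  var s: Strings
  s.put "two"; s.put "does"; s.put "use"; s.put "two"
  echo s.str2ix
  echo s.str(9)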
Then
$ nim r intern.nim
{"use": 9, "does": 4, "two": 0}
use
Of course, you get sparse indices (0, 4, 9), not a dense (0, 1, 2) word numbering
this way, but this was unspecified by @Serge's question. Sparse indices are
already "fast like integers" and so may be all you really need. If you need
dense numbers, you could add more metadata, such as another `seq` that maps word
number to offset (which could also be used in a customized table to avoid
storing `string` keys, etc.).
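A minimal sketch of that dense variant, assuming an extra `seq[int32]` of offsets. The names `DenseStrings`, `data`, `offs`, and `ids` are made up for illustration, and the stock `Table` here still keeps its own copy of every key, so this shows only the numbering, not the memory saving a customized table would give:

```nim
import tables

type
  DenseStrings = object
    data: string               # all words back to back, each NUL-terminated
    offs: seq[int32]           # dense id -> byte offset into `data`
    ids: Table[string, int32]  # word -> dense id

proc put(s: var DenseStrings, x: string): int32 =
  ## Returns the dense id of `x`, adding it first if it is new.
  let next = s.offs.len.int32
  result = s.ids.mgetOrPut(x, next)
  if result == next:           # existing ids are always < next, so `x` is new
    s.offs.add s.data.len.int32
    s.data.add x
    s.data.add '\0'

proc str(s: DenseStrings, id: int32): string =
  ## Recovers the word for a dense id from the offsets alone.
  let start = s.offs[id].int
  # The word ends one byte (its NUL) before the next word starts.
  let stop = (if id.int + 1 < s.offs.len: s.offs[id + 1].int
              else: s.data.len) - 1
  result = s.data[start ..< stop]

when isMainModule:
  var s: DenseStrings
  for w in ["two", "does", "use", "two"]:
    discard s.put w
  echo s.offs      # @[0, 4, 9]
  echo s.str(2)    # use
```

Since consecutive offsets bracket each word, `str` needs no `strlen` call here; the price is four extra bytes per distinct word.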
There are many succinct representations like tries, Nim's own `critbits`, ...,
but I doubt they are much smaller (they may even be bigger), and they are
likely slower in practice than the above simple idea. They are also often
substantially more complex to code.
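For anyone who wants to measure that rather than take my word for it, here is a rough sketch of the same word numbering on top of the standard library's `critbits`. The dense numbering scheme and the name `ids` are my own choices for illustration, and no size or speed claim is implied by the snippet itself:

```nim
import critbits

var ids: CritBitTree[int32]      # word -> dense id, keys kept in sorted order

for w in ["two", "does", "use", "two"]:
  if w notin ids:                # only number a word the first time it is seen
    ids[w] = ids.len.int32

when isMainModule:
  echo ids.len       # 3
  echo ids["use"]    # 2
```

Timing this against the `Table` version on your real word list is the only way to settle the speed and memory question for your data.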