Just a side note.

I did what I always do when I have an idea to reinvent the bicycle: stop, have a 
good rest, lie down on the couch, take a walk in the forest, and reject the idea 
to build my own casino with blackjack and hookers. I already have my own web 
framework for toy projects, and that's enough :).

Yet, I think an in-memory search engine could be a very good showcase of Nim, 
as it combines a bunch of interesting tasks:

  a) IO handling a high request/sec rate, ideally CSP instead of async.
  b) Threads/multicore, to use all CPU cores to scan the in-memory search 
indexes in parallel.
  c) Shared memory, as all cores share the same search index and process it in 
parallel.
  d) Memory and algorithms: the index structure should be both memory-efficient 
and colocated (value types) for the fastest CPU (maybe even GPU) scan (see the 
sketch below).
  e) Code size: limited to in-memory only, the code should stay relatively 
small, so it would still be accessible for people to examine and learn from.
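
As a rough illustration of point (d), a colocated, value-type index layout in 
Nim could look something like this; the names and fields are hypothetical, not 
a committed design:

  type
    # A plain value object with no refs: postings pack contiguously in
    # memory, so a linear scan walks it sequentially, which is what CPU
    # caches (and GPUs) like best.
    Posting = object
      docId: int32   # compact ids keep the index small
      freq: int32    # term frequency within the doc
    PostingList = seq[Posting]  # one flat allocation per term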

It could be a blueprint example of what Nim can do and how to use it. And it 
should be good enough for real usage as a small and lightweight private search 
engine. It could even be compiled to JS and used for in-browser search.

It could be implemented in two ways:

  1. The classical way, using an inverted term index (sketched below).
  2. A novel way: split docs into chunks, build a sparse token vector for each 
chunk, compress the sparse vector into a dense feature vector, and then run the 
search as a nearest-neighbours search (also sketched below).
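
Here's a minimal sketch of the classical way, assuming a naive whitespace 
tokenizer and AND semantics for queries; all names are illustrative:

  import std/[tables, sets, strutils]

  type
    DocId = int
    InvertedIndex = Table[string, HashSet[DocId]]

  proc tokenize(text: string): seq[string] =
    ## Naive tokenizer: lowercase and split on whitespace.
    text.toLowerAscii.splitWhitespace

  proc addDoc(index: var InvertedIndex, id: DocId, text: string) =
    ## Record the doc id in the posting set of every token it contains.
    for token in tokenize(text):
      index.mgetOrPut(token, initHashSet[DocId]()).incl id

  proc search(index: InvertedIndex, query: string): HashSet[DocId] =
    ## Intersect the posting sets of all query tokens (AND semantics).
    var first = true
    for token in tokenize(query):
      let postings = index.getOrDefault(token)
      if first:
        result = postings
        first = false
      else:
        result = result * postings  # set intersection

  when isMainModule:
    var index: InvertedIndex
    index.addDoc(1, "Nim is a systems programming language")
    index.addDoc(2, "an in-memory search engine in Nim")
    echo index.search("nim search")  # -> {2}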
The tokens could come from classical tokenisers, trigrams, etc.
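
And a minimal sketch of the novel way, assuming character trigrams as the 
tokens and feature hashing as the sparse-to-dense compression step; the 
dimension, the names, and the brute-force scan are all illustrative:

  import std/[hashes, math, strutils]

  const Dim = 128  # dense vector size, would need tuning

  type Vec = array[Dim, float32]

  proc trigrams(text: string): seq[string] =
    ## Character trigram tokenizer.
    let t = text.toLowerAscii
    for i in 0 .. t.len - 3:
      result.add t[i ..< i + 3]

  proc embed(text: string): Vec =
    ## Feature hashing: fold the sparse trigram counts into a fixed-size
    ## dense vector, then L2-normalize so dot product = cosine similarity.
    for tok in trigrams(text):
      result[(hash(tok) and high(int)) mod Dim] += 1.0'f32
    var norm = 0.0'f32
    for x in result: norm += x * x
    if norm > 0:
      let inv = 1.0'f32 / sqrt(norm)
      for x in result.mitems: x *= inv

  proc nearest(chunks: seq[Vec], query: Vec): int =
    ## Brute-force nearest-neighbour scan; this inner loop is the part
    ## worth spreading across all CPU cores (or a GPU).
    var best = -1.0'f32  # cosine of non-negative vectors is >= 0
    for i, v in chunks:
      var score = 0.0'f32
      for j in 0 ..< Dim: score += v[j] * query[j]
      if score > best:
        best = score
        result = i

  when isMainModule:
    let chunks = @[embed("in-memory search engine"),
                   embed("web framework for toy projects")]
    echo nearest(chunks, embed("search engine"))  # -> 0

Scanning dense, colocated chunk vectors like this is exactly the kind of 
workload that points (b)-(d) above describe.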
