Hi David,

Funny you mention this, as I was working on it today. I have been using PostgreSQL's pg_trgm extension for years and have been wanting to implement the same idea in 4D. The code to create n-grams is pretty simple. Putting the trigrams in a keyword-indexed text field makes it super fast to find matches. The main problem seems to be finding too many candidate matches. Scoring the results is a lot less efficient than finding the candidates. Still looking into it.
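To make the idea concrete, here is a rough sketch of the two steps (cheap candidate lookup, then more expensive scoring). It is in Python rather than 4D, and it is not my 4D code: the function names, the in-memory dictionary standing in for the keyword index, and the sample records are all made up for illustration. The word padding (two spaces in front, one behind) and the 0.3 cutoff follow what the pg_trgm documentation describes; the score is shared trigrams divided by total distinct trigrams.

```python
from collections import defaultdict


def trigrams(text, n=3):
    """Return the set of n-grams in text, padding each word pg_trgm-style."""
    grams = set()
    for word in text.lower().split():
        # Two leading spaces and one trailing space (assumption: mirrors pg_trgm).
        padded = " " * (n - 1) + word + " "
        for i in range(len(padded) - n + 1):
            grams.add(padded[i:i + n])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ga, gb = trigrams(a), trigrams(b)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def build_index(records):
    """Map each trigram to the ids of records containing it.

    Hypothetical stand-in for a keyword-indexed text field.
    """
    index = defaultdict(set)
    for rec_id, text in records.items():
        for gram in trigrams(text):
            index[gram].add(rec_id)
    return index


def find_candidates(query, index):
    """Fast step: every record sharing at least one trigram with the query."""
    candidates = set()
    for gram in trigrams(query):
        candidates |= index.get(gram, set())
    return candidates


def search(query, records, index, threshold=0.3):
    """Slow step: score each candidate and keep those at or above threshold."""
    scored = [(similarity(query, records[rec_id]), rec_id)
              for rec_id in find_candidates(query, index)]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


# Made-up sample data.
records = {1: "John DeSoi", 2: "Jon DeSoi", 3: "David Adams"}
index = build_index(records)
results = search("John Desoi", records, index)
```

With this sample data all three records come back as candidates, because "David Adams" shares a single trigram with the query, but only records 1 and 2 survive scoring. That is the "too many matches" problem in miniature: the lookup is cheap, and the cost is in scoring everything it returns.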

I'm still using the LCS code you published years ago. Thanks very much for that. If anything useful comes out of my exploration of trigrams, I'll try to do the same.

John DeSoi, Ph.D.

> On Jul 6, 2017, at 6:17 PM, David Adams via 4D_Tech <4d_tech@lists.4d.com> wrote:
>
> Next up the complexity chain is n-gram comparison (sometimes called q-gram
> comparison, for no clear reason.) N-gram is a confusing term now because
> historically it meant "strings of a certain length", like take a word and
> break it into 3-character strings. n=3. 3 is a good length, based on
> research. It's confusing now because Google's public n-gram data sets and
> tools are based on proximate words, not strings. Anyway, n-gram analysis is
> very powerful and proven tech...but I've failed to get great results in 4D.
> It could very well be me...I haven't had enough time/attention to ever
> really dive into this in recent years.