Hi,

It seems that the built-in tokenizers (or at least the unicode61 one) have no lower limit on the number of characters in a token. For instance, searching for records containing `t` returns rows whose sentences contain "don't" (see the sketch below).
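For reference, here is a minimal reproduction of what I'm seeing, assuming an FTS4 table with the unicode61 tokenizer (the table and column names are just placeholders):

CREATE VIRTUAL TABLE docs USING fts4(body, tokenize=unicode61);
INSERT INTO docs(body) VALUES ('I don''t know');
-- Returns the row above: unicode61 treats the apostrophe as a separator,
-- so "don't" is indexed as the tokens "don" and "t".
SELECT * FROM docs WHERE docs MATCH 't';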
Does this mean FTS is indexing every "I" and "a" in English sentences, as well as every single-digit occurrence, or is there some higher-level exclusion heuristic? Is there any way to configure the tokenizer to ignore tokens shorter than 2 characters?

-Pol

Pol-Online
info at pol-online.net