Hi,

We're using Ferret in a slightly unorthodox way: we're indexing a large list (more than 100,000 entries) of place names from all around the world. Mostly we're quite happy with it, and we've been able to graft on the particular functionality we need with just a little tweaking.
There's one strange problem, though. We have a place in Cyprus called "Gazimağusa" (the "ğ", stored as the bytes \304\237, is a multibyte character in UTF-8), and it matches a search for "usa". We'd rather it didn't match.

I don't know much about Ferret, or about this sort of indexing in general, but is this because Ferret treats "ğ" as a word break and splits the name into two words, "Gazima" and "usa"? If so, is there a way you'd recommend to get around this, keeping in mind that our names come in romanized forms of many different languages?

Thanks in advance,
Francis

_______________________________________________
Ferret-talk mailing list
http://rubyforge.org/mailman/listinfo/ferret-talk
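To illustrate the suspected word-break behavior, here is a minimal sketch in Python. It is not Ferret's analyzer code. It compares a hypothetical ASCII-only tokenizer, where any non-ASCII character acts as a separator, with a Unicode-aware one. The names `ASCII_WORD`, `UNICODE_WORD`, `tokenize`, and `matches` are all hypothetical, chosen for illustration only.

```python
import re

# Hypothetical ASCII-only tokenizer: a "word" is a run of ASCII letters/digits,
# so any non-ASCII character (such as "ğ") acts as a word break.
ASCII_WORD = re.compile(r"[A-Za-z0-9]+")

# Unicode-aware tokenizer: in Python 3, \w on a str matches Unicode letters,
# so "ğ" stays inside the word.
UNICODE_WORD = re.compile(r"\w+")


def tokenize(text, pattern):
    """Lowercase the text and return the tokens matched by pattern."""
    return [m.group(0) for m in pattern.finditer(text.lower())]


def matches(query, text, pattern):
    """True if the query term appears as a whole token in the text."""
    return query.lower() in tokenize(text, pattern)


# "Gazimağusa"; U+011F ("ğ") is encoded in UTF-8 as the bytes \304\237.
name = "Gazima\u011fusa"

print("\u011f".encode("utf-8"))            # b'\xc4\x9f'
print(tokenize(name, ASCII_WORD))          # ['gazima', 'usa']
print(tokenize(name, UNICODE_WORD))        # ['gazimağusa']
print(matches("usa", name, ASCII_WORD))    # True
print(matches("usa", name, UNICODE_WORD))  # False
```

Under this assumption, "usa" matches only because the ASCII-only tokenizer emits it as a separate token. Whatever remedy is chosen, the same letter-aware analysis would need to be applied both at index time and at query time so that the two agree on what counts as a word.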

