Alexander,

Search stores all fields/values as UTF-8. As long as your data reaches Search encoded as UTF-8, things should work.
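A quick way to check that end to end is a small round-trip script from Node, since that is already in the stack. This is a minimal sketch, not a confirmed recipe: the host/port, the search-enabled "notes" bucket, and the Solr-compatible /solr/<bucket>/select query endpoint are all assumptions to adjust for your setup.

    // Minimal UTF-8 round-trip check (sketch; adjust host, port, bucket).
    var http = require('http');

    var doc = JSON.stringify({ value: 'héllo wörld 你好' });

    // 1. Store a document containing non-ASCII text; the search hook
    //    indexes it on write. Declare UTF-8 explicitly in Content-Type.
    var put = http.request({
      method: 'PUT',
      host: '127.0.0.1',
      port: 8098,
      path: '/riak/notes/utf8-test',
      headers: {
        'Content-Type': 'application/json; charset=utf-8',
        'Content-Length': Buffer.byteLength(doc, 'utf8')
      }
    }, function () {
      // 2. Query the same term back through the Solr-style endpoint and
      //    confirm the non-ASCII text survived the whole round trip.
      http.get({
        host: '127.0.0.1',
        port: 8098,
        path: '/solr/notes/select?wt=json&q=' +
              encodeURIComponent('value:héllo')
      }, function (res) {
        var body = '';
        res.setEncoding('utf8');
        res.on('data', function (chunk) { body += chunk; });
        res.on('end', function () { console.log(body); });
      });
    });
    put.end(doc, 'utf8');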
However, if your data contains different languages, that is only part of the problem. The other part is that analyzers need to be aware of language. For example, the definition of a "word" in English is different from Chinese. All the analyzers in Search analyze based on ASCII, e.g. a word boundary is a space (0x20); the snippet at the end of this message illustrates what that means in practice. Now, the Search analyzers may not be aware of other languages, but they treat both indexes and queries the same way, so even if it's wrong it's consistently wrong (I hope that makes sense).

We have some tests that check different character sets on top of UTF-8, but to be extra sure you should run some tests yourself to verify the entire stack plays well together.

-Ryan

On Thu, Feb 2, 2012 at 8:09 AM, Alexander Sicular <[email protected]> wrote:

> Hello All,
>
> Are there limitations as to character sets or other special characters
> (perhaps a guide or docs) that I should be actively filtering out of user
> generated searches that will be passed to Riak Search?
>
> Yes, standard filtering rules apply, but are there any specific "gotchas"
> that people have come across when implementing Riak Search?
>
> A rough sketch of my flow:
> Browser > nodejs > riak search > mapreduce
>
> Tia,
> Alexander
>
> @siculars on twitter
> http://siculars.posterous.com
>
> Sent from my iRotaryPhone
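To illustrate the whitespace-only word boundary described above (the strings here are made up, purely for illustration): splitting on whitespace separates English text into terms, but leaves an unsegmented Chinese sentence as one opaque token.

    'riak search rocks'.split(/\s+/);  // => [ 'riak', 'search', 'rocks' ]
    '我喜欢全文搜索'.split(/\s+/);       // => [ '我喜欢全文搜索' ] (one token)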
