One approach would be to "normalize" all the text and search against that.
That is, basically convert all non-ASCII characters to their plain ASCII equivalents. I've had to do this for Solr searches for exactly the reasons you've outlined: treat "ñ" as "n". Ditto for "ü" -> "u", "é" -> "e", etc.

This is easily done in Solr via the included ASCIIFoldingFilterFactory:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory

You could look at its source to see how it does the conversion and implement the same mapping yourself.

/Cody

On Oct 1, 2011, at 7:09 PM, planas wrote:

> On Sun, 2011-10-02 at 01:25 +0200, Reuven M. Lerner wrote:
>> Hi, everyone. I'm working on a project on PostgreSQL 9.0 (soon to be
>> upgraded to 9.1, given that we haven't yet launched). The project will
>> involve numerous text fields containing English, Spanish, and Portuguese.
>> Some of those text fields will be searchable by the user. That's easy
>> enough to do; for our purposes, I was planning to use some combination of
>> LIKE searches; the database is small enough that this doesn't take very much
>> time, and we don't expect the number of searchable records (or columns
>> within those records) to be all that large.
>>
>> The thing is, the people running the site want searches to work on what I'm
>> calling (for lack of a better term) "bare" letters. That is, if the user
>> searches for "n", then the search should also match Spanish words containing
>> "ñ". I'm told by Spanish-speaking members of the team that this is how they
>> would expect searches to work. However, when I just did a quick test using
>> a UTF-8 encoded 9.0 database, I found that PostgreSQL didn't see the two
>> characters as identical. (I must say, this is the behavior that I would
>> have expected, had the Spanish-speaking team member not said anything on the
>> subject.)
>>
>> So my question is whether I can somehow wrangle PostgreSQL into thinking
>> that "n" and "ñ" are the same character for search purposes, or if I need to
>> do something else -- use regexps, keep a "naked," searchable version of each
>> column alongside the native one, or something else entirely -- to get this
>> to work.
>>
> Could you parse the search string for the non-English characters and convert
> them to the appropriate English character? My skills are not that good or I
> would offer more details.
>> Any ideas?
>>
>> Thanks,
>>
>> Reuven
>>
>>
>> --
>> Reuven M. Lerner -- Web development, consulting, and training
>> Mobile: +972-54-496-8405 * US phone: 847-230-9795
>> Skype/AIM: reuvenlerner
>
>
> --
> Jay Lozier
> jsloz...@gmail.com
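As a rough illustration of the "normalize and search against that" idea, here is a minimal Python sketch. It is an assumption-laden stand-in, not a port of Solr's ASCIIFoldingFilter: it uses the standard library's `unicodedata` to decompose text (NFD) and drop the combining accent marks, so characters with no decomposition (e.g. "ß", "æ") pass through unchanged. The helper names `fold_to_ascii` and `bare_match` are hypothetical. The same folding could populate the "naked" searchable copy of each column that Reuven mentions, with the user's search term folded the same way before matching.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Strip accents: decompose to NFD, then drop combining marks.

    Hypothetical helper; a Unicode-decomposition shortcut, not Solr's
    ASCIIFoldingFilter mapping table. Characters without a decomposition
    (e.g. "ß", "æ") are returned unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns a nonzero class for accent marks such as U+0303 (~)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def bare_match(term: str, text: str) -> bool:
    """Hypothetical helper: case- and accent-insensitive substring test,
    roughly a case-insensitive LIKE '%term%' on folded copies of both strings."""
    return fold_to_ascii(term).casefold() in fold_to_ascii(text).casefold()


if __name__ == "__main__":
    print(fold_to_ascii("señor, pingüino, café"))  # senor, pinguino, cafe
    print(bare_match("Nino", "El niño juega"))      # True
```

Folding both sides is what makes "n" match "ñ"; a search term typed with the accent ("niño") still matches, since it folds to the same "nino".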