According to Sunny Fortune:
> My site contains documents in both English as well as
> Spanish. I would like my search to accept both english
> and spanish words and return the appropriate
> documents.
>
> Is there a way of indexing my site by using a single
> configuration file having locale set to "C" as well as
> "es_MX" and setting the dictionary attributes to the
> correct paths?

English is pretty easy to index with another language, because any
ISO-8859-* based locale will include the whole 7-bit ASCII set. So, with
a locale of es_MX, htdig won't have any problems indexing English and
Spanish words (assuming the locale actually works on your system).

In general, documents of different languages can be indexed together as
long as they share the same encoding, and you have a locale that supports
that full encoding. The problem is that the locale definitions on some
systems limit the recognized accents to a subset of the full encoding.
That's not a problem in this case because the English alphabet is a
subset of the Spanish one. If you can index Spanish words, you can index
English ones too.

The dictionaries are a bit more complicated, but less critical. They're
only used for fuzzy matching, using the "endings" algorithm. If you want
to support the endings algorithm in both languages, you'd need to set up
two different configuration files and build the dictionaries separately
for the two languages, then allow user selection of one or the other
configuration file via the "config" input parameter in the search form.
(See FAQ 4.10 and 4.2 at http://www.htdig.org/FAQ.html)

It's theoretically possible, but may be a bit tedious, to actually merge
the word and affix files for two languages to make a combined endings
database for the two. The difficulty is in resolving any conflicts in
affix definitions by relettering some of the affix codes. If you only
need to support "endings" in one language, or not at all, then this need
not be a concern either.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
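
As a rough illustration of the encoding point in the reply above, here is
a small Python sketch. It only checks whether words fit in a given
single-byte encoding and whether plain ASCII words keep the same bytes
there. The function names and sample words are made up for this example,
and Python's codecs are just a stand-in: htdig itself relies on the
system's C locale, which this does not test.

```python
# Sketch only: checks character coverage of an encoding, not htdig's
# actual word parsing or the behaviour of any system locale.

def fits_encoding(word: str, encoding: str = "iso-8859-1") -> bool:
    """Return True if every character of `word` can be stored in `encoding`."""
    try:
        word.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def ascii_bytes_unchanged(word: str, encoding: str = "iso-8859-1") -> bool:
    """Return True if a pure-ASCII word has identical bytes in `encoding`."""
    return word.encode("ascii") == word.encode(encoding)


# Hypothetical sample words, chosen only for illustration.
english_words = ["search", "document", "index"]
spanish_words = ["búsqueda", "español", "índice"]

if __name__ == "__main__":
    for word in english_words:
        # English words use only 7-bit ASCII, so their bytes do not change.
        print(word, ascii_bytes_unchanged(word))
    for word in spanish_words:
        # Accented Spanish letters all have a slot in ISO-8859-1.
        print(word, fits_encoding(word))
```

Both word lists fit in the one encoding, which is why a single es_MX
index can hold them together. A word from a language outside that
encoding (Greek, for instance) would fail the same check, which matches
the "same encoding" condition described above.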

