Given the current state of the auth modules today, this is an auth_LDAP issue only. LDAP requires UTF-8 not only for user names but any other text string that is passed to it (with the exception of the password, but that is another issue). In order for any of the other auth modules to accept UTF-8 they would have to be modified. For example, I believe that mod_auth simply uses a strcmp or something similar to compare the user name and password against the contents of the htpasswd file. Any comparisons would have to be modified to be compatible with UTF-8 strings. In my opinion this would require APR to adopt a UTF-8 string manipulation set of API's. The next issue is the fact that there is no good way that I can see for a module or a filter to decrypt the header auth string, convert it to UTF-8 and then re-encrypt it so that it is compatible with any auth module. The decryption needs to be done at the point when the auth module makes the comparison. Many of the auth modules use the function ap_get_basic_auth_pw() to decrypt a base64 auth header string. There is the possibility of enabling the UTF-8 conversion functionality in this function, but ap_get_basic_auth_pw() would have to get much smarter about charsets and the parameters would have to change. Also, should the password be converted to UTF-8? I know that a UTF-8 password may not always work with the Novell LDAP library. It just wants the password as a bag-o-bits just as it was received. If a module or filter were to handle the conversion, as I see it, the results would need to be stored in either another field in the request_rec or as a name/value pair in one of the tables. The original auth header should remain as is. Both of these solutions would require fixing all auth modules before they could take advantage of UTF-8. I don't have a problem with this solution and would be willing to create a mod_request_charset module, but the auth modules would have to be adapted to work with UTF-8 strings as I mentioned before. In reality, this is not a rock solid solution to the charset problem for auth_LDAP or for any other auth module. It is simply an attempt to make auth_LDAP work in many situations where it would have been guaranteed to fail. The real issue is the fact that the HTTP header does not carry enough information to enable a real charset conversion. AFAIK there is nothing in the header that tells the server what charset was used to create the request. The only bit of information is the accept-language header which is meant to indicate what language the browser will accept. This solution is just making a best guess effort based on the information it has. I am all for a global solution to this problem since a global solution would probably take care of other issues such as charset encoded URLs. In the little bit of work I have done so far with regards to charsets, I just haven't been able to come up with a global solution. I seem to be running into more issues than solutions.
Also, I decided to put the charset.conv file in experimental until auth_ldap is promoted to a standard module. If docs/conf is a better place to put it at this time, I am all for it. Brad Brad Nicholes Senior Software Engineer Novell, Inc., the leading provider of Net business solutions http://www.novell.com >>> [EMAIL PROTECTED] Monday, December 30, 2002 11:51:28 AM >>> The more I look at this patch, the more concerned I become. Here's an easy question; for starters... Isn't this the same problem that users of all auth schemas encounter? Even supposing the user creates their own user identifier and it permits opaque bytes, that user may begin on a browser configured to send all requests as utf-8, and later open a browser configured with their local code page. I don't see how this is an LDAP issue. Would you consider moving this new code to a mod_request_charset module? Then all auth user may have normalized usernames. We simply document that turning on mod_request_charset affects all auth providers, and therefore all auth databases should be created with utf-8 usernames. Even Basic auth passwords suffer the same. So do digest hashes, but there is nothing we can ever do about that. Once this code is in its own module, we can begin to expand on it. The last problem I have with the patch is a nit. Can we please move the charset.conv file over to the docs/conf directory where it belongs? That will help get fixes committed to that file by our very multilingual docs team. Bill At 06:13 PM 12/13/2002, you wrote: >bnicholes 2002/12/13 16:13:15 > > Modified: modules/experimental NWGNUauthldap mod_auth_ldap.c > Added: modules/experimental charset.conv > Log: > Added character set support to mod_auth_LDAP to allow it to convert > extended characters used in the user ID to UTF-8 before authenticating > against the LDAP directory. The new directive AuthLDAPCharsetConfig is > used to specify the config file that contains the character set conversion table. > > Revision Changes Path > 1.7 +1 -0 httpd-2.0/modules/experimental/NWGNUauthldap > > Index: NWGNUauthldap > =================================================================== > RCS file: /home/cvs/httpd-2.0/modules/experimental/NWGNUauthldap,v > retrieving revision 1.6 > retrieving revision 1.7 > diff -u -r1.6 -r1.7 > --- NWGNUauthldap 16 Oct 2002 23:52:26 -0000 1.6 > +++ NWGNUauthldap 14 Dec 2002 00:13:15 -0000 1.7 > @@ -246,6 +246,7 @@ > # > install :: nlms FORCE > copy $(OBJDIR)\*.nlm $(INSTALL)\Apache2\modules\*.* > + copy charset.conv $(INSTALL)\Apache2\conf\*.* > > # > # Any specialized rules here > > > > 1.10 +160 -2 httpd-2.0/modules/experimental/mod_auth_ldap.c > > Index: mod_auth_ldap.c > =================================================================== > RCS file: /home/cvs/httpd-2.0/modules/experimental/mod_auth_ldap.c,v > retrieving revision 1.9 > retrieving revision 1.10 > diff -u -r1.9 -r1.10 > --- mod_auth_ldap.c 11 Dec 2002 06:11:11 -0000 1.9 > +++ mod_auth_ldap.c 14 Dec 2002 00:13:15 -0000 1.10 > @@ -62,6 +62,7 @@ > > #include <apr_ldap.h> > #include <apr_strings.h> > +#include <apr_xlate.h> > > #include "ap_config.h" > #if APR_HAVE_UNISTD_H > @@ -116,7 +117,7 @@ > it's the exact string passed by the HTTP client */ > > int netscapessl; /* True if Netscape SSL is enabled */ > - int starttls; /* True if StartTLS is enabled */ > + int starttls; /* True if StartTLS is enabled */ > } mod_auth_ldap_config_t; > > typedef struct mod_auth_ldap_request_t { > @@ -143,6 +144,59 @@ > > /* ---------------------------------------- */ > > +static apr_hash_t *charset_conversions = NULL; > +static char *to_charset = NULL; /* UTF-8 identifier derived from the charset.conv file */ > + > +/* Derive a code page ID give a language name or ID */ > +static char* derive_codepage_from_lang (apr_pool_t *p, char *language) > +{ > + int lang_len; > + int check_short = 0; > + char *charset; > + > + if (!language) // our default codepage > + return apr_pstrdup(p, "ISO-8859-1"); > + else > + lang_len = strlen(language); > + > + charset = (char*) apr_hash_get(charset_conversions, language, APR_HASH_KEY_STRING); > + > + if (!charset) { > + language[2] = '\0'; > + charset = (char*) apr_hash_get(charset_conversions, language, APR_HASH_KEY_STRING); > + } > + > + if (charset) { > + charset = apr_pstrdup(p, charset); > + } > + > + return charset; > +} > + > +static apr_xlate_t* get_conv_set (request_rec *r) > +{ > + char *lang_line = (char*)apr_table_get(r->headers_in, "accept-language"); > + char *lang; > + apr_xlate_t *convset; > + > + if (lang_line) { > + lang_line = apr_pstrdup(r->pool, lang_line); > + for (lang = lang_line;*lang;lang++) { > + if ((*lang == ',') || (*lang == ';')) { > + *lang = '\0'; > + break; > + } > + } > + lang = derive_codepage_from_lang(r->pool, lang_line); > + > + if (lang && (apr_xlate_open(&convset, to_charset, lang, r->pool) == APR_SUCCESS)) { > + return convset; > + } > + } > + > + return NULL; > +} > + > > /* > * Build the search filter, or at least as much of the search filter that > @@ -168,6 +222,33 @@ > mod_auth_ldap_config_t *sec) > { > char *p, *q, *filtbuf_end; > + char *user; > + apr_xlate_t *convset = NULL; > + apr_size_t inbytes; > + apr_size_t outbytes; > + char *outbuf; > + > + if (r->user != NULL) { > + user = apr_pstrdup (r->pool, r->user); > + } > + else > + return; > + > + if (charset_conversions) { > + convset = get_conv_set(r); > + } > + > + if (convset) { > + inbytes = strlen(user); > + outbytes = (inbytes+1)*3; > + outbuf = apr_pcalloc(r->pool, outbytes); > + > + /* Convert the user name to UTF-8. This is only valid for LDAP v3 */ > + if (apr_xlate_conv_buffer(convset, user, &inbytes, outbuf, &outbytes) == APR_SUCCESS) { > + user = apr_pstrdup(r->pool, outbuf); > + } > + } > + > /* > * Create the first part of the filter, which consists of the > * config-supplied portions. > @@ -179,7 +260,7 @@ > * LDAP filter metachars are escaped. > */ > filtbuf_end = filtbuf + FILTER_LENGTH - 1; > - for (p = r->user, q=filtbuf + strlen(filtbuf); > + for (p = user, q=filtbuf + strlen(filtbuf); > *p && q < filtbuf_end; *q++ = *p++) { > if (strchr("*()\\", *p) != NULL) { > *q++ = '\\'; > @@ -270,6 +351,13 @@ > return result; > } > > + if (r->user == NULL) { > + ap_log_rerror(APLOG_MARK, APLOG_DEBUG|APLOG_NOERRNO, 0, r, > + "[%d] auth_ldap authenticate: no user specified", getpid()); > + util_ldap_connection_close(ldc); > + return sec->auth_authoritative? HTTP_UNAUTHORIZED : DECLINED; > + } > + > /* build the username filter */ > mod_auth_ldap_build_filter(filtbuf, r, sec); > > @@ -796,6 +884,13 @@ > return NULL; > } > > +static const char *set_charset_config(cmd_parms *cmd, void *config, const char *arg) > +{ > + ap_set_module_config(cmd->server->module_config, &auth_ldap_module, > + (void *)arg); > + return NULL; > +} > + > command_rec mod_auth_ldap_cmds[] = { > AP_INIT_TAKE1("AuthLDAPURL", mod_auth_ldap_parse_url, NULL, OR_AUTHCFG, > "URL to define LDAP connection. This should be an RFC 2255 complaint\n" > @@ -870,6 +965,10 @@ > (void *)APR_OFFSETOF(mod_auth_ldap_config_t, frontpage_hack), OR_AUTHCFG, > "Set to 'on' to support Microsoft FrontPage"), > > + AP_INIT_TAKE1("AuthLDAPCharsetConfig", set_charset_config, NULL, RSRC_CONF, > + "Character set conversion configuration file. If omitted, character set" > + "conversion is disabled."), > + > #ifdef APU_HAS_LDAP_STARTTLS > AP_INIT_FLAG("AuthLDAPStartTLS", ap_set_flag_slot, > (void *)APR_OFFSETOF(mod_auth_ldap_config_t, starttls), OR_AUTHCFG, > @@ -879,8 +978,67 @@ > {NULL} > }; > > +static int auth_ldap_post_config(apr_pool_t *p, apr_pool_t *plog, apr_pool_t *ptemp, server_rec *s) > +{ > + ap_configfile_t *f; > + char l[MAX_STRING_LEN]; > + const char *charset_confname = ap_get_module_config(s->module_config, > + &auth_ldap_module); > + apr_status_t status; > + > + if (!charset_confname) { > + return OK; > + } > + > + charset_confname = ap_server_root_relative(p, charset_confname); > + if (!charset_confname) { > + ap_log_error(APLOG_MARK, APLOG_ERR, APR_EBADPATH, s, > + "Invalid charset conversion config path %s", > + (const char *)ap_get_module_config(s->module_config, > + &auth_ldap_module)); > + return HTTP_INTERNAL_SERVER_ERROR; > + } > + if ((status = ap_pcfg_openfile(&f, ptemp, charset_confname)) > + != APR_SUCCESS) { > + ap_log_error(APLOG_MARK, APLOG_ERR, status, s, > + "could not open charset conversion config file %s.", > + charset_confname); > + return HTTP_INTERNAL_SERVER_ERROR; > + } > + > + charset_conversions = apr_hash_make(p); > + > + while (!(ap_cfg_getline(l, MAX_STRING_LEN, f))) { > + const char *ll = l; > + char *lang; > + > + if (l[0] == '#') { > + continue; > + } > + lang = ap_getword_conf(p, &ll); > + ap_str_tolower(lang); > + > + if (ll[0]) { > + char *charset = ap_getword_conf(p, &ll); > + apr_hash_set(charset_conversions, lang, APR_HASH_KEY_STRING, charset); > + } > + } > + ap_cfg_closefile(f); > + > + to_charset = derive_codepage_from_lang (p, "utf-8"); > + if (to_charset == NULL) { > + ap_log_error(APLOG_MARK, APLOG_ERR, status, s, > + "could not find the UTF-8 charset in the file %s.", > + charset_confname); > + return HTTP_INTERNAL_SERVER_ERROR; > + } > + > + return OK; > +} > + > static void mod_auth_ldap_register_hooks(apr_pool_t *p) > { > + ap_hook_post_config(auth_ldap_post_config,NULL,NULL,APR_HOOK_MIDDLE); > ap_hook_check_user_id(mod_auth_ldap_check_user_id, NULL, NULL, APR_HOOK_MIDDLE); > ap_hook_auth_checker(mod_auth_ldap_auth_checker, NULL, NULL, APR_HOOK_MIDDLE); > } > > > > 1.1 httpd-2.0/modules/experimental/charset.conv > > Index: charset.conv > =================================================================== > > # Lang-abbv Charset Language > #--------------------------------- > en ISO-8859-1 English > UTF-8 utf8 UTF-8 > Unicode ucs Unicode > th Cp874 Thai > ja SJIS Japanese > ko Cp949 Korean > zh Cp950 Chinese-Traditional > zh-cn GB2312 Chinese-Simplified > zh-tw Cp950 Chinese > cs ISO-8859-2 Czech > hu ISO-8859-2 Hungarian > hr ISO-8859-2 Croation > pl ISO-8859-2 Polish > ro ISO-8859-2 Romanian > sr ISO-8859-2 Serbian > sk ISO-8859-2 Slovak > sl ISO-8859-2 Slovenian > sq ISO-8859-2 Albanian > bg ISO-8859-5 Bulgarian > be ISO-8859-5 Byelorussian > mk ISO-8859-5 Macedonian > ru ISO-8859-5 Russian > uk ISO-8859-5 Ukrainian > ca ISO-8859-1 Catalan > de ISO-8859-1 German > da ISO-8859-1 Danish > fi ISO-8859-1 Finnish > fr ISO-8859-1 French > es ISO-8859-1 Spanish > is ISO-8859-1 Icelandic > it ISO-8859-1 Italian > nl ISO-8859-1 Dutch > no ISO-8859-1 Norwegian > pt ISO-8859-1 Portuguese > sv ISO-8859-1 Swedish > af ISO-8859-1 Afrikaans > eu ISO-8859-1 Basque > fo ISO-8859-1 Faroese > gl ISO-8859-1 Galician > ga ISO-8859-1 Irish > gd ISO-8859-1 Scottish > mt ISO-8859-3 Maltese > eo ISO-8859-3 Esperanto > el ISO-8859-7 Greek > tr ISO-8859-9 Turkish > he ISO-8859-8 Hebrew > iw ISO-8859-8 Hebrew > ar ISO-8859-6 Arabic > et ISO-8859-1 Estonian > lv ISO-8859-2 Latvian > lt ISO-8859-2 Lithuanian > > >