I'm trying to get <URL:http://sourceforge.net/tracker/?func=detail&aid=2830829&group_id=51719&atid=464312> fixed and I'm seeking some feedback on what I have.
What happens: a https://example.com/foo/bar/doh/leaf.html gets fetched after the user supplied http authentication. That pages refers to several stylesheets which are all located under https://example.com/ but not in the foo/bar/doh directory. Currently all those fetches for those stylesheets are resulting in a 401 as in fetch_curl.c(fetch_curl_set_options) the urldb_get_auth_details() does not return authentication which the user supplied for leaf.html. The 401 results in the FETCH_AUTH case in fetchcache.c(fetchcache_callback) and that's the end for the stylesheet parsing as we'll tumble into html_convert_css_callback()'s case CONTENT_MSG_AUTH. Aside, what's the CONTENT_MSG_AUTH case in css.c(nscss_import) ? The urldb_get_auth_details() fails because urldb_set_auth_details() gets done for the path_data of https://example.com/foo/bar/doh/ (the leafname of the URL gets stripped) so all auth retrieving fails for shorter paths. I've been reading http://tools.ietf.org/html/rfc2617 and this learns me that the http authentication details (username & password) are associated with a protection space and the latter is uniquely defined by a canonical root URL and a realm given in a 401 response for a certain URL belonging to that space. All paths below that URL belong to the same protection space. It could, but does not need so, that the paths higher than that URL do also belong to the same protection space but that's something you can not know until you try to fetch them without authentication and get a 401 with the same realm (assuming same canonical root here). I've changed the urldb data structure a bit to better reflect the protection space concept (see patch below) : in the struct host_part we now have a linked list of struct prot_space_data structures, i.e. one per protection space we've seen for that host (apart from its scheme and port because that info is found in each path_data). That structure also contains the identifying realm and associated authentication data. Each struct path_data can have a pointer to one of prot_space_data structures and this only when we know for sure it belongs to that pointed protection space. urldb_set_auth_details() will create a new prot_space_data structure when a new realm is seen for a canonical host (= being host_data + scheme and port number of that lowest path_data of given URL). The lower path_data will point to that new prot_space_data or an existing prot_space_data (in the latter case its auth data will be overwritten). urldb_get_auth_details will try to find prot_space_data in its lowest path_data of given URL and when not found iterate upwards until it finds one or hit struct host_part which means we are not finding a prot_space_data at all in the path_data travested. In that case I believe we should let urldb_get_auth_details() fail even when the host_part has one or more prot_space_data structures associated with and await a 401 which gives us the right realm so that we can make the right choice for selecting prot_space_data or if it is a new realm, offer the user the oppertunity to enter an username and password. Note it might very well be that we don't get a 401 at all as such an URL might not need authentication. I've tried to implement that but failed miserable, basically I wanted to do a browser_window_reload() or something similar in the CONTENT_MSG_AUTH case of browser.c(browser_window_callback) when urldb_get_auth_details(c->url, data.auth_realm) returns something non-NULL (btw, which would at that point have filled in the prot_space_data of lowest path_data so that at fetch_curl_set_options() time, urldb_get_auth_details(f->url, NULL) returns the same data even with realm param NULL). But even if that would somehow work, it wouldn't for loaded stylesheets as something similar should happen in the CONTENT_MSG_AUTH case in html_convert_css_callback(), right ? I see other CONTENT_MSG_AUTH cases (like css.c(nscss_import) so probably even more needs to be done. So I went for the quick hack in urldb_get_auth_details() which picks the first prot_space_data for given host if available for the case where non of the path_data structs of given URL has a ptr to prot_space_data. It works for my sf.net case mentioned above but it will result in authentication data given for URLs which do not require them, or can give wrong authentication (because of wrongly chosen prot_space_data). Now the questions: 1. Is this hack acceptable for the time being ? 2. Or should we shoot for, what I think is, the right approach, i.e. when no prot_space_data ptr can be found in path_date structs, await a 401 and with the realm given at that point take the right prot_space_data or offer the uses the possibility to enter a new username/password for that realm. But that needs to work for all content fetched, not only html (what I think is happening). TBH, this is probably above my head to implement for the time I can spend on. Aside: path_data::port seems to be 0 for URLs where the port is not specified, is that intended ? Aside: the riscos/401login.c change allows to keep the username and password in the 401login dialogue after a 2nd 401 (for the same realm and canonical root). If this is acceptable, other front-end can do a similar thing. Patch (long lines need to be trimmed before ci): --8<-- Index: riscos/401login.c =================================================================== --- riscos/401login.c (revision 8988) +++ riscos/401login.c (working copy) @@ -98,6 +98,7 @@ { struct session_401 *session; wimp_w w; + const char *auth; session = calloc(1, sizeof(struct session_401)); if (!session) { @@ -111,10 +112,28 @@ warn_user("NoMemory", 0); return; } - session->uname[0] = '\0'; - session->pwd[0] = '\0'; + if (realm == NULL) + realm = "Secure Area"; + auth = urldb_get_auth_details(session->url, realm); + if (auth == NULL) { + session->uname[0] = '\0'; + session->pwd[0] = '\0'; + } else { + const char *pwd; + size_t pwd_len; + + pwd = strchr(auth, ':'); + assert(pwd && pwd < auth + sizeof(session->uname)); + memcpy(session->uname, auth, pwd - auth); + session->uname[pwd - auth] = '\0'; + ++pwd; + pwd_len = strlen(pwd); + assert(pwd_len < sizeof(session->pwd)); + memcpy(session->pwd, pwd, pwd_len); + session->pwd[pwd_len] = '\0'; + } session->host = strdup(host); - session->realm = strdup(realm ? realm : "Secure Area"); + session->realm = strdup(realm); session->bwin = bw; if ((!session->host) || (!session->realm)) { free(session->host); Index: content/fetchers/fetch_curl.c =================================================================== --- content/fetchers/fetch_curl.c (revision 8988) +++ content/fetchers/fetch_curl.c (working copy) @@ -560,7 +560,7 @@ SETOPT(CURLOPT_COOKIE, NULL); } - if ((auth = urldb_get_auth_details(f->url)) != NULL) { + if ((auth = urldb_get_auth_details(f->url, NULL)) != NULL) { SETOPT(CURLOPT_HTTPAUTH, CURLAUTH_ANY); SETOPT(CURLOPT_USERPWD, auth); } else { Index: content/urldb.c =================================================================== --- content/urldb.c (revision 8988) +++ content/urldb.c (working copy) @@ -132,10 +132,17 @@ struct cookie_internal_data *next; /**< Next in list */ }; -struct auth_data { +/* A protection space is defined by a canonical hostname and a realm. + * The canonical hostname is here defined by struct host_part and all + * its parents together with struct prot_space_data::scheme and + * prot_space_data::port. */ +struct prot_space_data { + char *scheme; /**< URL scheme of canonical hostname of this protection space. */ + unsigned int port; /**< Port number of canonical hostname of this protection space. */ char *realm; /**< Protection realm */ char *auth; /**< Authentication details in form * username:password */ + struct prot_space_data *next; /**< Next sibling */ }; struct cache_internal_data { @@ -161,7 +168,7 @@ struct bitmap *thumb; /**< Thumbnail image of resource */ struct url_internal_data urld; /**< URL data for resource */ struct cache_internal_data cache; /**< Cache data for resource */ - struct auth_data auth; /**< Authentication data for resource */ + const struct prot_space_data *prot_space; /**< Protection space to which this resource belongs too. Can be NULL. No ownership. */ struct cookie_internal_data *cookies; /**< Cookies associated with resource */ struct cookie_internal_data *cookies_end; /**< Last cookie in list */ @@ -183,6 +190,8 @@ char *part; /**< Part of host string */ + struct prot_space_data *prot_space; /**< Canonical hostname + realm defines a protection space. Only to be accessed when struct host_part::parent == NULL. */ + struct host_part *next; /**< Next sibling */ struct host_part *prev; /**< Previous sibling */ struct host_part *parent; /**< Parent host part */ @@ -203,6 +212,7 @@ static void urldb_destroy_path_tree(struct path_data *root); static void urldb_destroy_path_node_content(struct path_data *node); static void urldb_destroy_cookie(struct cookie_internal_data *c); +static void urldb_destroy_prot_space(struct prot_space_data *space); static void urldb_destroy_search_tree(struct search_node *root); /* Saving */ @@ -925,11 +935,13 @@ * Look up authentication details in database * * \param url Absolute URL to search for + * \param realm When non-NULL, it is realm which can be used to determine + * the protection space when that's not been done before for given URL. * \return Pointer to authentication details, or NULL if not found */ -const char *urldb_get_auth_details(const char *url) +const char *urldb_get_auth_details(const char *url, const char *realm) { - struct path_data *p, *q = NULL; + struct path_data *p, *p_cur, *p_top; assert(url); @@ -940,29 +952,48 @@ if (!p) return NULL; - /* Check for any auth details attached to this node */ - if (p && p->auth.realm && p->auth.auth) - return p->auth.auth; + /* Check for any auth details attached to the path_data node or any of + * its parents. */ + for (p_cur = p; p_cur != NULL; p_top = p_cur, p_cur = p_cur->parent) { + if (p_cur->prot_space) { + return p_cur->prot_space->auth; + } + } - /* Now consider ancestors */ - for (; p; p = p->parent) { - /* The parent path entry is stored hung off the - * parent entry with an empty (not NULL) segment string. - * We look for this here. - */ - for (q = p->children; q; q = q->next) { - if (q->segment[0] == '\0') - break; + if (realm != NULL) { + const struct host_part *h = (const struct host_part *)p_top; + const struct prot_space_data *space; + + /* Search for a possible matching protection space. */ + for (space = h->prot_space; space != NULL; space = space->next) { + if (!strcmp(space->realm, realm) + && !strcmp(space->scheme, p->scheme) + && space->port == p->port) { + p->prot_space = space; + return p->prot_space->auth; + } } - - if (q && q->auth.realm && q->auth.auth) - break; } + /* FIXME: HACK: the following is a hack. We're not supposed to return + * any authentication data at this point as we do not know if given URL + * belongs to a protection space and if so, to which one. We should + * wait for a 401 and use its realm to determine that (and if there is + * no 401 for that URL, it doesn't belong to a protection space at all). + * But it looks that not all content types with a 401 can be refetched + * easily. + * This hack possibly leaks authentication data. + */ +#if 1 + else { + const struct host_part *h = (const struct host_part *)p_top; + if (h->prot_space) { + p->prot_space = h->prot_space; + return p->prot_space->auth; + } + } +#endif - if (!q) - return NULL; - - return q->auth.auth; + return NULL; } /** @@ -975,7 +1006,7 @@ bool urldb_get_cert_permissions(const char *url) { struct path_data *p; - struct host_part *h; + const struct host_part *h; assert(url); @@ -985,8 +1016,9 @@ for (; p && p->parent; p = p->parent) /* do nothing */; + assert(p); - h = (struct host_part *)p; + h = (const struct host_part *)p; return h->permit_invalid_certs; } @@ -1001,48 +1033,62 @@ void urldb_set_auth_details(const char *url, const char *realm, const char *auth) { - struct path_data *p; - char *urlt, *t1, *t2; + struct path_data *p, *pi; + struct host_part *h; + struct prot_space_data *space, *space_alloc; + char *realm_alloc, *auth_alloc, *scheme_alloc; assert(url && realm && auth); - urlt = strdup(url); - if (!urlt) - return; - - /* strip leafname from URL */ - t1 = strrchr(urlt, '/'); - if (t1) { - *(t1 + 1) = '\0'; - } - /* add url, in case it's missing */ - urldb_add_url(urlt); + urldb_add_url(url); - p = urldb_find_url(urlt); + p = urldb_find_url(url); - free(urlt); - if (!p) return; - /** \todo search subtree for same realm/auth details - * and remove them (as the lookup routine searches up the tree) */ + /* Search for host_part */ + for (pi = p; pi->parent != NULL; pi = pi->parent) + ; + h = (struct host_part *)pi; - t1 = strdup(realm); - t2 = strdup(auth); + /* Search if given URL belongs to a protection space we already know of. */ + for (space = h->prot_space; space; space = space->next) { + if (!strcmp(space->realm, realm) + && !strcmp(space->scheme, p->scheme) + && space->port == p->port) + break; + } - if (!t1 || !t2) { - free(t1); - free(t2); - return; + if (space != NULL) { + /* Overrule existing auth. */ + free(space->auth); + space->auth = strdup(auth); + } else { + /* Create a new protection space. */ + space = space_alloc = malloc(sizeof(struct prot_space_data)); + scheme_alloc = strdup(p->scheme); + realm_alloc = strdup(realm); + auth_alloc = strdup(auth); + + if (!space_alloc || !scheme_alloc || !realm_alloc || !auth_alloc) { + free(space_alloc); + free(scheme_alloc); + free(realm_alloc); + free(auth_alloc); + return; + } + + space->scheme = scheme_alloc; + space->port = p->port; + space->realm = realm_alloc; + space->auth = auth_alloc; + space->next = h->prot_space; + h->prot_space = space; } - free(p->auth.realm); - free(p->auth.auth); - - p->auth.realm = t1; - p->auth.auth = t2; + p->prot_space = space; } /** @@ -1067,6 +1113,7 @@ for (; p && p->parent; p = p->parent) /* do nothing */; + assert(p); h = (struct host_part *)p; @@ -3878,6 +3925,7 @@ { struct host_part *a, *b; struct path_data *p, *q; + struct prot_space_data *s, *t; /* Destroy children */ for (a = root->children; a; a = b) { @@ -3894,6 +3942,12 @@ /* Root path */ urldb_destroy_path_node_content(&root->paths); + /* Proctection space data */ + for (s = root->prot_space; s; s = t) { + t = s->next; + urldb_destroy_prot_space(s); + } + /* And ourselves */ free(root->part); free(root); @@ -3955,8 +4009,6 @@ bitmap_destroy(node->thumb); free(node->urld.title); - free(node->auth.realm); - free(node->auth.auth); for (a = node->cookies; a; a = b) { b = a->next; @@ -3981,6 +4033,21 @@ } /** + * Destroy protection space data + * + * \param space Protection space to destroy + */ +void urldb_destroy_prot_space(struct prot_space_data *space) +{ + free(space->scheme); + free(space->realm); + free(space->auth); + + free(space); +} + + +/** * Destroy a search tree * * \param root Root node of tree to destroy Index: content/urldb.h =================================================================== --- content/urldb.h (revision 8988) +++ content/urldb.h (working copy) @@ -85,7 +85,7 @@ /* Authentication modification / lookup */ void urldb_set_auth_details(const char *url, const char *realm, const char *auth); -const char *urldb_get_auth_details(const char *url); +const char *urldb_get_auth_details(const char *url, const char *realm); /* SSL certificate permissions */ void urldb_set_cert_permissions(const char *url, bool permit); --8<-- John. -- John Tytgat [email protected]
