http authentication improvement

John Tytgat Sun, 02 Aug 2009 16:33:10 -0700

I'm trying to get
<URL:http://sourceforge.net/tracker/?func=detail&aid=2830829&group_id=51719&atid=464312>
fixed and I'm seeking some feedback on what I have.


What happens: a https://example.com/foo/bar/doh/leaf.html gets fetched
after the user supplied http authentication.  That pages refers to several
stylesheets which are all located under https://example.com/ but not in
the foo/bar/doh directory.  Currently all those fetches for those
stylesheets are resulting in a 401 as in
fetch_curl.c(fetch_curl_set_options) the urldb_get_auth_details() does not
return authentication which the user supplied for leaf.html.  The 401
results in the FETCH_AUTH case in fetchcache.c(fetchcache_callback) and
that's the end for the stylesheet parsing as we'll tumble into
html_convert_css_callback()'s case CONTENT_MSG_AUTH.

Aside, what's the CONTENT_MSG_AUTH case in css.c(nscss_import) ?

The urldb_get_auth_details() fails because urldb_set_auth_details() gets
done for the path_data of https://example.com/foo/bar/doh/ (the leafname
of the URL gets stripped) so all auth retrieving fails for shorter paths.

I've been reading http://tools.ietf.org/html/rfc2617 and this learns me
that the http authentication details (username & password) are associated
with a protection space and the latter is uniquely defined by a canonical
root URL and a realm given in a 401 response for a certain URL belonging
to that space. All paths below that URL belong to the same protection
space.

It could, but does not need so, that the paths higher than that URL do
also belong to the same protection space but that's something you can not
know until you try to fetch them without authentication and get a 401 with
the same realm (assuming same canonical root here).

I've changed the urldb data structure a bit to better reflect the
protection space concept (see patch below) : in the struct host_part we
now have a linked list of struct prot_space_data structures, i.e. one per
protection space we've seen for that host (apart from its scheme and port
because that info is found in each path_data).  That structure also
contains the identifying realm and associated authentication data.  Each
struct path_data can have a pointer to one of prot_space_data structures
and this only when we know for sure it belongs to that pointed protection
space.

urldb_set_auth_details() will create a new prot_space_data structure when
a new realm is seen for a canonical host (= being host_data + scheme and
port number of that lowest path_data of given URL).  The lower path_data
will point to that new prot_space_data or an existing prot_space_data (in
the latter case its auth data will be overwritten).

urldb_get_auth_details will try to find prot_space_data in its lowest
path_data of given URL and when not found iterate upwards until it finds
one or hit struct host_part which means we are not finding a
prot_space_data at all in the path_data travested.

In that case I believe we should let urldb_get_auth_details() fail even
when the host_part has one or more prot_space_data structures associated
with and await a 401 which gives us the right realm so that we can make
the right choice for selecting prot_space_data or if it is a new realm,
offer the user the oppertunity to enter an username and password.  Note
it might very well be that we don't get a 401 at all as such an URL might
not need authentication.

I've tried to implement that but failed miserable, basically I wanted to
do a browser_window_reload() or something similar in the CONTENT_MSG_AUTH
case of browser.c(browser_window_callback) when
urldb_get_auth_details(c->url, data.auth_realm) returns something non-NULL
(btw, which would at that point have filled in the prot_space_data of
lowest path_data so that at fetch_curl_set_options() time,
urldb_get_auth_details(f->url, NULL) returns the same data even with
realm param NULL).

But even if that would somehow work, it wouldn't for loaded stylesheets
as something similar should happen in the CONTENT_MSG_AUTH case in
html_convert_css_callback(), right ? I see other CONTENT_MSG_AUTH cases
(like css.c(nscss_import) so probably even more needs to be done.

So I went for the quick hack in urldb_get_auth_details() which picks the
first prot_space_data for given host if available for the case where non
of the path_data structs of given URL has a ptr to prot_space_data.  It
works for my sf.net case mentioned above but it will result in
authentication data given for URLs which do not require them, or can
give wrong authentication (because of wrongly chosen prot_space_data).

Now the questions:

1. Is this hack acceptable for the time being ?
2. Or should we shoot for, what I think is, the right approach, i.e.
   when no prot_space_data ptr can be found in path_date structs, await
   a 401 and with the realm given at that point take the right
   prot_space_data or offer the uses the possibility to enter a new
   username/password for that realm.  But that needs to work for all
   content fetched, not only html (what I think is happening).
   TBH, this is probably above my head to implement for the time I can
   spend on.

Aside: path_data::port seems to be 0 for URLs where the port is not
specified, is that intended ?

Aside: the riscos/401login.c change allows to keep the username and
password in the 401login dialogue after a 2nd 401 (for the same realm
and canonical root).  If this is acceptable, other front-end can do
a similar thing.

Patch (long lines need to be trimmed before ci):

--8<--
Index: riscos/401login.c
===================================================================
--- riscos/401login.c   (revision 8988)
+++ riscos/401login.c   (working copy)
@@ -98,6 +98,7 @@
 {
        struct session_401 *session;
        wimp_w w;
+       const char *auth;
 
        session = calloc(1, sizeof(struct session_401));
        if (!session) {
@@ -111,10 +112,28 @@
                warn_user("NoMemory", 0);
                return;
        }
-       session->uname[0] = '\0';
-       session->pwd[0] = '\0';
+       if (realm == NULL)
+               realm = "Secure Area";
+        auth = urldb_get_auth_details(session->url, realm);
+       if (auth == NULL) {
+               session->uname[0] = '\0';
+               session->pwd[0] = '\0';
+       } else {
+               const char *pwd;
+               size_t pwd_len;
+
+               pwd = strchr(auth, ':');
+               assert(pwd && pwd < auth + sizeof(session->uname));
+               memcpy(session->uname, auth, pwd - auth);
+               session->uname[pwd - auth] = '\0';
+               ++pwd;
+               pwd_len = strlen(pwd);
+               assert(pwd_len < sizeof(session->pwd));
+               memcpy(session->pwd, pwd, pwd_len);
+               session->pwd[pwd_len] = '\0';
+       }
        session->host = strdup(host);
-       session->realm = strdup(realm ? realm : "Secure Area");
+       session->realm = strdup(realm);
        session->bwin = bw;
        if ((!session->host) || (!session->realm)) {
                free(session->host);
Index: content/fetchers/fetch_curl.c
===================================================================
--- content/fetchers/fetch_curl.c       (revision 8988)
+++ content/fetchers/fetch_curl.c       (working copy)
@@ -560,7 +560,7 @@
                SETOPT(CURLOPT_COOKIE, NULL);
        }
 
-       if ((auth = urldb_get_auth_details(f->url)) != NULL) {
+       if ((auth = urldb_get_auth_details(f->url, NULL)) != NULL) {
                SETOPT(CURLOPT_HTTPAUTH, CURLAUTH_ANY);
                SETOPT(CURLOPT_USERPWD, auth);
        } else {
Index: content/urldb.c
===================================================================
--- content/urldb.c     (revision 8988)
+++ content/urldb.c     (working copy)
@@ -132,10 +132,17 @@
        struct cookie_internal_data *next;      /**< Next in list */
 };
 
-struct auth_data {
+/* A protection space is defined by a canonical hostname and a realm.
+ * The canonical hostname is here defined by struct host_part and all
+ * its parents together with struct prot_space_data::scheme and
+ * prot_space_data::port. */
+struct prot_space_data {
+       char *scheme;           /**< URL scheme of canonical hostname of this 
protection space. */
+       unsigned int port;      /**< Port number of canonical hostname of this 
protection space. */
        char *realm;            /**< Protection realm */
        char *auth;             /**< Authentication details in form
                                 * username:password */
+       struct prot_space_data *next;   /**< Next sibling */
 };
 
 struct cache_internal_data {
@@ -161,7 +168,7 @@
        struct bitmap *thumb;   /**< Thumbnail image of resource */
        struct url_internal_data urld;  /**< URL data for resource */
        struct cache_internal_data cache;       /**< Cache data for resource */
-       struct auth_data auth;  /**< Authentication data for resource */
+       const struct prot_space_data *prot_space;       /**< Protection space 
to which this resource belongs too. Can be NULL. No ownership. */
        struct cookie_internal_data *cookies;   /**< Cookies associated with 
resource */
        struct cookie_internal_data *cookies_end;       /**< Last cookie in 
list */
 
@@ -183,6 +190,8 @@
 
        char *part;             /**< Part of host string */
 
+       struct prot_space_data *prot_space;     /**< Canonical hostname + realm 
defines a protection space. Only to be accessed when struct host_part::parent 
== NULL. */
+
        struct host_part *next; /**< Next sibling */
        struct host_part *prev; /**< Previous sibling */
        struct host_part *parent;       /**< Parent host part */
@@ -203,6 +212,7 @@
 static void urldb_destroy_path_tree(struct path_data *root);
 static void urldb_destroy_path_node_content(struct path_data *node);
 static void urldb_destroy_cookie(struct cookie_internal_data *c);
+static void urldb_destroy_prot_space(struct prot_space_data *space);
 static void urldb_destroy_search_tree(struct search_node *root);
 
 /* Saving */
@@ -925,11 +935,13 @@
  * Look up authentication details in database
  *
  * \param url Absolute URL to search for
+ * \param realm When non-NULL, it is realm which can be used to determine
+ * the protection space when that's not been done before for given URL.
  * \return Pointer to authentication details, or NULL if not found
  */
-const char *urldb_get_auth_details(const char *url)
+const char *urldb_get_auth_details(const char *url, const char *realm)
 {
-       struct path_data *p, *q = NULL;
+       struct path_data *p, *p_cur, *p_top;
 
        assert(url);
 
@@ -940,29 +952,48 @@
        if (!p)
                return NULL;
 
-       /* Check for any auth details attached to this node */
-       if (p && p->auth.realm && p->auth.auth)
-               return p->auth.auth;
+       /* Check for any auth details attached to the path_data node or any of
+        * its parents. */
+       for (p_cur = p; p_cur != NULL; p_top = p_cur, p_cur = p_cur->parent) {
+               if (p_cur->prot_space) {
+                       return p_cur->prot_space->auth;
+               }
+       }
 
-       /* Now consider ancestors */
-       for (; p; p = p->parent) {
-               /* The parent path entry is stored hung off the
-                * parent entry with an empty (not NULL) segment string.
-                * We look for this here.
-                */
-               for (q = p->children; q; q = q->next) {
-                       if (q->segment[0] == '\0')
-                               break;
+       if (realm != NULL) {
+               const struct host_part *h = (const struct host_part *)p_top;
+               const struct prot_space_data *space;
+
+               /* Search for a possible matching protection space. */
+               for (space = h->prot_space; space != NULL; space = space->next) 
{
+                       if (!strcmp(space->realm, realm)
+                                       && !strcmp(space->scheme, p->scheme)
+                                       && space->port == p->port) {
+                               p->prot_space = space;
+                               return p->prot_space->auth;
+                       }
                }
-
-               if (q && q->auth.realm && q->auth.auth)
-                       break;
        }
+       /* FIXME: HACK: the following is a hack. We're not supposed to return
+        * any authentication data at this point as we do not know if given URL
+        * belongs to a protection space and if so, to which one.  We should
+        * wait for a 401 and use its realm to determine that (and if there is
+        * no 401 for that URL, it doesn't belong to a protection space at all).
+        * But it looks that not all content types with a 401 can be refetched
+        * easily.
+        * This hack possibly leaks authentication data.
+        */
+#if 1
+       else {
+               const struct host_part *h = (const struct host_part *)p_top;
+               if (h->prot_space) {
+                       p->prot_space = h->prot_space;
+                       return p->prot_space->auth;
+               }
+       }
+#endif
 
-       if (!q)
-               return NULL;
-
-       return q->auth.auth;
+       return NULL;
 }
 
 /**
@@ -975,7 +1006,7 @@
 bool urldb_get_cert_permissions(const char *url)
 {
        struct path_data *p;
-       struct host_part *h;
+       const struct host_part *h;
 
        assert(url);
 
@@ -985,8 +1016,9 @@
 
        for (; p && p->parent; p = p->parent)
                /* do nothing */;
+       assert(p);
 
-       h = (struct host_part *)p;
+       h = (const struct host_part *)p;
 
        return h->permit_invalid_certs;
 }
@@ -1001,48 +1033,62 @@
 void urldb_set_auth_details(const char *url, const char *realm,
                const char *auth)
 {
-       struct path_data *p;
-       char *urlt, *t1, *t2;
+       struct path_data *p, *pi;
+       struct host_part *h;
+       struct prot_space_data *space, *space_alloc;
+       char *realm_alloc, *auth_alloc, *scheme_alloc;
 
        assert(url && realm && auth);
 
-       urlt = strdup(url);
-       if (!urlt)
-               return;
-
-       /* strip leafname from URL */
-       t1 = strrchr(urlt, '/');
-       if (t1) {
-               *(t1 + 1) = '\0';
-       }
-
        /* add url, in case it's missing */
-       urldb_add_url(urlt);
+       urldb_add_url(url);
 
-       p = urldb_find_url(urlt);
+       p = urldb_find_url(url);
 
-       free(urlt);
-
        if (!p)
                return;
 
-       /** \todo search subtree for same realm/auth details
-        * and remove them (as the lookup routine searches up the tree) */
+       /* Search for host_part */
+       for (pi = p; pi->parent != NULL; pi = pi->parent)
+               ;
+       h = (struct host_part *)pi;
 
-       t1 = strdup(realm);
-       t2 = strdup(auth);
+       /* Search if given URL belongs to a protection space we already know 
of. */
+       for (space = h->prot_space; space; space = space->next) {
+               if (!strcmp(space->realm, realm)
+                               && !strcmp(space->scheme, p->scheme)
+                               && space->port == p->port)
+                       break;
+       }
 
-       if (!t1 || !t2) {
-               free(t1);
-               free(t2);
-               return;
+       if (space != NULL) {
+               /* Overrule existing auth. */
+               free(space->auth);
+               space->auth = strdup(auth);
+       } else {
+               /* Create a new protection space. */
+               space = space_alloc = malloc(sizeof(struct prot_space_data));
+               scheme_alloc = strdup(p->scheme);
+               realm_alloc = strdup(realm);
+               auth_alloc = strdup(auth);
+
+               if (!space_alloc || !scheme_alloc || !realm_alloc || 
!auth_alloc) {
+                       free(space_alloc);
+                       free(scheme_alloc);
+                       free(realm_alloc);
+                       free(auth_alloc);
+                       return;
+               }
+
+               space->scheme = scheme_alloc;
+               space->port = p->port;
+               space->realm = realm_alloc;
+               space->auth = auth_alloc;
+               space->next = h->prot_space;
+               h->prot_space = space;
        }
 
-       free(p->auth.realm);
-       free(p->auth.auth);
-
-       p->auth.realm = t1;
-       p->auth.auth = t2;
+       p->prot_space = space;
 }
 
 /**
@@ -1067,6 +1113,7 @@
 
        for (; p && p->parent; p = p->parent)
                /* do nothing */;
+       assert(p);
 
        h = (struct host_part *)p;
 
@@ -3878,6 +3925,7 @@
 {
        struct host_part *a, *b;
        struct path_data *p, *q;
+       struct prot_space_data *s, *t;
 
        /* Destroy children */
        for (a = root->children; a; a = b) {
@@ -3894,6 +3942,12 @@
        /* Root path */
        urldb_destroy_path_node_content(&root->paths);
 
+       /* Proctection space data */
+       for (s = root->prot_space; s; s = t) {
+               t = s->next;
+               urldb_destroy_prot_space(s);
+       }
+
        /* And ourselves */
        free(root->part);
        free(root);
@@ -3955,8 +4009,6 @@
                bitmap_destroy(node->thumb);
 
        free(node->urld.title);
-       free(node->auth.realm);
-       free(node->auth.auth);
 
        for (a = node->cookies; a; a = b) {
                b = a->next;
@@ -3981,6 +4033,21 @@
 }
 
 /**
+ * Destroy protection space data
+ *
+ * \param space Protection space to destroy
+ */
+void urldb_destroy_prot_space(struct prot_space_data *space)
+{
+       free(space->scheme);
+       free(space->realm);
+       free(space->auth);
+
+       free(space);
+}
+
+
+/**
  * Destroy a search tree
  *
  * \param root Root node of tree to destroy
Index: content/urldb.h
===================================================================
--- content/urldb.h     (revision 8988)
+++ content/urldb.h     (working copy)
@@ -85,7 +85,7 @@
 /* Authentication modification / lookup */
 void urldb_set_auth_details(const char *url, const char *realm,
                const char *auth);
-const char *urldb_get_auth_details(const char *url);
+const char *urldb_get_auth_details(const char *url, const char *realm);
 
 /* SSL certificate permissions */
 void urldb_set_cert_permissions(const char *url, bool permit);
--8<--

John.
-- 
John Tytgat
[email protected]

http authentication improvement

Reply via email to