Hi Daniel,

I've amended the previous patches based on your feedback. Well, the first patch 
only. I re-attach the second one without modifications.

On 08/10/2015 06:06 PM, Daniel Kahn Gillmor wrote:

There is no pointer or reference here to what the "correct HSTS database
format used by Wget" is.  providing a brief pointer (e.g. "look at file
doc/FOO in the wget source" or "see https://example.org/wget-hsts-db";)
would be useful.


I have now added a concise description of how the HSTS database should look 
like.

Is it possible that a user would want to decouple reading from
(respecting) an HSTS file and writing to it?

During development, neither Tim nor I ever talked about separating read and 
write. It just didn't happen.

But I guess you're right. There should be a way of separating reads and writes. 
The default behaviour should still be kept the same, but there should be a way 
of decoupling those processes, as you say. It should not be prohibitively 
expensive, and someone might benefit from such features.

I've been thinking about a couple of possibilities here, based on the examples 
you provided. Discussion starts here ;-)

    * Make the HSTS database read-only. Load the HSTS entries contained there 
but do not rewrite the file. This could be governed by an extra command-line 
switch, like '--hsts-read-only'. It could be passed together with 
'--hsts-file', for example:

        wget --hsts-read-only --hsts-file=~/my-fixed-hsts-file

    * Do not use an HSTS database at all. HSTS would be handled internally but 
there would be no interaction with the file system. Something like 
'--no-hsts-file'.

    * <Your option here>

This sounds like there might still be a race condition where one
process's changes get clobbered.  can we make any atomicity guarantees
for users who might be concerned about this?

You're right. My fault not to take this into account. This could be fixed with 
flock/fcntl. I think they're both in gnulib.

These last two issues require code changes. I'll take the responsibility to fix 
them, but outside of GSoC. The first one requires a bit of consensus before 
coding anything, but the second one seems a bit more straightforward. For now, 
I attach the two previous patches. The first one (the HSTS docs) amended with 
dkg's suggestions. If there're no more complaints, I think they can be pushed, 
because Wget's behaviour has not changed yet. When we implement any of the 
ideas proposed above, we'll update the docs.

I'd normally characterize these threats as "privacy concerns", not
necessarily "security concerns".  users of wget might understand them
most closely as offering "cookie-equivalent" mechanisms for some
http/https clients.


I think that's a widespread misconception. It's true many users map HSTS to HTTP cookies, 
but IMO that's a mistake. HTTP cookies and HSTS might be similar in some aspects, but 
they're two mechanisms that were designed for different purposes. HTTP cookies bridge the 
gap between HTTP's traditional stateless nature, and the stateful needs of modern 
Internet users. On the other hand, HSTS was conceived as a workaround for some security 
threats. It's true that these threats might more specifically target privacy, but I think 
it's an error for us, GNU developers, to keep on feeding the "HSTS == cookie" 
misconception.


Regards,
- AJ
>From a62468f652cc5ba94809e55bb9b187e38b05a9c4 Mon Sep 17 00:00:00 2001
From: Ander Juaristi <[email protected]>
Date: Sat, 8 Aug 2015 19:40:49 +0200
Subject: [PATCH 1/2] Extra debug traces for HSTS.

 * src/main.c (load_hsts, save_hsts): added DEBUGP() calls to signal
   reads and saves of the HSTS database file.
---
 src/main.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/main.c b/src/main.c
index 61accfe..c6944d9 100644
--- a/src/main.c
+++ b/src/main.c
@@ -174,6 +174,8 @@ load_hsts (void)
 
       if (filename)
         {
+          DEBUGP (("Reading HSTS entries from %s\n", filename));
+
           hsts_store = hsts_store_open (filename);
 
           if (!hsts_store)
@@ -195,6 +197,9 @@ save_hsts (void)
     {
       char *filename = get_hsts_database ();
 
+      if (filename)
+        DEBUGP (("Saving HSTS entries to %s\n", filename));
+
       hsts_store_save (hsts_store, filename);
       hsts_store_close (hsts_store);
 
-- 
1.9.1

>From 4dbe3e40e7c8dfdbcbcd6539d1c2f6b5605c7d40 Mon Sep 17 00:00:00 2001
From: Ander Juaristi <[email protected]>
Date: Tue, 18 Aug 2015 00:45:36 +0200
Subject: [PATCH 2/2] Updated HSTS documentation

 * doc/wget.texi: updated HSTS documentation.

   Reported-by: Daniel Kahn Gillmor <[email protected]>
---
 doc/wget.texi | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 68 insertions(+), 2 deletions(-)

diff --git a/doc/wget.texi b/doc/wget.texi
index d2ff7dc..e3d2b46 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -1669,8 +1669,9 @@ form-based authentication.
 
 @cindex SSL
 To support encrypted HTTP (HTTPS) downloads, Wget must be compiled
-with an external SSL library. The current default is GnuTLS. If Wget is compiled
-without SSL support, none of these options are available.
+with an external SSL library. The current default is GnuTLS.
+In addition, Wget also supports HSTS (HTTP Strict Transport Security).
+If Wget is compiled without SSL support, none of these options are available.
 
 @table @samp
 @cindex SSL protocol, choose
@@ -1800,6 +1801,71 @@ read random data from EGD socket specified using this option.
 If this option is not specified (and the equivalent startup command is
 not used), EGD is never contacted.  EGD is not needed on modern Unix
 systems that support @file{/dev/urandom}.
+
+@cindex HSTS
+@item --no-hsts
+Wget supports HSTS (HTTP Strict Transport Security, RFC 6797) by default.
+Use @samp{--no-hsts} to make Wget act as a non-HSTS-compliant UA. As a
+consequence, Wget would ignore all the @code{Strict-Transport-Security}
+headers, and would not enforce any existing HSTS policy.
+
+@item --hsts-file=@var{file}
+By default, Wget stores its HSTS database in @file{~/.wget-hsts}.
+You can use @samp{--hsts-file} to override this. Wget will use
+the supplied file as the HSTS database. Such file must conform to the
+correct HSTS database format used by Wget. If Wget cannot parse the provided
+file, the behaviour is unspecified.
+
+The Wget's HSTS database is a plain text file. Each line contains an HSTS entry
+(ie. a site that has issued a @code{Strict-Transport-Security} header and that
+therefore has specified a concrete HSTS policy to be applied). Lines starting with
+a dash (@code{#}) are ignored by Wget. Please note that in spite of this convenient
+human-readability hand-hacking the HSTS database is generally not a good idea.
+
+An HSTS entry line consists of several fields separated by one or more whitespace:
+
+@code{<hostname> SP [<port>] SP <include subdomains> SP <created> SP <max-age>}
+
+The @var{hostname} and @var{port} fields indicate the hostname and port to which
+the given HSTS policy applies. The @var{port} field may be zero, and it will, in
+most of the cases. That means that the port number will not be taken into account
+when deciding whether such HSTS policy should be applied on a given request (only
+the hostname will be evaluated). When @var{port} is different to zero, both the
+target hostname and the port will be evaluated and the HSTS policy will only be applied
+if both of them match. This feature has been included for testing/development purposes only.
+The Wget testsuite (in @file{testenv/}) creates HSTS databases with explicit ports
+with the purpose of ensuring Wget's correct behaviour. Applying HSTS policies to ports
+other than the default ones is discouraged by RFC 6797 (see Appendix B "Differences
+between HSTS Policy and Same-Origin Policy"). Thus, this functionality should not be used
+in production environments and @var{port} will typically be zero. The last three fields
+do what they are expected to. The field @var{include_subdomains} can either be @code{1}
+or @code{0} and it signals whether the subdomains of the target domain should be
+part of the given HSTS policy as well. The @var{created} and @var{max-age} fields
+hold the timestamp values of when such entry was created (first seen by Wget) and the
+HSTS-defined value 'max-age', which states how long should that HSTS policy remain active,
+measured in seconds elapsed since the timestamp stored in @var{created}. Once that time
+has passed, that HSTS policy will no longer be valid and will eventually be removed
+from the database.
+
+If you supply your own HSTS database via @samp{--hsts-file}, be aware that Wget
+may modify the provided file if any change occurs between the HSTS policies
+requested by the remote servers and those in the file. When Wget exists,
+it effectively updates the HSTS database by rewriting the database file with the new entries.
+
+If the supplied file does not exist, Wget will create one. This file will contain the new HSTS
+entries. If no HSTS entries were generated (no @code{Strict-Transport-Security} headers
+were sent by any of the servers) then no file will be created, not even an empty one. This
+behaviour applies to the default database file (@file{~/.wget-hsts}) as well: it will not be
+created until some server enforces an HSTS policy.
+
+Care is taken not to override possible changes made by other Wget processes at
+the same time over the HSTS database. Before dumping the updated HSTS entries
+on the file, Wget will re-read it and merge the changes.
+
+Using a custom HSTS database and/or modifying an existing one is discouraged.
+For more information about the potential security threats arised from such practice,
+see section 14 "Security Considerations" of RFC 6797, specially section 14.9
+"Creative Manipulation of HSTS Policy Store".
 @end table
 
 @cindex WARC
-- 
1.9.1

Reply via email to