Here is a draft of the protocol, but there is still a lot of work
to do.
The numbers and lengths are provisional too.
Gabriel.
Ben Laurie wrote:
>
> Gabriel Belingueres wrote:
> >
> > Hi,
> >
> > While talking in the sci.crypt newsgroup, I had an
> > idea about how to make the Web more secure against traffic analysis. The
> > idea comes from a paper I have been reading ("Analysis of the SSL 3.0
> > protocol" by B. Schneier and D. Wagner). They describe how an attacker
> > can guess the pages you have accessed by looking at the lengths of the
> > SSL messages exchanged in HTTPS requests and replies.
> > The idea I have in mind is to add a tiny protocol between HTTP and SSL,
> > to break the 1-to-1 mapping between HTTP and SSL messages. The mapping
> > would instead be randomized.
>
> ?? How?
>
> > Could anybody give me their impressions of this idea?
> > Should I continue designing the protocol, or do you think that
> > nobody cares about web traffic analysis?
>
> It is interesting, but I don't see how you propose to defeat it.
>
> Cheers,
>
> Ben.
>
> --
> http://www.apache-ssl.org/ben.html
>
> "My grandfather once told me that there are two kinds of people: those
> who work and those who take the credit. He told me to try to be in the
> first group; there was less competition there."
> - Indira Gandhi
> ______________________________________________________________________
> OpenSSL Project http://www.openssl.org
> Development Mailing List [EMAIL PROTECTED]
> Automated List Manager [EMAIL PROTECTED]
--
Gabriel Belingueres
Providing protection against traffic analysis on the Web
========================================================
The basic problem is that an attacker can guess which page the user is viewing, because
both the pattern of interaction with the server and the lengths of the messages are
visible to the attacker.
The idea behind this protocol is to break the one-to-one mapping between HTTP messages
and SSL connections.
As noted in [CHENG], the "pattern" of messages in the HTTP protocol is as follows:
1) First, the client requests a web page (GET /index.html...).
2) The server answers the client with the HTML file, using the
same socket.
3) The client processes the HTML file and issues the corresponding requests for the
objects that the HTML file references. The client establishes ONE SSL connection per
HTTP GET request.
4) The server answers the requests concurrently.
The attacker relies on both this "interaction pattern" and the message lengths to mount
the attack.
For that reason, a basic requirement of the protocol is that it presents the SAME
interaction pattern as the HTTPS protocol normally does, while also "masking" the
lengths of the transmitted messages.
What the protocol basically does is receive the HTTP messages (from the upper layer)
and break each one into a pseudo-random number of fragments. It then opens a
pseudo-random number of SSL connections and sends each fragment through a
pseudo-randomly chosen connection.
Of course, the pseudo-random numbers generated by the server have to be regenerated by
(or communicated to) the client, so that it can read from the appropriate SSL connection
and reorder and reassemble the fragments to recover the transmitted files. Once
that is done, the protocol on the client passes the files to the upper layer (HTTP) to
be shown in the browser.
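The regeneration idea can be sketched as follows. This is only an illustration under
assumptions that are not in the draft: both ends already share a seed (how it is agreed
is not defined here), each fragment arrives as a discrete unit on its connection, and
Python's random.Random stands in for a proper keyed generator. The names schedule, send,
receive, MAX_CONNECTIONS and MAX_FRAGMENTS are hypothetical.

```python
import random

MAX_CONNECTIONS = 6   # hypothetical upper limits, not taken from the draft
MAX_FRAGMENTS = 8

def schedule(rng):
    """Draw the number of connections and, for each fragment in order,
    the index of the connection that carries it."""
    n_conn = rng.randint(1, MAX_CONNECTIONS)
    n_frag = rng.randint(1, MAX_FRAGMENTS)
    route = [rng.randrange(n_conn) for _ in range(n_frag)]
    return n_conn, route

def send(seed, message):
    """Sender side: fragment the message and queue each fragment on its connection."""
    rng = random.Random(seed)
    n_conn, route = schedule(rng)
    # Cut points are drawn after the schedule, so the receiver never needs them.
    # Fragments may be empty in this sketch.
    cuts = sorted(rng.randint(0, len(message)) for _ in range(len(route) - 1))
    bounds = [0] + cuts + [len(message)]
    queues = [[] for _ in range(n_conn)]
    for conn, start, end in zip(route, bounds, bounds[1:]):
        queues[conn].append(message[start:end])
    return queues

def receive(seed, queues):
    """Receiver side: regenerate the same schedule and reassemble in order."""
    rng = random.Random(seed)
    n_conn, route = schedule(rng)
    if n_conn != len(queues):
        raise ValueError("connection count does not match the schedule")
    cursors = [0] * n_conn
    parts = []
    for conn in route:
        parts.append(queues[conn][cursors[conn]])
        cursors[conn] += 1
    return b"".join(parts)
```

Because both sides draw the same values in the same order, the receiver knows which
connection to read next without any routing information travelling on the wire.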
The layering of the protocol would be something like this:
+------------------+
|      HTTPS       |
+------------------+
|  This protocol   |
+------------------+
|       SSL        |
+------------------+
|      TCP/IP      |
+------------------+
How it works
============
Below is an interaction diagram of a typical HTTPS connection using the
protocol described here. The example shows an HTML page that contains 9 images.
HTTPS client                                       HTTPS server
1)
----------------- SSL's ClientHello + [Code] ----------------->
<-------------------- SSL's ServerHello -----------------------
...Completion of the SSL Handshake, resulting in a...
<---------------- SSL connection established ----------------->
2)
-------- GET /index.html HTTP/1.1 Host: www.ibm.com --------->
<------------------------- HTML file --------------------------
<------------------------- HTML file --------------------------
...
<--------------- HTML file + [ACK] + [Padding] ----------------
3)
c1 ---- GET /file9.gif ; GET /file5.gif ; GET /file1.gif ----->
c2 --------------------- GET /file6.gif ; GET /file2.gif ----->
c3 ------------------------ DECOY ---------------------------->
...
cN --------------------- GET /file8.gif ; GET /file4.gif ----->
4)
c1 <------ file1.gif/h; file9.gif/h; file8.gif/f1; ... -------
c2 <------ file2.gif/h; file1.gif/f1; file9.gif/f1; ... -------
c3 <------ file3.gif/h; file2.gif/f1; file1.gif/f2; ... -------
...
cN <------ file8.gif/h; file7.gif/f1; file6.gif/f2; ... -------
Step 1
------
The user clicks on a link with the URL https://www.ibm.com/index.html.
The browser has to set up an SSL connection in order to send the HTTP GET request, so it
initiates the SSL handshake, as always.
But this time, at the end of the SSL ClientHello message, the browser writes a
"code" that tells the server it wants the request to be served using this protocol.
Writing extra data after the compression methods list in the ClientHello message is
legal [TLS], and the data is included in the SSL handshake hashes, so any modification
of ClientHello is detectable by the communicating parties at the end of the handshake.
This code in the ClientHello message is ignored by SSL 3.0 and TLS 1.0 as we know them
now. In this way, this protocol is interoperable with existing web servers and web
browsers. Only the client can send the code, so if it is sent to a server that does not
support this protocol, the SSL layer of that server will ignore it and continue normally.
Although the client sends the code in the clear, that does not mean that the HTTP
interaction will be served using this protocol, because the server's confirmation is
sent in a secure way, as described in Step 2.
The SSL or TLS implementation has to be modified to recognize the extra data as "data
for the upper layer protocol". Technically, this is not a "layer invasion", since the
data is not interpreted in any way by SSL or TLS.
However, the extra data in the ClientHello message is intended for "forward
compatibility" with future versions of TLS, not for providing extra data to upper
layers. I propose adding an "Escape" code for this purpose. This Escape code has to be
standardized in the TLS RFC in order to be useful in other contexts. In the same
way, the data that follows the Escape code must be standardized too.
It is convenient that the SSL or TLS version number (3.0 and 3.1) does not change,
because a changed number would reveal that the server is prepared to accept this
protocol: the number appears in the ServerHello message, which
travels in the clear. We don't want to reveal this information.
I propose that the extra data defined by SSL come first, followed by the Escape code and
the data for the upper layer protocol. In other words, the ClientHello message would
carry the following data (see [TLS] for the message structure):
client_version + random + session_id + cipher_suites + compression_methods +
"tls_data" + ESC + "upper_layer_data"
Web server implementations that do not provide any added functionality through an
intermediate protocol like this one MUST NOT configure the SSL implementation they use
to forward the extra data after the Escape code to the upper layer.
When the SSL handshake is done, the SSL connection has been established.
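The ClientHello layout proposed in this step ("tls_data" + ESC + "upper_layer_data") can
be sketched as below. The byte value of ESC and the function names are hypothetical,
since the draft does not assign an Escape code, and the sketch simply assumes that
tls_data never contains the Escape sequence.

```python
from typing import Optional, Tuple

ESC = b"\xff\x00ESC"  # hypothetical Escape code; the draft does not define its value

def build_extra_data(tls_data: bytes, upper_layer_data: bytes) -> bytes:
    """Lay out the bytes that follow compression_methods in ClientHello."""
    if ESC in tls_data:
        raise ValueError("tls_data must not contain the Escape code")
    return tls_data + ESC + upper_layer_data

def split_extra_data(extra: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Return (tls_data, upper_layer_data).

    upper_layer_data is None when no Escape code is present, which is the
    case for a client that does not ask for this protocol."""
    head, sep, tail = extra.partition(ESC)
    if not sep:
        return extra, None
    return head, tail
```

A server-side SSL layer that supports the protocol would hand the second element to the
upper layer without interpreting it; one that does not would never look past tls_data.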
Step 2
------
Now the web browser issues a GET request for an HTML file, in the usual way.
When the web server receives the request, it returns the HTML file as always. But
after the EOF, this protocol appends an ACK message (which one?), which means that the
web server supports this protocol and will send the data requested by the client in a
special way, as described later in Step 4.
On the client side, when the reply from the server arrives, the client checks whether
anything extra came after the EOF. If so, the client knows that the server will send the
requested data in a way that is difficult for a traffic analysis attack to guess (I hope!).
The ACK message is transmitted in a secure way. This means that the SSL CipherSuite
agreed between the client and server must provide encryption, i.e. the CipherSuite
agreed MUST NOT be TLS_RSA_WITH_NULL_MD5 or TLS_RSA_WITH_NULL_SHA.
If nothing comes after the EOF of the HTML file before the SSL connection closes,
then the server doesn't support this protocol (or doesn't want to provide
us the service).
The client then decides how many SSL connections it has to open (cN in the figure). The
client chooses this random number following a uniform probability distribution (UPD).
The client could have a Lower Limit (1? or more?) and an Upper Limit for this number.
The Upper Limit has to be chosen so that it does not look suspicious to the
attacker (which number?).
After the ACK comes random padding (how long?) whose length follows a UPD, to "mask"
which web page has been accessed (in addition to the protection given by SSL). It seems
to be the only choice, given that we have to respect the interaction pattern of the
HTTPS GET reply.
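A minimal sketch of the two uniform choices in this step, the connection count cN and
the padding that follows the ACK. The limits CN_LOWER, CN_UPPER and MAX_PADDING are
placeholders, because the draft leaves all three values open; the function names are
hypothetical too.

```python
import secrets

# Placeholder limits: the draft leaves the real values open.
CN_LOWER, CN_UPPER = 2, 8
MAX_PADDING = 128

def uniform_int(low, high):
    """Uniformly distributed integer in the closed range [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return low + secrets.randbelow(high - low + 1)

def choose_connection_count():
    """Pick cN uniformly between the Lower Limit and the Upper Limit."""
    return uniform_int(CN_LOWER, CN_UPPER)

def random_padding():
    """Random bytes whose length is itself uniform in [0, MAX_PADDING]."""
    return secrets.token_bytes(uniform_int(0, MAX_PADDING))
```

The secrets module is used so that the choices come from the operating system's
random source rather than from a predictable generator.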
Step 3
------
Having set up the cN connections, the client sends its requests for the HTML page's
embedded objects.
The client waits for the web browser to hand over all the GETs it wants to send. Once
that is done, the protocol selects at random some "dummy" connections (DECOY) (how many
is good?). This selection is done with a random number in [1, cN] using a UPD.
The protocol concatenates the GETs onto the non-dummy connections, following a
"circular" strategy.
The dummy connections are treated separately. They carry a random sequence of
values whose length is in the range [70,140] (using a UPD) (numbers need
tests).
In ALL cases, random padding (using a UPD) is added at the end of the data on each
connection.
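The "circular" distribution of GETs over the non-dummy connections might look like the
sketch below. It makes one assumption the draft does not state: the number of decoys is
capped at cN - 1 so that at least one real connection always remains. Padding and the
decoy payload are left out, and the name plan_requests is hypothetical.

```python
import random

def plan_requests(gets, c_n, rng):
    """Assign GET requests to connections as in Step 3.

    Returns a list of length c_n; each entry is either the string "DECOY"
    or the list of GETs concatenated on that connection."""
    if c_n < 1:
        raise ValueError("at least one connection is needed")
    if c_n == 1:
        return [list(gets)]
    # Assumption: cap decoys at c_n - 1 so one real connection always remains.
    n_decoys = rng.randint(1, c_n - 1)
    decoys = set(rng.sample(range(c_n), n_decoys))
    real = [i for i in range(c_n) if i not in decoys]
    plan = ["DECOY" if i in decoys else [] for i in range(c_n)]
    # "Circular" strategy: GET k goes to the k-th real connection, wrapping around.
    for k, get in enumerate(gets):
        plan[real[k % len(real)]].append(get)
    return plan
```

If c3 happens to be the only decoy among five connections, the nine GETs of the example
are grouped as files 1, 5, 9 on c1, files 2, 6 on c2, files 3, 7 on c4 and files 4, 8
on c5, which matches the grouping shown in the diagram.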
Step 4
------
The server receives the GETs from the cN connections opened by the client. While it is
receiving data from the connections, the protocol can parse each message from the
client and extract the GETs to pass to the upper layer.
Then, the web server hands the protocol the header of each of the objects
requested by the client, and the headers are sent following a "circular" strategy.
Each data object is broken into cN fragments of random length. If the number
of fragments is lower than cN, dummy fragments of random length are generated until
there are cN fragments.
Those fragments are sent "circularly", starting on the next connection (the one after
the one with the last header).
If the number of connections (cN) is bigger than the total number of fragments, then
a "dummy fragment" is sent on the remaining connections.
In both cases, at the end of the last fragment on each connection, random
padding is added, as in Step 3.
This is done until all of the object's data has been sent.
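One possible reading of this step, chosen to reproduce the ordering in the diagram: all
headers go out first, one per connection in turn, and then fragment j of each object is
sent j connections past that object's header. This interpretation, the names
split_object and schedule_replies, and the use of random.Random are assumptions of the
sketch; padding is omitted.

```python
import random

def split_object(body, c_n, rng):
    """Break body into exactly c_n fragments of random length, topping up
    with dummy fragments when the body is too short to give c_n pieces."""
    n_real = min(c_n, len(body))
    cuts = sorted(rng.sample(range(1, len(body)), n_real - 1)) if n_real > 1 else []
    bounds = [0] + cuts + [len(body)]
    frags = [("data", body[a:b]) for a, b in zip(bounds, bounds[1:]) if b > a]
    while len(frags) < c_n:
        # Dummy length range [1, 1024] is the draft value quoted in the text.
        dummy = bytes(rng.getrandbits(8) for _ in range(rng.randint(1, 1024)))
        frags.append(("dummy", dummy))
    return frags

def schedule_replies(objects, c_n, rng):
    """objects is a list of (name, header, body) tuples.
    Returns c_n queues of (name, label, payload) in sending order."""
    queues = [[] for _ in range(c_n)]
    # Pass 1: headers, one per connection, wrapping around.
    for k, (name, header, _) in enumerate(objects):
        queues[k % c_n].append((name, "h", header))
    # Pass 2: fragment j of object k goes j connections past its header.
    pieces = [split_object(body, c_n, rng) for _, _, body in objects]
    for j in range(1, c_n + 1):
        for k, (name, _, _) in enumerate(objects):
            kind, payload = pieces[k][j - 1]
            label = "f%d" % j if kind == "data" else "dummy"
            queues[(k + j) % c_n].append((name, label, payload))
    return queues
```

With nine objects and cN = 8, the first connection starts with file1.gif/h, file9.gif/h
and file8.gif/f1, as in the diagram.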
Structures
==========
The data structures are not defined yet. First I want to know whether the overall
protocol and the security it provides are good enough.
Security considerations
=======================
First of all, this protocol relies entirely on the security provided by SSL.
Although it recommends some changes to SSL, this protocol could still work without
those changes.
Turning on the web browser cache helps mitigate web traffic analysis, as noted in
[CHENG], in addition to saving network bandwidth and download time.
I recommend turning on the cache when somebody wants to protect their web surfing
against traffic analysis, but the HTML page and its embedded objects have to be
downloaded at least the first time. Furthermore, after a few days the cache will expire
and the pages will be erased, so a patient attacker will carry out a successful
attack sooner or later.
Providing TLS with random padding for the CipherSuites that use a stream cipher is
required, as stated in [WAGNER] and [CHENG].
(Somebody once told me that you can use block ciphers whenever you want, but that
isn't the answer to the problem.)
Since this protocol is independent of both the (upper) application protocol and the
(lower) secure transport protocol, the secure transport protocol MUST guarantee that the
data delivered is padded in the same way, regardless of which
CipherSuite was chosen.
The fact that a web server from a given company supports this protocol may be
difficult to keep secret, because of marketing (the company wants to
differentiate itself from its competitors by offering a value-added service) or because
of disloyal employees.
Only the fact that a given request is served by this protocol is kept secret
(because it travels encrypted).
The CipherSuites that only authenticate messages but do not encrypt them, such as
TLS_RSA_WITH_NULL_MD5 or TLS_RSA_WITH_NULL_SHA, MUST NOT be allowed by the web server
providing this service, because if they are allowed, an attacker can set one of those
CipherSuites as the preferred choice and so defeat this whole protocol.
The SSL protocol must be configured not to accept this kind of CipherSuite.
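The restriction on authentication-only CipherSuites amounts to a filter on the list the
client offers. The sketch below uses the suite names as spelled in RFC 2246
(TLS_RSA_WITH_NULL_MD5, TLS_RSA_WITH_NULL_SHA, plus the initial
TLS_NULL_WITH_NULL_NULL); the function name and the idea of filtering on names rather
than on the numeric suite codes are simplifications for illustration.

```python
# RFC 2246 suites that provide no encryption.
NULL_ENCRYPTION_SUITES = {
    "TLS_NULL_WITH_NULL_NULL",
    "TLS_RSA_WITH_NULL_MD5",
    "TLS_RSA_WITH_NULL_SHA",
}

def acceptable_suites(offered):
    """Keep the client's preference order, dropping suites without encryption."""
    return [suite for suite in offered if suite not in NULL_ENCRYPTION_SUITES]

def can_serve(offered):
    """The protocol may only be used if an encrypting suite remains."""
    return bool(acceptable_suites(offered))
```

A server would run the filter before choosing a suite, so an attacker who rewrites the
preference list cannot push the connection onto an unencrypted suite.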
The Code that requests the server to use this protocol will probably be sent in the
clear, but that only means that the client wants to use it. The ACK, as described in
Step 2, travels encrypted.
The minimum length GET request is something like this:
GET /A HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/4.02 [en] (WinNT; U)
Host: a.ar:1999
Accept: */*
Accept-Language: en
Accept-Charset: *
This is about 140 bytes, so I think that random padding of [0,128] bytes is
enough.
The DECOY (dummy message) must be roughly in the range [140,1024] bytes (140
minimum because of the minimum GET request).
The dummy fragment (in Step 4) must be roughly in the range [1,1024] bytes.
(These numbers are not definitive; they still need empirical tests.)
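To make the arithmetic concrete, the sketch below measures the example request and
applies the draft's [0,128] padding range. The header text alone is 139 characters,
which is where "about 140" comes from; with CRLF line endings and the terminating
blank line it is 155 bytes on the wire. The names MIN_GET and pad_request are
hypothetical, and how the receiver tells padding from data is not defined by the draft.

```python
import secrets

MIN_GET = (
    "GET /A HTTP/1.0\r\n"
    "Connection: Keep-Alive\r\n"
    "User-Agent: Mozilla/4.02 [en] (WinNT; U)\r\n"
    "Host: a.ar:1999\r\n"
    "Accept: */*\r\n"
    "Accept-Language: en\r\n"
    "Accept-Charset: *\r\n"
    "\r\n"
).encode("ascii")

MAX_GET_PADDING = 128  # draft value from the text, still to be tested

def pad_request(request):
    """Append random padding whose length is uniform in [0, MAX_GET_PADDING]."""
    pad_len = secrets.randbelow(MAX_GET_PADDING + 1)
    return request + secrets.token_bytes(pad_len)

# len(MIN_GET) is 155, so a padded minimal request is 155 to 283 bytes long.
```

Under these draft numbers an observer sees a minimal request as anything from 155 to
283 bytes, which overlaps with many longer, unpadded requests.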
Advantages
==========
1) Interoperable with existing web servers and browsers.
2) Using this protocol does not require modifying either the
specification or the implementation of the HTTPS protocol. The API on which HTTPS is
based is the same as that of SSL (I think).
3) Because the data of the upper layer is not interpreted in any way, the protocol
can be used with any other application protocol, such as SSMTP, SPOP3, etc., on the
only condition that the application protocol makes at least one SSL-protected client
request and one SSL-protected server reply.
4) The protocol is "stateless". This means that it does not have to save information
about any prior or future HTTPS request and reply.
Disadvantages
=============
1) The protocol consumes extra memory, because it has to hold the objects being received
while their fragments are not yet in order.
2) The SSL implementation that supports this protocol has to be changed to pass
the extra data carried by the SSL ClientHello message up to this protocol.
3) It doesn't provide anonymity for the parties, only the impossibility of inferring
which file was accessed.
4) It is another layer between HTTPS and the Sockets API, which adds some overhead.
References
==========
[CHENG] Cheng and Avnur, "Traffic Analysis of SSL encrypted browsing".
[TLS] Dierks and Allen, "The TLS Protocol Version 1.0", RFC 2246.
[WAGNER] Wagner and Schneier, "Analysis of the SSL 3.0 protocol".
Author
======
Gabriel Belingueres
[EMAIL PROTECTED]