Re: [tor-dev] Brainstorming about steganographic transports

2012-07-29 Thread Kevin P Dyer
On Wed, Jul 25, 2012 at 9:18 PM, David Fifield  wrote:
> This is a summary of some discussion among developers of pluggable
> transports about steganographic channels and deriving them from protocol
> grammars. Two things prompted the discussion:
>
> [snip]
>
> David (yours truly) wants to write or help write a simple pluggable
> transport derived from regular-expression signatures, even with the
> limitations shown above. Client and relay would need matching signature
> models. For the same of simplicity, it will only seek to match the given
> signature, and not be indistinguishable in the strong sense mentioned
> above. It won't do symmetric encryption of the underlying TLS (or if it
> does, will use a fixed key). It won't use the constructions from the
> Provably Secure Steganography paper, rather it will just encode its
> stream directly in DFA edge transitions. I think it will be interesting
> to see 1) how far a simple system can get us, and 2) what additional
> changes we would have to make to be provably secure against censors
> using more sophisticated computational models than regex.

Protocol grammars present an interesting foundation for designing
pluggable transports. As Roger knows, my co-authors and I came up with
this idea about a year ago and have since been working on realizing it
too. We call our approach "Format Transforming Encryption."

Our approach at a high level is similar to what you describe: we use
regular expressions to efficiently encode traffic on the wire. We've
been working out a lot of the challenges that need to be overcome to
make our approach feasible. As you could imagine, it's non-trivial to
produce languages that are efficient, satisfy basic security
constraints, and are able to pass through proxies. However, I'm happy
to report we have a proof-of-concept that's nearly ready to release to
the Tor community. We are in the process of preparing a research paper
for submission. Once it's ready we'll also post a technical report and
I'll point you guys to it.

At that point ---should be just a couple of weeks--- I'll be happy to
explain more details about our work, share code, etc. There will, of
course, be lots of interesting questions remaining about practical
deployment and we'd be happy to get feedback to improve our framework
and get it in shape to be deployed with Tor.

-Kevin
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Brainstorming about steganographic transports

2012-07-28 Thread Zack Weinberg
On Wed, Jul 25, 2012 at 11:56 PM, Robert Ransom  wrote:
> On 7/26/12, David Fifield  wrote:
>
>> We can use appid-like signatures to make steganographic channels, if we
>> assume that the signatures are a realistic reflection of actual use of
>> the protocols. But: this relies critically on the accuracy of the model.
>> (Specifically, does it match the censor's model? If he uses simple
>> regular expressions for blocking, then we win; if not, then we probably
>> lose.)
>
> Not quite.  If the language your syntactic model was based on is
> accepted by the particular regular expressions that the censor is
> currently using, you win (until They change to new regexps).
> Otherwise, you lose.

Some aspects of the wire traffic generated by any given protocol are
going to be wholly defined by the protocol specification (including
errata and "everyone does it this way" spackle).  Other aspects can
vary depending on higher-level input to the protocol.  It might be
useful to decompose your arithmetic-coding output model into two
stages: do the generation of said 'higher-level input'
probabilistically, then use an actual implementation of the cover
protocol to produce the fixed components.

Here's another example from StegoTorus: Right now the HTTP request
generator stuffs arbitrary base64-encoded data into the Cookie:
header, making sure to insert equals signs and semicolons to make it
syntactically valid. This isn't _wrong_ per se, because JavaScript
running on the client side can manipulate the cookies to be sent with
the next request, but it is _abnormal_: most cookie-using sites send
down a Set-Cookie header with the first page load and then the browser
will send exactly that string back on all subsequent requests.  I
doubt one could detect this with regexps, but I'm sure I could do it
with a support vector machine or something like that.  (And then we
get to have the argument about what, exactly, DPI boxen are capable
of.)

If we were using an actual implementation of HTTP we would not have
gone down this road in the first place because the client API makes
clear where you do and do not get to stick your payload.

zw
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Brainstorming about steganographic transports

2012-07-26 Thread Robert Ransom
On 7/26/12, David Fifield  wrote:

> We can use appid-like signatures to make steganographic channels, if we
> assume that the signatures are a realistic reflection of actual use of
> the protocols. But: this relies critically on the accuracy of the model.
> (Specifically, does it match the censor's model? If he uses simple
> regular expressions for blocking, then we win; if not, then we probably
> lose.)

Not quite.  If the language your syntactic model was based on is
accepted by the particular regular expressions that the censor is
currently using, you win (until They change to new regexps).
Otherwise, you lose.

For example, 
accepts “UseR :BOGUS line containing only a username with too many
spaces\n\n\n\n\n\r”, but no real IRC client will generate “UseR” (or
the other protocol violation on that line).  If They are using appid
with that particular protocol-recognition file, you win; if They
validate IRC using better regexps, you lose.


Robert Ransom
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


[tor-dev] Brainstorming about steganographic transports

2012-07-25 Thread David Fifield
This is a summary of some discussion among developers of pluggable
transports about steganographic channels and deriving them from protocol
grammars. Two things prompted the discussion:

* A program called appid, http://code.google.com/p/appid/ , which
  compiles simple protocol specifications into big regular expressions
  and uses them to classify network streams. Supported protocols are
  https://code.google.com/p/appid/source/browse/trunk/apps ; a sample is
  https://code.google.com/p/appid/source/browse/trunk/apps/afs . Roger
  suggested to make a program that takes one of these specifications and
  embeds data within a stream that appid will classify as the chosen
  protocol.

* This paper:
"Provably Secure Steganography"
http://www-users.cs.umn.edu/~hopper/tc-stego.pdf
  The authors define a "channel" as a probability distribution on
  sequences. You can think of a channel as something that answers "given
  this prior history of packets, what are the possible next packets and
  their probabilities?" or even as an oracle that merely allows sampling
  this distribution. They show a method of steganographically encoding a
  bit within a channel such that that a passive adversary cannot
  distinguish it from "normal" traffic (Sec. 4.1 and Fig. 1). For the
  topics we were talking about, the most important sections to read are
  2 and 4.1.

You can embed data in a string that will match a given regular
expression by constructing a DFA (http://swtch.com/~rsc/regexp/regexp1.html)
and walking it state by state, choosing state transition edges so as to
encode something. For example, if there are two outgoing edges from a
state, which one you choose can encode a bit. Many of the appid
signatures contain any* blocks that can straightforwardly encode bytes.
An example of this idea is the regular expression /^(ab*[cd])+/. The b*
lets us encode one bit (say "" = 0 and "b" = 1), and the choice of c or
d allows another bit ("c" = 0 and "d = 1). We could send the message
00110001 as "acabdacad".

This idea can really be thought of an application of the paper. A
regular expression defines a probabilistic channel if we assign
probabilties to edge transitions, which we may do uniformly. So, for
example, with the regular expression given above, the first byte is 'a'
with probability 1. The second byte is 'b' with probability 0.5, 'c'
with probability 0.25, and 'c' with probability 0.25. But we see how a
channel is sensitive to history: if the string so far is "ab", then the
next byte may be 'c' or 'd'; but if the string so far is "ac", then the
next byte must be 'a' (or EOF).

We can use appid-like signatures to make steganographic channels, if we
assume that the signatures are a realistic reflection of actual use of
the protocols. But: this relies critically on the accuracy of the model.
(Specifically, does it match the censor's model? If he uses simple
regular expressions for blocking, then we win; if not, then we probably
lose.) Robert Ransom gave the example of a model of HTTP using just the
syntax; i.e. using RFC 2616 as the definition of a channel. Such a thing
would not only not look like real-world HTTP, but would be broken easily
by proxies that rearrange headers, for example.

rransom also suggested a way to use arithmetic encoding to obfuscate a
stream using a probabilistic model of a protocol; I'll quote him in
part:
* send it through an arithmetic-code encoder, using a model which
  contains a (low-probability) 'pad' symbol;
* encrypt it with a stream cipher;
* send it through an arithmetic-code decoder, using a model which
  matches the 'cover protocol' (the protocol you are trying to mimic).

David (yours truly) wants to write or help write a simple pluggable
transport derived from regular-expression signatures, even with the
limitations shown above. Client and relay would need matching signature
models. For the same of simplicity, it will only seek to match the given
signature, and not be indistinguishable in the strong sense mentioned
above. It won't do symmetric encryption of the underlying TLS (or if it
does, will use a fixed key). It won't use the constructions from the
Provably Secure Steganography paper, rather it will just encode its
stream directly in DFA edge transitions. I think it will be interesting
to see 1) how far a simple system can get us, and 2) what additional
changes we would have to make to be provably secure against censors
using more sophisticated computational models than regex.

David Fifield
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev