[tor-dev] Multiple Formats in Marionette

2018-07-27 Thread John Helmsen
Flipchan,

Thanks for letting me know about Layerprox. I hadn't heard of it before.

Currently, as you probably know, Marionette does one format at a time.
However, Marionette is hierarchical in its format construction, and
therefore we can do random selection between formats by considering each
format as a 'sub-format' of the overarching format.  Look at web_sess443 in
our current release as a way that this might be done.
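
To make the sub-format idea concrete, here is a minimal Python sketch of weighted random selection between formats. It does not use Marionette's DSL or API: the format names, the weights, and the helper functions are hypothetical and only illustrate how an overarching format could hand each connection to a randomly chosen sub-format.

```python
import random

# Hypothetical stand-ins for Marionette formats. The names and weights are
# illustrative assumptions and are not read from any Marionette release.
SUB_FORMATS = {
    "http_simple": 0.5,
    "https_like": 0.3,
    "ftp_like": 0.2,
}


def pick_sub_format(sub_formats, rng):
    """Choose one sub-format name with probability proportional to its weight."""
    names = list(sub_formats)
    weights = [sub_formats[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def run_session(n_connections, seed=None):
    """Pick a sub-format for each connection of one overarching session."""
    rng = random.Random(seed)
    return [pick_sub_format(SUB_FORMATS, rng) for _ in range(n_connections)]


if __name__ == "__main__":
    print(run_session(5, seed=1))
```

Passing a seed makes the choice reproducible for testing; a real deployment would presumably leave it unseeded.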

As we continue to go forward we will be updating the formats.  Exactly when
depends on funding and other priorities.  We recognize that an update is
needed.  You don't have to wait for us to do them, though.  If you make a
format we would be happy to consider it for integration in the repository.

If you want to put a wrapper around Marionette to give it extended
switching ability, we have implemented the PT2.X standard, so feel free to
integrate it into your own code.  Just let us know once you have
integrated it, so that we can add you to our list of projects that we have
integrated with.

-- 
John Helmsen
john.helm...@redjack.com
C: (240) 899-5676
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] tor-dev Digest, Vol 90, Issue 32

2018-07-27 Thread flipchan
Regarding implementing Marionette.

It's a great project and a great way to use FTE! I worked on a fork of it called
layerprox a while ago. However, here is my question: Marionette has a DSL in which
you write "formats" that generate traffic patterns. Is the idea to randomly
switch between these formats, or to use the same one all the time? Also, are the
formats going to be updated automatically?

Take care 
/flipchan 

PS: I'm sorry for answering all emails in this thread (my email client is not the
greatest).

On July 24, 2018 8:48:22 PM UTC, tor-dev-requ...@lists.torproject.org wrote:
>Message: 1
>Date: Tue, 24 Jul 2018 11:42:08 -0400
>From: John Helmsen 
>To: John Helmsen ,
>   tor-dev@lists.torproject.org, a...@0x90.dk, Ben Johnson
>   
>Subject: Re: [tor-dev] Ready to Integrate/Review New Marionette
>   Version into Tor
>Message-ID:
>   
>Content-Type: text/plain; charset="utf-8"
>
>David,
>
>Thank you, I have created the ticket as #26920.
>https://trac.torproject.org/projects/tor/ticket/26920#ticket.  Having
>downloaded the git project, it seems that this work cannot be performed on
>a Mac, since it doesn't run 'runc'.  Is that right?
>
>Ben,
>
>I am currently trying to create a virtual machine using Ubuntu 16.04 for
>development.  Unless I am mistaken, this work cannot be done on a Mac.
>Please do the same, so that we can put this thing to bed.
>
>
>On Mon, Jul 23, 2018 at 10:05 PM, David Fifield 
>wrote:
>
>> On Fri, Jul 20, 2018 at 04:12:21PM -0400, John Helmsen wrote:
>> > We are in the process of writing the documentation for Marionette, but the
>> > documentation on the web page should be sufficient for at least getting a full
>> > evaluation started.  We'd like to have the evaluation complete by the end of
>> > next month, hopefully the middle of next month, and stand ready to make any and
>> > all changes necessary.
>> >
>> > A full set of documentation will also be written for designing your own
>> > protocols.  This is in process.
>> >
>> > Please let us know what you need.
>>
>> The Tor Browser developers may have more specific requests, but I can
>> suggest some steps to get started.
>>
>> Open a ticket at https://trac.torproject.org/ for discussion and to
>> track progress.
>> Type: project
>> Component: Applications/Tor Browser
>> Keywords: marionette
>> The old ticket for FTE is a good reference:
>> https://bugs.torproject.org/10362
>>
>> And then it would help if you port your build process to the Tor Browser
>> build system. General information:
>> https://trac.torproject.org/projects/tor/wiki/doc/TorBrowser/Hacking
>> First, just build
>>     git clone https://git.torproject.org/builders/tor-browser-build.git
>>     cd tor-browser-build
>>     git checkout tbb-8.0a9-build3
>>     make testbuild # or, e.g., testbuild-linux-x86_64
>> Then you'll have to add a new project (consisting of a "build" and
>> "config" file) for Marionette and each of its dependencies. You can copy
>> from existing projects as templates. Here is the meek project, for
>> example:
>> https://gitweb.torproject.org/builders/tor-browser-build.git/tree/projects/meek
>> You'll also need to add bridge lines to:
>> https://gitweb.torproject.org/builders/tor-browser-build.git/tree/projects/tor-browser/Bundle-Data/PTConfigs/bridge_prefs.js
>> To build just one project, not an entire release, do e.g.:
>>     rbm/rbm build gmp --target testbuild --target torbrowser-linux-x86_64
>>     rbm/rbm build marionette --target testbuild --target torbrowser-linux-x86_64
>>
>
>
>
>-- 
>John Helmsen
>john.helm...@redjack.com
>C: (240) 899-5676

Re: [tor-dev] WTF-PAD and the future

2018-07-27 Thread Mike Perry
George Kadianakis:
> Hello Mike,
> 
> I had a talk with Marc and Mohsen today about WTF-PAD. I now understand
> much more about WTF-PAD and how it works with regards to histograms.  I
> think I might even understand enough to start some sort of conversation
> about it:
> 
> Here are some takeaways:
> 
> 1) Marc and Mohsen think that WTF-PAD might not be the way forward
>    because of its various drawbacks and its complexity. Apparently there
>    are various attacks on WTF-PAD that Roger has discovered (SENDME
>    cells side-channels?) and also the deep learning crowd has done some
>    pretty good damage to the WTF-PAD padding (90%-60% accuracy?). They
>    also told me that achieving needed precision on the timings might be
>    a PITA.

Are there citations for any of this? Last I heard Matt Wright was
working on a deep learning study but the results were mixed.

Furthermore, we need to do adversarial learning and other optimizations
on these histograms to tune them. They are a generalized approach. Just
like it is not a valid evaluation to train a classifier on a dataset and
then add a new defense and show that it can't classify the defended
traffic using the old model, it is similarly not accurate to develop an
attack on WTF-PAD with a new classifier without also adversarially
optimizing the WTF-PAD histograms under that classifier. When you do
this, your results are not invalidating WTF-PAD, they are only
invalidating the histograms that were tuned against the previous
classifier/attack.

The same thing applies to the SENDME concern. The core piece of the
SENDME issue is "Tor should never send more than 1000 cells without a
SENDME. So *IF* I can tell which cells are SENDMEs, and *IF* I see more
than 1000 cells between them, then AHA I know that some cells are
actually padding and not real traffic".

Both of these are very big *IF*s, and even if they were shown to be
valid assumptions (which AFAIK they have not been), that does not mean
that it is actually useful for a classifier to know the percentage of
padding after 1000 cells, and it also does not mean that there isn't a
simple tweak to the histograms that encodes what looks like SENDME
transmission to that classifier.
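
For readers who want the SENDME argument spelled out, here is a minimal Python sketch of the check the hypothetical observer would run. It assumes the first *IF* outright: the observer is handed a perfect per-cell label saying whether each cell is a SENDME. It also collapses both directions into one sequence, which is a simplification. The function name and input format are illustrative and are not taken from Tor's code.

```python
SENDME_WINDOW = 1000  # the limit quoted in the message above


def runs_exceeding_window(cells, window=SENDME_WINDOW):
    """Return the lengths of runs of non-SENDME cells longer than `window`.

    `cells` is an iterable of booleans: True where the observer believes the
    cell is a SENDME, False for every other cell. This is an assumed input
    format; a real observer would not get such labels for free.
    """
    long_runs = []
    run = 0
    for is_sendme in cells:
        if is_sendme:
            if run > window:
                long_runs.append(run)
            run = 0
        else:
            run += 1
    # Also account for cells seen after the last SENDME.
    if run > window:
        long_runs.append(run)
    return long_runs


def minimum_padding(cells, window=SENDME_WINDOW):
    """Lower bound on padding cells implied by each over-long run."""
    return [run - window for run in runs_exceeding_window(cells, window)]


if __name__ == "__main__":
    trace = [False] * 1200 + [True] + [False] * 900 + [True]
    print(runs_exceeding_window(trace), minimum_padding(trace))
```

Even when both assumptions hold, all this yields is a lower bound on the number of padding cells in a run, which is the limited information the paragraph above is talking about.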

> 2) From what I understand you are also hoping to use WTF-PAD to protect
>    against circuit fingerprinting and not just website
>    fingerprinting. They told me that while this might be plausible,
>    there is no current research on how well it can achieve that.  Are we
>    hoping to do that? And what research remains here? How can I help?
>    Which parts of the Tor circuit protocol are we hoping to hide?

I am designing WTF-PAD to be a framework for deploying padding against
arbitrary traffic analysis attacks. It is meant to allow us to define
histograms on the fly (in the Tor consensus) as these are studied. The
fact that they have not yet been studied is not super relevant to
deploying the framework for it now.

> 3) Marc and Mohsen suggested using application-layer defences because
>    the application layer has a much better view of the actual structures
>    that are sent on the wire, instead of the black box view that the
>    network layer has.
>
>    In particular they were mainly concerned about onion services
>    fingerprinting because they are part of a restricted closed world,
>    whereas they were less concerned about the entire internet because of
>    its vast size.
>
>    They suggested that we could investigate using the service-side
>    "alpaca" library for onion services (e.g. as part of securedrop?)
>    which should resolve the most pressing concern of HS identification.

I mean yeah application-layer defenses are useful for website traffic
fingerprinting, but that is a very narrow slice of the traffic analysis
problems that I want this framework to solve.

WTF-PAD also doesn't rule out hidden service operators using alpaca,
either. 

> 4) They also told me of research by Tobias Pulls which eliminates the
>    need for histograms in WTF-PAD and instead samples from the
>    probability distribution directly. They think that this can simplify
>    things somewhat. Any thoughts on this?

Yes this is actually exactly what I want to do with the next iteration
of WTF-PAD! The question is what form/model to use for these probability
distributions. Right now we're encoding inter-burst and inter-packet
timings with some weird geometric distribution determining how long
these bursts should go on for, when it might be more natural to encode
and sample from length-based distributions/histograms.

(Histograms vs distribution is not the problem -- it's what they encode
and how they encode it that matters).
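
As a rough illustration of the histogram-versus-distribution point, the Python sketch below draws a padding delay in two ways: from a binned histogram, and directly from a parametric distribution. Everything here is an assumption made for illustration. The bin layout, the choice of an exponential distribution, and the function names are not taken from WTF-PAD, from Tor, or from the research mentioned above, and the histogram sampler omits any token bookkeeping.

```python
import random


def sample_from_histogram(bins, rng):
    """Draw a delay from a histogram.

    `bins` is a list of (low_seconds, high_seconds, weight) tuples. A bin is
    chosen with probability proportional to its weight, then a delay is drawn
    uniformly inside that bin.
    """
    weights = [weight for _, _, weight in bins]
    low, high, _ = rng.choices(bins, weights=weights, k=1)[0]
    return rng.uniform(low, high)


def sample_from_distribution(mean_delay, rng):
    """Draw a delay directly from an exponential distribution.

    The exponential form is an arbitrary stand-in; which model to use is
    exactly the open question raised in the message above.
    """
    return rng.expovariate(1.0 / mean_delay)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical inter-packet delay histogram, in seconds.
    delay_bins = [(0.0, 0.01, 5), (0.01, 0.1, 3), (0.1, 1.0, 0)]
    print(sample_from_histogram(delay_bins, rng))
    print(sample_from_distribution(0.05, rng))
```

Both samplers return the same kind of value, which is the sense in which the representation matters less than what quantity is being encoded.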

I don't see this paper on Tobias's website. Is it up anywhere yet?
 
> Let me know what you think. I still don't understand the entire space
> completely yet, so please be gentle. ;) 

I hope I was gentle enough. If there's anything that triggers rage mode
in me more than someone being wrong on the inte

[tor-dev] WTF-PAD and the future

2018-07-27 Thread George Kadianakis
Hello Mike,

I had a talk with Marc and Mohsen today about WTF-PAD. I now understand
much more about WTF-PAD and how it works with regards to histograms.  I
think I might even understand enough to start some sort of conversation
about it:

Here are some takeaways:

1) Marc and Mohsen think that WTF-PAD might not be the way forward
   because of its various drawbacks and its complexity. Apparently there
   are various attacks on WTF-PAD that Roger has discovered (SENDME
   cells side-channels?) and also the deep learning crowd has done some
   pretty good damage to the WTF-PAD padding (90%-60% accuracy?). They
   also told me that achieving needed precision on the timings might be
   a PITA.

2) From what I understand you are also hoping to use WTF-PAD to protect
   against circuit fingerprinting and not just website
   fingerprinting. They told me that while this might be plausible,
   there is no current research on how well it can achieve that.  Are we
   hoping to do that? And what research remains here? How can I help?
   Which parts of the Tor circuit protocol are we hoping to hide?

3) Marc and Mohsen suggested using application-layer defences because
   the application layer has a much better view of the actual structures
   that are sent on the wire, instead of the black box view that the
   network layer has.

   In particular they were mainly concerned about onion services
   fingerprinting because they are part of a restricted closed world,
   whereas they were less concerned about the entire internet because of
   its vast size.

   They suggested that we could investigate using the service-side
   "alpaca" library for onion services (e.g. as part of securedrop?)
   which should resolve the most pressing concern of HS identification.

4) They also told me of research by Tobias Pulls which eliminates the
   need for histograms in WTF-PAD and instead samples from the
   probability distribution directly. They think that this can simplify
   things somewhat. Any thoughts on this?

Let me know what you think. I still don't understand the entire space
completely yet, so please be gentle. ;) 

Cheers! :)