Re: [tor-dev] WTF-PAD and the future

2018-08-03 Thread Mike Perry
Hi Yawning!

Yawning Angel:
> On 08/02/2018 08:26 PM, Mike Perry wrote:
> > Should we consider recommending basket2 also?
> 
> No.
> 
> > Is anyone running bridges with it? Probably not, I guess :/.
> 
> No one should be, it is incomplete, buggy, and needs a re-design.

Thanks for the heads up.
 
> As a side note, I question the utility of a PT that has the AGPL3
> network interaction requirement, though there is an exception for
> bridges distributed via BridgeDB and those shipped with Tor Browser.

Would you recommend anything other than obfs4 at this time, as per
that README_SECURITY doc?
(https://github.com/mikeperry-tor/vanguards/blob/master/README_SECURITY.md)


-- 
Mike Perry


___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] WTF-PAD and the future

2018-08-03 Thread Yawning Angel
On 08/02/2018 08:26 PM, Mike Perry wrote:
> Should we consider recommending basket2 also?

No.

> Is anyone running bridges with it? Probably not, I guess :/.

No one should be; it is incomplete, buggy, and needs a redesign.

As a side note, I question the utility of a PT that has the AGPL3
network interaction requirement, though there is an exception for
bridges distributed via BridgeDB and those shipped with Tor Browser.

Regards,

-- 
Yawning Angel





Re: [tor-dev] WTF-PAD and the future

2018-08-02 Thread Mike Perry
Tobias Pulls:
> On 29/07/18 15:42, George Kadianakis wrote:
> >>> 4) They also told me of research by Tobias Pulls which eliminates the
> >>>    needs for histograms in WTF-PAD and instead it samples from the
> >>>    probability distribution directly. They think that this can simplify
> >>>    things somewhat. Any thoughts on this?
> >> Yes this is actually exactly what I want to do with the next iteration
> >> of WTF-PAD! The question is what form/model to use for these probability
> >> distributions. Right now we're encoding inter-burst and inter-packet
> >> timings with some weird geometric distribution determining how long
> >> these bursts should go on for, when it might be more natural to encode
> >> and sample from length-based distributions/histograms.
> >>
> >> (Histograms vs distribution is not the problem -- its what they encode
> >> and how they encode it that matters).
> >>
> >> I don't see this paper on Tobias's website. Is it up anywhere yet?
> >>  
> > Hmm. Looking at the README of wtfpad (see the APE section), I think this
> > blog post is the best resource we have on this:
> >  https://www.cs.kau.se/pulls/hot/thebasketcase-ape/
> 
> Hi George and Mike,
> 
> You found the main writeup of the hasty work I did in this direction a
> while back, also some comments in the source [0]. Unfortunately my
> funding took me in other directions and I didn't want to publish any
> paper without spending more time on it. As written on the blog post it
> looks like a promising direction, but please also note that the attack
> implementation of Wa-kNN used has some rough edges for example when it
> comes to time-based features (so robustness of the naive distributions
> when moving around the PT server far from a given). If someone wants to
> collaborate on this I'd be more than happy to contribute, got funding to
> work on Tor-related things again starting August.

This is great! Sorry it took me so long to reply. I've been deep in it
thinking about related traffic analysis issues with onion services.

I'm very much interested in this direction. This is the post, right:
https://www.cs.kau.se/pulls/hot/thebasketcase-ape/

Did you handle depleting the distributions when normal traffic is
transmitted? Counting traffic that fits the target distribution as
"already sent padding" (and thus sending less padding overall in that
case) is a key piece of WTF-PAD that allows it to have better goodput.
This is in fact why the original e2e defense was called "Adaptive
Padding": its padding distributions adapt to observed traffic.
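
A minimal sketch of that adaptation step, assuming a hypothetical
histogram of inter-packet delays (the `AdaptiveHistogram` class, its bin
edges, and its refill rule are illustrative and are not taken from the
WTF-PAD code): each real packet removes a token from the bin matching
its observed delay, so less padding is drawn from that bin later.

```python
import bisect
import random


class AdaptiveHistogram:
    """Token histogram over inter-packet delays (illustrative only)."""

    def __init__(self, bin_edges_ms, tokens):
        # bin_edges_ms[i] is the upper bound of bin i, in ascending order.
        if len(bin_edges_ms) != len(tokens) or sum(tokens) <= 0:
            raise ValueError("need one token count per bin and at least one token")
        self.bin_edges_ms = list(bin_edges_ms)
        self.initial = list(tokens)
        self.tokens = list(tokens)

    def _refill_if_empty(self):
        # Assumption: start over with the original budget once depleted.
        if sum(self.tokens) == 0:
            self.tokens = list(self.initial)

    def _bin_for(self, delay_ms):
        i = bisect.bisect_left(self.bin_edges_ms, delay_ms)
        return min(i, len(self.tokens) - 1)

    def observe_real_packet(self, delay_ms):
        """Count a real packet as 'already sent padding'."""
        self._refill_if_empty()
        target = self._bin_for(delay_ms)
        nonempty = [j for j, t in enumerate(self.tokens) if t > 0]
        # Take the token from the matching bin, else the nearest non-empty one.
        j = min(nonempty, key=lambda k: abs(k - target))
        self.tokens[j] -= 1

    def sample_padding_delay(self, rng=random):
        """Pick a delay for the next padding packet from the remaining tokens."""
        self._refill_if_empty()
        i = rng.choices(range(len(self.tokens)), weights=self.tokens)[0]
        self.tokens[i] -= 1
        return self.bin_edges_ms[i]
```

With a parameterized distribution there is no token to remove, which is
the part that would need an equivalent mechanism.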

If we could alter the distribution in this same way, it may be a good
way to go. However, histograms tend to be easier to do this with, and
they also encode distributions (just perhaps more tediously and
verbosely).

One of the other things I want to try, that may overlap, is changing the
type of information the distribution/histogram encodes. Inter-packet and
inter-burst delay (encoded as two separate states in the state machines)
is perhaps not as optimal or useful or easy to specify/optimize as
something more naturally resembling web traffic, such as a distribution
of request sizes and object sizes, with some way to simulate concurrent
fetch (selection of overlap) of these object sizes, and to subtract these
object-size instances from the distribution when we see them.

What do you think about that? Does that make sense?

Do you think we should try to do this as a parameterized distribution,
or as a histogram?
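
To make the two options concrete, a small sketch: a parameterized
distribution is just a few numbers to ship and sample from, while a
histogram is a binned table. The log-normal shape, the bin edges, and
the function names here are assumptions for illustration, not something
taken from APE or WTF-PAD.

```python
import random


def sample_parametric(mu, sigma, rng):
    """Draw one value from a log-normal distribution (assumed shape)."""
    return rng.lognormvariate(mu, sigma)


def build_histogram(samples, upper_edges):
    """Bin samples; values above the last edge land in the last bin."""
    counts = [0] * len(upper_edges)
    for s in samples:
        for i, edge in enumerate(upper_edges):
            if s <= edge:
                counts[i] += 1
                break
        else:
            counts[-1] += 1
    return counts


def sample_histogram(upper_edges, counts, rng):
    """Draw a bin in proportion to its count and return its upper edge."""
    i = rng.choices(range(len(upper_edges)), weights=counts)[0]
    return upper_edges[i]


rng = random.Random(2018)
edges = [0.5, 1.0, 2.0, 4.0, 8.0]
# Two parameters (mu, sigma) versus five bin counts for the same shape.
training = [sample_parametric(0.0, 1.0, rng) for _ in range(10_000)]
counts = build_histogram(training, edges)
```

The histogram form loses resolution inside each bin, but its counts can
be decremented directly when matching real traffic is seen.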

Are you interested in attempting to implement both/either?

> [0]: https://github.com/pylls/basket2/blob/master/padding_ape.go

Ooh nice! This is done as a PT implementation. 

You might like:
https://github.com/mikeperry-tor/vanguards/blob/master/README_SECURITY.md

In it, I recommend obfs4 with iat-mode=2 because it does some limited
packet size and timing obfuscation. Should we consider
recommending basket2 also? Is anyone running bridges with it? Probably
not, I guess :/.

-- 
Mike Perry




Re: [tor-dev] WTF-PAD and the future

2018-07-29 Thread teor

> On 29 Jul 2018, at 23:42, George Kadianakis wrote:
> 
>>> 2) From what I understand you are also hoping to use WTF-PAD to protect
>>>   against circuit fingerprinting and not just website
>>>   fingerprinting. They told me that while this might be plausible,
>>>   there is no current research on how well it can achieve that.  Are we
>>>   hoping to do that? And what research remains here? How can I help?
>>>   Which parts of the Tor circuit protocol are we hoping to hide?
>> 
>> I am designing WTF-PAD to be a framework for deploying padding against
>> arbitrary traffic analysis attacks. It is meant to allow us to define
>> histograms on the fly (in the Tor consensus) as these are studied. The
>> fact that they have not yet been studied is not super relevant to
>> deploying the framework for it now.
>> 
> 
> ACK.
> 
> What other traffic analysis attacks are we looking at addressing here?
> 
> I'm thinking of stuff like "circuit fingerprinting of onion services",
> but I wonder if histograms and random sampling is too crude to actually
> be able to help against sophisticated attacks. I don't have a suggestion
> for something better currently.
> 
> On that topic, is it decided whether the adaptive padding of WTF-PAD
> will also happen during circuit construction, or only after that?

Padding during circuit construction should work with VPADDING cells:
https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n508

At least it did last time I checked:
https://github.com/teor2345/endosome/blob/master/client-or-22929.py
https://trac.torproject.org/projects/tor/ticket/22929

We should avoid using PADDING cells during the handshake, because Tor
sometimes closes the connection:
https://github.com/teor2345/endosome/blob/master/client-or-22934.py
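
For reference, a rough sketch of how a variable-length VPADDING cell
could be packed, assuming the variable-length cell layout from tor-spec
(CircID, Command, Length, Payload) and command value 128 for VPADDING.
The helper names and the all-zero payload are illustrative, and the
CircID width in effect at a given point in the handshake should be
checked against the spec.

```python
import struct

# Command values as listed in tor-spec.
CMD_PADDING = 0      # fixed-length padding cell
CMD_VERSIONS = 7     # variable-length
CMD_VPADDING = 128   # variable-length padding cell


def pack_var_cell(command, payload, circ_id=0, circ_id_len=4):
    """Pack CircID | Command (1 byte) | Length (2 bytes, big-endian) | Payload."""
    # Assumption: 4-byte CircID for link protocol 4 and later, 2 bytes before.
    circ_fmt = {2: ">H", 4: ">L"}[circ_id_len]
    header = struct.pack(circ_fmt, circ_id) + struct.pack(">BH", command, len(payload))
    return header + payload


def vpadding_cell(length, circ_id_len=4):
    """Build a VPADDING cell with `length` zero bytes of payload (hypothetical helper)."""
    return pack_var_cell(CMD_VPADDING, bytes(length), circ_id_len=circ_id_len)
```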

T

--
teor

Please reply @torproject.org
New subkeys 1 July 2018
PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B
--




Re: [tor-dev] WTF-PAD and the future

2018-07-29 Thread Tobias Pulls
On 29/07/18 15:42, George Kadianakis wrote:
>>> 4) They also told me of research by Tobias Pulls which eliminates the
>>>    needs for histograms in WTF-PAD and instead it samples from the
>>>    probability distribution directly. They think that this can simplify
>>>    things somewhat. Any thoughts on this?
>> Yes this is actually exactly what I want to do with the next iteration
>> of WTF-PAD! The question is what form/model to use for these probability
>> distributions. Right now we're encoding inter-burst and inter-packet
>> timings with some weird geometric distribution determining how long
>> these bursts should go on for, when it might be more natural to encode
>> and sample from length-based distributions/histograms.
>>
>> (Histograms vs distribution is not the problem -- its what they encode
>> and how they encode it that matters).
>>
>> I don't see this paper on Tobias's website. Is it up anywhere yet?
>>  
> Hmm. Looking at the README of wtfpad (see the APE section), I think this
> blog post is the best resource we have on this:
>  https://www.cs.kau.se/pulls/hot/thebasketcase-ape/

Hi George and Mike,

You found the main writeup of the hasty work I did in this direction a
while back; there are also some comments in the source [0]. Unfortunately
my funding took me in other directions and I didn't want to publish any
paper without spending more time on it. As written in the blog post, it
looks like a promising direction, but please also note that the attack
implementation of Wa-kNN used has some rough edges, for example when it
comes to time-based features (so the robustness of the naive
distributions when moving the PT server around is far from a given). If
someone wants to collaborate on this I'd be more than happy to
contribute; I got funding to work on Tor-related things again starting
in August.

Best,
Tobias

[0]: https://github.com/pylls/basket2/blob/master/padding_ape.go


Re: [tor-dev] WTF-PAD and the future

2018-07-29 Thread George Kadianakis
Mike Perry writes:

> George Kadianakis:
>> Hello Mike,
>> 
>> I had a talk with Marc and Mohsen today about WTF-PAD. I now understand
>> much more about WTF-PAD and how it works with regards to histograms.  I
>> think I might even understand enough to start some sort of conversation
>> about it:
>> 
>> Here are some takeaways:
>> 
>> 1) Marc and Mohsen think that WTF-PAD might not be the way forward
>>    because of its various drawbacks and its complexity. Apparently there
>>    are various attacks on WTF-PAD that Roger has discovered (SENDME
>>    cells side-channels?) and also the deep learning crowd has done some
>>    pretty good damage to the WTF-PAD padding (90%-60% accuracy?). They
>>    also told me that achieving needed precision on the timings might be
>>    a PITA.
>
> Are there citations for any of this? Last I heard Matt Wright was
> working on a deep learning study but the results were mixed.
>

I think this is the best we have in terms of public results:
  https://arxiv.org/abs/1801.02265

>> 2) From what I understand you are also hoping to use WTF-PAD to protect
>>    against circuit fingerprinting and not just website
>>    fingerprinting. They told me that while this might be plausible,
>>    there is no current research on how well it can achieve that.  Are we
>>    hoping to do that? And what research remains here? How can I help?
>>    Which parts of the Tor circuit protocol are we hoping to hide?
>
> I am designing WTF-PAD to be a framework for deploying padding against
> arbitrary traffic analysis attacks. It is meant to allow us to define
> histograms on the fly (in the Tor consensus) as these are studied. The
> fact that they have not yet been studied is not super relevant to
> deploying the framework for it now.
>

ACK.

What other traffic analysis attacks are we looking at addressing here?

I'm thinking of stuff like "circuit fingerprinting of onion services",
but I wonder if histograms and random sampling are too crude to actually
be able to help against sophisticated attacks. I don't have a suggestion
for something better currently.

On that topic, is it decided whether the adaptive padding of WTF-PAD
will also happen during circuit construction, or only after that?

>> 3) Marc and Mohsen suggested using application-layer defences because
>>    the application-layer has much better view of the actual structures
>>    that are sent on the wire, instead of the black box view that the
>>    network layer has.
>> 
>>    In particular they were mainly concerned about onion services
>>    fingerprinting because they are part of a restricted closed world,
>>    whereas they were less concerned about the entire internet because of
>>    its vast size.
>> 
>>    They suggested that we could investigate using the service-side
>>    "alpaca" library for onion services (e.g. as part of securedrop?)
>>    which should resolve the most pressing concern of HS identification.
>
> I mean yeah application-layer defenses are useful for website traffic
> fingerprinting, but that is a very narrow slice of the traffic analysis
> problems that I want this framework to solve.
>
> WTF-PAD also doesn't rule out hidden service operators using alpaca,
> either. 
>

Agreed.

>> 4) They also told me of research by Tobias Pulls which eliminates the
>>    needs for histograms in WTF-PAD and instead it samples from the
>>    probability distribution directly. They think that this can simplify
>>    things somewhat. Any thoughts on this?
>
> Yes this is actually exactly what I want to do with the next iteration
> of WTF-PAD! The question is what form/model to use for these probability
> distributions. Right now we're encoding inter-burst and inter-packet
> timings with some weird geometric distribution determining how long
> these bursts should go on for, when it might be more natural to encode
> and sample from length-based distributions/histograms.
>
> (Histograms vs distribution is not the problem -- its what they encode
> and how they encode it that matters).
>
> I don't see this paper on Tobias's website. Is it up anywhere yet?
>  

Hmm. Looking at the README of wtfpad (see the APE section), I think this
blog post is the best resource we have on this:
 https://www.cs.kau.se/pulls/hot/thebasketcase-ape/



Re: [tor-dev] WTF-PAD and the future

2018-07-27 Thread Mike Perry
George Kadianakis:
> Hello Mike,
> 
> I had a talk with Marc and Mohsen today about WTF-PAD. I now understand
> much more about WTF-PAD and how it works with regards to histograms.  I
> think I might even understand enough to start some sort of conversation
> about it:
> 
> Here are some takeaways:
> 
> 1) Marc and Mohsen think that WTF-PAD might not be the way forward
>    because of its various drawbacks and its complexity. Apparently there
>    are various attacks on WTF-PAD that Roger has discovered (SENDME
>    cells side-channels?) and also the deep learning crowd has done some
>    pretty good damage to the WTF-PAD padding (90%-60% accuracy?). They
>    also told me that achieving needed precision on the timings might be
>    a PITA.

Are there citations for any of this? Last I heard Matt Wright was
working on a deep learning study but the results were mixed.

Furthermore, we need to do adversarial learning and other optimizations
on these histograms to tune them. They are a generalized approach. Just
like it is not a valid evaluation to train a classifier on a dataset and
then add a new defense and show that it can't classify the defended
traffic using the old model, it is similarly not accurate to develop an
attack on WTF-PAD with a new classifier without also adversarially
optimizing the WTF-PAD histograms under that classifier. When you do
this, your results are not invalidating WTF-PAD, they are only
invalidating the histograms that were tuned against the previous
classifier/attack.

The same thing applies to the SENDME concern. The core piece of the
SENDME issue is "Tor should never send more than 1000 cells without a
SENDME. So *IF* I can tell which cells are SENDMEs, and *IF* I see more
than 1000 cells between them, then AHA I know that some cells are
actually padding and not real traffic".

Both of these are very big *IF*s, and even if they were shown to be
valid assumptions (which AFAIK they have not been), that does not mean
that it is actually useful for a classifier to know the percentage of
padding after 1000 cells, and it also does not mean that there isn't a
simple tweak to the histograms that encodes what looks like SENDME
transmission to that classifier.
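
To make the assumed observation concrete, here is a small sketch of the
counting argument only. It assumes the adversary already has a perfect
label for which cells are SENDMEs (the first big *IF*), takes the
1000-cell figure from the description above, and ignores cell direction
and every other detail of Tor's flow control. The function name and the
labels are hypothetical.

```python
WINDOW = 1000  # cells allowed without a SENDME, per the description above


def min_padding_between_sendmes(labels, window=WINDOW):
    """Return, per interval between SENDMEs, a lower bound on padding cells.

    `labels` is the observed cell sequence, each entry either "sendme"
    or "other". Any cells beyond `window` in one interval cannot all be
    real traffic, so the excess is a lower bound on padding.
    """
    bounds = []
    run = 0
    for label in labels:
        if label == "sendme":
            bounds.append(max(0, run - window))
            run = 0
        else:
            run += 1
    # Trailing cells after the last SENDME form a final interval.
    bounds.append(max(0, run - window))
    return bounds


observed = ["other"] * 1200 + ["sendme"] + ["other"] * 300
excess = min_padding_between_sendmes(observed)
```

Note that the bound says nothing about which cells are padding, only
that some number of them must be.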

> 2) From what I understand you are also hoping to use WTF-PAD to protect
>    against circuit fingerprinting and not just website
>    fingerprinting. They told me that while this might be plausible,
>    there is no current research on how well it can achieve that.  Are we
>    hoping to do that? And what research remains here? How can I help?
>    Which parts of the Tor circuit protocol are we hoping to hide?

I am designing WTF-PAD to be a framework for deploying padding against
arbitrary traffic analysis attacks. It is meant to allow us to define
histograms on the fly (in the Tor consensus) as these are studied. The
fact that they have not yet been studied is not super relevant to
deploying the framework for it now.

> 3) Marc and Mohsen suggested using application-layer defences because
>    the application-layer has much better view of the actual structures
>    that are sent on the wire, instead of the black box view that the
>    network layer has.
> 
>    In particular they were mainly concerned about onion services
>    fingerprinting because they are part of a restricted closed world,
>    whereas they were less concerned about the entire internet because of
>    its vast size.
> 
>    They suggested that we could investigate using the service-side
>    "alpaca" library for onion services (e.g. as part of securedrop?)
>    which should resolve the most pressing concern of HS identification.

I mean, yeah, application-layer defenses are useful for website traffic
fingerprinting, but that is a very narrow slice of the traffic analysis
problems that I want this framework to solve.

WTF-PAD also doesn't rule out hidden service operators using alpaca,
either. 

> 4) They also told me of research by Tobias Pulls which eliminates the
>    needs for histograms in WTF-PAD and instead it samples from the
>    probability distribution directly. They think that this can simplify
>    things somewhat. Any thoughts on this?

Yes this is actually exactly what I want to do with the next iteration
of WTF-PAD! The question is what form/model to use for these probability
distributions. Right now we're encoding inter-burst and inter-packet
timings with some weird geometric distribution determining how long
these bursts should go on for, when it might be more natural to encode
and sample from length-based distributions/histograms.
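
A rough sketch of what the current encoding amounts to, as described
above: inter-burst and inter-packet delays are drawn from two separate
histograms, and a per-packet stop probability makes the burst length
geometric. The function, its parameters, and the example delay values
are hypothetical and are not taken from the WTF-PAD code.

```python
import random


def padding_schedule(n_bursts, p_end, inter_packet_ms, inter_burst_ms, rng):
    """Return send times (ms) for padding packets over `n_bursts` bursts.

    After every packet the burst ends with probability `p_end`, so the
    number of packets per burst is geometrically distributed.
    """
    if not 0.0 < p_end <= 1.0:
        raise ValueError("p_end must be in (0, 1]")
    t = 0.0
    times = []
    for _ in range(n_bursts):
        t += rng.choice(inter_burst_ms)       # gap state: wait for the next burst
        while True:
            times.append(t)                   # burst state: send one padding packet
            if rng.random() < p_end:
                break
            t += rng.choice(inter_packet_ms)  # short delay inside the burst
    return times


rng = random.Random(7)
times = padding_schedule(5, 0.25, [1, 2, 5], [100, 200], rng)
```

A length-based encoding would instead draw the burst size directly and
derive the packet count from it.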

(Histograms vs distribution is not the problem -- it's what they encode
and how they encode it that matters).

I don't see this paper on Tobias's website. Is it up anywhere yet?
 
> Let me know what you think. I still don't understand the entire space
> completely yet, so please be gentle. ;) 

I hope I was gentle enough. If there's anything that triggers rage mode
in me more than someone being wrong on the 

[tor-dev] WTF-PAD and the future

2018-07-27 Thread George Kadianakis
Hello Mike,

I had a talk with Marc and Mohsen today about WTF-PAD. I now understand
much more about WTF-PAD and how it works with regards to histograms.  I
think I might even understand enough to start some sort of conversation
about it:

Here are some takeaways:

1) Marc and Mohsen think that WTF-PAD might not be the way forward
   because of its various drawbacks and its complexity. Apparently there
   are various attacks on WTF-PAD that Roger has discovered (SENDME
   cells side-channels?) and also the deep learning crowd has done some
   pretty good damage to the WTF-PAD padding (90%-60% accuracy?). They
   also told me that achieving needed precision on the timings might be
   a PITA.

2) From what I understand you are also hoping to use WTF-PAD to protect
   against circuit fingerprinting and not just website
   fingerprinting. They told me that while this might be plausible,
   there is no current research on how well it can achieve that.  Are we
   hoping to do that? And what research remains here? How can I help?
   Which parts of the Tor circuit protocol are we hoping to hide?

3) Marc and Mohsen suggested using application-layer defences because
   the application-layer has much better view of the actual structures
   that are sent on the wire, instead of the black box view that the
   network layer has.

   In particular they were mainly concerned about onion services
   fingerprinting because they are part of a restricted closed world,
   whereas they were less concerned about the entire internet because of
   its vast size.

   They suggested that we could investigate using the service-side
   "alpaca" library for onion services (e.g. as part of securedrop?)
   which should resolve the most pressing concern of HS identification.

4) They also told me of research by Tobias Pulls which eliminates the
   need for histograms in WTF-PAD and instead samples from the
   probability distribution directly. They think that this can simplify
   things somewhat. Any thoughts on this?

Let me know what you think. I still don't understand the entire space
completely yet, so please be gentle. ;) 

Cheers! :)