Re: [tor-dev] HSDir Auth and onion descriptor scraping

2014-11-10 Thread Gareth Owen
Grarpamp

The only reason I'm not publishing it is privacy concerns - ultimately some HS
operators might not wish to have their existence publicly known.  I
would be open to supplying it to bona fide and verifiable Tor Project
members if it is for a legitimate research purpose.

I am collecting version 2 descriptors.  I have exactly 445994 hidden
service descriptors - for approximately 70,000 unique hidden services.  I
do not believe the introduction points are secret; having a list of IPs
doesn't help you connect to the hidden service.

Best
Gareth

On 9 November 2014 23:39, grarpamp  wrote:

> On Sun, Nov 9, 2014 at 3:22 PM, Gareth Owen 
> wrote:
> > I have several hundred thousand (or million? Haven't counted) hs
> descriptors
> > saved on my hard disk from a data collection experiment (from 70k HSes).
> > I'm a bit nervous about sharing these en masse as whilst not confidential
> > they're supposed to be difficult to obtain in this quantity.  However, if
> > someone wants to write a quick script that goes through all of them and
> > counts the number of authenticated vs nonauthed then I do not mind
> running
> > it on the dataset and publishing the results.  I have a directory where
> each
> > file is a hs descriptor.
> >
> > The introduction point data is base64-encoded plaintext when unauthed, or
> > has high entropy otherwise.
>
> What version descriptors are you collecting?
>
> There are a few reports I could think to run against your dataset, even if
> the IntroPoints were replaced with 127.0.0.n (n set to 1, 2, 3, n for each
> IntroPoint in respective descriptors list)... or even 1:1 mapped for all
> descriptors either a) randomly into a new parallel IPv4/IPv6 space
> (dot-quad),
> or b) serially into a respective 32 or 128 bit number (not dot-quad).
>
> Whether on or off list I could use your collection patches, and a raw
> sample of a single recent on disk descriptor from a public service such as
> hbjw7wjeoltskhol or kpvz7ki2v5agwt35 so we know your data format.
>
> It's effectively public info anyway; I'll get to it sooner or later, and
> others already have.
> ___
> tor-dev mailing list
> tor-dev@lists.torproject.org
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
>



-- 
Dr Gareth Owen
Senior Lecturer
Forensic Computing Course Leader
School of Computing, University of Portsmouth

*Office:* BK1.25
*Tel:* +44 (0)2392 84 (6423)
*Web*: ghowen.me


Re: [tor-dev] Hidden Service authorization UI

2014-11-10 Thread Gareth Owen
It is verifiable.  In authenticated hidden services, the introduction
points are first encrypted and then base64 encoded.  So a simple test is:
when base64 decoded, is the most significant bit set on any byte?  If yes,
then it's probably authenticated; otherwise not.
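
A minimal sketch of this test in Python (the function name is mine for illustration; this is not the Tor research framework API):

```python
import base64

def probably_authenticated(intro_points_b64):
    """Heuristic described above: encrypted introduction-point blobs look
    like uniformly random bytes, so some decoded byte almost certainly has
    its most significant bit set; an unauthenticated descriptor decodes to
    plain ASCII text, where the MSB is always clear."""
    raw = base64.b64decode(intro_points_b64)
    return any(b & 0x80 for b in raw)

# Plain ASCII intro-point text: looks unauthenticated
plaintext = base64.b64encode(b"introduction-point exampleid").decode()
# Random-looking (encrypted) bytes: looks authenticated
ciphertext = base64.b64encode(bytes([0x8f, 0x12, 0xe0, 0x41])).decode()
```

Note the test is probabilistic: a short encrypted blob could by chance contain only low bytes, though for realistic blob lengths that is vanishingly unlikely.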

Note: you can use the Tor research framework to fetch any hidden service
descriptor; it will even parse the document and pull out the introduction
point text.

Best
Gareth

On 10 November 2014 07:42, Andrea Shepard  wrote:

> On Sun, Nov 09, 2014 at 09:16:40PM -0500, Griffin Boyce wrote:
> > On 2014-11-09 15:30, Fabio Pietrosanti - lists wrote:
> > >On 11/9/14 8:58 PM, Jacob Appelbaum wrote:
> > >>>For example, it would be interesting if TBB would allow people to
> > >>>input a password/pubkey upon visiting a protected HS. Protected HSes
> > >>>can be recognized by looking at the "authentication-required"
> > >>>field of
> > >>>the HS descriptor. Typing your password on the browser is much more
> > >>>useable than editing a config file.
> > >>That sounds interesting.
> > >
> > >Also i love this idea but i would suggest to preserve the copy&paste
> > >self-authenticated URL property of TorHS, also in presence of
> > >authorization.
> >
> >   I'm conflicted about this idea.  Much better for usability ~but~
> > there should be an option for authenticated hidden services that
> > want to *not* prompt and instead fail silently if the key isn't in
> > the torrc (or x.y.onion url, depending on the design).
> >
> >   Use case: if someone finds my hidden service url written in my
> > planner while traveling across the border, they might visit it to
> > see what it contains. If it offers a prompt, then they know it
> > exists and can press me for the auth key (perhaps with an M4
> > carbine).  If there's no prompt and the request fails, then perhaps
> > it "used to exist" a long time ago, or I wrote down an example URL.
> >
> > best,
> > Griffin
>
> I believe it's verifiable whether an authenticated HS exists anyway; you
> can
> get the descriptor, but the list of intro points is encrypted.
>
> --
> Andrea Shepard
> 
> PGP fingerprint (ECC): BDF5 F867 8A52 4E4A BECF  DE79 A4FF BC34 F01D D536
> PGP fingerprint (RSA): 3611 95A4 0740 ED1B 7EA5  DF7E 4191 13D9 D0CF BDA5
>




Re: [tor-dev] HSDir Auth and onion descriptor scraping

2014-11-10 Thread George Kadianakis
Gareth Owen  writes:

> Grarpamp
>
> The only reason I'm not publishing it is privacy concerns - ultimately some HS
> operators might not wish to have their existence publicly known.  I
> would be open to supplying it to bona fide and verifiable Tor Project
> members if it is for a legitimate research purpose.
>
> I am collecting version 2 descriptors.  I have exactly 445994 hidden
> service descriptors - for approximately 70,000 unique hidden services.  I
> do not believe the introduction points are secret, having a list of IPs
> doesn't help you connect to the hidden service.
>

From the number of introduction points you might be able to deduce the
popularity of the hidden service. Fortunately, this feature doesn't
work very well: https://trac.torproject.org/projects/tor/ticket/8950


Re: [tor-dev] HSDir Auth and onion descriptor scraping - actual stats

2014-11-10 Thread Gareth Owen
OK, curiosity got the better of me.  I took a random sample of 20,368 HS
descriptors and just 131 were authenticated - that's about 0.6%.

The code I used is here:
https://github.com/drgowen/tor-research-framework/blob/master/src/main/java/tor/examples/HSIsAuthed.java

Best
Gareth

PS - I only took a sample (rather than the whole batch) because each HS
descriptor is in its own file and it takes ages to process; nevertheless,
because of the way it's collected, the sample should be representative.
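
As a rough sanity check on that 0.6% figure (my own sketch, not part of the original measurement), a normal-approximation binomial confidence interval suggests the sampling error is small:

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for a binomial
    proportion: p +/- z * sqrt(p(1-p)/n)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width

p, lo, hi = proportion_ci(131, 20368)
# p is about 0.0064 (0.64%), with a 95% CI of roughly [0.53%, 0.75%]
```

So even with only a sample, the "under 1% authenticated" conclusion is well supported.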

-- 
Dr Gareth Owen
Senior Lecturer
Forensic Computing Course Leader
School of Computing, University of Portsmouth

*Office:* BK1.25
*Tel:* +44 (0)2392 84 (6423)
*Web*: ghowen.me


Re: [tor-dev] Hidden Service authorization UI

2014-11-10 Thread Nathan Freitas
On Sun, Nov 9, 2014, at 07:50 AM, George Kadianakis wrote:
> Hidden Service authorization is a pretty obscure feature of HSes, that
> can be quite useful for small-to-medium HSes.
... 
> For example, it would be interesting if TBB would allow people to
> input a password/pubkey upon visiting a protected HS. Protected HSes
> can be recognized by looking at the "authentication-required" field of
> the HS descriptor. Typing your password on the browser is much more
> useable than editing a config file.

We have been working on implementing an OnionShare feature for Orbot, as
a plugin/add-on. Since the client and the server are both one simple
app, it seems like we could easily implement the HS authorization
feature. Since the goal is to share a file with a small audience, the
second "client key" approach seems the best and most secure to me.

Any thoughts?


Re: [tor-dev] Pluggable-transport implementations of your website fingerprinting defenses

2014-11-10 Thread David Fifield
On Sun, Nov 09, 2014 at 08:23:33PM -0500, Xiang Cai wrote:
> I started to work on csbuflo code a long time ago, and I wasn’t using any
> version control software back then, so I don’t have file commit history 
> either…
> Sorry about that.
> 
> However, I only modified several core files based on openssh-5.9p1 source 
> code:
> clientloop.c
> serverloop.c
> packet.c
> misc.c
> and related header files. A simple diff between these files and the original
> ssh code will tell you what I modified.
> 
> I am not sure if the code is directly useable for your purpose, but I’ll
> briefly talk about what my code does, and hopefully, it will give you some 
> help
> when reading the code. 
> 
> The code actually implements the Glove system, which requires that both the
> client and server have a transcript of “super traces”. — I believe the 
> location
> of the transcript is hardcoded as ‘/var/tmp/st.txt’ in my code …
> When visiting a website that is not shown in the transcript, the system falls
> back to use CSBuFLO scheme.

I see. Thanks for the references.

Building this code into a pluggable transport would be more work than I
had originally supposed. If there were a minimal network client and
server, only implementing the website fingerprinting defense, then I
wouldn't mind spending half a day to make them into a pluggable
transport. But as it stands, it looks like it will be more work,
essentially a reimplementation.

David Fifield


Re: [tor-dev] Defending against guard discovery attacks by pinning middle nodes

2014-11-10 Thread Mike Perry
A. Johnson:
> > It seems to me that we want to defend against (at least) two
> > different attacks here:
> > 
> > Sybil attack:
> ...
> > Coercion attack:
> 
> Yes, I also am currently thinking about the problem in this way.
> 
> >  Unfortunately, it doesn't really make sense to add two '5 day
> >  guards' in a circuit, since a Sybil adversary has equal chances to
> >  pop at the guard nearest to the HS.
> 
> Yup.
> 
> > * While more hops are useless for Sybil attacks, they actually help
> > against coercion attacks. Unfortunately, they only add 5 days per
> > extra hop to the time to deanonymization.
> 
> And yes again. In this model, an ultra-mega-secret HS should use a
> long chain of guards. Of course, at some point, it is easier to do a
> congestion attack to identify the first guard being used by the HS.
> That is still a win, though, in that such an attack takes more
> technical skill and effort.

I think this brings up another good point about hidden services that the
"you must use a single guard forever" model causes us to ignore.

Imagine if we had k-Conflux routing, where you could pick 1-k guards,
and send packets to a single exit or RP.

Under this model, congestion attacks would be more difficult, because
you have to also do the combinatorics to choke out the Choose(N, k)
combinations of k of N total guards. If you only choked a subset, conflux
would rebalance to the remaining free paths (assuming good, responsive
flow control).
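
To make the combinatorics concrete (a back-of-the-envelope sketch with illustrative numbers, not figures from the thread):

```python
from math import comb

# A congestion attacker facing a k-Conflux client has Choose(N, k) guard
# combinations to reason about, rather than one fixed guard. With, say,
# N = 9 candidate guards and k = 3 used per circuit:
N, k = 9, 3
paths = comb(N, k)  # number of distinct 3-guard subsets out of 9 guards
```

Each subset the attacker fails to choke leaves a path that conflux can rebalance onto, which is what makes the attack harder than against a single guard.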

> > * It seems that coercion attacks are noisy. At least in this case,
> > relays got seized (why?) and people got notified that something was
> > going on. It would be nice if we could make coercion attacks even
> > more noisy, so that adversaries can't do them without tipping off
> > the whole network.
> 
> I’m not optimistic about this. Surveillance is no good if the target
> is aware of it, and so it can be expected to be difficult to detect.
> 
> > * The more I think about this problem, the more I realize that our
> > solutions are quite hacky. Maybe guards are not the right layer to
> > fix this problem, and we should try to fix the guard discovery
> > problem in circuit establishment as Mike has been suggesting?
> > Unfortunately, the virtual circuits idea seems hard to analyze and
> > do securely.
> 
> What do you mean by "the guard discovery problem in circuit
> establishment”? Do you mean using some level of traffic padding to
> make it difficult to determine when your relay is directly observing
> an HS guard? This seems straightforward to do just by making every
> relay see the same type and number of cells in every non-terminal
> position in the circuit during circuit creation (some will have no
> effect, detectable only by the last relay). I do worry about how the
> cell RTTs could still leak your relative circuit position. Ignoring
> that, maybe you can make it so that the adversary either (i) has to
> start surveillance on an observed hop and hope that it is a relatively
> static guard close to the HS or (ii) has to wait until some relay is
> observed *multiple* times from the malicious relays to be sure that it
> is in some layer of guards for the targeted HS.

I think padding and other defenses against introducing active side
channels are fine longer-term plans, but I think what George means is
that we should re-design HS path selection specifically for this guard
discovery threat, taking it as a given that the adversary is able to use
*some* side channel to determine that it is in the path and knows its
position.

While I am skeptical of passive timing attacks at scale, I am not so
crazy as to believe that it is impossible to *actively* encode a
reliable side channel in a traffic pattern that you control (which is
used in the HS guard discovery research). Conflux may also make these
types of side channels more difficult to construct, but I suspect that
just means you have to send more data as an attacker. Since you can send
a bunch of data at the HS for most application layer protocols (HTTP
POST comes to mind), this is probably not a huge barrier.


After reading your back-of-the-envelope calculations on node rotation
and guard lifetime for multi-tiered guards in your parent reply, I think
that it stands to reason that some topology like this is best (with
Conflux):

HS -> Guard_1 -> Guard_2 -> Guard_3 -> RP

The idea would be that Guard_3 would rotate on the order of hours,
Guard_2 would come from a set that is rotated on the order of days
(based on the expected duration for the adversary to become Guard_3), and
Guard_1 would rotate on the order of months (based on the expected
duration for the adversary to become Guard_2).

Based on your math in your parent reply, this now does strike me as
superior to what I actually proposed with "virtual circuits", since if
the whole virtual circuit had a lifetime of hours, you'd still have only
days before the adversary ended up in the Guard_2 position.
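
The rotation argument can be sketched numerically: if the adversary controls fraction p of the relevant consensus weight, each rotation is roughly an independent Bernoulli trial, so the expected wait before the adversary first occupies a position is about period/p (illustrative numbers below are mine, not from the thread):

```python
def expected_hours_to_occupy(rotation_period_hours, adv_selection_prob):
    """Expected time until an adversary with per-rotation selection
    probability p first lands in a rotating position. The number of
    rotations needed is geometrically distributed with mean 1/p."""
    return rotation_period_hours / adv_selection_prob

# e.g. Guard_3 rotating every 6 hours against a 1% adversary:
# about 600 hours (~25 days) before the adversary first becomes Guard_3
```

This is the quantity the Guard_2 and Guard_1 lifetimes would need to be tuned against.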

I would prefer if we had closed-form or code-based versions of this
calculation, though, so we could play with set sizes and lifespans for
the 3 hops. We also want to play with questions like "How many circuits
should we use at a time?" and "How big should the set of middle nodes we
choose from be?" Making all of that parameterized will help us tune it.
Heck, the surface might even be plottable or traversable with gradient
descent.

Re: [tor-dev] Defending against guard discovery attacks by pinning middle nodes

2014-11-10 Thread A. Johnson

>> And yes again. In this model, an ultra-mega-secret HS should use a
>> long chain of guards. Of course, at some point, it is easier to do a
>> congestion attack to identify the first guard being used by the HS.
>> That is still a win, though, in that such an attack takes more
>> technical skill and effort.
> 
> I think this brings up another good point about hidden services that the
> "you must use a single guard forever" model causes us to ignore.
> 
> Imagine if we had k-Conflux routing, where you could pick 1-k guards,
> and send packets to a single exit or RP.
> 
> Under this model, congestion attacks would be more difficult, because
> you have to also do the combinatorics to choke out the Choose(N, k)
> combinations of k of N total guards. If you only choked a subset, conflux
> would rebalance to the remaining free paths (assuming good, responsive
> flow control).

That is a good point, and if you had a set of guards for which you had high 
collective trust, then I think multiple guards for load balancing and making 
congestion attacks more difficult would be a good idea. I think you need them 
to be collectively trustworthy because otherwise each added guard gives the 
adversary another chance to be selected, which multiplies his ability to 
observe user traffic in the network. I am skeptical that you wouldn’t be able 
to apply the congestion attack to the guards one-by-one, because the load 
balancing would take some time to notice the throughput drop and adjust. If you 
had multiple guards, then you could use them to provide stronger protection 
against discovery via congestion by redundantly sending the same packet to each 
one (PF Syverson and I analyzed this idea in a PETS10 paper).

> After reading your back-of-the-envelope calculations on node rotation
> and guard lifetime for multi-tiered guards in your parent reply, I think
> that it stands to reason that some topology like this is best (with
> Conflux):
> 
> HS -> Guard_1 -> Guard_2 -> Guard_3 -> RP
> 
> The idea would be that Guard_3 would rotate on the order of hours,
> Guard_2 would come from a set that is rotated on the order of days
> (based on the expected duration for the adversary to become Guard_3), and
> Guard_1 would rotate on the order of months (based on the expected
> duration for the adversary to become Guard_2).

Why set guard 2 to expire in days? If that is less than the time surveillance 
takes, then once the adversary has guard 3, it's game over.

> I would prefer if we had closed-form or code-based versions of this
> calculation, though, so we could play with set sizes and lifespans for
> the 3 hops. We also want to play with questions like "How many circuits
> should we use at a time?" and "How big should the set of middle nodes we
> choose from be?" Making all of that parameterized will help us tune it.
> Heck, the surface might even be plottable or traversable with gradient
> descent.

I took a stab at this by writing a quick Monte Carlo simulator (in python). 
Calculating exact probabilities gets incredibly complicated when you have 
several parameters and somewhat complicated compromise scenarios, and 
simulation in this case is nearly as accurate and reasonably fast. The 
simulator considers the HS to use just one circuit, and each node on the 
circuit expires at an individual rate. It outputs the distribution of how long 
it takes for the adversary to identify the HS. The parameters you might modify 
(set near the end of the script) are
  1. node_expiration_times: a list of node expirations in order from the HS
(in hours), e.g. [3,2,1] means the first hop (aka the HS’s entry guard)
expires in 3 hours, the second hop expires in 2 hours, and the third hop in 1
hour
  2. surveillance_time: the time needed by the adversary to accomplish
surveillance (in hours), e.g. if 24, the adversary compromises a node that
expires 24 or more hours from the current time
  3. adv_relay_probs: a list containing the probability of selecting an
adversarial relay in each position, ordered from the HS, e.g. [0.05, 0.01,
0.01] means the adversary controls 0.05 of entry-guard consensus weight, 0.01
of second (aka middle) hop weight, and 0.01 of third-hop weight

The script is attached. I hope it is useful. If so, maybe I can develop it more 
next week.

Best,
Aaron
import random
import math

def check_for_compromise(sample_expiration_times, cur_time, adv_position,
                         surveillance_time):
    """Checks if the adversary in adv_position has surveillance_time left in
    each previous node before it expires. If so, the HS is compromised via
    repeated surveillance."""
    if adv_position == 0:
        return True
    if (sample_expiration_times[adv_position - 1] >=
            cur_time + surveillance_time):
        return check_for_compromise(sample_expiration_times,
                                    cur_time + surveillance_time,
                                    adv_position - 1, surveillance_time)
    return False