[tor-dev] [ooniprobe-dev] Summary of OONI hackfest

2012-08-22 Thread Arturo Filastò
We had a mini hackfest on OONI at Noisebridge, and this thread summarizes 
what we discussed.


mct: feel free to add things from your notes that are not detailed here.

The problem we focused on most was how to detect censorship over HTTP 
when the user is presented with a block page. We focused on this because 
censorship over HTTP is the most prevalent form.


# Detecting censorship in web sites

Detecting that the presented web page is not the expected site but a 
censorship page turns out to be a non-trivial task, especially when you 
are dealing with web pages that have dynamic content. This problem is 
detailed in this trac ticket: 
https://trac.torproject.org/projects/tor/ticket/6180.


We ended up dividing the possible approaches into two macro categories: 
Statistical and Heuristic [1]


One of the properties we would like our classifier to have is that a 
user should be able to detect that the presented page is the block page 
by just having a copy of OONI and some data that can be given to them 
over an uncensored channel (sneakernet for example).


## DOMClass, an eigenvalue-based statistical classifier

What we ended up implementing is a classifier that considers the DOM 
structure of the web page. We can easily build a database of the DOM 
structure features of the websites we are interested in analyzing and 
ship it to users.


This is how the algorithm works:

  This classifier uses the DOM structure of a website to determine how
  similar two sites are.

  The procedure we use is the following:

  * First we parse the whole DOM tree of the web page and build a list of
    tag parent-child relationships (ex.
    <html><a><b></b></a><c></c></html> =>
    (html, a), (a, b), (html, c)).

  * We then use this information to build a matrix (M) where
    m[i][j] = P(of transitioning from tag[i] to tag[j]). If tag[i] does
    not exist, P() = 0.

    Note: M is a square matrix that is number_of_tags wide.

  * We then calculate the eigenvectors (v_i) and eigenvalues (e) of M.

  * The correlation between page A and page B is given by this formula:
    correlation = dot_product(e_A, e_B), where e_A and e_B are
    respectively the eigenvalues of probability matrix A and
    probability matrix B.
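
As a rough illustration of the first two steps above, here is a minimal 
Python sketch that uses only the standard library. The names 
PairCollector, tag_pairs and transition_matrix are made up for this 
example and are not taken from the OONI code. Reading P() as row-normalized 
parent-to-child counts over a fixed tag list shared by both pages is an 
assumption. The eigenvalue step is left out, since it needs a numeric 
library.

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are recorded as children
# but never pushed onto the stack of open elements.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PairCollector(HTMLParser):
    """Collects (parent_tag, child_tag) pairs while the page is parsed."""

    def __init__(self):
        super().__init__()
        self.stack = []   # tags of the currently open elements
        self.pairs = []   # (parent, child) pairs in document order

    def handle_starttag(self, tag, attrs):
        if self.stack:
            self.pairs.append((self.stack[-1], tag))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray closing tags are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def tag_pairs(html):
    """Return the list of parent-child tag pairs found in an HTML string."""
    collector = PairCollector()
    collector.feed(html)
    collector.close()
    return collector.pairs


def transition_matrix(pairs, tags):
    """Build M where m[i][j] is the share of tags[i]'s children that are
    tags[j] (assumed meaning of P()). Rows for tags with no children,
    or tags absent from the page, are all zero."""
    counts = Counter(pairs)
    matrix = []
    for parent in tags:
        row_total = sum(counts[(parent, child)] for child in tags)
        matrix.append([counts[(parent, child)] / row_total if row_total else 0.0
                       for child in tags])
    return matrix


pairs = tag_pairs("<html><a><b></b></a><c></c></html>")
matrix = transition_matrix(pairs, ["html", "a", "b", "c"])
```

Using the same ordered tag list for every page keeps the matrices the same 
size, so that the eigenvalue vectors of two pages can be compared.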

You will note that the matrix we are computing eigenvalues for somewhat 
resembles a Markov model of transitions from DOM element X to Y.


This algorithm appears to work pretty well even on highly dynamic web 
sites (such as the homepage of news.google.com).


Problems with this algorithm:

* It does not take into account the global position of DOM elements or 
how deeply nested they are. Two differently shaped trees that both yield 
the pairs (a,b), (b,a), (a,b) are treated as equivalent.
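
To make this limitation concrete, here is a small Python sketch. The 
tuple-based tree representation and the helper names are hypothetical and 
chosen only for this illustration. It builds two trees of different depth 
that reduce to the same pairs.

```python
def parent_child_pairs(tree):
    """tree is (tag, [subtrees]); returns (parent, child) pairs depth-first."""
    tag, children = tree
    pairs = []
    for child in children:
        pairs.append((tag, child[0]))
        pairs.extend(parent_child_pairs(child))
    return pairs


def depth(tree):
    """Number of nesting levels in the tree."""
    _, children = tree
    return 1 + max((depth(child) for child in children), default=0)


# a > b > a > b, a single chain four levels deep
chain = ("a", [("b", [("a", [("b", [])])])])
# a with two b children, the first of which contains an a
bushy = ("a", [("b", [("a", [])]), ("b", [])])

same_pairs = parent_child_pairs(chain) == parent_child_pairs(bushy)
```

Both trees give (a,b), (b,a), (a,b), so they produce the same matrix even 
though their nesting depth differs.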


Looking into other possible solutions to this problem I took a look at 
algorithms that are used to compute graph similarity.


This problem could be solved by using an algorithm that computes the 
maximum common subgraph, but that problem is NP-hard. [2]
I am unsure whether the computational effort of using algorithms along 
these lines is worthwhile.


I can't seem to find the notes on the rest of the discussion; perhaps 
mct can fill in the rest.


- Art.

[1] https://trac.torproject.org/projects/tor/ticket/6180#comment:3
[2] http://en.wikipedia.org/wiki/Maximum_common_subgraph_isomorphism_problem

___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] DNS(SEC) draft update

2012-08-22 Thread Ondrej Mikle
On 08/20/2012 02:43 AM, Mike Perry wrote:
> Thus spake Ondrej Mikle (ondrej.mi...@gmail.com):
> 
>> I've revised the DNS draft, attaching it. In section 4 there are some options
>> for integration with libunbound, but each of them requires some work with the
>> stock libunbound code.
> 
> I'm not a DNS expert, but I have a couple preliminary requests/questions.
> 
> First, can you provide a section in the proposal on the analysis of the
> number of round trips over Tor for different request scenarios? If you
> offload full DNS responsibility to the client, certain query behaviors
> are going to be better than others with respect to the number of round
> trips over Tor. We're going to want to minimize these round trips,
> especially if we decide we want to rely on DNSsec/DANE for everything.
> Clients may also want to use this information to try to intelligently
> decide cases where we don't want to do full DNSsec queries and revert to
> the oldstyle SOCKS4A.

Added section 8 to the draft with a "common" and an "extreme" example.
Validation would still be done at both the exit and the client: the client
can't trust the AD bit from the exit, and the exit must implement its own
recursive resolver via libunbound, because ISPs' resolvers often won't work
with DNSSEC. The problem is usually in fetching DS records.

> Second (and related), is it totally insane to map some sort of magic IP
> to "forward this query to the local exit node resolver" so that the client
> can easily get DNS(sec) perspectives from each exit node's resolver
> caches? This might both minimize round trips for clients who don't want
> to either hardcode 8.8.8.8 or do full recursive resolves against the
> root servers. On the other hand, it might complicate query handling on
> the exit side and also introduce weird cache/poisoning attacks?

It's actually quite an interesting idea, though I'm not sure how to map a
local 127.0.0.0/8 IP to a specific exit. If the exit changes in between
queries (new circuit), should the client know somehow?

I also thought about a "most lightweight" implementation which would just use
the ldns library on the exit's side: the client would employ the "magic IP" as
a forwarder for a local standalone unbound daemon. But it breaks on the
inability of ISPs' resolvers to fetch DS records mentioned above.

Regarding perspectives, it should be noted that many CDNs and load balancers
use short TTLs in the range of 5-30 seconds, so two subsequent queries may
return different results.

Ondrej

Filename: xxx-dns-dnssec.txt
Title: Support for full DNS and DNSSEC resolution in Tor
Authors: Ondrej Mikle
Created: 4 February 2012
Modified: 19 August 2012
Status: Draft

0. Overview

  Adding support for any DNS query type to Tor, as well as DNSSEC support.

0.1. Motivation

  Many applications running over Tor need more than just resolving an FQDN to
  IPv4 and vice versa. Sometimes, to prevent DNS leaks, applications have to
  be hacked so that the necessary data is supplied by hand (e.g. SRV records
  in XMPP). TLS connections will benefit from the planned TLSA record, which
  provides certificate pinning to avoid another DigiNotar-like fiasco.
  
  DNSSEC is part of the DNS protocol, and the most appropriate place for a
  DNSSEC API would probably be in OS libraries (e.g. libc). However, it will
  probably take time before that becomes widespread.

  On Tor's side (as opposed to the application's side), DNSSEC will provide
  protection against DNS cache-poisoning attacks (provided that the exit is
  not itself malicious; even so, it reduces the attack surface).

1. Design

1.1 New cells

  There will be two new cells, RELAY_DNS_BEGIN and RELAY_DNS_RESPONSE (we'll
  use DNS_BEGIN and DNS_RESPONSE for short below).

  DNS_BEGIN payload:

    DNS packet data (variable length)

  The DNS packet must be generated internally by libunbound to avoid
  fingerprinting users by differences in client resolvers' behavior.

  DNS_RESPONSE payload:
  
    total length (2 octets)
    data (variable)
  
  Data contains the reply DNS packet, or part of it if the packet does not
  fit into a single cell. Total length is the length of the complete response
  packet; thus one DNS_BEGIN may be answered by multiple DNS_RESPONSE cells.
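
  A minimal Python sketch of how a response could be split across several
  DNS_RESPONSE payloads and put back together. Several details here are
  assumptions that the draft does not state: that the 2-octet total length
  is repeated in every cell, that it is in network byte order, and that 498
  bytes of relay payload are usable per cell. The function names are
  hypothetical.

```python
import struct

RELAY_DATA_MAX = 498             # assumed usable bytes per relay cell payload
CHUNK_MAX = RELAY_DATA_MAX - 2   # room left for data after the length field


def split_response(packet, chunk_max=CHUNK_MAX):
    """Split a DNS reply into DNS_RESPONSE payloads: total length + data."""
    if len(packet) > 0xFFFF:
        raise ValueError("reply does not fit a 2-octet total length")
    header = struct.pack("!H", len(packet))
    payloads = [header + packet[i:i + chunk_max]
                for i in range(0, len(packet), chunk_max)]
    return payloads or [header]  # an empty reply still produces one payload


def join_response(payloads):
    """Reassemble the reply and check it against the announced total length."""
    total = None
    data = bytearray()
    for payload in payloads:
        (length,) = struct.unpack("!H", payload[:2])
        if total is None:
            total = length
        elif length != total:
            raise ValueError("inconsistent total length between cells")
        data += payload[2:]
    if total is None or len(data) != total:
        raise ValueError("reassembled data does not match total length")
    return bytes(data)


reply = bytes(range(256)) * 5    # a 1280-byte stand-in for a DNS reply
cells = split_response(reply)
restored = join_response(cells)
```

  The receiver knows the response is complete once the bytes collected for
  a StreamID add up to the announced total length.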

  DNS_BEGIN must use a non-zero, distinct StreamID; the corresponding
  DNS_RESPONSE will use the same StreamID. Similarly to RELAY_RESOLVE(D), no
  actual stream is created.

  AXFR and IXFR are not supported in this cell by design (see specialized tool
  below).

2. Interfaces to applications

  DNSPort evdns - existing implementation will be updated to use DNS_BEGIN.
  
3. Limitations on DNS query

  Query class is limited to IN (INTERNET), since the only other useful class,
  CHAOS, is practical only for directly querying authoritative servers (the
  OR in this case acts as a recursive resolver). A query for a class other
  than IN will return REFUSED in the inner DNS packet.

  Multiple questions in a single packet are not supported, and the OR will
  respond with REFUSED as the DNS error code.

  All query RR types are allowed.
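
  The limitations above could be checked on the OR side roughly as in this
  Python sketch. The function names are hypothetical, treating a packet
  with zero questions the same as one with several is an assumption, and
  name compression in the question is not handled.

```python
import struct

RCODE_NOERROR = 0
RCODE_REFUSED = 5
CLASS_IN = 1


def check_query(packet):
    """Return the RCODE the limitations imply for a raw DNS query packet."""
    if len(packet) < 12:
        raise ValueError("shorter than a DNS header")
    (qdcount,) = struct.unpack("!H", packet[4:6])
    if qdcount != 1:
        return RCODE_REFUSED     # multiple questions (or none, by assumption)
    offset = 12                  # the question starts right after the header
    while True:                  # skip the QNAME labels up to the root label
        length = packet[offset]
        offset += 1
        if length == 0:
            break
        offset += length
    _qtype, qclass = struct.unpack("!HH", packet[offset:offset + 4])
    # Every query type is allowed, so only the class is checked.
    return RCODE_NOERROR if qclass == CLASS_IN else RCODE_REFUSED


def build_query(name, qtype, qclass, qdcount=1):
    """Build a bare query packet with one encoded question, for testing."""
    header = struct.pack("!HHHHHH", 0x1234, 0x0100, qdcount, 0, 0, 0)
    qname = b"".join(bytes([len(label)]) + label
                     for label in name.encode("ascii").split(b"."))
    return header + qname + b"\x00" + struct.pack("!HH", qtype, qclass)


in_result = check_query(build_query("example.com", 1, CLASS_IN))
chaos_result = check_query(build_query("version.bind", 16, 3))
```

  A truncated or malformed packet raises an exception here instead of
  producing an RCODE; how the OR should answer in that case is not covered
  by this section.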

  [ I o

[tor-dev] Tor and NAT devices: increasing bridge & relay reachability or, enabling the use of NAT-PMP and UPnP by default.

2012-08-22 Thread Jacob Appelbaum
Hi,

My latest tech report is now up. I think it would be a nice idea if we
could kick off a discussion about Tor, NAT devices and reachability.

Tor and NAT devices: increasing bridge & relay reachability or, enabling
the use of NAT-PMP and UPnP by default:
https://research.torproject.org/techreports/tor-nat-plan-2012-08-22.pdf

Feedback welcome and encouraged!

All the best,
Jake