Re: [tor-dev] Building better pluggable transports - GSoC 2013 project
Hi Chang,

On 29 May 2013, at 06:22, Chang Lan changl...@gmail.com wrote:

Given that ScrambleSuit is being deployed, improving protocol obfuscation will be my main focus. HTTP impersonation is really useful, since there are numerous HTTP proxies outside the censored region, while the number of bridges is quite limited. What I'm going to be doing during the summer is implementing a good-enough HTTP impersonation based on the pluggable transports specification. There are still many open questions indeed. Discussions are more than welcome!

There certainly are quite a few open questions, so it would be good to start planning early. Implementing HTTP is a deceptively difficult project. I'd suggest starting by reading the HTTP specification in detail, particularly the parts that deal with caching: http://tools.ietf.org/html/rfc2616. For comparison, HTTP/1.0 is also worth looking at: http://tools.ietf.org/html/rfc1945

Some issues that you will need to deal with are:

- Individual HTTP requests may be re-ordered if they are sent over different TCP connections.
- Responses may be truncated without an error being reported to higher layers (which is why HTTP includes length fields as an option).
- HTTP doesn't give the same congestion avoidance as TCP.
- Proxies can both cache and modify the data they transmit.
- Proxies deviate from what is permitted by the specification.
- (and others)

When dealing with these, you will need to ensure you don't introduce any new ways for a censor to efficiently and reliably distinguish your protocol from HTTP.

I think it would also be a good idea to implement scanning resistance. Since it will be over TCP, you can't hide that something is listening, but you can ensure that if the initial request does not demonstrate knowledge of a valid secret, the response does not disclose that it is a Tor bridge.

As you start implementing, you should have some way of testing.
Initially this can be a direct connection from your pluggable transport client to your pluggable transport server. You can set up an OP and bridge on the same machine (set your bridge not to advertise itself), and get your OP to talk to your bridge via your pluggable transport. However, you shouldn't keep to this setup for very long, as it won't test how your pluggable transport works with a proxy. So you should put a caching proxy (e.g. Squid) between your pluggable transport server and client, and make sure they keep working. You can try configuring Squid in ways that stress your pluggable transport, and also replace Squid with a proxy server you create (e.g. based on one of the many Python HTTP proxies: http://proxies.xhaus.com/python/). This proxy server could behave pathologically, and test the corner cases of your pluggable transport.

When working on your experiments, automate the set-up, the running of the test, and the processing of the results. This is not just to make your life easier; it means that your experiments are repeatable. The scripts and configuration files should be checked into version control. Your goal should be that someone can check out your code, install a few standard packages via apt-get or yum, run a single command, and get the same results. There are tools to help do this (e.g. http://software-carpentry.org/4_0/data/mgmt.html and http://software-carpentry.org/4_0/data/bein.html), but just using make and shell scripts might be fine.

There's a lot to think about here, so we don't need answers to everything now, but if you have any questions or comments do let me know.

Best wishes,
Steven
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
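As an editorial aside on the scanning-resistance point in the message above: the requirement that "the initial request demonstrates knowledge of a valid secret" could be sketched as an HMAC over a client nonce. This is only an illustration, not ScrambleSuit's or any deployed transport's actual handshake; the secret value and the message layout here are made up.

```python
import hashlib
import hmac
import os

# Hypothetical shared secret, distributed to clients out of band along with
# the bridge address (illustrative value only).
BRIDGE_SECRET = b'example-bridge-secret'

def make_probe(secret):
    # Client side: a random nonce plus an HMAC over it, proving knowledge
    # of the bridge secret without revealing it on the wire.
    nonce = os.urandom(16)
    tag = hmac.new(secret, nonce, hashlib.sha256).digest()
    return nonce + tag

def looks_authorized(secret, data):
    # Server side: verify in constant time. On failure the server should
    # respond like an ordinary web server, never admitting it is a bridge.
    if len(data) != 16 + 32:
        return False
    nonce, tag = data[:16], data[16:]
    expected = hmac.new(secret, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)
```

Note that a real design would also need replay protection (a scanner that records and replays a valid probe would pass this check) and would have to make the probe itself look like plausible HTTP.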
Re: [tor-dev] grabbing Tor circuit (node) data- Tor stem, torrc and Tor control port
Damian, thanks. Your summary pretty well sums up where I am trying to start. I will try this out and go from there.

On Jun 10, 2013, at 11:08 PM, Damian Johnson ata...@torproject.org wrote:

Hi Sarah. I'm not really sure what you're trying to ask in most of these questions. Assuming that your goal is simply "I want to connect to port 9051 and dump information about all the relays I'm presently using in my circuits", then the following should do the trick. Take the following with a grain of salt - I just wrote it, I haven't tried running it. Should be close though. ;)

  from stem.control import Controller

  with Controller.from_port(port = 9051) as controller:
    my_relay_fingerprints = []  # fingerprints of all relays in our circuits

    for circ in controller.get_circuits():
      my_relay_fingerprints += [fp for (fp, nickname) in circ.path]

    for fingerprint in my_relay_fingerprints:
      desc = controller.get_network_status(fingerprint)
      country = controller.get_info('ip-to-country/%s' % desc.address, 'unknown')

      print 'relay: %s' % fingerprint
      print 'address: %s:%s' % (desc.address, desc.or_port)
      print 'locale: %s' % country

Did you have any other questions?

On Mon, Jun 10, 2013 at 9:43 AM, SARAH CORTES sa...@lewis.is wrote:

Damian, thanks, this is very helpful. Is there a way to do this in torrc? Else, I suppose I will need to:

1) Create a socket or connection to my port 9051; do I need/can I use TORRC_CONTROL_SOCKET?
2) Call get_circuits(); grab the relay fingerprints. Do I need circuit = controller.get_circuit(circ_id)?

3) Return :class:`stem.events.CircuitEvent` for the given circuit. Not sure whether or where to use the path attribute.

4) Call controller.get_network_status() to get IP address, nickname, ORPort. Should I use:

  desc_by_fingerprint = controller.get_network_status(test_relay.fingerprint)
  desc_by_nickname = controller.get_network_status(test_relay.nickname)

5) Use Maxmind - I already have the GeoIPLite DB to grab AS and country, and onion code also from Arturo.

Any guidance is appreciated.

https://lists.torproject.org/pipermail/tor-commits/2012-December/051174.html

  def get_circuit(self, circuit_id, default = UNDEFINED):
    """
    Provides a circuit presently available from tor.

    :param int circuit_id: circuit to be fetched
    :param object default: response if the query fails

    :returns: :class:`stem.events.CircuitEvent` for the given circuit

    :raises:
      * :class:`stem.ControllerError` if the call fails
      * ValueError if the circuit doesn't exist

      An exception is only raised if we weren't provided a default response.
    """

    try:
      for circ in self.get_circuits():
        if circ.id == circuit_id:
          return circ

      raise ValueError("Tor presently does not have a circuit with the id of '%s'" % circuit_id)
    except Exception, exc:
      if default: return default
      else: raise exc

  def get_circuits(self):
    """Provides the list of circuits Tor is currently handling."""

On Jun 10, 2013, at 10:34 AM, Damian Johnson ata...@torproject.org wrote:

Hi, Damian, thanks. I am happy to discuss it on tor-dev@. But I want to keep off spam, which some of my questions at first may be, essentially newbie questions. But if you think they would be of interest to tor-dev, or others could help, just let me know, and I will sign up for it.

They certainly are! If you're interested in tor and development then I would definitely suggest being on that list. Including it for this thread.
I am trying to figure out how to pull in the nodes that are actually used in my Tor circuits. They are the nodes reflected in the Network Map function.

You want the get_circuits() method. As you mentioned, the 'path' attribute has the relays in your present circuits...

https://stem.torproject.org/api/control.html#stem.control.Controller.get_circuits

I have created a MySQL DB of some of my Tor circuits and nodes which I am analyzing. I grabbed 48 circuits with their 144 nodes and info (IP address, nickname, country) manually from my laptop's Tor network map.

That certainly sounds painful. The circuit paths will provide the relay fingerprints, which you can use with get_network_status() to get the address, nickname, ORPort, etc...

https://stem.torproject.org/api/control.html#stem.control.Controller.get_network_status

As for locales, that would be done via get_info('ip-to-country/address')...

https://gitweb.torproject.org/torspec.git/blob/HEAD:/control-spec.txt#l672

... and ultimately to AS and country.

AS will require the Maxmind AS database or something else. I know that Onionoo includes the AS information, so the options that come to mind are either to (a) see how it does it or (b) query Onionoo for this information.

https://onionoo.torproject.org/

And I have read much of the control-spec, don't know how
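For option (b) above, a query against Onionoo's details documents might be built like this. The `lookup` and `fields` parameters are part of the Onionoo protocol; the exact AS field names used here ('as_number', 'as_name') are assumptions that should be checked against the Onionoo documentation.

```python
# Build (but do not send) an Onionoo details-document query for one relay,
# requesting only the fields we care about.
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2, as used elsewhere in this thread

ONIONOO = 'https://onionoo.torproject.org/details'

def onionoo_details_url(fingerprint, fields=('nickname', 'as_number', 'as_name')):
    # 'lookup' matches a relay by fingerprint; 'fields' trims the response.
    params = urlencode([('lookup', fingerprint), ('fields', ','.join(fields))])
    return '%s?%s' % (ONIONOO, params)
```

Fetching the URL and reading the JSON response would then give the AS data without needing a local Maxmind AS database.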
Re: [tor-dev] Metrics Plans
I can try experimenting with this later on (when we have the full / needed importer working, e.g.), but it might be difficult to scale indeed (not sure, of course). Do you have any specific use cases in mind? (Actually curious; could be interesting to hear.)

The advantage of being able to reconstruct Descriptor instances is simpler usage (and hence more maintainable code). I.e., usage could be as simple as...

  from tor.metrics import descriptor_db

  # Fetches all of the server descriptors for a given date. These are provided
  # as instances of...
  #
  #   stem.descriptor.server_descriptor.RelayDescriptor

  for desc in descriptor_db.get_server_descriptors(2013, 1, 1):
    # print the addresses of only the exits
    if desc.exit_policy.is_exiting_allowed():
      print desc.address

Obviously we'd still want to do raw SQL queries for high-traffic applications. However, for applications where maintainability trumps speed this could be a nice feature to have.

* After making the schema update the importer could then run over this raw data table, constructing Descriptor instances from it and performing updates for any missing attributes.

I can't say I can easily see the specifics of how all this would work, but if we had an always-up-to-date data model (mediated by the Stem Relay Descriptor class, but not necessarily), this might work. (The ORM - Stem Descriptor object mapping itself is trivial, so all is well in that regard.)

I'm not sure if I entirely follow. As I understand it, the importer...

* Reads raw rsynced descriptor data.
* Uses it to construct stem Descriptor instances.
* Persists those to the database.

My suggestion is that for the first step it could read the rsynced descriptors *or* the raw descriptor content from the database itself. This means that the importer could be used to not only populate new descriptors, but also back-fill after a schema update. That is to say, adding a new column would simply be...

* Perform the schema update.
* Run the importer, which...
  * Reads raw descriptor data from the database.
  * Uses it to construct stem Descriptor instances.
  * Performs an UPDATE for anything that's out of sync or missing from the database.

Cheers! -Damian
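To make the back-fill idea above concrete, here is a toy sketch: the raw-data table is modeled as a list of dicts, and the "parser" stands in for constructing a stem RelayDescriptor from the stored raw text. All names here are illustrative, not the actual importer's API.

```python
def parse_nickname(raw_descriptor):
    # Stand-in for the real parse step, which would be
    # stem.descriptor.server_descriptor.RelayDescriptor(raw_descriptor).
    # A server descriptor's first line is 'router <nickname> <address> ...'.
    for line in raw_descriptor.splitlines():
        if line.startswith('router '):
            return line.split()[1]
    return None

def backfill(rows, column, extract):
    # After a schema update, fill `column` wherever it is missing by re-running
    # the extraction over the raw descriptor content already in the database.
    updated = 0
    for row in rows:
        if row.get(column) is None:
            row[column] = extract(row['raw'])
            updated += 1
    return updated
```

In the real importer the UPDATE would of course be an SQL statement rather than a dict assignment, but the control flow would be the same: one pass over the raw table, touching only the out-of-sync rows.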
Re: [tor-dev] Client simulation
On 6/10/13 4:40 AM, Karsten Loesing wrote:

On 6/6/13 7:32 PM, Norman Danner wrote:

I have two questions regarding a possible research project. First, the research question: can one use machine-learning techniques to construct a model of Tor client behavior? Or in a more general form: can one use fill-in-the-blank to construct a model of Tor client behavior? A student of mine did some work on this over the last year, and the results are encouraging, though not strong enough to do anything with yet.

The intent is that each cluster (represented by a single hidden Markov model) represents a type of client, even though we don't know for sure what that client type does. We can make some guesses about some: the type with steady high-volume cell counts is probably a bulk downloader; the type with steady zero cell counts is probably an unused circuit; etc. But in some sense, I'm thinking that what counts is the behavior of the client, not the reason for that behavior. We don't have to instrument clients for this. Of course, then one has to ask whether this kind of modeling is in fact useful. It is somewhat different than what you are envisioning, I think.

There are about a billion variations (at last count) on this theme. We chose one particular one as a test case to play with the methodology. I think the methodology is mostly OK, though I'm not completely satisfied with the results of the particular variation Julian worked on. So now I'm trying to figure out whether to push this forward, and in particular what directions and end goals would be useful.

Interesting stuff! You're indeed taking a different approach than I was envisioning by gathering data on a single guard rather than on a set of volunteering clients. Both approaches have their pros and cons, but I think your approach leads to some interesting results and can be done in a privacy-preserving fashion.
Two thoughts:

- I could imagine that your results are quite valuable for modeling better Shadow/ExperimenTor clients or for deriving better client models for Tor path simulators. Maybe Julian's thesis already has some good data for that, or maybe we'll have to repeat the experiment in a slightly different setting. I'm cc'ing Rob (the Shadow author) and Aaron (working on a path simulator) to make sure they saw this thread. I can help by reviewing code changes to Tor to make sure data is gathered in a privacy-preserving way, and I'd appreciate it if those code changes were made public together with analysis results.

I'm in the process of rewriting the data collection code, and will e-mail later with some of the details. But maybe off-list initially, as I think the first few passes will be very special-purpose and hence not of general interest (though I'm happy to discuss it more publicly if that's more appropriate).

Right now I'm considering focusing on trying to get a reasonable (partial) answer to the following question: how well do various timing-analysis attacks actually work? That is, how well do they work when the client model is accurate? I'm not even sure how exactly to define "accurate", though I can think of at least a few different ways. But I'm hoping that by focusing on a relatively narrow question, we can see manageable chunks of questions related to what kinds of data can be reasonably collected, and how we can use that data for other purposes.

- It might be interesting to observe how Tor usage changes over time. Maybe the research experiment leads to a set of classifiers telling us when a circuit is most likely used for bulk downloads, used for web browsing, used for IRC, unused, or whatever. We could then extend circuit statistics to have all relays report aggregate data on how circuits can be classified. Requires a proposal and code, but I could help with those.

Yes, I can see a number of longer-range applications like this.
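To give a flavour of the kind of per-circuit classifier being discussed: the actual work described above used hidden Markov models over cell counts, whereas this toy stand-in just thresholds per-interval counts, with entirely made-up labels and cutoffs.

```python
def classify_circuit(cell_counts):
    # Label a circuit from its sequence of cell counts per time interval.
    # Thresholds are illustrative, not derived from any real measurement.
    if not any(cell_counts):
        return 'unused'        # steady zero cell counts
    mean = sum(cell_counts) / float(len(cell_counts))
    if mean > 100:
        return 'bulk'          # steady high volume: probably a bulk downloader
    return 'interactive'       # low, bursty traffic: web browsing, IRC, etc.
```

The interesting (and hard) part that this sketch omits is exactly what the thread discusses: learning the clusters from observed guard-side data rather than hand-picking thresholds, and doing so in a privacy-preserving, aggregate way.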
I'm not sure I want to think about proposals and code just yet.

- Norman

--
Norman Danner - ndan...@wesleyan.edu - http://ndanner.web.wesleyan.edu
Department of Mathematics and Computer Science - Wesleyan University
Re: [tor-dev] Building better pluggable transports - GSoC 2013 project
On Tue, Jun 11, 2013 at 05:46:49PM +0100, Steven Murdoch wrote:

On 11 Jun 2013, at 12:49, Steven Murdoch steven.murd...@cl.cam.ac.uk wrote:

There certainly are quite a few open questions, so it would be good to start planning early. Implementing HTTP is a deceptively difficult project.

I've started a design document https://github.com/sjmurdoch/http-transport/blob/master/design.md which is very much a work-in-progress but I'm interested in comments.

Here are some ideas on a few things that I've been thinking about recently, mostly taken from https://www.bamsoftware.com/papers/oss.pdf. That's an HTTP-based transport, though one with different goals: it's meant to evade IP-based blocking, not DPI. (The paper does have a section at the end about mitigations against DPI.)

Bi-directional data

Tor requires that communication exchanges be initiated either by the bridge client or the bridge server. In contrast, HTTP clients initiate all communications. There are a few ways to avoid this problem:

* The client periodically polls the server to check if any data is available.
* The client keeps a long-running Comet TCP connection, on which the server can send responses.
* The client and server both act as HTTP clients and HTTP servers, so each can send data whenever they wish.

Making the client an HTTP server has the same NAT problems that flash proxy has. The OSS model has the worst of both worlds: the client has to be an HTTP server and also has to poll. But we implemented polling and it was usable.

Proxy busting

Proxies will, under certain conditions, not send a request they receive to the destination server, but instead serve whatever the proxy thinks is the correct response. The HTTP specification dictates a proxy's behaviour, but some proxy servers may deviate from the requirements. The pluggable transport will therefore need to either prevent the proxy from caching responses or detect cached data and trigger a re-transmission.
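Returning to the bi-directional data options above, the simplest of the three (client-side polling) might be sketched as follows. The injected `fetch` callable stands in for an HTTP request to the bridge; all names are illustrative.

```python
import time

def poll_for_data(fetch, max_polls=10, delay=0.0):
    # Repeatedly ask the server whether data is pending; in a real transport
    # `fetch` would be an HTTP GET carrying a session identifier, and `delay`
    # would be non-zero to pace the polls (itself a detectable pattern a
    # careful design would need to randomize).
    for _ in range(max_polls):
        data = fetch()
        if data:
            return data
        time.sleep(delay)
    return None
```

The Comet/long-poll option trades this request overhead for long-lived connections, which carry their own fingerprinting risk; the third option (client as HTTP server) runs into the NAT problems mentioned above.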
It may be unusual behaviour for an HTTP client to always send unique requests, so it perhaps should occasionally send dummy requests which are the same as before and so would be cached.

To inhibit caching we added a random number to every request. However, that's a good point about not having all requests be unique.

Client to server (requests)

Cookies: short and usually do not change, so possibly not a good choice.

HTTP POST file uploads: quite unusual, but permit large uploads.

Another avenue is URLs - they are sometimes kilobytes long (and clients and servers support much longer than that), and often contain opaque binary data.

David
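The two caching ideas above (a random number in every request to defeat caches, plus occasional repeated dummy requests so the traffic is not suspiciously 100% unique) could be combined in a small sketch. The class name and the 10% repeat rate are invented for illustration, not taken from any deployed transport.

```python
import random

class RequestUrls(object):
    # Generates request URLs for the transport: usually a fresh URL carrying a
    # random nonce (so proxies treat it as uncacheable), but occasionally a
    # repeat of an earlier URL, which a proxy may legitimately serve from cache.
    def __init__(self, base, repeat_rate=0.1, rng=random):
        self.base = base
        self.repeat_rate = repeat_rate
        self.rng = rng
        self.history = []

    def next_url(self):
        if self.history and self.rng.random() < self.repeat_rate:
            return self.rng.choice(self.history)  # dummy, cacheable repeat
        url = '%s?r=%d' % (self.base, self.rng.getrandbits(64))
        self.history.append(url)
        return url
```

A censor-resistance caveat: a bare `?r=<number>` parameter is itself a recognizable pattern, so a real design would want the nonce to look like the query strings ordinary sites use.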
Re: [tor-dev] Building better pluggable transports - GSoC 2013 project
On 11 Jun 2013, at 12:49, Steven Murdoch steven.murd...@cl.cam.ac.uk wrote:

There certainly are quite a few open questions, so it would be good to start planning early. Implementing HTTP is a deceptively difficult project.

I've started a design document https://github.com/sjmurdoch/http-transport/blob/master/design.md which is very much a work-in-progress but I'm interested in comments.

Best wishes,
Steven
Re: [tor-dev] Building better pluggable transports - GSoC 2013 project
I've been thinking about writing a lessons-learned document about StegoTorus; I'll bump that up a little on the todo queue. For right now, I want to mention that any greenfields design should take a hard look at MinimaLT (http://cr.yp.to/tcpip/minimalt-20130522.pdf) as its cryptographic layer. It looks like it addresses most if not all of the problems I was trying to tackle with ST's crypto layer, only (unlike ST's crypto layer) it's actually *finished*.

On Tue, Jun 11, 2013 at 12:46 PM, Steven Murdoch steven.murd...@cl.cam.ac.uk wrote:

On 11 Jun 2013, at 12:49, Steven Murdoch steven.murd...@cl.cam.ac.uk wrote:

There certainly are quite a few open questions, so it would be good to start planning early. Implementing HTTP is a deceptively difficult project.

I've started a design document https://github.com/sjmurdoch/http-transport/blob/master/design.md which is very much a work-in-progress but I'm interested in comments.

Best wishes,
Steven