Re: [tor-dev] Status report - Stream-RTT
On Saturday 10 August 2013 02:37:48 Damian Johnson wrote:
> > > Yup. It's unfortunate that tor decided to include an 'Exit' flag with such an unintuitive meaning. You're not the first person to be confused by it.
> >
> > Is this meaning at least documented somewhere and I have just read over it?
>
> Here's the relevant part of the spec...
> https://gitweb.torproject.org/torspec.git/blob/HEAD:/dir-spec.txt#l1738

Patch submitted in ticket 9932[0].

Best,
Robert

[0] https://trac.torproject.org/projects/tor/ticket/9932

___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
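Because the Exit flag does not tell you whether a relay actually allows exiting, the relay's exit policy has to be checked instead. Below is a minimal sketch of first-match policy evaluation. It assumes a simplified rule syntax: `accept|reject ADDR:PORTS`, where ADDR is either `*` or an exact IPv4 address, with no masks and no IPv6. The helper names are hypothetical. Stem's own `stem.exit_policy.ExitPolicy` (with `can_exit_to()` and `is_exiting_allowed()`) handles the full syntax. Accepting when no rule matches is an assumption of this sketch, based on a reading of dir-spec, so verify it before relying on it.

```python
def parse_rule(rule):
    """Parse 'accept|reject ADDR:PORTS' into (is_accept, addr, low, high).

    Simplified syntax (assumption): ADDR is '*' or an exact IPv4 address,
    PORTS is '*', a single port, or a 'LOW-HIGH' range.
    """
    action, target = rule.split(None, 1)
    if action not in ("accept", "reject"):
        raise ValueError("unknown action: %r" % action)
    addr, port_spec = target.rsplit(":", 1)
    if port_spec == "*":
        low, high = 1, 65535
    elif "-" in port_spec:
        low, high = (int(p) for p in port_spec.split("-", 1))
    else:
        low = high = int(port_spec)
    return action == "accept", addr, low, high


def can_exit_to(policy, address, port, default_accept=True):
    """Return the verdict of the first rule matching (address, port)."""
    for rule in policy:
        is_accept, addr, low, high = parse_rule(rule)
        if (addr == "*" or addr == address) and low <= port <= high:
            return is_accept
    # No rule matched: falling back to accept is an assumption of this sketch.
    return default_accept
```

For example, with `policy = ["reject *:25", "accept *:80", "accept *:443", "reject *:*"]`, `can_exit_to(policy, "198.51.100.7", 443)` is `True`, while port 25 gives `False`.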
[tor-dev] Status report - Stream-RTT
tl;dr When building a circuit, measuring the RTT a single time could provide better latency and anonymity while not affecting throughput. Multiple measurements could be used to support real-time applications like VoIP or to optimize throughput.

Although the Tor network is currently in a somewhat unusual state, I have spent the last few weeks looking into stream-RTT data of circuits. I gathered the data shortly before and at the beginning of the huge surge in botnet usage. This is what I have found out:

As expected, the stream-RTT measurements of a single circuit do not take a fixed value; they follow a distribution, since they are subject to multiple influences. After comparing the stream-RTT distributions of multiple circuits, I found many different shapes and realized that no single distribution fits them all.

The Time-To-First-Byte (TTFB) for fetching a small website over HTTP is used to approximate the latency of a given circuit. I used different methods to check the correlation between the RTT of a circuit and its TTFB, and all of them indicate a very high correlation. Hence, the stream-RTTs of a circuit are a good estimator of its TTFB and therefore of its latency.

In terms of latency, using a single stream-RTT measurement (First-RTT) performs better than the currently used method, CBT (circuit build timeout). So far I haven't done any testing or calculations on the other two metrics, bandwidth and anonymity. I would expect the former to be unaffected by First-RTT. The latter could probably be improved slightly if the percentage of discarded circuits were reduced from 20% with CBT to 10% or 15% with First-RTT, while still achieving a minor improvement in latency.

Nevertheless, I would not recommend First-RTT as a method for providing low-latency circuits to applications, because it only gives a small hint about the quality of a circuit and cannot guarantee that any particular latency properties hold for it. Still, First-RTT works pretty well considering the minimal effort it takes.
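The First-RTT selection could look roughly like this: take one stream-RTT sample per freshly built circuit, rank the circuits, and discard the slowest fraction (for instance 10% or 15%, instead of the roughly 20% that CBT discards). This is a hypothetical sketch, not code from rttprober. The function name and the dict input format are assumptions.

```python
import math


def first_rtt_keep(first_rtts, discard_fraction=0.15):
    """Return the IDs of circuits to keep, fastest first.

    first_rtts: dict mapping circuit ID -> first stream-RTT in seconds
                (hypothetical input format).
    discard_fraction: share of circuits, the slowest ones, to throw away.
    """
    if not 0 <= discard_fraction < 1:
        raise ValueError("discard_fraction must be in [0, 1)")
    ranked = sorted(first_rtts, key=first_rtts.get)  # fastest first
    n_discard = math.floor(len(ranked) * discard_fraction)
    return ranked[:len(ranked) - n_discard]
```

For example, `first_rtt_keep({"c1": 0.30, "c2": 0.12, "c3": 0.95, "c4": 0.20}, 0.25)` returns `["c2", "c4", "c1"]`, dropping the single slowest circuit.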
Additionally, I experimented a lot with methods that provide a better estimator of a circuit's latency properties. But they all need far more than a single measurement and are therefore out of scope for the common case. Besides, they cannot protect against suddenly changing circuit conditions. They could, however, be used to meet an application-specific maximum RTT for real-time applications like VoIP. With similar techniques it should also be possible to detect circuits that include a node running at its bandwidth limit. This could be used to provide high-bandwidth circuits for applications like BitTorrent.

Best,
Robert
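One way to turn multiple measurements into an application-specific RTT guarantee is to require that a high percentile of a circuit's RTT samples stays below the application's bound. The sketch below uses the nearest-rank percentile. The 95th-percentile default and the function names are assumptions for illustration, not one of the methods from this report. Like those methods, it cannot foresee sudden changes in circuit conditions.

```python
import math


def nearest_rank_percentile(samples, p):
    """Nearest-rank p-th percentile (0 < p <= 100) of a non-empty list."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_rtt_bound(samples, max_rtt, p=95):
    """True if the p-th percentile of the RTT samples is at most max_rtt."""
    return nearest_rank_percentile(samples, p) <= max_rtt
```

With ten samples where only one outlier (0.90 s) exceeds 0.2 s, the 90th percentile stays at 0.18 s and passes a 0.2 s bound, while requiring the 100th percentile fails it. The chosen percentile therefore controls how many outliers an application tolerates.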
Re: [tor-dev] Status report - Stream-RTT
On Saturday 10 August 2013 23:52:44 Damian Johnson wrote:
> If I understand this correctly you're thinking that multiple calls to extend_circuit() cause parallel EXTENDCIRCUIT requests, and the first response would be used for both callers. Is that right?

Yes.

> If so then I would be very interested if you actually see that behaviour. Stem provides thread safe controller communication. See the msg() method of the BaseController: though the Controller's methods are called in parallel, the actual socket requests are done serially to prevent the exact issue that you describe.

That looks fine to me. I obviously drew the wrong conclusion from the issues I encountered. My fault, sorry.

Best,
Robert
Re: [tor-dev] Status report - Stream-RTT
On Saturday 10 August 2013 02:37:48 Damian Johnson wrote:
> Hi Robert.
>
> Here's the relevant part of the spec...
> https://gitweb.torproject.org/torspec.git/blob/HEAD:/dir-spec.txt#l1738

Thanks. I will try to make that part clearer and open a ticket.

> > If requests are sent to Tor to create more than a single circuit at once, the mapping between circuit events and create requests is unknown, because the circuit ID is not known until the LAUNCHED event has been received. This is clearly an issue on Tor's side, but one could argue that Stem should stop me from using it that way.
>
> Not sure that I follow. The extend_circuit() method returns the circuit id (it's provided by the EXTENDCIRCUIT call). Are you saying that tor's EXTENDCIRCUIT response is wrong when done in parallel?

As far as I understand it, it's not necessarily wrong, but a response that does not belong to the call might be received first.

Assume a single program making two extend_circuit() calls within a short time. If the first EXTENDED response is delayed for some reason, both calls receive the EXTENDED response belonging to the second call, so both calls use the same circuit ID.

Another case, again a single program making two extend_circuit() calls within a short time: if the second call is made before the first EXTENDED response is received, the second call will use the EXTENDED response from the first call when it arrives, so again both calls use the same circuit ID.

Therefore the await_build parameter should be True by default, IMHO. In any case, it should be made clear that the await_build parameter doesn't work when extend_circuit() is used by two separate programs/threads that run concurrently. The user then has to lock (at least) the LAUNCHED event herself.

Besides, I could not find any filtering of Tor-internal circuit events. If a Tor-internal circuit EXTENDED event occurs during an extend_circuit() call, the wrong circuit ID will be used.

I hope this is not too confusing.

Best,
Robert
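According to Tor's control-spec, a successful EXTENDCIRCUIT reply has the form `250 EXTENDED CircuitID`, and asynchronous circuit events start with `650 CIRC CircuitID CircStatus`. Once a caller knows its own circuit ID, it can drop events for every other circuit, including Tor-internal ones marked `IS_INTERNAL` in `BUILD_FLAGS`. Here is a minimal stdlib sketch working on raw control-port lines. The helper names are hypothetical. Stem exposes the same fields on its `CircuitEvent` objects (`id`, `status`, `build_flags`).

```python
import re

EXTENDED_RE = re.compile(r"^250 EXTENDED (\S+)$")


def parse_extended_reply(line):
    """Return the circuit ID from a '250 EXTENDED <id>' reply line."""
    match = EXTENDED_RE.match(line.strip())
    if not match:
        raise ValueError("not an EXTENDED reply: %r" % line)
    return match.group(1)


def parse_circ_event(line):
    """Split '650 CIRC <id> <status> ...' into (id, status, build_flags)."""
    fields = line.strip().split()
    if len(fields) < 4 or fields[:2] != ["650", "CIRC"]:
        raise ValueError("not a CIRC event: %r" % line)
    build_flags = set()
    for field in fields[4:]:  # path and KEY=VALUE arguments
        if field.startswith("BUILD_FLAGS="):
            build_flags = set(field[len("BUILD_FLAGS="):].split(","))
    return fields[2], fields[3], build_flags


def events_for_circuit(lines, circ_id):
    """Yield (status, build_flags) only for events of our own circuit."""
    for line in lines:
        event_id, status, flags = parse_circ_event(line)
        if event_id == circ_id:
            yield status, flags
```

Filtering by the ID from the reply sidesteps unrelated (including internal) circuits. It does not by itself address request/reply ordering, which is a question of how the controller library serializes messages.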
Re: [tor-dev] Status report - Stream-RTT
> As far as I understand it it's not necessarily wrong but it might be the case that a response that does not belong to the call is received first:
>
> Assume a single program making two extend_circuit() calls within a short time. If the first EXTENDED response is delayed for some reason, both calls receive the EXTENDED response belonging to the second call, so both calls use the same circuit ID.

If I understand this correctly you're thinking that multiple calls to extend_circuit() cause parallel EXTENDCIRCUIT requests, and the first response would be used for both callers. Is that right?

If so then I would be very interested if you actually see that behaviour. Stem provides thread safe controller communication. See the msg() method of the BaseController: though the Controller's methods are called in parallel, the actual socket requests are done serially to prevent the exact issue that you describe.

Apologies if I'm misunderstanding what you're describing.

-Damian
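The serialization Damian describes can be illustrated with a single lock held across sending a request and reading its reply, so two threads can never swap replies. This sketch assumes a hypothetical transport with `send()`/`recv()` and includes a fake echo transport for testing. It is not Stem's `BaseController.msg()`, which additionally has to separate asynchronous events from replies.

```python
import queue
import random
import threading
import time


class SerializedChannel:
    """Hold one lock across send + recv so replies can't be swapped.

    `transport` is any object with send(str) and recv() -> str
    (hypothetical interface used only for this sketch).
    """

    def __init__(self, transport):
        self._transport = transport
        self._lock = threading.Lock()

    def msg(self, request):
        with self._lock:  # request and reply form one critical section
            self._transport.send(request)
            return self._transport.recv()


class EchoTransport:
    """Fake transport: replies come back in request order, but recv()
    sleeps briefly so that, without the lock, threads could steal replies."""

    def __init__(self):
        self._replies = queue.Queue()

    def send(self, request):
        self._replies.put("250 OK " + request)

    def recv(self):
        time.sleep(random.uniform(0, 0.001))
        return self._replies.get()


def run_concurrently(channel, n=20):
    """Issue n fake requests from n threads; return {request: reply}."""
    results = {}

    def worker(i):
        request = "request-%d" % i
        results[request] = channel.msg(request)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

With the lock in place, every caller receives the reply to its own request. Moving `recv()` outside the `with` block would reopen the race that Robert was worried about.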
Re: [tor-dev] Status report - Stream-RTT
> > Yup. It's unfortunate that tor decided to include an 'Exit' flag with such an unintuitive meaning. You're not the first person to be confused by it.
>
> Is this meaning at least documented somewhere and I have just read over it?

Hi Robert. Here's the relevant part of the spec...
https://gitweb.torproject.org/torspec.git/blob/HEAD:/dir-spec.txt#l1738

> > What kind of issue does that encounter? Is it a problem with stem's thread safety or an issue on tor's side?
>
> If requests are sent to Tor to create more than a single circuit at once, the mapping between circuit events and create requests is unknown because the circuit ID is not known until the LAUNCHED event has been received. This is clearly an issue on Tor's side but one could argue that Stem should stop me from using it that way.

Not sure that I follow. The extend_circuit() method returns the circuit id (it's provided by the EXTENDCIRCUIT call). Are you saying that tor's EXTENDCIRCUIT response is wrong when done in parallel?

> > Not quite. The connect_port() function never raises an exception. Rather, if it fails to establish a control connection then it prints the issue to stdout and returns None. Also, the connection it provides is already authenticated.
>
> If Tor has ControlPort enabled without having HashedControlPassword set, authenticate() has to be called to authenticate the connection. Though this is not recommended I don't know which other default setting would be more appropriate.

I think there's some misunderstanding. Yes, when you establish a new controller connection you need to call authenticate(), even if Tor doesn't require any credentials. connect_port() is a convenience function that does everything (including authentication) for you. If tor requires a password then it gives the user a password prompt. If it runs into an error then it prints an explanation of the failure and returns None.

Sounds like I need some more documentation here...

Cheers!
-Damian
Re: [tor-dev] Status report - Stream-RTT
Hi Robert, sorry about the delay. I couldn't put in the time a reply to this thread deserved until now.

> -) When I wanted to check if a certain node is an exit node it took me some time to figure out that looking for an exit flag is not sufficient because some nodes are in fact exit nodes but don't have an exit flag.

Yup. It's unfortunate that tor decided to include an 'Exit' flag with such an unintuitive meaning. You're not the first person to be confused by it.

> One has to look at the node's exit policy, which is inaccessible by default because of microdescriptors. Maybe returning some meaningful message when one uses get_server_descriptor() and microdescriptors are enabled would help..?

Good idea! Done...
https://gitweb.torproject.org/stem.git/commitdiff/e78f1b7

> -) It is not safe to use extend_circuit in parallel for creating new circuits. I think this is not mentioned anywhere.

What kind of issue does that encounter? Is it a problem with stem's thread safety or an issue on tor's side?

> > or would like a code review then let me know.
>
> That would be awesome!

Few things I'm spotting offhand...

> self._lock.acquire()

Manual lock handling is risky. If anything within this block raises an exception (and there are several points throughout your script where you use Controller methods that can potentially raise errors) then the lock won't be released. The safer way of doing this is to use the 'with' keyword...

  with self._lock:
    ...  # do stuff

This is the same as...

  self._lock.acquire()
  try:
    ...  # do stuff
  finally:
    self._lock.release()

> def read(self):
>   ...
>   return None

Not necessary. Methods return None by default.

> # pylint: disable-msg=R0902

You might want to look into pyflakes and pep8. I've found them to be better static analysis tools.

> try:
>   controller = connect_port()
> except SocketError:
>   sys.stderr.write("ERROR: Couldn't connect to Tor.\n")
>   sys.exit(1)
>
> controller.authenticate()

Not quite. The connect_port() function never raises an exception. Rather, if it fails to establish a control connection then it prints the issue to stdout and returns None. Also, the connection it provides is already authenticated. This should instead be...

  controller = connect_port()

  if not controller:
    sys.exit(1)  # failed to get a control connection

Cheers!
-Damian
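The point about manual lock handling can be checked directly. With `with`, the lock is released when the body raises, while a bare `acquire()`/`release()` pair leaves it held and would block the next caller. Here is a small self-contained demonstration. The function names are illustrative, and the `ValueError` stands in for a failing Controller call.

```python
import threading


def safe_update(lock, state, value):
    with lock:  # released on normal exit and when an exception propagates
        if value < 0:
            raise ValueError("negative value")  # e.g. a Controller call failing
        state.append(value)


def unsafe_update(lock, state, value):
    lock.acquire()
    if value < 0:
        raise ValueError("negative value")  # release() below is skipped
    state.append(value)
    lock.release()


def lock_held_after_failure(update):
    """Call `update` with a failing value; report whether the lock is stuck."""
    lock, state = threading.Lock(), []
    try:
        update(lock, state, -1)
    except ValueError:
        pass
    return lock.locked()
```

`lock_held_after_failure(safe_update)` is `False`, while `lock_held_after_failure(unsafe_update)` is `True`: that is the deadlock the review warns about.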
Re: [tor-dev] Status report - Stream-RTT
On Saturday 27 July 2013 04:41:45 Damian Johnson wrote:
> Hi ra, glad to see that you're using stem!

Sure, stem works really great!

> If you have any questions, suggestions, feature requests,

These parts were a bit tricky for me to figure out:

-) When I wanted to check if a certain node is an exit node it took me some time to figure out that looking for an exit flag is not sufficient because some nodes are in fact exit nodes but don't have an exit flag. One has to look at the node's exit policy, which is inaccessible by default because of microdescriptors. Maybe returning some meaningful message when one uses get_server_descriptor() and microdescriptors are enabled would help..?

-) It is not safe to use extend_circuit in parallel for creating new circuits. I think this is not mentioned anywhere.

-) Router status V2/V3 also took me some time but this has already been fixed.

> or would like a code review then let me know.

That would be awesome!

> As of just four weeks ago the Controller started providing v3 responses

I obviously missed that. Fixed in [0].

> > circ.build_flags.count('IS_INTERNAL') == 0
>
> This would more commonly be done as...
>
>   'IS_INTERNAL' not in circ.build_flags

Fixed in [0].

> > try:
> >   controller.reset_conf("__DisablePredictedCircuits")
> >   controller.reset_conf("__LeaveStreamsUnattached")
> >   controller.close()
> > except NameError:
> >   pass
>
> What raises a NameError?

This was a leftover from when it was possible that controller didn't exist at that point. Fixed in [0].

> > # close circuit, but ignore if it does not exist anymore
> > try:
> >   self._controller.get_circuit(self._cid)
> >   self._controller.close_circuit(self._cid)
> > except (ValueError, InvalidArguments):
> >   pass
>
> What is the purpose of the get_circuit() call? If it's not superfluous

It doesn't do any harm but is definitely superfluous. Fixed in [0].

> > try:
> >   controller = Controller.from_port()
> > except SocketError:
> >   sys.stderr.write("ERROR: Couldn't connect to Tor.\n")
> >   sys.exit(1)
> >
> > controller.authenticate()
>
> This is certainly a fine way of doing it, but you might want to also look at connection.connect_port()...
>
> https://stem.torproject.org/api/connection.html#stem.connection.connect_port
>
> It is intended to be a quick and easy method of getting a Controller for command-line applications. For instance, it will present a password prompt if tor is configured to use password authentication. Just realized I should have included it in a tutorial somewhere...

I didn't know that. Since the script now depends on stem version 1.0.1 anyway, I integrated it.

Thank you for your feedback so far!

> Your code looks great! If you wouldn't mind I'd love to reference it on stem's examples page...

Sure, go ahead.

> Shall I reference 'https://bitbucket.org/ra_/tor-rtt/' or do you anticipate your project having a more permanent home? (this might be a question for Mike as much as you)

I would not mind, but I don't have any plans for that. Mike only asked me to make the code accessible online.

Best,
Robert

[0] https://bitbucket.org/ra_/tor-rtt/commits/666e0b173871ba3f699c8bc07bfb156f653adf7a
[tor-dev] Status report - Stream-RTT
Hi all!

During the last few weeks I have been very busy working on my GSoC project, which is about reducing the RTT of preemptively built circuits.

There is now a single script called rttprober[0] that depends on a patched[1] Tor client running a certain configuration[2]. Its goal is to measure the RTTs of Tor circuits. It takes a few parameters as input: an authenticated Stem Tor controller for communication with the Tor client, the number of circuits to probe, the number of probes to be taken for each circuit, and the number of circuits that should be probed concurrently. It outputs a tar file containing lzo-compressed serialized data with detailed node information, all circuit and stream events involved, and the circuit build times for further analysis.

Since the RTT measurements run in parallel with very short locks, it is important not to overload Tor nodes. Therefore a single node is never probed more than once at a time.

A first analysis of some measurements supports the original assumption that a Fréchet distribution fits both the circuit build times[3] and the round trip times[4]. I will continue gathering and analyzing measurement data and will hopefully be able to draw some conclusions from it.

Best,
Robert

[0] https://bitbucket.org/ra_/tor-rtt/src/1127f6936086664981fc55b4dbc82b1570714140/rttprober.py?at=master
[1] https://bitbucket.org/ra_/tor-rtt/src/1127f6936086664981fc55b4dbc82b1570714140/patches?at=master
[2] https://bitbucket.org/ra_/tor-rtt/src/1127f6936086664981fc55b4dbc82b1570714140/torrc?at=master
[3] http://postimg.org/image/je8k5yydt/
[4] http://postimg.org/image/ktk90vxm7/
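The rule that a single node is never probed more than once at a time can be enforced with a greedy scheduler. It only starts circuits whose relays are disjoint from those already being probed. This is a hypothetical sketch; rttprober's actual scheduling code is not shown in this report. Circuits are given as `(circuit_id, relay_fingerprints)` pairs, and `free_slots` is the remaining concurrency budget.

```python
def pick_probe_batch(pending, busy_relays, free_slots):
    """Choose circuits to probe now so that no relay is in two probes.

    pending: list of (circuit_id, relay_fingerprints) in queue order.
    busy_relays: fingerprints of relays already being probed.
    free_slots: how many more circuits may be probed concurrently.
    Returns (chosen circuit IDs, updated set of busy relays).
    """
    busy = set(busy_relays)
    batch = []
    for circ_id, relays in pending:
        if len(batch) >= free_slots:
            break
        relays = set(relays)
        if busy.isdisjoint(relays):  # no relay shared with a running probe
            batch.append(circ_id)
            busy |= relays
    return batch, busy
```

Greedy, queue-order selection is simple but can leave slots unused when early circuits conflict. A conflicting circuit simply waits for a later round, which seems an acceptable trade-off for a prober.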