Re: [tor-dev] Scaling tor for a global population
Thanks isis. We worked on PIR-Tor a while ago, so my recollection could be a bit rusty, but here are some thoughts.

1) From a performance perspective, the best results are obtained by using the information-theoretic variant of PIR-Tor (ITPIR), as opposed to the computational variant. As you correctly point out, we considered the three guard relays to be the PIR directory servers, with the assumption that not all three collude. (Our argument for the non-colluding assumption was that if all three guards are malicious, then the security offered by Tor is terrible already.) Note that ITPIR necessarily requires more than one server. Given Tor's move to a single-guard design, the first challenge is to figure out alternative PIR servers for the client. (I haven't thought much about this challenge, but it seems that the single guard could be one server, and two other relays (maybe with guard flags) could play the role of PIR directory servers. I'll have to think more about this, but it is possible that this design continues to provide security properties similar to Tor's.)

2) Even outside the context of PIR, there are several advantages to having some structure in the database, with fine-grained signatures on individual relay/micro-descriptors (relays could also be grouped into blocks, with signatures at the block level). For example, if the Tor network were to use 2-hop circuits, then clients would have close to 0 (or at least significantly less than 1%) overhead for maintaining network state. Why? Because the first hop (guard) is fixed, fetching the directory to select the guard is only a *one-time* operation; note that much of the directory overhead in Mike Perry's analysis comes from *periodic fetches*. Secondly, clients do not need to use PIR to fetch the exit relay in this setting, since guards know the identity of exits in a two-hop design anyway; the bandwidth overhead of fetching a single exit is simply the size of the descriptor/micro-descriptor -- resulting in *several orders of magnitude* bandwidth savings. (This optimization was referred to in the paper in the context of fetching middle relays.)

(Of course there are anonymity implications of moving to two-hop designs. Perhaps the most significant concern is easy guard identification, but perhaps an additional 100 million clients plus the move to a single guard reduces these concerns. The bandwidth savings could be very significant, especially in the context of mobile clients, which otherwise might use hundreds of MB per month just to maintain network state. Anyway, I digress.)

Please find some comments about your previous mail inline (below).

On Tue, Sep 30, 2014 at 1:04 AM, isis i...@torproject.org wrote:

> 1. The authors assume that a client's three [sic] Guard Nodes will be the directory mirrors for that client, accepting PIR queries for descriptors.

Right, please see (1) above.

> 3. There is zero handling of colluding directory mirrors.

This seems incorrect? We used the Percy open source library to perform PIR, which does handle collusion among mirrors (to the best of my recollection). The parameters can be set such that one honest server is sufficient for security.

> 4. The *increase* to descriptor size is not well factored in to the analysis.
> 4.b. All of the Directory Authorities would need to sign each and every descriptor by itself. This would result in the current microdescriptors, which are sized from roughly 900 bytes to some of the larger ones which are 2.5 KB, increasing in size by roughly 4 KB (that's the size of the DirAuth signatures on the current consensus).

I am not sure which signature scheme you are considering. In the paper, we talk about threshold BLS signatures, which have only a 22-byte overhead. (This might increase for better security guarantees, but a 4 KB overhead seems very conservative.)

> While one of the benefits of PIR would be that clients would no longer need to request the descriptors for all relays listed in the consensus, this benefit actually doesn't help as much as it would seem, because switching to a PIR scheme would require a client to make three of these roughly O(146449)-bit requests, every ten minutes or so, when the client wants a new circuit.

Right, clients would roughly need 18 middle/exit relays (using PIR queries) over a three-hour interval to build a Tor circuit every 10 minutes. My recollection is that even in 2011, PIR seemed more efficient (we estimated a 12 KB overhead per PIR query) than a full network fetch every three hours (12*18 = 216 KB with PIR vs several MBs without PIR). Though these numbers might change depending on implementation details (including the use of micro-descriptors), the benefits would only improve with growing network size.

Thanks,
Prateek

--
Prateek Mittal
Assistant Professor
Princeton University
http://www.princeton.edu/~pmittal/
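[A quick sanity check of the comparison above, as a Python sketch. The 12 KB per query and 18 queries per three hours are the estimates quoted in the mail; the 3 MB full fetch is an assumed stand-in for "several MBs".]

    # Rough bandwidth comparison over one three-hour interval, using the
    # 2011 estimates from the PIR-Tor discussion above. The 3 MB figure
    # for a full network fetch is an assumption standing in for "several MBs".
    pir_query_bytes  = 12 * 1024        # estimated overhead per PIR query
    queries_per_3h   = 18               # circuits built every 10 minutes
    full_fetch_bytes = 3 * 1024 * 1024  # assumed full network-state fetch

    pir_total = pir_query_bytes * queries_per_3h
    print("%d KB with PIR vs %d MB without"
          % (pir_total // 1024, full_fetch_bytes // (1024 * 1024)))
    # -> 216 KB with PIR vs 3 MB without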
Re: [tor-dev] Scaling tor for a global population
On Tue, Sep 30, 2014 at 6:29 AM, Prateek Mittal pmit...@princeton.edu wrote:

>> 4. The *increase* to descriptor size is not well factored in to the analysis.
>> 4.b. All of the Directory Authorities would need to sign each and every descriptor by itself. This would result in the current microdescriptors, which are sized from roughly 900 bytes to some of the larger ones which are 2.5 KB, increasing in size by roughly 4 KB (that's the size of the DirAuth signatures on the current consensus).
>
> I am not sure which signature scheme you are considering. In the paper, we talk about threshold BLS signatures, which have only a 22-byte overhead. (This might increase for better security guarantees, but a 4 KB overhead seems very conservative.)

You can make this even lower, because every PIR query returns a block that is roughly the square root of the size of the entire database; you would only need to have a signature on each block.

- Nikita

--
Nikita Borisov - http://hatswitch.org/~nikita/
Associate Professor, Electrical and Computer Engineering
Tel: +1 (217) 244-5385, Office: 460 CSL
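[To see what per-block signing buys, a minimal sketch under assumed numbers: a hypothetical 7000 relays and the 22-byte threshold BLS signatures mentioned above.]

    import math

    # Toy comparison of per-record vs per-block signature overhead.
    # The relay count is an assumption for illustration only.
    n_records = 7000   # hypothetical number of relays
    sig_bytes = 22     # threshold BLS signature, as discussed above

    # Group records into ~sqrt(n) blocks so one PIR answer returns one block.
    records_per_block = math.isqrt(n_records)           # 83
    n_blocks = math.ceil(n_records / records_per_block)

    print("per-record: %d bytes of signatures" % (n_records * sig_bytes))
    print("per-block:  %d bytes of signatures" % (n_blocks * sig_bytes))
    # per-record: 154000 bytes of signatures
    # per-block:  1870 bytes of signatures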
Re: [tor-dev] Scaling tor for a global population
On Tue, Sep 30, 2014 at 12:04 AM, isis i...@torproject.org wrote:

> [1]: I assume that we need replication in Tor's use case. There are papers, such as the following:
>
> Kushilevitz, Eyal, and Rafail Ostrovsky. "Replication is not needed: Single database, computationally-private information retrieval." Proceedings of the 38th Annual Symposium on Foundations of Computer Science (FOCS 1997). IEEE Computer Society, 1997. http://www.cs.fiu.edu/~carbunar/teaching/cis5374/slides/pir-single.pdf
>
> for which the research doesn't apply because it was aimed at computationally-hiding PIR schemes, and obviously Tor aims for information-theoretic security. Other than the PIR-Tor paper, I haven't found any literature which analyses PIR for anything close to Tor's use case. (Although I'd be stoked to hear about any such papers.)

The current Tor design relies heavily on computational assumptions in all of its cryptography, so I don't think that the Tor network setting is a reason to use information-theoretic rather than computational PIR. We advocated ITPIR because it is much more efficient and because collusion among your three guards was already a significant problem. With the move to a single guard, that argument no longer makes as much sense. We do have an analysis of the computational PIR scenario in the paper, but it does impose significantly more (computational) overhead on the directory mirrors.

- Nikita

--
Nikita Borisov - http://hatswitch.org/~nikita/
Associate Professor, Electrical and Computer Engineering
Tel: +1 (217) 244-5385, Office: 460 CSL
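[For readers who haven't seen the distinction before, the simplest textbook two-server ITPIR scheme makes the trade-off concrete. This is only a toy sketch; Percy, which PIR-Tor actually used, implements far more general t-private multi-server schemes.]

    import os

    def build_queries(n_records, index):
        """Client: two random-looking bit vectors whose XOR selects one record."""
        q1 = bytearray(os.urandom((n_records + 7) // 8))
        q2 = bytearray(q1)
        q2[index // 8] ^= 1 << (index % 8)  # vectors differ only in the target bit
        return q1, q2

    def answer(db, q):
        """Server: XOR together every record whose bit is set in the query."""
        out = bytes(len(db[0]))
        for i, record in enumerate(db):
            if q[i // 8] >> (i % 8) & 1:
                out = bytes(a ^ b for a, b in zip(out, record))
        return out

    db = [b"desc%03d" % i for i in range(16)]  # 16 equal-length toy "descriptors"
    q1, q2 = build_queries(len(db), 5)
    record = bytes(x ^ y for x, y in zip(answer(db, q1), answer(db, q2)))
    print(record)                              # b'desc005'

[Each server alone sees a uniformly random bit vector and learns nothing about the index, with no computational assumption needed; but if the two servers collude they can XOR their queries and recover it, which is exactly the non-collusion assumption discussed above.]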
Re: [tor-dev] Scaling tor for a global population
On 09/30/2014 06:28 AM, AFO-Admin wrote:
> E.g. if you have a server with 2x E5-2683 v3 and a 10 Gbit/s pipe, you would need at least 14 IPs to use most of the CPU. Multicore support is hard and needs developers.

Raising the limit from 2 relays per IP to x per IP has been discussed in the past and would be an easy change.

--
Moritz Bartl
https://www.torservers.net/
[tor-dev] Getting the network consensus
Hi all,

While working on peer-to-peer hidden services I've been wondering how long two clients need to wait before arriving at the same view of the Tor network. The answer, which I find surprising, is that this is never guaranteed to happen. Here's why.

In what follows I'll assume that the interval between network consensuses being published is one hour, but the conclusions remain the same if the interval is longer or shorter - just change the units. I'll use "1:00 consensus" to mean a consensus that becomes valid at 1:00, is fresh until 2:00, and is valid until 4:00.

According to dir-spec.txt, each directory cache downloads the 1:00 consensus at a randomly chosen time between 1:00 and 1:30. Each client replaces the 1:00 consensus with a newer consensus at a randomly chosen time between 2:45 and 3:51. (The precise endpoint is 3:50:37.5, due to a calculation described in section 5.1 of the spec. I'm curious to know the origin of that calculation.)

Some observations:

1. The period when clients are replacing each consensus slightly overlaps the period when they're replacing the previous consensus. For about five and a half minutes of every hour there are twice as many clients replacing their consensuses, so directory caches will experience greater load during this period.

How much greater? We can divide clients that are downloading the consensus into three groups:

A. Clients that have never downloaded a consensus.
B. Clients that are replacing a consensus that's no longer valid.
C. Clients that are replacing a valid consensus with a newer consensus.

Clients in group A bypass the directory caches and hit the authorities, so they don't contribute to load on the caches. Clients in group B have just started up and loaded an expired consensus from disk. These clients are equally likely to hit the caches at any point in the consensus interval (although the load will vary by time of day, day of week, etc). Clients in group C are twice as likely to hit the caches between 45 minutes and 51 minutes past the hour as at any other time, due to the overlap between replacement intervals.

So unless group C is dwarfed by group B, we should see a noticeable increase in load on the caches between 45 and 51 minutes past the hour. (If group C is dwarfed by group B then the increase will be lost in the noise.) If you run a directory cache, do your logs show that pattern?

2. The time a client chooses for replacing a newly downloaded consensus may be earlier than the time it chose for replacing the previous consensus. Again, this is due to the overlap between replacement intervals: a client may replace the 1:00 consensus at 3:48 and then choose 3:47 as the random time to replace the newly downloaded 2:00 consensus.

What happens in that case? I haven't looked at the code, but I'm going to speculate that it doesn't explode in flames or travel through time. A reasonable way to handle the situation would be to round up the replacement time to the present time - in other words, to replace the newly downloaded consensus immediately. If that guess is correct, the extra load during the overlap between replacement intervals will get squeezed towards the end of the overlap, as replacement times occasionally get rounded up but never get rounded down. Some clients will download two consensuses back-to-back during the overlap. Again, if you run a directory cache I'd be interested to know whether you're seeing this pattern.

3. Clients often skip consensuses.
The interval for clients to replace the 1:00 consensus runs from 2:45 to 3:51, and the interval for caches to download the 3:00 consensus runs from 3:00 to 3:30. So there's roughly a 46% chance that a client replacing the 1:00 consensus will get the 2:00 consensus, and roughly a 54% chance that it will get the 3:00 consensus.

This means it's possible for two clients to replace their consensuses regularly but never see the same consensus: one of them sees the consensuses published on odd hours, the other sees the consensuses published on even hours. This situation is unlikely to continue for long if each client makes a fresh random choice for each replacement time, but there's no time at which we can say the clients will definitely have seen the same consensus. I can't see any partitioning attacks arising from this, but I thought I'd point it out anyway because I find the result surprising.

As I said before, I'm curious to know the origin of the calculation in section 5.1 that produces the overlap between replacement intervals. The replacement time is chosen uniformly at random from the interval between the time 3/4 into the first interval after the consensus is no longer fresh, and 7/8 of the time remaining after that before the consensus is invalid. It's the 7/8 part that results in the odd figure of 50 minutes 37.5 seconds. If the replacement time were to be chosen uniformly at random from the interval between the time 3/4
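[The 3/4 and 7/8 rule is easy to check numerically. Below is a small Python sketch reproducing the 2:45 to 3:50:37.5 window and the five-and-a-half-minute overlap for the example 1:00 consensus; the rule is from dir-spec section 5.1, the script is only an illustration.]

    from datetime import datetime

    # The example 1:00 consensus from above.
    valid_after = datetime(2014, 9, 30, 1, 0)
    fresh_until = datetime(2014, 9, 30, 2, 0)  # valid_after + one interval
    valid_until = datetime(2014, 9, 30, 4, 0)

    interval = fresh_until - valid_after

    # Window start: 3/4 into the first interval after freshness ends.
    start = fresh_until + interval * 3 // 4            # 02:45:00
    # Window end: 7/8 of the time remaining until the consensus is invalid.
    end = start + (valid_until - start) * 7 // 8       # 03:50:37.5

    # The window for the 2:00 consensus starts one interval later (03:45),
    # so the two windows overlap by 5 minutes 37.5 seconds.
    print(start.time(), end.time(), end - (start + interval))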
[tor-dev] Damian's Status Report - September 2014
Code! Beautiful code! How I've missed thee!

As mentioned before I'm hip deep in the middle of an arm rewrite. Not only to overhaul the arm codebase, but to ensure Stem will be up to snuff for a Tor GUI or web dashboard (projects I'd *love* to jump into someday). I'm delighted to announce a nice bit of progress on this front. September went toward rewriting its header panel (the top portion of arm's interface)...

* Old (zomg my eyes, they bleed!): https://gitweb.torproject.org/arm.git/blob/e249dc8f5c4c282326324161cf8421f947cf660e:/src/cli/headerPanel.py
* New hotness: https://gitweb.torproject.org/arm.git/blob/HEAD:/arm/header_panel.py

As mentioned though, this is really more about Stem than arm. Stem improvements made along the way this month include...

* Far better handling for the 'private' keyword (the default prefix) and the default suffix in our ExitPolicy class. (#10107)
* New BaseController.connection_time() method that provides when our control connection was established or detached. [1]
* New Controller.get_accounting_stats() method which reports accounting information for our relay. [2]
* Controller now fetches our own descriptor if no fingerprint is provided to get_server_descriptor(), get_network_status(), or get_microdescriptor().
* New crop() and join() methods for the str_tools module. [3][4]
* Dropped a redundant get_* prefix from most of our util function names. Old names still work as aliases.
* Our launch_tor_with_config() function raised an OSError if called too many times, caught thanks to RSenet. (#13141)

DocTor, the monitor for directory authority health, also got some love this month...

* Corrected suppression behavior when we have staggered alerts. This should greatly cut down on the amount of noise on the list.
* We now check if too many relays lack a bandwidth authority measurement... or we would, if this wouldn't perpetually be in alarm. (#12989, #13046)
* Revised DocTor's schedule as requested by Sebastian. (#13210)
* Expanded suppression for messages about turtles and kicked off a conversation to at long last sort this out. (#13296)

One final note is that I uploaded my Tor Ecosystem presentation to YouTube so volunteer page visitors can see a streaming presentation rather than downloading a 53 MB video...

https://www.youtube.com/watch?v=fb6iqZcQsSg

Cheers! -Damian

[1] https://stem.torproject.org/api/control.html#stem.control.BaseController.connection_time
[2] https://stem.torproject.org/api/control.html#stem.control.Controller.get_accounting_stats
[3] https://stem.torproject.org/api/util/str_tools.html#stem.util.str_tools.crop
[4] https://stem.torproject.org/api/util/str_tools.html#stem.util.str_tools.join
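[A quick sketch of a couple of the Stem additions listed above. It assumes a relay with ControlPort 9051 and a recent Stem; consult the linked API docs for the authoritative signatures.]

    from stem.control import Controller

    with Controller.from_port(port=9051) as controller:
        controller.authenticate()

        # connection_time(): when this control connection was established.
        print('connected at: %s' % controller.connection_time())

        # get_accounting_stats(): relay accounting info, if enabled.
        if controller.get_info('accounting/enabled') == '1':
            stats = controller.get_accounting_stats()
            print('bytes read this period: %s' % stats.read_bytes)

        # No fingerprint needed to fetch our own descriptor anymore.
        desc = controller.get_server_descriptor()
        print('nickname: %s' % desc.nickname)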
Re: [tor-dev] Scaling tor for a global population
Moritz Bartl transcribed 0.5K bytes:
> Raising the limit from 2 relays per IP to x per IP has been discussed in the past and would be an easy change.

We *still* have that limit? I thought we killed it a long time ago. Can we kill it now? It's not going to do anything to prevent Sybils, it'll only prevent good relay operators on larger machines from giving more bandwidth.

--
♥Ⓐ isis agora lovecruft
GPG: 4096R/A3ADB67A2CDB8B35
Current Keys: https://blog.patternsinthevoid.net/isis.txt
Re: [tor-dev] Scaling tor for a global population
isis:
> Moritz Bartl transcribed 0.5K bytes:
>> Raising the limit from 2 relays per IP to x per IP has been discussed in the past and would be an easy change.
>
> We *still* have that limit? I thought we killed it a long time ago. Can we kill it now? It's not going to do anything to prevent Sybils, it'll only prevent good relay operators on larger machines from giving more bandwidth.

If we don't kill it outright, raising it to 4-8 might make more sense and be a good interim step. We could lower it again once we have real multicore support (which I still think we should aim for, because it also improves the mix properties of fast relays to an external observer, but which I realize is no small task).

--
Mike Perry