Hi- I'm hoping to get some feedback on a design for censorship-resistant mirroring system called Where's the Party? It's distinguished by requiring ZERO custom software at all for either clients or hosts. Rather, it uses any web browser (including IE6) and a dumb webserver. I hope this will enable several orders of magnitude greater participation than previous efforts.
I've discussed this with some of you already, but the design has evolved substantially, so please take another look. In order to facilitate discussion, I'm just gonna paste the readme below. Pardon the wall of text - if it's tl;dr for you, check out the How it Works section. I think I've got something really promising here; your feedback is appreciated. --Pete ******************* Where's the Party? ******************* Where's the Party is a scalable, censorship-resistant mirror network for the web. It aims to make mirroring more accessible for both clients and hosts, allowing several orders of magnitude greater participation. WTP is different from previous mirroring efforts because it works entirely with software users already have - *no* additional packages need to be installed for either clients or hosts. WTP consists of software tools for building mirror networks and a website for matching content with volunteer hosts. It operates by using JavaScript in the browser to reconstruct the namespace that is a website from content hosted on multiple servers. :author: Peter Fein :email: p...@wearpants.org :source: http://github.com/pfein/wherestheparty =============== Problems Solved =============== Where's the Party enables the distribution of static HTML content to users on censored networks. It aims to be resistant against several forms of censorship: host/IP blocking, content-aware filtering (DPI) and legal takedowns (DMCA). The extremely low rates of uptake for browser plugins and other installable software is well known. At the height of the Egypt crisis, usage of `TOR skyrocketed`_ - to just over 2000 users, out of a population of 83 million. Peer-to-peer solutions such as BitTorrent often fail the "grandmother test", restricting access to the technologically savvy. For clients, WTP leverages the familiar web browser that users already have installed. It will support all browsers back through IE6 - a particular benefit in China, where `IE6 usage rates`_ remain as high as 35%. Working on the `streisand.me`_ mirror project, we observed similar problems for volunteers who wish to host a mirror. Apache configuration requires a fair degree of technological ability, and is simply not available for many popular hosting platforms (S3, Dropbox). The sheer size of content, often including videos, is an obstacle to hosting. For example, the `HBGary leaks site`_ was 9 GB in size. While not enormous, a tarball this large will take several hours to download and may exceed the capacity of many hosting plans. WTP makes mirroring easier by serving content from any directory on a dumb web host. An individual host may mirror only a portion of the total content. ===================== Problems Unsolved ===================== Some features of an ideal mirroring system need to be sacrificed for greater accessibility. There is no support for dynamic content such as a blog or CMS (though this may be mitigated by static snapshots). WTP does not provide anonymity or privacy, leaving clients who surf mirrors vulnerable to detection. Such concerns are best left to special-purpose tools like TOR & VPNs. WTP does not specify how to securely find an initial entry node. Rather, users find the network the same way as any other website - from a link, a URI written on a sidewalk, or a message from $DEITY (navigation *between* nodes is cryptographically verified). I call this approach a "rumor net" - as a user, you find the party via out-of-band channels. Once there, you can safely stay at the party and tell others where it is. Finally WTP may result in a slightly slower browsing experience, as resources may need to be loaded multiple times. This is faster than completely unavailable, and therefore acceptable. ======== Censors ======== The task of a censor is complicated by WTP. Takedowns and blocking are more difficult because of the large number of mirrors, in multiple jurisdictions. The colocation of censorable and inoffensive content on the same server may make coarse blocking or domain seizure less appealing. The use of cryptographic signing makes hijacking a WTP session impossible. Since the list of mirrors is public, it is relatively easy to map the extent of the network (once the censor understands how the system operates). Hopefully, new mirrors can be added quickly enough to stay ahead. Censors may run mirrors themselves as honeypots, to collect IPs and other user data. ============= How it Works ============= Basic Operation ++++++++++++++++ The content of a website is copied across the mirror network. An individual node may host only a fraction of the total pages; some resources, such as CSS or JS may be present on all mirrors. Each mirror has a list of the root URIs for some (not all) of the other nodes, and the public half of a keypair (the "verification keys"). A cryptographic signature is stored next to each resource (index.html.sig). A browser connects to the network via an out-of-band link. All pages include JavaScript which intercepts clicks and resources loads (images, etc.) for URIs with the current host. Resources from other hosts are not modified. On a click, the JavaScript checks the current host for the resource. If found, the associated signature is checked with the verification public key. If the check passes, the resource is loaded. Several errors are possible: 1. the resource may not exist on the current server (404) 2. the current server timeouts. This can occur if the user leaves a browser window open and the node is taken down or blocked. 3. the verification signature is invalid For (1) or (2), the JavaScript uses a cross-domain request (XDM) and walks through the list of mirrors to find the target resource. If found, it verifies the signature of the resource *and* the signature of the WTP JavaScript on the remote mirror, as well as that the remote public key is the same. If these tests pass, the browser is redirected to the remote resource. If none of the nodes in the current mirror list has the resource, their mirror lists are consulted by the same process. For (3), the user is alerted via popup, and given the option to load the resource from the current host or from a different node. XXX user choice here is lame Obfuscation +++++++++++ Obfuscation is introduced to thwart content-aware filtering at the network level. A second keypair (the "obfuscation keys") are added; the private key is stored on the mirror. Resources are non-semantically mutated in a unique way for each node (by introducing spaces between HTML tags, say). By encrypting a mutated resource, the raw bytes transmitted by a particular node will be totally different than any other. The client JavaScript loads the resource and replaces the page body with the decrypted version. Similar results may be achieved for binary formats (images, video) by flipping a single bit. Note that this encryption provides only obfuscation, not security (as the publicly-accessible mirror has the private key), The JavaScript itself cannot be so encrypted, as it would need to decrypt itself. Instead, existing JS obfuscaters can be used, ideally ones which take a user-provided seed. The filenames of common resources may also be varied, including those of the WTP JavaScript itself. Such files could be found by examining the index.html at a mirror root. Proof of Authorship ++++++++++++++++++++++ Proof of authorship may be added by signing the verification key with a known, identified keypair (the "author keys"). JavaScript cold be used to fetch the author's public key from the PGP keyservers (using XDM) and then verify the signature of the verification key. While anonymity may be maintained by using a newly-created email & keypair, this step is entirely optional. Health Checks ++++++++++++++ A standalone application could be used to spider a WTP mirror network and report on down nodes, signature errors, resource replication statistics, and so on. Similarly, client JavaScript could optionally report back to a web service specified by the party creator about down nodes and signature errors. Versioning +++++++++++ As publishing updates to a distributed mirror network may take some time, WTP can include a version number for the party as a whole (a la Subversion's revision numbers). JavaScript can detect if a resource on a remote mirror is older than the current generation. It can then look for newer copies on other hosts, alerting the user that content may be out of date if necessary. ======================== wherestheparty.net ======================== wherestheparty.net (WTPnet) is a website to facilitate the matching of content with volunteer hosts. Volunteers sign up, specify how much and what kind of content they want to host, and provide login credentials (rsync, (s)ftp, S3, etc.) for a webserver. WTPnet will periodically scan `The Pirate Bay`_ and other BitTorrent search engines for specially tagged content (`partywithme`). Such torrents will be automatically downloaded, their content extracted and then transformed to add the necessary JavaScript, keys and signatures. The resultant party will be divided into appropriately-sized portions and uploaded to volunteer hosts. Mirror lists on existing hosts will be updated periodically. As the website is highly likely to be blocked, its use is entirely optional. However, as content creators need access to BitTorrent, not the site itself, this problem is somewhat mitigated. Updates +++++++ By signing the content tarball using author keys (described in `Proof of Authorship`_), the party creator gains the ability to update content in the future. To update a party, the author creates an update tarball with new/changed files and a manifest of deletions. This file is signed using the author private key, and the tarball and signature are served through BitTorrent as described above. WTPnet can download this new torrent, verify the signature and update the mirrors as necessary. Note that the public author key can be included in the torrent and need not be uploaded to an external keyserver. Community +++++++++++++++++ Several difficulties arise from a fully-automated mirroring system. There may be more content than hosting space available. Some content may expose mirror owners to local legal or political liability. The existence of free storage is an attractive target for spammers and trolls. These problems can be mitigated with the use of collaborative decision making systems (a la `Reddit`_). A small subset of content from a potential party will be unpacked and served to browsers (either by direct hosting or on nodes willing to host unreviewed content). Users can help provide a brief description and other metadata (political relevance, legal risks), as well as flag potential parties as spam or inappropriate. They will be able to vote on whether that content should be mirrored on WTPnet. Additional weight will be given to the votes of users who: * provide more mirror space (logarithmic, so that small mirrors are not overwhelmed) * have a longer history of mirroring (again logarithmic, so that new users are not automatically outvoted) * mirror content on under served countries, languages and topics * mirror under-replicated content (see below) The actual content mirrored on a particular node is left up to that node's owner. Volunteers may allocate space to parties selected by the community, subject to constraints they specify (i.e., "exclude content that is legally risky in my jurisdiction"). Alternately, they may prefer individual parties, authors, topics or countries. Extra voting weight will be given to volunteers who mirror scarce (i.e., under-replicated) content. System administrators may set reasonable limits on the number of mirrors for popular parties. For example, the world probably doesn't need any more `WikiLeaks mirrors`_ at present. Other Content & Services ++++++++++++++++++++++++ WTPnet will provide a list of known parties, instructions on how to use the software and links to information about communications safety. It could run a spider as described in `Health Checks`_ and use the reports to improve the redundancy of the networks it manages. Note that WTPnet will *not* host parties itself, as this would significantly increase its exposure to legal and technological threats. ======================== Implementation ======================== Core JavaScript logic will be written using `Coffeescript`_, a friendlier dialect of JavaScript. Cross-domain requests will use `EasyXDM`_. Cryptography will use the `Stanford JavaScript Crypto Library`_. The use of jQuery will be avoided to allow its use by content without conflicts. Python will be used to transform content, using `lxml`_. Key generation and signing will be done with `Pycrypto`_ or `M2Crypto`_. A health check spider could be written with `scrapy`_. Testing can use `selenium`_ and/or `Browsershots`_. For WTPnet, the main site could be written in `Django`_ or another of the many Python web frameworks. Screen scrapers for The Pirate Bay would be written with standard library modules, lxml or scrapy. The orignal `BitTorrent`_ client could be used for downloads. For updloading to mirrors, there is `ftplib`_ for FTP, `paramiko`_ for ssh/sftp, `pysync`_ for rsync. (several alternatives available for all of these, including wrappers around commandline utilities). `scipy`_ and `NLTK`_ can be used for automated language and topic identification, and spam filtering. `Google Translate`_ links will be present on sample pages. ====================== Open Questions/Issues ====================== * Which JS obfuscater to use? Don't know if any support the ability to obfuscate in a variable way. * Is there a better domain than wherestheparty.net? All the good ones are taken. * Are there other ways of getting content into WTPnet? Searching for tags/links/named files on Google, file hosting services or links on pastebins perhaps? * Elliptic curve DSA would be preferable to RSA, but SJCL doesn't currently support it. * WTPnet could generate tarballs on demand for users who do not want to supply login credentials. This makes updating their mirror lists more difficult, but maybe a small mirror-list-update script could be provided. * Should WTPnet have a keypair so that tarballs can be transmitted to it securely? Motivation is to prevent content filtering on upload to a file hosting site. .. _`TOR skyrocketed`: https://blog.torproject.org/blog/recent-events-egypt .. _`IE6 usage rates`: http://micgadget.com/11633/why-the-chinese-still-favour-internet-explorer-6/ .. _`streisand.me`: http://streisand.me/ .. _`HBGary leaks site`: http://hbgary.anonleaks.ch/ .. _`The Pirate Bay`: http://thepiratebay.org/ .. _`Reddit`: http://reddit.com/ .. _`WikiLeaks mirrors`: http://wikileaks.ch/Mirrors.html .. _`Coffeescript`: http://jashkenas.github.com/coffee-script/ .. _`EasyXDM`: http://easyxdm.net .. _`Stanford JavaScript Crypto Library`: http://bitwiseshiftleft.github.com/sjcl/ .. _`lxml`: http://lxml.de/ .. _`Pycrypto`: http://pycrypto.org .. _`M2Crypto`: http://chandlerproject.org/bin/view/Projects/MeTooCrypto .. _`scrapy`: http://scrapy.org .. _`selenium`: http://seleniumhq.org/ .. _`Browsershots`: http://browsershots.org/ .. _`Django`: http://djangoproject.org .. _`BitTorrent`: http://pypi.python.org/pypi/BitTorrent/ .. _`ftplib`: http://docs.python.org/library/ftplib.html .. _`paramiko`: http://www.lag.net/paramiko/ .. _`pysync`: http://freshmeat.net/projects/pysync/ .. _`scipy`: http://www.scipy.org .. _`NLTK`: http://www.nltk.org/ .. _`Google Translate`: http://translate.google.com/ _______________________________________________ p2p-hackers mailing list p2p-hackers@lists.zooko.com http://lists.zooko.com/mailman/listinfo/p2p-hackers