Hi, I've been hanging out on IRC lately under the nick 'sajack'. I'm going to be submitting an application to work with Freenet during Google Summer of Code. Here's the proposal I'll be uploading; if anyone has any thoughts, I'd appreciate them. Thanks for looking at it.
Proposal: Improve Implementation and Functionality of Content Filtration and Add Support for Additional Formats

Proposer: Spencer Jackson (sajack)

Introduction

Content is an important part of the Freenet experience. Good, plentiful content attracts users, which attracts donations and creates more nodes, both of which, directly or indirectly, improve the performance and security of the network. To make Freenet better, we must therefore make the process of getting information from the network to the user quick, easy, and safe. I am proposing a series of changes to the ContentFilter and adjacent systems to realize this. Below are the general steps I will take.

Modify content filters to act as streams

Presently, Freenet's data filters are passed Bucket objects containing all of the data they need to process. This is suboptimal. Ideally, the data filters should have a stream interface. This will reduce duplication of data and improve performance by removing the need for vast amounts of disk I/O, as less data will need to be cached on disk. This should be easy to implement, as most of the filters already deal with streams internally.

Right now, filters are part of FProxy and are invoked by it when a file is downloaded. Most clients, however, probably want filtered data, so filtration should happen earlier, with FProxy simply using the general functionality. I will therefore move the filters into the client layer and invoke them there.

While I'm working in this area, it would be useful to add some new functionality. First, I'll add the ability to filter files being saved to the hard drive. This doesn't happen right now, and it's something of a weak spot in our armor. Later on especially, once there are Ogg filters, users may be downloading large files directly to their hard drives, and we will want those files to be filtered. Second, while I'm working with filters in the client layer, I will implement filtration of inserts.
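As a rough sketch of the stream interface proposed above (the interface and class names here are my own illustrative assumptions, not Freenet's actual API):

```java
import java.io.*;

// Hypothetical sketch of a stream-based filter interface; names are
// illustrative, not Freenet's actual API. Instead of being handed a
// whole Bucket, a filter reads from an InputStream and writes filtered
// output to an OutputStream, so data need not be fully cached on disk.
interface StreamContentFilter {
    void filter(InputStream in, OutputStream out) throws IOException;
}

// Trivial pass-through filter demonstrating the streaming pattern.
class PassThroughFilter implements StreamContentFilter {
    @Override
    public void filter(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        out.flush();
    }
}
```

A real filter would inspect and rewrite the bytes as they pass through, but the calling convention would be the same.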
Filtering inserts will help prevent metadata in a file uploaded by the user from breaking his or her anonymity. For example, EXIF data in JPEGs may reveal the serial number of the camera which took the picture, or even the GPS coordinates of where the picture was taken.

Of course, there are some use cases, such as debugging, where it may be undesirable for a request to be filtered, so it must be possible to disable filtering. To accomplish this, I will skip filtering when a configuration setting in the request's context has been set. Support for disabling filters will need to be added to FCP, and all of this will then need to be supported in the web interface. I will add complementary GET and POST variables to FProxy which trigger this setting. Next, I'll add UI elements to the download and insert queue pages and any other pertinent locations, such as the 'Downloading a page' page, which enable these variables. These elements should only be visible when the user is in 'Advanced mode,' and, even then, should be tagged with a Big Fat Warning about the risks of turning off filtering.

Another feature I will implement is the ability to run data through a filter without placing it on the network. This would be useful for debugging content filters, and for freesite authors who want to see what their site will look like after it has been parsed. This should be fairly easy to implement: I'll create an FCP message which takes data, filters it, and returns it. I will also create a way to do this through FProxy, by uploading a file and receiving the filtered version.

The next thing I will implement is stream-friendly Compressors. Essentially, we should be able to have a filter and a decompressor running on separate threads, with data passed between them transparently using piped streams.

Implementation of Ogg container format

Right now, Freenet has filters for HTML and some forms of image files.
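The piped-stream arrangement between a decompressor and a filter mentioned above could be sketched roughly like this (a simplified illustration: GZIP from java.util.zip stands in for Freenet's actual Compressor classes, and the "filter" end simply collects the bytes):

```java
import java.io.*;
import java.util.zip.*;

// Sketch of running a decompressor and a filter on separate threads,
// connected by piped streams. GZIP stands in for Freenet's Compressor
// implementations; the "filter" stage here simply collects the bytes.
public class PipedFilterDemo {

    // Decompress gzipped data on a worker thread, streaming the output
    // through a pipe to the calling (filter) thread.
    static String decompressThroughPipe(byte[] gzData) throws Exception {
        PipedOutputStream pipeOut = new PipedOutputStream();
        PipedInputStream pipeIn = new PipedInputStream(pipeOut);

        Thread decompressor = new Thread(() -> {
            try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzData));
                 OutputStream out = pipeOut) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        decompressor.start();

        // The "filter" end: read decompressed bytes as they arrive.
        ByteArrayOutputStream filtered = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = pipeIn.read(buf)) != -1) filtered.write(buf, 0, n);
        decompressor.join();
        return filtered.toString("UTF-8");
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (OutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write("hello, freenet".getBytes("UTF-8"));
        }
        System.out.println(decompressThroughPipe(compressed.toByteArray()));
    }
}
```

In the real system, the reading end would be a content filter rather than a byte collector, but the thread-plus-pipe topology would be the same.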
More filters means more types of content which may be safely viewed by the user, allowing the network to be used in ways which are currently unsafe. After I have implemented the new stream-based content filters, I shall implement more of them.

The first filter I will implement is for the Ogg container format. This is technically interesting, as Ogg encapsulates other types of data. A generic Ogg parser will be written, which will need to validate the Ogg container, identify the bitstreams it contains, identify the codec used inside each bitstream, and process the streams using a second (or nth, depending on how many bitstreams are in the container) codec-specific filter. It should be possible to use this filter to filter either just the beginning of the file or the whole thing; this will make it possible, at some point in the future, to preview a partially downloaded file. One thing which will need to be taken into consideration is the possibility of Ogg pages being concealed inside other Ogg pages. This will be checked for, and a fatal error will be raised if it occurs. The Ogg codecs which I will initially add support for are, in order, Vorbis, Theora, and FLAC.

More content filters

The more filters the better. In the time remaining, I will implement as many content filters as possible. While this step is very important, these codecs are individually of lower priority than the previous steps. I will implement Atom/RSS, MP3, and the rudiments of PDF.

Milestones

Here are clear milestones which may be used to evaluate my performance.
The following is a list of goals which should be met to signify completion, along with very rough estimates of how long each step should take:

* Stream-based filters (3 days)
* Filters are moved to the client layer, with (disableable) support for filtering files going to the hard drive, and for filtering inserts (9 days)
* Filters can be tested on data without inserting it into the network (3 days)
* Compressors can be interacted with through streams (4 days)
* An Ogg content filter is implemented, supporting the following codecs: (3 days)
  - The Vorbis codec (2 days)
  - The Theora codec (2 days)
  - The FLAC codec (2 days)
* Content filters for Atom/RSS are implemented (5 days)
* A content filter for MP3 is implemented (6 days)
* A basic content filter for PDF is implemented (remaining time)

Biography

I initially became interested in Freenet because I am something of a cypherpunk, in that I believe the ability to hold pseudonymous discourse is a major cornerstone of free speech and the free flow of information. I've skulked around Freenet occasionally, even helping pre-alpha test version 0.7. But I'd like to do more: I want to put my time and energy where my mouth is and spend my summer making the world, in some small way, safer for freedom.

Starry-eyed idealism aside, I am an 18-year-old American high school senior who will be studying Computer Science after I graduate. While C/C++ is my 'first language', so to speak, I am also fluent in Java and Python. Last year, I personally rewrote my high school's web page in Python and Django. This year, I've been working on an editor for Model United Nations resolutions, as time permits. The project is licensed under the GPLv3 and is available on GitHub at http://github.com/spencerjackson/resolute. It's written in C++ and uses GTKmm for the GUI.
