On Thu, Apr 8, 2010 at 12:29 AM, Spencer Jackson <spencerandrewjackson at gmail.com> wrote:
> Hi
>
> I've been hanging out on IRC as of late under the nick 'sajack'. I'm going
> to be submitting an application to work with Freenet during Google Summer of
> Code. Anyway, here's the proposal I'm going to be uploading, if anyone has
> any thoughts. Thanks for looking at it.
>
> Proposal: Improve Implementation and Functionality of Content Filtration and
> Add Support for Additional Formats
> Proposer: Spencer Jackson (sajack)
>
> Introduction
> Content is an important part of the Freenet experience. Good, plentiful
> content attracts users, which attracts donations and creates more nodes,
> both of which, directly or indirectly, improve performance and security of
> the network. As such, to make Freenet better, we must make the process of
> getting information from the network to the user quick, easy, and safe. I am
> proposing a series of changes to the ContentFilter and adjacent systems so
> as to realize this. Below are the general steps I will take.
>
> Modify content filters to act as streams
>
> Presently, Freenet's data filters are passed Bucket objects containing all
> of the data they need to process. This is suboptimal. Ideally, the data
> filters should have a stream interface. This will reduce duplication of data
> and increase performance by removing the need for vast amounts of disk I/O,
> as less data will need to be cached on disk. This will be very easy to
> implement, as most of the filters deal with streams internally.
>
> Right now, filters are a part of FProxy, and are invoked by it when a file
> is downloaded. Really though, most clients probably desire filtered data, so
> filtration should be done earlier, with FProxy simply using general
> functionality. I will therefore move filters into the client layer, and
> invoke them there.
>
> Now while here, it would be useful to add some new functionality.
> First off, I'll add the ability to filter files being saved to the hard
> drive. Right now, this doesn't happen, and it's something of a weak spot in
> our armor. Later on especially, when there are Ogg filters, users may be
> downloading large files directly to their hard drive. We will want them to
> be filtered.
>
> Another thing while I'm working with filters in the client layer: I will
> implement filtration of inserts. This will help prevent metadata in a file
> uploaded by the user from breaking his or her anonymity. For example, EXIF
> data in JPEGs may reveal the serial number of the camera which took the
> picture, or even the GPS coordinates from where the picture was taken.
>
> Of course, there are some use cases, such as during debugging, where it
> may be undesirable for a request to be filtered. It must therefore be
> possible to disable filtering. To accomplish this, I will prevent the data
> from being filtered when a configuration setting in the request's context
> has been set. Support for disabling filters will need to be added to FCP.
> All of this will then need to be supported in the web interface. I will add
> support for complementary GET and POST variables in FProxy which would be
> used to trigger this setting. Next, I'll add UI elements to the download and
> insert queue pages and any other pertinent locations, such as the
> 'Downloading a page' page, which would enable these variables. These
> elements should only be visible when the user is in 'Advanced mode,' and,
> even then, should be tagged with a Big Fat Warning about the risks of
> turning off filtering.
>
> Another feature I will implement is the ability to run data through a filter
> without placing it on the network. This would be useful for debugging
> content filters, and for freesite writers who want to see what their site
> will look like after it's been parsed. This should be pretty easy to
> implement.
> I'll create an FCP message which will take data, filter it, and return it.
> I will also create a way to do this through FProxy, by uploading a file and
> receiving the filtered version.
>
> Next, I will implement stream-friendly Compressors. Essentially, we should
> be able to have a filter and a decompressor running on separate threads, and
> have data be passable between them transparently using piped streams.
>
> Implementation of Ogg container format
>
> Right now, Freenet has filters for HTML and some forms of image files. More
> filters mean more types of content which may be safely viewed by the user.
> This will allow the network to be used in ways which are currently not safe.
> After I have implemented the new stream-based content filters, I shall
> implement more of them.
>
> The first type of filter which I will implement is for the Ogg container
> format. This is technically interesting, as it encapsulates other types of
> data. A generic Ogg parser will be written, which will need to validate the
> Ogg container, identify the bitstreams it contains, identify the codec used
> inside these bitstreams, and process the streams using a second (or nth,
> really, depending on how many bitstreams are in the container)
> codec-specific filter. It should be possible to use this filter to filter
> either just the beginning of the file, or the whole thing. This will make it
> possible to preview a partially downloaded file, at some point in the
> future. One thing which will need to be taken into consideration is the
> possibility of Ogg pages being concealed inside of other Ogg pages. This
> will be checked for, and a fatal error will be raised if it occurs.
>
> The Ogg codecs which I will initially add support for are, in order, Vorbis,
> Theora, and FLAC.
>
> More content filters
>
> The more filters the better. In the time remaining, I will implement as many
> different content filters as possible.
> While this step is very important, these codecs individually are of a lower
> priority than previous steps. I will implement ATOM/RSS, MP3, and the
> rudiments of PDF.
>
> Milestones
> Here are clear milestones which may be used to evaluate my performance. The
> following is a list of the goals which should be met to signify
> completion, along with very rough estimates as to how long each step should
> take:
>
> *Stream based filters (3 days)
> *Filters are moved to the client layer, with (disableable) support for
> filtering files going to the hard drive, and inserts (9 days)
> *Filters can be tested on data, without inserting it into the network (3
> days)
> *Compressors can be interacted with through streams (4 days)
> *An Ogg content filter is implemented, supporting the following codecs: (3
> days)
>   -The Vorbis codec (2 days)
>   -The Theora codec (2 days)
>   -The FLAC codec (2 days)
> *Content filters for ATOM/RSS are implemented (5 days)
> *A content filter for MP3 is implemented (6 days)
> *A basic content filter for PDF is implemented (Remaining time)
>
> Biography
> I initially became interested in Freenet because I am something of a
> cypherpunk, in that I believe the ability to hold pseudonymous discourse to
> be a major cornerstone of free speech and the free flow of information. I've
> skulked around Freenet occasionally, even helping pre-alpha test version
> 0.7. But I'd like to do more. I want to put my time and energy where my
> mouth is and spend my summer making the world, in some small way, safer for
> freedom.
> Starry-eyed idealism aside, I am an 18-year-old American high school senior,
> who will be studying Computer Science after I graduate. While C/C++ is my
> 'first language', so to speak, I am also fluent in Java and Python. Last
> year, I personally rewrote my high school's web page in Python and Django.
> This year, I've been working on an editor for Model United Nations
> resolutions, as time permits.
> This project is licensed under the GPLv3, and is available on GitHub, at
> http://github.com/spencerjackson/resolute. It's written in C++, and uses
> GTKmm for the GUI.
>
> _______________________________________________
> Devl mailing list
> Devl at freenetproject.org
> http://osprey.vm.bytemark.co.uk/cgi-bin/mailman/listinfo/devl
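
For concreteness, the piped-stream arrangement described in the proposal (a decompressor and a filter running on separate threads) might look roughly like the sketch below. It uses only the standard java.io and java.util.zip classes. The class name PipedFilterSketch, the methods filter, decompressAndFilter and gzip, and the toy "drop control characters" rule are all made up for illustration; none of this is Freenet's actual Compressor or ContentFilter API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class PipedFilterSketch {

    /** Hypothetical stand-in for a content filter: drops ASCII control bytes except newline. */
    static void filter(InputStream in, OutputStream out) throws IOException {
        int b;
        while ((b = in.read()) != -1) {
            if (b >= 0x20 || b == '\n') {
                out.write(b);
            }
        }
    }

    /** Decompresses gzip data on a worker thread while the calling thread filters it. */
    static byte[] decompressAndFilter(byte[] gzipped) throws Exception {
        PipedInputStream pipeIn = new PipedInputStream();
        PipedOutputStream pipeOut = new PipedOutputStream(pipeIn);

        Thread decompressor = new Thread(() -> {
            // The pipe is listed first so it is always closed, even if the gzip
            // header is invalid; otherwise the reading side could block forever.
            try (OutputStream out = pipeOut;
                 InputStream gz = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = gz.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            } catch (IOException e) {
                // A real implementation would report this failure to the reader
                // instead of letting it see a silently truncated stream.
                e.printStackTrace();
            }
        });
        decompressor.start();

        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream in = pipeIn) {
            filter(in, result);
        }
        decompressor.join();
        return result.toByteArray();
    }

    /** Helper for the demo: gzip-compresses a byte array in memory. */
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] compressed = gzip("safe\u0007 text".getBytes(StandardCharsets.UTF_8));
        byte[] filtered = decompressAndFilter(compressed);
        System.out.println(new String(filtered, StandardCharsets.UTF_8));
    }
}
```

Only a pipe-sized buffer of decompressed data is held in memory at any time, which is the point of moving from Buckets to streams. The error handling here is deliberately thin and would need real thought in the node.
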
IMHO this looks good. My one concern is that your suggested timeline looks aggressive. It looks to me more like a timeline for writing the code, as opposed to a timeline for writing the code, documenting it, writing unit tests, and debugging it.

I know that writing copious documentation and unit tests as we go isn't how Freenet normally does things, but it would be nice to improve on that standard :) I think adding 1 day worth of documentation and unit tests after each of your listed steps would make a meaningful improvement to the resultant body of work.

Of course, others might disagree, and it's not a big concern. Like I said, this looks good.

Evan Daniel