On Thu, Apr 8, 2010 at 11:33 AM, Evan Daniel <evanbd at gmail.com> wrote:
> On Thu, Apr 8, 2010 at 12:29 AM, Spencer Jackson
> <spencerandrewjackson at gmail.com> wrote:
> > Hi
> >
> > I've been hanging out on IRC as of late under the nick 'sajack'. I'm
> > going to be submitting an application to work with Freenet during
> > Google Summer of Code. Anyway, here's the proposal I'm going to be
> > uploading, if anyone has any thoughts. Thanks for looking at it.
> >
> >
> >
> > Proposal: Improve Implementation and Functionality of Content
> > Filtration and Add Support for Additional Formats
> > Proposer: Spencer Jackson (sajack)
> >
> > Introduction
> > Content is an important part of the Freenet experience. Good, plentiful
> > content attracts users, which attracts donations and creates more
> > nodes, both of which, directly or indirectly, improve the performance
> > and security of the network. As such, to make Freenet better, we must
> > make the process of getting information from the network to the user
> > quick, easy, and safe. I am proposing a series of changes to the
> > ContentFilter and adjacent systems so as to realize this. Below are the
> > general steps I will take.
> >
> >
> > Modify content filters to act as streams
> >
> > Presently, Freenet's data filters are passed Bucket objects containing
> > all of the data they need to process. This is suboptimal. Ideally, the
> > data filters should have a stream interface. This will reduce
> > duplication of data and increase performance by removing the need for
> > vast amounts of disk I/O, as less data will need to be cached on the
> > disk. This will be very easy to implement, as most of the filters
> > already deal with streams internally.
> >
> > Right now, filters are a part of FProxy, and are invoked by it when a
> > file is downloaded. Really, though, most clients probably want filtered
> > data, so filtration should be done earlier, with FProxy simply using
> > general functionality. I will therefore move filters into the client
> > layer and invoke them there.
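A minimal Java sketch of the stream interface described above (Java because that is what Freenet is written in). `StreamContentFilter` and `AsciiTextFilter` are hypothetical names chosen for illustration, not Freenet's actual ContentFilter API, and the toy filter simply drops any byte that is not printable ASCII, a tab, or a newline. The point is only the shape: the filter reads from an `InputStream` and writes to an `OutputStream` through fixed-size buffers, so it never needs the whole file in a `Bucket`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class StreamFilterSketch {

    /** Hypothetical stream-based filter interface; not Freenet's real API. */
    interface StreamContentFilter {
        /** Reads untrusted data from input and writes only the allowed data to output. */
        void filter(InputStream input, OutputStream output) throws IOException;
    }

    /**
     * Toy filter that keeps printable ASCII, tabs and newlines and drops every
     * other byte. It works through fixed 4 KiB buffers, so its memory use does
     * not grow with the size of the input and it caches nothing on disk.
     */
    static final class AsciiTextFilter implements StreamContentFilter {
        @Override
        public void filter(InputStream input, OutputStream output) throws IOException {
            byte[] in = new byte[4096];
            byte[] out = new byte[4096];
            int n;
            while ((n = input.read(in)) != -1) {
                int kept = 0;
                for (int i = 0; i < n; i++) {
                    int b = in[i] & 0xFF;
                    if ((b >= 0x20 && b < 0x7F) || b == '\n' || b == '\t') {
                        out[kept++] = (byte) b;
                    }
                }
                output.write(out, 0, kept);
            }
            output.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "Hello\u0000 world\n".getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        new AsciiTextFilter().filter(new ByteArrayInputStream(data), sink);
        System.out.print(sink.toString("US-ASCII")); // prints "Hello world" and a newline
    }
}
```

Because such a filter only ever sees streams, the same object could in principle be fed by FProxy, by a download going to the hard drive, or by a decompressor's output, which is what would make moving filtering into the client layer straightforward.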
> >
> > Now, while I'm here, it would be useful to add some new functionality.
> > First off, I'll add the ability to filter files being saved to the hard
> > drive. Right now, this doesn't happen, and it's something of a weak
> > spot in our armor. Later on especially, when there are Ogg filters,
> > users may be downloading large files directly to their hard drive. We
> > will want them to be filtered.
> >
> > Another thing while I'm working with filters in the client layer: I
> > will implement filtration of inserts. This will help prevent metadata
> > in a file uploaded by the user from breaking his or her anonymity. For
> > example, EXIF data in JPEGs may reveal the serial number of the camera
> > that took the picture, or even the GPS coordinates of where the picture
> > was taken.
> >
> > Of course, there are some use cases, such as debugging, where it may be
> > undesirable for a request to be filtered. It must therefore be possible
> > to disable filtering. To accomplish this, I will prevent the data from
> > being filtered when a configuration setting in the request's context
> > has been set. Support for disabling filters will need to be added to
> > FCP. All of this will then need to be supported in the web interface.
> > I will add support for complementary GET and POST variables in FProxy
> > which would be used to trigger this setting. Next, I'll add UI elements
> > to the download and insert queue pages and any other pertinent
> > locations, such as the 'Downloading a page' page, which would enable
> > these variables. These elements should only be visible when the user is
> > in 'Advanced mode,' and, even then, should be tagged with a Big Fat
> > Warning about the risks of turning off filtering.
> >
> > Another feature I will implement is the ability to run data through a
> > filter without placing it on the network.
> > This would be useful for debugging
> > content filters, and for freesite writers who want to see what their
> > site will look like after it's been parsed. This should be pretty easy
> > to implement. I'll create an FCP message which will take data, filter
> > it, and return it. I will also create a way to do this through FProxy,
> > by uploading a file and receiving the filtered version.
> >
> > The next thing I will implement is stream-friendly compressors.
> > Essentially, we should be able to have a filter and a decompressor
> > running on separate threads, and have data be passed between them
> > transparently using piped streams.
> >
> >
> > Implementation of Ogg container format
> >
> > Right now, Freenet has filters for HTML and some forms of image files.
> > More filters means more types of content which may be safely viewed by
> > the user. This will allow the network to be used in ways which are
> > currently not safe. After I have implemented the new stream-based
> > content filters, I shall implement more of them.
> >
> > The first type of filter which I will implement is for the Ogg
> > container format. This is technically interesting, as it encapsulates
> > other types of data. A generic Ogg parser will be written, which will
> > need to validate the Ogg container, identify the bitstreams it
> > contains, identify the codec used inside these bitstreams, and process
> > the streams using a second (or nth, really, depending on how many
> > bitstreams are in the container) codec-specific filter. It should be
> > possible to use this filter to filter either just the beginning of the
> > file or the whole thing. This will make it possible to preview a
> > partially downloaded file at some point in the future. One thing which
> > will need to be taken into consideration is the possibility of Ogg
> > pages being concealed inside of other Ogg pages. This will be checked
> > for, and a fatal error will be raised if it occurs.
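As a rough illustration of the first job of the generic Ogg parser, validating the container page by page, here is a Java sketch that reads and checks one Ogg page at a time from a stream. The page layout it assumes (the "OggS" capture pattern, version 0, a 27-byte fixed header, a segment table of lacing values, and a CRC-32 with polynomial 0x04C11DB7 computed with the checksum field zeroed) follows RFC 3533; the class, field and method names are hypothetical. Codec identification, dispatch to codec-specific filters, and the concealed-page check mentioned above are not shown.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class OggPageSketch {

    /** One parsed Ogg page; field names are illustrative. */
    static final class OggPage {
        final int headerType;       // 0x01 continued packet, 0x02 first page, 0x04 last page
        final long granulePosition; // codec-defined position, little-endian on disk
        final int serialNumber;     // identifies the logical bitstream
        final int sequenceNumber;   // page counter within that bitstream
        final byte[] body;

        OggPage(int headerType, long granulePosition, int serialNumber, int sequenceNumber, byte[] body) {
            this.headerType = headerType;
            this.granulePosition = granulePosition;
            this.serialNumber = serialNumber;
            this.sequenceNumber = sequenceNumber;
            this.body = body;
        }
    }

    /** Ogg's CRC-32: polynomial 0x04C11DB7, MSB first, initial value 0, no final XOR. */
    static int crc(byte[] data, int crc) {
        for (byte d : data) {
            crc ^= (d & 0xFF) << 24;
            for (int bit = 0; bit < 8; bit++) {
                crc = (crc & 0x80000000) != 0 ? (crc << 1) ^ 0x04C11DB7 : crc << 1;
            }
        }
        return crc;
    }

    static int readIntLE(byte[] b, int off) {
        return (b[off] & 0xFF) | (b[off + 1] & 0xFF) << 8
                | (b[off + 2] & 0xFF) << 16 | (b[off + 3] & 0xFF) << 24;
    }

    /**
     * Reads and validates one page. Returns null at a clean end of stream;
     * throws IOException (EOFException if truncated) for malformed input.
     */
    static OggPage readPage(InputStream input) throws IOException {
        DataInputStream in = new DataInputStream(input);
        int first = in.read();
        if (first == -1) {
            return null;
        }
        byte[] header = new byte[27];
        header[0] = (byte) first;
        in.readFully(header, 1, 26);
        if (header[0] != 'O' || header[1] != 'g' || header[2] != 'g' || header[3] != 'S') {
            throw new IOException("missing OggS capture pattern");
        }
        if (header[4] != 0) {
            throw new IOException("unsupported stream structure version " + header[4]);
        }
        byte[] lacing = new byte[header[26] & 0xFF];
        in.readFully(lacing);
        int bodyLength = 0;
        for (byte l : lacing) {
            bodyLength += l & 0xFF;
        }
        byte[] body = new byte[bodyLength];
        in.readFully(body);

        // The checksum covers the whole page with its own four bytes set to zero.
        int storedCrc = readIntLE(header, 22);
        for (int i = 22; i < 26; i++) {
            header[i] = 0;
        }
        if (crc(body, crc(lacing, crc(header, 0))) != storedCrc) {
            throw new IOException("page CRC mismatch");
        }
        long granule = 0;
        for (int i = 13; i >= 6; i--) {
            granule = (granule << 8) | (header[i] & 0xFF);
        }
        return new OggPage(header[5] & 0xFF, granule, readIntLE(header, 14), readIntLE(header, 18), body);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java OggPageSketch file.ogg");
            return;
        }
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            OggPage page;
            while ((page = readPage(in)) != null) {
                System.out.printf("serial=%08x seq=%d granule=%d bytes=%d%n",
                        page.serialNumber, page.sequenceNumber, page.granulePosition, page.body.length);
            }
        }
    }
}
```

Because pages are pulled one at a time from an `InputStream`, a parser built this way could stop after the first few pages, which is what filtering only the beginning of a file would need. The `serialNumber` field is what would let it group pages into bitstreams before handing each one to a codec-specific filter.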
> >
> > The Ogg codecs which I will initially add support for are, in order,
> > Vorbis, Theora, and FLAC.
> >
> >
> > More content filters
> >
> > The more filters the better. In the time remaining, I will implement as
> > many different content filters as possible. While this step is very
> > important, these codecs individually are of a lower priority than the
> > previous steps. I will implement ATOM/RSS, MP3, and the rudiments of
> > PDF.
> >
> >
> >
> > Milestones
> > Here are clear milestones which may be used to evaluate my performance.
> > The following is a list of goals which should be met to signify
> > completion, along with very rough estimates of how long each step
> > should take:
> >
> > *Stream-based filters (3 days)
> > *Filters are moved to the client layer, with (disableable) support for
> > filtering files going to the hard drive, and inserts (9 days)
> > *Filters can be tested on data without inserting it into the network
> > (3 days)
> > *Compressors can be interacted with through streams (4 days)
> > *An Ogg content filter is implemented, supporting the following codecs:
> > (3 days)
> > -The Vorbis codec (2 days)
> > -The Theora codec (2 days)
> > -The FLAC codec (2 days)
> > *Content filters for ATOM/RSS are implemented (5 days)
> > *A content filter for MP3 is implemented (6 days)
> > *A basic content filter for PDF is implemented (remaining time)
> >
> >
> >
> > Biography
> > I initially became interested in Freenet because I am something of a
> > cypherpunk, in that I believe the ability to hold pseudonymous
> > discourse to be a major cornerstone of free speech and the free flow of
> > information. I've skulked around Freenet occasionally, even helping
> > pre-alpha test version 0.7. But I'd like to do more. I want to put my
> > time and energy where my mouth is and spend my summer making the world,
> > in some small way, safer for freedom.
> >
> > Starry-eyed idealism aside, I am an 18-year-old American high school
> > senior who will be studying Computer Science after I graduate. While
> > C/C++ is my 'first language', so to speak, I am also fluent in Java and
> > Python. Last year, I personally rewrote my high school's web page in
> > Python and Django. This year, I've been working on an editor for Model
> > United Nations resolutions, as time permits. This project is licensed
> > under the GPLv3 and is available on GitHub, at
> > http://github.com/spencerjackson/resolute. It's written in C++ and uses
> > GTKmm for the GUI.
> >
> >
> > _______________________________________________
> > Devl mailing list
> > Devl at freenetproject.org
> > http://osprey.vm.bytemark.co.uk/cgi-bin/mailman/listinfo/devl
> >
>
> IMHO this looks good.
>
> My one concern is that your suggested timeline looks aggressive. It
> looks to me more like a timeline for writing the code, as opposed to a
> timeline for writing the code, documenting it, writing unit tests, and
> debugging it. I know that writing copious documentation and unit
> tests as we go isn't how Freenet normally does things, but it would be
> nice to improve on that standard :) I think adding 1 day's worth of
> documentation and unit tests after each of your listed steps would
> make a meaningful improvement to the resultant body of work. Of
> course, others might disagree, and it's not a big concern. Like I
> said, this looks good.
>
> Evan Daniel
>

Okay. I removed the syndication feeds, and used that time to add another
day to the other steps. So yeah, that's all submitted to GSoC. I have the
second proposal done, which is very similar, but with more multimedia
stuff, and am just about to click the submit button. So I'll copy that
into another post.
