On Saturday 06 February 2010 16:39:53 Karol Pysniak wrote:
> Hi,
> I would like to take part in the development of the Freenet Project as my
> contribution to Google Summer of Code. I am especially interested in
> databases, algorithms and AI. I think it would be great to try to
> create a new, 'intelligent' search engine.
>
> I can program in Java/C++/C/Haskell/Assembler (IA-32).
Great. You will need to apply via the Summer of Code web interface. If you want to send us your proposal so we can look at it and suggest improvements, please do so here. Before we accept you, we will need you to demonstrate some basic coding ability by making a small change (bugfix or feature) to Freenet; see the bug tracker at https://bugs.freenetproject.org/ for ideas. You should apply for at least two tasks within Freenet, so that we are able to choose both students and projects, and don't have to drop a good student because we already have somebody else for that project.

Please read up on Freenet first or this may not make much sense. The most fundamental thing is that Freenet only provides "insert" (publish data to a key) and "request" (fetch a key) operations; everything else, including searching, is built on top of them. There have been proposals for distributed searching, but doing it in a secure and spam-proof way is remarkably difficult, so for now we will probably continue to build search on top of inserts and requests.
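To make that concrete, here is a minimal Java sketch of the two primitives. The KeyValueNetwork interface and FakeNetwork class are hypothetical, purely for illustration; the real node exposes insert and request through FCP and the plugin API, with key types, priorities and callbacks that this sketch ignores:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    /** Hypothetical model of Freenet's two primitives; not the real API. */
    interface KeyValueNetwork {
        /** Publish data under a key (a CHK, SSK or USK in real Freenet). */
        void insert(String key, byte[] data);
        /** Fetch the data published under a key, or null if nothing is found. */
        byte[] request(String key);
    }

    /** In-memory stand-in so the interface can be exercised locally. */
    class FakeNetwork implements KeyValueNetwork {
        private final Map<String, byte[]> store = new ConcurrentHashMap<>();
        public void insert(String key, byte[] data) { store.put(key, data); }
        public byte[] request(String key) { return store.get(key); }
    }

    public class TwoPrimitives {
        public static void main(String[] args) {
            KeyValueNetwork net = new FakeNetwork();
            net.insert("KSK@hello", "hello world".getBytes());
            System.out.println(new String(net.request("KSK@hello")));
        }
    }

Everything discussed below - freesites, the spider, the search indexes - ultimately reduces to sequences of these two operations.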
As regards searching, here is the current situation:

- A spider (the XMLSpider plugin) crawls Freenet freesites and generates an index, which is inserted into Freenet in the same way as a freesite would be.
- Another plugin downloads the relevant parts of the index when you do a search, combines them, and displays the matching URIs.
- The old XMLLibrarian plugin implemented a simple XML-based search index format. One file contains the list of sub-indexes (split by the MD5 hash of the word being looked up), and then there is one file for each sub-index. Within each sub-index, we have a list of URIs, a list of words, and which URIs each word is contained in. (There is a sketch of the sub-index lookup after this list.) For an example of this format, please have a look here (install Freenet first):
  USK@5hH~39FtjA7A9~VXWtBKI~prUDTuJZURudDG0xFn3KA,GDgRGt5f6xqbmo-WraQtU54x4H~871Sho9Hz6hC-0RA,AQACAAE/Search/24/index.xml
  USK@5hH~39FtjA7A9~VXWtBKI~prUDTuJZURudDG0xFn3KA,GDgRGt5f6xqbmo-WraQtU54x4H~871Sho9Hz6hC-0RA,AQACAAE/Search/24/index_00.xml
- This format is still supported by the new Library plugin, which replaced XMLLibrarian as the frontend some time ago.
- Unfortunately there are severe scaling problems with this format: the sub-indexes can get huge, and inserting them all as a single freesite also causes problems.
- The spider's architecture, involving a database of terms and URIs, doesn't scale either: it takes a week or more to write the index out from the database. It would be better to maintain the index on the fly, rewriting affected parts every few hours as new keys are spidered.
- We have a new format, created by infinity0, a Summer of Code student last year, which should scale much better. It is based on b-trees, so data can be loaded into it progressively and just the changed nodes are re-uploaded. However, the spider currently uses the old format, so the first task would be to make the spider use the new format and load data into it progressively (on the fly). Also, once the new format is on Freenet it is forkable: not only can the original author of the index add data and re-insert only the affected nodes (including their parents), but anyone else can add their own changes, which do not affect the original b-tree, while reusing the existing nodes. In other words, it is a "copy-on-write b-tree", although I don't think it has all the tweaks that the COW b-trees paper describes. (There is a sketch of the path-copying idea after this list.) This may have many wider applications, e.g. merging others' indexes, and maybe eventually distributing the spidering process.
- The other half of infinity0's Summer of Code project was to be distributed search. This would allow each user to publish indexes and to link to other users' indexes; it is described on the wiki.
- Most likely infinity0 would be your mentor.
- There is limited support for basic page ranking based on word frequencies. I don't think the current search indexes support the metadata needed, but the spider should, if I remember correctly. (There is a sketch of this kind of ranking after this list.)
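As promised above, here is a sketch of how a search term is routed to its sub-index in the old XML format. The two-character hex prefix (giving names like index_00.xml) is an assumption for illustration; the real index.xml lists the actual sub-indexes:

    import java.math.BigInteger;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    public class SubIndexLookup {
        /** Hash a search term with MD5 and map it to a sub-index file name. */
        static String subIndexFor(String word, int prefixLen) throws Exception {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(word.toLowerCase().getBytes(StandardCharsets.UTF_8));
            String hex = String.format("%032x", new BigInteger(1, digest));
            return "index_" + hex.substring(0, prefixLen) + ".xml";
        }

        public static void main(String[] args) throws Exception {
            // Fetch index.xml first, then only the one sub-index file we need.
            System.out.println(subIndexFor("freenet", 2));
        }
    }

The point of the split is that a client never downloads the whole index, only index.xml plus the one sub-index its search term hashes to - which is also why oversized sub-indexes hurt so much.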
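Here is the promised sketch of the copy-on-write idea behind the new format. I use a binary tree for brevity where the real format uses b-tree nodes with many keys each, but the principle is identical: an insert copies only the root-to-leaf path, so only those few nodes need to be re-inserted into Freenet, while untouched subtrees keep their existing CHKs and can be shared between forks:

    /** Immutable tree with path-copying inserts (simplified stand-in for the b-tree). */
    final class CowTree {
        final String key;
        final CowTree left, right;

        CowTree(String key, CowTree left, CowTree right) {
            this.key = key; this.left = left; this.right = right;
        }

        /** Returns a new root; the original tree is untouched and still valid. */
        static CowTree insert(CowTree node, String key) {
            if (node == null) return new CowTree(key, null, null);
            int cmp = key.compareTo(node.key);
            if (cmp < 0) return new CowTree(node.key, insert(node.left, key), node.right);
            if (cmp > 0) return new CowTree(node.key, node.left, insert(node.right, key));
            return node; // already present: share the whole subtree
        }
    }

    public class CowDemo {
        public static void main(String[] args) {
            CowTree v1 = null;
            for (String w : new String[] { "m", "f", "t" }) v1 = CowTree.insert(v1, w);
            CowTree v2 = CowTree.insert(v1, "a"); // a fork adding one word
            System.out.println(v1.right == v2.right); // true: subtree shared, no re-insert
        }
    }

This is also why progressive (on-the-fly) loading is cheap: each batch of newly spidered terms touches a few paths, not the whole index.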
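And a sketch of ranking by word frequency. The helper names are hypothetical; the point is only that the spider would have to record per-URI term counts as metadata, so that the search plugin could sort matching URIs by a score like this:

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.regex.Pattern;

    public class FrequencyRank {
        private static final Pattern NON_WORD = Pattern.compile("\\W+");

        /** Count how often each word occurs in a page's text (done while spidering). */
        static Map<String, Integer> termFrequencies(String text) {
            Map<String, Integer> tf = new HashMap<>();
            for (String w : NON_WORD.split(text.toLowerCase()))
                if (!w.isEmpty()) tf.merge(w, 1, Integer::sum);
            return tf;
        }

        /** Score a page for a query by summing its query-term frequencies. */
        static int score(Map<String, Integer> tf, List<String> query) {
            int s = 0;
            for (String q : query) s += tf.getOrDefault(q.toLowerCase(), 0);
            return s;
        }

        public static void main(String[] args) {
            Map<String, Integer> page = termFrequencies("Freenet is a free network. Freenet stores data.");
            System.out.println(score(page, Arrays.asList("freenet", "data"))); // 3
        }
    }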
You should read:

http://new-wiki.freenetproject.org/Library
http://new-wiki.freenetproject.org/B-tree_indexes
http://new-wiki.freenetproject.org/Web_of_Trust

And of course some of the papers/videos/introductory stuff on our web page.