> On Saturday 06 February 2010 16:39:53 Karol Pysniak wrote:
>> Hi, I would like to take part in the development of Free Net Project as my
>>  contribution to Google Summer of Code. I am especially interested in Data
>> Bases, Algorithms and AI. I think that it would be great to try to create
>> new, 'intelligent' search engine.

what do you mean by "intelligent" search engine?

there is another student, lusha on IRC, lushawang at gmail.com, interested in
doing a project on search. you two should talk to each other. if we take you
both, then you will need to be doing two distinct things - google doesn't allow
students to collaborate on a single proposal, but two related (but distinct)
proposals are fine.

On 04/02/2010 01:11 PM, Matthew Toseland wrote:
> - Unfortunately there are severe scaling problems with this format: The
> sub-indexes can get huge, and inserting them all as a single freesite also
> causes problems.
> - Plus, the spider's architecture, involving a database of terms and URIs,
> also doesn't scale. It takes a week or more to write the index from the
> database. It would be better to maintain the database on the fly, rewriting
> affected parts every few hours as new keys are spidered.

the work to do in this area is described in some detail at
http://new-wiki.freenetproject.org/Talk:Library

you'll need to understand how SkeletonBTreeMap.update() works first. the code
is async, so the logic is scattered across several components; if it's too
unintuitive to follow from the source, then ask me for an explanation.

> - We have a new format, created by infinity0, a Summer of Code student last
> year, which should scale much better. This is based on b-trees, and
> therefore data can be loaded into it progressively and just the changed
> nodes are re-uploaded. However, currently the spider uses the old format. So
> the first task would be to make the spider use the new format and load data
> into it progressively (on the fly). Also, when the new format is on Freenet,
> it is forkable - meaning that not only can the original author of the index
> add data and only insert those nodes affected (including their parents), but
> anyone else can also add their own changes - which do not affect the
> original btree - and reuse the existing ones. In other words, it is a "copy
> on write btree", although I don't think it has all the tweaks that the COW
> btrees paper talks about. This may have many wider applications, e.g.
> merging others' indexes, and maybe eventually distributing the spidering
> process.

there is a COW btree paper? do you have a link?
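
to make the "only the changed nodes (and their parents) get re-inserted" idea
concrete, here's a minimal sketch of copy-on-write by path copying. note this
is NOT the Library code: it uses a plain binary search tree instead of a real
b-tree to keep it short, and CowTreeSketch, Node, put and contains are all
made-up names for illustration. the `created` list stands in for "nodes that
would have to be inserted into Freenet".

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: hypothetical names, not the Library plugin's classes.
final class CowTreeSketch {

    /** Immutable node; once published it is never modified. */
    static final class Node {
        final String key;
        final Node left;
        final Node right;

        Node(String key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    /**
     * Returns the root of a new version of the tree that contains key.
     * Only nodes on the path from the root to the change are copied;
     * every other subtree is shared with the old version.
     * Each newly created node is appended to created.
     */
    static Node put(Node root, String key, List<Node> created) {
        if (root == null) {
            Node leaf = new Node(key, null, null);
            created.add(leaf);
            return leaf;
        }
        int c = key.compareTo(root.key);
        if (c == 0) {
            return root; // key already present: nothing changes
        }
        Node left = root.left;
        Node right = root.right;
        if (c < 0) {
            left = put(root.left, key, created);
        } else {
            right = put(root.right, key, created);
        }
        if (left == root.left && right == root.right) {
            return root; // child unchanged, so this node is reused as-is
        }
        Node copy = new Node(root.key, left, right);
        created.add(copy);
        return copy;
    }

    /** Plain lookup; works on any version of the tree. */
    static boolean contains(Node root, String key) {
        while (root != null) {
            int c = key.compareTo(root.key);
            if (c == 0) {
                return true;
            }
            root = (c < 0) ? root.left : root.right;
        }
        return false;
    }

    public static void main(String[] args) {
        // The "original author's" index.
        Node original = null;
        for (String k : new String[] {"m", "f", "t", "b"}) {
            original = put(original, k, new ArrayList<>());
        }
        // Someone else forks it by adding one key.
        List<Node> created = new ArrayList<>();
        Node fork = put(original, "x", created);
        System.out.println("nodes to insert for the fork: " + created.size());
        System.out.println("left subtree shared: " + (fork.left == original.left));
        System.out.println("original contains x: " + contains(original, "x"));
        System.out.println("fork contains x: " + contains(fork, "x"));
    }
}
```

in this example, adding "x" to the fork creates three nodes (the new leaf plus
copies of "t" and "m" on the path to it); the "f"/"b" subtree is reused
untouched and the original tree still doesn't contain "x".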

> - The other half of infinity0's Summer of Code project was to be
> distributed search. This would allow each user to publish indexes and to
> link to other users' indexes; it is described on the wiki.

i'm working on this atm as part of my uni course; i've coded a prototype and
atm i'm collecting data to test it with (hoop-jumping for dissertation :/). my
deadline is mid-may so if you/lusha want to pick it up afterwards and work on
it for GSoC then i'm happy to explain how it works.

however there is already plenty of stuff to work on in Library; IMO it would be
better to get the basics working before trying to graft a more complex system
onto it.

> - Most likely infinity0 would be your mentor.

that's me, btw :) i'm on irc under that nick.

> You should read:
> http://new-wiki.freenetproject.org/Library
> http://new-wiki.freenetproject.org/B-tree_indexes
> http://new-wiki.freenetproject.org/Web_of_Trust
> And of course some of the papers/videos/introductory stuff on our web page.

source code is at
http://github.com/infinity0/plugin-Library-staging

there are some notes in ./doc/ and ./TODO - you can pick things out of that. if
anything is unclear, ask me.
