On Friday 02 April 2010 13:41:12 Ximin Luo wrote:
> > On Saturday 06 February 2010 16:39:53 Karol Pysniak wrote:
> >> Hi, I would like to take part in the development of the Freenet Project as
> >> my contribution to Google Summer of Code. I am especially interested in
> >> databases, algorithms and AI. I think that it would be great to try to
> >> create a new, 'intelligent' search engine.
> 
> what do you mean by "intelligent" search engine?

Google is now getting into social search. The Web of Trust (WoT) work might
deliver something similar...
> 
> there is another student, lusha on IRC, lushawang at gmail.com, interested in
> doing a project on search. you two should talk to each other. if we take you
> both, then you will need to be doing two distinct things - google doesn't
> allow students to collaborate on a single proposal, but two related (but
> distinct) proposals are fine.

We did this last year and it worked fine.
> 
> On 04/02/2010 01:11 PM, Matthew Toseland wrote:
> > - Unfortunately there are severe scaling problems with this format: The
> > sub-indexes can get huge, and inserting them all as a single freesite also
> > causes problems.
> > - Plus, the spider's architecture, involving a database of terms and URIs,
> > also doesn't scale. It takes a week or more to write the index from the
> > database. It would be better to maintain the database on the fly, rewriting
> > affected parts every few hours as new keys are spidered.

s/database/index/. Maintain the index on the fly. The database is then just a
list of URLs, marking which ones we have spidered and which we have not yet.
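
A rough sketch of what I mean, in Python for brevity (the real code is Java).
All class and method names here are made up for illustration and are not taken
from the spider or from Library. The "database" is only the pending/spidered
URI lists; the index is updated as each page arrives, and flush() reports which
terms changed, so only those parts would need rewriting:

```python
from collections import defaultdict


class IncrementalIndex:
    """Hypothetical sketch: term -> URIs index maintained on the fly."""

    def __init__(self):
        self.pending = []                 # URIs known but not yet spidered
        self.spidered = set()             # URIs already spidered
        self.postings = defaultdict(set)  # term -> set of URIs containing it
        self.dirty = set()                # terms changed since the last flush

    def queue(self, uri):
        # Remember a newly discovered URI unless we already know about it.
        if uri not in self.spidered and uri not in self.pending:
            self.pending.append(uri)

    def add_page(self, uri, terms):
        # Called when a page has been fetched: update the index immediately.
        if uri in self.pending:
            self.pending.remove(uri)
        self.spidered.add(uri)
        for term in terms:
            if uri not in self.postings[term]:
                self.postings[term].add(uri)
                self.dirty.add(term)

    def flush(self):
        # Return the terms whose entries must be rewritten, then reset.
        changed = sorted(self.dirty)
        self.dirty.clear()
        return changed
```

Calling flush() every few hours would then give the set of index parts to
re-upload, instead of regenerating everything from a terms database.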
> 
> the work to do in this area is described in some detail at
> http://new-wiki.freenetproject.org/Talk:Library
> 
> you'll need to understand how SkeletonBTreeMap.update() works first; if the
> source code is too unintuitive (async so components are scattered), then ask
> me for an explanation.
> 
> > - We have a new format, created by infinity0, a Summer of Code student last
> > year, which should scale much better. This is based on b-trees, and
> > therefore data can be loaded into it progressively and just the changed
> > nodes are re-uploaded. However, currently the spider uses the old format. So
> > the first task would be to make the spider use the new format and load data
> > into it progressively (on the fly). Also, when the new format is on Freenet,
> > it is forkable - meaning that not only can the original author of the index
> > add data and only insert those nodes affected (including their parents), but
> > anyone else can also add their own changes - which do not affect the
> > original btree - and reuse the existing ones. In other words, it is a "copy
> > on write btree", although I don't think it has all the tweaks that the COW
> > btrees paper talks about. This may have many wider applications, e.g.
> > merging others' indexes, and maybe eventually distributing the spidering
> > process.
> 
> there is a COW btree paper? do you have a link?

http://en.wikipedia.org/wiki/BTRFS#History
The core data structure of Btrfs, the copy-on-write B-tree, was originally
proposed by IBM researcher Ohad Rodeh at a presentation at USENIX 2007. Rodeh
suggested adding reference counts and certain relaxations to the balancing
algorithms of standard B-trees that would make them suitable for a
high-performance object store with copy-on-write snapshots, yet maintain good
concurrency.
https://www.usenix.org/events/lsf07/tech/rodeh.pdf

(Did I mention that COW btrees are incredibly awesome? ;) )
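
To illustrate the copy-on-write idea behind "forkable" indexes, here is a
minimal sketch in Python. It uses a plain binary search tree rather than a
b-tree to keep it short, and it is not how Library or Btrfs implement it; the
names are invented. An insert copies only the nodes on the path from the root
to the change, and every untouched subtree is shared with the original:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Immutable tree node; a 'changed' node is always a fresh copy."""
    key: str
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node, key, value):
    # Returns the root of a new tree; the tree rooted at `node` is untouched.
    if node is None:
        return Node(key, value)
    if key < node.key:
        # Copy this node, rebuild the left path, share the right subtree.
        return Node(node.key, node.value, insert(node.left, key, value), node.right)
    if key > node.key:
        # Copy this node, share the left subtree, rebuild the right path.
        return Node(node.key, node.value, node.left, insert(node.right, key, value))
    # Same key: copy the node with the new value, share both subtrees.
    return Node(key, value, node.left, node.right)


def lookup(node, key):
    # Ordinary binary search; works on the original and on any fork.
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None
```

Someone forking an index would call insert() on the author's root and publish
only the new nodes; the original root still describes the original index.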
> 
> > - The other half of infinity0's Summer of Code project was to be
> > distributed search. This would allow each user to publish indexes and to
> > link to other users' indexes; it is described on the wiki.
> 
> i'm working on this atm as part of my uni course; i've coded a prototype and
> atm i'm collecting data to test it with (hoop-jumping for dissertation :/). my
> deadline is mid-may so if you/lusha want to pick it up afterwards and work on
> it for GSoC then i'm happy to explain how it works.
> 
> however there is plenty of stuff to work on in Library already, IMO it would
> be better to get the basics working before trying to graft a more complex
> system onto it.
> 
> > - Most likely infinity0 would be your mentor.
> 
> that's me, btw :) i'm on irc under that nick.
> 
> > You should read:
> > http://new-wiki.freenetproject.org/Library
> > http://new-wiki.freenetproject.org/B-tree_indexes
> > http://new-wiki.freenetproject.org/Web_of_Trust
> > And of course some of the papers/videos/introductory stuff on our web page.
> 
> source code is at
> http://github.com/infinity0/plugin-Library-staging
> 
> there are some notes in ./doc/ and ./TODO - you can pick things out of that.
> if anything is unclear, ask me.