On Sun, Jun 04, 2006 at 07:01:09PM +0200, Jerome Flesch wrote:
> On Saturday 3 June 2006 03:16, Matthew Toseland wrote:
> > On Sat, Jun 03, 2006 at 03:00:49AM +0200, Jerome Flesch wrote:
> > > > > > The main changes I would make to the librarian
> > > > > > format right now would be:
> > > > > > - Support splitting. (This is relevant to file indexes)
> > >
> > > I updated my format proposal on
> > > http://wiki.freenetproject.org/AnotherFreenetIndexFormat to try to fit
> > > your requirements, but I still need some explanations on this point:
> > > I don't really understand why indexes need to handle file splitting.
> > > Don't the FCPv2 specs say that the node does most of this work?
> >
> > Splitting of the index itself. Because we will want to fetch only the
> > relevant pieces if it gets big. If we have a lot of freesites, we will
> > need to split the index up - perhaps by the first letter or two - in
> > order to avoid having to fetch very large files regularly. Users are
> > used to having to wait for search results with p2p, so this isn't
> > necessarily a big problem. The search engine would fetch only those
> > index parts needed for the particular search. Some letters would likely
> > have fewer words under them, in which case they could be aggregated.
>
> I added a sub-indexes mechanism, assuming splitting is done on the first
> letters of words.

That is what we want, yes. We might even let the number of letters vary
within a given index, since some prefixes have only a few words under them...

> > > > > > - Maybe include some amount of metadata - functional (mime type),
> > > > > > or theoretical (category, dublin core...), or other (activelinks?).
> > > > > > (This is definitely relevant to file indexes).
> > >
> > > Regarding "activelinks", what do you mean exactly?
> >
> > 95x32 icons for freesites.
>
> I added an option for that, but I'm not sure that was exactly what you meant.

They were very popular on 0.4/0.5.

> > > > > > - Include the filename in the index. Possibly using negative word
> > > > > > indexes to indicate "in the filename" words; it must be possible
> > > > > > to distinguish between matches in the page title and matches in the
> > > > > > content. (This is also relevant to both web page indexes and file
> > > > > > indexes, though especially to the latter).
> > >
> > > By filename, did you mean document titles?
> >
> > No, I mean the filename - the URI. Which is what you will mostly be
> > searching on for searches for non-text files.
>
> Hm, wouldn't it be more relevant to make an exception, and use titles at
> least for HTML documents?

Maybe both? For HTML the title is far more relevant...

-- 
Matthew J Toseland - toad at amphibian.dyndns.org
Freenet Project Official Codemonkey - http://freenetproject.org/
ICTHUS - Nothing is impossible. Our Boss says so.
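The index splitting discussed in this thread (sub-indexes keyed on the first letter or two of each word, with the number of letters allowed to vary within one index and sparse prefixes kept together) might look roughly like the sketch below. It is a hypothetical illustration, not the AnotherFreenetIndexFormat proposal itself: the function names `build_subindexes` and `find_subindex` and the `max_words` threshold are assumptions, and words are assumed to be normalized (e.g. lowercased) beforehand.

```python
from collections import defaultdict


def build_subindexes(words, max_words):
    """Hypothetical sketch: map each prefix key to the sorted words filed under it.

    A bucket is split by extending its prefix one letter at a time only while
    it holds more than max_words words, so prefix lengths vary across the
    index and sparse prefixes stay aggregated under a shorter key.
    Assumes words are already normalized (e.g. lowercased).
    """
    result = {}

    def split(bucket, depth):
        # Group the bucket's words by their first `depth` letters; a word
        # shorter than `depth` is keyed by the whole word.
        groups = defaultdict(list)
        for w in bucket:
            groups[w[:depth]].append(w)
        for prefix, group in groups.items():
            # Stop when the group is small enough, or when no word is longer
            # than the current depth (a longer prefix could not separate them).
            if len(group) <= max_words or all(len(w) <= depth for w in group):
                result[prefix] = sorted(group)
            else:
                split(group, depth + 1)

    # Deduplicate, drop empty strings, and start with one-letter prefixes.
    split(sorted({w for w in words if w}), 1)
    return result


def find_subindex(subindexes, word):
    """Return the key of the sub-index a search for `word` would fetch, or None.

    Uses the longest existing key that is a prefix of the word.
    """
    for n in range(len(word), 0, -1):
        if word[:n] in subindexes:
            return word[:n]
    return None


if __name__ == "__main__":
    parts = build_subindexes(["apple", "apply", "ant", "banana", "band", "cat"], 2)
    print(parts)
    print(find_subindex(parts, "apply"))
```

A search would then fetch only the one sub-index named by `find_subindex`. In this sketch `max_words` trades the number of sub-indexes against their size; a real format would more likely bound the serialized size of each part than its word count.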
