Firstly, why do we need two index formats? I'm the first to admit that the current Librarian index format is limited - way too limited - but why do we need two? The main changes I would make to the Librarian format right now would be:
- Support splitting. (This is relevant to file indexes.)
- Include word indexes to allow for adjacent word searches. (This is relevant to file indexes too, because you may want to search for adjacent words in a title.)
- Maybe include some amount of metadata - functional (mime type), or theoretical (category, dublin core...), or other (activelinks?). (This is definitely relevant to file indexes.)
- Include the filename in the index. Possibly using negative word indexes to indicate "in the filename" words; it must be possible to distinguish between matches in the page title and matches in the content. (This is also relevant to both web page indexes and file indexes, though especially to the latter.)
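To make the word index and filename points above concrete, here is a rough sketch in Python. It is not the Librarian format or code, just an illustration under some assumptions: "word index" is taken to mean the position of a word within the content, filename words get negative positions (-1 for the first filename word, -2 for the second, and so on), and the class name, method names and the "CHK@a"/"CHK@b" keys are all made up for the example.

```python
import re
from collections import defaultdict


def tokenize_filename(filename):
    """Split a filename into lowercase words on any non-alphanumeric run."""
    return [w for w in re.split(r"[^a-z0-9]+", filename.lower()) if w]


class PositionalIndex:
    """Toy index (hypothetical layout): word -> {key: [positions]}.

    Positions >= 0 are word offsets into the content.
    Positions < 0 mark words that came from the filename.
    """

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, key, content, filename=""):
        for pos, word in enumerate(content.lower().split()):
            self.postings[word][key].append(pos)
        for pos, word in enumerate(tokenize_filename(filename)):
            self.postings[word][key].append(-(pos + 1))

    def _keys(self, word, keep):
        entries = self.postings.get(word.lower(), {})
        return sorted(k for k, ps in entries.items() if any(keep(p) for p in ps))

    def in_content(self, word):
        """Keys where the word occurs in the content."""
        return self._keys(word, lambda p: p >= 0)

    def in_filename(self, word):
        """Keys where the word occurs in the filename."""
        return self._keys(word, lambda p: p < 0)

    def adjacent(self, first, second):
        """Keys where `second` immediately follows `first` in the content."""
        hits = []
        for key, positions in self.postings.get(first.lower(), {}).items():
            following = set(self.postings.get(second.lower(), {}).get(key, []))
            # Only content positions count; the p >= 0 guard stops a
            # filename position of -1 from pairing with content position 0.
            if any(p >= 0 and p + 1 in following for p in positions):
                hits.append(key)
        return sorted(hits)


idx = PositionalIndex()
idx.add("CHK@a", "free network index format", filename="index_format.txt")
idx.add("CHK@b", "format of the index", filename="notes.txt")
print(idx.adjacent("index", "format"))
print(idx.in_filename("format"))
```

The sign of the position is enough to tell a filename match from a content match, so no extra field is needed for that; distinguishing title matches from content matches would still need something more, which this sketch does not attempt.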
I am quite happy to change the format. Indeed it needs significant changes. Indexes, like all files, are automatically compressed, so don't worry too much about it being overly verbose.

Now, you are proposing additional fields: the size of the content (this isn't especially relevant to web page indexes), and the length of the file if it is audio or video. Both are perfectly reasonable extensions IMHO. If we are going to support metadata we should support a range of metadata; we will need support for a category (probably tied to a specific site), at least, and this is a very woolly and arbitrary thing.

An explicit aim of your index format is to be able to index the contents of text-based files by words. This is a good thing, but if you are going to do that, then you should define a format (preferably with some of the details of splitting indexes worked out), and make Librarian and Spider use it. Metadata can be shown next to matches, or it can be used to narrow down searches.

And I honestly don't care whether it is XML. I see no reason to take strenuous efforts to keep back compatibility, but filters can be written easily enough if need be.

On Fri, Jun 02, 2006 at 01:15:25AM +0200, Jerome Flesch wrote:
> Hello,
>
> I designed an index format for the Fuqid replacement, and I wish to have your
> opinion:
>
> http://wiki.freenetproject.org/AnotherFreenetIndexFormat
>
> --
> Jerome Flesch.

--
Matthew J Toseland - toad at amphibian.dyndns.org
Freenet Project Official Codemonkey - http://freenetproject.org/
ICTHUS - Nothing is impossible. Our Boss says so.
