> I recall Frost having a nasty bug that caused it to crash whenever it
> encountered a message malformed in a special way due to the parser not
> handling error cases correctly. Using XML allows one to use existing XML
> libraries for parsing instead of having to write a new parser, making it
> much less likely that such unpleasantness occurs again.
>

Er ... I haven't looked at the Frost source code yet, but Frost messages
already seem to be in XML, no?
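A minimal sketch of the quoted point about reusing an existing XML library:
it uses Python's standard xml.etree.ElementTree module and treats a malformed
message as an ordinary, recoverable error. The helper name parse_message and
the <message>/<subject> layout are assumptions made up for illustration; this
is not the real Frost message format.

```python
import xml.etree.ElementTree as ET


def parse_message(raw):
    """Return the message subject, or None if the message is unusable.

    Hypothetical helper: the <message>/<subject> layout is an assumption
    for illustration, not the real Frost format.
    """
    try:
        root = ET.fromstring(raw)
    except ET.ParseError:
        # The library reports malformed input as an exception, so a bad
        # message can be skipped instead of taking the program down.
        return None
    if root.tag != "message":
        return None
    return root.findtext("subject")


good = "<message><subject>hello</subject></message>"
bad = "<message><subject>hello</message>"  # mismatched tags

print(parse_message(good))  # hello
print(parse_message(bad))   # None
```

The error path is a single except clause here; a hand-written parser would
have to get every such case right on its own.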
> This is especially
> important for non-Java programs, since they can easily develop far more
> serious symptoms than simply crashing.
>
> It also allows for trivial backwards-compatible extension: simply state
> that a program should ignore all tags and attributes it doesn't
> understand, and you can extend the format as needed while the old programs
> will still keep on working.
>

Yes, but I always fear that one day I will see strange or useless extensions
from programs made by someone else, and people asking me to add support for
them ... (OK, it will probably not happen ...)

>
> On Fri, 02 Jun 2006 00:33:23 +0100, Matthew Toseland wrote:
> > Firstly, why do we need two index formats? I'm the first to admit that
> > the current Librarian index format is limited - way too limited - but why
> > do we need two? The main changes I would make to the librarian format
> > right now would be:
> > - Support splitting. (This is relevant to file indexes)
> > - Include word indexes to allow for adjacent word searches. (This is
> >   relevant to file indexes too, because you may want to search for
> >   adjacent words in a title).
> > - Maybe include some amount of metadata - functional (mime type), or
> >   theoretical (category, dublin core...), or other (activelinks?). (This
> >   is definitely relevant to file indexes).
> > - Include the filename in the index. Possibly using negative word
> >   indexes to indicate "in the filename" words; it must be possible to
> >   distinguish between matches in the page title and matches in the
> >   content. (This is also relevant to both web page indexes and file
> >   indexes, though especially to the latter).
> >
> > I am quite happy to change the format. Indeed it needs significant
> > changes.
> >
> > Indexes, like all files, are automatically compressed, so don't worry too
> > much about it being overly verbose.
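A rough sketch of the word-index idea listed above, written in Python purely
for illustration. It assumes content words are numbered 1, 2, 3, ... and
filename words -1, -2, -3, ..., so the sign of a position says where a match
occurred. That numbering, the function names build_index and adjacent, and
the keys "doc-a" and "doc-b" are all assumptions, not part of the Librarian
format.

```python
from collections import defaultdict


def build_index(documents):
    """Map word -> {document key: [positions]}.

    `documents` maps a key to (filename_words, content_words).
    Assumed numbering: content words get 1, 2, 3, ... and filename
    words get -1, -2, -3, ..., so negative means "in the filename".
    """
    index = defaultdict(lambda: defaultdict(list))
    for key, (filename_words, content_words) in documents.items():
        for i, word in enumerate(filename_words, start=1):
            index[word.lower()][key].append(-i)
        for i, word in enumerate(content_words, start=1):
            index[word.lower()][key].append(i)
    return index


def adjacent(index, first, second):
    """Return keys where `second` directly follows `first` in one field."""
    hits = set()
    for key, positions in index.get(first.lower(), {}).items():
        following = set(index.get(second.lower(), {}).get(key, []))
        for p in positions:
            # Content positions count upward, filename positions downward.
            nxt = p + 1 if p > 0 else p - 1
            if nxt in following:
                hits.add(key)
    return hits


docs = {
    "doc-a": (["free", "software"], ["the", "free", "network"]),
    "doc-b": (["network", "notes"], ["software", "is", "free"]),
}
idx = build_index(docs)

print(sorted(adjacent(idx, "free", "software")))  # ['doc-a']
print(sorted(adjacent(idx, "free", "network")))   # ['doc-a']
print(sorted(adjacent(idx, "software", "free")))  # []
```

Because position 0 is never used, a filename word and a content word can
never be mistaken for neighbours.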
> >
> > Now, you are proposing additional fields: firstly, the size of the
> > content (this isn't especially relevant to web page indexes), and the
> > length of the file if it is audio or video. Both are perfectly reasonable
> > extensions IMHO. If we are going to support metadata we should support a
> > range of metadata; we will need support for a category, (probably tied to
> > a specific site), at least, and this is a very woolly and arbitrary
> > thing.
> >
> > An explicit aim of your index format is to be able to index the contents
> > of text-based files by words. This is a good thing, but if you are going
> > to do that, then you should define a format, (preferably with some of the
> > details of splitting indexes worked out), and make Librarian and Spider
> > use it. Metadata can be shown next to matches, or it can be used to
> > narrow down searches.
> >
> > And I honestly don't care whether it is XML. I see no reason to take
> > strenuous efforts to keep back compatibility, but filters can be written
> > easily enough if need be.
> > _______________________________________________
> Devl mailing list
> Devl at freenetproject.org
> http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl

-- 
Jerome Flesch.