On Sat, Jun 03, 2006 at 03:00:49AM +0200, Jerome Flesch wrote:
> > > > The main changes I would make to the librarian
> > > > format right now would be:
> > > > - Support splitting. (This is relevant to file indexes)
> 
> I have updated my format proposal at 
> http://wiki.freenetproject.org/AnotherFreenetIndexFormat to try to meet your 
> requirements, but I still need some clarification on this point:
> I don't really understand why indexes need to handle file splitting: doesn't 
> the FCPv2 spec say that the node does most of this work?

Splitting of the index itself. Because we will want to fetch only the
relevant pieces if it gets big. If we have a lot of freesites, we will
need to split the index up - perhaps by the first letter or two - in
order to avoid having to fetch very large files regularly. Users are
used to having to wait for search results with p2p, so this isn't
necessarily a big problem. The search engine would fetch only those
index parts needed for the particular search. Some letters would likely
have fewer words under them, in which case they could be aggregated.
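
A minimal sketch of this splitting scheme in Python, assuming an index is just a list of words and that each part is named by the range of first letters it covers. split_index, parts_for_query and max_words_per_part are hypothetical names, not Librarian code. Letters are taken in alphabetical order and a sparse letter is folded into the previous part while that part stays under the threshold; a single letter that exceeds the threshold on its own would need a further split on the first two letters, which this sketch leaves out.

```python
from collections import defaultdict


def split_index(words, max_words_per_part):
    """Split an index into parts keyed by a range of first letters.

    Hypothetical sketch: words are grouped by their first letter, and a
    letter's words are merged into the previous part as long as that part
    stays within max_words_per_part.
    """
    by_letter = defaultdict(list)
    for word in sorted(set(words)):
        by_letter[word[0]].append(word)

    parts = []  # list of (first_letter, last_letter, words)
    for letter in sorted(by_letter):
        bucket = by_letter[letter]
        if parts and len(parts[-1][2]) + len(bucket) <= max_words_per_part:
            # Sparse letter: fold it into the previous part.
            first, _, merged = parts[-1]
            parts[-1] = (first, letter, merged + bucket)
        else:
            # Start a new part (an oversized single letter is kept whole here).
            parts.append((letter, letter, list(bucket)))
    return parts


def parts_for_query(parts, query_words):
    """Return the letter ranges of the only parts a search has to fetch."""
    return [
        (first, last)
        for first, last, _ in parts
        if any(first <= w[0] <= last for w in query_words)
    ]
```

For example, with a threshold of 3 the words ant, apple, bee, car, cat, cow and dog end up in three parts (a-b, c, d), and a search for "bee" only has to fetch the a-b part.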
> 
> > > > - Include word indexes to allow for adjacent word searches. (This is
> > > >   relevant to file indexes too, because you may want to search for
> > > >   adjacent words in a title).
> 
> Added.
> 
> > > > - Maybe include some amount of metadata - functional (mime type), or
> > > >   theoretical (category, dublin core...), or other (activelinks?).
> > > > (This is definitely relevant to file indexes).
> 
> I agree that MIME types would be more relevant than the "type" attribute I 
> previously had on the "file" tag, so I replaced that attribute.
> 
> Regarding categories, I still think it would be better to leave them as an 
> option (e.g. an "option" tag inside the "file" tag), as it will not always be 
> possible for the spider to find good categories, and we will probably have 
> some lazy users who never specify categories.

Certainly.

> But if you think it's really important, I can make it a "file" tag attribute.

Nah.
> 
> Regarding Dublin Core metadata, since binary files won't have it, I think it's 
> better to make it an option too.

Most of these things are options.
> 
> Regarding "activelinks", what exactly do you mean?

95x32 icons for freesites.
> 
> > > > - Include the filename in the index. Possibly using negative word
> > > >   indexes to indicate "in the filename" words; it must be possible to
> > > >   distinguish between matches in the page title and matches in the
> > > >   content. (This is also relevant to both web page indexes and file
> > > >   indexes, though especially to the latter).
> > >
> By filename, did you mean document titles?

No, I mean the filename, i.e. the URI, which is what you will mostly be
searching on when looking for non-text files.
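
A rough Python sketch of how word positions could serve both adjacent-word searches and the "in the filename" distinction, assuming a simple in-memory positional index. build_index, phrase_search and the example URIs are illustrative, not an existing index format: content words get positive positions, URI words get negative ones, and a phrase matches when its words sit at consecutive positions with the same sign.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build word -> URI -> positions from a {uri: page_text} mapping.

    Assumed convention: content words get positions 1, 2, 3, ... and words
    from the URI (the filename) get -1, -2, -3, ..., so the sign tells a
    filename match apart from a content match.
    """
    index = defaultdict(lambda: defaultdict(list))
    for uri, text in docs.items():
        for pos, word in enumerate(tokenize(text), start=1):
            index[word][uri].append(pos)
        for pos, word in enumerate(tokenize(uri), start=1):
            index[word][uri].append(-pos)
    return index


def phrase_search(index, phrase, in_filename=False):
    """Return URIs where the phrase's words appear at adjacent positions."""
    words = tokenize(phrase)
    if not words:
        return set()
    step = -1 if in_filename else 1  # filename positions run -1, -2, ...
    results = set()
    for uri, positions in index.get(words[0], {}).items():
        for p in positions:
            if (p < 0) != in_filename:
                continue  # match is in the wrong place (content vs filename)
            if all(p + k * step in index.get(w, {}).get(uri, [])
                   for k, w in enumerate(words[1:], start=1)):
                results.add(uri)
                break
    return results
```

With a made-up file "KSK@holiday-beach.jpg" (no text content) and a page "KSK@notes.txt" containing "beach holiday plans", searching for "holiday beach" with in_filename=True finds only the JPEG, while a content search for "beach holiday" finds only the notes page.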
> >
> > Right. Thanks for your thoroughness, I hope that it doesn't result in
> > your not having time to ship the primary finished product (the GUI
> > searching/sharing tool itself).
> >
> As I said below, at first I will only do basic work on Librarian and 
> Spider. I don't think I will spend too much time on it, and if it starts to 
> require too much, I may set it aside for a while and come back to it later 
> (probably after the summer).
> 
> > > Regarding Spider, at first it would only be a basic version /
> > > adaptation that indexes only HTML files. As I will need to create a set
> > > of filters to extract metadata and words for the Fuqid replacement, I
> > > could reuse them later in Spider.
> >
> > Right. I see no reason why your filesharing tool cannot link directly
> > into freesites if they haven't been excluded from the search.
> >
> I agree. It will only require knowing where to find the browser, which 
> shouldn't be a problem :)
> 
> > > > Metadata can be shown next to matches, or it can be used
> > > > to narrow down searches.
> > >
> > > For Librarian? OK, I don't think that will be a real problem.
> >
> > Yes, for Google-style searches. It might be worth thinking about for
> > filesharing type searches too.
> >
> Ok.
-- 
Matthew J Toseland - toad at amphibian.dyndns.org
Freenet Project Official Codemonkey - http://freenetproject.org/
ICTHUS - Nothing is impossible. Our Boss says so.