On Fri, 8 Nov 2002, Lachlan Andrew wrote:

> Regarding the flags, I can see why it makes sense to store 
> the information, but it doesn't need to be as a bit-field.  

I do think it makes sense to have a bit field. Remember that we're not
just planning a database for HTML documents. Yes, some of the current bits
are mutually exclusive, but I can imagine that some XML documents might
need combined bits, e.g.:

<foo ...>
        text to be indexed
  <bar>
        more text
  </bar>
</foo>
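As a rough sketch of why a bit field handles this case, the snippet below ORs together one bit per enclosing element, so a word inside <bar> carries both the <foo> and <bar> bits. A single lookup value could only record one of them. The WordFlag values, the FOO/BAR field bits and the element-to-bit mapping are all hypothetical, not ht://Dig's actual FLAG_* constants. The capitalized word "More" is illustrative, chosen to show FLAG_CAPITAL-style bits combining with field bits.

```python
from enum import IntFlag


class WordFlag(IntFlag):
    # Hypothetical bit values for illustration; not ht://Dig's real constants.
    CAPITAL = 1
    FOO = 2  # hypothetical user-defined field for <foo>
    BAR = 4  # hypothetical user-defined field for <bar>


# Hypothetical mapping from element name to the bit it contributes.
ELEMENT_FLAGS = {"foo": WordFlag.FOO, "bar": WordFlag.BAR}


def flags_for(open_elements, word):
    """Combine the bits of every enclosing element, plus CAPITAL if needed."""
    flags = WordFlag(0)
    for name in open_elements:
        flags |= ELEMENT_FLAGS.get(name, WordFlag(0))
    if word[:1].isupper():
        flags |= WordFlag.CAPITAL
    return flags


# "text" sits directly inside <foo>; "More" is inside <bar> nested in <foo>.
outer = flags_for(["foo"], "text")
inner = flags_for(["foo", "bar"], "More")
print(int(outer), int(inner))  # prints "2 7"
```

A field-restricted search for <foo> can then test `flags & WordFlag.FOO` and still match the words nested in <bar>.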

Yes, some of the current flags could be stored as lookup values, but some
(e.g. FLAG_CAPITAL) clearly belong in a bit field. I could also see some
situations where FLAG_AUTHOR and FLAG_KEYWORDS are combined, and
conceivably the parser should be smart enough to decide whether
FLAG_LINK_TEXT and FLAG_URL should be combined, e.g.

<a href="http://foo.com/">foo.com</a>
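One way a parser might make that decision is sketched below. The rule it uses is a hypothetical heuristic, not existing ht://Dig behaviour: if the visible anchor text is the link's host name or the full href, the words get the URL bit in addition to the link-text bit. The WordFlag values are placeholders.

```python
from enum import IntFlag
from urllib.parse import urlparse


class WordFlag(IntFlag):
    # Hypothetical bit values for illustration only.
    LINK_TEXT = 1
    URL = 2


def anchor_text_flags(href, text):
    """Flags for words in an anchor's visible text.

    Hypothetical heuristic: if the visible text is the link's host name or
    the full href, the words are also treated as URL words.
    """
    flags = WordFlag.LINK_TEXT
    shown = text.strip().lower()
    host = (urlparse(href).hostname or "").lower()
    if shown and shown in (host, href.lower()):
        flags |= WordFlag.URL
    return flags


print(int(anchor_text_flags("http://foo.com/", "foo.com")))     # prints 3
print(int(anchor_text_flags("http://foo.com/", "click here")))  # prints 1
```

With a lookup value the parser would have to pick either "link text" or "URL" for foo.com; with bits it can record both.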

Yes, you might argue these are somewhat contrived. But when we were first
planning the database format for 3.2, we anticipated that arbitrary
documents and XML might be included in a "3.2" release with user-defined
bits and field-restricted searching.

> can thank Mr Gates for that one...  However, it could also 
> be treated as "level 3 heading", unless it is already given 
> extra weight somehow.

It is not given extra weight currently. Again, the catch would be with
field-restricted searches. If we treat it as a level-3 heading or
whatever, then we have to block field-restricted searches at that level,
since you'd get more than you asked for.
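A small sketch of that "more than you asked for" problem, assuming a toy index of (doc id, word, flags) entries. HEADING stands in for "level 3 heading", and SPECIAL is a hypothetical bit for the field under discussion; neither value is ht://Dig's. Folding the field into HEADING makes a heading-restricted search return an extra document and leaves no way to search the field on its own.

```python
from enum import IntFlag


class WordFlag(IntFlag):
    # Hypothetical bit values for illustration only.
    TEXT = 1
    HEADING = 2  # stands in for "level 3 heading"
    SPECIAL = 4  # hypothetical bit for the field under discussion


def field_search(index, word, mask):
    """Return sorted doc ids whose entry for `word` has any bit in `mask`."""
    return sorted({doc for doc, w, flags in index if w == word and flags & mask})


# Doc 1 has "intro" in a real heading; doc 2 has it only in the special field.
merged = [(1, "intro", WordFlag.HEADING), (2, "intro", WordFlag.HEADING)]
separate = [(1, "intro", WordFlag.HEADING), (2, "intro", WordFlag.SPECIAL)]

print(field_search(merged, "intro", WordFlag.HEADING))    # prints [1, 2]
print(field_search(separate, "intro", WordFlag.HEADING))  # prints [1]
print(field_search(separate, "intro", WordFlag.SPECIAL))  # prints [2]
```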

-Geoff

_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
