According to Lachlan Andrew:
> - What is the difference between  Dictionary::Destroy()
>   and  Dictionary::Release() ?

Dictionary entries associate a particular name or keyword with a pointer
to an Object.  Generally, but not necessarily always, when adding items
to a Dictionary, you allocate a new Object to set the value field.  E.g.:

   dict.Add(name, new String(value));

When you delete the dictionary, all the objects in it are deleted too.
If you want to empty a dictionary, deleting all objects, then you'd use
dict.Destroy(), but if you want to empty the dictionary and keep all
the objects around (assuming there are other "live" pointers to these
objects), then you can use dict.Release() to release the objects from
the dictionary.  Once released, the Dictionary itself can be deleted
without harming the objects that were once contained in it.
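
To illustrate the distinction, here is a minimal sketch.  The Object,
String and Dictionary classes below are simplified stand-ins written for
this example (built on std::map), not the actual ht://Dig classes, and
the live counter exists only so the effect can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-ins for illustration; not the real ht://Dig classes.
struct Object {
    static int live;                 // number of Objects currently alive
    Object() { ++live; }
    virtual ~Object() { --live; }
};
int Object::live = 0;

struct String : Object {
    std::string text;
    explicit String(const std::string &s) : text(s) {}
};

// Owns the objects it holds unless they are released first.
// (Copying is not handled in this sketch.)
class Dictionary {
public:
    ~Dictionary() { Destroy(); }

    // Store value under name, deleting any different object already there.
    void Add(const std::string &name, Object *value) {
        std::map<std::string, Object *>::iterator it = table.find(name);
        if (it == table.end()) {
            table[name] = value;
        } else {
            if (it->second != value) delete it->second;
            it->second = value;
        }
    }

    // Empty the dictionary and delete every object in it.
    void Destroy() {
        std::map<std::string, Object *>::iterator it;
        for (it = table.begin(); it != table.end(); ++it) delete it->second;
        table.clear();
    }

    // Empty the dictionary but leave the objects alive.
    void Release() { table.clear(); }

    std::size_t Count() const { return table.size(); }

private:
    std::map<std::string, Object *> table;
};

void example() {
    String *shared = new String("kept");
    Dictionary dict;
    dict.Add("temp", new String("owned by dict"));
    dict.Destroy();                 // deletes the "temp" object
    assert(Object::live == 1);      // only `shared` remains

    dict.Add("shared", shared);
    dict.Release();                 // forgets the pointer, deletes nothing
    assert(shared->text == "kept");
    delete shared;                  // the caller still owns it
}
```

After Release(), something else has to hold a pointer to each object
and delete it eventually, or the objects leak.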

At least, that's my interpretation of the Dictionary class code in 3.1.
Whether the code that uses the Dictionary class actually uses this
properly, and whether all this is done the same way in 3.2, I couldn't
say for sure.  I think the changes to the destructors for DictionaryEntry
and Dictionary in 3.2 are to avoid excessive recursion, which makes sense.

> - Have the "factor slots" changed at some time?  The
>   keyword and description slots seem out of sync between
>   ExternalParser.cc  (slots 10 and 11, respectively) and
>   HTML.cc (slots 9 and 10).
> 
> (Since I'm rather new to this project, I'm hesitant to 
> change any functionality that hasn't been flagged as a bug.)

Yes, they have changed, and yes, ExternalParser.cc is indeed out of sync.
It's a bug.  Good eye!  It should be using slot 9 for keywords and 10 for
meta description.

> - Is there something in the test suite to test parsing,
>   especially of META tags?

Good question.  There are META tags in the HTML test files in
test/htdocs/set1, but they don't contain every type of meta tag we'd want
to test.  Also, it seems the htdig tests in the test directory just make
sure htdig finds all the URLs it's supposed to, but there don't seem
to be any checks that it finds all the words it's supposed to.

> - It seems that <meta foo="bar"> is usually treated the
>   same as <meta name="foo" content="bar">.  Can it always
>   be? If so, we could avoid some code duplication (or
>   triplication, since much of it is currently in both
>   ExternalParser.cc  and  HTML.cc!).

I think there are some cases where that's true, but not necessarily in all
cases, so I don't know how much you can optimize this.  E.g., for certain
keyword tags we allow the form <meta foo="bar">, but the configurable
keyword names must be of the form <meta name="foo" content="bar">.
I don't know that we'd want to fully generalize this, but I'm open to
suggestions/recommendations from others.
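
If someone did want to factor this out, one possible shape is a single
helper that both parsers call.  This is only a sketch under assumptions:
the function name, the attribute map type and the shorthand list are all
made up for this example, attribute names are assumed to be lower-cased
by the caller, and it uses the standard HTML attribute name "content".
The shorthand form is only honoured for names explicitly listed, which
keeps the two forms from being treated as equivalent everywhere.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical type: attributes of one already-parsed <meta> tag.
typedef std::map<std::string, std::string> Attributes;

// Returns (name, content) for the tag, or two empty strings if the tag
// matches neither the name/content form nor an allowed shorthand.
std::pair<std::string, std::string>
normalizeMeta(const Attributes &attrs,
              const std::set<std::string> &shorthandNames)
{
    // Explicit form: <meta name="foo" content="bar">
    Attributes::const_iterator name = attrs.find("name");
    Attributes::const_iterator content = attrs.find("content");
    if (name != attrs.end() && content != attrs.end())
        return std::make_pair(name->second, content->second);

    // Shorthand form: <meta foo="bar">, only for names listed as allowed.
    for (Attributes::const_iterator it = attrs.begin(); it != attrs.end(); ++it)
        if (shorthandNames.count(it->first))
            return std::make_pair(it->first, it->second);

    return std::make_pair(std::string(), std::string());
}

void example() {
    std::set<std::string> shorthand;
    shorthand.insert("foo");

    Attributes tag;
    tag["foo"] = "bar";
    assert(normalizeMeta(tag, shorthand).first == "foo");
    assert(normalizeMeta(tag, shorthand).second == "bar");
}
```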

> - The handling of <meta name="date"...> seems to have
>   disappeared from  ExternalParser.cc.  Has it been moved
>   somewhere, been deliberately removed, or just got lost?

Bear in mind that 3.1 and 3.2 have been developed in parallel for almost 3
years now, so some changes in one don't necessarily make it to the other.
Sometimes that's deliberate, but sometimes they just fall through the
cracks.  In this case, the date handling was added in 3.1.6, and that's
one of the features that still needs to be forward ported to 3.2.  So,
it never disappeared from 3.2; it just hasn't appeared there yet.

> - Where can I read about the reason for changing "factor
>   slots" from explicit factors to bit masks?  I assume that
>   there was a change to the database format which required
>   it...  It would be really nice to be able to specify
>   heading factors depending on the heading level again!

I'm sure there'd be a fair bit of discussion about this in the htdig-dev
archives of 2-3 years ago.  I don't think it ever got formally documented
elsewhere (yet).  The reason was to allow "scoring on the fly".  In 3.1,
the score is calculated by htdig, so if you change the factors, you
need to reindex.  In 3.2, because the word database needs to keep all
instances of each word, for phrase matching, it only makes sense to
also keep a flag indicating word type, so that the score calculations
for different word types can be deferred to the search phase.
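
As a rough sketch of what deferring the score to the search phase looks
like: each stored occurrence carries only a flag byte, and the factors
are applied when results are scored, so changing a factor needs no
reindexing.  The flag names, bit positions and struct layout here are
invented for the example and are not the actual 3.2 definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flag bits, one per word type.
enum WordFlag {
    FLAG_TEXT        = 1 << 0,
    FLAG_TITLE       = 1 << 1,
    FLAG_HEADING     = 1 << 2,
    FLAG_KEYWORDS    = 1 << 3,
    FLAG_DESCRIPTION = 1 << 4
};
const int FLAG_COUNT = 5;

// One stored instance of a word: where it occurred and in what context.
struct WordOccurrence {
    unsigned location;
    unsigned char flags;
};

// Search-time scoring: factors[i] is the weight for flag bit i.
double score(const std::vector<WordOccurrence> &occurrences,
             const double factors[FLAG_COUNT])
{
    double total = 0.0;
    for (std::size_t i = 0; i < occurrences.size(); ++i)
        for (int bit = 0; bit < FLAG_COUNT; ++bit)
            if (occurrences[i].flags & (1u << bit))
                total += factors[bit];
    return total;
}

void example() {
    std::vector<WordOccurrence> hits;
    WordOccurrence inTitle = { 0, FLAG_TITLE };
    WordOccurrence inBody  = { 42, FLAG_TEXT };
    hits.push_back(inTitle);
    hits.push_back(inBody);

    // Same index data, two different factor settings.
    const double before[FLAG_COUNT] = { 1, 100, 5, 100, 150 };
    const double after[FLAG_COUNT]  = { 1, 10, 5, 100, 150 };
    assert(score(hits, before) == 101.0);
    assert(score(hits, after) == 11.0);
}
```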

The decision to put all headings into one factor was to reduce the number
of bits the flag would take by 5, so the flags can fit in a single byte.
We're going to have to increase this anyway, to accommodate custom fields,
so it might make sense to reintroduce the distinction between heading
levels.  Given that a word can't be in more than one heading level at
once, this could be encoded in 3 bits, with only a minor complication
of the score calculation.
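
For example, the heading level could be packed into a 3-bit field like
this.  The layout is purely an assumption for illustration (low three
bits hold the level, 0 meaning "not in a heading" and 1 to 6 meaning h1
to h6; higher bits are left for the other word-type flags), and the
function names are made up.

```cpp
#include <cassert>

// Hypothetical layout: bits 0-2 hold the heading level, the rest are
// available for other flags.
const unsigned HEADING_MASK = 0x07;
const unsigned MAX_HEADING_LEVEL = 6;

// Store a heading level (0..6) without disturbing the other flag bits.
unsigned setHeadingLevel(unsigned flags, unsigned level) {
    return (flags & ~HEADING_MASK) | (level & HEADING_MASK);
}

unsigned headingLevel(unsigned flags) {
    return flags & HEADING_MASK;
}

// The extra step in scoring: decode the level, then look up its factor.
// factors[0] is for words outside headings, factors[1..6] for h1..h6.
double headingFactor(unsigned flags, const double factors[7]) {
    unsigned level = headingLevel(flags);
    return level <= MAX_HEADING_LEVEL ? factors[level] : 0.0;
}

void example() {
    const unsigned someOtherFlag = 1u << 3;      // placeholder for another flag
    unsigned flags = setHeadingLevel(someOtherFlag, 2);
    assert(headingLevel(flags) == 2);
    assert((flags & someOtherFlag) != 0);        // other bits are preserved
}
```

The value 7 is unused in this layout, so the lookup treats it as
contributing nothing.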

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)


_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
