According to Lachlan Andrew:
> - What is the difference between Dictionary::Destroy()
>   and Dictionary::Release()?

Dictionary entries associate a particular name or keyword with a pointer
to an Object. Generally, but not necessarily always, when adding items to
a Dictionary, you allocate a new Object to set the value field, e.g.:

    dict.Add(name, new String(value));

When you delete the dictionary, all the objects in it are deleted too.
If you want to empty a dictionary, deleting all objects, you'd use
dict.Destroy(). If you want to empty the dictionary but keep all the
objects around (assuming there are other "live" pointers to these
objects), you can use dict.Release() to release the objects from the
dictionary. Once released, the Dictionary itself can be deleted without
harming the objects that were once contained in it.

At least, that's my interpretation of the Dictionary class code in 3.1.
Whether the code that uses the Dictionary class actually uses this
properly, and whether all this is done the same way in 3.2, I couldn't
say for sure. I think the changes to the destructors for DictionaryEntry
and Dictionary in 3.2 are meant to avoid excessive recursion, which makes
sense.

> - Have the "factor slots" changed at some time? The
>   keyword and description slots seem out of sync between
>   ExternalParser.cc (slots 10 and 11, respectively) and
>   HTML.cc (slots 9 and 10).
>
>   (Since I'm rather new to this project, I'm hesitant to
>   change any functionality that hasn't been flagged as a bug.)

Yes, they have changed, and yes, ExternalParser.cc is indeed out of sync.
It's a bug. Good eye! It should be using slot 9 for keywords and slot 10
for the meta description.

> - Is there something in the test suite to test parsing,
>   especially of META tags?

Good question. There are META tags in the HTML test files in
test/htdocs/set1, but they don't contain every type of meta tag we'd want
to test. Also, the htdig tests in the test directory seem only to make
sure htdig finds all the URLs it's supposed to; there don't seem to be
any checks that it finds all the words it's supposed to.
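
To make the ownership difference between Destroy() and Release() concrete,
here is a minimal sketch of an owning dictionary. OwningDictionary, Object,
and StringObject are hypothetical stand-ins written for illustration, not
the ht://Dig 3.1 or 3.2 classes, whose internals may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; these are NOT the actual
// ht://Dig Object, String, and Dictionary classes.
struct Object {
    virtual ~Object() = default;
};

struct StringObject : Object {
    explicit StringObject(std::string v) : value(std::move(v)) {}
    std::string value;
};

// A dictionary that owns the Object* values it holds.
class OwningDictionary {
public:
    OwningDictionary() = default;
    OwningDictionary(const OwningDictionary&) = delete;             // owns raw pointers,
    OwningDictionary& operator=(const OwningDictionary&) = delete;  // so forbid copies
    ~OwningDictionary() { Destroy(); }  // deleting the dictionary deletes what it still holds

    // Takes ownership of obj. In this sketch, re-adding a name deletes the old value.
    void Add(const std::string& name, Object* obj) {
        auto it = entries_.find(name);
        if (it != entries_.end()) {
            if (it->second != obj) delete it->second;
            it->second = obj;
        } else {
            entries_.emplace(name, obj);
        }
    }

    Object* Find(const std::string& name) const {
        auto it = entries_.find(name);
        return it == entries_.end() ? nullptr : it->second;
    }

    // Empty the dictionary AND delete every contained object.
    void Destroy() {
        for (auto& entry : entries_) delete entry.second;
        entries_.clear();
    }

    // Empty the dictionary but leave the objects alive. The caller must hold
    // other pointers to them and becomes responsible for deleting them.
    void Release() { entries_.clear(); }

    std::size_t Count() const { return entries_.size(); }

private:
    std::map<std::string, Object*> entries_;
};
```

After Release(), the objects outlive the dictionary and must be freed by
whoever still points to them; calling Destroy() instead would leave those
other pointers dangling.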

> - It seems that <meta foo="bar"> is usually treated the
>   same as <meta name="foo" content="bar">. Can it always
>   be? If so, we could avoid some code duplication (or
>   triplication, since much of it is currently in both
>   ExternalParser.cc and HTML.cc!).

I think there are some cases where that's true, but not necessarily in
all cases, so I don't know how much you can optimize this. E.g., for
certain keyword tags we allow the form <meta foo="bar">, but the
configurable keyword names must be of the form
<meta name="foo" content="bar">. I don't know that we'd want to fully
generalize this, but I'm open to suggestions and recommendations from
others.

> - The handling of <meta name="date"...> seems to have
>   disappeared from ExternalParser.cc. Has it been moved
>   somewhere, been deliberately removed, or just got lost?

Bear in mind that 3.1 and 3.2 have been developed in parallel for almost
3 years now, so some changes in one don't necessarily make it to the
other. Sometimes that's deliberate, but sometimes they just fall through
the cracks. In this case, the date handling was added in 3.1.6, and it's
one of the features that still needs to be forward-ported to 3.2. So it
never disappeared from 3.2; it just hasn't appeared there yet.

> - Where can I read about the reason for changing "factor
>   slots" from explicit factors to bit masks? I assume that
>   there was a change to the database format which required
>   it... It would be really nice to be able to specify
>   heading factors depending on the heading level again!

I'm sure there was a fair bit of discussion about this in the htdig-dev
archives from 2-3 years ago. I don't think it ever got formally
documented elsewhere (yet). The reason was to allow "scoring on the fly".
In 3.1, the score is calculated by htdig at indexing time, so if you
change the factors, you need to reindex.
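
As a rough illustration of scoring on the fly, the sketch below stores only
a one-byte bit-mask flag per word occurrence and applies the configured
factors when a search runs. The flag names, bit positions, Occurrence, and
Factors are hypothetical, not ht://Dig 3.2's actual definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical word-type flag bits (one byte per occurrence); these are NOT
// the actual bit assignments used by ht://Dig 3.2.
enum WordFlag : std::uint8_t {
    FLAG_TEXT        = 1u << 0,
    FLAG_TITLE       = 1u << 1,
    FLAG_HEADING     = 1u << 2,  // all heading levels share this one bit
    FLAG_KEYWORDS    = 1u << 3,
    FLAG_DESCRIPTION = 1u << 4,
};

// One stored occurrence of a word: where it occurs and what kind of text it was in.
struct Occurrence {
    std::uint32_t location;
    std::uint8_t flags;
};

// Per-type weights read from the configuration at search time
// (hypothetical field names).
struct Factors {
    double text, title, heading, keywords, description;
};

// Search-time scoring: the index holds only flags, so a changed factor
// affects the very next search without reindexing.
double ScoreOnTheFly(const std::vector<Occurrence>& occurrences, const Factors& f) {
    double score = 0.0;
    for (const Occurrence& o : occurrences) {
        if (o.flags & FLAG_TEXT)        score += f.text;
        if (o.flags & FLAG_TITLE)       score += f.title;
        if (o.flags & FLAG_HEADING)     score += f.heading;
        if (o.flags & FLAG_KEYWORDS)    score += f.keywords;
        if (o.flags & FLAG_DESCRIPTION) score += f.description;
    }
    return score;
}
```

In 3.1 terms, the equivalent sum would be computed once by htdig and stored,
which is why changing a factor there means reindexing.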

In 3.2, because the word database needs to keep all instances of each
word for phrase matching, it only makes sense to also keep a flag
indicating the word type, so that the score calculations for different
word types can be deferred to the search phase. The decision to put all
headings into one factor was to reduce the number of bits the flag would
take by 5 (one heading bit instead of six), so the flags can fit in a
single byte. We're going to have to increase this anyway, to accommodate
custom fields, so it might make sense to reintroduce the distinction
between heading levels. Given that a word can't be in more than one
heading level at once, this could be encoded in 3 bits (enough for h1
through h6 plus "not in a heading"), with only a minor complication of
the score calculation.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
