On Thu, 13 May 2004, Andrew Moise wrote:

> Date: Thu, 13 May 2004 12:46:01 -0400
> From: Andrew Moise <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Cc: Jim <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
> Subject: Re: [htdig-dev] 3.2.0 - is it worth it??
> 
> On Thu, 2004-05-13 at 08:58, Lachlan Andrew wrote:
> > My impression is that the ht://Dig project is basically dead :(  The 
> > existing code is of course still functional (and thanks Jim and 
> > Gilles for all the support you give to the users!), but I don't think 
> > there is enough enthusiasm to release a new version, either 
> > 3.2 or 3.3. If I get enthusiastic in the next couple of weeks, I 
> > might still try to put 3.2.0b6 together, but that is about as far as 
> > it will go...
> 
>   As a user, I'm very sorry to hear this -- I just deployed 3.2.0b5 on a
> site I administrate, and I've been very pleased with it.  I've been
> waiting until the 3.2.0 release cycle was over to start trying to
> contribute some of my tweaks (I also need to talk to my employer about
> the legalities first), but I guess if the project is stagnating I should
> speak up (there's also the possibility that the imminent death of htdig
> just makes this extra silly, of course... *shrug*).  In any case:
>   So htdig does a bad job when multiple documents match a search in
> similar ways; this shows up particularly when your search query matches
> part of the header or footer of a section of your site, or when your
> search results include threads from a mailing list archive (in which
> case messages within a thread often show up consecutively in the
> results, which adds a lot of noise).  I wrote some code (shoehorned in
> as a ScoreMatch, more for easy control by the 'sort' parameter than for
> any logical reason) which sorts the results once, then reduces the score
> of any match which is similar to matches that are higher in the list,
> then re-sorts the results; thus the high-ranked results that are returned
> tend to be more unique than otherwise.  This is marginally helpful with
> the header/footer problem (though the excerpts are still usually
> identical in that case), and very helpful with the mailing-list-thread
> problem. AFAICT it doesn't do too much harm to the results in the normal
> case.
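
The sort, demote, re-sort pass described above could be sketched roughly
like this. It is only an illustration, not the ScoreMatch code from the
post: the Match record, the word-overlap (Jaccard) similarity test, and
the threshold and penalty values are all assumptions, since the post
does not say how similarity is measured or by how much a score is
reduced.

```python
from dataclasses import dataclass


@dataclass
class Match:
    url: str
    text: str      # title or excerpt used for the similarity check (assumed)
    score: float


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two strings, in the range [0, 1]."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def demote_similar(matches, threshold=0.6, penalty=0.5):
    """Sort by score, scale down matches that resemble a higher-ranked
    match, then sort again. threshold and penalty are made-up defaults."""
    ranked = sorted(matches, key=lambda m: m.score, reverse=True)
    adjusted = []
    for i, m in enumerate(ranked):
        # Compare only against matches that ranked higher in the first sort.
        similar = any(jaccard(m.text, h.text) >= threshold for h in ranked[:i])
        new_score = m.score * penalty if similar else m.score
        adjusted.append(Match(m.url, m.text, new_score))
    return sorted(adjusted, key=lambda m: m.score, reverse=True)
```

With three matches scored 10, 9 and 6, where the first two have
identical text (two messages from one thread, say), the second drops to
4.5 and the unrelated third moves up to second place.
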
>   We also found it beneficial to tweak results' scores by matching their
> URLs against a handmade list of URL pieces and score-hacking factors
> (mailing list archives are mediocre, IRC archives are usually unhelpful,
> a particular section of documentation is generally very useful) -- I
> know this is gross, but it did wonders for the effectiveness of our
> search results, and a coworker of mine convinced me that it's not
> totally against nature -- humans really do have special knowledge of
> which sections of a site are generally "good," and with an hour or so of
> tweaking we got things in a state where close results from a "bad"
> section are presented above loose results from a "good" section when
> appropriate (more or less).
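
A minimal sketch of the URL-piece table, to make the idea concrete. The
URL fragments and the factors below are invented for illustration (the
post does not list the real ones), and multiplying by every matching
piece is just one possible rule; "first match wins" would be equally
plausible.

```python
# Hypothetical table of (URL piece, score factor) pairs.
URL_FACTORS = [
    ("/pipermail/", 0.7),    # mailing list archives: mediocre
    ("/irc-logs/", 0.4),     # IRC archives: usually unhelpful
    ("/docs/manual/", 1.5),  # a documentation section that is generally useful
]


def url_adjusted_score(url, score, factors=URL_FACTORS):
    """Multiply score by the factor of every URL piece found in url."""
    for piece, factor in factors:
        if piece in url:
            score *= factor
    return score
```

Under these made-up factors, a close IRC hit scored 50 ends up around
20 and still outranks a loose manual hit scored 10 (which becomes 15),
which is the "close bad result above loose good result" behaviour
described above.
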
>   It seems to me that it would be useful to generalize these little
> hacks into a search parameter listing which hacks should be applied; for
> example, to select the two score hacks described in the above paragraphs
> you could specify 'result_hacks=unique,urlmatch' in the search query or
> htdig.conf.  htdig already has a couple of result hacks that could fit
> into this scheme (backlink_factor and date_factor), and I can think of
> one more at least that I'd like to add in my copious free time.  It
> certainly would seem right to me to be able (a) to add stuff like the
> above tweaks to the codebase without forcing everyone to care about it,
> and (b) to test, tweak, and reorder the scoring hacks from a query
> parameter while trying to get things configured to work well.
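
The result_hacks idea might look something like the sketch below: a
registry of named hacks, applied in the order the parameter lists them.
Everything here is hypothetical (the registry, the decorator, the
(url, score) tuples), and the two registered hacks are deliberately
trivial stand-ins rather than the real tweaks.

```python
HACKS = {}  # hack name -> function taking and returning [(url, score), ...]


def result_hack(name):
    """Decorator that registers a scoring hack under the given name."""
    def register(fn):
        HACKS[name] = fn
        return fn
    return register


@result_hack("unique")
def unique(results):
    # Stand-in: halve any result whose parent directory was already seen
    # higher in the list.
    seen, out = set(), []
    for url, score in sorted(results, key=lambda r: r[1], reverse=True):
        parent = url.rsplit("/", 1)[0]
        out.append((url, score * 0.5 if parent in seen else score))
        seen.add(parent)
    return out


@result_hack("urlmatch")
def urlmatch(results):
    # Stand-in: halve anything under a hypothetical /irc-logs/ path.
    return [(u, s * 0.5 if "/irc-logs/" in u else s) for u, s in results]


def apply_result_hacks(results, spec):
    """Apply the hacks named in a comma-separated spec, in order,
    then sort the results by their final score."""
    for name in filter(None, (n.strip() for n in spec.split(","))):
        if name not in HACKS:
            raise ValueError(f"unknown result hack: {name!r}")
        results = HACKS[name](results)
    return sorted(results, key=lambda r: r[1], reverse=True)
```

Calling apply_result_hacks(results, "unique,urlmatch") runs both hacks
in that order; swapping or dropping names in the string is what would
make reordering and testing from a query parameter cheap.
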
>   As I said, I've got (wrongly-integrated) code for the two tweaks I
> mentioned, which I can try to get into a presentable state, and I might
> be able to find time semi-soon to do the work for the general
> result_hacks parameter, if there are people that think either of those
> would be worthwhile.  Are there such people?

Count one;)  I like both.

Regards,

Joe
-- 
     _/   _/_/_/       _/              ____________    __o
     _/   _/   _/      _/         ______________     _-\<,_
 _/  _/   _/_/_/   _/  _/                     ......(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah        [EMAIL PROTECTED]



