Re: [htdig-dev] wish list for 3.1.6

Gilles Detillieux Fri, 05 Oct 2001 07:20:42 -0700

According to Geoff Hutchison:
> At 10:11 AM -0500 10/3/01, Gilles Detillieux wrote:
> >5.  handle noindex_start & noindex_end as string lists
> 
> I'd say this is a reasonable possibility.


Yeah, that's actually why I ranked it higher than the other 3 changes.
I also thought of another documentation change that's needed: documenting
htload and htdump for 3.1.6.

> >6.  a "match all documents" mechanism in htsearch
> >7.  add extra_word_casemap to map cases for letters not recognised in locale
> >8.  write accent_map support for accent fuzzy algorithm, to handle
> >     non-ISO-Latin-1 accents
> 
> I'd say these are probably beyond the scope. I'm sure I could write 
> something for #6, but it'd be a horrible hack. (i.e. it would check 
> for a word of '*' or somesuch and then the parser wouldn't bother 
> looking up the words, but just returned all documents.)

Yes, that's what I had in mind.  I don't see why that would be a
horrible hack.  I think of it as an extension/optimisation of the
whole prefix_match_character thing.  When that character appears on
its own, with no prefix before it, it means match everything (and do
it as efficiently as possible, i.e. no need to actually search the word
database and list all words).  I just don't know exactly how you'd do
it.
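
To make the idea concrete, here is a rough sketch in Python rather than
in the actual htsearch C++ code.  Everything in it is hypothetical: the
lookup() function, the in-memory word_index dictionary and the
all_doc_ids set are stand-ins for the real word and document databases,
and "*" is only assumed to be the configured prefix_match_character.

```python
from typing import Dict, Set

# Assumption: prefix_match_character is configured as "*".
PREFIX_MATCH_CHARACTER = "*"


def lookup(word: str,
           word_index: Dict[str, Set[int]],
           all_doc_ids: Set[int]) -> Set[int]:
    """Return the IDs of the documents matching one search word.

    word_index is a hypothetical stand-in for the word database
    (word -> document IDs); all_doc_ids stands in for the document
    database.
    """
    pc = PREFIX_MATCH_CHARACTER
    if word == pc:
        # Bare prefix character: everything matches, so answer from
        # the document list and never touch the word database.
        return set(all_doc_ids)
    if pc and word.endswith(pc):
        # Ordinary prefix match: scan the indexed words.
        prefix = word[:-len(pc)]
        matches: Set[int] = set()
        for indexed_word, doc_ids in word_index.items():
            if indexed_word.startswith(prefix):
                matches |= doc_ids
        return matches
    # Plain word: exact lookup.
    return set(word_index.get(word, set()))
```

The only point of the sketch is the first branch: the bare character
short-circuits before any word lookup happens, which is the
"as efficiently as possible" part.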

> >3.2 releases, and item 7 will have to be after the next mifluz merge,
> >because the WordType code will change.
> 
> I'd say that #7 and #8 should probably wait a bit. I think there's 
> actually some movement towards UTF-8 in mifluz, so it would be 
> easiest to deal with locale and accent mapping this way (esp. since 
> some of the Unicode-aware libraries will do charset translation).
> 
> That would be my $0.02. But I'm also really busy and want to get 
> Quim's new htsearch parser in place in 3.2 so that 3.2.0b4 doesn't 
> grow moldy.

Yeah, I see your point.  I'll definitely hold off on 7 & 8.  If neither
you nor I can get 6 into the code in a reasonable time, we'll just have
to do without.

I thought of another way of doing this in 3.1.6, without any further
code changes, though it really is a horrible hack.  Go through
db.wordlist with sed to harvest all the document IDs, then generate one
new word list record for each unique ID, all under a made-up word like
"matchalldocs", and rerun htmerge.  A search for that word will then
match everything.
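
For illustration, the harvesting step could look something like the
following, written in Python instead of sed.  The record layout is an
assumption on my part: the sketch assumes each db.wordlist line is a
word followed by whitespace-separated fields, one of which is
"i:<document ID>", and the "l:0" and "w:1" fields it writes are
placeholder values.  Check a real db.wordlist before trusting any of
it.  The function name match_all_records is made up.

```python
from typing import Iterable, List


def match_all_records(lines: Iterable[str],
                      word: str = "matchalldocs") -> List[str]:
    """Build one word list record per unique document ID.

    Assumed input format (not verified against a real db.wordlist):
        <word> i:<docid> l:<location> w:<weight> ...
    """
    doc_ids = set()
    for line in lines:
        fields = line.split()
        # Skip the word itself (first field); look for the i: field.
        for field in fields[1:]:
            if field.startswith("i:") and field[2:].isdigit():
                doc_ids.add(int(field[2:]))
                break
    # Assumed output layout; l:0 and w:1 are placeholder values.
    return [f"{word}\ti:{doc_id}\tl:0\tw:1" for doc_id in sorted(doc_ids)]
```

The returned lines would be appended to db.wordlist before rerunning
htmerge.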

Another change I thought of for htsearch would be to handle relative
amounts for startday et al.  This would be so you can easily restrict
matches to, for example, documents changed in the last 90 days.  You
probably see where I'm going with this.  It would be nice to have a
"native" implementation of "what's new", rather than having to fight
with unsupported Perl scripts that don't always work.
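
As a sketch of what I mean by relative amounts, assume that a zero or
negative value is taken as "that many days before today".  That
convention, and the window_start() and in_window() helpers below, are
hypothetical and not existing htsearch behaviour.

```python
from datetime import date, timedelta


def window_start(relative_days: int, today: date) -> date:
    """Turn a relative day count (zero or negative) into a start date."""
    if relative_days > 0:
        raise ValueError("relative amounts are expected to be <= 0")
    return today + timedelta(days=relative_days)


def in_window(modified: date, relative_days: int, today: date) -> bool:
    """True if a document modified on `modified` falls in the window."""
    return modified >= window_start(relative_days, today)
```

With today's date, in_window(doc_date, -90, date(2001, 10, 5)) would
keep only documents changed in the last 90 days, which is all a
"what's new" page needs.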

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
