According to Geoff Hutchison:
> At 10:11 AM -0500 10/3/01, Gilles Detillieux wrote:
> >5. handle noindex_start & noindex_end as string lists
>
> I'd say this is a reasonable possibility.

Yeah, that's actually why I ranked it higher than the other 3 changes.
I thought of another documentation change that's needed: to document
htload and htdump for 3.1.6.

> >6. a "match all documents" mechanism in htsearch
> >7. add extra_word_casemap to map cases for letters not recognised in locale
> >8. write accent_map support for accent fuzzy algorithm, to handle
> >   non-ISO-Latin-1 accents
>
> I'd say these are probably beyond the scope. I'm sure I could write
> something for #6, but it'd be a horrible hack. (i.e. it would check
> for a word of '*' or somesuch and then the parser wouldn't bother
> looking up the words, but just returned all documents.)

Yes, that's what I had in mind. I don't see why that would be a horrible
hack. I think of it as an extension/optimisation of the whole
prefix_match_character thing. When that character appears on its own,
with no prefix before it, it means match everything (and do it as
efficiently as possible, i.e. no need to actually search the word
database and list all words). I just don't know exactly how you'd do it.

> >3.2 releases, and item 7 will have to be after the next mifluz merge,
> >because the WordType code will change.
>
> I'd say that #7 and #8 should probably wait a bit. I think there's
> actually some movement towards UTF-8 in mifluz, so it would be
> easiest to deal with locale and accent mapping this way (esp. since
> some of the Unicode-aware libraries will do charset translation).
>
> That would be my $0.02. But I'm also really busy and want to get
> Quim's new htsearch parser in place in 3.2 so that 3.2.0b4 doesn't
> grow moldy.

Yeah, I see your point. I'll definitely hold off on 7 & 8. If neither
you nor I can get 6 in the code in reasonable time, we'll just have to
do without.

I thought of another way of doing this in 3.1.6, without any further
code changes, but this is a horrible hack.
Go through db.wordlist with sed to harvest all the document IDs, then
generate a new word list record for each unique ID, for a made-up word
like "matchalldocs", then rerun htmerge. A search for this word will
match everything.

Another change I thought of for htsearch would be to handle relative
amounts for startday et al. This would let you easily restrict matches
to, for example, documents changed in the last 90 days. You probably
see where I'm going with this. It would be nice to have a "native"
implementation of "what's new", rather than having to fight with
unsupported Perl scripts that don't always work.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
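The "match all documents" idea discussed above treats a lone prefix_match_character as a special case. It is not treated as a prefix of every word. The sketch below illustrates that idea only and is not htsearch's actual parser.

Several names are hypothetical stand-ins:
- the in-memory `word_index` dict stands in for the word database;
- the `all_doc_ids` set stands in for the document database;
- the function name `search_word` is invented for illustration;
- `"*"` is assumed as the prefix_match_character value.

```python
# Hypothetical sketch: a lone prefix_match_character matches every document.
PREFIX_MATCH_CHARACTER = "*"  # assumed value of the prefix_match_character attribute


def search_word(word, word_index, all_doc_ids):
    """Return the set of document IDs matching one query word.

    word_index:  hypothetical dict mapping each indexed word to a set of doc IDs.
    all_doc_ids: hypothetical set of every document ID in the document database.
    """
    if word == PREFIX_MATCH_CHARACTER:
        # No prefix before the character: skip the word index entirely
        # and return every known document.
        return set(all_doc_ids)
    if word.endswith(PREFIX_MATCH_CHARACTER):
        # Ordinary prefix match: union the postings of every word with this prefix.
        prefix = word[:-len(PREFIX_MATCH_CHARACTER)]
        matches = set()
        for indexed_word, doc_ids in word_index.items():
            if indexed_word.startswith(prefix):
                matches |= doc_ids
        return matches
    # Plain word: exact lookup.
    return set(word_index.get(word, set()))
```

The special case has two advantages over letting a bare `"*"` fall through as an empty prefix:
- It never walks the word index.
- It also returns documents that have no indexed words at all. A union over every word's postings would miss them.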
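The 3.1.6 workaround described above harvests the document IDs from db.wordlist and appends one "matchalldocs" record per unique ID. It is sketched here in Python rather than sed. Treat it as a sketch, not a tested procedure.

The key assumption is the record layout, which should be checked against what your htdig version actually writes. The sketch assumes each record is one line of tab-separated fields: the word first, then an `i:<document ID>` field, then location and weight fields. The filler values `l:0` and `w:1` are likewise assumptions, as is the function name `match_all_records`.

```python
# Hypothetical sketch of the "matchalldocs" hack, in Python rather than sed.
# ASSUMPTION: each db.wordlist record is one line of tab-separated fields,
# starting with the word and containing an "i:<document ID>" field, e.g.
#   spinal<TAB>i:17<TAB>l:42<TAB>w:1
import re

MATCH_ALL_WORD = "matchalldocs"          # made-up word, as in the message
DOC_ID_FIELD = re.compile(r"\ti:(\d+)")  # tab followed by "i:" and digits


def match_all_records(wordlist_lines):
    """Return one new record per unique document ID found in wordlist_lines."""
    doc_ids = set()
    for line in wordlist_lines:
        found = DOC_ID_FIELD.search(line)
        if found:
            doc_ids.add(int(found.group(1)))
    # ASSUMPTION: location 0 and weight 1 are acceptable filler values.
    return [f"{MATCH_ALL_WORD}\ti:{doc_id}\tl:0\tw:1" for doc_id in sorted(doc_ids)]


def add_match_all_to_file(src_path, dst_path):
    """Copy a wordlist file to dst_path with the match-all records appended."""
    # latin-1 decodes any byte sequence, so 8-bit words pass through unchanged.
    with open(src_path, encoding="latin-1") as src:
        lines = src.read().splitlines()
    with open(dst_path, "w", encoding="latin-1") as dst:
        for record in lines + match_all_records(lines):
            dst.write(record + "\n")
```

To use it:
1. Back up db.wordlist.
2. Run `add_match_all_to_file("db.wordlist", "db.wordlist.new")`.
3. Move the new file into place.
4. Rerun htmerge, as the message describes.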
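A relative startday, as proposed above for a "what's new" search, can be resolved into an absolute date before documents are filtered. The sign convention in this sketch is an assumption for illustration, not documented htsearch behaviour: a negative startday means "this many days before today". The function names `resolve_start_date` and `changed_since` are hypothetical.

```python
# Hypothetical sketch of relative date limits for htsearch's startday et al.
# ASSUMPTION: a negative startday means "this many days before today";
# a positive startday is used with startmonth/startyear as an absolute date.
from datetime import date, timedelta


def resolve_start_date(startday, startmonth, startyear, today):
    """Return the absolute start date for a search restriction."""
    if startday < 0:
        # Relative form: count back from today, ignoring month and year.
        return today - timedelta(days=-startday)
    # Absolute form: date() raises ValueError for out-of-range values.
    return date(startyear, startmonth, startday)


def changed_since(doc_modified, start):
    """True if a document's modification date falls on or after start."""
    return doc_modified >= start
```

For example, `resolve_start_date(-90, 0, 0, today=date(2001, 10, 3))` gives `date(2001, 7, 5)`. This is the "last 90 days" cut-off from the message. Relative startmonth or startyear values are not handled here.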
