According to Hans-Peter Nilsson:
> I plan to add a new attribute: extra_word_characters.
> It is the opposite (or something) to valid_punctuation, it marks a
> (possibly) non-alphanumeric as a valid word-character.
It's like valid_punctuation, in that it's taken as part of the word, but
unlike valid_punctuation in that it's not stripped out before the word
is put in the database, if I understand you correctly.
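To check that we mean the same thing, here is a rough sketch in Python of
how I read the difference between the two attributes. This is not the real
HTML.cc code: split_words and its parameters are names made up for
illustration, and the stripping behaviour is my assumption from the
description above.

```python
def split_words(text, valid_punctuation="", extra_word_characters=""):
    """Split text into the words that would be stored in the database.

    Assumed semantics (a reading of this thread, not ht://Dig's code):
      - alphanumerics are always word characters;
      - extra_word_characters are word characters and are kept;
      - valid_punctuation does not end a word, but is stripped out.
    """
    words = []
    current = []
    for ch in text:
        if ch.isalnum() or ch in extra_word_characters:
            current.append(ch)      # kept as part of the stored word
        elif ch in valid_punctuation:
            continue                # does not break the word, but is dropped
        else:
            if current:             # any other character ends the word
                words.append("".join(current))
                current = []
    if current:
        words.append("".join(current))
    return words


if __name__ == "__main__":
    text = "foo_bar baz"
    print(split_words(text))                             # "_" is a separator
    print(split_words(text, valid_punctuation="_"))      # "_" is stripped
    print(split_words(text, extra_word_characters="_"))  # "_" is kept
```

With "_" in neither list the sketch splits foo_bar into two words; with
valid_punctuation it stores foobar; with extra_word_characters it stores
foo_bar, which is what makes the underscore searchable.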
> This way (and no other I know of), I can make "_" characters part of
> words, and searchable as such.
>
> A (hopefully) positive side-effect is that people having problems making
> their systems understand their locale (i.e. it is broken in that it
> handles everything as the "C" locale) can state characters here that the
> locale would normally handle.
>
> Examples:
> extra_word_characters: _
> extra_word_characters: "[non-ASCII letters, garbled in the archive]"
>
> (If you didn't get the last one, don't worry.)
> Specifying characters handled by the locale as isalpha would be a no-op.
>
> Comments welcome.
Sounds like a good idea to me. I'm planning a round of changes to
HTML.cc next week, especially dealing with space handling, but also with
word handling, so it would be a good idea if we try to avoid stepping
on each other's toes. If you get your changes in by Monday or Tuesday,
then I can follow with mine. I want to get my changes into 3.1.2,
which will eventually get merged into 3.2. My concern is that if I change
the same part of the code in 3.1.2 that you change in 3.2, the CVS merge
may not put it all together right.
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930