Greetings all,
I have a question about the interpretation of allow_numbers.
If allow_numbers is false, should digits be considered separators?
Looking at the code, it seems someone wanted to say that "3G", "Y2K"
and "X11" would be words, even if allow_numbers is false, because
they contain at least one letter:
    int alpha = 0;
    for(const unsigned char *p =
            (const unsigned char*)(const char*)(char *)word; *p; p++) {
        if(IsStrictChar(*p) || (allow_numbers && IsDigit(*p))) {
            alpha = 1;
        } else if(IsControl(*p)) {
            return status | WORD_NORMALIZE_CONTROL;
        }
    }

    //
    // Reject if contains no alpha characters
    //
    if(!alpha) return status | WORD_NORMALIZE_NOALPHA;
In practice, though, allow_numbers is *ignored*: digits are always
treated as letters, because WORD_TYPE_DIGIT is included in both
IsChar() and IsStrictChar().
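To make the point concrete, here is a stripped-down sketch of that test
for a single character. The helpers is_strict_char(), is_digit() and
counts_as_alpha() are stand-ins made up for illustration, not the real
WordType methods; the only thing carried over is the assumption that the
strict-character class already includes digits.

```cpp
#include <cctype>

// Hypothetical stand-in for IsStrictChar(): assumes, as described above,
// that the strict character class already includes digits.
static bool is_strict_char(unsigned char c) {
    return std::isalpha(c) || std::isdigit(c);
}

// Hypothetical stand-in for IsDigit().
static bool is_digit(unsigned char c) {
    return std::isdigit(c) != 0;
}

// Same condition as in the loop above, applied to one character.
static bool counts_as_alpha(unsigned char c, bool allow_numbers) {
    return is_strict_char(c) || (allow_numbers && is_digit(c));
}
```

With these stand-ins, counts_as_alpha('7', false) and
counts_as_alpha('7', true) both come out true, so an all-digit word
would pass the no-alpha check whichever way allow_numbers is set.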
I propose the following behaviour:
1. If allow_numbers is true then digits are treated the same as
extra_word_characters.
2. If allow_numbers is false, then digits are treated as ("invalid")
punctuation.
3. The default is changed to allow_numbers=true (which is
compatible with the current buggy default behaviour).
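Roughly, points 1 and 2 could be sketched like this. This is only an
illustration of the rule, not a patch: CharClass, classify() and the
extra_word_chars parameter are hypothetical names, and the sketch lumps
all punctuation into one class rather than distinguishing valid from
invalid punctuation.

```cpp
#include <cctype>
#include <string>

// Hypothetical character classes, for illustration only.
enum class CharClass { Letter, ExtraWordChar, Punctuation };

// Sketch of the proposed rule: how a digit is classed depends on
// allow_numbers. extra_word_chars stands in for the
// extra_word_characters setting.
static CharClass classify(unsigned char c, bool allow_numbers,
                          const std::string &extra_word_chars) {
    if (std::isalpha(c)) return CharClass::Letter;
    if (std::isdigit(c))
        return allow_numbers ? CharClass::ExtraWordChar
                             : CharClass::Punctuation;
    if (extra_word_chars.find(static_cast<char>(c)) != std::string::npos)
        return CharClass::ExtraWordChar;
    return CharClass::Punctuation;
}
```

Under this sketch, '2' is classed as an extra word character when
allow_numbers is true and as punctuation when it is false, while
letters are unaffected by the setting.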
Any objections?
Lachlan
On Sat, 11 Oct 2003 05:56, Neal Richter wrote:
> Everyone: Please let me know what kind of time you'd be willing to
> put in to get this stuff tested??!!
--
[EMAIL PROTECTED]
ht://Dig developer DownUnder (http://www.htdig.org)