Greetings all,

I have a question about the interpretation of allow_numbers.
If allow_numbers is false, should digits be considered separators?
Looking at the code, the intent seems to have been that "3G", "Y2K"
and "X11" would count as words even if allow_numbers is false,
because they contain at least one letter:

  int alpha = 0;
  for(const unsigned char *p =
        (const unsigned char*)(const char*)(char *)word; *p; p++) {
    if(IsStrictChar(*p) || (allow_numbers && IsDigit(*p))) {
      alpha = 1;
    } else if(IsControl(*p)) {
      return status | WORD_NORMALIZE_CONTROL;
    }
  }

  //
  // Reject if contains no alpha characters
  //
  if(!alpha) return status | WORD_NORMALIZE_NOALPHA;



The current behaviour is to *ignore* allow_numbers and always treat
digits as letters, since WORD_TYPE_DIGIT is included in both
IsChar() and IsStrictChar().
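
To illustrate why the flag has no effect, here is a minimal
stand-alone sketch. The helpers strict_char_current() and
has_alpha_current() are hypothetical stand-ins, not the real ht://Dig
functions; they only assume, as described above, that the
strict-character test already accepts digits.

```cpp
#include <cctype>

// Hypothetical stand-in for IsStrictChar(): assumed to accept letters AND
// digits, mirroring WORD_TYPE_DIGIT being part of the character class.
static bool strict_char_current(unsigned char c) {
    return std::isalpha(c) || std::isdigit(c);
}

// Same test as the loop quoted above, minus the control-character check.
static bool has_alpha_current(const char *word, bool allow_numbers) {
    for (const unsigned char *p = (const unsigned char *)word; *p; p++) {
        // The left operand is already true for digits, so the
        // allow_numbers operand can never change the outcome.
        if (strict_char_current(*p) || (allow_numbers && std::isdigit(*p)))
            return true;
    }
    return false;
}
```

Under these assumptions, has_alpha_current("2003", false) and
has_alpha_current("2003", true) both return true, so a purely numeric
word is never rejected as NOALPHA.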

I propose the following behaviour:

1. If  allow_numbers  is true then digits are treated the same as 
extra_word_characters.
2. If  allow_numbers  is false, then digits are treated as ("invalid") 
punctuation.
3. The default is changed to  allow_numbers=true  (which is 
compatible with the current buggy default behaviour).
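
As a rough sketch of the three rules above (not a patch against the
real word-splitting code): is_word_char() and split_words() are
hypothetical names, and the sketch assumes that "invalid" punctuation
means the digit acts as a separator. It deliberately ignores
extra_word_characters, valid_punctuation and minimum word length.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: is c part of a word under the proposed rules?
static bool is_word_char(unsigned char c, bool allow_numbers) {
    if (std::isdigit(c)) return allow_numbers;  // rule 1 (true) / rule 2 (false)
    return std::isalpha(c) != 0;
}

// Split text into words; every non-word character is a separator.
// The default argument reflects rule 3 (allow_numbers defaults to true).
static std::vector<std::string> split_words(const std::string &text,
                                            bool allow_numbers = true) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char c : text) {
        if (is_word_char(c, allow_numbers)) {
            current += static_cast<char>(c);
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}
```

With this reading, split_words("Y2K X11") yields "Y2K" and "X11",
while split_words("Y2K X11", false) yields "Y", "K" and "X".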

Any objections?

Lachlan

On Sat, 11 Oct 2003 05:56, Neal Richter wrote:

> Everyone:  Please let me know what kind of time you'd be willing to
> put in to get this stuff tested??!!

-- 
[EMAIL PROTECTED]
ht://Dig developer DownUnder  (http://www.htdig.org)

