> Alvaro Herrera wrote:
> > Tom Lane wrote:
> > 
> >> ISTM that perhaps a more generally useful definition would be
> >>
> >> lword              Only ASCII letters
> >> nlword             Entirely letters per iswalpha(), but not lword
> >> word               Entirely alphanumeric per iswalnum(), but not nlword
> >>            (hence, includes at least one digit)
> > ...
> > I am not sure if there are any western european languages where words can
> > only be formed with non-ascii chars. 
> 
> There are at least two in Swedish: "ö" (island) and "å" (river). They're both a
> bit special because they're just one letter each.
> 
> > lword               Entirely letters per iswalpha, with at least one ASCII
> > nlword              Entirely letters per iswalpha
> > word                Entirely alphanumeric per iswalnum, but not nlword
> 
> I don't like this categorization much more than the original. The
> distinction between lword and nlword is useless for most European
> languages.
> 
> I suppose that Tom's argument that it's useful to distinguish words made
> of purely ASCII characters in computer-oriented stuff is valid, though I
> can't immediately think of a use case. For things like parsing a
> programming language, that's not really enough, so you'd probably end up
> writing your own parser anyway. I'm also not clear what the use case for
> the distinction between words with digits or not is. I don't think
> there are any natural languages where a word can contain digits, so it
> must be a computer-oriented thing as well.
> 
> I like the "aword" name more than "lword", BTW. If we change the meaning
> of the classes, surely we can change the name as well, right?
> 
> Note that the default parser is useless for languages like Japanese,
> where words are not separated by whitespace, anyway.

The above is true, but that does not necessarily mean that Tsearch is not
used for Japanese at all. I work around the problem by running a
pre-processing step that separates Japanese sentences into words divided by
white space. I wish I could write a new parser that could do the
job for 8.4 or later...
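
To illustrate what such a pre-processing step looks like, here is a minimal
sketch in Python. It is only an assumption for illustration: the function name
segment(), the tiny dictionary, and the greedy longest-match strategy are all
hypothetical, and a real setup would rely on a proper morphological analyzer
rather than a hand-made word list.

```python
def segment(text, dictionary):
    """Insert spaces between words using greedy longest-match lookup.

    `dictionary` is a hypothetical set of known words. Characters that
    match no dictionary entry are emitted as one-character tokens.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            # Existing white space already separates tokens; skip it.
            i += 1
            continue
        # Try the longest candidate first, then shrink down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return " ".join(words)


if __name__ == "__main__":
    toy_dictionary = {"東京", "東京都", "に", "住む"}
    print(segment("東京都に住む", toy_dictionary))
```

The output is then fed to the default parser, which sees ordinary
white-space-separated tokens.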

Please change the word definition very carefully.
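
To make the difference between the two quoted definitions concrete, here is a
rough sketch in Python. Note the assumptions: the function names are
hypothetical, and Python's str.isalpha() and str.isalnum() follow Unicode
character properties, so they only approximate the locale-dependent
iswalpha() and iswalnum() mentioned in the proposals.

```python
def classify_first(token):
    """First quoted proposal: lword means ASCII letters only."""
    if token.isalpha():
        return "lword" if token.isascii() else "nlword"
    if token.isalnum():
        # All alphanumeric but not all letters, so it has at least one digit.
        return "word"
    return None  # not a word-like token under this scheme


def classify_alt(token):
    """Second quoted proposal: lword means all letters, at least one ASCII."""
    if token.isalpha():
        has_ascii = any(ch.isascii() for ch in token)
        return "lword" if has_ascii else "nlword"
    if token.isalnum():
        return "word"
    return None


if __name__ == "__main__":
    for tok in ["hello", "año", "ö", "abc123"]:
        print(tok, classify_first(tok), classify_alt(tok))
```

A token such as "año" is an nlword under the first definition but an lword
under the second, while "ö" is an nlword under both.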
--
Tatsuo Ishii
SRA OSS, Inc. Japan
