Just for clarification: are you going to make these changes during the
8.3 beta test period?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
> If I am reading the state machine in wparser_def.c correctly, the
> three classifications of words that the default parser knows are
>
>       lword   Composed entirely of ASCII letters
>       nlword  Composed entirely of non-ASCII letters
>               (where "letter" is defined by iswalpha())
>       word    Entirely alphanumeric (per iswalnum()), but not above
>               cases
>
> This classification is probably sane enough for dealing with mixed
> Russian/English text --- IIUC, Russian words will come entirely from
> the Cyrillic alphabet which has no overlap with ASCII letters.  But
> I'm thinking it'll be quite inconvenient for other European languages
> whose alphabets include the base ASCII letters plus other stuff such
> as accented letters.  They will have a lot of words that fall into
> the catchall "word" category, which will mean they have to index
> mixed alpha-and-number words in order to catch all native words.
>
> ISTM that perhaps a more generally useful definition would be
>
>       lword   Only ASCII letters
>       nlword  Entirely letters per iswalpha(), but not lword
>       word    Entirely alphanumeric per iswalnum(), but not nlword
>               (hence, includes at least one digit)
>
> However, I am no linguist and maybe I'm missing something.
>
> Comments?
>
>                       regards, tom lane
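
For reference, the proposed definition quoted above could be sketched
roughly as follows. This is only an illustration of the three rules,
not the actual state machine in wparser_def.c: the names TokClass and
classify_token are made up here, and the sketch assumes the whole
token is already available as a wide-character string.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical names, chosen for this sketch only. */
typedef enum
{
    TOK_LWORD,   /* only ASCII letters */
    TOK_NLWORD,  /* entirely letters per iswalpha(), but not lword */
    TOK_WORD,    /* entirely alphanumeric per iswalnum(), but not nlword */
    TOK_OTHER    /* empty, or contains a non-alphanumeric character */
} TokClass;

static TokClass
classify_token(const wchar_t *s)
{
    int all_ascii_alpha = 1;
    int all_alpha = 1;
    int all_alnum = 1;

    if (*s == L'\0')
        return TOK_OTHER;

    for (; *s != L'\0'; s++)
    {
        wint_t c = (wint_t) *s;
        int ascii_letter = (c >= L'a' && c <= L'z') ||
                           (c >= L'A' && c <= L'Z');

        if (!ascii_letter)
            all_ascii_alpha = 0;
        /* iswalpha()/iswalnum() results depend on the current locale */
        if (!ascii_letter && !iswalpha(c))
            all_alpha = 0;
        if (!ascii_letter && !iswalnum(c))
            all_alnum = 0;
    }

    /* Test the narrowest class first, so each class excludes the ones above. */
    if (all_ascii_alpha)
        return TOK_LWORD;
    if (all_alpha)
        return TOK_NLWORD;
    if (all_alnum)
        return TOK_WORD;
    return TOK_OTHER;
}
```

Under these rules a word mixing ASCII and accented letters would come
out as nlword rather than word, provided the locale in effect makes
iswalpha() accept the accented letters, and only tokens containing at
least one digit would remain in the word class.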