Re: [PATCHES] Bunch of tsearch fixes and cleanup

Heikki Linnakangas Thu, 23 Aug 2007 07:59:11 -0700

Tom Lane wrote:
> "Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
>> - readstopwords calls recode_and_lowerstr directly, instead of using the
>> "wordop" function pointer in StopList struct. All callers used
>> recode_and_lowerstr anyway, so this simplifies the code a little bit. Are
>> there any external dictionary implementations that would require
>> different behavior?
> 
> I don't think eliminating wordop altogether is such a hot idea; some
> dictionary could possibly want to do different processing than that.


Ok.

> Something that was annoying me yesterday was that it was not clear
> whether we had fixed every single place that uses a tsearch config file
> to assume that the file is in UTF8 and should be converted to database
> encoding.

I'm afraid there are still a lot of inconsistencies in that. I'm just
looking at dict_synonym, and it looks like it has the same problem I
patched in readstopwords: it's using pg_verifymbstr, with database
encoding, to verify the input file. It also seems to be calling
pg_mblen, which depends on database encoding, against UTF-8 encoded
strings. I'll look at those more closely.
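
To illustrate why that mismatch matters, here is a minimal standalone
sketch, not PostgreSQL source. The functions utf8_mblen, singlebyte_mblen
and count_chars are hypothetical stand-ins I made up for this example;
the real pg_mblen picks its behavior from the database encoding. The
sketch only assumes that a single-byte encoding reports a length of 1 for
every byte.

```c
#include <assert.h>
#include <stddef.h>

/* Signature shared by the two hypothetical length functions below. */
typedef int (*mblen_fn) (const unsigned char *);

/*
 * Hypothetical UTF-8 aware length function: derives the byte length of
 * the character starting at s from its lead byte.
 */
static int
utf8_mblen(const unsigned char *s)
{
    assert(s != NULL);
    if ((*s & 0x80) == 0x00)
        return 1;               /* ASCII */
    if ((*s & 0xE0) == 0xC0)
        return 2;               /* 2-byte sequence */
    if ((*s & 0xF0) == 0xE0)
        return 3;               /* 3-byte sequence */
    if ((*s & 0xF8) == 0xF0)
        return 4;               /* 4-byte sequence */
    return 1;                   /* invalid lead byte: step one byte */
}

/*
 * Hypothetical stand-in for what a single-byte database encoding would
 * report: every byte is one character.
 */
static int
singlebyte_mblen(const unsigned char *s)
{
    assert(s != NULL);
    return 1;
}

/*
 * Count the characters in a NUL-terminated string by stepping through
 * it with the given length function.
 */
static size_t
count_chars(const unsigned char *s, mblen_fn mblen)
{
    size_t      n = 0;

    while (*s != '\0')
    {
        int         len = mblen(s);
        int         i;

        /* Never step past the terminator on a truncated sequence. */
        for (i = 0; i < len && s[i] != '\0'; i++)
            ;
        s += i;
        n++;
    }
    return n;
}
```

For the UTF-8 bytes of "café" ("caf\xc3\xa9"), count_chars with
utf8_mblen reports 4 characters, while singlebyte_mblen reports 5 because
it splits the two-byte "é" in half. Any parsing that steps through a
UTF-8 file with the wrong length function can land in the middle of a
character the same way.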

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
