On Jan 8, 10:43 am, [EMAIL PROTECTED] wrote: > On Jan 8, 4:14 am, [EMAIL PROTECTED] (Tom Lane) wrote: > > > > > [EMAIL PROTECTED] writes: > > > I am using tsearch2 with pgsql 8.2.5 and get the following error when > > > calling to_tsvector : > > > "translation failed from server encoding to wchar_t" > > > My database is UTF8 encoded and the data sent to to_tsvector comes > > > from a bytea column converted to text with > > > encode(COLUMN, 'escape'). > > > Two likely theories: > > > 1. Your database encoding is UTF-8, but your locale (LC_CTYPE) assumes > > some other encoding. > > > 2. The encode() is yielding something that isn't valid UTF-8. > > > PG 8.3 contains checks that should complain about both of these > > scenarios, but IIRC 8.2 does not. > > > regards, tom lane > > > ---------------------------(end of broadcast)--------------------------- > > TIP 1: if posting/reading through Usenet, please send an appropriate > > subscribe-nomail command to [EMAIL PROTECTED] so that your > > message can get through to the mailing list cleanly > > Dear Tom, > > Thanks for your reply. > This is the output of `locale` on my system : > # locale > LANG=en_US.UTF-8 > LC_CTYPE="en_US.UTF-8" > LC_NUMERIC="en_US.UTF-8" > LC_TIME="en_US.UTF-8" > LC_COLLATE="en_US.UTF-8" > LC_MONETARY="en_US.UTF-8" > LC_MESSAGES="en_US.UTF-8" > LC_PAPER="en_US.UTF-8" > LC_NAME="en_US.UTF-8" > LC_ADDRESS="en_US.UTF-8" > LC_TELEPHONE="en_US.UTF-8" > LC_MEASUREMENT="en_US.UTF-8" > LC_IDENTIFICATION="en_US.UTF-8" > LC_ALL= > > As for your second scenario I guess you are right, it's possible > encode does not return all UTF8 characters. > But to_tsvector() succeeds and fails at random with this kind of > characters... > So how can I sanitize output from encode before I pipe it to > to_tsvector() ? > > Regards, > > Ilan
Tom, To get around the non-UTF8 chars I used following function : CREATE OR REPLACE FUNCTION u_messageblk_idxfti() RETURNS "trigger" AS $$ DECLARE BEGIN RAISE NOTICE '[DBMAIL] Trying ID %', NEW.messageblk_idnr; BEGIN NEW.idxFTI := to_tsvector('simple', encode($x$E$x$|| NEW.messageblk, 'escape')); RAISE NOTICE '[DBMAIL] Ended ID %', NEW.messageblk_idnr; RETURN NEW; EXCEPTION WHEN character_not_in_repertoire THEN RAISE WARNING '[DBMAIL] character_not_in_repertoire ID %', NEW.messageblk_idnr; NEW.idxFTI := to_tsvector('simple', 'character_not_in_repertoire: This email contains illegal characters.'); RETURN NEW; END; END; $$ LANGUAGE plpgsql; Hope this helps others with DBmail and tsearch2 on postgres 8.2 Thanks for your help Tom, ilan ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq