Re: [HACKERS] ERROR: translation failed from server encoding to wchar_t
On Jan 8, 4:14 am, [EMAIL PROTECTED] (Tom Lane) wrote: [EMAIL PROTECTED] writes: I am using tsearch2 with pgsql 8.2.5 and get the following error when calling to_tsvector : translation failed from server encoding to wchar_t My database is UTF8 encoded and the data sent to to_tsvector comes from a bytea column converted to text with encode(COLUMN, 'escape'). Two likely theories: 1. Your database encoding is UTF-8, but your locale (LC_CTYPE) assumes some other encoding. 2. The encode() is yielding something that isn't valid UTF-8. PG 8.3 contains checks that should complain about both of these scenarios, but IIRC 8.2 does not. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly Dear Tom, Thanks for your reply. This is the output of `locale` on my system : # locale LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 LC_NUMERIC=en_US.UTF-8 LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8 LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8 LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8 LC_ALL= As for your second scenario I guess you are right, it's possible encode does not return all UTF8 characters. But to_tsvector() succeeds and fails at random with this kind of characters... So how can I sanitize output from encode before I pipe it to to_tsvector() ? Regards, Ilan ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] ERROR: translation failed from server encoding to wchar_t
[EMAIL PROTECTED] wrote: NEW.idxFTI := to_tsvector('simple', encode($x$E$x$|| NEW.messageblk, 'escape')); I strongly doubt that this does what you think it does - I would check the results if I were you. The $x$E$x$ should almost certainly not be there - if you are trying to get E'foo' behaviour, that is purely for literals. All you are doing here is to prepend a literal 'E' to your value. cheers andrew ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] ERROR: translation failed from server encoding to wchar_t
On Jan 8, 10:43 am, [EMAIL PROTECTED] wrote: On Jan 8, 4:14 am, [EMAIL PROTECTED] (Tom Lane) wrote: [EMAIL PROTECTED] writes: I am using tsearch2 with pgsql 8.2.5 and get the following error when calling to_tsvector : translation failed from server encoding to wchar_t My database is UTF8 encoded and the data sent to to_tsvector comes from a bytea column converted to text with encode(COLUMN, 'escape'). Two likely theories: 1. Your database encoding is UTF-8, but your locale (LC_CTYPE) assumes some other encoding. 2. The encode() is yielding something that isn't valid UTF-8. PG 8.3 contains checks that should complain about both of these scenarios, but IIRC 8.2 does not. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly Dear Tom, Thanks for your reply. This is the output of `locale` on my system : # locale LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 LC_NUMERIC=en_US.UTF-8 LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8 LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8 LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8 LC_ALL= As for your second scenario I guess you are right, it's possible encode does not return all UTF8 characters. But to_tsvector() succeeds and fails at random with this kind of characters... So how can I sanitize output from encode before I pipe it to to_tsvector() ? Regards, Ilan Tom, To get around the non-UTF8 chars I used following function : CREATE OR REPLACE FUNCTION u_messageblk_idxfti() RETURNS trigger AS $$ DECLARE BEGIN RAISE NOTICE '[DBMAIL] Trying ID %', NEW.messageblk_idnr; BEGIN NEW.idxFTI := to_tsvector('simple', encode($x$E$x$|| NEW.messageblk, 'escape')); RAISE NOTICE '[DBMAIL] Ended ID %', NEW.messageblk_idnr; RETURN NEW; EXCEPTION WHEN character_not_in_repertoire THEN RAISE WARNING '[DBMAIL] character_not_in_repertoire ID %', NEW.messageblk_idnr; NEW.idxFTI := to_tsvector('simple', 'character_not_in_repertoire: This email contains illegal characters.'); RETURN NEW; END; END; $$ LANGUAGE plpgsql; Hope this helps others with DBmail and tsearch2 on postgres 8.2 Thanks for your help Tom, ilan ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] ERROR: translation failed from server encoding to wchar_t
http://pastebin.ca/845670 This url provides a testcase ... fisrt pass : = UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple', encode(messageblk, 'escape')) where messageblk_idnr = 12949; ERROR: translation failed from server encoding to wchar_t second pass : = UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple', encode(messageblk, 'escape')) where messageblk_idnr = 12949; UPDATE 1 The to_tsvector seems to accept the row at random ... On Jan 7, 9:16 pm, [EMAIL PROTECTED] wrote: I am using tsearch2 with pgsql 8.2.5 and get the following error when calling to_tsvector : translation failed from server encoding to wchar_t My database is UTF8 encoded and the data sent to to_tsvector comes from a bytea column converted to text with encode(COLUMN, 'escape'). In 8.1 with tsearch2 it worked perfectly ... Thanks for you help, ilan ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] ERROR: translation failed from server encoding to wchar_t
Found something interesting with this testcase. update fails after SELECT query. Can anyone confirm this ??? dbname= UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple', encode(messageblk, 'escape')) where messageblk_idnr = 12949; UPDATE 1 dbname= UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple', encode(messageblk, 'escape')) where messageblk_idnr = 12949; UPDATE 1 dbname= select * from dbmail_messageblks where messageblk_idnr = 12949; messageblk_idnr | physmessage_id | messageblk | blocksize | is_header | idxfti -+ +-- +---+--- +- 12949 | 6319 | l'\351quipe de Casinos-park a bien re\347u votre messsage. \012\012Vous aurez une r\351ponse d\350s que l'un de nos responsables aura pris connaissance de votre envoi. \012\012cordialement\012\012l'\351quipe de casinos-park. \012\012\012\012The team of Casinos-park received your messsage.\012\012You will have an answer as soon as one of our persons in charge takes note of your sending. \012\012Best regards \012\012The team of casinos-park.\012 | 398 | 0 | '=':3 'e':5 'h':11 'i':2,10 'k':12 'l':1 'o':7 'p':9,16 'r':8 's':6 't':13 'u':15 'y':14 'rk':4 (1 row) dbname= UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple', encode(messageblk, 'escape')) where messageblk_idnr = 12949; ERROR: translation failed from server encoding to wchar_t dbname= UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple', encode(messageblk, 'escape')) where messageblk_idnr = 12949; UPDATE 1 On Jan 7, 10:21 pm, [EMAIL PROTECTED] wrote: http://pastebin.ca/845670 This url provides a testcase ... fisrt pass : = UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple', encode(messageblk, 'escape')) where messageblk_idnr = 12949; ERROR: translation failed from server encoding to wchar_t second pass : = UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple', encode(messageblk, 'escape')) where messageblk_idnr = 12949; UPDATE 1 The to_tsvector seems to accept the row at random ... On Jan 7, 9:16 pm, [EMAIL PROTECTED] wrote: I am using tsearch2 with pgsql 8.2.5 and get the following error when calling to_tsvector : translation failed from server encoding to wchar_t My database is UTF8 encoded and the data sent to to_tsvector comes from a bytea column converted to text with encode(COLUMN, 'escape'). In 8.1 with tsearch2 it worked perfectly ... Thanks for you help, ilan ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] ERROR: translation failed from server encoding to wchar_t
[EMAIL PROTECTED] writes: I am using tsearch2 with pgsql 8.2.5 and get the following error when calling to_tsvector : translation failed from server encoding to wchar_t My database is UTF8 encoded and the data sent to to_tsvector comes from a bytea column converted to text with encode(COLUMN, 'escape'). Two likely theories: 1. Your database encoding is UTF-8, but your locale (LC_CTYPE) assumes some other encoding. 2. The encode() is yielding something that isn't valid UTF-8. PG 8.3 contains checks that should complain about both of these scenarios, but IIRC 8.2 does not. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly