Re: [HACKERS] ERROR: translation failed from server encoding to wchar_t

2008-01-08 Thread ilanco
On Jan 8, 4:14 am, [EMAIL PROTECTED] (Tom Lane) wrote:
 [EMAIL PROTECTED] writes:
  I am using tsearch2 with pgsql 8.2.5 and get the following error when
  calling to_tsvector :
  translation failed from server encoding to wchar_t
  My database is UTF8 encoded and the data sent to to_tsvector comes
  from a bytea column converted to text with
  encode(COLUMN, 'escape').

 Two likely theories:

 1. Your database encoding is UTF-8, but your locale (LC_CTYPE) assumes
 some other encoding.

 2. The encode() is yielding something that isn't valid UTF-8.

 PG 8.3 contains checks that should complain about both of these
 scenarios, but IIRC 8.2 does not.

 regards, tom lane

 ---(end of broadcast)---
 TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Dear Tom,

Thanks for your reply.
This is the output of `locale` on my system :
# locale
LANG=en_US.UTF-8
LC_CTYPE=en_US.UTF-8
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE=en_US.UTF-8
LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=

As for your second scenario I guess you are right, it's possible
encode does not return all UTF8 characters.
But to_tsvector() succeeds and fails at random with this kind of
characters...
So how can I sanitize output from encode before I pipe it to
to_tsvector() ?

Regards,

Ilan



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] ERROR: translation failed from server encoding to wchar_t

2008-01-08 Thread Andrew Dunstan



[EMAIL PROTECTED] wrote:

NEW.idxFTI := to_tsvector('simple', encode($x$E$x$||
NEW.messageblk, 'escape'));

  


I strongly doubt that this does what you think it does - I would check 
the results if I were you. The $x$E$x$ should almost certainly not be 
there - if you are trying to get E'foo' behaviour, that is purely for 
literals. All you are doing here is to prepend a literal 'E' to your value.


cheers

andrew

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] ERROR: translation failed from server encoding to wchar_t

2008-01-08 Thread ilanco
On Jan 8, 10:43 am, [EMAIL PROTECTED] wrote:
 On Jan 8, 4:14 am, [EMAIL PROTECTED] (Tom Lane) wrote:



  [EMAIL PROTECTED] writes:
   I am using tsearch2 with pgsql 8.2.5 and get the following error when
   calling to_tsvector :
   translation failed from server encoding to wchar_t
   My database is UTF8 encoded and the data sent to to_tsvector comes
   from a bytea column converted to text with
   encode(COLUMN, 'escape').

  Two likely theories:

  1. Your database encoding is UTF-8, but your locale (LC_CTYPE) assumes
  some other encoding.

  2. The encode() is yielding something that isn't valid UTF-8.

  PG 8.3 contains checks that should complain about both of these
  scenarios, but IIRC 8.2 does not.

  regards, tom lane

  ---(end of broadcast)---
  TIP 1: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly

 Dear Tom,

 Thanks for your reply.
 This is the output of `locale` on my system :
 # locale
 LANG=en_US.UTF-8
 LC_CTYPE=en_US.UTF-8
 LC_NUMERIC=en_US.UTF-8
 LC_TIME=en_US.UTF-8
 LC_COLLATE=en_US.UTF-8
 LC_MONETARY=en_US.UTF-8
 LC_MESSAGES=en_US.UTF-8
 LC_PAPER=en_US.UTF-8
 LC_NAME=en_US.UTF-8
 LC_ADDRESS=en_US.UTF-8
 LC_TELEPHONE=en_US.UTF-8
 LC_MEASUREMENT=en_US.UTF-8
 LC_IDENTIFICATION=en_US.UTF-8
 LC_ALL=

 As for your second scenario I guess you are right, it's possible
 encode does not return all UTF8 characters.
 But to_tsvector() succeeds and fails at random with this kind of
 characters...
 So how can I sanitize output from encode before I pipe it to
 to_tsvector() ?

 Regards,

 Ilan

Tom,

To get around the non-UTF8 chars I used following function :

CREATE OR REPLACE FUNCTION u_messageblk_idxfti() RETURNS trigger
AS $$
DECLARE
BEGIN
  RAISE NOTICE '[DBMAIL] Trying ID %', NEW.messageblk_idnr;
  BEGIN
NEW.idxFTI := to_tsvector('simple', encode($x$E$x$||
NEW.messageblk, 'escape'));
RAISE NOTICE '[DBMAIL] Ended ID %', NEW.messageblk_idnr;
RETURN NEW;
  EXCEPTION
WHEN character_not_in_repertoire THEN
  RAISE WARNING '[DBMAIL] character_not_in_repertoire ID %',
NEW.messageblk_idnr;
  NEW.idxFTI := to_tsvector('simple',
'character_not_in_repertoire: This email contains illegal
characters.');
RETURN NEW;
  END;
END;
$$
LANGUAGE plpgsql;

Hope this helps others with DBmail and tsearch2 on postgres 8.2

Thanks for your help Tom,

ilan

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] ERROR: translation failed from server encoding to wchar_t

2008-01-07 Thread ilanco
http://pastebin.ca/845670

This url provides a testcase ...

fisrt pass :
= UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
ERROR:  translation failed from server encoding to wchar_t

second pass :
= UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
UPDATE 1


The to_tsvector seems to accept the row at random ...


On Jan 7, 9:16 pm, [EMAIL PROTECTED] wrote:
 I am using tsearch2 with pgsql 8.2.5 and get the following error when
 calling to_tsvector :
 translation failed from server encoding to wchar_t

 My database is UTF8 encoded and the data sent to to_tsvector comes
 from a bytea column converted to text with
 encode(COLUMN, 'escape').

 In 8.1 with tsearch2 it worked perfectly ...

 Thanks for you help,

 ilan


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] ERROR: translation failed from server encoding to wchar_t

2008-01-07 Thread ilanco
Found something interesting with this testcase.
update fails after SELECT query.
Can anyone confirm this ???


dbname= UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
UPDATE 1
dbname= UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
UPDATE 1
dbname= select * from dbmail_messageblks where messageblk_idnr =
12949;
 messageblk_idnr | physmessage_id
|
messageblk
| blocksize | is_header |
idxfti
-+
+--
+---+---
+-
   12949 |   6319 | l'\351quipe de Casinos-park a bien
re\347u votre messsage. \012\012Vous aurez une r\351ponse d\350s que
l'un de nos responsables aura pris connaissance de votre envoi.
\012\012cordialement\012\012l'\351quipe de casinos-park.
\012\012\012\012The team of Casinos-park received
your messsage.\012\012You will have an answer as soon as one of our
persons in charge takes note of your sending. \012\012Best regards
\012\012The team of casinos-park.\012 |   398 | 0 | '=':3
'e':5 'h':11 'i':2,10 'k':12 'l':1 'o':7 'p':9,16 'r':8 's':6 't':13
'u':15 'y':14 'rk':4
(1 row)

dbname= UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
ERROR:  translation failed from server encoding to wchar_t
dbname= UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
UPDATE 1


On Jan 7, 10:21 pm, [EMAIL PROTECTED] wrote:
 http://pastebin.ca/845670

 This url provides a testcase ...

 fisrt pass :
 = UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
 encode(messageblk, 'escape')) where messageblk_idnr = 12949;
 ERROR:  translation failed from server encoding to wchar_t

 second pass :
 = UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
 encode(messageblk, 'escape')) where messageblk_idnr = 12949;
 UPDATE 1

 The to_tsvector seems to accept the row at random ...

 On Jan 7, 9:16 pm, [EMAIL PROTECTED] wrote:

  I am using tsearch2 with pgsql 8.2.5 and get the following error when
  calling to_tsvector :
  translation failed from server encoding to wchar_t

  My database is UTF8 encoded and the data sent to to_tsvector comes
  from a bytea column converted to text with
  encode(COLUMN, 'escape').

  In 8.1 with tsearch2 it worked perfectly ...

  Thanks for you help,

  ilan


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] ERROR: translation failed from server encoding to wchar_t

2008-01-07 Thread Tom Lane
[EMAIL PROTECTED] writes:
 I am using tsearch2 with pgsql 8.2.5 and get the following error when
 calling to_tsvector :
 translation failed from server encoding to wchar_t

 My database is UTF8 encoded and the data sent to to_tsvector comes
 from a bytea column converted to text with
 encode(COLUMN, 'escape').

Two likely theories:

1. Your database encoding is UTF-8, but your locale (LC_CTYPE) assumes
some other encoding.

2. The encode() is yielding something that isn't valid UTF-8.

PG 8.3 contains checks that should complain about both of these
scenarios, but IIRC 8.2 does not.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly