On 2009-01-30, Bart Degryse <bart.degr...@indicator.be> wrote: > > --=__Part8EA648F8.0__= > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: quoted-printable > > Hi, > I have a text field with data like this: 'de patiënt niet' > (without the quotes). > I would like to convert this string to look like this: 'de patiënt > niet' > Basically what I need to do (I think) is > - get rid of the &, # and ; > - convert the number to hex > - make a UTF8 from that (thus: \xEB)
that is not UTF8. the UTF8 representation for 'ë' in SQL is e'\xC3\xAB' or chr(235) your input appears to be encoded in LATIN-1 (or possibly 8859-13) I think you'll need to write a function. here, where I have database encoding UTF8 this appears to work as you desire. CREATE or replace FUNCTION htmlent(inp text) returns text as $f$ DECLARE str text; BEGIN str= regexp_replace(quote_literal( inp) ,$$&#(\d+);$$ ,$$'||chr(\1)||'$$ ,'g'); execute 'select '||str into str; return str; END $f$ LANGUAGE PLPGSQL; select htmlent('de patiënt niet'); probably the above should be expanded to handle named entities like '&' too. characters outside of the LATIN-1 space are not handled but aparently this is what you want. -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql