Hi,
I was having a look at this snippet:
http://wiki.postgresql.org/wiki/Google_Translate
and it turns out that it fails when the result contains non-ASCII
characters. Does anybody know how to fix it?
alvherre=# select gtranslate('en', 'es', 'he');
ERROR: plpython: function "gtranslate" could not create return value
DETAIL: <type 'exceptions.UnicodeEncodeError'>: 'ascii' codec can't encode
character u'\xe9' in position 0: ordinal not in range(128)
By adding a plpy.log() call you can see that the answer is "él":
LOG: (u'\xe9l',)
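The same failure can be reproduced outside the database. A minimal sketch, assuming that PL/Python ends up converting the unicode return value with the default 'ascii' codec (which is what the error message suggests); the variable names are only for illustration:

```python
# The value shown by plpy.log(): "él" as a unicode string.
answer = u'\xe9l'

# Assumption: building the return value boils down to an ASCII encode,
# as the error message suggests.
try:
    answer.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

# error.start is the index of the offending character (the 'é'),
# and error.reason is the codec's explanation.
if error is not None:
    print("%s: %s" % (error.encoding, error.reason))
```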
I guess it needs some treatment similar to the one in this function:
http://wiki.postgresql.org/wiki/Strip_accents_from_strings
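Something along these lines might be enough: encode the unicode result explicitly before returning it, so that no implicit ASCII conversion is needed. This is only a sketch. encode_result is a hypothetical helper, not part of the wiki snippet, and it assumes the database encoding is UTF-8 (otherwise the matching Python codec name would have to be passed in).

```python
def encode_result(value, encoding="utf-8"):
    """Return value as a byte string in the given encoding.

    Hypothetical helper; the default encoding is an assumption about
    the database encoding.
    """
    if isinstance(value, bytes):
        # Already a byte string (a plain str on Python 2): pass it through.
        return value
    # A unicode string: encode it explicitly instead of relying on the
    # implicit 'ascii' conversion.
    return value.encode(encoding)


# "él", as returned by the translation service in the example above.
translated = u"\xe9l"
encoded = encode_result(translated)
```

Inside gtranslate, the return line would then read return encode_result(resp['responseData']['translatedText']). This targets Python 2, which is what plpythonu uses here, and has not been checked against a live server.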
For completeness, here is the code:
CREATE OR REPLACE FUNCTION gtranslate(src text, target text, phrase text)
RETURNS text
LANGUAGE plpythonu
AS $$
import re
import urllib
import simplejson as json

class UrlOpener(urllib.FancyURLopener):
    version = "py-gtranslate/1.0"

base_uri = "http://ajax.googleapis.com/ajax/services/language/translate"
default_params = {'v': '1.0'}

def translate(src, to, phrase):
    args = default_params.copy()
    args.update({
        'langpair': '%s%%7C%s' % (src, to),
        'q': urllib.quote_plus(phrase),
    })
    argstring = '%s' % ('&'.join(['%s=%s' % (k,v) for (k,v) in
                                  args.iteritems()]))
    resp = json.load(UrlOpener().open('%s?%s' % (base_uri, argstring)))
    try:
        return resp['responseData']['translatedText']
    except:
        # should probably warn about failed translation
        return phrase

return translate(src, target, phrase)
$$;
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
--
Sent via pgsql-hackers mailing list