Tom Lane <t...@sss.pgh.pa.us> writes:
> Thomas Munro <thomas.mu...@gmail.com> writes:
>> Erm, it looks like something weird is happening somewhere in cfbot's
>> pipeline, because Dag's patch says:
>
>> +SELECT daitch_mokotoff('Straßburg');
>> + daitch_mokotoff
>> +-----------------
>> + 294795
>> +(1 row)
>
> ... so, that test case is guaranteed to fail in non-UTF8 encodings,
> I suppose?  I wonder what the LANG environment is in that cfbot
> instance.
>
> (We do have methods for dealing with non-ASCII test cases, but
> I can't see that this patch is using any of them.)
>
> 			regards, tom lane
I naively assumed that tests would be run in a UTF8 environment.

Running "ack -l '[\x80-\xff]'" in the contrib/ directory reveals that two
other modules use UTF8 characters in their tests - citext and unaccent.

The citext tests seem to be commented out ("Multibyte sanity tests.
Uncomment to run.").

Looking into the unaccent module, I don't quite understand how it works
with various encodings, since it doesn't seem to decode its input - will
it fail if run under anything but ASCII or UTF8?

In any case, I see that unaccent.sql starts as follows:

CREATE EXTENSION unaccent;

-- must have a UTF8 database
SELECT getdatabaseencoding();

SET client_encoding TO 'UTF8';

Would doing the same thing in fuzzystrmatch.sql fix the problem with the
failing tests? Should I prepare a new patch?

Best regards,

Dag Lem
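For concreteness, a sketch of what the corresponding guard at the top of
fuzzystrmatch.sql might look like, mirroring the unaccent.sql approach
quoted above (the daitch_mokotoff test case is taken from the quoted
patch; whether this is sufficient for non-UTF8 installations is exactly
the open question):

```sql
-- must have a UTF8 database for the multibyte test cases
SELECT getdatabaseencoding();

SET client_encoding TO 'UTF8';

-- 'ß' only round-trips cleanly when both the database and the
-- client encoding are UTF8, so the test is meaningful only then
SELECT daitch_mokotoff('Straßburg');
```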