Tom Lane <t...@sss.pgh.pa.us> writes:
> Thomas Munro <thomas.mu...@gmail.com> writes:
>> Erm, it looks like something weird is happening somewhere in cfbot's
>> pipeline, because Dag's patch says:
>
>> +SELECT daitch_mokotoff('Straßburg');
>> + daitch_mokotoff
>> +-----------------
>> + 294795
>> +(1 row)
>
> ... so, that test case is guaranteed to fail in non-UTF8 encodings,
> I suppose?  I wonder what the LANG environment is in that cfbot
> instance.
>
> (We do have methods for dealing with non-ASCII test cases, but
> I can't see that this patch is using any of them.)
>
> 			regards, tom lane
I naively assumed that tests would be run in a UTF8 environment.

Running "ack -l '[\x80-\xff]'" in the contrib/ directory reveals that two
other modules use UTF8 characters in their tests - citext and unaccent.

The citext tests seem to be commented out ("Multibyte sanity tests.
Uncomment to run.").

Looking into the unaccent module, I don't quite understand how it works
with various encodings, since it doesn't seem to decode its input - will
it fail if run under anything but ASCII or UTF8?

In any case, I see that unaccent.sql starts as follows:

CREATE EXTENSION unaccent;

-- must have a UTF8 database
SELECT getdatabaseencoding();

SET client_encoding TO 'UTF8';

Would doing the same thing in fuzzystrmatch.sql fix the problem with the
failing tests? Should I prepare a new patch?

Best regards,

Dag Lem
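For concreteness, a sketch of what the corresponding guard at the top of
fuzzystrmatch.sql might look like, mirroring the unaccent.sql approach
quoted above (the daitch_mokotoff test case is taken from the quoted
patch; whether this is sufficient for non-UTF8 installations is exactly
the open question):

```sql
-- must have a UTF8 database for the multibyte test cases
SELECT getdatabaseencoding();

SET client_encoding TO 'UTF8';

-- 'ß' only round-trips cleanly when both the database and the
-- client encoding are UTF8, so the test is meaningful only then
SELECT daitch_mokotoff('Straßburg');
```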