On Wed, Sep 20, 2017 at 5:07 AM, Corey Huinker <corey.huin...@gmail.com> wrote: > We are having an issue with a query that will return no results when the > query does a merge join with a foreign table, but (correctly) returns > results when using a hash join. > > Here is the situation on the "remote" database (9.5): > > > # \d+ table_with_en_us_utf8_encoding > Table "public.table_with_en_us_utf8_encoding" > Column | Type | Modifiers | Storage | Stats target | > Description > --------+------------------------+-----------+----------+--------------+------------- > id | bigint | | plain | | > str1 | character varying(255) | | extended | | > str2 | character varying(255) | | extended | | > str3 | character varying(255) | | extended | | > str4 | character varying(3) | | extended | | > > analytics=# select encoding, datcollate, datctype from pg_database where > datname = current_database(); > encoding | datcollate | datctype > ----------+-------------+------------- > 6 | en_US.UTF-8 | en_US.UTF-8 > > > > > And here's what we do on the local side (9.6): > > # select encoding, datcollate, datctype from pg_database where datname = > current_database(); > encoding | datcollate | datctype > ----------+------------+---------- > 6 | C | C > > # import foreign schema public limit to (table_with_en_us_utf8_encoding) > from server primary_replica into public; > > # \d+ table_with_en_us_utf8_encoding > Foreign table > "public.table_with_en_us_utf8_encoding" > Column | Type | Collation | Nullable | Default | FDW > options | Storage | Stats target | Description > --------+------------------------+-----------+----------+---------+----------------------+----------+--------------+------------- > id | bigint | | | | > (column_name 'id') | plain | | > str1 | character varying(255) | | | | > (column_name 'str1') | extended | | > str2 | character varying(255) | | | | > (column_name 'str2') | extended | | > str3 | character varying(255) | | | | > (column_name 'str3') | extended | | > str4 | character varying(3) | | | | > (column_name 'str4') | extended | | > Server: primary_replica > FDW options: (schema_name 'public', table_name > 'table_with_en_us_utf8_encoding') >
The collation column is empty here, which means that collation for str* columns is default collation i.e. C. This isn't true, since the default ncollation on the foreign server is different from the default collation of local database. AFAIU, import foreign schema should have set appropriate collation of the foreign table. > # create temporary table tmp_on_c_collated_foreign_server (str2 text); > > # insert into tmp_on_c_collated_foreign_server (str2) values ('576228972'); > # insert into tmp_on_c_collated_foreign_server (str2) values ('576228972'); > # insert into tmp_on_c_collated_foreign_server (str2) values ('576228972'); > > -- > -- query with merge join, returns zero rows > -- > # explain (analyze, verbose) select e.str1, e.str2, e.str3 from > tmp_on_c_collated_foreign_server c left join table_with_en_us_utf8_encoding > e on c.str2 = e.str2 where e.str4='2' ; > > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------------ > Merge Join (cost=18041.88..22322.92 rows=229221 width=1548) (actual > time=102.849..102.849 rows=0 loops=1) > Output: e.str1, e.str2, e.str3 > Merge Cond: ((e.str2)::text = c.str2) > -> Foreign Scan on public.table_with_en_us_utf8_encoding e > (cost=17947.50..18705.95 rows=33709 width=93) (actual time=102.815..102.815 > rows=1 loops=1) > Output: e.id, e.str1, e.str2, e.str3, e.str4 > Remote SQL: SELECT str1, str2, str3 FROM > public.table_with_en_us_utf8_encoding WHERE ((str4 = '2'::text)) ORDER BY > str2 ASC NULLS LAST > -> Sort (cost=94.38..97.78 rows=1360 width=32) (actual > time=0.028..0.029 rows=7 loops=1) > Output: c.str2 > Sort Key: c.str2 > Sort Method: quicksort Memory: 25kB > -> Seq Scan on pg_temp_3.tmp_on_c_collated_foreign_server c > (cost=0.00..23.60 rows=1360 width=32) (actual time=0.010..0.011 rows=7 > loops=1) > Output: c.str2 > Planning time: 4.285 ms > Execution time: 104.458 ms > (14 rows) > Since the results returned by the foreign server are according to the collation of the foreign server, the order doesn't match with order expected by the local server and so the merge join reports different rows. > > -- > -- query with hash join, returns rows > > -- > > -- the default for the foreign server is to use remote estimates, so we turn > that off... > > # alter foreign table table_with_en_us_utf8_encoding OPTIONS (ADD > use_remote_estimate 'false'); > ALTER FOREIGN TABLE > > -- and then run the same query again > > # explain (analyze, verbose) select e.str1, e.str2, e.str3 from > tmp_on_c_collated_foreign_server c left join table_with_en_us_utf8_encoding > e on c.str2 = e.str2 where e.str4='2' ; > > QUERY PLAN > -------------------------------------------------------------------------------------------------------------------------------------------------------------- > Hash Join (cost=110.68..139.45 rows=7 width=1548) (actual > time=154.280..154.286 rows=7 loops=1) > Output: e.str1, e.str2, e.str3 > Hash Cond: (c.str2 = (e.str2)::text) > -> Seq Scan on pg_temp_3.tmp_on_c_collated_foreign_server c > (cost=0.00..23.60 rows=1360 width=32) (actual time=0.006..0.008 rows=7 > loops=1) > Output: c.str2 > -> Hash (cost=110.67..110.67 rows=1 width=1548) (actual > time=154.264..154.264 rows=33418 loops=1) > Output: e.str1, e.str2, e.str3 > Buckets: 65536 (originally 1024) Batches: 1 (originally 1) Memory > Usage: 4003kB > -> Foreign Scan on public.table_with_en_us_utf8_encoding e > (cost=100.00..110.67 rows=1 width=1548) (actual time=8.289..144.210 > rows=33418 loops=1) > Output: e.str1, e.str2, e.str3 > Remote SQL: SELECT str1, str2, str3 FROM > public.table_with_en_us_utf8_encoding WHERE ((str4 = '2'::text)) > Planning time: 0.153 ms > Execution time: 156.557 ms > (13 rows) > > In this case, both tables use same collation while comparing the rows, so result is different from the merge join result. Hash join executed on local server and the same executed on foreign server (by importing local table to the foreign server) would also differ. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers