Re: Bug in either collation docs or code

2018-06-08 Thread Tom Lane
Melanie Plageman  writes:
> On postgres built off of master on my mac (sierra), the following is the
> output:

[ scratches head ... ]  I get the same results on either Mac or Linux:

regression=# create database u8 encoding utf8 template template0;
CREATE DATABASE
regression=# \c u8
You are now connected to database "u8" as user "tgl".
u8=# CREATE TABLE test_col_mac (
u8(# a text COLLATE "de_DE",
u8(# b text COLLATE "es_ES"
u8(# );
CREATE TABLE
u8=# SELECT a < (SELECT 'foo'::TEXT COLLATE "fr_FR") FROM test_col_mac;
 ?column? 
--
(0 rows)

u8=# INSERT INTO test_col_mac VALUES('A','A');
INSERT 0 1
u8=# SELECT a < (SELECT 'foo'::TEXT COLLATE "fr_FR") FROM test_col_mac;
ERROR:  could not determine which collation to use for string comparison
HINT:  Use the COLLATE clause to set the collation explicitly.

Now, without the sub-select, it works:

u8=# SELECT a < ('foo'::TEXT COLLATE "fr_FR") FROM test_col_mac;
 ?column? 
--
 t
(1 row)

because the explicit COLLATE is considered to determine the
collation of the comparison operator.

I wonder whether you're using stock Postgres, or something that
flattens scalar sub-selects?

regards, tom lane



Re: Bug in either collation docs or code

2018-06-08 Thread Melanie Plageman
>
> I did my test on 9.6.5 ​(Ubuntu 16.04) with:
>
> CREATE TABLE test_col (
> a text COLLATE "en_CA.utf8",
> b text COLLATE "en_US.utf8"
> );
> INSERT INTO test_col VALUES ('A', 'A');
> SELECT a < (SELECT 'foo'::text COLLATE "en_GB.utf8") FROM test_col;
>
> SQL Error: ERROR:  could not determine which collation to use for string
> comparison
> HINT:  Use the COLLATE clause to set the collation explicitly.
>
> Note, I had to include the cast to text in order for the query to
> execute...
>
> David J.
>
>
On postgres built off of master on my mac (sierra), the following is the
output:

DROP TABLE IF EXISTS test_col_mac;
DROP TABLE
CREATE TABLE test_col_mac (
a text COLLATE "de_DE",
b text COLLATE "es_ES"
);
CREATE TABLE
INSERT INTO test_col_mac VALUES('A','A');
INSERT 0 1
SELECT a < (SELECT 'foo'::TEXT COLLATE "fr_FR") FROM test_col_mac;
 ?column?
--
 t
(1 row)


-- 
Melanie Plageman


Re: Bug in either collation docs or code

2018-06-08 Thread Tom Lane
Melanie Plageman  writes:
> In this case, why treat implicit and explicit collation conflicts
> differently?

Um ... because the SQL standard says so?

regards, tom lane



Re: Bug in either collation docs or code

2018-06-08 Thread David G. Johnston
On Fri, Jun 8, 2018 at 9:24 AM, Melanie Plageman 
wrote:

> It seems like this would not allow the function/operator to decide if it
> cares about a determinate collation during execution, since it would
> already have errored out during planning.
>

In the case where the function/operator doesn't care one shouldn't be
attaching explicit collation clauses to its inputs anyway - it is a
semantic bug if nothing else and a bug during planning pointing that out
seems useful.

David J.


Re: Bug in either collation docs or code

2018-06-08 Thread Melanie Plageman
> IIRC this was an intentional decision, made on the grounds that we
> can't tell whether the function/operator actually cares about having
> a determinate collation or not, so we have to leave it to execution of
> that function/operator to complain or not.
>
>
In this case, why treat implicit and explicit collation conflicts
differently? A conflicting explicit collation will produce an error during
planning, whereas a conflicting implicit collation will not produce an
error until execution.

create table foo(a text COLLATE "es_ES");
create table bar(b text COLLATE "de_DE");
insert into foo values('a'), ('b'), ('c'), ('d');
insert into bar values('b'), ('c'), ('g'), ('h');

SELECT * FROM foo WHERE a > (SELECT b FROM bar LIMIT 1); -- error during
execution
EXPLAIN SELECT * FROM foo WHERE a > (SELECT b FROM bar LIMIT 1); -- but no
error during planning
EXPLAIN SELECT 'c' COLLATE "de_DE" > 'ç' COLLATE "es_ES"; -- error during
planning

It seems like this would not allow the function/operator to decide if it
cares about a determinate collation during execution, since it would
already have errored out during planning.

-- 
Melanie Plageman


Re: Bug in either collation docs or code

2018-06-08 Thread David G. Johnston
On Fri, Jun 8, 2018 at 9:12 AM, Melanie Plageman 
wrote:

> I tried inserting data and did not get an error:
>
> CREATE TABLE test1 (
> a text COLLATE "de_DE",
> b text COLLATE "es_ES"
> );
>
> INSERT INTO test1 VALUES('b','b'), ('c','c'), ('g','g'), ('h','h');
> SELECT a < (select 'foo' COLLATE "fr_FR") FROM test1;
>

​Suggest providing a self-contained script (set echo to all and capture the
output to a file), changing the table name to ensure no test pollution, and
including the version of the server in one of the queries.

I did my test on 9.6.5 ​(Ubuntu 16.04) with:

CREATE TABLE test_col (
a text COLLATE "en_CA.utf8",
b text COLLATE "en_US.utf8"
);
INSERT INTO test_col VALUES ('A', 'A');
SELECT a < (SELECT 'foo'::text COLLATE "en_GB.utf8") FROM test_col;

SQL Error: ERROR:  could not determine which collation to use for string
comparison
HINT:  Use the COLLATE clause to set the collation explicitly.

Note, I had to include the cast to text in order for the query to execute...

David J.


Re: Bug in either collation docs or code

2018-06-08 Thread Melanie Plageman
>
> ​Data, apparently...I got the same non-error result before inserting a
> record into test1 then I got the expected error.
>
> Its the function/operator the fails when faced with invalid input, not the
> planner, so the error requires data to provoke.
>
> David J.
>
>
>
I tried inserting data and did not get an error:

CREATE TABLE test1 (
a text COLLATE "de_DE",
b text COLLATE "es_ES"
);

INSERT INTO test1 VALUES('b','b'), ('c','c'), ('g','g'), ('h','h');
SELECT a < (select 'foo' COLLATE "fr_FR") FROM test1;

-- 
Melanie Plageman


Re: Bug in either collation docs or code

2018-06-07 Thread Tom Lane
"David G. Johnston"  writes:
> On Thu, Jun 7, 2018 at 4:37 PM, Melanie Plageman 
> wrote:
>> I thought this would error out because the subquery's result is considered
>> implicit and, in this case, it seems you now have conflicting implicit
>> collations. However, this does not produce an error. What am I missing?

> Data, apparently...I got the same non-error result before inserting a
> record into test1 then I got the expected error.
> Its the function/operator the fails when faced with invalid input, not the
> planner, so the error requires data to provoke.

IIRC this was an intentional decision, made on the grounds that we
can't tell whether the function/operator actually cares about having
a determinate collation or not, so we have to leave it to execution of
that function/operator to complain or not.

If collation had been part of the system design to start with, we'd
probably have insisted on being able to determine this sooner.  But
it wasn't, and when we added it, the ability to throw an error sooner
did not seem worth breaking a lot of third-party code for.

regards, tom lane



Re: Bug in either collation docs or code

2018-06-07 Thread David G. Johnston
On Thu, Jun 7, 2018 at 4:37 PM, Melanie Plageman 
wrote:

> CREATE TABLE test1 (
> a text COLLATE "de_DE",
> b text COLLATE "es_ES",
> ...
> );
>
> My thought was to add the following example:
>
> SELECT a < (select 'foo' COLLATE "fr_FR") FROM test1;
>
> I thought this would error out because the subquery's result is considered
> implicit and, in this case, it seems you now have conflicting implicit
> collations. However, this does not produce an error. What am I missing?
>

​Data, apparently...I got the same non-error result before inserting a
record into test1 then I got the expected error.

Its the function/operator the fails when faced with invalid input, not the
planner, so the error requires data to provoke.

David J.


Re: Bug in either collation docs or code

2018-06-07 Thread Melanie Plageman
You could mark the subquery's result with a collation like this:
>
> postgres=# SELECT 'c' COLLATE "de_DE" > (SELECT 'ç') COLLATE "es_ES";
> ERROR:  collation mismatch between explicit collations "de_DE" and "es_ES"
>
> I'm not sure if this behavior is considered a bug, but I also can't imagine
>> how it would be expected given the current documentation. It seems to me
>> one or the other should be updated.
>>
>
> It seems correct to me. It does say "An explicit collation derivation
> occurs when a COLLATE clause is used; all other collation derivations are
> implicit". A subquery falls under the "all other collation derivations"
> category. Perhaps we could make it more clear what the COLLATE clause binds
> to, especially with subqueries, but I'm not sure how exactly to phrase it.
> Perhaps an additional example with a subquery would help?
>

So, I tried coming up with an example with a subquery that explains what
the COLLATE clause binds to, and I realized I'm still not clear. I came up
with an example using the DDL that is in the docs
:

CREATE TABLE test1 (
a text COLLATE "de_DE",
b text COLLATE "es_ES",
...
);

My thought was to add the following example:

SELECT a < (select 'foo' COLLATE "fr_FR") FROM test1;

I thought this would error out because the subquery's result is considered
implicit and, in this case, it seems you now have conflicting implicit
collations. However, this does not produce an error. What am I missing? The
result of the subquery has collation "fr_FR" and, if it's implicit, then I
shouldn't be able to compare it with test1.a, which has an implicit
collation of "de_DE".


Re: Bug in either collation docs or code

2018-06-06 Thread Heikki Linnakangas

On 05/06/18 07:10, Melanie Plageman wrote:

Hi,
I noticed what seems like a bug in collation.

This query errors out:
SELECT (SELECT 'c' COLLATE "de_DE") > (SELECT 'ç' COLLATE "es_ES");  --
error

While this query does not:
SELECT 'c' COLLATE "de_DE" > (SELECT 'ç' COLLATE "es_ES");  -- no error

It seems like this is in conflict with what the documentation
 says:
"If any input expression has an explicit collation derivation, then all
explicitly derived collations among the input expressions must be the same,
otherwise an error is raised."

After digging into it a bit, I believe the reason for this is that when we
are assigning collations in assign_collations_walker, we always set
collation strength to IMPLICIT for the subquery and always set the
collation strength to EXPLICIT for the collate node on the other side of
the OpExpr. So, we don't hit an error later like the one in
merge_collation_state when the collation of one expression is conflicting
with that of its parent and the strength of both is EXPLICIT.
I think this still applies to our case because one of the two arguments to
OpExpr would have set their parent's collation strength to either IMPLICIT
or EXPLICIT and then we will process the other argument which would then
have a different collation strength than the one we just set its parent to.
So, we end up setting the inputcollid for the OpExpr to that of the
explicit collation in the collate node.

Basically, it seems like our subquery will always have its collation
strength set to IMPLICIT, so, if we have explicit collation in the first
target entry of the subquery's target list, it looks like we will never
truly treat that as explicit collation.


Right, a subquery's result is considered IMPLICIT, even if there is an 
explicit COLLATE inside the subquery.


You could mark the subquery's result with a collation like this:

postgres=# SELECT 'c' COLLATE "de_DE" > (SELECT 'ç') COLLATE "es_ES";
ERROR:  collation mismatch between explicit collations "de_DE" and "es_ES"


I'm not sure if this behavior is considered a bug, but I also can't imagine
how it would be expected given the current documentation. It seems to me
one or the other should be updated.


It seems correct to me. It does say "An explicit collation derivation 
occurs when a COLLATE clause is used; all other collation derivations 
are implicit". A subquery falls under the "all other collation 
derivations" category. Perhaps we could make it more clear what the 
COLLATE clause binds to, especially with subqueries, but I'm not sure 
how exactly to phrase it. Perhaps an additional example with a subquery 
would help?


- Heikki



Bug in either collation docs or code

2018-06-04 Thread Melanie Plageman
Hi,
I noticed what seems like a bug in collation.

This query errors out:
SELECT (SELECT 'c' COLLATE "de_DE") > (SELECT 'ç' COLLATE "es_ES");  --
error

While this query does not:
SELECT 'c' COLLATE "de_DE" > (SELECT 'ç' COLLATE "es_ES");  -- no error

It seems like this is in conflict with what the documentation
 says:
"If any input expression has an explicit collation derivation, then all
explicitly derived collations among the input expressions must be the same,
otherwise an error is raised."

After digging into it a bit, I believe the reason for this is that when we
are assigning collations in assign_collations_walker, we always set
collation strength to IMPLICIT for the subquery and always set the
collation strength to EXPLICIT for the collate node on the other side of
the OpExpr. So, we don't hit an error later like the one in
merge_collation_state when the collation of one expression is conflicting
with that of its parent and the strength of both is EXPLICIT.
I think this still applies to our case because one of the two arguments to
OpExpr would have set their parent's collation strength to either IMPLICIT
or EXPLICIT and then we will process the other argument which would then
have a different collation strength than the one we just set its parent to.
So, we end up setting the inputcollid for the OpExpr to that of the
explicit collation in the collate node.

Basically, it seems like our subquery will always have its collation
strength set to IMPLICIT, so, if we have explicit collation in the first
target entry of the subquery's target list, it looks like we will never
truly treat that as explicit collation.

To test the theory that this is why we are not erroring out with explicit
collation on either side of our expression, I added a hack to set the
collation strength for a subquery to EXPLICIT if its first target entry is
a collate node (see attached patch). With this hack, it throws what I think
is the correct error in the case above.
However, I am sure that this is too specific a way of solving this. Just on
first thought, it wouldn't handle SubPlans (where we have a param as the
other argument to the OpExpr).

This query plans and executes with no error now and with the attached patch:
select 'c' COLLATE "de_DE" > ANY(select 'ç' COLLATE "es_ES");

I'm not sure if this behavior is considered a bug, but I also can't imagine
how it would be expected given the current documentation. It seems to me
one or the other should be updated.

-- 
Melanie Plageman


subquery_opexpr_explicit_collation_hack.patch
Description: Binary data