On Sat, Mar 26, 2011 at 12:36:43AM -0400, Tom Lane wrote: > ** Selecting a field from a record-returning function's output. > Currently, we'll use the field's declared collation; except that > if the field has default collation, we'll replace that with the common > collation of the function's inputs, if any. Is either part of that > sane? Do we need to make this work for functions invoked with other > syntax than a plain function call, eg operator or cast syntax?
That seems all a bit weird. I spent some time reading through the SQL spec to see if I could came up with a few ideas about what they thought relevent. I think the gist of it is that I think the result row should have for each column its declared collation in all cases. Firstly, the SQL doesn't go into the general case where the collate result of a function is based in it inputs. But in any case, if the function returns a record, the whole record would have that collation, which is absurd. I think letting it go to the contained columns is just weird. Secondly, I think the derivation algorithm is for determing the collation of expressions which have no otherwise declared collation. Anything returning a predefined record type has a predefined collation and it should be used. If you're in a query referring to rowvar.field and rowvar has a type, that's what should be used. (No doubt you have some corner cases in mind?) As for operators, they should behave like functions wherever possible, otherwise it's just introducing unnecessary differences. The cast-case is related to below. > ** What to do with domains whose declaration includes a COLLATE clause? > Currently, we'll impute that collation to the result of a cast to the > domain type --- even if the cast's input expression includes an > explicit COLLATE clause. It's not clear that that's per spec. If it > is correct, should we behave similarly for functions that are declared > to return a domain type? Should it matter if the cast-to-domain is > explicit or implicit? Perhaps it'd be best if domain collations only > mattered for columns declared with that domain type. Then we'd have > a general rule that collations only come into play in an expression > as a result of (a) the declared type of a column reference or (b) > an explicit COLLATE clause. The SQL spec considers the collation to be part of the datatype, so if you're casting to a domain (or type) you get the collation associated with that domain (or type). As per the spec: "The collation derivation of a declared type with a declared type collation that is explicitly or implicitly specified by a <data type> is implicit." So the result of a cast would be the collation of the specified type/domain with state implicit. Also, apparently the COLLATE clause as allowed anywhere where a datatype is permitted. So you can say: CAST( foo AS TEXT COLLATE "en_US" ) Not sure if that works now. The result would be implicit state, as opposed to if the COLLATE clause appears elsewhere. Incidently, a function returning a domain seems weird to me. What does it mean: (1) the function returns this type, Postgres assumes this is true, or (2) function returns something, Postgres does an implicit cast? In any case, I'd suggest it is treated as being included in the resolving of the return collation with the arguments so if the result is a domain and you apply the normal rules you get: (1) explicit states in the arguments will override it (2) if arguments are implicit state and conflict with domain, the result is no-collation, otherwise implicitly whatever the domain was (3) no arguments have collation, which means you get the domain default. Which all seems eminently reasonable. So I'd agree with your rules, but add a case (c) result of a cast. > * In plpgsql, is it OK for declared local variables to inherit the > function's input collation? Should we provide a COLLATE option in > variable declarations to let that be overridden? If Oracle understands > COLLATE, probably we should look at what they do in PL/SQL. If COLLATE is allowed anywhere where the datatype is allowed, then the COLLATE clause should be permitted there. Otherwise they become the specified type with whatever the default is for that type. In expressions the coercible-default state will get overridden by the IMPLICIT state from the arguments as appropriate. I note I'm using the term coercible default here, because that's what Transact-SQL calls the state for any variable or value that's not a column reference. I'm just checking and don't see any support for it in the SQL standard. While it seemed to me to be extremely useful, since it allows column references to override literals. > * RI triggers should insert COLLATE clauses in generated queries to > satisfy SQL2008 9.13 SR 4a, which says that RI comparisons use the > referenced column's collation. Right now you may get either table's > collation depending on which query type is involved. I think an obvious > failure may not be possible so long as equality means the same thing in > all collations, but it's definitely possible that the planner might > decide it can't use the referenced column's unique index, which would > suck for performance. (Note: this rule seems to prove that the > committee assumes equality can mean different things in different > collations, else they'd not have felt the need to specify.) Using the referenced collation makes it clear that you don't want to throw an error if the collations don't match. So yes, the generated queries need to have the correct collation clauses. Collations which redefine equality are on the horizon, case and accent insensetivity being the obvious cases. Better get it right now. > * It'd sure be nice if we had some nontrivial test cases that work in > encodings besides UTF8. I'm still bothered that the committed patch > failed to cover single-byte-encoding cases in upper/lower/initcap. That'd be nice. Perhaps one the existing cases can be converted to some latin-X encoding. It does rely on the collation existing on the test machine. > * Is it worth adding a cares-about-collation flag to pg_proc? Probably > too late to be worrying about such refinements for 9.1. Long term it will be needed. It's been skipped now, but it will let you produce better errors messages in the future. Have a nice day, -- Martijn van Oosterhout <klep...@svana.org> http://svana.org/kleptog/ > Patriotism is when love of your own people comes first; nationalism, > when hate for people other than your own comes first. > - Charles de Gaulle
signature.asc
Description: Digital signature