subject:"\[HACKERS\] Theory of operation of collation patch"

Re: [HACKERS] Theory of operation of collation patch

2011-03-10 Thread Greg Stark

On Wed, Mar 9, 2011 at 1:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Another interesting item ... I see that you added a collation field to
 TypeName, apparently on the grounds that the SQL spec includes collation
 in data type.  However, it seems to me that that is nonsense up with
 which we should not put.

The SQL committee has demonstrably awful taste but they're usually not
entirely nutty. Usually whatever they do they had a reason. Is there
any hint at all of what they were trying to accomplish here? When you
say basically only used in CAST and column definitions are there
other less common cases where it's convenient?

Or was this just a case of some existing database allowed COLLATE
clauses in column definitions as part of the type and they preferred
to have it in a different syntax so they did this to allow either to
work? If we allow collate clause in ColQualList are we covering the
very case they were trying to deal with?

Can you give an example of what a column definition would look like if
you put the COLLATE clause in the data type in a way that wouldn't
be parsed according to your plan?

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Theory of operation of collation patch

2011-03-10 Thread Tom Lane

Greg Stark gsst...@mit.edu writes:
 Can you give an example of what a column definition would look like if
 you put the COLLATE clause in the data type in a way that wouldn't
 be parsed according to your plan?

Column definitions look and act the same.  The point of the change is to
not accept COLLATE in all the places where type names are used that are
*not* column definition lists.  The original patch accepted nonsense like

create cast (text collate C as text collate de_DE) with ...

and we'd have had to do quite a lot of additional hacking to plug those
holes.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Theory of operation of collation patch

2011-03-10 Thread Peter Eisentraut

On tis, 2011-03-08 at 20:52 -0500, Tom Lane wrote:
 I think we should drop collate clause from TypeName and just have it
 in columnDef and the expression syntax.

Yes, that sounds better in retrospect.  It's easier to see that now that
we see all the cases where it's used and not used.

 This might also ease the
 ambiguity problem that evidently led you to restrict the expression
 production's argument to c_expr.

Maybe, but I seem to recall that I did actually check and concluded that
c_expr covers all cases where collate clause is allowed.  We could of
course allow more cases, but maybe it's not necessary.

 It would also allow us to meet the
 letter of the spec for column definition, in that collate clause
 is not required to immediately follow data type.

Note that that case is listed under a separate feature.  I'm not sure if
it's worth supporting, but if they bothered putting it in it's probably
for compatibility with some existing implementation.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Theory of operation of collation patch

2011-03-09 Thread Martijn van Oosterhout

On Tue, Mar 08, 2011 at 08:52:41PM -0500, Tom Lane wrote:
 Another interesting item ... I see that you added a collation field to
 TypeName, apparently on the grounds that the SQL spec includes collation
 in data type.  However, it seems to me that that is nonsense up with
 which we should not put.  data type is basically only used in CAST and

snip

Sounds like a good plan.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle


signature.asc
Description: Digital signature

Re: [HACKERS] Theory of operation of collation patch

2011-03-08 Thread Martijn van Oosterhout

On Mon, Mar 07, 2011 at 07:15:28PM -0500, Tom Lane wrote:
 Only if the expression-to-be-sorted does not already fully specify the
 collation, which so far as I can tell (either from the code or your
 description above) it does.  I think that the explicit representation
 of collation as part of the PathKey node is unnecessary, inefficient,
 and bug-inducing --- the latter because it promotes fuzzy thinking about
 where the collation information is coming from.  (And this isn't just
 hypothetical: IMO the bugs I exhibited upthread are *directly* due to
 fuzzy thinking about what defines an index's sort order.)

Well, collation processing happens in two phases. Initially collate
information is provided by the columns in the query, explicit clauses,
etc. These are indeed attached to the values. From here the collations
of expressions are determined. The SQL committee thought up a bunch of
actually quite logical rules here. Explicit overrides implicit,
implicit overrides default. Both explicit or both implicit is an error,
etc. Note error state is only a problem if you use it for sorting or
comparison, otherwise it is ignored.

This phase of processing happens in the parse analysis, the end result
being that every expression should have a collation set, and every
operator where it matters has consistant collation information for its
arguments. So at this point the collation should be attached to the
expression.

In the planning phase however, all collation information is ignored
except where it matters, which is the comparison operators and ORDER BY
and similar. By the previous phase all comparison operators should have
a defined collation for its arguments and thus for itself, this allows
it to construct appropriate pathkeys for index scans, etc.

I havn't looked at the patch, perhaps it confuses the information in
the two phases but the basic idea seems to me:

- parse analysis - collation in expressions
- planning - collation in path key

 Or, to put it another way: the properties that define a sort order are
 the sort comparison operator, the collation, the ASC/DESC bit, and the
 NULLS FIRST/LAST bit.  Given the way that the SQL committee has
 constructed the language, the operator and the two flag bits are
 attached to the ORDER BY clause, but the collation is a property of the
 expression-to-be-sorted.  If we fail to preserve that distinction in the
 internal representation, we're just creating problems for ourselves.

The assigning of the collation to expressions is simply the method for
getting the collation information to the operators. Mainly because 99%
of the time there will be no explicit clauses in the queries, so the
information has to get there by other means.

Hope this helps,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle


signature.asc
Description: Digital signature

Re: [HACKERS] Theory of operation of collation patch

2011-03-08 Thread Susanne Ebrecht


On 07.03.2011 17:43, Tom Lane wrote:

because two expressions that are equal() must necessarily have the same 
collation
property.


Peter, Tom,

I am not able to see this.

If 'abc' == 'abc' is not collation depending at all. It is only
encoding depending.

Collation is only needed for upper(), lower() and sorting.
Means it tells if e.g. upper('i') will get Y or I.
It tells if the German s-umlaut will be sorted together with 's' or 
after 'z'.


Btw. the follows on implementing collations will be different -
and I hope Peter is aware of it.

My experience is that a huge follow will be that users will complain 
that the

sorting isn't correct even when it is correct.

Susanne

--
Susanne Ebrecht - 2ndQuadrant
PostgreSQL Development, 24x7 Support, Training and Services
www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Theory of operation of collation patch

2011-03-08 Thread Tom Lane

Susanne Ebrecht susa...@2ndquadrant.com writes:
 On 07.03.2011 17:43, Tom Lane wrote:
 because two expressions that are equal() must necessarily have the same 
 collation
 property.

 Peter, Tom,

 I am not able to see this.

 If 'abc' == 'abc' is not collation depending at all. It is only
 encoding depending.

Sorry, I was using a term of art there.  This isn't about data values,
it's about parsed expression trees.  See the equal() function in
src/backend/nodes/equalfuncs.c.  If two expression trees match according
to equal(), then they must necessarily have the same collation property,
because equal() compares all the fields and in particular the collation
fields.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Theory of operation of collation patch

2011-03-08 Thread Tom Lane

Martijn van Oosterhout klep...@svana.org writes:
 This phase of processing happens in the parse analysis, the end result
 being that every expression should have a collation set, and every
 operator where it matters has consistant collation information for its
 arguments. So at this point the collation should be attached to the
 expression.

Right.

 I havn't looked at the patch, perhaps it confuses the information in
 the two phases but the basic idea seems to me:

 - parse analysis - collation in expressions
 - planning - collation in path key

Not so right.  A path key contains an expression tree, plus whatever
*additional* information is needed to fully specify the sort ordering.
If the collation is already fully determined by the expression tree,
there is no need to duplicate that information in the PathKey node.
And, as I said, doing so anyway has real negative consequences.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Theory of operation of collation patch

2011-03-08 Thread Greg Stark

On Tue, Mar 8, 2011 at 3:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Not so right.  A path key contains an expression tree, plus whatever
 *additional* information is needed to fully specify the sort ordering.
 If the collation is already fully determined by the expression tree,
 there is no need to duplicate that information in the PathKey node.
 And, as I said, doing so anyway has real negative consequences.

Isn't the reason to copy that information outside the expression so
that we can choose sometimes to ignore it? Namely, for == we can use
an index with any defined collation even if it doesn't match the
collation in the pathkey we're looking for?

I think currently that's the only example but in theory we could have
collations that are supersets of the desired collation. For example
a UTF8 collation that sorts english in the desired way and sorts utf8
characters in some way that isn't relevant to the query.


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Theory of operation of collation patch

2011-03-08 Thread Tom Lane

Greg Stark gsst...@mit.edu writes:
 On Tue, Mar 8, 2011 at 3:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Not so right.  A path key contains an expression tree, plus whatever
 *additional* information is needed to fully specify the sort ordering.
 If the collation is already fully determined by the expression tree,
 there is no need to duplicate that information in the PathKey node.
 And, as I said, doing so anyway has real negative consequences.

 Isn't the reason to copy that information outside the expression so
 that we can choose sometimes to ignore it? Namely, for == we can use
 an index with any defined collation even if it doesn't match the
 collation in the pathkey we're looking for?

No, that's nonsense.  A PathKey is all about sort order.  It is not
relevant in any situation where sort order is ignorable.

 I think currently that's the only example but in theory we could have
 collations that are supersets of the desired collation. For example
 a UTF8 collation that sorts english in the desired way and sorts utf8
 characters in some way that isn't relevant to the query.

That sounds like nonsense as well.  There is no such thing as a superset
ordering: to have such a thing, you'd have to have some orderings that
didn't fully determine the ordering of data values, which is pretty much
unworkable.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Theory of operation of collation patch

2011-03-08 Thread Peter Eisentraut

On mån, 2011-03-07 at 12:52 -0500, Tom Lane wrote:
 It looks like indcollation is acting as a
 substitute for including a CollateClause in the index key expression,
 which doesn't seem like a particularly good tradeoff considering all
 the overhead you must introduce into the default case.

Conceptually, this characterization is correct.  But assuming you do
away with the indcollation column and instead create expression indexes
on (col COLLATE x), you'd then have an interesting time matching those
expressions to expressions in the query when choosing an index.
Example:

CREATE TABLE test (a text);
CREATE INDEX test_idx ((a COLLATE foo));  -- expression index

-- easy to choose the index here
SELECT * FROM test WHERE a COLLATE foo  'x';

-- not so easy here
SELECT * FROM test WHERE a  'x' COLLATE foo;

-- more difficult
CREATE TABLE test2 (b text COLLATE foo);
SELECT * FROM test WHERE x  (SELECT b FROM test2 WHERE something);

Note that this situation already exists: If you create a plain column
index using CREATE INDEX test_idx (a COLLATE foo), all of the above
cases should pick up the index; if you create an expression index using
CREATE INDEX test_idx ((a COLLATE foo)), they won't, except the first
one.  (That is modulo your earlier demonstration that it apparently
doesn't work at all for you, but I'm just saying in case you want to try
out various ways of changing this.)



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Theory of operation of collation patch

2011-03-08 Thread Peter Eisentraut

On mån, 2011-03-07 at 19:15 -0500, Tom Lane wrote:
 Or, to put it another way: the properties that define a sort order are
 the sort comparison operator, the collation, the ASC/DESC bit, and the
 NULLS FIRST/LAST bit.  Given the way that the SQL committee has
 constructed the language, the operator and the two flag bits are
 attached to the ORDER BY clause, but the collation is a property of
 the expression-to-be-sorted.  If we fail to preserve that distinction
 in the internal representation, we're just creating problems for
 ourselves.

You are probably right; it could be simplified.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Theory of operation of collation patch

2011-03-08 Thread Tom Lane

Peter Eisentraut pete...@gmx.net writes:
 On mÃ¥n, 2011-03-07 at 12:52 -0500, Tom Lane wrote:
 It looks like indcollation is acting as a
 substitute for including a CollateClause in the index key expression,
 which doesn't seem like a particularly good tradeoff considering all
 the overhead you must introduce into the default case.

 Conceptually, this characterization is correct.  But assuming you do
 away with the indcollation column and instead create expression indexes
 on (col COLLATE x), you'd then have an interesting time matching those
 expressions to expressions in the query when choosing an index.

Hmm ... interesting point, but I think it already doesn't work in all
cases for expression indexes.  There is an analogous issue for
binary-compatible datatypes, which we hack around by stripping
RelabelType nodes before doing this type of comparison.  Maybe we'll
have to do something similar with CollateClauses.  I never much cared
for that hack, though ... wonder if there is a cleaner way.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Theory of operation of collation patch

2011-03-08 Thread Tom Lane

Another interesting item ... I see that you added a collation field to
TypeName, apparently on the grounds that the SQL spec includes collation
in data type.  However, it seems to me that that is nonsense up with
which we should not put.  data type is basically only used in CAST and
column definitions (plus some features related to methods, which we
haven't got).  In CAST, they have to have a syntax rule stating that you
must not use COLLATE in the target type name.  In column definitions,
they call out collate clause as a separate syntactic element, and then
they have to have a syntax rule stating that you can't use both that
and a collate clause embedded in the type name.  This is just stupid.
And it conflates the notion of collation with that of type --- based on
our discussions so far, it seems like it'd be a lot better to keep those
separate.

I think we should drop collate clause from TypeName and just have it
in columnDef and the expression syntax.  This might also ease the
ambiguity problem that evidently led you to restrict the expression
production's argument to c_expr.  It would also allow us to meet the
letter of the spec for column definition, in that collate clause
is not required to immediately follow data type.  I see that right now
you don't allow collate clause in ColQualList, so it's not accepting
syntax that the spec says is valid; and you really can't fix that as long
as collate clause can be part of TypeName, because then there would be
two ways to parse the case where COLLATE does immediately follow the
type name.

Thoughts?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Theory of operation of collation patch

2011-03-07 Thread Tom Lane

Is there any documentation of $SUBJECT?   Because the more I look at
this patch the more I think it's misdesigned; either that or I
fundamentally misunderstand what it's trying to do.  I complained
yesterday about why the planner wasn't using indcollation to identify
the sort order of an index.  I now think that the reason it doesn't
obviously fail to fail is that indcollation is dead code, and so is
approximately 99% of what you added to the planner, because two
expressions that are equal() must necessarily have the same collation
property.  Tracking the collation as a separate property of a pathkey
is thus a useless activity.  If this conclusion isn't correct, please
explain why not.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Theory of operation of collation patch

2011-03-07 Thread Peter Eisentraut

On mån, 2011-03-07 at 11:43 -0500, Tom Lane wrote:
 Is there any documentation of $SUBJECT?   Because the more I look at
 this patch the more I think it's misdesigned; either that or I
 fundamentally misunderstand what it's trying to do.  I complained
 yesterday about why the planner wasn't using indcollation to identify
 the sort order of an index.  I now think that the reason it doesn't
 obviously fail to fail is that indcollation is dead code, and so is
 approximately 99% of what you added to the planner, because two
 expressions that are equal() must necessarily have the same collation
 property.  Tracking the collation as a separate property of a pathkey
 is thus a useless activity.  If this conclusion isn't correct, please
 explain why not.

I'll have to check into these details, but here is a test case that
shows that it's doing something with the collation of an index:

DROP TABLE test;
CREATE TABLE test (a text);
CREATE INDEX test_1 ON test (a);
CREATE INDEX test_d ON test (a COLLATE de_DE);
CREATE INDEX test_e ON test (a COLLATE es_ES);
CREATE INDEX test_f ON test (a COLLATE fr_FR);
SET enable_seqscan TO off;
SET enable_bitmapscan TO off;
EXPLAIN SELECT * FROM test WHERE a  'foo';
  QUERY PLAN   
---
 Index Scan using test_1 on test  (cost=0.00..51.90 rows=437 width=32)
   Index Cond: (a  'foo'::text)
(2 rows)

EXPLAIN SELECT * FROM test WHERE a  'foo' COLLATE de_DE;
  QUERY PLAN   
---
 Index Scan using test_d on test  (cost=0.00..51.90 rows=437 width=32)
   Index Cond: (a  ('foo'::text COLLATE de_DE))
(2 rows)

EXPLAIN SELECT * FROM test WHERE a  'foo' COLLATE es_ES;
  QUERY PLAN   
---
 Index Scan using test_e on test  (cost=0.00..51.90 rows=437 width=32)
   Index Cond: (a  ('foo'::text COLLATE es_ES))
(2 rows)

EXPLAIN SELECT * FROM test WHERE a  'foo' COLLATE fr_FR;
  QUERY PLAN   
---
 Index Scan using test_f on test  (cost=0.00..51.90 rows=437 width=32)
   Index Cond: (a  ('foo'::text COLLATE fr_FR))
(2 rows)



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Theory of operation of collation patch

2011-03-07 Thread Tom Lane

Peter Eisentraut pete...@gmx.net writes:
 On mÃ¥n, 2011-03-07 at 11:43 -0500, Tom Lane wrote:
 ... I now think that the reason it doesn't
 obviously fail to fail is that indcollation is dead code, and so is
 approximately 99% of what you added to the planner, because two
 expressions that are equal() must necessarily have the same collation
 property.  Tracking the collation as a separate property of a pathkey
 is thus a useless activity.  If this conclusion isn't correct, please
 explain why not.

 I'll have to check into these details, but here is a test case that
 shows that it's doing something with the collation of an index:

[ pokes at it some more... ]  It looks like indcollation is acting as a
substitute for including a CollateClause in the index key expression,
which doesn't seem like a particularly good tradeoff considering all the
overhead you must introduce into the default case.

But more to the point, your examples do *not* work for me.  I can
reproduce both failing to use an index that should work, and selecting
an index that doesn't work:

d1u=# CREATE TABLE test (a text);
CREATE TABLE
d1u=# CREATE INDEX test_1 ON test (a);
CREATE INDEX
d1u=# CREATE INDEX test_d ON test (a COLLATE de_DE);
CREATE INDEX
d1u=# CREATE INDEX test_e ON test (a COLLATE es_ES);
CREATE INDEX
d1u=# CREATE INDEX test_f ON test (a COLLATE fr_FR);
CREATE INDEX
d1u=# CREATE INDEX test_fz ON test ((a||'z') COLLATE fr_FR);
CREATE INDEX
d1u=# explain select * from test order by a;
   QUERY PLAN   

 Index Scan using test_f on test  (cost=0.00..63.90 rows=1310 width=32)
(1 row)

d1u=# explain select * from test order by a collate fr_FR;
  QUERY PLAN   
---
 Sort  (cost=90.93..94.20 rows=1310 width=32)
   Sort Key: ((a COLLATE fr_FR))
   -  Seq Scan on test  (cost=0.00..23.10 rows=1310 width=32)
(3 rows)

d1u=# set enable_seqscan TO 0;
SET
d1u=# explain select * from test order by a collate fr_FR;
QUERY PLAN  
  
--
 Sort  (cost=190.93..194.20 rows=1310 width=32)
   Sort Key: ((a COLLATE fr_FR))
   -  Seq Scan on test  (cost=100.00..123.10 rows=1310 
width=32)
(3 rows)

d1u=# explain select * from test order by a||'z';   
   QUERY PLAN
-
 Index Scan using test_fz on test  (cost=0.00..67.18 rows=1310 width=32)
(1 row)

(This is in a database with encoding utf8 and lc_collate = c)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Theory of operation of collation patch

2011-03-07 Thread Martijn van Oosterhout

On Mon, Mar 07, 2011 at 11:43:20AM -0500, Tom Lane wrote:
 Is there any documentation of $SUBJECT?   Because the more I look at
 this patch the more I think it's misdesigned; either that or I
 fundamentally misunderstand what it's trying to do.  I complained
 yesterday about why the planner wasn't using indcollation to identify
 the sort order of an index.  I now think that the reason it doesn't
 obviously fail to fail is that indcollation is dead code, and so is
 approximately 99% of what you added to the planner, because two
 expressions that are equal() must necessarily have the same collation
 property.  Tracking the collation as a separate property of a pathkey
 is thus a useless activity.  If this conclusion isn't correct, please
 explain why not.

The collation is a property of the operators/functions and not of the
values. An individual value does not have a collation, a column does.

A pathkey represents a sort order, right? To define a sort order you
need a collation and so the path key is the natural place to put it.
Two path keys that differ only by the collation are *not* compatable,
but obviously can be converted by a sort node.

After the plan phase, the collation will be ignored in 99% of the
executer, except for operators like =,  and  that need to look at it.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle


signature.asc
Description: Digital signature

Re: [HACKERS] Theory of operation of collation patch

2011-03-07 Thread Tom Lane

Martijn van Oosterhout klep...@svana.org writes:
 On Mon, Mar 07, 2011 at 11:43:20AM -0500, Tom Lane wrote:
 Is there any documentation of $SUBJECT?

 The collation is a property of the operators/functions and not of the
 values. An individual value does not have a collation, a column does.

OK.

 A pathkey represents a sort order, right? To define a sort order you
 need a collation and so the path key is the natural place to put it.

Only if the expression-to-be-sorted does not already fully specify the
collation, which so far as I can tell (either from the code or your
description above) it does.  I think that the explicit representation
of collation as part of the PathKey node is unnecessary, inefficient,
and bug-inducing --- the latter because it promotes fuzzy thinking about
where the collation information is coming from.  (And this isn't just
hypothetical: IMO the bugs I exhibited upthread are *directly* due to
fuzzy thinking about what defines an index's sort order.)

Or, to put it another way: the properties that define a sort order are
the sort comparison operator, the collation, the ASC/DESC bit, and the
NULLS FIRST/LAST bit.  Given the way that the SQL committee has
constructed the language, the operator and the two flag bits are
attached to the ORDER BY clause, but the collation is a property of the
expression-to-be-sorted.  If we fail to preserve that distinction in the
internal representation, we're just creating problems for ourselves.

I'm willing to take a pass at fixing this code during the alpha cycle,
but I want to be sure I understand it correctly first.  So, if there's
a hole in my thinking, please point it out.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Theory of operation of collation patch

Re: [HACKERS] Theory of operation of collation patch

Re: [HACKERS] Theory of operation of collation patch

Re: [HACKERS] Theory of operation of collation patch

Re: [HACKERS] Theory of operation of collation patch

Re: [HACKERS] Theory of operation of collation patch

Re: [HACKERS] Theory of operation of collation patch

Re: [HACKERS] Theory of operation of collation patch

Re: [HACKERS] Theory of operation of collation patch

Re: [HACKERS] Theory of operation of collation patch

Re: [HACKERS] Theory of operation of collation patch

Re: [HACKERS] Theory of operation of collation patch

Re: [HACKERS] Theory of operation of collation patch

Re: [HACKERS] Theory of operation of collation patch

[HACKERS] Theory of operation of collation patch

Re: [HACKERS] Theory of operation of collation patch

Re: [HACKERS] Theory of operation of collation patch

Re: [HACKERS] Theory of operation of collation patch

Re: [HACKERS] Theory of operation of collation patch

19 matches

Site Navigation

Mail list logo

Footer information