Re: [HACKERS] Removing SORTFUNC_LT/REVLT

2006-01-01 Thread Andrew - Supernews
On 2005-12-29, Tom Lane [EMAIL PROTECTED] wrote:
 Well, no, that's not the problem: the problem is that you should be able
 to specify ORDER BY any sort ordering that the system can deal with, and
 the USING syntax is in fact too impoverished to do that.  What if the
 mentioned operator is in more than one operator class?  I believe that
 ATM the code makes a random choice of which opclass' sort function to
 use, which pretty much sucks.

Does it matter? How would the same operator specify different orderings
in different operator classes, given that it must be a strict weak ordering
for sorting to even work, and such an ordering is completely determined by
either one of its greater-than/less-than operators?

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Removing SORTFUNC_LT/REVLT

2006-01-01 Thread Martijn van Oosterhout
On Sun, Jan 01, 2006 at 07:48:56AM -, Andrew - Supernews wrote:
 Doesn't this result in incorrect output in multi-column sorts?
 
 i.e. if 'Foo' = 'foo', but for sorting purposes you always sort them
 with 'Foo' first, then a multicolumn sort of the following data:
 
 ('Foo',1)
 ('foo',2)
 ('Foo',3)
 
 would produce the wrong output, no?

Yeah, I was thinking about that but I'm not sure how big a problem it
is.

We could make this work by having the sorting routines (and by
extension the lookups for indexes) by doing two passes: one with the
comparison routines and use the actual sorting routines only on
tie-breaks. A kind of big mark against the idea. Although this is
exactly the kind of work required for string sorting in many locales...

Whatever we do, the result of ORDER BY has to match index order. Does
this mean scankeys need to have two functions, in case of tie-breaking
multi-column keys. I dunno but it needs thinking about.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgpoA71wMSYFC.pgp
Description: PGP signature


Re: [HACKERS] Removing SORTFUNC_LT/REVLT

2006-01-01 Thread Martijn van Oosterhout
On Thu, Dec 29, 2005 at 04:33:32PM -, Andrew - Supernews wrote:
 On 2005-12-29, Tom Lane [EMAIL PROTECTED] wrote:
  Well, no, that's not the problem: the problem is that you should be able
  to specify ORDER BY any sort ordering that the system can deal with, and
  the USING syntax is in fact too impoverished to do that.  What if the
  mentioned operator is in more than one operator class?  I believe that
  ATM the code makes a random choice of which opclass' sort function to
  use, which pretty much sucks.
 
 Does it matter? How would the same operator specify different orderings
 in different operator classes, given that it must be a strict weak ordering
 for sorting to even work, and such an ordering is completely determined by
 either one of its greater-than/less-than operators?

Well, we currently don't forbid it and indeed encourage it (by
encouraging reverse operator classes) as the only way to handle the
ORDER a, b DESC case right now.

I don't think I can find any other examples right now. I don't think
I'd have a problem with forbidding it at some future date.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgpWfmh0w16LM.pgp
Description: PGP signature


Re: [HACKERS] Removing SORTFUNC_LT/REVLT

2006-01-01 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 On Thu, Dec 29, 2005 at 04:33:32PM -, Andrew - Supernews wrote:
 Does it matter? How would the same operator specify different orderings
 in different operator classes,

 Well, we currently don't forbid it and indeed encourage it (by
 encouraging reverse operator classes) as the only way to handle the
 ORDER a, b DESC case right now.
 I don't think I can find any other examples right now. I don't think
 I'd have a problem with forbidding it at some future date.

Right, the reverse-sort opclass is the only practical example that
anyone's pointed out ... for btree.  For GiST it would be a serious
error to try to restrict operators to appear in at most one opclass.
Therefore, we're not going to be forbidding it, and the code has to
behave in a sane fashion if someone does it.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Removing SORTFUNC_LT/REVLT

2005-12-31 Thread Martijn van Oosterhout
On Sat, Dec 31, 2005 at 12:58:19AM -0500, Greg Stark wrote:
 I think this is a mistake -- the same mistake that got us into trouble with
 Turkish.
 
 Hashing depends on the concept of equality which is integral to the type. Two
 things are either the same or they aren't, and that can't change based on
 context.

So someone who wants a case-insensetive search actually doesn't want
Foo to equal foo? If you're arguing that that should be a different
type, well, that's a possibility. But does that mean someone who wants
an accent insensetive match also needs a new type? And a phonebook
match, where Mc and Mac are the same?

It was my understanding that the problem with Turkish/Hungarian was the
we only allow one collation for strings over the whole database. The
point is that in the future you will be able to select this on a per
column/index/query basis, so we don't need to stick to such a
restriction if the user explicitly asks to ignore it.

On a more practical level, a Hash Join needs to produce the same
results as a Merge Join, so if (a = b) then (hash(a) = hash(b)). So if
the user types (a = b COLLATE ignorecase) then the hash function needs
to change to match.

 Specifically in the case of strings, two strings should only be considered
 equal if they consist of the exact same series of characters. (That is, they
 could be encoded differently but they have to encode the same actual
 characters.) That they happen to sort equally compared to all other strings
 doesn't mean that they're equal.

Sure, for straight strings (COLLATE POSIX), that's absolutly a
requirement. But people also have other requirements, like treating
strings case-insensetively. I don't think we should restrict ourselves
to not being able to support their wishes.

You do bring up the possibility of secondary sort functions. Functions
which are not involved in testing for equality, but provide addition
sorting so that even in a case-insensetive sort, the different
variations in case appear together. All variations are equal, but some
are more equal than others type setup.

Thanks for the feedback,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgp1xpk5wgBqf.pgp
Description: PGP signature


Re: [HACKERS] Removing SORTFUNC_LT/REVLT

2005-12-31 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 On Sat, Dec 31, 2005 at 12:58:19AM -0500, Greg Stark wrote:
  Two things are either the same or they aren't, and that can't change
 based on context.

 So someone who wants a case-insensetive search actually doesn't want
 Foo to equal foo?

That nice simple worldview falls down in other areas as well.  An
example is zero and minus zero in IEEE math: they are equal for some
purposes but not others.  I think you really have to say that equality
is defined with respect to a particular datatype and a particular set
of operators.

The example of case-insensitive sorting suggests that we need to assume
that sort comparison functions can make finer-grained comparisons than
the associated equals operator does.  The current infrastructure
forces these to be exactly the same, but as long as we're busy
reinventing stuff, we could have two comparison functions associated
with a btree opclass: one that mimics the operators' behavior and one
that makes finer-grained comparisons and defines the actual sort order.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Removing SORTFUNC_LT/REVLT

2005-12-31 Thread Martijn van Oosterhout
On Sat, Dec 31, 2005 at 02:54:18PM -0500, Tom Lane wrote:
 The example of case-insensitive sorting suggests that we need to assume
 that sort comparison functions can make finer-grained comparisons than
 the associated equals operator does.  The current infrastructure
 forces these to be exactly the same, but as long as we're busy
 reinventing stuff, we could have two comparison functions associated
 with a btree opclass: one that mimics the operators' behavior and one
 that makes finer-grained comparisons and defines the actual sort order.

Indeed, that's exactly the thought I had this afternoon, distiguish a
collation and a comparison function. It's certainly a lot easier to
implement than anything else I could think of

Have a great New Year everyone,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgpLhmlXjIqDY.pgp
Description: PGP signature


Re: [HACKERS] Removing SORTFUNC_LT/REVLT

2005-12-31 Thread Andrew - Supernews
On 2005-12-31, Martijn van Oosterhout kleptog@svana.org wrote:
 You do bring up the possibility of secondary sort functions. Functions
 which are not involved in testing for equality, but provide addition
 sorting so that even in a case-insensetive sort, the different
 variations in case appear together. All variations are equal, but some
 are more equal than others type setup.

Doesn't this result in incorrect output in multi-column sorts?

i.e. if 'Foo' = 'foo', but for sorting purposes you always sort them
with 'Foo' first, then a multicolumn sort of the following data:

('Foo',1)
('foo',2)
('Foo',3)

would produce the wrong output, no?

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Removing SORTFUNC_LT/REVLT

2005-12-30 Thread Martijn van Oosterhout
On Thu, Dec 29, 2005 at 10:49:23AM -0500, Tom Lane wrote:
 What I'd really like is to deprecate the USING operator syntax in
 favor of a USING operatorclassname syntax.  Actually, USING opclass
 [ASC/DESC] would get the job done, since given an opclass you can
 certainly run the sort function either normal or reverse.

Thought of something this morning: this seems OK at first glance but I
don't think it's workable. The example being locale dependant sorting.
I really don't think we want to create a new operator class for each
possible way you can sort strings.

Collations (currently anyway) are really just an operator class +
[ASC/DESC] + optional locale rolled into a single identifier. The idea
being that when you use '', the collation on that node decides
unambiguously which version of '' you mean, rather than looking up the
operator and trying to guess which operator class you meant.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgpugX7Tq8Z5h.pgp
Description: PGP signature


Re: [HACKERS] Removing SORTFUNC_LT/REVLT

2005-12-30 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 On Thu, Dec 29, 2005 at 10:49:23AM -0500, Tom Lane wrote:
 What I'd really like is to deprecate the USING operator syntax in
 favor of a USING operatorclassname syntax.  Actually, USING opclass
 [ASC/DESC] would get the job done, since given an opclass you can
 certainly run the sort function either normal or reverse.

 Thought of something this morning: this seems OK at first glance but I
 don't think it's workable. The example being locale dependant sorting.
 I really don't think we want to create a new operator class for each
 possible way you can sort strings.

Well, you would need to add a COLLATE layer on top of this in just the
same way as you'd need a COLLATE layer now if you want locale-dependent
sorting.  I didn't claim it handled that; just pointing out that COLLATE
doesn't handle this, either.

 Collations (currently anyway) are really just an operator class +
 [ASC/DESC] + optional locale rolled into a single identifier.

I really need to study your mail from the other day, but unfortunately
other pressures will probably keep me from getting to it today :-(.
One comment though --- it's not really sane to include ASC/DESC in there
is it?  I thought the spec wanted ORDER BY foo COLLATE bar [ASC/DESC]
... or if not, users certainly will.  If every single collation has to
be created in a matched ASC/DESC pair, you've done it wrong.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Removing SORTFUNC_LT/REVLT

2005-12-30 Thread Martijn van Oosterhout
On Fri, Dec 30, 2005 at 10:18:48AM -0500, Tom Lane wrote:
 I really need to study your mail from the other day, but unfortunately
 other pressures will probably keep me from getting to it today :-(.
 One comment though --- it's not really sane to include ASC/DESC in there
 is it?  I thought the spec wanted ORDER BY foo COLLATE bar [ASC/DESC]
 ... or if not, users certainly will.  If every single collation has to
 be created in a matched ASC/DESC pair, you've done it wrong.

Well, that's an interesting question. From a user point of view,
forward and backwards isn't a real big deal. Internally it is though.
Two places really care:

- Pathkeys
- Index keys

For pathkeys in particular, I really don't like the propect of going
through all the planning code changing a single pathkey (which is
currently just the oid of the sorting operator) to a
collation plus direction pair.

Index keys are similar, though the code to change it not as much. We'd
still need to invent another column to store the direction.

For users ofcourse, we want them to look similar. Currently forward and
backward collations have different names and different OIDs, though
it's quite reasonable to give them the same name and require the user
to say DESC to access the reverse collation (still with different
OIDs).

However, the real reason I think they should both be declared is that
as actual objects, they don't resemble eachother very much. For
example, an int4 forward collation can be merge joined with an int2
forward collation but not with an int2 backward collation. When you
start considering relationships between collations, forwards and
backwards behave very differently.

Final food for thought: does the xid type have a collation. There's
no b-tree operator class but it does have the concept of equality. It
has a valid hashing function. Is this a collation? If so it has no
order (so ASC/DESC don't apply). Yet hashing is also a property of the
collation, not the type. The same string in different locales would
hash differently.

Where this heads is that a user creates a type, then a collation by
specifying an order and a hash function. The system then creates the
operator classes in the background. This is a radical departure from
where we are now and not something I'm seriously proposing.

Don't worry about how long it takes to think about it. 'tis the season
to be jolly, no? I myself will be offline for most of next week anyway
and the next few days are to be enjoyed.

Seasons greetings,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgp1AXRb2Cbax.pgp
Description: PGP signature


Re: [HACKERS] Removing SORTFUNC_LT/REVLT

2005-12-30 Thread Greg Stark

Martijn van Oosterhout kleptog@svana.org writes:

 Yet hashing is also a property of the collation, not the type. The same
 string in different locales would hash differently.

I think this is a mistake -- the same mistake that got us into trouble with
Turkish.

Hashing depends on the concept of equality which is integral to the type. Two
things are either the same or they aren't, and that can't change based on
context.

Specifically in the case of strings, two strings should only be considered
equal if they consist of the exact same series of characters. (That is, they
could be encoded differently but they have to encode the same actual
characters.) That they happen to sort equally compared to all other strings
doesn't mean that they're equal.

-- 
greg


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Removing SORTFUNC_LT/REVLT

2005-12-29 Thread Martijn van Oosterhout
On Wed, Dec 28, 2005 at 07:38:36PM -0500, Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
  The issue is whether anything you want to ORDER BY needs to be
  described by an B-tree operator class, and hence have a real sort
  function.
 
 I think it's reasonable to remove that feature, *after* we provide
 a workable substitute.  So, no to both questions ...

Hmm. By feature I assume you mean ORDER BY ... USING (which no-one
could find an example of) and not requiring the operator to be part of
an opclass.

The only people affected would be people who defined a less-than
operator but no operator class, which you said yourself would probably
just be encouraging programmer lazyness. I wasn't suggesting removing
the ORDER BY ... USING syntax, just these two options from the sorting
routines.

In fact, I don't think we ever need to remove the syntax, just as long
as the operator is part of an operator class, it'll be fine.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgpiZF30UI8xD.pgp
Description: PGP signature


Re: [HACKERS] Removing SORTFUNC_LT/REVLT

2005-12-29 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 Hmm. By feature I assume you mean ORDER BY ... USING (which no-one
 could find an example of) and not requiring the operator to be part of
 an opclass.

 In fact, I don't think we ever need to remove the syntax, just as long
 as the operator is part of an operator class, it'll be fine.

Well, no, that's not the problem: the problem is that you should be able
to specify ORDER BY any sort ordering that the system can deal with, and
the USING syntax is in fact too impoverished to do that.  What if the
mentioned operator is in more than one operator class?  I believe that
ATM the code makes a random choice of which opclass' sort function to
use, which pretty much sucks.

I haven't had time yet to digest the material you posted yesterday about
COLLATE.  Maybe there's a solution in there, but I think it could only
happen if we assume that every potential sort operator appears in only
one opclass.  Which seems like a pretty restrictive assumption, even
granted that COLLATE will start to carry some of the load of picking
different sorting options.

What I'd really like is to deprecate the USING operator syntax in
favor of a USING operatorclassname syntax.  Actually, USING opclass
[ASC/DESC] would get the job done, since given an opclass you can
certainly run the sort function either normal or reverse.

We could keep the USING operator syntax but insist that it's only
allowed if there's exactly one possible opclass mapping.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Removing SORTFUNC_LT/REVLT

2005-12-29 Thread Martijn van Oosterhout
On Thu, Dec 29, 2005 at 10:49:23AM -0500, Tom Lane wrote:
 Well, no, that's not the problem: the problem is that you should be able
 to specify ORDER BY any sort ordering that the system can deal with, and
 the USING syntax is in fact too impoverished to do that.  What if the
 mentioned operator is in more than one operator class?  I believe that
 ATM the code makes a random choice of which opclass' sort function to
 use, which pretty much sucks.

Ah, that problem. Yeah, at this stage we can't really do much about
that. All I've done at this stage is made it so that if the operator
isn't a member of any operator class it displays:

ERROR: Couldn't find order function associated with operator XXX
HINT: Create a B-tree operator class with this operator in it

Eventually the COLLATE option should make the choice unambiguous. But
I'm not there yet.

One benefit right now is that it permits some code reorganisation to
remove the myFunctionCall2 hack, by passing FunctionCallInfo rather
than FmgrInfo.

 We could keep the USING operator syntax but insist that it's only
 allowed if there's exactly one possible opclass mapping.

Can't do that yet, that breaks reverse operator classes. Ofcourse, we
may be able to do it eventually if there's another way to do the same
thing. The main problem being that ORDER BY x ASC is actually made
equivalent to ORDER BY x USING , so the code still has to work for the
USING case to not break the normal case.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgpw2Y8u7jY9k.pgp
Description: PGP signature


[HACKERS] Removing SORTFUNC_LT/REVLT

2005-12-28 Thread Martijn van Oosterhout
Greetings everybody,

As part of the changes I would like relating to collations, the two
sort options in inlineApplySortFunction() using only a less-than
operator may become unsupportable. This has been indirectly discussed
before [1] and no-one seemed to have an issue with it then. Internally
nothing in PostgreSQL needs it, but if we want to keep supporting that
ORDER BY ... USING clause it would have to stay.

The issue is whether anything you want to ORDER BY needs to be
described by an B-tree operator class, and hence have a real sort
function.

1. Do people have any problems with this?
2. Would a patch for this be accepted seperate from the whole collation
stuff?

Have a nice day,

[1] http://archives.postgresql.org/pgsql-hackers/2005-09/msg00784.php
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgpA0lcLWPQHn.pgp
Description: PGP signature


Re: [HACKERS] Removing SORTFUNC_LT/REVLT

2005-12-28 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 The issue is whether anything you want to ORDER BY needs to be
 described by an B-tree operator class, and hence have a real sort
 function.

 1. Do people have any problems with this?
 2. Would a patch for this be accepted seperate from the whole collation
 stuff?

I think it's reasonable to remove that feature, *after* we provide
a workable substitute.  So, no to both questions ...

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster