Re: [HACKERS] Fixing row comparison semantics

2005-12-28 Thread Bruno Wolff III
On Mon, Dec 26, 2005 at 15:12:48 -0500,
  Gregory Maxwell <[EMAIL PROTECTED]> wrote:
> On 12/26/05, Pavel Stehule <[EMAIL PROTECTED]> wrote:
> > (1,1) <* (1,2) = true
> > (1,2) <* (2,1) is NULL
> > (2,3) <* (1,2) = false
> >
> > it's usefull for multicriterial optimalisation
> 
> This is indeed a sane and useful function which should be adopted by
> the SQL standard.. in postgresql this would easily enough be
> implemented as a user function so I'm not sure we need special support
> for it.
> 
> The idea is that in a multidimension comparison you can only sometimes
> say when one tuple is strictly less than (or greater than) another
> because differing dimensions are incomparable.  So, like his example,
> we can not say if (1,2) is lesser or greater than (2,1) because saying
> so would require some priority of the dimensions which may not be
> known or may not exist, it is only clear that they are not equal..

That's normally called a partial order. From CS classes, I don't remember
ever seeing < and > combined that way. Normally you just looked at whether
or not < was true or whether or not > was true.
Is it common in practice for people to want the answer to both of these at
once? Also if you are really dealing with <= and >= (as your example above
seems to imply), then there are 4 possible states (the new one being a = b)
and you can't represent that with just 3 possible results.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Fixing row comparison semantics

2005-12-26 Thread Martijn van Oosterhout
Greetings to all,

On Mon, Dec 26, 2005 at 10:04:59AM -0500, Tom Lane wrote:
> I would recommend posting some fairly detailed design discussions
> concerning what you see as the new semantics, API, and catalog
> representation for operators and operator classes.  If you haven't
> got buy-in at that level from the hackers list, it's premature to be
> writing any code at all.

Well, the problem works the other way around too. Unless you're
familiar with postgres internals you can't talk sensebly about the
changes required. Until two days ago I wasn't sure that my
representation was sufficient for everything I wanted it to do.

Anyway, the details posted before to -patches [1] and to -hackers [2]
produced much discussion about locales but didn't consider the actual
patch itself. I thought they were quite detailed and got exactly one
useful reaction: don't confuse collations and locales. Good advice and
the patch is better for it but I figured I needed to come up with
something more substantial for people to take notice.

Also, don't be confused about the time interval, there was zero
development on it from September until a few days ago, the changes
relative to then are not that large.

Now I can (almost) initdb again and have no more catalog changes I was
thinking of posting another patch with summary and see how far it gets
this time. BTW, my stuff won't change how ScanKeys work, they'll
probably just provide another way to initialise them.

Have a nice day,

[1] http://archives.postgresql.org/pgsql-patches/2005-09/msg00022.php
[2] http://archives.postgresql.org/pgsql-hackers/2005-09/msg00110.php

PS. The web archive for pg-patches seems to be lagging about 4 days
at the moment.
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


pgpXciz1jLdsS.pgp
Description: PGP signature


Re: [HACKERS] Fixing row comparison semantics

2005-12-26 Thread Pavel Stehule


Huh?  The only "current behavior" with other operators is failure:


you didn't understand me. I know so operator <* isn't supported now.
I prefere SQL spec behave too. But what I wont:

a <* b   ~ ai <= bi and one ai < bi => true ; if one ai > bi => NULL; else 
false


but this behave is from some views really chaotic. This comparation is used 
in operation research, but propably is far to ideas ANSI SQL. It was  only 
idea.




regression=# select (1,1) <* (1,2);
ERROR:  operator <* is not supported for row expressions

In any case, you can get the equivalent of the current behavior by
writing out
1 <* 1 AND 1 <* 2
so I don't see any strong need to support non-SQL-spec behaviors here.

regards, tom lane


_
Najdete si svou lasku a nove pratele na Match.com. http://www.msn.cz/


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Fixing row comparison semantics

2005-12-26 Thread Tom Lane
"Pavel Stehule" <[EMAIL PROTECTED]> writes:
>> Right now, '(a, b) < (1, 2)' is processed as 'a < 1 and b < 2', but
>> the SQL standard requires it to be processed as a column-by-column
>> comparison, so the proper comparison is '(a < 1) OR (a = 1 AND b < 2)'

> Can we save current behave (with small modification) with other operator, 
> like <*

Huh?  The only "current behavior" with other operators is failure:

regression=# select (1,1) <* (1,2);
ERROR:  operator <* is not supported for row expressions

In any case, you can get the equivalent of the current behavior by
writing out
1 <* 1 AND 1 <* 2
so I don't see any strong need to support non-SQL-spec behaviors here.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Fixing row comparison semantics

2005-12-26 Thread Tom Lane
Martijn van Oosterhout  writes:
> If you want technical details I can do that too (the summary on
> pg-patches a while ago is now wildly out of date). Currently I'm trying
> to get up to speed on pathkeys and indexes before the tree drifts too
> far...

I've given this advice before to other people: trying to develop a large
patch "in hiding" is doomed to failure.  And the sort of patch you are
talking about isn't large ... it's massive.  Combine that with the fact
that you don't even seem to have gotten any pghackers buy-in yet on what
you are doing, and you are setting yourself up to have the patch
rejected, in the unlikely scenario that it's ever completed.  My bet is
that you by yourself will be unable to complete it, simply because the
tree will drift under you faster than you can respond.  (Case in point:
my current project on row-wise comparisons is going to affect ScanKeys.
I'm not sure how yet, but in designing that I won't be considering what
impact it might have on you, because I have no idea what you might be
trying to do in that area.)

I would recommend posting some fairly detailed design discussions
concerning what you see as the new semantics, API, and catalog
representation for operators and operator classes.  If you haven't
got buy-in at that level from the hackers list, it's premature to be
writing any code at all.

I would further recommend that you ask for help rather than trying to
complete the project by yourself.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Fixing row comparison semantics

2005-12-26 Thread Martijn van Oosterhout
On Mon, Dec 26, 2005 at 01:29:19PM +0100, Pavel Stehule wrote:
> Can we save current behave (with small modification) with other operator, 
> like <*
> 
> (1,1) <* (1,2) = true
> (1,2) <* (2,1) is NULL
> (2,3) <* (1,2) = false
> 
> it's usefull for multicriterial optimalisation

That's strange. That's not just doing less-than, but also
distinguishing between equal-to and greater-than. So at the very least
you've have to choose an operator like <=>.

Seems to me you should just define your own operator on an array type
and use that. I don't think the above could use an index scan for
speeding it up so there's no point trying to treat it specially.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


pgplbfPH8OAZT.pgp
Description: PGP signature


Re: [HACKERS] Fixing row comparison semantics

2005-12-26 Thread Pavel Stehule


TODO updated:

* %Make row-wise comparisons work per SQL spec

  Right now, '(a, b) < (1, 2)' is processed as 'a < 1 and b < 2', but
  the SQL standard requires it to be processed as a column-by-column
  comparison, so the proper comparison is '(a < 1) OR (a = 1 AND b < 2)'




Can we save current behave (with small modification) with other operator, 
like <*


(1,1) <* (1,2) = true
(1,2) <* (2,1) is NULL
(2,3) <* (1,2) = false

it's usefull for multicriterial optimalisation

Regards
Pavel Stehule

_
Chcete sdilet sve obrazky a hudbu s prateli? http://messenger.msn.cz/


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Fixing row comparison semantics

2005-12-26 Thread Bruce Momjian
Tom Lane wrote:
> Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
> > Can someone explain to me how:
> > (a, b) < (1, 2)
> > is different to
> > a < 1 and b < 2
> 
> Right at the moment our code interprets it that way, but this behavior
> is wrong per spec.  It should be an ordered column-by-column comparison,
> so that the equivalent simple expression is
> 
>   (a < 1) OR (a = 1 AND b < 2)

TODO updated:

* %Make row-wise comparisons work per SQL spec

  Right now, '(a, b) < (1, 2)' is processed as 'a < 1 and b < 2', but
  the SQL standard requires it to be processed as a column-by-column
  comparison, so the proper comparison is '(a < 1) OR (a = 1 AND b < 2)'


-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Fixing row comparison semantics

2005-12-26 Thread Bruce Momjian
Alvaro Herrera wrote:
> Martijn van Oosterhout wrote:
> 
> > > src/tools/make_ctags is your friend...
> > 
> > That just shows you where a symbol is defined, not where it's called
> > from. When you change the parameters of a function, you need to make
> > sure you found all the places that used it...
> > 
> > IOW, it's good for going down the call tree, but not up it.
> 
> I use cscope for that.

There is also id-utils, which is mentioned in our developer FAQ.  That's
what I use, but it is strictly command-line or 'edit all files with X',
while I think glimpse and cscope are more like applications, though they
probably have command-line interfaces too.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Fixing row comparison semantics

2005-12-25 Thread Martijn van Oosterhout
On Sat, Dec 24, 2005 at 09:38:23AM -0500, Tom Lane wrote:
> Are you suggesting that COLLATE will impose comparison semantics on
> all datatypes including non-string types?  If so, I'd be interested
> to know what you have in mind.  If not, claiming that it makes the
> issue go away is nonsensical.

Well, yes, on all data types. It needs to be done for string types and
it would be nice for user-defined data types, so you may as well do it
for all types. It avoids adding special cases, which is a good thing,
IMHO.

Every data type has at least two collations, ascending and descending.
So instead of all the current stuff with reverse operator classes,
you'll just be able to declare your index as:

CREATE INDEX blah ON foo (a, b COLLATE DESC);

And it'll be able to be used for queries using ORDER BY a, b DESC.

String data types are just the obvious example of types that have many
different collations and they do have the most possibilities. But I
think that user-defined collations would be a powerful idea. All they
need to do is create a btree operator class that describes the basic
order and then they can use this as a collation anywhere they like.

I hope you are not thinking of restricting collations to just string
types, because the special cases would be dreadful. Doing it this way
just means that most places dealing with order only need to worry about
the collation eg pathkeys and not the implementation. 

Technically speaking, NULLS FIRST/LAST are also a form of collation but
I'm not going to touch those until I can at least replicate current
functionality, and they are not relevent for row comparisons anyway.
Collations to operator classes are a many-to-one relationship. I can
see situations where you would have 20 collations using a single
operator class.

Locale specific ordering is really just a subset of collation. At the
moment it just uses the xlocale support present in glibc/MacOS X/Win32
but my hope is that it will be pluggable in the sense that you'll be
able to say:

CREATE LOCALE hungarian AS 'hu_HU' USING glibc;
CREATE LOCALE serbian_us AS '[EMAIL PROTECTED]' USING icu;

(The latter being: Serbian (Latin, Yugoslavia, Revised Orthography,
Currency=US Dollar. Example taken from ICU website).

Then you can use these in column declarations and have them
automatically use that locale for comparisons. It isn't as hard as it
looks but it does touch a lot of different places in the backend. 

If you want technical details I can do that too (the summary on
pg-patches a while ago is now wildly out of date). Currently I'm trying
to get up to speed on pathkeys and indexes before the tree drifts too
far...

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


pgpygvGx7t2qJ.pgp
Description: PGP signature


Re: [HACKERS] Fixing row comparison semantics

2005-12-24 Thread Alvaro Herrera
Martijn van Oosterhout wrote:

> > src/tools/make_ctags is your friend...
> 
> That just shows you where a symbol is defined, not where it's called
> from. When you change the parameters of a function, you need to make
> sure you found all the places that used it...
> 
> IOW, it's good for going down the call tree, but not up it.

I use cscope for that.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Fixing row comparison semantics

2005-12-24 Thread Tom Lane
Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
> Can someone explain to me how:
> (a, b) < (1, 2)
> is different to
> a < 1 and b < 2

Right at the moment our code interprets it that way, but this behavior
is wrong per spec.  It should be an ordered column-by-column comparison,
so that the equivalent simple expression is

(a < 1) OR (a = 1 AND b < 2)

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Fixing row comparison semantics

2005-12-24 Thread Tom Lane
Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
>> Now, since COLLATE support is still in progress, I'm not sure how much
>> any of this helps you. I'm up to modifying the scankeys but it's hard
>> when you jave to keep rgrepping the tree to work out what is called
>> from where...

> src/tools/make_ctags is your friend...

If grep -r is too slow for you, there's a package called glimpse that
I've used for years.  It builds a full-text index of any specified
collection of files, and then does grep-like searches nearly
instantaneously.  The output format is the same as grep so you can
teach emacs to visit all the hits.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Fixing row comparison semantics

2005-12-24 Thread Tom Lane
Martijn van Oosterhout  writes:
> One thing my COLLATE patch does is distinguish between collations and
> operator classes. So the reverse operator class issue disappears
> because it's just a collation and doesn't need a operator class
> (although it won't break anything, see below).

Are you suggesting that COLLATE will impose comparison semantics on
all datatypes including non-string types?  If so, I'd be interested
to know what you have in mind.  If not, claiming that it makes the
issue go away is nonsensical.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Fixing row comparison semantics

2005-12-24 Thread Martijn van Oosterhout
On Sat, Dec 24, 2005 at 12:25:41PM +0100, Peter Eisentraut wrote:
> Am Samstag, 24. Dezember 2005 11:46 schrieb Martijn van Oosterhout:
> > That just shows you where a symbol is defined, not where it's called
> > from.
> 
> I've never used ctags, but etags certainly do what you ask for.

Really? I've searched the emacs documentation but don't see it. M-.
finds the definition of a tag, but how do you find usages of a tag?

Thanks in advance,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


pgp3qSXE2cOJa.pgp
Description: PGP signature


Re: [HACKERS] Fixing row comparison semantics

2005-12-24 Thread Peter Eisentraut
Am Samstag, 24. Dezember 2005 11:46 schrieb Martijn van Oosterhout:
> That just shows you where a symbol is defined, not where it's called
> from.

I've never used ctags, but etags certainly do what you ask for.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Fixing row comparison semantics

2005-12-24 Thread Martijn van Oosterhout
On Sat, Dec 24, 2005 at 06:05:58PM +0800, Christopher Kings-Lynne wrote:
> >Now, since COLLATE support is still in progress, I'm not sure how much
> >any of this helps you. I'm up to modifying the scankeys but it's hard
> >when you jave to keep rgrepping the tree to work out what is called
> >from where...
> 
> src/tools/make_ctags is your friend...

That just shows you where a symbol is defined, not where it's called
from. When you change the parameters of a function, you need to make
sure you found all the places that used it...

IOW, it's good for going down the call tree, but not up it.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


pgpFVFoqg15RC.pgp
Description: PGP signature


Re: [HACKERS] Fixing row comparison semantics

2005-12-24 Thread Christopher Kings-Lynne

Now, since COLLATE support is still in progress, I'm not sure how much
any of this helps you. I'm up to modifying the scankeys but it's hard
when you jave to keep rgrepping the tree to work out what is called
from where...


src/tools/make_ctags is your friend...

Chris

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Fixing row comparison semantics

2005-12-24 Thread Martijn van Oosterhout
On Fri, Dec 23, 2005 at 03:18:21PM -0500, Tom Lane wrote:
> I've gotten interested again in the issue of row comparisons, eg
>   (a, b, c) >= (1, 2, 3)
> We've discussed this before, the most comprehensive thread being
> http://archives.postgresql.org/pgsql-performance/2004-07/msg00188.php
> but nothing's gotten done.  Unless someone's already working on this
> I think I will take it up.



Since this is related to the COLLATE stuff I'm working on I'd like to
make a few comments.

> 1. Determine which interpretations (btree strategy numbers) exist for
> each pairwise operator.  There must be at least one interpretation that
> is common to all the operators, else fail (for instance, it doesn't help
> if we can identify one operator as "<" and another as ">").

One thing my COLLATE patch does is distinguish between collations and
operator classes. So the reverse operator class issue disappears
because it's just a collation and doesn't need a operator class
(although it won't break anything, see below).

> 2. If there is more than one common interpretation, prefer the one that
> uses the largest number of default opclasses.  If there's a tie, we
> could either reject the construct as ambiguous, or select one of the
> possibilities arbitrarily ... any thoughts about that?

In standard SQL, each node in a query has a collation. Columns use the
collation they were given when the table was created, constants use the
default for the type. It's a little more complicated than that, see the
standard for details.

Anyway, a collation identifies a btree operator class so this problem
solves itself. For each pair of values you are comparing, determine the
collation and look up the operator class to ensure you're using the
same strategy type. There are minor details relating to reverse
collations but they're minor.

The only problem reverse operator classes bring here is that the system
won't realise it and thus won't know that the index is usable. Unless
the user specifies the collation as part of the query.

> 3. A given operator could have the selected interpretation in more than
> one opclass.  Prefer the default opclass if any; otherwise, again we
> have the choice of rejecting or making an arbitrary choice.

If there's a problem, bail. The standard allows you to specify the
collation on a per node basis so any ambiguities can be resolved by the
user.

So something like:

(a COLLATE hungarian, b COLLATE posix, c COLLATE ignorecase) >= ('x','y','z')

Would know exactly what to do (and if you could use an index)...

Now, since COLLATE support is still in progress, I'm not sure how much
any of this helps you. I'm up to modifying the scankeys but it's hard
when you jave to keep rgrepping the tree to work out what is called
from where...

For other people reading this thread, the reason why it can't be
decomposed into (a>=1 AND b>=2 AND c>=3) is because the standard treats
the row as a unit, checking left to right, so:

(4,0,0) < (5,0,0)
(1,2,3) > (0,7,8)

So it needs a new node type and needs to know which index to use.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


pgpajDYwByGz1.pgp
Description: PGP signature


Re: [HACKERS] Fixing row comparison semantics

2005-12-24 Thread Christopher Kings-Lynne

I've gotten interested again in the issue of row comparisons, eg
(a, b, c) >= (1, 2, 3)
We've discussed this before, the most comprehensive thread being
http://archives.postgresql.org/pgsql-performance/2004-07/msg00188.php
but nothing's gotten done.  Unless someone's already working on this
I think I will take it up.


Can someone explain to me how:

(a, b) < (1, 2)

is different to

a < 1 and b < 2

?

Chris

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings