subject:"\[HACKERS\] WIP\: Range Types"

Re: [HACKERS] WIP: Range Types

2011-01-12 Thread Robert Haas

On Wed, Jan 12, 2011 at 2:16 AM, Martijn van Oosterhout
klep...@svana.org wrote:
 On Tue, Jan 11, 2011 at 09:24:08PM -0500, Robert Haas wrote:
 commit 6c412f0605afeb809014553ff7ad28cf9ed5526b
 Author: Tom Lane t...@sss.pgh.pa.us
 Date:   Sun May 1 18:56:19 2005 +

     Change CREATE TYPE to require datatype output and send functions to have
     only one argument.  (Per recent discussion, the option to accept multiple
     arguments is pretty useless for user-defined types, and would be a likely
     source of security holes if it was used.)  Simplify call sites of
     output/send functions to not bother passing more than one argument.

 ...but I don't understand the motivation behind it.

 IIRC, the issue is that a type output function has to interpret a
 Datum. Type output functions can also be called by users, so it is not
 guarenteed that the given OID would actually match the type of the
 Datum given. Hence the decision that the output function must be able
 to determine itself what kind of Datum it is dealing with.

 Thought experiment: the datum is an integer, but the oid says it's a
 pass-by-ref datum. Now the code may now to use the integer to derefence
 an arbitrary place in memory.

I guess that begs the question of why we need to allow users to call
type output functions directly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-12 Thread David E. Wheeler

On Jan 12, 2011, at 8:48 AM, Robert Haas wrote:

 I guess that begs the question of why we need to allow users to call
 type output functions directly.

I've used them quite a lot in the past; less so on 8.4+ where casting 
everything to text became a lot easier.

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-12 Thread Alvaro Herrera

Excerpts from Robert Haas's message of mié ene 12 13:48:27 -0300 2011:

 I guess that begs the question of why we need to allow users to call
 type output functions directly.

It used to be the case that that was the only way to run certain casts.
For example, see the pre-8.2 version of this:
http://wiki.postgresql.org/wiki/Pg_depend_display

I haven't needed to use that in a long time, but I am not sure if the
need has completely disappeared.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-12 Thread Tom Lane

Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Robert Haas's message of miÃ© ene 12 13:48:27 -0300 2011:
 I guess that begs the question of why we need to allow users to call
 type output functions directly.

 It used to be the case that that was the only way to run certain casts.
 For example, see the pre-8.2 version of this:
 http://wiki.postgresql.org/wiki/Pg_depend_display

 I haven't needed to use that in a long time, but I am not sure if the
 need has completely disappeared.

The general point is that any out-of-band data transmitted to an output
function has to be trustworthy, and it has to be available at any place
that is going to call the output function.  The latter point tends to
put a crimp in any ideas of this sort anyway: if you can derive the info
you want at any arbitrary place in the system, why not derive it inside
the output function to start with?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-12 Thread Robert Haas

On Wed, Jan 12, 2011 at 2:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Robert Haas's message of mié ene 12 13:48:27 -0300 2011:
 I guess that begs the question of why we need to allow users to call
 type output functions directly.

 It used to be the case that that was the only way to run certain casts.
 For example, see the pre-8.2 version of this:
 http://wiki.postgresql.org/wiki/Pg_depend_display

 I haven't needed to use that in a long time, but I am not sure if the
 need has completely disappeared.

 The general point is that any out-of-band data transmitted to an output
 function has to be trustworthy, and it has to be available at any place
 that is going to call the output function.  The latter point tends to
 put a crimp in any ideas of this sort anyway: if you can derive the info
 you want at any arbitrary place in the system, why not derive it inside
 the output function to start with?

Under what circumstances would it be necessary to call a type output
function without knowing the data type?  I mean, you had to decide
which type output function you were going to call in the first place,
so...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-12 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 On Wed, Jan 12, 2011 at 2:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The general point is that any out-of-band data transmitted to an output
 function has to be trustworthy, and it has to be available at any place
 that is going to call the output function.  The latter point tends to
 put a crimp in any ideas of this sort anyway: if you can derive the info
 you want at any arbitrary place in the system, why not derive it inside
 the output function to start with?

 Under what circumstances would it be necessary to call a type output
 function without knowing the data type?  I mean, you had to decide
 which type output function you were going to call in the first place,
 so...

If the out-of-band info is going to be limited to the type OID, you
might as well put it in the object, ie, follow the same solution we're
already using for arrays.  Jeff's problems are already plenty large
enough without insisting that he invent a new, precedent-free solution
for this problem (*and* break every single output-function call site in
both core and third-party modules in order to do so...)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-12 Thread Robert Haas

On Wed, Jan 12, 2011 at 2:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Jan 12, 2011 at 2:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The general point is that any out-of-band data transmitted to an output
 function has to be trustworthy, and it has to be available at any place
 that is going to call the output function.  The latter point tends to
 put a crimp in any ideas of this sort anyway: if you can derive the info
 you want at any arbitrary place in the system, why not derive it inside
 the output function to start with?

 Under what circumstances would it be necessary to call a type output
 function without knowing the data type?  I mean, you had to decide
 which type output function you were going to call in the first place,
 so...

 If the out-of-band info is going to be limited to the type OID, you
 might as well put it in the object, ie, follow the same solution we're
 already using for arrays.  Jeff's problems are already plenty large
 enough without insisting that he invent a new, precedent-free solution
 for this problem (*and* break every single output-function call site in
 both core and third-party modules in order to do so...)

In terms of solving the immediate problem, you're probably correct.
But this isn't the first time this issue has come up, and it probably
won't be the last.  It's pretty lame to waste 4+ bytes on disk for
every value of a given type due to a parameter-passing convention.
And I suppose it's worth pointing out that if Jeff does adopt that
solution, we'll be stuck with it for approximately forever due to
pg_upgrade.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-11 Thread Robert Haas

On Sun, Jan 9, 2011 at 4:03 AM, Jeff Davis pg...@j-davis.com wrote:
 It also might be worth figuring out why input functions get the type oid
 and output functions do not. I see this comment above getTypeIOParam():

  * As of PostgreSQL 8.1, output functions receive only the value
 itself
  * and not any auxiliary parameters, so the name of this routine is
 now
  * a bit of a misnomer ... it should be
 getTypeInputParam.

 So, why was it eliminated?

Good question.  The relevant commit is here:

commit 6c412f0605afeb809014553ff7ad28cf9ed5526b
Author: Tom Lane t...@sss.pgh.pa.us
Date:   Sun May 1 18:56:19 2005 +

Change CREATE TYPE to require datatype output and send functions to have
only one argument.  (Per recent discussion, the option to accept multiple
arguments is pretty useless for user-defined types, and would be a likely
source of security holes if it was used.)  Simplify call sites of
output/send functions to not bother passing more than one argument.

...but I don't understand the motivation behind it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-11 Thread Martijn van Oosterhout

On Tue, Jan 11, 2011 at 09:24:08PM -0500, Robert Haas wrote:
 commit 6c412f0605afeb809014553ff7ad28cf9ed5526b
 Author: Tom Lane t...@sss.pgh.pa.us
 Date:   Sun May 1 18:56:19 2005 +
 
 Change CREATE TYPE to require datatype output and send functions to have
 only one argument.  (Per recent discussion, the option to accept multiple
 arguments is pretty useless for user-defined types, and would be a likely
 source of security holes if it was used.)  Simplify call sites of
 output/send functions to not bother passing more than one argument.
 
 ...but I don't understand the motivation behind it.

IIRC, the issue is that a type output function has to interpret a
Datum. Type output functions can also be called by users, so it is not
guarenteed that the given OID would actually match the type of the
Datum given. Hence the decision that the output function must be able
to determine itself what kind of Datum it is dealing with.

Thought experiment: the datum is an integer, but the oid says it's a
pass-by-ref datum. Now the code may now to use the integer to derefence
an arbitrary place in memory.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle


signature.asc
Description: Digital signature

Re: [HACKERS] WIP: Range Types

2011-01-09 Thread Jeff Davis

On Sat, 2011-01-08 at 21:58 -0500, Robert Haas wrote:
 I mean, one semi-obvious possibility is to write one set of C
 functions that can have multiple SQL-level definitions bound to it.
 Then when the function is called, it can peek at flinfo-fn_oid to
 figure out which incarnation was called and then get the typo info
 from there.  That's ugly, though.

That would work, but it is pretty ugly. It means it would be very
difficult for users to write new generic functions, because they would
need to add a new catalog entry for every existing range type.

Then again, wasting 4 bytes per row is not ideal, either. And maybe
users could still write some useful generic functions if they were a
combination of other generic functions, using the polymorphic system.

 It'd be really nice if we could just arrange for the info on which
 type anyrange actually is at the moment to be available in the right
 place.  Storing it on disk to work around that is pretty horrible, but
 maybe there's no other reasonable option.

I was surprised when I saw the solution for records. Maybe we should
consider something like that as a last resort (if it's possible for
non-record types)? I'd rather give up typmod than anyrange.

It also might be worth figuring out why input functions get the type oid
and output functions do not. I see this comment above getTypeIOParam():

 * As of PostgreSQL 8.1, output functions receive only the value
itself
 * and not any auxiliary parameters, so the name of this routine is
now
 * a bit of a misnomer ... it should be
getTypeInputParam.  
   

So, why was it eliminated? If the type output function got the type OID,
would it be enough to use fn_expr_get_argtype() for the other generic
functions?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-08 Thread Jeff Davis

When writing the generic range output function, it needs to know the
specific range type in order to call the subtype's output function.

Records accomplish this by using a special cache based on the typmod,
apparently, which looks like a hack to me.

Arrays accomplish this by storing the specific type in every array
value. That seems very wasteful in the case of range types (which only
hold two values).

I thought I could get away with using get_fn_expr_argtype() for most of
the generic functions, but apparently that can't always provide an
answer.

Any ideas? Maybe, with alignment and a flags byte (to hold
inclusivity, infinite boundaries, etc.), the extra 4 bytes doesn't cost
much, anyway?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-08 Thread Robert Haas

On Sat, Jan 8, 2011 at 3:12 PM, Jeff Davis pg...@j-davis.com wrote:
 Any ideas? Maybe, with alignment and a flags byte (to hold
 inclusivity, infinite boundaries, etc.), the extra 4 bytes doesn't cost
 much, anyway?

I'd be really reluctant to bloat the range representation by 4 bytes
to support an anyrange type.  Better to defer this until the great day
when we get a better typmod system, at least IMHO.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-08 Thread Jeff Davis

On Sat, 2011-01-08 at 15:47 -0500, Robert Haas wrote:
 On Sat, Jan 8, 2011 at 3:12 PM, Jeff Davis pg...@j-davis.com wrote:
  Any ideas? Maybe, with alignment and a flags byte (to hold
  inclusivity, infinite boundaries, etc.), the extra 4 bytes doesn't cost
  much, anyway?
 
 I'd be really reluctant to bloat the range representation by 4 bytes
 to support an anyrange type.  Better to defer this until the great day
 when we get a better typmod system, at least IMHO.

Can you elaborate? How can we have generic functions without ANYRANGE?

And without generic functions, how do we make it easy for users to
specify a new range type?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-08 Thread Jeff Davis

On Sat, 2011-01-08 at 13:05 -0800, Jeff Davis wrote:
 On Sat, 2011-01-08 at 15:47 -0500, Robert Haas wrote:
  On Sat, Jan 8, 2011 at 3:12 PM, Jeff Davis pg...@j-davis.com wrote:
   Any ideas? Maybe, with alignment and a flags byte (to hold
   inclusivity, infinite boundaries, etc.), the extra 4 bytes doesn't cost
   much, anyway?
  
  I'd be really reluctant to bloat the range representation by 4 bytes
  to support an anyrange type.  Better to defer this until the great day
  when we get a better typmod system, at least IMHO.
 
 Can you elaborate? How can we have generic functions without ANYRANGE?
 
 And without generic functions, how do we make it easy for users to
 specify a new range type?

Another thought:

If we use timestamps, then that's 8 bytes each, meaning 16 bytes. Then,
there is the VARHDRSZ (now we're at 20), the flag byte (21), and the
range type oid (25). With alignment (if it's aligned at all), that's
either 28 or 32 bytes, which is starting to seem ridiculous.

Making it always varlena is kind of nice, because then if the upper or
lower bound is special (NULL or infinity), then we can omit it and save
some space. But I'm starting to think that it's not worth it, and we
should detect whether the subtype is fixed, and if so, make the range
type fixed length. That will save on the varlena header.

Any suggestions on how to represent/align these ranges?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-08 Thread Robert Haas

On Sat, Jan 8, 2011 at 4:05 PM, Jeff Davis pg...@j-davis.com wrote:
 On Sat, 2011-01-08 at 15:47 -0500, Robert Haas wrote:
 On Sat, Jan 8, 2011 at 3:12 PM, Jeff Davis pg...@j-davis.com wrote:
  Any ideas? Maybe, with alignment and a flags byte (to hold
  inclusivity, infinite boundaries, etc.), the extra 4 bytes doesn't cost
  much, anyway?

 I'd be really reluctant to bloat the range representation by 4 bytes
 to support an anyrange type.  Better to defer this until the great day
 when we get a better typmod system, at least IMHO.

 Can you elaborate? How can we have generic functions without ANYRANGE?

 And without generic functions, how do we make it easy for users to
 specify a new range type?

Oh, hmm.  What generic functions did you have in mind?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-08 Thread Robert Haas

On Sat, Jan 8, 2011 at 6:06 PM, Jeff Davis pg...@j-davis.com wrote:
 If we use timestamps, then that's 8 bytes each, meaning 16 bytes. Then,
 there is the VARHDRSZ (now we're at 20), the flag byte (21), and the
 range type oid (25). With alignment (if it's aligned at all), that's
 either 28 or 32 bytes, which is starting to seem ridiculous.

It'll use the 1-byte varlena header format, which is unaligned.  So
you'll end up with 8 + 8 + 2 bytes = 18 bytes, unaligned.  Maybe you
could cram that down to 17 bytes unaligned with sufficient work, but
I'm not sure it's worth the complexity.  If you end up having to
include the type OID though that's pretty horrible; it adds another 4
bytes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-08 Thread Jeff Davis

On Sat, 2011-01-08 at 20:32 -0500, Robert Haas wrote:
 On Sat, Jan 8, 2011 at 4:05 PM, Jeff Davis pg...@j-davis.com wrote:
  On Sat, 2011-01-08 at 15:47 -0500, Robert Haas wrote:
  On Sat, Jan 8, 2011 at 3:12 PM, Jeff Davis pg...@j-davis.com wrote:
   Any ideas? Maybe, with alignment and a flags byte (to hold
   inclusivity, infinite boundaries, etc.), the extra 4 bytes doesn't cost
   much, anyway?
 
  I'd be really reluctant to bloat the range representation by 4 bytes
  to support an anyrange type.  Better to defer this until the great day
  when we get a better typmod system, at least IMHO.
 
  Can you elaborate? How can we have generic functions without ANYRANGE?
 
  And without generic functions, how do we make it easy for users to
  specify a new range type?
 
 Oh, hmm.  What generic functions did you have in mind?

Well, input/output, comparisons, overlaps, intersection, minus, and all
the necessary GiST support functions.

Without generic functions, the only choices we have are:
 * force the user to write and specify them all -- which doesn't leave
much left of my feature (I think the interface would be all that's
left).
 * somehow generate the functions at type creation time

Any other ideas?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-08 Thread Robert Haas

On Sat, Jan 8, 2011 at 9:12 PM, Jeff Davis pg...@j-davis.com wrote:
 Oh, hmm.  What generic functions did you have in mind?

 Well, input/output, comparisons, overlaps, intersection, minus, and all
 the necessary GiST support functions.

 Without generic functions, the only choices we have are:
  * force the user to write and specify them all -- which doesn't leave
 much left of my feature (I think the interface would be all that's
 left).
  * somehow generate the functions at type creation time

 Any other ideas?

Do they have to be good ideas?

I mean, one semi-obvious possibility is to write one set of C
functions that can have multiple SQL-level definitions bound to it.
Then when the function is called, it can peek at flinfo-fn_oid to
figure out which incarnation was called and then get the typo info
from there.  That's ugly, though.

It'd be really nice if we could just arrange for the info on which
type anyrange actually is at the moment to be available in the right
place.  Storing it on disk to work around that is pretty horrible, but
maybe there's no other reasonable option.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-07 Thread Florian Weimer

* Jeff Davis:

 On Tue, 2011-01-04 at 14:18 +, Florian Weimer wrote:
 * Jeff Davis:
 
  4. For the GiST penalty function, and perhaps some picksplit algorithms,
  it might be nice to know the length of a range, or do some other kinds
  of math. It introduces a lot of complexity to try to define math
  functions for each subtype, and try to make sure they behave sanely. So
  I was thinking that the user might need to specify a function that
  converts the subtype into a float that approximates a value's position
  in the total order.
 
 Doesn't the eqsel hint already provide this information?

 Can you clarify what you mean? I don't know what the eqsel hint is.

Uhm, it's not not eqsel, but the RESTRICT clause on operators:

http://www.postgresql.org/docs/8.4/static/xoper-optimization.html

I'm wondering if one of these hint functions can be reused to compute
range lengths.

-- 
Florian Weimerfwei...@bfk.de
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-07 Thread Jeff Davis

On Fri, 2011-01-07 at 08:21 +, Florian Weimer wrote:
 http://www.postgresql.org/docs/8.4/static/xoper-optimization.html
 
 I'm wondering if one of these hint functions can be reused to compute
 range lengths.

Interesting idea.

However, I don't really see a way to make that work. These functions are
tied to the operator, so it would be awkward to try to connect it to the
GiST support functions. Also, it doesn't seem to be an exact fit,
because the RESTRICT function is used to compute the selectivity as of
right now using current statistics.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-06 Thread Jeff Davis

On Thu, 2011-01-06 at 09:30 +0900, Hitoshi Harada wrote:
 Robert Haas originally began to propose the idea of type
 interface to get together three of KNN-GIST, range type and window
 frame issue. For KNN-GIST, it was committed by extending pg_amop
 without considering others and range type will be as well. Not getting
 them together might be the answer.

We may end up combining all of these concepts into type interfaces
later. Now that we have multiple potential users of type interfaces, it
will be easier to design type interfaces to work well for all of them.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-06 Thread Jeff Davis

On Wed, 2011-01-05 at 12:07 -0800, Jeff Davis wrote:
 The current design for range types doesn't ask for add or subtract.
 Although it might be interesting to try to use such an interface for
 range types, it introduces a lot of complexity and makes it easier to
 cause subtle problems (consider that addition of timestamps and
 intervals is not commutative).

A consequence of this design is that some generic range functions, like
length or distance would need to rely on the polymorphism of + and
- to work.

I'm also not sure if a constructor like range(start, offset) returns
anyrange could be made to work generically at all, because the start
and offset may be two different types (and a function that takes
ANYELEMENT requires that all ANYELEMENT arguments are the same type).

Does anyone see a problem with that?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-06 Thread Robert Haas

On Thu, Jan 6, 2011 at 12:32 PM, Jeff Davis pg...@j-davis.com wrote:
 On Wed, 2011-01-05 at 12:07 -0800, Jeff Davis wrote:
 The current design for range types doesn't ask for add or subtract.
 Although it might be interesting to try to use such an interface for
 range types, it introduces a lot of complexity and makes it easier to
 cause subtle problems (consider that addition of timestamps and
 intervals is not commutative).

 A consequence of this design is that some generic range functions, like
 length or distance would need to rely on the polymorphism of + and
 - to work.

 I'm also not sure if a constructor like range(start, offset) returns
 anyrange could be made to work generically at all, because the start
 and offset may be two different types (and a function that takes
 ANYELEMENT requires that all ANYELEMENT arguments are the same type).

 Does anyone see a problem with that?

Seems like you could make people who want that write range(start,
start+offset) instead without too much pain.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-05 Thread Robert Haas

On Wed, Jan 5, 2011 at 12:54 AM, Jeff Davis pg...@j-davis.com wrote:
 On Tue, 2011-01-04 at 16:45 -0800, Josh Berkus wrote:
 On 1/4/11 10:18 AM, Jeff Davis wrote:
  The main drawback here is that only a select group of people will be
  defining discrete range types at all, because it would require them to
  define a function first. Perhaps that's for the best, because, (as Tom
  pointed out) we don't want someone using floats and then specifying a
  granule of '0.01'.

 Frankly, I'm still not convinced that *anyone* will really need discrete
 range types

 Well, *need* is a standard that can never be met. But with something
 like a date range, it's very possible that a discrete version matches
 the real-world problem more closely than a continuous one.

 If you use only continuous ranges, then be careful to stick with exactly
 one convention, or you will likely get wrong results (I think this point
 has already been established). That sounds easy, but consider:
  * If you want to know whether two ranges are adjacent (a common
 requirement), then you need to use [ ) or ( ].
  * If you need to map a single point into a range, the only thing that
 makes sense is [ ].
  * If your query contains current_date, you'll probably want ranges that
 are either in ( ] or [ ] form.
  * If you are mixing data sets, they may use different conventions.

 You can work around all of these problems by making the query more
 complex (and more error-prone). But I wouldn't like to give up on
 discrete ranges for types where it really makes sense (dates, IPs,
 integers).

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-05 Thread Hitoshi Harada

2011/1/5 Jeff Davis pg...@j-davis.com:
 On Tue, 2011-01-04 at 23:04 +0900, Hitoshi Harada wrote:
   CREATE TYPE numrange
     AS RANGE (SUBTYPE=numeric, SUBTYPE_CMP=numeric_cmp);

 I am interested in how you define increment/decrement operation of
 range value in discrete types. The window functions and PARTITION also
 want to represent RANGE but there's no clear solution.

 The user would specify a canonical function like:

   CREATE TYPE int4range AS RANGE (SUBTYPE=int4, SUBTYPE_CMP=btint4cmp,
     CANONICAL=my_int4range_canonical);

 That function would be called when constructing ranges on input or after
 a computation, and could change something like (1,4] into [2,4] if you
 prefer the latter form.

 So the range types would not have increments, decrements, granules, or
 knowledge about the difference type (e.g. interval is the difference
 type for timestamp).

To canonicalize, it might be ok. I wonder if you won't operate on the
range types like extending their bounds or iterate/enum values from
start to end. In such situation, I bet you'll need to know how to walk
values step by step.

 What support do you need/want from range types to help with new window
 function features?

My argument is here:
http://archives.postgresql.org/message-id/aanlktimfmqmbzj5ctxve_pwt_zmcuhpoet3gaqq6p...@mail.gmail.com

For any type to calculate boundary based on RANGE clause in window
functions, we need some type interface mechanism in the core to know
how to add / subtract values to reach the boundary from the current
value. For example,

SELECT count(*) OVER (ORDER BY n_int RANGE BETWEEN 10 PRECEDING AND 5
FOLLOWING) FROM tbl;

In the standard, the types allowed in RANGE are only int, float, date,
timestamp, etc. but we have various extensible data types as you know
and we couldn't assume '+' / '-' operators tell add /subtract
operation absolutely.

 Also, partitioning might have some use for range types to represent
 range partitions. Comments are welcome.

I heard about partitioning which may have the same problem in RANGE
clause from Itagaki-san, but have not looked so much yet.

Regards,

-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-05 Thread Hitoshi Harada

2011/1/5 Jeff Davis pg...@j-davis.com:
 On Tue, 2011-01-04 at 23:04 +0900, Hitoshi Harada wrote:
   CREATE TYPE numrange
     AS RANGE (SUBTYPE=numeric, SUBTYPE_CMP=numeric_cmp);

 I am interested in how you define increment/decrement operation of
 range value in discrete types. The window functions and PARTITION also
 want to represent RANGE but there's no clear solution.

 The user would specify a canonical function like:

   CREATE TYPE int4range AS RANGE (SUBTYPE=int4, SUBTYPE_CMP=btint4cmp,
     CANONICAL=my_int4range_canonical);

 That function would be called when constructing ranges on input or after
 a computation, and could change something like (1,4] into [2,4] if you
 prefer the latter form.

 So the range types would not have increments, decrements, granules, or
 knowledge about the difference type (e.g. interval is the difference
 type for timestamp).

To canonicalize, it might be ok. I wonder if you won't operate on the
range types like extending their bounds or iterate/enum values from
start to end. In such situation, I bet you'll need to know how to walk
values step by step.

 What support do you need/want from range types to help with new window
 function features?

My argument is here:
http://archives.postgresql.org/message-id/aanlktimfmqmbzj5ctxve_pwt_zmcuhpoet3gaqq6p...@mail.gmail.com

For any type to calculate boundary based on RANGE clause in window
functions, we need some type interface mechanism in the core to know
how to add / subtract values to reach the boundary from the current
value. For example,

SELECT count(*) OVER (ORDER BY n_int RANGE BETWEEN 10 PRECEDING AND 5
FOLLOWING) FROM tbl;

In the standard, the types allowed in RANGE are only int, float, date,
timestamp, etc. but we have various extensible data types as you know
and we couldn't assume '+' / '-' operators tell add /subtract
operation absolutely.

 Also, partitioning might have some use for range types to represent
 range partitions. Comments are welcome.

I heard about partitioning which may have the same problem in RANGE
clause from Itagaki-san, but have not looked so much yet.

Regards,

-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-05 Thread David Fetter

On Thu, Jan 06, 2011 at 02:25:01AM +0900, Hitoshi Harada wrote:
 2011/1/5 Jeff Davis pg...@j-davis.com:
  On Tue, 2011-01-04 at 23:04 +0900, Hitoshi Harada wrote:
    CREATE TYPE numrange
      AS RANGE (SUBTYPE=numeric, SUBTYPE_CMP=numeric_cmp);
 
  I am interested in how you define increment/decrement operation
  of range value in discrete types. The window functions and
  PARTITION also want to represent RANGE but there's no clear
  solution.
 
  The user would specify a canonical function like:
 
    CREATE TYPE int4range AS RANGE (SUBTYPE=int4, SUBTYPE_CMP=btint4cmp,
      CANONICAL=my_int4range_canonical);
 
  That function would be called when constructing ranges on input or after
  a computation, and could change something like (1,4] into [2,4] if you
  prefer the latter form.
 
  So the range types would not have increments, decrements, granules, or
  knowledge about the difference type (e.g. interval is the difference
  type for timestamp).
 
 To canonicalize, it might be ok.  I wonder if you won't operate on
 the range types like extending their bounds or iterate/enum values
 from start to end.  In such situation, I bet you'll need to know how
 to walk values step by step.
 
  What support do you need/want from range types to help with new window
  function features?
 
 My argument is here:
 http://archives.postgresql.org/message-id/aanlktimfmqmbzj5ctxve_pwt_zmcuhpoet3gaqq6p...@mail.gmail.com
 
 For any type to calculate boundary based on RANGE clause in window
 functions, we need some type interface mechanism in the core to know
 how to add / subtract values to reach the boundary from the current
 value.  For example,
 
 SELECT count(*) OVER (ORDER BY n_int RANGE BETWEEN 10 PRECEDING AND 5
 FOLLOWING) FROM tbl;

I'm not sure I get the connection between this type of range and the
range types Jeff is working on.  Jeff's work involves a way to
create types which represent ranges over types which have some kind of
ordering, although not necessarily a successor operation.

Had you planned to cast to an integer range in the process of doing
this window?

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-05 Thread Jeff Davis

On Wed, 2011-01-05 at 10:41 -0800, David Fetter wrote:
 On Thu, Jan 06, 2011 at 02:25:01AM +0900, Hitoshi Harada wrote:
  For any type to calculate boundary based on RANGE clause in window
  functions, we need some type interface mechanism in the core to know
  how to add / subtract values to reach the boundary from the current
  value.  For example,
  
  SELECT count(*) OVER (ORDER BY n_int RANGE BETWEEN 10 PRECEDING AND 5
  FOLLOWING) FROM tbl;
 
 I'm not sure I get the connection between this type of range and the
 range types Jeff is working on.  Jeff's work involves a way to
 create types which represent ranges over types which have some kind of
 ordering, although not necessarily a successor operation.
 
 Had you planned to cast to an integer range in the process of doing
 this window?

I don't think Harada-san needs the type infrastructure itself, just the
interface to find the difference type (e.g. interval from
timestamp) as well as functions like add and subtract (potentially two
interesting subtract functions). Without knowing which function to call,
there is no way to find the window boundaries given the current row.

The current design for range types doesn't ask for add or subtract.
Although it might be interesting to try to use such an interface for
range types, it introduces a lot of complexity and makes it easier to
cause subtle problems (consider that addition of timestamps and
intervals is not commutative).

Even if add and subtract were associated with a range type, there's no
way to tell which range type to pick given the window function syntax
(multiple range types could be defined over the same subtype).

I think the interface question should be addressed more directly with a
type interfaces patch.

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-05 Thread Hitoshi Harada

2011/1/6 Jeff Davis pg...@j-davis.com:
 Even if add and subtract were associated with a range type, there's no
 way to tell which range type to pick given the window function syntax
 (multiple range types could be defined over the same subtype).

 I think the interface question should be addressed more directly with a
 type interfaces patch.

I agree the current canonical approach fits range type's demand, and
I'm inclined that the type interface discussion is another point.
IIRC, Robert Haas originally began to propose the idea of type
interface to get together three of KNN-GIST, range type and window
frame issue. For KNN-GIST, it was committed by extending pg_amop
without considering others and range type will be as well. Not getting
them together might be the answer.

Regards,

-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-04 Thread Hitoshi Harada

2011/1/4 Jeff Davis pg...@j-davis.com:
 I have been updating my work in progress here:

 http://git.postgresql.org/gitweb?p=users/jdavis/postgres.git;a=log;h=refs/heads/rangetypes

 Right now, it's not in a reviewable state, but those interested can
 glance through the code.

 Quick synopsis (for illustration purposes only; don't expect much from
 the current code):

  CREATE TYPE numrange
    AS RANGE (SUBTYPE=numeric, SUBTYPE_CMP=numeric_cmp);

I am interested in how you define increment/decrement operation of
range value in discrete types. The window functions and PARTITION also
want to represent RANGE but there's no clear solution.

Sorry if it's already been discussed since I didn't track the threads.

Regards


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-04 Thread Florian Weimer

* Jeff Davis:

 4. For the GiST penalty function, and perhaps some picksplit algorithms,
 it might be nice to know the length of a range, or do some other kinds
 of math. It introduces a lot of complexity to try to define math
 functions for each subtype, and try to make sure they behave sanely. So
 I was thinking that the user might need to specify a function that
 converts the subtype into a float that approximates a value's position
 in the total order.

Doesn't the eqsel hint already provide this information?

-- 
Florian Weimerfwei...@bfk.de
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-04 Thread Tom Lane

Jeff Davis pg...@j-davis.com writes:
 2. We need to use the subtype's IO functions, but those may not be
 immutable. So, rather than create new IO functions for each range type,
 I was thinking that I'd use just three (anyrange_i_in, anyrange_s_in,
 and anyrange_v_in), and select the right one at definition time, based
 on the subtype's IO functions' volatility. That seems like a bit of a
 hack -- any better ideas?

You should just do what we do for arrays and records, ie, mark the I/O
functions stable.  There is no reason for anyrange to have a more
complicated approach to this than the existing composite-type structures
do.  See discussion thread here
http://archives.postgresql.org/pgsql-hackers/2010-07/msg00932.php
and commit here
http://archives.postgresql.org/pgsql-committers/2010-07/msg00307.php

 3. Right now I allow user-defined parse/deparse functions to be
 specified. In almost all cases, I would think that we want the text
 format to be something like:
   [ 2010-01-01, 2011-01-01 )
 where the brackets denote inclusivity, and the left and right sides can
 be optionally double-quoted. Is it even worth having these parse/deparse
 functions, or should we just force the obvious format?

+1 for forcing a single consistent format.  I compare this to the
Berkeley-era decision to let types specify nondefault array delimiters
--- that was flexibility that didn't help anybody, just resulted in
over-complicated code (or code that would fall over if someone tried
to actually use a delimiter other than comma...)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-04 Thread Robert Haas

On Tue, Jan 4, 2011 at 2:29 AM, Jeff Davis pg...@j-davis.com wrote:
 I liked Robert's suggestion here:

 http://archives.postgresql.org/message-id/aanlktiks_x93_k82b4f_ga634wci0oeb9ftrurf28...@mail.gmail.com

 which says that the user can just define a canonicalize function that
 will take a range as input (or perhaps the logical pieces of a range)
 and put it into an appropriate canonical representation. For instance,
 int4range_canonical might take (1,4] and turn it into [2,4]. This is
 similar to a few other ideas, but Robert's idea seems to require the
 least effort by the person defining the range type, because postgresql
 can still handle representation.

 It doesn't allow for all of the suggested features. In particular, it
 would not allow granules to be specified for discrete ranges. But on
 balance, it seems like this is the most conceptually simple and I think
 it satisfies the primary use cases.

Maybe I'm missing something, but it seems like this approach could
support granules.  You just have to define the canonicalize function
in terms of the granule.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-04 Thread Jeff Davis

On Tue, 2011-01-04 at 12:21 -0500, Robert Haas wrote:
  It doesn't allow for all of the suggested features. In particular, it
  would not allow granules to be specified for discrete ranges. But on
  balance, it seems like this is the most conceptually simple and I think
  it satisfies the primary use cases.
 
 Maybe I'm missing something, but it seems like this approach could
 support granules.  You just have to define the canonicalize function
 in terms of the granule.

I meant that it doesn't support them as an explicit, user-visible
concept.

The main drawback here is that only a select group of people will be
defining discrete range types at all, because it would require them to
define a function first. Perhaps that's for the best, because, (as Tom
pointed out) we don't want someone using floats and then specifying a
granule of '0.01'.

While we're talking about it, one question I had is: should the
canonicalize function be:
  /* works on the deserialized information right before serialization */
  canonical(flags, lower_bound, upper_bound)
or
  /* works on the serialized form right after serialization */
  range = canonical(range)

I would lean toward the latter because it's simpler on the user (and
allows non-C functions). But perhaps an efficiency argument could be
made for the former because it could avoid one round of
deserialize/reserialize when the representation is not already in
canonical form.

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-04 Thread Jeff Davis

On Tue, 2011-01-04 at 14:18 +, Florian Weimer wrote:
 * Jeff Davis:
 
  4. For the GiST penalty function, and perhaps some picksplit algorithms,
  it might be nice to know the length of a range, or do some other kinds
  of math. It introduces a lot of complexity to try to define math
  functions for each subtype, and try to make sure they behave sanely. So
  I was thinking that the user might need to specify a function that
  converts the subtype into a float that approximates a value's position
  in the total order.
 
 Doesn't the eqsel hint already provide this information?
 

Can you clarify what you mean? I don't know what the eqsel hint is.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-04 Thread Jeff Davis

On Tue, 2011-01-04 at 23:04 +0900, Hitoshi Harada wrote:
 2011/1/4 Jeff Davis pg...@j-davis.com:
  I have been updating my work in progress here:
 
  http://git.postgresql.org/gitweb?p=users/jdavis/postgres.git;a=log;h=refs/heads/rangetypes
 
  Right now, it's not in a reviewable state, but those interested can
  glance through the code.
 
  Quick synopsis (for illustration purposes only; don't expect much from
  the current code):
 
   CREATE TYPE numrange
 AS RANGE (SUBTYPE=numeric, SUBTYPE_CMP=numeric_cmp);
 
 I am interested in how you define increment/decrement operation of
 range value in discrete types. The window functions and PARTITION also
 want to represent RANGE but there's no clear solution.
 
 Sorry if it's already been discussed since I didn't track the threads.

The user would specify a canonical function like:

   CREATE TYPE int4range AS RANGE (SUBTYPE=int4, SUBTYPE_CMP=btint4cmp,
 CANONICAL=my_int4range_canonical);

That function would be called when constructing ranges on input or after
a computation, and could change something like (1,4] into [2,4] if you
prefer the latter form.

So the range types would not have increments, decrements, granules, or
knowledge about the difference type (e.g. interval is the difference
type for timestamp).

What support do you need/want from range types to help with new window
function features?

Also, partitioning might have some use for range types to represent
range partitions. Comments are welcome.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-04 Thread Robert Haas

On Tue, Jan 4, 2011 at 1:18 PM, Jeff Davis pg...@j-davis.com wrote:
 On Tue, 2011-01-04 at 12:21 -0500, Robert Haas wrote:
  It doesn't allow for all of the suggested features. In particular, it
  would not allow granules to be specified for discrete ranges. But on
  balance, it seems like this is the most conceptually simple and I think
  it satisfies the primary use cases.

 Maybe I'm missing something, but it seems like this approach could
 support granules.  You just have to define the canonicalize function
 in terms of the granule.

 I meant that it doesn't support them as an explicit, user-visible
 concept.

 The main drawback here is that only a select group of people will be
 defining discrete range types at all, because it would require them to
 define a function first. Perhaps that's for the best, because, (as Tom
 pointed out) we don't want someone using floats and then specifying a
 granule of '0.01'.

 While we're talking about it, one question I had is: should the
 canonicalize function be:
  /* works on the deserialized information right before serialization */
  canonical(flags, lower_bound, upper_bound)
 or
  /* works on the serialized form right after serialization */
  range = canonical(range)

 I would lean toward the latter because it's simpler on the user (and
 allows non-C functions).

Yeah, me too.

 But perhaps an efficiency argument could be
 made for the former because it could avoid one round of
 deserialize/reserialize when the representation is not already in
 canonical form.

I believe this might be an appropriate time to apply Knuth's Law.  I'm
not thrilled with the amount of palloc overhead we have in the
backend, but absent some evidence that this case is going to be
particularly significant, I'd be disinclined to contort the interface.
 I suspect that if you run oprofile this won't be the bottleneck.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-04 Thread Josh Berkus

On 1/4/11 10:18 AM, Jeff Davis wrote:
 The main drawback here is that only a select group of people will be
 defining discrete range types at all, because it would require them to
 define a function first. Perhaps that's for the best, because, (as Tom
 pointed out) we don't want someone using floats and then specifying a
 granule of '0.01'.

Frankly, I'm still not convinced that *anyone* will really need discrete
range types -- as opposed to continuous range types, which I'm already
using in production ala temporal.  So I'm completely OK with making
discrete range types hard to use, as long as continous range types are
easy to use.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Range Types

2011-01-04 Thread Jeff Davis

On Tue, 2011-01-04 at 16:45 -0800, Josh Berkus wrote:
 On 1/4/11 10:18 AM, Jeff Davis wrote:
  The main drawback here is that only a select group of people will be
  defining discrete range types at all, because it would require them to
  define a function first. Perhaps that's for the best, because, (as Tom
  pointed out) we don't want someone using floats and then specifying a
  granule of '0.01'.
 
 Frankly, I'm still not convinced that *anyone* will really need discrete
 range types 

Well, *need* is a standard that can never be met. But with something
like a date range, it's very possible that a discrete version matches
the real-world problem more closely than a continuous one.

If you use only continuous ranges, then be careful to stick with exactly
one convention, or you will likely get wrong results (I think this point
has already been established). That sounds easy, but consider:
 * If you want to know whether two ranges are adjacent (a common
requirement), then you need to use [ ) or ( ].
 * If you need to map a single point into a range, the only thing that
makes sense is [ ].
 * If your query contains current_date, you'll probably want ranges that
are either in ( ] or [ ] form.
 * If you are mixing data sets, they may use different conventions.

You can work around all of these problems by making the query more
complex (and more error-prone). But I wouldn't like to give up on
discrete ranges for types where it really makes sense (dates, IPs,
integers).

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] WIP: Range Types

2011-01-03 Thread Jeff Davis

I have been updating my work in progress here:

http://git.postgresql.org/gitweb?p=users/jdavis/postgres.git;a=log;h=refs/heads/rangetypes

Right now, it's not in a reviewable state, but those interested can
glance through the code.

Quick synopsis (for illustration purposes only; don't expect much from
the current code):

CREATE TYPE numrange
AS RANGE (SUBTYPE=numeric, SUBTYPE_CMP=numeric_cmp);

Much of the previous discussion seemed to focus on two issues:

1. Reconciling discrete ranges (like ranges of integers) and continuous
ranges (like ranges of numeric).

I liked Robert's suggestion here:

http://archives.postgresql.org/message-id/aanlktiks_x93_k82b4f_ga634wci0oeb9ftrurf28...@mail.gmail.com

which says that the user can just define a canonicalize function that
will take a range as input (or perhaps the logical pieces of a range)
and put it into an appropriate canonical representation. For instance,
int4range_canonical might take (1,4] and turn it into [2,4]. This is
similar to a few other ideas, but Robert's idea seems to require the
least effort by the person defining the range type, because postgresql
can still handle representation.

It doesn't allow for all of the suggested features. In particular, it
would not allow granules to be specified for discrete ranges. But on
balance, it seems like this is the most conceptually simple and I think
it satisfies the primary use cases.

2. Representational issues. There are many possibilities here:
a. flags for inclusivity, start, and offset
b. flags for inclusivity, start, and end
c. if it's a discrete range, start and end only might suffice
d. if it's a discrete range, perhaps something involving granules

(a) might be interesting, and for some data types might be more compact,
but it introduces a new datatype that is distinct from the range's
subtype: the difference type (that is, for timestamps it's
interval). This approach seemed reasonable on paper, but it involves a
lot of extra complexity, and introduces some strange assumptions (using
an offset of 1 month versus 30 days can't be allowed).

I chose (b). This is the simplest. If desired, we could still allow the
user to specify their own serialize/deserialize functions, which can get
most of the benefits of the other ones anyway.

Other issues:

1. I plan to introduce an ANYRANGE type.

2. We need to use the subtype's IO functions, but those may not be
immutable. So, rather than create new IO functions for each range type,
I was thinking that I'd use just three (anyrange_i_in, anyrange_s_in,
and anyrange_v_in), and select the right one at definition time, based
on the subtype's IO functions' volatility. That seems like a bit of a
hack -- any better ideas?

3. Right now I allow user-defined parse/deparse functions to be
specified. In almost all cases, I would think that we want the text
format to be something like:
[ 2010-01-01, 2011-01-01 )
where the brackets denote inclusivity, and the left and right sides can
be optionally double-quoted. Is it even worth having these parse/deparse
functions, or should we just force the obvious format?

4. For the GiST penalty function, and perhaps some picksplit algorithms,
it might be nice to know the length of a range, or do some other kinds
of math. It introduces a lot of complexity to try to define math
functions for each subtype, and try to make sure they behave sanely. So
I was thinking that the user might need to specify a function that
converts the subtype into a float that approximates a value's position
in the total order. Any better ideas?

Overall:

I think this is one of the simpler designs. Conceptually, defining new
ranges of different granularity with ease sounds like a great idea --
but it introduces a lot of complexity (and generated a lot of different
opinions), so it was not included in this design. Similarly, I am
leaning away from lots of user-specified options unless there is a real
use case.

Any suggestions or comments welcome.

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

40 matches

Mail list logo