subject:"\[HACKERS\] Range Types and extensions"

Re: [HACKERS] Range Types and extensions

2011-06-21 Thread Jeff Davis

On Mon, 2011-06-20 at 12:54 -0700, Darren Duncan wrote:
 That DOMAIN-based solution ostensibly sounds like a good one then, under the 
 circumstances.

It's not bad from a theoretical standpoint, but it does require some
extra type annotation, which is not really the SQL way.

   What I *don't* want to see is for things like ranges to have 
 their own collations and the like.

I'm not 100% sure what you mean here. If you mean that you don't want
range types to pay attention to COLLATE clauses, etc., then I agree. I
would also agree if you mean that range values should not carry the
collation with them.

However, it looks like we might try to make the opclass/collation pair a
property of the range type definition. That seems nice, because it
allows us to keep the nice properties of ranges as well as the type
inference and polymorphism for everything except the constructors.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-21 Thread Jeff Davis

On Mon, 2011-06-20 at 13:43 -0400, Tom Lane wrote:
 The other viable alternative seems to be to require those two properties
 (btree opclass and collation) to be part of a specific range type
 definition.  The complaint about that seemed to be that we couldn't
 infer an ANYRANGE type given only ANYELEMENT, but could we alleviate
 that by identifying one range type as the default for the base type,
 and then using that one in cases where we have no ANYRANGE input?

Yes, that sounds similar to Florian's suggestion, and I think there may
be a solution down this path. However, if we're going to have range
types with non-default orderings, then we need a way to construct them.

I suggested that, if constructors are the primary problem case, then
just generate non-polymorphic constructors at range type definition
time, named after the range type name. I'll look into that approach.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-20 Thread Jeff Davis

On Sun, 2011-06-19 at 21:29 +0200, Florian Pflug wrote:
 If I'm not mistaken about this, that would imply that we also cannot
 have two range types with the same base type, the same opclass,
 but different collations. Which seems rather unfortunate... In fact,
 if that's true, maybe restricing range types to the database collation
 would be best...

Yes, we cannot have two range types with the same base type. That is a
consequence of the polymorphic type system, which needs to be able to
determine the range type given the base type.

A workaround is to use domains. That is effective, but awkward. For
instance, given:
  CREATE DOMAIN textdomain AS text;
  CREATE TYPE textdomainrange AS RANGE (subtype=textdomain);
then:
  '[a,z)'::textdomainrange @ 'b'::textdomain
would work, but:
  '[a,z)'::textdomainrange @ 'b'
would not, which would be annoying.

I don't see a way around this. It's not a collation problem, but a
general multiple range types with the same subtype problem.

I don't think there's much benefit in restricting it to the DB
collation. If someone really needs a different collation (or opclass,
for that matter), it might as well be allowed, even if you have to do
extra type annotations.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-20 Thread Robert Haas

On Mon, Jun 20, 2011 at 2:33 AM, Jeff Davis pg...@j-davis.com wrote:
 On Sun, 2011-06-19 at 21:29 +0200, Florian Pflug wrote:
 If I'm not mistaken about this, that would imply that we also cannot
 have two range types with the same base type, the same opclass,
 but different collations. Which seems rather unfortunate... In fact,
 if that's true, maybe restricing range types to the database collation
 would be best...

 Yes, we cannot have two range types with the same base type. That is a
 consequence of the polymorphic type system, which needs to be able to
 determine the range type given the base type.

Boy, that's an unfortunate limitation.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-20 Thread David Fetter

On Sun, Jun 19, 2011 at 11:33:02PM -0700, Jeff Davis wrote:
 On Sun, 2011-06-19 at 21:29 +0200, Florian Pflug wrote:
  If I'm not mistaken about this, that would imply that we also
  cannot have two range types with the same base type, the same
  opclass, but different collations. Which seems rather
  unfortunate... In fact, if that's true, maybe restricing range
  types to the database collation would be best...
 
 Yes, we cannot have two range types with the same base type. That is
 a consequence of the polymorphic type system, which needs to be able
 to determine the range type given the base type.
 
 A workaround is to use domains. That is effective, but awkward. For
 instance, given:
   CREATE DOMAIN textdomain AS text;
   CREATE TYPE textdomainrange AS RANGE (subtype=textdomain);
 then:
   '[a,z)'::textdomainrange @ 'b'::textdomain
 would work, but:
   '[a,z)'::textdomainrange @ 'b'
 would not, which would be annoying.
 
 I don't see a way around this. It's not a collation problem, but a
 general multiple range types with the same subtype problem.

How might you address that problem, assuming you had the needed
resources to do it?

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-20 Thread Florian Pflug

On Jun20, 2011, at 15:19 , Robert Haas wrote:
 On Mon, Jun 20, 2011 at 2:33 AM, Jeff Davis pg...@j-davis.com wrote:
 On Sun, 2011-06-19 at 21:29 +0200, Florian Pflug wrote:
 If I'm not mistaken about this, that would imply that we also cannot
 have two range types with the same base type, the same opclass,
 but different collations. Which seems rather unfortunate... In fact,
 if that's true, maybe restricing range types to the database collation
 would be best...
 
 Yes, we cannot have two range types with the same base type. That is a
 consequence of the polymorphic type system, which needs to be able to
 determine the range type given the base type.
 
 Boy, that's an unfortunate limitation.  :-(

Hm, I'm starting to wonder if there isn't a way around that. It seems that
this restriction comes from the desire to allow functions with the
polymorphic signature
  (ANYELEMENT, ANYELEMENT) - ANYRANGE.

The only such function I can currently come up with is the generic
range constructor. Is having that worth the restriction to one
range type per base type?

Another option might be to extend polymorphic argument matching
to allow functions with the signature
  (non-polymorphic args) - polymorphic type
but to require the concrete output type to be specified with a cast
at the call site. For the generic range constructor, you'd then
have to write
  RANGE(lower, upper)::range_type

(If we had that, we could also (finally) provide functions to
set and get fields of composite types by name. As it stands,
doing that cleanly is hard because the desired signature of
the get function, namely
  (record, fieldname text) - anyelement
is not supported.)

A third approach might be to first define a PAIR type and then
define ranges on top of that. Since PAIR types wouldn't include
a comparison operators, the restriction to one PAIR type per
base type wouldn't matter. Instead of a generic RANGE constructor
you'd then use the generic PAIR constructor and cast the resulting
PAIR to whatever range you desire, i.e. write
  PAIR(lower, upper)::range_type.

best regards,
Florian Pflug

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-20 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 On Mon, Jun 20, 2011 at 2:33 AM, Jeff Davis pg...@j-davis.com wrote:
 Yes, we cannot have two range types with the same base type. That is a
 consequence of the polymorphic type system, which needs to be able to
 determine the range type given the base type.

 Boy, that's an unfortunate limitation.  :-(

Given the need to deal with multiple collations for collatable types,
I'd say it's not so much unfortunate as utterly unworkable.  At
least unless you give up the notion of binding the collation into the
type definition ... which has other issues, per discussion a few days
ago.  Even ignoring collations, I really think we want to allow multiple
range types for base types that have multiple btree sort orderings.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-20 Thread Greg Stark

On Mon, Jun 20, 2011 at 3:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Given the need to deal with multiple collations for collatable types,
 I'd say it's not so much unfortunate as utterly unworkable.  At
 least unless you give up the notion of binding the collation into the
 type definition ... which has other issues, per discussion a few days
 ago.  Even ignoring collations, I really think we want to allow multiple
 range types for base types that have multiple btree sort orderings.

I was imagining it would be not part of the type but part of the
internal data in the range type. The dumped representation would look
something like ['bar','baz',''en_US'] and input forms like
['bar','baz'] would just default to the database default collation or
the session's default collation or whatever.

The most disturbing thing about this is that it would make
unrestorable dumps if any of those collation names change or are not
installed before the data is loaded. It's kind of like having your
table names embedded in a text column in your tables. It could make
things awkward to manage later.



-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-20 Thread Tom Lane

Greg Stark st...@mit.edu writes:
 On Mon, Jun 20, 2011 at 3:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Given the need to deal with multiple collations for collatable types,
 I'd say it's not so much unfortunate as utterly unworkable.  At
 least unless you give up the notion of binding the collation into the
 type definition ... which has other issues, per discussion a few days
 ago.  Even ignoring collations, I really think we want to allow multiple
 range types for base types that have multiple btree sort orderings.

 I was imagining it would be not part of the type but part of the
 internal data in the range type. The dumped representation would look
 something like ['bar','baz',''en_US'] and input forms like
 ['bar','baz'] would just default to the database default collation or
 the session's default collation or whatever.

 The most disturbing thing about this is that it would make
 unrestorable dumps if any of those collation names change or are not
 installed before the data is loaded. It's kind of like having your
 table names embedded in a text column in your tables. It could make
 things awkward to manage later.

Yeah.  In particular this would cause issues for pg_upgrade, which would
have to somehow ensure that collation OIDs didn't change between old and
new installations, which is just about impossible given the current
method for assigning them.  I think we need to avoid that, really.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-20 Thread Jeff Davis

On Mon, 2011-06-20 at 16:01 +0200, Florian Pflug wrote:
 Hm, I'm starting to wonder if there isn't a way around that. It seems that
 this restriction comes from the desire to allow functions with the
 polymorphic signature
   (ANYELEMENT, ANYELEMENT) - ANYRANGE.
 
 The only such function I can currently come up with is the generic
 range constructor. Is having that worth the restriction to one
 range type per base type?

Good point.

Having constructors is obviously important, but perhaps they don't have
to be generic. We could generate catalog entries for each constructor
for each range type, and name them after the range type itself. So,
instead of:
  range(1, 10)
you'd write:
  int4range(1,10)

That actually might be better anyway, because relying on the polymorphic
version is not perfect now anyway. For instance, if you want an
int8range using the generic range() constructor, you need a cast.

We'd still need to get the polymorphic type system to work the way we
want in this case. I'll look into that.

 Another option might be to extend polymorphic argument matching
 to allow functions with the signature
   (non-polymorphic args) - polymorphic type
 but to require the concrete output type to be specified with a cast
 at the call site. For the generic range constructor, you'd then
 have to write
   RANGE(lower, upper)::range_type

Interesting idea. 

 A third approach might be to first define a PAIR type and then
 define ranges on top of that. Since PAIR types wouldn't include
 a comparison operators, the restriction to one PAIR type per
 base type wouldn't matter. Instead of a generic RANGE constructor
 you'd then use the generic PAIR constructor and cast the resulting
 PAIR to whatever range you desire, i.e. write
   PAIR(lower, upper)::range_type.

Another interesting idea. A little awkward though, and doesn't offer
much opportunity to specify inclusivity/exclusivity of the bounds.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-20 Thread Merlin Moncure

On Mon, Jun 20, 2011 at 11:21 AM, Jeff Davis pg...@j-davis.com wrote:
 On Mon, 2011-06-20 at 16:01 +0200, Florian Pflug wrote:
 Hm, I'm starting to wonder if there isn't a way around that. It seems that
 this restriction comes from the desire to allow functions with the
 polymorphic signature
   (ANYELEMENT, ANYELEMENT) - ANYRANGE.

 The only such function I can currently come up with is the generic
 range constructor. Is having that worth the restriction to one
 range type per base type?

 Good point.

 Having constructors is obviously important, but perhaps they don't have
 to be generic. We could generate catalog entries for each constructor
 for each range type, and name them after the range type itself. So,
 instead of:
  range(1, 10)
 you'd write:
  int4range(1,10)

 That actually might be better anyway, because relying on the polymorphic
 version is not perfect now anyway. For instance, if you want an
 int8range using the generic range() constructor, you need a cast.

 We'd still need to get the polymorphic type system to work the way we
 want in this case. I'll look into that.

hm, what if there *was( only one range type per base type, but in the
various contexts where specific ordering and collation was important
you could optionally pass them in?  Meaning, the specific ordering was
not bound rigidly to the type, but to the operation?

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-20 Thread Florian Pflug

On Jun20, 2011, at 19:16 , Merlin Moncure wrote:
 On Mon, Jun 20, 2011 at 11:21 AM, Jeff Davis pg...@j-davis.com wrote:
 hm, what if there *was( only one range type per base type, but in the
 various contexts where specific ordering and collation was important
 you could optionally pass them in?  Meaning, the specific ordering was
 not bound rigidly to the type, but to the operation?

I suggested that previously here
  http://archives.postgresql.org/pgsql-hackers/2011-06/msg00846.php

In the ensuing discussion, however, it became clear that by doing so
range types become little more than a pair of values. More specifically,
a range then *doesn't* represent a set of values, because whether or
not a value is in the range depends on a specific sort order.

Actually, you'd probably even loose the possibility of having a
normalization function for discrete base types (which makes sure
that we know that [1,2] is the same as [1,3)), because it's
hard to image one normalization function that works sensibly for
two different orderings.

So by doing that, you effectively turn a RANGE into a quadruple
(lower type, lower_included bool, upper type, upper_included bool).

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-20 Thread Tom Lane

Florian Pflug f...@phlo.org writes:
 On Jun20, 2011, at 19:16 , Merlin Moncure wrote:
 On Mon, Jun 20, 2011 at 11:21 AM, Jeff Davis pg...@j-davis.com wrote:
 hm, what if there *was( only one range type per base type, but in the
 various contexts where specific ordering and collation was important
 you could optionally pass them in?  Meaning, the specific ordering was
 not bound rigidly to the type, but to the operation?

 I suggested that previously here
   http://archives.postgresql.org/pgsql-hackers/2011-06/msg00846.php

 In the ensuing discussion, however, it became clear that by doing so
 range types become little more than a pair of values. More specifically,
 a range then *doesn't* represent a set of values, because whether or
 not a value is in the range depends on a specific sort order.

Yeah, that doesn't seem like the way to go.  If a range value doesn't
represent a well-defined set of base-type values, we lose a lot of the
mathematical underpinnings for range operations.

So ... just how awful would it be if we hard-wired range types to always
use their base type's default btree sort ordering and the database's
default collation?  In principle that sucks, but I'm not sure how wide
the use-cases actually will be for other choices.

The other viable alternative seems to be to require those two properties
(btree opclass and collation) to be part of a specific range type
definition.  The complaint about that seemed to be that we couldn't
infer an ANYRANGE type given only ANYELEMENT, but could we alleviate
that by identifying one range type as the default for the base type,
and then using that one in cases where we have no ANYRANGE input?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-20 Thread Darren Duncan


Tom Lane wrote:

Florian Pflug f...@phlo.org writes:

On Jun20, 2011, at 19:16 , Merlin Moncure wrote:

On Mon, Jun 20, 2011 at 11:21 AM, Jeff Davis pg...@j-davis.com wrote:
hm, what if there *was( only one range type per base type, but in the
various contexts where specific ordering and collation was important
you could optionally pass them in?  Meaning, the specific ordering was
not bound rigidly to the type, but to the operation?



I suggested that previously here
  http://archives.postgresql.org/pgsql-hackers/2011-06/msg00846.php



In the ensuing discussion, however, it became clear that by doing so
range types become little more than a pair of values. More specifically,
a range then *doesn't* represent a set of values, because whether or
not a value is in the range depends on a specific sort order.


Yeah, that doesn't seem like the way to go.  If a range value doesn't
represent a well-defined set of base-type values, we lose a lot of the
mathematical underpinnings for range operations.

So ... just how awful would it be if we hard-wired range types to always
use their base type's default btree sort ordering and the database's
default collation?  In principle that sucks, but I'm not sure how wide
the use-cases actually will be for other choices.

The other viable alternative seems to be to require those two properties
(btree opclass and collation) to be part of a specific range type
definition.  The complaint about that seemed to be that we couldn't
infer an ANYRANGE type given only ANYELEMENT, but could we alleviate
that by identifying one range type as the default for the base type,
and then using that one in cases where we have no ANYRANGE input?

regards, tom lane


I still think that the most elegant solution is for stuff like collation to just 
be built-in to the base types that the range is ranging over, meaning we have a 
separate text base type for each text collation, and the text operators are 
polymorphic over all those base types.  Having collations and stuff as something 
off to the side not built-in to text/etc types is the root of the problem.  The 
range-specific stuff can remain ANYELEMENT and no special-casing is required. 
Also, besides range constructors, a generic membership test like value in 
range is polymorphic. -- Darren Duncan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-20 Thread Tom Lane

Darren Duncan dar...@darrenduncan.net writes:
 I still think that the most elegant solution is for stuff like collation to 
 just 
 be built-in to the base types that the range is ranging over, meaning we have 
 a 
 separate text base type for each text collation, and the text operators are 
 polymorphic over all those base types.  Having collations and stuff as 
 something 
 off to the side not built-in to text/etc types is the root of the
 problem.

I tend to agree that this aspect of the SQL standard isn't terribly well
designed, but it's the standard and we're stuck with it.  We're not
going to support two parallel methods of dealing with collations.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-20 Thread Florian Pflug

On Jun20, 2011, at 20:58 , Tom Lane wrote:
 Darren Duncan dar...@darrenduncan.net writes:
 I still think that the most elegant solution is for stuff like collation to 
 just 
 be built-in to the base types that the range is ranging over, meaning we 
 have a 
 separate text base type for each text collation, and the text operators are 
 polymorphic over all those base types.  Having collations and stuff as 
 something 
 off to the side not built-in to text/etc types is the root of the
 problem.
 
 I tend to agree that this aspect of the SQL standard isn't terribly well
 designed, but it's the standard and we're stuck with it.  We're not
 going to support two parallel methods of dealing with collations.

Plus, you can always define a DOMAIN for every collation you intent to use,
and stay clear of COLLATE clauses except as part of these domain definitions.

Most interestingly, this is also the workaround Jeff Davis suggested for
those who absolutely need two range types over the same base type (i.e.
define one of the ranges over a domain).

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-20 Thread Darren Duncan


Florian Pflug wrote:

On Jun20, 2011, at 20:58 , Tom Lane wrote:

Darren Duncan dar...@darrenduncan.net writes:
I still think that the most elegant solution is for stuff like collation to just 
be built-in to the base types that the range is ranging over, meaning we have a 
separate text base type for each text collation, and the text operators are 
polymorphic over all those base types.  Having collations and stuff as something 
off to the side not built-in to text/etc types is the root of the

problem.

I tend to agree that this aspect of the SQL standard isn't terribly well
designed, but it's the standard and we're stuck with it.  We're not
going to support two parallel methods of dealing with collations.


Plus, you can always define a DOMAIN for every collation you intent to use,
and stay clear of COLLATE clauses except as part of these domain definitions.

Most interestingly, this is also the workaround Jeff Davis suggested for
those who absolutely need two range types over the same base type (i.e.
define one of the ranges over a domain).

best regards,
Florian Pflug


That DOMAIN-based solution ostensibly sounds like a good one then, under the 
circumstances.  What I *don't* want to see is for things like ranges to have 
their own collations and the like.  From the perspective of all range-specific 
things, the types over which they're defined like text should just have their 
own native ordering, which defines the range's sense of before and after. 
If DOMAIN effectively does that for text types, then that is the way to go. -- 
Darren Duncan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-19 Thread Florian Pflug

On Jun19, 2011, at 00:23 , Jeff Davis wrote:
 On Sat, 2011-06-18 at 22:19 +0200, Florian Pflug wrote:
 Yes, that seems necessary for consistency. That leaves the question
 of what to do if someone tries to modify a textrange's collation with
 a COLLATE clause. For example,
 
 For example, whats the result of
  'Ä' in '[A,Z']::textrange_german COLLATE 'C'
 where 'Ä' is a german Umlaut-A which sorts after 'A' but before 'B'
 in locale 'de_DE' but sorts after 'Z' in locale 'C'. (I'm assuming
 that textrange_german was defined with collation 'de_DE').
 
 With the set-based definition of ranges, the only sensible thing
 is to simply ignore the COLLATE clause I think.
 
 I think rejecting it makes more sense, so a range would not be a
 collatable type; it just happens to use collations of the subtype
 internally.

Ah, crap, I put the COLLATE in the wrong place. What I actually
had in mind was
  ('Ä' COLLATE 'C') in '[A,Z]'::textrange_german

I was afraid that the in operator cannot distinguish this case
from
  field in '[A,Z]'::textrange_german
where field is declared with COLLATE 'C'.

In the seconds case, throwing an error seems a bit harsh

There's also this fun little case
  field in '[A,Z]'
(note lack of an explicit cast). Here the input function would
probably need to verify that there's a range type corresponding
to the field's type *and* that the range type's collation matches
the field's collation. I wonder if that's possible - Tom said
somewhere that input function don't receive collation information,
though I don't know if that restriction applies in this case.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-19 Thread Martijn van Oosterhout

On Sun, Jun 19, 2011 at 11:21:28AM +0200, Florian Pflug wrote:
  I think rejecting it makes more sense, so a range would not be a
  collatable type; it just happens to use collations of the subtype
  internally.
 
 Ah, crap, I put the COLLATE in the wrong place. What I actually
 had in mind was
   ('Ä' COLLATE 'C') in '[A,Z]'::textrange_german

Operators don't have to be collation sensetive. If they're not then the
COLLATE in the above statement is redundant.  You can decide that an
interval needs an implicit collation and you can just use that.

 I was afraid that the in operator cannot distinguish this case
 from
   field in '[A,Z]'::textrange_german
 where field is declared with COLLATE 'C'.

It should be able to, after all in the first case the collation is
explicit, in the latter implicit.

 There's also this fun little case
   field in '[A,Z]'
 (note lack of an explicit cast). Here the input function would
 probably need to verify that there's a range type corresponding
 to the field's type *and* that the range type's collation matches
 the field's collation. I wonder if that's possible - Tom said
 somewhere that input function don't receive collation information,
 though I don't know if that restriction applies in this case.

Collation checking is generally done by the planner. I don't see why
the input function should check, the result of an input function is by
definition DEFAULT. It's up to the 'in' operator to check.

Note that the whole idea of collation is not really supposed to be
assigned to object for storage.  How that can be resolved I'm not sure.

Mvg,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle


signature.asc
Description: Digital signature

Re: [HACKERS] Range Types and extensions

2011-06-19 Thread Jeff Davis

On Sun, 2011-06-19 at 12:24 +0200, Martijn van Oosterhout wrote:
 Collation checking is generally done by the planner. I don't see why
 the input function should check, the result of an input function is by
 definition DEFAULT. It's up to the 'in' operator to check.
 
 Note that the whole idea of collation is not really supposed to be
 assigned to object for storage.  How that can be resolved I'm not sure.

I think if we just say that it's a property of the range type
definition, then that's OK. It's similar to specifying a non-default
btree opclass for the range type -- it just changes which total order
the range type adheres to.

If you meant that the collation shouldn't be stored along with the value
itself, then I agree.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-19 Thread Florian Pflug

On Jun19, 2011, at 20:08 , Jeff Davis wrote:
 On Sun, 2011-06-19 at 12:24 +0200, Martijn van Oosterhout wrote:
 Collation checking is generally done by the planner. I don't see why
 the input function should check, the result of an input function is by
 definition DEFAULT. It's up to the 'in' operator to check.
 
 Note that the whole idea of collation is not really supposed to be
 assigned to object for storage.  How that can be resolved I'm not sure.
 
 I think if we just say that it's a property of the range type
 definition, then that's OK. It's similar to specifying a non-default
 btree opclass for the range type -- it just changes which total order
 the range type adheres to.

In fact, it's exactly the same, because what we *actually* need to
specify is not an opclass but a comparison operator. Which is only
well-defined if you know *both* an opclass *and* a collation.

That reminds me - the conclusion there was that we cannot have
two range types with the same base type but different opclasses,
wasn't it?

AFAIR precisely because otherwise there's no sensible way to handle
  'text' in '[lower,upper]'

If I'm not mistaken about this, that would imply that we also cannot
have two range types with the same base type, the same opclass,
but different collations. Which seems rather unfortunate... In fact,
if that's true, maybe restricing range types to the database collation
would be best...

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-18 Thread Jeff Davis

On Fri, 2011-06-10 at 00:26 +0200, Florian Pflug wrote:
 Maybe that check should just be removed? If one views the range
 '[L, U)' as a concise way of expressing L = x AND x  U for some
 x, then allowing the case L  U seems quite natural. There won't
 be any such x of course, but the range is still valid, just empty.

[ Please excuse the late reply, I was on vacation. ]

That's an interesting perspective, but I don't think it's a good idea. 

Up to this point, I've considered a range value to be a set of
contiguous values, and the endpoints just happen to be a way to
represent that set. If changing the collation changes a set of positive
cardinality into an empty set, clearly it's a different value.

We don't want the COLLATE clause to change the value, because things
that do change the value (like a typecast) should offer the opportunity
to call a function so that you can verify that it's valid or change it
to some canonical form.

So, I believe that you are proposing to change the concept of a range
value from a contiguous set of values to a pair of bounds. There are
numerous implications, one of which is that I don't think that we can
maintain the equality of all empty ranges. Consider these expressions,
where x is a non-empty range with collation A, but is empty in
collation B (and * means range intersection):

  (x COLLATE B) COLLATE A
  ((x COLLATE B) * '(-Inf, Inf)') COLLATE A
  ('-'::textrange * '(-Inf, Inf)') COLLATE A

All of those expressions should be equal (according to global
substitutibility, as Darren mentioned). But they can't be, because the
last expression is always an empty range, whereas the first one is not
(because merely changing the collation back and forth offers no
opportunity to even notice that you have an empty range at one point).
So, I believe that we'd be stuck with non-equal empty ranges, as well as
many other possibly non-intuitive implications.

So, I lean strongly toward the interpretation that a range is a
contiguous set of values, and changing the collation should not change
the value. Things that do change the value (like a typecast) should
offer the opportunity to handle cases like this with a function call,
but changing collation does not.

This leaves making the collation a part of the range type itself (as
Robert suggested).

Comments?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-18 Thread Florian Pflug

On Jun18, 2011, at 10:10 , Jeff Davis wrote:
 On Fri, 2011-06-10 at 00:26 +0200, Florian Pflug wrote:
 So, I believe that you are proposing to change the concept of a range
 value from a contiguous set of values to a pair of bounds.

Yeah. Mostly though because I figured that'd make defining their
semantics easier, not necessarily because that interpretation is
better, though.

 There are
 numerous implications, one of which is that I don't think that we can
 maintain the equality of all empty ranges. Consider these expressions,
 where x is a non-empty range with collation A, but is empty in
 collation B (and * means range intersection):
 
  (x COLLATE B) COLLATE A
  ((x COLLATE B) * '(-Inf, Inf)') COLLATE A
  ('-'::textrange * '(-Inf, Inf)') COLLATE A
 
 All of those expressions should be equal (according to global
 substitutibility, as Darren mentioned). But they can't be, because the
 last expression is always an empty range, whereas the first one is not
 (because merely changing the collation back and forth offers no
 opportunity to even notice that you have an empty range at one point).
 So, I believe that we'd be stuck with non-equal empty ranges, as well as
 many other possibly non-intuitive implications.

Yeah. Once you give up the idea that range is a set, extensionality
(i.e. the axiom there's only one empty range or more precisely
there only one range which no object is a member of) has to go too.

 So, I lean strongly toward the interpretation that a range is a
 contiguous set of values,

Yeah, I agree now, mainly because defining them as a set give rise
to richer semantics than defining them to be a pair. If someone
needs just a pair of values and maybe a BETWEEN operator, that is
easily done with CREATE TYPE and a few SQL or PLPGSQL functions.

 and changing the collation should not change
 the value. Things that do change the value (like a typecast) should
 offer the opportunity to handle cases like this with a function call,
 but changing collation does not.
 
 This leaves making the collation a part of the range type itself (as
 Robert suggested).

Yes, that seems necessary for consistency. That leaves the question
of what to do if someone tries to modify a textrange's collation with
a COLLATE clause. For example,

For example, whats the result of
  'Ä' in '[A,Z']::textrange_german COLLATE 'C'
where 'Ä' is a german Umlaut-A which sorts after 'A' but before 'B'
in locale 'de_DE' but sorts after 'Z' in locale 'C'. (I'm assuming
that textrange_german was defined with collation 'de_DE').

With the set-based definition of ranges, the only sensible thing
is to simply ignore the COLLATE clause I think.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-18 Thread Jeff Davis

On Sat, 2011-06-18 at 22:19 +0200, Florian Pflug wrote:
 Yes, that seems necessary for consistency. That leaves the question
 of what to do if someone tries to modify a textrange's collation with
 a COLLATE clause. For example,
 
 For example, whats the result of
   'Ä' in '[A,Z']::textrange_german COLLATE 'C'
 where 'Ä' is a german Umlaut-A which sorts after 'A' but before 'B'
 in locale 'de_DE' but sorts after 'Z' in locale 'C'. (I'm assuming
 that textrange_german was defined with collation 'de_DE').
 
 With the set-based definition of ranges, the only sensible thing
 is to simply ignore the COLLATE clause I think.

I think rejecting it makes more sense, so a range would not be a
collatable type; it just happens to use collations of the subtype
internally.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-12 Thread Florian Pflug

On Jun12, 2011, at 04:37 , Robert Haas wrote:
 On Thu, Jun 9, 2011 at 6:26 PM, Florian Pflug f...@phlo.org wrote:
 On Jun8, 2011, at 17:46 , Jeff Davis wrote:
 It looks like the type input function may be a problem, because it
 doesn't look like it knows what the collation is yet. In other words,
 PG_GET_COLLATION() is zero for the type input function.
 
 But I need to do a comparison to find out if the range is valid or not.
 For instance:
  '[a, Z)'::textrange
 is valid in en_US but not C.
 
 Maybe that check should just be removed? If one views the range
 '[L, U)' as a concise way of expressing L = x AND x  U for some
 x, then allowing the case L  U seems quite natural. There won't
 be any such x of course, but the range is still valid, just empty.
 
 Actually, thinking for this a bit, I believe this is the only
 way text ranges can support collations. If the validity of a range
 depends on the collation, then changing the collation after creation
 seems weird, since it can make previous valid ranges invalid and
 vice versa.
 
 There could be a function RANGE_EMPTY() which people can put into
 their CHECK constraints if they don't want such ranges to sneak
 into their tables...
 
 I think the collation is going to have to be baked into the type
 definition, no?  You can't just up and change the collation of the
 column as you could for a straight text column, if that might cause
 the contents of some rows to be viewed as invalid.

Now you've lost me. If a text range is simply a pair of strings,
as I suggested, and collations are applied only during comparison
and RANGE_EMPTY(), why would the collation have to be baked into
the type?

If you're referring to the case
  (1) Create table with text-range column and collation C1
  (2) Add check constraint containing RANGE_EMPTY()
  (3) Add data
  (4) Alter column to have collation C2, possibly changing
  the result of RANGE_EMPTY() for existing ranges.
then that points to a problem with ALTER COLUMN.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-12 Thread Robert Haas

On Sun, Jun 12, 2011 at 7:53 AM, Florian Pflug f...@phlo.org wrote:
 I think the collation is going to have to be baked into the type
 definition, no?  You can't just up and change the collation of the
 column as you could for a straight text column, if that might cause
 the contents of some rows to be viewed as invalid.

 Now you've lost me. If a text range is simply a pair of strings,
 as I suggested, and collations are applied only during comparison
 and RANGE_EMPTY(), why would the collation have to be baked into
 the type?

 If you're referring to the case
  (1) Create table with text-range column and collation C1
  (2) Add check constraint containing RANGE_EMPTY()
  (3) Add data
  (4) Alter column to have collation C2, possibly changing
      the result of RANGE_EMPTY() for existing ranges.
 then that points to a problem with ALTER COLUMN.

No, I'm saying that you might have a column containing  '[a, Z)', and
someone might change the collation of the column from en_US to C.
When the collation was en_US, the column could legally contain that
value, but now that the collation is C, it can't.  ALTER TABLE isn't
going to recheck the validity of the data when someone changes the
collation: that's only supposed to affect the sort order, not the
definition of what is a legal value.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-12 Thread Darren Duncan


Robert Haas wrote:

On Sun, Jun 12, 2011 at 7:53 AM, Florian Pflug f...@phlo.org wrote:

I think the collation is going to have to be baked into the type
definition, no?  You can't just up and change the collation of the
column as you could for a straight text column, if that might cause
the contents of some rows to be viewed as invalid.

Now you've lost me. If a text range is simply a pair of strings,
as I suggested, and collations are applied only during comparison
and RANGE_EMPTY(), why would the collation have to be baked into
the type?

If you're referring to the case
 (1) Create table with text-range column and collation C1
 (2) Add check constraint containing RANGE_EMPTY()
 (3) Add data
 (4) Alter column to have collation C2, possibly changing
 the result of RANGE_EMPTY() for existing ranges.
then that points to a problem with ALTER COLUMN.


No, I'm saying that you might have a column containing  '[a, Z)', and
someone might change the collation of the column from en_US to C.
When the collation was en_US, the column could legally contain that
value, but now that the collation is C, it can't.  ALTER TABLE isn't
going to recheck the validity of the data when someone changes the
collation: that's only supposed to affect the sort order, not the
definition of what is a legal value.


You can have the same collation problem even without range types.

Consider the following:
 (1) Create table with the 2 text columns {L,R} and both columns have the 
collation en_US.

 (2) Add check constraint requiring L = R.
 (3) Add a record with the value 'a' for L and 'Z' for R.
 (4) Alter the columns to have the collation C.

Good language design principles demand that the semantics for this simplified 
case and the semantics for replacing {L,R} with a single range-of-text-typed 
column be the same, including what happens with CHECK and ALTER TABLE.


Likewise, anything that affects ORDER BY should affect {,,=,=} and friends 
the same way and vice-versa and likewise should affect range validity.


It makes sense for collation to be considered part of text data types, and 
changing collation is casting from one text type to another.  Generally 
speaking, any inherent or applied aspect of a text or other value (such as 
collation) that affects the results of any deterministic operations on those 
values (such as sorting) should be considered part of the data type of those values.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-12 Thread Robert Haas

On Mon, Jun 13, 2011 at 12:47 AM, Darren Duncan dar...@darrenduncan.net wrote:
 If you're referring to the case
  (1) Create table with text-range column and collation C1
  (2) Add check constraint containing RANGE_EMPTY()
  (3) Add data
  (4) Alter column to have collation C2, possibly changing
     the result of RANGE_EMPTY() for existing ranges.
 then that points to a problem with ALTER COLUMN.

 No, I'm saying that you might have a column containing  '[a, Z)', and
 someone might change the collation of the column from en_US to C.
 When the collation was en_US, the column could legally contain that
 value, but now that the collation is C, it can't.  ALTER TABLE isn't
 going to recheck the validity of the data when someone changes the
 collation: that's only supposed to affect the sort order, not the
 definition of what is a legal value.

 You can have the same collation problem even without range types.

 Consider the following:
  (1) Create table with the 2 text columns {L,R} and both columns have the
 collation en_US.
  (2) Add check constraint requiring L = R.
  (3) Add a record with the value 'a' for L and 'Z' for R.
  (4) Alter the columns to have the collation C.

Oh, good point.

rhaas=# create table sample (t text collate en_US, check (t  'Z'));
CREATE TABLE
rhaas=# insert into sample values ('a');
INSERT 0 1
rhaas=# alter table sample alter column t type text collate C;
ERROR:  check constraint sample_t_check is violated by some row

But interestingly, my Mac has a different notion of how this collation
works: it thinks 'a'  'Z' even in en_US.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-11 Thread Robert Haas

On Thu, Jun 9, 2011 at 6:26 PM, Florian Pflug f...@phlo.org wrote:
 On Jun8, 2011, at 17:46 , Jeff Davis wrote:
 It looks like the type input function may be a problem, because it
 doesn't look like it knows what the collation is yet. In other words,
 PG_GET_COLLATION() is zero for the type input function.

 But I need to do a comparison to find out if the range is valid or not.
 For instance:
  '[a, Z)'::textrange
 is valid in en_US but not C.

 Maybe that check should just be removed? If one views the range
 '[L, U)' as a concise way of expressing L = x AND x  U for some
 x, then allowing the case L  U seems quite natural. There won't
 be any such x of course, but the range is still valid, just empty.

 Actually, thinking for this a bit, I believe this is the only
 way text ranges can support collations. If the validity of a range
 depends on the collation, then changing the collation after creation
 seems weird, since it can make previous valid ranges invalid and
 vice versa.

 There could be a function RANGE_EMPTY() which people can put into
 their CHECK constraints if they don't want such ranges to sneak
 into their tables...

I think the collation is going to have to be baked into the type
definition, no?  You can't just up and change the collation of the
column as you could for a straight text column, if that might cause
the contents of some rows to be viewed as invalid.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-09 Thread Florian Pflug

On Jun8, 2011, at 17:46 , Jeff Davis wrote:
 It looks like the type input function may be a problem, because it
 doesn't look like it knows what the collation is yet. In other words,
 PG_GET_COLLATION() is zero for the type input function.
 
 But I need to do a comparison to find out if the range is valid or not.
 For instance:
  '[a, Z)'::textrange
 is valid in en_US but not C.

Maybe that check should just be removed? If one views the range
'[L, U)' as a concise way of expressing L = x AND x  U for some
x, then allowing the case L  U seems quite natural. There won't
be any such x of course, but the range is still valid, just empty.

Actually, thinking for this a bit, I believe this is the only
way text ranges can support collations. If the validity of a range
depends on the collation, then changing the collation after creation
seems weird, since it can make previous valid ranges invalid and
vice versa.

There could be a function RANGE_EMPTY() which people can put into
their CHECK constraints if they don't want such ranges to sneak
into their tables...

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-08 Thread Jeff Davis

On Tue, 2011-06-07 at 10:20 -0700, Jeff Davis wrote:
  BTW, Jeff, have you worked out the implications of collations for
  textual range types?
 
 Well, it seems to work is about as far as I've gotten.
 
 As far as the implications, I'll need to do a little more research and
 thinking. But I don't immediately see anything too worrisome.

I take that back :(

It looks like the type input function may be a problem, because it
doesn't look like it knows what the collation is yet. In other words,
PG_GET_COLLATION() is zero for the type input function.

But I need to do a comparison to find out if the range is valid or not.
For instance:
  '[a, Z)'::textrange
is valid in en_US but not C.

The range constructor:
  range('a', 'Z')
is fine though.

Not sure what to do here.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-08 Thread Tom Lane

Jeff Davis pg...@j-davis.com writes:
 It looks like the type input function may be a problem, because it
 doesn't look like it knows what the collation is yet. In other words,
 PG_GET_COLLATION() is zero for the type input function.

Yeah, we've assumed that I/O functions do not need to know collation.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-07 Thread Merlin Moncure

On Mon, Jun 6, 2011 at 6:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Merlin Moncure mmonc...@gmail.com writes:
 I vote for at minimum the type itself and ANYRANGE to be in core.
 From there you could make it like arrays where the range type is
 automatically generated for each POD type.  I would consider that for
 sure on basis of simplicity in user-land unless all the extra types
 and operators are a performance hit.

 Auto-generation of range types isn't going to happen, simply because the
 range type needs more information than is provided by the base type
 declaration.  (First, you need a btree opclass, and second, you need a
 next function if it's a discrete type.)

 By my count there are only about 20 datatypes in core for which it looks
 sensible to provide a range type (ie, it's a non-deprecated,
 non-composite type with a standard default btree opclass).  For that
 many, we might as well just build 'em in.

right. hm -- can you have multiple range type definitions for a
particular type?  I was thinking about a type reduction for casting
like we have for arrays: select '[1,3)'::int{}. but maybe that isn't
specific enough?

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-07 Thread Tom Lane

Merlin Moncure mmonc...@gmail.com writes:
 On Mon, Jun 6, 2011 at 6:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 By my count there are only about 20 datatypes in core for which it looks
 sensible to provide a range type (ie, it's a non-deprecated,
 non-composite type with a standard default btree opclass).  For that
 many, we might as well just build 'em in.

 right. hm -- can you have multiple range type definitions for a
 particular type?

In principle, sure, if the type has multiple useful sort orderings.
I don't immediately see any core types for which we'd bother.  (In
particular I don't see a use case for range types corresponding to
the *_pattern_ops btree opclasses, especially now that COLLATE C
has rendered them sorta obsolete.)

BTW, Jeff, have you worked out the implications of collations for
textual range types?  I confess to not having paid much attention
to range types lately.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-07 Thread Jeff Davis

On Tue, 2011-06-07 at 11:15 -0400, Tom Lane wrote:
 Merlin Moncure mmonc...@gmail.com writes:
  right. hm -- can you have multiple range type definitions for a
  particular type?
 
 In principle, sure, if the type has multiple useful sort orderings.

Right. Additionally, you might want to use different canonical
functions for the same subtype.

 I don't immediately see any core types for which we'd bother.

Agreed.

 BTW, Jeff, have you worked out the implications of collations for
 textual range types?

Well, it seems to work is about as far as I've gotten.

As far as the implications, I'll need to do a little more research and
thinking. But I don't immediately see anything too worrisome.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-07 Thread Jeff Davis

On Mon, 2011-06-06 at 14:42 -0700, Darren Duncan wrote:
 On this note, here's a *big* thing that needs discussion ...

[ refering to the concept of discrete versus continuous ranges ]

Yes, there has been much discussion on this topic already.

The solution right now is that they both behave like continuous ranges
for most operations. But each time a value is produced, a discrete range
has a canonicalize function that aligns it to the proper boundaries
and chooses a convention from [], [), (], (). For discrete ranges that's
only a convention, because multiple representations are equal in value,
but that's not so for continuous ranges.

Another approach would be to offer next and prev functions instead
of canonical, or a plus(thetype, integer) and minus(thetype,
integer).


 Can Pg be changed to support . in operator names as long as they don't just 
 appear by themselves?  What would this break to do so?

Someone else would have to comment on that. My feeling is that it might
create problems with qualified names, and also with PG's arg.function
call syntax.

 foo in 1..10

 I believe it is quite reasonable to treat ranges like sets, in an abstract 
 sense, and so using set membership syntax like in is valid.

OK, I think I agree with this now. I'll think about it some more.

 I also see these as considerably less important and useful in practice than 
 the 
 continuous intervals.

[ multiranges ]

Agreed. I've left those alone for now, because it's a separate concept.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-07 Thread Tom Lane

Jeff Davis pg...@j-davis.com writes:
 On Mon, 2011-06-06 at 14:42 -0700, Darren Duncan wrote:
 Can Pg be changed to support . in operator names as long as they don't 
 just 
 appear by themselves?  What would this break to do so?

 Someone else would have to comment on that.

DOT_DOT is already a token in plpgsql; trying to make it be also an
operator name would break a lot of existing plpgsql code.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-07 Thread Darren Duncan


Jeff Davis wrote:

On Tue, 2011-06-07 at 11:15 -0400, Tom Lane wrote:

Merlin Moncure mmonc...@gmail.com writes:

right. hm -- can you have multiple range type definitions for a
particular type?

In principle, sure, if the type has multiple useful sort orderings.


Right. Additionally, you might want to use different canonical
functions for the same subtype.


I don't immediately see any core types for which we'd bother.


Agreed.


BTW, Jeff, have you worked out the implications of collations for
textual range types?


Well, it seems to work is about as far as I've gotten.

As far as the implications, I'll need to do a little more research and
thinking. But I don't immediately see anything too worrisome.


I would expect ranges to have exactly the same semantics as ORDER BY or  etc 
with respect to collations for textual range types.


If collation is an attribute of a textual type, meaning that the textual type or 
its values have a sense of their collation built-in, then ranges for those 
textual types should just work without any extra range-specific syntax, same 
as you could say ORDER BY without any further qualifiers.


If collation is not an attribute of a textual type, meaning that you normally 
have to qualify the desired collation for each order-sensitive operation using 
it (even if that can be defined by a session/etc setting which still just 
ultimately works at the operator rather than type level), or if a textual type 
can have it built in but it is overridable per operator, then either ranges 
should have an extra attribute saying what collation (or other type-specific 
order-determining function) to use, or all range operators take the optional 
collation parameter like with ORDER BY.


Personally, I think it is a more elegant programming language design for an 
ordered type to have its own sense of a one true canonical ordering of its 
values, and where one could conceptually have multiple orderings, there would be 
a separate data type for each one.  That is, while you probably only need a 
single type with respect to ordering for any real numeric type, for textual 
types you could have a separate textual type for each collation.


In particular, I say separate type because a collation can sometimes affect 
differently what text values compare as same, as far as I know.


On a tangent, I believe that various insensitive comparisons or sortings are 
very reasonably expressed as collations rather than some other mechanism, eg if 
you wanted sortings that compare different letter case as same or not, or with 
or without accents as same or not.


So under this elegant system, there is no need to ever specify collation at 
the operator level (which could become quite verbose and unweildy), but instead 
you can cast data types if you want to change their sense of canonical ordering.


Now if the various text-specific operators are polymorphic across these text 
type variants, users don't generally have to know the difference except when it 
matters.


On a tangent, I believe that the best definition of equal or same in a type 
system is global substitutability.  Ignoring implementation details, if a 
program ever finds that 2 operands to the generic = (equality test) operator 
result in TRUE, then the program should feel free to replace all occurrences of 
one operand in the program with occurrences of the other, for optimization, 
because generic = returning TRUE means one is just as good as the other.  This 
assumes generally that we're dealing with immutable value types.


-- Darren Duncan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-07 Thread Darren Duncan


Jeff Davis wrote:

On Mon, 2011-06-06 at 14:42 -0700, Darren Duncan wrote:
Can Pg be changed to support . in operator names as long as they don't just 
appear by themselves?  What would this break to do so?


Someone else would have to comment on that. My feeling is that it might
create problems with qualified names, and also with PG's arg.function
call syntax.


With respect to qualified names or arg.function, then unless the function 
can be symbolic, I considered your examples to be the appear by themselves, 
hence . by itself wouldn't be a new operator, and I generally assumed here 
that any multi-character operators with . to be symbolic.


In any event, I also saw Tom's reply about DOT_DOT being a token already.

-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-06 Thread Dimitri Fontaine

Jeff Davis pg...@j-davis.com writes:
 I'd like to take another look at Range Types and whether part of it
 should be an extension. Some of these issues relate to extensions in
 general, not just range types.

That's a good question :)

I think the way things are going to be organised now is that we will
have core-blessed extensions:  don't mix the mechanism and the policy.

 non-issue if we had a good type interface system (that works on
 polymorphic types) -- we could just have a built-in range interface,
 and the range extension could add  as the range interface's overlaps
 operator for the type ANYRANGE.

That means that this is, IMHO, the right approach.  Have core support
that enables user defined RANGE types with indexing and planner support,
etc, like we have OPERATOR CLASS and FAMILY and all the jazz.

And the useful stuff you need to have to benefit from that core support
would be an extension.  It could be a core maintained extension, and it
could even get installed by default, so that all the users would need to
do is 'CREATE EXTENSION timeranges;', for example.

So, I see us getting those different kinds of extensions in the future:

 a. core extensions, shipped by default
 b. contribs, not shipped by default, maintained by core hackers
 c. examples, included in the source code only, maintained as contribs
 d. “trusted network” of extensions (pgdg, pgxn, debian, privates, etc)
 e. external independent extensions, just as now 

The other main difference between a core extension and a contrib will be
where it's documented.  Greg Smith proposed a patch wherein he moved
some contribs to a new extension/ place, and had them auto installed.

I think the consensus is to instead add a new chapter (maybe between
current chapters 9. Functions and Operators and 10. Type Conversion) and
host “core extensions” docs there.  The source code organisation is
controversial because technically not necessary.  We have to keep the
work Greg did to keep those contribs shipped by default.  Oh, and that
is on the 9.1 Open Items, right?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-06 Thread Jeff Davis

On Mon, 2011-06-06 at 06:56 +0200, Pavel Stehule wrote:
 2011/6/6 Darren Duncan dar...@darrenduncan.net:
  Jeff Davis wrote:
 
  I'd like to take another look at Range Types and whether part of it
  should be an extension. Some of these issues relate to extensions in
  general, not just range types.
 
  First of all, what are the advantages to being in core?
 
 it should be supported by FOREACH statement in PL/pgSQL

Oh, good idea. It would only work for discrete ranges though.

However, I would need to somehow reintroduce the concept of next,
which has some hazards to it (as Tom pointed out, we don't want someone
to define the next for a float to be +1.0). I'll have to think about
this.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-06 Thread Jeff Davis

On Sun, 2011-06-05 at 21:51 -0700, Darren Duncan wrote:
 Jeff Davis wrote:
  I'd like to take another look at Range Types and whether part of it
  should be an extension. Some of these issues relate to extensions in
  general, not just range types.
  
  First of all, what are the advantages to being in core?
 
 I believe that ranges aka intervals are widely useful generic types, next 
 after 
 relations/tuples/arrays, and they *should* be supported in core, same as 
 arrays are.

I think we all agree that ranges are important. I am not suggesting that
we sacrifice on the semantics to make it an extension; I'm just trying
to see if involving extensions for some of the approximately 5000 lines
would be a good idea.

 Now assuming that a range/interval value is generally defined in terms of a 
 pair 
 of endpoints of some ordered type (that is, a type for which ORDER BY or RANK 
 or 
 {,,=,=} etc or LIMIT makes sense), it will be essential that this value 
 is 
 capable of distinguishing open and closed intervals.

Right, it already does that explicitly. I'd appreciate your input on
some of the previous discussion though.

 Also, if Postgres has some concept of type-generic special values -Inf and 
 +Inf 
 (which always sort before or after any other value in the type system), those 
 can be used as endpoints to indicate that the interval is unbounded.

I already introduced +/- infinity to range types. They are not generic
outside of ranges, however -- therefore you can't select the upper bound
of an upper-infinite range.

 Unless you have some other syntax in mind, I suggest lifting the range 
 literal 
 syntax from Perl 6, where .. is an infix operator building a range between 
 its 
 arguments, and a ^ on either side means that side is open, I think; so 
 there 
 are 4 variants: {..,^..,..^,^..^}.

Oh, interesting syntax. That might make a good operator version of a
constructor. Unfortunately, . is not valid in an operator name in PG.
Maybe I can use tilde or dash?

 Any operation that wants to deal with a range somehow, such as the BETWEEN 
 syntax, could instead use a range/interval; for example, both of:
 
foo in 1..10

I don't know if it's reasonable to introduce syntax like in here.
Maybe we could just still use between and it would recognize that the
RHS is a range?


 The LIMIT clause could take a range to specify take and skip count at once.

Interesting idea.

 Array slicing can be done using foo[first..last] or such.

I like that, but we already have foo[3:7], so it might be better not to
introduce redundancy. Too bad I can't use : as an operator.

 A random number generator that takes endpoints can take a range argument.

Sounds useful because it would make it more explicit whether the
endpoints are possible results.

 An array or relation of these range can represent ranges with holes, and the 
 general results of range union operations.

Right, that's been brought up before as well. In particular, Scott
Bailey has done some thinking/writing on this topic.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-06 Thread Pavel Stehule

2011/6/6 Jeff Davis pg...@j-davis.com:
 On Mon, 2011-06-06 at 06:56 +0200, Pavel Stehule wrote:
 2011/6/6 Darren Duncan dar...@darrenduncan.net:
  Jeff Davis wrote:
 
  I'd like to take another look at Range Types and whether part of it
  should be an extension. Some of these issues relate to extensions in
  general, not just range types.
 
  First of all, what are the advantages to being in core?

 it should be supported by FOREACH statement in PL/pgSQL

 Oh, good idea. It would only work for discrete ranges though.

 However, I would need to somehow reintroduce the concept of next,
 which has some hazards to it (as Tom pointed out, we don't want someone
 to define the next for a float to be +1.0). I'll have to think about
 this.

we can define a step

FOREACH x IN RANGE . BY 
LOOP
END LOOP

Regards

Pavel


 Regards,
        Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-06 Thread Jeff Davis

On Mon, 2011-06-06 at 14:42 +0200, Dimitri Fontaine wrote:
 I think the way things are going to be organised now is that we will
 have core-blessed extensions:  don't mix the mechanism and the policy.

I like that idea.

  non-issue if we had a good type interface system (that works on
  polymorphic types) -- we could just have a built-in range interface,
  and the range extension could add  as the range interface's overlaps
  operator for the type ANYRANGE.
 
 That means that this is, IMHO, the right approach.  Have core support
 that enables user defined RANGE types with indexing and planner support,
 etc, like we have OPERATOR CLASS and FAMILY and all the jazz.

If we take the minimal approach, the index support would be the first to
be moved to an extension. In order to have index support in core, we
need quite a few functions and a significant amount of code.

Minimal would be:
  * CREATE TYPE ... AS RANGE
  * ANYRANGE
  * The IO functions
  * Possibly the constructors and accessors ( range(),
range_oc(), lower(), upper(), etc.)

Regarding the type interfaces, the only thing that really worries me
there is that my future work will depend on them existing, and I haven't
really thought through the details. For instance, it just occurred to me
recently that it would need to support polymorphic types, which might be
a little bit more complicated than a simple lookup.

I suppose it's easier to put a few functions in core later if we get
stuck than to rip them out later.

 And the useful stuff you need to have to benefit from that core support
 would be an extension.  It could be a core maintained extension, and it
 could even get installed by default, so that all the users would need to
 do is 'CREATE EXTENSION timeranges;', for example.

Sounds good to me. However, would the extension be available in
pg_regress? If not, I will need to include those constructors/accessors
to be able to test anything.

 I think the consensus is to instead add a new chapter (maybe between
 current chapters 9. Functions and Operators and 10. Type Conversion) and
 host “core extensions” docs there.  The source code organisation is
 controversial because technically not necessary.  We have to keep the
 work Greg did to keep those contribs shipped by default.  Oh, and that
 is on the 9.1 Open Items, right?

OK, so there are still a few things to be decided around documentation
and tests. Both of those things can take a significant amount of time to
rework, so I think I'll leave it alone until we have more of a
consensus.

We still have time before 9.2 to break some of the code out into an
extension when we do have the doc/test issues resolved.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-06 Thread Christopher Browne

On Sun, Jun 5, 2011 at 6:59 PM, Jeff Davis pg...@j-davis.com wrote:
 There might also be some middle ground, where its like the minimalist
 approach, but with a few very basic constructors and accessors. That
 would at least make it easier to test, but then to be actually useful
 (with index support, operators, fancy functions, etc.) you'd need the
 extension.

 Thoughts?

I can see merit to having parts of RANGE implemented in core, along
with some of the usage parts implemented as extensions, so that if
I'm not actually using (say) INET ranges, then the database isn't
cluttered up with all the functions and operators for INET ranges.

How to slice it apart into an appropriate admixture of core and
extensions is a good question, though it seems pretty likely that
having an extension for each data type that is to be mixed into a
range is a reasonable way to go.

I think this also can make some would-be arguments against RANGE go away...

I hate that this RANGE extension means we have to draw 5000 lines of
code into every database, and draws in 275 operator functions
evaporates if the base part is entirely smaller, and if you only
draw in all the functions and operators if you request loading of each
of the 17 extensions.

Per-type extensions offers a pretty natural partitioning of the code
for each type, which seems pretty good.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-06 Thread Jeff Davis

On Mon, 2011-06-06 at 18:28 +0200, Pavel Stehule wrote:
 we can define a step
 
 FOREACH x IN RANGE . BY 

That wouldn't need any of the range infrastructure at all -- it would be
purely syntactic, right?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-06 Thread Jeff Davis

On Mon, 2011-06-06 at 16:45 +, Christopher Browne wrote:
 How to slice it apart into an appropriate admixture of core and
 extensions is a good question, though it seems pretty likely that
 having an extension for each data type that is to be mixed into a
 range is a reasonable way to go.

...

 Per-type extensions offers a pretty natural partitioning of the code
 for each type, which seems pretty good.

Ideally, most range types can be created with a simple:

CREATE TYPE foorange AS RANGE (subtype=foo);

There might be a few subtype-specific functions, like the canonical
function, but overall it should be a small amount of code per range.
However, I'd say just bundle a bunch of rangetypes together in one
extension. There's not really much cost -- if you are using one range
type, you'll use a few more.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-06 Thread Robert Haas

On Sun, Jun 5, 2011 at 2:59 PM, Jeff Davis pg...@j-davis.com wrote:
 So, where on this spectrum should range types fall? I think the most
 minimalist would be to only support #1 (and the necessary type IO
 functions); and leave all other functions, operators, and opclasses to
 an extension. That has a lot of appeal, but I don't think we can ignore
 the challenges above.

 On the other hand, trying to make it a complete feature in core has
 challenges as well. For instance, even with Range Types, Exclusion
 Constraints aren't practical out-of-the-box unless we also have
 BTree-GiST in core. So there's a snowball effect.

 There might also be some middle ground, where its like the minimalist
 approach, but with a few very basic constructors and accessors. That
 would at least make it easier to test, but then to be actually useful
 (with index support, operators, fancy functions, etc.) you'd need the
 extension.

I don't have clear feeling on this question in general, but if we're
going to break this up into pieces, it's important that they be
logical pieces.  Putting half the feature in core and half into an
extension just because we can will simplify complicate code
maintenance to no good end.  The snowball effect is something to
avoid, and we need to watch out for that, but if the upshot of putting
part of it in core is that the core code can no longer be understood
or maintained because it depends heavily on a bunch of non-core code,
that's not helpful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-06 Thread Pavel Stehule

2011/6/6 Jeff Davis pg...@j-davis.com:
 On Mon, 2011-06-06 at 18:28 +0200, Pavel Stehule wrote:
 we can define a step

 FOREACH x IN RANGE . BY 

 That wouldn't need any of the range infrastructure at all -- it would be
 purely syntactic, right?


I don't think. For lot of types the specification of a step is
necessary - for date, for float.

Pavel

 Regards,
        Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-06 Thread Ross J. Reedstrom

On Mon, Jun 06, 2011 at 12:53:49PM -0400, Robert Haas wrote:
 
 I don't have clear feeling on this question in general, but if we're
 going to break this up into pieces, it's important that they be
 logical pieces.  Putting half the feature in core and half into an
 extension just because we can will simplify complicate code
 maintenance to no good end.  The snowball effect is something to
 avoid, and we need to watch out for that, but if the upshot of putting
 part of it in core is that the core code can no longer be understood
 or maintained because it depends heavily on a bunch of non-core code,
 that's not helpful.
 
And concretely, code paths that cannot be exercised easily from
core-only code will not get regression tested, and will therefore rot.

Ross
-- 
Ross Reedstrom, Ph.D. reeds...@rice.edu
Systems Engineer  Admin, Research Scientistphone: 713-348-6166
Connexions  http://cnx.orgfax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-06 Thread Stephen Frost

* Dimitri Fontaine (dimi...@2ndquadrant.fr) wrote:
 That means that this is, IMHO, the right approach.  Have core support
 that enables user defined RANGE types with indexing and planner support,
 etc, like we have OPERATOR CLASS and FAMILY and all the jazz.

Yes, we do, however..

 And the useful stuff you need to have to benefit from that core support
 would be an extension.  It could be a core maintained extension, and it
 could even get installed by default, so that all the users would need to
 do is 'CREATE EXTENSION timeranges;', for example.

I don't like the idea of having a capability which is not utilized
in core.  We should make it so extensions can *also* have access to
define their own, but we should have the basics covered in core.

  a. core extensions, shipped by default

Having it as a core extension might work, but I'm not really 'sold' on
it.

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] Range Types and extensions

2011-06-06 Thread Merlin Moncure

On Sun, Jun 5, 2011 at 1:59 PM, Jeff Davis pg...@j-davis.com wrote:
 In the several talks that I've given, a common question is related to
 multiranges (ranges with holes). These get a little complex, and I
 don't have a complete answer. However, multiranges can be approximated
 with ordered arrays of non-overlapping, non-adjacent ranges. If someone
 wants to take it upon themselves to develop a set of operators here,
 that would be great -- but without ANYRANGE the operators would be
 unmanageable.

 2. Documentation and Tests
 --
 Let's say we take a minimalist view, and only have ANYRANGE and CREATE
 TYPE ... AS RANGE in core; and leave the rest as an extension.

 What exactly would the documentation say? I think it would be even more
 hypothetical and abstract than the documentation for Exclusion
 Constraints. So, there is a certain documentation advantage to having at
 least enough functionality to allow someone to try out the feature.

 And the tests for such a minimalist feature would be a significant
 challenge -- what do we do there? Get pg_regress to load the extension
 from PGXN?


 3. Quality
 --
 PostgreSQL has a great reputation for quality, and for good reason. But
 extensions don't follow the same quality-control standards; and even if
 some do, there is no visible stamp of approval. So, to ask someone to
 use an extension means that they have to evaluate the quality for
 themselves, which is a pretty high barrier.

 Since PGXN (thanks David Wheeler) and EXTENSIONs (thanks Dmitri) solve
 many of the other issues, quality control is one of the biggest ones
 remaining. I still get questions about when the temporal type will be
 in core, and I think this is why.

 I don't think this is a good excuse to put it in core though. We need to
 solve this problem, and the best way to start is by getting
 well-reviewed, high-quality extensions out there.


 4. Future work -- RANGE KEY, RANGE FOREIGN KEY, RANGE MERGE JOIN, etc.
 -
 There are a few aspects of range types that aren't in the first patch,
 but are fairly obvious follow-up additions. These will require some
 knowledge about ranges in the backend, like finding the overlaps
 operator for a range. The current patch provides this knowledge by
 providing a built-in overlaps operator for ANYRANGE. This would be a
 non-issue if we had a good type interface system (that works on
 polymorphic types) -- we could just have a built-in range interface,
 and the range extension could add  as the range interface's overlaps
 operator for the type ANYRANGE.

 =

 So, where on this spectrum should range types fall? I think the most
 minimalist would be to only support #1 (and the necessary type IO
 functions); and leave all other functions, operators, and opclasses to
 an extension. That has a lot of appeal, but I don't think we can ignore
 the challenges above.

 On the other hand, trying to make it a complete feature in core has
 challenges as well. For instance, even with Range Types, Exclusion
 Constraints aren't practical out-of-the-box unless we also have
 BTree-GiST in core. So there's a snowball effect.

 There might also be some middle ground, where its like the minimalist
 approach, but with a few very basic constructors and accessors. That
 would at least make it easier to test, but then to be actually useful
 (with index support, operators, fancy functions, etc.) you'd need the
 extension.

 Thoughts?

ISTM (I haven't followed all the lead up so apologies if this is
already covered) a range is a 3rd pseudo 'container' type (the other
two being composites and arrays). Do you see:

*) being able to make arrays of ranges/ranges of arrays?
*) range of composites?

I vote for at minimum the type itself and ANYRANGE to be in core.
From there you could make it like arrays where the range type is
automatically generated for each POD type.  I would consider that for
sure on basis of simplicity in user-land unless all the extra types
and operators are a performance hit.

A clean and highly usable implementation in the type system in the
spirit of arrays would be fantastic.  I'm particularly interested in
hypothetical constructor/destructor and in/out mechanics...an 'unnest'
like function, a range(a,b,c) that does as row(a,b,c) does, etc,
especially if you can work it out so that everything is not hammered
through textual processing.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-06 Thread Dimitri Fontaine

Stephen Frost sfr...@snowman.net writes:
 I don't like the idea of having a capability which is not utilized
 in core.  We should make it so extensions can *also* have access to
 define their own, but we should have the basics covered in core.

Well if another part of core depends on the feature set, then of course
you don't have a choice to make it an extension any more.  I think
that's where I would draw the line.

 Having it as a core extension might work, but I'm not really 'sold' on
 it.

Well, core extension means built by default, part of default regression
tests and all.  The regression test simply begins with the create
extension stanza, that's about it.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-06 Thread Darren Duncan


Jeff Davis wrote:

On Sun, 2011-06-05 at 21:51 -0700, Darren Duncan wrote:

Jeff Davis wrote:

I'd like to take another look at Range Types and whether part of it
should be an extension. Some of these issues relate to extensions in
general, not just range types.

First of all, what are the advantages to being in core?
I believe that ranges aka intervals are widely useful generic types, next after 
relations/tuples/arrays, and they *should* be supported in core, same as arrays are.


I think we all agree that ranges are important. I am not suggesting that
we sacrifice on the semantics to make it an extension; I'm just trying
to see if involving extensions for some of the approximately 5000 lines
would be a good idea.


Generally speaking, the best way to go about this is to define the *generic* 
data type in the core, and leave most operators to extensions.  So, in core, we 
need to have the way to select a range value over ANYTYPE either completely as a 
value literal or in terms of endpoint values from arbitrary expressions or 
variables, store the range value in a database, retrieve it, and access its 
component attributes (endpoints, open/closed) in user-defined constraint and 
operator definitions.


The fundamental value of ranges is the fact that they're a concise way to store 
and express an interval over an ordered type, and to either compare such 
intervals or test whether individual values or sets of values are in intervals. 
 And people do that a *lot* (such as with dates), so I see having this range 
type, which is generic and orthogonal to other types in the same way as arrays 
or tables are, in core just makes the most sense, and as previously illustrated, 
ranges are useful in places one might not always think about.


Ranges are also much more flexible than BETWEEN for what it does, because AFAIK 
you can't indicate open or closed with BETWEEN.


You should not need to define separate range types or operators for each ordered 
type, same as you should not have to do so for arrays, or where such 
functionality is defined should be similar; whatever functionality for arrays 
you do or don't define in core, do corresponding things for ranges.


Now assuming that a range/interval value is generally defined in terms of a pair 
of endpoints of some ordered type (that is, a type for which ORDER BY or RANK or 
{,,=,=} etc or LIMIT makes sense), it will be essential that this value is 
capable of distinguishing open and closed intervals.


Right, it already does that explicitly. I'd appreciate your input on
some of the previous discussion though.


On this note, here's a *big* thing that needs discussion ...

Citing this whole FOREACH talk, we need to recognize that this talk about ranges 
is actually being overloaded for 2 very distinct concepts, which are probably 
best dealt with separately, possibly as distinct types.


This discussion came up in the development of Perl 6 too, and that discussion is 
probably worth looking into.


Ranges/intervals in the general sense can *not* be used to enumerate a list of 
values in a standard type-sensical manner, such as FOREACH requires. 
Ranges/intervals are about *comparison*, meaning combinations of tests of how 2 
arbitrary values of an ordered type sort relative to each other, and that's it. 
 This usage works for integers, other numbers, strings, dates, and so on, all 
in a natural manner.


Value enumeration, such as in a FOREACH, is a *separate* concept.

The comparison and enumeration tasks have distinct sets of operators and are 
used in distinct contexts.  Enumeration requires next/prev-value operators, 
while ranges/intervals in general do not.  Enumeration requires discrete types 
(or the faking of such) like integers while ranges work for continuous types.


Moreover, in practice, one probably wants enumerations to be more flexible than 
just monotonic increases.  With enumerations you'd probably want to start go 
top-down or bottom-up, you might want to increase geometrically or by some other 
formula rather than incrementally.


I totally agree with sharing syntax and using ranges/intervals to define 
sequence generators, but a range value should be considered immutable like a 
number or string while a sequence generator may mutate.


For syntax, one could use x..y to define an interval while x...y for a 
sequence generator, or that's what Perl 6 does.


See also http://perlcabal.org/syn/S03.html#Range_and_RangeIter_semantics that 
talks about how Perl 6 does ranges.


Also, if Postgres has some concept of type-generic special values -Inf and +Inf 
(which always sort before or after any other value in the type system), those 
can be used as endpoints to indicate that the interval is unbounded.


I already introduced +/- infinity to range types. They are not generic
outside of ranges, however -- therefore you can't select the upper bound
of an upper-infinite range.


Well, what you have is the least one would want.

Unless you have some

Re: [HACKERS] Range Types and extensions

2011-06-06 Thread Tom Lane

Merlin Moncure mmonc...@gmail.com writes:
 I vote for at minimum the type itself and ANYRANGE to be in core.
 From there you could make it like arrays where the range type is
 automatically generated for each POD type.  I would consider that for
 sure on basis of simplicity in user-land unless all the extra types
 and operators are a performance hit.

Auto-generation of range types isn't going to happen, simply because the
range type needs more information than is provided by the base type
declaration.  (First, you need a btree opclass, and second, you need a
next function if it's a discrete type.)

By my count there are only about 20 datatypes in core for which it looks
sensible to provide a range type (ie, it's a non-deprecated,
non-composite type with a standard default btree opclass).  For that
many, we might as well just build 'em in.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Range Types and extensions

2011-06-05 Thread Jeff Davis

I'd like to take another look at Range Types and whether part of it
should be an extension. Some of these issues relate to extensions in
general, not just range types.

First of all, what are the advantages to being in core?

1. ANYRANGE + CREATE TYPE ... AS RANGE
--
This is the most compelling, in my opinion. People can define new range
functions and new range types independently and each one gets the
benefit of the other automatically. Without this, there will be an
explosion of functions and a bunch of inconsistencies like functions
that support most range types but not all (merely because the function
author didn't know that the type existed).

In the several talks that I've given, a common question is related to
multiranges (ranges with holes). These get a little complex, and I
don't have a complete answer. However, multiranges can be approximated
with ordered arrays of non-overlapping, non-adjacent ranges. If someone
wants to take it upon themselves to develop a set of operators here,
that would be great -- but without ANYRANGE the operators would be
unmanageable.

2. Documentation and Tests
--
Let's say we take a minimalist view, and only have ANYRANGE and CREATE
TYPE ... AS RANGE in core; and leave the rest as an extension.

What exactly would the documentation say? I think it would be even more
hypothetical and abstract than the documentation for Exclusion
Constraints. So, there is a certain documentation advantage to having at
least enough functionality to allow someone to try out the feature.

And the tests for such a minimalist feature would be a significant
challenge -- what do we do there? Get pg_regress to load the extension
from PGXN?


3. Quality
--
PostgreSQL has a great reputation for quality, and for good reason. But
extensions don't follow the same quality-control standards; and even if
some do, there is no visible stamp of approval. So, to ask someone to
use an extension means that they have to evaluate the quality for
themselves, which is a pretty high barrier.

Since PGXN (thanks David Wheeler) and EXTENSIONs (thanks Dmitri) solve
many of the other issues, quality control is one of the biggest ones
remaining. I still get questions about when the temporal type will be
in core, and I think this is why.

I don't think this is a good excuse to put it in core though. We need to
solve this problem, and the best way to start is by getting
well-reviewed, high-quality extensions out there.


4. Future work -- RANGE KEY, RANGE FOREIGN KEY, RANGE MERGE JOIN, etc.
-
There are a few aspects of range types that aren't in the first patch,
but are fairly obvious follow-up additions. These will require some
knowledge about ranges in the backend, like finding the overlaps
operator for a range. The current patch provides this knowledge by
providing a built-in overlaps operator for ANYRANGE. This would be a
non-issue if we had a good type interface system (that works on
polymorphic types) -- we could just have a built-in range interface,
and the range extension could add  as the range interface's overlaps
operator for the type ANYRANGE.

=

So, where on this spectrum should range types fall? I think the most
minimalist would be to only support #1 (and the necessary type IO
functions); and leave all other functions, operators, and opclasses to
an extension. That has a lot of appeal, but I don't think we can ignore
the challenges above.

On the other hand, trying to make it a complete feature in core has
challenges as well. For instance, even with Range Types, Exclusion
Constraints aren't practical out-of-the-box unless we also have
BTree-GiST in core. So there's a snowball effect.

There might also be some middle ground, where its like the minimalist
approach, but with a few very basic constructors and accessors. That
would at least make it easier to test, but then to be actually useful
(with index support, operators, fancy functions, etc.) you'd need the
extension.

Thoughts?

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-05 Thread Darren Duncan


Jeff Davis wrote:

I'd like to take another look at Range Types and whether part of it
should be an extension. Some of these issues relate to extensions in
general, not just range types.

First of all, what are the advantages to being in core?


I believe that ranges aka intervals are widely useful generic types, next after 
relations/tuples/arrays, and they *should* be supported in core, same as arrays are.


In particular, the usefulness of ranges/intervals is often orthogonal to many 
other things, and for many types including numbers, strings, temporals.


Now assuming that a range/interval value is generally defined in terms of a pair 
of endpoints of some ordered type (that is, a type for which ORDER BY or RANK or 
{,,=,=} etc or LIMIT makes sense), it will be essential that this value is 
capable of distinguishing open and closed intervals.


For example, a range value can be represented by a tuple with 4 attributes, 
where two of those are the endpoint values, and two of those are booleans saying 
whether each of the endpoints is inside or outside the range/interval.


Also, if Postgres has some concept of type-generic special values -Inf and +Inf 
(which always sort before or after any other value in the type system), those 
can be used as endpoints to indicate that the interval is unbounded.


Unless you have some other syntax in mind, I suggest lifting the range literal 
syntax from Perl 6, where .. is an infix operator building a range between its 
arguments, and a ^ on either side means that side is open, I think; so there 
are 4 variants: {..,^..,..^,^..^}.


Now as to general usefulness of intervals ...

Any operation that wants to deal with a range somehow, such as the BETWEEN 
syntax, could instead use a range/interval; for example, both of:


  foo in 1..10

  foo between 1 and 10

... would mean the same thing, but the 1..10 can be replaced by an arbitrary 
value expression or variable reference.


Likewise with:

  date in start ..^ end

  date = start and date  end

... mean the same thing.

The LIMIT clause could take a range to specify take and skip count at once.

Array slicing can be done using foo[first..last] or such.

A random number generator that takes endpoints can take a range argument.

An array or relation of these range can represent ranges with holes, and the 
general results of range union operations.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-05 Thread Pavel Stehule

2011/6/6 Darren Duncan dar...@darrenduncan.net:
 Jeff Davis wrote:

 I'd like to take another look at Range Types and whether part of it
 should be an extension. Some of these issues relate to extensions in
 general, not just range types.

 First of all, what are the advantages to being in core?

it should be supported by FOREACH statement in PL/pgSQL

Pavel

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-05 Thread Darren Duncan


Pavel Stehule wrote:

2011/6/6 Darren Duncan dar...@darrenduncan.net:

Jeff Davis wrote:

I'd like to take another look at Range Types and whether part of it
should be an extension. Some of these issues relate to extensions in
general, not just range types.

First of all, what are the advantages to being in core?


it should be supported by FOREACH statement in PL/pgSQL


Yes, absolutely.  I know this feature is loved in Perl.  But this usage would 
only work for a more limited range of data types, namely those over which one 
can build a sequence generator, such as integers, because they have a 
next-value/prev-value function defined.  In other words, while range types in 
general would work for any ordered type, FOREACH would only work for the subset 
of those that are ordinal types. -- Darren Duncan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

59 matches

Mail list logo