Re: [HACKERS] Range Types and extensions

Darren Duncan Mon, 06 Jun 2011 14:43:22 -0700

Jeff Davis wrote:

On Sun, 2011-06-05 at 21:51 -0700, Darren Duncan wrote:

Jeff Davis wrote:
I'd like to take another look at Range Types and whether part of it
should be an extension. Some of these issues relate to extensions in
general, not just range types.
First of all, what are the advantages to being in core?
I believe that ranges aka intervals are widely useful generic types, next afterrelations/tuples/arrays, and they *should* be supported in core, same as arrays are.


I think we all agree that ranges are important. I am not suggesting that
we sacrifice on the semantics to make it an extension; I'm just trying
to see if involving extensions for some of the approximately 5000 lines
would be a good idea.

Generally speaking, the best way to go about this is to define the *generic*data type in the core, and leave most operators to extensions. So, in core, weneed to have the way to select a range value over ANYTYPE either completely as avalue literal or in terms of endpoint values from arbitrary expressions orvariables, store the range value in a database, retrieve it, and access itscomponent attributes (endpoints, open/closed) in user-defined constraint andoperator definitions.

The fundamental value of ranges is the fact that they're a concise way to storeand express an interval over an ordered type, and to either compare suchintervals or test whether individual values or sets of values are in intervals.And people do that a *lot* (such as with dates), so I see having this rangetype, which is generic and orthogonal to other types in the same way as arraysor tables are, in core just makes the most sense, and as previously illustrated,ranges are useful in places one might not always think about.

Ranges are also much more flexible than BETWEEN for what it does, because AFAIKyou can't indicate open or closed with BETWEEN.

You should not need to define separate range types or operators for each orderedtype, same as you should not have to do so for arrays, or where suchfunctionality is defined should be similar; whatever functionality for arraysyou do or don't define in core, do corresponding things for ranges.

Now assuming that a range/interval value is generally defined in terms of a pairof endpoints of some ordered type (that is, a type for which ORDER BY or RANK or{<,>,<=,>=} etc or LIMIT makes sense), it will be essential that this value iscapable of distinguishing open and closed intervals.
Right, it already does that explicitly. I'd appreciate your input on
some of the previous discussion though.


On this note, here's a *big* thing that needs discussion ...

Citing this whole FOREACH talk, we need to recognize that this talk about rangesis actually being overloaded for 2 very distinct concepts, which are probablybest dealt with separately, possibly as distinct types.

This discussion came up in the development of Perl 6 too, and that discussion isprobably worth looking into.

Ranges/intervals in the general sense can *not* be used to enumerate a list ofvalues in a standard type-sensical manner, such as FOREACH requires.Ranges/intervals are about *comparison*, meaning combinations of tests of how 2arbitrary values of an ordered type sort relative to each other, and that's it.This usage works for integers, other numbers, strings, dates, and so on, allin a natural manner.


Value enumeration, such as in a FOREACH, is a *separate* concept.

The comparison and enumeration tasks have distinct sets of operators and areused in distinct contexts. Enumeration requires next/prev-value operators,while ranges/intervals in general do not. Enumeration requires discrete types(or the faking of such) like integers while ranges work for continuous types.

Moreover, in practice, one probably wants enumerations to be more flexible thanjust monotonic increases. With enumerations you'd probably want to start gotop-down or bottom-up, you might want to increase geometrically or by some otherformula rather than incrementally.

I totally agree with sharing syntax and using ranges/intervals to definesequence generators, but a range value should be considered immutable like anumber or string while a sequence generator may mutate.

For syntax, one could use "x..y" to define an interval while "x...y" for asequence generator, or that's what Perl 6 does.

See also http://perlcabal.org/syn/S03.html#Range_and_RangeIter_semantics thattalks about how Perl 6 does ranges.

Also, if Postgres has some concept of type-generic special values -Inf and +Inf(which always sort before or after any other value in the type system), thosecan be used as endpoints to indicate that the interval is unbounded.
I already introduced +/- infinity to range types. They are not generic
outside of ranges, however -- therefore you can't select the upper bound
of an upper-infinite range.


Well, what you have is the least one would want.

Unless you have some other syntax in mind, I suggest lifting the range literalsyntax from Perl 6, where ".." is an infix operator building a range between itsarguments, and a "^" on either side means that side is open, I think; so thereare 4 variants: {..,^..,..^,^..^}.
Oh, interesting syntax. That might make a good operator version of a
constructor. Unfortunately, "." is not valid in an operator name in PG.
Maybe I can use tilde or dash?

Can Pg be changed to support "." in operator names as long as they don't justappear by themselves? What would this break to do so?

Any operation that wants to deal with a range somehow, such as the BETWEENsyntax, could instead use a range/interval; for example, both of:
   foo in 1..10
I don't know if it's reasonable to introduce syntax like "in" here.
Maybe we could just still use "between" and it would recognize that the
RHS is a range?

I believe it is quite reasonable to treat ranges like sets, in an abstractsense, and so using set membership syntax like "in" is valid. Same as oneshould be able to use "in" to test whether a value is in an array. I wouldexpect "in" to be a polymorphic infix operator same as "<" or "=" etc are,aren't they? This shouldn't conflict with testing tuples in relations as theyare different types, same as you can use the same "<" for numbers and strings,can't you?


We could add parenthesis if that helps:

  foo in (1..10)

The LIMIT clause could take a range to specify take and skip count at once.


Interesting idea.

Array slicing can be done using foo[first..last] or such.


I like that, but we already have foo[3:7], so it might be better not to
introduce redundancy. Too bad I can't use ":" as an operator.


On that note, some languages use ":" for defining intervals rather than "..".

Some languages also use round parenthesis or curly braces to define intervals,but I really don't like that and we shouldn't use it.

A random number generator that takes endpoints can take a range argument.


Sounds useful because it would make it more explicit whether the
endpoints are possible results.


Exactly.

An array or relation of these range can represent ranges with holes, and thegeneral results of range union operations.
Right, that's been brought up before as well. In particular, Scott
Bailey has done some thinking/writing on this topic.

I also see these as considerably less important and useful in practice than thecontinuous intervals. Facilities for discontinuous intervals could more easilybe left to extensions than those for continuous ones. I see the continuous asmore fundamental, at least in the same manner as seeing integers as morefundamental than rationals (you can define the latter with the former), thoughone could define things in the opposite manner too.

Regards,
        Jeff Davis


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

Reply via email to