Jeff Davis wrote:
On Sun, 2011-06-05 at 21:51 -0700, Darren Duncan wrote:
Jeff Davis wrote:
I'd like to take another look at Range Types and whether part of it
should be an extension. Some of these issues relate to extensions in
general, not just range types.

First of all, what are the advantages to being in core?
I believe that ranges aka intervals are widely useful generic types, next after relations/tuples/arrays, and they *should* be supported in core, same as arrays are.

I think we all agree that ranges are important. I am not suggesting that
we sacrifice on the semantics to make it an extension; I'm just trying
to see if involving extensions for some of the approximately 5000 lines
would be a good idea.

Generally speaking, the best way to go about this is to define the *generic* data type in the core, and leave most operators to extensions. So, in core, we need to have the way to select a range value over ANYTYPE either completely as a value literal or in terms of endpoint values from arbitrary expressions or variables, store the range value in a database, retrieve it, and access its component attributes (endpoints, open/closed) in user-defined constraint and operator definitions.

The fundamental value of ranges is the fact that they're a concise way to store and express an interval over an ordered type, and to either compare such intervals or test whether individual values or sets of values are in intervals. And people do that a *lot* (such as with dates), so I see having this range type, which is generic and orthogonal to other types in the same way as arrays or tables are, in core just makes the most sense, and as previously illustrated, ranges are useful in places one might not always think about.

Ranges are also much more flexible than BETWEEN for what it does, because AFAIK you can't indicate open or closed with BETWEEN.

You should not need to define separate range types or operators for each ordered type, same as you should not have to do so for arrays, or where such functionality is defined should be similar; whatever functionality for arrays you do or don't define in core, do corresponding things for ranges.

Now assuming that a range/interval value is generally defined in terms of a pair of endpoints of some ordered type (that is, a type for which ORDER BY or RANK or {<,>,<=,>=} etc or LIMIT makes sense), it will be essential that this value is capable of distinguishing open and closed intervals.

Right, it already does that explicitly. I'd appreciate your input on
some of the previous discussion though.

On this note, here's a *big* thing that needs discussion ...

Citing this whole FOREACH talk, we need to recognize that this talk about ranges is actually being overloaded for 2 very distinct concepts, which are probably best dealt with separately, possibly as distinct types.

This discussion came up in the development of Perl 6 too, and that discussion is probably worth looking into.

Ranges/intervals in the general sense can *not* be used to enumerate a list of values in a standard type-sensical manner, such as FOREACH requires. Ranges/intervals are about *comparison*, meaning combinations of tests of how 2 arbitrary values of an ordered type sort relative to each other, and that's it. This usage works for integers, other numbers, strings, dates, and so on, all in a natural manner.

Value enumeration, such as in a FOREACH, is a *separate* concept.

The comparison and enumeration tasks have distinct sets of operators and are used in distinct contexts. Enumeration requires next/prev-value operators, while ranges/intervals in general do not. Enumeration requires discrete types (or the faking of such) like integers while ranges work for continuous types.

Moreover, in practice, one probably wants enumerations to be more flexible than just monotonic increases. With enumerations you'd probably want to start go top-down or bottom-up, you might want to increase geometrically or by some other formula rather than incrementally.

I totally agree with sharing syntax and using ranges/intervals to define sequence generators, but a range value should be considered immutable like a number or string while a sequence generator may mutate.

For syntax, one could use "x..y" to define an interval while "x...y" for a sequence generator, or that's what Perl 6 does.

See also http://perlcabal.org/syn/S03.html#Range_and_RangeIter_semantics that talks about how Perl 6 does ranges.

Also, if Postgres has some concept of type-generic special values -Inf and +Inf (which always sort before or after any other value in the type system), those can be used as endpoints to indicate that the interval is unbounded.

I already introduced +/- infinity to range types. They are not generic
outside of ranges, however -- therefore you can't select the upper bound
of an upper-infinite range.

Well, what you have is the least one would want.

Unless you have some other syntax in mind, I suggest lifting the range literal syntax from Perl 6, where ".." is an infix operator building a range between its arguments, and a "^" on either side means that side is open, I think; so there are 4 variants: {..,^..,..^,^..^}.

Oh, interesting syntax. That might make a good operator version of a
constructor. Unfortunately, "." is not valid in an operator name in PG.
Maybe I can use tilde or dash?

Can Pg be changed to support "." in operator names as long as they don't just appear by themselves? What would this break to do so?

Any operation that wants to deal with a range somehow, such as the BETWEEN syntax, could instead use a range/interval; for example, both of:

   foo in 1..10

I don't know if it's reasonable to introduce syntax like "in" here.
Maybe we could just still use "between" and it would recognize that the
RHS is a range?

I believe it is quite reasonable to treat ranges like sets, in an abstract sense, and so using set membership syntax like "in" is valid. Same as one should be able to use "in" to test whether a value is in an array. I would expect "in" to be a polymorphic infix operator same as "<" or "=" etc are, aren't they? This shouldn't conflict with testing tuples in relations as they are different types, same as you can use the same "<" for numbers and strings, can't you?

We could add parenthesis if that helps:

  foo in (1..10)

The LIMIT clause could take a range to specify take and skip count at once.

Interesting idea.

Array slicing can be done using foo[first..last] or such.

I like that, but we already have foo[3:7], so it might be better not to
introduce redundancy. Too bad I can't use ":" as an operator.

On that note, some languages use ":" for defining intervals rather than "..".

Some languages also use round parenthesis or curly braces to define intervals, but I really don't like that and we shouldn't use it.

A random number generator that takes endpoints can take a range argument.

Sounds useful because it would make it more explicit whether the
endpoints are possible results.

Exactly.

An array or relation of these range can represent ranges with holes, and the general results of range union operations.

Right, that's been brought up before as well. In particular, Scott
Bailey has done some thinking/writing on this topic.

I also see these as considerably less important and useful in practice than the continuous intervals. Facilities for discontinuous intervals could more easily be left to extensions than those for continuous ones. I see the continuous as more fundamental, at least in the same manner as seeing integers as more fundamental than rationals (you can define the latter with the former), though one could define things in the opposite manner too.

Regards,
        Jeff Davis

-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to