Re: [HACKERS] WIP: generalized index constraints

2009-09-21 Thread Peter Eisentraut
On Sun, 2009-09-20 at 10:08 -0700, Jeff Davis wrote:
> On Sun, 2009-09-20 at 13:01 -0400, Tom Lane wrote:
> > The current infrastructure for deferred uniqueness requires that the
> > thing actually be a constraint, with an entry in pg_constraint that
> > can carry the deferrability options.  So unless we want to rethink
> > that, this might be a sufficient reason to override my arguments
> > about not wanting to use CONSTRAINT syntax.
> 
> Ok. Using the word EXCLUSION would hopefully guard us against future
> changes to SQL, but you know more about the subtle dangers of language
> changes than I do.
> 
> So, do I still omit it from information_schema?

I would say yes.

Overall, I think this terminology is pretty good now.  We could say,
PostgreSQL has a new constraint type, exclusion constraint.  It shares
common properties with other constraint types, e.g., deferrability (in
the future), ADD/DROP CONSTRAINT, etc.  But because the standard does
not describe exclusion constraints, they are not listed in the
information schema.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-20 Thread Tom Lane
Jeff Davis  writes:
> So, do I still omit it from information_schema?

My thought is yes --- any representation of it within information_schema
would be so inaccurate/incomplete as to be worse than useless, IMO.
Peter might have a different idea though ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-20 Thread Jeff Davis
On Sun, 2009-09-20 at 13:01 -0400, Tom Lane wrote:
> The current infrastructure for deferred uniqueness requires that the
> thing actually be a constraint, with an entry in pg_constraint that
> can carry the deferrability options.  So unless we want to rethink
> that, this might be a sufficient reason to override my arguments
> about not wanting to use CONSTRAINT syntax.

Ok. Using the word EXCLUSION would hopefully guard us against future
changes to SQL, but you know more about the subtle dangers of language
changes than I do.

So, do I still omit it from information_schema?

> As far as implementation goes, I think there would be very little
> choice but to use the insert-the-index-entry-first, check-later
> approach; so your ideas involving extra state in shared memory
> seem to fall to the ground anyhow.

True.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-20 Thread Tom Lane
BTW, I just thought of an issue that might change some of these
conclusions: what about supporting deferred constraint checking,
as we just recently managed to do for straight UNIQUE constraints?
I don't say that this has to be in the first cut, but it's something
we ought to try to leave room for down the road.

The current infrastructure for deferred uniqueness requires that the
thing actually be a constraint, with an entry in pg_constraint that
can carry the deferrability options.  So unless we want to rethink
that, this might be a sufficient reason to override my arguments
about not wanting to use CONSTRAINT syntax.

As far as implementation goes, I think there would be very little
choice but to use the insert-the-index-entry-first, check-later
approach; so your ideas involving extra state in shared memory
seem to fall to the ground anyhow.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-20 Thread Jeff Davis
On Sun, 2009-09-20 at 12:31 -0400, Tom Lane wrote:
> > T1: inserts into index
> > T2: inserts into index
> > T1: checks index for conflicts, finds T2
> > T2: checks index for conflicts, finds T1
> 
> You get a deadlock failure, because both transactions will wait for each
> other.  So what?  It's an error in any case, and you can get a reported
> deadlock in constraint-enforcement scenarios today (conflicting FK
> changes, for instance).

Well, in theory, one of the transactions may have been destined to be
aborted later anyway, and the deadlock detector might kill the wrong
one. But I agree, perhaps I'm over-thinking this one.

Aside: I just realized that my solution to the deadlock problem won't
work with your simpler idea anyway. When reading the index we've long
since lost the information about the specific insert of the specific
command of the other transaction.

I'll make the change.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-20 Thread Tom Lane
Jeff Davis  writes:
> On Sat, 2009-09-19 at 18:00 -0400, Tom Lane wrote:
>> Well, you can't do it *exactly* the same way btree does, but what
>> I would envision is first insert the index tuple and then do a
>> dirty-snapshot search for conflicting tuples.  The interlock against
>> conflicting concurrent inserts doesn't need all this new infrastructure
>> you propose; just wait to see if conflicting transactions commit, same
>> as we do now.  And I do maintain that that sort of code has a high risk
>> of undetected bugs.

> How do you prevent deadlocks in the following case?

> T1: inserts into index
> T2: inserts into index
> T1: checks index for conflicts, finds T2
> T2: checks index for conflicts, finds T1

You get a deadlock failure, because both transactions will wait for each
other.  So what?  It's an error in any case, and you can get a reported
deadlock in constraint-enforcement scenarios today (conflicting FK
changes, for instance).

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-20 Thread Jeff Davis
On Sat, 2009-09-19 at 23:15 -0400, Robert Haas wrote:
> I was wondering if we couldn't introduce a dummy tuple name similar to
> OLD and NEW, called, say, OTHER.  Then instead of writing a =, you
> could write a = OTHER.a ... or perhaps a = OTHER.b ... although that
> might also open the door to more things than you want to support at
> this point.

Interesting idea. At this point though, there is enough disagreement
over the language that I just want to take the least-invasive route that
has the lowest chance of causing a problem. It looks like ALTER INDEX
might be the path of least resistance.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-19 Thread Robert Haas
On Sat, Sep 19, 2009 at 2:51 PM, Jeff Davis  wrote:
> On Sat, 2009-09-19 at 14:05 -0400, Tom Lane wrote:
>> What about them?  It's not clear why you think this requires anything
>> special.
>
> >From a syntax standpoint, I need to represent one operator for every
> index column involved in the constraint. So, if there's a functional
> index on ((a||b)::circle), I clearly can't have an exclusion constraint
> like (a =, b =).
>
> I see two options:
>
>  1. ( ), where  is an expression over table attributes
>    that must have the exact signature as the expression for the index.
>  2. ( ), and then read the expression from the index

I was wondering if we couldn't introduce a dummy tuple name similar to
OLD and NEW, called, say, OTHER.  Then instead of writing a =, you
could write a = OTHER.a ... or perhaps a = OTHER.b ... although that
might also open the door to more things than you want to support at
this point.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-19 Thread Jeff Davis
On Sat, 2009-09-19 at 18:00 -0400, Tom Lane wrote:
> Well, you can't do it *exactly* the same way btree does, but what
> I would envision is first insert the index tuple and then do a
> dirty-snapshot search for conflicting tuples.  The interlock against
> conflicting concurrent inserts doesn't need all this new infrastructure
> you propose; just wait to see if conflicting transactions commit, same
> as we do now.  And I do maintain that that sort of code has a high risk
> of undetected bugs.

How do you prevent deadlocks in the following case?

T1: inserts into index
T2: inserts into index
T1: checks index for conflicts, finds T2
T2: checks index for conflicts, finds T1

We can't say "only wait if your xid is higher" because xid 200 may both
insert and check the index before xid 100 even inserts.

The way I solve this in my current patch is by assigning a sequence
number in a shared memory table for each insert. The sequence number
works because a higher sequence number will always be able to see a
lower sequence number's tuple, so we can safely say "only wait if you
have a higher sequence number".

I can tack the same solution onto your idea, but I would need to keep my
shared memory table and probably some other infrastructure. It may be
less complex than it is currently, however. Simpler ideas welcome.

And to clarify the syntax issue, I assume this means that:
  ((a||b)::circle &&)

would look for the column in the index that matches that expression, and
then use that attribute number when scanning the index? I'm OK with
that; I don't see a lot of obvious value in having separate expressions
for the constraint and the index (even if it did have value, it would
take some real creativity to find it ;)

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-19 Thread Tom Lane
Jeff Davis  writes:
> On Sat, 2009-09-19 at 16:43 -0400, Tom Lane wrote:
>> I don't understand why this isn't handled exactly the way unique
>> constraints are done now.  Frankly, the amount of added complexity you
>> propose below is enough to make me want to reject the patch forthwith;
>> given that it's going to be a relatively little-used feature, the bugs
>> are never going to be out of it completely if we do it like this.

> Unique constraints lock the index page while the insert is happening.
> How am I supposed to do that, when the conflicting values might be
> anywhere in the index (circles have no total order)?

Well, you can't do it *exactly* the same way btree does, but what
I would envision is first insert the index tuple and then do a
dirty-snapshot search for conflicting tuples.  The interlock against
conflicting concurrent inserts doesn't need all this new infrastructure
you propose; just wait to see if conflicting transactions commit, same
as we do now.  And I do maintain that that sort of code has a high risk
of undetected bugs.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-19 Thread Jeff Davis
On Sat, 2009-09-19 at 16:43 -0400, Tom Lane wrote:
> I don't understand why this isn't handled exactly the way unique
> constraints are done now.  Frankly, the amount of added complexity you
> propose below is enough to make me want to reject the patch forthwith;
> given that it's going to be a relatively little-used feature, the bugs
> are never going to be out of it completely if we do it like this.

Unique constraints lock the index page while the insert is happening.
How am I supposed to do that, when the conflicting values might be
anywhere in the index (circles have no total order)?

It may sound complex, but it basically boils down to a two stage
process:
 1. test for conflicts with concurrently-inserting backends
 2. test for conflicts that already exist in the index (dirty or not)

I don't think that it's ridiculously complex. In fact, I think there are
relatively few scenarios that will make any real difference, and those
scenarios can be tested with gdb pretty thoroughly.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-19 Thread Tom Lane
Jeff Davis  writes:
> The design is that one backend needs to be able to see values being
> inserted by other backends before commit.

I don't understand why this isn't handled exactly the way unique
constraints are done now.  Frankly, the amount of added complexity you
propose below is enough to make me want to reject the patch forthwith;
given that it's going to be a relatively little-used feature, the bugs
are never going to be out of it completely if we do it like this.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-19 Thread Jeff Davis
On Sat, 2009-09-19 at 15:26 -0400, Tom Lane wrote:
> I haven't read the patch, but this whole discussion sounds to me like
> it means you're trying to plug things in at the wrong level.  Indexes
> generally don't care where the values they are storing came from ---
> whether it's a simple column or a expression result, it's all the same
> to the index.  I don't see why that shouldn't be true for exclusion
> constraints too.

The design is that one backend needs to be able to see values being
inserted by other backends before commit. There are two ways I can see
to do this:

(a) have all concurrent inserters serialize doing something like:
  1. acquire exclusive LWLock
  2. search index for conflicts with dirty snapshot and recheck if
 necessary
  3. insert into index
  4. release exclusive LWLock

(b) do what I do now, which is to:
  1. acquire exlusive LWLock
  2. put self in table of concurrent inserters, along with TID of heap 
 tuple I'm inserting
  3. release exclusive LWLock
  4. acquire shared LWLock
  5. copy potential conflicts to local memory
  6. release shared LWLock
  7. test for real conflicts between my heap tuple and the potentially 
 conflicting heap tuple (which can be found by TID).
  8. search index with dirty snapshot for conflicts and recheck if 
 necessary
  9. insert tuple into index
 10. acquire exclusive LWLock
 11. remove self from table of concurrent inserters
 12. release exclusive LWLock

Design (b) offers better concurrency because all conflict testing, index
searching, and index insertion take place without a lock at all. So, I
chose design (b). This has been out there for quite a long time[1][2],
and if it is an unacceptable design I need to know soon in order for
this feature to make it.

However, the consequence of (b) is that ExecInsertIndexTuples needs to
know about the translation from a heap tuple to an index tuple so that
the conflicts can be checked.

> BTW, further betraying that I've not read the patch: what exactly are
> you doing about the information_schema views?  If we are treating these
> things as SQL constraints, one would expect them to show up in
> information_schema; but I don't see how to represent them there in any
> adequate fashion, even without the expression-index angle.

Nothing right now. I think they should just be omitted from
information_schema, which can only (almost by definition) represent the
lowest common denominator features.

> On the whole
> I think we'd be a lot better off to NOT consider them to be constraints,
> but just another option for CREATE INDEX.

You suggested allowing an ALTER TABLE representation[3] and that design
has floated around for quite some time as well.

ALTER TABLE also has a major advantage: multiple constraints can use the
same index. For instance, an index on (a, b, c) can be used to enforce
both (a =, b =) and (a =, c =). You can't do that with btree, and it
could be a powerful feature that might cause some people to choose my
mechanism for a regular UNIQUE constraint over btree's existing
uniqueness mechanism.

So, I actually switched over the ALTER TABLE as my primary syntactic
representation, and dropped the CREATE INDEX variant (I think that would
be worthwhile to bring back as an extra option, but I haven't yet). If I
need to drop ALTER TABLE, I need to know soon.

Regards,
Jeff Davis

[1] http://archives.postgresql.org/pgsql-hackers/2008-06/msg00404.php
[2] http://archives.postgresql.org/pgsql-hackers/2009-07/msg00302.php
[3] http://archives.postgresql.org/pgsql-hackers/2009-07/msg00406.php




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-19 Thread Tom Lane
Jeff Davis  writes:
> On Sat, 2009-09-19 at 14:05 -0400, Tom Lane wrote:
>> What about them?  It's not clear why you think this requires anything
>> special.

>> From a syntax standpoint, I need to represent one operator for every
> index column involved in the constraint. So, if there's a functional
> index on ((a||b)::circle), I clearly can't have an exclusion constraint
> like (a =, b =).

> I see two options:

>  1. ( ), where  is an expression over table attributes 
> that must have the exact signature as the expression for the index.
>  2. ( ), and then read the expression from the index

You need to do (1), I think, because (2) seems to require using the
index column name.  We have generally felt that the names assigned
to index columns are implementation artifacts that the user ought not
rely on in SQL commands.

> and in either case, use that expression for the extra checking that I
> need to do: I need to check whether the input heap tuple conflicts with
> concurrently inserting heap tuples, and I also need to do a recheck
> step.

I haven't read the patch, but this whole discussion sounds to me like
it means you're trying to plug things in at the wrong level.  Indexes
generally don't care where the values they are storing came from ---
whether it's a simple column or a expression result, it's all the same
to the index.  I don't see why that shouldn't be true for exclusion
constraints too.

BTW, further betraying that I've not read the patch: what exactly are
you doing about the information_schema views?  If we are treating these
things as SQL constraints, one would expect them to show up in
information_schema; but I don't see how to represent them there in any
adequate fashion, even without the expression-index angle.  On the whole
I think we'd be a lot better off to NOT consider them to be constraints,
but just another option for CREATE INDEX.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-19 Thread Jeff Davis
On Sat, 2009-09-19 at 14:05 -0400, Tom Lane wrote:
> What about them?  It's not clear why you think this requires anything
> special.

>From a syntax standpoint, I need to represent one operator for every
index column involved in the constraint. So, if there's a functional
index on ((a||b)::circle), I clearly can't have an exclusion constraint
like (a =, b =).

I see two options:

 1. ( ), where  is an expression over table attributes 
that must have the exact signature as the expression for the index.
 2. ( ), and then read the expression from the index

and in either case, use that expression for the extra checking that I
need to do: I need to check whether the input heap tuple conflicts with
concurrently inserting heap tuples, and I also need to do a recheck
step.

#1 seems like extra work and complexity, because I need to test for the
correct signature (maybe that's not difficult), and that extra
flexibility is pretty marginal -- I can't think of an obvious case where
you'd want different expressions. Also, it complicates the simple case
of wanting the expressions to match.

#2 is awkward because the expression columns of an index have generated
names, and you would have to write things like (pg_expression_1 &&).
Also, it makes the constraint too tied to the index, which is a valid
complaint Peter had.

Perhaps you can point me in the right direction to see if two
expressions/functions have matching signatures? Or, if that is too much
of a pain, perhaps I should just test for equal expressions?

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-19 Thread Tom Lane
Jeff Davis  writes:
> There's an important unresolved question with this patch that I need to
> address, which just came to light: what about functional/expression
> indexes?

What about them?  It's not clear why you think this requires anything
special.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-19 Thread Jeff Davis
I think we have a reasonable consensus around the name "operator
exclusion constraints", Robert Haas's suggestion. I am OK with that
name, and it got support from David Fetter and Tom Lane. As David Fetter
said, it's useful for the name to hint at the API.

Peter had some reasonable objections to that name, but the word "unique"
just doesn't cut it for this feature. My feature allows constraints
which are more restrictive than a unique constraint; but the final straw
was after a discussion with Tomás in which we determined that you can
also define constraints which are the opposite of unique: all values
must be the same (by using <> as the operator*).

I agree with Peter that we should support creating these constraints at
table creation time. This can be supported with the following syntax:

  CONSTRAINT foo_constr (a , ...)
{ USING INDEX foo_idx | USING method }

and it's also a more declarative syntax for the ALTER TABLE case, and
prevents a series of other problems that Peter pointed out.

There's an important unresolved question with this patch that I need to
address, which just came to light: what about functional/expression
indexes?

Say you have a table foo(a text, b text) and an index on:
  ((a||b)::circle)

You could define an operator constraint like:
  ((a||b)::circle &&)

and that would be sane. But I suppose I should also allow any expression
with the same signature, like:
  ((b||a)::circle &&)

[ not a very realistic example, but it seems like it may be useful ]

Does that make sense? Does someone have a better idea? Am I missing
other issues here?

How do I test if two functions/expressions:
  a. are identical?
  b. have matching signatures?

Regards,
Jeff Davis

*: Understandably, there is no strategy for <> for most data types.
However, if your constraint is that all values must be the same, it's
quite reasonable to add one and be able to use an index to quickly find
values that are different.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-16 Thread tomas
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Wed, Sep 16, 2009 at 09:45:52AM -0700, Jeff Davis wrote:
> On Wed, 2009-09-16 at 15:11 +0200, to...@tuxteam.de wrote:
> > One question: does the operator have to be reflexive? I.e. "A op A holds
> > for all A"?
> 
> I don't think that reflexivity is a strict requirement. You could make
> this a constraint over a boolean attribute such that false conflicts
> with true and true conflicts with false. That would mean that your table
> would have to consist of either all false or all true.

Let me see whether I've understood this: more in general, op could be
"not equal" -- i.e. <>. Then, once one value was inserted into the
column, all other values would have to be equal.

[...]

> That's an interesting idea: "proximity constraint". I like it because
> (a) "proximity" might reasonably be considered a more general form of
> the word "unique", which might satisfy Peter's argument; (b) it conveys
> the use case; and (c) it sounds good.

Yes, "riding" on this geometric metaphor (distance), I was trying to
visualize relations which are symmetric but not refexive and came up
with things like

  "point X is at least d1 far, at most d2 far from Y"

i.e. Y are all points which are whithin a ring centered on Y, with inner
radius d1 and outer radius d2. Special cases would be d1=0, d2>0 (that's
conventional proximity) -- but there are weirder cases, as your example
above (d2 possibly infinite, d1 small, giving a "punctured space").

> There are a couple bizarre cases where "proximity" doesn't quite fit,
> like my boolean example above, but I'm OK with that.

Hmmm.

Regards
- -- tomás
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFKsT88Bcgs9XrR2kYRAlmcAJ9+rP7AkimXRPoKGaBoJkthX2LzggCfTWst
KF/XMRouhlEbQcORaeoIbc0=
=BBEF
-END PGP SIGNATURE-

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-16 Thread Robert Haas
On Wed, Sep 16, 2009 at 3:14 AM, Peter Eisentraut  wrote:
> On Tue, 2009-09-15 at 12:37 -0400, Robert Haas wrote:
>> Instead of calling these generalized index constraints, I wonder if we
>> oughtn't to be calling them something like "don't-overlap constraints"
>> (that's a bad name, but something along those lines).  They're not
>> really general at all, except compared to uniqueness constraints (and
>> they aren't called generalized unique-index constraints, just
>> generalized index constraints).
>
> What they should be called is generalized unique constraints, without
> reference to "index".  Because what they generalize is the operator by
> which uniqueness is determined.

Well, it should eventually be possible to use this feature to create
an index which excludes overlapping ranges  in fact, unless I
misunderstand, that's the principle likely use case.  Which is not
unique-ness at all.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-16 Thread Jeff Davis
On Tue, 2009-09-15 at 22:52 +1000, Brendan Jurd wrote:
> I'm just getting started reviewing this version now.  I noticed that
> your patch seems to have been generated by git.

Ok, I now have a public git repo on git.postgresql.org, and I rebased my
patch before I pushed it.

See updates in my "generalized-constraints" branch. I have psql support
now, so it might be slightly easier to review.

The language and name of the feature are going through a little turmoil
right now, as you can see, so I'm trying to keep up with that. As that
settles down I'll improve the docs.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-16 Thread Jeff Davis
On Wed, 2009-09-16 at 10:14 +0300, Peter Eisentraut wrote:
> What they should be called is generalized unique constraints, without
> reference to "index".  Because what they generalize is the operator by
> which uniqueness is determined.

How about GUC, for short?  ;-)

Do you think that Tomás's suggestion of "proximity constraints" would
satisfy your requirement on the basis that "proximity" is a more general
form of the word "unique"?

> First, we have so far been fairly consistent to document that unique
> indexes are an implementation detail of unique constraints and should
> usually not be used directly.  This new approach basically reverses that
> and forces you to define your constraints by means of implementation
> details rather than a logical description.  There is nothing in this
> feature that makes it strikingly different from the existing constraint
> types in a way that would prevent a reasonable syntax for defining the
> constraint at table definition time.  Another problem this would lead to
> is that a say dump of a table definition wouldn't actually contain all
> the constraints that apply to the table anymore, because there might be
> additional stuff such as this that can't be expressed that way.

Those are all good points. I think ultimately we need to support this at
table creation time and make index specification optional.

We do need to allow the user to specify the index, however. An important
use case of this feature is defining an index on (A, B, C) and using it
to enforce UNIQUE(A,B) and UNIQUE(A,C).

> CREATE TABLE circles (c circle UNIQUE ON gist &&);

Should we use USING instead of ON to be consistent with CREATE INDEX?

Also, right now a UNIQUE by itself creates a btree. Let's say the user
has btree_gist installed and they declare (a =, b =) without specifying
the AM. Which one do we choose?

I suppose we can come up with a default order, like btree, hash, gist,
gin; and just choose the first one with a matching opclass.

Also, we should decide whether UNIQUE(a,b) should be shorthand for (a =,
b =), or whether they should be treated differently somehow. If so,
we'll need to come up with some kind of rules for how it determines that
UNIQUE chooses to use a btree with indisunique enforcement; and we need
to come up with a way to force it to choose the new enforcement
mechanism in my patch if the user wants to.

> CREATE TABLE data (a int UNIQUE ON btree =);

I think we should provide some way for the user to specify what
enforcement mechanism is used when multiple options are possible. In
this example should it use the existing mechanism or the new mechanism?

> ALTER TABLE circles ADD CONSTRAINT circles_idx_constr (c &&) USING INDEX
> circles_idx;
> 
> doesn't seem very intuitive about what is actually being constrained.
> For a while I was thinking that it was constraining the table to values
> that are in the index or something.  So using a word such as UNIQUE
> would help explain what is going on.

I'm still uncomfortable using the word UNIQUE. I see that you are taking
an approach similar to ORDER BY ... USING. However, I'm concerned
because we're using a special case word to describe the general feature,
and I think that's confusing.

On the other hand, it's hard to argue that (a &&, b =) is intuitive, so
I'll acquiesce if you get some agreement from someone else.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-16 Thread Jeff Davis
On Wed, 2009-09-16 at 15:11 +0200, to...@tuxteam.de wrote:
> One question: does the operator have to be reflexive? I.e. "A op A holds
> for all A"?

I don't think that reflexivity is a strict requirement. You could make
this a constraint over a boolean attribute such that false conflicts
with true and true conflicts with false. That would mean that your table
would have to consist of either all false or all true.

> I am thinking "proximity" or as you state above "similarity". May be
> this is a good metaphor, leading to a good name.

That's an interesting idea: "proximity constraint". I like it because
(a) "proximity" might reasonably be considered a more general form of
the word "unique", which might satisfy Peter's argument; (b) it conveys
the use case; and (c) it sounds good.

There are a couple bizarre cases where "proximity" doesn't quite fit,
like my boolean example above, but I'm OK with that.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-16 Thread Robert Haas
On Tue, Sep 15, 2009 at 7:02 PM, Joshua Tolley  wrote:
> On Tue, Sep 15, 2009 at 05:52:35PM -0400, Tom Lane wrote:
>> Jeff Davis  writes:
>> > On Tue, 2009-09-15 at 14:42 -0700, Jeff Davis wrote:
>> >> operator constraints
>> >> operator exclusion constraints
>> >> operator conflict constraints
>> >> conflict operator constraints
>> >> operator index constraints
>> >> index constraints
>> >> generalized index constraints
>> >> something else?
>>
>> > Just to add a couple more permutations of Robert Haas's suggestions:
>>
>> >  exclusion operator constraints
>> >  exclusive operator constraints
>>
>> To my ear, "operator exclusion constraints" or "exclusive operator
>> constraints" seem reasonable; the other permutations of that phrase
>> simply aren't good English.
>
> I was having a hard time coming up with a name that was adequately
> short-and-sweet, and still conveyed the idea of both "operator" and "index",
> which seems important so as to designate between these and the constraints
> we've had all along. Perhaps "indexed operator constraints"?

Really I suppose they are indexed operator exclusion constraints, but
that may be too wordy.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-16 Thread tomas
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Tue, Sep 15, 2009 at 10:28:28AM -0700, Jeff Davis wrote:
> On Tue, 2009-09-15 at 13:16 -0400, Robert Haas wrote:
> > Uhh so what happens if I create an index constraint using the
> > +(integer, integer) operator?
> 
> You can use any operator that has an index search strategy. Overlaps is
> probably the most useful, but you could imagine other operators, like a
> bi-directional containment operator (either LHS is contained in RHS, or
> vice-versa).
> 
> You can also get creative and have a "similarity" operator that
> determines whether two tuples are "too similar". As long as it is
> symmetric, the feature will work.

One question: does the operator have to be reflexive? I.e. "A op A holds
for all A"?

I am thinking "proximity" or as you state above "similarity". May be
this is a good metaphor, leading to a good name.

Regards
- -- tomás
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFKsOPyBcgs9XrR2kYRAhsUAJkBICYUMK0tDrycPbctiGF7YKI/9gCeLQzq
DPyAAkMgZJFn8BZ7P8119/g=
=lVz1
-END PGP SIGNATURE-

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-16 Thread Peter Eisentraut
On Tue, 2009-09-15 at 12:22 -0700, Jeff Davis wrote:
> I don't like using the word "unique" in the description, I think it
> only
> adds to the confusion.

It would emphasize that a unique constraint is a common special case of
the feature.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-16 Thread Peter Eisentraut
On Tue, 2009-09-15 at 12:37 -0400, Robert Haas wrote:
> Instead of calling these generalized index constraints, I wonder if we
> oughtn't to be calling them something like "don't-overlap constraints"
> (that's a bad name, but something along those lines).  They're not
> really general at all, except compared to uniqueness constraints (and
> they aren't called generalized unique-index constraints, just
> generalized index constraints).

What they should be called is generalized unique constraints, without
reference to "index".  Because what they generalize is the operator by
which uniqueness is determined.

Don't all of these have the same effect at the end?

CREATE TABLE data (a int UNIQUE);

CREATE TABLE data (a int);
CREATE UNIQUE INDEX data_a_idx ON data USING btree (a);

CREATE TABLE data (a int);
CREATE INDEX data_a_idx ON data USING btree (a);
ALTER TABLE data ADD CONSTRAINT a_unique (a =) USING INDEX data_a_idx;

Which brings me to two concerns about the syntax.

First, we have so far been fairly consistent to document that unique
indexes are an implementation detail of unique constraints and should
usually not be used directly.  This new approach basically reverses that
and forces you to define your constraints by means of implementation
details rather than a logical description.  There is nothing in this
feature that makes it strikingly different from the existing constraint
types in a way that would prevent a reasonable syntax for defining the
constraint at table definition time.  Another problem this would lead to
is that a say dump of a table definition wouldn't actually contain all
the constraints that apply to the table anymore, because there might be
additional stuff such as this that can't be expressed that way.

If you look at the example from the documentation,

CREATE TABLE circles(c circle);
CREATE INDEX circles_idx ON circles USING gist (c);
ALTER TABLE circles ADD CONSTRAINT circles_idx_constr (c &&) USING INDEX
circles_idx;

the only variable pieces of information that need to be provided to the
table are the index type and the operator.  So why not just write it
like this

CREATE TABLE circles (c circle UNIQUE ON gist &&);

and let that create the required index.

And then traditional unique constraints would fit into this as well:

CREATE TABLE data (a int UNIQUE ON btree =);

The other problem I see with the syntax is that

ALTER TABLE circles ADD CONSTRAINT circles_idx_constr (c &&) USING INDEX
circles_idx;

doesn't seem very intuitive about what is actually being constrained.
For a while I was thinking that it was constraining the table to values
that are in the index or something.  So using a word such as UNIQUE
would help explain what is going on.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Joshua Tolley
On Tue, Sep 15, 2009 at 05:52:35PM -0400, Tom Lane wrote:
> Jeff Davis  writes:
> > On Tue, 2009-09-15 at 14:42 -0700, Jeff Davis wrote:
> >> operator constraints
> >> operator exclusion constraints
> >> operator conflict constraints
> >> conflict operator constraints
> >> operator index constraints
> >> index constraints
> >> generalized index constraints
> >> something else?
> 
> > Just to add a couple more permutations of Robert Haas's suggestions:
> 
> >  exclusion operator constraints
> >  exclusive operator constraints
> 
> To my ear, "operator exclusion constraints" or "exclusive operator
> constraints" seem reasonable; the other permutations of that phrase
> simply aren't good English.

I was having a hard time coming up with a name that was adequately
short-and-sweet, and still conveyed the idea of both "operator" and "index",
which seems important so as to designate between these and the constraints
we've had all along. Perhaps "indexed operator constraints"?

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com


signature.asc
Description: Digital signature


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Tom Lane
Jeff Davis  writes:
> On Tue, 2009-09-15 at 14:42 -0700, Jeff Davis wrote:
>> operator constraints
>> operator exclusion constraints
>> operator conflict constraints
>> conflict operator constraints
>> operator index constraints
>> index constraints
>> generalized index constraints
>> something else?

> Just to add a couple more permutations of Robert Haas's suggestions:

>  exclusion operator constraints
>  exclusive operator constraints

To my ear, "operator exclusion constraints" or "exclusive operator
constraints" seem reasonable; the other permutations of that phrase
simply aren't good English.

I'm not tremendously happy with any of them though...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Jeff Davis
On Tue, 2009-09-15 at 14:42 -0700, Jeff Davis wrote:
> operator constraints
> operator exclusion constraints
> operator conflict constraints
> conflict operator constraints
> operator index constraints
> index constraints
> generalized index constraints
> something else?

Just to add a couple more permutations of Robert Haas's suggestions:

 exclusion operator constraints
 exclusive operator constraints

I also like those.

I think that using the word "operator" first makes it sound like the
operator is the thing being excluded, and adding "-based" makes it more
clear but it is too verbose.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Jeff Davis
On Tue, 2009-09-15 at 12:49 -0700, David Fetter wrote:
> > I like this much better. Maybe "index operator constraints" or "operator
> > index constraints"?
> 
> The word, "index" goes to implementation details, which may change.

Ok, let's vote on a name then:

operator constraints
operator exclusion constraints
operator conflict constraints
conflict operator constraints
operator index constraints
index constraints
generalized index constraints
something else?

Right now, I like "conflict operator constraints" for the long name
(e.g. feature title, long description in docs), and "operator
constraints" for short (e.g. in the code and some places in the docs).

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread David Fetter
On Tue, Sep 15, 2009 at 12:22:46PM -0700, Jeff Davis wrote:
> On Tue, 2009-09-15 at 12:03 -0700, David Fetter wrote:
> > * "operator-based constraints"
> > A little math-ier, but talks about the API rather than details of
> > the server implementation.
> 
> I like this much better. Maybe "index operator constraints" or "operator
> index constraints"?

The word, "index" goes to implementation details, which may change.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Tom Lane
Jeff Davis  writes:
> On Tue, 2009-09-15 at 14:49 -0400, Tom Lane wrote:
>> Does it behave sanely for operators that are non-commutative, such
>> as '>'?  (I'm not even very sure that I know what "sanely" would be
>> in such a case.)

> If you try it, my current patch won't stop you. Maybe I should detect
> the fact that the commutator of an operator is not the operator itself,
> and throw an ERROR? Probably would be a good idea.

+1.  Otherwise people *will* try it, and then send us bug reports when
it doesn't behave sanely.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Jeff Davis
On Tue, 2009-09-15 at 12:03 -0700, David Fetter wrote:
> Interesting :)  I take it op1..opN (it's opN, not op2, right?) need to
> commute?

Yeah, it's opN.

And they should commute, but my current patch won't stop you. I think I
should stop that though, it's pretty difficult to think of a good
use-case for that and there is all kinds of danger.

> * "generalized-uniqueness constraints"
> the hyphen disambiguates

I don't like using the word "unique" in the description, I think it only
adds to the confusion.

> * "operator-based constraints"
> A little math-ier, but talks about the API rather than details of
> the server implementation.

I like this much better. Maybe "index operator constraints" or "operator
index constraints"?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Jeff Davis
On Tue, 2009-09-15 at 14:49 -0400, Tom Lane wrote:
> Does it behave sanely for operators that are non-commutative, such
> as '>'?  (I'm not even very sure that I know what "sanely" would be
> in such a case.)

One of the requirements is commutativity (I called it "symmetry" in the
docs, for some reason, I will change that).

I haven't explored in too much detail, but using "x >" (or maybe its
"<" ?) would basically mean that you can only insert increasing values
of x. There most likely be some serious problems there, for instance, if
you HOT update an old tuple's "y" attribute, everything is fine; if you
cold update it you would get an error.

Not exactly intuitive, but if you have lots of other requirements about
how things are updated, then I suppose it might be useful to someone.

If you try it, my current patch won't stop you. Maybe I should detect
the fact that the commutator of an operator is not the operator itself,
and throw an ERROR? Probably would be a good idea.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Robert Haas
On Tue, Sep 15, 2009 at 3:03 PM, David Fetter  wrote:
> * "operator-based constraints"
>    A little math-ier, but talks about the API rather than details of
>    the server implementation.

Or operator-exclusion constraints?  Operator-based exclusion constraints?

I'm feeling exclusive.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread David Fetter
On Tue, Sep 15, 2009 at 11:31:48AM -0700, Jeff Davis wrote:
> On Tue, 2009-09-15 at 13:48 -0400, Robert Haas wrote:
> > So it allows us to create constraints of the following form?
> > 
> > For all A in the index, there exists no B in the index such that the
> > given operator (which must be a binary operator returning boolean)
> > holds of A and B.
> 
> Yes. And it's slightly more complicated for multi-column constraints:
> 
> For all tuples A in the index with attributes 1 to N, there exists no
> tuple B such that:
>A1 op1 B1 AND
>A2 op2 B2 AND
>...
>AN op2 BN
> 
> If all operators are "=", and the index implements searching on
> equality, it's semantically equivalent to a unique index.

Interesting :)  I take it op1..opN (it's opN, not op2, right?) need to
commute?

> > These are certainly less common requirements than what you're
> > talking about here, and I don't think it's important to try to
> > support them - at least not at this point - but the word
> > "generalized" doesn't give me a clue that I won't be able to do
> > those things but I will be able to make an index that prevents my
> > users from handing out duplicate IP blocks.
> 
> As far as the name goes, the best I've got so far are "index
> constraints" and "generalized index constraints". I'm happy to change
> the name if you have a reasonable suggestion.

Here's a couple:

* "generalized-uniqueness constraints"
the hyphen disambiguates

* "operator-based constraints"
A little math-ier, but talks about the API rather than details of
the server implementation.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Tom Lane
Jeff Davis  writes:
> On Tue, 2009-09-15 at 13:16 -0400, Robert Haas wrote:
>> Uhh so what happens if I create an index constraint using the
>> +(integer, integer) operator?

> You can use any operator that has an index search strategy. Overlaps is
> probably the most useful, but you could imagine other operators, like a
> bi-directional containment operator (either LHS is contained in RHS, or
> vice-versa).

Does it behave sanely for operators that are non-commutative, such
as '>'?  (I'm not even very sure that I know what "sanely" would be
in such a case.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Jeff Davis
On Tue, 2009-09-15 at 13:48 -0400, Robert Haas wrote:
> So it allows us to create constraints of the following form?
> 
> For all A in the index, there exists no B in the index such that the
> given operator (which must be a binary operator returning boolean)
> holds of A and B.

Yes. And it's slightly more complicated for multi-column constraints:

For all tuples A in the index with attributes 1 to N, there exists no
tuple B such that:
   A1 op1 B1 AND
   A2 op2 B2 AND
   ...
   AN op2 BN

If all operators are "=", and the index implements searching on
equality, it's semantically equivalent to a unique index.

> 
> If that's correct, I think we should definitely at least mention the
> word "overlap" somewhere in the documentation, because that's what
> people are going to want to use it for, and it's hard to conceptualize
> without examples, at least for me.  You may already be doing this, I
> haven't read the patch.

My current example uses "overlaps", but I will expand the documentation
to provide more clarity.

> Also, there are certainly other things you could want to do that can't
> be handled by this approach.  Perhaps you'd like to create a
> constraint that a given value can appear at most twice, or a two
> column index (A, B) such that for any A the smallest value of B is
> less than A.

The first is a good example, and actually I think that could be an
add-on to my patch without much difficulty.

The second can't be enforced with an index in nearly the same way
because deleting a tuple could violate the constraint. Also, it seems
like it would be hard to express that kind of constraint. But I agree
that, in principle, it is an index-enforceable constraint.

> These are certainly less common requirements than what
> you're talking about here, and I don't think it's important to try to
> support them - at least not at this point - but the word "generalized"
> doesn't give me a clue that I won't be able to do those things but I
> will be able to make an index that prevents my users from handing out
> duplicate IP blocks.

As far as the name goes, the best I've got so far are "index
constraints" and "generalized index constraints". I'm happy to change
the name if you have a reasonable suggestion.

I don't think the word "generalized" implies that it can do absolutely
anything possible. For the list discussion, I think it's appropriate to
use the term "generalized", because my patch generalizes index
constraints. However, I agree that we shouldn't use that too much in the
code/docs because someone might think of something more general later.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Robert Haas
On Tue, Sep 15, 2009 at 1:28 PM, Jeff Davis  wrote:
> On Tue, 2009-09-15 at 13:16 -0400, Robert Haas wrote:
>> Uhh so what happens if I create an index constraint using the
>> +(integer, integer) operator?
>
> You can use any operator that has an index search strategy. Overlaps is
> probably the most useful, but you could imagine other operators, like a
> bi-directional containment operator (either LHS is contained in RHS, or
> vice-versa).
>
> You can also get creative and have a "similarity" operator that
> determines whether two tuples are "too similar". As long as it is
> symmetric, the feature will work.

So it allows us to create constraints of the following form?

For all A in the index, there exists no B in the index such that the
given operator (which must be a binary operator returning boolean)
holds of A and B.

If that's correct, I think we should definitely at least mention the
word "overlap" somewhere in the documentation, because that's what
people are going to want to use it for, and it's hard to conceptualize
without examples, at least for me.  You may already be doing this, I
haven't read the patch.

Also, there are certainly other things you could want to do that can't
be handled by this approach.  Perhaps you'd like to create a
constraint that a given value can appear at most twice, or a two
column index (A, B) such that for any A the smallest value of B is
less than A.  These are certainly less common requirements than what
you're talking about here, and I don't think it's important to try to
support them - at least not at this point - but the word "generalized"
doesn't give me a clue that I won't be able to do those things but I
will be able to make an index that prevents my users from handing out
duplicate IP blocks.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Jeff Davis
On Tue, 2009-09-15 at 13:16 -0400, Robert Haas wrote:
> Uhh so what happens if I create an index constraint using the
> +(integer, integer) operator?

You can use any operator that has an index search strategy. Overlaps is
probably the most useful, but you could imagine other operators, like a
bi-directional containment operator (either LHS is contained in RHS, or
vice-versa).

You can also get creative and have a "similarity" operator that
determines whether two tuples are "too similar". As long as it is
symmetric, the feature will work.

Or just use wrap random() in an operator and see what happens ;)

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Brendan Jurd
2009/9/16 Robert Haas :
> On Tue, Sep 15, 2009 at 1:14 PM, Brendan Jurd  wrote:
>> 2009/9/16 Robert Haas :
>>> On Tue, Sep 15, 2009 at 12:54 PM, Jeff Davis  wrote:
 I don't want to call them "don't overlap constraints", because it's not
 limited to a non-overlapping constraint.
>>>
>>> Oh.  What else can you do with it?
>>
>> Anything that there is an operator for.
>
> Uhh so what happens if I create an index constraint using the
> +(integer, integer) operator?

Okay, so my first answer was a simplification.  You can use any
operator that has an appropriate index strategy entry.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Robert Haas
On Tue, Sep 15, 2009 at 1:14 PM, Brendan Jurd  wrote:
> 2009/9/16 Robert Haas :
>> On Tue, Sep 15, 2009 at 12:54 PM, Jeff Davis  wrote:
>>> I don't want to call them "don't overlap constraints", because it's not
>>> limited to a non-overlapping constraint.
>>
>> Oh.  What else can you do with it?
>
> Anything that there is an operator for.

Uhh so what happens if I create an index constraint using the
+(integer, integer) operator?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Brendan Jurd
2009/9/16 Robert Haas :
> On Tue, Sep 15, 2009 at 12:54 PM, Jeff Davis  wrote:
>> I don't want to call them "don't overlap constraints", because it's not
>> limited to a non-overlapping constraint.
>
> Oh.  What else can you do with it?

Anything that there is an operator for.

Cheers,
BJ

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Robert Haas
On Tue, Sep 15, 2009 at 12:54 PM, Jeff Davis  wrote:
> On Tue, 2009-09-15 at 12:37 -0400, Robert Haas wrote:
>> Instead of calling these generalized index constraints, I wonder if we
>> oughtn't to be calling them something like "don't-overlap constraints"
>> (that's a bad name, but something along those lines).  They're not
>> really general at all, except compared to uniqueness constraints (and
>> they aren't called generalized unique-index constraints, just
>> generalized index constraints).
>
> What would you like to be able to enforce using an index that can't be
> solved by this patch? It only works for constraints entirely within a
> single table, can you think of a way to express that better?
>
> In the code/docs, mostly I call them just "index constraints" or some
> variation thereof. But for the lists, I think that might be too vague.
>
> I don't want to call them "don't overlap constraints", because it's not
> limited to a non-overlapping constraint.

Oh.  What else can you do with it?

> I also don't think "generalized
> unique-index constraints" is a good name: it's confusing and it makes it
> sound like it is some new way to use a unique index.

I agree.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Jeff Davis
On Tue, 2009-09-15 at 12:37 -0400, Robert Haas wrote:
> Instead of calling these generalized index constraints, I wonder if we
> oughtn't to be calling them something like "don't-overlap constraints"
> (that's a bad name, but something along those lines).  They're not
> really general at all, except compared to uniqueness constraints (and
> they aren't called generalized unique-index constraints, just
> generalized index constraints).

What would you like to be able to enforce using an index that can't be
solved by this patch? It only works for constraints entirely within a
single table, can you think of a way to express that better?

In the code/docs, mostly I call them just "index constraints" or some
variation thereof. But for the lists, I think that might be too vague.

I don't want to call them "don't overlap constraints", because it's not
limited to a non-overlapping constraint. I also don't think "generalized
unique-index constraints" is a good name: it's confusing and it makes it
sound like it is some new way to use a unique index.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Brendan Jurd
2009/9/16 Robert Haas :
> Instead of calling these generalized index constraints, I wonder if we
> oughtn't to be calling them something like "don't-overlap constraints"
> (that's a bad name, but something along those lines).  They're not
> really general at all, except compared to uniqueness constraints (and
> they aren't called generalized unique-index constraints, just
> generalized index constraints).

Well "generalized index constraints" is what we're calling the patch,
but I don't think they are called by that name anywhere in the
proposed documentation changes.  In the extension to ALTER TABLE
syntax, they are simply referred to as "index_constraint".

Cheers,
BJ

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Robert Haas
On Tue, Sep 15, 2009 at 12:18 PM, Jeff Davis  wrote:
> On Tue, 2009-09-15 at 22:52 +1000, Brendan Jurd wrote:
>> I'm just getting started reviewing this version now.  I noticed that
>> your patch seems to have been generated by git.  Are you hosting this
>> work on a public repo somewhere that I can pull from?
>
> I just requested a public repo. I will publish there as soon as its
> approved.
>
>> Also I think
>> the committers generally prefer context diffs (pipe it through
>> "filterdiff --format=context --strip=1") in submissions.
>
> Thanks, I will do that for my future patch submissions.
>
>> Regarding the documentation updates, I think you might want to add
>> some commentary to Chapter 11: Indexes -- perhaps add a new section
>> after 11.6 Unique Indexes to talk about general index constraints,
>> and/or update the wording of 11.6 to reflect your changes.
>
> Will do.
>
>> My eyes started to cross in the second sentence.  "Detect conflicts
>> symmetrically"?  I have actually *used* this feature successfully in
>> testing the patch, and I still don't know quite what to make of that
>> phrase.  You might need to dumb it down.
>
> Will do.
>
>> It might also be good to be a bit more explicit about the way the
>> choice of operators works.  It is the inverse of the logic used to
>> express an ordinary value constraint.  E.g., when you use the equality
>> operator in an index constraint you are in effect saying that new rows
>> MUST NOT satisfy this operator for any existing rows.
>
> I'll include that, thanks.
>
> I appreciate the quick feedback; I'll make these changes tonight.

Instead of calling these generalized index constraints, I wonder if we
oughtn't to be calling them something like "don't-overlap constraints"
(that's a bad name, but something along those lines).  They're not
really general at all, except compared to uniqueness constraints (and
they aren't called generalized unique-index constraints, just
generalized index constraints).

I didn't realize understand what this was all for until I read Brendan's review.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Jeff Davis
On Tue, 2009-09-15 at 08:08 -0600, Joshua Tolley wrote:
> Perhaps the tuple that caused the violation as well, like UNIQUE index
> violations already do? Even if we know what constraint has been tripped, we
> might not know what value did it.

Or, even better, include both tuples. With these new constraints the
conflicting tuples aren't necessarily equal.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Jeff Davis
On Tue, 2009-09-15 at 23:21 +1000, Brendan Jurd wrote:
> How about also including the name of the constraint (or index) that
> was violated?  I could imagine this error message being frustrating
> for someone who had a table with multiple index constraints, as they
> wouldn't know which one had raised the conflict.

Yes, that makes sense. As Joshua Tolley mentions, I'll also include the
tuples that caused the conflict.

> Also, the DETAIL part should be written as a full sentence with
> leading capital and full stop [1], see

Oh, I haven't seen that document before. Thanks.

> postgres=# alter table circles add constraint circles_overlap (c <->)
> using index circle_idx;
> ERROR:  no strategy found for operator 1520 in operator family 2595
> 
> The error message is pretty unfriendly, but I'm ambivalent about
> whether it's worth doing anything about this particular case.

I think I could make that error a little better by providing a detail
message explaining what the operator should be able to do.

> One of the comments I made in my original review [2] was that "\d" in
> psql should show the constraint.  I don't think you've addressed this
> in the current version.

I have psql on my list along with pg_dump, but unfortunately I haven't
done either yet. I don't think it will take too much work, so I'll fix
this as soon as I can.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Jeff Davis
On Tue, 2009-09-15 at 22:52 +1000, Brendan Jurd wrote:
> I'm just getting started reviewing this version now.  I noticed that
> your patch seems to have been generated by git.  Are you hosting this
> work on a public repo somewhere that I can pull from?

I just requested a public repo. I will publish there as soon as its
approved.

> Also I think
> the committers generally prefer context diffs (pipe it through
> "filterdiff --format=context --strip=1") in submissions.

Thanks, I will do that for my future patch submissions.

> Regarding the documentation updates, I think you might want to add
> some commentary to Chapter 11: Indexes -- perhaps add a new section
> after 11.6 Unique Indexes to talk about general index constraints,
> and/or update the wording of 11.6 to reflect your changes.

Will do.

> My eyes started to cross in the second sentence.  "Detect conflicts
> symmetrically"?  I have actually *used* this feature successfully in
> testing the patch, and I still don't know quite what to make of that
> phrase.  You might need to dumb it down.

Will do.

> It might also be good to be a bit more explicit about the way the
> choice of operators works.  It is the inverse of the logic used to
> express an ordinary value constraint.  E.g., when you use the equality
> operator in an index constraint you are in effect saying that new rows
> MUST NOT satisfy this operator for any existing rows.

I'll include that, thanks.

I appreciate the quick feedback; I'll make these changes tonight.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Joshua Tolley
On Tue, Sep 15, 2009 at 11:21:14PM +1000, Brendan Jurd wrote:
> 2009/9/15 Jeff Davis :
> > Attached is the latest version.
> >
> 
> The new error message for a conflict is:
> 
> ERROR:  index constraint violation detected
> DETAIL:  tuple conflicts with existing data
> 
> How about also including the name of the constraint (or index) that
> was violated?  I could imagine this error message being frustrating
> for someone who had a table with multiple index constraints, as they
> wouldn't know which one had raised the conflict.

Perhaps the tuple that caused the violation as well, like UNIQUE index
violations already do? Even if we know what constraint has been tripped, we
might not know what value did it.

j...@josh# create table a (a integer);
j...@josh*# create unique index a_unique on a (a);
j...@josh*# insert into a values (1), (2), (3);
j...@josh*# insert into a values (8), (3), (4);
ERROR:  duplicate key value violates unique constraint "a_unique"
DETAIL:  Key (a)=(3) already exists.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com


signature.asc
Description: Digital signature


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Brendan Jurd
2009/9/15 Jeff Davis :
> Attached is the latest version.
>

The new error message for a conflict is:

ERROR:  index constraint violation detected
DETAIL:  tuple conflicts with existing data

How about also including the name of the constraint (or index) that
was violated?  I could imagine this error message being frustrating
for someone who had a table with multiple index constraints, as they
wouldn't know which one had raised the conflict.

Also, the DETAIL part should be written as a full sentence with
leading capital and full stop [1], see

I deliberately tried to create an index constraint using a bogus
operator, to see what would happen:

postgres=# alter table circles add constraint circles_overlap (c <->)
using index circle_idx;
ERROR:  no strategy found for operator 1520 in operator family 2595

The error message is pretty unfriendly, but I'm ambivalent about
whether it's worth doing anything about this particular case.

One of the comments I made in my original review [2] was that "\d" in
psql should show the constraint.  I don't think you've addressed this
in the current version.

Cheers,
BJ

[1] http://www.postgresql.org/docs/current/static/error-style-guide.html
[2] 
http://archives.postgresql.org/message-id/37ed240d090715w7ccfc13i8ce8d11a0c51...@mail.gmail.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Brendan Jurd
2009/9/15 Jeff Davis :
> Attached is the latest version.

Hi Jeff,

I'm just getting started reviewing this version now.  I noticed that
your patch seems to have been generated by git.  Are you hosting this
work on a public repo somewhere that I can pull from?  Also I think
the committers generally prefer context diffs (pipe it through
"filterdiff --format=context --strip=1") in submissions.

Regarding the documentation updates, I think you might want to add
some commentary to Chapter 11: Indexes -- perhaps add a new section
after 11.6 Unique Indexes to talk about general index constraints,
and/or update the wording of 11.6 to reflect your changes.

The paragraph explaining index_constraints in ALTER TABLE may be a
little bit confusing.


ADD index_constraint

This form adds a new index constraint to the table which is
enforced by the given index. The operator provided must be usable as a
search strategy for the index, and it also must detect conflicts
symmetrically. The semantics are similar to a unique index but it
opens up new possibilities. A unique index can only detect conflicts
when the two values are equal, and that behavior is equivalent to an
index constraint where all operators are the equality operator.

However, an index constraint allows you to use other operators as
well, such as the overlaps operator provided for the circle data type.
The index constraint will ensure that no two circles overlap. See
example below.


My eyes started to cross in the second sentence.  "Detect conflicts
symmetrically"?  I have actually *used* this feature successfully in
testing the patch, and I still don't know quite what to make of that
phrase.  You might need to dumb it down.

It might also be good to be a bit more explicit about the way the
choice of operators works.  It is the inverse of the logic used to
express an ordinary value constraint.  E.g., when you use the equality
operator in an index constraint you are in effect saying that new rows
MUST NOT satisfy this operator for any existing rows.

I'll continue reviewing and post more comments on the code itself shortly.

Cheers,
BJ

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-15 Thread Jeff Davis
On Sun, 2009-09-13 at 19:08 +1000, Brendan Jurd wrote:
> Any update on this patch?

Attached is the latest version.

Changes:

 * Merged with HEAD
 * Changed from storing the information in pg_index to pg_constraint. 
   This required rewriting a large portion of the patch, so it's not a 
   clean diff from the previous patch.
 * Implemented the language using ALTER TABLE to add the constraint, as 
   discussed (with a slight change to avoid the extra "INDEX" keyword).
 * Added docs
 * Added tests
 * All relations with relindconstraints == 0 do not pass through the 
   index constraints enforcement code at all. I did this a little 
   differently than what I laid out in the design, but the idea is the 
   same and it should avoid any problems.

That's the good news. The bad news:

 * No pg_dump support yet (shouldn't be too hard)
 * Creating a new constraint does not check the existing data for 
   conflicts. 
 * Doesn't work like the new deferrable unique constraints yet (also 
   shouldn't be too hard).
 * I didn't make any changes to the behavior of LIKE (also not hard).
 * Can't be specified at CREATE INDEX time. I don't think this is a 
   showstopper, and it will take some significant effort. The advantage 
   of allowing this is that the constraints can be checked more quickly 
   (in theory) while building the index, and it also might be handy 
   shorthand. However, it suffers from a number of problems:
 1. Extra syntax that is almost entirely redundant.
 2. More work.
 3. The performance gains will probably be pretty marginal. We have 
to do N index scans anyway; the savings would only be due to 
the better caching impact and the fact that the index in the 
process of being built is smaller than an already-built index.
   So, right now I'm not in a hurry to fix this last point.

I realize that some of the things missing make the patch uncomittable in
its current form. However, I would still appreciate a review of what I
have ready.

Regards,
Jeff Davis


generalized-index-constraints-20090915.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-14 Thread Jeff Davis
On Sun, 2009-09-13 at 19:08 +1000, Brendan Jurd wrote:
> The September CF starts in a couple of days, so this patch is in
> danger of missing the boat.

Thanks for keeping track. I accomplished a significant amount today, so
there's still hope for 9/15.

I will most likely just focus on the core functionality so that I have
something complete and reviewable.

> The unresolved points seem to be:
> 
>  * What to do about INCLUDING INDEXES EXCLUDING CONSTRAINTS --
> Postgres gets this wrong for unique indexes currently.  Should we
> persist with the existing behaviour or fix it as part of this patch?
> My personal feeling was +1 for fixing it in this patch.

I don't think that it should make a difference whether "EXCLUDING
CONSTRAINTS" is specified or omitted. There is no "[INCLUDING|EXCLUDING]
CONSTRAINTS" option in the standard, but for the other LIKE options,
EXCLUDING is implied when INCLUDING is not specified.

So, I think we have to make a decision:
 1. If INCLUDING CONSTRAINTS is specified, but not INCLUDING INDEXES,
do we: copy the indexes silently; or emit a nice message; or throw 
an ERROR?
 2. What if INCLUDING INDEXES is specified, but not INCLUDING 
CONSTRAINTS?

>  * Should we emit some sort of message when the user specifies
> INCLUDING INDEXES or INCLUDING CONSTRAINTS but not both?  I didn't
> have strong feelings about this one but there was some differing
> thoughts about what log level to use.  I thought NOTICE but Alvaro
> reckons we've got too many of those already.  Tom mentioned the
> suggested (but unimplemented) NOVICE level, which seems like a good
> move but doesn't resolve the problem of what to do in this patch.  One
> option would be to add a message at the NOTICE level with a TODO to
> downgrade it to NOVICE if/when that becomes available.

I don't think either of these things are a huge amount of work; they are
mostly just decisions that need to be made. I'll start off implementing
whatever is easiest/cleanest, and we'll continue the discussion.

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-09-13 Thread Brendan Jurd
2009/8/21 Brendan Jurd :
> 2009/8/21 Jeff Davis :
>> On Fri, 2009-08-21 at 12:23 +1000, Brendan Jurd wrote:
>>> The current behaviour seems to be predicated on the unique constraint
>>> being an integral part of the index itself.  While this might be true
>>> from a system catalog point of view (pg_index.indisunique), if a user
>>> says that they want to copy a table's structure INCLUDING INDEXES
>>> EXCLUDING CONSTRAINTS then IMO they've made their intention perfectly
>>> clear.  They'd expect it to create an index sans the unique
>>> constraint.  Ignoring the user's intention and copying the index as-is
>>> (including the unique constraint) would be unfriendly.
>>
>> I don't have strong feelings either way. I think that's probably a
>> separate patch, and a fairly small patch.
>>
>
> Yeah, as I was writing the above I was thinking that it might end up a
> separate patch.  However I was also concerned that it might be less
> disruptive if we implement your patch with the less-astonishing
> behaviour and fix the unique index case in passing, than to commit
> your patch with the bad behavior and then fix both.
>
> Up to you.
>

Hi Jeff,

Any update on this patch?  The discussion appeared to trail off around
21 Aug with some inconclusive thoughts about handling the corner cases
in CREATE TABLE LIKE.

The September CF starts in a couple of days, so this patch is in
danger of missing the boat.

The unresolved points seem to be:

 * What to do about INCLUDING INDEXES EXCLUDING CONSTRAINTS --
Postgres gets this wrong for unique indexes currently.  Should we
persist with the existing behaviour or fix it as part of this patch?
My personal feeling was +1 for fixing it in this patch.

 * Should we emit some sort of message when the user specifies
INCLUDING INDEXES or INCLUDING CONSTRAINTS but not both?  I didn't
have strong feelings about this one but there was some differing
thoughts about what log level to use.  I thought NOTICE but Alvaro
reckons we've got too many of those already.  Tom mentioned the
suggested (but unimplemented) NOVICE level, which seems like a good
move but doesn't resolve the problem of what to do in this patch.  One
option would be to add a message at the NOTICE level with a TODO to
downgrade it to NOVICE if/when that becomes available.

Cheers,
BJ

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-08-22 Thread David Fetter
On Fri, Aug 21, 2009 at 12:23:15PM +1000, Brendan Jurd wrote:
> 2009/8/21 Jeff Davis :
> > If they include indexes and not constraints, I think we should
> > follow the same policy as unique constraints, and create the index
> > and the constraint.
> >
> > The behavior seems a little strange to me, but that's the current
> > behavior for unique indexes.
> 
> This may be an opportunity to fix it.
> 
> The current behaviour seems to be predicated on the unique
> constraint being an integral part of the index itself.  While this
> might be true from a system catalog point of view
> (pg_index.indisunique), if a user says that they want to copy a
> table's structure INCLUDING INDEXES EXCLUDING CONSTRAINTS then IMO
> they've made their intention perfectly clear.  They'd expect it to
> create an index sans the unique constraint.  Ignoring the user's
> intention and copying the index as-is (including the unique
> constraint) would be unfriendly.
> 
> Unless the SQL spec demands that we do so?

SQL:2008, like its predecessors, does not mention indexing at all.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-08-21 Thread Tom Lane
Alvaro Herrera  writes:
> NOTICEs is what we do with index creation on primary key, unique
> indexes, and sequences on serial columns, and I think they are seen as
> just noise by everyone except novices.  Do we want to add more?

> Maybe they should be INFO, so that they are shown to the client but not
> sent to the server log.

There was some discussion awhile back of creating a new NOVICE message
level.  INFO strikes me as a completely bad idea here, because it would
actually make it harder for non-novices to suppress the messages.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-08-21 Thread Alvaro Herrera
Brendan Jurd escribió:

> I would be fine with a NOTICE in the former case, so something like
> this would be cool
> 
> # CREATE TABLE foo (LIKE bar INCLUDING INDEXES);
> NOTICE: INCLUDING INDEXES will also include any constraints on those indexes.
> HINT: Specify EXCLUDING CONSTRAINTS to omit them.
> 
> To my mind the severity is similar to such notices as "NOTICE: CREATE
> TABLE / UNIQUE will create implicit index ...".
> 
> i.e., "this is probably what you wanted us to do, but just in case you
> weren't expecting this side-effect, we're letting you know about it".

NOTICEs is what we do with index creation on primary key, unique
indexes, and sequences on serial columns, and I think they are seen as
just noise by everyone except novices.  Do we want to add more?

Maybe they should be INFO, so that they are shown to the client but not
sent to the server log.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-08-21 Thread Dimitri Fontaine

Hi,

Le 21 août 09 à 06:04, Jeff Davis a écrit :

There is not much of a problem with backwards compatibility. LIKE is
shorthand (not stored in catalogs), so it doesn't affect
pg_dump/restore. And hopefully there aren't a lot of apps out there
creating tables dynamically using the LIKE syntax.


I for one use this a lot, every time I'm doing partitioning. What I do  
is a plpgsql function creating partitions for a given period  
(create_parts(date, date) and default interval with  
create_parts(date)), and the function will EXECUTE something like this:


  CREATE TABLE schema.partition_MM (
LIKE schema.parent INCLUDING DEFAULTS INCLUDING INDEXES INCLUDING  
CONSTRAINTS,

CHECK ( partition check expression )
  )
  INHERITS( schema.parent );

The reason to do this is that inherits won't care at all about the  
indexes, defaults and constraints. The drawback to doing it this way  
is the cheer number of NOTICEs you get back at inherits time when PG  
is so verbose about finding that child already has all the parents  
columns. From 8.3 onwards it's possible to trick the system though:


  CREATE FUNCTION ... ()
   RETURNS ...
   LANGUAGE plpgsql
   SET client_min_messages TO warning
  AS $$
  $$;

Regards,
--
dim


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-08-20 Thread Brendan Jurd
2009/8/21 Jeff Davis :
> On Fri, 2009-08-21 at 12:23 +1000, Brendan Jurd wrote:
>> The current behaviour seems to be predicated on the unique constraint
>> being an integral part of the index itself.  While this might be true
>> from a system catalog point of view (pg_index.indisunique), if a user
>> says that they want to copy a table's structure INCLUDING INDEXES
>> EXCLUDING CONSTRAINTS then IMO they've made their intention perfectly
>> clear.  They'd expect it to create an index sans the unique
>> constraint.  Ignoring the user's intention and copying the index as-is
>> (including the unique constraint) would be unfriendly.
>
> I don't have strong feelings either way. I think that's probably a
> separate patch, and a fairly small patch.
>

Yeah, as I was writing the above I was thinking that it might end up a
separate patch.  However I was also concerned that it might be less
disruptive if we implement your patch with the less-astonishing
behaviour and fix the unique index case in passing, than to commit
your patch with the bad behavior and then fix both.

Up to you.

> Using the principle of least surprise, if a user specified one of
> INDEXES or CONSTRAINTS, but not both, and there is a unique index, we
> should raise an ERROR (or at least a WARNING).

Actually for what it's worth I would expect INCLUDING INDEXES (with no
mention of what to do about constraints) to go ahead and include
constraints on indexes.  I only have strong feelings where the user
has gone to the trouble of explicitly stating that they want indexes
but *not* constraints and we include the constraints anyway.

I would be fine with a NOTICE in the former case, so something like
this would be cool

# CREATE TABLE foo (LIKE bar INCLUDING INDEXES);
NOTICE: INCLUDING INDEXES will also include any constraints on those indexes.
HINT: Specify EXCLUDING CONSTRAINTS to omit them.

To my mind the severity is similar to such notices as "NOTICE: CREATE
TABLE / UNIQUE will create implicit index ...".

i.e., "this is probably what you wanted us to do, but just in case you
weren't expecting this side-effect, we're letting you know about it".

>
> There is not much of a problem with backwards compatibility. LIKE is
> shorthand (not stored in catalogs), so it doesn't affect
> pg_dump/restore. And hopefully there aren't a lot of apps out there
> creating tables dynamically using the LIKE syntax.
>

Well, it wouldn't surprise me if people are using LIKE to produce
tables for partitioning arrangements.

Cheers,
BJ

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-08-20 Thread Jeff Davis
On Fri, 2009-08-21 at 12:23 +1000, Brendan Jurd wrote:
> This may be an opportunity to fix it.
> 
> The current behaviour seems to be predicated on the unique constraint
> being an integral part of the index itself.  While this might be true
> from a system catalog point of view (pg_index.indisunique), if a user
> says that they want to copy a table's structure INCLUDING INDEXES
> EXCLUDING CONSTRAINTS then IMO they've made their intention perfectly
> clear.  They'd expect it to create an index sans the unique
> constraint.  Ignoring the user's intention and copying the index as-is
> (including the unique constraint) would be unfriendly.

I don't have strong feelings either way. I think that's probably a
separate patch, and a fairly small patch.

Using the principle of least surprise, if a user specified one of
INDEXES or CONSTRAINTS, but not both, and there is a unique index, we
should raise an ERROR (or at least a WARNING).

There is not much of a problem with backwards compatibility. LIKE is
shorthand (not stored in catalogs), so it doesn't affect
pg_dump/restore. And hopefully there aren't a lot of apps out there
creating tables dynamically using the LIKE syntax.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-08-20 Thread Greg Stark
On Fri, Aug 21, 2009 at 3:23 AM, Brendan Jurd wrote:
> They'd expect it to create an index sans the unique
> constraint.  Ignoring the user's intention and copying the index as-is
> (including the unique constraint) would be unfriendly.
>
> Unless the SQL spec demands that we do so?

There are no indexes in the SQL spec.

-- 
greg
http://mit.edu/~gsstark/resume.pdf

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-08-20 Thread Brendan Jurd
2009/8/21 Jeff Davis :
> If they include indexes and not constraints, I think we should follow
> the same policy as unique constraints, and create the index and the
> constraint.
>
> The behavior seems a little strange to me, but that's the current
> behavior for unique indexes.

This may be an opportunity to fix it.

The current behaviour seems to be predicated on the unique constraint
being an integral part of the index itself.  While this might be true
from a system catalog point of view (pg_index.indisunique), if a user
says that they want to copy a table's structure INCLUDING INDEXES
EXCLUDING CONSTRAINTS then IMO they've made their intention perfectly
clear.  They'd expect it to create an index sans the unique
constraint.  Ignoring the user's intention and copying the index as-is
(including the unique constraint) would be unfriendly.

Unless the SQL spec demands that we do so?

Cheers,
BJ

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-08-20 Thread Jeff Davis
On Fri, 2009-08-21 at 11:14 +1000, Brendan Jurd wrote:
> As an aside, Jeff, have you considered how this feature would interact
> with CREATE TABLE ... LIKE parent_table [ { INCLUDING | EXCLUDING } {
> DEFAULTS | CONSTRAINTS | INDEXES } ] ... }?  What if someone asks to
> include indexes but not constraints?  Vice-versa?  Will these cases be
> handled gracefully?

I hadn't considered that yet, thanks for bringing it to my attention.

>From the docs on CREATE TABLE (... LIKE ...):

"Not-null constraints are always copied to the new table. CHECK
constraints will only be copied if INCLUDING CONSTRAINTS is specified;
other types of constraints will never be copied."

If they include constraints and not indexes, nothing special.

If they include indexes and not constraints, I think we should follow
the same policy as unique constraints, and create the index and the
constraint.

The behavior seems a little strange to me, but that's the current
behavior for unique indexes.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-08-20 Thread Brendan Jurd
2009/8/21 Heikki Linnakangas :
> Jeff Davis wrote:
>> I'm leaning toward not allowing it at CREATE TABLE time.
>
> Seems reasonable to me too.
>

+1

There are plenty of other things to do with tables that you can't mix
directly into a CREATE TABLE statement (grant permissions, create
triggers, change owner, to name a few) so this would not be a surprise
-- or a hardship -- for users IMO.

As an aside, Jeff, have you considered how this feature would interact
with CREATE TABLE ... LIKE parent_table [ { INCLUDING | EXCLUDING } {
DEFAULTS | CONSTRAINTS | INDEXES } ] ... }?  What if someone asks to
include indexes but not constraints?  Vice-versa?  Will these cases be
handled gracefully?

Cheers,
BJ

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-08-20 Thread Heikki Linnakangas
Jeff Davis wrote:
> On Thu, 2009-08-20 at 11:47 +0300, Heikki Linnakangas wrote:
>> That sounds like the constraint is based on an existing index, but there
>> can't be any existing indexes on a table that hasn't been created yet.
>> If this creates the index, then the syntax needs to support specifying
>> index access method and an opclass for all the columns.
> 
> Of course, thanks for pointing that out. To make it work at CREATE TABLE
> time, the language would have to specify the index access method, and
> the index name should be optional. Do you think it's worthwhile adjust
> the syntax for that, or would it just bloat the CREATE TABLE syntax for
> no reason?
> 
> I'm leaning toward not allowing it at CREATE TABLE time.

Seems reasonable to me too.

> However, I'm not sure if it's very easy to provide support for
> concurrent index building. Should I block it, or is it worth
> investigating further?

Dunno. It sure would be nice, but it's not a showstopper.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-08-20 Thread Jeff Davis
On Thu, 2009-08-20 at 11:47 +0300, Heikki Linnakangas wrote:
> That sounds like the constraint is based on an existing index, but there
> can't be any existing indexes on a table that hasn't been created yet.
> If this creates the index, then the syntax needs to support specifying
> index access method and an opclass for all the columns.

Of course, thanks for pointing that out. To make it work at CREATE TABLE
time, the language would have to specify the index access method, and
the index name should be optional. Do you think it's worthwhile adjust
the syntax for that, or would it just bloat the CREATE TABLE syntax for
no reason?

I'm leaning toward not allowing it at CREATE TABLE time.

> It would be nice to have syntax to create the index and constraint in
> one command, so that the constraint can be checked while the index is
> being created. Otherwise you need an extra heap scan to check it.

I can leave the old syntax in:

CREATE INDEX  ON  USING 
  (a CONSTRAINT , b CONSTRAINT ) ...;

and allow both.

However, I'm not sure if it's very easy to provide support for
concurrent index building. Should I block it, or is it worth
investigating further?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-08-20 Thread Heikki Linnakangas
Jeff Davis wrote:
> I'm going to try to get this patch ready for the 9-15 commitfest. Here
> are a few design updates:
> 
> (1) Language:
> 
> I think that the new language should be a table constraint, and I think
> there's a consensus on that. The specific language I have in mind is:
> 
>   CREATE TABLE (
> ...,
> INDEX CONSTRAINT ( , ...) USING INDEX 
>   );

That sounds like the constraint is based on an existing index, but there
can't be any existing indexes on a table that hasn't been created yet.
If this creates the index, then the syntax needs to support specifying
index access method and an opclass for all the columns.

>   ALTER TABLE ADD INDEX CONSTRAINT ( , ...)
> USING INDEX ;
> 
> Where  is the constraint operator. For example, if all s are
> "=" (or whatever the operator for BTEqualStragey is for that type), that
> would be semantically identical to a UNIQUE constraint.

This makes more sense.

It would be nice to have syntax to create the index and constraint in
one command, so that the constraint can be checked while the index is
being created. Otherwise you need an extra heap scan to check it.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-08-16 Thread Jeff Davis

I'm going to try to get this patch ready for the 9-15 commitfest. Here
are a few design updates:

(1) Language:

I think that the new language should be a table constraint, and I think
there's a consensus on that. The specific language I have in mind is:

  CREATE TABLE (
...,
INDEX CONSTRAINT ( , ...) USING INDEX 
  );

  ALTER TABLE ADD INDEX CONSTRAINT ( , ...)
USING INDEX ;

Where  is the constraint operator. For example, if all s are
"=" (or whatever the operator for BTEqualStragey is for that type), that
would be semantically identical to a UNIQUE constraint.

(2) Enforce while creating constraint:

To enforce the constraint while adding it to a table with existing data,
I'd treat it somewhat similar to a CHECK constraint: exclusive lock on
the table, and check every tuple. 

Note that this is going to be O( n * log n ), assuming a search time of
O( log n ) on the index. A CHECK constraint is obviously just O(n).

(3) Catalog change:

I'll need to add a column pg_constraint.constrategies of type
int2vector. This will hold the strategy numbers of the constraint
operators. For instance, if there's a 2-column index constraint on a
BTree index, and both constraint operators are "=", it will be a vector
containing [BTEqualStrategy, BTEqualStrategy].

I will add a column pg_class.relindconstraints which will be similar to
relchecks (see below).

Also, introduce a new "contype 'i'" meaning that it's an index
constraint.

The patch relies on the existence of pg_constraint.conindid, which is
already committed for 8.5.

(4) Enforce on insert procedure:

This is mostly the same as the previous patch. However, I think I need
to avoid catalog lookups in the index insert path, which is a problem
Tom pointed out in the deferrable unique constraints discussion.

My plan is to make it a part of ResultRelInfo and initialize it in a way
similar to a CHECK constraint (in ExecRelCheck() if it's not already
initialized).

I would use relindconstraints to prevent going into that path during the
common case where there is no index constraint (in particular, when
bootstrapping).

Comments, suggestions, ideas?

Regards,
Jeff Davis






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-16 Thread Jeff Davis
On Fri, 2009-07-17 at 09:51 +1000, Brendan Jurd wrote:
> I like that idea ... although how would this interact (if at all) with
> the existing pg_index.isunique flag?  Would it become deprecated in
> favour of using indconstrats, or would you actually look at switching
> isunique to TRUE if somebody applies a constraint which is made up
> entirely of equality ops?

If this ALTER TABLE ADD UNIQUE ... USING syntax is really shorthand for
my special index constraints, it would probably have to use the general
mechanism. Otherwise there would be no way to use the general mechanism
over a btree, which I think should be possible (if nothing else it would
be good to allow apples-to-apples performance testing of my patch).

But I guess it doesn't have to be directly shorthand, ALTER TABLE ADD
UNIQUE ... USING could choose to turn an existing index unique when
possible (e.g. btree), otherwise use the general mechanism.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-16 Thread Brendan Jurd
2009/7/17 Jeff Davis :
> Another idea that I thought about is that:
>
>   ALTER TABLE foo ADD UNIQUE (a, b) USING foo_idx;
>
> could be a shorthand for:
>
>   ALTER TABLE foo ADD INDEX CONSTRAINT (a =, b =) USING foo_idx;
>
> The benefit is that it could go over GiST indexes or hash indexes, not
> just btrees. The syntax could also be useful to turn an existing btree
> into a unique btree.

I like that idea ... although how would this interact (if at all) with
the existing pg_index.isunique flag?  Would it become deprecated in
favour of using indconstrats, or would you actually look at switching
isunique to TRUE if somebody applies a constraint which is made up
entirely of equality ops?

Cheers,
BJ

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-16 Thread Jeff Davis
On Thu, 2009-07-16 at 15:22 +1000, Brendan Jurd wrote:
> I had a play around with the feature in psql.  I think the syntax is
> okay, but using "ALTER TABLE ... ADD" as you mentioned upthread could
> be a better option.

Ok, I think we're pretty much settled on that option then.

Another idea that I thought about is that:

   ALTER TABLE foo ADD UNIQUE (a, b) USING foo_idx;

could be a shorthand for:

   ALTER TABLE foo ADD INDEX CONSTRAINT (a =, b =) USING foo_idx;

The benefit is that it could go over GiST indexes or hash indexes, not
just btrees. The syntax could also be useful to turn an existing btree
into a unique btree.

> I noticed that there's no change to the output of \d in psql to show
> the constraint, so when I do a \d on my test table, I can see that
> there's a gist index there, but I can't tell that there is also a
> constraint on it.  This seems like a pretty significant shortcoming.
> Essentially once you've created one of these index constraints, it
> vanishes into the catalogs and becomes invisible to the user.  This
> might call for a modification of pg_get_indexdef()?

I agree, that's important. Psql support, regression tests, and docs are
all intertwined somewhat with the syntax, so I held off on that work
until I got a little feedback. I will get to work and see if I can put
together a more complete version in the next few days.

If you happen to have time, you can see if you can break my current
patch. I expect the basic algorithm to remain about the same for my next
version, so if you see any problems with that, please let me know. Also,
if you see any possible improvements that could make it useful for more
situations, that would be helpful, too.

But I think I have enough information to move forward, so if you want to
move on to a more complete patch, feel free.

Thanks for the review!

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-15 Thread Brendan Jurd
2009/7/15 Jeff Davis :
> Updated patch attached.
>
> Changes:
>  * Added syntax support:
>     CREATE INDEX foo_idx ON foo ... (a CONSTRAINT =, b CONSTRAINT &&);
>  * More aggressively clear the shared memory entries to avoid
>   unnecessary checks
>  * Code cleanup
>
> TODO:
>  * When adding constraint to table with data already in it, verify that
>   existing data satisfies constraint.
>  * Clean up error messages a little
>  * Docs

Hi Jeff,

I've been assigned to do an initial review of your patch.  Because
this patch is still WIP, there's not a lot for me to do.  I see from
the thread that there are still some design questions that remain
open, but I won't be addressing those.  The internals of indexes and
constraints is not an area of the code I have any particular insight
about.

The patch applies cleanly, builds successfully and passes regression
tests.  I read through the code changes, and didn't notice any code
convention violations.

I had a play around with the feature in psql.  I think the syntax is
okay, but using "ALTER TABLE ... ADD" as you mentioned upthread could
be a better option.

I noticed that there's no change to the output of \d in psql to show
the constraint, so when I do a \d on my test table, I can see that
there's a gist index there, but I can't tell that there is also a
constraint on it.  This seems like a pretty significant shortcoming.
Essentially once you've created one of these index constraints, it
vanishes into the catalogs and becomes invisible to the user.  This
might call for a modification of pg_get_indexdef()?

You've already noted this as a TODO, but clearly the error messages
need some serious help.  If I deliberately violate an index constraint
I get:

ERROR:  conflict detected 3

At minimum the message should use the term "constraint" and give the
name of the index that has detected the conflict.  I think something
like this would be okay:

ERROR:  new record violates constraint on index "circle_idx"

I also noticed that the patch does not include any changes or
additions to the regression test suite.  Perhaps it ought to?

That's all the feedback I have for the moment.  I hope you find my
comments constructive.

Cheers,
BJ

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-14 Thread Jeff Davis
Updated patch attached.

Changes:
 * Added syntax support:
 CREATE INDEX foo_idx ON foo ... (a CONSTRAINT =, b CONSTRAINT &&);
 * More aggressively clear the shared memory entries to avoid 
   unnecessary checks
 * Code cleanup

TODO:
 * When adding constraint to table with data already in it, verify that 
   existing data satisfies constraint.
 * Clean up error messages a little
 * Docs

The following are possible TODO items, but I'd like to get some feedback
first:
 * It seems like an alternative language would be better:
 ALTER TABLE foo ADD INDEX CONSTRAINT optional_name (a =, b &&)
   USING foo_idx;
   This language would be more like a table constraint that happens to 
   use an index. I think it's better because it allows multiple 
   constraints to be enforced by the same index.
 * Right now it only supports index AMs that offer amgettuple, which 
   excludes GIN. Consider adding a crude implementation of gingettuple 
   that just calls gingetbitmap internally (obviously providing no 
   performance advantage over gingetbitmap).

Regards,
Jeff Davis
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 1515d9f..d88387b 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -26,6 +26,7 @@
  *		index_vacuum_cleanup	- post-deletion cleanup of an index
  *		index_getprocid - get a support procedure OID
  *		index_getprocinfo - get a support procedure's lookup info
+ *  index_check_constraint - check index constraints
  *
  * NOTES
  *		This file contains the index_ routines which used
@@ -64,9 +65,13 @@
 
 #include "access/relscan.h"
 #include "access/transam.h"
+#include "miscadmin.h"
 #include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
+#include "storage/lwlock.h"
+#include "storage/procarray.h"
+#include "utils/lsyscache.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/tqual.h"
@@ -116,6 +121,19 @@ do { \
 static IndexScanDesc index_beginscan_internal(Relation indexRelation,
 		 int nkeys, ScanKey key);
 
+typedef struct
+{
+	Oid	relid;
+	TransactionId		xid;
+	ItemPointerData		tid;
+} CurrentIndexInsertEntry;
+
+static CurrentIndexInsertEntry *CurrentIndexInsertsTable = NULL;
+
+static bool index_check_constraint_conflict(TupleTableSlot *slot,
+			HeapTuple tup, int2 *heap_attnums,
+			int2 index_natts,
+			Oid *constraint_procs);
 
 /* 
  *   index_ interface functions
@@ -846,3 +864,303 @@ index_getprocinfo(Relation irel,
 
 	return locinfo;
 }
+
+void
+index_check_constraint(Relation heap, Relation index,
+		ItemPointer tid, TupleTableSlot *slot)
+{
+		IndexScanDesc	 index_scan;
+		HeapTuple		 tup;
+		ScanKeyData		*scankeys;
+		int2vector		*constr_strats;
+		Oid*constr_procs;
+		int i;
+		int2			*heap_attnums = index->rd_index->indkey.values;
+		int2			 index_natts  = index->rd_index->indnatts;
+		SnapshotData	 DirtySnapshot;
+		int nkeys		  = 0;
+
+		CurrentIndexInsertEntry	 potential_conflicts[MaxBackends];
+		int		 n_potential_conflicts = 0;
+
+		/* Find constraint strategy numbers */
+		constr_strats = RelationGetIndexConstraintStrategies(index);
+
+		/* return if no constraint */
+		if (constr_strats == NULL)
+			return;
+
+		/*
+		 * if any of the indexed columns are NULL, the constraint
+		 * is satisfied
+		 */
+		for (i = 0; i < index_natts; i++)
+			if (slot_attisnull(slot, heap_attnums[i]))
+return;
+
+		/*
+		 * Find the function that tests for a conflict based on the
+		 * strategy number, operator family, and types.
+		 */
+		constr_procs = palloc(sizeof(Oid) * index_natts);
+		for (i = 0; i < index_natts; i++)
+		{
+			/*
+			 * Find the procedure implementing the strategy for the
+			 * index for two arguments both with the type of the
+			 * indexed attribute.
+			 */
+			Oid	oper;
+			Oid	opfamily = index->rd_opfamily[i];
+			StrategyNumber		strategy = constr_strats->values[i];
+			Oid	typeOid	= heap->rd_att->attrs[heap_attnums[i] - 1]->atttypid;
+
+			if (strategy == InvalidStrategy)
+continue;
+
+			oper = get_opfamily_member(opfamily, typeOid, typeOid, strategy);
+
+			if(OidIsValid(oper))
+constr_procs[i] = get_opcode(oper);
+			else
+elog(ERROR, "cannot determine operator for type %d and "
+	 "strategy %d", typeOid, strategy);
+		}
+
+		/*
+		 * Check for conflicts with concurrent inserts. These are
+		 * inserts that are actually in-progress now, and have not
+		 * actually been put in the index yet.
+		 */
+
+		Assert (CurrentIndexInsertsTable != NULL);
+
+		LWLockAcquire(IndexConstraintLock, LW_SHARED);
+
+		for (i = 0; i < MaxBackends; i++)
+		{
+			CurrentIndexInsertEntry entry = CurrentIndexInsertsTable[i];
+
+			if (RelationGetRelid(heap) == entry.relid &&
+!TransactionIdIsCurrentTransactionId(entry.xid) &&
+TransactionIdIsInProgress(entry.xid))
+			{
+	potential

Re: [HACKERS] WIP: generalized index constraints

2009-07-11 Thread Jeff Davis
On Sat, 2009-07-11 at 19:06 -0400, Tom Lane wrote:
> > Is it possible to re-add amgettuple to GIN, and just set the cost high

> We wouldn't have deleted it if it were practical to make it work.

Can you elaborate a little?

Following the thread, I see:

http://archives.postgresql.org/pgsql-hackers/2009-02/msg00532.php

"What would be wrong with letting it degrade to lossy?  I suppose the
reason it's trying to avoid that is to avoid having to recheck all the
rows on that page when it comes time to do the index insertion; but
surely having to do that is better than having arbitrary, unpredictable
failure conditions."

And the next message refers to:

http://archives.postgresql.org/message-id/4974b002.3040...@sigaev.ru

"amgettuple interface hasn't possibility to work with page-wide result
instead of exact ItemPointer. amgettuple can not return just a block
number as amgetbitmap can."

I see why it's impractical to make it efficient, but the way I see it we
can make gingettuple just a wrapper around gingetbitmap, which just
iterates through the bitmap. It would not have any performance benefit
over gingetbitmap, obviously. But if my index constraints patch is going
to support GIN (which seems like an interesting application), I would
need to implement a function which does this anyway (efficiently or
not).

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-11 Thread Tom Lane
Jeff Davis  writes:
> Is it possible to re-add amgettuple to GIN, and just set the cost high
> so it's not chosen by the planner? Or is there some reason this is
> fundamentally a bad idea (or won't work at all)?

We wouldn't have deleted it if it were practical to make it work.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-11 Thread Jeff Davis
Right now this patch does not support GIN because GIN doesn't support
amgettuple.

It could be made to support GIN by doing a bitmap index scan, manually
fetching the next tuple (or, if it's lossy, the next one on the page),
checking it against the snapshot, and then rechecking it to make sure it
still matches.

The API I'm looking for is essentially the same as index_getnext(),
which makes the most sense for finding constraint violations.

Is it possible to re-add amgettuple to GIN, and just set the cost high
so it's not chosen by the planner? Or is there some reason this is
fundamentally a bad idea (or won't work at all)?

I know we removed it in 8.4:
http://archives.postgresql.org/pgsql-hackers/2009-02/msg01123.php
http://archives.postgresql.org/pgsql-hackers/2009-02/msg00532.php

but the reasoning seemed mostly because:
1. GIN could not support amgettuple efficiently (lots of rechecking)
2. Normal index scans did not fit a common use case for GIN, anyway

However, if my feature needs to perform this check anyway (to support
GIN, that is), it seems like it could be re-added. There was also some
resistance to removing it in the first place (e.g. for anti-joins), so
perhaps it can be made to be efficient again during the 8.5 cycle.

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-08 Thread Dean Rasheed
Tom Lane wrote:
> ...  I think it might be interesting to turn
> around Jeff's syntax sketch and provide a way to say that a CONSTRAINT
> declaration should depend on some previously added index, eg
> something like
> 
>   ALTER TABLE tab ADD CONSTRAINT UNIQUE (col1, col2) USING index
> 

Is there any reason to limit UNIQUE constraints to lists of table
columns? If you can build a unique index on an expression, why not a
unique constraint?

A quick test defining an index and manually adding catalog entries for
the constraint and depends showed that it appears to work fine (and it's
compatible with my deferrable unique constraints patch :-) )

 - Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-07 Thread Jeff Davis
On Tue, 2009-07-07 at 13:22 -0400, Tom Lane wrote:
> Also, if hash indexes were a realistic alternative to btree for this,
> we'd already have come up against the problem that the CONSTRAINT syntax
> doesn't provide any way to specify what kind of index you want to use
> underneath the constraint.  I think it might be interesting to turn
> around Jeff's syntax sketch and provide a way to say that a CONSTRAINT
> declaration should depend on some previously added index, eg
> something like
> 
>   ALTER TABLE tab ADD CONSTRAINT UNIQUE (col1, col2) USING index
> 

How about:

ALTER TABLE tab ADD INDEX CONSTRAINT [name]
(col1 [op], col2 [op]) USING index

And then just validate the constraint at creation time, and store the
information in pg_constraint.

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-07 Thread Jeff Davis
On Tue, 2009-07-07 at 14:57 -0400, Tom Lane wrote:
> I don't think this even approximates the need --- in particular it's not
> clear what the semantics of combination across different index columns
> are.  I assume you've hot-wired it so that several BTEqualStrategyNumber
> columns will work like a normal multicolumn uniqueness constraint (IOW
> it's okay as long as at least one column is NULL or isn't equal).  But
> I'm not at all sure that's what I'd want for some other operator type.

If any input attributes are NULL, the constraint automatically succeeds
(I think that follows the SQL philosophy). Perhaps we should be able to
override this behavior somehow?

Now, for each attribute where a constraint is defined, it becomes a new
scan key used in the index lookup to enforce the constraint. So, the
more attributes in the constraint, the more permissive the constraint
(just like UNIQUE). 

I can't think of another good interpretation of the constraint. If a
constraint on (x,y) meant "x is unique, and y is unique", it would be
hard to scan the index for y's uniqueness. If you want to say that both
are independently unique, I believe that requires two indexes, and so it
would probably be best to just require the constraints to be specified
separately. Thoughts?

> Also, what happens if you want to use the same index to support more
> than one logical constraint?  This is impossible if you put the
> information into pg_index, I think.
> 

I like the idea to store the information outside of pg_index. It means
that I don't have to build the index and check the constraint at the
same time. It would be more like adding a CHECK constraint.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-07 Thread Tom Lane
Jeff Davis  writes:
> On Tue, 2009-07-07 at 18:36 +0100, Simon Riggs wrote:
>> On Tue, 2009-07-07 at 13:22 -0400, Tom Lane wrote:
>> It is likely to be useful in the future to allow an index with N
>> columns, yet which can provide uniqueness with < N of those columns.
>> This capability is known as covered indexes and will be important if
>> Heikki writes his index-only scan code.

> My patch offers this capability, and the language I suggested would
> support it.

> In the current version of the patch, just use InvalidStrategy (0)
> instead of, say, BTEqualStrategyNumber (3) for the attributes that you
> don't want to be a part of the constraint. Some of the proper error
> checking is not done yet, but it will work.

I don't think this even approximates the need --- in particular it's not
clear what the semantics of combination across different index columns
are.  I assume you've hot-wired it so that several BTEqualStrategyNumber
columns will work like a normal multicolumn uniqueness constraint (IOW
it's okay as long as at least one column is NULL or isn't equal).  But
I'm not at all sure that's what I'd want for some other operator type.

Also, what happens if you want to use the same index to support more
than one logical constraint?  This is impossible if you put the
information into pg_index, I think.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-07 Thread Jeff Davis
On Tue, 2009-07-07 at 18:36 +0100, Simon Riggs wrote:
> On Tue, 2009-07-07 at 13:22 -0400, Tom Lane wrote:
> It is likely to be useful in the future to allow an index with N
> columns, yet which can provide uniqueness with < N of those columns.
> This capability is known as covered indexes and will be important if
> Heikki writes his index-only scan code.

My patch offers this capability, and the language I suggested would
support it.

In the current version of the patch, just use InvalidStrategy (0)
instead of, say, BTEqualStrategyNumber (3) for the attributes that you
don't want to be a part of the constraint. Some of the proper error
checking is not done yet, but it will work.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-07 Thread Greg Stark
On Tue, Jul 7, 2009 at 6:22 PM, Tom Lane wrote:
>
> This seems a bit pointless.  There is certainly not any use case for a
> constraint without an enforcement mechanism (or at least none the PG
> community is likely to consider legitimate ;-)).  And it's not very
> realistic to suppose that you'd check a constraint by doing a seqscan
> every time.  Therefore there has to be an index underlying the
> constraint somehow.

I'm not entirely convinced that running a full scan to enforce
constraints is necessarily such a crazy idea. It may well be the most
efficient approach after a major bulk load. And consider a read-only
database where the only purpose of the constraint is to inform the
optimizer that it can trust the property to hold.

That said this seems like an orthogonal issue to me.

> Jeff's complaint about total order is not an
> argument against having an index, it's just pointing out that btree is
> not the only possible type of index.  It's perfectly legitimate to
> imagine using a hash index to enforce uniqueness, for example.  If hash
> indexes had better performance we'd probably already have been looking
> for a way to do that, and wanting some outside-the-AM mechanism for it
> so we didn't have to duplicate code from btree.

I'm a bit at a loss why we need this extra data structure though. The
whole duplicated code issue seems to me to be one largely of code
structure. If we hoisted the heap-value rechecking code out of the
btree AM then the hash AM could reuse it just fine.

Both the hash and btree AMs would have to implement some kind of
"insert-unique-key" operation which would hold some kind of lock
preventing duplicate unique keys from being inserted but both btree
and hash could implement that efficiently by locking one page or one
hash value.

GIST would need something like this "store the key value or tid in
shared memory" mechanism. But that could be implemented as an external
facility which GIST then made use of -- just the way every part of the
system makes use of other parts. It doesn't mean we have to make
"prevent concurrent unique inserts" not the responsibility of the AM
which knows best how to handle that.

-- 
greg
http://mit.edu/~gsstark/resume.pdf

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-07 Thread Simon Riggs

On Tue, 2009-07-07 at 13:22 -0400, Tom Lane wrote:
> Jeff Davis  writes:
> > On Mon, 2009-07-06 at 18:27 +0100, Simon Riggs wrote:
> >> In many cases, people add unique indexes solely to allow replication to
> >> work correctly. The index itself may never be used, especially in high
> >> volume applications.
> 
> > Interesting. Maybe we should at least try to leave room for this feature
> > to be added later. I agree that, from a theoretical perspective,
> > requiring a UNIQUE constraint to use an index is wrong. For one thing,
> > you can't ensure the uniqueness without defining some total order
> > (although you can define an arbitrary total order for cases with no
> > meaningful total order).
> 
> This seems a bit pointless.  There is certainly not any use case for a
> constraint without an enforcement mechanism (or at least none the PG
> community is likely to consider legitimate ;-)).  And it's not very
> realistic to suppose that you'd check a constraint by doing a seqscan
> every time.  Therefore there has to be an index underlying the
> constraint somehow.  

I think the idea has been misconstrued.

Obviously a constraint requires an enforcement mechanism. That doesn't
imply that the enforcement mechanism must be fully usable as an index.

The example being discussed was enforcing uniqueness on monotonically
increasing columns. If we knew that a column value was GENERATED ALWAYS
using a sequence, then we could simply skip the uniqueness check
altogether. No index, yet an enforced unique constraint.

Yes, we would need to understand the relationship between the sequence
and the table and throw an error in certain sequence update cases (and
we may need to check those with a seq scan). But that seems a small
price to pay for the avoidance of a potentially very large index that
may have no purpose.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-07 Thread Simon Riggs

On Tue, 2009-07-07 at 13:22 -0400, Tom Lane wrote:
> ALTER TABLE tab ADD CONSTRAINT UNIQUE (col1, col2) USING index

This would be very useful, though perhaps only because we do not have
REINDEX CONCURRENTLY.

It is likely to be useful in the future to allow an index with N
columns, yet which can provide uniqueness with < N of those columns.
This capability is known as covered indexes and will be important if
Heikki writes his index-only scan code.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-07 Thread Tom Lane
Jeff Davis  writes:
> On Mon, 2009-07-06 at 18:27 +0100, Simon Riggs wrote:
>> In many cases, people add unique indexes solely to allow replication to
>> work correctly. The index itself may never be used, especially in high
>> volume applications.

> Interesting. Maybe we should at least try to leave room for this feature
> to be added later. I agree that, from a theoretical perspective,
> requiring a UNIQUE constraint to use an index is wrong. For one thing,
> you can't ensure the uniqueness without defining some total order
> (although you can define an arbitrary total order for cases with no
> meaningful total order).

This seems a bit pointless.  There is certainly not any use case for a
constraint without an enforcement mechanism (or at least none the PG
community is likely to consider legitimate ;-)).  And it's not very
realistic to suppose that you'd check a constraint by doing a seqscan
every time.  Therefore there has to be an index underlying the
constraint somehow.  Jeff's complaint about total order is not an
argument against having an index, it's just pointing out that btree is
not the only possible type of index.  It's perfectly legitimate to
imagine using a hash index to enforce uniqueness, for example.  If hash
indexes had better performance we'd probably already have been looking
for a way to do that, and wanting some outside-the-AM mechanism for it
so we didn't have to duplicate code from btree.

Also, if hash indexes were a realistic alternative to btree for this,
we'd already have come up against the problem that the CONSTRAINT syntax
doesn't provide any way to specify what kind of index you want to use
underneath the constraint.  I think it might be interesting to turn
around Jeff's syntax sketch and provide a way to say that a CONSTRAINT
declaration should depend on some previously added index, eg
something like

ALTER TABLE tab ADD CONSTRAINT UNIQUE (col1, col2) USING index

Not sure how that squares exactly with the question of variant
definitions of uniqueness semantics (as opposed to purely implementation
decisions like hash vs btree).  But it's a different way to come at it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-07 Thread Teodor Sigaev



CREATE INDEX test_idx ON test USING gist
  (i CONSTRAINT =, c CONSTRAINT &&);

which would avoid the need for updating the catalog, of course.

Hmm, looks like "index"-fied table's constrains

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-06 Thread Jeff Davis
On Mon, 2009-07-06 at 18:27 +0100, Simon Riggs wrote:
> In many cases, people add unique indexes solely to allow replication to
> work correctly. The index itself may never be used, especially in high
> volume applications.

Interesting. Maybe we should at least try to leave room for this feature
to be added later. I agree that, from a theoretical perspective,
requiring a UNIQUE constraint to use an index is wrong. For one thing,
you can't ensure the uniqueness without defining some total order
(although you can define an arbitrary total order for cases with no
meaningful total order).

> How do you handle uniqueness within a stream? Presumably it is possible
> and useful to have a stream of data that can be guaranteed unique, yet a
> stream would never be uniquely targeted for lookups because of the
> volume of data involved.

[ Simon is asking me because I work for Truviso, but my response is not
officially from Truviso ]

There are a few cases worth mentioning here. First, if you have a stream
that's backed by a table, you can use a table constraint. Second, you
might choose to have an "in-order" constraint (not necessary, the system
can fix out-of-order data), which could be a unique constraint that's
very cheap to test.

Additionally, this is not strictly a constraint, but if you have
downstream operators, like COUNT(DISTINCT...), that can be seen as being
similar to a constraint. These will often be over a limited span of
time, say, a minute or an hour, and we can keep the necessary state. If
there are a huge number of distinct values there, then it's a challenge
to avoid keeping a lot of state.

There are a few other specialized methods that we can use for specific
use-cases.

Regards,
Jeff Davis





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-06 Thread Greg Stark
On Mon, Jul 6, 2009 at 6:20 PM, Jeff Davis wrote:
> On Mon, 2009-07-06 at 17:02 +0100, Greg Stark wrote:
>> On Mon, Jul 6, 2009 at 4:57 PM, Jeff Davis wrote:
>> >
>> > Exactly, you already know my use case ;) My goal is a "temporal key",
>> > where you can't have overlapping intervals of time, e.g. the constraint
>> > "nobody can be two places at the same time".
>>
>> Incidentally to handle non-overlapping ranges you don't need GIST, you
>> can actually use a plain btree. Since there are no overlapping ranges
>> the ranges have a complete ordering and you can get that by just
>> sorting by either endpoint. To enforce the constraint you only have to
>> compare with the previous and following element in the btree.
>
> What if you have an entire index full of overlapping dead tuples, and a
> few live ones? How would search work?

I should clarify I didn't mean you could implement it in SQL using
Postgres btrees. I just meant that a tree data structure was
sufficient, you don't need the power of GIST. It's probably easier to
implement it in GIST in Postgres since it's there though.

So it would work just like regular btrees, you only consider it a
conflict if there's a live value that conflicts.



-- 
greg
http://mit.edu/~gsstark/resume.pdf

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-06 Thread Simon Riggs

On Mon, 2009-07-06 at 08:50 -0700, Jeff Davis wrote:
> On Mon, 2009-07-06 at 11:56 +0100, Simon Riggs wrote:
> > I think it will be useful to separate the concepts of a constraint from
> > the concept of an index. It seems possible to have a UNIQUE constraint
> > that doesn't help at all in locating rows, just in proving that the rows
> > are unique.
> 
> That would be interesting. Do you have a use case? Checking the
> constraint would surely be slower in a lot of cases.
> 
> I could imagine different constraint-checking schemes that could be fast
> against a heap. For instance, if it's greater than the max or less than
> the min value, that would be cheap to check. That might be an
> interesting way to handle the constraint for a sequence-generated
> column, or timestamp column that is always ascending.

Yes.

> However, the problem is I don't see a lot of room for a practical use
> case. In the above situations, you'd almost certainly want indexes
> anyway: what's the point of a sequence number unless you're going to do
> lookups? And if you have an ascending timestamp column, I would think
> that you might do range lookups occasionally (which will be even better
> because the heap will be clustered).

In many cases, people add unique indexes solely to allow replication to
work correctly. The index itself may never be used, especially in high
volume applications.

How do you handle uniqueness within a stream? Presumably it is possible
and useful to have a stream of data that can be guaranteed unique, yet a
stream would never be uniquely targeted for lookups because of the
volume of data involved.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-06 Thread Jeff Davis
On Mon, 2009-07-06 at 17:02 +0100, Greg Stark wrote:
> On Mon, Jul 6, 2009 at 4:57 PM, Jeff Davis wrote:
> >
> > Exactly, you already know my use case ;) My goal is a "temporal key",
> > where you can't have overlapping intervals of time, e.g. the constraint
> > "nobody can be two places at the same time".
> 
> Incidentally to handle non-overlapping ranges you don't need GIST, you
> can actually use a plain btree. Since there are no overlapping ranges
> the ranges have a complete ordering and you can get that by just
> sorting by either endpoint. To enforce the constraint you only have to
> compare with the previous and following element in the btree.

What if you have an entire index full of overlapping dead tuples, and a
few live ones? How would search work?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-06 Thread Greg Stark
On Mon, Jul 6, 2009 at 4:57 PM, Jeff Davis wrote:
>
> Exactly, you already know my use case ;) My goal is a "temporal key",
> where you can't have overlapping intervals of time, e.g. the constraint
> "nobody can be two places at the same time".

Incidentally to handle non-overlapping ranges you don't need GIST, you
can actually use a plain btree. Since there are no overlapping ranges
the ranges have a complete ordering and you can get that by just
sorting by either endpoint. To enforce the constraint you only have to
compare with the previous and following element in the btree.

-- 
greg
http://mit.edu/~gsstark/resume.pdf

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-06 Thread Jeff Davis
On Mon, 2009-07-06 at 07:30 -0700, David Fetter wrote:
> > It would be useful to see a real example of what this can be used
> > for.
> 
> Constraints like "these intervals can't overlap" would be one.  It's
> handy in calendaring applications, for example.

Exactly, you already know my use case ;) My goal is a "temporal key",
where you can't have overlapping intervals of time, e.g. the constraint
"nobody can be two places at the same time".

> > I think it will be useful to separate the concepts of a constraint
> > from the concept of an index.  It seems possible to have a UNIQUE
> > constraint that doesn't help at all in locating rows, just in
> > proving that the rows are unique.
> 
> Interesting idea.  Are you thinking of this in terms of things the
> planner can do once it knows a set is all distinct values, or...?

I think that's an orthogonal idea.

It's a good idea though, I would like the planner to be smarter about
those kinds of things. A simple example is that if a table has a
non-partial unique constraint anywhere, then "select * from foo union
select * from foo" can be transformed into "select * from
foo" (eliminating the expensive union).

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-06 Thread Jeff Davis
On Mon, 2009-07-06 at 11:56 +0100, Simon Riggs wrote:
> I think it will be useful to separate the concepts of a constraint from
> the concept of an index. It seems possible to have a UNIQUE constraint
> that doesn't help at all in locating rows, just in proving that the rows
> are unique.

That would be interesting. Do you have a use case? Checking the
constraint would surely be slower in a lot of cases.

I could imagine different constraint-checking schemes that could be fast
against a heap. For instance, if it's greater than the max or less than
the min value, that would be cheap to check. That might be an
interesting way to handle the constraint for a sequence-generated
column, or timestamp column that is always ascending.

However, the problem is I don't see a lot of room for a practical use
case. In the above situations, you'd almost certainly want indexes
anyway: what's the point of a sequence number unless you're going to do
lookups? And if you have an ascending timestamp column, I would think
that you might do range lookups occasionally (which will be even better
because the heap will be clustered).

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-06 Thread Jeff Davis
On Mon, 2009-07-06 at 12:28 +0100, Greg Stark wrote:
> He only needs to handle inserts for the period they're actively being
> inserted into the index. Once they're in the index he'll find them
> using the index scan. In other words this is all a proxy for the way
> btree locks index pages while it looks for a unique key violation.

Exactly, that was my design:

/*
 * We have to find all tuples, even those not visible
 * yet. Other transactions may have inserted many tuples (or
 * the transaction might be a prepared transaction), so there
 * may be some tuples that are not in the shared memory
 * structure and not visible.
 */

> I'm a bit concerned about the use of tid. You might have to look at a
> lot of heap pages to check for conflicts. I suppose they're almost
> certainly all in shared memory though.

That was my hope.

The 8.4 bulk insert code might defeat that to some degree, however.
Maybe that could be disabled when inserting into an index with
constraints? I didn't think about it before, but the bulk insert buffer
ring would affect unique btrees, too, right?

> Also, it sounds like you're
> anticipating the possibility of dead entries in the array but if you
> do then you need to store the xmin also to protect against a tuple
> that's been vacuumed and had its line pointer reused since. But I
> don't see the necessity for that anyways since you can just clean up
> the entry on abort.

Can you tell me a little more specifically the problem you're worried
about? If the tuple has been VACUUMed and removed, then the TID search
will either find a tuple, and do a spurious constraint check against it;
or not find a tuple, and just move on.

I could have the commit and abort paths clear the entry, which might
optimize away some of the TransactionIdIsInProgress() calls for
transactions that ended normally. But that didn't strike me as a big
cost compared to the index scan.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-06 Thread David Fetter
On Mon, Jul 06, 2009 at 11:56:41AM +0100, Simon Riggs wrote:
> On Sun, 2009-07-05 at 17:28 -0700, Jeff Davis wrote:
> > This is a follow up to my old proposal here:
> > 
> > http://archives.postgresql.org/pgsql-hackers/2008-06/msg00404.php
> > 
> 
> > Any input is appreciated (design problems, implementation,
> > language ideas, or anything else). I'd like to get it into shape
> > for the July 15 commitfest if no major problems are found.
> 
> I was concerned that your definition of concurrently inserted didn't
> seem to match the size of the shared memory array required.
> 
> How will you cope with a large COPY? Surely there can be more than
> one concurrent insert from any backend?
> 
> It would be useful to see a real example of what this can be used
> for.

Constraints like "these intervals can't overlap" would be one.  It's
handy in calendaring applications, for example.

> I think it will be useful to separate the concepts of a constraint
> from the concept of an index.  It seems possible to have a UNIQUE
> constraint that doesn't help at all in locating rows, just in
> proving that the rows are unique.

Interesting idea.  Are you thinking of this in terms of things the
planner can do once it knows a set is all distinct values, or...?

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-06 Thread Greg Stark
On Mon, Jul 6, 2009 at 11:56 AM, Simon Riggs wrote:
> How will you cope with a large COPY? Surely there can be more than one
> concurrent insert from any backend?

He only needs to handle inserts for the period they're actively being
inserted into the index. Once they're in the index he'll find them
using the index scan. In other words this is all a proxy for the way
btree locks index pages while it looks for a unique key violation.

I'm a bit concerned about the use of tid. You might have to look at a
lot of heap pages to check for conflicts. I suppose they're almost
certainly all in shared memory though. Also, it sounds like you're
anticipating the possibility of dead entries in the array but if you
do then you need to store the xmin also to protect against a tuple
that's been vacuumed and had its line pointer reused since. But I
don't see the necessity for that anyways since you can just clean up
the entry on abort.


-- 
greg
http://mit.edu/~gsstark/resume.pdf

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: generalized index constraints

2009-07-06 Thread Simon Riggs

On Sun, 2009-07-05 at 17:28 -0700, Jeff Davis wrote:
> This is a follow up to my old proposal here:
> 
> http://archives.postgresql.org/pgsql-hackers/2008-06/msg00404.php
> 

> Any input is appreciated (design problems, implementation, language
> ideas, or anything else). I'd like to get it into shape for the July
> 15 commitfest if no major problems are found.

I was concerned that your definition of concurrently inserted didn't
seem to match the size of the shared memory array required.

How will you cope with a large COPY? Surely there can be more than one
concurrent insert from any backend?


It would be useful to see a real example of what this can be used for.


I think it will be useful to separate the concepts of a constraint from
the concept of an index. It seems possible to have a UNIQUE constraint
that doesn't help at all in locating rows, just in proving that the rows
are unique.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] WIP: generalized index constraints

2009-07-05 Thread Jeff Davis
This is a follow up to my old proposal here:

http://archives.postgresql.org/pgsql-hackers/2008-06/msg00404.php

Top pointed out a few problems here:

http://archives.postgresql.org/pgsql-hackers/2008-06/msg00427.php

Here are my updated answers:

1. Not a problem with the new design, which checks the constraints from
ExecInsertIndexTuples().

2. Not a problem for similar reasons.

3. I don't have an answer here yet, but I have a few thoughts. I see it
as a separate proposal. My hand-waving answer is that it should be just
as possible as before to append index constraint failures to a big list,
and loop through it as long as we're making progress. If I need a more
solid proposal for this problem before my generalized constraints
proposal is considered, let me know.

To try out my patch:

(1) Apply patch to 8.5-devel and Init DB

(2) Install contrib/btree_gist (only necessary for this example, patch
works with Btree and GIN, too).

(3)
  => create table test(i int, c circle);
  => create index test_idx on test using gist(i, c);
  => UPDATE pg_index SET indconstrats = '3 3' 
 WHERE indexrelid='test_idx'::regclass;

  In the above query, 3 is the equality strategy number for the GiST
opclass for integers, and 3 is also the "overlaps" strategy number for
the GiST opclass for circles, so we put a 3 for each attribute. What
this will mean is that it will reject any new tuple when there is
already another tuple in the table with an equal value of i AND an
overlapping value of c. Concurrency should behave identically to UNIQUE
on a btree.

(4) Now, try some inserts (concurrent or otherwise) and see what
happens.

Ultimately, I think the language for this might shape up something like:

CREATE INDEX test_idx ON test USING gist
  (i CONSTRAINT =, c CONSTRAINT &&);

which would avoid the need for updating the catalog, of course.

Limitations:

 * Still not deferrable, even 'til the end of the command.
 * Your constraint must be symmetric (if tuple A conflicts with tuple B,
tuple B must conflict with tuple A).
 * The types have to match between the left and right arguments in the
operator class and the type of the column in the table. This is normally
true, but the GIN Array opclass works on type "anyarray", but the table
has a normal type, which causes a problem. Maybe it's possible to be
smarter about this, but the workaround is to just create more opclasses
(I believe).

Any input is appreciated (design problems, implementation, language
ideas, or anything else). I'd like to get it into shape for the July 15
commitfest if no major problems are found.

Regards,
Jeff Davis
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 1515d9f..eedb456 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -26,6 +26,7 @@
  *		index_vacuum_cleanup	- post-deletion cleanup of an index
  *		index_getprocid - get a support procedure OID
  *		index_getprocinfo - get a support procedure's lookup info
+ *  index_check_constraint - check index constraints
  *
  * NOTES
  *		This file contains the index_ routines which used
@@ -64,9 +65,13 @@
 
 #include "access/relscan.h"
 #include "access/transam.h"
+#include "miscadmin.h"
 #include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
+#include "storage/lwlock.h"
+#include "storage/procarray.h"
+#include "utils/lsyscache.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/tqual.h"
@@ -116,6 +121,19 @@ do { \
 static IndexScanDesc index_beginscan_internal(Relation indexRelation,
 		 int nkeys, ScanKey key);
 
+typedef struct
+{
+	Oid	relid;
+	TransactionId		xid;
+	ItemPointerData		tid;
+} CurrentIndexInsertEntry;
+
+static CurrentIndexInsertEntry *CurrentIndexInsertsTable = NULL;
+
+static bool index_check_constraint_conflict(TupleTableSlot *slot,
+			HeapTuple tup, int2 *heap_attnums,
+			int2 index_natts,
+			Oid *constraint_procs);
 
 /* 
  *   index_ interface functions
@@ -846,3 +864,278 @@ index_getprocinfo(Relation irel,
 
 	return locinfo;
 }
+
+void
+index_check_constraint(Relation heap, Relation index,
+		ItemPointer tid, TupleTableSlot *slot)
+{
+		IndexScanDesc	 index_scan;
+		HeapTuple		 tup;
+		ScanKeyData		*scankeys;
+		int2vector		*constr_strats;
+		Oid*constr_procs;
+		int i;
+		int2			*heap_attnums = index->rd_index->indkey.values;
+		int2			 index_natts  = index->rd_index->indnatts;
+		SnapshotData	 DirtySnapshot;
+		int nkeys		  = 0;
+
+		CurrentIndexInsertEntry *MyIndexInsertEntry;
+		CurrentIndexInsertEntry	 potential_conflicts[MaxBackends];
+		int		 n_potential_conflicts = 0;
+
+		/* Find constraint strategy numbers */
+		constr_strats = RelationGetIndexConstraintStrategies(index);
+
+		/* return if no constraint */
+		if (constr_strats == NULL)
+			return;
+
+		/*
+		 * if any of the indexed columns are NULL, the cons

  1   2   >