Re: [HACKERS] Declarative partitioning - another take

Amit Langote Sun, 18 Sep 2016 00:36:47 -0700

On Thu, Sep 15, 2016 at 10:07 PM, Robert Haas <[email protected]> wrote:
> On Thu, Sep 15, 2016 at 4:53 AM, Amit Langote
> <[email protected]> wrote:
>> Wow, this is bad.  What is needed in this case is "canonicalization" of
>> the range partition bounds specified in the command.
>
> I think we shouldn't worry about this.  It seems like unnecessary
> scope creep.  All human beings capable of using PostgreSQL will
> understand that there are no integers between 4 and 5, so any
> practical impact on this would be on someone creating partitions
> automatically.  But if someone is creating partitions automatically
> they are highly likely to be using the same EXCLUSIVE/INCLUSIVE
> settings for all of the partitions, in which case this won't arise.
> And if they aren't, then I think we should just make them deal with
> this limitation in their code instead of dealing with it in our code.
> This patch is plenty complicated enough already; introducing a whole
> new canonicalization concept that will help practically nobody seems
> to me to be going in the wrong direction.  If somebody really cares
> enough to want to try to fix this, they can submit a followup patch
> someday.
>
>> To mitigate this, how about we restrict range partition key to contain
>> columns of only those types for which we know we can safely canonicalize a
>> range bound (ie, discrete range types)?  I don't think we can use, say,
>> existing int4range_canonical but will have to write a version of it for
>> partitioning usage (range bounds of partitions are different from what
>> int4range_canonical is ready to handle).  This approach will be very
>> limiting as then range partitions will be limited to columns of int,
>> bigint and date type only.
>
> -1.  That is letting the tail wag the dog.  Let's leave it the way you
> had it and be happy.


Alright, let's leave this as something to work out later.  We will
have to document the fact that such limitation exists though, I'd
think.

>>> -- Observation 2 : able to create sub-partition out of the range set for
>>> main table, causing not able to insert data satisfying any of the partition.
>>>
>>> create table test_subpart (c1 int) partition by range (c1);
>>> create table test_subpart_p1 partition of test_subpart for values start (1)
>>> end (100) inclusive partition by range (c1);
>>> create table test_subpart_p1_sub1 partition of test_subpart_p1 for values
>>> start (101) end (200);
>>
>> It seems that DDL should prevent the same column being used in partition
>> key of lower level partitions.  I don't know how much sense it would make,
>> but being able to use the same column as partition key of lower level
>> partitions may be a feature useful to some users if they know what they
>> are doing.  But this last part doesn't sound like a good thing.  I
>> modified the patch such that lower level partitions cannot use columns
>> used by ancestor tables.
>
> I again disagree.  If somebody defines partition bounds that make it
> impossible to insert the data that they care about, that's operator
> error.  The fact that, across multiple levels, you can manage to make
> it impossible to insert any data at all does not make it anything
> other than operator error.  If we take the contrary position that it's
> the system's job to prevent this sort of thing, we may disallow some
> useful cases, like partitioning by the year portion of a date and then
> subpartitioning by the month portion of that same date.
>
> I think we'll probably also find that we're making the code
> complicated to no purpose.  For example, now you have to check when
> attaching a partition that it doesn't violate the rule; otherwise you
> end up with a table that can't be created directly (and thus can't
> survive dump-and-restore) but can be created indirectly by exploiting
> loopholes in the checks.  It's tempting to think that we can check
> simple cases - e.g. if the parent and the child are partitioning on
> the same exact column, the child's range should be contained within
> the parent's range - but more complicated cases are tricky.  Suppose
> the table is range-partitioned on (a, b) and range-subpartitioned on
> b.  It's not trivial to figure out whether the set of values that the
> user can insert into that partition is non-empty.  If we allow
> partitioning on expressions, then it quickly becomes altogether
> impossible to deduce anything useful - unless you can solve the
> halting problem.
>
> And, really, why do we care?  If the user creates a partitioning
> scheme that permits no rows in some or all of the partitions, then
> they will have an empty table that can be correctly dumped and
> restored but which cannot be used for anything useful unless it is
> modified first.  They probably don't want that, but it's not any more
> broken than a table inheritance setup with mutually exclusive CHECK
> constraints, or for that matter a plain table with mutually exclusive
> CHECK constraints - and we don't try to prohibit those things.  This
> patch is supposed to be implementing partitioning, not artificial
> intelligence.

Agree with your arguments.  Certainly, I overlooked some use cases
that my proposed solution would outright prevent.  I withdraw it.

Thanks,
Amit


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Declarative partitioning - another take

Reply via email to