On Thu, Nov 17, 2016 at 6:27 AM, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote: > Meanwhile, here are updated patch that address most of the following comments.
OK, I have re-reviewed 0005 and it looks basically fine to me, modulo a few minor nitpicks. "This is called, *iff*" shouldn't have a comma there, and I think the entire comment block that starts with "NOTE: SQL specifies that a NULL" and ends with "it's unlikely that NULL would result." should be changed to say something like /* As for catalogued constraints, we treat a NULL result as a success, not a failure. */ rather than duplicating an existing comment that doesn't quite apply here. Finally, ExecConstraints() contains a new if-block whose sole contents are another if-block. Perhaps if (this && that) would be better. Regarding 0006 and 0007, I think the PartitionTreeNodeData structure you've chosen is awfully convoluted and probably not that efficient. For example, get_partition_for_tuple() contains this loop: + prev = parent; + node = parent->downlink; + while (node != NULL) + { + if (node->index >= cur_idx) + break; + + prev = node; + node = node->next; + } Well, it looks to me like that's an O(n) way to find the n'th partition, which seems like a pretty bad idea in performance-critical code, which this is. I think this whole structure needs a pretty heavy overhaul. Here's a proposal: 1. Forget the idea of a tree. Instead, let the total number of tables in the partitioning hierarchy be N and let the number of those that are partitioned be K. Assign each partitioned table in the hierarchy an index between 0 and K-1. Make your top level data structure (in lieu of PartitionTreeNodeData) be an array of K PartitionDispatch objects, with the partitioning root in entry 0 and the rest in the remaining entries. 2. Within each PartitionDispatch object, store (a) a pointer to a PartitionDesc and (b) an array of integers of length equal to the PartitionDesc's nparts value. Each integer i, if non-negative, is the final return value for get_partition_for_tuple. If i == -1, tuple routing fails. If i < -1, we must next route using the subpartition whose PartitionDesc is at index -(i+1). Arrange for the array to be in the same order the PartitionDesc's OID list. 3. Now get_partition_for_tuple looks something like this: K = 0 loop: pd = PartitionDispatch[K] idx = list/range_partition_for_tuple(pd->partdesc, ...) if (idx >= -1) return idx K = -(idx + 1) No recursion, minimal pointer chasing, no linked lists. The whole thing is basically trivial aside from the cost of list/range_partition_for_tuple itself; optimizing that is a different project. I might have some details slightly off here, but hopefully you can see what I'm going for: you want to keep the computation that happens in get_partition_for_tuple() to an absolute minimum, and instead set things up in advance so that getting the partition for a tuple is FAST. And you want the data structures that you are using in that process to be very compact, hence arrays instead of linked lists. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers