Re: [HACKERS] Partitioning option for COPY

Emmanuel Cecchet Mon, 23 Nov 2009 06:40:54 -0800

Simon,

I think you should read the thread and the patch before making any falsestatements like you did in your email.


1. The patch does not use any trigger for routing.

2. This is just an option for COPY that is useful for loading operationsin the datawarehouse world. It is not meant to implement fullpartitioning as explained many times already in this thread.3. This patch elaborates on existing mechanisms and cannot rely on ameta-data representation of partitions which does not exist yet and willprobably not exist in 8.5

You should justify your statements when you say 'potentially buggy inits approach to developing a cache and using trigger-like stuff'. Iunderstand that you don't like it because this is not what you want butthis is not my fault. This is not an implementation of partitioning likeCOPY does not do update/delete/alter/...And yes the use case is 'narrow' like any option in COPY. It is likecomplaining that the CSV option is not useful because you want to loadbinary dumps.

If Itagaki gets the support of the community to get his implementationaccepted, I will gladly use it. Contributing? If Aster is willing tocontribute a code monkey to implement your specs, why not but you willhave to convince them.

You should really think twice about the style of your emails that cast adetestable tone to discussions on pg-hackers.


Emmanuel

On Wed, 2009-11-11 at 19:53 -0500, Emmanuel Cecchet wrote:

Hi,
I have extracted the partitioning option for COPY (removed the errorlogging part) from the previous patch.
We can use an INSERT trigger to route tuples into partitions even now.
Why do you need an additional router for COPY?
Tom has already explained on the list why using a trigger was a bad idea(and I know we can use a trigger since I am the one who wrote it).If you look at the code you will see that you can do optimizations inthe COPY code that you cannot do in the trigger.
 Also, it would be nicer
that the router can works not only in COPY but also in INSERT.
As 8.5 will at best provide a syntactic hack on top of the existingconstraint implementation, I think that it will not hurt to have routingin COPY since we will not have it anywhere otherwise.
BTW, I'm working on meta data of partitioning now. Your "partitioning"
option in COPY could be replaced with the catalog.
This implementation is only for the current 8.5 and it will not beneeded anymore once we get a fully functional partitioning in Postgreswhich seems to be for a future version.

Yes, the trigger way of doing this is a bad way.

I regret to say that the way proposed here isn't much better, AFAICS.
Let me explain why I think that, but -1 to anyone applying this patch.

This patch proposes keeping a cache of last visited partitions to reduce
the overhead of data routing.

What I've requested is that partitioning work by using a data structure
held in relcache for inheritance parents. This differs in 3 ways from
this patch
a) it has a clearly defined location for the cached metadata, with
clearly identified and well worked out mechanisms for cache invalidation
b) the cache can be built once when it is first needed, not slowly grown
as parts of the metadata are used
c) it would be available for all parts of the server, not just COPY.

The easiest way to build that metadata is when structured partitioning
info is available. i.e. the best next action is to complete and commit
Itagaki's partitioning syntax patch. Then we can easily build the
metadata for partitioning, which can then be used in COPY for data
routing.

Anyway, I want data routing, as is the intention of this patch. I just
don't think this patch is a useful way to do it. It is too narrow in its
scope and potentially buggy in its approach to developing a cache and

using trigger-like stuff.

ISTM that with the right metadata in the right place, a cleaner and
easier solution is still possible for 8.5. The code within COPY should
really just reduce to a small piece of code to derive the correct
relation for the desired row and then use that during heap_insert().

I have just discussed partitioning with Itagaki-san at JPUG, so I know
his plans. Itagaki-san and Manu, please can you work together to make

this work for 8.5?

---

A more detailed explanation of Partitioning Metadata:

Partitioning Metadata is information held on the relcache for a table
that has child partitions. Currently, a table does not cache info about
its children, which prevents various optimisations.

We would have an extra pointer on the Relation struct that points to a
PartitioningMetadata struct. We can fill in this information when we
construct the relcache for a relation, or we can populate it on demand
the first time we attempt to use that information (if it exists).

We want to hold an array of partition boundary values. This will then
allow us to use bsearch to find the partition that a specific value
applies to. Thus it can be used for routing data from INSERTs or COPY,
can be used for identifying which partitions need to be
included/excluded from an APPEND node. Using this will be O(logN) rather
than O(N), so allowing us to have much larger number of partitions when
required. Note that it can also be used within the executor to perform
dynamic partition elimination, thus allowing us to easily implement
partition aware joins etc.

To construct the array we must sort the partition boundary values and
prove that the partition definitions do not overlap. That is much easier
to do when the partitions are explicitly defined. (Plus, there is no
requirement to have, or mechanism to specify, unique partitions
currently, although most users assume this in their usage).

I imagine we would have an API called something like
RelationIdentifyPartition() where we provide value(s) for the
PartitioningKey column(s) and we then return the Oid of the partition
that holds that value. That function would build the metadata, if not
already cached, then bsearch it to provide the Oid.



--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

Reply via email to