* Simon Riggs (si...@2ndquadrant.com) wrote:
> It would seem normal and natural to have
>
> * pg_joinam catalog table for "join methods" with a join method API
> Which would include some way of defining which operators/datatypes we
> consider this for, so if PostGIS people come up with some fancy GIS
> join thing, we don't invoke it every time even when its inapplicable.
> I would prefer it if PostgreSQL also had some way to control when the
> joinam was called, possibly with some kind of table_size_threshold on
> the AM tuple, which could be set to >=0 to control when this was even
> considered.
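
For concreteness, a pg_joinam row along those lines might carry
something like the sketch below.  None of these names exist today;
they just follow the FormData_* pattern of the existing catalog
headers such as pg_am (the real thing would go through the
CATALOG()/BKI machinery rather than a plain typedef):

/*
 * Hypothetical layout for a "join method" catalog row.  Purely
 * illustrative; no such catalog or columns exist.
 */
#include "postgres.h"

typedef struct FormData_pg_joinam
{
    NameData    joinamname;          /* join method name, e.g. "hash" */
    regproc     joinamhandler;       /* function returning the struct of
                                      * callbacks for this join method */
    regproc     joinamcostestimate;  /* costing function, so bad estimates
                                      * can be tweaked rather than the
                                      * method being hard-disallowed */
    Oid         joinamopfamily;      /* operator family whose operators
                                      * provide the capabilities needed */
    int32       joinamsizethreshold; /* the table_size_threshold idea:
                                      * only consider this method once a
                                      * relation exceeds this size */
} FormData_pg_joinam;
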
It seems useful to think about how we would redefine our existing join
methods using such a structure.  While thinking about that, it seems
like we would worry more about what the operators provide rather than
the specific operators themselves (a la hashing / HashJoin), and I'm
not sure we really care about the data types directly, just about the
operations which we can perform on them.

I can see a case for sticking data types into this if we feel we have
to constrain the path possibilities for some reason, but I'd rather
deal with any issues along the lines of "it doesn't make sense to do X
because we know it'll be really expensive" through the cost model
instead of with a table that defines what is or isn't allowed.  There
may be cases where we get the costing wrong, and it's valuable to be
able to tweak cost values on a per-connection basis or for individual
queries.

I don't mean to imply that a 'pg_joinam' table is a bad idea, just
that I'd think of it as being defined in terms of what capabilities it
requires of operators and a way for its cost to be calculated, plus
the actual functions it provides to implement the join itself
(including some way to produce output suitable for EXPLAIN, etc.).  A
rough sketch of what such an API struct might look like is at the end
of this mail.

> * pg_scanam catalog table for "scan methods" with a scan method API
> Again, a list of operators that can be used with it, like indexes and
> operator classes

Ditto for this, but it makes me wonder about a lot of other things,
because it's essentially trying to define a pluggable storage layer.
That's great, but it also requires some way to deal with all of the
things we use our storage system for: caching / shared buffers,
locking, visibility, WAL, a unique identifier / ctid (for use in
indexes, etc)...

> By analogy to existing mechanisms, we would want
>
> * A USERSET mechanism to allow users to turn it off for testing or
> otherwise, at user, database level

If we re-implement our existing components through this ("eat our own
dogfood", as it were), I'm not sure that we'd be able to have a way to
turn it on/off.  I realize we wouldn't have to, but then it seems like
we'd have two very different code paths and likely a different level
of support / capability afforded to "external" storage systems, and
then I wonder if we're not back to just FDWs again...

> We would also want
>
> * A startup call that allows us to confirm it is available and working
> correctly, possibly with some self-test for hardware, performance
> confirmation/derivation of planning parameters

Yeah, we'd need this for anything that supports a GPU, regardless of
how we implement it, I'd think.

> * Some kind of trace mode that would allow people to confirm the
> outcome of calls

Seems like this would be useful independently of the rest.

> * Some interface to the stats system so we could track the frequency
> of usage of each join/scan type. This would be done within Postgres,
> tracking the calls by name, rather than trusting the plugin to do it
> for us

This is definitely something I want for core already...
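
The sketch referred to above: one way to flesh out the pg_joinam idea
would be a handler function that returns a struct of callbacks, in the
same spirit as an FDW handler returning an FdwRoutine.  Everything
below is hypothetical, not an existing API, and the planner-side types
are left as incomplete structs to keep the sketch header-light:

/*
 * Hypothetical API struct a pg_joinam handler would return.
 */
#include "postgres.h"
#include "nodes/execnodes.h"    /* JoinState, TupleTableSlot, List */
#include "commands/explain.h"   /* ExplainState */

struct PlannerInfo;             /* planner types, declared elsewhere */
struct JoinPath;

typedef struct JoinAmRoutine
{
    NodeTag     type;

    /*
     * Planning: can this method use the given join clauses at all, and
     * if so, what would it cost?  Costing stays in the cost model so it
     * can be tweaked per-connection or per-query rather than being a
     * hard allowed/not-allowed table.
     */
    bool        (*CanHandleJoin) (struct PlannerInfo *root,
                                  List *joinclauses);
    void        (*EstimateJoinCost) (struct PlannerInfo *root,
                                     struct JoinPath *path);

    /* Execution, mirroring the usual begin/exec/end executor shape. */
    void        (*BeginJoin) (JoinState *node, int eflags);
    TupleTableSlot *(*ExecJoin) (JoinState *node);
    void        (*EndJoin) (JoinState *node);

    /* EXPLAIN support: produce output suitable for explain. */
    void        (*ExplainJoin) (JoinState *node, ExplainState *es);
} JoinAmRoutine;
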
Thanks,

Stephen