Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering
On vendredi 24 mars 2017 08:14:03 CEST Ashutosh Bapat wrote:
> On Mon, Mar 20, 2017 at 8:33 PM, Ronan Dunklau <ronan.dunk...@dalibo.com> wrote:
> > On lundi 20 mars 2017 15:52:03 CET Robert Haas wrote:
> > > On Mon, Mar 20, 2017 at 6:31 AM, Ronan Dunklau <ronan.dunk...@dalibo.com> wrote:
> > > > With range partitioning, we guarantee that each partition contains
> > > > non-overlapping values. Since we know the range allowed for each
> > > > partition, it is possible to sort them according to the partition
> > > > key (as is done already for looking up partitions).
> > > >
> > > > Thus, we can generate sorted Append plans instead of MergeAppend
> > > > when sorting occurs on the partition key.
> > >
> > > Great idea. This is too late for v10 at this point, but please add it
> > > to the next CommitFest so we don't forget about it.
> >
> > I know it is too late, and thought that it was too early to add it to
> > the commitfest properly since so many design decisions should be
> > discussed. Thanks for the feedback, I added it.
>
> Thanks for working on this. I was also thinking about the same.
>
> I will try to get back to review/work on this for v11. In the meantime,
> I am working on partition-wise joins [1]. In those patches, I have added
> a structure called PartitionScheme, which represents how a relation is
> partitioned. For base relations it's mostly a copy of PartitionDesc and
> PartitionKey, but then it bubbles up across joins, with each partitioned
> join getting the relevant partitioning scheme. If you could base your
> design such that it uses PartitionScheme, it could be used for joins,
> and probably when Jeevan's patch for partition-wise aggregate [2] comes
> along, it could be used with grouping too.
>
> [1] https://www.postgresql.org/message-id/CAFjFpRcMWwepj-Do1otxQ-GApGPSZ1FmH7YQvQTwzQOGczq_sw%40mail.gmail.com
> [2] http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg308861.html

Thank you. I should probably wait until your patch is finalised before spending too much time on mine, and I could probably also leverage the incremental sort patch if there is progress on it.

-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org
Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering
On lundi 20 mars 2017 15:52:03 CET Robert Haas wrote:
> On Mon, Mar 20, 2017 at 6:31 AM, Ronan Dunklau <ronan.dunk...@dalibo.com> wrote:
> > With range partitioning, we guarantee that each partition contains
> > non-overlapping values. Since we know the range allowed for each
> > partition, it is possible to sort them according to the partition key
> > (as is done already for looking up partitions).
> >
> > Thus, we can generate sorted Append plans instead of MergeAppend when
> > sorting occurs on the partition key.
>
> Great idea. This is too late for v10 at this point, but please add it
> to the next CommitFest so we don't forget about it.

I know it is too late, and thought that it was too early to add it to the commitfest properly since so many design decisions should be discussed. Thanks for the feedback, I added it.

-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org
[HACKERS] [Proposal] Make the optimiser aware of partitions ordering
               ->  Seq Scan on child22

EXPLAIN (COSTS OFF) SELECT * FROM parent ORDER BY c1, c2;
                   QUERY PLAN
-------------------------------------------------
 Append
   Sort Key: parent1.c1, parent1.c2
   ->  Sort
         Sort Key: parent1.c1, parent1.c2
         ->  Append
               ->  Seq Scan on parent1
               ->  Seq Scan on child11
               ->  Seq Scan on child12
   ->  Append
         Sort Key: child21.c1, child21.c2
         ->  Sort
               Sort Key: child21.c1, child21.c2
               ->  Seq Scan on child21
         ->  Sort
               Sort Key: child22.c1, child22.c2
               ->  Seq Scan on child22

= Patch design =

First, we don't really know what we're doing :). But this is a proof of concept, and if there is interest in this patch, let us know how we should have done things and we're willing to rewrite it entirely if needed.

Overview

In the optimiser, we generate PathKeys representing the two sort orders by which a partition can be ordered (ASC and DESC), only for "native" range-partitioned tables, and store them in the RelOptInfo associated with every parent table, along with a list of OIDs corresponding to the partitions.

Then, when the time comes to generate append paths, we add another step which tries to match the query_pathkeys to those of the partitions, generates another Append node with the sorted paths for each partition in the expected order, and adds a pathkey to the Append path itself to indicate it is already sorted.

Known problems with the patch

Once again, keep in mind it's a proof of concept:

- It is in conflict with the existing patch designed to avoid adding the parent partitions to the Append node. As such, it will almost certainly need a full rewrite to adapt to that, since that is handled in this patch in a probably less than ideal fashion.
- As of now, the patch doesn't try to match the partition pathkeys to mergejoinable pathkeys.
- Maybe a new node should be created altogether, instead of porting most of the MergeAppend struct fields to the basic Append?
- The way we store the PathKeys and partition OIDs directly in RelOptInfo is probably wrong.
- The major impact on existing queries is that, to handle sub-partitioning, RangeTblEntries representing intermediate Append nodes are added (but only in the case of native partitioning; regular inheritance is not affected). See the updated prepunion.c for what that means. Those RTEs are then ignored when generating the real Append nodes.
- New regression tests have not been written yet, and existing ones haven't been "fixed" to take into account the different partition ordering: in the case of sub-partitioning, the order is now "depth-first" rather than "breadth-first" like it was earlier.
- No documentation has been added, although we don't really know where it should go.
- The patch lacks a lot of comments.
- The key displayed in the EXPLAIN output comes from the first partition, not the parent.
- Maybe a "real" SortPath should be generated, instead of generating Sort nodes out of nowhere when creating the plan. This has been done to conform to what MergeAppend already did.

= Questions =

Is there interest in such an optimization?

Should we work from the patch proposed to remove parent tables already?

Should there be a new Node for such a "SortedAppend" operation, or is it fine keeping it in the Append node? Should that instead be an optimization of the MergeAppend node?

What is or is not acceptable with regards to storing things in RelOptInfo (among others)?

More generally, any kind of feedback that will help us get on the right track will be appreciated.
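For readers wanting to reproduce the general shape of the problem, here is a minimal sketch of a range-partitioned table using the v10-era declarative partitioning syntax; the names and bounds are illustrative and not taken from the patch:

CREATE TABLE parent (c1 int, c2 int) PARTITION BY RANGE (c1);
CREATE TABLE part1 PARTITION OF parent FOR VALUES FROM (0) TO (100);
CREATE TABLE part2 PARTITION OF parent FOR VALUES FROM (100) TO (200);
-- ordering on the partition key: the candidates are a MergeAppend
-- versus the proposed plain Append of per-partition Sort nodes
EXPLAIN SELECT * FROM parent ORDER BY c1;

Since each partition's c1 range is disjoint and the partitions can be enumerated in bound order, the ORDER BY can be satisfied by appending individually sorted partitions, without the per-row comparisons a MergeAppend performs.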
-- 
Ronan Dunklau & Julien Rouhaud
http://dalibo.com - http://dalibo.org

commit 21e09fb1038b0fb48f32a9013bee64279af8dfd7
Author: Ronan Dunklau <ronan.dunk...@dalibo.com>
Date:   Tue Feb 28 09:00:44 2017 +0100

    Consider sorting order of partitions for append nodes

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c9b55ead3d..e566928f6c 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -81,6 +81,8 @@ static void show_sort_keys(SortState *sortstate, List *ancestors,
 			   ExplainState *es);
 static void show_merge_append_keys(MergeAppendState *mstate,
 			   List *ancestors, ExplainState *es);
+static void show_append_keys(AppendState *mstate, List *ancestors,
+			   ExplainState *es);
 static void show_agg_keys(AggState *astate, List *ancestors,
 			   ExplainState *es);
 static void show_grouping_sets(PlanState *planstate, Agg *agg,
@@ -1561,6 +1563,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			show_sort_keys(castNode(SortState, planstate), ancestors, es);
 			show_sort_info(castNode(SortState, planstate), es);
 			break;
+		case
[HACKERS] Questions about MergeAppend
Hello,

Looking into the MergeAppendPath generation, I'm a bit surprised by several things:

- When generating the MergeAppendPath, we use a dummy path to take the sort cost into account for non-sorted subpaths. This is called with the base relation's tuple count instead of the subpath's estimated number of rows. This tends to overestimate the sorting cost drastically, since the base relation can be filtered, thus reducing the number of input tuples for the sorting routine. Please find attached a trivial patch fixing this.

- Why does it only generate a "fake" SortPath for costing purposes, without adding it to the subpath, postponing the creation of the Sort plan node until later? This also adds a bit of complexity when fixing up the sort cost node later for EXPLAIN output.

- Why do we only consider generating a MergeAppendPath for PathKeys for which a sorted Path exists in any of the child relations? It seems to me there could be an advantage in using a MergeAppend of explicitly sorted relations over sorting an Append, in particular if every existing subpath can be sorted in work_mem.

-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org

diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 324829690d..a99a3aeceb 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1313,7 +1313,7 @@ create_merge_append_path(PlannerInfo *root,
 						  root,
 						  pathkeys,
 						  subpath->total_cost,
-						  subpath->parent->tuples,
+						  subpath->rows,
 						  subpath->pathtarget->width,
 						  0.0,
 						  work_mem,
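To make the first point concrete, here is a minimal sketch (table and data are hypothetical) of a case where the subpath's row estimate is far below the base relation's tuple count; whether the planner actually picks a MergeAppend here depends on the resulting costs:

CREATE TABLE parent (id int, val int);
CREATE TABLE child1 (CHECK (id < 500000)) INHERITS (parent);
INSERT INTO child1 SELECT g, g % 1000 FROM generate_series(1, 499999) g;
ANALYZE child1;
-- the filter keeps roughly 0.1% of the child's rows, yet the dummy
-- sort path used for MergeAppend costing is charged for all ~500k
-- base-relation tuples:
EXPLAIN SELECT * FROM parent WHERE val = 0 ORDER BY id;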
Re: [HACKERS] New design for FK-based join selectivity estimation
On mardi 13 décembre 2016 09:10:47 CET Adrien Nayrat wrote:
> Hi hackers,
>
> The commit 100340e2dcd05d6505082a8fe343fb2ef2fa5b2a introduced an
> estimation error: []
>
> The estimated row count is 10x larger since 100340e2d
>
> Regards,

Hello,

I think I understand what the problem is. In get_foreign_key_join_selectivity, we remove the restrictinfo clauses which match a foreign key. This is done so that the selectivity is not applied twice (once in the function itself, once when processing the restrictinfos).

The problem is, for semi and anti joins, we assume that we have nothing to do (costsize.c:4253):

		else if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
		{
			/*
			 * For JOIN_SEMI and JOIN_ANTI, the selectivity is defined as the
			 * fraction of LHS rows that have matches.  If the referenced
			 * table is on the inner side, that means the selectivity is 1.0
			 * (modulo nulls, which we're ignoring for now).  We already
			 * covered the other case, so no work here.
			 */
		}

This results in assuming that the whole outer rel will match, no matter the selectivity of the inner rel.

If I understand it correctly and the above is right, I think we should ignore SEMI or ANTI joins altogether when considering FKs, and keep the corresponding restrictinfos for later processing, since they are already special-cased later on.

Regards,

-- 
Ronan Dunklau
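A minimal sketch of the kind of case this describes (tables and data are hypothetical, not the original reporter's): the restriction on the inner side should shrink the semi-join estimate, but the FK-based logic assumes every outer row finds a match.

CREATE TABLE pk (id int PRIMARY KEY);
CREATE TABLE fk (id int REFERENCES pk (id), val int);
INSERT INTO pk SELECT g FROM generate_series(1, 10000) g;
INSERT INTO fk SELECT (g % 10000) + 1, g FROM generate_series(1, 100000) g;
ANALYZE pk;
ANALYZE fk;
-- the selective filter on fk.val is ignored by the FK path, so the
-- estimate for the semi-join stays close to the full pk row count:
EXPLAIN SELECT * FROM pk WHERE EXISTS
    (SELECT 1 FROM fk WHERE fk.id = pk.id AND fk.val < 100);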
[HACKERS] Possible regression regarding estimating relation width in FDWs
Hello,

While working on adapting the Multicorn FDW for 9.6, I noticed that there is a regression with regards to estimating the remote relation width. This behavior can be exposed using postgres_fdw with "use_remote_estimate".

Test case:

CREATE EXTENSION postgres_fdw;
CREATE SERVER localhost FOREIGN DATA WRAPPER postgres_fdw;
CREATE USER MAPPING FOR CURRENT_USER SERVER localhost;
CREATE TABLE local_table (c1 text);
INSERT INTO local_table (c1) (SELECT repeat('test', 1));
ANALYZE local_table;
CREATE FOREIGN TABLE foreign_table (c1 text) SERVER localhost
    OPTIONS (table_name 'local_table', use_remote_estimate 'true');
EXPLAIN SELECT * FROM foreign_table;

Output, on current HEAD:

                              QUERY PLAN
-----------------------------------------------------------------------
 Foreign Scan on foreign_table  (cost=100.00..101.03 rows=1 width=32)

On 9.5:

                              QUERY PLAN
-----------------------------------------------------------------------
 Foreign Scan on foreign_table  (cost=100.00..101.03 rows=1 width=472)

While the FDW correctly sets the pathtarget width, it is then overridden at a later point. I'm not sure what happens exactly, but it seems that the relation's path target is ignored completely, in planner.c:1695:

	/*
	 * Convert the query's result tlist into PathTarget format.
	 *
	 * Note: it's desirable to not do this till after query_planner(),
	 * because the target width estimates can use per-Var width numbers
	 * that were obtained within query_planner().
	 */
	final_target = create_pathtarget(root, tlist);

It says explicitly that it will be computed using per-Var width numbers. I think the current_rel->cheapest_total_path->pathtarget should be taken into account, at least in the FDW case.

I'm not sure if the ability to estimate the whole relation width should be deprecated in favor of per-Var widths, or if it should still be supported (after all, the GetForeignRelSize callback is called AFTER a value computed from the individual attr_widths has been set, in set_foreign_size). But in any case, at least postgres_fdw should be updated to support that.

Sorry if that was not clear; I'm at PGCon at the moment, so if anyone wants to discuss this in person I'm available.

-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org
Re: [HACKERS] pg_dump / copy bugs with big lines ?
Le lundi 30 mars 2015 18:45:41 Jim Nasby a écrit :
> On 3/30/15 5:46 AM, Ronan Dunklau wrote:
> > Hello hackers,
> >
> > I've tried my luck on pgsql-bugs before, with no success, so I report
> > these problems here.
> >
> > The documentation mentions the following limits for sizes:
> >
> > Maximum Field Size    1 GB
> > Maximum Row Size      1.6 TB
> >
> > However, it seems like rows bigger than 1GB can't be COPYed out:
> >
> > ro=# create table test_text (c1 text, c2 text);
> > CREATE TABLE
> > ro=# insert into test_text (c1) VALUES (repeat('a', 536870912));
> > INSERT 0 1
> > ro=# update test_text set c2 = c1;
> > UPDATE 1
> >
> > Then, trying to dump or copy that results in the following error:
> >
> > ro=# COPY test_text TO '/tmp/test';
> > ERROR:  out of memory
> > DÉTAIL : Cannot enlarge string buffer containing 536870913 bytes by
> > 536870912 more bytes.
> >
> > In fact, the same thing happens when using a simple SELECT:
> >
> > ro=# select * from test_text ;
> > ERROR:  out of memory
> > DÉTAIL : Cannot enlarge string buffer containing 536870922 bytes by
> > 536870912 more bytes.
> >
> > In the case of COPY, the server uses a StringInfo to output the row.
> > The problem is, a StringInfo is capped to MAX_ALLOC_SIZE (1GB - 1), but
> > a row should be able to hold much more than that.
>
> Yeah, shoving a whole row into one StringInfo is ultimately going to
> limit a row to 1G, which is a far cry from what the docs claim. There's
> also going to be problems with FE/BE communications, because things like
> pq_sendbyte all use StringInfo as a buffer too. So while Postgres can
> store a 1.6TB row, you're going to find a bunch of stuff that doesn't
> work past around 1GB.
>
> > So, is this a bug ? Or is there a caveat I would have missed in the
> > documentation ?
>
> I suppose that really depends on your point of view. The real question
> is whether we think it's worth fixing, or a good idea to change the
> behavior of StringInfo. StringInfo uses ints to store length, so it
> could possibly be changed, but then you'd just error out due to
> MaxAllocSize. Now perhaps those could both be relaxed, but certainly not
> to the extent that you can shove an entire 1.6TB row into an output
> buffer.

Another way to look at it would be to work in small chunks. For the first test case (rows bigger than 1GB), maybe the COPY command could be rewritten to work in chunks, flushing the output more often if needed. For the conversion-related issues, I don't really see any other solution than extending StringInfo to allow for more than 1GB of data. On the other hand, those can easily be circumvented by using COPY ... WITH BINARY.

> The other issue is that there's a LOT of places in code that blindly
> copy detoasted data around, so while we technically support 1GB toasted
> values you're probably going to be quite unhappy with performance. I'm
> actually surprised you haven't already seen this with 500MB objects.
>
> So long story short, I'm not sure how worthwhile it would be to try and
> fix this. We probably should improve the docs though.

I think that having data that can't be output by pg_dump is quite surprising, and if this is not fixable, I agree that it should clearly be documented.

> Have you looked at using large objects for what you're doing? (Note that
> those have their own set of challenges and limitations.)

Yes I have. This particular customer of ours did not mind the performance penalty of using bytea objects as long as it was convenient to use.

> > We also hit a second issue, this time related to bytea encoding.
>
> There's probably several other places this type of thing could be a
> problem. I'm thinking of conversions in particular.
Yes, that's what the two other test cases I mentioned are about: any conversion leading to a size greater than 1GB results in an error, even implicit conversions like doubling backslashes in the output.

-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org
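For reference, the binary-format workaround mentioned in the discussion above round-trips cleanly; it avoids the text-conversion growth, though not the underlying per-field limit (paths are illustrative):

COPY test_bytea TO '/tmp/test_bytea.bin' WITH BINARY;
COPY test_bytea FROM '/tmp/test_bytea.bin' WITH BINARY;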
[HACKERS] pg_dump / copy bugs with big lines ?
Hello hackers,

I've tried my luck on pgsql-bugs before, with no success, so I report these problems here.

The documentation mentions the following limits for sizes:

Maximum Field Size    1 GB
Maximum Row Size      1.6 TB

However, it seems like rows bigger than 1GB can't be COPYed out:

ro=# create table test_text (c1 text, c2 text);
CREATE TABLE
ro=# insert into test_text (c1) VALUES (repeat('a', 536870912));
INSERT 0 1
ro=# update test_text set c2 = c1;
UPDATE 1

Then, trying to dump or copy that results in the following error:

ro=# COPY test_text TO '/tmp/test';
ERROR:  out of memory
DÉTAIL : Cannot enlarge string buffer containing 536870913 bytes by 536870912 more bytes.

In fact, the same thing happens when using a simple SELECT:

ro=# select * from test_text ;
ERROR:  out of memory
DÉTAIL : Cannot enlarge string buffer containing 536870922 bytes by 536870912 more bytes.

In the case of COPY, the server uses a StringInfo to output the row. The problem is, a StringInfo is capped to MAX_ALLOC_SIZE (1GB - 1), but a row should be able to hold much more than that.

So, is this a bug? Or is there a caveat I would have missed in the documentation?

We also hit a second issue, this time related to bytea encoding. This test case is a bit more complicated, since I had to use an external (client) program to insert my data. It involves inserting a string that fits into 1GB when encoded in escape format but is larger than that in hex, and another string which fits in 1GB using the hex format but is larger than that in escape:

from psycopg2 import connect
from io import BytesIO

conn = connect(dbname="ro")
cur = conn.cursor()
fullcontent = BytesIO()

# assumes a table created beforehand, e.g.: CREATE TABLE test_bytea (c1 bytea)

# Write a binary string that weighs less than 1 GB when escape-encoded,
# but more than that if hex-encoded
for i in range(200):
    fullcontent.write(b"aaa" * 100)
fullcontent.seek(0)
cur.copy_from(fullcontent, "test_bytea")

fullcontent.seek(0)
fullcontent.truncate()

# Write another binary string that weighs less than 1 GB when hex-encoded,
# but more than that if escape-encoded
cur.execute("SET bytea_output = 'hex'")
fullcontent.write(b"\\x")
for i in range(300):
    fullcontent.write(b"00" * 100)
fullcontent.seek(0)
cur.copy_from(fullcontent, "test_bytea")

cur.execute("COMMIT;")
cur.close()

I couldn't find an invocation of pg_dump which would allow me to dump both lines:

ro@ronan_laptop /tmp % PGOPTIONS="-c bytea_output=escape" pg_dump -Fc > /dev/null
pg_dump: Dumping the contents of table "test_bytea" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR:  invalid memory alloc request size 120001
pg_dump: The command was: COPY public.test_bytea (c1) TO stdout;

ro@ronan_laptop /tmp % PGOPTIONS="-c bytea_output=hex" pg_dump -Fc > /dev/null
pg_dump: Dumping the contents of table "test_bytea" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR:  invalid memory alloc request size 120003
pg_dump: The command was: COPY public.test_bytea (c1) TO stdout;

Using a COPY with binary format works:

ro=# COPY test_bytea TO '/tmp/test' WITH BINARY;

There seems to be a third issue, with regards to escape encoding: the backslash character is escaped by adding another backslash. This means that a field whose size is less than 1GB using the escape encoding may not be able to be output once the backslashes are escaped.
For example, let's consider a string consisting of 3 '\' characters:

ro=# select length(c1) from test_bytea;
 length
--------
      3
(1 ligne)

ro=# select length(encode(c1, 'escape')) from test_bytea ;
 length
--------
      6
(1 ligne)

ro=# set bytea_output to escape;
SET
ro=# copy test_bytea to '/tmp/test.csv' ;
ERROR:  out of memory
DÉTAIL : Cannot enlarge string buffer containing 1073741822 bytes by 1 more bytes.

I think pg_dump should not error out on any data which was valid upon insertion. It seems the fix would be non-trivial, since StringInfo structures rely on a limit of MaxAllocSize. Or am I missing something?

Thank you.

-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org
Re: [HACKERS] Regarding pg_stat_statements
Le vendredi 13 mars 2015 14:59:28 Sreerama Manoj a écrit :
> Hi,
>
> As we know, pg_stat_statements will monitor the queries after
> normalizing them (removing the values present in the query). I want to
> know if there is a way to store those normalized values, because I want
> to check the type of data (values) and the range of data that is hitting
> the database. I am using Postgres 9.4.

Hello.

You may be interested in the pg_qualstats extension: https://github.com/dalibo/pg_qualstats

The purpose of the extension is to track values like pg_stat_statements does, but at the predicate level rather than the statement level. It stores normalized predicates as well as constants.

The documentation is here: http://powa.readthedocs.org/en/latest/stats_extensions/pg_qualstats.html#pg-qualstats

It won't give you all normalized values though, only those present in predicates.

-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org
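A minimal usage sketch, assuming the extension is built and preloaded; the object names follow its documentation at the time and may have changed since:

-- in postgresql.conf: shared_preload_libraries = 'pg_qualstats'
CREATE EXTENSION pg_qualstats;
-- after running some workload, the sampled predicates and the constants
-- they were evaluated against can be inspected with:
SELECT * FROM pg_qualstats;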
[HACKERS] Dumping database creation options and ACLs
Hello.

As of now, the only way to restore database creation options and ACLs is to run a full pg_dumpall (that is, without the globals-only option). The often-recommended "pg_dumpall -g plus individual dumps of the target databases" doesn't restore those.

Since pg_dump/pg_restore offer the ability to create the database, they should do so with the correct owner, options and database ACLs.

There was some discussion about those issues a while ago (see http://www.postgresql.org/message-id/11646.1272814...@sss.pgh.pa.us for example). As I understand it, the best way to handle that would be to push these modifications into pg_dump, but it is unclear how it should be done with regards to restoring to a different database.

In the meantime, it would be great to add an option to pg_dumpall allowing one to dump this information. We could add the database creation to the output of pg_dumpall -g, and add a specific --createdb-only option (similar to --roles-only and --tablespaces-only).

Would such a patch be welcome?

PS: this email was originally sent to the pgsql-bugs mailing list

-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org
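For illustration, a few examples (with hypothetical names) of the per-database state that the recommended recipe loses:

CREATE DATABASE app OWNER app_owner TEMPLATE template0;
ALTER DATABASE app SET work_mem = '64MB';
REVOKE CONNECT ON DATABASE app FROM PUBLIC;
GRANT CONNECT ON DATABASE app TO app_role;

None of these statements are emitted by "pg_dumpall -g" followed by per-database pg_dump runs.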
Re: [HACKERS] foreign data wrapper option manipulation during Create foreign table time?
Le mercredi 29 octobre 2014 14:16:23 Demai Ni a écrit :
> Robert and Ronan, many thanks for your responses. I realized there is no
> clean way/API for it. Maybe some hacking of the parse tree can do the
> trick.. :-)
>
> I will also take a look at IMPORT FOREIGN SCHEMA. However, for this
> requirement, I still need the user to input a filename or file folder,
> and I'd like to process the file(s) at CREATE FOREIGN TABLE time, and
> save the processed result somewhere like the ftoptions column of
> pg_foreign_table. Maybe there is some other way I can save the processed
> result and make it accessible at query time?

You can pass options to IMPORT FOREIGN SCHEMA. So, you could maybe implement it so that the user can do this:

IMPORT FOREIGN SCHEMA public FROM SERVER file_server INTO local_schema
    OPTIONS (filename '/path/to/the/file');

Your FDW would then issue several CREATE FOREIGN TABLE statements, with all the necessary options.

Or even, to allow importing multiple files at once:

IMPORT FOREIGN SCHEMA public FROM SERVER file_server INTO local_schema
    OPTIONS (directory '/path/to/the/file_dir/');

> Demai
>
> On Wed, Oct 29, 2014 at 10:01 AM, Ronan Dunklau <ronan.dunk...@dalibo.com> wrote:
> > Le mercredi 29 octobre 2014 12:49:12, Robert Haas a écrit :
> > > On Tue, Oct 28, 2014 at 5:26 PM, Demai Ni <nid...@gmail.com> wrote:
> > > > I am looking for a couple of pointers here about FDWs, and how to
> > > > change the option values at CREATE TABLE time. I am using
> > > > postgres-xc-1.2.1 right now.
> > > >
> > > > For example, it contains file_fdw, whose create-table-stmt looks
> > > > like: CREATE FOREIGN TABLE t1() SERVER file_server OPTIONS(format
> > > > 'text', filename 'testing.txt');
> > > >
> > > > I would like to replace 'testing.txt' with an absolute path like
> > > > '/user/testor1/testing.txt', and make sure the new value is saved
> > > > in pg_foreign_table; file_fdw_validator is used to validate the
> > > > options, but is there a way to replace the option value here? And
> > > > get the new value stored in pg_foreign_table? Thanks.
> > > >
> > > > BTW, in my real use case, I am trying to interpret an HDFS file and
> > > > would need to save some hostname/port information in the option
> > > > values, which are not necessarily specified by the user.
> > >
> > > I don't think there's going to be a clean way to do this. The
> > > intention of the system is that the user-provided options are simply
> > > stored, not that the FDW author is going to use the options list to
> > > store their own bits and pieces.
> >
> > I would do that during the IMPORT FOREIGN SCHEMA statement. That way,
> > the user doesn't have to specify those options: they would be generated
> > at IMPORT time, and the user could change them later if really needed.

-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org
Re: [HACKERS] foreign data wrapper option manipulation during Create foreign table time?
Le mercredi 29 octobre 2014 12:49:12 Robert Haas a écrit :
> On Tue, Oct 28, 2014 at 5:26 PM, Demai Ni <nid...@gmail.com> wrote:
> > I am looking for a couple of pointers here about FDWs, and how to
> > change the option values at CREATE TABLE time. I am using
> > postgres-xc-1.2.1 right now.
> >
> > For example, it contains file_fdw, whose create-table-stmt looks like:
> > CREATE FOREIGN TABLE t1() SERVER file_server OPTIONS(format 'text',
> > filename 'testing.txt');
> >
> > I would like to replace 'testing.txt' with an absolute path like
> > '/user/testor1/testing.txt', and make sure the new value is saved in
> > pg_foreign_table; file_fdw_validator is used to validate the options,
> > but is there a way to replace the option value here? And get the new
> > value stored in pg_foreign_table? Thanks.
> >
> > BTW, in my real use case, I am trying to interpret an HDFS file and
> > would need to save some hostname/port information in the option values,
> > which are not necessarily specified by the user.
>
> I don't think there's going to be a clean way to do this. The intention
> of the system is that the user-provided options are simply stored, not
> that the FDW author is going to use the options list to store their own
> bits and pieces.

I would do that during the IMPORT FOREIGN SCHEMA statement. That way, the user doesn't have to specify those options: they would be generated at IMPORT time, and the user could change them later if really needed.

-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org
Re: [HACKERS] Hide 'Execution time' in EXPLAIN (COSTS OFF)
Le dimanche 12 octobre 2014 13:17:00 Andres Freund a écrit :
> On 2014-10-12 23:13:27 +1300, David Rowley wrote:
> > On Wed, Sep 24, 2014 at 8:02 AM, Christoph Berg <c...@df7cb.de> wrote:
> > > Re: Tom Lane 2014-09-23 <15155.1411493...@sss.pgh.pa.us>
> > > > Robert Haas <robertmh...@gmail.com> writes:
> > > > > On Sat, Sep 20, 2014 at 4:13 PM, Christoph Berg <c...@df7cb.de> wrote:
> > > > > > Can we have EXPLAIN (timing off) in 9.4+ hide the "Planning
> > > > > > time" line? That would even be backwards compatible with 9.x
> > > > > > where it would be a no-op.
> > > > >
> > > > > I don't think that'll work because:
> > > > >
> > > > > /* check that timing is used with EXPLAIN ANALYZE */
> > > > > if (es.timing && !es.analyze)
> > > > >     ereport(ERROR,
> > > > >             (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> > > > >              errmsg("EXPLAIN option TIMING requires ANALYZE")));
> > > >
> > > > It looks to me like that would complain about EXPLAIN (TIMING ON),
> > > > not the case Christoph is suggesting. What he proposes seems a bit
> > > > odd and non-orthogonal, but we could make the code do it if we
> > > > wanted.
> > >
> > > I don't think this warrants a new flag, and TIMING OFF seems to be
> > > the right naming for it. (In fact it was the first thing I tried, and
> > > I was cursing quite a bit over the lack of configurability until I
> > > realized that COSTS OFF disabled the planning time display as well.)
> > >
> > > It might be a bit odd, but it's easy to remember.
> >
> > I'm pretty interested in seeing something change around here. The patch
> > I'm working on at the moment (INNER JOIN removals) implements skipping
> > of joins at execution time rather than planning time. Currently I'm
> > working on the regression tests for this and it's not all that easy due
> > to the execution time appearing in the results.
> >
> > An EXPLAIN ANALYZE output from master with the patch can look something
> > like:
> >
> > explain (analyze, costs off, timing off)
> > select a.* from a inner join b on a.b_id = b.id inner join c on b.c_id = c.id;
> >                    QUERY PLAN
> > ---------------------------------------------------
> >  Hash Join (actual rows=1 loops=1)
> >    Hash Cond: (b.c_id = c.id)
> >    ->  Hash Join (actual rows=1 loops=1)
> >          Hash Cond: (a.b_id = b.id)
> >          ->  Seq Scan on a (actual rows=1 loops=1)
> >          ->  Hash (never executed)
> >                ->  Seq Scan on b (never executed)
> >    ->  Hash (never executed)
> >          ->  Seq Scan on c (never executed)
> >  Execution time: 0.092 ms
> > (10 rows)
>
> So you're now the third person reporting problems here. Let's remove
> 'execution time' for COSTS off. I personally would even say that we
> should backpatch that to make backpatches involving regression tests
> less painful.

That wouldn't solve the first problem mentioned, which is that for some regression tests one may want to test the costs themselves, which is now impossible with the new planning time feature.

What would IMO make both cases workable would be to eliminate ALL timing from TIMING OFF, not only the timing on the individual nodes. As was mentioned before, it is a bit counter-intuitive to have COSTS OFF disable the planning time, and not TIMING OFF.

> Greetings,
>
> Andres Freund

-- 
Ronan Dunklau
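A sketch of the behaviour being argued for here, assuming TIMING OFF were extended as proposed (this is the desired output, not what the server produced at the time):

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT a.* FROM a INNER JOIN b ON a.b_id = b.id;
-- desired: the plan tree with per-node "actual rows" counts only;
-- no per-node timings and no "Planning time" / "Execution time"
-- summary lines, keeping the output stable for regression tests.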
Re: [HACKERS] IMPORT FOREIGN SCHEMA statement
Le mercredi 9 juillet 2014 21:30:05 Tom Lane a écrit :
> Michael Paquier <michael.paqu...@gmail.com> writes:
> > I guess that this implementation is enough as a first shot,
> > particularly regarding the complexity that default expression import
> > would bring up for postgres_fdw (point raised during the review, but
> > not really discussed afterwards).
>
> Yeah. I'm actually more concerned about the lack of collation import,
> but that's unfixable unless we can figure out how to identify collations
> in a platform-independent way.
>
> 			regards, tom lane

Thank you for working on this patch.

I'm not really fond of building strings in an FDW for the core to parse them again, but I don't see any other solution to the problem you mentioned.

Similarly to what we do for the schema, we should also check that the server in the parsed statement is the one we are importing from.

-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org
Re: [HACKERS] IMPORT FOREIGN SCHEMA statement
Le lundi 7 juillet 2014 07:58:33 Robert Haas a écrit :
> On Mon, May 26, 2014 at 12:17 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Albe Laurenz <laurenz.a...@wien.gv.at> writes:
> > > In addition to data type mapping questions (which David already
> > > raised) I have one problem when I think of the Oracle FDW:
> > >
> > > Oracle follows the SQL standard in folding table names to upper case.
> > > So this would normally result in a lot of PostgreSQL foreign tables
> > > with upper case names, which would be odd and unpleasant.
> > >
> > > I cannot see a way out of that, but I thought I should mention it.
> >
> > It seems like it would often be desirable for the Oracle FDW to smash
> > all-upper-case names to all-lower-case while importing, so that no
> > quoting is needed on either side. I doubt though that this is so
> > desirable that it should happen unconditionally.
> >
> > Between this and the type-mapping questions, it seems likely that we're
> > going to need a way for IMPORT FOREIGN SCHEMA to accept user-supplied
> > control options, which would in general be specific to the FDW being
> > used. (Another thing the SQL committee failed to think about.)
>
> Is this part of the SQL standard? What is it defined to do about
> non-table objects?

The OPTIONS clause is not part of the SQL standard. Regarding non-table objects, the standard only talks about tables, and nothing else.

-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org
Re: [HACKERS] IMPORT FOREIGN SCHEMA statement
Le mardi 1 juillet 2014 06:59:49 Albe Laurenz a écrit :
> Michael Paquier wrote:
> > After sleeping on it, I have put my hands on the postgres_fdw portion
> > and came up with a largely simplified flow, resulting in the patch
> > attached.
> > [...]
> > Ronan, what do you think of those patches?

I have nothing more to add, and I think that they should be looked at by a committer. Particularly the FDW API, which is perhaps not the best fit, but let's see some extra opinions about that.

The remote_schema parameter can be used for SQL injection. Either we should go back to using parameters, or be extra careful. Since the remote schema is parsed as a name, it is limited to 64 characters, which is not that useful for an SQL injection, but still.

The new query as you wrote it returns the typname (it was cast to regtype before). This is not schema-qualified, and will fail when importing tables with columns of a type not in search_path.

The regression tests don't pass: a user name is hard-coded in the result of DROP USER MAPPING. Should we expect the tests to be run as postgres?

I looked at the API and its documentation, and while I saw no problem with the API, I think that the documentation still needs some attention: it mentions a local_schema, which doesn't exist (any more?). It should be mentioned that the CreateForeignTableStmt's base.relation->schemaname should be set to NULL. Also, it would be nice to find a few words for options, maybe explaining a potential application.

> Yours,
> Laurenz Albe

-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org
Re: [HACKERS] Array of composite types returned from python
Le dimanche 29 juin 2014 16:54:03 Tom Lane a écrit :
> Abhijit Menon-Sen <a...@2ndquadrant.com> writes:
> > > 3) This is such a simple change with no new infrastructure code
> > > (PLyObject_ToComposite already exists). Can you think of a reason why
> > > this wasn't done until now? Was it a simple miss or purposefully
> > > excluded?
> >
> > This is not an authoritative answer: I think the infrastructure was
> > originally missing, but was later added in #bc411f25 for OUT
> > parameters. Perhaps it was overlooked at the time that the same code
> > would suffice for this earlier-missing case. (I've Cc:ed Peter E. in
> > case he has any comments.)
> >
> > I think the patch is ready for committer.

Sorry for being this late. I've tested the patch; everything seems to work as expected, including complex nesting of composite and array types. No documentation changes are needed, since the limitation wasn't even mentioned before. Regression tests are OK, and the patch seems simple enough. Formatting looks OK too.

> I took a quick look at this; not really a review either, but I have a
> couple comments.
>
> 1. While I think the patch does what it intends to, it's a bit
> distressing that it will invoke the information lookups in
> PLyObject_ToComposite over again for *each element* of the array. We
> probably ought to quantify that overhead to see if it's bad enough that
> we need to do something about improving caching, as speculated in the
> comment in PLyObject_ToComposite.

I don't know how to do that without implementing the cache itself.

> 2. I wonder whether the no-composites restriction in plpy.prepare (see
> lines 133ff in plpy_spi.c) could be removed as easily.

Hum, I tried that, but it's not that easy: lifting the restriction results in a SEGFAULT when trying to pfree the parameters given to SPI_ExecutePlan (line 320 in plpy_spi.c).

Correct me if I'm wrong, but I think the problem is that HeapTupleGetDatum returns the t_data field, whereas the heap_form_tuple allocation returns the address of the HeapTuple itself. Then, the datum itself has not been palloc'ed. Changing the HeapTupleGetDatum call to heap_copy_tuple_as_datum fixes this issue, but I'm not sure this is the best way to do it. The attached patch implements this.

> 			regards, tom lane

-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org

diff --git a/src/pl/plpython/expected/plpython_composite.out b/src/pl/plpython/expected/plpython_composite.out
index 61b3046..e7a7edb 100644
--- a/src/pl/plpython/expected/plpython_composite.out
+++ b/src/pl/plpython/expected/plpython_composite.out
@@ -270,9 +270,13 @@
 yield {'tab': [['first', 1], ['second', 2]],
        'typ': [{'first': 'third', 'second': 3},
                {'first': 'fourth', 'second': 4}]}
 $$ LANGUAGE plpythonu;
 SELECT * FROM composite_types_table();
-ERROR:  PL/Python functions cannot return type table_record[]
-DETAIL:  PL/Python does not support conversion to arrays of row types.
-CONTEXT:  PL/Python function "composite_types_table"
+            tab             |            typ
+----------------------------+----------------------------
+ {"(first,1)","(second,2)"} | {"(third,3)","(fourth,4)"}
+ {"(first,1)","(second,2)"} | {"(third,3)","(fourth,4)"}
+ {"(first,1)","(second,2)"} | {"(third,3)","(fourth,4)"}
+(3 rows)
+
 -- check what happens if the output record descriptor changes
 CREATE FUNCTION return_record(t text) RETURNS record AS $$
 return {'t': t, 'val': 10}
diff --git a/src/pl/plpython/expected/plpython_spi.out b/src/pl/plpython/expected/plpython_spi.out
index 5175794..de547d2 100644
--- a/src/pl/plpython/expected/plpython_spi.out
+++ b/src/pl/plpython/expected/plpython_spi.out
@@ -376,6 +376,15 @@
 plan = plpy.prepare("select fname, lname from users where fname like $1 || '%'", ["text"])
 c = plpy.cursor(plan, ["a", "b"])
 $$ LANGUAGE plpythonu;
+CREATE TYPE test_composite_type AS (
+  a1 int,
+  a2 varchar
+);
+CREATE OR REPLACE FUNCTION plan_composite_args() RETURNS test_composite_type AS $$
+plan = plpy.prepare("select $1::test_composite_type as c1", ["test_composite_type"])
+res = plpy.execute(plan, [{"a1": 3, "a2": "label"}])
+return res[0]["c1"]
+$$ LANGUAGE plpythonu;
 SELECT simple_cursor_test();
  simple_cursor_test
@@ -432,3 +441,9 @@
 CONTEXT:  Traceback (most recent call last):
   PL/Python function "cursor_plan_wrong_args", line 4, in <module>
     c = plpy.cursor(plan, ["a", "b"])
 PL/Python function "cursor_plan_wrong_args"
+SELECT plan_composite_args();
+ plan_composite_args
+---------------------
+ (3,label)
+(1 row)
+
diff --git a/src/pl/plpython/expected/plpython_types.out b/src/pl/plpython/expected/plpython_types.out
index b98318c..9576905 100644
--- a/src/pl/plpython/expected/plpython_types.out
+++ b/src/pl/plpython/expected/plpython_types.out
@@ -632,13 +632,12 @@
 PL/Python function "test_type_conversion_array_mixed2"
 CREATE FUNCTION test_type_conversion_array_record() RETURNS type_record[] AS $$
 return [None]
 $$ LANGUAGE plpythonu;
-ERROR:  PL/Python functions cannot return type type_record[]
-DETAIL:  PL/Python does
Re: [HACKERS] IMPORT FOREIGN SCHEMA statement
Le lundi 30 juin 2014 16:43:38 Michael Paquier a écrit :
> On Thu, Jun 19, 2014 at 11:00 PM, Michael Paquier <michael.paqu...@gmail.com> wrote:
> > > This seems to be related to re-using the table name between
> > > invocations. The attached patch should fix point 2. As for point 1, I
> > > don't know the cause for it. Do you have a reproducible test case?
> >
> > Sure. I'll try harder when looking in more detail at the patch for
> > postgres_fdw :) With v2, I have tried randomly some of the scenarios of
> > v1 plus some extras, but was not able to reproduce it.
>
> I'll look into the patch for postgres_fdw later. And here are some
> comments about it, when applied on top of the other 2 patches:
>
> 1) Code compiles without warning, regression tests pass.
> 2) In fetch_remote_tables, the palloc for the parameters should be done
> after the number of parameters is determined. In the case of IMPORT_ALL,
> some memory is wasted for nothing.
> 3) The current code is not able to import default expressions for a
> table. A little bit of processing with pg_get_expr would be necessary:
> select pg_catalog.pg_get_expr(adbin, adrelid) from pg_attrdef;
> There are of course bonus cases like SERIAL columns coming immediately
> to my mind, but it would be possible to determine if a given column is
> serial with pg_get_serial_sequence. This would be a good addition for
> the FDW IMO.
> 4) The same applies of course to the constraint name:
> CREATE FOREIGN TABLE foobar (a int CONSTRAINT toto NOT NULL), for example.
> 5) A bonus idea: enable default-expression retrieval and the null/not
> null switch by default, but add options to disable their respective
> retrieval.
> 6) Defining PGFDW_IMPORTRESULT_NUMCOLS once at the top of postgres_fdw.c
> without undefining it would be perfectly fine.
> 7) In postgresImportForeignSchema, the palloc calls and the definitions
> of the variables used to save the results should be done within the for
> loop.
> 8) At quick glance, the logic of postgresImportForeignSchema looks
> awkward... I'll have a second look with a fresher mind later on this one.
>
> While having a second look at the core patch, I have found myself
> re-hacking it, fixing a couple of bugs and adding things that had been
> missing in the former implementation:
> - Deletion of unnecessary structures to simplify the code and make it
> cleaner
> - Fixed a bug related to the management of the local schema name. An FDW
> was free to set the schema name where local tables are created; this
> should not be the case.
> - Improved documentation, examples and other things, fixed doc padding
> for example
> - Added some missing stuff in equalfuncs.c and copyfuncs.c
> - Some other things.
> With that, the core patch looks pretty nice actually, and I think that
> we should let a committer have a look at this part at least for this CF.
>
> Also, the postgres_fdw portion has been updated based on the previous
> core patch as modified, using a version that Ronan sent me, which has
> addressed the remarks I sent before. This patch is still lacking
> documentation and some cleanup, and regression tests are broken, but it
> can be used to test the core feature. I unfortunately don't have much
> time today but I am sending this patch either way, to let people play
> with IMPORT SCHEMA if I don't come back to it before.

The regression tests fail because of a typo in pg_type.h: BOOLARRAYOID should be defined to 1000, not 1003 (which clashes with NAMEARRAYOID).

What do you think should be documented, and where?
Regards,

-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org

From 1cc2922e9087f2d852d2b1196314d7e35dce42a3 Mon Sep 17 00:00:00 2001
From: Michael Paquier <mich...@otacoo.com>
Date: Mon, 30 Jun 2014 16:36:47 +0900
Subject: [PATCH 2/2] Add support of IMPORT SCHEMA for postgres_fdw

---
 contrib/postgres_fdw/expected/postgres_fdw.out |  62 ++++++
 contrib/postgres_fdw/postgres_fdw.c            | 259 +++++++++++++++++++
 contrib/postgres_fdw/sql/postgres_fdw.sql      |  30 +++
 src/include/catalog/pg_type.h                  |   2 +
 4 files changed, 353 insertions(+)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2e49ee3..07e6c11 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2834,3 +2834,65 @@
 NOTICE:  NEW: (13,"test triggered !")
  (0,27)
 (1 row)
 
+-- Test IMPORT FOREIGN SCHEMA statement
+CREATE schema import_destination;
+CREATE schema import_source;
+CREATE TABLE import_source.t1 (c1 int, c2 varchar NOT NULL);
+CREATE TABLE import_source.t2 (c1 int, c2 varchar);
+CREATE TABLE import_source.t3 (c1 int, c2 varchar);
+IMPORT FOREIGN SCHEMA import_source FROM SERVER loopback INTO import_destination;
+\det+ import_destination;
+                     List of foreign tables
+ Schema | Table | Server | FDW Options | Description
Re: [HACKERS] IMPORT FOREIGN SCHEMA statement
Le lundi 16 juin 2014 11:32:38 Atri Sharma a écrit :
> On Mon, Jun 16, 2014 at 11:28 AM, Michael Paquier <michael.paqu...@gmail.com> wrote:
> > Just wondering: what about the case where the same data type is defined
> > on both local and remote, but with *different* definitions? Is it the
> > responsibility of the FDW to check for type incompatibilities?
>
> Logically, it should be.

This is akin to what Stephen proposed, to allow IMPORT FOREIGN SCHEMA to also import types. The problem with checking if the type is the same is deciding where to stop. For composite types, sure, it should be easy. But what about built-in types? Or types provided by an extension / a native library? These could theoretically change from one release to another.

> Just wondering, can't we extend the above proposed function
>
> typedef List *(*ImportForeignSchema_function) (ForeignServer *server,
>                ImportForeignSchemaStmt *parsetree);
>
> a bit to give exact type definitions from the remote side as well?

I toyed with this idea, but the more I think about it the less I'm sure what the API should look like, should we ever decide to go beyond the standard and import more than tables. Should the proposed function's return value be changed to void, letting the FDW execute any DDL statement? The advantage of returning a list of statements was to make it clear that tables should be imported, and letting the core enforce the INTO local_schema part of the clause.

I would prefer if the API were limited by design to importing tables. This limitation can always be bypassed by executing arbitrary statements before returning the list of ImportForeignSchemaStmt*.

For the postgres_fdw specific case, we could add two IMPORT options (since it looked like we had a consensus on this):

- import_types
- check_types

import_types would import simple and composite types, issuing the corresponding statements before returning to core. check_types would compare the local and remote definitions of composite types. For types installed by an extension, it would check that the local type has also been created by an extension of the same name, installed in the same schema, raising a warning if the local and remote versions differ. For built-in types, a warning would be raised if the local and remote versions of PostgreSQL differ.

However, I don't know what we should be doing about types located in a different schema. For example, if the remote table s1.t1 has a column of composite type s2.typ1, should we import typ1 in s1? In s2, optionally creating the non-existing schema? Raise an error?

Regards,

-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org
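To make the proposal concrete, a sketch of what such an invocation could look like; the option names above are, at this point, purely hypothetical:

IMPORT FOREIGN SCHEMA remote_schema FROM SERVER pg_server INTO local_schema
    OPTIONS (import_types 'as_needed', check_types 'true');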
Re: [HACKERS] IMPORT FOREIGN SCHEMA statement
> > Additionally, I don't really see how that would be useful in a general
> > case. With an in-core defined meaning of type transformation, any FDW
> > that doesn't fit exactly into the model would have a hard time. For
> > example, what happens if an FDW is only ever capable of returning text?
>
> That's actually the case where it's most important to have the feature
> all the way down to the column level.

I'm not sure configuration for specific columns of specific tables would be that useful: if you already know the structure of the table, you can either create it yourself, or run an ALTER COLUMN statement just after creating it. Alternatively, with the arbitrary options clause proposed by Tom and detailed below, one could use the LIMIT TO / EXCEPT clauses to fine-tune which options should apply.

> That's why I suggested doing it via CREATE/ALTER with JSONB or similar
> containing the details rather than inventing new SQL grammar, an option
> I included only to dismiss it.

Hum, adding a simple TYPE MAPPING is already inventing new SQL grammar, but it's less invasive.

Tom Lane wrote:
> Between this and the type-mapping questions, it seems likely that we're
> going to need a way for IMPORT FOREIGN SCHEMA to accept user-supplied
> control options, which would in general be specific to the FDW being
> used. (Another thing the SQL committee failed to think about.)

So, without extending the syntax too much, we could add options:

IMPORT FOREIGN SCHEMA remote_schema FROM SERVER server_name INTO local_schema
    [ { LIMIT TO | EXCEPT } (table_name [, ...]) ]
    [ OPTIONS (option_list) ]

This option list could then contain FDW-specific options, including type mapping. Or we could add a specific clause:

IMPORT FOREIGN SCHEMA remote_schema FROM SERVER server_name INTO local_schema
    [ { LIMIT TO | EXCEPT } (table_name [, ...]) ]
    [ OPTIONS (option_list) ]
    [ TYPE MAPPING some_mapping_definition ]

with the mapping definition being either a tuple, or a well-defined JSONB format common across FDWs.

A third solution, which I don't like but which doesn't modify the SQL grammar, would be to encode the options in the remote_schema name.

Would one of those solutions be acceptable?

-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org
Re: [HACKERS] IMPORT FOREIGN SCHEMA statement
Le mardi 27 mai 2014 09:52:27 Stephen Frost a écrit :
> * David Fetter (da...@fetter.org) wrote:
> > In the easy case of PostgreSQL, you might also be able to establish
> > whether the UDT in the "already defined locally" case above is
> > identical to the one defined remotely, but I don't think it's possible
> > even in principle for non-PostgreSQL remote systems, possibly not even
> > for ones with non-identical architecture, PostgreSQL major version, etc.
>
> For my 2c, it'd be very handy if IMPORT had an option or similar to also
> copy over all (in the schema) or at least any depended-upon UDTs. It'd
> really be nice if we had an ability to check the remote UDT vs. the
> local one at some point also, since otherwise you're going to get bitten
> at run-time when querying the foreign table.

The specification only talks about importing tables, not types, views or (why not?) foreign tables. If we want to extend the spec further, we could accept more than CreateForeignTableStmt as return values from the API:

- CreateDomainStmt
- CompositeTypeStmt
- AlterTableCmd

The core code would be responsible for executing them, and for making sure the destination schema is correctly positioned for all of those.

The postgres_fdw behaviour could then be controlled with options such as include_types (a tri-valued enum accepting none, all, as_needed) and relation_kinds (defaulting to 'r', able to support multiple kinds with a string such as 'rfv' for tables, foreign tables and views).

I think we're drifting further away from the standard with that kind of stuff, but I'd be happy to implement it if that's the path we choose.

-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org
Re: [HACKERS] IMPORT FOREIGN SCHEMA statement
Le dimanche 25 mai 2014 12:41:18 David Fetter a écrit :
> On Fri, May 23, 2014 at 10:08:06PM +0200, Ronan Dunklau wrote:
> > Hello,
> >
> > Since my last proposal didn't get any strong rebuttal, please find
> > attached a more complete version of the IMPORT FOREIGN SCHEMA statement.
>
> Thanks! Please send future patches to this thread so people can track
> them in their mail.

I will. I didn't for the previous one because it was a few months ago, and no patch had been added to the commitfest.

> > I tried to follow the SQL-MED specification as closely as possible.
> > This adds discoverability to foreign servers. The structure of the
> > statement as I understand it is simple enough:
> >
> > IMPORT FOREIGN SCHEMA remote_schema FROM SERVER some_server
> >     [ (LIMIT TO | EXCEPT) table_list ] INTO local_schema.
> >
> > The import_foreign_schema patch adds the infrastructure, and a new FDW
> > routine:
> >
> > typedef List *(*ImportForeignSchema_function) (ForeignServer *server,
> >                ImportForeignSchemaStmt *parsetree);
> >
> > This routine must return a list of CreateForeignTableStmt mirroring
> > whatever tables were found on the remote side, which will then be
> > executed.
> >
> > The import_foreign_schema_postgres_fdw patch proposes an implementation
> > of this API for postgres_fdw. It will import a foreign schema using the
> > right types as well as nullable information.
>
> In the case of PostgreSQL, the right types are obvious until there's a
> user-defined one. What do you plan to do in that case?

The current implementation fetches the types as regtype, and when receiving a custom type, two things can happen:

- the type is defined locally: everything will work as expected
- the type is not defined locally: the conversion function will fail, and raise an error of the form: ERROR: type "schema.typname" does not exist

Should I add that to the regression test suite?

> > Regarding documentation, I don't really know where it should have been
> > put. If I missed something, let me know and I'll try to correct it.
>
> It's not exactly something you missed, but I need to bring it up anyway
> before we go too far. The standard missed two crucial concepts when this
> part of it was written:
>
> 1. No single per-database-type universal type mapping can be correct.
>
> People will have differing goals for type mapping, and writing a whole
> new FDW for each of those goals is, to put it mildly, wasteful. I will
> illustrate with a concrete and common example. MySQL's datetime type
> encourages usages which PostgreSQL's corresponding type, timestamptz,
> simply disallows, namely '0000-00-00 00:00:00' as its idea of UNKNOWN or
> NULL.
>
> One way PostgreSQL's mapping could work is to map it to TEXT, which
> would preserve the strings exactly and be in some sense an identity map.
> It would also make the type somewhat useless in its original intended
> form. Another one would map the type to a composite data type
> mysql_datetime(tstz timestamptz, is_wacky boolean) which would capture,
> for example, ('2014-04-01 00:00:00+00', false) for the UTC start of
> April Fools' Day this year, and (NULL, true) for '0000-00-00 00:00:00'.
> There are doubtless others, and there is no principled way to assign any
> one of them as universally correct.
>
> This brings me to the next crucial concept the standard missed:
>
> 2. The correct mapping may not be the identity, and furthermore, the
> inbound and outbound mappings might in general not be mere inversions of
> each other.
>
> MySQL (no aspersions intended) again provides a concrete example with
> its unsigned integer types.
> Yes, it's possible to create a domain over INT8 which simulates UINT4, a
> domain over NUMERIC which simulates UINT8, etc., but again this
> process's correctness depends on circumstances.
>
> To address these problems, I propose the following:
>
> - We make type mappings settable at the level of:
>   - FDW
>   - Instance (a.k.a. cluster)
>   - Database
>   - Schema
>   - Table
>   - Column
>
> using the existing ALTER command and some way of spelling out how a
> remote type maps to a local type. This would consist of:
>
> - The remote type
> - The local type to which it maps
> - The inbound transformation (default identity)
> - The outbound transformation (default identity)
>
> At any given level, the remote type would need to be unique. To
> communicate this to the system, we either invent new syntax, with all
> the hazards attendant thereto, or we could use JSON or similar
> serialization.
>
> ALTER FOREIGN TABLE foo ADD TYPE MAPPING FROM datetime TO TEXT WITH (
>     INBOUND TRANSFORMATION IDENTITY,
>     OUTBOUND TRANSFORMATION IDENTITY
> ) /* Ugh!!! */
>
> vs.
>
> ALTER FOREIGN TABLE foo ADD (mapping '{"datetime": "text", "inbound": "IDENTITY", "outbound": "IDENTITY"}')
>
> Each FDW would have some set of default mappings and some way to
> override them, as above
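As a concrete illustration of the domain simulation mentioned at the top of this message (a sketch; the domain names are made up, and the stated caveats about correctness stand):

CREATE DOMAIN uint4_sim AS int8
    CHECK (VALUE >= 0 AND VALUE <= 4294967295);
CREATE DOMAIN uint8_sim AS numeric(20,0)
    CHECK (VALUE >= 0 AND VALUE <= 18446744073709551615);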
Re: [HACKERS] Triggers on foreign tables
Le dimanche 23 mars 2014 02:44:26 Noah Misch a écrit :
> On Tue, Mar 18, 2014 at 09:31:06AM +0100, Ronan Dunklau wrote:
> > Le mardi 18 mars 2014 03:54:19 Kouhei Kaigai a écrit :
> > > > (1) To acquire the old tuple for UPDATE/DELETE operations, the
> > > > patch closely parallels our handling for INSTEAD OF triggers on
> > > > views. It adds a wholerow resjunk attribute, from which it
> > > > constructs a HeapTuple before calling a trigger function. This
> > > > loses the system columns, an irrelevant consideration for views but
> > > > meaningful for foreign tables. postgres_fdw maintains the ctid
> > > > system column (t_self), but triggers will always see an invalid
> > > > t_self for the old tuple. One way to fix this is to pass around the
> > > > old tuple data as ROW(ctid, oid, xmin, cmin, xmax, cmax, tableoid,
> > > > wholerow). That's fairly close to sufficient, but it doesn't cover
> > > > t_ctid. Frankly, I would like to find a principled excuse to not
> > > > worry about making foreign table system columns accessible from
> > > > triggers. Supporting system columns dramatically affects the
> > > > mechanism, and what trigger is likely to care? Unfortunately, that
> > > > argument seems too weak. Does anyone have a cleaner idea for
> > > > keeping track of the system column values or a stronger argument
> > > > for not bothering?
> > >
> > > Regarding the first suggestion, I think it is better not to care
> > > about system columns on foreign tables, because it fully depends on
> > > the driver's implementation whether the FDW fetches ctid from its
> > > data source or not. Usually, system columns of a foreign table,
> > > except for tableoid, are nonsense. For implementation reasons,
> > > postgres_fdw fetches the ctid of remote tables on UPDATE / DELETE,
> > > but it is not a common nature for all FDW drivers. For example, we
> > > can assume an implementation that uses the primary key of the remote
> > > table to identify the record to be updated or deleted. In this case,
> > > the local ctid does not have a meaningful value. So, fundamentally,
> > > we cannot guarantee an FDW driver returns a meaningful ctid or other
> > > system columns.
> >
> > I agree, I think it is somewhat clunky to have postgres_fdw use a
> > feature that is basically meaningless for other FDWs. Looking at some
> > threads in this list, it confused many people.
>
> My own reasoning for accepting omission of system columns is more along
> the lines of Robert's argument. Regardless, three folks voting to do so
> and none against suffices for me. I documented the system columns
> limitation, made the returningList change I mentioned, and committed
> the patch.

Thank you, I'm glad the patch found its way to the repository!

> > This is off-topic, but maybe we could devise an API allowing for local
> > system attributes on foreign tables. This would allow FDWs to carry
> > attributes that weren't declared as part of the table definition. This
> > could then be used for postgres_fdw's ctid, as well as other foreign
> > data wrappers' equivalents of an implicit tuple id.
>
> We could, but I discourage it. System columns are a legacy feature; I
> doubt we'd choose that design afresh today. On the off-chance that you
> need the value of a remote system column, you can already declare an
> ordinary foreign table column for it. I raised the issue because it's
> inconsistent for RETURNING to convey system columns while
> tg_trigtuple/tg_newtuple do not, not because acquiring system columns
> from foreign tables is notably useful.

The idea here was to allow an FDW author to add columns which are not part of the table definition, for example columns which are required to identify the tuple remotely.
Without system columns, a postgres_fdw user would have to declare the ctid column on every table for a tuple to be identifiable. The proposal would allow postgres_fdw to automatically inject a hidden (system?) column on every table for this ctid.

-- Ronan Dunklau http://dalibo.com - http://dalibo.org
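For illustration — my example, not part of the original exchange — the "declare an ordinary foreign table column" workaround mentioned above can be spelled with postgres_fdw's per-column column_name option; the per-table boilerplate it requires is exactly what the proposal would remove. This assumes an existing postgres_fdw server named remote_server and a remote table named items:

    CREATE FOREIGN TABLE items (
        remote_ctid tid OPTIONS (column_name 'ctid'),  -- maps to the remote ctid
        id integer,
        payload text
    ) SERVER remote_server OPTIONS (table_name 'items');

    SELECT remote_ctid, id FROM items;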
Re: [HACKERS] Triggers on foreign tables
On Tuesday 18 March 2014 03:54:19, Kouhei Kaigai wrote:

I hacked on this for awhile, but there remain two matters on which I'm uncertain about the right way forward.

(1) To acquire the old tuple for UPDATE/DELETE operations, the patch closely parallels our handling for INSTEAD OF triggers on views. It adds a wholerow resjunk attribute, from which it constructs a HeapTuple before calling a trigger function. This loses the system columns, an irrelevant consideration for views but meaningful for foreign tables. postgres_fdw maintains the ctid system column (t_self), but triggers will always see an invalid t_self for the old tuple. One way to fix this is to pass around the old tuple data as ROW(ctid, oid, xmin, cmin, xmax, cmax, tableoid, wholerow). That's fairly close to sufficient, but it doesn't cover t_ctid. Frankly, I would like to find a principled excuse to not worry about making foreign table system columns accessible from triggers. Supporting system columns dramatically affects the mechanism, and what trigger is likely to care? Unfortunately, that argument seems too weak. Does anyone have a cleaner idea for keeping track of the system column values or a stronger argument for not bothering?

Regarding the first suggestion, I think it is better not to care about system columns on foreign tables, because it fully depends on the driver's implementation whether the FDW fetches ctid from its data source or not. Usually, system columns of a foreign table, except for tableoid, are nonsense. postgres_fdw fetches the ctid of remote tables on UPDATE / DELETE for implementation reasons, but this is not common to all FDW drivers. For example, we can assume an implementation that uses the primary key of the remote table to identify the record to be updated or deleted. In this case, the local ctid does not have a meaningful value. So, fundamentally, we cannot guarantee that an FDW driver returns a meaningful ctid or other system columns.

I agree, I think it is somewhat clunky to have postgres_fdw use a feature that is basically meaningless for other FDWs. Looking at some threads in this list, it confused many people.

This is off-topic, but maybe we could devise an API allowing for local system attributes on foreign tables. This would allow FDWs to carry attributes that weren't declared as part of the table definition. This could then be used for postgres_fdw's ctid, as well as other foreign data wrappers' equivalents of an implicit tuple id.

(2) When a foreign table has an AFTER ROW trigger, the FDW's ExecForeign{Insert,Update,Delete} callbacks must return a slot covering all columns. Current FDW API documentation says things like this: "The data in the returned slot is used only if the INSERT query has a RETURNING clause. Hence, the FDW could choose to optimize away returning some or all columns depending on the contents of the RETURNING clause." Consistent with that, postgres_fdw inspects the returningLists of the ModifyTable node to decide which columns are necessary. This patch has rewriteTargetListIU() add a resjunk wholerow var to the returningList of any query that will fire an AFTER ROW trigger on a foreign table. That avoids the need to change the FDW API contract or any postgres_fdw code. I was pleased about that for a time, but on further review, I'm doubting that the benefit for writable FDWs justifies muddying the definition of returningList; until now, returningList has been free from resjunk TLEs.
For example, callers of FetchStatementTargetList() may be unprepared to see an all-resjunk list, instead of NIL, for a data modification statement that returns nothing. If we do keep the patch's approach, I'm inclined to rename returningList. However, I lean more toward adding a separate flag to indicate the need to return a complete tuple regardless of the RETURNING list. The benefits of overloading returningList are all short-term benefits. We know that the FDW API is still converging, so changing it seems eventually-preferable to, and safer than, changing the name or meaning of returningList. Thoughts?

On Thu, Mar 06, 2014 at 09:11:19AM +0100, Ronan Dunklau wrote:
On Wednesday 5 March 2014 22:36:44, Noah Misch wrote:

Agreed. More specifically, I see only two scenarios for retrieving tuples from the tuplestore. Scenario one is that we need the next tuple (or pair of tuples, depending on the TriggerEvent). Scenario two is that we need the tuple(s) most recently retrieved. If that's correct, I'm inclined to rearrange afterTriggerInvokeEvents() and AfterTriggerExecute() to remember the tuple or pair of tuples most recently retrieved. They'll then never call tuplestore_advance() just to reposition. Do you see a problem with that?

I don't see any problem with that. I don't know how this would be implemented
Re: [HACKERS] Triggers on foreign tables
On Wednesday 5 March 2014 22:36:44, Noah Misch wrote:

Agreed. More specifically, I see only two scenarios for retrieving tuples from the tuplestore. Scenario one is that we need the next tuple (or pair of tuples, depending on the TriggerEvent). Scenario two is that we need the tuple(s) most recently retrieved. If that's correct, I'm inclined to rearrange afterTriggerInvokeEvents() and AfterTriggerExecute() to remember the tuple or pair of tuples most recently retrieved. They'll then never call tuplestore_advance() just to reposition. Do you see a problem with that?

I don't see any problem with that. I don't know how this would be implemented, but it would make sense to avoid those scans, as long as a fresh copy is passed to the trigger: modifications to a tuple performed in an after trigger should not be visible to the next one.

I was again somewhat tempted to remove ate_tupleindex, perhaps by defining the four flag bits this way:

    #define AFTER_TRIGGER_DONE          0x1000
    #define AFTER_TRIGGER_IN_PROGRESS   0x2000
    /* two bits describing the size of and tuple sources for this event */
    #define AFTER_TRIGGER_TUP_BITS      0xC000
    #define AFTER_TRIGGER_FDW_REUSE     0x0000
    #define AFTER_TRIGGER_FDW_FETCH     0x4000
    #define AFTER_TRIGGER_1CTID         0x8000
    #define AFTER_TRIGGER_2CTID         0xC000

AFTER_TRIGGER_FDW_FETCH and AFTER_TRIGGER_FDW_REUSE correspond to the aforementioned scenarios one and two, respectively. I think, though, I'll rate this as needless micro-optimization and not bother; opinions welcome. (The savings is four bytes per foreign table trigger event.)

I was already happy with having a lower footprint for foreign table trigger events than for regular trigger events, but if we remove the need for seeking in the tuplestore entirely, it would make sense to get rid of the index.

Thanks, nm

Thanks to you.

-- Ronan Dunklau http://dalibo.com - http://dalibo.org
Re: [HACKERS] jsonb and nested hstore
On Thursday 6 March 2014 09:33:18, Andrew Dunstan wrote:
On 03/06/2014 08:16 AM, Oleg Bartunov wrote:
On Thu, Mar 6, 2014 at 12:43 PM, Peter Geoghegan p...@heroku.com wrote:
On Thu, Mar 6, 2014 at 12:23 AM, Teodor Sigaev teo...@sigaev.ru wrote:

It's possible to introduce a GUC variable for i/o functions which will control the old bug-for-bug behavior. IMHO, this is a much better option than stopping hstore development or splitting hstore into two branches.

A GUC that controls i/o functions is generally considered to be an unacceptable hack. In what sense are we really stopping hstore development if hstore2 lives as jsonb? I have a hard time imagining someone dealing with the incompatibility that a user-facing hstore2 would introduce, while still preferring hstore syntax over json syntax given the choice. There are very rich facilities for manipulating json available in every programming language. The same is not true of hstore. Having looked at the issue today, I think that the amount of redundant code between a hstore2 in core as jsonb and hstore1 will be acceptable. The advantages of making a clean break in having to support the legacy hstore disk format strengthen the case for doing so too.

Heh, let's not make an impulsive decision about hstore2. I agree that jsonb has a lot of facilities, but don't forget that json(b) has to follow the standard and in that sense is more constrained than hstore, which we could further develop to support some interesting features that will never be implemented in json(b). Also, it'd be a bit awkward, after working on nested hstore and announcing it at several conferences (Engine Yard has sponsored part of our hstore work), to suddenly break people's expectations and say that our work has moved to core to provide json some very cool features — good bye, hstore users :( I'm afraid people will not understand us.

Oleg, I hear you, and largely agree, as long as the compatibility issue is solved. If it's not, I think inventing a new hstore2 type is probably a lousy way to go. For good or ill, the world has pretty much settled on wanting to use json for lightweight treeish data. That's where we'll get the most impact. Virtually every programming language (including Perl) has good support for json. I'm not sure what the constraints of json that you might want to break are. Perhaps you'd like to specify.

I haven't followed the whole thread, but json is really restrictive on the supported types: a hierarchical hstore could maybe support more types (timestamp comes to mind) as its values, which is not a valid data type in the json spec.

Whatever we do, rest assured your work won't go to waste.

cheers andrew

-- Ronan Dunklau http://dalibo.com - http://dalibo.org
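To make the type-restriction point concrete — my illustration, not from the thread, and requiring the jsonb type that shipped in 9.4 — a timestamp stored in json(b) survives only as a string, because the JSON data model knows nothing beyond strings, numbers, booleans, null, objects and arrays:

    SELECT jsonb_typeof('{"ts": "2014-03-06 09:33:18"}'::jsonb -> 'ts');
    -- returns "string": JSON has no timestamp type, unlike what a
    -- hypothetical typed nested hstore could have offered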
Re: [HACKERS] Triggers on foreign tables
Hello. Did you have time to review the latest version of this patch ? Is there anything I can do to get this ready for commiter ? Thank you for all the work performed so far. Le mardi 4 février 2014 13:16:22 Ronan Dunklau a écrit : Le lundi 3 février 2014 23:28:45 Noah Misch a écrit : On Sun, Feb 02, 2014 at 11:53:51AM +0100, Ronan Dunklau wrote: Le jeudi 30 janvier 2014 14:05:08 Noah Misch a écrit : On Thu, Jan 23, 2014 at 03:17:35PM +0100, Ronan Dunklau wrote: What do you think about this approach ? Is there something I missed which would make it not sustainable ? Seems basically reasonable. I foresee multiple advantages from having one tuplestore per query level as opposed to one for the entire transaction. You would remove the performance trap of backing up the tuplestore by rescanning. It permits reclaiming memory and disk space in AfterTriggerEndQuery() rather than at end of transaction. You could remove ate_ptr1 and ate_ptr2 from AfterTriggerEventDataFDW and just store the flags word: depending on AFTER_TRIGGER_2CTIDS, grab either the next one or the next two tuples from the tuplestore. Using work_mem per AfterTriggerBeginQuery() instead of per transaction is no problem. What do you think of that design change? I agree that this design is better, but I have some objections. We can remove ate_ptr2 and rely on the AFTER_TRIGGER_2CTIDS flag, but the rescanning and ate_ptr1 (renamed ate_tupleindex in the attached patch) can't go away. Consider for example the case of a foreign table with more than one AFTER UPDATE triggers. Unless we store the tuples once for each trigger, we will have to rescan the tuplestore. Will we? Within a given query level, when do (non-deferred) triggers execute in an order other than the enqueue order? Let me explain what I had in mind. Looking at the code in AfterTriggerSaveEvent: - we build a template AfterTriggerEvent, and store the tuple(s) - for each suitable after trigger that matches the trigger type, as well as the WHEN condition if any, a copy of the previously built AfterTriggerEvent is queued Later, those events are fired in order. This means that more than one event can be fired for one tuple. Take this example: CREATE TRIGGER trig_row_after1 AFTER UPDATE ON rem2 FOR EACH ROW WHEN (NEW.f1 % 5 3) EXECUTE PROCEDURE trigger_func('TRIG1'); CREATE TRIGGER trig_row_after2 AFTER UPDATE ON rem2 FOR EACH ROW WHEN (NEW.f1 % 5 4) EXECUTE PROCEDURE trigger_func('TRIG2'); UPDATE rem2 set f2 = 'something'; Assuming 5 rows with f1 as a serial, the fired AfterTriggerEvent's ate_tupleindex will be, in that order. Ass 0-0-2-2-4-8-8 So, at least a backward seek is required for trig_row_after2 to be able to retrieve a tuple that was already consumed when firing trig_row_after1. On a side note, this made me realize that it is better to avoid storing a tuple entirely if there is no enabled trigger (the f1 = 4 case above). The attached patch does that, so the previous sequence becomes: 0-0-2-2-4-6-6 It also prevents from initalizing a tuplestore at all if its not needed. To mitigate the effects of this behaviour, I added the option to perform a reverse_seek when the looked-up tuple is nearer from the current index than from the start. If there's still a need to seek within the tuplestore, that should get rid of the O(n^2) effect. I'm hoping that per-query-level tuplestores will eliminate the need to seek entirely. 
I think the only case where seeking is still needed is when more than one after trigger needs to be fired, since the above-mentioned change avoids seeking to skip tuples.

If you do pursue that change, make sure the code still does the right thing when it drops queued entries during subxact abort.

I don't really understand what should be done at that stage. Since triggers on foreign tables are not allowed to be deferred, everything should be cleaned up at the end of each query, right? So, there shouldn't be any queued entries.

I suspect that's right. If you haven't looked over AfterTriggerEndSubXact(), please do so and ensure all its actions still make sense in the context of this new kind of trigger storage.

You're right, I missed something here. When aborting a subxact, the tuplestores for queries below the subxact query depth should be cleaned, if any, because AfterTriggerEndQuery has not been called for the failing query. The attached patch fixes that.

The attached patch checks this, and adds documentation for this limitation. I'm not really sure about how to phrase that correctly in the error message and the documentation. One can store at most INT_MAX foreign tuples, which means that at most INT_MAX insert or delete
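A self-contained version of the firing-order example discussed earlier in this message (my reconstruction; trigger_func is assumed to be a simple notice-raising function, as in the regression tests):

    CREATE FUNCTION trigger_func() RETURNS trigger LANGUAGE plpgsql AS $$
    BEGIN
        RAISE NOTICE '% fired for f1 = %', TG_ARGV[0], NEW.f1;
        RETURN NULL;  -- the return value is ignored for AFTER ROW triggers
    END;
    $$;

    CREATE TRIGGER trig_row_after1 AFTER UPDATE ON rem2
        FOR EACH ROW WHEN (NEW.f1 % 5 < 3)
        EXECUTE PROCEDURE trigger_func('TRIG1');

    CREATE TRIGGER trig_row_after2 AFTER UPDATE ON rem2
        FOR EACH ROW WHEN (NEW.f1 % 5 < 4)
        EXECUTE PROCEDURE trigger_func('TRIG2');

    -- With rows f1 = 1..5, TRIG1 fires for f1 in (1, 2, 5) and TRIG2 for
    -- f1 in (1, 2, 3, 5); f1 = 4 matches neither WHEN clause, which is why
    -- its tuple need not be stored at all.
    UPDATE rem2 SET f2 = 'something';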
[HACKERS] Proposal: IMPORT FOREIGN SCHEMA statement.
Hello,

The SQL-MED specification defines the IMPORT FOREIGN SCHEMA statement. This adds discoverability to foreign servers. The structure of the statement as I understand it is simple enough:

IMPORT FOREIGN SCHEMA remote_schema FROM SERVER some_server [ (LIMIT TO | EXCEPT) table_list ] INTO local_schema.

Is anyone working on this? I found a reference to this from 2010 in the archive, stating that work should be focused on core functionality, but nothing more recent.

This would be very useful for postgres_fdw and other RDBMS-backed FDWs, but I think even file_fdw could benefit from it if it was able to create a foreign table for every csv-with-header file in a directory.

I can see a simple API working for that. A new function would be added to the FDW routine, which is responsible for crafting the CreateForeignTableStmts. It could have the following signature:

typedef List *(*ImportForeignSchema_function) (ForeignServer *server, ImportForeignSchemaStmt *parsetree);

I experimented with this idea, and came up with the attached two patches: one for the core, and the other for actually implementing the API in postgres_fdw. Maybe those can serve as a proof-of-concept for discussing the design?

-- Ronan Dunklau http://dalibo.com - http://dalibo.org

diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 024a477..40a2540 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -250,7 +250,8 @@ check_ddl_tag(const char *tag)
 		pg_strcasecmp(tag, "REFRESH MATERIALIZED VIEW") == 0 ||
 		pg_strcasecmp(tag, "ALTER DEFAULT PRIVILEGES") == 0 ||
 		pg_strcasecmp(tag, "ALTER LARGE OBJECT") == 0 ||
-		pg_strcasecmp(tag, "DROP OWNED") == 0)
+		pg_strcasecmp(tag, "DROP OWNED") == 0 ||
+		pg_strcasecmp(tag, "IMPORT FOREIGN SCHEMA") == 0)
 		return EVENT_TRIGGER_COMMAND_TAG_OK;

 	/*
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index 7f007d7..719c674 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -27,7 +27,9 @@
 #include "catalog/pg_type.h"
 #include "catalog/pg_user_mapping.h"
 #include "commands/defrem.h"
+#include "commands/tablecmds.h"
 #include "foreign/foreign.h"
+#include "foreign/fdwapi.h"
 #include "miscadmin.h"
 #include "parser/parse_func.h"
 #include "utils/acl.h"
@@ -1427,3 +1429,48 @@ CreateForeignTable(CreateForeignTableStmt *stmt, Oid relid)

 	heap_close(ftrel, RowExclusiveLock);
 }
+
+/*
+ * Import a foreign schema
+ */
+void
+ImportForeignSchema(ImportForeignSchemaStmt *stmt)
+{
+	Oid			ownerId;
+	ForeignDataWrapper *fdw;
+	ForeignServer *server;
+	FdwRoutine *fdw_routine;
+	AclResult	aclresult;
+	List	   *table_list = NULL;
+	ListCell   *lc;
+	char	   *local_schema = strdup(stmt->local_schema);
+
+	ownerId = GetUserId();
+	server = GetForeignServerByName(stmt->servername, false);
+	aclresult = pg_foreign_server_aclcheck(server->serverid, ownerId, ACL_USAGE);
+	if (aclresult != ACLCHECK_OK)
+		aclcheck_error(aclresult, ACL_KIND_FOREIGN_SERVER, server->servername);
+
+	/* Check the permissions on the schema */
+	LookupCreationNamespace(local_schema);
+
+	fdw = GetForeignDataWrapper(server->fdwid);
+	fdw_routine = GetFdwRoutine(fdw->fdwhandler);
+	if (fdw_routine->ImportForeignSchema == NULL)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_FDW_NO_SCHEMAS),
+				 errmsg("This FDW does not support schema importation")));
+	}
+	table_list = fdw_routine->ImportForeignSchema(server, stmt);
+	foreach(lc, table_list)
+	{
+		CreateForeignTableStmt *create_stmt = lfirst(lc);
+		Oid			relOid = DefineRelation((CreateStmt *) create_stmt,
+											RELKIND_FOREIGN_TABLE,
+											InvalidOid);
+		/* Override whatever the fdw set for the schema */
+		create_stmt->base.relation->schemaname = local_schema;
+		CreateForeignTable(create_stmt, relOid);
+	}
+}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 81169a4..80c18cc 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -235,8 +235,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 		DropAssertStmt DropTrigStmt DropRuleStmt DropCastStmt DropRoleStmt
 		DropUserStmt DropdbStmt DropTableSpaceStmt DropFdwStmt
 		DropForeignServerStmt DropUserMappingStmt ExplainStmt FetchStmt
-		GrantStmt GrantRoleStmt IndexStmt InsertStmt ListenStmt LoadStmt
-		LockStmt NotifyStmt ExplainableStmt PreparableStmt
+		GrantStmt GrantRoleStmt IndexStmt ImportForeignSchemaStmt InsertStmt
+		ListenStmt LoadStmt LockStmt NotifyStmt ExplainableStmt PreparableStmt
 		CreateFunctionStmt AlterFunctionStmt ReindexStmt RemoveAggrStmt
 		RemoveFuncStmt RemoveOperStmt RenameStmt RevokeStmt RevokeRoleStmt
 		RuleActionStmt RuleActionStmtOrEmpty RuleStmt
@@ -319,6 +319,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <ival>	defacl_privilege_target
 %type <defelt>	DefACLOption
 %type <list>	DefACLOptionList
+%type <node>	OptImportForeignSchemaRestriction
+%type <ival>
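For reference, usage under the grammar proposed above would look something like the following (my sketch following the proposal's clause order; the syntax eventually committed in PostgreSQL 9.5 places the LIMIT TO / EXCEPT clause before FROM SERVER):

    -- Import only two tables from the remote "public" schema into a
    -- local schema, per the proposed statement structure.
    IMPORT FOREIGN SCHEMA public
        FROM SERVER some_server
        LIMIT TO (orders, customers)
        INTO local_schema;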
Re: [HACKERS] Proposal: IMPORT FOREIGN SCHEMA statement.
I haven't had a look at the patch yet since I don't have a nice editor right now, but how do you handle interoperability between datatypes? Specifically, how do you handle those datatypes which have a different name from the PostgreSQL name for them and/or are stored in a different manner?

Do you mean in general, or for postgres_fdw specifically? In general, only valid column types should be accepted in the CreateForeignTableStmt. The CreateForeignTableStmt is passed through DefineRelation, which takes care of looking up the actual data types. For the postgres_fdw POC implementation, this is done by parsing the attributes' types from the query result with the regtype input functions. The attribute typmod is injected too.

Regards, Atri

-- Ronan Dunklau http://dalibo.com - http://dalibo.org
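For instance — my illustration of the mechanism described above — the regtype input function resolves a type name through the local catalogs, and errors out for names that don't exist locally:

    SELECT 'timestamp with time zone'::regtype;  -- resolves to the local type
    SELECT 'uint4'::regtype;                     -- ERROR: type "uint4" does not exist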
Re: [HACKERS] Proposal: IMPORT FOREIGN SCHEMA statement.
On Friday 21 February 2014 16:45:20, Atri Sharma wrote:
On Fri, Feb 21, 2014 at 4:39 PM, Ronan Dunklau ronan.dunk...@dalibo.com wrote:

I haven't had a look at the patch yet since I don't have a nice editor right now, but how do you handle interoperability between datatypes? Specifically, how do you handle those datatypes which have a different name from the PostgreSQL name for them and/or are stored in a different manner?

Do you mean in general, or for postgres_fdw specifically? In general, only valid column types should be accepted in the CreateForeignTableStmt. The CreateForeignTableStmt is passed through DefineRelation, which takes care of looking up the actual data types. For the postgres_fdw POC implementation, this is done by parsing the attributes' types from the query result with the regtype input functions. The attribute typmod is injected too.

I actually meant in general. Thanks for the reply. So please help me understand here. How exactly does CreateForeignTableStmt help in type compatibility?

I'm not sure I understand your concern. It doesn't help with type compatibility; it is still the responsibility of the FDW to convert local types to remote ones. The CreateForeignTableStmt defines the columns, with their types. It is executed locally to create a new foreign table according to a remote description of the table. The only difference with a user-written CreateForeignTable statement is that the structure is crafted by the FDW instead of the parser.

A statement may be valid on a foreign server but may not be compatible.

Do you mean the CreateForeignTableStmt? It has to be valid locally, or it won't be executed. It is the FDW's responsibility to build this statement in such a way that it is valid locally.

Regards, Atri

-- Ronan Dunklau http://dalibo.com - http://dalibo.org
Re: [HACKERS] Proposal: IMPORT FOREIGN SCHEMA statement.
Who says that the OIDs are the same on the local and the remote postgres instance? For user-defined types, that's certainly not going to be true...

That's why the result is cast as regtype[], and parsed as such. The OID is not transmitted over the wire, but set by regtype_in.

Also, why do you aggregate the lists of columns, types and oids into arrays when querying them from the remote server? Just producing a query that returns one row per table column seems much simpler, both conceptually and implementation-wise.

As I said, this is a proof of concept. It is not meant to be a fully functional, optimal implementation, but to serve as a basis for discussing the API and the feature themselves. The query could indeed be replaced by what you suggest, performing the grouping locally. I have absolutely no opinion on which implementation is better; this one seemed the most natural to me.

Finally, I think there are a few missing pieces. For example, you quite easily could copy over not-NULL flags, but you currently don't. Similarly, what about inheritance relationships between remote tables? There's a patch in the current CF, I believe, which adds support for inheritance to foreign tables, so all you'd have to do is to make the foreign table's inheritance structure match the remote table's.

Duly noted, we could probably import NOT NULL flags, and maybe even the table's inheritance structure. I'll look into that if the feature and the API are deemed worthy. Thank you for the review.

-- Ronan Dunklau http://dalibo.com - http://dalibo.org
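To sketch the shape of the remote catalog query being debated here — my illustration of the approach, not the query from the attached patch — one row per remote table, with names and types aggregated into arrays and the types cast through regtype so that only names travel over the wire:

    SELECT c.relname,
           array_agg(a.attname ORDER BY a.attnum)           AS colnames,
           array_agg(a.atttypid::regtype ORDER BY a.attnum) AS coltypes,
           array_agg(a.atttypmod ORDER BY a.attnum)         AS coltypmods
    FROM pg_class c
         JOIN pg_namespace n ON n.oid = c.relnamespace
         JOIN pg_attribute a ON a.attrelid = c.oid
    WHERE n.nspname = 'remote_schema'
      AND c.relkind = 'r'
      AND a.attnum > 0
      AND NOT a.attisdropped
    GROUP BY c.relname;

The alternative suggested above would simply drop the array_agg calls and the GROUP BY, returning one row per column and letting the local side do the grouping.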
Re: [HACKERS] [bug fix] pg_ctl stop times out when it should respond quickly
On Tuesday 7 January 2014 17:05:03, Michael Paquier wrote:
On Sun, Jan 5, 2014 at 3:49 PM, MauMau maumau...@gmail.com wrote:

Could you confirm again and tell me what problem is happening?

FWIW, I just quickly tested those two patches independently and got them correctly applied with patch -p1 < $PATCH on master at edc4345. They compiled and passed make check as well. Regards,

Both patches apply cleanly; I'll focus on the pg_stop_fail_v3 patch. Tests are running correctly. The bug this patch is supposed to fix has been reproduced on current HEAD, and cannot be reproduced using the patch. Previous concerns about using both get_pgpid and postmaster_is_alive are addressed. There are no regression tests covering this bugfix, although I don't know if it would be practical to implement them.

-- Ronan Dunklau http://dalibo.com - http://dalibo.org
[HACKERS] Triggers on foreign tables
Hello. Since the last discussion about it (http://www.postgresql.org/message-id/cadyhksugp6ojb1pybtimop3s5fg_yokguto-7rbcltnvaj5...@mail.gmail.com), I finally managed to implement row triggers for foreign tables. The attached patch does not contain regression tests or documentation yet; a revised patch will include those sometime during the week. I'll add it to the commitfest then. A simple test scenario using postgres_fdw is however included as an attachment.

The attached patch implements triggers for foreign tables according to the design discussed in the link above.

For statement-level triggers, nothing has changed since the last patch. Statement-level triggers are just allowed in the command checking.

For row-level triggers, it works as follows:

- during the rewrite phase, every attribute of the foreign table is duplicated as a resjunk entry if a trigger is defined on the relation. These entries are then used to store the old values for a tuple. There is room for improvement here: we could check if any trigger is in fact a row-level trigger
- during the execution phase, the before triggers are fired exactly like triggers on regular tables, except that old tuples are not fetched using their ctid, but rebuilt using the previously-stored resjunk attributes
- for after triggers, the whole queuing mechanism is bypassed for foreign tables. This is IMO acceptable, since foreign tables cannot have constraints or constraint triggers, and thus have no need for deferrable execution. This design avoids the need for storing and retrieving/identifying remote tuples until the query or transaction end
- the duplicated resjunk attributes are identified by being:
  - marked as resjunk in the targetlist
  - not system or whole-row attributes (varno > 0)

There is still one small issue with the attached patch: modifications to the tuple performed by the foreign data wrapper (via the returned TupleTableSlot in the ExecForeignUpdate and ExecForeignInsert hooks) are not visible to the AFTER trigger. This could be fixed by merging the planslot containing the resjunk columns with the returned slot before calling the trigger, but I'm not really sure how to safely perform that. Any advice?

Many thanks to Kohei Kaigai for taking the time to help with the design.
-- Ronan Dunklau http://dalibo.com - http://dalibo.org

test_with_postgres_fdw.sql Description: application/sql

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index b9cd88d..a509595 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3150,13 +3150,16 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 			/* No command-specific prep needed */
 			break;
 		case AT_EnableTrig:		/* ENABLE TRIGGER variants */
-		case AT_EnableAlwaysTrig:
-		case AT_EnableReplicaTrig:
 		case AT_EnableTrigAll:
 		case AT_EnableTrigUser:
 		case AT_DisableTrig:	/* DISABLE TRIGGER variants */
 		case AT_DisableTrigAll:
 		case AT_DisableTrigUser:
+			ATSimplePermissions(rel, ATT_TABLE | ATT_FOREIGN_TABLE);
+			pass = AT_PASS_MISC;
+			break;
+		case AT_EnableReplicaTrig:
+		case AT_EnableAlwaysTrig:
 		case AT_EnableRule:		/* ENABLE/DISABLE RULE variants */
 		case AT_EnableAlwaysRule:
 		case AT_EnableReplicaRule:
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 4eff184..12870eb 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -75,6 +75,14 @@ static HeapTuple GetTupleForTrigger(EState *estate,
 					ItemPointer tid,
 					LockTupleMode lockmode,
 					TupleTableSlot **newSlot);
+
+static HeapTuple ExtractOldTuple(TupleTableSlot *mixedtupleslot,
+					ResultRelInfo *relinfo);
+static void ExecARForeignTrigger(EState *estate,
+					TriggerData LocTriggerData,
+					ResultRelInfo *relinfo,
+					int triggerEvent, int triggerType);
+
 static bool TriggerEnabled(EState *estate, ResultRelInfo *relinfo,
 			   Trigger *trigger, TriggerEvent event,
 			   Bitmapset *modifiedCols,
@@ -184,12 +192,22 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 						RelationGetRelationName(rel)),
 				 errdetail("Views cannot have TRUNCATE triggers.")));
 	}
+	else if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		if (stmt->isconstraint)
+			ereport(ERROR,
+					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+					 errmsg("\"%s\" is a foreign table",
+							RelationGetRelationName(rel)),
+					 errdetail("Foreign Tables cannot have constraint triggers.")));
+	}
 	else
+	{
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 				 errmsg("\"%s\" is not a table or view",
 						RelationGetRelationName(rel)));
-
+	}
 	if (!allowSystemTableMods && IsSystemRelation(rel))
 		ereport(ERROR,
 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
@@ -1062,10 +1080,11 @@ RemoveTriggerById(Oid trigOid)
 	rel = heap_open(relid, AccessExclusiveLock);

 	if (rel->rd_rel->relkind != RELKIND_RELATION &&
-		rel->rd_rel->relkind != RELKIND_VIEW)
+		rel->rd_rel->relkind
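To illustrate what the patch enables — my own minimal example, not from the thread, assuming ft is an existing postgres_fdw foreign table with an id column — a row-level trigger like this becomes possible where it was previously rejected outright:

    CREATE FUNCTION audit_ft() RETURNS trigger LANGUAGE plpgsql AS $$
    BEGIN
        RAISE NOTICE 'updating id %: old row = %, new row = %', OLD.id, OLD, NEW;
        RETURN NEW;  -- for a BEFORE trigger, returning NEW lets the update proceed
    END;
    $$;

    CREATE TRIGGER ft_audit BEFORE UPDATE ON ft
        FOR EACH ROW EXECUTE PROCEDURE audit_ft();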
Re: [HACKERS] Triggers on foreign tables
On Tuesday 15 October 2013 09:47:31, Robert Haas wrote:
On Mon, Oct 14, 2013 at 5:24 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:

And, I also want some comments from committers, not only mine. +1 +1

/me pokes head up. I know I'm going to annoy people with this comment, but I feel like it's going to have to be made at some point by somebody, so here goes: I don't see the point of this feature. If you want a trigger on a table, why not set it on the remote side? A trigger on the foreign table won't be enforced consistently; it'll only work when the update is routed through the foreign table, not when people access the underlying table on the remote side through any other mechanism. The number of useful things you can do this way seems fairly small. Perhaps you could use a row-level trigger for RLS, to allow only certain rows on the foreign side to be updated, but even that seems like a slightly strange design: generally it'd be better to enforce the security as close to the target object as possible.

There's another issue that concerns me here also: performance. IIUC, an update of N tuples on the remote side currently requires N+1 server round-trips. That is unspeakably awful, and we really oughta be looking for ways to make that number go down, by pushing the whole update to the remote side. But obviously that won't be possible if there's a per-row trigger that has to be evaluated on the local side. Now, assuming somebody comes up with code that implements that optimization, we can just disable it when there are local-side triggers. But, then you're back to having terrible performance. So even if the use case for this seemed really broad, I tend to think performance concerns would sink most of the possible real-world uses. I could, of course, be all wet.

Even if the row-level trigger functionality is controversial, in terms of provided features vs. performance, the original patch I submitted enables statement-level triggers. Although my goal was to implement row-level triggers and I hit a wall pretty quickly, would it make sense to have statement-level triggers without their row counterparts?

I also think that pushing the whole update to the remote side will not be possible in all cases, and like Kohei wrote, it may be an acceptable loss to not be able to push it when a trigger is present. If we want to push the whole update, we will have to cope with local joins and function evaluations in all cases. IMO, triggers are just a special case of these limiting factors.
Re: [HACKERS] Triggers on foreign tables
Sorry, I might call it something like "primary key" instead of "tupleid". Apart from whether we can uniquely identify a particular remote record with an attribute, what the FDW needs here is something to identify a remote record. So, we were talking about the same concept under different names.

Ah, that makes sense: I was understanding tupleid as a synonym for ctid.

Does the incomplete tuple mean a tuple image in which some of the columns are replaced with NULL due to optimization, as postgres_fdw does? If so, a solution is to force the FDW driver to fetch all the columns if the foreign table it manages has row-level triggers for update / delete.

What I meant is that a DELETE statement, for example, does not build a scan-stage reltargetlist consisting of the whole set of attributes: the only attributes that are fetched are the ones built by addForeignUpdateTargets, and whatever additional attributes are involved in the DELETE statement (in the WHERE clause, for example).

A DELETE statement indeed does not (need to) construct a complete tuple image at scan stage; however, nothing prohibits the FDW driver from keeping the latest record fetched from the remote server. The FDW driver easily knows beforehand whether the relation has row-level triggers, so it is no problem to keep a complete image internally if needed. Or, it is perhaps possible to add the additional target entries needed to construct a complete tuple using the AddForeignUpdateTargets() callback.

I think that the additional target entries should be added in core, because doing it otherwise would require additional work from FDW drivers, and the code would be duplicated amongst all FDW drivers that wish to support triggers.

I like this idea, but I don't really see all the implications of such a design. Where would this tuple be stored? From what I understand, after-triggers are queued for later execution, using the old/new tupleid for later retrieval (in the AfterTriggerEventData struct). This would need a major change to work with foreign tables. Would that involve a special path for executing those triggers ASAP?

Oops, an oversight on my side. AfterTriggerSaveEvent() indeed saves only the ctid of tuples in regular tables, because it can re-construct a complete tuple image from a particular ctid at any time. On the other hand, that is not available for a foreign table, due to its nature. All we can do, it seems to me, is to save a copy of the before/after tuple images in local memory, even though that consumes more memory than for regular tables. Or, as suggested above, add a special code path for foreign tables which would execute the trigger as soon as possible instead of queuing it.

The feature we need here is becoming a bit clearer little by little. A complete tuple image shall be fetched at the scan stage if the foreign table has row-level triggers. The planSlot may make sense if it contains (junk) attributes sufficient to re-construct an old tuple image. The target list of a DELETE statement contains a junk ctid attribute; there is no reason why we cannot add junk attributes that reference each regular attribute, is there? Also, the target list of an UPDATE statement contains the new tuple image, but it is possible to add junk attributes that just reference the old values, as a carrier of old values from the scan stage to the modify stage. I want core PostgreSQL to support a part of these jobs. For example, it may be possible to add junk attributes to re-construct the old tuple image in rewriteTargetListUD(), if the target relation has row-level triggers.
Also, it may be possible to extract these old values from the planSlot to construct an old tuple image in GetTupleForTrigger(), if the relation is a foreign table.

So, if I understand you, the target list would be built as follows:
- core builds the target list, including every attribute
- this target list is updated by the FDW to add any junk attributes it wishes to use
- in the case of an UPDATE, core duplicates the whole list of attributes (including the added junk attributes) as another set of junk attributes. Maybe we could duplicate only the updated attributes.

Unfortunately, I have no special idea for handling the old/new remote tuples in AfterTriggerSaveEvent(), except keeping a copy, but that is straightforward.

Maybe avoid it altogether in the case of a trigger on a remote foreign table?

And, I also want some comments from committers, not only mine. +1 +1

Thanks,
Re: [HACKERS] Triggers on foreign tables
On Saturday 12 October 2013 07:30:35, Kohei KaiGai wrote:
2013/10/10 Ronan Dunklau rdunk...@gmail.com:

Sorry, I'm uncertain about the point above. Are you saying the FDW driver may be able to handle a case where the remote tuple to be updated is different from the remote tuple fetched in the first stage? Or am I misunderstanding?

I was not clear in the point above: what I meant was that, from my understanding, the FDW driver has no obligation to use the system-attribute tupleid to identify the remote tuple. IIRC, one of the early API proposals had the tupleid passed as an argument to the ExecForeign* hooks. Now, these hooks receive the whole planslot instead. This slot can store additional resjunk attributes (thanks to AddForeignUpdateTargets) which shouldn't be altered during execution and are carried on until the modify stage. So, these additional attributes can be used as identifiers instead of the tupleid. In fact, this is the approach I implemented for the multicorn fdw, and I believe it to be used in other FDWs as well (HadoopFDW does that, if I understand it correctly). So, it doesn't make sense to me to assume that the tupleid will be filled in as a valid remote identifier in the context of triggers.

From another standpoint, I'd like to withdraw my suggestion above, because an after-row update or delete trigger tries to fetch an older image of the tuple next to the actual update / delete jobs. Even if the FDW is responsible for identifying a remote tuple using the tupleid, the identified tuple we can fetch is the newer image, because the FDW driver has already issued a remote query to update or delete the target record. So, sooner or later, the FDW shall be responsible for keeping a complete tuple image when the foreign table has row-level triggers, until the write operation is finished.

+1

Does the incomplete tuple mean a tuple image in which some of the columns are replaced with NULL due to optimization, as postgres_fdw does? If so, a solution is to force the FDW driver to fetch all the columns if the foreign table it manages has row-level triggers for update / delete.

What I meant is that a DELETE statement, for example, does not build a scan-stage reltargetlist consisting of the whole set of attributes: the only attributes that are fetched are the ones built by addForeignUpdateTargets, and whatever additional attributes are involved in the DELETE statement (in the WHERE clause, for example). I apologize if this is not clear, since both my technical and English vocabulary are maybe not precise enough to get my point across.

Or, perform a second round-trip to the foreign data store to fetch the whole tuple.

It might be a solution for before-row triggers, but not for after-row triggers, unfortunately.

+1

As I noted above, a second round trip is not a suitable design to support after-row triggers. Likely, the GetTupleForTrigger() hook shall return a cached older image fetched during the first round of the remote scan. So, I guess every FDW driver will have a similar implementation to handle this situation; thus it makes sense for PostgreSQL core to have a feature to support this handling, one that keeps the tuple fetched last and returns the older image for row-level triggers. What is your opinion?

I like this idea, but I don't really see all the implications of such a design. Where would this tuple be stored? From what I understand, after-triggers are queued for later execution, using the old/new tupleid for later retrieval (in the AfterTriggerEventData struct).
This would need a major change to work with foreign tables. Would that involve a special path for executing those triggers ASAP ? And, I also want some comments from committers, not only from mine. +1 Thank you for taking the time to think this through. -- Ronan Dunklau
Re: [HACKERS] Triggers on foreign tables
On Sunday 6 October 2013 22:33:23, Kohei KaiGai wrote:
2013/9/10 Ronan Dunklau rdunk...@gmail.com:

For row-level triggers, it seems more complicated. From what I understand, OLD/NEW tuples are fetched from the heap using their ctid (except for BEFORE INSERT triggers). How could this be adapted for foreign tables?

It seems to me the origin of the difficulty in supporting row-level triggers is that a foreign table does not have a back-door to see the older tuple to be updated, unlike regular tables. Core PostgreSQL knows the on-disk format used to store tuples within regular tables; thus, GetTupleForTrigger() can fetch a tuple identified by tupleid. On the other hand, a writable foreign table is also designed to identify a remote tuple with tupleid, as long as the FDW module is responsible.

It is my understanding that the tupleid is one way for an FDW to identify a tuple, but the AddForeignUpdateTargets hook allows arbitrary resjunk columns to be added. It is these columns that are then used to identify the tuple to update. I don't think the tupleid itself can be used to identify a tuple for the trigger.

So, a straightforward idea is adding a new FDW interface to get an older image of the tuple to be updated. It makes it possible to implement something similar to GetTupleForTrigger() on foreign tables, even though it takes a secondary query to fetch a record from the remote server. Probably, it is a headache for us.

One thing to pay attention to is that the tuple to be updated is already fetched from the remote server, and the tuple fetched latest is (always?) the target of the update or delete. It is not a heavy job for the ForeignScanState node to hold a copy of the latest tuple. Isn't referencing the relevant ForeignScanState an available way to get the older image?

It is however possible to have only an incomplete tuple. The FDW may have only fetched the tupleid-equivalent, and we would have to ensure that the reltargetlist contains every attribute that the trigger could need. Or, perform a second round-trip to the foreign data store to fetch the whole tuple.

If my assumption is right, all we need to enhance is: (1) add a store on ForeignScanState to hold the last tuple at the scan stage, (2) add GetForeignTupleForTrigger() to reference the store above on the relevant ForeignScanState.

I would add a (3): have a GetTupleForTrigger() hook that would get called with the stored tuple. I personally don't like this approach: there are already (n+1) round trips to the foreign store (one during the scan phase, and one per tuple during the update/delete phase); we don't need another one per tuple for the triggers.
Re: [HACKERS] Extensions makefiles - coverage
On Sunday 22 September 2013 01:34:53 Peter Eisentraut wrote:
On Thu, 2013-07-25 at 17:07 +0200, Ronan Dunklau wrote:

I am using approximately the layout that was proposed here: http://www.postgresql.org/message-id/51bb1b6e.2070...@dunslane.net It looks like everything is hard-coded to take the source and the gcda, gcno files in the base directory, but these files lie in a src directory with the proposed layout.

The PostgreSQL build system isn't going to work very well if you build files outside of the current directory. If you want to put your source files into a src/ subdirectory, then your top-level makefile should do a $(MAKE) -C src, and you need to have a second makefile in the src directory. If you do that, then the existing coverage targets will work alright, I think.

The PGXS build system allows for the definition of an OBJS variable, which works fine with almost every other make target. Maybe we need to take a step back, and think about what kind of extension layouts we want to support? At the time of this writing, the HOW TO on http://manager.pgxn.org/howto strongly encourages putting all C files in a src directory. As a result, many extensions on pgxn use this layout. It would be great not to have to change them to measure code coverage.
Re: [HACKERS] Triggers on foreign tables
On Thursday 12 September 2013 12:10:01 Peter Eisentraut wrote:

The documentation build fails:

    openjade:trigger.sgml:72:9:E: end tag for "ACRONYM" omitted, but OMITTAG NO was specified
    openjade:trigger.sgml:70:56: start tag was here

Thank you, I took the time to install a working doc-building environment. Please find attached a patch that corrects this.

diff --git a/doc/src/sgml/ref/create_trigger.sgml b/doc/src/sgml/ref/create_trigger.sgml
index e5ec738..08d133c 100644
--- a/doc/src/sgml/ref/create_trigger.sgml
+++ b/doc/src/sgml/ref/create_trigger.sgml
@@ -93,7 +93,7 @@ CREATE [ CONSTRAINT ] TRIGGER <replaceable class="PARAMETER">name</replaceable>
   <para>
    The following table summarizes which types of triggers may be used on
-   tables and views:
+   tables, views and foreign tables:
   </para>

   <informaltable id="supported-trigger-types">
@@ -111,7 +111,7 @@ CREATE [ CONSTRAINT ] TRIGGER <replaceable class="PARAMETER">name</replaceable>
      <entry align="center" morerows="1"><literal>BEFORE</></entry>
      <entry align="center"><command>INSERT</>/<command>UPDATE</>/<command>DELETE</></entry>
      <entry align="center">Tables</entry>
-     <entry align="center">Tables and views</entry>
+     <entry align="center">Tables, views and foreign tables</entry>
     </row>
     <row>
      <entry align="center"><command>TRUNCATE</></entry>
@@ -122,7 +122,7 @@ CREATE [ CONSTRAINT ] TRIGGER <replaceable class="PARAMETER">name</replaceable>
      <entry align="center" morerows="1"><literal>AFTER</></entry>
      <entry align="center"><command>INSERT</>/<command>UPDATE</>/<command>DELETE</></entry>
      <entry align="center">Tables</entry>
-     <entry align="center">Tables and views</entry>
+     <entry align="center">Tables, views and foreign tables</entry>
     </row>
     <row>
      <entry align="center"><command>TRUNCATE</></entry>
@@ -244,7 +244,7 @@ UPDATE OF <replaceable>column_name1</replaceable> [, <replaceable>column_name2</
     <term><replaceable class="parameter">table_name</replaceable></term>
     <listitem>
      <para>
-      The name (optionally schema-qualified) of the table or view the trigger
+      The name (optionally schema-qualified) of the table, view or foreign table the trigger
       is for.
      </para>
     </listitem>
diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml
index f579340..37c0a35 100644
--- a/doc/src/sgml/trigger.sgml
+++ b/doc/src/sgml/trigger.sgml
@@ -33,7 +33,7 @@
  <para>
   A trigger is a specification that the database should automatically
   execute a particular function whenever a certain type of operation is
-  performed. Triggers can be attached to both tables and views.
+  performed. Triggers can be attached to tables, views and foreign tables.
  </para>

  <para>
@@ -64,6 +64,15 @@
  </para>

  <para>
+   On foreign tables, trigger can be defined to execute either before or after any
+   <command>INSERT</command>,
+   <command>UPDATE</command> or
+   <command>DELETE</command> operation, only once per <acronym>SQL</acronym> statement.
+   This means that row-level triggers are not supported for foreign tables.
+  </para>
+
+
+  <para>
   The trigger function must be defined before the trigger itself can be
   created. The trigger function must be declared as a function taking no
   arguments and returning type <literal>trigger</>.
@@ -96,6 +105,7 @@
   after may only be defined at statement level, while triggers that fire
   instead of an <command>INSERT</command>, <command>UPDATE</command>,
   or <command>DELETE</command> may only be defined at row level.
+  On foreign tables, triggers can not be defined at row level.
  </para>

  <para>
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index adc74dd..68bfa9c 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3141,13 +3141,16 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 			pass = AT_PASS_MISC;
 			break;
 		case AT_EnableTrig:		/* ENABLE TRIGGER variants */
-		case AT_EnableAlwaysTrig:
-		case AT_EnableReplicaTrig:
 		case AT_EnableTrigAll:
 		case AT_EnableTrigUser:
 		case AT_DisableTrig:	/* DISABLE TRIGGER variants */
 		case AT_DisableTrigAll:
 		case AT_DisableTrigUser:
+			ATSimplePermissions(rel, ATT_TABLE | ATT_FOREIGN_TABLE);
+			pass = AT_PASS_MISC;
+			break;
+		case AT_EnableReplicaTrig:
+		case AT_EnableAlwaysTrig:
 		case AT_EnableRule:		/* ENABLE/DISABLE RULE variants */
 		case AT_EnableAlwaysRule:
 		case AT_EnableReplicaRule:
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index d86e9ad..d84b61b 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -184,12 +184,23 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 						RelationGetRelationName(rel)),
 				 errdetail("Views cannot have TRUNCATE triggers.")));
 	}
-	else
+	else if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		if (stmt->row)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+					 errmsg("\"%s\" is a foreign table",
+
Re: [HACKERS] Triggers on foreign tables
On Wednesday 11 September 2013 06:27:24 Michael Paquier wrote:

As your patch is targeting the implementation of a new feature, please consider adding this patch to the next commit fest that is going to begin in a couple of days: https://commitfest.postgresql.org/action/commitfest_view?id=19

I intended to do that in the next couple of days if I didn't get any feedback. Thank you for the reminder, it's done.
[HACKERS] Triggers on foreign tables
Hello. I wanted to know what it would take to implement triggers on foreign tables. It seems that statement-level triggers can work provided they are allowed in the code. Please find attached a simple POC patch that implements just that.

For row-level triggers, it seems more complicated. From what I understand, OLD/NEW tuples are fetched from the heap using their ctid (except for BEFORE INSERT triggers). How could this be adapted for foreign tables?

-- Ronan Dunklau

diff --git a/doc/src/sgml/ref/create_trigger.sgml b/doc/src/sgml/ref/create_trigger.sgml
index e5ec738..08d133c 100644
--- a/doc/src/sgml/ref/create_trigger.sgml
+++ b/doc/src/sgml/ref/create_trigger.sgml
@@ -93,7 +93,7 @@ CREATE [ CONSTRAINT ] TRIGGER <replaceable class="PARAMETER">name</replaceable>
   <para>
    The following table summarizes which types of triggers may be used on
-   tables and views:
+   tables, views and foreign tables:
   </para>

   <informaltable id="supported-trigger-types">
@@ -111,7 +111,7 @@ CREATE [ CONSTRAINT ] TRIGGER <replaceable class="PARAMETER">name</replaceable>
      <entry align="center" morerows="1"><literal>BEFORE</></entry>
      <entry align="center"><command>INSERT</>/<command>UPDATE</>/<command>DELETE</></entry>
      <entry align="center">Tables</entry>
-     <entry align="center">Tables and views</entry>
+     <entry align="center">Tables, views and foreign tables</entry>
     </row>
     <row>
      <entry align="center"><command>TRUNCATE</></entry>
@@ -122,7 +122,7 @@ CREATE [ CONSTRAINT ] TRIGGER <replaceable class="PARAMETER">name</replaceable>
      <entry align="center" morerows="1"><literal>AFTER</></entry>
      <entry align="center"><command>INSERT</>/<command>UPDATE</>/<command>DELETE</></entry>
      <entry align="center">Tables</entry>
-     <entry align="center">Tables and views</entry>
+     <entry align="center">Tables, views and foreign tables</entry>
     </row>
     <row>
      <entry align="center"><command>TRUNCATE</></entry>
@@ -244,7 +244,7 @@ UPDATE OF <replaceable>column_name1</replaceable> [, <replaceable>column_name2</
     <term><replaceable class="parameter">table_name</replaceable></term>
     <listitem>
      <para>
-      The name (optionally schema-qualified) of the table or view the trigger
+      The name (optionally schema-qualified) of the table, view or foreign table the trigger
       is for.
      </para>
     </listitem>
diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml
index f579340..37c0a35 100644
--- a/doc/src/sgml/trigger.sgml
+++ b/doc/src/sgml/trigger.sgml
@@ -33,7 +33,7 @@
  <para>
   A trigger is a specification that the database should automatically
   execute a particular function whenever a certain type of operation is
-  performed. Triggers can be attached to both tables and views.
+  performed. Triggers can be attached to tables, views and foreign tables.
  </para>

  <para>
@@ -64,6 +64,15 @@
  </para>

  <para>
+   On foreign tables, trigger can be defined to execute either before or after any
+   <command>INSERT</command>,
+   <command>UPDATE</command> or
+   <command>DELETE</command> operation, only onece per <acronym>SQL statement.
+   This means that row-level triggers are not supported for foreign tables.
+  </para>
+
+
+  <para>
   The trigger function must be defined before the trigger itself can be
   created. The trigger function must be declared as a function taking no
   arguments and returning type <literal>trigger</>.
@@ -96,6 +105,7 @@
   after may only be defined at statement level, while triggers that fire
   instead of an <command>INSERT</command>, <command>UPDATE</command>,
   or <command>DELETE</command> may only be defined at row level.
+  On foreign tables, triggers can not be defined at row level.
  </para>

  <para>
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index adc74dd..68bfa9c 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3141,13 +3141,16 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 			pass = AT_PASS_MISC;
 			break;
 		case AT_EnableTrig:		/* ENABLE TRIGGER variants */
-		case AT_EnableAlwaysTrig:
-		case AT_EnableReplicaTrig:
 		case AT_EnableTrigAll:
 		case AT_EnableTrigUser:
 		case AT_DisableTrig:	/* DISABLE TRIGGER variants */
 		case AT_DisableTrigAll:
 		case AT_DisableTrigUser:
+			ATSimplePermissions(rel, ATT_TABLE | ATT_FOREIGN_TABLE);
+			pass = AT_PASS_MISC;
+			break;
+		case AT_EnableReplicaTrig:
+		case AT_EnableAlwaysTrig:
 		case AT_EnableRule:		/* ENABLE/DISABLE RULE variants */
 		case AT_EnableAlwaysRule:
 		case AT_EnableReplicaRule:
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index d86e9ad..d84b61b 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -184,12 +184,23 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 						RelationGetRelationName(rel)),
 				 errdetail("Views cannot have TRUNCATE triggers.")));
 	}
-	else
+	else if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		if (stmt->row)
+		{
+			ereport(ERROR,
+
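Under this first POC — my illustration, with hypothetical log_stmt and log_row trigger functions — only the statement-level form would be accepted on a foreign table ft:

    -- Allowed by the POC patch: statement-level trigger on a foreign table.
    CREATE TRIGGER ft_stmt AFTER UPDATE ON ft
        FOR EACH STATEMENT EXECUTE PROCEDURE log_stmt();

    -- Still rejected: row-level triggers would need the OLD/NEW tuple
    -- machinery discussed in the question above.
    CREATE TRIGGER ft_row AFTER UPDATE ON ft
        FOR EACH ROW EXECUTE PROCEDURE log_row();  -- ERROR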
Re: [HACKERS] Extensions makefiles - coverage
Thank you for the tip, it's done.

2013/7/26 Robert Haas robertmh...@gmail.com:
On Thu, Jul 25, 2013 at 11:07 AM, Ronan Dunklau rdunk...@gmail.com wrote:

Hello. I was having trouble figuring out how to use the coverage targets when using an extension. I am using approximately the layout that was proposed here: http://www.postgresql.org/message-id/51bb1b6e.2070...@dunslane.net It looks like everything is hard-coded to take the source and the gcda, gcno files in the base directory, but these files lie in a src directory with the proposed layout. It may be better to base the .gcda file discovery on the OBJS variables when using PGXS. Please find attached a small patch that implements this. There is probably a better way than the redundant rm $(gcda_files) / rm *.gcda to clean up the generated files. With the attached patch, the following targets seem to have the same behaviour as on the current HEAD, both on the whole tree and on individual contrib modules: coverage-html, clean, coverage-clean, clean-coverage. I noticed that make clean leaves gcda and gcov files on the current HEAD, and this is no different with the given patch. I also tested it against several pgxn extensions, and it seems to work fine.

You can ensure that your patch doesn't get forgotten about by adding it here: https://commitfest.postgresql.org/action/commitfest_view/open

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] Extensions makefiles - coverage
Hello. I was having trouble figuring out how to use the coverage targets when using an extension.

I am using approximately the layout that was proposed here: http://www.postgresql.org/message-id/51bb1b6e.2070...@dunslane.net

It looks like everything is hard-coded to take the source and the gcda, gcno files in the base directory, but these files lie in a src directory with the proposed layout. It may be better to base the .gcda file discovery on the OBJS variables when using PGXS. Please find attached a small patch that implements this. There is probably a better way than the redundant rm $(gcda_files) / rm *.gcda to clean up the generated files.

With the attached patch, the following targets seem to have the same behaviour as on the current HEAD, both on the whole tree and on individual contrib modules:
- coverage-html
- clean
- coverage-clean
- clean-coverage

I noticed that make clean leaves gcda and gcov files on the current HEAD, and this is no different with the given patch. I also tested it against several pgxn extensions, and it seems to work fine.

coverage_pgxs.patch Description: Binary data
Re: [HACKERS] Extensions makefiles - coverage
Please ignore this comment: "I noticed that make clean leaves gcda and gcov files on the current HEAD, and this is no different with the given patch."

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix conversion for Decimal arguments in plpython functions
The v2 patch does not work for me: regression tests for plpython fail on the plpython_types test: every numeric is converted to None. It seems the global decimal ctor is not initialized.

Please find two patches, to be applied on top of the v2 patch: one initializes the decimal ctor, the other uses cdecimal when possible.

Using the performance test from Steve, on my machine:
- with cdecimal installed: ~84ms
- without cdecimal installed (standard decimal module): ~511ms

2013/6/26 Szymon Guz mabew...@gmail.com
> On 26 June 2013 01:40, Steve Singer st...@ssinger.info wrote:
>> On 06/25/2013 06:42 AM, Szymon Guz wrote:
>>> Hi, I've attached a new patch. I've fixed all the problems you've
>>> found, except for the efficiency problem, which has been described in a
>>> previous email.
>>> thanks, Szymon
>>
>> This version of the patch addresses the issues I mentioned. Thanks for
>> looking into seeing if the performance issue is with our conversions to
>> strings or inherent with the python decimal type. I guess we
>> (PostgreSQL) can't do much about it. A runtime switch to use cdecimal if
>> it is available is a good idea, but I agree with you that could be a
>> different patch.
>>
>> One minor thing I noticed in this round: in
>> PLy_elog(ERROR, "could not import module 'decimal'");
>> I think "decimal" should be in double-quotes.
>>
>> I think this patch is ready for a committer to look at it.
>> Steve
>
> Hi Steve, thanks for the review. I was thinking about speeding up the
> Decimal conversion using the module you wrote about. What about trying to
> import it, and if it fails, then trying to load decimal.Decimal? There
> will be no warning in logs, just additional information in the
> documentation that it uses this module if it is available?
>
> thanks, Szymon

cnumeric.patch
Description: Binary data

null_ctor.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
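[For context, the conversion under discussion goes through the numeric's text representation. A minimal sketch of that step, not the actual patch; PLy_decimal_ctor stands for the cached Decimal constructor mentioned in this thread:

static PyObject *
PLyDecimal_FromNumeric(Datum d)
{
	char	   *str;
	PyObject   *value;

	/* get the numeric's canonical text form */
	str = DatumGetCString(DirectFunctionCall1(numeric_out, d));

	/* call Decimal(str); returns NULL and sets a Python error on failure */
	value = PyObject_CallFunction(PLy_decimal_ctor, "s", str);

	pfree(str);
	return value;
}

The text round-trip is what makes the conversion slow compared to plain float, hence the interest in the cdecimal accelerator discussed here.]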
Re: [HACKERS] [PATCH] Fix conversion for Decimal arguments in plpython functions
It seems like you confused me with Steve :)

The patch applies cleanly, and the regression tests pass on python2 when cdecimal is not installed. When it is, the type info returned for the converted numeric value is cdecimal.Decimal instead of decimal.Decimal. The regression tests' expected output has not been modified for python3, and as such they fail on the type conversions.

I am a bit confused by the use of PyModule_GetDict: shouldn't PyObject_GetAttrString be used directly instead? Moreover, the reference counting in the current implementation might be off: the reference count for the decimal module is never decreased, while the reference count of the module dict is, although the docs say PyModule_GetDict returns a borrowed reference.

Please find a patch that fixes both issues.

2013/6/26 Szymon Guz mabew...@gmail.com
> Thanks Steve, that's exactly what I wanted to send when you sent your
> patches :) I need to figure out why that patch v2 worked for me, I think I
> made a mess somewhere in my git repo and didn't create the patch properly.
> Sorry for that.
>
> Patch is attached, I've also added information about cdecimal to the
> plpython documentation. I'm just wondering how to make the integration
> tests check when cdecimal is installed and when it is not.
>
> thanks, Szymon
>
> On 26 June 2013 10:12, Ronan Dunklau rdunk...@gmail.com wrote:
>> The v2 patch does not work for me: regression tests for plpython fail on
>> the plpython_types test: every numeric is converted to None. It seems
>> the global decimal ctor is not initialized.
>> [...]

plpython_decimal_fix_regression.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
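[For illustration, the import-with-fallback plus the refcount handling discussed above could look roughly like this; a sketch, not the submitted patch:

static PyObject *PLy_decimal_ctor = NULL;

static void
PLy_init_decimal_ctor(void)
{
	PyObject   *decimal_module;

	/* prefer the C accelerator, silently fall back to stdlib decimal */
	decimal_module = PyImport_ImportModule("cdecimal");
	if (decimal_module == NULL)
	{
		PyErr_Clear();
		decimal_module = PyImport_ImportModule("decimal");
	}
	if (decimal_module == NULL)
		PLy_elog(ERROR, "could not import module \"decimal\"");

	/* fetch the constructor directly, no need for PyModule_GetDict */
	PLy_decimal_ctor = PyObject_GetAttrString(decimal_module, "Decimal");

	/* the module reference is ours and must be released */
	Py_DECREF(decimal_module);

	if (PLy_decimal_ctor == NULL)
		PLy_elog(ERROR, "could not find \"Decimal\" constructor");
}

Using PyObject_GetAttrString gives us a new reference to the constructor, which we keep for the interpreter's lifetime, while the temporary module reference is dropped, avoiding the borrowed-vs-owned confusion noted above.]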
Re: [HACKERS] [PATCH] Fix conversion for Decimal arguments in plpython functions
Concerning the efficiency problem, it should be noted that the latest 3.3 release of cpython introduces an accelerator for decimal data types, as a C module. This module was previously available from the Python package index at: https://pypi.python.org/pypi/cdecimal/2.2

It may be overkill to try to include such a dependency, but the performance overhead of using decimal is really mitigated by this implementation.

2013/6/25 Szymon Guz mabew...@gmail.com
> On 25 June 2013 05:16, Steve Singer st...@ssinger.info wrote:
>> On 05/28/2013 04:41 PM, Szymon Guz wrote:
>>> Hi, I've got a patch. This is for a plpython enhancement. There is an
>>> item at the TODO list
>>> http://wiki.postgresql.org/wiki/Todo#Server-Side_Languages
>>> "Fix loss of information during conversion of numeric type to Python
>>> float". This patch uses a decimal.Decimal type from the Python standard
>>> library for the plpython function numeric argument instead of float.
>>> Patch contains changes in code, documentation and tests. Most probably
>>> there is something wrong, as this is my first Postgres patch :)
>>
>> Thanks for contributing. This patch applies cleanly against master and
>> compiles with warnings:
>>
>> plpy_main.c: In function 'PLy_init_interp':
>> plpy_main.c:157:2: warning: ISO C90 forbids mixed declarations and code
>> [-Wdeclaration-after-statement]
>> plpy_main.c:161:2: warning: ISO C90 forbids mixed declarations and code
>> [-Wdeclaration-after-statement]
>>
>> You can avoid this by moving the declaration of decimal and decimal_dict
>> to the top of the function, where mainmod is declared. Also in this
>> function you've introduced places where it returns with an error (the
>> PLy_elog(ERROR...) calls) before decrementing the reference to mainmod.
>> I think you can decrement the mainmod reference after the call to
>> SetItemString, before your changes that import the Decimal module.
>>
>> The patch works as expected, I am able to write python functions that
>> take numerics as arguments and work with them. I can adjust the decimal
>> context precision inside of my function.
>>
>> One concern I have is that this patch makes pl/python functions
>> involving numerics more than 3 times as slow as before.
>>
>> create temp table b(a numeric);
>> insert into b select generate_series(1,10000);
>>
>> create or replace function x(a numeric, b numeric) returns numeric as $$
>> if a == None:
>>     return b
>> return a + b
>> $$ language plpythonu;
>>
>> create aggregate sm(basetype=numeric, sfunc=x, stype=numeric);
>>
>> test=# select sm(a) from b;
>>     sm
>> ----------
>>  50005000
>> (1 row)
>> Time: 565.650 ms
>>
>> versus before the patch this was taking in the range of 80ms.
>>
>> Would it be faster to call numeric_send instead of numeric_out and then
>> convert the sequence of Int16's to a tuple of digits that can be passed
>> into the Decimal constructor? I think this is worth trying and testing.
>>
>> Documentation
>> =============
>> Your patched version of the docs says:
>>
>>   PostgreSQL <type>real</type>, <type>double</type>, and
>>   <type>numeric</type> are converted to Python <type>Decimal</type>.
>>   This type is imported from <literal>decimal.Decimal</literal>.
>>
>> I don't think this is correct; as far as I can tell you're not changing
>> the behaviour for PostgreSQL real and double types, they continue to use
>> floating point:
>>
>>   <listitem>
>>    <para>
>>     PostgreSQL <type>real</type> and <type>double</type> are converted
>>     to Python <type>float</type>.
>>    </para>
>>   </listitem>
>>   <listitem>
>>    <para>
>>     PostgreSQL <type>numeric</type> is converted to Python
>>     <type>Decimal</type>. This type is imported from
>>     <literal>decimal.Decimal</literal>.
>>    </para>
>>   </listitem>
>
> Hi, I've attached a new patch. I've fixed all the problems you've found,
> except for the efficiency problem, which has been described in a previous
> email.
>
> thanks, Szymon

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Writable FDW: returning clauses.
Hello.

While implementing the new writable API for FDWs, I wondered whether there are any obvious problems with the following behavior for handling RETURNING clauses (for the DELETE case). The goal is to fetch all required attributes during the initial scan, and avoid fetching data on the delete operation itself:
- in the AddForeignUpdateTargets hook, add resjunk entries for the columns in the RETURNING clause;
- in the ExecForeignDelete hook, fill the returned slot with values taken from the planSlot.

(A sketch of these two hooks follows below.)

--
Ronan Dunklau
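[For illustration, here is a minimal sketch of the two hooks described above, modeled on how postgres_fdw carries its own junk column in the 9.3-era API. Function names are hypothetical; the hard-coded attribute number 1 is purely illustrative, real code would locate the junk column with ExecFindJunkAttributeInTlist:

/* Add a resjunk entry so the column survives until the modify step. */
static void
myAddForeignUpdateTargets(Query *parsetree,
                          RangeTblEntry *target_rte,
                          Relation target_relation)
{
	/* metadata of the first column, as an example */
	Form_pg_attribute attr = RelationGetDescr(target_relation)->attrs[0];
	Var		   *var;
	TargetEntry *tle;

	var = makeVar(parsetree->resultRelation,
				  1,	/* attribute number (illustrative) */
				  attr->atttypid,
				  attr->atttypmod,
				  attr->attcollation,
				  0);
	tle = makeTargetEntry((Expr *) var,
						  list_length(parsetree->targetList) + 1,
						  pstrdup(NameStr(attr->attname)),
						  true);	/* resjunk */
	parsetree->targetList = lappend(parsetree->targetList, tle);
}

/* At delete time, build the returned tuple from the plan slot. */
static TupleTableSlot *
myExecForeignDelete(EState *estate, ResultRelInfo *rinfo,
                    TupleTableSlot *slot, TupleTableSlot *planSlot)
{
	bool		isnull;
	Datum		value;

	/* ... perform the remote delete here ... */

	/* copy the value fetched during the scan out of planSlot */
	value = slot_getattr(planSlot, 1, &isnull);

	ExecClearTuple(slot);
	slot->tts_values[0] = value;
	slot->tts_isnull[0] = isnull;
	ExecStoreVirtualTuple(slot);
	return slot;
}

This way the RETURNING values come for free from the scan, with no extra round trip at delete time.]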
[HACKERS] FDW: ForeignPlan and parameterized paths
Hello.

I've noticed that, when implementing an FDW, it is difficult to use a plan whose best path is a parameterized path. This comes from the fact that the parameterized clause is not easily available at plan time.

This is what I understood of how it works:
- The clauses coming from the best path's restrictinfo are not available in the scan_clauses argument to the GetForeignPlan function.
- They are, however, directly available on the path, but at this point the clauses are of the form "InnerVar OPERATOR OuterVar". The outer Var node is then replaced by a Param node, using the replace_nestloop_params function.

It could be useful to make the parameterized version of the clause (in the form "InnerVar OPERATOR Param") available to the FDW at plan time. Could this be possible? Maybe by replacing the clauses on the restrictinfo nodes from the path's param info by the parameterized clauses, and then adding these to the scan clauses passed to GetForeignPlan?

Regards,
--
Ronan Dunklau

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FDW: ForeignPlan and parameterized paths
> I intentionally did the nestloop_params substitution after calling
> GetForeignPlan not before. It's not apparent to me why it would be useful
> to do it before, because the FDW is going to have no idea what those
> params represent. (Note that they represent values coming from some
> other, probably local, relation; not from the foreign table.)

Even if the FDW has no idea what they represent, it can identify a clause of the form "Var Operator Param", which allows it to store the param reference (paramid) for retrieving the param value at execution time. If the chosen best path is a parameterized path that has been built by the FDW, this allows it to push down the restriction. If this isn't possible, the only way I found to use those clauses would be at scan time.

Let's assume atable is a local relation, and aftable is a foreign table, and the query looks like this:

select * from atable t1 inner join aftable t2 on t1.c1 = t2.c1

The FDW identifies the join clause on its column c1, and builds a parameterized path on this column (maybe because this column is unique and indexed on the remote side). The planner chooses this path, building a nested loop rescanning the foreign table with a parameter value reflecting the outer relation's value (maybe because the local relation's size is much smaller than the remote relation's size).

In that case, it seems to be of particular importance to have access to the clause, so that the nested loop can work as intended: avoiding a full seqscan on the remote side. Or is there another way to achieve the same goal?

Regards,
--
Ronan Dunklau

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
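[For context, building such a parameterized path in GetForeignPaths could look roughly like this; a sketch against the 9.2-era API, in addition to the plain unparameterized path, with made-up row counts and costs:

static void
myAddParameterizedPaths(PlannerInfo *root, RelOptInfo *baserel)
{
	ListCell   *lc;

	/* offer one parameterized path per potentially usable join clause */
	foreach(lc, baserel->joininfo)
	{
		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
		Relids		required_outer;

		if (!is_opclause(rinfo->clause))
			continue;

		/* the rels whose values we would need as parameters */
		required_outer = bms_difference(rinfo->clause_relids,
										baserel->relids);
		if (bms_is_empty(required_outer))
			continue;

		add_path(baserel, (Path *)
				 create_foreignscan_path(root, baserel,
										 100,	/* rows: remote-index guess */
										 10.0,	/* startup cost (assumed) */
										 310.0,	/* total cost (assumed) */
										 NIL,	/* no pathkeys */
										 required_outer,
										 NIL));	/* no fdw_private */
	}
}

The planner costs each parameterized variant against the plain full scan; when the parameterized one is cheap enough, it picks the nested loop shown in the EXPLAIN output elsewhere in this thread.]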
Re: [HACKERS] [v9.3] writable foreign tables
Hello.

I've tried to implement this API for our Multicorn FDW, and I have a few questions about it.

First, I don't understand what the requirements on the rowid pseudo-column values are. In particular, this sentence from a previous mail makes it ambiguous to me: "At the beginning, I constructed the rowid pseudo-column using INTERNALOID, but it made troubles when fetched tuples are materialized prior to table modification. So, the rowid argument of them are re-defined as const char *."

Does that mean that one should only store a cstring in the rowid pseudo-column? Or can one store an arbitrary pointer? Currently, I'm building a text value using cstring_to_text_with_len. Could there be a problem with that?

Secondly, how does one use a RETURNING clause? I've tried to look at the postgres_fdw code, but it does not seem to handle that.

For what it's worth, knowing that the postgres_fdw is still in WIP status, here are a couple of results from my experimentation with it:
- Inserting into a foreign table mapped to a table with a before trigger, using a RETURNING clause, will return the values as they were before being modified by the trigger.
- Same thing, but if the trigger prevents insertion (returning NULL), the would-have-been inserted row is still returned, although the insert statement reports zero rows.
- Inserting into a table with default values does not work as intended, since missing values are replaced by a null value in the remote statement.

What can be done to make the behaviour more consistent?

I'm very excited about this feature, thank you for making this possible.

Regards,
--
Ronan Dunklau

2012/12/14 Albe Laurenz laurenz.a...@wien.gv.at
> Kohei KaiGai wrote:
>
> I came up with one more query that causes a problem: [...]
> This causes a deadlock, but one that is not detected; the query just
> keeps hanging. The UPDATE in the CTE has the rows locked, so the SELECT
> ... FOR UPDATE issued via the FDW connection will hang indefinitely. I
> wonder if that's just a pathological corner case that we shouldn't worry
> about. Loopback connections for FDWs themselves might not be so rare,
> for example as a substitute for autonomous subtransactions. I guess it
> is not easily possible to detect such a situation or to do something
> reasonable about it.
>
> It is not avoidable problem due to the nature of distributed database
> system, not only loopback scenario. In my personal opinion, I'd like to
> wait for someone implements distributed lock/transaction manager on top
> of the background worker framework being recently committed, to
> intermediate lock request. However, it will take massive amount of
> efforts to existing lock/transaction management layer, not only
> enhancement of FDW APIs. It is obviously out of scope in this patch.
> So, I'd like to suggest authors of FDW that support writable features
> to put mention about possible deadlock scenario in their documentation.
> At least, above writable CTE example is a situation that two different
> sessions concurrently update the test relation, thus, one shall be
> blocked.
>
> Fair enough. I tried to overhaul the documentation, see the attached
> patch. There was one thing that I was not certain of: you say that for
> writable foreign tables, BeginForeignModify and EndForeignModify *must*
> be implemented. I thought that these were optional, and if you can do
> your work with just, say, ExecForeignDelete, you could do that.
>
> Yes, that's right. What I wrote was incorrect. If the FDW driver does
> not require any state during modification of foreign tables, indeed,
> these are not mandatory handlers.
>
> I have updated the documentation, that was all I changed in the attached
> patches.
>
> OK. I split the patch into two portions: part-1 is the patch relevant to
> the APIs, part-2 is relevant to postgres_fdw.
>
> Great. I'll mark the patch as ready for committer.
>
> Yours,
> Laurenz Albe

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Arguments to foreign tables?
Maybe you could set some options on the foreign table before selecting from it? Another way you could achieve the same result would be to give some column a special meaning (as is done in the twitter_fdw, for example).

If you don't mind, do you have a specific use case for this?

--
Ronan Dunklau

2012/11/6 Jeff Davis pg...@j-davis.com
> On Sun, 2012-11-04 at 15:13 -0500, Tom Lane wrote:
>> Jeff Davis pg...@j-davis.com writes:
>>> Is there any fundamental or philosophical reason why a foreign table
>>> can't accept arguments?
>>
>> That isn't a table; it's some sort of function. Now that we have
>> LATERAL, there is no good reason to contort SQL's syntax and semantics
>> in the direction you suggest.
>
> Maybe I should rephrase this as a problem with SRFs: you don't get to
> define the init/exec/end executor functions, and you don't get access to
> the optimizer information. It seems like foreign tables are a better
> mechanism (except for the simple cases where you don't care about the
> details), and the only thing an SRF can do that a foreign table can't is
> accept arguments. So, I thought maybe it would make more sense to
> combine the mechanisms somehow.
>
> Take something as simple as generate_series: right now, it materializes
> the entire thing if it's in the FROM clause, but it wouldn't need to if
> it could use the foreign table mechanism.
>
> Regards,
> 	Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
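[For reference, reading such per-table options from inside an FDW is straightforward. A minimal sketch; the "search_term" option name is purely hypothetical, standing for whatever the user set with ALTER FOREIGN TABLE ... OPTIONS beforehand:

#include "foreign/foreign.h"
#include "commands/defrem.h"

static char *
myGetSearchTerm(Oid foreigntableid)
{
	ForeignTable *table = GetForeignTable(foreigntableid);
	ListCell   *lc;

	foreach(lc, table->options)
	{
		DefElem    *def = (DefElem *) lfirst(lc);

		/* pick up the option if the user has set it on the table */
		if (strcmp(def->defname, "search_term") == 0)
			return defGetString(def);
	}
	return NULL;	/* option not set */
}

The same pattern works for server- and wrapper-level options via GetForeignServer and GetForeignDataWrapper.]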
[HACKERS] Query planning, nested loops and FDW.
Hello.

Thanks to the indications given by Tom Lane, I was able to make use of parameterized paths in our multicorn extension.

For example, given a (local) table real_table and another (foreign) table foreign_table, having the same set of columns, if the foreign table declares that a filter on col1 and col2 will, on average, return 100 rows (vs 1 million for an unparameterized path), postgresql will produce the following plan:

explain select real_table.*, foreign_table.* from real_table
   inner join foreign_table on real_table.col1 = foreign_table.col1
                           and real_table.col2 = foreign_table.col2;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Nested Loop  (cost=10.00..19145165.27 rows=50558 width=64)
   ->  Seq Scan on real_table  (cost=0.00..38.27 rows=2127 width=32)
   ->  Foreign Scan on foreign_table  (cost=10.00..9000.00 rows=100 width=90)
         Filter: ((real_table.col1 = (col1)::text) AND (real_table.col2 = (col2)::text))

This is exactly what I wanted to achieve. But there is still room for improvement here, because the inner foreign scan will be executed for each row, and not for each distinct couple of (col1, col2).

Does this happen because:
- there is no other way to proceed, or
- the planner thinks it is less costly to execute the inner scan multiple times than to do whatever it takes to execute it only once for each distinct couple?

Best regards,
--
Ronan Dunklau

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PG9.2 and FDW query planning.
On 05/07/2012 18:30, Tom Lane wrote:
> Ronan Dunklau rdunk...@gmail.com writes:
>> Let's say I have an IMAP foreign data wrapper, and I write a query
>> joining the table on itself using the In-Reply-To and Message-ID
>> headers, is there anything I can do to avoid fetching all the mails
>> from the remote server? If I could somehow inform the planner that it
>> can look up rows by message-id, thus avoiding the need to fetch
>> everything from the remote server. Perhaps persuading the planner to
>> use a nested-loop?
>
> OK, so what you're saying is that the imap server can effectively
> provide an index on message_id. What you'd do is create a parameterized
> path that uses the tbl.message_id = other_tbl.in_reply_to join clause.
> If that's enough cheaper than a full scan, the planner would select it.

Thank you, I was able to build such paths from your indications.

The python FDW implementor can optionally give a list of tuples consisting of (path key, expected_number_of_row). So, in the imap example, that could be [('Message-ID', 1), ('From', 1000)] for example.

From this information, if there is an equivalence class whose restrictinfo uses one of those keys, we build a parameterized path, with an associated cost of base_width * expected_number_of_row, in addition to the generic, unparameterized path. The planner can then select this path, and build plans looking like this:

postgres=# explain select m1."From", m1."To", m2."From", m2."To"
           from mails m1
           inner join mails m2 on m2."Message-ID" = m1."In-Reply-To"
           where m1."From" = '%t...@example.com%';
                                 QUERY PLAN
----------------------------------------------------------------------------
 Nested Loop  (cost=10.00..60001000.00 rows=5 width=128)
   ->  Foreign Scan on mails m1  (cost=0.00..3000.00 rows=10 width=300)
         Filter: (("From")::text = '%t...@example.com%'::text)
   ->  Foreign Scan on mails m2  (cost=10.00..300.00 rows=1 width=300)
         Filter: (("Message-ID")::text = (m1."In-Reply-To")::text)

If I understand it correctly, after returning a ForeignScan (from GetForeignPlan), the planner decides to use a nestloop, and in the process of creating the nestloop plan, replaces Var nodes coming from the outer rel (here, m1."In-Reply-To") by Param nodes.

My current implementation already looks for (var = param) expressions that it may handle during the plan phase, and stores the association between the var and the param_id. At execution time, the needed parameter values are fetched (from the ParamExecData array found in es_param_exec_vals) and passed to the python foreign data wrapper.

The problem I have: how can I retrieve the generated params and keep the association between the var and the param? Should I replace the (var = outervar) clauses by (var = param) myself and store them in the fdw_exprs field of the foreign scan?

> FWIW, I'm not sure that it's sane to try to expose this stuff to python
> yet. It's complicated and still something of a moving target. Once
> we've got a few more C-coded FDWs that can do this type of optimization,
> things might be stable enough that it'd be useful to try to provide a
> high-level API.

The current API (as mentioned above) would be more declarative than anything, only offering a way to (maybe) build parameterized paths without guaranteeing anything. Even if the internals change, I fail to see how it can hurt to offer such a feature.

Regards,
--
Ronan Dunklau

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
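[For what it's worth, the usual answer to the fdw_exprs question is the pattern postgres_fdw later adopted: store the Param-bearing expressions in fdw_exprs at plan time and evaluate them at scan time. A rough sketch against the 9.2-era executor API; the MyFdwScanState struct is hypothetical:

/* Plan time, in GetForeignPlan: keep the clauses we will evaluate. */
return make_foreignscan(tlist,
                        scan_clauses,
                        baserel->relid,
                        param_clauses,	/* fdw_exprs: exprs containing Params */
                        fdw_private);

/* Executor time: initialize the expressions once... */
static void
myBeginForeignScan(ForeignScanState *node, int eflags)
{
	ForeignScan *plan = (ForeignScan *) node->ss.ps.plan;
	MyFdwScanState *state = palloc0(sizeof(MyFdwScanState));

	/* ExecInitExpr on a List returns a List of ExprStates */
	state->param_exprs = (List *)
		ExecInitExpr((Expr *) plan->fdw_exprs, (PlanState *) node);
	node->fdw_state = state;
}

/* ...and evaluate them on each (re)scan to get the current values. */
static void
myReScanForeignScan(ForeignScanState *node)
{
	MyFdwScanState *state = (MyFdwScanState *) node->fdw_state;
	ExprContext *econtext = node->ss.ps.ps_ExprContext;
	ListCell   *lc;

	foreach(lc, state->param_exprs)
	{
		ExprState  *expr = (ExprState *) lfirst(lc);
		bool		isnull;
		Datum		value;

		value = ExecEvalExpr(expr, econtext, &isnull, NULL);
		/* hand value/isnull over to the remote query here */
	}
}

This sidesteps tracking paramids by hand: the executor resolves the Params inside the stored expressions for you.]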
[HACKERS] PG9.2 and FDW query planning.
Hello.

I'm in the process of porting our multicorn extension to pg9.2, and I'd like to take advantage of the GetForeignPaths hook. The multicorn extension allows the creation of an FDW in python using a simple API, and I'd like to be able to provide FDW implementors a way to influence the planner.

Let's say I have an IMAP foreign data wrapper, and I write a query joining the table on itself using the In-Reply-To and Message-ID headers: is there anything I can do to avoid fetching all the mails from the remote server? If I could somehow inform the planner that it can look up rows by message-id, thus avoiding the need to fetch everything from the remote server. Perhaps persuading the planner to use a nested-loop?

If I understand correctly, I should build one (or more) list of pathkeys, and add more foreign paths using those pathkeys. There was a discussion about indexes on foreign tables back in March. From what I understand of this discussion, someone proposed to locally store information about indexes on the foreign tables, but I did not find anything on how to build a path from scratch.

Thank you.
--
Ronan Dunklau

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
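[As for the mechanics of "building a path from scratch": it amounts to calling create_foreignscan_path and handing the result to add_path. A minimal sketch with the 9.2-era signature; costs are placeholders, and claiming a sort order would mean passing a non-NIL pathkeys list instead of NIL, provided the FDW can actually deliver rows in that order:

static void
myGetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid)
{
	add_path(baserel, (Path *)
			 create_foreignscan_path(root, baserel,
									 baserel->rows,	/* estimated rows */
									 10.0,		/* startup cost (assumed) */
									 1000.0,	/* total cost (assumed) */
									 NIL,		/* pathkeys: none claimed */
									 NULL,		/* not parameterized */
									 NIL));		/* fdw_private */
}

The thread's eventual answer to the lookup-by-message-id question was parameterized paths (the required_outer argument) rather than pathkeys, as the follow-up messages above show.]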
Re: [HACKERS] CREATE FOREGIN TABLE LACUNA
On 14/03/2012 16:47, David Fetter wrote:
> On Wed, Mar 14, 2012 at 11:29:14AM -0400, Tom Lane wrote:
>> David Fetter da...@fetter.org writes:
>>> On Wed, Mar 14, 2012 at 10:27:28AM -0400, Tom Lane wrote:
>>>> Hm. That opinion seems to me to connect to the recently-posted patch
>>>> to make contrib/file_fdw enforce NOT NULL constraints. Should we
>>>> instead have the position that constraints declared for foreign
>>>> tables are statements that we can take on faith, and it's the user's
>>>> fault if they are wrong?
>>>
>>> I think that's something FDWs need to be able to communicate to
>>> PostgreSQL. For example, something talking to another PostgreSQL would
>>> (potentially, anyhow) have access to deep knowledge of the remote
>>> side, but file_fdw would only have best efforts even for clever things
>>> like statistics.
>>
>> I think we're talking at cross-purposes. What you're saying seems to
>> assume that it's the system's responsibility to do something about a
>> constraint declared on a foreign table. What I'm suggesting is that
>> maybe it isn't.
>
> Actually, I'm suggesting that this behavior needs to be controlled, not
> system-wide, but per FDW, and eventually per server, table and column.
>
>> A constraint, ordinarily, would be enforced against table *updates*,
>> and then just assumed valid during reads. In the case of a foreign
>> table, we can't enforce constraints during updates because we don't
>> have control of all updates. Should we ignore declared constraints
>> because they're not necessarily true? Should we assume on faith that
>> they're true?
>
> Neither. We should instead have ways for FDWs to say which constraints
> are local-only, and which presumed correct on the remote side. If they
> lie when asserting the latter, that's pilot error.

I don't understand what value that would bring. Do you propose that, if an FDW declares a constraint as local-only, the planner should ignore it, but when it is declared as remote, the planner could use this information?

Let me describe a simple use case we have in one of our web applications, which would benefit from foreign keys on foreign tables.

The application has users, stored in a users table, which can upload files. The files are stored on the server's filesystem, using one folder per user, named after the user_id. For example:

/
  1/
    myfile.png
  2/
    myotherfile.png

This filesystem is accessed using the StructuredFS FDW, which maps a filesystem tree to a set of columns corresponding to parts of the file path: every file whose path matches the pattern results in a row. Using the aforementioned structure, the foreign table would have a user_id column and a filename column.

Now, the FDW itself cannot know that the foreign key will be enforced, but as the application developer, I know that every directory will be named after a user_id. Allowing foreign keys on such a foreign table would let us describe the model more precisely in PostgreSQL, and external tools could use this knowledge too, even if PostgreSQL completely ignores it, especially ORMs relying on foreign keys to determine join conditions between tables.

On the other hand, should foreign keys referencing a foreign table be allowed too? From a foreign table to another? From a local table to a foreign table?

Regards,
--
Ronan Dunklau

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers