Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
On 1/9/15, 6:54 PM, Jim Nasby wrote:
> On 1/9/15, 6:44 PM, Petr Jelinek wrote:
>>> Yep, I had a same impression when I looked at the code first time,
>>> however, it is defined as below. Not a manner of custom-scan itself.
>>>
>>> /*
>>>  * ==========
>>>  * Scan nodes
>>>  * ==========
>>>  */
>>> typedef struct Scan
>>> {
>>>     Plan    plan;
>>>     Index   scanrelid;      /* relid is index into the range table */
>>> } Scan;
>>
>> Yeah, there are actually several places in the code where relid means an
>> index into the range table and not the OID of a relation; it still
>> manages to confuse me. Nothing this patch can do about that.
>
> Well, since it's confused 3 of us now... should we change it (as a
> separate patch)? I'm willing to do that work but don't want to waste time
> if it'll just be rejected. Any other examples of this I should fix too?

Sorry, to clarify... any other items besides Scan.scanrelid that I should fix?

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixing memory leak in pg_upgrade
On Fri, Jan 9, 2015 at 9:23 PM, Tatsuo Ishii <is...@postgresql.org> wrote:
>> This is because gen_db_file_maps() allocates memory even if n_maps == 0.
>
> Purely cosmetic: the initialization n_maps = 0 before the call of
> gen_db_file_maps() is unnecessary ;)

Of course. n_maps is written by calling gen_db_file_maps() anyway. I was talking about the case after calling gen_db_file_maps().

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
On 1/9/15, 6:44 PM, Petr Jelinek wrote:
>> Yep, I had a same impression when I looked at the code first time,
>> however, it is defined as below. Not a manner of custom-scan itself.
>>
>> /*
>>  * ==========
>>  * Scan nodes
>>  * ==========
>>  */
>> typedef struct Scan
>> {
>>     Plan    plan;
>>     Index   scanrelid;      /* relid is index into the range table */
>> } Scan;
>
> Yeah, there are actually several places in the code where relid means an
> index into the range table and not the OID of a relation; it still
> manages to confuse me. Nothing this patch can do about that.

Well, since it's confused 3 of us now... should we change it (as a separate patch)? I'm willing to do that work but don't want to waste time if it'll just be rejected. Any other examples of this I should fix too?

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Escaping from blocked send() reprised.
On 2014-11-17 18:22:54 +0300, Alex Shulgin wrote:
> Andres Freund <and...@2ndquadrant.com> writes:
>> I've invested some more time in this: 0002 now makes sense on its own
>> and doesn't change anything around the interrupt handling. Oh, and it
>> compiles without 0003.
>
> In this patch, the endif appears to be misplaced in PostgresMain:
>
> +   if (MyProcPort != NULL)
> +   {
> +#ifdef WIN32
> +       pgwin32_noblock = true;
> +#else
> +       if (!pg_set_noblock(MyProcPort->sock))
> +           ereport(COMMERROR,
> +                   (errmsg("could not set socket to nonblocking mode: %m")));
> +   }
> +#endif

Uh. Odd. Anyway, that bit of code is now somewhere else anyway...

> One thing I did try is sending a NOTICE to the client when in
> ProcessInterrupts() and DoingCommandRead is true. I think[1] it was
> expected to be delivered instantly, but actually the client (psql) only
> displays it after sending the next statement.

Yea, that should be psql specific though. I hope ;)

> While I'm reading on FE/BE protocol someone might want to share his
> wisdom on this subject. My guess: psql blocks on readline/libedit call
> and can't effectively poll the server socket before complete input from
> user.

I'm not sure if it's actually a can't. It doesn't at least ;)

Greetings,

Andres Freund
[HACKERS] max_connections documentation
I'm surprised to see that the docs make no mention of how max_connections, max_worker_processes and autovacuum_max_workers (don't) relate. I couldn't remember and had to actually look at the code.

I'd like to clarify this in the max_connections section of the documentation by doing s/connections/user connections/ and including the formula for MaxBackends (MaxBackends = MaxConnections + autovacuum_max_workers + 1 + max_worker_processes). I'll also mention that any postgres_fdw connections are considered user connections.

Objections? Comments?

Also, my understanding is that the parallel stuff will continue to fall under max_worker_processes?

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Patch: [BUGS] BUG #12320: json parsing with embedded double quotes
On Jan 9, 2015, at 11:37 AM, Andrew Dunstan <and...@dunslane.net> wrote:
> On 01/08/2015 08:42 PM, Aaron Botsis wrote:
>> On Jan 8, 2015, at 3:44 PM, Andrew Dunstan <and...@dunslane.net> wrote:
>>> On 01/08/2015 03:05 PM, Aaron Botsis wrote:
>>> It's also unnecessary. CSV format, while not designed for this, is
>>> nevertheless sufficiently flexible to allow successful import of json
>>> data meeting certain criteria (essentially no newlines), like this:
>>>
>>>     copy the_table(jsonfield) from '/path/to/jsondata'
>>>         csv quote e'\x01' delimiter e'\x02';
>>
>> While perhaps unnecessary, given the size and simplicity of the patch,
>> IMO it's a no-brainer to merge (it actually makes the code smaller by 3
>> lines). It also enables non-json use cases anytime one might want to
>> preserve embedded escapes, or use different ones entirely. Do you see
>> other reasons not to commit it?
>
> Well, for one thing it's seriously incomplete. You need to be able to
> change the delimiter as well. Otherwise, any embedded tab in the json
> will cause you major grief. Currently the delimiter and the escape MUST
> be a single-byte non-nul character, and there is a check for this in csv
> mode. Your patch would allow any arbitrary string (including one of zero
> length) for the escape in text mode, and would then silently ignore all
> but the first byte. That's not the way we like to do things. And,
> frankly, I would need to spend quite a lot more time thinking about
> other implications than I have given it so far. This is an area where I
> tend to be VERY cautious about making changes. This is a fairly fragile
> ecosystem.
>
>> So I'm going to do a bit more testing with another patch tomorrow with
>> delimiters removed. If you can think of any specific cases you think
>> will break it, let me know and I'll make sure to add regression tests
>> for them as well.
>
> Well, I still need convincing that this is the best solution to the
> problem. As I said, I need to spend more time thinking about it.

I offered an alternative (RAW mode w/record terminator) that you ignored.
So in lieu of a productive discussion about "the best solution to the problem", I went ahead and continued to complete the patch, since I believe it's a useful feature that could be helpful for this and other use cases. There have been multiple times (not even w/json) I wished COPY would stop being so smart.

FWIW, (if anyone's interested in it) I also hacked up some python that'll read a json file and output a binary file suitable for use with COPY BINARY that gets around all this stuff. Obviously this only works for json (not jsonb) columns (though you could SELECT INTO a jsonb column). Happy to pass it along.

Aaron
Re: [HACKERS] Parallel Seq Scan
On Fri, Jan 9, 2015 at 10:54 PM, Stephen Frost <sfr...@snowman.net> wrote:
> * Amit Kapila (amit.kapil...@gmail.com) wrote:
>> In our case, as currently we don't have a mechanism to reuse parallel
>> workers, we need to account for that cost as well. So based on that, I
>> am planning to add three new parameters: cpu_tuple_comm_cost,
>> parallel_setup_cost, and parallel_startup_cost.
>>
>> * cpu_tuple_comm_cost - cost of CPU time to pass a tuple from worker to
>>   master backend, with default value DEFAULT_CPU_TUPLE_COMM_COST as 0.1;
>>   this will be multiplied by the number of tuples expected to be selected
>> * parallel_setup_cost - cost of setting up shared memory for
>>   parallelism, with default value 100.0
>> * parallel_startup_cost - cost of starting up parallel workers, with
>>   default value 1000.0, multiplied by the number of workers decided for
>>   the scan
>>
>> I will do some experiments to finalise the default values, but in
>> general, I feel developing a cost model on the above parameters is good.
>
> The parameters sound reasonable but I'm a bit worried about the way
> you're describing the implementation. Specifically this comment: "Cost
> of starting up parallel workers with default value as 1000.0 multiplied
> by number of workers decided for scan."
>
> That appears to imply that we'll decide on the number of workers, figure
> out the cost, and then consider parallel as one path and not-parallel as
> another. I'm worried that if I end up setting the max parallel workers
> to 32 for my big, beefy, mostly-single-user system then I'll actually
> end up not getting parallel execution because we'll always be including
> the full startup cost of 32 threads. For huge queries, it'll probably
> be fine, but there's a lot of room to parallelize things at levels less
> than 32 which we won't even consider.
Actually, the main factor in deciding whether a parallel plan will be selected will be selectivity and cpu_tuple_comm_cost; parallel_startup_cost is mainly to prevent the cases where the user has set parallel_seqscan_degree but the table is small enough (let us say 10,000 tuples) that it doesn't need parallelism. If you are worried about the default cost parameters, then I think those still need to be decided based on experiments.

> What I was advocating for up-thread was to consider multiple parallel
> paths and to pick whichever ends up being the lowest overall cost. The
> flip-side to that is increased planning time.

The main idea behind providing a parameter like parallel_seqscan_degree is that it will try to use that many workers for a single parallel operation (intra-node parallelism), and in case we have to perform inter-node parallelism, having such a parameter means that each node can use that many parallel workers. For example, if we have to parallelize a scan as well as a sort (Select * from t1 order by c1) and parallel_degree is specified as 2, then the scan and the sort can each use 2 parallel workers. This is somewhat similar to how degree of parallelism (DOP) works in other databases; refer to the case of Oracle [1] (Setting Degree of Parallelism).

I don't deny that it would be an idea worth exploring to make the optimizer smarter about deciding parallel plans, but it seems to me that is an advanced topic which will become more valuable when we try to parallelize joins or other similar stuff, and most papers talk about it only in that regard. At this moment, ensuring that a parallel plan is not selected for cases where it will perform poorly is more than enough, considering we have lots of other work left to even make any parallel operation work.
> Perhaps we can come up with an efficient way of working out where the
> break-point is based on the non-parallel cost and go at it from that
> direction instead of building out whole paths for each increment of
> parallelism. I'd really like to be able to set the 'max parallel' high
> and then have the optimizer figure out how many workers should actually
> be spawned for a given query.

Execution: Most other databases do partition-level scans, with partitions on different disks handled by individual parallel workers. However, it seems Amazon DynamoDB [2] also works on something similar to what I have used in the patch, which means on fixed blocks. I think this kind of strategy seems better than dividing the blocks at runtime, because dividing the blocks randomly among workers could turn a parallel sequential scan into a random scan.

> Yeah, we also need to consider the i/o side of this, which will
> definitely be tricky. There are i/o systems out there which are faster
> than a single CPU and ones where a single CPU can manage multiple i/o
> channels. There are also cases where the i/o system handles sequential
> access nearly as fast as random and cases where sequential is much
> faster than random. Where we can get an idea of that distinction is
> with seq_page_cost vs. random_page_cost as folks running
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
On 10/01/15 01:19, Kohei KaiGai wrote:
> 2015-01-10 8:18 GMT+09:00 Jim Nasby <jim.na...@bluetreble.com>:
>> On 1/6/15, 5:43 PM, Kouhei Kaigai wrote:
>>>> scan_relid != InvalidOid
>>>> Ideally, they should be OidIsValid(scan_relid)
>>> Scan.scanrelid is an index into the range-tables list, not an
>>> object-id. So, InvalidOid or OidIsValid() are not a good choice.
>> I think the name needs to change then; scan_relid certainly looks like
>> the OID of a relation. scan_index?
> Yep, I had a same impression when I looked at the code first time,
> however, it is defined as below. Not a manner of custom-scan itself.
>
> /*
>  * ==========
>  * Scan nodes
>  * ==========
>  */
> typedef struct Scan
> {
>     Plan    plan;
>     Index   scanrelid;      /* relid is index into the range table */
> } Scan;

Yeah, there are actually several places in the code where relid means an index into the range table and not the OID of a relation; it still manages to confuse me. Nothing this patch can do about that.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Updating copyright notices to 2015 for PGDG
On 1/6/15, 6:15 PM, Tom Lane wrote:
> Jim Nasby <jim.na...@bluetreble.com> writes:
>> On 1/6/15, 3:30 PM, Stefan Kaltenbrunner wrote:
>>> I dont know why it is really needed but maybe for the files that have
>>> identical copyrights one could simply reference the COPYRIGHT file we
>>> already have in the tree?
>> +1
>
> Unless either of you is a copyright lawyer, your opinion on whether
> that's sufficient is of zero relevance.

No, but a friend's dad is. I could probably get his opinion on it if that would be helpful.

> Personally I think it's just fine if we have some mechanism that forces
> text files to have trailing newlines ;-)

Which is what I was suggesting... ;) AFAIK that would be a matter of installing the appropriate hook on git.postgresql.org. https://www.google.com/search?q=git+enforce+trailing+newline

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
2015-01-10 9:56 GMT+09:00 Jim Nasby <jim.na...@bluetreble.com>:
> On 1/9/15, 6:54 PM, Jim Nasby wrote:
>> On 1/9/15, 6:44 PM, Petr Jelinek wrote:
>>>> Yep, I had a same impression when I looked at the code first time,
>>>> however, it is defined as below. Not a manner of custom-scan itself.
>>>>
>>>> /*
>>>>  * ==========
>>>>  * Scan nodes
>>>>  * ==========
>>>>  */
>>>> typedef struct Scan
>>>> {
>>>>     Plan    plan;
>>>>     Index   scanrelid;      /* relid is index into the range table */
>>>> } Scan;
>>>
>>> Yeah, there are actually several places in the code where relid means
>>> an index into the range table and not the OID of a relation; it still
>>> manages to confuse me. Nothing this patch can do about that.
>>
>> Well, since it's confused 3 of us now... should we change it (as a
>> separate patch)? I'm willing to do that work but don't want to waste
>> time if it'll just be rejected. Any other examples of this I should
>> fix too?
>
> Sorry, to clarify... any other items besides Scan.scanrelid that I
> should fix?

This naming is a little bit confusing; however, I don't think it should be changed, because this structure has been used for a long time, so reworking it would prevent back-patching when we find bugs around scanrelid.

Thanks,
--
KaiGai Kohei <kai...@kaigai.gr.jp>
Re: [HACKERS] Parallel Seq Scan
On Sat, Jan 10, 2015 at 2:45 AM, Stefan Kaltenbrunner <ste...@kaltenbrunner.cc> wrote:
> On 01/09/2015 08:01 PM, Stephen Frost wrote:
>> Amit,
>> * Amit Kapila (amit.kapil...@gmail.com) wrote:
>>> On Fri, Jan 9, 2015 at 1:02 AM, Jim Nasby <jim.na...@bluetreble.com> wrote:
>>>> I agree, but we should try and warn the user if they set
>>>> parallel_seqscan_degree close to max_worker_processes, or at least
>>>> give some indication of what's going on. This is something you could
>>>> end up beating your head on wondering why it's not working.
>>> Yet another way to handle the case when enough workers are not
>>> available is to let the user specify the desired minimum percentage of
>>> requested parallel workers with a parameter like
>>> PARALLEL_QUERY_MIN_PERCENT. For example, if you specify 50 for this
>>> parameter, then at least 50% of the parallel workers requested for any
>>> parallel operation must be available in order for the operation to
>>> succeed, else it will give an error. If the value is set to null, then
>>> all parallel operations will proceed as long as at least two parallel
>>> workers are available for processing.
>> Now, for debugging purposes, I could see such a parameter being
>> available but it should default to 'off/never-fail'.
> not sure what it really would be useful for - if I execute a query I
> would truly expect it to get answered - if it can be made faster if done
> in parallel that's nice but why would I want it to fail?

One use case where I could imagine it being useful is when a query would take many hours if run sequentially but could finish in minutes if run with 16 parallel workers. Now, during execution, if fewer than 30% of the requested parallel workers are available, that might not be acceptable to the user, and he would rather wait for some time and run the query again; and if he wants to run the query even if only 2 workers are available, he can choose not to set such a parameter. Having said that, I also feel this doesn't seem to be an important enough case to introduce a new parameter and such behaviour.
I mentioned it because it came across my eyes how some other databases handle such a situation. Let's forget this suggestion if we can't imagine any use for such a parameter.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] max_connections documentation
On Sat, Jan 10, 2015 at 6:20 AM, Jim Nasby <jim.na...@bluetreble.com> wrote:
> I'm surprised to see that the docs make no mention of how
> max_connections, max_worker_processes and autovacuum_max_workers (don't)
> relate. I couldn't remember and had to actually look at the code.
>
> I'd like to clarify this in the max_connections section of the documents
> by doing s/connections/user connections/ and including the formula for
> MaxBackends (MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
> max_worker_processes). I'll also mention that any postgres_fdw
> connections are considered user connections.

I think it makes sense to add such a clarification in the docs; however, I am not sure specifying it alongside the max_connections parameter is a good idea, as the formula is somewhat internal and is not directly related to max_connections. How about specifying it in the below page:
http://www.postgresql.org/docs/devel/static/connect-estab.html

> Also, my understanding is that the parallel stuff will continue to fall
> under max_worker_processes?

Yes.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] libpq 9.4 requires /etc/passwd?
On Fri, Jan 9, 2015 at 06:57:02PM -0500, Tom Lane wrote:
> I wrote:
>> Christoph Berg <c...@df7cb.de> writes:
>>> libpq wants the user home directory to find .pgpass and
>>> .pg_service.conf files, but apparently the behavior to require the
>>> existence of the passwd file (or nss equivalent) is new in 9.4.
>>
>> There is demonstrably no direct reference to /etc/passwd in the PG code.
>> It's possible that we've added some new expectation that $HOME can be
>> identified, but a quick look through the code shows no such checks that
>> don't look like they've been there for some time.
>
> Hmm ... actually, I'll bet it's not $HOME that's at issue, but the name
> of the user. Commit a4c8f14364c27508233f8a31ac4b10a4c90235a9 turned
> failure of pg_fe_getauthname() into a hard connection failure, whereas
> previously it was harmless as long as the caller provided a username.
>
> I wonder if we shouldn't just revert that commit in toto. Yeah,
> identifying an out-of-memory error might be useful, but this cure seems
> a lot worse than the disease. What's more, this coding reports *any*
> pg_fe_getauthname failure as "out of memory", which is much worse than
> useless.
>
> Alternatively, maybe don't try to resolve the username this way until
> we've found that the caller isn't providing any username.

Seems we both found it at the same time. Here is the thread where we discussed it, and you were concerned about the memory allocation failure too:

http://www.postgresql.org/message-id/flat/20140320154905.gc7...@momjian.us#20140320154905.gc7...@momjian.us

> In particular, it appears to me that if the strdup in pg_fe_getauthname
> fails, we'll just let that pass without comment, which is inconsistent
> with all the other out-of-memory cases in conninfo_add_defaults. (I
> wonder whether any code downstream of this supposes that we always have
> a value for the "user" option. It's a pretty safe bet that the case is
> hardly ever tested.)

I have developed the attached patch which documents why the user name lookup might fail, and sets the failure string to "". It preserves the memory allocation failure behavior.

--
Bruce Momjian  <br...@momjian.us>  http://momjian.us
EnterpriseDB                       http://enterprisedb.com

+ Everyone has their own god. +

diff --git a/src/interfaces/libpq/fe-auth.c b/src/interfaces/libpq/fe-auth.c
new file mode 100644
index 179793e..b9155b4
*** a/src/interfaces/libpq/fe-auth.c
--- b/src/interfaces/libpq/fe-auth.c
*** pg_fe_sendauth(AuthRequest areq, PGconn
*** 715,726 ****
   * pg_fe_getauthname
   *
   * Returns a pointer to dynamic space containing whatever name the user
!  * has authenticated to the system.  If there is an error, return NULL.
   */
  char *
  pg_fe_getauthname(void)
  {
!     const char *name = NULL;
      char       *authn;

  #ifdef WIN32
--- 715,731 ----
   * pg_fe_getauthname
   *
   * Returns a pointer to dynamic space containing whatever name the user
!  * has authenticated to the system.  Returns NULL on memory allocation
!  * failure, and a zero-length string on user name lookup failure.
   */
  char *
  pg_fe_getauthname(void)
  {
!     /*
!      * We might be in a chroot environment where we can't look up our own
!      * user name, so return "".
!      */
!     const char *name = "";
      char       *authn;

  #ifdef WIN32
[HACKERS] libpq 9.4 requires /etc/passwd?
Hi,

I've got several reports that postfix's pgsql lookup tables are broken with 9.4's libpq5, while 9.3's libpq5 works just fine. The error message looks like this:

Jan 10 00:11:40 lehmann postfix/trivial-rewrite[29960]: warning: connect to pgsql server localhost:5432: out of memory?
Jan 10 00:11:40 lehmann postfix/trivial-rewrite[29960]: warning: pgsql:/etc/postfix/pgsqltest lookup error for *

The "out of memory?" message comes from PQsetdbLogin and PQerrorMessage in postfix-2.11.3/src/global/dict_pgsql.c:

    if ((host->db = PQsetdbLogin(host->name, host->port, NULL, NULL,
                                 dbname, username, password)) == NULL
        || PQstatus(host->db) != CONNECTION_OK) {
        msg_warn("connect to pgsql server %s: %s",
                 host->hostname, PQerrorMessage(host->db));

There are two ways around the problem on the postfix side: not running the postfix service chrooted, or using a proxy map which effectively does the lookup from outside the chroot. Debian bug https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=756627 mentions another solution: creation of an /etc/passwd file inside the postfix chroot.

libpq wants the user home directory to find .pgpass and .pg_service.conf files, but apparently the behavior to require the existence of the passwd file (or nss equivalent) is new in 9.4. I've tried to make sense of the fe-connect.c code to find the issue but couldn't spot it. Can someone explain what's going on there? Is this a bug in libpq? (It's certainly a regression.)

Christoph
--
c...@df7cb.de | http://www.df7cb.de/
Re: [HACKERS] libpq 9.4 requires /etc/passwd?
Christoph Berg <c...@df7cb.de> writes:
> libpq wants the user home directory to find .pgpass and .pg_service.conf
> files, but apparently the behavior to require the existence of the
> passwd file (or nss equivalent) is new in 9.4.

There is demonstrably no direct reference to /etc/passwd in the PG code. It's possible that we've added some new expectation that $HOME can be identified, but a quick look through the code shows no such checks that don't look like they've been there for some time.

Are the complainants doing anything that would result in SSL certificate checking? More generally, it'd be useful to see an exact example of what are the connection parameters and environment that result in a failure.

			regards, tom lane
Re: [HACKERS] libpq 9.4 requires /etc/passwd?
I wrote:
> Christoph Berg <c...@df7cb.de> writes:
>> libpq wants the user home directory to find .pgpass and
>> .pg_service.conf files, but apparently the behavior to require the
>> existence of the passwd file (or nss equivalent) is new in 9.4.
>
> There is demonstrably no direct reference to /etc/passwd in the PG code.
> It's possible that we've added some new expectation that $HOME can be
> identified, but a quick look through the code shows no such checks that
> don't look like they've been there for some time.

Hmm ... actually, I'll bet it's not $HOME that's at issue, but the name of the user. Commit a4c8f14364c27508233f8a31ac4b10a4c90235a9 turned failure of pg_fe_getauthname() into a hard connection failure, whereas previously it was harmless as long as the caller provided a username.

I wonder if we shouldn't just revert that commit in toto. Yeah, identifying an out-of-memory error might be useful, but this cure seems a lot worse than the disease. What's more, this coding reports *any* pg_fe_getauthname failure as "out of memory", which is much worse than useless.

Alternatively, maybe don't try to resolve the username this way until we've found that the caller isn't providing any username.

			regards, tom lane
Re: [HACKERS] libpq 9.4 requires /etc/passwd?
On Fri, Jan 9, 2015 at 06:42:19PM -0500, Tom Lane wrote:
> Christoph Berg <c...@df7cb.de> writes:
>> libpq wants the user home directory to find .pgpass and
>> .pg_service.conf files, but apparently the behavior to require the
>> existence of the passwd file (or nss equivalent) is new in 9.4.
>
> There is demonstrably no direct reference to /etc/passwd in the PG code.
> It's possible that we've added some new expectation that $HOME can be
> identified, but a quick look through the code shows no such checks that
> don't look like they've been there for some time.
>
> Are the complainants doing anything that would result in SSL certificate
> checking? More generally, it'd be useful to see an exact example of what
> are the connection parameters and environment that result in a failure.

I see similar error strings, but nothing that ends with a question mark: "out of memory?". However, I wonder if the crux of the problem is that they are running in a chroot environment where the user id can't be looked up. Looking at libpq's pg_fe_getauthname(), I see that if it can't get the OS user name of the effective user, it assumes it failed and returns NULL:

    /*
     * We document PQconndefaults() to return NULL for a memory allocation
     * failure.  We don't have an API to return a user name lookup failure,
     * so we just assume it always succeeds.
     */
#ifdef WIN32
    if (GetUserName(username, &namesize))
        name = username;
#else
    if (pqGetpwuid(geteuid(), &pwdstr, pwdbuf, sizeof(pwdbuf), &pw) == 0)
        name = pw->pw_name;
#endif

    authn = name ? strdup(name) : NULL;

and conninfo_add_defaults() assumes a pg_fe_getauthname() failure is a memory allocation failure:

    option->val = pg_fe_getauthname();
    if (!option->val)
    {
        if (errorMessage)
            printfPQExpBuffer(errorMessage,
                              libpq_gettext("out of memory\n"));
        return false;

It was my 9.4 commit that changed this:

    commit a4c8f14364c27508233f8a31ac4b10a4c90235a9
    Author: Bruce Momjian <br...@momjian.us>
    Date:   Thu Mar 20 11:48:31 2014 -0400

        libpq: pass a memory allocation failure error up to PQconndefaults()

        Previously user name memory allocation failures were ignored and
        the default user name set to NULL.

I am thinking we have to now assume that user name lookups can fail for other reasons. I am unclear how this worked in the past though.

--
Bruce Momjian  <br...@momjian.us>  http://momjian.us
EnterpriseDB                       http://enterprisedb.com

+ Everyone has their own god. +
Re: [HACKERS] Transactions involving multiple postgres foreign servers
On 1/8/15, 12:00 PM, Kevin Grittner wrote:
> Robert Haas <robertmh...@gmail.com> wrote:
>> On Thu, Jan 8, 2015 at 10:19 AM, Kevin Grittner <kgri...@ymail.com> wrote:
>>> Robert Haas <robertmh...@gmail.com> wrote:
>>>> Andres is talking in my other ear suggesting that we ought to reuse
>>>> the 2PC infrastructure to do all this.
>>> If you mean that the primary transaction and all FDWs in the
>>> transaction must use 2PC, that is what I was saying, although
>>> apparently not clearly enough. All nodes *including the local one*
>>> must be prepared and committed with data about the nodes saved safely
>>> off somewhere that it can be read in the event of a failure of any of
>>> the nodes *including the local one*. Without that, I see this whole
>>> approach as a train wreck just waiting to happen.
>> Clearly, all the nodes other than the local one need to use 2PC. I am
>> unconvinced that the local node must write a 2PC state file only to
>> turn around and remove it again almost immediately thereafter.
> The key point is that the distributed transaction data must be flagged
> as needing to commit rather than roll back between the prepare phase
> and the final commit. If you try to avoid the PREPARE, flagging, COMMIT
> PREPARED sequence by building the flagging of the distributed
> transaction metadata into the COMMIT process, you still have the
> problem of what to do on crash recovery. You really need to use 2PC to
> keep that clean, I think.

If we had an independent transaction coordinator then I agree with you, Kevin. I think Robert is proposing that if we are controlling one of the nodes that's participating as well as coordinating the overall transaction, we can take some shortcuts.

AIUI a PREPARE means you are completely ready to commit. In essence you're just waiting to write and fsync the commit message. That is in fact the state that a coordinating PG node would be in by the time everyone else has done their prepare. So from that standpoint we're OK.
Now, as soon as ANY of the nodes commit, our coordinating node MUST be able to commit as well! That would require it to have a real prepared transaction of its own created. However, as long as there is zero chance of any other prepared transactions committing before our local transaction, that step isn't actually needed. Our local transaction will either commit or abort, and that will determine what needs to happen on all other nodes. I'm ignoring the question of how the local node needs to store info about the other nodes in case of a crash, but AFAICT you could reliably recover manually from what I just described.

I think the question is: are we OK with going under the skirt in this fashion? Presumably it would provide better performance, whereas forcing ourselves to eat our own 2PC dogfood would presumably make it easier for someone to plug in an external coordinator instead of using our own. I think there's also a lot to be said for getting a partial implementation of this available today (requiring manual recovery), so long as it's not in core.

BTW, I found https://www.cs.rutgers.edu/~pxk/417/notes/content/transactions.html a useful read, specifically the 2PC portion.

>>> I'm not really clear on the mechanism that is being proposed for doing
>>> this, but one way would be to have the PREPARE of the local
>>> transaction be requested explicitly and to have that cause all FDWs
>>> participating in the transaction to also be prepared. (That might be
>>> what Andres meant; I don't know.)
>> We want this to be client-transparent, so that the client just says
>> COMMIT and everything Just Works.

What about the case where one or more nodes doesn't support 2PC? Do we silently make the choice, without the client really knowing? We abort. (Unless we want to have a running_with_scissors GUC...)

> That doesn't strike me as the only possible mechanism to drive this,
> but it might well be the simplest and cleanest.
The trickiest bit might be to find a good way to persist the distributed transaction information in a way that survives the failure of the main transaction -- or even the abrupt loss of the machine it's running on. I'd be willing to punt on surviving a loss of the entire machine. But I'd like to be able to survive an abrupt reboot. As long as people are aware that there is an urgent need to find and fix all data stores to which clusters on the failed machine were connected via FDW when there is a hard machine failure, I guess it is OK. In essence we just document it and declare it to be somebody else's problem. In general I would expect a distributed transaction manager to behave well in the face of any single-machine failure, but if there is one aspect of a full-featured distributed transaction manager we could give up, I guess that would be it. ISTM that one option here would be to simply write and sync WAL record(s) of all externally prepared transactions. That would be enough for a hot standby to find all the other servers and tell them to either commit or abort, based on whether
Re: [HACKERS] Parallel Seq Scan
On 1/9/15, 3:34 PM, Stephen Frost wrote: * Stefan Kaltenbrunner (ste...@kaltenbrunner.cc) wrote: On 01/09/2015 08:01 PM, Stephen Frost wrote: Now, for debugging purposes, I could see such a parameter being available but it should default to 'off/never-fail'. not sure what it really would be useful for - if I execute a query I would truly expect it to get answered - if it can be made faster if done in parallel that's nice but why would I want it to fail? I was thinking for debugging only, though I'm not really sure why you'd need it if you get a NOTICE when you don't end up with all the workers you expect. Yeah, debugging is my concern as well. You're working on a query, you expect it to be using parallelism, and EXPLAIN is showing it's not. Now you're scratching your head. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
2015-01-10 8:18 GMT+09:00 Jim Nasby jim.na...@bluetreble.com: On 1/6/15, 5:43 PM, Kouhei Kaigai wrote: scan_relid != InvalidOid Ideally, they should be OidIsValid(scan_relid) Scan.scanrelid is an index of range-tables list, not an object-id. So, InvalidOid or OidIsValid() are not a good choice. I think the name needs to change then; scan_relid certainly looks like the OID of a relation. scan_index? Yep, I had the same impression when I looked at the code first time, however, it is defined as below. Not a manner of custom-scan itself.

/*
 * ==
 * Scan nodes
 * ==
 */
typedef struct Scan
{
	Plan		plan;
	Index		scanrelid;	/* relid is index into the range table */
} Scan;

-- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Possible typo in create_policy.sgml
On 9 January 2015 at 20:46, Stephen Frost sfr...@snowman.net wrote: I'd suggest we further clarify with: The <command>CREATE POLICY</command> command defines a new policy for a table. Note that row level security must also be enabled on the table using <command>ALTER TABLE</command> in order for created policies to be applied. Once row level security has been enabled, a default-deny policy is used and no rows in the table are visible, except to the table owner or superuser, unless permitted by a specific policy. A policy permits SELECT, INSERT, UPDATE or DELETE commands to access rows in a table that has row level security enabled. Access to existing table rows is granted if they match a policy expression specified via USING, while new rows that would be created via INSERT or UPDATE are checked against policy expressions specified via WITH CHECK. For policy expressions specified via USING which grant access to existing rows, the system will generally test the policy expressions prior to any qualifications that appear in the query itself, in order to prevent the inadvertent exposure of the protected data to user-defined functions which might not be trustworthy. However, functions and operators marked by the system (or the system administrator) as LEAKPROOF may be evaluated before policy expressions, as they are assumed to be trustworthy. Looks good to me. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On 1/9/15, 11:24 AM, Stephen Frost wrote: What I was advocating for up-thread was to consider multiple parallel paths and to pick whichever ends up being the lowest overall cost. The flip-side to that is increased planning time. Perhaps we can come up with an efficient way of working out where the break-point is based on the non-parallel cost and go at it from that direction instead of building out whole paths for each increment of parallelism. I think at some point we'll need the ability to stop planning part-way through for queries producing really small estimates. If the first estimate you get is 1000 units, does it really make sense to do something like try every possible join permutation, or attempt to parallelize? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Translating xlogreader.c
Heikki Linnakangas wrote: xlogreader.c contains a bunch of strings passed to the report_invalid_record function, that are supposed to be translated. src/backend/nls.mk lists report_invalid_record as a gettext trigger. In 9.2 and below, when those functions were still in xlog, those strings were in the postgres.pot file, and translated. But not since 9.3, when they were moved to xlogreader.c. What's going on? I missed a :2 in GETTEXT_TRIGGERS. Fixed now, backpatched to 9.3. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improving RLS qual pushdown
Dean, * Dean Rasheed (dean.a.rash...@gmail.com) wrote: A while ago [1] I proposed an enhancement to the way qual pushdown safety is decided in RLS / security barrier views. Currently we just test for the presence of leaky functions in the qual, but it is possible to do better than that, by further testing if the leaky function is actually being passed information that we don't want to be leaked. This certainly sounds reasonable to me. In fact the majority of builtin functions aren't marked leakproof, and probably most user functions aren't either, so this could potentially be useful in a wide range of real-world queries, where it is common to write quals of the form column operator expression, and the expression may contain leaky functions. Agreed. Looks like you've already added it to the next commitfest, which is great. I'm definitely interested but probably won't get to it right away as I have a few other things to address. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Comment typo in src/backend/executor/execMain.c
Etsuro, * Etsuro Fujita (fujita.ets...@lab.ntt.co.jp) wrote: I ran into a comment typo. Please find attached a patch. Fix pushed. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
On 9 January 2015 at 00:49, Stephen Frost sfr...@snowman.net wrote: Peter, * Peter Geoghegan (p...@heroku.com) wrote: For column level privileges, you wouldn't expect to only get an error about not having the relevant update permissions at runtime, when the update path happens to be taken. And so it is for RLS. Right, that's the precedent we should be considering. Column-level privileges is a great example- you need both insert and update privileges for the columns involved for the command to succeed. It shouldn't depend on which path actually ends up being taken. It's not the same as column-level privileges though, because RLS policies depend on the new/existing data, not just the table metadata. So for example an UPDATE ... WHERE false can fail column-level privilege checks even though it isn't updating anything, but it cannot fail a RLS policy check. I was trying to think up an example where you might actually have different INSERT and UPDATE policies, and the best I can think of is some sort of mod_count column where you have an INSERT CHECK (mod_count = 0) and an UPDATE CHECK (mod_count > 0). In that case, checking both policies would make an UPSERT impossible, whereas if you think of it as doing either an INSERT or an UPDATE, as the syntax suggests, it becomes possible. (I'm assuming here from your description that you intend for the policy expressions to be AND'ed together, so it might end up being an AND of 2 OR expressions.) Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
On Fri, Jan 9, 2015 at 12:19 AM, Dean Rasheed dean.a.rash...@gmail.com wrote: I was trying to think up an example where you might actually have different INSERT and UPDATE policies, and the best I can think of is some sort of mod_count column where you have an INSERT CHECK (mod_count = 0) and an UPDATE CHECK (mod_count > 0). In that case, checking both policies would make an UPSERT impossible, whereas if you think of it as doing either an INSERT or an UPDATE, as the syntax suggests, it becomes possible. Why does this user want to do this upsert? If they're upserting, then the inserted row could only reasonably have a value of (mod_count = 0). If updating, then they must have a constant value for the update path (a value that's greater than 0, naturally - say 2), which doesn't make any sense in the context of an upsert's auxiliary update - what happened to the 0 value? Sorry, but I don't think your example makes sense - I can't see what would motivate anyone to write a query like that with those RLS policies in place. It sounds like you're talking about an insert and a separate update that may or may not affect the same row, and not an upsert. Then those policies make sense, but in practice they render the upsert you describe contradictory. FWIW, I'm not suggesting that there couldn't possibly be a use case that doesn't do well with this convention where we enforce RLS depending on the path taken. The cases are just very marginal, as I think your difficulty in coming up with a convincing counter-argument shows. I happen to think what Stephen and I favor (bunching together USING() barrier quals and check options from INSERT and UPDATE policies) is likely to be the best alternative available on balance. More generally, you could point out that I'm actually testing different tuples at different points in query processing under that regime (e.g.
the post-insert tuple, or the before-update conflicting, existing tuple from the target, or the post update tuple) and so things could fail when the update path is taken despite the fact that they didn't fail when the insert path was taken. That's technically true, of course, but with idiomatic usage it isn't true, and that's what I care about. Does anyone have another counter-example of a practical upsert statement that cannot be used with certain RLS policies due to the fact that we chose to bunch together INSERT and UPDATE RLS policies? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rewind in contrib
On 01/08/2015 10:44 PM, Peter Eisentraut wrote: On 1/6/15 7:17 PM, Andres Freund wrote: One problem is that it doesn't use the replication protocol, so the setup is going to be inconsistent with pg_basebackup. Maybe the replication protocol could be extended to provide the required data. I'm not particularly bothered by the requirement of also requiring a normal, not replication, connection. In most cases that'll already be allowed. I don't agree. We have separated out replication access, especially in pg_hba.conf and with the replication role attribute, for a reason. (I hope there was a reason; I don't remember.) It is not unreasonable to set things up so that non-replication access is only from the application tier, and replication access is only from within the database tier. I agree. A big difference between a replication connection and a normal database connection is that a replication connection is not bound to any particular database. It also cannot run transactions, and many other things. It would nevertheless be handy to be able to do more stuff in a replication connection. For example, you might want to run functions like pg_create_restore_point(), pg_xlog_replay_pause/resume etc. in a replication connection. We should extend the replication protocol to allow such things. I'm not sure what it would look like; we can't run arbitrary queries when not being connected to a database, or arbitrary functions, but there are a lot of functions that would make sense. Now I understand that making pg_rewind work over a replication connection is a lot more work, and maybe we don't want to spend it, at least right now. But then we either need to document this as an explicit deficiency and think about fixing it later, or we should revisit how replication access control is handled. Right, that's how I've been thinking about this. 
I don't want to start working on the replication protocol right now, I think we can live with it as it is, but it sure would be nicer if it worked over a replication connection. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] What exactly is our CRC algorithm?
Hi Heikki. I've attached two regenerated CRC patches, split up as before.

1. The slicing-by-8 patch contains numerous changes:
   a. A configure test for __builtin_bswap32
   b. A comment referencing the slicing-by-8 paper (which is behind a paywall, unfortunately, so I haven't even read it). Are more comments needed? If so, where/what kind?
   c. A byte-reversed big-endian version of the 8*256 table. In Linux, there's only one table that uses __constant_swab32, but for us it's simpler to have two tables.
   d. Thanks to (c), we can avoid the bswap32 in the hot loop.
   e. On big-endian systems, FIN_CRC32C now bswap32()s the CRC before finalising it. (We don't need to do this in INIT_CRC32C only because the initialiser is 0x.)

2. The sse4.2 patch has only some minor compile fixes.

I have built and tested both patches individually on little-endian (amd64) and big-endian (ppc) systems. I verified that the _sse is chosen at startup on the former, and _sb8 on the latter, and that both implementations function correctly with respect to HEAD. Please let me know if there's anything else I need to do. -- Abhijit

From a202673fa8e4e60e7fd5087fd8577a90a75e90ee Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen a...@2ndquadrant.com
Date: Fri, 9 Jan 2015 10:38:47 +0530
Subject: Implement slicing-by-8 CRC calculation

The COMP_CRC32C macro now calls pg_comp_crc32c(), which processes eight data bytes at a time. Its output is identical to the byte-at-a-time CRC code. (This change does not apply to the LEGACY_CRC32 computation.)
Reviewers: Andres Freund, Heikki Linnakangas Author: Abhijit Menon-Sen --- config/c-compiler.m4 | 17 + configure | 30 + configure.in |1 + src/include/pg_config.h.in|3 + src/include/pg_config.h.win32 |3 + src/include/utils/pg_crc.h| 11 +- src/include/utils/pg_crc_tables.h | 1128 ++--- src/port/pg_crc.c | 105 +++- 8 files changed, 1228 insertions(+), 70 deletions(-) diff --git a/config/c-compiler.m4 b/config/c-compiler.m4 index 90b56e7..509f961 100644 --- a/config/c-compiler.m4 +++ b/config/c-compiler.m4 @@ -193,6 +193,23 @@ fi])# PGAC_C_TYPES_COMPATIBLE +# PGAC_C_BUILTIN_BSWAP32 +# - +# Check if the C compiler understands __builtin_bswap32(), +# and define HAVE__BUILTIN_BSWAP32 if so. +AC_DEFUN([PGAC_C_BUILTIN_BSWAP32], +[AC_CACHE_CHECK(for __builtin_bswap32, pgac_cv__builtin_bswap32, +[AC_TRY_COMPILE([static unsigned long int x = __builtin_bswap32(0xaabbccdd);], +[], +[pgac_cv__builtin_bswap32=yes], +[pgac_cv__builtin_bswap32=no])]) +if test x$pgac_cv__builtin_bswap32 = xyes ; then +AC_DEFINE(HAVE__BUILTIN_BSWAP32, 1, + [Define to 1 if your compiler understands __builtin_bswap32.]) +fi])# PGAC_C_BUILTIN_BSWAP32 + + + # PGAC_C_BUILTIN_CONSTANT_P # - # Check if the C compiler understands __builtin_constant_p(), diff --git a/configure b/configure index fce4908..27092ec 100755 --- a/configure +++ b/configure @@ -10324,6 +10324,36 @@ if test x$pgac_cv__types_compatible = xyes ; then $as_echo #define HAVE__BUILTIN_TYPES_COMPATIBLE_P 1 confdefs.h fi +{ $as_echo $as_me:${as_lineno-$LINENO}: checking for __builtin_bswap32 5 +$as_echo_n checking for __builtin_bswap32... 6; } +if ${pgac_cv__builtin_bswap32+:} false; then : + $as_echo_n (cached) 6 +else + cat confdefs.h - _ACEOF conftest.$ac_ext +/* end confdefs.h. 
*/ +static unsigned long int x = __builtin_bswap32(0xaabbccdd); +int +main () +{ + + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile $LINENO; then : + pgac_cv__builtin_bswap32=yes +else + pgac_cv__builtin_bswap32=no +fi +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +fi +{ $as_echo $as_me:${as_lineno-$LINENO}: result: $pgac_cv__builtin_bswap32 5 +$as_echo $pgac_cv__builtin_bswap32 6; } +if test x$pgac_cv__builtin_bswap32 = xyes ; then + +$as_echo #define HAVE__BUILTIN_BSWAP32 1 confdefs.h + +fi { $as_echo $as_me:${as_lineno-$LINENO}: checking for __builtin_constant_p 5 $as_echo_n checking for __builtin_constant_p... 6; } if ${pgac_cv__builtin_constant_p+:} false; then : diff --git a/configure.in b/configure.in index 6aa69fb..0206836 100644 --- a/configure.in +++ b/configure.in @@ -1176,6 +1176,7 @@ PGAC_C_SIGNED PGAC_C_FUNCNAME_SUPPORT PGAC_C_STATIC_ASSERT PGAC_C_TYPES_COMPATIBLE +PGAC_C_BUILTIN_BSWAP32 PGAC_C_BUILTIN_CONSTANT_P PGAC_C_BUILTIN_UNREACHABLE PGAC_C_VA_ARGS diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in index 83f04e9..7962757 100644 --- a/src/include/pg_config.h.in +++ b/src/include/pg_config.h.in @@ -666,6 +666,9 @@ /* Define to 1 if you have the winldap.h header file. */ #undef HAVE_WINLDAP_H +/* Define to 1 if your compiler understands __builtin_bswap32. */ +#undef HAVE__BUILTIN_BSWAP32 + /* Define to 1 if your compiler understands
Re: [HACKERS] TABLESAMPLE patch
On Fri, Jan 9, 2015 at 1:10 AM, Petr Jelinek p...@2ndquadrant.com wrote: On 06/01/15 14:22, Petr Jelinek wrote: On 06/01/15 08:51, Michael Paquier wrote: On Tue, Dec 23, 2014 at 5:21 AM, Petr Jelinek p...@2ndquadrant.com wrote: Attached is v3 which besides the fixes mentioned above also includes changes discussed with Tomas (except the CREATE/DROP TABLESAMPLE METHOD), fixes for crash with FETCH FIRST and is rebased against current master. This patch needs a rebase, there is a small conflict in parallel_schedule. Sigh, I really wish we had automation that checks this automatically for patches in CF. Here is rebase against current master. Thanks! Structurally speaking, I think that the tsm methods should be added in src/backend/utils and not src/backend/access which is more high-level as tsm_bernoulli.c and tsm_system.c contain only a set of new I am not sure if I parsed this correctly, do you mean to say that only low-level access functions belong to src/backend/access? Makes sense. Made this change. procedure functions. Having a single header file tsm.h would be also a good thing. Moved into single tablesample.h file. Regarding the naming, is tsm (table sample method) really appealing? Wouldn't it be better to use simply tablesample_* for the file names and the method names? I created the src/backend/tablesample and files are named just system.c and bernoulli.c, but I kept tsm_ prefix for methods as they become too long for my taste when prefixing with tablesample_. This is a large patch... Wouldn't sampling.[c|h] extracted from ANALYZE live better as a refactoring patch? This would limit a bit bug occurrences on the main patch. That's a good idea, I'll split it into patch series. I've split the sampling.c/h into separate patch. I also wrote basic CREATE/DROP TABLESAMPLE METHOD support, again as separate patch in the attached patch-set. This also includes modules test with simple custom tablesample method. 
There are some very minor cleanups in the main tablesample code itself but no functional changes. Some comments about the 1st patch:

1) Nitpicking: + * Block sampling routines shared by ANALYZE and TABLESAMPLE. TABLESAMPLE is not added yet. You may as well simplify this description with something like "Sampling routines for relation blocks."
2) file_fdw and postgres_fdw do not compile correctly as they still use anl_random_fract. This function is still mentioned in vacuum.h as well.
3) Not really an issue of this patch, but I'd think that this comment should be reworked: + * BlockSampler is used for stage one of our new two-stage tuple + * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject + * Large DB). It selects a random sample of samplesize blocks out of + * the nblocks blocks in the table. If the table has less than + * samplesize blocks, all blocks are selected.
4) As a refactoring patch, why is the function providing a random value changed? Shouldn't sample_random_fract be consistent with anl_random_fract?
5) The seed numbers can be changed to RAND48_SEED_0, RAND48_SEED_1 and RAND48_SEED_2 instead of being hardcoded?

Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Async execution of postgres_fdw.
Hello, thank you for the comment. This is the second version of the patch.

- Refactored to make the code simpler and clearer.
- Added comments about the logic outline and struct members.
- Removed trailing white spaces.
- No additional tests yet.

== warning: 3 lines add whitespace errors. Whoops. Fixed. 2. The patches compile cleanly. 3. The regression is clean, even in contrib/postgres_fdw and contrib/file_fdw Tests --- We need tests to make sure that the logic remains intact even after further changes in this area. Couple of tests which require multiple foreign scans within the same query fetching rows more than fetch size (100) would be required. Also, some DMLs, which involve multiple foreign scans would test the sanity when UPDATE/DELETE interleave such scans. By multiple foreign scans I mean both multiple scans on a single foreign server and multiple scans spread across multiple foreign servers. Additional tests indeed might be needed. Some of the tests related to this patch are implicitly done in the present regression tests. But no explicit ones. fetch_size is currently a bare constant so I think it is not so necessary to test for other fetch sizes. Even if a different size would potentially cause a problem, it will be found when the different number is actually applied. In the current design, an async scan is started only on the first scan on the connection, and if the next scan or modify claims the same connection, the async state is immediately finished and it behaves the same as the unpatched version. But since asynchronous/parallel scan is introduced in some form, such a test seems to be needed. Multi-server tests are not done in the unpatched version either, but there's no difference between multiple foreign servers on the same remote server and ones distributed across multiple remotes. The async scan of this patch works only on the same foreign server, so there seems to be no need for that kind of test. Do you have any specific concern about this?
After all, I will add some explicit tests for async-canceling in the next patch. Code --- Because the previous conn is now replaced by conn->pgconn, the double indirection makes the code a bit ugly and prone to segfaults (conn being NULL or an invalid pointer). Can we minimize such code or wrap it under a macro? Agreed. It was an annoyance also for me. I've done the following things to encapsulate PgFdwConn to some extent in the second version of this patch. They are described below. We need some comments about the structure definition of PgFdwConn and its members explaining the purpose of this structure and its members. Thank you for pointing that out. I forgot that. I added simple comments there. Same is the case with enum PgFdwConnState. In fact, the state diagram of a connection has become more complicated with the async connections, so it might be better to explain that state diagram at one place in the code (through comments). The definition of the enum might be a good place to do that. I added a comment describing the logic and meaning of the states just above the enum declaration. Otherwise, the logic of connection maintenance is spread at multiple places and is difficult to understand by looking at the code. In function GetConnection(), at line

 	elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\"",
-		 entry->conn, server->servername);
+		 entry->conn->pgconn, server->servername);

Thank you, I replaced conn's in this form with PFC_PGCONN(conn). entry->conn->pgconn may not necessarily be a new connection and may be NULL (as the next line checks it for being NULL). So, I think this line should be moved within the following if block after pgconn has been initialised with the new connection.
+	if (entry->conn->pgconn == NULL)
+	{
+		entry->conn->pgconn = connect_pg_server(server, user);
+		entry->conn->nscans = 0;
+		entry->conn->async_state = PGFDW_CONN_IDLE;
+		entry->conn->async_scan = NULL;
+	}

The if condition if (entry->conn == NULL) in GetConnection(), used to track whether there is a PGconn active for the given entry, now tracks whether it has a PgFdwConn for the same. After some consideration, I decided to make PgFdwConn handled more similarly to PGconn. This patch has shrunk as a result and became clearer.

- Added macros to encapsulate the PgFdwConn struct. (One of them is a function.)
- Added macros to call PQxxx functions taking PgFdwConn.
- connect_pg_server() returns PgFdwConn.
- connection.c does not touch the inside of PgFdwConn except at a few points.

The PgFdwConn's memory is allocated with malloc() as PGconn's is, and freed by PFCfinish(), which is the counterpart of PQfinish(). As a result of those changes, this patch has taken the following shape.

- All points where PGconn was used now use PgFdwConn. They are seemingly simple replacements.
- The major functional changes are concentrated within
Re: [HACKERS] Possible typo in create_policy.sgml
On 8 January 2015 at 18:57, Stephen Frost sfr...@snowman.net wrote: What do you think of the attached rewording? Rewording it this way is a great idea. Hopefully that will help address the confusion which we've seen. The only comment I have offhand is: should we add a sentence to this paragraph about the default-deny policy? Yes, good idea, although I think perhaps that sentence should be added to the preceding paragraph, after noting that RLS has to be enabled on the table for the policies to be applied: The <command>CREATE POLICY</command> command defines a new policy for a table. Note that row level security must also be enabled on the table using <command>ALTER TABLE</command> in order for created policies to be applied. Once row level security has been enabled, a default-deny policy is used and no rows in the table are visible unless permitted by a specific policy. A policy permits SELECT, INSERT, UPDATE or DELETE commands to access rows in a table that has row level security enabled. Access to existing table rows is granted if they match a policy expression specified via USING, while new rows that would be created via INSERT or UPDATE are checked against policy expressions specified via WITH CHECK. For policy expressions specified via USING which grant access to existing rows, the system will generally test the policy expressions prior to any qualifications that appear in the query itself, in order to prevent the inadvertent exposure of the protected data to user-defined functions which might not be trustworthy. However, functions and operators marked by the system (or the system administrator) as LEAKPROOF may be evaluated before policy expressions, as they are assumed to be trustworthy. Also, perhaps the ALTER TABLE in the first paragraph should be turned into a link. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] POLA violation with \c service=
On Thu, Jan 08, 2015 at 08:04:47PM -0800, David Fetter wrote: On Mon, Jan 05, 2015 at 02:26:59PM -0800, David Fetter wrote: On Tue, Dec 30, 2014 at 04:48:11PM -0800, David Fetter wrote: On Wed, Dec 17, 2014 at 08:14:04AM -0500, Andrew Dunstan wrote: Yeah, that's the correct solution. It should not be terribly difficult to create a test for a conninfo string in the dbname parameter. That's what libpq does after all. We certainly don't want psql to have to try to interpret the service file. psql just needs to let libpq do its work in this situation. This took a little longer to get time to polish than I'd hoped, but please find attached a patch which:

- Correctly connects to service= and postgres(ql)?:// with \c
- Disallows tab completion in the above cases

I'd like to see about having tab completion actually work correctly in at least the service= case, but that's a matter for a follow-on patch. Thanks to Andrew Dunstan for the original patch, and to Andrew Gierth for his help getting it into shape. Cheers, David. I should mention that the patch also corrects a problem where the password was being saved/discarded at inappropriate times. Please push this patch to the back branches :) Per discussion with Stephen Frost, I've documented the previously undocumented behavior with conninfo strings and URIs. Some C cleanups... I think longer term we should see about having libpq export the functions I've put in common.[ch], but that's for a later patch. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index bdfb67c..eb6a57b 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -796,18 +796,20 @@ testdb=&gt;
   </varlistentry>

   <varlistentry>
-    <term><literal>\c</literal> or <literal>\connect</literal> <literal>[ <replaceable class="parameter">dbname</replaceable> [ <replaceable class="parameter">username</replaceable> ] [ <replaceable class="parameter">host</replaceable> ] [ <replaceable class="parameter">port</replaceable> ] ]</literal></term>
+    <term><literal>\c</literal> or <literal>\connect</literal> <literal> [ { [ <replaceable class="parameter">dbname</replaceable> [ <replaceable class="parameter">username</replaceable> ] [ <replaceable class="parameter">host</replaceable> ] [ <replaceable class="parameter">port</replaceable> ] ] | <replaceable class="parameter">conninfo</replaceable> string | <replaceable class="parameter">URI</replaceable> } ] </literal></term>
    <listitem>
    <para>
    Establishes a new connection to a <productname>PostgreSQL</>
-    server. If the new connection is successfully made, the
-    previous connection is closed. If any of <replaceable
+    server using positional parameters as described below, a
+    <parameter>conninfo</parameter> string, or a <acronym>URI</acronym>. If the new connection is
+    successfully made, the
+    previous connection is closed. When using positional parameters, if any of <replaceable
    class="parameter">dbname</replaceable>, <replaceable
    class="parameter">username</replaceable>, <replaceable
    class="parameter">host</replaceable> or <replaceable
    class="parameter">port</replaceable> are omitted or specified as
    <literal>-</literal>, the value of that parameter from the
-    previous connection is used. If there is no previous
+    previous connection is used. If using positional parameters and there is no previous
    connection, the <application>libpq</application> default for the
    parameter's value is used.
/para diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 4ac21f2..f290fbc 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -1610,6 +1610,8 @@ do_connect(char *dbname, char *user, char *host, char *port) PGconn *o_conn = pset.db, *n_conn; char *password = NULL; + boolkeep_password = true; + boolhas_connection_string = false; if (!o_conn (!dbname || !user || !host || !port)) { @@ -1623,14 +1625,32 @@ do_connect(char *dbname, char *user, char *host, char *port) return false; } - if (!dbname) - dbname = PQdb(o_conn); if (!user) user = PQuser(o_conn); + else if (strcmp(user, PQuser(o_conn)) != 0) + keep_password = false; + if (!host) host = PQhost(o_conn); + else if (strcmp(host, PQhost(o_conn)) != 0) + keep_password = false; + if (!port) port = PQport(o_conn); + else if (strcmp(port, PQport(o_conn)) != 0) +
Re: [HACKERS] Patch: [BUGS] BUG #12320: json parsing with embedded double quotes
On 01/08/2015 08:42 PM, Aaron Botsis wrote: On Jan 8, 2015, at 3:44 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/08/2015 03:05 PM, Aaron Botsis wrote: It's also unnecessary. CSV format, while not designed for this, is nevertheless sufficiently flexible to allow successful import of json data meeting certain criteria (essentially no newlines), like this: copy the_table(jsonfield) from '/path/to/jsondata' csv quote e'\x01' delimiter e'\x02’; While perhaps unnecessary, given the size and simplicity of the patch, IMO it’s a no brainer to merge (it actually makes the code smaller by 3 lines). It also enables non-json use cases anytime one might want to preserve embedded escapes, or use different ones entirely. Do you see other reasons not to commit it? Well, for one thing it's seriously incomplete. You need to be able to change the delimiter as well. Otherwise, any embedded tab in the json will cause you major grief. Currently the delimiter and the escape MUST be a single byte non-nul character, and there is a check for this in csv mode. Your patch would allow any arbitrary string (including one of zero length) for the escape in text mode, and would then silently ignore all but the first byte. That's not the way we like to do things. And, frankly, I would need to spend quite a lot more time thinking about other implications than I have given it so far. This is an area where I tend to be VERY cautious about making changes. This is a fairly fragile ecosystem. Good point. This version: * doesn't allow ENCODING in BINARY mode (existing bug) * doesn’t allow ESCAPE in BINARY mode * makes COPY TO work with escape * ensures escape char length is 2 for text mode, 1 for CSV Couple more things to realize: setting both the escape and delimiter characters to null won’t be any different than how you fiddled with them in CSV mode. The code paths will be the same because we should never encounter a null in the middle of the string. 
And even if we did (and the encoding didn’t catch it), we’d treat it just like any other field delimiter or escape character. So I’m going to do a bit more testing with another patch tomorrow with delimiters removed. If you can think of any specific cases you think will break it let me know and I’ll make sure to add regression tests for them as well. Well, I still need convincing that this is the best solution to the problem. As I said, I need to spend more time thinking about it. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: [BUGS] BUG #12320: json parsing with embedded double quotes
On Jan 8, 2015, at 3:44 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/08/2015 03:05 PM, Aaron Botsis wrote: It's also unnecessary. CSV format, while not designed for this, is nevertheless sufficiently flexible to allow successful import of json data meeting certain criteria (essentially no newlines), like this: copy the_table(jsonfield) from '/path/to/jsondata' csv quote e'\x01' delimiter e'\x02’; While perhaps unnecessary, given the size and simplicity of the patch, IMO it’s a no brainer to merge (it actually makes the code smaller by 3 lines). It also enables non-json use cases anytime one might want to preserve embedded escapes, or use different ones entirely. Do you see other reasons not to commit it? Well, for one thing it's seriously incomplete. You need to be able to change the delimiter as well. Otherwise, any embedded tab in the json will cause you major grief. Currently the delimiter and the escape MUST be a single byte non-nul character, and there is a check for this in csv mode. Your patch would allow any arbitrary string (including one of zero length) for the escape in text mode, and would then silently ignore all but the first byte. That's not the way we like to do things. And, frankly, I would need to spend quite a lot more time thinking about other implications than I have given it so far. This is an area where I tend to be VERY cautious about making changes. This is a fairly fragile ecosystem. Good point. This version: * doesn't allow ENCODING in BINARY mode (existing bug) * doesn’t allow ESCAPE in BINARY mode * makes COPY TO work with escape * ensures escape char length is 2 for text mode, 1 for CSV Couple more things to realize: setting both the escape and delimiter characters to null won’t be any different than how you fiddled with them in CSV mode. The code paths will be the same because we should never encounter a null in the middle of the string. 
And even if we did (and the encoding didn’t catch it), we’d treat it just like any other field delimiter or escape character. So I’m going to do a bit more testing with another patch tomorrow with delimiters removed. If you can think of any specific cases you think will break it let me know and I’ll make sure to add regression tests for them as well. Cheers! Aaron escape-without-csv-v4.patch

Re: [HACKERS] Parallel Seq Scan
Amit, * Amit Kapila (amit.kapil...@gmail.com) wrote: On Fri, Jan 9, 2015 at 1:02 AM, Jim Nasby jim.na...@bluetreble.com wrote: I agree, but we should try and warn the user if they set parallel_seqscan_degree close to max_worker_processes, or at least give some indication of what's going on. This is something you could end up beating your head on wondering why it's not working. Yet another way to handle the case when enough workers are not available is to let user specify the desired minimum percentage of requested parallel workers with parameter like PARALLEL_QUERY_MIN_PERCENT. For example, if you specify 50 for this parameter, then at least 50% of the parallel workers requested for any parallel operation must be available in order for the operation to succeed else it will give error. If the value is set to null, then all parallel operations will proceed as long as at least two parallel workers are available for processing. Ugh. I'm not a fan of this.. Based on how we're talking about modeling this, if we decide to parallelize at all, then we expect it to be a win. I don't like the idea of throwing an error if, at execution time, we end up not being able to actually get the number of workers we want- instead, we should degrade gracefully all the way back to serial, if necessary. Perhaps we should send a NOTICE or something along those lines to let the user know we weren't able to get the level of parallelization that the plan originally asked for, but I really don't like just throwing an error. Now, for debugging purposes, I could see such a parameter being available but it should default to 'off/never-fail'. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Patch: [BUGS] BUG #12320: json parsing with embedded double quotes
On Jan 7, 2015, at 2:45 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/07/2015 08:25 AM, Aaron Botsis wrote: Hi folks, I was having a problem importing json data with COPY. Lots of things export data nicely as one json blob per line. This is excellent for directly importing into a JSON/JSONB column for analysis. ...Except when there’s an embedded doublequote. Or anything that’s escaped. COPY handles this, but by the time the escaped char hits the JSON parser, it's not escaped anymore. This breaks the JSON parsing. This means I need to manipulate the input data to double-escape it. See bug #12320 for an example. Yuck. I propose this small patch that simply allows specifying COPY … ESCAPE without requiring the CSV parser. It will make it much easier to directly use json formatted export data for folks going forward. This seemed like the simplest route. This isn't a bug. Neither CSV format nor TEXT format is particularly suitable for json. I'm quite certain I could compose legal json that will break your proposal (for example, with an embedded newline in the white space.) Sorry - though I originally reported it as a bug, I didn’t mean to imply it ultimately was one. :) The patch is a feature enhancement. It's also unnecessary. CSV format, while not designed for this, is nevertheless sufficiently flexible to allow successful import of json data meeting certain criteria (essentially no newlines), like this: copy the_table(jsonfield) from '/path/to/jsondata' csv quote e'\x01' delimiter e'\x02'; While perhaps unnecessary, given the size and simplicity of the patch, IMO it’s a no brainer to merge (it actually makes the code smaller by 3 lines). It also enables non-json use cases anytime one might want to preserve embedded escapes, or use different ones entirely. Do you see other reasons not to commit it? You aren't the first person to encounter this problem. 
See http://adpgtech.blogspot.com/2014/09/importing-json-data.html http://adpgtech.blogspot.com/2014/09/importing-json-data.html Maybe we need to add something like this to the docs, or to the wiki. In your post you acknowledged a text mode copy with null escape character would have solved your problem, and to me, that was the intuitive/obvious choice as well. *shrug* Note too my comment in that blog post: Now this solution is a bit of a hack. I wonder if there's a case for a COPY mode that simply treats each line as a single datum. I also wonder if we need some more specialized tools for importing JSON, possibly one or more Foreign Data Wrappers. Such things could handle, say, embedded newline punctuation. Agree. Though https://github.com/citusdata/json_fdw https://github.com/citusdata/json_fdw already exists (I haven’t used it). How do folks feel about a COPY mode that reads a single datum until it finds a single character record terminator alone on a line? something like: =# COPY test(data) from stdin terminator ‘}’; End with a backslash and a period on a line by itself. { a: 123, c: string, b: [1, 2, 3 ] } { [1,2,3] } \. COPY 2 You could also get fancy and support multi-character record terminators which would allow the same thing in a slightly different way: COPY test(data) from stdin terminator ‘\n}’; COPY test(data) from stdin terminator ‘\n}’; I don’t know how crazy you could get without a lot of rework. It might have to be used in conjunction with a more constrained mode like COPY…RAW”. Aaron
Re: [HACKERS] Fixing memory leak in pg_upgrade
On Fri, Jan 9, 2015 at 11:34:24AM -0500, Tom Lane wrote: Tatsuo Ishii is...@postgresql.org writes: According to Coverity, there's a memory leak bug in transfer_all_new_dbs(). It's pretty difficult to get excited about that; how many table-free databases is pg_upgrade likely to see in one run? But surely we could just move the pg_free call to after the if-block. I have fixed this with the attached, applied patch. I thought malloc(0) would return null, but our src/common pg_malloc() has: /* Avoid unportable behavior of malloc(0) */ if (size == 0) size = 1; so some memory is allocated, and has to be freed. I looked at avoiding the call to gen_db_file_maps() for old_db-rel_arr.nrels == 0, but there are checks in there comparing the old/new relation counts, so it can't be skipped. I also removed the unnecessary memory initialization. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + commit ac7009abd228362042edd10e6b12556ddef35171 Author: Bruce Momjian br...@momjian.us Date: Fri Jan 9 12:12:30 2015 -0500 pg_upgrade: fix one-byte per empty db memory leak Report by Tatsuo Ishii, Coverity diff --git a/contrib/pg_upgrade/relfilenode.c b/contrib/pg_upgrade/relfilenode.c new file mode 100644 index 70753f2..423802b *** a/contrib/pg_upgrade/relfilenode.c --- b/contrib/pg_upgrade/relfilenode.c *** transfer_all_new_dbs(DbInfoArr *old_db_a *** 110,119 pg_fatal(old database \%s\ not found in the new cluster\n, old_db-db_name); - n_maps = 0; mappings = gen_db_file_maps(old_db, new_db, n_maps, old_pgdata, new_pgdata); - if (n_maps) { print_maps(mappings, n_maps, new_db-db_name); --- 110,117 *** transfer_all_new_dbs(DbInfoArr *old_db_a *** 123,131 #endif transfer_single_new_db(pageConverter, mappings, n_maps, old_tablespace); - - pg_free(mappings); } } return; --- 121,129 #endif transfer_single_new_db(pageConverter, mappings, n_maps, old_tablespace); } + /* We allocate something even for n_maps == 0 */ + 
pg_free(mappings); } return;
Re: [HACKERS] Fixing memory leak in pg_upgrade
Tatsuo Ishii is...@postgresql.org writes: According to Coverity, there's a memory leak bug in transfer_all_new_dbs(). It's pretty difficult to get excited about that; how many table-free databases is pg_upgrade likely to see in one run? But surely we could just move the pg_free call to after the if-block. regards, tom lane
Re: [HACKERS] Parallel Seq Scan
Amit, * Amit Kapila (amit.kapil...@gmail.com) wrote: On Fri, Dec 19, 2014 at 7:57 PM, Stephen Frost sfr...@snowman.net wrote: There's certainly documentation available from the other RDBMS' which already support parallel query, as one source. Other academic papers exist (and once you've linked into one, the references and prior work helps bring in others). Sadly, I don't currently have ACM access (might have to change that..), but there are publicly available papers also, I have gone through a couple of papers and what some other databases do in case of parallel sequential scan, and here is a brief summarization of them and how I am planning to handle this in the patch: Great, thanks! Costing: In one of the papers [1] suggested by you, below is the summarisation: a. Startup costs are negligible if processes can be reused rather than created afresh. b. Communication cost consists of the CPU cost of sending and receiving messages. c. Communication costs can exceed the cost of operators such as scanning, joining or grouping These findings lead to the important conclusion that Query optimization should be concerned with communication costs but not with startup costs. In our case as currently we don't have a mechanism to reuse parallel workers, so we need to account for that cost as well. So based on that, I am planning to add three new parameters cpu_tuple_comm_cost, parallel_setup_cost, parallel_startup_cost * cpu_tuple_comm_cost - Cost of CPU time to pass a tuple from worker to master backend with default value DEFAULT_CPU_TUPLE_COMM_COST as 0.1, this will be multiplied with tuples expected to be selected * parallel_setup_cost - Cost of setting up shared memory for parallelism with default value as 100.0 * parallel_startup_cost - Cost of starting up parallel workers with default value as 1000.0 multiplied by number of workers decided for scan. I will do some experiments to finalise the default values, but in general, I feel developing a cost model on the above parameters is good. 
The parameters sound reasonable but I'm a bit worried about the way you're describing the implementation. Specifically this comment: Cost of starting up parallel workers with default value as 1000.0 multiplied by number of workers decided for scan. That appears to imply that we'll decide on the number of workers, figure out the cost, and then consider parallel as one path and not-parallel as another. I'm worried that if I end up setting the max parallel workers to 32 for my big, beefy, mostly-single-user system then I'll actually end up not getting parallel execution because we'll always be including the full startup cost of 32 threads. For huge queries, it'll probably be fine, but there's a lot of room to parallelize things at levels less than 32 which we won't even consider. What I was advocating for up-thread was to consider multiple parallel paths and to pick whichever ends up being the lowest overall cost. The flip-side to that is increased planning time. Perhaps we can come up with an efficient way of working out where the break-point is based on the non-parallel cost and go at it from that direction instead of building out whole paths for each increment of parallelism. I'd really like to be able to set the 'max parallel' high and then have the optimizer figure out how many workers should actually be spawned for a given query. Execution: Most other databases does partition level scan for partition on different disks by each individual parallel worker. However, it seems amazon dynamodb [2] also works on something similar to what I have used in patch which means on fixed blocks. I think this kind of strategy seems better than dividing the blocks at runtime because dividing randomly the blocks among workers could lead to random scan for a parallel sequential scan. Yeah, we also need to consider the i/o side of this, which will definitely be tricky. 
There are i/o systems out there which are faster than a single CPU and ones where a single CPU can manage multiple i/o channels. There are also cases where the i/o system handles sequential access nearly as fast as random and cases where sequential is much faster than random. Where we can get an idea of that distinction is with seq_page_cost vs. random_page_cost, as folks running on SSDs tend to lower random_page_cost from the default to indicate that. Also I find in whatever I have read (Oracle, dynamodb) that most databases divide work among workers and the master backend acts as coordinator, at least that's what I could understand. Yeah, I agree that's more typical. Robert's point that the master backend should participate is interesting but, as I recall, it was based on the idea that the master could finish faster than the worker, but if that's the case then we've planned it out wrong from the beginning. Thanks! Stephen
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
On Tue, Jan 6, 2015 at 9:17 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: The attached patch is newer revision of custom-/foreign-join interface. It seems that the basic purpose of this patch is to allow a foreign scan or custom scan to have scanrelid == 0, reflecting the case where we are scanning a joinrel rather than a baserel. The major problem that seems to create is that we can't set the target list from the relation descriptor, because there isn't one. To work around that, you've added fdw_ps_list and custom_ps_tlist, which the FDW or custom-plan provider must set. I don't know off-hand whether that's a good interface or not. How does the FDW know what to stick in there? In the most usual scenario, FDW/CSP will make a ps_tlist according to the target-list of the joinrel (that contains mixture of var-nodes to left-side and right-side), and qualifier's expression tree if any. As long as FDW can construct a remote query, it knows which attributes shall be returned and which relation does it come from. It is equivalent to what ps_tlist tries to inform the core optimizer. There's a comment that seems to be trying to explain this: An optional fdw_ps_tlist is used to map a reference to an attribute of underlying relation(s) on a pair of INDEX_VAR and alternative varattno. It looks like a scan on pseudo relation that is usually result of relations join on remote data source, and FDW driver is responsible to set expected target list for this. If FDW returns records as foreign-table definition, just put NIL here. ...but I can't understand what that's telling me. Sorry, let me explain in another expression. A joinrel has a target-list that can/may contain references to both of the left and right relations. These are eventually mapped to either INNER_VAR or OUTER_VAR, then the executor switches the TupleTableSlot (whether ecxt_innertuple or ecxt_outertuple) according to the special varno. 
On the other hands, because ForeignScan/CustomScan is a scan plan, it shall have just one TupleTableSlot on execution time. Thus, we need a mechanism that maps attributes from both of the relations on a certain location of the slot; that shall be eventually translated to var-node with INDEX_VAR to reference ecxt_scantuple. Of course, ps_tlist is not necessary if ForeignScan/CustomScan scans on a base relation as literal. In this case, the interface contract expects NIL is set on the ps_tlist field. You've added an Oid fdw_handler field to the ForeignScan and RelOptInfo structures. I think this is the OID of the pg_proc entry for the handler function; and I think we need it because, if scanrelid == 0 then we don't have a relation that we can trace to a foreign table, to a server, to an FDW, and then to a handler. So we need to get that information some other way. When building joinrels, the fdw_handler OID, and the associated routine, are propagated from any two relations that share the same fdw_handler OID to the resulting joinrel. I guess that's reasonable, although it feels a little weird that we're copying around both the OID and the structure-pointer. Unlike CustomScan node, ForeignScan node does not have function pointers. In addition, it is dynamically allocated by palloc(), so we have no guarantee the pointer constructed on plan-stage is valid on beginning of the executor. It is the reason why I put OID of the FDW handler routine. Any other better idea? For non-obvious reasons, you've made create_plan_recurse() non-static. When custom-scan node replaced a join-plan, it shall have at least two child plan-nodes. The callback handler of PlanCustomPath needs to be able to call create_plan_recurse() to transform the underlying path-nodes to plan-nodes, because this custom-scan node may take other built-in scan or sub-join nodes as its inner/outer input. 
In case of FDW, it shall kick any underlying scan relations to remote side, thus we may not expect ForeignScan has underlying plans... Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
On Fri, Jan 9, 2015 at 2:22 AM, Dean Rasheed dean.a.rash...@gmail.com wrote: Whoa, hang on. I think you're being a bit quick to dismiss that example. Why shouldn't I want an upsert where the majority of the table columns follow the usual make it so pattern of an upsert, but there is also this kind of audit column to be maintained? Then I would write something like INSERT INTO tbl (some values, 0) ON CONFLICT UPDATE SET same values, mod_count=mod_count+1; The root of the problem is the way that you're proposing to combine the RLS policies (using AND), which runs contrary to the way RLS policies are usually combined (using OR), which is why this kind of example fails -- RLS policies in general aren't intended to all be true simultaneously. In case I wasn't clear, I'm proposing that we AND together the already OR'd together UPDATE and INSERT quals. -- Peter Geoghegan
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
* Peter Geoghegan (p...@heroku.com) wrote: On Fri, Jan 9, 2015 at 1:09 PM, Stephen Frost sfr...@snowman.net wrote: Therefore, I'm not sure that I see the point in checking the INSERT tuple against the UPDATE policy. I guess it wouldn't be hard to modify the struct WithCheckOption to store the cmd (e.g. RowSecurityPolicy-style ACL_*_CHR permissions), usable only in this context. That way, when ExecWithCheckOptions() is called from the INSERT, we can tell it to only enforce INSERT-applicable policy cmds (in particular, not UPDATE-only-applicable policy cmds that happen to end up associated with the same ResultRelation). Whereas, at the end of ExecUpdate(), that final ExecWithCheckOptions() call (call 3 of 3 when the UPDATE path is taken), INSERT-based policies can still be enforced against the final tuple (as can USING() quals that would ordinarily not be checked at that point either). A given ResultRelation's policies wouldn't necessarily always need to be enforced within ExecWithCheckOptions(), which is a change. Does that make sense to you? That sounds reasonable on first blush. I'll note that I've not looked deeply at the UPSERT patch, but if the above means that the INSERT policy is always checked against the INSERT tuple and the UPDATE policy is always checked against the tuple-resulting-from-an-UPDATE, and that tuples which are not visible due to the UPDATE policy throw an error in that case, then it's working as I'd expect. This does mean that the UPDATE policy won't be checked when the INSERT path is used but that seems appropriate to me, as we can't check a policy against a tuple that never exists (the result of the update applied to an existing tuple). Note that I've already taught ExecWithCheckOptions() to not leak the existing, target tuple when the UPDATE path is taken (the tuple that can be referenced in the UPDATE using the TARGET.* alias), while still performing enforcement against it. I can teach it this too. Ok. Thanks! 
Stephen
[HACKERS] libpq bad async behaviour
I'm worried about libpq blocking in some circumstances; particularly around SSL renegotiations. This came up while writing an async postgres library for lua, where I realised that this code was dangerous: https://github.com/daurnimator/cqueues-pgsql/blob/ee9c3fc85c94669b8128386d99d730fe93d9dbad/cqueues-pgsql.lua#L121 e.g. 1: When a PQ connection is in non-blocking mode, PQflush returns 1, the docs say: wait for the socket to be write-ready and call it again However, if the SSL layer is waiting on data for a renegotiation, write readiness is not enough: Waiting for POLLOUT and calling PQflush again will (untested) just return 1 again, and continue to do so until data is received. This is a busy-loop, and will block the host application. e.g. 2: An SSL renegotiation happens while trying to receive a response. According to 'andres' on IRC, inside of `PQisBusy` there is a busy loop: 14:22:32 andres You'll not see that. Even though the explanation for it is absolutely horrid. 14:23:32 andres There's a busy retry loop because of exactly that reason inside libpq's ssl read function whenever it hits a WANT_WRITE. 14:23:58 daurnimator so... libpq will block my process? :( 14:24:25 andres daurnimator: That case is unlikely to be hit often luckily because of the OS buffering. But yea, it's really unsatisfying. 14:26:06 andres daurnimator: I think this'll need a new API to be properly fixed. One idea that came to mind if we want to keep the same api, is to hide the socket behind an epoll file descriptor, they always poll read ready when an fd in their set becomes ready. I think this is also possible for kqueue on bsd, ports on solaris and IOCP on windows. Regards, Daurnimator.
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
Dean, Peter, * Dean Rasheed (dean.a.rash...@gmail.com) wrote: On 9 January 2015 at 08:49, Peter Geoghegan p...@heroku.com wrote: On Fri, Jan 9, 2015 at 12:19 AM, Dean Rasheed dean.a.rash...@gmail.com wrote: I was trying to think up an example where you might actually have different INSERT and UPDATE policies, and the best I can think of is some sort of mod_count column where you have an INSERT CHECK (mod_count = 0) and an UPDATE CHECK (mod_count 0). In that case, checking both policies would make an UPSERT impossible, whereas if you think of it as doing either an INSERT or an UPDATE, as the syntax suggests, it becomes possible. Why does this user want to do this upsert? If they're upserting, then the inserted row could only reasonably have a value of (mod_count = 0). If updating, then they must have a constant value for the update path (a value that's greater than 0, naturally - say 2), which doesn't make any sense in the context of an upsert's auxiliary update - what happened to the 0 value? Sorry, but I don't think your example makes sense - I can't see what would motivate anyone to write a query like that with those RLS policies in place. It sounds like you're talking about an insert and a separate update that may or may not affect the same row, and not an upsert. Then those policies make sense, but in practice they render the upsert you describe contradictory. Whoa, hang on. I think you're being a bit quick to dismiss that example. Why shouldn't I want an upsert where the majority of the table columns follow the usual make it so pattern of an upsert, but there is also this kind of audit column to be maintained? 
Then I would write something like INSERT INTO tbl (some values, 0) ON CONFLICT UPDATE SET same values, mod_count=mod_count+1; That's a good point- it doesn't make sense to compare the INSERT values against the UPDATE policy as the result of the UPDATE could be a completely different tuple, one we can't know the contents of until we actually find a conflicting tuple. We can't simply combine the policies and run them at the same time, but I'm not sure that then means we should let an INSERT .. ON CONFLICT UPDATE pass in values to the INSERT which are not permitted by any INSERT policy on the relation. Further, forgetting about RLS, what happens if a user doesn't have INSERT rights on the mod_count column at all? We still allow the above to execute if the row already exists? That doesn't strike me as a good idea. In this case, I do think it makes sense to consider RLS and our regular permissions system as they are both 'OR' cases. I might have UPDATE rights for a particular column through multiple different roles for a given relation, but if none of them give me INSERT rights for that column, then I'd expect to get an error if I try to do an INSERT into that column. The root of the problem is the way that you're proposing to combine the RLS policies (using AND), which runs contrary to the way RLS policies are usually combined (using OR), which is why this kind of example fails -- RLS policies in general aren't intended to all be true simultaneously. I agree that RLS policies, in general, are not intended to all be true simultaneously. I don't think it then follows that an INSERT .. ON CONFLICT UPDATE should be allowed if, for example, there are no INSERT policies on an RLS-enabled relation, because the call happens to end up doing an UPDATE. To try and outline another possible use-case, consider that you have tuples coming in from two independent sets of users, orange and blue. All users are allowed to INSERT, but the 'color' column must equal the user's. 
For UPDATEs, the blue users can see both orange and blue records while orange users can only see orange records. Rows resulting from an UPDATE can be orange or blue for blue users, but must be orange for orange users. Further, in this environment, attempts by orange users to insert blue records are flagged to be audited for review because orange users should never have access to nor be handling blue data. With the approach you're outlining, a command like: INSERT INTO data VALUES (... , 'blue') ON CONFLICT UPDATE set ... = ...; run by an orange user wouldn't error if it happened to result in an UPDATE (presuming, of course, that the UPDATE didn't try to change the existing color), but from a security perspective, it's clearly an error for an orange user to be attempting to insert blue data. I don't see this as impacting the more general case of how RLS policies are applied. An individual blue user might also be a member of some orange-related role and it wouldn't make any sense for that user to then be only able to see orange records (were we to AND RLS policies together). Where this leaves me, at least, is feeling like we should always apply the INSERT WITH CHECK policy, then if there is a conflict, check the UPDATE USING policy and throw an error if the row
Re: [HACKERS] PQputCopyEnd doesn't adhere to its API contract
On Thu, Sep 4, 2014 at 12:52:14PM -0400, Robert Haas wrote: On Wed, Sep 3, 2014 at 6:24 PM, Bruce Momjian br...@momjian.us wrote: On Fri, May 9, 2014 at 12:03:36PM -0400, Robert Haas wrote: On Thu, May 8, 2014 at 5:21 PM, Tom Lane t...@sss.pgh.pa.us wrote: Perhaps the text should be like this: The result is 1 if the termination message was sent; or in nonblocking mode, this may only indicate that the termination message was successfully queued. (In nonblocking mode, to be certain that the data has been sent, you should next wait for write-ready and call <function>PQflush</>, repeating until it returns zero.) Zero indicates that the function could not queue the termination message because of full buffers; this will only happen in nonblocking mode. (In this case, wait for write-ready and try the PQputCopyEnd call again.) If a hard error occurs, -1 is returned; you can use <function>PQerrorMessage</function> to retrieve details. That looks pretty good. However, I'm realizing this isn't the only place where we probably need to clarify the language. Just to take one example near at hand, PQputCopyData may also return 1 when it's only queued the data; it seems to try even less hard than PQputCopyEnd to ensure that the data is actually sent. Uh, where are we on this? I think someone needs to take Tom's proposed language and make it into a patch. And figure out which other functions in the documentation need similar updates. I have developed such a patch --- attached. I didn't see any other mentions of blocking in the docs. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. 
+ diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml new file mode 100644 index de272c5..ad104a3 *** a/doc/src/sgml/libpq.sgml --- b/doc/src/sgml/libpq.sgml *** int PQsetnonblocking(PGconn *conn, int a *** 4318,4325 para In the nonblocking state, calls to functionPQsendQuery/function, functionPQputline/function, !functionPQputnbytes/function, and !functionPQendcopy/function will not block but instead return an error if they need to be called again. /para --- 4318,4325 para In the nonblocking state, calls to functionPQsendQuery/function, functionPQputline/function, !functionPQputnbytes/function, functionPQputCopyData/function, !and functionPQendcopy/function will not block but instead return an error if they need to be called again. /para *** int PQputCopyData(PGconn *conn, *** 4961,4969 para Transmits the commandCOPY/command data in the specified parameterbuffer/, of length parameternbytes/, to the server. !The result is 1 if the data was sent, zero if it was not sent !because the attempt would block (this case is only possible if the !connection is in nonblocking mode), or -1 if an error occurred. (Use functionPQerrorMessage/function to retrieve details if the return value is -1. If the value is zero, wait for write-ready and try again.) --- 4961,4969 para Transmits the commandCOPY/command data in the specified parameterbuffer/, of length parameternbytes/, to the server. !The result is 1 if the data was queued, zero if it was not queued !because of full buffers (this will only happen in nonblocking mode), !or -1 if an error occurred. (Use functionPQerrorMessage/function to retrieve details if the return value is -1. If the value is zero, wait for write-ready and try again.) *** int PQputCopyEnd(PGconn *conn, *** 5009,5021 connections.) /para ! 
para !The result is 1 if the termination data was sent, zero if it was !not sent because the attempt would block (this case is only possible !if the connection is in nonblocking mode), or -1 if an error !occurred. (Use functionPQerrorMessage/function to retrieve !details if the return value is -1. If the value is zero, wait for !write-ready and try again.) /para para --- 5009,5026 connections.) /para ! para !The result is 1 if the termination message was sent; or in !nonblocking mode, this may only indicate that the termination !message was successfully queued. (In nonblocking mode, to be !certain that the data has been sent, you should next wait for !write-ready and call functionPQflush/, repeating until it !returns zero.) Zero indicates that the function could not queue !the termination message because of full buffers; this will only !happen in nonblocking mode. (In this case, wait
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
On 1/6/15, 5:43 PM, Kouhei Kaigai wrote: scan_relid != InvalidOid Ideally, they should be OidIsValid(scan_relid) Scan.scanrelid is an index of range-tables list, not an object-id. So, InvalidOid or OidIsValid() are not a good choice. I think the name needs to change then; scan_relid certainly looks like the OID of a relation. scan_index? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Possible typo in create_policy.sgml
Dean, * Dean Rasheed (dean.a.rash...@gmail.com) wrote: On 8 January 2015 at 18:57, Stephen Frost sfr...@snowman.net wrote: What do you think of the attached rewording? Rewording it this way is a great idea. Hopefully that will help address the confusion which we've seen. The only comment I have offhand is: should we add a sentence to this paragraph about the default-deny policy? Yes, good idea, although I think perhaps that sentence should be added to the preceding paragraph, after noting that RLS has to be enabled on the table for the policies to be applied: I'm a bit on the fence about these ending up as different paragraphs then, but ignoring that for the moment, I'd suggest we further clarify with: The <command>CREATE POLICY</command> command defines a new policy for a table. Note that row level security must also be enabled on the table using <command>ALTER TABLE</command> in order for created policies to be applied. Once row level security has been enabled, a default-deny policy is used and no rows in the table are visible, except to the table owner or superuser, unless permitted by a specific policy. A policy permits SELECT, INSERT, UPDATE or DELETE commands to access rows in a table that has row level security enabled. Access to existing table rows is granted if they match a policy expression specified via USING, while new rows that would be created via INSERT or UPDATE are checked against policy expressions specified via WITH CHECK. For policy expressions specified via USING which grant access to existing rows, the system will generally test the policy expressions prior to any qualifications that appear in the query itself, in order to prevent the inadvertent exposure of the protected data to user-defined functions which might not be trustworthy. However, functions and operators marked by the system (or the system administrator) as LEAKPROOF may be evaluated before policy expressions, as they are assumed to be trustworthy. 
Also, perhaps the ALTER TABLE in the first paragraph should be turned into a link. Ah, yes, agreed. Thanks! Stephen
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
Peter, * Peter Geoghegan (p...@heroku.com) wrote: On Fri, Jan 9, 2015 at 2:22 AM, Dean Rasheed dean.a.rash...@gmail.com wrote: Whoa, hang on. I think you're being a bit quick to dismiss that example. Why shouldn't I want an upsert where the majority of the table columns follow the usual make it so pattern of an upsert, but there is also this kind of audit column to be maintained? Then I would write something like INSERT INTO tbl (some values, 0) ON CONFLICT UPDATE SET same values, mod_count=mod_count+1; The root of the problem is the way that you're proposing to combine the RLS policies (using AND), which runs contrary to the way RLS policies are usually combined (using OR), which is why this kind of example fails -- RLS policies in general aren't intended to all be true simultaneously. In case I wasn't clear, I'm proposing that we AND together the already OR'd together UPDATE and INSERT quals. I'm not sure how that would work exactly though, since the tuple the UPDATE results in might be different from what the INSERT has, as Dean pointed out. The INSERT tuple might even pass the UPDATE policy where the resulting tuple from the actual UPDATE part doesn't. Thanks! Stephen
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
On Fri, Jan 9, 2015 at 12:53 PM, Stephen Frost sfr...@snowman.net wrote: I'm not sure how that would work exactly though, since the tuple the UPDATE results in might be different from what the INSERT has, as Dean pointed out. The INSERT tuple might even pass the UPDATE policy where the resulting tuple from the actual UPDATE part doesn't. I'm not suggesting that Dean's example is totally without merit (his revised explanation made it clearer what he meant). I just think, on balance, that enforcing all INSERT + UPDATE policies at various points is the best behavior. Part of the reason for that is that if you ask people how they think it works, you'll get as many answers as the number of people you ask. My proposal (which may or may not be the same as yours, but if not is only slightly more restrictive) is unlikely to affect many users negatively, and is relatively easy to explain and reason about. There are workarounds for Dean's case, too. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On 01/09/2015 08:01 PM, Stephen Frost wrote: Amit, * Amit Kapila (amit.kapil...@gmail.com) wrote: On Fri, Jan 9, 2015 at 1:02 AM, Jim Nasby jim.na...@bluetreble.com wrote: I agree, but we should try and warn the user if they set parallel_seqscan_degree close to max_worker_processes, or at least give some indication of what's going on. This is something you could end up beating your head on wondering why it's not working. Yet another way to handle the case when enough workers are not available is to let the user specify the desired minimum percentage of requested parallel workers with a parameter like PARALLEL_QUERY_MIN_PERCENT. For example, if you specify 50 for this parameter, then at least 50% of the parallel workers requested for any parallel operation must be available in order for the operation to succeed, else it will give an error. If the value is set to null, then all parallel operations will proceed as long as at least two parallel workers are available for processing. Ugh. I'm not a fan of this.. Based on how we're talking about modeling this, if we decide to parallelize at all, then we expect it to be a win. I don't like the idea of throwing an error if, at execution time, we end up not being able to actually get the number of workers we want- instead, we should degrade gracefully all the way back to serial, if necessary. Perhaps we should send a NOTICE or something along those lines to let the user know we weren't able to get the level of parallelization that the plan originally asked for, but I really don't like just throwing an error. Yeah, this seems like the behaviour I would expect: if we can't get enough parallel workers we should just use as many as we can get. Everything else, and especially erroring out, will just cause random application failures and easy DoS vectors. I think all we need initially is being able to specify a maximum number of workers per query as well as a maximum number of workers in total for parallel operations. 
Now, for debugging purposes, I could see such a parameter being available but it should default to 'off/never-fail'. Not sure what it really would be useful for - if I execute a query I would truly expect it to get answered - if it can be made faster if done in parallel that's nice but why would I want it to fail? Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
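[Editor's note] The degrade-gracefully behaviour being argued for here amounts to something like the following toy model (illustrative Python only; the parameter names echo the GUCs under discussion, and nothing here is PostgreSQL code):

```python
# Toy model of "degrade gracefully" worker allocation: take whatever
# workers are available, fall back to serial when none are, and never
# raise an error. Names like max_worker_processes are hypothetical
# stand-ins for the GUCs mentioned in the thread.

def allocate_workers(requested, available):
    """Grant as many workers as possible, never failing outright."""
    granted = min(requested, available)
    notices = []
    if granted < requested:
        notices.append(
            f"NOTICE: only {granted} of {requested} requested "
            f"parallel workers could be launched")
    if granted == 0:
        notices.append("NOTICE: falling back to serial execution")
    return granted, notices

granted, notices = allocate_workers(requested=8, available=3)
assert granted == 3        # degraded, but the query still runs
granted, notices = allocate_workers(requested=8, available=0)
assert granted == 0        # serial fallback rather than an error
```

The NOTICE list corresponds to Stephen's suggestion of telling the user when the plan's desired parallelization could not be achieved, instead of the hard-error behaviour of a PARALLEL_QUERY_MIN_PERCENT-style knob.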
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
On Fri, Jan 9, 2015 at 1:09 PM, Stephen Frost sfr...@snowman.net wrote: To flip it around a bit, I don't think we can avoid checking the *resulting* tuple from the UPDATE against the UPDATE policy. We can avoid it - by not updating. What I'm suggesting is that an enforcement occurs that is more or less equivalent to the enforcement that occurs when the UPDATE path is taken, without it actually being taken. I accept that that isn't perfect, but it has its advantages. Therefore, I'm not sure that I see the point in checking the INSERT tuple against the UPDATE policy. I guess it wouldn't be hard to modify the struct WithCheckOption to store the cmd (e.g. RowSecurityPolicy-style ACL_*_CHR permissions), usable only in this context. That way, when ExecWithCheckOptions() is called from the INSERT, we can tell it to only enforce INSERT-applicable policy cmds (in particular, not UPDATE-only-applicable policy cmds that happen to end up associated with the same ResultRelation). Whereas, at the end of ExecUpdate(), that final ExecWithCheckOptions() call (call 3 of 3 when the UPDATE path is taken), INSERT-based policies can still be enforced against the final tuple (as can USING() quals that would ordinarily not be checked at that point either). A given ResultRelation's policies wouldn't necessarily always need to be enforced within ExecWithCheckOptions(), which is a change. Does that make sense to you? Note that I've already taught ExecWithCheckOptions() to not leak the existing, target tuple when the UPDATE path is taken (the tuple that can be referenced in the UPDATE using the TARGET.* alias), while still performing enforcement against it. I can teach it this too. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
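[Editor's note] The command-tagged enforcement Peter sketches above can be modeled in miniature (illustrative Python; names such as ACL_INSERT_CHR mirror the mail's wording, but the real executor structures and checks differ):

```python
# Toy model of command-tagged check options: each WithCheckOption
# carries the command it belongs to, and enforcement filters by the
# commands applicable to the current phase. Purely illustrative.
ACL_INSERT_CHR = 'a'
ACL_UPDATE_CHR = 'w'

class WithCheckOption:
    def __init__(self, cmd, qual):
        self.cmd = cmd       # which command this policy belongs to
        self.qual = qual     # callable: tuple -> bool

def exec_with_check_options(wcos, tup, enforce_cmds):
    """Enforce only the options applicable to the current phase."""
    for wco in wcos:
        if wco.cmd in enforce_cmds and not wco.qual(tup):
            raise PermissionError(f"policy for cmd {wco.cmd!r} violated")

wcos = [
    WithCheckOption(ACL_INSERT_CHR, lambda t: t["mod_count"] == 0),
    WithCheckOption(ACL_UPDATE_CHR, lambda t: t["mod_count"] > 0),
]

# INSERT path: enforce only the INSERT-applicable policies...
exec_with_check_options(wcos, {"mod_count": 0}, {ACL_INSERT_CHR})
# ...while the final check after an UPDATE can enforce the UPDATE ones.
exec_with_check_options(wcos, {"mod_count": 3}, {ACL_UPDATE_CHR})
```

The point of the filter argument is that the same ResultRelation's options need not all be enforced at every ExecWithCheckOptions() call site, which is the change Peter describes.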
Re: [HACKERS] Parallel Seq Scan
* Stefan Kaltenbrunner (ste...@kaltenbrunner.cc) wrote: On 01/09/2015 08:01 PM, Stephen Frost wrote: Now, for debugging purposes, I could see such a parameter being available but it should default to 'off/never-fail'. Not sure what it really would be useful for - if I execute a query I would truly expect it to get answered - if it can be made faster if done in parallel that's nice but why would I want it to fail? I was thinking for debugging only, though I'm not really sure why you'd need it if you get a NOTICE when you don't end up with all the workers you expect. Thanks, Stephen
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
On Fri, Jan 9, 2015 at 12:26 PM, Stephen Frost sfr...@snowman.net wrote: Where this leaves me, at least, is feeling like we should always apply the INSERT WITH CHECK policy, then if there is a conflict, check the UPDATE USING policy and throw an error if the row isn't visible but otherwise perform the UPDATE and then check the UPDATE WITH CHECK policy. I see your point that this runs counter to the 'mod_count' example use-case and could cause problems for users using RLS with such a strategy. For my part, I expect users of RLS to expect errors in such a case rather than allowing it, but it's certainly a judgement call. I mostly agree, but I don't know that I fully agree. Specifically, I also think we should check the update policy even when we only insert, on the theory that if we did go to update, the would-be inserted row would be a proxy for what we'd check then (the existing, target relation's tuple). What do you think of that? I certainly agree that the correct behavior here is at least a bit subjective. We cannot exactly generalize from other areas of the code, nor can we look for a precedent set by other systems (AFAICT there is none). The only reasonable way that I can see to support both sides would be to allow UPSERT to be a top-level policy definition in its own right and let users specify exactly what is allowed in the UPSERT case (possibly requiring three different expressions to represent the initial INSERT, what the UPDATE can see, and what the UPDATE results in..). I tend to doubt it would be worth it unless we end up supporting UPSERT-specific triggers and permissions.. Agreed. That would technically make everyone happy, in that it defers to the user, but is unlikely to be worth it. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
* Peter Geoghegan (p...@heroku.com) wrote: On Fri, Jan 9, 2015 at 12:26 PM, Stephen Frost sfr...@snowman.net wrote: Where this leaves me, at least, is feeling like we should always apply the INSERT WITH CHECK policy, then if there is a conflict, check the UPDATE USING policy and throw an error if the row isn't visible but otherwise perform the UPDATE and then check the UPDATE WITH CHECK policy. I see your point that this runs counter to the 'mod_count' example use-case and could cause problems for users using RLS with such a strategy. For my part, I expect users of RLS to expect errors in such a case rather than allowing it, but it's certainly a judgement call. I mostly agree, but I don't know that I fully agree. Specifically, I also think we should check the update policy even when we only insert, on the theory that if we did go to update, the would-be inserted row would be a proxy for what we'd check then (the existing, target relation's tuple). What do you think of that? To flip it around a bit, I don't think we can avoid checking the *resulting* tuple from the UPDATE against the UPDATE policy. Therefore, I'm not sure that I see the point in checking the INSERT tuple against the UPDATE policy. I also don't like the idea of a completely legitimate command (one which allows the to-be-inserted row via the INSERT policy AND allows the tuple-post-update via the UPDATE policy) throwing an error because the to-be-inserted row violated the UPDATE policy. That doesn't make sense to me. Thanks! Stephen
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
On 9 January 2015 at 08:49, Peter Geoghegan p...@heroku.com wrote: On Fri, Jan 9, 2015 at 12:19 AM, Dean Rasheed dean.a.rash...@gmail.com wrote: I was trying to think up an example where you might actually have different INSERT and UPDATE policies, and the best I can think of is some sort of mod_count column where you have an INSERT CHECK (mod_count = 0) and an UPDATE CHECK (mod_count > 0). In that case, checking both policies would make an UPSERT impossible, whereas if you think of it as doing either an INSERT or an UPDATE, as the syntax suggests, it becomes possible. Why does this user want to do this upsert? If they're upserting, then the inserted row could only reasonably have a value of (mod_count = 0). If updating, then they must have a constant value for the update path (a value that's greater than 0, naturally - say 2), which doesn't make any sense in the context of an upsert's auxiliary update - what happened to the 0 value? Sorry, but I don't think your example makes sense - I can't see what would motivate anyone to write a query like that with those RLS policies in place. It sounds like you're talking about an insert and a separate update that may or may not affect the same row, and not an upsert. Then those policies make sense, but in practice they render the upsert you describe contradictory. Whoa, hang on. I think you're being a bit quick to dismiss that example. Why shouldn't I want an upsert where the majority of the table columns follow the usual make it so pattern of an upsert, but there is also this kind of audit column to be maintained? Then I would write something like INSERT INTO tbl (some values, 0) ON CONFLICT UPDATE SET same values, mod_count=mod_count+1; The root of the problem is the way that you're proposing to combine the RLS policies (using AND), which runs contrary to the way RLS policies are usually combined (using OR), which is why this kind of example fails -- RLS policies in general aren't intended to all be true simultaneously. 
Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
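[Editor's note] Dean's mod_count example can be made concrete with a toy model (illustrative Python; this is not how the server evaluates policies): each arm of the upsert passes its own OR-combined policy set, but AND-ing the INSERT and UPDATE sets against the proposed row rejects it.

```python
# Illustrative only: INSERT policy CHECK (mod_count = 0) and
# UPDATE policy CHECK (mod_count > 0), per Dean's example.
insert_policies = [lambda row: row["mod_count"] == 0]
update_policies = [lambda row: row["mod_count"] > 0]

def passes(policies, row):
    # Multiple policies for the same command are OR'd together.
    return any(p(row) for p in policies)

proposed = {"mod_count": 0}       # row the INSERT arm would add
after_update = {"mod_count": 1}   # row the UPDATE arm would produce

assert passes(insert_policies, proposed)      # INSERT arm is fine
assert passes(update_policies, after_update)  # UPDATE arm is fine

# AND-ing both commands' policy sets against the proposed row, as
# discussed upthread, makes this perfectly sensible upsert impossible:
assert not (passes(insert_policies, proposed) and
            passes(update_policies, proposed))
```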
[HACKERS] Fixing memory leak in pg_upgrade
According to Coverity, there's a memory leak bug in transfer_all_new_dbs().

    mappings = gen_db_file_maps(old_db, new_db, &n_maps, old_pgdata,
                                new_pgdata);
    if (n_maps)
    {
        print_maps(mappings, n_maps, new_db->db_name);

#ifdef PAGE_CONVERSION
        pageConverter = setupPageConverter();
#endif
        transfer_single_new_db(pageConverter, mappings, n_maps,
                               old_tablespace);
        pg_free(mappings);
    }            <-- leaks mappings
}

return;
}

This is because gen_db_file_maps() allocates memory even if n_maps == 0. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
On Tue, Jan 6, 2015 at 9:17 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: The attached patch is a newer revision of the custom-/foreign-join interface. It seems that the basic purpose of this patch is to allow a foreign scan or custom scan to have scanrelid == 0, reflecting the case where we are scanning a joinrel rather than a baserel. The major problem that seems to create is that we can't set the target list from the relation descriptor, because there isn't one. To work around that, you've added fdw_ps_tlist and custom_ps_tlist, which the FDW or custom-plan provider must set. I don't know off-hand whether that's a good interface or not. How does the FDW know what to stick in there? There's a comment that seems to be trying to explain this: + * An optional fdw_ps_tlist is used to map a reference to an attribute of + * underlying relation(s) on a pair of INDEX_VAR and alternative varattno. + * It looks like a scan on pseudo relation that is usually result of + * relations join on remote data source, and FDW driver is responsible to + * set expected target list for this. If FDW returns records as foreign- + * table definition, just put NIL here. ...but I can't understand what that's telling me. You've added an Oid fdw_handler field to the ForeignScan and RelOptInfo structures. I think this is the OID of the pg_proc entry for the handler function; and I think we need it because, if scanrelid == 0 then we don't have a relation that we can trace to a foreign table, to a server, to an FDW, and then to a handler. So we need to get that information some other way. When building joinrels, the fdw_handler OID, and the associated routine, are propagated from any two relations that share the same fdw_handler OID to the resulting joinrel. I guess that's reasonable, although it feels a little weird that we're copying around both the OID and the structure-pointer. For non-obvious reasons, you've made create_plan_recurse() non-static. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Translating xlogreader.c
xlogreader.c contains a bunch of strings passed to the report_invalid_record function, that are supposed to be translated. src/backend/nls.mk lists report_invalid_record as a gettext trigger. In 9.2 and below, when those functions were still in xlog, those strings were in the postgres.pot file, and translated. But not since 9.3, when they were moved to xlogreader.c What's going on? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Fri, Jan 9, 2015 at 1:02 AM, Jim Nasby jim.na...@bluetreble.com wrote: On 1/5/15, 9:21 AM, Stephen Frost wrote: * Robert Haas (robertmh...@gmail.com) wrote: I think it's right to view this in the same way we view work_mem. We plan on the assumption that an amount of memory equal to work_mem will be available at execution time, without actually reserving it. Agreed- this seems like a good approach for how to address this. We should still be able to end up with plans which use less than the max possible parallel workers though, as I pointed out somewhere up-thread. This is also similar to work_mem- we certainly have plans which don't expect to use all of work_mem and others that expect to use all of it (per node, of course). I agree, but we should try and warn the user if they set parallel_seqscan_degree close to max_worker_processes, or at least give some indication of what's going on. This is something you could end up beating your head on wondering why it's not working. Yet another way to handle the case when enough workers are not available is to let the user specify the desired minimum percentage of requested parallel workers with a parameter like PARALLEL_QUERY_MIN_PERCENT. For example, if you specify 50 for this parameter, then at least 50% of the parallel workers requested for any parallel operation must be available in order for the operation to succeed, else it will give an error. If the value is set to null, then all parallel operations will proceed as long as at least two parallel workers are available for processing. This is how some other commercial databases handle such a situation. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
[HACKERS] Improving RLS qual pushdown
A while ago [1] I proposed an enhancement to the way qual pushdown safety is decided in RLS / security barrier views. Currently we just test for the presence of leaky functions in the qual, but it is possible to do better than that, by further testing if the leaky function is actually being passed information that we don't want to be leaked. Attached is a patch that does that, allowing restriction clause quals to be pushed down into subquery RTEs if they contain leaky functions, provided that the arglists of those leaky functions contain no Vars (such Vars would necessarily refer to output columns of the subquery, which is the data that must not be leaked). An example of the sort of query this will help optimise is: SELECT * FROM table_with_rls WHERE created > now() - '1 hour'::interval; where currently this qual cannot be pushed down because neither now() nor timestamptz_mi_interval() are leakproof, but since they are not being passed any data from the table, they can't actually leak anything, so the qual can be safely pushed down, allowing indexes to be used if available. In fact the majority of builtin functions aren't marked leakproof, and probably most user functions aren't either, so this could potentially be useful in a wide range of real-world queries, where it is common to write quals of the form column operator expression, and the expression may contain leaky functions. Regards, Dean [1] http://www.postgresql.org/message-id/caezatcwkcpfwilocnmfmszklgoabz7axkmgz-mk83gd3jg8...@mail.gmail.com diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c new file mode 100644 index 58d78e6..4433468 *** a/src/backend/optimizer/path/allpaths.c --- b/src/backend/optimizer/path/allpaths.c *** targetIsInAllPartitionLists(TargetEntry *** 1982,1988 * 2. If unsafeVolatile is set, the qual must not contain any volatile * functions. * ! * 3. If unsafeLeaky is set, the qual must not contain any leaky functions. * * 4. 
The qual must not refer to the whole-row output of the subquery * (since there is no easy way to name that within the subquery itself). --- 1982,1989 * 2. If unsafeVolatile is set, the qual must not contain any volatile * functions. * ! * 3. If unsafeLeaky is set, the qual must not contain any leaky functions ! * that might reveal values from the subquery as side effects. * * 4. The qual must not refer to the whole-row output of the subquery * (since there is no easy way to name that within the subquery itself). *** qual_is_pushdown_safe(Query *subquery, I *** 2009,2015 /* Refuse leaky quals if told to (point 3) */ if (safetyInfo->unsafeLeaky ! contain_leaky_functions(qual)) return false; /* --- 2010,2016 /* Refuse leaky quals if told to (point 3) */ if (safetyInfo->unsafeLeaky ! contain_leaked_vars(qual)) return false; /* diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c new file mode 100644 index b340b01..7f82d54 *** a/src/backend/optimizer/util/clauses.c --- b/src/backend/optimizer/util/clauses.c *** static bool contain_mutable_functions_wa *** 97,103 static bool contain_volatile_functions_walker(Node *node, void *context); static bool contain_volatile_functions_not_nextval_walker(Node *node, void *context); static bool contain_nonstrict_functions_walker(Node *node, void *context); ! static bool contain_leaky_functions_walker(Node *node, void *context); static Relids find_nonnullable_rels_walker(Node *node, bool top_level); static List *find_nonnullable_vars_walker(Node *node, bool top_level); static bool is_strict_saop(ScalarArrayOpExpr *expr, bool falseOK); --- 97,103 static bool contain_volatile_functions_walker(Node *node, void *context); static bool contain_volatile_functions_not_nextval_walker(Node *node, void *context); static bool contain_nonstrict_functions_walker(Node *node, void *context); ! 
static bool contain_leaked_vars_walker(Node *node, void *context); static Relids find_nonnullable_rels_walker(Node *node, bool top_level); static List *find_nonnullable_vars_walker(Node *node, bool top_level); static bool is_strict_saop(ScalarArrayOpExpr *expr, bool falseOK); *** contain_nonstrict_functions_walker(Node *** 1318,1343 } /* ! * Check clauses for non-leakproof functions */ /* ! * contain_leaky_functions ! * Recursively search for leaky functions within a clause. * ! * Returns true if any function call with side-effect may be present in the ! * clause. Qualifiers from outside the a security_barrier view should not ! * be pushed down into the view, lest the contents of tuples intended to be ! * filtered out be revealed via side effects. */ bool
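[Editor's note] The relaxed pushdown-safety test described above can be sketched as a small expression-tree walk (an illustrative Python model of the contain_leaked_vars() idea; the real function walks PostgreSQL node trees, and treating the top-level comparison operator as leakproof is an assumption here):

```python
# Simplified model of the contain_leaked_vars() test: a qual may be
# pushed down past a security barrier unless some leaky (non-leakproof)
# function could see subquery output, i.e. receives a Var argument.

class Var:                      # reference to a subquery output column
    pass

class FuncCall:
    def __init__(self, name, args, leakproof):
        self.name, self.args, self.leakproof = name, args, leakproof

def contains_var(node):
    if isinstance(node, Var):
        return True
    if isinstance(node, FuncCall):
        return any(contains_var(a) for a in node.args)
    return False                # constants and the like

def contains_leaked_vars(node):
    if isinstance(node, FuncCall):
        if not node.leakproof and any(contains_var(a) for a in node.args):
            return True         # leaky function sees subquery data
        return any(contains_leaked_vars(a) for a in node.args)
    return False

# created > now() - '1 hour': the leaky functions take no Vars, so the
# qual is safe; the Var only feeds the comparison operator, which we
# assume here is leakproof.
cutoff = FuncCall("timestamptz_mi_interval",
                  [FuncCall("now", [], leakproof=False), "1 hour"],
                  leakproof=False)
qual = FuncCall("timestamptz_gt", [Var(), cutoff], leakproof=True)
assert not contains_leaked_vars(qual)       # push-down is allowed

leaky_on_var = FuncCall("lower", [Var()], leakproof=False)
assert contains_leaked_vars(leaky_on_var)   # must not be pushed down
```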
Re: [HACKERS] Fixing memory leak in pg_upgrade
On Fri, Jan 9, 2015 at 9:23 PM, Tatsuo Ishii is...@postgresql.org wrote: This is because gen_db_file_maps() allocates memory even if n_maps == 0. Purely cosmetic: the initialization n_maps = 0 before the call of gen_db_file_maps is unnecessary ;) -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
So this test can be used to evaluate how shorter records influence performance since the master waits for flush confirmation from the standby, right? Yes. This test can help measure performance improvement due to reduced I/O on standby as master waits for WAL records flush on standby. Isn't that GB and not MB? Yes. That is a typo. It should be GB. How many FPWs have been generated and how many dirty buffers have been flushed for the 3 checkpoints of each test? Any data about the CPU activity? The above data is not available for this run. I will rerun the tests to gather the above data. Thank you, Rahila Syed -- View this message in context: http://postgresql.nabble.com/Compression-of-full-page-writes-tp5769039p5833389.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Fri, Dec 19, 2014 at 7:57 PM, Stephen Frost sfr...@snowman.net wrote:
> There's certainly documentation available from the other RDBMS' which already support parallel query, as one source. Other academic papers exist (and once you've linked into one, the references and prior work helps bring in others). Sadly, I don't currently have ACM access (might have to change that..), but there are publicly available papers also,

I have gone through a couple of papers covering what some other databases do for parallel sequential scan. Here is a brief summary, along with how I plan to handle this in the patch:

Costing:
One of the papers [1] you suggested reports the following:
a. Startup costs are negligible if processes can be reused rather than created afresh.
b. Communication cost consists of the CPU cost of sending and receiving messages.
c. Communication costs can exceed the cost of operators such as scanning, joining or grouping.
These findings lead to the important conclusion that query optimization should be concerned with communication costs but not with startup costs.

In our case, since we currently have no mechanism to reuse parallel workers, we need to account for startup cost as well. Based on that, I am planning to add three new parameters:

* cpu_tuple_comm_cost - Cost of CPU time to pass a tuple from a worker to the master backend, with default value DEFAULT_CPU_TUPLE_COMM_COST as 0.1; this will be multiplied by the number of tuples expected to be selected.
* parallel_setup_cost - Cost of setting up shared memory for parallelism, with default value 100.0.
* parallel_startup_cost - Cost of starting up parallel workers, with default value 1000.0, multiplied by the number of workers decided for the scan.

I will do some experiments to finalise the default values, but in general, I feel developing the cost model on the above parameters is sound.
Execution:
Most other databases do a partition-level scan, with each parallel worker scanning a partition that resides on a different disk. However, it seems Amazon DynamoDB [2] works on something similar to what I have used in the patch, which is fixed blocks. I think this kind of strategy is better than dividing the blocks at runtime, because dividing the blocks randomly among workers could turn a parallel sequential scan into a random scan. Also, in whatever I have read (Oracle, DynamoDB), most databases divide the work among workers with the master backend acting as coordinator, at least that's what I could understand. Let me know your opinion about the same?

I am planning to proceed with the above ideas to strengthen the patch, in the absence of any objection or better ideas.

[1] : http://i.stanford.edu/pub/cstr/reports/cs/tr/96/1570/CS-TR-96-1570.pdf
[2] : http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html#QueryAndScanParallelScan

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [PATCH] server_version_num should be GUC_REPORT
Craig Ringer cr...@2ndquadrant.com writes:
> While looking into client code that relies on parsing server_version instead of checking server_version_num, I was surprised to discover that server_version_num isn't sent to the client by the server as part of the standard set of parameters reported post-auth.

Why should it be? server_version is what's documented to be sent.

> The attached patch marks server_version_num GUC_REPORT and documents that it's reported to the client automatically.

I think this is just a waste of network bandwidth. No client-side code could safely depend on its being available for many years yet, therefore they're going to keep using server_version.

			regards, tom lane