Re: [HACKERS] Parallel Seq Scan
On Sat, Dec 6, 2014 at 10:43 AM, David Rowley wrote: > On 4 December 2014 at 19:35, Amit Kapila wrote: >> >> Attached patch is just to facilitate the discussion about the >> parallel seq scan and maybe some other dependent tasks like >> sharing of various states like combocid, snapshot with parallel >> workers. It is by no means ready to do any complex test; of course >> I will work towards making it more robust both in terms of adding >> more stuff and doing performance optimizations. >> >> Thoughts/Suggestions? >> > > This is good news! Thanks. > I've not gotten to look at the patch yet, but I thought you may be able to make use of the attached at some point. > I also think so, that it can be used in the near future to enhance and provide more value to the parallel scan feature. Thanks for taking the initiative to do the leg-work for supporting aggregates. > It's bare-bones core support for allowing aggregate states to be merged together with another aggregate state. I would imagine that if a query such as: > > SELECT MAX(value) FROM bigtable; > > was run, then a series of parallel workers could go off and each find the max value from their portion of the table and then perhaps some other node type would then take all the intermediate results from the workers, once they're finished, and join all of the aggregate states into one and return that. Naturally, you'd need to check that all aggregates used in the targetlist had a merge function first. > Direction sounds to be right. > This is just a few hours of work. I've not really tested the pg_dump support or anything yet. I've also not added any new functions to allow AVG() or COUNT() to work, I've really just re-used existing functions where I could, as things like MAX() and BOOL_OR() can just make use of the existing transition function. I thought that this might be enough for early tests. > > I'd imagine such a workload, ignoring IO overhead, should scale pretty much linearly with the number of worker processes. 
Of course, if there was a GROUP BY clause then the merger code would have to perform more work. > Agreed. > If you think you might be able to make use of this, then I'm willing to go off and write all the other merge functions required for the other aggregates. > Don't you think that first we should stabilize the basic (target list and quals that can be independently evaluated by workers) parallel scan and then jump to do such enhancements? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Seq Scan
On Sat, Dec 6, 2014 at 12:27 AM, Jim Nasby wrote: > On 12/5/14, 9:08 AM, José Luis Tallón wrote: >> >> >> More over, when load goes up, the relative cost of parallel working should go up as well. >> Something like: >> p = number of cores >> l = 1min-load >> >> additional_cost = tuple estimate * cpu_tuple_cost * (l+1)/(c-1) >> >> (for c>1, of course) > > > ... > >> The parallel seq scan nodes are definitively the best approach for "parallel query", since the planner can optimize them based on cost. >> I'm wondering about the ability to modify the implementation of some methods themselves once at execution time: given a previously planned query, chances are that, at execution time (I'm specifically thinking about prepared statements here), a different implementation of the same "node" might be more suitable and could be used instead while the condition holds. > > > These comments got me wondering... would it be better to decide on parallelism during execution instead of at plan time? That would allow us to dynamically scale parallelism based on system load. If we don't even consider parallelism until we've pulled some number of tuples/pages from a relation, > > this would also eliminate all parallel overhead on small relations. > -- I think we have access to this information in the planner (RelOptInfo -> pages); if we want, we can use that to eliminate small relations from parallelism. The question is how big a relation we should consider for parallelism. One way is to check via tests, which I am planning to do; do you think we have any heuristic we can use to decide how big a relation should be before it is considered for parallelism? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
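José's surcharge formula can be sketched as a small helper. This is a hedged illustration only: the thread has not settled whether 'c' means cores or workers, and the function name and signature are hypothetical, not anything in the patch.

```c
#include <assert.h>
#include <math.h>

/* Sketch of the proposed load-aware surcharge:
 *   additional_cost = tuple_estimate * cpu_tuple_cost * (l + 1) / (c - 1)
 * where l is the 1-minute load average and c (presumably cores) must be > 1;
 * for c <= 1 the formula is undefined, so return no surcharge. */
static double
parallel_load_cost(double tuple_estimate, double cpu_tuple_cost,
                   double load_1min, int c)
{
    if (c <= 1)
        return 0.0;
    return tuple_estimate * cpu_tuple_cost * (load_1min + 1.0) / (c - 1);
}
```

Note how the (l+1)/(c-1) factor behaves: on an idle 2-core machine (l=0, c=2) the surcharge equals the plain per-tuple cost, and it grows linearly with load, which is exactly the "parallelism gets more expensive as the box gets busier" behavior being proposed.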
Re: [HACKERS] Parallel Seq Scan
On Fri, Dec 5, 2014 at 8:43 PM, Stephen Frost wrote: > > José, > > * José Luis Tallón (jltal...@adv-solutions.net) wrote: > > On 12/04/2014 07:35 AM, Amit Kapila wrote: > > >The number of worker backends that can be used for > > >parallel seq scan can be configured by using a new GUC > > >parallel_seqscan_degree, the default value of which is zero > > >and it means parallel seq scan will not be considered unless > > >user configures this value. > > > > The number of parallel workers should be capped (of course!) at the > > maximum amount of "processors" (cores/vCores, threads/hyperthreads) > > available. > > > > More over, when load goes up, the relative cost of parallel working > > should go up as well. > > Something like: > > p = number of cores > > l = 1min-load > > > > additional_cost = tuple estimate * cpu_tuple_cost * (l+1)/(c-1) > > > > (for c>1, of course) > > While I agree in general that we'll need to come up with appropriate > acceptance criteria, etc, I don't think we want to complicate this patch > with that initially. > >A SUSET GUC which caps the parallel GUC would be > enough for an initial implementation, imv. > This is exactly what I have done in patch. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Seq Scan
On Fri, Dec 5, 2014 at 8:46 PM, Stephen Frost wrote: > > Amit, > > * Amit Kapila (amit.kapil...@gmail.com) wrote: > > postgres=# explain select c1 from t1; > > QUERY PLAN > > -- > > Seq Scan on t1 (cost=0.00..101.00 rows=100 width=4) > > (1 row) > > > > > > postgres=# set parallel_seqscan_degree=4; > > SET > > postgres=# explain select c1 from t1; > > QUERY PLAN > > -- > > Parallel Seq Scan on t1 (cost=0.00..25.25 rows=100 width=4) > >Number of Workers: 4 > >Number of Blocks Per Workers: 25 > > (3 rows) > > This is all great and interesting, but I feel like folks might be > waiting to see just what kind of performance results come from this (and > what kind of hardware is needed to see gains..). Initially I was thinking that first we should discuss if the design and idea used in patch is sane, but now as you have asked and even Robert has asked the same off list to me, I will take the performance data next week (Another reason why I have not taken any data is that still the work to push qualification down to workers is left which I feel is quite important). However I still think if I get some feedback on some of the basic things like below, it would be good. 1. As the patch currently stands, it just shares the relevant data (like relid, target list, block range each worker should perform on etc.) to the worker and then worker receives that data and form the planned statement which it will execute and send the results back to master backend. So the question here is do you think it is reasonable or should we try to form the complete plan for each worker and then share the same and may be other information as well like range table entries which are required. My personal gut feeling in this matter is that for long term it might be better to form the complete plan of each worker in master and share the same, however I think the current way as done in patch (okay that needs some improvement) is also not bad and quite easier to implement. 2. 
Next question related to the above is what should be the output of ExplainPlan: as currently each worker is responsible for forming its own plan, Explain Plan is not able to show the detailed plan for each worker; is that okay? 3. Some places where optimizations are possible: - Currently after getting the tuple from heap, it is deformed by the worker and sent via message queue to the master backend; the master backend then forms the tuple and sends it to the upper layer, which before sending it to the frontend again deforms it via slot_getallattrs(slot). - Master backend currently receives the data from multiple workers serially. We can optimize in a way that it can check other queues, if there is no data in the current queue. - Master backend is just responsible for coordination among workers. It shares the required information with workers and then fetches the data processed by each worker. By using some more logic, we might be able to make the master backend also fetch data from the heap rather than doing just co-ordination among workers. I think in all the above places we can do some optimization, however we can do that later as well, unless they hit the performance badly for cases which people care about most. 4. Should the parallel_seqscan_degree value be dependent on other backend processes like MaxConnections, max_worker_processes, autovacuum_max_workers, or should it be independent like max_wal_senders? I think it is better to keep it dependent on other backend processes, however for simplicity, I have kept it similar to max_wal_senders for now. > There's likely to be > situations where this change is an improvement while also being cases > where it makes things worse. Agreed, and I think that will be more clear after doing some performance tests. > One really interesting case would be parallel seq scans which are > executing against foreign tables/FDWs.. > Sure. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
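The second optimization bullet (don't read the workers' queues strictly serially; skip a queue that has nothing ready) can be sketched with a toy stand-in for the shared-memory tuple queues. None of these names are the patch's actual API — TupleQueue, queue_try_read, and gather_round_robin are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a per-worker shared-memory tuple queue */
typedef struct TupleQueue
{
    const int  *tuples;
    size_t      len;
    size_t      pos;
} TupleQueue;

/* Non-blocking read: returns 1 and sets *out if a tuple was available */
static int
queue_try_read(TupleQueue *q, int *out)
{
    if (q->pos >= q->len)
        return 0;
    *out = q->tuples[q->pos++];
    return 1;
}

/* Round-robin gather: if the current queue is empty, move on to the next
 * one instead of blocking, so a slow worker doesn't stall the others.
 * Returns the total number of tuples collected into dest. */
static size_t
gather_round_robin(TupleQueue *queues, int nworkers, int *dest)
{
    size_t  got = 0;
    int     drained;

    do
    {
        drained = 1;
        for (int w = 0; w < nworkers; w++)
        {
            int tup;

            if (queue_try_read(&queues[w], &tup))
            {
                dest[got++] = tup;
                drained = 0;
            }
        }
    } while (!drained);
    return got;
}
```

In the real executor the "queue empty" case would come from a non-blocking shm_mq read rather than an array index, but the control flow — poll each queue once per pass, only stop when every queue is exhausted — is the same idea.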
Re: [HACKERS] Parallel Seq Scan
On 4 December 2014 at 19:35, Amit Kapila wrote: > > Attached patch is just to facilitate the discussion about the > parallel seq scan and maybe some other dependent tasks like > sharing of various states like combocid, snapshot with parallel > workers. It is by no means ready to do any complex test; of course > I will work towards making it more robust both in terms of adding > more stuff and doing performance optimizations. > > Thoughts/Suggestions? > > This is good news! I've not gotten to look at the patch yet, but I thought you may be able to make use of the attached at some point. It's bare-bones core support for allowing aggregate states to be merged together with another aggregate state. I would imagine that if a query such as: SELECT MAX(value) FROM bigtable; was run, then a series of parallel workers could go off and each find the max value from their portion of the table and then perhaps some other node type would then take all the intermediate results from the workers, once they're finished, and join all of the aggregate states into one and return that. Naturally, you'd need to check that all aggregates used in the targetlist had a merge function first. This is just a few hours of work. I've not really tested the pg_dump support or anything yet. I've also not added any new functions to allow AVG() or COUNT() to work, I've really just re-used existing functions where I could, as things like MAX() and BOOL_OR() can just make use of the existing transition function. I thought that this might be enough for early tests. I'd imagine such a workload, ignoring IO overhead, should scale pretty much linearly with the number of worker processes. Of course, if there was a GROUP BY clause then the merger code would have to perform more work. If you think you might be able to make use of this, then I'm willing to go off and write all the other merge functions required for the other aggregates. 
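The merge idea above can be illustrated with a toy version in C. This is not David's actual patch (which adds merge support at the pg_aggregate level and reuses existing transition functions such as int4larger); the MaxState type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy per-worker transition state for MAX(int) */
typedef struct MaxState
{
    int valid;  /* has at least one value been seen? */
    int max;
} MaxState;

/* Transition function: fold one value into a worker's state */
static void
max_trans(MaxState *s, int value)
{
    if (!s->valid || value > s->max)
    {
        s->max = value;
        s->valid = 1;
    }
}

/* Merge function: fold another worker's partial state into ours */
static void
max_merge(MaxState *dst, const MaxState *src)
{
    if (src->valid)
        max_trans(dst, src->max);
}

/* Simulate SELECT MAX(value) FROM bigtable: each "worker" scans a
 * contiguous chunk, then a gather step merges the partial states. */
static int
parallel_max(const int *vals, size_t n, int nworkers)
{
    MaxState final = {0, 0};

    for (int w = 0; w < nworkers; w++)
    {
        MaxState s = {0, 0};

        for (size_t i = (size_t) w * n / nworkers;
             i < (size_t) (w + 1) * n / nworkers; i++)
            max_trans(&s, vals[i]);
        max_merge(&final, &s);
    }
    assert(final.valid);    /* empty input not handled in this sketch */
    return final.max;
}
```

The key property David relies on is that for aggregates like MAX() and BOOL_OR() the merge function can simply be the existing transition function applied to the other state's value, whereas AVG() and COUNT() need genuinely new merge functions (sum the counts, combine the sums).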
Regards David Rowley merge_aggregate_state_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Fri, Dec 5, 2014 at 8:38 PM, José Luis Tallón wrote: > > On 12/04/2014 07:35 AM, Amit Kapila wrote: >> >> [snip] >> >> The number of worker backends that can be used for >> parallel seq scan can be configured by using a new GUC >> parallel_seqscan_degree, the default value of which is zero >> and it means parallel seq scan will not be considered unless >> user configures this value. > > > The number of parallel workers should be capped (of course!) at the maximum amount of "processors" (cores/vCores, threads/hyperthreads) available. > Also, it should consider MaxConnections configured by user. > More over, when load goes up, the relative cost of parallel working should go up as well. > Something like: > p = number of cores > l = 1min-load > > additional_cost = tuple estimate * cpu_tuple_cost * (l+1)/(c-1) > > (for c>1, of course) > How will you identify load in above formula and what is exactly 'c' (is it parallel workers involved?). For now, I have managed this simply by having a configuration variable and it seems to me that the same should be good enough for first version, we can definitely enhance it in future version by dynamically allocating the number of workers based on their availability and need of query, but I think lets leave that for another day. > >> In ExecutorStart phase, initiate the required number of workers >> as per parallel seq scan plan and setup dynamic shared memory and >> share the information required for worker to execute the scan. >> Currently I have just shared the relId, targetlist and number >> of blocks to be scanned by worker, however I think we might want >> to generate a plan for each of the workers in master backend and >> then share the same to individual worker. > > [snip] >> >> Attached patch is just to facilitate the discussion about the >> parallel seq scan and may be some other dependent tasks like >> sharing of various states like combocid, snapshot with parallel >> workers. 
It is by no means ready to do any complex test, ofcourse >> I will work towards making it more robust both in terms of adding >> more stuff and doing performance optimizations. >> >> Thoughts/Suggestions? > > > Not directly (I haven't had the time to read the code yet), but I'm thinking about the ability to simply *replace* executor methods from an extension. > This could be an alternative to providing additional nodes that the planner can include in the final plan tree, ready to be executed. > > The parallel seq scan nodes are definitively the best approach for "parallel query", since the planner can optimize them based on cost. > I'm wondering about the ability to modify the implementation of some methods themselves once at execution time: given a previously planned query, chances are that, at execution time (I'm specifically thinking about prepared statements here), a different implementation of the same "node" might be more suitable and could be used instead while the condition holds. > Idea sounds interesting and I think probably in some cases different implementation of same node might help, but may be at this stage if we focus on one kind of implementation (which is a win for reasonable number of cases) and make it successful, then doing alternative implementations will be comparatively easier and have more chances of success. > If this latter line of thinking is too off-topic within this thread and there is any interest, we can move the comments to another thread and I'd begin work on a PoC patch. It might as well make sense to implement the executor overloading mechanism alongide the custom plan API, though. > Sure, please go ahead which ever way you like to proceed. If you want to contribute in this area/patch, then you are welcome. > Any comments appreciated. > > > Thank you for your work, Amit Many thanks to you as well for showing interest. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Elusive segfault with 9.3.5 & query cancel
On 12/5/14, 5:49 PM, Josh Berkus wrote: On 12/05/2014 02:41 PM, Jim Nasby wrote: Perhaps we should also officially recommend production servers be setup to create core files. AFAIK the only downside is the time it would take to write a core that's huge because of shared buffers, but perhaps there's some way to avoid writing those? (That means the core won't help if the bug is due to something in a buffer, but that seems unlikely enough that the tradeoff is worth it...) Not practical in a lot of cases. For example, this user was unwilling to enable core dumps on the production replicas because writing out the 16GB of shared buffers they had took over 10 minutes in a test. Which is why I wondered if there's a way to avoid writing out shared buffers... But at least getting a stack trace would be a big start. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Elusive segfault with 9.3.5 & query cancel
On Fri, Dec 5, 2014 at 3:49 PM, Josh Berkus wrote: > to enable core dumps on the production replicas because writing out the > 16GB of shared buffers they had took over 10 minutes in a test. No one ever thinks it'll happen to them anyway - recommending enabling core dumps seems like a waste of time, since as Tom mentioned package managers shouldn't be expected to get on board with that plan. I think a zero overhead backtrace feature from within a SIGSEGV handler (with appropriate precautions around corrupt/exhausted call stacks) using glibc is the right thing here. Indeed, glibc does have infrastructure that can be used to get a backtrace [1], which is probably what we'd end up using, but even POSIX has infrastructure like sigaltstack(). It can be done. [1] https://www.gnu.org/software/libc/manual/html_node/Backtraces.html -- Peter Geoghegan
Re: [HACKERS] Elusive segfault with 9.3.5 & query cancel
On 12/05/2014 02:41 PM, Jim Nasby wrote: > Perhaps we should also officially recommend production servers be setup > to create core files. AFAIK the only downside is the time it would take > to write a core that's huge because of shared buffers, but perhaps > there's some way to avoid writing those? (That means the core won't help > if the bug is due to something in a buffer, but that seems unlikely > enough that the tradeoff is worth it...) Not practical in a lot of cases. For example, this user was unwilling to enable core dumps on the production replicas because writing out the 16GB of shared buffers they had took over 10 minutes in a test. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Elusive segfault with 9.3.5 & query cancel
Peter Geoghegan writes: > On Fri, Dec 5, 2014 at 2:41 PM, Jim Nasby wrote: >> Perhaps we should also officially recommend production servers be setup to >> create core files. AFAIK the only downside is the time it would take to >> write a core that's huge because of shared buffers > I don't think that's ever going to be practical. I'm fairly sure that on some distros (Red Hat, at least) there is distro policy against having daemons produce core dumps by default, for multiple reasons including possible disk space consumption and leakage of secure information. So even if we recommended this, the recommendation would be overridden by some/many packagers. There is much to be said though for trying to emit at least a minimal stack trace into the postmaster log file. I'm pretty sure glibc has a function for that; dunno if it's going to be practical on other platforms. regards, tom lane
Re: [HACKERS] Elusive segfault with 9.3.5 & query cancel
On Fri, Dec 5, 2014 at 2:41 PM, Jim Nasby wrote: > Perhaps we should also officially recommend production servers be setup to > create core files. AFAIK the only downside is the time it would take to > write a core that's huge because of shared buffers I don't think that's ever going to be practical. -- Peter Geoghegan
Re: [HACKERS] Elusive segfault with 9.3.5 & query cancel
On 12/5/14, 4:11 PM, Peter Geoghegan wrote: On Fri, Dec 5, 2014 at 1:29 PM, Josh Berkus wrote: We made some changes which decreased query cancel (optimizing queries, turning on hot_standby_feedback) and we haven't seen a segfault since then. As far as the user is concerned, this solves the problem, so I'm never going to get a trace or a core dump file. Forgot a major piece of evidence as to why I think this is related to query cancel: in each case, the segfault was preceded by a multi-backend query cancel 3ms to 30ms beforehand. It is possible that the backend running the query which segfaulted might have been the only backend *not* cancelled due to query conflict concurrently. Contradicting this, there are other multi-backend query cancels in the logs which do NOT produce a segfault. I wonder if it would be useful to add additional instrumentation so that even without a core dump, there was some cursory information about the nature of a segfault. Yes, doing something with a SIGSEGV handler is very scary, and there are major portability concerns (e.g. https://bugs.ruby-lang.org/issues/9654), but I believe it can be made robust on Linux. For what it's worth, this open source project offers that kind of functionality in the form of a library: https://github.com/vmarkovtsev/DeathHandler Perhaps we should also officially recommend production servers be setup to create core files. AFAIK the only downside is the time it would take to write a core that's huge because of shared buffers, but perhaps there's some way to avoid writing those? (That means the core won't help if the bug is due to something in a buffer, but that seems unlikely enough that the tradeoff is worth it...) -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Role Attribute Bitmask Catalog Representation
Stephen, Thanks for the feedback. > > diff --git a/src/backend/access/transam/xlogfuncs.c > b/src/backend/access/transam/xlogfuncs.c > > --- 56,62 > > > > backupidstr = text_to_cstring(backupid); > > > > ! if (!superuser() && !check_role_attribute(GetUserId(), > ROLE_ATTR_REPLICATION)) > > ereport(ERROR, > > (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > > errmsg("must be superuser or replication role to run a > backup"))); > > The point of has_role_attribute() was to avoid the need to explicitly > say "!superuser()" everywhere. The idea with check_role_attribute() is > that we want to present the user (in places like pg_roles) with the > values that are *actually* set. > > In other words, the above should just be: > > if (!has_role_attribute(GetUserId(), ROLE_ATTR_REPLICATION)) > Yes, I understand. My original thought here was that I was replacing 'has_rolreplication' which didn't perform any superuser check and that I was trying to adhere to minimal changes. But I agree this would be the appropriate solution. Fixed. > > > + /* > > + * check_role_attribute > > + * Check if the role with the specified id has been assigned a > specific > > + * role attribute. This function does not allow any superuser > bypass. > > I don't think we need to say that it doesn't "allow" superuser bypass. > Instead, I'd comment that has_role_attribute() should be used for > permissions checks while check_role_attribute is for checking what the > value in the rolattr bitmap is and isn't for doing permissions checks > directly. > Ok. Understood. Fixed. > diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c > > *** pg_role_aclcheck(Oid role_oid, Oid rolei > > *** 4602,4607 > > --- 4603,4773 > > return ACLCHECK_NO_PRIV; > > } > > > > + /* > > + * pg_has_role_attribute_id_attr > > I'm trying to figure out what the point of the trailing "_attr" in the > function name is..? Doesn't seem necessary to have that for these > functions and it'd look a bit cleaner without it, imv. 
> So, I was trying to follow what I perceived as the following convention for these functions: pg. So, what "_attr" represents is the second argument of the function which is the attribute to check. I could agree that might be unnecessary, but that was my thought process on it. At any rate, I've removed it. > ! #define ROLE_ATTR_ALL 255 /* or (1 << N_ROLE_ATTRIBUTES) - 1 */ > > I'd say "equals" or something rather than "or" since that kind of > implies that it's an alternative, but we can't do that as discussed in > the comment (which I like). > Agreed. Fixed. > > ! /* role attribute check routines */ > > ! extern bool has_role_attribute(Oid roleid, RoleAttr attribute); > > ! extern bool check_role_attribute(Oid roleid, RoleAttr attribute); > > With all the 'has_role_attribute(GetUserId(), ROLEATTR_BLAH)' cases, I'd > suggest doing the same as 'superuser()' and also provide a simpler > version: 'have_role_attribute(ROLEATTR_BLAH)' which handles doing the > GetUserId() itself. That'd simplify quite a few of the above calls. > Good point. Added. Attached is an updated patch. -Adam -- Adam Brightwell - adam.brightw...@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c new file mode 100644 index 133143d..8475985 *** a/src/backend/access/transam/xlogfuncs.c --- b/src/backend/access/transam/xlogfuncs.c *** *** 22,32 --- 22,34 #include "access/xlog_internal.h" #include "access/xlogutils.h" #include "catalog/catalog.h" + #include "catalog/pg_authid.h" #include "catalog/pg_type.h" #include "funcapi.h" #include "miscadmin.h" #include "replication/walreceiver.h" #include "storage/smgr.h" + #include "utils/acl.h" #include "utils/builtins.h" #include "utils/numeric.h" #include "utils/guc.h" *** pg_start_backup(PG_FUNCTION_ARGS) *** 54,60 backupidstr = text_to_cstring(backupid); ! 
if (!superuser() && !has_rolreplication(GetUserId())) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), errmsg("must be superuser or replication role to run a backup"))); --- 56,62 backupidstr = text_to_cstring(backupid); ! if (!have_role_attribute(ROLE_ATTR_REPLICATION)) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), errmsg("must be superuser or replication role to run a backup"))); *** pg_stop_backup(PG_FUNCTION_ARGS) *** 82,88 { XLogRecPtr stoppoint; ! if (!superuser() && !has_rolreplication(GetUserId())) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), (errmsg("must be superuser or replication role to run a backup"; --- 84,90 { XLogRecPtr stoppoint; ! if (!have_role_attribute(ROLE_ATTR_REPLICATION)) ereport(ERR
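To keep the two checks under discussion straight, here is a toy model of their semantics. This is not the patch's actual code — the real functions read the rolattr bitmap from pg_authid and take a role OID plus GetUserId(); the RoleAttr typedef and constants below are stand-ins.

```c
#include <assert.h>
#include <stdint.h>

/* Toy mirror of the bitmask scheme: one bit per role attribute */
typedef uint64_t RoleAttr;

#define ROLE_ATTR_SUPERUSER   (1 << 0)
#define ROLE_ATTR_REPLICATION (1 << 1)
#define N_ROLE_ATTRIBUTES     8
#define ROLE_ATTR_ALL         255   /* equals (1 << N_ROLE_ATTRIBUTES) - 1 */

/* check_role_attribute(): raw bitmap test — reports what is *actually*
 * set, with no superuser bypass; this is what pg_roles should display. */
static int
check_role_attribute(RoleAttr rolattr, RoleAttr attribute)
{
    return (rolattr & attribute) != 0;
}

/* has_role_attribute(): permissions check — superuser implies every
 * attribute, so callers need not write "!superuser() && ..." anymore. */
static int
has_role_attribute(RoleAttr rolattr, RoleAttr attribute)
{
    return check_role_attribute(rolattr, ROLE_ATTR_SUPERUSER) ||
           check_role_attribute(rolattr, attribute);
}
```

This illustrates why the review insists on the split: a superuser passes the has_role_attribute() permissions check for REPLICATION without the REPLICATION bit ever being set in the bitmap, so a catalog view built on check_role_attribute() still shows the true stored value.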
Re: [HACKERS] Elusive segfault with 9.3.5 & query cancel
On Fri, Dec 5, 2014 at 1:29 PM, Josh Berkus wrote: >> We made some changes which decreased query cancel (optimizing queries, >> turning on hot_standby_feedback) and we haven't seen a segfault since >> then. As far as the user is concerned, this solves the problem, so I'm >> never going to get a trace or a core dump file. > > Forgot a major piece of evidence as to why I think this is related to > query cancel: in each case, the segfault was preceded by a > multi-backend query cancel 3ms to 30ms beforehand. It is possible that > the backend running the query which segfaulted might have been the only > backend *not* cancelled due to query conflict concurrently. > Contradicting this, there are other multi-backend query cancels in the > logs which do NOT produce a segfault. I wonder if it would be useful to add additional instrumentation so that even without a core dump, there was some cursory information about the nature of a segfault. Yes, doing something with a SIGSEGV handler is very scary, and there are major portability concerns (e.g. https://bugs.ruby-lang.org/issues/9654), but I believe it can be made robust on Linux. For what it's worth, this open source project offers that kind of functionality in the form of a library: https://github.com/vmarkovtsev/DeathHandler -- Peter Geoghegan
Re: [HACKERS] Elusive segfault with 9.3.5 & query cancel
On 12/05/2014 12:54 PM, Josh Berkus wrote: > Hackers, > > This is not a complete enough report for a diagnosis. I'm posting it > here just in case someone else sees something like it, and having an > additional report will help figure out the underlying issue. > > * 700GB database with around 5,000 writes per second > * 8 replicas handling around 10,000 read queries per second each > * replicas are slammed (40-70% utilization) > * replication produces lots of replication query cancels > > In this scenario, a specific query against some of the less busy and > fairly small tables would produce a segfault (signal 11) once every 1-4 > days randomly. This query could have 100's of successful runs for every > segfault. This was not reproduceable manually, and the segfaults never > happened on the master. Nor did we ever see a segfault based on any > other query, including against the tables which were generally the > source of the query cancels. > > In case it's relevant, the query included use of regexp_split_to_array() > and ORDER BY random(), neither of which are generally used in the user's > other queries. > > We made some changes which decreased query cancel (optimizing queries, > turning on hot_standby_feedback) and we haven't seen a segfault since > then. As far as the user is concerned, this solves the problem, so I'm > never going to get a trace or a core dump file. Forgot a major piece of evidence as to why I think this is related to query cancel: in each case, the segfault was preceded by a multi-backend query cancel 3ms to 30ms beforehand. It is possible that the backend running the query which segfaulted might have been the only backend *not* cancelled due to query conflict concurrently. Contradicting this, there are other multi-backend query cancels in the logs which do NOT produce a segfault. -- Josh Berkus PostgreSQL Experts Inc. 
http://pgexperts.com
[HACKERS] Elusive segfault with 9.3.5 & query cancel
Hackers, This is not a complete enough report for a diagnosis. I'm posting it here just in case someone else sees something like it, and having an additional report will help figure out the underlying issue. * 700GB database with around 5,000 writes per second * 8 replicas handling around 10,000 read queries per second each * replicas are slammed (40-70% utilization) * replication produces lots of replication query cancels In this scenario, a specific query against some of the less busy and fairly small tables would produce a segfault (signal 11) once every 1-4 days randomly. This query could have 100's of successful runs for every segfault. This was not reproduceable manually, and the segfaults never happened on the master. Nor did we ever see a segfault based on any other query, including against the tables which were generally the source of the query cancels. In case it's relevant, the query included use of regexp_split_to_array() and ORDER BY random(), neither of which are generally used in the user's other queries. We made some changes which decreased query cancel (optimizing queries, turning on hot_standby_feedback) and we haven't seen a segfault since then. As far as the user is concerned, this solves the problem, so I'm never going to get a trace or a core dump file. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Testing DDL deparsing support
On Fri, Dec 5, 2014 at 04:10:12PM -0300, Alvaro Herrera wrote: > Well, ALTER TABLE is special: you can give several subcommands, and each > subcommand can be one of a rather long list of possible subcommands. > Testing every combination would mean a combinatorial explosion, which > would indeed be too large. But surely we want a small bunch of tests to > prove that having several subcommands works fine, and also at least one > test for every possible subcommand. > > > We have rejected simple regression test additions on the basis that > > the syntax works and is unlikely to break once tested once by the > > developer. > > This rationale doesn't sound so good to me. Something might work fine > the minute it is committed, but someone else might break it > inadvertently later; this has actually happened. Having no tests at all > for a feature isn't good. > > I know we have recently rejected patches that added tests only to > improve the coverage percent, for instance in CREATE DATABASE, because > the runtime of the tests got too large. Are there other examples of > rejected tests? Yes, there are many cases we have added options or keywords but didn't add a regression test. > > > > and it could easily bloat the regression tests over time. > > > > > > We had 103 regression tests in 8.2 and we have 145 in 9.4. Does this > > > qualify as bloat? > > > > No, that seems fine. I am worried about having to have a test for every > > syntax change, which we currently don't do? Was that issue not clear in > > my first email? > > Well, if with "every syntax change" you mean "every feature addition", > then I think we should have at least one test for each, yes. It's not > like we add new syntax every day anyway. Well, my point is that this is a new behavior we have to do, at least to test the logical DDL behavior --- I suppose it could be removed after testing. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. 
+
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
On Fri, Dec 5, 2014 at 12:33 PM, Robert Haas wrote: >> Well, if an alias is used, and you refer to an attribute using a >> non-alias name (i.e. the original table name), then you'll already get >> an error suggesting that the alias be used instead -- of course, >> that's nothing new. It doesn't matter to the existing hinting >> mechanism if the attribute name is otherwise wrong. Once you fix the >> code to use the alias suggested, you'll then get this new >> Levenshtein-based hint. > > In that case, I think I favor giving no hint at all when the RTE name > is specified but doesn't match exactly. I don't follow. The existing mechanism only concerns what to do when the original table name was used when an alias should have been used instead. What does that have to do with this patch? -- Peter Geoghegan
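[Editor's note: the "Levenshtein-based hint" discussed above picks the candidate column whose name has the smallest edit distance from the misspelled one. As a point of reference, here is a minimal sketch of the classic dynamic-programming edit distance in C; it is not PostgreSQL's actual varstr_levenshtein() (which additionally supports custom insert/delete/substitute costs and a max-distance cutoff).]

```c
#include <stdlib.h>
#include <string.h>

/*
 * Classic O(m*n) Levenshtein edit distance over byte strings.
 * A simplified illustration only; PostgreSQL's varstr_levenshtein()
 * is more general (configurable costs, early-exit distance cap).
 */
static int
levenshtein(const char *s, const char *t)
{
	int			m = (int) strlen(s);
	int			n = (int) strlen(t);
	int		   *prev = malloc((n + 1) * sizeof(int));
	int		   *curr = malloc((n + 1) * sizeof(int));
	int			i, j, result;

	/* Distance from the empty prefix of s to each prefix of t. */
	for (j = 0; j <= n; j++)
		prev[j] = j;

	for (i = 1; i <= m; i++)
	{
		curr[0] = i;
		for (j = 1; j <= n; j++)
		{
			int			del = prev[j] + 1;
			int			ins = curr[j - 1] + 1;
			int			sub = prev[j - 1] + (s[i - 1] != t[j - 1]);
			int			best = (del < ins) ? del : ins;

			curr[j] = (sub < best) ? sub : best;
		}
		memcpy(prev, curr, (n + 1) * sizeof(int));
	}
	result = prev[n];
	free(prev);
	free(curr);
	return result;
}
```

A hint mechanism would compute this distance between the unknown attribute name and each visible column, suggesting the closest match below some threshold.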
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
On Wed, Dec 3, 2014 at 9:21 PM, Peter Geoghegan wrote: > On Tue, Dec 2, 2014 at 1:11 PM, Robert Haas wrote: >> Basically, the case in which I think it's helpful to issue a >> suggestion here is when the user has used the table name rather than >> the alias name. I wonder if it's worth checking for that case >> specifically, in lieu of what you've done here, and issuing a totally >> different hint in that case ("HINT: You must refer to this as column >> as "prime_minister.id" rather than "cameron.id"). > > Well, if an alias is used, and you refer to an attribute using a > non-alias name (i.e. the original table name), then you'll already get > an error suggesting that the alias be used instead -- of course, > that's nothing new. It doesn't matter to the existing hinting > mechanism if the attribute name is otherwise wrong. Once you fix the > code to use the alias suggested, you'll then get this new > Levenshtein-based hint. In that case, I think I favor giving no hint at all when the RTE name is specified but doesn't match exactly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] On partitioning
On Fri, Dec 5, 2014 at 3:05 PM, Jim Nasby wrote: >> On what basis do you expect that? Every time you use a view, you're >> using a pg_node_tree. Nobody's ever complained that having to reload >> the pg_node_tree column was too slow, and I see no reason to suppose >> that things would be any different here. >> >> I mean, we can certainly invent something new if there is a reason to >> do so. But you (and a few other people) seem to be trying pretty hard >> to avoid using the massive amount of infrastructure that we already >> have to do almost this exact thing, which puzzles the heck out of me. > > My concern is how to do the routing of incoming tuples. I'm assuming it'd be > significantly faster to compare two tuples than to run each tuple through a > bunch of nodetrees. As I said before, that's a completely unrelated problem. To quickly route tuples for range or list partitioning, you're going to want to have an array of Datums in memory and bsearch it. That says nothing about how they should be stored on disk. Whatever the on-disk representation looks like, the relcache is going to need to reassemble it into an array that can be binary-searched. As long as that's not hard to do - and none of the proposals here would make it hard to do - there's no reason to care about this from that point of view. At least, not that I can see. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
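[Editor's note: the in-memory routing Robert describes can be sketched independently of the on-disk format. The example below is hypothetical, not PostgreSQL code: int64 keys stand in for Datums (a real implementation would compare Datums with the partition key's btree support function), and the function name is invented. It binary-searches a sorted array of range lower bounds for the greatest bound not exceeding the key.]

```c
#include <stdint.h>

/*
 * Hypothetical sketch: route a key to a range partition by binary-
 * searching a sorted array of partition lower bounds.  Returns the
 * index of the partition whose lower bound is the greatest one <= key,
 * or -1 if the key falls below the first bound.
 */
static int
route_to_partition(int64_t key, const int64_t *lower_bounds, int nparts)
{
	int			lo = 0;
	int			hi = nparts - 1;
	int			result = -1;

	while (lo <= hi)
	{
		int			mid = lo + (hi - lo) / 2;

		if (lower_bounds[mid] <= key)
		{
			result = mid;		/* candidate; a later bound may also fit */
			lo = mid + 1;
		}
		else
			hi = mid - 1;
	}
	return result;
}
```

The point of the sketch is that only this array needs to be fast to search; however the bounds are serialized in the catalog (pg_node_tree or otherwise), the relcache can rebuild the sorted array once and reuse it.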
Re: [HACKERS] On partitioning
On 12/5/14, 2:02 PM, Robert Haas wrote: On Fri, Dec 5, 2014 at 2:52 PM, Jim Nasby wrote: The other option would be to use some custom rowtype to store boundary values and have a method that can form a boundary tuple from a real one. Either way, I suspect this is better than frequently evaluating pg_node_trees. On what basis do you expect that? Every time you use a view, you're using a pg_node_tree. Nobody's ever complained that having to reload the pg_node_tree column was too slow, and I see no reason to suppose that things would be any different here. I mean, we can certainly invent something new if there is a reason to do so. But you (and a few other people) seem to be trying pretty hard to avoid using the massive amount of infrastructure that we already have to do almost this exact thing, which puzzles the heck out of me. My concern is how to do the routing of incoming tuples. I'm assuming it'd be significantly faster to compare two tuples than to run each tuple through a bunch of nodetrees. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Fri, Dec 5, 2014 at 1:01 AM, Anssi Kääriäinen wrote: > If Django is going to use the INSERT ... ON CONFLICT UPDATE variant in > Django for the existing save() method, then it needs to know if the > result was an UPDATE or INSERT. If we are going to use this for other > operations (for example bulk merge of rows to the database), it would be > very convenient to have per-row updated/created information available so > that we can fire the post_save signals for the rows. If we don't have > that information available, it means we can't fire signals, and no > signals means we can't use the bulk merge operation internally as we > have to fire the signals where that happened before. > > Outside of Django there are likely similar reasons to want to know if > the result of an operation was a creation of a new row. The reason could > be creation of related row, doing some action in application layer, or > just UI message telling "object created successfully" vs "object updated > successfully". It probably isn't ideal, but you'd at least be able to do something with row level triggers in the absence of a standard way of directly telling if an insert or update was performed. -- Peter Geoghegan
Re: [HACKERS] On partitioning
On Fri, Dec 5, 2014 at 2:52 PM, Jim Nasby wrote: > The other option would be to use some custom rowtype to store boundary > values and have a method that can form a boundary tuple from a real one. > Either way, I suspect this is better than frequently evaluating > pg_node_trees. On what basis do you expect that? Every time you use a view, you're using a pg_node_tree. Nobody's ever complained that having to reload the pg_node_tree column was too slow, and I see no reason to suppose that things would be any different here. I mean, we can certainly invent something new if there is a reason to do so. But you (and a few other people) seem to be trying pretty hard to avoid using the massive amount of infrastructure that we already have to do almost this exact thing, which puzzles the heck out of me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] On partitioning
On 12/5/14, 1:22 PM, Jim Nasby wrote: On 12/5/14, 3:42 AM, Amit Langote wrote: > I think you are right. I think in this case we need something similar >to column pg_index.indexprs which is of type pg_node_tree(which >seems to be already suggested by Robert). So may be we can proceed >with this type and see if any one else has better idea. One point raised about/against pg_node_tree was the values represented therein would turn out to be too generalized to be used with advantage during planning. But, it seems we could deserialize it in advance back to the internal form (like an array of a struct) as part of the cached relation data. This overhead would only be incurred in case of partitioned tables. Perhaps this is what Robert suggested elsewhere. In order to store a composite type in a catalog, we would need to have one field that has the typid of the composite, and the field that stores the actual composite data would need to be a "dumb" varlena that stores the composite HeapTupleHeader. On further thought; if we disallow NULL as a partition boundary, we don't need a separate rowtype; we could just use the one associated with the relation itself. Presumably that would make comparing tuples to the relation list a lot easier. I was hung up on how that would work in the case of ALTER TABLE, but we'd have the same problem with using pg_node_tree: if you alter a table in such a way that *might* affect your partitioning, you have to do some kind of revalidation anyway. The other option would be to use some custom rowtype to store boundary values and have a method that can form a boundary tuple from a real one. Either way, I suspect this is better than frequently evaluating pg_node_trees. There may be one other option. If range partitions are defined in terms of an expression that is different for every partition (ie: (substr(product_key, 1, 4), date_trunc('month', sales_date))) then we could use a hash of that expression to identify a partition. 
In other words, range partitioning becomes a special case of hash partitioning. I do think we need a programmatic means to identify the range of an individual partition and hash won't solve that, but the performance of that case isn't critical so we could use pretty much whatever we wanted to there. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] On partitioning
On 12/5/14, 3:42 AM, Amit Langote wrote: > I think you are right. I think in this case we need something similar > to column pg_index.indexprs which is of type pg_node_tree (which > seems to be already suggested by Robert). So may be we can proceed > with this type and see if any one else has better idea. One point raised about/against pg_node_tree was the values represented therein would turn out to be too generalized to be used with advantage during planning. But, it seems we could deserialize it in advance back to the internal form (like an array of a struct) as part of the cached relation data. This overhead would only be incurred in case of partitioned tables. Perhaps this is what Robert suggested elsewhere. In order to store a composite type in a catalog, we would need to have one field that has the typid of the composite, and the field that stores the actual composite data would need to be a "dumb" varlena that stores the composite HeapTupleHeader. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Testing DDL deparsing support
Bruce Momjian wrote: > On Fri, Dec 5, 2014 at 09:29:59AM -0300, Alvaro Herrera wrote: > > Bruce Momjian wrote: > > > On Fri, Nov 28, 2014 at 01:43:36PM +0900, Ian Barwick wrote: > > > > Standard regression tests are helpful because patch authors include new > > > > test > > > > cases in the patches that stress their new options or commands. This > > > > new test > > > > framework needs to be something that internally runs the regression > > > > tests and > > > > exercises DDL deparsing as the regression tests execute DDL. That > > > > would mean > > > > that new commands and options would automatically be deparse-tested by > > > > our new > > > > framework as soon as patch authors add standard regress support. > > > > > > Are you saying every time a new option is added to a command that a new > > > regression test needs to be added? > > > > Not necessarily -- an existing test could be modified, as well. > > > > > We don't normally do that, > > > > I sure hope we do have all options covered by tests. > > Are you saying that every combination of ALTER options is tested? Well, ALTER TABLE is special: you can give several subcommands, and each subcommand can be one of a rather long list of possible subcommands. Testing every combination would mean a combinatorial explosion, which would indeed be too large. But surely we want a small bunch of tests to prove that having several subcommands works fine, and also at least one test for every possible subcommand. > We have rejected simple regression test additions on the basis that > the syntax works and is unlikely to break once tested once by the > developer. This rationale doesn't sound so good to me. Something might work fine the minute it is committed, but someone else might break it inadvertently later; this has actually happened. Having no tests at all for a feature isn't good. 
I know we have recently rejected patches that added tests only to improve the coverage percent, for instance in CREATE DATABASE, because the runtime of the tests got too large. Are there other examples of rejected tests? > > > and it could easily bloat the regression tests over time. > > > > We had 103 regression tests in 8.2 and we have 145 in 9.4. Does this > > qualify as bloat? > > No, that seems fine. I am worried about having to have a test for every > syntax change, which we currently don't do? Was that issue not clear in > my first email? Well, if with "every syntax change" you mean "every feature addition", then I think we should have at least one test for each, yes. It's not like we add new syntax every day anyway. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Testing DDL deparsing support
On Fri, Dec 5, 2014 at 09:29:59AM -0300, Alvaro Herrera wrote: > Bruce Momjian wrote: > > On Fri, Nov 28, 2014 at 01:43:36PM +0900, Ian Barwick wrote: > > > Standard regression tests are helpful because patch authors include new > > > test > > > cases in the patches that stress their new options or commands. This new > > > test > > > framework needs to be something that internally runs the regression tests > > > and > > > exercises DDL deparsing as the regression tests execute DDL. That would > > > mean > > > that new commands and options would automatically be deparse-tested by > > > our new > > > framework as soon as patch authors add standard regress support. > > > > Are you saying every time a new option is added to a command that a new > > regression test needs to be added? > > Not necessarily -- an existing test could be modified, as well. > > > We don't normally do that, > > I sure hope we do have all options covered by tests. Are you saying that every combination of ALTER options is tested? We have rejected simple regression test additions on the basis that the syntax works and is unlikely to break once tested once by the developer. > > and it could easily bloat the regression tests over time. > > We had 103 regression tests in 8.2 and we have 145 in 9.4. Does this > qualify as bloat? No, that seems fine. I am worried about having to have a test for every syntax change, which we currently don't do? Was that issue not clear in my first email? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Parallel Seq Scan
On 12/5/14, 9:08 AM, José Luis Tallón wrote: Moreover, when load goes up, the relative cost of parallel working should go up as well. Something like: p = number of cores l = 1min-load additional_cost = tuple estimate * cpu_tuple_cost * (l+1)/(p-1) (for p>1, of course) ... The parallel seq scan nodes are definitely the best approach for "parallel query", since the planner can optimize them based on cost. I'm wondering about the ability to modify the implementation of some methods themselves once at execution time: given a previously planned query, chances are that, at execution time (I'm specifically thinking about prepared statements here), a different implementation of the same "node" might be more suitable and could be used instead while the condition holds. These comments got me wondering... would it be better to decide on parallelism during execution instead of at plan time? That would allow us to dynamically scale parallelism based on system load. If we don't even consider parallelism until we've pulled some number of tuples/pages from a relation, this would also eliminate all parallel overhead on small relations. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
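[Editor's note: José's proposed load-aware penalty is simple enough to state as a helper function. The sketch below is purely illustrative — the function name and the use of double throughout are assumptions, not anything from PostgreSQL's costing code — but it makes the behavior concrete: the penalty vanishes on a single core, grows with the 1-minute load average, and shrinks as cores are added.]

```c
/*
 * Illustrative sketch of a load-aware penalty for parallel plans:
 *   additional_cost = tuple_estimate * cpu_tuple_cost * (load + 1) / (ncores - 1)
 * The more loaded the machine, the more expensive parallelism looks.
 * Hypothetical helper, not actual PostgreSQL costing code.
 */
static double
parallel_load_penalty(double tuple_estimate, double cpu_tuple_cost,
					  int ncores, double load_1min)
{
	/* The formula is only defined for more than one core. */
	if (ncores <= 1)
		return 0.0;
	return tuple_estimate * cpu_tuple_cost * (load_1min + 1.0) / (ncores - 1.0);
}
```

For example, with 1000 estimated tuples, cpu_tuple_cost of 0.01, and 4 cores, an idle machine (load 0) yields a penalty of about 3.3, while a load of 3 quadruples it — which is exactly the "make parallelism look worse under load" behavior being proposed.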
Re: [HACKERS] Role Attribute Bitmask Catalog Representation
Adam, * Adam Brightwell (adam.brightw...@crunchydatasolutions.com) wrote: > I have attached an updated patch that incorporates the feedback and > recommendations provided. Thanks. Comments below. > diff --git a/src/backend/access/transam/xlogfuncs.c > b/src/backend/access/transam/xlogfuncs.c > --- 56,62 > > backupidstr = text_to_cstring(backupid); > > ! if (!superuser() && !check_role_attribute(GetUserId(), > ROLE_ATTR_REPLICATION)) > ereport(ERROR, > (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > errmsg("must be superuser or replication role to run a > backup"))); The point of has_role_attribute() was to avoid the need to explicitly say "!superuser()" everywhere. The idea with check_role_attribute() is that we want to present the user (in places like pg_roles) with the values that are *actually* set. In other words, the above should just be: if (!has_role_attribute(GetUserId(), ROLE_ATTR_REPLICATION)) > --- 84,90 > { > XLogRecPtr stoppoint; > > ! if (!superuser() && !check_role_attribute(GetUserId(), > ROLE_ATTR_REPLICATION)) > ereport(ERROR, > (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), >(errmsg("must be superuser or replication role to run a > backup"; Ditto here. > diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c > --- 5031,5081 > } > > /* > ! * has_role_attribute > ! * Check if the role has the specified role has a specific role attribute. > ! * This function will always return true for roles with superuser > privileges > ! * unless the attribute being checked is CATUPDATE. >* > ! * roleid - the oid of the role to check. > ! * attribute - the attribute to check. >*/ > bool > ! has_role_attribute(Oid roleid, RoleAttr attribute) > { > ! /* > ! * Superusers bypass all permission checking except in the case of > CATUPDATE. > ! */ > ! if (!(attribute & ROLE_ATTR_CATUPDATE) && superuser_arg(roleid)) > return true; > > ! 
return check_role_attribute(roleid, attribute); > } > > + /* > + * check_role_attribute > + * Check if the role with the specified id has been assigned a specific > + * role attribute. This function does not allow any superuser bypass. I don't think we need to say that it doesn't "allow" superuser bypass. Instead, I'd comment that has_role_attribute() should be used for permissions checks while check_role_attribute is for checking what the value in the rolattr bitmap is and isn't for doing permissions checks directly. > diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c > *** CreateRole(CreateRoleStmt *stmt) > *** 82,93 > boolencrypt_password = Password_encryption; /* encrypt > password? */ > charencrypted_password[MD5_PASSWD_LEN + 1]; > boolissuper = false;/* Make the user a superuser? */ > ! boolinherit = true; /* Auto inherit privileges? */ > boolcreaterole = false; /* Can this user create > roles? */ > boolcreatedb = false; /* Can the user create > databases? */ > boolcanlogin = false; /* Can this user login? > */ > boolisreplication = false; /* Is this a replication role? > */ > boolbypassrls = false; /* Is this a row > security enabled role? */ > int connlimit = -1; /* maximum connections allowed > */ > List *addroleto = NIL;/* roles to make this a member of */ > List *rolemembers = NIL; /* roles to be members of this > role */ > --- 74,86 > boolencrypt_password = Password_encryption; /* encrypt > password? */ > charencrypted_password[MD5_PASSWD_LEN + 1]; > boolissuper = false;/* Make the user a superuser? */ > ! boolinherit = true; /* Auto inherit privileges? */ > boolcreaterole = false; /* Can this user create > roles? */ > boolcreatedb = false; /* Can the user create > databases? */ > boolcanlogin = false; /* Can this user login? > */ > boolisreplication = false; /* Is this a replication role? > */ > boolbypassrls = false; /* Is this a row > security enabled role? 
*/ > + RoleAttrattributes = ROLE_ATTR_NONE;/* role attributes, > initialized to none. */ > int connlimit = -1; /* maximum connections allowed > */ > List *addroleto = NIL;/* roles to make this a member of */ > List *rolemembers = NIL; /* roles to be members of this > role */ Please clean up the wh
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Fri, Dec 5, 2014 at 1:07 PM, Peter Geoghegan wrote: > Agreed. Importantly, we won't have painted ourselves into a corner > where we cannot add it later, now that RETURNING projects updated > tuples, too (V1.5 established that). Yeah, it seems fine to postpone that to a later version, as long as we haven't painted ourselves into a corner. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Fri, Dec 5, 2014 at 10:00 AM, Josh Berkus wrote: > I thought the point of INSERT ... ON CONFLICT update was so that you > didn't have to care if it was a new row or not? > > If you do care, it seems like it makes more sense to do your own INSERTs > and UPDATEs, as Django currently does. > > I wouldn't be *opposed* to having a pseudocolumn in the RETURNed stuff > which let me know updated|inserted|ignored, but I also don't see it as a > feature requirement for 9.5. Agreed. Importantly, we won't have painted ourselves into a corner where we cannot add it later, now that RETURNING projects updated tuples, too (V1.5 established that). I'm pretty confident that it would be a mistake to try and make the inner "excluded" and "target" aliases visible, in any case, because of the annoying ambiguity it creates for the common, simple cases. I think it has to be a magic function, or something like that. I really hoped we'd be having a more technical, implementation level discussion at this point. Simon rightly emphasized the importance of getting the semantics right, and the importance of discussing that up front, but I'm concerned; that's almost all that has been discussed here so far, which is surely not balanced either. -- Peter Geoghegan
Re: [HACKERS] GSSAPI, SSPI - include_realm default
* Magnus Hagander (mag...@hagander.net) wrote: > On Wed, Nov 26, 2014 at 8:01 PM, Stephen Frost wrote: > > As such, I'd like to propose changing the default to be > > 'include_realm=1'. > > Per our previous discussions, but to make sure it's also on record for > others, +1 for this suggestion. Patch attached which does this for master. > > This would be done for 9.5 and we would need to note it in the release > notes, of course. > > I suggest we also backpatch some documentation suggesting that people > manually change the include_realm parameter (perhaps also with a note > saying that the default will change in 9.5). I'll work on a patch for back-branches if everyone is alright with this patch against master. Given my recent track record for changing wording around, it seems prudent to get agreement on this first. Thanks, Stephen From fe4d7a01f5d31fac565a8de2485cd9d113a9cbb4 Mon Sep 17 00:00:00 2001 From: Stephen Frost Date: Fri, 5 Dec 2014 12:54:05 -0500 Subject: [PATCH] Change default for include_realm to 1 The default behavior for GSS and SSPI authentication methods has long been to strip the realm off of the principal, however, this is not a secure approach in multi-realm environments and the use-case for the parameter at all has been superseded by the regex-based mapping support available in pg_ident.conf. Change the default for include_realm to be '1', meaning that we do NOT remove the realm from the principal by default. Any installations which depend on the existing behavior will need to update their configurations (ideally by leaving include_realm on and adding a mapping in pg_ident.conf). Note that the mapping capability exists in all currently supported versions of PostgreSQL and so this change can be done today. Barring that, existing users can update their configurations today to explicitly set include_realm=0 to ensure that the prior behavior is maintained when they upgrade. This needs to be noted in the release notes. 
Per discussion with Magnus and Peter. --- doc/src/sgml/client-auth.sgml | 66 --- src/backend/libpq/hba.c | 13 + 2 files changed, 57 insertions(+), 22 deletions(-) diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml index 7704f73..69517dd 100644 --- a/doc/src/sgml/client-auth.sgml +++ b/doc/src/sgml/client-auth.sgml @@ -943,15 +943,24 @@ omicron bryanh guest1 -Client principals must have their PostgreSQL database user -name as their first component, for example -pgusername@realm. Alternatively, you can use a user name -mapping to map from the first component of the principal name to the -database user name. By default, the realm of the client is -not checked by PostgreSQL. If you have cross-realm -authentication enabled and need to verify the realm, use the -krb_realm parameter, or enable include_realm -and use user name mapping to check the realm. +Client principals can be mapped to different PostgreSQL +database user names with pg_ident.conf. For example, +pgusername@realm could be mapped to just pgusername. +Alternatively, you can use the full username@realm principal as +the role name in PostgreSQL without any mapping. + + + +PostgreSQL also supports a parameter to strip the realm from +the principal. This method is supported for backwards compatibility and is +strongly discouraged as it is then impossible to distinguish different users +with the same username but coming from different realms. To enable this, +set include_realm to zero. For simple single-realm +installations, include_realm combined with the +krb_realm parameter (which checks that the realm provided +matches exactly what is in the krb_realm parameter) would be a secure but +less capable option compared to specifying an explicit mapping in +pg_ident.conf. @@ -993,10 +1002,13 @@ omicron bryanh guest1 include_realm -If set to 1, the realm name from the authenticated user -principal is included in the system user name that's passed through -user name mapping (). 
This is -useful for handling users from multiple realms. +If set to 0, the realm name from the authenticated user principal is +stripped off before being passed through the user name mapping +(). This is discouraged and is +primarily available for backwards compatibility as it is not secure +in multi-realm environments unless krb_realm is also used. Users +are recommended to leave include_realm set to the default (1) and to +provide an explicit mapping in pg_ident.conf. @@ -1008,10 +1020,11 @@ omicron bryanh guest1 Allows for mapping between system and database user names.
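[Editor's note: for readers unfamiliar with the pg_ident.conf mapping the patch recommends, a regex-based map that turns a full principal into a bare database role name looks roughly like the following. The map name and realm are placeholders; the `\1` substitutes the regex's first capture group, as described in the client authentication documentation.]

```
# pg_ident.conf -- map full Kerberos principals to database role names
# MAPNAME    SYSTEM-USERNAME            PG-USERNAME
krbmap       /^(.*)@EXAMPLE\.COM$/      \1
```

A matching pg_hba.conf line would then reference the map, e.g. `host all all 0.0.0.0/0 gss include_realm=1 map=krbmap`, so that `alice@EXAMPLE.COM` authenticates as role `alice` while principals from other realms are rejected by the map.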
Re: [HACKERS] Role Attribute Bitmask Catalog Representation
All, I have attached an updated patch that incorporates the feedback and recommendations provided. Thanks, Adam -- Adam Brightwell - adam.brightw...@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c new file mode 100644 index 133143d..ccdff09 *** a/src/backend/access/transam/xlogfuncs.c --- b/src/backend/access/transam/xlogfuncs.c *** *** 22,32 --- 22,34 #include "access/xlog_internal.h" #include "access/xlogutils.h" #include "catalog/catalog.h" + #include "catalog/pg_authid.h" #include "catalog/pg_type.h" #include "funcapi.h" #include "miscadmin.h" #include "replication/walreceiver.h" #include "storage/smgr.h" + #include "utils/acl.h" #include "utils/builtins.h" #include "utils/numeric.h" #include "utils/guc.h" *** pg_start_backup(PG_FUNCTION_ARGS) *** 54,60 backupidstr = text_to_cstring(backupid); ! if (!superuser() && !has_rolreplication(GetUserId())) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), errmsg("must be superuser or replication role to run a backup"))); --- 56,62 backupidstr = text_to_cstring(backupid); ! if (!superuser() && !check_role_attribute(GetUserId(), ROLE_ATTR_REPLICATION)) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), errmsg("must be superuser or replication role to run a backup"))); *** pg_stop_backup(PG_FUNCTION_ARGS) *** 82,88 { XLogRecPtr stoppoint; ! if (!superuser() && !has_rolreplication(GetUserId())) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), (errmsg("must be superuser or replication role to run a backup"; --- 84,90 { XLogRecPtr stoppoint; ! 
if (!superuser() && !check_role_attribute(GetUserId(), ROLE_ATTR_REPLICATION)) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), (errmsg("must be superuser or replication role to run a backup"; diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c new file mode 100644 index d30612c..0b23ba0 *** a/src/backend/catalog/aclchk.c --- b/src/backend/catalog/aclchk.c *** aclcheck_error_type(AclResult aclerr, Oi *** 3423,3448 } - /* Check if given user has rolcatupdate privilege according to pg_authid */ - static bool - has_rolcatupdate(Oid roleid) - { - bool rolcatupdate; - HeapTuple tuple; - - tuple = SearchSysCache1(AUTHOID, ObjectIdGetDatum(roleid)); - if (!HeapTupleIsValid(tuple)) - ereport(ERROR, - (errcode(ERRCODE_UNDEFINED_OBJECT), - errmsg("role with OID %u does not exist", roleid))); - - rolcatupdate = ((Form_pg_authid) GETSTRUCT(tuple))->rolcatupdate; - - ReleaseSysCache(tuple); - - return rolcatupdate; - } - /* * Relay for the various pg_*_mask routines depending on object kind */ --- 3423,3428 *** pg_class_aclmask(Oid table_oid, Oid role *** 3630,3636 if ((mask & (ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE | ACL_USAGE)) && IsSystemClass(table_oid, classForm) && classForm->relkind != RELKIND_VIEW && ! !has_rolcatupdate(roleid) && !allowSystemTableMods) { #ifdef ACLDEBUG --- 3610,3616 if ((mask & (ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE | ACL_USAGE)) && IsSystemClass(table_oid, classForm) && classForm->relkind != RELKIND_VIEW && ! !has_role_attribute(roleid, ROLE_ATTR_CATUPDATE) && !allowSystemTableMods) { #ifdef ACLDEBUG *** pg_extension_ownercheck(Oid ext_oid, Oid *** 5051,5102 } /* ! * Check whether specified role has CREATEROLE privilege (or is a superuser) * ! * Note: roles do not have owners per se; instead we use this test in ! * places where an ownership-like permissions test is needed for a role. ! * Be sure to apply it to the role trying to do the operation, not the ! * role being operated on! 
Also note that this generally should not be ! * considered enough privilege if the target role is a superuser. ! * (We don't handle that consideration here because we want to give a ! * separate error message for such cases, so the caller has to deal with it.) */ bool ! has_createrole_privilege(Oid roleid) { ! bool result = false; ! HeapTuple utup; ! ! /* Superusers bypass all permission checking. */ ! if (superuser_arg(roleid)) return true; ! utup = SearchSysCache1(AUTHOID, ObjectIdGetDatum(roleid)); ! if (HeapTupleIsValid(utup)) ! { ! result = ((Form_pg_authid) GETSTRUCT(utup))->rolcreaterole; ! ReleaseSysCache(utup); ! } ! return result; } bool ! has_bypassrls_privilege(Oid roleid) { ! bool result = false; ! HeapTuple utup; ! /* Superusers bypass all permission checking. */ ! if (superuser_arg(roleid)) ! return true; ! utup = SearchSysCache1(AUTHOID, ObjectIdGetDatum(roleid)); ! if (HeapTupleIsValid(utup)) ! { ! result = ((Fo
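The mechanical change in the patch above is replacing per-attribute boolean catalog columns (rolcatupdate, rolcreaterole, rolreplication, ...) with a single bitmask consulted through check_role_attribute(). A minimal sketch of the bitmask idea follows; the bit positions and the RoleAttr name are illustrative (the real definitions live in the patched pg_authid.h), and the catalog lookup by role OID is elided so only the bit test is shown:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions; the real values come from pg_authid.h. */
typedef uint64_t RoleAttr;
#define ROLE_ATTR_SUPERUSER   (((RoleAttr) 1) << 0)
#define ROLE_ATTR_CREATEROLE  (((RoleAttr) 1) << 2)
#define ROLE_ATTR_CATUPDATE   (((RoleAttr) 1) << 3)
#define ROLE_ATTR_REPLICATION (((RoleAttr) 1) << 5)

/*
 * One bitmask column replaces several per-attribute booleans; in the
 * patch the mask is fetched from the role's pg_authid tuple first.
 */
static bool
check_role_attribute(RoleAttr attributes, RoleAttr attribute)
{
    return (attributes & attribute) != 0;
}
```

One consequence of this representation is that adding a future role attribute only needs a new flag bit rather than a new catalog column.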
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On 12/05/2014 07:59 AM, Robert Haas wrote: > I think it's probably an important distinction, for the kinds of > reasons Anssi mentions, but we should look for some method other than > a system column of indicating it. Maybe there's a magic function that > returns a Boolean which you can call, or maybe some special clause, as > with WITH ORDINALITY. I thought the point of INSERT ... ON CONFLICT update was so that you didn't have to care if it was a new row or not? If you do care, it seems like it makes more sense to do your own INSERTs and UPDATEs, as Django currently does. I wouldn't be *opposed* to having a pseudocolumn in the RETURNed stuff which let me know updated|inserted|ignored, but I also don't see it as a feature requirement for 9.5. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Dec 5, 2014 at 1:49 AM, Rahila Syed wrote:
>>If that's really true, we could consider having no configuration any
>>time, and just compressing always. But I'm skeptical that it's
>>actually true.
>
> I was referring to this for CPU utilization:
> http://www.postgresql.org/message-id/1410414381339-5818552.p...@n5.nabble.com
>
> The above tests were performed on machine with configuration as follows
> Server specifications:
> Processors: Intel® Xeon® Processor E5-2650 (2 GHz, 8C/16T, 20 MB) * 2 nos
> RAM: 32GB
> Disk: HDD 450GB 10K Hot Plug 2.5-inch SAS HDD * 8 nos
> 1 x 450 GB SAS HDD, 2.5-inch, 6Gb/s, 10,000 rpm

I think that measurement methodology is not very good for assessing the CPU overhead, because you are only measuring the percentage CPU utilization, not the absolute amount of CPU utilization. It's not clear whether the duration of the tests was the same for all the configurations you tried - in which case the number of transactions might have been different - or whether the number of operations was exactly the same - in which case the runtime might have been different. Either way, it could obscure an actual difference in absolute CPU usage per transaction. It's unlikely that both the runtime and the number of transactions were identical for all of your tests, because that would imply that the patch makes no difference to performance; if that were true, you wouldn't have bothered writing it.

What I would suggest is to instrument the backend with getrusage() at startup and shutdown and have it print the difference in user time and system time. Then, run tests for a fixed number of transactions and see how the total CPU usage for the run differs. Last cycle, Amit Kapila did a bunch of work trying to compress the WAL footprint for updates, and we found that compression was pretty darn expensive there in terms of CPU time. So I am suspicious of the finding that it is free here.
It's not impossible that there's some effect which causes us to recoup more CPU time than we spend compressing in this case that did not apply in that case, but the projects are awfully similar, so I tend to doubt it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
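The getrusage() instrumentation suggested above could look roughly like the following. The helper names are made up for illustration; the idea is simply to snapshot process CPU usage at backend start and print the deltas at shutdown:

```c
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>

/* CPU usage captured at startup, to diff against at shutdown. */
static struct rusage start_usage;

/* Difference between two timevals, in seconds. */
static double
tv_diff(struct timeval end, struct timeval start)
{
    return (end.tv_sec - start.tv_sec) +
           (end.tv_usec - start.tv_usec) / 1000000.0;
}

void
cpu_usage_start(void)
{
    getrusage(RUSAGE_SELF, &start_usage);
}

void
cpu_usage_report(void)
{
    struct rusage end_usage;

    getrusage(RUSAGE_SELF, &end_usage);
    fprintf(stderr, "user: %.6f s, system: %.6f s\n",
            tv_diff(end_usage.ru_utime, start_usage.ru_utime),
            tv_diff(end_usage.ru_stime, start_usage.ru_stime));
}
```

Because ru_utime and ru_stime accumulate the CPU actually consumed by the process, the deltas expose per-run CPU cost even when wall-clock duration or transaction counts differ between configurations, which is exactly the weakness of the percentage-utilization measurements.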
Re: [HACKERS] On partitioning
On Fri, Dec 5, 2014 at 3:11 AM, Amit Langote wrote:
>> I think you are right. I think in this case we need something similar
>> to column pg_index.indexprs which is of type pg_node_tree (which
>> seems to be already suggested by Robert). So may be we can proceed
>> with this type and see if any one else has better idea.
>
> Yeah, with that, I was thinking we may be able to do something like dump a
> Node that describes the range partition bounds or list of allowed values
> (say, RangePartitionValues, ListPartitionValues).

That's exactly the kind of thing I was thinking about.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On partitioning
On Fri, Dec 5, 2014 at 2:18 AM, Amit Kapila wrote: > Do we really need to support dml or pg_dump for individual partitions? I think we do. It's quite reasonable for a DBA (or developer or whatever) to want to dump all the data that's in a single partition; for example, maybe they have the table partitioned, but also spread across several servers. When the data on one machine grows too big, they want to dump that partition, move it to a new machine, and drop the partition from the old machine. That needs to be easy and efficient. More generally, with inheritance, I've seen the ability to reference individual inheritance children be a real life-saver on any number of occasions. Now, a new partitioning system that is not as clunky as constraint exclusion will hopefully be fast enough that people don't need to do it very often any more. But I would be really cautious about removing the option. That is the equivalent of installing a new fire suppression system and then boarding up the emergency exit. Yeah, you *hope* the new fire suppression system is good enough that nobody will ever need to go out that way any more. But if you're wrong, people will die, so getting rid of it isn't prudent. The stakes are not quite so high here, but the principle is the same. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On partitioning
On Tue, Dec 2, 2014 at 10:43 PM, Amit Langote wrote: >> So, we're going to support exactly two levels of partitioning? >> partitions with partissub=false and subpartitions with partissub=true? >> Why not support only one level of partitioning here but then let the >> children have their own pg_partitioned_rel entries if they are >> subpartitioned? That seems like a cleaner design and lets us support >> an arbitrary number of partitioning levels if we ever need them. > > Yeah, that's what I thought at some point in favour of dropping partissub > altogether. However, not that this design solves it, there is one question - > if we would want to support defining for a table both partition key and > sub-partition key in advance? That is, without having defined a first level > partition yet; in that case, what level do we associate sub-(sub-) > partitioning key with or more to the point where do we keep it? Do we really need to allow that? I think you let people partition a toplevel table, and then partition its partitions once they've been created. I'm not sure there's a good reason to associate the subpartitioning scheme with the toplevel table. For one thing, that forces all subpartitions to be partitioned the same way - do we want to insist on that? If we do, then I agree that we need to think a little harder here. > That would be a default partition. That is, where the tuples that don't > belong elsewhere (other defined partitions) go. VALUES clause of the > definition for such a partition would look like: > > (a range partition) ... VALUES LESS THAN MAXVALUE > (a list partition) ... VALUES DEFAULT > > There has been discussion about whether there shouldn't be such a place for > tuples to go. That is, it should generate an error if a tuple can't go > anywhere (or support auto-creating a new one like in interval partitioning?) I think Alvaro's response further down the thread is right on target. 
But to go into a bit more detail, let's consider the three possible cases: - Hash partitioning. Every key value gets hashed to some partition. The concept of an overflow or default partition doesn't even make sense. - List partitioning. Each key for which the user has defined a mapping gets sent to the corresponding partition. The keys that aren't mapped anywhere can either (a) cause an error or (b) get mapped to some default partition. It's probably useful to offer both behaviors. But I don't think it requires a partitionisoverflow column, because you can represent it some other way, such as by making partitionvalues NULL, which is otherwise meaningless. - Range partitioning. In this case, what you've basically got is a list of partition bounds and a list of target partitions. Suppose there are N partition bounds; then there will be N+1 targets. Some of those targets can be undefined, meaning an attempt to insert a key with that value will error out. For example, suppose the user defines a partition for values 1-3 and 10-13. Then your list of partition bounds looks like this: 1,3,10,13 And your list of destinations looks like this: undefined,firstpartition,undefined,secondpartition,undefined More commonly, the ranges will be contiguous, so that there are no gaps. If you have everything <10 in the first partition, everything 10-20 in the second partition, and everything else in a third partition, then you have bounds 10,20 and destinations firstpartition,secondpartition,thirdpartition. If you want values greater than 20 to error out, then you have bounds 10,20 and destinations firstpartition,secondpartition,undefined. In none of this do you really have "an overflow partition". Rather, the first and last destinations, if defined, catch everything that has a key lower than the lowest key or higher than the highest key. If not defined, you error out. > I wonder if your suggestion of pg_node_tree plays well here. This then could > be a list of CONSTs or some such... 
> And I am thinking it's a concern only for
> range partitions, no? (that is, a multicolumn partition key)

I guess you could list or hash partition on multiple columns, too. And yes, this is why I thought of pg_node_tree.

>> > * DDL syntax (no multi-column partitioning, sub-partitioning support as
>> > yet):
>> >
>> > -- create partitioned table and child partitions at once.
>> > CREATE TABLE parent (...)
>> > PARTITION BY [ RANGE | LIST ] (key_column) [ opclass ]
>> > [ (
>> >     PARTITION child
>> >     {
>> >         VALUES LESS THAN { ... | MAXVALUE } -- for RANGE
>> >       | VALUES [ IN ] ( { ... | DEFAULT } ) -- for LIST
>> >     }
>> >     [ WITH ( ... ) ] [ TABLESPACE tbs ]
>> > [, ...]
>> > ) ] ;
>>
>> How are you going to dump and restore this, bearing in mind that you
>> have to preserve a bunch of OIDs across pg_upgrade? What if somebody
>> wants to do pg_dump --table name_of_a_partition?
>>
> Assuming everything's (including partitioned relation and partitions at all
> levels
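The bounds-plus-destinations scheme for range partitioning described upthread can be sketched concretely. Assuming half-open ranges (each bound is an exclusive upper limit for the slot before it) — a detail the thread leaves open — routing a key reduces to finding its slot among N sorted bounds and N+1 destinations, any of which may be undefined:

```c
#include <assert.h>

#define UNDEFINED (-1)

/*
 * Route a key given nbounds sorted partition bounds.  With bounds
 * b[0..n-1] there are n+1 destination slots: slot 0 holds keys below
 * b[0], slot i holds keys in [b[i-1], b[i]), and slot n holds keys at
 * or above b[n-1].  A slot may be UNDEFINED, meaning an insert of such
 * a key should raise an error.
 */
static int
route_key(int key, const int *bounds, int nbounds, const int *dest)
{
    int i = 0;

    while (i < nbounds && key >= bounds[i])
        i++;
    return dest[i];             /* may be UNDEFINED */
}
```

With the bounds 1,3,10,13 from the example upthread and destinations undefined,first,undefined,second,undefined, a key of 2 lands in the first partition, 5 falls in the undefined gap, and 20 falls off the end. A real implementation would binary-search the bounds rather than scan linearly, but the data model is the same.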
Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf
On Sat, Dec 6, 2014 at 12:49 AM, Robert Haas wrote: > On Thu, Dec 4, 2014 at 8:45 AM, Michael Paquier > wrote: > >> Here is patch which renames action_at_recovery_target to > >> recovery_target_action everywhere. > > Thanks, Looks good to me. > > > > A couple of things that would be good to document as well in > > recovery-config.sgml: > > - the fact that pause_at_recovery_target is deprecated. > > Why not just remove it altogether? I don't think the > backward-compatibility break is enough to get excited about, or to > justify the annoyance of carrying an extra parameter that does (part > of) the same thing. > At least we won't forget removing in the future something that has been marked as deprecated for years. So +1 here for a simple removal, and a mention in the future release notes. -- Michael
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Thu, Dec 4, 2014 at 1:27 PM, Peter Geoghegan wrote: > On Thu, Dec 4, 2014 at 3:04 AM, Craig Ringer wrote: >> Yes, I think that's pretty important. With a negative attno so it's >> treated as a "hidden" col that must be explicitly named to be shown and >> won't be confused with user columns. > > I think that the standard for adding a new system attribute ought to > be enormous. The only case where a new one was added post-Postgres95 > was "tableoid". I'm pretty sure that others aren't going to want to do > it that way. +1. System attributes are a mess; a negative attribute number implies not only that the column is by default hidden from view but also that it is stored in the tuple header rather than the tuple data. Using one here, where we're not even talking about a property of the tuple per se, would be a pretty goofy solution. > Besides, I'm not entirely convinced that this is actually > an important distinction to expose. I think it's probably an important distinction, for the kinds of reasons Anssi mentions, but we should look for some method other than a system column of indicating it. Maybe there's a magic function that returns a Boolean which you can call, or maybe some special clause, as with WITH ORDINALITY. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf
On Thu, Dec 4, 2014 at 8:45 AM, Michael Paquier wrote: >> Here is patch which renames action_at_recovery_target to >> recovery_target_action everywhere. > Thanks, Looks good to me. > > A couple of things that would be good to document as well in > recovery-config.sgml: > - the fact that pause_at_recovery_target is deprecated. Why not just remove it altogether? I don't think the backward-compatibility break is enough to get excited about, or to justify the annoyance of carrying an extra parameter that does (part of) the same thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw does not see enums
On Wed, Dec 3, 2014 at 5:17 PM, Tom Lane wrote:
> David Fetter writes:
>> On Wed, Dec 03, 2014 at 05:52:03PM -0500, Tom Lane wrote:
>>> What do you mean "reconstruct the enum"?
>
>> Capture its state at the time when IMPORT FOREIGN SCHEMA is executed.
>> Right now, if you try IMPORT SCHEMA on a foreign table with an enum in
>> it, postgresql_fdw errors out rather than trying to notice that
>> there's an enum definition which should precede creation and execute
>> it in the correct order.
>
> Oh, you think IMPORT FOREIGN SCHEMA should try to import enums?
> I doubt it. What happens if the enum already exists locally?
> And why enums, and not domains, ranges, composite types, etc?

Probably IMPORT FOREIGN SCHEMA should not attempt to include type dependencies. However, if they are present in the importer (that is, the type exists by name), it should assume that they are correct come what may. Something like 'IMPORT FOREIGN TYPE' would probably be needed to translate a type between servers. Unless the SQL standard has it or gets it, I doubt it will ever appear, but the status quo isn't too bad IMO.

Personally I have no issues with the risks involved with type synchronization; the problems faced are mostly academic, with clean errors, and easily managed unless binary format is used with postgres-to-postgres transfer (which IIRC the postgres fdw does not utilize at this time). User-created types can't be transmitted between servers with the existing binary format; you have to transmit them as text and hope the structures agree. Binary format transmission in postgres tends to be quite a bit faster depending on the nature of the types involved; things like ints, numerics, and timestamps tend to be much faster.

merlin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2014-12-06 00:10:11 +0900, Michael Paquier wrote: > On Sat, Dec 6, 2014 at 12:06 AM, Michael Paquier > wrote: > > > > > > > > > On Fri, Dec 5, 2014 at 11:10 PM, Rahila Syed > > wrote: > > > >> I attempted quick review and could not come up with much except this > >> > >> + /* > >> +* Calculate the amount of FPI data in the record. Each backup block > >> +* takes up BLCKSZ bytes, minus the "hole" length. > >> +* > >> +* XXX: We peek into xlogreader's private decoded backup blocks for > >> the > >> +* hole_length. It doesn't seem worth it to add an accessor macro for > >> +* this. > >> +*/ > >> + fpi_len = 0; > >> + for (block_id = 0; block_id <= record->max_block_id; block_id++) > >> + { > >> + if (XLogRecHasCompressedBlockImage(record, block_id)) > >> + fpi_len += BLCKSZ - record->blocks[block_id].compress_len; > >> > >> > >> IIUC, fpi_len in case of compressed block image should be > >> > >> fpi_len = record->blocks[block_id].compress_len; > >> > > Yep, true. Patches need a rebase btw as Heikki fixed a commit related to > > the stats of pg_xlogdump. > > > > In any case, any opinions to switch this patch as "Ready for committer"? Needing a rebase is a obvious conflict to that... But I guess some wider looks afterwards won't hurt. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
Amit, * Amit Kapila (amit.kapil...@gmail.com) wrote: > postgres=# explain select c1 from t1; > QUERY PLAN > -- > Seq Scan on t1 (cost=0.00..101.00 rows=100 width=4) > (1 row) > > > postgres=# set parallel_seqscan_degree=4; > SET > postgres=# explain select c1 from t1; > QUERY PLAN > -- > Parallel Seq Scan on t1 (cost=0.00..25.25 rows=100 width=4) >Number of Workers: 4 >Number of Blocks Per Workers: 25 > (3 rows) This is all great and interesting, but I feel like folks might be waiting to see just what kind of performance results come from this (and what kind of hardware is needed to see gains..). There's likely to be situations where this change is an improvement while also being cases where it makes things worse. One really interesting case would be parallel seq scans which are executing against foreign tables/FDWs.. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Parallel Seq Scan
José, * José Luis Tallón (jltal...@adv-solutions.net) wrote: > On 12/04/2014 07:35 AM, Amit Kapila wrote: > >The number of worker backends that can be used for > >parallel seq scan can be configured by using a new GUC > >parallel_seqscan_degree, the default value of which is zero > >and it means parallel seq scan will not be considered unless > >user configures this value. > > The number of parallel workers should be capped (of course!) at the > maximum amount of "processors" (cores/vCores, threads/hyperthreads) > available. > > More over, when load goes up, the relative cost of parallel working > should go up as well. > Something like: > p = number of cores > l = 1min-load > > additional_cost = tuple estimate * cpu_tuple_cost * (l+1)/(c-1) > > (for c>1, of course) While I agree in general that we'll need to come up with appropriate acceptance criteria, etc, I don't think we want to complicate this patch with that initially. A SUSET GUC which caps the parallel GUC would be enough for an initial implementation, imv. > Not directly (I haven't had the time to read the code yet), but I'm > thinking about the ability to simply *replace* executor methods from > an extension. You probably want to look at the CustomScan thread+patch directly then.. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Sat, Dec 6, 2014 at 12:06 AM, Michael Paquier wrote: > > > > On Fri, Dec 5, 2014 at 11:10 PM, Rahila Syed > wrote: > >> I attempted quick review and could not come up with much except this >> >> + /* >> +* Calculate the amount of FPI data in the record. Each backup block >> +* takes up BLCKSZ bytes, minus the "hole" length. >> +* >> +* XXX: We peek into xlogreader's private decoded backup blocks for >> the >> +* hole_length. It doesn't seem worth it to add an accessor macro for >> +* this. >> +*/ >> + fpi_len = 0; >> + for (block_id = 0; block_id <= record->max_block_id; block_id++) >> + { >> + if (XLogRecHasCompressedBlockImage(record, block_id)) >> + fpi_len += BLCKSZ - record->blocks[block_id].compress_len; >> >> >> IIUC, fpi_len in case of compressed block image should be >> >> fpi_len = record->blocks[block_id].compress_len; >> > Yep, true. Patches need a rebase btw as Heikki fixed a commit related to > the stats of pg_xlogdump. > In any case, any opinions to switch this patch as "Ready for committer"? -- Michael
Re: [HACKERS] Parallel Seq Scan
On 12/04/2014 07:35 AM, Amit Kapila wrote: [snip] The number of worker backends that can be used for parallel seq scan can be configured by using a new GUC parallel_seqscan_degree, the default value of which is zero and it means parallel seq scan will not be considered unless user configures this value. The number of parallel workers should be capped (of course!) at the maximum amount of "processors" (cores/vCores, threads/hyperthreads) available. More over, when load goes up, the relative cost of parallel working should go up as well. Something like: p = number of cores l = 1min-load additional_cost = tuple estimate * cpu_tuple_cost * (l+1)/(c-1) (for c>1, of course) In ExecutorStart phase, initiate the required number of workers as per parallel seq scan plan and setup dynamic shared memory and share the information required for worker to execute the scan. Currently I have just shared the relId, targetlist and number of blocks to be scanned by worker, however I think we might want to generate a plan for each of the workers in master backend and then share the same to individual worker. [snip] Attached patch is just to facilitate the discussion about the parallel seq scan and may be some other dependent tasks like sharing of various states like combocid, snapshot with parallel workers. It is by no means ready to do any complex test, ofcourse I will work towards making it more robust both in terms of adding more stuff and doing performance optimizations. Thoughts/Suggestions? Not directly (I haven't had the time to read the code yet), but I'm thinking about the ability to simply *replace* executor methods from an extension. This could be an alternative to providing additional nodes that the planner can include in the final plan tree, ready to be executed. The parallel seq scan nodes are definitively the best approach for "parallel query", since the planner can optimize them based on cost. 
I'm wondering about the ability to modify the implementation of some methods themselves once at execution time: given a previously planned query, chances are that, at execution time (I'm specifically thinking about prepared statements here), a different implementation of the same "node" might be more suitable and could be used instead while the condition holds. If this latter line of thinking is too off-topic within this thread and there is any interest, we can move the comments to another thread and I'd begin work on a PoC patch. It might as well make sense to implement the executor overloading mechanism alongide the custom plan API, though. Any comments appreciated. Thank you for your work, Amit Regards, / J.L. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
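The load-based surcharge sketched above (where the "c" in the formula is evidently the core count "p" defined just before it) could be expressed as follows. Everything here is illustrative; no such costing exists in the patch or the planner:

```c
#include <assert.h>
#include <math.h>

/*
 * Load-aware surcharge on a parallel plan: the relative cost of
 * parallelism rises with the 1-minute load average and falls with the
 * number of cores (for ncores > 1, per the proposal).  Function and
 * parameter names are made up for this sketch.
 */
static double
parallel_load_surcharge(double tuple_estimate, double cpu_tuple_cost,
                        double load_1min, int ncores)
{
    if (ncores <= 1)
        return 0.0;             /* parallelism not considered at all */
    return tuple_estimate * cpu_tuple_cost *
           (load_1min + 1.0) / (ncores - 1);
}
```

For example, on an idle 2-core machine the surcharge for an estimated 1000 tuples at cpu_tuple_cost = 0.01 is 10.0; at a load average of 3.0 on 4 cores it rises to about 13.3, discouraging parallel plans as the machine gets busier.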
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Fri, Dec 5, 2014 at 11:10 PM, Rahila Syed wrote: > I attempted quick review and could not come up with much except this > > + /* > +* Calculate the amount of FPI data in the record. Each backup block > +* takes up BLCKSZ bytes, minus the "hole" length. > +* > +* XXX: We peek into xlogreader's private decoded backup blocks for the > +* hole_length. It doesn't seem worth it to add an accessor macro for > +* this. > +*/ > + fpi_len = 0; > + for (block_id = 0; block_id <= record->max_block_id; block_id++) > + { > + if (XLogRecHasCompressedBlockImage(record, block_id)) > + fpi_len += BLCKSZ - record->blocks[block_id].compress_len; > > > IIUC, fpi_len in case of compressed block image should be > > fpi_len = record->blocks[block_id].compress_len; > Yep, true. Patches need a rebase btw as Heikki fixed a commit related to the stats of pg_xlogdump. -- Michael
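To make the review point concrete: the patch charged a compressed block image as BLCKSZ minus compress_len, when a compressed image actually occupies compress_len bytes. A stand-alone sketch of the corrected accounting follows; the struct is a simplified stand-in for xlogreader's decoded backup blocks, not the real type:

```c
#include <assert.h>
#include <stdbool.h>

#define BLCKSZ 8192

/* Simplified stand-in for one decoded backup block. */
typedef struct
{
    bool    has_image;      /* block carries a full-page image? */
    bool    is_compressed;  /* image stored compressed? */
    int     hole_length;    /* unused "hole" in an uncompressed image */
    int     compress_len;   /* size of the compressed image */
} BackupBlock;

/*
 * FPI bytes this block contributes to the record: a compressed image
 * occupies compress_len bytes (the reviewed patch mistakenly used
 * BLCKSZ - compress_len); an uncompressed image occupies BLCKSZ minus
 * the hole.
 */
static int
fpi_block_len(const BackupBlock *blk)
{
    if (!blk->has_image)
        return 0;
    if (blk->is_compressed)
        return blk->compress_len;
    return BLCKSZ - blk->hole_length;
}
```

Summing this over all backup blocks of a record gives the fpi_len that pg_xlogdump's statistics mode reports.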
Re: [HACKERS] advance local xmin more aggressively
[ reviving a very old thread ] On Tue, Feb 10, 2009 at 4:11 PM, Tom Lane wrote: > Alvaro Herrera writes: >> For example, maybe we could keep track of counts of snapshots removed >> since the last xmin calculation, and only run this routine if the number >> is different from zero (or some small positive integer). > > I think most of the callers of SnapshotResetXmin already know they > removed something. > > It might be interesting for FreeSnapshot or something nearby to note > whether the snapshot being killed has xmin = proc's xmin, and only do > the update calculation if so. > > I still dislike the assumption that all resource owners are children of > a known owner. I suspect in fact that it's demonstrably wrong right > now, let alone in future (cf comments in PortalRun). If we're going to > do this then snapmgr.c needs to track the snapshots for itself. Of > course that's going to make the "is it worth it" question even more > pressing. I've run into essentially the same problem Jeff originally complained about with a large customer who has long-running transactions that make extensive use of cursors. Cursors are opened and closed over time but it is rare for the number open to reach exactly zero, so what ends up happening is that the backend xmin does not advance. As you can imagine, that doesn't work out well. The approach I came up with initially was similar to Jeff's: start at the topmost resource owner and traverse them all, visiting every snapshot along the way. But then I found this thread and saw the comment that this might be "demonstrably wrong" and referring to the comments in PortalRun. Having reviewed those comments, which don't seem to have changed much in the last five years, I can't understand how they related to this issue. It's true that the TopTransaction resource owner could get swapped out under us during an internal commit, but why should SnapshotResetXmin() have to care? 
It just traverses the one that is in effect at the time it gets called. The only real danger I see here is that there could be *more than one* toplevel resource owner. I wonder if we could solve that problem by adding a registry of active toplevel resource owners, so that if we have a forest rather than a tree we can still find everything. The problem I see with having snapmgr.c track the snapshots for itself is that it is mostly duplicating of bookkeeping which is already being done. Since this problem doesn't affect the majority of users, it's not desirable to add a lot of extra bookkeeping to cater to it - but even if it did affect a lot of users, we still want it to be as cheap as possible, and reusing the tracking that resource owners are already doing seems like the way to get there. I think there are a couple of things we can do to make this cheaper. Suppose we keep track of the oldest xmin of any registered snapshot and the number of registered snapshots that have that particular xmin. Every time we deregister a snapshot, we check whether it is one of the ones with the minimum xmin; if it is, we decrement the count. When the count reaches 0, we know that a traversal of all registered snapshots is guaranteed to find a newer value to advertise in MyProc->xmin; that way, we can arrange to do the work only when it's likely to pay off. In most cases this won't happen until the last snapshot is unregistered, because our snapshots normally form a stack, with newer snapshots having been taken later. But if somebody unregisters the oldest snapshot we'll immediately notice and recalculate. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
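The bookkeeping described above — the oldest xmin among registered snapshots plus a count of snapshots sharing it — can be sketched as below. This is illustrative only, not the actual snapmgr.c structures; when the deregister hook returns true, the caller would rescan all registered snapshots, advertise the new minimum in MyProc->xmin, and reset the tracker:

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int TransactionId;

/* Oldest xmin among registered snapshots, and how many share it. */
static TransactionId oldest_xmin = 0;
static int           oldest_count = 0;

static void
track_register(TransactionId xmin)
{
    if (oldest_count == 0 || xmin < oldest_xmin)
    {
        /* New minimum (or first snapshot): restart the count. */
        oldest_xmin = xmin;
        oldest_count = 1;
    }
    else if (xmin == oldest_xmin)
        oldest_count++;
    /* Snapshots with a newer xmin need no tracking here. */
}

/* Returns true when a rescan of registered snapshots would pay off. */
static bool
track_deregister(TransactionId xmin)
{
    if (xmin == oldest_xmin && --oldest_count == 0)
        return true;            /* last holder of the oldest xmin is gone */
    return false;
}
```

Because snapshots normally form a stack with newer snapshots having newer xmins, the count usually reaches zero only when the last snapshot goes away; but if the oldest snapshot is dropped early — the cursor workload above — the recalculation triggers immediately.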
Re: [HACKERS] XLOG_PARAMETER_CHANGE (WAL record) missing two params in its desc routine
On Fri, Dec 5, 2014 at 7:12 PM, Heikki Linnakangas wrote: > On 12/05/2014 04:54 AM, Michael Paquier wrote: > >> Hi all, >> >> While reading the code in this area this morning, I noticed that >> wal_log_hints and track_commit_timestamp are not mentioned in the desc >> routine of XLOG_CHANGE_PARAMETER. Also, it is not mentioned in >> postgresql.conf.sample that a value update of wal_log_hints requires a >> system restart. >> >> In order to fix those things, attached are two patches: >> - patch 1 should be applied back to 9.4 where wal_log_hints has been added >> > > You got the wal_level and wal_log_hints strings backwards: > Oops. Thanks for double-checking. -- Michael
Re: [HACKERS] Review of GetUserId() Usage
* Andres Freund (and...@2ndquadrant.com) wrote: > Is the 'Only allow superusers to signal superuser-owned backends' check > actually safe that way? I personally try to never use a superuser role > as the login user, but grant my account a superuser role that doesn't > inherit. But IIRC PGPROC->roleId won't change, even if a user does SET > ROLE. You're correct - but it's exactly the same as it is today. If you grant another user your role and then they 'SET ROLE' to you, they can cancel any of your queries or terminate your backends, regardless of whether those roles have done some other 'SET ROLE'. This change only removes the need for those users to 'SET ROLE' to your user first. The backend isn't considered 'superuser-owned' unless it's the login role that's a superuser. It might be interesting to change that to mean 'when a SET ROLE to superuser has been done', but what about security definer functions or other transient escalation to superuser? Would those calls have to muck with PGPROC->roleId? If we want to go there, it should definitely be a different patch. Thanks, Stephen
Re: [HACKERS] Review of GetUserId() Usage
On 2014-12-05 09:28:13 -0500, Stephen Frost wrote: > static int > pg_signal_backend(int pid, int sig) > { > @@ -113,7 +117,12 @@ pg_signal_backend(int pid, int sig) > return SIGNAL_BACKEND_ERROR; > } > > - if (!(superuser() || proc->roleId == GetUserId())) > + /* Only allow superusers to signal superuser-owned backends. */ > + if (superuser_arg(proc->roleId) && !superuser()) > + return SIGNAL_BACKEND_NOSUPERUSER; > + > + /* Users can signal backends they have role membership in. */ > + if (!has_privs_of_role(GetUserId(), proc->roleId)) > return SIGNAL_BACKEND_NOPERMISSION; > > /* > @@ -141,35 +150,49 @@ pg_signal_backend(int pid, int sig) > } Is the 'Only allow superusers to signal superuser-owned backends' check actually safe that way? I personally try to never use a superuser role as the login user, but grant my account a superuser role that doesn't inherit. But IIRC PGPROC->roleId won't change, even if a user does SET ROLE. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of GetUserId() Usage
* Stephen Frost (sfr...@snowman.net) wrote: > * Stephen Frost (sfr...@snowman.net) wrote: > > > 3. It messes around with pg_signal_backend(). There are currently no > > > cases in which pg_signal_backend() throws an error, which is good, > > > because it lets you write queries against pg_stat_activity() that > > > don't fail halfway through, even if you are missing permissions on > > > some things. This patch introduces such a case, which is bad. > > > > Good point, I'll move that check up into the other functions, which will > > allow for a more descriptive error as well. > > Err, I'm missing something here, as pg_signal_backend() is a misc.c > static internal function? How would you be calling it from a query > against pg_stat_activity()? > > I'm fine making the change anyway, just curious.. Updated patch attached which moves the ereport() out of pg_signal_backend() and into its callers, as the other permissions checks are done, and includes the documentation changes. The error messages are minimally changed to match the new behavior. Thanks, Stephen From c60718e7a7ecb8d671627271533d6bd2b6878782 Mon Sep 17 00:00:00 2001 From: Stephen Frost Date: Mon, 1 Dec 2014 11:13:09 -0500 Subject: [PATCH] GetUserId() changes to has_privs_of_role() The pg_stat and pg_signal-related functions have been using GetUserId() instead of has_privs_of_role() for checking if the current user should be able to see details in pg_stat_activity or signal other processes, requiring a user to do 'SET ROLE' for inherited roles for a permissions check, unlike other permissions checks. This patch changes that behavior to, instead, act like most other permission checks and use has_privs_of_role(), removing the 'SET ROLE' need. Documentation and error messages updated accordingly. Per discussion with Alvaro, Peter, Adam (though not using Adam's patch), and Robert. 
--- doc/src/sgml/func.sgml | 13 ++--- src/backend/utils/adt/misc.c| 39 + src/backend/utils/adt/pgstatfuncs.c | 19 +- 3 files changed, 47 insertions(+), 24 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 62ec275..dbb61dc 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -16121,9 +16121,9 @@ SELECT set_config('log_statement_stats', 'off', false); pg_cancel_backend(pid int) boolean - Cancel a backend's current query. You can execute this against -another backend that has exactly the same role as the user calling the -function. In all other cases, you must be a superuser. + Cancel a backend's current query. This is also allowed if the +calling role is a member of the role whose backend is being cancelled, +however only superusers can cancel superuser backends. @@ -16145,10 +16145,9 @@ SELECT set_config('log_statement_stats', 'off', false); pg_terminate_backend(pid int) boolean - Terminate a backend. You can execute this against -another backend that has exactly the same role as the user -calling the function. In all other cases, you must be a -superuser. + Terminate a backend. This is also allowed if the calling role +is a member of the role whose backend is being terminated, however only +superusers can terminate superuser backends. diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c index 67539ec..2e74eb9 100644 --- a/src/backend/utils/adt/misc.c +++ b/src/backend/utils/adt/misc.c @@ -37,6 +37,7 @@ #include "utils/lsyscache.h" #include "utils/ruleutils.h" #include "tcop/tcopprot.h" +#include "utils/acl.h" #include "utils/builtins.h" #include "utils/timestamp.h" @@ -81,7 +82,9 @@ current_query(PG_FUNCTION_ARGS) * role as the backend being signaled. For "dangerous" signals, an explicit * check for superuser needs to be done prior to calling this function. * - * Returns 0 on success, 1 on general failure, and 2 on permission error. 
+ * Returns 0 on success, 1 on general failure, 2 on normal permission error + * and 3 if the caller needs to be a superuser. + * * In the event of a general failure (return code 1), a warning message will * be emitted. For permission errors, doing that is the responsibility of * the caller. @@ -89,6 +92,7 @@ current_query(PG_FUNCTION_ARGS) #define SIGNAL_BACKEND_SUCCESS 0 #define SIGNAL_BACKEND_ERROR 1 #define SIGNAL_BACKEND_NOPERMISSION 2 +#define SIGNAL_BACKEND_NOSUPERUSER 3 static int pg_signal_backend(int pid, int sig) { @@ -113,7 +117,12 @@ pg_signal_backend(int pid, int sig) return SIGNAL_BACKEND_ERROR; } - if (!(superuser() || proc->roleId == GetUserId())) + /* Only allow superusers to signal superuser-owned backends. */ + if (superuser_arg(proc->roleId) && !superuser()) + return SIGNAL_BACKEND_NOSUPERUSER; + + /* Users c
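The ordering of the two new checks matters: the superuser test has to run before the membership test, so that a non-superuser probing a superuser-owned backend gets the more specific NOSUPERUSER result. A toy truth table of the hunk's logic (Python; the constant names echo the patch, but nothing here is PostgreSQL code):

```python
# Stand-ins for the patch's return codes.
SIGNAL_BACKEND_SUCCESS = 0
SIGNAL_BACKEND_NOPERMISSION = 2
SIGNAL_BACKEND_NOSUPERUSER = 3

def signal_check(caller_is_super, caller_has_privs_of_target, target_is_super):
    """Mirrors the order of the checks in the quoted pg_signal_backend() hunk."""
    # Only allow superusers to signal superuser-owned backends.
    if target_is_super and not caller_is_super:
        return SIGNAL_BACKEND_NOSUPERUSER
    # Otherwise role membership (has_privs_of_role) decides; a superuser
    # implicitly has the privileges of every role.
    if not (caller_is_super or caller_has_privs_of_target):
        return SIGNAL_BACKEND_NOPERMISSION
    return SIGNAL_BACKEND_SUCCESS
```

Swapping the two checks would report the generic permission error even to members of the superuser role, which is exactly the less descriptive behavior the patch wants to avoid.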
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
I attempted a quick review and could not come up with much except this + /* +* Calculate the amount of FPI data in the record. Each backup block +* takes up BLCKSZ bytes, minus the "hole" length. +* +* XXX: We peek into xlogreader's private decoded backup blocks for the +* hole_length. It doesn't seem worth it to add an accessor macro for +* this. +*/ + fpi_len = 0; + for (block_id = 0; block_id <= record->max_block_id; block_id++) + { + if (XLogRecHasCompressedBlockImage(record, block_id)) + fpi_len += BLCKSZ - record->blocks[block_id].compress_len; IIUC, fpi_len in the case of a compressed block image should be fpi_len += record->blocks[block_id].compress_len; Thank you, Rahila Syed
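The disagreement is easiest to see with the two readings side by side — a toy sketch in Python (`BLCKSZ` and the field names are stand-ins, not the real structures):

```python
BLCKSZ = 8192  # stand-in for the real block size

def fpi_len_as_written(blocks):
    """The quoted code: BLCKSZ minus compress_len, i.e. the space *saved*."""
    return sum(BLCKSZ - b["compress_len"] for b in blocks if b["compressed"])

def fpi_len_as_suggested(blocks):
    """Rahila's reading: the compressed bytes actually present in the WAL."""
    return sum(b["compress_len"] for b in blocks if b["compressed"])
```

If fpi_len is meant to report the bytes a full-page image occupies in the WAL stream, the second reading is the consistent one.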
Re: [HACKERS] compress method for spgist - 2
On 12/01/2014 02:44 PM, Teodor Sigaev wrote: Initial message: http://www.postgresql.org/message-id/5447b3ff.2080...@sigaev.ru Second version fixes a forgotten changes in pg_am. + /* Get the information we need about each relevant datatypes */ + + if (OidIsValid(cache->config.leafType) && cache->config.leafType != atttype) + { + if (!OidIsValid(index_getprocid(index, 1, SPGIST_COMPRESS_PROC))) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), +errmsg("compress method is not defined although input and leaf types are differ"))); + + fillTypeDesc(&cache->attType, cache->config.leafType); + } + else + { + fillTypeDesc(&cache->attType, atttype); + } + For some datatypes, the compress method might be useful even if the leaf type is the same as the column type. For example, you could allow indexing text datums larger than the page size, with a compress function that just truncates the input. Could you find some use for this in one of the built-in or contrib types? Just to have something that exercises it as part of the regression suite. How about creating an opclass for the built-in polygon type that stores the bounding box, like the PostGIS guys are doing? The documentation needs to be updated. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] superuser() shortcuts
* Robert Haas (robertmh...@gmail.com) wrote: > On Thu, Dec 4, 2014 at 3:59 PM, Stephen Frost wrote: > > I have a hard time wrapping my head around what a *lot* of our users ask > > when they see a given error message, but if our error message is 'you > > must have a pear-shaped object to run this command' then I imagine that > > a new-to-PG user might think "well, I can't create pear shaped objects > > in PG, so I guess this just isn't supported." That's not necessarily > > any fault of ours, but I do think "permission denied" reduces the chance > > that there's any confusion about the situation. > > This is just ridiculous. You're postulating the existence of a person > who thinks that a replication role is something that they buy at > Wendy's rather than something granted by the database administrator. > Give me a break. I was pointing out that it might not be recognizable as a permission by a person new to PG, yes. I did not characterize that individual as a database administrator, nor would they characterize themselves that way (at least, the person I'm thinking of). > > I do prefer my version but it's not an unfounded preference. > > I never said that your preference was unfounded. I said that I > disagreed with it. And this doesn't seem like it's going to change and certainly not over email and so I'm going to go with Alvaro's approach and simply drop it. I'll finish up the changes that I commented on above and move on. Thanks, Stephen
Re: [HACKERS] initdb: Improve error recovery.
On 11/27/2014 02:28 PM, Mats Erik Andersson wrote: Hello there, I would like to improve error recovery of initdb when the password file is empty. The present code declares "Error 0" as the cause of failure. It suffices to use ferror() since fgets() returns NULL also at a premature EOF. Thanks, committed. In addition, a line feed in order to print the error message on a line of its own seems desirable. Hmm. If you've piped stdout somewhere else, that produces a spurious empty line in stderr: $ bin/initdb -D data-foo > /dev/null initdb: input file "/home/heikki/pgsql.master/share/postgres.bki" does not belong to PostgreSQL 9.5devel Check your installation or specify the correct path using the option -L. initdb: removing data directory "data-foo" I think the newline needs to go to stdout instead. But in any case, that's just one error out of many similar ones that can happen during initdb. If we care enough to fix this, we should fix them all. Not sure it's worth the churn, though, unless you can come up with some kind of a simple wholesale solution, rather than adding fprintf(stdout, "\n") calls to every error condition. - Heikki
Re: [HACKERS] Testing DDL deparsing support
Bruce Momjian wrote: > On Fri, Nov 28, 2014 at 01:43:36PM +0900, Ian Barwick wrote: > > Standard regression tests are helpful because patch authors include new test > > cases in the patches that stress their new options or commands. This new > > test > > framework needs to be something that internally runs the regression tests > > and > > exercises DDL deparsing as the regression tests execute DDL. That would > > mean > > that new commands and options would automatically be deparse-tested by our > > new > > framework as soon as patch authors add standard regress support. > > Are you saying every time a new option is added to a command that a new > regression test needs to be added? Not necessarily -- an existing test could be modified, as well. > We don't normally do that, I sure hope we do have all options covered by tests. > and it could easily bloat the regression tests over time. We had 103 regression tests in 8.2 and we have 145 in 9.4. Does this qualify as bloat? > In summary, this testing will help, but it will not be fully reliable. No testing is ever fully reliable. If it were, there would never be bugs. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Testing DDL deparsing support
On Fri, Nov 28, 2014 at 01:43:36PM +0900, Ian Barwick wrote: > Standard regression tests are helpful because patch authors include new test > cases in the patches that stress their new options or commands. This new test > framework needs to be something that internally runs the regression tests and > exercises DDL deparsing as the regression tests execute DDL. That would mean > that new commands and options would automatically be deparse-tested by our new > framework as soon as patch authors add standard regress support. Are you saying every time a new option is added to a command that a new regression test needs to be added? We don't normally do that, and it could easily bloat the regression tests over time. In summary, this testing will help, but it will not be fully reliable. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Windows,PATCH] Use faster, higher precision timer API
On 2 December 2014 at 15:36, Craig Ringer wrote: > On 12/01/2014 09:51 PM, Marco Nenciarini wrote: > > I think this is a leftover, as you don't use elog afterwards. > > Good catch, fixed. > > I've looked over this again and tested it on a windows 8.1 machine. I cannot find any problems. The only comments about the code I have would maybe be to use some constants like: #define FILETIME_PER_SEC 10000000L #define FILETIME_PER_USEC 10 I had to read the Microsoft documentation to see that "A file time is a 64-bit value that represents the number of 100-nanosecond intervals that have elapsed since 12:00 A.M. January 1, 1601 Coordinated Universal Time (UTC)." http://msdn.microsoft.com/en-us/library/windows/desktop/dn553408%28v=vs.85%29.aspx The attached patch gets rid of those magic numbers, and hopefully makes it a bit easier to see what's going on. I agree with the lack of real need to log any sort of errors if init_win32_gettimeofday() gets any unexpected errors while trying to lookup GetSystemTimePreciseAsFileTime. I'm marking this as ready for committer. It seems worth going in just for the performance improvement alone, never mind the increased clock accuracy. I'll leave it up to the committer to decide if it's better with or without the attached patch. Regards David Rowley diff --git a/src/port/gettimeofday.c b/src/port/gettimeofday.c index ab4f491..f4d8393 100644 --- a/src/port/gettimeofday.c +++ b/src/port/gettimeofday.c @@ -35,6 +35,13 @@ static const unsigned __int64 epoch = UINT64CONST(116444736000000000); /* + * FILETIME represents the number of 100-nanosecond intervals since + * January 1, 1601 (UTC). + */ +#define FILETIME_PER_SEC 10000000L +#define FILETIME_PER_USEC 10 + +/* * Both GetSystemTimeAsFileTime and GetSystemTimePreciseAsFileTime share a * signature, so we can just store a pointer to whichever we find. This * is the pointer's type. 
@@ -98,8 +105,9 @@ gettimeofday(struct timeval * tp, struct timezone * tzp) ularge.LowPart = file_time.dwLowDateTime; ularge.HighPart = file_time.dwHighDateTime; - tp->tv_sec = (long) ((ularge.QuadPart - epoch) / 10000000L); - tp->tv_usec = (long) (((ularge.QuadPart - epoch) % 10000000L) / 10); + tp->tv_sec = (long) ((ularge.QuadPart - epoch) / FILETIME_PER_SEC); + tp->tv_usec = (long) (((ularge.QuadPart - epoch) % FILETIME_PER_SEC) + / FILETIME_PER_USEC); return 0; }
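As a sanity check on the conversion: a FILETIME counts 100-nanosecond intervals since January 1, 1601 (UTC), so one second is 10,000,000 units and one microsecond is 10. The arithmetic can be verified in isolation (Python; the epoch offset below is the well-known 1970-in-FILETIME constant, stated here independently of the quoted diff):

```python
# 1970-01-01 00:00:00 UTC expressed in FILETIME units (100 ns since 1601).
EPOCH_1970 = 116444736000000000
FILETIME_PER_SEC = 10_000_000   # 100 ns intervals per second
FILETIME_PER_USEC = 10          # 100 ns intervals per microsecond

def filetime_to_timeval(filetime):
    """Same arithmetic as the patch's gettimeofday() body."""
    delta = filetime - EPOCH_1970
    tv_sec = delta // FILETIME_PER_SEC
    tv_usec = (delta % FILETIME_PER_SEC) // FILETIME_PER_USEC
    return tv_sec, tv_usec
```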
Re: [HACKERS] libpq pipelining
On 12/05/2014 02:30 AM, Matt Newell wrote: The explanation of PQgetFirstQuery makes it sound pretty hard to match up the result with the query. You have to pay attention to PQisBusy. PQgetFirstQuery should also be valid after calling PQgetResult and then you don't have to worry about PQisBusy, so I should probably change the documentation to indicate that is the preferred usage, or maybe make that the only guaranteed usage, and say the results are undefined if you call it before calling PQgetResult. That usage also makes it consistent with PQgetLastQuery being called immediately after PQsendQuery. I changed my second example to call PQgetFirstQuery after PQgetResult instead of before, and that removes the need to call PQconsumeInput and PQisBusy when you don't mind blocking. It makes the example super simple: PQsendQuery(conn, "INSERT INTO test(id) VALUES (DEFAULT),(DEFAULT) RETURNING id"); query1 = PQgetLastQuery(conn); /* Duplicate primary key error */ PQsendQuery(conn, "UPDATE test SET id=2 WHERE id=1"); query2 = PQgetLastQuery(conn); PQsendQuery(conn, "SELECT * FROM test"); query3 = PQgetLastQuery(conn); while( (result = PQgetResult(conn)) != NULL ) { curQuery = PQgetFirstQuery(conn); if (curQuery == query1) checkResult(conn,result,curQuery,PGRES_TUPLES_OK); if (curQuery == query2) checkResult(conn,result,curQuery,PGRES_FATAL_ERROR); if (curQuery == query3) checkResult(conn,result,curQuery,PGRES_TUPLES_OK); } Note that the curQuery == queryX check will work no matter how many results a query produces. Oh, that's what the PQgetLastQuery/PQgetNextQuery functions work! I didn't understand that before. I'd suggest renaming them to something like PQgetSentQuery() and PQgetResultQuery(). The first/last/next names made me think that they're used to iterate a list of queries, but in fact they're supposed to be used at very different stages. 
- Heikki
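The invariant behind Matt's example — results arrive in the order the queries were sent, so PQgetFirstQuery after PQgetResult always names the oldest pending query — can be modeled with a simple FIFO. A toy sketch (Python, invented names; not libpq):

```python
from collections import deque

class PipelinedConn:
    """Toy model: queries go out in order; results come back in order.
    (Real queries can return multiple results; one-per-query here.)"""

    def __init__(self):
        self._pending = deque()

    def send_query(self, sql):
        handle = object()            # like the handle PQgetLastQuery returns
        self._pending.append(handle)
        return handle

    def get_result(self):
        # like PQgetResult followed by PQgetFirstQuery: the result always
        # belongs to the oldest still-pending query
        return self._pending.popleft()
```

This is why comparing the saved handle (query1/query2/query3) against the current one works regardless of error results in between: the FIFO order is fixed at send time.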
Re: [HACKERS] check-world failure: dummy_seclabel
On 12/05/2014 07:29 AM, Adam Brightwell wrote: All, I've noticed that 'check-world' fails for dummy_seclabel after a 'clean'. I believe that in commit da34731, the EXTRA_CLEAN statement should have been removed from 'src/test/modules/dummy_seclabel/Makefile' as well. Ah, that's why those files kept disappearing. Attached is a proposed patch that addresses this issue. Applied, thanks! - Heikki
Re: [HACKERS] XLOG_PARAMETER_CHANGE (WAL record) missing two params in its desc routine
On 12/05/2014 04:54 AM, Michael Paquier wrote: Hi all, While reading the code in this area this morning, I noticed that wal_log_hints and track_commit_timestamp are not mentioned in the desc routine of XLOG_CHANGE_PARAMETER. Also, it is not mentioned in postgresql.conf.sample that a value update of wal_log_hints requires a system restart. In order to fix those things, attached are two patches: - patch 1 should be applied back to 9.4 where wal_log_hints has been added You got the wal_level and wal_log_hints strings backwards: rmgr: XLOG len (rec/tot): 24/50, tx: 0, lsn: 0/07172488, prev 0/07172418, desc: PARAMETER_CHANGE max_connections=100 max_worker_processes=8 max_prepared_xacts=0 max_locks_per_xact=64 wal_level=false wal_log_hints=hot_standby Fixed that, and changed "true"/"false" to "on"/"off", as that's more common phrasing for boolean GUCs. - patch 2 is master-only, and needs to be applied on top of patch 1. Same here. Committed both patches with those fixes, thanks! - Heikki
Re: [HACKERS] On partitioning
From: Amit Kapila [mailto:amit.kapil...@gmail.com] Sent: Friday, December 05, 2014 5:10 PM To: Amit Langote Cc: Jim Nasby; Robert Haas; Andres Freund; Alvaro Herrera; Bruce Momjian; Pg Hackers Subject: Re: [HACKERS] On partitioning On Fri, Dec 5, 2014 at 12:27 PM, Amit Langote wrote: > From: Amit Kapila [mailto:amit.kapil...@gmail.com] > On Thu, Dec 4, 2014 at 10:46 AM, Amit Langote > wrote: > > > > > The more SQL way would be records (composite types). That would make > > > catalog inspection a LOT easier and presumably make it easier to change > > > the > > > partitioning key (I'm assuming ALTER TYPE cascades to stored data). > > > Records > > > are stored internally as tuples; not sure if that would be faster than a > > > List of > > > Consts or a pg_node_tree. Nodes would theoretically allow using things > > > other > > > than Consts, but I suspect that would be a bad idea. > > > > > > > While I couldn’t find an example in system catalogs where a > > record/composite type is used, there are instances of pg_node_tree at a > > number of places like in pg_attrdef and others. Could you please point me > > to such a usage for reference? > > > > > I think you can check the same by manually creating table > > with a user-defined type. > > > Create type typ as (f1 int, f2 text); > > Create table part_tab(c1 int, c2 typ); > > Is there such a custom-defined type used in some system catalog? Just not > sure how one would put together a custom type to use in a system catalog > given the way a system catalog is created. That's my concern but it may not > be valid. > > > I think you are right. I think in this case we need something similar > to column pg_index.indexprs which is of type pg_node_tree(which > seems to be already suggested by Robert). So may be we can proceed > with this type and see if any one else has better idea. 
One point raised about/against pg_node_tree was the values represented therein would turn out to be too generalized to be used with advantage during planning. But, it seems we could deserialize it in advance back to the internal form (like an array of a struct) as part of the cached relation data. This overhead would only be incurred in case of partitioned tables. Perhaps this is what Robert suggested elsewhere. Thanks, Amit
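The deserialize-once idea can be sketched abstractly: parse the catalog's stored representation the first time the relation data is built, and keep the internal form in a cache. A toy version (Python; `json` stands in for pg_node_tree and stringToNode, and all names are invented):

```python
import json

_relcache = {}  # stand-in for the cached relation data

def partition_bounds(relid, stored_bounds):
    """Deserialize the stored bounds once per relation; reuse afterwards."""
    if relid not in _relcache:
        # stand-in for stringToNode() on a pg_node_tree column
        _relcache[relid] = json.loads(stored_bounds)
    return _relcache[relid]
```

The parsing cost is paid only when the cache entry is (re)built, so only partitioned tables ever incur it — matching the point above about limiting the overhead.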
Re: [HACKERS] New wal format distorts pg_xlogdump --stats
On 12/05/2014 02:31 AM, Andres Freund wrote: On 2014-12-05 08:58:33 +0900, Michael Paquier wrote: On Fri, Dec 5, 2014 at 8:09 AM, Andres Freund wrote: On 2014-12-04 16:26:02 +0200, Heikki Linnakangas wrote: Yeah, that's broken. I propose the attached. Or does anyone want to argue for adding an XLogRecGetFPILen() accessor macro for the hole_length in xlogreader.h. It's not something a redo routine would need, nor most XLOG-reading applications, so I thought it's better to just have pg_xlogdump peek into the DecodedBkpBlock struct directly. I think both would be justifyable, so I don't really care for now. One slight reason for wrapping it in xlogreader.h is that we might add compression of some form or another soon and it'd possibly be better to have it in xlogreader.h so pg_xlogdump doesn't have to be changed. But that's really rather minor. Good point about the compression patch. I committed this as posted. It's not yet clear how the compression patch is going to do the compression. If we just compress all records above a certain size, pg_xlogdump should display the compressed and uncompressed sizes separately, and an accessor macro for the backup block's size would help much. The compression patch will almost certainly need to modify pg_xlogdump --stats anyway, so there's not much point trying to guess now what it's going to need. If we go this road and want to be complete, you may as well add access macros for the image offset, the block image and for the block data stuff. That would be handy and consistent with the rest, now both approaches are fine as long as DecodedBkpBlock is in xlogreader.h. I don't see the point. Let's introduce that if (which I doubt a bit) there's a user. Agreed. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Fri, 2014-12-05 at 00:21 -0800, Peter Geoghegan wrote: > On Thu, Dec 4, 2014 at 10:27 PM, Anssi Kääriäinen > wrote: > > For Django's use case this is a requirement. We must inform the user if > > the save() action created a new row or if it modified an existing one. > > Can you explain that in more detail, please? Django has a post_save signal. The signal provide information of the save operation. One piece of information is a created boolean flag. When set to True the operation was an insert, on False it was an update. See https://docs.djangoproject.com/en/1.7/ref/signals/#django.db.models.signals.post_save for details. The created flag is typically used to perform some related action. An example is User and UserProfile models. Each user must have an UserProfile, so post_save can be used to create an empty userprofile on creation of user. If Django is going to use the INSERT ... ON CONFLICT UPDATE variant in Django for the existing save() method, then it needs to know if the result was an UPDATE or INSERT. If we are going to use this for other operations (for example bulk merge of rows to the database), it would be very convenient to have per-row updated/created information available so that we can fire the post_save signals for the rows. If we don't have that information available, it means we can't fire signals, and no signals means we can't use the bulk merge operation internally as we have to fire the signals where that happened before. Outside of Django there are likely similar reasons to want to know if the result of an operation was a creation of a new row. The reason could be creation of related row, doing some action in application layer, or just UI message telling "object created successfully" vs "object updated successfully". - Anssi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
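What Django needs from an upsert is concrete: one flag per row saying which path was taken, so post_save receivers can fire with the right `created` value. A toy model of that contract (Python; not Django internals):

```python
store = {}      # stand-in for the table
listeners = []  # stand-ins for post_save receivers

def save(pk, value):
    """Insert-or-update; fires listeners with the created flag."""
    created = pk not in store
    store[pk] = value
    for receiver in listeners:
        receiver(pk=pk, created=created)
    return created
```

An INSERT ... ON CONFLICT UPDATE ... RETURNING that cannot report this distinction would force Django to keep its current SELECT-then-write save() rather than use the new syntax internally.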
Re: [HACKERS] SSL regression test suite
On Thu, Dec 04, 2014 at 02:42:41PM +0200, Heikki Linnakangas wrote: > On 10/06/2014 04:21 PM, Heikki Linnakangas wrote: > >This probably needs some further cleanup before it's ready for > >committing. One issues is that it creates a temporary cluster that > >listens for TCP connections on localhost, which isn't safe on a > >multi-user system. > > This issue remains. There isn't much we can do about it; SSL doesn't work > over Unix domain sockets. We could make it work, but that's a whole > different feature. A large subset of the test suite could be made secure. Omit or lock down "trustdb", and skip affected tests. (Perhaps have an --unsafe-tests option to reactivate them.) Instead of distributing frozen keys, generate all keys on-demand. Ensure that key files have secure file modes from the start. Having said that, ... > How do people feel about including this test suite in the source tree? It's > probably not suitable for running as part of "make check-world", but it's > extremely handy if you're working on a patch related to SSL. I'd like to > commit this, even if it has some rough edges. That way we can improve it > later, rather than have it fall into oblivion. Any objections? ... +1 for having this suite in the tree, even if check-world ignores it. Echoing Tom's comment, the README should mention its security weakness. Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Thu, Dec 4, 2014 at 10:27 PM, Anssi Kääriäinen wrote: > For Django's use case this is a requirement. We must inform the user if > the save() action created a new row or if it modified an existing one. Can you explain that in more detail, please? > Another way to do this would be to expose the "excluded" alias in the > returning clause. All columns of the excluded alias would be null in > the case of insert (especially the primary key column), and thus if a > query > insert into foobar values(2, '2') on conflict (id) update set > other_col=excluded.other_col returning excluded.id > returns a non-null value, then it was an update. I don't like that idea much, TBH. Consider this: postgres=# update upsert u set key = 1 from upsert i returning key; ERROR: 42702: column reference "key" is ambiguous LINE 1: update upsert u set key = 1 from upsert i returning key; ^ So, suppose this example was actually an ON CONFLICT UPDATE query. If I similarly make the aliases in the ON CONFLICT UPDATE ("target"/"excluded") visible in the returning list, it becomes necessary to qualify every column - an ambiguity is introduced by making both aliases visible, since any non-qualified column in the RETURNING clause could be from either the "target" or "excluded" alias/RTE. This is particularly annoying for the common, simple cases. Also, when there was an update in respect of a any given slot, how, in general, can I be sure that *any* visible excluded.* attribute is not null (which you suggest as a reliable proxy for the update path having been taken)? For one thing, the unique index that arbitrates whether or not we take the "alternative path" is not restricting to covering non-nullable attributes. So does the user end up specifying system/hidden atrributes, just to make what you outline work? That seems sort of messy. 
-- Peter Geoghegan
Re: [HACKERS] On partitioning
From: Amit Kapila [mailto:amit.kapil...@gmail.com] On Fri, Dec 5, 2014 at 12:27 PM, Amit Langote wrote: > From: Amit Kapila [mailto:amit.kapil...@gmail.com] > On Thu, Dec 4, 2014 at 10:46 AM, Amit Langote > wrote: > > > > > The more SQL way would be records (composite types). That would make > > > catalog inspection a LOT easier and presumably make it easier to change > > > the > > > partitioning key (I'm assuming ALTER TYPE cascades to stored data). > > > Records > > > are stored internally as tuples; not sure if that would be faster than a > > > List of > > > Consts or a pg_node_tree. Nodes would theoretically allow using things > > > other > > > than Consts, but I suspect that would be a bad idea. > > > > > > > While I couldn’t find an example in system catalogs where a > > record/composite type is used, there are instances of pg_node_tree at a > > number of places like in pg_attrdef and others. Could you please point me > > to such a usage for reference? > > > > > I think you can check the same by manually creating table > > with a user-defined type. > > > Create type typ as (f1 int, f2 text); > > Create table part_tab(c1 int, c2 typ); > > Is there such a custom-defined type used in some system catalog? Just not > sure how one would put together a custom type to use in a system catalog > given the way a system catalog is created. That's my concern but it may not > be valid. > > I think you are right. I think in this case we need something similar > to column pg_index.indexprs which is of type pg_node_tree(which > seems to be already suggested by Robert). So may be we can proceed > with this type and see if any one else has better idea. Yeah, with that, I was thinking we may be able to do something like dump a Node that describes the range partition bounds or list of allowed values (say, RangePartitionValues, ListPartitionValues). 
Thanks, Amit
Re: [HACKERS] On partitioning
On Fri, Dec 5, 2014 at 12:27 PM, Amit Langote wrote: > From: Amit Kapila [mailto:amit.kapil...@gmail.com] > On Thu, Dec 4, 2014 at 10:46 AM, Amit Langote < langote_amit...@lab.ntt.co.jp> wrote: > > > > > The more SQL way would be records (composite types). That would make > > > catalog inspection a LOT easier and presumably make it easier to change the > > > partitioning key (I'm assuming ALTER TYPE cascades to stored data). Records > > > are stored internally as tuples; not sure if that would be faster than a List of > > > Consts or a pg_node_tree. Nodes would theoretically allow using things other > > > than Consts, but I suspect that would be a bad idea. > > > > > > > While I couldn’t find an example in system catalogs where a record/composite type is used, there are instances of pg_node_tree at a number of places like in pg_attrdef and others. Could you please point me to such a usage for reference? > > > > > I think you can check the same by manually creating table > > with a user-defined type. > > > Create type typ as (f1 int, f2 text); > > Create table part_tab(c1 int, c2 typ); > > Is there such a custom-defined type used in some system catalog? Just not sure how one would put together a custom type to use in a system catalog given the way a system catalog is created. That's my concern but it may not be valid. > I think you are right. I think in this case we need something similar to column pg_index.indexprs which is of type pg_node_tree(which seems to be already suggested by Robert). So may be we can proceed with this type and see if any one else has better idea. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com