Re: [HACKERS] Multi-tenancy with RLS
On Fri, Oct 9, 2015 at 2:04 PM, Stephen Frost wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> We've got one reloption for views already - security_barrier. Maybe
>> we could have another one that effectively changes a particular view
>> from "security definer" as it is today to "security invoker".
>
> As I recall, there was a previous suggestion (honestly, I thought it was
> your idea) to have a reloption which made views "fully" security
> definer, in that functions in the view definition would run as the view
> owner instead of the view invoker.
>
> I liked that idea, though we would need to have a function to say "who
> is the 'outer' user?" (CURRENT_USER always being the owner with the
> above described reloption).
>
> I'm less sure about the idea of having a view which runs entirely as the
> view invoker, but I'm not against it either.

I changed the check_enable_rls function to use the invoker id instead of the owner id for all system objects. The catalog table policies are getting applied, and it has worked fine so far in my multi-tenancy testing. Currently I am writing tests to validate it against all user objects as well. If this change works for all user objects too, then we may not need the security invoker reloption.

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] RLS bug in expanding security quals
During the testing of the multi-tenancy feature for system catalog views, described in [1], I found a problem in executing the "user_privileges" view from information_schema. The following is the minimal test SQL that reproduces the problem.

SELECT (u_grantor.rolname) AS grantor,
       (grantee.rolname) AS grantee
FROM pg_authid u_grantor,
     (SELECT pg_authid.oid, pg_authid.rolname
      FROM pg_authid
      UNION ALL
      SELECT (0)::oid AS oid, 'PUBLIC'::name) grantee(oid, rolname)
WHERE (grantee.rolname = 'PUBLIC'::name)

From further analysis, I found that the same issue can also happen with user tables. The attached rls_failure.sql file has test steps to reproduce the issue.

The problem is that, while expanding security quals in the expand_security_quals function, the relations u_grantor and test_tbl need to be expanded, as they are the relations that have security quals. The following debug information from parse->rtable shows the details of each RangeTblEntry.

$69 = {type = T_Alias, aliasname = 0x19bd870 "u_grantor", colnames = 0x19bd890}
(gdb) p *((RangeTblEntry *)parse->rtable->head->next->next->next->data.ptr_value)->eref
$70 = {type = T_Alias, aliasname = 0x19bffc8 "grantee", colnames = 0x19bffe0}
(gdb) p *((RangeTblEntry *)parse->rtable->head->next->next->next->next->data.ptr_value)->eref
$71 = {type = T_Alias, aliasname = 0x19c3a60 "*SELECT* 1", colnames = 0x19c3a80}
(gdb) p *((RangeTblEntry *)parse->rtable->head->next->next->next->next->next->data.ptr_value)->eref
$72 = {type = T_Alias, aliasname = 0x19c40d8 "*SELECT* 2", colnames = 0x19c40f8}
(gdb) p *((RangeTblEntry *)parse->rtable->head->next->next->next->next->next->next->data.ptr_value)->eref
$73 = {type = T_Alias, aliasname = 0x19c4648 "test_tbl", colnames = 0x19c4668}

In the expand_security_qual function, the security_barrier_replace_vars function is called to prepare context.targetlist, but it doesn't generate a targetlist for the test_tbl RangeTblEntry.
For this reason, accessing that targetlist fails and throws an error.

If the policy is changed to the following, instead of the one specified in rls_failure.sql,

create policy test_tbl_policy on test_tbl for select using (true);

the query execution passes, because in the expand_security_quals function the rangeTableEntry_used function returns false for the test_tbl entry, so the security qual is never expanded.

Any ideas how to handle this problem?

Regards,
Hari Babu
Fujitsu Australia

[1] - http://www.postgresql.org/message-id/cajrrpgd2cf4hz_edpx+uqjv1ytkajs_wjdiwj7pzzuuqwou...@mail.gmail.com
Re: [HACKERS] RLS bug in expanding security quals
On Thu, Oct 8, 2015 at 2:54 PM, Stephen Frost <sfr...@snowman.net> wrote:
> Haribabu,
>
> * Haribabu Kommi (kommi.harib...@gmail.com) wrote:
>> During the testing of multi-tenancy feature from system catalog views, that
>> is described in [1], found a problem in executing "user_privileges" view
>> from information_schema. The following is the minimal test sql that
>> reproduces the problem.
>
> Interesting, thanks.
>
>> From further analysis, I found that the same issue can happen with user
>> tables also. Attached
>> rls_failure.sql file has test steps to reproduce the issue.
>
> Just to make sure we're on the same page, this results in this assertion
> being tripped:
>
> TRAP: FailedAssertion("!(var->varattno <= rel->max_attr)", File:
> "/home/sfrost/git/pg/dev/postgresql/src/backend/optimizer/path/costsize.c",
> Line: 4152)
>
> Due to var->varattno being 1 and rel->max_attr being 0.

Yes, the same assertion failure occurs with an assert-enabled build. Without an assert build, the query fails with the following error:

ERROR: invalid attnum -2 for rangetable entry test_tbl

>> Any ideas how to handle this problem?
>
> It's quite late here, but I'll take a look at this in more depth
> tomorrow.
>
> Based on what the Assert's testing, I took an educated guess and tried
> running without the UNION ALL, which appeared to work correctly.

Yes, it works fine without UNION ALL. Also, if we change the table column datatype from name to char, the pull_up_subqueries function doesn't pull up the UNION ALL because of the datatype mismatch, and the query works fine even with row level security enabled.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Multi-tenancy with RLS
On Tue, Oct 6, 2015 at 10:56 AM, Haribabu Kommi <kommi.harib...@gmail.com> wrote:
> Here I attached an updated version of the patch with the following changes.

I found some problems related to providing multi-tenancy on system catalog views. This is because a system catalog view uses the owner that created the view instead of the current user, by storing the user information in the "checkAsUser" field of the RangeTblEntry structure. The same "checkAsUser" is used in the check_enable_rls function before getting the policies for the table. All the system catalog views are created by the superuser, so no row level security policies are applied to the views.

Ex:

SET SESSION ROLE tenancy_user1;

select relname from pg_class where relname = 'tenancy_user2_tbl1';
 relname
---------
(0 rows)

select schemaname, relname from pg_stat_all_tables where relname = 'tenancy_user2_tbl1';
 schemaname |      relname
------------+--------------------
 public     | tenancy_user2_tbl1
(1 row)

The policy created on the pg_class system catalog table is: show all the objects on which the current user has permissions, where the permissions can be anything.

To fix the problem, I thought of using the current session id instead of "checkAsUser" while applying row level security policies to system catalog objects. This doesn't affect normal objects. But this solution caused some problems in foreign_data.sql while running the regression tests, as the planner generates a NULL targetlist.

Is the above solution the correct approach to handle this problem? If it is, I will check the foreign_data.sql problem; otherwise, is there any better approach to handle the same?

Regards,
Hari Babu
Fujitsu Australia
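The checkAsUser behaviour described above can be seen with plain RLS on a user table in an unpatched server. This is only an illustrative sketch; the role, table, and policy names are made up:

```sql
CREATE TABLE docs (owner name, body text);
ALTER TABLE docs ENABLE ROW LEVEL SECURITY;

-- Each role sees only its own rows when querying the table directly.
CREATE POLICY docs_owner ON docs USING (owner = current_user);

-- A view runs with the permissions of its owner (checkAsUser), so a
-- view created by a superuser bypasses the policy entirely -- the same
-- behaviour the system catalog views exhibit today.
CREATE VIEW all_docs AS SELECT * FROM docs;
GRANT SELECT ON all_docs TO PUBLIC;
```

Querying docs directly as a non-superuser returns only that role's rows, while querying all_docs (superuser-owned) returns everything.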
Re: [HACKERS] Multi-tenancy with RLS
On Tue, Oct 6, 2015 at 10:29 PM, Stephen Frost <sfr...@snowman.net> wrote:
> * Haribabu Kommi (kommi.harib...@gmail.com) wrote:
>> On Tue, Oct 6, 2015 at 10:56 AM, Haribabu Kommi
>> <kommi.harib...@gmail.com> wrote:
>> > Here I attached an updated version of the patch with the following changes.
>>
>> I found some problems related to providing multi-tenancy on a system
>> catalog view.
>> This is because, system catalog view uses the owner that is created
>> the user instead
>> of the current user by storing the user information in "checkAsUser"
>> field in RangeTblEntry
>> structure.
>
> Right, when querying through a view to tables underneath, we use the
> permissions of the view owner. View creators should be generally aware
> of this already.
>
> I agree that it adds complications to the multi-tenancy idea since the
> system views, today, allow viewing of all objects. There are two ways
> to address that:
>
> Modify the system catalog views to include the same constraints that the
> policies on the tables do
>
> or
>
> Allow RLS policies against views and then create the necessary policies
> on the views in the catalog.
>
> My inclination is to work towards the latter as that's a capability we'd
> like to have anyway.

Thanks for the solutions to handle the problem.

Currently I am thinking of providing two multi-tenancy options to the user:

1. Tenancy at the shared system catalog table level
2. Tenancy at the database system catalog table level

Users can create views on system catalog tables, and I want to provide tenancy on those views as well. I will do further analysis, provide details of which solution gives the benefit of both tenancy levels, and then proceed to implementation after discussion.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Multi-tenancy with RLS
On Fri, Sep 11, 2015 at 7:50 AM, Joe Conway <m...@joeconway.com> wrote:
> On 09/01/2015 11:25 PM, Haribabu Kommi wrote:
>> If any user is granted any permissions on that object then that user
>> can view it's meta data of that object from the catalog tables.
>> To check the permissions of the user on the object, instead of
>> checking each and every available option, I just added a new
>> privilege check option called "any". If user have any permissions on
>> the object, the corresponding permission check function returns
>> true. Patch attached for the same.
>>
>> Any thoughts/comments?
>
> Thanks for working on this! Overall I like the concept and know of use
> cases where it is critical and should be supported. Some comments:

Here I attached an updated version of the patch with the following changes. There are now two ways for the user to enable catalog security on system catalog tables:

./initdb -C -D data

With the above option during initdb, catalog security is enabled on all shared system catalog tables. This way the user can achieve catalog security at the database level, which may be enough for some users. Currently, enabling catalog security this way is supported only at initdb time.

ALTER DATABASE WITH CATALOG SECURITY=true;
ALTER DATABASE WITH CATALOG SECURITY=false;

With the above commands, the user can enable/disable catalog security on a database's system catalog tables when multi-tenancy is required at the table level. Setting catalog security in the CREATE DATABASE command is not yet supported, and neither is using ALTER DATABASE to enable/disable it on a database the backend is not connected to. This is because of a restriction on executing the policy commands without being connected to that database.

Pending things that need to be taken care of:

1.
select * from tenancy_user1_tbl1;
ERROR: permission denied for relation tenancy_user1_tbl1

The multi-tenancy policies prevent the above user table from being seen in any catalog relation, but a direct select on the table produces a "permission denied" error. I feel that when multi-tenancy is enabled, the error message should instead be "relation doesn't exist".

2. Correct all catalog relation policies
3. Add regression tests for all system catalog relations and views.
4. Documentation changes

Any comments?

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] optimizing vacuum truncation scans
On Tue, Aug 4, 2015 at 2:18 AM, Jeff Janes wrote:
> On Mon, Jul 27, 2015 at 1:40 PM, Simon Riggs wrote:
>>
>> On 22 July 2015 at 17:11, Jeff Janes wrote:
>>>
>>> On Wed, Jul 22, 2015 at 6:59 AM, Robert Haas wrote:
>>>>
>>>> On Mon, Jun 29, 2015 at 1:54 AM, Jeff Janes wrote:
>>>>> Attached is a patch that implements the vm scan for truncation. It
>>>>> introduces a variable to hold the last blkno which was skipped during the
>>>>> forward portion. Any blocks after both this blkno and after the last
>>>>> inspected nonempty page (which the code is already tracking) must have been
>>>>> observed to be empty by the current vacuum. Any other process rendering the
>>>>> page nonempty are required to clear the vm bit, and no other process can set
>>>>> the bit again during the vacuum's lifetime. So if the bit is still set, the
>>>>> page is still empty without needing to inspect it.
>>>>
>>>> Urgh. So if we do this, that forever precludes having HOT pruning set
>>>> the all-visible bit.
>>>
>>> I wouldn't say forever, as it would be easy to revert the change if
>>> something more important came along that conflicted with it.
>>
>> I think what is being said here is that someone is already using this
>> technique, or if not, then we actively want to encourage them to do so as an
>> extension or as a submission to core.
>>
>> In that case, I think the rely-on-VM technique sinks again, sorry Jim,
>> Jeff. Probably needs code comments added.
>
> Sure, that sounds like the consensus. The VM method was very efficient, but
> I agree it is pretty fragile and restricting.
>
>> That does still leave the prefetch technique, so all is not lost.
>>
>> Can we see a patch with just prefetch, probably with a simple choice of
>> stride? Thanks.
>
> I probably won't get back to it this commit fest, so it can be set to
> returned with feedback. But if anyone has good ideas for how to set the
> stride (or detect that it is on SSD and so is pointless to try) I'd love to
> hear about them anytime.
I found the following way to determine whether a table's data file resides on a spinning disk or an SSD.

1. Get the device that the table's data file is mapped to, using the following or similar:

df -P "filename" | awk 'NR==2{print $1}'

2. If the device is of the form /dev/sd*, treat it as a disk system and proceed with the prefetch optimization.

3. If we are not able to find the device details directly, we need to get the information from the device-mapper. Usually such devices are mapped like the following: /dev/mapper/v** points to ../dm-*

4. Get the "dm-*" name from the above details and check whether it is an SSD or not from the following file:

/sys/block/dm-*/queue/rotational

5. If the value is 0 then it is an SSD drive; 1 means a rotating disk drive.

The procedure described above works only on Linux; I haven't checked other operating systems yet. Is it worth considering?

Regards,
Hari Babu
Fujitsu Australia
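A minimal shell sketch of steps 1, 4, and 5 above (Linux only; the sysfs layout and device-name handling are assumptions and can vary across distributions):

```shell
#!/bin/sh

# Step 1: resolve the filesystem device backing a file,
# e.g. /dev/sda1 or /dev/dm-0.
device_for_file() {
    df -P "$1" | awk 'NR==2 {print $1}'
}

# Steps 4 and 5: read the rotational flag for a block device name such
# as "sda" or "dm-0".  Prints 1 for a spinning disk, 0 for an SSD.
# The sysfs root defaults to /sys but is a parameter so the function
# can be exercised against a fake tree.
is_rotational() {
    cat "${2:-/sys}/block/$1/queue/rotational"
}
```

For a device-mapper path such as /dev/mapper/vg-lv (step 3), the underlying dm-* name can be obtained with `basename "$(readlink -f /dev/mapper/vg-lv)"` before consulting sysfs.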
Re: [HACKERS] Parallel Seq Scan
On Fri, Sep 18, 2015 at 9:45 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Fri, Sep 18, 2015 at 1:33 PM, Haribabu Kommi <kommi.harib...@gmail.com>
> wrote:
>>
>> On Thu, Sep 3, 2015 at 8:21 PM, Amit Kapila <amit.kapil...@gmail.com>
>> wrote:
>> >
>> > Attached, find the rebased version of patch.
>>
>> Here are the performance test results:
>>
>> Query         selectivity   HashAgg +       HashAgg + parallel seq scan (ms)
>>               (million)     seqscan (ms)    2 workers   4 workers   8 workers
>> $1 <= '001'   0.1           16717.00         7086.00     4459.00     2912.00
>> $1 <= '004'   0.4           17962.00         7410.00     4651.00     2977.00
>> $1 <= '008'   0.8           18870.00         7849.00     4868.00     3092.00
>> $1 <= '016'   1.5           21368.00         8645.00     6800.00     3486.00
>> $1 <= '030'   2.7           24622.00        14796.00    13108.00     9981.00
>> $1 <= '060'   5.4           31690.00        29839.00    26544.00    23814.00
>> $1 <= '080'   7.2           37147.00        40485.00    35763.00    32679.00
>
> I think here probably when the selectivity is more than 5, then it should
> not have selected Funnel plan. Have you by any chance changed
> cpu_tuple_comm_cost? If not, then you can try by setting value of
> parallel_setup_cost (may be 10) and then see if it selects the Funnel
> Plan. Is it possible for you to check the cost difference of Sequence
> and Funnel plan, hopefully explain or explain analyze should be sufficient?

Yes, I changed cpu_tuple_comm_cost to zero to observe how parallel seq scan performs at high selectivity; I forgot to mention that in the earlier mail. Overall the parallel seq scan performance is good.

>> And also attached perf results for selectivity of 0.1 million and 5.4
>> million cases for analysis.
>
> I have checked perf reports and it seems that when selectivity is more, it
> seems to be spending time in some kernel calls which could be due
> communication of tuples.

Yes. Also, at low selectivity, the usage of the tas and s_lock functions increases as the number of workers increases. Maybe these are also among the reasons for the scaling problem.
Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Parallel Seq Scan
On Thu, Sep 17, 2015 at 6:10 AM, Robert Haas wrote:
> On Thu, Sep 10, 2015 at 12:12 AM, Amit Kapila wrote:
>>> 2. I think it's probably a good idea - at least for now, and maybe
>>> forever - to avoid nesting parallel plans inside of other parallel
>>> plans. It's hard to imagine that being a win in a case like this, and
>>> it certainly adds a lot more cases to think about.
>>
>> I also think that avoiding nested parallel plans is a good step forward.
>
> Doing that as a part of the assess parallel safety patch was trivial, so I
> did.

I tried with the latest HEAD code; the problem seems to be present in other scenarios.

postgres=# explain select * from tbl a where exists (select 1 from tbl b where a.f1=b.f1 limit 0);
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Funnel on tbl a  (cost=0.00..397728310227.27 rows=500 width=214)
   Filter: (SubPlan 1)
   Number of Workers: 10
   ->  Partial Seq Scan on tbl a  (cost=0.00..397727310227.27 rows=500 width=214)
         Filter: (SubPlan 1)
         SubPlan 1
           ->  Limit  (cost=0.00..437500.00 rows=1 width=0)
                 ->  Seq Scan on tbl b  (cost=0.00..437500.00 rows=1 width=0)
                       Filter: (a.f1 = f1)
   SubPlan 1
     ->  Limit  (cost=0.00..437500.00 rows=1 width=0)
           ->  Seq Scan on tbl b  (cost=0.00..437500.00 rows=1 width=0)
                 Filter: (a.f1 = f1)
(13 rows)

postgres=# explain analyze select * from tbl a where exists (select 1 from tbl b where a.f1=b.f1 limit 0);
ERROR: badly formatted node string "SUBPLAN :subLinkType 0 :testexpr"...
LOG: worker process: parallel worker for PID 8775 (PID 9121) exited with exit code 1
ERROR: badly formatted node string "SUBPLAN :subLinkType 0 :testexpr"...
ERROR: badly formatted node string "SUBPLAN :subLinkType 0 :testexpr"...
LOG: worker process: parallel worker for PID 8775 (PID 9116) exited with exit code 1
LOG: worker process: parallel worker for PID 8775 (PID 9119) exited with exit code 1
ERROR: badly formatted node string "SUBPLAN :subLinkType 0 :testexpr"...
ERROR: badly formatted node string "SUBPLAN :subLinkType 0 :testexpr"...
LOG: worker process: parallel worker for PID 8775 (PID 9117) exited with exit code 1
LOG: worker process: parallel worker for PID 8775 (PID 9114) exited with exit code 1
ERROR: badly formatted node string "SUBPLAN :subLinkType 0 :testexpr"...
ERROR: badly formatted node string "SUBPLAN :subLinkType 0 :testexpr"...
LOG: worker process: parallel worker for PID 8775 (PID 9118) exited with exit code 1
ERROR: badly formatted node string "SUBPLAN :subLinkType 0 :testexpr"...
ERROR: badly formatted node string "SUBPLAN :subLinkType 0 :testexpr"...
CONTEXT: parallel worker, pid 9115
STATEMENT: explain analyze select * from tbl a where exists (select 1 from tbl b where a.f1=b.f1 limit 0);
LOG: worker process: parallel worker for PID 8775 (PID 9115) exited with exit code 1
LOG: worker process: parallel worker for PID 8775 (PID 9120) exited with exit code 1
ERROR: badly formatted node string "SUBPLAN :subLinkType 0 :testexpr"...
CONTEXT: parallel worker, pid 9115

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Parallel Seq Scan
On Thu, Sep 17, 2015 at 12:03 PM, Amit Kapila wrote:
> As mentioned previously [1], we have to do two different things to make
> this work, Robert seems to have taken care of one of those (basically
> second point in mail[1]) and still another one needs to be taken care
> which is to provide support of reading subplans in readfuncs.c and that
> will solve the problem you are seeing now.

Thanks for the information. During my test, I saw a plan change from parallel seq scan to seq scan for the first reported query, so I thought that all scenarios had already been corrected so as not to generate a parallel seq scan.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Parallel Seq Scan
On Thu, Sep 10, 2015 at 2:12 PM, Amit Kapila <amit.kapil...@gmail.com> wrote: > On Thu, Sep 10, 2015 at 4:16 AM, Robert Haas <robertmh...@gmail.com> wrote: >> >> On Wed, Sep 9, 2015 at 11:07 AM, Amit Kapila <amit.kapil...@gmail.com> >> wrote: >> > On Wed, Sep 9, 2015 at 11:47 AM, Haribabu Kommi >> > <kommi.harib...@gmail.com> >> > wrote: >> >> With subquery, parallel scan is having some problem, please refer >> >> below. >> >> >> >> postgres=# explain analyze select * from test01 where kinkocord not in >> >> (select kinkocord from test02 where tenpocord = '001'); >> >> ERROR: badly formatted node string "SUBPLAN :subLinkType 2 >> >> :testexpr"... >> >> CONTEXT: parallel worker, pid 32879 >> >> postgres=# >> > >> > The problem here is that readfuncs.c doesn't have support for reading >> > SubPlan nodes. I have added support for some of the nodes, but it seems >> > SubPlan node also needs to be added. Now I think this is okay if the >> > SubPlan >> > is any node other than Funnel, but if Subplan contains Funnel, then each >> > worker needs to spawn other workers to execute the Subplan which I am >> > not sure is the best way. Another possibility could be store the >> > results of >> > Subplan in some tuplestore or some other way and then pass those to >> > workers >> > which again doesn't sound to be promising way considering we might have >> > hashed SubPlan for which we need to build a hashtable. Yet another way >> > could be for such cases execute the Filter in master node only. >> >> IIUC, there are two separate issues here: >> > > Yes. > >> 1. We need to have readfuncs support for all the right plan nodes. >> Maybe we should just bite the bullet and add readfuncs support for all >> plan nodes. But if not, we can add support for whatever we need. >> >> 2. I think it's probably a good idea - at least for now, and maybe >> forever - to avoid nesting parallel plans inside of other parallel >> plans. 
It's hard to imagine that being a win in a case like this, and
>> it certainly adds a lot more cases to think about.
>
> I also think that avoiding nested parallel plans is a good step forward.

I reviewed the parallel_seqscan_funnel_v17.patch, and the following are my comments. I will continue my review with the parallel_seqscan_partialseqscan_v17.patch.

+ if (inst_options)
+ {
+     instoptions = shm_toc_lookup(toc, PARALLEL_KEY_INST_OPTIONS);
+     *inst_options = *instoptions;
+     if (inst_options)

This checks the same pointer variable again; it should be if (*inst_options), as in the estimate and store functions.

+ if (funnelstate->ss.ps.ps_ProjInfo)
+     slot = funnelstate->ss.ps.ps_ProjInfo->pi_slot;
+ else
+     slot = funnelstate->ss.ss_ScanTupleSlot;

Currently there will not be a projinfo for the Funnel node, so it always uses the scan tuple slot. In case it is ever different, we need to add an ExecProject call in the ExecFunnel function. Currently it is not present; either we can document that or add the function call.

+ if (!((*dest->receiveSlot) (slot, dest)))
+     break;

and

+void
+TupleQueueFunnelShutdown(TupleQueueFunnel *funnel)
+{
+     if (funnel)
+     {
+         int i;
+         shm_mq_handle *mqh;
+         shm_mq *mq;
+
+         for (i = 0; i < funnel->nqueues; i++)
+         {
+             mqh = funnel->queue[i];
+             mq = shm_mq_get_queue(mqh);
+             shm_mq_detach(mq);
+         }
+     }
+}

Using this function, the backend detaches from the message queues, so that workers trying to put results into the queues get an SHM_MQ_DETACHED result and then finish executing the plan. For this reason all the printtup return types were changed from void to bool. But this way a worker doesn't exit until it tries to put a tuple in the queue; if there are no valid tuples that satisfy the condition, it may take time for the workers to exit. Am I correct? I am not sure how frequently such scenarios can occur.
+ if (parallel_seqscan_degree >= MaxConnections)
+ {
+     write_stderr("%s: parallel_scan_degree must be less than max_connections\n", progname);
+     ExitPostmaster(1);
+ }

This error condition works only during server start. A user can still set the parallel seqscan degree to more than max_connections at the superuser session level, etc.

+ if (!parallelstmt->inst_options)
+     (*receiver->rDestroy) (receiver);

Why is the receiver destroyed only when there is no instrumentation?

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Multi-tenancy with RLS
On Fri, Sep 11, 2015 at 7:50 AM, Joe Conway <m...@joeconway.com> wrote:
> On 09/01/2015 11:25 PM, Haribabu Kommi wrote:
>> If any user is granted any permissions on that object then that user
>> can view it's meta data of that object from the catalog tables.
>> To check the permissions of the user on the object, instead of
>> checking each and every available option, I just added a new
>> privilege check option called "any". If user have any permissions on
>> the object, the corresponding permission check function returns
>> true. Patch attached for the same.
>>
>> Any thoughts/comments?
>
> Thanks for working on this! Overall I like the concept and know of use
> cases where it is critical and should be supported. Some comments:

Thanks for the review; I will take care of the comments in the next patch. I didn't find any better approach than creating the policies automatically or granting the superuser permission on the system catalog tables. If everyone feels this is the best approach, then I will create policies for all catalog tables in the next version.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Parallel Seq Scan
On Thu, Sep 3, 2015 at 8:21 PM, Amit Kapila wrote:
> On Thu, Jul 23, 2015 at 7:43 PM, Kouhei Kaigai wrote:
>>
>> Hi Amit,
>>
>> The latest v16 patch cannot be applied to the latest
>> master as is.
>> 434873806a9b1c0edd53c2a9df7c93a8ba021147 changed various
>> lines in heapam.c, so it probably conflicts with this.
>
> Attached, find the rebased version of patch. It fixes the comments raised
> by Jeff Davis and Antonin Houska. The main changes in this version are
> now it supports sync scan along with parallel sequential scan (refer
> heapam.c)
> and the patch has been split into two parts, first contains the code for
> Funnel node and infrastructure to support the same and second contains
> the code for PartialSeqScan node and its infrastructure.

Thanks for the updated patch.

With a subquery, the parallel scan has a problem; please see below.

postgres=# explain select * from test01 where kinkocord not in (select kinkocord from test02 where tenpocord = '001');
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Funnel on test01  (cost=0.00..155114352184.12 rows=2008 width=435)
   Filter: (NOT (SubPlan 1))
   Number of Workers: 16
   ->  Partial Seq Scan on test01  (cost=0.00..155114352184.12 rows=2008 width=435)
         Filter: (NOT (SubPlan 1))
         SubPlan 1
           ->  Materialize  (cost=0.00..130883.67 rows=385333 width=5)
                 ->  Funnel on test02  (cost=0.00..127451.01 rows=385333 width=5)
                       Filter: (tenpocord = '001'::bpchar)
                       Number of Workers: 16
                       ->  Partial Seq Scan on test02  (cost=0.00..127451.01 rows=385333 width=5)
                             Filter: (tenpocord = '001'::bpchar)
   SubPlan 1
     ->  Materialize  (cost=0.00..130883.67 rows=385333 width=5)
           ->  Funnel on test02  (cost=0.00..127451.01 rows=385333 width=5)
                 Filter: (tenpocord = '001'::bpchar)
                 Number of Workers: 16
                 ->  Partial Seq Scan on test02  (cost=0.00..127451.01 rows=385333 width=5)
                       Filter: (tenpocord = '001'::bpchar)
(19 rows)

postgres=# explain analyze select * from test01 where kinkocord not in (select kinkocord from test02 where tenpocord = '001');
ERROR: badly formatted node string "SUBPLAN :subLinkType 2 :testexpr"...
CONTEXT: parallel worker, pid 32879
postgres=#

Also, the 16 workers shown in the explain analyze plan were not actually allotted, because in my configuration I set max_worker_processes to only 8. I feel the plan should show the allotted workers, not the planned workers. If the query execution is slow because of a lack of workers while the plan still shows 16 workers, the user may conclude that the query is slow even with 16 workers, which is not actually the case.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries
On Mon, Sep 7, 2015 at 4:34 AM, Pavel Stehule wrote:
> Hi
>
>> postgres=# select pg_hba_lookup('postgres','all');
>>                     pg_hba_lookup
>> ---------------------------------------------------
>>  (84,local,"[""all""]","[""all""]",,,trust,{})
>>  (86,host,"[""all""]","[""all""]",127.0.0.1,,trust,{})
>>  (88,host,"[""all""]","[""all""]",::1,,trust,{})
>>
>> Here I attached a proof of concept patch for the same.
>>
>> Any suggestions/comments on this proposed approach?
>
> If I understand well to your proposal, the major benefit is in impossibility
> to enter pg_hba keywords - so you don't need to analyse if parameter is
> keyword or not? It has sense, although It can be hard to do image about
> pg_hba conf from these partial views.

From the function output, it is a little difficult to map back to the pg_hba.conf file. Because of problems in processing keywords in the where clause of a view, I changed from a view to a function. Is there any possibility, with a rule or something similar, that the where-clause details can be passed as function arguments to get the data?

> There can be other way - theoretically we can have a function pg_hba_debug
> with similar API like pg_hba_conf. The result will be a content of
> pg_hba.conf with information about result of any rule.

The output of the pg_hba_debug function would look like the entries of pg_hba.conf plus the match result for the given input data.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Multi-tenancy with RLS
On Fri, Aug 14, 2015 at 12:00 PM, Haribabu Kommi <kommi.harib...@gmail.com> wrote:
>
> Here I attached the proof concept patch.

Here I attached an updated patch that adds policies to most of the system catalog tables, except the following:

AggregateRelationId
AccessMethodRelationId
AccessMethodOperatorRelationId
AccessMethodProcedureRelationId
AuthMemRelationId
CastRelationId
EnumRelationId
EventTriggerRelationId
ExtensionRelationId
LargeObjectRelationId
LargeObjectMetadataRelationId
PLTemplateRelationId
RangeRelationId
RewriteRelationId
TransformRelationId
TSConfigRelationId
TSConfigMapRelationId
TSDictionaryRelationId
TSParserRelationId
TSTemplateRelationId

The following catalog tables need their policies to be created based on the referenced object's class, so currently no policy is added for them:

SecLabelRelationId
SharedDependRelationId
SharedDescriptionRelationId
SharedSecLabelRelationId

If a user is granted any permission on an object, then that user can view the metadata of that object in the catalog tables. To check the permissions of the user on the object, instead of checking each and every available option, I added a new privilege check option called "any". If the user has any permission on the object, the corresponding permission check function returns true. A patch is attached for the same.

Any thoughts/comments?

Regards,
Hari Babu
Fujitsu Australia

multi-tenancy_with_rls_poc_2.patch
Description: Binary data

any_privilege_check_option.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
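[Editorial sketch of the proposed "any" option. The patch above only describes an internal privilege check option; whether it is exposed through the existing privilege-inquiry functions is an assumption, so the call below is hypothetical.]

```sql
-- Hypothetical usage, assuming the "any" keyword were accepted by the
-- existing inquiry functions (not the case in stock PostgreSQL):
-- true if the role holds at least one privilege on the table, without
-- having to test SELECT, INSERT, UPDATE, ... individually.
SELECT has_table_privilege('test_user', 'some_table', 'any');
```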
[HACKERS] Multi-tenancy with RLS
This is regarding support for multi-tenancy in a single PostgreSQL instance using the row level security feature. The main idea is to have row level security enabled on the system catalog tables, so that a user can see only the rows that are either system objects or user objects owned by that user.

Example:

postgres=# create role test login;
postgres=# create role test1 login;
postgres=# \c postgres test
postgres=> create table test(f1 int);
postgres=> \d
        List of relations
 Schema | Name | Type  | Owner
--------+------+-------+-------
 public | test | table | test
(1 row)

postgres=> \c postgres test1
postgres=> create table test1(f1 int);
postgres=> \d
        List of relations
 Schema | Name  | Type  | Owner
--------+-------+-------+-------
 public | test1 | table | test1
(1 row)

postgres=# \c postgres test
postgres=> \d
        List of relations
 Schema | Name | Type  | Owner
--------+------+-------+-------
 public | test | table | test
(1 row)

To enable row level security on the system catalog tables, I currently added a new database option to CREATE/ALTER DATABASE. The syntax can be changed later. Adding an option at the database level makes it easier for users to enable/disable row level security on the system catalog tables.

CREATE DATABASE USERDB WITH ROW LEVEL SECURITY = TRUE;
ALTER DATABASE USERDB WITH ROW LEVEL SECURITY = FALSE;

A new boolean column datrowlevelsecurity is added to the pg_database system catalog table to display the status of row level security on that database.

Currently, as a proof of concept, I implemented row level security only for the pg_class system table. Whenever row level security on the database is enabled/disabled, it internally fires the create policy/drop policy commands using SPI interfaces. Here I attached the proof of concept patch.

Pending items:
1. Supporting RLS on all system catalog tables.
2. Instead of SPI interfaces, any better way to create/remove policies.

Any comments/suggestions regarding the way to achieve multi-tenancy in PostgreSQL?
Regards,
Hari Babu
Fujitsu Australia

multi-tenancy_with_rls_poc.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
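[Editorial sketch of the kind of policy the patch would create internally on pg_class. This is illustrative only: the policy name and the exact predicate are assumptions, and stock PostgreSQL refuses DDL on system catalogs unless allow_system_table_mods is set.]

```sql
-- Rows of pg_class stay visible if they are system objects
-- (OID below FirstNormalObjectId, i.e. 16384) or owned by the
-- current user -- roughly the behavior described above.
ALTER TABLE pg_catalog.pg_class ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON pg_catalog.pg_class
    FOR SELECT
    USING (oid < 16384                               -- system objects
           OR relowner = (SELECT oid FROM pg_roles
                          WHERE rolname = current_user));
```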
Re: [HACKERS] Priority table or Cache table
On Mon, Aug 10, 2015 at 3:09 PM, Amit Kapila amit.kapil...@gmail.com wrote:
On Thu, Aug 6, 2015 at 12:24 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:

What is the configuration for test (RAM of m/c, shared_buffers, scale_factor, etc.)?

Here are the details:

CPU - 16 core, RAM - 252 GB
shared_buffers - 1700MB, buffer_cache_ratio - 70
wal_buffers - 16MB, synchronous_commit - off
checkpoint_timeout - 15min, max_wal_size - 5GB
pgbench scale factor - 75 (1GB)
Load test table size - 1GB

Threads   Head    Patched   Diff
1         3123    3238      3.68%
2         5997    6261      4.40%
4         11102   11407     2.75%

I suspect that this may be because of buffer locks that are causing the problem, whereas in the older approach of different buffer pools, each buffer pool has its own locks. I will try to collect the profile output and analyze the same.

Any better ideas?

I think you should try to find out during the test, for how many pages it needs to perform the clocksweep (add some new counter like numBufferBackendClocksweep in BufferStrategyControl to find out the same). By theory your patch should reduce the number of times it needs to perform the clock sweep. I think in this approach even if you make some buffers non-replaceable (buffers for which BM_BUFFER_CACHE_PAGE is set), the clock sweep still needs to access all the buffers. I think we might want to find some way to reduce that if this idea helps.

Another thing is that this idea looks somewhat similar (although not the same) to the current Ring Buffer concept, where buffers for particular types of scan use buffers from a ring. I think it is okay to prototype as you have done in the patch, and we can consider doing something on those lines if at all this patch's idea helps.

Thanks for the details. I will try the same.

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Priority table or Cache table
On Tue, Aug 11, 2015 at 4:43 PM, Amit Kapila amit.kapil...@gmail.com wrote:
On Tue, Aug 11, 2015 at 11:31 AM, Haribabu Kommi kommi.harib...@gmail.com wrote:
On Mon, Aug 10, 2015 at 3:09 PM, Amit Kapila amit.kapil...@gmail.com wrote:
On Thu, Aug 6, 2015 at 12:24 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:

What is the configuration for test (RAM of m/c, shared_buffers, scale_factor, etc.)?

Here are the details:

CPU - 16 core, RAM - 252 GB
shared_buffers - 1700MB, buffer_cache_ratio - 70
wal_buffers - 16MB, synchronous_commit - off
checkpoint_timeout - 15min, max_wal_size - 5GB
pgbench scale factor - 75 (1GB)
Load test table size - 1GB

It seems that the test table can fit easily in shared buffers. I am not sure this patch will be of benefit for such cases; why do you think it can be beneficial for such cases?

Yes, this configuration combination may not be the best for the test. The idea behind these settings is to provide enough shared buffers to the cache table by tuning buffer_cache_ratio from 0 to 70% of shared buffers, so that the cache tables have enough shared buffers and the rest of the shared buffers can be used for normal tables, i.e. the load test table.

I will try to evaluate some more performance tests with different shared buffers settings and load.

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Priority table or Cache table
On Mon, Jun 30, 2014 at 11:08 PM, Beena Emerson memissemer...@gmail.com wrote:

I also ran the test script after making the same configuration changes that you have specified. I found that I was not able to get the same performance difference that you have reported.

Following table lists the tps in each scenario and the % increase in performance.

Threads   Head   Patched   Diff
1         1669   1718      3%
2         2844   3195      12%
4         3909   4915      26%
8         7332   8329      14%

Coming back to this old thread, I just tried a new approach for this priority table. Instead of an entirely separate buffer pool, it tries to use some portion of shared buffers for priority tables, using a GUC variable buffer_cache_ratio (0-75) to specify what percentage of shared buffers to be used.

Syntax:

create table tbl(f1 int) with(buffer_cache=true);

Compared to the earlier approach, I thought this approach would be easier to implement. But during the performance run, it didn't show much improvement in performance. Here are the test results.

Threads   Head    Patched   Diff
1         3123    3238      3.68%
2         5997    6261      4.40%
4         11102   11407     2.75%

I suspect that this may be because of buffer locks that are causing the problem, whereas in the older approach of different buffer pools, each buffer pool has its own locks. I will try to collect the profile output and analyze the same.

Any better ideas?

Here I attached a proof of concept patch.

Regards,
Hari Babu
Fujitsu Australia

cache_table_poc.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
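[Editorial sketch of how the POC above would be used. Both the buffer_cache reloption and the buffer_cache_ratio GUC come from the patch and do not exist in stock PostgreSQL.]

```sql
-- Reserve a portion of shared buffers for cache tables (patch GUC;
-- value is a percentage of shared_buffers, 0-75).
SET buffer_cache_ratio = 70;

-- Tables marked with the patch's reloption keep their pages in the
-- reserved portion; ordinary tables compete for the remaining buffers.
CREATE TABLE hot_lookup (id int PRIMARY KEY, payload text)
    WITH (buffer_cache = true);
```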
Re: [HACKERS] Caching offsets of varlena attributes
On Thu, Aug 6, 2015 at 4:09 AM, Vignesh Raghunathan vignesh.pg...@gmail.com wrote:

Hello,

In the function heap_deform_tuple, there is a comment before caching varlena attributes specifying that the offset will be valid for either an aligned or unaligned value if there are no padding bytes. Could someone please elaborate on this?

In PostgreSQL, a tuple can contain two types of varlena headers: short varlena (doesn't need any alignment) and 4-byte varlena (needs alignment). Refer to the function heap_fill_tuple to see how the tuple is constructed. For short varlena headers, even if the type's alignment suggests aligning, we must not do so. For this reason, instead of att_align_nominal, att_align_pointer is called. The following is the comment from the att_align_pointer macro, which gives the details of why we should use this macro instead of att_align_nominal:

 * (A zero byte must be either a pad byte, or the first byte of a correctly
 * aligned 4-byte length word; in either case we can align safely.  A non-zero
 * byte must be either a 1-byte length word, or the first byte of a correctly
 * aligned 4-byte length word; in either case we need not align.)

Also, why is it safe to call att_align_nominal if the attribute is not varlena? Couldn't att_align_pointer be called for both cases? I am not able to understand how att_align_nominal is faster.

All other types definitely need char, int or double alignment, so it is safe to use the att_align_nominal macro. Please refer to the function heap_fill_tuple for more details.

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
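[Editorial note: the two header kinds discussed above can be observed from SQL. A value of up to 126 data bytes gets a 1-byte (short, unaligned) varlena header when the tuple is formed by heap_fill_tuple; longer values get a 4-byte (aligned) header. The small experiment below assumes a stock server and values too small for compression to kick in.]

```sql
-- Compare declared data length with on-disk size: the difference is
-- the 1-byte vs 4-byte varlena header.
CREATE TEMP TABLE varlena_demo (v text);
INSERT INTO varlena_demo VALUES ('abc'), (repeat('x', 200));

SELECT octet_length(v)  AS data_bytes,
       pg_column_size(v) AS stored_bytes   -- data + header
FROM varlena_demo;
```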
Re: [HACKERS] Sharing aggregate states between different aggregate functions
On Thu, Jul 9, 2015 at 7:44 PM, David Rowley david.row...@2ndquadrant.com wrote:
On 15 June 2015 at 12:05, David Rowley david.row...@2ndquadrant.com wrote:

This basically allows an aggregate's state to be shared between other aggregate functions when both aggregates' transition functions (and a few other things) match. There's quite a number of aggregates in our standard set which will benefit from this optimisation.

After compiling the original patch with another compiler, I noticed a couple of warnings. The attached fixes these.

I did some performance tests on the patch. The patch showed a good improvement for same-column aggregates. With int or bigint datatype columns, the patch doesn't show any visible performance difference, but with the numeric datatype it shows a good improvement.

select sum(x), avg(y) from test where x < $1;

Different columns:

selectivity (millions)   Head   Patch
0.1                      315    322
0.3                      367    376
0.5                      419    427
1                        551    558
2                        824    826

select sum(x), avg(x) from test where x < $1;

Same column:

selectivity (millions)   Head   Patch
0.1                      314    314
0.3                      363    343
0.5                      412    373
1                        536    440
2                        795    586

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
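[Editorial sketch of the query shape the patch optimises, following the tests above. Table and data are illustrative; the sharing itself is internal, so both queries are valid on any server — only the first benefits from the patch.]

```sql
CREATE TEMP TABLE test (x numeric, y numeric);
INSERT INTO test SELECT g, g FROM generate_series(1, 100000) g;

-- Same argument, matching transition functions: with the patch the
-- transition state is accumulated once and both final functions
-- (sum, avg) read from it.
SELECT sum(x), avg(x) FROM test;

-- Different arguments: states cannot be shared, no benefit expected.
SELECT sum(x), avg(y) FROM test;
```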
Re: [HACKERS] Parallel Seq Scan
On Thu, Jul 23, 2015 at 9:42 PM, Amit Kapila amit.kapil...@gmail.com wrote:
On Wed, Jul 22, 2015 at 9:14 PM, Robert Haas robertmh...@gmail.com wrote:

One thing I noticed that is a bit dismaying is that we don't get a lot of benefit from having more workers. Look at the 0.1 data. At 2 workers, if we scaled perfectly, we would be 3x faster (since the master can do work too), but we are actually 2.4x faster. Each process is on the average 80% efficient. That's respectable. At 4 workers, we would be 5x faster with perfect scaling; here we are 3.5x faster. So the third and fourth worker were about 50% efficient. Hmm, not as good. But then going up to 8 workers bought us basically nothing.

I think the improvement also depends on how costly the qualification is. If it is costly, even for the same selectivity the gains will show up to higher numbers of clients, and for simple qualifications, the cost of having more workers will start dominating (processing data over multiple tuple queues) over the benefit we can achieve by them.

Yes, that's correct. When the qualification cost is increased, the performance also increases with the number of workers.

Instead of using all the configured workers per query, how about deciding the number of workers based on the cost of the qualification? I am not sure whether we have any information available to find out the qualification cost. This way the workers would be distributed across all backends properly.

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries
Hi Hackers,

This patch adds a new function called pg_hba_lookup to get all matching entries from pg_hba.conf for the provided input data. The rows are displayed in the order in which the hba conf entries are matched. This is an updated version of a previous failed attempt to create a catalog view for pg_hba.conf [1]. The view was not able to handle SQL queries properly because of the keywords that are present in the database and user columns.

Currently the following two variants are added:

pg_hba_lookup(database, user)
pg_hba_lookup(database, user, address, hostname)

How it works:

With the provided input data, it tries to match the entries of pg_hba.conf and populates the result set with all matching entries. With Tom Lane's recent commit 1e24cf645d24aab3ea39a9d259897fd0cae4e4b6, "Don't leave pg_hba and pg_ident data lying around in running backends" [2], the parsed hba conf entries are no longer available on the backend side. Temporarily I just reverted this patch for proof of concept purposes. Once we agree on the approach, I will try to find a solution for this.

How is it useful:

With the output of this view, an administrator can easily identify the lines that match the given criteria without going through the file.

Record format:

 Column name | datatype
-------------+----------
 line_number | int4
 type        | text
 database    | jsonb
 user        | jsonb
 address     | inet
 hostname    | text
 method      | text
 options     | jsonb

Please suggest any column additions or data type changes that are required.

Example output:

postgres=# select pg_hba_lookup('postgres','all');
               pg_hba_lookup
--------------------------------------------
 (84,local,[all],[all],,,trust,{})
 (86,host,[all],[all],127.0.0.1,,trust,{})
 (88,host,[all],[all],::1,,trust,{})

Here I attached a proof of concept patch for the same.

Any suggestions/comments on this proposed approach?
[1] http://www.postgresql.org/message-id/f40b0968db0a904da78a924e633be78645f...@sydexchtmp2.au.fjanz.com
[2] http://www.postgresql.org/message-id/e1zaquy-00072j...@gemulon.postgresql.org

Regards,
Hari Babu
Fujitsu Australia

pg_hba_lookup_poc.patch
Description: Binary data

revert_hba_context_release_in_backend.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
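[Editorial sketch of the two proposed call forms. The function exists only with the POC patch applied; the user name and host name below are illustrative.]

```sql
-- 2-argument form: match only on database and user.
SELECT pg_hba_lookup('postgres', 'all');

-- 4-argument form: additionally match the client address and hostname
-- against host/hostssl entries.
SELECT pg_hba_lookup('postgres', 'app_user',
                     '127.0.0.1'::inet, 'localhost');
```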
Re: [HACKERS] Parallel Seq Scan
On Mon, Jul 20, 2015 at 3:31 PM, Amit Kapila amit.kapil...@gmail.com wrote:
On Fri, Jul 17, 2015 at 1:22 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:
On Thu, Jul 16, 2015 at 1:10 PM, Amit Kapila amit.kapil...@gmail.com wrote:

Thanks, I will fix this in next version of patch.

I am posting in this thread as I am not sure whether it needs a separate thread or not.

I went through the code and found that the newly added funnel node is tightly coupled with partial seq scan. In order to add many more parallel plans along with parallel seq scan, we need to remove the integration of this node with partial seq scan.

This assumption is wrong. The Funnel node can execute any node beneath it (refer ExecFunnel->funnel_getnext->ExecProcNode; similarly you can see exec_parallel_stmt).

Yes, the funnel node can execute any node beneath it. But during the planning phase, the funnel path is added on top of the partial scan path. I just want the same to be enhanced to support other parallel nodes.

Yes, currently the nodes supported under Funnel nodes are limited, like partialseqscan and result (due to reasons mentioned upthread, like readfuncs.c doesn't have support to read Plan nodes, which is required for the worker backend to read the PlannedStmt; of course we can add them, but as we are supporting parallelism for limited nodes, I have not enhanced readfuncs.c), but in general the basic infrastructure is designed in such a way that it can support other nodes beneath it.

To achieve the same, I have the following ideas.

Execution:
The funnel execution varies based on the plan node below it.
1) partial scan - Funnel does the local scan also and returns the tuples.
2) partial agg - Funnel does the merging of aggregate results and returns the final result.

Basically Funnel will execute any node beneath it. The Funnel node itself is not responsible for doing the local scan or any form of consolidation of results; as of now, it has these 3 basic properties:
– Has one child, runs multiple copies in parallel.
– Combines the results into a single tuple stream.
– Can run the child itself if no workers are available.

+ if (!funnelstate->local_scan_done)
+ {
+     outerPlan = outerPlanState(funnelstate);
+
+     outerTupleSlot = ExecProcNode(outerPlan);

From the above code in the funnel_getnext function, it directly calls the node below it to do the scan on the backend side also. This code should refer to the node type below, and only based on that go for the backend scan. I feel always executing the outer plan may not be correct for other parallel nodes.

Any other better ideas to achieve the same?

Refer slides 16-19 of the Parallel Sequential Scan presentation at PGCon:
https://www.pgcon.org/2015/schedule/events/785.en.html

Thanks for the information.

I don't have a very clear idea what is the best way to transform the nodes in the optimizer, but I think we can figure that out later unless the majority of people see that as a blocking factor.

I am also not finding it a blocking factor for parallel scan. I wrote the above mail to get some feedback/suggestions from hackers on how to proceed in adding other parallelism nodes along with parallel scan.

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Thu, Jul 16, 2015 at 1:10 PM, Amit Kapila amit.kapil...@gmail.com wrote:

Thanks, I will fix this in next version of patch.

I am posting in this thread as I am not sure whether it needs a separate thread or not.

I went through the code and found that the newly added funnel node is tightly coupled with partial seq scan. In order to add many more parallel plans along with parallel seq scan, we need to remove the integration of this node with partial seq scan. To achieve the same, I have the following ideas.

Plan:
1) Add the funnel path immediately for every parallel path, similar to the current parallel seq scan, but during plan generation generate the funnel plan only for the top funnel path and ignore the rest of the funnel paths.
2) Instead of adding a funnel path immediately after the partial seq scan path is generated, add the funnel path in grouping_planner once the final rel path is generated, before creating the plan.

Execution:
The funnel execution varies based on the plan node below it.
1) partial scan - Funnel does the local scan also and returns the tuples.
2) partial agg - Funnel does the merging of aggregate results and returns the final result.

Any other better ideas to achieve the same?

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] optimizing vacuum truncation scans
On Thu, Jul 16, 2015 at 10:17 PM, Amit Kapila amit.kapil...@gmail.com wrote:
On Thu, Jul 16, 2015 at 11:21 AM, Haribabu Kommi kommi.harib...@gmail.com wrote:

Here I attached updated patches,
1) without prefetch logic.
2) with combined vm and prefetch logic.

I think it is better to just get the first patch in, as that in itself is a clear win, and then we can evaluate the second one (prefetching during truncation) separately. I think after the first patch, the use case for doing prefetch seems to be less beneficial, and I think it could hurt by loading not-required pages (assume you have prefetched 32 pages and after inspecting the first page, it decides to quit the loop as that is a non-empty page; another case is when it has prefetched a page just because its visibility map bit was cleared, while for the others it is set) into the OS buffer cache.

Yes, in the above cases the prefetch is an overhead. I am not sure how frequently such scenarios occur in real-world workloads.

Having said that, I am not against prefetching in this scenario, as that can help in more cases than it could hurt, but I think it will be better to evaluate that separately.

Yes, the prefetch patch works better in cases where the first vacuum mostly skips the truncation because of lock waiters. If such cases are common in real-world workloads, then the prefetch patch is needed.

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] optimizing vacuum truncation scans
On Wed, Jul 15, 2015 at 11:19 PM, Amit Kapila amit.kapil...@gmail.com wrote:
On Mon, Jun 29, 2015 at 11:24 AM, Jeff Janes jeff.ja...@gmail.com wrote:

Attached is a patch that implements the vm scan for truncation. It introduces a variable to hold the last blkno which was skipped during the forward portion. Any blocks after both this blkno and after the last inspected nonempty page (which the code is already tracking) must have been observed to be empty by the current vacuum. Any other process rendering the page nonempty are required to clear the vm bit, and no other process can set the bit again during the vacuum's lifetime. So if the bit is still set, the page is still empty without needing to inspect it.

One case where this patch can behave differently than the current code is: before taking the Exclusive lock on the relation for truncation, if some backend clears the vm bit and then inserts-deletes-prunes that page. I think with the patch it will not consider truncating such pages, whereas the current code will allow truncating them, as it re-verifies each page. I think such a case would be rare and we might not want to bother about it, but it is still worth thinking about once.

Thanks for your review. I corrected the code so that instead of returning the blkno after a visibility map check failure, it just continues to verify the contents as in the earlier approach.

Some minor points about patch:

count_nondeletable_pages()
{
..
    if (PageIsNew(page) || PageIsEmpty(page))
    {
        /* PageIsNew probably shouldn't happen... */
        UnlockReleaseBuffer(buf);
        continue;
    }
..
}

Why vmbuffer is not freed in above loop?

In the above check, we just continue to the next blkno; at the end of the loop the vmbuffer is freed.

count_nondeletable_pages()
{
..
+    /*
+     * Pages that were inspected and found to be empty by this vacuum run
+     * must still be empty if the vm bit is still set. Only vacuum sets
+     * vm bits, and only one vacuum can exist in a table at one time.
+     */
+    trust_vm = vacrelstats->nonempty_pages > vacrelstats->skipped_pages ?
+               vacrelstats->nonempty_pages : vacrelstats->skipped_pages;
..
}

I think it is better to have spaces in above assignment statement (near '=' and near '>').

Corrected. Here I attached updated patches:
1) without prefetch logic.
2) with combined vm and prefetch logic.

I marked the patch as ready for committer.

Regards,
Hari Babu
Fujitsu Australia

vac_trunc_trust_vm_v2.patch
Description: Binary data

vac_trunc_trust_vm_and_prefetch_v2.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
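[Editorial sketch of the scenario the vacuum-truncation patches target. Table name and sizes are illustrative, not the exact pgbench setup used in the thread.]

```sql
-- A table whose tail becomes all-empty after a mass delete: the
-- truncation phase of VACUUM then dominates its runtime.
CREATE TABLE trunc_test (f1 int, filler text);
INSERT INTO trunc_test
    SELECT g, repeat('x', 100) FROM generate_series(1, 1000000) g;

DELETE FROM trunc_test WHERE f1 > 1000;   -- leaves a long empty tail

-- With the vm patch, still-set visibility-map bits let the backward
-- truncation scan trust that those tail pages are empty instead of
-- re-reading each page block by block.
VACUUM trunc_test;
```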
Re: [HACKERS] optimizing vacuum truncation scans
On Mon, Jul 13, 2015 at 5:16 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:
On Mon, Jul 13, 2015 at 12:06 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:
On Thu, Jul 9, 2015 at 5:36 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:

I will do some performance tests and send you the results.

Here are the performance results tested on my machine.

                Head      vm patch   vm+prefetch patch
First vacuum    120 sec   1 sec      1 sec
Second vacuum   180 sec   180 sec    30 sec

I did some modifications in the code to skip the vacuum truncation in the first vacuum command. This way I collected the time taken by the second vacuum. I just combined your vm and prefetch patches into a single vm+prefetch patch, without a GUC. I kept the prefetch size as 32 and did the performance test. I chose this prefetch size based on the current buffer access strategy ring, which is 32 for vacuum at present, instead of making it a user option. Here I attached the modified patch with both vm+prefetch logic.

I will do some tests on a machine with an SSD and let you know the results. Based on these results, we can decide whether we need a GUC or not, depending on the impact of prefetch on SSD machines.

Following are the performance readings on a machine with an SSD. I increased the pgbench scale factor to 1000 in this test, instead of 500, to show better performance numbers.

                Head       vm patch   vm+prefetch patch
First vacuum    6.24 sec   2.91 sec   2.91 sec
Second vacuum   6.66 sec   6.66 sec   7.19 sec

There is a small performance impact on SSD with prefetch. The above prefetch overhead was observed while prefetching 1639345 pages. I feel this overhead is small.

Hi Jeff,

If you are fine with the earlier attached patch, then I will mark this patch as ready for committer, to get a committer's view on the patch.

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] optimizing vacuum truncation scans
On Mon, Jul 13, 2015 at 12:06 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:
On Thu, Jul 9, 2015 at 5:36 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:

I will do some performance tests and send you the results.

Here are the performance results tested on my machine.

                Head      vm patch   vm+prefetch patch
First vacuum    120 sec   1 sec      1 sec
Second vacuum   180 sec   180 sec    30 sec

I did some modifications in the code to skip the vacuum truncation in the first vacuum command. This way I collected the time taken by the second vacuum. I just combined your vm and prefetch patches into a single vm+prefetch patch, without a GUC. I kept the prefetch size as 32 and did the performance test. I chose this prefetch size based on the current buffer access strategy ring, which is 32 for vacuum at present, instead of making it a user option. Here I attached the modified patch with both vm+prefetch logic.

I will do some tests on a machine with an SSD and let you know the results. Based on these results, we can decide whether we need a GUC or not, depending on the impact of prefetch on SSD machines.

Following are the performance readings on a machine with an SSD. I increased the pgbench scale factor to 1000 in this test, instead of 500, to show better performance numbers.

                Head       vm patch   vm+prefetch patch
First vacuum    6.24 sec   2.91 sec   2.91 sec
Second vacuum   6.66 sec   6.66 sec   7.19 sec

There is a small performance impact on SSD with prefetch.

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] optimizing vacuum truncation scans
On Thu, Jul 9, 2015 at 5:36 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:

I will do some performance tests and send you the results.

Here are the performance results tested on my machine.

                Head      vm patch   vm+prefetch patch
First vacuum    120 sec   1 sec      1 sec
Second vacuum   180 sec   180 sec    30 sec

I did some modifications in the code to skip the vacuum truncation in the first vacuum command. This way I collected the time taken by the second vacuum. I just combined your vm and prefetch patches into a single vm+prefetch patch, without a GUC. I kept the prefetch size as 32 and did the performance test. I chose this prefetch size based on the current buffer access strategy ring, which is 32 for vacuum at present, instead of making it a user option. Here I attached the modified patch with both vm+prefetch logic.

I will do some tests on a machine with an SSD and let you know the results. Based on these results, we can decide whether we need a GUC or not, depending on the impact of prefetch on SSD machines.

Regards,
Hari Babu
Fujitsu Australia

vac_trunc_trust_vm_and_prefetch.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] optimizing vacuum truncation scans
On Thu, Jul 9, 2015 at 4:37 PM, Jeff Janes jeff.ja...@gmail.com wrote:
On Wed, Jul 8, 2015 at 9:46 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:
On Mon, Jun 29, 2015 at 3:54 PM, Jeff Janes jeff.ja...@gmail.com wrote:

There is still the case of pages which had their visibility bit set by a prior vacuum and then were not inspected by the current one. Once the truncation scan runs into these pages, it falls back to the previous behavior of reading block by block backwards. So there could still be reason to optimize that fallback using forward-reading prefetch.

The case I didn't understand is: how does the current vacuum miss a page which was set by the prior vacuum?

The prior vacuum set them to all visible, but then doesn't delete them. Either because it got interrupted or because there were still some pages after them (at the time) that were not all empty. The current vacuum skips them entirely on the forward scan because it thinks it is a waste of time to vacuum all-visible pages. Since it skips them, it doesn't know if they are empty as well as being all-visible. There is no permanent indicator of the pages being all-empty; there is only the inference based on the current vacuum's counters and protected by the lock held on the table.

The page should be counted either in skipped_pages or in nonempty_pages. Is it possible that a page doesn't come under these two categories and is also not empty? If the above doesn't exist, then we can directly truncate the relation from the highest block number of either nonempty_pages or skipped_pages to the end of the relation.

Right, and that is what this does (provided the vm bit is still set, so it does still have to loop over the vm to verify that it is still set, while it holds the access exclusive lock).

Thanks, I got it. To re-verify the vm bit of a page after getting the access exclusive lock, we have to do the backward scan.
I also feel that your prefetch buffer patch needed to improve the performance, as because there is a need of backward scan to re-verify vm bit, I will do some performance tests and send you the results. The problem is that the pages between the two counters are not known to be empty, and also not known to be nonempty. Someone has to be willing to go check on those pages at some point, or else they will never be eligible for truncation. Yes, there is no way to identify the page is empty or not without reading the page. Regards, Hari Babu Fujitsu Australia -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Waits monitoring
On Thu, Jul 9, 2015 at 1:52 AM, Ildus Kurbangaliev i.kurbangal...@postgrespro.ru wrote:

Hello. Currently, PostgreSQL offers many metrics for monitoring. However, detailed monitoring of waits is not supported yet. Such monitoring would let a DBA know how long a backend waited for a particular event and therefore identify bottlenecks. This functionality is very useful, especially for high-load databases. Metrics for wait monitoring are provided by many popular commercial DBMSs, and we currently have requests for this feature from companies migrating to PostgreSQL from commercial databases. Thus, I think it would be nice for PostgreSQL to have it too.

Yes, it would be good to have such wait monitoring views in PostgreSQL. You can add this patch to the open commitfest.

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] optimizing vacuum truncation scans
On Mon, Jun 29, 2015 at 3:54 PM, Jeff Janes jeff.ja...@gmail.com wrote:

Attached is a patch that implements the vm scan for truncation. It introduces a variable to hold the last blkno which was skipped during the forward portion. Any blocks after both this blkno and after the last inspected nonempty page (which the code is already tracking) must have been observed to be empty by the current vacuum. Any other process rendering a page nonempty is required to clear the vm bit, and no other process can set the bit again during the vacuum's lifetime. So if the bit is still set, the page is still empty without needing to inspect it.

The patch looks good and it improves the vacuum truncation speed significantly.

There is still the case of pages which had their visibility bit set by a prior vacuum and then were not inspected by the current one. Once the truncation scan runs into these pages, it falls back to the previous behavior of reading block by block backwards. So there could still be reason to optimize that fallback using forward-reading prefetch.

The case I didn't understand is how the current vacuum can miss a page whose bit was set by the prior vacuum. A page should be counted either in skipped_pages or in nonempty_pages. Is it possible for a page to fall under neither of these two categories while also not being empty? If not, then we can directly truncate the relation from the higher of nonempty_pages and skipped_pages to the end of the relation.

Regards, Hari Babu Fujitsu Australia
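The rule described above — everything after both the last skipped block and the last known-nonempty block is truncatable, provided each vm bit is still set when re-checked under the access exclusive lock — can be sketched as follows. This is an illustrative model only; the name `truncation_start` and the `vm_all_visible` array are assumptions for the sketch, not the patch's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified model: vm_all_visible[] stands in for the visibility map.
 * last_skipped is the last block skipped on the forward scan;
 * last_nonempty is the last block the current vacuum saw to be nonempty.
 * Returns the block number at which truncation can start.
 */
static int
truncation_start(int nblocks, const bool *vm_all_visible,
                 int last_skipped, int last_nonempty)
{
    int candidate = ((last_skipped > last_nonempty) ? last_skipped
                                                    : last_nonempty) + 1;

    /*
     * Walk backwards from the end, re-verifying each vm bit while the
     * access exclusive lock is held.  A cleared bit means some other
     * process may have made the page nonempty, so that page (and
     * everything before it) must be kept.
     */
    for (int blkno = nblocks - 1; blkno >= candidate; blkno--)
    {
        if (!vm_all_visible[blkno])
            return blkno + 1;
    }
    return candidate;
}
```

Since truncation can only remove a contiguous tail, a single cleared bit in the tail limits how far back the truncation can reach, which is exactly why the backward re-verification walk is needed.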
Re: [HACKERS] Parallel Seq Scan
On Fri, Jul 3, 2015 at 10:05 PM, Amit Kapila amit.kapil...@gmail.com wrote:

Attached, find the rebased version of patch. Note - You need to first apply the assess-parallel-safety patch which you can find at: http://www.postgresql.org/message-id/CAA4eK1JjsfE_dOsHTr_z1P_cBKi_X4C4X3d7Nv=vwx9fs7q...@mail.gmail.com

I ran some performance tests on a 16-core machine with large shared buffers, so there is no I/O involved. With the default value of cpu_tuple_comm_cost, a parallel plan is not generated even when selecting 100K records out of 40 million. So I changed the value to 0 and collected the performance readings.

Here are the performance numbers (times in ms):

selectivity (millions)   Seq scan   2 workers   4 workers   8 workers
0.1                      11498.93    4821.40     3305.84     3291.90
0.4                      10942.98    4967.46     3338.58     3374.00
0.8                      11619.44    5189.61     3543.86     3534.40
1.5                      12585.51    5718.07     4162.71     2994.90
2.7                      14725.66    8346.96    10429.05     8049.11
5.4                      18719.00   20212.33    21815.19    19026.99
7.2                      21955.79   28570.74    28217.60    27042.27

The average table row size is around 500 bytes and the width of the query's selected columns is around 36 bytes. Once the query selectivity goes above roughly 10% of the table's records, the parallel scan performance drops off.

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Auto-vacuum is not running in 9.1.12
On Wed, Jun 17, 2015 at 2:17 PM, Prakash Itnal prakash...@gmail.com wrote:

Hi, Currently the issue is easily reproducible. Steps to reproduce: * Set some aggressive values for auto-vacuuming. * Run heavy database update/delete/insert queries. This leads to invoking auto-vacuuming in quick succession. * Change the system time to an older date, e.g. 1995-01-01. Suddenly auto-vacuuming stops working. Even after changing the system time back to the current time, auto-vacuuming does not resume. So the question is: does postgres support system time changes?

PostgreSQL uses the system time to check whether it has reached the specified nap time to trigger an autovacuum worker. Is it possible for you to check what the autovacuum launcher is doing even after the system time is reset to the current time? I can think of a case where the launcher_determine_sleep function returns a huge sleep value because of the system time change. Because of that, it is possible that the launcher is not launching workers to do the vacuum. Maybe I am wrong.

Regards, Hari Babu Fujitsu Australia
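To make the suspected failure mode concrete (this is a hypothetical sketch, not the actual launcher_determine_sleep code): if the nap interval is computed as "next scheduled wall-clock time minus now", then moving the system clock backwards makes "now" smaller and inflates the computed sleep, so the launcher could nap for years before starting another worker.

```c
#include <assert.h>
#include <time.h>

/*
 * Hypothetical sleep computation based on absolute wall-clock times.
 * next_worker_time was scheduled before the clock change; if the clock
 * is then set back, the difference balloons.
 */
static long
naive_sleep_seconds(time_t next_worker_time, time_t now)
{
    long secs = (long) (next_worker_time - now);

    return (secs > 0) ? secs : 0;   /* never sleep a negative amount */
}
```

Deriving naps from a monotonic clock, or clamping the computed sleep to the configured autovacuum_naptime, would avoid this kind of stall.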
Re: [HACKERS] does tuple store subtransaction id in it?
On Tue, Jun 16, 2015 at 1:08 PM, Xiaoyulei xiaoyu...@huawei.com wrote:

In XidInMVCCSnapshot, it checks whether the xid from the tuple is in snapshot->subxip. Does that mean a tuple stores a subtransaction id?

A tuple stores only the transaction id of the operation that touched it. This can be either a main transaction id or a subtransaction id.

But in PushTransaction I see that currentSubTransactionId is assigned to TransactionState.subTransactionId, and currentSubTransactionId is reinitialized in StartTransaction. So I think a tuple should not store a subtransaction id. I am confused about this. If the subtransaction id is reinitialized at every transaction start, how can one judge from the xid in a tuple that it came from a subtransaction?

StartTransaction is called only once per transaction; for subtransactions, only StartSubTransaction is called. (Note also that the xid stored in a tuple is a real transaction id assigned from the shared counter, not the backend-local currentSubTransactionId; the snapshot's subxip array holds those assigned subtransaction xids.) Hope this answers your question.

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Possible pointer dereference
On Thu, May 28, 2015 at 6:07 AM, Gaetano Mendola mend...@gmail.com wrote:

I'm playing with a static analyzer and it's reporting some real errors in the postgresql code base, like the following one in src/backend/access/transam/commit_ts.c: return *ts != 0 // line 321 but a few lines up (line 315) ts is checked for NULL, so either the NULL check is not needed or *ts can lead to a NULL pointer dereference. The same happens a few lines later, at lines 333 and 339.

Thanks for providing detailed information. The function TransactionIdGetCommitTsData is currently used in only one place, and the caller always passes a valid pointer to this function, so there shouldn't be a problem at present. But if in future the function is called with a NULL pointer, it would lead to a crash. Correcting it as follows solves the problem:

return ts ? (*ts != 0) : false;

instead of:

return *ts != 0;

Attached is a patch for it.

Regards, Hari Babu Fujitsu Australia

commit_ts_fix.patch Description: Binary data
Re: [HACKERS] Parallel Seq Scan
On Wed, Apr 22, 2015 at 10:48 PM, Amit Kapila amit.kapil...@gmail.com wrote: parallel_seqscan_v14.patch (Attached with this mail)

This patch no longer applies/works against the latest head after the parallel mode patch was committed. Can you please rebase the patch?

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Fri, May 15, 2015 at 11:24 PM, Stephen Frost sfr...@snowman.net wrote: * Haribabu Kommi (kommi.harib...@gmail.com) wrote: On Tue, May 5, 2015 at 6:48 AM, Peter Eisentraut pete...@gmx.net wrote: It still looks quite dubious to me. The more I test this, the more fond I grow of the idea of having this information available in SQL. But I'm also growing more perplexed by how this the file is mapped to a table. It just isn't a good match. For instance: What is keyword_databases? Why is it an array? Same for keyword_users. How can I know whether a given database or user matches a keyword? What is compare_method? (Should perhaps be keyword_address?) Why is compare method set to mask when a hostname is set? (Column order is also a bit confusing here.) I'd also like options to be jsonb instead of a text array. Thanks for your suggestion. I am not sure how to use jsonb here, i will study the same and provide a patch for the next version. Regarding next version- are you referring to 9.6 and therefore we should go ahead and bounce this to the next CF, or were you planning to post a next version of the patch today? Yes, for 9.6 version. This is certainly a capability which I'd like to see, though I share Peter's concerns regarding the splitting up of the keywords rather than keeping the same structure as what's in the actual pg_hba.conf. That strikes me as confusing. It'd be neat if we were able to change pg_hba.conf to make more sense and then perhaps the SQL version wouldn't look so different but I don't think there's any way to do that. I discussed the patch briefing with Greg over IM, who pointed out that keeping things just exactly as they are in the config file would mean implementing, essentially, a pg_hba.conf parser in SQL. I can understand that perspective, but I don't think there's really much hope in users being able to use this view directly without a lot of effort, regardless. 
We need to provide a function which takes the arguments that our pg_hba lookup does (database, user-to-login-as, maybe system user for pg_ident checks, optionally an IP, etc) and then returns the record that matches.

Thanks for the details. I will try to come up with a view and a function, considering all of the above, for the next commitfest.

Apologies for not being able to provide more feedback earlier. I'll be happy to help with all of the above and review the patch. Independently, I'd love to see an SQL interface to pg_ident.conf too, where, I expect anyway, it'll be a lot simpler, though I'm not sure that it's very useful until we also have pg_hba.conf available through SQL.

Yes, I will definitely look into pg_ident along with pg_hba.

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Tue, May 5, 2015 at 6:48 AM, Peter Eisentraut pete...@gmx.net wrote: On 5/1/15 12:33 PM, Andres Freund wrote: On 2015-04-08 19:19:29 +0100, Greg Stark wrote: I'm not sure what the best way to handle the hand-off from patch contribution to reviewer/committer. If I start tweaking things then you send in a new version it's actually more work to resolve the conflicts. I think at this point it's easiest if I just take it from here. Are you intending to commit this? It still looks quite dubious to me. The more I test this, the more fond I grow of the idea of having this information available in SQL. But I'm also growing more perplexed by how this the file is mapped to a table. It just isn't a good match. For instance: What is keyword_databases? Why is it an array? Same for keyword_users. How can I know whether a given database or user matches a keyword? What is compare_method? (Should perhaps be keyword_address?) Why is compare method set to mask when a hostname is set? (Column order is also a bit confusing here.) I'd also like options to be jsonb instead of a text array. Thanks for your suggestion. I am not sure how to use jsonb here, i will study the same and provide a patch for the next version. Regards, Hari Babu Fujitsu Australia -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Sat, Apr 4, 2015 at 4:19 PM, Pavel Stehule pavel.steh...@gmail.com wrote: Hi 2015-03-31 14:38 GMT+02:00 Haribabu Kommi kommi.harib...@gmail.com: keyword_databases - The database name can be all, replication, sameuser, samerole and samegroup. keyword_roles - The role can be all and a group name prefixed with +. I am not very happy about name - but I have no better idea :( - maybe database_mask, user_mask How about database_keywords and user_keywords as column names? The rest of the database and role names are treated as normal database and role names. 2. Added the code changes to identify the names with quoted. Is it a good idea? Database's names are not quoted every else (in system catalog). So now, we cannot to use join to this view. postgres=# select (databases)[1] from pg_hba_conf ; databases --- omega 2 (4 rows) postgres=# select datname from pg_database ; datname --- template1 template0 postgres omega 2 (4 rows) I dislike this - we know, so the name must be quoted in file (without it, the file was incorrect). And if you need quotes, there is a function quote_ident. If we use quotes elsewhere, then it should be ok, bot not now. Please, remove it. More, it is not necessary, when you use a keyword columns. Thanks. The special handling of quotes for pg_hba_conf is not required. I will keep the behaviour similar to the other catalog tables. I will remove the quote support in the next version. Regards, Hari Babu Fujitsu Australia -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Mon, Mar 30, 2015 at 4:34 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hi I checked this patch. I like the functionality and behave. Thanks for the review. Here I attached updated patch with the following changes. 1. Addition of two new keyword columns keyword_databases - The database name can be all, replication, sameuser, samerole and samegroup. keyword_roles - The role can be all and a group name prefixed with +. The rest of the database and role names are treated as normal database and role names. 2. Added the code changes to identify the names with quoted. 3. Updated documentation changes 4. Regression test is corrected. Regards, Hari Babu Fujitsu Australia Catalog_view_to_HBA_settings_patch_V9.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Fri, Mar 13, 2015 at 1:33 PM, Peter Eisentraut pete...@gmx.net wrote: On 3/4/15 1:34 AM, Haribabu Kommi wrote: On Wed, Mar 4, 2015 at 12:35 PM, Haribabu Kommi kommi.harib...@gmail.com wrote: + foreach(line, parsed_hba_lines) In the above for loop it is better to add check_for_interrupts to avoid it looping if the parsed_hba_lines are more. Updated patch is attached with the addition of check_for_interrupts in the for loop. I tried out your latest patch. I like that it updates even in running sessions when the file is reloaded. Thanks for the review. Sorry for late reply. The permission checking is faulty, because unprivileged users can execute pg_hba_settings() directly. corrected. Check the error messages against the style guide (especially capitalization). corrected. I don't like that there is a hard-coded limit of 16 options 5 pages away from where it is actually used. That could be done better. changed to 12 instead of 16. I'm not sure about the name pg_hba_settings. Why not just pg_hba or pg_hba_conf if you're going for a representation that is close to the file (as opposed to pg_settings, which is really a lot more abstract than any particular file). changed to pg_hba_conf. I would put the line_number column first. changed. I continue to think that it is a serious mistake to stick special values like 'all' and 'replication' into the arrays without additional decoration. That makes this view useless or at least dangerous for editing tools or tools that want to reassemble the original file. Clearly at least one of those has to be a use case. Otherwise we can just print out the physical lines without interpretation. It is possible to provide more than one keyword for databases or users. Is it fine to use the text array for keyword databases and keyword users. The mask field can go, because address is of type inet, which can represent masks itself. (Or should it be cidr then? Who knows.) 
The preferred visual representation of masks in pg_hba.conf has been address/mask for a while now, so we should preserve that. Additionally, you can then use the existing inet/cidr operations to do things like checking whether some IP address is contained in an address specification.

removed.

I can't tell from the documentation what the compare_method field is supposed to do. I see it in the code, but that is not a natural representation of pg_hba.conf. In fact, this just supports my earlier statement. Why are special values in the address field special, but not in the user or database fields? uaImplicitReject is not a user-facing authentication method, so it shouldn't be shown (or showable).

removed.

I would have expected options to be split into keys and values. All that code to reassemble the options from the parsed struct representation seems crazy to me. Surely, we could save the strings as we parse them?

I didn't get this point clearly. Can you explain it a bit more?

I can't make sense of the log message "pg_hba.conf not reloaded, pg_hba_settings may show stale information". If load_hba() failed, the information isn't stale, is it? In any case, printing this to the server log seems kind of pointless, because someone using the view is presumably doing so because they can't or don't want to read the server log. The proper place might be a warning when the view is actually called.

Changed accordingly: if the reload fails, subsequent selects on the view throw a warning that the view data may not be current, like this: "There was some failure in reloading pg_hba.conf file. The pg_hba.conf settings data may contains stale information"

Here I have attached the latest patch with the fixes, other than the keyword columns. I will provide the patch with the keyword columns and documentation changes later.
Regards, Hari Babu Fujitsu Australia

Catalog_view_to_HBA_settings_patch_V8.patch Description: Binary data
Re: [HACKERS] Parallel Seq Scan
On Wed, Mar 11, 2015 at 6:31 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Mar 3, 2015 at 7:47 PM, Amit Kapila amit.kapil...@gmail.com wrote: I have modified the patch to introduce a Funnel node (and left child as PartialSeqScan node). Apart from that, some other noticeable changes based on feedback include: a) Master backend forms and send the planned stmt to each worker, earlier patch use to send individual elements and form the planned stmt in each worker. b) Passed tuples directly via tuple queue instead of going via FE-BE protocol. c) Removed restriction of expressions in target list. d) Introduced a parallelmodeneeded flag in plannerglobal structure and set it for Funnel plan. There is still some work left like integrating with access-parallel-safety patch (use parallelmodeok flag to decide whether parallel path can be generated, Enter/Exit parallel mode is still done during execution of funnel node). I think these are minor points which can be fixed once we decide on the other major parts of patch. Find modified patch attached with this mail. - Something is deeply wrong with the separation of concerns between nodeFunnel.c and nodePartialSeqscan.c. nodeFunnel.c should work correctly with *any arbitrary plan tree* as its left child, and that is clearly not the case right now. shm_getnext() can't just do heap_getnext(). Instead, it's got to call ExecProcNode() on its left child and let the left child decide what to do about that. The logic in InitFunnelRelation() belongs in the parallel seq scan node, not the funnel. ExecReScanFunnel() cannot be calling heap_parallel_rescan(); it needs to *not know* that there is a parallel scan under it. The comment in FunnelRecheck is a copy-and-paste from elsewhere that is not applicable to a generic funnel mode. In create_parallelscan_paths() function the funnel path is added once the partial seq scan path is generated. 
I feel the funnel path could instead be added just once, on top of the best overall parallel path for the entire query. Is this patch the right place to add such support as well?

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Parallel Seq Scan
On Tue, Mar 10, 2015 at 3:09 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Tue, Mar 10, 2015 at 6:50 AM, Haribabu Kommi kommi.harib...@gmail.com wrote: On Tue, Mar 10, 2015 at 1:38 AM, Amit Kapila amit.kapil...@gmail.com wrote:

Assuming previous patch is in right direction, I have enabled join support for the patch and done some minor cleanup of patch which leads to attached new version.

Does this patch handle the cases where a rescan starts without the earlier scan having finished?

Do you mean to say cases like ANTI and SEMI joins (in nodeNestLoop.c), where we scan the next outer tuple and rescan the inner table without completing the previous scan of the inner table?

Yes.

I have currently modelled it based on the existing rescan for seqscan (ExecReScanSeqScan()), which means it will begin the scan again. Basically, if the workers were already started/initialized by the previous scan, then re-initialize them (refer to function ExecReScanFunnel() in the patch). Can you elaborate more if you think the current handling is not sufficient for some case?

From the ExecReScanFunnel function, it seems that a rescan waits until all the workers have finished before starting the next scan. Will the workers abandon their ongoing task? Otherwise, I feel this may decrease performance instead of improving it. I am also not sure whether the case is handled where a worker is waiting to pass on its results while the backend is trying to start the rescan.

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Parallel Seq Scan
On Tue, Mar 10, 2015 at 1:38 AM, Amit Kapila amit.kapil...@gmail.com wrote: Assuming previous patch is in right direction, I have enabled join support for the patch and done some minor cleanup of patch which leads to attached new version.

Does this patch handle the cases where a rescan starts without the earlier scan having finished?

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Wed, Mar 4, 2015 at 12:35 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:

+ foreach(line, parsed_hba_lines)

In the above loop it is better to add a CHECK_FOR_INTERRUPTS call, so the loop can be cancelled when parsed_hba_lines is large.

The updated patch is attached with the addition of CHECK_FOR_INTERRUPTS in the loop.

Regards, Hari Babu Fujitsu Australia

Catalog_view_to_HBA_settings_patch_V7.patch Description: Binary data
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Wed, Mar 4, 2015 at 12:18 PM, Greg Stark st...@mit.edu wrote: On Wed, Mar 4, 2015 at 12:34 AM, Haribabu Kommi kommi.harib...@gmail.com wrote:

Out of curiosity, regarding the result materialization code addition: the caller of the hba_settings function, ExecMakeTableFunctionResult, also stores the results in a tuplestore. Is there any advantage to doing it in the hba_settings function rather than in the caller?

That might protect against concurrent pg_hba reloads, though I suspect there's a CHECK_FOR_INTERRUPTS hanging out in that loop. We could maybe put one in this loop and check if the parser memory context has been reset.

I feel there is no problem with concurrent pg_hba reloads, because CHECK_FOR_INTERRUPTS doesn't reload the config files; that is done in PostgresMain once the query finishes. Am I missing something?

+ foreach(line, parsed_hba_lines)

In the above loop it is better to add a CHECK_FOR_INTERRUPTS call, so the loop can be cancelled when parsed_hba_lines is large.

But the immediate problem is that having an API that looked up a line by line number, and then calling it repeatedly with successive line numbers, was O(n^2). Each time it was called it had to walk down the list from the head all over again: 1 + 2 + 3 + ... + n = n(n+1)/2, i.e. O(n^2). This isn't performance critical, but it would not have run in a reasonable length of time for large pg_hba files. And since state would have to be stored between calls anyway, the code is at least as simple with the tuplestore, imho even simpler.

Got it. Thanks.

Regards, Hari Babu Fujitsu Australia
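Greg's cost argument can be made concrete with a toy model (illustrative only; the function names are invented for the sketch): looking up line k of a singly linked list walks k+1 nodes, so fetching lines 0..n-1 one call at a time touches 1 + 2 + ... + n = n(n+1)/2 nodes in total, while a single forward pass that materializes everything into a tuplestore touches each node exactly once.

```c
#include <assert.h>

/* cost (nodes walked) of one lookup of line k in a singly linked list */
static long
nodes_walked_one_lookup(long k)
{
    return k + 1;                   /* must start from the head each time */
}

/* total cost of n successive per-line lookups: the O(n^2) pattern */
static long
nodes_walked_all_lookups(long n)
{
    long total = 0;

    for (long k = 0; k < n; k++)
        total += nodes_walked_one_lookup(k);
    return total;                   /* equals n(n+1)/2 */
}
```

For a 1000-line pg_hba.conf that is 500500 node visits for the per-line API versus 1000 for the tuplestore approach.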
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Wed, Mar 4, 2015 at 5:57 AM, Greg Stark st...@mit.edu wrote:

On further review I've made a few more changes, attached. I think we should change the column names to users and databases, to be clear they're lists and also to avoid the SQL reserved word "user". I removed the dependency on strlist_to_array, which is in objectaddress.c and isn't a very sensible dependency -- it does seem like it would be handy to have a list-based version of construct_array moved to arrayfuncs.c, but for now it's not much more work to handle these ourselves. I changed the options to accumulate one big array instead of an array of bunches of options. Previously you could only end up with a singleton array with a comma-delimited string, or a two-element array with one of those and map=.

The changes are fine; this eliminates the unnecessary dependency on objectaddress.

I think the error if pg_hba fails to reload needs to be LOG. It would be too unexpected for a user who isn't necessarily the one who issued the SIGHUP to spontaneously get a warning. I also removed the mask from local entries and made some of the NULLs that shouldn't be possible (unknown auth method or connection method) actually throw errors.

The changes are fine. All the tests pass.

Out of curiosity, regarding the result materialization code addition: the caller of the hba_settings function, ExecMakeTableFunctionResult, also stores the results in a tuplestore. Is there any advantage to doing it in the hba_settings function rather than in the caller?

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Sat, Feb 28, 2015 at 11:41 AM, Stephen Frost sfr...@snowman.net wrote: Pavel, * Pavel Stehule (pavel.steh...@gmail.com) wrote: 2015-02-27 22:26 GMT+01:00 Tom Lane t...@sss.pgh.pa.us: Stephen Frost sfr...@snowman.net writes:

Right, we also need a view (or function, or both) which provides what the *active* configuration of the running postmaster is. This is exactly what I was proposing (or what I was intending to, at least) with pg_hba_active, so, again, I think we're in agreement here.

I think that's going to be a lot harder than you realize, and it will have undesirable security implications, in that whatever you do to expose the postmaster's internal state to backends will also make it visible to other onlookers; not to mention probably adding new failure modes.

we can do copy of pg_hba.conf somewhere when postmaster starts or when it is reloaded.

Please see my reply to Tom. There's no trivial way to reach into the postmaster from a backend - but we do get a copy of whatever the postmaster had when we forked, and the postmaster only reloads pg_hba.conf on a SIGHUP, and that SIGHUP is passed down to the children, so we simply need to also reload pg_hba.conf in the children when they get a SIGHUP. That's how postgresql.conf is handled, which is what pg_settings is based on, and I believe that is the behavior folks are really looking for.

Loading pg_hba.conf during SIGHUP in the backends will solve the problem of displaying data that has not yet been loaded. This change may produce a warning if a backend fails to load pg_hba.conf.

Here I have attached the updated patch. I have not yet added the pg_hba_check function; I feel it is better as a separate patch, which I can share later.

Regards, Hari Babu Fujitsu Australia

Catalog_view_to_HBA_settings_patch_V6.patch Description: Binary data
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Sat, Feb 7, 2015 at 8:26 PM, Pavel Stehule pavel.steh...@gmail.com wrote:

Hi, I am sending a review of this patch.

Thanks for the review, and sorry for the delay.

4. Regress tests: test rules ... FAILED -- missing info about new view

Thanks. Corrected.

My objections: 1. The data type for the database field should be an array of name or an array of text. When a name contains a comma, the comma is currently not escaped: {omega,my stupid extremly, name2,my stupid name} expected: {my stupid name,omega,my stupid extremly, name2} I see the same issue in the options field.

Changed, including the user column also.

2. Reload is not enough for a content refresh - logout is necessary. I understand why it is, but it is not very friendly, and can be very stressful. It should work with the last reloaded data.

At present the view data is populated from the variable parsed_hba_lines, which contains the already-parsed lines of the pg_hba.conf file. During a reload, the hba entries are reloaded only in the postmaster process and not in every backend; for this reason the reloaded data is not visible until the session is disconnected and reconnected. Instead of changing the SIGHUP behavior in the backend to also load the hba entries, the pg_hba.conf file is now parsed whenever the view is called, so the view shows the latest reloaded data. This way it doesn't impact the existing behavior, but it may impact performance because the file is loaded into memory on every call.

The updated patch is attached. Comments?

Regards, Hari Babu Fujitsu Australia

Catalog_view_to_HBA_settings_patch_V5.patch Description: Binary data
Re: [HACKERS] Question about durability and postgresql.
On Fri, Feb 20, 2015 at 5:09 PM, Alfred Perlstein bri...@mu.org wrote:
> Hello,
>
> We have a combination of 9.3 and 9.4 databases used for logging of data.
>
> We do not need a strong durability guarantee, meaning it is ok if on
> crash a minute or two of data is lost from our logs. (This is just stats
> for our internal tool.)
>
> I am looking at this page:
> http://www.postgresql.org/docs/9.4/static/non-durability.html
> And it's not clear which setting I should turn on.
>
> What we do NOT want is to lose the entire table or corrupt the database.
> We do want to gain speed though by not making DATA writes durable.
>
> Which setting is appropriate for this use case? At a glance it looks
> like a combination of:
>
> 1) Turn off synchronous_commit; and possibly:
> 2) Increase checkpoint_segments and checkpoint_timeout; this reduces the
> frequency of checkpoints, but increases the storage requirements of
> pg_xlog.

I feel changing the above two configuration parameters is enough for your requirement.

> 3) Turn off full_page_writes; there is no need to guard against partial
> page writes.

Turning this off may lead to a corrupted database if the system crashes during a page write, unless your file system guards against partial page writes.

Regards,
Hari Babu
Fujitsu Australia
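For reference, the settings discussed above can be sketched as a postgresql.conf fragment. The values are illustrative only, not tuned recommendations, and this assumes 9.3/9.4 (checkpoint_segments was later replaced by max_wal_size in 9.5):

```ini
# Reduced-durability settings for a 9.3/9.4 logging database (illustrative).
synchronous_commit = off       # a crash may lose the most recent commits,
                               # but never corrupts committed data
checkpoint_segments = 64       # fewer, larger checkpoints; needs more pg_xlog
checkpoint_timeout = 30min
#full_page_writes = off        # NOT advised unless the filesystem guarantees
                               # atomic page writes: risk of corruption
```

With synchronous_commit = off, a crash can lose only un-flushed commits (bounded by wal_writer_delay), which matches the "a minute or two of log data is acceptable" requirement without risking table loss or corruption.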
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Wed, Jan 28, 2015 at 9:47 AM, Jim Nasby jim.na...@bluetreble.com wrote:
> On 1/27/15 1:04 AM, Haribabu Kommi wrote:
>> Here I attached the latest version of the patch. I will add this patch
>> to the next commitfest.
>
> Apologies if this was covered, but why isn't the IP address an inet
> instead of text?

Corrected the address type to inet instead of text. Updated patch is attached.

> Also, what happens if someone reloads the config in the middle of
> running the SRF?

The hba entries are reloaded only in the postmaster process, not in every backend, so there shouldn't be any problem with a config file reload. Am I missing something?

Regards,
Hari Babu
Fujitsu Australia

Catalog_view_to_HBA_settings_patch_V4.patch
Description: Binary data
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Mon, Jun 30, 2014 at 5:06 PM, Abhijit Menon-Sen a...@2ndquadrant.com wrote:
> I think having two columns would work. The columns could be called
> database and database_list, and user and user_list, respectively. The
> database column may contain one of all, sameuser, samegroup,
> replication; but if it's empty, database_list will contain an array of
> database names. Then (all, {}) and (, {all}) are easily separated.
> Likewise for user and user_list.

Thanks for the review. I corrected all the review comments except the one about adding two columns (database, database_list and user, user_list); I feel that may cause some confusion for users. Here I attached the latest version of the patch. I will add this patch to the next commitfest.

Regards,
Hari Babu
Fujitsu Australia

Catalog_view_to_HBA_settings_patch_V3.patch
Description: Binary data
Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan
On Tue, Nov 11, 2014 at 10:21 AM, Andres Freund and...@2ndquadrant.com wrote:
> On 2014-11-10 10:57:16 -0500, Robert Haas wrote:
>> Does parallelism help at all?
>
> I'm pretty damn sure. We can't even make a mildly powerful storage fully
> busy right now. Heck, I can't make my workstation's storage with a RAID
> 10 out of four spinning disks fully busy.
>
> I think some of that benefit also could be reaped by being better at
> hinting the OS...

Yes, it definitely helps, and not only for I/O-bound operations. It gives a good gain for queries with CPU-intensive WHERE conditions.

One more point we may need to consider: is there any overhead in passing data rows from the workers to the backend? I feel this may play a major role when the selectivity is high.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan
On Tue, Nov 11, 2014 at 2:35 PM, Amit Kapila amit.kapil...@gmail.com wrote:
> On Tue, Nov 11, 2014 at 5:30 AM, Haribabu Kommi
> kommi.harib...@gmail.com wrote:
>> On Tue, Nov 11, 2014 at 10:21 AM, Andres Freund
>> and...@2ndquadrant.com wrote:
>>> On 2014-11-10 10:57:16 -0500, Robert Haas wrote:
>>>> Does parallelism help at all?
>>> I'm pretty damn sure. We can't even make a mildly powerful storage
>>> fully busy right now. Heck, I can't make my workstation's storage with
>>> a RAID 10 out of four spinning disks fully busy. I think some of that
>>> benefit also could be reaped by being better at hinting the OS...
>> Yes, it definitely helps, and not only for I/O-bound operations. It
>> gives a good gain for queries with CPU-intensive WHERE conditions. One
>> more point we may need to consider: is there any overhead in passing
>> data rows from the workers to the backend?
>
> I am not sure that overhead will be very visible if we improve the use
> of the I/O subsystem by making parallel tasks work on it.

I feel there may be an overhead because the workers need to put the result data in shared memory and the backend has to read it from there to process it further. If the cost of transferring data from worker to backend is higher than fetching a tuple from the scan, the overhead becomes visible when the selectivity is high.

> However another idea here could be that instead of passing tuple data,
> we just pass the tuple id, but in that case we have to retain the pin on
> the buffer that contains the tuple until the master backend reads from
> it, which might have its own kind of problems.

Transferring tuple IDs doesn't cover the scenarios where the node needs any projection.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Index scan optimization
On Mon, Oct 27, 2014 at 10:38 PM, Rajeev rastogi rajeev.rast...@huawei.com wrote:
> On 26 October 2014 10:42, Haribabu Kommi wrote:
>> I have a question regarding the setting of key flags as matched. Only
>> the first key was set as matched even if we have multiple index
>> conditions. Is there any reason behind that?
>
> Actually the keysCount can be more than one only if the key strategy is
> BTEqualStrategyNumber (i.e. the '=' condition). But our optimization is
> only for the '>' or '>=' conditions. So adding code to set the flag for
> all keys is of no use. Let me know if I am missing anything here?

Thanks. I understood the point.

>> If any volatile function is present in the index condition, the index
>> scan itself is not chosen. Is there any need to handle that case,
>> similar to NULLs?
>
> Do you mean to say that we should add special handling for volatile
> functions? If yes, then as you said, since an index scan itself is not
> selected for such functions, I feel we don't need to add anything for
> that.

I also have the same opinion. I marked the patch as ready for committer.

Regards,
Hari Babu
Fujitsu Australia
[HACKERS] Possible problem with shm_mq spin lock
Hi Hackers,

I am thinking of a possible problem with the shm_mq structure's spin lock, which is used for protecting the shm_mq structure. If the process receives a SIGQUIT signal while executing any code under the spin lock, it leads to a deadlock situation:

SIGQUIT -> proc_exit -> shm_mq_detach -> try to acquire the spin lock.

The spin lock is already held by the process. It is very difficult to reproduce the problem because the code under the lock is very minimal. Please let me know if I missed anything.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Possible problem with shm_mq spin lock
On Sun, Oct 26, 2014 at 10:17 AM, Andres Freund and...@2ndquadrant.com wrote:
> Hi,
>
> On 2014-10-26 08:52:42 +1100, Haribabu Kommi wrote:
>> I am thinking of a possible problem with the shm_mq structure's spin
>> lock, which is used for protecting the shm_mq structure. If the process
>> receives a SIGQUIT signal while executing any code under the spin lock,
>> it leads to a deadlock situation:
>> SIGQUIT -> proc_exit -> shm_mq_detach -> try to acquire the spin lock.
>> The spin lock is already held by the process. It is very difficult to
>> reproduce the problem because the code under the lock is very minimal.
>> Please let me know if I missed anything.
>
> I think you missed the following bit in postgres.c:
>
>     /*
>      * quickdie() occurs when signalled SIGQUIT by the postmaster.
>      *
>      * Some backend has bought the farm,
>      * so we need to stop what we're doing and exit.
>      */
>     void
>     quickdie(SIGNAL_ARGS)
>     {
>     ...
>         /*
>          * We DO NOT want to run proc_exit() callbacks -- we're here because
>          * shared memory may be corrupted, so we don't want to try to clean up our
>          * transaction. Just nail the windows shut and get out of town. Now that
>          * there's an atexit callback to prevent third-party code from breaking
>          * things by calling exit() directly, we have to reset the callbacks
>          * explicitly to make this work as intended.
>          */
>         on_exit_reset();

Thanks for the details. I am sorry, it is not proc_exit; it is the exit callback functions that can cause the problem. The following is the call stack where the problem can happen, if the signal handler is called after the spin lock is taken by the worker.
Breakpoint 1, 0x0072dd83 in shm_mq_detach ()
(gdb) bt
#0  0x0072dd83 in shm_mq_detach ()
#1  0x0072e7db in shm_mq_detach_callback ()
#2  0x00726d71 in dsm_detach ()
#3  0x00726c43 in dsm_backend_shutdown ()
#4  0x00727450 in shmem_exit ()
#5  0x007272fc in proc_exit_prepare ()
#6  0x00727501 in atexit_callback ()
#7  0x0030ff435da2 in exit () from /lib64/libc.so.6
#8  0x006ddaec in bgworker_quickdie ()
#9  signal handler called
#10 0x0072ce9a in shm_mq_sendv ()

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Possible problem with shm_mq spin lock
On Sun, Oct 26, 2014 at 12:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Haribabu Kommi kommi.harib...@gmail.com writes:
>> Thanks for the details. I am sorry, it is not proc_exit; it is the exit
>> callback functions that can cause the problem. The following is the
>> call stack where the problem can happen, if the signal handler is
>> called after the spin lock is taken by the worker.
>>
>> Breakpoint 1, 0x0072dd83 in shm_mq_detach ()
>> (gdb) bt
>> #0  0x0072dd83 in shm_mq_detach ()
>> #1  0x0072e7db in shm_mq_detach_callback ()
>> #2  0x00726d71 in dsm_detach ()
>> #3  0x00726c43 in dsm_backend_shutdown ()
>> #4  0x00727450 in shmem_exit ()
>> #5  0x007272fc in proc_exit_prepare ()
>> #6  0x00727501 in atexit_callback ()
>> #7  0x0030ff435da2 in exit () from /lib64/libc.so.6
>> #8  0x006ddaec in bgworker_quickdie ()
>
> Or in other words, Robert broke it. This control path should absolutely
> not occur: the entire point of the on_exit_reset call in quickdie() is
> to prevent any callbacks from being executed when we get to
> shmem_exit(). DSM-related functions DO NOT get an exemption.

The reset_on_dsm_detach function is called there to remove the DSM-related callbacks. It's my mistake, I am really sorry; the code I was using was wrong. Sorry for the noise.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Index scan optimization
On Tue, Sep 23, 2014 at 10:38 PM, Rajeev rastogi rajeev.rast...@huawei.com wrote:
> On 22 September 2014 19:17, Heikki Linnakangas wrote:
>> On 09/22/2014 04:45 PM, Tom Lane wrote:
>>> Heikki Linnakangas hlinnakan...@vmware.com writes:
>>>> On 09/22/2014 07:47 AM, Rajeev rastogi wrote:
>>>>> So my proposal is to skip the condition check on the first scan key
>>>>> condition for every tuple.
>>>> The same happens in a single-column case. If you have a query like
>>>> SELECT * FROM tbl2 WHERE id2 > 'a', once you've found the start
>>>> position of the scan, you know that all the rows that follow match
>>>> too.
>>> ... unless you're doing a backwards scan.
>> Sure. And you still have to check for NULLs. Have to get the details
>> right..
>
> I have finished the implementation of the discussed optimization. I got
> a performance improvement of around 30% on the schema and data shared in
> the earlier mail. I also tested the index scan case, where our
> optimization does not apply, and observed that there is no effect on
> those queries because of this change.
>
> Change details:
> I have added a new flag SK_BT_MATCHED as part of sk_flags (ScanKey
> structure); the value used for it is 0x0004, which was unused. Inside
> the function _bt_first, once we finish finding the start scan position
> based on the first key, I append the flag SK_BT_MATCHED to the first
> key. Then in the function _bt_checkkeys, during the key comparison, I
> check whether the key has the SK_BT_MATCHED flag set; if yes, there is
> no need for further comparison. But if the tuple has a NULL value, then
> even if this flag is set, we continue with further comparison (this
> handles Heikki's point about checking NULLs).

Hi,

I reviewed the index scan optimization patch; the following are my observations:

- Patch applies cleanly.
- Compiles without warnings.
- All regression tests pass.

There is a good performance gain with the patch in almost all scenarios.

I have a question regarding the setting of key flags as matched.
Only the first key was set as matched, even if we have multiple index conditions. Is there any reason behind that?

If any volatile function is present in the index condition, the index scan itself is not chosen. Is there any need to handle that case, similar to NULLs?

Thanks for the patch.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] [BUGS] BUG #9652: inet types don't support min/max
On Fri, Aug 29, 2014 at 12:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Haribabu Kommi kommi.harib...@gmail.com writes:
>> Thanks for your review. Please find the patch rebased against the
>> latest HEAD.
>
> Committed with minor (mostly cosmetic) alterations.

Thanks.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
On Wed, Aug 27, 2014 at 8:27 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
> Alvaro Herrera wrote:
>> So my proposal is a bit more complicated. First we introduce the notion
>> of a single number, to enable sorting and computations: the delay
>> equivalent, which is the cost_limit divided by cost_delay.
>
> Here's a patch that implements this idea. As you see this is quite a bit
> more complicated than Haribabu's proposal.
>
> There are two holes in this:
>
> 1. If you ALTER DATABASE to change vacuum delay for a database, those
> values are not considered in the global equiv delay. I don't think this
> is very important and anyway we haven't considered this very much, so
> it's okay if we don't handle it.
>
> 2. If you have a fast worker that's only slightly faster than regular
> workers, it will become slower in some cases. This is explained in a
> FIXME comment in the patch.
>
> I don't really have any more time to invest in this, but I would like to
> see it in 9.4. Mark, would you test this? Haribabu, how open are you to
> fixing point (2) above?

Thanks Alvaro. I will check point (2).

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Parallel Sequence Scan doubts
On Sun, Aug 24, 2014 at 12:34 PM, Craig Ringer cr...@2ndquadrant.com wrote:
> On 08/24/2014 09:40 AM, Haribabu Kommi wrote:
>> Any suggestions?
>
> Another point I didn't raise first time around, but that's IMO quite
> significant, is that you haven't addressed why this approach to fully
> parallel seqscans is useful and solves real problems in effective ways.
>
> It might seem obvious - of course they're useful. But I see two things
> they'd address:
>
> - CPU-limited sequential scans, where expensive predicates are filtering
> the scan; and

Yes, we are mainly targeting CPU-limited sequential scans. That is why I want the workers to handle the predicates as well, not just read tuples from disk.

> - I/O-limited sequential scans, where the predicates already execute
> fast enough on one CPU, so most time is spent waiting for more disk I/O.
>
> The problem I see with your design is that it's not going to be useful
> for a large class of CPU-limited scans where the predicate isn't
> composed entirely of immutable functions and operators. Especially since
> immutable-only predicates are the best candidates for expression indexes
> anyway.
>
> While it'd likely be useful for I/O-limited scans, it's going to
> increase contention on shared_buffers locking and page management. More
> importantly, is it the most efficient way to solve the problem with
> I/O-limited scans? I would seriously suggest looking at first adding
> support for asynchronous I/O across ranges of extents during a
> sequential scan. You might not need multiple worker backends at all.
>
> I'm sure using async I/O to implement effective_io_concurrency for
> seqscans has been discussed and explored before, so again I think some
> time in the list archives might make sense.

Thanks for your inputs. I will check it.
> I don't know if it makes sense to do something as complex as parallel
> multi-process seqscans without having a path forward for supporting
> non-immutable functions - probably with fmgr API enhancements,
> additional function labels (PARALLEL), etc., depending on what you find
> is needed.

Thanks for your inputs. I will check for a solution to extend the support to non-immutable functions also.

>>> 3. In the executor Init phase, try to copy the necessary data required
>>> by the workers and start the workers.
>> Copy how? Back-ends can only communicate with each other over shared
>> memory, signals, and using sockets.

Sorry for not being clear: I meant copying those data structures into dynamic shared memory only. The workers can access them from there.

> That'll probably work with read-only data, but it's not viable for
> read/write data unless you use a big lock to protect it, in which case
> you lose the parallelism you want to achieve.

As of now I am thinking of sharing only read-only data with the workers. In case read/write data needs to be shared, we may need another approach to handle it.

> You'd have to classify what may be modified during scan execution
> carefully and determine if you need to feed any of the resulting
> modifications back to the original backend - and how to merge
> modifications by multiple workers, if it's even possible. That's going
> to involve a detailed structure-by-structure analysis and seems likely
> to be error-prone and buggy.

Thanks for your inputs. I will check it properly.

> I think you should probably talk to Robert Haas about what he's been
> doing over the last couple of years on parallel query.

Sure, I will check with him.

>>> 4. In the executor run phase, just get the tuples which are sent by
>>> the workers and process them further in the plan node execution.
>> Again, how do you propose to copy these back to the main bgworker?

With the help of message queues that are created in the dynamic shared memory, the workers can send the data to the queue.
On the other side, the main backend receives the tuples from the queue.

> OK, so you plan to implement shmem queues. That'd be a useful starting
> point, as it'd be something that would be useful in its own right.

shmem queues are already possible with dynamic shared memory; I just want to use them here.

> You'd have to be able to handle individual values that're larger than
> the ring buffer or whatever you're using for transfers, in case you're
> dealing with already-detoasted tuples or in-memory tuples. Again,
> chatting with Robert and others who've worked on dynamic shmem, parallel
> query, etc. would be wise here.

>> Yes, you are correct. For that reason I am thinking of supporting only
>> functions that depend solely on input variables and do not modify any
>> global data.
>
> You'll want to be careful with that. Nothing stops an immutable function
> referencing a cache in a C global that it initializes once and then
> treats as read-only, for example. I suspect you'll need a per-function
> whitelist. I'd love to be wrong.

Yes, we need per-function-level details. Once we have a better solution that also handles non-immutable functions, these may not be required.
Re: [HACKERS] Parallel Sequence Scan doubts
On Sun, Aug 24, 2014 at 1:11 AM, Craig Ringer cr...@2ndquadrant.com wrote:
> On 08/21/2014 02:47 PM, Haribabu Kommi wrote:
>> Implementation of Parallel Sequence Scan
>
> I think you mean Parallel Sequential Scan.

Sorry for not being clear; yes, it is parallel sequential scan.

>> 1. Parallel Sequence Scan can be achieved by using background workers
>> doing the job of the actual sequence scan, including the qualification
>> check also.
>
> Only if the qualifiers are stable/immutable, I think. Not even
> necessarily stable functions - consider use of the fmgr/executor state
> contexts to carry information over between calls, and what parallel
> execution would do to that.

Thanks for your input. As of now we are targeting only immutable functions, which can be executed in parallel without any side effects.

>> 3. In the executor Init phase, try to copy the necessary data required
>> by the workers and start the workers.
>
> Copy how? Back-ends can only communicate with each other over shared
> memory, signals, and using sockets.

Sorry for not being clear: I meant copying those data structures into dynamic shared memory only. The workers can access them from there.

>> 4. In the executor run phase, just get the tuples which are sent by the
>> workers and process them further in the plan node execution.
>
> Again, how do you propose to copy these back to the main bgworker?

With the help of message queues that are created in the dynamic shared memory, the workers can send the data to the queue. On the other side, the main backend receives the tuples from the queue.

>> 1. Data structures that are required to be copied from backend to
>> worker are currentTransactionState, Snapshot, GUC, ComboCID, Estate,
>> etc.
>
> That's a big etc. Huge, in fact. Any function can reference any global
> variable. Even an immutable function might use globals for cache etc. -
> and might, for example, set them up using an executor start hook. You
> cannot really make any assumptions about what functions access what
> memory.

Yes, you are correct.
For that reason I am thinking of supporting only functions that depend solely on input variables and do not modify any global data.

>> I see some problems in copying the Estate data structure into shared
>> memory because it contains so many pointers. There is a need for some
>> infrastructure to copy these data structures into shared memory.
>
> It's not just a matter of copying them into/via shmem. It's about their
> meaning. Does it even make sense to copy the executor state to another
> backend? If so, you can't copy it back, so what do you do at the end of
> the scans?

If we handle the locking of the relation in the backend and avoid doing the parallel sequential scan when any sub-query is involved, then there is no need for the full estate in the worker. In those cases, by sharing less information, I think we can execute the plan in the worker. Any suggestions?

> Before you try to design anything more on this, study the *large* amount
> of discussion that has happened on this topic on this mailing list over
> the last years. This is not a simple or easy task, and it's not one you
> should approach without studying what's already been worked on, built,
> contemplated, etc.

Thanks for your information.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] [BUGS] BUG #9652: inet types don't support min/max
On Mon, Aug 18, 2014 at 8:12 PM, Asif Naeem anaeem...@gmail.com wrote:
> Thank you for sharing the updated patch. With the latest 9.5 source
> code, the patch build is failing with the following error message:
>
> /Applications/Xcode.app/Contents/Developer/usr/bin/make -C catalog schemapg.h
> cd ../../../src/include/catalog
> '/opt/local/bin/perl' ./duplicate_oids
> 3255
> make[3]: *** [postgres.bki] Error 1
> make[2]: *** [submake-schemapg] Error 2
> make[1]: *** [all-backend-recurse] Error 2
> make: *** [all-src-recurse] Error 2
>
> A new function is being added by the following commit:
>
> commit b34e37bfefbed1bf9396dde18f308d8b96fd176c
> Author: Robert Haas rh...@postgresql.org
> Date: Thu Aug 14 12:09:52 2014 -0400
>
>     Add sortsupport routines for text.
>
>     This provides a small but worthwhile speedup when sorting text, at
>     least in cases to which the sortsupport machinery applies.
>
>     Robert Haas and Peter Geoghegan
>
> It seems that you need to use another OID. Other than this, the patch
> looks good to me; please share the updated patch and feel free to assign
> it to a committer. Thanks.

Thanks for your review. Please find the patch rebased against the latest HEAD.

Regards,
Hari Babu
Fujitsu Australia

inet_agg_v8.patch
Description: Binary data
[HACKERS] Implementing parallel sequence doubts
Hi Hackers,

Implementation of Parallel Sequence Scan

Approach:

1. Parallel Sequence Scan can be achieved by using background workers doing the job of the actual sequence scan, including the qualification check also.
2. The planner generates the parallel scan plan by checking the possible criteria of the table to do a parallel scan and generates the tasks (ranges of blocks).
3. In the executor Init phase, try to copy the necessary data required by the workers and start the workers.
4. In the executor run phase, just get the tuples which are sent by the workers and process them further in the plan node execution.

Some of the problems I am thinking about:

1. Data structures that are required to be copied from backend to worker are currentTransactionState, Snapshot, GUC, ComboCID, Estate, etc.

I see some problems in copying the Estate data structure into shared memory because it contains so many pointers. There is a need for some infrastructure to copy these data structures into shared memory.

If the backend takes care of locking the relation and there are no sub-plans involved in the qual and targetlist, is the estate really required in the worker to achieve the parallel scan?

Is it possible to initialize the qual, targetlist and projinfo with a local Estate structure created in the worker, with the help of the plan structure received from the backend?

Or, in case the estate is required in the worker for processing, is there any better way to share backend data structures with the workers?

Any suggestions?

Regards,
Hari Babu
Fujitsu Australia
[HACKERS] Parallel Sequence Scan doubts
Corrected subject.

Hi Hackers,

Implementation of Parallel Sequence Scan

Approach:

1. Parallel Sequence Scan can be achieved by using background workers doing the job of the actual sequence scan, including the qualification check also.
2. The planner generates the parallel scan plan by checking the possible criteria of the table to do a parallel scan and generates the tasks (ranges of blocks).
3. In the executor Init phase, try to copy the necessary data required by the workers and start the workers.
4. In the executor run phase, just get the tuples which are sent by the workers and process them further in the plan node execution.

Some of the problems I am thinking about:

1. Data structures that are required to be copied from backend to worker are currentTransactionState, Snapshot, GUC, ComboCID, Estate, etc.

I see some problems in copying the Estate data structure into shared memory because it contains so many pointers. There is a need for some infrastructure to copy these data structures into shared memory.

If the backend takes care of locking the relation and there are no sub-plans involved in the qual and targetlist, is the estate really required in the worker to achieve the parallel scan?

Is it possible to initialize the qual, targetlist and projinfo with a local Estate structure created in the worker, with the help of the plan structure received from the backend?

Or, in case the estate is required in the worker for processing, is there any better way to share backend data structures with the workers?

Any suggestions?

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] [BUGS] BUG #9652: inet types don't support min/max
On Mon, Aug 4, 2014 at 3:22 PM, Asif Naeem anaeem...@gmail.com wrote:
> Sorry for not being clear; the above-mentioned test is related to the
> following doc (sgml) changes that seem not to work as described:
>
> Table 9-35. cidr and inet Functions
>
> Function         | Return Type | Description               | Example                           | Result
> -----------------|-------------|---------------------------|-----------------------------------|------------
> max(inet, inet)  | inet        | larger of two inet types  | max('192.168.1.5', '192.168.1.4') | 192.168.1.5
> min(inet, inet)  | inet        | smaller of two inet types | min('192.168.1.5', '192.168.1.4') | 192.168.1.4
>
> Can you please elaborate on these sgml changes?

I thought of adding them for the newly added network functions, but mistakenly I kept the names as max and min. Anyway, with your suggestion in the earlier mail, these changes are not required; I removed them in the attached patch. Thanks for reviewing the patch.

Regards,
Hari Babu
Fujitsu Australia

inet_agg_v7.patch
Description: Binary data
Re: [HACKERS] [BUGS] BUG #9652: inet types don't support min/max
On Thu, Jul 24, 2014 at 5:59 PM, Asif Naeem anaeem...@gmail.com wrote:
> Sorry for being late. Thank you for sharing the updated patch. The sgml
> changes seem not to be working:
>
> postgres=# select max('192.168.1.5', '192.168.1.4');
> ERROR:  function max(unknown, unknown) does not exist
> LINE 1: select max('192.168.1.5', '192.168.1.4');
>                ^
> HINT:  No function matches the given name and argument types. You might
> need to add explicit type casts.
> postgres=# select min('192.168.1.5', '192.168.1.4');
> ERROR:  function min(unknown, unknown) does not exist
> LINE 1: select min('192.168.1.5', '192.168.1.4');
>                ^
> HINT:  No function matches the given name and argument types. You might
> need to add explicit type casts.

I didn't get your point. I tested the patch, and the sgml changes are working fine.

> I would suggest the following or similar changes, e.g. in
> doc/src/sgml/func.sgml:
>
>     </indexterm>
>     <function>max(<replaceable class="parameter">expression</replaceable>)</function>
>    </entry>
> -  <entry>any array, numeric, string, or date/time type</entry>
> +  <entry>any inet, array, numeric, string, or date/time type</entry>
>    <entry>same as argument type</entry>
>    <entry>
>     maximum value of <replaceable
> @@ -12204,7 +12228,7 @@
>     </indexterm>
>     <function>min(<replaceable class="parameter">expression</replaceable>)</function>
>    </entry>
> -  <entry>any array, numeric, string, or date/time type</entry>
> +  <entry>any inet, array, numeric, string, or date/time type</entry>
>    <entry>same as argument type</entry>
>    <entry>
>     minimum value of <replaceable

Thanks for reviewing the patch. I missed the above change; corrected it in the attached patch.

Regards,
Hari Babu
Fujitsu Australia

inet_agg_v6.patch
Description: Binary data
Re: [HACKERS] [BUGS] BUG #9652: inet types don't support min/max
On Mon, Jul 7, 2014 at 6:59 PM, Asif Naeem anaeem...@gmail.com wrote: Hi Haribabu, Thank you for sharing the patch. I have spent some time to review the changes. Overall patch looks good to me, make check and manual testing seems run fine with it. There seems no related doc/sgml changes ?. Patch added network_smaller() and network_greater() functions but in PG source code, general practice seems to be to use “smaller and “larger” as related function name postfix e.g. timestamp_smaller()/timestamp_larger(), interval_smaller/interval_larger(), cashsmaller()/cashlarger() etc. Thanks. Thanks for reviewing the patch. I corrected the function names as smaller and larger. and also added documentation changes. Updated patch attached in the mail. Regards, Hari Babu Fujitsu Australia *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** *** 8647,8652 CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple --- 8647,8676 row entry indexterm + primarymax/primary + /indexterm + literalfunctionmax(typeinet/type, typeinet/type)/function/literal + /entry + entrytypeinet/type/entry + entrylarger of two inet types/entry + entryliteralmax('192.168.1.5', '192.168.1.4')/literal/entry + entryliteral192.168.1.5/literal/entry +/row +row + entry + indexterm + primarymin/primary + /indexterm + literalfunctionmin(typeinet/type, typeinet/type)/function/literal + /entry + entrytypeinet/type/entry + entrysmaller of two inet types/entry + entryliteralmin('192.168.1.5', '192.168.1.4')/literal/entry + entryliteral192.168.1.4/literal/entry +/row +row + entry + indexterm primarynetmask/primary /indexterm literalfunctionnetmask(typeinet/type)/function/literal *** a/src/backend/utils/adt/network.c --- b/src/backend/utils/adt/network.c *** *** 471,476 network_ne(PG_FUNCTION_ARGS) --- 471,499 PG_RETURN_BOOL(network_cmp_internal(a1, a2) != 0); } + Datum + network_smaller(PG_FUNCTION_ARGS) + { + inet *a1 = PG_GETARG_INET_PP(0); + inet *a2 = PG_GETARG_INET_PP(1); + + if 
(network_cmp_internal(a1, a2) < 0)
+ PG_RETURN_INET_P(a1);
+ else
+ PG_RETURN_INET_P(a2);
+ }
+
+ Datum
+ network_larger(PG_FUNCTION_ARGS)
+ {
+ inet *a1 = PG_GETARG_INET_PP(0);
+ inet *a2 = PG_GETARG_INET_PP(1);
+
+ if (network_cmp_internal(a1, a2) > 0)
+ PG_RETURN_INET_P(a1);
+ else
+ PG_RETURN_INET_P(a2);
+ }
/* * Support function for hash indexes on inet/cidr. */ *** a/src/include/catalog/pg_aggregate.h --- b/src/include/catalog/pg_aggregate.h *** *** 164,169 DATA(insert ( 2050 n 0 array_larger ----f f 1073 2277 0 0 0 _nu --- 164,170 DATA(insert ( 2244 n 0 bpchar_larger ----f f 1060 1042 0 0 0 _null_ _null_ )); DATA(insert ( 2797 n 0 tidlarger ----f f 2800 27 0 0 0 _null_ _null_ )); DATA(insert ( 3526 n 0 enum_larger ----f f 3519 3500 0 0 0 _null_ _null_ )); + DATA(insert ( 3255 n 0 network_larger ----f f 1205 869 0 0 0 _null_ _null_ )); /* min */ DATA(insert ( 2131 n 0 int8smaller ----f f 412 20 0 0 0 _null_ _null_ )); *** *** 186,191 DATA(insert ( 2051 n 0 array_smaller ----f f 1072 2277 0 0 0 _n --- 187,193 DATA(insert ( 2245 n 0 bpchar_smaller ----f f 1058 1042 0 0 0 _null_ _null_ )); DATA(insert ( 2798 n 0 tidsmaller ----f f 2799 27 0 0 0 _null_ _null_ )); DATA(insert ( 3527 n 0 enum_smaller ----f f 3518 3500 0 0 0 _null_ _null_ )); + DATA(insert ( 3256 n 0 network_smaller ----f f 1203 869 0 0 0 _null_ _null_ )); /* count */ DATA(insert ( 2147 n 0 int8inc_any -int8inc_any int8dec_any -f f 0 20 0 20 0 0 0 )); *** a/src/include/catalog/pg_proc.h --- b/src/include/catalog/pg_proc.h *** *** 2120,2125 DATA(insert OID = 922 ( network_le PGNSP PGUID 12 1 0 0 0 f f f t t f i 2 0 1 --- 2120,2129 DATA(insert OID = 923 ( network_gt PGNSP PGUID 12 1 0 0 0 f f f t t f i 2 0 16 869 869 _null_ _null_ _null_ _null_ network_gt _null_ _null_ _null_ )); DATA(insert OID = 924 ( network_ge PGNSP PGUID 12 1 0 0 0 f f f t t f i 2 0 16 869 869 _null_ _null_ _null_ _null_ network_ge _null_ _null_ _null_ )); DATA(insert OID = 925 ( network_ne PGNSP PGUID 12 1 0 0 0 f f f t t f i 2 0 16
869 869 _null_ _null_ _null_ _null_ network_ne _null_ _null_ _null_ )); + DATA(insert OID = 3257 ( network_larger PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 869 869 869 _null_ _null_ _null_ _null_ network_larger _null_ _null_ _null_ )); + DESCR("larger of two network types"); + DATA(insert OID = 3258 (
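For reference, PostgreSQL's network_cmp_internal() orders inet values by their common network bits, then by masklen, then by the full address; for plain host addresses this coincides with ordinary IP address ordering. A quick Python sketch (illustrative only, not the C implementation) of what the patch's min/max aggregates are expected to return:

```python
import ipaddress

def network_smaller(a, b):
    # Mirrors the intent of the patch's network_smaller():
    # return the lower of two addresses under the inet ordering.
    return a if a < b else b

def network_larger(a, b):
    return a if a > b else b

addrs = [ipaddress.ip_address(s) for s in ("192.168.1.5", "192.168.1.4", "127.0.0.1")]
print(min(addrs), max(addrs))  # 127.0.0.1 192.168.1.5
```

This matches the behaviour shown in the earlier test session (min = 127.0.0.1, max = 192.168.1.2 there).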
Re: [HACKERS] [WIP] showing index maintenance on EXPLAIN
On Thu, May 8, 2014 at 10:00 AM, Jaime Casanova ja...@2ndquadrant.com wrote:

Hi, This patch implements $subject only when ANALYZE and VERBOSE are on. I made it that way because for years nobody seemed interested in this info (at least no one did it), so I decided that maybe it is too much information for most people (actually btree indexes are normally very fast). But because we have GiST and GIN this became an interesting piece of data (there are other cases even when using btree). The current patch doesn't have docs yet; I will add them soon.

This patch provides the insertion time of an index operation. This information is useful to the administrator for reorganizing the indexes based on the insertion time.

Quick review: Applies to HEAD. Regression tests pass. Coding is fine.

Minor comments:
There is no need to print the index insertion time as zero in case of HOT update operations. Please correct the same.
Add the documentation changes.

Regards,
Hari Babu
Fujitsu Australia
[HACKERS] Re: [REVIEW] Re: Re: BUG #9578: Undocumented behaviour for temp tables created inside query language (SQL) functions
On Tue, Jun 17, 2014 at 12:30 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: (Cc: to pgsql-bugs dropped.) At 2014-03-17 18:24:55 +1100, kommi.harib...@gmail.com wrote: *** a/doc/src/sgml/xfunc.sgml --- b/doc/src/sgml/xfunc.sgml *** *** 153,159 SELECT clean_emp(); --- 153,186 (literal\/) (assuming escape string syntax) in the body of the function (see xref linkend=sql-syntax-strings). /para + +sect2 id=xfunc-sql-function-parsing-mechanism + titleParsing mechanism of a function/title +indexterm + primaryfunction/primary + secondaryparsing mechanism/secondary +/indexterm I suggest Catalog changes within functions instead of the above title. + para + The body of an SQL function is parsed as if it were a single - multi-part + statement and thus uses a constant snapshot of the system catalogs for + every sub-statement therein. Commands that alter the catalog will likely not + work as expected. + /para + + para + For example: Issuing CREATE TEMP TABLE within an SQL function will add + the table to the catalog but subsequent statements in the same function will + not see those additions and thus the temporary table will be invisible to them. + /para + + para + Thus it is generally advised that applicationPL/pgSQL/ be used, instead of + acronymSQL/acronym, when any catalog visibilities are required in the same function. + /para +/sect2 I don't think that much text is warranted. I suggest something like the following condensed wording: para The body of an SQL function is parsed as if it were a single multi-part statement, using a constant snapshot of the system catalogs. The effect of any commands that alter the catalogs (e.g. CREATE TEMP TABLE) will therefore not be visible to subsequent commands in the function body. /para para The recommended workaround is to use applicationPL/PgSQL/. /para Does that seem sensible to you? Looks good, Thanks for the review. Updated patch attached. 
Regards, Hari Babu Fujitsu Australia sql_functions_parsing_doc_v2.patch Description: Binary data
Re: [HACKERS] [BUGS] BUG #9652: inet types don't support min/max
On Thu, Jun 5, 2014 at 9:12 AM, Andres Freund and...@2ndquadrant.com wrote:

Hi, On 2014-06-04 10:37:48 +1000, Haribabu Kommi wrote: Thanks for the test. Patch is re-based to the latest head. Did you look at the source of the conflict? Did you intentionally mark the functions as leakproof and review that that truly is the case? Or was that caused by copy-paste?

Yes, it was a copy-paste mistake. I didn't know much about that parameter. Thanks for the information. I changed it.

Regards,
Hari Babu
Fujitsu Australia

inet_agg_v4.patch Description: Binary data
Re: [HACKERS] [BUGS] BUG #9652: inet types don't support min/max
On Wed, Jun 4, 2014 at 5:46 AM, Keith Fiske ke...@omniti.com wrote:

Andres's changes on June 3rd to https://github.com/postgres/postgres/commits/master/src/test/regress/expected/create_function_3.out are causing patch v2 to fail for that regression test file.

postgres $ patch -p1 -i ../inet_agg_v2.patch
patching file src/backend/utils/adt/network.c
patching file src/include/catalog/pg_aggregate.h
patching file src/include/catalog/pg_proc.h
patching file src/include/utils/builtins.h
patching file src/test/regress/expected/create_function_3.out
Hunk #1 FAILED at 153.
1 out of 1 hunk FAILED -- saving rejects to file src/test/regress/expected/create_function_3.out.rej
patching file src/test/regress/expected/inet.out
patching file src/test/regress/sql/inet.sql

Otherwise it applies without any issues to the latest HEAD. I built and started a new instance, and I was able to test at least the basic min/max functionality:

keith=# create table test_inet (id serial, ipaddress inet);
CREATE TABLE
Time: 25.546 ms
keith=# insert into test_inet (ipaddress) values ('192.168.1.1');
INSERT 0 1
Time: 3.143 ms
keith=# insert into test_inet (ipaddress) values ('192.168.1.2');
INSERT 0 1
Time: 2.932 ms
keith=# insert into test_inet (ipaddress) values ('127.0.0.1');
INSERT 0 1
Time: 1.786 ms
keith=# select min(ipaddress) from test_inet;
min
---
127.0.0.1
(1 row)
Time: 3.371 ms
keith=# select max(ipaddress) from test_inet;
max
-
192.168.1.2
(1 row)
Time: 1.104 ms

Thanks for the test. Patch is re-based to the latest head.

Regards,
Hari Babu
Fujitsu Australia

inet_agg_v3.patch Description: Binary data
Re: [HACKERS] [BUGS] BUG #9652: inet types don't support min/max
On Thu, May 29, 2014 at 3:28 AM, Keith Fiske ke...@omniti.com wrote: On Sun, Mar 23, 2014 at 10:42 PM, Haribabu Kommi kommi.harib...@gmail.com wrote: On Fri, Mar 21, 2014 at 5:17 PM, dar...@dons.net.au wrote: The following bug has been logged on the website: reclog= select * from foo; bar - 1.2.3.4 (1 row) reclog= select min(bar) from foo; ERROR: function min(inet) does not exist LINE 1: select min(bar) from foo; ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. This is surprising to me since the inet type is ordered, hence min/max are possible. In the code, some comparison logic for the inet datatypes already present. With those I thought it is easy to support the min and max aggregates also. So I modified the code to support the aggregate functions of min and max by using the already existing inet comparison logic. Is there anything I am missing? postgres=# create table tbl(f inet); CREATE TABLE postgres=# insert into tbl values('1.2.3.5'); INSERT 0 1 postgres=# insert into tbl values('1.2.3.4'); INSERT 0 1 postgres=# select min(f) from tbl; min - 1.2.3.4 (1 row) postgres=# select max(f) from tbl; max - 1.2.3.5 (1 row) Patch is attached. This is my first time reviewing a patch, so apologies if I've missed something in the process. I tried applying your patch to the current git HEAD as of 2014-05-27 and the portion against pg_aggregate.h fails postgres $ patch -Np1 -i ../inet_agg.patch patching file src/backend/utils/adt/network.c Hunk #1 succeeded at 471 (offset -49 lines). patching file src/include/catalog/pg_aggregate.h Hunk #1 FAILED at 140. Hunk #2 FAILED at 162. 2 out of 2 hunks FAILED -- saving rejects to file src/include/catalog/pg_aggregate.h.rej patching file src/include/catalog/pg_proc.h Hunk #1 succeeded at 2120 (offset 8 lines). Hunk #2 succeeded at 3163 (offset 47 lines). Hunk #3 succeeded at 3204 (offset 47 lines). 
patching file src/include/utils/builtins.h Hunk #1 succeeded at 907 (offset 10 lines). Looks like starting around April 12th some additional changes were made to the DATA lines in this file that have to be accounted for. https://github.com/postgres/postgres/commits/master/src/include/catalog/pg_aggregate.h Don't have the knowledge of how to fix this for the new format, but thought I'd at least try to apply the patch and see if it works. Thanks for verifying the patch. Please find the re-based patch attached in the mail. Regards, Hari Babu Fujitsu Australia inet_agg_v2.patch Description: Binary data
Re: [HACKERS] Compression of full-page-writes
On Tue, May 13, 2014 at 3:33 AM, Fujii Masao masao.fu...@gmail.com wrote: On Sun, May 11, 2014 at 7:30 PM, Simon Riggs si...@2ndquadrant.com wrote: On 30 August 2013 04:55, Fujii Masao masao.fu...@gmail.com wrote: My idea is very simple, just compress FPW because FPW is a big part of WAL. I used pglz_compress() as a compression method, but you might think that other method is better. We can add something like FPW-compression-hook for that later. The patch adds new GUC parameter, but I'm thinking to merge it to full_page_writes parameter to avoid increasing the number of GUC. That is, I'm thinking to change full_page_writes so that it can accept new value 'compress'. * Result [tps] 1386.8 (compress_backup_block = off) 1627.7 (compress_backup_block = on) [the amount of WAL generated during running pgbench] 4302 MB (compress_backup_block = off) 1521 MB (compress_backup_block = on) Compressing FPWs definitely makes sense for bulk actions. I'm worried that the loss of performance occurs by greatly elongating transaction response times immediately after a checkpoint, which were already a problem. I'd be interested to look at the response time curves there. Yep, I agree that we should check how the compression of FPW affects the response time, especially just after checkpoint starts. I was thinking about this and about our previous thoughts about double buffering. FPWs are made in foreground, so will always slow down transaction rates. If we could move to double buffering we could avoid FPWs altogether. Thoughts? If I understand the double buffering correctly, it would eliminate the need for FPW. But I'm not sure how easy we can implement the double buffering. There is already a patch on the double buffer write to eliminate the FPW. But It has some performance problem because of CRC calculation for the entire page. 
http://www.postgresql.org/message-id/1962493974.656458.1327703514780.javamail.r...@zimbra-prod-mbox-4.vmware.com I think this patch can be further modified with the latest multi-core CRC calculation and can then be used for testing.

Regards,
Hari Babu
Fujitsu Australia
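As a rough illustration of the idea under discussion (compress each full-page image before writing it to WAL), here is a Python sketch. It uses zlib as a stand-in for pglz, and my understanding of the fallback behaviour (store the raw page when compression does not actually shrink it) is an assumption, not a reading of the patch:

```python
import zlib

BLCKSZ = 8192  # PostgreSQL block size

def backup_block_payload(page: bytes):
    """Compress a full-page image; keep the raw page if compression
    does not shrink it. zlib stands in for pglz here."""
    assert len(page) == BLCKSZ
    compressed = zlib.compress(page)
    if len(compressed) < len(page):
        return True, compressed
    return False, page

# A heap page full of similar tuples compresses well.
page = (b"tuple header | column data | padding " * 256)[:BLCKSZ]
is_compressed, payload = backup_block_payload(page)
print(is_compressed, len(payload), "of", BLCKSZ)
```

The benchmark figures quoted above (4302 MB down to 1521 MB of WAL) reflect exactly this kind of reduction, traded against the CPU cost of compressing in the foreground.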
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
On Mon, May 5, 2014 at 1:09 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Mon, Feb 17, 2014 at 7:38 AM, Haribabu Kommi kommi.harib...@gmail.com wrote: I modified the autovac_balance_cost function to balance the costs using the number of running workers, instead of default vacuum cost parameters. Lets assume there are 4 workers running currently with default cost values of limit 200 and delay 20ms. The cost will be distributed as 50 and 10ms each. Suppose if one worker is having a different cost limit value as 1000, which is 5 times more than default value. The cost will be distributed as 50 and 10ms each for other 3 workers and 250 and 10ms for the worker having cost limit value other than default. By this way also it still ensures the cost limit value is 5 times more than other workers. Won't this change break the basic idea of autovacuum_vacuum_cost_limit which is as follows: Note that the value is distributed proportionally among the running autovacuum workers, if there is more than one, so that the sum of the limits of each worker never exceeds the limit on this variable.. It is not breaking the behavior. This setting can be overridden for individual tables by changing storage parameters. Still the cost values for the default tables are under the guc limit. Basically with proposed change, the sum of limits of each worker will be more than autovacuum_vacuum_cost_limit and I think main reason for same is that the new calculation doesn't consider autovacuum_vacuum_cost_limit or other similar parameters. If user doesn't provide any table specific value then autovacuum_vacuum_cost_limit guc value is set to the table. So the same is used in the calculation. I think current calculation gives appropriate consideration for table level vacuum settings when autovacuum_vacuum_cost_limit is configured with more care (i.e it is more than table level settings). 
As an example consider the below case:

autovacuum_vacuum_cost_limit = 10000
t1 (autovacuum_vacuum_cost_limit = 1000), t2 (default), t3 (default), t4 (default)

Consider other settings as default. Now cost_limit after autovac_balance_cost is as follows:

Worker-1 for t1 = 322
Worker-2 for t2 = 3225
Worker-3 for t3 = 3225
Worker-4 for t4 = 3225

So in this way proper consideration has been given to table-level vacuum settings and the guc configured for autovacuum_vacuum_cost_limit with the current code.

It works for the case where the table-specific values are less than the default cost limit; the same logic doesn't work with higher values. Usually the table-specific values are set higher than the default for the tables where faster vacuuming is expected.

Now it might be the case that we want to improve the current calculation for cases where it doesn't work well, but I think it has to be better than the current behaviour, and it is better to consider both gucs and table-level settings with some better formula.

With the proposed change, it works fine whether the table-specific value is higher or lower than the default value. It works on the ratio of the table-specific value to the default value.

default autovacuum_vacuum_cost_limit = 10000
t1 - 1000, t2 - default, t3 - default, t4 - default -- balance costs: t1 - 250, t2 - 2500, t3 - 2500, t4 - 2500.
t1 - 20000, t2 - default, t3 - default, t4 - default -- balance costs: t1 - 5000, t2 - 2500, t3 - 2500, t4 - 2500.

Regards,
Hari Babu
Fujitsu Australia
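The proposed balancing rule can be sketched as follows. This is my reading of the proposal, not the actual autovac_balance_cost() code: split the default limit evenly across the running workers, then scale each worker's share by the ratio of its table-specific cost limit to the default.

```python
def balance_costs(default_limit, table_limits):
    """Distribute the vacuum cost budget across running workers.

    table_limits holds one entry per worker: None for a table with no
    per-table setting, else its autovacuum_vacuum_cost_limit. A
    table-specific limit keeps its ratio to the default after balancing.
    """
    base = default_limit // len(table_limits)
    return [base * (t if t is not None else default_limit) // default_limit
            for t in table_limits]

# Four workers, default limit 10000, t1 overridden to 20000:
print(balance_costs(10000, [20000, None, None, None]))  # [5000, 2500, 2500, 2500]
# t1 overridden to 1000 instead:
print(balance_costs(10000, [1000, None, None, None]))   # [250, 2500, 2500, 2500]
```

Note the trade-off discussed above: with an override higher than the default, the per-worker limits now sum to more than autovacuum_vacuum_cost_limit, which the current code never allows.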
Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C
On Fri, Apr 11, 2014 at 7:44 AM, Bruce Momjian br...@momjian.us wrote: Can someone with Windows expertise comment on whether this should be applied? I tested the same in windows and it is working as specified. The same background running server can be closed with ctrl+break command. --- On Tue, Jan 7, 2014 at 12:44:33PM +0100, Christian Ullrich wrote: Hello all, when pg_ctl start is used to run PostgreSQL in a console window on Windows, it runs in the background (it is terminated by closing the window, but that is probably inevitable). There is one problem, however: The first Ctrl-C in that window, no matter in which situation, will cause the background postmaster to exit. If you, say, ping something, and press Ctrl-C to stop ping, you probably don't want the database to go away, too. The reason is that Windows delivers the Ctrl-C event to all processes using that console, not just to the foreground one. Here's a patch to fix that. pg_ctl stop still works, and it has no effect when running as a service, so it should be safe. It starts the postmaster in a new process group (similar to calling setpgrp() after fork()) that does not receive Ctrl-C events from the console window. -- Christian diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c new file mode 100644 index 50d4586..89a9544 *** a/src/bin/pg_ctl/pg_ctl.c --- b/src/bin/pg_ctl/pg_ctl.c *** CreateRestrictedProcess(char *cmd, PROCE *** 1561,1566 --- 1561,1567 HANDLE restrictedToken; SID_IDENTIFIER_AUTHORITY NtAuthority = {SECURITY_NT_AUTHORITY}; SID_AND_ATTRIBUTES dropSids[2]; + DWORD flags; /* Functions loaded dynamically */ __CreateRestrictedToken _CreateRestrictedToken = NULL; *** CreateRestrictedProcess(char *cmd, PROCE *** 1636,1642 AddUserToTokenDacl(restrictedToken); #endif ! 
r = CreateProcessAsUser(restrictedToken, NULL, cmd, NULL, NULL, TRUE, CREATE_SUSPENDED, NULL, NULL, &si, &processInfo); Kernel32Handle = LoadLibrary("KERNEL32.DLL"); if (Kernel32Handle != NULL) --- 1637,1650 AddUserToTokenDacl(restrictedToken); #endif ! flags = CREATE_SUSPENDED; ! ! /* Protect console process from Ctrl-C */ ! if (!as_service) { ! flags |= CREATE_NEW_PROCESS_GROUP; ! } ! ! r = CreateProcessAsUser(restrictedToken, NULL, cmd, NULL, NULL, TRUE, flags, NULL, NULL, &si, &processInfo); Kernel32Handle = LoadLibrary("KERNEL32.DLL"); if (Kernel32Handle != NULL)

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C
On Fri, Apr 11, 2014 at 12:12 PM, Bruce Momjian br...@momjian.us wrote: On Fri, Apr 11, 2014 at 11:58:58AM +1000, Haribabu Kommi wrote: On Fri, Apr 11, 2014 at 7:44 AM, Bruce Momjian br...@momjian.us wrote: Can someone with Windows expertise comment on whether this should be applied? I tested the same in Windows and it is working as specified. The same background running server can be closed with the Ctrl+Break command. OK. If I apply this, are you able to test to see if the problem is fixed?

I already tested on HEAD by applying the patch attached in the earlier mail. With the Ctrl+C command the background process is not closed, but with the Ctrl+Break command it can be closed. If this behavior is fine, then we can apply the patch.

Regards,
Hari Babu
Fujitsu Australia
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Mon, Mar 17, 2014 at 11:45 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

Hello, the attached patches are revised ones according to the latest custom-plan interface patch (v11). The cache-scan module was re-implemented on the newer interface, and I also noticed the extension did not handle redirected tuples correctly, so I revised the logic in ccache_vacuum_page() totally. It now synchronizes the cached tuples per page, not per tuple, and also tries to merge t-tree chunks on a per-page basis. Also, I split the patches again because the *demonstration* part is much larger than the patches to the core backend. It will help reviewing.

* pgsql-v9.4-vacuum_page_hook.v11.patch - adds a hook for each page being vacuumed; that is needed to synchronize the status of the in-memory cache managed by the extension.
* pgsql-v9.4-mvcc_allows_cache.v11.patch - allows running HeapTupleSatisfiesVisibility() on the tuples in the in-memory cache, not on the heap.
* pgsql-v9.4-example-cache_scan.v11.patch - demonstrates the usage of the above two patches. It allows scanning a relation without storage access if possible.

All the patches are good. The cache scan extension patch may need further refinement in terms of performance improvement, but the same can be handled later also. So I am marking the patch as ready for committer. Thanks for the patch.

Regards,
Hari Babu
Fujitsu Australia
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Wed, Mar 12, 2014 at 5:26 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

Thanks for your efforts!

               Head      Patched   Diff
Select - 500K  772ms     2659ms    -200%
Insert - 400K  3429ms    1948ms     43%  (I am not sure how it improved in this case)
Delete - 200K  2066ms    3978ms    -92%
Update - 200K  3915ms    5899ms    -50%

This patch has shown how the custom scan can be used very well, but the patch as it is has some performance problems which need to be investigated. I attached the test script file used for the performance test.

First of all, it seems to me your test case has too small a data set, which allows holding all the data in memory - briefly, 500K records of 200 bytes will consume about 100MB. Your configuration allocates 512MB of shared_buffers, and about 3GB of OS-level page cache is available. (Note that Linux uses free memory as disk cache adaptively.)

Thanks for the information, and a small correction: the total number of records is 5 million; the select operation is selecting 500K records. The total table size is around 1GB. Once I get your new patch re-based on the custom scan patch, I will test the performance again by increasing my database size beyond the RAM size. I will also make sure that the memory available for disk cache is less.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] issue log message to suggest VACUUM FULL if a table is nearly empty
On Tue, Mar 11, 2014 at 2:59 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Mon, Mar 10, 2014 at 1:13 PM, Haribabu Kommi kommi.harib...@gmail.com wrote: On Mon, Mar 10, 2014 at 4:24 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Mon, Mar 10, 2014 at 5:58 AM, Wang, Jing ji...@fast.au.fujitsu.com wrote: Enclosed is the patch to implement the requirement that issue log message to suggest VACUUM FULL if a table is nearly empty. The requirement comes from the Postgresql TODO list. I think it would be better if we can use some existing stats to issue warning message rather than traversing the FSM for all pages. For example after vacuuming page in lazy_scan_heap(), we update the freespace for page. You can refer below line in lazy_scan_heap(). freespace = PageGetHeapFreeSpace(page); Now it might be possible that we might not get freespace info easily as it is not accumulated for previous vacuum's. Incase there is no viable way to get it through vacuum stats, we are already updating fsm after vacuum by FreeSpaceMapVacuum(), where I think it should be possible to get freespace. yes this way it works without extra penalty. But the problem is how to calculate the free space which is left in the skipped pages because of visibility bit. One way could be by extrapolating (vac_estimate_reltuples) like we do for some other stats, but not sure if we can get the correct estimates. The main reason is that if you observe that code path, all the decisions are mainly done on the basis of vacrelstats. I have not checked in detail if by using any other stats, this purpose can be achieved, may be once you can look into it. I checked the vac_estimate_reltuples() function, but not able to find a proper way to identify the free space. By the way have you checked if FreeSpaceMapVacuum() can serve your purpose, because this call already traverses FSM in depth-first order to update the freespace. 
So maybe by using this call, or a wrapper on it such that it returns the total freespace as well apart from updating the freespace, can serve the need. Thanks for the information. We can get the table free space by writing a wrapper or by slightly modifying the FreeSpaceMapVacuum() function. This way it will not add any extra overhead in identifying whether the table is almost empty or not.

Regards,
Hari Babu
Fujitsu Australia
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Thu, Mar 6, 2014 at 10:15 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2014-03-06 18:17 GMT+09:00 Haribabu Kommi kommi.harib...@gmail.com: I will update you later regarding the performance test results.

I ran the performance test on the cache scan patch and below are the readings.

Configuration:
Shared_buffers - 512MB
cache_scan.num_blocks - 600
checkpoint_segments - 255

Machine:
OS - centos - 6.4
CPU - 4 core 2.5 GHZ
Memory - 4GB

               Head      Patched   Diff
Select - 500K  772ms     2659ms    -200%
Insert - 400K  3429ms    1948ms     43%  (I am not sure how it improved in this case)
Delete - 200K  2066ms    3978ms    -92%
Update - 200K  3915ms    5899ms    -50%

This patch has shown how the custom scan can be used very well, but the patch as it is has some performance problems which need to be investigated. I attached the test script file used for the performance test.

Regards,
Hari Babu
Fujitsu Australia

shared_buffers = 512MB
shared_preload_libraries = 'cache_scan'
cache_scan.num_blocks = 600
checkpoint_segments = 255
\timing

--cache scan select 5 million
create table test(f1 int, f2 char(70), f3 float, f4 char(100));
truncate table test;
insert into test values (generate_series(1,500), 'fujitsu', 1.1, 'Australia software tech pvt ltd');
checkpoint;
CREATE TRIGGER test_cache_row_sync AFTER INSERT OR UPDATE OR DELETE ON test FOR ROW EXECUTE PROCEDURE cache_scan_synchronizer();
CREATE TRIGGER test_cache_stmt_sync AFTER TRUNCATE ON test FOR STATEMENT EXECUTE PROCEDURE cache_scan_synchronizer();
vacuum analyze test;
explain select count(*) from test where f1 > 0;
select count(*) from test where f1 > 0;
select count(*) from test where f1 > 0;
delete from test where f1 > 100 and f1 <= 120;
update test set f1 = f1 + 300 where f1 > 120 and f1 <= 140;
insert into test values (generate_series(101, 140), 'fujitsu', 1.1, 'Australia software tech pvt ltd');
drop table test cascade;

--sequence scan select 5 million
create table test(f1 int, f2 char(70), f3 float, f4 char(100));
truncate table test;
insert into test values
(generate_series(1,500), 'fujitsu', 1.1, 'Australia software tech pvt ltd');
checkpoint;
vacuum analyze test;
set cache_scan.enabled = off;
explain select count(*) from test where f1 > 0;
select count(*) from test where f1 > 0;
delete from test where f1 > 100 and f1 <= 120;
update test set f1 = f1 + 300 where f1 > 120 and f1 <= 140;
insert into test values (generate_series(101, 140), 'fujitsu', 1.1, 'Australia software tech pvt ltd');
drop table test cascade;

--cache scan select 500K
create table test(f1 int, f2 char(70), f3 float, f4 char(100));
truncate table test;
insert into test values (generate_series(1,500), 'fujitsu', 1.1, 'Australia software tech pvt ltd');
checkpoint;
CREATE TRIGGER test_cache_row_sync AFTER INSERT OR UPDATE OR DELETE ON test FOR ROW EXECUTE PROCEDURE cache_scan_synchronizer();
CREATE TRIGGER test_cache_stmt_sync AFTER TRUNCATE ON test FOR STATEMENT EXECUTE PROCEDURE cache_scan_synchronizer();
vacuum analyze test;
explain select count(*) from test where f1 % 10 = 5;
select count(*) from test where f1 % 10 = 5;
select count(*) from test where f1 % 10 = 5;
delete from test where f1 > 100 and f1 <= 120;
update test set f1 = f1 + 300 where f1 > 120 and f1 <= 140;
insert into test values (generate_series(101, 140), 'fujitsu', 1.1, 'Australia software tech pvt ltd');
drop table test cascade;

--sequence scan select 500K
create table test(f1 int, f2 char(70), f3 float, f4 char(100));
truncate table test;
insert into test values (generate_series(1,500), 'fujitsu', 1.1, 'Australia software tech pvt ltd');
checkpoint;
vacuum analyze test;
set cache_scan.enabled = off;
explain select count(*) from test where f1 % 10 = 5;
select count(*) from test where f1 % 10 = 5;
delete from test where f1 > 100 and f1 <= 120;
update test set f1 = f1 + 300 where f1 > 120 and f1 <= 140;
insert into test values (generate_series(101, 140), 'fujitsu', 1.1, 'Australia software tech pvt ltd');
drop table test cascade;
Re: [HACKERS] issue log message to suggest VACUUM FULL if a table is nearly empty
On Mon, Mar 10, 2014 at 4:24 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Mon, Mar 10, 2014 at 5:58 AM, Wang, Jing ji...@fast.au.fujitsu.com wrote: Enclosed is the patch to implement the requirement to issue a log message suggesting VACUUM FULL if a table is nearly empty. The requirement comes from the PostgreSQL TODO list.

[Solution details] A check function is added in the function 'lazy_vacuum_rel' to check whether the table is large enough and contains a large number of unused rows. If it is, a log message is issued suggesting the use of 'VACUUM FULL' on the table. The judgement policy is as follows: If relpages of the table > RELPAGES_VALUES_THRESHOLD (default 1000), the table is considered to be large enough. If free_space/total_space > FREESPACE_PERCENTAGE_THRESHOLD (default 0.5), the table is considered to have a large number of unused rows. The free_space is calculated by reading the details from the FSM pages. This may increase the I/O, but very few FSM pages are expected, so it shouldn't cause much overhead.

I think it would be better if we can use some existing stats to issue the warning message rather than traversing the FSM for all pages. For example, after vacuuming a page in lazy_scan_heap(), we update the freespace for the page. You can refer to the below line in lazy_scan_heap(): freespace = PageGetHeapFreeSpace(page); Now it might be possible that we cannot get the freespace info easily, as it is not accumulated for previous vacuums. In case there is no viable way to get it through vacuum stats, we are already updating the fsm after vacuum via FreeSpaceMapVacuum(), where I think it should be possible to get the freespace.

Yes, this way it works without extra penalty. But the problem is how to calculate the free space left in the pages that were skipped because of the visibility bit. In a normal scenario, the pages which get skipped during the vacuum process are few in number, so this approach is a good choice.
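The proposed check could be sketched as below. The threshold names come from the patch description in the mail; the function name and signature are my own assumption, not the patch's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Thresholds as described in the mail. */
#define RELPAGES_VALUES_THRESHOLD       1000
#define FREESPACE_PERCENTAGE_THRESHOLD  0.5

/*
 * Decide whether to log a suggestion to run VACUUM FULL: the table
 * must be large enough (relpages) and be mostly free space, where
 * free_space is accumulated from the FSM (or vacuum stats).
 */
bool
suggest_vacuum_full(uint32_t relpages, double free_space, double total_space)
{
    if (relpages <= RELPAGES_VALUES_THRESHOLD)
        return false;               /* table too small to bother */
    if (total_space <= 0)
        return false;               /* avoid division by zero */
    return (free_space / total_space) > FREESPACE_PERCENTAGE_THRESHOLD;
}
```

The two thresholds are independent: a huge but well-packed table, or a tiny but empty one, both stay quiet.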
Regards, Hari Babu Fujitsu Australia
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Tue, Mar 4, 2014 at 3:07 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

4. + cchunk = ccache_vacuum_tuple(ccache, ccache->root_chunk, ctid);
   + if (pchunk != NULL && pchunk != cchunk)
   + ccache_merge_chunk(ccache, pchunk);
   + pchunk = cchunk;

The merge_chunk is called only when the heap tuples are spread across two cache chunks. Actually one cache chunk can accommodate one or more heap pages. It needs some other way of handling.

I adjusted the logic to merge the chunks as follows: Once a tuple is vacuumed from a chunk, it also checks whether it can be merged with its child leafs. A chunk has up to two child leafs; the left one has a lesser ctid than the parent, and the right one has a greater ctid. It means a chunk without a right child in the left sub-tree, or a chunk without a left child in the right sub-tree, is a neighbor of the chunk being vacuumed. In addition, if the vacuumed chunk does not have either (or both) of its children, it can be merged with the parent node. I modified ccache_vacuum_tuple() to merge chunks during the t-tree walk-down, if the vacuumed chunk has enough free space.

The patch looks good. Regarding merging of the nodes, instead of checking whether a merge is possible for every tuple that is vacuumed, can we use some kind of threshold: whenever a node becomes 50% free, try to merge it from the leaf nodes until it is 90% full? The remaining 10% will be left for the next inserts on the node. What do you say?

I will update you later regarding the performance test results.

Regards, Hari Babu Fujitsu Australia
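One possible reading of the threshold idea above, as a sketch (all names hypothetical; the entry size is fixed at 16 bytes purely for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHUNK_SIZE 1024

typedef struct Chunk
{
    int used;               /* bytes currently used in this chunk */
    struct Chunk *left;
    struct Chunk *right;
} Chunk;

/* A merge is attempted only once a chunk is at least 50% free... */
bool
chunk_wants_merge(const Chunk *c)
{
    return c->used * 2 <= CHUNK_SIZE;
}

/* ...and child entries are pulled up only while we stay under 90% full,
 * leaving the last 10% for subsequent inserts on the node. */
int
merge_from_child(Chunk *parent, Chunk *child)
{
    int moved = 0;
    int limit = (CHUNK_SIZE * 9) / 10;

    while (child->used > 0 && parent->used < limit)
    {
        parent->used += 16;     /* move one fixed-size entry up */
        child->used -= 16;
        moved++;
    }
    return moved;
}
```

The point of the threshold is to amortize the merge check: it runs once per half-empty chunk rather than once per vacuumed tuple.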
Re: [HACKERS] The behavior of CheckRequiredParameterValues()
On Wed, Mar 5, 2014 at 4:09 AM, Sawada Masahiko sawada.m...@gmail.com wrote: Hi all, I had doubts regarding the behavior of the CheckRequiredParameterValues() function. I could not start a standby server which was created by pg_basebackup with the following scenario.

1. Start the master server with 'wal_level = archive', 'hot_standby = on' and the other replication settings.
2. Create the standby server from the master server by using pg_basebackup.
3. Change the wal_level value of both master and standby server to 'hot_standby'.
4. Restart the master server.
5. Start the standby server.

In #5, I got the following error even though I set wal_level to 'hot_standby'. FATAL: hot standby is not possible because wal_level was not set to hot_standby or higher on the master server

I tried to investigate this behaviour. Currently the CheckRequiredParameterValues() function uses the wal_level value taken from ControlFile when comparing wal_level against WAL_LEVEL_HOT_STANDBY, as in the following code:

xlog.c:6177
if (ControlFile->wal_level < WAL_LEVEL_HOT_STANDBY)
    ereport(ERROR,
            (errmsg("hot standby is not possible because wal_level was not ...

So we have to start and stop the standby server with the changed wal_level (i.e., hot_standby) if we want to enable hot standby. In this case, I think the standby server doesn't need to check the wal_level value from ControlFile. I think it should check the value which is written in postgresql.conf.

The snapshot of running transaction information is written to WAL only when wal_level is set to 'hot_standby'. This information is required on the standby side to recreate the running transactions.

Regards, Hari Babu Fujitsu Australia
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Tue, Feb 25, 2014 at 11:13 AM, Haribabu Kommi kommi.harib...@gmail.com wrote: Thanks for the information, I will apply the other patches also and start testing.

When trying to run the pgbench test, by default the cache-scan plan is not chosen because of its higher cost. So I increased cpu_index_tuple_cost to a maximum value, or turned off index scans, so that the planner chooses the cache_scan as the least-cost plan. The configuration parameters changed during the test are:

shared_buffers - 2GB, cache_scan.num_blocks - 1024
wal_buffers - 16MB, checkpoint_segments - 255
checkpoint_timeout - 15 min, cpu_index_tuple_cost - 10 or enable_indexscan = off

Test procedure:
1. Initialize the database with pgbench with a 75 scale factor.
2. Create the triggers on pgbench_accounts.
3. Use a select query to load all the data into cache.
4. Run a simple update pgbench test.

Plan details of the pgbench simple update queries:

postgres=# explain update pgbench_accounts set abalance = abalance where aid = 10;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------------
 Update on pgbench_accounts  (cost=0.43..18.44 rows=1 width=103)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.43..18.44 rows=1 width=103)
         Index Cond: (aid = 10)
 Planning time: 0.045 ms
(4 rows)

postgres=# explain select abalance from pgbench_accounts where aid = 10;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------------
 Custom Scan (cache scan) on pgbench_accounts  (cost=0.00..99899.99 rows=1 width=4)
   Filter: (aid = 10)
 Planning time: 0.042 ms
(3 rows)

I am observing too much delay in the performance results. The performance test script is attached in the mail. Please let me know if you find any problem in the test.

Regards, Hari Babu Fujitsu Australia

run_reading.sh Description: Bourne shell script
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Tue, Feb 25, 2014 at 10:44 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: Thanks for your testing.

I am getting some compilation warnings while compiling the extension, and I am also not able to load the extension because of the undefined symbol get_restriction_qual_cost.

It seems to me you applied only the part-1 portion of the custom-scan patches. The get_restriction_qual_cost() is redefined as an extern function, from a static one, in the part-2 portion (ctidscan) to estimate the cost value of the qualifiers in an extension. Also, bms_to_string() and bms_from_string() are added in the part-3 portion (postgres_fdw) to carry a bitmap-set according to the copyObject manner. It may make sense to include the above supplemental changes in the cache-scan feature also, until either of them gets upstreamed.

Thanks for the information, I will apply the other patches also and start testing.

Regards, Hari Babu Fujitsu Australia
[HACKERS] Bit data type header reduction in some cases
Hi, This is regarding the Todo item of Bit data type header reduction in some cases. The header contains two parts: 1) The varlena header, which is automatically converted from 4 bytes to a 1-byte header in case of small data. 2) The bit length header, called bit_len, which stores the actual bit length and is 4 bytes in size.

I want to reduce this bit_len size to 1 byte in some cases, similar to the varlena header. With this change the size of the column is reduced by 3 bytes, which shows a very good decrease in disk usage.

I am planning to change it based on the total length of the bits data. If the number of bits is less than 0x7F, then a one-byte header can be chosen instead of a 4-byte header. During reading of the data, it can be decoded in a similar way.

The problem I am thinking of is that it can cause problems for old databases having bit columns when they upgrade to a newer version without using pg_dumpall. Is there any way to handle this problem? Or is there any better way to handle the problem itself? Please let me know your suggestions.

Regards, Hari Babu Fujitsu Australia
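A sketch of the proposed encoding (an assumed format for illustration, not the actual varbit on-disk layout): the high bit of the first byte flags a 1-byte header when the bit count fits in 7 bits; otherwise a 4-byte big-endian header is used, which limits the long form to lengths below 2^31.

```c
#include <assert.h>
#include <stdint.h>

/* Write bit_len into buf; returns the header size used (1 or 4). */
int
write_bitlen(uint8_t *buf, uint32_t nbits)
{
    if (nbits <= 0x7F)
    {
        buf[0] = (uint8_t) (0x80 | nbits);  /* high bit marks short header */
        return 1;
    }
    /* long form: 4 bytes, big-endian, top bit clear (nbits < 2^31) */
    buf[0] = (uint8_t) ((nbits >> 24) & 0x7F);
    buf[1] = (uint8_t) (nbits >> 16);
    buf[2] = (uint8_t) (nbits >> 8);
    buf[3] = (uint8_t) nbits;
    return 4;
}

/* Decode bit_len from buf; returns the header size consumed. */
int
read_bitlen(const uint8_t *buf, uint32_t *nbits)
{
    if (buf[0] & 0x80)
    {
        *nbits = buf[0] & 0x7F;
        return 1;
    }
    *nbits = ((uint32_t) buf[0] << 24) | ((uint32_t) buf[1] << 16) |
             ((uint32_t) buf[2] << 8) | buf[3];
    return 4;
}
```

As the mail notes, the hard part is not the encoding itself but that existing on-disk data written with the old fixed 4-byte header would be misread after an in-place upgrade.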
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Fri, Feb 21, 2014 at 2:19 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: Hello, The attached patch is a revised one for the cache-only scan module on top of the custom-scan interface. Please check it.

Thanks for the revised patch. Please find some minor comments.

1. memcpy(dest, tuple, HEAPTUPLESIZE);
   + memcpy((char *) dest + HEAPTUPLESIZE,
   + tuple->t_data, tuple->t_len);

For a normal tuple these two addresses are different, but in the case of ccache it is continuous memory. Better to write a comment saying that even though it is continuous memory, it is treated as separate.

2. + uint32 required = HEAPTUPLESIZE + MAXALIGN(tuple->t_len);

t_len is already maxaligned. There is no problem with aligning it again, but the required-length calculation differs from function to function. For example, in the below part of the same function, the same t_len is used directly. It doesn't generate any problem, but it may cause some confusion.

4. + cchunk = ccache_vacuum_tuple(ccache, ccache->root_chunk, ctid);
   + if (pchunk != NULL && pchunk != cchunk)
   + ccache_merge_chunk(ccache, pchunk);
   + pchunk = cchunk;

The merge_chunk is called only when the heap tuples are spread across two cache chunks. Actually one cache chunk can accommodate one or more heap pages. It needs some other way of handling.

4. for (i=0; i < 20; i++)

Better to replace this magic number with a meaningful macro.

5. 'columner' is present in the sgml file. Correct it.

6. The max_cached_attnum value in the document is stated as 128 by default, but in the code it is set to 256.

I will start regression and performance tests. I will inform you once I finish.

Regards, Hari Babu Fujitsu Australia
[HACKERS] Priority table or Cache table
Hi, I want to propose a new feature called priority table or cache table. This is the same as a regular table, except that the pages of these tables have a higher priority than those of normal tables. These tables are very useful where faster query processing on some particular tables is expected.

The same faster query processing can be achieved by placing the tables on a tablespace of a ram disk. In that case there is a problem of data loss on system shutdown; to avoid it, a continuous backup of this tablespace and the WAL files is required. The priority table feature will solve these problems by providing similar functionality.

The user needs to decide carefully which tables require faster access; those can be declared as priority tables, and these tables should be small in both number of columns and size.

New syntax:

create [priority] Table ...;

or

Create Table .. [ buffer_pool = priority | default ];

by adding a new storage parameter buffer_pool to specify the type of buffer pool this table can use. The same can be extended for indexes also.

Solution 1: This solution may not be a proper one, but it is simple. While placing these table pages into the buffer pool, the usage count is set to double the max buffer usage count, instead of 1 as for normal tables. For this reason there is less chance of these pages being moved out of the buffer pool. The queries which operate on these tables will be faster because of less I/O. If the tables are not used for a long time, then only the first query on the table will be slower and the rest of the queries are faster. Just for testing, a new bool member can be added to the RELFILENODE structure to indicate whether the table type is priority or not. Using this, the usage count can be modified while loading the page.
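Solution 1 can be sketched as below. BM_MAX_USAGE_COUNT is the real PostgreSQL constant (value 5); the helper function is hypothetical. Doubling it matches the usagecount of 10 shown in the pg_buffercache output below:

```c
#include <assert.h>
#include <stdbool.h>

#define BM_MAX_USAGE_COUNT 5    /* PostgreSQL's clock-sweep ceiling */

/*
 * Solution-1 sketch: pages of a priority table enter the buffer pool
 * with double the maximum usage count, so the clock sweep must pass
 * them many more times before they become eviction candidates.
 * Normal pages start at 1, as usual.
 */
int
initial_usage_count(bool is_priority)
{
    return is_priority ? 2 * BM_MAX_USAGE_COUNT : 1;
}
```

Note that because the clock sweep normally caps usage counts at BM_MAX_USAGE_COUNT, starting above the cap is what gives these pages their staying power; nothing else in the replacement algorithm changes.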
The pg_buffercache output of a priority table:

postgres=# select * from pg_buffercache where relfilenode=16385;
 bufferid | relfilenode | reltablespace | reldatabase | relforknumber | relblocknumber | isdirty | usagecount
----------+-------------+---------------+-------------+---------------+----------------+---------+------------
      270 |       16385 |          1663 |       12831 |             0 |              0 | t       |         10

Solution 2: Keep an extra flag in the buffer to indicate whether the buffer is used for a priority table or not. Using this flag, replacing a buffer used for a priority table needs some extra steps, such as: 1. Only another page of a priority table can replace this priority page. 2. Only after at least two complete cycles of the clock sweep can a normal table page replace it. In this case the priority buffers stay in memory for a long time, similar to solution 1, but it is not always guaranteed.

Solution 3: Create another buffer pool, called the priority buffer pool, similar to the shared buffer pool, to place the priority table pages. A new guc parameter called priority_buffers can be added to get the priority buffer pool size from the user. The maximum limit of these buffers can be kept at a smaller value to make proper use of it. As extra care, whenever any page needs to move out of the priority buffer pool a warning is issued, so that the user can check whether the configured priority_buffers size is too small or the priority tables have grown more than expected. In this case all the pages are always loaded into memory, and thus queries get faster processing.

IBM DB2 has the facility of creating more buffer pools and fixing specific tables and indexes into them. Oracle also has a facility to specify the buffer pool option as keep or recycle.

I prefer syntax 2 and solution 3. Please provide your suggestions/improvements.

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Priority table or Cache table
On Thu, Feb 20, 2014 at 11:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: Haribabu Kommi kommi.harib...@gmail.com writes: I want to propose a new feature called priority table or cache table. This is the same as a regular table except that the pages of these tables have a higher priority than normal tables. These tables are very useful where faster query processing on some particular tables is expected.

Why exactly does the existing LRU behavior of shared buffers not do what you need?

Let's assume a database having 3 tables which are accessed regularly. The user expects faster query results on one table. Because of the LRU behavior, this sometimes does not happen. So if we just separate those table pages into another buffer pool, then all the pages of that table reside in memory and queries get faster processing.

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Priority table or Cache table
On Thu, Feb 20, 2014 at 2:26 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Feb 20, 2014 at 6:24 AM, Haribabu Kommi kommi.harib...@gmail.com wrote: On Thu, Feb 20, 2014 at 11:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: I want to propose a new feature called priority table or cache table. This is the same as a regular table except that the pages of these tables have a higher priority than normal tables. These tables are very useful where faster query processing on some particular tables is expected.

Why exactly does the existing LRU behavior of shared buffers not do what you need?

Let's assume a database having 3 tables which are accessed regularly. The user expects faster query results on one table. Because of the LRU behavior, this sometimes does not happen.

I think this will not be a problem for regularly accessed tables (pages), as per the current algorithm they will get more priority before getting flushed out of the shared buffer cache. Have you come across any such case where regularly accessed pages get lower priority than non-regularly accessed pages?

Because of other regularly accessed tables, the table which expects faster results sometimes gets delayed.

However, it might be required for cases where the user wants to control such behaviour and pass such hints through a table-level option or some other way, to indicate that he wants more priority for certain tables irrespective of their usage w.r.t. other tables. Now I think the important thing to find out here is how helpful it is for users, or why they would want to control such behaviour even when the database already takes care of it based on the access pattern.

Yes, it is useful in cases where the application always expects faster results, whether the table is used regularly or not.

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
On Feb 15, 2014 9:19 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Haribabu Kommi wrote: I changed the balance cost calculations a little bit to give priority to the user-provided per-table autovacuum parameters. If any user-specified per-table vacuum parameters exist, and they differ from the GUC vacuum parameters, then the balance cost calculations will not include that worker. The cost is distributed only between the other workers with the GUC-specified vacuum cost parameters. The problem in this calculation is that if the user provides the same values as the GUCs for the per-table settings, they are not considered in the calculation either.

I think this is a strange approach to the problem, because if you configure the backends just so, they are completely ignored instead of being adjusted. And this would have action-at-a-distance consequences, because if you change the defaults in postgresql.conf you end up with completely different behavior on the tables for which you have carefully tuned the delay so that they are ignored in rebalance calculations.

I think that rather than ignoring some backends completely, we should be looking at how to weight the balancing calculations among all the backends in some smart way that doesn't mean they end up with the default values of limit, which AFAIU is what happens now -- which is stupid. I am not really sure how to do that; perhaps base it on the globally-configured ratio of delay/limit vs. the table-specific ratio.

What I mean is that perhaps the current approach is all wrong and we need to find a better algorithm to suit this case and more generally. Of course, I don't mean to say that it should behave completely differently than now in normal cases, only that it shouldn't give completely stupid results in corner cases such as this one.

As an example, suppose that global limit=200 and global delay=20 (the defaults). Then we have a global ratio of 5.
If all three tables being vacuumed currently are using the default values, then they all have ratio=5 and therefore all should have the same limit and delay settings applied after rebalance. Now, if two tables have ratio=5 and one table has been configured to have a very fast vacuum, that is limit=10000, then the ratio for that table is 10000/20=500. Therefore that table should be configured, after rebalance, to have a limit and delay that are 100 times faster than the settings for the other two tables. (And there is a further constraint that the total delay per limit unit across workers should still add up to the configured global delay per limit unit.)

I haven't thought about how to code that, but I don't think it should be too difficult. Want to give it a try? I think it makes sense to modify both the running delay and the running limit to achieve whatever ratio we come up with, except that the delay should probably not go below 10ms because, apparently, some platforms have that much sleep granularity and it wouldn't really work to have a smaller delay. Am I making sense?

Yes, that makes sense, and not leaving the delay parameter as-is is also a good approach. Thanks, I will give it a try.

Regards, Hari Babu Fujitsu Australia
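The ratio arithmetic in the example above can be written out as a minimal sketch (function names hypothetical; this is only the weighting math, not the rebalancing code itself):

```c
#include <assert.h>

/* Ratio as defined in the mail: cost limit per unit of cost delay. */
double
vac_ratio(double cost_limit, double cost_delay)
{
    return cost_limit / cost_delay;
}

/*
 * After rebalance, how much faster one table should be vacuumed
 * relative to another: the quotient of their ratios.
 */
double
relative_speed(double ratio_a, double ratio_b)
{
    return ratio_a / ratio_b;
}
```

With the mail's numbers: a table tuned to limit=10000 at delay=20 has ratio 10000/20 = 500, and against the global ratio of 5 it should end up 500/5 = 100 times faster after rebalance.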
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Thu, Feb 13, 2014 at 2:42 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2014-02-12 14:59 GMT+09:00 Haribabu Kommi kommi.harib...@gmail.com:

7. In the ccache_find_tuple function, this Assert(i_min + 1 < cchunk->ntups) can go wrong when only one tuple is present in the block, with an item pointer equal to the one we are searching for in the forward scan direction.

It shouldn't happen, because the first or second ItemPointerCompare will handle the condition. Please assume cchunk->ntups == 1. In this case, any given ctid shall match either branch, because any ctid is less than, equal to, or larger than the single cached tuple; thus it moves to the right or left node according to the scan direction.

Yes, you are correct. Sorry for the noise.

8. I am not able to find a protection mechanism in insert/delete etc. of a tuple in the T-tree. As this is shared memory, it can cause problems.

For design simplification, I put a giant lock per columnar cache. So, routines in cscan.c acquire an exclusive lwlock prior to invocation of ccache_insert_tuple / ccache_delete_tuple.

Correct. But this lock can be a bottleneck for concurrency. Better to analyze it once we have the performance report.

Regards, Hari Babu Fujitsu Australia
Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only table scan?)
On Sat, Feb 8, 2014 at 1:09 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: Hello, Because of time pressure in the commit-fest:Jan, I tried to simplify the patch for cache-only scan into three portions: (1) add a hook on heap_page_prune for cache invalidation on vacuuming a particular page, (2) add a check to accept InvalidBuffer in SetHintBits, and (3) a proof-of-concept module for cache-only scan.

(1) pgsql-v9.4-heap_page_prune_hook.v1.patch
Once the on-memory columnar cache is constructed, it needs to be invalidated if a heap page behind the cache is modified. In usual DML cases, the extension can get control using row-level trigger functions for invalidation; however, we right now have no way to get control when a page is vacuumed, which is usually handled by the autovacuum process. This patch adds a callback on heap_page_prune(), to allow extensions to prune dead entries on their caches, not only heap pages. I'd also like to hear of any other scenario where we need to invalidate columnar cache entries, if any exist. It seems to me object_access_hook makes sense to cover the DDL and VACUUM FULL scenarios...

(2) pgsql-v9.4-HeapTupleSatisfies-accepts-InvalidBuffer.v1.patch
When we want to check visibility of the tuples on cache entries (thus no particular shared buffer is associated) using HeapTupleSatisfiesVisibility, it internally tries to update the hint bits of tuples. However, that does not make sense for tuples not associated with a particular shared buffer. By definition, tuple entries on the cache are not connected with a particular shared buffer. If we needed to load the whole buffer page just to set hint bits, it would be totally nonsensical, because the purpose of the on-memory cache is to reduce disk accesses. This patch adds an exceptional condition to SetHintBits() to skip everything if the given buffer is InvalidBuffer. It allows checking tuple visibility using the regular visibility check functions, without reinventing the wheel.
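The behavior patch (2) describes can be reduced to a stand-in like the one below (the types and signature are simplified stand-ins, not the real PostgreSQL SetHintBits): hint bits are only written when a real shared buffer backs the tuple, and cached copies are silently skipped.

```c
#include <assert.h>
#include <stdbool.h>

typedef int Buffer;
#define InvalidBuffer 0

/*
 * Stand-in for the patched SetHintBits(): a tuple that lives only in
 * the columnar cache passes InvalidBuffer, and the hint-bit update is
 * skipped because there is no buffer page to dirty.  Returns whether
 * the bits were actually set.
 */
bool
set_hint_bits(Buffer buffer, unsigned *t_infomask, unsigned bits)
{
    if (buffer == InvalidBuffer)
        return false;           /* cached copy: skip, as the patch does */
    *t_infomask |= bits;
    return true;
}
```

The visibility-check callers need no change at all: they call the same function, and the InvalidBuffer guard makes the cached-tuple case a harmless no-op.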
(3) pgsql-v9.4-contrib-cache-scan.v1.patch
Unlike (1) and (2), this patch is just a proof of concept to implement cache-only scan on top of the custom-scan interface. It tries to offer an alternative scan path on tables with row-level triggers for cache invalidation, if the total width of the referenced columns is less than 30% of the total width of the table definition. Thus, it can keep a larger number of records, with a meaningful portion, in main memory. This cache shall be invalidated according to main heap updates: one way is the row-level trigger, the second is object_access_hook on DDL, and the third is the heap_page_prune hook. Once a column-reduced tuple gets cached, it is copied into the cache memory from the shared buffer, so it needs the feature of ignoring InvalidBuffer in the visibility check functions.

I reviewed all three patches. The first two core PostgreSQL patches are fine. I have the following comments on the third patch, related to the cache scan.

1. +# contrib/dbcache/Makefile

The Makefile header comment does not match the file name location.

2. + /*
   + * Estimation of average width of cached tuples - it does not make
   + * sense to construct a new cache if its average width is more than
   + * 30% of the raw data.
   + */

Move the estimation of the average width of cached tuples into the case where the cache is not found; otherwise it is an overhead for the cache-hit scenario.

3. + if (old_cache)
   + attrs_used = bms_union(attrs_used, old_cache->attrs_used);

Don't we need a check here to see whether the average width is more than 30%? During estimation it doesn't include the existing other attributes.

4. + lchunk->right = cchunk;
   + lchunk->l_depth = TTREE_DEPTH(lchunk->right);

I think it should be lchunk->r_depth that needs to be set in a clockwise rotation.

5. Can you add some comments in the code describing how the block is used?

6. In the do_insert_tuple function, I felt that moving the tuples and rearranging their addresses is a little bit costly. How about the following way?
Always insert the tuple from the bottom of the block, where the empty space starts, and store the corresponding reference pointers at the start of the block in an array. As new tuples are inserted, this array grows from the block start and the tuples grow from the block end. We just need to sort this array based on item pointers; there is no need to update the reference pointers themselves. In this case, movement is required only when a tuple is moved from one block to another, and also when no continuous free space is available to insert the new tuple. You can decide based on how frequently the sorting will happen in general.

7. In the ccache_find_tuple function, this Assert(i_min + 1 < cchunk->ntups) can go wrong when only one tuple is present in the block, with an item pointer equal to the one we are searching for in the forward scan direction.

8. I am not able to find a protection mechanism in insert/delete etc. of a tuple in the T-tree. As this is shared memory, it can cause problems.
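The layout suggested in item 6 — a small offset array growing from the block start while tuple bodies grow from the block end, much like PostgreSQL's own heap-page line pointers — can be sketched like this (structure and names are illustrative only, not the patch's code):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 8192

/*
 * Offsets (the "reference pointers" of the mail) live in an array at
 * the start of data[]; tuple bodies are packed from the end.  Sorting
 * by item pointer only has to permute the small offset array.
 */
typedef struct
{
    uint16_t ntups;
    uint16_t lower;                 /* end of the offset array */
    uint16_t upper;                 /* start of the packed tuple data */
    uint8_t  data[BLOCK_SIZE - 6];
} Block;

void
block_init(Block *b)
{
    b->ntups = 0;
    b->lower = 0;
    b->upper = sizeof(b->data);
}

/* Returns the slot index of the new tuple, or -1 if the block is full. */
int
block_insert(Block *b, const void *tup, uint16_t len)
{
    uint16_t *offsets = (uint16_t *) b->data;

    if ((uint32_t) b->lower + sizeof(uint16_t) + len > b->upper)
        return -1;                  /* no contiguous free space left */

    b->upper -= len;                /* body grows downward from the end */
    memcpy(b->data + b->upper, tup, len);
    offsets[b->ntups++] = b->upper; /* offset array grows upward */
    b->lower += sizeof(uint16_t);
    return b->ntups - 1;
}
```

Reordering after a batch of inserts then costs a sort of `ntups` 2-byte offsets instead of moving tuple bodies around, which is the saving the mail argues for.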
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On Sat, Feb 8, 2014 at 12:10 PM, Peter Eisentraut pete...@gmx.net wrote: On 1/29/14, 7:37 PM, Haribabu Kommi wrote: On Tue, Jan 28, 2014 at 1:17 PM, Peter Eisentraut wrote: On 11/30/13, 6:59 AM, Haribabu kommi wrote: To detect whether the provided data and xlog directories are the same or not, I reused the existing make_absolute_path() code as follows. I note that initdb does not detect whether the data and xlog directories are the same. I think there is no point in addressing this only in pg_basebackup. If we want to forbid it, it should be done in initdb foremost. Thanks for pointing this out. If the following approach is fine for identifying identical directories, then I will do the same for initdb also. I wouldn't bother. It's a lot of work for little benefit. Any mistake a user would make can easily be fixed. I also felt it is a lot of work for little benefit, but as of now I am not able to find an easier solution to handle this problem. Can we handle it later if it is really required? -- Regards, Hari Babu Fujitsu Australia Software Technology
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On Tue, Jan 28, 2014 at 1:17 PM, Peter Eisentraut wrote: On 11/30/13, 6:59 AM, Haribabu kommi wrote: To detect whether the provided data and xlog directories are the same or not, I reused the existing make_absolute_path() code as follows. I note that initdb does not detect whether the data and xlog directories are the same. I think there is no point in addressing this only in pg_basebackup. If we want to forbid it, it should be done in initdb foremost. Thanks for pointing this out. If the following approach is fine for identifying identical directories, then I will do the same for initdb also. I'm not sure it's worth the trouble, but if I were to do it, I'd just stat() the two directories and compare their inodes. That seems much easier and more robust than comparing path strings. stat() has problems on Windows; for that reason the patch was written to identify the directories with a string comparison. Regards, Hari Babu
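The inode check Peter suggests could look like the sketch below (the function name is assumed). The Windows caveat from the reply is real: st_ino is not reliably populated there, which is why the patch fell back to comparing canonicalized path strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Two paths name the same directory iff they resolve to the same
 * device and inode.  This sees through symlinks, "..", and trailing
 * slashes, which plain string comparison does not.
 */
bool
same_directory(const char *a, const char *b)
{
    struct stat sa;
    struct stat sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return false;           /* unreadable/missing: treat as distinct */
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

On POSIX systems this is both simpler and safer than path-string comparison; the portability gap on Windows is the whole trade-off being discussed in the thread.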
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On 20 December 2013 02:02 Bruce Momjian wrote: On Thu, Dec 19, 2013 at 05:14:50AM +, Haribabu kommi wrote: On 19 December 2013 05:31 Bruce Momjian wrote: On Wed, Dec 11, 2013 at 10:22:32AM +, Haribabu kommi wrote: The make_absolute_path() function moved to port is changed in a similar way to Bruce Momjian's approach. psprintf is used to store the error string that occurred in the function. But psprintf is not used for storing the absolute path, because it causes problems in freeing the allocated memory in SelectConfigFiles, since the same memory is allocated in a different code branch from guc_malloc. After adding the make_absolute_path() function with the psprintf stuff in the path.c file, it gives a linking problem in the compilation of ecpg. I am not able to find the cause, so I added another file, abspath.c, in port, which contains these two functions.

What errors are you seeing?

If I move the make_absolute_path function from abspath.c to path.c, I get the following linking errors while compiling ecpg:

../../../../src/port/libpgport.a(path.o): In function `make_absolute_path':
/home/hari/postgres/src/port/path.c:795: undefined reference to `psprintf'
/home/hari/postgres/src/port/path.c:809: undefined reference to `psprintf'
/home/hari/postgres/src/port/path.c:818: undefined reference to `psprintf'
/home/hari/postgres/src/port/path.c:830: undefined reference to `psprintf'
collect2: ld returned 1 exit status
make[4]: *** [ecpg] Error 1
make[3]: *** [all-preproc-recurse] Error 2
make[2]: *** [all-ecpg-recurse] Error 2
make[1]: *** [all-interfaces-recurse] Error 2
make: *** [all-src-recurse] Error 2

You didn't show the actual command that is generating the error, but I assume it is linking ecpg, not creating libecpg. I think the issue is that path.c is specially handled when it is included in libecpg.
Here is a comment from the libecpg Makefile:

# We use some port modules verbatim, but since we need to
# compile with appropriate options to build a shared lib, we can't
# necessarily use the same object files as the backend uses. Instead,
# symlink the source files in here and build our own object file.

My guess is that libecpg isn't marked as linking to libpgcommon, and when you called psprintf in path.c, it added a libpgcommon link requirement. My guess is that if you compiled common/psprintf.c like port/path.c in libecpg's Makefile, it would link fine.

Sorry for not mentioning the command; yes, it gives the problem while linking ecpg. From the compilation I observed that libpgcommon is linked while building ecpg. I tested the same by using psprintf directly in ecpg code. I modified libecpg's Makefile as suggested by you, which is attached in the mail, but the errors still occur. Please let me know if anything is missed.

Regards, Hari babu.

make_abs_path_v3.patch Description: make_abs_path_v3.patch