Re: [HACKERS] Index scan optimization
On 17 November 2014 16:05, Simon Riggs Wrote: I confirm 5% improvement in both short and long index scans, on the least beneficial datatype. Looks likely to be a very positive win overall. Thanks a lot for testing and confirmation. The language used in the comments is not clear enough. I'll do my best to improve that, then look to commit this in about 5 hours. Thanks a lot for support. Please let me know if I also need to add something. Thanks and Regards, Kumar Rajeev Rastogi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index scan optimization
On 30 October 2014 08:43, Haribabu Kommi kommi.harib...@gmail.com wrote: I marked the patch as ready for committer. This looks very interesting. The value of the patch seems to come from skipping costly checks. Your performance test has a leading VARCHAR column, so in that specific case skipping the leading column is a big win. I don't see it would be in all cases. Can we see a few tests on similar things to check for other opportunities and regressions. * Single column bigint index * Multi-column bigint index * 5 column index with mixed text and integers The explanatory comments need some work to more clearly explain what this patch does. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index scan optimization
On Mon, Oct 27, 2014 at 10:38 PM, Rajeev rastogi rajeev.rast...@huawei.com wrote: On 26 October 2014 10:42, Haribabu Kommi wrote: I have a question regarding setting of key flags matched. Only the first key was set as matched even if we have multiple index conditions. Is there any reason behind that? Actually the keysCount can be more than one only if the key strategy is BTEqualStrategyNumber (i.e. = condition) But our optimization is only for the '' or '=' condition only. So adding code to set flag for all keys is of no use. Let me know if I am missing anything here? Thanks. I understood the point. If any volatile function is present in the index condition, the index scan itself is not choosen, Is there any need of handling the same similar to NULLS? Do you mean to say that whether we should add any special handling for volatile function? If yes then as you said since index scan itself is not selected for such functions, then I feel We don’t need to add anything for that. I also have the same opinion. I marked the patch as ready for committer. Regards, Hari Babu Fujitsu Australia -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index scan optimization
On 26 October 2014 10:42, Haribabu Kommi wrote: Hi, I reviewed index scan optimization patch, the following are the observations. - Patch applies cleanly. - Compiles without warnings - All regress tests are passed. There is a good performance gain with the patch in almost all scenarios. Thanks for reviewing. I have a question regarding setting of key flags matched. Only the first key was set as matched even if we have multiple index conditions. Is there any reason behind that? Actually the keysCount can be more than one only if the key strategy is BTEqualStrategyNumber (i.e. = condition) But our optimization is only for the '' or '=' condition only. So adding code to set flag for all keys is of no use. Let me know if I am missing anything here? If any volatile function is present in the index condition, the index scan itself is not choosen, Is there any need of handling the same similar to NULLS? Do you mean to say that whether we should add any special handling for volatile function? If yes then as you said since index scan itself is not selected for such functions, then I feel We don’t need to add anything for that. Any opinion? Thanks and Regards, Kumar Rajeev Rastogi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index scan optimization
On Tue, Sep 23, 2014 at 10:38 PM, Rajeev rastogi rajeev.rast...@huawei.com wrote: On 22 September 2014 19:17, Heikki Linnakangas wrote: On 09/22/2014 04:45 PM, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: On 09/22/2014 07:47 AM, Rajeev rastogi wrote: So my proposal is to skip the condition check on the first scan key condition for every tuple. The same happens in a single-column case. If you have a query like SELECT * FROM tbl2 where id2 'a', once you've found the start position of the scan, you know that all the rows that follow match too. ... unless you're doing a backwards scan. Sure. And you have to still check for NULLs. Have to get the details right.. I have finished implementation of the discussed optimization. I got a performance improvement of around 30% on the schema and data shared in earlier mail. I also tested for the index scan case, where our optimization is not done and observed that there is no effect on those query because of this change. Change details: I have added a new flag as SK_BT_MATCHED as part of sk_flags (ScanKey structure), the value used for this 0x0004, which was unused. Inside the function _bt_first, once we finish finding the start scan position based on the first key, I am appending the flag SK_BT_MATCHED to the first key. Then in the function _bt_checkkeys, during the key comparison, I am checking if the key has SK_BT_MATCHED flag set, if yes then there is no need to further comparison. But if the tuple is having NULL value, then even if this flag is set, we will continue with further comparison (this handles the Heikki point of checking NULLs). Hi, I reviewed index scan optimization patch, the following are the observations. - Patch applies cleanly. - Compiles without warnings - All regress tests are passed. There is a good performance gain with the patch in almost all scenarios. I have a question regarding setting of key flags matched. Only the first key was set as matched even if we have multiple index conditions. Is there any reason behind that? If any volatile function is present in the index condition, the index scan itself is not choosen, Is there any need of handling the same similar to NULLS? Thanks for the patch. Regards, Hari Babu Fujitsu Australia -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index scan optimization
On 22 September 2014 14:46, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 09/22/2014 04:45 PM, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: On 09/22/2014 07:47 AM, Rajeev rastogi wrote: So my proposal is to skip the condition check on the first scan key condition for every tuple. The same happens in a single-column case. If you have a query like SELECT * FROM tbl2 where id2 'a', once you've found the start position of the scan, you know that all the rows that follow match too. ... unless you're doing a backwards scan. Sure. And you have to still check for NULLs. Have to get the details right.. And also that the tests don't use volatile functions. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index scan optimization
On 22 September 2014 18:51, Stephen Frost Wrote: * Rajeev rastogi (rajeev.rast...@huawei.com) wrote: Thanks, I shall start to prepare a patch for this optimization and share in 1 or 2 days. This sounded interesting to me also- please be sure to put it on the open commitfest once you have posted the patch. Thanks, I have added it to next CommitFest i.e. October CommitFest. Thanks and Regards, Kumar Rajeev Rastogi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index scan optimization
On 22 September 2014 19:17, Heikki Linnakangas wrote: On 09/22/2014 04:45 PM, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: On 09/22/2014 07:47 AM, Rajeev rastogi wrote: So my proposal is to skip the condition check on the first scan key condition for every tuple. The same happens in a single-column case. If you have a query like SELECT * FROM tbl2 where id2 'a', once you've found the start position of the scan, you know that all the rows that follow match too. ... unless you're doing a backwards scan. Sure. And you have to still check for NULLs. Have to get the details right.. I have finished implementation of the discussed optimization. I got a performance improvement of around 30% on the schema and data shared in earlier mail. I also tested for the index scan case, where our optimization is not done and observed that there is no effect on those query because of this change. Change details: I have added a new flag as SK_BT_MATCHED as part of sk_flags (ScanKey structure), the value used for this 0x0004, which was unused. Inside the function _bt_first, once we finish finding the start scan position based on the first key, I am appending the flag SK_BT_MATCHED to the first key. Then in the function _bt_checkkeys, during the key comparison, I am checking if the key has SK_BT_MATCHED flag set, if yes then there is no need to further comparison. But if the tuple is having NULL value, then even if this flag is set, we will continue with further comparison (this handles the Heikki point of checking NULLs). I will add this patch to the next CommitFest. Please let me know your feedback. Thanks and Regards, Kumar Rajeev Rastogi index_scan_opt.patch Description: index_scan_opt.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index scan optimization
On 09/22/2014 07:47 AM, Rajeev rastogi wrote: I have observed a scope of considerable performance improvement in-case of index by a very minor code change. Consider the below schema: create table tbl2(id1 int, id2 varchar(10), id3 int); create index idx2 on tbl2(id2, id3); Query as: select count(*) from tbl2 where id2'a' and id399; As per current design, it takes following steps to retrieve index tuples: 1. Find the scan start position by searching first position in BTree as per the first key condition i.e. as per id2'a' 2. Then it fetches each tuples from position found in step-1. 3. For each tuple, it matches all scan key condition, in our example it matches both scan key condition. 4. If condition match, it returns the tuple otherwise scan stops. Now problem is here that already first scan key condition is matched to find the scan start position (Step-1), so it is obvious that any further tuple also will match the first scan key condition (as records are sorted). So comparison on first scan key condition again in step-3 seems to be redundant. So my proposal is to skip the condition check on the first scan key condition for every tuple. The same happens in a single-column case. If you have a query like SELECT * FROM tbl2 where id2 'a', once you've found the start position of the scan, you know that all the rows that follow match too. I would like to submit the patch for this improvement. Please provide your feedback. Also let me know if I am missing something. Yeah, sounds like a good idea. This scenario might not arise very often, but it should be cheap to check, so I doubt it will add any measurable overhead to the cases where the optimization doesn't help. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index scan optimization
On 22 September 2014 12:35, Heikki Linnakangas: I have observed a scope of considerable performance improvement in- case of index by a very minor code change. Consider the below schema: create table tbl2(id1 int, id2 varchar(10), id3 int); create index idx2 on tbl2(id2, id3); Query as: select count(*) from tbl2 where id2'a' and id399; As per current design, it takes following steps to retrieve index tuples: 1. Find the scan start position by searching first position in BTree as per the first key condition i.e. as per id2'a' 2. Then it fetches each tuples from position found in step-1. 3. For each tuple, it matches all scan key condition, in our example it matches both scan key condition. 4. If condition match, it returns the tuple otherwise scan stops. Now problem is here that already first scan key condition is matched to find the scan start position (Step-1), so it is obvious that any further tuple also will match the first scan key condition (as records are sorted). So comparison on first scan key condition again in step-3 seems to be redundant. So my proposal is to skip the condition check on the first scan key condition for every tuple. The same happens in a single-column case. If you have a query like SELECT * FROM tbl2 where id2 'a', once you've found the start position of the scan, you know that all the rows that follow match too. Very much true. I would like to submit the patch for this improvement. Please provide your feedback. Also let me know if I am missing something. Yeah, sounds like a good idea. This scenario might not arise very often, but it should be cheap to check, so I doubt it will add any measurable overhead to the cases where the optimization doesn't help. Thanks, I shall start to prepare a patch for this optimization and share in 1 or 2 days. Thanks and Regards, Kumar Rajeev Rastogi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index scan optimization
* Rajeev rastogi (rajeev.rast...@huawei.com) wrote: Thanks, I shall start to prepare a patch for this optimization and share in 1 or 2 days. This sounded interesting to me also- please be sure to put it on the open commitfest once you have posted the patch. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Index scan optimization
Heikki Linnakangas hlinnakan...@vmware.com writes: On 09/22/2014 07:47 AM, Rajeev rastogi wrote: So my proposal is to skip the condition check on the first scan key condition for every tuple. The same happens in a single-column case. If you have a query like SELECT * FROM tbl2 where id2 'a', once you've found the start position of the scan, you know that all the rows that follow match too. ... unless you're doing a backwards scan. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index scan optimization
On 09/22/2014 04:45 PM, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: On 09/22/2014 07:47 AM, Rajeev rastogi wrote: So my proposal is to skip the condition check on the first scan key condition for every tuple. The same happens in a single-column case. If you have a query like SELECT * FROM tbl2 where id2 'a', once you've found the start position of the scan, you know that all the rows that follow match too. ... unless you're doing a backwards scan. Sure. And you have to still check for NULLs. Have to get the details right.. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Index scan optimization
I have observed a scope of considerable performance improvement in-case of index by a very minor code change. Consider the below schema: create table tbl2(id1 int, id2 varchar(10), id3 int); create index idx2 on tbl2(id2, id3); Query as: select count(*) from tbl2 where id2'a' and id399; As per current design, it takes following steps to retrieve index tuples: 1. Find the scan start position by searching first position in BTree as per the first key condition i.e. as per id2'a' 2. Then it fetches each tuples from position found in step-1. 3. For each tuple, it matches all scan key condition, in our example it matches both scan key condition. 4. If condition match, it returns the tuple otherwise scan stops. Now problem is here that already first scan key condition is matched to find the scan start position (Step-1), so it is obvious that any further tuple also will match the first scan key condition (as records are sorted). So comparison on first scan key condition again in step-3 seems to be redundant. So my proposal is to skip the condition check on the first scan key condition for every tuple. I have prototype code to see the result, Below are my some test data as per the query attached with this mail: Time: Original Code took:3621 ms Optimized Code took: 2576 ms So overall performance improvement is around 29%. Instruction: Original Code took:20057 Optimized Code took: 14557 So overall instruction improvement is around 27%. Also I observed that only for function varstr_cmp, around 661M instruction was taken, which is completely saved for optimized code. This optimization can be extended: 1. For a case where once all scan key matches the condition, no need to check condition for further tuples for any scan key. Just keep returning the tuple. 2. Above approach can be used for '' operator also by finding the position of last matching tuples. I would like to submit the patch for this improvement. Please provide your feedback. Also let me know if I am missing something. Thanks and Regards, Kumar Rajeev Rastogi --Schema create table tbl2(id1 int, id2 varchar(10), id3 int); create index idx2 on tbl2(id2, id3); --Procedure to insert 1M data: create or replace function insert_data(count1 int, count2 int) returns void AS $$ Begin for i IN 1..count1 loop insert into tbl2 values(i, 'a', i); end loop; for i IN count1+1..count2 loop insert into tbl2 values(i, 'b', i); end loop; End; $$ language plpgsql; select insert_data(99, 100); --Procedure to select data 1000 times (1000 times selected to make data more appropriate.) create or replace function select_data(count1 int) returns void AS $$ declare x int; Begin for i IN 1..count1 loop select count(*) into x from tbl2 where id2'a' and id399; end loop; End; $$ language plpgsql; select select_data(1000); --Result: OptimizedOriginal Reading-1 2579.27 3621.82 Reading-2 2573.82 3618.29 Reading-3 2575.08 3625.16 Average 2576.06 3621.76 Overall Improvement 29% Instruction Original Code took: 20057 M Insrtuction Optimized Code took:14557 M So overall instruction improvement is around 27%. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers