Re: [HACKERS] Index scan optimization

2014-11-17 Thread Rajeev rastogi
On 17 November 2014 16:05, Simon Riggs Wrote:

 I confirm 5% improvement in both short and long index scans, on the least 
 beneficial datatype. Looks likely to be a very positive win overall.

Thanks a lot for testing and confirmation.


 The language used in the comments is not clear enough. I'll do my best to 
 improve that, then look to commit this in about 5 hours.
Thanks a lot for support. Please let me know if I also need to add something.

Thanks and Regards,
Kumar Rajeev Rastogi

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index scan optimization

2014-11-16 Thread Simon Riggs
On 30 October 2014 08:43, Haribabu Kommi kommi.harib...@gmail.com wrote:

 I marked the patch as ready for committer.

This looks very interesting.

The value of the patch seems to come from skipping costly checks. Your
performance test has a leading VARCHAR column, so in that specific
case skipping the leading column is a big win. I don't see it would be
in all cases.

Can we see a few tests on similar things to check for other
opportunities and regressions.
* Single column bigint index
* Multi-column bigint index
* 5 column index with mixed text and integers

The explanatory comments need some work to more clearly explain what
this patch does.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index scan optimization

2014-10-30 Thread Haribabu Kommi
On Mon, Oct 27, 2014 at 10:38 PM, Rajeev rastogi
rajeev.rast...@huawei.com wrote:
 On 26 October 2014 10:42, Haribabu Kommi wrote:
 I have a question regarding setting of key flags matched. Only the
 first key was set as matched even if we have multiple index conditions.
 Is there any reason behind that?

 Actually the keysCount can be more than one only if the key strategy is 
 BTEqualStrategyNumber (i.e. = condition)
 But our optimization is only for the '' or '=' condition only. So adding 
 code to set flag for all keys is of no use.

 Let me know if I am missing anything here?

Thanks. I understood the point.

 If any volatile function is present in the index condition, the index
 scan itself is not choosen, Is there any need of handling the same
 similar to NULLS?

 Do you mean to say that whether we should add any special handling for 
 volatile function?
 If yes then as you said since index scan itself is not selected for such 
 functions, then
 I feel We don’t need to add anything for that.

I also have the same opinion. I marked the patch as ready for committer.

Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index scan optimization

2014-10-27 Thread Rajeev rastogi
On 26 October 2014 10:42, Haribabu Kommi wrote:
 
 Hi,
 
 I reviewed index scan optimization patch, the following are the
 observations.
 
 - Patch applies cleanly.
 - Compiles without warnings
 - All regress tests are passed.
 
 There is a good performance gain with the patch in almost all scenarios.

Thanks for reviewing.

 I have a question regarding setting of key flags matched. Only the
 first key was set as matched even if we have multiple index conditions.
 Is there any reason behind that?

Actually the keysCount can be more than one only if the key strategy is 
BTEqualStrategyNumber (i.e. = condition) 
But our optimization is only for the '' or '=' condition only. So adding code 
to set flag for all keys is of no use.

Let me know if I am missing anything here?

 If any volatile function is present in the index condition, the index
 scan itself is not choosen, Is there any need of handling the same
 similar to NULLS?

Do you mean to say that whether we should add any special handling for volatile 
function?
If yes then as you said since index scan itself is not selected for such 
functions, then 
I feel We don’t need to add anything for that.
Any opinion?

Thanks and Regards,
Kumar Rajeev Rastogi



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index scan optimization

2014-10-25 Thread Haribabu Kommi
On Tue, Sep 23, 2014 at 10:38 PM, Rajeev rastogi
rajeev.rast...@huawei.com wrote:
 On 22 September 2014 19:17, Heikki Linnakangas wrote:

 On 09/22/2014 04:45 PM, Tom Lane wrote:
  Heikki Linnakangas hlinnakan...@vmware.com writes:
  On 09/22/2014 07:47 AM, Rajeev rastogi wrote:
  So my proposal is to skip the condition check on the first scan key
 condition for every tuple.
 
  The same happens in a single-column case. If you have a query like
  SELECT * FROM tbl2 where id2  'a', once you've found the start
  position of the scan, you know that all the rows that follow match
 too.
 
  ... unless you're doing a backwards scan.

 Sure. And you have to still check for NULLs. Have to get the details
 right..

 I have finished implementation of the discussed optimization.
 I got a performance improvement of around 30% on the schema and data shared 
 in earlier mail.

 I also tested for the index scan case, where our optimization is not done and 
 observed that there
 is no effect on those query because of this change.

 Change details:
 I have added a new flag as SK_BT_MATCHED as part of sk_flags (ScanKey 
 structure), the value used for this
 0x0004, which was unused.
 Inside the function _bt_first, once we finish finding the start scan position 
 based on the first key,
 I am appending the flag SK_BT_MATCHED to the first key.
 Then in the function _bt_checkkeys, during the key comparison, I am checking 
 if the key has SK_BT_MATCHED flag set, if yes then
 there is no need to further comparison. But if the tuple is having NULL 
 value, then even if this flag is set, we will continue
 with further comparison (this handles the Heikki point of checking NULLs).

Hi,

I reviewed index scan optimization patch, the following are the observations.

- Patch applies cleanly.
- Compiles without warnings
- All regress tests are passed.

There is a good performance gain with the patch in almost all scenarios.

I have a question regarding setting of key flags matched. Only the
first key was set as matched
even if we have multiple index conditions. Is there any reason behind that?

If any volatile function is present in the index condition, the index
scan itself is not choosen,
Is there any need of handling the same similar to NULLS?

Thanks for the patch.

Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index scan optimization

2014-09-25 Thread Simon Riggs
On 22 September 2014 14:46, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 On 09/22/2014 04:45 PM, Tom Lane wrote:

 Heikki Linnakangas hlinnakan...@vmware.com writes:

 On 09/22/2014 07:47 AM, Rajeev rastogi wrote:

 So my proposal is to skip the condition check on the first scan key
 condition for every tuple.


 The same happens in a single-column case. If you have a query like
 SELECT * FROM tbl2 where id2  'a', once you've found the start
 position of the scan, you know that all the rows that follow match too.


 ... unless you're doing a backwards scan.


 Sure. And you have to still check for NULLs. Have to get the details right..

And also that the tests don't use volatile functions.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index scan optimization

2014-09-24 Thread Rajeev rastogi
On 22 September 2014 18:51, Stephen Frost Wrote:

* Rajeev rastogi (rajeev.rast...@huawei.com) wrote:
 Thanks, I shall start to prepare a patch for this optimization and share in 1 
 or 2 days.

 This sounded interesting to me also- please be sure to put it on the open 
 commitfest once you have posted the patch.

Thanks, I have added it to next CommitFest i.e. October CommitFest.

Thanks and Regards,
Kumar Rajeev Rastogi



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index scan optimization

2014-09-23 Thread Rajeev rastogi
On 22 September 2014 19:17, Heikki Linnakangas wrote:
 
 On 09/22/2014 04:45 PM, Tom Lane wrote:
  Heikki Linnakangas hlinnakan...@vmware.com writes:
  On 09/22/2014 07:47 AM, Rajeev rastogi wrote:
  So my proposal is to skip the condition check on the first scan key
 condition for every tuple.
 
  The same happens in a single-column case. If you have a query like
  SELECT * FROM tbl2 where id2  'a', once you've found the start
  position of the scan, you know that all the rows that follow match
 too.
 
  ... unless you're doing a backwards scan.
 
 Sure. And you have to still check for NULLs. Have to get the details
 right..

I have finished implementation of the discussed optimization.
I got a performance improvement of around 30% on the schema and data shared 
in earlier mail.

I also tested for the index scan case, where our optimization is not done and 
observed that there
is no effect on those query because of this change.

Change details:
I have added a new flag as SK_BT_MATCHED as part of sk_flags (ScanKey 
structure), the value used for this 
0x0004, which was unused.
Inside the function _bt_first, once we finish finding the start scan position 
based on the first key, 
I am appending the flag SK_BT_MATCHED to the first key.
Then in the function _bt_checkkeys, during the key comparison, I am checking if 
the key has SK_BT_MATCHED flag set, if yes then
there is no need to further comparison. But if the tuple is having NULL value, 
then even if this flag is set, we will continue
with further comparison (this handles the Heikki point of checking NULLs).

I will add this patch to the next CommitFest.

Please let me know your feedback. 

Thanks and Regards,
Kumar Rajeev Rastogi




index_scan_opt.patch
Description: index_scan_opt.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index scan optimization

2014-09-22 Thread Heikki Linnakangas

On 09/22/2014 07:47 AM, Rajeev rastogi wrote:

I have observed a scope of considerable performance improvement in-case of 
index by a very minor code change.
Consider the below schema:

create table tbl2(id1 int, id2 varchar(10), id3 int);
create index idx2 on tbl2(id2, id3);

Query as:
 select count(*) from tbl2 where id2'a' and id399;

As per current design, it takes following steps to retrieve index tuples:

1.   Find the scan start position by searching first position in BTree as per 
the first key condition i.e. as per id2'a'

2.   Then it fetches each tuples from position found in step-1.

3.   For each tuple, it matches all scan key condition, in our example it 
matches both scan key condition.

4.   If condition match, it returns the tuple otherwise scan stops.

Now problem is here that already first scan key condition is matched to find 
the scan start position (Step-1), so it is obvious that any further tuple also 
will match the first scan key condition (as records are sorted).
So comparison on first scan key condition again in step-3 seems to be redundant.

So my proposal is to skip the condition check on the first scan key condition 
for every tuple.


The same happens in a single-column case. If you have a query like 
SELECT * FROM tbl2 where id2  'a', once you've found the start 
position of the scan, you know that all the rows that follow match too.



I would like to submit the patch for this improvement.
Please provide your feedback. Also let me know if I am missing something.


Yeah, sounds like a good idea. This scenario might not arise very often, 
but it should be cheap to check, so I doubt it will add any measurable 
overhead to the cases where the optimization doesn't help.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index scan optimization

2014-09-22 Thread Rajeev rastogi
On 22 September 2014 12:35, Heikki Linnakangas:

  I have observed a scope of considerable performance improvement in-
 case of index by a very minor code change.
  Consider the below schema:
 
  create table tbl2(id1 int, id2 varchar(10), id3 int); create index
  idx2 on tbl2(id2, id3);
 
  Query as:
   select count(*) from tbl2 where id2'a' and
  id399;
 
  As per current design, it takes following steps to retrieve index
 tuples:
 
  1.   Find the scan start position by searching first position in
 BTree as per the first key condition i.e. as per id2'a'
 
  2.   Then it fetches each tuples from position found in step-1.
 
  3.   For each tuple, it matches all scan key condition, in our
 example it matches both scan key condition.
 
  4.   If condition match, it returns the tuple otherwise scan
 stops.
 
  Now problem is here that already first scan key condition is matched
 to find the scan start position (Step-1), so it is obvious that any
 further tuple also will match the first scan key condition (as records
 are sorted).
  So comparison on first scan key condition again in step-3 seems to be
 redundant.
 
  So my proposal is to skip the condition check on the first scan key
 condition for every tuple.
 
 The same happens in a single-column case. If you have a query like
 SELECT * FROM tbl2 where id2  'a', once you've found the start
 position of the scan, you know that all the rows that follow match too.

Very much true.

  I would like to submit the patch for this improvement.
  Please provide your feedback. Also let me know if I am missing
 something.
 
 Yeah, sounds like a good idea. This scenario might not arise very often,
 but it should be cheap to check, so I doubt it will add any measurable
 overhead to the cases where the optimization doesn't help.

Thanks, I shall start to prepare a patch for this optimization and share in 1 
or 2 days.

Thanks and Regards,
Kumar Rajeev Rastogi


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index scan optimization

2014-09-22 Thread Stephen Frost
* Rajeev rastogi (rajeev.rast...@huawei.com) wrote:
 Thanks, I shall start to prepare a patch for this optimization and share in 1 
 or 2 days.

This sounded interesting to me also- please be sure to put it on the
open commitfest once you have posted the patch.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Index scan optimization

2014-09-22 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 09/22/2014 07:47 AM, Rajeev rastogi wrote:
 So my proposal is to skip the condition check on the first scan key 
 condition for every tuple.

 The same happens in a single-column case. If you have a query like 
 SELECT * FROM tbl2 where id2  'a', once you've found the start 
 position of the scan, you know that all the rows that follow match too.

... unless you're doing a backwards scan.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index scan optimization

2014-09-22 Thread Heikki Linnakangas

On 09/22/2014 04:45 PM, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes:

On 09/22/2014 07:47 AM, Rajeev rastogi wrote:

So my proposal is to skip the condition check on the first scan key condition 
for every tuple.



The same happens in a single-column case. If you have a query like
SELECT * FROM tbl2 where id2  'a', once you've found the start
position of the scan, you know that all the rows that follow match too.


... unless you're doing a backwards scan.


Sure. And you have to still check for NULLs. Have to get the details right..

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Index scan optimization

2014-09-21 Thread Rajeev rastogi
I have observed a scope of considerable performance improvement in-case of 
index by a very minor code change.
Consider the below schema:

create table tbl2(id1 int, id2 varchar(10), id3 int);
create index idx2 on tbl2(id2, id3);

Query as:
select count(*) from tbl2 where id2'a' and id399;

As per current design, it takes following steps to retrieve index tuples:

1.   Find the scan start position by searching first position in BTree as 
per the first key condition i.e. as per id2'a'

2.   Then it fetches each tuples from position found in step-1.

3.   For each tuple, it matches all scan key condition, in our example it 
matches both scan key condition.

4.   If condition match, it returns the tuple otherwise scan stops.

Now problem is here that already first scan key condition is matched to find 
the scan start position (Step-1), so it is obvious that any further tuple also 
will match the first scan key condition (as records are sorted).
So comparison on first scan key condition again in step-3 seems to be redundant.

So my proposal is to skip the condition check on the first scan key condition 
for every tuple.

I have prototype code to see the result, Below are my some test data as per the 
query attached with this mail:
Time:
Original Code took:3621 ms
Optimized Code took:   2576 ms
So overall performance improvement is around 29%.

Instruction:
Original Code took:20057
Optimized Code took:   14557
So overall instruction improvement is around 27%.
Also I observed that only for function varstr_cmp, around 661M instruction was 
taken, which is completely saved for optimized code.

This optimization can be extended:

1.   For a case where once all scan key matches the condition, no need to 
check condition for further tuples for any scan key. Just keep returning the 
tuple.

2.   Above approach can be used for '' operator also by finding the 
position of last matching tuples.

I would like to submit the patch for this improvement.
Please provide your feedback. Also let me know if I am missing something.

Thanks and Regards,
Kumar Rajeev Rastogi

--Schema
create table tbl2(id1 int, id2 varchar(10), id3 int);
create index idx2 on tbl2(id2, id3);

--Procedure to insert 1M data:

create or replace function insert_data(count1 int, count2 int) returns void
AS
$$
Begin
for i IN 1..count1 loop
insert into tbl2 values(i, 'a', i);
end loop;   

for i IN count1+1..count2 loop
insert into tbl2 values(i, 'b', i);
end loop;   
End;  
$$  language plpgsql;

select insert_data(99, 100);

--Procedure to select data 1000 times (1000 times selected to make data more 
appropriate.)
create or replace function select_data(count1 int) returns void
AS
$$
declare
x int;
Begin
for i IN 1..count1 loop
select count(*) into x from tbl2 where id2'a' and id399; 
end loop;   
End;  
$$  language plpgsql;

select select_data(1000);

--Result:
   OptimizedOriginal
Reading-1   2579.27 3621.82
Reading-2   2573.82 3618.29
Reading-3   2575.08 3625.16
Average 2576.06 3621.76

Overall Improvement 29% 

Instruction Original Code took: 20057 M
Insrtuction Optimized Code took:14557 M
So overall instruction improvement is around 27%.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers