[HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

Andy Colson Sun, 16 Jan 2011 11:05:35 -0800

This is a review of:
https://commitfest.postgresql.org/action/patch_view?id=468


Purpose:
========
Equal and not-equal _may_ be quickly determined if their lengths are different. 
  This _may_ be a huge speed up if we dont have to detoat.


The Patch:
==========
I was able to read and understand the patch, its a simple change and looked 
correct to me (a non PG hacker).
It applies clean to git head, compiles and runs fine with debug enabled.

make check passes


Usability:
==========
I used _may_ above.  The benchmark included with the patch, showing huge speedups, is 
really contrived.  It uses a where clause with a thousand character constant:  (where c = 
'long...long...long...long...ConstantText...etc').  In my opinion this is very uncommon 
(the author does note this is a "best case").  If you have a field large enough 
to be toasted you are not going to be using that to search on, you are going to have an 
ID field that is indexed.  (select c where id = 7)

This also only touches = and <>.  > < and like wont be touched.  So I think the 
scope of this is limited.

THAT being said, the patch is simple, and if you do happen to hit the code, it 
will speed things up.  As a user of PG I'd like to have this included.  Its a 
corner case, but a big corner, and its a small, simple change, and it wont slow 
anything else down.


Performance:
============
I created myself a more real world test, with a table with indexes and id's and 
a large toasted field.

create table junk(id serial primary key, xgroup integer, c text);
create index junk_group on junk(xgroup);


I filled it full of junk:

do $$
        declare i integer;
        declare j integer;
begin
        for i in 1..100 loop
                for j in 1..500 loop
                        insert into junk(xgroup, c) values (j, 'c'||i);
                        insert into junk (xgroup, c) select j, repeat('abc', 
2000)|| n from generate_series(1, 5) n;
                end loop;
        end loop;
end$$;


This will make about 600 records within the same xgroup.  As well as a simple 'c15' type 
of value in c we can search for.  My thinking is you may not know the exact unique id, 
but you do know what group its in, so that'll cut out 90% of the records, and then you'll 
have to add " and c = 'c15'" to get the exact one you want.

I still saw a nice performance boost.

Old PG:
$ psql < bench3.sql
Timing is on.
DO
Time: 2010.412 ms

Patched:
$ psql < bench3.sql
Timing is on.
DO
Time: 184.602 ms


bench3.sql:
do $$
        declare i integer;
begin
        for i in 1..400 loop
                perform count(*) from junk where xgroup = i and c like 'c' || i;
        end loop;
end$$;



Summary:
========
Performance speed-up:  Oh yeah!  If you just happen to hit it, and if you do 
hit it, you might want to re-think your layout a little bit.

Do I want it?  Yes please.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

Reply via email to