Hi Matteo, Thanks for your suggestions, I just run some test with ILIKE and LIKE, and ILIKE is consistently slower so I think I will keep the Lower functions. As per your suggestion, I have switched indexes to use GIN type index, they seem to build/read a bit faster, still the Recheck task continues to happen in the query plan though. I have removed the Gender column from the query since is not relevant in my tests. With all this playing around it looks like the stats are now a bit more accurate. The query went down to 9 seconds, ideally I would like to get to execute in 2 seconds..., any thoughts on what else I could try? Thanks again, Eric
=# explain (analyse on,buffers on)select T.form_id from TAR_MVW_targeting_record AS T where T.status NOT IN ('ANULLED') AND LOWER(T.household_member_last_name) LIKE LOWER('%tu%') group by T.form_id; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------- HashAggregate (cost=557677.27..561360.83 rows=368356 width=8) (actual time=10172.672..10410.068 rows=786669 loops=1) Buffers: shared hit=304998 -> Bitmap Heap Scan on tar_mvw_targeting_record t (cost=80048.06..552677.27 rows=2000002 width=8) (actual time=2481.418..9564.280 rows =999933 loops=1) Recheck Cond: ((status)::text <> 'ANULLED'::text) Filter: (lower((household_member_last_name)::text) ~~ '%tu%'::text) Rows Removed by Filter: 9000079 Buffers: shared hit=304998 -> Bitmap Index Scan on tar_mvw_targeting_record_lower_idx4 (cost=0.00..79548.06 rows=10000012 width=0) (actual time=2375.399..2 375.399 rows=10000012 loops=1) Buffers: shared hit=7369 Total runtime: 10475.240 ms On Mon, Nov 10, 2014 at 1:57 PM, desmodemone <desmodem...@gmail.com> wrote: > > > 2014-11-10 18:43 GMT+01:00 Eric Ramirez <eric.ramirez...@gmail.com>: > >> >> Hi, >> I have created a sample database with test data to help benchmark our >> application. The database has ten million records, and is running on a >> dedicated server(postgres 9.3) with 8GB of RAM. Our queries are pretty >> slow with this amount of data and is my job to get them to run to at >> acceptable speed. First thing that I notice was that the planner's row >> estimates are off by a large number or records (millions) I have updated >> the statistics target but didn't seem to make a difference. The relevant >> output follows. >> Am I looking in the wrong place, something else I should be trying? >> Thanks in advance for your comments/suggestions, >> Eric. >> >> >> =# show work_mem; >> work_mem >> ---------- >> 1GB >> (1 row) >> =# show effective_cache_size; >> effective_cache_size >> ---------------------- >> 5GB >> (1 row) >> >> =#ALTER TABLE TAR_MVW_TARGETING_RECORD ALTER COLUMN >> household_member_first_name SET STATISTICS 5000; >> =# vacuum analyse TAR_MVW_TARGETING_RECORD; >> >> =# \d tar_mvw_targeting_record; >> Table "public.tar_mvw_targeting_record" >> Column | Type | Modifiers >> -----------------------------+-----------------------+----------- >> household_member_id | bigint | >> form_id | bigint | >> status | character varying(64) | >> gender | character varying(64) | >> household_member_first_name | character varying(64) | >> household_member_last_name | character varying(64) | >> >> Indexes: >> "tar_mvw_targeting_record_form_id_household_member_id_idx" UNIQUE, >> btree (form_id, household_member_id) >> "tar_mvw_targeting_record_lower_idx" gist >> (lower(household_member_first_name::text) extensions.gist_trgm_ops) >> WHERE status::text <> 'ANULLED'::text >> "tar_mvw_targeting_record_lower_idx1" gist >> (lower(household_member_last_name::text) extensions.gist_trgm_ops) >> WHERE status::text <> 'ANULLED'::text >> >> >> =# explain (analyse on,buffers on)select T.form_id from >> TAR_MVW_targeting_record AS T where T.status NOT IN ('ANULLED') AND >> LOWER(T.household_member_last_name) LIKE LOWER('%tu%') AND >> T.gender='FEMALE' group by T.form_id; >> >> QUERY PLAN >> >> >> ------------------------------------------------------------------------------------------------------------------------------------------- >> ------------------------------- >> HashAggregate (cost=450994.35..452834.96 rows=184061 width=8) (actual >> time=11932.959..12061.206 rows=442453 loops=1) >> Buffers: shared hit=307404 read=109743 >> -> Bitmap Heap Scan on tar_mvw_targeting_record t >> (cost=110866.33..448495.37 rows=999592 width=8) (actual >> time=3577.301..11629.132 row >> s=500373 loops=1) >> Recheck Cond: ((lower((household_member_last_name)::text) ~~ >> '%tu%'::text) AND ((status)::text <> 'ANULLED'::text)) >> Rows Removed by Index Recheck: 9000079 >> Filter: ((gender)::text = 'FEMALE'::text) >> Rows Removed by Filter: 499560 >> Buffers: shared hit=307404 read=109743 >> -> Bitmap Index Scan on tar_mvw_targeting_record_lower_idx1 >> (cost=0.00..110616.43 rows=2000002 width=0) (actual time=3471.142..3 >> 471.142 rows=10000012 loops=1) >> Index Cond: (lower((household_member_last_name)::text) ~~ >> '%tu%'::text) >> Buffers: shared hit=36583 read=82935 >> Total runtime: 12092.059 ms >> (12 rows) >> >> Time: 12093.107 ms >> >> p.s. this plan was ran three times, first time took 74 seconds. >> >> >> > Hello Eric, > did you try with gin index instead ? so you could > avoid, if possible, the recheck condition (almost the gin index is not > lossy ), further if you always use a predicate like "gender=" , you could > think to partition the indexes based on that predicate (where status NOT IN > ('ANULLED') and gender='FEMALE', in the other case it wil be where status > NOT IN ('ANULLED') and gender='MALE' ) . Moreover you could avoid also the > "lower" operator and try use directly the ilike , instead of "like". > > CREATE INDEX tar_mvw_targeting_record_idx02 ON tar_mvw_targeting_record USING > gin ( status gin_trgm_ops) where status NOT IN ('ANULLED') and > gender='FEMALE' ; > CREATE INDEX tar_mvw_targeting_record_idx03 ON tar_mvw_targeting_record USING > gin ( status gin_trgm_ops) where status NOT IN ('ANULLED') and > gender='MALE' ; > > > explain (analyse on,buffers on) select T.form_id from > TAR_MVW_targeting_record AS T where T.status NOT IN ('ANULLED') AND > T.household_member_last_name ilike LOWER('%tu%') AND T.gender='FEMALE' > group by T.form_id; > > > I hope it works > > have a nice day > > > -- > Matteo Durighetto > > - - - - - - - - - - - - - - - - - - - - - - - > > Italian PostgreSQL User Group <http://www.itpug.org/index.it.html> > Italian Community for Geographic Free/Open-Source Software > <http://www.gfoss.it> >