Re: [GENERAL] Why is this query not using GIN index?
Hey guys, I'm trying to understand the performance impact of "Index Recheck", I googled for Index Recheck, but didn't find much details about it, where can I know more about it? And how did you know the performance is being significantly hurt by inadequate work_mem? I'm running PG 9.6.1, built from source. On Mon, Nov 14, 2016 at 2:51 AM, Tom Lane wrote: > Oleg Bartunov writes: >> On Sun, Nov 13, 2016 at 6:05 PM, Aaron Lewis >>> It takes 500ms with 10m rows, could it be faster? > >> sure. Recheck with function call is pretty expensive, so I'd not recommend >> to create functional index, just create separate column of type tsvector >> (materialize to_tsvector) and create gin index on it. You should surprise. > > I doubt it'll help that much --- more than half the time is going into the > bitmap indexscan, and with over 1m candidate matches, there's no way > that's going to be super cheap. > > I wonder whether a gist index would be better here, since it would support > a plain indexscan which should require scanning much less of the index > given the small LIMIT. > > (Materializing the tsvector would probably help for gist, too, by reducing > the cost of lossy-index rechecks.) > > BTW, it still looks like the performance is being significantly hurt by > inadequate work_mem. > > regards, tom lane -- Best Regards, Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/ Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why is this query not using GIN index?
Sigh, didn't notice that. Thanks for the heads up. It takes 500ms with 10m rows, could it be faster? I've increased work_mem to 256MB test=# explain analyze select * from mytable where to_tsvector('english', title) @@ 'x264'::tsquery limit 1000 offset 10; QUERY PLAN --- Limit (cost=684.06..2949.42 rows=1000 width=83) (actual time=348.506..536.483 rows=1000 loops=1) -> Bitmap Heap Scan on mytable (cost=661.41..158917.22 rows=69859 width=83) (actual time=345.354..536.199 rows=1010 loops=1) Recheck Cond: (to_tsvector('english'::regconfig, title) @@ '''x264'''::tsquery) Rows Removed by Index Recheck: 12242 Heap Blocks: exact=20 lossy=186 -> Bitmap Index Scan on name_fts (cost=0.00..643.95 rows=69859 width=0) (actual time=333.703..333.703 rows=1044673 loops=1) Index Cond: (to_tsvector('english'::regconfig, title) @@ '''x264'''::tsquery) Planning time: 0.144 ms Execution time: 537.212 ms (9 rows) On Sun, Nov 13, 2016 at 10:33 PM, Julien Rouhaud wrote: > On 13/11/2016 15:26, Aaron Lewis wrote: >> Hi Oleg, >> >> Can you elaborate on the title column? I don't get it. >> > >>>> create table mytable(hash char(40), title varchar(500)); >>>> create index name_fts on mytable using gin(to_tsvector('english', >>>> 'title')); > > You created an index on the text 'title', not on the title column, so > the index is useless. > > Drop the existing index and create this one instead: > > create index name_fts on mytable using gin(to_tsvector('english', title)); > >> On Sun, Nov 13, 2016 at 10:10 PM, Oleg Bartunov wrote: >>> >>> >>> On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis >>> wrote: >>>> >>>> I have a simple table, and a gin index, >>>> >>>> create table mytable(hash char(40), title varchar(500)); >>>> create index name_fts on mytable using gin(to_tsvector('english', >>>> 'title')); >>> >>> >>> >>> ^ >>> >>>> >>>> create unique index md5_uniq_idx on mytable(hash); >>>> >>>> When I execute a query with tsquery, the GIN index was not in use: >>>> >>>> test=# explain analyze select * from mytable where >>>> to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10; >>>> QUERY PLAN >>>> >>>> >>>> Limit (cost=0.00..277.35 rows=10 width=83) (actual >>>> time=0.111..75.549 rows=10 loops=1) >>>>-> Seq Scan on mytable (cost=0.00..381187.45 rows=13744 width=83) >>>> (actual time=0.110..75.546 rows=10 loops=1) >>>> Filter: (to_tsvector('english'::regconfig, (title)::text) @@ >>>> '''abc'' | ''def'''::tsquery) >>>> Rows Removed by Filter: 10221 >>>> Planning time: 0.176 ms >>>> Execution time: 75.564 ms >>>> (6 rows) >>>> >>>> Any ideas? >>>> > > -- > Julien Rouhaud > http://dalibo.com - http://dalibo.org -- Best Regards, Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/ Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why is this query not using GIN index?
Hi Oleg, Can you elaborate on the title column? I don't get it. On Sun, Nov 13, 2016 at 10:10 PM, Oleg Bartunov wrote: > > > On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis > wrote: >> >> I have a simple table, and a gin index, >> >> create table mytable(hash char(40), title varchar(500)); >> create index name_fts on mytable using gin(to_tsvector('english', >> 'title')); > > > > ^ > >> >> create unique index md5_uniq_idx on mytable(hash); >> >> When I execute a query with tsquery, the GIN index was not in use: >> >> test=# explain analyze select * from mytable where >> to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10; >> QUERY PLAN >> >> >> Limit (cost=0.00..277.35 rows=10 width=83) (actual >> time=0.111..75.549 rows=10 loops=1) >>-> Seq Scan on mytable (cost=0.00..381187.45 rows=13744 width=83) >> (actual time=0.110..75.546 rows=10 loops=1) >> Filter: (to_tsvector('english'::regconfig, (title)::text) @@ >> '''abc'' | ''def'''::tsquery) >> Rows Removed by Filter: 10221 >> Planning time: 0.176 ms >> Execution time: 75.564 ms >> (6 rows) >> >> Any ideas? >> >> >> -- >> Best Regards, >> Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/ >> Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33 >> >> >> -- >> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-general > > -- Best Regards, Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/ Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Trigram is slow when 10m rows
Thanks Oleg. I've increased work_mem to 128MB, now the query falls down to 1.7s, faster but still not good enough. Is there any other thing I can do about it? test=# explain analyze select * from mytable where title ilike 'x264'; QUERY PLAN -- Bitmap Heap Scan on mytable (cost=462.69..5639.67 rows=1380 width=83) (actual time=1754.656..1754.656 rows=0 loops=1) Recheck Cond: (title ~~* 'x264'::text) Rows Removed by Index Recheck: 1220793 Heap Blocks: exact=197567 -> Bitmap Index Scan on title_trgm_idx (cost=0.00..462.35 rows=1380 width=0) (actual time=346.663..346.663 rows=1220793 loops=1) Index Cond: (title ~~* 'x264'::text) Planning time: 1.168 ms Execution time: 1755.944 ms On Sun, Nov 13, 2016 at 10:04 PM, Oleg Bartunov wrote: > > > On Sun, Nov 13, 2016 at 2:54 PM, Aaron Lewis > wrote: >> >> I have a simple table with Trigram index, >> >> create table mytable(hash char(40), title text); >> create index title_trgm_idx on mytable using gin(title gin_trgm_ops); >> >> When I run a query with 10m rows, it uses the Trigram index, but takes >> 3s to execute, very slow. >> (I have 80m rows, but only inserted 10m for testing purpose) >> >> test=# select count(*) from mytable; >> count >> -- >> 13971887 >> (1 row) >> >> test=# explain analyze select * from mytable where title ilike 'x264'; >> QUERY PLAN >> >> -- >> Bitmap Heap Scan on mytable (cost=462.69..5639.67 rows=1380 >> width=83) (actual time=2937.308..2937.308 rows=0 loops=1) >>Recheck Cond: (title ~~* 'x264'::text) >>Rows Removed by Index Recheck: 11402855 >>Heap Blocks: exact=39557 lossy=158010 >>-> Bitmap Index Scan on title_trgm_idx (cost=0.00..462.35 >> rows=1380 width=0) (actual time=342.440..342.440 rows=1220793 loops=1) >> Index Cond: (title ~~* 'x264'::text) >> Planning time: 0.611 ms >> Execution time: 2937.729 ms >> (8 rows) >> >> Any ideas to speed things up? > > >Rows Removed by Index Recheck: 11402855 >Heap Blocks: exact=39557 lossy=158010 > > You need to increase work_mem >> >> >> -- >> Best Regards, >> Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/ >> Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33 >> >> >> -- >> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-general > > -- Best Regards, Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/ Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Trigram is slow when 10m rows
I have a simple table with Trigram index, create table mytable(hash char(40), title text); create index title_trgm_idx on mytable using gin(title gin_trgm_ops); When I run a query with 10m rows, it uses the Trigram index, but takes 3s to execute, very slow. (I have 80m rows, but only inserted 10m for testing purpose) test=# select count(*) from mytable; count -- 13971887 (1 row) test=# explain analyze select * from mytable where title ilike 'x264'; QUERY PLAN -- Bitmap Heap Scan on mytable (cost=462.69..5639.67 rows=1380 width=83) (actual time=2937.308..2937.308 rows=0 loops=1) Recheck Cond: (title ~~* 'x264'::text) Rows Removed by Index Recheck: 11402855 Heap Blocks: exact=39557 lossy=158010 -> Bitmap Index Scan on title_trgm_idx (cost=0.00..462.35 rows=1380 width=0) (actual time=342.440..342.440 rows=1220793 loops=1) Index Cond: (title ~~* 'x264'::text) Planning time: 0.611 ms Execution time: 2937.729 ms (8 rows) Any ideas to speed things up? -- Best Regards, Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/ Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Why is this query not using GIN index?
I have a simple table, and a gin index, create table mytable(hash char(40), title varchar(500)); create index name_fts on mytable using gin(to_tsvector('english', 'title')); create unique index md5_uniq_idx on mytable(hash); When I execute a query with tsquery, the GIN index was not in use: test=# explain analyze select * from mytable where to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10; QUERY PLAN Limit (cost=0.00..277.35 rows=10 width=83) (actual time=0.111..75.549 rows=10 loops=1) -> Seq Scan on mytable (cost=0.00..381187.45 rows=13744 width=83) (actual time=0.110..75.546 rows=10 loops=1) Filter: (to_tsvector('english'::regconfig, (title)::text) @@ '''abc'' | ''def'''::tsquery) Rows Removed by Filter: 10221 Planning time: 0.176 ms Execution time: 75.564 ms (6 rows) Any ideas? -- Best Regards, Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/ Finger Print: 9F67 391B B770 8FF6 99DC D92D 87F6 2602 1371 4D33 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general