Re: [GENERAL] Why is this query not using GIN index?

2016-11-13 Thread Aaron Lewis
Hey guys,

I'm trying to understand the performance impact of "Index Recheck", I
googled for Index Recheck, but didn't find much details about it,
where can I know more about it?

And how did you know the performance is being significantly hurt by
inadequate work_mem?

I'm running PG 9.6.1, built from source.


On Mon, Nov 14, 2016 at 2:51 AM, Tom Lane  wrote:
> Oleg Bartunov  writes:
>> On Sun, Nov 13, 2016 at 6:05 PM, Aaron Lewis 
>>> It takes 500ms with 10m rows, could it be faster?
>
>> sure.  Recheck with function call is pretty expensive, so I'd not recommend
>> to create functional index, just create separate column of type tsvector
>> (materialize to_tsvector) and create gin index on it.  You should surprise.
>
> I doubt it'll help that much --- more than half the time is going into the
> bitmap indexscan, and with over 1m candidate matches, there's no way
> that's going to be super cheap.
>
> I wonder whether a gist index would be better here, since it would support
> a plain indexscan which should require scanning much less of the index
> given the small LIMIT.
>
> (Materializing the tsvector would probably help for gist, too, by reducing
> the cost of lossy-index rechecks.)
>
> BTW, it still looks like the performance is being significantly hurt by
> inadequate work_mem.
>
> regards, tom lane



-- 
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is this query not using GIN index?

2016-11-13 Thread Aaron Lewis
Sigh, didn't notice that. Thanks for the heads up.

It takes 500ms with 10m rows, could it be faster?
I've increased work_mem to 256MB

test=# explain analyze select * from mytable where
to_tsvector('english', title) @@ 'x264'::tsquery limit 1000 offset 10;
  QUERY PLAN
---
 Limit  (cost=684.06..2949.42 rows=1000 width=83) (actual
time=348.506..536.483 rows=1000 loops=1)
   ->  Bitmap Heap Scan on mytable  (cost=661.41..158917.22 rows=69859
width=83) (actual time=345.354..536.199 rows=1010 loops=1)
 Recheck Cond: (to_tsvector('english'::regconfig, title) @@
'''x264'''::tsquery)
 Rows Removed by Index Recheck: 12242
 Heap Blocks: exact=20 lossy=186
 ->  Bitmap Index Scan on name_fts  (cost=0.00..643.95
rows=69859 width=0) (actual time=333.703..333.703 rows=1044673
loops=1)
   Index Cond: (to_tsvector('english'::regconfig, title)
@@ '''x264'''::tsquery)
 Planning time: 0.144 ms
 Execution time: 537.212 ms
(9 rows)

On Sun, Nov 13, 2016 at 10:33 PM, Julien Rouhaud
 wrote:
> On 13/11/2016 15:26, Aaron Lewis wrote:
>> Hi Oleg,
>>
>> Can you elaborate on the title column? I don't get it.
>>
>
>>>> create table mytable(hash char(40), title varchar(500));
>>>> create index name_fts on mytable using gin(to_tsvector('english',
>>>> 'title'));
>
> You created an index on the text 'title', not on the title column, so
> the index is useless.
>
> Drop the existing index and create this one instead:
>
> create index name_fts on mytable using gin(to_tsvector('english', title));
>
>> On Sun, Nov 13, 2016 at 10:10 PM, Oleg Bartunov  wrote:
>>>
>>>
>>> On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis 
>>> wrote:
>>>>
>>>> I have a simple table, and a gin index,
>>>>
>>>> create table mytable(hash char(40), title varchar(500));
>>>> create index name_fts on mytable using gin(to_tsvector('english',
>>>> 'title'));
>>>
>>>
>>>
>>> ^
>>>
>>>>
>>>> create unique index md5_uniq_idx on mytable(hash);
>>>>
>>>> When I execute a query with tsquery, the GIN index was not in use:
>>>>
>>>> test=# explain analyze select * from mytable where
>>>> to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
>>>>  QUERY PLAN
>>>>
>>>> 
>>>>  Limit  (cost=0.00..277.35 rows=10 width=83) (actual
>>>> time=0.111..75.549 rows=10 loops=1)
>>>>->  Seq Scan on mytable  (cost=0.00..381187.45 rows=13744 width=83)
>>>> (actual time=0.110..75.546 rows=10 loops=1)
>>>>  Filter: (to_tsvector('english'::regconfig, (title)::text) @@
>>>> '''abc'' | ''def'''::tsquery)
>>>>  Rows Removed by Filter: 10221
>>>>  Planning time: 0.176 ms
>>>>  Execution time: 75.564 ms
>>>> (6 rows)
>>>>
>>>> Any ideas?
>>>>
>
> --
> Julien Rouhaud
> http://dalibo.com - http://dalibo.org



-- 
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is this query not using GIN index?

2016-11-13 Thread Aaron Lewis
Hi Oleg,

Can you elaborate on the title column? I don't get it.

On Sun, Nov 13, 2016 at 10:10 PM, Oleg Bartunov  wrote:
>
>
> On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis 
> wrote:
>>
>> I have a simple table, and a gin index,
>>
>> create table mytable(hash char(40), title varchar(500));
>> create index name_fts on mytable using gin(to_tsvector('english',
>> 'title'));
>
>
>
> ^
>
>>
>> create unique index md5_uniq_idx on mytable(hash);
>>
>> When I execute a query with tsquery, the GIN index was not in use:
>>
>> test=# explain analyze select * from mytable where
>> to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
>>  QUERY PLAN
>>
>> 
>>  Limit  (cost=0.00..277.35 rows=10 width=83) (actual
>> time=0.111..75.549 rows=10 loops=1)
>>->  Seq Scan on mytable  (cost=0.00..381187.45 rows=13744 width=83)
>> (actual time=0.110..75.546 rows=10 loops=1)
>>  Filter: (to_tsvector('english'::regconfig, (title)::text) @@
>> '''abc'' | ''def'''::tsquery)
>>  Rows Removed by Filter: 10221
>>  Planning time: 0.176 ms
>>  Execution time: 75.564 ms
>> (6 rows)
>>
>> Any ideas?
>>
>>
>> --
>> Best Regards,
>> Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
>> Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>
>



-- 
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Trigram is slow when 10m rows

2016-11-13 Thread Aaron Lewis
Thanks Oleg.

I've increased work_mem to 128MB, now the query falls down to 1.7s,
faster but still not good enough.

Is there any other thing I can do about it?

test=# explain analyze select * from mytable where title ilike 'x264';
  QUERY PLAN
--
 Bitmap Heap Scan on mytable  (cost=462.69..5639.67 rows=1380
width=83) (actual time=1754.656..1754.656 rows=0 loops=1)
   Recheck Cond: (title ~~* 'x264'::text)
   Rows Removed by Index Recheck: 1220793
   Heap Blocks: exact=197567
   ->  Bitmap Index Scan on title_trgm_idx  (cost=0.00..462.35
rows=1380 width=0) (actual time=346.663..346.663 rows=1220793 loops=1)
 Index Cond: (title ~~* 'x264'::text)
 Planning time: 1.168 ms
 Execution time: 1755.944 ms


On Sun, Nov 13, 2016 at 10:04 PM, Oleg Bartunov  wrote:
>
>
> On Sun, Nov 13, 2016 at 2:54 PM, Aaron Lewis 
> wrote:
>>
>> I have a simple table with Trigram index,
>>
>> create table mytable(hash char(40), title text);
>> create index title_trgm_idx on mytable using gin(title gin_trgm_ops);
>>
>> When I run a query with 10m rows, it uses the Trigram index, but takes
>> 3s to execute, very slow.
>> (I have 80m rows, but only inserted 10m for testing purpose)
>>
>> test=# select count(*) from mytable;
>>   count
>> --
>>  13971887
>> (1 row)
>>
>> test=# explain analyze select * from mytable where title ilike 'x264';
>>   QUERY PLAN
>>
>> --
>>  Bitmap Heap Scan on mytable  (cost=462.69..5639.67 rows=1380
>> width=83) (actual time=2937.308..2937.308 rows=0 loops=1)
>>Recheck Cond: (title ~~* 'x264'::text)
>>Rows Removed by Index Recheck: 11402855
>>Heap Blocks: exact=39557 lossy=158010
>>->  Bitmap Index Scan on title_trgm_idx  (cost=0.00..462.35
>> rows=1380 width=0) (actual time=342.440..342.440 rows=1220793 loops=1)
>>  Index Cond: (title ~~* 'x264'::text)
>>  Planning time: 0.611 ms
>>  Execution time: 2937.729 ms
>> (8 rows)
>>
>> Any ideas to speed things up?
>
>
>Rows Removed by Index Recheck: 11402855
>Heap Blocks: exact=39557 lossy=158010
>
> You need to increase work_mem
>>
>>
>> --
>> Best Regards,
>> Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
>> Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>
>



-- 
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Trigram is slow when 10m rows

2016-11-13 Thread Aaron Lewis
I have a simple table with Trigram index,

create table mytable(hash char(40), title text);
create index title_trgm_idx on mytable using gin(title gin_trgm_ops);

When I run a query with 10m rows, it uses the Trigram index, but takes
3s to execute, very slow.
(I have 80m rows, but only inserted 10m for testing purpose)

test=# select count(*) from mytable;
  count
--
 13971887
(1 row)

test=# explain analyze select * from mytable where title ilike 'x264';
  QUERY PLAN
--
 Bitmap Heap Scan on mytable  (cost=462.69..5639.67 rows=1380
width=83) (actual time=2937.308..2937.308 rows=0 loops=1)
   Recheck Cond: (title ~~* 'x264'::text)
   Rows Removed by Index Recheck: 11402855
   Heap Blocks: exact=39557 lossy=158010
   ->  Bitmap Index Scan on title_trgm_idx  (cost=0.00..462.35
rows=1380 width=0) (actual time=342.440..342.440 rows=1220793 loops=1)
 Index Cond: (title ~~* 'x264'::text)
 Planning time: 0.611 ms
 Execution time: 2937.729 ms
(8 rows)

Any ideas to speed things up?

-- 
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Why is this query not using GIN index?

2016-11-13 Thread Aaron Lewis
I have a simple table, and a gin index,

create table mytable(hash char(40), title varchar(500));
create index name_fts on mytable using gin(to_tsvector('english', 'title'));
create unique index md5_uniq_idx on mytable(hash);

When I execute a query with tsquery, the GIN index was not in use:

test=# explain analyze select * from mytable where
to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
 QUERY PLAN

 Limit  (cost=0.00..277.35 rows=10 width=83) (actual
time=0.111..75.549 rows=10 loops=1)
   ->  Seq Scan on mytable  (cost=0.00..381187.45 rows=13744 width=83)
(actual time=0.110..75.546 rows=10 loops=1)
 Filter: (to_tsvector('english'::regconfig, (title)::text) @@
'''abc'' | ''def'''::tsquery)
 Rows Removed by Filter: 10221
 Planning time: 0.176 ms
 Execution time: 75.564 ms
(6 rows)

Any ideas?


-- 
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general