Re: [GENERAL] PostgreSQL GIN index not used when ts_query language is fetched from a column

2016-09-15 Thread Jeff Janes
On Thu, Sep 15, 2016 at 5:46 AM, Jaap Roes  wrote:

> I've got a table that stores some multilingual content:
>
> CREATE TABLE search (
>   content text NOT NULL,
>   language regconfig NOT NULL,
>   fulltext tsvector
> );
> CREATE INDEX search_fulltext ON search USING GIN(fulltext);
>
> INSERT INTO search (language, content) VALUES
>   ('dutch', 'Als achter vliegen vliegen vliegen vliegen vliegen
> vliegen achterna'),
>   ('dutch', 'Langs de koele kali liep een kale koeli met een kilo kali
> op zijn kale koeli-kop.'),
>   ('dutch', 'Moeder sneed zeven scheve sneden brood'),
>   ('english', 'I saw Susie sitting in a shoe shine shop. Where she
> sits she shines, and where she shines she sits.'),
>   ('english', 'How can a clam cram in a clean cream can?'),
>   ('english', 'Can you can a can as a canner can can a can?');
>
> UPDATE search SET fulltext = to_tsvector(language, content);
>
> To make sure I always search in the correct language I use these queries:
>
> SELECT FROM search WHERE fulltext @@ to_tsquery(language, 'shine');
> (1 row)
>

I don't think that what you want is possible to do in a clean way.  It
can't know what the language for a row is until it finds the row, but it
can't find the row using an index until it runs to_tsquery, and it can't
run to_tsquery until it knows the language.

You would probably need to run the query once for each language, and filter
out the results which it found under the 'wrong' language.

SELECT * FROM search WHERE fulltext @@ to_tsquery('dutch', 'shine') and
language = 'dutch'::regconfig
union all
SELECT * FROM search WHERE fulltext @@ to_tsquery('english', 'shine') and
language = 'english'::regconfig;

If you have more than two languages, this will become tedious, and perhaps
non-performant.  I am not aware of a more general solution, though.

It might help to build partial indexes on each language.

Cheers,

Jeff


Re: [GENERAL] PostgreSQL GIN index not used when ts_query language is fetched from a column

2016-09-15 Thread Jaap Roes

> On 15 Sep 2016, at 14:46, Jaap Roes  wrote:
> 
> I've got a table that stores some multilingual content:
> 
>CREATE TABLE search (
>  content text NOT NULL,
>  language regconfig NOT NULL,
>  fulltext tsvector
>);
>CREATE INDEX search_fulltext ON search USING GIN(fulltext);
...
> To make sure I always search in the correct language I use these queries:
> 
>SELECT FROM search WHERE fulltext @@ to_tsquery(language, 'shine');
>(1 row)
> 
>SELECT FROM search WHERE fulltext @@ to_tsquery(language, 'vlieg');
>(1 row)
> 
> Because hardcoding the language doesn't give the correct results:
> 
>SELECT FROM search WHERE fulltext @@ to_tsquery('dutch', 'shine');
>(0 rows)
> 
>SELECT FROM search WHERE fulltext @@ to_tsquery('english', 'vlieg');
>(0 rows)
> 
> The problem however is that PostgreSQL doesn't use the GIN index when using 
> the first set of queries and instead does a sequential scan:
...
>EXPLAIN ANALYZE SELECT * FROM search WHERE fulltext @@ 
> to_tsquery(language, 'shine’);
> 
>Seq Scan on search  (cost=0.00..17.35 rows=2 width=136) (actual 
> time=0.040..0.044 rows=1 loops=1)
>Filter: (fulltext @@ to_tsquery(language, 'shine'::text))
>Rows Removed by Filter: 5
>Planning time: 0.039 ms
>Execution time: 0.064 ms
> 
> While it does when hardcoding a language:
> 
>EXPLAIN ANALYZE SELECT FROM search WHERE fulltext @@ to_tsquery('dutch', 
> 'vlieg');
>Bitmap Heap Scan on search  (cost=12.63..23.66 rows=82 width=0) (actual 
> time=0.044..0.044 rows=1 loops=1)
>  Recheck Cond: (fulltext @@ '''vlieg'''::tsquery)
>  Heap Blocks: exact=1
>->  Bitmap Index Scan on search_fulltext  (cost=0.00..12.61 rows=82 
> width=0) (actual time=0.037..0.037 rows=1 loops=1)
>  Index Cond: (fulltext @@ '''vlieg'''::tsquery)
>Planning time: 0.128 ms
>Execution time: 0.065 ms
> 
> So my question is: Is it at all possible to use a column in the ts_query to 
> use the correct language config and still have Postgres use the GIN index?

So I came up with a solution. Pre-localise the query in a join and use that to 
filter the matches:

SELECT * FROM search s 
INNER JOIN (
SELECT 'dutch'::regconfig AS language, to_tsquery('dutch', 'shine') as q
UNION SELECT 'english'::regconfig AS language, to_tsquery('english', 
'shine') as q
UNION SELECT 'simple'::regconfig AS language, to_tsquery('simple', 
'shine') as q
) q ON (s.language=q.language) 
WHERE fulltext @@ q;

This seems to work, but the query plan looks a bit confusing, so I’m not super 
confident about the correctness:

Nested Loop  (cost=205.44..1327.12 rows=188 width=1590) (actual 
time=3.350..7.010 rows=16 loops=1)
  ->  Unique  (cost=0.08..0.11 rows=3 width=0) (actual time=0.010..0.021 
rows=3 loops=1)
->  Sort  (cost=0.08..0.09 rows=3 width=0) (actual 
time=0.008..0.011 rows=3 loops=1)
  Sort Key: ('dutch'::regconfig), ('''vlieg'''::tsquery)
  Sort Method: quicksort  Memory: 25kB
  ->  Append  (cost=0.00..0.06 rows=3 width=0) (actual 
time=0.001..0.002 rows=3 loops=1)
->  Result  (cost=0.00..0.01 rows=1 width=0) (actual 
time=0.001..0.001 rows=1 loops=1)
->  Result  (cost=0.00..0.01 rows=1 width=0) (actual 
time=0.001..0.001 rows=1 loops=1)
->  Result  (cost=0.00..0.01 rows=1 width=0) (actual 
time=0.000..0.000 rows=1 loops=1)
  ->  Bitmap Heap Scan on search s  (cost=205.36..441.70 rows=63 
width=1554) (actual time=2.310..2.318 rows=5 loops=3)
Recheck Cond: ((fulltext @@ ('''vlieg'''::tsquery)) AND 
((language)::oid = (('dutch'::regconfig))::oid))
Heap Blocks: exact=16
->  BitmapAnd  (cost=205.36..205.36 rows=63 width=0) (actual 
time=2.303..2.303 rows=0 loops=3)
  ->  Bitmap Index Scan on search_fulltext  (cost=0.00..17.41 
rows=188 width=0) (actual time=0.018..0.018 rows=16 loops=3)
Index Cond: (fulltext @@ ('''vlieg'''::tsquery))
  ->  Bitmap Index Scan on search_language  (cost=0.00..187.67 
rows=12539 width=0) (actual time=2.277..2.277 rows=12539 loops=3)
Index Cond: ((language_config)::oid = 
(('dutch'::regconfig))::oid)
Planning time: 0.228 ms
Execution time: 7.058 ms

Is this the way to go? Or is there a better way, I’m eager to find out!



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] PostgreSQL GIN index not used when ts_query language is fetched from a column

2016-09-15 Thread Jaap Roes
I've got a table that stores some multilingual content:

CREATE TABLE search (
  content text NOT NULL,
  language regconfig NOT NULL,
  fulltext tsvector
);
CREATE INDEX search_fulltext ON search USING GIN(fulltext);

INSERT INTO search (language, content) VALUES  
  ('dutch', 'Als achter vliegen vliegen vliegen vliegen vliegen vliegen 
achterna'),
  ('dutch', 'Langs de koele kali liep een kale koeli met een kilo kali op 
zijn kale koeli-kop.'),
  ('dutch', 'Moeder sneed zeven scheve sneden brood'),
  ('english', 'I saw Susie sitting in a shoe shine shop. Where she sits she 
shines, and where she shines she sits.'),
  ('english', 'How can a clam cram in a clean cream can?'),
  ('english', 'Can you can a can as a canner can can a can?');

UPDATE search SET fulltext = to_tsvector(language, content);

To make sure I always search in the correct language I use these queries:

SELECT FROM search WHERE fulltext @@ to_tsquery(language, 'shine');
(1 row)

SELECT FROM search WHERE fulltext @@ to_tsquery(language, 'vlieg');
(1 row)

Because hardcoding the language doesn't give the correct results:

SELECT FROM search WHERE fulltext @@ to_tsquery('dutch', 'shine');
(0 rows)

SELECT FROM search WHERE fulltext @@ to_tsquery('english', 'vlieg');
(0 rows)

The problem however is that PostgreSQL doesn't use the GIN index when using the 
first set of queries and instead does a sequential scan:

(Note: I've disabled scanning using SET enable_seqscan = OFF; for these 
examples because of the low amount of rows)

EXPLAIN ANALYZE SELECT * FROM search WHERE fulltext @@ to_tsquery(language, 
'shine’);

Seq Scan on search  (cost=0.00..17.35 rows=2 width=136) (actual 
time=0.040..0.044 rows=1 loops=1)
Filter: (fulltext @@ to_tsquery(language, 'shine'::text))
Rows Removed by Filter: 5
Planning time: 0.039 ms
Execution time: 0.064 ms

While it does when hardcoding a language:

EXPLAIN ANALYZE SELECT FROM search WHERE fulltext @@ to_tsquery('dutch', 
'vlieg');
Bitmap Heap Scan on search  (cost=12.63..23.66 rows=82 width=0) (actual 
time=0.044..0.044 rows=1 loops=1)
  Recheck Cond: (fulltext @@ '''vlieg'''::tsquery)
  Heap Blocks: exact=1
->  Bitmap Index Scan on search_fulltext  (cost=0.00..12.61 rows=82 
width=0) (actual time=0.037..0.037 rows=1 loops=1)
  Index Cond: (fulltext @@ '''vlieg'''::tsquery)
Planning time: 0.128 ms
Execution time: 0.065 ms

So my question is: Is it at all possible to use a column in the ts_query to use 
the correct language config and still have Postgres use the GIN index?

I’ve tried this on both PostgreSQL 9.4 and 9.5.

Obviously the real table has a lot more rows, so here's the execution plan from 
the real table:

Using a column for language config:

Seq Scan on search  (cost=0.00..8727.25 rows=188 width=0) (actual 
time=0.725..352.307 rows=1689 loops=1)
  Filter: (fulltext @@ to_tsquery(language_config, 'example'::text))
  Rows Removed by Filter: 35928
Planning time: 0.053 ms
Execution time: 352.915 ms

When hardcoding the language:

Bitmap Heap Scan on search  (cost=28.65..4088.92 rows=1633 width=0) (actual 
time=0.514..10.475 rows=1684 loops=1)
  Recheck Cond: (fulltext @@ '''exampl'''::tsquery)
  Heap Blocks: exact=1522  
->  Bitmap Index Scan on search_fulltext  (cost=0.00..28.24 rows=1633 
width=0) (actual time=0.333..0.333 rows=1684 loops=1)
  Index Cond: (fulltext @@ '''exampl'''::tsquery)
Planning time: 0.180 ms
Execution time: 10.564 ms

Note: I previously asked this on stackexhange 
(http://dba.stackexchange.com/questions/149765/postgresql-gin-index-not-used-when-ts-query-language-is-fetched-from-a-column)
 but I’m assuming there are more knowledgeable people on this mailing list ;-)

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general