Re: [GENERAL] Full text search tsv column aproach vs concat confusion

2016-11-18 Thread Artur Zakirov

Hi,

On 16.11.2016 11:54, cen wrote:

Hi

I am seeking some clarification in regard to full text search across
multiple tables and what the best approach is. Documentation talks about
two approaches when it comes to building a document: on-the-fly concat
of columns and a dedicated tsv column approach. Let's say I want to
perform a search for |"txt1 & txt2 & txt3" on columns table1.col1,
table1.col2 and table2.col1. I see the following solutions:|

|1. Concat all three into a document and perform a FTS.|

|SELECT * FROM (
|

|SELECTto_tsvector(table1.col1)||to_tsvector(table1.col2)||to_tsvector(table2.col1)asdocument
F|||ROM table1 LEFTJOINtable2 ONtable1.table2_id=table2.id| ) subquery 
|||WHEREsubquery.document@@to_tsquery(unaccent(?));| |

|2. Create a tsv column in each table, concat tsv columns and perform
FTS on that.|

|SELECT*FROMtable1 LEFTJOINtable2 ONtable1.table2_id=table2.id
WHEREtable1.tsv ||tale2.tsv @@to_tsquery(unaccent(?));|

|3. Have a tsv column only in table1 and insert table2.col1 to the tsv
via triggers. Works but seems very hacky.|

|
|

|It seems to me that option #2 is fast and easy to implement but I am
not sure what the concat of tsvs really means from index usage and
performance standpoint. Option #1 is the most flexible and I'd use that
all the time if it was not THAT much slower than tsv column approacj.
Documentation on TSV columns states: "||Another advantage is that
searches will be faster, since it will not be necessary to redo the
to_tsvector calls to verify index matches."
|

The question is, how much faster are tsv columns really? Are there any
benchmarks about this? If the performance difference is negligible I'd
advocate that using tsv columns is a waste of time and space in most
general cases. But since there is no information on how much faster it's
hard to decide.



I haven't any such benchmarks. But if you have a real database, you can 
perform tests using it on your solutions. Because it depends on your 
task and what you need.


By the way, I suppose it is better to use COALESCE() function if your 
columns could have NULL value:


SELECT * FROM (

SELECT to_tsvector(coalesce(table1.col1,'')) ||
   to_tsvector(coalesce(table1.col2,'')) ||
   to_tsvector(coalesce(table2.col1,'')) as document FROM table1 
LEFT JOIN table2 ON table1.table2_id=table2.id


) subquery WHERE subquery.document @@ to_tsquery(unaccent(?));

And specifying a text search configuration makes queries a little bit 
faster:


... to_tsvector('english', coalesce(table1.col1,'')) ...

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Full text search tsv column aproach vs concat confusion

2016-11-16 Thread cen

Hi

I am seeking some clarification in regard to full text search across 
multiple tables and what the best approach is. Documentation talks about 
two approaches when it comes to building a document: on-the-fly concat 
of columns and a dedicated tsv column approach. Let's say I want to 
perform a search for |"txt1 & txt2 & txt3" on columns table1.col1, 
table1.col2 and table2.col1. I see the following solutions:|


|1. Concat all three into a document and perform a FTS.|

|SELECT * FROM (
|

|SELECTto_tsvector(table1.col1)||to_tsvector(table1.col2)||to_tsvector(table2.col1)asdocument 
F|||ROM table1 LEFTJOINtable2 ONtable1.table2_id=table2.id| ) subquery |||WHEREsubquery.document@@to_tsquery(unaccent(?));| |


|2. Create a tsv column in each table, concat tsv columns and perform 
FTS on that.|


|SELECT*FROMtable1 LEFTJOINtable2 ONtable1.table2_id=table2.id 
WHEREtable1.tsv ||tale2.tsv @@to_tsquery(unaccent(?));|


|3. Have a tsv column only in table1 and insert table2.col1 to the tsv 
via triggers. Works but seems very hacky.|


|
|

|It seems to me that option #2 is fast and easy to implement but I am 
not sure what the concat of tsvs really means from index usage and 
performance standpoint. Option #1 is the most flexible and I'd use that 
all the time if it was not THAT much slower than tsv column approacj. 
Documentation on TSV columns states: "||Another advantage is that 
searches will be faster, since it will not be necessary to redo the 
to_tsvector calls to verify index matches."

|

The question is, how much faster are tsv columns really? Are there any 
benchmarks about this? If the performance difference is negligible I'd 
advocate that using tsv columns is a waste of time and space in most 
general cases. But since there is no information on how much faster it's 
hard to decide.



Best regards,
Klemen

||