Re: [GENERAL] Find similar records (compare tsvectors)
Thanks. smlar is fast and quite good.I need find tuning on the search result. On Saturday, March 7, 2015 12:07 AM, Oleg Bartunov wrote: On Fri, Mar 6, 2015 at 5:05 PM, Patrick Dung wrote: > Resend. > > How to quickly compare the similarity of two tsvector? > check http://www.sai.msu.su/~megera/postgres/talks/pgcon-2012.pdf > > On Monday, March 2, 2015 11:01 PM, Patrick Dung > wrote: > > > Hello, > > I had a database with articles or attachment stored in bytea format. > I also had a trigger: it insert/update the tsv column when a record is > added/updated. > The tsv column had a GIN index. > With this setting, I can do very fast keyword search on the tsv. > > Suppose I had a specific record (id=10). > How to list similar records based on ranking? > In that case, I had to compare a tsvector with another tsvector. > > I had this SQL which make the original tsv as a text and then to tsquery, > Then I can compare a tsv and a tsquery. > SELECT ts_rank(i.tsv, replace(strip(original.tsv)::text, ' ', '|')::tsquery) > as similarity, i.company, i.industry, i.post_timestamp, i.id FROM items i, > (SELECT tsv, id FROM items WHERE id=10) AS original WHERE i.id != > original.id ORDER BY similarity; > > items table: > id bigint > company varchar > industry varchar > description varchar > post_timestamp timestamp > attachment bytea > tsv tsvector > > The problem is that this is very slow. > Any comment? > > Thank and regards, > Patrick > > -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Find similar records (compare tsvectors)
On Fri, Mar 6, 2015 at 5:05 PM, Patrick Dung wrote: > Resend. > > How to quickly compare the similarity of two tsvector? > check http://www.sai.msu.su/~megera/postgres/talks/pgcon-2012.pdf > > On Monday, March 2, 2015 11:01 PM, Patrick Dung > wrote: > > > Hello, > > I had a database with articles or attachment stored in bytea format. > I also had a trigger: it insert/update the tsv column when a record is > added/updated. > The tsv column had a GIN index. > With this setting, I can do very fast keyword search on the tsv. > > Suppose I had a specific record (id=10). > How to list similar records based on ranking? > In that case, I had to compare a tsvector with another tsvector. > > I had this SQL which make the original tsv as a text and then to tsquery, > Then I can compare a tsv and a tsquery. > SELECT ts_rank(i.tsv, replace(strip(original.tsv)::text, ' ', '|')::tsquery) > as similarity, i.company, i.industry, i.post_timestamp, i.id FROM items i, > (SELECT tsv, id FROM items WHERE id=10) AS original WHERE i.id != > original.id ORDER BY similarity; > > items table: > id bigint > company varchar > industry varchar > description varchar > post_timestamp timestamp > attachment bytea > tsv tsvector > > The problem is that this is very slow. > Any comment? > > Thank and regards, > Patrick > > -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Find similar records (compare tsvectors)
Resend. How to quickly compare the similarity of two tsvector? On Monday, March 2, 2015 11:01 PM, Patrick Dung wrote: Hello, I had a database with articles or attachment stored in bytea format.I also had a trigger: it insert/update the tsv column when a record is added/updated.The tsv column had a GIN index.With this setting, I can do very fast keyword search on the tsv. Suppose I had a specific record (id=10).How to list similar records based on ranking?In that case, I had to compare a tsvector with another tsvector. I had this SQL which make the original tsv as a text and then to tsquery, Then I can compare a tsv and a tsquery. SELECT ts_rank(i.tsv, replace(strip(original.tsv)::text, ' ', '|')::tsquery) as similarity, i.company, i.industry, i.post_timestamp, i.id FROM items i, (SELECT tsv, id FROM items WHERE id=10) AS original WHERE i.id != original.id ORDER BY similarity; items table:id bigint company varchar industry varchardescription varcharpost_timestamp timestampattachment bytea tsv tsvector The problem is that this is very slow.Any comment? Thank and regards,Patrick
[GENERAL] Find similar records (compare tsvectors)
Hello, I had a database with articles or attachment stored in bytea format.I also had a trigger: it insert/update the tsv column when a record is added/updated.The tsv column had a GIN index.With this setting, I can do very fast keyword search on the tsv. Suppose I had a specific record (id=10).How to list similar records based on ranking?In that case, I had to compare a tsvector with another tsvector. I had this SQL which make the original tsv as a text and then to tsquery, Then I can compare a tsv and a tsquery. SELECT ts_rank(i.tsv, replace(strip(original.tsv)::text, ' ', '|')::tsquery) as similarity, i.company, i.industry, i.post_timestamp, i.id FROM items i, (SELECT tsv, id FROM items WHERE id=10) AS original WHERE i.id != original.id ORDER BY similarity; items table:id bigint company varchar industry varchardescription varcharpost_timestamp timestampattachment bytea tsv tsvector The problem is that this is very slow.Any comment? Thank and regards,Patrick