Re: [sqlite] finding pairwise similarity
where `similarity` is a measure of overlap of `t` between each pair of `c` and is calculated as the number of `t` common between a given pair divided by the total number of unique `t` in the pair. Just like Igor, I take "unique `t` in pair" to mean "distinct `t` in pair". I want the result to be source_c target_c similarity -- 1 2 0.2 (= 1 common `t` between the 2 `c` / by total 4 `t`) Here I take 0.2 to mean 0.25. If so, here's some more amunition: select src.c_no "Source c", trg.c_no "Target c", (select count(*) from (select 1 from c_t where c_t.c_no in (src.c_no, trg.c_no) group by c_t.t_no having count(*) > 1)) * 1.0 / (select count(*) from (select distinct t_no from c_t where c_t.c_no in (src.c_no, trg.c_no))) Similarity from c src join c trg on (src.c_no < trg.c_no); If I'm not mistaken, both versions should give identical results. Check by yourself on real data and see which is faster for you (most probably Igor's). ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] finding pairwise similarity
Mr. Puneet Kishor wrote: > I have two tables like so > >CREATE TABLE c (c_no INTEGER PRIMARY KEY, c_name TEXT); >CREATE TABLE t (t_no INTEGER PRIMARY KEY, t_name TEXT); >CREATE TABLE c_t (c_no INTEGER, t_no INTEGER); > > Every row in `c` has one or more `t`. I want, in a single SQL, the following > >source_c, target_c, similarity > > where `similarity` is a measure of overlap of `t` between each pair of `c` > and is calculated as the number of `t` common between > a given pair divided by the total number of unique `t` in the pair. select src.c_no source_c, trg.c_no target_c, (select count(*) from ( select t_no from c_t where c_t.c_no = src.c_no intersect select t_no from c_t where c_t.c_no = trg.c_no)) * 1.0 / (select count(*) from ( select t_no from c_t where c_t.c_no = src.c_no union select t_no from c_t where c_t.c_no = trg.c_no)) similarity from c src join c trg on (src.c_no < trg.c_no); -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] finding pairwise similarity
I have two tables like so CREATE TABLE c (c_no INTEGER PRIMARY KEY, c_name TEXT); CREATE TABLE t (t_no INTEGER PRIMARY KEY, t_name TEXT); CREATE TABLE c_t (c_no INTEGER, t_no INTEGER); Every row in `c` has one or more `t`. I want, in a single SQL, the following source_c, target_c, similarity where `similarity` is a measure of overlap of `t` between each pair of `c` and is calculated as the number of `t` common between a given pair divided by the total number of unique `t` in the pair. For example, given the following records c t --- --- 1 1 1 2 1 3 2 1 2 5 I want the result to be source_c target_c similarity -- 1 2 0.2 (= 1 common `t` between the 2 `c` / by total 4 `t`) Would appreciate any nudge toward a solution. -- Puneet Kishor ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users