Re: [sqlite] finding pairwise similarity

2012-01-14 Thread Jean-Christophe Deschamps



where `similarity` is a measure of overlap of `t` between each pair of 
`c` and is calculated as the number of `t` common between a given pair 
divided by the total number of unique `t` in the pair.


Just like Igor, I take "unique `t` in pair" to mean "distinct `t` in pair".



I want the result to be

source_c target_c similarity
  --
1   2 0.2 (= 1 common `t` between the 2 `c` / by 
total 4 `t`)


Here I take 0.2 to mean 0.25.


If so, here's some more amunition:

select src.c_no "Source c",
   trg.c_no "Target c",
   (select count(*) from (select 1 from c_t where c_t.c_no in 
(src.c_no, trg.c_no) group by c_t.t_no having count(*) > 1))

   * 1.0 /
   (select count(*) from (select distinct t_no from c_t where 
c_t.c_no in (src.c_no, trg.c_no))) Similarity

   from c src join c trg on (src.c_no < trg.c_no);

If I'm not mistaken, both versions should give identical 
results.  Check by yourself on real data and see which is faster for 
you (most probably Igor's).


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] finding pairwise similarity

2012-01-14 Thread Igor Tandetnik
Mr. Puneet Kishor  wrote:
> I have two tables like so
> 
>CREATE TABLE c (c_no INTEGER PRIMARY KEY, c_name TEXT);
>CREATE TABLE t (t_no INTEGER PRIMARY KEY, t_name TEXT);
>CREATE TABLE c_t (c_no INTEGER, t_no INTEGER);
> 
> Every row in `c` has one or more `t`. I want, in a single SQL, the following
> 
>source_c, target_c, similarity
> 
> where `similarity` is a measure of overlap of `t` between each pair of `c` 
> and is calculated as the number of `t` common between
> a given pair divided by the total number of unique `t` in the pair.

select src.c_no source_c, trg.c_no target_c,
(select count(*) from (
 select t_no from c_t where c_t.c_no = src.c_no
 intersect
 select t_no from c_t where c_t.c_no = trg.c_no)) * 1.0
/
(select count(*) from (
 select t_no from c_t where c_t.c_no = src.c_no
 union
 select t_no from c_t where c_t.c_no = trg.c_no)) similarity
from c src join c trg on (src.c_no < trg.c_no);

-- 
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] finding pairwise similarity

2012-01-14 Thread Mr. Puneet Kishor
I have two tables like so

CREATE TABLE c (c_no INTEGER PRIMARY KEY, c_name TEXT);
CREATE TABLE t (t_no INTEGER PRIMARY KEY, t_name TEXT);
CREATE TABLE c_t (c_no INTEGER, t_no INTEGER);

Every row in `c` has one or more `t`. I want, in a single SQL, the following

source_c, target_c, similarity

where `similarity` is a measure of overlap of `t` between each pair of `c` and 
is calculated as the number of `t` common between a given pair divided by the 
total number of unique `t` in the pair. For example, given the following records

c   t
--- ---
1   1
1   2
1   3
2   1
2   5

I want the result to be

source_c target_c similarity
  --
1   2 0.2 (= 1 common `t` between the 2 `c` / by total 4 `t`)

Would appreciate any nudge toward a solution.

--
Puneet Kishor
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users