# product table (simplified): create table p ( id char(22) not null primary key, name text, desc text );
# product category table (simpl.): create table pc ( id char(22) not null primary key, name text, desc text );
# table that maps products into categories: create table p_pc ( id char(22) not null primary key, pid char(22) not null references p(id), pcid char(22) not null references pc(id) ); create index i_ppc_pid on p_pc(pid); create index i_ppc_pcid on p_pc(pcid); create unique index i_ppc_pid_pcid on p_pc(pid, pcid);
There are about 50k rows in p, 40k in pc, and 140k in p_pc. Most products are only placed in 1 category, some in 2, fewer in 3, and fewer stills in 4, 5, 6 categories.
I want to know how many products are placed in 1 category, how many in 2, and so on. Here's my query:
select count(pid),num_cat from ( select pid,count(cid) as num_cat from ppc group by pid) as f group by num_cat;
A sample output (4000 products are categorized in 5 different places, 4998 in 4, and so on):
count | num_cat -------+--------- 4000 | 5 4998 | 4 7502 | 3 10001 | 2 17499 | 1 (5 rows)
However, this query is taking about 2.5 minutes. Any idea on how to make it faster?
-- dave
---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html