# product table (simplified):
create table p (
  id char(22) not null primary key,
  name text,
  desc text
);

# product category table (simpl.):
create table pc (
  id char(22) not null primary key,
  name text,
  desc text
);

# table that maps products into categories:
create table p_pc (
  id char(22) not null primary key,
  pid char(22) not null references p(id),
  pcid char(22) not null references pc(id)
);
create index i_ppc_pid on p_pc(pid);
create index i_ppc_pcid on p_pc(pcid);
create unique index i_ppc_pid_pcid on p_pc(pid, pcid);

There are about 50k rows in p, 40k in pc, and 140k in p_pc. Most products are only placed in 1 category, some in 2, fewer in 3, and fewer stills in 4, 5, 6 categories.

I want to know how many products are placed in 1 category, how many in 2, and so on. Here's my query:

select count(pid),num_cat from (
  select pid,count(cid) as num_cat
  from ppc group by pid) as f
group by num_cat;

A sample output (4000 products are categorized in 5 different places, 4998 in 4, and so on):

 count | num_cat
-------+---------
  4000 |       5
  4998 |       4
  7502 |       3
 10001 |       2
 17499 |       1
(5 rows)

However, this query is taking about 2.5 minutes. Any idea on how to make it faster?

--
dave

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html



Reply via email to