Sorry about the horrendous subject, let me explain by example:

Let's take this schema:


```
CREATE TABLE a (
  id bigserial PRIMARY KEY,
  created_at timestamp with time zone  NOT NULL  DEFAULT NOW()
);

CREATE TABLE b(
  id bigserial PRIMARY KEY,
  a_id bigint NOT NULL REFERENCES a(id),
  created_at      timestamp with time zone  NOT NULL  DEFAULT NOW()
);

CREATE TABLE c(
  id bigserial PRIMARY KEY,
  b_id bigint NOT NULL REFERENCES b(id),
  created_at      timestamp with time zone  NOT NULL  DEFAULT NOW()
);
```

And let's fill it up with some dummy data, that roughly matches the
distribution of mine:

```
INSERT INTO a SELECT FROM generate_series(1, 5);
INSERT INTO b(a_id) SELECT (i % 5) + 1 FROM generate_series(1, 100) i;
INSERT INTO c(b_id) SELECT (trunc(random() * 100)+1) FROM
generate_series(1, 1000000);
```


And here's the query I want to do, efficiently:

````
SELECT * FROM c
  JOIN b ON b.id = c.b_id
  JOIN a ON a.id = b.a_id
WHERE a.id = 3
ORDER BY b.created_at DESC
LIMIT 10
```

There seems to simply be no index I can put on the data, that allows me
to run this query efficiently. Four hours of playing with this, the only
solution I can see is, normalizing table `c` by adding a field "b's
a_id" and then creating an index on  (bs_a_id, created_at).

But surely there's a better solution?


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to