[EMAIL PROTECTED] (Greg Stark) writes:

> Geoff Caplan <[EMAIL PROTECTED]> writes:
> 
> > Hi folks,
> > 
> > Sorry to ask a newbie SQL question but I'm struggling...
> 
> There's no efficient way to write this in standard SQL. However Postgres has
> an extension DISTINCT ON that would do it:
> 
> select url,count(*) 
>   from (select distinct on (session_id)
>                url
>           from clickstream
>          order by session_id,sequence_num  desc
>        )
>  group by url
> 
> This isn't going to be a superfast query. It has to sort all the clickstream
> records by session and sequence, take just the last one, then probably sort
> those again.
>

As an experiment I tried a more  'standard SQL' approach to this problem:

SELECT url, count(1) 
  FROM clickstream 
 WHERE (session_id, sequence_num) IN 
     (SELECT session_id, max(sequence_num) 
        FROM clickstream 
        GROUP BY session_id)
GROUP BY url;

On a table with about 100,000 rows this runs in about 720ms on my
system , compared to the ON DISTICNT version which runs in about
1000ms.  Adding an index on (session_id, sequence_num) reduced the run
time to about 690ms, but made no difference to the DISTINCT ON
version.  With only about 10,000 rows, there's no appreciable
difference. This surprised me, because I expected the DISTINCT ON to
be better.


-- 
Remove -42 for email

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Reply via email to