Re: [GENERAL] Stuck with a query...

2005-03-13 Thread Edmund Bacon
[EMAIL PROTECTED] (Greg Stark) writes:

> Geoff Caplan <[EMAIL PROTECTED]> writes:
> 
> > Hi folks,
> > 
> > Sorry to ask a newbie SQL question but I'm struggling...
> 
> There's no efficient way to write this in standard SQL. However Postgres has
> an extension DISTINCT ON that would do it:
> 
> select url,count(*) 
>   from (select distinct on (session_id)
>url
>   from clickstream
>  order by session_id,sequence_num  desc
>)
>  group by url
> 
> This isn't going to be a superfast query. It has to sort all the clickstream
> records by session and sequence, take just the last one, then probably sort
> those again.
>

As an experiment I tried a more  'standard SQL' approach to this problem:

SELECT url, count(1) 
  FROM clickstream 
 WHERE (session_id, sequence_num) IN 
 (SELECT session_id, max(sequence_num) 
FROM clickstream 
GROUP BY session_id)
GROUP BY url;

On a table with about 100,000 rows this runs in about 720ms on my
system , compared to the ON DISTICNT version which runs in about
1000ms.  Adding an index on (session_id, sequence_num) reduced the run
time to about 690ms, but made no difference to the DISTINCT ON
version.  With only about 10,000 rows, there's no appreciable
difference. This surprised me, because I expected the DISTINCT ON to
be better.


-- 
Remove -42 for email

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [GENERAL] Stuck with a query...

2005-03-08 Thread Geoff Caplan
Greg,

GS> There's no efficient way to write this in standard SQL.
GS> However Postgres has an extension DISTINCT ON that would
GS> do it:

Works as advertised - many thanks! I'd missed the DISTINCT
ON extension...

This really is a great list - you've saved me a couple of
hours of agony, I suspect.

-- 
Geoff Caplan
Vario Software Ltd
(+44) 121-515 1154 


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [GENERAL] Stuck with a query...

2005-03-08 Thread Greg Stark

Geoff Caplan <[EMAIL PROTECTED]> writes:

> Hi folks,
> 
> Sorry to ask a newbie SQL question but I'm struggling...

There's no efficient way to write this in standard SQL. However Postgres has
an extension DISTINCT ON that would do it:

select url,count(*) 
  from (select distinct on (session_id)
   url
  from clickstream
 order by session_id,sequence_num  desc
   )
 group by url

This isn't going to be a superfast query. It has to sort all the clickstream
records by session and sequence, take just the last one, then probably sort
those again.

You could maybe make it faster by having an index on 
and doing order by "session_id desc, sequence_num desc". And giving this
session a larger than normal sort_mem would give it a better chance of being
able to use hash_agg for the count.

-- 
greg


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


[GENERAL] Stuck with a query...

2005-03-08 Thread Geoff Caplan
Hi folks,

Sorry to ask a newbie SQL question but I'm struggling...

I have a website clickstream log:

request_id   session_id  sequence_num  url
100  xxx 1 /foo
101  xxx 2 /bar
102  xxx 3 /hoo
103  yyy 1 /foo
104  yyy 2 /bar

I need to count the number of times each url was an exit
page. An exit page is the highest sequence_num for each
session_id - 102 & 104 in the example. The report would look
like:

Exit pages:

/foo 4555
/bar 3204
/hoo 2337

etc...

Any pointers would be much appreciated!

-- 
Geoff Caplan
Vario Software Ltd
(+44) 121-515 1154 


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org