Re: [GENERAL] Query to find contiguous ranges on a column

2009-10-14 Thread Peter Hunsberger
On Tue, Oct 13, 2009 at 5:12 PM, Tim Landscheidt t...@tim-landscheidt.de 
wrote:
 Peter Hunsberger peter.hunsber...@gmail.com wrote:

 You can either use a PL/pgSQL function (SETOF TEXT just
 for the convenience of the example):

That works well, takes about 20 seconds to do the 6M+ rows


 or a recursive query (which I always find very hard to com-
 prehend):

 | WITH RECURSIVE RecCols (LeftBoundary, Value) AS
 |   (SELECT col, col FROM t WHERE (col - 1) NOT IN (SELECT col FROM t)
 |    UNION ALL SELECT p.LeftBoundary, c.col FROM RecCols AS p, t AS c WHERE 
 c.col = p.Value + 1)
 |   SELECT LeftBoundary, MAX(Value) AS RightBoundary FROM RecCols
 |     GROUP BY LeftBoundary
 |     ORDER BY LeftBoundary;

 Could you run both against your data set and find out which
 one is faster for your six million rows?


Turns out the server is v 8.3, looks like I need to get them to
upgrade it so I get recursive and windowing :-(.  If this happens any
time soon I'll let you know the results.

Many thanks.


-- 
Peter Hunsberger

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query to find contiguous ranges on a column

2009-10-14 Thread Tim Landscheidt
Peter Hunsberger peter.hunsber...@gmail.com wrote:

 [...]
 or a recursive query (which I always find very hard to com-
 prehend):

 | WITH RECURSIVE RecCols (LeftBoundary, Value) AS
 |   (SELECT col, col FROM t WHERE (col - 1) NOT IN (SELECT col FROM t)
 |    UNION ALL SELECT p.LeftBoundary, c.col FROM RecCols AS p, t AS c WHERE 
 c.col = p.Value + 1)
 |   SELECT LeftBoundary, MAX(Value) AS RightBoundary FROM RecCols
 |     GROUP BY LeftBoundary
 |     ORDER BY LeftBoundary;

 Could you run both against your data set and find out which
 one is faster for your six million rows?

 Turns out the server is v 8.3, looks like I need to get them to
 upgrade it so I get recursive and windowing :-(.  If this happens any
 time soon I'll let you know the results.

 Many thanks.

After some tests with a data set of 7983 rows (and 1638 ran-
ges): Don't! :-) The recursive solution seems to be more
than double as slow as the iterative. I'll take it to -per-
formance.

Tim


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query to find contiguous ranges on a column

2009-10-14 Thread Peter Hunsberger
On Wed, Oct 14, 2009 at 4:50 PM, Tim Landscheidt t...@tim-landscheidt.de 
wrote:
 Peter Hunsberger peter.hunsber...@gmail.com wrote:

 After some tests with a data set of 7983 rows (and 1638 ran-
 ges): Don't! :-) The recursive solution seems to be more
 than double as slow as the iterative. I'll take it to -per-
 formance.


Interesting, I've never liked recursive on Oracle but performance is
usually reasonable... Thanks for the heads up...


-- 
Peter Hunsberger

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Query to find contiguous ranges on a column

2009-10-13 Thread Peter Hunsberger
Given a column of data resembling the following:

col
2
3
4
5
11
12
13
14
15
16
17
18
19
23
32
33
34
35
36
37

I need a query to find the contiguous ranges within this column, in
this case returning the result set:

start, end
2, 5
11, 19
23, 23
32, 37

I have one solution that joins the table against itself and does
(among other things) a subselect looking not exists col +1 and not
exists col -1 on the two instances of the table to find the start and
end.  This is, as you might guess, is not very efficient (my actual
data is some 6 million+ rows) and I'm guessing there has to be
something more efficient with windowing or possibly grouping on min
and max (though I can't see how to make sure they are part of a
contiguous set).  Anyone have any ideas?

-- 
Peter Hunsberger

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query to find contiguous ranges on a column

2009-10-13 Thread Tim Landscheidt
Peter Hunsberger peter.hunsber...@gmail.com wrote:

 [...]
 I have one solution that joins the table against itself and does
 (among other things) a subselect looking not exists col +1 and not
 exists col -1 on the two instances of the table to find the start and
 end.  This is, as you might guess, is not very efficient (my actual
 data is some 6 million+ rows) and I'm guessing there has to be
 something more efficient with windowing or possibly grouping on min
 and max (though I can't see how to make sure they are part of a
 contiguous set).  Anyone have any ideas?

You can either use a PL/pgSQL function (SETOF TEXT just
for the convenience of the example):

| CREATE FUNCTION SummarizeRanges () RETURNS SETOF TEXT AS $$
| DECLARE
|   CurrentFirst INT;
|   CurrentLast INT;
|   CurrentRecord RECORD;
| BEGIN
|   FOR CurrentRecord IN SELECT col FROM t ORDER BY col LOOP
| IF CurrentFirst IS NULL THEN
|   CurrentFirst := CurrentRecord.col;
|   CurrentLast  := CurrentRecord.col;
| ELSIF CurrentRecord.col = CurrentLast + 1 THEN
|   CurrentLast := CurrentRecord.col;
| ELSE
|   RETURN NEXT CurrentFirst || ', ' || CurrentLast;
|   CurrentFirst := CurrentRecord.col;
|   CurrentLast := CurrentRecord.col;
| END IF;
|   END LOOP;
|   IF CurrentFirst IS NOT NULL THEN
| RETURN NEXT CurrentFirst || ', ' || CurrentLast;
|   END IF;
|   RETURN;
| END;
| $$ LANGUAGE plpgsql;

or a recursive query (which I always find very hard to com-
prehend):

| WITH RECURSIVE RecCols (LeftBoundary, Value) AS
|   (SELECT col, col FROM t WHERE (col - 1) NOT IN (SELECT col FROM t)
|UNION ALL SELECT p.LeftBoundary, c.col FROM RecCols AS p, t AS c WHERE 
c.col = p.Value + 1)
|   SELECT LeftBoundary, MAX(Value) AS RightBoundary FROM RecCols
| GROUP BY LeftBoundary
| ORDER BY LeftBoundary;

Could you run both against your data set and find out which
one is faster for your six million rows?

Tim


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query to find contiguous ranges on a column

2009-10-13 Thread Thomas Kellerer

Peter Hunsberger wrote on 13.10.2009 23:23:

I need a query to find the contiguous ranges within this column, in
this case returning the result set:

start, end
2, 5
11, 19
23, 23
32, 37

I have one solution that joins the table against itself and does
(among other things) a subselect looking not exists col +1 and not
exists col -1 on the two instances of the table to find the start and
end.  This is, as you might guess, is not very efficient (my actual
data is some 6 million+ rows) and I'm guessing there has to be
something more efficient with windowing or possibly grouping on min
and max (though I can't see how to make sure they are part of a
contiguous set).  Anyone have any ideas?


This is the best I can put together right now.

Not very nice, but currently I can't think of a better solution:

select * 
from 
(

 select soi as start_of_interval,
case 
  when soi is not null and eoi is null then lead(eoi) over()

  when soi is not null and eoi is not null then eoi
  else null
end as end_of_interval
 from (
 select case 
  when col - (lag(col,1,col + 1) over (order by col)) - 1  0 then col

  else null
end as soi,
case 
  when col - (lead(col,1,col + 1) over (order by col)) + 1  0 then col 
  else null

end as eoi
 from numbers
 ) t1
 where t1.soi is not null
or t1.eoi is not null
) t2
where t2.start_of_interval is not null 
 and t2.end_of_interval is not null;


The outer-most select is needed to get rid of the empty rows. I couldn't find 
a way to push that into one of the sub-queries.

The execution plan doesn't look too bad (probably better than your plan with a self join and a subselect), but it does sort the whole table which might be a problem. 


Regards
Thomas






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general