Re: [GENERAL] SQL Help - Finding Next Lowest Value of Current Row Value

2011-10-11 Thread Jeff Adams
Thanks for the suggestions Chris (and Chris). After a bit more investigation
I stumbled upon the Window functions. The approach below turned out to be
much more efficient that a function or self join approach. The SQL that I
used is provided below (event_id and mmsi uniquely identify a vessel transit
for which I wished to compute how much time had elapsed between successive
records):

 

SELECT 

  a.event_id, 

  a.mmsi, 

  (a.epoch - lag(epoch) OVER (PARTITION BY event_id, mmsi ORDER BY epoch
ASC))/60 AS elapsed

FROM 

   dmas_ais a 

 

Jeff

 

From: ccur...@gmail.com [mailto:ccur...@gmail.com] On Behalf Of Chris Curvey
Sent: Saturday, October 01, 2011 10:55 PM
To: Jeff Adams
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] SQL Help - Finding Next Lowest Value of Current Row
Value

 

On Sat, Oct 1, 2011 at 5:30 PM, Jeff Adams jeff.ad...@noaa.gov wrote:

Greetings,

I have a large table (~19 million records). Records contains a field
identifying a vessel and a field containing an time (epoch). Using the
current rows vessel and time values, I need to be able to find the next
lowest time value for the vessel and use it to compute how much time has
elapsed between the records. I have tried a scalar subquery in the SELECT,
which works, but it runs quite slowly. Is there an alternative approach that
might perform better for this type of query. Any information would be
greatly appreciated. Thanks...

Jeff



 

Would a self-join with a MAX() help, like this?  (Where v is your
vessel_id and e is your time value?)

 

create table stuff

(  v int

,  e timestamp

);

 

insert into stuff (v, e) values (1, '1/1/2011');

insert into stuff (v, e) values (1, '1/2/2011');

insert into stuff (v, e) values (1, '1/3/2011');

 

insert into stuff (v, e) values (2, '2/1/2011');

insert into stuff (v, e) values (2, '2/2/2011');

 

select a.v, a.e, max(b.e), a.e - max(b.e)

from stuff a

join stuff b on a.v = b.v

where a.e  b.e

group by a.v, a.e

 

I don't have a multi-million row table handy, but I'd be interested to hear
your results.

 

-- 

e-Mail is the equivalent of a postcard written in pencil.  This message may
not have been sent by me, or intended for you.  It may have been read or
even modified while in transit.  e-Mail disclaimers have the same force in
law as a note passed in study hall.  If your corporate attorney says that
you need an disclaimer in your signature, you need a new corporate attorney.



Re: [GENERAL] SQL Help - Finding Next Lowest Value of Current Row Value

2011-10-02 Thread Thomas Kellerer

Jeff Adams wrote on 01.10.2011 23:30:

Greetings,

I have a large table (~19 million records). Records contains a field
identifying a vessel and a field containing an time (epoch). Using the
current rows vessel and time values, I need to be able to find the next
lowest time value for the vessel and use it to compute how much time has
elapsed between the records. I have tried a scalar subquery in the SELECT,
which works, but it runs quite slowly. Is there an alternative approach that
might perform better for this type of query. Any information would be
greatly appreciated. Thanks...

Jeff


Something like:


SELECT vessel,
   time_column,
   time_column - lag(time_column) over (partition by vessel order by 
time_column) as diff
FROM your_table

No sure how good that performs though.




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] SQL Help - Finding Next Lowest Value of Current Row Value

2011-10-01 Thread Jeff Adams
Greetings,

I have a large table (~19 million records). Records contains a field
identifying a vessel and a field containing an time (epoch). Using the
current rows vessel and time values, I need to be able to find the next
lowest time value for the vessel and use it to compute how much time has
elapsed between the records. I have tried a scalar subquery in the SELECT,
which works, but it runs quite slowly. Is there an alternative approach that
might perform better for this type of query. Any information would be
greatly appreciated. Thanks...

Jeff



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SQL Help - Finding Next Lowest Value of Current Row Value

2011-10-01 Thread Chris Travers
On Sat, Oct 1, 2011 at 2:30 PM, Jeff Adams jeff.ad...@noaa.gov wrote:
 Greetings,

 I have a large table (~19 million records). Records contains a field
 identifying a vessel and a field containing an time (epoch). Using the
 current rows vessel and time values, I need to be able to find the next
 lowest time value for the vessel and use it to compute how much time has
 elapsed between the records. I have tried a scalar subquery in the SELECT,
 which works, but it runs quite slowly. Is there an alternative approach that
 might perform better for this type of query. Any information would be
 greatly appreciated. Thanks...

How many rows per vessel?

Either way it seems to me we are talking about selecting the current
record, and then selecting the max of another record within a
timeframe.  I would probably use a plpgsql function that would do this
in two stages, ideally being able to do an index scan twice

Best Wishes,
Chris Travers

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SQL Help - Finding Next Lowest Value of Current Row Value

2011-10-01 Thread Chris Curvey
On Sat, Oct 1, 2011 at 5:30 PM, Jeff Adams jeff.ad...@noaa.gov wrote:

 Greetings,

 I have a large table (~19 million records). Records contains a field
 identifying a vessel and a field containing an time (epoch). Using the
 current rows vessel and time values, I need to be able to find the next
 lowest time value for the vessel and use it to compute how much time has
 elapsed between the records. I have tried a scalar subquery in the SELECT,
 which works, but it runs quite slowly. Is there an alternative approach
 that
 might perform better for this type of query. Any information would be
 greatly appreciated. Thanks...

 Jeff



Would a self-join with a MAX() help, like this?  (Where v is your
vessel_id and e is your time value?)

create table stuff
(  v int
,  e timestamp
);

insert into stuff (v, e) values (1, '1/1/2011');
insert into stuff (v, e) values (1, '1/2/2011');
insert into stuff (v, e) values (1, '1/3/2011');

insert into stuff (v, e) values (2, '2/1/2011');
insert into stuff (v, e) values (2, '2/2/2011');

select a.v, a.e, max(b.e), a.e - max(b.e)
from stuff a
join stuff b on a.v = b.v
where a.e  b.e
group by a.v, a.e

I don't have a multi-million row table handy, but I'd be interested to hear
your results.

-- 
e-Mail is the equivalent of a postcard written in pencil.  This message may
not have been sent by me, or intended for you.  It may have been read or
even modified while in transit.  e-Mail disclaimers have the same force in
law as a note passed in study hall.  If your corporate attorney says that
you need an disclaimer in your signature, you need a new corporate attorney.