Re: [GENERAL] time series query

2007-04-02 Thread Jaime Silvela

Good idea. I tried it and got a 12% decrease in execution time!
Still slower than the usual JOIN, but not by that much.
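For anyone following along, the change presumably amounts to recreating the transition functions with the IMMUTABLE flag, along these lines (a sketch; CREATE OR REPLACE keeps the existing aggregate bindings, since the signature is unchanged):

```sql
-- Sketch: declare the comparison function immutable so the executor
-- no longer treats it as volatile and can evaluate it more cheaply.
create or replace function dtval_larger(dtval, dtval) returns dtval as $$
  select case when $1.dt > $2.dt then $1 else $2 end
$$ language sql immutable;
```

(dtval_smaller would get the same treatment.)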

William Garrison wrote:
Would it speed things up significantly if you set the dtval_smaller() 
function to be immutable?  Volatile is the default, so it may be 
redundantly re-evaluating things.


Jaime Silvela wrote:

In case anyone is interested, I was able to solve this, more or less.
Here's my new "Latest value" query:

 select obj_id, val_type_id, (max(row(observation_date, val))).val
 from measurements
 group by obj_id, val_type_id

It was only necessary to define a new (date, numeric) type. Below is 
the code. The performance is actually slower than using a JOIN 
between the table and its GROUP BY version. I guess for performance, 
I should code the functions in C, but at the moment, the value for 
me is that it simplifies a lot of my 12-way join queries!


create type dtval as (
 dt date,
 val numeric
);

create function dtval_smaller(dtval, dtval) returns dtval as $$
 select case when $1.dt < $2.dt then $1 else $2 end
$$ language sql;

create aggregate min (
 sfunc = dtval_smaller,
 basetype = dtval,
 stype = dtval
);

create function dtval_larger(dtval, dtval) returns dtval as $$
 select case when $1.dt > $2.dt then $1 else $2 end
$$ language sql;

create aggregate max (
 sfunc = dtval_larger,
 basetype = dtval,
 stype = dtval
);



Jaime Silvela wrote:
The problem I'm trying to solve is pretty standard. I have a table 
that records measurements of different types at different times.


CREATE TABLE measurements (
 obj_id int4,
 val_type_id int4 references lu_val_type(val_type_id),
 val numeric,
 observation_date date
);
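To make the goal concrete, here is a toy dataset (hypothetical numbers, and assuming lu_val_type already contains val_type_id 10) and the intended result:

```sql
-- Hypothetical sample rows: two objects, one value type, two dates each.
insert into measurements (obj_id, val_type_id, val, observation_date) values
  (1, 10, 3.5, '2007-03-01'),
  (1, 10, 4.2, '2007-04-01'),  -- latest observation for (1, 10)
  (2, 10, 9.9, '2007-02-15'),
  (2, 10, 8.1, '2007-03-20');  -- latest observation for (2, 10)
-- The query should return one row per (obj_id, val_type_id), carrying
-- the val from the most recent observation_date:
--   (1, 10, 4.2) and (2, 10, 8.1)
```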

I want a query as simple and fast as possible to return the latest 
observation of each type for each object.
I sent a message to this list a while ago, and the suggestion I 
found to be the best compromise of clarity and speed was:

a) create an index on (obj_id, val_type_id, observation_date)
b) the "obvious" query becomes fast thanks to the index.
   select ms.*
   from (
     select obj_id, val_type_id, max(observation_date) as observation_date
     from measurements
     group by obj_id, val_type_id
   ) ms_last
   join measurements ms using (obj_id, val_type_id, observation_date);

It still bugged me a bit that this requires a JOIN, especially since 
in a procedural language, it would have been so easy to return the 
value associated with the max(observation_date).
I think I've found a pretty good alternative. This at the moment 
works if we keep track of time with an integer, rather than a date, 
but it would be readily extensible.


The idea is, in fact, to associate the value with the 
max(observation_date), like so:

select obj_id, val_type_id, max(array[observation_date, val])
from measurements
group by obj_id, val_type_id;

There are two caveats:
a) an array requires its elements to be of the same type, so 
observation_date would have to be kept as an integer ("time from" 
some epoch) rather than a date.
b) a row constructor would be ideal here, but there is no max() 
function for rowtypes.
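Caveat (a) can be worked around by encoding the date as an integer day count, so that both array elements become numeric (a sketch; ties on the same date would fall through to comparing val):

```sql
-- Sketch: days since an arbitrary epoch makes the date numeric-compatible;
-- element [2] of the winning array recovers the value.
select obj_id, val_type_id,
       (max(array[observation_date - date '1970-01-01', val]))[2] as val
from measurements
group by obj_id, val_type_id;
```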


If I did have a max() function for row types, it would be clean to 
do this:

select obj_id, val_type_id, max(row(observation_date, val))
from measurements
group by obj_id, val_type_id;

Now, it seems that since rowtype comparison is built in, it should 
be pretty easy to build a max() aggregate for it. Has anybody done 
this? I'd have looked at the code for max(anyarray) but I don't know 
how to access it. Can someone point me in the right direction?


Also, has someone thought about this before? I'm wondering whether 
there would be a speed gain from this.


Thank you,
Jaime


***
Bear Stearns is not responsible for any recommendation, solicitation,
offer or agreement or any information about any transaction, customer
account or account activity contained in this communication.

Bear Stearns does not provide tax, legal or accounting advice.  You
should consult your own tax, legal and accounting advisors before
engaging in any transaction. In order for Bear Stearns to comply with
Internal Revenue Service Circular 230 (if applicable), you are notified
that any discussion of U.S. federal tax issues contained or referred to
herein is not intended or written to be used, and cannot be used, for
the purpose of:  (A) avoiding penalties that may be imposed under the
Internal Revenue Code; nor (B) promoting, marketing or recommending to
another party any transaction or matter addressed herein.
***

---(end of broadcast)---

TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq





Re: [GENERAL] time series query

2007-04-02 Thread William Garrison
Would it speed things up significantly if you set the dtval_smaller() 
function to be immutable?  Volatile is the default, so it may be 
redundantly re-evaluating things.


Jaime Silvela wrote:

In case anyone is interested, I was able to solve this, more or less.
Here's my new "Latest value" query:

 select obj_id, val_type_id, (max(row(observation_date, val))).val
 from measurements
 group by obj_id, val_type_id

It was only necessary to define a new (date, numeric) type. Below is the 
code. The performance is actually slower than using a JOIN between the 
table and its GROUP BY version. I guess for performance, I should code 
the functions in C, but at the moment, the value for me is that it 
simplifies a lot of my 12-way join queries!


create type dtval as (
 dt date,
 val numeric
);

create function dtval_smaller(dtval, dtval) returns dtval as $$
 select case when $1.dt < $2.dt then $1 else $2 end
$$ language sql;

create aggregate min (
 sfunc = dtval_smaller,
 basetype = dtval,
 stype = dtval
);

create function dtval_larger(dtval, dtval) returns dtval as $$
 select case when $1.dt > $2.dt then $1 else $2 end
$$ language sql;

create aggregate max (
 sfunc = dtval_larger,
 basetype = dtval,
 stype = dtval
);



Jaime Silvela wrote:
The problem I'm trying to solve is pretty standard. I have a table 
that records measurements of different types at different times.


CREATE TABLE measurements (
 obj_id int4,
 val_type_id int4 references lu_val_type(val_type_id),
 val numeric,
 observation_date date
);

I want a query as simple and fast as possible to return the latest 
observation of each type for each object.
I sent a message to this list a while ago, and the suggestion I found 
to be the best compromise of clarity and speed was:

a) create an index on (obj_id, val_type_id, observation_date)
b) the "obvious" query becomes fast thanks to the index.
   select ms.*
   from (
     select obj_id, val_type_id, max(observation_date) as observation_date
     from measurements
     group by obj_id, val_type_id
   ) ms_last
   join measurements ms using (obj_id, val_type_id, observation_date);

It still bugged me a bit that this requires a JOIN, especially since 
in a procedural language, it would have been so easy to return the 
value associated with the max(observation_date).
I think I've found a pretty good alternative. This at the moment works 
if we keep track of time with an integer, rather than a date, but it 
would be readily extensible.


The idea is, in fact, to associate the value with the 
max(observation_date), like so:

select obj_id, val_type_id, max(array[observation_date, val])
from measurements
group by obj_id, val_type_id;

There are two caveats:
a) an array requires its elements to be of the same type, so 
observation_date would have to be kept as an integer ("time from" 
some epoch) rather than a date.
b) a row constructor would be ideal here, but there is no max() 
function for rowtypes.


If I did have a max() function for row types, it would be clean to do 
this:

select obj_id, val_type_id, max(row(observation_date, val))
from measurements
group by obj_id, val_type_id;

Now, it seems that since rowtype comparison is built in, it should be 
pretty easy to build a max() aggregate for it. Has anybody done this? 
I'd have looked at the code for max(anyarray) but I don't know how to 
access it. Can someone point me in the right direction?


Also, has someone thought about this before? I'm wondering whether 
there would be a speed gain from this.


Thank you,
Jaime



Re: [GENERAL] time series query

2007-04-02 Thread Jaime Silvela

In case anyone is interested, I was able to solve this, more or less.
Here's my new "Latest value" query:

 select obj_id, val_type_id, (max(row(observation_date, val))).val
 from measurements
 group by obj_id, val_type_id

It was only necessary to define a new (date, numeric) type. Below is the 
code. The performance is actually slower than using a JOIN between the 
table and its GROUP BY version. I guess for performance, I should code 
the functions in C, but at the moment, the value for me is that it 
simplifies a lot of my 12-way join queries!


create type dtval as (
 dt date,
 val numeric
);

create function dtval_smaller(dtval, dtval) returns dtval as $$
 select case when $1.dt < $2.dt then $1 else $2 end
$$ language sql;

create aggregate min (
 sfunc = dtval_smaller,
 basetype = dtval,
 stype = dtval
);

create function dtval_larger(dtval, dtval) returns dtval as $$
 select case when $1.dt > $2.dt then $1 else $2 end
$$ language sql;

create aggregate max (
 sfunc = dtval_larger,
 basetype = dtval,
 stype = dtval
);



Jaime Silvela wrote:
The problem I'm trying to solve is pretty standard. I have a table 
that records measurements of different types at different times.


CREATE TABLE measurements (
 obj_id int4,
 val_type_id int4 references lu_val_type(val_type_id),
 val numeric,
 observation_date date
);

I want a query as simple and fast as possible to return the latest 
observation of each type for each object.
I sent a message to this list a while ago, and the suggestion I found 
to be the best compromise of clarity and speed was:

a) create an index on (obj_id, val_type_id, observation_date)
b) the "obvious" query becomes fast thanks to the index.
   select ms.*
   from (
     select obj_id, val_type_id, max(observation_date) as observation_date
     from measurements
     group by obj_id, val_type_id
   ) ms_last
   join measurements ms using (obj_id, val_type_id, observation_date);

It still bugged me a bit that this requires a JOIN, especially since 
in a procedural language, it would have been so easy to return the 
value associated with the max(observation_date).
I think I've found a pretty good alternative. This at the moment works 
if we keep track of time with an integer, rather than a date, but it 
would be readily extensible.


The idea is, in fact, to associate the value with the 
max(observation_date), like so:

select obj_id, val_type_id, max(array[observation_date, val])
from measurements
group by obj_id, val_type_id;

There are two caveats:
a) an array requires its elements to be of the same type, so 
observation_date would have to be kept as an integer ("time from" 
some epoch) rather than a date.
b) a row constructor would be ideal here, but there is no max() 
function for rowtypes.


If I did have a max() function for row types, it would be clean to do 
this:

select obj_id, val_type_id, max(row(observation_date, val))
from measurements
group by obj_id, val_type_id;

Now, it seems that since rowtype comparison is built in, it should be 
pretty easy to build a max() aggregate for it. Has anybody done this? 
I'd have looked at the code for max(anyarray) but I don't know how to 
access it. Can someone point me in the right direction?


Also, has someone thought about this before? I'm wondering whether 
there would be a speed gain from this.


Thank you,
Jaime

