[SQL] window function to sort times series data?

2010-03-24 Thread Louis-David Mitterrand
Hi,

I have time series data: price(id_price int, price int, created_on timestamp)

I'd like to select the latest price before, say, 2010-03-10 and the
latest price after that date.

Using "group by" and self-joins I was able to build a (quite large :)
working query. 

But I wonder if there is a cleaner, shorter solution with a window
function.

I tried something like:

select * from (select 
first_value(p.id_price) over w as first_id_price,
first_value(p.price) over w as first_price,
first_value(p.created_on::date) over w as first_date,
nth_value(p.id_price,2) over w as second_id_price,
nth_value(p.price,2) over w as second_price,
nth_value(p.created_on::date,2) over w as second_date,
p.id_price
from price p
window w as (order by p.created_on > '2010-03-10, p.id_price desc
rows between unbounded preceding and unbounded following)) 
as t where first_id_price=id_price;

But this doesn't return correct results.

Thanks for any suggestions,

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] window function to sort times series data?

2010-03-24 Thread A. Kretschmer
In response to Louis-David Mitterrand :
> Hi,
> 
> I have time series data: price(id_price int, price int, created_on timestamp)
> 
> I'd like to select the latest price before, say, 2010-03-10 and the
> latest price after that date.

test=*# select * from price ;
 id_price | price | created_on
--+---+-
1 |10 | 2010-01-01 00:00:00
1 |12 | 2010-02-01 00:00:00
1 | 8 | 2010-03-01 00:00:00
1 |15 | 2010-03-10 00:00:00
1 |13 | 2010-03-20 00:00:00
(5 rows)

test=*# select * from (
  select distinct on(id_price) id_price, price, created_on from price where 
created_on < '2010-02-20'::date order by id_price, created_on desc
) foo union all select * from (
  select distinct on(id_price) id_price, price, created_on from price where 
created_on > '2010-02-20'::date order by id_price, created_on asc
) bar order by id_price,created_on ;
 id_price | price | created_on
--+---+-
1 |12 | 2010-02-01 00:00:00
1 | 8 | 2010-03-01 00:00:00
(2 rows)


That's okay for you?


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] window function to sort times series data?

2010-03-24 Thread Louis-David Mitterrand
On Wed, Mar 24, 2010 at 03:29:36PM +0100, A. Kretschmer wrote:
> In response to Louis-David Mitterrand :
> > Hi,
> > 
> > I have time series data: price(id_price int, price int, created_on 
> > timestamp)
> > 
> > I'd like to select the latest price before, say, 2010-03-10 and the
> > latest price after that date.
> 
> test=*# select * from price ;
>  id_price | price | created_on
> --+---+-
> 1 |10 | 2010-01-01 00:00:00
> 1 |12 | 2010-02-01 00:00:00
> 1 | 8 | 2010-03-01 00:00:00
> 1 |15 | 2010-03-10 00:00:00
> 1 |13 | 2010-03-20 00:00:00
> (5 rows)
> 
> test=*# select * from (
>   select distinct on(id_price) id_price, price, created_on from price where 
> created_on < '2010-02-20'::date order by id_price, created_on desc
> ) foo union all select * from (
>   select distinct on(id_price) id_price, price, created_on from price where 
> created_on > '2010-02-20'::date order by id_price, created_on asc
> ) bar order by id_price,created_on ;
>  id_price | price | created_on
> --+---+-
> 1 |12 | 2010-02-01 00:00:00
> 1 | 8 | 2010-03-01 00:00:00
> (2 rows)
> 
> That's okay for you?

Yes, that works, but I forgot in my specs (!) that I'd like the two
prices (pre and post 2010-03-10) to be returned on the same row and only
if a post-2010-03-10 price exists.

Thanks,

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] window function to sort times series data?

2010-03-24 Thread A. Kretschmer
In response to Louis-David Mitterrand :
> On Wed, Mar 24, 2010 at 03:29:36PM +0100, A. Kretschmer wrote:
> > In response to Louis-David Mitterrand :
> > > Hi,
> > > 
> > > I have time series data: price(id_price int, price int, created_on 
> > > timestamp)
> > > 
> > > I'd like to select the latest price before, say, 2010-03-10 and the
> > > latest price after that date.
> > 
> > test=*# select * from price ;
> >  id_price | price | created_on
> > --+---+-
> > 1 |10 | 2010-01-01 00:00:00
> > 1 |12 | 2010-02-01 00:00:00
> > 1 | 8 | 2010-03-01 00:00:00
> > 1 |15 | 2010-03-10 00:00:00
> > 1 |13 | 2010-03-20 00:00:00
> > (5 rows)
> > 
> > test=*# select * from (
> >   select distinct on(id_price) id_price, price, created_on from price where 
> > created_on < '2010-02-20'::date order by id_price, created_on desc
> > ) foo union all select * from (
> >   select distinct on(id_price) id_price, price, created_on from price where 
> > created_on > '2010-02-20'::date order by id_price, created_on asc
> > ) bar order by id_price,created_on ;
> >  id_price | price | created_on
> > --+---+-
> > 1 |12 | 2010-02-01 00:00:00
> > 1 | 8 | 2010-03-01 00:00:00
> > (2 rows)
> > 
> > That's okay for you?
> 
> Yes, that works, but I forgot in my specs (!) that I'd like the two
> prices (pre and post 2010-03-10) to be returned on the same row and only
> if a post-2010-03-10 price exists.

Well:

test=*# select * from price ;
 id_price | price | created_on
--+---+-
1 |10 | 2010-01-01 00:00:00
1 |12 | 2010-02-01 00:00:00
1 | 8 | 2010-03-01 00:00:00
1 |15 | 2010-03-10 00:00:00
1 |13 | 2010-03-20 00:00:00
(5 rows)

test=*# select id_price, sum(case when pre_post='pre' then price else 0
end) as pre, sum(case when pre_post='post' then price else 0 end) as
post, max(case when pre_post='pre' then created_on else null end) as
date_pre, max(case when pre_post='post' then created_on else null end)
as date_post from (select * from (select distinct on(id_price)
'pre'::text as pre_post, id_price, price, created_on from price where
created_on < '2010-02-20'::date order by id_price, created_on desc) foo
union all select * from (select distinct on(id_price) 'post'::text,
id_price, price, created_on from price where created_on >
'2010-02-20'::date order by id_price,created_on asc) bar order by
id_price, created_on) foo group by id_price;
 id_price | pre | post |  date_pre   |  date_post
--+-+--+-+-
1 |  12 |8 | 2010-02-01 00:00:00 | 2010-03-01 00:00:00
(1 row)


You can filter that result for rows where date_post is not null.


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] window function to sort times series data?

2010-03-24 Thread Andreas Kretschmer
A. Kretschmer  wrote:

Well, and now i'm using 8.4 windowing-functions:

test=*# select * from price order by price_id, d;
 price_id | price | d
--+---+
1 |10 | 2010-03-12
1 |11 | 2010-03-19
1 |12 | 2010-03-26
1 |13 | 2010-04-02
1 |14 | 2010-04-09
1 |15 | 2010-04-16
1 |16 | 2010-04-23
1 |17 | 2010-04-30
2 |20 | 2010-03-12
2 |21 | 2010-03-19
2 |22 | 2010-03-26
2 |23 | 2010-04-02
(12 Zeilen)

-- now i'm searching for 2010-03-20:

Zeit: 0,319 ms
test=*# select price_id, sum(case when d < '2010-03-20'::date then price
else 0 end) as price_old, sum(case when d > '2010-03-20'::date then
price else 0 end) as price_new, max(case when d < '2010-03-20'::date
then d else null end) as date_old, max(case when d > '2010-03-20'::date
then d else null end) as date_new from (select price_id, price, d,
lag(d) over(partition by price_id order by d), lead(d) over(partition by
price_id order by d) from price) foo where '2010-03-20'::date between
lag and lead group by price_id;
 price_id | price_old | price_new |  date_old  |  date_new
--+---+---++
1 |11 |12 | 2010-03-19 | 2010-03-26
2 |21 |22 | 2010-03-19 | 2010-03-26
(2 Zeilen)



Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.  (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.  N 51.05082°, E 13.56889°

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] window function to sort times series data?

2010-03-24 Thread Louis-David Mitterrand
On Wed, Mar 24, 2010 at 05:29:46PM +0100, Andreas Kretschmer wrote:
> A. Kretschmer  wrote:
> 
> Well, and now i'm using 8.4 windowing-functions:
> 
> test=*# select * from price order by price_id, d;
>  price_id | price | d
> --+---+
> 1 |10 | 2010-03-12
> 1 |11 | 2010-03-19
> 1 |12 | 2010-03-26
> 1 |13 | 2010-04-02
> 1 |14 | 2010-04-09
> 1 |15 | 2010-04-16
> 1 |16 | 2010-04-23
> 1 |17 | 2010-04-30
> 2 |20 | 2010-03-12
> 2 |21 | 2010-03-19
> 2 |22 | 2010-03-26
> 2 |23 | 2010-04-02
> (12 Zeilen)
> 
> -- now i'm searching for 2010-03-20:
> 
> Zeit: 0,319 ms
> test=*# select price_id, sum(case when d < '2010-03-20'::date then price
> else 0 end) as price_old, sum(case when d > '2010-03-20'::date then
> price else 0 end) as price_new, max(case when d < '2010-03-20'::date
> then d else null end) as date_old, max(case when d > '2010-03-20'::date
> then d else null end) as date_new from (select price_id, price, d,
> lag(d) over(partition by price_id order by d), lead(d) over(partition by
> price_id order by d) from price) foo where '2010-03-20'::date between
> lag and lead group by price_id;
>  price_id | price_old | price_new |  date_old  |  date_new
> --+---+---++
> 1 |11 |12 | 2010-03-19 | 2010-03-26
> 2 |21 |22 | 2010-03-19 | 2010-03-26
> (2 Zeilen)

Nice use of lag() and lead() functions. 

In my db id_price is a serial so it's easy to use in an aggregate to
determine the latest.

I also looked at window functions and did the following:

select p3.price as first_price,
   p4.price as second_price
   from (select 
   first_value(max(p.id_price)) over w as 
first_id_price, 
   nth_value(max(p.id_price),2) over w as 
second_id_price, 
   p.created_on > '2010-03-20' as is_new_price
   from price p
   group by p.created_on > '2010-03-20'
   window w as (order by p.created_on > '2010-03-20'
   desc rows between unbounded preceding and unbounded 
following)
) as t
join price p3 on (t.first_id_price=p3.id_price)
left join price p4 on (t.second_id_price=p4.id_price)
where t.is_new_price is true

test=# \e
 first_price | second_price 
-+--
  17 |   11
(1 row)

Is there some potential optimizations or flaws?




Here is the test database:

--
-- PostgreSQL database dump
--

SET statement_timeout = 0;
SET client_encoding = 'SQL_ASCII';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;

SET search_path = public, pg_catalog;

SET default_tablespace = '';

SET default_with_oids = false;

--
-- Name: price; Type: TABLE; Schema: public; Owner: ldm; Tablespace: 
--

CREATE TABLE price (
id_price integer NOT NULL,
price integer,
created_on timestamp without time zone
);


ALTER TABLE public.price OWNER TO ldm;

--
-- Name: price_id_price_seq; Type: SEQUENCE; Schema: public; Owner: ldm
--

CREATE SEQUENCE price_id_price_seq
START WITH 1
INCREMENT BY 1
NO MAXVALUE
NO MINVALUE
CACHE 1;


ALTER TABLE public.price_id_price_seq OWNER TO ldm;

--
-- Name: price_id_price_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: ldm
--

ALTER SEQUENCE price_id_price_seq OWNED BY price.id_price;


--
-- Name: price_id_price_seq; Type: SEQUENCE SET; Schema: public; Owner: ldm
--

SELECT pg_catalog.setval('price_id_price_seq', 8, true);


--
-- Name: id_price; Type: DEFAULT; Schema: public; Owner: ldm
--

ALTER TABLE price ALTER COLUMN id_price SET DEFAULT 
nextval('price_id_price_seq'::regclass);


--
-- Data for Name: price; Type: TABLE DATA; Schema: public; Owner: ldm
--

COPY price (id_price, price, created_on) FROM stdin;
1   10  2010-03-12 00:00:00
2   11  2010-03-19 00:00:00
3   12  2010-03-26 00:00:00
4   13  2010-04-02 00:00:00
5   14  2010-04-09 00:00:00
6   15  2010-04-16 00:00:00
7   16  2010-04-23 00:00:00
8   17  2010-04-30 00:00:00
\.


--
-- Name: public; Type: ACL; Schema: -; Owner: postgres
--

REVOKE ALL ON SCHEMA public FROM PUBLIC;
REVOKE ALL ON SCHEMA public FROM postgres;
GRANT ALL ON SCHEMA public TO postgres;
GRANT ALL ON SCHEMA public TO PUBLIC;


--
-- PostgreSQL database dump complete
--


-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] window function to sort times series data?

2010-03-24 Thread John Gage
In going through the arcana of string functions, I have come across the
following series of selects that contain, for me, a mysterious "$re$".

-- return all matches from regexp
SELECT regexp_matches('foobarbequebaz', $re$(bar)(beque)$re$);

-- test case insensitive
SELECT regexp_matches('foObARbEqUEbAz', $re$(bar)(beque)$re$, 'i');

-- global option - more than one match
SELECT regexp_matches('foobarbequebazilbarfbonk', $re$(b[^b]+)(b[^b]+)$re$,
'g');

When I run this I get:

regexp_matches
--


I have not been able to find out what it all means.  Forgive me for my
blindness.

John

P.S. The author of the statements is "alexk" at Command Prompt.  They are
test statements against Postgres' string functions.


Re: [SQL] window function to sort times series data?

2010-03-24 Thread Josh Kupershmidt
On Wed, Mar 24, 2010 at 2:38 PM, John Gage  wrote:
> In going through the arcana of string functions, I have come across the
> following series of selects that contain, for me, a mysterious "$re$".
>
> -- return all matches from regexp
> SELECT regexp_matches('foobarbequebaz', $re$(bar)(beque)$re$);

The $re$ is just an arbitrary identifier for a dollar-quoted string
constant. See:

http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html

Maybe you're on an old version which doesn't support dollar-quoted strings?

On 8.3 for the above query, I get:

SELECT regexp_matches('foobarbequebaz', $re$(bar)(beque)$re$);
 regexp_matches

 {bar,beque}
(1 row)

Josh

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Does IMMUTABLE property propagate?

2010-03-24 Thread Louis-David Mitterrand
On Sat, Mar 06, 2010 at 04:31:55PM -0500, Tom Lane wrote:
> Petru Ghita  writes:
> > "..immediately replaced with the function value" doesn't mean that the
> > results of a previously evaluated function for the same parameters are
> > stored and reused?
> 
> No, it means what it says: the function is executed once and replaced
> with a constant representing the result value.

So for example a function like:


CREATE OR REPLACE FUNCTION shorten_cruise_type(intext text) RETURNS text
AS $$
declare
outtext text;
begin
outtext = trim(regexp_replace(intext, E'\\s*Short( Break)?', 
'', 'i'));
return outtext;
end;
$$
LANGUAGE plpgsql;

could/should be declared immutable?

Thanks,

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql